Title: Designing and Building an AI Enhanced CCTV System
Date: 2026-02-02 20:00
Modified: 2026-02-03 20:00
Category: Homelab
Tags: proxmox, hardware, self host, homelab
Slug: ai-enhanced-cctv
Authors: Andrew Ridgway
Summary: Home CCTV security has become a bastion of cloud subscription awfulness. This blog describes the work involved in creating your own home-grown AI-enhanced CCTV system. Unfortunately, what you save in subscription fees you lose in time, but if you value privacy, it's worth it.
### Why Build Your Own AI-Enhanced CCTV?
When you buy a consumer-grade security camera, you're not just paying for the lens and the plastic housing. You're also paying for a subscription that ships every frame of your backyard to a cloud service you'll never meet. That data can be used to train models, sold to advertisers, or handed over to authorities on a whim. For many, the convenience outweighs the privacy cost, but for anyone who values control over their own footage, the trade-off feels unacceptable.
The goal of this project was simple: **keep every byte of video on-premises, add a layer of artificial intelligence that makes the footage searchable and actionable, and do it all on a budget that wouldn't break the bank**. Over the past six months I've iterated on a design that satisfies those constraints, and the result is a fully local, AI-enhanced CCTV system that can tell you when a “red SUV” pulls into the driveway, or when a “dog wearing a bandana” wanders across the garden, without ever leaving the house.
---
### The Core Software: Frigate
At the heart of the system sits **Frigate**, an open-source network video recorder (NVR) that runs in containers and is configured entirely via a single YAML file. The simplicity of the configuration is a breath of fresh air compared with the sprawling JSON or proprietary GUIs of many commercial solutions. A few key reasons Frigate became the obvious choice:
| Feature | Why It Matters |
|---------|----------------|
| **Container-native** | Deploys cleanly on Docker, Kubernetes, or a lightweight LXC. No host-level dependencies to wrestle with. |
| **YAML-driven** | Human-readable, version-controlled, and easy to replicate across test environments. |
| **Built-in object detection** | Supports car, person, animal, and motorbike detection out of the box, with the ability to plug in custom models. |
| **Extensible APIs** | Exposes detection events, snapshots, and stream metadata for downstream automation tools. |
| **GenAI integration** | Recent addition that lets you forward snapshots to a local LLM (via Ollama) for semantic enrichment. |
The documentation is thorough, and the community is active enough that most stumbling blocks are resolved within a few forum posts. Because the entire system is defined in a single YAML file, I can spin up a fresh test instance in minutes, tweak a camera's FFmpeg options, and see the impact without rebuilding the whole stack.
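To give a taste of that simplicity, here is a minimal sketch of a single-camera definition. The camera name, credentials, and IP address are placeholders, and the exact options available depend on your Frigate version, so treat this as indicative rather than a drop-in config:

```yaml
# Illustrative Frigate config.yml fragment - names and addresses are placeholders.
mqtt:
  host: 192.168.1.10      # your MQTT broker (used for Home Assistant integration)

cameras:
  driveway:
    ffmpeg:
      inputs:
        - path: rtsp://viewer:secret@192.168.1.50:554/stream1
          roles:
            - detect      # frames fed to the object detector
            - record      # frames written to the video archive
    detect:
      width: 1280
      height: 720
      fps: 5              # a handful of frames per second is plenty for detection
    objects:
      track:
        - person
        - car
        - dog
```

The whole file lives in Git, so every camera tweak is a diff rather than a click-trail through a GUI.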
---
### Choosing the Cameras: TP-Link Vigi C540
A surveillance system is only as good as the lenses feeding it. I needed cameras that could:
1. Deliver a reliable RTSP stream (the lingua franca of NVRs).
2. Offer pan-and-tilt so a single unit can cover a larger field of view.
3. Provide onboard human detection to reduce unnecessary bandwidth.
4. Remain affordable enough to allow for future expansion.
The **TP-Link Vigi C540** checked all those boxes. Purchased during a Black Friday sale for roughly AUD 50 each, the three units I started with have proven surprisingly capable:
- **Pan/Tilt:** Allows a single camera to sweep a driveway or front porch, reducing the number of physical devices needed.
- **Onboard human detection:** The camera can flag a person locally, which helps keep the upstream bandwidth low when the NVR is busy processing other streams.
- **RTSP output:** Perfectly compatible with Frigate's ingest pipeline.
- **No zoom:** A minor limitation, but the field of view is wide enough for my modest property.
The cameras are wired via Ethernet, a decision driven by reliability concerns. Wireless links are prone to interference, especially when the cameras are placed near metal roofs or dense foliage. Running Ethernet required a bit of roof work (more on that later), but the resulting stable connection has paid dividends in stream consistency.
---
### The Host Machine: A Budget Dell Workstation
All the AI magic lives on a modest **Dell OptiPlex 7050 SFF** that I rescued for $150. Its specifications are:
- **CPU:** Intel i5-7500 (4 cores, 3.4 GHz)
- **RAM:** 16GB DDR4
- **Storage:** 256GB SSD for the OS and containers, 2TB HDD for video archives
- **GPU:** Integrated Intel HD Graphics 630 (no dedicated accelerator)
Despite lacking a powerful discrete GPU, the workstation runs Frigate's **OpenVINO**-based SSDLite MobileNetV2 detector comfortably. The model is small enough to execute on the integrated graphics, keeping inference latency low enough for real-time alerts. CPU utilization hovers around 70-80% under typical load, which is high but acceptable for a home lab. The system does run warm, so I've added a couple of case fans to keep temperatures in the safe zone.
The storage layout is intentional: the SSD hosts the OS, Docker engine, and Frigate container, ensuring fast boot and container start times. The 2 TB HDD stores raw video, detection clips, and alert snapshots. With the current retention policy (7 days of full footage, 14 days of detection clips, 30 days of alerts) the drive is comfortably sized, though I plan to monitor usage as I add more cameras.
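Those retention tiers map directly onto Frigate's record and snapshot settings. A sketch of roughly what that looks like; the key names have shifted slightly between Frigate releases, so verify them against the docs for your version:

```yaml
# Indicative retention settings - confirm key names against your Frigate version.
record:
  enabled: true
  retain:
    days: 7          # full continuous footage
    mode: all
  detections:
    retain:
      days: 14       # clips around detection events
  alerts:
    retain:
      days: 30       # clips around alert events
snapshots:
  enabled: true
  retain:
    default: 30      # high-res alert snapshots
```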
---
### Wiring It All Together: Proxmox and Docker LXC
To keep the environment tidy and reproducible, I run the entire stack inside a **Proxmox VE** cluster. A dedicated node hosts a **Docker-enabled LXC container** that isolates the NVR from the rest of the homelab. This approach offers several benefits:
- **Resource isolation:** CPU and memory limits can be applied per container, preventing a runaway process from starving other services.
- **Snapshot-ready:** Proxmox can snapshot the whole container, giving me a quick rollback point if a configuration change breaks something.
- **Portability:** The LXC definition can be exported and re-imported on any other Proxmox host, making disaster recovery straightforward.
Inside the container, Docker orchestrates the Frigate service, an Ollama server (hosting the LLM models), and a lightweight reverse proxy for HTTPS termination. All traffic stays within the local network; the only external connections are occasional model downloads from Hugging Face and the occasional software update.
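Inside the LXC, the services are defined in a single Compose file. A trimmed-down sketch is below; the image tags, device paths, and volume locations are assumptions for an Intel iGPU box like mine, so adjust them for your hardware:

```yaml
# Hypothetical docker-compose.yml - paths and tags are illustrative.
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    restart: unless-stopped
    shm_size: "256mb"                            # Frigate needs generous shared memory
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128  # Intel iGPU for OpenVINO/VAAPI
    volumes:
      - ./frigate/config:/config
      - /mnt/hdd/frigate:/media/frigate          # the 2 TB archive drive
    ports:
      - "8971:8971"

  ollama:
    image: ollama/ollama:latest
    restart: unless-stopped
    volumes:
      - ollama_models:/root/.ollama              # cached model downloads
    ports:
      - "11434:11434"

volumes:
  ollama_models:
```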
---
### From Detection to Context: The Ollama Integration
Frigate's native object detection tells you *what* it sees (e.g., “person”, “car”, “dog”). To turn that into *meaningful* information, I added a **GenAI** layer using **Ollama**, a self-hosted LLM runtime that can serve vision-capable models locally.
The workflow is as follows:
1. **Frigate detects an object** and captures a snapshot of the frame.
2. The snapshot is sent to **Ollama** running the `qwen3-vl-4b` model, which performs **semantic analysis**. The model returns a textual description such as “a white ute with a surfboard on the roof”.
3. Frigate stores this enriched metadata alongside the detection event.
4. When a user searches the Frigate UI for “white ute”, the system can match the description generated by the LLM, dramatically narrowing the result set.
5. For real-time alerts, a smaller model (`qwen3-vl-2b`) is invoked to generate a concise, human-readable sentence that is then forwarded to Home Assistant.
Because the LLM runs locally, there is no latency penalty associated with round-trip internet calls, and privacy is preserved. The only external dependency is the occasional model pull from Hugging Face during the initial setup or when a newer version is released.
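Wiring Frigate to Ollama is only a few lines of config. Something along these lines; the GenAI options reflect recent Frigate releases and may differ in yours, and the model tag is whatever vision model you have pulled into Ollama:

```yaml
# Indicative GenAI settings - check your Frigate release notes for exact options.
genai:
  enabled: true
  provider: ollama
  base_url: http://ollama:11434   # Ollama on the local Docker network
  model: qwen3-vl-4b              # whichever vision model you've pulled
```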
---
### Home Assistant: The Glue That Binds
While Frigate handles video ingestion and object detection, **Home Assistant** provides the automation backbone. By integrating Frigate's MQTT events into Home Assistant, I can:
- **Trigger notifications** via Matrix when a detection meets certain criteria.
- **Run conditional logic** to decide whether an alert is worth sending (e.g., ignore cars on the street but flag a delivery van stopping at the gate).
- **Log events** into a time-series database for later analysis.
- **Expose the enriched metadata** to any other smart-home component that might benefit from it (e.g., turning on porch lights when a person is detected after dark).
The Home Assistant configuration lives in its own YAML file, mirroring the philosophy of “infrastructure as code”. This makes it easy to version-control the automation logic alongside the NVR configuration.
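To give a flavour of that automation logic, here is a sketch of a “person after dark” rule driven by Frigate's MQTT events. The entity IDs and the `notify.matrix_alerts` service name are hypothetical placeholders for however you have named your lights and set up Matrix notifications:

```yaml
# Illustrative Home Assistant automation - entity and service names are placeholders.
automation:
  - alias: "Porch person after dark"
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      # Only fire for person detections...
      - condition: template
        value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
      # ...and only after sunset.
      - condition: sun
        after: sunset
    action:
      - service: light.turn_on
        target:
          entity_id: light.porch
      - service: notify.matrix_alerts
        data:
          message: "Person detected on the porch camera."
```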
---
### Semantic Search: Finding a Needle in a Haystack
One of the most satisfying features of the system is the ability to **search footage using natural language**. Traditional NVRs only let you filter by timestamps or simple motion events. With the GenAI-enhanced metadata, the search bar becomes a powerful query engine:
- Typing “red SUV” returns all clips where the LLM described a vehicle as red and an SUV.
- Searching “dog with a bandana” surfaces the few moments a neighbour's pet decided to wear a fashion accessory.
- Combining terms (“white ute with surfboard”) narrows the results to a single delivery that happened last weekend.
Under the hood, the search is a straightforward text match against the stored descriptions, but the quality of those descriptions hinges on the LLM prompts. Fine-tuning the prompts has been an ongoing task, as the initial attempts produced generic phrases like “a vehicle” that were not useful for filtering.
---
### Managing Storage and Retention
Video data is notoriously storagehungry. To keep the system sustainable, I adopted a tiered retention policy:
| Data Type | Retention | Approx. Size (4 cameras) |
|------------|-----------|--------------------------|
| Full video (raw RTSP) | 7 days | ~1.2 TB |
| Detection clips (30 s each) | 14 days | ~300 GB |
| Alert snapshots (high-res) | 30 days | ~150 GB |
The SSD holds the operating system and container images, while the HDD stores the bulk of the video. When the HDD approaches capacity, a simple cron job rotates out the oldest files, ensuring the system never runs out of space. In practice, the 2TB drive has been more than sufficient for the current camera count, but I have a spare 4TB drive on standby for future expansion.
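The rotation itself is nothing fancier than a nightly `find`. A sketch of the cleanup script follows; the path and retention window are illustrative, so match them to your own layout:

```shell
#!/bin/sh
# Hypothetical nightly cleanup, run from cron - the path is illustrative.
RECORDINGS="/mnt/hdd/frigate/recordings"

# Delete raw recordings older than 7 days, then prune the empty
# per-date folders left behind.
find "$RECORDINGS" -type f -name '*.mp4' -mtime +7 -delete
find "$RECORDINGS" -mindepth 1 -type d -empty -delete
```

A matching crontab entry would be something like `15 3 * * * /usr/local/bin/cctv-cleanup.sh`, timed for the small hours when the disk is otherwise idle.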
---
### Lessons Learned: The Good, the Bad, and the Ugly
#### 1. **Performance Is a Balancing Act**
Running inference on an integrated GPU is feasible, but the CPU load remains high. Adding a modest NVIDIA GTX 1650 would drop CPU usage dramatically and free headroom for additional cameras or more complex models.
#### 2. **Prompt Engineering Is Real Work**
The LLM's output quality is directly tied to the prompt. Early attempts used a single sentence like “Describe the scene,” which resulted in vague answers. Iterating on a multi-step prompt that asks the model to list objects, colors, and actions has produced far richer metadata.
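In Frigate, those prompts live in the config alongside the GenAI settings. A sketch of the richer style of prompt I ended up with; the wording is illustrative, and per-object prompts are a feature of recent Frigate versions, so check your release supports them:

```yaml
# Illustrative prompt configuration - tune the wording for your own scenes.
cameras:
  driveway:
    genai:
      prompt: >-
        List every person, vehicle and animal in this image. For each one,
        state its type, colour and what it is doing, in one short sentence.
      object_prompts:
        car: >-
          Describe the vehicle's colour, body style (sedan, SUV, ute, van)
          and any visible cargo such as a surfboard or ladder.
```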
#### 3. **Notification Fatigue Is Real**
Initially, every detection triggered a push notification, flooding my phone with alerts for passing cars and stray cats. By adding a simple confidence threshold and a “time-of-day” filter in Home Assistant, I reduced noise by 80%.
#### 4. **Network Stability Matters**
Wired Ethernet eliminated the jitter that plagued my early Wi-Fi experiments. The only hiccup was a miswired patch panel that caused occasional packet loss; a quick audit resolved the issue.
#### 5. **Documentation Pays Off**
Because Frigate's configuration is YAML-based, I could version-control the entire stack in a Git repository. When a change broke the FFmpeg pipeline, a `git revert` restored the previous working state in minutes.
---
### Future Enhancements: Where to Go From Here
- **GPU Upgrade:** Adding a dedicated inference accelerator (e.g., an Intel Arc or NVIDIA RTX) to improve detection speed and lower CPU load.
- **Dynamic Prompt Generation:** Using a small LLM to craft context-aware prompts based on the time of day, weather, or known events (e.g., “delivery” vs. “visitor”).
- **Smart Notification Decision Engine:** Training a lightweight classifier that decides whether an alert is worth sending, based on historical user feedback.
- **Edge-Only Model Updates:** Caching Hugging Face models locally and scheduling updates during off-peak hours to eliminate any internet dependency after the initial download.
- **Multi-Camera Correlation:** Linking detections across cameras to track a moving object through the property, enabling a “follow-the-intruder” view.
---
### A Personal Note: The Roof, the Cables, and My Dad
All the technical wizardry would have been for naught if I hadn't managed to get Ethernet cables from the house's main distribution board up to the roof where the cameras sit. I'm decent with Docker, YAML, and LLM prompts, but I'm hopeless when it comes to climbing ladders and threading cables through roof joists.
Enter my dad. He spent an entire Saturday hauling a coil of Cat6, pulling the cables into the roof space while I fumbled with the tools. He didn't care that I'd rather be writing code than wielding a hammer. There were apparently four days of pain afterwards, so please know the help was truly appreciated. The result is a rock-solid wired backbone that keeps the cameras streaming without hiccups.
Thank you, Dad. Your patience, muscle, and willingness to get your hands dirty made this whole system possible.
---
### Bringing It All Together: The Architecture
```mermaid
graph LR
    A[Camera 1] --> D[Frigate NVR]
    J[Camera 2] --> D
    L[Camera 3] --> D
    K[Future Camera] -.-> D
    D --> B[Frigate Object Detections]
    B --> C[Send snapshot to Ollama -qwen3-vl-4b- for semantic search AI enhancement]
    D --> E[Home Assistant -MQTT-]
    E --> F[-MQTT- Object Detection from Frigate]
    F --> G[Copy Image to Home Assistant]
    G --> H[Send image to Ollama -qwen3-vl-2b- for context enhancement]
    H --> I[Send response back via Matrix]
```
---
### Closing Thoughts
Building an AI-enhanced CCTV system from the ground up has been a rewarding blend of hardware tinkering, software orchestration, and a dash of machine-learning experimentation. The result is a **privacy-first, locally owned surveillance platform** that does more than just record: it understands. It can answer natural-language queries, send context-rich alerts, and integrate seamlessly with a broader home-automation ecosystem.
If you're a hobbyist, a small-business owner, or anyone who values data sovereignty, the stack described here offers a solid foundation. Start with a single camera, get comfortable with Frigate's YAML configuration, and gradually layer on the AI components. Remember that the most valuable part of the journey is the learning curve: each tweak teaches you something new about video streaming, inference workloads, and the quirks of your own network.
So, roll up your sleeves, grab a ladder (or enlist a dad), and give your home the eyes it deserves, without handing the footage over to a faceless cloud. The future of home surveillance is local, intelligent, and, most importantly, under your control. Cheers!