Title: Designing and Building an AI Enhanced CCTV System
Date: 2026-02-02 20:00
Modified: 2026-02-03 20:00
Category: Homelab
Tags: proxmox, hardware, self host, homelab
Slug: ai-enhanced-cctv
Authors: Andrew Ridgway
Summary: Home CCTV security has become a bastion of cloud subscription awfulness. This blog describes the work involved in creating your own home-grown, AI-enhanced CCTV system. Unfortunately, what you save in subscription fees you lose in time, but if you value privacy, it's worth it.

### Why Build Your Own AI‑Enhanced CCTV?

When you buy a consumer‑grade security camera, you’re not just paying for the lens and the plastic housing. You’re also paying for a subscription that ships every frame of your backyard to a cloud service you’ll never meet. That data can be used to train models, sold to advertisers, or handed over to authorities on a whim. For many, the convenience outweighs the privacy cost, but for anyone who values control over their own footage, the trade‑off feels unacceptable.

The goal of this project was simple: **keep every byte of video on‑premises, add a layer of artificial intelligence that makes the footage searchable and actionable, and do it all on a budget that wouldn’t break the bank**. Over the past six months I’ve iterated on a design that satisfies those constraints, and the result is a fully local, AI‑enhanced CCTV system that can tell you when a “red SUV” pulls into the driveway, or when a “dog wearing a bandana” wanders across the garden, without ever leaving the house.

---

### The Core Software – Frigate

At the heart of the system sits **Frigate**, an open‑source network video recorder (NVR) that runs in containers and is configured entirely via a single YAML file. The simplicity of the configuration is a breath of fresh air compared with the sprawling JSON or proprietary GUIs of many commercial solutions. A few key reasons Frigate became the obvious choice:

| Feature | Why It Matters |
|---------|----------------|
| **Container‑native** | Deploys cleanly on Docker, Kubernetes, or a lightweight LXC. No host‑level dependencies to wrestle with. |
| **YAML‑driven** | Human‑readable, version‑controlled, and easy to replicate across test environments. |
| **Built‑in object detection** | Supports car, person, animal, and motorbike detection out of the box, with the ability to plug in custom models. |
| **Extensible APIs** | Exposes detection events, snapshots, and stream metadata for downstream automation tools. |
| **GenAI integration** | Recent addition that lets you forward snapshots to a local LLM (via Ollama) for semantic enrichment. |

The documentation is thorough, and the community is active enough that most stumbling blocks are resolved within a few forum posts. Because the entire system is defined in a single YAML file, I can spin up a fresh test instance in minutes, tweak a camera’s FFmpeg options, and see the impact without rebuilding the whole stack.

---

### Choosing the Cameras – TP‑Link Vigi C540

A surveillance system is only as good as the lenses feeding it. I needed cameras that could:

1. Deliver a reliable RTSP stream (the lingua franca of NVRs).
2. Offer pan‑and‑tilt so a single unit can cover a larger field of view.
3. Provide on‑board human detection to reduce unnecessary bandwidth.
4. Remain affordable enough to allow for future expansion.

The **TP‑Link Vigi C540** checked all those boxes. Purchased during a Black Friday sale for roughly AUD 50 each, the three units I started with have proven surprisingly capable:

- **Pan/Tilt** – Allows a single camera to sweep a driveway or front porch, reducing the number of physical devices needed.
- **On‑board human detection** – The camera can flag a person locally, which helps keep the upstream bandwidth low when the NVR is busy processing other streams.
- **RTSP output** – Perfectly compatible with Frigate’s ingest pipeline.
- **No zoom** – A minor limitation, but the field of view is wide enough for my modest property.

The cameras are wired via Ethernet, a decision driven by reliability concerns. Wireless links are prone to interference, especially when the cameras are placed near metal roofs or dense foliage. Running Ethernet required a bit of roof work (more on that later), but the resulting stable connection has paid dividends in stream consistency.

---

### The Host Machine – A Budget Dell Workstation

All the AI magic lives on a modest **Dell OptiPlex 7050 SFF** that I rescued for $150. Its specifications are:

- **CPU:** Intel i5‑7500 (4 cores, 3.4 GHz)
- **RAM:** 16 GB DDR4
- **Storage:** 256 GB SSD for the OS and containers, 2 TB HDD for video archives
- **GPU:** Integrated Intel HD Graphics 630 (no dedicated accelerator)

Despite lacking a powerful discrete GPU, the workstation runs Frigate’s **OpenVINO**‑based SSD‑Lite MobileNet V2 detector comfortably. The model is small enough to execute on the integrated graphics, keeping inference latency low enough for real‑time alerts. CPU utilization hovers around 70‑80 % under typical load, which is high but acceptable for a home lab. The system does run warm, so I’ve added a couple of case fans to keep temperatures in the safe zone.

The storage layout is intentional: the SSD hosts the OS, Docker engine, and Frigate container, ensuring fast boot and container start times. The 2 TB HDD stores raw video, detection clips, and alert snapshots. With the current retention policy (7 days of full footage, 14 days of detection clips, 30 days of alerts) the drive is comfortably sized, though I plan to monitor usage as I add more cameras.

---

### Wiring It All Together – Proxmox and Docker LXC

To keep the environment tidy and reproducible, I run the entire stack inside a **Proxmox VE** cluster. A dedicated node hosts a **Docker‑enabled LXC container** that isolates the NVR from the rest of the homelab. This approach offers several benefits:

- **Resource isolation** – CPU and memory limits can be applied per container, preventing a runaway process from starving other services.
- **Snapshot‑ready** – Proxmox can snapshot the whole container, giving me a quick rollback point if a configuration change breaks something.
- **Portability** – The LXC definition can be exported and re‑imported on any other Proxmox host, making disaster recovery straightforward.

Inside the container, Docker orchestrates the Frigate service, an Ollama server (hosting the LLM models), and a lightweight reverse proxy for HTTPS termination. All traffic stays within the local network; the only external connections are occasional model downloads from Hugging Face and the occasional software update.

---

### From Detection to Context – The Ollama Integration

Frigate’s native object detection tells you *what* it sees (e.g., “person”, “car”, “dog”). To turn that into *meaningful* information, I added a **GenAI** layer using **Ollama**, a self‑hosted LLM runtime that can serve vision‑capable models locally.

The workflow is as follows:

1. **Frigate detects an object** and captures a snapshot of the frame.
2. The snapshot is sent to **Ollama** running the `qwen3-vl-4b` model, which performs **semantic analysis**. The model returns a textual description such as “a white ute with a surfboard on the roof”.
3. Frigate stores this enriched metadata alongside the detection event.
4. When a user searches the Frigate UI for “white ute”, the system can match the description generated by the LLM, dramatically narrowing the result set.
5. For real‑time alerts, a smaller model (`qwen3-vl-2b`) is invoked to generate a concise, human‑readable sentence that is then forwarded to Home Assistant.

Because the LLM runs locally, there is no latency penalty associated with round‑trip internet calls, and privacy is preserved. The only external dependency is the occasional model pull from Hugging Face during the initial setup or when a newer version is released.
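
Frigate makes this call itself once the GenAI integration is configured, so nothing below is needed to run the system. As a rough illustration of what step 2 looks like on the wire, here is a minimal sketch of a single request to Ollama's `/api/generate` endpoint. The URL (Ollama's default local port), the model tag `qwen3-vl:4b`, and the function names are assumptions for illustration, not values taken from my setup.

```python
import base64
import json
import urllib.request

# Assumed values: Ollama's default local port and an illustrative model tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3-vl:4b"


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for one non-streaming vision request."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are passed as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
    }


def describe_snapshot(image_bytes: bytes, prompt: str, url: str = OLLAMA_URL) -> str:
    """POST a snapshot to Ollama and return the generated description."""
    body = json.dumps(build_payload(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_snapshot(open("snapshot.jpg", "rb").read(), "Describe the vehicle.")` against a running Ollama instance would return the model's text. The snapshot file name here is hypothetical.
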

---

### Home Assistant – The Glue That Binds

While Frigate handles video ingestion and object detection, **Home Assistant** provides the automation backbone. By subscribing Home Assistant to the detection events Frigate publishes over MQTT, I can:

- **Trigger notifications** via Matrix when a detection meets certain criteria.
- **Run conditional logic** to decide whether an alert is worth sending (e.g., ignore cars on the street but flag a delivery van stopping at the gate).
- **Log events** into a time‑series database for later analysis.
- **Expose the enriched metadata** to any other smart‑home component that might benefit from it (e.g., turning on porch lights when a person is detected after dark).

The Home Assistant configuration lives in its own YAML file, mirroring the philosophy of “infrastructure as code”. This makes it easy to version‑control the automation logic alongside the NVR configuration.
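
The real automations live in Home Assistant's YAML, but the shape of the data they react to is easier to see in a few lines of Python. This is a sketch only: it assumes the event message is JSON with a `type` field and an `after` object carrying `id`, `camera`, `label`, and `top_score`, which is how I understand Frigate's event payload. Check the field names against your Frigate version. The `Detection` class and `parse_event` function are hypothetical helpers.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    event_id: str
    camera: str
    label: str
    score: float


def parse_event(payload: str) -> Optional[Detection]:
    """Return a Detection for finished events, or None for anything else."""
    message = json.loads(payload)
    # Only act on "end" messages so one object does not trigger
    # a separate alert for every "new" and "update" message.
    if message.get("type") != "end":
        return None
    after = message.get("after") or {}
    return Detection(
        event_id=after["id"],
        camera=after["camera"],
        label=after["label"],
        score=float(after.get("top_score") or 0.0),
    )
```

Waiting for the end of an event trades a few seconds of latency for a single, settled score per object. An alerting path that needs speed would act on the first message instead.
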

---

### Semantic Search – Finding a Needle in a Haystack

One of the most satisfying features of the system is the ability to **search footage using natural language**. Traditional NVRs only let you filter by timestamps or simple motion events. With the GenAI‑enhanced metadata, the search bar becomes a powerful query engine:

- Typing “red SUV” returns all clips where the LLM described a vehicle as red and an SUV.
- Searching “dog with a bandana” surfaces the few moments a neighbour’s pet decided to wear a fashion accessory.
- Combining terms (“white ute with surfboard”) narrows the results to a single delivery that happened last weekend.

Under the hood, the search is a straightforward text match against the stored descriptions, but the quality of those descriptions hinges on the LLM prompts. Refining the prompts has been an ongoing task, as the initial attempts produced generic phrases like “a vehicle” that were not useful for filtering.
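
To make the "straightforward text match" idea concrete, here is a minimal sketch of matching every query word against stored descriptions. It is not Frigate's actual search code. The function names and the sample event IDs are made up for illustration.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Lower-case a string and split it into a set of whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(descriptions: Dict[str, str], query: str) -> List[str]:
    """Return IDs of events whose description contains every query word."""
    terms = words(query)
    return [
        event_id
        for event_id, text in descriptions.items()
        if terms <= words(text)  # subset test: all terms must be present
    ]


# Hypothetical stored descriptions keyed by event ID.
events = {
    "evt-1": "A white ute with a surfboard on the roof",
    "evt-2": "A red SUV pulling into the driveway",
    "evt-3": "A vehicle",
}
print(search(events, "white ute with surfboard"))
print(search(events, "red SUV"))
```

The third entry shows why prompt quality matters: a description like "A vehicle" can never match a query for a colour or a body style, however good the search is.
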

---

### Managing Storage and Retention

Video data is notoriously storage‑hungry. To keep the system sustainable, I adopted a tiered retention policy:

| Data Type | Retention | Approx. Size (4 cameras) |
|------------|-----------|--------------------------|
| Full video (raw RTSP) | 7 days | ~1.2 TB |
| Detection clips (30 s each) | 14 days | ~300 GB |
| Alert snapshots (high‑res) | 30 days | ~150 GB |

The SSD holds the operating system and container images, while the HDD stores the bulk of the video. When the HDD approaches capacity, a simple cron job rotates out the oldest files, ensuring the system never runs out of space. In practice, the 2 TB drive has been more than sufficient for the current camera count, but I have a spare 4 TB drive on standby for future expansion.
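
The figures in the table can be sanity-checked with a little arithmetic. The sketch below works backwards from the table to the average per-camera bitrate it implies, then forwards to see what another camera would cost. It assumes decimal units (1 TB = 10^12 bytes) and continuous recording on every camera. The function names are my own.

```python
SECONDS_PER_DAY = 86_400


def implied_bitrate_mbps(total_bytes: float, cameras: int, days: int) -> float:
    """Average per-camera bitrate (Mbit/s) implied by a storage figure."""
    return total_bytes * 8 / (cameras * days * SECONDS_PER_DAY) / 1e6


def continuous_storage_bytes(bitrate_mbps: float, cameras: int, days: int) -> float:
    """Bytes needed to keep `days` of continuous footage from every camera."""
    return bitrate_mbps * 1e6 / 8 * cameras * days * SECONDS_PER_DAY


# Figures from the retention table above, in GB.
tiers_gb = {"full video": 1200, "detection clips": 300, "alert snapshots": 150}
total_gb = sum(tiers_gb.values())

bitrate = implied_bitrate_mbps(1.2e12, cameras=4, days=7)
five_cameras_tb = continuous_storage_bytes(bitrate, cameras=5, days=7) / 1e12

print(f"Total across tiers: {total_gb} GB of a 2000 GB drive")
print(f"Implied bitrate: {bitrate:.2f} Mbit/s per camera")
print(f"Continuous footage with five cameras: {five_cameras_tb:.2f} TB")
```

Under these assumptions the three tiers add up to 1650 GB, and a fifth camera at the same bitrate would push continuous footage alone to about 1.5 TB, which is why the spare 4 TB drive is worth having.
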

---

### Lessons Learned – The Good, the Bad, and the Ugly

#### 1. **Performance Is a Balancing Act**

Running inference on an integrated GPU is feasible, but the CPU load remains high. Adding a modest NVIDIA GTX 1650 would drop CPU usage dramatically and free headroom for additional cameras or more complex models.

#### 2. **Prompt Engineering Is Real Work**

The LLM’s output quality is directly tied to the prompt. Early attempts used a single sentence like “Describe the scene,” which resulted in vague answers. Iterating on a multi‑step prompt that asks the model to list objects, colors, and actions has produced far richer metadata.
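
For a sense of what "multi-step" means here, this is a sketch of a prompt builder that follows the objects, colors, and actions structure. The wording is illustrative and is not the prompt I actually run. `build_prompt` and its parameters are hypothetical.

```python
def build_prompt(label: str, camera: str) -> str:
    """Assemble a stepwise prompt for one detection snapshot."""
    steps = [
        f"A security camera named '{camera}' detected a {label}.",
        "1. List every distinct object you can see.",
        "2. For each object, give its main color and any distinguishing features.",
        "3. Describe what the main subject is doing.",
        "Finish with one sentence that combines these details.",
        "Do not guess at anything you cannot see.",
    ]
    return "\n".join(steps)


print(build_prompt("car", "driveway"))
```

Passing the detected label and camera name into the prompt gives the model a starting point, so it spends its words on specifics rather than restating that there is a vehicle in the frame.
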

#### 3. **Notification Fatigue Is Real**

Initially, every detection triggered a push notification, flooding my phone with alerts for passing cars and stray cats. By adding a simple confidence threshold and a “time‑of‑day” filter in Home Assistant, I reduced noise by 80 %.
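
The filter itself is a Home Assistant automation, but the decision logic is small enough to sketch in Python. The threshold of 0.7, the daytime window, and the rule that people always alert while other labels only alert at night are all assumed values for illustration, not my actual settings.

```python
from datetime import time
from typing import FrozenSet


def is_daytime(at: time, day_start: time = time(7, 0), day_end: time = time(21, 0)) -> bool:
    """True when `at` falls inside the assumed daytime window."""
    return day_start <= at < day_end


def should_notify(
    label: str,
    score: float,
    at: time,
    min_score: float = 0.7,
    always: FrozenSet[str] = frozenset({"person"}),
) -> bool:
    """Decide whether a detection is worth a push notification."""
    if score < min_score:
        return False  # confidence threshold: drop weak detections
    if label in always:
        return True  # these labels alert at any hour
    return not is_daytime(at)  # everything else only alerts at night


print(should_notify("car", 0.9, time(12, 0)))
print(should_notify("car", 0.9, time(23, 30)))
```

Two cheap checks like these remove most of the noise before any LLM is involved, which also keeps the smaller alert model from being invoked for events nobody will read.
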

#### 4. **Network Stability Matters**

Wired Ethernet eliminated the jitter that plagued my early Wi‑Fi experiments. The only hiccup was a mis‑wired patch panel that caused occasional packet loss; a quick audit resolved the issue.

#### 5. **Documentation Pays Off**

Because Frigate’s configuration is YAML‑based, I could version‑control the entire stack in a Git repository. When a change broke the FFmpeg pipeline, a `git revert` restored the previous working state in minutes.

---

### Future Enhancements – Where to Go From Here

- **GPU Upgrade** – Adding a dedicated inference accelerator (e.g., an Intel Arc or NVIDIA RTX) to improve detection speed and lower CPU load.
- **Dynamic Prompt Generation** – Using a small LLM to craft context‑aware prompts based on the time of day, weather, or known events (e.g., “delivery” vs. “visitor”).
- **Smart Notification Decision Engine** – Training a lightweight classifier that decides whether an alert is worth sending, based on historical user feedback.
- **Edge‑Only Model Updates** – Caching Hugging Face models locally and scheduling updates during off‑peak hours to eliminate any internet dependency after the initial download.
- **Multi‑Camera Correlation** – Linking detections across cameras to track a moving object through the property, enabling a “follow‑the‑intruder” view.

---

### A Personal Note – The Roof, the Cables, and My Dad

All the technical wizardry would have been for naught if I hadn’t managed to get Ethernet cables from the house’s main distribution board up to the roof where the cameras sit. I’m decent with Docker, YAML, and LLM prompts, but I’m hopeless when it comes to climbing ladders and threading cables through roof joists.

Enter my dad. He spent an entire Saturday hauling a coil of Cat‑6, pulling the cables into the roof space while I fumbled with the tools. He didn’t care that I’d rather be writing code than wielding a hammer. There were apparently four days of pain afterwards, so please know the help was truly appreciated. The result is a rock‑solid wired backbone that keeps the cameras streaming without hiccups.

Thank you, Dad. Your patience, muscle, and willingness to get your hands dirty made this whole system possible.

---

### Bringing It All Together – The Architecture

```mermaid
graph LR
    L[Camera]
    A[Camera] --> D[Frigate NVR]
    D --> B[Frigate Object Detections]
    B --> C[Send snapshot to Ollama -qwen3-vl-4b- for semantic search AI enhancement]
    D --> E[Home Assistant -MQTT- ]
    E --> F[ -MQTT- Object Detection from Frigate]
    F --> G[Copy Image to Home Assistant]
    G --> H[Send image to Ollama -qwen3-vl-2b- for context enhancement]
    H --> I[Send response back via Matrix]
    J[Camera]
    K[Future Camera]
```

---

### Closing Thoughts

Building an AI‑enhanced CCTV system from the ground up has been a rewarding blend of hardware tinkering, software orchestration, and a dash of machine‑learning experimentation. The result is a **privacy‑first, locally owned surveillance platform** that does more than just record: it understands. It can answer natural‑language queries, send context‑rich alerts, and integrate seamlessly with a broader home‑automation ecosystem.

If you’re a hobbyist, a small‑business owner, or anyone who values data sovereignty, the stack described here offers a solid foundation. Start with a single camera, get comfortable with Frigate’s YAML configuration, and gradually layer on the AI components. Remember that the most valuable part of the journey is the learning curve: each tweak teaches you something new about video streaming, inference workloads, and the quirks of your own network.

So, roll up your sleeves, grab a ladder (or enlist a dad), and give your home the eyes it deserves, without handing the footage over to a faceless cloud. The future of home surveillance is local, intelligent, and, most importantly, under your control. Cheers!