Introduce private AI CCTV system

Blog Creator, 2026-02-03
# Designing and Building an AI-Enhanced CCTV System at Home

### Introduction
Over the past half-year I've been tinkering with a problem that many of us in the homelab community face every day: how to turn a handful of cheap IP cameras into a truly useful security system without surrendering our footage to a cloud service. The answer, after a lot of trial and error, is a stack that lives entirely on-premises, talks to itself, and even adds a sprinkle of generative AI to make the alerts intelligible. In plain English, the system records video, spots objects, asks a local vision-language model to describe what it sees, and then pushes a nicely worded notification to my phone. The result is a CCTV solution that is private, affordable, and surprisingly capable.
### Why Local Control Matters
The market is flooded with “smart” cameras that promise motion alerts, facial recognition, and cloud-based video archives. Those devices are convenient, but they also hand over a valuable slice of your privacy to a third party. A compromised account or a misconfigured API key can expose weeks of footage to the world. For a home that already has a smart-home hub, a voice assistant, and a handful of IoT devices, adding another internet-facing endpoint feels like inviting a stranger into the living room.
By keeping everything inside the home network we gain three things:
1. **Data sovereignty**: the video never leaves the LAN, so only the people you trust can view it.
2. **Predictable costs**: no monthly subscription for storage or AI inference; the only expense is the hardware you already own.
3. **Full customisation**: we can decide exactly how alerts are generated, what language they use, and when they are sent.
The only downside is the extra effort required to stitch the pieces together, but that's where the fun begins.
### Choosing the Cameras
I needed cameras that were cheap enough to experiment with, yet offered enough features to make the system worthwhile. A Black Friday deal on Amazon AU landed me three (soon to be four) TP-Link Vigi C540 units for AUD$50 each. For the price they are surprisingly capable:
* **Pan/tilt**: a single unit covers a wide field of view without additional hardware.
* **Onboard human detection**: the camera can flag a person before any external processing, reducing false positives.
* **RTSP support**: the standard streaming protocol that Frigate expects, meaning we are not locked into a proprietary API.
The cameras lack optical zoom, but the 1080p stream is more than enough for a backyard or front-gate view. Their built-in motion logic also gives us a first line of defence against unnecessary processing on the NVR.
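To make the RTSP point concrete, a camera like this ends up as a small stanza in Frigate's YAML. The sketch below follows Frigate's documented camera schema; the camera name, address, and credentials are placeholders, not my actual config:

```yaml
cameras:
  driveway:                # hypothetical camera name
    ffmpeg:
      inputs:
        # RTSP main stream from the Vigi C540; host and credentials are placeholders
        - path: rtsp://viewer:changeme@192.168.1.50:554/stream1
          roles:
            - detect
            - record
    detect:
      width: 1920          # the C540's 1080p stream
      height: 1080
      fps: 5               # matches the frame-rate budget used later for storage
```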
### The NVR Core: Frigate
At the heart of the system sits **Frigate**, an open-source network video recorder that runs as a Docker container. What makes Frigate a perfect fit for a homelab project?
* **Container-native**: it can be deployed on any host that supports Docker, which aligns nicely with my Proxmox LXC setup.
* **YAML-driven configuration**: a single, version-controlled file defines cameras, streams, detection models, and storage policies.
* **Built-in object detection**: Frigate can invoke a variety of detectors (OpenVINO, TensorRT, Coral) without external orchestration.
* **Extensible API**: the platform exposes webhooks and REST endpoints that other services (Home Assistant, Ollama) can consume.
Spinning up a new Frigate instance is as simple as pulling the image, mounting the configuration file, and pointing it at the RTSP URLs of the cameras. The UI instantly shows live feeds, detection boxes, and a timeline of events, which made the early testing phase feel like watching a sci-fi control room.
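A minimal Compose file for this kind of deployment might look like the sketch below. The image name follows the Frigate documentation; the volume paths, shared-memory size, and port mappings are illustrative and will vary with your setup:

```yaml
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    container_name: frigate
    restart: unless-stopped
    shm_size: "256mb"        # frame buffers; scale with camera count and resolution
    volumes:
      - ./config:/config               # the YAML configuration lives here
      - ./storage:/media/frigate       # recordings land on the video-archive disk
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "8971:8971"   # web UI
      - "8554:8554"   # RTSP restreaming
```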
### Getting the Detection Right
Frigate ships with a handful of pre-configured detectors, but I wanted a model that could run on the modest hardware I had on hand: a Dell OptiPlex 7060 SFF rescued for $150, equipped with an Intel i5-7500 and 16 GB of RAM. After a few experiments I settled on the **OpenVINO** backend with the **SSDLite MobileNetV2** model from the Open Model Zoo. The model is small enough to run on CPU while still delivering decent accuracy for cars, people, and animals.
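The detector and model stanza for this combination looks roughly like the following. The paths mirror where Frigate's documentation says the bundled Open Model Zoo files live inside the container; treat it as a sketch rather than a drop-in config:

```yaml
detectors:
  ov:
    type: openvino
    device: CPU            # no GPU offload yet; the i5-7500 handles inference

model:
  width: 300               # SSDLite MobileNetV2 expects 300x300 input
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  path: /openvino-model/ssdlite_mobilenet_v2.xml
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt
```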
A few practical notes from the deployment:
* **Model download**: Frigate pulls the model from Hugging Face at first start, so the host needs occasional internet access for updates.
* **CPU utilisation**: the i5 hovers around 75% load under continuous detection, which is acceptable for now but leaves little headroom for additional workloads.
* **Thermal considerations**: the workstation runs warm; I've added a low-profile fan to keep temperatures in check.
The detector provides bounding boxes and confidence scores for each frame that triggers an event. Those detections are the raw material that the generative AI layer later enriches.
### Adding Generative AI with Ollama
Frigate's newest integration allows snapshots of detections to be sent to an external **GenAI** service. I run **Ollama** locally, which hosts a variety of large language models, including vision-language variants such as **qwen3-vl-4b** and **qwen3-vl-2b**. The workflow is straightforward:
1. An object (e.g., a car) is detected by Frigate.
2. Frigate captures a still image of the frame and forwards it to Ollama via the GenAI webhook.
3. The vision-language model analyses the image and returns a concise textual description (“Red SUV parked in the driveway”).
4. The description is stored as metadata alongside the detection event.
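Wired into Frigate's YAML, the steps above come down to a small GenAI stanza. The fields follow Frigate's documented Ollama provider options; the address is a placeholder for my local Ollama instance, and the model tag assumes the model has already been pulled into Ollama under that name:

```yaml
genai:
  enabled: true
  provider: ollama
  base_url: http://192.168.1.10:11434   # placeholder address of the local Ollama server
  model: qwen3-vl-4b                    # vision-language model tag as pulled into Ollama
```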
This extra step turns a generic “car” tag into a searchable phrase. In the Frigate UI I can now type “white ute” or “red scooter” and instantly retrieve the matching clips. The semantic search capability is the most exciting part of the project because it opens the door to natural-language queries over a video archive, a feature that commercial cloud services charge a premium for.
### Home Assistant: The Glue That Makes It All Useful
Frigate already gives us detection and enriched metadata, but to turn that into actionable alerts we need an automation engine. **Home Assistant** fills that role perfectly:
* **Event ingestion**: Frigate pushes detection events to Home Assistant via its native integration.
* **AI-driven notifications**: Home Assistant receives the snapshot, forwards it to Ollama for a second pass (using a smaller model for speed), and then formats a friendly message.
* **Matrix delivery**: the final message, complete with the image and AI-generated caption, is sent to my Matrix client on my phone.
The result is a notification that reads something like: “A white ute with a surfboard in the back has just entered the driveway.” Compared with a raw “motion detected” alert, this is a massive usability upgrade. Moreover, Home Assistant can trigger other automations based on the enriched data, such as turning on porch lights when a car arrives after sunset, or silencing the alarm when the mail carrier is recognised.
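A trimmed-down Home Assistant automation in this spirit might look like the following. The MQTT topic matches Frigate's default event topic, but the notifier name and templating are illustrative; the real pipeline also round-trips the snapshot through Ollama before the message goes out:

```yaml
automation:
  - alias: "Frigate car alert via Matrix"
    trigger:
      - platform: mqtt
        topic: frigate/events          # Frigate publishes detection events here by default
    condition:
      # only fire when the event concerns a car
      - condition: template
        value_template: "{{ trigger.payload_json['after']['label'] == 'car' }}"
    action:
      - service: notify.matrix_phone   # hypothetical Matrix notifier name
        data:
          message: "Vehicle detected in the driveway"
```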
### Storage Strategy
Video storage is often the Achilles heel of a DIY CCTV system. My approach balances capacity, retention, and cost:
| Storage tier | Device | Purpose | Retention |
|--------------|--------|---------|-----------|
| OS & containers | 256 GB SSD | Frigate, Home Assistant, Ollama | Indefinite |
| Video archive | 2 TB HDD | Full-resolution recordings, detection clips, alert snapshots | 7 days (full), 14 days (detections), 30 days (alerts) |
Frigate automatically prunes old files based on the policies defined in the YAML configuration, so the drive never fills up unexpectedly. With four 1080p streams at 5 fps, the 2 TB budget comfortably covers the stated retention windows.
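The pruning policy from the table translates into a record stanza along these lines. The field names follow recent Frigate releases, but the exact schema has shifted between versions, so consider this a sketch:

```yaml
record:
  enabled: true
  retain:
    days: 7          # full-resolution footage
    mode: all
  detections:
    retain:
      days: 14       # clips where an object was detected
  alerts:
    retain:
      days: 30       # clips promoted to alerts
```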
### Deploying on Proxmox
All services run inside a dedicated LXC container on a Proxmox node that I carved out specifically for this project. The container hosts Docker, the Frigate image, Home Assistant, and Ollama. Using LXC gives me a lightweight isolation layer without the overhead of a full VM, and Proxmox's snapshot feature lets me roll back the entire stack if a configuration change goes sideways.
The network layout is simple: each camera connects via wired Ethernet to the same LAN as the Proxmox host, ensuring low latency and no packet loss that could otherwise cause frame drops. The container's Docker bridge is bound to the host's network stack, so Frigate sees the cameras as if it were running directly on the workstation.
### Challenges and Whats Next
The system works, but there are a few rough edges that I'm actively polishing:
* **Prompt engineering**: the text prompts sent to the vision-language models are still generic, leading to occasional misclassifications (e.g., a cat being described as a “small vehicle”). Fine-tuning the prompts and possibly adding a few few-shot examples should improve consistency.
* **Notification fatigue**: right now every person or car that passes the driveway generates a push notification. I plan to add a lightweight decision model that evaluates the confidence score, time of day, and historical patterns before deciding whether to alert.
* **Hardware scaling**: the i5-7500 is holding its own, but as I add more cameras or switch to a larger model (e.g., a 4-billion-parameter vision transformer), I'll need a more powerful CPU or a dedicated GPU/Intel Arc accelerator.
* **Public configuration**: the deployment repository still contains secrets. Once scrubbed, I intend to publish the full YAML and Docker Compose files so others can replicate the stack with a single `git clone`.
### A Personal Note: The Unsung Hero
No amount of software wizardry would have gotten the Ethernet cables into the roof without a bit of elbow grease. My dad, armed with a ladder, a drill, and a healthy dose of patience, ran the cabling that links the cameras to the network. He spent an entire Saturday climbing into the attic, pulling cables through joists, and making sure each connection was solid. I was mostly handing him tools and keeping the coffee warm. Without his help the whole project would have remained a pipe dream. So, Dad, thank you for the hard yards, the guidance, and the occasional dad joke that kept the mood light while we were up on the roof. This system is as much yours as it is mine.
### Closing Thoughts
Building an AI-enhanced CCTV system at home is no longer a pipe dream reserved for large enterprises. With a modest budget, a handful of inexpensive cameras, and a stack of open-source tools (Frigate, OpenVINO, Ollama, Home Assistant, and a bit of Proxmox magic), you can achieve a private, searchable, and context-rich surveillance solution. The journey taught me a lot about video streaming, container orchestration, and the quirks of vision-language models, but the biggest takeaway is the empowerment that comes from owning your data.
If you're curious about the nitty-gritty (YAML snippets, Docker Compose files, or the exact model versions), I'm happy to share once the repository is fully sanitized. Until then, enjoy the peace of mind that comes from knowing your backyard is being watched by a system that not only sees but also understands.
```mermaid
graph TD
    Camera1[Camera] --> Frigate[Frigate NVR]
    Camera2[Camera] --> Frigate
    Camera3[Camera] --> Frigate
    Camera4[Future Camera] --> Frigate
    Frigate -->|Object detections| SemanticAI["Send snapshot to Ollama (qwen3-vl-4b) for semantic search AI enhancement"]
    Frigate --> HomeAssistant[Home Assistant]
    HomeAssistant -->|Object detection from Frigate| CopyImage[Copy image to Home Assistant]
    CopyImage --> ContextAI["Send image to Ollama (qwen3-vl-2b) for context enhancement"]
    ContextAI --> Matrix[Send response back via Matrix]
```