# Designing and Building an AI Enhanced CCTV System at Home

Over the past six months I’ve been tinkering with a problem that most homeowners shrug off as “just get a camera and be done with it.” The reality is that a plain‑old CCTV feed gives you a wall of motion‑triggered video that you have to trawl through manually, and the moment you start looking for anything useful you quickly realise the data is as opaque as a foggy morning on the Nullarbor. My goal was simple on paper: a locally‑hosted surveillance system that keeps every frame under my own roof, yet still offers the kind of intelligent search and notification you’d expect from a cloud‑based service. The result is an AI‑enhanced CCTV stack built around the open‑source NVR **Frigate**, a handful of budget‑friendly TP‑Link Vigi C540 cameras, and a modest homelab that runs everything in Docker containers on a Proxmox LXC node.

---
## Why “Local‑First” Matters

When you hand over video streams to a commercial provider you’re essentially giving a stranger a front‑row seat to all the comings and goings at your doorstep. The trade‑off is usually a subscription fee and the promise of “smart” alerts. In practice those alerts are either too noisy (the neighbour’s cat triggers a “person detected” every night) or too vague (a generic “motion detected” that forces you to open the app and scrub through the footage yourself). Moreover, the data lives somewhere you can’t audit, and any breach could expose a detailed visual diary of your life.

A local‑first approach eliminates the subscription, guarantees that the raw footage never leaves your network, and gives you full control over how the data is processed. The only thing you have to trust is the hardware you own and the open‑source software you run. That level of ownership is what makes the project worth the effort.

---
## The Core: Frigate NVR

Frigate is a container‑native Network Video Recorder that does far more than simply write video to disk. It watches each RTSP stream, runs object detection on every frame (or on a configurable subset), and exposes a clean REST API plus a web UI for live view, playback, and search. The entire configuration lives in a single YAML file, which makes it trivial to version‑control, replicate, or spin up a fresh instance for testing.
Key points that made Frigate the obvious choice:

| Feature | Why it matters |
|---------|----------------|
| **Docker‑ready** | Deploys in seconds on any machine that can run containers. |
| **YAML‑driven** | Human‑readable, repeatable, and easy to tweak via CI pipelines. |
| **Built‑in object detectors** | Supports OpenVINO, TensorRT, Coral, and plain CPU models out of the box. |
| **GenAI integration** | Direct hooks to Ollama for custom metadata generation. |
| **Home Assistant integration** | Exposes events that can be consumed by any automation platform. |

Because the configuration is declarative, I could spin up a fresh test environment, point it at a different camera, tweak the FFmpeg pipeline, and have the whole stack ready in under five minutes. That rapid iteration cycle was essential while I was still learning the quirks of each camera and the best way to balance CPU load against detection accuracy. The config still needs a bit of tidying up, but my aim is to eventually share the whole setup publicly so others can benefit from it.
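
To make this concrete, here’s a minimal sketch of the kind of Compose file that runs Frigate. It follows the structure of the official install docs, but the paths and sizes are illustrative rather than my exact production values:

```yaml
# docker-compose.yml – minimal Frigate service (illustrative paths and sizes)
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    restart: unless-stopped
    shm_size: "256mb"              # shared memory for decoded frames; scale with camera count
    volumes:
      - ./config:/config           # the single YAML config lives here
      - ./media:/media/frigate     # recordings and snapshots (the big HDD)
      - type: tmpfs                # keep the segment cache in RAM, off the disks
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "5000:5000"                # web UI and REST API
      - "8554:8554"                # RTSP restream
```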
---

## The Eyes: TP‑Link Vigi C540 Cameras

I chose three (soon to be four) TP‑Link Vigi C540 units because they hit the sweet spot between price and capability. At AUD 50 each they were a steal during a Black Friday sale, and they come with pan‑and‑tilt motors, on‑board human detection, and a follow‑mode that keeps a moving subject in frame. The cameras lack optical zoom, but the field of view is wide enough for a typical suburban front yard, and the built‑in motion detection reduces the amount of data that needs to be processed by Frigate.

| Spec | Reason for selection |
|------|-----------------------|
| **Pan/Tilt** | Allows a single unit to cover multiple angles without extra hardware. |
| **Wired Ethernet** | Guarantees a stable feed and eliminates Wi‑Fi interference. |
| **On‑board human detection** | Provides a first line of filtering before Frigate even sees the frame. |
| **Affordability** | Keeps the overall project budget modest, leaving room for a GPU upgrade later. |

The cameras are mounted on the roof, with Ethernet cables run through the attic. I’ll come back to the physical installation later, but the result is a reliable, low‑latency video source that feeds directly into the Frigate container.
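
Each camera gets a short entry in Frigate’s config. The sketch below shows the shape of one: the address, credentials, and stream paths are placeholders, and the exact RTSP URL varies between Vigi firmware versions, so check what your camera actually exposes:

```yaml
# frigate.yml – one camera entry (placeholder address, credentials, and stream paths)
cameras:
  front_yard:
    ffmpeg:
      inputs:
        - path: rtsp://viewer:CHANGE_ME@192.168.1.50:554/stream1  # full-res main stream
          roles:
            - record
        - path: rtsp://viewer:CHANGE_ME@192.168.1.50:554/stream2  # low-res substream
          roles:
            - detect
    detect:
      width: 640
      height: 360
      fps: 5                     # detection frame rate; recording keeps the full stream
    objects:
      track:
        - person
        - car
```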
---

## The Brain: On‑Device Object Detection

Frigate can offload inference to a GPU, a Coral Edge TPU, or run it on the CPU if you’re willing to accept slower detection rates. My node is a second‑hand Dell workstation built around an Intel Core i5‑7500 – a surprisingly capable machine I picked up for just $150 – running Frigate in an LXC container on Proxmox. The “small” OpenVINO SSDLite model handles detection entirely on‑device, which keeps latency low and avoids any reliance on external services, although the box does run a little warm, averaging around 75% CPU across the three streams.
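
In the config, the detector block for this setup is only a few lines. This mirrors the OpenVINO example in Frigate’s docs for the bundled SSDLite MobileNet v2 model, but model paths and options shift between releases, so verify against the documentation for your version:

```yaml
# frigate.yml – OpenVINO detector with the bundled SSDLite MobileNet v2 model
detectors:
  ov:
    type: openvino
    device: CPU                  # an Intel iGPU can be targeted with "GPU"

model:
  width: 300
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  path: /openvino-model/ssdlite_mobilenet_v2.xml
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt
```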
The detection model ships inside the container, but the embedding models behind Frigate’s semantic search are downloaded from the internet on first start, so the host needs occasional outbound connectivity to fetch and refresh weights. That is a small price to pay for keeping the pipeline up to date without manual intervention.

---

## Adding Smarts: Ollama and GenAI

Frigate’s native object detection tells you *what* it sees – a person, a car, a dog – but it can’t answer “what colour is the car?” or “is that a delivery driver or a neighbour?” That’s where Ollama comes in. Ollama is a locally hosted LLM/vision service that can run multimodal models such as **qwen3‑vl‑4b** and **qwen3‑vl‑2b**. By sending a snapshot to Ollama with a carefully crafted prompt, the model returns a natural‑language description that adds semantic depth to the raw detection.
Two distinct flows exist:

1. **Semantic Search Enrichment** – When a snapshot is sent to the 4‑billion‑parameter model, the response is stored as metadata alongside the detection. This lets me type “Red SUV” or “White Ute” into Frigate’s search bar and instantly filter to the relevant clips.
2. **Contextual Notification** – A lighter 2‑billion‑parameter model receives the same snapshot but with a prompt that asks for a concise, security‑focused summary. The result is attached to a Matrix message that lands on my phone, turning a generic “person detected” alert into “Delivery person in hi‑vis vest placing a parcel on the porch.”

Because both models run locally, there is no round‑trip to the cloud and no data leaves the house. The only external dependency is the occasional model update download.
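
The first flow is wired up inside Frigate itself. Recent releases have a `genai` block that points at an Ollama endpoint; the sketch below shows the general shape, though the model tag and prompt wording are mine and the exact option names depend on your Frigate version:

```yaml
# frigate.yml – GenAI descriptions via a local Ollama instance (option names vary by release)
genai:
  enabled: true
  provider: ollama
  base_url: http://ollama.lan:11434   # wherever the Ollama container listens
  model: qwen3-vl:4b                  # the larger model, used for search metadata
  prompt: >-
    Describe the {label} in this image. Focus on colour, make, and any
    visible markings rather than the background.
```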
---

## Storage Strategy

Video footage is storage‑hungry, especially when you keep full‑resolution streams for a week or more. My node uses a two‑tier storage layout:

* **SSD (256 GB)** – Hosts the OS, Docker engine, and Frigate’s database. The SSD ensures fast metadata reads/writes and quick container restarts.
* **2 TB HDD** – Dedicated to raw video files. With three cameras recording at 1080p/15 fps, the drive comfortably holds:
  * 7 days of continuous footage,
  * 14 days of motion‑triggered clips,
  * 30 days of alert snapshots.

I configured Frigate’s retention settings to automatically prune older files, and the system has been stable for months without running out of space. If the footage ever exceeds capacity, the next step would be to add a second HDD and enable a simple RAID‑1 mirror for redundancy.
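
In config terms that maps onto Frigate’s `record` and `snapshots` sections. The key layout has changed between releases (older versions nest event retention differently), so treat this as a sketch to check against the docs for your version:

```yaml
# frigate.yml – tiered retention (key layout varies across Frigate releases)
record:
  enabled: true
  retain:
    days: 7           # continuous footage
    mode: all
  detections:
    retain:
      days: 14        # motion/object-triggered clips
  alerts:
    retain:
      days: 14        # assumed to match detection clips; tune to taste

snapshots:
  enabled: true
  retain:
    default: 30       # alert snapshots
```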
---

## Home Assistant: The Automation Glue

Home Assistant (HA) is the hub that ties everything together. Frigate pushes detection events to HA via its built‑in integration, and HA then orchestrates the following steps:

1. **Receive event** – HA gets the object type, timestamp, and a URL to the snapshot.
2. **Copy image** – HA downloads the snapshot to its own cache.
3. **Ask Ollama** – HA sends the image to the 2‑billion‑parameter model with a prompt like “Summarise the scene for a homeowner, focusing on potential security relevance.”
4. **Dispatch notification** – HA forwards the AI‑generated description and the image to a Matrix room that I have linked to my phone.

Because HA already manages my lights, locks, and climate, the CCTV alerts can also trigger downstream actions. For example, a “person at front door” event could automatically turn on the porch light, while a “delivery person” event could unlock a smart drop‑box for a brief window.
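
As a rough illustration, the skeleton of that automation looks something like the sketch below. The REST command and notifier names are specific to my setup, and the real version base64‑encodes the snapshot before handing it to Ollama, so treat every identifier here as a placeholder:

```yaml
# configuration.yaml – user-defined REST command that calls Ollama (placeholder host/model)
rest_command:
  ollama_describe:
    url: http://ollama.lan:11434/api/generate
    method: POST
    content_type: application/json
    payload: >-
      {"model": "qwen3-vl:2b", "stream": false,
       "prompt": "Summarise the scene for a homeowner, focusing on potential security relevance.",
       "images": ["{{ image_b64 }}"]}

# automations.yaml – react to a Frigate person event (simplified sketch)
- alias: "CCTV: describe and notify"
  trigger:
    - platform: mqtt
      topic: frigate/events
  condition:
    - condition: template
      value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
  action:
    # 1. Pull the snapshot into HA's local storage
    - service: downloader.download_file
      data:
        url: "http://frigate.lan:5000/api/events/{{ trigger.payload_json['after']['id'] }}/snapshot.jpg"
    # 2. Ask the local model for a description (image encoding omitted for brevity)
    - service: rest_command.ollama_describe
      response_variable: ollama
    # 3. Forward the description and image to Matrix
    - service: notify.matrix_home
      data:
        message: "{{ ollama['content']['response'] }}"
        data:
          images:
            - /config/downloads/snapshot.jpg
```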
---

## The Physical Build: Getting Cable Through the Roof

All the software in the world is useless if the cameras can’t talk to the server. I opted for wired Ethernet because it guarantees a stable, low‑latency feed and avoids the headaches of Wi‑Fi dead zones. Running Cat‑6 cable from the attic down to each camera required a ladder, a drill, and a lot of patience. I’m not exactly a handyman, so I called in the one person who could actually climb onto the roof without panicking – my dad.

He spent an entire Saturday pulling the cable through the roof cavity, crimping connectors, and testing continuity while I handed him tools and tried not to get in the way. The result is a clean, tidy installation that looks as if it were done by a professional. The effort was a reminder that even the most high‑tech projects still rely on good old‑fashioned elbow grease.
---

## The Architecture – A Visual Overview

```mermaid
graph LR
    Camera1[Camera 1] --> Frigate[Frigate NVR]
    Camera2[Camera 2] --> Frigate
    Camera3[Camera 3] --> Frigate
    CameraFuture[Future camera] -.-> Frigate
    Frigate --> Detections[Object detections]
    Detections --> Ollama4b["Ollama (qwen3-vl-4b) – semantic search enrichment"]
    Frigate --> HA[Home Assistant]
    HA --> CopyImg[Copy snapshot into Home Assistant]
    CopyImg --> Ollama2b["Ollama (qwen3-vl-2b) – context enhancement"]
    Ollama2b --> Matrix[Matrix notification]
```
---

## Lessons Learned and Tweaks for the Future

### Prompt Engineering Is Still a Work in Progress

The quality of the AI‑generated metadata hinges on the prompts sent to Ollama. My initial prompts were overly generic (“Describe this image”), which resulted in vague outputs like “A vehicle is present.” After a few iterations I discovered that adding context (“Focus on colour, make, and any visible markings”) dramatically improved relevance. I’m now experimenting with a small prompt library that can be swapped out depending on the time of day or the type of detection.
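
Frigate supports per‑object prompt overrides, which is how I’m structuring that library. A cut‑down sketch, with wording that is illustrative rather than final:

```yaml
# frigate.yml – per-object prompt overrides for GenAI descriptions (illustrative wording)
genai:
  prompt: >-
    Describe the {label} in one sentence. Focus on colour, make,
    and any visible markings.
  object_prompts:
    person: >-
      Describe this person's clothing and apparent activity. Note anything
      a homeowner would consider security-relevant.
    car: >-
      State the colour and body type (sedan, SUV, ute) and any company
      branding or signage.
```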
### Smarter Notification Filtering

At the moment every person or vehicle triggers a push notification, which quickly becomes noisy. The next step is to let the LLM decide whether an alert is worth sending. By feeding the model additional data – such as the known list of resident vehicle plates or the typical schedule of the mail carrier – it can return a confidence score that HA can use to suppress low‑importance alerts; a sketch of the idea follows below.
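
This part is still on the drawing board, but one way to wire it up is a template condition slotted into the action block of the automation shown earlier, assuming the prompt asks the model to reply with a bare importance score from 0 to 10:

```yaml
# Hypothetical gating step – only notify when the model rates the event as important.
# Assumes the prompt ends with "Reply with a single integer from 0 to 10."
- service: rest_command.ollama_rate_event
  response_variable: rating
- condition: template
  value_template: "{{ (rating['content']['response'] | trim | int(0)) >= 7 }}"
- service: notify.matrix_home
  data:
    message: "High-importance event: {{ trigger.payload_json['after']['label'] }}"
```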
### Scaling to More Cameras

Adding a fourth camera is on the roadmap, and the architecture is already prepared for it. The biggest concern is CPU headroom: the i5‑7500 already averages around 75% utilisation with three streams, so a fourth may push it to its limit. Options include adding a discrete GPU or a Coral TPU for inference, or distributing the load across two nodes in the Proxmox cluster.
### Backup and Disaster Recovery

While the HDD is sufficient for day‑to‑day storage, a catastrophic drive failure would erase weeks of footage. I plan to add a nightly rsync job that mirrors the video directory to an external USB 3.0 drive, and eventually to a cheap off‑site NAS for true redundancy.
---

## The Bigger Picture: Why This Matters

What started as a personal curiosity has turned into a platform that could be replicated by anyone with a modest budget and a willingness to roll up their sleeves. The key takeaways are:

* **Privacy first** – All video stays on‑premises, and AI processing never leaves the house.
* **Open‑source stack** – Frigate, Home Assistant, and Ollama are all free and community‑driven, meaning you’re not locked into a vendor’s roadmap.
* **Extensible architecture** – The diagram above shows a modular flow where each component can be swapped out (e.g., replace Ollama with a different LLM, or add a second NVR for redundancy).
* **Real‑world utility** – Semantic search turns hours of footage into a few clicks, and context‑rich notifications cut down on false alarms.

In an era where “smart” devices are often synonymous with “data‑selling machines,” building a locally owned, AI‑enhanced CCTV system is a small act of digital sovereignty. It proves that you don’t need a multi‑million‑dollar budget to have a surveillance system that actually *understands* what it’s watching.
---

## A Massive Shout‑Out

No amount of software wizardry could have gotten the Ethernet cables through the roof without a pair of strong hands and a willingness to climb up there on a scorching summer day. My dad took the ladder, the drill, and the patience to guide the cable through the attic, and he did it with a grin (and a few jokes about why the son can’t even change a lightbulb). This project would have remained a half‑finished dream without his help, and the system we now have is as much his achievement as mine.

So, Dad – thank you for the hard yards, the guidance, and for not letting my nerdy enthusiasm turn into a half‑baked mess. This CCTV system is a testament to teamwork, a bit of Aussie ingenuity, and the belief that the best security is the kind you build yourself.
---

## Closing Thoughts

If you’re reading this and thinking “I could use something like this,” the answer is: start small. Grab a cheap camera, spin up a Frigate container, and let the system record a few days of footage. Once you’ve got the basics working, add Home Assistant, hook in Ollama, and watch the alerts become smarter. Don’t be afraid to experiment with prompts, tweak the retention policy, or even add a GPU down the line.

The journey from “just a camera” to “AI‑enhanced CCTV” is a series of incremental steps, each one teaching you a little more about video pipelines, machine learning, and home automation. The biggest reward isn’t just the peace of mind that comes from knowing exactly who’s at the door – it’s the satisfaction of building something that truly belongs to you, runs on your hardware, and respects your privacy.

So, fire up that Docker daemon, pull the Frigate image, and let the cameras roll. And when the first AI‑generated notification lands on your phone, you’ll know you’ve turned a simple surveillance feed into a genuinely intelligent guardian for your home.