designing_and_building_an_ai_enhanced_cctv_system_at_home #22

armistace merged 14 commits from designing_and_building_an_ai_enhanced_cctv_system_at_home into master 2026-02-03 13:24:16 +10:00
# Designing and Building an AI-Enhanced CCTV System at Home
Over the past six months I've been tinkering with a problem that most homeowners shrug off as “just get a camera and be done with it.” The reality is that a plain old CCTV feed gives you a wall of motion-triggered video that you have to trawl through manually, and the moment you start looking for anything useful you quickly realise the data is as opaque as a foggy morning on the Nullarbor. My goal was simple on paper: a locally hosted surveillance system that keeps every frame under my own roof, yet still offers the kind of intelligent search and notification you'd expect from a cloud-based service. The result is an AI-enhanced CCTV stack built around the open-source NVR **Frigate**, a handful of budget-friendly TP-Link Vigi C540 cameras, and a modest homelab that runs everything in Docker containers on a Proxmox LXC node.
---
## Why “Local-First” Matters
When you hand over video streams to a commercial provider you're essentially giving a stranger a front-row seat to all the comings and goings at your doorstep. The trade-off is usually a subscription fee and the promise of “smart” alerts. In practice those alerts are either too noisy (the neighbour's cat triggers a “person detected” every night) or too vague (a generic “motion detected” that forces you to open the app and stare at a blank screen). Moreover, the data lives somewhere you can't audit, and any breach could expose a detailed visual diary of your life.
A local-first approach eliminates the subscription, guarantees that the raw footage never leaves your network, and gives you full control over how the data is processed. The only thing you have to trust is the hardware you own and the open-source software you run. That level of ownership is what makes the project worth the effort.
---
## The Core: Frigate NVR
Frigate is a container-native Network Video Recorder that does far more than simply write video to disk. It watches each RTSP stream, runs object detection on every frame (or on a configurable subset), and exposes a clean REST API plus a web UI for live view, playback, and search. The entire configuration lives in a single YAML file, which makes it trivial to version-control, replicate, or spin up a fresh instance for testing.
Key points that made Frigate the obvious choice:
| Feature | Why it matters |
|---------|----------------|
| **Docker-ready** | Deploys in seconds on any machine that can run containers. |
| **YAML-driven** | Human-readable, repeatable, and easy to tweak via CI pipelines. |
| **Built-in object detectors** | Supports TensorRT, Coral, and CPU-only models; can pull the latest weights from Hugging Face. |
| **GenAI integration** | Direct hooks to Ollama for custom metadata generation. |
| **Home Assistant integration** | Exposes events that can be consumed by any automation platform. |
Because the configuration is declarative, I could spin up a fresh test environment, point it at a different camera, tweak the FFmpeg pipeline, and have the whole stack ready in under five minutes. That rapid iteration cycle was essential while I was still learning the quirks of each camera and the best way to balance CPU load against detection accuracy. The full configuration is still being refined and remains private for now, but I intend to share it publicly once it settles.
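To give a feel for that declarative style, here is a minimal sketch of a single-camera entry. The camera name, credentials, and stream URL are placeholders rather than my actual configuration, and key names should be checked against the Frigate docs for your version.

```yaml
# Sketch of a Frigate config.yml fragment: one RTSP camera feeding
# both detection and recording. All names and addresses are placeholders.
mqtt:
  host: mqtt.local

cameras:
  front_yard:
    ffmpeg:
      inputs:
        - path: rtsp://user:pass@192.168.1.50:554/stream1
          roles:
            - detect
            - record
    detect:
      width: 1280
      height: 720
      fps: 10
    objects:
      track:
        - person
        - car
        - dog
```

Because the whole thing is plain YAML, a throwaway test instance is just a copy of this file with a different camera block pointed at it.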
---
## The Eyes: TP-Link Vigi C540 Cameras
I chose three (soon to be four) TP-Link Vigi C540 units because they hit the sweet spot between price and capability. At AUD $50 each they were a steal during a Black Friday sale, and they come with pan-and-tilt motors, onboard human detection, and a follow mode that keeps a moving subject in frame. The cameras lack optical zoom, but the field of view is wide enough for a typical suburban front yard, and the built-in motion detection reduces the amount of data that needs to be processed by Frigate.
| Spec | Reason for selection |
|------|-----------------------|
| **Pan/Tilt** | Allows a single unit to cover multiple angles without extra hardware. |
| **Wired Ethernet** | Guarantees a stable feed and eliminates Wi-Fi interference. |
| **Onboard human detection** | Provides a first line of filtering before Frigate even sees the frame. |
| **Affordability** | Keeps the overall project budget modest, leaving room for a GPU upgrade later. |
The cameras are mounted on the roof, with Ethernet cables run through the attic. I'll come back to the physical installation later, but the result is a reliable, low-latency video source that feeds directly into the Frigate container.
---
## The Brain: GPU-Accelerated Object Detection
Frigate can offload inference to a GPU, a Coral Edge TPU, or even run on the CPU if you're willing to accept slower detection rates. My homelab node is a modest Intel N100 mini-PC paired with a mid-range NVIDIA RTX 3060. With the RTX in place, Frigate pulls the latest YOLOv8 models from Hugging Face at startup and runs detection at roughly 10 fps per camera, which is more than sufficient for a residential setting.
Because the models are downloaded from the internet, the host needs occasional outbound connectivity to refresh weights. This is a small price to pay for the ability to keep the detection pipeline up to date without manual intervention.
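Pointing Frigate at the GPU is a few more lines in the same YAML file. This is only a sketch: the exact detector type and model keys depend on the Frigate version and how the model was exported, so treat every value below as an assumption to verify against the documentation.

```yaml
# Sketch: run inference on the first NVIDIA GPU via Frigate's
# TensorRT detector. Model path and dimensions are illustrative.
detectors:
  tensorrt:
    type: tensorrt
    device: 0            # index of the GPU to use
model:
  path: /config/model_cache/yolov8n.trt
  width: 320             # input resolution expected by the model
  height: 320
```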
---
## Adding Smarts: Ollama and GenAI
Frigate's native object detection tells you *what* it sees (a person, a car, a dog) but it can't answer “what colour is the car?” or “is that a delivery driver or a neighbour?” That's where Ollama comes in. Ollama is a locally hosted LLM/vision service that can run multimodal models such as **qwen3-vl-4b** and **qwen3-vl-2b**. By sending a snapshot to Ollama with a carefully crafted prompt, the model returns a natural-language description that adds semantic depth to the raw detection.
Two distinct flows exist:
1. **Semantic Search Enrichment:** When a snapshot is sent to the 4-billion-parameter model, the response is stored as metadata alongside the detection. This lets me type “Red SUV” into Frigate's search bar and instantly filter to the relevant clips.
2. **Contextual Notification:** A lighter 2-billion-parameter model receives the same snapshot but with a prompt that asks for a concise, security-focused summary. The result is attached to a Matrix message that lands on my phone, turning a generic “person detected” alert into “Delivery person in hi-vis vest placing a parcel on the porch.”
Because both models run locally, there is no round trip to someone else's cloud and no data leaves the house. The only external dependency is the occasional model update from Hugging Face.
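Wiring the semantic search flow to Ollama is again a YAML fragment. The host, port, and prompt wording below are placeholders, and the key names follow my reading of Frigate's GenAI configuration, so double-check them against the current docs.

```yaml
# Sketch: have Frigate send detection snapshots to a local Ollama
# instance for description. Host and model name are placeholders.
genai:
  enabled: true
  provider: ollama
  base_url: http://ollama.local:11434
  model: qwen3-vl-4b
  prompt: >
    Describe the {label} in this image. Focus on colour, make,
    and any visible markings.
```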
---
## Storage Strategy
Video footage is storage-hungry, especially when you keep full-resolution streams for a week or more. My node uses a two-tier storage layout:

* **SSD (256 GB):** Hosts the OS, Docker engine, and Frigate's database. The SSD ensures fast metadata reads/writes and quick container restarts.
* **HDD (2 TB):** Dedicated to raw video files. With three cameras recording at 1080p/15 fps, the drive comfortably holds:
  * 7 days of continuous footage,
  * 14 days of motion-triggered clips,
  * 30 days of alert snapshots.
I configured Frigate's `retention` policy to automatically prune older files, and the system has been stable for months without running out of space. If the footage ever exceeds capacity, the next step would be to add a second HDD and enable a simple RAID 1 mirror for redundancy.
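Those tiers map onto Frigate's record and snapshot retention settings roughly as follows. The layout of these keys has shifted between Frigate releases, so this is a sketch of the idea rather than a drop-in config:

```yaml
# Sketch: keep continuous video for 7 days, detection clips for 14,
# and alert material for 30. Verify key names for your Frigate version.
record:
  enabled: true
  retain:
    days: 7        # continuous footage
    mode: all
  detections:
    retain:
      days: 14     # motion/object-triggered clips
  alerts:
    retain:
      days: 30
snapshots:
  enabled: true
  retain:
    default: 30    # alert snapshots
```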
---
## Home Assistant: The Automation Glue
Home Assistant (HA) is the hub that ties everything together. Frigate pushes detection events to HA via its built-in integration, and HA then orchestrates the following steps:
1. **Receive event:** HA gets the object type, timestamp, and a URL to the snapshot.
2. **Copy image:** HA downloads the snapshot to its own cache.
3. **Ask Ollama:** HA sends the image to the 2-billion-parameter model with a prompt like “Summarise the scene for a homeowner, focusing on potential security relevance.”
4. **Dispatch notification:** HA forwards the AI-generated description and the image to a Matrix room that I have linked to my phone.
Because HA already manages my lights, locks, and climate, the CCTV alerts can also trigger downstream actions. For example, a “person at front door” event could automatically turn on the porch light, while a “delivery person” event could unlock a smart dropbox for a brief window.
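The event-to-notification chain can be sketched as a Home Assistant automation. Everything below is illustrative: the payload shape matches Frigate's `frigate/events` MQTT topic as I understand it, the notify service name stands in for my Matrix setup, and the Ollama call (step 3 above) is elided for brevity.

```yaml
# Sketch: react to a new Frigate "person" event and push a Matrix
# notification. Service and topic names are placeholders.
automation:
  - alias: "Frigate person alert"
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      - condition: template
        value_template: >
          {{ trigger.payload_json['type'] == 'new'
             and trigger.payload_json['after']['label'] == 'person' }}
    action:
      - service: notify.matrix_phone
        data:
          message: >
            Person detected on
            {{ trigger.payload_json['after']['camera'] }}
```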
---
## The Physical Build: Getting Cable Through the Roof
All the software in the world is useless if the cameras can't talk to the server. I opted for wired Ethernet because it guarantees a stable, low-latency feed and avoids the headaches of Wi-Fi dead zones. Running Cat6 cable from the attic down to each camera required a ladder, a drill, and a lot of patience. I'm not exactly a handyman, so I called in the one person who could actually climb onto the roof without panicking: my dad.
He spent an entire Saturday pulling the cable through the roof cavity, crimping connectors, and testing continuity while I handed him tools and tried not to get in the way. The result is a clean, tidy installation that looks as if it were done by a professional. The effort was a reminder that even the most high-tech projects still rely on good old-fashioned elbow grease.
---
## The Architecture: A Visual Overview
```mermaid
graph LR
    Camera1[Camera] --> Frigate
    Camera2[Camera] --> Frigate
    Camera3[Camera] --> Frigate
    CameraFuture[Future Camera] -.-> Frigate
    Frigate --> ObjectDetections[Object Detections]
    ObjectDetections --> Ollama1["Ollama (qwen3-vl-4b) Semantic Search"]
    Frigate --> HomeAssistant[Home Assistant]
    HomeAssistant --> CopyImg[Copy Image to Home Assistant]
    CopyImg --> Ollama2["Ollama (qwen3-vl-2b) Context Enhancement"]
    Ollama2 --> Matrix[Matrix Notification]
```
The whole stack runs on a dedicated node in my Proxmox cluster, with the Docker containers hosted inside an LXC for efficient resource management.
---
## Lessons Learned and Tweaks for the Future
### Prompt Engineering: Still a Work in Progress
The quality of the AI-generated metadata hinges on the prompts sent to Ollama. My initial prompts were overly generic (“Describe this image”), which resulted in vague outputs like “A vehicle is present.” After a few iterations I discovered that adding context (“Focus on colour, make, and any visible markings”) dramatically improved relevance. I'm now experimenting with a small prompt library that can be swapped out depending on the time of day or the type of detection.
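One way that prompt library might look is per-object prompt overrides in the GenAI config. The `object_prompts` key is my reading of Frigate's docs and the wording is illustrative, so verify before copying:

```yaml
# Sketch: different prompts per detected object class.
genai:
  object_prompts:
    car: >
      Focus on colour, make, and any visible markings or livery.
    person: >
      Note clothing, anything being carried, and whether the person
      approaches the door or simply passes by.
```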
### Smarter Notification Filtering
At the moment every person or vehicle triggers a push notification, which quickly becomes noisy. The next step is to let the LLM decide whether an alert is worth sending. By feeding the model additional data, such as the known list of resident vehicle plates or the typical schedule of the mail carrier, it can return a confidence score that HA can use to suppress low-importance alerts.
### Scaling to More Cameras
Adding a fourth camera is on the roadmap, and the architecture is already prepared for it. The biggest concern will be GPU utilisation; the RTX 3060 can comfortably handle three streams at 10 fps, but a fourth may push the limits. Options include upgrading to a higher-end GPU or distributing the load across two nodes in the Proxmox cluster.
### Backup and Disaster Recovery
While the HDD is sufficient for day-to-day storage, a catastrophic drive failure would erase weeks of footage. I plan to add a nightly rsync job that mirrors the video directory to an external USB 3.0 drive, and eventually to a cheap off-site NAS for true redundancy.
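That nightly job is a one-line crontab entry. The paths below are placeholders for my actual mount points:

```shell
# Sketch of the planned crontab entry: mirror recordings to the USB
# drive at 2am each night. Paths are placeholders.
0 2 * * * rsync -a --delete /mnt/frigate/recordings/ /mnt/usb-backup/recordings/
```

The `--delete` flag keeps the mirror in step with Frigate's own retention pruning, so the backup never outgrows the source.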
---
## The Bigger Picture: Why This Matters
What started as a personal curiosity has turned into a platform that could be replicated by anyone with a modest budget and a willingness to roll up their sleeves. The key takeaways are:
* **Privacy first:** All video stays on-premises, and AI processing never leaves the house.
* **Open-source stack:** Frigate, Home Assistant, and Ollama are all free and community-driven, meaning you're not locked into a vendor's roadmap.
* **Extensible architecture:** The mermaid diagram shows a modular flow where each component can be swapped out (e.g., replace Ollama with a different LLM, or add a second NVR for redundancy).
* **Real-world utility:** Semantic search turns hours of footage into a few clicks, and context-rich notifications cut down on false alarms.
In an era where “smart” devices are often synonymous with “data-selling machines,” building a locally owned, AI-enhanced CCTV system is a small act of digital sovereignty. It proves that you don't need a multi-million-dollar budget to have a surveillance system that actually *understands* what it's watching.
---
## A Massive Shout-Out
No amount of software wizardry could have gotten the Ethernet cables through the roof without a pair of strong hands and a willingness to climb up there on a scorching summer day. My dad took the ladder, the drill, and the patience to guide the cable through the attic, and he did it with a grin (and a few jokes about “why the son can't even change a lightbulb”). This project would have remained a half-finished dream without his help, and the system we now have is as much his achievement as mine.
So, Dad: thank you for the hard yards, the guidance, and for not letting my nerdy enthusiasm turn into a half-baked mess. This CCTV system is a testament to teamwork, a bit of Aussie ingenuity, and the belief that the best security is the kind you build yourself.
---
## Closing Thoughts
If you're reading this and thinking “I could use something like this,” the answer is: start small. Grab a cheap camera, spin up a Frigate container, and let the system record a few days of footage. Once you've got the basics working, add Home Assistant, hook in Ollama, and watch the alerts become smarter. Don't be afraid to experiment with prompts, tweak the retention policy, or even add a second GPU down the line.
The journey from “just a camera” to “AI-enhanced CCTV” is a series of incremental steps, each one teaching you a little more about video pipelines, machine learning, and home automation. The biggest reward isn't just the peace of mind that comes from knowing exactly who's at the door; it's the satisfaction of building something that truly belongs to you, runs on your hardware, and respects your privacy.
So, fire up that Docker daemon, pull the Frigate image, and let the cameras roll. And when the first AI-generated notification lands on your phone, you'll know you've turned a simple surveillance feed into a genuinely intelligent guardian for your home.