designing_and_building_an_ai_enhanced_cctv_system_at_home #22

armistace merged 14 commits from designing_and_building_an_ai_enhanced_cctv_system_at_home into master 2026-02-03 13:24:16 +10:00
# Designing and Building an AI-Enhanced CCTV System at Home
Over the past six months I've been tinkering with a problem that most homeowners shrug off as “just get a camera and be done with it.” The reality is that a plain old CCTV feed gives you a wall of motion-triggered video that you have to trawl through manually, and the moment you start looking for anything useful you quickly realise the data is as opaque as a foggy morning on the Nullarbor. My goal was simple on paper: a locally hosted surveillance system that keeps every frame under my own roof, yet still offers the kind of intelligent search and notification you'd expect from a cloud-based service. The result is an AI-enhanced CCTV stack built around the open-source NVR **Frigate**, a handful of budget-friendly TP-Link Vigi C540 cameras, and a modest homelab that runs everything in Docker containers on a Proxmox LXC node.
---
## Why “Local-First” Matters
When you hand over video streams to a commercial provider you're essentially giving a stranger a front-row seat to all the comings and goings at your doorstep. The trade-off is usually a subscription fee and the promise of “smart” alerts. In practice those alerts are either too noisy (the neighbour's cat triggers a “person detected” every night) or too vague (a generic “motion detected” that forces you to open the app and stare at a blank screen). Moreover, the data lives somewhere you can't audit, and any breach could expose a detailed visual diary of your life.
A local-first approach eliminates the subscription, guarantees that the raw footage never leaves your network, and gives you full control over how the data is processed. The only thing you have to trust is the hardware you own and the open-source software you run. That level of ownership is what makes the project worth the effort.
---
## The Core: Frigate NVR
Frigate is a container-native Network Video Recorder that does far more than simply write video to disk. It watches each RTSP stream, runs object detection on every frame (or on a configurable subset), and exposes a clean REST API plus a web UI for live view, playback, and search. The entire configuration lives in a single YAML file, which makes it trivial to version-control, replicate, or spin up a fresh instance for testing.
Key points that made Frigate the obvious choice:
| Feature | Why it matters |
|---------|----------------|
| **Docker-ready** | Deploys in seconds on any machine that can run containers. |
| **YAML-driven** | Human-readable, repeatable, and easy to tweak via CI pipelines. |
| **Built-in object detectors** | Supports TensorRT, Coral, and CPU-only models; can pull the latest weights from Hugging Face. |
| **GenAI integration** | Direct hooks to Ollama for custom metadata generation. |
| **Home Assistant integration** | Exposes events that can be consumed by any automation platform. |
Because the configuration is declarative, I could spin up a fresh test environment, point it at a different camera, tweak the FFmpeg pipeline, and have the whole stack ready in under five minutes. That rapid iteration cycle was essential while I was still learning the quirks of each camera and the best way to balance CPU load against detection accuracy. The full configuration is still being refined and remains private for now, but I intend to share it publicly once it settles.
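To give a feel for that declarative style, here is a minimal sketch of a single-camera entry. The camera name, credentials, and stream URL are placeholders rather than my actual configuration, and key names should be checked against the Frigate docs for your version.

```yaml
# Sketch of a Frigate config.yml fragment: one RTSP camera feeding
# both detection and recording. All names and addresses are placeholders.
mqtt:
  host: mqtt.local

cameras:
  front_yard:
    ffmpeg:
      inputs:
        - path: rtsp://user:pass@192.168.1.50:554/stream1
          roles:
            - detect
            - record
    detect:
      width: 1280
      height: 720
      fps: 10
    objects:
      track:
        - person
        - car
        - dog
```

Because the whole thing is plain YAML, a throwaway test instance is just a copy of this file with a different camera block pointed at it.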
---
## The Eyes: TP-Link Vigi C540 Cameras
I chose three (soon to be four) TP-Link Vigi C540 units because they hit the sweet spot between price and capability. At AUD $50 each they were a steal during a Black Friday sale, and they come with pan-and-tilt motors, onboard human detection, and a follow mode that keeps a moving subject in frame. The cameras lack optical zoom, but the field of view is wide enough for a typical suburban front yard, and the built-in motion detection reduces the amount of data that needs to be processed by Frigate.
| Spec | Reason for selection |
|------|-----------------------|
| **Pan/Tilt** | Allows a single unit to cover multiple angles without extra hardware. |
| **Wired Ethernet** | Guarantees a stable feed and eliminates Wi-Fi interference. |
| **Onboard human detection** | Provides a first line of filtering before Frigate even sees the frame. |
| **Affordability** | Keeps the overall project budget modest, leaving room for a GPU upgrade later. |
The cameras are mounted on the roof, with Ethernet cables run through the attic. I'll come back to the physical installation later, but the result is a reliable, low-latency video source that feeds directly into the Frigate container.
---
## The Brain: GPU-Accelerated Object Detection
Frigate can offload inference to a GPU, a Coral Edge TPU, or even run on the CPU if you're willing to accept slower detection rates. My homelab node is a modest Intel N100 mini-PC paired with a mid-range NVIDIA RTX 3060. With the RTX in place, Frigate pulls the latest YOLOv8 models from Hugging Face at startup and runs detection at roughly 10 fps per camera, which is more than sufficient for a residential setting.
Because the models are downloaded from the internet, the host needs occasional outbound connectivity to refresh weights. This is a small price to pay for the ability to keep the detection pipeline up to date without manual intervention.
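Pointing Frigate at the GPU is a few more lines in the same YAML file. This is only a sketch: the exact detector type and model keys depend on the Frigate version and how the model was exported, so treat every value below as an assumption to verify against the documentation.

```yaml
# Sketch: run inference on the first NVIDIA GPU via Frigate's
# TensorRT detector. Model path and dimensions are illustrative.
detectors:
  tensorrt:
    type: tensorrt
    device: 0            # index of the GPU to use
model:
  path: /config/model_cache/yolov8n.trt
  width: 320             # input resolution expected by the model
  height: 320
```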
---
## Adding Smarts: Ollama and GenAI
Frigate's native object detection tells you *what* it sees (a person, a car, a dog) but it can't answer “what colour is the car?” or “is that a delivery driver or a neighbour?” That's where Ollama comes in. Ollama is a locally hosted LLM/vision service that can run multimodal models such as **qwen3-vl-4b** and **qwen3-vl-2b**. By sending a snapshot to Ollama with a carefully crafted prompt, the model returns a natural-language description that adds semantic depth to the raw detection.
Two distinct flows exist:
1. **Semantic Search Enrichment:** When a snapshot is sent to the 4-billion-parameter model, the response is stored as metadata alongside the detection. This lets me type “Red SUV” into Frigate's search bar and instantly filter to the relevant clips.
2. **Contextual Notification:** A lighter 2-billion-parameter model receives the same snapshot but with a prompt that asks for a concise, security-focused summary. The result is attached to a Matrix message that lands on my phone, turning a generic “person detected” alert into “Delivery person in hi-vis vest placing a parcel on the porch.”
Because both models run locally, there is no round trip to someone else's cloud and no data leaves the house. The only external dependency is the occasional model update from Hugging Face.
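Wiring the semantic search flow to Ollama is again a YAML fragment. The host, port, and prompt wording below are placeholders, and the key names follow my reading of Frigate's GenAI configuration, so double-check them against the current docs.

```yaml
# Sketch: have Frigate send detection snapshots to a local Ollama
# instance for description. Host and model name are placeholders.
genai:
  enabled: true
  provider: ollama
  base_url: http://ollama.local:11434
  model: qwen3-vl-4b
  prompt: >
    Describe the {label} in this image. Focus on colour, make,
    and any visible markings.
```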
---
## Storage Strategy
Video footage is storage-hungry, especially when you keep full-resolution streams for a week or more. My node uses a two-tier storage layout:

* **SSD (256 GB):** Hosts the OS, Docker engine, and Frigate's database. The SSD ensures fast metadata reads/writes and quick container restarts.
* **HDD (2 TB):** Dedicated to raw video files. With three cameras recording at 1080p/15 fps, the drive comfortably holds:
  * 7 days of continuous footage,
  * 14 days of motion-triggered clips,
  * 30 days of alert snapshots.
I configured Frigate's `retention` policy to automatically prune older files, and the system has been stable for months without running out of space. If the footage ever exceeds capacity, the next step would be to add a second HDD and enable a simple RAID 1 mirror for redundancy.
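Those tiers map onto Frigate's record and snapshot retention settings roughly as follows. The layout of these keys has shifted between Frigate releases, so this is a sketch of the idea rather than a drop-in config:

```yaml
# Sketch: keep continuous video for 7 days, detection clips for 14,
# and alert material for 30. Verify key names for your Frigate version.
record:
  enabled: true
  retain:
    days: 7        # continuous footage
    mode: all
  detections:
    retain:
      days: 14     # motion/object-triggered clips
  alerts:
    retain:
      days: 30
snapshots:
  enabled: true
  retain:
    default: 30    # alert snapshots
```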
---
## Home Assistant: The Automation Glue
Home Assistant (HA) is the hub that ties everything together. Frigate pushes detection events to HA via its built-in integration, and HA then orchestrates the following steps:
1. **Receive event:** HA gets the object type, timestamp, and a URL to the snapshot.
2. **Copy image:** HA downloads the snapshot to its own cache.
3. **Ask Ollama:** HA sends the image to the 2-billion-parameter model with a prompt like “Summarise the scene for a homeowner, focusing on potential security relevance.”
4. **Dispatch notification:** HA forwards the AI-generated description and the image to a Matrix room that I have linked to my phone.
Because HA already manages my lights, locks, and climate, the CCTV alerts can also trigger downstream actions. For example, a “person at front door” event could automatically turn on the porch light, while a “delivery person” event could unlock a smart dropbox for a brief window.
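The event-to-notification chain can be sketched as a Home Assistant automation. Everything below is illustrative: the payload shape matches Frigate's `frigate/events` MQTT topic as I understand it, the notify service name stands in for my Matrix setup, and the Ollama call (step 3 above) is elided for brevity.

```yaml
# Sketch: react to a new Frigate "person" event and push a Matrix
# notification. Service and topic names are placeholders.
automation:
  - alias: "Frigate person alert"
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      - condition: template
        value_template: >
          {{ trigger.payload_json['type'] == 'new'
             and trigger.payload_json['after']['label'] == 'person' }}
    action:
      - service: notify.matrix_phone
        data:
          message: >
            Person detected on
            {{ trigger.payload_json['after']['camera'] }}
```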
---
## The Physical Build: Getting Cable Through the Roof
All the software in the world is useless if the cameras can't talk to the server. I opted for wired Ethernet because it guarantees a stable, low-latency feed and avoids the headaches of Wi-Fi dead zones. Running Cat6 cable from the attic down to each camera required a ladder, a drill, and a lot of patience. I'm not exactly a handyman, so I called in the one person who could actually climb onto the roof without panicking: my dad.
He spent an entire Saturday pulling the cable through the roof cavity, crimping connectors, and testing continuity while I handed him tools and tried not to get in the way. The result is a clean, tidy installation that looks as if it were done by a professional. The effort was a reminder that even the most high-tech projects still rely on good old-fashioned elbow grease.
---
## The Architecture: A Visual Overview
```mermaid
graph LR
    Camera1[Camera] --> Frigate
    Camera2[Camera] --> Frigate
    Camera3[Camera] --> Frigate
    CameraFuture[Future Camera] -.-> Frigate
    Frigate --> ObjectDetections[Object Detections]
    ObjectDetections --> Ollama1["Ollama (qwen3-vl-4b) Semantic Search"]
    Frigate --> HomeAssistant[Home Assistant]
    HomeAssistant --> CopyImg[Copy Image to Home Assistant]
    CopyImg --> Ollama2["Ollama (qwen3-vl-2b) Context Enhancement"]
    Ollama2 --> Matrix[Matrix Notification]
```
The whole stack runs on a dedicated node in my Proxmox cluster, with the Docker containers hosted inside an LXC for efficient resource management.
---
## Lessons Learned and Tweaks for the Future
### Prompt Engineering: Still a Work in Progress
The quality of the AI-generated metadata hinges on the prompts sent to Ollama. My initial prompts were overly generic (“Describe this image”), which resulted in vague outputs like “A vehicle is present.” After a few iterations I discovered that adding context (“Focus on colour, make, and any visible markings”) dramatically improved relevance. I'm now experimenting with a small prompt library that can be swapped out depending on the time of day or the type of detection.
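One way that prompt library might look is per-object prompt overrides in the GenAI config. The `object_prompts` key is my reading of Frigate's docs and the wording is illustrative, so verify before copying:

```yaml
# Sketch: different prompts per detected object class.
genai:
  object_prompts:
    car: >
      Focus on colour, make, and any visible markings or livery.
    person: >
      Note clothing, anything being carried, and whether the person
      approaches the door or simply passes by.
```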
### Smarter Notification Filtering
At the moment every person or vehicle triggers a push notification, which quickly becomes noisy. The next step is to let the LLM decide whether an alert is worth sending. By feeding the model additional data, such as the known list of resident vehicle plates or the typical schedule of the mail carrier, it can return a confidence score that HA can use to suppress low-importance alerts.
### Scaling to More Cameras
Adding a fourth camera is on the roadmap, and the architecture is already prepared for it. The biggest concern will be GPU utilisation; the RTX 3060 can comfortably handle three streams at 10 fps, but a fourth may push the limits. Options include upgrading to a higher-end GPU or distributing the load across two nodes in the Proxmox cluster.
### Backup and Disaster Recovery
While the HDD is sufficient for day-to-day storage, a catastrophic drive failure would erase weeks of footage. I plan to add a nightly rsync job that mirrors the video directory to an external USB 3.0 drive, and eventually to a cheap off-site NAS for true redundancy.
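That nightly job is a one-line crontab entry. The paths below are placeholders for my actual mount points:

```shell
# Sketch of the planned crontab entry: mirror recordings to the USB
# drive at 2am each night. Paths are placeholders.
0 2 * * * rsync -a --delete /mnt/frigate/recordings/ /mnt/usb-backup/recordings/
```

The `--delete` flag keeps the mirror in step with Frigate's own retention pruning, so the backup never outgrows the source.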
---
## The Bigger Picture: Why This Matters
What started as a personal curiosity has turned into a platform that could be replicated by anyone with a modest budget and a willingness to roll up their sleeves. The key takeaways are:
* **Privacy first:** All video stays on-premises, and AI processing never leaves the house.
* **Open-source stack:** Frigate, Home Assistant, and Ollama are all free and community-driven, meaning you're not locked into a vendor's roadmap.
* **Extensible architecture:** The mermaid diagram shows a modular flow where each component can be swapped out (e.g., replace Ollama with a different LLM, or add a second NVR for redundancy).
* **Real-world utility:** Semantic search turns hours of footage into a few clicks, and context-rich notifications cut down on false alarms.
In an era where “smart” devices are often synonymous with “data-selling machines,” building a locally owned, AI-enhanced CCTV system is a small act of digital sovereignty. It proves that you don't need a multi-million-dollar budget to have a surveillance system that actually *understands* what it's watching.
---
## A Massive Shout-Out
No amount of software wizardry could have gotten the Ethernet cables through the roof without a pair of strong hands and a willingness to climb up there on a scorching summer day. My dad took the ladder, the drill, and the patience to guide the cable through the attic, and he did it with a grin (and a few jokes about “why the son can't even change a lightbulb”). This project would have remained a half-finished dream without his help, and the system we now have is as much his achievement as mine.
So, Dad: thank you for the hard yards, the guidance, and for not letting my nerdy enthusiasm turn into a half-baked mess. This CCTV system is a testament to teamwork, a bit of Aussie ingenuity, and the belief that the best security is the kind you build yourself.
---
## Closing Thoughts
If you're reading this and thinking “I could use something like this,” the answer is: start small. Grab a cheap camera, spin up a Frigate container, and let the system record a few days of footage. Once you've got the basics working, add Home Assistant, hook in Ollama, and watch the alerts become smarter. Don't be afraid to experiment with prompts, tweak the retention policy, or even add a second GPU down the line.
The journey from “just a camera” to “AI-enhanced CCTV” is a series of incremental steps, each one teaching you a little more about video pipelines, machine learning, and home automation. The biggest reward isn't just the peace of mind that comes from knowing exactly who's at the door; it's the satisfaction of building something that truly belongs to you, runs on your hardware, and respects your privacy.
So, fire up that Docker daemon, pull the Frigate image, and let the cameras roll. And when the first AI-generated notification lands on your phone, you'll know you've turned a simple surveillance feed into a genuinely intelligent guardian for your home.