# Designing and Building an AI-Enhanced CCTV System at Home

Over the past six months I've been tinkering with a problem that most homeowners shrug off as "just get a camera and be done with it." The reality is that a plain old CCTV feed gives you a wall of motion-triggered video that you have to trawl through manually, and the moment you start looking for anything useful you quickly realise the data is as opaque as a foggy morning on the Nullarbor. My goal was simple on paper: a locally hosted surveillance system that keeps every frame under my own roof, yet still offers the kind of intelligent search and notification you'd expect from a cloud-based service. The result is an AI-enhanced CCTV stack built around the open-source NVR **Frigate**, a handful of budget-friendly TP-Link Vigi C540 cameras, and a modest homelab that runs everything in Docker containers on a Proxmox LXC node.
---
## Why "Local-First" Matters
When you hand over video streams to a commercial provider you're essentially giving a stranger a front-row seat to the comings and goings at your doorstep. The trade-off is usually a subscription fee and the promise of "smart" alerts. In practice those alerts are either too noisy (the neighbour's cat triggers a "person detected" every night) or too vague (a generic "motion detected" that forces you to open the app and stare at a blank screen). Moreover, the data lives somewhere you can't audit, and any breach could expose a detailed visual diary of your life.
A local-first approach eliminates the subscription, guarantees that the raw footage never leaves your network, and gives you full control over how the data is processed. The only thing you have to trust is the hardware you own and the open-source software you run. That level of ownership is what makes the project worth the effort.
---
## The Core: Frigate NVR
Frigate is a container-native Network Video Recorder that does far more than simply write video to disk. It watches each RTSP stream, runs object detection on every frame (or on a configurable subset), and exposes a clean REST API plus a web UI for live view, playback, and search. The entire configuration lives in a single YAML file, which makes it trivial to version-control, replicate, or spin up a fresh instance for testing.
Key points that made Frigate the obvious choice:
| Feature | Why it matters |
|---------|----------------|
| **Docker-ready** | Deploys in seconds on any machine that can run containers. |
| **YAML-driven** | Human-readable, repeatable, and easy to tweak via CI pipelines. |
| **Built-in object detectors** | Supports OpenVINO, TensorRT, Coral, and CPU-only models; can pull the latest weights from Hugging Face. |
| **GenAI integration** | Direct hooks to Ollama for custom metadata generation. |
| **Home Assistant integration** | Exposes events that can be consumed by any automation platform. |
Because the configuration is declarative, I could spin up a fresh test environment, point it at a different camera, tweak the FFmpeg pipeline, and have the whole stack ready in under five minutes. That rapid iteration cycle was essential while I was still learning the quirks of each camera and the best way to balance CPU load against detection accuracy. The full configuration still needs a bit of tidying before it's fit for public eyes, but the aim is to eventually share it so others can build on it.
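To make that concrete, here is a trimmed sketch of the kind of `config.yml` this buys you. The broker hostname, camera name, credentials, and RTSP path are placeholders rather than my real values; a working config needs them filled in for your own hardware.

```yaml
# Illustrative only: hostnames, credentials, and the stream path are made up.
mqtt:
  host: mqtt.lan            # broker shared with Home Assistant

cameras:
  front_yard:               # one block like this per camera
    ffmpeg:
      inputs:
        - path: rtsp://user:pass@192.168.1.50:554/stream1
          roles:
            - detect        # frames fed to the object detector
            - record        # frames written to disk
    detect:
      width: 1280
      height: 720

objects:
  track:
    - person
    - car
```

Because the whole file is declarative, swapping a camera or an object list is a one-line change followed by a container restart.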
---
## The Eyes: TP-Link Vigi C540 Cameras
I chose three (soon to be four) TP-Link Vigi C540 units because they hit the sweet spot between price and capability. At AUD $50 each they were a steal during a Black Friday sale, and they come with pan-and-tilt motors, on-board human detection, and a follow mode that keeps a moving subject in frame. The cameras lack optical zoom, but the field of view is wide enough for a typical suburban front yard, and the built-in motion detection reduces the amount of data that needs to be processed by Frigate.
| Spec | Reason for selection |
|------|-----------------------|
| **Pan/Tilt** | Allows a single unit to cover multiple angles without extra hardware. |
| **Wired Ethernet** | Guarantees a stable feed and eliminates WiFi interference. |
| **Onboard human detection** | Provides a first line of filtering before Frigate even sees the frame. |
| **Affordability** | Keeps the overall project budget modest, leaving room for a GPU upgrade later. |
The cameras are mounted on the roof, with Ethernet cables run through the attic. I'll come back to the physical installation later, but the result is a reliable, low-latency video source that feeds directly into the Frigate container.
---
## The Brain: On-Device Object Detection
Frigate can offload inference to a GPU, a Coral Edge TPU, or run entirely on the CPU if you're willing to accept slower detection rates. My homelab node is a Dell workstation with an Intel Core i5-7500, a surprisingly capable machine I acquired for just $150. Running the "small" OpenVINO SSDLite model on the CPU gives good detection performance across three cameras, although the box runs a little warm, averaging around 75% CPU.
Because the models are downloaded from the internet, the host needs occasional outbound connectivity to refresh weights. This is a small price to pay for keeping the detection pipeline up to date without manual intervention.
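For reference, a CPU-based OpenVINO detector section in Frigate's config looks roughly like the sketch below. The paths and input sizes follow the defaults Frigate documents for its bundled SSDLite MobileNet model, but verify them against the Frigate version you actually run before copying anything.

```yaml
# Sketch based on Frigate's documented OpenVINO example; check your version.
detectors:
  ov:
    type: openvino
    device: CPU             # runs on the i5-7500, no discrete GPU needed

model:
  width: 300                # SSDLite MobileNet expects 300x300 input
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  path: /openvino-model/ssdlite_mobilenet_v2.xml
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt
```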
---
## Adding Smarts: Ollama and GenAI
Frigate's native object detection tells you *what* it sees (a person, a car, a dog), but it can't answer "what colour is the car?" or "is that a delivery driver or a neighbour?" That's where Ollama comes in. Ollama is a locally hosted LLM/vision service that can run multimodal models such as **qwen3-vl-4b** and **qwen3-vl-2b**. By sending a snapshot to Ollama with a carefully crafted prompt, the model returns a natural-language description that adds semantic depth to the raw detection.
Two distinct flows exist:
1. **Semantic Search Enrichment**: When a snapshot is sent to the 4-billion-parameter model, the response is stored as metadata alongside the detection. This lets me type "Red SUV" into Frigate's search bar and instantly filter to the relevant clips.
2. **Contextual Notification**: A lighter 2-billion-parameter model receives the same snapshot but with a prompt that asks for a concise, security-focused summary. The result is attached to a Matrix message that lands on my phone, turning a generic "person detected" alert into "Delivery person in hi-vis vest placing a parcel on the porch."
Because both models run locally, there is no latency penalty and no data leaves the house. The only external dependency is the occasional model update from Hugging Face.
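Under the hood, the snapshot-to-description round trip is just an HTTP POST to Ollama's `/api/generate` endpoint with the image base64-encoded. A minimal Python sketch of building that request follows; the prompt text and model tag mirror the ones used above, but treat the helper itself as illustrative rather than the exact glue code in my setup.

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_describe_request(image_bytes: bytes,
                           model: str = "qwen3-vl-4b",
                           prompt: str = ("Describe this image. Focus on colour, "
                                          "make, and any visible markings.")) -> dict:
    """Build the JSON body Ollama expects: images go in as base64 strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,   # one complete response instead of a token stream
    }

# Sending it is then a plain POST, e.g. with urllib:
#   req = urllib.request.Request(OLLAMA_URL,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"})
```

The `response` field of the reply is the natural-language description that gets stored as detection metadata.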
---
## Storage Strategy
Video footage is storage-hungry, especially when you keep full-resolution streams for a week or more. My node uses a two-tier storage layout:
* **SSD (256 GB)**: Hosts the OS, Docker engine, and Frigate's database. The SSD ensures fast metadata reads/writes and quick container restarts.
* **2 TB HDD**: Dedicated to raw video files. With three cameras recording at 1080p/15 fps, the drive comfortably holds:
  * 7 days of continuous footage,
  * 14 days of motion-triggered clips,
  * 30 days of alert snapshots.
I configured Frigate's retention settings to automatically prune older files, and the system has been stable for months without running out of space. If the footage ever exceeds capacity, the next step would be to add a second HDD and enable a simple RAID 1 mirror for redundancy.
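Those retention numbers pass a quick sanity check. Assuming roughly 2 Mbit/s per 1080p/15 fps H.264 stream (an assumption; real bitrates vary with scene complexity and camera settings), the arithmetic works out like this:

```python
# Back-of-envelope capacity check; the 2 Mbit/s figure is an assumption.
BITRATE_BPS = 2_000_000        # per camera, bits per second
CAMERAS = 3
DISK_GB = 2_000                # 2 TB drive, decimal gigabytes

bytes_per_day = BITRATE_BPS / 8 * 86_400 * CAMERAS
gb_per_day = bytes_per_day / 1e9           # ~64.8 GB/day across all cameras
continuous_days = DISK_GB / gb_per_day     # ~30 days if recording 24/7

print(f"{gb_per_day:.1f} GB/day -> {continuous_days:.1f} days of continuous footage")
```

At those rates a week of continuous footage uses under a quarter of the disk, which is why the 7/14/30-day tiers fit comfortably alongside each other.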
---
## Home Assistant: The Automation Glue
Home Assistant (HA) is the hub that ties everything together. Frigate pushes detection events to HA via its built-in integration, and HA then orchestrates the following steps:
1. **Receive event**: HA gets the object type, timestamp, and a URL to the snapshot.
2. **Copy image**: HA downloads the snapshot to its own cache.
3. **Ask Ollama**: HA sends the image to the 2-billion-parameter model with a prompt like "Summarise the scene for a homeowner, focusing on potential security relevance."
4. **Dispatch notification**: HA forwards the AI-generated description and the image to a Matrix room that I have linked to my phone.
Because HA already manages my lights, locks, and climate, the CCTV alerts can also trigger downstream actions. For example, a “person at front door” event could automatically turn on the porch light, while a “delivery person” event could unlock a smart dropbox for a brief window.
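The four steps above map onto a single HA automation. The sketch below is not my production config: the `rest_command` wrapper around Ollama and the Matrix notifier name are hypothetical services you would define yourself, while the `frigate/events` MQTT topic is where Frigate publishes detections.

```yaml
# Sketch only: rest_command.ollama_describe and notify.matrix_phone are
# placeholder names for services configured elsewhere in HA.
automation:
  - alias: "Frigate person alert with AI context"
    trigger:
      - platform: mqtt
        topic: frigate/events
    condition:
      - condition: template
        value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
    action:
      - service: rest_command.ollama_describe   # steps 2-3: fetch snapshot, ask Ollama
      - service: notify.matrix_phone            # step 4: push to the Matrix room
        data:
          message: "Person at {{ trigger.payload_json['after']['camera'] }}"
```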
---
## The Physical Build: Getting Cable Through the Roof
All the software in the world is useless if the cameras can't talk to the server. I opted for wired Ethernet because it guarantees a stable, low-latency feed and avoids the headaches of WiFi dead zones. Running Cat6 cable from the attic down to each camera required a ladder, a drill, and a lot of patience. I'm not exactly a handyman, so I called in the one person who could actually climb onto the roof without panicking: my dad.
He spent an entire Saturday pulling the cable through the roof cavity, crimping connectors, and testing continuity while I handed him tools and tried not to get in the way. The result is a clean, tidy installation that looks as if it were done by a professional. The effort was a reminder that even the most high-tech projects still rely on good old-fashioned elbow grease.
---
## The Architecture: A Visual Overview
```mermaid
graph LR
    Camera1[Camera 1] --> Frigate[Frigate NVR]
    Camera2[Camera 2] --> Frigate
    Camera3[Camera 3] --> Frigate
    CameraFuture[Future camera] -.-> Frigate
    Frigate --> Detections[Object detections]
    Detections --> Ollama1["Ollama (qwen3-vl-4b): semantic search metadata"]
    Frigate --> HomeAssistant[Home Assistant]
    HomeAssistant --> CopyImg[Copy snapshot to HA cache]
    CopyImg --> Ollama2["Ollama (qwen3-vl-2b): context enhancement"]
    Ollama2 --> Matrix[Matrix notification]
```
---
## Lessons Learned and Tweaks for the Future
### Prompt Engineering: Still a Work in Progress
The quality of the AI-generated metadata hinges on the prompts sent to Ollama. My initial prompts were overly generic ("Describe this image"), which resulted in vague outputs like "A vehicle is present." After a few iterations I discovered that adding context ("Focus on colour, make, and any visible markings") dramatically improved relevance. I'm now experimenting with a small prompt library that can be swapped out depending on the time of day or the type of detection.
### Smarter Notification Filtering
At the moment every person or vehicle triggers a push notification, which quickly becomes noisy. The next step is to let the LLM decide whether an alert is worth sending. By feeding the model additional data, such as the known list of resident vehicle plates or the typical schedule of the mail carrier, it can return a confidence score that HA can use to suppress low-importance alerts.
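The gating logic could start as simply as the sketch below. The plate list, labels, and thresholds are invented for illustration; in the real setup the `llm_score` would come from the model's assessment of the snapshot.

```python
from typing import Optional

# Illustrative gate for suppressing low-importance alerts. The plate list
# and the thresholds are hypothetical values.
RESIDENT_PLATES = {"ABC123", "XYZ789"}

def should_notify(label: str, llm_score: float,
                  plate: Optional[str] = None) -> bool:
    """Return True when an event is worth a push notification."""
    if plate is not None and plate.upper() in RESIDENT_PLATES:
        return False                 # a resident's car coming home is not news
    if label == "person":
        return llm_score >= 0.5      # people get a lower bar than vehicles
    return llm_score >= 0.8
```

HA would call something like this in a template condition before dispatching to Matrix, so a known car rolling into the driveway stays silent while an unfamiliar person still rings through.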
### Scaling to More Cameras
Adding a fourth camera is on the roadmap, and the architecture is already prepared for it. The biggest concern will be CPU utilisation: the i5-7500 already averages around 75% with three streams, and a fourth may push it past its limits. Options include adding a discrete GPU or a Coral TPU for hardware-accelerated inference, or distributing the load across two nodes in the Proxmox cluster.
### Backup and Disaster Recovery
While the HDD is sufficient for day-to-day storage, a catastrophic drive failure would erase weeks of footage. I plan to add a nightly rsync job that mirrors the video directory to an external USB 3.0 drive, and eventually to a cheap off-site NAS for true redundancy.
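The nightly mirror can be a single crontab entry. The source and destination paths below are placeholders for wherever Frigate stores its recordings and wherever the USB drive is mounted:

```shell
# Hypothetical paths: adjust to your own mounts. Runs at 02:30 nightly;
# -a preserves attributes, --delete keeps the mirror an exact copy.
30 2 * * * rsync -a --delete /media/frigate/recordings/ /mnt/usb-backup/frigate/
```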
---
## The Bigger Picture: Why This Matters
What started as a personal curiosity has turned into a platform that could be replicated by anyone with a modest budget and a willingness to roll up their sleeves. The key takeaways are:
* **Privacy first**: All video stays on-premises, and AI processing never leaves the house.
* **Open-source stack**: Frigate, Home Assistant, and Ollama are all free and community-driven, meaning you're not locked into a vendor's roadmap.
* **Extensible architecture**: The Mermaid diagram shows a modular flow where each component can be swapped out (e.g., replace Ollama with a different LLM, or add a second NVR for redundancy).
* **Real-world utility**: Semantic search turns hours of footage into a few clicks, and context-rich notifications cut down on false alarms.
In an era where "smart" devices are often synonymous with "data-selling machines," building a locally owned, AI-enhanced CCTV system is a small act of digital sovereignty. It proves that you don't need a multi-million-dollar budget to have a surveillance system that actually *understands* what it's watching.
---
## A Massive Shout-Out
No amount of software wizardry could have gotten the Ethernet cables through the roof without a pair of strong hands and a willingness to climb up there on a scorching summer day. My dad took the ladder, the drill, and the patience to guide the cable through the attic, and he did it with a grin (and a few jokes about "why the son can't even change a lightbulb"). This project would have remained a half-finished dream without his help, and the system we now have is as much his achievement as mine.
So, Dad: thank you for the hard yards, the guidance, and for not letting my nerdy enthusiasm turn into a half-baked mess. This CCTV system is a testament to teamwork, a bit of Aussie ingenuity, and the belief that the best security is the kind you build yourself.
---
## Closing Thoughts
If you're reading this and thinking "I could use something like this," the answer is: start small. Grab a cheap camera, spin up a Frigate container, and let the system record a few days of footage. Once you've got the basics working, add Home Assistant, hook in Ollama, and watch the alerts become smarter. Don't be afraid to experiment with prompts, tweak the retention policy, or even add a GPU down the line.
The journey from "just a camera" to "AI-enhanced CCTV" is a series of incremental steps, each one teaching you a little more about video pipelines, machine learning, and home automation. The biggest reward isn't just the peace of mind that comes from knowing exactly who's at the door; it's the satisfaction of building something that truly belongs to you, runs on your hardware, and respects your privacy.
So, fire up that Docker daemon, pull the Frigate image, and let the cameras roll. And when the first AI-generated notification lands on your phone, you'll know you've turned a simple surveillance feed into a genuinely intelligent guardian for your home.