Introduce private AI CCTV system

Blog Creator, 2026-02-03
# Designing and Building an AI-Enhanced CCTV System at Home

### Introduction
Over the past half-year I've been tinkering with a problem that many of us in the homelab community face every day: how to turn a handful of cheap IP cameras into a truly useful security system without surrendering our footage to a cloud service. The answer, after a lot of trial and error, is a stack that lives entirely on-premises, talks to itself, and even adds a sprinkle of generative AI to make the alerts intelligible. In plain English, the system records video, spots objects, asks a local vision-language model to describe what it sees, and then pushes a nicely worded notification to my phone. The result is a CCTV solution that is private, affordable, and surprisingly capable.
### Why Local Control Matters
The market is flooded with “smart” cameras that promise motion alerts, facial recognition, and cloud-based video archives. Those devices are convenient, but they also hand over a valuable slice of your privacy to a third party. A compromised account or a misconfigured API key can expose weeks of footage to the world. For a home that already has a smart-home hub, a voice assistant, and a handful of IoT devices, adding another internet-facing endpoint feels like inviting a stranger into the living room.
By keeping everything inside the home network we gain three things:
1. **Data sovereignty**: the video never leaves the LAN, so only the people you trust can view it.
2. **Predictable costs**: no monthly subscription for storage or AI inference; the only expense is the hardware you already own.
3. **Full customisation**: we can decide exactly how alerts are generated, what language they use, and when they are sent.
The only downside is the extra effort required to stitch the pieces together, but that's where the fun begins.
### Choosing the Cameras
I needed cameras that were cheap enough to experiment with, yet offered enough features to make the system worthwhile. A Black Friday deal on Amazon AU landed me three (soon to be four) TP-Link Vigi C540 units for AUD$50 each. For the price they are surprisingly capable:
* **Pan/tilt**: a single unit covers a wide field of view without additional hardware.
* **Onboard human detection**: the camera can flag a person before any external processing, reducing false positives.
* **RTSP support**: the standard streaming protocol that Frigate expects, meaning we are not locked into a proprietary API.
The cameras lack optical zoom, but the 1080p stream is more than enough for a backyard or front-gate view. Their built-in motion logic also gives us a first line of defence against unnecessary processing on the NVR.
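To make the RTSP point concrete, a camera like this ends up as a small stanza in Frigate's YAML. The sketch below follows Frigate's documented camera schema; the camera name, address, and credentials are placeholders, not my actual config:

```yaml
cameras:
  driveway:                # hypothetical camera name
    ffmpeg:
      inputs:
        # RTSP main stream from the Vigi C540; host and credentials are placeholders
        - path: rtsp://viewer:changeme@192.168.1.50:554/stream1
          roles:
            - detect
            - record
    detect:
      width: 1920          # the C540's 1080p stream
      height: 1080
      fps: 5               # matches the frame-rate budget used later for storage
```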
### The NVR Core: Frigate
At the heart of the system sits **Frigate**, an open-source network video recorder that runs as a Docker container. What makes Frigate a perfect fit for a homelab project?
* **Container-native**: it can be deployed on any host that supports Docker, which aligns nicely with my Proxmox LXC setup.
* **YAML-driven configuration**: a single, version-controlled file defines cameras, streams, detection models, and storage policies.
* **Built-in object detection**: Frigate can invoke a variety of detectors (OpenVINO, TensorRT, Coral) without external orchestration.
* **Extensible API**: the platform exposes webhooks and REST endpoints that other services (Home Assistant, Ollama) can consume.
Spinning up a new Frigate instance is as simple as pulling the image, mounting the configuration file, and pointing it at the RTSP URLs of the cameras. The UI instantly shows live feeds, detection boxes, and a timeline of events, which made the early testing phase feel like watching a sci-fi control room.
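A minimal Compose file for this kind of deployment might look like the sketch below. The image name follows the Frigate documentation; the volume paths, shared-memory size, and port mappings are illustrative and will vary with your setup:

```yaml
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    container_name: frigate
    restart: unless-stopped
    shm_size: "256mb"        # frame buffers; scale with camera count and resolution
    volumes:
      - ./config:/config               # the YAML configuration lives here
      - ./storage:/media/frigate       # recordings land on the video-archive disk
      - /etc/localtime:/etc/localtime:ro
    ports:
      - "8971:8971"   # web UI
      - "8554:8554"   # RTSP restreaming
```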
### Getting the Detection Right
Frigate ships with a handful of pre-configured detectors, but I wanted a model that could run on the modest hardware I had on hand: a Dell OptiPlex 7060 SFF rescued for $150, equipped with an Intel i5-7500 and 16 GB of RAM. After a few experiments I settled on the **OpenVINO** backend with the **SSDLite MobileNetV2** model from the Open Model Zoo. The model is small enough to run on CPU while still delivering decent accuracy for cars, people, and animals.
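The detector and model stanza for this combination looks roughly like the following. The paths mirror where Frigate's documentation says the bundled Open Model Zoo files live inside the container; treat it as a sketch rather than a drop-in config:

```yaml
detectors:
  ov:
    type: openvino
    device: CPU            # no GPU offload yet; the i5-7500 handles inference

model:
  width: 300               # SSDLite MobileNetV2 expects 300x300 input
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  path: /openvino-model/ssdlite_mobilenet_v2.xml
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt
```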
A few practical notes from the deployment:
* **Model download**: Frigate pulls the model from Hugging Face at first start, so the host needs occasional internet access for updates.
* **CPU utilisation**: the i5 hovers around 75% load under continuous detection, which is acceptable for now but leaves little headroom for additional workloads.
* **Thermal considerations**: the workstation runs warm; I've added a low-profile fan to keep temperatures in check.
The detector provides bounding boxes and confidence scores for each frame that triggers an event. Those detections are the raw material that the generative AI layer later enriches.
### Adding Generative AI with Ollama
Frigate's newest integration allows snapshots of detections to be sent to an external **GenAI** service. I run **Ollama** locally, which hosts a variety of large language models, including vision-language variants such as **qwen3-vl-4b** and **qwen3-vl-2b**. The workflow is straightforward:
1. An object (e.g., a car) is detected by Frigate.
2. Frigate captures a still image of the frame and forwards it to Ollama via the GenAI webhook.
3. The vision-language model analyses the image and returns a concise textual description (“Red SUV parked in the driveway”).
4. The description is stored as metadata alongside the detection event.
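Wired into Frigate's YAML, the steps above come down to a small GenAI stanza. The fields follow Frigate's documented Ollama provider options; the address is a placeholder for my local Ollama instance, and the model tag assumes the model has already been pulled into Ollama under that name:

```yaml
genai:
  enabled: true
  provider: ollama
  base_url: http://192.168.1.10:11434   # placeholder address of the local Ollama server
  model: qwen3-vl-4b                    # vision-language model tag as pulled into Ollama
```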
This extra step turns a generic “car” tag into a searchable phrase. In the Frigate UI I can now type “white ute” or “red scooter” and instantly retrieve the matching clips. The semantic search capability is the most exciting part of the project because it opens the door to natural-language queries over a video archive, a feature that commercial cloud services charge a premium for.
### Home Assistant: The Glue That Makes It All Useful
Frigate already gives us detection and enriched metadata, but to turn that into actionable alerts we need an automation engine. **Home Assistant** fills that role perfectly:
* **Event ingestion**: Frigate pushes detection events to Home Assistant via its native integration.
* **AI-driven notifications**: Home Assistant receives the snapshot, forwards it to Ollama for a second pass (using a smaller model for speed), and then formats a friendly message.
* **Matrix delivery**: the final message, complete with the image and AI-generated caption, is sent to my Matrix client on my phone.
The result is a notification that reads something like: “A white ute with a surfboard in the back has just entered the driveway.” Compared with a raw “motion detected” alert, this is a massive usability upgrade. Moreover, Home Assistant can trigger other automations based on the enriched data, such as turning on porch lights when a car arrives after sunset, or silencing the alarm when the mail carrier is recognised.
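A trimmed-down Home Assistant automation in this spirit might look like the following. The MQTT topic matches Frigate's default event topic, but the notifier name and templating are illustrative; the real pipeline also round-trips the snapshot through Ollama before the message goes out:

```yaml
automation:
  - alias: "Frigate car alert via Matrix"
    trigger:
      - platform: mqtt
        topic: frigate/events          # Frigate publishes detection events here by default
    condition:
      # only fire when the event concerns a car
      - condition: template
        value_template: "{{ trigger.payload_json['after']['label'] == 'car' }}"
    action:
      - service: notify.matrix_phone   # hypothetical Matrix notifier name
        data:
          message: "Vehicle detected in the driveway"
```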
### Storage Strategy
Video storage is often the Achilles heel of a DIY CCTV system. My approach balances capacity, retention, and cost:
| Storage tier | Device | Purpose | Retention |
|--------------|--------|---------|-----------|
| OS & containers | 256 GB SSD | Frigate, Home Assistant, Ollama | Indefinite |
| Video archive | 2 TB HDD | Full-resolution recordings, detection clips, alert snapshots | 7 days (full), 14 days (detections), 30 days (alerts) |
Frigate automatically prunes old files based on the policies defined in the YAML configuration, so the drive never fills up unexpectedly. With four 1080p streams at 5 fps, the 2 TB budget comfortably covers the stated retention windows.
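The pruning policy from the table translates into a record stanza along these lines. The field names follow recent Frigate releases, but the exact schema has shifted between versions, so consider this a sketch:

```yaml
record:
  enabled: true
  retain:
    days: 7          # full-resolution footage
    mode: all
  detections:
    retain:
      days: 14       # clips where an object was detected
  alerts:
    retain:
      days: 30       # clips promoted to alerts
```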
### Deploying on Proxmox
All services run inside a dedicated LXC container on a Proxmox node that I carved out specifically for this project. The container hosts Docker, the Frigate image, Home Assistant, and Ollama. Using LXC gives me a lightweight isolation layer without the overhead of a full VM, and Proxmox's snapshot feature lets me roll back the entire stack if a configuration change goes sideways.
The network layout is simple: each camera connects via wired Ethernet to the same LAN as the Proxmox host, ensuring low latency and no packet loss that could otherwise cause frame drops. The container's Docker bridge is bound to the host's network stack, so Frigate sees the cameras as if it were running directly on the workstation.
### Challenges and Whats Next
The system works, but there are a few rough edges that I'm actively polishing:
* **Prompt engineering**: the text prompts sent to the vision-language models are still generic, leading to occasional misclassifications (e.g., a cat being described as a “small vehicle”). Fine-tuning the prompts and possibly adding a few few-shot examples should improve consistency.
* **Notification fatigue**: right now every person or car that passes the driveway generates a push notification. I plan to add a lightweight decision model that evaluates the confidence score, time of day, and historical patterns before deciding whether to alert.
* **Hardware scaling**: the i5-7500 is holding its own, but as I add more cameras or switch to a larger model (e.g., a 4-billion-parameter vision transformer), I'll need a more powerful CPU or a dedicated GPU/Intel Arc accelerator.
* **Public configuration**: the deployment repository still contains secrets. Once scrubbed, I intend to publish the full YAML and Docker Compose files so others can replicate the stack with a single `git clone`.
### A Personal Note: The Unsung Hero
No amount of software wizardry would have gotten the Ethernet cables into the roof without a bit of elbow grease. My dad, armed with a ladder, a drill, and a healthy dose of patience, ran the cabling that links the cameras to the network. He spent an entire Saturday climbing into the attic, pulling cables through joists, and making sure each connection was solid. I was mostly handing him tools and keeping the coffee warm. Without his help the whole project would have remained a pipe dream. So, Dad, thank you for the hard yards, the guidance, and the occasional dad joke that kept the mood light while we were up on the roof. This system is as much yours as it is mine.
### Closing Thoughts
Building an AI-enhanced CCTV system at home is no longer a pipe dream reserved for large enterprises. With a modest budget, a handful of inexpensive cameras, and a stack of open-source tools (Frigate, OpenVINO, Ollama, Home Assistant, and a bit of Proxmox magic), you can achieve a private, searchable, and context-rich surveillance solution. The journey taught me a lot about video streaming, container orchestration, and the quirks of vision-language models, but the biggest takeaway is the empowerment that comes from owning your data.
If you're curious about the nitty-gritty (YAML snippets, Docker Compose files, or the exact model versions), I'm happy to share once the repository is fully sanitized. Until then, enjoy the peace of mind that comes from knowing your backyard is being watched by a system that not only sees but also understands.
```mermaid
graph TD
    Camera1[Camera] --> Frigate[Frigate NVR]
    Camera2[Camera] --> Frigate
    Camera3[Camera] --> Frigate
    Camera4[Future Camera] --> Frigate
    Frigate -->|Object detections| SemanticAI["Send snapshot to Ollama (qwen3-vl-4b) for semantic search AI enhancement"]
    Frigate --> HomeAssistant[Home Assistant]
    HomeAssistant -->|Object detection from Frigate| CopyImage[Copy image to Home Assistant]
    CopyImage --> ContextAI["Send image to Ollama (qwen3-vl-2b) for context enhancement"]
    ContextAI --> Matrix[Send response back via Matrix]
```