# Blog Creator

An automated blog generation system that uses CrewAI agents to research, write, and edit blog posts from Trilium notes.

## Architecture

The system uses three CrewAI crews orchestrated by a Flow:

1. **Research Crew** - A critical researcher agent with web search capabilities investigates the topic and produces verified findings
2. **Writing Crew** - Creative journalist agents, one per model in `CONTENT_CREATOR_MODELS`, write draft blog articles in parallel, each with a different creative style
3. **Editor Crew** - A critical editor loads the drafts into a vector database, queries it for relevant context, and produces the final polished document with metadata

## Requirements

- Python 3.10 or later
- Ollama server running with the required models
- ChromaDB server for vector storage
- Trilium notes instance
- Gitea instance (for automated workflows)
- n8n instance (for notifications)

## Environment Variables

Create a `.env` file in the project root with the following variables:

```
# Trilium Configuration
TRILIUM_HOST=
TRILIUM_PORT=
TRILIUM_PROTOCOL=https
TRILIUM_PASS=
TRILIUM_TOKEN=

# Ollama Configuration
OLLAMA_PROTOCOL=http
OLLAMA_HOST=
OLLAMA_PORT=11434
EMBEDDING_MODEL=nomic-embed-text
EDITOR_MODEL=llama3.1:8b
CONTENT_CREATOR_MODELS=["phi4-mini:latest", "qwen3:1.7b", "gemma3:latest"]

# ChromaDB Configuration
CHROMA_HOST=chroma
CHROMA_PORT=8000

# Git Configuration
GIT_USER=
GIT_PASS=
GIT_PROTOCOL=https
GIT_REMOTE=git.aridgwayweb.com/armistace/blog.git

# Notification Configuration
N8N_SECRET=
N8N_WEBHOOK_URL=

# Ollama Web Search (required for researcher agent)
OLLAMA_API_KEY=
```

### CONTENT_CREATOR_MODELS Format

The `CONTENT_CREATOR_MODELS` variable should be a JSON array of Ollama model names. Each model is assigned to one journalist agent. Example:

```
CONTENT_CREATOR_MODELS=["llama3.1:8b", "qwen2.5:7b", "phi4:latest"]
```

### OLLAMA_API_KEY

The researcher agent uses Ollama's native web search API. Create an API key from your Ollama account (https://ollama.com) and add it to your `.env` file. Web searches count against your existing Ollama subscription.

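Because the value must be valid JSON, single quotes or unquoted names will fail to parse. A minimal sketch of how such a value can be read and parsed (the project's actual parsing code may differ):

```python
import json
import os

# Illustrative only: read CONTENT_CREATOR_MODELS from the environment and
# parse it as JSON. Note the double quotes around each model name -- single
# quotes are not valid JSON and would raise a JSONDecodeError.
os.environ["CONTENT_CREATOR_MODELS"] = '["llama3.1:8b", "qwen2.5:7b", "phi4:latest"]'

content_creator_models = json.loads(os.environ["CONTENT_CREATOR_MODELS"])
print(content_creator_models)  # ['llama3.1:8b', 'qwen2.5:7b', 'phi4:latest']
```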
## Project Structure

```
blog_creator/
├── .env                          # Environment variables (create this)
├── .gitea/workflows/deploy.yml   # Gitea Actions workflow
├── docker-compose.yml            # Local development setup
├── requirements.txt              # Python dependencies
├── README.md                     # This file
└── src/
    ├── main.py                   # Entry point
    └── ai_generators/
        ├── ollama_md_generator.py   # Main interface (used by main.py)
        ├── blog_flow.py             # CrewAI Flow orchestrator
        ├── crews/
        │   ├── research_crew/       # Researcher agent with web search
        │   ├── writing_crew/        # Journalist agents
        │   └── editor_crew/         # Editor agent with metadata generation
        └── tools/
```

## Local Development Setup

### Using Docker Compose

1. Clone the repository and navigate to the project directory

2. Create your `.env` file with all required variables

3. Start the services:

```bash
docker-compose up -d
```

This starts:

- `blog_creator` - The main application container
- `chroma` - ChromaDB vector database

4. The container runs `main.py` automatically on startup. To run it manually:

```bash
docker-compose exec blog_creator python src/main.py
```

### Manual Setup (without Docker)

1. Install system dependencies:

```bash
apt update && apt install -y rustc cargo python-is-python3 pip python3-venv libmagic-dev git
```

2. Create and activate a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
```

3. Install Python dependencies:

```bash
pip install -r requirements.txt
```

4. Configure Git:

```bash
git config --global user.name "Blog Creator"
git config --global user.email "your-email@example.com"
git config --global push.autoSetupRemote true
```

5. Run the application:

```bash
python src/main.py
```

## How It Works

### Trilium Integration

The system fetches notes from Trilium that are tagged for blog creation. Each note becomes one blog post; the note content is the basis for the AI-generated article.

### Blog Generation Flow

1. **Research Phase** - The researcher agent investigates the topic using web search, critically evaluates claims, and produces verified findings

2. **Writing Phase** - The journalist agents write creative drafts in parallel, each with different temperature and top_p settings for variety

3. **Editor Phase** - The editor:
   - Chunks and embeds all drafts into ChromaDB
   - Queries the vector database for relevant context
   - Generates the final polished document with a metadata header

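The three phases above can be sketched as plain Python. This is a simplified illustration of the orchestration pattern, not the project's actual CrewAI Flow code; the stub callables stand in for the real crews:

```python
from concurrent.futures import ThreadPoolExecutor

def run_blog_flow(topic, research, writers, edit):
    """Research once, write drafts in parallel, then edit into one document."""
    findings = research(topic)                       # Research Phase
    with ThreadPoolExecutor() as pool:               # Writing Phase (parallel)
        drafts = list(pool.map(lambda w: w(findings), writers))
    return edit(drafts)                              # Editor Phase

# Stub agents standing in for the real crews:
research = lambda topic: f"verified findings on {topic}"
writers = [lambda f, i=i: f"draft {i}: {f}" for i in range(3)]
edit = lambda drafts: "\n".join(drafts)

print(run_blog_flow("self-hosted CCTV", research, writers, edit))
```

Because the writers only depend on the research output, they can fan out in parallel and the editor joins their results, which is what makes adding another journalist model cheap.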
### Output Format

Each blog post includes a metadata header followed by the markdown body:

```
Title: Designing and Building an AI Enhanced CCTV System
Date: 2026-02-02 20:00
Modified: 2026-02-02 20:00
Category: Homelab
Tags: proxmox, hardware, self host, homelab, ai_content, not_human_content
Slug: ai-enhanced-cctv
Authors: phi4-mini.ai, qwen3.ai, gemma3.ai
Summary: Home CCTV Security has become a bastion of cloud subscription awfulness. This blog describes creating your own AI enhanced system.

<full markdown blog body follows>
```

The metadata fields are generated as follows:

- **Title** - From the Trilium note title
- **Date/Modified** - Current datetime when generated
- **Category** - AI-generated single word (e.g., Homelab, DevOps, Security)
- **Tags** - AI-generated relevant tags plus `ai_content, not_human_content`
- **Slug** - AI-generated URL-friendly slug
- **Authors** - Derived from `CONTENT_CREATOR_MODELS` (model name + `.ai`)
- **Summary** - AI-generated 15-25 word summary

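The `Authors` derivation can be sketched from the example above: the `:tag` suffix of each Ollama model name is dropped and `.ai` appended. The helper below is an assumption for illustration, not the project's actual function:

```python
# Hypothetical helper: turn Ollama model names into author names by dropping
# the ":tag" suffix and appending ".ai", matching the example header above.
def authors_from_models(models):
    return ", ".join(m.split(":")[0] + ".ai" for m in models)

print(authors_from_models(["phi4-mini:latest", "qwen3:1.7b", "gemma3:latest"]))
# phi4-mini.ai, qwen3.ai, gemma3.ai
```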
### Git Workflow

After generation, the blog post is:

1. Committed to a new branch named after the slug
2. Pushed to the configured Git remote
3. Announced via an n8n notification to Matrix for review

## Gitea Actions Workflow

The `.gitea/workflows/deploy.yml` file defines an automated workflow that:

- Runs on a schedule (daily at 18:15 UTC) or on push to the master branch
- Installs all dependencies
- Creates the `.env` file from Gitea secrets and variables
- Runs the blog generation script

### Setting Up Gitea Variables

In your Gitea repository settings, configure the following:

**Variables** (Repository Settings -> Variables):

- `TRILIUM_HOST` - Your Trilium server hostname
- `TRILIUM_PORT` - Trilium port
- `TRILIUM_PROTOCOL` - http or https
- `OLLAMA_PROTOCOL` - http or https
- `OLLAMA_HOST` - Ollama server hostname
- `OLLAMA_PORT` - Ollama port (default 11434)
- `EMBEDDING_MODEL` - Embedding model name
- `EDITOR_MODEL` - Editor/researcher model name
- `CONTENT_CREATOR_MODELS_1` through `CONTENT_CREATOR_MODELS_4` - Individual model names (the workflow joins these into an array)
- `GIT_PROTOCOL` - https or ssh
- `GIT_REMOTE` - Git repository URL
- `GIT_USER` - Git username for pushing
- `N8N_WEBHOOK_URL` - n8n webhook URL for notifications
- `CHROMA_HOST` - ChromaDB hostname
- `CHROMA_PORT` - ChromaDB port

**Secrets** (Repository Settings -> Secrets):

- `TRILIUM_PASS` - Trilium password
- `TRILIUM_TOKEN` - Trilium API token
- `GIT_PASS` - Git password or personal access token
- `N8N_SECRET` - n8n webhook secret key
- `OLLAMA_API_KEY` - Ollama API key for web search

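A minimal sketch of joining the numbered `CONTENT_CREATOR_MODELS_*` variables into the JSON array the application expects. The actual join happens inside `.gitea/workflows/deploy.yml`; the dictionary values here are examples:

```python
import json

# Illustrative only: collect the numbered workflow variables and emit the
# single CONTENT_CREATOR_MODELS line written into the generated .env file.
parts = {
    "CONTENT_CREATOR_MODELS_1": "phi4-mini:latest",
    "CONTENT_CREATOR_MODELS_2": "qwen3:1.7b",
    "CONTENT_CREATOR_MODELS_3": "gemma3:latest",
    "CONTENT_CREATOR_MODELS_4": "",  # unused slots may be left empty
}

models = [v for v in parts.values() if v]  # drop empty slots
print(f"CONTENT_CREATOR_MODELS={json.dumps(models)}")
```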
### Workflow Triggers

The workflow runs automatically when:

- A push is made to the master branch
- The scheduled cron time is reached (18:15 UTC daily)

To trigger a run manually, push any change to master or adjust the cron schedule in `.gitea/workflows/deploy.yml`.

## Customizing Agent Behavior

Agent personalities and task instructions are defined in YAML files under `src/ai_generators/crews/*/config/`. You can modify these without changing Python code:

- `research_crew/config/agents.yaml` - Researcher role, goal, backstory
- `research_crew/config/tasks.yaml` - Research task description
- `writing_crew/config/agents.yaml` - Journalist agent personalities
- `writing_crew/config/tasks.yaml` - Writing task descriptions
- `editor_crew/config/agents.yaml` - Editor role, goal, backstory
- `editor_crew/config/tasks.yaml` - Editing task and metadata format

After editing the YAML files, restart the application or container to apply the changes.

## Troubleshooting

### Ollama Connection Errors

Ensure the Ollama server is running and accessible from the `blog_creator` container. Check `OLLAMA_HOST` and `OLLAMA_PORT` in your `.env` file.

### ChromaDB Connection Errors

Verify that ChromaDB is running and that `CHROMA_HOST` and `CHROMA_PORT` are correct. In Docker Compose, use `chroma` as the host name.

### Ollama Web Search Errors

If the researcher agent fails with web search errors, check that `OLLAMA_API_KEY` is set correctly, and verify that your Ollama subscription is active and includes web search access.

### Empty Output

If blog posts are generated but empty, check that:

- The Ollama models are downloaded and available
- `CONTENT_CREATOR_MODELS` contains valid model names
- The model inference timeout is sufficient (default is 30 minutes per operation)

### Git Push Failures

Verify that `GIT_USER` and `GIT_PASS` are correct and that the user has write access to the remote repository. Check that the URL in `GIT_REMOTE` is accessible.

## Development Notes

- The `main.py` entry point should not be modified for normal operation
- All AI generation logic is in `src/ai_generators/`
- The Flow pattern allows easy addition of new crews or steps
- Vector database collections are named `blog_{title}_{random_id}` and persist across runs
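
The collection-naming convention can be sketched as follows. The slugification details (lowercasing, underscore substitution, 8-hex-character id) are assumptions for illustration; only the `blog_{title}_{random_id}` shape comes from the project:

```python
import re
import uuid

# Hypothetical sketch of the blog_{title}_{random_id} naming convention.
# Lowercasing, underscore substitution, and the 8-char id are assumptions.
def collection_name(title):
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"blog_{slug}_{uuid.uuid4().hex[:8]}"

print(collection_name("AI Enhanced CCTV"))  # e.g. blog_ai_enhanced_cctv_1a2b3c4d
```

The random suffix keeps repeated runs on the same note from colliding, which is why collections accumulate and persist across runs.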