2026-05-21 21:06:42 +10:00


# PR Reviewer
Automated pull request review system using [CrewAI](https://crewai.com) Flows and MCP (Model Context Protocol) tools.
Runs three reviews in parallel (code quality, security, and infrastructure), then synthesizes them into a single consolidated report. Reviews can be requested directly through the REST API or triggered by a Gitea webhook integration that fetches diffs automatically and posts the review as a PR comment.
## Features
- **Code Review** — style, best practices, maintainability (powered by Semgrep)
- **Security Review** — vulnerabilities, injection risks, auth issues (powered by Trivy)
- **Infrastructure Review** — Dockerfiles, Kubernetes manifests, IaC (powered by Hadolint + Checkov)
- **Summarisation** — merges all three reviews into a single actionable report
- **REST API** — FastAPI endpoints for health check, manual review trigger, and Gitea webhook
- **Gitea Webhook** — process PR events directly; fetches diffs, runs reviews, posts results as a PR comment
- **Dockerized** — multi-stage build with all tools bundled
## Quick Start
### Prerequisites
- Docker
- An LLM provider (an OpenAI API key, an Anthropic API key, or a running Ollama instance)
### Setup
```bash
cp .env.example .env
# Edit .env with your LLM provider details
```
### Run
```bash
docker compose up
```
Server starts at `http://localhost:8000`.
### Test
```bash
# Health check
curl http://localhost:8000/api/v1/health
# Trigger a review
curl -X POST http://localhost:8000/api/v1/review \
  -H "Content-Type: application/json" \
  -d '{
    "pr_id": "123",
    "title": "Add user authentication",
    "repo": {"name": "myapp/backend", "url": "https://github.com/myapp/backend"},
    "source": {"branch": "feature/auth"},
    "target": {"branch": "main"},
    "files": [
      {
        "path": "auth.py",
        "status": "added",
        "content": "def login(user, pwd):\n    if user == \"admin\" and pwd == \"admin\":\n        return True",
        "additions": 3,
        "deletions": 0
      }
    ]
  }'
```
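The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names `build_review_request` and `trigger_review` are hypothetical, and the generous default timeout is an assumption based on the sample `processing_time_seconds` of about 290 s and the 600 s `TOTAL_FLOW_TIMEOUT` default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address when started with `docker compose up`


def build_review_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/review."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/review",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_review(payload: dict, timeout: float = 660.0) -> dict:
    """Send the review request and return the decoded JSON response.

    A full review can take minutes, so the timeout sits above the 600 s
    TOTAL_FLOW_TIMEOUT default (an assumption, not a documented value).
    """
    with urllib.request.urlopen(build_review_request(payload), timeout=timeout) as resp:
        return json.load(resp)


# Same body as the curl example above.
payload = {
    "pr_id": "123",
    "title": "Add user authentication",
    "repo": {"name": "myapp/backend", "url": "https://github.com/myapp/backend"},
    "source": {"branch": "feature/auth"},
    "target": {"branch": "main"},
    "files": [
        {
            "path": "auth.py",
            "status": "added",
            "content": 'def login(user, pwd):\n    if user == "admin" and pwd == "admin":\n        return True',
            "additions": 3,
            "deletions": 0,
        }
    ],
}
```

Once the server is running, `trigger_review(payload)` returns the response documented in the API section, with the consolidated report under `results["summary"]`.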
## Architecture
```
 POST /api/v1/review         POST /api/v1/gitea-webhook
          │                               │
          │                     Gitea webhook payload
          │                               │
          │                       fetch diffs from
          │                           Gitea API
          │                               │
          └───────────────┬───────────────┘
                          ▼
            CodeReviewFlow (CrewAI Flow)
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
        Code          Security          Infra
       Review          Review          Review
          │               │               │
          └───────────────┼───────────────┘
                          ▼
                     Summariser
                          │
                          ▼
             JSON Response / PR Comment
```
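The fan-out/fan-in shape in the diagram can be illustrated with plain `asyncio`. This is not CrewAI's Flow API (the real orchestration lives in `flow.py`); the crew and summariser coroutines are hypothetical stand-ins, and the two timeouts simply mirror the `PER_CREW_TIMEOUT` and `TOTAL_FLOW_TIMEOUT` defaults from the Configuration table.

```python
import asyncio

PER_CREW_TIMEOUT = 300.0    # seconds, mirrors the PER_CREW_TIMEOUT default
TOTAL_FLOW_TIMEOUT = 600.0  # seconds, mirrors the TOTAL_FLOW_TIMEOUT default


async def run_crew(name: str, pr_text: str) -> str:
    """Hypothetical stand-in for one review crew (code, security or infra)."""
    await asyncio.sleep(0)  # a real crew would call the LLM and MCP tools here
    return f"{name} review of: {pr_text}"


async def guarded(name: str, pr_text: str, timeout: float) -> str:
    """Run one crew under its own timeout so a slow crew cannot block the others."""
    try:
        return await asyncio.wait_for(run_crew(name, pr_text), timeout)
    except asyncio.TimeoutError:
        return f"{name} review timed out after {timeout:.0f}s"


async def summarise(reviews: dict) -> str:
    """Hypothetical summariser: merges the three reviews into one report."""
    return "\n".join(f"- {name}: {text}" for name, text in reviews.items())


async def review_flow(pr_text: str, per_crew: float = PER_CREW_TIMEOUT) -> dict:
    names = ["code", "security", "infra"]
    # Fan out: the three reviews run concurrently.
    outputs = await asyncio.gather(*(guarded(n, pr_text, per_crew) for n in names))
    reviews = dict(zip(names, outputs))
    # Fan in: the summariser starts only after all three reviews finish.
    return {**reviews, "summary": await summarise(reviews)}


def run(pr_text: str) -> dict:
    # The whole flow is additionally bounded by the total timeout.
    return asyncio.run(asyncio.wait_for(review_flow(pr_text), TOTAL_FLOW_TIMEOUT))
```

Giving each crew its own timeout means one stalled scanner degrades a single section of the report instead of failing the whole review.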
LLM-agnostic via CrewAI's LLM abstraction — works with OpenAI, Anthropic, or Ollama.
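
One way the `LLM_*` variables could map onto CrewAI's `LLM` class is sketched below. It assumes CrewAI's LiteLLM-style `provider/model` naming and its `model`, `base_url` and `api_key` constructor arguments; `llm_kwargs_from_env` is a hypothetical helper, not the contents of `llm.py`.

```python
import os

SUPPORTED_PROVIDERS = {"openai", "anthropic", "ollama"}


def llm_kwargs_from_env(env=None) -> dict:
    """Translate LLM_* environment variables into keyword arguments for crewai.LLM."""
    env = os.environ if env is None else env
    provider = env["LLM_PROVIDER"].lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    model = env["LLM_MODEL"]
    # LiteLLM-style model strings carry the provider as a prefix.
    if not model.startswith(f"{provider}/"):
        model = f"{provider}/{model}"
    kwargs = {"model": model}
    if env.get("LLM_BASE_URL"):
        kwargs["base_url"] = env["LLM_BASE_URL"]
    # Ollama runs locally and needs no API key.
    if provider != "ollama" and env.get("LLM_API_KEY"):
        kwargs["api_key"] = env["LLM_API_KEY"]
    return kwargs
```

With CrewAI installed, the result would be used as `LLM(**llm_kwargs_from_env())` after `from crewai import LLM`.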
## API
### `GET /api/v1/health`
Returns service status.
```json
{"status": "healthy", "service": "pr-reviewer"}
```
### `POST /api/v1/review`
Triggers a full PR review. Provide file contents and diffs directly in the request body.
**Request body:**

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `pr_id` | string | yes | PR identifier |
| `title` | string | yes | PR title |
| `description` | string | no | PR description |
| `repo.name` | string | yes | Repository name |
| `repo.url` | string | yes | Repository URL |
| `source.branch` | string | yes | Source branch |
| `source.commit` | string | no | Source commit SHA |
| `target.branch` | string | yes | Target branch |
| `target.commit` | string | no | Target commit SHA |
| `files[]` | array | no | Changed files |
| `files[].path` | string | yes | File path |
| `files[].content` | string | no | File contents |
| `files[].status` | string | yes | `added`, `modified`, `removed` |
| `files[].additions` | int | no | Lines added |
| `files[].deletions` | int | no | Lines removed |
| `files[].patch` | string | no | Unified diff |
| `context.code_review` | string | no | Code review guidelines override |
| `context.security_review` | string | no | Security review guidelines override |
| `context.infra_review` | string | no | Infrastructure review guidelines override |

**Response:**
```json
{
  "review_id": "uuid",
  "status": "completed",
  "timestamp": "2024-01-01T00:00:00Z",
  "results": {
    "code_review": "...",
    "security_review": "...",
    "infra_review": "...",
    "summary": "..."
  },
  "metadata": {
    "processing_time_seconds": 290.22,
    "pr_id": "123",
    "repo": {"name": "myapp/backend", "url": "https://github.com/myapp/backend"}
  }
}
```
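For client code or tests, the request schema above can be mirrored with standard-library dataclasses. This is a sketch under assumptions: the service's own models are Pydantic classes in `state.py`, and the names `ChangedFile`, `ReviewRequest` and `parse_review_request` are hypothetical. It enforces the required/optional split and the allowed `status` values from the table.

```python
from dataclasses import dataclass, field
from typing import Optional

VALID_STATUSES = {"added", "modified", "removed"}


@dataclass
class ChangedFile:
    path: str
    status: str
    content: Optional[str] = None
    additions: Optional[int] = None
    deletions: Optional[int] = None
    patch: Optional[str] = None  # unified diff

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"invalid status {self.status!r} for {self.path}")


@dataclass
class ReviewRequest:
    pr_id: str
    title: str
    repo_name: str
    repo_url: str
    source_branch: str
    target_branch: str
    description: Optional[str] = None
    source_commit: Optional[str] = None
    target_commit: Optional[str] = None
    files: list = field(default_factory=list)
    context: dict = field(default_factory=dict)  # e.g. {"security_review": "..."}


def parse_review_request(body: dict) -> ReviewRequest:
    """Flatten the nested JSON body; a missing required key raises KeyError."""
    return ReviewRequest(
        pr_id=body["pr_id"],
        title=body["title"],
        repo_name=body["repo"]["name"],
        repo_url=body["repo"]["url"],
        source_branch=body["source"]["branch"],
        target_branch=body["target"]["branch"],
        description=body.get("description"),
        source_commit=body["source"].get("commit"),
        target_commit=body["target"].get("commit"),
        files=[ChangedFile(**f) for f in body.get("files", [])],
        context=body.get("context", {}),
    )
```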
### `POST /api/v1/gitea-webhook`
Receives Gitea webhook events. Only processes `pull_request` events with actions `opened`, `synchronize`, or `reopened`. All other events and actions are ignored.
The endpoint:
1. Validates the `X-Gitea-Signature` header using HMAC-SHA256 (if `ACCESS_GITEA_SECRET` is configured)
2. Fetches changed files and their contents from the Gitea API
3. Runs the full review pipeline (code, security, infrastructure, summariser)
4. Posts the review summary as a comment on the PR via the Gitea API
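
Step 1 and the event filter can be sketched with the standard library. Assumptions: Gitea sends the event type in the `X-Gitea-Event` header and the signature as a hex-encoded HMAC-SHA256 of the raw request body; the helper names are hypothetical. The sketch also accepts `synchronized`, the spelling Gitea uses for that action in its payloads, alongside `synchronize` as listed above.

```python
import hashlib
import hmac
import json

# Actions that trigger a review; "synchronized" is included as an assumption
# because Gitea's pull request payloads use that spelling.
REVIEW_ACTIONS = {"opened", "synchronize", "synchronized", "reopened"}


def signature_is_valid(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check X-Gitea-Signature against the raw (unparsed) request body."""
    if not secret:
        return True  # no secret configured: validation is skipped
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so timing does not leak the signature.
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def should_review(event_header: str, raw_body: bytes) -> bool:
    """Return True only for pull_request events with a review-worthy action."""
    if event_header != "pull_request":
        return False
    return json.loads(raw_body).get("action") in REVIEW_ACTIONS
```

The signature must be computed over the exact bytes received, before any JSON parsing, since re-serializing the payload can change whitespace or key order.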
## Gitea Webhook Setup
### 1. Create an access token
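In your Gitea instance, go to **Settings → Applications → Generate New Token** and create a token with `read:repository` scope. Because the service also posts the review as a PR comment, the token needs `write:issue` scope as well (Gitea handles PR comments through its issue comment API).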
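
The token is sent in Gitea's `Authorization: token <TOKEN>` header. The sketch below builds the two kinds of calls the webhook pipeline needs, listing a PR's changed files and posting a comment, using Gitea's `GET /repos/{owner}/{repo}/pulls/{index}/files` and `POST /repos/{owner}/{repo}/issues/{index}/comments` endpoints. Which endpoints `main.py` actually calls is an assumption, and the helper names are hypothetical.

```python
import json
import urllib.request


def gitea_request(base_url: str, token: str, path: str, payload=None) -> urllib.request.Request:
    """Build an authenticated Gitea API request (GET, or POST when a payload is given)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Authorization": f"token {token}", "Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    # urllib picks POST automatically when data is set, GET otherwise.
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/v1{path}", data=data, headers=headers)


def changed_files_request(base_url, token, owner, repo, index):
    # Lists the files changed by pull request #index.
    return gitea_request(base_url, token, f"/repos/{owner}/{repo}/pulls/{index}/files")


def comment_request(base_url, token, owner, repo, index, body):
    # Pull requests share the issue comment API in Gitea.
    return gitea_request(
        base_url, token, f"/repos/{owner}/{repo}/issues/{index}/comments", {"body": body}
    )
```

Passing either request to `urllib.request.urlopen` sends it; the files endpoint returns a JSON list of changed files.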
### 2. Add the webhook
In your Gitea repository, go to **Settings → Webhooks → Add Webhook → Gitea**:
- **Target URL**: `http://<host>:30001/api/v1/gitea-webhook` (30001 is the Kubernetes NodePort; use port 8000 when running with Docker Compose)
- **HTTP Method**: `POST`
- **Secret**: a random string (optional but recommended)
- **Trigger On**: Pull Request
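
To exercise the endpoint without a real Gitea event, a signed request can be assembled by hand. A minimal sketch, assuming the endpoint reads the `X-Gitea-Event` and `X-Gitea-Signature` headers that Gitea sends; the helper name and the heavily trimmed payload are hypothetical, and a real Gitea event carries many more fields that the service may need (for example to fetch diffs).

```python
import hashlib
import hmac
import json
import urllib.request


def signed_webhook_request(url: str, secret: str, payload: dict, event: str = "pull_request"):
    """Build a webhook POST signed with a hex HMAC-SHA256 of the exact body bytes."""
    body = json.dumps(payload).encode("utf-8")
    signature = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Gitea-Event": event,
            "X-Gitea-Signature": signature,
        },
        method="POST",
    )


# Hypothetical, trimmed payload: real Gitea events include many more fields.
sample_payload = {"action": "opened", "number": 1, "repository": {"full_name": "myapp/backend"}}
req = signed_webhook_request(
    "http://localhost:30001/api/v1/gitea-webhook", "my-webhook-secret", sample_payload
)
```

Send it with `urllib.request.urlopen(req)` while the service runs with a matching `ACCESS_GITEA_SECRET`.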
### 3. Configure environment variables
Set the following in the container environment (or in the Kubernetes secret):

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `ACCESS_GITEA_URL` | yes | `http://192.168.178.160:3000` | Gitea instance base URL |
| `ACCESS_GITEA_TOKEN` | yes | — | Gitea personal access token with `read:repository` scope (plus `write:issue` to post the review comment) |
| `ACCESS_GITEA_SECRET` | no | `""` | Webhook secret; if set, signatures are validated |
## Configuration
All configuration is done via environment variables in `.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_MODEL` | (required) | Model name (e.g. `gpt-4`, `gemma4:31b-cloud`) |
| `LLM_PROVIDER` | (required) | `openai`, `anthropic`, or `ollama` |
| `LLM_BASE_URL` | — | API base URL |
| `LLM_API_KEY` | — | API key (not needed for Ollama) |
| `ACCESS_GITEA_URL` | `http://192.168.178.160:3000` | Gitea instance base URL |
| `ACCESS_GITEA_TOKEN` | — | Gitea personal access token with `read:repository` scope (plus `write:issue` to post the review comment) |
| `ACCESS_GITEA_SECRET` | — | Webhook secret for HMAC-SHA256 signature verification |
| `TOTAL_FLOW_TIMEOUT` | `600` | Max seconds for full review |
| `PER_CREW_TIMEOUT` | `300` | Max seconds per crew |
| `LOG_LEVEL` | `INFO` | Logging level |
## Deployment
### Kubernetes
The repo includes a CI pipeline (`.gitea/workflows/build_push.yml`) that builds a multi-arch Docker image, pushes it to the registry, and deploys to Kubernetes.
The k8s deployment uses a NodePort service exposing port 30001, which maps to the container's port 8000.
Environment variables are stored in a k8s secret (`pr-reviewer-env`), which the CI pipeline creates automatically. Add `ACCESS_GITEA_URL`, `ACCESS_GITEA_TOKEN`, and `ACCESS_GITEA_SECRET` as Gitea repository variables/secrets so the pipeline can include them.
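
Whether the values come from `.env` (Docker Compose) or the `pr-reviewer-env` secret (Kubernetes), the application sees them as ordinary environment variables. A sketch of reading them with the defaults from the Configuration table; the `Settings` class and `load_settings` are hypothetical and not necessarily how the service loads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_model: str
    llm_provider: str
    llm_base_url: Optional[str]
    llm_api_key: Optional[str]
    gitea_url: str
    gitea_token: Optional[str]
    gitea_secret: str
    total_flow_timeout: int
    per_crew_timeout: int
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    settings = Settings(
        llm_model=env["LLM_MODEL"],        # required
        llm_provider=env["LLM_PROVIDER"],  # required
        llm_base_url=env.get("LLM_BASE_URL"),
        llm_api_key=env.get("LLM_API_KEY"),
        gitea_url=env.get("ACCESS_GITEA_URL", "http://192.168.178.160:3000"),
        gitea_token=env.get("ACCESS_GITEA_TOKEN"),
        gitea_secret=env.get("ACCESS_GITEA_SECRET", ""),  # empty: signatures not checked
        total_flow_timeout=int(env.get("TOTAL_FLOW_TIMEOUT", "600")),
        per_crew_timeout=int(env.get("PER_CREW_TIMEOUT", "300")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
    # Sanity check added in this sketch: a crew cannot outlive the whole flow.
    if settings.per_crew_timeout > settings.total_flow_timeout:
        raise ValueError("PER_CREW_TIMEOUT must not exceed TOTAL_FLOW_TIMEOUT")
    return settings
```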
## Development
```bash
# Install deps
uv pip install -e ".[dev]"
# Run tests
pytest tests/
# Run server locally
uvicorn src.pr_reviewer.main:app --reload
```
## Project Structure
```
├── config/              # Shared agent/task YAML configs
├── contexts/            # Default review guidelines (markdown)
├── crews/               # Crew definitions (code, security, infra, summariser)
├── mcp_servers/         # MCP tool wrappers (Hadolint, Checkov)
├── src/pr_reviewer/     # Core application code
│   ├── main.py          # FastAPI app, endpoints, webhook handler
│   ├── flow.py          # CrewAI Flow orchestration
│   ├── state.py         # Pydantic state models
│   ├── llm.py           # LLM factory
│   └── context.py       # Context resolution
├── tests/               # Unit and integration tests
├── kube/                # Kubernetes manifests
├── docker-compose.yaml
├── Dockerfile
└── pyproject.toml
```