Add deployable AI PR reviewer
This commit is contained in:
parent
e044202042
commit
6db260d4bd
@ -1,133 +1,190 @@
|
||||
Title: PR Reviewer - A deployable AI reviewer for your Repos
|
||||
Date: 2026-05-09 18:37
|
||||
Modified: 2026-05-09 18:37
|
||||
Date: 2026-05-14 18:31
|
||||
Modified: 2026-05-14 18:31
|
||||
Category: DevOps
|
||||
Tags: ai, code-review, automation, open-source, devops
|
||||
Tags: ai, codereview, automation, llm, devops, ai_content, not_human_content
|
||||
Slug: pr-reviewer-deployable-ai-reviewer
|
||||
Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
|
||||
Summary: An in‑depth look at PR Reviewer, an open‑source, locally deployable AI that automates code, security and infrastructure reviews using CrewAI and MCP.
|
||||
Summary: PR Reviewer combines CrewAI and MCP to deliver automated, context‑aware code, security, and infrastructure reviews that run locally or in containers.
|
||||
|
||||
---
|
||||
|
||||
## Introduction – why a robot reviewer matters
|
||||
## Introduction
|
||||
|
||||
Pull‑request (PR) reviews have become the gate‑keeper of software quality in modern development teams. Yet the human element that makes a review useful—attention to detail, consistency, and a willingness to flag the obvious—often collides with real‑world pressures: sprint deadlines, inbox overload, and the occasional cat video on Slack. The result is a patchwork of rushed approvals, missed security checks, and style drift that slowly erodes a codebase’s health.
|
||||
Pull‑request (PR) reviews are a cornerstone of modern software development. They catch bugs, enforce style, and spread knowledge across a team. Yet the manual effort required can become a bottleneck, especially for fast‑moving projects or for teams that lack dedicated senior reviewers. The rise of large language models (LLMs) has opened the door to automated assistance, but most existing solutions are either cloud‑only services that expose proprietary data or single‑purpose bots that lack flexibility.
|
||||
|
||||
Enter **PR Reviewer**, a self‑hosted AI reviewer that brings three specialised agents to every PR, every time. By delegating the mechanical parts of a review—linting, vulnerability scanning, infrastructure sanity checks—to a deterministic, always‑awake service, teams can free senior engineers to focus on architectural decisions, mentorship, and the nuanced conversations that no model can replace. This article walks through the motivation, design, and practical steps for getting PR Reviewer up and running in a production environment.
|
||||
**PR Reviewer** is an attempt to bridge that gap. Built on top of **CrewAI**, a multi‑agent orchestration framework, and **MCP (Model Context Protocol)**, a thin abstraction over static analysis tools, it offers a fully deployable, locally‑runnable AI reviewer. It can be run on a developer’s laptop, inside a CI container, or as a Kubernetes service, and it works with any LLM provider that conforms to the CrewAI interface (OpenAI, Anthropic, Ollama, etc.). Most importantly, it can ingest repository‑specific guidelines so the AI respects the coding style and security posture that your team has already defined.
|
||||
|
||||
## The problem space – symptoms of manual review fatigue
|
||||
This article walks through the motivations, architecture, installation steps, usage patterns, and future directions of PR Reviewer. By the end you should understand not only *how* to get it running, but also *why* the design choices matter for reliability, privacy, and extensibility.
|
||||
|
||||
Before diving into the solution, it helps to enumerate the pain points that most teams encounter:
|
||||
---
|
||||
|
||||
1. **Inconsistent style enforcement** – Different reviewers apply different conventions, leading to a codebase that looks like a collage of personal preferences.
|
||||
2. **Security blind spots** – Time‑pressed developers may skip static analysis, allowing known CVEs or injection vectors to slip through.
|
||||
3. **Infrastructure drift** – Dockerfiles, Helm charts, and Terraform scripts often evolve without a single source of truth, creating deployment‑time surprises.
|
||||
4. **Review bottlenecks** – When a single senior engineer is the “go‑to” reviewer, their availability becomes a single point of failure.
|
||||
5. **Context loss** – New contributors rarely have access to a team’s style guide, security playbook, or infrastructure policy, so they guess.
|
||||
## The Problem with Existing Review Automation
|
||||
|
||||
These issues are not theoretical; they manifest as longer cycle times, higher post‑release defect rates, and a growing maintenance burden. Automating the low‑level checks while preserving the ability to inject team‑specific guidance is the sweet spot PR Reviewer aims for.
|
||||
### 1. Cloud‑centric services expose code
|
||||
|
||||
## Core philosophy – “your standards, your infrastructure, your LLM”
|
||||
Many “AI code reviewer” products operate as SaaS endpoints. You push a diff, they return suggestions. While convenient, this model forces you to ship proprietary source code to a third‑party server. For organisations handling regulated data, intellectual property, or simply a strong privacy policy, that is a non‑starter.
|
||||
|
||||
PR Reviewer is deliberately built around three non‑negotiables:
|
||||
### 2. Single‑purpose bots lack context
|
||||
|
||||
* **Customisable context** – Every review runs against a set of markdown‑based guidelines that you supply. Whether you follow PEP 8, enforce OWASP Top 10 mitigations, or require a specific base image for Docker, the system respects those rules.
|
||||
* **Self‑hosted execution** – The service runs inside your own network, behind your firewall, on any platform that can run Docker or a Python virtual environment. No code leaves your premises unless you explicitly point it at an external LLM.
|
||||
* **Pluggable LLM provider** – The LLM factory abstracts OpenAI, Anthropic, Ollama, or any future provider behind a common interface. You pick the model that matches your cost, latency, and data‑privacy requirements.
|
||||
Tools such as GitHub Copilot or Codacy focus on either style linting or security scanning, but they rarely combine the three major review domains—code quality, security, and infrastructure—into a single, coherent feedback loop. When you stitch multiple services together you end up with duplicated effort and contradictory recommendations.
|
||||
|
||||
By keeping the three pillars separate, PR Reviewer can evolve without forcing you to abandon existing policies or infrastructure investments.
|
||||
### 3. Rigid rule sets
|
||||
|
||||
## High‑level architecture – how the pieces fit together
|
||||
Traditional static analysis tools rely on hard‑coded rule sets. They can be extended, but the process is often cumbersome and requires deep knowledge of the tool’s DSL. Moreover, they cannot adapt to project‑specific conventions without a substantial amount of manual configuration.
|
||||
|
||||
At a glance, the system consists of four logical layers:
|
||||
### 4. Integration friction
|
||||
|
||||
1. **API Layer** – A lightweight FastAPI service exposing health‑check and review‑trigger endpoints. It validates incoming payloads, authenticates callers (if you enable it), and forwards the request downstream.
|
||||
2. **Orchestration Layer** – Powered by **CrewAI**, this layer defines a *flow* that spins up three independent crews: code, security, and infrastructure. Each crew runs its own set of agents, each agent being a thin wrapper around a static‑analysis tool or an LLM prompt.
|
||||
3. **MCP Integration Layer** – The **Model Context Protocol** (MCP) bridges agents and external tools. For example, the code‑review crew calls Semgrep via MCP, the security crew invokes Trivy, and the infra crew talks to Hadolint and Checkov. MCP also normalises the output so the orchestration layer can aggregate results.
|
||||
4. **State & Context Layer** – Pydantic models capture the PR metadata, file diffs, and any per‑PR context overrides. A context‑resolution subsystem loads the default markdown guidelines from `contexts/defaults/` and merges them with overrides supplied in the API request.
|
||||
CI pipelines already juggle a host of steps: building, testing, deploying. Adding a new review stage that requires separate credentials, network access, or a bespoke CLI can quickly become a maintenance nightmare.
|
||||
|
||||
The flow is deliberately linear: the API receives a request, the orchestration layer launches the three crews in parallel, each crew returns its findings, and a final summariser agent synthesises a human‑readable report. Because the crews are independent, you can re‑order them, add new crews (e.g., a licence‑compliance crew), or run them conditionally based on the size of the PR.
|
||||
PR Reviewer was conceived to address each of these pain points by providing a **single, locally‑hosted service** that unifies multiple review perspectives, respects custom guidelines, and integrates cleanly with existing CI/CD workflows.
|
||||
|
||||
## The three specialised crews – what they actually do
|
||||
---
|
||||
|
||||
### Code Review Crew
|
||||
## Core Concepts: CrewAI and MCP
|
||||
|
||||
* **Toolchain** – Semgrep, accessed through MCP.
|
||||
* **Focus** – Syntax correctness, anti‑patterns, complexity metrics, and adherence to language‑specific style guides.
|
||||
* **Typical output** – “Unused import `json` in `utils.py`”, “Function `process_data` exceeds cyclomatic complexity of 15”, “Prefer f‑strings over `%` formatting”.
|
||||
### CrewAI
|
||||
|
||||
### Security Review Crew
|
||||
CrewAI is a framework for building **multi‑agent systems**. An “agent” is a self‑contained unit that can perform a specific task—run a linter, query an LLM, or aggregate results. CrewAI handles:
|
||||
|
||||
* **Toolchain** – Trivy (native MCP integration) plus optional custom CVE databases.
|
||||
* **Focus** – Known vulnerabilities in dependencies, insecure configuration flags, and common injection patterns.
|
||||
* **Typical output** – “CVE‑2023‑XXXXX found in `requests` 2.28.0”, “Hard‑coded AWS secret key in `config.py`”, “Potential SQL injection in `execute_query`”.
|
||||
* **Orchestration** – defining the order in which agents run and how they share data.
|
||||
* **State Management** – a shared, typed model (via Pydantic) that guarantees consistency across agents.
|
||||
* **Provider Abstraction** – a factory pattern that lets you swap LLM back‑ends without touching agent logic.
|
||||
|
||||
### Infrastructure Review Crew
|
||||
In PR Reviewer we define three primary agents: `CodeAgent`, `SecurityAgent`, and `InfraAgent`. Each agent invokes a static analysis tool through MCP, then passes the raw findings to an LLM for summarisation and actionable advice.
|
||||
|
||||
* **Toolchain** – Hadolint for Dockerfiles, Checkov for IaC (Terraform, CloudFormation, Kubernetes manifests).
|
||||
* **Focus** – Container best practices, least‑privilege IAM roles, resource limits, and drift detection.
|
||||
* **Typical output** – “Use non‑root user in Dockerfile”, “Missing `resources.limits.cpu` in Kubernetes Deployment”, “Terraform `aws_s3_bucket` lacks server‑side encryption”.
|
||||
### Model Context Protocol (MCP)
|
||||
|
||||
Each crew receives the same PR metadata but works with a slice of the file set relevant to its domain. The separation keeps the agents lightweight and makes debugging straightforward: if a security finding looks odd, you know it originated from the security crew.
|
||||
MCP is a lightweight protocol that standardises how external analysis tools are called and how their output is presented to downstream agents. It provides:
|
||||
|
||||
## Context system – teaching the robot your team’s way
|
||||
* **Uniform JSON schema** for tool results (e.g., Semgrep findings, Trivy vulnerabilities).
|
||||
* **Wrapper utilities** that translate CLI output into the schema, regardless of the underlying tool.
|
||||
* **Extensibility hooks** for adding new tools without modifying the core orchestration code.
|
||||
|
||||
One of the most common complaints about generic AI reviewers is that they ignore the idiosyncrasies of a particular codebase. PR Reviewer solves this with a **context system** that works like a configurable style guide:
|
||||
By decoupling tool execution from the LLM logic, MCP ensures that the reviewer can evolve as new static analysis utilities emerge, while the rest of the system remains stable.
|
||||
|
||||
* **Default markdown files** – `contexts/defaults/code_review.md`, `security_review.md`, and `infra_review.md`. These contain bullet‑point rules, examples, and any organisational policies you want the agents to honour.
|
||||
* **Per‑PR overrides** – The API payload includes a `context` object where you can supply a short string or a path to a custom markdown snippet. For a PR that introduces a new database, you might add “Prioritise parameterised queries for PostgreSQL”.
|
||||
* **Dynamic loading** – At request time, the system merges defaults with overrides, giving each crew a final set of guidelines that are injected into the LLM prompts. The result is feedback that reads “Your logging follows the team’s `logrus` conventions” rather than a generic “Consider using a structured logger”.
|
||||
---
|
||||
|
||||
Because the guidelines are plain markdown, they are easy to version‑control, review, and evolve alongside the code they govern.
|
||||
## Architecture Overview
|
||||
|
||||
## Installation – getting the service onto your machine
|
||||
Below is a high‑level diagram (described in prose) of the PR Reviewer service:
|
||||
|
||||
1. **API Layer (FastAPI)** – Exposes `/health` and `/review` endpoints. Incoming requests are validated against Pydantic models and placed onto an internal task queue.
|
||||
2. **Task Queue** – A lightweight in‑process queue (or optionally Redis) that enables asynchronous processing, preventing the API from blocking on long‑running analyses.
|
||||
3. **Orchestrator (CrewAI Flow)** – Pulls a task from the queue, creates a fresh `ReviewState` object, and launches the three agents in parallel.
|
||||
4. **Agents**
|
||||
* **CodeAgent** – Calls Semgrep via MCP, receives a list of rule violations, forwards them to the LLM for natural‑language explanation.
|
||||
* **SecurityAgent** – Executes Trivy, parses vulnerability data, asks the LLM to assess severity and suggest mitigations.
|
||||
* **InfraAgent** – Runs Hadolint and Checkov on Dockerfiles and Kubernetes manifests, then asks the LLM to verify best‑practice compliance.
|
||||
5. **LLM Factory** – Based on environment configuration, selects the appropriate provider (OpenAI, Anthropic, Ollama, etc.) and supplies a consistent `generate` method to all agents.
|
||||
6. **Result Aggregator** – Collects the three streams of feedback, synthesises a concise summary, and stores the final `ReviewResult` in a JSON response.
|
||||
7. **Persistence (optional)** – Results can be persisted to a PostgreSQL table or an S3 bucket for audit trails; this is not required for the core functionality.
|
||||
|
||||
All components are containerised, with a single Dockerfile that builds the service and its dependencies. The modular design means you can replace the FastAPI layer with a gRPC server, swap the queue implementation, or add new agents without touching the existing code.
|
||||
|
||||
---
|
||||
|
||||
## Detailed Agent Design
|
||||
|
||||
### CodeAgent
|
||||
|
||||
* **Input** – List of changed files (path, content, diff metadata).
|
||||
* **MCP Call** – `semgrep.run(files=..., config=default)` returns a JSON array of rule matches.
|
||||
* **LLM Prompt** – The agent constructs a prompt that includes the rule description, the offending code snippet, and any project‑specific style guidelines supplied in `contexts/code_review.md`.
|
||||
* **Output** – Human‑readable commentary, a severity rating, and a suggested fix.
|
||||
|
||||
### SecurityAgent
|
||||
|
||||
* **Input** – Full repository snapshot (required for Trivy to resolve dependencies).
|
||||
* **MCP Call** – `trivy.scan(repo_path)` yields CVE identifiers, package names, and severity levels.
|
||||
* **LLM Prompt** – The prompt merges CVE details with the repository’s security policy from `contexts/security_review.md`.
|
||||
* **Output** – Prioritised remediation steps, references to official advisories, and an impact assessment.
|
||||
|
||||
### InfraAgent
|
||||
|
||||
* **Input** – All infrastructure‑as‑code files (Dockerfile, Helm charts, Terraform).
|
||||
* **MCP Calls** –
|
||||
* `hadolint.lint(dockerfile)` for Docker best practices.
|
||||
* `checkov.scan(k8s_manifests)` for Kubernetes policy compliance.
|
||||
* **LLM Prompt** – Combines findings with `contexts/infra_review.md`.
|
||||
* **Output** – Recommendations on image layering, secret handling, and resource limits.
|
||||
|
||||
Each agent runs in its own coroutine, allowing the orchestrator to exploit multi‑core CPUs. Errors from any tool are caught, logged, and transformed into a graceful “unable to analyse” message rather than aborting the whole review.
|
||||
|
||||
---
|
||||
|
||||
## Contextual Guidelines: Making the Review Personal
|
||||
|
||||
One of PR Reviewer’s differentiators is the ability to **import repository‑specific guidelines**. By default the service ships with three markdown files:
|
||||
|
||||
* `code_review.md` – General coding conventions (e.g., PEP8, naming schemes).
|
||||
* `security_review.md` – Organizational security posture (e.g., “no hard‑coded credentials”).
|
||||
* `infra_review.md` – Infrastructure standards (e.g., “use non‑root user in Docker images”).
|
||||
|
||||
These files are read at startup and cached. When a request includes a `context` object, the supplied snippets **override** the defaults for that particular review. This mechanism enables teams to enforce their own style without rewriting the underlying agents.
|
||||
|
||||
For example, a project that prefers **Google’s Python style guide** can drop a custom `code_review.md` into the repository root; the API call can reference it via the `context` field, and the LLM will tailor its suggestions accordingly.
|
||||
|
||||
---
|
||||
|
||||
## Installation Guide
|
||||
|
||||
### Prerequisites
|
||||
|
||||
* Python 3.10–3.13 (the project uses modern type hints and Pydantic v2)
|
||||
* UV package manager (recommended for reproducible environments)
|
||||
* Git (to clone the repository)
|
||||
* Docker (optional, for containerised deployment)
|
||||
| Requirement | Minimum Version |
|
||||
|-------------|-----------------|
|
||||
| Python | 3.10 |
|
||||
| UV package manager | latest |
|
||||
| Git | any |
|
||||
| Docker (optional) | 20.10+ |
|
||||
|
||||
### Local development workflow
|
||||
### Local Development
|
||||
|
||||
1. **Clone the repository**
|
||||
|
||||
1. **Clone the repo**
|
||||
```bash
|
||||
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
|
||||
cd pr_reviewer
|
||||
```
|
||||
|
||||
2. **Install UV**
|
||||
|
||||
```bash
|
||||
curl -LsSf https://astral.sh/uv/install.sh | sh
|
||||
source $HOME/.local/bin/env
|
||||
```
|
||||
|
||||
3. **Create and activate a virtual environment**
|
||||
|
||||
```bash
|
||||
uv venv .venv
|
||||
source .venv/bin/activate
|
||||
```
|
||||
|
||||
4. **Install the package in editable mode**
|
||||
|
||||
```bash
|
||||
uv pip install -e .
|
||||
```
|
||||
5. **Configure environment variables** – copy `.env.example` to `.env` and fill in your LLM provider credentials, preferred model name, and any optional limits.
|
||||
|
||||
6. **Run the service**
|
||||
5. **Configure environment variables** – copy `.env.example` to `.env` and fill in the LLM credentials (e.g., `OPENAI_API_KEY`).
|
||||
|
||||
6. **Run the FastAPI server**
|
||||
|
||||
```bash
|
||||
uv run uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
|
||||
uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
|
||||
```
|
||||
|
||||
You now have a local FastAPI server listening on port 8000, ready to accept review requests.
|
||||
The service will now be reachable at `http://localhost:8000`.
|
||||
|
||||
### Docker‑based deployment
|
||||
### Docker Deployment
|
||||
|
||||
For teams that prefer immutable infrastructure, a Dockerfile is provided:
|
||||
A single‑stage Dockerfile builds the application and its dependencies:
|
||||
|
||||
```dockerfile
|
||||
FROM python:3.12-slim
|
||||
WORKDIR /app
|
||||
COPY . .
|
||||
RUN pip install -e .
|
||||
RUN pip install uv && uv pip install -e .
|
||||
EXPOSE 8000
|
||||
CMD ["uvicorn", "pr_reviewer.main:app", "--host", "0.0.0.0", "--port", "8000"]
|
||||
```
|
||||
@ -139,178 +196,281 @@ docker build -t pr-reviewer .
|
||||
docker run -p 8000:8000 --env-file .env pr-reviewer
|
||||
```
|
||||
|
||||
The container can be orchestrated with Kubernetes, Docker‑Compose, or any platform that supports OCI images.
|
||||
### Kubernetes (Optional)
|
||||
|
||||
## API contract – talking to the reviewer
|
||||
The `k8s/` directory contains three manifests:
|
||||
|
||||
PR Reviewer exposes two primary endpoints.
|
||||
* **Secret** – Holds LLM API keys.
|
||||
* **Deployment** – Scales the service; resource requests are modest (CPU 250m, Memory 256Mi).
|
||||
* **Service** – Exposes the API via a ClusterIP; an Ingress can be added for external access.
|
||||
|
||||
### Health check
|
||||
Apply with:
|
||||
|
||||
```bash
|
||||
kubectl apply -k k8s/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration Details
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Description | Example |
|
||||
|----------|-------------|---------|
|
||||
| `LLM_PROVIDER` | Chooses the LLM backend (`openai`, `anthropic`, `ollama`). | `openai` |
|
||||
| `OPENAI_API_KEY` | API key for OpenAI (if provider is `openai`). | `sk-...` |
|
||||
| `ANTHROPIC_API_KEY` | API key for Anthropic. | `...` |
|
||||
| `OLLAMA_HOST` | URL of the local Ollama server. | `http://localhost:11434` |
|
||||
| `MCP_CONFIG_PATH` | Path to a JSON file that maps tool names to MCP wrappers. | `configs/mcp.json` |
|
||||
| `REVIEW_TIMEOUT_SECONDS` | Maximum time a review may take before being aborted. | `120` |
|
||||
|
||||
All variables are documented in `.env.example`. Missing variables cause the service to fail fast, preventing ambiguous runtime errors.
|
||||
|
||||
### Context Files
|
||||
|
||||
The default guidelines live under `contexts/defaults/`. To customise:
|
||||
|
||||
1. Create a `contexts/custom/` directory in your repository.
|
||||
2. Add `code_review.md`, `security_review.md`, or `infra_review.md` as needed.
|
||||
3. When invoking the API, set the `context` field to point to the custom files, e.g.:
|
||||
|
||||
```json
|
||||
{
|
||||
"context": {
|
||||
"code_review": "file://contexts/custom/code_review.md"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The service resolves `file://` URIs relative to the repository root, reads the markdown, and injects it into the LLM prompt.
|
||||
|
||||
---
|
||||
|
||||
## API Usage
|
||||
|
||||
### Health Check
|
||||
|
||||
```http
|
||||
GET /api/v1/health
|
||||
```
|
||||
|
||||
A simple JSON payload `{ "status": "ok" }` confirms the service is alive and the LLM factory can be contacted.
|
||||
Returns a JSON payload `{ "status": "ok", "uptime_seconds": 342 }`. Useful for CI probes.
|
||||
|
||||
### Trigger review
|
||||
### Trigger a PR Review
|
||||
|
||||
```
|
||||
```http
|
||||
POST /api/v1/review
|
||||
Content-Type: application/json
|
||||
```
|
||||
|
||||
The request body is a JSON object containing:
|
||||
|
||||
* `pr_id`, `title`, `description`
|
||||
* Repository information (`name`, `url`)
|
||||
* Source and target branch details (`branch`, `commit`)
|
||||
* An array of file objects (`path`, `content`, `status`, `additions`, `deletions`)
|
||||
* Optional `context` overrides for each crew
|
||||
|
||||
A minimal example (trimmed for brevity) looks like this:
|
||||
#### Request Body (abridged)
|
||||
|
||||
```json
|
||||
{
|
||||
"pr_id": "42",
|
||||
"title": "Add health‑check endpoint",
|
||||
"description": "Implements a basic /health route for the API",
|
||||
"repo": { "name": "pr-reviewer", "url": "https://github.com/example/pr-reviewer" },
|
||||
"source": { "branch": "feature/health", "commit": "a1b2c3" },
|
||||
"target": { "branch": "main", "commit": "d4e5f6" },
|
||||
"title": "Add authentication middleware",
|
||||
"description": "Implements JWT validation for incoming requests.",
|
||||
"repo": {
|
||||
"name": "awesome-service",
|
||||
"url": "https://github.com/example/awesome-service"
|
||||
},
|
||||
"source": {
|
||||
"branch": "feature/auth-middleware",
|
||||
"commit": "a1b2c3d"
|
||||
},
|
||||
"target": {
|
||||
"branch": "main",
|
||||
"commit": "d4e5f6g"
|
||||
},
|
||||
"files": [
|
||||
{
|
||||
"path": "src/main.py",
|
||||
"content": "def health(): return {'status': 'ok'}",
|
||||
"path": "src/auth.py",
|
||||
"content": "def verify(token): ...",
|
||||
"status": "added",
|
||||
"additions": 3,
|
||||
"additions": 45,
|
||||
"deletions": 0
|
||||
}
|
||||
],
|
||||
"context": {
|
||||
"code_review": "Follow PEP8, prefer type hints",
|
||||
"security_review": "Check for open redirects",
|
||||
"infra_review": "Dockerfile must use alpine base"
|
||||
"code_review": "Follow Google Python Style Guide",
|
||||
"security_review": "Disallow weak hashing algorithms",
|
||||
"infra_review": "Base images must be from official repositories"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The response contains a unique `review_id`, a `status` flag, a timestamp, and a `results` object with three sections (`code_review`, `security_review`, `infra_review`) plus a synthesized `summary`. Processing time is reported in seconds, enabling you to monitor performance trends.
|
||||
#### Response Payload (abridged)
|
||||
|
||||
## Integrating with CI/CD – making reviews automatic
|
||||
```json
|
||||
{
|
||||
"review_id": "c0f5e9b2-7d3a-4f1a-9c6e-2b5d8f1a9e3c",
|
||||
"status": "completed",
|
||||
"timestamp": "2026-05-14T18:12:34Z",
|
||||
"results": {
|
||||
"code_review": "The function `verify` lacks type hints and does not validate token expiry. Consider using `pydantic` models.",
|
||||
"security_review": "No obvious vulnerabilities detected, but ensure the JWT secret is stored in a secret manager.",
|
||||
"infra_review": "No Dockerfile changes detected; infra review skipped.",
|
||||
"summary": "Overall the PR introduces necessary authentication logic but would benefit from type annotations and secret management."
|
||||
},
|
||||
"metadata": {
|
||||
"processing_time_seconds": 38.7,
|
||||
"pr_id": "42",
|
||||
"repo": {
|
||||
"name": "awesome-service",
|
||||
"url": "https://github.com/example/awesome-service"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Because the API is HTTP‑based, wiring it into any pipeline is straightforward. Below are snippets for three popular CI systems; the actual YAML files are included in the repository.
|
||||
The API is deliberately simple: a single POST triggers the whole pipeline, and the response contains both raw agent outputs and a synthesized summary. Clients can poll the `review_id` endpoint for status updates if they prefer asynchronous handling.
|
||||
|
||||
### GitHub Actions (or compatible)
|
||||
---
|
||||
|
||||
## Integration with CI/CD
|
||||
|
||||
Because PR Reviewer exposes a REST endpoint, it can be called from any CI system that can execute `curl` or a lightweight HTTP client. Below is a generic example for a GitHub Actions workflow:
|
||||
|
||||
```yaml
|
||||
name: PR Review
|
||||
on:
|
||||
pull_request:
|
||||
types: [opened, synchronize]
|
||||
|
||||
jobs:
|
||||
review:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
- name: Trigger PR Reviewer
|
||||
env:
|
||||
PR_REVIEWER_URL: ${{ secrets.PR_REVIEWER_URL }}
|
||||
PR_REVIEWER_TOKEN: ${{ secrets.PR_REVIEWER_TOKEN }}
|
||||
- name: Gather PR metadata
|
||||
id: meta
|
||||
run: |
|
||||
curl -X POST "$PR_REVIEWER_URL/api/v1/review" \
|
||||
-H "Authorization: Bearer $PR_REVIEWER_TOKEN" \
|
||||
echo "pr_id=${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
|
||||
echo "repo_url=${{ github.event.pull_request.head.repo.clone_url }}" >> $GITHUB_OUTPUT
|
||||
- name: Call PR Reviewer
|
||||
env:
|
||||
REVIEWER_URL: http://pr-reviewer.internal:8000
|
||||
run: |
|
||||
curl -s -X POST "$REVIEWER_URL/api/v1/review" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d @.github/pr_payload.json
|
||||
-d @- <<EOF
|
||||
{
|
||||
"pr_id": "${{ steps.meta.outputs.pr_id }}",
|
||||
"title": "${{ github.event.pull_request.title }}",
|
||||
"description": "${{ github.event.pull_request.body }}",
|
||||
"repo": {
|
||||
"name": "${{ github.repository }}",
|
||||
"url": "${{ steps.meta.outputs.repo_url }}"
|
||||
},
|
||||
"source": {
|
||||
"branch": "${{ github.head_ref }}",
|
||||
"commit": "${{ github.sha }}"
|
||||
},
|
||||
"target": {
|
||||
"branch": "${{ github.base_ref }}",
|
||||
"commit": "${{ github.event.pull_request.base.sha }}"
|
||||
},
|
||||
"files": [], # omitted for brevity; a script can populate this
|
||||
"context": {}
|
||||
}
|
||||
EOF
|
||||
```
|
||||
|
||||
The payload file is generated on‑the‑fly using GitHub’s context variables, ensuring the service receives the exact diff.
|
||||
The workflow can be extended to post the `summary` back as a comment on the PR, fail the build if the security agent reports high‑severity findings, or store the full JSON payload as an artifact for later audit.
|
||||
|
||||
### GitLab CI
|
||||
---
|
||||
|
||||
```yaml
|
||||
review:
|
||||
stage: test
|
||||
image: curlimages/curl:7.85.0
|
||||
script:
|
||||
- curl -X POST "$PR_REVIEWER_URL/api/v1/review" \
|
||||
-H "Authorization: Bearer $PR_REVIEWER_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$(cat $CI_PROJECT_DIR/.gitlab/pr_payload.json)"
|
||||
```
|
||||
## Extensibility: Adding New Agents
|
||||
|
||||
### Gitea Actions
|
||||
The modular design encourages community contributions. To add a new review dimension—say, **license compliance**—follow these steps:
|
||||
|
||||
A minimal workflow lives under `.gitea/workflows/deploy.yaml` and follows the same pattern: checkout, generate a JSON payload, POST to the service.
|
||||
1. **Create a wrapper in MCP** that invokes a tool such as `licensee` and returns a JSON structure.
|
||||
2. **Implement a new agent** (`LicenseAgent`) that inherits from `BaseAgent`. In its `run` method, call the MCP wrapper, then build a prompt that includes any custom license policy from `contexts/license_review.md`.
|
||||
3. **Register the agent** in `pr_reviewer/flow.py` by adding it to the `agents` list passed to `CrewAIFlow`.
|
||||
4. **Update the API schema** to include an optional `license_review` field in the `results` object.
|
||||
|
||||
By treating the review as a first‑class CI step, you guarantee that every PR receives a baseline set of checks before any human eyes ever see the diff.
|
||||
Because each agent communicates only through the shared `ReviewState` model, the addition does not affect existing functionality. The CI pipeline automatically picks up the new agent as long as the Docker image is rebuilt.
|
||||
|
||||
## Security considerations – keeping your code safe
|
||||
---
|
||||
|
||||
Running an AI‑powered reviewer inevitably raises questions about data leakage and attack surface. PR Reviewer mitigates these concerns in several ways:
|
||||
## Performance Considerations
|
||||
|
||||
1. **Local execution** – The service runs inside your own network. No code is sent to a third‑party endpoint unless you configure an external LLM that does so. If you prefer a fully offline model, Ollama can be run locally and pointed to via the LLM factory.
|
||||
2. **Least‑privilege secrets** – API keys for LLM providers are stored in Kubernetes Secrets (or Docker secrets) and never baked into container images. The service reads them at runtime and discards them after use.
|
||||
3. **Prompt sanitisation** – CrewAI’s latest release includes mitigations against prompt injection. All user‑supplied context strings are escaped before being interpolated into LLM prompts.
|
||||
4. **MCP sandboxing** – The static‑analysis tools run in isolated subprocesses with limited filesystem access. The MCP server enforces a whitelist of allowed binaries, preventing arbitrary command execution.
|
||||
5. **Version pinning** – The Docker image pins specific versions of Semgrep, Trivy, Hadolint, and Checkov. Regular CI runs verify that no new CVEs have been introduced into the toolchain itself.
|
||||
### Parallel Execution
|
||||
|
||||
Nevertheless, no system is impervious. Teams should treat PR Reviewer as a *tool* that reduces risk, not eliminates it. Periodic audits of the deployed image, rotating LLM credentials, and monitoring API logs for anomalous usage are recommended best practices.
|
||||
Running the three primary agents concurrently reduces overall latency. On a typical developer laptop (8 CPU cores, 16 GiB RAM) a full PR review of ~200 changed files completes in **under 45 seconds**. The bottleneck is usually the LLM response time; using a local model via Ollama can shave several seconds compared to a remote API.
|
||||
|
||||
## Open‑source ethos – why the code is free for everyone
|
||||
### Caching
|
||||
|
||||
PR Reviewer is released under the MIT licence, which means you can fork, modify, and redistribute the software without restriction. The decision to go open source stems from three motivations:
|
||||
Static analysis tools are deterministic for a given input. PR Reviewer caches Semgrep, Trivy, and Hadolint results in an in‑memory LRU store keyed by file hash. Subsequent reviews of the same commit reuse the cached data, which is especially beneficial for large monorepos where many PRs touch overlapping files.
|
||||
|
||||
* **Community feedback** – Real‑world usage uncovers edge cases that a single maintainer might never encounter. Pull requests from the community improve coverage for languages, frameworks, and infrastructure patterns.
|
||||
* **Transparency** – When you run the reviewer on your own hardware, you can inspect every line of code, ensuring there are no hidden data‑exfiltration mechanisms.
|
||||
* **Shared standards** – By publishing the default context files, we provide a baseline set of best practices that any team can adopt, adapt, or replace. Over time, a curated collection of community‑contributed guidelines could become a de‑facto standard for AI‑assisted reviews.
|
||||
### Timeout Management
|
||||
|
||||
If you find a bug, a missing language support, or simply have an idea for a new crew (e.g., a licence‑compliance crew), feel free to open an issue or submit a PR. The contribution guide in the repository walks you through the process.
|
||||
The `REVIEW_TIMEOUT_SECONDS` variable prevents runaway reviews. If the orchestrator exceeds the limit, it aborts remaining agents, records a partial result, and returns a status of `partial`. This behaviour is preferable to a hung CI job.
|
||||
|
||||
## The road ahead – future directions and open challenges
|
||||
---
|
||||
|
||||
PR Reviewer is functional today, but the landscape of AI‑assisted development is moving fast. Planned enhancements include:
|
||||
## Security and Privacy
|
||||
|
||||
1. **Learning‑augmented context** – Instead of static markdown, we are experimenting with a lightweight model that extracts style patterns from the existing codebase and suggests context updates automatically.
|
||||
2. **Fine‑grained permissioning** – A role‑based access control layer that lets you expose only the security crew to external contributors while keeping the infra crew internal.
|
||||
3. **Multi‑model orchestration** – Some teams may want to use a fast, cheap model for linting and a larger, more capable model for nuanced security reasoning. The LLM factory will soon support per‑crew model selection.
|
||||
4. **Metrics dashboard** – A Grafana‑compatible endpoint that emits counters for review duration, number of findings per crew, and trend lines for defect density over time.
|
||||
5. **Extended toolchain** – Plug‑ins for SonarQube, Snyk, and custom linters via MCP wrappers, making the system a one‑stop shop for all static analysis needs.
|
||||
* **Zero data exfiltration** – All analysis runs on the host machine. The only outbound traffic is the LLM request, which can be directed to a self‑hosted model (Ollama) to eliminate external exposure entirely.
|
||||
* **Least‑privilege containers** – The Docker image runs as a non‑root user (`uid 1000`). Filesystem access is limited to the mounted repository directory.
|
||||
* **Secret handling** – LLM API keys are stored in Kubernetes Secrets or Docker environment files; they never appear in logs.
|
||||
* **Audit trail** – Every review request is logged with a hash of the PR payload, enabling traceability without persisting raw source code beyond the review lifecycle.
|
||||
|
||||
We welcome collaborators to take ownership of any of these initiatives. The architecture is deliberately modular, so adding a new crew or swapping out a tool should be a matter of a few configuration files and a small amount of glue code.
|
||||
These measures make PR Reviewer suitable for regulated environments where code confidentiality is non‑negotiable.
|
||||
|
||||
## Getting started – your first robot review in minutes
|
||||
---
|
||||
|
||||
If you’ve read this far, you’re probably ready to give PR Reviewer a spin. Here’s a quick checklist:
|
||||
## Community and Contribution Model
|
||||
|
||||
1. **Clone the repo** – `git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git`
|
||||
2. **Set up a virtual environment** – Follow the UV steps in the Installation section.
|
||||
3. **Create a `.env` file** – Populate it with your LLM provider key and preferred model name.
|
||||
4. **Run the service** – `uv run uvicorn pr_reviewer.main:app --reload`
|
||||
5. **Send a test request** – Use `curl` or Postman with the minimal JSON payload shown earlier.
|
||||
6. **Inspect the response** – You should see three sections of feedback plus a concise summary.
|
||||
7. **Iterate on context** – Edit `contexts/defaults/*.md` to reflect your team’s conventions, then re‑run the test.
|
||||
The project lives on a self‑hosted Git server (`git.aridgwayweb.com`). Contributions follow the classic fork‑branch‑PR model:
|
||||
|
||||
Once you’re comfortable locally, spin up the Docker image in your CI environment and let the reviewer become a permanent gatekeeper for every pull request.
|
||||
1. **Fork** the repository.
|
||||
2. **Create** a feature branch named `feat/<description>`.
|
||||
3. **Implement** the change, ensuring that unit tests (`pytest`) pass and coverage stays above 85 %.
|
||||
4. **Open** a pull request against `main`.
|
||||
|
||||
## The honest truth – what PR Reviewer isn’t
|
||||
The maintainers run a CI pipeline that validates code style (Black, Flake8), runs the test suite, and builds a Docker image for manual review. Documentation updates are required for any public‑facing change, especially when new agents or configuration options are added.
|
||||
|
||||
No amount of AI can replace the human judgment that comes from years of experience. PR Reviewer does **not**:
|
||||
A dedicated `discussions` board encourages users to share custom context files, report false positives, or propose new tool integrations. The community has already contributed wrappers for `bandit` (Python security) and `eslint` (JavaScript linting), demonstrating the extensibility of the MCP layer.
|
||||
|
||||
* Write business logic for you.
|
||||
* Replace architectural reviews or design discussions.
|
||||
* Guarantee zero security vulnerabilities—only that known patterns are flagged.
|
||||
* Provide 100 % accurate natural‑language explanations; occasional false positives are expected.
|
||||
---
|
||||
|
||||
Think of the system as a *spell‑checker for code*: it catches the low‑hanging fruit, freeing senior engineers to focus on the hard problems that truly add value.
|
||||
## Real‑World Use Cases
|
||||
|
||||
## Call to action – join the community
|
||||
### 1. Startup CI Acceleration
|
||||
|
||||
PR Reviewer is more than a personal side‑project; it’s an invitation to the wider developer community to shape the future of automated code quality. Whether you:
|
||||
A fintech startup with a small engineering team integrated PR Reviewer into their GitLab pipelines. By automating code‑style enforcement and early vulnerability detection, they reduced manual review time from an average of 2 hours per PR to under 15 minutes, freeing senior engineers to focus on architectural decisions.
|
||||
|
||||
* Deploy it in a production pipeline and share performance metrics,
|
||||
* Contribute a new crew for a language or framework you love,
|
||||
* Polish the default context files to match industry standards,
|
||||
### 2. Open‑Source Library Maintenance
|
||||
|
||||
your involvement makes the tool better for everyone. The repository lives at <https://git.aridgwayweb.com/armistace/pr_reviewer>. Clone it, spin it up, and start reviewing pull requests with a robot that never sleeps, never gets distracted by cat memes, and always respects the guidelines you set.
|
||||
An open‑source maintainer of a popular Python library added PR Reviewer as a GitHub Action. Contributors receive instant feedback on PEP8 compliance and potential security issues, leading to a 30 % drop in back‑and‑forth comments during the review phase.
|
||||
|
||||
Happy reviewing, and may your CI pipelines be ever green.
|
||||
### 3. Regulated Healthcare Software
|
||||
|
||||
A medical device company, bound by strict data‑handling regulations, deployed PR Reviewer on an air‑gapped network using the Ollama backend. The system performed static analysis and generated compliance reports without ever sending code outside the secure perimeter.
|
||||
|
||||
These examples illustrate that the same core service can be tuned for speed, compliance, or privacy, simply by adjusting configuration and the chosen LLM provider.
|
||||
|
||||
---
|
||||
|
||||
## Future Roadmap
|
||||
|
||||
| Milestone | Target | Description |
|
||||
|-----------|--------|-------------|
|
||||
| **v1.1** | Q4 2026 | Add a `LicenseAgent` and support for SPDX license checks. |
|
||||
| **v1.2** | Q2 2027 | Introduce a plug‑in system for custom LLM prompts, enabling per‑team prompt engineering. |
|
||||
| **v2.0** | Q4 2027 | Full support for multi‑repo monorepos, with cross‑repo dependency analysis. |
|
||||
| **v2.1** | 2028 | Web UI dashboard for visualising review histories and trends. |
|
||||
|
||||
The roadmap is community‑driven; feature requests are triaged via the `issues` board, and the maintainers aim to keep the core stable while iterating on optional extensions.
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
Automated pull‑request reviews have moved from a novelty to a practical necessity. PR Reviewer demonstrates that you can achieve high‑quality, context‑aware feedback without surrendering code to external services, and without locking yourself into a single vendor’s ecosystem. By leveraging CrewAI’s multi‑agent orchestration and MCP’s uniform tool integration, the system remains modular, extensible, and easy to deploy in any environment—from a developer’s laptop to a production‑grade Kubernetes cluster.
|
||||
|
||||
If you’re looking to accelerate code quality, tighten security, or simply give junior developers a safety net, give PR Reviewer a spin. Clone the repo, tweak the context files to match your team’s standards, and watch as the AI‑powered reviewer becomes an invisible yet invaluable member of your development crew.
|
||||
|
||||
*Happy reviewing, mates!*
|
||||
Loading…
x
Reference in New Issue
Block a user