diff --git a/src/content/pr_reviewer__a_deployable_ai_reviewer_for_your_repos.md b/src/content/pr_reviewer__a_deployable_ai_reviewer_for_your_repos.md
index 6148e3a..e7bc612 100644
--- a/src/content/pr_reviewer__a_deployable_ai_reviewer_for_your_repos.md
+++ b/src/content/pr_reviewer__a_deployable_ai_reviewer_for_your_repos.md
@@ -1,133 +1,190 @@
 Title: PR Reviewer - A deployable AI reviewer for your Repos
-Date: 2026-05-09 18:37
-Modified: 2026-05-09 18:37
+Date: 2026-05-14 18:31
+Modified: 2026-05-14 18:31
 Category: DevOps
-Tags: ai, code-review, automation, open-source, devops
+Tags: ai, codereview, automation, llm, devops, ai_content, not_human_content
 Slug: pr-reviewer-deployable-ai-reviewer
 Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
-Summary: An in‑depth look at PR Reviewer, an open‑source, locally deployable AI that automates code, security and infrastructure reviews using CrewAI and MCP.
+Summary: PR Reviewer combines CrewAI and MCP to deliver automated, context‑aware code, security, and infrastructure reviews that run locally or in containers.
 ---
-## Introduction – why a robot reviewer matters
+## Introduction
-Pull‑request (PR) reviews have become the gate‑keeper of software quality in modern development teams. Yet the human element that makes a review useful—attention to detail, consistency, and a willingness to flag the obvious—often collides with real‑world pressures: sprint deadlines, inbox overload, and the occasional cat video on Slack. The result is a patchwork of rushed approvals, missed security checks, and style drift that slowly erodes a codebase’s health.
+Pull‑request (PR) reviews are a cornerstone of modern software development. They catch bugs, enforce style, and spread knowledge across a team. Yet the manual effort required can become a bottleneck, especially for fast‑moving projects or for teams that lack dedicated senior reviewers.
+The rise of large language models (LLMs) has opened the door to automated assistance, but most existing solutions are either cloud‑only services that expose proprietary data or single‑purpose bots that lack flexibility.
-Enter **PR Reviewer**, a self‑hosted AI reviewer that brings three specialised agents to every PR, every time. By delegating the mechanical parts of a review—linting, vulnerability scanning, infrastructure sanity checks—to a deterministic, always‑awake service, teams can free senior engineers to focus on architectural decisions, mentorship, and the nuanced conversations that no model can replace. This article walks through the motivation, design, and practical steps for getting PR Reviewer up and running in a production environment.
+**PR Reviewer** is an attempt to bridge that gap. Built on top of **CrewAI**, a multi‑agent orchestration framework, and **MCP (Model Context Protocol)**, a thin abstraction over static analysis tools, it offers a fully deployable, locally‑runnable AI reviewer. It can be run on a developer’s laptop, inside a CI container, or as a Kubernetes service, and it works with any LLM provider that conforms to the CrewAI interface (OpenAI, Anthropic, Ollama, etc.). Most importantly, it can ingest repository‑specific guidelines so the AI respects the coding style and security posture that your team has already defined.
-## The problem space – symptoms of manual review fatigue
+This article walks through the motivations, architecture, installation steps, usage patterns, and future directions of PR Reviewer. By the end you should understand not only *how* to get it running, but also *why* the design choices matter for reliability, privacy, and extensibility.
-Before diving into the solution, it helps to enumerate the pain points that most teams encounter:
+---
-1. **Inconsistent style enforcement** – Different reviewers apply different conventions, leading to a codebase that looks like a collage of personal preferences.
-2. **Security blind spots** – Time‑pressed developers may skip static analysis, allowing known CVEs or injection vectors to slip through.
-3. **Infrastructure drift** – Dockerfiles, Helm charts, and Terraform scripts often evolve without a single source of truth, creating deployment‑time surprises.
-4. **Review bottlenecks** – When a single senior engineer is the “go‑to” reviewer, their availability becomes a single point of failure.
-5. **Context loss** – New contributors rarely have access to a team’s style guide, security playbook, or infrastructure policy, so they guess.
+## The Problem with Existing Review Automation
-These issues are not theoretical; they manifest as longer cycle times, higher post‑release defect rates, and a growing maintenance burden. Automating the low‑level checks while preserving the ability to inject team‑specific guidance is the sweet spot PR Reviewer aims for.
+### 1. Cloud‑centric services expose code
-## Core philosophy – “your standards, your infrastructure, your LLM”
+Many “AI code reviewer” products operate as SaaS endpoints. You push a diff, they return suggestions. While convenient, this model forces you to ship proprietary source code to a third‑party server. For organisations handling regulated data, intellectual property, or simply a strong privacy policy, that is a non‑starter.
-PR Reviewer is deliberately built around three non‑negotiables:
+### 2. Single‑purpose bots lack context
-* **Customisable context** – Every review runs against a set of markdown‑based guidelines that you supply. Whether you follow PEP 8, enforce OWASP Top 10 mitigations, or require a specific base image for Docker, the system respects those rules.
-* **Self‑hosted execution** – The service runs inside your own network, behind your firewall, on any platform that can run Docker or a Python virtual environment. No code leaves your premises unless you explicitly point it at an external LLM.
-* **Pluggable LLM provider** – The LLM factory abstracts OpenAI, Anthropic, Ollama, or any future provider behind a common interface. You pick the model that matches your cost, latency, and data‑privacy requirements.
+Tools such as GitHub Copilot or Codacy each cover one slice of the problem—code suggestions here, style linting or security scanning there—but they rarely combine the three major review domains—code quality, security, and infrastructure—into a single, coherent feedback loop. When you stitch multiple services together you end up with duplicated effort and contradictory recommendations.
-By keeping the three pillars separate, PR Reviewer can evolve without forcing you to abandon existing policies or infrastructure investments.
+### 3. Rigid rule sets
-## High‑level architecture – how the pieces fit together
+Traditional static analysis tools rely on hard‑coded rule sets. They can be extended, but the process is often cumbersome and requires deep knowledge of the tool’s DSL. Moreover, they cannot adapt to project‑specific conventions without a substantial amount of manual configuration.
-At a glance, the system consists of four logical layers:
+### 4. Integration friction
-1. **API Layer** – A lightweight FastAPI service exposing health‑check and review‑trigger endpoints. It validates incoming payloads, authenticates callers (if you enable it), and forwards the request downstream.
-2. **Orchestration Layer** – Powered by **CrewAI**, this layer defines a *flow* that spins up three independent crews: code, security, and infrastructure. Each crew runs its own set of agents, each agent being a thin wrapper around a static‑analysis tool or an LLM prompt.
+CI pipelines already juggle a host of steps: building, testing, deploying. Adding a new review stage that requires separate credentials, network access, or a bespoke CLI can quickly become a maintenance nightmare.
-3. **MCP Integration Layer** – The **Model Context Protocol** (MCP) bridges agents and external tools. For example, the code‑review crew calls Semgrep via MCP, the security crew invokes Trivy, and the infra crew talks to Hadolint and Checkov. MCP also normalises the output so the orchestration layer can aggregate results.
+PR Reviewer was conceived to address each of these pain points by providing a **single, locally‑hosted service** that unifies multiple review perspectives, respects custom guidelines, and integrates cleanly with existing CI/CD workflows.
-4. **State & Context Layer** – Pydantic models capture the PR metadata, file diffs, and any per‑PR context overrides. A context‑resolution subsystem loads the default markdown guidelines from `contexts/defaults/` and merges them with overrides supplied in the API request.
+---
-The flow is deliberately linear: the API receives a request, the orchestration layer launches the three crews in parallel, each crew returns its findings, and a final summariser agent synthesises a human‑readable report. Because the crews are independent, you can re‑order them, add new crews (e.g., a licence‑compliance crew), or run them conditionally based on the size of the PR.
+## Core Concepts: CrewAI and MCP
-## The three specialised crews – what they actually do
+### CrewAI
-### Code Review Crew
+CrewAI is a framework for building **multi‑agent systems**. An “agent” is a self‑contained unit that can perform a specific task—run a linter, query an LLM, or aggregate results. CrewAI handles:
-* **Toolchain** – Semgrep, accessed through MCP.
-* **Focus** – Syntax correctness, anti‑patterns, complexity metrics, and adherence to language‑specific style guides.
-* **Typical output** – “Unused import `json` in `utils.py`”, “Function `process_data` exceeds cyclomatic complexity of 15”, “Prefer f‑strings over `%` formatting”.
-### Security Review Crew
-* **Toolchain** – Trivy (native MCP integration) plus optional custom CVE databases.
-* **Focus** – Known vulnerabilities in dependencies, insecure configuration flags, and common injection patterns.
-* **Typical output** – “CVE‑2023‑XXXXX found in `requests` 2.28.0”, “Hard‑coded AWS secret key in `config.py`”, “Potential SQL injection in `execute_query`”.
+* **Orchestration** – defining the order in which agents run and how they share data.
+* **State Management** – a shared, typed model (via Pydantic) that guarantees consistency across agents.
+* **Provider Abstraction** – a factory pattern that lets you swap LLM back‑ends without touching agent logic.
-### Infrastructure Review Crew
+In PR Reviewer we define three primary agents: `CodeAgent`, `SecurityAgent`, and `InfraAgent`. Each agent invokes a static analysis tool through MCP, then passes the raw findings to an LLM for summarisation and actionable advice.
-* **Toolchain** – Hadolint for Dockerfiles, Checkov for IaC (Terraform, CloudFormation, Kubernetes manifests).
-* **Focus** – Container best practices, least‑privilege IAM roles, resource limits, and drift detection.
-* **Typical output** – “Use non‑root user in Dockerfile”, “Missing `resources.limits.cpu` in Kubernetes Deployment”, “Terraform `aws_s3_bucket` lacks server‑side encryption”.
+### Model Context Protocol (MCP)
-Each crew receives the same PR metadata but works with a slice of the file set relevant to its domain. The separation keeps the agents lightweight and makes debugging straightforward: if a security finding looks odd, you know it originated from the security crew.
+MCP is a lightweight protocol that standardises how external analysis tools are called and how their output is presented to downstream agents. It provides:
+* **Uniform JSON schema** for tool results (e.g., Semgrep findings, Trivy vulnerabilities).
+* **Wrapper utilities** that translate CLI output into the schema, regardless of the underlying tool.
-## Context system – teaching the robot your team’s way
+* **Extensibility hooks** for adding new tools without modifying the core orchestration code.
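The wrapper-and-schema idea can be sketched in a few lines of Python. The `Finding` record and `normalise_semgrep` helper below are illustrative names for this article, not the project's actual API; the input shape follows Semgrep's published JSON output (`results[].check_id`, `extra.severity`, etc.).

```python
from dataclasses import dataclass, asdict

@dataclass
class Finding:
    """One normalised result record shared by every tool wrapper (illustrative schema)."""
    tool: str
    rule_id: str
    path: str
    line: int
    severity: str
    message: str

def normalise_semgrep(raw: dict) -> list[dict]:
    """Translate Semgrep's native JSON report into the shared Finding schema."""
    findings = []
    for result in raw.get("results", []):
        findings.append(asdict(Finding(
            tool="semgrep",
            rule_id=result["check_id"],
            path=result["path"],
            line=result["start"]["line"],
            severity=result["extra"]["severity"],
            message=result["extra"]["message"],
        )))
    return findings
```

A Trivy or Checkov wrapper would emit the same `Finding` shape, which is what lets the aggregation layer stay tool-agnostic.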
-One of the most common complaints about generic AI reviewers is that they ignore the idiosyncrasies of a particular codebase. PR Reviewer solves this with a **context system** that works like a configurable style guide:
+By decoupling tool execution from the LLM logic, MCP ensures that the reviewer can evolve as new static analysis utilities emerge, while the rest of the system remains stable.
-* **Default markdown files** – `contexts/defaults/code_review.md`, `security_review.md`, and `infra_review.md`. These contain bullet‑point rules, examples, and any organisational policies you want the agents to honour.
+---
-* **Per‑PR overrides** – The API payload includes a `context` object where you can supply a short string or a path to a custom markdown snippet. For a PR that introduces a new database, you might add “Prioritise parameterised queries for PostgreSQL”.
+## Architecture Overview
-* **Dynamic loading** – At request time, the system merges defaults with overrides, giving each crew a final set of guidelines that are injected into the LLM prompts. The result is feedback that reads “Your logging follows the team’s `logrus` conventions” rather than a generic “Consider using a structured logger”.
+Below is a high‑level diagram (described in prose) of the PR Reviewer service:
-Because the guidelines are plain markdown, they are easy to version‑control, review, and evolve alongside the code they govern.
+1. **API Layer (FastAPI)** – Exposes `/health` and `/review` endpoints. Incoming requests are validated against Pydantic models and placed onto an internal task queue.
-## Installation – getting the service onto your machine
+2. **Task Queue** – A lightweight in‑process queue (or optionally Redis) that enables asynchronous processing, preventing the API from blocking on long‑running analyses.
-### Prerequisites
+3. **Orchestrator (CrewAI Flow)** – Pulls a task from the queue, creates a fresh `ReviewState` object, and launches the three agents in parallel.
+4. **Agents**
+   * **CodeAgent** – Calls Semgrep via MCP, receives a list of rule violations, forwards them to the LLM for natural‑language explanation.
+   * **SecurityAgent** – Executes Trivy, parses vulnerability data, asks the LLM to assess severity and suggest mitigations.
+   * **InfraAgent** – Runs Hadolint and Checkov on Dockerfiles and Kubernetes manifests, then asks the LLM to verify best‑practice compliance.
+5. **LLM Factory** – Based on environment configuration, selects the appropriate provider (OpenAI, Anthropic, Ollama, etc.) and supplies a consistent `generate` method to all agents.
+6. **Result Aggregator** – Collects the three streams of feedback, synthesises a concise summary, and stores the final `ReviewResult` in a JSON response.
+7. **Persistence (optional)** – Results can be persisted to a PostgreSQL table or an S3 bucket for audit trails; this is not required for the core functionality.
-* Python 3.10–3.13 (the project uses modern type hints and Pydantic v2)
-* UV package manager (recommended for reproducible environments)
-* Git (to clone the repository)
-* Docker (optional, for containerised deployment)
+All components are containerised, with a single Dockerfile that builds the service and its dependencies. The modular design means you can replace the FastAPI layer with a gRPC server, swap the queue implementation, or add new agents without touching the existing code.
-### Local development workflow
+---
+
+## Detailed Agent Design
+
+### CodeAgent
+
+* **Input** – List of changed files (path, content, diff metadata).
+* **MCP Call** – `semgrep.run(files=..., config=default)` returns a JSON array of rule matches.
+* **LLM Prompt** – The agent constructs a prompt that includes the rule description, the offending code snippet, and any project‑specific style guidelines supplied in `contexts/code_review.md`.
+* **Output** – Human‑readable commentary, a severity rating, and a suggested fix.
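The CodeAgent's last step, turning findings plus guidelines into an LLM prompt, can be sketched as below. The `build_code_review_prompt` helper and its field names are assumptions for illustration; the article does not specify the project's real prompt template.

```python
def build_code_review_prompt(findings: list[dict], guidelines: str, diff: str) -> str:
    """Assemble a single review prompt from normalised findings, the project's
    markdown guidelines, and the changed code (illustrative helper, not the
    project's actual function)."""
    # One bullet per static-analysis finding, in a compact machine-stable format.
    bullet_list = "\n".join(
        f"- {f['path']}:{f['line']} [{f['rule_id']}] {f['message']}" for f in findings
    )
    return (
        "You are a senior code reviewer. Follow these project guidelines:\n"
        f"{guidelines}\n\n"
        "Static-analysis findings:\n"
        f"{bullet_list}\n\n"
        "Changed code:\n"
        f"{diff}\n\n"
        "Explain each finding, rate its severity, and suggest a concrete fix."
    )
```

Keeping the prompt assembly in one pure function makes it trivial to unit-test and to audit exactly what leaves the machine when a remote LLM provider is configured.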
+
+### SecurityAgent
+
+* **Input** – Full repository snapshot (required for Trivy to resolve dependencies).
+* **MCP Call** – `trivy.scan(repo_path)` yields CVE identifiers, package names, and severity levels.
+* **LLM Prompt** – The prompt merges CVE details with the repository’s security policy from `contexts/security_review.md`.
+* **Output** – Prioritised remediation steps, references to official advisories, and an impact assessment.
+
+### InfraAgent
+
+* **Input** – All infrastructure‑as‑code files (Dockerfile, Helm charts, Terraform).
+* **MCP Calls** –
+  * `hadolint.lint(dockerfile)` for Docker best practices.
+  * `checkov.scan(k8s_manifests)` for Kubernetes policy compliance.
+* **LLM Prompt** – Combines findings with `contexts/infra_review.md`.
+* **Output** – Recommendations on image layering, secret handling, and resource limits.
+
+Each agent runs in its own coroutine, allowing the orchestrator to overlap the agents’ I/O‑bound tool invocations and LLM calls (coroutines interleave waiting time rather than exploit multiple cores). Errors from any tool are caught, logged, and transformed into a graceful “unable to analyse” message rather than aborting the whole review.
+
+---
+
+## Contextual Guidelines: Making the Review Personal
+
+One of PR Reviewer’s differentiators is the ability to **import repository‑specific guidelines**. By default the service ships with three markdown files:
+
+* `code_review.md` – General coding conventions (e.g., PEP8, naming schemes).
+* `security_review.md` – Organizational security posture (e.g., “no hard‑coded credentials”).
+* `infra_review.md` – Infrastructure standards (e.g., “use non‑root user in Docker images”).
+
+These files are read at startup and cached. When a request includes a `context` object, the supplied snippets **override** the defaults for that particular review. This mechanism enables teams to enforce their own style without rewriting the underlying agents.
+
+For example, a project that prefers **Google’s Python style guide** can drop a custom `code_review.md` into the repository root; the API call can reference it via the `context` field, and the LLM will tailor its suggestions accordingly.
+
+---
+
+## Installation Guide
+
+### Prerequisites
+
+| Requirement | Minimum Version |
+|-------------|-----------------|
+| Python | 3.10 |
+| UV package manager | latest |
+| Git | any |
+| Docker (optional) | 20.10+ |
+
+### Local Development
+
+1. **Clone the repository**
-1. **Clone the repo**
 ```bash
 git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
 cd pr_reviewer
 ```
+
 2. **Install UV**
+
 ```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
-source $HOME/.local/bin/env
 ```
+
 3. **Create and activate a virtual environment**
+
 ```bash
 uv venv .venv
 source .venv/bin/activate
 ```
+
 4. **Install the package in editable mode**
+
 ```bash
 uv pip install -e .
 ```
-5. **Configure environment variables** – copy `.env.example` to `.env` and fill in your LLM provider credentials, preferred model name, and any optional limits.
-6. **Run the service**
+5. **Configure environment variables** – copy `.env.example` to `.env` and fill in the LLM credentials (e.g., `OPENAI_API_KEY`).
+
+6. **Run the FastAPI server**
+
 ```bash
-uv run uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
+uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
 ```
-You now have a local FastAPI server listening on port 8000, ready to accept review requests.
+The service will now be reachable at `http://localhost:8000`.
-### Docker‑based deployment
+### Docker Deployment
-For teams that prefer immutable infrastructure, a Dockerfile is provided:
+A single‑stage Dockerfile builds the application and its dependencies:
 ```dockerfile
 FROM python:3.12-slim
 WORKDIR /app
 COPY . .
-RUN pip install -e .
+RUN pip install uv && uv pip install --system -e .
 EXPOSE 8000
 CMD ["uvicorn", "pr_reviewer.main:app", "--host", "0.0.0.0", "--port", "8000"]
 ```
@@ -139,178 +196,281 @@ docker build -t pr-reviewer .
 docker run -p 8000:8000 --env-file .env pr-reviewer
 ```
-The container can be orchestrated with Kubernetes, Docker‑Compose, or any platform that supports OCI images.
+### Kubernetes (Optional)
-## API contract – talking to the reviewer
+The `k8s/` directory contains three manifests:
-PR Reviewer exposes two primary endpoints.
+* **Secret** – Holds LLM API keys.
+* **Deployment** – Scales the service; resource requests are modest (CPU 250m, Memory 256Mi).
-### Health check
+* **Service** – Exposes the API via a ClusterIP; an Ingress can be added for external access.
-```
-GET /api/v1/health
-```
+Apply with:
-A simple JSON payload `{ "status": "ok" }` confirms the service is alive and the LLM factory can be contacted.
+```bash
+kubectl apply -k k8s/
+```
-### Trigger review
-```
-POST /api/v1/review
-```
+---
-The request body is a JSON object containing:
+## Configuration Details
+
+### Environment Variables
+
+| Variable | Description | Example |
+|----------|-------------|---------|
+| `LLM_PROVIDER` | Chooses the LLM backend (`openai`, `anthropic`, `ollama`). | `openai` |
+| `OPENAI_API_KEY` | API key for OpenAI (if provider is `openai`). | `sk-...` |
+| `ANTHROPIC_API_KEY` | API key for Anthropic. | `...` |
+| `OLLAMA_HOST` | URL of the local Ollama server. | `http://localhost:11434` |
+| `MCP_CONFIG_PATH` | Path to a JSON file that maps tool names to MCP wrappers. | `configs/mcp.json` |
+| `REVIEW_TIMEOUT_SECONDS` | Maximum time a review may take before being aborted. | `120` |
-* `pr_id`, `title`, `description`
-* Repository information (`name`, `url`)
-* Source and target branch details (`branch`, `commit`)
-* An array of file objects (`path`, `content`, `status`, `additions`, `deletions`)
-* Optional `context` overrides for each crew
+All variables are documented in `.env.example`. Missing variables cause the service to fail fast, preventing ambiguous runtime errors.
-A minimal example (trimmed for brevity) looks like this:
+### Context Files
+
+The default guidelines live under `contexts/defaults/`. To customise:
+
+1. Create a `contexts/custom/` directory in your repository.
+2. Add `code_review.md`, `security_review.md`, or `infra_review.md` as needed.
+3. When invoking the API, set the `context` field to point to the custom files, e.g.:
 ```json
 {
-  "pr_id": "42",
-  "title": "Add health‑check endpoint",
-  "description": "Implements a basic /health route for the API",
-  "repo": { "name": "pr-reviewer", "url": "https://github.com/example/pr-reviewer" },
-  "source": { "branch": "feature/health", "commit": "a1b2c3" },
-  "target": { "branch": "main", "commit": "d4e5f6" },
-  "files": [
-    {
-      "path": "src/main.py",
-      "content": "def health(): return {'status': 'ok'}",
-      "status": "added",
-      "additions": 3,
-      "deletions": 0
-    }
-  ],
   "context": {
-    "code_review": "Follow PEP8, prefer type hints",
-    "security_review": "Check for open redirects",
-    "infra_review": "Dockerfile must use alpine base"
+    "code_review": "file://contexts/custom/code_review.md"
   }
 }
 ```
-The response contains a unique `review_id`, a `status` flag, a timestamp, and a `results` object with three sections (`code_review`, `security_review`, `infra_review`) plus a synthesized `summary`. Processing time is reported in seconds, enabling you to monitor performance trends.
+The service resolves `file://` URIs relative to the repository root, reads the markdown, and injects it into the LLM prompt.
-## Integrating with CI/CD – making reviews automatic
+---
-Because the API is HTTP‑based, wiring it into any pipeline is straightforward. Below are snippets for three popular CI systems; the actual YAML files are included in the repository.
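The fail-fast startup check mentioned in the configuration section can be sketched as follows. The `validate_env` helper, the provider-to-key mapping, and the error messages are illustrative assumptions for this article, not the project's actual code.

```python
# Assumption: which credential each backend requires, per the table above.
REQUIRED_BY_PROVIDER = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": "OLLAMA_HOST",
}

def validate_env(env: dict[str, str]) -> str:
    """Abort startup with a clear message if the chosen provider's
    credentials are absent, instead of failing later mid-review."""
    provider = env.get("LLM_PROVIDER", "")
    required_key = REQUIRED_BY_PROVIDER.get(provider)
    if required_key is None:
        raise SystemExit(f"Unknown or unset LLM_PROVIDER: {provider!r}")
    if not env.get(required_key):
        raise SystemExit(f"{required_key} must be set when LLM_PROVIDER={provider}")
    return provider
```

Raising at startup keeps misconfiguration errors out of the review path, where they would otherwise surface as opaque provider exceptions.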
+## API Usage
-### GitHub Actions (or compatible)
+
+### Health Check
+
+```http
+GET /api/v1/health
+```
+
+Returns a JSON payload `{ "status": "ok", "uptime_seconds": 342 }`. Useful for CI probes.
+
+### Trigger a PR Review
+
+```http
+POST /api/v1/review
+Content-Type: application/json
+```
+
+#### Request Body (abridged)
+
+```json
+{
+  "pr_id": "42",
+  "title": "Add authentication middleware",
+  "description": "Implements JWT validation for incoming requests.",
+  "repo": {
+    "name": "awesome-service",
+    "url": "https://github.com/example/awesome-service"
+  },
+  "source": {
+    "branch": "feature/auth-middleware",
+    "commit": "a1b2c3d"
+  },
+  "target": {
+    "branch": "main",
+    "commit": "d4e5f6g"
+  },
+  "files": [
+    {
+      "path": "src/auth.py",
+      "content": "def verify(token): ...",
+      "status": "added",
+      "additions": 45,
+      "deletions": 0
+    }
+  ],
+  "context": {
+    "code_review": "Follow Google Python Style Guide",
+    "security_review": "Disallow weak hashing algorithms",
+    "infra_review": "Base images must be from official repositories"
+  }
+}
+```
+
+#### Response Payload (abridged)
+
+```json
+{
+  "review_id": "c0f5e9b2-7d3a-4f1a-9c6e-2b5d8f1a9e3c",
+  "status": "completed",
+  "timestamp": "2026-05-14T18:12:34Z",
+  "results": {
+    "code_review": "The function `verify` lacks type hints and does not validate token expiry. Consider using `pydantic` models.",
+    "security_review": "No obvious vulnerabilities detected, but ensure the JWT secret is stored in a secret manager.",
+    "infra_review": "No Dockerfile changes detected; infra review skipped.",
+    "summary": "Overall the PR introduces necessary authentication logic but would benefit from type annotations and secret management."
+  },
+  "metadata": {
+    "processing_time_seconds": 38.7,
+    "pr_id": "42",
+    "repo": {
+      "name": "awesome-service",
+      "url": "https://github.com/example/awesome-service"
+    }
+  }
+}
+```
+
+The API is deliberately simple: a single POST triggers the whole pipeline, and the response contains both raw agent outputs and a synthesized summary. Clients can poll the `review_id` endpoint for status updates if they prefer asynchronous handling.
+
+---
+
+## Integration with CI/CD
+
+Because PR Reviewer exposes a REST endpoint, it can be called from any CI system that can execute `curl` or a lightweight HTTP client. Below is a generic example for a GitHub Actions workflow:
 ```yaml
 name: PR Review
 on:
   pull_request:
     types: [opened, synchronize]
+
 jobs:
   review:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v3
-      - name: Trigger PR Reviewer
-        env:
-          PR_REVIEWER_URL: ${{ secrets.PR_REVIEWER_URL }}
-          PR_REVIEWER_TOKEN: ${{ secrets.PR_REVIEWER_TOKEN }}
+      - name: Gather PR metadata
+        id: meta
         run: |
-          curl -X POST "$PR_REVIEWER_URL/api/v1/review" \
-            -H "Authorization: Bearer $PR_REVIEWER_TOKEN" \
+          echo "pr_id=${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
+          echo "repo_url=${{ github.event.pull_request.head.repo.clone_url }}" >> $GITHUB_OUTPUT
+      - name: Call PR Reviewer
+        env:
+          REVIEWER_URL: http://pr-reviewer.internal:8000
+        run: |
+          curl -s -X POST "$REVIEWER_URL/api/v1/review" \
             -H "Content-Type: application/json" \
-            -d @.github/pr_payload.json
+            -d @- <