Title: PR Reviewer - A deployable AI reviewer for your Repos
Date: 2026-05-09 18:37
Modified: 2026-05-09 18:37
Category: DevOps
Tags: ai, code-review, automation, open-source, devops
Slug: pr-reviewer-deployable-ai-reviewer
Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
Summary: An in-depth look at PR Reviewer, an open-source, locally deployable AI that automates code, security, and infrastructure reviews using CrewAI and MCP.
---
## Introduction: why a robot reviewer matters
Pull-request (PR) reviews have become the gatekeeper of software quality in modern development teams. Yet the human element that makes a review useful—attention to detail, consistency, and a willingness to flag the obvious—often collides with real-world pressures: sprint deadlines, inbox overload, and the occasional cat video on Slack. The result is a patchwork of rushed approvals, missed security checks, and style drift that slowly erodes a codebase's health.
Enter **PR Reviewer**, a self-hosted AI reviewer that brings three specialised agents to every PR, every time. By delegating the mechanical parts of a review—linting, vulnerability scanning, infrastructure sanity checks—to a deterministic, always-awake service, teams can free senior engineers to focus on architectural decisions, mentorship, and the nuanced conversations that no model can replace. This article walks through the motivation, design, and practical steps for getting PR Reviewer up and running in a production environment.
## The problem space: symptoms of manual review fatigue
Before diving into the solution, it helps to enumerate the pain points that most teams encounter:
1. **Inconsistent style enforcement**: Different reviewers apply different conventions, leading to a codebase that looks like a collage of personal preferences.
2. **Security blind spots**: Time-pressed developers may skip static analysis, allowing known CVEs or injection vectors to slip through.
3. **Infrastructure drift**: Dockerfiles, Helm charts, and Terraform scripts often evolve without a single source of truth, creating deployment-time surprises.
4. **Review bottlenecks**: When a single senior engineer is the “go-to” reviewer, their availability becomes a single point of failure.
5. **Context loss**: New contributors rarely have access to a team's style guide, security playbook, or infrastructure policy, so they guess.
These issues are not theoretical; they manifest as longer cycle times, higher post-release defect rates, and a growing maintenance burden. Automating the low-level checks while preserving the ability to inject team-specific guidance is the sweet spot PR Reviewer aims for.
## Core philosophy: “your standards, your infrastructure, your LLM”
PR Reviewer is deliberately built around three non-negotiables:
* **Customisable context**: Every review runs against a set of markdown-based guidelines that you supply. Whether you follow PEP 8, enforce OWASP Top 10 mitigations, or require a specific base image for Docker, the system respects those rules.
* **Self-hosted execution**: The service runs inside your own network, behind your firewall, on any platform that can run Docker or a Python virtual environment. No code leaves your premises unless you explicitly point it at an external LLM.
* **Pluggable LLM provider**: The LLM factory abstracts OpenAI, Anthropic, Ollama, or any future provider behind a common interface. You pick the model that matches your cost, latency, and data-privacy requirements.
By keeping the three pillars separate, PR Reviewer can evolve without forcing you to abandon existing policies or infrastructure investments.
## High-level architecture: how the pieces fit together
At a glance, the system consists of four logical layers:
1. **API Layer**: A lightweight FastAPI service exposing health-check and review-trigger endpoints. It validates incoming payloads, authenticates callers (if you enable it), and forwards the request downstream.
2. **Orchestration Layer**: Powered by **CrewAI**, this layer defines a *flow* that spins up three independent crews: code, security, and infrastructure. Each crew runs its own set of agents, each agent being a thin wrapper around a static-analysis tool or an LLM prompt.
3. **MCP Integration Layer**: The **Model Context Protocol** (MCP) bridges agents and external tools. For example, the code-review crew calls Semgrep via MCP, the security crew invokes Trivy, and the infra crew talks to Hadolint and Checkov. MCP also normalises the output so the orchestration layer can aggregate results.
4. **State & Context Layer**: Pydantic models capture the PR metadata, file diffs, and any per-PR context overrides. A context-resolution subsystem loads the default markdown guidelines from `contexts/defaults/` and merges them with overrides supplied in the API request.
The flow is deliberately linear: the API receives a request, the orchestration layer launches the three crews in parallel, each crew returns its findings, and a final summariser agent synthesises a human-readable report. Because the crews are independent, you can reorder them, add new crews (e.g., a licence-compliance crew), or run them conditionally based on the size of the PR.
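The fan-out/fan-in shape of that flow can be sketched in plain Python. The crew functions and result shapes below are illustrative stand-ins, not the project's actual CrewAI wiring:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three crews; the real project wires
# these up through CrewAI, but the control flow has the same shape.
def code_crew(pr):
    return {"crew": "code", "findings": [f"reviewed {len(pr['files'])} file(s)"]}

def security_crew(pr):
    return {"crew": "security", "findings": []}

def infra_crew(pr):
    return {"crew": "infra", "findings": []}

def run_review(pr):
    crews = [code_crew, security_crew, infra_crew]
    # Fan out: the three crews run in parallel, independent of each other.
    with ThreadPoolExecutor(max_workers=len(crews)) as pool:
        results = list(pool.map(lambda crew: crew(pr), crews))
    # Fan in: aggregate the per-crew findings into a single report.
    return {r["crew"]: r["findings"] for r in results}

report = run_review({"files": ["src/main.py"]})
```

Because each crew only reads the shared PR payload and returns its own result, adding a fourth crew is a matter of appending another callable to the list.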
## The three specialised crews: what they actually do
### Code Review Crew
* **Toolchain**: Semgrep, accessed through MCP.
* **Focus**: Syntax correctness, anti-patterns, complexity metrics, and adherence to language-specific style guides.
* **Typical output**: “Unused import `json` in `utils.py`”, “Function `process_data` exceeds cyclomatic complexity of 15”, “Prefer f-strings over `%` formatting”.
### Security Review Crew
* **Toolchain**: Trivy (native MCP integration) plus optional custom CVE databases.
* **Focus**: Known vulnerabilities in dependencies, insecure configuration flags, and common injection patterns.
* **Typical output**: “CVE-2023-XXXXX found in `requests` 2.28.0”, “Hard-coded AWS secret key in `config.py`”, “Potential SQL injection in `execute_query`”.
### Infrastructure Review Crew
* **Toolchain**: Hadolint for Dockerfiles, Checkov for IaC (Terraform, CloudFormation, Kubernetes manifests).
* **Focus**: Container best practices, least-privilege IAM roles, resource limits, and drift detection.
* **Typical output**: “Use a non-root user in the Dockerfile”, “Missing `resources.limits.cpu` in Kubernetes Deployment”, “Terraform `aws_s3_bucket` lacks server-side encryption”.
Each crew receives the same PR metadata but works with a slice of the file set relevant to its domain. The separation keeps the agents lightweight and makes debugging straightforward: if a security finding looks odd, you know it originated from the security crew.
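One way to picture that slicing is a simple router keyed on file paths. The routing rules and dictionary keys below are assumptions for illustration, not the project's actual logic:

```python
from pathlib import PurePosixPath

# Illustrative routing rules; the real project may use different globs.
INFRA_NAMES = {"Dockerfile"}
INFRA_SUFFIXES = {".tf", ".yaml", ".yml"}

def slice_files(paths):
    """Split a PR's file list into per-crew slices."""
    slices = {"code": [], "security": [], "infra": []}
    for path in paths:
        p = PurePosixPath(path)
        if p.name in INFRA_NAMES or p.suffix in INFRA_SUFFIXES:
            slices["infra"].append(path)
        else:
            slices["code"].append(path)
        # The security crew scans everything: vulnerabilities can hide
        # in application code and manifests alike.
        slices["security"].append(path)
    return slices

slices = slice_files(["src/main.py", "Dockerfile", "deploy/app.yaml"])
```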
## Context system: teaching the robot your team's way
One of the most common complaints about generic AI reviewers is that they ignore the idiosyncrasies of a particular codebase. PR Reviewer solves this with a **context system** that works like a configurable style guide:
* **Default markdown files**: `contexts/defaults/code_review.md`, `security_review.md`, and `infra_review.md`. These contain bullet-point rules, examples, and any organisational policies you want the agents to honour.
* **Per-PR overrides**: The API payload includes a `context` object where you can supply a short string or a path to a custom markdown snippet. For a PR that introduces a new database, you might add “Prioritise parameterised queries for PostgreSQL”.
* **Dynamic loading**: At request time, the system merges defaults with overrides, giving each crew a final set of guidelines that are injected into the LLM prompts. The result is feedback that reads “Your logging follows the team's `logrus` conventions” rather than a generic “Consider using a structured logger”.
Because the guidelines are plain markdown, they are easy to version-control, review, and evolve alongside the code they govern.
## Installation: getting the service onto your machine
### Prerequisites
* Python 3.10–3.13 (the project uses modern type hints and Pydantic v2)
* UV package manager (recommended for reproducible environments)
* Git (to clone the repository)
* Docker (optional, for containerised deployment)
### Local development workflow
1. **Clone the repo**
```bash
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer
```
2. **Install UV**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```
3. **Create and activate a virtual environment**
```bash
uv venv .venv
source .venv/bin/activate
```
4. **Install the package in editable mode**
```bash
uv pip install -e .
```
5. **Configure environment variables**: copy `.env.example` to `.env` and fill in your LLM provider credentials, preferred model name, and any optional limits.
6. **Run the service**
```bash
uv run uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
```
You now have a local FastAPI server listening on port 8000, ready to accept review requests.
### Docker-based deployment
For teams that prefer immutable infrastructure, a Dockerfile is provided:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e .
EXPOSE 8000
CMD ["uvicorn", "pr_reviewer.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```bash
docker build -t pr-reviewer .
docker run -p 8000:8000 --env-file .env pr-reviewer
```
The container can be orchestrated with Kubernetes, Docker Compose, or any platform that supports OCI images.
## API contract: talking to the reviewer
PR Reviewer exposes two primary endpoints.
### Health check
```
GET /api/v1/health
```
A simple JSON payload `{ "status": "ok" }` confirms the service is alive and the LLM factory can be contacted.
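For scripted readiness checks (in CI, for instance), a small stdlib-only poller against that endpoint could look like this; the `wait_for_service` helper is a hypothetical convenience, not part of the project:

```python
import json
import time
import urllib.request

def wait_for_service(base_url: str, timeout: float = 30.0) -> bool:
    """Poll /api/v1/health until the service reports ok or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            url = f"{base_url}/api/v1/health"
            with urllib.request.urlopen(url, timeout=5) as resp:
                if json.load(resp).get("status") == "ok":
                    return True
        except (OSError, ValueError):
            pass  # service not up yet (or bad JSON); retry shortly
        time.sleep(1)
    return False
```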
### Trigger review
```
POST /api/v1/review
```
The request body is a JSON object containing:
* `pr_id`, `title`, `description`
* Repository information (`name`, `url`)
* Source and target branch details (`branch`, `commit`)
* An array of file objects (`path`, `content`, `status`, `additions`, `deletions`)
* Optional `context` overrides for each crew
A minimal example (trimmed for brevity) looks like this:
```json
{
"pr_id": "42",
"title": "Add healthcheck endpoint",
"description": "Implements a basic /health route for the API",
"repo": { "name": "pr-reviewer", "url": "https://github.com/example/pr-reviewer" },
"source": { "branch": "feature/health", "commit": "a1b2c3" },
"target": { "branch": "main", "commit": "d4e5f6" },
"files": [
{
"path": "src/main.py",
"content": "def health(): return {'status': 'ok'}",
"status": "added",
"additions": 3,
"deletions": 0
}
],
"context": {
"code_review": "Follow PEP8, prefer type hints",
"security_review": "Check for open redirects",
"infra_review": "Dockerfile must use alpine base"
}
}
```
The response contains a unique `review_id`, a `status` flag, a timestamp, and a `results` object with three sections (`code_review`, `security_review`, `infra_review`) plus a synthesised `summary`. Processing time is reported in seconds, enabling you to monitor performance trends.
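Assuming a response of roughly that shape (the field names below are inferred from the description, so treat them as illustrative), a client might tally findings like so:

```python
# Illustrative response shape, based on the fields described above.
response = {
    "review_id": "rev-001",
    "status": "completed",
    "timestamp": "2026-05-09T18:37:00Z",
    "processing_time_seconds": 12.4,
    "results": {
        "code_review": {"findings": ["Prefer f-strings over % formatting"]},
        "security_review": {"findings": []},
        "infra_review": {"findings": []},
        "summary": "1 style issue, no security or infra findings.",
    },
}

def count_findings(resp: dict) -> int:
    """Total findings across the three crews (summary excluded)."""
    return sum(
        len(section["findings"])
        for name, section in resp["results"].items()
        if name != "summary"
    )

total = count_findings(response)
```

A count like this is a natural input for CI gating, e.g. failing the pipeline when the total exceeds a threshold.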
## Integrating with CI/CD: making reviews automatic
Because the API is HTTP-based, wiring it into any pipeline is straightforward. Below are snippets for three popular CI systems; the actual YAML files are included in the repository.
### GitHub Actions (or compatible)
```yaml
name: PR Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Trigger PR Reviewer
env:
PR_REVIEWER_URL: ${{ secrets.PR_REVIEWER_URL }}
PR_REVIEWER_TOKEN: ${{ secrets.PR_REVIEWER_TOKEN }}
run: |
curl -X POST "$PR_REVIEWER_URL/api/v1/review" \
-H "Authorization: Bearer $PR_REVIEWER_TOKEN" \
-H "Content-Type: application/json" \
-d @.github/pr_payload.json
```
The payload file is generated on the fly using GitHub's context variables, ensuring the service receives the exact diff.
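The assembly of that payload can be sketched in Python. The `build_payload` helper is hypothetical, mirroring the fields listed in the API contract section; how you gather the per-file data (e.g. from git plumbing commands) is up to your CI:

```python
import json

def build_payload(pr_id, title, description, repo, source, target, files):
    """Assemble a body shaped like the /api/v1/review contract.

    `files` holds (path, content, status, additions, deletions) tuples
    gathered from the CI checkout.
    """
    return {
        "pr_id": str(pr_id),
        "title": title,
        "description": description,
        "repo": repo,
        "source": source,
        "target": target,
        "files": [
            {"path": p, "content": c, "status": s, "additions": a, "deletions": d}
            for p, c, s, a, d in files
        ],
    }

payload = build_payload(
    "42", "Add healthcheck endpoint", "Implements a basic /health route",
    {"name": "pr-reviewer", "url": "https://github.com/example/pr-reviewer"},
    {"branch": "feature/health", "commit": "a1b2c3"},
    {"branch": "main", "commit": "d4e5f6"},
    [("src/main.py", "def health(): return {'status': 'ok'}", "added", 3, 0)],
)
body = json.dumps(payload)  # what the curl step sends as -d
```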
### GitLab CI
```yaml
review:
stage: test
image: curlimages/curl:7.85.0
script:
- curl -X POST "$PR_REVIEWER_URL/api/v1/review" \
-H "Authorization: Bearer $PR_REVIEWER_TOKEN" \
-H "Content-Type: application/json" \
-d "$(cat $CI_PROJECT_DIR/.gitlab/pr_payload.json)"
```
### Gitea Actions
A minimal workflow lives under `.gitea/workflows/deploy.yaml` and follows the same pattern: checkout, generate a JSON payload, POST to the service.
By treating the review as a first-class CI step, you guarantee that every PR receives a baseline set of checks before any human eyes ever see the diff.
## Security considerations: keeping your code safe
Running an AI-powered reviewer inevitably raises questions about data leakage and attack surface. PR Reviewer mitigates these concerns in several ways:
1. **Local execution**: The service runs inside your own network. No code is sent to a third-party endpoint unless you configure an external LLM that does so. If you prefer a fully offline model, Ollama can be run locally and pointed to via the LLM factory.
2. **Least-privilege secrets**: API keys for LLM providers are stored in Kubernetes Secrets (or Docker secrets) and never baked into container images. The service reads them at runtime and discards them after use.
3. **Prompt sanitisation**: CrewAI's latest release includes mitigations against prompt injection. All user-supplied context strings are escaped before being interpolated into LLM prompts.
4. **MCP sandboxing**: The static-analysis tools run in isolated subprocesses with limited filesystem access. The MCP server enforces a whitelist of allowed binaries, preventing arbitrary command execution.
5. **Version pinning**: The Docker image pins specific versions of Semgrep, Trivy, Hadolint, and Checkov. Regular CI runs verify that no new CVEs have been introduced into the toolchain itself.
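A whitelist-enforcing wrapper of the kind point 4 describes might look like the following sketch; the `ALLOWED_TOOLS` set and `run_tool` helper are illustrative, not the MCP server's actual implementation:

```python
import shutil
import subprocess

# Illustrative whitelist; the real MCP server's list will differ.
ALLOWED_TOOLS = {"semgrep", "trivy", "hadolint", "checkov"}

def run_tool(tool: str, args: list[str], timeout: int = 120):
    """Run a static-analysis tool only if it is on the whitelist."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"{tool} is not an allowed binary")
    binary = shutil.which(tool)
    if binary is None:
        raise FileNotFoundError(f"{tool} is not installed")
    # Isolated subprocess: no shell, bounded runtime, captured output.
    return subprocess.run(
        [binary, *args], capture_output=True, text=True, timeout=timeout
    )
```

Rejecting unknown binaries before resolving them means a compromised prompt cannot talk the agent into executing arbitrary commands.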
Nevertheless, no system is impervious. Teams should treat PR Reviewer as a *tool* that reduces risk rather than eliminating it. Periodic audits of the deployed image, rotating LLM credentials, and monitoring API logs for anomalous usage are recommended best practices.
## Open-source ethos: why the code is free for everyone
PR Reviewer is released under the MIT licence, which means you can fork, modify, and redistribute the software without restriction. The decision to go open source stems from three motivations:
* **Community feedback**: Real-world usage uncovers edge cases that a single maintainer might never encounter. Pull requests from the community improve coverage for languages, frameworks, and infrastructure patterns.
* **Transparency**: When you run the reviewer on your own hardware, you can inspect every line of code, ensuring there are no hidden data-exfiltration mechanisms.
* **Shared standards**: By publishing the default context files, we provide a baseline set of best practices that any team can adopt, adapt, or replace. Over time, a curated collection of community-contributed guidelines could become a de facto standard for AI-assisted reviews.
If you find a bug, missing language support, or simply have an idea for a new crew (e.g., a licence-compliance crew), feel free to open an issue or submit a PR. The contribution guide in the repository walks you through the process.
## The road ahead: future directions and open challenges
PR Reviewer is functional today, but the landscape of AI-assisted development is moving fast. Planned enhancements include:
1. **Learning-augmented context**: Instead of static markdown, we are experimenting with a lightweight model that extracts style patterns from the existing codebase and suggests context updates automatically.
2. **Fine-grained permissioning**: A role-based access control layer that lets you expose only the security crew to external contributors while keeping the infra crew internal.
3. **Multi-model orchestration**: Some teams may want to use a fast, cheap model for linting and a larger, more capable model for nuanced security reasoning. The LLM factory will soon support per-crew model selection.
4. **Metrics dashboard**: A Grafana-compatible endpoint that emits counters for review duration, number of findings per crew, and trend lines for defect density over time.
5. **Extended toolchain**: Plugins for SonarQube, Snyk, and custom linters via MCP wrappers, making the system a one-stop shop for all static-analysis needs.
We welcome collaborators to take ownership of any of these initiatives. The architecture is deliberately modular, so adding a new crew or swapping out a tool should be a matter of a few configuration files and a small amount of glue code.
## Getting started: your first robot review in minutes
If you've read this far, you're probably ready to give PR Reviewer a spin. Here's a quick checklist:
1. **Clone the repo**: `git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git`
2. **Set up a virtual environment**: Follow the UV steps in the Installation section.
3. **Create a `.env` file**: Populate it with your LLM provider key and preferred model name.
4. **Run the service**: `uv run uvicorn pr_reviewer.main:app --reload`
5. **Send a test request**: Use `curl` or Postman with the minimal JSON payload shown earlier.
6. **Inspect the response**: You should see three sections of feedback plus a concise summary.
7. **Iterate on context**: Edit `contexts/defaults/*.md` to reflect your team's conventions, then rerun the test.
Once you're comfortable locally, spin up the Docker image in your CI environment and let the reviewer become a permanent gatekeeper for every pull request.
## The honest truth: what PR Reviewer isn't
No amount of AI can replace the human judgment that comes from years of experience. PR Reviewer does **not**:
* Write business logic for you.
* Replace architectural reviews or design discussions.
* Guarantee zero security vulnerabilities—only that known patterns are flagged.
* Provide 100% accurate natural-language explanations; occasional false positives are expected.
Think of the system as a *spell-checker for code*: it catches the low-hanging fruit, freeing senior engineers to focus on the hard problems that truly add value.
## Call to action: join the community
PR Reviewer is more than a personal side project; it's an invitation to the wider developer community to shape the future of automated code quality. Whether you:
* Deploy it in a production pipeline and share performance metrics,
* Contribute a new crew for a language or framework you love,
* Polish the default context files to match industry standards,
your involvement makes the tool better for everyone. The repository lives at <https://git.aridgwayweb.com/armistace/pr_reviewer>. Clone it, spin it up, and start reviewing pull requests with a robot that never sleeps, never gets distracted by cat memes, and always respects the guidelines you set.
Happy reviewing, and may your CI pipelines be ever green.