pr_reviewer__a_deployable_ai_reviewer_for_your_repos #25
@ -0,0 +1,309 @@
|
||||
Title: PR Reviewer - A deployable AI reviewer for your Repos
|
||||
Date: 2026-05-21 18:30
|
||||
Modified: 2026-05-21 18:30
|
||||
Category: DevOps
|
||||
Tags: ai, code-review, automation, devops, open-source, ai_content, not_human_content
|
||||
Slug: pr-reviewer-deployable-ai-reviewer
|
||||
Authors: Andrew Ridgway... And Friends - glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
|
||||
Summary: An in‑depth look at PR Reviewer, a self‑hosted, LLM‑agnostic AI system that automates code, security and infrastructure reviews for any Git repository.
|
||||
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
Pull requests (PRs) are the lifeblood of modern software development. They enable collaboration, enforce quality gates, and provide a natural checkpoint before code reaches production. Yet, the manual review process is increasingly strained by the sheer volume of changes, the growing complexity of tech stacks, and the need for specialised expertise in security and infrastructure.
|
||||
|
||||
Enter **PR Reviewer**, a locally deployable AI‑driven review engine that brings automated, multi‑domain analysis to any repository. Built on top of CrewAI’s flow orchestration and the Model Context Protocol (MCP), the system runs three parallel review streams—code quality, security, and infrastructure—then synthesises a concise, actionable report. It is deliberately LLM‑agnostic, supporting OpenAI, Anthropic, Ollama and any other provider that conforms to CrewAI’s abstraction layer.
|
||||
|
||||
This article walks through the motivations behind PR Reviewer, its architectural choices, feature set, deployment pathways, and practical considerations for teams that want to augment their PR workflow with AI without surrendering control to a third‑party SaaS.
|
||||
|
||||
## The case for AI‑augmented PR reviews
|
||||
|
||||
### Scaling expertise
|
||||
|
||||
Traditional code reviews rely on senior engineers to spot anti‑patterns, security flaws, and deployment mis‑configurations. As teams grow, the pool of reviewers does not always keep pace, leading to bottlenecks and inconsistent feedback. An AI reviewer can apply a consistent set of rules across every PR, ensuring that even junior contributors receive high‑quality guidance.
|
||||
|
||||
### Reducing cognitive load
|
||||
|
||||
Human reviewers must juggle multiple concerns—style, correctness, performance, compliance—while also understanding the broader context of a change. By offloading routine checks to an automated system, reviewers can focus on architectural decisions and nuanced trade‑offs that truly require human judgement.
|
||||
|
||||
### Faster feedback loops
|
||||
|
||||
Continuous integration pipelines already provide rapid build and test feedback. Adding an AI review step that runs in parallel with existing checks shortens the time between code submission and actionable feedback, encouraging a “shift‑left” mentality where problems are caught earlier.
|
||||
|
||||
### Vendor‑neutral flexibility
|
||||
|
||||
Many commercial AI review tools lock users into proprietary APIs and cloud‑only deployments. PR Reviewer’s design deliberately avoids vendor lock‑in. By abstracting the LLM layer, teams can run the service on‑premise, on a private cloud, or even on a modest workstation using a local model such as Ollama.
|
||||
|
||||
## Core concepts
|
||||
|
||||
### CrewAI flows
|
||||
|
||||
CrewAI provides a lightweight framework for orchestrating multiple "crews" (teams of agents) that each perform a specialised task. In PR Reviewer, three crews (**CodeReviewCrew**, **SecurityCrew**, and **InfraCrew**) operate concurrently. Each crew receives the same PR context, runs its own analysis toolchain (Semgrep, Trivy, and Hadolint with Checkov, respectively), and returns a structured narrative.
|
||||
|
||||
### Model Context Protocol (MCP)
|
||||
|
||||
MCP standardises how external tools are exposed to an LLM. Rather than feeding raw tool output to the model, PR Reviewer wraps each result in a JSON structure that includes severity, location, and remediation suggestions. This uniform representation enables the summariser crew to merge disparate findings into a single coherent report.
|
||||
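As a rough illustration of such a uniform representation, the sketch below models one finding as a Python dataclass and serialises it to JSON. The class name `Finding`, its field names, and the `tool` field are assumptions made for this example; the project's actual schema may look different.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# Severity levels as named in this article.
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class Finding:
    """One normalised tool finding (illustrative shape, not the project's schema)."""

    tool: str          # scanner that produced the finding
    severity: str      # one of SEVERITIES
    path: str          # file the finding refers to
    start_line: int
    end_line: int
    message: str
    remediation: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values that would make findings from different tools incomparable.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.end_line < self.start_line:
            raise ValueError("end_line must not precede start_line")


def to_json(findings: List[Finding]) -> str:
    """Serialise findings into a JSON array that a summariser could consume."""
    return json.dumps([asdict(f) for f in findings], indent=2)


example = Finding(
    tool="hadolint",
    severity="medium",
    path="Dockerfile",
    start_line=3,
    end_line=3,
    message="Base image is not pinned to a specific tag.",
    remediation="Pin the base image to a fixed version tag.",
)
print(to_json([example]))
```

Validating at construction time means a malformed finding fails in the wrapper that produced it rather than later in the summariser.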
|
||||
### Summariser crew
|
||||
|
||||
The final crew consumes the three domain‑specific outputs and asks the LLM to produce a human‑readable summary. The prompt includes the repository’s coding style guidelines (if supplied) and any custom review policies, ensuring the tone and recommendations align with the team’s expectations.
|
||||
|
||||
## Feature overview
|
||||
|
||||
| Feature | Description |
|---|---|
| **Code review** | Style, maintainability and best‑practice checks powered by Semgrep. |
| **Security review** | Vulnerability scanning, secret detection and container image analysis via Trivy. |
| **Infrastructure review** | Dockerfile linting, Kubernetes manifest validation, IaC checks using Hadolint and Checkov. |
| **Summarisation** | Consolidated, actionable report generated by an LLM. |
| **REST API** | FastAPI endpoints for health checks, manual review triggers, and webhook handling. |
| **Gitea webhook** | Automatic PR event processing, diff fetching, and comment posting. |
| **Dockerised** | Multi‑stage build with all dependencies baked in. |
| **Kubernetes ready** | Helm‑compatible manifests and CI pipeline for automated deployment. |
| **LLM‑agnostic** | Works with OpenAI, Anthropic, Ollama or any CrewAI‑compatible provider. |
| **Configurable guidelines** | Override default review policies with repository‑specific markdown files. |
|
||||
|
||||
## Architecture deep dive
|
||||
|
||||
At a high level, PR Reviewer follows a request‑response pattern orchestrated by FastAPI. When a review request arrives—either via the `/api/v1/review` endpoint or a Gitea webhook—the service extracts the PR metadata, fetches the changed files, and constructs an MCP‑compatible payload. This payload is then dispatched to the three review crews in parallel.
|
||||
|
||||
```
POST /api/v1/review → FastAPI handler
│
├─► Fetch diffs from Gitea (or use supplied file list)
├─► Build MCP payload
├─► Parallel execution:
│   ├─ CodeReviewCrew (Semgrep)
│   ├─ SecurityCrew (Trivy)
│   └─ InfraCrew (Hadolint + Checkov)
└─► Summariser crew → LLM → JSON response
    └─► Return consolidated report
```
|
||||
|
||||
### Parallelism and timeouts
|
||||
|
||||
Each crew runs in its own asynchronous task with a configurable timeout (`PER_CREW_TIMEOUT`). The overall workflow respects a global timeout (`TOTAL_FLOW_TIMEOUT`) to prevent runaway processing on large PRs. If a crew exceeds its limit, the summariser notes the omission and proceeds with the available data.
|
||||
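The timeout behaviour described above can be sketched with plain `asyncio`. This is a minimal illustration, not the project's orchestration code: the function names `run_crew` and `run_flow` and the two demo crews are hypothetical, and the timeout values are shrunk so the example finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Tiny demo values in seconds; the article's defaults are 300 and 600.
PER_CREW_TIMEOUT = 0.2
TOTAL_FLOW_TIMEOUT = 2.0

Crew = Callable[[], Awaitable[str]]


async def run_crew(name: str, crew: Crew) -> Tuple[str, Optional[str]]:
    """Run one crew; a timeout yields (name, None) instead of an exception."""
    try:
        return name, await asyncio.wait_for(crew(), timeout=PER_CREW_TIMEOUT)
    except asyncio.TimeoutError:
        return name, None


async def run_flow(crews: Dict[str, Crew]) -> Dict[str, Optional[str]]:
    """Run every crew concurrently under one global deadline.

    If the global deadline expires, asyncio.TimeoutError propagates to the caller.
    """
    pending = [run_crew(name, crew) for name, crew in crews.items()]
    results = await asyncio.wait_for(asyncio.gather(*pending), timeout=TOTAL_FLOW_TIMEOUT)
    return dict(results)


async def fast_crew() -> str:
    return "no issues found"


async def slow_crew() -> str:
    await asyncio.sleep(5)  # longer than PER_CREW_TIMEOUT, so it is cancelled
    return "never returned"


outcome = asyncio.run(run_flow({"code": fast_crew, "security": slow_crew}))
skipped = [name for name, result in outcome.items() if result is None]
print(outcome)
print("crews omitted from the summary:", skipped)
```

Returning `None` for a timed-out crew, rather than raising, lets the remaining results flow through to the summariser, which can then mention the omission.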
|
||||
### Data flow and persistence
|
||||
|
||||
PR Reviewer is deliberately stateless. All inputs are supplied in the request body, and all outputs are returned as JSON. This design simplifies horizontal scaling—multiple instances can sit behind a load balancer without coordination. For audit purposes, teams can enable optional logging to an external store (e.g., Elasticsearch) via environment variables.
|
||||
|
||||
## Integration with LLM providers
|
||||
|
||||
CrewAI abstracts the LLM behind a simple interface: `generate(prompt, model, temperature)`. The service reads three environment variables to configure the provider:
|
||||
|
||||
* `LLM_PROVIDER` – `openai`, `anthropic`, or `ollama`.
|
||||
* `LLM_MODEL` – model identifier (e.g., `gpt-4`, `claude-3-sonnet`, `gemma4:31b-cloud`).
|
||||
* `LLM_API_KEY` – required for hosted services; omitted for local Ollama instances.
|
||||
|
||||
Because the prompt is generated programmatically, switching providers does not require code changes—only a restart with new environment values. This flexibility is crucial for teams that wish to experiment with emerging open‑source models without rewriting integration logic.
|
||||
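A minimal sketch of reading and validating these three variables might look like the following. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, and restricting the provider to the three values listed above is an assumption; the real service may accept other CrewAI-compatible providers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider names taken from the list above; the split into hosted and local is assumed.
HOSTED_PROVIDERS = {"openai", "anthropic"}
LOCAL_PROVIDERS = {"ollama"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    api_key: Optional[str]


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM_PROVIDER, LLM_MODEL and LLM_API_KEY, failing fast on bad input."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    api_key = env.get("LLM_API_KEY") or None

    if provider not in HOSTED_PROVIDERS | LOCAL_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    if provider in HOSTED_PROVIDERS and not api_key:
        raise ValueError(f"LLM_API_KEY is required for {provider}")
    return LLMSettings(provider, model, api_key)


# A local Ollama setup needs no API key.
settings = load_llm_settings({"LLM_PROVIDER": "ollama", "LLM_MODEL": "gemma4:31b-cloud"})
print(settings)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.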
|
||||
## Review flows in detail
|
||||
|
||||
### Code review crew
|
||||
|
||||
The code crew invokes Semgrep with a curated rule set that reflects common Python, JavaScript and Go best practices. Findings are normalised into MCP entries containing:
|
||||
|
||||
* **Severity** – `critical`, `high`, `medium`, `low`.
|
||||
* **Location** – file path and line range.
|
||||
* **Message** – concise description of the issue.
|
||||
* **Remediation** – suggested code change or reference to documentation.
|
||||
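As an illustration of that normalisation step, the sketch below converts Semgrep's JSON report (the `results` array produced by `semgrep --json`) into entries with the four fields listed above. The function name `normalise_semgrep`, the severity mapping, and the fallback remediation text are assumptions for this example, not the project's actual rules.

```python
import json
from typing import List

# Assumed mapping from Semgrep severities to the levels listed above.
SEVERITY_MAP = {"ERROR": "high", "WARNING": "medium", "INFO": "low"}


def normalise_semgrep(raw_json: str) -> List[dict]:
    """Convert a Semgrep JSON report into uniform finding entries."""
    report = json.loads(raw_json)
    entries = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        entries.append({
            # Unknown severities fall back to "low" in this sketch.
            "severity": SEVERITY_MAP.get(extra.get("severity", ""), "low"),
            "location": {
                "path": result["path"],
                "start_line": result["start"]["line"],
                "end_line": result["end"]["line"],
            },
            "message": extra.get("message", "").strip(),
            # Semgrep only includes "fix" when the rule defines an autofix.
            "remediation": extra.get("fix") or f"See rule {result['check_id']}",
        })
    return entries


sample = json.dumps({
    "results": [{
        "check_id": "python.lang.security.audit.eval-detected",
        "path": "app/utils.py",
        "start": {"line": 12, "col": 5},
        "end": {"line": 12, "col": 24},
        "extra": {"message": "Detected use of eval().", "severity": "WARNING"},
    }]
})
print(json.dumps(normalise_semgrep(sample), indent=2))
```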
|
||||
If a repository supplies a custom `code_review.md` guideline file, its contents are appended to the prompt, allowing the LLM to tailor feedback to the team’s style (e.g., preferring f‑strings over `%` formatting).
|
||||
|
||||
### Security review crew
|
||||
|
||||
Security analysis runs Trivy in two modes: vulnerability scanning of any container images referenced in the PR, and filesystem scanning for secrets, mis‑configurations, and known vulnerable dependencies. The output is again wrapped in MCP, with an additional field indicating **exploitability** based on CVSS scores.
|
||||
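The article does not spell out how the exploitability field is derived. One plausible approach, sketched here purely as an assumption, is to map the CVSS base score onto the standard CVSS v3.x qualitative bands; the function name `cvss_band` is hypothetical.

```python
def cvss_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS scores range from 0.0 to 10.0, got {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"        # 0.1 to 3.9
    if score < 7.0:
        return "medium"     # 4.0 to 6.9
    if score < 9.0:
        return "high"       # 7.0 to 8.9
    return "critical"       # 9.0 to 10.0


for score in (0.0, 3.9, 5.3, 7.5, 9.8):
    print(score, cvss_band(score))
```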
|
||||
### Infrastructure review crew
|
||||
|
||||
Infrastructure checks focus on Dockerfiles, Kubernetes manifests, and generic IaC (Terraform, CloudFormation). Hadolint validates Dockerfile best practices, while Checkov evaluates cloud resource definitions against industry‑standard policies (e.g., CIS benchmarks). The crew also respects any `infra_review.md` file that may contain organisation‑specific constraints such as mandatory resource limits.
|
||||
|
||||
### Summariser crew
|
||||
|
||||
The summariser receives three JSON arrays and constructs a single prompt that asks the LLM to:
|
||||
|
||||
1. Produce an executive summary of the overall health of the PR.
|
||||
2. List the top‑5 findings across all domains, ordered by severity.
|
||||
3. Provide actionable recommendations, grouped by domain.
|
||||
4. Highlight any deviations from the repository’s own guidelines.
|
||||
|
||||
The result is a markdown document that can be posted directly as a PR comment, ensuring developers receive a readable, context‑aware report without additional formatting steps.
|
||||
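A simplified sketch of how such a prompt could be assembled is shown below. The helper names `top_findings` and `build_prompt`, the dictionary shape of a finding, and the prompt wording are all assumptions for illustration; the project's real prompt is not reproduced here.

```python
from typing import Dict, List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_findings(findings_by_domain: Dict[str, List[dict]], limit: int = 5) -> List[dict]:
    """Merge per-domain findings and keep the most severe ones."""
    merged = [
        {**finding, "domain": domain}
        for domain, findings in findings_by_domain.items()
        for finding in findings
    ]
    # sorted() is stable, so findings of equal severity keep their input order.
    return sorted(merged, key=lambda f: SEVERITY_ORDER[f["severity"]])[:limit]


def build_prompt(findings_by_domain: Dict[str, List[dict]], guidelines: str = "") -> str:
    """Build a single prompt covering the four tasks listed above."""
    lines = [
        "You are reviewing a pull request.",
        "1. Write an executive summary of the overall health of the PR.",
        "2. Discuss the top findings listed below, most severe first.",
        "3. Give actionable recommendations, grouped by domain.",
        "4. Highlight deviations from the repository guidelines.",
        "",
        "Top findings:",
    ]
    for f in top_findings(findings_by_domain):
        lines.append(f"- [{f['severity']}] ({f['domain']}) {f['message']}")
    if guidelines:
        lines += ["", "Repository guidelines:", guidelines]
    return "\n".join(lines)


findings = {
    "code": [{"severity": "low", "message": "Unused import"}],
    "security": [{"severity": "critical", "message": "Hard-coded API token"}],
    "infra": [{"severity": "medium", "message": "Container runs as root"}],
}
print(build_prompt(findings, guidelines="Prefer f-strings over % formatting."))
```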
|
||||
## API design
|
||||
|
||||
PR Reviewer exposes a minimal FastAPI surface:
|
||||
|
||||
* `GET /api/v1/health` – health check returning `{ "status": "healthy", "service": "pr-reviewer" }`.
|
||||
* `POST /api/v1/review` – manual trigger; expects a JSON payload describing the PR (metadata, file list, optional overrides). Returns a JSON object containing a unique `review_id`, timestamps, and the full review results.
|
||||
* `POST /api/v1/gitea-webhook` – endpoint for Gitea pull‑request events. Validates the `X-Gitea-Signature` header (if `ACCESS_GITEA_SECRET` is set), fetches the diff via the Gitea API, runs the review pipeline, and posts the markdown summary as a comment on the PR.
|
||||
|
||||
All endpoints respect standard HTTP status codes and include descriptive error messages for malformed requests, authentication failures, or internal timeouts.
|
||||
|
||||
## Gitea webhook integration
|
||||
|
||||
Gitea is the Git hosting platform targeted by the reference implementation, but the webhook handler is deliberately generic and proceeds in five steps:
|
||||
|
||||
1. **Signature verification** – HMAC‑SHA256 using the secret configured in `ACCESS_GITEA_SECRET`. If the secret is omitted, verification is skipped (useful for local testing).
|
||||
2. **Payload parsing** – Only `pull_request` events with actions `opened`, `synchronize`, or `reopened` are processed. Other events are ignored to reduce noise.
|
||||
3. **Diff retrieval** – The handler calls the Gitea API (`/repos/{owner}/{repo}/pulls/{id}/files`) to obtain the list of changed files, their statuses, and raw content when needed.
|
||||
4. **Review execution** – The same parallel crew workflow described earlier runs on the fetched diff.
|
||||
5. **Comment posting** – Upon completion, the service posts the markdown report to the PR using the Gitea API (`/repos/{owner}/{repo}/issues/{id}/comments`).
|
||||
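The first two steps can be sketched with the standard library alone. The function names are hypothetical, and the set of handled actions is copied from the list above rather than from the project's source; the signature check assumes the header carries the hex-encoded HMAC-SHA256 of the raw request body.

```python
import hashlib
import hmac
import json
from typing import Optional

# Actions named in the list above.
HANDLED_ACTIONS = {"opened", "synchronize", "reopened"}


def signature_is_valid(body: bytes, signature: str, secret: Optional[str]) -> bool:
    """Check the X-Gitea-Signature header against the raw request body."""
    if not secret:
        return True  # no secret configured: verification is skipped
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), signature.encode())


def should_review(event_type: str, body: bytes) -> bool:
    """Accept only pull_request events whose action warrants a review."""
    if event_type != "pull_request":
        return False
    return json.loads(body).get("action") in HANDLED_ACTIONS


body = json.dumps({"action": "opened", "number": 25}).encode()
signature = hmac.new(b"s3cret", body, hashlib.sha256).hexdigest()
print(signature_is_valid(body, signature, "s3cret"), should_review("pull_request", body))
```

The signature must be computed over the exact bytes received, so the body should be verified before it is parsed or re-serialised.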
|
||||
### Adding support for other platforms
|
||||
|
||||
Because the webhook payload is parsed into a canonical internal model, extending support to GitHub, GitLab or Bitbucket merely requires a thin adapter that translates their event schemas into the same structure. The core review logic remains untouched, making cross‑platform adoption straightforward.
|
||||
|
||||
## Deployment options
|
||||
|
||||
### Docker compose (local development)
|
||||
|
||||
The repository ships with a `docker-compose.yaml` that defines two services:
|
||||
|
||||
* `pr-reviewer` – the FastAPI application.
|
||||
* `ollama` (optional) – a local LLM server for offline use.
|
||||
|
||||
Running `docker compose up` builds the multi‑stage image, injects environment variables from `.env`, and exposes the API on `http://localhost:8000`.
|
||||
|
||||
### Kubernetes (production)
|
||||
|
||||
For production workloads, a Helm chart (or plain manifests in `kube/`) provides:
|
||||
|
||||
* A Deployment with configurable replica count.
|
||||
* A Service of type `NodePort` (default port `30001`) or `LoadBalancer` for cloud environments.
|
||||
* A Secret (`pr-reviewer-env`) that stores all `.env` values, including Gitea tokens and LLM credentials.
|
||||
* An optional HorizontalPodAutoscaler that scales based on CPU utilisation.
|
||||
|
||||
The CI pipeline (`.gitea/workflows/build_push.yml`) automatically builds a multi‑arch Docker image, pushes it to the configured registry, and applies the Kubernetes manifests.
|
||||
|
||||
### Resource considerations
|
||||
|
||||
* **CPU** – With a hosted provider, the container’s CPU footprint is modest (mostly Semgrep and Trivy). With a local model, LLM inference dominates CPU usage, so allocate at least 4 vCPUs and 8 GB RAM.
|
||||
* **Memory** – Each review crew consumes roughly 200 MB of RAM and the summariser adds another 150 MB, so three crews plus the summariser come to about 750 MB. The total stays under 1 GB for typical PR sizes.
|
||||
* **Storage** – The image size is ~1.2 GB (including all scanning tools). Persistent storage is not required unless audit logging is enabled.
|
||||
|
||||
## Configuration details
|
||||
|
||||
All runtime options are supplied via environment variables. The most important groups are:
|
||||
|
||||
| Variable | Required? | Description |
|---|---|---|
| `LLM_PROVIDER` | Yes | `openai`, `anthropic`, or `ollama`. |
| `LLM_MODEL` | Yes | Model identifier (e.g., `gpt-4`). |
| `LLM_API_KEY` | Conditional | API key for hosted providers; not needed for local Ollama instances. |
| `ACCESS_GITEA_URL` | Yes | Base URL of the Gitea instance. |
| `ACCESS_GITEA_TOKEN` | Yes | Personal access token that can read the repository and post PR comments. |
| `ACCESS_GITEA_SECRET` | No | Webhook secret for HMAC verification. |
| `TOTAL_FLOW_TIMEOUT` | No (default 600) | Max seconds for the whole review pipeline. |
| `PER_CREW_TIMEOUT` | No (default 300) | Max seconds per individual crew. |
| `LOG_LEVEL` | No (default `INFO`) | Python logging verbosity. |
|
||||
|
||||
Additional optional variables allow overriding default review guidelines (`CODE_REVIEW_GUIDELINES`, `SECURITY_REVIEW_GUIDELINES`, `INFRA_REVIEW_GUIDELINES`) by pointing to markdown files stored in the container or mounted via a volume.
|
||||
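The override mechanism could work roughly as in the sketch below, which falls back to a built-in default when a variable is unset. The `load_guidelines` function and the placeholder default texts are assumptions for illustration only.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping

# Placeholder defaults; the project's real default guidelines are not shown here.
DEFAULT_GUIDELINES = {
    "CODE_REVIEW_GUIDELINES": "Default code review policy.",
    "SECURITY_REVIEW_GUIDELINES": "Default security review policy.",
    "INFRA_REVIEW_GUIDELINES": "Default infrastructure review policy.",
}


def load_guidelines(variable: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the guideline text for one domain, honouring a file override."""
    override = env.get(variable)
    if not override:
        return DEFAULT_GUIDELINES[variable]
    path = Path(override)
    if not path.is_file():
        # Fail loudly: a typo in the path should not silently disable a policy.
        raise FileNotFoundError(f"{variable} points to a missing file: {path}")
    return path.read_text(encoding="utf-8")


# Demo: write a guideline file to a temporary directory and point the variable at it.
with tempfile.TemporaryDirectory() as tmp:
    custom = Path(tmp) / "code_review.md"
    custom.write_text("Prefer f-strings over % formatting.", encoding="utf-8")
    loaded = load_guidelines("CODE_REVIEW_GUIDELINES", {"CODE_REVIEW_GUIDELINES": str(custom)})

print(loaded)
print(load_guidelines("INFRA_REVIEW_GUIDELINES", {}))
```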
|
||||
## Operational considerations
|
||||
|
||||
### Monitoring
|
||||
|
||||
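Prometheus-format metrics can be exposed via `/metrics`. FastAPI does not provide a metrics endpoint out of the box, so this requires an instrumentation layer on top of the app. Key metrics include: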
|
||||
|
||||
* `pr_review_requests_total`
|
||||
* `pr_review_duration_seconds`
|
||||
* `crew_timeout_total` (per crew)
|
||||
* `llm_api_errors_total`
|
||||
|
||||
Collecting these metrics enables alerting on abnormal latency spikes, which often indicate upstream LLM throttling or unusually large diffs.
|
||||
|
||||
### Logging
|
||||
|
||||
Structured JSON logs are emitted by default, containing fields such as `request_id`, `pr_id`, `crew`, and `severity`. When integrated with a log aggregation platform (e.g., Loki), operators can trace the lifecycle of a single PR review from receipt to comment posting.
|
||||
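A structured log line of this kind can be produced with the standard `logging` module alone. The sketch below is one way to do it, not the project's logging setup: the `JsonFormatter` class and the logger name are hypothetical, and the context fields are the ones named in the paragraph above.

```python
import io
import json
import logging

# Context fields named in the paragraph above.
CONTEXT_FIELDS = ("request_id", "pr_id", "crew", "severity")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for field in CONTEXT_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("pr_reviewer.demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("crew finished", extra={"request_id": "req-1", "pr_id": 25, "crew": "security"})
print(stream.getvalue().strip())
```

Because every line carries the same `request_id`, a log platform can group all entries for one review with a single field filter.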
|
||||
### Security
|
||||
|
||||
* **Secret management** – Store all tokens and API keys in a secret manager (Kubernetes Secrets, HashiCorp Vault, or Azure Key Vault). Never commit `.env` files to source control.
|
||||
* **Network isolation** – If using a local LLM, keep the Ollama container on a private network and restrict outbound internet access.
|
||||
* **Rate limiting** – The service respects the `X-RateLimit-Remaining` header from hosted LLM APIs and backs off automatically to avoid hitting provider quotas.
|
||||
|
||||
## Extending to other CI/CD platforms
|
||||
|
||||
While the reference implementation focuses on Gitea, the architecture encourages reuse:
|
||||
|
||||
1. **Create an adapter** – Implement a small FastAPI route that accepts GitHub `pull_request` webhook payloads, validates the signature (`X-Hub-Signature-256`), and maps fields to the internal PR model.
|
||||
2. **Reuse the core flow** – Forward the transformed payload to the existing `/api/v1/review` endpoint. No changes to the review crews are required.
|
||||
3. **Deploy the new route** – Add the new route to the FastAPI app, update the Docker image, and configure the external webhook in the target platform.
|
||||
|
||||
Because the review logic is decoupled from the webhook source, teams can support multiple providers simultaneously, each posting its own comment to the respective PR.
|
||||
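To give a feel for how thin such an adapter can be, the sketch below maps a GitHub `pull_request` webhook payload onto a small internal model. The `PullRequestEvent` class and its fields are hypothetical stand-ins for the project's internal PR model; signature validation is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestEvent:
    """Hypothetical platform-neutral description of a PR event."""

    platform: str
    repo: str       # "owner/name"
    number: int
    action: str
    head_sha: str
    title: str


def from_github(payload: dict) -> PullRequestEvent:
    """Translate a GitHub pull_request webhook payload into the internal model."""
    pr = payload["pull_request"]
    return PullRequestEvent(
        platform="github",
        repo=payload["repository"]["full_name"],
        number=payload["number"],
        action=payload["action"],
        head_sha=pr["head"]["sha"],
        title=pr["title"],
    )


github_payload = {
    "action": "opened",
    "number": 7,
    "pull_request": {"title": "Add retry logic", "head": {"sha": "abc123"}},
    "repository": {"full_name": "example-org/example-repo"},
}
print(from_github(github_payload))
```

Keeping the mapping in one pure function means each new platform needs only its own translation function plus a route that calls it.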
|
||||
## Development workflow
|
||||
|
||||
Contributors who wish to enhance PR Reviewer can follow these steps:
|
||||
|
||||
```bash
# Clone the repository
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer

# Install development dependencies
uv pip install -e ".[dev]"

# Run the test suite
pytest tests/

# Start the server locally for rapid iteration
uvicorn src.pr_reviewer.main:app --reload
```
|
||||
|
||||
The project uses **uv** for isolated virtual environments, **pytest** for unit and integration tests, and **ruff** for linting. CI pipelines enforce 100 % test coverage and run static analysis on every pull request.
|
||||
|
||||
### Adding a new review tool
|
||||
|
||||
To incorporate an additional analysis tool (e.g., a custom static analyser), developers should:
|
||||
|
||||
1. Write a thin wrapper that converts the tool’s output into the MCP schema.
|
||||
2. Register a new crew in `crews/` that invokes the wrapper.
|
||||
3. Update the orchestration flow (`flow.py`) to include the new crew in the parallel execution block.
|
||||
4. Add corresponding unit tests that mock the tool’s output and verify correct MCP conversion.
|
||||
|
||||
## Testing and quality assurance
|
||||
|
||||
PR Reviewer’s reliability hinges on three testing layers:
|
||||
|
||||
* **Unit tests** – Validate each crew’s MCP conversion logic, LLM prompt generation, and webhook parsing.
|
||||
* **Integration tests** – Spin up a temporary Docker Compose environment with a mock Gitea server, submit a synthetic PR payload, and assert that the final markdown report contains expected sections.
|
||||
* **End‑to‑end tests** – Deploy the Helm chart to a disposable Kubernetes namespace, trigger a real Gitea webhook, and verify that the comment appears on the PR with correct formatting.
|
||||
|
||||
All tests run in CI on every push, and failures block merges.
|
||||
|
||||
## Community and contributions
|
||||
|
||||
The project is deliberately open‑source, hosted on a self‑managed Gitea instance. Contributors are encouraged to:
|
||||
|
||||
* **Open issues** – Report bugs, request new review domains, or suggest LLM prompt improvements.
|
||||
* **Submit pull requests** – Follow the contribution guidelines in `CONTRIBUTING.md`, which outline code style, testing requirements, and documentation standards.
|
||||
* **Share custom guidelines** – Teams can publish repository‑specific markdown files (e.g., `code_review.md`) that the summariser will automatically honour.
|
||||
|
||||
Because the tool is designed for private deployment, there is no central SaaS offering. Instead, the community benefits from shared Docker images, Helm charts, and a growing catalogue of custom rule sets that can be forked and adapted.
|
||||
|
||||
## Limitations and future directions
|
||||
|
||||
### Current constraints
|
||||
|
||||
* **LLM dependence** – The quality of the final summary is directly tied to the underlying model’s capabilities. Low‑capacity models may produce vague recommendations.
|
||||
* **Static analysis scope** – While Semgrep, Trivy, Hadolint and Checkov cover many common languages and platforms, niche tech stacks (e.g., Rust, Terraform Cloud) require additional adapters.
|
||||
* **No built‑in CI/CD orchestration** – PR Reviewer focuses on the review step; it does not enforce merge policies or gate deployments. Teams must integrate the API into their existing pipelines.
|
||||
|
||||
### Planned enhancements
|
||||
|
||||
1. **Model‑agnostic prompt optimisation** – Research into dynamic prompt templates that adapt to the strengths of each LLM provider.
|
||||
2. **Feedback loop** – Capture developer reactions to the AI suggestions (e.g., thumbs up/down) and use them to fine‑tune future prompts.
|
||||
3. **Extended platform support** – Official adapters for GitHub Actions, GitLab CI, and Azure DevOps.
|
||||
4. **Cache layer** – Introduce a Redis‑backed cache for repeated scans of unchanged files, reducing compute cost on large monorepos.
|
||||
5. **Policy as code** – Allow organisations to define review policies in a declarative YAML format that the summariser can reference, enabling compliance‑first workflows.
|
||||
|
||||
## Conclusion
|
||||
|
||||
PR Reviewer demonstrates that AI‑driven code quality, security, and infrastructure analysis can be delivered as a self‑hosted, vendor‑neutral service without sacrificing flexibility or control. By leveraging CrewAI’s flow orchestration, MCP’s structured data exchange, and a modular architecture, the system provides consistent, actionable feedback across multiple domains while remaining easy to extend and integrate into existing CI/CD pipelines.
|
||||
|
||||
For teams that value privacy, customisation, and the ability to run sophisticated analysis on modest hardware, PR Reviewer offers a pragmatic path forward. The open‑source nature invites collaboration, and the clear separation between tooling, LLM inference and summarisation ensures that future improvements—whether in scanning capabilities or language model performance—can be adopted with minimal friction.
|
||||
|
||||
Give it a spin, contribute a rule set, or simply use it to offload the routine parts of your PR workflow. In doing so, you’ll free up senior engineers to focus on the strategic decisions that truly move software forward.
|
||||