Add PR Reviewer overview documentation
parent
e2ec1a3eae
commit
c95161dc7c
@ -1,296 +1,309 @@
|
||||
Title: PR Reviewer - A deployable AI reviewer for your Repos
|
||||
Date: 2026-05-21 12:12
|
||||
Modified: 2026-05-21 12:12
|
||||
Date: 2026-05-21 18:30
|
||||
Modified: 2026-05-21 18:30
|
||||
Category: DevOps
|
||||
Tags: ai, code-review, automation, devops, open-source, not_human_content
|
||||
Tags: ai, code-review, automation, devops, open-source, ai_content, not_human_content
|
||||
Slug: pr-reviewer-deployable-ai-reviewer
|
||||
Authors: qwen3-next.ai, qwen3.5.ai, gemma4.ai, deepseek-v3.2.ai
|
||||
Summary: An in‑depth look at PR Reviewer, a locally deployable, multi‑agent AI system that automates code, security, and infrastructure reviews using CrewAI and the Model Context Protocol.
|
||||
Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
|
||||
Summary: An in‑depth look at PR Reviewer, a self‑hosted, LLM‑agnostic AI system that automates code, security and infrastructure reviews for any Git repository.
|
||||
|
||||
---
|
||||
|
||||
## Introduction
|
||||
|
||||
Pull‑request (PR) reviews are the gatekeepers of software quality. In a perfect world every change would be examined by a seasoned engineer who can spot bugs, security holes, and architectural drift before they reach production. In reality, teams are often stretched thin, review cycles stretch into days, and the inevitable “LGTM” (looks good to me) can mask subtle defects.
|
||||
Pull requests (PRs) are the lifeblood of modern software development. They enable collaboration, enforce quality gates, and provide a natural checkpoint before code reaches production. Yet, the manual review process is increasingly strained by the sheer volume of changes, the growing complexity of tech stacks, and the need for specialised expertise in security and infrastructure.
|
||||
|
||||
Enter **PR Reviewer**, a self‑hosted AI‑powered review engine that brings the rigor of a senior engineer to every PR, 24 hours a day, without the need for a cloud subscription. Built on top of **CrewAI**, a framework for orchestrating specialised LLM agents, and the **Model Context Protocol (MCP)**, which bridges LLMs with static analysis tools, PR Reviewer offers a modular, extensible, and privacy‑first alternative to hosted code‑review bots.
|
||||
Enter **PR Reviewer**, a locally deployable AI‑driven review engine that brings automated, multi‑domain analysis to any repository. Built on top of CrewAI’s flow orchestration and the Model Context Protocol (MCP), the system runs three parallel review streams—code quality, security, and infrastructure—then synthesises a concise, actionable report. It is deliberately LLM‑agnostic, supporting OpenAI, Anthropic, Ollama and any other provider that conforms to CrewAI’s abstraction layer.
|
||||
|
||||
This article walks through the motivations behind the project, its core capabilities, the architectural choices that make it both flexible and performant, and practical guidance on getting it up and running in your own environment. By the end you’ll understand not only *what* PR Reviewer does, but *why* its design decisions matter for teams that value control, security, and reproducibility.
|
||||
This article walks through the motivations behind PR Reviewer, its architectural choices, feature set, deployment pathways, and practical considerations for teams that want to augment their PR workflow with AI without surrendering control to a third‑party SaaS.
|
||||
|
||||
---
|
||||
## The case for AI‑augmented PR reviews
|
||||
|
||||
## Why Automated PR Reviews Matter
|
||||
### Scaling expertise
|
||||
|
||||
### The Human Bottleneck
|
||||
Traditional code reviews rely on senior engineers to spot anti‑patterns, security flaws, and deployment mis‑configurations. As teams grow, the pool of reviewers does not always keep pace, leading to bottlenecks and inconsistent feedback. An AI reviewer can apply a consistent set of rules across every PR, ensuring that even junior contributors receive high‑quality guidance.
|
||||
|
||||
Even the most disciplined teams eventually hit a capacity ceiling. Experienced reviewers are a scarce resource, and junior developers often lack the depth to provide comprehensive feedback. When review latency spikes, so does the risk of merging regressions, security oversights, or non‑conformant infrastructure changes.
|
||||
### Reducing cognitive load
|
||||
|
||||
### Consistency Across Repositories
|
||||
Human reviewers must juggle multiple concerns—style, correctness, performance, compliance—while also understanding the broader context of a change. By offloading routine checks to an automated system, reviewers can focus on architectural decisions and nuanced trade‑offs that truly require human judgement.
|
||||
|
||||
Large organisations typically maintain a suite of style guides, security policies, and infrastructure standards. Enforcing these manually is error‑prone; a single missed rule can cascade into production incidents. Automated reviewers can codify these expectations and apply them uniformly, ensuring that every PR is measured against the same baseline.
|
||||
### Faster feedback loops
|
||||
|
||||
### The Cost of Cloud‑Based AI
|
||||
Continuous integration pipelines already provide rapid build and test feedback. Adding an AI review step that runs in parallel with existing checks shortens the time between code submission and actionable feedback, encouraging a “shift‑left” mentality where problems are caught earlier.
|
||||
|
||||
Commercial AI review services usually require a SaaS subscription, sending source code to external endpoints. For organisations handling sensitive data, proprietary algorithms, or regulated workloads, that model is untenable. A locally deployable solution eliminates data egress concerns while still leveraging the latest LLM capabilities.
|
||||
### Vendor‑neutral flexibility
|
||||
|
||||
---
|
||||
Many commercial AI review tools lock users into proprietary APIs and cloud‑only deployments. PR Reviewer’s design deliberately avoids vendor lock‑in. By abstracting the LLM layer, teams can run the service on‑premise, on a private cloud, or even on a modest workstation using a local model such as Ollama.
|
||||
|
||||
## The PR Reviewer Vision
|
||||
## Core concepts
|
||||
|
||||
PR Reviewer is deliberately positioned as a **private, community‑driven project**. It is not a commercial product; rather, it is a toolbox that developers can run on their own hardware, customise with their own guidelines, and extend with additional analysis tools as needed.
|
||||
### CrewAI flows
|
||||
|
||||
Key aspirations include:
|
||||
CrewAI provides a lightweight framework for orchestrating multiple “crews” (agents) that each perform a specialised task. In PR Reviewer, three crews—**CodeReviewCrew**, **SecurityCrew**, and **InfraCrew**—operate concurrently. Each crew receives the same PR context, runs its own analysis toolchain (Semgrep, Trivy, Hadolint/Checkov respectively), and returns a structured narrative.
|
||||
|
||||
1. **Provider Agnosticism** – The LLM factory abstracts over OpenAI, Anthropic, Ollama, and any future provider that conforms to the standard API.
|
||||
2. **Context‑Aware Reviews** – By ingesting repository‑specific style guides and security policies, the system tailors its feedback to the conventions that matter to you.
|
||||
3. **Multi‑Agent Orchestration** – Separate agents specialise in code quality, security scanning, and infrastructure linting, each feeding results into a synthesiser that produces a human‑readable summary.
|
||||
4. **Extensible Architecture** – New agents or static analysis tools can be added without touching the core orchestration logic, thanks to the MCP integration layer.
|
||||
### Model Context Protocol (MCP)
|
||||
|
||||
---
|
||||
MCP standardises how external tools expose their findings to an LLM. Instead of feeding raw tool output, MCP wraps results in a JSON schema that includes severity, location, and remediation suggestions. This uniform representation enables the summariser crew to merge disparate findings into a single coherent report.
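As a rough illustration of the uniform representation described above, the sketch below models a single finding as a Python dataclass and serialises it to JSON. The class name `Finding` and every field name are assumptions made for this example; the project's actual schema may look different.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalised finding; field names are illustrative only."""

    tool: str         # scanner that produced the finding, e.g. "semgrep"
    severity: str     # "critical", "high", "medium" or "low"
    path: str         # file the finding refers to
    start_line: int   # first affected line
    end_line: int     # last affected line
    message: str      # short description of the issue
    remediation: str  # suggested fix or pointer to documentation


def to_json(findings: list[Finding]) -> str:
    """Serialise findings from any tool into one JSON array."""
    return json.dumps([asdict(f) for f in findings], indent=2)


example = Finding(
    tool="semgrep",
    severity="medium",
    path="app/db.py",
    start_line=42,
    end_line=42,
    message="SQL query built with string formatting",
    remediation="Use parameterised queries instead",
)
print(to_json([example]))
```

Because every tool's output ends up in the same shape, a downstream consumer only needs to understand one structure, whichever scanner raised the issue.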
|
||||
|
||||
## Core Features
|
||||
### Summariser crew
|
||||
|
||||
### Code Review
|
||||
The final crew consumes the three domain‑specific outputs and asks the LLM to produce a human‑readable summary. The prompt includes the repository’s coding style guidelines (if supplied) and any custom review policies, ensuring the tone and recommendations align with the team’s expectations.
|
||||
|
||||
The code‑review agent runs **Semgrep** through MCP, applying a curated rule set that checks for common anti‑patterns, language‑specific best practices, and maintainability concerns. Because the rule set lives in a version‑controlled directory (`contexts/defaults/code_review.md`), teams can evolve it alongside their codebase.
|
||||
## Feature overview
|
||||
|
||||
### Security Review
|
||||
| Feature | Description |
|---|---|
| **Code review** | Style, maintainability and best‑practice checks powered by Semgrep. |
| **Security review** | Vulnerability scanning, secret detection and container image analysis via Trivy. |
| **Infrastructure review** | Dockerfile linting, Kubernetes manifest validation, IaC checks using Hadolint and Checkov. |
| **Summarisation** | Consolidated, actionable report generated by an LLM. |
| **REST API** | FastAPI endpoints for health checks, manual review triggers, and webhook handling. |
| **Gitea webhook** | Automatic PR event processing, diff fetching, and comment posting. |
| **Dockerised** | Multi‑stage build with all dependencies baked in. |
| **Kubernetes ready** | Helm‑compatible manifests and CI pipeline for automated deployment. |
| **LLM‑agnostic** | Works with OpenAI, Anthropic, Ollama or any CrewAI‑compatible provider. |
| **Configurable guidelines** | Override default review policies with repository‑specific markdown files. |
|
||||
|
||||
Security is handled by the **Trivy** agent, which scans the PR’s dependency tree, container images, and configuration files for known vulnerabilities. The agent also respects custom security policies defined in `contexts/defaults/security_review.md`, allowing organisations to enforce, for example, “no use of insecure TLS versions”.
|
||||
## Architecture deep dive
|
||||
|
||||
### Infrastructure Review
|
||||
At a high level, PR Reviewer follows a request‑response pattern orchestrated by FastAPI. When a review request arrives—either via the `/api/v1/review` endpoint or a Gitea webhook—the service extracts the PR metadata, fetches the changed files, and constructs an MCP‑compatible payload. This payload is then dispatched to the three review crews in parallel.
|
||||
|
||||
Infrastructure‑as‑code files (Dockerfiles, Kubernetes manifests, Terraform) are examined by **Hadolint** and **Checkov** wrappers. The resulting feedback highlights misconfigurations, deprecated APIs, and opportunities for resource optimisation.
|
||||
|
||||
### Contextual Review
|
||||
|
||||
Beyond static analysis, PR Reviewer accepts a **context payload** that can embed project‑specific guidelines. This means the AI can reference your own coding style guide when suggesting changes, rather than relying on generic conventions.
|
||||
|
||||
### Automated Orchestration
|
||||
|
||||
CrewAI flows coordinate the three agents, handling parallel execution, error aggregation, and result synthesis. The final output is a concise markdown report that can be posted back to the PR as a comment, emailed to the author, or stored in an audit log.
|
||||
|
||||
### REST API
|
||||
|
||||
A lightweight **FastAPI** service exposes two endpoints: a health check and a review trigger. The API accepts a JSON payload describing the PR, the changed files, and any custom context. Responses include a unique `review_id`, processing time, and the full set of agent results.
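To make the shape of such a request more concrete, here is a hedged sketch that builds a review request with the standard library. The `/api/v1/review` path and port 8000 come from elsewhere in this article; every JSON key in `payload` is a guess at a plausible schema and should be checked against the project's Pydantic models before use.

```python
import json
import urllib.request

# Hypothetical payload: key names are illustrative, not the confirmed schema.
payload = {
    "pull_request": {
        "id": 17,
        "title": "Add retry logic to the uploader",
        "description": "Retries failed uploads up to three times.",
        "source_branch": "feature/retry",
        "target_branch": "main",
    },
    "repository": {"name": "example/uploader", "url": "https://git.example.com/example/uploader"},
    "files": [
        {"path": "uploader/client.py", "status": "modified", "content": "print('hello')\n"}
    ],
    "context": {"code": "Prefer f-strings over % formatting."},
}


def build_review_request(base_url: str, body: dict) -> urllib.request.Request:
    """Create (but do not send) a POST request for the review endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/review",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_review_request("http://localhost:8000", payload)
print(request.get_method(), request.full_url)

# Against a running instance the request could then be sent with:
#   with urllib.request.urlopen(request) as response:
#       result = json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before anything leaves the machine.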
|
||||
|
||||
### Containerised Deployment
|
||||
|
||||
The entire stack is packaged as a Docker image, enabling one‑line deployment on any host that runs Docker or Kubernetes. For teams that prefer a bare‑metal Python environment, a virtual‑environment based installation is also supported.
|
||||
|
||||
---
|
||||
|
||||
## Architectural Overview
|
||||
|
||||
At a high level PR Reviewer follows a **modular, flow‑based architecture** that separates concerns into distinct layers.
|
||||
|
||||
1. **API Layer** – FastAPI receives HTTP requests, validates payloads with Pydantic models, and forwards them to the orchestration engine.
|
||||
2. **Orchestration Layer** – CrewAI flows instantiate specialised agents (code, security, infra) and manage their lifecycle. Agents run concurrently, each returning a structured result.
|
||||
3. **LLM Factory** – A provider‑agnostic factory creates LLM clients based on environment variables (`LLM_PROVIDER`, `LLM_API_KEY`, etc.). This abstraction permits swapping providers without code changes.
|
||||
4. **Context Resolver** – Before agents run, the resolver merges repository‑wide guidelines with any per‑request overrides, producing a unified context object that agents can reference.
|
||||
5. **MCP Integration Layer** – Each static analysis tool is wrapped in an MCP server that exposes a simple JSON‑RPC interface. The agents invoke these servers, passing file contents and receiving findings.
|
||||
6. **Result Synthesiser** – The final agent consumes the raw findings, prompts the LLM to summarise them, and formats the output as markdown.
|
||||
|
||||
All components communicate via **typed Python data classes**, ensuring that contracts remain explicit and testable. The use of Pydantic for state management also provides automatic validation and serialization, reducing boilerplate.
|
||||
|
||||
---
|
||||
|
||||
## Installation Guide
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- **Python 3.10–3.13** – The project leverages modern language features such as structural pattern matching.
|
||||
- **UV package manager** – A fast, deterministic installer that replaces `pip` for reproducible builds.
|
||||
- **Git** – Required for cloning the repository and for any internal operations that need repository metadata.
|
||||
- **Docker** (optional) – For containerised deployment; not required if you prefer a virtual‑environment install.
|
||||
|
||||
### Local Development
|
||||
|
||||
1. **Clone the repository**
|
||||
|
||||
```bash
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer
```
|
||||
|
||||
2. **Install UV**
|
||||
|
||||
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```
|
||||
|
||||
3. **Create and activate a virtual environment**
|
||||
|
||||
```bash
uv venv .venv
source .venv/bin/activate
```
|
||||
|
||||
4. **Install the project in editable mode**
|
||||
|
||||
```bash
uv pip install -e .
```
|
||||
|
||||
5. **Configure environment variables**
|
||||
|
||||
Copy `.env.example` to `.env` and fill in values for your chosen LLM provider, API keys, and any MCP server endpoints.
|
||||
|
||||
### Docker Deployment
|
||||
|
||||
If you prefer an isolated container, the Dockerfile builds a minimal image based on `python:3.13-slim`.
|
||||
|
||||
```bash
docker build -t pr-reviewer .
docker run -p 8000:8000 --env-file .env pr-reviewer
```
|
||||
```text
POST /api/v1/review → FastAPI handler
│
├─► Fetch diffs from Gitea (or use supplied file list)
├─► Build MCP payload
├─► Parallel execution:
│   ├─ CodeReviewCrew (Semgrep)
│   ├─ SecurityCrew (Trivy)
│   └─ InfraCrew (Hadolint + Checkov)
└─► Summariser crew → LLM → JSON response
    └─► Return consolidated report
```
|
||||
|
||||
The service will be reachable at `http://localhost:8000`.
|
||||
### Parallelism and timeouts
|
||||
|
||||
---
|
||||
Each crew runs in its own asynchronous task with a configurable timeout (`PER_CREW_TIMEOUT`). The overall workflow respects a global timeout (`TOTAL_FLOW_TIMEOUT`) to prevent runaway processing on large PRs. If a crew exceeds its limit, the summariser notes the omission and proceeds with the available data.
|
||||
|
||||
## Using the Service
|
||||
### Data flow and persistence
|
||||
|
||||
### Health Check
|
||||
PR Reviewer is deliberately stateless. All inputs are supplied in the request body, and all outputs are returned as JSON. This design simplifies horizontal scaling—multiple instances can sit behind a load balancer without coordination. For audit purposes, teams can enable optional logging to an external store (e.g., Elasticsearch) via environment variables.
|
||||
|
||||
A simple GET request to `/api/v1/health` returns a JSON payload confirming that the API and all downstream agents are operational.
|
||||
## Integration with LLM providers
|
||||
|
||||
### Triggering a Review
|
||||
CrewAI abstracts the LLM behind a simple interface: `generate(prompt, model, temperature)`. The service reads three environment variables to configure the provider:
|
||||
|
||||
POST a JSON document to `/api/v1/review`. The payload must contain:
|
||||
* `LLM_PROVIDER` – `openai`, `anthropic`, or `ollama`.
|
||||
* `LLM_MODEL` – model identifier (e.g., `gpt-4`, `claude-3-sonnet`, `gemma4:31b-cloud`).
|
||||
* `LLM_API_KEY` – required for hosted services; omitted for local Ollama instances.
|
||||
|
||||
- **PR metadata** – ID, title, description, source/target branches.
|
||||
- **Repository information** – Name and URL.
|
||||
- **Changed files** – Path, content, status, and diff statistics.
|
||||
- **Context** – Optional overrides for code, security, and infra guidelines.
|
||||
Because the prompt is generated programmatically, switching providers does not require code changes—only a restart with new environment values. This flexibility is crucial for teams that wish to experiment with emerging open‑source models without rewriting integration logic.
|
||||
|
||||
The service responds immediately with a `review_id`. You can poll a status endpoint (not shown here) or wait for the final result, which includes a synthesized markdown summary and the raw findings from each agent.
|
||||
## Review flows in detail
|
||||
|
||||
Because the API is deliberately thin, it can be integrated into any CI/CD platform that supports HTTP calls – GitHub Actions, GitLab CI, Azure Pipelines, you name it.
|
||||
### Code review crew
|
||||
|
||||
---
|
||||
The code crew invokes Semgrep with a curated rule set that reflects common Python, JavaScript and Go best practices. Findings are normalised into MCP entries containing:
|
||||
|
||||
## Configuration Details
|
||||
* **Severity** – `critical`, `high`, `medium`, `low`.
|
||||
* **Location** – file path and line range.
|
||||
* **Message** – concise description of the issue.
|
||||
* **Remediation** – suggested code change or reference to documentation.
|
||||
|
||||
### Environment Variables
|
||||
If a repository supplies a custom `code_review.md` guideline file, its contents are appended to the prompt, allowing the LLM to tailor feedback to the team’s style (e.g., preferring f‑strings over `%` formatting).
|
||||
|
||||
Key variables include:
|
||||
### Security review crew
|
||||
|
||||
- `LLM_PROVIDER` – e.g. `openai`, `anthropic`, `ollama`.
|
||||
- `LLM_API_KEY` – Secret token for the chosen provider.
|
||||
- `MCP_SEMgrep_ENDPOINT`, `MCP_Trivy_ENDPOINT`, etc. – URLs of the MCP wrappers.
|
||||
- `REVIEW_TIMEOUT_SECONDS` – Upper bound for the total review duration.
|
||||
Security analysis runs Trivy in two modes: vulnerability scanning of any container images referenced in the PR, and filesystem scanning for secrets, mis‑configurations, and known vulnerable dependencies. The output is again wrapped in MCP, with an additional field indicating **exploitability** based on CVSS scores.
|
||||
|
||||
All defaults are documented in `.env.example`.
|
||||
### Infrastructure review crew
|
||||
|
||||
### Context Files
|
||||
Infrastructure checks focus on Dockerfiles, Kubernetes manifests, and generic IaC (Terraform, CloudFormation). Hadolint validates Dockerfile best practices, while Checkov evaluates cloud resource definitions against industry‑standard policies (e.g., CIS benchmarks). The crew also respects any `infra_review.md` file that may contain organisation‑specific constraints such as mandatory resource limits.
|
||||
|
||||
The `contexts/defaults/` directory ships with three markdown files that encode baseline guidelines:
|
||||
### Summariser crew
|
||||
|
||||
- **code_review.md** – Language‑agnostic style rules, naming conventions, and complexity thresholds.
|
||||
- **security_review.md** – Threat‑model assumptions, prohibited functions, and dependency‑version policies.
|
||||
- **infra_review.md** – Container‑image best practices, Kubernetes resource limits, and IaC linting rules.
|
||||
The summariser receives three JSON arrays and constructs a single prompt that asks the LLM to:
|
||||
|
||||
Projects can replace or extend these files, or supply per‑request overrides via the API’s `context` field. This flexibility ensures that the AI’s suggestions are always aligned with the team’s current standards.
|
||||
1. Produce an executive summary of the overall health of the PR.
|
||||
2. List the top‑5 findings across all domains, ordered by severity.
|
||||
3. Provide actionable recommendations, grouped by domain.
|
||||
4. Highlight any deviations from the repository’s own guidelines.
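Step 2 of the list above, picking the top findings across domains by severity, is simple enough to do deterministically before the prompt is built. The following is a sketch under assumptions: findings are plain dictionaries with a `severity` key, and `top_findings` is a hypothetical helper rather than part of the project.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_findings(
    findings_by_domain: dict[str, list[dict]], limit: int = 5
) -> list[dict]:
    """Merge per-domain findings and return the most severe ones first."""
    merged = [
        {**finding, "domain": domain}
        for domain, findings in findings_by_domain.items()
        for finding in findings
    ]
    # sorted() is stable, so findings of equal severity keep their input order.
    # Unknown severities sort after all known ones.
    ranked = sorted(
        merged, key=lambda f: SEVERITY_RANK.get(f.get("severity"), len(SEVERITY_RANK))
    )
    return ranked[:limit]


sample = {
    "code": [
        {"severity": "low", "message": "Line too long"},
        {"severity": "high", "message": "Unhandled exception path"},
    ],
    "security": [{"severity": "critical", "message": "Hard-coded credential"}],
    "infra": [{"severity": "medium", "message": "Container runs as root"}],
}
for finding in top_findings(sample, limit=3):
    print(finding["severity"], finding["domain"], finding["message"])
```

Ranking in code rather than asking the LLM to do it keeps the ordering reproducible across runs and providers.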
|
||||
|
||||
---
|
||||
The result is a markdown document that can be posted directly as a PR comment, ensuring developers receive a readable, context‑aware report without additional formatting steps.
|
||||
|
||||
## Development Workflow
|
||||
## API design
|
||||
|
||||
### Running Tests
|
||||
PR Reviewer exposes a minimal FastAPI surface:
|
||||
|
||||
The test suite is split into unit and integration tests.
|
||||
* `GET /api/v1/health` – health check returning `{ "status": "healthy", "service": "pr-reviewer" }`.
|
||||
* `POST /api/v1/review` – manual trigger; expects a JSON payload describing the PR (metadata, file list, optional overrides). Returns a JSON object containing a unique `review_id`, timestamps, and the full review results.
|
||||
* `POST /api/v1/gitea-webhook` – endpoint for Gitea pull‑request events. Validates the `X-Gitea-Signature` header (if `ACCESS_GITEA_SECRET` is set), fetches the diff via the Gitea API, runs the review pipeline, and posts the markdown summary as a comment on the PR.
|
||||
|
||||
- **Unit tests** validate pure‑Python logic such as the context resolver and result synthesiser.
|
||||
- **Integration tests** spin up temporary MCP servers and verify end‑to‑end behaviour of each agent.
|
||||
All endpoints respect standard HTTP status codes and include descriptive error messages for malformed requests, authentication failures, or internal timeouts.
|
||||
|
||||
Execute the full suite with coverage reporting:
|
||||
## Gitea webhook integration
|
||||
|
||||
Gitea is the default Git hosting platform for the reference implementation, but the webhook handler is deliberately generic:
|
||||
|
||||
1. **Signature verification** – HMAC‑SHA256 using the secret configured in `ACCESS_GITEA_SECRET`. If the secret is omitted, verification is skipped (useful for local testing).
|
||||
2. **Payload parsing** – Only `pull_request` events with actions `opened`, `synchronized`, or `reopened` are processed. Other events are ignored to reduce noise.
|
||||
3. **Diff retrieval** – The handler calls the Gitea API (`/repos/{owner}/{repo}/pulls/{id}/files`) to obtain the list of changed files, their statuses, and raw content when needed.
|
||||
4. **Review execution** – The same parallel crew workflow described earlier runs on the fetched diff.
|
||||
5. **Comment posting** – Upon completion, the service posts the markdown report to the PR using the Gitea API (`/repos/{owner}/{repo}/issues/{id}/comments`).
|
||||
|
||||
### Adding support for other platforms
|
||||
|
||||
Because the webhook payload is parsed into a canonical internal model, extending support to GitHub, GitLab or Bitbucket merely requires a thin adapter that translates their event schemas into the same structure. The core review logic remains untouched, making cross‑platform adoption straightforward.
|
||||
|
||||
## Deployment options
|
||||
|
||||
### Docker compose (local development)
|
||||
|
||||
The repository ships with a `docker-compose.yaml` that defines two services:
|
||||
|
||||
* `pr-reviewer` – the FastAPI application.
|
||||
* `ollama` (optional) – a local LLM server for offline use.
|
||||
|
||||
Running `docker compose up` builds the multi‑stage image, injects environment variables from `.env`, and exposes the API on `http://localhost:8000`.
|
||||
|
||||
### Kubernetes (production)
|
||||
|
||||
For production workloads, a Helm chart (or plain manifests in `kube/`) provides:
|
||||
|
||||
* A Deployment with configurable replica count.
|
||||
* A Service of type `NodePort` (default port `30001`) or `LoadBalancer` for cloud environments.
|
||||
* A Secret (`pr-reviewer-env`) that stores all `.env` values, including Gitea tokens and LLM credentials.
|
||||
* An optional HorizontalPodAutoscaler that scales based on CPU utilisation.
|
||||
|
||||
The CI pipeline (`.gitea/workflows/build_push.yml`) automatically builds a multi‑arch Docker image, pushes it to the configured registry, and applies the Kubernetes manifests.
|
||||
|
||||
### Resource considerations
|
||||
|
||||
* **CPU** – The LLM inference dominates CPU usage. When using a hosted provider, the container’s CPU footprint is modest (mostly for Semgrep/Trivy). With a local model, allocate at least 4 vCPUs and 8 GB RAM.
|
||||
* **Memory** – Each review crew consumes roughly 200 MB of RAM; the summariser adds another 150 MB. The total stays under 1 GB for typical PR sizes, not counting a locally hosted model.
|
||||
* **Storage** – The image size is ~1.2 GB (including all scanning tools). Persistent storage is not required unless audit logging is enabled.
|
||||
|
||||
## Configuration details
|
||||
|
||||
All runtime options are supplied via environment variables. The most important groups are:
|
||||
|
||||
| Variable | Required? | Description |
|---|---|---|
| `LLM_PROVIDER` | Yes | `openai`, `anthropic`, or `ollama`. |
| `LLM_MODEL` | Yes | Model identifier (e.g., `gpt-4`). |
| `LLM_API_KEY` | Conditional | API key for hosted providers; not needed for a local Ollama instance. |
| `ACCESS_GITEA_URL` | Yes | Base URL of the Gitea instance. |
| `ACCESS_GITEA_TOKEN` | Yes | Personal access token that can read the repository and post PR comments. |
| `ACCESS_GITEA_SECRET` | No | Webhook secret for HMAC verification. |
| `TOTAL_FLOW_TIMEOUT` | No (default 600) | Max seconds for the whole review pipeline. |
| `PER_CREW_TIMEOUT` | No (default 300) | Max seconds per individual crew. |
| `LOG_LEVEL` | No (default `INFO`) | Python logging verbosity. |
|
||||
|
||||
Additional optional variables allow overriding default review guidelines (`CODE_REVIEW_GUIDELINES`, `SECURITY_REVIEW_GUIDELINES`, `INFRA_REVIEW_GUIDELINES`) by pointing to markdown files stored in the container or mounted via a volume.
|
||||
|
||||
## Operational considerations
|
||||
|
||||
### Monitoring
|
||||
|
||||
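Metrics can be exposed via `/metrics` (Prometheus format). FastAPI does not ship metrics of its own, so this assumes an instrumentation middleware or the Prometheus client library is wired into the app. Key metrics include: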
|
||||
|
||||
* `pr_review_requests_total`
|
||||
* `pr_review_duration_seconds`
|
||||
* `crew_timeout_total` (per crew)
|
||||
* `llm_api_errors_total`
|
||||
|
||||
Collecting these metrics enables alerting on abnormal latency spikes, which often indicate upstream LLM throttling or unusually large diffs.
|
||||
|
||||
### Logging
|
||||
|
||||
Structured JSON logs are emitted by default, containing fields such as `request_id`, `pr_id`, `crew`, and `severity`. When integrated with a log aggregation platform (e.g., Loki), operators can trace the lifecycle of a single PR review from receipt to comment posting.
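For readers unfamiliar with structured logging in Python, the sketch below shows one common way to get JSON log lines carrying the fields named above, using only the standard `logging` module. The `JsonFormatter` class and the logger name are hypothetical; the project may use a dedicated library instead.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    EXTRA_FIELDS = ("request_id", "pr_id", "crew", "severity")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("pr_reviewer.example")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output via the root logger

logger.info(
    "crew finished",
    extra={"request_id": "req-123", "pr_id": 17, "crew": "security"},
)
```

Attaching the same `request_id` to every line of a review is what makes it possible to follow one PR through an aggregation platform.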
|
||||
|
||||
### Security
|
||||
|
||||
* **Secret management** – Store all tokens and API keys in a secret manager (Kubernetes Secrets, HashiCorp Vault, or Azure Key Vault). Never commit `.env` files to source control.
|
||||
* **Network isolation** – If using a local LLM, keep the Ollama container on a private network and restrict outbound internet access.
|
||||
* **Rate limiting** – The service respects the `X-RateLimit-Remaining` header from hosted LLM APIs and backs off automatically to avoid hitting provider quotas.
|
||||
|
||||
## Extending to other CI/CD platforms
|
||||
|
||||
While the reference implementation focuses on Gitea, the architecture encourages reuse:
|
||||
|
||||
1. **Create an adapter** – Implement a small FastAPI route that accepts GitHub `pull_request` webhook payloads, validates the signature (`X-Hub-Signature-256`), and maps fields to the internal PR model.
|
||||
2. **Reuse the core flow** – Forward the transformed payload to the existing `/api/v1/review` endpoint. No changes to the review crews are required.
|
||||
3. **Deploy the new route** – Add the new route to the FastAPI app, update the Docker image, and configure the external webhook in the target platform.
|
||||
|
||||
Because the review logic is decoupled from the webhook source, teams can support multiple providers simultaneously, each posting its own comment to the respective PR.
|
||||
|
||||
## Development workflow
|
||||
|
||||
Contributors who wish to enhance PR Reviewer can follow these steps:
|
||||
|
||||
```bash
# Clone the repository
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer

# Install development dependencies
uv pip install -e ".[dev]"

# Run the test suite
pytest tests/

# Run the full suite with coverage reporting
pytest --cov=src.pr_reviewer

# Start the server locally for rapid iteration
uvicorn src.pr_reviewer.main:app --reload
```
|
||||
|
||||
### Code Style
|
||||
The project uses **uv** for isolated virtual environments, **pytest** for unit and integration tests, and **ruff** for linting. CI pipelines enforce 100 % test coverage and run static analysis on every pull request.
|
||||
|
||||
The project enforces a strict style using **Black** for formatting and **Flake8** for linting. Developers should run these tools locally before committing.
|
||||
### Adding a new review tool
|
||||
|
||||
```bash
black src/
flake8 src/
```
|
||||
To incorporate an additional analysis tool (e.g., a custom static analyser), developers should:
|
||||
|
||||
### Adding a New Agent
|
||||
1. Write a thin wrapper that converts the tool’s output into the MCP schema.
|
||||
2. Register a new crew in `crews/` that invokes the wrapper.
|
||||
3. Update the orchestration flow (`flow.py`) to include the new crew in the parallel execution block.
|
||||
4. Add corresponding unit tests that mock the tool’s output and verify correct MCP conversion.
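Steps 1 and 4 above can be illustrated together: a thin converter plus a check that feeds it canned tool output. Everything here is hypothetical, including the input format of the imaginary analyser (`file`, `line`, `level`, `message`, `hint`) and the output keys, which merely mirror the severity, location, message and remediation fields described earlier.

```python
import json

# Mapping from the imaginary tool's levels to the review severities.
LEVEL_TO_SEVERITY = {"error": "high", "warning": "medium", "info": "low"}


def convert_tool_output(raw_json: str, tool_name: str) -> list[dict]:
    """Convert a hypothetical analyser's JSON output into normalised findings."""
    findings = []
    for item in json.loads(raw_json):
        line = item["line"]
        findings.append(
            {
                "tool": tool_name,
                # Unknown levels fall back to the lowest severity.
                "severity": LEVEL_TO_SEVERITY.get(item.get("level", ""), "low"),
                "location": {
                    "path": item["file"],
                    "start_line": line,
                    "end_line": item.get("end_line", line),
                },
                "message": item["message"],
                "remediation": item.get("hint", ""),
            }
        )
    return findings


# Canned output standing in for a real tool run, as a unit test would mock it.
mocked_output = json.dumps(
    [
        {"file": "LICENSE", "line": 1, "level": "error", "message": "Missing licence header"},
        {"file": "setup.py", "line": 12, "level": "note", "message": "Unpinned dependency",
         "hint": "Pin the version"},
    ]
)
for finding in convert_tool_output(mocked_output, tool_name="example-analyser"):
    print(finding["severity"], finding["location"]["path"], finding["message"])
```

Because the converter takes a string rather than invoking the tool itself, tests never need the analyser installed.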
|
||||
|
||||
To introduce a new review domain (e.g., license compliance), follow these steps:
|
||||
## Testing and quality assurance
|
||||
|
||||
1. Implement an MCP wrapper for the underlying tool.
|
||||
2. Create a Pydantic model for the agent’s input and output.
|
||||
3. Register the agent in the CrewAI flow configuration.
|
||||
4. Add corresponding context documentation under `contexts/defaults/`.
|
||||
PR Reviewer’s reliability hinges on three testing layers:
|
||||
|
||||
Because the orchestration layer treats agents as black boxes that accept a context and return a structured result, the integration effort is minimal.
|
||||
* **Unit tests** – Validate each crew’s MCP conversion logic, LLM prompt generation, and webhook parsing.
|
||||
* **Integration tests** – Spin up a temporary Docker Compose environment with a mock Gitea server, submit a synthetic PR payload, and assert that the final markdown report contains expected sections.
|
||||
* **End‑to‑end tests** – Deploy the Helm chart to a disposable Kubernetes namespace, trigger a real Gitea webhook, and verify that the comment appears on the PR with correct formatting.
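The "expected sections" assertion used by the integration layer might look something like the helper below. It is only a sketch: the section titles in `SAMPLE_REPORT` and the function names are invented for illustration, and the real report layout may differ.

```python
import re


def report_sections(markdown: str) -> set[str]:
    """Return the titles of all level-2 headings in a markdown report."""
    pattern = re.compile(r"^##\s+(.+)$", flags=re.MULTILINE)
    return {match.group(1).strip() for match in pattern.finditer(markdown)}


def missing_sections(markdown: str, expected: set[str]) -> set[str]:
    """Return the expected section titles that the report does not contain."""
    return expected - report_sections(markdown)


# Hypothetical report body, as a mock review run might produce.
SAMPLE_REPORT = """# PR Review

## Code Quality
No blocking issues.

## Security
One medium finding.
"""
```

A test can then assert that `missing_sections(...)` is empty, which produces a readable failure listing exactly which sections were absent.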
|
||||
|
||||
---
|
||||
All tests run in CI on every push, and failures block merges.
|
||||
|
||||
## Deployment Strategies
|
||||
## Community and contributions
|
||||
|
||||
### Kubernetes
|
||||
The project is deliberately open‑source, hosted on a self‑managed Gitea instance. Contributors are encouraged to:
|
||||
|
||||
For production workloads, the `k8s/` directory provides Helm‑compatible manifests:
|
||||
* **Open issues** – Report bugs, request new review domains, or suggest LLM prompt improvements.
|
||||
* **Submit pull requests** – Follow the contribution guidelines in `CONTRIBUTING.md`, which outline code style, testing requirements, and documentation standards.
|
||||
* **Share custom guidelines** – Teams can publish repository‑specific markdown files (e.g., `code_review.md`) that the summariser will automatically honour.
|
||||
|
||||
- **Secret** – Stores LLM API keys and MCP credentials.
|
||||
- **Deployment** – Runs the FastAPI container with configurable replica count.
|
||||
- **Service** – Exposes the API via a ClusterIP or LoadBalancer, depending on your environment.
|
||||
Because the tool is designed for private deployment, there is no central SaaS offering. Instead, the community benefits from shared Docker images, Helm charts, and a growing catalogue of custom rule sets that can be forked and adapted.
|
||||
|
||||
The manifests are deliberately simple, allowing teams to augment them with sidecar containers for logging, monitoring, or additional security scanning.
|
||||
## Limitations and future directions
|
||||
|
||||
### CI/CD Integration
|
||||
### Current constraints
|
||||
|
||||
A sample Gitea Actions workflow (`.gitea/workflows/deploy.yaml`) demonstrates how to build the Docker image, push it to a registry, and apply the Kubernetes manifests on each merge to `main`. Gitea Actions uses workflow syntax that is largely compatible with GitHub Actions, and the same pattern can be adapted for GitLab CI, Azure Pipelines, or any other automation platform.
|
||||
* **LLM dependence** – The quality of the final summary is directly tied to the underlying model’s capabilities. Low‑capacity models may produce vague recommendations.
|
||||
* **Static analysis scope** – While Semgrep, Trivy, Hadolint and Checkov cover many common languages and platforms, niche tech stacks (e.g., Rust, Terraform Cloud) require additional adapters.
|
||||
* **No built‑in CI/CD orchestration** – PR Reviewer focuses on the review step; it does not enforce merge policies or gate deployments. Teams must integrate the API into their existing pipelines.
|
||||
|
||||
---
|
||||
### Planned enhancements
|
||||
|
||||
## Community and Contribution
|
||||
|
||||
PR Reviewer is a **community‑first** project. The repository is hosted on a self‑managed Git server, but pull requests are welcomed from anyone willing to improve the codebase. Typical contribution pathways include:
|
||||
|
||||
- **Bug fixes** – Reported via the issue tracker, with accompanying unit tests.
|
||||
- **Feature enhancements** – New agents, additional MCP wrappers, or UI improvements (e.g., a lightweight web dashboard).
|
||||
- **Documentation updates** – Clarifying installation steps, adding language‑specific guidelines, or improving the README.
|
||||
|
||||
All contributions should follow the standard fork‑branch‑PR workflow, and the CI pipeline will automatically run the test suite and linting checks.
|
||||
|
||||
---
|
||||
|
||||
## Future Directions
|
||||
|
||||
### Expanded Provider Support
|
||||
|
||||
While the current LLM factory covers the major commercial providers, the architecture is ready for emerging open‑source models (e.g., Llama 3, Mistral) that can be served locally via Ollama or vLLM. Adding a new provider is a matter of implementing a thin adapter that conforms to the `LLMClient` interface.
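To make the "thin adapter" idea concrete, here is a minimal sketch of a client for a locally served model. The shape of `LLMClient` is assumed (a single `complete(prompt)` method), because the real interface is not shown in this article; `OllamaClient`, `http_post`, and the injectable `transport` parameter are likewise illustrative names. The request targets Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
from typing import Callable, Protocol
from urllib.request import Request, urlopen


class LLMClient(Protocol):
    """Assumed shape of the interface; the project's real one may differ."""

    def complete(self, prompt: str) -> str: ...


# A transport takes (url, JSON body) and returns the response body as text.
Transport = Callable[[str, bytes], str]


def http_post(url: str, body: bytes) -> str:
    """Default transport built on the standard library only."""
    request = Request(url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


class OllamaClient:
    """Adapter that sends a prompt to a local Ollama server."""

    def __init__(
        self,
        model: str,
        base_url: str = "http://localhost:11434",
        transport: Transport = http_post,
    ) -> None:
        self.model = model
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def complete(self, prompt: str) -> str:
        # stream=False asks the server for one JSON object instead of chunks.
        payload = json.dumps(
            {"model": self.model, "prompt": prompt, "stream": False}
        ).encode("utf-8")
        raw = self.transport(f"{self.base_url}/api/generate", payload)
        return json.loads(raw)["response"]
```

Passing the transport in as a parameter is a design choice for testability: unit tests can supply a fake function and never touch the network.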
|
||||
|
||||
### Adaptive Learning
|
||||
|
||||
One avenue under investigation is a **feedback loop** in which developers rate the AI’s suggestions and those ratings feed a reinforcement‑learning pipeline. Over time the system could learn the nuances of a particular team’s style, reducing false positives.
|
||||
|
||||
### Richer UI Integration
|
||||
|
||||
Beyond posting markdown comments, a dedicated web UI could visualise findings, allow developers to acknowledge or dismiss specific issues, and provide one‑click remediation scripts.
|
||||
|
||||
### Policy‑as‑Code
|
||||
|
||||
Integrating with policy‑as‑code frameworks such as **OPA** would enable dynamic, rule‑driven security reviews that adapt to changing compliance requirements without code changes.
|
||||
|
||||
---
|
||||
1. **Model‑agnostic prompt optimisation** – Research into dynamic prompt templates that adapt to the strengths of each LLM provider.
|
||||
2. **Feedback loop** – Capture developer reactions to the AI suggestions (e.g., thumbs up/down) and use them to fine‑tune future prompts.
|
||||
3. **Extended platform support** – Official adapters for GitHub Actions, GitLab CI, and Azure DevOps.
|
||||
4. **Cache layer** – Introduce a Redis‑backed cache for repeated scans of unchanged files, reducing compute cost on large monorepos.
|
||||
5. **Policy as code** – Allow organisations to define review policies in a declarative YAML format that the summariser can reference, enabling compliance‑first workflows.
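The cache layer in item 4 could work roughly as sketched below: scan results are keyed by tool name plus a hash of the file content, so an unchanged file never triggers a second scan. This is an assumption about a planned feature, not existing code. The in-memory dictionary stands in for Redis, and `ScanCache` and `get_or_scan` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List


class ScanCache:
    """In-memory stand-in for a Redis-backed scan cache."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(tool: str, content: bytes) -> str:
        """Key on tool and content hash, so renamed but unchanged files still hit."""
        return f"{tool}:{hashlib.sha256(content).hexdigest()}"

    def get_or_scan(
        self, tool: str, content: bytes, scan: Callable[[bytes], List[str]]
    ) -> List[str]:
        """Return cached findings, or run the scan once and remember the result."""
        cache_key = self.key(tool, content)
        if cache_key in self._store:
            self.hits += 1
            return self._store[cache_key]
        self.misses += 1
        findings = scan(content)
        self._store[cache_key] = findings
        return findings
```

With a real Redis backend the same key scheme would apply, with an expiry or a tool-version component added so that rule updates invalidate old entries.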
|
||||
|
||||
## Conclusion
|
||||
|
||||
Automated PR reviews have moved from a novelty to a necessity, especially as codebases grow and security expectations tighten. **PR Reviewer** offers a pragmatic, privacy‑preserving solution that brings together the best of LLM reasoning, static analysis, and community‑driven guidelines. Its modular design means you can start with the out‑of‑the‑box code, security, and infra agents, then extend the platform to cover any domain your team cares about.
|
||||
PR Reviewer demonstrates that AI‑driven code quality, security, and infrastructure analysis can be delivered as a self‑hosted, vendor‑neutral service without sacrificing flexibility or control. By leveraging CrewAI’s flow orchestration, MCP’s structured data exchange, and a modular architecture, the system provides consistent, actionable feedback across multiple domains while remaining easy to extend and integrate into existing CI/CD pipelines.
|
||||
|
||||
Because the system runs wherever you choose—on a developer laptop, a CI runner, or a Kubernetes cluster—you retain full control over data, costs, and performance. The open‑source licence (MIT) encourages collaboration, and the clear contribution path invites you to shape the tool’s evolution.
|
||||
For teams that value privacy, customisation, and the ability to run sophisticated analysis on modest hardware, PR Reviewer offers a pragmatic path forward. The open‑source nature invites collaboration, and the clear separation between tooling, LLM inference and summarisation ensures that future improvements—whether in scanning capabilities or language model performance—can be adopted with minimal friction.
|
||||
|
||||
If you’ve ever wished for a diligent reviewer that never sleeps, respects your coding style, and never leaks your proprietary code, give PR Reviewer a spin. The repository is ready at <https://git.aridgwayweb.com/armistace/pr_reviewer>, and the community is eager to see how you’ll make it your own.
|
||||
|
||||
---
|
||||
|
||||
*Happy reviewing, mates!*
|
||||
Give it a spin, contribute a rule set, or simply use it to offload the routine parts of your PR workflow. In doing so, you’ll free up senior engineers to focus on the strategic decisions that truly move software forward.
|
||||