8 Commits

Author SHA1 Message Date
6d1294af3e Merge pull request 'pr_reviewer__a_deployable_ai_reviewer_for_your_repos' (#25) from pr_reviewer__a_deployable_ai_reviewer_for_your_repos into master
All checks were successful
Build and Push Image / Build and push image (push) Successful in 18m58s
Reviewed-on: #25
2026-05-22 20:47:31 +10:00
727949de93 Add Myself as Author 2026-05-22 20:41:55 +10:00
Blog Creator
c95161dc7c Add PR Reviewer overview documentation 2026-05-21 18:31:39 +00:00
Blog Creator
e2ec1a3eae Add comprehensive PR Reviewer guide 2026-05-21 12:13:53 +00:00
Blog Creator
2f4e98a8e3 Introduce PR Reviewer overview article 2026-05-21 11:39:13 +00:00
Blog Creator
85375a051e Add PR Reviewer guide documentation 2026-05-15 18:38:04 +00:00
Blog Creator
6db260d4bd Add deployable AI PR reviewer 2026-05-14 18:33:04 +00:00
Blog Creator
e044202042 Add PR Reviewer deployment docs 2026-05-09 18:38:39 +00:00

@ -0,0 +1,309 @@
Title: PR Reviewer - A deployable AI reviewer for your Repos
Date: 2026-05-21 18:30
Modified: 2026-05-21 18:30
Category: DevOps
Tags: ai, code-review, automation, devops, open-source, ai_content, not_human_content
Slug: pr-reviewer-deployable-ai-reviewer
Authors: Andrew Ridgway... And Friends - glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
Summary: An in-depth look at PR Reviewer, a self-hosted, LLM-agnostic AI system that automates code, security and infrastructure reviews for any Git repository.
---
## Introduction
Pull requests (PRs) are the lifeblood of modern software development. They enable collaboration, enforce quality gates, and provide a natural checkpoint before code reaches production. Yet, the manual review process is increasingly strained by the sheer volume of changes, the growing complexity of tech stacks, and the need for specialised expertise in security and infrastructure.
Enter **PR Reviewer**, a locally deployable AI-driven review engine that brings automated, multi-domain analysis to any repository. Built on top of CrewAI's flow orchestration and the Model Context Protocol (MCP), the system runs three parallel review streams (code quality, security, and infrastructure), then synthesises a concise, actionable report. It is deliberately LLM-agnostic, supporting OpenAI, Anthropic, Ollama and any other provider that conforms to CrewAI's abstraction layer.
This article walks through the motivations behind PR Reviewer, its architectural choices, feature set, deployment pathways, and practical considerations for teams that want to augment their PR workflow with AI without surrendering control to a third-party SaaS.
## The case for AI-augmented PR reviews
### Scaling expertise
Traditional code reviews rely on senior engineers to spot anti-patterns, security flaws, and deployment misconfigurations. As teams grow, the pool of reviewers does not always keep pace, leading to bottlenecks and inconsistent feedback. An AI reviewer can apply a consistent set of rules across every PR, ensuring that even junior contributors receive high-quality guidance.
### Reducing cognitive load
Human reviewers must juggle multiple concerns (style, correctness, performance, compliance) while also understanding the broader context of a change. By offloading routine checks to an automated system, reviewers can focus on architectural decisions and nuanced trade-offs that truly require human judgement.
### Faster feedback loops
Continuous integration pipelines already provide rapid build and test feedback. Adding an AI review step that runs in parallel with existing checks shortens the time between code submission and actionable feedback, encouraging a "shift-left" mentality where problems are caught earlier.
### Vendor-neutral flexibility
Many commercial AI review tools lock users into proprietary APIs and cloud-only deployments. PR Reviewer's design deliberately avoids vendor lock-in. By abstracting the LLM layer, teams can run the service on-premise, on a private cloud, or even on a modest workstation using a local model served by Ollama.
## Core concepts
### CrewAI flows
CrewAI provides a lightweight framework for orchestrating multiple “crews” (agents) that each perform a specialised task. In PR Reviewer, three crews—**CodeReviewCrew**, **SecurityCrew**, and **InfraCrew**—operate concurrently. Each crew receives the same PR context, runs its own analysis toolchain (Semgrep, Trivy, Hadolint/Checkov respectively), and returns a structured narrative.
### Model Context Protocol (MCP)
MCP standardises how external tools expose their findings to an LLM. Instead of feeding raw tool output, MCP wraps results in a JSON schema that includes severity, location, and remediation suggestions. This uniform representation enables the summariser crew to merge disparate findings into a single coherent report.
### Summariser crew
The final crew consumes the three domain-specific outputs and asks the LLM to produce a human-readable summary. The prompt includes the repository's coding style guidelines (if supplied) and any custom review policies, ensuring the tone and recommendations align with the team's expectations.
## Feature overview
| Feature | Description |
|---|---|
| **Code review** | Style, maintainability and best-practice checks powered by Semgrep. |
| **Security review** | Vulnerability scanning, secret detection and container image analysis via Trivy. |
| **Infrastructure review** | Dockerfile linting, Kubernetes manifest validation, IaC checks using Hadolint and Checkov. |
| **Summarisation** | Consolidated, actionable report generated by an LLM. |
| **REST API** | FastAPI endpoints for health checks, manual review triggers, and webhook handling. |
| **Gitea webhook** | Automatic PR event processing, diff fetching, and comment posting. |
| **Dockerised** | Multi-stage build with all dependencies baked in. |
| **Kubernetes ready** | Helm-compatible manifests and CI pipeline for automated deployment. |
| **LLM-agnostic** | Works with OpenAI, Anthropic, Ollama or any CrewAI-compatible provider. |
| **Configurable guidelines** | Override default review policies with repository-specific markdown files. |
## Architecture deep dive
At a high level, PR Reviewer follows a request-response pattern orchestrated by FastAPI. When a review request arrives, either via the `/api/v1/review` endpoint or a Gitea webhook, the service extracts the PR metadata, fetches the changed files, and constructs an MCP-compatible payload. This payload is then dispatched to the three review crews in parallel.
```
POST /api/v1/review → FastAPI handler
├─► Fetch diffs from Gitea (or use supplied file list)
├─► Build MCP payload
├─► Parallel execution:
│ ├─ CodeReviewCrew (Semgrep)
│ ├─ SecurityCrew (Trivy)
│ └─ InfraCrew (Hadolint + Checkov)
└─► Summariser crew → LLM → JSON response
└─► Return consolidated report
```
### Parallelism and timeouts
Each crew runs in its own asynchronous task with a configurable timeout (`PER_CREW_TIMEOUT`). The overall workflow respects a global timeout (`TOTAL_FLOW_TIMEOUT`) to prevent runaway processing on large PRs. If a crew exceeds its limit, the summariser notes the omission and proceeds with the available data.
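The timeout behaviour described above can be sketched with `asyncio`. This is a minimal illustration, not the project's actual `flow.py`: the function names `run_review` and `_run_crew` are hypothetical, and the crews are assumed to be plain awaitables.

```python
import asyncio
from typing import Any, Awaitable, Dict, List, Tuple

# Sentinel so that a crew legitimately returning None is not mistaken for a timeout.
_TIMED_OUT = object()


async def _run_crew(name: str, work: Awaitable[Any], timeout: float) -> Tuple[str, Any]:
    """Await one crew, returning the sentinel if it exceeds its own limit."""
    try:
        return name, await asyncio.wait_for(work, timeout=timeout)
    except asyncio.TimeoutError:
        return name, _TIMED_OUT


async def run_review(
    crews: Dict[str, Awaitable[Any]],
    per_crew_timeout: float = 300,   # documented default for PER_CREW_TIMEOUT
    total_timeout: float = 600,      # documented default for TOTAL_FLOW_TIMEOUT
) -> Tuple[Dict[str, Any], List[str]]:
    """Run all crews concurrently; return (results, names of crews that timed out).

    If the whole flow exceeds total_timeout, asyncio.TimeoutError propagates
    to the caller, which can translate it into an error response.
    """
    pairs = await asyncio.wait_for(
        asyncio.gather(*(_run_crew(n, w, per_crew_timeout) for n, w in crews.items())),
        timeout=total_timeout,
    )
    results = {name: value for name, value in pairs if value is not _TIMED_OUT}
    skipped = [name for name, value in pairs if value is _TIMED_OUT]
    return results, skipped
```

The `skipped` list is what a summariser would need in order to note which domains are missing from the report.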
### Data flow and persistence
PR Reviewer is deliberately stateless. All inputs are supplied in the request body, and all outputs are returned as JSON. This design simplifies horizontal scaling—multiple instances can sit behind a load balancer without coordination. For audit purposes, teams can enable optional logging to an external store (e.g., Elasticsearch) via environment variables.
## Integration with LLM providers
CrewAI abstracts the LLM behind a simple interface: `generate(prompt, model, temperature)`. The service reads three environment variables to configure the provider:
* `LLM_PROVIDER`: `openai`, `anthropic`, or `ollama`.
* `LLM_MODEL`: model identifier (e.g., `gpt-4`, `claude-3-sonnet`, `gemma4:31b-cloud`).
* `LLM_API_KEY`: required for hosted services; omitted for local Ollama instances.
Because the prompt is generated programmatically, switching providers does not require code changes, only a restart with new environment values. This flexibility is crucial for teams that wish to experiment with emerging open-source models without rewriting integration logic.
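As a rough sketch of how those three variables might be read and validated at startup (the `LLMSettings` class, the `load_llm_settings` function and the strict allow-list of providers are assumptions made for illustration, not the project's code):

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: only the three providers named in the article are accepted here.
HOSTED_PROVIDERS = {"openai", "anthropic"}
SUPPORTED_PROVIDERS = HOSTED_PROVIDERS | {"ollama"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    api_key: Optional[str] = None


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM_PROVIDER, LLM_MODEL and LLM_API_KEY, failing fast on bad input."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    model = env.get("LLM_MODEL", "").strip()
    api_key = env.get("LLM_API_KEY") or None

    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(SUPPORTED_PROVIDERS)}")
    if not model:
        raise ValueError("LLM_MODEL is required")
    # The key is only mandatory for hosted services; local Ollama needs none.
    if provider in HOSTED_PROVIDERS and not api_key:
        raise ValueError(f"LLM_API_KEY is required for provider '{provider}'")
    return LLMSettings(provider=provider, model=model, api_key=api_key)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.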
## Review flows in detail
### Code review crew
The code crew invokes Semgrep with a curated rule set that reflects common Python, JavaScript and Go best practices. Findings are normalised into MCP entries containing:
* **Severity**: `critical`, `high`, `medium`, `low`.
* **Location**: file path and line range.
* **Message**: concise description of the issue.
* **Remediation**: suggested code change or reference to documentation.
If a repository supplies a custom `code_review.md` guideline file, its contents are appended to the prompt, allowing the LLM to tailor feedback to the team's style (e.g., preferring f-strings over `%` formatting).
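The four fields listed above could be modelled as a small validated record. The sketch below is an assumption about shape only: the class name `Finding` and the attribute names are hypothetical, and the real schema may differ.

```python
import json
from dataclasses import asdict, dataclass

SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class Finding:
    """One normalised finding: severity, location, message, remediation."""
    severity: str
    path: str
    start_line: int
    end_line: int
    message: str
    remediation: str

    def __post_init__(self) -> None:
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 1 <= self.start_line <= self.end_line:
            raise ValueError("line range must satisfy 1 <= start_line <= end_line")

    def to_json(self) -> str:
        """Serialise with stable key order so outputs are easy to diff."""
        return json.dumps(asdict(self), sort_keys=True)
```

Validating at construction time means a malformed tool result fails inside the crew that produced it rather than later in the summariser.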
### Security review crew
Security analysis runs Trivy in two modes: vulnerability scanning of any container images referenced in the PR, and filesystem scanning for secrets, misconfigurations, and known vulnerable dependencies. The output is again wrapped in MCP, with an additional field indicating **exploitability** based on CVSS scores.
### Infrastructure review crew
Infrastructure checks focus on Dockerfiles, Kubernetes manifests, and generic IaC (Terraform, CloudFormation). Hadolint validates Dockerfile best practices, while Checkov evaluates cloud resource definitions against industry-standard policies (e.g., CIS benchmarks). The crew also respects any `infra_review.md` file that may contain organisation-specific constraints such as mandatory resource limits.
### Summariser crew
The summariser receives three JSON arrays and constructs a single prompt that asks the LLM to:
1. Produce an executive summary of the overall health of the PR.
2. List the top 5 findings across all domains, ordered by severity.
3. Provide actionable recommendations, grouped by domain.
4. Highlight any deviations from the repository's own guidelines.
The result is a markdown document that can be posted directly as a PR comment, ensuring developers receive a readable, context-aware report without additional formatting steps.
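To make the merge-and-rank step concrete, here is a minimal sketch of how three finding lists might be combined into one prompt. It assumes findings are dictionaries with `severity` and `message` keys; `top_findings`, `build_prompt` and the prompt wording are illustrative inventions, not the project's actual prompt.

```python
from typing import Iterable, List

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_findings(*domains: Iterable[dict], limit: int = 5) -> List[dict]:
    """Merge findings from every domain and keep the most severe ones."""
    merged = [finding for domain in domains for finding in domain]
    # sorted() is stable, so findings of equal severity keep their input order.
    return sorted(merged, key=lambda f: SEVERITY_RANK[f["severity"]])[:limit]


def build_prompt(code: list, security: list, infra: list, guidelines: str = "") -> str:
    """Assemble a single summariser prompt from the three crew outputs."""
    lines = [
        "Summarise this pull request review.",
        "1. Give an executive summary of the overall health of the PR.",
        "2. List the top findings below, ordered by severity.",
        "3. Provide actionable recommendations, grouped by domain.",
        "4. Highlight deviations from the repository guidelines.",
        "",
        "Top findings:",
    ]
    for finding in top_findings(code, security, infra):
        lines.append(f"- [{finding['severity']}] {finding['message']}")
    if guidelines:
        lines += ["", "Repository guidelines:", guidelines]
    return "\n".join(lines)
```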
## API design
PR Reviewer exposes a minimal FastAPI surface:
* `GET /api/v1/health`: health check returning `{ "status": "healthy", "service": "pr-reviewer" }`.
* `POST /api/v1/review`: manual trigger; expects a JSON payload describing the PR (metadata, file list, optional overrides). Returns a JSON object containing a unique `review_id`, timestamps, and the full review results.
* `POST /api/v1/gitea-webhook`: endpoint for Gitea pull-request events. Validates the `X-Gitea-Signature` header (if `ACCESS_GITEA_SECRET` is set), fetches the diff via the Gitea API, runs the review pipeline, and posts the markdown summary as a comment on the PR.
All endpoints respect standard HTTP status codes and include descriptive error messages for malformed requests, authentication failures, or internal timeouts.
## Gitea webhook integration
Gitea is the default Git hosting platform for the reference implementation, but the webhook handler is deliberately generic:
1. **Signature verification**: HMAC-SHA256 using the secret configured in `ACCESS_GITEA_SECRET`. If the secret is omitted, verification is skipped (useful for local testing).
2. **Payload parsing**: Only `pull_request` events with actions `opened`, `synchronize`, or `reopened` are processed. Other events are ignored to reduce noise.
3. **Diff retrieval**: The handler calls the Gitea API (`/repos/{owner}/{repo}/pulls/{id}/files`) to obtain the list of changed files, their statuses, and raw content when needed.
4. **Review execution**: The same parallel crew workflow described earlier runs on the fetched diff.
5. **Comment posting**: Upon completion, the service posts the markdown report to the PR using the Gitea API (`/repos/{owner}/{repo}/issues/{id}/comments`).
### Adding support for other platforms
Because the webhook payload is parsed into a canonical internal model, extending support to GitHub, GitLab or Bitbucket merely requires a thin adapter that translates their event schemas into the same structure. The core review logic remains untouched, making cross-platform adoption straightforward.
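The first two steps can be sketched using only the standard library. The sketch assumes the signature header carries a hex-encoded HMAC-SHA256 digest of the raw request body and that the event type arrives separately (for example in an `X-Gitea-Event` header); the function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Actions the article says are processed; everything else is ignored.
PROCESSED_ACTIONS = {"opened", "synchronize", "reopened"}


def signature_is_valid(body: bytes, header_sig: Optional[str], secret: Optional[str]) -> bool:
    """Check the hex HMAC-SHA256 of the raw body against the header value."""
    if not secret:
        return True  # no secret configured: verification is skipped
    if not header_sig:
        return False
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), header_sig.encode())


def should_process(event_type: str, body: bytes) -> bool:
    """Accept only pull_request events whose action is in the allow-list."""
    if event_type != "pull_request":
        return False
    return json.loads(body).get("action") in PROCESSED_ACTIONS
```

Note that the signature must be computed over the exact bytes received, before any JSON parsing or re-serialisation.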
## Deployment options
### Docker compose (local development)
The repository ships with a `docker-compose.yaml` that defines two services:
* `pr-reviewer`: the FastAPI application.
* `ollama` (optional): a local LLM server for offline use.
Running `docker compose up` builds the multi-stage image, injects environment variables from `.env`, and exposes the API on `http://localhost:8000`.
### Kubernetes (production)
For production workloads, a Helm chart (or plain manifests in `kube/`) provides:
* A Deployment with configurable replica count.
* A Service of type `NodePort` (default port `30001`) or `LoadBalancer` for cloud environments.
* A Secret (`pr-reviewer-env`) that stores all `.env` values, including Gitea tokens and LLM credentials.
* An optional HorizontalPodAutoscaler that scales based on CPU utilisation.
The CI pipeline (`.gitea/workflows/build_push.yml`) automatically builds a multi-arch Docker image, pushes it to the configured registry, and applies the Kubernetes manifests.
### Resource considerations
* **CPU**: LLM inference dominates CPU usage. When using a hosted provider, the container's CPU footprint is modest (mostly Semgrep and Trivy). With a local model, allocate at least 4 vCPUs and 8 GB RAM.
* **Memory**: Each review crew consumes roughly 200 MB of RAM; the summariser adds another 150 MB. The total stays under 1 GB for typical PR sizes.
* **Storage**: The image size is ~1.2 GB (including all scanning tools). Persistent storage is not required unless audit logging is enabled.
## Configuration details
All runtime options are supplied via environment variables. The most important groups are:
| Variable | Required? | Description |
|---|---|---|
| `LLM_PROVIDER` | Yes | `openai`, `anthropic`, or `ollama`. |
| `LLM_MODEL` | Yes | Model identifier (e.g., `gpt-4`). |
| `LLM_API_KEY` | Conditional | API key for hosted providers. |
| `ACCESS_GITEA_URL` | Yes | Base URL of the Gitea instance. |
| `ACCESS_GITEA_TOKEN` | Yes | Personal access token with repository read scope. |
| `ACCESS_GITEA_SECRET` | No | Webhook secret for HMAC verification. |
| `TOTAL_FLOW_TIMEOUT` | No (default 600) | Max seconds for the whole review pipeline. |
| `PER_CREW_TIMEOUT` | No (default 300) | Max seconds per individual crew. |
| `LOG_LEVEL` | No (default `INFO`) | Python logging verbosity. |
Additional optional variables allow overriding default review guidelines (`CODE_REVIEW_GUIDELINES`, `SECURITY_REVIEW_GUIDELINES`, `INFRA_REVIEW_GUIDELINES`) by pointing to markdown files stored in the container or mounted via a volume.
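A small sketch of how the optional timeout and guideline variables might be resolved, assuming the guideline variables hold file paths as described above. The helper names `read_timeout` and `load_guidelines` are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_timeout(env: Mapping[str, str], name: str, default: int) -> int:
    """Return a positive number of seconds, falling back to the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"{name} must be a positive number of seconds")
    return value


def load_guidelines(env: Mapping[str, str], name: str) -> Optional[str]:
    """Read the markdown file a guideline variable points at, if it is set."""
    path = env.get(name)
    if not path:
        return None
    return Path(path).read_text(encoding="utf-8")


if __name__ == "__main__":
    total = read_timeout(os.environ, "TOTAL_FLOW_TIMEOUT", 600)
    per_crew = read_timeout(os.environ, "PER_CREW_TIMEOUT", 300)
    print(f"total={total}s per_crew={per_crew}s")
```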
## Operational considerations
### Monitoring
Metrics can be exposed via `/metrics` in Prometheus format. FastAPI does not ship a metrics endpoint of its own, so this is typically provided by an instrumentation library. Key metrics include:
* `pr_review_requests_total`
* `pr_review_duration_seconds`
* `crew_timeout_total` (per crew)
* `llm_api_errors_total`
Collecting these metrics enables alerting on abnormal latency spikes, which often indicate upstream LLM throttling or unusually large diffs.
### Logging
Structured JSON logs are emitted by default, containing fields such as `request_id`, `pr_id`, `crew`, and `severity`. When integrated with a log aggregation platform (e.g., Loki), operators can trace the lifecycle of a single PR review from receipt to comment posting.
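One way to produce logs of that shape with the standard `logging` module is a custom formatter. This is a sketch under the assumption that the contextual fields are passed through `extra=`; the class name `JsonFormatter` is hypothetical and the project may well use a dedicated logging library instead.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    # Context fields named in the article; included only when present.
    FIELDS = ("request_id", "pr_id", "crew", "severity")

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("pr_reviewer")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Values passed via extra= become attributes on the log record.
    logger.info("crew finished", extra={"request_id": "example-1", "crew": "security"})
```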
### Security
* **Secret management**: Store all tokens and API keys in a secret manager (Kubernetes Secrets, HashiCorp Vault, or Azure Key Vault). Never commit `.env` files to source control.
* **Network isolation**: If using a local LLM, keep the Ollama container on a private network and restrict outbound internet access.
* **Rate limiting**: The service respects the `X-RateLimit-Remaining` header from hosted LLM APIs and backs off automatically to avoid hitting provider quotas.
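The back-off behaviour might look roughly like the sketch below. It assumes the provider reports remaining quota in `X-RateLimit-Remaining` as an integer; `backoff_delay` and its parameters are hypothetical, and the exponential schedule is an illustrative choice rather than the project's documented policy.

```python
from typing import Mapping


def backoff_delay(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Return how many seconds to wait before the next LLM call.

    Header lookup is case-sensitive here; a real HTTP client usually
    exposes a case-insensitive mapping.
    """
    raw = headers.get("X-RateLimit-Remaining")
    if raw is None:
        return 0.0  # provider gave no quota information
    try:
        remaining = int(raw)
    except ValueError:
        return 0.0  # unparseable header: do not block the review
    if remaining > 0:
        return 0.0
    # Quota exhausted: wait base * 2^attempt seconds, capped at `cap`.
    return min(cap, base * (2 ** attempt))
```

A caller would pass the headers of the last response and the number of consecutive retries, then sleep for the returned duration before trying again.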
## Extending to other CI/CD platforms
While the reference implementation focuses on Gitea, the architecture encourages reuse:
1. **Create an adapter**: Implement a small FastAPI route that accepts GitHub `pull_request` webhook payloads, validates the signature (`X-Hub-Signature-256`), and maps fields to the internal PR model.
2. **Reuse the core flow**: Forward the transformed payload to the existing `/api/v1/review` endpoint. No changes to the review crews are required.
3. **Deploy the new route**: Add the new route to the FastAPI app, update the Docker image, and configure the external webhook in the target platform.
Because the review logic is decoupled from the webhook source, teams can support multiple providers simultaneously, each posting its own comment to the respective PR.
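As an illustration of such an adapter, the sketch below validates a GitHub signature and maps a `pull_request` payload onto a canonical record. The `PullRequestEvent` class is a hypothetical stand-in for the internal PR model, whose real fields are not documented here; the GitHub side relies on the `sha256=` prefix used by `X-Hub-Signature-256` and on the standard `pull_request` payload layout.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestEvent:
    """Hypothetical canonical model shared by all platform adapters."""
    owner: str
    repo: str
    number: int
    action: str
    head_sha: str


def verify_github_signature(body: bytes, header: str, secret: str) -> bool:
    """GitHub sends 'sha256=<hex digest>' computed over the raw body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), header.encode())


def from_github(payload: dict) -> PullRequestEvent:
    """Translate a GitHub pull_request webhook payload into the canonical model."""
    repository = payload["repository"]
    pull_request = payload["pull_request"]
    return PullRequestEvent(
        owner=repository["owner"]["login"],
        repo=repository["name"],
        number=payload["number"],
        action=payload["action"],
        head_sha=pull_request["head"]["sha"],
    )
```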
## Development workflow
Contributors who wish to enhance PR Reviewer can follow these steps:
```bash
# Clone the repository
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer
# Install development dependencies
uv pip install -e ".[dev]"
# Run the test suite
pytest tests/
# Start the server locally for rapid iteration
uvicorn src.pr_reviewer.main:app --reload
```
The project uses **uv** for isolated virtual environments, **pytest** for unit and integration tests, and **ruff** for linting. CI pipelines enforce 100% test coverage and run static analysis on every pull request.
### Adding a new review tool
To incorporate an additional analysis tool (e.g., a custom static analyser), developers should:
1. Write a thin wrapper that converts the tool's output into the MCP schema.
2. Register a new crew in `crews/` that invokes the wrapper.
3. Update the orchestration flow (`flow.py`) to include the new crew in the parallel execution block.
4. Add corresponding unit tests that mock the tool's output and verify correct MCP conversion.
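Step 1 might look like the sketch below. Everything about the tool is invented for illustration: it is assumed to print one `path:line:level:message` entry per line, and the severity mapping and output keys are hypothetical rather than the project's real schema.

```python
from typing import List

# Hypothetical mapping from the tool's levels to the four review severities.
LEVEL_TO_SEVERITY = {"error": "high", "warning": "medium", "note": "low"}


def parse_tool_output(text: str) -> List[dict]:
    """Convert 'path:line:level:message' lines into normalised findings."""
    findings = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in the tool output
        # maxsplit=3 keeps any colons inside the message intact.
        path, line_no, level, message = line.split(":", 3)
        findings.append(
            {
                "severity": LEVEL_TO_SEVERITY.get(level.strip().lower(), "low"),
                "location": {
                    "path": path.strip(),
                    "start_line": int(line_no),
                    "end_line": int(line_no),
                },
                "message": message.strip(),
                "remediation": "",  # this imaginary tool offers no fix hints
            }
        )
    return findings
```

A unit test for step 4 then only needs a canned output string and assertions on the returned dictionaries, with no need to run the tool itself.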
## Testing and quality assurance
PR Reviewer's reliability hinges on three testing layers:
* **Unit tests**: Validate each crew's MCP conversion logic, LLM prompt generation, and webhook parsing.
* **Integration tests**: Spin up a temporary Docker Compose environment with a mock Gitea server, submit a synthetic PR payload, and assert that the final markdown report contains expected sections.
* **End-to-end tests**: Deploy the Helm chart to a disposable Kubernetes namespace, trigger a real Gitea webhook, and verify that the comment appears on the PR with correct formatting.
All tests run in CI on every push, and failures block merges.
## Community and contributions
The project is deliberately open-source, hosted on a self-managed Gitea instance. Contributors are encouraged to:
* **Open issues**: Report bugs, request new review domains, or suggest LLM prompt improvements.
* **Submit pull requests**: Follow the contribution guidelines in `CONTRIBUTING.md`, which outline code style, testing requirements, and documentation standards.
* **Share custom guidelines**: Teams can publish repository-specific markdown files (e.g., `code_review.md`) that the summariser will automatically honour.
Because the tool is designed for private deployment, there is no central SaaS offering. Instead, the community benefits from shared Docker images, Helm charts, and a growing catalogue of custom rule sets that can be forked and adapted.
## Limitations and future directions
### Current constraints
* **LLM dependence**: The quality of the final summary is directly tied to the underlying model's capabilities. Low-capacity models may produce vague recommendations.
* **Static analysis scope**: While Semgrep, Trivy, Hadolint and Checkov cover many common languages and platforms, niche tech stacks (e.g., Rust, Terraform Cloud) require additional adapters.
* **No built-in CI/CD orchestration**: PR Reviewer focuses on the review step; it does not enforce merge policies or gate deployments. Teams must integrate the API into their existing pipelines.
### Planned enhancements
1. **Model-agnostic prompt optimisation**: Research into dynamic prompt templates that adapt to the strengths of each LLM provider.
2. **Feedback loop**: Capture developer reactions to the AI suggestions (e.g., thumbs up/down) and use them to fine-tune future prompts.
3. **Extended platform support**: Official adapters for GitHub Actions, GitLab CI, and Azure DevOps.
4. **Cache layer**: Introduce a Redis-backed cache for repeated scans of unchanged files, reducing compute cost on large monorepos.
5. **Policy as code**: Allow organisations to define review policies in a declarative YAML format that the summariser can reference, enabling compliance-first workflows.
## Conclusion
PR Reviewer demonstrates that AI-driven code quality, security, and infrastructure analysis can be delivered as a self-hosted, vendor-neutral service without sacrificing flexibility or control. By leveraging CrewAI's flow orchestration, MCP's structured data exchange, and a modular architecture, the system provides consistent, actionable feedback across multiple domains while remaining easy to extend and integrate into existing CI/CD pipelines.
For teams that value privacy, customisation, and the ability to run sophisticated analysis on modest hardware, PR Reviewer offers a pragmatic path forward. The open-source nature invites collaboration, and the clear separation between tooling, LLM inference and summarisation ensures that future improvements, whether in scanning capabilities or language model performance, can be adopted with minimal friction.
Give it a spin, contribute a rule set, or simply use it to offload the routine parts of your PR workflow. In doing so, you'll free up senior engineers to focus on the strategic decisions that truly move software forward.