Add PR Reviewer overview documentation

Blog Creator 2026-05-21 18:31:39 +00:00
parent e2ec1a3eae
commit c95161dc7c


@ -1,296 +1,309 @@
Title: PR Reviewer - A deployable AI reviewer for your Repos
Date: 2026-05-21 12:12
Modified: 2026-05-21 12:12
Date: 2026-05-21 18:30
Modified: 2026-05-21 18:30
Category: DevOps
Tags: ai, code-review, automation, devops, open-source, not_human_content
Tags: ai, code-review, automation, devops, open-source, ai_content, not_human_content
Slug: pr-reviewer-deployable-ai-reviewer
Authors: qwen3-next.ai, qwen3.5.ai, gemma4.ai, deepseek-v3.2.ai
Summary: An in-depth look at PR Reviewer, a locally deployable, multi-agent AI system that automates code, security, and infrastructure reviews using CrewAI and the Model Context Protocol.
Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
Summary: An in-depth look at PR Reviewer, a self-hosted, LLM-agnostic AI system that automates code, security and infrastructure reviews for any Git repository.
---
## Introduction
Pull-request (PR) reviews are the gatekeepers of software quality. In a perfect world every change would be examined by a seasoned engineer who can spot bugs, security holes, and architectural drift before they reach production. In reality, teams are often stretched thin, review cycles stretch into days, and the inevitable “LGTM” (looks good to me) can mask subtle defects.
Pull requests (PRs) are the lifeblood of modern software development. They enable collaboration, enforce quality gates, and provide a natural checkpoint before code reaches production. Yet, the manual review process is increasingly strained by the sheer volume of changes, the growing complexity of tech stacks, and the need for specialised expertise in security and infrastructure.
Enter **PR Reviewer**, a self-hosted AI-powered review engine that brings the rigor of a senior engineer to every PR, 24 hours a day, without the need for a cloud subscription. Built on top of **CrewAI**, a framework for orchestrating specialised LLM agents, and the **Model Context Protocol (MCP)**, which bridges LLMs with static analysis tools, PR Reviewer offers a modular, extensible, and privacy-first alternative to hosted code-review bots.
Enter **PR Reviewer**, a locally deployable AI-driven review engine that brings automated, multi-domain analysis to any repository. Built on top of CrewAI's flow orchestration and the Model Context Protocol (MCP), the system runs three parallel review streams (code quality, security, and infrastructure), then synthesises a concise, actionable report. It is deliberately LLM-agnostic, supporting OpenAI, Anthropic, Ollama and any other provider that conforms to CrewAI's abstraction layer.
This article walks through the motivations behind the project, its core capabilities, the architectural choices that make it both flexible and performant, and practical guidance on getting it up and running in your own environment. By the end you'll understand not only *what* PR Reviewer does, but *why* its design decisions matter for teams that value control, security, and reproducibility.
This article walks through the motivations behind PR Reviewer, its architectural choices, feature set, deployment pathways, and practical considerations for teams that want to augment their PR workflow with AI without surrendering control to a third-party SaaS.
---
## The case for AI-augmented PR reviews
## Why Automated PR Reviews Matter
### Scaling expertise
### The Human Bottleneck
Traditional code reviews rely on senior engineers to spot anti-patterns, security flaws, and deployment misconfigurations. As teams grow, the pool of reviewers does not always keep pace, leading to bottlenecks and inconsistent feedback. An AI reviewer can apply a consistent set of rules across every PR, ensuring that even junior contributors receive high-quality guidance.
Even the most disciplined teams eventually hit a capacity ceiling. Experienced reviewers are a scarce resource, and junior developers often lack the depth to provide comprehensive feedback. When review latency spikes, so does the risk of merging regressions, security oversights, or non-conformant infrastructure changes.
### Reducing cognitive load
### Consistency Across Repositories
Human reviewers must juggle multiple concerns (style, correctness, performance, compliance) while also understanding the broader context of a change. By offloading routine checks to an automated system, reviewers can focus on architectural decisions and nuanced trade-offs that truly require human judgement.
Large organisations typically maintain a suite of style guides, security policies, and infrastructure standards. Enforcing these manually is error-prone; a single missed rule can cascade into production incidents. Automated reviewers can codify these expectations and apply them uniformly, ensuring that every PR is measured against the same baseline.
### Faster feedback loops
### The Cost of Cloud-Based AI
Continuous integration pipelines already provide rapid build and test feedback. Adding an AI review step that runs in parallel with existing checks shortens the time between code submission and actionable feedback, encouraging a “shift-left” mentality where problems are caught earlier.
Commercial AI review services usually require a SaaS subscription, sending source code to external endpoints. For organisations handling sensitive data, proprietary algorithms, or regulated workloads, that model is untenable. A locally deployable solution eliminates data egress concerns while still leveraging the latest LLM capabilities.
### Vendor-neutral flexibility
---
Many commercial AI review tools lock users into proprietary APIs and cloud-only deployments. PR Reviewer's design deliberately avoids vendor lock-in. By abstracting the LLM layer, teams can run the service on-premise, on a private cloud, or even on a modest workstation using a local model served by Ollama.
## The PR Reviewer Vision
## Core concepts
PR Reviewer is deliberately positioned as a **private, community-driven project**. It is not a commercial product; rather, it is a toolbox that developers can run on their own hardware, customise with their own guidelines, and extend with additional analysis tools as needed.
### CrewAI flows
Key aspirations include:
CrewAI provides a lightweight framework for orchestrating multiple “crews” (agents) that each perform a specialised task. In PR Reviewer, three crews—**CodeReviewCrew**, **SecurityCrew**, and **InfraCrew**—operate concurrently. Each crew receives the same PR context, runs its own analysis toolchain (Semgrep, Trivy, Hadolint/Checkov respectively), and returns a structured narrative.
1. **Provider Agnosticism**: The LLM factory abstracts over OpenAI, Anthropic, Ollama, and any future provider that conforms to the standard API.
2. **Context-Aware Reviews**: By ingesting repository-specific style guides and security policies, the system tailors its feedback to the conventions that matter to you.
3. **Multi-Agent Orchestration**: Separate agents specialise in code quality, security scanning, and infrastructure linting, each feeding results into a synthesiser that produces a human-readable summary.
4. **Extensible Architecture**: New agents or static analysis tools can be added without touching the core orchestration logic, thanks to the MCP integration layer.
### Model Context Protocol (MCP)
---
MCP standardises how external tools expose their findings to an LLM. Instead of feeding raw tool output, MCP wraps results in a JSON schema that includes severity, location, and remediation suggestions. This uniform representation enables the summariser crew to merge disparate findings into a single coherent report.
## Core Features
### Summariser crew
### Code Review
The final crew consumes the three domain-specific outputs and asks the LLM to produce a human-readable summary. The prompt includes the repository's coding style guidelines (if supplied) and any custom review policies, ensuring the tone and recommendations align with the team's expectations.
The code-review agent runs **Semgrep** through MCP, applying a curated rule set that checks for common anti-patterns, language-specific best practices, and maintainability concerns. Because the rule set lives in a version-controlled directory (`contexts/defaults/code_review.md`), teams can evolve it alongside their codebase.
## Feature overview
### Security Review
| Feature | Description |
|---|---|
| **Code review** | Style, maintainability and best-practice checks powered by Semgrep. |
| **Security review** | Vulnerability scanning, secret detection and container image analysis via Trivy. |
| **Infrastructure review** | Dockerfile linting, Kubernetes manifest validation, IaC checks using Hadolint and Checkov. |
| **Summarisation** | Consolidated, actionable report generated by an LLM. |
| **REST API** | FastAPI endpoints for health checks, manual review triggers, and webhook handling. |
| **Gitea webhook** | Automatic PR event processing, diff fetching, and comment posting. |
| **Dockerised** | Multi-stage build with all dependencies baked in. |
| **Kubernetes ready** | Helm-compatible manifests and CI pipeline for automated deployment. |
| **LLM-agnostic** | Works with OpenAI, Anthropic, Ollama or any CrewAI-compatible provider. |
| **Configurable guidelines** | Override default review policies with repository-specific markdown files. |
Security is handled by the **Trivy** agent, which scans the PR's dependency tree, container images, and configuration files for known vulnerabilities. The agent also respects custom security policies defined in `contexts/defaults/security_review.md`, allowing organisations to enforce, for example, “no use of insecure TLS versions”.
## Architecture deep dive
### Infrastructure Review
At a high level, PR Reviewer follows a request-response pattern orchestrated by FastAPI. When a review request arrives, either via the `/api/v1/review` endpoint or a Gitea webhook, the service extracts the PR metadata, fetches the changed files, and constructs an MCP-compatible payload. This payload is then dispatched to the three review crews in parallel.
Infrastructure-as-code files (Dockerfiles, Kubernetes manifests, Terraform) are examined by **Hadolint** and **Checkov** wrappers. The resulting feedback highlights misconfigurations, deprecated APIs, and opportunities for resource optimisation.
```
POST /api/v1/review → FastAPI handler
├─► Fetch diffs from Gitea (or use supplied file list)
├─► Build MCP payload
├─► Parallel execution:
│ ├─ CodeReviewCrew (Semgrep)
│ ├─ SecurityCrew (Trivy)
│ └─ InfraCrew (Hadolint + Checkov)
└─► Summariser crew → LLM → JSON response
└─► Return consolidated report
```
### Contextual Review
### Parallelism and timeouts
Beyond static analysis, PR Reviewer accepts a **context payload** that can embed project-specific guidelines. This means the AI can reference your own coding style guide when suggesting changes, rather than relying on generic conventions.
Each crew runs in its own asynchronous task with a configurable timeout (`PER_CREW_TIMEOUT`). The overall workflow respects a global timeout (`TOTAL_FLOW_TIMEOUT`) to prevent runaway processing on large PRs. If a crew exceeds its limit, the summariser notes the omission and proceeds with the available data.
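The timeout behaviour described above can be sketched with plain `asyncio`. This is an illustrative sketch only, not the project's actual flow code: `run_crew`, `guarded`, and the shortened timeout values are assumptions chosen so the example finishes quickly.

```python
import asyncio

# Shortened for the demo; the article's defaults are 300 s and 600 s.
PER_CREW_TIMEOUT = 0.2
TOTAL_FLOW_TIMEOUT = 1.0


async def run_crew(name: str, delay: float) -> dict:
    """Hypothetical stand-in for a real crew: sleeps, then returns a result."""
    await asyncio.sleep(delay)
    return {"crew": name, "findings": []}


async def guarded(name: str, delay: float) -> dict:
    """Run one crew and turn a per-crew timeout into a marker result."""
    try:
        return await asyncio.wait_for(run_crew(name, delay), PER_CREW_TIMEOUT)
    except asyncio.TimeoutError:
        return {"crew": name, "timed_out": True}


async def run_review() -> list:
    # The infra crew is deliberately slower than PER_CREW_TIMEOUT.
    crews = {"code": 0.01, "security": 0.01, "infra": 5.0}
    tasks = [guarded(name, delay) for name, delay in crews.items()]
    # The outer wait_for enforces the global budget across all crews.
    return await asyncio.wait_for(asyncio.gather(*tasks), TOTAL_FLOW_TIMEOUT)


results = asyncio.run(run_review())
skipped = [r["crew"] for r in results if r.get("timed_out")]
```

A summariser built on this shape could list the crews in `skipped` as omitted and continue with the remaining results.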
### Automated Orchestration
### Data flow and persistence
CrewAI flows coordinate the three agents, handling parallel execution, error aggregation, and result synthesis. The final output is a concise markdown report that can be posted back to the PR as a comment, emailed to the author, or stored in an audit log.
PR Reviewer is deliberately stateless. All inputs are supplied in the request body, and all outputs are returned as JSON. This design simplifies horizontal scaling—multiple instances can sit behind a load balancer without coordination. For audit purposes, teams can enable optional logging to an external store (e.g., Elasticsearch) via environment variables.
### REST API
## Integration with LLM providers
A lightweight **FastAPI** service exposes two endpoints: a health check and a review trigger. The API accepts a JSON payload describing the PR, the changed files, and any custom context. Responses include a unique `review_id`, processing time, and the full set of agent results.
CrewAI abstracts the LLM behind a simple interface: `generate(prompt, model, temperature)`. The service reads three environment variables to configure the provider:
### Containerised Deployment
* `LLM_PROVIDER`: `openai`, `anthropic`, or `ollama`.
* `LLM_MODEL`: model identifier (e.g., `gpt-4`, `claude-3-sonnet`, `gemma4:31b-cloud`).
* `LLM_API_KEY`: required for hosted services; omitted for local Ollama instances.
The entire stack is packaged as a Docker image, enabling one-line deployment on any host that runs Docker or Kubernetes. For teams that prefer a bare-metal Python environment, a virtual-environment-based installation is also supported.
Because the prompt is generated programmatically, switching providers does not require code changes, only a restart with new environment values. This flexibility is crucial for teams that wish to experiment with emerging open-source models without rewriting integration logic.
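As a rough illustration of how the three variables might be read and validated at startup, consider the sketch below. The `LLMSettings` class and `load_llm_settings` function are hypothetical names, not part of PR Reviewer or CrewAI; only the variable names come from the article.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

HOSTED_PROVIDERS = {"openai", "anthropic"}
SUPPORTED_PROVIDERS = HOSTED_PROVIDERS | {"ollama"}


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the provider configuration."""
    provider: str
    model: str
    api_key: Optional[str] = None


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read LLM_PROVIDER, LLM_MODEL and LLM_API_KEY, failing fast on bad input."""
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    model = env.get("LLM_MODEL", "")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    api_key = env.get("LLM_API_KEY") or None
    # Hosted providers need a key; a local Ollama instance does not.
    if provider in HOSTED_PROVIDERS and api_key is None:
        raise ValueError(f"LLM_API_KEY is required for {provider}")
    return LLMSettings(provider, model, api_key)


local = load_llm_settings({"LLM_PROVIDER": "ollama", "LLM_MODEL": "gemma4:31b-cloud"})
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.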
---
## Review flows in detail
## Architectural Overview
### Code review crew
At a high level PR Reviewer follows a **modular, flow-based architecture** that separates concerns into distinct layers.
The code crew invokes Semgrep with a curated rule set that reflects common Python, JavaScript and Go best practices. Findings are normalised into MCP entries containing:
1. **API Layer**: FastAPI receives HTTP requests, validates payloads with Pydantic models, and forwards them to the orchestration engine.
2. **Orchestration Layer**: CrewAI flows instantiate specialised agents (code, security, infra) and manage their lifecycle. Agents run concurrently, each returning a structured result.
3. **LLM Factory**: A provider-agnostic factory creates LLM clients based on environment variables (`LLM_PROVIDER`, `LLM_API_KEY`, etc.). This abstraction permits swapping providers without code changes.
4. **Context Resolver**: Before agents run, the resolver merges repository-wide guidelines with any per-request overrides, producing a unified context object that agents can reference.
5. **MCP Integration Layer**: Each static analysis tool is wrapped in an MCP server that exposes a simple JSON-RPC interface. The agents invoke these servers, passing file contents and receiving findings.
6. **Result Synthesiser**: The final agent consumes the raw findings, prompts the LLM to summarise them, and formats the output as markdown.
* **Severity**: `critical`, `high`, `medium`, `low`.
* **Location**: file path and line range.
* **Message**: concise description of the issue.
* **Remediation**: suggested code change or reference to documentation.
All components communicate via **typed Python data classes**, ensuring that contracts remain explicit and testable. The use of Pydantic for state management also provides automatic validation and serialization, reducing boilerplate.
If a repository supplies a custom `code_review.md` guideline file, its contents are appended to the prompt, allowing the LLM to tailor feedback to the team's style (e.g., preferring f-strings over `%` formatting).
---
### Security review crew
## Installation Guide
Security analysis runs Trivy in two modes: vulnerability scanning of any container images referenced in the PR, and filesystem scanning for secrets, misconfigurations, and known vulnerable dependencies. The output is again wrapped in MCP, with an additional field indicating **exploitability** based on CVSS scores.
### Prerequisites
### Infrastructure review crew
- **Python 3.10 to 3.13**: The project leverages modern language features such as structural pattern matching.
- **UV package manager**: A fast, deterministic installer that replaces `pip` for reproducible builds.
- **Git**: Required for cloning the repository and for any internal operations that need repository metadata.
- **Docker** (optional): For containerised deployment; not required if you prefer a virtual-environment install.
Infrastructure checks focus on Dockerfiles, Kubernetes manifests, and generic IaC (Terraform, CloudFormation). Hadolint validates Dockerfile best practices, while Checkov evaluates cloud resource definitions against industry-standard policies (e.g., CIS benchmarks). The crew also respects any `infra_review.md` file that may contain organisation-specific constraints such as mandatory resource limits.
### Local Development
### Summariser crew
1. **Clone the repository**
The summariser receives three JSON arrays and constructs a single prompt that asks the LLM to:
1. Produce an executive summary of the overall health of the PR.
2. List the top 5 findings across all domains, ordered by severity.
3. Provide actionable recommendations, grouped by domain.
4. Highlight any deviations from the repository's own guidelines.
The result is a markdown document that can be posted directly as a PR comment, ensuring developers receive a readable, context-aware report without additional formatting steps.
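The "top 5 findings across all domains, ordered by severity" step could also be computed deterministically before prompting the LLM. The sketch below assumes findings are plain dictionaries with a `severity` key using the four levels named in this article; `top_findings` and the sample data are illustrative, not the project's code.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def top_findings(*domains: list, limit: int = 5) -> list:
    """Merge per-domain findings and keep the most severe ones."""
    merged = [finding for findings in domains for finding in findings]
    # sorted() is stable, so findings of equal severity keep their input order.
    return sorted(merged, key=lambda f: SEVERITY_RANK[f["severity"]])[:limit]


# Made-up sample findings for the three review domains.
code = [{"domain": "code", "severity": "low", "message": "unused import"}]
security = [
    {"domain": "security", "severity": "critical", "message": "hard-coded secret"},
    {"domain": "security", "severity": "medium", "message": "outdated dependency"},
]
infra = [{"domain": "infra", "severity": "high", "message": "container runs as root"}]

ranked = top_findings(code, security, infra)
```

Pre-ranking in code keeps the ordering reproducible and leaves the LLM to do the wording.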
## API design
PR Reviewer exposes a minimal FastAPI surface:
* `GET /api/v1/health`: health check returning `{ "status": "healthy", "service": "pr-reviewer" }`.
* `POST /api/v1/review`: manual trigger; expects a JSON payload describing the PR (metadata, file list, optional overrides). Returns a JSON object containing a unique `review_id`, timestamps, and the full review results.
* `POST /api/v1/gitea-webhook`: endpoint for Gitea pull-request events. Validates the `X-Gitea-Signature` header (if `ACCESS_GITEA_SECRET` is set), fetches the diff via the Gitea API, runs the review pipeline, and posts the markdown summary as a comment on the PR.
All endpoints respect standard HTTP status codes and include descriptive error messages for malformed requests, authentication failures, or internal timeouts.
## Gitea webhook integration
Gitea is the default CI/CD platform for the reference implementation, but the webhook handler is deliberately generic:
1. **Signature verification**: HMAC-SHA256 using the secret configured in `ACCESS_GITEA_SECRET`. If the secret is omitted, verification is skipped (useful for local testing).
2. **Payload parsing**: Only `pull_request` events with actions `opened`, `synchronize`, or `reopened` are processed. Other events are ignored to reduce noise.
3. **Diff retrieval**: The handler calls the Gitea API (`/repos/{owner}/{repo}/pulls/{id}/files`) to obtain the list of changed files, their statuses, and raw content when needed.
4. **Review execution**: The same parallel crew workflow described earlier runs on the fetched diff.
5. **Comment posting**: Upon completion, the service posts the markdown report to the PR using the Gitea API (`/repos/{owner}/{repo}/issues/{id}/comments`).
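The first two steps above can be sketched with the standard library. This is a minimal sketch under assumptions: the signature is taken to be the hex-encoded HMAC-SHA256 digest of the raw request body, and the helper names `signature_is_valid` and `should_process` are hypothetical. The action names are the ones listed in this article.

```python
import hashlib
import hmac
from typing import Optional

HANDLED_ACTIONS = {"opened", "synchronize", "reopened"}


def signature_is_valid(body: bytes, signature: str, secret: Optional[str]) -> bool:
    """Check an HMAC-SHA256 signature over the raw request body."""
    if not secret:
        return True  # no secret configured: verification is skipped
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), signature.encode())


def should_process(event_type: str, payload: dict) -> bool:
    """Accept only pull_request events with one of the handled actions."""
    return event_type == "pull_request" and payload.get("action") in HANDLED_ACTIONS


# Demo with a made-up secret and body.
demo_body = b'{"action": "opened"}'
demo_secret = "s3cret"
demo_signature = hmac.new(demo_secret.encode(), demo_body, hashlib.sha256).hexdigest()
accepted = signature_is_valid(demo_body, demo_signature, demo_secret)
```

Note that the signature must be computed over the exact bytes received, before any JSON parsing or re-serialisation.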
### Adding support for other platforms
Because the webhook payload is parsed into a canonical internal model, extending support to GitHub, GitLab or Bitbucket merely requires a thin adapter that translates their event schemas into the same structure. The core review logic remains untouched, making cross-platform adoption straightforward.
## Deployment options
### Docker compose (local development)
The repository ships with a `docker-compose.yaml` that defines two services:
* `pr-reviewer`: the FastAPI application.
* `ollama` (optional): a local LLM server for offline use.
Running `docker compose up` builds the multi-stage image, injects environment variables from `.env`, and exposes the API on `http://localhost:8000`.
### Kubernetes (production)
For production workloads, a Helm chart (or plain manifests in `kube/`) provides:
* A Deployment with configurable replica count.
* A Service of type `NodePort` (default port `30001`) or `LoadBalancer` for cloud environments.
* A Secret (`pr-reviewer-env`) that stores all `.env` values, including Gitea tokens and LLM credentials.
* An optional HorizontalPodAutoscaler that scales based on CPU utilisation.
The CI pipeline (`.gitea/workflows/build_push.yml`) automatically builds a multi-arch Docker image, pushes it to the configured registry, and applies the Kubernetes manifests.
### Resource considerations
* **CPU**: LLM inference dominates CPU usage. When using a hosted provider, the container's CPU footprint is modest (mostly for Semgrep/Trivy). With a local model, allocate at least 4 vCPUs and 8 GB RAM.
* **Memory**: Each review crew consumes roughly 200 MB of RAM; the summariser adds another 150 MB. The total stays under 1 GB for typical PR sizes.
* **Storage**: The image size is ~1.2 GB (including all scanning tools). Persistent storage is not required unless audit logging is enabled.
## Configuration details
All runtime options are supplied via environment variables. The most important groups are:
| Variable | Required? | Description |
|---|---|---|
| `LLM_PROVIDER` | Yes | `openai`, `anthropic`, or `ollama`. |
| `LLM_MODEL` | Yes | Model identifier (e.g., `gpt-4`). |
| `LLM_API_KEY` | Conditional | API key for hosted providers. |
| `ACCESS_GITEA_URL` | Yes | Base URL of the Gitea instance. |
| `ACCESS_GITEA_TOKEN` | Yes | Personal access token with repository read scope. |
| `ACCESS_GITEA_SECRET` | No | Webhook secret for HMAC verification. |
| `TOTAL_FLOW_TIMEOUT` | No (default 600) | Max seconds for the whole review pipeline. |
| `PER_CREW_TIMEOUT` | No (default 300) | Max seconds per individual crew. |
| `LOG_LEVEL` | No (default `INFO`) | Python logging verbosity. |
Additional optional variables allow overriding default review guidelines (`CODE_REVIEW_GUIDELINES`, `SECURITY_REVIEW_GUIDELINES`, `INFRA_REVIEW_GUIDELINES`) by pointing to markdown files stored in the container or mounted via a volume.
## Operational considerations
### Monitoring
The service's metrics can be exposed via `/metrics` (Prometheus format). Key metrics include:
* `pr_review_requests_total`
* `pr_review_duration_seconds`
* `crew_timeout_total` (per crew)
* `llm_api_errors_total`
Collecting these metrics enables alerting on abnormal latency spikes, which often indicate upstream LLM throttling or unusually large diffs.
### Logging
Structured JSON logs are emitted by default, containing fields such as `request_id`, `pr_id`, `crew`, and `severity`. When integrated with a log aggregation platform (e.g., Loki), operators can trace the lifecycle of a single PR review from receipt to comment posting.
### Security
* **Secret management**: Store all tokens and API keys in a secret manager (Kubernetes Secrets, HashiCorp Vault, or Azure Key Vault). Never commit `.env` files to source control.
* **Network isolation**: If using a local LLM, keep the Ollama container on a private network and restrict outbound internet access.
* **Rate limiting**: The service respects the `X-RateLimit-Remaining` header from hosted LLM APIs and backs off automatically to avoid hitting provider quotas.
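One plausible shape for the back-off logic in the rate-limiting point is a capped exponential delay that only kicks in once the quota is exhausted. The function below is a sketch under that assumption; `backoff_delay` and its parameters are hypothetical and do not describe the service's actual retry policy.

```python
from typing import Optional


def backoff_delay(attempt: int, remaining: Optional[int],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Return the number of seconds to wait before the next LLM call.

    `remaining` is the parsed X-RateLimit-Remaining value, or None when the
    provider did not send the header.
    """
    if remaining is None or remaining > 0:
        return 0.0  # quota left (or unknown): no need to wait
    # Quota exhausted: exponential back-off, capped to avoid unbounded sleeps.
    return min(cap, base * (2 ** attempt))


delays = [backoff_delay(attempt, remaining=0) for attempt in range(8)]
```

In practice a small random jitter is often added to such delays so that several instances do not retry in lockstep.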
## Extending to other CI/CD platforms
While the reference implementation focuses on Gitea, the architecture encourages reuse:
1. **Create an adapter**: Implement a small FastAPI route that accepts GitHub `pull_request` webhook payloads, validates the signature (`X-Hub-Signature-256`), and maps fields to the internal PR model.
2. **Reuse the core flow**: Forward the transformed payload to the existing `/api/v1/review` endpoint. No changes to the review crews are required.
3. **Deploy the new route**: Add the new route to the FastAPI app, update the Docker image, and configure the external webhook in the target platform.
Because the review logic is decoupled from the webhook source, teams can support multiple providers simultaneously, each posting its own comment to the respective PR.
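The field-mapping part of such an adapter might look like the following sketch. `PullRequestEvent` is a hypothetical stand-in for the internal PR model, whose real shape the article does not show; the GitHub payload keys used here (`number`, `pull_request.title`, `pull_request.head.ref`, and so on) are the ones GitHub sends for `pull_request` events.

```python
from dataclasses import dataclass


@dataclass
class PullRequestEvent:
    """Hypothetical internal model; the project's real fields may differ."""
    pr_id: int
    title: str
    description: str
    source_branch: str
    target_branch: str
    repo_name: str
    repo_url: str


def from_github(payload: dict) -> PullRequestEvent:
    """Translate a GitHub pull_request webhook payload into the internal model."""
    pr = payload["pull_request"]
    repo = payload["repository"]
    return PullRequestEvent(
        pr_id=payload["number"],
        title=pr["title"],
        description=pr.get("body") or "",  # GitHub sends null for an empty body
        source_branch=pr["head"]["ref"],
        target_branch=pr["base"]["ref"],
        repo_name=repo["full_name"],
        repo_url=repo["html_url"],
    )


# Trimmed, made-up payload for demonstration.
sample = {
    "action": "opened",
    "number": 7,
    "pull_request": {
        "title": "Add cache",
        "body": None,
        "head": {"ref": "feature/cache"},
        "base": {"ref": "main"},
    },
    "repository": {"full_name": "acme/widgets", "html_url": "https://github.com/acme/widgets"},
}
event = from_github(sample)
```

Keeping the adapter a pure function of the payload makes it straightforward to unit-test each platform separately.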
## Development workflow
Contributors who wish to enhance PR Reviewer can follow these steps:
```bash
# Clone the repository
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer
# Install development dependencies
uv pip install -e ".[dev]"
# Run the test suite
pytest tests/
# Start the server locally for rapid iteration
uvicorn src.pr_reviewer.main:app --reload
```
2. **Install UV**
The project uses **uv** for isolated virtual environments, **pytest** for unit and integration tests, and **ruff** for linting. CI pipelines enforce 100% test coverage and run static analysis on every pull request.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
```
### Adding a new review tool
3. **Create and activate a virtual environment**
To incorporate an additional analysis tool (e.g., a custom static analyser), developers should:
```bash
uv venv .venv
source .venv/bin/activate
```
1. Write a thin wrapper that converts the tool's output into the MCP schema.
2. Register a new crew in `crews/` that invokes the wrapper.
3. Update the orchestration flow (`flow.py`) to include the new crew in the parallel execution block.
4. Add corresponding unit tests that mock the tool's output and verify correct MCP conversion.
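As an illustration of the first step, the sketch below converts the JSON output of an imaginary analyser into entries carrying the four fields described in this article (severity, location, message, remediation). The input format, the `LEVEL_TO_SEVERITY` mapping, and the `to_findings` name are all assumptions made for the example.

```python
import json

# Assumed mapping from the imaginary tool's levels to the article's severities.
LEVEL_TO_SEVERITY = {"error": "high", "warning": "medium", "info": "low"}


def to_findings(raw_json: str) -> list:
    """Normalise a hypothetical tool's JSON report into finding dictionaries."""
    findings = []
    for item in json.loads(raw_json):
        findings.append({
            "severity": LEVEL_TO_SEVERITY.get(item["level"], "low"),
            "location": {
                "path": item["file"],
                "start_line": item["line"],
                "end_line": item["line"],
            },
            "message": item["message"],
            "remediation": item.get("hint", ""),
        })
    return findings


# Made-up report from the imaginary analyser.
raw_report = json.dumps([
    {"file": "app/main.py", "line": 12, "level": "error",
     "message": "SQL built by string concatenation", "hint": "Use parameterised queries"},
    {"file": "app/util.py", "line": 3, "level": "info", "message": "Unused variable"},
])
findings = to_findings(raw_report)
```

A unit test for step 4 can feed a fixed report string like `raw_report` through the wrapper and assert on the resulting entries.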
4. **Install the project in editable mode**
## Testing and quality assurance
```bash
uv pip install -e .
```
PR Reviewer's reliability hinges on three testing layers:
5. **Configure environment variables**
* **Unit tests**: Validate each crew's MCP conversion logic, LLM prompt generation, and webhook parsing.
* **Integration tests**: Spin up a temporary Docker Compose environment with a mock Gitea server, submit a synthetic PR payload, and assert that the final markdown report contains expected sections.
* **End-to-end tests**: Deploy the Helm chart to a disposable Kubernetes namespace, trigger a real Gitea webhook, and verify that the comment appears on the PR with correct formatting.
Copy `.env.example` to `.env` and fill in values for your chosen LLM provider, API keys, and any MCP server endpoints.
All tests run in CI on every push, and failures block merges.
### Docker Deployment
## Community and contributions
If you prefer an isolated container, the Dockerfile builds a minimal image based on `python:3.13-slim`.
The project is deliberately open-source, hosted on a self-managed Gitea instance. Contributors are encouraged to:
```bash
docker build -t pr-reviewer .
docker run -p 8000:8000 --env-file .env pr-reviewer
```
* **Open issues**: Report bugs, request new review domains, or suggest LLM prompt improvements.
* **Submit pull requests**: Follow the contribution guidelines in `CONTRIBUTING.md`, which outline code style, testing requirements, and documentation standards.
* **Share custom guidelines**: Teams can publish repository-specific markdown files (e.g., `code_review.md`) that the summariser will automatically honour.
The service will be reachable at `http://localhost:8000`.
Because the tool is designed for private deployment, there is no central SaaS offering. Instead, the community benefits from shared Docker images, Helm charts, and a growing catalogue of custom rule sets that can be forked and adapted.
---
## Limitations and future directions
## Using the Service
### Current constraints
### Health Check
* **LLM dependence**: The quality of the final summary is directly tied to the underlying model's capabilities. Low-capacity models may produce vague recommendations.
* **Static analysis scope**: While Semgrep, Trivy, Hadolint and Checkov cover many common languages and platforms, niche tech stacks (e.g., Rust, Terraform Cloud) require additional adapters.
* **No built-in CI/CD orchestration**: PR Reviewer focuses on the review step; it does not enforce merge policies or gate deployments. Teams must integrate the API into their existing pipelines.
A simple GET request to `/api/v1/health` returns a JSON payload confirming that the API and all downstream agents are operational.
### Planned enhancements
### Triggering a Review
POST a JSON document to `/api/v1/review`. The payload must contain:
- **PR metadata**: ID, title, description, source/target branches.
- **Repository information**: Name and URL.
- **Changed files**: Path, content, status, and diff statistics.
- **Context**: Optional overrides for code, security, and infra guidelines.
The service responds immediately with a `review_id`. You can poll a status endpoint (not shown here) or wait for the final result, which includes a synthesized markdown summary and the raw findings from each agent.
Because the API is deliberately thin, it can be integrated into any CI/CD platform that supports HTTP calls: GitHub Actions, GitLab CI, Azure Pipelines, you name it.
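To make the payload description more concrete, here is a sketch of how a CI job might assemble a review request using only the standard library. The JSON key names (`pr`, `repository`, `files`, `context` and their children) are guesses based on the four groups listed above; the project's actual schema may name them differently.

```python
import json
import urllib.request

# Hypothetical payload; key names are assumptions, not the documented schema.
payload = {
    "pr": {
        "id": 42,
        "title": "Add retry logic",
        "description": "Retries failed uploads up to three times.",
        "source_branch": "feature/retry",
        "target_branch": "main",
    },
    "repository": {"name": "widgets", "url": "https://git.example.com/acme/widgets"},
    "files": [
        {
            "path": "src/upload.py",
            "content": "def upload():\n    ...\n",
            "status": "modified",
            "additions": 12,
            "deletions": 3,
        }
    ],
    "context": {"code_review": "Prefer f-strings over % formatting."},
}

request = urllib.request.Request(
    "http://localhost:8000/api/v1/review",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Sending is left commented out so the sketch runs without a live server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The same request can be issued from any pipeline step that can make an HTTP call.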
---
## Configuration Details
### Environment Variables
Key variables include:
- `LLM_PROVIDER`: e.g. `openai`, `anthropic`, `ollama`.
- `LLM_API_KEY`: Secret token for the chosen provider.
- `MCP_SEMgrep_ENDPOINT`, `MCP_Trivy_ENDPOINT`, etc.: URLs of the MCP wrappers.
- `REVIEW_TIMEOUT_SECONDS`: Upper bound for the total review duration.
All defaults are documented in `.env.example`.
### Context Files
The `contexts/defaults/` directory ships with three markdown files that encode baseline guidelines:
- **code_review.md**: Language-agnostic style rules, naming conventions, and complexity thresholds.
- **security_review.md**: Threat-model assumptions, prohibited functions, and dependency-version policies.
- **infra_review.md**: Container-image best practices, Kubernetes resource limits, and IaC linting rules.
Projects can replace or extend these files, or supply per-request overrides via the API's `context` field. This flexibility ensures that the AI's suggestions are always aligned with the team's current standards.
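The interplay between default files and per-request overrides can be sketched as a small resolver. This is an assumption-laden illustration: `resolve_context` is a hypothetical name, and the rule that an override fully replaces the default (rather than being appended to it) is a simplification chosen for the example.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

DOMAINS = ("code_review", "security_review", "infra_review")


def resolve_context(defaults_dir: Path,
                    overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Load default guideline files, letting per-request overrides win."""
    overrides = overrides or {}
    context = {}
    for domain in DOMAINS:
        default_file = defaults_dir / f"{domain}.md"
        default_text = default_file.read_text(encoding="utf-8") if default_file.exists() else ""
        context[domain] = overrides.get(domain, default_text)
    return context


# Demo with a temporary directory standing in for contexts/defaults/.
with tempfile.TemporaryDirectory() as tmp:
    defaults = Path(tmp)
    (defaults / "code_review.md").write_text("Prefer small functions.", encoding="utf-8")
    (defaults / "security_review.md").write_text("No insecure TLS versions.", encoding="utf-8")
    context = resolve_context(defaults, {"code_review": "Prefer f-strings."})
```

Here the code guideline comes from the request, the security guideline from the default file, and the missing infra file resolves to an empty string.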
---
## Development Workflow
### Running Tests
The test suite is split into unit and integration tests.
- **Unit tests** validate pure-Python logic such as the context resolver and result synthesiser.
- **Integration tests** spin up temporary MCP servers and verify end-to-end behaviour of each agent.
Execute the full suite with coverage reporting:
```bash
pytest --cov=src.pr_reviewer
```
### Code Style
The project enforces a strict style using **Black** for formatting and **Flake8** for linting. Developers should run these tools locally before committing.
```bash
black src/
flake8 src/
```
### Adding a New Agent
To introduce a new review domain (e.g., license compliance), follow these steps:
1. Implement an MCP wrapper for the underlying tool.
2. Create a Pydantic model for the agent's input and output.
3. Register the agent in the CrewAI flow configuration.
4. Add corresponding context documentation under `contexts/defaults/`.
Because the orchestration layer treats agents as black boxes that accept a context and return a structured result, the integration effort is minimal.
---
## Deployment Strategies
### Kubernetes
For production workloads, the `k8s/` directory provides Helm-compatible manifests:
- **Secret**: Stores LLM API keys and MCP credentials.
- **Deployment**: Runs the FastAPI container with configurable replica count.
- **Service**: Exposes the API via a ClusterIP or LoadBalancer, depending on your environment.
The manifests are deliberately simple, allowing teams to augment them with sidecar containers for logging, monitoring, or additional security scanning.
### CI/CD Integration
A sample Gitea Actions workflow (`.gitea/workflows/deploy.yaml`) demonstrates how to build the Docker image, push it to a registry, and apply the Kubernetes manifests on each merge to `main`. The same pattern can be adapted for GitHub Actions, GitLab CI, Azure Pipelines, or any other automation platform.
---
## Community and Contribution
PR Reviewer is a **community-first** project. The repository is hosted on a self-managed Git server, but pull requests are welcomed from anyone willing to improve the codebase. Typical contribution pathways include:
- **Bug fixes**: Reported via the issue tracker, with accompanying unit tests.
- **Feature enhancements**: New agents, additional MCP wrappers, or UI improvements (e.g., a lightweight web dashboard).
- **Documentation updates**: Clarifying installation steps, adding language-specific guidelines, or improving the README.
All contributions should follow the standard fork-branch-PR workflow, and the CI pipeline will automatically run the test suite and linting checks.
---
## Future Directions
### Expanded Provider Support
While the current LLM factory covers the major commercial providers, the architecture is ready for emerging open-source models (e.g., Llama 3, Mistral) that can be served locally via Ollama or vLLM. Adding a new provider is a matter of implementing a thin adapter that conforms to the `LLMClient` interface.
### Adaptive Learning
One avenue under investigation is **feedback loops** where the AI's suggestions are rated by developers, and those ratings are fed back into a reinforcement-learning pipeline. Over time the system could learn the nuances of a particular team's style, reducing false positives.
### Richer UI Integration
Beyond posting markdown comments, a dedicated web UI could visualise findings, allow developers to acknowledge or dismiss specific issues, and provide one-click remediation scripts.
### Policy-as-Code
Integrating with policy-as-code frameworks such as **OPA** would enable dynamic, rule-driven security reviews that adapt to changing compliance requirements without code changes.
---
1. **Model-agnostic prompt optimisation**: Research into dynamic prompt templates that adapt to the strengths of each LLM provider.
2. **Feedback loop**: Capture developer reactions to the AI suggestions (e.g., thumbs up/down) and use them to fine-tune future prompts.
3. **Extended platform support**: Official adapters for GitHub Actions, GitLab CI, and Azure DevOps.
4. **Cache layer**: Introduce a Redis-backed cache for repeated scans of unchanged files, reducing compute cost on large monorepos.
5. **Policy as code**: Allow organisations to define review policies in a declarative YAML format that the summariser can reference, enabling compliance-first workflows.
## Conclusion
Automated PR reviews have moved from a novelty to a necessity, especially as codebases grow and security expectations tighten. **PR Reviewer** offers a pragmatic, privacy-preserving solution that brings together the best of LLM reasoning, static analysis, and community-driven guidelines. Its modular design means you can start with the out-of-the-box code, security, and infra agents, then extend the platform to cover any domain your team cares about.
PR Reviewer demonstrates that AI-driven code quality, security, and infrastructure analysis can be delivered as a self-hosted, vendor-neutral service without sacrificing flexibility or control. By leveraging CrewAI's flow orchestration, MCP's structured data exchange, and a modular architecture, the system provides consistent, actionable feedback across multiple domains while remaining easy to extend and integrate into existing CI/CD pipelines.
Because the system runs wherever you choose (on a developer laptop, a CI runner, or a Kubernetes cluster), you retain full control over data, costs, and performance. The open-source licence (MIT) encourages collaboration, and the clear contribution path invites you to shape the tool's evolution.
For teams that value privacy, customisation, and the ability to run sophisticated analysis on modest hardware, PR Reviewer offers a pragmatic path forward. The open-source nature invites collaboration, and the clear separation between tooling, LLM inference and summarisation ensures that future improvements, whether in scanning capabilities or language model performance, can be adopted with minimal friction.
If you've ever wished for a diligent reviewer that never sleeps, respects your coding style, and never leaks your proprietary code, give PR Reviewer a spin. The repository is ready at <https://git.aridgwayweb.com/armistace/pr_reviewer>, and the community is eager to see how you'll make it your own.
---
*Happy reviewing, mates!*
Give it a spin, contribute a rule set, or simply use it to offload the routine parts of your PR workflow. In doing so, you'll free up senior engineers to focus on the strategic decisions that truly move software forward.