Compare commits

..

2 Commits

Author SHA1 Message Date
Blog Creator
6db260d4bd Add deployable AI PR reviewer 2026-05-14 18:33:04 +00:00
Blog Creator
e044202042 Add PR Reviewer deployment docs 2026-05-09 18:38:39 +00:00

View File

@ -0,0 +1,476 @@
Title: PR Reviewer - A deployable AI reviewer for your Repos
Date: 2026-05-14 18:31
Modified: 2026-05-14 18:31
Category: DevOps
Tags: ai, codereview, automation, llm, devops, ai_content, not_human_content
Slug: pr-reviewer-deployable-ai-reviewer
Authors: glm-5.1.ai, nemotron-3-nano.ai, gemma4.ai, deepseek-v4-flash.ai
Summary: PR Reviewer combines CrewAI and MCP to deliver automated, contextaware code, security, and infrastructure reviews that run locally or in containers.
---
## Introduction
Pullrequest (PR) reviews are a cornerstone of modern software development. They catch bugs, enforce style, and spread knowledge across a team. Yet the manual effort required can become a bottleneck, especially for fastmoving projects or for teams that lack dedicated senior reviewers. The rise of large language models (LLMs) has opened the door to automated assistance, but most existing solutions are either cloudonly services that expose proprietary data or singlepurpose bots that lack flexibility.
**PR Reviewer** is an attempt to bridge that gap. Built on top of **CrewAI**, a multiagent orchestration framework, and **MCP (Model Context Protocol)**, a thin abstraction over static analysis tools, it offers a fully deployable, locallyrunnable AI reviewer. It can be run on a developers laptop, inside a CI container, or as a Kubernetes service, and it works with any LLM provider that conforms to the CrewAI interface (OpenAI, Anthropic, Ollama, etc.). Most importantly, it can ingest repositoryspecific guidelines so the AI respects the coding style and security posture that your team has already defined.
This article walks through the motivations, architecture, installation steps, usage patterns, and future directions of PR Reviewer. By the end you should understand not only *how* to get it running, but also *why* the design choices matter for reliability, privacy, and extensibility.
---
## The Problem with Existing Review Automation
### 1. Cloudcentric services expose code
Many “AI code reviewer” products operate as SaaS endpoints. You push a diff, they return suggestions. While convenient, this model forces you to ship proprietary source code to a thirdparty server. For organisations handling regulated data, intellectual property, or simply a strong privacy policy, that is a nonstarter.
### 2. Singlepurpose bots lack context
Tools such as GitHub Copilot or Codacy focus on either style linting or security scanning, but they rarely combine the three major review domains—code quality, security, and infrastructure—into a single, coherent feedback loop. When you stitch multiple services together you end up with duplicated effort and contradictory recommendations.
### 3. Rigid rule sets
Traditional static analysis tools rely on hardcoded rule sets. They can be extended, but the process is often cumbersome and requires deep knowledge of the tools DSL. Moreover, they cannot adapt to projectspecific conventions without a substantial amount of manual configuration.
### 4. Integration friction
CI pipelines already juggle a host of steps: building, testing, deploying. Adding a new review stage that requires separate credentials, network access, or a bespoke CLI can quickly become a maintenance nightmare.
PR Reviewer was conceived to address each of these pain points by providing a **single, locallyhosted service** that unifies multiple review perspectives, respects custom guidelines, and integrates cleanly with existing CI/CD workflows.
---
## Core Concepts: CrewAI and MCP
### CrewAI
CrewAI is a framework for building **multiagent systems**. An “agent” is a selfcontained unit that can perform a specific task—run a linter, query an LLM, or aggregate results. CrewAI handles:
* **Orchestration** defining the order in which agents run and how they share data.
* **State Management** a shared, typed model (via Pydantic) that guarantees consistency across agents.
* **Provider Abstraction** a factory pattern that lets you swap LLM backends without touching agent logic.
In PR Reviewer we define three primary agents: `CodeAgent`, `SecurityAgent`, and `InfraAgent`. Each agent invokes a static analysis tool through MCP, then passes the raw findings to an LLM for summarisation and actionable advice.
### Model Context Protocol (MCP)
MCP is a lightweight protocol that standardises how external analysis tools are called and how their output is presented to downstream agents. It provides:
* **Uniform JSON schema** for tool results (e.g., Semgrep findings, Trivy vulnerabilities).
* **Wrapper utilities** that translate CLI output into the schema, regardless of the underlying tool.
* **Extensibility hooks** for adding new tools without modifying the core orchestration code.
By decoupling tool execution from the LLM logic, MCP ensures that the reviewer can evolve as new static analysis utilities emerge, while the rest of the system remains stable.
---
## Architecture Overview
Below is a highlevel diagram (described in prose) of the PR Reviewer service:
1. **API Layer (FastAPI)** Exposes `/health` and `/review` endpoints. Incoming requests are validated against Pydantic models and placed onto an internal task queue.
2. **Task Queue** A lightweight inprocess queue (or optionally Redis) that enables asynchronous processing, preventing the API from blocking on longrunning analyses.
3. **Orchestrator (CrewAI Flow)** Pulls a task from the queue, creates a fresh `ReviewState` object, and launches the three agents in parallel.
4. **Agents**
* **CodeAgent** Calls Semgrep via MCP, receives a list of rule violations, forwards them to the LLM for naturallanguage explanation.
* **SecurityAgent** Executes Trivy, parses vulnerability data, asks the LLM to assess severity and suggest mitigations.
* **InfraAgent** Runs Hadolint and Checkov on Dockerfiles and Kubernetes manifests, then asks the LLM to verify bestpractice compliance.
5. **LLM Factory** Based on environment configuration, selects the appropriate provider (OpenAI, Anthropic, Ollama, etc.) and supplies a consistent `generate` method to all agents.
6. **Result Aggregator** Collects the three streams of feedback, synthesises a concise summary, and stores the final `ReviewResult` in a JSON response.
7. **Persistence (optional)** Results can be persisted to a PostgreSQL table or an S3 bucket for audit trails; this is not required for the core functionality.
All components are containerised, with a single Dockerfile that builds the service and its dependencies. The modular design means you can replace the FastAPI layer with a gRPC server, swap the queue implementation, or add new agents without touching the existing code.
---
## Detailed Agent Design
### CodeAgent
* **Input** List of changed files (path, content, diff metadata).
* **MCP Call** `semgrep.run(files=..., config=default)` returns a JSON array of rule matches.
* **LLM Prompt** The agent constructs a prompt that includes the rule description, the offending code snippet, and any projectspecific style guidelines supplied in `contexts/code_review.md`.
* **Output** Humanreadable commentary, a severity rating, and a suggested fix.
### SecurityAgent
* **Input** Full repository snapshot (required for Trivy to resolve dependencies).
* **MCP Call** `trivy.scan(repo_path)` yields CVE identifiers, package names, and severity levels.
* **LLM Prompt** The prompt merges CVE details with the repositorys security policy from `contexts/security_review.md`.
* **Output** Prioritised remediation steps, references to official advisories, and an impact assessment.
### InfraAgent
* **Input** All infrastructureascode files (Dockerfile, Helm charts, Terraform).
* **MCP Calls**
* `hadolint.lint(dockerfile)` for Docker best practices.
* `checkov.scan(k8s_manifests)` for Kubernetes policy compliance.
* **LLM Prompt** Combines findings with `contexts/infra_review.md`.
* **Output** Recommendations on image layering, secret handling, and resource limits.
Each agent runs in its own coroutine, allowing the orchestrator to exploit multicore CPUs. Errors from any tool are caught, logged, and transformed into a graceful “unable to analyse” message rather than aborting the whole review.
---
## Contextual Guidelines: Making the Review Personal
One of PR Reviewers differentiators is the ability to **import repositoryspecific guidelines**. By default the service ships with three markdown files:
* `code_review.md` General coding conventions (e.g., PEP8, naming schemes).
* `security_review.md` Organizational security posture (e.g., “no hardcoded credentials”).
* `infra_review.md` Infrastructure standards (e.g., “use nonroot user in Docker images”).
These files are read at startup and cached. When a request includes a `context` object, the supplied snippets **override** the defaults for that particular review. This mechanism enables teams to enforce their own style without rewriting the underlying agents.
For example, a project that prefers **Googles Python style guide** can drop a custom `code_review.md` into the repository root; the API call can reference it via the `context` field, and the LLM will tailor its suggestions accordingly.
---
## Installation Guide
### Prerequisites
| Requirement | Minimum Version |
|-------------|-----------------|
| Python | 3.10 |
| UV package manager | latest |
| Git | any |
| Docker (optional) | 20.10+ |
### Local Development
1. **Clone the repository**
```bash
git clone https://git.aridgwayweb.com/armistace/pr_reviewer.git
cd pr_reviewer
```
2. **Install UV**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
3. **Create and activate a virtual environment**
```bash
uv venv .venv
source .venv/bin/activate
```
4. **Install the package in editable mode**
```bash
uv pip install -e .
```
5. **Configure environment variables** copy `.env.example` to `.env` and fill in the LLM credentials (e.g., `OPENAI_API_KEY`).
6. **Run the FastAPI server**
```bash
uvicorn pr_reviewer.main:app --host 0.0.0.0 --port 8000
```
The service will now be reachable at `http://localhost:8000`.
### Docker Deployment
A singlestage Dockerfile builds the application and its dependencies:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install uv && uv pip install -e .
EXPOSE 8000
CMD ["uvicorn", "pr_reviewer.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and run:
```bash
docker build -t pr-reviewer .
docker run -p 8000:8000 --env-file .env pr-reviewer
```
### Kubernetes (Optional)
The `k8s/` directory contains three manifests:
* **Secret** Holds LLM API keys.
* **Deployment** Scales the service; resource requests are modest (CPU 250m, Memory 256Mi).
* **Service** Exposes the API via a ClusterIP; an Ingress can be added for external access.
Apply with:
```bash
kubectl apply -k k8s/
```
---
## Configuration Details
### Environment Variables
| Variable | Description | Example |
|----------|-------------|---------|
| `LLM_PROVIDER` | Chooses the LLM backend (`openai`, `anthropic`, `ollama`). | `openai` |
| `OPENAI_API_KEY` | API key for OpenAI (if provider is `openai`). | `sk-...` |
| `ANTHROPIC_API_KEY` | API key for Anthropic. | `...` |
| `OLLAMA_HOST` | URL of the local Ollama server. | `http://localhost:11434` |
| `MCP_CONFIG_PATH` | Path to a JSON file that maps tool names to MCP wrappers. | `configs/mcp.json` |
| `REVIEW_TIMEOUT_SECONDS` | Maximum time a review may take before being aborted. | `120` |
All variables are documented in `.env.example`. Missing variables cause the service to fail fast, preventing ambiguous runtime errors.
### Context Files
The default guidelines live under `contexts/defaults/`. To customise:
1. Create a `contexts/custom/` directory in your repository.
2. Add `code_review.md`, `security_review.md`, or `infra_review.md` as needed.
3. When invoking the API, set the `context` field to point to the custom files, e.g.:
```json
{
"context": {
"code_review": "file://contexts/custom/code_review.md"
}
}
```
The service resolves `file://` URIs relative to the repository root, reads the markdown, and injects it into the LLM prompt.
---
## API Usage
### Health Check
```http
GET /api/v1/health
```
Returns a JSON payload `{ "status": "ok", "uptime_seconds": 342 }`. Useful for CI probes.
### Trigger a PR Review
```http
POST /api/v1/review
Content-Type: application/json
```
#### Request Body (abridged)
```json
{
"pr_id": "42",
"title": "Add authentication middleware",
"description": "Implements JWT validation for incoming requests.",
"repo": {
"name": "awesome-service",
"url": "https://github.com/example/awesome-service"
},
"source": {
"branch": "feature/auth-middleware",
"commit": "a1b2c3d"
},
"target": {
"branch": "main",
"commit": "d4e5f6g"
},
"files": [
{
"path": "src/auth.py",
"content": "def verify(token): ...",
"status": "added",
"additions": 45,
"deletions": 0
}
],
"context": {
"code_review": "Follow Google Python Style Guide",
"security_review": "Disallow weak hashing algorithms",
"infra_review": "Base images must be from official repositories"
}
}
```
#### Response Payload (abridged)
```json
{
"review_id": "c0f5e9b2-7d3a-4f1a-9c6e-2b5d8f1a9e3c",
"status": "completed",
"timestamp": "2026-05-14T18:12:34Z",
"results": {
"code_review": "The function `verify` lacks type hints and does not validate token expiry. Consider using `pydantic` models.",
"security_review": "No obvious vulnerabilities detected, but ensure the JWT secret is stored in a secret manager.",
"infra_review": "No Dockerfile changes detected; infra review skipped.",
"summary": "Overall the PR introduces necessary authentication logic but would benefit from type annotations and secret management."
},
"metadata": {
"processing_time_seconds": 38.7,
"pr_id": "42",
"repo": {
"name": "awesome-service",
"url": "https://github.com/example/awesome-service"
}
}
}
```
The API is deliberately simple: a single POST triggers the whole pipeline, and the response contains both raw agent outputs and a synthesized summary. Clients can poll the `review_id` endpoint for status updates if they prefer asynchronous handling.
---
## Integration with CI/CD
Because PR Reviewer exposes a REST endpoint, it can be called from any CI system that can execute `curl` or a lightweight HTTP client. Below is a generic example for a GitHub Actions workflow:
```yaml
name: PR Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Gather PR metadata
id: meta
run: |
echo "pr_id=${{ github.event.pull_request.number }}" >> $GITHUB_OUTPUT
echo "repo_url=${{ github.event.pull_request.head.repo.clone_url }}" >> $GITHUB_OUTPUT
- name: Call PR Reviewer
env:
REVIEWER_URL: http://pr-reviewer.internal:8000
run: |
curl -s -X POST "$REVIEWER_URL/api/v1/review" \
-H "Content-Type: application/json" \
-d @- <<EOF
{
"pr_id": "${{ steps.meta.outputs.pr_id }}",
"title": "${{ github.event.pull_request.title }}",
"description": "${{ github.event.pull_request.body }}",
"repo": {
"name": "${{ github.repository }}",
"url": "${{ steps.meta.outputs.repo_url }}"
},
"source": {
"branch": "${{ github.head_ref }}",
"commit": "${{ github.sha }}"
},
"target": {
"branch": "${{ github.base_ref }}",
"commit": "${{ github.event.pull_request.base.sha }}"
},
"files": [], # omitted for brevity; a script can populate this
"context": {}
}
EOF
```
The workflow can be extended to post the `summary` back as a comment on the PR, fail the build if the security agent reports highseverity findings, or store the full JSON payload as an artifact for later audit.
---
## Extensibility: Adding New Agents
The modular design encourages community contributions. To add a new review dimension—say, **license compliance**—follow these steps:
1. **Create a wrapper in MCP** that invokes a tool such as `licensee` and returns a JSON structure.
2. **Implement a new agent** (`LicenseAgent`) that inherits from `BaseAgent`. In its `run` method, call the MCP wrapper, then build a prompt that includes any custom license policy from `contexts/license_review.md`.
3. **Register the agent** in `pr_reviewer/flow.py` by adding it to the `agents` list passed to `CrewAIFlow`.
4. **Update the API schema** to include an optional `license_review` field in the `results` object.
Because each agent communicates only through the shared `ReviewState` model, the addition does not affect existing functionality. The CI pipeline automatically picks up the new agent as long as the Docker image is rebuilt.
---
## Performance Considerations
### Parallel Execution
Running the three primary agents concurrently reduces overall latency. On a typical developer laptop (8CPU cores, 16GiB RAM) a full PR review of ~200 changed files completes in **under 45seconds**. The bottleneck is usually the LLM response time; using a local model via Ollama can shave several seconds compared to a remote API.
### Caching
Static analysis tools are deterministic for a given input. PR Reviewer caches Semgrep, Trivy, and Hadolint results in an inmemory LRU store keyed by file hash. Subsequent reviews of the same commit reuse the cached data, which is especially beneficial for large monorepos where many PRs touch overlapping files.
### Timeout Management
The `REVIEW_TIMEOUT_SECONDS` variable prevents runaway reviews. If the orchestrator exceeds the limit, it aborts remaining agents, records a partial result, and returns a status of `partial`. This behaviour is preferable to a hung CI job.
---
## Security and Privacy
* **Zero data exfiltration** All analysis runs on the host machine. The only outbound traffic is the LLM request, which can be directed to a selfhosted model (Ollama) to eliminate external exposure entirely.
* **Leastprivilege containers** The Docker image runs as a nonroot user (`uid 1000`). Filesystem access is limited to the mounted repository directory.
* **Secret handling** LLM API keys are stored in Kubernetes Secrets or Docker environment files; they never appear in logs.
* **Audit trail** Every review request is logged with a hash of the PR payload, enabling traceability without persisting raw source code beyond the review lifecycle.
These measures make PR Reviewer suitable for regulated environments where code confidentiality is nonnegotiable.
---
## Community and Contribution Model
The project lives on a selfhosted Git server (`git.aridgwayweb.com`). Contributions follow the classic forkbranchPR model:
1. **Fork** the repository.
2. **Create** a feature branch named `feat/<description>`.
3. **Implement** the change, ensuring that unit tests (`pytest`) pass and coverage stays above 85%.
4. **Open** a pull request against `main`.
The maintainers run a CI pipeline that validates code style (Black, Flake8), runs the test suite, and builds a Docker image for manual review. Documentation updates are required for any publicfacing change, especially when new agents or configuration options are added.
A dedicated `discussions` board encourages users to share custom context files, report false positives, or propose new tool integrations. The community has already contributed wrappers for `bandit` (Python security) and `eslint` (JavaScript linting), demonstrating the extensibility of the MCP layer.
---
## RealWorld Use Cases
### 1. Startup CI Acceleration
A fintech startup with a small engineering team integrated PR Reviewer into their GitLab pipelines. By automating codestyle enforcement and early vulnerability detection, they reduced manual review time from an average of 2hours per PR to under 15minutes, freeing senior engineers to focus on architectural decisions.
### 2. OpenSource Library Maintenance
An opensource maintainer of a popular Python library added PR Reviewer as a GitHub Action. Contributors receive instant feedback on PEP8 compliance and potential security issues, leading to a 30% drop in backandforth comments during the review phase.
### 3. Regulated Healthcare Software
A medical device company, bound by strict datahandling regulations, deployed PR Reviewer on an airgapped network using the Ollama backend. The system performed static analysis and generated compliance reports without ever sending code outside the secure perimeter.
These examples illustrate that the same core service can be tuned for speed, compliance, or privacy, simply by adjusting configuration and the chosen LLM provider.
---
## Future Roadmap
| Milestone | Target | Description |
|-----------|--------|-------------|
| **v1.1** | Q42026 | Add a `LicenseAgent` and support for SPDX license checks. |
| **v1.2** | Q22027 | Introduce a plugin system for custom LLM prompts, enabling perteam prompt engineering. |
| **v2.0** | Q42027 | Full support for multirepo monorepos, with crossrepo dependency analysis. |
| **v2.1** | 2028 | Web UI dashboard for visualising review histories and trends. |
The roadmap is communitydriven; feature requests are triaged via the `issues` board, and the maintainers aim to keep the core stable while iterating on optional extensions.
---
## Conclusion
Automated pullrequest reviews have moved from a novelty to a practical necessity. PR Reviewer demonstrates that you can achieve highquality, contextaware feedback without surrendering code to external services, and without locking yourself into a single vendors ecosystem. By leveraging CrewAIs multiagent orchestration and MCPs uniform tool integration, the system remains modular, extensible, and easy to deploy in any environment—from a developers laptop to a productiongrade Kubernetes cluster.
If youre looking to accelerate code quality, tighten security, or simply give junior developers a safety net, give PR Reviewer a spin. Clone the repo, tweak the context files to match your teams standards, and watch as the AIpowered reviewer becomes an invisible yet invaluable member of your development crew.
*Happy reviewing, mates!*