I Built a Custom AI PR Reviewer with Claude API and GitHub Webhooks — Here’s the Exact Code

Let’s be real. Code reviews are the bottleneck in every team I’ve ever worked with. You know the drill: a PR sits for 48 hours, the author pings you on Slack, you skim it, approve it, and three days later it breaks staging.

I got tired of it. So I built something better.

Recently, I was working with a team in Ho Chi Minh City on a tight deadline. We had 15 developers shipping code daily, and our review queue was a nightmare. Instead of hiring more senior devs (which we couldn’t afford), I automated the first-pass review with an AI agent.

Here’s the exact system I built. You can clone it in an afternoon.

Why Build Your Own AI PR Reviewer?

Off-the-shelf tools like GitHub’s Copilot Code Review are fine. But they’re black boxes. You can’t control the prompt, the model, or the review criteria.

Building your own gives you:

Custom review rules — enforce your team’s specific conventions
Model flexibility — swap Claude for GPT-4 or a local LLM
Cost control — pay per review, not per seat
Full transparency — see exactly what the AI is checking

And honestly? It’s not that hard. We’re talking about 150 lines of Python, a webhook endpoint, and one API call.

The Architecture

Here’s the flow:

1. Developer opens a PR on GitHub
2. GitHub sends a webhook to your server
3. Your server fetches the PR diff
4. You send the diff to Claude with a review prompt
5. Claude returns line-by-line feedback
6. Your server posts the review as a PR comment

That’s it. No queues, no databases, no complex orchestration. Just a stateless webhook handler.

What You’ll Need

Python 3.10+
A server with a public URL (I use a $5 DigitalOcean droplet)
A Claude API key (or any LLM API)
A GitHub personal access token with `repo` scope

Step 1: Set Up the Webhook Receiver

First, let’s create a simple FastAPI server that listens for GitHub webhook events.

python
# main.py
from fastapi import BackgroundTasks, FastAPI, Request, HTTPException
import hmac
import hashlib
import os

app = FastAPI()

WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]

@app.post("/webhook")
async def handle_webhook(request: Request, background_tasks: BackgroundTasks):
    # Verify signature; a missing header becomes "" and fails the check
    signature = request.headers.get("x-hub-signature-256", "")
    body = await request.body()
    
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        body,
        hashlib.sha256
    ).hexdigest()
    
    # Compare as bytes so an odd header value cannot raise a TypeError
    if not hmac.compare_digest(f"sha256={expected}".encode(), signature.encode()):
        raise HTTPException(status_code=403, detail="Invalid signature")
    
    payload = await request.json()
    event = request.headers.get("x-github-event")
    
    if event == "pull_request" and payload["action"] in ["opened", "synchronize"]:
        # Run the review after responding; GitHub expects a reply within 10 seconds
        background_tasks.add_task(review_pr, payload)
    
    return {"status": "ok"}

Pro tip: Always verify the webhook signature. I’ve seen teams skip this and get pwned by random POST requests.
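
To check the verification logic without a running server, the same comparison can be pulled into a standalone function. This is a minimal sketch: `verify_signature`, the sample secret, and the sample payload are made up for illustration and are not part of the handler above.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Return True if the header matches the HMAC-SHA256 of the raw body."""
    if not signature_header:
        return False  # unsigned request
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest takes the same time whether or not the values match
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Hypothetical values, for illustration only
secret = "dev-secret"
body = b'{"action": "opened"}'
good = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(verify_signature(secret, body, good))          # valid signature
print(verify_signature(secret, body, "sha256=bad"))  # wrong digest
print(verify_signature(secret, body, None))          # header missing
```

The signature has to be computed over the raw request bytes. Re-serializing the parsed JSON can change whitespace or key order, and the digest would no longer match.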

Step 2: Fetch the PR Diff

GitHub’s API makes this trivial. You just need the PR number and the repo name.

python
import httpx

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]

async def get_pr_diff(repo_full_name: str, pr_number: int) -> str:
    url = f"https://api.github.com/repos/{repo_full_name}/pulls/{pr_number}"
    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3.diff"
    }
    
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()
        return response.text

The diff comes back as a plain text string. It’s ugly, but it’s exactly what we need to feed to the LLM.

Step 3: Build the Review Prompt

This is where the magic happens. The quality of your review depends entirely on your prompt.

Here’s the one I use:

python
REVIEW_PROMPT = """You are a senior software engineer reviewing a pull request. 
Analyze the following diff and provide feedback. Be specific and actionable.

Focus on:
1. Logic errors or bugs
2. Security vulnerabilities (SQL injection, XSS, hardcoded secrets)
3. Performance issues (N+1 queries, unnecessary allocations)
4. Code style violations (inconsistent naming, dead code)
5. Missing error handling

For each issue, format your response as:
- **File**: `path/to/file.py`
- **Line**: 42
- **Severity**: [critical/major/minor]
- **Issue**: Description
- **Suggestion**: How to fix it

If the code looks good, just say "No issues found."

Diff:
{diff}"""

Notice the structured format. The prompt asks Claude for file paths, line numbers, and severity levels in a fixed layout, which makes the output easy to parse and display as a PR comment.
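
As a rough illustration of why the fixed layout helps, the sketch below turns a review in that format into a list of dictionaries. `parse_review` and the sample text are assumptions made for this example. A real model response may drift from the layout, so treat the parser as best-effort.

```python
import re

# Matches lines such as "- **Severity**: major"
FIELD_RE = re.compile(r"^- \*\*(File|Line|Severity|Issue|Suggestion)\*\*:\s*(.*)$")


def parse_review(review_text: str) -> list[dict]:
    """Turn the structured review text into a list of issue dicts."""
    issues: list[dict] = []
    current: dict = {}
    for line in review_text.splitlines():
        match = FIELD_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and free-form text
        key, value = match.group(1).lower(), match.group(2).strip()
        if key == "file" and current:
            issues.append(current)  # a new File field starts the next issue
            current = {}
        current[key] = value.strip("`") if key == "file" else value
    if current:
        issues.append(current)
    return issues


# Hypothetical model output, for illustration only
sample = """- **File**: `app/db.py`
- **Line**: 42
- **Severity**: major
- **Issue**: Query built with string concatenation
- **Suggestion**: Use a parameterized query

- **File**: `app/api.py`
- **Line**: 7
- **Severity**: minor
- **Issue**: Unused import
- **Suggestion**: Remove it"""

issues = parse_review(sample)
print(len(issues), issues[0]["file"], issues[1]["severity"])
```

A response of "No issues found." contains no matching fields, so the function returns an empty list.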

Step 4: Call Claude API

Now we send the diff to Claude and get the review back.

python
import anthropic

ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
# Use the async client so the API call does not block the event loop
client = anthropic.AsyncAnthropic(api_key=ANTHROPIC_API_KEY)

async def review_with_claude(diff: str) -> str:
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4000,
        temperature=0.1,
        messages=[
            {
                "role": "user",
                "content": REVIEW_PROMPT.format(diff=diff)
            }
        ]
    )
    return response.content[0].text

I set `temperature` to 0.1. A low temperature makes the output more consistent from run to run, though not strictly deterministic. You want factual reviews, not creative interpretations of your code.

Step 5: Post the Review as a PR Comment

Finally, we post the AI’s feedback back to the PR.

python
import re

async def post_review_comment(repo_full_name: str, pr_number: int, review_text: str):
    # General PR comments go through the issues endpoint. The
    # /pulls/{number}/comments endpoint is for inline review comments
    # and also requires commit_id, path, and line.
    url = f"https://api.github.com/repos/{repo_full_name}/issues/{pr_number}/comments"
    headers = {
        "Authorization": f"Bearer {GITHUB_TOKEN}",
        "Accept": "application/vnd.github.v3+json"
    }
    
    # Split at the start of each issue ("- **File**"), not at every field
    issues = re.split(r"\n(?=- \*\*File\*\*)", review_text.strip())
    
    async with httpx.AsyncClient() as client:
        for issue in issues[:10]:  # Limit to 10 comments per PR
            body = issue if issue.startswith("No issues") else f"**AI Review**\n\n{issue}"
            response = await client.post(url, headers=headers, json={"body": body})
            response.raise_for_status()  # surface 4xx/5xx instead of failing silently

I limit it to 10 comments. Nobody wants 47 AI-generated comments on their PR. That’s just noise.
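
One caveat with a simple cap is that it keeps whichever issues come first, so a critical finding late in the response can be dropped. A possible refinement, sketched here under the assumption that the issues have already been parsed into dictionaries with a `severity` key (a hypothetical shape, not what the function above works with), is to sort by severity before truncating:

```python
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}


def top_issues(issues: list[dict], limit: int = 10) -> list[dict]:
    """Keep the most severe issues, preserving original order within a level."""
    def rank(issue: dict) -> int:
        # Tolerate values such as "[Major]" that echo the prompt's brackets
        severity = issue.get("severity", "").strip("[] ").lower()
        return SEVERITY_ORDER.get(severity, len(SEVERITY_ORDER))  # unknown goes last

    # sorted() is stable, so ties keep the order the model produced
    return sorted(issues, key=rank)[:limit]


# Hypothetical parsed issues, for illustration only
found = [
    {"file": "a.py", "severity": "minor"},
    {"file": "b.py", "severity": "critical"},
    {"file": "c.py", "severity": "major"},
    {"file": "d.py", "severity": "minor"},
]
print([issue["file"] for issue in top_issues(found, limit=2)])  # ['b.py', 'c.py']
```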

The Complete Handler

Here’s how it all ties together:

python
async def review_pr(payload: dict):
    repo = payload["repository"]["full_name"]
    pr_number = payload["pull_request"]["number"]
    
    diff = await get_pr_diff(repo, pr_number)
    
    if len(diff) > 50000:  # Skip huge PRs
        await post_review_comment(repo, pr_number, 
            "PR too large for AI review (>50KB diff). Please break it into smaller PRs.")
        return
    
    review = await review_with_claude(diff)
    await post_review_comment(repo, pr_number, review)

Real Results from Production

I’ve been running this on a production codebase for 3 months. Here’s what happened:

Metric	Before	After
Average review time	28 hours	4 minutes
Bugs caught in review	12%	34%
Developer satisfaction	3.2/5	4.1/5
False positives	N/A	8%

The false positive rate is the key metric. At 8%, 92% of the AI’s flags pointed at a real issue. That’s good enough for a first pass.

But here’s the catch: The AI misses context. It doesn’t know your business logic. It flagged a “potential SQL injection” that was actually a parameterized query using an ORM. Developers learned to ignore those.

What I’d Do Differently

If I were building this again:

1. Add a feedback loop: let developers thumbs-up or thumbs-down AI comments to improve the prompt
2. Use a local LLM: Claude API costs add up. For a team of 15, we spent about $200/month on API calls
3. Parallel reviews: run the AI review and human review simultaneously, not sequentially

Actually, we’re already working on #2 with our team in Can Tho. We’re fine-tuning a small model on our codebase to handle the common patterns locally. The cloud API only handles the edge cases.

Is This Better Than Hiring More Senior Devs?

No. But it’s cheaper.

A senior developer in the US costs $10,000+/month. A senior developer from our ECOA AI team in Vietnam costs $3,000/month. And this AI PR reviewer costs about $200/month in API fees.

You don’t replace humans. You augment them. The AI catches the dumb stuff — missing error handling, inconsistent naming, obvious bugs — so your senior devs can focus on architecture and business logic.

That’s the real win.

—

Frequently Asked Questions

How do I handle large PRs that exceed the LLM context window?

Chunk the diff by file: send each file’s diff as a separate review request, then aggregate the results. The handler shown earlier does not chunk yet. It applies a hard limit of 50,000 characters (roughly 50KB) to the whole diff, and anything larger gets rejected with a message asking the developer to split the PR.
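
A minimal sketch of the per-file chunking, assuming the diff is standard `git diff` output in which every file section begins with a `diff --git` line. `split_diff_by_file` and the sample diff are illustrative names and data, not part of the code above.

```python
import re


def split_diff_by_file(diff: str) -> dict[str, str]:
    """Split a unified diff into one chunk per file, keyed by the new path."""
    chunks: dict[str, str] = {}
    # Split just before every line that starts a new file section
    for section in re.split(r"^(?=diff --git )", diff, flags=re.MULTILINE):
        header = re.match(r"diff --git a/(.+?) b/(.+)", section)
        if header:
            chunks[header.group(2)] = section
    return chunks


# Hypothetical two-file diff, for illustration only
sample = (
    "diff --git a/app/db.py b/app/db.py\n"
    "--- a/app/db.py\n"
    "+++ b/app/db.py\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/README.md b/README.md\n"
    "--- a/README.md\n"
    "+++ b/README.md\n"
    "@@ -1 +1 @@\n"
    "-Hello\n"
    "+Hello world\n"
)

chunks = split_diff_by_file(sample)
print(list(chunks))  # ['app/db.py', 'README.md']
```

Each chunk can then be sent as its own review request, with the size limit applied per chunk rather than to the whole PR. Paths that contain the text " b/" would confuse the header pattern, so a production version would need a stricter parser.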

Can I use a local LLM instead of Claude API?

Yes. I’ve tested this with Llama 3 70B running on an A100. The quality is about 80% of Claude’s, but latency is higher (15-20 seconds vs 3-5 seconds). If you want to avoid per-call API fees, it’s worth it. Just swap the API call in `review_with_claude()`.
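
One way to keep that swap cheap is to pass the model call in as a function, so the rest of the pipeline does not care which backend answers. The sketch below is an assumption about how this could be structured: `ReviewBackend`, `review_diff`, and `fake_local_backend` are hypothetical names, and the fake backend returns a canned string instead of calling a real model.

```python
import asyncio
from typing import Awaitable, Callable

# Any async function that takes a prompt and returns the review text
ReviewBackend = Callable[[str], Awaitable[str]]


async def fake_local_backend(prompt: str) -> str:
    """Stand-in for a local model call; returns a canned answer."""
    return "No issues found."


async def review_diff(diff: str, backend: ReviewBackend, prompt_template: str) -> str:
    """Fill the prompt template and delegate to whichever backend was passed in."""
    return await backend(prompt_template.format(diff=diff))


result = asyncio.run(
    review_diff("diff --git a/x.py b/x.py", fake_local_backend, "Review this diff:\n{diff}")
)
print(result)  # No issues found.
```

A Claude-backed or locally hosted function with the same signature can then be dropped in without touching the webhook handler.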

How do I prevent the AI from reviewing the same code twice?

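Track the commit SHA. Store the last reviewed head SHA per PR in a simple Redis cache or even a JSON file, and only trigger a new review if the SHA changed. This guards against duplicate reviews when a webhook is redelivered, or when you start handling events such as `edited` that fire when someone just updates the PR description.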
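
A minimal in-memory sketch of that check is below. It assumes the head commit is available at `payload["pull_request"]["head"]["sha"]` in the `pull_request` webhook payload. `should_review` and the sample payload are hypothetical, and a plain dict stands in for Redis or a JSON file, so the state is lost on restart.

```python
# (repo full name, PR number) -> last reviewed head SHA; in-memory only
last_reviewed: dict[tuple[str, int], str] = {}


def should_review(payload: dict) -> bool:
    """Return True only if this PR's head commit has not been reviewed yet."""
    key = (payload["repository"]["full_name"], payload["pull_request"]["number"])
    sha = payload["pull_request"]["head"]["sha"]
    if last_reviewed.get(key) == sha:
        return False  # same commit as last time, skip
    last_reviewed[key] = sha
    return True


# Hypothetical payload, trimmed to the fields used here
payload = {
    "repository": {"full_name": "acme/api"},
    "pull_request": {"number": 7, "head": {"sha": "abc123"}},
}

first = should_review(payload)   # new SHA, review it
second = should_review(payload)  # same SHA again, skip
payload["pull_request"]["head"]["sha"] = "def456"
third = should_review(payload)   # new commit pushed, review again
print(first, second, third)  # True False True
```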

What about security? Can the AI leak my code?

This is a valid concern. If you use Claude API, your code goes through Anthropic’s servers. For sensitive codebases, use a self-hosted model like CodeLlama or DeepSeek Coder. Expect some drop in review quality, as noted above for local models, but your code never leaves your infrastructure.
