Build a Custom AI-Powered PR Reviewer with Claude API and GitHub Webhooks — Here’s the Exact Code

Learn how to build a production-grade AI code reviewer using Claude API and GitHub Webhooks. We'll walk through the exact Python code, architecture decisions, and deployment strategy: no fluff, just working code you can use today.


You’ve been there. A pull request lands at 4 PM. It’s got 47 files changed, 1,200 lines of new code, and the author wrote a commit message that just says “fixes.”

You can’t review it properly. You skim it. Bugs slip through. The cycle repeats.


I got tired of this. So I built an AI-powered PR reviewer that does the grunt work for me. It uses Claude API, listens to GitHub Webhooks, and posts detailed reviews directly on the PR. It’s been running in production for 3 months now, reviewing every PR in our main repo.

Here’s the exact code and architecture. You can steal this.


Why Build Your Own Instead of Using a SaaS Tool?

Honestly? Control. SaaS tools like CodeRabbit or GitHub Copilot Code Review are great, but they have limits:

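  • No custom rule enforcement. Your team has specific conventions. “Use path aliases from `@/`, not relative paths.” SaaS tools won’t catch that out of the box.
  • Data privacy. You’re sending code to a third-party server. If you’re working on a fintech or healthcare product, that’s a hard no. (Self-hosting doesn’t take Anthropic out of the loop, since the diff still goes to its API, but it does keep one more vendor out of your repository.)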
  • Cost at scale. Reviewing 50 PRs a day with a SaaS tool can get expensive fast. Self-hosting a Claude-powered reviewer costs pennies per review.

The alternative? A lightweight Python server that sits in the middle. It listens for the `pull_request` event on GitHub, analyzes the diff with Claude, and posts a comment back.
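Custom rules don't need anything fancy. Here is a minimal sketch, assuming you keep conventions as plain strings and append them to the review prompt. The `@/` rule is the one from the list above, the second rule is an invented example, and `TEAM_RULES` / `add_team_rules` are hypothetical names.

```python
# Hypothetical helper: team conventions as plain strings, appended to the prompt.
TEAM_RULES = [
    "Import shared modules via the `@/` path alias, not relative paths.",
    "Never log request bodies; they may contain customer data.",  # invented example
]


def add_team_rules(prompt: str, rules: list[str]) -> str:
    """Append a numbered list of team-specific rules to a review prompt."""
    if not rules:
        return prompt
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"{prompt}\n\n"
        "Also flag any violation of these team conventions as MAJOR:\n"
        f"{numbered}"
    )


if __name__ == "__main__":
    print(add_team_rules("Review this diff.", TEAM_RULES))
```

Wrapping the output of `build_review_prompt` from Step 2 in `add_team_rules(...)` is all the wiring it takes.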

Let’s build it.

The Architecture

It’s simpler than you think.


```
GitHub PR opened → Webhook POST → Flask server → Parse diff → Build prompt → Claude API → Format response → GitHub API comment
```

No queues. No complex orchestration. For a small team (2-10 devs) handling 20-50 PRs per day, a small Flask app running under gunicorn on a $12/month VPS handles this easily.

The Tech Stack

  • Python 3.11+ — We’re using pydantic for data validation. Get with the times.
  • Flask — Lightweight, gets the job done. Not everything needs FastAPI.
  • Claude API (Anthropic) — The Sonnet 4 model. Cheaper than GPT-4 and better at code analysis in my benchmarks.
  • GitHub REST API, called directly with `requests` for fetching diffs and posting comments (PyGithub works too if you'd rather use a wrapper).
  • Redis — Optional, but we use it for deduplication (preventing the same webhook from triggering twice).

Step 1: Set Up the Webhook Receiver

First, you need a server that can receive POST requests from GitHub. GitHub sends a JSON payload with the PR metadata (title, number, head commit SHA, and a `diff_url`), but not the diff itself, so we fetch that separately in Step 2.

```python
# app.py
import hashlib
import hmac
import os

from flask import Flask, abort, request
from pydantic import BaseModel

app = Flask(__name__)

# Load from environment variables
GITHUB_WEBHOOK_SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]


class PullRequestPayload(BaseModel):
    # Only the fields we use; pydantic ignores the rest of GitHub's payload
    action: str
    number: int
    pull_request: dict
    repository: dict


def verify_signature(payload_body, signature_header):
    """Verify that the webhook came from GitHub (HMAC-SHA256 of the raw body)."""
    if not signature_header:
        abort(403, "No signature provided")

    hash_object = hmac.new(
        GITHUB_WEBHOOK_SECRET.encode(),
        msg=payload_body,
        digestmod=hashlib.sha256,
    )
    expected_signature = "sha256=" + hash_object.hexdigest()

    # Constant-time comparison avoids leaking the signature through timing
    if not hmac.compare_digest(expected_signature, signature_header):
        abort(403, "Invalid signature")


@app.route("/webhook", methods=["POST"])
def webhook():
    # Sign-check the raw bytes exactly as GitHub sent them, before parsing JSON
    payload_body = request.get_data()
    signature_header = request.headers.get("X-Hub-Signature-256")

    verify_signature(payload_body, signature_header)

    data = request.json
    event = request.headers.get("X-GitHub-Event")

    if event != "pull_request":
        return "OK", 200

    validated = PullRequestPayload(**data)

    if validated.action not in ["opened", "synchronize"]:
        # We only review on open and new commits
        return "OK", 200

    # We'll fill in review logic next
    print(f"Received PR #{validated.number}")

    return "OK", 200
```

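Notice the signature verification. Don’t skip this. Without it, anyone can POST to your endpoint and trigger costly Claude API calls. I’ve seen teams rack up $200+ bills from a single malicious webhook replay. Signature checks alone won’t stop that kind of replay, though, since a captured delivery is still validly signed; pair them with the deduplication in Step 4.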
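To exercise the endpoint locally, you need to produce the same header GitHub sends: `sha256=` followed by the hex HMAC-SHA256 of the raw request body, keyed with the webhook secret. A small sketch of that, mirroring `verify_signature()` above; `sign_payload` and `is_valid` are hypothetical helper names.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Return 'sha256=<hex HMAC>' over the raw body, in GitHub's header format."""
    digest = hmac.new(secret.encode(), msg=body, digestmod=hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid(secret: str, body: bytes, header: str) -> bool:
    """Constant-time check, the same comparison verify_signature() performs."""
    return hmac.compare_digest(sign_payload(secret, body), header)


if __name__ == "__main__":
    body = json.dumps({"action": "opened", "number": 1}).encode()
    header = sign_payload("test-secret", body)
    print(is_valid("test-secret", body, header))         # True
    print(is_valid("test-secret", body + b" ", header))  # False: body changed
```

With Flask's test client you'd post `data=body` with `content_type="application/json"` and pass the result as the `X-Hub-Signature-256` header, alongside `X-GitHub-Event: pull_request`.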

Step 2: Fetch the Diff and Build the Prompt

GitHub’s API gives us the diff as a unified diff string. We’ll download it, truncate it to keep the prompt a manageable size (Claude has a context limit, and every input token costs money), and build a structured prompt.

```python
# reviewer.py
import os

import requests
from anthropic import Anthropic

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
ANTHROPIC_API_KEY = os.environ["ANTHROPIC_API_KEY"]
anthropic = Anthropic(api_key=ANTHROPIC_API_KEY)

# Truncate very large diffs to keep the prompt a manageable size
MAX_DIFF_CHARS = 50_000


def get_pr_diff(repo_full_name, pr_number):
    """Fetch the unified diff for a PR using GitHub's API."""
    headers = {
        "Authorization": f"token {GITHUB_TOKEN}",
        # This media type makes the pulls endpoint return a raw diff, not JSON
        "Accept": "application/vnd.github.v3.diff",
    }
    url = f"https://api.github.com/repos/{repo_full_name}/pulls/{pr_number}"
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.text


def build_review_prompt(diff, pr_title, pr_description):
    """Build a structured prompt for Claude."""
    # Slice here rather than inside the f-string, where a "#" comment would
    # end up as literal text in the prompt
    truncated_diff = diff[:MAX_DIFF_CHARS]
    return f"""You are a senior software engineer conducting a code review.
Your goal is to find bugs, security issues, and maintainability problems.

Review the following pull request diff.

PR Title: {pr_title}
PR Description: {pr_description}

Diff:
<diff>
{truncated_diff}
</diff>

Provide your feedback in this exact format:

CRITICAL: [Anything that could cause a production outage or data loss]
MAJOR: [Bugs, performance issues, or security concerns]
MINOR: [Style issues, maintainability improvements]
PRAISE: [What was done well]

For each issue, include:
1. The file and line number
2. A clear explanation of the problem
3. A concrete suggestion for fixing it

If the PR looks clean, just say "🟢 No issues found." and a brief summary of what was done well."""
```

A few things here:

  • Token limit handling. I truncate the diff to 50,000 characters. For most PRs (under 30 files), this works. For massive PRs, you’ll need chunking logic. But honestly, if your team is making 50-file PRs regularly, you have a team problem, not a tool problem.
  • Strict format. I force Claude to use a structured format. Without this, the responses are inconsistent and harder to parse.
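For the massive-PR case, the chunking logic can stay simple. A minimal sketch, assuming GitHub's `git diff`-style output where every file section starts with a `diff --git` line: split by file, then pack whole files into chunks under the same 50,000-character budget. You'd send each chunk through `build_review_prompt` and Claude separately; merging the per-chunk reviews into one comment is left out. `split_diff_by_file` and `chunk_diff` are hypothetical names.

```python
def split_diff_by_file(diff: str) -> list[str]:
    """Split a git-style unified diff at each 'diff --git' file header."""
    sections: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections


def chunk_diff(diff: str, max_chars: int = 50_000) -> list[str]:
    """Greedily pack whole files into chunks of at most max_chars each.

    A single file bigger than max_chars is truncated to max_chars, a crude
    fallback for generated or vendored files.
    """
    chunks: list[str] = []
    current = ""
    for section in split_diff_by_file(diff):
        section = section[:max_chars]
        # Start a new chunk if this file would push the current one over budget
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks
```

Splitting on file boundaries keeps each hunk intact, so Claude never sees half a change; the trade-off is that cross-file issues spanning two chunks can be missed.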

Step 3: Post the Review Comment

Once Claude returns the review, we post it as a PR comment using the GitHub API.

```python
# reviewer.py (continued)
def review_pr(repo_full_name, pr_number):
    diff = get_pr_diff(repo_full_name, pr_number)
    auth_headers = {"Authorization": f"token {GITHUB_TOKEN}"}

    # Fetch PR metadata (title and description) as JSON
    pr_url = f"https://api.github.com/repos/{repo_full_name}/pulls/{pr_number}"
    pr_response = requests.get(pr_url, headers=auth_headers, timeout=30)
    pr_response.raise_for_status()
    pr_data = pr_response.json()

    prompt = build_review_prompt(
        diff,
        pr_data["title"],
        pr_data["body"] or "",  # body is null when the PR has no description
    )

    # Call Claude
    message = anthropic.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )

    response_text = message.content[0].text

    # PR conversation comments go through the Issues API (every PR is an issue)
    comments_url = f"https://api.github.com/repos/{repo_full_name}/issues/{pr_number}/comments"
    comment_response = requests.post(
        comments_url,
        json={"body": f"## 🤖 AI Code Review\n\n{response_text}"},
        headers=auth_headers,
        timeout=30,
    )
    comment_response.raise_for_status()

    return response_text
```

That’s the core loop. Webhook comes in, we fetch the diff, Claude analyzes it, we post the result.
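Because the prompt forces a fixed format, you can also parse the reply, for example to fail a status check or add a label whenever anything lands under CRITICAL. A minimal sketch, assuming Claude puts each severity label at the start of a line (possibly wrapped in markdown bold or a heading); `parse_review` and `has_blockers` are hypothetical names, and treating a bare "None" as an empty section is a heuristic.

```python
import re

SEVERITIES = ("CRITICAL", "MAJOR", "MINOR", "PRAISE")
# Matches e.g. "CRITICAL: ...", "**MAJOR:** ...", "## MINOR"
HEADER_RE = re.compile(r"^[#*\s]*(CRITICAL|MAJOR|MINOR|PRAISE)\b[*:\s]*(.*)$")


def parse_review(text: str) -> dict[str, list[str]]:
    """Group non-empty lines of a review under the most recent severity label."""
    sections: dict[str, list[str]] = {s: [] for s in SEVERITIES}
    current = None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = match.group(1)
            rest = match.group(2).strip()
            if rest:
                sections[current].append(rest)
        elif current and line.strip():
            sections[current].append(line.strip())
    return sections


def has_blockers(text: str) -> bool:
    """True if anything other than a 'None ...' placeholder is listed as CRITICAL."""
    critical = parse_review(text)["CRITICAL"]
    return any(not item.lower().startswith("none") for item in critical)
```

A "🟢 No issues found." reply contains no labels, so every section comes back empty and `has_blockers` returns `False`.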

Step 4: Add Deduplication (Trust Me, You Need This)

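GitHub fires the `pull_request` webhook for many actions: `opened`, `synchronize`, `edited`, `labeled`, and more. The action filter from Step 1 drops most of them, but the same head commit can still arrive twice, for example when someone hits "Redeliver" in the webhook settings or a signed payload gets replayed. Each duplicate is another paid Claude call and another identical comment on the PR. (Separate pushes produce different head SHAs, so each push still gets its own review.)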

Here’s how we fix it with a simple in-memory cache:

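```python
# app.py (continued). This handler replaces the Step 1 version of webhook();
# Flask refuses to register two view functions under the same endpoint name.
from datetime import datetime, timedelta

from reviewer import review_pr

review_cache = {}  # "repo:pr:sha" -> time the review was started


def is_already_reviewed(cache_key):
    """Return True if this exact head commit was reviewed in the last 10 minutes."""
    last_reviewed = review_cache.get(cache_key)
    return last_reviewed is not None and datetime.now() - last_reviewed < timedelta(minutes=10)


@app.route("/webhook", methods=["POST"])
def webhook():
    verify_signature(request.get_data(), request.headers.get("X-Hub-Signature-256"))

    if request.headers.get("X-GitHub-Event") != "pull_request":
        return "OK", 200

    validated = PullRequestPayload(**request.json)
    if validated.action not in ["opened", "synchronize"]:
        return "OK", 200

    repo = validated.repository["full_name"]
    head_sha = validated.pull_request["head"]["sha"]
    cache_key = f"{repo}:{validated.number}:{head_sha}"

    if is_already_reviewed(cache_key):
        print(f"Skipping PR #{validated.number}: already reviewed commit {head_sha[:7]}")
        return "OK", 200

    # Record the SHA *before* calling Claude, so a duplicate that arrives
    # mid-review is skipped instead of triggering a second paid call
    review_cache[cache_key] = datetime.now()
    try:
        review_pr(repo, validated.number)
    except Exception:
        review_cache.pop(cache_key, None)  # let a redelivery retry a failed review
        raise

    return "OK", 200
```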

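For production, use Redis. For smaller teams the in-memory dict is fine, with one caveat: every gunicorn worker keeps its own copy, so with `-w 2` (as in the Dockerfile below) a duplicate that lands on the other worker slips through. Run a single worker, or move the cache to Redis.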
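The Redis version of the check can be a single atomic command. A minimal sketch, assuming redis-py: `SET` with `nx=True` writes the key only if it doesn't exist yet, and `ex=` gives it a TTL, so check-and-mark happens in one round trip and is shared by every worker. `claim_review` and `release_review` are hypothetical helper names; `client` is anything with redis-py's `set`/`delete` interface, such as `redis.Redis(host="localhost", port=6379)`.

```python
def _review_key(repo: str, pr_number: int, head_sha: str) -> str:
    return f"pr-review:{repo}:{pr_number}:{head_sha}"


def claim_review(client, repo: str, pr_number: int, head_sha: str,
                 ttl_seconds: int = 600) -> bool:
    """Return True if this call claimed the review, False if it was already claimed.

    redis-py's set(..., nx=True) returns True when it wrote the key and None
    when the key already existed, so the check and the write are one atomic step.
    """
    key = _review_key(repo, pr_number, head_sha)
    return bool(client.set(key, "1", nx=True, ex=ttl_seconds))


def release_review(client, repo: str, pr_number: int, head_sha: str) -> None:
    """Drop the claim (e.g. after a failed review) so a redelivery can retry."""
    client.delete(_review_key(repo, pr_number, head_sha))
```

In the handler, `if not claim_review(client, repo, validated.number, head_sha): return "OK", 200` replaces the dict lookup, and `release_review` goes wherever a failed review should be retryable.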

Step 5: Deploy It

I run this on a $12/month DigitalOcean droplet. A simple Docker setup works:

```dockerfile
# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["gunicorn", "-w", "2", "-b", "0.0.0.0:8000", "app:app"]
```

Add a `requirements.txt`:


```text
flask==3.0.0
anthropic==0.30.0
requests==2.31.0
PyGithub==2.1.1
pydantic==2.5.0
gunicorn==21.2.0
```

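In your repo's webhook settings, point GitHub at `https://your-server.com/webhook`, set the content type to `application/json` (the handler reads `request.json`), enter the same secret as `GITHUB_WEBHOOK_SECRET`, and select the `pull_request` event. Done.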
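If you'd rather script this than click through the settings page, GitHub's REST API can create the hook via `POST /repos/{owner}/{repo}/hooks`. A sketch using `requests`; `build_hook_config` and `create_hook` are hypothetical names, and the token needs permission to manage the repository's webhooks.

```python
def build_hook_config(url: str, secret: str) -> dict:
    """Request body for POST /repos/{owner}/{repo}/hooks."""
    return {
        "name": "web",
        "active": True,
        "events": ["pull_request"],
        "config": {
            "url": url,
            "content_type": "json",  # the Flask handler reads request.json
            "secret": secret,        # same value as GITHUB_WEBHOOK_SECRET
        },
    }


def create_hook(repo_full_name: str, url: str, secret: str, token: str) -> dict:
    """Create the webhook and return GitHub's JSON description of it."""
    import requests  # deferred so build_hook_config needs no third-party package

    response = requests.post(
        f"https://api.github.com/repos/{repo_full_name}/hooks",
        json=build_hook_config(url, secret),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

A call would look like `create_hook("your-org/your-repo", "https://your-server.com/webhook", os.environ["GITHUB_WEBHOOK_SECRET"], os.environ["GITHUB_TOKEN"])`, with the repo name as a placeholder.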

Real Results: What We Caught in the First Week

Within the first week, this AI reviewer caught:

  1. A hardcoded API key in a test file that would have been merged to `main`. The dev had pushed it by accident. Claude flagged it as "CRITICAL" with the line number. We deleted it immediately.
  2. A missing null check on a response from an external API. The code assumed the response would always contain a `data` field. Claude spotted the potential `KeyError` right away.
  3. A SQL injection vector in a raw query string. The developer was concatenating user input directly into a query. Claude called it out before it ever reached staging.

That last one alone paid for the server for a year.

The Costs

Claude Sonnet 4 costs $3 per million input tokens and $15 per million output tokens. An average PR review (500 lines diff, 300 tokens response) costs about $0.02. We do about 30 reviews per day. That's $0.60/day or $18/month in API costs.
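To sanity-check that arithmetic, here's a tiny calculator using the prices above. The token count for a 500-line diff is an assumption (about 40 characters per line and roughly 4 characters per token, plus a few hundred tokens of prompt), not a measured figure.

```python
INPUT_USD_PER_MTOK = 3.00    # Claude Sonnet 4, input tokens
OUTPUT_USD_PER_MTOK = 15.00  # Claude Sonnet 4, output tokens


def review_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one review call."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def monthly_cost(per_review: float, reviews_per_day: int, days: int = 30) -> float:
    """Dollar cost per month at a steady review rate."""
    return per_review * reviews_per_day * days


# Assumption: ~5,000 diff tokens + ~300 prompt tokens in, 300 tokens out
per_review = review_cost(input_tokens=5_300, output_tokens=300)
print(f"${per_review:.4f} per review")                       # $0.0204 per review
print(f"${monthly_cost(0.02, 30):.2f} per month at 30/day")  # $18.00 per month at 30/day
```

Input dominates for typical PRs, so the 50,000-character truncation from Step 2 is also what caps the worst-case input cost per review.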

Compare that to a SaaS reviewer at $50/month for a team of 5. That's 64% less on API spend alone; add the $12 droplet and you're at $30/month, still 40% less, with full control over the review rules.

What Doesn't Work (Be Honest)

This approach has blind spots:

  • It can't run the code. Claude will spot logical patterns and security issues, but it won't catch runtime exceptions or integration bugs. That's what your CI/CD test pipeline is for.
  • Large PRs are expensive. A 2,000-line diff costs about $0.10 to review. For huge PRs, consider splitting the review into multiple chunks. Or better yet, enforce smaller PRs through your team policy.
  • False positives happen. About 5-10% of Claude's "major issues" are false alarms. Your senior devs still need to eyeball the review. But when 90-95% of what it flags is real, it's still a massive time saver.

Should You Build or Buy?

Build it yourself if:

  • You need custom review rules (team conventions, framework-specific patterns)
  • You're handling sensitive code (fintech, healthcare, internal tools)
  • You want to keep API spend under $20/month

Buy a SaaS tool if:

  • You don't want to maintain infrastructure
  • You need out-of-the-box integration with your existing CI/CD
  • You're a small startup with generic codebases

For my team at ECOA AI, we built it. We've got developers in Ho Chi Minh City and Can Tho working on sensitive client code. Running the review pipeline ourselves gives us the control we need.

Frequently Asked Questions

Q: Can I run this with GPT-4 instead of Claude?

Yes. Swap the Anthropic client for OpenAI's. The prompt structure stays the same. GPT-4 is slightly better at formatting but Claude is better at catching subtle bugs in our benchmarks. YMMV.

Q: How do I review PRs with multiple commits?

Listen for the `synchronize` action in the webhook. That fires when new commits are pushed to an existing PR, and each event carries the new head SHA, so every push gets its own review. The deduplication logic only skips repeat deliveries for a commit that has already been reviewed.

Q: Will this code work with private repositories?

Absolutely. As long as your GitHub token has access to the repo, it works. We run it on private repos exclusively. Just make sure your token has `repo` scope.

Q: What's the latency? How long does a review take?

Average time from webhook receipt to comment posted is about 8-12 seconds. Claude processes the diff in about 4-5 seconds, and the GitHub API calls take another 2-3 seconds. Fast enough to feel real-time.
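That latency sits right at GitHub's limit: GitHub expects a 2XX response within 10 seconds and otherwise marks the delivery as failed. One way to stay under it without adding a real queue is to acknowledge first and review in a background thread. This is a sketch, not what the code above does: `enqueue_review` is a hypothetical helper, and reviews still in flight are lost if the process restarts.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# One worker thread serializes reviews; raise max_workers if you expect bursts.
executor = ThreadPoolExecutor(max_workers=1)


def _log_failure(future: Future) -> None:
    """Surface exceptions that would otherwise vanish inside the worker thread."""
    exc = future.exception()
    if exc is not None:
        print(f"Background review failed: {exc!r}")


def enqueue_review(review_fn, repo_full_name: str, pr_number: int) -> Future:
    """Schedule review_fn(repo_full_name, pr_number) and return immediately.

    review_fn would be review_pr() from Step 3; it's a parameter here so the
    sketch can be exercised without GitHub or Anthropic credentials.
    """
    future = executor.submit(review_fn, repo_full_name, pr_number)
    future.add_done_callback(_log_failure)
    return future
```

In the Step 4 handler, you'd swap the direct `review_pr(...)` call for `enqueue_review(review_pr, repo, validated.number)` and return `"Accepted", 202` right away. Mark the SHA in the dedup cache before enqueueing so a duplicate delivery during the review is still skipped.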
