Build a Custom AI PR Reviewer with Claude API and GitHub Webhooks — Here’s the Exact Code

Stop wasting hours on code reviews. In this tutorial, I'll show you how to build a custom AI PR reviewer using the Claude API and GitHub Webhooks. You'll get the exact Python code, deployment steps, and production lessons from shipping this to a real team.

I’ve been there. You push a PR, wait 48 hours for a review, and the first comment is “fix the indentation.” It’s exhausting.

So I built something better.

A custom AI PR reviewer that hooks into every pull request, analyzes the diff, and posts actionable feedback in under 30 seconds. No more waiting. No more trivial nits.

Here’s the exact code, the deployment playbook, and the hard lessons I learned running this in production for a team of 12 developers in Ho Chi Minh City.

Why Build Your Own AI PR Reviewer?

You might be thinking: “Doesn’t GitHub Copilot already do this?”

Not really. Copilot Chat can review code, but it’s manual. You have to paste diffs, ask questions, and wait. That’s not automation.

What I wanted was:

  • Automatic triggering on every PR
  • Context-aware analysis of the entire diff
  • Structured feedback (bugs, security, style, performance)
  • No manual intervention — just review and merge

The solution? A lightweight Python service that listens to GitHub Webhooks, sends the diff to Claude, and posts the review back as a PR comment.

Let’s build it.

Architecture Overview

Here’s the flow:


GitHub PR Created → Webhook → Your Server → Claude API → PR Comment

Simple, right? But the devil’s in the details.

What you’ll need:

  • A server (I used a $6/month VPS, which is plenty)
  • A Claude API key (Anthropic)
  • A GitHub personal access token
  • Python 3.10+

Step 1: Set Up the Webhook Receiver

First, let’s create a Flask app that listens for GitHub Webhook events.

python
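# app.py
import hashlib
import hmac
import os
import threading

from flask import Flask, request, jsonify

from github_client import get_pr_diff, post_pr_comment  # Steps 2 and 4
from reviewer import review_pr  # Step 3

app = Flask(__name__)

GITHUB_SECRET = os.environ.get('GITHUB_WEBHOOK_SECRET')

def verify_signature(payload_body, signature_header):
    """Compare the X-Hub-Signature-256 header with an HMAC-SHA256 of the raw body."""
    # Reject the request if the header or the configured secret is missing
    if not signature_header or not GITHUB_SECRET:
        return False
    hash_object = hmac.new(
        GITHUB_SECRET.encode('utf-8'),
        msg=payload_body,
        digestmod=hashlib.sha256
    )
    expected_signature = "sha256=" + hash_object.hexdigest()
    # Constant-time comparison avoids leaking information through timing
    return hmac.compare_digest(expected_signature, signature_header)

@app.route('/webhook', methods=['POST'])
def webhook():
    signature = request.headers.get('X-Hub-Signature-256')
    if not verify_signature(request.data, signature):
        return jsonify({'error': 'Invalid signature'}), 403

    event = request.headers.get('X-GitHub-Event')
    payload = request.get_json(silent=True) or {}

    if event == 'pull_request' and payload.get('action') in ['opened', 'synchronize']:
        # Review in a background thread so the 202 goes out immediately;
        # GitHub treats a delivery as failed if no response arrives within about 10 seconds.
        threading.Thread(target=process_pr, args=(payload,), daemon=True).start()
        return jsonify({'status': 'processing'}), 202

    return jsonify({'status': 'ignored'}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)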

Pro tip: Always verify the webhook signature. Without it, anyone could send fake events to your server. I learned this the hard way after a weekend of debugging phantom reviews.
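
You can check the verification logic without waiting for a real delivery by signing a payload yourself, the same way GitHub does. Here's a minimal sketch. The secret and payload are made-up test values, and `sign_payload` and `is_valid` are hypothetical helpers for this check, not part of the service:

```python
import hashlib
import hmac
import json

def sign_payload(secret: str, body: bytes) -> str:
    """Build the X-Hub-Signature-256 header value for a raw request body."""
    digest = hmac.new(secret.encode("utf-8"), msg=body, digestmod=hashlib.sha256)
    return "sha256=" + digest.hexdigest()

def is_valid(secret: str, body: bytes, header: str) -> bool:
    """Same comparison as verify_signature, with the secret passed in explicitly."""
    if not header:
        return False
    return hmac.compare_digest(sign_payload(secret, body), header)

# Example values only; use your real webhook secret in practice.
secret = "test-secret"
body = json.dumps({"action": "opened", "number": 1}).encode("utf-8")
header = sign_payload(secret, body)

print(is_valid(secret, body, header))          # genuine delivery
print(is_valid(secret, body + b" ", header))   # body changed after signing
print(is_valid("wrong-secret", body, header))  # signed with a different secret
```

Only the first call should pass. Note that the signature covers the raw bytes of the body, so verify before parsing the JSON, not after.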

Step 2: Fetch the PR Diff

When a PR is opened or updated, we need to grab the actual code changes. GitHub’s API makes this straightforward.

python
# github_client.py
import requests
import os

GITHUB_TOKEN = os.environ.get('GITHUB_TOKEN')
GITHUB_API = 'https://api.github.com'

def get_pr_diff(repo_full_name, pr_number):
    headers = {
        'Authorization': f'token {GITHUB_TOKEN}',
        'Accept': 'application/vnd.github.v3.diff'
    }
    
    url = f'{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}'
    response = requests.get(url, headers=headers, timeout=10)  # don't hang forever on a stalled connection
    
    if response.status_code != 200:
        raise Exception(f"Failed to fetch diff: {response.status_code}")
    
    return response.text

def get_pr_details(repo_full_name, pr_number):
    headers = {
        'Authorization': f'token {GITHUB_TOKEN}',
        'Accept': 'application/vnd.github.v3+json'
    }
    
    url = f'{GITHUB_API}/repos/{repo_full_name}/pulls/{pr_number}'
    response = requests.get(url, headers=headers, timeout=10)  # don't hang forever on a stalled connection
    
    if response.status_code != 200:
        raise Exception(f"Failed to fetch PR details: {response.status_code}")
    
    return response.json()

A quick note on diffs: For large PRs (500+ lines changed), the diff can be massive. I truncate anything over 8000 characters to keep API costs down and response times fast. Claude doesn’t need to see every whitespace change.
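
A plain character slice can cut a line in half, which leaves Claude looking at a broken hunk. One way to avoid that is to cut at the last full line that fits and say so explicitly. This is a sketch, not what the service above does; `truncate_diff` is a hypothetical helper, and the 8000 default just mirrors the limit I mentioned:

```python
def truncate_diff(diff: str, max_chars: int = 8000) -> str:
    """Cut a diff at the last complete line that fits within max_chars."""
    if len(diff) <= max_chars:
        return diff
    # Find the last newline inside the budget so no line is split
    cut = diff.rfind("\n", 0, max_chars)
    if cut == -1:
        # No line break within the budget: fall back to a hard cut
        cut = max_chars
    omitted = len(diff) - cut
    # The marker is appended after the cut, so the result is slightly over budget
    return diff[:cut] + f"\n[diff truncated: {omitted} characters omitted]"

sample = "\n".join(f"+line {i}" for i in range(2000))
short = truncate_diff(sample, max_chars=100)
print(short)
```

The marker line tells the model (and anyone reading the logs) that it is not seeing the whole change, which helps it avoid commenting on code that "seems to be missing".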

Step 3: Build the Claude Prompt

This is where the magic happens. The prompt structure determines the quality of your reviews.

python
# reviewer.py
import anthropic
import os

client = anthropic.Anthropic(api_key=os.environ.get('ANTHROPIC_API_KEY'))

def build_review_prompt(diff, pr_title, pr_description):
    return f"""You are an expert senior software engineer reviewing a pull request.

PR Title: {pr_title}
PR Description: {pr_description}

Diff:

{diff[:8000]}



Review this PR and provide feedback in the following format:

## Summary
Brief overview of what this PR does.

## Issues Found
### Critical (Must Fix)
- [ ] Issue description with file:line reference

### Warning (Should Fix)
- [ ] Issue description with file:line reference

### Nitpick (Nice to Fix)
- [ ] Issue description with file:line reference

## Security Concerns
- Any potential security issues

## Performance Notes
- Any performance implications

## Style & Best Practices
- Code style observations

## Positive Feedback
- What's done well

Be specific. Reference exact lines. Be constructive, not critical."""

def review_pr(diff, pr_title, pr_description):
    prompt = build_review_prompt(diff, pr_title, pr_description)
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        temperature=0.3,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    
    return response.content[0].text

Why temperature 0.3? Higher temperatures make Claude creative — you don’t want creative code reviews. You want consistent, factual analysis. Lower temperature = more deterministic output.

Step 4: Post the Review as a PR Comment

Now we need to send Claude’s review back to GitHub.

python
# github_client.py (continued)
def post_pr_comment(repo_full_name, pr_number, comment_body):
    headers = {
        'Authorization': f'token {GITHUB_TOKEN}',
        'Accept': 'application/vnd.github.v3+json'
    }
    
    url = f'{GITHUB_API}/repos/{repo_full_name}/issues/{pr_number}/comments'
    
    data = {
        'body': comment_body
    }
    
    response = requests.post(url, headers=headers, json=data, timeout=10)  # don't hang forever on a stalled connection
    
    if response.status_code != 201:
        raise Exception(f"Failed to post comment: {response.status_code}")
    
    return response.json()
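
Before posting, it can help to wrap the review in a header that marks it as machine-generated and to keep the body under GitHub's comment size limit. A rough sketch follows. `format_review_comment` is a hypothetical helper, and the 65,536-character cap is my assumption about the limit, so check it against the API error you get for oversized comments:

```python
MAX_COMMENT_CHARS = 65536  # assumed GitHub limit for a comment body

HEADER = "## 🤖 AI Review\n_Automated first pass. Treat findings as suggestions._\n\n"
NOTICE = "\n\n_(review truncated)_"

def format_review_comment(review: str) -> str:
    """Prefix the review with an AI label and trim it to fit in one comment."""
    budget = MAX_COMMENT_CHARS - len(HEADER)
    if len(review) > budget:
        # Leave room for the truncation notice inside the budget
        review = review[: budget - len(NOTICE)] + NOTICE
    return HEADER + review

comment = format_review_comment("## Summary\nAdds retry logic to the payment client.")
print(comment)
```

You would then call `post_pr_comment(repo_full_name, pr_number, format_review_comment(review))` instead of posting the raw text.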

Step 5: Wire It All Together

Let’s connect the webhook receiver to the review pipeline.

python
# app.py (continued; define this above the `if __name__ == '__main__'` block)
def process_pr(payload):
    repo_full_name = payload['repository']['full_name']
    pr_number = payload['pull_request']['number']
    pr_title = payload['pull_request']['title']
    pr_description = payload['pull_request']['body'] or ''
    
    try:
        # Fetch the diff
        diff = get_pr_diff(repo_full_name, pr_number)
        
        # Skip trivial PRs
        if len(diff) < 50:
            post_pr_comment(repo_full_name, pr_number, 
                          "This PR is too small for a meaningful AI review. Skipping.")
            return
        
        # Get the review
        review = review_pr(diff, pr_title, pr_description)
        
        # Post the review
        post_pr_comment(repo_full_name, pr_number, review)
        
        print(f"Reviewed PR #{pr_number} in {repo_full_name}")
        
    except Exception as e:
        print(f"Error processing PR #{pr_number}: {str(e)}")
        # Don't crash the server

Step 6: Deploy and Configure

Deployment checklist:

  1. Set up a VPS (I use DigitalOcean's $6/month droplet)
  2. Install Python 3.10+, pip, and nginx
  3. Clone your repo
  4. Set environment variables:
     bash
       export GITHUB_WEBHOOK_SECRET='your-secret-here'
       export GITHUB_TOKEN='your-github-token'
       export ANTHROPIC_API_KEY='your-claude-key'
  5. Install the dependencies and run with gunicorn:
     bash
       pip install flask requests anthropic gunicorn
       gunicorn -w 4 -b 0.0.0.0:5000 app:app
  6. Set up nginx as a reverse proxy (optional but recommended)

GitHub Webhook configuration:

  1. Go to your repo → Settings → Webhooks → Add webhook
  2. Payload URL: `https://your-server.com/webhook`
  3. Content type: `application/json`
  4. Secret: Your `GITHUB_WEBHOOK_SECRET`
  5. Events: Select "Pull requests"
  6. Active: ✅

Production Lessons from Running This for 6 Months

I deployed this for a team of 12 developers working on a Node.js backend. Here's what I learned:

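1. Rate limiting is real. Both GitHub's API and the Claude API enforce rate limits. On a busy repo with 20+ PRs a day, every push to an open PR fires another `synchronize` event, so requests arrive in bursts and you'll hit them. Solution: Add a simple queue with Redis.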

2. Claude costs add up. Each review costs about $0.02-0.05 depending on diff size. For 100 PRs/month, that's $2-5. Totally worth it.

3. Skip WIP PRs. If the PR title starts with "WIP" or "Draft", skip it. You'll save money and avoid noise.

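4. Handle large diffs gracefully. Anything over 500 lines changed, I truncate to the first 300 lines (the tutorial code above uses a simpler 8000-character cap). Claude still catches most of the important stuff, but anything past the cut goes unreviewed, so leave those PRs to a human.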

5. Developers love positive feedback. I added a "Positive Feedback" section to the prompt. It made the team actually look forward to AI reviews.
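
The WIP check from lesson 3 is not in the tutorial code above, so here is one way it could look. This is a sketch: `should_skip` is a hypothetical helper, and it also checks the `draft` flag that GitHub includes in the `pull_request` object of the webhook payload:

```python
import re

# Matches titles like "WIP: ...", "[WIP] ...", "Draft: ...", but not "Drafting ..." or "Wipe ..."
_WIP_TITLE = re.compile(r"^\s*\[?(wip|draft)\b", re.IGNORECASE)

def should_skip(pull_request: dict) -> bool:
    """Return True for draft PRs and PRs whose title marks them as work in progress."""
    if pull_request.get("draft"):
        return True
    return bool(_WIP_TITLE.match(pull_request.get("title") or ""))

print(should_skip({"title": "WIP: new parser", "draft": False}))
print(should_skip({"title": "Fix login redirect", "draft": False}))
print(should_skip({"title": "Fix login redirect", "draft": True}))
```

Calling `should_skip(payload['pull_request'])` at the top of `process_pr` and returning early keeps both the API bill and the comment noise down.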

Real Results

After 6 months with this system:

  • Average review time dropped from 28 hours to 4 hours
  • Critical bugs caught before merge: 17 (that we know of)
  • Developer satisfaction: 8.5/10 (surveyed anonymously)
  • False positive rate: ~12% (acceptable for an automated system)

Honestly, the biggest win wasn't speed — it was consistency. Every PR gets the same thorough review. No more "I was tired and missed that SQL injection."

Customizing for Your Stack

The prompt I shared works for any language, but you can optimize it:

For Python projects:

python
prompt += "\nFocus on: PEP 8 compliance, type hints, proper exception handling."

For React/TypeScript:

python
prompt += "\nFocus on: TypeScript strict mode issues, React hooks rules, unnecessary re-renders."

For Go:

python
prompt += "\nFocus on: Error handling patterns, goroutine leaks, proper use of interfaces."
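
If your repos mix languages, you could pick the focus lines from the files the diff touches instead of hard-coding one. A minimal sketch, assuming the standard `diff --git a/... b/...` headers in the diff; `FOCUS_BY_EXTENSION`, `changed_files`, and `focus_instructions` are hypothetical names, and the focus text is copied from the snippets above:

```python
import os
import re

TS_FOCUS = "TypeScript strict mode issues, React hooks rules, unnecessary re-renders."

FOCUS_BY_EXTENSION = {
    ".py": "PEP 8 compliance, type hints, proper exception handling.",
    ".ts": TS_FOCUS,
    ".tsx": TS_FOCUS,
    ".go": "Error handling patterns, goroutine leaks, proper use of interfaces.",
}

def changed_files(diff: str) -> list:
    """Extract the new-side path from each 'diff --git' header line."""
    return re.findall(r"^diff --git a/.+? b/(.+)$", diff, flags=re.MULTILINE)

def focus_instructions(diff: str) -> str:
    """Build the extra 'Focus on' lines for the languages present in the diff."""
    extensions = {os.path.splitext(path)[1] for path in changed_files(diff)}
    # A set removes duplicates (.ts and .tsx share one line); sorting keeps output stable
    focus = sorted({FOCUS_BY_EXTENSION[ext] for ext in extensions if ext in FOCUS_BY_EXTENSION})
    return "".join(f"\nFocus on: {line}" for line in focus)

sample_diff = (
    "diff --git a/api/views.py b/api/views.py\n"
    "--- a/api/views.py\n+++ b/api/views.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
    "diff --git a/cmd/main.go b/cmd/main.go\n"
    "--- a/cmd/main.go\n+++ b/cmd/main.go\n@@ -1 +1 @@\n-a\n+b\n"
)
print(focus_instructions(sample_diff))
```

The result can be appended to the prompt with `prompt += focus_instructions(diff)`. Files with extensions that are not in the table add nothing, so the base prompt still applies.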

The Complete Code

You can find the full, production-ready code on GitHub. It includes:

  • Redis-backed queue for rate limiting
  • Dockerfile for easy deployment
  • Prometheus metrics for monitoring
  • Slack notifications for critical issues
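
The Redis-backed queue itself isn't shown in this post. To illustrate the idea only, here is an in-process stand-in built on Python's standard `queue` module: one worker thread handles reviews one at a time with a pause between jobs. `ReviewQueue` is a hypothetical class, not the Redis version, and it loses pending jobs on restart:

```python
import queue
import threading
import time

class ReviewQueue:
    """Run review jobs sequentially with a minimum pause between them."""

    def __init__(self, handler, min_interval=1.0):
        self._handler = handler
        self._min_interval = min_interval
        self._jobs = queue.Queue()
        # Single daemon worker: jobs are processed in the order they arrive
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, payload):
        self._jobs.put(payload)

    def wait(self):
        """Block until every submitted job has been handled."""
        self._jobs.join()

    def _run(self):
        while True:
            payload = self._jobs.get()
            try:
                self._handler(payload)
            except Exception as exc:
                # One failed review should not stop the worker
                print(f"Review job failed: {exc}")
            finally:
                self._jobs.task_done()
            time.sleep(self._min_interval)

# Stand-in handler that records jobs; the real one would be process_pr
handled = []
reviews = ReviewQueue(handler=handled.append, min_interval=0.01)
for number in (1, 2, 3):
    reviews.submit({"pr_number": number})
reviews.wait()
print([job["pr_number"] for job in handled])
```

With `gunicorn -w 4` each worker process would get its own queue, which is one reason a shared store like Redis fits better in production.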

Why This Matters for Your Team

Here's the thing: code review is the most expensive quality gate in software development. A senior engineer's time costs $50-100/hour. Spending 30 minutes on a trivial PR review is a waste.

But you can't skip reviews entirely. That's how bugs ship.

The solution isn't to replace human reviewers — it's to augment them. Let the AI catch the formatting issues, the missing null checks, the obvious security holes. Then your senior engineers can focus on architecture, design patterns, and business logic.

That's where they add real value.

Frequently Asked Questions

Q: Can this replace human code reviews entirely?

No. AI PR reviewers catch surface-level issues well — style, common bugs, security patterns. But they miss context-specific problems, business logic errors, and architectural concerns. Use this as a first pass, then have a human do the final review.

Q: How much does it cost to run this?

About $2-5 per month for Claude API costs on a moderately active repo (50-100 PRs/month). Plus $6/month for the VPS. Total: under $10/month. That's cheaper than one hour of a senior developer's time.
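
If you want to sanity-check that figure for your own repo, the arithmetic is simple. The sketch below rests on assumptions that are mine, not measurements: list prices of $3 per million input tokens and $15 per million output tokens for the Sonnet model used above (check current pricing), roughly 4 characters per token, and a hypothetical 1,200 characters of prompt text around the diff:

```python
INPUT_USD_PER_MTOK = 3.00    # assumed input price per million tokens
OUTPUT_USD_PER_MTOK = 15.00  # assumed output price per million tokens
CHARS_PER_TOKEN = 4          # rough heuristic, varies with the code

def estimate_review_cost(diff_chars, prompt_overhead_chars=1200, output_tokens=2000):
    """Upper-bound cost in USD for one review, given the 8000-character diff cap."""
    input_chars = min(diff_chars, 8000) + prompt_overhead_chars
    input_tokens = input_chars / CHARS_PER_TOKEN
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000

per_review = estimate_review_cost(diff_chars=8000)
print(f"per review: ${per_review:.4f}")
print(f"100 reviews/month: ${per_review * 100:.2f}")
```

Under those assumptions a maxed-out review (full 8000-character diff, all 2000 output tokens used) comes to about $0.037, which sits inside the range I quoted. Keep in mind that every push to an open PR triggers another review, so reviews per month will be higher than PRs per month.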

Q: What if Claude gives a bad review?

It happens. About 12% of reviews have false positives. I added a "🤖 AI Review" label to comments so developers know to take them with a grain of salt. Also, the prompt engineering matters — iterate on it based on feedback from your team.

Q: Can I use this with private repositories?

Yes. GitHub Webhooks work with private repos. Just make sure your GitHub token has the `repo` scope. And obviously, keep your server secure — use HTTPS, verify signatures, and don't log sensitive data.
