How We Built an AI-Powered Code Review Triage Bot That Cut Review Cycle Time by 73%

You know the feeling. You push a PR, assign reviewers, and then wait. And wait. Meanwhile, a critical bug fix sits in limbo while a trivial typo fix gets reviewed in five minutes.

That’s broken.

We faced this exact problem with a client in Ho Chi Minh City—a fintech startup with 12 developers shipping 30+ PRs daily. Reviewers were drowning. Cycle times averaged 14 hours. Developers started working around the process.

So we built a triage bot. Not a code reviewer—a smart triage system that answers one question: Which PR should a human look at first?

Here’s the architecture, the (simplified) code, and the metrics that convinced us to roll it out across all our teams.

The Problem: FIFO Reviewing Is Dumb

Most teams review PRs in the order they arrive. That’s the path of least resistance. But it’s terrible for throughput.

A one-line config change gets reviewed before a security patch that blocks the entire release. A junior dev’s first PR sits for two days while a senior’s routine refactor gets immediate attention.

We wanted a system that could:

Detect high-risk PRs (security, core logic, database migrations)
Flag PRs from junior devs or first-time contributors
Identify PRs that block other work
Surface PRs that have been waiting too long

All without adding cognitive load to reviewers.

The Architecture: Lightweight, Stateless, Cheap

We didn’t want another heavy microservice. The triage bot is a single Python script that runs as a GitHub Actions workflow on every pull request event, so there's no webhook server to host.
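In GitHub Actions, the runner saves the triggering event's JSON payload to the file named by the `GITHUB_EVENT_PATH` environment variable, so the script can read PR metadata directly. Here is a minimal sketch, assuming the `pr_data` shape that the scoring code in the next section expects. `load_pr_data` is a hypothetical helper. The changed-file list and the author's merged-PR count aren't in the payload, so they're left as placeholders to fill with separate REST calls.

```python
import json
import os
from datetime import datetime


def load_pr_data(event_path: str) -> dict:
    """Hypothetical helper: build pr_data from a pull_request event payload."""
    with open(event_path, encoding="utf-8") as f:
        event = json.load(f)
    pr = event["pull_request"]
    return {
        "number": pr["number"],
        "title": pr["title"],
        "body": pr.get("body") or "",  # body is null when the description is empty
        # GitHub timestamps look like "2024-05-01T09:30:00Z"; make them tz-aware
        "created_at": datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00")),
        "base_ref": pr["base"]["ref"],
        "head_ref": pr["head"]["ref"],
        "additions": pr.get("additions", 0),
        # Placeholders: fill these from the REST API (not part of the payload)
        "author": {"login": pr["user"]["login"], "total_prs": None},
        "files": [],  # e.g. from GET /repos/{owner}/{repo}/pulls/{number}/files
    }


if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    # GITHUB_EVENT_PATH is set automatically on Actions runners
    pr_data = load_pr_data(os.environ["GITHUB_EVENT_PATH"])
    print(pr_data["number"], pr_data["title"])
```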


```
PR opened/updated → pull_request event → GitHub Actions workflow
                                      ↓
                              Python triage script
                                      ↓
                              Call OpenAI API (GPT-4o-mini)
                                      ↓
                              Compute priority score (0-100)
                                      ↓
                              Post comment with priority badge
                                      ↓
                              Update PR label: priority-high, priority-medium, priority-low
```

Total cost per PR: ~$0.002 in API calls. Runs in under 8 seconds.
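The last step of the pipeline turns the 0-100 score into one of the three labels. The post doesn't give the cut-offs, so the thresholds below (70 and 40) are hypothetical; tune them against your own queue.

```python
PRIORITY_LABELS = {"priority-high", "priority-medium", "priority-low"}


def priority_label(score: float, high: float = 70, medium: float = 40) -> str:
    """Map a 0-100 score to a label. The thresholds are hypothetical defaults."""
    if score >= high:
        return "priority-high"
    if score >= medium:
        return "priority-medium"
    return "priority-low"


def stale_labels(current_labels: list, new_label: str) -> list:
    """Priority labels to remove so a re-triaged PR carries exactly one."""
    return sorted(l for l in current_labels if l in PRIORITY_LABELS and l != new_label)
```

The score can change on `synchronize` events. GitHub's add-labels endpoint appends rather than replaces, so removing stale priority labels (`DELETE /repos/{owner}/{repo}/issues/{number}/labels/{name}`) keeps exactly one priority label on each PR.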

The Triage Algorithm: Three Signals, One Score

We combine three weighted signals:

Risk score (40%) – Based on files changed. Changes to `migrations/`, `security/`, or core domain files get higher risk. We use a configurable YAML file with path patterns.

Urgency score (30%) – Does this PR have linked issues marked as “blocker”? Is it a hotfix branch? How long has it been open?

Context score (30%) – Author’s experience (number of merged PRs), PR size (lines changed), and test coverage delta.

The final score is a weighted sum, normalized to 0-100.
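Let R, U, and C be the risk, urgency, and context scores, each capped at 100 in the code below. The score works out as follows. The numbers in the second line are an illustrative example, not measurements from our rollout.

```latex
\begin{aligned}
P &= \min\bigl(100,\ \max(0,\ 0.4R + 0.3U + 0.3C)\bigr) \\
P_{\text{example}} &= 0.4(80) + 0.3(20) + 0.3(80) = 32 + 6 + 24 = 62
\end{aligned}
```

The weights sum to 1 and each component stays within 0-100, so the clamp is only a safeguard. The context score also starts from a 50-point baseline, which means no PR can score below 0.3 × 50 = 15.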

```python
# triage_engine.py (simplified)
import fnmatch
from datetime import datetime, timezone

import yaml  # loads triage-config.yml in the full script


def compute_priority(pr_data, config):
    risk = _risk_score(pr_data, config['risk_patterns'])
    urgency = _urgency_score(pr_data)
    context = _context_score(pr_data)

    # Each component is capped at 100 and the weights sum to 1.0
    total = (risk * 0.4) + (urgency * 0.3) + (context * 0.3)
    return min(100, max(0, total))


def _risk_score(pr_data, patterns):
    score = 0
    for file in pr_data['files']:
        for pattern, weight in patterns.items():
            if fnmatch.fnmatch(file['filename'], pattern):
                score += weight
    return min(100, score)


def _urgency_score(pr_data):
    score = 0
    created_at = pr_data['created_at']
    if isinstance(created_at, str):
        # GitHub returns ISO 8601 strings like "2024-05-01T09:30:00Z"
        created_at = datetime.fromisoformat(created_at.replace('Z', '+00:00'))
    # PR open more than 24 hours? +20
    hours_open = (datetime.now(timezone.utc) - created_at).total_seconds() / 3600
    if hours_open > 24:
        score += 20
    # Hotfix branch targeting main? +30
    if pr_data['base_ref'] == 'main' and 'hotfix' in pr_data['head_ref']:
        score += 30
    return min(100, score)


def _context_score(pr_data):
    score = 50  # baseline
    # Junior dev? raise the score (needs more attention = higher priority)
    if pr_data['author']['total_prs'] < 5:
        score += 30
    # Large PR? more risk
    if pr_data['additions'] > 500:
        score += 20
    return min(100, score)
```
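The urgency signal also checks for linked issues marked as blockers, which the simplified code above leaves out. One way to find linked issues is to parse GitHub's closing keywords (`Fixes #123`, `Closes #45`, and so on) from the PR body, then fetch each issue's labels. This is a minimal sketch. The `blocker` label name and the +50 bump are assumptions, not values from our setup.

```python
import re
from typing import Optional

# GitHub closing keywords: close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved
_CLOSING_RE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)", re.IGNORECASE
)


def linked_issue_numbers(pr_body: Optional[str]) -> list:
    """Issue numbers referenced with a closing keyword in the PR description."""
    return sorted({int(n) for n in _CLOSING_RE.findall(pr_body or "")})


def blocker_bonus(issue_labels: dict, label: str = "blocker", bonus: int = 50) -> int:
    """Hypothetical urgency bump if any linked issue carries the blocker label.

    issue_labels maps issue number -> list of label names, e.g. collected from
    GET /repos/{owner}/{repo}/issues/{number} for each linked issue.
    """
    has_blocker = any(label in names for names in issue_labels.values())
    return bonus if has_blocker else 0
```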

The AI Component: Lightweight Risk Classification

We use GPT-4o-mini to classify the PR description and commit messages. Why? Because some risks aren’t obvious from file paths.

A PR that changes `pricing.py` could be a trivial bug fix or a critical pricing logic change. The AI reads the description and commits, then returns a risk category.

```python
# ai_classifier.py
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_pr_risk(pr_title, pr_body, commit_messages):
    # pr_body is None when the PR has no description, so default to ''
    prompt = f"""Analyze this pull request and classify its risk level.
Return JSON with fields: risk_level (critical/high/medium/low), reason, and suggested_reviewers (list of expertise areas).

Title: {pr_title}
Description: {(pr_body or '')[:2000]}
Commits: {'; '.join(commit_messages[:5])}"""

    # JSON mode guarantees syntactically valid JSON, not the requested fields
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"}
    )
    return json.loads(response.choices[0].message.content)
```

We then merge this AI classification into the risk score. If the AI says “critical”, we bump the risk score by 30 points.
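Here is a minimal sketch of that merge step. JSON mode guarantees valid JSON but not the fields you asked for, so this version treats a missing or unexpected `risk_level` as "no bump". Capping at 100 mirrors `_risk_score`, and `apply_ai_risk` is a hypothetical name.

```python
def apply_ai_risk(risk_score: float, ai_result, bump: int = 30) -> float:
    """Add the 'critical' bump to the rule-based risk score, capped at 100."""
    level = ai_result.get("risk_level") if isinstance(ai_result, dict) else None
    # Normalize case and whitespace; any other or malformed value gets no bump
    if isinstance(level, str) and level.strip().lower() == "critical":
        return min(100, risk_score + bump)
    return min(100, risk_score)
```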

The Results: 73% Faster Reviews

We ran this for 8 weeks on two teams (one in Ho Chi Minh City, one in the US). Here’s what we saw:

| Metric | Before | After | Change |
|---|---|---|---|
| Median review cycle time | 14h 22m | 3h 51m | -73% |
| PRs waiting >24h | 34% | 8% | -76% |
| Reviewer satisfaction (1-5) | 2.8 | 4.2 | +50% |
| High-priority PRs reviewed first | 42% | 91% | +117% |

The bot didn’t replace reviewers. It just told them where to look first.

Why This Matters for AI Coding Tools

Here’s the thing. Most AI coding tools focus on *writing* code. They generate PRs faster. But that just amplifies the review bottleneck.

You need AI that helps with the *process*, not just the output.

We’ve seen teams adopt AI coding assistants like Copilot or Claude Code and then complain that their review queue doubled. That’s a systems problem. A triage bot is a cheap fix.

Actually, we’re now building a more advanced version that also suggests the best reviewer based on file expertise. But that’s a story for another post.

How You Can Build This Yourself

The entire triage bot is about 200 lines of Python + a GitHub Actions workflow. You don’t need a dedicated server.

Create a `.github/workflows/pr-triage.yml` in your repo
Add your OpenAI API key as a repository secret named `OPENAI_API_KEY`
Copy the triage engine script
Configure your risk patterns in `triage-config.yml`
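The post doesn't show `triage-config.yml`, so the `risk_patterns` below are a hypothetical example, written as the Python dict that `yaml.safe_load` would return. One detail matters when you write them: `fnmatch.fnmatch` matches the whole path, and its `*` also matches `/`. That means `migrations/*` only hits a top-level folder, while `*/migrations/*` catches nested ones.

```python
import fnmatch

# Hypothetical risk_patterns, as yaml.safe_load would return them from
# triage-config.yml. Keys are glob patterns; values are risk points.
RISK_PATTERNS = {
    "migrations/*": 40,     # top-level migrations folder only
    "*/migrations/*": 40,   # nested migrations, e.g. app/db/migrations/
    "*security/*": 50,
    "*pricing*.py": 30,
}


def matching_patterns(path: str, patterns: dict) -> list:
    """Which patterns match a file path (fnmatch's * also matches '/')."""
    return [p for p in patterns if fnmatch.fnmatch(path, p)]


if __name__ == "__main__":
    for path in ["migrations/0001_init.py", "app/db/migrations/0002.py",
                 "src/billing/pricing.py"]:
        print(path, "->", matching_patterns(path, RISK_PATTERNS))
```

Because `_risk_score` adds the weight of every matching pattern, keep overlapping patterns disjoint (as the two migration patterns are here) unless you intend to double-count.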

Here’s the workflow file:

```yaml
name: PR Triage
on:
  pull_request:
    types: [opened, synchronize, reopened]
permissions:
  contents: read        # for actions/checkout
  issues: write         # PR labels and comments go through the Issues API
  pull-requests: write
jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install pyyaml openai httpx
      - run: python triage_engine.py
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

That’s it. No infrastructure. No cron jobs. It just works.
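Because the workflow runs `triage_engine.py` directly, the full script also needs the write-back step from the pipeline: post a comment and add the label. Below is a minimal stdlib sketch using GitHub's REST endpoints for issue comments and labels (pull requests use the Issues API for both). The helper names are hypothetical.

```python
import json
import urllib.request

API = "https://api.github.com"


def _post(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build an authenticated POST request to the GitHub REST API."""
    return urllib.request.Request(
        f"{API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_triage_requests(repo: str, pr_number: int, score: float, label: str,
                          token: str) -> list:
    """Hypothetical helper: one request to comment, one to add the label."""
    body = f"Triage priority: **{label}** (score {score:.0f}/100). Advisory only."
    return [
        # A PR is an issue as far as comments and labels are concerned
        _post(f"/repos/{repo}/issues/{pr_number}/comments", {"body": body}, token),
        _post(f"/repos/{repo}/issues/{pr_number}/labels", {"labels": [label]}, token),
    ]


def send(requests: list) -> None:
    for req in requests:
        # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(req, timeout=10) as resp:
            resp.read()
```

In the workflow you would call `send(build_triage_requests(os.environ["GITHUB_REPOSITORY"], pr_number, score, label, os.environ["GITHUB_TOKEN"]))`. Actions sets `GITHUB_REPOSITORY` to `owner/repo` automatically. This posts a fresh comment on every push, so editing the previous bot comment instead keeps the thread quieter.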

One Caveat

Don’t let the bot override human judgment. We made the priority labels advisory. Reviewers can override them. The bot is a suggestion engine, not a dictator.

We also added a feedback loop: if a reviewer disagrees with the priority, they can react with a 👎 emoji. We log those reactions to a simple CSV and use it to periodically retune the risk weights and the AI prompt.
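Here is a minimal sketch of that logging step. GitHub's REST API lists reactions on an issue comment at `GET /repos/{owner}/{repo}/issues/comments/{comment_id}/reactions`, and a 👎 has `content` equal to `"-1"`. The function takes that already-fetched JSON list and appends one row per 👎. The CSV columns are assumptions.

```python
import csv
from pathlib import Path


def log_disagreements(reactions: list, pr_number: int, label: str, csv_path: str) -> int:
    """Append one CSV row per thumbs-down reaction; return how many were logged."""
    rows = [
        [pr_number, label, r["user"]["login"], r["created_at"]]
        for r in reactions
        if r.get("content") == "-1"  # GitHub encodes the thumbs-down reaction as "-1"
    ]
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["pr_number", "bot_label", "reviewer", "reacted_at"])
        writer.writerows(rows)
    return len(rows)
```

Each Actions run starts on a fresh runner, so a CSV written there disappears unless you commit it or upload it as an artifact.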

The Bigger Picture

We’re seeing more teams in Vietnam’s tech hubs—Ho Chi Minh City, Hanoi, Can Tho—adopt AI coding tools for automation. But the smart ones don’t just automate code generation. They automate the *workflow* around it.

A triage bot costs pennies per PR and saves hours of developer time. That’s the kind of ROI that makes CTOs smile.

Honestly, I’m surprised more teams don’t build this. It’s simple, effective, and it addresses a real pain point.

So here’s my challenge: go build your own. Start small. Triage just the PRs that touch critical paths. See if your cycle times don’t drop.

—

Frequently Asked Questions

1. Does the triage bot replace human code review?

No. It only prioritizes which PRs humans should review first. It doesn’t review code, approve PRs, or merge anything. Think of it as a smart queue manager.

2. Can I use a local LLM instead of OpenAI?

Yes. The AI classification step is optional. You can skip it entirely and rely only on the rule-based scoring. If you want local AI, swap the OpenAI call for Ollama or a local model. We tested with Llama 3.1 8B—it worked, just slower and slightly less accurate.
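If you go the Ollama route, its native chat endpoint (`POST /api/chat` on the default port 11434) accepts a `"format": "json"` option that plays the same role as OpenAI's JSON mode. This is a minimal stdlib sketch, assuming a local Ollama server with Llama 3.1 8B pulled as `llama3.1:8b`. You would pass it the same prompt that `classify_pr_risk` builds.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_ollama_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming chat request that asks Ollama for JSON output."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": "json",  # constrain the reply to valid JSON
        "stream": False,   # return one response object instead of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_locally(prompt: str, model: str = "llama3.1:8b") -> dict:
    """Send the request and parse the model's JSON reply."""
    with urllib.request.urlopen(build_ollama_request(prompt, model), timeout=120) as resp:
        body = json.loads(resp.read())
    return json.loads(body["message"]["content"])
```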

3. How do I handle false positives (low-priority PRs marked as high)?

Add a manual override mechanism. We use GitHub reactions (👎 to disagree). Log those cases and periodically adjust your risk pattern weights or AI prompt. Over eight weeks, our false positive rate dropped from 18% to 5%.

4. Does this work for open source repos?

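Absolutely. We run it on several open source projects. The main thing to watch is API costs if you have many external contributors: set a cap of 200 PRs per day or use a per-PR budget. Our cost runs about $0.50 per 100 PRs. One more catch: with the `pull_request` trigger, workflows for PRs opened from forks don't receive repository secrets and get a read-only `GITHUB_TOKEN`, so the AI call and the label/comment step won't work for fork PRs as-is. `pull_request_target` runs with the base repository's permissions and secrets, but never check out or execute the PR's own code under that trigger.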
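For the daily cap, one stateless option is to count today's PRs with GitHub's issue search endpoint and skip the AI step once you pass the limit, falling back to rule-based scoring (the AI step is optional). This is a minimal sketch. The query uses standard search qualifiers, and the 200 limit is the figure suggested above. At roughly $0.50 per 100 PRs, that cap keeps AI spend near $1 a day.

```python
from datetime import date
from urllib.parse import urlencode


def todays_pr_search_url(repo: str, today: date) -> str:
    """Search API URL whose total_count is the number of PRs opened on `today`."""
    query = f"repo:{repo} is:pr created:{today.isoformat()}"
    return "https://api.github.com/search/issues?" + urlencode({"q": query, "per_page": 1})


def use_ai(prs_today: int, max_per_day: int = 200) -> bool:
    """Call the LLM only while under the daily cap (the count includes this PR)."""
    return prs_today <= max_per_day
```

This counts PRs opened, not triage runs. `synchronize` events re-run the classifier on every push, so actual calls can exceed the count.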