How We Built an AI-Powered Code Review Triage Bot That Cut Review Cycle Time by 73%
You know the feeling. You push a PR, assign reviewers, and then wait. And wait. Meanwhile, a critical bug fix sits in limbo while a trivial typo fix gets reviewed in five minutes.
That’s broken.
We faced this exact problem with a client in Ho Chi Minh City, a fintech startup with 12 developers shipping 30+ PRs daily. Reviewers were drowning. Median review cycle time sat above 14 hours. Developers started working around the process.
So we built a triage bot. Not a code reviewer—a smart triage system that answers one question: Which PR should a human look at first?
Here’s the exact architecture, the code, and the metrics that convinced us to roll it out across all our teams.
The Problem: FIFO Reviewing Is Dumb
Most teams review PRs in the order they arrive. That’s the path of least resistance. But it’s terrible for throughput.
A one-line config change gets reviewed before a security patch that blocks the entire release. A junior dev’s first PR sits for two days while a senior’s routine refactor gets immediate attention.
We wanted a system that could:
- Detect high-risk PRs (security, core logic, database migrations)
- Flag PRs from junior devs or first-time contributors
- Identify PRs that block other work
- Surface PRs that have been waiting too long
All without adding cognitive load to reviewers.
The Architecture: Lightweight, Stateless, Cheap
We didn’t want another heavy microservice. The triage bot is a single Python script triggered by GitHub webhooks. It runs as a GitHub Actions workflow on each PR event.
```
PR opened/updated → GitHub webhook → GitHub Actions workflow
        ↓
Python triage script
        ↓
Call OpenAI API (GPT-4o-mini)
        ↓
Compute priority score (0-100)
        ↓
Post comment with priority badge
        ↓
Update PR label: priority-high, priority-medium, priority-low
```
Total cost per PR: ~$0.002 in API calls. Runs in under 8 seconds.
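The diagram leaves the glue between GitHub and the scorer implicit. Below is a minimal sketch of two pieces of it. It assumes the script reads the event payload that GitHub Actions writes to the file named by `GITHUB_EVENT_PATH`, then maps the final score to one of the three labels. The 70/40 thresholds and the function names are hypothetical, since the post does not say where the label boundaries sit. The changed-file list and the author's PR count are not in the payload, so they would need extra REST calls (for example, the "list pull request files" endpoint).

```python
import json
from datetime import datetime

# Hypothetical label cut-offs: the post does not say where the boundaries sit.
HIGH_THRESHOLD = 70
MEDIUM_THRESHOLD = 40


def parse_pr_event(event: dict) -> dict:
    """Pull the fields the scorer needs out of a pull_request event payload."""
    pr = event["pull_request"]
    return {
        "number": pr["number"],
        "title": pr["title"],
        "body": pr.get("body") or "",  # null when the description is empty
        # GitHub timestamps end in "Z"; normalise so fromisoformat works on Python < 3.11
        "created_at": datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00")),
        "base_ref": pr["base"]["ref"],
        "head_ref": pr["head"]["ref"],
        "additions": pr["additions"],
        "author_login": pr["user"]["login"],
    }


def load_pr_event(path: str) -> dict:
    """Read the JSON file that GitHub Actions points to with GITHUB_EVENT_PATH."""
    with open(path, encoding="utf-8") as f:
        return parse_pr_event(json.load(f))


def label_for_score(score: float) -> str:
    """Map a 0-100 priority score to one of the three priority labels."""
    if score >= HIGH_THRESHOLD:
        return "priority-high"
    if score >= MEDIUM_THRESHOLD:
        return "priority-medium"
    return "priority-low"
```

Inside the workflow, `load_pr_event(os.environ["GITHUB_EVENT_PATH"])` would be the entry point's first call.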
The Triage Algorithm: Three Signals, One Score
We combine three weighted signals:
- Risk score (40%) – Based on files changed. Changes to `migrations/`, `security/`, or core domain files get higher risk. We use a configurable YAML file with path patterns.
- Urgency score (30%) – Does this PR have linked issues marked as “blocker”? Is it a hotfix branch? How long has it been open?
- Context score (30%) – Author’s experience (number of merged PRs), PR size (lines changed), and test coverage delta.
The final score is a weighted sum, normalized to 0-100.
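Written out with R, U and C for the risk, urgency and context sub-scores, the combination in `compute_priority` is a weighted average. The weights sum to 1, and each sub-score is clamped to 0-100 (assuming non-negative pattern weights), so S already stays in range and the final clamp is only a safety net. The example values are hypothetical.

```latex
S = 0.4\,R + 0.3\,U + 0.3\,C, \qquad 0.4 + 0.3 + 0.3 = 1
% Hypothetical PR: R = 80 (risky paths), U = 20 (open > 24 h), C = 80 (50 baseline + 30 junior author)
S = 0.4(80) + 0.3(20) + 0.3(80) = 32 + 6 + 24 = 62
```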
```python
# triage_engine.py (simplified)
import fnmatch
from datetime import datetime, timezone

import yaml  # loads triage-config.yml; the loading code is omitted here


def compute_priority(pr_data, config):
    """Combine the three signals into a single 0-100 priority score."""
    risk = _risk_score(pr_data, config['risk_patterns'])
    urgency = _urgency_score(pr_data)
    context = _context_score(pr_data)
    total = (risk * 0.4) + (urgency * 0.3) + (context * 0.3)
    return min(100, max(0, total))


def _risk_score(pr_data, patterns):
    # patterns maps a glob pattern to a weight, as configured in YAML
    score = 0
    for file in pr_data['files']:
        for pattern, weight in patterns.items():
            if fnmatch.fnmatch(file['filename'], pattern):
                score += weight
    return min(100, score)


def _urgency_score(pr_data):
    score = 0
    # PR open more than 24 hours? +20
    # created_at must be a timezone-aware datetime, not GitHub's raw ISO string
    hours_open = (datetime.now(timezone.utc) - pr_data['created_at']).total_seconds() / 3600
    if hours_open > 24:
        score += 20
    # Hotfix branch targeting main? +30
    if pr_data['base_ref'] == 'main' and 'hotfix' in pr_data['head_ref']:
        score += 30
    return min(100, score)


def _context_score(pr_data):
    score = 50  # baseline
    # Junior dev? Raise the score (needs more attention = higher priority)
    if pr_data['author']['total_prs'] < 5:
        score += 30
    # Large PR? More risk
    if pr_data['additions'] > 500:
        score += 20
    return min(100, score)
```
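The simplified `_urgency_score` above leaves out the linked-blocker check from the urgency bullet. One way to sketch it assumes two things: links are written with GitHub's closing keywords in the PR body ("Fixes #123"), and blocker issues carry a label named `blocker`. The code extracts the issue numbers, then adds a bonus if any linked issue has that label. Fetching each issue's labels (for example via `GET /repos/{owner}/{repo}/issues/{issue_number}`) is not shown. The +40 weight, the label name and the helper names are hypothetical.

```python
import re

# GitHub's closing keywords ("Fixes #123", "Closes: #7") link a PR to an issue.
# Same-repo references only; "owner/repo#123" forms are ignored in this sketch.
_CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s*:?\s+#(\d+)",
    re.IGNORECASE,
)

BLOCKER_BONUS = 40  # hypothetical weight; the post does not give one
BLOCKER_LABEL = "blocker"  # assumed label name


def linked_issue_numbers(pr_body):
    """Issue numbers linked with a closing keyword, deduplicated and sorted."""
    return sorted({int(n) for n in _CLOSING_REF.findall(pr_body or "")})


def blocker_bonus(linked_issue_labels):
    """linked_issue_labels: one list of label names per linked issue (fetched separately)."""
    for labels in linked_issue_labels:
        if any(name.lower() == BLOCKER_LABEL for name in labels):
            return BLOCKER_BONUS
    return 0
```

The bonus would be added to `score` in `_urgency_score` before the final clamp.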
The AI Component: Lightweight Risk Classification
We use GPT-4o-mini to classify the PR description and commit messages. Why? Because some risks aren’t obvious from file paths.
A PR that changes `pricing.py` could be a trivial bug fix or a critical pricing logic change. The AI reads the description and commits, then returns a risk category.
```python
# ai_classifier.py
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def classify_pr_risk(pr_title, pr_body, commit_messages):
    # GitHub returns None for an empty PR description, so default to ""
    body = (pr_body or "")[:2000]
    prompt = f"""Analyze this pull request and classify its risk level.
Return JSON with fields: risk_level (critical/high/medium/low), reason, and suggested_reviewers (list of expertise areas).
Title: {pr_title}
Description: {body}
Commits: {'; '.join(commit_messages[:5])}"""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # JSON mode; the prompt must mention JSON
    )
    return json.loads(response.choices[0].message.content)
```
We then merge this AI classification into the risk score. If the AI says “critical”, we bump the risk score by 30 points.
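The post only specifies the "critical" case. Below is a minimal sketch of the merge under two assumptions: unknown or malformed `risk_level` values are ignored, and the bumped score is clamped to 100 like the other sub-scores. The first assumption matters because JSON mode guarantees valid JSON, not the fields the prompt asks for. Function names are hypothetical.

```python
VALID_LEVELS = {"critical", "high", "medium", "low"}
CRITICAL_BUMP = 30  # from the post: "critical" adds 30 points


def ai_risk_level(ai_result):
    """Return the model's risk_level, or None if it is missing or unrecognised."""
    level = ai_result.get("risk_level") if isinstance(ai_result, dict) else None
    if isinstance(level, str) and level.strip().lower() in VALID_LEVELS:
        return level.strip().lower()
    return None


def merge_ai_risk(rule_risk, ai_result):
    """Add the critical bump to the rule-based risk score, keeping it within 0-100."""
    bump = CRITICAL_BUMP if ai_risk_level(ai_result) == "critical" else 0
    return min(100, rule_risk + bump)
```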
The Results: 73% Faster Reviews
We ran this for 8 weeks on two teams (one in Ho Chi Minh City, one in the US). Here’s what we saw:
| Metric | Before | After | Change |
|---|---|---|---|
| Median review cycle time | 14h 22m | 3h 51m | -73% |
| PRs waiting >24h | 34% | 8% | -76% |
| Reviewer satisfaction (1-5) | 2.8 | 4.2 | +50% |
| High-priority PRs reviewed first | 42% | 91% | +117% |
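The Change column is a relative change, not a percentage-point difference. For example, PRs waiting over 24 hours fell by 26 points, which is 76% of the starting 34%. A quick check of the arithmetic, with the review times converted to minutes:

```python
def minutes(hm: str) -> int:
    """Parse durations written like '14h 22m' into minutes."""
    hours, mins = hm.replace("h", "").replace("m", "").split()
    return int(hours) * 60 + int(mins)


def pct_change(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


rows = {
    "Median review cycle time": (minutes("14h 22m"), minutes("3h 51m")),
    "PRs waiting >24h": (34, 8),
    "Reviewer satisfaction (1-5)": (2.8, 4.2),
    "High-priority PRs reviewed first": (42, 91),
}
for name, (before, after) in rows.items():
    # prints -73%, -76%, +50%, +117% (one per line)
    print(f"{name}: {pct_change(before, after):+.0f}%")
```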
The bot didn’t replace reviewers. It just told them where to look first.
Why This Matters for AI Coding Tools
Here’s the thing. Most AI coding tools focus on *writing* code. They generate PRs faster. But that just amplifies the review bottleneck.
You need AI that helps with the *process*, not just the output.
We’ve seen teams adopt AI coding assistants like Copilot or Claude Code and then complain that their review queue doubled. That’s a systems problem. A triage bot is a cheap fix.
Actually, we’re now building a more advanced version that also suggests the best reviewer based on file expertise. But that’s a story for another post.
How You Can Build This Yourself
The entire triage bot is about 200 lines of Python + a GitHub Actions workflow. You don’t need a dedicated server.
- Create a `.github/workflows/pr-triage.yml` in your repo
- Set up an OpenAI API key as a secret
- Copy the triage engine script
- Configure your risk patterns in `triage-config.yml`
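Here is a sketch of what `triage-config.yml` might hold. It is shown as the dict `yaml.safe_load` would return, so the example needs no third-party packages, and the patterns and weights are hypothetical. It also shows a glob gotcha to keep in mind when writing patterns: `fnmatch` matches the whole path against the pattern, and its `*` also matches `/`.

```python
import fnmatch

# Hypothetical contents of triage-config.yml after yaml.safe_load(); in YAML:
#   risk_patterns:
#     "*migrations/*": 40
#     "*security/*": 50
#     "*pricing*.py": 30
config = {
    "risk_patterns": {
        "*migrations/*": 40,
        "*security/*": 50,
        "*pricing*.py": 30,
    }
}


def matching_patterns(filename, patterns):
    """Patterns that match one changed file path, tested the way _risk_score does."""
    return [p for p in patterns if fnmatch.fnmatch(filename, p)]


# fnmatch compares the whole path, so an unanchored "migrations/*" misses nested dirs...
print(fnmatch.fnmatch("app/migrations/0042_add_fee.py", "migrations/*"))  # False
# ...while "*" also matches "/", so a leading "*" catches the directory at any depth.
print(matching_patterns("app/migrations/0042_add_fee.py", config["risk_patterns"]))  # ['*migrations/*']
```

Note that `_risk_score` adds a pattern's weight once per matching file, so a PR touching several migration files reaches the 100 cap quickly.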
Here’s the workflow file:
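```yaml
name: PR Triage

on:
  pull_request:
    types: [opened, synchronize, reopened]

# The bot comments on and labels PRs, so the token needs write access;
# listing permissions sets every unlisted scope to none.
permissions:
  contents: read
  pull-requests: write
  issues: write

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install pyyaml openai httpx
      - run: python triage_engine.py
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```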
That’s it. No infrastructure. No cron jobs. It just works.
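The post doesn't show the code that posts the comment and label. Below is a minimal sketch using GitHub's REST API, where pull requests are addressed through the issue endpoints. It removes any stale `priority-*` label, adds the new one, then comments. It uses only the standard library; the workflow installs `httpx`, which would work just as well. It assumes the `GITHUB_TOKEN` and `GITHUB_REPOSITORY` environment variables available in Actions, and the token needs write access to pull requests. Function names are hypothetical.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

API = "https://api.github.com"
PRIORITY_LABELS = ("priority-high", "priority-medium", "priority-low")


def plan_requests(repo, number, label, comment):
    """Return (method, url, payload) tuples; repo is "owner/name"."""
    base = f"{API}/repos/{repo}/issues/{number}"
    reqs = [
        ("DELETE", f"{base}/labels/{urllib.parse.quote(old)}", None)
        for old in PRIORITY_LABELS
        if old != label
    ]
    reqs.append(("POST", f"{base}/labels", {"labels": [label]}))
    reqs.append(("POST", f"{base}/comments", {"body": comment}))
    return reqs


def send(method, url, payload, token):
    """Send one request; a 404 on DELETE just means the label wasn't on the PR."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    })
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        if method == "DELETE" and err.code == 404:
            return 404
        raise


def apply_triage(number, label, comment):
    repo = os.environ["GITHUB_REPOSITORY"]
    token = os.environ["GITHUB_TOKEN"]
    for method, url, payload in plan_requests(repo, number, label, comment):
        send(method, url, payload, token)
```

Because `synchronize` fires on every push, this posts a fresh comment each time; editing the bot's previous comment instead would keep the thread tidier.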
One Caveat
Don’t let the bot override human judgment. We made the priority labels advisory. Reviewers can override them. The bot is a suggestion engine, not a dictator.
We also added a feedback loop: if a reviewer disagrees with the priority, they can react with a 👎 emoji, and that trains the model for next time. (We log those to a simple CSV for periodic retraining.)
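Below is a sketch of the 👎 logging step. It assumes the bot fetches reactions on its own comment via `GET /repos/{owner}/{repo}/issues/comments/{comment_id}/reactions`, where a 👎 has `content` equal to `"-1"`. As far as we know, GitHub emits no workflow event for new reactions. So this has to run on a later trigger, such as the next `synchronize` or a scheduled job. The CSV also has to live somewhere that outlasts the runner. Column names and the function name are hypothetical.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["logged_at", "pr_number", "bot_label", "score", "downvotes", "downvoters"]


def log_disagreement(csv_path, pr_number, bot_label, score, reactions):
    """Append one row if anyone reacted 👎 ("-1") to the bot's comment; return the 👎 count."""
    downvoters = sorted(r["user"]["login"] for r in reactions if r.get("content") == "-1")
    if not downvoters:
        return 0
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()  # header only on the first write
        writer.writerow({
            "logged_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "pr_number": pr_number,
            "bot_label": bot_label,
            "score": score,
            "downvotes": len(downvoters),
            "downvoters": ";".join(downvoters),
        })
    return len(downvoters)
```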
The Bigger Picture
We’re seeing more teams in Vietnam’s tech hubs—Ho Chi Minh City, Hanoi, Can Tho—adopt AI coding tools for automation. But the smart ones don’t just automate code generation. They automate the *workflow* around it.
A triage bot costs pennies per PR and saves hours of developer time. That’s the kind of ROI that makes CTOs smile.
Honestly, I’m surprised more teams don’t build this. It’s simple, effective, and it addresses a real pain point.
So here’s my challenge: go build your own. Start small. Triage just the PRs that touch critical paths. See if your cycle times don’t drop.
—
Frequently Asked Questions
1. Does the triage bot replace human code review?
No. It only prioritizes which PRs humans should review first. It doesn’t review code, approve PRs, or merge anything. Think of it as a smart queue manager.
2. Can I use a local LLM instead of OpenAI?
Yes. The AI classification step is optional. You can skip it entirely and rely only on the rule-based scoring. If you want local AI, swap the OpenAI call for Ollama or a local model. We tested with Llama 3.1 8B—it worked, just slower and slightly less accurate.
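One way to do the swap without touching the rest of `classify_pr_risk` relies on Ollama serving an OpenAI-compatible API, so the existing client can be pointed at it. This sketch assumes a hypothetical `TRIAGE_LLM` environment variable as the switch and the `llama3.1:8b` model tag. A hosted `ubuntu-latest` runner has no Ollama, so this implies a self-hosted runner. Check that your Ollama version honors `response_format={"type": "json_object"}`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

OLLAMA_BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible API, default port


def llm_settings(env: Mapping[str, str] | None = None):
    """Return (kwargs for openai.OpenAI, model name), switched by a hypothetical TRIAGE_LLM variable."""
    env = os.environ if env is None else env
    if env.get("TRIAGE_LLM") == "ollama":
        # Ollama ignores the key, but the OpenAI client refuses to start without one.
        return {"base_url": OLLAMA_BASE_URL, "api_key": "ollama"}, "llama3.1:8b"
    return {}, "gpt-4o-mini"  # default: OpenAI, key read from OPENAI_API_KEY
```

With that in place, `kwargs, model = llm_settings()` feeds `client = OpenAI(**kwargs)`, and `model=model` replaces the hard-coded name in `chat.completions.create`.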
3. How do I handle false positives (low-priority PRs marked as high)?
Add a manual override mechanism. We use GitHub reactions (👎 to disagree). Log those cases and periodically adjust your risk pattern weights or AI prompt. Over eight weeks, our false positive rate dropped from 18% to 5%.
4. Does this work for open source repos?
Absolutely. We run it on several open source projects. The only change is you need to be careful with API costs if you have many external contributors. Set a max of 200 PRs per day or use a per-PR budget. Our cost runs about $0.50 per 100 PRs.
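At the quoted $0.50 per 100 PRs, a 200-PR daily cap bounds the AI spend at roughly $1 per day. Forks add a second wrinkle. For `pull_request` events from a fork, GitHub withholds repository secrets and gives the workflow a read-only `GITHUB_TOKEN`. The AI call then has no key, and the comment and label steps cannot write. (The `pull_request_target` event runs with the base repository's permissions, but GitHub's security guidance warns against executing PR code under it.) Below is a minimal gate that falls back to rules-only scoring. The counter it takes is hypothetical, since the workflow itself is stateless.

```python
from __future__ import annotations

from collections.abc import Mapping

DAILY_AI_CAP = 200  # the "max of 200 PRs per day" suggested above


def should_call_ai(env: Mapping[str, str], ai_calls_today: int, daily_cap: int = DAILY_AI_CAP) -> bool:
    """True to run the OpenAI classification, False to fall back to rule-based scoring only."""
    # PRs from forks run without repository secrets, so
    # ${{ secrets.OPENAI_API_KEY }} expands to an empty string.
    if not env.get("OPENAI_API_KEY"):
        return False
    # ai_calls_today is hypothetical: the count has to be stored outside the workflow.
    return ai_calls_today < daily_cap
```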