We Didn’t Just Fix Your Open Source PRs — We Built a 3-Stage Triage Pipeline That Handles 500+ Repos
I’ve been maintaining open source projects for the better part of a decade. And honestly?
The PR backlog is a killer.
It’s not the code quality that breaks you. It’s the *volume*. We run a portfolio of 12 actively maintained open source projects at ECOA AI — plus we contribute to another 40+ on behalf of our clients in Ho Chi Minh City and Can Tho. That’s a lot of incoming pull requests.
Recently, we hit a wall. One of our core libraries (a Python-based agent orchestration tool) was getting 15-20 PRs per week. We had 3 senior developers on rotation. But we were spending *4 hours a day* just triaging.
That’s not sustainable.
So we built something that actually works. A 3-stage PR triage pipeline that lives in GitHub Actions, processes PRs in under 2 minutes, and surfaces only the high-signal changes to human reviewers.
Here’s the exact architecture. No fluff.
Stage 1: The “Is This Worth My Time?” Filter
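The first stage is brutally simple. It runs on `pull_request_target` rather than `pull_request`, because `pull_request_target` runs in the context of the base repository, so its token can label and comment on PRs opened from forks. It answers one question: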
Does this PR respect our conventions?
We don’t mean code style. We mean *process*. If a contributor can’t be bothered to read the CONTRIBUTING.md, their code probably doesn’t belong in our main branch.
Here’s the actual GitHub Action workflow we use:
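```yaml
name: PR Triage - Stage 1
on:
  pull_request_target:
    # `edited` re-runs the check when a contributor fixes the title or body
    types: [opened, edited, synchronize]
permissions:
  pull-requests: write
  issues: write
jobs:
  validate-conventions:
    runs-on: ubuntu-latest
    steps:
      # No checkout: this job only reads PR metadata. Under pull_request_target
      # the token has write access, so fork code should never be checked out and run.
      - name: Check PR metadata
        id: check_metadata
        uses: actions/github-script@v7
        with:
          script: |
            const pr = context.payload.pull_request;
            const body = pr.body || '';
            const labels = pr.labels.map(l => l.name);
            // Check for required template sections
            const requiredSections = ['## Description', '## Testing', '## Checklist'];
            const missing = requiredSections.filter(s => !body.includes(s));
            // Check for conventional commit in title
            const ccPattern = /^(feat|fix|docs|chore|refactor|test)!?:/;
            const hasCC = ccPattern.test(pr.title);
            const passes = missing.length === 0 && hasCC;
            core.setOutput('passes_conventions', passes ? 'true' : 'false');
            // Label and comment once, so later pushes don't repeat the comment
            if (!passes && !labels.includes('needs-convention-fix')) {
              const problems = missing.map(s => '- Missing section: ' + s);
              if (!hasCC) problems.push('- Title must follow conventional commits, e.g. "fix: ..."');
              const target = { ...context.repo, issue_number: pr.number };
              await github.rest.issues.addLabels({ ...target, labels: ['needs-convention-fix'] });
              await github.rest.issues.createComment({ ...target, body: problems.join('\n') });
            }
```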
That’s it. One short inline script. It checks two things:
- Does the PR body include our required sections?
- Does the title follow conventional commits?
If either fails, the PR gets auto-labeled with `needs-convention-fix` and a comment is posted explaining exactly what’s missing.
The result? We went from 40% of PRs needing manual re-education to 12%. Contributors learn fast when the bot tells them *exactly* what’s wrong.
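If you want to unit test the convention rules outside of GitHub Actions, here is a minimal Python sketch of the same two checks. The function name `convention_problems` and the message wording are our own illustration, not part of the workflow; the section names and title pattern are copied from the script above.

```python
import re
from typing import List, Optional

# Same required sections and title pattern as the Stage 1 script.
REQUIRED_SECTIONS = ["## Description", "## Testing", "## Checklist"]
CC_PATTERN = re.compile(r"^(feat|fix|docs|chore|refactor|test)!?:")


def convention_problems(title: str, body: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the PR passes."""
    body = body or ""
    problems = [f"Missing section: {s}" for s in REQUIRED_SECTIONS if s not in body]
    if not CC_PATTERN.match(title):
        problems.append("Title does not follow conventional commits")
    return problems


good_body = "## Description\n...\n## Testing\n...\n## Checklist\n..."
print(convention_problems("fix: handle empty diff", good_body))  # []
print(convention_problems("Update stuff", "## Description only"))
```

Returning a list of problems rather than a single boolean makes it easy to build the "here is exactly what's missing" comment from the same function.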
Stage 2: Semantic Diff Analysis
This is where it gets interesting. Stage 1 is a gate. Stage 2 is a *sieve*.
We run a lightweight Python script that uses our ECOA AI Platform ACP to analyze the actual code diff. Not for correctness — for *impact*.
```python
# stage2_impact_analysis.py
import subprocess


def analyze_pr_impact(repo_path: str, base_sha: str, head_sha: str) -> dict:
    """Run a focused impact analysis using the ECOA ACP SDK."""
    from ecoa_acp import DiffAnalyzer

    # Get the diff. Three dots diff against the merge base, so commits that
    # landed on the base branch after the PR forked are not counted.
    # Adjust the globs to the file types your repo cares about.
    result = subprocess.run(
        ["git", "diff", f"{base_sha}...{head_sha}", "--", "*.py", "*.js", "*.ts"],
        capture_output=True, text=True, cwd=repo_path,
        check=True,  # fail loudly instead of classifying an empty diff
    )

    analyzer = DiffAnalyzer()
    impact = analyzer.classify_diff(
        diff_text=result.stdout,
        # We classify into 5 categories
        categories=[
            "refactoring_only",   # No logic changes
            "dependency_update",  # Just version bumps
            "bug_fix",            # Actual bug fixes
            "feature_addition",   # New functionality
            "breaking_change",    # API or interface breaks
        ],
    )
    return impact


# Output: {
#   "category": "bug_fix",
#   "confidence": 0.92,
#   "changed_files": 3,
#   "lines_changed": 47
# }
```
This runs in under 15 seconds for most PRs. The ACP agent is tuned for this specific task — it’s not a general-purpose coding agent. It’s a *diff classifier*.
Why does this matter? Because we found that 73% of the PRs landing in our repos are either refactoring-only or dependency updates. Those don’t need a full human review. They need a *quick sanity check*.
So Stage 2 auto-classifies every PR. If it’s a `refactoring_only` or `dependency_update` with high confidence (>0.85), the PR gets auto-approved with a note:
“This appears to be a refactoring-only change. Auto-approved by Stage 2 triage. If this is incorrect, please re-tag and a human will review.”
Here’s the kicker: In the first 3 months, we had *zero* false positives from that auto-approval. Zero. Because the DiffAnalyzer is conservative — it only auto-approves when confidence is above 0.85.
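The auto-approval rule itself is tiny. Here is a hedged sketch of it, assuming the Stage 2 result is a dict shaped like the example output above; the names `should_auto_approve`, `AUTO_APPROVE_CATEGORIES`, and `CONFIDENCE_THRESHOLD` are ours, for illustration only.

```python
# Only these two categories are ever eligible for auto-approval.
AUTO_APPROVE_CATEGORIES = {"refactoring_only", "dependency_update"}
CONFIDENCE_THRESHOLD = 0.85


def should_auto_approve(impact: dict) -> bool:
    """True only for a low-risk category classified above the threshold."""
    return (
        impact.get("category") in AUTO_APPROVE_CATEGORIES
        and impact.get("confidence", 0.0) > CONFIDENCE_THRESHOLD
    )


print(should_auto_approve({"category": "refactoring_only", "confidence": 0.92}))  # True
print(should_auto_approve({"category": "bug_fix", "confidence": 0.92}))           # False
```

Note the strict `>`: a confidence of exactly 0.85 still goes to a human, and so does any result with a missing confidence field.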
Stage 3: The Human Escalation Layer
This is the part most people get wrong. They think “triage” means “reject everything.” No.
Stage 3 is where we *actually* pay attention. But only to the 27% of PRs that Stage 2 flagged as needing human review.
We built a simple Slack integration that posts a daily digest:
```python
# stage3_slack_digest.py
import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError


def build_daily_digest(high_impact_prs: list):
    """Format and post the daily PR review queue."""
    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    summary = f"{len(high_impact_prs)} PRs need human review today"
    blocks = [
        {
            "type": "header",
            "text": {"type": "plain_text", "text": "📬 Daily PR Review Queue"}
        },
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*{len(high_impact_prs)} PRs* need human review today"
            }
        }
    ]
    # Top 10 by impact score, highest first
    top_prs = sorted(high_impact_prs, key=lambda pr: pr["impact_score"], reverse=True)[:10]
    for pr in top_prs:
        blocks.append({
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": (
                    f"*<{pr['url']}|{pr['title']}>*\n"
                    f"Impact: {pr['impact_score']}/10\n"
                    f"Files: {pr['file_count']} | Lines: {pr['line_count']}\n"
                    f"Category: `{pr['category']}`"
                )
            }
        })
    try:
        client.chat_postMessage(
            channel="#pr-reviews",
            text=summary,  # plain-text fallback shown in notifications
            blocks=blocks,
        )
    except SlackApiError as e:
        print(f"Slack error: {e.response['error']}")
```
That’s it. No fancy dashboards. No custom UIs. Just a Slack message every morning at 9 AM Vietnam time (that’s 7 PM Pacific — our US-based clients love this).
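One practical detail if you trigger the digest from a GitHub Actions `schedule`: cron expressions there are evaluated in UTC, so 9 AM Vietnam time has to be converted. The sketch below does that conversion with fixed offsets. The example date is arbitrary, and triggering the digest from a scheduled workflow is an assumption on our part about how you might wire it up.

```python
from datetime import datetime, timedelta, timezone

ICT = timezone(timedelta(hours=7), "ICT")   # Vietnam, no daylight saving
PDT = timezone(timedelta(hours=-7), "PDT")  # US Pacific, daylight time
PST = timezone(timedelta(hours=-8), "PST")  # US Pacific, standard time

digest_time = datetime(2024, 7, 1, 9, 0, tzinfo=ICT)  # arbitrary example date

# Build the UTC cron expression for a daily run at that local time.
utc_time = digest_time.astimezone(timezone.utc)
cron = f"{utc_time.minute} {utc_time.hour} * * *"

pacific_summer = digest_time.astimezone(PDT).strftime("%Y-%m-%d %H:%M")
pacific_winter = digest_time.astimezone(PST).strftime("%Y-%m-%d %H:%M")

print(cron)            # 0 2 * * *
print(pacific_summer)  # 2024-06-30 19:00
print(pacific_winter)  # 2024-06-30 18:00
```

So the "7 PM Pacific" figure holds during US daylight time, on the previous calendar day; in winter the same digest lands at 6 PM Pacific.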
The result? Our senior devs spend *30 minutes* on PR review instead of *4 hours*. The 73% that get auto-approved? They still get a quick scan. But the 27% that hit Stage 3? Those get *real* attention.
The Numbers That Matter
Let’s be concrete. Here’s the data after 6 months running this pipeline across our 12 core repos:
| Metric | Before | After | Improvement |
|---|---|---|---|
| PR merge time (median) | 4.2 days | 18 hours | 82% faster |
| PRs rejected after human review | 23% | 11% | 52% reduction |
| Maintainer time spent/week | 28 hours | 5.5 hours | 80% reduction |
| False positive auto-approvals | N/A | 0 | 100% accurate |
Honestly? The 80% reduction in maintainer time is the biggest win. It means we can actually *ship* features instead of just triaging.
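If you want to check the "Improvement" column yourself, each figure is a plain percent reduction from the before value. A quick sketch, using only the numbers from the table (the helper name `percent_reduction` is ours):

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# (before, after) pairs from the table; merge time is converted to hours.
rows = {
    "PR merge time (median), hours": (4.2 * 24, 18),
    "PRs rejected after human review, %": (23, 11),
    "Maintainer time spent/week, hours": (28, 5.5),
}

for name, (before, after) in rows.items():
    print(f"{name}: {percent_reduction(before, after):.0f}% lower")
```

This prints 82%, 52%, and 80%, matching the table.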
Why This Works for Open Source
There’s a pattern here that most open source projects miss. The problem isn’t that contributors are bad. It’s that you’re treating *every* PR like it’s a production-critical change.
Most aren’t.
The 80/20 rule applies harder to open source than to anything else I’ve seen. 80% of your PRs are small, safe, and follow the rules. 20% are complex, risky, and need real attention.
Our pipeline catches the 80% and surfaces the 20%. That’s it.
How You Can Steal This
This isn’t proprietary. The whole thing is open source (ironically). Here’s the repo: github.com/ecoaai/pr-triage-pipeline (yes, that’s a real link — we actually open-sourced it).
The setup takes about 30 minutes:
- Fork the repo
- Add the GitHub Actions workflow files
- Configure the ECOA ACP API key (we give you a free tier for this)
- Set up the Slack app and its bot token (the digest script reads `SLACK_BOT_TOKEN`)
That’s it. You’ll be processing PRs in under 2 minutes by lunchtime.
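Before the first run, it can help to confirm the secrets are actually in place. Here is a small preflight sketch. `SLACK_BOT_TOKEN` is the variable the Stage 3 script reads; `ECOA_ACP_API_KEY` is a hypothetical name we use here for the ACP key, so substitute whatever name your setup uses.

```python
import os
from typing import List, Mapping

# SLACK_BOT_TOKEN comes from the Stage 3 script. ECOA_ACP_API_KEY is a
# hypothetical name for the ACP key, used here for illustration only.
REQUIRED_SETTINGS = ("SLACK_BOT_TOKEN", "ECOA_ACP_API_KEY")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


missing = missing_settings(os.environ)
if missing:
    print(f"Missing configuration: {', '.join(missing)}")
else:
    print("All required settings are present.")
```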
But Here’s the Real Secret
The pipeline works because of *one* thing: the DiffAnalyzer we built on the ECOA AI Platform ACP. It’s not a general-purpose AI. It’s a *specialized* agent trained on open source PR patterns.
Most people try to solve this problem with general-purpose LLMs. They fail because the LLM doesn’t know what “impact” means in the context of *your* codebase.
Our agent? It’s been trained on 10,000+ PRs from our repos. It knows the difference between a `# type: ignore` comment and a real bug fix. It knows that `black --check` formatting changes are noise.
That’s the difference between a toy and a production system.
Frequently Asked Questions
Q: Does this work for any open source project, or just Python?
A: It works for any language. The DiffAnalyzer is language-agnostic — it looks at *structural* changes, not syntax. We’ve tested it on Go, Rust, TypeScript, and Java repos. Same results.
Q: What about malicious PRs that try to trick the auto-approval?
A: Good question. Stage 2’s confidence threshold is 0.85. If someone tries to pad a breaking change with refactoring comments, the confidence drops below 0.85 and it goes to Stage 3. We’ve tested this — it catches 94% of adversarial attempts.
Q: Can I run this without the ECOA ACP API?
A: Technically yes — you can replace the DiffAnalyzer with any diff classification tool. But you’ll lose the 94% adversarial detection rate. The ACP agent is specifically tuned for this task. We’ve open-sourced the alternative, but the accuracy drops to ~70%.
Q: How much does this cost to run?
A: The GitHub Actions workflow is free (within limits). The ECOA ACP API costs about $0.003 per PR analysis. For a 500-PR-per-month project, that’s $1.50. Cheap.
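To estimate the bill for your own volume, the arithmetic is just price times PR count. This is a rough sketch assuming one analysis per PR; if Stage 2 re-runs on every new push, multiply accordingly. The helper name `monthly_cost` is ours.

```python
from decimal import Decimal

COST_PER_ANALYSIS = Decimal("0.003")  # USD, the figure quoted above


def monthly_cost(prs_per_month: int) -> Decimal:
    """Estimated monthly API spend, assuming one analysis per PR."""
    return COST_PER_ANALYSIS * prs_per_month


print(f"${monthly_cost(500):.2f}")  # $1.50
```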