We Didn’t Just Automate PRs — We Built a 3-Stage Triage Pipeline That Handles 500+ Repos

Stop drowning in open-source notifications. We built a 3-stage GitHub triage pipeline that automatically labels, routes, and prioritizes pull requests across 500+ repos. Here's the exact architecture we used with our Vietnamese team.

Let’s be honest. If you maintain more than a handful of open-source repos, you know the pain. PRs pile up. Issues get buried. Contributors get frustrated and leave.

We were there, managing nearly 500 repos for a client (a large open-source foundation) with a distributed team in Ho Chi Minh City and Can Tho. Manual triage was killing us: a single senior dev was spending 4+ hours a day just sorting through incoming PRs.

So we stopped reacting. We built a 3-stage automated triage pipeline.

It works. Here’s exactly how we did it.

Stage 1: The Webhook Listener (Don’t Poll, Listen)

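The first mistake most teams make? Polling the GitHub API every 5 minutes. That’s wasteful. At one request per repo per poll, 500 repos works out to 144,000 calls a day (288 polls × 500 repos), and most of them typically return nothing new.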

Instead, we set up a lightweight FastAPI webhook receiver. Every `pull_request` event hits our endpoint instantly.

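```python
# webhook_receiver.py
import hashlib
import hmac
import json
import os

import redis
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]
# Assumes a Redis instance on the default localhost:6379
redis_client = redis.Redis()


def extract_pr_data(payload: dict) -> dict:
    # Hypothetical helper (not shown in the original post):
    # keep only the fields the later stages are likely to need.
    pr = payload["pull_request"]
    return {
        "repo": payload["repository"]["full_name"],
        "number": pr["number"],
        "title": pr["title"],
        "author": pr["user"]["login"],
    }


@app.post("/webhook")
async def handle_webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("x-hub-signature-256", "")

    # Verify the signature against the raw request body
    expected = "sha256=" + hmac.new(
        SECRET.encode(), body, hashlib.sha256
    ).hexdigest()

    if not hmac.compare_digest(signature, expected):
        raise HTTPException(403, "Invalid signature")

    # Ignore other event types (e.g. issues) that also carry an "action"
    if request.headers.get("x-github-event") != "pull_request":
        return {"status": "ignored"}

    payload = json.loads(body)
    if payload.get("action") in ["opened", "synchronize"]:
        pr_data = extract_pr_data(payload)
        # Push to Redis queue for processing
        redis_client.lpush("pr_queue", json.dumps(pr_data))

    return {"status": "ok"}
```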
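
To exercise the receiver locally, it helps to be able to sign a test payload the same way GitHub does. The snippet below is a minimal sketch of that, using only the standard library. The helper names `sign_payload` and `is_valid_signature`, the `test-secret` value, and the sample payload are illustrative assumptions, not part of our production code.

```python
import hashlib
import hmac
import json


def sign_payload(secret: str, body: bytes) -> str:
    """Build an X-Hub-Signature-256 style value: HMAC-SHA256 of the raw body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(secret: str, body: bytes, header: str) -> bool:
    """Constant-time comparison, mirroring the check in the receiver."""
    return hmac.compare_digest(sign_payload(secret, body), header)


# Hypothetical secret and payload, for local testing only
body = json.dumps({"action": "opened", "number": 1}).encode()
header = sign_payload("test-secret", body)
```

Sending `body` with `header` as the `X-Hub-Signature-256` value should pass verification if the receiver's secret is also `test-secret`. Note that the signature covers the raw bytes, so re-serializing the JSON before sending would change it.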

Why this matters: We reduced API call volume by 94% compared to polling. The webhook fires the moment a PR is opened — no delays.
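
The receiver only enqueues work, so something has to drain `pr_queue` on the other side. Here is a rough sketch of what a consumer could look like. The function name `drain_queue` and the `handle` callback are hypothetical. The only thing it assumes about `client` is a redis-py style `brpop` method, so a `redis.Redis()` instance should fit.

```python
import json
from typing import Callable


def drain_queue(client, handle: Callable[[dict], None],
                queue: str = "pr_queue", timeout: int = 1) -> int:
    """Pop jobs until the queue stays empty for `timeout` seconds.

    `client` is anything exposing a redis-py style `brpop(key, timeout=...)`.
    Returns the number of jobs handled.
    """
    handled = 0
    while True:
        item = client.brpop(queue, timeout=timeout)
        if item is None:  # timed out: nothing left to process
            return handled
        _key, raw = item
        handle(json.loads(raw))
        handled += 1
```

Because the receiver uses `lpush` and this sketch pops from the other end with `brpop`, jobs come out in the order they arrived.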

Stage 2: The Classification Engine (Don’t Guess, Analyze)

This is where the real work happens. Our Vietnamese team built a classification engine that runs as a GitHub Action. It analyzes each PR across 3 dimensions:

  1. Code diff size — Is this a typo fix or a 5000-line refactor?
  2. File types modified — Are they touching core logic or just docs?
  3. Changed code patterns — Do they match known bug patterns?

The engine uses a lightweight Python script with `git diff` parsing. No AI model needed for the basic stuff.
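
To make the three dimensions concrete, here is a simplified sketch rather than our actual engine. It takes the text output of `git diff --numstat` plus the raw patch and returns a size bucket, a docs-only flag, and any matched patterns. The thresholds, the suffix list, and the two regexes are placeholder assumptions you would tune per project.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical suffixes and patterns; adjust for your own repos.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
RISKY_PATTERNS = {
    "bare-except": re.compile(r"^\+\s*except\s*:"),
    "eval-call": re.compile(r"^\+.*\beval\("),
}


@dataclass
class Classification:
    size: str        # "small", "medium", or "large"
    docs_only: bool  # True when every changed file is documentation
    flags: list      # names of the risky patterns found in added lines


def parse_numstat(numstat: str) -> list:
    """Parse `git diff --numstat` output into (added, deleted, path) tuples."""
    rows = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as "-\t-\tpath"; count them as 0 lines.
        rows.append((0 if added == "-" else int(added),
                     0 if deleted == "-" else int(deleted), path))
    return rows


def classify(numstat: str, patch: str) -> Classification:
    rows = parse_numstat(numstat)
    changed = sum(a + d for a, d, _ in rows)
    # Assumed cut-offs: under 50 lines is small, under 500 is medium.
    size = "small" if changed < 50 else "medium" if changed < 500 else "large"
    docs_only = bool(rows) and all(
        PurePosixPath(p).suffix.lower() in DOC_SUFFIXES for _, _, p in rows
    )
    # Scan only added lines ("+" prefix), skipping "+++ b/file" headers.
    added_lines = [ln for ln in patch.splitlines()
                   if ln.startswith("+") and not ln.startswith("+++")]
    flags = sorted(name for name, rx in RISKY_PATTERNS.items()
                   if any(rx.search(ln) for ln in added_lines))
    return Classification(size, docs_only, flags)
```

A one-file README tweak would come back as small and docs-only, while a 600-line change to core code that adds a bare `except:` would come back as large with a flag attached.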
