We Didn’t Just Automate PRs — We Built a 3-Stage Triage Pipeline That Handles 500+ Repos

(GitHub and Open Source) - Stop drowning in open-source notifications. We built a 3-stage GitHub triage pipeline that automatically labels, routes, and prioritizes pull requests across 500+ repos. Here's the exact architecture we used with our Vietnamese team.

Let’s be honest. If you maintain more than a handful of open-source repos, you know the pain. PRs pile up. Issues get buried. Contributors get frustrated and leave.

We were there. Managing nearly 500 repos for a client — a large open-source foundation — with a distributed team in Ho Chi Minh City and Can Tho. The manual triage was killing us. A single senior dev was spending 4+ hours a day just sorting through incoming PRs.

So we stopped reacting. We built a 3-stage automated triage pipeline.

It works. Here’s exactly how we did it.

Stage 1: The Webhook Listener (Don’t Poll, Listen)

The first mistake most teams make? Polling the GitHub API every 5 minutes. That’s wasteful. You’re hammering the API for no reason.

Instead, we set up a lightweight FastAPI webhook receiver. Every `pull_request` event hits our endpoint instantly.

python
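# webhook_receiver.py
import hashlib
import hmac
import json
import os

import redis
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()

SECRET = os.environ["GITHUB_WEBHOOK_SECRET"]
# Assumption: the Redis location comes from REDIS_URL (client setup is not shown in the original)
redis_client = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379/0"))


def extract_pr_data(payload: dict) -> dict:
    # Hypothetical helper: keeps only the fields the later stages are assumed to need
    pr = payload["pull_request"]
    return {
        "repo": payload["repository"]["full_name"],
        "number": pr["number"],
        "title": pr["title"],
        "author": pr["user"]["login"],
    }


@app.post("/webhook")
async def handle_webhook(request: Request):
    body = await request.body()
    signature = request.headers.get("x-hub-signature-256", "")

    # Verify the HMAC-SHA256 signature GitHub computes over the raw request body
    expected = "sha256=" + hmac.new(
        SECRET.encode(), body, hashlib.sha256
    ).hexdigest()

    # Compare as bytes and in constant time to avoid leaking timing information
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        raise HTTPException(403, "Invalid signature")

    payload = json.loads(body)
    # Other event types (e.g. issues) also send action="opened", so check the event header too
    is_pr_event = request.headers.get("x-github-event") == "pull_request"
    if is_pr_event and payload.get("action") in ("opened", "synchronize"):
        pr_data = extract_pr_data(payload)
        # Push to Redis queue for processing
        redis_client.lpush("pr_queue", json.dumps(pr_data))

    return {"status": "ok"}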

Why this matters: We reduced API call volume by 94% compared to polling. The webhook fires as soon as a PR is opened or updated, so triage no longer waits for the next polling interval.
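The signature check in the receiver can be exercised locally, without GitHub or a running server. The sketch below isolates that step using only the standard library. The `sign` and `is_valid` helpers, the secret, and the payload are made up for illustration; they are not part of our production code.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Build the value GitHub sends in X-Hub-Signature-256 for a raw body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid(secret: str, body: bytes, header: str) -> bool:
    """Constant-time comparison of the received header against our own digest."""
    return hmac.compare_digest(header.encode(), sign(secret, body).encode())


# Hypothetical secret and payload, for illustration only
body = b'{"action": "opened"}'
header = sign("test-secret", body)

print(is_valid("test-secret", body, header))         # True
print(is_valid("test-secret", body + b" ", header))  # False: body was altered
print(is_valid("wrong-secret", body, header))        # False: secret mismatch
```

Note that the digest is computed over the raw bytes of the request. Re-serializing the parsed JSON before hashing would change whitespace or key order and make valid deliveries fail.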

Stage 2: The Classification Engine (Don’t Guess, Analyze)

This is where the real work happens. Our Vietnamese team built a classification engine that runs as a GitHub Action. It analyzes each PR across 3 dimensions:

  1. Code diff size — Is this a typo fix or a 5000-line refactor?
  2. File types modified — Are they touching core logic or just docs?
  3. Changed code patterns — Do they match known bug patterns?

The engine uses a lightweight Python script with `git diff` parsing. No AI model needed for the basic stuff.
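To make the first two dimensions concrete, here is a minimal sketch that parses the output of `git diff --numstat` and derives labels from it. The thresholds, label names, and list of doc suffixes are assumptions chosen for illustration, not the values our engine uses. The third dimension (bug patterns) is left out because it depends on repo-specific rules.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".rst", ".txt"}  # assumed set of "docs only" file types
SMALL_DIFF_LINES = 10                   # assumed threshold, tune per repo
LARGE_DIFF_LINES = 500                  # assumed threshold, tune per repo


def parse_numstat(text: str) -> list[tuple[int, int, str]]:
    """Parse `git diff --numstat` output into (added, deleted, path) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported with "-" instead of line counts
        rows.append((
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return rows


def classify(rows: list[tuple[int, int, str]]) -> list[str]:
    """Return hypothetical triage labels for diff size and file types."""
    total = sum(added + deleted for added, deleted, _ in rows)
    if total <= SMALL_DIFF_LINES:
        labels = ["size/small"]
    elif total >= LARGE_DIFF_LINES:
        labels = ["size/large"]
    else:
        labels = ["size/medium"]

    docs_only = bool(rows) and all(
        PurePosixPath(path).suffix.lower() in DOC_SUFFIXES for _, _, path in rows
    )
    labels.append("docs-only" if docs_only else "touches-code")
    return labels


# Sample numstat output: added<TAB>deleted<TAB>path
sample = "2\t1\tREADME.md\n0\t3\tdocs/setup.rst\n"
print(classify(parse_numstat(sample)))  # ['size/small', 'docs-only']
```

In a GitHub Action, the input would typically come from running `git diff --numstat` between the PR's base and head commits. Renamed files appear in numstat with a combined old/new path, which this sketch does not handle.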
