How We Built a Custom GitHub Bot to Automate Open Source Issues and PR Management (And You Can Too)

Maintaining a popular open source project is a time sink. Honestly, it’s the part nobody talks about when they show off their GitHub stars. The code is the fun part. The issues, PRs, stale branches, and duplicate reports? That’s the grind.

We manage a distributed team of Vietnamese engineers in Ho Chi Minh City and Can Tho, and we needed a way to free up their time for actual feature work — not labeling issues at 2 AM.

So we built a custom GitHub bot using Probot. It’s not magic. It’s Node.js, a few webhook handlers, and some careful state management. Here’s exactly how we did it, and why you should too.

Why Not Just Use GitHub Actions?

I get this question a lot. GitHub Actions are great for CI/CD. But for complex, interactive workflows — where you need to respond to comments, react to labels, or maintain cross-issue state — a bot gives you way more control.

Actions trigger on events, but they’re mostly stateless. A bot can track an issue’s lifecycle across multiple events. It can maintain a local cache, call external APIs, and respond in real time.

We needed a bot that could:

Auto-label issues based on keywords
Detect duplicate reports using cosine similarity
Manage stale issues with configurable timeouts
Auto-assign reviewers based on file path patterns

GitHub Actions could do parts of this, but maintaining state across multiple workflow runs is painful. Probot handles that naturally.

The Architecture: Probot + Redis + GitHub API

Our bot runs as a small Node.js service deployed on a $20/month VPS. Here’s the stack:


GitHub Webhooks → Express (Probot) → Redis (state) → GitHub API

We chose Probot because it abstracts away webhook authentication and retries. You just write event handlers.

Probot handles webhook delivery and signature verification. It ships with a built-in `app` object that’s authenticated against GitHub.
Redis stores issue metadata: timestamps, previous labels, and similarity hash for duplicate detection.
GitHub API via `@octokit/rest` to create labels, assign reviewers, post comments.

Below is the core of the bot's entry point. Nothing fancy.

javascript
module.exports = (app) => {
  app.on('issues.opened', async (context) => {
    const issue = context.payload.issue;
    const labels = autoLabel(issue.title, issue.body);
    if (labels.length) {
      await context.octokit.issues.addLabels({
        owner: context.payload.repository.owner.login,
        repo: context.payload.repository.name,
        issue_number: issue.number,
        labels,
      });
    }
  });
};

That `autoLabel` function is where the real logic lives. We scan the issue title and body for keywords like ‘bug’, ‘feature’, ‘docs’, and cross-reference with our label schema. You’ll want to define a mapping like:


const labelMap = [
  { pattern: /bug|error|crash|fail/i, label: 'bug' },
  { pattern: /feature|request|enhancement|new/i, label: 'enhancement' },
  { pattern: /doc|readme|tutorial|guide/i, label: 'documentation' },
];

It’s simple but effective. We saw a 40% reduction in manual labeling within the first week.
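For reference, here is a minimal sketch of what an `autoLabel` built on that mapping might look like. The function body is an assumption (the real one isn't shown here); it simply returns every label whose pattern matches the title or body.

```javascript
// Mapping copied from the labelMap shown above.
const labelMap = [
  { pattern: /bug|error|crash|fail/i, label: 'bug' },
  { pattern: /feature|request|enhancement|new/i, label: 'enhancement' },
  { pattern: /doc|readme|tutorial|guide/i, label: 'documentation' },
];

// Return every label whose pattern matches the title or body.
// GitHub sends `body: null` when the description is empty, so guard for it.
function autoLabel(title, body) {
  const text = `${title || ''}\n${body || ''}`;
  return labelMap
    .filter(({ pattern }) => pattern.test(text))
    .map(({ label }) => label);
}

console.log(autoLabel('App crashes on startup', null));
// → [ 'bug' ]
console.log(autoLabel('Update the README', 'Add a new install guide'));
// → [ 'enhancement', 'documentation' ]
```

The second call shows the trade-off of broad patterns: the word "new" alone is enough to tag a docs issue as an enhancement.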

Handling Duplicate Issues Without Burning API Credits

The harder part was duplicate detection. Users file the same bug report under different titles all the time. We didn’t want to query GitHub’s search on every new issue — that would hit rate limits fast.

Instead, we store a lightweight fingerprint of each issue in Redis: a hash generated from normalized title and body text using a simple TF-IDF-like approach. When a new issue comes in, the bot computes its fingerprint and checks Redis for any match above a 0.85 threshold.

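Here’s a simplified version. It only catches exact fingerprint matches; the similarity scoring is left out:

javascript
// Assumes an ioredis client and a REDIS_URL env var; Probot does not put one on `context`.
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

function fingerprint(text) {
  const words = text.toLowerCase().replace(/[^a-z0-9\s]/g, '').split(/\s+/).filter(Boolean);
  const freq = {};
  words.forEach(w => { freq[w] = (freq[w] || 0) + 1; });
  // Sort by frequency (ties alphabetically, so word order doesn't matter), take top 20
  return Object.entries(freq)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, 20).map(([k]) => k).join(',');
}

async function checkDuplicate(context, issue) {
  // issue.body is null when the description is empty
  const fp = fingerprint(`${issue.title} ${issue.body || ''}`);
  const existing = await redis.get(`fp:${fp}`);
  if (existing) {
    // context.issue() fills in owner, repo, and issue_number from the payload
    await context.octokit.issues.createComment(context.issue({
      body: `🔄 This looks similar to #${existing}. Please check before filing.`,
    }));
    return true;
  }
  await redis.set(`fp:${fp}`, issue.number, 'EX', 86400 * 30);
  return false;
}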

We store fingerprints with a 30-day TTL. That catches most dupes without growing Redis indefinitely. The number of duplicate issues we now catch automatically? 23%. That’s nearly a quarter of all new issues that we never have to manually sort.
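The simplified code above matches fingerprints exactly, which is stricter than the 0.85 threshold mentioned earlier. As a rough sketch of the scoring step, here is plain term-frequency cosine similarity (no IDF weighting). The `findSimilar` helper and the shape of `storedIssues` are hypothetical, and a linear scan like this only makes sense for a small window of recent issues.

```javascript
// Build a term-frequency map from normalized text.
function termFrequencies(text) {
  const freq = new Map();
  const words = (text || '')
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(Boolean);
  for (const w of words) freq.set(w, (freq.get(w) || 0) + 1);
  return freq;
}

// Cosine similarity between two term-frequency maps, between 0 and 1.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    normA += count * count;
    dot += count * (b.get(word) || 0);
  }
  for (const count of b.values()) normB += count * count;
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical helper: return { number, score } for the most similar stored
// issue at or above the threshold, or null if nothing is close enough.
function findSimilar(newText, storedIssues, threshold = 0.85) {
  const target = termFrequencies(newText);
  let best = null;
  for (const { number, text } of storedIssues) {
    const score = cosineSimilarity(target, termFrequencies(text));
    if (score >= threshold && (!best || score > best.score)) {
      best = { number, score };
    }
  }
  return best;
}
```

Lowering the threshold catches more rewordings at the cost of more false positives, so it is worth tuning against a sample of your own issues.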

Stale Issue Management That Doesn’t Annoy Contributors

Stale bots are a delicate dance. Tag too early and you offend new contributors. Tag too late and your issue tracker becomes a graveyard.

Our bot implements a two-stage stale workflow:

After 60 days of inactivity, it posts a comment: *“This issue has been inactive for 60 days. Please update if this is still relevant.”*
After 90 days, if no response, it automatically closes and adds a `stale` label.

But we learned not to auto-close certain types of issues. Feature requests with multiple 👍 reactions stay open. Bugs with high severity labels stick around. So our bot checks label priority before closing.

javascript
const HIGH_PRIORITY_LABELS = ['critical', 'security', 'blocker'];
const staleNotice = 'This issue has been inactive for 60 days. Please update if this is still relevant.';

app.on('schedule.repository', async (context) => {
  const { owner, repo } = context.repo();
  // Oldest-updated first; only the first 100 open items are checked per run
  const { data: staleIssues } = await context.octokit.issues.listForRepo({
    owner,
    repo,
    state: 'open',
    per_page: 100,
    sort: 'updated',
    direction: 'asc',
  });
  for (const issue of staleIssues) {
    if (issue.pull_request) continue; // this endpoint also returns pull requests
    const daysSinceUpdate = (Date.now() - new Date(issue.updated_at).getTime()) / 86400000;
    if (daysSinceUpdate > 60 && !issue.labels.some(l => HIGH_PRIORITY_LABELS.includes(l.name))) {
      const params = { owner, repo, issue_number: issue.number };
      if (daysSinceUpdate > 90) {
        await context.octokit.issues.update({ ...params, state: 'closed' });
        await context.octokit.issues.addLabels({ ...params, labels: ['stale'] });
      } else {
        // Caveat: this comment bumps updated_at, so track the warning time
        // separately (e.g. in Redis) if the 90-day close must follow the notice.
        await context.octokit.issues.createComment({ ...params, body: staleNotice });
      }
    }
  }
});

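We run this check every 6 hours via the `schedule.repository` event. One caveat: that event comes from the `probot-scheduler` extension rather than Probot core, and the extension has since been deprecated, so a plain cron job or a scheduled GitHub Actions workflow is the safer trigger for a new bot.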
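One wrinkle with any two-stage workflow: posting the 60-day notice itself updates the issue's `updated_at`, so a check based on that field alone restarts the clock. A sketch of one way around it, assuming the bot records when it warned each issue (in Redis, say) and decides from both timestamps. The function name, the return values, the `enhancement` label, and the thumbs-up cutoff are all hypothetical, and it assumes the issue payload carries a `reactions` rollup with a `'+1'` count.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const HIGH_PRIORITY_LABELS = ['critical', 'security', 'blocker'];

// Decide what to do with one issue.
// `warnedAt` is the time (ms since epoch) the bot posted its stale notice,
// or null if it has not warned yet.
// Returns 'skip', 'warn', 'reset' (clear the stored warning), or 'close'.
function staleAction(issue, warnedAt, now = Date.now(), opts = {}) {
  const { warnAfterDays = 60, closeAfterDays = 90, minThumbsUp = 2 } = opts;
  const labels = issue.labels.map((l) => l.name);

  // High-priority issues are never touched.
  if (labels.some((name) => HIGH_PRIORITY_LABELS.includes(name))) return 'skip';

  // Popular feature requests stay open.
  const thumbsUp = (issue.reactions && issue.reactions['+1']) || 0;
  if (labels.includes('enhancement') && thumbsUp >= minThumbsUp) return 'skip';

  const updatedAt = new Date(issue.updated_at).getTime();
  if (warnedAt === null) {
    return now - updatedAt > warnAfterDays * DAY_MS ? 'warn' : 'skip';
  }

  // Activity more than a minute after the warning did not come from the
  // bot's own notice, so the issue is live again.
  if (updatedAt > warnedAt + 60 * 1000) return 'reset';

  const graceMs = (closeAfterDays - warnAfterDays) * DAY_MS;
  return now - warnedAt > graceMs ? 'close' : 'skip';
}
```

Keeping the decision in a pure function like this also makes it easy to unit test without touching the GitHub API.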

What We Learned About Running a Bot in Production

Rate limits are real. Even with a GitHub App (higher limits), you can hit the secondary rate limit if you make too many concurrent requests. A bare `Promise.all` fires everything at once, so we put a throttle in front of ours: max 10 concurrent ops.
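A concurrency cap doesn't need a library. Here is a minimal sketch of one way to do it; `mapWithLimit` is a hypothetical helper name, not something Probot or Octokit provides.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}
```

In a handler this would look something like `await mapWithLimit(issues, 10, (issue) => labelIssue(context, issue))`, where `labelIssue` stands in for whatever API call you are batching. If any worker rejects, the returned promise rejects too, so wrap the worker in a try/catch if one failure shouldn't abort the batch.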

Idempotency matters. Duplicate webhook deliveries happen. Our handlers check Redis for a processed event ID before acting. If we already processed `delivery_id`, we skip.
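A sketch of that check, under a few assumptions: the delivery GUID is read from Probot's `context.id`, the store behaves like an ioredis client (`set(key, value, 'EX', seconds, 'NX')` resolving to `'OK'` or `null`), and the `MemoryStore` class is only a stand-in so the example runs without Redis. Using a single `SET ... NX` instead of a separate exists-then-set avoids a race when two copies of the same delivery arrive together.

```javascript
// In-memory stand-in for a Redis client, for illustration only.
// It implements just the `SET key value EX seconds NX` form used below.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }

  async set(key, value, _ex, ttlSeconds, nx) {
    const entry = this.data.get(key);
    const live = entry && entry.expiresAt > Date.now();
    if (nx === 'NX' && live) return null; // key already present
    this.data.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return 'OK';
  }
}

// Returns true the first time a delivery ID is seen, false on repeats.
async function claimDelivery(store, deliveryId, ttlSeconds = 86400) {
  const result = await store.set(`delivery:${deliveryId}`, '1', 'EX', ttlSeconds, 'NX');
  return result === 'OK';
}

// Wrap a handler so repeated deliveries of the same event are ignored.
function once(store, handler) {
  return async (context) => {
    if (!(await claimDelivery(store, context.id))) return undefined;
    return handler(context);
  };
}
```

Registering a handler then becomes `app.on('issues.opened', once(store, handleOpened))`, with `handleOpened` being your existing handler.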

Log everything. Probot logs to stdout, but we added structured logging with `pino` to track every action. Helps debugging when the bot silently fails.

Test with a mirror repo first. We cloned our main repo into a private org and had the bot run there for two weeks. It caught a bug where the stale checker closed issues from the wrong repo.

The Bottom Line

After 3 months, our bot handles:

73% of issue labeling automatically
23% of duplicate detection (the rest are still manual)
100% of stale issue closing (configurable)
PR auto-assignment based on CODEOWNERS – that’s another 15-minute script we built

Total maintainer time saved: roughly 8 hours per week across our 4-person team. That’s 32 hours a month we redirected to actual feature development.

Is it perfect? No. The duplicate detection still has false positives. Sometimes it mislabels a feature request as a bug. But we’d rather fix a wrong label than spend hours reading spam.

You don’t need a distributed team in Vietnam to benefit from this. But if you have one, imagine what they can do with 8 extra hours per week.

If you want to fork our bot, the core code is up on GitHub (link in comments). Drop your own config in, deploy it, and watch your issue tracker run itself.

Ever felt like you spend more time managing open source than coding? That’s the problem we solved. And you can too.

Frequently Asked Questions

Can I run a Probot bot for free?

Yes. Probot’s `probot run` works on any Node.js host. Free tiers change often (Heroku dropped its free plan in late 2022), so check what your host currently offers. For production with Redis, a $5-$10 VPS from DigitalOcean or Linode is more reliable.

Does a GitHub App bot have rate limits?

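GitHub App installations get 5,000 requests per hour as a baseline, and the limit scales up with the number of repositories and users on the installation (installations on GitHub Enterprise Cloud organizations get 15,000). More importantly, use conditional requests (ETags) to avoid unnecessary API calls. Our bot stays well under the limit even with 100+ repos.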
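A conditional request boils down to sending the last ETag back in an `If-None-Match` header and reusing your cached copy when GitHub answers 304. A minimal sketch, assuming an in-memory cache and a hypothetical `fetchIfChanged` helper; it relies on Octokit surfacing a 304 as a thrown error with `status === 304`.

```javascript
// Cache of ETag + data per request key. A Map is used here for brevity.
const etagCache = new Map();

// Fetch a resource, reusing the cached copy when the server says 304.
// `request` is any function that takes a headers object and resolves to
// { headers, data }, or throws an error with `status === 304` when unchanged.
async function fetchIfChanged(key, request) {
  const cached = etagCache.get(key);
  const headers = cached ? { 'if-none-match': cached.etag } : {};
  try {
    const response = await request(headers);
    if (response.headers.etag) {
      etagCache.set(key, { etag: response.headers.etag, data: response.data });
    }
    return response.data;
  } catch (error) {
    if (error.status === 304 && cached) return cached.data;
    throw error;
  }
}
```

With Octokit, the `request` argument would be something like `(headers) => context.octokit.issues.listForRepo({ owner, repo, headers })`.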

How do I handle webhook retries?

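Probot uses `@octokit/webhooks`, which verifies signatures automatically, and its bundled Octokit client retries failed API requests. Webhook redelivery is a separate matter: GitHub doesn’t automatically retry failed deliveries, but they can be redelivered manually or through the API, so duplicates still show up. We added idempotency keys via Redis to ensure duplicate deliveries don’t cause double actions. It’s a simple `if (await redis.exists(deliveryId)) return`.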

Can I customize the bot for multiple repos?

Absolutely. Our bot uses a YAML config per repo stored in `.github/auto-bot.yml`. The bot reads that config on startup and caches it. Each repo can have its own label mapping, stale times, and auto-assignment rules.
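Probot's `context.config()` helper reads YAML files from a repo's `.github/` directory, so a per-repo loader can be quite short. A sketch, assuming hypothetical default values and a simple in-process cache; the real config schema is whatever your bot expects.

```javascript
// Hypothetical defaults, used when a repo has no config file or omits a key.
const DEFAULT_CONFIG = {
  staleDays: 60,
  closeDays: 90,
  labels: [],
};

const configCache = new Map();

// Load `.github/auto-bot.yml` for the repository that triggered the event,
// falling back to DEFAULT_CONFIG, and cache the result per repository.
async function loadRepoConfig(context) {
  const { owner, repo } = context.repo();
  const key = `${owner}/${repo}`;
  if (!configCache.has(key)) {
    const config = await context.config('auto-bot.yml', DEFAULT_CONFIG);
    configCache.set(key, config || DEFAULT_CONFIG);
  }
  return configCache.get(key);
}
```

A cache like this never notices edits to the file, so you would probably want to clear the entry on `push` events that touch `.github/auto-bot.yml`.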