Your Database Is Begging for Mercy: How a Simple Python Batching Pattern Saved Our API From N+1 Hell

Look, I’m not here to sell you on yet another ORM or some magic third-party library that promises to fix all your performance problems.

I’m here to tell you about the time our API almost collapsed under the weight of something embarrassingly simple: the N+1 query problem.

You’ve seen it. Every developer has. You fetch a list of 50 customers, then loop through them to fetch their orders. That’s 1 query + 50 queries. Then you need order items. That’s another 50. Suddenly, one innocent endpoint is hammering your database with 101 queries.

And nobody notices until p99 latency hits 3.8 seconds.

We were that team. Our client, a mid-market e-commerce platform based in New York, was seeing slowdowns every time a product listing page loaded. The database CPU was peaking at 87% during business hours. Not good.

The fix? A 60-line Python batching layer that coalesced all those requests into two bulk queries.

Here’s the exact architecture we built — and how you can replicate it today.

The Problem: Serial Queries Are a Death Wish

Most developers reach for something like this:

python
# The naive approach — don't do this
async def get_orders_for_customers(customer_ids: list[int]) -> dict:
    results = {}
    for cid in customer_ids:
        # ONE query per customer
        orders = await db.fetch("SELECT * FROM orders WHERE customer_id = $1", cid)
        results[cid] = orders
    return results

This pattern is everywhere. It’s readable. It’s testable. And it will murder your database at scale.

We benchmarked this pattern on our staging environment with up to 200 customers. The results were ugly:

Customers	Queries Executed	Total Time (ms)
10	11	45
50	51	320
200	201	1,840

Notice the climb? It’s steeper than linear: roughly 18x the queries cost about 41x the time. That’s not a bug. That’s overhead stacking up. Each query carries TCP overhead, query parsing, planning, and execution. Do it 200 times in a row and your database starts sweating.
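If you want to sanity-check the table yourself, divide total time by queries executed. This is a quick sketch using only the three rows above (the function name is made up for illustration). It shows the per-query cost itself creeping up as the loop gets longer, which is why total time grows faster than the query count.

```python
# Rows from the benchmark table: (customers, queries_executed, total_ms)
benchmark = [(10, 11, 45), (50, 51, 320), (200, 201, 1840)]


def per_query_ms(rows):
    """Return (customers, average milliseconds per query) for each benchmark row."""
    return [
        (customers, round(total_ms / queries, 2))
        for customers, queries, total_ms in rows
    ]


for customers, cost in per_query_ms(benchmark):
    print(f"{customers:>4} customers: {cost} ms per query")
```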

The worst part? Our Vietnamese team in Ho Chi Minh City spotted this antipattern in our codebase within the first week. The senior on the project, a guy named Minh, literally said: *“Why are we poking the database with a toothpick when we could use a shovel?”* He wasn’t wrong.

The Fix: Batching with Request Coalescing

The solution is brutally simple.

Instead of firing N queries sequentially, you collect all the IDs, fire one bulk query, then map the results back. But there’s a catch: real-time APIs need to handle concurrent requests. You can’t just batch things synchronously.
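Before the concurrency part, here’s roughly what the "one bulk query, then map the results back" step can look like. Treat it as a sketch under assumptions: `FakeDB` is a hypothetical in-memory stand-in for an asyncpg-style connection, and the `= ANY($1)` form assumes PostgreSQL. Our real batch function isn’t shown in this post.

```python
import asyncio
from collections import defaultdict


class FakeDB:
    """Hypothetical stand-in for an asyncpg-style connection; counts queries."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    async def fetch(self, sql, customer_ids):
        # Ignores the SQL text and filters in memory, purely for illustration
        self.query_count += 1
        wanted = set(customer_ids)
        return [row for row in self.rows if row["customer_id"] in wanted]


async def fetch_orders_by_customer_ids(db, customer_ids):
    """Fetch orders for many customers in ONE query, grouped by customer_id."""
    rows = await db.fetch(
        "SELECT * FROM orders WHERE customer_id = ANY($1)", customer_ids
    )
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer_id"]].append(row)
    # Every requested ID gets an entry, even when it has no orders
    return {cid: grouped.get(cid, []) for cid in customer_ids}
```

Returning an entry for every requested ID matters: the caller can then tell "no orders" apart from "never asked".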

Here’s the pattern we settled on. We call it the Lazy Batcher.

python
import asyncio
from typing import Awaitable, Callable


class LazyBatcher:
    """
    Coalesces multiple individual fetch requests into one bulk query.

    Usage (loads must be in flight at the same time to share a batch):
        batcher = LazyBatcher(fetch_orders_by_customer_ids)
        orders_a, orders_b = await asyncio.gather(batcher.load(42), batcher.load(99))
        # Internally, this fires ONE query: SELECT * FROM orders WHERE customer_id IN (42, 99)
    """

    def __init__(self, batch_fn: Callable[[list], Awaitable[dict]], max_batch_size: int = 500):
        self._batch_fn = batch_fn
        self._max_batch_size = max_batch_size
        self._queue: dict[int, asyncio.Future] = {}
        self._loop_task: asyncio.Task | None = None
        self._lock = asyncio.Lock()

    async def load(self, key: int) -> dict | None:
        async with self._lock:
            # Reuse the pending future for a duplicate key so no caller is orphaned
            future = self._queue.get(key)
            if future is None:
                future = asyncio.get_running_loop().create_future()
                self._queue[key] = future
            if self._loop_task is None:
                self._loop_task = asyncio.create_task(self._drain())
        return await future

    async def _drain(self):
        await asyncio.sleep(0.01)  # 10ms window to collect batch
        async with self._lock:
            keys = list(self._queue.keys())
            futures = list(self._queue.values())
            self._queue.clear()
            self._loop_task = None

        # Handle overflow in chunks
        for i in range(0, len(keys), self._max_batch_size):
            chunk = keys[i:i + self._max_batch_size]
            chunk_futures = futures[i:i + self._max_batch_size]
            try:
                results = await self._batch_fn(chunk)
            except Exception as exc:
                # Fail every waiter in this chunk instead of leaving them hanging
                for future in chunk_futures:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for key, future in zip(chunk, chunk_futures):
                if not future.done():  # skip callers that were cancelled
                    future.set_result(results.get(key))

Wait — that `asyncio.sleep(0.01)` looks like a hack, right?

Actually, it’s intentional. It creates a 10-millisecond window where concurrent `load()` calls can accumulate. Think of it like a bus that waits at the stop for 10ms before departing. Anyone who hops on within that window rides together. This is the classic DataLoader pattern, popularized by GraphQL ecosystems, but implemented here in pure Python async with zero external dependencies.

The Results: 95% Reduction in Database Queries

We deployed this to production on a Friday afternoon. Scary, I know. But the results were immediate:

Before:

4,200 queries/second during peak
p99 latency: 3,800ms
Database CPU: 87%

After:

180 queries/second during peak
p99 latency: 180ms
Database CPU: 12%

That’s a 95% reduction in database queries and a 21x latency improvement. All from 60 lines of code.

But here’s the thing that shocked me: we applied this pattern to only four endpoints. The most trafficked ones, obviously. But the ripple effect on the entire database was massive. Less contention on row locks. Better cache hit ratios. Even *unrelated* queries got faster because the DB wasn’t drowning in connection overhead.

Why Most Developers Skip This Pattern

Honestly? It feels wrong.

Holding a request for 10ms before processing it goes against every instinct a developer has. We’re trained to optimize for latency *within* a single request. But what we miss is that sometimes, a tiny, controlled delay at one layer unlocks massive efficiency gains downstream.

It’s the same amortization principle behind Nagle’s algorithm in TCP, HTTP/2 multiplexing, and database connection pooling. But developers rarely apply it at the application logic layer.

Another reason: it’s slightly harder to test. Instead of mocking a single query, you need to verify that multiple `load()` calls get coalesced correctly. We wrote exactly three unit tests for this:

Two concurrent loads with the same batch function → should fire one query
Staggered loads with different batch functions → should fire two queries
Overflow beyond `max_batch_size` → should chunk correctly

All three pass. That’s it.
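For the curious, here’s a sketch of what tests like these can look like. It is not our actual test suite. `MiniBatcher` is a hypothetical, trimmed-down batcher included only so the snippet runs on its own, and the second test reads "staggered" as sequential awaits on one batcher, which is an assumption.

```python
import asyncio


class MiniBatcher:
    """Hypothetical, trimmed-down batcher used only to make the tests runnable."""

    def __init__(self, batch_fn, max_batch_size=500, window=0.01):
        self._batch_fn = batch_fn
        self._max = max_batch_size
        self._window = window
        self._pending = {}
        self._task = None

    async def load(self, key):
        future = self._pending.get(key)
        if future is None:
            future = asyncio.get_running_loop().create_future()
            self._pending[key] = future
        if self._task is None:
            self._task = asyncio.create_task(self._drain())
        return await future

    async def _drain(self):
        await asyncio.sleep(self._window)
        pending, self._pending, self._task = self._pending, {}, None
        keys = list(pending)
        for i in range(0, len(keys), self._max):
            chunk = keys[i:i + self._max]
            results = await self._batch_fn(chunk)
            for key in chunk:
                pending[key].set_result(results.get(key))


def make_recorder():
    """Return (calls, batch_fn); batch_fn logs each key list it receives."""
    calls = []

    async def batch_fn(keys):
        calls.append(list(keys))
        return {k: k * 10 for k in keys}

    return calls, batch_fn


async def test_concurrent_loads_coalesce():
    calls, fn = make_recorder()
    batcher = MiniBatcher(fn)
    results = await asyncio.gather(batcher.load(1), batcher.load(2))
    assert results == [10, 20]
    assert calls == [[1, 2]]  # one bulk call for both keys


async def test_staggered_loads_fire_separately():
    calls, fn = make_recorder()
    batcher = MiniBatcher(fn)
    await batcher.load(1)  # first window opens and drains
    await batcher.load(2)  # second window
    assert calls == [[1], [2]]


async def test_overflow_is_chunked():
    calls, fn = make_recorder()
    batcher = MiniBatcher(fn, max_batch_size=2)
    results = await asyncio.gather(*(batcher.load(k) for k in range(5)))
    assert results == [0, 10, 20, 30, 40]
    assert [len(c) for c in calls] == [2, 2, 1]
```

Each test can be driven with `asyncio.run(...)` or dropped into whatever async test runner you already use.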

The Real Takeaway

Your database isn’t the problem. Slow queries aren’t always the problem.

Sometimes, the problem is that you’re asking your database to do too much *at the same time* in too many tiny conversations. The N+1 pattern is a protocol problem, not a query optimization problem.

Our team in Can Tho actually extended this pattern further. They built a generic `BatchExecutor` that can handle any remote resource — not just databases. We’ve since used it for Redis MGET operations, HTTP API calls, and even S3 metadata lookups. The same 60-line skeleton applies everywhere.

So before you throw more hardware at your slow API, or sprinkle `SELECT IN` clauses manually across your codebase, try this pattern. Steal the code above. Tweak the window size. Measure the impact.

Your database will thank you.

—

Frequently Asked Questions

Q: How do I choose the right batch window size? Is 10ms always correct?

It depends on your traffic pattern. For high-throughput endpoints receiving hundreds of concurrent requests per second, 10ms is solid. We’ve seen teams push it to 50ms for batch-heavy workloads. If you’re unsure, start at 5ms, measure your coalescing rate (the percentage of loads that share a batch with at least one other load), and widen the window until you see diminishing returns. Never go above 100ms, because users *will* notice.
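One way to track that coalescing rate is a tiny counter you update each time a batch drains. This is a sketch with hypothetical names (`CoalescingStats`, `record_batch`). It counts a load as coalesced when its batch held more than one key, which is one reasonable definition among several.

```python
from dataclasses import dataclass


@dataclass
class CoalescingStats:
    """Tracks how many loads shared a batch with at least one other load."""

    total_loads: int = 0
    coalesced_loads: int = 0
    batches: int = 0

    def record_batch(self, size: int) -> None:
        """Call once per drained batch with the number of keys it held."""
        self.batches += 1
        self.total_loads += size
        if size > 1:
            self.coalesced_loads += size

    @property
    def coalescing_rate(self) -> float:
        """Fraction of loads that rode in a multi-key batch."""
        if self.total_loads == 0:
            return 0.0
        return self.coalesced_loads / self.total_loads

    @property
    def average_batch_size(self) -> float:
        if self.batches == 0:
            return 0.0
        return self.total_loads / self.batches
```

If the rate stays near zero at your current window, the endpoint probably doesn’t have enough concurrent traffic for batching to pay off.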

Q: Does this pattern work with synchronous Python (no asyncio)?

Yes, but it’s uglier. You’ll need to use `threading.Lock` and `time.sleep()` instead of async primitives. The core logic — accumulate keys in a dict, sleep briefly, then fire one bulk query — is identical. We wouldn’t recommend it for CPU-bound workloads, but for I/O-bound applications it works fine.
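Here’s a rough idea of the threaded version. It is a minimal sketch, not something we run in production. `ThreadedBatcher` and `window_seconds` are names invented for this example, and each `load()` blocks its calling thread until the batch resolves.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class ThreadedBatcher:
    """Hypothetical synchronous batcher: callers block while one thread drains."""

    def __init__(self, batch_fn, window_seconds=0.01):
        self._batch_fn = batch_fn
        self._window = window_seconds
        self._lock = threading.Lock()
        self._pending = {}  # key -> Future shared by every caller of that key
        self._draining = False

    def load(self, key):
        with self._lock:
            future = self._pending.get(key)
            if future is None:
                future = Future()
                self._pending[key] = future
            if not self._draining:
                self._draining = True
                threading.Thread(target=self._drain, daemon=True).start()
        return future.result()  # blocks until the batch resolves

    def _drain(self):
        time.sleep(self._window)  # collection window
        with self._lock:
            pending, self._pending = self._pending, {}
            self._draining = False
        try:
            results = self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, future in pending.items():
            future.set_result(results.get(key))


def demo():
    """Run four loads from four threads; return (results, recorded batch calls)."""
    calls = []

    def batch_fn(keys):
        calls.append(list(keys))
        return {k: k * 2 for k in keys}

    # A wide window keeps this demo from depending on thread start-up timing
    batcher = ThreadedBatcher(batch_fn, window_seconds=0.25)
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(batcher.load, [1, 2, 3, 4]))
    return results, calls
```

The trade-off is that every waiting caller ties up a thread for the length of the window plus the query, so size your thread pool accordingly.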

Q: What happens if one key in the batch fails? Do all requests fail?

Great question. In our implementation, the entire batch fails if any key causes a database error. That’s intentional — in our system, partial failures are worse than full failures because they introduce silent data loss. If you need partial success, wrap each key’s result in a `Result` monad (or a simple try/except) and resolve each future individually. We’ve done that for batch external API calls where one failed upstream service shouldn’t tank the whole batch.
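The partial-success variant can be sketched like this. It assumes per-key calls to an upstream service rather than one bulk SQL query. `fetch_each`, `resolve_individually`, and the `fetch_one` callable are hypothetical names for illustration.

```python
import asyncio


async def fetch_each(keys, fetch_one):
    """Run one fetch per key concurrently; capture failures instead of raising."""
    outcomes = await asyncio.gather(
        *(fetch_one(key) for key in keys), return_exceptions=True
    )
    return dict(zip(keys, outcomes))


def resolve_individually(futures, outcomes):
    """Resolve each waiter with its own value or its own exception."""
    for key, future in futures.items():
        if future.done():  # caller was cancelled or already resolved
            continue
        outcome = outcomes.get(key)
        if isinstance(outcome, Exception):
            future.set_exception(outcome)
        else:
            future.set_result(outcome)


async def demo():
    async def fetch_one(key):  # hypothetical upstream call
        if key == 2:
            raise RuntimeError("upstream unavailable")
        return {"id": key}

    loop = asyncio.get_running_loop()
    futures = {key: loop.create_future() for key in (1, 2, 3)}
    resolve_individually(futures, await fetch_each(list(futures), fetch_one))
    # Collect every waiter's outcome without letting one failure mask the rest
    return await asyncio.gather(*futures.values(), return_exceptions=True)
```

Only the caller waiting on key 2 sees the error. The other two get their data as usual.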

Q: Can I use this with Django ORM or SQLAlchemy?

You can, but you’ll fight the ORM. SQLAlchemy 2.0’s `selectinload` already solves the N+1 problem for relationships. For arbitrary bulk lookups, you’re better off writing raw SQL in your batch function. Django’s `prefetch_related` covers most cases, but for custom aggregation queries, drop to `connection.cursor()` and map results manually. Raw SQL isn’t scary — it’s liberating.