Build a Production-Ready Python Caching Layer with Redis: A Step-by-Step Developer Tutorial
You’ve got a slow endpoint. Hitting the database on every request. Users are feeling the lag.
Throwing more hardware at it is the expensive way out. Adding a caching layer? That’s the engineer’s move.
But here’s the problem I see everywhere: developers treat caching like an afterthought. A `cache.get()` here, a `cache.set()` there. No consistency. No TTL strategy. No invalidation plan.
That’s not a caching layer. That’s technical debt with an expiration date.
I spent two years building and maintaining the caching infrastructure for a fintech startup in Ho Chi Minh City. We processed about 50,000 requests per minute at peak. A badly configured cache took us down twice. Yes, *the cache* caused the outage. More on that later.
This tutorial walks you through building a production-grade caching layer in Python using Redis. You’ll get real code, concrete benchmarks, and the exact patterns we use at ECOA AI.
Why Redis? (And Why Not Something Else)
Redis is the default for a reason. It’s fast, it’s battle-tested, and it supports data structures that make caching genuinely useful.
Some numbers from our production setup at ECOA:
- P99 read latency: 1.2ms on a basic `m5.large` EC2 instance
- P99 write latency: 2.1ms
- Throughput: Sustained 40,000 ops/second on a single Redis node
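These figures come from ECOA’s production setup. To get comparable latency numbers for your own stack, one rough approach is to time a few thousand sequential calls from a single client and take the 99th percentile. `p99_latency_ms` below is a hypothetical helper, not part of any library; it measures client-observed round-trip latency from one connection, not sustained throughput.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable


async def p99_latency_ms(op: Callable[[], Awaitable[object]], samples: int = 1000) -> float:
    """Await `op` sequentially `samples` times and return the P99 latency in ms.

    Hypothetical benchmarking helper: `op` is any zero-argument callable that
    returns an awaitable, e.g. `lambda: client.get("user:42")`.
    """
    if samples < 2:
        raise ValueError("need at least 2 samples to compute a percentile")
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        await op()
        latencies.append((time.perf_counter() - start) * 1000)  # seconds -> ms
    # quantiles(n=100) returns the 99 cut points P1..P99, so index 98 is P99.
    # "inclusive" treats the samples as the whole population (no extrapolation).
    return statistics.quantiles(latencies, n=100, method="inclusive")[98]


if __name__ == "__main__":
    # Stand-in operation; swap in a real Redis call such as client.get(...).
    p99 = asyncio.run(p99_latency_ms(lambda: asyncio.sleep(0)))
    print(f"P99: {p99:.3f} ms")
```

Numbers from a laptop talking to a local Redis won’t match a networked EC2 deployment, so compare like with like.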
Compare that to an in-memory Python dictionary cache: reads are even faster because they never leave the process, but the cache dies the moment your process restarts. Every deployment becomes a cache flush.
Memcached? Faster for simple key-value, and it supports per-key expiry too, but it lacks Redis’s data structures and persistence options. For a caching *layer*, Redis wins.
But don’t use Redis as a primary database. That’s a different conversation.
The Core Pattern: A Generic Cache Decorator
Here’s the foundation. A reusable decorator that wraps any function with caching logic.
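```python
import functools
import hashlib
import json
from typing import Any, Awaitable, Callable, Optional

import redis.asyncio as aioredis


class RedisCache:
    def __init__(self, redis_client: aioredis.Redis, default_ttl: int = 300):
        self.client = redis_client
        self.default_ttl = default_ttl

    def _make_key(self, prefix: str, args: tuple, kwargs: dict) -> str:
        """Generate a deterministic cache key from function arguments."""
        # sort_keys=True makes the key independent of kwargs order;
        # default=str covers values json can't encode natively (dates, UUIDs).
        raw = f"{prefix}:{str(args)}:{json.dumps(kwargs, sort_keys=True, default=str)}"
        digest = hashlib.md5(raw.encode()).hexdigest()
        # Keep the prefix readable so related keys are easy to find and invalidate.
        return f"cache:{prefix}:{digest}"

    def cache(self, ttl: Optional[int] = None, prefix: Optional[str] = None):
        """Decorator that caches async function results in Redis."""

        def decorator(func: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
            # Default the prefix to the function's qualified name so two
            # decorated functions called with the same arguments never collide.
            key_prefix = prefix or f"{func.__module__}.{func.__qualname__}"
            # Use the per-function TTL when given, otherwise the default.
            expire = ttl if ttl is not None else self.default_ttl

            @functools.wraps(func)
            async def wrapper(*args, **kwargs):
                cache_key = self._make_key(key_prefix, args, kwargs)
                cached = await self.client.get(cache_key)
                if cached is not None:
                    return json.loads(cached)  # hit: deserialize and return

                result = await func(*args, **kwargs)  # miss: compute the value
                # Results must be JSON-serializable to be cached.
                await self.client.setex(cache_key, expire, json.dumps(result))
                return result

            return wrapper

        return decorator
```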
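The decorator only ever calls two client methods, `get` and `setex`. For unit tests that shouldn’t need a running Redis server, one option is a tiny in-memory stand-in implementing just those two. `FakeAsyncRedis` is a hypothetical test helper, not part of redis-py; it mimics a default client (no `decode_responses`) by returning bytes, and takes an injectable clock so expiry can be tested without sleeping.

```python
import asyncio
import time
from typing import Callable, Dict, Optional, Tuple, Union


class FakeAsyncRedis:
    """Hypothetical in-memory stand-in for the get/setex calls RedisCache makes."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # key -> (stored bytes, absolute expiry time on self._clock's scale)
        self._store: Dict[str, Tuple[bytes, float]] = {}

    async def get(self, name: str) -> Optional[bytes]:
        entry = self._store.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[name]  # lazily evict the expired key
            return None
        return value

    async def setex(self, name: str, seconds: int, value: Union[str, bytes]) -> bool:
        # Mirror Redis, which rejects non-positive expiry times.
        if seconds <= 0:
            raise ValueError("invalid expire time in 'setex' command")
        data = value.encode() if isinstance(value, str) else value
        self._store[name] = (data, self._clock() + seconds)
        return True


if __name__ == "__main__":
    async def demo() -> None:
        fake = FakeAsyncRedis()
        await fake.setex("cache:demo", 60, '{"id": 42}')
        print(await fake.get("cache:demo"))  # b'{"id": 42}'

    asyncio.run(demo())
```

In a test, `RedisCache(FakeAsyncRedis(), default_ttl=60)` should then exercise both the hit and miss paths. The `aioredis.Redis` type hint isn’t enforced at runtime, so the duck-typed fake is accepted, though a static type checker will flag it.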
Three things matter here:
- Key generation: Hash with `hashlib.md5` over sorted JSON kwargs. Without `sort_keys=True`, the same keyword arguments passed in a different order produce different keys. That’s a cache-miss death spiral waiting to happen.
- TTL per function: Not every piece of data ages the same. User profile data? Cache for 5 minutes. Product inventory? 30 seconds. More on this below.
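- Async first: The decorator wraps `async def` functions and uses `redis.asyncio`, so cache lookups don’t block the event loop. If your service is I/O-bound and you’re not using async in 2026, you’re leaving throughput on the table. This pattern works with FastAPI, aiohttp, or any asyncio-based framework.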
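The `sort_keys` point is easy to check directly. The sketch below uses a hypothetical standalone helper, `key_digest`, that reproduces the hashing step of `_make_key`, with `sort_keys` exposed as a parameter purely to show what happens without it.

```python
import hashlib
import json


def key_digest(prefix: str, args: tuple, kwargs: dict, sort_keys: bool = True) -> str:
    """Hypothetical helper: the MD5 hashing step of _make_key, for experiments."""
    raw = f"{prefix}:{str(args)}:{json.dumps(kwargs, sort_keys=sort_keys)}"
    return hashlib.md5(raw.encode()).hexdigest()


if __name__ == "__main__":
    # Same call, keyword arguments written in a different order.
    a = key_digest("get_user", (42,), {"region": "vn", "fields": "basic"})
    b = key_digest("get_user", (42,), {"fields": "basic", "region": "vn"})
    print(a == b)  # True: sorted kwargs give one key

    c = key_digest("get_user", (42,), {"region": "vn", "fields": "basic"}, sort_keys=False)
    d = key_digest("get_user", (42,), {"fields": "basic", "region": "vn"}, sort_keys=False)
    print(c == d)  # False: two keys for the same logical call
```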
TTL Strategy: The Art of Setting Expiration
The most common caching mistake? Using one TTL for everything.
Here’s the rule we follow at ECOA:
| Data Type | TTL | Rationale |
|---|---|---|
| User session data | 15 minutes | Changes rarely, high read frequency |
| Product catalog | 5 minutes | Updated via CMS, acceptable staleness |
| Real-time inventory | 30 seconds | Stale inventory causes overselling |
| Aggregated reports | 1 hour | Computed daily, cached aggressively |
| API responses from external services | 60 seconds | Rate limit protection + latency |
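One way to keep a table like this from drifting into magic numbers scattered across decorators is a single mapping in seconds. The names below are hypothetical; the values are the ones from the table. Plain `int` values are used deliberately, since they pass straight through to `setex`.

```python
from typing import Dict

# Hypothetical data-type names; TTL values (in seconds) from the table above.
TTL_SECONDS: Dict[str, int] = {
    "user_session": 15 * 60,       # 15 minutes
    "product_catalog": 5 * 60,     # 5 minutes
    "inventory": 30,               # 30 seconds
    "aggregated_report": 60 * 60,  # 1 hour
    "external_api": 60,            # 60 seconds
}


def ttl_for(data_type: str) -> int:
    """Look up the TTL for a data type, failing loudly on unknown types."""
    try:
        return TTL_SECONDS[data_type]
    except KeyError:
        raise KeyError(f"no TTL configured for data type {data_type!r}") from None
```

Used as `@cache.cache(ttl=ttl_for("inventory"), prefix="inventory")`. Raising on unknown types, rather than falling back to a default, avoids quietly drifting back to one TTL for everything.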
Pro tip: Add random jitter to your TTLs so keys cached at the same time don’t all expire at the same time.
```python
import random


def jittered_ttl(base_ttl: int, jitter_pct: float = 0.1) -> int:
    """Return base_ttl plus or minus up to jitter_pct of itself (±10% by default)."""
    jitter = int(base_ttl * jitter_pct)
    # Never return less than 1 second: Redis rejects non-positive SETEX times.
    return max(1, base_ttl + random.randint(-jitter, jitter))
```
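We use this exact function. When thousands of keys are written at the same moment with the same TTL (after a deploy, a cache warm-up, or a bulk import), they also expire at the same moment, and the resulting burst of misses slams your database all at once. A small random offset spreads those expirations out so the recomputation load arrives gradually. Jitter doesn’t help when 10,000 requests hit a *single* hot key right as it expires, though; that case needs a lock or early recomputation so only one request rebuilds the value.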
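To see the effect, the sketch below simulates a batch of keys all cached at t=0 and counts how many expire in each second, with and without jitter. `expiry_histogram` is a hypothetical helper for illustration, and `jittered_ttl` is repeated (with a 1-second floor) so the block runs on its own.

```python
import random
from collections import Counter


def jittered_ttl(base_ttl: int, jitter_pct: float = 0.1) -> int:
    """base_ttl plus or minus up to jitter_pct of itself, never below 1 second."""
    jitter = int(base_ttl * jitter_pct)
    return max(1, base_ttl + random.randint(-jitter, jitter))


def expiry_histogram(n_keys: int, base_ttl: int, jitter_pct: float) -> Counter:
    """Simulate n_keys cached at t=0; map expiry second -> number of keys."""
    return Counter(jittered_ttl(base_ttl, jitter_pct) for _ in range(n_keys))


if __name__ == "__main__":
    flat = expiry_histogram(10_000, base_ttl=300, jitter_pct=0.0)
    spread = expiry_histogram(10_000, base_ttl=300, jitter_pct=0.1)
    print(max(flat.values()))  # 10000: every key expires in the same second
    # With ±10% jitter the same keys expire across seconds 270..330 instead.
    print(min(spread), max(spread), max(spread.values()))
```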