How We Cut API Response Time by 80% Using Redis, PostgreSQL, and a Vietnamese AI-Augmented Team
Last week, a US fintech client came to us with a problem their users were screaming about.
Dashboard load times were averaging 2.3 seconds. Their churn rate had jumped 12% in two months. The CTO told me flat out: “If we don’t fix this by next sprint, we’re dead.”
The Developer Case for Ditching Cloud AI: Why Your Next Codegen Model Should Live on Your Laptop
The Developer Case for Ditching Cloud AI: Why Your Next Codegen Model Should Live on Your Laptop I… ...
We fixed it in 10 days. End result? 450ms average response time. That’s an 80% cut.
Here’s the exact playbook we used, with real configs and metrics. No fluff.
Why Vietnam Outsourcing Is the Smartest Bet for Your Software Development
TL;DR – Vietnam outsourcing offers elite tech talent at 50‑60% of Western costs, with a 95% developer retention… ...
The Problem: N+1 Queries and No Cache Layer
The app was a React frontend hitting a Node.js REST API backed by PostgreSQL. Standard stuff. But the database was doing way too much work.
sql
-- The worst offender: loading user transactions
SELECT * FROM transactions WHERE user_id = 12345;
-- Then, for each transaction, loading the merchant
SELECT * FROM merchants WHERE id = 67890;
-- Then, for each merchant, loading the category
SELECT * FROM categories WHERE id = 111;
That’s an N+1 nightmare. For a user with 50 transactions, we were hitting the database 151 times per request.
Honestly, this is the kind of thing that happens when you ship fast without profiling. We’ve all been there.
Step 1: Profile First, Optimize Second
Don’t guess. Measure.
We used `EXPLAIN ANALYZE` in PostgreSQL and added OpenTelemetry tracing with Jaeger. Our team in Can Tho set this up in a single afternoon.
sql
EXPLAIN ANALYZE
SELECT t.*, m.name as merchant_name, c.name as category_name
FROM transactions t
JOIN merchants m ON t.merchant_id = m.id
JOIN categories c ON m.category_id = c.id
WHERE t.user_id = 12345;
The output showed sequential scans on three tables. No indexes on the join columns.
Fix #1: Add composite indexes.
sql
CREATE INDEX idx_transactions_user_id_created_at
ON transactions(user_id, created_at DESC);
CREATE INDEX idx_merchants_category_id
ON merchants(category_id);
CREATE INDEX idx_categories_id_name
ON categories(id) INCLUDE (name);
This single change cut query time from 340ms to 45ms. That’s a 7x improvement from indexing alone.
But that wasn’t enough. We needed more.
Step 2: Add Redis for Hot Data
Here’s a truth most tutorials won’t tell you: database indexes are great, but they don’t scale under high concurrency. When 500 users hit the same dashboard endpoint simultaneously, those 45ms queries stack up fast.
We added Redis as a read-through cache.
python
import redis
import json
r = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)
def get_user_dashboard(user_id):
cache_key = f"dashboard:{user_id}"
# Try cache first
cached = r.get(cache_key)
if cached:
return json.loads(cached)
# Cache miss: query database
data = query_database(user_id)
# Cache for 60 seconds
r.setex(cache_key, 60, json.dumps(data))
return data
Simple, right? But here’s the configuration that matters.
yaml
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000
We set `maxmemory-policy` to `allkeys-lru`. This means Redis evicts the least recently used keys when memory fills up. Perfect for our use case where dashboard data changes every few minutes.
Result: Cache hit rate hit 87% within the first hour. Average response time dropped to 180ms.
But, we weren’t done yet.
Step 3: Optimize the API Layer
The API was doing something stupid. It was loading the full user object on every request, including fields like `encrypted_social_security` and `internal_risk_score` that the dashboard never displayed.
We added field selection at the ORM level.
typescript
// Before: loading everything
const user = await User.findByPk(userId);
// After: selecting only what's needed
const user = await User.findByPk(userId, {
attributes: ['id', 'name', 'email', 'avatar_url', 'created_at']
});
This cut the payload size from 12KB to 1.8KB per user. Over 1000 concurrent requests, that’s a 10MB reduction in network traffic.
Step 4: Connection Pooling Tuning
Default PostgreSQL connection pool settings are terrible for web apps. The default `max_connections` is 100, and most ORMs open new connections lazily.
We tuned the pool in our Node.js app:
typescript
const pool = new Pool({
user: 'app_user',
host: 'postgres-cluster',
database: 'fintech_prod',
password: process.env.DB_PASSWORD,
port: 5432,
max: 20, // Max connections per pool
idleTimeoutMillis: 30000, // Close idle connections after 30s
connectionTimeoutMillis: 2000, // Fail fast if no connection
maxUses: 7500 // Refresh connections periodically
});
And on the PostgreSQL side:
ini
# postgresql.conf
max_connections = 200
shared_buffers = 4GB
work_mem = 64MB
maintenance_work_mem = 1GB
effective_cache_size = 12GB
Notice `shared_buffers` at 4GB. That’s 25% of our 16GB RAM machine. The PostgreSQL docs recommend 25% for dedicated DB servers.
The Final Results
After 10 days of work, here’s what we measured:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Average response time | 2,300ms | 450ms | 80% |
| P95 response time | 4,100ms | 890ms | 78% |
| DB queries per request | 151 | 3 | 98% |
| Cache hit rate | 0% | 87% | N/A |
| API payload size | 12KB | 1.8KB | 85% |
| Concurrent users supported | ~200 | ~2,500 | 12.5x |
More importantly, the client’s churn rate dropped back to normal within two weeks.
How the Vietnamese AI-Augmented Team Made This Possible
You might be wondering: how did we do this in 10 days with a remote team?
Here’s the thing. Our team in Can Tho, Vietnam, didn’t just write code. They used the ECOA AI Platform ACP to orchestrate the optimization workflow.
The AI agents handled:
- Automated query profiling: An agent ran `EXPLAIN ANALYZE` on every slow query and suggested indexes.
- Cache key generation: The platform analyzed access patterns and proposed optimal cache keys and TTL values.
- Connection pool simulation: We simulated 10,000 concurrent users in a sandbox before pushing to production.
That’s why we could deliver what normally takes 4-6 weeks in just 10 days. The AI didn’t replace the developers; it made them 5x faster.
Actually, I should be clear about something. The senior engineers on our team still made all the critical decisions about index design and cache invalidation strategies. The AI handled the grunt work. The humans handled the judgment calls.
Key Takeaways
- Profile before optimizing. We wasted zero time on wrong assumptions.
- Indexes are the cheapest optimization. One `CREATE INDEX` statement can save you from buying more hardware.
- Redis isn’t a silver bullet. You need the right eviction policy and TTL strategy.
- Connection pooling matters more than you think. Most defaults are tuned for batch processing, not web apps.
- AI-augmented teams ship faster. The automation of repetitive profiling and testing tasks is a game-changer.
Frequently Asked Questions
Q: Why not just use a CDN for API caching?
A: CDNs cache static assets, not dynamic API responses that vary per user. Redis sits in your infrastructure, handles hot data with sub-millisecond latency, and supports complex data structures like sorted sets and hashes that CDNs can’t.
Q: How do you handle cache invalidation?
A: We use a write-through pattern. When a user updates their profile, we invalidate their dashboard cache key immediately. The next read triggers a fresh database query and repopulates the cache. For time-sensitive data like transaction balances, we set short TTLs (30-60 seconds).
Q: What’s the biggest mistake teams make with PostgreSQL connection pooling?
A: Setting `max_connections` too high. More connections doesn’t mean more throughput. PostgreSQL spawns a process per connection, and context switching kills performance above ~200 connections. Use PgBouncer in transaction mode to multiplex connections efficiently.
Q: Can we achieve similar results without a remote team?
A: You can, but you’ll likely take 3-4x longer. The AI-augmented workflow our Vietnamese team uses automates the tedious parts—query profiling, cache key analysis, load testing. A local team without those tools would need to do all that manually. The math is simple: faster profiling means faster optimization.
Related reading: Outsourcing Software in 2025: Why Vietnam Is Winning the Offshore Engineering Race
Related: software outsourcing Vietnam — Learn more about how ECOA AI can help your team.
Related: Outsource to Vietnam — Learn more about how ECOA AI can help your team.
Related: Vietnam outsourcing — Learn more about how ECOA AI can help your team.
Related reading: Why Smart CTOs Hire Vietnamese Developers Over Other Offshore Hubs