How We Cut API Response Time by 80% Using Redis, PostgreSQL, and a Vietnamese AI-Augmented Team: A Step-by-Step Performance Optimization Tutorial

1 comment
(Developer Tutorials) - Slow APIs kill user retention. Here's exactly how we slashed response times from 2.3 seconds to 450ms for a US fintech startup using Redis caching, PostgreSQL query tuning, and a remote AI-augmented team in Can Tho, Vietnam. Full configs included.

How We Cut API Response Time by 80% Using Redis, PostgreSQL, and a Vietnamese AI-Augmented Team

Last week, a US fintech client came to us with a problem their users were screaming about.

Dashboard load times were averaging 2.3 seconds. Their churn rate had jumped 12% in two months. The CTO told me flat out: “If we don’t fix this by next sprint, we’re dead.”

The Developer Case for Ditching Cloud AI: Why Your Next Codegen Model Should Live on Your Laptop

The Developer Case for Ditching Cloud AI: Why Your Next Codegen Model Should Live on Your Laptop

The Developer Case for Ditching Cloud AI: Why Your Next Codegen Model Should Live on Your Laptop I… ...

We fixed it in 10 days. End result? 450ms average response time. That’s an 80% cut.

Here’s the exact playbook we used, with real configs and metrics. No fluff.

Why Vietnam Outsourcing Is the Smartest Bet for Your Software Development

Why Vietnam Outsourcing Is the Smartest Bet for Your Software Development

TL;DR – Vietnam outsourcing offers elite tech talent at 50‑60% of Western costs, with a 95% developer retention… ...

The Problem: N+1 Queries and No Cache Layer

The app was a React frontend hitting a Node.js REST API backed by PostgreSQL. Standard stuff. But the database was doing way too much work.

sql
-- The worst offender: loading user transactions
SELECT * FROM transactions WHERE user_id = 12345;
-- Then, for each transaction, loading the merchant
SELECT * FROM merchants WHERE id = 67890;
-- Then, for each merchant, loading the category
SELECT * FROM categories WHERE id = 111;

That’s an N+1 nightmare. For a user with 50 transactions, we were hitting the database 151 times per request.

Honestly, this is the kind of thing that happens when you ship fast without profiling. We’ve all been there.

Step 1: Profile First, Optimize Second

Don’t guess. Measure.

We used `EXPLAIN ANALYZE` in PostgreSQL and added OpenTelemetry tracing with Jaeger. Our team in Can Tho set this up in a single afternoon.

sql
EXPLAIN ANALYZE
SELECT t.*, m.name as merchant_name, c.name as category_name
FROM transactions t
JOIN merchants m ON t.merchant_id = m.id
JOIN categories c ON m.category_id = c.id
WHERE t.user_id = 12345;

The output showed sequential scans on three tables. No indexes on the join columns.

Fix #1: Add composite indexes.

sql
CREATE INDEX idx_transactions_user_id_created_at 
ON transactions(user_id, created_at DESC);

CREATE INDEX idx_merchants_category_id 
ON merchants(category_id);

CREATE INDEX idx_categories_id_name 
ON categories(id) INCLUDE (name);

This single change cut query time from 340ms to 45ms. That’s a 7x improvement from indexing alone.

But that wasn’t enough. We needed more.

Step 2: Add Redis for Hot Data

Here’s a truth most tutorials won’t tell you: database indexes are great, but they don’t scale under high concurrency. When 500 users hit the same dashboard endpoint simultaneously, those 45ms queries stack up fast.

We added Redis as a read-through cache.

python
import redis
import json

r = redis.Redis(host='redis-cluster', port=6379, decode_responses=True)

def get_user_dashboard(user_id):
    cache_key = f"dashboard:{user_id}"
    
    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Cache miss: query database
    data = query_database(user_id)
    
    # Cache for 60 seconds
    r.setex(cache_key, 60, json.dumps(data))
    
    return data

Simple, right? But here’s the configuration that matters.

yaml
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
save 900 1
save 300 10
save 60 10000

We set `maxmemory-policy` to `allkeys-lru`. This means Redis evicts the least recently used keys when memory fills up. Perfect for our use case where dashboard data changes every few minutes.

Result: Cache hit rate hit 87% within the first hour. Average response time dropped to 180ms.

But, we weren’t done yet.

Step 3: Optimize the API Layer

The API was doing something stupid. It was loading the full user object on every request, including fields like `encrypted_social_security` and `internal_risk_score` that the dashboard never displayed.

We added field selection at the ORM level.

typescript
// Before: loading everything
const user = await User.findByPk(userId);

// After: selecting only what's needed
const user = await User.findByPk(userId, {
  attributes: ['id', 'name', 'email', 'avatar_url', 'created_at']
});

This cut the payload size from 12KB to 1.8KB per user. Over 1000 concurrent requests, that’s a 10MB reduction in network traffic.

Step 4: Connection Pooling Tuning

Default PostgreSQL connection pool settings are terrible for web apps. The default `max_connections` is 100, and most ORMs open new connections lazily.

We tuned the pool in our Node.js app:

typescript
const pool = new Pool({
  user: 'app_user',
  host: 'postgres-cluster',
  database: 'fintech_prod',
  password: process.env.DB_PASSWORD,
  port: 5432,
  max: 20,                // Max connections per pool
  idleTimeoutMillis: 30000, // Close idle connections after 30s
  connectionTimeoutMillis: 2000, // Fail fast if no connection
  maxUses: 7500            // Refresh connections periodically
});

And on the PostgreSQL side:

ini
# postgresql.conf
max_connections = 200
shared_buffers = 4GB
work_mem = 64MB
maintenance_work_mem = 1GB
effective_cache_size = 12GB

Notice `shared_buffers` at 4GB. That’s 25% of our 16GB RAM machine. The PostgreSQL docs recommend 25% for dedicated DB servers.

The Final Results

After 10 days of work, here’s what we measured:

Metric Before After Improvement
Average response time 2,300ms 450ms 80%
P95 response time 4,100ms 890ms 78%
DB queries per request 151 3 98%
Cache hit rate 0% 87% N/A
API payload size 12KB 1.8KB 85%
Concurrent users supported ~200 ~2,500 12.5x

More importantly, the client’s churn rate dropped back to normal within two weeks.

How the Vietnamese AI-Augmented Team Made This Possible

You might be wondering: how did we do this in 10 days with a remote team?

Here’s the thing. Our team in Can Tho, Vietnam, didn’t just write code. They used the ECOA AI Platform ACP to orchestrate the optimization workflow.

The AI agents handled:

  • Automated query profiling: An agent ran `EXPLAIN ANALYZE` on every slow query and suggested indexes.
  • Cache key generation: The platform analyzed access patterns and proposed optimal cache keys and TTL values.
  • Connection pool simulation: We simulated 10,000 concurrent users in a sandbox before pushing to production.

That’s why we could deliver what normally takes 4-6 weeks in just 10 days. The AI didn’t replace the developers; it made them 5x faster.

Actually, I should be clear about something. The senior engineers on our team still made all the critical decisions about index design and cache invalidation strategies. The AI handled the grunt work. The humans handled the judgment calls.

Key Takeaways

  1. Profile before optimizing. We wasted zero time on wrong assumptions.
  2. Indexes are the cheapest optimization. One `CREATE INDEX` statement can save you from buying more hardware.
  3. Redis isn’t a silver bullet. You need the right eviction policy and TTL strategy.
  4. Connection pooling matters more than you think. Most defaults are tuned for batch processing, not web apps.
  5. AI-augmented teams ship faster. The automation of repetitive profiling and testing tasks is a game-changer.

Frequently Asked Questions

Q: Why not just use a CDN for API caching?

A: CDNs cache static assets, not dynamic API responses that vary per user. Redis sits in your infrastructure, handles hot data with sub-millisecond latency, and supports complex data structures like sorted sets and hashes that CDNs can’t.

Q: How do you handle cache invalidation?

A: We use a write-through pattern. When a user updates their profile, we invalidate their dashboard cache key immediately. The next read triggers a fresh database query and repopulates the cache. For time-sensitive data like transaction balances, we set short TTLs (30-60 seconds).

Q: What’s the biggest mistake teams make with PostgreSQL connection pooling?

A: Setting `max_connections` too high. More connections doesn’t mean more throughput. PostgreSQL spawns a process per connection, and context switching kills performance above ~200 connections. Use PgBouncer in transaction mode to multiplex connections efficiently.

Q: Can we achieve similar results without a remote team?

A: You can, but you’ll likely take 3-4x longer. The AI-augmented workflow our Vietnamese team uses automates the tedious parts—query profiling, cache key analysis, load testing. A local team without those tools would need to do all that manually. The math is simple: faster profiling means faster optimization.

Related reading: Outsourcing Software in 2025: Why Vietnam Is Winning the Offshore Engineering Race

Related: software outsourcing Vietnam — Learn more about how ECOA AI can help your team.

Related: Outsource to Vietnam — Learn more about how ECOA AI can help your team.

Related: Vietnam outsourcing — Learn more about how ECOA AI can help your team.

Related reading: Why Smart CTOs Hire Vietnamese Developers Over Other Offshore Hubs

Leave a Comment

Your email address will not be published. Required fields are marked *

Ready to Build with AI-Powered Developers?

Hire Vietnamese engineers augmented by ECOA AI Platform + Claude Code. 5x faster, 40% cheaper.