We Slashed a Martech Startup’s Data Processing Time from 6 Hours to 18 Minutes — A Vietnam Offshore Case Study

Six hours. That’s how long it took a promising Martech startup to process their nightly customer data dump. Every morning, the CTO held his breath. Would the batch job finish before the sales team logged in?

It didn’t. Not always.

And when it failed at 4 AM, nobody knew until the support tickets started rolling in at 9. The company was processing around 2.8 million user profiles per night, enriching them with behavioral data, and pushing the results into their CRM and analytics tools. The legacy pipeline was a tangled mess of Python scripts, cron jobs, and a single PostgreSQL worker that maxed out at 12 GB of RAM before choking.

They came to us. “Fix it. But don’t touch the application layer.”

Fair constraint. The app itself was solid. The data pipeline was the problem.

The Problem: Batch Processing at Its Worst

Let’s be specific about what was broken. The original architecture looked like this:

A cron job kicked off at midnight.
A single Python process pulled raw event data from a MongoDB replica set.
It ran 12 sequential enrichment steps: IP geolocation, user-agent parsing, session stitching, funnel mapping, lead scoring, and a few custom API calls to a third-party intent data provider.
It wrote results back to PostgreSQL and a Redshift data warehouse.

The bottleneck was obvious: serial execution with no parallelism. Each enrichment step had to finish before the next one started. If the IP geolocation API had a 5-second timeout, the entire pipeline waited. And it did. Often.

The average latency per user profile was 7.8 milliseconds. Multiply that by 2.8 million, and you get 6 hours of pure wall-clock time. That’s not counting failures, retries, or the occasional OOM kill.
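
As a quick sanity check on that arithmetic, here is a minimal sketch that multiplies the quoted per-profile latency by the nightly volume. The constant and function names are ours, chosen for illustration; only the two figures come from the pipeline described above.

```python
# Back-of-envelope check: serial wall-clock time for the nightly batch.
PROFILES_PER_NIGHT = 2_800_000
LATENCY_PER_PROFILE_S = 0.0078  # 7.8 ms, one profile at a time


def serial_runtime_hours(profiles: int, latency_s: float) -> float:
    """Total wall-clock hours when every profile is processed sequentially."""
    return profiles * latency_s / 3600


hours = serial_runtime_hours(PROFILES_PER_NIGHT, LATENCY_PER_PROFILE_S)
print(f"{hours:.2f} hours")
```

The result lands just over six hours, which matches what the cron job was showing before any failures or retries were added on top.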

Our Approach: Multi-Agent Orchestration on ECOA AI Platform ACP

We didn’t rewrite the whole thing. That’s a rookie mistake. Instead, we decomposed the pipeline into discrete, independent tasks and used the ECOA AI Platform ACP (Agent Coordination Platform) to orchestrate them.

The plan was simple:

Break the monolithic Python script into 7 specialized AI agents.
Each agent handles exactly one enrichment step.
Agents run in parallel on a pool of lightweight workers.
A coordinator agent manages the data flow, handles failures, and retries with exponential backoff.
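
To make the serial-versus-parallel difference concrete, here is a small sketch using plain `asyncio`. The three step functions and their sleep times are hypothetical stand-ins, not the real enrichment code, and the sketch assumes the steps do not depend on each other's output.

```python
import asyncio
import time


# Hypothetical stand-ins for independent enrichment steps; the sleeps
# simulate I/O latency and are not measurements from the real pipeline.
async def geolocate_ip(profile: dict) -> dict:
    await asyncio.sleep(0.05)
    return {"country": "VN"}


async def parse_user_agent(profile: dict) -> dict:
    await asyncio.sleep(0.05)
    return {"browser": "Firefox"}


async def score_lead(profile: dict) -> dict:
    await asyncio.sleep(0.05)
    return {"lead_score": 42}


STEPS = [geolocate_ip, parse_user_agent, score_lead]


async def enrich_serial(profile: dict) -> dict:
    """Old behaviour: each step waits for the previous one to finish."""
    enriched = dict(profile)
    for step in STEPS:
        enriched.update(await step(profile))
    return enriched


async def enrich_parallel(profile: dict) -> dict:
    """New behaviour: independent steps run concurrently."""
    results = await asyncio.gather(*(step(profile) for step in STEPS))
    enriched = dict(profile)
    for partial in results:
        enriched.update(partial)
    return enriched


async def timed(enrich_fn, profile: dict):
    """Return (result, elapsed seconds) for one enrichment function."""
    start = time.perf_counter()
    result = await enrich_fn(profile)
    return result, time.perf_counter() - start
```

Both functions produce the same enriched profile. The serial version pays the sum of the step latencies, while the concurrent version pays roughly the latency of the slowest step.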

We assembled a team of 4 senior Vietnamese developers in our Can Tho hub. They had deep experience with distributed systems and Python async patterns. Honestly, I was skeptical at first. Can Tho isn’t Ho Chi Minh City — it’s quieter, more laid back. But the talent there is real. These engineers had shipped production-grade data pipelines for logistics and fintech clients before.

They started by profiling the existing code and found a hidden problem: the session stitching step was doing a full table scan on a 500 GB MongoDB collection. Every. Single. Night.

“We can fix that with a compound index,” the lead engineer said. “It’ll take 30 minutes to deploy.”

It took 22 minutes. That single change cut the session stitching time from 45 minutes to 3.
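
For readers who have not created one before, a compound index fix of this kind might look like the sketch below. The field names (`user_id`, `event_time`) and the index name are assumptions for illustration; the actual fields indexed in the client's collection are not stated here. The function accepts any object with a pymongo-style `create_index` method.

```python
# Sketch only: field names and index name below are hypothetical.
ASCENDING = 1    # same value as pymongo.ASCENDING
DESCENDING = -1  # same value as pymongo.DESCENDING

# Look up one user's events, newest first, without scanning the collection.
SESSION_INDEX_KEYS = [("user_id", ASCENDING), ("event_time", DESCENDING)]


def ensure_session_index(events_collection) -> str:
    """Create the compound index if it is missing and return its name.

    `events_collection` is expected to be a pymongo Collection. Requesting
    an index that already exists with the same definition is a no-op.
    """
    return events_collection.create_index(
        SESSION_INDEX_KEYS, name="user_id_1_event_time_-1"
    )
```

The key order matters: the equality field comes first and the sort field second, so a query that filters on one user and sorts by time can be served from the index alone.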

The Architecture: What We Actually Built

Here’s the high-level design. I’m keeping it real — no buzzwords, just the actual components.

Data Ingestion Agent

Reads raw events from MongoDB change streams (not batch dumps).
Publishes events to a Redis Stream with a TTL of 24 hours.
Handles backpressure via a configurable concurrency limit.

Enrichment Agent Pool (7 agents)

Each agent subscribes to a specific Redis stream key.
Agents are stateless. They receive a single user profile, process it, and publish the enriched result to the next stream.
We used `asyncio` with `aiohttp` for all HTTP calls. No blocking I/O.
Timeout per agent: 2 seconds. If an agent exceeds that, the coordinator marks it as failed and retries up to 3 times.
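
The timeout-and-retry rule above can be sketched with nothing but the standard library. This is an illustration under assumptions, not the platform's implementation: `run_with_retries`, `StepFailed`, and the `base_delay` value are hypothetical names and settings, and `step` is any coroutine function that takes a profile.

```python
import asyncio


class StepFailed(Exception):
    """Raised when a step still fails after the final retry."""


async def run_with_retries(step, profile, *, timeout=2.0, retries=3, base_delay=0.5):
    """Run `step(profile)` with a per-attempt timeout and exponential backoff.

    Makes one initial attempt plus up to `retries` retries, sleeping
    base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(step(profile), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise StepFailed(f"gave up after {retries} retries") from last_error
```

Only timeouts and connection errors are retried in this sketch. A malformed profile would fail the same way every time, so retrying it just burns the budget.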

Coordinator Agent

Tracks the state of each user profile through a DAG of enrichment steps.
Uses a Redis-backed state store with Lua scripts for atomic updates.
If an enrichment step fails after 3 retries, the coordinator routes the profile to a dead-letter queue (DLQ) for manual review.

Output Agent

Batches enriched profiles into chunks of 500.
Writes to PostgreSQL via `COPY` (not individual INSERTs).
Simultaneously pushes to Redshift via a staging S3 bucket.
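
Here is a rough sketch of the batching and `COPY` idea. It assumes a psycopg2-style cursor (which exposes `copy_expert`); the table name, column names, and helper names are hypothetical, and the Redshift/S3 path is left out.

```python
import csv
import io
from itertools import islice

BATCH_SIZE = 500  # chunk size quoted above


def chunked(iterable, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def copy_batch(cursor, rows, table="enriched_profiles",
               columns=("profile_id", "lead_score")):
    """Send one batch with COPY ... FROM STDIN instead of row-by-row INSERTs.

    `cursor` is assumed to be a psycopg2 cursor; the table and column
    names are placeholders.
    """
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    buf.seek(0)
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN WITH (FORMAT csv)"
    cursor.copy_expert(sql, buf)
```

A caller would loop `for batch in chunked(profiles): copy_batch(cur, batch)` and commit once per batch, which trades a little latency for far fewer round trips than individual INSERTs.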

The code for the coordinator state machine? About 200 lines of Python. We used the ECOA AI Platform ACP’s built-in DAG runner, which handles state persistence and retry logic out of the box. No need to reinvent the wheel.

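```python
# Simplified coordinator agent logic.
# Assumes `redis` is an async Redis client created with decode_responses=True
# (so hash fields come back as str) and `dlq` exposes an async push().
# Retry handling is omitted here for brevity.
async def process_profile(profile_id, dag_steps):
    key = f"profile:{profile_id}"
    state = await redis.hgetall(key)
    for step in dag_steps:
        # Skip steps that already finished on a previous run.
        if state.get(step.name) == "completed":
            continue
        try:
            await step.run(profile_id, timeout=2.0)
            await redis.hset(key, step.name, "completed")
        except Exception as e:
            # Record the failure, park the profile for review, and stop the DAG.
            await redis.hset(key, step.name, "failed")
            await dlq.push(profile_id, step.name, str(e))
            break
```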

That’s it. No magic. Just clean, testable code.

The Results: Hard Numbers

We deployed the new pipeline in 3 weeks. The migration was seamless — we ran both pipelines in parallel for 2 nights to validate output consistency.

Metric	Before	After	Improvement
Total processing time	6 hours 12 minutes	18 minutes 47 seconds	95% reduction
Cloud compute cost (per night)	$847	$321	62% reduction
Failed profiles (per night)	12,000+	47	99.6% reduction
Developer time for maintenance	8 hours/week	1 hour/week	87.5% reduction

The cloud cost drop surprised even us. Here’s why: the old system provisioned a massive EC2 instance (m5.4xlarge) to handle the batch load. The new system uses a pool of 8 smaller instances (t3.medium) that auto-scale based on Redis stream depth. During low-traffic periods, the pool shrinks to 2 instances. The ECOA AI Platform ACP handles this scaling automatically through its agent lifecycle management.
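
The scaling itself is handled by the platform, so we did not write this logic ourselves. Purely to illustrate the idea of sizing a pool from stream depth, a rule of this kind could look like the sketch below. The 2 and 8 instance bounds come from the setup described above; the backlog-per-instance target and the function name are hypothetical.

```python
import math

MIN_INSTANCES = 2  # floor during low-traffic periods
MAX_INSTANCES = 8  # pool ceiling
TARGET_BACKLOG_PER_INSTANCE = 50_000  # hypothetical tuning value


def desired_instances(stream_depth: int) -> int:
    """Map the number of pending stream entries to a worker count."""
    needed = math.ceil(stream_depth / TARGET_BACKLOG_PER_INSTANCE)
    return max(MIN_INSTANCES, min(MAX_INSTANCES, needed))
```

With these example numbers, an empty stream keeps the pool at 2 instances, a backlog of 120,000 entries asks for 3, and anything above 400,000 is capped at 8.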

More importantly, the Martech startup’s sales team now has fresh data by 6:30 AM instead of waiting until noon. Their CRM enrichment pipeline — which used to lag by a full day — now updates in near real-time.

What Made This Work? The Vietnam Team + AI Orchestration Combo

I’ve been doing offshore development for over a decade. I’ve seen projects fail because of communication gaps, timezone mismatches, and cultural misunderstandings. This one didn’t fail. Here’s why.

The Vietnamese engineers owned the problem. They didn’t wait for detailed specs. They dug into the code, found the MongoDB index issue, and fixed it without being asked. That’s the kind of ownership you don’t get from junior devs in other markets.

The ECOA AI Platform ACP eliminated boilerplate. We didn’t have to build the retry logic, state persistence, or worker pool management from scratch. The platform handled it. Our team focused on the business logic — the enrichment steps that actually generate value.

Timezone overlap was a non-issue. Can Tho is UTC+7. Our US-based client is UTC-5. We had a 4-hour overlap window every morning. The Vietnamese team used that window for standups and code reviews. The rest of the day, they worked async. It’s a rhythm that works if you trust your team.

The Hardest Lesson: Don’t Over-Engineer the Orchestration

Here’s a mistake we almost made. Early in the project, we tried to build a complex priority queue for the enrichment agents. “What if a VIP user profile needs faster processing?” the client asked.

We spent 3 days designing a priority system. Then we tested it. The difference in processing time for VIP vs. regular profiles? About 200 milliseconds. Total.

We ripped it out. The simple FIFO queue with a single concurrency limit worked perfectly. More importantly, it reduced the code complexity by 40%.

Sometimes the simplest orchestration is the best orchestration. Don’t build a priority system until you have data proving you need one.

Frequently Asked Questions

Q: How did you handle API rate limits from the third-party enrichment providers?

We built a token bucket rate limiter inside each enrichment agent. The agent checks its token count before making an API call. If tokens are exhausted, it waits and retries. The coordinator doesn’t need to know about rate limits — each agent manages its own quota. This kept the orchestration layer clean and the individual agents self-sufficient.
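
A minimal token bucket along these lines, written with `asyncio`, might look like the following. It is a sketch rather than our production limiter: the class name, the `rate` and `capacity` parameters, and the refill policy are assumptions for illustration.

```python
import asyncio
import time


class TokenBucket:
    """Async token bucket: `rate` tokens refill per second up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        """Add the tokens earned since the last update, capped at capacity."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def acquire(self) -> None:
        """Wait until one token is available, then consume it."""
        async with self._lock:
            self._refill()
            while self.tokens < 1:
                # Sleep just long enough for the missing fraction to refill.
                await asyncio.sleep((1 - self.tokens) / self.rate)
                self._refill()
            self.tokens -= 1
```

An agent would create one bucket per provider inside its running event loop and call `await bucket.acquire()` before each API request. Bursts up to `capacity` go through immediately, and everything after that is paced at `rate` calls per second.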

Q: Could this architecture work for non-Martech pipelines?

Absolutely. We’ve reused the same pattern for logistics tracking, financial transaction processing, and even social media content moderation. The key is the decomposition step: identify tasks that can run independently and have a coordinator manage the DAG. The ECOA AI Platform ACP’s agent model is generic enough to handle any data processing workflow.

Q: Why did you choose Redis over Kafka for the event stream?

Simplicity. The client didn’t need long-term event storage or replay capabilities. Redis Streams with consumer groups provide at-least-once delivery with minimal operational overhead, and because the coordinator skips steps already marked as completed, a redelivered event does no harm. Kafka would have been overkill for a pipeline processing 2.8 million events per night. We use Kafka for higher-throughput systems, but Redis was the right call here. It also reduced the cloud cost by eliminating an extra managed service.

Q: How do you ensure data consistency across the parallel agents?

Each user profile is processed independently, so there’s no shared state to corrupt. The coordinator tracks each profile through its DAG using Redis atomic operations. If a profile fails at any step, it goes to the DLQ and is reprocessed later. We also run a nightly reconciliation job that compares the raw MongoDB data with the enriched PostgreSQL output. If there’s a mismatch, the coordinator re-processes the affected profiles. So far, the mismatch rate is below 0.01%.
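
The core of a reconciliation pass like this can be sketched in a few lines. This version is deliberately simplified: it only checks that every raw profile ID appears in the enriched output, and the function name and inputs are hypothetical. A fuller check would also compare field values or checksums.

```python
def reconcile(raw_ids, enriched_ids):
    """Return (IDs missing from the enriched output, mismatch rate)."""
    raw = set(raw_ids)
    missing = raw - set(enriched_ids)
    rate = len(missing) / len(raw) if raw else 0.0
    return sorted(missing), rate
```

The returned IDs are what would be handed back to the coordinator for reprocessing, and the rate is the figure to watch over time.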