How a Seed-Stage Startup Built a Full-Text Search Engine for 50M Documents in 3 Weeks Using a Vietnamese AI-Augmented Team

A seed-stage startup needed affordable, fast full-text search for 50 million product documents. They turned to a Vietnamese AI-augmented team and open-source tooling. Here are the exact architecture, timeline, and costs.

Let me tell you about a project that genuinely surprised me.

A seed-stage e-commerce analytics startup approached us last year. They had a monster problem: 50 million product documents — descriptions, specs, reviews, all in multiple languages — and they needed real-time full-text search. Budget? Tight. Timeline? Yesterday.

They’d tried PostgreSQL’s full-text search. Queries averaged 11 seconds. Users hated it. Then they priced out Elasticsearch on AWS: $5,000/month for the cluster needed. That was more than their entire engineering budget.

Enter our team in Can Tho, Vietnam. Three senior engineers, augmented with ECOA AI Platform ACP agents. Total cost: $3,000/month for the devs plus $2,000 in infrastructure. They shipped in three weeks.

Here’s exactly how it happened.

The Problem: 50M Documents, $0 Budget Slack

The startup’s data pipeline pushed new and updated product listings every 10 minutes. Each document averaged 2 KB of text — not huge, but 50 million of them? That’s 100 GB of raw text. And they needed searching across fields with boosting (title > description > reviews) and faceted filters by category, price range, rating.

PostgreSQL’s `tsvector` was fine for small datasets. At scale, it collapsed. Even with proper indexing, a simple `@@ to_tsquery('english', 'blender')` took 8-12 seconds. Plus, they wanted fuzzy matching and typo tolerance, which Postgres full-text search doesn’t provide on its own.

Elasticsearch would solve it. But a 3-node cluster with SSDs and enough memory to keep indexes hot? That’s easily $5k/month on managed services. For a seed startup with 12 employees, that’s a no-go.

They needed a cheaper, faster solution. And they needed it yesterday.

The Team: 3 Seniors in Can Tho + ECOA AI Agents

We assembled a compact crew:

  • Lead backend engineer: 8 years in Python and distributed systems.
  • Data engineer: Ex-Spark pipeline builder, fluent in Kafka.
  • DevOps/SRE: Kubernetes and Meilisearch specialist.

All three are mid-to-senior level (we classified them as “Middle” per our pricing grid, $2,000/month each). They’d worked together before on a logistics pipeline. That cohesion mattered.

We augmented them with three ECOA AI Platform agents:

  1. Indexer Agent: Automatically tuned Meilisearch index settings (primary key, searchable attributes, filterable fields) based on observed query patterns.
  2. Synonym Agent: Scanned documents to build language-aware synonyms for user queries (e.g., “laptop” → “notebook”, “blender” → “mixer”).
  3. Query Optimizer Agent: Monitored slow queries and suggested changes to ranking rules or faceted configurations.
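To make the Indexer Agent's job concrete, here is a minimal sketch of the kind of index settings it would be tuning. The field names (`title`, `description`, `reviews`, `category`, `price`, `rating`) are assumptions taken from the fields mentioned earlier, not the startup's real schema. The settings keys follow Meilisearch's documented `searchableAttributes` and `filterableAttributes` options, where the order of searchable attributes influences relevance.

```python
from __future__ import annotations

import json

# Hypothetical field names; the real schema is not given in this write-up.
index_settings = {
    # Order matters: matches in earlier attributes rank higher,
    # which approximates title > description > reviews.
    "searchableAttributes": ["title", "description", "reviews"],
    # Fields that can be used in filters and facets.
    "filterableAttributes": ["category", "price", "rating"],
}


def validate_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    for key in ("searchableAttributes", "filterableAttributes"):
        values = settings.get(key, [])
        if len(set(values)) != len(values):
            problems.append(f"duplicate entry in {key}")
    return problems


print(json.dumps(index_settings, indent=2))
print("problems:", validate_settings(index_settings))
```

In practice this dictionary would be sent to the index's settings endpoint; the sketch stops at building and checking it so it runs without a Meilisearch instance.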

These agents didn’t replace the devs — they automated grunt work that would’ve taken weeks of manual tuning.

The Architecture (Simple. Deliberate.)

We didn’t build a distributed monolith. We kept it clean.

Data Ingestion


Product DB (PostgreSQL) → Debezium CDC → Kafka topic 'documents' → Meilisearch indexer

Each document is a JSON blob. The indexer consumes at ~2,000 docs/second. No batch lag — real-time streaming.
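The indexer itself is not shown here, so the following is only a sketch of its core loop under stated assumptions: change events use the usual Debezium envelope (`op`, `before`, `after`, optionally nested under `payload`), and documents are grouped into batches before being sent on. The function names and the batch size are hypothetical.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator


def split_change_event(event: dict) -> tuple[str, dict]:
    """Classify one Debezium-style change event as an upsert or a delete."""
    # Depending on converter settings, the envelope may be nested under "payload".
    payload = event.get("payload", event)
    if payload["op"] == "d":  # delete: only the "before" row image is set
        return "delete", payload["before"]
    return "upsert", payload["after"]  # "c" create, "u" update, "r" snapshot read


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield lists of at most `size` items so fewer, larger requests are sent."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


events = [
    {"payload": {"op": "c", "before": None, "after": {"id": 1, "title": "Blender"}}},
    {"payload": {"op": "u", "before": {"id": 1}, "after": {"id": 1, "title": "Blender 2"}}},
    {"op": "d", "before": {"id": 7}, "after": None},
]
upserts = [row for kind, row in map(split_change_event, events) if kind == "upsert"]
for batch in batched(upserts, size=1000):
    print(f"would send {len(batch)} documents to the indexer")
```

A real consumer would read these events from the Kafka topic and post each batch to Meilisearch; both steps are left out so the example stays self-contained.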

Search Layer


User request → Nginx → Python FastAPI → Meilisearch API → response (<100ms)

No caching layer needed initially. Meilisearch is that fast.
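As a rough illustration of what the FastAPI layer hands to Meilisearch, the sketch below builds a search request body with faceted filters. The helper name, the field names, and the default limit are assumptions; `q`, `filter`, `facets`, and `limit` are search parameters in recent Meilisearch versions, and older releases may name some of them differently.

```python
from __future__ import annotations


def build_search_payload(
    query: str,
    category: str | None = None,
    price_min: float | None = None,
    price_max: float | None = None,
    min_rating: float | None = None,
    limit: int = 20,
) -> dict:
    """Translate user-facing search options into a Meilisearch-style request body."""
    filters = []
    if category is not None:
        # Escape backslashes and quotes so user input cannot break the filter expression.
        escaped = category.replace("\\", "\\\\").replace('"', '\\"')
        filters.append(f'category = "{escaped}"')
    if price_min is not None:
        filters.append(f"price >= {price_min}")
    if price_max is not None:
        filters.append(f"price <= {price_max}")
    if min_rating is not None:
        filters.append(f"rating >= {min_rating}")

    payload = {"q": query, "limit": limit, "facets": ["category", "rating"]}
    if filters:
        payload["filter"] = " AND ".join(filters)
    return payload


print(build_search_payload("blender", category="kitchen", price_min=10, price_max=50, min_rating=4))
```

Keeping this translation in one small function means the endpoint handler only validates input and forwards the payload, which makes the filter logic easy to unit test.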

AI Agents

The Synonym Agent runs every hour on new query logs from a Redis buffer. It outputs a JSON synonym map that is pushed to Meilisearch as an index settings update, so no restart is needed. The Query Optimizer Agent uses the same logs to detect patterns like repeated failed searches (zero results) and suggests adding new searchable attributes.
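The agent's internals are not described, but the shape of its output can be sketched. Meilisearch represents synonyms as a JSON object mapping each term to a list of alternatives; the helper below turns candidate pairs into a mutual (two-way) map. Treating every pair as mutual is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import json
from typing import Iterable


def build_synonym_map(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Build a two-way synonym map from (term, alternative) pairs."""
    synonyms: dict[str, set[str]] = {}
    for left, right in pairs:
        left, right = left.strip().lower(), right.strip().lower()
        if not left or not right or left == right:
            continue  # skip empty or self-referential pairs
        synonyms.setdefault(left, set()).add(right)
        synonyms.setdefault(right, set()).add(left)
    # Sort keys and values so repeated runs produce identical JSON.
    return {term: sorted(alts) for term, alts in sorted(synonyms.items())}


candidate_pairs = [("laptop", "notebook"), ("Blender", "mixer"), ("laptop", "laptop")]
print(json.dumps(build_synonym_map(candidate_pairs), indent=2))
```

Deterministic output is a deliberate choice: if the hourly run produces the same map as the previous one, the update can be skipped.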

Week-by-Week Breakdown

Week 1: Pipeline & Core Indexing

  • Days 1-3: Set up Debezium, Kafka, and Meilisearch on a 3-node Kubernetes cluster (each node $40/month on DigitalOcean).
  • Days 4-5: Loaded 10M initial documents. Tested indexing throughput and hit 1,800 docs/sec. Tuned batch sizes to reach 2,200.
  • End of week: All 50M documents ingested. Pretty smooth.
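A quick back-of-the-envelope check shows why the batch-size tuning mattered for the initial load. The sketch assumes a steady indexing rate and ignores pauses, retries, and index-build overhead, so treat the results as lower bounds.

```python
TOTAL_DOCS = 50_000_000


def ingest_hours(total_docs: int, docs_per_second: float) -> float:
    """Hours needed to index `total_docs` at a constant rate."""
    return total_docs / docs_per_second / 3600


for rate in (1_800, 2_200):
    print(f"{rate} docs/sec -> {ingest_hours(TOTAL_DOCS, rate):.1f} hours for the full backfill")
```

At either rate a full backfill fits comfortably inside a working day, which is consistent with finishing ingestion by the end of the first week.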

Week 2: Query Tuning & AI Agent Integration

This is where the ECOA agents saved us. The Synonym Agent ran on day 8 — it scanned 15% of the docs and generated a synonym list that covered 92% of common search terms within 24 hours. Without it, the devs would've had to manually define dictionaries. The Query Optimizer caught that "minimalist desk" returned zero results because "desk" wasn't in the `product_type` filter list — the agent recommended adding it.
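The "minimalist desk" catch is the kind of signal that falls out of a simple pass over query logs. The sketch below is not the agent's actual logic; it assumes a hypothetical log format with `query` and `hits` fields and flags queries that repeatedly return nothing.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def zero_result_queries(log_entries: Iterable[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Return normalized queries that returned zero hits at least `min_count` times."""
    counts = Counter(
        entry["query"].strip().lower()
        for entry in log_entries
        if entry["hits"] == 0
    )
    return [(query, n) for query, n in counts.most_common() if n >= min_count]


logs = [
    {"query": "minimalist desk", "hits": 0},
    {"query": "Minimalist Desk ", "hits": 0},
    {"query": "blender", "hits": 12},
    {"query": "zzz", "hits": 0},
]
print(zero_result_queries(logs))
```

The `min_count` threshold filters out one-off typos so that only recurring gaps reach a human for review.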

Query latency at this point: 80-120ms. Acceptable, but we wanted faster.
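Latency figures like these are easiest to compare when they are computed the same way each time. The helper below is a small sketch of one common approach, the mean plus a nearest-rank percentile; the measurement method actually used on the project is not stated, and the sample values are made up.

```python
from __future__ import annotations

import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples_ms: list[float]) -> dict:
    """Summarize latency samples (in milliseconds) as mean and p95."""
    return {"mean_ms": mean(samples_ms), "p95_ms": percentile(samples_ms, 95)}


# Illustrative samples only, not measurements from the project.
print(summarize([80, 90, 100, 110, 120]))
```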

Week 3: Performance Optimization & Launch

The lead engineer wrote a custom ranking rule using Meilisearch's built-in `sort` feature: boost by rating, then by number of reviews. The Query Optimizer Agent confirmed the rule didn't degrade speed. Final latency: 42ms average (p95: 89ms). Total cost: $2,000/month for infrastructure (DO droplets + managed Kafka) + $3,000/month for the Vietnamese team.
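The rule itself is not shown in this write-up. Meilisearch supports both a query-time `sort` parameter and custom ranking rules of the form `attribute:desc` in the index settings; the sketch below assumes the second route and appends two custom rules after the default ones. The field names `rating` and `review_count` are hypothetical.

```python
from __future__ import annotations

# Meilisearch's default ranking rules, in their default order.
DEFAULT_RANKING_RULES = ["words", "typo", "proximity", "attribute", "sort", "exactness"]


def with_custom_rules(base: list[str], custom: list[str]) -> list[str]:
    """Append custom `field:asc|desc` rules after the built-in rules."""
    for rule in custom:
        field, _, direction = rule.partition(":")
        if not field or direction not in ("asc", "desc"):
            raise ValueError(f"invalid custom ranking rule: {rule!r}")
    return list(base) + [rule for rule in custom if rule not in base]


# Ties on relevance are broken by rating first, then by number of reviews.
ranking_rules = with_custom_rules(DEFAULT_RANKING_RULES, ["rating:desc", "review_count:desc"])
print(ranking_rules)
```

Placing the custom rules last keeps textual relevance in charge and uses rating and review count only as tie-breakers, which is one reasonable reading of "boost by rating, then by number of reviews".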

The startup launched on schedule. Zero downtime incidents in the first 3 months.

Why This Worked (And It Wasn't Just the Price)

Could we have done this with a US-based team for the same total monthly cost of $5,000? Absolutely not. A single senior US dev would cost that much alone.

But it's not just about cost. The Vietnamese engineers brought discipline and adaptability. They'd worked with Meilisearch before. They quickly learned the ECOA AI agent setup. And they communicated entirely in English — no misunderstandings, no time-zone lag (we overlapped 5 hours with EST, which was enough for daily standups).

The AI agents amplified their output. Tasks that required hours of manual data crawling and rule testing got compressed into minutes. The Synonym Agent alone saved roughly 40 hours of developer time in that first month.

Honestly, I'd argue the ECOA orchestration layer was the secret weapon. It gave the team a force multiplier without adding complexity.

*Would a traditional outsourcing company have delivered this in three weeks? Unlikely. They'd be writing requirements docs for a month.*

The Hard Lessons (Because No Case Study Is Perfect)

Nothing is flawless. We hit two bumps:

  1. Memory limits on DigitalOcean droplets: Meilisearch needs RAM proportional to the dataset size. Our first node (8GB) ran out at 30M docs. We scaled horizontally to 3 nodes with 16GB each. That added $100/month to infra costs.
  2. Synonym quality for non-English languages: The Synonym Agent was trained mostly on English. When it hit Vietnamese or Thai product descriptions, it produced noise. We had to add a language detection step and disable synonym generation for low-confidence languages. That took 2 days.
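The language gate from the second bump can be sketched in a few lines. This assumes a detector has already labelled each document with a language code and a confidence score; the allow-list, the 0.9 threshold, and the field names are all hypothetical and not the values used on the project.

```python
from __future__ import annotations


def should_generate_synonyms(
    lang: str,
    confidence: float,
    supported: frozenset = frozenset({"en"}),
    min_confidence: float = 0.9,
) -> bool:
    """Allow synonym generation only for supported languages detected with high confidence."""
    return lang in supported and confidence >= min_confidence


docs = [
    {"id": 1, "lang": "en", "confidence": 0.98},
    {"id": 2, "lang": "vi", "confidence": 0.95},  # unsupported language
    {"id": 3, "lang": "en", "confidence": 0.55},  # detection too uncertain
]
eligible = [d["id"] for d in docs if should_generate_synonyms(d["lang"], d["confidence"])]
print("documents eligible for synonym generation:", eligible)
```

Skipping uncertain documents trades synonym coverage for precision, which matches the decision described above to avoid noisy synonyms.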

But these were minor. The startup didn't care about those hiccups — they cared about shipping. They shipped.

Would Meilisearch Work for Your Use Case?

Meilisearch isn't Elasticsearch. It's simpler, faster (for most queries), and cheaper. But it lacks some advanced features like aggregations, full-text scoring customization, and complex Boolean queries. For 80% of e-commerce search needs, it's perfect. For highly analytical searches (e.g., "find all products with revenue > $100k in Q3"), you'd still need Elasticsearch or a purpose-built engine.

But for a seed-stage startup with 50M documents and a tiny budget? Meilisearch is a no-brainer. Combine it with Vietnamese engineering talent and AI agents, and you get results that punch way above their weight class.

The Takeaway

This case proves a simple truth: You don't need a huge budget to solve big infrastructure problems. You need the right team, the right tools, and an orchestration layer that eliminates busywork.

Our three engineers in Can Tho worked smarter, not harder. The ECOA agents handled the repetitive optimization loops. And the startup got a production-quality search engine for the price of one US-based junior developer.

That's not a trade-off. That's a win-win.

---

Frequently Asked Questions

How does a Vietnamese AI-augmented team compare to hiring local US developers for a search engine project?

Vietnamese senior developers cost $2,000–$3,000/month, while US seniors cost $15,000–$20,000/month. The Vietnamese team augmented with ECOA AI agents can match or exceed output because automation removes manual tasks. Time zone overlap with US East Coast is ~5 hours, which is enough for async workflows and daily standups.

Is Meilisearch production-ready for 50M documents?

Yes. Meilisearch handles millions of documents easily — just ensure your nodes have enough RAM (1 GB per ~1 million documents for the index). It supports typo tolerance, faceted search, synonyms, and ranking rules. For e-commerce search, it often outperforms Elasticsearch in query speed and operational simplicity.

Did you consider using Elasticsearch's cheaper tier or a self-hosted version?

We did. Self-hosted Elasticsearch on equivalent hardware would've cost ~$4,000/month for compute + storage. And it requires more DevOps attention (tuning shards, managing cluster health, monitoring JVM). Meilisearch is essentially a single binary with minimal configuration — our DevOps engineer managed it in <2 hours per week.

Can I replicate this architecture without ECOA AI agents?

You can, but you'll spend a lot more time manually tuning synonyms, testing ranking rules, and watching query logs. The ECOA agents automate the boring parts. If you have a spare senior engineer, you might skip them. But why would you, when the agents cost less than half of one developer's salary?
