How a Legal Tech Startup Processed 50K Documents/Day with a Vietnamese Team — The Architecture That Survived Compliance Hell
I’ve seen a lot of offshore projects go sideways. Budget overruns. Missed deadlines. “Cultural misunderstandings” that really meant “nobody asked the right questions.”
But this one? It worked. Here’s why.
A US-based legal tech startup came to us with a brutal problem. Their platform ingested thousands of legal documents daily — contracts, court filings, discovery responses — and needed to extract structured data, redact PII, and route everything to the right case management system. Manual processing was killing them. Each document took 12 minutes of human review. They were drowning at 2,000 documents per day.
They needed 50,000. Daily. With 99.9% accuracy. And they needed to pass SOC 2 Type II within 6 months.
No pressure.
The Real Problem Wasn’t the AI
Most teams would throw an LLM at this and call it a day. That’s a mistake.
Legal documents have strict formatting requirements. A hallucinated clause or a missed redaction isn’t just a bug — it’s a liability. The startup had already tried building an in-house pipeline with GPT-4. It worked at 100 documents. At 1,000, accuracy dropped to 87%. At 5,000, the pipeline started timing out.
They needed something that scaled *and* stayed compliant.
The Team: 5 Engineers in Ho Chi Minh City
We assembled a team of 5 Vietnamese developers through ECOAAI:
- 1 Senior Backend Engineer (Python, FastAPI, PostgreSQL) — $3,000/month
- 1 Middle Data Engineer (ETL pipelines, Redis, RabbitMQ) — $2,000/month
- 1 Junior Full-Stack Developer (React, TypeScript, Tailwind) — $1,000/month
- 1 Senior DevOps Engineer (Kubernetes, Terraform, SOC 2 controls) — $3,000/month
- 1 Middle AI/ML Engineer (LLM fine-tuning, prompt engineering, RAG) — $2,000/month
Total monthly cost: $11,000. For a San Francisco-based startup, that’s less than what a single senior engineer costs per month.
But cost wasn’t the win. The win was speed.
The Architecture: A 3-Stage Compliance Pipeline
We designed a pipeline that separated concerns cleanly. Each stage could be tested, audited, and scaled independently.
Stage 1: Ingestion and Classification
Documents arrived via API, S3 upload, or email attachment. A lightweight FastAPI service handled the webhooks, pushed metadata to PostgreSQL, and dumped raw files into S3.
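```python
# Simplified ingestion handler. s3_client (async), db, queue, BUCKET, and
# DocMetadata are assumed to be defined elsewhere in the service.
import uuid

from fastapi import FastAPI, Form, UploadFile

app = FastAPI()

@app.post("/ingest")
async def ingest_document(file: UploadFile, metadata: str = Form(...)):
    # A multipart upload cannot also carry a JSON body, so metadata arrives
    # as a JSON-encoded form field and is validated against the model here.
    meta = DocMetadata.parse_raw(metadata)
    doc_id = str(uuid.uuid4())
    s3_path = f"raw/{doc_id}/{file.filename}"
    await s3_client.upload_fileobj(file.file, BUCKET, s3_path)
    await db.execute(
        "INSERT INTO documents (id, status, s3_path, metadata) VALUES ($1, 'pending', $2, $3)",
        doc_id, s3_path, meta.json(),  # JSON text, not a dict, for the metadata column
    )
    # Tell downstream stages that a new document is ready.
    await queue.publish("document.ingested", {"doc_id": doc_id})
    return {"doc_id": doc_id, "status": "pending"}
```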
A classifier agent (using a fine-tuned BERT model) categorized each document: contract, court filing, discovery, or correspondence. Accuracy here hit 99.2% after training on 10K labeled examples.
Stage 2: Extraction and Redaction — The Hard Part
This is where most pipelines die. You can’t just dump a legal document into GPT-4 and ask for structured JSON. The output is inconsistent. Hallucinations creep in. And you can’t audit a black box.
We built a multi-agent extraction system using the ECOA AI Platform ACP:
- Extraction Agent: Used a structured prompt with schema enforcement. Output was always valid JSON with predefined fields.
- Redaction Agent: Ran a separate pass to identify and mask PII (names, SSNs, account numbers). Used a combination of regex patterns and an LLM for context-aware redaction.
- Validation Agent: Cross-checked extracted data against the original document. Flagged discrepancies for human review.
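The regex half of the redaction pass can be sketched in a few lines. This is a minimal illustration, not the team's actual rules: the two patterns, the placeholder tokens, and the `redact_structured_pii` helper are all assumptions, and names would still need the context-aware LLM pass described above.

```python
import re

# Hypothetical patterns for structured PII; real coverage would need to be broader.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Assumption: account numbers appear as 8 to 17 consecutive digits.
ACCOUNT_RE = re.compile(r"\b\d{8,17}\b")


def redact_structured_pii(text: str) -> tuple[str, int]:
    """Mask SSNs and account numbers; return the redacted text and the hit count."""
    # SSNs are masked first so their digit groups are gone before the account pass.
    text, ssn_hits = SSN_RE.subn("[REDACTED-SSN]", text)
    text, account_hits = ACCOUNT_RE.subn("[REDACTED-ACCOUNT]", text)
    return text, ssn_hits + account_hits


sample = "Defendant SSN 123-45-6789 holds account 000123456789."
redacted, hits = redact_structured_pii(sample)
print(redacted)
```

Returning the hit count alongside the text gives the audit trail something concrete to record for each document.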
The key insight? We never trusted a single LLM call. Every extraction was validated by a second agent. If confidence dropped below 99%, the document was routed to a human reviewer.
```python
# Validation agent logic. llm, queue, and ValidationResult are defined elsewhere.
import json

async def validate_extraction(doc_id: str, extracted: dict, original_text: str) -> bool:
    """Return True if the extraction passes; otherwise queue it for human review."""
    # Only the first 4,000 characters of the source are sent to the validator.
    validation_prompt = f"""
    Given the original document text and the extracted data, identify any discrepancies.
    Return a JSON object with:
    - "is_valid": boolean
    - "confidence": float (0-1)
    - "issues": list of strings describing each discrepancy

    Original text:
    {original_text[:4000]}

    Extracted data:
    {json.dumps(extracted, indent=2)}
    """
    result = await llm.call(validation_prompt, schema=ValidationResult)
    # Fail closed: an invalid result or confidence below 99% goes to a reviewer.
    if not result.is_valid or result.confidence < 0.99:
        await queue.publish("document.needs_review", {"doc_id": doc_id, "issues": result.issues})
        return False
    return True
```
This pattern caught 94% of hallucination errors before they reached the client. The remaining 6% were caught by the human review queue.
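The `ValidationResult` schema passed to `llm.call` is not shown in the post. Here is one plausible shape for it, sketched with the standard library so it runs without dependencies (Pydantic would be a natural fit in a FastAPI service). The `parse_validation` and `needs_human_review` helpers are hypothetical. The point of the sketch is that malformed validator output is treated as a failure, never as a pass.

```python
import json
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.99  # the 99% cutoff used for routing to human review


@dataclass
class ValidationResult:
    is_valid: bool
    confidence: float
    issues: list[str] = field(default_factory=list)


def parse_validation(raw: str) -> ValidationResult:
    """Parse the validator's JSON reply, failing closed on anything malformed."""
    try:
        data = json.loads(raw)
        is_valid = data["is_valid"]
        confidence = data["confidence"]
        issues = data.get("issues", [])
        if not isinstance(is_valid, bool):
            raise ValueError("is_valid must be a boolean")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError("confidence must be a number")
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        if not isinstance(issues, list) or not all(isinstance(i, str) for i in issues):
            raise ValueError("issues must be a list of strings")
    except (AttributeError, KeyError, TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError, so bad JSON lands here too.
        return ValidationResult(False, 0.0, [f"unparseable validator output: {exc}"])
    return ValidationResult(is_valid, float(confidence), issues)


def needs_human_review(result: ValidationResult) -> bool:
    """Route to a reviewer when the result is invalid or below the threshold."""
    return not result.is_valid or result.confidence < CONFIDENCE_THRESHOLD
```

With this shape, a reply such as `{"is_valid": true, "confidence": 0.98}` still goes to a reviewer, and so does any reply that is not valid JSON.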
Stage 3: Routing and Compliance Logging
Every action was logged. Every LLM call was traced. Every human review was timestamped and signed.
We used OpenTelemetry for distributed tracing and wrote all audit logs to a separate, immutable PostgreSQL instance. This became the backbone of their SOC 2 evidence.
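The post does not say how immutability was enforced on the audit database. One common way to make an append-only log tamper-evident is to chain each entry to the hash of the one before it. The sketch below illustrates that idea in memory only. The field names, `append_entry`, and `verify_chain` are assumptions, not the team's schema.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + body).hexdigest()


def append_entry(log: list[dict], actor: str, action: str, doc_id: str) -> dict:
    """Append an audit entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    payload = {
        "actor": actor,
        "action": action,
        "doc_id": doc_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    entry = {**payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; an edited entry or one removed mid-chain fails the check."""
    prev_hash = GENESIS_HASH
    for entry in log:
        payload = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, payload):
            return False
        prev_hash = entry["hash"]
    return True
```

In a database, the same effect usually also depends on revoking `UPDATE` and `DELETE` on the audit table, so the chain acts as a second line of evidence and not the only control.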
The Numbers That Matter
After 10 weeks of development and 2 weeks of UAT:
| Metric | Before | After |
|---|---|---|
| Documents processed per day | 2,000 (manual) | 52,000 (automated) |
| Accuracy | ~95% (human error) | 99.7% (validated) |
| Cost per document | $4.50 | ~$0.012 |
| Average processing time | 12 minutes | 8 seconds |
| Human review rate | 100% | 3.2% |
| Monthly operational cost | $270,000 | $19,200 |
The startup passed their SOC 2 audit on the first attempt. The auditor specifically noted the "rigorous validation pipeline and immutability of audit logs."
What Actually Made This Work
Three things, in order of importance:
1. The validation loop. Most teams build a pipeline and pray. We built a pipeline that double-checks itself. That's the difference between "AI-powered" and "production-ready."
2. The Vietnamese team's discipline. Honestly, I've worked with offshore teams from 6 countries. This team in Ho Chi Minh City stood out for one reason: they asked "why" constantly. Not to be difficult — to understand the compliance requirements deeply. They caught edge cases in the redaction logic that our US-based architects missed.
3. The ECOA AI Platform ACP. The agent orchestration layer handled retries, circuit breaking, and task routing automatically. We didn't write a single line of infrastructure code for the agent workflows. That saved us roughly 4 weeks of development time.
The Mistakes We Made
Let's be honest — it wasn't perfect.
We over-engineered the initial extraction agent. Our first version had 12 specialized sub-agents for different document types. Turns out, 4 agents with good prompts outperformed 12 agents with bad prompts. We refactored in week 6.
We underestimated the PII redaction complexity. Legal documents have weird edge cases. "John Doe, Esq." looks like a person, but "Esq." is a title, not a name. The redaction agent initially flagged "Esq." as PII. That took a week of fine-tuning to fix.
We should have started with a smaller pilot. We went all-in on 50K documents/day from day one. In retrospect, processing 5K documents/day for two weeks would have caught the redaction bugs earlier.
Why This Matters for Your Next Project
You don't need a 50-person team in San Francisco to build a compliance-grade document processing pipeline. You need:
- A clear architectural pattern (validate everything)
- A disciplined offshore team (ask the right questions)
- An orchestration platform that handles the boilerplate (don't reinvent the wheel)
The Vietnamese team delivered this at roughly 15% of the cost of an equivalent US-based team. And they did it faster.
Actually, let me rephrase that. They did it faster *because* they were disciplined, not because they were cheap. The cost savings were a side effect of good engineering.
Frequently Asked Questions
How did you handle data privacy with an offshore team accessing legal documents?
All PII was redacted before any human team member accessed the data. The Vietnamese team only worked with anonymized document structures. The raw documents remained in US-based S3 buckets with strict IAM policies. The team connected via a VPN with session logging and MFA. SOC 2 auditors confirmed no data leakage.
What LLM did you use for extraction, and how did you control costs?
We used GPT-4o-mini for the extraction and validation agents. The key was aggressive caching: identical document sections (boilerplate legal language) hit a Redis cache instead of the LLM. This cut API costs by 67%. Total monthly LLM spend: $4,200 at 52K documents per day.
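The caching idea is simple enough to sketch. The version below uses a plain dict as a stand-in for Redis so it runs anywhere. The `SectionCache` class, the `extract:` key prefix, and the whitespace normalization are assumptions. The post only says that identical sections were served from cache.

```python
import hashlib
from typing import Callable


class SectionCache:
    """In-memory stand-in for Redis, keyed by a hash of the section text."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(section: str) -> str:
        # Collapse whitespace only; case is preserved because capitalized
        # defined terms can carry meaning in legal text.
        normalized = " ".join(section.split())
        return "extract:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, section: str, compute: Callable[[str], str]) -> str:
        """Return the cached result for a section, calling compute only on a miss."""
        key = self.key_for(section)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(section)
        self._store[key] = result
        return result


llm_calls: list[str] = []


def fake_llm(section: str) -> str:
    # Hypothetical stand-in for the real extraction call.
    llm_calls.append(section)
    return '{"clause_type": "governing_law"}'


cache = SectionCache()
cache.get_or_compute("This Agreement is governed by  the laws of Delaware.", fake_llm)
cache.get_or_compute("This Agreement is governed by the laws of Delaware.", fake_llm)
print(cache.hits, cache.misses, len(llm_calls))
```

Against a real Redis instance, the dict lookups would become `GET` and `SET` calls, typically with an expiry so stale extractions age out after prompt or model changes.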
Can this architecture work for other regulated industries like healthcare or finance?
Yes. The pattern is industry-agnostic. Replace "legal document" with "medical record" or "loan application," and the same 3-stage pipeline applies. The validation agent is the critical piece — it enforces domain-specific rules regardless of the source data. We've since deployed similar architectures for a healthcare startup and a fintech lender.