We Built a Real-Time Fraud Detection Pipeline for a Fintech Startup — 99.7% Precision at 10K TPS with a Vietnamese AI-Augmented Team

Last year, a fintech startup in San Francisco came to us with a brutal problem. Their manual review team of 15 analysts couldn’t keep up with transaction volume. Fraudsters were slipping through the cracks. Chargebacks were eating 8% of their monthly revenue. They needed a real-time system that could score every transaction in under 100 milliseconds.

Oh, and they were processing 10,000 transactions per second at peak.

Our Vietnamese team in Ho Chi Minh City took the challenge. We built a multi-agent AI pipeline on the ECOA AI Platform ACP. Six specialized agents, each with a single job. No cargo-culting generic LLM calls. We designed a stateful orchestration that routes each transaction through the right agents based on risk signals.

The result? 99.7% precision on fraud classification. Average latency of 42ms. And that 8% chargeback rate dropped to 0.4% in the first month.

Let me walk you through exactly how we did it.

The Problem: Why Traditional Fraud Detection Fails at Scale

The client had been using a rules-based engine. Static thresholds like “amount > $10,000 && country != US”. It worked for a while. Then synthetic identity fraud exploded. Fraudsters learned to game those simple rules.

They tried a third-party ML service. The API cost was insane — $0.03 per transaction. At 10K TPS, that’s $300 per second. Not sustainable.

Manual reviews worked for high-value transactions, but each review took 15-20 minutes. The team could only handle about 200 reviews per day, and they had a backlog of 2,000 flagged transactions. By the time a flagged transaction was reviewed, the money was already gone.

We needed something different. Something that could:

Score every transaction in real-time
Adapt to new fraud patterns without retraining
Keep costs under $0.002 per transaction
Give human reviewers a clear, actionable summary when escalation was needed
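To make the cost constraint concrete, here is the arithmetic as a small sketch. The peak rate and the two unit prices come from the figures above; the helper name `cost_per_second` is ours, and the calculation assumes every transaction is scored at the quoted unit price.

```python
from decimal import Decimal

PEAK_TPS = 10_000  # peak transactions per second, as quoted above


def cost_per_second(cost_per_tx: Decimal, tps: int = PEAK_TPS) -> Decimal:
    """Spend per second when every transaction is scored at the given unit cost."""
    return cost_per_tx * tps


third_party = cost_per_second(Decimal("0.03"))  # the third-party ML service
target = cost_per_second(Decimal("0.002"))      # the ceiling from the list above
required_reduction = third_party / target       # how much cheaper we had to be

print(f"third-party service at peak: ${third_party} per second")
print(f"target ceiling at peak:      ${target} per second")
print(f"required cost reduction:     {required_reduction}x")
```

Using `Decimal` rather than floats keeps the money arithmetic exact, which matters once the unit prices get this small.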

The Architecture: Six Specialized Agents, One Pipeline

Here’s the thing about multi-agent systems: you don’t just throw a prompt at an LLM and call it an agent. Each agent needs a specific role, a strict input/output schema, and a clear handoff protocol.

We designed six agents. Each runs as a stateless container in our Kubernetes cluster, orchestrated by ECOA AI Platform ACP’s dynamic routing engine.

Agent	Role	Model	Avg Response Time
Transaction Enricher	Pulls historical account data, device fingerprint, IP geolocation	Local XGBoost	3ms
Pattern Matcher	Checks against known fraud clusters (real-time similarity search)	FAISS index	5ms
LLM Reasoner	Analyzes natural language fields (merchant description, notes)	Claude 3.5 Haiku	28ms
Risk Scorer	Aggregates signals, calculates composite score	Custom ensemble	2ms
Escalation Decider	Routes to auto-approve, auto-reject, or manual review	Decision tree	<1ms
Human Review Assistant	Summarizes transaction context with highlighted risk points	Claude 3.5 Sonnet	35ms

Why mix classical ML and LLMs? Because LLMs are expensive and slow for simple tasks. The Pattern Matcher uses a vector index — 5ms, 100% deterministic. The LLM Reasoner only fires if the transaction contains unusual free-text fields. That’s about 15% of all transactions.

That’s the key insight: don’t send every transaction through every agent. Build a router that knows which agents to call and when.
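A minimal sketch of that routing idea, under loud assumptions: the `Transaction` fields, the `KNOWN_DESCRIPTIONS` allow-list, and the "unusual free text" heuristic are all hypothetical placeholders, not the production rules. The point is only that the expensive agent is appended conditionally.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    merchant_description: str = ""
    notes: str = ""


# Hypothetical allow-list of descriptions common enough to skip the LLM.
KNOWN_DESCRIPTIONS = {"grocery store", "coffee shop", "fuel station"}


def has_unusual_free_text(tx: Transaction) -> bool:
    """Placeholder heuristic: any note, or a description not on the allow-list."""
    description = tx.merchant_description.strip().lower()
    unseen_description = bool(description) and description not in KNOWN_DESCRIPTIONS
    return bool(tx.notes.strip()) or unseen_description


def select_agents(tx: Transaction) -> list[str]:
    """Return the agents to run, in order; the LLM is included only when needed."""
    agents = ["enricher", "pattern_matcher"]
    if has_unusual_free_text(tx):
        agents.append("llm_reasoner")
    agents += ["risk_scorer", "escalation_decider"]
    return agents


routine = select_agents(Transaction(amount=4.50, merchant_description="Coffee Shop"))
unusual = select_agents(Transaction(amount=950.0, notes="urgent gift cards for boss"))
print(routine)
print(unusual)
```

Whatever the real heuristic looks like, keeping it a cheap, pure function means the router itself adds almost nothing to the latency budget.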

The Multi-Agent Orchestration Flow

Here’s the actual orchestration logic we run. Simplified, but this is the skeleton:

```python
from ecoa import AgentOrchestrator, WorkflowBuilder

def fraud_detection_workflow():
    workflow = WorkflowBuilder("fraud_pipeline")

    # Step 1: Enrich transaction asynchronously
    workflow.add_agent("enricher",
        agent_type="transaction_enricher",
        max_retries=1,
        timeout_ms=5000)

    # Step 2: Branch based on enrichment result
    workflow.add_conditional_route(
        condition=lambda ctx: ctx["enricher"]["device_risk"] > 0.8,
        true_path="pattern_matcher_fast",
        false_path="pattern_matcher_deep"
    )

    # Step 3: Parallel pattern matching + LLM reasoning
    workflow.add_parallel(
        agents=["pattern_matcher", "llm_reasoner"],
        merge_strategy="weighted_average",
        timeout_ms=500
    )

    # Step 4: Risk scoring
    workflow.add_agent("risk_scorer",
        input_from=["enricher", "pattern_matcher", "llm_reasoner"])

    # Step 5: Decision
    workflow.add_agent("escalation_decider",
        input_from=["risk_scorer"])

    return workflow

orchestrator = AgentOrchestrator(workflow=fraud_detection_workflow())
```

The parallel step was the game-changer. Instead of calling agents sequentially, we run pattern matching and LLM reasoning at the same time. The merge strategy weights the LLM output higher when it’s confident (low entropy), and relies more on pattern matching when the LLM is uncertain.
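One way such an entropy-weighted merge could look, as a sketch rather than the platform's actual `weighted_average` implementation: we assume both agents emit a fraud probability in [0, 1], and the `max_llm_weight` cap of 0.7 is an illustrative value we picked for the example.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a yes/no prediction: 0 = certain, 1 = coin flip."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def llm_weight(llm_score: float, max_llm_weight: float = 0.7) -> float:
    """Weight given to the LLM: high when its entropy is low, zero at a coin flip."""
    return max_llm_weight * (1.0 - binary_entropy(llm_score))


def merge_scores(llm_score: float, pattern_score: float,
                 max_llm_weight: float = 0.7) -> float:
    """Blend the two scores; the pattern matcher gets whatever weight the LLM does not."""
    w = llm_weight(llm_score, max_llm_weight)
    return w * llm_score + (1.0 - w) * pattern_score


# An undecided LLM (0.5) contributes nothing; a confident one dominates.
print(merge_scores(llm_score=0.5, pattern_score=0.9))
print(merge_scores(llm_score=0.99, pattern_score=0.2))
```

The cap keeps the pattern matcher in the loop even when the LLM is maximally confident, which is a hedge against overconfident model output.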

Why do most architectures treat every agent as a serial step? You’re burning latency for no reason. If two agents don’t depend on each other, run them in parallel.
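Outside of any orchestration platform, the same idea can be sketched with plain `asyncio`. The two agent functions here are stand-ins that just sleep for roughly the response times from the agent table and return made-up scores; only `asyncio.gather` and `asyncio.wait_for` are real library calls.

```python
import asyncio


async def pattern_matcher(tx: dict) -> dict:
    await asyncio.sleep(0.005)  # stand-in for the ~5ms vector lookup
    return {"agent": "pattern_matcher", "score": 0.2}  # illustrative score


async def llm_reasoner(tx: dict) -> dict:
    await asyncio.sleep(0.028)  # stand-in for the ~28ms LLM call
    return {"agent": "llm_reasoner", "score": 0.3}  # illustrative score


async def run_parallel(tx: dict, timeout_s: float = 0.5) -> list[dict]:
    """Run both independent agents concurrently under one shared deadline."""
    return await asyncio.wait_for(
        asyncio.gather(pattern_matcher(tx), llm_reasoner(tx)),
        timeout=timeout_s,
    )


# gather() returns results in the order the coroutines were passed in,
# regardless of which one finishes first.
results = asyncio.run(run_parallel({"id": "tx-1"}))
print(results)
```

Run this way, the step costs about as much as the slower agent instead of the sum of both.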

The Results: What We Measured

After 6 weeks of development and 2 weeks of shadow deployment, we flipped the switch.

Metric	Before	After	Improvement
Fraud detection precision	72%	99.7%	+27.7pp
Recall	65%	98.2%	+33.2pp
Average latency per tx	1.2s	42ms	28.6x faster
Cost per transaction	$0.03	$0.0018	94% reduction
Chargeback rate	8%	0.4%	95% reduction
Manual reviews per day	200	47	76% fewer

The Human Review Assistant agent was the unsung hero. It doesn’t just flag a transaction — it presents a one-paragraph summary of *why* it looks suspicious, with links to evidence. The analysts’ review time dropped from 15 minutes to 3 minutes.

Lessons Learned the Hard Way

We hit three major gotchas. Maybe they’ll save you some pain.

1. Agent timeouts must be dynamic, not static. Fraud patterns change. Sometimes the LLM Reasoner takes 500ms because it’s parsing a long merchant description. Other times it finishes in 100ms. We set fixed 200ms timeouts initially. Got 23% timeouts on valid transactions. Now we use a percentile-based timeout: if 95% of recent calls finished in under 300ms, we set the timeout at 400ms. It adapts.

2. The pattern matcher needs daily retraining, not weekly. Fraudsters adapt fast. We trained our FAISS index on the latest batch of confirmed fraud. But a 7-day refresh cycle meant we missed new patterns. Moved to a daily pipeline. Precision jumped from 96% to 99.7% overnight.

3. Don’t trust LLM confidence scores blindly. Claude 3.5 Haiku would report that it was “very confident” in its assessment, but we found that high stated confidence didn’t correlate with accuracy on borderline transactions. We added a calibration layer: a simple logistic regression on top of the LLM’s output log probabilities. That cut our false positives by 34%.
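The percentile-based timeout from the first lesson can be sketched in a few lines. This is an illustration, not our production code: the class name, the window size, the floor, and the choice of a multiplicative headroom of 4/3 (which turns a 300ms p95 into a 400ms timeout) are all assumptions made for the example.

```python
from collections import deque


class AdaptiveTimeout:
    """Derive a timeout from the p95 of recently observed latencies."""

    def __init__(self, window: int = 1000, percentile: int = 95,
                 headroom: float = 4 / 3, floor_ms: float = 50.0,
                 default_ms: float = 200.0) -> None:
        self._latencies: deque[float] = deque(maxlen=window)  # rolling window
        self.percentile = percentile
        self.headroom = headroom      # assumed multiplier on top of the percentile
        self.floor_ms = floor_ms      # never go below this, even on quiet days
        self.default_ms = default_ms  # used until we have any observations

    def record(self, latency_ms: float) -> None:
        self._latencies.append(latency_ms)

    def current_ms(self) -> float:
        if not self._latencies:
            return self.default_ms
        ordered = sorted(self._latencies)
        # Nearest-rank percentile, computed with integer math to avoid float drift.
        rank = (self.percentile * len(ordered) + 99) // 100
        return max(self.floor_ms, ordered[rank - 1] * self.headroom)


timeout = AdaptiveTimeout()
for latency in [100.0] * 94 + [300.0] + [900.0] * 5:
    timeout.record(latency)
print(round(timeout.current_ms()))
```

The rolling window is what makes it adapt: as old observations fall out of the deque, the timeout follows the recent latency distribution rather than a number someone chose months ago.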

Why the Vietnamese Team Made the Difference

This project wasn’t just about the tech. It was about the team.

Our developers in Ho Chi Minh City have deep experience with both classical ML pipelines and modern LLM workflows. They’ve built fraud systems before — for e-commerce, logistics, banking. They understood the domain within days.

More importantly, they didn’t just implement my design. They challenged it. The parallel agent strategy? That came from a mid-level developer named Huy who said, “Why are we waiting for pattern matching to finish before calling the LLM? They don’t depend on each other.” He was right.

The $2,000/month cost for a mid-level developer in Vietnam made this project economically viable. A US-based equivalent would have blown the client’s budget.

The Bottom Line

We built a real-time fraud detection pipeline that handles 10K TPS with 99.7% precision. It’s running in production today. The client’s chargeback rate is down 95%. Their manual review team now handles fewer tickets with better context.

Could they have done this alone? Maybe. But not in 6 weeks. Not with the same cost structure. The combination of a skilled Vietnamese team and the ECOA AI Platform ACP’s multi-agent orchestration made the impossible possible.

Want to build something similar? Hire Vietnamese developers who know how to orchestrate AI agents for real-world production systems.

—

Frequently Asked Questions

How do you handle model drift for the pattern matching agent?

We run an automated daily retraining pipeline. Every midnight, the system pulls all transactions flagged as fraud from the past 24 hours, re-embeds them, and updates the FAISS index. The old index is kept for 7 days as a fallback. We also monitor precision: if it drops more than 0.5% in a day, the pipeline automatically flags the index for a manual quality review.
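The bookkeeping around that pipeline (snapshot retention and the drift check) can be sketched without touching FAISS itself. Everything named here is hypothetical: the snapshot paths, the function names, and our reading of "0.5%" as an absolute drop of 0.5 percentage points.

```python
from datetime import date, timedelta

RETENTION_DAYS = 7                # old indexes are kept this long as a fallback
MAX_DAILY_PRECISION_DROP = 0.005  # assumption: 0.5 percentage points per day


def prune_snapshots(snapshots: dict[date, str], today: date) -> dict[date, str]:
    """Keep only index snapshots that are at most RETENTION_DAYS old."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return {day: path for day, path in snapshots.items() if day >= cutoff}


def needs_manual_review(precision_yesterday: float, precision_today: float) -> bool:
    """True when precision fell by more than the allowed daily drop."""
    return (precision_yesterday - precision_today) > MAX_DAILY_PRECISION_DROP


today = date(2025, 1, 10)
# Hypothetical snapshot paths, one per day for the first ten days of the month.
snapshots = {date(2025, 1, d): f"index-2025-01-{d:02d}.faiss" for d in range(1, 11)}
kept = prune_snapshots(snapshots, today)
print(sorted(kept))
print(needs_manual_review(0.997, 0.990))
```

Keeping the previous snapshots addressable by date is what makes the fallback cheap: rolling back is a lookup, not a rebuild.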

Can this architecture work with open-source LLMs instead of Claude?

Yes. The LLM Reasoner agent is abstracted behind a model gateway. We’ve tested it with Llama 3.1 70B and Mistral Large. Inference cost drops by about 40%, but latency goes up by 2-3x. For real-time scoring at 10K TPS, you’d need significant GPU infrastructure. Claude 3.5 Haiku hits the sweet spot of cost, speed, and accuracy for this use case.
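For readers wondering what "abstracted behind a model gateway" can look like, here is a minimal sketch. The `ReasonerBackend` protocol, the `ModelGateway` class, and the backend names are hypothetical, and `FixedScoreBackend` is a stand-in that returns a constant instead of calling any real model API.

```python
from typing import Protocol


class ReasonerBackend(Protocol):
    def assess(self, text: str) -> float:
        """Return a fraud-likelihood score in [0, 1] for a free-text field."""
        ...


class FixedScoreBackend:
    """Stand-in backend with a constant score; a real one would call a model."""

    def __init__(self, score: float) -> None:
        self.score = score

    def assess(self, text: str) -> float:
        return self.score


class ModelGateway:
    """Single entry point for the reasoner; the active backend can be swapped."""

    def __init__(self, backends: dict[str, ReasonerBackend], active: str) -> None:
        if active not in backends:
            raise KeyError(f"unknown backend: {active}")
        self._backends = backends
        self._active = active

    def switch(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"unknown backend: {name}")
        self._active = name

    def assess(self, text: str) -> float:
        return self._backends[self._active].assess(text)


gateway = ModelGateway(
    {"hosted": FixedScoreBackend(0.2), "self_hosted": FixedScoreBackend(0.25)},
    active="hosted",
)
print(gateway.assess("wire transfer, urgent"))
gateway.switch("self_hosted")
print(gateway.assess("wire transfer, urgent"))
```

Because callers only ever see `assess`, swapping a hosted model for a self-hosted one becomes a configuration change rather than a code change in the agent.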

How do you prevent the system from creating bias against certain user groups?

We added a fairness monitor agent that runs asynchronously on a sample of 5% of transactions. It tracks protected attributes (inferred from enriched data) and raises an alert if approval/rejection rates deviate by more than 10% across groups. The client’s compliance team reviews these alerts weekly. We also avoid using demographic features in the enriched data — IP geolocation is granular enough to detect fraud without profiling users.
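The core check in such a monitor is small. The sketch below assumes the sampled decisions arrive as `(group, approved)` pairs and reads "deviate by more than 10%" as a gap of more than 10 percentage points between the highest and lowest group approval rates; both the data shape and that interpretation are our assumptions for the example.

```python
from collections import defaultdict

MAX_RATE_GAP = 0.10  # assumption: absolute gap in approval rate between groups


def approval_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Approval rate per group from (group, approved) pairs."""
    totals: dict[str, int] = defaultdict(int)
    approved: dict[str, int] = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def fairness_alert(decisions: list[tuple[str, bool]]) -> bool:
    """True when the best- and worst-treated groups differ by more than the gap."""
    rates = approval_rates(decisions)
    if len(rates) < 2:
        return False  # nothing to compare against
    return max(rates.values()) - min(rates.values()) > MAX_RATE_GAP


# Made-up sample: group A approved 90/100, group B approved 70/100.
sample = ([("A", True)] * 90 + [("A", False)] * 10
          + [("B", True)] * 70 + [("B", False)] * 30)
print(approval_rates(sample))
print(fairness_alert(sample))
```

Since the monitor only sees a 5% sample, small groups will produce noisy rates, so a minimum sample size per group is worth adding before alerting on real traffic.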