The Agent Selection Trap: Why Your Orchestrator Is Wasting Cycles on the Wrong AI Worker
You’ve built a multi-agent system. You’ve got a specialist agent for SQL generation, another one for API orchestration, a third for data validation. Your orchestrator sits in the middle, routing requests.
And it’s making the wrong calls. Constantly.
The SQL agent gets a question about API response parsing. The API agent gets a raw data validation task. They fail, retry, fail again, and your system latency doubles. You blame the agents. But it’s not their fault.
The problem is your orchestrator doesn’t know which agent is actually suited for the work.
The Default Router Is a Coin Flip
Most orchestration frameworks use one of three strategies to pick an agent:
- Round-robin: Just cycles through agents blindly.
- Keyword matching: Routes based on a simple tag like “sql” or “api.”
- LLM-based selection: Asks a large language model to decide.
Here’s the problem with each one.
Round-robin is a joke for any serious system. Keyword matching breaks the second your input doesn’t contain the exact magic word. And LLM-based selection? We tested it. It adds 2-3 seconds of latency per decision, costs money per call, and still gets it wrong 15-20% of the time.
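To make the keyword-matching failure concrete, here is a minimal sketch of a tag-based router. The routing table, the function name `route_by_tag`, and the sample requests are hypothetical, made up for illustration rather than taken from any particular framework.

```python
from typing import Optional

TAG_TO_AGENT = {"sql": "sql_agent", "api": "api_agent"}  # hypothetical routing table

def route_by_tag(request: str) -> Optional[str]:
    """Return the agent whose tag appears in the request, or None if no tag matches."""
    text = request.lower()
    for tag, agent in TAG_TO_AGENT.items():
        if tag in text:
            return agent
    return None

print(route_by_tag("Fix this SQL statement"))                 # sql_agent
print(route_by_tag("Join the orders and customers tables"))   # None
print(route_by_tag("Summarize the rapid growth in signups"))  # api_agent
```

The second request is clearly database work but never says "sql", so nothing matches. The third has nothing to do with APIs, yet "rapid" contains "api" and the request is misrouted.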
We were burning thousands of API calls a day just to decide which agent to talk to.
What We Built Instead: A Lightweight Scoring System
We needed something faster. Something deterministic. Something that didn’t require a PhD in prompt engineering.
The core idea: Every agent publishes a capability profile. The orchestrator scores each agent against the incoming task. The highest score wins.
Here’s the exact schema we used:
```python
from dataclasses import dataclass

@dataclass
class AgentCapability:
    name: str
    description: str
    keywords: list[str]
    input_schema: dict
    expected_output_type: str
    avg_latency_ms: float
    success_rate: float  # 0.0 to 1.0
```
Then the scoring function:
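```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:  # minimal shape, assumed from the fields the function reads
    description: str
    type: str
    deadline_ms: Optional[float] = None

def score_agent_for_task(agent: AgentCapability, task: Task) -> float:
    # 10 points for every agent keyword that appears in the task description
    description = task.description.lower()
    keyword_score = sum(10 for kw in agent.keywords if kw.lower() in description)
    # Boost agents that have actually succeeded on similar tasks
    # (get_historical_success_rate lives elsewhere; assumed to return 0.0 to 1.0)
    history_score = get_historical_success_rate(agent.name, task.type) * 20
    # Penalize slow agents for time-sensitive tasks
    latency_penalty = 0.0
    if task.deadline_ms:
        latency_penalty = max(0, (agent.avg_latency_ms / task.deadline_ms) * 15)
    return keyword_score + history_score - latency_penalty
```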
That’s it. No neural network. No LLM call. It runs in under 2 milliseconds.
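Turning scores into a routing decision is then just a maximum over the registered agents. The sketch below assumes the scores are already computed. The name `select_agent` and the sample numbers are illustrative, not part of the code above.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")

def select_agent(agents: Iterable[A], score: Callable[[A], float]) -> Optional[A]:
    """Return the agent with the highest score, or None if there are no agents."""
    return max(agents, key=score, default=None)

# Illustrative scores only; in practice `score` would wrap score_agent_for_task.
sample_scores = {"sql_agent": 34.0, "api_agent": 12.5, "validation_agent": 8.0}
best = select_agent(sample_scores, lambda name: sample_scores[name])
print(best)  # sql_agent
```

Returning `None` for an empty registry keeps the "no agents available" case explicit instead of raising from inside `max`.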
The Vietnamese Team That Made It Production-Ready
We prototyped this in a weekend. But turning it into a production system that handles close to 50,000 requests per minute? That’s where our team in Can Tho, Vietnam came in.
Honestly, we tried building this in-house for three weeks. We got stuck on the historical success tracking. The caching layer was a mess. The team in Vietnam took our prototype and shipped a production-ready version in 10 days.
They added two critical pieces we hadn’t thought of:
- A feedback loop: When an agent fails, the orchestrator automatically adjusts that agent’s keyword weights downward. It’s self-healing.
- A fallback chain: If the top-scored agent is down, the system doesn’t crash — it moves to the second-best option within 50ms.
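The fallback chain can be sketched as a walk down the score-ranked list until a healthy agent turns up. This is a simplified illustration: `route_with_fallback`, the `is_healthy` callback, and the agent names are assumptions, and the 50ms budget is not modelled here.

```python
from typing import Callable, Optional, Sequence

def route_with_fallback(
    ranked_agents: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Walk agents from best to worst score and return the first healthy one."""
    for name in ranked_agents:
        if is_healthy(name):
            return name
    return None  # every candidate is down; the caller decides what happens next

ranked = ["sql_agent", "api_agent", "validation_agent"]  # already sorted by score
down = {"sql_agent"}  # pretend the top-scored agent is unavailable
chosen = route_with_fallback(ranked, lambda name: name not in down)
print(chosen)  # api_agent
```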
We cut our agent misrouting rate from 18% to 2.3% in the first week.
Why This Matters More Than You Think
Your multi-agent orchestration is only as good as your agent selection. Period.
If you’re spending money on fine-tuned specialist agents but your router keeps sending them the wrong tasks, you’re burning cash and latency. We were paying $0.003 per LLM-based routing call. At 2 million calls a month, that’s $6,000 on routing alone.
The scoring system costs us near zero.
But here’s the uncomfortable question: Are you measuring your orchestrator’s accuracy at all? Most teams don’t. They measure agent performance in isolation. They never check if the right agent is even getting the task.
Start logging it. You’ll be shocked.
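If you have nothing in place yet, a sketch like the following is enough to get a first number. It assumes a failed task is a reasonable proxy for a misroute, which is a simplification, and the `RoutingRecord` fields and sample log are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RoutingRecord:
    task_id: str
    chosen_agent: str
    succeeded: bool  # did the chosen agent complete the task?

def misrouting_rate(records: list[RoutingRecord]) -> float:
    """Fraction of routed tasks where the chosen agent failed."""
    if not records:
        return 0.0
    failures = sum(1 for r in records if not r.succeeded)
    return failures / len(records)

log = [
    RoutingRecord("t1", "sql_agent", True),
    RoutingRecord("t2", "api_agent", False),
    RoutingRecord("t3", "sql_agent", True),
    RoutingRecord("t4", "validation_agent", True),
]
rate = misrouting_rate(log)
print(f"{rate:.0%}")  # 25%
```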
The Production Numbers
After two months in production with the scoring-based orchestrator:
| Metric | Before | After |
|---|---|---|
| Agent misrouting rate | 18% | 2.3% |
| Average task completion time | 4.2s | 1.8s |
| LLM routing costs | $6,000/mo | $0 |
| System throughput | 12K req/min | 47K req/min |
The throughput improvement didn’t come from faster routing alone. Most of it came from agents no longer wasting cycles on tasks they couldn’t handle. They actually did the work they were built for.
How to Implement This Today
You don’t need a massive refactor. Here’s the minimal path:
- Define capability profiles for each agent. Start with keywords and expected output types.
- Add a scoring function to your orchestrator. Use the code above as a starting point.
- Log every routing decision and its outcome. You can’t improve what you don’t measure.
- Add a feedback loop that adjusts scores based on real outcomes.
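For the last step, one simple way to fold real outcomes back into the scores is an exponential moving average of success per agent and task type. This is an illustrative stand-in, not a description of the production feedback loop. The class name `SuccessTracker`, the smoothing factor, and the 0.5 starting prior are all assumptions.

```python
from collections import defaultdict

class SuccessTracker:
    """Tracks a smoothed success rate per (agent, task type) pair."""

    def __init__(self, alpha: float = 0.1, prior: float = 0.5):
        self.alpha = alpha  # weight given to the newest outcome (assumed value)
        self.rates = defaultdict(lambda: prior)  # unseen pairs start at the prior

    def record(self, agent: str, task_type: str, succeeded: bool) -> None:
        """Blend the latest outcome into the running rate for this pair."""
        key = (agent, task_type)
        outcome = 1.0 if succeeded else 0.0
        self.rates[key] = (1 - self.alpha) * self.rates[key] + self.alpha * outcome

    def rate(self, agent: str, task_type: str) -> float:
        return self.rates[(agent, task_type)]

tracker = SuccessTracker()
tracker.record("sql_agent", "sql_generation", succeeded=True)
tracker.record("api_agent", "sql_generation", succeeded=False)
```

A tracker like this could sit behind a function such as `get_historical_success_rate`: successes nudge an agent's rate up, failures nudge it down, and recent outcomes count more than old ones.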
We built this in two weeks with a team of three mid-level developers in Vietnam. Total cost: about $3,000. The monthly savings on LLM routing calls alone paid for it in the first month.
The Real Lesson
Multi-agent orchestration isn’t about having the smartest agents. It’s about having a router that knows which agent to use and when. That’s the bottleneck nobody talks about.
We’ve been conditioned to think “more AI” solves everything. But sometimes the best fix is a simple scoring function and a team that knows how to ship fast.
The Vietnamese developers in Can Tho didn’t just build the feature. They showed us that the orchestrator’s job isn’t to be intelligent — it’s to be accurate. Speed comes from accuracy, not from clever prompts.
Stop treating your orchestrator like a brain. Treat it like a switchboard operator. Give it a good directory and let it do its job.
—
Frequently Asked Questions
How do you handle agents with overlapping capabilities in the scoring system?
We track historical success rates per task type. If two agents score close, the one with higher success on similar tasks wins. We also add a small random factor (±2%) to break ties and collect data on both options.
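A sketch of the tie-breaking idea, assuming the ±2% is applied as a multiplicative factor on each score (the exact form is an assumption). The function name `jittered` and the agent names are hypothetical.

```python
import random

def jittered(score: float, rng: random.Random, spread: float = 0.02) -> float:
    """Scale a score by a random factor in [1 - spread, 1 + spread]."""
    return score * (1.0 + rng.uniform(-spread, spread))

rng = random.Random(42)  # seeded only so the example is repeatable
scores = {"sql_agent": 30.0, "reporting_agent": 30.0}  # an exact tie
jittered_scores = {name: jittered(s, rng) for name, s in scores.items()}
winner = max(jittered_scores, key=lambda name: jittered_scores[name])
```

Because the factor is small, it only changes the outcome when two agents are already close, so both get traffic and both accumulate history.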
Does the scoring system work with dynamically created agents?
Yes. When a new agent registers, it starts with a default profile based on its declared capabilities. The system learns its actual strengths within 50-100 tasks through the feedback loop. We’ve seen it converge to accurate routing within hours.
What happens if all agents score below a threshold?
We have a fallback. If no agent scores above 30, the orchestrator sends the task to a general-purpose LLM agent. That agent processes the request and also suggests which specialist agent should handle similar tasks in the future. It’s a self-improving system.
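The threshold check itself is small. A minimal sketch, assuming scores arrive as a name-to-score mapping; `pick_or_fallback` and the fallback agent's name are hypothetical, and the suggestion step is left out.

```python
from typing import Mapping

MIN_SCORE = 30.0  # threshold quoted in the answer above
GENERAL_AGENT = "general_llm_agent"  # hypothetical name for the fallback agent

def pick_or_fallback(scores: Mapping[str, float]) -> str:
    """Return the top specialist, or the general agent if none scores above MIN_SCORE."""
    if scores:
        best = max(scores, key=lambda name: scores[name])
        if scores[best] > MIN_SCORE:
            return best
    return GENERAL_AGENT

print(pick_or_fallback({"sql_agent": 42.0, "api_agent": 18.0}))  # sql_agent
print(pick_or_fallback({"sql_agent": 22.0, "api_agent": 18.0}))  # general_llm_agent
```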
Can this work with agents running on different infrastructure (e.g., cloud vs. on-prem)?
Absolutely. The capability profile includes a latency field. All else being equal, an on-prem agent that’s 10ms away will score higher for time-sensitive tasks than a cloud agent at 80ms. The orchestrator doesn’t care where the agent lives. It only cares about the score.
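To put numbers on that, here is the latency penalty from the scoring function applied to both agents. The 100ms deadline is an assumed value chosen for illustration. With it, the on-prem agent loses about 1.5 points and the cloud agent about 12.

```python
def latency_penalty(avg_latency_ms: float, deadline_ms: float) -> float:
    """Same formula as the scoring function: (latency / deadline) * 15, floored at 0."""
    return max(0.0, (avg_latency_ms / deadline_ms) * 15)

DEADLINE_MS = 100.0  # assumed deadline, chosen only for illustration

on_prem_penalty = latency_penalty(10.0, DEADLINE_MS)  # agent 10 ms away
cloud_penalty = latency_penalty(80.0, DEADLINE_MS)    # agent 80 ms away
gap = cloud_penalty - on_prem_penalty  # points the cloud agent must make up elsewhere
```

A gap of roughly 10.5 points is about one keyword match, so a cloud agent with clearly better keyword or history scores can still win.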