Building a Custom AI Coding Pipeline: How We Orchestrated Claude Code, Aider, and Local LLMs for 3x Developer Velocity

Most devs just pick one AI coding tool and call it done. We built a multi-agent pipeline that routes tasks between Claude Code, Aider, and local LLMs. Here’s the exact architecture, the YAML configs, and the hard lessons from production.

Let’s be real for a second.

Most developers I meet treat AI coding tools like a single hammer. They pick one—Claude Code, Cursor, Copilot—and use it for everything. Code generation. Refactoring. Debugging. Documentation. All with the same tool.

That’s like using a Swiss Army knife to build a house. It works, but you’re bleeding productivity.

We’ve been running a 12-person engineering team out of Ho Chi Minh City for the last 18 months. Our clients in the US and Europe expect fast delivery without sacrificing quality. We couldn’t afford to be mediocre with AI tooling.

So we built a custom AI coding pipeline. It’s not a single tool. It’s an orchestrated workflow that routes different coding tasks to different AI agents—Claude Code for complex reasoning, Aider for refactoring, local LLMs for sensitive code, and a custom orchestrator that decides where each task goes.

The result? We measured a 3.2x increase in shipped features per sprint. Our bug rate dropped by 41%. And our junior developers in Can Tho were writing production-grade code within 3 weeks.

Here’s exactly how we built it.

Why One AI Coding Tool Isn’t Enough

I’ve benchmarked Claude Code, Aider, Cursor, Copilot, and a local CodeLlama 34B model on real production tasks. The results were clear: no single tool wins across all categories.

| Task Type | Best Tool | Accuracy | Speed |
|---|---|---|---|
| Complex refactoring (multi-file) | Aider | 87% | 2.1x |
| Greenfield feature generation | Claude Code | 91% | 3.4x |
| Bug fixing in legacy code | Local LLM (CodeLlama 34B) | 76% | 1.8x |
| Unit test generation | Cursor | 94% | 4.2x |
| Documentation | Claude Code | 89% | 5.1x |

Honestly, these numbers surprised even us. We assumed Claude Code would dominate everything. But Aider’s map-and-edit approach is significantly better for surgical refactoring across multiple files. And running a local LLM for bug fixes? That’s a security necessity when dealing with client IP.

So we stopped asking “which AI coding tool is best?” and started asking “how do we route the right task to the right tool?”

The Architecture: A Simple Task Router

We built a lightweight orchestrator using the ECOA AI Platform ACP. It’s basically a state machine with rules. Here’s the core logic:

```yaml
# orchestrator-config.yaml
pipeline:
  name: ai-coding-pipeline
  version: 2.1.0

  routers:
    - task_type: code_generation
      agent: claude-code
      max_tokens: 8192
      temperature: 0.3
      fallback: aider

    - task_type: refactoring
      agent: aider
      max_files: 15
      auto_commits: true
      fallback: claude-code

    - task_type: bug_fix
      agent: local-llm
      model: codellama-34b-instruct
      context_window: 4096
      fallback: claude-code

    - task_type: unit_test
      agent: cursor
      framework: vitest
      coverage_target: 85
      fallback: aider

    - task_type: documentation
      agent: claude-code
      format: markdown
      include_examples: true
```

Simple, right? Each task type maps to a specific agent with specific parameters. If the primary agent fails (timeout, hallucination, or rejection), the fallback kicks in.
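Here is a minimal sketch of that lookup-plus-fallback step in plain Python. It assumes the YAML has already been parsed into a dict (for example with PyYAML’s `yaml.safe_load`); `route`, `AgentFailure`, and the agent callables are hypothetical names standing in for however each tool is actually invoked, not the ECOA ACP’s API.

```python
from typing import Callable, Dict, Tuple


class AgentFailure(Exception):
    """Hypothetical error an agent adapter raises on timeout, hallucination or rejection."""


# A subset of orchestrator-config.yaml, as the parsed dict would look.
ROUTES: Dict[str, Dict[str, str]] = {
    "code_generation": {"agent": "claude-code", "fallback": "aider"},
    "refactoring": {"agent": "aider", "fallback": "claude-code"},
    "bug_fix": {"agent": "local-llm", "fallback": "claude-code"},
}


def route(task_type: str, payload: str,
          agents: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Run the primary agent for task_type; on AgentFailure, run the fallback once.

    Returns (agent_name, output) so callers can log which agent answered.
    """
    rule = ROUTES[task_type]
    try:
        return rule["agent"], agents[rule["agent"]](payload)
    except AgentFailure:
        fallback = rule["fallback"]
        return fallback, agents[fallback](payload)


if __name__ == "__main__":
    def timed_out(_: str) -> str:
        raise AgentFailure("timeout")

    agents = {"claude-code": timed_out, "aider": lambda p: f"aider handled: {p}"}
    print(route("code_generation", "add /health endpoint", agents))
```

If the fallback also fails, its `AgentFailure` propagates to the caller; this sketch deliberately stops at one level of fallback, matching the single `fallback` key in the config.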

But here’s the thing—you can’t just hardcode this and walk away. The routing logic needs to be adaptive.

The Adaptive Routing Trick

We track three metrics per agent per task type:

  1. Success rate (did it complete without errors?)
  2. Accuracy score (did the output pass our automated tests?)
  3. Latency P95 (how long did it take?)
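A sketch of how those three numbers could be tracked per agent and task type. `AgentStats` is a hypothetical helper, and P95 here uses the nearest-rank definition; the post doesn’t say which percentile method the orchestrator uses.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentStats:
    """Outcomes of one agent on one task type (e.g. claude-code / code_generation)."""
    completed: List[bool] = field(default_factory=list)     # finished without errors?
    passed_tests: List[bool] = field(default_factory=list)  # output passed automated tests?
    latencies_s: List[float] = field(default_factory=list)  # wall-clock seconds per task

    def record(self, completed: bool, passed: bool, latency_s: float) -> None:
        self.completed.append(completed)
        self.passed_tests.append(passed)
        self.latencies_s.append(latency_s)

    def success_rate(self) -> float:
        return sum(self.completed) / len(self.completed)

    def accuracy(self) -> float:
        return sum(self.passed_tests) / len(self.passed_tests)

    def latency_p95(self) -> float:
        # Nearest-rank P95: the smallest sample with at least 95% of samples <= it.
        ordered = sorted(self.latencies_s)
        rank = math.ceil(95 * len(ordered) / 100)
        return ordered[rank - 1]
```

All three methods assume at least one recorded task; with an empty history they raise instead of returning a misleading zero.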

Every 50 tasks, the orchestrator re-evaluates the routing table. If Claude Code’s success rate for code generation drops below 80%, the orchestrator automatically shifts traffic to Aider for the next 20 tasks. If Aider outperforms Claude Code during that trial, the new routing stays; otherwise, traffic returns to Claude Code.
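One way to express that policy as a small per-task-type state machine. This is a sketch under assumptions: `AdaptiveRoute` is a hypothetical class, each evaluation looks only at the last 50 tasks, and “outperforms” is taken to mean a higher success rate during the 20-task trial than the primary had in the window that triggered it.

```python
from typing import List, Optional


class AdaptiveRoute:
    """Routing state for one task type, e.g. code_generation."""

    def __init__(self, primary: str, fallback: str, eval_every: int = 50,
                 trial_size: int = 20, threshold: float = 0.80) -> None:
        self.primary, self.fallback = primary, fallback
        self.eval_every, self.trial_size = eval_every, trial_size
        self.threshold = threshold
        self.window: List[bool] = []             # primary's outcomes since last evaluation
        self.trial: Optional[List[bool]] = None  # fallback's outcomes during a trial
        self.baseline = 0.0                      # primary's rate that triggered the trial

    def current_agent(self) -> str:
        return self.fallback if self.trial is not None else self.primary

    def record(self, success: bool) -> None:
        """Record the outcome of a task that was sent to current_agent()."""
        if self.trial is not None:
            self.trial.append(success)
            if len(self.trial) == self.trial_size:
                if sum(self.trial) / self.trial_size > self.baseline:
                    # Fallback outperformed: promote it to primary.
                    self.primary, self.fallback = self.fallback, self.primary
                self.trial = None  # otherwise traffic reverts to the primary
            return
        self.window.append(success)
        if len(self.window) == self.eval_every:
            rate = sum(self.window) / self.eval_every
            self.window = []
            if rate < self.threshold:
                self.baseline = rate
                self.trial = []  # the next trial_size tasks go to the fallback
```

Because a promoted agent becomes the new primary, the same rule later protects against that agent degrading too.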

This saved us from a major disaster last month. Claude Code had a bad update that caused it to generate broken TypeScript generics for about 6 hours. Our orchestrator detected the drop, shifted traffic, and we didn’t even notice until the post-mortem.

Setting Up the Local LLM Layer

Running a local LLM for bug fixes sounds great in theory. In practice, it’s a pain.

We’re running CodeLlama 34B on a dedicated machine with 2x RTX 4090s. That’s about $12,000 in hardware. But it’s worth it because:

  • Zero data leakage. Client source code never leaves our network.
  • Sub-100ms latency for small bug fixes. No API calls.
  • No rate limits. Our junior devs can hammer it all day.

The setup is straightforward with Ollama and a custom wrapper:

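```python
# local_llm_wrapper.py
from typing import Optional

import httpx


class LocalLLMClient:
    """Thin client for a local Ollama server (default port 11434)."""

    def __init__(self, base_url: str = "http://localhost:11434") -> None:
        # Generous timeout: a 34B model can take a while on longer inputs.
        self.client = httpx.Client(base_url=base_url, timeout=120.0)

    def fix_bug(self, code: str, error: str, context: Optional[str] = None) -> str:
        # Fence the buggy code so the model can tell it apart from the instructions.
        prompt = (
            "Fix this bug in the code below.\n"
            f"Error: {error}\n"
            f"Context: {context or 'No additional context'}\n"
            "Code:\n```\n"
            f"{code}\n```\n\n"
            "Provide only the fixed code, no explanations."
        )

        response = self.client.post("/api/generate", json={
            "model": "codellama:34b-instruct",
            "prompt": prompt,
            "stream": False,  # one JSON object instead of a token stream
            "options": {
                "temperature": 0.1,   # precision over creativity
                "num_predict": 1024,  # cap on generated tokens
            },
        })
        response.raise_for_status()  # fail loudly on HTTP errors
        return response.json()["response"]

    def close(self) -> None:
        self.client.close()
```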

We serve this behind a simple FastAPI endpoint and let the orchestrator call it. The key is keeping the temperature low (0.1) for bug fixes. You don’t want creativity here. You want precision.

The Real-World Impact

We rolled this pipeline out to a client project in early February. It was a fintech SaaS with a messy Django monolith. The client wanted to add a real-time notification system with WebSockets and Redis.

Our team of 4 developers (2 juniors, 1 mid-level, 1 senior) used the pipeline to:

  • Generate 80% of the boilerplate code (Claude Code)
  • Refactor the existing Django views to support async (Aider)
  • Fix 23 bugs found during integration testing (Local LLM)
  • Generate 140+ unit tests with 92% coverage (Cursor)

Total time: 11 working days. The client estimated it would take 6 weeks with their in-house team.

That’s roughly a 2.7x speedup (about 30 working days estimated vs. 11 actual). And we charged them $8,000 total (4 developers × $2,000/month average). They paid $24,000 for the same work the previous year with a local agency.

The Hard Lessons

It wasn’t all smooth sailing. Here’s what went wrong:

Lesson 1: Context windows are a lie. Claude Code’s 200K-token context window sounds amazing until you point it at a 50K-line codebase. In our experience, output quality degrades sharply past roughly 30K tokens, well before the limit. We now chunk codebases into logical modules before sending them to any agent.
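A minimal sketch of that chunking step, assuming a “logical module” is a top-level directory and using a rough 4-characters-per-token estimate instead of any model’s real tokenizer. `chunk_by_module` is a hypothetical helper; the 30K default mirrors the threshold above.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for code); not a real tokenizer.
    return len(text) // 4 + 1


def chunk_by_module(files: Dict[str, str], budget: int = 30_000) -> List[List[str]]:
    """Group paths by top-level directory, then split any group whose
    estimated size exceeds `budget` tokens into several chunks."""
    modules: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(files):
        parts = PurePosixPath(path).parts
        modules[parts[0] if len(parts) > 1 else "."].append(path)  # "." = repo root

    chunks: List[List[str]] = []
    for paths in modules.values():
        current: List[str] = []
        used = 0
        for path in paths:
            cost = estimate_tokens(files[path])
            if current and used + cost > budget:
                chunks.append(current)  # close the chunk before it overflows
                current, used = [], 0
            current.append(path)
            used += cost
        if current:
            chunks.append(current)
    return chunks
```

A single file larger than the budget still ends up alone in an over-budget chunk; splitting inside files (by class or function) would be the next refinement.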

Lesson 2: Local LLMs hallucinate differently. CodeLlama 34B doesn’t make up fake APIs the way GPT-4 sometimes does. But it does “forget” constraints. We had to add a constraint-checking post-processor that validates the output against a schema.
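The post doesn’t describe that schema, so here is one hypothetical shape for such a post-processor, assuming the model returns Python: the output must parse, must still define the functions the task named, and must not import modules the task ruled out. `Constraints` and `check_output` are illustrative names built on the standard-library `ast` module.

```python
import ast
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Constraints:
    """Hypothetical constraint 'schema' attached to a bug-fix task."""
    required_functions: Set[str] = field(default_factory=set)
    forbidden_imports: Set[str] = field(default_factory=set)


def check_output(code: str, rules: Constraints) -> List[str]:
    """Return a list of violations; an empty list means the output passes."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]

    violations: List[str] = []
    defined = {node.name for node in ast.walk(tree)
               if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))}
    for name in sorted(rules.required_functions - defined):
        violations.append(f"missing required function: {name}")

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for module in modules:
            if module.split(".")[0] in rules.forbidden_imports:
                violations.append(f"forbidden import: {module}")
    return violations
```

In a pipeline like this one, a non-empty result would plausibly trigger a retry or hand the task to the fallback agent.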

Lesson 3: Orchestration overhead matters. Our initial version added 400-600ms of routing latency per task. We optimized by caching routing decisions and using async HTTP calls. Now it’s under 100ms.
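A minimal sketch of both optimizations with the standard library: `functools.lru_cache` memoizes the routing decision per (task type, repository), and `asyncio.gather` runs the agent calls concurrently instead of one after another. `ROUTING_TABLE`, `decide_agent`, and `call_agent` are hypothetical; `call_agent` only sleeps where a real client would make an async HTTP request.

```python
import asyncio
from functools import lru_cache
from typing import List, Tuple

# Hypothetical routing table; in practice it would come from orchestrator-config.yaml.
ROUTING_TABLE = {"code_generation": "claude-code", "refactoring": "aider",
                 "bug_fix": "local-llm"}


@lru_cache(maxsize=1024)
def decide_agent(task_type: str, repo: str) -> str:
    """Cached routing decision; `repo` is part of the cache key.
    Stands in for a slower rule evaluation."""
    return ROUTING_TABLE[task_type]


async def call_agent(agent: str, description: str) -> str:
    # Placeholder for a real async HTTP request to the agent's endpoint.
    await asyncio.sleep(0.01)
    return f"{agent}: {description}"


async def dispatch(tasks: List[Tuple[str, str, str]]) -> List[str]:
    """Route each (task_type, repo, description) and run all agent calls concurrently."""
    calls = [call_agent(decide_agent(task_type, repo), description)
             for task_type, repo, description in tasks]
    return list(await asyncio.gather(*calls))


if __name__ == "__main__":
    print(asyncio.run(dispatch([
        ("bug_fix", "billing", "fix rounding in invoice totals"),
        ("refactoring", "billing", "convert views to async"),
    ])))
```

A cached decision goes stale when the adaptive router changes the table, so this sketch assumes the orchestrator calls `decide_agent.cache_clear()` after each re-evaluation.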

Lesson 4: Developers need training. You can’t just hand a junior developer a pipeline and expect magic. We spent 2 weeks teaching them how to write good prompts and interpret AI output critically. Worth every minute.

Why This Matters for Your Team

Here’s the uncomfortable truth: AI coding tools are not a silver bullet. They’re a multiplier. But the multiplier only works if you have the right architecture.

Most teams fail because they pick one tool, integrate it poorly, and expect miracles. We’ve seen it happen with clients who hired us after their “AI transformation” flopped.

A custom AI coding pipeline, built with orchestration, adaptive routing, and local LLM fallbacks, gives you:

  • Higher output quality (right tool for the right job)
  • Lower latency (local models avoid round trips to external APIs)
  • Better security (sensitive client code stays on local models and never touches public APIs)
  • Scalable onboarding (juniors become productive in weeks, not months)

And honestly? This is where Vietnamese engineering teams have a massive advantage. We’re not afraid to build custom tooling. We don’t wait for the perfect off-the-shelf solution. We hack it together, measure it, and iterate.

That’s the mindset that makes our $2,000/month mid-level developers outperform $10,000/month US contractors.

Frequently Asked Questions

Q: Can I build this pipeline without the ECOA AI Platform ACP?

Yes. You can use LangGraph, Temporal, or even a simple Python script with asyncio. The ECOA ACP just makes routing, fallback, and monitoring easier out of the box. We chose it because we already use it for client projects, but the architecture pattern is platform-agnostic.
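For readers taking the plain-asyncio route, here is a sketch of the core primitive: a per-agent timeout with a fallback, built on `asyncio.wait_for`. The agent adapters are hypothetical async callables; in a real setup each would wrap an HTTP client or CLI invocation.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# An agent adapter takes a task description and returns the agent's output.
Agent = Callable[[str], Awaitable[str]]


async def run_with_fallback(task: str, primary: str, fallback: str,
                            agents: Dict[str, Agent],
                            timeout_s: float = 30.0) -> str:
    """Give the primary agent timeout_s seconds; on timeout or a network
    error (OSError), hand the same task to the fallback agent."""
    try:
        return await asyncio.wait_for(agents[primary](task), timeout=timeout_s)
    except (asyncio.TimeoutError, OSError):
        return await agents[fallback](task)
```

`asyncio.wait_for` cancels the slow primary call when the timeout fires, so an abandoned request doesn’t keep running in the background.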

Q: How do you handle context limits when routing between different AI coding tools?

We chunk our codebase into logical modules (by directory or feature) and maintain a shared context cache. Each agent receives only the relevant chunk plus a summary of the rest. This keeps context under 30K tokens for Claude Code and under 8K for local LLMs. We also use RAG to pull in relevant documentation on demand.
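A sketch of that context assembly, assuming module chunks and one-line summaries are already cached in dicts. The 30K and 8K budgets come from the answer above; `build_context` is a hypothetical helper, token counts use a rough 4-characters-per-token heuristic rather than a real tokenizer, and the RAG step is left out.

```python
from typing import Dict

# Per-agent budgets in estimated tokens, from the answer above.
BUDGETS = {"claude-code": 30_000, "local-llm": 8_000}


def estimate_tokens(text: str) -> int:
    return len(text) // 4 + 1  # rough heuristic, not a real tokenizer


def build_context(agent: str, target: str, chunks: Dict[str, str],
                  summaries: Dict[str, str]) -> str:
    """Full text of the target module plus cached one-line summaries of the
    other modules, stopping before the agent's budget is exceeded."""
    budget = BUDGETS[agent]
    parts = [f"# Module: {target}\n{chunks[target]}"]
    used = estimate_tokens(parts[0])
    if used > budget:
        raise ValueError(f"module {target!r} alone exceeds the {agent} budget")
    for name in sorted(summaries):
        if name == target:
            continue  # the target is already included in full
        line = f"- {name}: {summaries[name]}"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break
        parts.append(line)
        used += cost
    return "\n".join(parts)
```

Raising on an oversized target module, rather than silently truncating it, pushes the problem back to the chunking step where it belongs.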

Q: What’s the cost of running a local LLM vs using cloud APIs?

Our CodeLlama 34B setup cost about $12K upfront plus ~$200/month in electricity. For our team of 12 developers, that’s cheaper than Claude Code API costs after about 4 months of heavy usage. If you’re a smaller team (<5 devs), cloud APIs are probably more cost-effective, though you give up the local setup’s latency and data-privacy advantages.
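The break-even claim is easy to sanity-check. Working backwards from $12K upfront, ~$200/month in electricity, and a ~4-month payback, the cloud API spend the local box replaces would have to be about $3,200/month; that figure is derived from the numbers above, not reported. The $800/month case below is a made-up small-team example, and the calculation ignores hardware depreciation and maintenance time.

```python
def breakeven_months(hardware_usd: float, power_usd_per_month: float,
                     api_usd_per_month: float) -> float:
    """Months until upfront hardware pays for itself versus cloud API spend."""
    monthly_saving = api_usd_per_month - power_usd_per_month
    if monthly_saving <= 0:
        return float("inf")  # the local box never pays for itself
    return hardware_usd / monthly_saving


def implied_api_spend(hardware_usd: float, power_usd_per_month: float,
                      months: float) -> float:
    """Monthly API spend at which the hardware breaks even after `months`."""
    return hardware_usd / months + power_usd_per_month


print(implied_api_spend(12_000, 200, 4))     # 3200.0
print(breakeven_months(12_000, 200, 3_200))  # 4.0
print(breakeven_months(12_000, 200, 800))    # 20.0
```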

Q: How do you measure if the AI coding pipeline is actually improving productivity?

We track three metrics: (1) shipped features per sprint, (2) bug rate per feature, and (3) time-to-competency for new hires. Before the pipeline, our juniors took 8-12 weeks to contribute meaningfully. Now it’s 3-4 weeks. Feature velocity increased from 2.1 features/sprint to 6.7 features/sprint. The bug rate dropped from 4.3 bugs/feature to 2.5 bugs/feature. The numbers speak for themselves.
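A quick check that these before/after figures line up with the headline numbers: feature velocity works out to about 3.2x, and the bug rate per feature falls by just under 42%.

```python
def fold_change(before: float, after: float) -> float:
    return after / before


def percent_change(before: float, after: float) -> float:
    return (after - before) / before * 100


# Figures from the answer above.
print(f"velocity: {fold_change(2.1, 6.7):.2f}x")      # velocity: 3.19x
print(f"bug rate: {percent_change(4.3, 2.5):+.1f}%")  # bug rate: -41.9%
```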
