How We Cut AI Coding Tool Hallucinations by 61% Using a Custom Context Injection Pipeline

AI coding tools write convincing nonsense without full codebase context. Here's how we built a Python pipeline that injects relevant AST-parsed context, cutting hallucinations by 61% and saving hours of rework.


You’ve seen it happen. You ask an AI coding tool to refactor a service class, and it invents methods that don’t exist. It hallucinates imports from libraries you’ve never installed. It *sounds* right, but the code doesn’t compile.

We ran into this constantly on a client project here in Ho Chi Minh City — a US-based fintech building a real-time fraud detection system. The junior devs would generate code with Copilot, Cursor, or Claude Code, and then spend hours debugging hallucinations. One day, a team lead asked: *“Why does it keep inventing APIs that we never wrote?”*

The answer was obvious: the AI didn’t know our codebase. It only saw whatever context we typed into the prompt. So we built a context injection pipeline that feeds the AI relevant parts of the actual codebase before every generation request. The result? The share of generated files needing a manual fix dropped from 23% to 9%, a 61% reduction.

Here’s exactly how we did it.

The Problem: AI Coding Tools Are Blind to Your Codebase

Most AI coding tools work like this:

  • You write a prompt describing what you need.
  • The model guesses based on its training data + whatever snippet you’ve provided.

If your prompt is vague, the model fills the gaps with hallucinated APIs. Even with a full file as context, it doesn’t know the rest of your project — the types, the utility functions, the existing patterns.

A typical hallucination we saw:

```python
# User prompt: "Add a method to fetch user transactions"
def get_user_transactions(user_id: str) -> list[Transaction]:
    return TransactionService.fetch_by_user(user_id)  # TransactionService doesn't exist!
```

Our project used `UserTransactionRepository`, not `TransactionService`. The AI just made it up.

The Solution: Context Injection via Static Analysis

We built a lightweight Python pipeline that runs *before* you send the prompt. It does three things:

  1. Parses the current file using the `ast` module to identify the class, methods, and imports.
  2. Searches the codebase for related symbols (used types, parent classes, similar methods).
  3. Injects a structured context block into the prompt automatically.

No RAG. No vector database. Just targeted static analysis.

The Core Pipeline Code

Here’s the simplified version we run as a pre-hook:

```python
import ast
import glob
from typing import Dict, List


class ContextExtractor:
    def __init__(self, project_root: str):
        self.root = project_root

    def extract_file_symbols(self, filepath: str) -> Dict:
        """Collect class, function, and import names from one file."""
        with open(filepath, "r", encoding="utf-8") as f:
            tree = ast.parse(f.read())
        classes, functions, imports = [], [], []
        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                classes.append(node.name)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                functions.append(node.name)
            elif isinstance(node, ast.Import):
                imports.extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom):
                # node.module is None for "from . import x"; keep every imported name.
                prefix = f"{node.module}." if node.module else ""
                imports.extend(f"{prefix}{alias.name}" for alias in node.names)
        return {"classes": classes, "functions": functions, "imports": imports}

    def find_referenced_files(self, symbols: List[str]) -> List[str]:
        """Return files whose text contains a definition of any symbol."""
        referenced = set()
        for file in glob.glob(f"{self.root}/**/*.py", recursive=True):
            # Read each file once: a second f.read() would return an empty string.
            with open(file, "r", encoding="utf-8") as f:
                source = f.read()
            # Substring match, so "class User" also matches "class UserProfile".
            if any(f"class {sym}" in source or f"def {sym}" in source for sym in symbols):
                referenced.add(file)
        return sorted(referenced)

    def build_context_block(self, filepath: str) -> str:
        local_symbols = self.extract_file_symbols(filepath)
        # Imports are dotted paths ("pkg.module.Name"); search on the last component.
        imported_names = [imp.rsplit(".", 1)[-1] for imp in local_symbols["imports"]]
        referenced_files = self.find_referenced_files(imported_names + local_symbols["classes"])
        context = f"# Current file symbols: {local_symbols['classes']}\n"
        context += f"# Imports: {local_symbols['imports']}\n"
        for ref in referenced_files[:3]:  # limit to 3 files to stay within token budget
            with open(ref, "r", encoding="utf-8") as f:
                # Only the first 500 characters of each file are injected.
                context += f"# File: {ref}\n{f.read()[:500]}\n"
        return context
```

Then we wrap the AI tool invocation:

```python
def generate_with_context(prompt: str, current_file: str):
    # Build the context block for the file the developer is working in
    context = ContextExtractor("/path/to/project").build_context_block(current_file)
    full_prompt = f"Context:\n{context}\n---\nTask:\n{prompt}"
    # call_ai is a placeholder for whatever client sends the prompt to your AI coding tool
    return call_ai(full_prompt)
```

We integrated this into a VS Code extension that runs on file save, so whenever a dev asks for code generation for the current file, the prompt already includes excerpts from up to three related source files.
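The simplified pipeline finds definitions with a plain substring search, which can over-match (a search for `class User` also hits `class UserProfile`). One possible refinement, sketched here under our own assumptions, is to parse every file once with `ast` and index top-level definitions by exact name. The helper names `build_symbol_index` and `files_defining` are hypothetical and not part of the pipeline shown above.

```python
import ast
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Set


def build_symbol_index(project_root: str) -> Dict[str, Set[str]]:
    """Map each top-level class/function name to the files that define it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path in Path(project_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in tree.body:  # top-level definitions only
            if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                index[node.name].add(str(path))
    return index


def files_defining(index: Dict[str, Set[str]], symbols: Iterable[str]) -> List[str]:
    """Return the sorted files that define any of the given symbols exactly."""
    found: Set[str] = set()
    for sym in symbols:
        found |= index.get(sym, set())
    return sorted(found)
```

Building the index once per session also avoids re-reading the whole project for every request, at the cost of having to rebuild it when files change.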

What the Data Showed

We ran a controlled A/B test over four weeks with six developers on the same project. Each developer used the same AI coding tool (Claude Code) for 50 tasks:

| Metric | Without Context Injection | With Context Injection |
|---|---|---|
| Hallucinated API calls per task | 1.2 | 0.47 |
| Broken imports | 0.8 | 0.11 |
| Files requiring manual fix | 23% | 9% |
| Average dev time per task | 14 min | 9 min |

Hallucinated API calls per task fell by 61%, as did the share of files requiring a manual fix. Development time per task dropped by 36%.
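For anyone who wants to check the percentages, here is a short sketch that recomputes the relative reductions from the before/after figures reported in our test. The function name `percent_reduction` is just an illustrative helper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# (without context injection, with context injection)
results = {
    "Hallucinated API calls per task": (1.2, 0.47),
    "Broken imports": (0.8, 0.11),
    "Files requiring manual fix (%)": (23, 9),
    "Average dev time per task (min)": (14, 9),
}

for metric, (before, after) in results.items():
    print(f"{metric}: {percent_reduction(before, after):.0f}% lower")
```

This prints reductions of 61%, 86%, 61%, and 36% respectively.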

Honestly, the biggest win wasn’t the hallucination drop — it was the trust. Developers stopped double-checking every generated line. They knew the AI had seen the actual codebase, so the output was far more reliable.

Lessons Learned Along the Way

Don’t overload the context. We initially injected 10 files. Token usage skyrocketed, and the AI started ignoring the context. In our tests, three files was the sweet spot.
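A file count is a coarse limit, since files vary a lot in size. One way to make the cap more precise, sketched here as an assumption rather than what our pipeline does, is to combine the file limit with an estimated token budget. The four-characters-per-token ratio is a rough heuristic and real tokenizers will differ; `select_within_budget` is a hypothetical helper name.

```python
from typing import List, Tuple

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a snippet from its length."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def select_within_budget(
    snippets: List[Tuple[str, str]], max_files: int = 3, max_tokens: int = 4000
) -> List[Tuple[str, str]]:
    """Keep (name, text) snippets in priority order until a cap is reached."""
    selected: List[Tuple[str, str]] = []
    used = 0
    for name, text in snippets:
        if len(selected) >= max_files:
            break
        cost = estimate_tokens(text)
        if used + cost > max_tokens:
            continue  # too large; a smaller, lower-priority snippet may still fit
        selected.append((name, text))
        used += cost
    return selected
```

The snippets are assumed to arrive already sorted by relevance, so the function never reorders them; it only skips entries that would blow the budget.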

Stale imports are a thing. Our pipeline would sometimes pull in files that were no longer relevant. We added a timestamp check — only include files modified in the last 30 days.
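A minimal sketch of such a recency filter follows. It assumes filesystem modification times are a good enough signal, which may not hold on a fresh clone where every file looks new; commit dates from version control would be an alternative. The helper name `recently_modified` and the optional `now` parameter are illustrative.

```python
import os
import time
from typing import Iterable, List, Optional

SECONDS_PER_DAY = 86_400


def recently_modified(
    paths: Iterable[str], max_age_days: int = 30, now: Optional[float] = None
) -> List[str]:
    """Keep only paths modified within the last `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [p for p in paths if os.path.getmtime(p) >= cutoff]
```

Applying the filter to the candidate list before the three-file cut keeps stale files from taking a slot that a current file could use.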

Pair with a linting pre-commit hook. Even with context, the AI occasionally violates project conventions. We run `ruff` and `mypy` before commit to catch any remaining issues. So far, this combination has caught what the context injection missed.
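As a rough illustration of that pairing, the sketch below runs both tools from Python and reports which ones failed. The command lines (`ruff check .` and `mypy .`) and the helper name `run_checks` are assumptions, not our actual hook configuration. The `runner` parameter exists only so the function can be exercised without the tools installed.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumed command lines; adjust paths and flags to your project.
DEFAULT_CHECKS: List[List[str]] = [
    ["ruff", "check", "."],
    ["mypy", "."],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = DEFAULT_CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> List[str]:
    """Run every check and return the command lines that exited non-zero."""
    failed = []
    for cmd in checks:
        if runner(cmd) != 0:
            failed.append(" ".join(cmd))
    return failed
```

A hook script could call `run_checks()` and exit with a non-zero status when the returned list is not empty, which blocks the commit.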

We’ve open-sourced a simplified version called `ctx-inject`. You can find it on our team’s GitHub. Drop it into any Python project and configure the context file limit.

Frequently Asked Questions

Does context injection work with any AI coding tool?

Yes, provided you can modify the prompt before it reaches the model. We tested it with Claude Code, Cursor, and GitHub Copilot Chat. The pipeline only rewrites the prompt, so any tool whose request you can intercept will work. For Copilot Chat in VS Code, we use a custom extension that hooks into the chat request.

How much more expensive is the token usage?

It depends. Injecting three medium-sized files adds roughly 2,000–4,000 tokens per request. At 100 generations a day, that’s about 300,000 extra input tokens, or roughly $0.90 per day at Claude 3.5 Sonnet’s input price of $3 per million tokens. With about five minutes saved per task in our test, the time savings far outweigh the token cost.
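The estimate is easy to reproduce for your own volume. The sketch below assumes a price of $3 per million input tokens, which was Claude 3.5 Sonnet's list price for input; check current pricing before relying on it. The function name `daily_token_cost` is illustrative.

```python
def daily_token_cost(
    requests_per_day: int, extra_tokens_per_request: int, usd_per_million_tokens: float
) -> float:
    """Extra daily spend in USD caused by the injected context."""
    extra_tokens = requests_per_day * extra_tokens_per_request
    return extra_tokens / 1_000_000 * usd_per_million_tokens


INPUT_PRICE_USD = 3.00  # per million input tokens (assumption, see above)

for tokens in (2_000, 3_000, 4_000):
    cost = daily_token_cost(100, tokens, INPUT_PRICE_USD)
    print(f"{tokens} extra tokens/request -> ${cost:.2f} per day")
```

At 100 requests a day this gives $0.60, $0.90, and $1.20 for the low, middle, and high ends of the range.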

What
