When AI Coding Tools Meet Security Audits: How We Auto-Fixed 47 Vulnerabilities in a Weekend
Your AI coding tools write clean code. Most of the time.
But what happens when you drop them into a legacy PHP codebase that’s been patched by 15 different developers over 8 years? The model starts hallucinating. It suggests dependencies that don’t exist. It “fixes” one SQL injection only to introduce a broken PDO query that silently fails on Postgres.
Sound familiar?
I’ve been there. Recently, our team at ECOA AI (based in Can Tho, Vietnam) took on a client from the US — a mid-sized e-commerce platform that had failed two consecutive penetration tests. The report listed 47 confirmed vulnerabilities. SQLi, XSS, insecure deserialization, hardcoded API keys, you name it.
The client had two requests: fix everything, and don’t break the checkout flow during Black Friday prep.
We had a weekend.
Here’s how we combined AI coding tools with old-school developer rigor to pull it off — and the exact patterns you can steal for your own security audits.
Why AI Coding Tools Miss Most Vulnerabilities in Legacy Code
Let’s be brutally honest: most AI coding tools are trained on modern codebases. React, Node.js, Python 3.10+. They shine there.
Drop them into a PHP 5.6 codebase with custom ORM layers, and the model starts guessing. Badly.
Here’s a real example from our audit. The original code:
```php
$user_id = $_GET['user_id'];
$query = "SELECT * FROM users WHERE id = " . $user_id;
$result = mysqli_query($conn, $query);
```
We asked an AI coding tool to fix this SQL injection. It returned:
```php
$user_id = mysqli_real_escape_string($conn, $_GET['user_id']);
$query = "SELECT * FROM users WHERE id = " . $user_id;
```
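Closer, but still wrong. `mysqli_real_escape_string()` only protects values that sit inside a quoted string literal. Here `$user_id` is concatenated unquoted into a numeric context, so a payload like `1 OR 1=1` contains nothing to escape and passes straight through. Numeric fields need a prepared statement with a bound integer, not string escaping. The AI missed that distinction.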
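For reference, here is a minimal sketch of the prepared-statement fix described above. It assumes the same `$conn` mysqli connection and `users` table as the original snippet. The function names `parseUserId` and `fetchUserById` are hypothetical, and `get_result()` assumes the mysqlnd driver is installed. Type declarations are left out so the sketch also runs on older PHP versions.

```php
<?php
/**
 * Validate a raw request value as a positive integer ID.
 * Returns null for anything that is not a plain positive integer.
 * (Hypothetical helper, not part of the audited codebase.)
 */
function parseUserId($raw)
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

/**
 * Fetch one user row with a prepared statement.
 * Assumes $conn is an open mysqli connection and mysqlnd is available.
 */
function fetchUserById($conn, $rawId)
{
    $id = parseUserId($rawId);
    if ($id === null) {
        return null; // reject non-numeric input before touching the database
    }

    $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $conn->error);
    }

    $stmt->bind_param('i', $id); // 'i' binds the value as an integer
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc();
    $stmt->close();

    return $row ?: null; // null when no matching row exists
}
```

A call such as `fetchUserById($conn, $_GET['user_id'])` would replace the three original lines. The validation step is optional once the value is bound as an integer, but it lets the code reject a payload like `1 OR 1=1` early instead of silently coercing it.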
Our senior developer on the ground in Can Tho caught it in under 30 seconds. Why? Because he’d seen this exact pattern fail in production three times before.
The lesson: AI coding tools are pattern matchers, not domain experts. They need human oversight, especially in security contexts.
Our Weekend Playbook: Hybrid AI + Human Security Audits
We didn’t just “let the AI loose.” We built a structured pipeline. Here’s the exact workflow we used:
Step 1: Scan with Semi-Automated Static Analysis
We ran the entire codebase through a custom Python wrapper around PHP_CodeSniffer and Psalm. This gave us a baseline of ~200 potential issues.
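The wrapper itself was written in Python and is not reproduced here. As a rough illustration of the merging step, the sketch below does the equivalent in PHP to keep the examples in one language. It assumes the reports were produced with `phpcs --report=json` and `psalm --output-format=json`, and that their JSON uses the field names shown in the comments. All function names are hypothetical.

```php
<?php
/**
 * Flatten a PHP_CodeSniffer JSON report into a list of findings.
 * Assumed shape: {"files": {"<path>": {"messages": [{"line", "source", "message"}]}}}
 */
function normalizePhpcs($json)
{
    $report = json_decode($json, true);
    $files = isset($report['files']) ? $report['files'] : [];
    $findings = [];
    foreach ($files as $path => $file) {
        foreach ($file['messages'] as $msg) {
            $findings[] = [
                'tool' => 'phpcs',
                'file' => $path,
                'line' => (int) $msg['line'],
                'rule' => $msg['source'],
                'message' => $msg['message'],
            ];
        }
    }
    return $findings;
}

/**
 * Flatten a Psalm JSON report into the same shape.
 * Assumed shape: [{"file_name", "line_from", "type", "message"}, ...]
 */
function normalizePsalm($json)
{
    $issues = json_decode($json, true);
    if (!is_array($issues)) {
        return [];
    }
    $findings = [];
    foreach ($issues as $issue) {
        $findings[] = [
            'tool' => 'psalm',
            'file' => $issue['file_name'],
            'line' => (int) $issue['line_from'],
            'rule' => $issue['type'],
            'message' => $issue['message'],
        ];
    }
    return $findings;
}

/** Merge several finding lists and sort them by file, then line. */
function mergeFindings(array $lists)
{
    $all = [];
    foreach ($lists as $list) {
        foreach ($list as $finding) {
            $all[] = $finding;
        }
    }
    usort($all, function ($a, $b) {
        $byFile = strcmp($a['file'], $b['file']);
        return $byFile !== 0 ? $byFile : $a['line'] - $b['line'];
    });
    return $all;
}
```

Putting both tools into one shape means the later classification step sees a single list, whichever tool raised each finding.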
Then we fed these findings to Claude 3.5 Sonnet with a strict system prompt:
“You are a PHP security auditor. For each finding, classify it as CRITICAL, HIGH, MEDIUM, or LOW. If CRITICAL or HIGH, generate a diff with prepared statement fixes. Do not change business logic.”
The AI returned 52 classified findings. We cross-referenced these with the pen-test report.
Step 2: Human Validation by a Senior Vietnamese Developer
Honestly, this is where Vietnam’s engineering culture shines. Our developers don’t just execute tickets — they question them.
Our lead in Can Tho reviewed each AI-suggested fix and rejected 12 of them outright. Why?
- 5 would have broken the custom caching layer
- 4 introduced type mismatches in the aging ORM
- 3 failed because the AI didn’t understand the stored procedure bindings
Actual quote from the dev: “The AI doesn’t know the database has triggers on the `orders` table. It’s trying to parameterize a query that needs raw input for a stored procedure.”
He was right.
Step 3: Automated Regression Suite
Before applying any fix, we wrote a PHPUnit test for each vulnerability. The test had to fail against the unpatched code, which confirmed both that the vulnerability was real and that the test actually exercised it. After the fix, the same test had to pass.
We didn’t just trust the AI. We forced verification.
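The fail-before, pass-after idea can be sketched for an XSS finding as follows. This is an illustration, not the audit's actual test code. The render functions are hypothetical stand-ins for a template helper, and the check uses plain PHP rather than PHPUnit so it runs without any dependencies. In a PHPUnit suite the same boolean would be wrapped in `assertTrue()` inside a test method.

```php
<?php
// Unpatched behaviour: user text is placed into the HTML as-is.
function renderCommentUnpatched($text)
{
    return '<p class="comment">' . $text . '</p>';
}

// Patched behaviour: encode on output with htmlspecialchars().
function renderCommentPatched($text)
{
    return '<p class="comment">' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</p>';
}

// The regression check: true when a script payload cannot reach the page as live markup.
function escapesScriptPayload(callable $render)
{
    $payload = '<script>alert(1)</script>';
    return strpos($render($payload), '<script>') === false;
}

var_dump(escapesScriptPayload('renderCommentUnpatched')); // bool(false): vulnerability confirmed
var_dump(escapesScriptPayload('renderCommentPatched'));   // bool(true): fix verified
```

A check that already passes against the unpatched code is a warning sign. Either the finding is a false positive or the test is not reaching the vulnerable path.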
| Vulnerability Type | Count | AI-Fixed Correctly | Human-Corrected |
|---|---|---|---|
| SQL Injection | 18 | 14 | 4 |
| XSS | 12 | 10 | 2 |
| Insecure Deserialization | 7 | 3 | 4 |
| Hardcoded Secrets | 6 | 5 | 1 |
| IDOR | 4 | 2 | 2 |
| Total | 47 | 34 | 13 |
The AI got 72% right on the first pass (34 of 47 fixes). Not bad. But the 28% it missed would have been production outages.
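The overall figure hides large differences between categories. The short sketch below recomputes the first-pass rate per row from the table's own numbers. The function names are hypothetical.

```php
<?php
// Counts copied from the table above: [total findings, fixed correctly by the AI].
function auditResults()
{
    return [
        'SQL Injection' => [18, 14],
        'XSS' => [12, 10],
        'Insecure Deserialization' => [7, 3],
        'Hardcoded Secrets' => [6, 5],
        'IDOR' => [4, 2],
    ];
}

// Percentage of findings the AI fixed correctly, rounded to one decimal place.
function firstPassRate($total, $aiCorrect)
{
    return round(100 * $aiCorrect / $total, 1);
}

$sumTotal = 0;
$sumCorrect = 0;
foreach (auditResults() as $type => $counts) {
    list($total, $correct) = $counts;
    $sumTotal += $total;
    $sumCorrect += $correct;
    printf("%-26s %2d/%-2d %5.1f%%\n", $type, $correct, $total, firstPassRate($total, $correct));
}
printf("%-26s %2d/%-2d %5.1f%%\n", 'Total', $sumCorrect, $sumTotal, firstPassRate($sumTotal, $sumCorrect));
```

On these numbers, insecure deserialization (3 of 7) and IDOR (2 of 4) are where the AI did worst, while the SQL injection, XSS, and hardcoded-secret rows all sit above 75%.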
The Real Secret: Prompt Engineering for Security Audits
Most teams fail because they use generic prompts. “Find vulnerabilities in this code.” That’s borderline useless.
Here’s the prompt structure we used that actually worked:
Role: Senior PHP Security Auditor working with a 2015-era e-commerce codebase.
Context: This code connects to MySQL 5.7. It uses mysqli, not PDO. Some stored procedures accept raw parameters.
Task: Review each classified HIGH or CRITICAL finding. Generate a complete diff that:
- Uses prepared statements where possible
- Preserves stored procedure calls exactly as-is
- Adds output encoding for all user-facing variables
- Logs the change with a standardized comment block
Constraint: Do not refactor. Only fix the specific vulnerability.
You’ll notice the constraints are specific. That’s what makes AI coding tools useful in security audits — they need guardrails, not freedom.
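One way to keep those guardrails consistent across dozens of findings is to build the prompt from a fixed template. The sketch below is an assumption about how that could look, not the tooling used in the audit. The `$finding` array shape (`severity`, `file`, `line`, `message`, `snippet`) and the function name are hypothetical. The prompt wording is taken from the structure above.

```php
<?php
/**
 * Build the audit prompt for one finding.
 * Returns null for findings below HIGH, which are not sent for diff generation.
 */
function buildAuditPrompt(array $finding)
{
    if (!in_array($finding['severity'], ['CRITICAL', 'HIGH'], true)) {
        return null;
    }

    $lines = [
        'Role: Senior PHP Security Auditor working with a 2015-era e-commerce codebase.',
        'Context: This code connects to MySQL 5.7. It uses mysqli, not PDO. Some stored procedures accept raw parameters.',
        'Task: Review the finding below. Generate a complete diff that:',
        '- Uses prepared statements where possible',
        '- Preserves stored procedure calls exactly as-is',
        '- Adds output encoding for all user-facing variables',
        '- Logs the change with a standardized comment block',
        'Constraint: Do not refactor. Only fix the specific vulnerability.',
        '',
        sprintf(
            'Finding: [%s] %s:%d %s',
            $finding['severity'],
            $finding['file'],
            $finding['line'],
            $finding['message']
        ),
        'Code:',
        $finding['snippet'],
    ];

    return implode("\n", $lines);
}
```

Keeping the role, context, and constraint lines fixed and varying only the finding means every request carries the same guardrails, and a rejected fix can be traced back to one specific finding.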
Why Vietnam’s Developer Culture Matches AI-Assisted Security Work
Here’s an uncomfortable truth: AI coding tools amplify both skill and sloppiness.
A junior dev who blindly accepts AI suggestions will introduce more vulnerabilities than they fix. We’ve seen it happen. A dev on autopilot can miss subtle logic flaws because the AI’s fix “looks right.”
But a senior developer who treats the AI as a first pass — a very fast typist with partial context — can achieve 5x throughput. That’s exactly what our Vietnamese team does.
In Can Tho, we’ve built a culture of “question the output.” Every AI-generated fix gets a code review. Not because we distrust the tool, but because we understand its limits.
Here’s a concrete example from our audit: The AI tool suggested replacing `htmlspecialchars()` with a framework-specific XSS filter. That would have broken the entire template rendering system. Our dev caught it in 90 seconds.
You can’t train an AI to have production instincts. You hire for that.
The Result: Passed Pen-Test, Zero Regressions
Monday morning, the client ran their third penetration test. All 47 vulnerabilities were resolved. Zero critical findings. The checkout flow stayed up all weekend.
Cost? A fraction of what a US-based security firm would charge. Speed? We delivered in 2 days instead of the typical 3-week timeline.
The client’s CTO sent us a Slack message: “I don’t know how you did that, but we’re signing the annual retainer today.”
We did it with a mix of AI coding tools and senior engineering judgment — the kind you find in Vietnam.
How to Run Your Own AI-Assisted Security Audit
You don’t need a team of 10. You need two things:
- A structured pipeline — static analysis, AI-powered diff generation, human validation, regression testing
- Senior engineers who ask “why” — developers who won’t blindly accept an AI’s output
If you have those, you can run a security audit in a weekend. If you don’t, you’ll spend weeks patching the patches.
We’re hiring.
—
Frequently Asked Questions
Can AI coding tools replace a human security auditor for legacy codebases?
No. AI tools are great for pattern matching and generating initial fixes, but they lack production context. They can’t understand custom middleware, stored procedure bindings, or business logic constraints. You’ll get 70-80% coverage. The remaining 20-30% requires a senior developer who knows the codebase’s quirks.
Which AI coding tool works best for PHP vulnerability detection?
In our experience, Claude 3.5 Sonnet performed best for PHP security work. It understands older PHP idioms better than GPT-4 or Copilot. But the key isn’t the model — it’s the prompt. You need system prompts that explicitly constrain the AI from refactoring and force it to preserve existing patterns.
How long does an AI-assisted security audit actually take?
It depends on the codebase size. For a 50,000-line PHP app like ours, the full audit took 2 days. Static analysis took 2 hours, AI fix generation took 4 hours, human validation took 8 hours, and regression testing took the remaining time. Without AI, this would have taken 3-4 weeks with a dedicated security team.
What’s the biggest mistake teams make when using AI for security audits?
Using generic prompts like “find vulnerabilities.” You need structured prompts with role definitions, codebase context, explicit constraints, and output format requirements. Also, never skip human validation. In our audit, 13 of the 47 AI-generated fixes (28%) needed human correction because they would have broken production. Trust but verify, always.