Build a Custom AI-Powered Git Pre-Commit Hook with Python: Smarter Code Quality Checks
Let’s be real for a second. Traditional linters miss half the story.
They catch trailing whitespace and missing semicolons. They do *not* catch the subtle logical bug you introduced two hours ago. You know the one — the condition that’s backward, the off-by-one error that’ll haunt you in production at 3 AM.
We ran into this constantly with our team in Ho Chi Minh City. Our ESLint and Pylint configs were tight. Still, every code review turned up *at least* one “how did this pass local checks?” conversation. It was wasting time. Worse, it was eroding trust.
So we built a better trap.
Here’s a custom AI-powered Git pre-commit hook in Python that sits between your local changes and the commit. It uses a local or cloud LLM to scan staged diffs for logic errors, security smells, and convention drift. And it blocks the commit — *hard* — if it finds something suspect.
I’ll walk you through the exact code. Build this once, and you’ll never trust a bare `git commit` again.
How This Differs From a Regular Linter
A regular linter works off a static rule set. Useful, but brittle. An AI-powered hook uses context to decide if something looks wrong: the diff itself and, once you extend it, the surrounding file and even your project’s CONTRIBUTING.md.
Think of it this way:
| Traditional Linter | AI Pre-Commit Hook |
|---|---|
| Catches syntax issues | Catches logic issues |
| Rule-based | Context-based |
| Zero hallucination risk | Small hallucination risk |
| 10ms to run | 1-5 seconds to run |
The tradeoff? Speed. But that delay of a few seconds is worth it when it catches a role-check bug that was *perfectly valid* syntactically but would’ve broken your auth middleware in staging.
Let’s trade theory for code.
Prerequisites
You’ll need:
- Python 3.10+ installed locally
- An LLM API key (I’ll use OpenAI here, but you can swap in Claude, a local Ollama model, etc.)
- `pip install openai` (the script calls the API through this client and reads the diff with plain `git` via `subprocess`, so GitPython isn’t needed)
That’s it. No bloated frameworks. No Docker containers.
Step 1: The `pre-commit` Hook Script
Create a file at `.git/hooks/pre-commit` in your target repo. This is the file Git executes *before* a commit finalizes.
Git hooks are just executables whose exit code Git inspects. Exit 0 means “commit allowed.” Any non-zero exit, such as 1, means “block the commit.”
Here’s our Python-backed hook:
```bash
#!/bin/sh
# .git/hooks/pre-commit
echo "Running AI code quality check..."
python3 /path/to/your/ai_precommit_check.py
if [ $? -ne 0 ]; then
    echo "❌ Commit blocked by AI pre-commit hook."
    echo "Fix the issues above or override with: git commit --no-verify"
    exit 1
fi
echo "✅ AI check passed. Committing."
exit 0
```
Make it executable:
```bash
chmod +x .git/hooks/pre-commit
```
Now the real work happens in the Python script.
Step 2: The Python AI Check Script
Here’s the core logic. We grab the staged diff, send it to an LLM, and parse the response for issues.
```python
# ai_precommit_check.py
import json
import os
import subprocess
import sys

from openai import OpenAI


def get_staged_diff():
    """Returns the git diff of staged changes (the stuff about to be committed)."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True
    )
    return result.stdout


def check_diff_with_llm(diff_text: str) -> list:
    """
    Sends the diff to an LLM and returns a list of issues found.
    Expects a JSON response from the model.
    """
    if not diff_text.strip():
        return []

    # JSON mode returns a top-level object, so the prompt asks for an
    # "issues" key instead of a bare array.
    prompt = f"""
You are an expert code reviewer. Review the following git diff (staged changes).
Identify any:
- Logical bugs (wrong condition, off-by-one errors, null pointer risks)
- Security vulnerabilities (SQL injection, hardcoded secrets, XSS)
- Convention violations specific to Python (PEP8 violations, poor naming)
- Dead code or unreachable branches
Respond ONLY with a valid JSON object that has a single key "issues" holding an array of objects. Each object has:
- "file": the file name from the diff header
- "line": approximate line number (int)
- "severity": "error" or "warning"
- "message": a concise explanation
If no issues found, respond with an empty "issues" array.
---DIFF START---
{diff_text}
---DIFF END---
"""
    try:
        # Create the client inside the try block so a missing API key
        # also takes the fail-open path instead of crashing the hook.
        client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # cheap and fast
            messages=[{"role": "user", "content": prompt}],
            temperature=0.1,  # low temperature for more consistent output
            response_format={"type": "json_object"}
        )
        content = response.choices[0].message.content
        # Parse the JSON response
        result = json.loads(content)
        if isinstance(result, list):
            return result
        elif isinstance(result, dict) and "issues" in result:
            return result["issues"]
        else:
            return []
    except Exception as e:
        print(f"⚠️ LLM call failed: {e}. Allowing commit.", file=sys.stderr)
        return []


def main():
    diff = get_staged_diff()
    if not diff:
        # Nothing staged? Let it through.
        sys.exit(0)

    issues = check_diff_with_llm(diff)
    if not issues:
        sys.exit(0)

    # We have issues. Block the commit.
    print(f"\n🚫 AI Pre-Commit Hook Found {len(issues)} Issue(s):\n")
    for issue in issues:
        sev = issue.get("severity", "warning")
        file = issue.get("file", "unknown")
        line = issue.get("line", "?")
        msg = issue.get("message", "No message")
        print(f" [{sev.upper()}] {file}:{line} — {msg}")
    sys.exit(1)


if __name__ == "__main__":
    main()
```
Key design choices:
- Low temperature (0.1): We don’t want the LLM to be creative. We want it strict and predictable.
- `response_format: json_object`: Forces the model to emit syntactically valid JSON. JSON mode yields a top-level object rather than a bare array, which is why the parser also accepts an object with an `issues` key.
- Fallback on failure: If the API call fails (network issue, rate limit), we return no issues and let the commit through. You *never* want a flaky API call to block your team’s entire workflow during a crunch.
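Valid JSON is not the same as JSON in the shape you asked for, so it can help to normalize the reply before printing it. Here is a minimal sketch of that idea; `parse_issues` is a hypothetical helper, not part of the script above, and the rule of dropping entries without a `message` is an assumption you may want to loosen.

```python
import json


def parse_issues(content: str) -> list[dict]:
    """Turn the raw model reply into a clean list of issue dicts.

    Returns an empty list for anything that doesn't match the expected shape.
    """
    try:
        data = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        return []

    # Accept either a bare array or an object wrapping it under "issues".
    if isinstance(data, dict):
        data = data.get("issues", [])
    if not isinstance(data, list):
        return []

    cleaned = []
    for item in data:
        if not isinstance(item, dict) or "message" not in item:
            continue  # skip entries the hook could not print sensibly
        severity = str(item.get("severity", "warning")).lower()
        cleaned.append({
            "file": str(item.get("file", "unknown")),
            "line": item.get("line", "?"),
            "severity": severity if severity in ("error", "warning") else "warning",
            "message": str(item["message"]),
        })
    return cleaned
```

You could call this in place of the `json.loads` branch in `check_diff_with_llm`, so that `main()` only ever sees well-formed entries.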
Step 3: Test It
Stage a deliberately buggy file:
```python
# app/login.py
def validate_role(user):
    if user.role = "admin":  # Intentional bug: assignment instead of comparison
        return True
    return False
```
Now try to commit:
```bash
git add app/login.py
git commit -m "fix: update role validation"
```
You’ll see something like:
```
Running AI code quality check...
🚫 AI Pre-Commit Hook Found 1 Issue(s):
 [ERROR] app/login.py:3 — `=` used instead of `==` in a conditional. This assigns "admin" to user.role instead of comparing it.
❌ Commit blocked by AI pre-commit hook.
```
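That’s the power of context. To be fair, this particular bug is an easy one: in Python, `=` inside an `if` condition is a syntax error, so the interpreter and any linter reject it too. The hook earns its keep on bugs that parse cleanly, like an inverted role check, where no rule fires. The LLM just *looks at it* and knows.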
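To see which bugs actually need the LLM, you can ask Python whether a file parses at all. The sketch below uses the standard library's `ast.parse`; `LOGIC_BUG` is a hypothetical variant of the test file in which the comparison is inverted rather than mistyped.

```python
import ast

# The Step 3 example: "=" inside an if condition.
SYNTAX_BUG = '''
def validate_role(user):
    if user.role = "admin":
        return True
    return False
'''

# Hypothetical variant: the comparison is inverted, but the code parses fine.
LOGIC_BUG = '''
def validate_role(user):
    if user.role != "admin":
        return True
    return False
'''


def parses(source: str) -> bool:
    """Return True if Python can parse the source at all."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(SYNTAX_BUG))  # False: Python itself rejects this file
print(parses(LOGIC_BUG))   # True: only a reviewer, human or LLM, will notice
```

The second case is the kind of change worth staging when you want to check that the hook adds something beyond your existing tooling.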
Production Hardening: What We Learned Deploying This
We rolled this hook out across three projects with our Can Tho engineering team. Here’s what we had to fix *immediately*:
1. Speed Limits
The first version sent *every* file diff to the LLM in a single request. Works fine for a 50-line change. Falls apart when someone stages a 2000-line CSS refactor and the API tokenizes the entire diff into a 10,000-token payload.
Fix: Send only the first 3000 characters of the diff. Anything past that cutoff goes unreviewed by the AI, so keep your regular linters running alongside the hook.
```python
def get_staged_diff(truncate_chars=3000):
    raw = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True
    ).stdout
    if len(raw) > truncate_chars:
        # Warn that only part of the diff is reviewed, then truncate.
        print(
            f"⚠️ Diff too large for a full AI check. Reviewing only the first {truncate_chars} characters.",
            file=sys.stderr,
        )
        return raw[:truncate_chars] + "\n... [truncated]"
    return raw
```
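A plain character cutoff can slice a hunk in half and spend the budget on lockfiles. One way to approximate "meaningful" changes is to split the diff per file and keep only whole chunks that fit. This is a sketch under assumptions: `SKIP_SUFFIXES`, `split_diff_by_file`, and `meaningful_diff` are hypothetical names, and the header regex does not handle quoted paths or paths that contain ` b/`.

```python
import re

# Hypothetical skip list: adjust to whatever counts as noise in your repo.
SKIP_SUFFIXES = (".lock", ".min.js", ".min.css", ".svg", ".map")

# Matches the per-file header line that git emits, e.g.
# "diff --git a/app/login.py b/app/login.py"
_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Split a unified git diff into {path: diff chunk for that path}."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff_text.splitlines(keepends=True):
        match = _HEADER.match(line.rstrip("\n"))
        if match:
            current = match.group(2)
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "".join(lines) for path, lines in chunks.items()}


def meaningful_diff(diff_text: str, max_chars: int = 3000) -> str:
    """Keep whole per-file chunks that fit the budget, skipping noisy files."""
    kept, used = [], 0
    for path, chunk in split_diff_by_file(diff_text).items():
        if path.endswith(SKIP_SUFFIXES):
            continue  # generated or vendored content
        if used + len(chunk) > max_chars:
            continue  # too big for the remaining budget; try smaller files
        kept.append(chunk)
        used += len(chunk)
    return "".join(kept)
```

Files that do not fit are silently left out here, so in a real hook you would probably want to print which paths were skipped.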
2. False Positive Hell
Early on, the model flagged a perfectly valid `for _ in range(10)` loop as “potential infinite loop.” Not great. The fix was adding a one-shot example to the prompt: a valid pattern the model should not flag.
Fix: Add “Ignore the following patterns: [valid pattern examples]” to the prompt.
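As a rough sketch of how that could look in code, the prompt can be assembled from parts so the ignore list lives in one place. `build_prompt` and its parameters are hypothetical; `base_instructions` stands in for the reviewer instructions from Step 2.

```python
def build_prompt(base_instructions: str, diff_text: str, ignore_patterns: list[str]) -> str:
    """Assemble the review prompt with an explicit do-not-flag section."""
    lines = [base_instructions]
    if ignore_patterns:
        lines.append("Ignore the following patterns. They are valid and must not be reported:")
        lines.extend(f"- {pattern}" for pattern in ignore_patterns)
    # The diff goes last, after every instruction the model should follow.
    lines += ["---DIFF START---", diff_text, "---DIFF END---"]
    return "\n".join(lines)


prompt = build_prompt(
    "You are an expert code reviewer. Review the following git diff.",
    "+for _ in range(10):\n+    retry()",
    ["`for _ in range(n)` loops with an unused loop variable"],
)
```

Keeping the patterns in a list also makes it easy to load them from a file that the team edits whenever a false positive shows up.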
3. Team Trust
You *must* give developers an override. We made `git commit --no-verify` the documented escape hatch. Some devs use it when they know the change is correct and the AI is being overzealous. That’s fine. The hook is a safety net, not a cage.
Where to From Here
This is just the start. You can extend this hook to:
- Check against a specific project convention file — Read `CONTRIBUTING.md` and inject relevant rules into the prompt
- Log all declined commits — Collect data on *why* commits were blocked and audit AI decisions
- Use a local model — Swap `gpt-4o-mini` for Ollama’s `codellama` model to keep everything offline (we did this for a SOC 2 client)
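For the logging idea, a minimal sketch could append one JSON line per blocked commit. The function name, the record fields, and the default log path are all assumptions; a path inside `.git/` keeps the log local and out of version control.

```python
import json
import time
from pathlib import Path


def log_blocked_commit(issues: list[dict], log_path: str = ".git/ai_hook_blocked.jsonl") -> None:
    """Append one JSON line per blocked commit so decisions can be audited later."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "issue_count": len(issues),
        "issues": issues,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: each blocked commit adds a line, nothing is overwritten.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling it in `main()` just before `sys.exit(1)` would record every block along with the messages that caused it.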
If you’re building a remote team in Vietnam — whether in Hanoi, Ho Chi Minh City, or Can Tho — this kind of automation closes the gap between “code passes local checks” and “code is actually correct.” We’ve seen it cut review rework by close to 35% in our own stack.
Frequently Asked Questions
Can this hook work with Claude API instead of OpenAI?
Absolutely. Swap the `OpenAI` client for Anthropic’s SDK and adapt the request call and response parsing to its API. The prompt structure stays the same. We’ve used Claude Sonnet 4 on larger codebases because it handles context windows better, but the logic is identical.
Does this slow down `git commit` significantly?
For small diffs (most commits), it adds about 1-2 seconds. For large diffs (more than 500 lines), it can take 5-10 seconds. We added a truncation threshold to keep it under 3 seconds for 95% of commits. The team accepted that trade-off easily.
How do I share this hook across my team without everyone setting it up manually?
Use a symlink or copy the hook from a shared repo. Better yet, use a tool like `pre-commit` (the framework) and add a custom hook entry that references your Python script. Our team stores the script at `scripts/ai_precommit_check.py` in the repo root and the CI pipeline validates the hook is installed.
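Because `.git/hooks` is not tracked by Git, the copy step is usually scripted. Below is a small installer sketch; the `install_hook` name and the idea of a separate installer script are assumptions, and it expects `.git` to be a directory, so it will not work as-is in worktrees or submodules where `.git` is a file.

```python
import stat
from pathlib import Path

# The hook resolves the repo root at run time, then runs the shared script.
HOOK_BODY = """#!/bin/sh
# Installed by a hypothetical installer script kept in the repo.
python3 "$(git rev-parse --show-toplevel)/scripts/ai_precommit_check.py"
"""


def install_hook(repo_root: str) -> Path:
    """Write the pre-commit hook into .git/hooks and mark it executable."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.parent.is_dir():
        raise FileNotFoundError(f"{repo_root} does not look like a Git repository")
    hooks_dir.mkdir(exist_ok=True)
    hook_path = hooks_dir / "pre-commit"
    hook_path.write_text(HOOK_BODY, encoding="utf-8")
    # Equivalent of chmod +x: add execute bits to the existing mode.
    mode = hook_path.stat().st_mode
    hook_path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Git's `core.hooksPath` setting is another option: it points Git at a tracked hooks directory, so there is nothing to copy, though each developer still has to set it once per clone.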
Does this work with TypeScript, Go, or Rust?
Yes. The prompt is mostly language-agnostic; the one Python-specific line (the PEP8 convention check) should be adjusted for other languages. It reads the diff, not the compiled output. We’ve tested it on Python, JavaScript, TypeScript, and Rust diffs. The LLM adapts based on the file extension and diff content. We saw slightly better results on Python and JS (common training data), but Go and Rust catches were still solid.
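One way to avoid hardcoding a language in the conventions bullet is to derive it from the staged file names. This is only a sketch: `CONVENTION_HINTS` and `convention_line` are hypothetical, and the style guides listed are placeholders for whatever your team follows.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: extend it with the languages and style guides your team uses.
CONVENTION_HINTS = {
    ".py": "Python (PEP 8, naming)",
    ".js": "JavaScript (project ESLint conventions)",
    ".ts": "TypeScript (project ESLint conventions)",
    ".go": "Go (gofmt, Effective Go)",
    ".rs": "Rust (rustfmt, Clippy idioms)",
}


def convention_line(paths: list[str]) -> str:
    """Build the conventions bullet of the prompt for the staged files."""
    hints = sorted({
        CONVENTION_HINTS[suffix]
        for suffix in (PurePosixPath(p).suffix for p in paths)
        if suffix in CONVENTION_HINTS
    })
    if not hints:
        # Unknown file types: fall back to a generic instruction.
        return "- Convention violations for the language of each file"
    return "- Convention violations specific to " + "; ".join(hints)
```

The list of staged paths can come from `git diff --cached --name-only`, read with `subprocess` the same way the script reads the diff.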