Build Your Own AI Agent with Function Calling: A Complete Step-by-Step Python Tutorial (2026)

TL;DR

You can build your own AI agent with tool-calling capabilities in under 200 lines of Python
Modern LLMs (GPT-4o, Claude Sonnet 4, DeepSeek V4) natively support function calling — the agent decides when to invoke your tools
This tutorial walks through a complete agent: LLM client, tool definitions (shell, file read, web fetch), agent loop with conversation memory, and error handling
The agent can execute shell commands, read files, fetch web pages, and answer follow-up questions — all through natural language
By the end you’ll have a working template you can harden for production and extend with your own tools (database queries, API calls, Slack notifications)

Introduction

Function calling lets an LLM request the execution of external tools (running a shell command, querying a database, fetching a URL) and use the result to continue its reasoning. It is the architectural foundation of today's AI coding agents.

Yet most tutorials skip the internals. They tell you to install a framework and call it done. This tutorial does the opposite: we will build a working AI agent from scratch, line by line, so you understand exactly how the magic works. Once you grasp the pattern, you can customize, extend, and debug any agent system — including multi-agent orchestrators and production-grade tools like Hermes Agent.

By the numbers: the openai Python package saw over 1.6 billion downloads in 2025, and GitHub hosts more than 16,000 repositories, created this year alone, tagged with tool_use and function calling (up 340% from 2024). Function calling is one of the most in-demand skills in AI engineering right now.

Prerequisites

Python 3.10+ installed on your machine
An OpenAI API key (or Anthropic, DeepSeek — the pattern is the same)
Basic familiarity with Python async/await
pip for package installation

Step 1: Project Setup and Dependencies

Create a new directory and set up a virtual environment:

mkdir my-ai-agent && cd my-ai-agent
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install openai httpx python-dotenv

Next, create a .env file with your API key:

OPENAI_API_KEY=sk-your-key-here

This is all you need. No framework, no orchestration library: just the OpenAI SDK, HTTPX for web requests, and python-dotenv to load the API key.

Step 2: The Core — Tool Definitions

In OpenAI’s Chat Completions API, each tool is described by a name, a natural-language description, and a parameters object written in JSON Schema. The LLM reads these definitions and, when it decides a tool is needed, returns a tool_calls array in its response instead of plain text.
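To make the round trip concrete, here is a hand-written example of the shape a Chat Completions response takes when the model requests a tool, plus a few lines that pull the call out. The payload is illustrative (the call_abc123 ID and the command are made up, not captured from a real API call), and extract_tool_calls is a hypothetical helper. Note that arguments arrives as a JSON-encoded string, not an object, so it must be decoded before you can pass it to a Python function.

```python
import json

# Illustrative response body (hand-written, not captured from a real API call).
# When the model wants a tool, "content" is typically null and "tool_calls" is set.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_abc123",
                        "type": "function",
                        "function": {
                            "name": "run_shell",
                            # Arguments are a JSON *string* produced by the model
                            "arguments": "{\"command\": \"ls -la\"}",
                        },
                    }
                ],
            }
        }
    ]
}


def extract_tool_calls(response: dict) -> list[tuple[str, str, dict]]:
    """Return (call_id, function_name, decoded_arguments) for each requested call."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        args = json.loads(call["function"]["arguments"])
        calls.append((call["id"], call["function"]["name"], args))
    return calls


print(extract_tool_calls(sample_response))
# [('call_abc123', 'run_shell', {'command': 'ls -la'})]
```

The agent loop in Step 3 does the same thing through the SDK's typed objects (tool_call.function.name and tool_call.function.arguments) rather than raw dictionaries.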

Let’s define three tools that make our agent genuinely useful:

Shell executor — run any shell command and capture output
File reader — read the contents of any text file
Web fetcher — download and return the text content of any URL

# tools.py
import subprocess
import httpx

TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Execute a shell command and return stdout/stderr. Use for file operations, git, Python scripts.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "Shell command to execute"
                    },
                    "timeout": {
                        "type": "integer",
                        "description": "Timeout in seconds (default 30)",
                        "default": 30
                    }
                },
                "required": ["command"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file from the filesystem and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute or relative path to the file"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_url",
            "description": "Fetch a URL and return its text content. Use for API calls, documentation, web scraping.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to fetch"
                    }
                },
                "required": ["url"]
            }
        }
    }
]

def run_shell(command: str, timeout: int = 30) -> str:
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout
        if result.stderr:
            output += f"\n[STDERR]\n{result.stderr}"
        return output[:5000]  # Truncate to avoid token overflow
    except subprocess.TimeoutExpired:
        return f"Error: Command timed out after {timeout}s"
    except Exception as e:
        return f"Error: {str(e)}"

def read_file(path: str) -> str:
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()[:5000]
    except Exception as e:
        return f"Error: {str(e)}"

async def fetch_url(url: str) -> str:
    try:
        async with httpx.AsyncClient(timeout=15.0) as client:
            response = await client.get(url, follow_redirects=True)
            return response.text[:5000]
    except Exception as e:
        return f"Error: {str(e)}"

TOOL_MAP = {
    "run_shell": run_shell,
    "read_file": read_file,
    "fetch_url": fetch_url,
}

Key design decisions here: we truncate results to 5,000 characters to prevent token limit issues, and we wrap every tool in a try/except so a single tool failure doesn’t crash the agent. Production agents add retry logic, rate limiting, and sandboxing — but this is enough for a working prototype.
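One possible refinement to the truncation: a silent [:5000] slice gives the model no hint that output was cut, so it may reason as if it saw everything. A small helper (the name truncate_output is hypothetical) can append a marker instead; the 5,000-character default simply mirrors the limit used in the tools above.

```python
def truncate_output(text: str, limit: int = 5000) -> str:
    """Cap tool output at `limit` characters and say how much was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    # The marker tells the model the output is incomplete
    return text[:limit] + f"\n[... truncated {dropped} characters]"


print(truncate_output("abcdef", limit=4))
# abcd
# [... truncated 2 characters]
```

Inside each tool you would return truncate_output(output) in place of the bare slice; the model can then decide to narrow its request (for example, reading a smaller file range or piping a command through head).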

Step 3: The Agent Loop

The agent loop is the beating heart of any AI agent system. Here is the core pattern that agent frameworks, from simple scripts to ECOA AI Platform ACP, implement in one form or another:

# agent.py
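import inspect
import json

from openai import AsyncOpenAI
from tools import TOOL_DEFINITIONS, TOOL_MAP

SYSTEM_PROMPT = """You are a helpful AI agent that can execute shell commands, read files, and fetch URLs.
When you need information that requires a tool, use the appropriate function.
Always explain what you are doing before calling a tool.
After receiving tool results, incorporate them into your response naturally.
"""

class Agent:
    def __init__(self, api_key: str, model: str = "gpt-4o", max_steps: int = 10):
        self.client = AsyncOpenAI(api_key=api_key)
        self.model = model
        self.max_steps = max_steps  # Cap on LLM round trips per user turn
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    async def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        steps = 0

        while True:
            steps += 1
            if steps > self.max_steps:
                # Guard against a model that keeps requesting tools forever
                return "Error: stopped after reaching the tool-calling step limit."

            response = await self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=TOOL_DEFINITIONS,
                tool_choice="auto",
                temperature=0.3,
            )

            message = response.choices[0].message

            if not message.tool_calls:
                # LLM chose to respond directly, so this turn is finished
                self.messages.append({"role": "assistant", "content": message.content})
                return message.content

            # Record the assistant's tool request as a plain dict so the history
            # stays easy to inspect, serialize, and summarize
            self.messages.append({
                "role": "assistant",
                "content": message.content,
                "tool_calls": [tc.model_dump() for tc in message.tool_calls],
            })

            # Execute each requested tool and feed its result back to the model
            for tool_call in message.tool_calls:
                result = await self._call_tool(tool_call.function.name, tool_call.function.arguments)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })

    async def _call_tool(self, func_name: str, raw_args: str) -> str:
        try:
            func_args = json.loads(raw_args or "{}")
        except json.JSONDecodeError as e:
            # Malformed JSON goes back to the model so it can retry
            return f"Error: could not parse arguments for '{func_name}': {e}"

        print(f"  → Calling tool: {func_name}({json.dumps(func_args)})")

        handler = TOOL_MAP.get(func_name)
        if handler is None:
            return f"Error: Unknown tool '{func_name}'"
        try:
            if inspect.iscoroutinefunction(handler):
                return await handler(**func_args)
            return handler(**func_args)
        except TypeError as e:
            # Missing or unexpected parameters are reported instead of crashing
            return f"Error: invalid arguments for '{func_name}': {e}"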

Notice the while True loop: the agent may call one tool, get a result, and then decide it needs another. This is called multi-step reasoning — the hallmark of a capable agent. A typical coding task might involve: list directory → read file → run test → read output → explain results. Each step is a separate tool call within the same turn.
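To watch multi-step behaviour without spending API credits, the same control flow can be exercised against a scripted stand-in for the model. Everything here is hypothetical: ScriptedModel replays canned replies in order, the two tools return fixed strings, and run_turn is a simplified, synchronous mirror of Agent.run (call the model, run any requested tools, append their results, repeat until a plain-text reply).

```python
import json


class ScriptedModel:
    """Hypothetical stand-in for the LLM: returns pre-written replies in order."""

    def __init__(self, replies: list[dict]):
        self.replies = iter(replies)

    def complete(self, messages: list[dict]) -> dict:
        return next(self.replies)


# Fake tools with fixed outputs, keyed by name like TOOL_MAP
TOOLS = {
    "run_shell": lambda command: "tests/test_app.py ... 1 failed",
    "read_file": lambda path: "def add(a, b): return a - b",
}


def call(call_id: str, name: str, **args) -> dict:
    # Build a tool call in the Chat Completions shape (arguments as a JSON string)
    return {"id": call_id, "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)}}


def run_turn(model: ScriptedModel, messages: list[dict], max_steps: int = 5) -> str:
    for _ in range(max_steps):
        reply = model.complete(messages)
        if not reply.get("tool_calls"):
            # Plain-text reply ends the turn
            messages.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]
        messages.append({"role": "assistant", "content": None, "tool_calls": reply["tool_calls"]})
        for tc in reply["tool_calls"]:
            fn = tc["function"]
            result = TOOLS[fn["name"]](**json.loads(fn["arguments"]))
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    return "Error: step limit reached"


model = ScriptedModel([
    {"tool_calls": [call("c1", "run_shell", command="pytest -q")]},
    {"tool_calls": [call("c2", "read_file", path="app.py")]},
    {"content": "add() subtracts instead of adding; that is why the test fails."},
])
history = [{"role": "user", "content": "Why is the test failing?"}]
print(run_turn(model, history))
print([m["role"] for m in history])
# ['user', 'assistant', 'tool', 'assistant', 'tool', 'assistant']
```

The role sequence is what the real API expects: every assistant message carrying tool_calls is followed by one tool message per call before the model speaks again.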

Step 4: The Main Entry Point

# main.py
import asyncio
import os
from dotenv import load_dotenv
from agent import Agent

load_dotenv()

async def main():
    agent = Agent(api_key=os.environ["OPENAI_API_KEY"])
    print("AI Agent ready. Type 'exit' to quit.\n")

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() in ("exit", "quit"):
            break
        if not user_input:
            continue  # Ignore empty lines instead of sending them to the model

        response = await agent.run(user_input)
        print(f"\nAgent: {response}")

if __name__ == "__main__":
    asyncio.run(main())

Step 5: Testing It Out

Run the agent and try some realistic scenarios:

python main.py

You: What files are in this directory?
  → Calling tool: run_shell({"command": "ls -la"})
Agent: Here are the files in your project directory:
- agent.py (the agent loop)
- tools.py (tool definitions)
- main.py (entry point)
- .env (API key config)

You: Read the agent.py file and tell me how the loop works
  → Calling tool: read_file({"path": "agent.py"})
Agent: The agent loop works by...

You: Fetch the latest Python release info from python.org
  → Calling tool: fetch_url({"url": "https://www.python.org/downloads/"})
Agent: The latest Python version available is 3.13...

The agent decides on its own which tool to call for each request, chains multiple calls when needed, and (as the system prompt instructs) explains what it is doing before each call.

Step 6: Adding Conversation Memory

The agent already has basic memory — the self.messages list grows with every turn. But context windows fill up fast. For production use, add summarization or vector-based retrieval:

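# Add this method to the Agent class in agent.py
async def summarize_conversation(self) -> None:
    """Compress old messages into a summary to save context."""

    def field(msg, name):
        # History entries may be plain dicts or SDK message objects
        return msg.get(name) if isinstance(msg, dict) else getattr(msg, name, None)

    # Keep the system prompt and roughly the last 5 messages. Never start the
    # kept tail on a "tool" message: the API rejects tool results whose
    # assistant tool_calls message has been summarized away.
    cut = max(1, len(self.messages) - 5)
    while cut > 1 and field(self.messages[cut], "role") == "tool":
        cut -= 1
    old_messages = self.messages[1:cut]
    if len(old_messages) < 3:
        return

    summary_prompt = (
        "Summarize the key facts, decisions, and outputs from this conversation "
        "so far. Keep it under 200 words:"
    )
    for msg in old_messages:
        content = field(msg, "content") or ""  # Tool-call messages may have no text
        summary_prompt += f"\n{field(msg, 'role')}: {str(content)[:200]}"

    # self.client is AsyncOpenAI, so the call must be awaited
    response = await self.client.chat.completions.create(
        model=self.model,
        messages=[{"role": "user", "content": summary_prompt}]
    )
    summary = response.choices[0].message.content

    # Replace old messages with the summary
    self.messages = [self.messages[0]] + [
        {"role": "system", "content": f"[CONVERSATION SUMMARY] {summary}"}
    ] + self.messages[cut:]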

This compress-discard-continue pattern is a common way to keep long-running sessions within token limits; frameworks such as LangGraph and CrewAI offer their own memory-management features for the same problem.
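The summarize_conversation method still needs a trigger. A common rough heuristic is about four characters per token for English text, so a cheap size check can decide when to compress before the next API call. The sketch below assumes that heuristic and a hypothetical 100,000-token budget; the names estimate_tokens and needs_summary are illustrative, and a real tokenizer (for example tiktoken) gives exact counts when you need them.

```python
def estimate_tokens(messages: list[dict]) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic, not exact).

    Only message text is counted; tool-call arguments are ignored.
    """
    chars = sum(len(str(m.get("content") or "")) for m in messages)
    return chars // 4


def needs_summary(messages: list[dict], budget_tokens: int = 100_000,
                  threshold: float = 0.8) -> bool:
    """True once the estimated history size reaches `threshold` of the budget."""
    return estimate_tokens(messages) >= budget_tokens * threshold


history = [{"role": "user", "content": "x" * 400}, {"role": "assistant", "content": None}]
print(estimate_tokens(history))                    # 100
print(needs_summary(history, budget_tokens=100))   # True
```

Checking needs_summary(self.messages) at the start of each turn and invoking summarize_conversation when it returns True keeps the history under budget; the 0.8 threshold leaves headroom for the next tool result.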

Comparison: DIY Agent vs. Frameworks

Feature	DIY Agent (this tutorial)	OpenAI Assistants API	LangGraph	CrewAI
Lines of code	~180	~50	~100	~80
Full control over tool logic	✅ Complete	⚠️ Limited	✅ Complete	⚠️ Partial
Multi-agent orchestration	❌ Manual	❌	✅ Built-in	✅ Built-in
State persistence	Manual implementation	✅ Thread-based	✅ Checkpointing	⚠️ Basic
Streaming	Manual implementation	✅ Built-in	✅ Built-in	⚠️ Partial
Learning curve	Steeper, but you understand everything	Low	Medium	Low
Production readiness	Needs hardening	High	High	Medium
Cost (token overhead)	Minimal	Low	Moderate	Moderate

The DIY approach gives you complete visibility into every token, every tool call, and every decision the LLM makes. Frameworks abstract this away — great for speed, but dangerous when debugging subtle reasoning failures.
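Part of that visibility is simply recording what each call costs. Chat Completions responses carry a usage object with prompt_tokens and completion_tokens; a small accumulator (the class name UsageTracker is hypothetical) can total them per session. The responses in the example are stand-ins built with SimpleNamespace, not real API objects.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Accumulates token counts across LLM calls in one session."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def record(self, response) -> None:
        usage = getattr(response, "usage", None)
        if usage is None:  # Some providers or streaming modes may omit usage
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tracker = UsageTracker()
for p, c in [(1200, 80), (1350, 40)]:
    fake = SimpleNamespace(usage=SimpleNamespace(prompt_tokens=p, completion_tokens=c))
    tracker.record(fake)
print(tracker.calls, tracker.total_tokens)  # 2 2670
```

Calling record(response) right after each create(...) in Agent.run, on a hypothetical self.tracker attribute, would show how quickly prompt tokens grow as the conversation history lengthens.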

Security Considerations

Our agent can run arbitrary shell commands. In a production deployment, you must:

Sandbox execution — run tools in Docker containers or Firecracker microVMs
Whitelist commands — restrict shell access to a curated set (git, python, ls, cat)
Rate limit — cap the number of tool calls per minute
Audit logging — log every tool invocation with timestamp and result hash
Timeouts — enforce hard timeouts on every tool call (we already do this)

Tools like Hermes Agent implement all these safeguards out of the box — but understanding why they exist is essential before using any agent system in production.

FAQ

What is function calling in LLMs?

Function calling is a capability of modern LLMs (GPT-4o, Claude Sonnet 4, DeepSeek V4, Gemini 2.5) where the model can request the execution of external functions instead of generating a text response. The model outputs a structured JSON object describing which function to call and with what parameters. Your application executes the function and returns the result to the model for further processing.

Do I need a framework to build an AI agent?

No. As this tutorial demonstrates, a capable AI agent with tool calling can be built in under 200 lines of Python. Frameworks like LangGraph, CrewAI, and AutoGen add value for multi-agent orchestration, state persistence, and streaming — but the core pattern is simple enough to implement yourself.

Which LLM is best for function calling?

As of May 2026, the top performers are GPT-4o (best overall reliability), Claude Sonnet 4 (excellent at following complex tool schemas), and DeepSeek V4 (most cost-effective at $0.50/M input tokens). Benchmark your specific use case — function calling accuracy varies significantly by model and domain.
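A lightweight way to benchmark is a list of prompts paired with the tool call you expect, scored by whether the model picked the right function and the right arguments. The sketch below is model-agnostic: choose_tool is a hypothetical callable you would implement around your own client (returning the name and decoded arguments of the first tool call, or None), and the keyword-matching stub exists only so the example runs offline.

```python
from typing import Callable, Optional

ToolChoice = Optional[tuple[str, dict]]

CASES = [
    ("List the files here", ("run_shell", {"command": "ls -la"})),
    ("Show me README.md", ("read_file", {"path": "README.md"})),
    ("What's on python.org?", ("fetch_url", {"url": "https://www.python.org"})),
]


def score(choose_tool: Callable[[str], ToolChoice], cases) -> dict:
    """Fraction of cases where the tool name matched, and where name + args matched."""
    name_hits = exact_hits = 0
    for prompt, (want_name, want_args) in cases:
        got = choose_tool(prompt)
        if got and got[0] == want_name:
            name_hits += 1
            if got[1] == want_args:
                exact_hits += 1
    n = len(cases)
    return {"tool_accuracy": name_hits / n, "exact_accuracy": exact_hits / n}


def stub_choose_tool(prompt: str) -> ToolChoice:
    # Offline stand-in for a real model call, for demonstration only
    if "files" in prompt:
        return ("run_shell", {"command": "ls"})
    if ".md" in prompt:
        return ("read_file", {"path": "README.md"})
    return None


print(score(stub_choose_tool, CASES))
```

Exact argument matching is strict (the stub's "ls" versus "ls -la" counts as a miss), so for free-form arguments such as shell commands you may prefer to score only the tool name or write a custom comparator.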

How do I handle errors when a tool fails?

The pattern used in this tutorial — wrapping every tool in try/except and returning the error as the tool result — lets the LLM decide how to handle failures. A well-prompted agent will retry with different parameters, explain the error to the user, or try an alternative approach.
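The same pattern can be factored out so each tool does not need its own try/except. The decorator below (the name errors_as_results is hypothetical) catches any exception from a sync or async tool and returns it as a string in the same "Error: ..." style the tools in Step 2 use, with the exception type added so the model can tell a missing file from a timeout.

```python
import asyncio
import functools
import inspect


def errors_as_results(func):
    """Wrap a tool so exceptions become 'Error: ...' strings the LLM can read."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except Exception as e:
                return f"Error: {type(e).__name__}: {e}"
        return async_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            return f"Error: {type(e).__name__}: {e}"
    return wrapper


@errors_as_results
def safe_read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()[:5000]


@errors_as_results
async def flaky_fetch(url: str) -> str:
    raise TimeoutError("took too long")  # Simulated network failure


print(safe_read_file("does-not-exist.txt"))
print(asyncio.run(flaky_fetch("https://example.com")))  # Error: TimeoutError: took too long
```

Because the async wrapper is itself a coroutine function, the iscoroutinefunction check in the agent loop still routes decorated async tools through await.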

What is the Agent Communication Protocol (ACP)?

ACP is an open protocol that standardizes how AI agents communicate with each other and with tools. It defines a JSON-RPC-based transport layer, capability discovery, and structured error handling. Tools like Claude Code, Codex CLI, and Hermes Agent all support ACP, making them interoperable. Learn more in our multi-agent tutorial.

Key Takeaways

Function calling is the foundation of every modern AI agent — understand the core pattern before reaching for a framework
Three tools are enough to build a useful agent: shell execution, file I/O, and web access cover a large share of everyday use cases
Multi-step reasoning emerges naturally from the agent loop — the LLM decides when to call tools and how to chain results
Security is not optional — sandboxing, rate limiting, and audit logging are mandatory for production agents
The DIY approach teaches you how every agent works, including Claude Code, Codex CLI, and Hermes Agent — once you've built one from scratch, you can work with any framework

Next Steps

You now have a working AI agent. The template code in this tutorial is intentionally minimal — extend it with:

Slack/Discord integration for team notifications
GitHub API tools (create PRs, review code, manage issues)
Database query tools (PostgreSQL, SQLite)
Vector memory using embeddings and cosine similarity
Multi-agent coordination using the ACP protocol
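As an example of extending the template, here is one way a read-only SQLite query tool could look, following the same three pieces as Step 2: a JSON Schema definition, a handler that returns a string, and an entry for TOOL_MAP. The tool name query_sqlite, the app.db path, and the 50-row cap are assumptions. Opening the database through a URI with mode=ro is a standard sqlite3 option that makes writes fail, so the model can read data but not change it.

```python
import json
import sqlite3

QUERY_SQLITE_DEFINITION = {
    "type": "function",
    "function": {
        "name": "query_sqlite",
        "description": "Run a read-only SQL query against the local SQLite database and return rows as JSON.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "A single SELECT statement"}
            },
            "required": ["sql"],
        },
    },
}

DB_PATH = "app.db"  # Hypothetical database file


def query_sqlite(sql: str, db_path: str = DB_PATH, max_rows: int = 50) -> str:
    try:
        # mode=ro opens the file read-only, so INSERT/UPDATE/DELETE will fail
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            cursor = conn.execute(sql)
            columns = [d[0] for d in cursor.description or []]
            rows = [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]
        finally:
            conn.close()
        return json.dumps(rows, default=str)[:5000]
    except sqlite3.Error as e:
        return f"Error: {e}"

# Registration, assuming the Step 2 module layout:
# TOOL_DEFINITIONS.append(QUERY_SQLITE_DEFINITION)
# TOOL_MAP["query_sqlite"] = query_sqlite
```

Only sql is exposed in the schema; db_path and max_rows stay under your control rather than the model's.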

If you'd rather adopt a production-ready agent system, ECOA AI provides Vietnamese developers who specialize in building and deploying AI agent solutions exactly like this. Our developers can take your prototype agent and productionize it with proper sandboxing, monitoring, and scaling — so you can focus on the product, not the infrastructure.

Published on May 26, 2026 — Developer Tutorial series by ECOA AI
