
TL;DR

  • You can build your own AI agent with tool-calling capabilities in under 200 lines of Python
  • Modern LLMs (GPT-4o, Claude Sonnet 4, DeepSeek V4) natively support function calling — the agent decides when to invoke your tools
  • This tutorial walks through a complete agent: LLM client, tool definitions (shell, file read, web fetch), agent loop with conversation memory, and error handling
  • The agent can execute shell commands, read files, fetch web pages, and answer follow-up questions — all through natural language
  • By the end you’ll have a working template you can harden for production and extend with your own tools (database queries, API calls, Slack notifications)

Introduction

In 2026, AI agents are no longer a futuristic concept — they are the default way developers interact with LLMs. From Claude Code and OpenAI Codex CLI to open-source frameworks like CrewAI, LangGraph, and AutoGen, the era of passive chatbots is over. But behind every agent framework lies a simple, elegant mechanism: function calling (also called tool use).

Function calling lets an LLM request the execution of external tools (running a shell command, querying a database, fetching a URL) and use the result to continue its reasoning. It is the architectural foundation of today's AI coding agents.

Yet most tutorials skip the internals. They tell you to install a framework and call it done. This tutorial does the opposite: we will build a working AI agent from scratch, line by line, so you understand exactly how the magic works. Once you grasp the pattern, you can customize, extend, and debug any agent system — including multi-agent orchestrators and production-grade tools like Hermes Agent.

By the numbers: the openai Python package saw over 1.6 billion downloads in 2025, and over 16,000 GitHub repositories tagged with tool_use or function calling were created this year alone (up 340% from 2024). Tool calling has become one of the most in-demand skills in AI engineering.

Prerequisites

  • Python 3.10+ installed on your machine
  • An OpenAI API key (or Anthropic, DeepSeek — the pattern is the same)
  • Basic familiarity with Python async/await
  • pip for package installation

Step 1: Project Setup and Dependencies

Create a new directory and set up a virtual environment:

mkdir my-ai-agent && cd my-ai-agent
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install openai httpx python-dotenv

Next, create a .env file with your API key:

OPENAI_API_KEY=sk-your-key-here

This is all you need. No framework, no orchestration library: just the OpenAI SDK, HTTPX for web requests, and python-dotenv to load the key from .env.

Step 2: The Core — Tool Definitions

In OpenAI’s Chat Completions API, each tool is described by a name, a description, and a parameters object written in JSON Schema. The LLM reads these definitions and, when it decides a tool is needed, returns a tool_calls array in its response instead of (or alongside) plain text.
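To make the response shape concrete, here is a sketch of an assistant message that requests a tool, written as a plain dict. The field names (tool_calls, id, function.name, function.arguments) follow the Chat Completions JSON response; the values are invented. Note that arguments arrives as a JSON-encoded string, so it has to be decoded before you can call anything. parse_tool_calls is a hypothetical helper for illustration, not part of the openai SDK.

```python
import json
from typing import Any

# Illustrative assistant message in the Chat Completions response format
# (values are invented; field names follow the API's JSON shape).
example_message: dict[str, Any] = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "run_shell",
                # Arguments are a JSON *string* produced by the model
                "arguments": '{"command": "ls -la"}',
            },
        }
    ],
}


def parse_tool_calls(message: dict[str, Any]) -> list[tuple[str, str, dict[str, Any]]]:
    """Return (call_id, function_name, decoded_arguments) for each tool call.

    Hypothetical helper; an empty list means the model answered in plain
    text instead of requesting a tool.
    """
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")
        calls.append((call["id"], fn["name"], args))
    return calls


print(parse_tool_calls(example_message))
# [('call_abc123', 'run_shell', {'command': 'ls -la'})]
```

The call id matters: when you send the tool's output back, it must carry the same id so the model can match each result to its request.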

Let’s define three tools that make our agent genuinely useful:

  1. Shell executor — run any shell command and capture output
  2. File reader — read the contents of any text file
  3. Web fetcher — download and return the text content of any URL
# tools.py
import subprocess
import httpx

TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "run_shell",
            "description": "Execute a shell command and return stdout/stderr. Use for file operations, git, Python scripts.",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {
                        "type": "string",
                        "description": "Shell command to execute"
                    },
                    "timeout": {
                        "type": "integer",
                        "description": "Timeout in seconds (default 30)",
                        "default": 30
                    }
                },
                "required": ["command"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a text file from the filesystem and return its contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute or relative path to the file"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_url",
            "description": "Fetch a URL and return its text content. Use for API calls, documentation, web scraping.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to fetch"
                    }
                },
                "required": ["url"]
            }
        }
    }
]


def run_shell(command: str, timeout: int = 30) -> str:
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = result.stdout
        if result.stderr:
            output += f"\n[STDERR]\n{result.stderr}"
        return output[:5000]  # Truncate to avoid token overflow
    except subprocess.TimeoutExpired:
        return f"Error: Command timed out after {timeout}s"
    except Exception as e:
        return f"Error: {str(e)}"


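def read_file(path: str) -> str:
    try:
        # Decode as UTF-8; undecodable bytes become U+FFFD instead of raising
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return f.read()[:5000]  # Same truncation as run_shell
    except Exception as e:
        return f"Error: {str(e)}"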


async def fetch_url(url: str) -> str:
    try:
        async with httpx.AsyncClient(timeout=15.0) as client:
            response = await client.get(url, follow_redirects=True)
            return response.text[:5000]
    except Exception as e:
        return f"Error: {str(e)}"


TOOL_MAP = {
    "run_shell": run_shell,
    "read_file": read_file,
    "fetch_url": fetch_url,
}

Key design decisions here: we truncate results to 5,000 characters to prevent token limit issues, and we wrap every tool in a try/except so a single tool failure doesn’t crash the agent. Production agents add retry logic, rate limiting, and sandboxing — but this is enough for a working prototype.
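Retry logic is usually the first of those additions, since fetch_url depends on the network. Below is a minimal sketch of an async retry wrapper with exponential backoff. It assumes a tool signals failure by returning a string that starts with "Error:" (the convention the tools above use); with_retries and its parameters are hypothetical names, not part of any library. Only wrap idempotent tools such as fetch_url or read_file: retrying a shell command can repeat its side effects.

```python
import asyncio
from typing import Awaitable, Callable


async def with_retries(
    call: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Run an async tool call, retrying while it returns an "Error:" string.

    Hypothetical helper: sleeps base_delay, 2*base_delay, 4*base_delay, ...
    between tries and returns the last result if every attempt fails.
    """
    result = ""
    for attempt in range(attempts):
        result = await call()
        if not result.startswith("Error:"):
            return result
        if attempt < attempts - 1:
            await asyncio.sleep(base_delay * 2 ** attempt)
    return result


# Usage with the fetch_url tool from tools.py:
# text = await with_retries(lambda: fetch_url("https://www.python.org/"))
```

Passing a zero-argument callable (the lambda) rather than a coroutine object lets the wrapper create a fresh coroutine for every attempt, since a coroutine can only be awaited once.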

Step 3: The Agent Loop

The agent loop is the beating heart of any AI agent system. Here is the pattern that every agent framework — from simple scripts to Paperclip ACP — implements:

# agent.py
import inspect
import json
from openai import AsyncOpenAI
from tools import TOOL_DEFINITIONS, TOOL_MAP

SYSTEM_PROMPT = """You are a helpful AI agent that can execute shell commands, read files, and fetch URLs.
When you need information that requires a tool, use the appropriate function.
Always explain what you are doing before calling a tool.
After receiving tool results, incorporate them into your response naturally.
"""

MAX_STEPS = 10  # Safety cap on model round-trips per user message


class Agent:
    def __init__(self, api_key: str, model: str = "gpt-4o"):
        self.client = AsyncOpenAI(api_key=api_key)
        self.model = model
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    async def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        steps = 0

        while True:
            steps += 1
            if steps > MAX_STEPS:
                return "Stopped: reached the maximum number of tool-calling steps."

            response = await self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=TOOL_DEFINITIONS,
                tool_choice="auto",
                temperature=0.3,
            )

            message = response.choices[0].message

            if not message.tool_calls:
                # LLM chose to respond directly: we're done
                self.messages.append({"role": "assistant", "content": message.content})
                return message.content

            # Store the assistant turn as a plain dict; the tool results below
            # must follow it and reference its tool_call ids
            self.messages.append({
                "role": "assistant",
                "content": message.content,
                "tool_calls": [
                    {
                        "id": tc.id,
                        "type": "function",
                        "function": {"name": tc.function.name, "arguments": tc.function.arguments},
                    }
                    for tc in message.tool_calls
                ],
            })

            # Process each tool call
            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                handler = TOOL_MAP.get(func_name)
                try:
                    func_args = json.loads(tool_call.function.arguments or "{}")
                    print(f"  → Calling tool: {func_name}({json.dumps(func_args)})")
                    if handler is None:
                        result = f"Error: Unknown tool '{func_name}'"
                    elif inspect.iscoroutinefunction(handler):
                        result = await handler(**func_args)
                    else:
                        result = handler(**func_args)
                except (json.JSONDecodeError, TypeError) as e:
                    # Malformed JSON or unexpected arguments: tell the model instead of crashing
                    result = f"Error: invalid arguments for '{func_name}': {e}"

                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })

Notice the while True loop: the agent may call one tool, get a result, and then decide it needs another. This is called multi-step reasoning — the hallmark of a capable agent. A typical coding task might involve: list directory → read file → run test → read output → explain results. Each step is a separate tool call within the same turn.
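Because the loop only reacts to what the model returns, you can exercise multi-step chaining without an API key. The offline sketch below replaces the LLM with a hypothetical ScriptedModel that replays a fixed list of responses: two tool requests, then a final answer. The tool names list_dir and run_tests are made up for the demo; the assistant/tool message roles mirror those in agent.py, but nothing here calls OpenAI.

```python
import json
from typing import Any, Callable


class ScriptedModel:
    """Hypothetical stand-in for the LLM: replays canned assistant messages in order."""

    def __init__(self, script: list[dict[str, Any]]):
        self.script = list(script)

    def complete(self, messages: list[dict[str, Any]]) -> dict[str, Any]:
        return self.script.pop(0)  # A real model would read `messages` here


def run_loop(
    model: ScriptedModel,
    tools: dict[str, Callable[..., str]],
    user_input: str,
) -> tuple[str, list[dict[str, Any]]]:
    messages: list[dict[str, Any]] = [{"role": "user", "content": user_input}]
    while True:
        message = model.complete(messages)
        messages.append(message)
        if not message.get("tool_calls"):
            return message["content"], messages  # Plain-text answer ends the turn
        for call in message["tool_calls"]:
            fn = call["function"]
            result = tools[fn["name"]](**json.loads(fn["arguments"]))
            # Each result is fed back so the next model call can build on it
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})


def tool_call(call_id: str, name: str, **args: Any) -> dict[str, Any]:
    """Build an assistant message that requests one tool call."""
    return {"role": "assistant", "content": None, "tool_calls": [
        {"id": call_id, "type": "function",
         "function": {"name": name, "arguments": json.dumps(args)}}]}


fake_tools = {
    "list_dir": lambda: "test_app.py",
    "run_tests": lambda path: f"{path}: 3 passed",
}
model = ScriptedModel([
    tool_call("c1", "list_dir"),
    tool_call("c2", "run_tests", path="test_app.py"),
    {"role": "assistant", "content": "All 3 tests in test_app.py pass."},
])
answer, history = run_loop(model, fake_tools, "Do the tests pass?")
print(answer)                        # All 3 tests in test_app.py pass.
print([m["role"] for m in history])  # ['user', 'assistant', 'tool', 'assistant', 'tool', 'assistant']
```

The role sequence shows the chaining: each assistant tool request is followed by its tool result, and the turn ends only when the model replies without tool_calls.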

Step 4: The Main Entry Point

# main.py
import asyncio
import os
from dotenv import load_dotenv
from agent import Agent

load_dotenv()


async def main():
    agent = Agent(api_key=os.environ["OPENAI_API_KEY"])
    print("AI Agent ready. Type 'exit' to quit.\n")
    
    while True:
        user_input = input("\nYou: ")
        if user_input.lower() in ("exit", "quit"):
            break
        
        response = await agent.run(user_input)
        print(f"\nAgent: {response}")


if __name__ == "__main__":
    asyncio.run(main())

Step 5: Testing It Out

Run the agent and try some realistic scenarios:

python main.py

You: What files are in this directory?
  → Calling tool: run_shell({"command": "ls -la"})
Agent: Here are the files in your project directory:
- agent.py (the agent loop)
- tools.py (tool definitions)
- main.py (entry point)
- .env (API key config)

You: Read the agent.py file and tell me how the loop works
  → Calling tool: read_file({"path": "agent.py"})
Agent: The agent loop works by...

You: Fetch the latest Python release info from python.org
  → Calling tool: fetch_url({"url": "https://www.python.org/downloads/"})
Agent: The latest Python version available is 3.14...

The agent autonomously decides which tool to call for each request, chains multiple calls when needed, and explains its reasoning at every step.

Step 6: Adding Conversation Memory

The agent already has basic memory — the self.messages list grows with every turn. But context windows fill up fast. For production use, add summarization or vector-based retrieval:

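# Add to agent.py: _as_dict at module level, summarize_conversation inside the Agent class
def _as_dict(msg) -> dict:
    """Messages may be plain dicts or openai SDK message objects."""
    return msg if isinstance(msg, dict) else msg.model_dump()


async def summarize_conversation(self, keep_recent: int = 5) -> None:
    """Compress old messages into a summary to save context.

    Await it between turns once self.messages grows past a size you choose.
    """
    split = max(1, len(self.messages) - keep_recent)
    # Don't let the kept tail start with a tool result: its assistant
    # tool_calls message would be summarized away and the API would reject it
    while split > 1 and _as_dict(self.messages[split])["role"] == "tool":
        split -= 1
    old_messages = self.messages[1:split]  # Skip system prompt + recent tail
    if len(old_messages) < 3:
        return

    summary_prompt = (
        "Summarize the key facts, decisions, and outputs from this conversation "
        "so far. Keep it under 200 words:"
    )
    for msg in old_messages:
        m = _as_dict(msg)
        content = m.get("content") or "[tool call]"  # Tool-call turns may have no text
        summary_prompt += f"\n{m['role']}: {content[:200]}"

    # self.client is AsyncOpenAI, so the request must be awaited
    response = await self.client.chat.completions.create(
        model=self.model,
        messages=[{"role": "user", "content": summary_prompt}]
    )
    summary = response.choices[0].message.content

    # Replace old messages with a single summary message
    self.messages = (
        [self.messages[0]]
        + [{"role": "system", "content": f"[CONVERSATION SUMMARY] {summary}"}]
        + self.messages[split:]
    )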

This pattern (compress, discard, continue) keeps long-running sessions under the token limit. Frameworks such as LangGraph and CrewAI offer their own history trimming or summarization options built on the same idea.
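Summarization also needs a trigger. A common rough heuristic for English text is about four characters per token; the sketch below uses it to decide when to call summarize_conversation. The 4-characters-per-token ratio, the budget, and the 75% threshold are assumptions for illustration, and the helpers assume messages are plain dicts. For exact counts, use the model's tokenizer (for OpenAI models, the tiktoken package).

```python
from typing import Any

CHARS_PER_TOKEN = 4  # Rough heuristic for English text, not an exact count


def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    """Approximate the token count of a message list from character length."""
    chars = 0
    for msg in messages:
        chars += len(msg.get("content") or "")
        # Tool-call arguments also occupy context
        for call in msg.get("tool_calls") or []:
            chars += len(call["function"]["arguments"])
    return chars // CHARS_PER_TOKEN


def should_summarize(
    messages: list[dict[str, Any]],
    budget_tokens: int = 100_000,
    threshold: float = 0.75,
) -> bool:
    """True once the estimated history size exceeds threshold * budget_tokens."""
    return estimate_tokens(messages) > budget_tokens * threshold


history = [{"role": "user", "content": "x" * 400}]
print(estimate_tokens(history))                      # 100
print(should_summarize(history, budget_tokens=120))  # True (100 > 90)
```

Triggering well below the real context limit leaves room for the next tool results, which can be large even after truncation.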

Comparison: DIY Agent vs. Frameworks

| Feature | DIY Agent (this tutorial) | OpenAI Assistants API | LangGraph | CrewAI |
|---|---|---|---|---|
| Lines of code | ~180 | ~50 | ~100 | ~80 |
| Full control over tool logic | ✅ Complete | ⚠️ Limited | ✅ Complete | ⚠️ Partial |
| Multi-agent orchestration | ❌ Manual | | ✅ Built-in | ✅ Built-in |
| State persistence | Manual implementation | ✅ Thread-based | ✅ Checkpointing | ⚠️ Basic |
| Streaming | Manual implementation | ✅ Built-in | ✅ Built-in | ⚠️ Partial |
| Learning curve | Understand everything | Low | Medium | Low |
| Production readiness | Needs hardening | High | High | Medium |
| Cost (token overhead) | Minimal | Low | Moderate | Moderate |

The DIY approach gives you complete visibility into every token, every tool call, and every decision the LLM makes. Frameworks abstract this away — great for speed, but dangerous when debugging subtle reasoning failures.
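Token visibility is easy to add yourself: each Chat Completions response carries a usage object with prompt_tokens and completion_tokens. Below is a sketch of a small accumulator you could update after every create call in Agent.run; UsageTracker is a hypothetical helper, and the numbers in the usage example are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class UsageTracker:
    """Accumulates per-step token usage for one agent session (hypothetical helper)."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    steps: list[tuple[int, int]] = field(default_factory=list)

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.steps.append((prompt_tokens, completion_tokens))

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


tracker = UsageTracker()
# In Agent.run, after each create() call:
# tracker.record(response.usage.prompt_tokens, response.usage.completion_tokens)
tracker.record(1200, 85)   # illustrative numbers
tracker.record(1450, 120)
print(tracker.total_tokens, len(tracker.steps))  # 2855 2
```

Logging per step makes one cost driver visible: the prompt count grows on every tool round-trip because the full message history is resent each time.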

Security Considerations

Our agent can run arbitrary shell commands. In a production deployment, you must:

  • Sandbox execution: run tools in Docker containers or Firecracker microVMs
  • Whitelist commands: restrict shell access to a curated set (git, python, ls, cat)
  • Rate limit: cap the number of tool calls per minute
  • Audit logging: log every tool invocation with timestamp and result hash
  • Timeouts: enforce hard timeouts on every tool call (run_shell and fetch_url already do; read_file does not)

Tools like Hermes Agent implement all these safeguards out of the box — but understanding why they exist is essential before using any agent system in production.

FAQ

What is function calling in LLMs?

Function calling is a capability of modern LLMs (GPT-4o, Claude Sonnet 4, DeepSeek V4, Gemini 2.5) where the model can request the execution of external functions instead of generating a text response. The model outputs a structured JSON object describing which function to call and with what parameters. Your application executes the function and returns the result to the model for further processing.

Do I need a framework to build an AI agent?

No. As this tutorial demonstrates, a capable AI agent with tool calling can be built in under 200 lines of Python. Frameworks like LangGraph, CrewAI, and AutoGen add value for multi-agent orchestration, state persistence, and streaming — but the core pattern is simple enough to implement yourself.

Which LLM is best for function calling?

As of May 2026, the top performers are GPT-4o (best overall reliability), Claude Sonnet 4 (excellent at following complex tool schemas), and DeepSeek V4 (most cost-effective at $0.50/M input tokens). Benchmark your specific use case — function calling accuracy varies significantly by model and domain.
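A benchmark does not need a framework. The sketch below assumes you have collected, for each test prompt, the call you expected and the call the model actually made (function name plus decoded arguments from its tool_calls, or None if it answered in text). score_tool_calls and the case format are hypothetical; run the same cases against each model you are considering.

```python
from typing import Any, Optional

Call = tuple[str, dict[str, Any]]  # (function name, decoded arguments)


def score_tool_calls(cases: list[tuple[Call, Optional[Call]]]) -> dict[str, float]:
    """Score (expected, predicted) pairs; predicted is None if no tool was called.

    name_accuracy counts picking the right function; exact_accuracy also
    requires identical arguments.
    """
    if not cases:
        return {"name_accuracy": 0.0, "exact_accuracy": 0.0}
    name_hits = exact_hits = 0
    for (exp_name, exp_args), predicted in cases:
        if predicted is None:
            continue
        pred_name, pred_args = predicted
        if pred_name == exp_name:
            name_hits += 1
            if pred_args == exp_args:
                exact_hits += 1
    n = len(cases)
    return {"name_accuracy": name_hits / n, "exact_accuracy": exact_hits / n}


results = score_tool_calls([
    (("read_file", {"path": "agent.py"}), ("read_file", {"path": "agent.py"})),
    (("run_shell", {"command": "ls"}), ("run_shell", {"command": "ls -la"})),
    (("fetch_url", {"url": "https://www.python.org/"}), None),
    (("run_shell", {"command": "git status"}), ("read_file", {"path": ".git"})),
])
print(results)  # {'name_accuracy': 0.5, 'exact_accuracy': 0.25}
```

Exact argument matching is strict (ls versus ls -la counts as a miss); for shell or SQL arguments you may prefer a looser comparison, such as checking that a command produces the expected result.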

How do I handle errors when a tool fails?

The pattern used in this tutorial — wrapping every tool in try/except and returning the error as the tool result — lets the LLM decide how to handle failures. A well-prompted agent will retry with different parameters, explain the error to the user, or try an alternative approach.
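Rather than repeating try/except in every tool, you can apply the same convention once with a decorator. This sketch assumes synchronous tools that return strings; tool_errors_as_results is a hypothetical name. It turns any exception, including a TypeError from unexpected arguments, into an "Error: ..." string that the agent loop passes back to the model as the tool result.

```python
import functools
from typing import Callable


def tool_errors_as_results(func: Callable[..., str]) -> Callable[..., str]:
    """Wrap a synchronous tool so exceptions become "Error: ..." result strings."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as e:  # Report the failure to the model instead of crashing
            return f"Error: {type(e).__name__}: {e}"

    return wrapper


# A decorated variant of read_file without its own try/except
@tool_errors_as_results
def read_file(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()[:5000]


print(read_file("/definitely/missing.txt"))  # Error: FileNotFoundError: ...
```

Including the exception type in the message gives the model a clearer signal for choosing its next step, for example listing the directory after a FileNotFoundError.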

What is the Agent Communication Protocol (ACP)?

ACP is an open protocol developed by Paperclip that standardizes how AI agents communicate with each other and with tools. It defines a JSON-RPC-based transport layer, capability discovery, and structured error handling. Tools like Claude Code, Codex CLI, and Hermes Agent all support ACP, making them interoperable. Learn more in our multi-agent tutorial.

Key Takeaways

  1. Function calling is the foundation of every modern AI agent — understand the core pattern before reaching for a framework
  2. Three tools are enough to build a useful agent: shell execution, file I/O, and web access cover a large share of everyday developer tasks
  3. Multi-step reasoning emerges naturally from the agent loop — the LLM decides when to call tools and how to chain results
  4. Security is not optional — sandboxing, rate limiting, and audit logging are mandatory for production agents
  5. The DIY approach teaches you how every agent works, including Claude Code, Codex CLI, and Hermes Agent — once you've built one from scratch, you can work with any framework

Next Steps

You now have a working AI agent. The template code in this tutorial is intentionally minimal — extend it with:

  • Slack/Discord integration for team notifications
  • GitHub API tools (create PRs, review code, manage issues)
  • Database query tools (PostgreSQL, SQLite)
  • Vector memory using embeddings and cosine similarity
  • Multi-agent coordination using the ACP protocol
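As an example of extending the template, here is a sketch of a read-only SQLite query tool in the same shape as the tools in tools.py: a schema entry for TOOL_DEFINITIONS plus a handler that returns a string. The database is opened through the standard sqlite3 module in SQLite's read-only URI mode (mode=ro), so write statements fail at the connection level. The tool name query_sqlite, the row cap, and the JSON output format are assumptions; paths containing "?" or "#" would need URL-escaping before being placed in the URI.

```python
import json
import sqlite3

QUERY_SQLITE_DEFINITION = {
    "type": "function",
    "function": {
        "name": "query_sqlite",
        "description": "Run a read-only SQL query against a SQLite database and return rows as JSON.",
        "parameters": {
            "type": "object",
            "properties": {
                "db_path": {"type": "string", "description": "Path to the .db file"},
                "sql": {"type": "string", "description": "A single SELECT statement"},
            },
            "required": ["db_path", "sql"],
        },
    },
}


def query_sqlite(db_path: str, sql: str, max_rows: int = 50) -> str:
    try:
        # mode=ro: SQLite refuses writes and will not create a missing file
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        try:
            cur = conn.execute(sql)
            columns = [d[0] for d in cur.description or []]
            rows = [dict(zip(columns, row)) for row in cur.fetchmany(max_rows)]
        finally:
            conn.close()
        return json.dumps(rows, default=str)[:5000]  # Same truncation as the other tools
    except Exception as e:
        return f"Error: {str(e)}"


# Register it alongside the others in tools.py:
# TOOL_DEFINITIONS.append(QUERY_SQLITE_DEFINITION)
# TOOL_MAP["query_sqlite"] = query_sqlite
```

Returning rows as a list of column-to-value objects costs a few extra tokens per row compared with bare tuples, but the model no longer has to infer which value belongs to which column.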

If you'd rather adopt a production-ready agent system, ECOA AI provides Vietnamese developers who specialize in building and deploying AI agent solutions exactly like this. Our developers can take your prototype agent and productionize it with proper sandboxing, monitoring, and scaling — so you can focus on the product, not the infrastructure.

Published on May 26, 2026 — Developer Tutorial series by ECOA AI