In This Article
- What Is an AI Agent (And What It Isn't)
- The Agent Architecture: Observe-Think-Act
- Build a Minimal Agent in 40 Lines
- Adding Tools Your Agent Can Call
- Giving Your Agent Memory
- Planning and Multi-Step Reasoning
- LangChain vs. Direct SDK — When to Use What
- Production Deployment Patterns
- Cost Management and Optimization
- 5 Mistakes That Kill Agent Reliability
What Is an AI Agent (And What It Isn't)
An AI agent is not a chatbot with extra steps. A chatbot takes a message and returns a message. An agent takes a goal and executes a plan — calling APIs, reading files, querying databases, writing code, checking its own work, and looping until the job is done.
The difference is the loop. A chatbot is one pass: input → output. An agent runs a cycle: observe the current state, think about the next step, act through a tool, then observe the result — repeating until the goal is met.
In 2026, the best agents run on models like Claude Opus, GPT-4o, and Gemini 2.0 Ultra — models that support structured tool use natively. The model doesn't just generate text that looks like a function call; it returns a structured JSON object that your code can parse and execute deterministically.
The Agent Architecture: Observe-Think-Act
Every production agent — from Claude Code to Devin to custom enterprise agents — runs the same core loop. Here's the architecture:
1. Send a message to the LLM with: system prompt + conversation history + available tools
2. The LLM responds with either a text message (done) or a tool call (keep going)
3. If it's a tool call: execute the tool, append the result to the conversation history, and go back to step 1
4. If it's a text message: return the result to the user
That's it. Every agent framework — LangChain, CrewAI, AutoGen, custom implementations — is a variation on this loop. The differences are in how they handle tool routing, memory, error recovery, and multi-agent coordination. But the loop is always the same.
Build a Minimal Agent in 40 Lines
Let's build a working agent with the Anthropic Python SDK. This agent can use tools, loop until it's done, and handle multi-step tasks. No framework needed.
First, install the SDK:
pip install anthropic
Now the agent:
import anthropic
import json

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

def execute_tool(name, tool_input):
    """Route tool calls to actual implementations."""
    if name == "get_weather":
        # In production, this calls a real API
        return f"72°F, sunny in {tool_input['city']}"
    return "Unknown tool"

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        # Collect assistant response
        messages.append({"role": "assistant", "content": response.content})
        # If the model is done (no tool calls), return the text
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute each tool call and send results back
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})

# Run it
print(run_agent("What's the weather in Denver and NYC?"))
Run this and the agent will make two tool calls — one for Denver, one for NYC — then synthesize the results into a natural response. That's an agent. It decided what tools to call, called them, and used the results to answer.
Adding Tools Your Agent Can Call
The power of an agent comes from its tools. Here are the tool categories that matter most in production:
| Tool Category | Examples | When to Use |
|---|---|---|
| Data Retrieval | SQL queries, API calls, file reads, web scraping | Agent needs information it doesn't have |
| Data Mutation | Write files, update databases, send emails, create tickets | Agent needs to take action in the world |
| Computation | Run Python code, math calculations, data transformations | Agent needs precise calculations (LLMs are bad at math) |
| Search | Vector search, web search, document search | Agent needs to find relevant information in large datasets |
| Verification | Run tests, lint code, validate schemas, check URLs | Agent needs to verify its own work |
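The Computation row deserves a concrete example: rather than letting the model do arithmetic in its head, give it a calculator tool. Here is a minimal sketch of one safe backing implementation (the names are illustrative, not from any SDK) that evaluates arithmetic via the `ast` module instead of `eval`:

```python
import ast
import operator

# Supported operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calc(expr):
    """Evaluate an arithmetic expression without eval/exec — a safe
    implementation behind a 'calculator' tool."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)
```

Wire it into `execute_tool` exactly like `get_weather`: the model sends the expression as a string, your code returns the exact result.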
The most important rule for tool design: write tool descriptions like you're explaining them to a new hire. The model reads these descriptions to decide when and how to call each tool. Vague descriptions produce bad tool selection. Specific descriptions with examples produce reliable agents.
# BAD — vague, no guidance on when to use it
{
    "name": "search",
    "description": "Search for things"
}

# GOOD — specific, explains when and how
{
    "name": "search_knowledge_base",
    "description": "Search the company knowledge base for internal documentation, policies, and procedures. Use this when the user asks about company-specific information that wouldn't be in your training data. Returns the top 5 most relevant document chunks with source URLs. Input should be a natural language query, not keywords."
}
Giving Your Agent Memory
Agents without memory forget everything between runs. There are three types of memory that matter:
1. Conversation Memory (Short-Term)
This is the conversation history — the messages array in our code above. The model sees everything from the current session. This is free (it's just the context window) but limited by the model's context length.
2. Summary Memory (Medium-Term)
When conversations get long, compress older messages into summaries. This preserves important context without burning your entire context window on old messages.
def compress_history(messages, keep_recent=10):
    """Summarize old messages, keep recent ones verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for summaries
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation concisely:\n{json.dumps(old)}"
        }]
    )
    return [
        {"role": "user", "content": f"[Previous context: {summary.content[0].text}]"},
        *recent
    ]
3. Persistent Memory (Long-Term)
Store facts, preferences, and learned information in a vector database or simple file store. When starting a new conversation, retrieve relevant memories and inject them into the system prompt.
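A minimal sketch of the file-store variant (the `agent_memory.json` path and helper names are illustrative; a vector database would replace `load_memories` with a similarity search):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # illustrative store location

def save_memory(key, value):
    """Persist a fact or preference across agent runs."""
    store = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    store[key] = value
    MEMORY_FILE.write_text(json.dumps(store, indent=2))

def load_memories():
    """Load everything remembered from previous runs."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def build_system_prompt(base_prompt):
    """Inject stored memories into the system prompt for a new session."""
    memories = load_memories()
    if not memories:
        return base_prompt
    facts = "\n".join(f"- {k}: {v}" for k, v in memories.items())
    return f"{base_prompt}\n\nKnown facts from previous sessions:\n{facts}"
```

Expose `save_memory` to the agent as a tool and it can decide for itself what is worth remembering.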
Planning and Multi-Step Reasoning
Simple agents react to each step independently. Better agents plan before acting. The difference is dramatic for complex tasks.
The simplest planning pattern is plan-then-execute: ask the model to write a step-by-step plan first, then execute each step.
PLANNING_PROMPT = """Before taking any action, write a brief plan:
1. What is the user's goal?
2. What information do I need?
3. What tools will I call, and in what order?
4. How will I verify the result?
Write the plan, then execute it step by step."""
def run_planning_agent(user_message):
    messages = [
        {"role": "user", "content": f"{PLANNING_PROMPT}\n\nTask: {user_message}"}
    ]
    return agent_loop(messages)  # the same tool-use loop as run_agent above
For more complex tasks, use ReAct (Reasoning + Acting) — the agent writes its reasoning before each action, creating a visible chain of thought that improves decision-making and makes debugging easier.
LangChain vs. Direct SDK — When to Use What
This is the most common question in 2026 agent development. Here's the honest answer:
| Approach | Best For | Avoid When |
|---|---|---|
| Direct SDK (Anthropic/OpenAI) | Single-purpose agents, full control needed, performance-critical, learning how agents work | You need 15+ integrations (vector stores, doc loaders, etc.) and don't want to build them |
| LangChain / LangGraph | Multi-agent orchestration, complex workflows with branching, rapid prototyping with many integrations | Simple agents, performance-critical paths, you need to understand every line of code |
| Claude Agent SDK | Production agents on Anthropic models, built-in guardrails, managed tool execution | Multi-model agents, you need framework-agnostic code |
Our recommendation: start with the direct SDK. Build the 40-line agent above. Understand every line. Then, when you hit a real complexity wall — not an imagined one — reach for a framework. Most production agents we see in the wild are 100-300 lines of direct SDK code. They don't need a framework.
Production Deployment Patterns
Moving an agent from a script to production requires handling three things that don't exist in demos:
1. Error Recovery
Tools fail. APIs time out. Models hallucinate tool names. Your agent loop needs to handle all of this gracefully.
import time

def execute_tool_safely(name, tool_input, max_retries=2):
    for attempt in range(max_retries + 1):
        try:
            result = execute_tool(name, tool_input)
            return {"status": "success", "result": result}
        except Exception as e:
            if attempt == max_retries:
                return {"status": "error", "error": str(e)}
            time.sleep(1)  # brief backoff
2. Timeout and Cost Guardrails
An agent can loop forever if something goes wrong. Always set maximum iterations and spending limits.
MAX_ITERATIONS = 25
MAX_TOKENS_TOTAL = 100_000

def run_agent_safe(user_message):
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    for i in range(MAX_ITERATIONS):
        response = client.messages.create(...)
        total_tokens += response.usage.input_tokens + response.usage.output_tokens
        if total_tokens > MAX_TOKENS_TOTAL:
            return "Agent stopped: token budget exceeded"
        if response.stop_reason == "end_turn":
            return extract_text(response)
        # ... tool execution loop
    return "Agent stopped: max iterations reached"
3. Observability
You need to see what your agent did, why, and how long each step took. Log every tool call, every model response, and every decision point. Tools like LangSmith, Helicone, and Braintrust make this easier, but even structured logging to a file works.
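Structured logging can be as simple as one JSON line per decision point (the `agent_trace.jsonl` filename and `log_event` helper are illustrative):

```python
import json
import time

def log_event(event_type, **fields):
    """Append one structured JSON line per agent decision point so a
    run can be reconstructed after the fact."""
    record = {"ts": time.time(), "event": event_type, **fields}
    with open("agent_trace.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Inside the agent loop, for example:
# log_event("model_response", stop_reason=response.stop_reason)
# log_event("tool_call", tool=block.name, tool_input=block.input)
```

Because each line is self-contained JSON, you can grep a single run, count tool calls, or load the whole trace into a dataframe later.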
Cost Management and Optimization
The three biggest cost levers:
- Prompt caching — Cache your system prompt and tool definitions. If they're the same across runs (they usually are), you pay full price once and 90% less for subsequent calls.
- Model routing — Use Haiku for simple classification and routing, Sonnet for most agent work, Opus only for tasks that genuinely need it. A router that picks the right model per step can cut costs 60-70%.
- Batch API — For non-time-sensitive agent runs (nightly reports, batch processing), use the batch API at 50% discount.
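As a sketch of the first lever: the Anthropic API marks the cached prefix with `cache_control` blocks — everything up to and including the last marker is reused across calls. The helper below only assembles request kwargs (`SYSTEM_PROMPT` and `TOOLS` are placeholders standing in for your real definitions); pass the result to `client.messages.create(**kwargs)`.

```python
SYSTEM_PROMPT = "You are a helpful agent."  # placeholder system prompt

TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {"type": "object",
                     "properties": {"city": {"type": "string"}},
                     "required": ["city"]},
}]

def cached_request_kwargs(messages):
    """Build Messages API kwargs with the static prefix (system prompt
    + tool definitions) marked for prompt caching."""
    tools = [dict(t) for t in TOOLS]
    # Everything up to and including the last cache_control marker is cached.
    tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "system": [{"type": "text", "text": SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"}}],
        "tools": tools,
        "messages": messages,
    }

# response = client.messages.create(**cached_request_kwargs(messages))
```

Because only the `messages` list changes between iterations of the agent loop, every call after the first reads the system prompt and tool definitions from cache.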
5 Mistakes That Kill Agent Reliability
One example worth calling out: when a tool fails, return a structured, actionable error the agent can act on — not a bare exception:

{"error": "Customer not found", "suggestion": "Try searching by email instead of name"}
Ready to Master AI Agents?
Our hands-on bootcamp covers agent architecture, tool use, production deployment, and more — with real code, not slides. 5 cities. $1,490. 40 students max.
Reserve Your Seat