In This Article
- What Is an AI Agent (And What It Isn't)
- The Agent Architecture: Observe-Think-Act
- Build a Minimal Agent in 40 Lines
- Adding Tools Your Agent Can Call
- Giving Your Agent Memory
- Planning and Multi-Step Reasoning
- LangChain vs. Direct SDK — When to Use What
- Production Deployment Patterns
- Cost Management and Optimization
- 5 Mistakes That Kill Agent Reliability
What Is an AI Agent (And What It Isn't)
An AI agent is not a chatbot with extra steps. A chatbot takes a message and returns a message. An agent takes a goal and executes a plan — calling APIs, reading files, querying databases, writing code, checking its own work, and looping until the job is done.
The difference is the loop. A chatbot is one pass: input → output. An agent runs a cycle: observe the current state, think about the next step, act through a tool, then observe the result — repeating until the goal is met.
In 2026, the best agents run on models like Claude Opus, GPT-4o, and Gemini 2.0 Ultra — models that support structured tool use natively. The model doesn't just generate text that looks like a function call; it returns a structured JSON object that your code can parse and execute deterministically.
The Agent Architecture: Observe-Think-Act
Every production agent — from Claude Code to Devin to custom enterprise agents — runs the same core loop. Here's the architecture:
1. Send a message to the LLM with: system prompt + conversation history + available tools
2. The LLM responds with either a text message (done) or a tool call (keep going)
3. If it's a tool call: execute the tool, append the result to the conversation history, and go back to step 1
4. If it's a text message: return the result to the user
That's it. Every agent framework — LangChain, CrewAI, AutoGen, custom implementations — is a variation on this loop. The differences are in how they handle tool routing, memory, error recovery, and multi-agent coordination. But the loop is always the same.
Build a Minimal Agent in 40 Lines
Let's build a working agent with the Anthropic Python SDK. This agent can use tools, loop until it's done, and handle multi-step tasks. No framework needed.
First, install the SDK:
pip install anthropic
Now the agent:
import anthropic
import json

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

def execute_tool(name, tool_input):
    """Route tool calls to actual implementations."""
    if name == "get_weather":
        # In production, this calls a real API
        return f"72°F, sunny in {tool_input['city']}"
    return "Unknown tool"

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        # Collect assistant response
        messages.append({"role": "assistant", "content": response.content})
        # If the model is done (no tool calls), return the text
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute each tool call and send results back
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })
        messages.append({"role": "user", "content": tool_results})

# Run it
print(run_agent("What's the weather in Denver and NYC?"))
Run this and the agent will make two tool calls — one for Denver, one for NYC — then synthesize the results into a natural response. That's an agent. It decided what tools to call, called them, and used the results to answer.
Adding Tools Your Agent Can Call
The power of an agent comes from its tools. Here are the tool categories that matter most in production:
| Tool Category | Examples | When to Use |
|---|---|---|
| Data Retrieval | SQL queries, API calls, file reads, web scraping | Agent needs information it doesn't have |
| Data Mutation | Write files, update databases, send emails, create tickets | Agent needs to take action in the world |
| Computation | Run Python code, math calculations, data transformations | Agent needs precise calculations (LLMs are bad at math) |
| Search | Vector search, web search, document search | Agent needs to find relevant information in large datasets |
| Verification | Run tests, lint code, validate schemas, check URLs | Agent needs to verify its own work |
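The Computation row deserves a concrete example: rather than letting the model do arithmetic in its head, give it a calculator tool. Here is a minimal sketch of one safe backing implementation (the names are illustrative, not from any SDK) that evaluates arithmetic via the `ast` module instead of `eval`:

```python
import ast
import operator

# Supported operators; anything else is rejected.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_calc(expr):
    """Evaluate an arithmetic expression without eval/exec — a safe
    implementation behind a 'calculator' tool."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)
```

Wire it into `execute_tool` exactly like `get_weather`: the model sends the expression as a string, your code returns the exact result.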
The most important rule for tool design: write tool descriptions like you're explaining them to a new hire. The model reads these descriptions to decide when and how to call each tool. Vague descriptions produce bad tool selection. Specific descriptions with examples produce reliable agents.
# BAD — vague, no guidance on when to use it
{
    "name": "search",
    "description": "Search for things"
}

# GOOD — specific, explains when and how
{
    "name": "search_knowledge_base",
    "description": "Search the company knowledge base for internal documentation, policies, and procedures. Use this when the user asks about company-specific information that wouldn't be in your training data. Returns the top 5 most relevant document chunks with source URLs. Input should be a natural language query, not keywords."
}
Giving Your Agent Memory
Agents without memory forget everything between runs. There are three types of memory that matter:
1. Conversation Memory (Short-Term)
This is the conversation history — the messages array in our code above. The model sees everything from the current session. This is free (it's just the context window) but limited by the model's context length.
2. Summary Memory (Medium-Term)
When conversations get long, compress older messages into summaries. This preserves important context without burning your entire context window on old messages.
def compress_history(messages, keep_recent=10):
    """Summarize old messages, keep recent ones verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old = messages[:-keep_recent]
    recent = messages[-keep_recent:]
    summary = client.messages.create(
        model="claude-haiku-4-5-20251001",  # cheap model for summaries
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation concisely:\n{json.dumps(old)}"
        }]
    )
    return [
        {"role": "user", "content": f"[Previous context: {summary.content[0].text}]"},
        *recent
    ]
3. Persistent Memory (Long-Term)
Store facts, preferences, and learned information in a vector database or simple file store. When starting a new conversation, retrieve relevant memories and inject them into the system prompt.
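A minimal sketch of the file-store variant (the `agent_memory.json` path and helper names are illustrative; a vector database would replace `load_memories` with a similarity search):

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # illustrative store location

def save_memory(key, value):
    """Persist a fact or preference across agent runs."""
    store = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    store[key] = value
    MEMORY_FILE.write_text(json.dumps(store, indent=2))

def load_memories():
    """Load everything remembered from previous runs."""
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}

def build_system_prompt(base_prompt):
    """Inject stored memories into the system prompt for a new session."""
    memories = load_memories()
    if not memories:
        return base_prompt
    facts = "\n".join(f"- {k}: {v}" for k, v in memories.items())
    return f"{base_prompt}\n\nKnown facts from previous sessions:\n{facts}"
```

Expose `save_memory` to the agent as a tool and it can decide for itself what is worth remembering.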
Planning and Multi-Step Reasoning
Simple agents react to each step independently. Better agents plan before acting. The difference is dramatic for complex tasks.
The simplest planning pattern is plan-then-execute: ask the model to write a step-by-step plan first, then execute each step.
PLANNING_PROMPT = """Before taking any action, write a brief plan:
1. What is the user's goal?
2. What information do I need?
3. What tools will I call, and in what order?
4. How will I verify the result?
Write the plan, then execute it step by step."""
def run_planning_agent(user_message):
    messages = [
        {"role": "user", "content": f"{PLANNING_PROMPT}\n\nTask: {user_message}"}
    ]
    return agent_loop(messages)  # the same tool-use loop as run_agent above
For more complex tasks, use ReAct (Reasoning + Acting) — the agent writes its reasoning before each action, creating a visible chain of thought that improves decision-making and makes debugging easier.
LangChain vs. Direct SDK — When to Use What
This is the most common question in 2026 agent development. Here's the honest answer:
| Approach | Best For | Avoid When |
|---|---|---|
| Direct SDK (Anthropic/OpenAI) | Single-purpose agents, full control needed, performance-critical, learning how agents work | You need 15+ integrations (vector stores, doc loaders, etc.) and don't want to build them |
| LangChain / LangGraph | Multi-agent orchestration, complex workflows with branching, rapid prototyping with many integrations | Simple agents, performance-critical paths, you need to understand every line of code |
| Claude Agent SDK | Production agents on Anthropic models, built-in guardrails, managed tool execution | Multi-model agents, you need framework-agnostic code |
Our recommendation: start with the direct SDK. Build the 40-line agent above. Understand every line. Then, when you hit a real complexity wall — not an imagined one — reach for a framework. Most production agents we see in the wild are 100-300 lines of direct SDK code. They don't need a framework.
Production Deployment Patterns
Moving an agent from a script to production requires handling three things that don't exist in demos:
1. Error Recovery
Tools fail. APIs time out. Models hallucinate tool names. Your agent loop needs to handle all of this gracefully.
import time

def execute_tool_safely(name, tool_input, max_retries=2):
    for attempt in range(max_retries + 1):
        try:
            result = execute_tool(name, tool_input)
            return {"status": "success", "result": result}
        except Exception as e:
            if attempt == max_retries:
                return {"status": "error", "error": str(e)}
            time.sleep(1)  # brief backoff
2. Timeout and Cost Guardrails
An agent can loop forever if something goes wrong. Always set maximum iterations and spending limits.
MAX_ITERATIONS = 25
MAX_TOKENS_TOTAL = 100_000

def run_agent_safe(user_message):
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    for i in range(MAX_ITERATIONS):
        response = client.messages.create(...)
        total_tokens += response.usage.input_tokens + response.usage.output_tokens
        if total_tokens > MAX_TOKENS_TOTAL:
            return "Agent stopped: token budget exceeded"
        if response.stop_reason == "end_turn":
            return extract_text(response)
        # ... tool execution loop
    return "Agent stopped: max iterations reached"
3. Observability
You need to see what your agent did, why, and how long each step took. Log every tool call, every model response, and every decision point. Tools like LangSmith, Helicone, and Braintrust make this easier, but even structured logging to a file works.
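Structured logging can be as simple as one JSON line per decision point (the `agent_trace.jsonl` filename and `log_event` helper are illustrative):

```python
import json
import time

def log_event(event_type, **fields):
    """Append one structured JSON line per agent decision point so a
    run can be reconstructed after the fact."""
    record = {"ts": time.time(), "event": event_type, **fields}
    with open("agent_trace.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Inside the agent loop, for example:
# log_event("model_response", stop_reason=response.stop_reason)
# log_event("tool_call", tool=block.name, tool_input=block.input)
```

Because each line is self-contained JSON, you can grep a single run, count tool calls, or load the whole trace into a dataframe later.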
Cost Management and Optimization
The three biggest cost levers:
- Prompt caching — Cache your system prompt and tool definitions. If they're the same across runs (they usually are), you pay full price once and 90% less for subsequent calls.
- Model routing — Use Haiku for simple classification and routing, Sonnet for most agent work, Opus only for tasks that genuinely need it. A router that picks the right model per step can cut costs 60-70%.
- Batch API — For non-time-sensitive agent runs (nightly reports, batch processing), use the batch API at 50% discount.
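As a sketch of the first lever: the Anthropic API marks the cached prefix with `cache_control` blocks — everything up to and including the last marker is reused across calls. The helper below only assembles request kwargs (`SYSTEM_PROMPT` and `TOOLS` are placeholders standing in for your real definitions); pass the result to `client.messages.create(**kwargs)`.

```python
SYSTEM_PROMPT = "You are a helpful agent."  # placeholder system prompt

TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {"type": "object",
                     "properties": {"city": {"type": "string"}},
                     "required": ["city"]},
}]

def cached_request_kwargs(messages):
    """Build Messages API kwargs with the static prefix (system prompt
    + tool definitions) marked for prompt caching."""
    tools = [dict(t) for t in TOOLS]
    # Everything up to and including the last cache_control marker is cached.
    tools[-1]["cache_control"] = {"type": "ephemeral"}
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 4096,
        "system": [{"type": "text", "text": SYSTEM_PROMPT,
                    "cache_control": {"type": "ephemeral"}}],
        "tools": tools,
        "messages": messages,
    }

# response = client.messages.create(**cached_request_kwargs(messages))
```

Because only the `messages` list changes between iterations of the agent loop, every call after the first reads the system prompt and tool definitions from cache.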
5 Mistakes That Kill Agent Reliability
One example worth calling out: when a tool fails, return a structured, actionable error the agent can act on — not a bare exception:

{"error": "Customer not found", "suggestion": "Try searching by email instead of name"}
Ready to Master AI Agents?
Our hands-on bootcamp covers agent architecture, tool use, production deployment, and more — with real code, not slides. 5 cities. $1,490. 40 students max.
Reserve Your Seat