Day 3 of 5 · 50 minutes

Memory and Context — Agents That Remember

Add short-term conversation memory, long-term JSON persistence, and context window management. ~200 lines of Python.

What you'll build today

An agent with two memory systems: short-term (conversation history within a session) and long-term (JSON file that persists facts across sessions). Plus a context window manager that summarizes old messages when the window fills up.

1. Two Types of Memory

Short-term vs long-term memory

Short-term memory is the conversation history — the messages list we've been building since Day 1. It's in RAM. When the session ends, it's gone.

Long-term memory is a persistent store of facts the agent has learned. When the agent discovers something important — a user preference, a key fact, a result from a previous task — it writes it to a file or database. Next session, it loads that store and starts with that knowledge.

Together, they give the agent both context within a session and continuity across sessions.
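In miniature, the two tiers look like this: a plain Python list for short-term memory and a JSON file for long-term memory. This is a sketch for illustration only; `memory_sketch.json` is an arbitrary filename, not the one the agent below uses.

```python
import json
from pathlib import Path

store = Path("memory_sketch.json")   # hypothetical long-term store
store.unlink(missing_ok=True)        # start clean for the demo

session = []                         # short-term: lives only in RAM
session.append({"role": "user", "content": "I prefer tabs."})

# Long-term: survives the process. Load, update, write back.
facts = json.loads(store.read_text()) if store.exists() else []
facts.append("User prefers tabs.")
store.write_text(json.dumps(facts))

# A fresh "session" starts empty, but the file is still on disk.
session = []
print(json.loads(store.read_text()))  # → ['User prefers tabs.']
```

When the process exits, `session` vanishes with it; the JSON file is what a new process can reload.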

2. The Code

Memory-enabled agent

agent_day3.py
import anthropic, json
from datetime import datetime
from pathlib import Path

client = anthropic.Anthropic()
MEMORY_FILE = "agent_memory.json"
MAX_MESSAGES = 20  # summarize when history exceeds this

# ── Long-term memory ───────────────────────────────
def load_memory() -> dict:
    if Path(MEMORY_FILE).exists():
        return json.loads(Path(MEMORY_FILE).read_text())
    return {"facts": [], "preferences": {}, "task_history": []}

def save_memory(memory: dict):
    Path(MEMORY_FILE).write_text(json.dumps(memory, indent=2))

def remember_fact(fact: str, category: str = "general") -> str:
    memory = load_memory()
    entry = {
        "fact": fact,
        "category": category,
        "timestamp": datetime.now().isoformat()
    }
    memory["facts"].append(entry)
    save_memory(memory)
    return f"Remembered: '{fact}' (category: {category})"

def recall_facts(query: str = "") -> str:
    memory = load_memory()
    facts = memory["facts"]
    if query:
        facts = [f for f in facts
                  if query.lower() in f["fact"].lower()
                  or query.lower() in f["category"].lower()]
    if not facts:
        return "No facts found."
    return json.dumps(facts, indent=2)

# ── Context window management ──────────────────────
def summarize_old_messages(messages: list) -> list:
    """When history is long, fold old messages into a compact summary."""
    if len(messages) <= MAX_MESSAGES:
        return messages

    # Pick a cut point that keeps roughly the last 10 messages, but never
    # splits a tool_use/tool_result pair: advance the cut until the first
    # kept message is a plain user message (string content).
    cut = len(messages) - 10
    while cut < len(messages) and not (
        messages[cut]["role"] == "user"
        and isinstance(messages[cut]["content"], str)
    ):
        cut += 1
    if cut >= len(messages):
        return messages  # no safe cut point; keep history unchanged
    old, recent = messages[:cut], messages[cut:]

    # Ask a cheap model to summarize the old messages
    summary_text = "\n".join(
        f"{m['role']}: {str(m['content'])[:200]}"
        for m in old
    )
    summary_resp = client.messages.create(
        model="claude-haiku-4-5",  # cheap model for summarization
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": f"Summarize this conversation history in 3-5 sentences:\n{summary_text}"
        }]
    )
    summary = summary_resp.content[0].text

    # Fold the summary into the first kept user message so roles still
    # alternate correctly (no back-to-back user messages).
    recent = recent.copy()
    recent[0] = {
        "role": "user",
        "content": f"[Earlier conversation summary]: {summary}\n\n{recent[0]['content']}"
    }
    return recent

# ── Tools (extends Day 2) ─────────────────────────
TOOLS = [
  {"name":"remember_fact","description":"Store a fact in long-term memory for future sessions. Use for important info the user shares.",
   "input_schema":{"type":"object","properties":{"fact":{"type":"string"},"category":{"type":"string"}},"required":["fact"]}},
  {"name":"recall_facts","description":"Retrieve facts from long-term memory. Use at session start or when you need to recall past information.",
   "input_schema":{"type":"object","properties":{"query":{"type":"string"}},"required":[]}},
]

def execute_tool(name, inp):
    if name == "remember_fact":
        return remember_fact(inp["fact"], inp.get("category","general"))
    elif name == "recall_facts":
        return recall_facts(inp.get("query",""))
    raise ValueError(f"Unknown tool: {name}")

# ── Agent with memory ─────────────────────────────
class MemoryAgent:
    def __init__(self):
        self.messages = []
        self.memory = load_memory()
        # Inject existing long-term memory as context
        if self.memory["facts"]:
            facts_str = "\n".join(f"- {f['fact']}" for f in self.memory["facts"])
            print(f"Loaded {len(self.memory['facts'])} facts from long-term memory.")
            self._system_context = f"You know these facts from previous sessions:\n{facts_str}"
        else:
            self._system_context = "No previous session memory."

    def chat(self, user_input: str, max_steps=10):
        self.messages.append({"role":"user","content":user_input})
        self.messages = summarize_old_messages(self.messages)

        for _ in range(max_steps):
            resp = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                system=self._system_context,
                tools=TOOLS,
                messages=self.messages
            )
            if resp.stop_reason != "tool_use":
                # Join all text blocks (responses can contain more than one)
                answer = "".join(b.text for b in resp.content if b.type == "text")
                self.messages.append({"role":"assistant","content":answer})
                return answer
            results = []
            for b in resp.content:
                if b.type == "tool_use":
                    result = execute_tool(b.name, b.input)
                    results.append({"type":"tool_result","tool_use_id":b.id,"content":str(result)})
            self.messages += [
                {"role":"assistant","content":resp.content},
                {"role":"user","content":results}
            ]
        return "Max steps reached."

# ── Demo: multi-session memory ─────────────────────
if __name__ == "__main__":
    agent = MemoryAgent()

    # Session 1: tell agent something
    r1 = agent.chat("My name is Alex and I prefer Python over JavaScript.")
    print("Response 1:", r1)

    # Session 2 (new agent instance simulates new session)
    agent2 = MemoryAgent()
    r2 = agent2.chat("What's my name and language preference?")
    print("Response 2:", r2)
    # Agent should answer correctly from long-term memory!

Run it: the first session stores the preference; the second session (agent2, a fresh instance) loads it from disk and answers correctly. That's long-term memory working. Run the script a second time and the facts from the first run are still there.

3. Context Window

Why context window management matters

Every message you add to the history costs tokens, and every API call re-sends the entire history. Claude's context window is 200K tokens, but at $3 per million input tokens for Sonnet, a 100K-token conversation costs about $0.30 in input tokens on every exchange. For production agents running hundreds of tasks per day, this adds up.

The summarization approach above keeps costs manageable: once you have more than 20 messages, summarize the oldest ones into a single compact message. You preserve the gist of the conversation without paying for every word.
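Message count is a crude proxy for cost; a token-based trigger tracks it more directly. A minimal sketch, assuming the common ~4-characters-per-token heuristic (an approximation, not an exact count) and an arbitrary budget:

```python
def approx_tokens(messages: list) -> int:
    """Rough token estimate: ~4 characters per token (heuristic)."""
    return sum(len(str(m["content"])) for m in messages) // 4

TOKEN_BUDGET = 50_000  # arbitrary threshold for this sketch

def needs_summary(messages: list) -> bool:
    """Trigger summarization by estimated tokens, not message count."""
    return approx_tokens(messages) > TOKEN_BUDGET
```

You could swap this check in for the `len(messages) <= MAX_MESSAGES` test so that twenty short messages don't trigger summarization while five very long ones do.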

More sophisticated approaches: For production, consider semantic memory — using embeddings to retrieve only the relevant facts for the current task, rather than loading everything. We use a simpler JSON approach here because it works for 80% of use cases and doesn't require a vector database.
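The idea behind semantic recall can be sketched without a vector database. Here a simple word-overlap score stands in for real embedding similarity; a production version would use an embedding model and cosine distance instead:

```python
def overlap_score(query: str, fact: str) -> float:
    """Stand-in for embedding similarity: fraction of query words in the fact."""
    q = set(query.lower().split())
    f = set(fact.lower().split())
    return len(q & f) / len(q) if q else 0.0

def recall_relevant(facts: list, query: str, top_k: int = 2) -> list:
    """Return only the top_k facts most relevant to the query."""
    ranked = sorted(facts, key=lambda f: overlap_score(query, f), reverse=True)
    return ranked[:top_k]

facts = [
    "Alex prefers Python over JavaScript.",
    "The deploy script lives in scripts/deploy.sh.",
    "Alex's timezone is UTC+2.",
]
print(recall_relevant(facts, "what language does alex prefer", top_k=1))
# → ['Alex prefers Python over JavaScript.']
```

The point is the shape of the API: instead of injecting every stored fact into the system prompt, you retrieve only the few that score highest against the current task.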

Day 3 Challenge

Complete before Day 4

  1. Run the demo and verify cross-session memory works
  2. Add a forget_fact(fact_id) tool that removes a specific memory entry
  3. Add a set_preference(key, value) tool that stores user preferences separately from facts
  4. Run 3 consecutive sessions — each one should know what the previous ones stored

Tomorrow: Multi-agent systems. One agent directing others. This is where things get interesting.