Starting an agent is easy. Stopping one is the actual production engineering problem. Today you learn the four stopping conditions every serious agentic IDE combines — and why you need all four.
By the end of this lesson you will build an agent loop with all four production stopping conditions, understand why runaway agents are the most common failure mode in shipped agentic products, and know the priority order for combining the four conditions.
A surprisingly common rookie mistake: wiring up a Claude API loop without a stopping condition and letting it run for nine hours on an impossible task, burning fifty dollars in API credits before anyone notices. It is easy to do. You forget to set max_iters, the task turns out to be unreachable, and the model patiently generates variations, runs tests, and generates more variations with complete confidence. The Anthropic dashboard lights up. The lesson sticks permanently.
Every serious production agent has to be built so this story cannot happen, and the only way is to design the loop for termination from the first line. Today's lesson covers how.
Giving an agent a loop is trivial. Call the model, execute its tool calls, feed the results back, repeat. That is maybe twenty lines of Python. What is hard is knowing when you have actually arrived at the destination.
Consider the failure modes of a naive loop with no stopping condition beyond "keep going": it grinds forever on an unreachable task, it burns API credits with nothing to show for them, it declares victory on work it never verified, or it runs until a human notices the dashboard and kills it by hand.
Four stopping conditions solve this, and every serious agentic IDE uses some combination of them. None is sufficient alone. All four are necessary in production.
1. Explicit completion. The agent itself calls a done tool to signal that it has finished. This is the ideal stop: the model decides, not the harness. Claude Code and Cline both prefer this pattern.
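As a concrete sketch, here is what a done tool could look like in the Anthropic tool-definition format. The description wording and the files_changed field are illustrative choices, not from any of the tools named above:

```python
# A `done` tool schema that refuses a bare completion signal: the model
# must describe what it accomplished before it is allowed to stop.
DONE_TOOL = {
    "name": "done",
    "description": (
        "Signal that the task is complete. Before calling this, verify "
        "your work. You must summarize exactly what you accomplished."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {
                "type": "string",
                "description": "What was accomplished, in concrete terms.",
            },
            "files_changed": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Every file created or modified.",
            },
        },
        "required": ["summary", "files_changed"],
    },
}
```

Making both fields required means a lazy "I'm done" tool call fails schema validation, which nudges the model into actually reviewing its work before stopping.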
2. Success check. If the task is framed as "make the test suite pass," the stop is obvious: run the tests, exit on green. Most real coding tasks can be framed this way, and when they can, this is the most reliable stop signal available.
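A minimal sketch of such a check: shell out to the test runner and treat exit code 0 as green. The tests_pass name and the pytest default are illustrative assumptions, not the lesson's code:

```python
import subprocess

def tests_pass(test_cmd: list[str] = ["pytest", "-q"], timeout: int = 120) -> bool:
    """Run the test suite; exit code 0 means every test passed."""
    try:
        result = subprocess.run(test_cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # a hung suite is not a pass
    return result.returncode == 0
```

You would pass this to the loop as `success_check=tests_pass`, so the harness re-runs the suite after every model turn.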
3. Budget cap. A hard ceiling: the agent gets N tokens, M wall-clock seconds, or K tool calls, and when the budget is exhausted the loop halts and reports whatever progress it made. This prevents runaway costs. Every production system has it.
4. Max iterations. The simplest safeguard: after K loop iterations, halt. Not the most elegant condition, but it is a reliable backstop against infinite loops and lets you tune behavior with one knob. Typical values: 10 for simple tasks, 30 for complex ones, 100 for research runs.
Here is a complete agent loop with all four stopping conditions, about 40 lines of Python. You can copy it into a file right now and build on it.
```python
from anthropic import Anthropic

client = Anthropic()

# Shown as names for brevity; the real API expects full
# {"name", "description", "input_schema"} dicts for each tool.
TOOLS = ["bash", "read_file", "edit_file", "done"]

def run_agent(
    task: str,
    max_iters: int = 20,
    max_tokens: int = 100_000,
    success_check=None,
) -> dict:
    messages = [{"role": "user", "content": task}]
    tokens_used = 0

    for i in range(max_iters):
        resp = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        tokens_used += resp.usage.input_tokens + resp.usage.output_tokens

        # STOP 1: explicit completion
        if any(b.type == "tool_use" and b.name == "done" for b in resp.content):
            return {"status": "complete", "iters": i + 1, "tokens": tokens_used}

        # STOP 2: success check (e.g., tests pass)
        if success_check and success_check():
            return {"status": "success", "iters": i + 1, "tokens": tokens_used}

        # STOP 3: budget cap
        if tokens_used >= max_tokens:
            return {"status": "budget_exhausted", "iters": i + 1, "tokens": tokens_used}

        # Execute any tool calls the model issued
        tool_results = []
        for b in resp.content:
            if b.type == "tool_use":
                result = execute_tool(b.name, b.input)  # your dispatcher from Days 1-3
                tool_results.append(
                    {"type": "tool_result", "tool_use_id": b.id, "content": result}
                )
        if tool_results:
            messages.append({"role": "user", "content": tool_results})

    # STOP 4: max iterations
    return {"status": "max_iters", "iters": max_iters, "tokens": tokens_used}
```
This is the same shape Cline uses, the same shape Claude Code uses, the same shape aider uses. Different languages, different tools, different prompts — same loop structure.
The values above are reasonable production defaults, but every task needs its own tuning. A rough guide for sizing limits on a new task: start tight (the iteration counts above are good anchors: 10 simple, 30 complex, 100 research), run the task, and loosen whichever limit fires first.
Which of the four can you safely skip? None. Skipping one is the single most common production bug in agent engineering. Forget the budget cap in particular and you get the nine-hour, fifty-dollar story. It is not a nice-to-have; it is the only thing standing between you and an infinite loop.
The other failure mode is the opposite: the loop stops too early. The model calls done prematurely because it is overconfident, or the success check returns true because the test suite was already passing before the agent touched anything. Both happen in practice.
The mitigations are specific:
For done: make the done tool's schema require the model to summarize what it actually accomplished; forcing a summary forces the model to actually check. For the success check: run git diff and feed the output back to the model, which often catches its own mistakes when shown the actual diff.

Now try it yourself. Copy the loop above into a file called my_agent.py, wire up an Anthropic API key, then deliberately break it in each of the four ways so you see each stopping condition fire.
- Give it an impossible task with a small max_iters and watch it return "max_iters".
- Set max_tokens very low and watch it return "budget_exhausted".
- Pass a success_check that returns True after a few iterations. Watch it return "success" mid-loop.
- Give it a trivial task. Watch it call done and return "complete".

Seeing all four fire on the same code in one session is the single clearest way to internalize how production agent loops terminate. It takes 20 minutes and locks the concept in permanently.
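If you want to watch the iteration backstop fire without spending any API credits, you can swap in a stub model that never calls done. This self-contained sketch mirrors the shape of the loop above; stub_model and run_stub_agent are illustrative names, not part of the lesson's code:

```python
# A stub "model" that never calls `done`, so only the max_iters
# backstop can ever fire.
def stub_model(messages):
    return {"content": [{"type": "text", "text": "still working..."}]}

def run_stub_agent(task: str, max_iters: int = 5) -> dict:
    messages = [{"role": "user", "content": task}]
    for i in range(max_iters):
        resp = stub_model(messages)
        messages.append({"role": "assistant", "content": resp["content"]})
        # STOP 1 would fire here if the stub ever issued a `done` tool call
        if any(b.get("type") == "tool_use" and b.get("name") == "done"
               for b in resp["content"]):
            return {"status": "complete", "iters": i + 1}
        messages.append({"role": "user", "content": "continue"})
    # STOP 4: max iterations
    return {"status": "max_iters", "iters": max_iters}

print(run_stub_agent("impossible task"))  # → {'status': 'max_iters', 'iters': 5}
```

The same trick works for the other three conditions: make the stub emit a done tool call, inflate a fake token counter, or plug in a canned success check.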
Before Day 5, make sure:
- You can explain why max_tokens should never be left unbounded.
- You can explain why an agent-initiated done is preferable to any harness-side stop.

Day 5 is the capstone. You take everything from Days 1-4 — shell tool, search, worktrees, stopping conditions — and assemble them into a working agentic coder of your own, in under 150 lines of Python. You will be able to point it at any repo and watch it actually work.