Day 01 Foundations

How Agents Access the Terminal

The shell tool is the primitive everything else in an agentic IDE builds on. Today you learn exactly how Claude Code, Cursor, and Antigravity spawn a subprocess, capture output, gate permissions, and stream results back into model context without blowing your token budget.

~1 hour · Intermediate · Hands-on · Bo Peng

Today's Objective

By the end of this lesson you will understand the shell-tool architecture used by every major agentic coding IDE, build a minimal 30-line version of it yourself, and know which security gates are non-negotiable in a production implementation.

An agentic coding IDE cannot do anything useful without a terminal. Compile code, run tests, install dependencies, deploy a build, check git status — all of it routes through a shell subprocess. If you strip Claude Code, Cursor, or Google Antigravity down to the smallest piece of functionality they all share, the shell tool is it. Everything else — file reads, file edits, search, agent spawning — is a convenience built on top of this one primitive.

This lesson walks through the exact pattern these tools use, shows you a minimal Python implementation you can run yourself, and explains the permission gates that separate a toy agent from a production one. If you have ever wondered how an LLM manages to run pytest on your machine, this is where the magic lives. There is no magic — it is subprocess.run with some adult supervision.

01

The Shell Tool, Minimal Version

Every agentic IDE exposes a shell tool to its underlying model. The tool definition sent to the model looks approximately like this:

bash_tool_schema.json
JSON
{
  "name": "bash",
  "description": "Execute a bash command in the user's shell. Use for running tests, git commands, file ops, and anything else that needs a real shell.",
  "input_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string", "description": "The command to run"},
      "cwd":     {"type": "string", "description": "Working directory, relative to project root"},
      "timeout": {"type": "number", "description": "Seconds before kill"}
    },
    "required": ["command"]
  }
}

When the model wants to run a command, it emits a tool call matching this schema. The IDE harness catches that tool call, runs the command through a real subprocess, captures the output, and sends the result back as the next turn in the conversation. That is the entire loop.
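Concretely, the harness side of that loop can be sketched in a few lines. This is an illustrative sketch, not any vendor's actual harness code; the name `handle_tool_call` and the result-dict shape are assumptions.

```python
import subprocess

def handle_tool_call(tool_call: dict) -> dict:
    """Dispatch a model-emitted tool call (matching the schema above)
    to a real subprocess. Sketch only -- no gating, no truncation yet."""
    if tool_call["name"] != "bash":
        return {"error": f"unknown tool: {tool_call['name']}"}
    args = tool_call["input"]
    p = subprocess.run(
        args["command"],
        shell=True,
        cwd=args.get("cwd", "."),
        capture_output=True,
        text=True,
        timeout=args.get("timeout", 60),
    )
    # This dict is serialized into the tool-result content of the next turn.
    return {"stdout": p.stdout, "stderr": p.stderr, "exit_code": p.returncode}

# Example: the model emitted this tool call
result = handle_tool_call({"name": "bash", "input": {"command": "echo hello"}})
```

The model never touches the subprocess directly; it only sees the result dict that comes back as the next conversation turn.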

Here is a 30-line Python implementation you can actually run.

shell_tool.py
Python
import subprocess, os

# Resolved once at startup; every cwd the model supplies is checked against it.
PROJECT_ROOT = os.path.realpath(".")

def bash_tool(
    command: str,
    cwd: str = ".",
    timeout: int = 60,
) -> dict:
    """Minimal shell tool: runs a command, returns stdout/stderr/exit."""
    # Resolve cwd to an absolute path inside the project
    workdir = os.path.realpath(os.path.join(PROJECT_ROOT, cwd))
    # Compare with a trailing separator so /project2 can't pass as /project
    if workdir != PROJECT_ROOT and not workdir.startswith(PROJECT_ROOT + os.sep):
        return {"error": "cwd escapes project root"}

    try:
        p = subprocess.run(
            command,
            shell=True,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "stdout": p.stdout[:30_000],       # truncate for context budget
            "stderr": p.stderr[:10_000],
            "exit_code": p.returncode,
            "truncated": len(p.stdout) > 30_000,
        }
    except subprocess.TimeoutExpired:
        return {"error": f"timeout after {timeout}s"}

Run this in a REPL. Pass it ls -la and watch it work. That is, in its entirety, the core mechanism. Every permission system, streaming output, and sandbox feature we discuss next is built on top of exactly this primitive.

Try it right now. Paste the function above into a Python file, set PROJECT_ROOT = os.path.realpath("."), and call bash_tool("ls -la"). You just implemented the core of what Claude Code does every time it runs a command. No magic.
02

Permissions: The Part That Actually Matters

The minimal shell tool above is a loaded gun with no safety. Any command the model emits will run. That is fine for a toy but a disaster in production — an agent that hallucinates rm -rf / at the wrong moment is a serious problem.

Every production agentic IDE solves this the same way: a permission gate in front of the subprocess. The gate decides whether a command gets run automatically, prompts you for confirmation, or refuses outright.

× Naive Shell Tool

Run Whatever the Model Says

Every tool call hits subprocess.run immediately. No allowlist, no confirmation, no audit log. Works until the model hallucinates a destructive command. Then it doesn't.

✓ Production Shell Tool

Gated by Permission Mode

Commands classified into safe, confirmed, or denied. Safe commands auto-run. Confirmed commands prompt the user. Denied commands are refused outright. Every production agent works this way.

Claude Code ships with four permission modes out of the box — default, acceptEdits, plan, and bypassPermissions. Study them: they are the mental model for how every agentic IDE gates the shell.

Never run bypassPermissions against a real codebase. Use it inside a disposable Docker container or a worktree you are willing to delete. Agents are confident, fast, and occasionally wrong about which directory they are in.

Here is a minimal permission gate you can add to the shell tool from Section 1.

permission_gate.py
Python
# Naive substring/prefix matching — a real gate parses the command,
# since pipes, &&, and subshells can smuggle a denied command past these checks.
SAFE_PATTERNS = ["ls", "cat", "grep", "rg", "fd", "git status", "git diff"]
DENY_PATTERNS = ["rm -rf", "sudo", "curl | sh", "dd if="]

def gate(command: str) -> str:
    """Return 'auto', 'confirm', or 'deny' for a given command."""
    for pat in DENY_PATTERNS:
        if pat in command:
            return "deny"
    for pat in SAFE_PATTERNS:
        if command.strip().startswith(pat):
            return "auto"
    return "confirm"

def ask_user(prompt: str) -> bool:
    """Minimal terminal confirmation; an IDE would render a UI here."""
    return input(f"{prompt} [y/N] ").strip().lower() == "y"

def gated_bash(command, **kwargs):
    decision = gate(command)
    if decision == "deny":
        return {"error": f"denied: {command}"}
    if decision == "confirm":
        if not ask_user(f"Run: {command}?"):
            return {"error": "user declined"}
    return bash_tool(command, **kwargs)

This is a couple dozen lines, and it is already far safer than the bare tool from Section 1. Every production agentic IDE has something more elaborate than this, but the shape is the same: classify, gate, run.

03

Output Streaming and Context Budgets

The other production concern is output size. A long-running command can produce enormous output — a test suite with 5,000 tests, a git log on a 10-year-old repo, a failed deploy with a 40,000-line stack trace. If you pipe all of that into model context, you blow the context budget in one tool call and the agent loses track of what it was doing.

The solution is always the same. Truncate output to a bounded size, show the most useful part, and note that truncation happened so the model can decide to rerun with different flags.

truncate_output.py
Python
def tail(text: str, max_chars: int) -> str:
    """Keep the last max_chars characters. Errors usually live at the tail."""
    if len(text) <= max_chars:
        return text
    truncated = text[-max_chars:]
    return f"... [{len(text) - max_chars} chars truncated] ...\n{truncated}"

# Inside bash_tool, after subprocess.run returns p:
result = {
    "stdout": tail(p.stdout, 30_000),
    "stderr": tail(p.stderr, 10_000),
    "exit_code": p.returncode,
}

Tail-truncation is not glamorous but it is the single most common pattern in production agentic tools. Everyone does it the same way.
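A common refinement — a variant, not anything a specific IDE is documented to do — keeps both ends: the head often holds the command banner or first failure, the tail the final error and summary. The function name `head_and_tail` and the default sizes are illustrative.

```python
def head_and_tail(text: str, head: int = 2_000, tail: int = 8_000) -> str:
    """Keep the start and the end of the output, eliding the middle.
    Useful when the first failure and the final summary both matter."""
    if len(text) <= head + tail:
        return text  # small enough: pass through untouched
    omitted = len(text) - head - tail
    return f"{text[:head]}\n... [{omitted} chars truncated] ...\n{text[-tail:]}"
```

The marker line tells the model exactly how much it is not seeing, so it can decide to rerun with quieter flags or grep the output instead.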

04

How the Three IDEs Implement This

You now have all the pieces. Every agentic IDE wires these same primitives together slightly differently. Here is the quick tour of Claude Code, Cursor, and Antigravity so you can recognize the patterns when you see them.

Claude Code

CLI-first. Runs in your actual terminal. The Bash tool uses a subprocess exactly like the one above, with a permission system spanning default, acceptEdits, plan, and bypassPermissions modes plus per-command allow rules. Output is streamed with tail truncation. Settings and allowlists live in ~/.claude/settings.json (and per-project .claude/settings.json); per-project instructions live in CLAUDE.md.

Cursor

VS Code fork. The shell tool is surfaced as a "Run command" block inside chat with a confirmation UI. By default every command prompts for approval, and the user can toggle auto-run for individual commands. Output is rendered inline with a collapsible "show more" for long tails.

Google Antigravity

VS Code fork. Command execution goes through a staged queue — the model proposes commands that accumulate in a task panel, and you approve or edit before they run. This is the most conservative of the three, trading immediacy for a clearer audit trail of what the agent has staged.

The pattern is always: classify the command, gate it, run it, truncate the output, report the exit code. The UX layer changes. The architecture does not.
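That five-step pattern fits in one self-contained function. This sketch re-derives the pieces from Sections 1–3 under one roof; the confirmation step is stubbed out as auto-decline, and the names and size limits are illustrative.

```python
import subprocess

SAFE_PREFIXES = ["ls", "echo", "cat", "git status", "git diff"]
DENY_SUBSTRINGS = ["rm -rf", "sudo", "curl | sh"]

def run_agent_command(command: str, timeout: int = 30,
                      max_out: int = 2_000) -> dict:
    """classify -> gate -> run -> truncate -> report, in one place.
    Confirmation is stubbed to auto-decline; a real harness prompts the user."""
    # 1. classify + gate
    if any(pat in command for pat in DENY_SUBSTRINGS):
        return {"error": f"denied: {command}"}
    if not any(command.strip().startswith(p) for p in SAFE_PREFIXES):
        return {"error": "needs confirmation (stubbed as declined)"}
    # 2. run
    p = subprocess.run(command, shell=True, capture_output=True,
                       text=True, timeout=timeout)
    # 3. truncate + report exit code
    def tail(t: str) -> str:
        return t if len(t) <= max_out else t[-max_out:]
    return {"stdout": tail(p.stdout), "stderr": tail(p.stderr),
            "exit_code": p.returncode}
```

Swap the stubbed confirmation for a prompt and the inline lists for user-configurable settings, and you have the skeleton every UX layer above is wrapped around.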
05

Hands-On: Build It Yourself

Stop reading and build this. It takes 20 minutes and teaches you more than any further explanation.

  1. Create a new file my_shell_tool.py.
  2. Paste the bash_tool function from Section 1 and the gated_bash wrapper from Section 2 into it.
  3. Set PROJECT_ROOT = os.path.realpath(".") at the top of the file.
  4. At the bottom, add a simple test: print(gated_bash("ls -la")).
  5. Run it. Verify the output looks right.
  6. Now try gated_bash("git status"). Should auto-run.
  7. Try gated_bash("rm -rf /tmp/nothing"). Should be denied.
  8. Try gated_bash("npm install"). Should prompt (since npm isn't in the safe list).
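Steps 6–8 can also be checked automatically. The snippet below carries its own copy of the Section 2 gate so it runs standalone; no subprocess is spawned, only the classification is asserted.

```python
SAFE_PATTERNS = ["ls", "cat", "grep", "rg", "fd", "git status", "git diff"]
DENY_PATTERNS = ["rm -rf", "sudo", "curl | sh", "dd if="]

def gate(command: str) -> str:
    """Classify a command without running it (copy of Section 2)."""
    for pat in DENY_PATTERNS:
        if pat in command:
            return "deny"
    for pat in SAFE_PATTERNS:
        if command.strip().startswith(pat):
            return "auto"
    return "confirm"

# The three behaviors the exercise asks you to verify by hand:
assert gate("git status") == "auto"            # step 6: auto-runs
assert gate("rm -rf /tmp/nothing") == "deny"   # step 7: refused
assert gate("npm install") == "confirm"        # step 8: prompts
```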

This is the same architecture Claude Code uses, minus the polish. Everything else in this course — search, file tools, agent spawning — is built on top of the primitive you just implemented.

What's Next

You now understand how an agent reaches the terminal. In Day 2 we look at the next primitive stacked on top of it — efficient code search. You will learn why every agentic IDE replaced grep with ripgrep, and how to structure search output so an agent doesn't blow 40,000 tokens on noise.
