The shell tool is the primitive everything else in an agentic IDE builds on. Today you learn exactly how Claude Code, Cursor, and Antigravity spawn a subprocess, capture output, gate permissions, and stream results back into model context without blowing your token budget.
By the end of this lesson you will understand the shell-tool architecture used by every major agentic coding IDE, build a minimal 30-line version of it yourself, and know which security gates are non-negotiable in a production implementation.
An agentic coding IDE cannot do anything useful without a terminal. Compile code, run tests, install dependencies, deploy a build, check git status — all of it routes through a shell subprocess. If you strip Claude Code, Cursor, or Google Antigravity down to the smallest piece of functionality they all share, the shell tool is it. Everything else — file reads, file edits, search, agent spawning — is a convenience built on top of this one primitive.
This lesson walks through the exact pattern these tools use, shows you a minimal Python implementation you can run yourself, and explains the permission gates that separate a toy agent from a production one. If you have ever wondered how an LLM manages to run pytest on your machine, this is where the magic lives. There is no magic — it is subprocess.run with some adult supervision.
Every agentic IDE exposes a shell tool to its underlying model. The tool definition sent to the model looks approximately like this:
{
  "name": "bash",
  "description": "Execute a bash command in the user's shell. Use for running tests, git commands, file ops, and anything else that needs a real shell.",
  "input_schema": {
    "type": "object",
    "properties": {
      "command": {"type": "string", "description": "The command to run"},
      "cwd": {"type": "string", "description": "Working directory, relative to project root"},
      "timeout": {"type": "number", "description": "Seconds before kill"}
    },
    "required": ["command"]
  }
}
When the model wants to run a command, it emits a tool call matching this schema. The IDE harness catches that tool call, runs the command through a real subprocess, captures the output, and sends the result back as the next turn in the conversation. That is the entire loop.
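The loop described above can be sketched as a small dispatch function. This is a minimal illustration, not any vendor's harness: the `tool_use` and `tool_result` dictionary shapes are assumptions modeled loosely on common tool-calling APIs, and `handle_tool_call` and its `run` parameter are hypothetical names. The executor is passed in as a plain callable so the sketch stays independent of how the subprocess is actually launched.

```python
from typing import Callable


def handle_tool_call(tool_call: dict, run: Callable[..., dict]) -> dict:
    """Turn one model-emitted tool call into a result message for the next turn.

    tool_call is assumed to look like:
        {"type": "tool_use", "id": "...", "name": "bash", "input": {...}}
    """
    if tool_call["name"] != "bash":
        # Unknown tools are reported back to the model instead of raising.
        return {
            "type": "tool_result",
            "tool_use_id": tool_call["id"],
            "content": f"unknown tool: {tool_call['name']}",
            "is_error": True,
        }
    # The schema's fields map directly onto keyword arguments.
    result = run(**tool_call["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_call["id"],  # lets the model pair result with request
        "content": str(result),
    }
```

The harness appends the returned dictionary to the conversation and asks the model for its next turn, which is all "sends the result back" amounts to.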
Here is a roughly 30-line Python implementation you can actually run.

import os
import subprocess

# PROJECT_ROOT must be set before the first call, e.g. os.path.realpath(".")

def bash_tool(
    command: str,
    cwd: str = ".",
    timeout: int = 60,
) -> dict:
    """Minimal shell tool: runs a command, returns stdout/stderr/exit."""
    # Resolve cwd to an absolute path inside the project
    workdir = os.path.realpath(os.path.join(PROJECT_ROOT, cwd))
    # Compare whole path components: a bare startswith() would let
    # "/repo-evil" pass as being inside "/repo".
    if os.path.commonpath([PROJECT_ROOT, workdir]) != PROJECT_ROOT:
        return {"error": "cwd escapes project root"}
    try:
        p = subprocess.run(
            command,
            shell=True,
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return {
            "stdout": p.stdout[:30_000],  # truncate for context budget
            "stderr": p.stderr[:10_000],
            "exit_code": p.returncode,
            "truncated": len(p.stdout) > 30_000 or len(p.stderr) > 10_000,
        }
    except subprocess.TimeoutExpired:
        return {"error": f"timeout after {timeout}s"}
Run this in a REPL. Pass it ls -la and watch it work. That is, in its entirety, the core mechanism. Every permission system, streaming output, and sandbox feature we discuss next is built on top of exactly this primitive.
Try it: set PROJECT_ROOT = os.path.realpath(".") and call bash_tool("ls -la"). You just implemented the core of what Claude Code does every time it runs a command. No magic.
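One gap in the timeout handling is worth a sketch. With shell=True, subprocess.run kills only the shell itself when the timeout fires, so anything the shell launched can keep running. A common remedy on POSIX systems is to start the command in its own process group and kill the whole group. The function name `run_with_group_timeout` is hypothetical, and this variant assumes Linux or macOS; it is not taken from any particular IDE.

```python
import os
import signal
import subprocess


def run_with_group_timeout(command: str, timeout: float) -> dict:
    """Run a shell command; on timeout, kill the shell and all its children."""
    proc = subprocess.Popen(
        command,
        shell=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # POSIX only: the shell leads a new process group
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The group id equals the shell's pid because it is the session leader.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the process and drain the pipes
        return {"error": f"timeout after {timeout}s"}
    return {"stdout": out, "stderr": err, "exit_code": proc.returncode}
```

For example, run_with_group_timeout("sleep 60 & sleep 60", 1) returns the timeout error after about a second and leaves no stray sleep processes behind.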
The minimal shell tool above is a loaded gun with no safety. Any command the model emits will run. That is fine for a toy but a disaster in production — an agent that hallucinates rm -rf / at the wrong moment is a serious problem.
Every production agentic IDE solves this the same way: a permission gate in front of the subprocess. The gate decides whether a command gets run automatically, prompts you for confirmation, or refuses outright.
Without a gate: every tool call hits subprocess.run immediately. There is no allowlist, no confirmation, and no audit log. It works until the model hallucinates a destructive command. Then it doesn't.
With a gate: commands are classified as safe, confirmed, or denied. Safe commands auto-run. Confirmed commands prompt the user. Denied commands are refused outright. Every production agent works this way.
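Of the three safeguards the ungated version lacks, the audit log is the simplest to sketch. The version below appends one JSON object per line to a file, which keeps the log greppable and safe to append to. The function name `audit`, the field names, and the JSON Lines layout are all assumptions chosen for illustration, not a format any of these IDEs is known to use.

```python
import json
import time
from typing import Optional


def audit(log_path: str, command: str, decision: str,
          exit_code: Optional[int] = None) -> dict:
    """Append one record per gated command to a JSON Lines file."""
    entry = {
        "ts": time.time(),       # seconds since the epoch
        "command": command,
        "decision": decision,    # e.g. "auto", "confirm", or "deny"
        "exit_code": exit_code,  # None when the command never ran
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Calling it once before the subprocess starts and once after it exits gives a record even when the command hangs or the harness crashes mid-run.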
Claude Code ships with four permission modes out of the box. Study them — they are the mental model for how every agentic IDE gates the shell.
Never run bypassPermissions against a real codebase. Use it inside a disposable Docker container or a worktree you are willing to delete. Agents are confident, fast, and occasionally wrong about which directory they are in.
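The disposable-container advice can be sketched as a helper that builds the docker command line instead of running the agent's command on the host. The helper name `docker_argv`, the `/work` mount point, and the `python:3.12-slim` image are assumptions for illustration; substitute whatever image your project needs. The flags used (`--rm`, `--network none`, `-v`, `-w`) are standard `docker run` options.

```python
import os


def docker_argv(command: str, project_root: str,
                image: str = "python:3.12-slim") -> list[str]:
    """Build an argv that runs `command` in a throwaway container."""
    root = os.path.realpath(project_root)
    return [
        "docker", "run",
        "--rm",                # delete the container when the command exits
        "--network", "none",   # no network access from inside
        "-v", f"{root}:/work", # bind-mount the project (still writable!)
        "-w", "/work",         # start in the project directory
        image,
        "sh", "-c", command,
    ]
```

Pass the result to subprocess.run without shell=True. Because the bind mount is writable, point project_root at a copy or a scratch worktree if the agent must not touch the original files.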
Here is a minimal permission gate you can add to the shell tool from Section 1.
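import re
import shlex

SAFE_PATTERNS = ["ls", "cat", "grep", "rg", "fd", "git status", "git diff"]
DENY_PATTERNS = ["rm -rf", "sudo", "curl | sh", "dd if="]
# Characters that chain, pipe, redirect, or substitute commands.
SHELL_META = re.compile(r"[;&|<>`$\n]")

def gate(command: str) -> str:
    """Return 'auto', 'confirm', or 'deny' for a given command."""
    for pat in DENY_PATTERNS:
        if pat in command:
            return "deny"
    # "ls; something-else" must not auto-run on the strength of its "ls" prefix.
    if SHELL_META.search(command):
        return "confirm"
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: let a human look
        return "confirm"
    for pat in SAFE_PATTERNS:
        words = pat.split()
        # Match whole words, so "lsof" is not mistaken for "ls".
        if tokens[:len(words)] == words:
            return "auto"
    return "confirm"

def ask_user(prompt: str) -> bool:
    """Console confirmation; an IDE would show a dialog instead."""
    return input(f"{prompt} [y/N] ").strip().lower() == "y"

def gated_bash(command, **kwargs):
    decision = gate(command)
    if decision == "deny":
        return {"error": f"denied: {command}"}
    if decision == "confirm":
        if not ask_user(f"Run: {command}?"):
            return {"error": "user declined"}
    return bash_tool(command, **kwargs)

This is under 40 lines. It is far more secure than the tool from Section 1, although pattern matching on command strings is still easy to fool, so treat it as a starting point. Every production agentic IDE has something more elaborate than this, but the shape is the same: classify, gate, run.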
The other production concern is output size. A long-running command can produce enormous output — a test suite with 5,000 tests, a git log on a 10-year-old repo, a failed deploy with a 40,000-line stack trace. If you pipe all of that into model context, you blow the context budget in one tool call and the agent loses track of what it was doing.
The solution is always the same. Truncate output to a bounded size, show the most useful part, and note that truncation happened so the model can decide to rerun with different flags.
def tail(text: str, max_chars: int) -> str:
    """Keep the last max_chars characters. Errors usually live at the tail."""
    if len(text) <= max_chars:
        return text
    truncated = text[-max_chars:]
    return f"... [{len(text) - max_chars} chars truncated] ...\n{truncated}"

# Inside bash_tool, where p is the result of subprocess.run:
result = {
    "stdout": tail(p.stdout, 30_000),
    "stderr": tail(p.stderr, 10_000),
    "exit_code": p.returncode,
}
Tail-truncation is not glamorous but it is the single most common pattern in production agentic tools. Everyone does it the same way.
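Tail truncation combines naturally with streaming: instead of capturing everything and slicing afterwards, the harness can read output line by line, forward each line to the UI, and keep only the last few lines for the model. The sketch below is one way to do that with a bounded deque. The names `run_streaming`, `on_line`, and `max_lines` are hypothetical, and merging stderr into stdout is a simplification made for brevity.

```python
import subprocess
from collections import deque
from typing import Callable, Optional


def run_streaming(command: str, max_lines: int = 200,
                  on_line: Optional[Callable[[str], None]] = None) -> dict:
    """Stream a command's output while retaining only its last max_lines lines."""
    kept = deque(maxlen=max_lines)  # older lines fall off the front automatically
    total = 0
    proc = subprocess.Popen(
        command,
        shell=True,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
    )
    for line in proc.stdout:
        total += 1
        kept.append(line)
        if on_line is not None:
            on_line(line)  # e.g. print, or push to the chat UI
    exit_code = proc.wait()
    return {
        "output": "".join(kept),
        "exit_code": exit_code,
        "dropped_lines": max(0, total - max_lines),
    }
```

Memory use stays bounded no matter how much the command prints, and dropped_lines gives the model the same "truncation happened" signal as the character-based version.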
You now have all the pieces. Every agentic IDE wires these same primitives together slightly differently. Here is the quick tour of Claude Code, Cursor, and Antigravity so you can recognize the patterns when you see them.
Claude Code: CLI-first. It runs in your actual terminal. The Bash tool uses a subprocess exactly like the one above, with a permission system that supports default, acceptEdits, bypassPermissions, and an allowlist mode. Output is streamed with tail truncation. Settings and allowlists live in ~/.claude/settings.json, with project-level overrides in .claude/settings.json; the per-project CLAUDE.md holds instructions for the agent rather than permissions.
Cursor: VS Code fork. The shell tool is surfaced as a "Run command" block inside chat with a confirmation UI. By default every command prompts for approval, and the user can toggle auto-run for individual commands. Output is rendered inline with a collapsible "show more" for long tails.
Antigravity: VS Code fork. Command execution goes through a staged queue: the model proposes commands that accumulate in a task panel, and you approve or edit them before they run. This is the most conservative of the three, trading immediacy for a clearer audit trail of what the agent has staged.
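The per-command auto-run toggle mentioned in the tour can be sketched as a small piece of session state: once a user answers "always" for a command, later identical requests skip the prompt. This is an illustration of the idea only, not how Cursor or any other IDE implements it; the class name `SessionAllowlist` and the three answer strings are assumptions.

```python
from typing import Callable


class SessionAllowlist:
    """Remember which exact commands the user has approved for this session."""

    def __init__(self, ask: Callable[[str], str]):
        # ask(command) must return "once", "always", or "no".
        self._ask = ask
        self._approved: set[str] = set()

    def permit(self, command: str) -> bool:
        key = command.strip()
        if key in self._approved:
            return True  # approved earlier with "always": no prompt
        answer = self._ask(key)
        if answer == "always":
            self._approved.add(key)
            return True
        return answer == "once"
```

Matching on the exact command string is deliberately strict: approving "npm test" says nothing about "npm test && rm -rf build".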
Stop reading and build this. It takes 20 minutes and teaches you more than any further explanation.
1. Create a file named my_shell_tool.py.
2. Paste the bash_tool function from Section 1 and the gated_bash wrapper from Section 2 into it.
3. Add PROJECT_ROOT = os.path.realpath(".") at the top of the file.
4. Run print(gated_bash("ls -la")).
5. Try gated_bash("git status"). Should auto-run.
6. Try gated_bash("rm -rf /tmp/nothing"). Should be denied.
7. Try gated_bash("npm install"). Should prompt (since npm isn't in the safe list).

This is the same architecture Claude Code uses, minus the polish. Everything else in this course (search, file tools, agent spawning) is built on top of the primitive you just implemented.
Before moving to Day 2, you should be able to answer these without looking:
... bash_tool does?

You now understand how an agent reaches the terminal. In Day 2 we look at the next primitive stacked on top of it: efficient code search. You will learn why every agentic IDE replaced grep with ripgrep, and how to structure search output so an agent doesn't blow 40,000 tokens on noise.