Every agentic IDE ditched grep and find for ripgrep and fd. Today you learn why that matters for context budgets, how to structure search output so an agent never chokes, and when to reach for syntax-aware matching that regex can't do.
By the end of this lesson you'll know exactly which search tools agentic IDEs use and why, how to bound output size so agents don't burn 40,000 tokens on noise, and when ast-grep beats regex. You'll install ripgrep, fd, and ast-grep locally and run all three against a real codebase.
If Day 1 was about how agents reach the terminal, Day 2 is about what they do once they're there. The single most common tool call in any agentic IDE session is a search: find this function, locate these imports, show me every file that mentions that variable. Do this wrong and you blow the context budget on your first tool call. Do it right and the agent glides through a million-line codebase like it's fifty.
The difference between right and wrong is mostly which binary you call. Every major agentic IDE standardized on the same small set of modern tools. Once you see why, you'll never reach for grep or find in an agent tool again.
Here is the measurement everyone argues about until they run it themselves. On a 500k-line Python monorepo, the same searches were benchmarked with the old and the modern tooling. These are representative wall-clock numbers from a warm cache on a typical modern laptop:
| Task | Old Tool | Modern Tool | Gain |
|---|---|---|---|
| Content search | grep -r "pattern" | rg "pattern" | ~30× |
| File discovery | find . -name "*.py" | fd -e py | ~10× |
| Syntax match | grep "def foo" | ast-grep 'def $NAME' | precision |
| Git-aware | grep -r (no gitignore) | git grep | scoped |
Raw speed is only the surface of the story. The more important difference — for agents specifically — is what gets returned and how.
Installation is one line per platform. On macOS: `brew install ripgrep fd ast-grep`. On Linux: `cargo install ripgrep fd-find ast-grep`, or your distro's package manager. On Windows: `scoop install ripgrep fd ast-grep`, or grab the official GitHub releases.
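A quick sanity check that the install landed, sketched in Python; it only looks for the three binaries on your PATH:

```python
import shutil

# Report which of the three search tools are actually on PATH.
tools = {t: shutil.which(t) for t in ("rg", "fd", "ast-grep")}
for tool, path in tools.items():
    print(f"{tool}: {path or 'NOT FOUND - install it before continuing'}")
```

If any line says NOT FOUND, fix that before running the examples below.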
Ripgrep (rg) is the content-search tool that ate the world. Written in Rust, uses SIMD-accelerated scanning, respects .gitignore by default, and has built-in file type filters. Claude Code wraps it. Cursor uses it. Antigravity uses it. When you see an agentic IDE search your code, rg is what's actually running.
Here are the flags you will reach for every day.
- `rg "pattern"`: basic content search. Respects `.gitignore` automatically and is recursive by default; point it at any directory.
- `rg -l "pattern"`: list only the file names that contain the pattern. Ideal when the agent just needs to know which files, not which lines.
- `rg --type py "pattern"`: built-in file type filters. `py`, `js`, `tsx`, `go`, `rust`, `java`, and dozens more, without regex file globbing.
- `rg -C 3 "pattern"`: include 3 lines of context before and after each match. Context is the window that tells humans and models alike what a match means.
- `rg -n "foo|bar"`: regex alternation, with line numbers. The `-n` flag is critical because agents need to know where to edit.
- `rg -m 20 "pattern"`: cap the number of matches per file. Capping output size is absolutely essential for agent tools; never return unbounded output.
Here is a concrete example of how Claude Code's Grep tool calls rg under the hood. The shape is identical across agentic IDEs:
```python
import subprocess


def grep_tool(
    pattern: str,
    path: str = ".",
    file_type: str | None = None,
    context: int = 0,
    head_limit: int = 50,
    output_mode: str = "content",  # content | files | count
) -> str:
    """Wraps rg with flags an agent can actually use."""
    cmd = ["rg"]
    if output_mode == "files":
        cmd.append("-l")
    elif output_mode == "count":
        cmd.append("-c")
    else:
        cmd.append("-n")  # line numbers for content mode
    if file_type:
        cmd.extend(["--type", file_type])
    if context:
        cmd.extend(["-C", str(context)])
    cmd.extend([pattern, path])
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    lines = out.split("\n")[:head_limit]
    return "\n".join(lines)
```
That wrapper is 25 lines and it is more or less what sits inside every agentic IDE. The complexity is not in the search itself. It is in the choice of flags: head_limit to cap size, output_mode to minimize data when the agent only needs file names, file_type to avoid wasted matches in irrelevant languages.
fd is to find what rg is to grep. Rewritten in Rust, defaults that make sense, respects .gitignore, and a syntax that humans can actually remember.
```bash
# All Python files under src/
fd -e py . src/

# All files modified in the last day
fd --changed-within 1d

# All files matching a name pattern
fd "config" --type f

# All directories (not files)
fd --type d

# Exclude a path
fd -e ts --exclude node_modules

# Execute a command on each result
fd -e py --exec wc -l
```
In agent context, fd is usually the first tool an agent calls when it is orienting itself in a new codebase. "What Python files exist?" is the most common question an agent asks before it touches any code, and fd is the fastest way to answer it with structured, git-aware, bounded output.
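For intuition, here is a rough pure-Python approximation of what `fd -e py` does: walk the tree, prune ignored directories, and return a bounded list. This is only a sketch; fd itself is far faster and honors `.gitignore` properly, while this version hard-codes a few common ignores:

```python
import os

# Hypothetical stand-in for .gitignore handling.
IGNORED = {".git", "node_modules", "__pycache__", ".venv"}

def find_files(root: str, ext: str, limit: int = 50) -> list[str]:
    results: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED]
        for name in sorted(filenames):
            if name.endswith("." + ext):
                results.append(os.path.join(dirpath, name))
                if len(results) >= limit:  # bounded output, always
                    return results
    return results
```

Note the `limit` parameter: even a file listing gets capped before it reaches the model.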
Here is the tool almost nobody knows about and everyone should. ast-grep searches code by its parsed syntax tree, not its raw text. That means you can write patterns that match "any async function" or "any class method that returns a specific type" without writing unreadable regex.
Consider the problem. You want to find every async function in a Python codebase. With regex:
```bash
# Regex: brittle, whitespace-sensitive, matches strings by accident
rg "async def \w+\(" --type py
```
That looks fine until you hit the edge cases: it breaks if the whitespace between `async` and `def` is anything but a single space, it misses signatures formatted in unusual ways, and it happily matches a docstring that merely contains the characters `async def foo(`. Regex doesn't know what code is; it just sees characters.
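You can see the false positive in a few lines of Python. The string below is a hypothetical snippet that contains no code at all, only a docstring:

```python
import re

code = '"""Utility notes. Example signature: async def foo(url). Not real code."""'
# The regex happily "finds" an async function inside a plain string.
match = re.search(r"async def \w+\(", code)
print(match.group(0) if match else "no match")  # async def foo(
```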
ast-grep knows what code is because it actually parses it.
```bash
# Match every async function by structure
ast-grep --pattern 'async def $NAME($$$)' --lang python

# Match every class method that returns a dict
ast-grep --pattern 'def $NAME($$$) -> dict: $$$' --lang python

# Rewrite: replace print statements with logger calls
ast-grep --pattern 'print($ARG)' --rewrite 'logger.info($ARG)' --lang python

# TypeScript: find every useState that starts with a string
ast-grep --pattern 'useState("$STR")' --lang typescript
```
The $NAME, $$$, and $ARG are pattern variables that match any valid syntax tree node at that position. This is what an agent needs when it is doing a refactor. "Find every print call and rewrite it as a logger call" is a one-liner with ast-grep and a nightmare with regex.
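The same idea can be shown with Python's standard-library `ast` module, which is what syntax-aware matching looks like when you write the visitor by hand (ast-grep generalizes this across languages with patterns instead of code):

```python
import ast

# Hypothetical source: one top-level async function, one sync
# function, and one async method inside a class.
source = """
async def fetch(url): ...

def sync_helper(): ...

class Client:
    async def get(self, path): ...
"""

tree = ast.parse(source)
# Walk the parsed tree and keep only async function definitions.
# Decorators, indentation, and docstring contents cannot fool this.
names = [node.name for node in ast.walk(tree)
         if isinstance(node, ast.AsyncFunctionDef)]
print(names)  # ['fetch', 'get']
```

The parser sees structure, not characters, so `sync_helper` is excluded and the method inside the class is found without any regex gymnastics.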
Use rg for finding text: names, strings, comments. Use fd for finding files. Use ast-grep for refactoring code by shape. Together, the three cover 95% of the searches any agentic IDE needs to do.
Here is the subtle thing everyone gets wrong on their first agent. Modern language models have huge context windows — a million tokens in some cases — but every token of search output costs money, latency, and mental attention from the model. Returning 5,000 lines of grep output is not free even when it fits.
Every agentic IDE addresses this with the same pattern: bounded output, with an option to zoom in.
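A minimal sketch of that pattern (the helper name is hypothetical): cap the output, and when you truncate, say how much was cut so the agent knows to narrow its query and zoom in:

```python
def bound_output(lines: list[str], head_limit: int = 50) -> str:
    # Never return unbounded search output to a model. If we truncate,
    # report what was omitted so the agent can refine its pattern.
    if len(lines) <= head_limit:
        return "\n".join(lines)
    omitted = len(lines) - head_limit
    return "\n".join(lines[:head_limit]) + (
        f"\n... [{omitted} more matches omitted; narrow the pattern to see them]"
    )

print(bound_output([f"src/app.py:{i}: match" for i in range(200)], head_limit=3))
```

The truncation marker is the important part: it turns a silent data loss into a signal the model can act on.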
Default to returning just the matching lines, and add `-C 3` context only when the agent explicitly asks for it.

Stop reading. Build this. Takes about 25 minutes.
1. Install the tools: `brew install ripgrep fd ast-grep` (or the equivalent for your OS).
2. Run `rg --type py "def " -l` on a real project. Notice how fast it is and how it respects `.gitignore`.
3. Run `fd -e py --changed-within 7d`. See what you changed this week.
4. Run `ast-grep --pattern 'def $NAME($$$) -> dict'` against the same project. Every function that returns a dict, with zero regex.
5. Copy the `grep_tool` wrapper from Section 2 into a file. Call it with different flags and see how the output changes.
6. Try `head_limit=10` on a broad pattern. Notice how the tool handles truncation.

Once you have run these on your own code, the speed difference becomes visceral. You will not go back.
Before moving to Day 3, make sure you can answer these:
- Why does every search tool need a `head_limit`, and what goes wrong without it?

You can now reach the terminal and search it efficiently. Day 3 shows you how to do the most-requested thing in agentic IDEs: run multiple agents in parallel on the same repository without them colliding. The answer is git worktrees, and you'll learn it in fifteen minutes.