What It Is

Promptfoo is a CLI-first tool for evaluating prompts across multiple models, running red-team tests, and comparing outputs side by side. Its declarative YAML config lets you run the same prompts against OpenAI, Anthropic, Gemini, and local models and view the results in a single comparison table.

How It Works

Define evals in a `promptfooconfig.yaml` file that lists your providers (models), prompts, test cases, and assertions. Running `promptfoo eval` executes each prompt against each provider, checks the assertions, and outputs a comparison table. Promptfoo also includes built-in red-team capabilities covering prompt injection, jailbreaking, bias testing, and adversarial examples, as well as a web UI for exploring results.
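
To make the evaluation loop concrete, here is a minimal Python sketch of the idea: every prompt is rendered for every test case, sent to every provider, and checked against the test's assertions. This is an illustration of the concept only, not Promptfoo's implementation or API. The provider functions, the `{{name}}` placeholder handling, the `contains` check, and all names below are hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real model providers: each maps a prompt to an output.
def echo_provider(prompt: str) -> str:
    return prompt

def upper_provider(prompt: str) -> str:
    return prompt.upper()

PROVIDERS: Dict[str, Callable[[str], str]] = {
    "echo-model": echo_provider,
    "upper-model": upper_provider,
}

# Prompt templates with a {{name}} placeholder (assumed syntax for this sketch).
PROMPTS: List[str] = [
    "Say hello to {{name}}",
    "Write a greeting for {{name}}",
]

# Each test case supplies variables plus assertions on the output.
TESTS: List[dict] = [
    {"vars": {"name": "Ada"}, "assert": [{"type": "contains", "value": "Ada"}]},
]

def render(template: str, variables: Dict[str, str]) -> str:
    """Substitute {{key}} placeholders with their values."""
    out = template
    for key, value in variables.items():
        out = out.replace("{{" + key + "}}", value)
    return out

def check_assertion(assertion: dict, output: str) -> bool:
    """Evaluate one assertion; only a case-sensitive 'contains' is sketched here."""
    if assertion["type"] == "contains":
        return assertion["value"] in output
    raise ValueError(f"unsupported assertion type: {assertion['type']}")

def run_eval(providers, prompts, tests) -> List[dict]:
    """Run every prompt x test x provider combination and record pass/fail."""
    rows = []
    for template in prompts:
        for test in tests:
            prompt = render(template, test["vars"])
            for provider_name, call in providers.items():
                output = call(prompt)
                passed = all(check_assertion(a, output) for a in test["assert"])
                rows.append({
                    "provider": provider_name,
                    "prompt": prompt,
                    "output": output,
                    "passed": passed,
                })
    return rows

if __name__ == "__main__":
    # Print a crude comparison table, one row per combination.
    for row in run_eval(PROVIDERS, PROMPTS, TESTS):
        status = "PASS" if row["passed"] else "FAIL"
        print(f"{row['provider']:<12} {status}  {row['output']}")
```

The number of result rows is the product of prompts, test cases, and providers, which is why a comparison table is a natural way to present the outcome.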