Promptfoo is a CLI-first tool for evaluating prompts across multiple models, running red-team tests, and comparing outputs side by side. Its declarative YAML config makes it easy to run the same prompts against OpenAI, Anthropic, Gemini, and local models, then view the results in a comparison table.
Define evals in a `promptfooconfig.yaml` file: list your providers (models), prompts, test cases, and assertions. Run `promptfoo eval` and it executes each prompt against each provider, runs the assertions on every output, and prints a comparison table. Promptfoo also includes built-in red-team capabilities (prompt injection, jailbreaking, bias testing, and adversarial examples) and a web UI for exploring results.
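As a rough illustration of the four parts described above, a minimal `promptfooconfig.yaml` might look like the sketch below. The prompt text, test variable, and assertion values are invented for illustration, and the model IDs are assumptions that may need updating to whatever your accounts currently offer. It also assumes the relevant API keys (for example `OPENAI_API_KEY` and `ANTHROPIC_API_KEY`) are set in the environment.

```yaml
# promptfooconfig.yaml: illustrative sketch, not an official template
description: Compare one summarization prompt across two providers

# Prompts use {{variable}} placeholders that are filled from each test's vars
prompts:
  - "Summarize the following text in one sentence: {{text}}"

# Providers are the models to compare (example IDs; adjust as needed)
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022

# Each test case supplies variables plus assertions checked against every output
tests:
  - vars:
      text: "Promptfoo runs the same prompt against several models and compares the results."
    assert:
      # Case-insensitive substring check
      - type: icontains
        value: "promptfoo"
      # Inline JavaScript expression evaluated against the model output
      - type: javascript
        value: output.length <= 300
```

With a file like this in the working directory, `promptfoo eval` would run the single prompt against both providers (two outputs for the one test case), and `promptfoo view` should open the web UI on the results.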
Promptfoo CLI: free and open source. Promptfoo Cloud (managed UI + team features): free tier, paid tiers for enterprise.
Promptfoo suits developers doing prompt engineering, security teams red-teaming LLM apps, and anyone comparing models. It is popular for A/B testing prompt variations.