Ragas is the de facto open-source framework for evaluating retrieval-augmented generation (RAG) pipelines. It implements six canonical metrics — faithfulness, answer relevance, context precision, context recall, answer correctness, and answer semantic similarity — that measure different dimensions of RAG quality.
Ragas uses an LLM-as-judge approach: for each metric, it prompts a judge LLM (typically GPT-4 or Claude) to evaluate your generated answer against the question, retrieved contexts, and ground truth. Results are aggregated into a score per metric and per example. It integrates with LangChain, LlamaIndex, and plain Python pipelines.
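To make the evaluation loop concrete, here is a minimal sketch of a single evaluation record and, commented out, the call that would score it. The field names (question/answer/contexts/ground_truth) are an assumption based on common Ragas versions; check the docs for your installed release.

```python
# One evaluation record: the judge LLM sees all four fields.
# Field names are an assumption (typical ragas schema); verify against
# your installed version.
sample = {
    "question": "What is the capital of France?",
    "answer": "Paris is the capital of France.",
    "contexts": ["France's capital city is Paris."],  # retrieved chunks
    "ground_truth": "Paris",
}

# With ragas installed and a judge-model API key configured, a run looks
# roughly like this (not executed here, since it makes paid LLM calls):
# from datasets import Dataset
# from ragas import evaluate
# from ragas.metrics import faithfulness, answer_relevancy
# scores = evaluate(Dataset.from_list([sample]),
#                   metrics=[faithfulness, answer_relevancy])
print(sorted(sample))
```

Metrics that need no ground truth (e.g. faithfulness) can run on records without the `ground_truth` field.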
Ragas itself is free and open source (Apache 2.0). You pay for the judge LLM calls (typically $0.10-$1 per example, depending on the judge model).
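Since judge-LLM calls are the only cost, budgeting an eval run is simple arithmetic; a quick sketch using the per-example range quoted above (the mid-range figure chosen here is illustrative, not a measured number):

```python
# Rough judge-LLM cost estimate for an eval run.
# cost_per_example is an assumed mid-range value from the $0.10-$1 figure;
# actual cost depends on the judge model and prompt sizes.
n_examples = 200
cost_per_example = 0.25  # USD, illustrative
total = n_examples * cost_per_example
print(f"~${total:.2f} for {n_examples} examples")  # ~$50.00 for 200 examples
```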
Most teams building production RAG; it serves as the standard baseline for RAG evaluation in both research and industry.