A document Q&A system: load a PDF or text file, index it into a vector store, and ask natural-language questions that are answered from the document's content. Retrieval-augmented generation (RAG) is one of the most widely deployed LangChain patterns in production.
Install RAG dependencies
pip install langchain langchain-openai langchain-community
pip install chromadb pypdf tiktoken
The RAG Pipeline — 4 Steps
1. Load documents from files, URLs, or databases.
2. Split them into small chunks (LLMs have context limits).
3. Embed the chunks into vectors and store them in a vector database.
4. At query time, retrieve the most relevant chunks and inject them into the prompt.
Why chunks? A 200-page PDF won't fit in a single context window. Splitting it into 500-character chunks (the unit RecursiveCharacterTextSplitter actually measures) means you can retrieve just the 3-5 most relevant sections for any given question.
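To make the splitting step concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. The `chunk_text` helper is hypothetical, written for illustration only; LangChain's RecursiveCharacterTextSplitter is smarter, preferring to break on paragraph and sentence boundaries before falling back to raw character cuts.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; each chunk repeats the
    last `overlap` characters of the previous one."""
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size to create overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

doc = "word " * 400  # ~2000 characters of dummy text
chunks = chunk_text(doc)
print(len(chunks), len(chunks[0]))  # → 5 500
```

Note how the tail of each chunk reappears at the head of the next: `chunks[0][-50:] == chunks[1][:50]`.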
Complete RAG Pipeline
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# 1. Load a document
loader = TextLoader("my_document.txt")
docs = loader.load()
# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # characters per chunk
    chunk_overlap=50,  # characters shared between adjacent chunks
)
chunks = splitter.split_documents(docs)
print(f"Created {len(chunks)} chunks")
# 3. Embed and store in Chroma vector DB
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
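Under the hood, `as_retriever(search_kwargs={"k": 4})` performs a nearest-neighbor search over embedding vectors. Here is a toy version of that search with made-up 3-dimensional "embeddings" in place of OpenAI's high-dimensional ones; the `store` contents and vectors are invented for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector store": (chunk text, embedding) pairs with made-up vectors.
store = [
    ("Chunk about pricing",  [0.9, 0.1, 0.0]),
    ("Chunk about refunds",  [0.8, 0.3, 0.1]),
    ("Chunk about shipping", [0.1, 0.9, 0.2]),
]

def retrieve(query_vec, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(retrieve([1.0, 0.0, 0.0]))  # → ['Chunk about pricing', 'Chunk about refunds']
```

A real retriever embeds the query string with the same embedding model used at index time, then runs this same top-k similarity ranking (Chroma also supports approximate indexes so it scales past brute force).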
# 4. Build the RAG chain
model = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_template("""
Answer the question using only the provided context.
If the answer isn't in the context, say "I don't have that information."
Context:
{context}
Question: {question}
""")
def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
# Ask questions
answer = rag_chain.invoke("What are the main topics in this document?")
print(answer)
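The dict-plus-pipe syntax is LCEL (LangChain Expression Language). Conceptually, and this is a simplification rather than LangChain's actual internals, the chain behaves like ordinary function composition: the dict runs each of its values on the same input, and `|` feeds one step's output into the next. The `fake_*` functions below are stand-ins invented for illustration:

```python
def fake_retriever(question):
    # Stand-in for the vector search: returns the top chunks for the question.
    return ["chunk 1 text", "chunk 2 text"]

def fake_format_docs(docs):
    return "\n\n".join(docs)

def fake_prompt(inputs):
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}"

def rag_chain(question):
    # {"context": retriever | format_docs, "question": RunnablePassthrough()}
    inputs = {
        "context": fake_format_docs(fake_retriever(question)),
        "question": question,  # RunnablePassthrough: the input, unchanged
    }
    # A real chain would pipe this prompt into the model, then the output parser.
    return fake_prompt(inputs)

print(rag_chain("What is this about?"))
```

The key insight: the question flows down two paths at once, one path fetches context, the other passes the question through untouched, and both land in the prompt template's placeholders.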
chunk_overlap=50 reduces the chance that a sentence is lost at a chunk boundary: the last 50 characters of each chunk are repeated at the start of the next, so text that straddles a boundary appears intact in at least one chunk. (Note that RecursiveCharacterTextSplitter measures chunk_size and chunk_overlap in characters, not tokens.)
Day 3 Complete — What You Learned
- How RAG works: load → split → embed → retrieve
- Used TextLoader and RecursiveCharacterTextSplitter
- Embedded chunks into Chroma vector store
- Built a complete retrieval + generation chain with LCEL
Tomorrow: agents and tools
Day 4 shows how to build AI agents that reason, decide which tools to use, and take actions — not just generate text.
Day 4: Agents and Tools