A multi-turn chatbot that maintains conversation history — ask it something, follow up with "what did you just say?" and it knows. A terminal-based chat loop with full context management.
Why LLMs Are Stateless by Default
Every API call to an LLM is independent. The model doesn't remember your last message. To build a chatbot, you have to manually include conversation history in every request. LangChain's memory classes automate this.
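The bookkeeping this implies can be sketched in plain Python, with no API calls. Here `build_payload` and the message dicts are illustrative stand-ins, not LangChain APIs: the point is that the client, not the server, carries the state forward.

```python
# Hypothetical sketch: each request must carry the FULL history,
# because the server keeps no state between calls.
def build_payload(history, user_input):
    return {"messages": history + [{"role": "user", "content": user_input}]}

history = [{"role": "system", "content": "Be concise."}]

# Turn 1: send system + first user message
payload1 = build_payload(history, "What is LangChain?")

# After receiving a reply, the client must append BOTH sides itself:
history = payload1["messages"] + [
    {"role": "assistant", "content": "A framework for building LLM apps."}
]

# Turn 2: the follow-up only works because history is resent in full
payload2 = build_payload(history, "What did you just say?")
```

Forget to append either message and the model "forgets" too; every memory feature in LangChain is automation of exactly this append-and-resend loop.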
In modern LangChain, the cleanest starting point is to manage history yourself as a plain list of messages and pass it to the model on every call:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage

model = ChatOpenAI(model="gpt-4o-mini")

# Manually manage history
history = [
    SystemMessage(content="You are a helpful AI assistant. Be concise."),
]

def chat(user_input: str) -> str:
    history.append(HumanMessage(content=user_input))
    response = model.invoke(history)
    history.append(AIMessage(content=response.content))
    return response.content
# Chat loop
print("Chat started. Type 'quit' to exit.\n")
while True:
    user = input("You: ")
    if user.lower() == "quit":
        break
    print(f"AI: {chat(user)}\n")
Run it and test: ask "What is LangChain?" then follow up with "What framework did you just mention?" — it remembers because the entire history is included in each API call.
LangChain's Built-in Memory Wrapper
For production apps, LangChain provides RunnableWithMessageHistory — it automatically manages history storage per session:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

model = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),  # injects history here
    ("human", "{input}"),
])

chain = prompt | model

# Store sessions in memory (use Redis/DB in production)
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
# Each session_id maintains its own history
config = {"configurable": {"session_id": "user_123"}}
r1 = chain_with_history.invoke({"input": "My name is Bo."}, config=config)
r2 = chain_with_history.invoke({"input": "What's my name?"}, config=config)
print(r2.content) # "Your name is Bo."
Session IDs let you run multiple conversations independently. In a web app, use the user's ID or session token. In a script, any unique string works.
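The isolation guarantee can be sketched with a plain dict store, mirroring the `get_session_history` pattern above (illustrative only; lists stand in for `ChatMessageHistory` objects and no model is called):

```python
# Hypothetical sketch: each session_id maps to its own, separate history.
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = []  # a real app would use ChatMessageHistory
    return store[session_id]

# Two users chat concurrently; neither sees the other's messages.
get_session_history("user_123").append("My name is Bo.")
get_session_history("user_456").append("My name is Ada.")
```

Because the store is keyed by session ID, "What's my name?" from `user_456` can never be answered with `user_123`'s name.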
Managing Context Window Limits
Long conversations eat your context window and cost money. You need to trim history. The simplest approach: keep only the last N messages.
from langchain_core.messages import trim_messages
from langchain_core.runnables import RunnablePassthrough

# Keep only the most recent messages that fit within ~2000 tokens
trimmer = trim_messages(
    max_tokens=2000,
    strategy="last",      # keep the end of the conversation
    token_counter=model,  # count tokens with the model's tokenizer
    include_system=True,  # always keep the system message
    allow_partial=False,  # never cut a message in half
)

# Insert the trimmer into the chain: trim history before it reaches the prompt
chain_with_trim = (
    RunnablePassthrough.assign(history=lambda x: trimmer.invoke(x["history"]))
    | prompt
    | model
)
Day 2 Complete — What You Learned
- Why LLMs are stateless and how to add memory manually
- Built a multi-turn chat loop with manual history
- Used RunnableWithMessageHistory for session-based memory
- Managed context limits with trim_messages
Tomorrow: RAG — query your documents
Day 3 covers the most in-demand LangChain skill: building retrieval-augmented generation pipelines that answer questions from your own documents.
Day 3: RAG Pipeline