Waiting 5 seconds for a full response feels bad. Watching text appear word by word feels fast and alive. Streaming doesn't make the model generate any faster, but it cuts *perceived* latency to a fraction: the first tokens arrive almost immediately. Today you'll implement Claude's streaming API in your React app.
Instead of waiting for the complete response, the server streams tokens as they're generated. The browser receives a stream of server-sent events (SSE) and updates the UI incrementally.
The flow: user sends message → server starts streaming Claude's response → browser receives chunks → React appends each chunk to the message in state → full response appears token by token.
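On the wire, each SSE event is a `data:` line terminated by a blank line. For this endpoint the browser receives something like this (payloads illustrative):

```text
data: {"text":"Hello"}

data: {"text":" there!"}

data: [DONE]

```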
```js
app.post('/api/chat/stream', async (req, res) => {
  const { messages } = req.body;

  // Set SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  try {
    const stream = client.messages.stream({
      model: 'claude-3-haiku-20240307',
      max_tokens: 1024,
      messages: messages,
    });

    // Stop generating if the browser disconnects mid-stream
    req.on('close', () => stream.abort());

    // Send each text delta as an SSE event
    stream.on('text', (text) => {
      res.write(`data: ${JSON.stringify({ text })}\n\n`);
    });

    stream.on('end', () => {
      res.write('data: [DONE]\n\n');
      res.end();
    });

    stream.on('error', (err) => {
      res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
      res.end();
    });
  } catch (err) {
    res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    res.end();
  }
});
```
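The `data: …\n\n` framing is easy to get wrong — the trailing blank line is what terminates an event. If you write it in more than one place, a tiny helper keeps the framing consistent (`sseEvent` is a made-up name for this sketch, not part of the SDK or Express):

```javascript
// Format a payload as a single SSE event: "data: <body>\n\n".
// Objects are JSON-encoded; strings (like the "[DONE]" sentinel)
// pass through unchanged. The blank line ends the event.
function sseEvent(payload) {
  const body = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${body}\n\n`;
}

console.log(JSON.stringify(sseEvent({ text: 'Hi' })));
console.log(JSON.stringify(sseEvent('[DONE]')));
```

Then the handlers above become `res.write(sseEvent({ text }))` and `res.write(sseEvent('[DONE]'))`.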
```js
async function sendMessageStream(userText) {
  const userMsg = { id: Date.now(), role: 'user', content: userText };
  setMessages(prev => [...prev, userMsg]);
  setIsLoading(true);

  // Add a placeholder for the streaming response
  const aiId = Date.now() + 1;
  setMessages(prev => [...prev, { id: aiId, role: 'assistant', content: '' }]);

  try {
    const response = await fetch('/api/chat/stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages: [...messages, userMsg] })
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();
    let buffer = ''; // a read can end mid-line, so carry the remainder over

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += decoder.decode(value, { stream: true });

      // Process only complete lines; keep the partial last line for next read
      const lines = buffer.split('\n');
      buffer = lines.pop();

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;
        const data = line.slice(6); // remove "data: "
        if (data === '[DONE]') return; // finally still runs
        try {
          const { text } = JSON.parse(data);
          if (text) {
            // Append the chunk to the streaming assistant message
            setMessages(prev =>
              prev.map(msg =>
                msg.id === aiId
                  ? { ...msg, content: msg.content + text }
                  : msg
              )
            );
          }
        } catch {
          // ignore malformed lines
        }
      }
    }
  } finally {
    setIsLoading(false);
  }
}
```
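The parsing loop is the fiddliest part — network reads don't respect event boundaries, so a chunk can end in the middle of a `data:` line. Pulling the logic into a pure function makes it easy to unit-test against split chunks. A sketch (the name `extractDeltas` is made up for this example):

```javascript
// Parse newly received SSE text against a running buffer.
// Returns the text deltas found, whether [DONE] was seen, and the
// leftover partial line to carry into the next call.
function extractDeltas(buffer, chunk) {
  const lines = (buffer + chunk).split('\n');
  const rest = lines.pop(); // possibly incomplete last line
  const deltas = [];
  let done = false;
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') { done = true; break; }
    try {
      const { text } = JSON.parse(data);
      if (text) deltas.push(text);
    } catch { /* ignore malformed lines */ }
  }
  return { deltas, done, rest };
}
```

Feeding it a chunk that ends mid-event returns the partial line as `rest`; passing that `rest` as the buffer on the next call recovers the split delta intact.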
```jsx
import { useRef, useEffect } from 'react';

function ChatMessages({ messages, isLoading }) {
  const bottomRef = useRef(null);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]); // runs whenever messages update

  return (
    <div className="chat-messages">
      {messages.map(msg => (
        <MessageCard key={msg.id} role={msg.role} content={msg.content} />
      ))}
      {isLoading && <TypingIndicator />} {/* your loading indicator component */}
      {/* invisible element at the bottom */}
      <div ref={bottomRef} />
    </div>
  );
}
```
Claude's responses often include markdown — bold, code blocks, lists. Render it properly:
npm install react-markdown
```jsx
import ReactMarkdown from 'react-markdown';

function MessageCard({ role, content }) {
  return (
    <div className={`message ${role}`}>
      {role === 'assistant'
        ? <ReactMarkdown>{content}</ReactMarkdown>
        : content}
    </div>
  );
}
```
Your tasks:

1. Add the `/api/chat/stream` endpoint to your Express server.
2. Update `App.jsx` to call the streaming endpoint instead of the regular one.
3. Add auto-scroll with `useRef` and `useEffect`.
4. Install `react-markdown` and render Claude's responses as markdown.

Check your work:

- The server sets `Content-Type: text/event-stream`.
- Each incoming chunk is appended to the assistant message via `setMessages` with a `map`.
- `useRef` + `scrollIntoView` auto-scrolls to the latest message.
- `react-markdown` renders Claude's markdown formatting correctly.

Bonus: Add a "Stop Generation" button that appears during streaming. When clicked, cancel the fetch request using `AbortController` and leave the partial response in the chat.