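A synchronous AI endpoint holds a worker thread for the entire time it waits for Claude to respond. An async endpoint gives control back to the event loop during that wait, so a single process can serve many requests concurrently. Streaming lets users see output as it is generated instead of waiting for the full response.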
This section builds two versions of the same AI endpoint: a standard async endpoint, and a streaming endpoint that sends text to the client as it arrives, which is the same pattern chat interfaces such as ChatGPT use.
from fastapi import FastAPI
import anthropic

app = FastAPI()

# Sync: blocks a worker thread while waiting for Claude
@app.post("/sync-summarize")
def sync_summarize(text: str):
    client = anthropic.Anthropic()
    ...  # call Claude synchronously

# Async: frees the event loop while waiting for Claude
@app.post("/async-summarize")
async def async_summarize(text: str):
    client = anthropic.AsyncAnthropic()
    ...  # await Claude asynchronously
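To make the difference concrete, here is a minimal sketch that swaps the Claude call for a simulated delay. The function fake_model_call and its 0.05-second latency are stand-ins invented for illustration, not part of the Anthropic SDK. The point is only that awaiting several slow calls together takes roughly as long as the slowest one, while running them back to back takes roughly the sum.

```python
import asyncio
import time


async def fake_model_call(prompt: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an awaited Claude call: waits, then echoes."""
    await asyncio.sleep(delay)  # simulated network wait; the event loop is free meanwhile
    return f"summary of {prompt!r}"


async def run_sequential(prompts: list[str]) -> tuple[list[str], float]:
    """Await each call before starting the next one."""
    start = time.perf_counter()
    results = [await fake_model_call(p) for p in prompts]
    return results, time.perf_counter() - start


async def run_concurrent(prompts: list[str]) -> tuple[list[str], float]:
    """Start every call at once and wait for all of them together."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_model_call(p) for p in prompts))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    prompts = ["a", "b", "c", "d"]
    _, sequential_s = asyncio.run(run_sequential(prompts))
    _, concurrent_s = asyncio.run(run_concurrent(prompts))
    print(f"sequential: {sequential_s:.2f}s  concurrent: {concurrent_s:.2f}s")
```

Both functions return the same results in the same order; only the elapsed time differs. An async FastAPI endpoint benefits in the same way, because the event loop can start handling other requests while one request is waiting on the model.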
The Anthropic SDK has an AsyncAnthropic client for async code. The pattern is nearly identical to the sync version — just add async/await:
from fastapi import FastAPI
from pydantic import BaseModel
import anthropic

app = FastAPI()
# Create the client once at startup so every request reuses it
client = anthropic.AsyncAnthropic()

class SummarizeReq(BaseModel):
    text: str

@app.post("/summarize")
async def summarize(req: SummarizeReq):
    # Ask for a summary explicitly; sending the raw text alone would not request one
    prompt = f"Summarize the following text:\n\n{req.text}"
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return {"summary": msg.content[0].text}
Streaming sends tokens to the client as they're generated instead of waiting for the full response. This is why chat interfaces such as ChatGPT feel fast: text starts appearing immediately while the model is still generating the rest.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.Anthropic()

def generate_stream(text: str):
    # Yield each piece of text as soon as Claude produces it
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}],
    ) as stream:
        for text_chunk in stream.text_stream:
            yield text_chunk

@app.post("/stream")
def stream_response(text: str):
    return StreamingResponse(
        generate_stream(text),
        media_type="text/plain",
    )
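Test the stream endpoint with curl. Because text is declared as a plain str parameter, FastAPI reads it from the query string rather than the request body, so pass it in the URL. The -N flag turns off curl's output buffering so the text prints as it arrives:

curl -N -X POST "http://localhost:8000/stream?text=Explain%20quantum%20computing"

You'll see the response arrive in small chunks rather than all at once.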
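To see why streaming feels faster even though the total generation time is unchanged, the sketch below simulates a stream and measures two things: how long the first chunk takes and how long the whole response takes. The generator fake_token_stream and its 0.02-second per-word delay are assumptions made up for this illustration; a real model stream will have different chunk sizes and timing.

```python
import asyncio
import time
from collections.abc import AsyncIterator
from typing import Optional


async def fake_token_stream(text: str, delay: float = 0.02) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model stream: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(delay)  # simulated per-chunk generation time
        yield word + " "


async def measure(text: str) -> tuple[str, Optional[float], float]:
    """Return the reassembled text, the time to first chunk, and the total time."""
    start = time.perf_counter()
    first_chunk_s: Optional[float] = None
    chunks: list[str] = []
    async for chunk in fake_token_stream(text):
        if first_chunk_s is None:
            # This is the moment a streaming client would first show output
            first_chunk_s = time.perf_counter() - start
        chunks.append(chunk)
    total_s = time.perf_counter() - start
    return "".join(chunks).strip(), first_chunk_s, total_s


if __name__ == "__main__":
    text, first_s, total_s = asyncio.run(measure("streaming shows text early"))
    print(f"first chunk after {first_s:.2f}s, full response after {total_s:.2f}s")
```

A non-streaming endpoint makes the user wait for the total time before anything appears; a streaming endpoint shows output after the time to first chunk, and the reassembled text is the same either way.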
Before moving on, confirm understanding of these key concepts: