API Development for AI · Day 3 of 5 ~45 minutes

Day 3: Async Endpoints and Streaming AI Responses

Synchronous AI endpoints block while waiting for Claude to respond. Async endpoints handle many requests concurrently. Streaming lets users see output as it generates instead of waiting for the full response.

What You'll Build

Two versions of the same AI endpoint: a standard async endpoint and a streaming endpoint that sends tokens to the client as they arrive — exactly how ChatGPT works.

Section 1 · 10 min

Sync vs Async in FastAPI

When a synchronous endpoint calls Claude's API, the server thread waits — doing nothing — until Claude responds. With 100 concurrent users, that means 100 threads sitting idle waiting on Claude. It doesn't scale.

Async endpoints use Python's async/await to free the thread while waiting. The server handles other requests while Claude processes yours, then resumes your request when the response arrives.

async_example.py
from fastapi import FastAPI
import anthropic

app = FastAPI()

# Sync: blocks a thread while waiting for Claude
@app.post("/sync-summarize")
def sync_summarize(text: str):
    client = anthropic.Anthropic()
    # ... call Claude synchronously

# Async: frees the thread while waiting for Claude
@app.post("/async-summarize")
async def async_summarize(text: str):
    client = anthropic.AsyncAnthropic()
    # ... await Claude asynchronously
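To see why freeing the thread matters, here's a minimal simulation with no API calls at all: asyncio.sleep stands in for a slow Claude request (the function name and timings are made up for illustration). Five concurrent "requests" finish in roughly the time of one, because each waits while the others run.

```python
import asyncio
import time

async def fake_claude_call(prompt: str) -> str:
    # Stand-in for a slow API call: it waits instead of computing.
    await asyncio.sleep(0.2)
    return f"summary of: {prompt}"

async def main() -> None:
    start = time.perf_counter()
    # Five concurrent "requests": while one waits, the others run.
    results = await asyncio.gather(
        *(fake_claude_call(f"doc {i}") for i in range(5))
    )
    elapsed = time.perf_counter() - start
    print(f"{len(results)} results in {elapsed:.2f}s")  # ~0.2s, not ~1.0s

asyncio.run(main())
```

Run sequentially, the same five calls would take about a second; gathered concurrently, they take about as long as the slowest single call. That's the property async endpoints exploit.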
Section 2 · 15 min

Build an Async AI Endpoint

The Anthropic SDK has an AsyncAnthropic client for async code. The pattern is nearly identical to the sync version — just add async/await:

main.py
from fastapi import FastAPI
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

class SummarizeReq(BaseModel):
    text: str

@app.post("/summarize")
async def summarize(req: SummarizeReq):
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": req.text}]
    )
    return {"summary": msg.content[0].text}
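In production you'll also want a timeout, so a hung Claude call doesn't hold the request open forever. Here's a stdlib-only sketch (the helper name and timings are invented; no FastAPI or Anthropic calls involved). In the endpoint above you would wrap the await of client.messages.create the same way and translate the timeout into an HTTP 504.

```python
import asyncio

async def with_timeout(awaitable, seconds: float):
    # Cancel the awaitable if it takes longer than `seconds`.
    # In a FastAPI endpoint, catch asyncio.TimeoutError and
    # raise HTTPException(status_code=504) instead of printing.
    return await asyncio.wait_for(awaitable, timeout=seconds)

async def fast_call() -> str:
    await asyncio.sleep(0.05)
    return "ok"

async def slow_call() -> str:
    await asyncio.sleep(10)
    return "too late"

async def demo() -> None:
    print(await with_timeout(fast_call(), 1.0))   # prints "ok"
    try:
        await with_timeout(slow_call(), 0.1)
    except asyncio.TimeoutError:
        print("timed out, would return 504")

asyncio.run(demo())
```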
Section 3 · 20 min

Streaming Responses

Streaming sends tokens to the client as they're generated instead of waiting for the full response. This is why chat interfaces like ChatGPT feel fast: text starts appearing immediately while the model is still generating.

streaming.py
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.Anthropic()

def generate_stream(text: str):
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}]
    ) as stream:
        for text_chunk in stream.text_stream:
            yield text_chunk

@app.post("/stream")
def stream_response(text: str):
    return StreamingResponse(
        generate_stream(text),
        media_type="text/plain"
    )

Test the stream endpoint with curl: curl -N -X POST "http://localhost:8000/stream?text=Explain+quantum+computing". Note that text goes in the query string here: FastAPI treats a bare str parameter as a query parameter even on a POST, so sending it as form data with -d would fail validation. The -N flag disables curl's output buffering, so you'll see tokens arrive one by one.

What You Learned Today

  • Why async endpoints handle concurrent requests better than synchronous ones
  • How the AsyncAnthropic client works: nearly identical syntax to the sync client, just add async/await
  • How StreamingResponse works for real-time token delivery
  • When to use streaming vs. non-streaming — user-facing chat UIs vs. background processing
Your Challenge

Go Further on Your Own

  • Build a streaming endpoint that returns Server-Sent Events (SSE) format instead of plain text — this is what React frontends expect for real-time AI responses
  • Add a timeout to your async endpoint: if Claude takes more than 30 seconds to respond, cancel the request and return a 504 error
  • Build a simple load test: use Python's asyncio to send 20 concurrent requests to your async endpoint and measure how it performs
Day 3 Complete

Nice work. Keep going.

Day 4 is ready when you are.
