Day 03 Applied Skills

Async Endpoints and Streaming AI Responses

A synchronous AI endpoint ties up a worker thread for the entire time it waits for Claude to respond. An async endpoint yields control while it waits, so a single process can handle many requests concurrently. Streaming lets users see output as it is generated instead of waiting for the full response.

~1 hour Hands-on Precision AI Academy

Today's Objective

Build two versions of the same AI endpoint: a standard async endpoint and a streaming endpoint that sends tokens to the client as they arrive, the same pattern chat interfaces such as ChatGPT use to display responses.


  
code
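from fastapi import FastAPI
import anthropic

app = FastAPI()

# Sync: occupies a worker thread while waiting for Claude
@app.post("/sync-summarize")
def sync_summarize(text: str):
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": text}],
    )
    return {"summary": msg.content[0].text}

# Async: yields to the event loop while waiting for Claude
@app.post("/async-summarize")
async def async_summarize(text: str):
    client = anthropic.AsyncAnthropic()
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": text}],
    )
    return {"summary": msg.content[0].text}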
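
To make the difference concrete without calling the real API, here is a minimal simulation using only the standard library. The names fake_model_call, run_concurrently, and run_sequentially are hypothetical stand-ins, and the sleep calls only imitate network latency; this is a sketch of the idea, not a benchmark of FastAPI or the Anthropic SDK.

```python
import asyncio
import time


async def fake_model_call(delay: float) -> str:
    # Hypothetical stand-in for an awaited API call.
    # asyncio.sleep hands control back to the event loop while "waiting".
    await asyncio.sleep(delay)
    return "summary"


async def run_concurrently(n: int, delay: float) -> float:
    # Start n simulated calls at once and wait for all of them.
    start = time.perf_counter()
    await asyncio.gather(*(fake_model_call(delay) for _ in range(n)))
    return time.perf_counter() - start


def run_sequentially(n: int, delay: float) -> float:
    # A single thread making blocking calls handles them one after another.
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(delay)  # blocks the thread for the full delay
    return time.perf_counter() - start
```

With five simulated calls of 0.1 seconds each, asyncio.run(run_concurrently(5, 0.1)) should finish in roughly the time of one call, while run_sequentially(5, 0.1) takes roughly five times as long. FastAPI runs sync handlers in a thread pool, so real sync endpoints are limited by the number of available threads rather than being strictly sequential.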
Section 2 · 15 min

Build an Async AI Endpoint

The Anthropic SDK provides an AsyncAnthropic client for async code. The pattern is nearly identical to the sync version: create the client once at module level, declare the handler with async def, and await the API call.

main.py
python
from fastapi import FastAPI
from pydantic import BaseModel
import anthropic

app = FastAPI()
client = anthropic.AsyncAnthropic()

class SummarizeReq(BaseModel):
    text: str

@app.post("/summarize")
async def summarize(req: SummarizeReq):
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=512,
        messages=[{"role": "user", "content": req.text}]
    )
    return {"summary": msg.content[0].text}
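
Because the handler takes a Pydantic model, the endpoint expects a JSON body of the form {"text": "..."}. The following standard-library client is a sketch of how that request could be built and sent; build_summarize_request and summarize_remote are hypothetical helper names, and the base URL assumes the server is running locally on port 8000.

```python
import json
import urllib.request


def build_summarize_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    # The JSON keys must match the fields of SummarizeReq.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_remote(text: str) -> str:
    # Sends the request and returns the "summary" field of the JSON reply.
    # Requires the FastAPI app to be running at base_url.
    with urllib.request.urlopen(build_summarize_request(text)) as resp:
        return json.load(resp)["summary"]
```

Calling summarize_remote("Some long article text") against a running server should return the summary string from the endpoint's response.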
Section 3 · 20 min

Streaming Responses

Streaming sends tokens to the client as they're generated instead of waiting for the full response. This is why chat interfaces like ChatGPT feel fast: text starts appearing almost immediately while the model is still generating the rest.

streaming.py
python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic

app = FastAPI()
client = anthropic.Anthropic()

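def generate_stream(text: str):
    # Open a streaming request; the context manager closes the connection on exit
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": text}]
    ) as stream:
        # text_stream yields pieces of text as Claude produces them
        for text_chunk in stream.text_stream:
            yield text_chunk

@app.post("/stream")
def stream_response(text: str):
    # text is a plain str parameter, so FastAPI reads it from the query string
    return StreamingResponse(
        generate_stream(text),
        media_type="text/plain"
    )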
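
The perceived speedup comes from time to first output, not total generation time. The simulation below is a rough sketch of that effect using only the standard library; fake_token_stream is a hypothetical stand-in for a model's text stream, and the delay merely imitates per-token generation time.

```python
import asyncio
import time
from typing import AsyncIterator, List


async def fake_token_stream(words: List[str], delay: float) -> AsyncIterator[str]:
    # Hypothetical stand-in for a model stream: one chunk per delay interval.
    for word in words:
        await asyncio.sleep(delay)
        yield word + " "


async def first_chunk_latency(words: List[str], delay: float) -> float:
    # Streaming client: measures how long until the first chunk is visible.
    start = time.perf_counter()
    stream = fake_token_stream(words, delay)
    await stream.__anext__()
    elapsed = time.perf_counter() - start
    await stream.aclose()  # stop the generator cleanly
    return elapsed


async def full_response_latency(words: List[str], delay: float) -> float:
    # Non-streaming client: nothing is visible until every chunk has arrived.
    start = time.perf_counter()
    chunks = [chunk async for chunk in fake_token_stream(words, delay)]
    assert len(chunks) == len(words)
    return time.perf_counter() - start
```

For a five-word response with a 0.05 second delay per word, the first chunk arrives after about one delay interval, while the full response takes about five. The total work is the same in both cases; streaming only changes when the user starts seeing it.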

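Test the stream endpoint with curl. Because text is declared as a plain str parameter, FastAPI reads it from the query string rather than the request body: curl -N -X POST "http://localhost:8000/stream?text=Explain%20quantum%20computing" (the -N flag turns off curl's output buffering). You'll see chunks of text arrive incrementally.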
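
A client written in Python has one extra detail to handle: the response arrives as raw bytes, and a network chunk can end in the middle of a multi-byte UTF-8 character. The helper below is a minimal sketch of decoding such a stream safely with the standard library; decode_chunks is a hypothetical name, and the chunks would come from whatever HTTP client you use to read the response body.

```python
import codecs
from typing import Iterable, Iterator


def decode_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    # An incremental decoder buffers incomplete byte sequences between calls,
    # so a character split across two chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, list(decode_chunks([b"caf", b"\xc3", b"\xa9"])) yields "caf" followed by "é", whereas calling bytes.decode on each chunk separately would fail on the lone b"\xc3".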

Day 3 Checkpoint

Before moving on, confirm understanding of these key concepts:

Continue To Day 4
Day 4 of the API Development for AI course