A Bedrock inference script that calls Claude via boto3, handles responses, and demonstrates streaming. You'll also build a simple comparison wrapper that shows the syntax difference between Bedrock and the direct Anthropic API.
When to use Bedrock vs calling Claude directly
| Factor | AWS Bedrock | Direct Anthropic API |
|---|---|---|
| Data residency | Stays in your AWS region | Anthropic's infrastructure |
| AWS integrations | Native (IAM, VPC, CloudWatch) | Manual integration needed |
| Multiple model providers | Yes (Titan, Llama, Cohere, etc.) | Anthropic models only |
| Setup complexity | More initial setup | Simpler to start |
| Best for | Enterprise, compliance-sensitive, AWS-native apps | Prototypes, direct use |
Rule of thumb: If you're already on AWS and need compliance (healthcare, government, finance), use Bedrock. If you're prototyping or the app doesn't need AWS services, use the Anthropic API directly.
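The syntax difference between the two backends is easiest to see side by side. The sketch below builds the same logical request for each; it constructs payloads only (no network calls), and the model identifiers are illustrative — check which models your account actually has access to. The direct-API shape assumes the official `anthropic` Python SDK, whose `messages.create()` takes these fields as keyword arguments.

```python
import json

def bedrock_request(prompt, max_tokens=512):
    """Arguments for client.invoke_model() on the bedrock-runtime client.
    The model is selected via modelId, and the body carries a versioned
    Anthropic payload ('anthropic_version') as a JSON string."""
    return {
        "modelId": "anthropic.claude-opus-4-5-20251101-v1:0",  # illustrative
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

def anthropic_request(prompt, max_tokens=512):
    """Keyword arguments for anthropic.Anthropic().messages.create().
    Model and max_tokens move to the top level; no version field or
    manual JSON serialization is needed."""
    return {
        "model": "claude-opus-4-5",  # illustrative model name
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
```

The request content is identical; only the envelope differs. That makes it practical to hide the backend choice behind one wrapper function if you need to switch later.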
Set up boto3 for Bedrock
```bash
pip install boto3
```

boto3 picks up your AWS credentials automatically from the `aws configure` setup in Day 1. No additional configuration is needed.
Call Claude via Bedrock
```python
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

def call_claude(prompt, max_tokens=512):
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = client.invoke_model(
        modelId="anthropic.claude-opus-4-5-20251101-v1:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )
    result = json.loads(response["body"].read())
    return result["content"][0]["text"]

# Test it
response = call_claude("Summarize what AWS Bedrock is in 2 sentences.")
print(response)
```
To check usage metadata, parse the full response body: every invocation returns a `usage` block alongside the content.

```python
# Make a raw call so we can inspect the full response body,
# which includes a usage block with token counts
raw = client.invoke_model(
    modelId="anthropic.claude-opus-4-5-20251101-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 10,
        "messages": [{"role": "user", "content": "hi"}]
    }),
    contentType="application/json",
    accept="application/json"
)
result = json.loads(raw["body"].read())
print(f"Input tokens: {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")
```
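Production calls should also expect throttling under load. Below is a minimal retry sketch with exponential backoff; in real code you would catch `botocore.exceptions.ClientError` and check for a throttling error code, but here any exception triggers a retry so the helper stays library-agnostic and easy to test.

```python
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Retry a zero-argument callable with exponential backoff.
    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last exception once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): with_retries(lambda: call_claude("hello"))
```

Note that boto3 also has built-in retry configuration (`botocore.config.Config`); an explicit wrapper like this is mainly useful when you want application-level control, such as logging each retry.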
Stream responses for real-time output
```python
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-opus-4-5-20251101-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Write a short poem about cloud computing."}]
    }),
    contentType="application/json",
    accept="application/json"
)

for event in response["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
print()
```
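Printing deltas is fine for a terminal demo, but downstream code usually wants the whole message. The same event loop can be factored into a function that accumulates the text; this is a sketch that mirrors the loop above and works on any iterable of Bedrock-style chunk events.

```python
import json

def assemble_stream(events):
    """Collect the text deltas from a Bedrock response stream into one
    string. Each event wraps a JSON chunk in event["chunk"]["bytes"];
    only 'content_block_delta' chunks carry text, so other event types
    (message_start, message_stop, ...) are skipped."""
    parts = []
    for event in events:
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk.get("type") == "content_block_delta":
            parts.append(chunk["delta"]["text"])
    return "".join(parts)

# Usage (hypothetical): full_text = assemble_stream(response["body"])
```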
Day 2 Complete
- Understood when to use Bedrock vs the direct Anthropic API
- Made a synchronous Bedrock call with boto3
- Implemented streaming response handling
- Wrapped calls in a reusable function
Next: S3 + Lambda — Serverless AI Pipelines
Day 3 wires Bedrock into a Lambda function triggered by S3 file uploads. A fully serverless, event-driven AI pipeline.
Go to Day 3