AWS Bedrock Explained: How to Build AI Apps with Amazon's Foundation Models

In This Guide

  1. What Is AWS Bedrock?
  2. Models Available in Bedrock
  3. Bedrock vs. OpenAI API vs. Azure OpenAI
  4. Bedrock Architecture: APIs and Core Primitives
  5. Knowledge Bases: Managed RAG Without the Pipeline
  6. Agents for Bedrock: Autonomous AI with Tool Use
  7. Guardrails: Content Filtering and Safety Controls
  8. AWS Bedrock Pricing: On-Demand vs. Provisioned Throughput
  9. Setting Up Bedrock: IAM, boto3, and SDK Basics
  10. Building a Document Q&A System: Conceptual Walkthrough
  11. Bedrock for Government: FedRAMP and GovCloud
  12. When to Use Bedrock vs. Self-Hosting vs. OpenAI

Key Takeaways

I have built production AI applications on AWS Bedrock for federal clients in GovCloud — this is a practitioner guide, not a marketing overview. If you are building AI applications on AWS, you have a choice: wire together your own model hosting infrastructure, navigate separate API keys for every AI provider, and manage embedding pipelines, vector stores, and safety filters yourself — or use AWS Bedrock and let Amazon handle all of it.

Bedrock is Amazon's answer to the enterprise AI infrastructure problem. It gives you serverless, pay-per-token access to the most capable foundation models on the market — Claude from Anthropic, Llama from Meta, Titan from Amazon, Mistral, and Stability AI image models — all through a single, unified API that integrates natively with IAM, CloudWatch, VPC endpoints, and every other AWS service you already use.

This guide covers everything you need to know to actually build with it: the architecture, the pricing model, the managed services built on top of it, code examples, and a clear framework for deciding when Bedrock is the right choice versus self-hosting or the OpenAI API.

At a glance: 13+ foundation models from 6+ providers, all behind one API, one IAM policy, and one billing line.

What Is AWS Bedrock?

AWS Bedrock is a fully managed foundation model service. The core value proposition is simple: you get API access to powerful AI models without managing any servers, GPU clusters, or model weights. You do not deploy anything. You do not configure CUDA drivers. You call an API and pay for what you use.

What separates Bedrock from simply calling the Anthropic API or OpenAI API directly is the integration layer. Bedrock is deeply embedded in the AWS ecosystem: IAM handles authentication and authorization, CloudWatch captures invocation logs and metrics, VPC endpoints keep model traffic off the public internet, and usage lands on the AWS bill you already have.

The result is a platform where you can move from "I want an AI feature" to "this AI feature is running in production with logging, access controls, content filtering, and a RAG pipeline" significantly faster than assembling those components yourself.

Bedrock Is Not a Model — It Is a Platform

A common misconception is that Bedrock is Amazon's AI model. It is not. Amazon has its own Titan models on Bedrock, but the platform primarily provides access to third-party models — Claude, Llama, Mistral — inside AWS's infrastructure and compliance boundary. Think of it as a managed marketplace for foundation models, with AWS-grade security wrapped around everything.

Models Available in Bedrock

AWS Bedrock gives you access to six model families — Anthropic Claude, Meta Llama, Amazon Titan, Mistral, Cohere, and Stability AI — through a single unified API, all within the AWS compliance boundary, with no separate accounts or API keys per provider required. The lineup changes as new models are released, but the major families as of early 2026 are:

Anthropic — Claude

The Claude model family is one of Bedrock's most capable offerings. Claude 3.5 Sonnet and Claude 3.5 Haiku are available for production workloads, with Claude offering strong performance on reasoning, long-context tasks, coding, and instruction following. Claude 3 Opus remains available for the most demanding tasks. Anthropic's models are particularly strong for enterprise applications requiring nuanced instruction following and safety-conscious outputs.

Meta — Llama

Llama 3.1 and Llama 3.2 variants are available, including the 8B, 70B, and 405B parameter versions. Llama is the leading open-weights model family and gives you a strong open-source option within the managed Bedrock environment — useful when you want model transparency or need to demonstrate that you are not sending data to a proprietary provider.

Amazon — Titan

Amazon's own Titan Text models (Express, Lite, Premier) cover general text generation and summarization. Titan Embeddings (v1 and v2) are the recommended embedding models for Bedrock Knowledge Bases. Amazon also offers Titan Image Generator for image synthesis tasks.

Mistral AI

Mistral Large, Mistral Small, and Mixtral 8x7B are available. Mistral's models are known for strong performance per dollar and European data residency provenance — useful for workloads with EU compliance requirements.

Stability AI

Stable Diffusion XL and Stable Image Core handle image generation, and both are available through the same InvokeModel API as the text models.

Model Access Requires Explicit Enablement

By default, no models are enabled in a new Bedrock account. You must go to the AWS console, navigate to Bedrock → Model access, and explicitly request access to each model family. Approval is typically instant for most models, but Anthropic models may require a brief review. This is by design — it ensures accountability for model usage.

Bedrock vs. OpenAI API vs. Azure OpenAI

All three platforms give you API access to large language models. The differences matter most at the enterprise and government level, where compliance, multi-model flexibility, and infrastructure integration drive decisions.

| Feature | AWS Bedrock | OpenAI API | Azure OpenAI |
| --- | --- | --- | --- |
| Model variety | Claude, Llama, Titan, Mistral, Stability AI | GPT-4o, o1, DALL-E, Whisper | GPT-4o, o1 (Microsoft-hosted) |
| Authentication | AWS IAM roles & policies | API keys | Azure AD / managed identity |
| FedRAMP High | Yes | No | Yes (Azure Gov) |
| GovCloud availability | Yes | No | Yes (Azure Gov) |
| Managed RAG (out of box) | Yes (Knowledge Bases) | No (bring your own) | Partial (Azure AI Search) |
| Managed agents | Yes (Agents for Bedrock) | Yes (Assistants API) | Partial (preview) |
| Content guardrails | Yes (Guardrails for Bedrock) | Partial (Moderation API) | Partial (Azure Content Safety) |
| VPC / private networking | Yes (VPC endpoints) | No | Yes (Private Link) |
| Pricing model | Per token, on-demand or provisioned | Per token, on-demand | Per token, on-demand or PTU |
| AWS ecosystem integration | Native | Manual | Manual |
| Best for | AWS-native enterprise & gov | Consumer apps, startups | Azure-native enterprise & gov |

The headline difference: if you are already on AWS, Bedrock is almost always the right foundation. If you are a small team building a consumer-facing product and speed to market is everything, OpenAI's API has the fastest onboarding path. If your organization is Microsoft-first and needs FedRAMP compliance with GPT-4o specifically, Azure OpenAI is the answer.

Bedrock Architecture: APIs and Core Primitives

Bedrock exposes two primary invocation APIs plus the higher-order services built on top of them.

InvokeModel

The InvokeModel API is the low-level primitive. You send a raw request body that matches the model provider's schema and get back a raw response. The request body format varies by model family — Anthropic Claude uses the Messages API format, while Amazon Titan uses its own schema. This is the most flexible option but requires you to handle model-specific formatting.

Python — boto3 InvokeModel (Anthropic Claude)
import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "Summarize the key risks in this contract."
        }
    ]
})

response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    contentType="application/json",
    accept="application/json"
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])

Converse API

The Converse API is Bedrock's unified, model-agnostic interface introduced in 2024. Instead of formatting requests differently for each model, you use a single consistent schema and Bedrock translates it to whatever the underlying model expects. This is the recommended API for most new applications because it makes model switching trivially easy — swap the modelId and nothing else changes.

Python — boto3 Converse API (model-agnostic)
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    # Swap to "meta.llama3-70b-instruct-v1:0" — same code, different model
    messages=[
        {
            "role": "user",
            "content": [{"text": "What are the main use cases for AWS Bedrock?"}]
        }
    ],
    inferenceConfig={
        "maxTokens": 1024,
        "temperature": 0.3
    }
)

output = response["output"]["message"]["content"][0]["text"]
print(output)

For streaming responses — useful for chat interfaces where you want text to appear progressively — use converse_stream instead of converse. The interface is identical; Bedrock handles the server-sent event stream.
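As a sketch of that pattern, here is a small helper (an illustration, not an official SDK utility) that collects the text deltas from converse_stream events. Pass it a boto3 "bedrock-runtime" client, e.g. boto3.client("bedrock-runtime", region_name="us-east-1"):

```python
def stream_text(client, model_id, prompt):
    """Yield text fragments from a Bedrock ConverseStream response.

    client: a boto3 "bedrock-runtime" client.
    """
    response = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    for event in response["stream"]:
        # Text arrives in contentBlockDelta events; other event types
        # (messageStart, metadata, messageStop) carry no text and are skipped.
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]

# Usage (requires AWS credentials):
#   for chunk in stream_text(client, "anthropic.claude-3-5-sonnet-20241022-v2:0",
#                            "Explain VPC endpoints."):
#       print(chunk, end="", flush=True)
```

Because the function yields as events arrive, your UI can render tokens progressively instead of waiting for the full completion.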

Knowledge Bases: Managed RAG Without the Pipeline

Knowledge Bases for Bedrock is AWS's fully managed RAG service — you point it at an S3 bucket, choose an embedding model, and Bedrock handles chunking, embedding, vector storage, and retrieval automatically, eliminating the need to build and maintain a custom pipeline. Retrieval-Augmented Generation (RAG) is the most common pattern for building AI applications over proprietary data — internal documents, policy manuals, product catalogs, case files. The standard self-built RAG pipeline involves: chunking documents, running them through an embedding model, storing vectors in a vector database, retrieving relevant chunks at query time, and injecting them into the model prompt.

That pipeline takes real engineering effort to build and maintain. Knowledge Bases for Bedrock does all of it for you, managed and serverless.

How It Works

1. Connect a data source

Point Bedrock at an S3 bucket containing your documents (PDF, Word, HTML, CSV, text). You can also connect Confluence, SharePoint, Salesforce, and web crawlers as data sources.

2. Choose an embedding model

Select from Amazon Titan Embeddings v2 or Cohere Embed. Bedrock automatically chunks your documents, runs them through the embedding model, and stores the resulting vectors.

3. Select a vector store

Bedrock can manage the vector store for you (using OpenSearch Serverless behind the scenes), or you can bring your own: OpenSearch, Pinecone, Redis, or Aurora PostgreSQL with pgvector.

4. Query via RetrieveAndGenerate API

At runtime, pass a user question to the Knowledge Base. Bedrock retrieves the most relevant document chunks, injects them into the model's context, and returns a grounded answer with citations.

Sync, Not Real-Time

Knowledge Bases ingest documents through a sync job — you trigger a sync and Bedrock processes new and updated files. This is not real-time streaming ingestion. For most enterprise document Q&A use cases (policy docs, contracts, reports) this is perfectly fine. For applications that need sub-second ingestion of live data, you would need a custom pipeline.
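Triggering that sync from code goes through the bedrock-agent management client's start_ingestion_job operation. A minimal sketch, with the knowledge base and data source IDs as placeholders you would replace with your own:

```python
def trigger_sync(agent_client, knowledge_base_id, data_source_id):
    """Start a Knowledge Base ingestion (sync) job and return (job_id, status).

    agent_client: a boto3 "bedrock-agent" client, e.g.
    boto3.client("bedrock-agent", region_name="us-east-1").
    """
    response = agent_client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    # Status starts at STARTING; poll get_ingestion_job until COMPLETE.
    return job["ingestionJobId"], job["status"]
```

You would typically call this from a scheduled Lambda, or from an S3 event handler when new documents land in the bucket.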

Agents for Bedrock: Autonomous AI with Tool Use

An Agent for Bedrock is an AI system that can take multi-step actions by calling external APIs and AWS services autonomously, based on a user goal. Where a standard model invocation answers a question, an Agent breaks down a task, decides what tools to use, calls those tools, observes the results, and continues until the task is complete.

You define an Agent by specifying a foundation model, natural-language instructions describing the agent's job, one or more action groups (API operations, typically backed by Lambda functions and described with an OpenAPI schema, that the agent is allowed to call), and optionally a Knowledge Base for retrieval.

A real example: an HR document agent that can look up an employee record from an API, retrieve the relevant policy from a Knowledge Base, and generate a personalized response. The agent reasons about which actions to take, calls the Lambda behind your HR API, fetches the policy document, and synthesizes a complete answer — without you writing any orchestration logic.

Agents vs. Knowledge Bases: What's the Difference?

Knowledge Bases handle retrieval — answering questions from documents. Agents handle action — taking multi-step sequences involving external systems. You can attach a Knowledge Base to an Agent to give it both retrieval and action capabilities simultaneously.
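Invoking a deployed agent goes through the bedrock-agent-runtime client's invoke_agent operation, which streams its answer back in chunks. A sketch (agent ID, alias ID, and session ID are placeholders) that collects the streamed completion into a single string:

```python
def ask_agent(agent_runtime_client, agent_id, alias_id, session_id, question):
    """Invoke an Agent for Bedrock and collect its streamed completion text.

    agent_runtime_client: a boto3 "bedrock-agent-runtime" client.
    """
    response = agent_runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,  # reuse across calls to keep conversation state
        inputText=question,
    )
    parts = []
    for event in response["completion"]:
        # Answer text arrives as chunk events; trace events (if enabled)
        # carry the agent's reasoning steps and are skipped here.
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)
```

The sessionId is what gives the agent memory: send follow-up questions with the same ID and the agent sees the prior turns.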

Guardrails: Content Filtering and Safety Controls

Guardrails for Bedrock is a configurable safety layer that sits between your application and any model on Bedrock. It applies on both the input (user prompt) and the output (model response), and it works with every model on the platform — not just Amazon's own Titan models.

Guardrails give you four main control surfaces: content filters for categories such as hate, insults, sexual content, and violence; denied topics that you define in natural language; word and phrase filters for custom blocklists and profanity; and sensitive information filters that detect and mask or block PII.

Guardrails Are Applied at Invocation, Not Configuration

You attach a Guardrail ID and version to your model invocation call. This means you can use the same guardrail across multiple models and applications, update it centrally, and have consistent safety behavior without modifying application code. This is a significant operational advantage for enterprises managing dozens of AI applications.
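In code, attaching a guardrail is one extra parameter on the Converse call. A sketch, with a hypothetical guardrail ID and version:

```python
def converse_with_guardrail(client, model_id, guardrail_id, guardrail_version, prompt):
    """Call Converse with a Guardrail applied to both the prompt and the response.

    client: a boto3 "bedrock-runtime" client.
    """
    return client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,   # e.g. "gr-abc123" (hypothetical)
            "guardrailVersion": guardrail_version,  # a numbered version or "DRAFT"
        },
    )
```

Swapping the guardrail version here is how you roll out an updated safety policy without touching prompts or application logic.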

AWS Bedrock Pricing: On-Demand vs. Provisioned Throughput

Bedrock offers two pricing modes. Understanding them matters for cost management at scale.

On-Demand Pricing

You pay per token — input tokens and output tokens are priced separately. There is no minimum commitment. This is ideal for applications with variable or unpredictable traffic. Pricing varies significantly by model: as of early 2026, Claude 3.5 Haiku is roughly $0.80 per million input tokens and $4 per million output tokens, while Claude 3.5 Sonnet is roughly $3 per million input tokens and $15 per million output tokens. Llama 3.1 70B is cheaper at roughly $0.99 per million input/output tokens. Exact pricing changes — check the AWS Bedrock pricing page for current rates.

Provisioned Throughput

Provisioned Throughput lets you purchase a guaranteed number of model units (MUs) — essentially reserved capacity. You pay a fixed hourly rate regardless of whether you use the capacity. This is appropriate when you have high, sustained traffic where reserved capacity beats per-token cost; latency or throughput SLAs that require guaranteed capacity; or a fine-tuned model, which can only be deployed on Provisioned Throughput.

| Pricing Mode | Cost Structure | Best For | Fine-Tuned Models |
| --- | --- | --- | --- |
| On-Demand | Per input/output token | Variable traffic, prototyping, low volume | Not supported |
| Provisioned Throughput | Fixed hourly rate per Model Unit | High volume, consistent load, SLA requirements | Required |

For most teams building internal tools, prototypes, or moderate-traffic applications, on-demand is the right starting point. Revisit Provisioned Throughput when you have real usage data and can quantify whether the reserved capacity math works in your favor.
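The on-demand math is simple enough to sanity-check in a few lines. Using the approximate Claude 3.5 Sonnet rates quoted above (rates change, so verify against the pricing page before budgeting):

```python
def estimate_on_demand_cost(input_tokens, output_tokens,
                            input_rate_per_m, output_rate_per_m):
    """On-demand cost in dollars, given per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate_per_m \
         + (output_tokens / 1_000_000) * output_rate_per_m

# Example: 10M input + 2M output tokens per month at ~$3/M in, ~$15/M out.
monthly = estimate_on_demand_cost(10_000_000, 2_000_000, 3.00, 15.00)
# 10 * 3 + 2 * 15 = $60/month
```

Running the same token volumes through Haiku-class rates is usually the first cost lever to pull before considering Provisioned Throughput.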

Setting Up Bedrock: IAM, boto3, and SDK Basics

Getting your first Bedrock call working takes about 15 minutes. Here is the setup sequence:

1. Enable model access in the console

Navigate to the AWS console → Amazon Bedrock → Model access. Select the model families you need and click Request access. Wait for approval (usually instant).

2. Configure an IAM policy

Create or update an IAM policy to allow bedrock:InvokeModel and bedrock:Converse on the specific model ARNs you need. Attach this policy to the IAM role your application assumes — a Lambda execution role, an ECS task role, or a developer's IAM user for local development.

3. Install boto3 and configure credentials

Run pip install boto3. Configure credentials via aws configure for local development, or use the role-based credential chain automatically when running on EC2, Lambda, or ECS.

4. Instantiate the bedrock-runtime client

Use boto3.client("bedrock-runtime") for model invocations. The bedrock client (without "-runtime") is used for management operations like listing models and managing Provisioned Throughput — not for actual inference.

IAM Policy — Minimal Bedrock Invoke Permissions
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream",
        "bedrock:Converse",
        "bedrock:ConverseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
        "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
      ]
    }
  ]
}
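To confirm which models your account can actually see, the management-plane bedrock client (not bedrock-runtime) exposes ListFoundationModels. A small sketch:

```python
def available_model_ids(bedrock_client, provider=None):
    """Return sorted model IDs from ListFoundationModels, optionally by provider.

    bedrock_client: the management-plane client,
    boto3.client("bedrock", region_name="us-east-1").
    """
    response = bedrock_client.list_foundation_models()
    return sorted(
        summary["modelId"]
        for summary in response["modelSummaries"]
        if provider is None or summary.get("providerName") == provider
    )

# Usage (requires AWS credentials):
#   available_model_ids(boto3.client("bedrock"), provider="Anthropic")
```

Note that a model appearing in this list still requires the console-side model access grant before InvokeModel or Converse will succeed.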

Building a Document Q&A System: Conceptual Walkthrough

A production document Q&A system on Bedrock requires four AWS services: S3 for document storage, a Bedrock Knowledge Base for managed RAG, Lambda for the API handler, and API Gateway for the HTTP endpoint — with optional Guardrails for PII redaction. You can have this architecture running in under a day. The most common enterprise AI application is exactly this: "Ask questions about our policy manual," "Search the contract library," "Query our internal knowledge base." Here is how you build one on Bedrock end-to-end.

Architecture

Documents live in S3 and are synced into a Bedrock Knowledge Base. API Gateway exposes the HTTP endpoint, a Lambda function handles each request and calls the Knowledge Base, and Guardrails can optionally be attached to the generation step for PII redaction.

The Query Flow

When a user submits a question, your Lambda calls the RetrieveAndGenerate API with the user's question and the Knowledge Base ID. Bedrock automatically embeds the question, searches the vector store for the top-K relevant chunks, injects those chunks into a prompt, calls the generation model (your choice — Claude is common), and returns the answer along with citations showing which source documents were used.

Python — Knowledge Base RetrieveAndGenerate
import boto3

agent_client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_client.retrieve_and_generate(
    input={"text": "What is the company's remote work policy for international travel?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KB_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": 5}
            }
        }
    }
)

answer = response["output"]["text"]
citations = response["citations"]  # Source documents used
print(answer)

The citations object contains the exact text passages retrieved, the S3 URIs of the source files, and relevance scores. This is critical for enterprise applications where users need to verify answers against original documents — you can surface "This answer was drawn from PolicyManual_v3.pdf, pages 12-14" directly in your UI.
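The exact response shape is documented in the SDK, but a small helper along these lines (an illustration, assuming the standard retrievedReferences layout) pulls out the source URIs and excerpts for display:

```python
def extract_sources(citations):
    """Pull (S3 URI, excerpt) pairs out of a RetrieveAndGenerate citations list."""
    sources = []
    for citation in citations:
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            excerpt = ref.get("content", {}).get("text", "")
            if uri:
                sources.append((uri, excerpt))
    return sources
```

Feeding this into your UI is how you get the "drawn from PolicyManual_v3.pdf" attribution line next to each answer.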

Bedrock for Government: FedRAMP and GovCloud

AWS Bedrock holds FedRAMP High authorization and is available in AWS GovCloud (US-East) and GovCloud (US-West). This makes it one of the few managed foundation model services that federal agencies can use for workloads involving sensitive, mission-critical, or Controlled Unclassified Information (CUI) data.

What FedRAMP High Actually Means

FedRAMP High covers systems where the loss of confidentiality, integrity, or availability could have severe or catastrophic impact on agency operations. This is the authorization level required for most law enforcement, intelligence community, and defense agency workloads. FedRAMP Moderate (which many commercial cloud services hold) is not sufficient for these use cases.

GovCloud Considerations

Running Bedrock in GovCloud means your model invocation traffic, request/response data, and any stored embeddings stay physically within AWS infrastructure restricted to U.S. persons. Key operational differences from commercial regions: model availability typically lags the commercial regions, ARNs use the aws-us-gov partition rather than aws, and GovCloud accounts use separate credentials from your commercial AWS accounts.

Authority to Operate (ATO)

FedRAMP authorization for Bedrock covers AWS's responsibility layer — the infrastructure, the service endpoints, the encryption at rest and in transit. Your application running on top of Bedrock still requires its own ATO from the relevant agency. The AWS FedRAMP package simplifies the ATO process significantly by providing pre-authorized controls you can inherit, but it does not eliminate the agency authorization requirement for your application.

Federal Agencies Already Using Cloud AI

Following Executive Order 14110 on AI safety and the subsequent OMB guidance on AI use in government, federal agencies have significantly accelerated cloud AI adoption. Bedrock's FedRAMP High authorization and GovCloud availability make it a natural choice for agencies that are already AWS shops — and the majority of the federal civilian government is on AWS in some capacity.

When to Use Bedrock vs. Self-Hosting vs. OpenAI

Use Bedrock if you are already on AWS or need FedRAMP compliance. Use the OpenAI API directly if speed of onboarding is the priority and compliance is not a factor. Self-host only if you have air-gap requirements or the GPU infrastructure to run models efficiently at scale. The choice depends on your compliance requirements, infrastructure preferences, team skills, and the specific application you are building.

Use Bedrock When:

- You are already on AWS and want AI behind your existing IAM, VPC, and CloudWatch setup
- You need FedRAMP High or GovCloud for regulated or federal workloads
- You want managed RAG, agents, and guardrails instead of building those layers yourself
- You want to switch between Claude, Llama, Mistral, and other models without re-integrating per provider

Use the OpenAI API (Direct) When:

- Speed of onboarding matters more than infrastructure integration
- You specifically need GPT-4o, o1, or other OpenAI-only models
- You are a small team or startup without strict compliance requirements

Self-Host Models When:

- You have air-gap or data-sovereignty requirements that rule out any managed service
- You already operate GPU infrastructure and can run open-weights models efficiently at scale
- You need full control over model weights, versions, and inference behavior

For most enterprise teams building internal AI tools on AWS, Bedrock is the path of least resistance to production-quality AI with enterprise-grade compliance. The managed RAG, Agents, and Guardrails alone save months of engineering compared to building equivalent functionality yourself.

We teach Bedrock, Claude API, and cloud AI hands-on.

Precision AI Academy's three-day bootcamp covers AWS Bedrock, Knowledge Bases, Agents, and how to integrate them into real applications. Denver, LA, NYC, Chicago, and Dallas — October 2026. $1,490, 40 seats per city.

Reserve Your Seat

The bottom line: AWS Bedrock is the right foundation model platform for teams already on AWS — it trades the simplicity of a single-provider API key for enterprise-grade IAM integration, FedRAMP High authorization, multi-model flexibility, and managed services (Knowledge Bases, Agents, Guardrails) that would take months to build yourself. If you are building AI applications in a regulated or AWS-native environment, Bedrock eliminates more infrastructure work than any other option on the market.

Frequently Asked Questions

Do I need to manage servers to use AWS Bedrock?

No. Bedrock is fully serverless. You call an API endpoint and pay per token. There are no EC2 instances to manage, no GPU clusters to configure, no Docker containers to deploy for the base model access. If you use Agents for Bedrock with Lambda-backed action groups, you do manage those Lambda functions — but the model inference layer itself is entirely managed by AWS.

Can I use my own fine-tuned model on Bedrock?

Yes. Bedrock supports fine-tuning for select model families (Amazon Titan, Llama, Cohere Command) using your own training data stored in S3. Fine-tuned models must be deployed on Provisioned Throughput — they cannot run on the on-demand tier. You can also import custom model weights (for supported architectures) if you have fine-tuned externally and want to run them through the Bedrock interface.

How does Bedrock handle data privacy?

AWS commits that data sent to Bedrock for inference is not used to train the underlying foundation models. Your prompts and completions are not shared with model providers (Anthropic, Meta, etc.). Data is encrypted in transit with TLS and at rest. For GovCloud deployments, data stays within the GovCloud boundary. AWS's data processing addendum for Bedrock covers these commitments in detail — review it with your legal team for regulated industry use cases.

Is Bedrock supported in the AWS CDK and Terraform?

Yes. AWS CDK has L1 constructs for Bedrock resources (Knowledge Bases, Agents, Guardrails) and the community has contributed higher-level L2 constructs. The official AWS CDK Bedrock alpha module provides production-ready constructs for common patterns. Terraform's AWS provider includes Bedrock resources for infrastructure-as-code deployments. Both are actively maintained and generally stable for the core service features.

Cloud AI is the core skill of the next decade.

AWS Bedrock, Azure OpenAI, Google Vertex AI — the teams that understand these platforms will build the most valuable applications. Our bootcamp puts you in the room with professionals who are already building on these stacks. Five cities, October 2026.

Reserve Your Seat

Note: AWS Bedrock pricing, model availability, and feature set change frequently. Verify current details at aws.amazon.com/bedrock before architectural decisions. Code examples reflect boto3 SDK patterns as of early 2026 and should be validated against the current SDK documentation.

Sources: AWS Documentation, Gartner Cloud Strategy, CNCF Annual Survey


Bo Peng

AI Instructor & Founder, Precision AI Academy

Bo has trained 400+ professionals in applied AI across federal agencies and Fortune 500 companies. Former university instructor specializing in practical AI tools for non-programmers. Kaggle competitor and builder of production AI systems. He founded Precision AI Academy to bridge the gap between AI theory and real-world professional application.
