Free Tool · 2026 Pricing

LLM API Cost Calculator
Claude, GPT-5, Gemini & More

Enter your request volume. Get exact daily, monthly, and annual API costs for every major model — with caching support and shareable links.

Your Usage

Adjust sliders to match your workload

Requests per day: 1,000 (range 1–1M)
Avg input tokens / request: 1,000 (range 100–100K)
Avg output tokens / request: 500 (range 50–10K)
Cache discount
Claude discounts cached input tokens by 90%; OpenAI by 50%
Cache hit rate: 50% (range 0–100%)
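The math behind these inputs is simple enough to sanity-check by hand. A minimal sketch, assuming illustrative prices of $3 per 1M input tokens and $15 per 1M output tokens, a 90% cache discount, and a 30-day month (real per-model prices vary):

```python
def daily_cost(requests_per_day, input_tokens, output_tokens,
               input_price_per_m, output_price_per_m,
               cache_hit_rate=0.0, cache_discount=0.90):
    """Estimate daily API spend in dollars for one model."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    # Cached input tokens are billed at (1 - discount) of the normal rate.
    input_cost = (uncached + cached * (1 - cache_discount)) * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return requests_per_day * (input_cost + output_cost)

# Default slider values: 1,000 requests/day, 1,000 in / 500 out, 50% hit rate.
daily = daily_cost(1_000, 1_000, 500, 3.0, 15.0, cache_hit_rate=0.5)
print(f"daily ${daily:.2f}, monthly ${daily * 30:.2f}, annual ${daily * 365:.2f}")
```

Swap in your model's actual per-million-token prices to reproduce any row of the comparison table.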
[Cost comparison table: sortable by Model, Daily, Monthly, Annual, Per request, or Context]

[Monthly cost comparison chart]

Which model should I pick?

Based on your volume and cost inputs


Adjust the sliders above and we'll suggest the best model for your volume.

⏱️

Latency budget

Cheap models can mean slower p95 latency. Factor in time-to-first-token when picking a model for user-facing apps: a $0.10/day saving isn't worth 8-second waits.

🔁

Retry overhead

Rate limit errors, timeouts, and provider outages trigger retries. Budget 5–15% extra tokens for retry logic in production systems — especially at high volume.
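That overhead comes from retried requests re-spending their tokens. A hedged sketch of the kind of retry wrapper that generates it (the backoff policy and attempt count here are assumptions, not provider guidance):

```python
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    """Retry a flaky API call with exponential backoff plus jitter.
    Every retry re-sends the full prompt, so each one re-spends the
    request's input tokens — the source of the 5–15% budget overhead."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff: 1x, 2x, 4x base delay, randomized.
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

Pair a wrapper like this with a token counter so you can measure your actual retry overhead instead of guessing.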

💾

Prompt caching setup

Caching saves money but adds engineering time. You need stable system prompts, cache-aware routing, and hit-rate monitoring. Real savings take 2–4 weeks to tune.

📊

Rate limit tiers

Default rate limits on new accounts are low. Tier upgrades require spend history or manual review — plan ahead if you're ramping to high volume fast.

🔐

Data residency & compliance

Enterprise agreements, HIPAA/FedRAMP variants, or EU data residency can add 20–50% to base API costs — not shown in any public pricing page.

📦

Batch vs. real-time

Batch APIs (Claude, OpenAI) offer 50% discounts for async workloads. If your use case tolerates 24h latency, you can cut costs in half — invisible in standard pricing.
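The batch discount is easy to translate into dollars. A back-of-envelope sketch, assuming the 50% discount above applies to whatever fraction of your traffic tolerates delayed results:

```python
def batch_savings(monthly_cost, batch_fraction, batch_discount=0.50):
    """Monthly savings from routing a fraction of traffic to a batch API."""
    return monthly_cost * batch_fraction * batch_discount

# Example: a $300/month workload where 60% of requests are latency-tolerant.
print(f"${batch_savings(300, 0.60):.2f} saved per month")
```

Even partial batch adoption moves the needle, which is why it's worth separating async jobs (evals, backfills, summarization) from interactive traffic early.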

Build smarter AI apps at our
2-Day Hands-On Bootcamp

Learn prompt engineering, API cost optimization, and production-grade AI architecture — in person, 5 cities, Oct 2026.

See Bootcamp Dates & Pricing