Fireworks AI specializes in high-throughput inference for open-source models, with particular strength in serving agent workloads that require rapid tool calling and multi-step reasoning. The company was founded by ex-Meta engineers who worked on PyTorch and ML infrastructure.
Fireworks runs open models (Llama, Mixtral, DeepSeek, Qwen, and FireFunction, their own function-calling fine-tune) on a custom inference stack. FireFunction is specifically optimized for tool calling, which matters for agent applications. They also offer "compound AI" patterns, in which ensembles of small and large models work together, for cost-optimized production deployments.
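To make the tool-calling workflow concrete, here is a minimal sketch of a request body in the OpenAI-style tool-calling schema that Fireworks' chat completions API mirrors. The endpoint URL, the model ID, and the `get_weather` tool are illustrative assumptions, not confirmed by this document; check Fireworks' docs for exact model names.

```python
import json

# Assumed OpenAI-compatible endpoint; verify against Fireworks' docs.
FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

# One tool the model may call: a hypothetical weather lookup.
# The schema (type/function/parameters) follows the OpenAI tool format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "accounts/fireworks/models/firefunction-v2",  # assumed model ID
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# The body is plain JSON; send it with any HTTP client plus an
# "Authorization: Bearer <FIREWORKS_API_KEY>" header. The response's
# tool_calls entries carry the function name and JSON-encoded arguments.
body = json.dumps(payload)
```

Because the request format matches OpenAI's, existing agent frameworks that speak that schema can typically be pointed at Fireworks by swapping the base URL and API key.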
Llama 3.1 70B: $0.90 per million tokens. FireFunction V2: $0.90 per million tokens. DeepSeek V3: $0.90 per million tokens. Fine-tuning with LoRA: $0.50-$1 per million training tokens. Dedicated deployments are priced on request.
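At flat per-million-token rates, budgeting is simple arithmetic. A small sketch using the serverless prices listed above (model keys are shorthand labels, not official IDs):

```python
# Serverless rates listed above, in dollars per million tokens.
PRICE_PER_M = {
    "llama-3.1-70b": 0.90,
    "firefunction-v2": 0.90,
    "deepseek-v3": 0.90,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Dollar cost for a given token count at the per-million rate."""
    return PRICE_PER_M[model] * tokens / 1_000_000

# Example: an agent workload pushing 10M tokens/day through FireFunction V2
# costs 0.90 * 10 = $9.00/day at these rates.
daily = estimate_cost("firefunction-v2", 10_000_000)
```

Note that agent workloads with many tool-calling round trips consume tokens on both the prompt and completion sides of each step, so per-task token counts are usually several times a single prompt's length.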
Customers include Cursor, Notion AI, Quora, Upstage, and many AI-first startups; the platform is especially popular for agentic workloads.