Together AI

Open-model hosting and fine-tuning

LLM API $0.18-$0.88 per M tokens

What It Is

Together AI hosts one of the largest catalogs of open-source LLMs available: 200+ models across Llama, Mistral, Mixtral, Qwen, DeepSeek, Gemma, and more. A single API key gives you access to everything, with managed fine-tuning, dedicated endpoints, and fast inference.

How It Works

Together's API is OpenAI-compatible. You can switch models by changing a single string in your request. Fine-tuning is managed end-to-end — upload your dataset, pick a base model, and Together handles the training, checkpointing, and serving. Dedicated endpoints give you reserved capacity for predictable latency and throughput. They also offer image generation (FLUX, SDXL) and embedding models alongside LLMs.
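Because the API is OpenAI-compatible, switching models really is a one-string change: the request body is identical except for the `model` field. A minimal sketch of the request shape, assuming Together's published `https://api.together.xyz/v1` base URL (the model names shown are illustrative, not a guaranteed list):

```python
# Sketch: on an OpenAI-compatible API, switching models is a one-string
# change. The base URL is Together's documented OpenAI-compatible
# endpoint; model identifiers here are illustrative assumptions.

TOGETHER_BASE_URL = "https://api.together.xyz/v1"  # assumed base URL

def chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a POST to {base_url}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same request shape for two different models; only the string differs.
req_70b = chat_request("meta-llama/Llama-3.1-70B-Instruct-Turbo", "Hello")
req_8x22b = chat_request("mistralai/Mixtral-8x22B-Instruct-v0.1", "Hello")
assert req_70b["messages"] == req_8x22b["messages"]
```

With the official `openai` Python client you would pass `base_url=TOGETHER_BASE_URL` and your Together API key when constructing the client, then call `client.chat.completions.create(**chat_request(...))` unchanged.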

Pricing Breakdown

  • Llama 3.1 70B: $0.88 per M tokens (blended)
  • Llama 3.1 8B: $0.18 per M tokens (blended)
  • Mixtral 8x22B: $1.20 per M tokens (blended)
  • Fine-tuning: $3-$20 per M training tokens, depending on base model
  • Dedicated endpoints: $3-$10/hour, depending on GPU type
  • Billing: pay as you go
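At pay-as-you-go rates, cost is simply token volume times the per-million-token price. A quick estimator using the blended rates quoted above (rates are hardcoded for illustration and change over time, so check the live pricing page):

```python
# Rough cost estimator for pay-as-you-go LLM pricing.
# Rates are the blended dollars-per-million-token figures quoted in this
# profile; treat them as illustrative, not current.
RATES_PER_M_TOKENS = {
    "llama-3.1-70b": 0.88,
    "llama-3.1-8b": 0.18,
    "mixtral-8x22b": 1.20,
}

def estimate_cost(model: str, total_tokens: int) -> float:
    """Blended dollar cost for a given token volume."""
    return RATES_PER_M_TOKENS[model] * total_tokens / 1_000_000

# 50M tokens/month on Llama 3.1 8B vs 70B:
small = estimate_cost("llama-3.1-8b", 50_000_000)   # $9.00
large = estimate_cost("llama-3.1-70b", 50_000_000)  # $44.00
```

The roughly 5x price gap between the 8B and 70B models is why many teams prototype on the larger model, then fine-tune the smaller one for production.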

Who Uses It

Pika Labs, Arcee AI, Labelbox, Nomic AI, and hundreds of startups building on open models. A common fallback when Groq doesn't host the model you need.

Strengths & Weaknesses

✓ Strengths

  • Largest open-model catalog (200+ models)
  • Managed end-to-end fine-tuning with strong UX
  • Dedicated endpoints for reserved capacity

× Weaknesses

  • Slower than Groq for real-time inference
  • Less reliable uptime than closed APIs
  • Less specialized than Fireworks

Best Use Cases

  • Open-model deployment
  • Fine-tuning
  • Model evaluation
  • Production inference

Alternatives

Groq
World's fastest LLM inference
Fireworks AI
Fast open-model inference with fine-tuning
Replicate
Run open-source models via API