Llama is Meta's open-weight LLM family and the most widely deployed open AI model in the world. The Llama 4 generation (released April 2025) introduced a mixture-of-experts architecture, with Scout (17B active parameters, 109B total) and Maverick (17B active, 400B total) variants. A permissive commercial license (with a clause requiring companies over 700M monthly active users to obtain a separate license from Meta) makes it the default choice for almost every open-weight deployment.
Meta releases both base and instruction-tuned weights on Hugging Face and llama.meta.com. You can download them and run locally via Ollama, LM Studio, vLLM, or llama.cpp, or use hosted services like Together and Groq. For fine-tuning, Unsloth and Axolotl are the most popular frameworks. For serving at scale, vLLM and TGI are the standard production options.
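One practical difference when running locally: tools like Ollama and vLLM apply the model's chat template for you, but if you drive llama.cpp (or a raw completion endpoint) directly, you assemble the prompt yourself. A minimal sketch of the Llama 3 instruct format, as a single-turn prompt builder; verify the exact template against the model card before relying on it:

```python
# Sketch: hand-building a Llama 3 instruct prompt, as needed when calling
# llama.cpp or a raw completion API directly. The special tokens below
# follow the published Llama 3 chat format; the function name is ours.

def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 instruct prompt string."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The prompt ends with an open assistant header so the model
        # generates the assistant turn next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt("You are a concise assistant.", "What is Llama?")
print(prompt)
```

Higher-level runtimes read this template from the tokenizer config, which is one reason using Ollama or vLLM is less error-prone than templating by hand.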
Free to download and self-host. Hosted pricing runs via Together ($0.18-$0.88/M tokens), Groq ($0.05-$0.79/M), and Fireworks ($0.90/M). Self-hosting costs depend on your GPU — a single 80GB A100 can serve a quantized Llama 3.1 70B at reasonable throughput for under $1/hour on spot instances.
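The hosted-vs-self-hosted tradeoff comes down to simple arithmetic. A back-of-the-envelope sketch, where the throughput figure is an illustrative assumption (real numbers vary widely with batch size, quantization, and context length), not a benchmark:

```python
# Back-of-the-envelope cost comparison: hosted per-token pricing vs. a
# self-hosted spot GPU. Prices come from the text above; the throughput
# figure is an assumption for illustration only.

HOSTED_PRICE_PER_M = 0.88       # $/M tokens (upper end of Together's range)
GPU_HOURLY = 1.00               # $/hour, A100 spot (figure from the text)
ASSUMED_TOKENS_PER_SEC = 600    # assumed aggregate throughput under batching

def hosted_cost(tokens: int) -> float:
    """Cost of generating `tokens` via a hosted per-token API."""
    return tokens / 1e6 * HOSTED_PRICE_PER_M

def self_hosted_cost(tokens: int) -> float:
    """Cost of generating `tokens` on a rented GPU at assumed throughput."""
    hours = tokens / ASSUMED_TOKENS_PER_SEC / 3600
    return hours * GPU_HOURLY

million = 1_000_000
print(f"hosted:      ${hosted_cost(million):.2f} per 1M tokens")
print(f"self-hosted: ${self_hosted_cost(million):.2f} per 1M tokens")
```

The crossover depends almost entirely on utilization: a spot GPU only beats per-token pricing if you keep it busy, since an idle hour costs the same as a saturated one.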
Essentially every AI company that self-hosts an open model. Goldman Sachs, Dell, IBM, Zoom, AT&T, Shopify, and thousands of others deploy Llama in production.