Groq

World's fastest LLM inference

LLM API $0.05/$0.08 per M tokens (Llama 3.1 8B)

What It Is

Groq's Language Processing Unit (LPU) is a custom silicon architecture purpose-built for LLM inference. Where GPUs process 30-100 tokens/second on a 70B model, Groq's LPUs deliver 500-1000+ tokens/second — a 10-20x speedup. For real-time applications like voice agents, live chat, and interactive coding, this speed difference fundamentally changes the UX.
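
To make the speed gap concrete, here is a back-of-the-envelope comparison in Python using the token rates quoted above (illustrative figures, not benchmarks; real throughput varies by model and load):

    REPLY_TOKENS = 200  # a typical chat reply

    # Token rates are the figures from the paragraph above.
    for name, tokens_per_sec in [
        ("GPU, low end", 30),
        ("GPU, high end", 100),
        ("Groq LPU, low end", 500),
        ("Groq LPU, high end", 1000),
    ]:
        seconds = REPLY_TOKENS / tokens_per_sec
        print(f"{name:18s} {seconds:5.2f}s for a {REPLY_TOKENS}-token reply")

At these rates a 200-token reply takes 2 to 6.7 seconds on a GPU but 0.2 to 0.4 seconds on an LPU, which is the difference between a noticeable pause and a conversational response.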

How It Works

Groq runs open models (Llama, Mixtral, Qwen, DeepSeek) on its custom LPU hardware. The API is OpenAI-compatible, so it works as a drop-in replacement for existing OpenAI integrations. The LPU is a single-core, software-scheduled design that eliminates the batching overhead and memory contention that limit GPU throughput. The tradeoff is that Groq can only host open models with published weights: closed models aren't available, and there is no fine-tuning.
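
As a sketch of what drop-in means in practice, the official openai Python SDK only needs a different base URL and API key. The model id below is an example; check Groq's model list for current names:

    import os
    from openai import OpenAI

    # Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    response = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # example id; see Groq's model list
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)

Existing client code built against the OpenAI API generally keeps working because the request and response shapes match.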

Pricing Breakdown

  • Llama 3.1 70B: $0.59 input / $0.79 output per M tokens
  • Llama 3.1 8B: $0.05 / $0.08
  • Mixtral 8x7B: $0.24 / $0.24
  • Free tier available for experimentation; pay-as-you-go, no contract
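
For a quick sanity check of what those list prices mean at scale, a small helper (prices change, so verify against the official pricing page before budgeting):

    # (input $/M tokens, output $/M tokens) from the list prices above.
    PRICES = {
        "llama-3.1-70b": (0.59, 0.79),
        "llama-3.1-8b": (0.05, 0.08),
        "mixtral-8x7b": (0.24, 0.24),
    }

    def monthly_cost(model: str, input_m: float, output_m: float) -> float:
        """Dollar cost for input_m / output_m million tokens per month."""
        in_price, out_price = PRICES[model]
        return input_m * in_price + output_m * out_price

    # Example: 50M input + 10M output tokens per month on the 70B model.
    print(f"${monthly_cost('llama-3.1-70b', 50, 10):.2f}")  # -> $37.40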

Who Uses It

Builders of voice AI, real-time chat applications, and live coding assistants: in short, anyone for whom latency matters more than model choice. Widely used for conversational AI prototypes.

Strengths & Weaknesses

✓ Strengths

  • Fastest inference speed available (500+ tok/s)
  • Low per-token pricing (8B models from $0.05/M input)
  • Enables real-time UX
  • OpenAI-compatible API

× Weaknesses

  • Limited to supported open models
  • No fine-tuning
  • Capacity can be constrained during peak hours (see the retry sketch after this list)
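
Because of that last point, clients should be prepared for occasional 429 rate-limit responses. A minimal exponential-backoff sketch, reusing the client from the earlier example (the retry count and delays are arbitrary choices):

    import time
    from openai import RateLimitError

    def chat_with_retry(client, messages, model, max_retries=5):
        """Retry chat completions on 429s with exponential backoff."""
        for attempt in range(max_retries):
            try:
                return client.chat.completions.create(model=model, messages=messages)
            except RateLimitError:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
        raise RuntimeError(f"Still rate-limited after {max_retries} attempts")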

Best Use Cases

  • Real-time chat
  • Voice agents
  • High-throughput batch
  • Live coding assistants

Alternatives

  • Together AI: Open-model hosting and fine-tuning
  • Fireworks AI: Fast open-model inference with fine-tuning