Gemma 4 is Google's open-weight model family, released April 2, 2026 under the Apache 2.0 license, a permissive license that allows unrestricted commercial use. The family includes four sizes: Effective 2B (E2B), Effective 4B (E4B), 26B Mixture of Experts (MoE), and 31B Dense. All four are natively multimodal (text, image, video, OCR), and the smaller variants also accept audio input, a first for open models at that parameter count.
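The modality split above determines which sizes are even candidates for a given workload. A minimal sketch, assuming only the variant names and modality support described here (the helper itself is illustrative, not part of any official SDK):

```python
# Modalities supported per variant, per the description above: all four
# sizes handle text, image, video, and OCR; only the edge-sized E2B and
# E4B variants additionally accept audio input.
VARIANT_MODALITIES = {
    "E2B": {"text", "image", "video", "ocr", "audio"},
    "E4B": {"text", "image", "video", "ocr", "audio"},
    "26B-MoE": {"text", "image", "video", "ocr"},
    "31B-Dense": {"text", "image", "video", "ocr"},
}

def candidate_variants(required: set[str]) -> list[str]:
    """Return variants whose supported modalities cover `required`."""
    return [name for name, mods in VARIANT_MODALITIES.items()
            if required <= mods]

# Requiring audio narrows the choice to the edge variants.
print(candidate_variants({"text", "audio"}))   # ['E2B', 'E4B']
```

Anything without an audio requirement leaves all four sizes in play, so the decision then shifts to hardware budget.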
Gemma 4 models are distributed as GGUF and SafeTensors files via Hugging Face, LM Studio, and Google Cloud's Vertex AI. The E2B and E4B variants are engineered specifically for edge deployment and run at low, interactive latency on phones, Raspberry Pi boards, the Jetson Orin Nano, and modern laptops. Context windows scale up to 256K tokens, and the 31B Dense model currently ranks as the #3 open model on the Arena leaderboard.
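For GGUF deployment, the practical question is which quantization level fits the target device's memory. A rough sizing sketch, assuming approximate bits-per-weight figures for common llama.cpp quantization levels (real file sizes vary with architecture and metadata):

```python
# Approximate effective bits per weight for common llama.cpp GGUF
# quantization levels. These are ballpark figures for estimation only.
QUANT_BITS = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8}

def est_size_gb(params_billion: float, quant: str) -> float:
    """Approximate GGUF weight size in GB for a given quantization."""
    return params_billion * 1e9 * QUANT_BITS[quant] / 8 / 1e9

def quants_that_fit(params_billion: float, budget_gb: float) -> list[str]:
    """Quantizations whose estimated weights fit the memory budget,
    highest quality first."""
    return [q for q, _ in sorted(QUANT_BITS.items(), key=lambda kv: -kv[1])
            if est_size_gb(params_billion, q) <= budget_gb]

# Which quantizations of the 31B Dense model fit a 24 GB consumer GPU?
print(quants_that_fit(31.0, 24.0))   # ['Q5_K_M', 'Q4_K_M']
```

Note the estimate covers weights only; KV cache for a long context window adds memory on top, so leave headroom below the hardware limit.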
The weights are free under Apache 2.0; self-hosting cost depends on hardware. E2B runs on a phone, E4B on a Raspberry Pi 5, the 26B MoE on a single RTX 4090, and the 31B Dense on a single A100 at reasonable throughput. Hosted inference is available through Google Cloud Vertex AI with pay-per-token pricing.
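When comparing self-hosting against pay-per-token hosting, a back-of-the-envelope cost model helps. A minimal sketch; the per-million-token rates below are placeholders, not Vertex AI's actual prices, so substitute the current published rates:

```python
def monthly_cost(req_per_day: int, in_tokens: int, out_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float,
                 days: int = 30) -> float:
    """Estimated monthly spend in dollars for a fixed traffic profile.

    Rates are expressed per million tokens, the usual billing unit
    for hosted pay-per-token APIs.
    """
    per_request = (in_tokens * in_rate_per_m
                   + out_tokens * out_rate_per_m) / 1e6
    return round(req_per_day * per_request * days, 2)

# 10k requests/day, 1k input + 500 output tokens each, at hypothetical
# rates of $0.10 in / $0.40 out per million tokens.
print(monthly_cost(10_000, 1_000, 500, 0.10, 0.40))   # 90.0
```

Comparing that figure against the amortized cost of the GPU a given size needs is the usual break-even check before committing to self-hosting.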
Gemma 4 targets on-device AI product teams, regulated industries that need on-prem deployment, educational platforms, and the broader open-source community, and it is especially popular for edge AI use cases.