llama.cpp

C++ LLM inference on CPU and GPU

Tags: Local Runtime · Free (OSS)

What It Is

llama.cpp is the foundational C++ inference engine for GGUF models. It runs on CPU, GPU, Apple Silicon, mobile devices, and everything in between, and it powers Ollama, LM Studio, and dozens of other tools.

Strengths & Weaknesses

✓ Strengths

  • Runs anywhere
  • GGUF format
  • Extensive quantization options
  • No Python required

× Weaknesses

  • Lower-level API than turnkey tools
  • Performance varies by hardware and build configuration
  • Customization requires writing C++

Best Use Cases

  • Edge deployment
  • Cross-platform
  • Quantized inference
  • Embedded systems

Alternatives

Ollama
Run LLMs locally with one command
vLLM
Production-grade LLM inference server