Use-case base scores
Each model carries a 0–10 rating per category (coding, writing, reasoning, etc.) sourced from independent benchmarks and practitioner testing as of April 2026.
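A base-score lookup like the one described could be sketched as follows; the model names, categories, and numbers here are placeholders, not the actual ratings:

```python
# Hypothetical per-category base scores (0-10). Real values would come from
# the benchmark and practitioner-testing data described above.
BASE_SCORES = {
    "model-a": {"coding": 9, "writing": 7, "reasoning": 8},
    "model-b": {"coding": 6, "writing": 8, "reasoning": 5},
}

def base_score(model: str, category: str) -> int:
    """Look up a model's 0-10 rating for one use-case category."""
    return BASE_SCORES[model][category]
```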
Context window penalty
If you need "enormous" context and a model tops out at 32K, it loses points. We match your stated need against actual context limits.
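One way to implement this matching, assuming stated needs map to minimum token counts and the penalty scales with the shortfall (the need names, thresholds, and 20-point maximum are illustrative assumptions):

```python
# Illustrative mapping from a user's stated context need to a token minimum.
NEED_TO_TOKENS = {"small": 8_000, "large": 128_000, "enormous": 500_000}

def context_penalty(stated_need: str, model_context_limit: int) -> int:
    """Return a point deduction when the model's window falls short of the need."""
    required = NEED_TO_TOKENS[stated_need]
    if model_context_limit >= required:
        return 0
    # Scale the deduction by how far short the model falls (assumed scheme).
    shortfall = 1 - model_context_limit / required
    return round(20 * shortfall)
```

So a 32K model against an "enormous" requirement loses nearly the full penalty, while any model meeting the stated need loses nothing.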
Cost weight
When "cost is critical," we boost cheaper models by up to 20 points and penalize premium ones. "Quality first" reverses those weights.
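A minimal sketch of the cost weighting, assuming price tiers are judged by dollars per million tokens; the $1 and $15 thresholds are invented for illustration, while the 20-point swing and the "quality first" reversal come from the text:

```python
def cost_adjustment(priority: str, price_per_mtok: float) -> float:
    """Boost cheap models and penalize premium ones, reversed under quality-first."""
    if price_per_mtok <= 1.0:
        delta = 20.0   # budget tier (assumed threshold)
    elif price_per_mtok >= 15.0:
        delta = -20.0  # premium tier (assumed threshold)
    else:
        delta = 0.0
    if priority == "cost_critical":
        return delta
    if priority == "quality_first":
        return -delta  # "Quality first" reverses the weights
    return 0.0
```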
Privacy / hosting filter
Models that can't satisfy your hosting requirement are capped at 40 and labeled with the reason they fall short. On-prem deployment requires open-weight models.
Latency factor
Real-time use cases down-weight large reasoning models (Opus, GPT-5 full) and boost Flash/mini-tier models.
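One simple way to express this is a per-tier multiplier applied only when the use case is real-time; the tier names and multiplier values below are illustrative assumptions:

```python
# Assumed speed tiers and multipliers: fast tiers boosted, large reasoning
# models (Opus / GPT-5-full class) down-weighted for real-time use.
TIER_LATENCY_MULTIPLIER = {
    "flash_mini": 1.10,
    "standard": 1.00,
    "large_reasoning": 0.80,
}

def latency_adjusted(score: float, tier: str, realtime: bool) -> float:
    """Apply the latency multiplier only for real-time use cases."""
    if not realtime:
        return score
    return score * TIER_LATENCY_MULTIPLIER[tier]
```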
Volume adjustment
High-volume production penalizes models without robust API tiers or with high per-token cost, and boosts API-first models with batch discounts.
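Putting the volume rules into code might look like this; the point values and the $15/MTok price threshold are invented for illustration, while the three conditions (robust API tier, per-token cost, batch discounts) come from the text:

```python
def volume_adjustment(high_volume: bool, has_api_tier: bool,
                      has_batch_discount: bool, price_per_mtok: float) -> float:
    """Penalize models ill-suited to high-volume production; boost API-first ones."""
    if not high_volume:
        return 0.0
    delta = 0.0
    if not has_api_tier:
        delta -= 15.0  # no robust API tier (assumed penalty size)
    if price_per_mtok >= 15.0:
        delta -= 10.0  # high per-token cost (assumed threshold)
    if has_api_tier and has_batch_discount:
        delta += 10.0  # API-first model with batch pricing
    return delta
```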