Paste any text — instantly see token counts and USD cost for Claude, GPT-5, GPT-4o, and Gemini. Client-side only. No data sent anywhere.
AI models split text into sub-word pieces called tokens — not whole words. A token is roughly 4 characters of English prose, but it varies by model. Short common words are usually one token; long or rare words get split into two or three.
"tokenization" → ["token", "ization"] = 2 tokens
"the" → ["the"] = 1 token
"unbelievable" → ["un", "believ", "able"] = 3 tokens

GPT-4o and GPT-5 use OpenAI's o200k_base BPE tokenizer. Claude uses Anthropic's own BPE variant, which is slightly more efficient on English text (~3.5 chars per token vs ~4). Gemini uses Google's SentencePiece. Code, JSON, and non-English text tokenize differently — often less efficiently.
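To make the splitting behavior concrete, here is a toy greedy longest-match subword tokenizer. The vocabulary is hypothetical and hand-picked for the examples above; real BPE tokenizers learn tens of thousands of merges from data, so this is an illustration of the idea, not any production tokenizer.

```python
# Hypothetical mini-vocabulary chosen to reproduce the examples above.
VOCAB = {"the", "token", "ization", "un", "believ", "able"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest known vocabulary pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest substring first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown character: emit it on its own
            i += 1
    return pieces

print(tokenize("the"))           # ['the']
print(tokenize("tokenization"))  # ['token', 'ization']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Common words match the vocabulary whole, so they cost one token; rarer words fall back to smaller pieces, which is exactly why long words cost two or three.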
English prose: ~4 chars / token (GPT), ~3.5 chars / token (Claude)
Code / JSON: ~3 chars / token (more splits)
CJK characters: ~1–2 chars / token

API pricing is per million tokens, billed separately for input and output. Context windows (the maximum number of tokens a model can process at once) range from 400K for GPT-5 to 1M for Gemini 2.5 Pro. Exceeding the context limit causes an API error — so knowing your token count before calling the API prevents wasted spend.
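The rules of thumb above can be turned into a back-of-envelope estimator. The chars-per-token ratios and the per-million pricing formula come from this page; everything else (function names, the ratio table) is a sketch, and real counts vary by tokenizer.

```python
# Rough chars-per-token ratios from the rules of thumb above.
CHARS_PER_TOKEN = {"gpt_prose": 4.0, "claude_prose": 3.5, "code": 3.0}

def estimate_tokens(text: str, kind: str = "gpt_prose") -> int:
    """Approximate token count from character count. Real counts vary."""
    return round(len(text) / CHARS_PER_TOKEN[kind])

def input_cost_usd(tokens: int, price_per_million: float) -> float:
    """API input cost: providers bill per million tokens."""
    return tokens / 1_000_000 * price_per_million

# A 100K-token document at Gemini 2.5 Flash's $0.075 / 1M input rate:
print(input_cost_usd(100_000, 0.075))  # 0.0075
# The same document at a $15.00 / 1M input rate:
print(input_cost_usd(100_000, 15.0))   # 1.5
```

This is exactly the arithmetic behind the price comparison below: cost scales linearly with both token count and the per-million rate.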
Claude Opus 4.6 input: $15.00 / 1M tokens
GPT-4o input: $2.50 / 1M tokens
Gemini 2.5 Flash input: $0.075 / 1M tokens
→ the same 100K-token doc costs anywhere from $0.0075 to $1.50

For exact token counts, use the official tools — the OpenAI Tokenizer and the Anthropic model docs — or count programmatically (tiktoken for OpenAI, the anthropic SDK's count_tokens() for Claude).