How much do tokens cost? LLM pricing per million tokens compared across providers

LLM API pricing is quoted per million tokens, split between input tokens (what you send) and output tokens (what the model generates). Output tokens typically cost 3-5x as much as input tokens, because the model generates output one token at a time, while it can process the whole input in parallel. The cheapest models (DeepSeek V4 Flash, GPT-5.4 Nano) cost under $0.50 per million input tokens. The most expensive (Claude Opus 4.7) costs $15 per million input tokens.

How this is calculated

Cost per request = (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price, where both prices are in dollars per million tokens. A single request with a 500-token prompt and a 200-token response on GPT-5.4 costs roughly $0.003. A high-volume application that processes 1 million requests per day, each with 1,000 input tokens and 500 output tokens, costs about $7,500 per day on GPT-5.4.

Prompt caching sharply reduces the cost of repeated content: cached input tokens on GPT-5.4 cost $0.62 per million (a 75% discount). Batch mode (24-hour turnaround) gives a 50% discount. Check whether your use case qualifies for caching or batch pricing before scaling.
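The formula and discounts above can be sketched in a few lines of Python. This is a minimal illustration, not a billing tool. The GPT-5.4 prices of $2.50 input and $10.00 output per million tokens are assumptions inferred from the worked figures in this section, and the names `Pricing` and `request_cost` are hypothetical. The sketch also assumes the batch discount applies to the whole request, which may not match how a given provider combines discounts.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens (illustrative assumptions)."""
    input_price: float
    output_price: float
    cached_input_price: float
    batch_discount: float = 0.5  # 50% off in batch mode, per the text above


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0, batch: bool = False) -> float:
    """Cost in USD of one request; cached_tokens is the cached part of the input."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens / TOKENS_PER_MILLION * pricing.input_price
        + cached_tokens / TOKENS_PER_MILLION * pricing.cached_input_price
        + output_tokens / TOKENS_PER_MILLION * pricing.output_price
    )
    if batch:
        # Assumption: the batch discount applies to the full request cost.
        cost *= 1 - pricing.batch_discount
    return cost


# Assumed prices, inferred from the examples in this section.
GPT_5_4 = Pricing(input_price=2.50, output_price=10.00, cached_input_price=0.62)

single = request_cost(GPT_5_4, input_tokens=500, output_tokens=200)
daily = 1_000_000 * request_cost(GPT_5_4, input_tokens=1_000, output_tokens=500)
daily_batch = 1_000_000 * request_cost(GPT_5_4, 1_000, 500, batch=True)

print(f"single request: ${single:.5f}")
print(f"1M requests per day: ${daily:,.0f}")
print(f"1M requests per day, batch mode: ${daily_batch:,.0f}")
```

Under these assumed prices the single request comes to about $0.003 and the daily workload to about $7,500, matching the figures above. Passing `cached_tokens` shows how much a shared prompt prefix saves.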

Verdict

Token costs are small per request but add up fast at scale. Use prompt caching for repeated content, batch mode for non-urgent processing, and cheaper models for tasks that don't need frontier intelligence. The built-in LLM Pricing Calculator on this site estimates costs across models and usage patterns.

Frequently asked questions

What is a token in an LLM?
A token is a chunk of text that a language model reads as a single unit. It may be a common short word, part of a longer word, or a punctuation mark, so tokens do not map one-to-one onto either words or characters. As a rough rule of thumb, one token is about four characters of English text, and 100 tokens is roughly 75 words.
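The rule of thumb above can be turned into a quick estimator for when no tokenizer is at hand. This is a rough sketch for English text only. The function names are hypothetical, and the result is an approximation, not a count from a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4     # rule of thumb: about four characters per token
WORDS_PER_TOKEN = 0.75  # rule of thumb: 100 tokens is roughly 75 words


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate the token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Estimate the token count from whitespace-separated words, rounded up."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "Token costs are small per request but add up fast at scale."
print(estimate_tokens_from_chars(sample), estimate_tokens_from_words(sample))
```

The two estimates usually differ by a few tokens. Either is good enough for budgeting, but use a real tokenizer when the exact count matters.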
How accurate is this token counter?
For OpenAI and Llama models the count is exact, because it uses the same Byte Pair Encoding tokenizers those models ship (o200k_base for GPT-5 and GPT-4o, cl100k_base for GPT-4 and GPT-3.5, and the Llama 3 tokenizer for Llama 3 and 4). For Claude and Gemini the count is a labelled estimate, since those tokenizers are not publicly available to run in the browser. Estimates are typically within about 10% of the real value.
Why do different models report different token counts?
Each model family is trained with its own tokenizer and vocabulary, so the same sentence can split into a different number of tokens depending on the model. Newer vocabularies like OpenAI's o200k_base are generally more efficient, packing more characters into each token, which lowers the count compared to older tokenizers.
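A toy example can show why vocabulary matters. The sketch below uses a simple greedy longest-match splitter with two made-up vocabularies. It is not Byte Pair Encoding and does not reproduce any real tokenizer. It only illustrates that a vocabulary with longer entries splits the same text into fewer tokens.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by taking the longest vocabulary match at each position.

    Single characters are always accepted as a fallback, so any text can be split.
    """
    max_len = max(len(entry) for entry in vocab)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Two hypothetical vocabularies: the larger one adds longer entries.
SMALL_VOCAB = {"tok", "en", "iz", "ation", " co", "st", "s"}
LARGE_VOCAB = SMALL_VOCAB | {"token", "ization", " costs"}

text = "tokenization costs"
print(greedy_tokenize(text, SMALL_VOCAB))  # 7 tokens
print(greedy_tokenize(text, LARGE_VOCAB))  # 3 tokens
```

With the smaller vocabulary the phrase splits into seven pieces, and with the larger one into three, even though the text is identical.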
Is my text sent to a server?
No. All tokenization and counting happens locally in your browser using a tokenizer that loads on the page. Nothing you type or paste is uploaded, logged, or stored, which makes the tool safe to use for private prompts and confidential text.