Client-side token counter vs API-reported counts: which should you trust?

There are two ways to find out how many tokens a prompt uses: count them before sending with a client-side tokenizer, or read the usage field in the API response after the request completes. The client-side count covers input tokens only and is available before you send. The API response is the ground truth and reports both input and output tokens. When you use the same tokenizer as the API, the two input counts should match for the same text. Chat-style requests can add a few tokens of message formatting on top of the raw text, so compare like with like.

How this is calculated

For OpenAI models, the tiktoken library or the gpt-tokenizer npm package gives exact pre-request counts that match the API response. For Llama models, llama3-tokenizer-js gives exact counts. For Claude and Gemini, the production tokenizers are not available to run client-side, so any pre-request count is an estimate. The API response always includes the exact count.

For cost estimation, use client-side tools to budget before you send. For billing and monitoring, use the values in the API response. If a client-side count disagrees with the API response for an OpenAI model, check that you are using the correct encoding for that model (o200k_base for GPT-5 and GPT-4o, cl100k_base for older models).
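
As a minimal sketch of the comparison described above, the Python below counts raw text locally with tiktoken and summarizes how far that count is from the API-reported input count. The function names `count_local` and `compare_counts` are hypothetical, and `count_local` assumes the third-party tiktoken package is installed; it counts plain text only and does not model any per-message formatting a chat API may add.

```python
def count_local(text: str, encoding_name: str = "o200k_base") -> int:
    """Count tokens in raw text with tiktoken (third-party: pip install tiktoken)."""
    import tiktoken  # imported lazily so compare_counts works without it

    return len(tiktoken.get_encoding(encoding_name).encode(text))


def compare_counts(local_count: int, api_count: int) -> dict:
    """Summarize how far a local count is from the API-reported input count."""
    if api_count <= 0:
        raise ValueError("api_count must be a positive integer")
    diff = local_count - api_count
    return {
        "difference": diff,                        # negative: local undercounts
        "relative_error": abs(diff) / api_count,   # fraction of the API count
        "exact_match": diff == 0,
    }
```

After a request completes, pass the input-token value from the response's usage field as `api_count`. A nonzero difference for an OpenAI model is a hint to recheck the encoding name.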

Verdict

Use the client-side token counter for pre-request planning and prompt optimization. Trust the API response for billing and production monitoring. For OpenAI and Llama, they should match. For Claude and Gemini, treat the client-side count as an estimate within roughly 10%.
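
One way to act on this verdict when planning a request is to pad estimated counts before checking them against a limit. The sketch below is illustrative only: `worst_case_tokens` and `fits_budget` are hypothetical helpers, and the default 10% margin simply mirrors the rough figure quoted above rather than any guaranteed bound.

```python
def worst_case_tokens(count: int, exact: bool, margin_percent: int = 10) -> int:
    """Return the count to plan with: unchanged if exact, padded if estimated."""
    if exact:
        return count
    # Integer ceiling of count * margin_percent / 100, avoiding float rounding.
    padding = (count * margin_percent + 99) // 100
    return count + padding


def fits_budget(count: int, budget: int, exact: bool, margin_percent: int = 10) -> bool:
    """Check whether a prompt fits a token budget, allowing for estimate error."""
    return worst_case_tokens(count, exact, margin_percent) <= budget
```

With these helpers, an exact count of 950 fits a 1,000-token budget, while an estimated count of 950 is treated as up to 1,045 tokens and does not.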

Frequently asked questions

What is a token in an LLM?
A token is a chunk of text that a language model reads as a single unit. A token may be a short common word, part of a longer word, or a punctuation mark, so it does not always correspond to a whole word or a single character. As a rough rule of thumb, one token is about four characters of English text, and 100 tokens is roughly 75 words.
How accurate is this token counter?
For OpenAI and Llama models the count is exact, because it uses the same Byte Pair Encoding tokenizers those models ship (o200k_base for GPT-5 and GPT-4o, cl100k_base for GPT-4 and GPT-3.5, and the Llama 3 tokenizer for Llama 3 and 4). For Claude and Gemini the count is a labelled estimate, since those tokenizers are not publicly available to run in the browser. Estimates are typically within about 10% of the real value.
Why do different models report different token counts?
Each model family is trained with its own tokenizer and vocabulary, so the same sentence can split into a different number of tokens depending on the model. Newer vocabularies like OpenAI's o200k_base are generally more efficient, packing more characters into each token, which lowers the count compared to older tokenizers.
Is my text sent to a server?
No. All tokenization and counting happens locally in your browser using a tokenizer that loads on the page. Nothing you type or paste is uploaded, logged, or stored, which makes the tool safe to use for private prompts and confidential text.
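
The rule of thumb from the first answer above (about four characters per token, and 100 tokens to roughly 75 words) can be written as a quick estimator for cases where no tokenizer is available. This is only a sketch for English text with hypothetical function names; it is not a tokenizer and will drift for code, non-English text, or unusual formatting.

```python
def estimate_tokens(text: str) -> int:
    """Estimate tokens as characters / 4, rounded up (English-text heuristic)."""
    return (len(text) + 3) // 4


def estimate_words(tokens: int) -> int:
    """Estimate words as 75 per 100 tokens, rounded down."""
    return tokens * 3 // 4
```

For example, 400 characters of English text come out at about 100 tokens, or roughly 75 words.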