Token efficiency: how to say more with fewer tokens and cut your LLM bill
Token efficiency is about maximizing the intelligence you get per token you spend. Wasted tokens are wasted money. A verbose system prompt with 500 tokens of polite filler costs the same as 500 tokens of detailed instructions, and the latter gets better results. The most common token wastes: redundant instructions repeated across messages, politeness padding, examples that are longer than necessary, and including entire documents when only a section is relevant.
How this is calculated
Concrete efficiency tactics: use imperative mood ('Return a JSON array of...') instead of polite requests ('Could you please return...'), which saves 5-10 tokens per prompt with no quality loss. Put static instructions in the system prompt where they can be cached. Use the shortest example that demonstrates the pattern. For structured output, use the API's native structured output or JSON mode instead of describing the format in the prompt. Trim trailing whitespace, which some tokenizers count as tokens. For multi-turn conversations, prune the history aggressively: models rarely need more than the last 5-10 exchanges for context.
Verdict
Token efficiency compounds. Saving 50 tokens per request across 10,000 daily requests saves 500,000 tokens per day, which is real money at scale. Write for the model the way you'd write for a busy colleague: clear, direct, no fluff.
More Tokens scenarios
Frequently asked questions
What is a token in an LLM?
How accurate is this token counter?
Why do different models report different token counts?
Is my text sent to a server?
Related tools
LLM API Pricing Calculator
Compare API costs across major models (OpenAI, Anthropic, Google) with prompt caching.
Use tool ➜LLM VRAM Calculator
Calculate the VRAM needed to run or fine-tune any LLM at any quantization.
Use tool ➜JSON Formatter
Validate, format, and minify JSON data with syntax highlighting.
Use tool ➜