How to reduce token usage: practical prompt optimization that saves money
Prompt optimization is the practice of getting the same or better results from an LLM with fewer input tokens. Since most API pricing charges per input token, trimming your prompts directly reduces your bill. A 30% reduction in token usage means a 30% reduction in cost, with no change to the model or provider.
How this is calculated
Effective optimization techniques: remove politeness padding ('please', 'could you', 'I would like you to'), which can add 10-15% to token counts. Use system prompts for persistent instructions rather than repeating them in every user message. Compress few-shot examples to the minimum that demonstrates the pattern. Use the prompt cache feature (available on OpenAI, Anthropic, and DeepSeek) for static content like system prompts and few-shot examples; cached tokens cost 50-90% less. For structured output, use the model's native JSON mode or structured output feature rather than lengthy formatting instructions in the prompt. For multi-turn conversations, prune conversation history aggressively.
Verdict
Prompt optimization is the fastest way to cut LLM costs. Start by removing politeness language and moving instructions to the system prompt. Then enable prompt caching. These two changes alone typically reduce costs by 30% or more.
More Tokens scenarios
Related guides
Frequently asked questions
What is a token in an LLM?
How accurate is this token counter?
Why do different models report different token counts?
Is my text sent to a server?
Related tools
LLM API Pricing Calculator
Compare API costs across major models (OpenAI, Anthropic, Google) with prompt caching.
Use tool ➜LLM VRAM Calculator
Calculate the VRAM needed to run or fine-tune any LLM at any quantization.
Use tool ➜JSON Formatter
Validate, format, and minify JSON data with syntax highlighting.
Use tool ➜