LLM API Pricing & Cost Calculator

Compare API costs across OpenAI, Anthropic, Google, and DeepSeek, and model the effect of caching discounts and batch requests.

Request Parameters

Cached input: 0%
Batch API Mode (50% off)

Cost Comparison

Gemini 3.5 Flash: $6.50
Claude Haiku 4.5: $12.50
Gemini 3.1 Pro (<=200k): $26.00
GPT-5.4: $32.50
Claude Sonnet 4.6: $37.50
Claude Opus 4.7: $62.50
GPT-5.5: $65.00

Model Specs (per 1M tokens)

Model                      Input    Cached input   Output
Claude Haiku 4.5           $1.00    $0.10          $5.00
Claude Opus 4.7            $5.00    $0.50          $25.00
Claude Sonnet 4.6          $3.00    $0.30          $15.00
Gemini 3.1 Pro (<=200k)    $2.00    $0.20          $12.00
Gemini 3.5 Flash           $0.50    $0.25          $3.00
GPT-5.4                    $2.50    $0.25          $15.00
GPT-5.5                    $5.00    $0.50          $30.00
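The Cost Comparison totals above are consistent with this table for a request of 10M input tokens and 0.5M output tokens at full price with 0% caching. Halving 20M input and 1M output with the 50% batch discount gives the same numbers. These token counts are our inference, since the page does not display them. Below is a minimal Python sketch of the calculation. It assumes cost is linear in tokens and that batch mode halves every line item. The function name `request_cost` is hypothetical.

```python
# Per-1M-token prices in USD: (input, cached input, output), copied from the Model Specs table.
PRICES = {
    "Claude Haiku 4.5": (1.00, 0.10, 5.00),
    "Claude Opus 4.7": (5.00, 0.50, 25.00),
    "Claude Sonnet 4.6": (3.00, 0.30, 15.00),
    "Gemini 3.1 Pro (<=200k)": (2.00, 0.20, 12.00),
    "Gemini 3.5 Flash": (0.50, 0.25, 3.00),
    "GPT-5.4": (2.50, 0.25, 15.00),
    "GPT-5.5": (5.00, 0.50, 30.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimated USD cost (hypothetical helper).

    Assumes a share `cached_fraction` of input tokens is billed at the cached rate
    and that batch mode applies a flat 50% discount to the whole request.
    """
    inp, cached, out = PRICES[model]
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    cost = (fresh_tokens * inp + cached_tokens * cached + output_tokens * out) / 1_000_000
    return cost * 0.5 if batch else cost


if __name__ == "__main__":
    # Sort cheapest first, mirroring the Cost Comparison list.
    for name in sorted(PRICES, key=lambda m: request_cost(m, 10_000_000, 500_000)):
        print(f"{name}: ${request_cost(name, 10_000_000, 500_000):.2f}")
```

Running it reproduces the seven totals in the Cost Comparison list, in the same order.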

About this tool

The LLM API Pricing Calculator helps developers and startups estimate the API costs of integrating major AI models like GPT-5.4, Claude Sonnet 4.6, or DeepSeek V4. You can tweak parameters such as input/output token counts, prompt caching discounts, and batch processing to see how your monthly bill changes.

Unlike static pricing tables, this calculator models the compounding effect of multi-turn agentic workflows. By toggling 'Agentic Loop', you can see how Anthropic's and Google's aggressive caching discounts flip the economics of running autonomous agents.

Prompt and Context Caching

Caching is the single most important variable in modern AI economics. If you send the same 100K-token system prompt repeatedly, caching means you pay full price for it once and a reduced rate (10% to 50% of the base input price) for each subsequent cache hit. The calculator lets you set the percentage of your input tokens you expect to be served from cache.
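To make the 100K-token example concrete, here is a small sketch. It assumes a simplified cache: the first call pays the full input price, and every later call is a cache hit at the cached rate. Cache-write surcharges and cache expiry, which some providers apply, are not modeled. The prices are Claude Sonnet 4.6's from the table, and the helper name is hypothetical.

```python
def repeated_prompt_cost(prompt_tokens: int, calls: int,
                         input_price: float, cached_price: float) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for sending one prompt `calls` times.

    Prices are per 1M tokens. Simplified model: call 1 is a cache miss at full price,
    and calls 2..N are cache hits billed at `cached_price`.
    """
    if calls < 1:
        raise ValueError("calls must be at least 1")
    per_million = prompt_tokens / 1_000_000
    uncached = calls * per_million * input_price
    cached = per_million * input_price + (calls - 1) * per_million * cached_price
    return uncached, cached


# 100K-token system prompt sent 100 times at $3.00 input / $0.30 cached per 1M tokens.
uncached, cached = repeated_prompt_cost(100_000, 100, 3.00, 0.30)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

Under these assumptions, the system prompt costs about $3.27 across 100 calls instead of $30.00.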

Batch API Mode

If your application does not need real-time responses, you can route requests through a Batch API. Most major providers offer a 50% discount on batch requests in exchange for a turnaround window of up to 24 hours.
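The sketch below shows what the batch discount means for a monthly bill. It assumes a flat 50% discount on the whole request and a steady daily volume. The workload numbers are hypothetical: 10,000 requests per day, each with 3,000 input and 1,000 output tokens, priced at GPT-5.4's rates from the table.

```python
BATCH_DISCOUNT = 0.50  # assumed flat 50% off, as described above


def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 batch: bool = False, days: int = 30) -> float:
    """Projected USD spend for a steady workload; prices are per 1M tokens."""
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    total = per_request * requests_per_day * days
    return total * (1 - BATCH_DISCOUNT) if batch else total


# Hypothetical workload on GPT-5.4 ($2.50 input / $15.00 output per 1M tokens).
realtime = monthly_cost(10_000, 3_000, 1_000, 2.50, 15.00)
batched = monthly_cost(10_000, 3_000, 1_000, 2.50, 15.00, batch=True)
print(f"real-time: ${realtime:,.2f}/month, batch: ${batched:,.2f}/month")
```

For this workload the projection drops from $6,750 to $3,375 per month. The tradeoff is that results may arrive up to a day later.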

Standard Chat vs Agentic Workflows

A standard chat request typically has a roughly 3:1 input-to-output ratio. Autonomous agents, by contrast, accumulate context: on each turn they add their previous thought process to the prompt, so the prompt grows with every turn and the total input billed across a loop grows roughly quadratically with the number of turns. Caching prevents these loops from destroying your API budget.
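Here is a simplified model of that growth. Suppose a loop starts with a system prompt of S tokens and appends T tokens of history per turn. Without caching, n turns bill n·S + T·n(n−1)/2 input tokens in total, which is quadratic in n. The sketch below assumes that, with caching on, every token already sent on the previous turn is a cache hit, so only the newly appended tokens pay full price. The function name, the workload numbers, and the caching behavior are all hypothetical. The prices are Claude Sonnet 4.6's from the table.

```python
def agent_loop_cost(turns: int, system_tokens: int, tokens_per_turn: int,
                    output_tokens: int, input_price: float, cached_price: float,
                    output_price: float, caching: bool = True) -> float:
    """Total USD cost of a simplified agent loop; prices are per 1M tokens.

    Turn k (0-based) sends system_tokens + k * tokens_per_turn input tokens.
    With caching, the previous turn's full context is assumed to be a cache hit.
    """
    total = 0.0
    previous_context = 0  # tokens sent on the previous turn
    for turn in range(turns):
        context = system_tokens + turn * tokens_per_turn
        cached = previous_context if caching else 0
        fresh = context - cached  # only newly appended tokens pay full price
        total += (fresh * input_price + cached * cached_price
                  + output_tokens * output_price) / 1_000_000
        previous_context = context
    return total


# Hypothetical workload: 10K-token system prompt, 2K new tokens and 500 output tokens per turn.
for turns in (10, 50, 100):
    without_cache = agent_loop_cost(turns, 10_000, 2_000, 500, 3.00, 0.30, 15.00, caching=False)
    with_cache = agent_loop_cost(turns, 10_000, 2_000, 500, 3.00, 0.30, 15.00)
    print(f"{turns:>3} turns: ${without_cache:.2f} uncached vs ${with_cache:.2f} cached")
```

In this model, a 50-turn loop costs about $9.23 without caching and about $1.55 with it. The gap widens as the loop gets longer.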

Frequently asked questions

Does OpenAI support prompt caching?
Yes. OpenAI supports prompt caching natively on current models such as GPT-5.4, as well as on earlier ones such as GPT-4o-mini. However, its discount mechanics differ from Anthropic's and Google's, so you may save slightly less on long iterative conversations than you would with Claude.
What is Batch API and when should I use it?
Batch APIs let you submit a large volume of requests that are processed asynchronously within a 24-hour window. Most providers (OpenAI, Anthropic, DeepSeek, Google) offer a flat 50% discount for using their batch API. It suits non-urgent tasks like daily data processing, embedding generation, or bulk tagging.
Why is prompt caching so important for Agentic workflows?
In an agentic loop (for example, an AI writing code or browsing the web), the model appends each new response to the same growing context window. Without caching, you pay full price for the entire history on every turn, so cumulative input cost grows roughly quadratically with the number of turns. Prompt caching bills the repeated history at the much lower cached rate, drastically lowering the cost of running agents.
Are input and output tokens billed differently?
Yes. Across almost all major providers, output tokens (generation) are significantly more expensive than input tokens (reading); for the models in the table above, output costs 5x to 6x the input price. Output tokens are generated one at a time, each requiring its own forward pass through the model, while input tokens are processed in parallel, so each output token consumes more GPU time.
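The output-to-input price ratio for each model in the Model Specs table can be checked directly. The short sketch below uses list prices copied from that table. The helper name is hypothetical.

```python
# (input, output) USD per 1M tokens, copied from the Model Specs table.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro (<=200k)": (2.00, 12.00),
    "Gemini 3.5 Flash": (0.50, 3.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
}


def output_to_input_ratios(prices: dict[str, tuple[float, float]]) -> dict[str, float]:
    """How many times more one output token costs than one input token, per model."""
    return {model: out / inp for model, (inp, out) in prices.items()}


for model, ratio in sorted(output_to_input_ratios(PRICES).items(), key=lambda kv: kv[1]):
    print(f"{model}: output costs {ratio:g}x input")
```

The Claude models come out at 5x, and the Gemini and GPT models listed come out at 6x.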