LLM API Pricing & Cost Calculator

Compare API costs across OpenAI, Anthropic, Google, and DeepSeek, and model the effect of caching discounts and batch requests.

Request Parameters

Input Tokens (per request)

Not sure how many tokens your prompt contains? Count them with the Token Counter.

Output Tokens (per request)

Number of requests

Prompt Caching (%): 0% cached

Batch API Mode (50% off)

Add Models

Cost Comparison

Model	Estimated cost
Claude Haiku 4.5	$12.50
Gemini 3.5 Flash	$19.50
Gemini 3.1 Pro (<=200k)	$26.00
GPT-5.4	$32.50
Claude Sonnet 4.6	$37.50
Claude Opus 4.8	$62.50
GPT-5.5	$65.00

Model Specs (per 1M tokens)

Model	Input	Cached Input	Output
Claude Haiku 4.5	$1.00	$0.10	$5.00
Claude Opus 4.8	$5.00	$0.50	$25.00
Claude Sonnet 4.6	$3.00	$0.30	$15.00
Gemini 3.1 Pro (<=200k)	$2.00	$0.20	$12.00
Gemini 3.5 Flash	$1.50	$0.38	$9.00
GPT-5.4	$2.50	$0.25	$15.00
GPT-5.5	$5.00	$0.50	$30.00
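
The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual code: `PRICES` and `total_cost` are hypothetical names, and the workload of 10M input tokens and 0.5M output tokens is an assumption inferred from the Cost Comparison figures (it is the volume that reproduces them), not a documented default.

```python
# Prices in USD per 1M tokens, copied from the Model Specs table.
PRICES = {
    "Claude Haiku 4.5": {"input": 1.00, "cached": 0.10, "output": 5.00},
    "Claude Opus 4.8": {"input": 5.00, "cached": 0.50, "output": 25.00},
    "Claude Sonnet 4.6": {"input": 3.00, "cached": 0.30, "output": 15.00},
    "Gemini 3.1 Pro (<=200k)": {"input": 2.00, "cached": 0.20, "output": 12.00},
    "Gemini 3.5 Flash": {"input": 1.50, "cached": 0.38, "output": 9.00},
    "GPT-5.4": {"input": 2.50, "cached": 0.25, "output": 15.00},
    "GPT-5.5": {"input": 5.00, "cached": 0.50, "output": 30.00},
}


def total_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost with no caching and no batch discount."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Assumed total workload (inferred, see the note above): 10M input, 0.5M output.
totals = {m: total_cost(m, 10_000_000, 500_000) for m in PRICES}

# Print the models from cheapest to most expensive.
for model, usd in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{model:<26} ${usd:.2f}")
```

Sorting by total cost gives the same ordering as the Cost Comparison list, with Claude Haiku 4.5 cheapest and GPT-5.5 most expensive for this workload.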

How to use this tool

Pick a model
Select a model from the dropdown (GPT-5.4, Claude Sonnet 4.6, Gemini 3.5 Flash, DeepSeek V4 Pro, and more). The calculator loads that model's per-million-token input and output pricing automatically.
Enter input and output token counts
Type the number of input tokens (the prompt) and output tokens (the response) you expect per request, or per month. Output tokens are billed at a higher rate than input tokens: 5x to 6x for the models in the Model Specs table.
Toggle caching and batch options
Turn on prompt caching to model the discount for repeated system prompts, or toggle Batch API mode for the flat 50% async discount. For agentic loops, toggle the agentic option to see how caching flips the economics.
Read the cost estimate
The calculator returns the total cost for your token volume, broken down by input and output. Compare multiple models side by side to find the cheapest option that meets your latency and quality needs.

About this tool

The LLM API Pricing Calculator helps developers and startups estimate the API costs of integrating major AI models like GPT-5.4, Claude Sonnet 4.6, or DeepSeek V4. You can tweak parameters such as input/output token counts, prompt caching discounts, and batch processing to see how your monthly bill changes.

Unlike static pricing tables, this calculator models the compounding effect of multi-turn agentic workflows. By toggling 'Agentic Loop', you can see how Anthropic and Google's aggressive caching discounts flip the economics of running autonomous agents.

Prompt and Context Caching

Caching is the single most important variable in modern AI economics. If you send the same 100K-token system prompt repeatedly, caching means you pay full price for it only once, and a reduced rate (10% to 50% of the normal input price, depending on the provider) for subsequent cache hits. The calculator lets you estimate what percentage of your input tokens will be cached.
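
As a rough illustration of how a cached percentage changes the input bill, here is a minimal sketch. The function name `input_cost` is hypothetical, the prices are the Claude Sonnet 4.6 row from the Model Specs table, and the model is deliberately simplified: it assumes every cached token is a cache hit and ignores any cache-write surcharge or cache expiry a provider may apply.

```python
def input_cost(input_tokens: int, cached_fraction: float,
               input_price: float, cached_price: float) -> float:
    """USD for input tokens when `cached_fraction` of them are cache hits.

    Prices are in USD per 1M tokens. Simplified: no cache-write fee, no expiry.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * input_price + cached_tokens * cached_price) / 1_000_000


# Claude Sonnet 4.6: $3.00 input, $0.30 cached input (from the Model Specs table).
uncached = input_cost(10_000_000, 0.0, 3.00, 0.30)
mostly_cached = input_cost(10_000_000, 0.8, 3.00, 0.30)
print(f"0% cached:  ${uncached:.2f}")
print(f"80% cached: ${mostly_cached:.2f}")
```

With 80% of a 10M-token input served from cache, the input cost in this sketch falls from $30.00 to about $8.40.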

Batch API Mode

If your application does not require real-time responses, you can route requests through a Batch API. Most major platforms offer a 50% discount in exchange for a turnaround window of up to 24 hours.
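
The batch discount itself is a single multiplication. The sketch below uses hypothetical names (`BATCH_DISCOUNT`, `with_batch`) and assumes the flat 50% rate quoted above; how the discount combines with cached-input pricing varies by provider and is not modeled here.

```python
BATCH_DISCOUNT = 0.50  # flat rate quoted in the text; confirm with each provider


def with_batch(cost_usd: float, batch: bool, discount: float = BATCH_DISCOUNT) -> float:
    """Return the cost after the batch discount, or unchanged for real-time requests."""
    return cost_usd * (1.0 - discount) if batch else cost_usd


# GPT-5.5 total from the Cost Comparison list: $65.00.
realtime = with_batch(65.00, batch=False)
batched = with_batch(65.00, batch=True)
print(f"real-time: ${realtime:.2f}, batch: ${batched:.2f}")
```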

Standard Chat vs Agentic Workflows

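A standard chat prompt has a fairly predictable input-to-output ratio of about 3:1. Autonomous agents are different because they accumulate context: on each turn they append their previous thought process to the prompt, so the input grows with every turn. Because the whole history is re-sent each time, the total billed input grows roughly quadratically with the number of turns, not linearly. Caching prevents these loops from destroying your API budget.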
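
To make the growth concrete, the following sketch totals the input cost of an agent loop with and without caching. Everything about the workload is an assumption chosen for illustration: a 10,000-token starting prompt, 2,000 new tokens appended per turn, 50 turns, and the Claude Sonnet 4.6 prices from the Model Specs table. The function `agent_input_cost` is a hypothetical name, output tokens are left out, and cache-write fees and cache expiry are ignored.

```python
def agent_input_cost(turns: int, base_tokens: int, tokens_per_turn: int,
                     input_price: float, cached_price: float,
                     use_cache: bool) -> float:
    """Total input cost in USD over an agent loop (prices per 1M tokens).

    Each turn re-sends the full context. With caching, only the tokens added
    since the previous turn are billed at full price; the rest are cache hits.
    """
    total = 0.0
    context = base_tokens
    for turn in range(1, turns + 1):
        if use_cache and turn > 1:
            fresh = tokens_per_turn      # only the newly appended tokens
        else:
            fresh = context              # everything is billed at full price
        cached = context - fresh
        total += (fresh * input_price + cached * cached_price) / 1_000_000
        context += tokens_per_turn       # the history grows for the next turn
    return total


no_cache = agent_input_cost(50, 10_000, 2_000, 3.00, 0.30, use_cache=False)
with_cache = agent_input_cost(50, 10_000, 2_000, 3.00, 0.30, use_cache=True)
print(f"without caching: ${no_cache:.2f}")
print(f"with caching:    ${with_cache:.2f}")
```

Under these assumptions the loop sends 2.95M input tokens in total, of which only 108,000 are new, so the input bill in this sketch drops from $8.85 to about $1.18.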

Popular model pricing

Pre-computed API pricing calculators for the most heavily debated AI models.

GPT-5.5 Pricing

Calculate OpenAI's flagship model API costs.

GPT-5.4 Pricing

Estimate costs for standard GPT-5.4 workloads.

Claude Opus 4.8 Pricing

Calculate costs for Anthropic's latest flagship reasoning model.

API pricing comparisons

Head-to-head cost breakdowns of the models this calculator prices, such as GPT-5.4 vs Claude Sonnet 4.6.

GPT-5.4 vs Claude Sonnet 4.6

The workhorse model pricing showdown.

Claude Opus 4.7 vs DeepSeek V4 Pro

Frontier reasoning versus optimized price-performance.

DeepSeek V4 Pro vs Mistral Large 3

Serverless pricing versus flagship open weights.

Frequently asked questions

Does OpenAI support prompt caching?

Yes. OpenAI supports prompt caching natively on newer models such as GPT-5.4 and GPT-4o-mini. The discount mechanics differ from Anthropic's and Google's, so you may save slightly less on long iterative conversations than you would with Claude.

What is Batch API and when should I use it?

Batch APIs let you submit a large volume of requests that are processed asynchronously, with results returned within 24 hours. Most providers (OpenAI, Anthropic, DeepSeek, Google) offer a flat 50% discount for using their batch API. It is a good fit for non-urgent tasks like daily data processing, embedding generation, or bulk tagging.

Why is prompt caching so important for Agentic workflows?

In an agentic loop (like an AI trying to write code or browse the web), the model iteratively adds new responses to the same growing context window. Without caching, you pay for the entire history on every single turn, so the total input cost grows roughly quadratically with the number of turns. With prompt caching, the repeated history is billed at the discounted cached rate, which drastically lowers the cost of agents.

Are input and output tokens billed differently?

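Yes. Across almost all major providers, output tokens (generation) are significantly more expensive than input tokens (reading). The ratio is typically several times higher; for the models in the Model Specs table it is 5x to 6x. Output tokens are generated one at a time, each requiring its own pass through the model, whereas input tokens are processed in parallel, so generation consumes more compute per token.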
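
For the models this calculator lists, the output-to-input ratio can be read straight off the Model Specs table. The sketch below is illustrative only; `SPECS` is a hypothetical name holding the table's input and output prices in USD per 1M tokens.

```python
# (input price, output price) in USD per 1M tokens, from the Model Specs table.
SPECS = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 3.1 Pro (<=200k)": (2.00, 12.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.5": (5.00, 30.00),
}

# Output price divided by input price for each model.
ratios = {model: out / inp for model, (inp, out) in SPECS.items()}

for model, ratio in ratios.items():
    print(f"{model:<26} output is {ratio:.1f}x input")
```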

Related tools

LLM Token Counter

Count tokens in any prompt for GPT, Claude, Gemini, and Llama with exact OpenAI tokenization.

LLM VRAM Calculator

Calculate the VRAM needed to run or fine-tune any LLM at any quantization.

Power Cost Estimator

Estimate annual electricity costs for your PC, Server, or TV.

Data Transfer Calculator

Estimate transfer times for files over USB, WiFi, Ethernet, and more.
