Gemini 3.5 Flash vs GPT-5.4 Mini: which utility model is more economical?
A pricing comparison of two lightweight multimodal models.
Fast, lightweight models are the backbone of real-time applications. Gemini 3.5 Flash and GPT-5.4 Mini both target low-latency workloads at competitive per-token prices.
Cost comparison
Based on 100,000 input tokens (50% cached), 5,000 output tokens, and 100 requests.
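A minimal Python sketch of the scenario's cost arithmetic, using the per-million rates from the spec table. It assumes the token counts apply to each of the 100 requests (the scenario line does not say whether they are per request or totals). `RATES` and `scenario_cost` are illustrative names, not part of any provider SDK.

```python
# USD per 1M tokens, copied from the spec table.
RATES = {
    "Gemini 3.5 Flash": {"input": 0.50, "cached": 0.25, "output": 3.00},
    "GPT-5.4 Mini": {"input": 0.75, "cached": 0.075, "output": 4.50},
}


def scenario_cost(rates, input_tokens=100_000, cached_fraction=0.5,
                  output_tokens=5_000, requests=100):
    """Total USD cost, ASSUMING the token counts are per request."""
    cached = input_tokens * cached_fraction   # tokens billed at the cached rate
    uncached = input_tokens - cached          # tokens billed at the full input rate
    per_request = (uncached * rates["input"]
                   + cached * rates["cached"]
                   + output_tokens * rates["output"]) / 1_000_000
    return per_request * requests


for model, rates in RATES.items():
    print(f"{model}: ${scenario_cost(rates):.4f}")
# Gemini 3.5 Flash: $5.2500
# GPT-5.4 Mini: $6.3750
```

Under that assumption, Gemini 3.5 Flash costs $5.25 and GPT-5.4 Mini about $6.38, so Flash is roughly 18% cheaper. If the counts are totals rather than per request, both figures shrink by a factor of 100 but the ratio stays the same.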
Side-by-side specs
| Spec | Gemini 3.5 Flash | GPT-5.4 Mini |
|---|---|---|
| Input (per 1M tokens) | $0.50 (cheaper) | $0.75 |
| Output (per 1M tokens) | $3.00 (cheaper) | $4.50 |
| Cached input (per 1M tokens) | $0.25 (50% off input) | $0.075 (90% off input, cheaper) |
| Batch discount | 50% | 50% |
How they differ
Gemini 3.5 Flash charges $0.50 per million input tokens and $3.00 per million output tokens, and bills cached input at a 50% discount ($0.25 per million). GPT-5.4 Mini charges 50% more on both base rates ($0.75 input, $4.50 output) but discounts cached input by 90%, to $0.075 per million.
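To see how the cache discount changes the effective input price, let c be the fraction of input tokens served from cache (a symbol introduced here for illustration). The blended input rate per million tokens for each model, using the table's prices, is:

$$
\begin{aligned}
p_{\text{Flash}}(c) &= 0.50\,(1-c) + 0.25\,c = 0.50 - 0.25\,c \\
p_{\text{Mini}}(c) &= 0.75\,(1-c) + 0.075\,c = 0.75 - 0.675\,c \\
p_{\text{Flash}}(0.5) &= 0.375, \qquad p_{\text{Mini}}(0.5) = 0.4125
\end{aligned}
$$

GPT-5.4 Mini's blended rate falls 2.7 times faster as c grows (0.675 versus 0.25 per unit of c), but at the scenario's 50% cache rate Gemini's blended input rate is still the lower of the two.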
Verdict
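Gemini 3.5 Flash has the cheaper base rates for both input and output. GPT-5.4 Mini can still come out ahead when most input tokens are cache hits, because its 90% discount brings cached input down to $0.075 per million. The break-even cache-hit rate depends on how much output you generate: about 59% for input-dominated traffic, rising to about 76% at the 20:1 input-to-output ratio of the scenario above.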
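The break-even point can be computed directly. This sketch uses a hypothetical helper, `break_even_cache_fraction` (plain Python, no SDK), that equates the two models' cost per input token, counts output as a fixed output-to-input token ratio, and solves for the cache-hit fraction. It assumes the second model has the larger absolute cache discount, as GPT-5.4 Mini does here.

```python
def break_even_cache_fraction(a, b, output_to_input=0.0):
    """Return the cache-hit fraction above which model `b` is cheaper than `a`.

    Rates are USD per 1M tokens. For a model m at cache fraction c, the cost
    per input token (times 1M), with output billed at a fixed ratio, is:
        m["input"] * (1 - c) + m["cached"] * c + m["output"] * output_to_input
    Returns 0.0 if `b` is no more expensive even with nothing cached, or None
    if `b` never becomes cheaper for c in [0, 1].
    """
    # Extra cost of b over a when nothing is cached.
    base_gap = (b["input"] - a["input"]) + (b["output"] - a["output"]) * output_to_input
    # How much faster b's blended input price falls per unit of cache fraction.
    savings_gap = (b["input"] - b["cached"]) - (a["input"] - a["cached"])
    if savings_gap <= 0:
        raise ValueError("model b must have the larger absolute cache discount")
    c = base_gap / savings_gap
    if c > 1.0:
        return None  # even a 100% cache hit rate does not close the gap
    return max(c, 0.0)


gemini_flash = {"input": 0.50, "cached": 0.25, "output": 3.00}
gpt_mini = {"input": 0.75, "cached": 0.075, "output": 4.50}

print(f"{break_even_cache_fraction(gemini_flash, gpt_mini):.1%}")        # 58.8%
print(f"{break_even_cache_fraction(gemini_flash, gpt_mini, 0.05):.1%}")  # 76.5%
```

The output term matters because GPT-5.4 Mini's output rate is also 50% higher: more output pushes the break-even cache rate up, and once output exceeds roughly 12% of input tokens, GPT-5.4 Mini cannot catch up even with every input token cached.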
Which should you pick?
Choose Gemini 3.5 Flash
Multimodal tasks involving audio, video, or images, and everyday low-latency requests with little prompt reuse.
Choose GPT-5.4 Mini
Chat applications with highly repetitive, cached system prompts and long context history.
