How much VRAM does Qwen 2.5 72B need at Q4_K_M? Long-context inference

Qwen 2.5 72B at Q4_K_M with the model's native 128K context needs about 91.6 GB of VRAM. At that length the KV cache (42.9 GB) is larger than the weights (40.3 GB).

Total VRAM required: 91.6 GB (Qwen 2.5 72B at Q4_K_M)
Weights: 40.3 GB (72B params)
KV cache: 42.9 GB (128K tokens, FP16 KV)

Calculator

Estimated VRAM required: 91.6 GB

72B params at Q4_K_M, 131,072 token context, batch 1, inference.

Weights: 40.3 GB
KV cache: 42.9 GB
Overhead: 8.3 GB

Estimate accuracy: Weights within ~2%. KV cache within ~5% for standard GQA models, ~10% for MLA (DeepSeek). Real VRAM may vary with framework (vLLM vs llama.cpp vs Transformers), Flash Attention, and driver overhead.

KV cache exceeds model weights: Consider lowering the context length to save on VRAM. Contexts between 8K and 64K are generally more typical for local setups.

Hardware that fits

Apple M3 Ultra 128GB, Unified, 96 GB, 95% used
H200 141GB, Datacenter, 141 GB, 65% used
Apple M3 Ultra 192GB, Unified, 144 GB, 64% used

Just barely too small

A100 80GB, Datacenter, 80 GB, short by 11.6 GB
H100 80GB, Datacenter, 80 GB, short by 11.6 GB
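The "% used" and "short by" figures in the two hardware lists follow from comparing the 91.6 GB estimate with each device's listed memory. A minimal sketch of that check is below; the function name fit_report and the rounding to whole percent are assumptions chosen to match the figures shown, not the calculator's actual code.

```python
REQUIRED_GB = 91.6  # total estimate from this page

# (device name, memory in GB as listed on this page)
DEVICES = [
    ("Apple M3 Ultra 128GB", 96),
    ("H200 141GB", 141),
    ("Apple M3 Ultra 192GB", 144),
    ("A100 80GB", 80),
    ("H100 80GB", 80),
]


def fit_report(memory_gb, required_gb=REQUIRED_GB):
    """Return a utilisation string if the model fits, else the shortfall."""
    if required_gb <= memory_gb:
        return f"{round(100 * required_gb / memory_gb)}% used"
    return f"short by {required_gb - memory_gb:.1f} GB"


for name, memory_gb in DEVICES:
    print(f"{name}: {fit_report(memory_gb)}")
```

A device at 95% used leaves very little headroom, so any framework-specific overhead beyond the estimate could push it over.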

How this is calculated

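The KV cache grows linearly with context: for every token it stores one key vector and one value vector per layer per KV head. At 128K (131,072 tokens) with FP16 KV, the cache for this model is 42.9 GB on its own. Weights take 40.3 GB and overhead 8.3 GB, for a total of about 91.6 GB (the components shown are rounded).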
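The breakdown above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions rather than the calculator's own code: the layer count, KV head count, and head dimension are taken from the published Qwen2.5-72B configuration, while the 4.48 bits per weight and the 10% overhead are back-solved from the 40.3 GB and 8.3 GB figures on this page.

```python
# Architecture values assumed from the published Qwen2.5-72B config;
# verify them against the checkpoint you actually run.
N_LAYERS = 80
N_KV_HEADS = 8        # grouped-query attention: 8 key/value heads
HEAD_DIM = 128

PARAMS = 72e9
BITS_PER_WEIGHT = 4.48    # assumed effective Q4_K_M rate (back-solved from 40.3 GB)
OVERHEAD_FRACTION = 0.10  # assumed (back-solved from 8.3 GB)


def kv_cache_bytes(context_tokens, bytes_per_element=2.0):
    """KV cache size: keys and values for every layer, KV head, and token."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_element * context_tokens


def weight_bytes(params=PARAMS, bits_per_weight=BITS_PER_WEIGHT):
    """Quantized weight size in bytes."""
    return params * bits_per_weight / 8


def estimate_gb(context_tokens):
    """Return the breakdown in decimal GB (1 GB = 1e9 bytes)."""
    weights = weight_bytes() / 1e9
    kv = kv_cache_bytes(context_tokens) / 1e9
    overhead = OVERHEAD_FRACTION * (weights + kv)
    return {
        "weights": weights,
        "kv_cache": kv,
        "overhead": overhead,
        "total": weights + kv + overhead,
    }


if __name__ == "__main__":
    for name, gb in estimate_gb(131_072).items():
        print(f"{name:>9}: {gb:.1f} GB")
```

With these assumptions the sketch lands on the same rounded figures as the page: 40.3 GB of weights, 42.9 GB of KV cache, 8.3 GB of overhead, and 91.6 GB in total.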

Verdict

The KV cache is the hidden cost of long-context models. If you're not actually using the full 128K context, set the context to 8K or 16K in your inference engine; the savings are immediate. Quantizing the KV cache to Q8 is the other obvious lever: it roughly halves the cache and is almost always worth it.
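To put numbers on both levers, the sketch below sweeps the context length and compares FP16 with Q8 KV. It assumes the Qwen2.5-72B attention shape (80 layers, 8 KV heads, head dimension 128) and models Q8 as llama.cpp's q8_0 block format, which stores 32 int8 values plus one FP16 scale per block, or 34 bytes per 32 elements. Other engines may lay out a Q8 cache differently.

```python
# Elements stored per token: keys + values, for 80 layers x 8 KV heads x 128 dims
# (assumed Qwen2.5-72B attention shape).
ELEMENTS_PER_TOKEN = 2 * 80 * 8 * 128

FP16_BYTES_PER_ELEMENT = 2.0
Q8_0_BYTES_PER_ELEMENT = 34 / 32  # 32 int8 values + one FP16 scale per block


def kv_cache_gb(context_tokens, bytes_per_element=FP16_BYTES_PER_ELEMENT):
    """KV cache size in decimal GB for a given context and element width."""
    return ELEMENTS_PER_TOKEN * bytes_per_element * context_tokens / 1e9


for ctx in (8_192, 16_384, 65_536, 131_072):
    fp16 = kv_cache_gb(ctx)
    q8 = kv_cache_gb(ctx, Q8_0_BYTES_PER_ELEMENT)
    print(f"{ctx:>7} tokens: FP16 {fp16:5.1f} GB, Q8 {q8:5.1f} GB")
```

Under these assumptions the FP16 cache falls from about 42.9 GB at 128K to about 2.7 GB at 8K, and Q8 KV at the full 128K comes to roughly 22.8 GB.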

Frequently asked questions

Why is the VRAM higher than Llama 3.1 70B at the same quant?
Qwen 2.5 72B is slightly larger (72B vs 70B parameters), and this estimate runs it at its native 128K context. At matched context length and quantization, the two models are within 5% of each other.
Does Q8 KV cache hurt quality?
Not measurably for inference. Q8 KV is widely used in production llama.cpp builds with no observable change in output. Use it freely as a memory-saving lever.
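The comparison with Llama 3.1 70B in the first answer can be sanity-checked with the same arithmetic. This sketch assumes both models share the same attention shape (80 layers, 8 KV heads, head dimension 128, per their published configs) and reuses the 4.48 bits per weight and 10% overhead inferred from this page; all of these are assumptions, not values the calculator exposes.

```python
def total_gb(params, context_tokens, n_layers=80, n_kv_heads=8, head_dim=128,
             bits_per_weight=4.48, overhead_fraction=0.10):
    """Weights + FP16 KV cache + proportional overhead, in decimal GB."""
    weights = params * bits_per_weight / 8
    kv = 2 * n_layers * n_kv_heads * head_dim * 2 * context_tokens  # 2 bytes per FP16 element
    return (weights + kv) * (1 + overhead_fraction) / 1e9


CONTEXT = 131_072
qwen = total_gb(72e9, CONTEXT)
llama = total_gb(70e9, CONTEXT)
gap = (qwen - llama) / llama

print(f"Qwen 2.5 72B:  {qwen:.1f} GB")
print(f"Llama 3.1 70B: {llama:.1f} GB")
print(f"Relative gap:  {gap:.1%}")
```

Because the KV cache is identical under these assumptions, the only difference is the extra 2B parameters, which keeps the gap well inside the 5% quoted above.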