How much VRAM does Mixtral 8x7B need at Q4_K_M? MoE memory math

Mixtral 8x7B is the rare model where memory is the bottleneck and compute is cheap. It needs at minimum a 32 GB card or a 24 GB card with Q8 KV cache and reduced context. When it fits, it's one of the fastest high-quality local options.

Mixtral 8x7B at Q4_K_M with 32K context needs about 34 GB of VRAM. Mixture-of-experts is the wrinkle here: even though only 2 of the 8 experts activate per token, all 8 must be in memory at all times. That makes Mixtral 8x7B effectively a 47B-parameter model from a memory standpoint, even though it computes like a 12B.

By TechCompare · Updated July 2026

Total VRAM required

33.7 GB

Mixtral 8x7B at Q4_K_M

Weights

26.3 GB

47B params

KV cache

4.3 GB

32K tokens, FP16 KV

Estimated VRAM required

33.7 GB

47B params at Q4_K_M, 32,768 token context, batch 1, inference.

Weights

26.3 GB

KV cache

4.3 GB

Overhead

3.1 GB

Doesn't fit on a 32 GB consumer GPU at Q4_K_M. Try (21.3 GB) for the smallest quant that fits a single RTX 5090.

Estimate accuracy: Weights within ~2%. KV cache within ~5% for standard GQA models, ~10% for MLA (DeepSeek). Real VRAM may vary with framework (vLLM vs llama.cpp vs Transformers), Flash Attention, and driver overhead.

Custom architecture - SWA not applied. If you're modeling Gemma 3/4 or Mistral Nemo, pick the preset for accurate KV cache.

Hardware that fits

A100 40GB

Datacenter

40 GB

84% used

Apple M3 Max 64GB

Unified

48 GB

70% used

RTX 6000 Ada

Pro

48 GB

70% used

Just barely too small

RTX 5090

Consumer

32 GB

short by 1.7 GB

Open full calculator to tweak settings ➜

How this is calculated

MoE is a compute optimization, not a memory one. The 8 expert FFN blocks plus the shared attention give Mixtral roughly 47B total parameters that must all be loaded in VRAM. At Q4_K_M that's about 26 GB of weights, plus ~4.3 GB of KV cache at 32K context. The benefit is throughput - inference runs at the speed of a 12B dense model, so you get 70B-class quality at 12B-class generation speed if it fits.

Calculator

Hardware that fits

How this is calculated

Verdict

More Mistral scenarios

Frequently asked questions

Related tools

RAM Latency Calculator

Power Cost Estimator

Data Transfer Calculator

Data Read Visualizer