How much VRAM does Qwen3.5 122B need at Q4_K_M? The 2026 MoE workhorse
Qwen3.5 122B at Q4_K_M with the full native 256K context needs about 104 GB of VRAM when all 122B params are resident. The architecture is Qwen's Gated Delta Networks at scale: 48 layers, 12B active per token, 2 KV heads, head_dim 256. It's the workhorse MoE of the 2026 Qwen3.5 fleet - frontier-quality reasoning with Apache 2.0 licensing. Active-only loading keeps just the 12B active params in VRAM and drops the resident footprint to ~36 GB, which fits a 48 GB GPU, at the usual bandwidth cost of fetching cold experts on demand.
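The KV cache share of that footprint follows from the architecture numbers above. This is a minimal sketch, not the calculator's actual code: it assumes a standard GQA-style cache on all 48 layers with 2-byte (fp16) keys and values, and the function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2, batch=1):
    """Size of a standard K/V cache in bytes.

    The leading 2 accounts for storing both a key and a value tensor per layer.
    bytes_per_value=2 assumes an fp16/bf16 cache (an assumption, not a page fact).
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch


# Architecture figures quoted on this page.
per_token = kv_cache_bytes(layers=48, kv_heads=2, head_dim=256, tokens=1)
full_context = kv_cache_bytes(layers=48, kv_heads=2, head_dim=256, tokens=262_144)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache at 262,144 tokens: {full_context / 1e9:.1f} GB")
```

Under these assumptions the result lands at roughly 26 GB, which matches the KV cache figure used in the breakdown on this page.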
Calculator
Estimated VRAM required: 103 GB
Inputs: 122B params at Q4_K_M, 262,144 token context, batch 1, inference.
Estimate accuracy: Weights within ~2%. KV cache within ~5% for standard GQA models, ~10% for MLA (DeepSeek). Real VRAM may vary with framework (vLLM vs llama.cpp vs Transformers), Flash Attention, and driver overhead.
How this is calculated
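68 GB of weights at Q4_K_M (122B * 0.56), 26 GB KV cache (at the full 256K native context), and ~9.4 GB overhead sum to roughly 103.4 GB, which the calculator shows as 103 GB and this page rounds up to 104 GB. That resident total is the deployment number - one H200 141GB, an MI300X, or two 80 GB cards with tensor parallelism. Active-only shrinks weights to 12B * 0.56 = 6.7 GB while the KV cache stays at 26 GB. Reaching the ~36 GB total requires overhead to shrink as well, to roughly 3 GB; with the full ~9.4 GB overhead the sum would be about 42 GB.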
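The breakdown above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the 0.56 bytes per parameter comes from the page's own 12B * 0.56 figure, while the 10% overhead fraction is an assumption, chosen because it is consistent with both the ~9.4 GB overhead and the ~36 GB active-only total. The helper names are hypothetical.

```python
BYTES_PER_PARAM_Q4_K_M = 0.56  # taken from the page's "12B * 0.56 = 6.7 GB"
OVERHEAD_FRACTION = 0.10       # assumption: overhead ~10% of weights + KV cache


def estimate_vram_gb(params_billion, kv_cache_gb):
    """Return (weights, overhead, total) in GB for the given resident params."""
    weights = params_billion * BYTES_PER_PARAM_Q4_K_M
    overhead = OVERHEAD_FRACTION * (weights + kv_cache_gb)
    return weights, overhead, weights + kv_cache_gb + overhead


def fits(total_gb, gpu_capacity_gb):
    """True if the estimate fits in the given VRAM capacity."""
    return total_gb <= gpu_capacity_gb


KV_CACHE_GB = 26.0  # page figure at the full 256K context

resident = estimate_vram_gb(122, KV_CACHE_GB)    # all experts in VRAM
active_only = estimate_vram_gb(12, KV_CACHE_GB)  # only active params in VRAM

for label, (weights, overhead, total) in [("resident", resident), ("active-only", active_only)]:
    print(f"{label}: weights {weights:.1f} GB, overhead {overhead:.1f} GB, total {total:.1f} GB")

print("resident fits one 141 GB card:", fits(resident[2], 141))
print("active-only fits one 48 GB card:", fits(active_only[2], 48))
```

Real usage will differ by framework and driver, so treat the output as a planning estimate rather than a measured figure.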
Verdict
Qwen3.5 122B at 104 GB resident is the configuration to deploy when you have datacenter-grade hardware and want frontier reasoning quality with Apache 2.0 licensing. Active-only on a single 48 GB consumer or pro GPU is the 'just works' fallback for someone asking 'what's the smartest local model I can run?' in 2026 - capability that was hosted-API-only six months ago.
Frequently asked questions
How does Qwen3.5 122B compare to Llama 4 Scout?
What about Qwen3.5 27B (dense) instead?
Does Qwen3.5 use Gated Delta Networks?