How much VRAM does gpt-oss 20B need at Q4_K_M?
OpenAI's first open-weights language model since GPT-2
gpt-oss 20B at Q4_K_M with its native 128K context needs about 19.4 GB of VRAM with all experts resident, dropping to roughly 9.3 GB with active-only weight loading. It is an MoE with 3.6B active parameters per token, 24 layers, an unusual hidden size of 2880, and head_dim 64. The 20B variant of OpenAI's first Apache 2.0 release is the smallest model in the family and is intentionally sized for laptops and consumer GPUs.
Calculator
Estimated VRAM required
19.4 GB
20B params at Q4_K_M, 131,072 token context, batch 1, inference.
Estimate accuracy: Weights within ~2%. KV cache within ~5% for standard GQA models, ~10% for MLA (DeepSeek). Real VRAM may vary with framework (vLLM vs llama.cpp vs Transformers), Flash Attention, and driver overhead.
How this is calculated
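The full estimate is 11.2 GB of weights at Q4_K_M (20B * 0.56 bytes per parameter), plus a 6.4 GB KV cache at the full 128K context window and ~1.8 GB of overhead, for 19.4 GB in total. Active-only loading shrinks the weights to 3.6B * 0.56 = 2.0 GB while keeping the same KV cache, for a total of ~9.3 GB. Note that 2.0 + 6.4 GB leaves only about 0.9 GB of that total for overhead, not the 1.8 GB used in the full estimate. At ~9.3 GB the model fits on a 12 GB GPU or a unified-memory device.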
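The arithmetic behind the full estimate can be reproduced with a short Python sketch. The 0.56 bytes per parameter, 24 layers, head_dim 64, and 131,072-token context come from this page. The 8 KV heads, the 16-bit KV cache, and the use of decimal gigabytes are assumptions chosen because they reproduce the 6.4 GB figure. The function names are illustrative, and the sketch ignores any sliding-window attention layers, which would shrink the real cache.

```python
# Rough VRAM estimate for gpt-oss 20B at Q4_K_M (a sketch, not a measurement).
GB = 1e9  # assumption: the page's figures are in decimal gigabytes


def weights_gb(params_billion, bytes_per_param=0.56):
    """Weight memory; 0.56 bytes/param is the Q4_K_M factor used on this page."""
    return params_billion * 1e9 * bytes_per_param / GB


def kv_cache_gb(layers, kv_heads, head_dim, context_tokens, bytes_per_elem=2):
    """KV cache for one sequence: keys and values (the factor of 2) at every layer."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / GB


weights_all = weights_gb(20)      # all experts resident
weights_active = weights_gb(3.6)  # active parameters only
# kv_heads=8 and a 2-byte cache element are assumptions, not stated on this page.
kv = kv_cache_gb(layers=24, kv_heads=8, head_dim=64, context_tokens=131_072)
overhead = 1.8                    # overhead allowance from the full estimate

total_all = weights_all + kv + overhead

print(f"weights (all experts): {weights_all:.1f} GB")
print(f"weights (active only): {weights_active:.1f} GB")
print(f"KV cache at 128K:      {kv:.1f} GB")
print(f"total (all experts):   {total_all:.1f} GB")
```

Halving the context to 65,536 tokens halves the KV cache term, which is the quickest way to see how much of the total is context rather than weights.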
Verdict
In 2026, gpt-oss 20B at Q4_K_M is the model to download if you've never run a local LLM before. It is Apache 2.0 licensed, fits on a 12 GB GPU or a unified-memory laptop, reasons decently for its size, and carries the OpenAI lineage. It is not as strong as Qwen3 30B-A3B at the same memory footprint, but the brand recognition matters and the licensing is the cleanest in the open-weights ecosystem.
Frequently asked questions
What does '3.6B active' actually mean for inference speed?
Should I pick gpt-oss 20B or Qwen3 30B-A3B?