How much VRAM does gpt-oss 20B need at Q4_K_M? OpenAI's first open-weights model

As of 2026, gpt-oss 20B at Q4_K_M is the model to download if you have never run a local LLM before. It is Apache 2.0 licensed, runs on most consumer hardware, reasons decently for its size, and carries the OpenAI lineage. It is not as strong as Qwen3 30B-A3B at the same memory footprint, but the brand recognition matters and the licensing is the cleanest in the open-weights ecosystem.

gpt-oss 20B at Q4_K_M with its native 128K context needs about 19.4 GB of VRAM with all experts resident, dropping to roughly 9.3 GB with active-only weight loading. It is a mixture-of-experts (MoE) model with 3.6B active parameters per token, 24 layers, an unusual hidden size of 2880, and a head_dim of 64. The 20B variant is the smallest model in OpenAI's first Apache 2.0 release and is intentionally sized for laptops and consumer GPUs.

By TechCompare · Updated July 2026

Estimated VRAM required: 19.4 GB

gpt-oss 20B (MoE) at Q4_K_M: 20B params, 131,072 token context, batch 1, inference.

Weights: 11.2 GB (20B params)

KV cache: 6.4 GB (128K tokens, FP16 KV)

Overhead: 1.8 GB

Estimate accuracy: Weights within ~2%. KV cache within ~5% for standard GQA models, ~10% for MLA (DeepSeek). Real VRAM may vary with framework (vLLM vs llama.cpp vs Transformers), Flash Attention, and driver overhead.
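The 6.4 GB KV cache figure can be reproduced from the architecture numbers quoted above (24 layers, head_dim 64, 131,072 tokens, FP16). The sketch below is a rough check, not the calculator's actual code. It assumes 8 key/value heads (a value not stated on this page), assumes every layer caches the full context, and reports decimal gigabytes.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Size of a standard GQA KV cache for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to FP16.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# kv_heads=8 is an assumption; the other values come from the page.
size = kv_cache_bytes(layers=24, kv_heads=8, head_dim=64, tokens=131_072)
kv_gb = size / 1e9  # decimal GB
print(f"KV cache: {kv_gb:.1f} GB")
```

Because the cache grows linearly with the token count, halving the context to 64K would halve this figure under the same assumptions.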

Hardware that fits

RTX 3090 (Consumer): 24 GB, 81% used

RTX 5090 (Consumer): 32 GB, 61% used

A100 40GB (Datacenter): 40 GB, 49% used

Apple M3 Max 64GB (Unified): 48 GB, 40% used
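The "used" percentages are simply the 19.4 GB estimate divided by each device's listed memory. A minimal sketch follows. The function name is hypothetical, and half-up rounding is an assumption chosen because it matches the listed figures (19.4 / 40 is exactly 48.5%).

```python
from decimal import Decimal, ROUND_HALF_UP


def utilization_pct(required_gb: str, capacity_gb: str) -> int:
    """Percent of device memory consumed, rounded half-up to a whole number.

    Decimal strings avoid binary floating-point surprises at .5 boundaries.
    """
    pct = Decimal(required_gb) * 100 / Decimal(capacity_gb)
    return int(pct.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Memory capacities as listed on this page, in GB.
devices = {
    "RTX 3090": "24",
    "RTX 5090": "32",
    "A100 40GB": "40",
    "Apple M3 Max 64GB": "48",
}

usage = {name: utilization_pct("19.4", cap) for name, cap in devices.items()}
for name, pct in usage.items():
    print(f"{name}: {pct}% used")
```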


How this is calculated

The total is 11.2 GB of weights at Q4_K_M (20B × 0.56 bytes per parameter), plus a 6.4 GB KV cache at the full 128K context window and ~1.8 GB of overhead. Active-only loading shrinks the weights to 3.6B × 0.56 = 2.0 GB while keeping the same KV cache, for a total of ~9.3 GB. That fits on a 12 GB GPU or a unified memory device.
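The arithmetic above can be written out as a short sketch. The 0.56 bytes-per-parameter factor and the component sizes come from this page; the function and variable names are illustrative. Note that the ~9.3 GB active-only total implies roughly 0.9 GB of overhead rather than 1.8 GB, which the last lines make explicit.

```python
BYTES_PER_PARAM_Q4_K_M = 0.56  # effective bytes per parameter used on this page


def weights_gb(params_billion, bytes_per_param=BYTES_PER_PARAM_Q4_K_M):
    """Weight memory in GB for a parameter count given in billions."""
    return params_billion * bytes_per_param


KV_CACHE_GB = 6.4   # 128K tokens, FP16 KV
OVERHEAD_GB = 1.8   # overhead quoted for the all-experts case

# All experts resident: 20B parameters.
full_total = weights_gb(20) + KV_CACHE_GB + OVERHEAD_GB

# Active-only loading: 3.6B parameters, same KV cache.
active_weights = weights_gb(3.6)

# Overhead implied by the page's ~9.3 GB active-only total.
implied_overhead = 9.3 - active_weights - KV_CACHE_GB

print(f"Full:        {full_total:.1f} GB")
print(f"Active-only: {active_weights:.1f} GB weights, "
      f"{implied_overhead:.1f} GB implied overhead")
```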
