L3 cache vs RAM: the latency cliff and why it dominates server performance

L3 cache (also called the last-level cache, or LLC) is shared across the cores of a CPU die or chiplet. It is typically 16-96 MB, with a latency of about 10-15 ns. DDR5 RAM takes about 50-80 ns. The jump from L3 to RAM is roughly 5x in latency, which is typically the largest single step between adjacent levels on the CPU side of the memory hierarchy (L1, L2, L3, RAM). This is the cliff that performance engineers obsess over.
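As a quick check on the "roughly 5x" figure, the ratio can be worked out from the quoted ranges. This is a minimal sketch: the function name `ratio_range` is made up for illustration, and using range midpoints as the "typical" case is an assumption, not a measurement.

```python
# Latency ranges in nanoseconds, taken from the figures quoted above.
L3_NS = (10.0, 15.0)
RAM_NS = (50.0, 80.0)


def ratio_range(fast, slow):
    """Return (lowest, midpoint, highest) slowdown of `slow` relative to `fast`."""
    lowest = slow[0] / fast[1]    # best-case RAM vs worst-case L3
    highest = slow[1] / fast[0]   # worst-case RAM vs best-case L3
    midpoint = (sum(slow) / 2) / (sum(fast) / 2)  # midpoint-to-midpoint (assumption)
    return lowest, midpoint, highest


low, mid, high = ratio_range(L3_NS, RAM_NS)
print(f"L3 -> RAM slowdown: {low:.1f}x to {high:.1f}x, midpoint {mid:.1f}x")
# prints: L3 -> RAM slowdown: 3.3x to 8.0x, midpoint 5.2x
```

Depending on the specific CPU and memory kit, the real gap can land anywhere in that range, so "5x" is best read as a rule of thumb.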

How this is calculated

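A latency-bound database index lookup whose working set fits in L3 can run up to about 5x faster than one that spills to RAM. Sequential scans suffer less, because hardware prefetchers hide much of the RAM latency. A game engine's frame time can live or die on whether its hot working set stays in L3. Server CPUs (AMD EPYC, Intel Xeon) ship with massive L3 caches (more than 1 GB per socket on the largest AMD EPYC 3D V-Cache parts) because so many workloads are L3-bound. The L3-to-RAM gap is largely a physics problem: a RAM access has to leave the CPU die, cross the memory bus to the DIMM, and wait for the DRAM array to open a row and return the data, and the electrical characteristics of the bus limit how fast that round trip can be. 3D V-Cache stacks extra L3 vertically on the CPU die so that more accesses are served on-package and never make the trip to RAM.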
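Whether a working set "fits in L3" can be estimated with simple arithmetic. The sketch below is illustrative only: the 32 MiB cache size, the 16-byte entry size, and the helper name `fits_in_cache` are all assumptions, and it ignores the fact that L3 is shared with other data and other cores.

```python
MIB = 1024 * 1024
L3_BYTES = 32 * MIB  # hypothetical L3 size, inside the 16-96 MB range above


def fits_in_cache(num_entries, bytes_per_entry, cache_bytes):
    """Return (working_set_bytes, fits) for a flat array of fixed-size entries."""
    working_set = num_entries * bytes_per_entry
    return working_set, working_set <= cache_bytes


# Hypothetical index sizes: 1 million and 10 million 16-byte entries.
for entries in (1_000_000, 10_000_000):
    size, fits = fits_in_cache(entries, 16, L3_BYTES)
    print(f"{entries:>10,} entries: {size / MIB:6.1f} MiB, fits in L3: {fits}")
```

Under these assumptions the 1-million-entry index (about 15 MiB) fits, while the 10-million-entry index (about 153 MiB) spills to RAM. In practice the usable share of L3 is smaller than its nominal size, so a working set close to the limit should be treated as not fitting.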

Verdict

If your working set fits in L3, you're fast. If it spills to RAM, every miss costs roughly 5x the latency. Bigger L3 caches are one of the most cost-effective ways to improve real-world performance for cache-sensitive applications, which is why AMD's 3D V-Cache parts often top gaming benchmarks.
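Real workloads are rarely all-or-nothing, so it helps to look at the average latency for a given L3 hit rate. The sketch below uses a simplified weighted-average model; the function name `effective_latency_ns` and the default latencies (midpoints of the 10-15 ns and 50-80 ns ranges) are assumptions chosen for illustration.

```python
def effective_latency_ns(l3_hit_rate, l3_ns=12.5, ram_ns=65.0):
    """Average latency of an access that reaches L3, given the fraction that hit there.

    Simplified model: hits cost l3_ns, misses cost ram_ns. It ignores
    prefetching, memory-level parallelism, and queuing in the controller.
    """
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("l3_hit_rate must be between 0 and 1")
    return l3_hit_rate * l3_ns + (1.0 - l3_hit_rate) * ram_ns


for rate in (1.0, 0.9, 0.5, 0.0):
    print(f"L3 hit rate {rate:.0%}: {effective_latency_ns(rate):.2f} ns")
```

Even in this rough model, a 90% hit rate raises the average from 12.5 ns to about 18 ns, and a 50% hit rate roughly triples it, which is why a modest increase in L3 capacity can pay off disproportionately.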

Frequently asked questions

How much faster is L1 cache than RAM?
Roughly 50-80x faster for random access. L1 cache access is about 1 ns; DDR5 RAM is about 50-80 ns end-to-end including controller overhead. That's why keeping hot data in cache dominates real-world CPU performance.
Is NVMe SSD faster than RAM?
No. NVMe is fast for storage, but for random access its latency is around 50-150 µs versus RAM's 50-80 ns. That's a gap of roughly 1,000x or more. NVMe beats RAM only on raw capacity and persistence, never on latency.
Why is HDD so much slower than SSD?
A spinning HDD has to physically move a read head to the right track and wait for the platter to rotate into position, typically 5-15 ms per random access. An SSD has no moving parts and returns data in under 100 µs, roughly 100x faster for random reads.
What's the point of L3 cache?
L1 and L2 are tiny (KB to low MB) and per-core. L3 is much larger (tens of MB) and shared across cores, acting as a buffer before requests go to main RAM. It catches data evicted from L1/L2 and data shared between cores.
How many nanoseconds is one CPU cycle?
At 4 GHz, one cycle is 0.25 ns. At 5 GHz, 0.2 ns. An L1 hit costs a handful of cycles and an L3 hit costs tens of cycles (10-15 ns is 40-60 cycles at 4 GHz), while a main memory access costs hundreds (50-80 ns is 200-320 cycles at 4 GHz). That is why optimizing for cache locality matters enormously in performance-critical code.
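The conversion between nanoseconds and cycles is a single multiplication, since a clock speed in GHz is the number of cycles per nanosecond. A minimal sketch follows; the function names are made up for illustration, and the L3 and RAM figures are midpoints of the ranges quoted on this page rather than measurements.

```python
def cycle_time_ns(clock_ghz):
    """Duration of one clock cycle in nanoseconds."""
    return 1.0 / clock_ghz


def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to CPU cycles (GHz = cycles per ns)."""
    return latency_ns * clock_ghz


print(f"One cycle at 4 GHz: {cycle_time_ns(4.0)} ns")  # 0.25 ns

# Approximate latencies in ns: L1 from the FAQ above, L3 and RAM as range midpoints.
for name, ns in (("L1", 1.0), ("L3", 12.5), ("RAM", 65.0)):
    print(f"{name}: about {ns_to_cycles(ns, 4.0):.0f} cycles at 4 GHz")
```

At 4 GHz this gives about 4 cycles for L1, 50 for L3, and 260 for RAM, so a single RAM miss costs the core as much time as hundreds of simple instructions.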
Does DDR5 have lower latency than DDR4?
Usually not, when you compare kits at the same market tier. True latency (in ns) for mainstream DDR5 kits is similar to late-stage DDR4. The gains from DDR5 come from significantly higher bandwidth and larger capacities, not lower memory latency.