L1 vs L2 cache: how CPU cache tiers affect real-world performance

L1 cache is the fastest memory in a computer, typically 1 ns latency (3-5 CPU cycles) and 32-64 KB per core. L2 cache is slightly slower (3-4 ns, 10-15 cycles) but larger (256 KB to 1 MB per core). A program whose hot data fits in L1 runs dramatically faster than one that spills into L2, and L2 hits are still vastly faster than going to RAM.

Hardware tier
CPU Cache
On-die processor cache levels
Topic focus
L1 vs L2 cache
l1-vs-l2

How this is calculated

Modern CPUs use a multi-level cache hierarchy because faster memory is more expensive per byte and takes more die area. L1 is split into L1i (instructions) and L1d (data), each per-core and private. L2 is also per-core but unified (instructions and data together). L3 is shared across all cores on a chiplet or die. A miss in L1 that hits in L2 costs an extra 10-12 cycles. A miss that goes all the way to RAM costs 200+ cycles. This is why optimizing for cache locality (keeping data structures compact, accessing memory sequentially) can speed up a program by 10x or more.

Verdict

L1 cache is an order of magnitude faster than L2. Keeping your working set in L1 is the single biggest performance lever in CPU-bound code. But L2 hits are still excellent compared to RAM. Don't fear L2 misses. Fear RAM misses.

More Latency scenarios

Frequently asked questions

How much faster is L1 cache than RAM?
Roughly 40-60x faster for random access. L1 cache access is about 1 ns; DDR5 RAM is about 50-80 ns end-to-end including controller overhead. That's why keeping hot data in cache dominates real-world CPU performance.
Is NVMe SSD faster than RAM?
No. NVMe is fast for storage, but for random access its latency is around 50-150 µs versus RAM's 50-80 ns. That's a 1,000x gap. NVMe beats RAM only on raw capacity and persistence, never on latency.
Why is HDD so much slower than SSD?
A spinning HDD has to physically move a read head to the right track and wait for the platter to rotate into position, typically 5-15 ms per random access. An SSD has no moving parts and returns data in under 100 µs, roughly 100x faster for random reads.
What's the point of L3 cache?
L1 and L2 are tiny (KB to low MB) and per-core. L3 is much larger (tens of MB) and shared across cores, acting as a buffer before requests go to main RAM. It catches data evicted from L1/L2 and data shared between cores.
How many nanoseconds is one CPU cycle?
At 4 GHz, one cycle is 0.25 ns. At 5 GHz, 0.2 ns. Cache hits are measured in single-digit cycles; main memory access costs hundreds of cycles, which is why optimizing for cache locality matters enormously in performance-critical code.
Does DDR5 have lower latency than DDR4?
Not usually at the same relative tier. DDR5 improved bandwidth and capacity significantly, but true latency (in ns) for mainstream kits is similar to late-stage DDR4. The gains from DDR5 come from bandwidth and larger capacities, not lower memory latency.