CPU cache hierarchy explained: L1, L2, L3 and why they exist
CPU cores execute instructions in fractions of a nanosecond, while RAM responds in 50-80 nanoseconds. Without cache, a CPU that had to go to main memory for every instruction would spend roughly 99% of its time waiting for data. The cache hierarchy (L1, L2, L3) bridges this gap by keeping frequently accessed data closer to the execution units. Each level is larger but slower than the one before it.
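As a rough back-of-the-envelope check of that waiting figure, the sketch below assumes a 0.25 ns cycle (a 4 GHz core, chosen only for illustration) and a worst case in which every instruction stalls on one uncached memory access with no overlap. The helper name `stall_fraction` is hypothetical.

```python
CYCLE_NS = 0.25  # assumed cycle time (4 GHz core); illustrative, not from a specific CPU


def stall_fraction(cycle_ns, memory_ns):
    """Fraction of time spent waiting if each instruction takes one cycle
    of work plus one full, non-overlapped trip to memory."""
    return memory_ns / (memory_ns + cycle_ns)


# RAM latency range quoted in the text: 50-80 ns
for memory_ns in (50.0, 80.0):
    share = stall_fraction(CYCLE_NS, memory_ns)
    print(f"{memory_ns:.0f} ns RAM: {share:.1%} of time waiting")
```

Under these assumptions the waiting share comes out above 99% at both ends of the range. Real cores overlap memory requests and not every instruction touches memory, so this is an upper bound rather than a measurement.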
How it works
The hierarchy works on the principle of locality: if a program accesses a memory address, it is likely to access nearby addresses soon (spatial locality) and the same address again soon (temporal locality). Data moves between memory and the caches in fixed-size cache lines, typically 64 bytes, so touching one byte brings its neighbors along. L1 holds the most recently used lines. In a typical design, L2 catches lines evicted from L1, and L3 catches lines evicted from L2 while also serving as a shared pool for inter-core communication. Cache design is a trade-off between latency (smaller is faster), hit rate (larger catches more), and power consumption (larger uses more). Modern CPUs spend roughly 30-40% of their die area on cache.
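To make spatial locality concrete, here is a minimal sketch of a toy cache model. It is a single-level, fully associative LRU cache with 64 lines of 64 bytes, which is an assumption for illustration and much simpler than any real L1. It counts misses for two ways of walking the same row-major 256x256 matrix of 8-byte values. The function name `count_misses` and all sizes are hypothetical.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line, as in the text


def count_misses(addresses, num_lines=64):
    """Count misses in a toy fully associative LRU cache holding `num_lines` lines."""
    cache = OrderedDict()  # cache line number -> None, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


ROWS, COLS, ELEM = 256, 256, 8  # matrix stored row by row, 8 bytes per element

# Same elements, two visiting orders
row_major = [(r * COLS + c) * ELEM for r in range(ROWS) for c in range(COLS)]
col_major = [(r * COLS + c) * ELEM for c in range(COLS) for r in range(ROWS)]

print("row-major misses:   ", count_misses(row_major))
print("column-major misses:", count_misses(col_major))
```

In this model the row-major walk misses once per 64-byte line (one access in eight), while the column-major walk jumps 2048 bytes per step, touches 256 different lines per column, and has evicted each line before returning to it, so every access misses. Real caches are set-associative and use hardware prefetchers, so actual ratios differ, but the direction of the effect is the same.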
Verdict
The cache hierarchy is the most important architectural feature of a modern CPU that most developers never think about. Understanding it explains why the choice between array-of-structs and struct-of-arrays layouts matters, why linked lists are slow to traverse, and why optimizing for cache locality can speed up code by an order of magnitude.
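The array-of-structs versus struct-of-arrays point can be sketched by counting how many 64-byte cache lines a loop must pull in to read a single 4-byte field from 10,000 records. The record layout (a 64-byte record with the field at offset 0) and the helper `lines_touched` are assumptions made up for this illustration.

```python
LINE_SIZE = 64  # bytes per cache line


def lines_touched(addresses, size):
    """Number of distinct cache lines covered by reads of `size` bytes at each address."""
    lines = set()
    for addr in addresses:
        first = addr // LINE_SIZE
        last = (addr + size - 1) // LINE_SIZE
        lines.update(range(first, last + 1))
    return len(lines)


N = 10_000
RECORD_SIZE = 64  # hypothetical record: one 4-byte field plus 60 bytes of other data
FIELD_SIZE = 4

# Array-of-structs: the field sits at offset 0 of each 64-byte record
aos_addresses = [i * RECORD_SIZE for i in range(N)]
# Struct-of-arrays: the same fields packed back to back in their own array
soa_addresses = [i * FIELD_SIZE for i in range(N)]

print("AoS lines touched:", lines_touched(aos_addresses, FIELD_SIZE))
print("SoA lines touched:", lines_touched(soa_addresses, FIELD_SIZE))
```

With this layout the array-of-structs loop touches one line per record, while the struct-of-arrays loop fits 16 fields into each line, so it moves 16 times less data through the cache for the same work. The same reasoning applies to linked lists, whose nodes may each sit on a different line.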