L1 vs L2 cache: how CPU cache tiers affect real-world performance
L1 cache is the fastest cache level in a CPU, with a typical latency of about 1 ns (3-5 CPU cycles) and a capacity of 32-64 KB per core. L2 cache is slower (3-4 ns, 10-15 cycles) but larger (typically 256 KB to 1 MB per core). A program whose hot data fits in L1 pays roughly a third of the per-access latency of one that spills into L2, and L2 hits are still vastly faster than going to RAM.
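The cycle counts and nanosecond figures above are tied together by the clock frequency. As a minimal sketch, assuming a 4 GHz core (the clock speed and the function name cycles_to_ns are illustrative choices, not values from this page), the conversion looks like this:

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency.

    A clock of N GHz completes N cycles per nanosecond, so the time for
    a number of cycles is cycles / N.
    """
    return cycles / clock_ghz


# Assumption: a 4 GHz core, so one cycle lasts 0.25 ns.
CLOCK_GHZ = 4.0

# Cycle ranges quoted in the text above.
latency_cycles = {"L1": (3, 5), "L2": (10, 15)}

for level, (low, high) in latency_cycles.items():
    low_ns = cycles_to_ns(low, CLOCK_GHZ)
    high_ns = cycles_to_ns(high, CLOCK_GHZ)
    print(f"{level}: {low}-{high} cycles = {low_ns:.2f}-{high_ns:.2f} ns")
```

At the assumed 4 GHz this gives 0.75-1.25 ns for L1 and 2.50-3.75 ns for L2, which lines up with the rounded figures quoted above. A slower clock stretches the same cycle counts into more nanoseconds.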
How this is calculated
Modern CPUs use a multi-level cache hierarchy because faster memory is more expensive per byte and takes more die area. L1 is split into L1i (instructions) and L1d (data), each per-core and private. L2 is usually also per-core but unified (instructions and data together). L3 is shared across all cores on a chiplet or die. A load that misses in L1 but hits in L2 takes roughly 10-15 cycles instead of 3-5. A miss that goes all the way to RAM costs 200+ cycles. This is why optimizing for cache locality (keeping data structures compact, accessing memory sequentially) can speed up a memory-bound program by 10x or more.
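To see why RAM accesses dominate, it helps to estimate the average cost of a memory access from the hit rates at each level. The sketch below is a simplified model: it uses the cycle figures from the text (4, 12, and 200 cycles as representative values), skips L3 entirely, and the hit rates in the examples are assumed numbers chosen for illustration rather than measurements.

```python
def average_access_cycles(
    l1_hit_rate: float,
    l2_hit_rate: float,
    l1_cycles: float = 4.0,     # representative L1 latency (3-5 cycles)
    l2_cycles: float = 12.0,    # representative L2 latency (10-15 cycles)
    ram_cycles: float = 200.0,  # lower bound for a trip to RAM (200+ cycles)
) -> float:
    """Estimate the average cycles per memory access.

    l2_hit_rate is the fraction of L1 misses that hit in L2.
    Simplification: there is no L3, so every L2 miss goes to RAM.
    """
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    return (
        l1_hit_rate * l1_cycles
        + l1_miss_rate * l2_hit_rate * l2_cycles
        + l1_miss_rate * l2_miss_rate * ram_cycles
    )


# Assumed hit rates for a cache-friendly and a cache-hostile workload.
friendly = average_access_cycles(l1_hit_rate=0.95, l2_hit_rate=0.90)
hostile = average_access_cycles(l1_hit_rate=0.70, l2_hit_rate=0.50)

print(f"cache-friendly: {friendly:.2f} cycles per access")
print(f"cache-hostile:  {hostile:.2f} cycles per access")
print(f"slowdown:       {hostile / friendly:.1f}x")
```

Under these assumptions the cache-friendly case averages about 5.3 cycles per access and the cache-hostile case about 34.6. In the hostile case, 30 of those cycles come from the 15% of accesses that reach RAM, while L2 hits contribute under 2.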
Verdict
L1 cache is roughly three to four times faster than L2 (about 1 ns versus 3-4 ns), not an order of magnitude. Keeping your working set in L1 is one of the biggest performance levers in CPU-bound code, but L2 hits are still excellent compared to RAM. Don't fear L1 misses that hit in L2. Fear accesses that go all the way to RAM.