Deep Dive - Caching
Every engineer knows that caching makes things faster. That is a surface-level understanding. In the world of high-scale architecture, caching is actually a desperate battle against the speed of light and the physical limits of hardware.
Information is physical, and moving it across a network or even across a motherboard takes time. This deep dive will explore the technological foundations of cache efficiency: the memory hierarchy, the probabilistic mechanics of Bloom Filters, and the low-level protocols that govern how data moves between hardware layers.
The Physics of Memory (SRAM vs DRAM)
To understand caching as a technology you must look at the physical materials involved. Not all memory is created equal.
SRAM (Static RAM)
SRAM is the technology used for CPU caches (L1, L2, and L3). It uses six transistors to store a single bit of data. Because it does not need to be “refreshed”, it is incredibly fast, with access times in the sub-nanosecond range. However, it is physically large per bit and very expensive. This is why your CPU carries only megabytes of it, not gigabytes.
DRAM (Dynamic RAM)
DRAM is the technology used for your main system memory. It uses a single transistor and a capacitor to store a bit. Capacitors leak charge, so DRAM must be refreshed thousands of times per second. This refresh cycle creates a physical “wait state” that makes it significantly slower than SRAM.
Caching technology is essentially the software and hardware bridge used to hide the slow performance of DRAM by keeping the most important data in the fast SRAM layer.
The Mathematics of Access (AMAT)
To quantify the success of a cache we use the Average Memory Access Time (AMAT) formula. It shows that even a tiny improvement in your hit rate can lead to a massive increase in system speed.
AMAT = T(hit) + (MR × T(miss))
In this equation, T(hit) is the time to find data in the cache, MR is the Miss Rate (the fraction of lookups that fail), and T(miss) is the “penalty”, the time taken to fetch from the slower database.
The Hit Rate Multiplier
Imagine your cache takes 1 ms and your database takes 100 ms.
If your Miss Rate is 10%, your total time is 1 + (0.10 × 100) = 11 ms.
If you improve your code to reach a 1% Miss Rate, your total time is 1 + (0.01 × 100) = 2 ms.
By cutting the miss rate from 10% to 1%, you have made the entire system roughly 5.5 times faster. This is why the choice of eviction algorithm is one of the most important technical decisions in the cache layer.
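To make the arithmetic concrete, here is a minimal sketch of the formula in Python. The 1 ms and 100 ms figures are just the illustrative values from the example above, not measurements of any real system.

def amat(t_hit_ms, miss_rate, t_miss_ms):
    # Average Memory Access Time: T(hit) + (MR x T(miss))
    return t_hit_ms + miss_rate * t_miss_ms

print(amat(1, 0.10, 100))  # 11.0 ms at a 10% miss rate
print(amat(1, 0.01, 100))  # 2.0 ms at a 1% miss rate, roughly 5.5x faster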
Set Membership Logic (Bloom Filters)
One of the most expensive errors in system design is the Ghost Read, where a user asks for data that does not exist. The system checks the cache and fails, then checks the database and fails. This burns resources twice for no result.
The solution is the Bloom Filter, a probabilistic data structure that can tell you that an item is “definitely not” in the set (it can return false positives, but never false negatives).
The False Positive Formula
The Bloom Filter uses a bit array of size m and k hash functions. After n items have been inserted, the probability of a false positive (P) is approximately:
P ≈ (1 − e^(−kn/m))^k
If the filter says “No”, the system stops immediately. This technological shield prevents millions of useless database queries every hour in systems like Google Search or Bigtable.
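As a rough sketch of the idea, the toy Bloom Filter below derives k bit positions from a single SHA-256 digest (a simple double-hashing trick). The sizes and class names are illustrative assumptions; production systems rely on tuned libraries.

import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k      # m bits in the array, k hash functions
        self.bits = bytearray(m)   # one byte per bit, for simplicity

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means "definitely not present"; True only means "maybe".
        return all(self.bits[p] for p in self._positions(item))

cache_keys = BloomFilter()
cache_keys.add("user:42")
print(cache_keys.might_contain("user:42"))    # True (maybe present)
print(cache_keys.might_contain("user:9999"))  # Almost certainly False, so skip the database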
Cache Coherency
In multi-core systems or distributed clusters, the technology must solve the Coherency Problem. If Core A changes a piece of data, how does Core B know its local copy is now garbage?
Hardware caches use the MESI Protocol to manage this handshake, and distributed caches apply the same invalidation idea in software.
The Four States
Modified - The data is only in this cache and it is “dirty” (changed). It must be written back to memory eventually.
Exclusive - The data is only in this cache and it matches the main memory.
Shared - The data is in multiple caches and they all match the main memory.
Invalid - This copy is old and must be deleted.
When a core needs to modify a line that other caches hold, the technology broadcasts an Invalidate message to those cores, ensuring that no two parts of the system ever work with different versions of the same truth.
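The sketch below models these transitions for a single cache line shared between cores. Real coherence is implemented in the cache controller hardware; the class and method names here are invented for illustration, and the write-back of dirty data is left out.

MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class CacheLine:
    def __init__(self):
        self.state = {}  # core id -> MESI state; absent means Invalid

    def read(self, core):
        holders = [c for c, s in self.state.items() if s != INVALID and c != core]
        if holders:
            # Another core already holds the line: everyone drops to Shared.
            for c in holders:
                self.state[c] = SHARED
            self.state[core] = SHARED
        else:
            self.state[core] = EXCLUSIVE

    def write(self, core):
        # Broadcast Invalidate: every other copy is now garbage.
        for c in self.state:
            if c != core:
                self.state[c] = INVALID
        self.state[core] = MODIFIED

line = CacheLine()
line.read("Core A")    # Core A: Exclusive
line.read("Core B")    # Both cores: Shared
line.write("Core A")   # Core A: Modified, Core B: Invalid
print(line.state)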
Advanced Eviction (W-TinyLFU)
Since cache memory is limited, you must eventually delete old data. Simple LRU (Least Recently Used) is easily tricked by a “scan”, where a background task reads every file once and wipes out the entire useful cache.
Modern high-scale systems use W-TinyLFU. This technology divides the cache into a Window for new items and a Main Space for long-term items.
The Frequency Sketch
When a new item wants to enter the Main Space, it must “audition” against the item currently slated for eviction. The system uses a Count-Min Sketch to estimate how many times each item has been used recently. The item with the higher frequency wins and stays in memory, which ensures that the cache only keeps data that is actually popular over time.
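A compact way to picture the audition is the sketch below: a tiny Count-Min Sketch estimates recent frequencies, and an admission check lets the candidate in only when it beats the victim. The widths, depths, and function names are illustrative assumptions, not the parameters of any real W-TinyLFU implementation.

import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def record(self, item):
        for row in range(self.depth):
            self.rows[row][self._index(item, row)] += 1

    def estimate(self, item):
        # Taking the minimum across rows may over-count, but never under-counts.
        return min(self.rows[row][self._index(item, row)] for row in range(self.depth))

def admit(sketch, candidate, victim):
    # The "audition": the candidate enters the Main Space only if it has been
    # seen more often than the item currently slated for eviction.
    return sketch.estimate(candidate) > sketch.estimate(victim)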
The Chaos of the Cache Stampede
The most dangerous failure in caching is the Stampede. This happens when a very popular item (like a celebrity profile) expires at the exact same millisecond for 100,000 users.
Probabilistic Early Recomputation
To solve this, we use a randomizing formula to refresh the cache before it actually expires.
Time(now) - (Gap × Weight × log(random())) > Expiry
This logic ensures that as the data gets closer to expiry, the probability of a refresh increases. One single user request will trigger the refresh in the background while everyone else continues to get the slightly old data. The “herd” never reaches the database, and the system stays online.
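The check below is a sketch of that formula in Python. Here “Gap” is read as the time the recomputation takes and “Weight” as a tunable aggressiveness factor; those interpretations, and the function name, are assumptions for illustration.

import math
import random
import time

def should_refresh_early(expiry_ts, recompute_seconds, weight=1.0):
    # log(random()) is negative, so the subtraction pushes "now" forward.
    # The closer we get to expiry, the more likely a request wins the refresh.
    draw = random.random() or 1e-12  # guard against log(0)
    return time.time() - (recompute_seconds * weight * math.log(draw)) > expiry_ts

# Example: a value that takes ~2 seconds to rebuild and expires 5 seconds from now.
expiry = time.time() + 5
if should_refresh_early(expiry, recompute_seconds=2.0):
    pass  # one request rebuilds in the background; everyone else keeps the cached copy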
Conclusion
Caching is not a simple “plug and play” tool. It is a deep engineering discipline involving AMAT math, SRAM physics, Bloom Filter probabilities, and MESI consistency handshakes. By mastering the technology of memory hierarchies, you move from just “using a cache” to “engineering a system” that can survive the highest scales of the modern internet.



Absolutely brilliant breakdown of the hardware layer underneath caching. The SRAM vs DRAM distinction (6 transistors versus 1 transistor plus capacitor) is something I wish more software engineers understood when they talk about performance. At my last job we had a caching issue where folks kept tuning algorithms but never considered how the refresh cycle on DRAM was the actual bottleneck.