Caching Data For Optimal Performance

Data is typically stored in a persistent store, i.e., a database. But frequent fetches from the database make data loads slow, so the data needs to be cached closer to compute to reduce load time.

Data can be cached in application memory or in an external in-memory store such as Redis.

Here I describe the caching strategies I have used to store data in application memory, in the context of a stateless Java application that depends on a large (100 GB+) financial data set that changes in real time.

Subscribe to data change events from source services over Kafka and store the current data in memory

The idea is that the external services that own the data (stock prices, for example) publish each change to a pub/sub service such as Kafka. The application consumes these events and stores the data in memory, so it always has the current state of the data available locally. If the application requests data that isn’t in memory, which can happen if an update was missed, it should fall back to a REST call to the owning service, retrieve the data, and add it to the cache.
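A minimal sketch of the in-memory side of this pattern, using only the JDK. The `PriceUpdate` record, the `EventDrivenPriceCache` class, and the `ownerLookup` function are hypothetical names: `ownerLookup` stands in for the REST call to the owning service, and the Kafka consumer loop is left out so the example runs without the kafka-clients dependency. In a real consumer, each record returned by `KafkaConsumer.poll(...)` would be parsed into an event and passed to `onEvent`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical change event published by the owning service, e.g. a stock price update. */
record PriceUpdate(String symbol, double price) {}

/** Latest known price per symbol, kept current by change events (illustrative sketch). */
class EventDrivenPriceCache {
    private final Map<String, Double> prices = new ConcurrentHashMap<>();
    // Hypothetical stand-in for a REST call to the service that owns the data.
    private final Function<String, Double> ownerLookup;

    EventDrivenPriceCache(Function<String, Double> ownerLookup) {
        this.ownerLookup = ownerLookup;
    }

    /** Called for each change event consumed from the topic; the last event applied wins. */
    void onEvent(PriceUpdate update) {
        prices.put(update.symbol(), update.price());
    }

    /** Serves from memory; on a miss (e.g. a missed event), loads from the owner and caches the result. */
    Double get(String symbol) {
        return prices.computeIfAbsent(symbol, ownerLookup);
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the loader at most once per missing key, so concurrent readers that miss on the same symbol trigger a single fallback call. Its Javadoc also warns that other updates may block while the computation runs, so a slow REST call in the loader is a trade-off worth measuring.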

How to cache the data in application memory

Guava’s cache (now superseded by Caffeine) provides an excellent mechanism to cache data in application memory. The idea is to keep recently used data in memory and evict data that hasn’t been used recently, to avoid memory pressure.
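Caffeine is a third-party library, so as a dependency-free illustration of the same idea (keep recently used entries, evict the rest) here is a size-bounded least-recently-used cache built on `java.util.LinkedHashMap`. `LruCache` is a hypothetical name. Caffeine’s actual eviction policy (Window TinyLFU) is more sophisticated than plain LRU, and Caffeine caches are thread-safe while this sketch is not; with Caffeine, a comparable bound is set via `Caffeine.newBuilder().maximumSize(...)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal size-bounded LRU cache on top of LinkedHashMap's access order.
 * Not thread-safe; an illustrative stand-in for a library cache such as Caffeine.
 */
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-used entry once the bound is exceeded.
        return size() > maxEntries;
    }
}
```

Because an access-ordered `LinkedHashMap` treats even `get` as a structural modification, sharing this across threads needs external locking, for example `Collections.synchronizedMap(new LruCache<>(10_000))`, at the cost of contention.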

Using an external in-memory cache (combining read-through and write-through)

A combination of read-through and write-through caching is a good approach. Read-through gives quick access to the data: on a cache miss, the entry is loaded from the database and added to the cache. On its own, a read-through cache has to be designed to mark entries as stale when the database changes, so it works better paired with write-through, which keeps database and cache writes synchronized.

To keep the cache consistent with the database, we write to the database first and then invalidate (delete) the cache entry. Invalidating after the write prevents stale data from lingering in the cache: the next client to request the data won’t find it in the cache, so it loads the current value from the database and adds it back to the cache.

The Facebook paper on scaling Memcache, included in the references below, describes a mechanism called a lease, and the MIT lecture (also in the references) explains it further. Essentially, when a client misses in Memcache it is given a lease for that key, and only the client holding the lease can write the key back into the cache. Other clients reading the same key are told to wait and retry instead of all going to the database, and a write whose lease has been invalidated, because the key was deleted after the lease was issued, is rejected. Leases are time-bound and expire after a timeout. Together, these rules prevent the thundering-herd problem and the races between concurrent writes that could otherwise leave out-of-order data in the cache.
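The toy model below puts these pieces together in a single process: a read path that fills the cache only when the caller holds a lease, and a write path that updates the database first and then invalidates the cache entry. All class and method names are hypothetical, a `HashMap` stands in for the database, and in a real deployment leases are enforced by the cache server, not by client code. The sketch also omits the lease timeout for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy, single-process model of a cache with Memcache-style leases (hypothetical names; no lease expiry). */
class LeaseCache {
    /** A hit carries a value; a miss carries a lease token, or 0 if another client already holds the lease. */
    record Lookup(String value, long leaseToken) {}

    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> leases = new HashMap<>(); // key -> outstanding lease token
    private long nextToken = 1;

    synchronized Lookup get(String key) {
        String cached = values.get(key);
        if (cached != null) {
            return new Lookup(cached, 0);
        }
        if (leases.containsKey(key)) {
            return new Lookup(null, 0); // another client is filling this key: wait and retry
        }
        long token = nextToken++;
        leases.put(key, token);
        return new Lookup(null, token);
    }

    /** Accepts a fill only if the caller's lease is still the outstanding one. */
    synchronized boolean setWithLease(String key, String value, long token) {
        Long outstanding = leases.get(key);
        if (outstanding == null || outstanding != token) {
            return false; // lease was invalidated by a delete, or never issued
        }
        leases.remove(key);
        values.put(key, value);
        return true;
    }

    /** Invalidation: drops the cached value and any outstanding lease for the key. */
    synchronized void delete(String key) {
        values.remove(key);
        leases.remove(key);
    }
}

/** Read-through on the read path; write to the database, then invalidate, on the write path. */
class CachedRepository {
    private final LeaseCache cache;
    private final Map<String, String> db; // stand-in for the database

    CachedRepository(LeaseCache cache, Map<String, String> db) {
        this.cache = cache;
        this.db = db;
    }

    String read(String key) {
        LeaseCache.Lookup lookup = cache.get(key);
        if (lookup.value() != null) {
            return lookup.value();
        }
        String fresh = db.get(key);
        if (lookup.leaseToken() != 0) {
            cache.setWithLease(key, fresh, lookup.leaseToken());
        }
        // Without a lease, a real client would wait briefly and retry the cache;
        // this sketch reads the database directly but does not fill the cache.
        return fresh;
    }

    void write(String key, String value) {
        db.put(key, value); // 1. persist first
        cache.delete(key);  // 2. then invalidate so the next read reloads current data
    }
}
```

The case worth tracing is the stale-set race: client A misses and receives a lease, then a write updates the database and invalidates the key before A’s fill arrives. Because the delete also cancels A’s lease, `setWithLease` rejects A’s now-stale value, and the next reader loads the current value from the database.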

References:

Azure cache-aside pattern:

https://docs.microsoft.com/en-us/azure/architecture/patterns/cache-aside

This is an excellent paper on how Facebook scaled Memcache

https://www.cs.bu.edu/~jappavoo/jappavoo.github.com/451/papers/memcache-fb.pdf

This MIT lecture video explains in detail the data-consistency concepts from the Facebook Memcache paper.

A great video on Yahoo’s PNUTS data store. The presenter answers quite a few interesting questions during the presentation.

Here is the paper on PNUTS.

https://sites.cs.ucsb.edu/~agrawal/fall2009/PNUTS.pdf

A nice article on various caching strategies

https://codeahoy.com/2017/08/11/caching-strategies-and-how-to-choose-the-right-one/

One caveat about the comparison below: it describes Memcached as multi-threaded and Redis as single-threaded, but Redis 6 added support for multi-threaded I/O.

https://aws.amazon.com/elasticache/redis-vs-memcached/

A nice article by Netflix that explains how they ran an active-active cache to gain resiliency.

https://netflixtechblog.com/active-active-for-multi-regional-resiliency-c47719f6685b

A nice article on how Netflix built EVCache, a resilient in-memory caching layer on top of Memcached.

https://netflixtechblog.com/announcing-evcache-distributed-in-memory-datastore-for-cloud-c26a698c27f7