On-Premise Data, AKA – “Cloud Cache”

I remember studying memory caching techniques in my computer architecture course in college, learning how memory is organized and about overall caching strategies. The Level 1 (L1), or primary, cache is the fastest level of cache memory. It sits directly on the processor (CPU) and is limited in size, so it holds only data that is accessed often or that is considered critical for quick access.

A quick refresher from Wikipedia:

Cache Entries

Memory is split into “locations”, which correspond to cache “lines”. Each data access involving the cache uses this size, which tends to be larger than the largest CPU request size.

Each location in memory can be identified by a physical memory address. When memory is copied to the cache, a cache entry is created. It can include:

the requested memory location (now called a tag)

a copy of the data

When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache. The cache checks for the contents of the requested memory location in any cache lines that might contain that address. If the processor finds that the memory location is in the cache, a cache hit has occurred (otherwise, a cache miss).

In case of a cache hit, the processor immediately reads or writes the data in the cache line.

In case of a cache miss, the cache allocates a new entry, and copies in data from main memory. Then, the request is fulfilled from the contents of the cache.
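To make the hit/miss flow above concrete, here is a minimal sketch of a toy direct-mapped cache. The class name, the four-line size, and the use of the full address as the tag are my own simplifying assumptions for illustration, not how any particular CPU is built.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one cache line."""

    def __init__(self, main_memory, num_lines=4):
        self.main_memory = main_memory      # backing store: a list indexed by address
        self.num_lines = num_lines          # assumed size, chosen only for the demo
        self.lines = [None] * num_lines     # each entry is (tag, data) or None
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines    # the one line that might contain this address
        entry = self.lines[index]
        if entry is not None and entry[0] == address:
            self.hits += 1                  # cache hit: serve straight from the line
            return entry[1]
        self.misses += 1                    # cache miss: allocate an entry, copy from memory
        data = self.main_memory[address]
        self.lines[index] = (address, data)
        return data                         # the request is fulfilled from the cache contents


memory = [value * 10 for value in range(16)]
cache = DirectMappedCache(memory)
results = [cache.read(address) for address in (3, 3, 7, 3)]
```

Addresses 3 and 7 compete for the same line in this sketch, so the last read of address 3 misses again even though it was cached earlier.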

Cache Performance

The proportion of accesses that result in a cache hit is known as the hit rate, and can be a measure of the effectiveness of the cache for a given program or algorithm.

Read misses delay execution because they require data to be transferred from memory, which is much slower than the cache itself. Write misses may occur without such penalty, since the processor can continue execution while data is copied to main memory in the background.
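As a rough illustration of why hit rate matters, the sketch below computes a hit rate and the resulting average access time. The latency figures (1 ns for a hit, 100 ns extra for a miss) and the function names are assumptions picked for the example, not measurements.

```python
def hit_rate(hits, misses):
    """Proportion of accesses that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


def average_access_time(rate, hit_time_ns, miss_penalty_ns):
    """Every access pays the hit time; misses additionally pay the miss penalty."""
    return hit_time_ns + (1.0 - rate) * miss_penalty_ns


# Hypothetical workload: 95 hits and 5 misses out of 100 accesses.
rate = hit_rate(hits=95, misses=5)
avg_ns = average_access_time(rate, hit_time_ns=1.0, miss_penalty_ns=100.0)
```

With these assumed numbers, a 5% miss rate already pushes the average access time to roughly six times the raw hit time, which is why even small improvements in hit rate pay off.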

Instruction caches are similar to data caches, but the CPU only performs read accesses (instruction fetches) to the instruction cache. (With Harvard-architecture CPUs, instruction and data caches can be separated for higher performance, but they can also be combined to reduce the hardware overhead.)

Until recently, the cloud was primarily thought of as a place to store backup data. This has changed significantly over the past 18 months. With the emergence of Big Data, organizations simply cannot afford to store copious amounts of data on local hardware. The issue isn’t just the size of the data: elastic storage provisioning models in the cloud make it easy to right-size storage and pay only for what you need, something you simply cannot do on-premise. If you look at how digital music, social media, and online e-commerce function in 2012, you see that (big) data only makes sense when it lives wholly in the cloud.

On-Premise Caching

With data living somewhere else, serving applications and services that require real-time high availability and low latency can be a real challenge. The solution follows the same concept as the L1 cache: I predict that on-premise storage will simply become a form of high-speed cache. Systems will store only a small subset of big data locally. I’m already seeing this with many cloud-hosted audio services, which stream MRU (most recently used) or MFU (most frequently used) datasets to local devices for fast access. What is interesting in this model is the ability to access data even when cloud access is not currently available (think of your mobile device in airplane mode).
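Here is a minimal sketch of what such an on-premise cache could look like, assuming a hypothetical `fetch_from_cloud` callable and a tiny capacity. It keeps the most recently used items by evicting the least recently used one, and it keeps serving cached items when the cloud is unreachable. None of these names come from a real service.

```python
from collections import OrderedDict


class CloudUnavailable(Exception):
    """Raised by the hypothetical cloud fetcher when there is no connectivity."""


class OnPremiseCache:
    def __init__(self, fetch_from_cloud, capacity=3):
        self.fetch_from_cloud = fetch_from_cloud  # hypothetical callable: key -> value
        self.capacity = capacity
        self.items = OrderedDict()                # ordered oldest -> most recently used

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)           # mark as most recently used
            return self.items[key]                # local hit: no cloud access needed
        value = self.fetch_from_cloud(key)        # local miss: may raise CloudUnavailable
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)        # evict the least recently used item
        return value


state = {"online": True}


def fetch(key):
    # Stand-in for a real cloud request; fails when "airplane mode" is on.
    if not state["online"]:
        raise CloudUnavailable(key)
    return f"audio-bytes-for-{key}"


cache = OnPremiseCache(fetch, capacity=2)
cache.get("track-1")
cache.get("track-2")

state["online"] = False                           # simulate airplane mode
offline_hit = cache.get("track-1")                # still served from the local cache

try:
    cache.get("track-3")                          # never cached, so it cannot be served
    offline_miss_failed = False
except CloudUnavailable:
    offline_miss_failed = True
```

An MFU policy would track access counts instead of recency; I used recency here only because `OrderedDict` makes it short.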

I have no doubt that, at some point, on-premise storage will simply be considered a “cloud cache”. Don’t be surprised if storage on a LAN is considered an L1 cache and geographically proximal intermediary cloud storage an L2 cache, before finally reaching the true source of the data, which, by the way, is probably already federated across many data stores optimized for this kind of access.
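The tiering I have in mind can be sketched as a lookup that walks from the nearest store outward. Everything here is a hypothetical illustration: plain dictionaries stand in for the LAN store, the regional cloud store, and the origin, and there is no eviction or expiry.

```python
def tiered_read(key, tiers, origin):
    """Look up key in each tier, nearest first, then fall back to the origin.

    tiers  -- list of (name, dict) pairs ordered from nearest to farthest
    origin -- dict standing in for the true source of the data
    Returns (value, name_of_store_that_served_it).
    """
    for position, (name, store) in enumerate(tiers):
        if key in store:
            value = store[key]
            for _, nearer in tiers[:position]:
                nearer[key] = value               # promote into the nearer tiers
            return value, name
    value = origin[key]                           # raises KeyError if it exists nowhere
    for _, store in tiers:
        store[key] = value                        # fill every tier on the way back
    return value, "origin"


lan = {}                                          # "L1": storage on the LAN
regional = {"report.csv": "regional copy"}        # "L2": nearby cloud storage
origin = {"report.csv": "regional copy", "video.mp4": "origin copy"}
tiers = [("lan", lan), ("regional", regional)]

first = tiered_read("report.csv", tiers, origin)  # served by the regional tier
second = tiered_read("report.csv", tiers, origin) # now served from the LAN
third = tiered_read("video.mp4", tiers, origin)   # goes all the way to the origin
```

After the first read the item has been promoted to the LAN tier, so the second read never leaves the building.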

* Did you know Hadoop (distributed file system) was the name of the creator’s child’s stuffed elephant? Just thought I would throw that out there.