Hot or Not? Evolving Nimble’s Cache Architecture

May 27, 2015

By Senthil Ramamoorthy – Principal Engineer

Flash storage is common in the data center today, yet Nimble Storage has emerged as the only enterprise data storage vendor with the technology to compete head-to-head against legacy market leaders EMC and NetApp.

A key factor has been Nimble’s proprietary CASL (cache accelerated sequential layout) file system, including our unique cache architecture. The team I work on recently added some improvements to the way our system determines how “hot” each block is, which has markedly improved the results our customers are seeing.

For context, rather than just providing flash caching, CASL is built on top of data layout algorithms that turn random writes into full-stripe sequential writes on dense, low-cost hard disk drives (HDDs). This lets us deliver more than 10,000 IOPS (input/output operations per second) per 7,200 RPM HDD – better write performance than an SSD (solid state drive) in many all-flash arrays! Better still, it enables us to leverage flash where it is most effective – serving random reads with a dynamic flash cache.
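To illustrate the idea, here is a toy sketch – not Nimble's actual CASL code, and with a made-up stripe size – of how randomly addressed writes can be buffered and then flushed as one sequential, full-stripe write to a log, while an index tracks where each logical block landed:

```python
STRIPE_BLOCKS = 8  # hypothetical stripe size, in blocks

class LogStructuredWriter:
    def __init__(self):
        self.buffer = []   # pending (logical_block, data) writes
        self.log = []      # simulated sequential on-disk log
        self.index = {}    # logical block -> position in the log

    def write(self, logical_block, data):
        # Incoming writes can target any logical address.
        self.buffer.append((logical_block, data))
        if len(self.buffer) >= STRIPE_BLOCKS:
            self.flush_stripe()

    def flush_stripe(self):
        # All buffered writes land contiguously, regardless of their
        # logical addresses -- one sequential full-stripe write.
        start = len(self.log)
        for offset, (lb, data) in enumerate(self.buffer):
            self.index[lb] = start + offset
            self.log.append(data)
        self.buffer.clear()

    def read(self, logical_block):
        return self.log[self.index[logical_block]]
```

The disk only ever sees large sequential writes, which is what lets slow HDDs absorb a random write workload; the cost is the extra index, which the read path consults.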

Nimble’s Adaptive Flash platform uses a cache architecture with two main parts: admission control determines which blocks will be admitted to the cache, and eviction control determines which blocks need to be evicted from the cache. In looking to further improve the performance of these processes, we had to ensure that the enhancements would be applicable to the widest variety of customer workloads, and that they would not negatively affect any existing workloads.
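The two halves can be sketched with a minimal example – hypothetical, not Nimble's actual policy: an admission filter that only caches a block on its second recent access (so one-time reads don't pollute flash), and eviction of the least-recently-used block when the cache is full:

```python
from collections import OrderedDict

class FlashCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # block -> data, ordered by recency
        self.seen_once = set()      # miss candidates not yet admitted

    def access(self, block, fetch):
        if block in self.cache:
            self.cache.move_to_end(block)       # hit: refresh recency
            return self.cache[block]
        data = fetch(block)                     # read miss: go to disk
        if block in self.seen_once:
            # Admission control: a second access suggests the block
            # is hot enough to deserve flash space.
            self.seen_once.discard(block)
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # eviction control: drop LRU
            self.cache[block] = data
        else:
            self.seen_once.add(block)
        return data
```

In a real array both decisions would be driven by richer signals than recency alone, but the division of labor – a gate on the way in, a victim selector on the way out – is the same.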

Our team focused specifically on the way we maintain the temperature map of the blocks in the cache, and on the eviction algorithm. After many months of development and testing, we started deploying the new caching technology late last year, and have been monitoring it carefully ever since using Nimble’s InfoSight analytics tool. More than half of all installed Nimble arrays are now running the new code, and are seeing significant improvements in cache effectiveness, based on many metrics across the installed base.
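A temperature map is easiest to picture with a simplified sketch – again hypothetical, not our shipping algorithm: each cached block carries a heat counter that is bumped on access and periodically decayed, so that heat reflects recent access frequency rather than lifetime totals, and eviction targets the coldest block:

```python
class TemperatureMap:
    def __init__(self):
        self.temp = {}  # block id -> temperature

    def touch(self, block):
        # Accessing a block heats it up.
        self.temp[block] = self.temp.get(block, 0) + 1

    def decay(self):
        # Periodic aging: halve every temperature so blocks that
        # stop being accessed gradually cool off.
        for block in self.temp:
            self.temp[block] //= 2

    def coldest(self):
        # Eviction candidate: the block with the lowest temperature.
        return min(self.temp, key=self.temp.get)
```

The interesting engineering questions hide inside choices like these – how finely to track heat, how fast to decay it, and how cheaply the coldest block can be found at scale – which is where most of our development and testing time went.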

First, consider read latency. As the graph below shows, customers saw read latency decrease across the installed base.

The new caching algorithms also decreased the read miss rate across the installed base:

These charts clearly show the effects of the new cache architecture, and highlight the historical and predictive insights available to our customers via InfoSight.

Kudos to my colleagues and the rest of the engineering team for their work throughout the full cycle of identifying a complex problem, researching possible solutions, discovering innovative approaches, and then building and testing new code that delivered these capabilities to our customers.