
Observing cache hit/miss rates

At InterModal Data we build large systems with many components running in highly available configurations 24x7x365. For such systems, understanding how the components are working is very important. Our analytics system measures and records thousands of metrics from all components and makes these measurements readily available for performance analysis, capacity planning, and troubleshooting. Alas, having access to the records of hundreds of thousands of metrics is not enough; we need good, concise methods of showing them in meaningful ways. In this post, we'll look at the cache hit/miss data for a storage system and a few methods of observing the data.

In general, caches exist to optimize the cost vs performance of a system. For storage systems in particular, we often see RAM working as cache for drives. Drives are slow, relatively inexpensive ($/bit) and persistently store data even when powered off. By contrast, RAM is fast, relatively expensive, and volatile. Component and systems designers balance the relatively high cost of RAM against the lower cost of drives while managing performance and volatility. For the large systems we design at InterModal Data, the cache designs are very important to overall system scalability and performance.

Once we have a cache in the system, we're always interested to know how well it is working. An over-designed cache just raises the system cost while adding little benefit. One metric often used for this analysis is the cache hit/miss ratio. Hits are good, misses are bad. But it is impossible to always have 100% hits when volatile RAM is used. We can easily plot this ratio over time as our workload varies.
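As a minimal sketch of the metric itself, assuming the analytics system reports per-interval hit and miss counts (the function name `hit_ratio` and the sample values are hypothetical, not taken from our system):

```python
from typing import Optional


def hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no accesses."""
    accesses = hits + misses
    if accesses == 0:
        return None  # an idle interval has no meaningful ratio
    return hits / accesses


# Hypothetical per-interval (hits, misses) counts, not measured data.
samples = [(950, 50), (3, 7), (0, 0)]
ratios = [hit_ratio(h, m) for h, m in samples]
```

Returning `None` for an idle interval is a design choice in this sketch: it avoids a division by zero and keeps an idle period from being plotted as either a perfect or a terrible ratio.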

In the following graphs, the data backing the graph is identical. The workload varies over approximately 30 hours.

Traditionally, this is tracked as the hit/miss ratio, which is easy to plot.

Here we see lots of hits (green = good) with a few cases where the misses (red = bad) seem to rear their ugly heads. Should we be worried? We can't really tell from this graph because there is only the ratio, no magnitude. Perhaps the system is really idle and only a handful of misses are measured. When presented with only the hit/miss ratio, it is impractical to make any analysis; the magnitude is also needed. Many analysis systems then show you the magnitudes stacked as below.

In this view, the number of accesses is the top of the stacked lines. Under each access point we see the hits and misses expressed as magnitudes. This is better than the ratio graph. Now we can see that the magnitudes change from a few thousand accesses/second to approximately 170,000 accesses/second. We can also see that there were times when we saw misses, but during those times the number of accesses was relatively small. If the ratio graph caused some concern, this graph removes almost all of that concern.
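The stacked view can be sketched as follows. This is an illustration under assumptions, not the code behind the graph: `stacked_layers` is a hypothetical helper and the sample values are invented to show how a worrying ratio can coincide with a tiny magnitude.

```python
def stacked_layers(hits, misses):
    """Return (hits_layer, accesses_top) for a stacked plot.

    Hits are drawn from 0 up to the hit count; misses are stacked on top,
    so the upper edge of the stack is the total number of accesses.
    """
    if len(hits) != len(misses):
        raise ValueError("hits and misses must have the same length")
    tops = [h + m for h, m in zip(hits, misses)]
    return list(hits), tops


# Hypothetical accesses/second samples, not measured data.
hits = [20, 168_300]
misses = [30, 1_700]
_, tops = stacked_layers(hits, misses)

for h, m, top in zip(hits, misses, tops):
    print(f"accesses={top:>7}  misses={m:>5}  miss ratio={m / top:.0%}")
```

In this made-up data the first interval has a 60% miss ratio on only 50 accesses, while the second has a 1% miss ratio on 170,000 accesses. The ratio alone would flag the wrong interval as the one to look at.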

However, in this graph we also lose the ability to discern the hit/miss ratio because of the stacking. If we had two or more levels of cache and wanted to see the overall cache effectiveness, we could quickly lose the details in the stacking.

Recall that hits are good (green) and misses are bad (red). Also consider that Wall Street has trained us to like graphs that go "up and to the right" (good). We can use this to our advantage and more easily separate the good from the bad.

Here we've graphed the misses as negative values. Hits go up to the top and are green (all good things). Misses go down and are red (all bad things). The number of accesses is the spread between the good and the bad, so as the spread increases, more work is being asked of the system. In this case we can still see that the cache misses are a relatively small portion of the overall access and, more importantly, occur early in time. As time progresses the hit ratio and accesses both increase for this workload. This is a much better view of the data.
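A minimal sketch of this transformation, assuming per-interval hit and miss rates are available as lists. The helper names are hypothetical, and the plotting function assumes matplotlib is installed; it is not the tool used to produce the graphs in this post.

```python
def diverging_series(hits, misses):
    """Return (up, down, spread): hits as positive values, misses negated,
    and the spread between them, which equals the total accesses."""
    if len(hits) != len(misses):
        raise ValueError("hits and misses must have the same length")
    up = list(hits)
    down = [-m for m in misses]
    spread = [u - d for u, d in zip(up, down)]
    return up, down, spread


def plot_diverging(times, hits, misses):
    """Draw hits above zero in green and misses below zero in red."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    up, down, _ = diverging_series(hits, misses)
    fig, ax = plt.subplots()
    ax.fill_between(times, 0, up, color="green", label="hits")
    ax.fill_between(times, down, 0, color="red", label="misses")
    ax.axhline(0, color="black", linewidth=0.5)  # zero line separates good from bad
    ax.set_ylabel("accesses/second")
    ax.legend()
    return fig


# Hypothetical samples, not measured data.
up, down, spread = diverging_series([100, 5_000, 170_000], [40, 200, 0])
```

Because hits and misses each keep their own baseline at zero, neither series is distorted by the other, and the vertical distance between the two curves still gives the total load.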

Here is another example, this time the SSD read cache from the same experiment. First, the hit/miss ratio graph.

If this were the only view you saw, you should be horrified: too much red, and red is bad! Don't panic.

This graph clearly shows the story in the appropriate context. There are some misses and hits, but the overall magnitude is very low, especially when compared to the RAM cache graph of the same system. No need to panic: the SSD cache is doing its job, though it is not especially busy compared to the RAM cache.

This method scales to multiple cache levels and systems -- very useful for the large, scalable systems we design at InterModal Data.
