EMC FAST Cache: Increase Performance While Reducing Disk Drive Count

In a nutshell, FAST Cache is a collection of Enterprise Flash Drives that sits between the storage system's DRAM Cache and its disks. FAST Cache holds a large percentage of the most frequently accessed data on high-performance Flash drives. While the response time of DRAM Cache is measured in nanoseconds, the response time of FAST Cache is measured in microseconds, compared with several milliseconds for spinning disks.

In my previous post on FAST Cache I discussed its benefits during AV scans of virtual machines and on a small database workload, which you can find over here. If you thought that was awesome, hold your breath until you see the performance results from another, much larger database workload illustrated in this post.

The results are from a database workload that was migrated to an EMC storage system with FAST Cache, and to be honest we never expected such extreme performance. The LUN in question is a single large database LUN allocated to one application, and the performance reports come from Navisphere Analyser, the built-in tool for performance analysis of storage workloads hosted on EMC environments.

The graph below shows the utilization of the LUN: it is accessed continuously throughout the day and hence sits at almost 100% utilization for most of it. Nothing to worry or cheer about, but it shows that the database is accessed regularly.

The next graph shows the total number of IOPS served by the LUN: it consistently serves close to 8,000 IOPS, peaking near 10,000 IOPS, indicating a heavy disk load on the database LUN. The read:write ratio for this specific workload was 70%:30%. So how many disks do you think sit at the back end to serve that many IOPS with a consistently sub-6ms response time?

The graph below compares read IOPS with SP Cache read hits/sec. Why show this? Because this is what the performance would have been on older arrays without FAST Cache, where we had to provision enough back-end disks to serve the difference in IOPS. Not any more, thanks to the awesome FAST Cache!

The next graph compares total read IOPS with FAST Cache read hits/sec, and here is the difference! Look at the awesomeness of FAST Cache: almost 85% of read hits are served from FAST Cache, which greatly reduces the number of back-end disks required.

Shown below is the FAST Cache read hits/sec ratio: consistently, 85% of read hits have been served from FAST Cache. That is an amazing percentage of the load being handled by FAST Cache.

Now comes the real differentiator of FAST Cache from its competitors: the ability to serve write IOPS from cache as well. Almost 4,000 write IOPS at peak were served by FAST Cache, with consistently more than 75% of write hits coming from cache. Translated into back-end disks saved, the reduction is huge.
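To see why write hits matter so much, here is a rough, illustrative sketch of the arithmetic. The RAID-5 write penalty of 4 and the ~180 random IOPS per 15K drive are standard rules of thumb, not figures from the post, so treat the output as a back-of-envelope estimate only:

```python
# Illustrative arithmetic (assumed figures): every host write absorbed by
# FAST Cache avoids the RAID-5 write penalty of 4 back-end disk operations
# (read data + read parity + write data + write parity).
RAID5_WRITE_PENALTY = 4      # disk ops per host write on RAID-5 (rule of thumb)
PER_DISK_IOPS = 180          # rough random-IOPS ceiling for one 15K RPM drive

peak_write_iops = 4000       # peak write IOPS seen in the graph
cache_write_hit = 0.75       # ~75% of write hits served by FAST Cache

absorbed_writes = peak_write_iops * cache_write_hit
backend_ops_avoided = absorbed_writes * RAID5_WRITE_PENALTY
disks_saved = backend_ops_avoided / PER_DISK_IOPS

print(f"Writes absorbed by FAST Cache: {absorbed_writes:.0f}/s")
print(f"Back-end disk ops avoided:     {backend_ops_avoided:.0f}/s")
print(f"Equivalent 15K disks saved:    ~{disks_saved:.0f}")
```

Under these assumptions, 3,000 absorbed writes/sec translate into 12,000 avoided back-end operations per second, which is why write caching shrinks the spindle count so dramatically.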

Disk utilization, i.e. the load on the back-end disks that form the RAID Group of the LUN in question, is shown below. Leaving aside a short period where it peaked at 65%, it stayed below 32% overall, which is an amazing number considering that the LUN on these disks is serving 8,000+ IOPS. So, the question again: how many disks are in this RAID Group?

As you can see, there are only ten 15K RPM disks in RAID-5 at the back end, and they are serving an amazing 10,000 IOPS with less than 6ms response time!

Below is the response time of the disks themselves: even they stay under 12ms, which, considering the load, is an amazing figure.

Finally, the IOPS served from disk: around 320 peak IOPS were served from each disk, which sums to 3,200 IOPS from disk (320 IOPS × 10 disks), with the rest (roughly 7,000 IOPS) coming from cache.

How has FAST Cache helped ?

8,000 consistent IOPS and 10,000+ peak IOPS from a LUN carved out of 10 disks (15,000 RPM) in RAID-5 is something one could never have imagined. FAST Cache has shown time and again, across various workloads, that if we understand the nature of the workload and configure FAST Cache accordingly, we are in for a lot of AWESOMENESS! Just imagine: achieving 10,000 IOPS on our older storage systems, assuming an 80% read / 20% write IO distribution and 30% SP cache hits, would have required approximately 55-65 disks, yet we have achieved the same performance on 10 disks in RAID-5.
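The 55-65 disk estimate for an older array can be reproduced with a simple sizing sketch. The assumptions here (the 30% SP cache hit applying to all host IO, a RAID-5 write penalty of 4, and ~180 random IOPS per 15K drive) are mine, chosen to match the figures in the post, not vendor-published numbers:

```python
# Back-of-envelope disk sizing for a classic array WITHOUT FAST Cache.
def disks_required(host_iops, read_fraction, sp_cache_hit,
                   raid_write_penalty=4, per_disk_iops=180):
    """Estimate spindles needed to serve a host workload.

    Assumes the SP cache hit rate applies to all host IO, and that each
    write reaching the spindles costs `raid_write_penalty` disk operations.
    """
    miss_iops = host_iops * (1 - sp_cache_hit)   # IO that must hit spindles
    reads = miss_iops * read_fraction
    writes = miss_iops * (1 - read_fraction)
    backend_ops = reads + writes * raid_write_penalty
    return backend_ops / per_disk_iops

# 10,000 IOPS, 80% read / 20% write, 30% SP cache hits
print(round(disks_required(10_000, 0.80, 0.30)))  # -> 62, within the 55-65 range
```

With 7,000 cache-miss IOPS split 80/20 and the write penalty applied, the back end must absorb about 11,200 disk operations per second, which lands squarely inside the post's 55-65 disk estimate.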

Eager to hear your thoughts on and experiences with TheEMCStig called “FAST Cache”.

“320 Peak IOPS has been served from each disk”, yeah, except that is mechanically impossible. 165 IOPS is the measured max. To get 320, the reads would have to be sequential with the prefetch getting lucky. Given that graph I’d say the back end was doing about 70 IOPS on average while the overall array load was shy of 4,000 IOPS. In any statistical conversation you have to throw out the outliers, and while 10,000 IOPS sounds fantastic, your data clearly shows it was a special case and therefore irrelevant.

I’m not discounting the value of FAST Cache. Far from it. But a 10-disk RAID-5 can sustain ~1,500 read ops, so the cache was picking up 2,500 more. I’ll be generous and call it 2x. Now maybe EMC can try to defend their preposterous pricing for run-of-the-mill SSD drives.

Hi Matt – At the outset I want to clarify that I don’t work for EMC, nor for an EMC partner. This is a real-life case study from my environment, and it can’t be considered irrelevant: we have been running this for a year now and have been able to replicate the results in multiple environments as well.

If you note, I have also mentioned that the amount of benefit you can derive from FAST Cache depends on many parameters: the type of workload, how it is accessed, how FAST Cache is configured, and so on. Not every workload will get the same result, but we have certainly derived a lot more benefit from FAST Cache than the 2x you have concluded!

“Lucky”, “special case”, “irrelevant” or “mechanically impossible” are not terms that apply to cases backed by actual performance data.

matt

September 8, 2011 at 8:29 pm

Oh sure, in a pure sequential environment you can get more than 165 IOPS out of a 15K disk, because 165 is the perfectly-random-workload figure. Crowing about seeing 320 out of spinning rust is completely stupid when it’s patently obvious the workload from 16:30 to 18:00 is streaming, sequential access. BTW, 320 is the max IOPS under streaming conditions. What is important is that the SSDs are absorbing an average of 3,900 IOPS during the rest of the day, which is about 2.5x more than the disks by themselves could be expected to sustain. So if the benefit of SSDs is ~2.5x in your normal workload mix (I’ll even allow up to 5x under certain fortunate conditions; your data only shows 3x), how does EMC defend their preposterous pricing on the SSDs? THAT is the question nobody seems to be asking of them. $5,500+ for each 75GB SSD?! On a cost+performance/GB basis they are so grotesquely overpriced; where is the mutiny from the user community?

Hi Matt – I am not here to defend their pricing, nor am I a fan of the huge price. But everything comes at a cost, and this is not specific to EMC. As a technology it has helped most of our apps reduce response times drastically with a lower number of disks.