So far, we have gone through the various analytical metrics provided by PerfAccel. We have also seen different kinds of synthetic workload runs and their analytical information, including the metric “Amount of data cached”. These days, SSDs are very popular: despite being more expensive than their SATA/SAS counterparts in terms of $/GB, they win handily in terms of Perf/$. But the first and most important question before any storage admin is always “How much SSD?” “Shall we spend on a 128GB or 256GB or even higher capacity SSD?” Let’s see how PerfAccel helps us here by providing the correct answer even before we decide to spend $$$ on the SSDs.

Unfortunately, our applications do not have the privilege of storing all of their data on SSDs. Even if we get a 1TB SSD, we would need to manually copy the whole application dataset onto it, and that dataset may be large, say 500GB, 800GB, or even more. It is also a very common scenario that a workload accesses only a part of the whole dataset. Suppose our application has 800GB of data but works on, say, only 100GB of files scattered throughout it. In this case, copying all 800GB of data is pure waste: 700GB of expensive SSD space is wasted and simply blocked from other applications. That space could have served the working set data of 4-5 other applications.

This is where PerfAccel provides a disruptive, game-changing solution by intelligently managing and placing only the HOT DATA on the SSD. This means only the 100GB of data that is actually accessed by the application will be placed on the SSD, dynamically and automatically, by PerfAccel. It also lets other applications use the remaining cache space seamlessly without disturbing the first application. No manual data movement to the SSD is necessary; PerfAccel does everything automatically.

Now the question is: what size of SSD should we purchase? 128GB, 256GB, 512GB, 1TB, or even more? Well, PerfAccel considers this one of the most important aspects of caching and analytics, and recommends that users find the answer well before spending a dollar on SSDs. To help users find the correct size, PerfAccel provides the special Analytical mode of caching. With this mode we can have a virtual cache of up to 4TB with just a small 100GB SATA/SAS disk or any similar disk.

This mode is very helpful in understanding the working set characteristics of the application and in sizing the high-performance cache device for the application workload well before actually investing in a real SSD. Analytical mode helps predict the optimal size of the cache device for the compute grid environment and avoids expensive over-provisioning or harmful under-provisioning.

Apart from sizing, its other major objective is to provide I/O analytics, i.e. the I/O behavior of the application, at multiple levels of granularity.

How to find the cache size?

The process of sizing, i.e. finding the working set size of any application, is super-simplified by PerfAccel.

It’s as simple as described below:

1. Create an Analytical Source for the application data directory and assign the amount of virtual cache space that you think may be the maximum needed. This is an approximate, estimated value. If you are not sure, assign the maximum virtual cache of 4TB (yes, 4TB… even though we only have a smaller disk here, with PerfAccel you can have a 4TB virtual cache 🙂).

2. Run the application workload.

3. Once the workload completes, go through the various I/O traces (“Cache Usage Trace”, “Cache Cleanup Trace”, “Cache IO Trace”, “Cache Brief Trace”, etc.) to find out how much cache the workload runs have used.

4. Re-run the application workload and verify that the cache usage stays around the same size; also observe how the cache is being hit during the job run.

5. If PerfAccel triggers cache evictions, the cache size is not sufficient. In that case, increase the cache size and re-run the workload. (If the virtual size of 4TB is used, no evictions will be triggered [unless your working set size really exceeds 4TB 🙂] and we won’t reach this step.)

6. If the cache usage turns out to be, say, 200GB, then you can safely go for a 256GB SSD, as the Usage Traces make it clear that the application workload is not going to use more than 200GB.

This was for one application. If you have multiple applications, do the same thing: create an Analytical Source for each of them, assign the max virtual cache, and run the respective workloads. At the end, sum up the cache usage across all the sources. This final value is the maximum working set size of all the applications together. You should purchase an SSD with capacity slightly more than this.
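The sizing arithmetic above can be sketched in a few lines of Python. The capacity tiers and the 25% headroom factor below are illustrative assumptions for the sketch, not values that PerfAccel itself computes or recommends:

```python
# Sketch: choose an SSD capacity from observed PerfAccel cache usage.
# The capacity tiers and the 25% headroom are illustrative assumptions.
STANDARD_SSD_GB = [128, 256, 512, 1024, 2048, 4096]

def recommend_ssd_gb(usage_gb_per_source, headroom=0.25):
    """Sum the cache usage reported for each Analytical Source, add
    some headroom, and round up to the next standard SSD capacity."""
    needed = sum(usage_gb_per_source) * (1 + headroom)
    for cap in STANDARD_SSD_GB:
        if cap >= needed:
            return cap
    raise ValueError("working set exceeds the largest listed SSD")

# Single application from the example: ~200GB of usage -> a 256GB SSD.
print(recommend_ssd_gb([200]))           # -> 256
# Three applications: sum their usage, then size one shared SSD.
print(recommend_ssd_gb([200, 90, 150]))  # -> 1024
```

The same helper covers both the single-application case and the multi-application case: you just pass in one usage figure per Analytical Source.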

So, here PerfAccel helped you save some $$$ on the SSD straightaway. We will now walk through a demo of the above process to make it all very clear. So, let’s create our cache first.

Now, we will create an Analytical Source instead of the regular Writethrough Source. The Analytical Source needs a cache with at least 25GB of space. (We can override this with a lower value using -f, but Datagres does not recommend doing so.)

The Analytical Source can be created by using the option “-o analytical” in the source creation command.

We will look at both cases, i.e. limited cache size and unlimited cache size.

This is the most preferred mode when we want to fingerprint an application’s I/O workload and are not sure about its I/O profile, working set size, etc. For this, we need to provide “max” as the cache size in the source creation command.

We can clearly see here that the cache space allocated to the source is 4TB.

Now, let’s proceed with this source and run a workload. We will use filebench again to generate the workload, this time with its Webserver profile.

The workload generation will cover 200000 files, and the test will run for 2000 secs.
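For reference, a filebench run along these lines is typically driven by a small workload file; something like the sketch below, where the directory path is a placeholder and the webserver personality’s defaults are overridden to match the numbers above:

```
# Sketch of a filebench workload file; the $dir path is a placeholder.
load webserver
set $dir=/mnt/nfs/appdata
set $nfiles=200000
run 2000
```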

At this point we are not sure how much cache this workload requires. Nor does any utility exist out there that can tell us the working set size. (Does it??? Reply in the comments 🙂) But don’t worry, we have PerfAccel with us. Let’s wait for the test to end, and then we will go through the cache sizing details. The data generation will take a lot of time, as it is going to create 200000 files over the NFS share; then the test will run for 2000 secs.

We have run the test again on the pre-created files, so this time the workload simply ran, doing its I/O. Let’s see how many of the conclusions derived after the first run still hold true:

- Does not need more than 10 GB of cache space – True; usage is still only 8.4 GB.

- Does more re-reads after writing for the first time, so it’s a good candidate for caching – True; in this run there were almost 100% read hits from the cache.

- Is read intensive – True; writes/re-writes are almost none.

- Does not trigger any evictions, again showing that the cache size needed for it is around 10GB – True; no evictions this time either.

- Created around 200000 new files – False; no new files were created in this 2nd run.

- Is I/O intensive rather than attribute or metadata-operation intensive, as the attribute misses/hits are very low compared to the number of files – True; only 200001 Attribute Read Hits, i.e. roughly one per file.

So, with the 1st and 2nd rounds of conclusions, we can conclude that this particular workload can be satisfied with a small cache of around 10GB. To handle future/unknown scenarios where the workload is affected by new conditions, let’s add 10GB more, making it 20GB. So, 20GB of cache space is good enough for this workload. Now we can create a Writethrough Source with a 20GB SSD cache to get both the caching and the analytics benefits of PerfAccel.
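The sizing rule just applied can be written down explicitly. Rounding to a 10GB boundary and adding another 10GB of headroom are judgment calls made here for illustration, not values computed by PerfAccel:

```python
import math

# Sketch of the sizing rule applied above: round the observed cache
# usage up to a 10GB boundary, then add another 10GB of headroom for
# future/unknown conditions. Both choices are judgment calls.
observed_gb = 8.4                           # usage from the Cache Usage Trace
base_gb = math.ceil(observed_gb / 10) * 10  # round up -> 10
cache_gb = base_gb + 10                     # add headroom -> 20
print(cache_gb)                             # -> 20
```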

It is very clear how PerfAccel has simplified the whole process of finding the cache space sizing information for any application/workload. PerfAccel Analytical mode is not only used for sizing; it can also be used for I/O fingerprinting an application, without using any real SSD at all. PerfAccel is definitely a powerful tool in the world of next-generation storage analytics.

Now we know how to find the cache sizing of any application using PerfAccel. But what would happen if we still went ahead with a lower cache size that gets filled completely by the working set data of the application? How would the application behave? How would PerfAccel behave?

We will cover this aspect of PerfAccel i.e. Evictions in our next blog.

In the meantime, you can go ahead and download PerfAccel right now and get hands-on with its exciting features.