An IT industry insider's perspective on information, technology and customer challenges.

August 28, 2008

Your Useable Capacity May Vary ...

In the US, every car sold has a standardized EPA rating on fuel economy. Using a quaint measurement system of "miles per gallon", it's not precise, but it does give buyers a rough measure of comparative fuel efficiency.

And, of course, it has given rise to the frequent disclaimer that "your mileage may vary".

So, is it time to start comparing the capacity efficiency of storage arrays the way we do cars?

------------------------

Update Feb 17 2009:

While the specific conclusions reached in this blog post are now obsolete due to enhancements by the respective vendors, the general topic of storage efficiency is not obsolete.

I encourage all storage customers to take the time and effort to figure out what their useable capacity might be -- once all overheads are subtracted.

The differences are still significant, although getting to an accurate figure will take some effort, as can be seen by this post and its comments.

-- CPH

-----------------------

Why Is This Important?

My impression is that in the US, when gasoline was $1.25 a gallon, not too many people paid attention to that efficiency rating. Spike gas to $4 a gallon, and all of a sudden that EPA rating was very important indeed.

In the storage world, we have the luxury of constantly declining media prices, but an industry average growth in capacity that far exceeds the decline in raw costs. As a result, most organizations spend more on storage every year.

It's a fair question ...

How much raw capacity are you buying?

And how much of that do you get to actually use to store your data, once all the overheads are accounted for?

Creating A Standardized Measure

When it comes to fuel efficiency in the US, we have the benefit of a government-mandated standard for comparison. When it comes to storage, we have no such luxury, so we have to create one.

The proposal I'd offer is a like-for-like comparison built around a relatively standardized use case that most vendors document with specific recommendations (e.g. Microsoft Exchange).

Yes, every use case is different, but we have to pick one, and Exchange seems like a reasonable proxy of an application that most people have in their environments.

Although many vendors don't publish recommendations for other high transaction-rate applications such as Oracle, SQL Server, SAP, etc. (EMC does, though), I think it's reasonable to extend Exchange findings to these use cases.

Conversely, I don't think it's reasonable to extend this sort of comparison to file serving, backup-to-disk, decision support and other applications with decidedly different profiles. Performance and application availability matter in the use cases we're targeting for this exercise.

Yes, efficiency ratios play out differently in smaller configurations (say 10 or 20 disks) or larger configurations (say 500 or 1000 disks). Every mid-tier array has its optimum configuration points where the numbers play out better than others.

We didn't try to game this. We didn’t need to.

Yes, disks are available in many different sizes, but the real issue here is spindles -- the same efficiency ratios play out whether we're talking 146GB disks or 1TB disks.

Yes, vendor recommendations change all the time, but are usually a compromise between decent performance, decent availability, decent protection and decent management. Just like EPA gas mileage and individual driving styles, you're free to vary from these recommendations, but not without compromising something else.

And we don't want to have to resort to the storage equivalent of "hypermiling".

Where possible, we've provided links to vendor-supplied documentation. In some cases, these documents have since been removed from public view, so we'd recommend contacting your vendor if you'd like the most -- ahem -- updated version.

We did the best we could. If we got something wrong, let us know where we went wrong, and we'll fix it to the best of our abilities. Our goal is to eventually publish a series of white papers and tools that will help customers figure out the comparative storage capacity efficiency for themselves.

So, consider this a sort of preview of future materials.

As a starting point, we're going to look at a best-practices efficiency comparison for EMC's recent CX4, NetApp's FAS series, and HP's EVA. All are offered by their vendors as mid-tier arrays that support these sorts of environments.

Once the efficiency ratios are calculated, they can be expressed in two ways: as a "percent efficiency" of raw vs. usable capacity (ranging from 34% to 70% in this exercise), or as price deltas -- because a less capacity-efficient array means you end up paying much, much more for each unit of usable capacity.
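To make the price-delta idea concrete, here's a minimal sketch (Python, with a purely hypothetical raw $/TB figure) of how an efficiency ratio translates into cost per usable terabyte. The 70% and 34% efficiencies used below are simply the endpoints of the range mentioned above, not a quote for any particular array.

```python
# Minimal sketch: turning a raw-vs-usable efficiency ratio into a price delta.
# The raw $/TB figure is purely hypothetical.

def cost_per_usable_tb(raw_cost_per_tb, efficiency):
    """Cost of one usable TB, given raw $/TB and a raw-to-usable efficiency."""
    return raw_cost_per_tb / efficiency

raw_cost = 1000.0                                      # hypothetical raw $/TB
most_efficient = cost_per_usable_tb(raw_cost, 0.70)   # 70% usable
least_efficient = cost_per_usable_tb(raw_cost, 0.34)  # 34% usable

premium = least_efficient / most_efficient - 1
print(f"The least efficient array costs {premium:.0%} more per usable TB")
# -> roughly a 2x delta at the extremes of this exercise
```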

For those of you who are keeping an eye on rack space, we've included that as well. We haven't yet included energy factors (power and cooling), but we'd like to do that in the future.

Ready to dive in? I think you'll find it interesting.

EMC CX4 -- 70% Storage Capacity Efficiency

As far as arrays in this category go, the CX4 is near the top of practical storage capacity efficiency without compromising performance, availability and management. Sure, there's some overhead (as we'll see), but -- compared to many alternatives -- it looks very attractive.

Hot spares -- EMC recommends setting aside 1 disk in 30 as a hot spare. Hot spares speed recovery of failed disks and provide an extra measure of availability. The CX4 uses a global hot spare scheme, which means that a small number of hot spares can protect a much larger number of production drives.

Snapshots -- EMC best practices call for reserving 10 to 20% of capacity as a snapshot reserve. If you run out of snapshot reserve, and more can't be dynamically added, the snapshot fails, and not the application itself.

Overhead -- All arrays use differing amounts of raw capacity for internal management features. A portion of the first five drives is used to create a vault for storing both the FLARE code as well as a safety area for the contents of cache -- the remainder of these drives can be used for available capacity.

In addition, all CLARiiONs store data in 520-byte sectors rather than 512. The extra 8 bytes is used to provide an additional layer of data integrity, further reducing available capacity by 1.5%. Not all vendors offer this additional protection against data corruption. More on this here.

Running the numbers, we see that a CX4 offers 70% usable capacity when configured with RAID 5 and 10% snap reserve, per EMC recommendations. Choosing RAID 6 instead of RAID 5 decreases this to 65%. Electing a 20% snap reserve decreases efficiency to 65% for RAID 5, and 61% for RAID 6.
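For those who like to see the arithmetic, here's a simplified sketch of how those overheads stack up multiplicatively. The 4+1 RAID 5 group width and the vault overhead are assumptions on my part for illustration; EMC's published best practices remain the authoritative source. With those assumptions, the result lands in the high 60s, close to the 70% figure above.

```python
# Simplified sketch of the CX4 raw-to-usable calculation described above.
# The RAID group width (4+1) and the vault overhead are illustrative assumptions.

def cx4_usable_fraction(raid_data_fraction=4/5,    # assumed 4+1 RAID 5 groups
                        hot_spare_fraction=1/30,   # 1 hot spare per 30 disks
                        sector_overhead=8/520,     # 520- vs 512-byte sectors, ~1.5%
                        snap_reserve=0.10,         # 10% snapshot reserve
                        vault_overhead=0.01):      # assumed small vault slice
    usable = 1.0
    usable *= (1 - hot_spare_fraction)  # drives set aside as global hot spares
    usable *= (1 - vault_overhead)      # FLARE vault / cache safety area
    usable *= (1 - sector_overhead)     # per-sector data-integrity bytes
    usable *= raid_data_fraction        # RAID 5 parity overhead
    usable *= (1 - snap_reserve)        # snapshot reserve
    return usable

print(f"{cx4_usable_fraction():.0%}")   # ~68% with these assumptions
```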

For those of you counting physical space as well, this results in 12 drive shelves.

HP EVA -- 47% Storage Capacity Efficiency

The EVA provides a very wide range of options for balancing performance, usable capacity and availability. Unlike other arrays such as the CX4, once these choices are made, changing them can be very disruptive.

The EVA configuration choices include:

- Number of disk groups
- Number of proactive disk management events
- Type and number of disks in a group
- Vraid level (0, 1 and 5)
- Disk failure protection level (none, single, double)
- Cache settings

The EVA is slightly unusual in terms of how you think about disk overhead.

First, the EVA is built around the concept of "disk groups". HP recommends that separate disk groups be used to isolate performance characteristics. The more distinct high I/O applications you put on an EVA, the more disk groups. For certain cases like Exchange and Oracle, HP recommends that data and logs be separated to different disk groups.

Given that most arrays do other things in addition to just Exchange, we've assumed (like the CX4 above) that the EVA with over 120 disks will perhaps be supporting things like SQL Server, Oracle and other high I/O applications.

We've decided to use 7 disk groups for this example: two for Exchange (logs and data), three disk groups for Oracle (two applications each requiring a separate disk group but a shared log), one for a SQL database (perhaps sharing a log file with the Oracle log disk group), and -- finally -- a disk group reserved for snapshot images with decent performance (e.g. MirrorClones).

Even though HP recommends Vraid1 for Exchange, we have elected to configure Vraid5 to maintain a decent comparison with the CX4 configuration above. Similarly, HP recommends a 20% snap reserve for Exchange; in the spirit of fair play, we've elected to make this 10% to maintain a rough comparison with the CX4.

Hot Spares -- The hot sparing scheme is somewhat unique on the EVA -- there's no concept of a global hot spare. Hot spares are associated with individual disk groups. And, since everything is "virtual", HP recommends worst-case hot spare provisioning in the event that a user later elects to use, say, Vraid1 instead of Vraid5.

This means that the "virtual" hot spare area has to be twice the size of whatever the largest disk in the group might be, and you need one of these per disk group. For example, if 146 GB drives are used, the virtual hot spare area is 2 x 146 GB. If a single member of the disk group is, say, a 450 GB drive, the virtual hot spare area must be 900 GB.

There's an additional level of hot sparing per disk group as well, "Proactive Disk Management", which plays roughly the same role a second hot spare would in a global design. Unfortunately, this too is associated per disk group, and must be twice the size of the largest disk in the group. Each "Proactive Disk Management" virtual spare protects against a single "event" (e.g. disk failure) in a disk group. If you want protection against more than one event, you'll need more of these. We configured to protect against a single event here.

HP folks, if we got this wrong, our apologies in advance. You'll have to admit, it's a pretty intricate scheme.

If we go back to our configuration example, a total of 14 virtual hot spares and 14 virtual proactive disk management spares would be required for 7 disk groups. Fewer disk groups would require fewer virtual hot spares. No mention is made of potentially using larger capacity drives solely for the purpose of virtual disk hot spares, but I would assume that this would make for more difficult administration.
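Here's a small sketch of that spare accounting, using the assumptions laid out above: one virtual hot spare plus one PDM spare per disk group, each sized at twice the largest disk in the group. It's only an illustration of the scheme as I understand it.

```python
# Sketch of the EVA virtual-spare accounting described above.
# Assumes one virtual hot spare and one PDM spare per disk group,
# each sized at twice the largest disk in that group.

def eva_spare_capacity_gb(largest_disk_per_group_gb, pdm_events_per_group=1):
    total = 0.0
    for largest_gb in largest_disk_per_group_gb:
        virtual_hot_spare = 2 * largest_gb                   # 2x largest disk
        pdm_spares = pdm_events_per_group * 2 * largest_gb   # per protected event
        total += virtual_hot_spare + pdm_spares
    return total

# The 7-disk-group example above, with all groups built from 146 GB drives:
groups = [146] * 7
print(eva_spare_capacity_gb(groups))   # 4088 GB -- the equivalent of 28 x 146 GB disks
```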

Snapshots -- This analysis uses HP EVA's "Virtually Capacity Free" snapshots, which require close to the same amount of capacity as a CX4 would. It should be noted that CX4 snapshots reside on separate disks to minimize production performance impact.

For the EVA to match this, HP must use "fully allocated" snapshots which would further impact usable percentages more than what is shown here. Again, in the spirit of fair play, we’ve given the EVA the benefit of the doubt on this one.

RAID Parity -- The EVA offers two parity RAID choices: 4+1 Vraid5 and 4+4 Vraid1. For consistency, this study uses Vraid5, even though HP recommends the less space-efficient Vraid1 for many performance-intensive environments.

As a result, for there to be 120 data disks, we'll need another 30 parity disks -- plus protection for the other overhead drives.
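The parity arithmetic itself is straightforward; here's the one-liner, just for completeness.

```python
# 4+1 Vraid5: one parity disk equivalent for every four data disks.

def vraid5_parity_disks(data_disks):
    return data_disks // 4

print(vraid5_parity_disks(120))   # 30 parity disk equivalents, as noted above
```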

Overhead -- The EVA makes five copies of its OS and configuration data, which it distributes among the first five disk groups. At least one of the disk groups must be reserved exclusively for log files, although the smallest disk group supported is eight drives. Interestingly enough, no single log file exceeds one disk, meaning that seven of the disks are unusable, probably for performance reasons.

Finally, to allow the EVA to complete its housekeeping in a reasonable time, EVA best practice recommends that a further 10% of each disk group be given over to the EVA operating system, known as the "Occupancy Alarm".

Even with giving the EVA the benefit of the doubt in several areas, we still arrive at a 47% usability factor.

I think we're being generous here, though. Anecdotally, we get routinely exposed to EVA customers who have much lower usable capacity based on specific HP recommendations -- sometimes as low as 33%.

For those of you counting physical space, this is 17 drive shelves.

Here's the chart.

But there's more to this than just efficiency.

If you assume, for example, that CX4 and EVA raw capacity is priced roughly the same, that means that every unit of usable EVA capacity is roughly 63% more expensive than the same capacity on a CX4.

And that's without factoring in things like power, cooling and floor space.

Sobering, isn't it?

NetApp FAS Series -- 34% Storage Capacity Efficiency

Although the FAS series can be used in block-oriented environments such as Exchange, it does so by emulating a block storage device on top of its underlying WAFL file system. WAFL only performs well when it is guaranteed that free blocks are always available for new writes and snapshot data.

When it is used in this manner in high change-rate environments (such as Exchange) with snapshots, NetApp often recommends significant amounts of storage overhead to ensure reliable operation and acceptable performance. This is often called "space reservation" or "fractional reserve".

Hot Spares -- For the FAS series, every disk that is not being used can be considered a potential hot spare. NetApp best practices state that each filer head should have a minimum of two hot spares for up to 100 disks, then two additional hot spares for every 84 additional disks.

For a 364 disk configuration, that means 2 for the first 100 drives, and 8 disks for the remainder.

Source: NetApp's Storage Best Practices and Resiliency Guide

RAID Parity -- FAS supports two parity RAID modes, RAID 4 or RAID-DP (RAID 6). NetApp's default is RAID 6 for these environments, organized as a 14+2 scheme. In our 364 disk configuration, this rounds up to 23 RAID groups, or a total of 46 parity drives.
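Here's a quick sketch of both calculations -- the hot spare rule and the 14+2 RAID-DP parity overhead -- applied to the 364-disk configuration. The spare rule and group size come from the text above; rounding partial RAID groups up to a whole group is an assumption on my part.

```python
import math

# Sketch of the NetApp spare and parity accounting described above.
# The hot spare rule and the 14+2 RAID-DP group size come from the text;
# rounding partial groups up is an assumption.

def netapp_hot_spares(total_disks):
    spares = 2                                                # minimum of two per head
    if total_disks > 100:
        spares += 2 * math.ceil((total_disks - 100) / 84)     # +2 per extra 84 disks
    return spares

def raid_dp_parity_drives(total_disks, group_size=16):        # 14 data + 2 parity
    return math.ceil(total_disks / group_size) * 2

disks = 364
print(netapp_hot_spares(disks))       # 10 (2 + 8), as in the example above
print(raid_dp_parity_drives(disks))   # 46 parity drives across 23 RAID groups
```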

Snapshots -- As mentioned above, WAFL doesn't want to run out of space when using snapshots. The question of the precise figures required for an Exchange environment seems to be a subject of controversy both inside and outside of NetApp -- some documents point to a 100% reserve space recommendation, others suggest that it's reasonable to get by with a lesser amount.

One thing is extremely clear -- running out of snap reserve looks to be a very bad thing in a NetApp environment -- there's no place to put an incoming write, usually resulting in catastrophic application failure. By comparison, other approaches (e.g. CX4 and EVA) simply stop trying to capture before-write data if you run out of reserve -- the application itself continues to run.

"It is recommended to have a 100% space reservation value set for volumes hosting LUNs containing Exchange data. This guarantees sufficient space for all write operations on Exchange data volumes and ensures zero donwtime for Exchange users".

"It is extremely important to keep in mind that a change in fractional reserve percentage might result in the failure of a write operation should the changine in that LUN exceed that percentage (frequent monitoring of space is recommended). Therefore, one must be sure of the change rate of the data in the LUN before changing this option. As a best practice, it would be best to leave it at 100% for a time".

I guess you've been warned ...

I would assume that this same sort of recommendation would apply to any write-intensive application: SAP production instances, Oracle transactional applications, and so on.

Overhead -- WAFL has a high amount of space overhead, which comes in handy for a variety of reasons, but comes at a price.

First, WAFL must "right-size" (format) all drives to the same lowest-common-capacity denominator if drives come from different vendors.

Second, since WAFL is a file system there's overhead for the metadata that all file systems require.

Third, the WAFL design wants 1 in 10 blocks to be free so that application writes aren't delayed; this means an additional 10% of reserve at all times.

And, finally, there's another allocation for "Core Dump Reserve" should NetApp customer support want to take a look at an ONTAP core dump.
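To show how all of these pieces compound, here's a simplified sketch that stacks the WAFL-level overheads on top of the spares and parity from above. The right-sizing, metadata and core-dump percentages are placeholder values of my own, not NetApp-published figures; the 100% fractional reserve is the recommendation quoted earlier.

```python
# Simplified sketch of how the WAFL-level overheads stack on top of spares
# and parity. Right-sizing, metadata and core-dump figures are illustrative
# placeholders, not NetApp-published numbers.

def fas_usable_fraction(total_disks=364,
                        hot_spares=10,
                        parity_disks=46,
                        right_sizing=0.03,        # placeholder
                        wafl_metadata=0.05,       # placeholder
                        free_block_reserve=0.10,  # "1 in 10 blocks free"
                        core_dump_reserve=0.01,   # placeholder
                        fractional_reserve=1.0):  # 100% space reservation
    usable = (total_disks - hot_spares - parity_disks) / total_disks
    for overhead in (right_sizing, wafl_metadata,
                     free_block_reserve, core_dump_reserve):
        usable *= (1 - overhead)
    usable /= (1 + fractional_reserve)   # reserve as much space again as the LUN data
    return usable

print(f"{fas_usable_fraction():.0%}")   # ~35% with these placeholders, near the figure below
```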

Here's the chart:

This means that the NetApp FAS series achieves 34% storage capacity efficiency for these configurations, compared with 70% for the CX4. If you're counting rack space, this results in 26 disk shelves.

And this means that a given unit of usable capacity on NetApp's FAS is approximately 2x as expensive as on a more capacity-efficient device, such as a CX4, for these kinds of use cases.

And again, that's not counting power, cooling and space.

Maybe you should ask them for a better discount :-)

So, What Does All Of This Mean?

First, I think there's enough variability in capacity efficiency that -- just perhaps -- we ought to start educating storage users on the difference between raw and usable capacities in real-world use cases.

If roughly the same usable capacity costs X from one vendor, and 1.6x from another vendor, and 2x from yet another vendor -- isn't that significant?

Second, as every IT organization takes a sharper look at power and cooling, the numbers get even bigger, don't they?

Even assuming all the associated controller and enclosure electronics were roughly equivalent in terms of power and cooling (definitely not the case, but assume it is just to simplify the discussion), doesn't it matter that a given usable capacity has X power/cooling requirement from one vendor, 1.6x from another vendor, and 2x from yet another vendor?

Third, I think this storage capacity efficiency discussion is something that other vendors don't really want to talk about. But I believe that customers will want to start getting into the practice of requesting quotes for usable capacity, configured in accordance with vendors' published recommendations for their environments -- in addition to asking for power/cooling requirements.

My guess is that many vendors will argue that (a) they have features and benefits that offset their inefficient use of space, (b) we're wrong in our analysis, or (c) this really doesn't really matter.

Just to head off the obvious, if there's a claim that your thin/virtual provisioning feature saves storage, or your dedupe feature saves storage, or whatever it is, please point us to where you recommend that customers use these features with performance-intensive and availability-intensive applications.

Regarding the second point, if you think we're wrong on some substantive point, please accept our apologies in advance -- and point us to the correct answer. As long as we're staying within the use case guidelines outlined above, we'll be glad to make the change.

But please, don't try and argue that this isn't an important discussion for customers ...

"Proper settings for the protection level, occupancy alarm, and available free space will provide the resources for the array to respond to capacity-related faults.

"Free space, the capacity that is not allocated to a virtual disk, is used by the EVA controller for multiple purposes. Although the array is designed to operate fully allocated, functions like snapshot,reconstruction, leveling, remote replication, and disk management either require or work more efficiently with additional free space."

"Additional reserved free space — as managed by the occupancy alarm and the total virtual disk capacity — affect leveling, remote replication, local replication, and proactive disk management.“ Figure 3 gives a great picture of things.Note that in our calculation we separated PDM and Capacity Alarm.This was to keep things understandable rather than lumping all the factors into Capacity Alarm.We chose ONE PDM event although this paper suggests two is also a good choice.For Capacity Alarm we chose a conservative 10% to include all items beyond one PDM event including replication log file, releveling overhead, etc.

Comments

"And for NAS, we are leaning to a V3170 in front of a CX4. So you both win ;-)"

Posted by: Martin G | August 29, 2008 at 01:32 PM

Doesn't this type of feedback from an end user tell you something here?

The virtualisation of a CX4 by a V3170 delivering a NAS solution indicates that the customer is not happy with the features supplied on one vendor's storage system - so much so that they feel the need to virtualize it behind another vendor's appliance.

This type of feedback on an EMC blog by an end user is amusing and I am surprised no one else has capitalized on this comment... yet.

The cost of hardware, in this case raw disk, is relatively cheap. The features the vendor delivers using the hardware are what you really have to pay for when you buy a storage system.

So in answer to your statement "Your Usable Capacity May Vary" - my reply as an end user would be "Who cares, as long as I get the features I want out of the storage system together with the reliability and performance I expect from an enterprise class storage system."

The three vendors you have picked on in this blog are quite interesting. Two of the three have evolved originally from being hardware based and the other software. After all, features are predominantly derived from software-based manipulation of hardware.

It looks to me that the two companies with hardware roots are playing catch up and one of them is using discussions such as this to mask their deficiencies in other areas where they are not measuring up!

I think you should be asking – what does the customer really want? What is important?

Maybe if you identified what people want from technology (in your case it would be a spell checker - your blog states that you enjoy “piano, mountain bking and skiing-- in that order.”) then we would be having a more interesting discussion about what each vendor delivers in the way of features.

I suppose your response to my feedback will be “On yer bke” and my response would be “F7 to you” :o)

Nice discussion btw... but it's just not about what I think the end user is interested in.

I really care about useable capacity as an end-user; I may be in the position where I have to put somewhere between 18-50 petabytes of storage on the floor over the next three years. Okay, this is an extreme example, but if my overhead is too high, I'm going to have to build new data centres.

I want my storage to be as efficient as possible; I also want it to have a layered feature set, so that I can turn on the features I need but also have options to turn off those features I don't need. And a whole new subject: I only want to pay for the features I use on the disk I'm using, i.e. I don't want to pay for software on a per head/controller basis!

Man....I SO want to get in on this.
But I got a bad cold and my head is not working properly.
However, I would like to say it IS a good discussion regardless of what one might think of the results EMC got.

But there is also the fact that it is perhaps looked at a little too narrowly, as Exchange is (also) about supporting X users/mailboxes with a reasonable response time (20ms max as per MS), etc.
Hence a discussion about performance and disk efficiency is very much valid.

Maybe as a follow on to your test?
Once you have made the calculations, set it up as the best practices say; then run Jetstress.

That way we will see if being most efficient in the amount of disks needed to reach a certain usable capacity is the same as being able to sustain most users/mailboxes.

I'm an end user helping to manage four EVA units; they are all maxed out at 240 drives, and there are no doubt more on the way. It's kind of an understandable newbie mistake to configure too many disk groups. You lose usable capacity, and subvert the self-balancing capability of the array that way. We made too many (six total, but now reduced to four) on our first array. You can mostly get around the balance problem if you then manually balance your IOPS workload across the disk groups, but why do that if the array will do it naturally?

What you do lose with the very large disk groups is the potential size of a data-loss event -- the EVA has an unusual implementation of RAID: data is protected within RAID groups of 6-11 drives called RSSes. If you simultaneously fail two drives in an RSS, you will lose every vraid5 vdisk in the disk group. The probability of lightning striking twice like this, however, is very low, since in a large disk group there will be many RSSes. This is improved further by failure prediction, which moves data off a suspect drive and ungroups it automatically. For vraid1 vdisks to fail, you have to lose a disk and its 'married pair disk' within an RSS simultaneously (joint probability should be n*(n-1) drives in group). After understanding this, we've moved to fewer, larger groups.

The poster who says a vdisk provides performance isolation isn't really correct; that's also mostly provided by separate disk groups -- not only physical disks but also cache management is somewhat isolated (although details are sketchy there). But 'performance isolation' is overrated -- the way to get the best total performance possible from the hardware is to stripe every lun across every drive. One can still isolate 'problem' apps with a few disk groups.

All arrays share resources, so absolute performance isolation is impossible within an array. If you really have an app that has to have a guaranteed service level, maybe it should have dedicated hardware.

Coming from an EVA environment, I can say that your assessment is good overall. Although HP does recommend fewer storage groups, in practicality this proves problematic. I had storage groups for FC disk and FATA, each of which contained a large number of spindles. If ALL the array did was back-end Exchange it would have been fine, but in a mixed use scenario, as was our case, we found this recommendation troublesome. With singular large storage groups and several multi-purpose hosts attached, you now have the potential for a single host to affect the performance of the entire storage group. We had numerous *whole* array outages with no clear cause, but tracing through events in the timeline pointed to probable suspects. To avoid this you would *have* to break out the SGs as Chuck indicated, but if performance and reliability aren't a concern for you...

Chuck, maybe you could adjust your EVA numbers for the (albeit problematic) recommendation of 1-2 storage groups? That will help the HP numbers a bit, but not much. That recommendation should be seriously caveated by HP!

Jim, I'm not sure your whole array outages would have been helped at all by having more disk groups.
I do feel the EVA used to be rather unreliable, and we had a few outages here, probably similar to what you had. The EVA had to be rebooted in each instance, but we never lost any data. The cause of these was failing disk interfaces continually blathering on the backend, saturating it with numerous loop events.

Knock wood, there have been no such outages for quite a while since the firmware on the drives was updated (several other changes were made to deal with this specific issue).

As a customer we indicated to the vendor our fury about this problem, and would not have continued with this array had we felt they had not been addressed. I'm glad it has been, because the EVA is a joy to use and manage, in my opinion.

You mention performance and availability. Even though hosts using vdisks in the same disk group share physical disks, a large disk group can handle a LOT of IOPS. That's your best total performance option. You may want to isolate a host doing a lot of I/O, but by moving it to a separate, smaller disk group, will you be limiting performance of the host that really needs it? That's the sysadmin's call, and this is one of the subtle parts of EVA management, but in practice the complexity is much less because most hosts will not be a problem, unless you've underconfigured your storage. The best practice config generally "just works", and to improve EVA performance -- add more disks.

I don't care to get drawn into a political discussion regarding EMC vs Netapp vs HP. However, I would like to point out that you should remove the "aggregate snap reserve" from your Netapp calculations. Aggregate snap reserve is only used in deployments with synchronous mirroring. In the type of deployment you are discussing, you would lose nothing to aggregate snap reserve.

Hi Chuck...
I'm alot confused...
I'm working on my assignment n i need ur help...
I need to ask some questions...
What are storage devices in a computer?
What are their roles in a computer?
What are different types of Storage Devices?
How do you measure their capacity?
Can you please List at least five different types of storage devices?
At least one Brand of Each device along its price........?