An IT industry insider's perspective on information, technology and customer challenges.

August 30, 2008

Updates To The Capacity Post

To say I started a bit of a ruckus is an understatement.

48 hours later, I thought I'd share with you what we've learned through this exercise.

What Happened First

I published a blog post showing differences in usable capacity between three popular midtier vendors. Not surprisingly, the results showed that the EMC CX4 was more capacity efficient than HP's EVA or NetApp's FAS.

We designed the configs around 120 usable disks, and set them up for multiple instances of a demanding application, e.g. Exchange. We did the best we could with published documentation. We made no effort to game anything whatsoever.

And we offered to set the record straight if we made a mistake.

What Happened Next

A lot of blog hits and a lot of comments is what happened next, including a pick-up from The Register as well as Blocks and Files.

Oh my.

Lots of angry comments from vendor employees. A few users chiming in. And, somewhere in the noise, a few useful nuggets came out of the discussion, which I want to share below.

BTW, if you ever feel like interacting with someone on a blog or forum, might I suggest an effort to be somewhat polite and courteous? It works well in the real world, and it makes sense online as well.

The NetApp FAS

To this day, almost all of NetApp's published documentation recommends a 100% snap reserve for demanding block-oriented applications. In addition, the system defaults to this setting, so that's what most people end up running.

This was confirmed by several users (go read the comments) and has not been refuted by NetApp (yet!), which means that -- so far -- the results largely stand.
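To make the arithmetic concrete, here's a minimal sketch of how a fractional snap reserve eats into usable capacity. The disk counts and sizes are illustrative assumptions, not figures from any vendor's sizing guide.

```python
# Minimal sketch: how a fractional snap reserve reduces capacity left for data.
# All numbers are illustrative assumptions, not vendor sizing figures.

raw_per_disk_gb = 300        # assumed per-disk capacity
data_disks = 100             # assumed disks left after parity and spares
usable_gb = raw_per_disk_gb * data_disks

for snap_reserve in (0.0, 0.5, 1.0):   # 0%, 50%, 100% reserve
    # With a fractional reserve R, each GB of application data
    # needs (1 + R) GB of usable space set aside.
    app_gb = usable_gb / (1 + snap_reserve)
    print(f"snap reserve {snap_reserve:>4.0%}: "
          f"{app_gb:,.0f} GB available for application data")
```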

The HP EVA

A different scenario played out when discussing the EVA.

Here, the "capacity efficiency variable" is the use of disk groups. Each disk group is almost like a virtual storage array -- it shares common sparing, for example. EVA software combines all members of a disk group into a single, large pool, from which you carve virtual disks for application use.

The more disk groups you use, the more overhead gets used for hot spare-style protection. Obviously, HP customers have a vested interest in using as few disk groups as possible in the interests of space efficiency.
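For illustration only, here's a rough sketch of how per-group sparing overhead scales with the number of disk groups. The spare-capacity-per-group figure and the drive size are assumptions, not HP's numbers.

```python
# Rough sketch: sparing overhead grows with the number of EVA disk groups.
# The spare-capacity-per-group and drive-size figures are assumptions, not HP's.

total_disks = 120
spare_equiv_per_group = 2    # assumed disks' worth of spare capacity per group
raw_per_disk_gb = 300        # assumed

for groups in (1, 2, 4, 7):
    spare_disks = groups * spare_equiv_per_group
    data_disks = total_disks - spare_disks
    print(f"{groups} group(s): {spare_disks} disks of sparing overhead, "
          f"{data_disks * raw_per_disk_gb:,} GB raw left for data and RAID")
```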

HP stated that -- for our exercise -- 1 or 2 disk groups should have been used, and not 7. If they are correct as stated, our results are wrong, and we need to go recalc a bit.

But I'm not 100% satisfied with that answer, for the following reasons.

Certain HP documentation clearly states that in certain situations, customers will want to create multiple disk groups for performance reasons. For example, a database in one disk group, a transaction log in another disk group. Or putting sequentially accessed data in one disk group, and randomly accessed data in another disk group.

The concept is sometimes called "performance isolation", e.g. minimizing contention between demanding applications. On a traditional (e.g. non-virtualized) array, this is pretty easy to do: simply carve up LUN groups and hand them out for different purposes. No one will step on anyone else's spindles.

But this is a bit harder with the EVA -- everything is spread around. Carve up virtual disks and hand them out, and their underlying data ends up on all the spindles. This sort of approach offers great performance (all the spindles are put to use), but presumes that you've got enough spindle I/O to go around.
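Here's a back-of-the-envelope sketch of that "enough spindle I/O to go around" assumption. The per-spindle IOPS figure and the workload demands are made up for illustration.

```python
# Back-of-the-envelope sketch: aggregate spindle IOPS vs. aggregate demand.
# Per-spindle IOPS and the workload list are assumptions for illustration.

spindles = 120
iops_per_spindle = 180               # assumed back-end IOPS per drive
supply = spindles * iops_per_spindle

workloads = {                        # hypothetical applications and demands
    "Exchange instance 1": 4000,
    "Exchange instance 2": 4000,
    "OLTP database":       8000,
    "B2D backup stream":   7000,     # sequential, but still consumes spindle time
}
demand = sum(workloads.values())

print(f"aggregate spindle IOPS: {supply:,}")
print(f"aggregate demand:       {demand:,}")
print("headroom available" if demand <= supply else "contention likely")
```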

With no ability to isolate, say, one application being backed up to disk (B2D) from another interactive application, you run the risk of applications stepping on each other from time to time.

If you can't isolate them, you'll experience performance contention -- the actions of one application will directly affect performance on other applications. If this happens on an EVA, you have a few choices: (1) buy more spindles (disks), (2) buy another array, or (3) create additional disk groups to isolate applications from each other.

So, if you plan to load up your EVA with several performance-intensive applications, and you don't want them stepping on each other, there's a case that can be made (unofficially confirmed) that you'll want more than the 1 or 2 disk groups that HP is offering up. And of course, that means more overhead to support each disk group.

But there's more.

HP documentation also points out the desirability of having multiple disk groups for availability isolation. Since all applications use all disks in a disk group, a failed disk that can't be recovered puts a neat hole in *all* your applications, and not just one.

Let me say that again.

If you are running a single disk group (as recommended occasionally by HP), and a disk fails that can't be recovered with hot sparing, etc. -- you run the risk of *every* application, file system, etc. having a problem and needing to be recovered.

Wow.

I don't know about you, but I'd be strongly motivated to use a significant number of disk groups to protect myself from that scenario. I wouldn't want an unrecoverable disk failure in, say, my file system to take down Exchange, Oracle, SAP, SQLserver, etc. etc.

And neither would you, I think.

All of a sudden, our recommendation of 7 disk groups doesn't look as bad as it first did.

So, I'm going to have to rephrase the discussion with our friends at HP as follows:

"What are the recommended number of disk groups for an EVA with 120 usable disks where the customer has 6 or 7 demanding applications, and desires a significant degree of performance isolation and availability isolation?".

I bet the answer isn't one or two disk groups ...

Other Criticisms

There were those who thought that there was no way a vendor should be making one of these comparisons, that it should be done by an independent party. I'd agree, but the problem is coming up with an independent party these days. Any suggestions?

There were those who argued that the protection levels should have been different on the configs. We went with what each vendor recommended, or tried to. You're free to debate the merits of RAID 6 versus RAID 5 with proactive global hot spares separately ...

And there were those that thought we should have benchmarked each of these configs to ensure we would get equivalent performance from each. Sorry, I don't think that's possible.

But several people thought it was a useful -- though flawed -- discussion. And that's all we really wanted.

What's Next?

Well, based on what we've seen so far, it's unlikely that NetApp will change their recommendations (or their defaults) for FAS in these environments.

They're wedded to RAID DP (fine) and are forced to support 100% snap reserves, since running out of snap reserve means a vicious application crash. Oh, I'm sure we'll see some posturing from them, but nothing substantive to correct our work.

HP's a different case. Their design works well for one or two applications, but doesn't look so good when one starts loading up multiple applications that demand a degree of isolation.

And, since that's a subjective discussion, we probably need to go back and rework the numbers for EVA showing 3 or 4 disk groups, and not 7.

Comments

It's a good hornet's nest to stir up, to be honest! A bugbear of mine for a while has been trying to get true costs from vendors. Yes, I know it depends, but the question I am asking more and more is: assuming I follow your best practise, how much will it cost me to store a true terabyte of data? We are beginning to get more sophisticated as well and are beginning to profile our applications, so I might ask how much it will cost me to store a true terabyte of data with a certain I/O profile.

And BTW, if I follow your best practise, I want you to sign up to guaranteeing performance and capacity. I might not be looking for a money-back guarantee, but I am going to be looking at you to put some skin in the game, be it service credits, PS, free upgrades, etc.

Of course, you might get very conservative and completely over-specify, but hey, this is a competitive game: if you price too high, you lose; but if you under-specify to win the business and put yourself in a hole, you'd better be prepared to give me a ladder to get us out of it.
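If it helps frame the ask, here's a tiny sketch of the "true cost per terabyte" math: total price divided by the capacity actually left for applications. The prices, raw capacities and efficiency fractions are invented for illustration, not real quotes.

```python
# Tiny sketch of "true cost per terabyte": price divided by application-usable TB.
# Prices, raw capacities and efficiency fractions are made-up assumptions.

def cost_per_usable_tb(list_price, raw_tb, efficiency):
    """efficiency = fraction of raw capacity left for application data."""
    return list_price / (raw_tb * efficiency)

configs = {
    "Vendor A config": dict(list_price=250_000, raw_tb=36.0, efficiency=0.65),
    "Vendor B config": dict(list_price=230_000, raw_tb=36.0, efficiency=0.45),
}

for name, cfg in configs.items():
    print(f"{name}: ${cost_per_usable_tb(**cfg):,.0f} per usable TB")
```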

I find Chuck's blog (and Barry's) very useful as it tends to generate great debate and this is all useful to me, the end-user. It's the nearest thing I get to sticking you guys all in a room and telling you to fight it out. So all, please keep posting (try to keep it polite tho')!

It is absolutely healthy to think of usable capacity, performance, availability, etc. It is all about SLAs, isn't it?

The big problem is that, unless you have very clear application profiling, this does not help, and for that matter, it is hard to find enough expertise on the customer and vendor side to do this exercise (virtually all vendors are willing to do it for BIG customers, but not many will do it for small shops).

Many vendors like to show you two pages of features without focusing on your problem, and things don't go as they should.

Ironically, many customers enjoy the debates.

This debate can be a good start, as MS Exchange has gone through good profiling pushed by Microsoft's ESRP and vendors' best practices... it is difficult to extend it beyond this, though.

Thanks for netting this out Chuck. The signal to noise ratio in your prior post plus comments was getting too low due to the volume of your hyperbole as well as the buzz of all those hornets flying around :)

On behalf of NetApp, let me unequivocally state in no uncertain terms that the *officially recommended* SAFE percentage of Space and Fractional Reservations for LUN’s on NetApp FAS systems is ZERO.

That means in your prior example NetApp FAS efficiency goes up to 34+37=71% and we win your little circus sideshow. Pretty much as simple as you would expect NetApp to be!

FWIW - Martin’s comment above actually nails it as the true underlying customer requirement at this level of the IT stack. What can we as storage vendors offer that properly balances all four of:

1. Availability,
2. Performance,
3. Efficiency, and
4. Overall Cost?

NetApp’s focus and proven track record is on all 4, not merely one or two of the above. And I haven't even mentioned dedupe on primary storage yet :)

For those interested in finding out the difference between NetApp’s recommended and default settings for managing snapshot space (plus maybe a little history explaining some folks’ confusion on the topic) I will commit to posting an entry by the end of this long-weekend over on my blog:

While we are all here talking about Useable Capacity and there appears to be a number of vendors reading this; can you address one of my personal bugbears, the definition of a Terabyte? We are intelligent people working in the IT industry and should be capable of working in bases other than decimal.

Why is this important? It is very important for me and a number of other people, especially anyone trying to operate a recharge model!
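For anyone who hasn't done the math recently, here's the decimal-versus-binary gap the commenter is getting at, roughly 9% per "marketed" terabyte:

```python
# The decimal-vs-binary gap: vendors quote 10**12 bytes per TB,
# while operating systems often report in units of 2**40 bytes.

decimal_tb = 10 ** 12
binary_tib = 2 ** 40

print(f"1 marketed TB = {decimal_tb / binary_tib:.4f} TiB")      # ~0.9095
print(f"apparent shortfall: {1 - decimal_tb / binary_tib:.1%}")  # ~9.1%
```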

The percentage of overhead varies from vendor to vendor. Applications or features turned ON change the equation.

Say, with EMC TimeFinder the overhead is 100%, and with EMC snapshots the overhead is, say, 30-50%.

On NetApp, the snap reserve can vary between 20% and 200%, depending on what features the user needs.

On HP or IBM, similar overheads exist.

It exists everywhere, and I think every customer understands this. There is no right or wrong in the overheads taken. Every customer will appreciate it if the overheads bring them better results.
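As a rough illustration of that point, here's a sketch showing how the same usable capacity shrinks as different copy/snapshot features are turned on. The overhead percentages simply echo the ranges mentioned above and are assumptions, not vendor recommendations.

```python
# Rough sketch: the same usable capacity shrinks as copy/snapshot features
# reserve more space. Overhead fractions echo the ranges mentioned above
# and are assumptions, not vendor recommendations.

usable_tb = 30.0                      # assumed capacity after RAID and spares

feature_overheads = {
    "no local copies":           0.0,
    "snapshots (~30-50%)":       0.4,
    "full clones (~100%)":       1.0,
    "aggressive snaps (~200%)":  2.0,
}

for feature, overhead in feature_overheads.items():
    app_tb = usable_tb / (1 + overhead)
    print(f"{feature:<26} -> {app_tb:5.1f} TB left for application data")
```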

Take, for example, instantaneous snapshot backups and restores: accept 300% overhead if it means you can restore a 5 TB database in seconds.

Great! Go for the overhead and give me better results.

In a direct apples-to-apples comparison, a Clariion will have less overhead than other vendors. But with certain features turned ON, like Snapclone or Snapshots, the story needs a relook.

Always entertaining - one of my favorite blogs that I check almost daily.

Not really sure how seriously I should take your arguments regarding capacity when we have some basic concepts that seem (to me) out of alignment.

Let's start with your RAID 5 and dual parity RAID comment.

In your previous thread, you disagreed that dual parity RAID is any different from RAID 5 with a global hot spare. Maybe I've missed something. To me, this would be like arguing that RAID 5 is really no different from RAID 0 with a (global) hot spare. Maybe I misunderstood you? Additionally, in the context of this discussion, can you then explain how EMC's RAID 6 differs from EMC's RAID 5 (with a global hot spare)?

Lastly, I've not come across a single storage engineer who would default to RAID 5 for MS Exchange and the like. Best practice is RAID 10, is it not?

"HP stated that -- for our exercise -- 1 or 2 disk groups should have been used, and not 7. If they are correct as stated, our results are wrong, and we need to go recalc a bit."

The HP response to your post points to a best practice guide. If you look in the guide, it states as few as possible. The HP response points to 2 groups, alternating transaction logs and databases.

"The concept is sometimes called "performance isolation", e.g. minimizing contention between demanding applications. On a traditional (e.g. non-virtualized) array, this is pretty easy to do: simply carve up LUN groups and hand them out for different purporses. No one will step on anyone else's spindles."

Close to 60 disks per group if you are careful to alternate between the two (not much management there); I/O balance should be good to go, with a lot of underlying I/O and bandwidth throughput.

I'd have to agree that your comparison is not taking into account the typical use of snapshots on a NetApp vs the typical use of snapshots on a CX array.

The reason that most (if not all) documentation that you can find recommends a large snapshot reserve is that most NetApp customers use snapshots all the time. They make them hourly, daily, and weekly, and keep weeks or even MONTHS of history online. Those snapshots are then integrated into their data protection mechanisms using things like SnapManager for Exchange. And they get all those snapshots with the same performance as they get with no snapshots -- something you definitely can't say about CX arrays.

Now let's talk about Clarions. If people make snapshots on a CX, it's typically to create a quick, fixed reference to back up for tonight only. That snapshot is then released the next day before the next snapshot is made, or MAYBE kept around for a few days. They're not typically kept around for weeks or months the way they are on NetApps.

You don't need much reserve to do that. This is why EMC's recommendation for snap reserve is much lower than NetApp's recommendation. People don't keep as much snapshot history, so they don't need as much snapshot reserve.

If you did the same number of snapshots on both systems, I think the amount of space taken up by snapshots would be roughly the same. (They're both block-level-incremental technologies.) The only reason they recommend a larger reserve is that people make more snapshots and keep them longer on NetApps -- because they CAN. And they can do so without a performance penalty. (Try keeping 30 or 60 days of snapshot history on a CX and see what happens to the performance. That's why people don't use ANY copy-on-write-based snapshot systems the way they use NetApp snapshots.)

If you were to use a NetApp like a Clarion (and only keep a few days' worth of snapshots), then the snapshot reserve would be roughly the same, and the whole point of your post would be moot.
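To put some rough numbers behind that argument, here's a sketch that estimates snapshot space from change rate and retention, independent of vendor. The 3% daily change rate is an assumption, and the estimate is a crude upper bound.

```python
# Crude upper-bound sketch: snapshot space driven by change rate and retention.
# The 3% daily change rate is an assumption for illustration.

volume_tb = 5.0
daily_change_rate = 0.03              # assumed fraction of blocks rewritten per day

for retention_days in (2, 7, 30, 60):
    # assume each day's changed blocks are held for the whole retention window
    snap_tb = volume_tb * daily_change_rate * retention_days
    print(f"{retention_days:>2} days retained -> ~{snap_tb:4.1f} TB of snapshot data "
          f"({snap_tb / volume_tb:.0%} of the volume)")
```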

And, BTW, I'd have to agree that comparing RAID 6 with RAID 5 and hot spares is just funny.

The comparison made here has little to do with "typical use case"; as mentioned before, it's based on each vendor's mandatory recommendations for an Exchange environment.

Go re-read the material I posted more carefully, please.

On a related note, do you consider keeping snaps around on the same physical array a "backup" per se? I mean, if the array has a bad day, you've lost your primary AND all of your "backups" in one fell swoop -- what are your thoughts on this?

FWIW - Even though you and the rest of EMC seem to routinely ignore it, I've gone ahead and described (again) how customers today safely provision far less than 100% LUN reservations using Exchange and other block-oriented apps using NetApp FAS arrays:

Even though you have no direct answer to why your CX4 configuration example violates all of EMC’s stated best practices and published benchmarks, I’m happy to answer your direct questions above:

1. The one NetApp Exchange whitepaper you refer to does indeed contain outdated information, which takes the most conservative approach in that one particular section. That section of the paper was authored before SnapManager for Exchange 5.0 was released. However since the release of SME50, the Admin Guide for that product contains our latest recommendations which enable NetApp Exchange customers to get as aggressive as they want regarding space efficiency for LUN’s and/or snapshots – all at no risk to the application.

2. Because we can, and customers see enormous value in it :) See Curtis Preston’s salient comment right above for further explanation!

Our customers are welcome to change these default or recommended settings because they have the rich monitoring tools and automated intelligent policies to SAFELY maximize all of their important criteria at the same time:

1. Availability,
2. Performance,
3. Efficiency, and
4. Overall Cost.

Once again I will unequivocally state that NetApp’s focus and proven track record is on all 4, not merely one or two of the above. And I haven't even mentioned dedupe on primary storage yet :)

Sorry to disappoint you, Chuck, but I am firmly in NetApp's camp on this issue. I believe you are trying to make a point that doesn't exist. I'm not saying I like NetApp more than EMC or vice versa. I'm saying that the point you are trying to make doesn't exist.

1. You don't need a snap reserve if you're not going to make snapshots.

2. The reason they recommend 100% is that this matches how people use snapshots on NetApps.

3. If they made snapshots on NetApps the way you make them on CX, you would need a snap reserve the same size as the CX array.

As to your "why don't I see this in the docs," it's not because it doesn't work or shouldn't be in the docs. It's because nobody at NetApp said, "Hey, do you suppose that we should document how NOT to use our single most differentiating feature?" (That being all the snapshots you have space for with no performance hit.) NOW, if you ask anyone who actually KNOWS something about NetApps (support, their CTO, me, anyone else who's actually USED a NetApp filer -- NOT a blogger for a competing vendor, no matter how forthright he may be), they will tell you that if you don't make snapshots, you don't need any room for them. Why is this so hard to understand?

As to your other question, of course I don't consider un-backed-up snapshots a backup per se. They're a backup against logical corruption, of course. But to recover against physical problems (double-disk failure in a RAID 5 array), you need to copy it somewhere else via replication (snapmirror, snapvault, qtree snapmirror) or via tape backups.
