March 14, 2011

Multivendor VMware Performance–thank you (with toys and a treat)

The “Everything VMware at EMC” community is one of the busiest locations on ECN (the EMC Community Network), which I think is awesome :-)

The plea I made for customers to share their performance data (here) has now been viewed more than 1700 times!

I’m happy to announce three things:

1) We have our 30 winners! I love that they cross the globe, come in small/medium/large, and represent such a multi-vendor cross-segment of the community. I’m out of IX2s for now, so will be closing that down. If you want to continue to contribute – please go ahead, but I won’t be able to send more IX2s. All the folks listed below will be contacted for shipping info, and should receive their IX2s shortly. Feedback on how you end up using them would be welcome!

2) Based on the data, and on other information sources, we have our first VMware-centric IO stack optimization ready, and we’re looking for people to test it!

If you’re an EMC customer using an EMC Celerra, EMC Unified NS series, or EMC VNX array with NFS datastores, we have something we would like you to try. It’s experimental – so DO NOT DO THIS on a production array.

3) So, I’m opening a NEW thread, and a NEW contest… Customers, EMCers, EMC Partners using any of the systems in topic #2 (@Fox_inti, @jasonboche, @HerseyC, @landog – you folks apply!!!): if you provide the “before and after” data (using the same process as in the original contest) to this thread on Everything VMware at EMC, I will send an iPad (16GB WiFi version) to the first 30 who post. For details – read below!

The EMC NFS performance through EMC NAS stack (which is used in differing ways) has always been very, very good for traditional NFS/CIFS (read: unstructured file, read biased) use cases. This set of recent face-melting SPEC SFS benchmarks shows how that lead has broadened with VNX (which can scale to leverage large amounts of SSD as both a cache and a tier). We were not kidding when we said 2011 was going to be a year of breaking records.

BUT – all is not rosy

DART’s IO path for transactional workloads (read: skewed towards smaller-block, more write-intensive IO) has never received the same degree of optimization effort, as that is not a general-purpose NAS profile. Over-simplifying, this is mostly due to serialization of IOs through the NAS stack, and missed optimizations around this type of workload. To date, all the optimizations done automatically by the EMC vCenter plugin (VSI) – uncached mode as an example – have an effect, but don’t change how the NAS stack itself handles the IO path. This doesn’t mean that transactional workloads on EMC NAS are a BAD choice (many customers are using it and loving it), but rather that we hadn’t been optimizing as much as we could have. The two use cases where this effect is strongest are Oracle on NFS and VMware NFS datastores.

The good news: engineering has a lot coming, in both the near and longer term, to optimize BOTH the Isilon and DART NAS stacks for these sorts of transactional NFS workloads.

Some of the near-term DART optimizations (there are really big ones planned for the longer term) are ready for some early experimental engineering feedback. To understand better:

The optimization is for the NAS write path – it has shown very large latency improvements in some tests with very small, random IO workloads on NFS datastores.

Multi-VM testing is what matters most if you want to provide data. We know it’s good with a single VM (4x better latency), and the way it gets there is by reducing the serialization of writes through the NAS stack to the backend disk devices (which produces a lot less latency). The main questions are: a) how well it holds up with random, multi-VM workloads; b) whether performance regresses with other workloads (e.g. large-IO sequential guest workloads).

It is experimental. This means: Don’t use it in production – PERIOD. If you have a non-production Celerra/VNX use case, give it a shot. We’ve been playing with it for a while, so it seems solid, but never use non-production code in production environments.

So – how do you get it?

For customers who would like to try the optimization, here is the process (this is the preferred process, as it formalizes feedback):

The epatch is available to the tech support group, which is the usual method by which we release code.

The epatch is called 6.0.40.805. If you wish to obtain and test it, please open an SR with your local service representative and have them work with tech support to get the 6.0.40.805 e-patch so that you can schedule your upgrades.

Please provide any feedback based on the experience that results from this patch. Negative or positive, we'd like to hear it.

For non-customers (EMC Partners/employees) who would like to try the fix, the experimental DART epatch is here, with the MD5 here, and the release notes here.

What tests are the most interesting? Well – read here for the contest details. I’m thankful for any/all data.

Regardless, please share whatever you find. In particular, before/after performance tests with guest-level small-block, random workloads (across a range of VM counts per datastore) would be VERY appreciated.
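For anyone wondering how to present the before/after numbers, a minimal sketch (the metric names and shape of the result dicts are hypothetical) of how per-VM latency stats could be rolled up so the “does it hold up multi-VM” question is easy to eyeball:

```python
def aggregate(per_vm_results):
    """Average each latency metric (e.g. 'avg', 'p95') across all VMs."""
    metrics = per_vm_results[0].keys()
    n = len(per_vm_results)
    return {m: sum(r[m] for r in per_vm_results) / n for m in metrics}

def improvement(before, after):
    """Per-metric before/after ratio: 4.0 means 4x lower latency post-patch."""
    return {m: before[m] / after[m] for m in before}
```

Posting the ratio per metric per VM count (1 VM, 4 VMs, 8 VMs…) would make the dataset directly comparable across contributors.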

Comments


Hi,

We have many concerns about using NFS on Celerra with NS/VNXe (or VNX with DMs), because we're doing full virtualization – i.e. no physical servers anymore – and are wary of (v)DM fail-overs.

Since a fail-over does a re-mount and potentially a "quick" fsck over the file system, it might still take a while to remount the NFS export, depending on the size of the FS. We know that in the past, DM fail-overs could take as long as 15 minutes; we've also heard that in recent DART codes, re-mount times have been largely improved, but not eliminated – and even 90 seconds, in our opinion, is way too much. And what happens if the NFS file system becomes corrupted during fail-over and fsck does its full check, or fails to re-mount the file system (might this still happen?)?

For the above reasons, and the comparably minuscule fail-over times of iSCSI and FC, we mentally waved off NFS solutions for craplication usage only.

Speaking frankly (as I always try to do) NAS failover time remains less than ideal. There have been massive improvements from the past - failovers occur well under the 125s recommended timeout value (which the EMC VSI plugin automatically sets) on ESX hosts.

That said - there's still a lot of work to do. Through 2011, the goal of the NAS team is to get it to under 30s always, through a variety of methods (this 30s mark would let it operate in almost all cases without extending timeouts).

As a general answer to "what's my take" – if I were a customer, I wouldn't use just VMFS on block or just NFS. I think judicious use of both is the most flexible choice. There are simply still use cases which are different enough that each has its place.

For some reason I don't fully comprehend, many customers want to "just use one". I guess I can partially comprehend the appeal (operational simplicity), but if you're going to look at the world through that lens, then you need to pick the one which covers the superset of SLAs, not the subset – which is one of the reasons many customers (regardless of storage vendor) go VMFS/block only. Your thinking reflects this.

To me, that's a "too bad" choice (to limit yourself to just one) - you miss out on leveraging the relative strengths of one vs. the other.

There are, BTW, three ways that failover time can be brought down even further:

1) PERHAPS: clustered NAS solutions coupled with near-term changes in vSphere where failure/retry uses DNS rather than IP (on failure, the fileserver doesn't "boot"; the client is redirected to an already-working node). Working on validating this.

2) DEFINITELY: continued acceleration of "classic" NAS failover (see above). Lots of things can be done here (partial boot state, changes to the core filesystem), and they're all being done.

3) DEFINITELY (but further out in the roadmap), use of NFS v4.1 (multiple sessions) or better yet, pNFS coupled with clustered NAS. This is a variant of 1, but depends on vmkernel support for NFS v4.1 or pNFS.

Well it's good to hear that you're as concerned about the issue as we are.

Concerning the "mix and match" of technologies, it's not that I or we would disagree; it's mostly connected to the storage device model that is (or was) in use, or – as you mentioned – the complexity burden for those (e.g. SMB customer admins) who need to maintain the solution in the end.

We really hope for 1) and/or 3), because 1) is allegedly what NetApp does (can't tell – I've only seen them from the outside, blinking in racks, so far).


Disclaimer

The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by Dell Technologies and does not necessarily reflect the views and opinions of Dell Technologies or any part of Dell Technologies. This is my blog; it is not a Dell Technologies blog.