An IT industry insider's perspective on information, technology and customer challenges.

March 05, 2013

It's A Flash World: EMC's "Flash.next" Announcement

By now, most people who work with storage -- vendors, customers, partners -- appear to have read the memo: we're in the midst of a technology transition from spinning disk to flash storage.

All the signs are clearly visible. From an industry perspective, it's certainly gotten noisy and will only get noisier: all the familiar players and many new ones are out there with newer, flash-based products. Just as you'd expect.

Even the slower moving storage vendors have started to respond to the inevitable -- it's becoming a flash storage world, perhaps faster than some may have thought.

I love a good disruption ...

Today, EMC announced significant enhancements to our existing all-flash storage products, now rebranded under a new "Xtrem" moniker, and -- perhaps more importantly -- shared a bit more perspective on what's coming before too long.

While I'm sure that the merits of EMC's offerings will be widely debated, there's no debating that we're investing like crazy to stay ahead of the technology and customer demands.

From a vendor perspective, that's what a good technology disruption is all about ... take it seriously, or suffer the consequences.

The Server Flash Battle

The primary rationale for flash storage is performance, plain and simple. Whether that's achieving a new level of performance that was simply unattainable before, or getting a certain level of performance at a lower cost, this isn't really about sheer capacity -- that's what spinning disks do so well.

As is so often the case, as architects we'd like to scale performance and capacity as independently as possible.

Although we all intuitively know that processor performance has been outstripping disk performance for many decades, it's important to remind people that -- at some point -- the industry has to go looking for better answers.

Back in the day, a vendor selling an all-disk array could compete well by offering, say, 10-30% better performance than the competition. That was a big deal way back when.

Now it's not.

Most of us have gravitated to the IOPS/GB metric to describe relative storage media performance. It's not perfect in every situation, but it's turning out to be a useful shorthand, especially when talking about "hot" data.

This chart sums it up pretty well.

We have our familiar, cost-effective disk coming in at ~0.5 IOPS per GB of capacity.

An enterprise-class SSD sitting in a storage array that knows how to use it intelligently delivers ~150 IOPS per GB.

Wow, that's 300x better -- nothing to sneeze at.

But look what happens when we take basically the same media technology, and mount it on a server-resident PCIe card: 2000 IOPS/GB. That's more than 13x faster than the array-based enterprise flash drives.

And ~4000x faster than today's disk drives.
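If you want to sanity-check those ratios yourself, the arithmetic is straightforward. Here's a minimal sketch using the approximate figures above (rough shorthand, not benchmarks):

```python
# Approximate IOPS-per-GB figures quoted above -- rough shorthand, not benchmarks.
media_iops_per_gb = {
    "spinning disk": 0.5,
    "enterprise SSD in an array": 150,
    "server-resident PCIe flash": 2000,
}

baseline = media_iops_per_gb["spinning disk"]
for media, iops_per_gb in media_iops_per_gb.items():
    print(f"{media}: {iops_per_gb} IOPS/GB ({iops_per_gb / baseline:,.0f}x disk)")

# And PCIe flash vs. the array-resident enterprise SSD:
print(f"PCIe flash vs. array SSD: {2000 / 150:.1f}x")
```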

If flash is all about application performance, is it any wonder why there's such an intense industry focus on server flash in particular? EMC clearly sees server-resident flash as a critical part of the overall storage stack (much the way we think about HBAs, multipath I/O, etc.) and -- as a result -- we're investing very heavily: hardware *and* software.

It's the future of storage performance.

Mind-Bending Performance: A Standard Feature

I've become weary of putting up example after example of before-and-after performance, using realistic workloads you'll see in just about every data center on the planet.

Maybe you've gotten weary of seeing them as well.

Here's the point: there are surprisingly few real-world workloads that DON'T see a spectacular increase in performance by using server-based PCIe flash. The exceptions are pretty easy to figure out: bulk sequential reads -- and bulk sequential writes.

Everything else is sort of like shooting fish in a barrel -- way too easy.

What Makes It Hard

Here's the wrinkle: a PCIe flash card sitting in a server is a storage resource, not a compute or network resource. It stores data. Two concerns immediately pop up as a result.

The first, obviously, is data persistence. Servers fail, cards fail, etc. -- how do you make important data persistent, recoverable, replicated, etc.? You know, the kinds of things that storage arrays do today?

The second more pragmatic concern is pooling and efficiency -- these flash cards aren't cheap compared to spinning disks, so you want to get the maximum mileage from each and every one of them.

And both of these requirements get solved in software, not hardware.

Part I: XtremSF (eXtreme Storage Flash) -- The Hardware

EMC-supplied server flash cards are now branded as XtremSF, and the portfolio has gone from a few options to a very broad selection indeed. Based on this announcement, I think we're getting exceptionally aggressive when it comes to server flash cards. I like that, and I think customers will as well. Game on!

The performance numbers are impressive: not only in terms of how they compare to alternatives, but in their transparency.

Note the clear, unambiguous measurements based on real-world 4K blocks, read/write/mixed -- as well as both read and write latency. If you've done any serious performance modeling for your apps, these numbers should plug right in, and give you a reasonable prediction of what you're likely to see performance-wise with this approach.
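As a quick illustration of that "plug right in" exercise, here's a minimal sketch -- the spec figures below are hypothetical placeholders, not the published XtremSF numbers, so substitute the real ones for the card you're evaluating:

```python
# Hypothetical spec-sheet figures for illustration only -- substitute the
# published numbers for the card you're actually evaluating.
card_read_iops_4k = 300_000     # 4K random read IOPS (hypothetical)
card_write_iops_4k = 200_000    # 4K random write IOPS (hypothetical)

# Your application's measured I/O profile.
app_read_fraction = 0.70        # e.g. a 70/30 read/write mix
app_write_fraction = 0.30

# Weighted-harmonic estimate of the mixed-workload IOPS ceiling:
# each I/O consumes 1/read_iops or 1/write_iops of the card's time.
mixed_iops = 1.0 / (app_read_fraction / card_read_iops_4k +
                    app_write_fraction / card_write_iops_4k)
print(f"Estimated 70/30 4K mixed ceiling: {mixed_iops:,.0f} IOPS")
```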

I don't think it's any surprise that we're largely competing with "Brand F" in this space. They were there early, so we have to bring a much better game. And it looks like we're doing exactly that.

Note the use of x8 PCIe lanes vs. the more typical x4. Also note that there's an offload engine on each and every XtremSF card -- these are not "dumb" cards. The result is lower CPU utilization and more predictable I/O response times, as the offload engine can intelligently manage housekeeping, garbage collection, etc.

From a pure technology perspective, you'll notice both eMLC and SLC variations. Just as we have different tiers of disk storage, we're seeing the pronounced emergence of different tiers of flash storage -- and that now extends to server-resident PCIe cards.

Yes, we have customers who put both types in their servers -- that's the real world.

I think the champ of the hardware announcement is the new high-capacity 2.2 TB half-height, half-length MLC card. It should be self-evident that this technology is evolving quickly -- over 2 terabytes of blazing server flash capacity on a modest PCIe card! Yes, it's better / faster / more efficient than the competition, but this space is starting to move faster, and faster, and faster. All good.

Let's throw a little red meat out there on performance, just for fun. Keep in mind, the product is designed for the real world of multiple concurrent workloads, and multiple flavors of I/O streams.

This chart shows a 70/30 read/write mix using both 4K and 8K blocks vs. the current competition.

Bottom line -- there's a 2x advantage. And, since flash is all about speed, that sort of matters, doesn't it?

Just on a pure hardware level, the EMC XtremSF products stack up very favorably against the next closest competitor.

But -- to be fair -- things are moving very fast in this space.

No one can rest on their laurels.

Part II: XtremSW -- The Software

With this announcement, XtremSW now refers to EMC's software management layer that supports the new XtremSF PCIe cards. While XtremSF can be used without XtremSW software, the inverse is not true -- yet ...

The first product in the family (XtremSW Cache) is here today; over time we'll be evolving this into a full suite of software products (XtremSW Suite) that manage server-resident flash cards.

In one sense, XtremSW can be seen as a progression of EMC's popular FAST (fully automated storage tiering) approach -- we just have a new tier of storage, this time using PCIe flash cards resident in a server.

XtremSW allows any XtremSF card to do two things at the same time: act as local storage for non-persistent data (think temp files, scratch areas, etc.), or act as a smart cache for persistent data that eventually ends up on an external EMC storage array.

Volumes that are designated as non-persistent (i.e. used as local storage) don't need to communicate with an external storage array.

Volumes designated as persistent write through to the array (presumably landing in the array's cache!), with a local copy stored on the card for subsequent access.

Read hits come directly from the PCIe card -- naturally. Read misses are served from the array, and kept around for subsequent use.

All as you'd expect.
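To make that read/write flow concrete, here's a minimal sketch. To be clear, this is illustrative pseudologic, not the actual XtremSW implementation; the card and array here are just hypothetical dict-like stand-ins:

```python
class WriteThroughCache:
    """Illustrative sketch of a server-side write-through flash cache.
    Not the XtremSW implementation -- just the behavior described above."""

    def __init__(self, card, array):
        self.card = card      # local PCIe flash card (hypothetical, dict-like)
        self.array = array    # external storage array (hypothetical, dict-like)

    def write(self, block_id, data):
        # Write-through: the array always gets the persistent copy ...
        self.array[block_id] = data
        # ... and the card keeps a local copy for subsequent reads.
        self.card[block_id] = data

    def read(self, block_id):
        if block_id in self.card:       # read hit: served directly from the card
            return self.card[block_id]
        data = self.array[block_id]     # read miss: served from the array ...
        self.card[block_id] = data      # ... and kept around for next time
        return data


cache = WriteThroughCache(card={}, array={})
cache.write("block-42", b"...")
assert cache.read("block-42") == b"..."   # read hit, served from the card
```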

That sort of flexibility ends up being important from an efficiency perspective. For example, let's say you've got a decently consolidated virtualized server running perhaps 5-10 workloads. You certainly don't want a dedicated flash card for each workload -- although that might be very cool!

Some of the apps want non-persistent local storage, some want caching on top of external data protection, and some want both.

With today's technology, you can't pool the PCIe flash cards across servers (yet!), but you *can* share them nicely within a single server, especially for virtualized workloads.

I'm sure many will be glad to see full support from VMware for the VMotion use case ... that's big.

There are also the beginnings of management integration with Symmetrix VMAX (typically very performance-conscious customers), with visibility, monitoring and reporting from Unisphere for VMAX.

Storage admins can see where the XtremSW-enabled cards are, what they're doing, how they're performing, etc.

Much more can be done here -- but it's a useful and very pragmatic start.

Part III: An XtremIO Update

We've been sharing details about "Project X" (the product we're building from the XtremIO acquisition) over the last few months.

If you're not familiar, XtremIO is a purpose-built all-flash storage array architecture and not a repurposed traditional array stuffed with flash drives.

It uses a scale-out design, offers real-time inline global dedupe, and can run data services at full throttle (e.g. snaps) with no observable degradation -- all while delivering predictable half-millisecond-ish latency regardless of what you throw at it.

Not to mention, it's incredibly easy to set up and use.

We've now gone from beta to "directed availability" -- which means EMC has started to ship production revenue units to qualified customers. Look for a full general availability announcement later this year.

Part IV: Putting It All Together

Fortunately, there's a good body of documentation that characterizes the workloads and use cases where the combination of XtremSF and XtremSW can make the greatest impact. Behind this, we've got some decent tools and methodologies to look at specific customer applications, and make a reasonable estimate of performance impact.

What I've recently realized is that server-side flash storage provides customers yet another important option in boosting application-specific storage performance.

Many years ago, when customers came to us and found they needed more storage performance for specific applications, the options were heavy-handed at best: upgrade your array, re-lay out your data, add more paths, buy a bunch of faster disk drives, etc.

Not exactly simple or cheap.

We'd do these very extensive performance studies on behalf of our customers, helping them evaluate the different options, but none of the recommendations were exactly an easy pill to swallow.

With XtremSF and XtremSW, we've now got a third important tool in the toolbox. If you fit the profile -- and so many do -- just add a few PCIe flash cards to your existing servers, and stand back. Shazam!

Not to be undervalued is EMC's world-class customer service. These are enterprise apps we're talking about, and they demand enterprise support from an enterprise company.

As an example, check out the current server qualification matrix for XtremSF.

Lots and lots of choices -- all 100% backed by EMC.

A fair question might be -- how does all this stuff fit together? So many choices, so many choices ...

The simplest way I've found to explain it boils down to (a) the size of your working set, and (b) the predictability of the working set. By "working set", I'm referring to the set of blocks a given application wants to use and re-use to get its work done. You'll also hear this described as locality of reference or perhaps data skew -- most applications tend to only be using a small portion of their available data sets at any one time, and that's exploitable.

The XtremSF / XtremSW combination will provide the very ultimate in storage performance if (a) a good portion of your working set(s) fits on the XtremSF card, and (b) the contents of the working set aren't changing rapidly.

Other than perhaps loading everything into a big chunk of server RAM, this is the best game in town. Not every workload fits this profile, but a surprising number of familiar ones do.
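A quick back-of-the-envelope way to sanity-check condition (a) -- assuming, for simplicity, that accesses within the working set are roughly uniform (real-world skew usually works in your favor):

```python
# Back-of-the-envelope cache-fit check. Assumes accesses within the working
# set are roughly uniform -- real skew usually makes the hit rate better.
working_set_gb = 500          # blocks the app actively re-uses (example value)
card_capacity_gb = 2200       # e.g. the 2.2 TB XtremSF card
skew_fraction = 0.90          # fraction of I/O that lands in the working set

cached_fraction = min(1.0, card_capacity_gb / working_set_gb)
expected_hit_rate = skew_fraction * cached_fraction
print(f"Rough expected cache hit rate: {expected_hit_rate:.0%}")
```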

Don't need nose-bleed performance, or have larger working sets, or somewhat less predictable I/O patterns? VNX and VMAX with FAST fit this profile, as does Isilon for scale-out NAS environments.

Or, as is becoming increasingly popular, front-end your array with a few XtremSF cards as needed.

Need nose-bleed performance, large working sets, predictable latency, and random I/O patterns? Take a good look at the now-available XtremIO array -- that's what it was designed for.

Here's the real-world bit -- in many environments, you'll often find a need for all three approaches. There's no "best" -- as you might expect, it all depends on what you're trying to get done.

Part V: The Road Ahead

As part of this announcement, the team gave a preview into some of the things they're working on for future XtremSW Suite software releases. And they all involve making multiple cards work intelligently together across multiple servers.

For one thing, it's not hard to observe that we'll need a better storage coherency mechanism before long. Think multiple readers and writers against the same storage object, and providing better mechanisms for granular synchronization and consistency across server boundaries.

Having to flush everything out to an array and read it back in again works OK for now, but we certainly can do better.

The second is pooling -- how do we turn all those nifty XtremSF cards into a shared capacity pool where any application on any server can read or write to any location, like we do with storage arrays today?

Obviously, high-speed interconnects will be desired, but there's an opportunity to re-create a familiar storage model using simple commodity servers and essentially commodity flash cards.

If this sort of stuff interests you, you might want to go read my recent post on software-defined storage concepts for a fuller exposition of some of the underlying ideas at play here.

Warning: Big Ideas At Work

While all of this can be very interesting from a pure storage perspective (server flash, software defined storage, etc.), it gets far more interesting when you start juxtaposing these thoughts with broader architectural concepts like SDDC, data fabrics and next-gen converged infrastructure.

None of this stuff happens in isolation. More on that soon ... I promise.

Comments

We didn't see the benefits you speak of, and not because of bulk sequential I/O. When you have more than 80GB assigned to a database's buffer cache for a 1TB database, in our case we saw about 5% of the I/O requests being met by the VFCache card.

You can be as fast as you like with an SSD cache card, but if 95% of your I/O comes from system memory, you can't help us, sorry. Our next generation of servers have something like 380GB of physical RAM...

First, you're right -- configure ginormous amounts of system RAM, and you'll get a great caching effect. That's been true for a long time.

Second, if you have an extremely random I/O pattern, caching of any sort isn't that helpful -- unless the cache size is a significant portion of the target database size.

That's part of the rationale behind all-flash arrays, such as XtremIO.

Third, we have to keep in mind that none of the server-side caching options today (RAM, SSD, PCIe, whatever) can safely be used for persistent writes -- although good server-side caching can free up IOPS for the array to process more writes.

Customers have told me that using enormous amounts of RAM is (a) expensive, (b) can make the system unstable, and (c) can't be shared between multiple applications potentially running on the same server. There are popular exceptions (SAP's HANA comes to mind) and I'd be interested in your take as well.
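To put a rough number on the second point above: with a purely random access pattern, the best-case hit rate of any cache is roughly proportional to the fraction of the data set it can hold. A back-of-the-envelope sketch, using the figures from the comment (illustrative only):

```python
# Back-of-the-envelope: with purely random I/O, best-case cache hit rate is
# roughly (cache size / data set size). Figures taken from the comment above.
db_size_gb = 1000            # ~1 TB database
ram_buffer_cache_gb = 80     # database buffer cache in server RAM

best_case_hit_rate = ram_buffer_cache_gb / db_size_gb
print(f"Best-case random-I/O hit rate for an 80 GB cache: {best_case_hit_rate:.0%}")

# Any server-side flash cache faces the same arithmetic: unless the card holds
# a significant fraction of the full database, random I/O mostly misses it --
# which is part of the rationale for an all-flash array like XtremIO.
```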