An IT industry insider's perspective on information, technology and customer challenges.

March 11, 2015

Why I Think VSAN Is So Disruptive

Looking for a great disruption story in enterprise IT tech? I think what VSAN is doing to the established storage industry deserves to be a strong candidate.

I've seen disruptions -- small and large -- come and go. If you're into IT infrastructure, this is one worth watching.

A few years ago, I moved from EMC to VMware on the power of that prediction. So far, it’s played out pretty much as I had hoped it would. There’s now clearly a new dynamic in the ~$35B storage industry, and VMware’s Virtual SAN is very emblematic of the changes that are now afoot.

There’s a lot going on here, so it’s worth sharing. In each case, you’ll see a long-held tenet around The Way Things Have Always Been Done clearly up for grabs.

See if you agree.

I began this post by making a list of changes — deep, fundamental changes — that VSAN is starting to bring about in the storage world.

To be clear, I’m not talking so much about specific technologies, or how this vendor stacks up against that other one.

I’m really far more interested in the big-picture changes around fundamental assumptions as to “how storage is done” in IT shops around the globe: how it's acquired, how it's consumed, how it's managed.

If you’re not familiar with Virtual SAN, here’s what you need to know: it’s storage software built into the vSphere hypervisor. It takes the flash and disk drives inside of servers, and turns them into a shared, resilient enterprise-grade storage service that’s fast as heck. Along the way, it takes just about every assumption we've made about enterprise storage in the last 20 years and basically turns it on its head.

Storage Shouldn’t Have To Be About Big Boxes

Most of today’s enterprise storage market is served by external storage arrays, essentially big, purpose-built hardware boxes running specialized software. Very sophisticated, but at a cost.

If your organization needs a non-trivial amount of storage, you usually start by determining your requirements, evaluating vendors, selecting one, designing a specific configuration, putting your order in, taking delivery some time later, installing it and preparing it for use.

Big fun, right?

The fundamental act of simply making capacity ready to consume — from “I need” to “I have” — is usually a long, complex and often difficult process: best measured in months. I think the most challenging part is that IT shops have to figure out what they need well before actual demand shows up. Of course, this approach causes all manner of friction and inefficiency.

We’ve all just gotten used to it — that’s just the way it is, isn’t it? Sort of like endlessly sitting in morning commute traffic. We forget that there might be a better way.

The VSAN model is completely different. Going from “I need” to “I have” can be measured in days — or sometimes less.

For starters, VSAN is software — you simply license the CPUs where you want to use it. Or use it in evaluation mode for a while. The licensing model is not capacity-based, which is quite refreshing. That makes it as easy to consume as vSphere itself.

The hardware beneath VSAN is entirely up to you, within reason. Build a VSAN environment from hand-selected components if that’s your desire. Grab a ReadyNode if you’re in a hurry. Or go for something packaged as the ultimate in a simplified experience: EVO:RAIL. Choice is good.

Depending on your hardware model, getting more storage capacity is about as simple as ordering some new parts for your servers. Faster, easier, smaller chunks, less drama, etc. No more big boxes.

Yes, there is a short learning curve the first time someone goes about putting together a VSAN hardware configuration (sorry!), but — after that — there’s not much to talk about.

There are some obvious and not-so-obvious consequences from this storage model.

Yes, people can save money (sometimes really big $$$) by going this way. Parts is parts. We’ve seen plenty of head-to-head quotes, and sometimes the differences are substantial.

But there’s more that should be considered …

Consider, for example, that storage technologies are getting faster/better/cheaper all the time.

Let’s say a cool new flash drive comes out — and it looks amazing. Now, compare the time elapsed between getting that drive supported with VSAN, and getting it supported in the storage arrays you currently own.

There's a big difference in time-to-usability for any newer storage tech. And that really matters to some people.

One customer told us he likes the “fungibility” of the VSAN approach, given that clusters seem to be coming and going a lot in his world. He has an inventory of parts, and can quickly build a new cluster w/storage from his stash, tear down a cluster that isn’t being used for more parts, mix and match, etc.

Sort of like LEGOs.

Just try that with a traditional storage array.

More Performance (Or Capacity) Shouldn’t Mean A Bigger Box

A large part of storage performance comes down to the storage controllers inside the array: how many, how fast.

Add more servers that drive more workload, and you’re often looking at the next-bigger box — and all the fun that entails: acquiring the new array, migrating all your applications, figuring out what to do with the old array, etc.

Yuck. But that’s the way it’s always been, right?

VSAN works differently.

As you add servers to support more virtualized applications, you’re also adding the potential for more storage performance and capacity. A maxed-out 64-node VSAN cluster can deliver ~7M cached 4K read IOPS.
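For rough intuition, here's what that claim implies per node. The ~7M figure above is the only input; the rest is simple division, not a benchmark:

```python
# Back-of-the-envelope scaling for the figure quoted above.
# The ~7M cached 4K read IOPS for a 64-node cluster comes from
# the text; per-node and smaller-cluster numbers are derived.

MAX_NODES = 64
CLUSTER_IOPS = 7_000_000  # ~cached 4K read IOPS at 64 nodes

per_node = CLUSTER_IOPS / MAX_NODES  # ~109K IOPS per node

# Every node contributes controllers, cache and drives, so the
# aggregate grows roughly linearly as the cluster grows.
for nodes in (4, 16, 32, 64):
    print(f"{nodes:2d} nodes -> ~{per_node * nodes / 1e6:.2f}M cached read IOPS")
```

Linear scaling is the point: performance arrives a server at a time, rather than in array-sized jumps.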

Want more performance without adding more servers? Just add another disk controller and disk group to your existing servers, or perhaps just bigger flash devices, and you’ll get one heck of a performance bump.

Without having to call your storage vendor :)

Storage Shouldn’t Need To Be Done By Storage Professionals

I suppose an argument could be made about it being best to have your taxes done by tax professionals, but an awful lot of people seem to do just fine by using TurboTax software.

There certainly are parts of the storage landscape that are difficult and arcane — and that’s where you need storage professionals. There are also an awful lot of places where a simple, easy-to-use solution will suffice quite nicely, and that’s what VSAN brings to the table.

With VSAN, storage just becomes part of what a vSphere administrator does day-to-day. No special skills required. Need a VM? Here you go: compute, network and storage. Policies drive provisioning. Nothing could really be simpler.

No real need to interact with a storage team — unless there’s something special going on.

Can't We All Just Work Together?

Any time a team grows beyond a handful of people, they split up into different roles. The classic pattern in enterprise IT infrastructure has a dedicated server team, a dedicated network team, a storage team, etc.

The vSphere admins are usually dependent on the others to do basic things like provision, troubleshoot, etc. For some reason, I’ve observed particular friction between the virtualization team and the storage team. As in people on both sides pulling their hair out.

Many virtualization environments move quickly: spinning up new apps and workloads, reconfiguring things based on new requirements — every day (or every hour!) brings something new.

That’s what virtualization is supposed to do — make things far more flexible and liquid.

When that world bumps up against a traditional storage shop that thinks in terms of long planning horizons and careful change management — well, worlds collide.

With VSAN, vSphere admins can be self-sufficient for most of their day-to-day requirements. No storage expertise required. Of course, there will always be applications that can justify an external array, and the team that manages it.

It’s just that there will be less of that.

Storage Software Is Now Not Just Another Application

The idea of doing storage in software is not new. The idea of building a rich storage subsystem into a hypervisor is new. And, when you go looking, there are plenty of software storage products that run as an application, also known as a VSA or virtual storage appliance.

In this VSA world, your precious storage subsystem is now just another application. It competes for memory and CPU like all other applications, with one important difference: when it gets slow, everything that uses it also gets slow.

Because VSAN is built into the hypervisor, its resource requirements are quite reasonable. It doesn’t have to compete with other applications, because it isn’t a standalone application like a VSA is. Your servers can be smaller, your virtualization consolidation ratios better — or both.

Why do I think this will change things going forward?

Because VSAN now establishes the baseline for what you should expect to get with your hypervisor. Any vendor selling a VSA storage product as an add-on has to make a clear case as to why their storage thingie is better than what already comes built into vSphere.

They have to justify not only the extra price, but also the extra resources and the extra management complexity. Clearly, there are cases where this can be done, but there aren’t as many as before.

And that’s going to put a lot of pressure on the vendors who use a VSA-based approach.

The Vendor Pecking Order Changes

The last wave of storage hardware vendors were all array manufacturers — they got all the attention. In this wave, the storage component vendors are finding some new love.

As a good example, the flash vendors such as SanDisk and Micron are starting to do a great job marketing directly to VSAN customers. Why? A decent proportion of a VSAN config goes into flash, and how these devices perform affects the entire proposition.

This new-found stardom is not lost on them — especially as all-flash configurations start to appear.

At one time, there was a dogfight among FC HBA vendors who wanted to attach to all the SANs being built. In this world, it’s the storage IO controller vendors. Avago (formerly LSI) as well as some of their newer competitors are aware that there’s a new market forming here, and realize they can reach end users directly vs. being buried in an OEM server configuration.

There’s A Lot Going On In Storage Right Now …

We’ve seen one shift already from disk to flash — that much is clear. Interesting, but — at the end of the day — all we were really doing was replacing one kind of storage media with another.

What I’m seeing now has the potential to be far more structural and significant. Now up for grabs is the fundamental model of "how storage is done" in IT shops large and small.

An attractive alternative to the familiar big box arrays of yesterday.

Storage being specified, acquired, consumed, delivered and managed by the virtualization team, with far less dependence on the traditional storage team.

Storage being consumed far more conveniently than before.

Storage software embedded in the hypervisor having strong architectural advantages over other approaches.

Storage being able to pick up all the advances in commodity-oriented server tech far faster than the array vendors.

Component vendors becoming far more important than before.

And probably a few things I forgot as well :)

Yes, I work for VMware. And VSAN is my baby.

But there’s a reason I chose this gig — I thought VMware and VSAN were going to be responsible for a lot of healthy disruptive changes in the storage business. Customers would win as a result.

Since hyper-converged systems are built on a shared-nothing storage architecture, they are inherently burdened by low storage utilization. How can you position VSAN or EVO:RAIL for enterprise-scale deployments? Doesn't their ~20% storage utilization disqualify them against AFAs that deliver greater than 300% storage utilization?

To reiterate, I think HCI addresses significant market needs - I just disagree with the notion that HCI can address enterprise storage needs. Customers wrestle with data center resource scarcity as much as they do with storage challenges.

First, you're trying to position hyperconverged infrastructure as limited to small scale: remote offices and the like. While it obviously is a good fit there, there's also real customer evidence that it's a great fit in the data center as well.

Second, I think you're trying to make it all about storage efficiency. For some shops, that will be one concern among many. For most, "people efficiency" is far more important.

These IT leaders look at things differently: how efficient are my people in delivering the services that the business requires? That's where products like VSAN and EVO:RAIL shine.

In addition to "price per usable", there are strong advantages over dedicated external storage arrays that have to be procured separately, managed separately, supported separately, etc. Just like many of us prefer using a single mobile device over multiples.

I love the marketing claim of "300%". Go big, or go home!!! Even though we both know that actual results might vary significantly :)

Can HCI storage solutions go toe-to-toe feature-for-feature with the best arrays out there? No, not in some aspects -- but uniquely strong in others. And I think you'll find the gap closing fast indeed.

Let's not forget that the hardware in a VSAN implementation is the same low-cost commodity hardware they're using in their servers. Even with an array's moderate abilities to dedupe, the VSAN proposals often come out less expensive thanks to how parts are priced in the open market.

Stepping back, if you think back to how hypervisors found their way into data centers, we're seeing the same process today, only vastly accelerated. Back then, there were valid cases where physical made sense. Far fewer of those today, no?

I know where all those VSAN implementations are. And if you're trying to convince yourself that a good portion aren't in data centers supporting demanding applications, you'd be mistaken.

I personally feel that VMware needs to be careful to not over hype the technology.

In the right use case I am sure it makes a lot of sense, but at the end of the day it is not a mature solution yet as it has only been available for a year.

It is important that customers are aware of some of the limitations:

1. Usable to raw capacity is poor
2. Double disk protection, a requirement for many, is not really viable
3. You need to licence every node in the cluster even if it does not use VSAN
4. No redundancy for the cache drive - if it fails the entire disk group goes down - not ideal
5. ...

I think the biggest issue the storage industry has is over-inflated list prices - this is what causes many of the problems listed above (e.g. the slow procurement process).

If the storage array vendors can sort out their list prices - let's hope VSAN forces them to do this - and address the ease-of-use issues by making converged stacks available that are built on simple commodity servers rather than UCS, then I think it will be an interesting battle.

I am sure there is a significant place for VSAN moving forward but I still think a modernised storage array industry has a lot going for it - on the other hand if it continues to make things complex and insists on outrageous list prices then it is in trouble.

You've got your facts wildly incorrect, so let me see if I can help you.

1. If you use a policy to protect against a single failure, roughly 50% of raw space is available for consumption. That's essentially RAID 1. The good news is that VSAN does this with low-cost server disk drives. Go price out a usable config, compare $/usable to any array, and you'll see what I mean. Using RAID 1 also has advantages on read performance and rebuilds.

2. Protection level is set by policy on a per-VMDK basis if needed, using FTT, or failures to tolerate. FTT levels from 0 to 3 are supported, so that would protect against double (and even triple!) disk failures. Again, users can pick and choose which VMDKs get which protection level, or none if that's indicated.

3. Yes, you license every node in a cluster, because ideally every node is both producing and consuming storage services. Other licensing approaches can be debated, but this approach got the nod from our customers as being simple and consistent with other things they do.

4. The level of cache protection (hence redundancy) is determined on a per-object basis using the FTT (failures to tolerate) setting: none, one, two, three etc. If a cache drive fails, it affects all the capacity devices behind it, usually 1-4. If FTT>0, production continues and a rebuild of the failed components begins.

Having come from the array business, this is an entirely reasonable failure domain.
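If you want to run that $/usable math yourself, a few lines of Python are enough. This is a sketch under stated assumptions: the drive size, drive count and street price are illustrative placeholders, not a quote for any real configuration, and it models simple mirroring, where FTT+1 copies of the data means usable capacity is raw divided by FTT+1:

```python
# A quick sketch of the "$ per usable GB" math discussed above.
# Mirroring with FTT (failures to tolerate) keeps FTT+1 copies,
# so usable capacity = raw capacity / (FTT + 1).
# Drive size, count and price are placeholders for illustration.

DRIVE_TB = 4          # capacity per commodity SAS drive, TB
DRIVE_PRICE = 269     # street price per drive (placeholder), USD
DRIVES = 8 * 4        # e.g. 8 nodes x 4 capacity drives each

raw_tb = DRIVE_TB * DRIVES

for ftt in range(4):  # VSAN supported FTT 0..3 at the time
    usable_tb = raw_tb / (ftt + 1)
    dollars_per_usable_gb = (DRIVE_PRICE * DRIVES) / (usable_tb * 1000)
    print(f"FTT={ftt}: usable ~{usable_tb:.0f} TB, "
          f"~${dollars_per_usable_gb:.3f}/usable GB")
```

Plug in a real bill of materials and a real array quote to get a meaningful head-to-head comparison.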

I will agree with you, though, that the lack of transparent pricing in the array business doesn't do customers any favors. In the array vendors' defense, component prices change downward frequently, and there are dozens of variables that go into pricing a given configuration.

I do not think it is fair to say I have got my facts wildly incorrect - I have spent a considerable amount of time studying VSAN 5.5 and 6.0 over the last year.

1. If you use a policy to protect against a single failure, roughly 50% of raw space is available for consumption. That's essentially RAID 1. The good news is that VSAN does this with low-cost server disk drives. Go price out a usable config, compare $/usable to any array, and you'll see what I mean. Using RAID 1 also has advantages on read performance and rebuilds.
>>>> 50% usable capacity is not competitive today - take a look at products like XtremIO: they use a 23+2 RAID group, which will eventually also enable double disk protection - NetApp FAS would also be very similar, and it has double disk protection today

I appreciate what you are saying about low-cost drives, but in order to compete with products like EMC VNXe or NetApp E-Series in the low/mid market the drives would need to be at a negative cost:

You could get a very nice E-Series or VNXe with a considerable amount of storage for this amount of money, whereas with VSAN this is software only - I appreciate that as you progress beyond year 5 VSAN will become more competitive, as at year 5 you will probably replace the entire array whereas you will just pay maintenance on VSAN and replace the hardware

2. Protection level is set by policy on a per-VMDK basis if needed, using FTT, or failures to tolerate. FTT levels from 0 to 3 are supported, so that would protect against double (and even triple!) disk failures. Again, users can pick and choose which VMDKs get which protection level, or none if that's indicated.
>>>> I am aware you can have double disk or triple disk protection, but it does not look like something you would want to use in the real world, as it would use up more CPU resources and reduce usable capacity to around one third/quarter - server disks are cheap, but not that cheap

3. Yes, you license every node in a cluster, because ideally every node is both producing and consuming storage services. Other licensing approaches can be debated, but this approach got the nod from our customers as being simple and consistent with other things they do.
>>>> Why not just license the nodes that hold the storage? I think if you ask most customers that is what they would ask for - with this model you would address the problem that storage and compute do not scale linearly, the idea of a 20 node cluster with 4 dedicated VSAN nodes looks very attractive to me

4. The level of cache protection (hence redundancy) is determined on a per-object basis using the FTT (failures to tolerate) setting: none, one, two, three etc. If a cache drive fails, it affects all the capacity devices behind it, usually 1-4. If FTT>0, production continues and a rebuild of the failed components begins.

Having come from the array business, this is an entirely reasonable failure domain.
>>>> I really do not see how you can defend an architecture whereby the failure of one drive (in a 1+7 disk group) takes all 8 drives offline - surely it must be on the roadmap to resolve this so that you can have multiple cache drives to both scale performance (another limitation today) and provide local redundancy

I may come across as being anti VSAN and pro storage arrays, but this is not true I am just trying to objectively determine the strengths and weaknesses of each and at this stage I would conclude:

1. VSAN is too expensive in a lot of scenarios - it will make the most sense when you have a small number of nodes and either a lot of capacity or a lot of IOPS; it will struggle with everything in between
2. VSAN is immature and unproven compared to storage arrays (but we will not be able to say that in a few years)
3. It has some architectural limitations compared to storage arrays (no parity RAID or erasure coding, which means poor capacity utilisation, and a caching architecture where the failure of a single drive takes up to 7 other drives offline)
4. Its simplicity and linear scaling of compute and storage undoubtedly will fit some use cases today
5. The traditional storage array vendors need to move to more realistic street prices otherwise they will continue to "shoot themselves in the foot" and drive customers to the start-ups and hyper-converged products like VSAN
6. Storage arrays need to move to erasure coding as RAID is now looking "a bit long in the tooth"

Just my opinion and as always the market will decide so it will be interesting to observe what happens over the next few years.

Mark, you still have your facts a bit messed up. It sounds like you're letting subjective observations skew the picture. I'll try once more with real facts, but I'm not hopeful here.

1. The prices you are quoting for software are list prices. Many customers already have volume licensing arrangements. Second, I know what E-Series and VNXe arrays price at. Go ahead and do a head-to-head usable GB comparison.

I should also point out that protection is defined on a per-object basis. Depending on your needs, some object classes could have zero protection, be able to tolerate a single failure, be able to tolerate two failures, etc.

Also, keep in mind when you sign up for one of those entry-level arrays, your future options are limited around expansion, hardware upgrades, etc. Not the case with VSAN.

2. VSAN is designed to use no more than 10% of host CPU, frequently much less. That's true regardless of protection mode selected. A Seagate 4TB SAS drive goes for $269 these days online. Yes, Mark, server drives *are* that inexpensive! How much do you think that exact same drive would cost from an enterprise array vendor?

3. Regarding your "ideal" configuration, you're still thinking like a traditional storage array dude. An evenly balanced cluster is just that -- evenly balanced. Great performance, minimal CPU and memory impact. While the product certainly supports creating unbalanced configurations, it would be a more complex environment to manage, and the main motivation here is simplicity.

Licensing is licensing. We went with a licensing scheme that was simple and consistent with what vSphere customers are doing today. To date, neither pricing nor licensing has been a barrier to customer adoption.

4. I think you're mistaken about how VSAN failure domains work. Yes, the failure of a cache device will render all the capacity devices behind it unreachable. That is the same behavior as a cache controller failing in a storage array. A given server can support multiple storage controllers and/or multiple cache devices if desired. So if you wanted to have 4+ small cache devices in a server, each with a modest amount of capacity behind it, that's an option.

Yes, that costs more -- but the flexibility is there.
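To make the failure-domain point concrete, here is a deliberately simplified toy model (the class and function names are mine, not VSAN's): a cache-device failure takes its whole disk group offline, but with FTT >= 1 a replica of each object lives in a different disk group, so the object stays available:

```python
# Toy model of the disk-group failure domain discussed above.
# A simplification, not VSAN internals: one cache device fronts a
# disk group; if the cache fails, every capacity device behind it
# becomes unreachable. With FTT >= 1, each object has a replica
# placed in a different disk group, so it survives the failure.

class DiskGroup:
    def __init__(self, name, capacity_drives):
        self.name = name
        self.capacity_drives = capacity_drives
        self.cache_failed = False

    @property
    def online(self):
        # A cache failure takes the whole group offline.
        return not self.cache_failed

def object_available(replica_groups):
    # An object is readable while at least one replica's group is up.
    return any(g.online for g in replica_groups)

groups = [DiskGroup(f"dg{i}", capacity_drives=7) for i in range(3)]

# FTT=1: two replicas, placed in two different disk groups.
vm_disk = [groups[0], groups[1]]

groups[0].cache_failed = True     # the cache SSD in dg0 dies
assert not groups[0].online       # all 7 capacity drives behind it drop
assert object_available(vm_disk)  # the object survives via dg1
print("dg0 offline; object still available:", object_available(vm_disk))
```

The same model shows why several small disk groups per server shrink the blast radius of any single cache failure.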

As to your other subjective personal opinions: VSAN is too expensive, immature, architecturally lacking key features, etc. -- you are more than welcome to your perspectives, but none of these are objective facts.

I think your biggest misperception is that VSAN is directly competing with traditional storage arrays. That's not really the case. I could make a long list of things that storage arrays do that VSAN does not. And, conversely, I could make a long list of things that VSAN does that storage arrays do not.

VSAN is designed for vSphere administrators who want a storage product that works the way they do. Which is why it's not a storage array.

As I said, I am not in any way anti VSAN, I am just not convinced that we will see a large percentage of vSphere customers adopting the technology for a significant proportion of their storage estate - only time will tell and I might well be wrong!!!

I also think VSAN has come a long way considering it has only been on the market for a year, and I am sure there will be major improvements over the next few years (e.g. de-dupe, erasure coding, multiple cache drives).

At the end of the day it is much easier for me to have an objective opinion as I do not work for a vendor, but I do agree I am most comfortable with a storage array, so this is always going to be my frame of reference, at least for the next few years.