Cloud Prizefight: OpenStack vs. VMware

There have been many discussions in the cloud landscape comparing VMware and OpenStack. In fact, it’s one of the most popular topics among those thinking about using OpenStack. I’ve given a couple of presentations to the SF Bay OpenStack Meetup on this topic and many peers have asked me to write about it. To make it interesting, I’ve decided to structure this as a head-to-head bout between these two cloud software contenders competing for usage in your data center. Some aspects I will consider are open vs. closed systems, enterprise legacy applications vs. cloud-aware applications, free vs. licensed, and well-tested features vs. controlling your own roadmap.

The contenders will be judged in the following categories: design, features, use cases, and value. The categories will be scored on a 10-point scale and then tallied to determine the winner.

Round 1: Design

VMware’s suite of applications was built from the ground up, starting with the hypervisor. The ESX(i) hypervisor is free and provides an excellent foundation for VMware orchestration products such as vSphere and vCloud Director. The software is thoroughly tested and has a monolithic architecture. Overall, the product is well documented and has a proven track record, used by high-profile customers at multi-data-center scale. That said, the system is closed, and the roadmap is entirely dependent on VMware’s own objectives, with no control in the hands of consumers.

OpenStack is open source and no single company controls its destiny. The project is relatively new (two years young) but has huge market momentum and the backing of many large companies (see: companies supporting OpenStack). With so many companies devoting resources to it, OpenStack has no dependency on any single vendor. However, deployment and architecture have a steeper learning curve than VMware’s, and the documentation is not always current.

Scoring – Design

Category     Design   Features   Use Cases   Value
OpenStack    7        –          –           –
VMware       8        –          –           –

VMware takes a small lead in design with excellent documentation and an easy-to-use interface for deployment and management. OpenStack is no slouch here, though, since it was designed from the ground up for flexibility and it’s vendor agnostic in terms of hardware and hypervisors.

Round 2: Features

VMware vMotion

vMotion is the building block for three vSphere features: DRS, DPM, and host maintenance. Live migration moves a VM from one host to another with zero downtime and is supported via shared storage. When a VM moves between hosts, its RAM state is transferred to the new host; since the storage is shared, the data itself does not move at all. Only the link to the data changes from one host to the other, which makes for a fast transition, since the data does not need to be copied over the network.

OpenStack Live Migration

KVM live migration allows a guest operating system to move to another hypervisor, and you can even migrate a guest back and forth between an AMD host and an Intel host. A 64-bit guest can only be migrated to a 64-bit host, but a 32-bit guest can be migrated to either. During live migration the guest should not be affected by the operation, and the client can continue to perform operations while the migration is running. The main dependency here is shared storage, which can be expensive.

Live migration requirements:

Distributed filesystem for the VM storage, such as NFS or GlusterFS

Libvirt must have the listen flag enabled

Each compute node (hypervisor) must be located in the same network/subnet

The authentication must be configured as none or via SSH with SSH keys

The mount point used by the DFS must be the same at each location
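As a rough sketch, the requirements above map onto configuration like the following. These are Folsom-era settings from memory; exact file paths and flag names vary by distribution and release, so treat them as illustrative rather than authoritative:

```shell
# /etc/libvirt/libvirtd.conf -- let libvirt accept incoming migration connections
listen_tls = 0
listen_tcp = 1
auth_tcp = "none"        # or configure SSH keys between compute nodes instead

# /etc/default/libvirt-bin (Ubuntu) -- start the daemon with the listen flag
libvirtd_opts="-d -l"

# /etc/nova/nova.conf -- flags nova passes to libvirt for live migration
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER
```

The distributed filesystem (NFS, GlusterFS, etc.) must then be mounted at the same path, typically /var/lib/nova/instances, on every compute node.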

OpenStack Block Migration

In OpenStack, shared storage is not required for VM migration, since KVM block migration is supported. In this scenario both the RAM state and the disk data are moved from one host to another. The downside is that the migration takes longer and consumes CPU resources on both source and target hosts. Still, there are cases where block migration is a better option than classic live migration, because being able to migrate VMs using nothing but the network can be priceless. This is especially true if the main purpose of moving VMs is host maintenance: some deployments have no shared storage but still need to perform maintenance on compute nodes, such as kernel or security upgrades, and VM downtime is not acceptable. In that case, block migration is the ideal solution.

Use case:

A user doesn’t have a distributed filesystem, and doesn’t want one for understandable reasons—perhaps the costs of enterprise storage and network latency—but wants to be able to perform maintenance operations on hypervisors without interrupting VMs.
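From the command line, the two flavors of migration differ only by a flag. The instance and host names below are placeholders, and the exact flag spelling has varied across novaclient versions, so check your client’s help output:

```shell
# classic live migration: RAM state moves, disk stays on shared storage
nova live-migration vm-0042 compute-02

# block migration: the disk is copied over the network as well,
# so no shared storage is needed
nova live-migration --block-migrate vm-0042 compute-02
```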

VMware DRS and DPM

DRS leverages vMotion by dynamically monitoring the resource usage of VMs and hosts during runtime and moving the VMs to efficiently load balance across hosts.

Use cases:

Provision time: Initial placement of VMs based on any automation that you have set up

DPM leverages vMotion by moving VMs off hosts and shutting those hosts down during periods of lower load to reduce power consumption. When the load grows, DPM turns hosts back on and spawns VMs on them.

OpenStack Scheduler

OpenStack includes schedulers for compute and volumes. The compute scheduler selects an appropriate host for your VM based on a list of attributes and filters set by the cloud admin. The scheduler is quite flexible and supports a wealth of filters, and consumers can also write custom filters using JSON. But while the scheduler is flexible and highly customizable, it’s not quite a replacement for DRS, for the following reasons:

The data the scheduler uses to decide which host to provision to is static data derived from the nova database. That is: host A already has four VMs, so let’s choose a different host for the next VM.

The scheduler only influences the placement of VMs at provision time; it will not move VMs while they are running. Dynamic data can be fed in by an external monitoring solution such as Nagios working with the scheduler, but even then the scheduler only affects the initial placement of VMs.
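The filter-then-weigh idea is easy to sketch without any OpenStack code. The toy scheduler below (names and numbers are mine, not nova’s) filters out hosts that lack free RAM, then picks the candidate with the fewest VMs, working only from the kind of static, database-derived snapshot described above:

```python
def schedule(hosts, ram_needed):
    """Pick a host for a new VM: filter by free RAM, weigh by VM count.

    `hosts` is a list of dicts of static data, as if read from the
    nova database at provision time.
    """
    # Filtering pass: drop hosts that can't fit the VM.
    candidates = [h for h in hosts if h["free_ram_mb"] >= ram_needed]
    if not candidates:
        raise RuntimeError("no valid host found")
    # Weighing pass: the least-loaded host (fewest running VMs) wins.
    return min(candidates, key=lambda h: h["vm_count"])["name"]

hosts = [
    {"name": "host-a", "free_ram_mb": 2048, "vm_count": 4},
    {"name": "host-b", "free_ram_mb": 8192, "vm_count": 2},
    {"name": "host-c", "free_ram_mb": 512,  "vm_count": 0},
]
# host-c lacks RAM for this request; of the rest, host-b has the fewest VMs
print(schedule(hosts, ram_needed=1024))  # → host-b
```

A real deployment chains many such filters and weights, but the limitation stands: everything the function sees is a snapshot taken at provision time, so nothing rebalances VMs once they are running.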

VMware HA

VM-level high availability (HA) in vSphere respawns a VM on a different host when the VM or its ESX(i) host fails. It should not be confused with fault tolerance (FT); HA is not fault tolerant. HA simply means that when something fails, it is restored in a reasonable amount of time via self-healing. HA protects virtual machines from hardware failure: if a failure does occur, HA reboots or powers on the VM on a different ESX(i) host, so it is basically a cold power-on after a crash. With HA, services are still susceptible to downtime.

OpenStack HA

Currently, there is no official support for VM-level HA in OpenStack. It was initially planned for the Folsom release but was later dropped/postponed. There is currently an incubation project called Evacuate that is adding support for VM-level HA to OpenStack.

VMware Fault Tolerance

VMware FT continuously streams the state of a protected virtual machine, and all changes to it, to a secondary ESX(i) server. Fault tolerance means that when either the primary or the secondary VM’s host dies, as long as the other stays up, the VM keeps running. Contrary to marketing myths, this still doesn’t help you if an application crashes, or during patching: once the application crashes, it crashes on both sides, and if you stop a service to patch, it also stops on both VMs. FT protects you against a single host failure with no interruption to the protected VM; true application-level clustering like MSCS or WCS is required to protect against application-level failures. Considering FT’s other limitations, such as high resource usage (double the RAM, disk, and CPU, plus the bandwidth to stream the state), this is one of the less-used VMware features. It requires twice the memory, since memory cannot be deduplicated via TPS across hosts, and it uses CPU lockstepping to sync every CPU instruction between the VMs, which limits FT protection to single-vCPU VMs.

OpenStack FT

In OpenStack, there is no feature comparable to FT, and there are no plans to introduce one. Furthermore, instruction mirroring is not supported by KVM (the most common hypervisor for OpenStack).

Scoring – Features

Category     Design   Features   Use Cases   Value
OpenStack    7        6          –           –
VMware       8        9          –           –

As you can see, there are some gaps between VMware and OpenStack, and there are also gaps within those features. The two are in a battle, with each camp matching the other’s features. This is good for OpenStack, as VMware is extremely expensive and OpenStack is free: VMware has spent a lot of money developing these features, costs which need to be passed on to the consumer, whereas OpenStack features are developed by the community and can be consumed freely.

VMware extends its lead in the features category, having invested a great deal in vMotion, HA, FT, and other ways to protect VMs. OpenStack has been catching up on the features it deems useful for cloud-aware tenants, but has also dropped features deemed lower priority in order to focus on supporting more hardware solutions.

Round 3: Use Cases

Before we can assign value to the features above, we need to think about use cases. In the cloud ecosystem there are two types of tenants consuming infrastructure as a service: cloud-aware and legacy. Cloud-aware applications handle HA and DR policies on their own, while legacy applications rely on the infrastructure to provide them. See the diagram below, from a VMware cloud architect’s article.

Legacy applications tend to need features such as FT, VM-level HA, and automatic virus scanning, whereas cloud-aware applications do not: when one VM fails, you just bring up additional VMs to replace it.

Pet vs. Cattle

The analogy goes as follows: in the legacy service model, you think of your machines as pets. You give them names like dusty.cern.ch, they are raised and cared for, and when they get ill, you nurse them back to health. In the cloud-aware service model, VMs are treated like cattle: they get numbered names like vm1002.cern.ch, they are all identical, and when one gets ill, you shoot it and get another cow.

Future application architectures should use cattle. VMware features that nurture/protect the VM are less important in the cattle service model.


Scoring – Use Cases

Category     Design   Features   Use Cases   Value
OpenStack    7        6          8           –
VMware       8        9          6           –

OpenStack catches up in this category, since many of the features VMware has (and OpenStack doesn’t) are not necessarily useful for cloud-aware applications. Furthermore, with VMware you pay license fees for features you may not need, and you have no control over whether VMware adds the features you do need.

Round 4: Value

Here comes the final round, the one that decides it all. However, the answer to which contender provides the best value isn’t clear-cut, since it depends on scale. While OpenStack is free to use, it requires significant engineering resources and expertise, and it takes more effort to architect and stand up, since it supports so many deployment scenarios and no two installations look the same. VMware has licensing costs but should be easier to install and get running; it’s also easier to train staff on a point-and-click interface than on a command line.

In short, OpenStack has a higher initial cost, but as projects scale you get more value, thanks to the absence of licensing fees. VMware is cheaper for smaller installations, but its value diminishes as you increase scale. That being said, cloud use cases are trending toward large scale, and as people gain experience with OpenStack, the initial costs will come down.
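To make the crossover argument concrete, here is a back-of-the-envelope model. Every number in it is made up purely for illustration; real licensing, staffing, and setup costs vary enormously:

```python
def total_cost(hosts, setup_cost, per_host_ops, license_per_host):
    """Rough total cost: one-time engineering plus per-host running costs."""
    return setup_cost + hosts * (per_host_ops + license_per_host)

# Hypothetical figures: OpenStack has a high setup cost and no license fee,
# VMware the reverse.
def openstack(n):
    return total_cost(n, setup_cost=300_000, per_host_ops=2_000, license_per_host=0)

def vmware(n):
    return total_cost(n, setup_cost=50_000, per_host_ops=1_000, license_per_host=4_000)

# Find the host count beyond which OpenStack comes out cheaper.
crossover = next(n for n in range(1, 10_000) if openstack(n) < vmware(n))
print(crossover)  # → 84 with these made-up numbers
```

With these placeholder figures the lines cross at 84 hosts, but the specific number is meaningless; the point is only the shape of the curves, a fixed up-front cost versus a per-host recurring cost.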

Scoring – Value

Category     Design   Features   Use Cases   Value
OpenStack    7        6          8           10
VMware       8        9          6           6

And the winner is…

Scoring – Final

Category     Design   Features   Use Cases   Value   Total
OpenStack    7        6          8           10      31
VMware       8        9          6           6       29

In a title bout between two of the biggest players in the cloud landscape, VMware took a big lead early on in features and design, but OpenStack came through as the underdog and won the competition by dealing a knockout blow in value.

Author’s note

Coincidentally, at the time of this writing, VMware stock plunged 22 percent in a single day on January 29, with market analysts citing the lack of a clear and well-defined cloud strategy and weak outlook…

I understand that some of you may disagree with my scoring and with the fact that I assigned the same weight to each category. Truth be told, the scoring is imperfect and completely subjective; it exists mainly to make the material a little more interesting. That said, please feel free to give your opinions in the comments!

Comments

Thank you for the reply, Umair. I wish I could have covered all VMware features in this post, but this post is not only a feature comparison, and the feature section is already quite long. I chose vMotion (block and live), DRS, DPM, VM-level HA, and FT, as these were the most popular among clients weighing OpenStack against VMware.

– You mistake technology for approach when using the “Pets vs. Cattle” analogy (both products can support both approaches)

– You don’t (and can’t) quantify, in a generalized manner, the CapEx, OpEx, or TCO of each product since it depends on client needs and circumstances. There are many use cases, not just legacy vs. cloud-aware apps

– You base value on one measurement: licensing costs? Value is a measure of benefit to cost, and free, even at scale, doesn’t necessarily mean better value. Here’s an animal analogy for you: a free puppy isn’t free, and the bigger the dog gets, the less free it becomes.

In conclusion, this is biased marketing that serves no useful purpose – subjective (highly) as you have stated.

Thanks for the comment. Both can be augmented to support both approaches, but there are different use cases where one fits better than the other.

Here is a direct quote from a VMware vCloud Architect claiming that VMware’s solution is better suited to legacy applications, while the AWS and OpenStack model better suits cloud-aware applications.

“The AWS / OpenStack model can be seen as a forward leaning model whereas vCloud Director can be seen as a backward leaning model. The former model aim at creating a brand new experience in how applications are engineered, developed and operated. The latter model aim at creating a cloud-like experience for workloads that have been engineered and developed in a more traditional “enterprise” way.”
– Massimo Re Ferre, vCloud Architect at VMware

I’m sorry you found the information useless; if you can point me to a more useful comparison, I’d appreciate it. Regarding the “pure marketing” remark, I covered the features in quite a lot of technical depth, which actually sparked additional discussion about the technology, as you can see from the other comments.

You didn’t claim to do a comparison based on what they would be in the future. Suresh’s comment is very accurate, and given the lack of documentation and “OpenStack” engineers TODAY, the costs are not accurately accounted for in the article.

All that being said, had you omitted scoring altogether, the article has a lot of very well-worded content regarding some of the differences in the two products.

Finally, I think it’s important to remember that it’s not one against the other, as VMWare provides considerable support to the OpenStack project.

This is an excellent, detailed dive into the differences. I need to start using the Pets vs. Cattle analogy; that’s a good one.

That said, I think this evaluation would be better with a few key additions, separate from the aforementioned “more VMware” features.

On DRS vs. the OS Scheduler
– As described, VMware DRS currently has an advantage over the OpenStack scheduler, because DRS applies various key metrics to placement decisions where the OS scheduler does not, and DRS is used over the lifecycle of the VM rather than just for initial placement.
– But DRS is closed, has no controllable weighting for the metrics being used, and doesn’t consider the temporal aspects of the metrics. As a simple example, just because CPU is high for a short window during backups in the middle of the night doesn’t mean you should move the VM to another host. Another example: if you know a spike is going to happen in the future, it would be good to feed DRS with input from the future.
– The OS scheduler could kick DRS to the curb over time because of this latter difference, especially if it became extensible/modular and gained a provider model.

On why this dynamic/lifecycle placement function is important…
– vMotion/DRS/HA is certainly a key aspect of dealing with legacy “Pets”-oriented VMs, and clearly, over time, that matters less as VMs become cattle.
BUT that’s not why I care about this function. I care because vMotion/DRS is also essential for maximizing utilization/oversubscription by “bin packing” the VMs.

In one of our environments, we ended up turning off DRS because of its weaknesses in this respect, and we wrote our own scheduling algorithm to increase utilization further. Unfortunately, because VMware changed things, in a closed way, across versions, it was hard to maintain this customization.

On other aspects missing…
There is no discussion of the ISV support issue (key ISVs only providing “support” for applications running on VMware), nor of the very top-of-mind issue of migrating applications from Pets to Cattle.
What happens when you love Fluffy so much that you don’t want to kill it and replace it with cattle? 🙂

I’d argue with a few tweaks, OpenStack could be better at handling the Pets paradigm as well.

Thanks for this detailed reply. I really liked the insight into how DRS doesn’t account for the temporal aspects of metrics and ends up becoming more of a nuisance than a feature at times (you mentioned you turned it off?). I’m surprised that VMware has not come up with an algorithm to recognize CPU patterns and ignore CPU spikes due to backups, etc., or a mode that disables DRS load balancing during specific set hours.

All excellent points, Toby. Boy, it’s nice to have a boss who knows his stuff 🙂

Regarding the inevitable demise of Fluffy…

I believe that OpenStack could certainly be better at handling the Pets paradigm, but I hope that the development of features isn’t just a “VMware has this capability, so we should have the same” decision, as too often features developed that way tend to mimic APIs and interfaces that reinforce the wrong/outdated/monolithic design principles that led to the Pets thinking in the first place.

Which is why I’m always ranting about being sure that the features we implement — especially around PaaS and orchestration — complement and reinforce the Cattle viewpoint. 😉


I purposely left this out. With VMware’s purchase of Nicira, I expect that most of the features Nicira developed for Quantum (OpenStack) will be going into vDS (VMware) as well. I’d like to wait for the dust to settle before tackling this one. Also, I think Quantum has so much breadth that it may deserve its own post, so thanks for the idea. 🙂

Correct me if I’m wrong, but isn’t this more of a comparison of KVM vs. vSphere? The only mention of vCloud is in the first paragraph. If you’re really going to compare OpenStack with VMware, you should be comparing OpenStack with vCloud Director and what each of them does.
And as I understand OpenStack, it actually supports vSphere as a hypervisor.

I think there is some confusion. Despite VMware recently rebranding the ESX(i) name as “vSphere Hypervisor”, vSphere is definitely not just a hypervisor.
While ESX(i) is free and can be used by OpenStack as a hypervisor, features like DRS, vMotion, DPM, HA, and vDS are not available without additional vSphere licenses. Without vSphere, ESX(i) is reduced to just provisioning VMs and snapshotting.

When I compare the features above, I’m comparing OpenStack with KVM to vSphere with ESX(i). I could compare OpenStack with ESX(i) to vSphere with ESX(i), but the majority of OpenStack features are being developed against KVM. It’s important to compare these vSphere features, as they are what people seem to care about when moving from VMware to OpenStack.

vCloud Director is used to manage multiple VMware data centers, but it can only work with vSphere, hence why it’s important to compare the features at the vSphere layer.

1. While OpenStack seems to avoid VMware licensing costs, it also seems to introduce (potentially) the highest cost of any IT shop: the people/labor required to run it. You kind of alluded to this above; is that the case?
2. Completely unrelated to that point: I’d like to see you keep this going and talk about security in both environments.

1) Yes, I agree that the initial setup cost for OpenStack can be high, so for smaller-scale use cases it makes sense to use VMware. OpenStack’s main value comes when you scale. The other big benefit is that, because it’s open source, you can have more effect on the roadmap.

Lee: you should be comparing the trifecta of vSphere (the hypervisor), vCloud Director (multi-tenancy via virtual DCs, i.e. for separate business units, not only multiple physical data centers), and vCloud Automation Center (a governance portal that sits on top of vCD to determine who has access to what resources and where the resources will live).

Thanks for the feedback. As I mentioned, these are the most asked-for features when I work with clients thinking of switching to OpenStack. We can definitely have follow-up blogs on both security features and the vCloud Director features.

I really don’t see how the complexity of OpenStack makes sense for IT organizations looking to simplify, control costs, and start to return more business value more quickly to their enterprise.

The entire point of the interest in “cloud” is that business decision makers don’t want deep engineering core competency on hand at all. The idea that “as OpenStack proliferates talent will get cheap” is predicated on some assumptions that are fallacious.

1) that OpenStack *will* proliferate – this is not at all a given
2) that if it did, the talent of maintaining such a massively complex beast of a thing would somehow commoditize. This hasn’t happened in the past with complex domains which is precisely why CIOs want to get *out of the business* of running super complex stuff that requires expensive talent. For example data specialists and automation experts never “became cheap” just because everyone needs them.

OpenStack is a nice science project for IT folks who want the corporate enterprise to look like a university lab, but in the real world I don’t see how it fits.

It makes sense for a service provider like RackSpace to build on it and sell it since they don’t have the resources to actually build a competitive public cloud platform on their own, but it just doesn’t make much sense for enterprise IT to adopt.

The design patterns discussion (the cattle vs pets thing) is accurate, but the reality is that many enterprise use cases simply don’t fit cloud design patterns.

Some might *never* fit because they just don’t line up. I do hope people realize that N-tier architecture with ephemeral tiers and minimal persistent state (which is the basis for cloud design patterns), simply does not fit every use case.

And in cases where an architectural shift *could* work, there is a ton of cost in attempting it and limited value in shifting it “just because”. Enterprises don’t run infinite scale like Netflix, Facebook, Google, Amazon and Twitter. Cloud design patterns originally emerged from the huge public players because of this need for infinite scale. Why sink millions into “modernizing” an application to fit the Facebook design pattern when it is an application whose purpose ultimately requires 24×7 availability for a small (say 50,000), and static, population of users?

Enterprises bring forward a ton of legacy and monolithic application stove pipes. They don’t directly monetize technology, but rather use technology to facilitate business value. This pile of legacy, and often cumbersome, process around delivery to ensure security and compliance is at the core of the business. Transforming it is something you don’t do casually. To compound matters most IT lags in maturity.

Calling OpenStack “a winner”, seems sort of disconnected from reality honestly. Only on blogs and comment streams is everything determined by a technical scorecard. This is why analysts frequently miss the mark and technologists who can’t see past technology often end up frustrated.

Licensing is a small cost compared to the massive burden of operational and integration complexity. Migrations can cost more than 10 years of operations savings when they go sideways and enterprises know this. IT lags in maturity and capabilities and has a hard time just keeping the lights on in most cases.

Businesses are enamored with *public cloud* because they want to get *out of the IT business*, not deeper into it.

All of these issues actually have far more impact on which direction adoption will ultimately go than a technical scorecard and an analysis of licensing costs. The OSS community always misses this reality when trying to understand why enterprises don’t just stop paying for software and services and build everything out of “free” bricks.

Keep in mind that even the mainframe is still alive and well in many enterprises and, in some cases, it actually *is* the best tool for the job to which its being applied once *all* factors have been weighed.

None of this is “cool and trendy” though, and “cool and trendy” is what tends to move tech blog traffic and certainly what is key when it comes to winning funding for startup projects.

I’d have to agree with Qflux. OpenStack is still relatively immature. However, the datacentre is becoming more commoditised. VMware wins the battle in the management arena; it’s had a large head start. Companies are not paying for hypervisors, they are paying for the ability to manage the infrastructure. The biggest costs in the DC are power and cooling, which drove the move to virtualisation. It comes down to cost. Once the open cloud starts to really drive management costs down, it will win. However, sometimes a hammer is the best tool for the job.

With regard to the increasing supply of talent, look at how much Linux talent is now available compared to when Linux started gaining momentum in the server space; this is purely a function of demand.

I completely agree that it will be difficult for enterprises to shift existing workloads to cloud-aware applications. However, given the amount of success that AWS has had with its offerings, there are many enterprises ready and willing to adapt their software to utilize these types of clouds.

As someone who has used VMware for many years and cared for many ‘pets’, I couldn’t agree more with QFlux. Use the right tool for the right job until you find a better tool. I’m sure when the day comes that all VMs become cattle, OpenStack and its teams of virtual farmers will be as happy as pigs in a data center.

Personally, I think OpenStack is at least 5 years away from “playing with the big dogs” such as VMware. OpenStack is very complicated and difficult to admin/deploy.

What about support? VMware has awesome support. Also, VMware has training programs and certifications. You won’t have a whole lot of difficulty finding a VMware professional if you need one in a bind.

I think it will be interesting to see if it’s too late for OpenStack to claim a slice of the vPie. I really love the idea of OpenStack, and I hope it gets the backing it needs to morph into an enterprise-quality, dependable solution.

I would disagree with the idea that larger scale provides higher value for OpenStack. Any time you have additional complexity in simple operations, that complexity is compounded when managing larger environments. For instance, when managing load on hosts, DRS might seem unnecessary on a couple of hosts with 50 VMs, but it proves essential when you’re managing hundreds of hosts with tens of thousands of VMs. Since “value” is not only dollars but usefulness per dollar, in my experience the simpler and more automated solution is of higher value.

Rather than comparing the two and trying to decide between doing things one way or the other, I think a lot of existing VMware customers would also like to know how to make the two work together: e.g., how can we get Storage vMotion/sDRS working in an OpenStack environment while the underlying resource pool still consists of a VMware environment? Customers might not want to move away from VMware, something they have been using for years and that has a rich set of management tools (e.g., vROps), and they probably don’t want to convert everything they have to OpenStack. Instead, they want a global layer based on OpenStack that is still capable of managing and provisioning the underlying resources, which may use one or more hypervisor technologies (Hyper-V, KVM, VMware, public cloud, etc.).

This was a fantastic article 3 years ago. The scoring method was a useful mechanism for describing your perspective on the relative value of each solution. It reminds me of a marketing assessment I saw last year: http://rogerjbest.com/nav.cfm?A=N&C=6&P=1

Interestingly, Best splits out benefits and costs. So in your use-cases/benefits category they would probably score about the same (for cloud workloads), but VMware would score much, much worse on price, although it would be interesting to assess other costs as well. Sources of cost might be purchase price, complexity, stability risk, or lifecycle costs.

Have you considered giving the two solutions a re-match to see how they stack up today? Have there been any significant changes to either company’s strategy or products or changes in industry trends that would change the scores?