What is Software Defined Storage? A VMware TMM Perspective

I’m posting this from a train which is currently hurtling its way across the middle of Ireland. I’m on my way to meet our friends at NetApp, whose Insight conference takes place in Dublin this week. We’ll be catching up later to talk about many of the storage previews and visions announced at VMworld 2012. Most of you will know by now that the vast majority of my posts are technical in nature. In this post I will be taking a slightly different slant and trying to explain one of the new concepts VMware has around storage. Some of you who have been following the announcements at VMworld will have heard the references to the software defined datacenter. An integral part of this vision is software defined storage. So what exactly is that? I wanted to use this post to share some of what we at VMware envision software defined storage to be.

Many of us involved in storage provisioning and management in vSphere environments will be familiar with the concepts of determining virtual machine and application storage requirements, and passing these off to our storage administrator to create appropriate LUNs or Volumes on the storage array to meet these needs. Indeed, many of us do this ourselves, as well as managing the vSphere environment. A datastore is then built on this LUN in the case of VMFS, and our VM (along with many other VMs) would then typically share this datastore. In the case of NFS, many VMs are provisioned on the same mountpoint.

But what if the virtual machine requirements change over time? Maybe it now needs snapshot capability or it needs to be replicated. Maybe the application now needs more IOPS than was previously envisioned. Then the task of creating an appropriate datastore has to be repeated, and the virtual machine will need to be migrated to the new datastore. While vSphere has many features to help with this management, it is still a tedious process.

What if you decide you’d like to use deduplication, or perhaps flash as a cache for your VM? These are all capabilities which are defined at the storage array level (although we have seen them provided by appliances, and we continue to see new storage appliances do so). But generally speaking, we are relying on the array to provide these features for our VM or application.

What if we could call out these requirements when we are creating our VM instead of having to create them in advance of VM deployment? What if we could get vSphere to inform the appliance or array about these requirements, and tell it to create Virtual Machine storage to meet these requirements? One part of software defined storage is the ability to define a profile on vSphere containing the storage capabilities that we require for our virtual machine or application, and push those requirements out to the storage layer when we deploy a VM, or indeed change the I/O requirements of a VM.

So what sort of capabilities are we talking about? Well, some of the things which spring to mind are:

Application IOPS requirements

Application latency requirements

Application Availability

Snapshot for backup

Replication for DR

Deduplication/Compression

Cache requirements

Another part of software defined storage is around these storage capabilities. What if we could offer a choice of storage services, such as cache, dedupe, etc.? And what if these services were available at various parts of the I/O path, such as the host, storage array and interconnect/fabric (or an appliance sitting in the path)? The ability to give customers a choice of capabilities, and of where to use them in the I/O path, is also part of the software defined storage vision.

Basically, we see customers building a storage profile containing the requirements for the virtual machine and the applications running in the VM. Instead of having the VM storage objects prepared for us by a storage administrator, we request that the VMDK be instantiated based on the requirements defined in the profile. So if a high number of IOPS was specified in the profile, the VMDK would be provisioned across enough spindles (and/or with enough cache) to meet that requirement. If availability was specified as a requirement, then mirror copies of the VMDK would be created to meet this requirement. You get the idea.
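To make the idea a little more concrete, here is a minimal Python sketch of a profile being turned into a provisioning plan. Everything in it is hypothetical and for illustration only: the `StorageProfile` class, the `plan_vmdk` function and the figure of 150 IOPS per spindle are assumptions of mine, not part of any VMware product or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageProfile:
    """Hypothetical profile holding a VM's storage requirements."""
    name: str
    iops: int = 0            # IOPS the application needs
    mirror_copies: int = 0   # extra copies requested for availability


def plan_vmdk(profile, iops_per_spindle=150):
    """Translate a profile into a layout for the VMDK.

    iops_per_spindle is an assumed figure, used only for illustration.
    """
    spindles = max(1, math.ceil(profile.iops / iops_per_spindle))
    return {"spindles": spindles, "copies": 1 + profile.mirror_copies}


gold = StorageProfile("gold", iops=1000, mirror_copies=1)
plan = plan_vmdk(gold)
print(plan)
```

The point is simply that the requirements travel with the VM, and the storage layer works out the layout, rather than an administrator carving out a LUN in advance.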

Ok, that sounds good, but what features does VMware have to allow us to do this?

So we’ve been working towards this for some time. A number of services already exist in vSphere 5.0 which might have given you an inkling as to where we are going. We introduced VASA, the vSphere APIs for Storage Awareness, which allows the storage array to send us details about the datastores. These capabilities are then surfaced up in vCenter. VMware also introduced VM Storage Profiles, which allowed us to use the datastore capabilities surfaced by VASA (or indeed our own user defined capabilities) to build profiles which could be selected at provisioning time. This would then sort all of your available destination datastores into compatible and incompatible groups, depending on whether or not the datastore had capabilities matching those defined in the profile.
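The compatible/incompatible split can be pictured with a short Python sketch. This is not the vSphere implementation: the `classify_datastores` function, the datastore names and the capability strings are all made up, and the matching rule (a datastore is compatible only if it offers every capability in the profile) is an assumption for the sake of the example.

```python
def classify_datastores(required, datastores):
    """Split datastores into (compatible, incompatible) name lists.

    required   -- set of capability names taken from the profile
    datastores -- dict mapping datastore name to its set of capabilities
    """
    compatible, incompatible = [], []
    for name, capabilities in datastores.items():
        # Assumed rule: every required capability must be present.
        if required <= capabilities:
            compatible.append(name)
        else:
            incompatible.append(name)
    return compatible, incompatible


# Hypothetical inventory, for illustration only.
datastores = {
    "ds-gold": {"replication", "ssd"},
    "ds-silver": {"replication"},
    "ds-bronze": set(),
}
compatible, incompatible = classify_datastores({"replication", "ssd"}, datastores)
print(compatible, incompatible)
```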

Now, there were a few limitations with the initial release. First off, VASA was a one-way protocol: it could send information up to vSphere, but vSphere could not push anything down (fairly crucial if we want to use profiles to define storage requirements). Secondly, and those of you who used VASA will attest to this, VASA could only show one capability per datastore. This was limiting in many ways, and wouldn’t be very useful in a software defined storage model. To address these limitations, a new version of VASA (2.0) and Storage Policy Based Management (SPBM) are in the works. These are going to enable two-way VASA communication, as well as allow numerous capabilities from the same storage object to be surfaced in vCenter.

What features or products is VMware developing to use this, and make software defined storage a reality? Again, for those of you who were at VMworld 2012, there were quite a few tech previews given around our software defined storage vision. Remember that although we tech previewed these features at VMworld 2012, there is absolutely no guarantee that we will ever ship these products. Nor can we share any guidelines around availability, packaging or pricing. So please don’t ask.

First up is Virtual Volumes, aka vVOLs. This has been in development for some time now, and is something we are collaborating with our storage partners on. The objective here is to create your policy requirements for your virtual machine or application based on the capabilities of the array, select the policy during VM provisioning and push it down to the storage array for instantiation. The array will read the policy requirements, and a VMDK will be created as a storage object on the array, with all the capabilities in the profile. VMDKs are now first class storage objects on the array. No more carving LUNs and Volumes. No more debating whether to use NAS or block storage: so long as the array meets the requirements in the policy, it shouldn’t matter which underlying protocol is used. The other nice thing is that vVOLs uses the concept of a protocol endpoint. This means that only one device needs to be configured for all your VMDKs, and whatever path policy or redundancy is associated with the protocol endpoint communication channel is inherited by your vVOLs (VMDKs). No more ensuring that your LUN is presented down all paths to all ESXi hosts in a consistent manner. How nice is that? You can read more about vVOLs here.

Another tech preview shown at VMworld 2012 was Distributed Storage. This is a project where the local storage across a cluster of ESXi hosts is used to create a distributed datastore. The hosts participating in the cluster can then use this distributed datastore for VM deployments, and it is all managed from vSphere. Once again, profiles play a major role. Depending on the amount of storage, and the characteristics of the storage (SSD, HDD), the datastore capabilities will be surfaced up to the vSphere layer. Storage Profiles containing a subset of the capabilities (availability, IOPS, etc) are then created, and an appropriate profile is chosen depending on the requirements of the VM (and application) being deployed. Again, this makes extensive use of VASA & SPBM. You can see more about Distributed Storage here.

One final product which was tech previewed at VMworld 2012 was vFlash. This project is to enable flash (SSD or PCIe) to be consumed as a resource by Virtual Machines. Two mechanisms are envisaged. The first is for VMs to use the flash transparently, i.e. it is a cache in their I/O path; they are not aware of it, although they benefit from the improved performance. The second mechanism is to make the cache VM-aware. This method would allocate chunks of a flash resource directly to the VM as a virtual disk, which would then appear as a drive in the Guest OS. This flash drive could then be used by specific parts of an application that need I/O performance. We would ship vFlash with some default cache algorithms, but we are planning to have an open API, so if flash vendors wish to plug in their own caching algorithms, they can certainly do that too. With this feature, one can choose to have flash at the host level instead of at the array level. This is all about giving you a choice of storage services.
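As a rough illustration of what a transparent cache with pluggable algorithms might look like, here is a small Python sketch. It is not the vFlash API, which has not been published: the `FlashCache` and `LRUPolicy` classes, and the `touch`/`evict` interface between them, are names I have invented. A default least-recently-used policy is supplied, and a vendor could pass in any object offering the same two methods.

```python
from collections import OrderedDict


class LRUPolicy:
    """Default policy: evict the least recently used block."""

    def __init__(self):
        self._order = OrderedDict()

    def touch(self, block):
        # Record an access, making this block the most recently used.
        self._order[block] = None
        self._order.move_to_end(block)

    def evict(self):
        # Remove and return the least recently used block.
        block, _ = self._order.popitem(last=False)
        return block


class FlashCache:
    """Read cache sitting in the I/O path; the caller never sees it."""

    def __init__(self, capacity, policy=None):
        self.capacity = capacity
        # Any object with touch() and evict() can replace the default.
        self.policy = policy if policy is not None else LRUPolicy()
        self.data = {}

    def read(self, block, backend):
        """Return (value, hit), fetching from backend on a miss."""
        if block in self.data:
            self.policy.touch(block)
            return self.data[block], True
        value = backend(block)
        if len(self.data) >= self.capacity:
            del self.data[self.policy.evict()]
        self.data[block] = value
        self.policy.touch(block)
        return value, False


cache = FlashCache(capacity=2)
hits = [cache.read(b, lambda blk: f"data-{blk}")[1] for b in (1, 2, 1, 3, 2)]
print(hits)
```

Here the backend function stands in for the slower disk or array behind the cache. The second, VM-aware mechanism is different in kind (the flash is handed to the guest as a disk) and is not modelled in this sketch.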

I hope this post has given you some idea about where we are coming from, and more importantly, where we are going with our Software Defined Storage vision at VMware.

Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage

About Cormac Hogan

Cormac Hogan is a senior technical marketing architect within the Cloud Infrastructure Product Marketing group at VMware. He is responsible for storage in general, with a focus on core VMware vSphere storage technologies and virtual storage, including the VMware vSphere® Storage Appliance. He has been at VMware since 2005 and in technical marketing since 2011.

Nice post Cormac. I really like the direction VMware is going in making VMDKs first-class citizens; I think it will simplify a lot of processes.

You mention putting different services (dedupe, flash) at different parts of the I/O path; how disruptive do you see this being for a traditional storage design? Or maybe it’s not disruptive at all, and the decision about which service to use, at the host, fabric, or array, is driven by the application requirement specified during provisioning?

This is exactly the place we want to get to, Josh: selection from a choice of storage services when deploying a VM. I don’t see this being too disruptive, as vVOLs is already being embraced. Imagine having multiple different options for flash, dedupe, replication, etc. available to you as services, and then choosing to include or omit these services depending on a VM’s requirements. How cool would that be?

This is too VMware-centric a view of everything. It defeats the purpose of software defined storage, if there is any such thing as software defined storage at all. In software defined networking, the idea is to have application level control over the network. If the same analogy is applied to software defined storage, then the hypervisor should be completely out of the picture as far as the “software defined” part is concerned. Unfortunately everyone is using buzzwords like this to promote their own products. In my opinion, for true software defined storage, all dependency on the hypervisor should be removed and the hypervisor should be completely transparent.

Thanks for the comment. OK, it’s a VMware blog post written by a VMware employee, and the title says it’s my view, so guilty as charged with the VMware-centric view.

But yes, I do agree that this is all about applications specifying their storage requirement, but what applications can do that today? Until that mechanism is available, the only way to do it is via the virtual machine settings – that is how we are implementing it. But who knows what we will do in the future.

Cormac, what is your opinion on EMC ViPR as a way to address Chandra’s comments? ViPR is a wide open platform with open northbound and southbound APIs. It supports disparate storage technologies and will support DAS in the near future as well. In addition, it is projected to offer a rich set of data services.