Space-Efficient Sparse Virtual Disks and VMware View

09/04/2012

VMware is about to release vSphere 5.1. This new release brings many improvements to the vSphere storage architecture, but I would specifically like to discuss the changes to the virtual disk format and how they will impact VMware View implementations that use Linked Clones. Cormac Hogan wrote What’s New in VMware vSphere® 5.1 – Storage, and I am basing my discussion on his paper.

Virtual Disk Format 5.1

When a virtual machine’s operating system reads and writes to its virtual disk, it uses the same interfaces it would use for a physical disk. VMware designed the VMDK (virtual machine disk) format to mimic the operation of a physical disk. Virtual disks are stored as one or more VMDK files on the host computer or on a remote storage device, and appear to the guest operating system as standard disk drives.

“Thin provisioning of storage addressed a major storage inefficiency issue by allocating blocks of storage to a guest operating system (OS) file system or database only as they were needed, rather than at the time of creation. However, traditional thin provisioning does not address reclaiming stale or deleted data within a guest OS, leading to a gradual growth of storage allocation to a guest OS over time.

With the release of vSphere 5.1, VMware is introducing a new virtual disk type, the space-efficient sparse virtual disk (SE sparse disk). One of its major features is the ability to reclaim previously used space within the guest OS”

The previous Linked Clone format, made up of extents that begin small and grow over time, is referred to as sparse extents or VMFSSparse. The new Space-Efficient Sparse format is referred to as SESparse.
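To make the idea of a sparse extent concrete, here is a toy Python model of a linked clone delta disk. It is a sketch under simplifying assumptions, not VMware's on-disk format: the class name SparseExtent, the dictionary-based grain table and the 4KB default grain are all hypothetical. Grains are allocated only on first write, and reads of grains the clone never wrote fall through to the read-only parent image. Note that nothing in this model ever releases a grain, which is exactly the gradual growth problem described in the quote above.

```python
from typing import Dict

GRAIN_SIZE = 4096  # bytes per grain (assumed; VMFSSparse redo logs use 512)


class SparseExtent:
    """Toy model of a linked clone delta disk (hypothetical, for illustration).

    Grains are allocated on first write; reads of never-written grains fall
    through to the read-only parent (base/replica) image.
    """

    def __init__(self, parent: Dict[int, bytes], grain_size: int = GRAIN_SIZE) -> None:
        self.parent = parent                 # grain index -> data in the parent image
        self.grain_size = grain_size
        self.grains: Dict[int, bytes] = {}   # only the grains this clone has written

    def write(self, grain_index: int, data: bytes) -> None:
        if len(data) != self.grain_size:
            raise ValueError("this sketch models writes at whole-grain granularity")
        # The extent grows by one grain the first time an index is written.
        self.grains[grain_index] = data

    def read(self, grain_index: int) -> bytes:
        if grain_index in self.grains:
            return self.grains[grain_index]
        # Not written by the clone: serve from the parent, or zeros if the parent has nothing.
        return self.parent.get(grain_index, bytes(self.grain_size))

    def allocated_bytes(self) -> int:
        return len(self.grains) * self.grain_size
```

For example, a clone of a parent that holds only grain 0 reads that grain from the parent and allocates 4KB as soon as it writes grain 1; deleting the file that lives in grain 1 inside the guest would not shrink the extent.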

“Another major feature of the SE sparse disk is the ability to set a granular virtual machine disk block allocation size according to the requirements of the application. Some applications running inside a virtual machine work best with larger block allocations; some work best with smaller blocks. This was not tunable in the past.”

I commonly hear storage vendors describe the IO profile of Windows XP or Windows 7 as random, with a block size of around 4KB. What they forget is that on top of the Windows OS there are a number of applications that generate IO requests of different sizes.

In my tests, as shown in the graphs below, the average write IO size was around 11KB, while the average read IO size was around 4KB. In a VDI workload, where writes make up the majority of the IO, I would almost disregard the read IOs, especially if we account for the storage and host caching technologies available today.

Space Reclaim

“The new SE sparse disk implements a space reclaim feature to reclaim blocks that were previously used but now are unused on the guest OS. These are blocks that were previously written but currently are unaddressed in a file system/database due to file deletions, temporary files, and so on.

There are two steps involved in the space reclamation feature: The first step is the wipe operation that frees up a contiguous area of free space in the virtual machine disk (VMDK); the second step is the shrink, which unmaps or truncates that area of free space to enable the physical storage to be returned to the free pool.

Wipe

• Initiate a call to VMware Tools to scan the guest OS file system.
• Mark the unused blocks as free.
• Run the SCSI UNMAP command in the guest to instruct the virtual SCSI layer in the VMkernel to mark the blocks as free in the SE sparse disk.
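The guest side of the wipe step boils down to turning the file system's notion of free blocks into a list of ranges the guest can UNMAP. The Python sketch below shows that coalescing step only. It assumes a hypothetical in-use bitmap with 4KB file system blocks and 512-byte logical blocks; how VMware Tools actually scans the guest file system is not described in the paper and is not modeled here.

```python
from typing import List, Optional, Sequence, Tuple


def unmap_ranges(in_use: Sequence[bool], block_size: int = 4096,
                 sector_size: int = 512) -> List[Tuple[int, int]]:
    """Coalesce runs of free file system blocks into (starting LBA, sector count)
    pairs, the shape of the block descriptors carried by a SCSI UNMAP command.

    `in_use` is a hypothetical per-block allocation bitmap (True = in use).
    """
    ranges: List[Tuple[int, int]] = []
    sectors_per_block = block_size // sector_size
    run_start: Optional[int] = None
    # Append a sentinel "in use" block so a trailing free run is closed too.
    for i, used in enumerate(list(in_use) + [True]):
        if not used and run_start is None:
            run_start = i                      # a free run begins
        elif used and run_start is not None:   # a free run ends: emit one descriptor
            ranges.append((run_start * sectors_per_block,
                           (i - run_start) * sectors_per_block))
            run_start = None
    return ranges
```

With the bitmap [True, False, False, True, False], blocks 1-2 and block 4 are free, so the sketch returns [(8, 16), (32, 8)]: one descriptor per contiguous free run rather than one per block.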

The wipe operation is initiated via an API call to VMware Tools. VMware Tools initiates a scan of the guest OS to find stranded space and mark the file system blocks as free. The first SCSI UNMAP operation is then run from within the guest OS, instructing the VMkernel as to which blocks can be reclaimed. The VMkernel captures these SCSI UNMAP commands and does not pass them through to the array. When the VMkernel detects which blocks are free, it uses its virtual SCSI layer to reorganize the SE sparse disk by moving blocks from the end of the disk to unallocated blocks at its beginning. This creates a contiguous area of free space within the VMDK. The shrink operation then sends either an SCSI UNMAP command (for SCSI disks) or an RPC TRUNCATE command (for NFS) to the array to free the space.”
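The VMkernel side described above (move grains from the end of the disk into holes at its beginning, then release the now contiguous free tail) is essentially a two-pointer compaction followed by a truncate. The Python sketch below models it on a hypothetical list of physical slots, where each slot holds a virtual grain number or None if the guest unmapped it. It is a simplified illustration, not the actual SE sparse metadata layout; the freed byte count stands in for what the shrink step would hand back with SCSI UNMAP (block storage) or TRUNCATE (NFS).

```python
from typing import Dict, List, Optional, Tuple


def wipe_and_shrink(slots: List[Optional[int]], grain_size: int = 4096
                    ) -> Tuple[List[Optional[int]], Dict[int, int], int]:
    """slots[i] is the virtual grain stored in physical slot i of the disk file,
    or None if the guest UNMAPed it (hypothetical model).

    Returns the compacted slot list, the new grain table
    (virtual grain -> physical slot) and the bytes released by the shrink step.
    """
    slots = list(slots)
    head, tail = 0, len(slots) - 1
    while True:
        while head < len(slots) and slots[head] is not None:  # next hole from the front
            head += 1
        while tail >= 0 and slots[tail] is None:              # last live grain from the back
            tail -= 1
        if head >= tail:
            break
        # Wipe: relocate the grain at the end of the disk into the hole near the start.
        slots[head], slots[tail] = slots[tail], None
    live = tail + 1                                # all free space is now after this slot
    freed = (len(slots) - live) * grain_size       # shrink: release the contiguous free tail
    compacted = slots[:live]
    table = {grain: slot for slot, grain in enumerate(compacted)}
    return compacted, table, freed
```

For slots [10, None, 11, None, 12, 13], grains 13 and 12 move into the two holes, the file shrinks to four slots ([10, 13, 11, 12]) and 8192 bytes are released. The grain table update is what keeps the relocation invisible to the guest.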

“The virtual machines require Hardware Version 9 (HWv9) to handle the SCSI UNMAP command in the guest OS. Earlier versions cannot handle this and will fail the operation.”

New Grain Size

In vSphere 5.0, the default grain size/block allocation unit size for virtual machine disks on ESXi was 4KB. Redo logs, used by snapshots and linked clones, had a grain size of 512 bytes (one sector).

As mentioned previously, with the introduction of SE sparse disks, the grain size is now tunable and can be set based on the requirements of a particular storage array or application.

In the initial release of SE sparse disks in vSphere 5.1, the default grain size is set to 4KB. Specific VMware products and features that use the new SE sparse disk format for redo logs/linked clones will also use this new default grain size.

NOTE: Direct user tuning of the grain size is not exposed in vSphere 5.1.”

I don’t yet have a specific recommendation for the grain size/block allocation unit. As Cormac Hogan highlights in his paper, the grain size is not user-settable in vSphere 5.1. If VMware exposes this setting, it would perhaps make sense to tune it for write IOs; in my case, that would probably mean a size around 11KB.
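To see why the grain size matters for an average write of around 11KB, here is a small Python sketch of the arithmetic. It counts how many grains a single write lands in and how many bytes a fresh sparse extent would allocate for it. The 512-byte and 4KB values come from the paper; the 16KB value is purely hypothetical, since the grain size is not tunable in vSphere 5.1, and the model ignores grain table metadata.

```python
KB = 1024


def grains_touched(offset: int, length: int, grain_size: int) -> int:
    """Number of grains a write of `length` bytes starting at byte `offset` lands in."""
    first = offset // grain_size
    last = (offset + length - 1) // grain_size
    return last - first + 1


def allocation_for_write(length: int, grain_size: int, offset: int = 0) -> int:
    """Bytes a fresh sparse extent allocates for one write (whole grains only)."""
    return grains_touched(offset, length, grain_size) * grain_size


if __name__ == "__main__":
    write = 11 * KB  # roughly the average write IO size from my tests
    for grain in (512, 4 * KB, 16 * KB):  # 16KB is a hypothetical tunable value
        print(grain, grains_touched(0, write, grain), allocation_for_write(write, grain))
```

An 11KB write at offset 0 touches 22 grains of 512 bytes, 3 grains of 4KB (12KB allocated) or a single 16KB grain. Larger grains mean fewer grain table entries per write but more over-allocation, and a misaligned start makes it worse: the same write at offset 8KB spans two 16KB grains, allocating 32KB.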

SE Sparse Disk Initial Use Case

“The scope of SE sparse disks in vSphere 5.1 is restricted to VMware View. VMware® View™ Composer can use linked clones for the rollout of desktops. Linked clones are read/write snapshots of a read-only parent desktop image. View benefits from the new 4KB grain size, which improves performance by addressing alignment issues experienced in some storage arrays with the 512-byte grain size used in linked clones based on the vmfsSparse (redo log) format. The SE sparse disk format also provides far better space efficiency to desktops deployed on this virtual disk format, especially with its ability to reclaim stranded space.

When deploying desktops with View Composer, a number of images/snapshots are created. In the vSphere 5.1, SE sparse disk format can be used by the View Composer for linked clones and subsequent snapshots of them. These images represent the majority of storage used in a View environment.”
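The alignment issue with 512-byte grains can be illustrated with a simplified Python model. It assumes, hypothetically, that the delta disk appends newly allocated grains in write order starting at offset 0, and that the array manages storage in 4KB pages; real redo log layout and array page sizes vary. Once a single small write shifts the allocation point by 512 bytes, every following 4KB guest write straddles two array pages, which on some arrays can force extra read-modify-write IO.

```python
from typing import List

ARRAY_PAGE = 4096  # assumed back-end page size of the storage array


def array_pages_touched(file_offset: int, length: int, page: int = ARRAY_PAGE) -> int:
    """Number of array pages a write of `length` bytes at `file_offset` spans."""
    return (file_offset + length - 1) // page - file_offset // page + 1


def backend_offsets(write_sizes: List[int], grain_size: int, data_start: int = 0) -> List[int]:
    """Hypothetical layout: grains are appended in write order from `data_start`.
    Returns the file offset at which each guest write lands."""
    offsets: List[int] = []
    next_free = data_start
    for size in write_sizes:
        offsets.append(next_free)
        grains = -(-size // grain_size)  # ceiling division: whole grains only
        next_free += grains * grain_size
    return offsets
```

For the guest writes [512, 4096, 4096], 512-byte grains place the two 4KB writes at offsets 512 and 4608, each spanning two array pages; 4KB grains place them at 4096 and 8192, one page each.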

For VMware View deployments that use Linked Clones with Non-Persistent desktops, which are refreshed or recomposed periodically or after user logoff, the storage savings provided by the new SESparse disk format may seem less compelling, since those desktops are rebuilt anyway. However, the new SESparse format will allow administrators to create Persistent Pools of desktops using Linked Clones, keeping the ability to perform linked clone operations while remaining space efficient.

The wipe/shrink operation, together with the 4KB grain size, is also critical for eliminating the issue of unaligned blocks caused by linked clone growth. Ultimately, that means fewer IO requests will hit the storage subsystem during normal workload operations.

prezha

Andre, thank you for this great overview.
Please just clarify the “Storage vMotion” part: “SESparse disks are converted to VMFSSparse format if the destination host is not running vSphere 5.1 or later” – what do you mean by “destination host” in Storage vMotion – is it the host from which Storage vMotion was initiated, or?
Thank you
BR prezha

prezha

Andre, thank you for the answer.
Unless I got something wrong, in its simplest form Storage vMotion (not plain vMotion, nor a combination of both) relates only to moving VM files from one LUN/storage to another while the VM runs (or is powered off) on one and the same host the whole time. Taking this as a premise, I see only one host involved in a Storage vMotion, not two, and that is why I asked how the “destination host” relates to Storage vMotion in my question.
Please correct me if I am wrong.
Thank you in advance
BR prezha

Storage vMotion is the action of moving a VM’s files to a different LUN or datastore while the VM is in production (powered on). The operation can be executed on a single host, or combined with moving the VM to a different host.
