Tuesday, November 12, 2013

In a virtualized environment, storage operations have traditionally been expensive from a resource perspective. Functions such as cloning and snapshots can be performed more efficiently by the storage device than by the host.

VMware vSphere® Storage APIs – Array Integration (VAAI), also referred to as hardware acceleration or hardware offload APIs, are a set of APIs that enable communication between VMware vSphere ESXi™ hosts and storage devices. The APIs define a set of “storage primitives” that enable the ESXi host to offload certain storage operations to the array. This reduces resource overhead on the ESXi hosts and can significantly improve performance for storage-intensive operations such as cloning and zeroing. The goal of VAAI is to help storage vendors provide hardware assistance to speed up VMware® I/O operations that are more efficiently accomplished in the storage hardware.

Without VAAI, cloning or migration of virtual machines by the vSphere VMkernel Data Mover involves software data movement: the Data Mover issues I/O to read and write blocks to and from the source and destination datastores. With VAAI, the Data Mover can use the API primitives to offload operations to the array if possible. For example, if the desired operation were to copy a virtual machine disk (VMDK) file from one datastore to another inside the same array, the array would be directed to make the copy entirely inside the array. Whenever a data movement operation is invoked and the corresponding hardware offload operation is enabled, the Data Mover first attempts to use the hardware offload. If the hardware offload operation fails, the Data Mover reverts to the traditional software method of data movement. In nearly all cases, hardware data movement performs significantly better than software data movement, consuming fewer CPU cycles and less bandwidth on the storage fabric.
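The offload-then-fallback behavior of the Data Mover can be illustrated with a toy model. This is a simplified sketch, not VMware code: the `Datastore` and `DataMover` classes, the `OffloadFailed` exception, and the rule that the simulated array can copy only between datastores it owns are all hypothetical. The model counts the bytes that pass through the host so the difference between the two paths is visible.

```python
from dataclasses import dataclass, field
from typing import List


class OffloadFailed(Exception):
    """Raised when the simulated array cannot perform a requested offload."""


@dataclass
class Datastore:
    array_id: str                                       # which simulated array backs it
    blocks: List[bytes] = field(default_factory=list)   # contents of the VMDK


@dataclass
class DataMover:
    offload_enabled: bool = True   # stands in for the host's hardware offload setting
    host_bytes: int = 0            # bytes that had to travel through the host

    def _hardware_copy(self, src: Datastore, dst: Datastore) -> None:
        # The simulated array can copy only between datastores it owns.
        if src.array_id != dst.array_id:
            raise OffloadFailed("source and destination are on different arrays")
        dst.blocks = list(src.blocks)  # the copy stays inside the array

    def _software_copy(self, src: Datastore, dst: Datastore) -> None:
        # Traditional path: the host reads every block and writes it back out.
        dst.blocks = []
        for block in src.blocks:
            self.host_bytes += 2 * len(block)  # one read plus one write
            dst.blocks.append(block)

    def copy(self, src: Datastore, dst: Datastore) -> str:
        """Try the hardware offload first; revert to software if it fails."""
        if self.offload_enabled:
            try:
                self._hardware_copy(src, dst)
                return "hardware"
            except OffloadFailed:
                pass  # fall through to the software data mover
        self._software_copy(src, dst)
        return "software"


if __name__ == "__main__":
    vmdk = [b"\x00" * 512] * 4
    mover = DataMover()
    same_array = mover.copy(Datastore("A", vmdk), Datastore("A"))
    cross_array = mover.copy(Datastore("A", vmdk), Datastore("B"))
    print(same_array, cross_array, mover.host_bytes)  # hardware software 4096
```

The same-array copy never touches `host_bytes`, while the cross-array copy falls back to software and moves every block through the host twice.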
Improvements in performance can be observed by timing operations that use the VAAI primitives and by using esxtop to track storage adapter values such as CMDS/s, READS/s, WRITES/s, MBREAD/s, and MBWRTN/s during the operation.

In the initial VMware vSphere 4.1 implementation, three VAAI primitives were released. These primitives applied only to block (Fibre Channel, iSCSI, FCoE) storage; there were no VAAI primitives for NAS storage in this initial release. In vSphere 5.0, VAAI primitives for NAS storage and VMware vSphere Thin Provisioning were introduced.

VAAI Block Primitives

In VMware vSphere VMFS, many operations must establish a lock on the volume when updating a resource. Because VMFS is a clustered file system, many ESXi hosts can share the volume. When one host must make an update to the VMFS metadata, a locking mechanism is required to maintain file system integrity and prevent another host from coming in and updating the same metadata. The following operations require this lock:

1. Acquire on-disk locks
2. Upgrade an optimistic lock to an exclusive/physical lock
3. Unlock a read-only/multiwriter lock
4. Acquire a heartbeat
5. Clear a heartbeat
6. Replay a heartbeat
7. Reclaim a heartbeat
8. Acquire an on-disk lock with a dead owner

It is not essential to understand all of these operations in the context of this whitepaper. It is sufficient to understand that various VMFS metadata operations require a lock.

Atomic Test & Set (ATS) is an enhanced locking mechanism designed to replace the use of SCSI reservations on VMFS volumes when doing metadata updates. A SCSI reservation locks a whole LUN and prevents other hosts from doing metadata updates of a VMFS volume when one host sharing the volume has a lock. This can lead to various contention issues when many virtual machines are using the same datastore, and it is a limiting factor for scaling to very large VMFS volumes. ATS is a lock mechanism that needs to modify only a single disk sector on the VMFS volume. When successful, it enables an ESXi host to perform a metadata update on the volume. This includes allocating space to a VMDK during provisioning, because certain characteristics must be updated in the metadata to reflect the new size of the file. The introduction of ATS addresses the contention issues with SCSI reservations and enables VMFS volumes to scale to much larger sizes.
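To make the difference in lock granularity concrete, here is a toy model, assuming a LUN is just a list of sectors and that each on-disk lock record lives in one sector. `Lun`, `reserve`, `release` and `compare_and_write` are hypothetical names, and the `threading.Lock` only stands in for the array guaranteeing that the compare and the write happen atomically. The two locking styles are modeled independently.

```python
import threading
from typing import Optional


class Lun:
    """Toy LUN: a list of sectors plus the two locking styles discussed above."""

    def __init__(self, sectors: int) -> None:
        self.sectors = [b"free"] * sectors       # each sector holds one lock record
        self._atomic = threading.Lock()          # stands in for the array's atomicity
        self.reserved_by: Optional[str] = None   # host holding a SCSI reservation

    # SCSI reservation style: one host locks the entire LUN.
    def reserve(self, host: str) -> bool:
        with self._atomic:
            if self.reserved_by not in (None, host):
                return False                     # LUN already reserved by another host
            self.reserved_by = host
            return True

    def release(self, host: str) -> None:
        with self._atomic:
            if self.reserved_by == host:
                self.reserved_by = None

    # ATS style: atomically compare one sector and write it only on a match.
    def compare_and_write(self, lba: int, expected: bytes, new: bytes) -> bool:
        with self._atomic:
            if self.sectors[lba] != expected:
                return False                     # miscompare: lock held by someone else
            self.sectors[lba] = new
            return True


if __name__ == "__main__":
    lun = Lun(sectors=8)
    print(lun.reserve("esx1"), lun.reserve("esx2"))       # True False
    lun.release("esx1")
    print(lun.compare_and_write(1, b"free", b"esx1"),     # True
          lun.compare_and_write(5, b"free", b"esx2"),     # True: different sector
          lun.compare_and_write(1, b"free", b"esx2"))     # False: same sector
```

With the reservation, esx2 is turned away even if it only wanted a lock stored in a different sector; with ATS, two hosts contend only when they target the same lock sector.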

In vSphere 4.0, VMFS3 used SCSI reservations for establishing the lock, because there was no VAAI support in that release. In vSphere 4.1, on a VAAI-enabled array, VMFS3 used ATS only for operations 1 and 2 listed previously, and only when there was no contention for disk lock acquisitions. VMFS3 reverted to using SCSI reservations if there was a multihost collision when acquiring an on-disk lock using ATS.
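The vSphere 4.0 and 4.1 behavior described above amounts to a small decision table. The following sketch encodes it; the `LockOp` enum and the `vmfs3_lock_method` function are hypothetical, and the boolean `collision` flag is a simplification standing in for a multihost collision during an ATS lock attempt.

```python
from enum import IntEnum


class LockOp(IntEnum):
    """The eight VMFS metadata lock operations, numbered as in the list above."""
    ACQUIRE_ON_DISK_LOCK = 1
    UPGRADE_OPTIMISTIC_LOCK = 2
    UNLOCK_READ_ONLY_MULTIWRITER = 3
    ACQUIRE_HEARTBEAT = 4
    CLEAR_HEARTBEAT = 5
    REPLAY_HEARTBEAT = 6
    RECLAIM_HEARTBEAT = 7
    ACQUIRE_LOCK_DEAD_OWNER = 8


def vmfs3_lock_method(release: str, vaai_array: bool, op: LockOp, collision: bool) -> str:
    """Return the locking mechanism VMFS3 ends up using for one operation."""
    if release == "4.0" or not vaai_array:
        return "SCSI reservation"  # no VAAI support in 4.0, or a non-VAAI array
    if release == "4.1":
        ats_eligible = op in (LockOp.ACQUIRE_ON_DISK_LOCK, LockOp.UPGRADE_OPTIMISTIC_LOCK)
        if ats_eligible and not collision:
            return "ATS"
        return "SCSI reservation"  # other operations, or ATS hit a multihost collision
    raise ValueError(f"release not covered by this sketch: {release}")


if __name__ == "__main__":
    print(vmfs3_lock_method("4.1", True, LockOp.ACQUIRE_ON_DISK_LOCK, collision=False))
```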

In the initial VAAI release, the ATS primitives had to be implemented differently on each storage array, requiring a different ATS opcode depending on the vendor. ATS is now a standard T10 SCSI command and uses opcode 0x89 (COMPARE AND WRITE).
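For readers curious what the standardized command looks like, the sketch below packs a 16-byte COMPARE AND WRITE command descriptor block (CDB). The field layout follows our reading of the T10 SBC-3 specification (opcode in byte 0, a 64-bit big-endian LBA in bytes 2 to 9, the block count in byte 13) and should be checked against the specification before being relied on. The helper names are hypothetical, the flag, group number, and control bytes are left at zero, and the 512-byte block size is an assumption.

```python
import struct

COMPARE_AND_WRITE = 0x89  # T10 opcode mentioned in the text
BLOCK_SIZE = 512          # assumed logical block size for this sketch


def compare_and_write_cdb(lba: int, num_blocks: int = 1) -> bytes:
    """Pack a 16-byte COMPARE AND WRITE CDB (layout per our reading of SBC-3)."""
    if not 0 <= lba < 2**64:
        raise ValueError("LBA must fit in 64 bits")
    if not 1 <= num_blocks <= 0xFF:
        raise ValueError("NUMBER OF LOGICAL BLOCKS is a single byte")
    return struct.pack(
        ">BBQ3xBBB",
        COMPARE_AND_WRITE,  # byte 0: operation code
        0,                  # byte 1: WRPROTECT/DPO/FUA flags, all clear
        lba,                # bytes 2-9: logical block address, big-endian
        # bytes 10-12: reserved, emitted as zero padding by "3x"
        num_blocks,         # byte 13: number of logical blocks
        0,                  # byte 14: group number, left at 0
        0,                  # byte 15: control byte
    )


def data_out_buffer(verify: bytes, write: bytes) -> bytes:
    """Build the data-out payload: verify data first, then the data to write."""
    if not verify or len(verify) != len(write) or len(verify) % BLOCK_SIZE:
        raise ValueError("verify and write data must be equal, non-empty whole blocks")
    return verify + write


if __name__ == "__main__":
    cdb = compare_and_write_cdb(lba=0x1000)
    print(cdb.hex())  # 89000000000000001000000000010000
```

The array writes the second half of the buffer only if the on-disk blocks match the first half; otherwise the command fails with a miscompare. That single-sector compare-then-write is what lets ATS replace a LUN-wide reservation.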

For VMFS5 datastores formatted on a VAAI-enabled array, all the critical section functionality from operations 1 to 8 is done using ATS. There should no longer be any SCSI reservations on VAAI-enabled VMFS5. ATS continues to be used even if there is contention. On non-VAAI arrays, SCSI reservations continue to be used for