Edward L. Haletky covers VMware ESX Server's data store performance or bandwidth issues, SCSI-2 reservation issues, performance-gathering agents, and then finishes with some other issues and a discussion of the impact of Sarbanes-Oxley.

ESX creates a myriad of problems for administrators, specifically problems having to do with the scheduling of various operations
around the use of normal tools and other everyday activities such as deployments, VMotion to balance nodes, and backups. Most,
if not all, of the limitations revolve around performance gathering and the data stores upon which VMs are placed, whether
those data stores are SCSI (including iSCSI) or hold non-VMDK files accessed over NFS shared off a NAS or some other system.

The performance-gathering issues dictate which tools to use to gather performance data and how to use the tools that gather
this data. A certain level of understanding is required to interpret the results, and this knowledge will assist in balancing
the VMs across multiple ESX Servers.

The data store limitations consist of bandwidth issues (each data store has a limited pipe between the ESX Server and the
remote storage) and reservation or locking issues. These two issues dictate, to a large degree, how ESX should be managed.
As discussed in Chapter 5, “Storage with ESX,” SCSI reservations occur whenever the metadata of the VMFS is changed, and the
reservation happens for the whole LUN and not an extent of the LUN. This also dictates the layout of VMFS on each LUN;
specifically, a VMFS should take up a whole LUN and not a part of the LUN.

This chapter covers data store performance or bandwidth issues, SCSI-2 reservation issues, and performance-gathering agents,
and then finishes with some other issues and a discussion of the impact of Sarbanes-Oxley. Note that some of the solutions
discussed within this chapter are utopian and not easy to implement within large-scale ESX environments. These are documented
for completeness and to give information that will aid in debugging these common problems.

Data Store Performance or Bandwidth Issues

Because bandwidth is an issue, it is important to make sure that all your data stores have as much bandwidth as possible and
to use this bandwidth sparingly for each data store. Normal operational behavior of a VM often includes full-disk virus
scans, backups, spyware scans, and other extremely disk-intensive activities. Although none of these activities requires any
form of locking of the data store on which the VMDK resides, they all consume a serious amount of bandwidth. The bandwidth
requirements for a single VM are not very large compared to those of an ESX Server running many VMs. Staggering the
activities in time will greatly reduce the strain on the storage environment, but remember that staggering across ESX Servers
is a good idea as long as different data stores are in use on each ESX Server. For example, backing up VMs that reside on
the same LUN but on different ESX Servers at the same time would cause locking issues. This should be avoided. However, virus
scans run from multiple VMs on the same LUN from multiple ESX Servers will not cause many issues, because operations on the
VMDK do not cause locks at the LUN level. It is possible for disk-intensive tools running within a VM to cause results similar
to those that occur with SCSI reservations, even though no reservations are involved. Instead, these are load issues: the SAN
or NAS is overworked and therefore presents failures similar to those seen with SCSI-2 reservations.

Best Practice for Internal VM Operations

Stagger all disk-intensive operations internal to the VM across both time and ESX hosts to reduce strain on the storage network.
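
As a rough illustration of the staggering described above, the following Python sketch assigns each VM a start time so that
no two VMs sharing a data store, and no two VMs sharing an ESX host, run their disk-intensive job in the same time slot. It
is a planning aid only and does not call any VMware API. The function name stagger_jobs, the (vm, host, datastore) tuples,
and the fixed slot length are all assumptions made for this example; real jobs vary in duration, so the slot should be at
least as long as the longest job.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def stagger_jobs(vms, window_start, slot):
    """Assign a start time to each VM's disk-intensive job.

    vms          -- iterable of (vm_name, esx_host, datastore) tuples (hypothetical inventory)
    window_start -- datetime at which the maintenance window opens
    slot         -- timedelta reserved for one job (an assumed fixed length)

    Greedy rule: give each VM the earliest slot not already taken by
    another VM on the same data store or on the same ESX host.
    """
    slots_by_datastore = defaultdict(set)
    slots_by_host = defaultdict(set)
    schedule = {}
    for name, host, datastore in vms:
        index = 0
        while index in slots_by_datastore[datastore] or index in slots_by_host[host]:
            index += 1
        slots_by_datastore[datastore].add(index)
        slots_by_host[host].add(index)
        schedule[name] = window_start + index * slot
    return schedule


if __name__ == "__main__":
    # Hypothetical inventory: two ESX hosts sharing two LUN-backed data stores.
    inventory = [
        ("vm1", "esx1", "lun1"),
        ("vm2", "esx2", "lun1"),
        ("vm3", "esx2", "lun2"),
        ("vm4", "esx1", "lun2"),
    ]
    plan = stagger_jobs(inventory, datetime(2024, 1, 1, 1, 0), timedelta(hours=1))
    for vm, start in sorted(plan.items()):
        print(vm, start.isoformat())
```

Serializing by data store is the stricter of the two rules, because a data store shared by several ESX hosts is still a
single pipe; for backups in particular, that rule also keeps VMs on the same LUN from being backed up at the same time from
different hosts.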