ESXi/ESX hosts with visibility to RDM LUNs used by MSCS nodes may take a long time to start or to complete a LUN rescan (1016106)

Symptoms

ESXi/ESX 4.x and ESXi 5.x hosts take a long time to start. This time depends on the number of RDMs that are attached to the ESXi/ESX host.

Note: In a system with 10 RDMs used in a two-node MSCS cluster, restarting the ESXi/ESX host that runs the secondary node takes approximately 30 minutes. In a system with fewer RDMs, the restart time is shorter. For example, if only three RDMs are used, the restart time is approximately 10 minutes.

ESXi intermittently shows an error message on the Summary Tab and vSphere Client may not be able to start:

Cannot synchronize host hostname. Operation Timed out.

The logging screen shows the start process waiting after this message:

Loading module multiextent.

The cluster is running virtual machines participating in an MSCS using shared RDMs and SCSI Reservations across hosts, and a virtual machine on another host is the active cluster node holding a SCSI Reservation.

Delay appears at these steps:

Starting path claiming and SCSI device discovery

In the vmkernel.log file of the restarting ESXi host (to check the log file depending on the version of ESXi, see the note below), you see entries similar to:

If you configure the setting on an existing VMFS LUN, you may see these entries in the vmkernel.log file:

cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.

On ESXi/ESX 4.1, if the rescan times are still extended, the best option to resolve the issue is to upgrade the host to ESXi 5.0, which includes both of the ESXi/ESX 4.x fixes (that is, the patch released on 2011-07-28 and changing the advanced option Scsi.CRTimeoutDuringBoot to 1).

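Before configuring the perennially-reserved setting on an existing LUN, check whether the LUN is mounted as a VMFS LUN, because marking a device that holds a VMFS partition as perennially reserved is not supported. To view the existing VMFS mappings for the device, run the command:

esxcfg-scsidevs -m | grep naa.XXXXXXXXXXXXXXXXXXX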
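The check above can be wrapped in a small helper so that it gives a yes/no answer before the flag is touched. This is only a sketch: is_vmfs_device is a hypothetical helper name, DEVICE is a placeholder for the real naa ID, and the helper assumes that the first column of the esxcfg-scsidevs -m output has the form device:partition.

```shell
#!/bin/sh
# Sketch only: is_vmfs_device is a hypothetical helper, not a VMware tool.
# It reads `esxcfg-scsidevs -m` style lines on stdin and succeeds when the
# first column of any line is "<device>:<partition>" for the given device ID.
is_vmfs_device() {
    awk -v dev="$1" 'index($1, dev ":") == 1 { found = 1 } END { exit !found }'
}

# DEVICE is a placeholder; substitute the naa ID of the LUN being checked.
DEVICE="${DEVICE:-naa.XXXXXXXXXXXXXXXXXXX}"

# Query the host only when the tool exists (that is, in an ESXi shell).
if command -v esxcfg-scsidevs >/dev/null 2>&1; then
    if esxcfg-scsidevs -m | is_vmfs_device "$DEVICE"; then
        echo "$DEVICE holds a VMFS partition: do not mark it perennially reserved"
    else
        echo "$DEVICE is not listed as a VMFS device"
    fi
fi
```

The helper matches on the literal string rather than a regular expression, so the dots in the naa ID are not treated as wildcards.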

The "Cannot synchronize host hostname. Operation Timed out" issue is fixed in ESXi 5.0, and we recommend upgrading to ESXi 5.0 or later.

ESXi 5.0

ESXi 5.0 uses a different technique to determine whether Raw Device Mapped (RDM) LUNs are used for MSCS cluster devices: it introduces a configuration flag that marks each device participating in an MSCS cluster as perennially reserved. During the start of an ESXi host, the storage mid-layer attempts to discover all devices presented to the host during the device claiming phase. However, MSCS LUNs that hold a persistent SCSI reservation lengthen the start process, because the ESXi host cannot interrogate a LUN while an active MSCS node hosted on another ESXi host holds the reservation on it.

Configuring the device to be perennially reserved is local to each ESXi host, and must be performed on every ESXi 5.0 host that has visibility to each device participating in an MSCS cluster. This improves the start time for all ESXi hosts that have visibility to the device(s).

ESXi 5.0 does not support applying this setting through vSphere host profiles. As such, ESXi 5.0 hosts deployed using vSphere Auto Deploy cannot take advantage of this feature.

Note: The advanced option Scsi.CRTimeoutDuringBoot is no longer valid on ESXi 5.0.

Upgrading to ESXi 5.0

To upgrade to ESXi 5.0:

For each host having visibility to MSCS RDM LUNs:

All virtual machines in the cluster must be powered off.

Prior to upgrading, unmount all MSCS RDMs from the host:

Determine which RDM LUNs are part of an MSCS cluster.

From the vSphere Client, select a virtual machine that has a mapping to the MSCS cluster RDM devices.

Note: This works even if the LUNs are not currently presented to the host.

Re-present the MSCS RDM devices to the host and rescan.

Confirm that the correct devices are marked as perennially reserved by running this command on the host:

esxcli storage core device list | less
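Paging through the full device list is slow on hosts with many LUNs. As a sketch, the listing can be reduced to one line per device that shows only the flag. The helper name perennial_status is hypothetical, and the filter assumes that each device block starts with the device ID in the first column and contains an indented "Is Perennially Reserved:" line.

```shell
#!/bin/sh
# Sketch only: perennial_status is a hypothetical helper name.
# It reads `esxcli storage core device list` style text on stdin and prints
# "<device> <true|false>" for every device block that reports the flag.
perennial_status() {
    awk '
        /^[^ \t]/                  { dev = $1 }        # unindented line: device ID
        /Is Perennially Reserved:/ { print dev, $NF }  # last field is the value
    '
}

# Query the host only when esxcli exists (that is, in an ESXi shell).
if command -v esxcli >/dev/null 2>&1; then
    esxcli storage core device list | perennial_status
fi
```

Devices that take part in the MSCS cluster should report true; any VMFS device that reports true needs attention.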

Note: After this, restarting the hosts should no longer be delayed by the MSCS devices.

Already upgraded ESXi 5.1/5.5 hosts

To mark the MSCS LUNs as perennially reserved on an already upgraded ESXi 5.1/5.5 host, set the perennially reserved flag in Host Profiles. For more information, see the vSphere MSCS Setup Checklist section in the vSphere Documentation Center in these guides:

Note: Stateless Auto Deploy wipes all settings at start, so it is not possible to set the perennially reserved flag persistently, which leads to a long delay in starting. However, the esxcli command that sets the flag can be incorporated in a local start script.
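A minimal sketch of such a start script fragment follows. It assumes that the flag is set with esxcli storage core device setconfig and the --perennially-reserved option, as on ESXi 5.x. The helper name perennial_commands and the MSCS_DEVICES variable are hypothetical. The location of the local start script differs between ESXi releases, so verify it for your version before adding the fragment.

```shell
#!/bin/sh
# Sketch only: perennial_commands and MSCS_DEVICES are hypothetical names.
# Prints one setconfig command per device ID passed as an argument, so the
# commands can be reviewed before they are executed.
perennial_commands() {
    for dev in "$@"; do
        echo "esxcli storage core device setconfig -d $dev --perennially-reserved=true"
    done
}

# Space-separated naa IDs of the MSCS RDM LUNs; empty by default so that
# nothing is changed until real IDs are supplied.
MSCS_DEVICES="${MSCS_DEVICES:-}"

# Apply the commands only in an ESXi shell and only when IDs were supplied.
if command -v esxcli >/dev/null 2>&1 && [ -n "$MSCS_DEVICES" ]; then
    # Unquoted on purpose: each ID becomes a separate argument.
    perennial_commands $MSCS_DEVICES | sh
fi
```

Printing the commands first keeps the fragment reviewable; never include a device that holds a VMFS partition in MSCS_DEVICES.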

Already upgraded ESXi 5.0 hosts

To mark the MSCS LUNs as perennially reserved on an already upgraded ESXi 5.0 host, run the esxcli command from the Already upgraded ESXi 5.1/5.5 hosts section of this article. All subsequent rescans and starts then complete at normal speed.

Determine which RDM LUNs are part of an MSCS cluster.

From the vSphere Client, select a virtual machine that has a mapping to the MSCS cluster RDM devices.

PowerCLI 5.0

You can also mark the MSCS LUNs as perennially reserved from PowerCLI, which exposes esxcli functionality directly. Retrieve an esxcli instance and invoke its methods. For more information, see the VMware vSphere PowerCLI Blog.