What's New

VMware vCenter Site Recovery Manager 5.0.1 offers the following improvements:

Forced failover, to allow you to recover virtual machines in cases where storage arrays fail at the protected site and, as a result, protected virtual machines are unmanageable and cannot be shut down, powered off, or unregistered.

Compatible Storage Arrays and Storage Replication Adapters

VMware VSA Support

Virtual machines that reside on the vSphere Storage Appliance (VSA) can be protected by SRM 5.0.1 using vSphere Replication (VR). VSA does not require a Storage Replication Adapter (SRA) to work with SRM 5.0.1.

Installation and Upgrade

SRM 5.0.1 can run with ESXi Server 4.0 and 4.1 and with Virtual Infrastructure 3.5 only if you use array-based replication. If you use vSphere Replication, either alone or in conjunction with array-based replication, then you must upgrade ESXi Server hosts to version 5.0 or ESXi Server 5.0 update 1 as part of the upgrade process.
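The compatibility rule above can be written as a small check. The following Python sketch is illustrative only: the function name, the version strings, and the way the replication type is passed in are assumptions made for this example and are not part of SRM.

```python
# Illustrative sketch only; names and version strings are assumptions.

# Host versions that the text above requires when vSphere Replication is in use.
VR_CAPABLE_HOST_VERSIONS = {"5.0", "5.0 update 1"}

# Older versions that the text above allows with array-based replication only.
ARRAY_ONLY_HOST_VERSIONS = {"3.5", "4.0", "4.1"}


def host_is_supported(host_version: str, uses_vsphere_replication: bool) -> bool:
    """Return True if a host version fits the rule stated in these notes."""
    if host_version in VR_CAPABLE_HOST_VERSIONS:
        return True
    if host_version in ARRAY_ONLY_HOST_VERSIONS:
        # Older hosts qualify only when array-based replication is used alone.
        return not uses_vsphere_replication
    # Any other version is not covered by these notes.
    return False
```

For example, host_is_supported("4.1", True) returns False, because a 4.1 host must be upgraded before vSphere Replication can be used.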

Upgrade from SRM 5.0 to SRM 5.0.1

You can perform an in-place upgrade of SRM 5.0 to SRM 5.0.1. VMware recommends in-place upgrades rather than fresh installations as this preserves all history reports, recovery plans, protection groups and customizations of recovery plans. You must perform the upgrade procedure on both the protected site and on the recovery site.

Log in to the machine on which you are running SRM Server on the protected site.

Back up the SRM database using the tools that your database software provides.

Download and run VMware-srm-5.0.1-buildnumber.exe.

Click Yes when prompted for confirmation that you want to upgrade SRM.

Click Next to install SRM 5.0.1 using the settings from the previous SRM installation.

Click Yes to confirm that you have backed up the SRM database.

Click Finish when the installation completes.

Repeat the upgrade process on the recovery site.

After you have upgraded the SRM server, you must reinstall the SRM client plug-in.

Uninstall the SRM 5.0 client plug-in.

Log in to a vSphere Client instance and connect to the vCenter Server to which the SRM server is connected.

Select Plug-ins > Manage Plug-ins.

Click Download and Install to install the SRM 5.0.1 client plug-in.

When the plug-in installation completes, log into SRM and verify that the configuration from the previous version has been retained.

Repeat the process for all vSphere Client instances that you use to connect to SRM server.

You can run SRM 5.0.1 with vSphere Replication 1.0, or you can upgrade vSphere Replication to version 1.0.1 to benefit from the bug fixes in version 1.0.1. You upgrade the vSphere Replication Management Server (VRM Server) and the vSphere Replication (VR) Servers separately.

IMPORTANT: Do not select the option in Update > Settings in the VAMI to automatically update vSphere Replication. If you select automatic updates, VAMI updates vSphere Replication to the latest 5.x version, which is incompatible with SRM and vCenter Server 5.0.x. Leave the update setting set to No automatic updates.

Upgrade the VRM Server:

Upgrade the SRM server and client to SRM 5.0.1.

Go to the VRM Server configuration interface at https://VRMS_IP_address:8080.

Log in to the VRM Server configuration interface as root.

Select the Update tab.

Click Check Updates. The update checker shows that version 1.0.1 is available.

Click Install Updates.

Select the VRM > Configuration tab and click the Restart button under VRM Server Status to restart the VRM Server.

Repeat the process for the VRM Server on the recovery site.

Upgrade the VR Servers:

Upgrade the VRM Server.

Go to the VR Server configuration interface at https://VR_IP_address:5480.

Log in to the VR Server configuration interface as root.

Select the Update tab.

Click Check Updates. The update checker shows that version 1.0.1 is available.

Click Install Updates.

Select the System > Information tab and click the Reboot button to restart the VR Server.

Repeat the process for the VR Servers on the recovery site.
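The procedures above imply an order: the SRM server and client first, then the VRM Server, then the VR Servers. The following Python sketch shows one way to track that order. The component labels and the function name are hypothetical and exist only for this example.

```python
# Illustrative sketch only; component labels are hypothetical.
from typing import Optional, Set

# Order implied by the first step of each procedure above.
UPGRADE_ORDER = [
    "SRM server and client",
    "VRM Server",
    "VR Servers",
]


def next_component(already_upgraded: Set[str]) -> Optional[str]:
    """Return the first component in the documented order not yet upgraded."""
    for component in UPGRADE_ORDER:
        if component not in already_upgraded:
            return component
    # Everything in the list has been upgraded.
    return None
```

Remember that each step must also be repeated on the recovery site before you move on.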

Upgrade from SRM 4.1.2 to SRM 5.0.1

You can perform an in-place upgrade of SRM 4.1.2 to SRM 5.0.1. VMware recommends in-place upgrades rather than fresh installations as this preserves all history reports, recovery plans, protection groups and customizations of recovery plans. To upgrade the SRM client to 5.0.1, you must first uninstall the SRM 4.1.2 client.

NOTE: Upgrading from SRM 4.1 or SRM 4.1.1 to SRM 5.0.1 is not supported.
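The upgrade paths described in these notes can be summarized as a lookup. This Python sketch is a convenience for scripting a pre-upgrade check. The function name and the status strings are assumptions, and versions that these notes do not mention are reported as such and are not guessed at.

```python
# Illustrative sketch only; names and status strings are assumptions.

# Source versions for which these notes describe an in-place upgrade to 5.0.1.
SUPPORTED_SOURCES = {"5.0", "4.1.2"}

# Source versions that the note above explicitly rules out.
UNSUPPORTED_SOURCES = {"4.1", "4.1.1"}


def upgrade_path_status(source_version: str) -> str:
    """Classify an SRM source version for an upgrade to SRM 5.0.1."""
    if source_version in SUPPORTED_SOURCES:
        return "supported"
    if source_version in UNSUPPORTED_SOURCES:
        return "not supported"
    return "not covered by these notes"
```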

Open Source Components

The copyright statements and licenses applicable to the open source software components distributed in vCenter Site Recovery Manager 5.0.1 are available at the SRM Downloads site. You can also download the source files for any GPL, LGPL, or other similar licenses that require the source code or modifications to source code to be made available for the most recent generally available release of vCenter Site Recovery Manager.

Caveats and Limitations

Interoperability with Storage vMotion and Storage DRS
Due to some specific and limited cases where recoverability can be compromised during storage movement, Site Recovery Manager 5.0.1 is not supported for use with Storage vMotion (SVMotion) and is not supported for use with the Storage Distributed Resource Scheduler (SDRS), including the use of datastore clusters. If you use SVMotion to move a protected virtual machine from a datastore that is in a protection group to a datastore that is not protected, you must manually reconfigure the protection of that virtual machine.

Network address translation (NAT) is not supported with SRM
When configuring vSphere Replication, you must configure the vSphere Replication Server (VR server) with an IP address that is visible to both the protected vSphere Replication Management Server (VRM Server) and the recovery VRM Server.

Interoperability with vCloud Director
Site Recovery Manager 5.0.1 offers limited support for vCloud Director environments. Using SRM to protect virtual machines within vCloud resource pools (virtual machines deployed to an Organization) is not supported. Using SRM to protect the management structure of vCD is supported. For information about how to use SRM to protect the vCD Server instances, vCenter Server instances, and databases that provide the management infrastructure for vCloud Director, see VMware vCloud Director Infrastructure Resiliency Case Study.

Re-protect and automated failback not supported with vSphere Replication
Re-protect and automated failback are supported only with array-replicated virtual machines. Virtual machines configured with vSphere Replication cannot be failed back automatically to the original site using existing recovery plans.

Certain vSphere features and RDM not supported with vSphere Replication
You cannot use vSphere Replication in conjunction with vSphere Fault Tolerance, virtual machine templates, linked clones, or with physical raw disk mapping (RDM).

Protection and recovery of virtual machines with memory state snapshots
When protecting virtual machines with memory state snapshots, the ESX hosts at the protection and recovery sites must have compatible CPUs, as defined in the VMware knowledge base articles VMotion CPU Compatibility Requirements for Intel Processors and VMotion CPU Compatibility Requirements for AMD Processors. The hosts must also have the same BIOS features enabled. If the BIOS configurations of the servers do not match, they show a compatibility error message even if they are otherwise identical. The two most common features to check are Non-Execute Memory Protection (NX / XD) and Virtualization Technology (VT / AMD-V). For more limitations to the protection and recovery of virtual machines with snapshots, see Limitations to Protection and Recovery of Virtual Machines in the Site Recovery Manager Administration Guide.

Resolved Issues

Planned Migration May Result in Slowed ESX Hosts

During planned migration, SRM first instructs ESX hosts to unmount replicated datastores and detach the LUNs backing these datastores. Next, SRM instructs storage array software to make the detached LUNs read-only. This process helps ensure that devices on ESX hosts do not encounter an All Paths Down (APD) condition for the datastores and LUNs being migrated. Migrating a virtual machine with RDMs may result in the RDM LUNs entering an APD condition. After RDMs enter an APD condition, ESX hosts continue to reattempt to establish connectivity with the lost RDM LUNs. As the number of unavailable RDMs increases, the number of ESX host attempts to reconnect to the lost RDMs increases correspondingly. As this proceeds, the ESX host may become slow to respond and vCenter Server may eventually find the hosts unresponsive. This is more likely to occur with certain storage arrays. For example, this is more likely when an SRA supports one iSCSI target per LUN. This is now fixed.

SRM tasks that are cleaned up too quickly cause the ManagedObjectNotFound exception

By default, SRM tasks are removed from the vmware-dr service one minute after the tasks have completed. If SRM refers to a task object that has been removed, it returns a ManagedObjectNotFound exception and shows the error message The object has already been deleted or has not been completely created. If you experience this behavior, you can change the cleanup time by setting the Topology.drTaskCleanupTime parameter in the config.xml file.

<topology>
<drTaskCleanupTime>300</drTaskCleanupTime>
</topology>
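If you prefer to script the change, the following Python sketch sets the value using the standard library XML parser. It is a sketch under assumptions: it assumes that the topology element is a direct child of the root element of config.xml, the root element name used in the usage note is hypothetical, and the function name is made up for this example. Back up config.xml first, because the parser discards XML comments when it rewrites the document.

```python
# Illustrative sketch only. Assumes <topology> sits directly under the
# root element of config.xml; verify this against your own file.
import xml.etree.ElementTree as ET


def set_task_cleanup_time(config_xml: str, value: int) -> str:
    """Return the XML text with topology/drTaskCleanupTime set to value."""
    root = ET.fromstring(config_xml)

    # Reuse the existing <topology> element, or create it if it is missing.
    topology = root.find("topology")
    if topology is None:
        topology = ET.SubElement(root, "topology")

    # Reuse or create <drTaskCleanupTime> and overwrite its text.
    cleanup = topology.find("drTaskCleanupTime")
    if cleanup is None:
        cleanup = ET.SubElement(topology, "drTaskCleanupTime")
    cleanup.text = str(value)

    return ET.tostring(root, encoding="unicode")
```

For example, calling set_task_cleanup_time on the text of a file whose root element is the hypothetical Config, with a value of 300, yields the fragment shown above inside that root element.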

Per-CPU license count is incorrect

Some customers who purchased SRM 1.x and SRM 4.0 might still be using per-CPU allocated licenses. They can continue to work with SRM 5.0.1 using those per-CPU licenses. In SRM 5.0, the formula for counting how many CPU licenses are used is too lenient. It is possible in this scenario that the conversion will incorrectly grant too many per-CPU licenses for SRM 5.0. This has been fixed in SRM 5.0.1, and there are stricter warnings about insufficient licenses.

Protecting a virtual machine in a replicated datastore spanning two disk partitions causes SRM to stop unexpectedly and fail to restart

If you protect a virtual machine that is contained in a replicated datastore that spans two disk partitions in the same device, SRM stops unexpectedly while recalculating the datastore group. The SRM logs show the error Panic: Assert Failed: "ok" @ d:\build\ob\bora-474459\srm\public\persistence/saveLoadUtils.h:329. SRM Server then fails to restart. This issue has been fixed.

If network communication between the protected and recovery sites is interrupted for less than five minutes, SRM Server can become unresponsive. This issue is caused by SRM Server missing update results and the timeout on the server side of waitforupdate calls from the remote site during the network interruption. This issue has been fixed by introducing client-side timeouts on the waitforupdate call. The default client-side timeout is 5 minutes.

Repeated host rescans during test and recovery reenabled

SRM 4.1 provided a configurable option, storageProvider.hostRescanCnt, to allow you to repeatedly scan hosts during testing and recovery. This option was absent from SRM 5.0 but has been restored in the Advanced Settings menu in SRM 5.0.1. Right-click a site in the Sites view, select Advanced Settings, then select storageProvider. See KB 1008283.

Customization Specification Does Not Configure the Gateway for Red Hat Enterprise Linux 5.x

Image customization of Red Hat Enterprise Linux 5.x virtual machines does not configure the gateway properly. Consequently, if you assign a new gateway to the recovering RHEL 5.x virtual machine during customization, the new gateway entry is appended to the /etc/sysconfig/network-scripts/route-ethX file. The RHEL network service picks up the first entry in that file, namely the old gateway setting on the protection site, and does not pick up the gateway changes specified by the user during customization. This has been fixed.

Recovery SRM Server stops unexpectedly after suspension with an error about the recovery plan being on the wrong context

Following a suspension of both the protected and recovery sites, the SRM Server on the recovery side stops unexpectedly when you restart it, with the error CreateRemoteSuspendVmListViewAndCallback on plan Recovery Plan on wrong context. This has been fixed.

Configuring vSphere Replication results in an invalid locale error when using SRM in the simplified Chinese locale with the vCenter Server Appliance

If you use SRM in the simplified Chinese locale with the vCenter Server Appliance, attempting to configure a vSphere replication on a virtual machine fails with an invalid locale error. This has been fixed.

SRM stops unexpectedly during startup after starting to check CPU license use

Under certain circumstances, SRM stops unexpectedly during startup when checking CPU license use, with the error Unexpected Object in results: vim.VirtualMachine. This has been fixed.

During an in-place upgrade from SRM 4.1.x to 5.0.x, SRM creates a file, exportConfig.xml, that contains details of Inventory mappings, datastore mappings, protected virtual machines and groups, recovery plans, and so on. You can use this file to migrate data into the SRM database after the upgrade. However, if you run the SRM installer in repair mode before migrating the data to the database, the installer deletes exportConfig.xml. This has been fixed and running the installer in repair mode does not delete the exportConfig.xml file.

IP customization of virtual machines fails with Error Code 14010

IP customization of virtual machines fails with the error There was an error in communication' Error Code: '14010' when it attempts to configure adapters that are not specific to VMware. This has been fixed, and IP customization now skips non-default network adapters and only configures the physical adapter.

Running a test recovery can fail with the error Panic: Assert Failed: "runtimeInfo._results != 0 (Missing results in plan: recovery-plan-10257)" @ d:/build/ob/bora-474459/srm/src/recovery/engine/manager.cpp:1300. This issue occurs because storage replication adapters (SRA) allow snapshot IDs to be the same on different storage devices, but SRM requires snapshot IDs to be unique. This issue only affects test recoveries, and does not affect actual recoveries. The issue has been fixed, and SRM now accepts duplicate snapshot IDs.
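To see how a uniqueness assumption of this kind can lose results, consider the following Python sketch. It is not SRM code and does not describe how SRM stores snapshot data. The device names, snapshot IDs, and function names are hypothetical. It only shows that a lookup keyed by snapshot ID alone keeps one entry when two devices report the same ID, while a lookup keyed by device and ID keeps both.

```python
# Illustrative sketch only; not SRM's implementation.

def index_by_snapshot_id(snapshots):
    """Key by snapshot ID alone: a later device overwrites an earlier one."""
    return {snap_id: device for device, snap_id in snapshots}


def index_by_device_and_id(snapshots):
    """Key by (device, snapshot ID): duplicate IDs on other devices survive."""
    return {(device, snap_id): device for device, snap_id in snapshots}


# Two hypothetical devices that both report the snapshot ID "snap-1".
reported = [("device-A", "snap-1"), ("device-B", "snap-1")]
```

With the reported list above, index_by_snapshot_id keeps a single entry and index_by_device_and_id keeps two.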

Cannot scroll down network list for recovery plans when many network options are available

If more than 30 networks are available in the SRM environment, the complete list of available networks that you can select when you create a recovery plan is not visible. This has been fixed, and you can now scroll down the complete list.

Known Issues

The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.

NEW A vulnerability in the glibc library allows remote code execution

Your vSphere Replication appliance might be impacted by a vulnerability in the glibc library that allows remote code execution.

When disks larger than 256GB are protected using vSphere Replication (VR), any operation that causes an internal restart of the virtual disk device causes the disk to complete a full sync. Internal restarts occur any time:

A virtual machine is restarted

A virtual machine is vMotioned

A virtual machine is reconfigured

A snapshot is taken of a virtual machine

Replication is paused and resumed

The full sync is initiated by ESX, and any resolution to this issue would involve an update to ESX. These syncs involve additional I/O to both the protected and recovery site disks, which often takes longer than the Recovery Point Objective (RPO), resulting in a missed RPO target. This issue is present in ESXi Server 5.0, but has been fixed in ESXi Server 5.0 update 1.

Workaround: Upgrade ESXi Server to version 5.0 update 1.
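To find the disks that this issue would affect before you upgrade, you can apply the two conditions above to an inventory list. The following Python sketch assumes that you already have the disk sizes in GB and the ESXi version as plain values. The function name, the input format, and the version strings are assumptions for this example.

```python
# Illustrative sketch only; input format and names are assumptions.

# Disks larger than this size are affected, according to the text above.
FULL_SYNC_THRESHOLD_GB = 256


def disks_at_risk(disk_sizes_gb: dict, esxi_version: str) -> list:
    """Return the names of VR-protected disks that the issue would affect."""
    if esxi_version != "5.0":
        # These notes state that the issue is fixed in ESXi 5.0 update 1.
        return []
    return sorted(
        name for name, size in disk_sizes_gb.items()
        if size > FULL_SYNC_THRESHOLD_GB
    )
```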

LVM.enableResignature remains set to 1 after a test failover

SRM does not support ESX environments in which the LVM.enableResignature flag is set to 0. During a test failover or an actual failover, SRM automatically sets LVM.enableResignature to 1 if the flag is not already set. SRM sets this flag to resignature snapshot volumes and mounts them on ESX hosts for recovery. After the operation is completed, the flag remains set to 1. For information, see KB 2010051.

Cannot reconfigure vSphere Replication on a virtual machine after receiving Error committing the transaction during configuration

If you receive the message Error committing the transaction during the configuration of replication on a virtual machine, any attempt to reconfigure replication on the virtual machine fails. This issue occurs because vSphere Replication does not clean up the configuration data properly after the configuration attempt. Consequently, the replication of the virtual machine appears configured to vSphere Replication when it is not.

Workaround: To clean up the configuration data correctly, disable replication of the virtual machine at the command line.

Log in to the ESXi console.

Run a command to look up the ID of the virtual machine in the ESXi host.

# vim-cmd vmsvc/getallvms | grep virtual_machine_name

The virtual machine ID is the number in the first column.

Run a command to disable replication for the virtual machine with the ID that you found in the previous step.
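The ID lookup in the steps above can also be done on saved command output. The following Python sketch mimics the grep filter and reads the ID from the first column, as described above. The function name is hypothetical, and the sample output in the usage note is invented for illustration and is not captured from a real host.

```python
# Illustrative sketch only; parses text saved from `vim-cmd vmsvc/getallvms`.

def find_vm_ids(getallvms_output: str, vm_name: str) -> list:
    """Return the IDs (first column) of lines that mention vm_name."""
    ids = []
    for line in getallvms_output.splitlines():
        columns = line.split()
        # Like grep, keep any line that contains the name. Skip the header
        # and any other line whose first column is not a number.
        if vm_name in line and columns and columns[0].isdigit():
            ids.append(int(columns[0]))
    return ids
```

For example, given a line that begins with 12 and contains the name web01, find_vm_ids returns a list containing 12. Because the match is a substring match, as with grep, a name such as web also matches web01, so check the result when names overlap.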

If you run a recovery plan with the forced failover option selected, and then revert to planned migration by selecting Planned Migration, the forced failover checkbox remains selected and grayed out in the Run Recovery Plan wizard. This issue only affects the user interface and SRM performs the correct behavior.

Workaround: Close and reopen the Run Recovery Plan wizard after deselecting the forced failover option.

Recovery or migration operations can fail if placeholder datastores are not visible to all hosts in a protected cluster

During recovery and migration, placeholder virtual machines are replaced with recovered virtual machines. If you have a cluster with multiple hosts on the recovery site, all the placeholder datastores must be available to all the hosts in the cluster, otherwise swapping virtual machines can fail. SRM does not prevent you from selecting placeholder datastores that are not available to all the hosts in the cluster. If placeholder datastores are not visible to all the hosts, the recovery plan fails with the error Error - Unable to access file [datastore]" Unable to access file [datastore] Failed to unregister protected VMs. Hosts must have access to the datastores that contain both the placeholder virtual machines and the recovered virtual machines.

Workaround: Manually check that the datastores for both the placeholder virtual machines and the recovered virtual machines are visible to all the hosts in a protected cluster.
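If you can export the list of datastores that each host sees, the manual check in the workaround reduces to a set comparison. The following Python sketch assumes that you supply that data yourself. The function name, the host names, and the datastore names are hypothetical.

```python
# Illustrative sketch only; the inventory data is supplied by the caller.

def hosts_missing_datastores(host_datastores: dict, required: set) -> dict:
    """Map each host to the required datastores that it cannot see."""
    missing = {}
    for host, visible in host_datastores.items():
        gap = set(required) - set(visible)
        if gap:
            missing[host] = gap
    return missing
```

An empty result means that every host in the cluster sees every required datastore. Pass the datastores for both the placeholder virtual machines and the recovered virtual machines in the required set.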

VRM Server and VR Server versions are not updated in virtual machine summary after upgrade

If you upgrade an existing installation of the vSphere Replication Management Server (VRMS) appliance from version 1.0 to version 1.0.1, the version numbers of the VRM Server and VR Server are not updated in the virtual machine Summary tab. When you select the VRM Server or a VR Server in the vCenter Server Inventory, the Summary tab still displays version 1.0.0.0, even though the VRM Server has been updated to 1.0.1. If you perform a new installation of VRM Server version 1.0.1, the Summary tab displays the correct version number.

Workaround: Check the version number in the VRMS virtual appliance management interface:

Log in to https://VRMS_IP_address:8080.

Select the Update tab.

Click Status.
The version number is 1.0.1.0.

Alternatively, you can see the correct version number in the console for the VRM Server or VR Server in the vSphere Client.

Use of unsupported databases with vSphere Replication Management Server is possible

You can configure the vSphere Replication Management Server (VRMS) to use databases that are not supported and VRMS configuration will succeed without any warnings about database support. However, using an unsupported database can lead to unpredictable behavior. The following databases are fully tested and supported for use with VRMS:

SQL Server 2005 SP4 64-bit

SQL Server 2008 R2 SP1 64-bit

SQL Server 2008 R2 64-bit

Some SRAs handle certain timezones incorrectly during failover

Test and real failovers can stop with the error Failed to create snapshots of replica devices for group 'protection-group-999' using array pair 'array-pair-999': Vmacore::SystemException "The parameter is incorrect. " (87). This error is due to a mishandling of the time zone returned by the storage array to the SRA. All timestamps earlier than January 1 1970 will experience this issue. For details and a workaround, see KB 2018597.
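To spot timestamps that fall in the affected range, you can compare them with January 1 1970 UTC. The following Python sketch is a generic check and says nothing about how a particular SRA produces its timestamps. The function name is hypothetical, and the input must be a timezone-aware datetime.

```python
# Illustrative sketch only; a generic check, not SRA or SRM code.
from datetime import datetime, timedelta, timezone

# January 1 1970 00:00 UTC, the boundary named in the text above.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def is_before_epoch(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime is earlier than the epoch."""
    return moment < UNIX_EPOCH
```

Note that midnight on January 1 1970 in a time zone that is ahead of UTC is earlier than the same wall-clock time in UTC, so is_before_epoch returns True for it.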

The SRM 5.0.1 upgrade process completes without error, vCenter Server restarts, and the hosts, guests, and storage all come online in the same state as when the vCenter Server service was stopped. However, during SRM migration the Protection Group imports fail with the following error:

Skipping VM protection for all VMs in group (group) due to an error: One or more datastores are either already protected in a different group or not currently replicated.

The datastore Object IDs listed in the SRM exportConfig.xml file differ from the Object IDs that the Managed Object Browser (MOB) shows for the same datastores. This issue is related to the issue described in KB 2007404.

Workaround: Edit the exportConfig.xml to use the datastore Object IDs from the MOB browser and rerun the srm-migration importConfig command.
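If many IDs need to change, the edit in the workaround can be scripted. The following Python sketch treats the file as plain text and assumes that datastore Object IDs appear in it as literal strings of the form datastore- followed by digits. It makes no assumption about element names in exportConfig.xml. The function name and the IDs in the usage note are hypothetical, and you supply the mapping from old IDs to the IDs shown in the MOB. Keep a backup of the original file.

```python
# Illustrative sketch only; treats exportConfig.xml as plain text.
import re


def remap_datastore_ids(export_config_text: str, id_map: dict) -> str:
    """Replace datastore Object IDs using id_map (old ID -> ID from the MOB)."""

    def swap(match):
        old_id = match.group(0)
        # IDs that are not in the map are left unchanged.
        return id_map.get(old_id, old_id)

    # The pattern consumes all trailing digits, so a mapping for
    # datastore-12 never touches datastore-123.
    return re.sub(r"datastore-\d+", swap, export_config_text)
```

For example, with the mapping {"datastore-12": "datastore-98"}, every occurrence of datastore-12 becomes datastore-98 and datastore-123 is left alone.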

Events not Properly Displayed for Korean Operating Systems

When the vSphere Client starts, it determines the locale on which it is running, and then chooses the set of messages to display based on the locale. When the vSphere Client is installed on a Korean operating system, the client requests messages from the ko folder from the vCenter Server installation because the vCenter Server and the vSphere Client are localized for Korean. While the vCenter Server and vSphere Client are localized for Korean, SRM is not. Therefore, XXX messages are displayed, instead of SRM server messages. To resolve this issue, create a copy of the en folder that is located in C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vcDr\locale\. Name the copy ko and restart the vCenter Server and SRM services.
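The folder copy described above can be scripted. The following Python sketch copies the en folder to ko inside a locale directory that you pass in. The function name is hypothetical. The sketch does not restart the vCenter Server and SRM services, which you still need to do yourself.

```python
# Illustrative sketch only; copies the en locale folder to ko.
import shutil
from pathlib import Path


def copy_locale_folder(locale_dir: str, source: str = "en", target: str = "ko") -> Path:
    """Copy locale_dir/source to locale_dir/target and return the new path."""
    src = Path(locale_dir) / source
    dst = Path(locale_dir) / target
    # copytree raises FileExistsError if the target folder already exists,
    # so an existing ko folder is never overwritten.
    shutil.copytree(src, dst)
    return dst
```

Call the function with the locale directory named above as the locale_dir argument.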

A recovery or test workflow fails for a virtual machine with the following message: Error - Unexpected error '3008' when communicating with ESX or guest VM: Cannot connect to the virtual machine.

Under rare circumstances this error might occur when you configure IP customization or an in-guest callout for the virtual machine and the recovery site cluster is in fully-automated DRS mode. An unexpected vMotion might cause a temporary communication failure with the virtual machine, resulting in the customization script error.

Workaround: Rerun the recovery plan. If the error persists, configure the recovery site cluster DRS to manual mode and rerun the recovery plan.

Recovery fails with Error creating test bubble image for group ... The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.

If you run a test recovery or a planned recovery and the recovery plan fails with this exception, the LUN used for storing replication data has been temporarily disconnected from ESXi. When the LUN is reconnected, replication continues as normal and no replication data is lost. The exception occurs during these scenarios:

vSphere Replication cannot locate the LUN as the LUN has changed its internal ID.

The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.

You must manually reconfigure the replication to refresh the new ID.

Workaround: If the primary site is no longer available, contact VMware Support for instructions on manually updating the VRMS database with the new datastore managed object id. If the primary site is still available:

Run a cleanup operation on the recovery plan that failed.

In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.

Click Next, and click Browse to change the location of the files on the datastore that has been disconnected and then reconnected, and select the same datastore and folder locations as before.

Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.

Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.