Business Continuity and Disaster Recovery solutions drive automation, efficiency, data protection, and validation of an organization's enterprise-level disaster recovery strategy. Today there is a need for an efficient disaster recovery solution for datacenter workloads, including private and public cloud environments, that requires minimal investment at the DR site.

Here we propose a simplified disaster recovery solution for IBM Power system LPARs (Logical Partitions) built on the Virtual I/O Server (VIOS) Shared Storage Pool (SSP) and VIOS backup/restore (VIOSBR) capabilities. The solution currently applies to Power system LPARs whose storage is provisioned through VIOS Shared Storage Pools (SSP).

2. Benefits

The primary benefit of this solution over traditional DR solutions is cost savings. There is no need to keep active workloads or LPARs running at the DR site, which saves substantial licensing, power, and maintenance costs.

3. Pre-requisites:

There are two pre-requisites to implement this solution.

VIOS 2.2.5.0 or later: The VIOSBR tool in VIOS version 2.2.5.0 has been enhanced to back up the SSP cluster-level configuration at the primary site and restore the cluster at the DR site on mirrored storage disks.

Storage mirroring of SSP disks: Storage mirroring must be enabled at the SAN level (for example SRDF, SVC, or PPRC) between the primary and DR sites for all the disks that are part of the SSP cluster.

Note: To add redundancy within the Shared Storage Pool, the user can configure SSP Failure Groups at the primary site. However, the DR solution described here does not depend on SSP Failure Groups.
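
The first pre-requisite can be checked by comparing the level reported by the ioslevel command on each VIOS against 2.2.5.0. The following is a minimal sketch, not an official tool: vios_level_ok is a hypothetical helper name, and the sketch assumes ioslevel prints a plain dotted numeric level such as 2.2.5.10.

```shell
# Hypothetical helper: succeeds when a dotted VIOS level is 2.2.5.0 or later.
# Only the first three fields decide the result; the fourth cannot fall below 0.
vios_level_ok() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split "2.2.5.10" into 2 2 5 10
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; rel=${3:-0}
    [ "$major" -gt 2 ] && return 0
    [ "$major" -lt 2 ] && return 1
    [ "$minor" -gt 2 ] && return 0
    [ "$minor" -lt 2 ] && return 1
    [ "$rel" -ge 5 ]
}

# Run the check only where ioslevel exists (that is, on a VIOS).
if command -v ioslevel >/dev/null 2>&1; then
    if vios_level_ok "$(ioslevel)"; then
        echo "VIOS level is sufficient for cluster-level viosbr DR"
    else
        echo "VIOS level is below 2.2.5.0; update before using this solution"
    fi
fi
```

The same check should pass on the VIO Servers at the DR site, since the restore runs there.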

4. Environment:

As mentioned above, this solution leverages the following two VIOS capabilities to build a lightweight disaster recovery solution without compromising data availability, data protection, or virtual I/O device configuration.

a. VIOSBR - VIOS Backup and Restore capability

PowerVM Virtual I/O Server (VIOS) virtualizes input/output adapters and enables these adapter resources to be shared among multiple virtual machines. VIOS is also responsible for creating and managing the virtual and logical mappings between physical and virtual resources. To make these mappings persistent across driver updates, it’s possible to back up the virtual and logical configuration before an update and restore the mappings afterwards using the VIOS backup and restore capability.

The proposed solution is to back up the PowerVM Virtual I/O Server cluster-level configuration at the primary site and, upon primary site failure, restore the cluster configuration at the disaster site on a new set of VIO Servers. The end-to-end disaster recovery solution using VIOS SSP and VIOSBR in a PowerVM environment is described in the following steps.

Create an active cluster with a Shared Storage Pool at the primary site on SAN storage and enable the required SSP capabilities.

Create LPARs (virtual machines) and provision their storage, including the rootvg disks, from SSP logical units.

Install a supported OS, such as AIX 7.1, on the client LPARs. Configure and enable workloads on these virtual machines.

If the user wishes to bring existing LPARs and storage into the VIOS SSP environment, SSP can migrate storage disks from non-SSP environments through the “importpv” command.

Enable storage-level mirroring to the DR site for the SAN storage disks that are part of the Shared Storage Pool. The user can choose either synchronous or asynchronous mirroring based on their requirements.

Back up the Virtual I/O Server using the viosbr command and save the backup data in a known location at the DR site. Alternatively, the user can save the data on one of the mirrored non-SSP disks so that it is available to restore the cluster at the DR site.

Upon primary site failure, recover the VIOS Shared Storage Pool by restoring its configuration from the backup data using viosbr.

Either all client LPARs or a subset of the LPARs from the primary site can be recovered at the DR site using this solution.

6. Solution – User Roles and responsibilities

This solution expects the following actions from the user, in addition to the pre-requisites.

6.1 At Primary Site

Enable storage-level mirroring to the storage at the DR site for all the disks that are part of the Shared Storage Pool.

Take a cluster-level backup using the VIOS backup/restore capability. A new backup must be taken whenever the cluster configuration changes.

If the user intends to create the client LPARs at the DR site manually, matching those at the primary site (that is, without HMC Remote Restart support), the user is expected to save the profile information of all the LPARs that need to be restarted at the DR site. This profile information must be saved in a safe location at the DR site.
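
Because the backup has to be repeated after every configuration change, it helps to keep the command in a small script. The sketch below only prints the viosbr invocation for review rather than running it. The cluster name mycluster and the file name ssp_dr are placeholders, and backup_cmd is a hypothetical helper, not part of VIOS.

```shell
# Placeholders (assumptions): replace with the real cluster and backup file names.
CLUSTER=${CLUSTER:-mycluster}
BACKUP_FILE=${BACKUP_FILE:-ssp_dr}

# Hypothetical helper: print the cluster-level backup command for review.
backup_cmd() {
    printf 'viosbr -backup -clustername %s -file %s\n' "$1" "$2"
}

backup_cmd "$CLUSTER" "$BACKUP_FILE"
# Run the printed command as padmin on one node of the primary cluster,
# then copy the resulting archive to the DR site.
```

When no full path is given, viosbr normally writes the archive under the padmin cfgbackups directory; confirm the actual file name and location on your system before copying it to the DR site.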

6.2 At Backup site – Prior to the disaster

Create at least one VIOS at the DR site and copy the backup data to a temporary directory on it. This is not mandatory before the disaster. However, to reduce the time required to restart all the services at the DR site, it’s advisable to activate at least one VIO Server beforehand. Doing so also provides a place to keep all the required data (backup data, mirrored disk data, client profile data, etc.) on the VIO Server.

6.3 At Backup site – After the disaster

Fail over the mirrored LUNs so that they are accessible at the DR site.

The user can create additional VIO Servers for I/O virtualization beyond those created before the disaster.

Recover the Shared Storage Pool using the viosbr command with the inputs below. By this point, all the VIO Servers should be reachable on the network and the mirrored LUNs should be accessible on all VIO Servers.

A free physical volume (hdiskX), accessible from all the VIO Servers, to hold the repository information.

The backup file captured at the primary site.

A file (“nodelist”) listing the new hostnames of the Virtual I/O Servers to be configured as part of the Shared Storage Pool.

A file (“disklist”) listing the mirrored hdisk names used to restore the cluster.

When the user decides to activate the DR site, the user is expected to create the client partitions there manually from the saved profiles and to create the corresponding virtual adapters on VIOS for these client partitions. It is not necessary to recreate every client LPAR; the user can create only those that run business-critical applications.

Note: The user can opt to use the PowerVM Remote Restart capability to create the LPARs automatically. More details are available in section 7 below.
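
The four inputs to the recovery step in 6.3 can be assembled into a single viosbr invocation. The sketch below prints the command instead of running it. The option names used here (-dr, -repopvs, -type cluster, and -typeInputs with hostnames_file and pooldisks_file) are assumptions based on the viosbr command reference and should be confirmed against the usage text on your VIOS level. The cluster name, backup file name, hdisk5, and the /home/padmin paths are placeholders, and dr_restore_cmd is a hypothetical helper.

```shell
# Hypothetical helper: print the DR recovery command built from the four inputs.
# $1 cluster name   $2 backup file   $3 free repository disk
# $4 nodelist path  $5 disklist path
dr_restore_cmd() {
    printf 'viosbr -dr -clustername %s -file %s -type cluster -typeInputs hostnames_file:%s,pooldisks_file:%s -repopvs %s\n' \
        "$1" "$2" "$4" "$5" "$3"
}

# All values below are placeholders (assumptions), not taken from a real system.
dr_restore_cmd mycluster ssp_dr.mycluster.tar.gz hdisk5 \
    /home/padmin/nodelist /home/padmin/disklist
```

Printing the command first gives a chance to confirm that the repository disk is free and visible on every DR VIOS before anything is written to it.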

The user can opt to use the PowerVM Remote Restart capability to restart the client LPARs after restoring the SSP cluster at the DR site. Currently, the Simplified Remote Restart feature is supported on POWER8-based servers with HMC version 8.4 for LPARs that use SSP-provisioned storage.

More details on Simplified Remote Restart are available in the white paper and in IBM Knowledge Center at the links below.

This solution helps users restart the SSP cluster on mirrored disks through the enhanced VIOSBR capability. It’s the user's responsibility to create the LPARs and the required virtual adapter mappings in VIOS, either manually or through the Remote Restart capability.

This solution has been tried at several customer locations. Based on one client’s experience, Chris Gibson has written a detailed blog post at the following location.

Create a file "nodelist" with the new hostnames that will be part of the new SSP setup. Type one hostname per line. Make sure there is no whitespace at the beginning or end of any line.

Ex.: # cat nodelist (list of VIOS node host names)
DRVIOS1

Create a file "disklist" with the list of disks that are the mirror copies of the primary site's SSP pool disks. Type one disk name per line. Make sure there is no whitespace at the beginning or end of any line. All these disks should be accessible on all the nodes specified in the previous step.
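
Stray whitespace in either list file is easy to miss by eye, so a quick check before the restore can save a failed attempt. The following is a minimal sketch; check_list_file is a hypothetical helper, and it treats any whitespace or blank line as an error because neither a hostname nor an hdisk name should contain one.

```shell
# Hypothetical helper: verify that a nodelist/disklist file has one entry per
# line, with no blank lines and no whitespace anywhere on a line.
check_list_file() {
    file=$1
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty" >&2
        return 1
    fi
    # grep -n prints each offending line prefixed with its line number
    if grep -n -E '[[:space:]]|^$' "$file" >&2; then
        echo "$file: remove the whitespace or blank lines listed above" >&2
        return 1
    fi
    echo "$file: OK"
}

# Typical use from the directory holding both files:
#   check_list_file nodelist && check_list_file disklist
```

The check also flags carriage returns, which appear when a list file was edited on a Windows workstation and then copied to the VIOS.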