Last year I was working for a customer that was upgrading several AIX 5.3 systems to AIX 6.1. The migrations were successful for the most part, but we did encounter one issue that took a little time to resolve.

The customer was using nimadm to migrate. This process worked fine; however, on a couple of systems a strange error was encountered after the migration. The LPAR was booted into AIX 6.1 and everything came up fine. The applications were started and users began accessing the system.

The first sign of trouble appeared several days later, when the AIX administrator attempted to configure new storage on the system. He had asked his storage administrator to assign a couple of new disks to his LPAR (via NPIV/VFC). As soon as the storage admin had completed the assignment, the AIX admin ran cfgmgr to detect and configure the new hdisks. Immediately, cfgmgr reported the following error:

Method error (/usr/lib/methods/cfgscsidisk):

0514-023 The specified device does not exist in the

customized device configuration database.

Initially, the AIX team suspected there was some fault with either the storage device or the zoning of the disk. Both of these items were checked and double-checked and were found to be OK. Our next step was to run cfgmgr again, but this time we wanted a greater level of detail captured. To do this we used the following environment variable to force cfgmgr to be ‘more verbose’.

# export CFGLOG="cmd,meth,lib,verbosity:9"

We ran cfgmgr and went to the /var/adm/ras/cfglog file to view the results with the alog command. However, we noticed that the cfglog file had a size of zero (0) and contained no data.

# cd /var/adm/ras

# ls -l cfglog

-rw-r----- 1 root system 0 May 16 13:22 cfglog

We decided to recreate the cfglog alog file and run mkdev again to reproduce the disk configuration error.

# rm cfglog

# echo "Create cfglog `date`"|alog -t cfg

# mkdev -l hdisk0

Method error (/usr/lib/methods/cfgscsidisk):

0514-023 The specified device does not exist in the

customized device configuration database.

This time we found some useful data in the cfglog file.

# alog -t cfg -o

MS 31981804 28835876 /usr/lib/methods/cfgscsidisk -l hdisk39

M4 31981804 Parallel mode = 0

M4 31981804 Get CuDv for hdisk39

M4 31981804 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 31981804 Get parent CuDv, name=fscsi0

M4 31981804 ..is_mpio_capable()

M4 31981804 Device is MPIO

M4 31981804 ..get_paths()

M4 31981804 Getting CuPaths for name='hdisk39'

M4 31981804 Found 1 paths

M0 31981804 cfgcommon.c 225 mpio_init error, rc=23

MS 28835892 31981568 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 28835892 Parallel mode = 0

M4 28835892 Get CuDv for hdisk0

M4 28835892 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 28835892 Get parent CuDv, name=fscsi0

M4 28835892 ..is_mpio_capable()

M4 28835892 Device is MPIO

M4 28835892 ..get_paths()

M4 28835892 Getting CuPaths for name='hdisk0'

M4 28835892 Found 2 paths

M0 28835892 cfgcommon.c 225 mpio_init error, rc=23

MS 25690326 27328608 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 25690326 Parallel mode = 0

M4 25690326 Get CuDv for hdisk0

M4 25690326 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 25690326 Get parent CuDv, name=fscsi0

M4 25690326 ..is_mpio_capable()

M4 25690326 Device is MPIO

M4 25690326 ..get_paths()

M4 25690326 Getting CuPaths for name='hdisk0'

M4 25690326 Found 2 paths

M0 25690326 cfgcommon.c 225 mpio_init error, rc=23

The configuration method was attempting to configure a disk device type of htcvspmpio (which was correct), but it was unable to configure the device paths (mpio_init error rc=23). We suspected that the system was missing some sort of device driver support for the type of storage in use.

Cutting a very long story short, we determined, with the help of the IBM AIX support team, that the issue stemmed from “old” AIX installation media used to create the AIX 6.1 TL6 SP5 SPOT and lppsource on the NIM master. Old AIX 6.1 media was originally used (several years ago) to create the NIM resources and was gradually updated over time, all the way up to TL6 SP5.

IBM support identified that the older install media contained a liblpp.a file that was missing the necessary PdPathAt ODM entries. Newer install media contained a fix that adds the appropriate entries to bos.rte.cfgfiles.
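For anyone chasing a similar error, one way to check whether the predefined path ODM entries exist for a given device type, and which fileset delivers the configuration method, is sketched below. This is a hedged example only; the uniquetype is taken from the cfglog output above, so substitute the device type used by your own storage.

# odmget -q "uniquetype=disk/fcp/htcvspmpio" PdPathAt

# lslpp -w /usr/lib/methods/cfgscsidisk

If odmget returns nothing for an MPIO device type, the predefined path attributes are missing, which would be consistent with the mpio_init rc=23 failure we saw.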

My team and I have recently been trying to streamline our AIX disaster recovery process. We’ve been looking for ways to reduce our overall recovery time. Several ideas were tossed around, such as a) using a standby DR LPAR with AIX already installed and using rsync/scp to keep the production and DR LPARs in sync, and b) using alt_disk_copy (with the -O flag for a device reset) to clone rootvg to an alternate disk, which is then replicated to DR (a rough example of the command is shown below). These methods may work but are cumbersome to administer and (in the case of alt_disk_copy) require additional (permanent) resources on every production system. With over 120 production instances of AIX, the disk space requirements start to add up.
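For reference, here is a minimal sketch of the alt_disk_copy approach we considered. The target disk name is only an example; the -O flag resets device information in the cloned rootvg and -B prevents the bootlist on the source system from being changed.

# alt_disk_copy -d hdisk1 -O -B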

So far we’ve concluded that the best way to achieve our goal is by using SAN-replicated rootvg volumes at our DR site.

Our current DR process relies on recovery of AIX systems from mksysb images on a NIM master. All our data (non-rootvg) LUNs are already replicated to our DR site. The aim was to change the process and ‘recover’ our AIX images using replicated rootvg LUNs. This will reduce our overall recovery time at DR (which is crucial if we are to meet the proposed recovery time objectives set by our business). Based on current IBM documentation we were relatively comfortable with the proposed approach. The following IBM developerWorks article (originally published in 2009 and updated in late 2010) describes “scenarios in which remapping, copying, and reuse of SAN disks is allowed and supported. More easily switch AIX environments from one system to another and help achieve higher availability and reduced down time. These scenarios also allow for fast deployment of new systems using cloning.”

The document focuses on fully virtualised environments that utilise shared processors and VIO servers. One area where this document is currently lacking in information is the use of NPIV and virtual Fibre Channel adapters in a DR scenario. We reached out to our contacts in the AIX development space and asked the following question:

“Hoping you can help us find some statements regarding support for a replicated rootvg environment using NPIV/Virtual Fibre Channel adapters? The following IBM developerWorks article discusses VSCSI and we are looking for something similar for NPIV: http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html My guess is that restrictions similar to those for physical FC adapters will apply here? But I'm hoping that given the adapters are virtual the limitations may be relaxed. Are you aware of any statement regarding support (or not) for booting from another system using a disk subsystem image of rootvg replicated to another disk subsystem when using NPIV? And what, if any, additional requirements/restrictions may apply when using NPIV?”

We received the following responses:

“There are some additional considerations when using NPIV for booting from a replicated rootvg. With NPIV the client partition has virtual Fibre Channel adapter ports, but has physical access to the actual (physical) disk devices. There may be an increased chance of needing to update the boot list via the Open Firmware SMS menu. Since the clients have access to the actual disks, you have the possibility of running multipathing software besides AIX MPIO. If you are using multipathing software besides AIX MPIO to manage the NPIV-attached disks, then you should contact the vendor that provided the software to check their support statement.

Since one or more of the physical devices will change when booting from an NPIV replicated rootvg, it is recommended to set the ghostdev attribute. The ghostdev attribute will trigger when it detects that the AIX image is booting from either a different partition or server. The ghostdev attribute should not trigger during LPM (Live Partition Mobility) operations. Once triggered, ghostdev will clear the customized ODM database. This will cause detected devices to be discovered as new devices (with default settings), and avoids the issue of missing/stale device entries in the ODM. Since ghostdev clears the entire customized ODM database, you will need to import your data (non-rootvg) volume groups again and perform any (device) attribute customization. To set ghostdev, run "chdev -l sys0 -a ghostdev=1". Ghostdev must be set before the rootvg is replicated.

As with virtual devices, the client partition is booting an existing rootvg where the hardware may be different. It's possible that some applications have a dependency on tracking the actual physical devices (instead of the data on the disks). For example, PowerHA may keep track of a disk for cluster health checks. If you do have applications that have a dependency on tracking physical devices, then additional setup (of those applications) may be required after the first boot from the replicated rootvg.

We do have multiple customers using NPIV for such scenarios. I believe most of them worked with IBM Lab Based Services to assist with implementing such a configuration, and some of the customers required custom scripts to further customize their system after booting from the replicated rootvg. Those customers set the ghostdev attribute, and had custom scripts to import their data (non-rootvg) volume groups and update PowerHA to point to the new health check disk.

You should get support for such an NPIV setup with IBM as long as you follow the considerations listed in the white paper.”

“Development has approved using NPIV to do this for one customer. Below are more detailed requirements for this DR strategy using NPIV. If a Disaster Recovery (DR) environment not using PowerHA Enterprise Edition is used, then we believe the white paper located at http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html provides the guidelines regarding the setup, prerequisites, and limitations of such a DR deployment. Deployments as detailed in the white paper are supported by IBM. However, note that such a deployment places many manual responsibilities on the customer to set up and maintain such an environment. IBM expects that the customer carefully manage these manual steps without any mistakes. The white paper does not currently cover using NPIV in such a DR scenario.

We have the following guidelines regarding the configuration, which includes NPIV as an option: All of the system configuration should be virtualized, with the possible exception of disk devices when using NPIV. If NPIV is used, then AIX MPIO must be used as the multi-pathing solution. If multi-pathing software besides AIX MPIO is used, then the vendor of that software must be contacted regarding a support statement. Install AIX (at least the minimum required TLs/SPs for the desired AIX version) and the software stack (middleware and applications) on the primary systems, which is compatible with the systems at both sites. Primary and secondary sites should be using systems with similar hardware, the same microcode levels, and the same VIOS levels.

Many manual steps are needed to set up the virtual and physical devices accurately on the secondary site VIOS. If Virtual SCSI disks are being used, then discover the unique identification on the primary site and map the disks to the corresponding replication disks. Map the same appropriately on the VIOS at the secondary site. The level of VIOS should support the attribute to open the secondary devices passively; this setting needs to be configured correctly on the VIOS at the secondary site. The operating environment should not have subnet dependencies. Manage the replication relationships accurately. You may need to manually switch the secondary disks to the primary node. Raw disk usage may cause problems. Some middleware products may bypass the operating system and use the disk directly; they might have their own restrictions for this environment (e.g. anything that is device location code or storage LUN unique ID dependent may have issues when the cloned image is restarted on the secondary system with replicated storage).

Set the "ghostdev" attribute using the chdev command (this must be done on the primary). This attribute can be set using the command "chdev -l sys0 -a ghostdev=1". The ghostdev attribute will delete the customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system. If the "ghostdev" attribute is not set, then booting from the alternate site will result in devices in the ODM showing up in a "Defined" or "Missing" state.

After a failover to the secondary site, you may need to reset the boot device list for each LPAR, before booting the LPAR, using the SMS menus of the firmware. Note that this is not an exhaustive list of issues; refer to the white paper and study it as it applies to the environment. So as long as they are using MPIO, you're OK. If not using MPIO and some OEM storage, then the storage vendor must also support it.”
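Both responses centre on the ghostdev attribute, so it is worth repeating how it is set and checked. These are simply the commands already quoted above, plus an lsattr verification; they must be run on the production (source) LPAR before the rootvg is replicated.

# chdev -l sys0 -a ghostdev=1

# lsattr -El sys0 -a ghostdev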

While these responses indicate that this form of recovery is supported by IBM, we were still looking to IBM for clarity on the support position. It has been noted that other IBM customers have had mixed responses when contacting AIX support for feedback and assistance with this type of DR procedure. And it’s not hard to see why when you read statements like this from the “Supported Methods of Duplicating an AIX System” document:

“Unsupported Methods

1. Using a bitwise copy of a rootvg disk to another disk.

This bitwise copy can be a one-time snapshot copy, such as flashcopy, from one disk to another, or a continuously-updating copy method, such as Metro Mirror.

While these methods will give you an exact duplicate of the installed AIX operating system, the copy of the OS may not be bootable. A typical scenario where this is tried is when one system is a production host and there is a desire to create a duplicate system at a disaster recovery site in a remote location.

2. Removing the rootvg disks from one system and inserting them into another.

This also applies to re-zoning SAN disks that contain the rootvg so another host can see them and attempt to boot from them.

Why don't these methods work?

The reason for this is that there are many objects in an AIX system that are unique to it; hardware location codes, World-Wide Port Names, partition identifiers, and Vital Product Data (VPD) to name a few. Most of these objects or identifiers are stored in the ODM and used by AIX commands.

If a disk containing the AIX rootvg in one system is copied bit-for-bit (or removed), then inserted in another system, the firmware in the second system will describe an entirely different device tree than the AIX ODM expects to find, because it is operating on different hardware. Devices that were previously seen will show as missing or removed, and the system will typically fail to boot with LED 554 (unknown boot disk).”

So, as a secondary objective, we have been working closely with our local IBM representatives to obtain some surety from IBM that our proposed DR strategy for AIX is fully supported by both the AIX development and support teams.

With that in mind, I’ll provide an overview of our new DR approach, in the hope that it offers others insight into an alternative method of recovery and also assists IBM in further understanding what some of the “larger” AIX customers are looking for in terms of simplified AIX disaster recovery.

What follows is a detailed description of our IBM AIX and PowerVM/Power Systems environment, the proposed recovery steps and other items for consideration.

- Please refer to the following table for a summary of the environment details.

Our Recovery Procedure:

1. Change the sys0 ghostdev attribute value to 1 on the source production AIX system. Set the "ghostdev" attribute using the chdev command (this must be done on the primary): "chdev -l sys0 -a ghostdev=1". The ghostdev attribute will delete the customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system.

2. Take note of the source system's rootvg hdisk PVID (the commands for steps 1, 2 and 8 are sketched after this list).

3. Select the source production rootvg LUN for replication on the HDS VSP.

4. Replicate the LUN from the production site to the DR HDS VSP.

5. In a DR test, suspend HDS replication from production to DR.

6. Assign the replicated LUN to the target LPAR on the DR POWER6 595, i.e. map the LUN to the WWPN of the virtual FC adapter on the DR LPAR.

7. Attempt to boot the DR LPAR using the replicated rootvg LUN. If necessary, enter the SMS menu to update the boot list, i.e. select the correct boot disk and check that it has the same PVID as the source host.

8. Once the LPAR has successfully booted, the AIX administrator would configure the necessary devices, i.e. import data volume groups, configure network interfaces, etc. This may also be scripted for execution during the first boot process.

9. Please refer to the following diagrams for a visual representation of the proposed process.
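To make the procedure a little more concrete, here is a rough sketch of the commands behind steps 1, 2 and 8. The disk, volume group and network values are placeholders only and will differ in your environment.

On the source production LPAR (steps 1 and 2):

# chdev -l sys0 -a ghostdev=1

# lspv | grep rootvg

On the DR LPAR, after it boots from the replicated LUN (step 8):

# importvg -y datavg01 hdisk2

# chdev -l en0 -a netaddr=10.1.1.10 -a netmask=255.255.255.0 -a state=up

# bootlist -m normal hdisk0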

Some Notes/Caveats:

The following is a list of items that we understand are possible limitations and issues with our new DR process.

- Booting from replicated rootvg disks may fail for several reasons, such as: a) there is unexpected corruption in the replicated LUN image due to rootvg not being quiesced during replication, or b) there is an unidentified issue with the AIX system that only becomes apparent the next time the system is booted; this could be misconfiguration by the administrator or some other unforeseen problem.

- In the event that an LPAR fails to boot via a replicated rootvg LUN, a backup method is available for recovery. Switching back to a manual NIM mksysb restore provides a sufficient fallback should the replicated rootvg be unusable (a hedged example of the NIM command appears after this list).

- If the "ghostdev" attribute is not set, then booting from the DR site will result in devices in the ODM showing up in a "Defined" or "Missing" state.

- Once a DR test is completed, the DR LPAR should be deactivated immediately so that SAN disk replication can be restarted between production and DR. Failure to perform this step may result in the DR LPAR failing as a result of file system corruption.

- At present we are using AIX MPIO only. There is discussion of using HDLM in the future. We will contact HDS for a support statement regarding booting from replicated rootvg LUNs with HDLM installed.

- The ghostdev attribute is not implemented in AIX 5.3. AIX 5.3 is no longer supported*.
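As noted in the list above, a standard NIM mksysb restore remains our fallback. A hedged sketch of the sort of command run on the NIM master is shown below; the mksysb, SPOT and client names are placeholders, and the exact attributes will depend on how your NIM environment is defined.

# nim -o bos_inst -a source=mksysb -a mksysb=lpar01_mksysb -a spot=spot_61TL6SP5 -a accept_licenses=yes lpar01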

So far all of our testing has been successful. We verified that we could replicate an SOE rootvg image of AIX 6.1 and 7.1 to DR and successfully boot an LPAR using the replicated disk. Based on these tests, there doesn’t appear to be anything stopping us from using this method for DR purposes. The following table outlines the different versions of AIX we tested and the results.

Once the system was booted, we needed to perform some post-boot configuration tasks. These tasks were handled by two scripts that were called from /etc/inittab. On the source system we installed the new scripts (in /etc) and added new entries to the /etc/inittab file. These scripts only run if the systemid matches that of the DR systemid. Note: only partial contents of each script are shown below…but you get the idea.

echo "$MYNAME: The systemid $LSATTR_SYSTEMID_DR does not match the expected DR systemid of $DR_SYSTEMID."

echo "$MYNAME: This script should only be executed at DR."

echo "$MYNAME: If you are not booting the system at the DR site, then you can ignore this message."

echo "$MYNAME: No changes have been performed. Script is exiting."

fi
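For completeness, here is a rough sketch of how such a first-boot script might hang together, based on the excerpt above. The script name, DR systemid value, volume group and disk names are all hypothetical; our real scripts contain considerably more logic and error handling.

#!/bin/ksh
# /etc/dr_firstboot.sh - hypothetical name; called once from /etc/inittab
MYNAME=$(basename $0)
DR_SYSTEMID="IBM,02XXXXXXX"            # placeholder serial of the DR managed system
LSATTR_SYSTEMID_DR=$(lsattr -El sys0 -a systemid | awk '{print $2}')

if [ "$LSATTR_SYSTEMID_DR" = "$DR_SYSTEMID" ]; then
    echo "$MYNAME: DR systemid detected. Running DR post-boot configuration."
    importvg -y datavg01 hdisk2        # example: import a data volume group
    # ...configure network interfaces, PowerHA health check disks, etc.
else
    echo "$MYNAME: This script should only be executed at DR. No changes have been performed."
fi

The matching /etc/inittab entry can be added with mkitab, for example:

# mkitab "drboot:2:once:/etc/dr_firstboot.sh > /dev/console 2>&1"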

The ghostdev attribute essentially provides us with a clean ODM and allows the system to discover new devices and build the ODM from scratch. If you attempt to boot from a replicated rootvg disk without first setting the ghostdev attribute, your system may fail to boot (hang at LED 554) because of a new device tree and/or missing devices. You might be able to recover from this situation (without restoring from mksysb) by performing the steps outlined on pages 16-20 of the following document (thanks to Dominic Lancaster at IBM for the presentation).