Note: Try this on a crash’n’burn system before unleashing its
fury on a real AIX system (i.e. one with users who depend on it!). Always
take a mksysb backup before performing this type of activity.

aixlpar1 : /tmp # ksh -x fixmyrootvg.ksh

+ + lslv -l hd5
+ grep hdisk
+ head -1
+ awk {print $1}
PV=hdisk0
+ VG=rootvg
+ lqueryvg -Lp hdisk0
+ awk { print $2 }
+ read LVname
+ odmdelete -q name = hd5 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd5 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd5 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd5 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd6 -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = hd6 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd6 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd6 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd8 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd8 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd8 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd8 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd4 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd4 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd4 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd4 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd2 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd2 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd2 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd2 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd9var -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd9var -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd9var -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd9var -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd3 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd3 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd3 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd3 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd1 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd1 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd1 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd1 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd10opt -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd10opt -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd10opt -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd10opt -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = local -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = local -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = local -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = local -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd7 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd7 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd7 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd7 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd11admin -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd11admin -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd11admin -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd11admin -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = rootvg -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q parent = rootvg -o CuDv
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q name = rootvg -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q name = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q dependency = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ [ rootvg = rootvg ]
+ odmdelete -q value1 = 10 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = rootvg -o CuDvDr
0518-307 odmdelete: 0 objects deleted.
+ importvg -y rootvg hdisk0
rootvg
0516-012 lvaryoffvg: Logical volume must be closed. If the logical
        volume contains a filesystem, the umount command will close
        the LV device.
0516-942 varyoffvg: Unable to vary off volume group rootvg.
+ varyonvg rootvg
+ synclvodm -Pv rootvg
synclvodm: Physical volume data updated.
synclvodm: Logical volume hd5 updated.
synclvodm: Logical volume hd6 updated.
synclvodm: Logical volume hd8 updated.
synclvodm: Logical volume hd4 updated.
synclvodm: Logical volume hd2 updated.
synclvodm: Logical volume hd9var updated.
synclvodm: Logical volume hd3 updated.
synclvodm: Logical volume hd1 updated.
synclvodm: Logical volume hd10opt updated.
synclvodm: Logical volume hd7 updated.
synclvodm: Logical volume hd11admin updated.
+ savebase

Hey Presto! The root volume group is now named rootvg, just the way we like it!

I updated my lab VIOS to the latest fix pack (V2.2.2.1) this week and thought I’d try the new VIOS part
command. This new command is an improved version of the existing vios_advisor tool. The major difference
between the two is that the new tool is included with the VIOS code
and will be updated via new VIOS fix packs. The following link has some
information on using the command:

This new tool “Provides performance reports with suggestions for making configurational changes to the environment, and helps to identify areas for further investigation. The reports are based on the key performance metrics of various partition resources that are collected from the Virtual I/O Server (VIOS) environment.” Just like the old VIOS advisor.

I ran the tool for 10 minutes on my idle VIOS, just to see how the new XML report looked.
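The collection itself is a one-liner from the padmin shell. Here's a hypothetical sketch of the workflow; the `part` command is stubbed with echo so the sequence can run off-host, and the `-i 10` interval (minutes) and the tar file name are assumptions rather than output from a real session, so check `part -?` on your VIOS level.

```shell
# Hypothetical VIOS advisor collection workflow. 'part' is stubbed so this
# runs anywhere; on a real VIOS you would run the actual command as padmin.
part() { echo "part: Reports are successfully generated in padmin_121221_10.tar"; }

out=$(part -i 10)      # collect 10 minutes of advisor + nmon data
echo "$out"
tarfile=${out##* }     # last word of the message, e.g. padmin_121221_10.tar
echo "next: scp $tarfile off the VIOS, extract it, open vios_advisor_report.xml"
```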

I then scp’ed the tar file to my laptop, extracted it and opened the vios_advisor_report.xml file. This is what the report looked like:

I was also able to open the nmon file using the nmon analyser tool. It produced typical nmon performance graphs as you’d expect.

So, not only does the new part tool run the VIOS advisor, it also captures nmon performance data at the same time.

This is rather impressive and a great move by IBM. The original VIOS advisor tool was free and of course not
officially supported by IBM (although the development team were very responsive
to requests from users of the tool). The new tool is fully supported by IBM
and as a result will only get better and better as time goes by. I’m not a
fan of the new command name, part (good luck trying to google it!). I still prefer vios_advisor,
but hey, what’s in a name, right?

Starting with AIX 6.1 TL08 and AIX 7.1 TL02 there’s a new AIX CPU tuning feature called “Scaled Throughput” mode. It is supported on POWER7 and POWER7+ processors
only (do not try this on POWER6!). This
new mode has the ability to dispatch workload to more SMT threads per VP,
avoiding the need to unfold additional VPs. I’ve heard it described as being more
“POWER6 like”. I’m not suggesting that you use this feature; this post simply
discusses what this new mode can do.

By default, AIX (on POWER7) operates in “Raw Throughput” mode. This mode provides the best performance per thread per core
and offers the best response times, but it utilises more cores (VPs) to process a
system’s workload. By comparison, “Scaled Throughput” provides a greater level of per-core throughput (processing) by dispatching
more SMT threads on a core, which has the effect of utilising fewer VPs/cores.
In this mode, more (or all) SMT threads per core will be utilised before dispatching
workload to other VPs/cores in the system.

The schedo tuning command can be used to enable the new mode via a new parameter called vpm_throughput_mode, e.g.

# schedo -p -o vpm_throughput_mode=X

This tunable can be set to one of the
following values:

0 = Legacy Raw mode (default).

1 = Scaled or “Enhanced Raw” mode with a higher threshold than legacy.

2 = Scaled mode, use primary and secondary SMT threads.

4 = Scaled mode, use all four SMT threads.

At this stage, this tunable is not
restricted, but if you plan on experimenting with it please be careful; make
sure you understand how this tuning may impact your system; always test new
tuning in a non-production environment first!

I performed a couple of quick tests today,
just to see what impact tuning the parameter would have on an AIX 7.1 TL2 system.

aixlpar1 : / # oslevel -s
7100-02-01-1245

aixlpar1 : / # lsconf | grep Mode
System Model: IBM,9119-FHB
Processor Implementation Mode: POWER 7

I started some CPU intensive workload.

Name   PID      CPU%  PgSp  Owner
ncpu   4718826  15.7  108K  db
ncpu   7143668  15.7  108K  db
ncpu   8126488  15.7  108K  db
ncpu   5832920  15.6  108K  db

The vpm_throughput_mode parameter was left at the default value (0).

# schedo -a | grep vpm_throughput_mode
vpm_throughput_mode = 0

As expected, the workload was evenly dispatched across the primary SMT threads of the 4 VPs assigned to the partition,
i.e. logical CPUs 0, 4, 8 and 12. None of the secondary or
tertiary SMT threads were active. This is the default mode and will provide the
greatest raw throughput (performance) per VP, as there’s no overhead associated with enabling secondary
or tertiary SMT threads.

Topas Monitor for host: aixlpar1    Fri Dec 21 11:06:34 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
  0   99.7    0.3    0.0    0.0   0.63
  1    0.2    0.5    0.0   99.3   0.12
  2    0.0    0.0    0.0  100.0   0.12
  3    0.0    0.0    0.0  100.0   0.12
  4  100.0    0.0    0.0    0.0   0.63
  5    0.0    0.0    0.0  100.0   0.12
  6    0.0    0.0    0.0  100.0   0.12
  7    0.0    0.0    0.0  100.0   0.12
  8  100.0    0.0    0.0    0.0   0.63
  9    0.0    0.0    0.0  100.0   0.12
 10    0.0    0.0    0.0  100.0   0.12
 11    0.0    0.0    0.0  100.0   0.12
 12  100.0    0.0    0.0    0.0   0.63
 13    0.0    0.0    0.0  100.0   0.12
 14    0.0    0.0    0.0  100.0   0.12
 15    0.0    0.0    0.0  100.0   0.12

Next, I enabled scaled throughput mode (2). The workload slowly migrated to logical CPUs 4, 5, 8 and 9. Now only two primary SMT threads were
active (lcpus 4 and 8) and two secondary threads were active (lcpus 5 and 9).
All the processing was being performed by fewer VPs (almost like POWER6).

# schedo -p -o vpm_throughput_mode=2

Topas Monitor for host: aixlpar1    Fri Dec 21 11:07:36 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
  0   14.1   60.3    0.0   25.6   0.00
  1    5.7   32.8    0.0   61.5   0.00
  2    0.0    1.6    0.0   98.4   0.00
  3    0.0    3.0    0.0   97.0   0.00
  4  100.0    0.0    0.0    0.0   0.47
  5  100.0    0.0    0.0    0.0   0.47
  6    0.0    0.0    0.0  100.0   0.03
  7    0.0    0.0    0.0  100.0   0.03
  8  100.0    0.0    0.0    0.0   0.47
  9  100.0    0.0    0.0    0.0   0.47
 10    0.0    0.0    0.0  100.0   0.03
 11    0.0    0.0    0.0  100.0   0.03
 12    0.0   50.6    0.0   49.4   0.00
 13    0.0   13.3    0.0   86.7   0.00
 14    0.0    0.8    0.0   99.2   0.00
 15    0.0    0.8    0.0   99.2   0.00

And finally, I tried scaled mode with all four SMT threads (4). All of the workload migrated to a single VP, but all 4
SMT threads were being utilised (primary SMT thread lcpu 0, secondary/tertiary
SMT threads 1, 2 & 3). This mode offers lower overall core consumption but
has the (possibly negative) side effect of enabling more SMT threads on a
single VP/core, which may not perform as well as the same workload evenly dispatched
to 4 individual VPs/cores (on primary SMT threads).

# schedo -p -o vpm_throughput_mode=4

Topas Monitor for host: aixlpar1    Fri Dec 21 11:08:30 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
  0   99.7    0.3    0.0    0.0   0.25
  1   99.9    0.1    0.0    0.0   0.25
  2  100.0    0.0    0.0    0.0   0.25
  3  100.0    0.0    0.0    0.0   0.25
  4    0.0   48.8    0.0   51.2   0.00
  5    0.0    5.5    0.0   94.5   0.00
  6    0.0    3.6    0.0   96.4   0.00
  7    0.0    3.3    0.0   96.7   0.00
  8    0.0   77.1    0.0   22.9   0.00
  9    0.0   52.9    0.0   47.1   0.00
 10    0.0   44.8    0.0   55.2   0.00
 11    0.0   38.5    0.0   61.5   0.00
 12    0.0   51.7    0.0   48.3   0.00
 13    0.0    8.9    0.0   91.1   0.00
 14    0.0    7.0    0.0   93.0   0.00
 15    0.0    7.4    0.0   92.6   0.00

For more information on “Scaled Throughput” mode, take a look at
the following presentation:

After
updating my NIM master to AIX 7.1 TL2 SP1 (7100-02-01-1245), I noticed a
problem. Whenever I installed a new AIX partition using NIM, the resources
allocated to the NIM client were not
being de-allocated, even though the installation was completing successfully. Also,
if I tried to run my usual ‘NIM client reset’ script (below), the resources
were still allocated.

#!/usr/bin/ksh
# Reset a NIM client.
if [[ "$1" = "" ]] ; then
        echo "Please specify a NIM client to reset e.g. aixlpar1."
else
        if lsnim -l "$1" > /dev/null 2>&1 ; then
                nim -o reset -F "$1"
                nim -Fo deallocate -a subclass=all "$1"
                nim -Fo change -a cpuid= "$1"
        else
                echo "Not a valid NIM client!?"
        fi
fi

For
example, here’s my NIM client with the lpp_source,
mksysb and SPOT resources assigned to it (even though the AIX install
completed OK).

root@nim1 : / # lsnim -l aixlpar1
aixlpar1:
   class           = machines
   type            = standalone
   connect         = shell
   platform        = chrp
   netboot_kernel  = 64
   if1             = network1 aixlpar1 0
   cable_type1     = N/A
   Cstate          = ready for a NIM operation
   prev_state      = not running
   Mstate          = currently running
   boot            = boot
   lpp_source      = lpp_sourceaix710105
   mksysb          = aixlpar1-71
   nim_script      = nim_script
   spot            = spotaix710105
   cpuid           = 00C453C75C00
   control         = master
   Cstate_result   = success
   installed_image = aixlpar1-71

My
workaround was to use 'smit nim_mac_res'
to manually de-allocate resources from
the client:

====

De-allocate Network Install Resources

aixlpar1                 machines      standalone
> lpp_sourceaix710105    lpp_source
> spotaix710105          spot
> aixlpar1-71            mksysb

====

It
appears that others were also experiencing this problem. I found the following
thread on the IBM developerWorks AIX user forum:

A colleague of mine was planning to modify the max_xfer_size attribute on a couple of
FC adapters in one of his AIX LPARs. As he was describing his plan to me, I
asked him how he intended to back out of the change should the LPAR fail to
boot after the modifications. “But, what could
possibly go wrong?” he fired back. I advised him to use multibos to create a standby (backup)
instance of the AIX OS, just in case. He begrudgingly did so, just to keep me
happy.

The next day he told me the following
tale.

He had modified the FC adapters’ max_xfer_size attribute as planned.
First, he checked the current values for the attribute on both adapters.

aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

He’d created a standby AIX instance before making changes to the adapters. He also prevented multibos from changing the bootlist to the standby boot logical volume (BLV).

Then he manually changed the LPAR’s boot list to include the standby BLV.

aixlpar1 : / # bootlist -m normal hdisk2 blv=hd5 hdisk2 blv=bos_hd5

aixlpar1 : / # bootlist -m normal -o
hdisk2 blv=hd5 pathid=0
hdisk2 blv=hd5 pathid=1
hdisk2 blv=bos_hd5 pathid=0
hdisk2 blv=bos_hd5 pathid=1

He carefully recorded the bootlist output, just in case the boot failed with the new max_xfer_size values.
He could then use the vdevice name and location to manually select the standby BLV and start the system in an emergency.

The cause of the 554 hang that followed appeared to be that the VIOS physical adapters needed their max_xfer_size value changed to the new value before the client LPAR’s virtual Fibre Channel adapters were modified.
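The lesson generalises into an order of operations: stage the change on the VIOS physical FC adapters first, then on the client LPAR. Here's a hypothetical sketch of that order; chdev and bosboot are stubbed with echo so the sequence can run off-host, and the adapter name and the 0x200000 value are illustrative assumptions, not values from the story above.

```shell
# Hypothetical order-of-operations sketch for a max_xfer_size change.
# Commands are stubbed (echoed) so this runs anywhere; on real systems
# step 1 runs on the VIOS and step 2 on the AIX client.
chdev()   { echo "chdev $*"; }
bosboot() { echo "bosboot $*"; }

log=$(
  # 1. VIOS first: stage the new size on the physical FC adapter.
  #    -P defers the change until the next boot/reconfiguration.
  chdev -l fcs0 -a max_xfer_size=0x200000 -P
  # 2. Only then stage the same change on the client LPAR's virtual FC adapter.
  chdev -l fcs0 -a max_xfer_size=0x200000 -P
  # 3. With a multibos standby BLV in place, refresh the primary boot image
  #    before rebooting, so there is a fallback if the LPAR hangs at LED 554.
  bosboot -ad /dev/ipldevice
)
echo "$log"
```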

In a perfect world, 99.9% of AIX administrators would prefer their systems to look like this:

# lspv | grep rootvg

hdisk0          00c342c68dfcbdfb                    rootvg          active

However, in reality, 99.9% of AIX administrators live with systems that look something like this:

# lspv | grep rootvg

hdisk39         00c342c68dfcbdfb                    rootvg          active

And 99.9% of them don’t have time to tidy up their systems so that rootvg resides on hdisk0.

Most of them have much bigger fish to fry, such as performance, virtualisation, automation, security, project delivery, TPS reports, etc!

If they did have time, they could use the mirrorvg and rendev commands to ‘bring order to the Universe’.

WARNING! Let me make this perfectly clear! The procedure that is shown below is NOT SUPPORTED by IBM. If you choose to follow these procedures, DO NOT contact IBM support for help. They will not be able to assist you. YOU HAVE BEEN WARNED!

Note: Disk drive devices that are members of the root volume group, or that will become members of the root volume group (by means of LVM or install procedures), must not be renamed. Renaming such disk drives may interfere with the ability to recover from certain scenarios, including boot failures. Some devices may have special requirements on their names in order for other devices or applications to use them. Using the rendev command to rename such a device may result in the device being unusable.

Note: To protect the configuration database, the rendev command cannot be interrupted once it has started. Trying to stop this command before completion could result in a corrupted database.
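For the brave (and suitably warned), the general shape of the tidy-up is a mirror-then-rename dance. Below is a hypothetical dry-run sketch of it: every command is echoed rather than executed, the disk names are illustrative (rootvg on hdisk39, a spare disk hdisk52, and the hdisk0 name assumed free), and on a real system each step needs care, a fresh mksysb and a maintenance window.

```shell
# Hypothetical (and, per the warning above, UNSUPPORTED) dry-run plan for
# moving rootvg off hdisk39 and renaming the new disk to hdisk0.
# 'run' echoes each command instead of executing it.
run() { echo "$@"; }

plan=$(
  run extendvg rootvg hdisk52        # add the spare disk to rootvg
  run mirrorvg rootvg hdisk52        # mirror all LVs onto it
  run bosboot -ad /dev/hdisk52       # make the new disk bootable
  run bootlist -m normal hdisk52
  run unmirrorvg rootvg hdisk39      # drop the copies on the old disk
  run reducevg rootvg hdisk39        # remove it from the VG
  run rendev -l hdisk52 -n hdisk0    # rename the new disk to hdisk0
  run bosboot -ad /dev/hdisk0        # refresh the boot image and bootlist
  run bootlist -m normal hdisk0
)
echo "$plan"
```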

In a previous post I discussed how you can identify some of the different types of PowerVM Capacity on Demand (CoD) activation keys from IBM.

Recently I had to activate Active Memory Expansion (AME) on a
couple of POWER7 systems. I discovered that all of the keys contained a similar
string. It appears that if a CoD key contains the string CA1F0000000800, then it is safe to assume it will activate
AME for a particular system, e.g.

9741EF3AE6969F17CA1F0000000800419D

937A1240F00F5B05CA1F0000000800413D
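That observation is easy to script. A small sketch that flags keys containing the marker string; note the marker is just an observation from the keys above, not a documented key format:

```shell
# Flag CoD activation keys that contain the substring observed in AME keys.
# The CA1F0000000800 marker is an empirical observation, not documented.
is_ame_key() {
    case "$1" in
        *CA1F0000000800*) return 0 ;;
        *)                return 1 ;;
    esac
}

# The first two keys are the ones shown above; the third is a made-up non-match.
is_ame_key 9741EF3AE6969F17CA1F0000000800419D && r1=yes || r1=no
is_ame_key 937A1240F00F5B05CA1F0000000800413D && r2=yes || r2=no
is_ame_key 0123456789ABCDEF0123456789ABCDEF01 && r3=yes || r3=no
echo "$r1 $r2 $r3"
```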

And while I’m talking about AME, I thought I’d share this
tip as well.

I was performing a demo of AME for my team and wanted to
change the AME expansion factor using DLPAR during the demo. I did not want to
use the HMC GUI but rather the HMC command line (as it’s faster).

To change the expansion factor for an LPAR (that’s enabled
for AME), you can use the chhwres
command from the HMC CLI.

During the demo I highlighted the current (running)
expansion factor for the LPAR (using the lshwres
command).
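For reference, the pair of HMC commands might look like the sketch below. The attribute and option names (`-o s`, `mem_expansion`, `curr_mem_expansion`) are assumptions from memory, so verify them against the chhwres/lshwres manpages on your HMC level; the stubs just let the sequence run off-host, and SYS/LPAR are placeholder names.

```shell
# Hypothetical HMC CLI sketch: view and dynamically change an LPAR's AME
# expansion factor. lshwres/chhwres are stubbed with echo so this runs
# off-host; all attribute names are assumptions - check 'man chhwres' first.
lshwres() { echo "curr_mem_expansion=1.2"; }
chhwres() { echo "chhwres $*"; }

# Show the current (running) expansion factor for the LPAR.
before=$(lshwres -m SYS -r mem --level lpar --filter lpar_names=LPAR \
         -F curr_mem_expansion)
# Change it via DLPAR.
change=$(chhwres -m SYS -r mem -o s -p LPAR -a mem_expansion=1.5)
echo "$before / $change"
```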

My team and I have recently been trying to streamline our AIX disaster recovery process. We’ve been
looking for ways to reduce our overall recovery time. Several ideas were tossed
around, such as a) using a standby DR LPAR with AIX already installed and using
rsync/scp to keep the Prod and DR LPARs in sync, and b) using alt_disk_copy
(with the -O flag for a device reset) to clone rootvg to an alternate disk
which is then replicated to DR. These methods may work but are cumbersome to
administer and (in the case of alt_disk_copy) require additional (permanent) resources
on every production system. With over 120 production instances of AIX, the disk
space requirements start to add up.

So far we’ve concluded that
the best way to achieve our goal is by using SAN replicated rootvg volumes at
our DR site.

Our current DR process relies
on recovery of AIX systems from mksysb images from a NIM master. All our data
(non-rootvg) LUNs are already replicated to our DR site. The aim was to change the
process and ‘recover’ our AIX images using replicated rootvg LUNs. This will
reduce our overall recovery time at DR (which is crucial if we are to meet the proposed
recovery time objectives set by our business). Based on current IBM
documentation we were relatively comfortable with the proposed approach. The
following IBM developerWorks article (originally
published in 2009 and updated in late 2010) describes “scenarios in which remapping, copying, and reuse of SAN disks is
allowed and supported. More easily switch AIX environments from one system to
another and help achieve higher availability and reduced down time. These
scenarios also allow for fast deployment of new systems using cloning.”

The document focuses on fully virtualised environments
that utilise shared processors and VIO servers. One area where this document is
currently lacking in information is the use of NPIV and virtual fibre channel
adapters in a DR scenario. We reached out to our contacts in the AIX
development space and asked the following question:

“Hoping you can help us find some statements regarding support
for a replicated rootvg environment using NPIV/Virtual Fibre Channel adapters?
The following IBM developerWorks article discusses VSCSI and we are looking for
something similar for NPIV. http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html
My guess is that restrictions similar to those for physical FC adapters will
apply here? But I'm hoping that given the adapters are virtual the limitations
may be relaxed.
Are you aware of any statement regarding support (or not) for booting from
another system using a disk subsystem image of rootvg replicated to another
disk subsystem when using NPIV? And what, if any, additional requirements/restrictions
may apply when using NPIV?”

We received the following
responses:

“There are some additional considerations when using NPIV for booting from a replicated rootvg.
With NPIV the client partition has virtual Fibre Channel adapter ports,
but has physical access to the actual (physical) disk devices. There may
be an increased chance of needing to update the boot list via the Open Firmware
SMS menu. Since the clients have access to the actual disks, you have the
possibility of running multipathing software besides AIX MPIO. If you
are using multipathing software other than AIX MPIO to manage the NPIV-attached disks, then you should contact the vendor that provided the software to check
their support statement.

Since one or more of the physical devices will change when booting from an NPIV
replicated rootvg, it is recommended to set the ghostdev attribute. The
ghostdev attribute will trigger when AIX detects the image is booting from
either a different partition or server. The ghostdev attribute should not
trigger during LPM (Live Partition Mobility) operations. Once triggered,
ghostdev will clear the customized ODM database. This will cause detected
devices to be discovered as new devices (with default settings), and avoids the
issue of missing/stale device entries in ODM. Since ghostdev clears
the entire customized ODM database, this will require you to import your data
(non-rootvg) volume groups again and perform any (device) attribute
customization. To set ghostdev, run "chdev -l sys0 -a ghostdev=1". Ghostdev
must be set before the rootvg is replicated.

As with virtual devices, the client partition is booting an existing rootvg where
the hardware may be different. It’s possible some applications have a
dependency on tracking the actual physical devices (instead of the data on the
disks). For example, PowerHA may keep track of a disk for cluster
health checks. If you do have applications that have a dependency
on tracking physical devices, then additional setup (of those applications) may
be required after the first boot from the replicated rootvg.

We do have multiple customers using NPIV for such scenarios. I believe most of them
worked with IBM Lab Based Services to assist with implementing such a
configuration, and some of the customers required custom scripts to
further customize their systems after booting from the replicated rootvg.
Those customers set the ghostdev attribute, and had custom scripts to
import their data (non-rootvg) volume groups and update PowerHA to point to
the new health check disk.

You should get support for such an NPIV setup with IBM as long as you follow the
considerations listed in the white paper.”

“Development has approved using NPIV to do this for one customer. Below are more detailed requirements for this
DR strategy using NPIV. If a Disaster Recovery (DR) environment not using
PowerHA Enterprise Edition is used, then we believe the white paper located at
http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html provides the guidelines in regards to setup, prerequisites,
as well as the limitations of such a DR deployment. Deployments as detailed in the white paper
are supported by IBM. However, note that such a deployment places many manual responsibilities on the customer
to set up and maintain such an environment. IBM expects that the customer
carefully manage these manual steps without any mistakes. The white paper does not currently cover
using NPIV in such a DR scenario.

We have the following guidelines regarding the configuration, which includes NPIV as an option: All of the
system configuration should be virtualized, with the possible exception of disk
devices when using NPIV. If NPIV is used, then AIX MPIO must be used as the multi-pathing solution. If multi-pathing software besides AIX
MPIO is used, then the vendor of that software must be contacted regarding a support
statement. Install AIX (at least the minimum required TLs/SPs for the desired AIX
version) and the software stack (middleware and applications) on the primary systems,
compatible with the systems at both sites. Primary and secondary sites
should be using systems with similar hardware, the same microcode levels, and the same
VIOS levels.

Many manual steps are needed to set up the virtual and physical devices accurately on the secondary site VIOS. If Virtual SCSI disks are being used, then
discover the unique identification on the primary site and map the disks to the
corresponding replication disks. Map the same appropriately on the VIOS on the
secondary site. The level of VIOS should support the attribute to open the secondary devices
passively. This setting needs to be set up correctly on the VIOS on the secondary
site. The operating environment should not have subnet dependencies. Manage the
replication relationships accurately. You may need to manually switch the secondary
disks to the primary node. Raw disk usage may cause problems. Some middleware
products may bypass the operating system and use the disk directly. They might have
their own restrictions for this environment (e.g. anything that is device location code or storage LUN unique ID dependent
may have issues when the cloned image is restarted on the secondary system with
replicated storage).

Set the "ghostdev" attribute using the chdev command (must be done on the primary). This attribute can
be set using the command "chdev -l sys0 -a ghostdev=1". The ghostdev
attribute will delete the customized ODM database on rootvg when AIX detects it has
booted from a different LPAR or system. If the "ghostdev" attribute
is not set, then booting from the alternate site will result in devices in
ODM showing up in "Defined" or "Missing" state.

After a failover to the secondary site, you may need to reset the boot device list for each LPAR, using the firmware SMS menus, before booting
the LPAR. Note that this is not an exhaustive list of issues. Refer to the white paper and study how it applies to
your environment. So as long as they are using MPIO, you’re OK. If not using MPIO and some OEM storage, then
the storage vendor must also support it.”

While these responses indicate that this form of
recovery is supported by IBM, we were still looking to IBM for clarity on the
support position. It has been noted that other IBM customers have had mixed
responses when contacting AIX support for feedback and assistance with this
type of DR procedure. And it’s not hard to see why when you read statements
like this from the “Supported Methods of Duplicating an AIX
System” document:

“Unsupported
Methods

1. Using a bitwise copy of a rootvg disk
to another disk.

This bitwise copy can be a one-time snapshot copy such as flashcopy, from one
disk to another, or a continuously-updating copy method, such as Metro Mirror.

While these methods will give you an exact duplicate of the installed AIX
operating system, the copy of the OS may not be bootable. A typical scenario
where this is tried is when one system is a production host and there is a
desire to create a duplicate system at a disaster recovery site in a remote
location.

2. Removing the rootvg disks from one
system and inserting into another.

This also applies to re-zoning SAN disks that contain the rootvg so another
host can see them and attempt to boot from them.

Why don't these methods work?

The reason for this is there are many objects in an AIX system that are unique
to it; Hardware location codes, World-Wide Port Names, partition identifiers,
and Vital Product Data (VPD) to name a few. Most of these objects or
identifiers are stored in the ODM and used by AIX commands.

If a disk containing the AIX rootvg in one system is copied bit-for-bit (or
removed), then inserted in another system, the firmware in the second system
will describe an entirely different device tree than the AIX ODM expects to
find, because it is operating on different hardware. Devices that were
previously seen will show as missing or removed, and the system will
typically fail to boot with LED 554 (unknown boot disk).”

So, as a secondary objective,
we have been working closely with our local IBM representatives to obtain some
surety from IBM that our proposed DR strategy for AIX is fully supported by
both the AIX development and support teams.

With that in mind, I’ll provide an overview of our new DR approach, in the hope that it offers others insight
into an alternative method of recovery and also helps IBM further
understand what some of the “larger” AIX customers are looking for in terms
of simplified AIX disaster recovery.

What follows is a detailed
description of our IBM AIX, PowerVM/Power Systems environment, the proposed
recovery steps and other items for consideration.

- Please refer to the following table for a summary of the environment details.

Our Recovery Procedure:

1. Change the sys0 ghostdev attribute value to 1 on the source production AIX system
(must be done on the primary): "chdev -l sys0 -a ghostdev=1". The
ghostdev attribute will delete the customized ODM database on rootvg when AIX
detects it has booted from a different LPAR or system.

2. Take note of the source system’s rootvg hdisk PVID.

3. Select the source production rootvg LUN for replication on the HDS VSP.

4. Replicate the LUN from the production site to the DR HDS VSP.

5. In a DR test, suspend HDS replication from production to DR.

6. Assign the replicated LUN to the target LPAR on the DR POWER6 595, i.e. map the LUN to the WWPN of the virtual FC adapter on the DR LPAR.

7. Attempt to boot the DR LPAR using the replicated rootvg LUN. If necessary, enter the SMS menu to update the boot list, i.e. select the correct boot disk and check for the same PVID as the source host.

8. Once the LPAR has successfully booted, the AIX administrator would configure the necessary devices, i.e. import data volume groups, configure network interfaces, etc. This may also be scripted for execution during the first boot process.

9.- Please refer
to the following diagrams for a visual representation of the proposed process.
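Steps 1 and 2 can be sketched in shell as follows. This is a minimal illustration, not our production script: the disk names are hypothetical, and the chdev line is shown as a comment because it should only ever be run on the source production system.

```shell
#!/bin/sh
# Step 1 (source production system only): enable ghostdev so that a boot
# on different hardware rebuilds the ODM from scratch.
#   chdev -l sys0 -a ghostdev=1

# Step 2: record the rootvg hdisk and its PVID. lspv prints lines like:
#   hdisk0  00c342c637f21a59  rootvg  active
rootvg_pvid() {
    awk '$3 == "rootvg" { print $1, $2 }'
}

# On a live system you would run:  lspv | rootvg_pvid
```

Keeping the PVID on record makes step 7 (checking the boot disk in SMS at DR) much quicker.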

Some Notes/Caveats:

The following is a list of
items that we understand are possible limitations and issues with our new DR
process.

· Booting from replicated rootvg disks may fail for several reasons: a) there is
unexpected corruption in the replicated LUN image because rootvg was not
quiesced during replication, or b) there is an unidentified issue with the AIX
system that only becomes apparent the next time the system is booted; this
could be misconfiguration by the administrator or some other unforeseen
problem.

· In the event that an LPAR fails to boot via a replicated rootvg LUN, a backup
method is available for recovery. Falling back to a manual NIM mksysb restore
provides a sufficient backup should the replicated rootvg be unusable.

· If the "ghostdev" attribute is not set, then booting from the DR site will
result in devices in the ODM showing up in a "Defined" or "Missing" state.

· Once a DR test is completed, the DR LPAR should be deactivated immediately so
that SAN disk replication can be restarted between production and DR. Failure
to perform this step may result in the DR LPAR failing as a result of file
system corruption.

· At present we are using AIX MPIO only. There is discussion of using HDLM in
the future. We will contact HDS for a support statement regarding booting from
replicated rootvg LUNs with HDLM installed.

· The ghostdev attribute is not implemented in AIX 5.3. AIX 5.3 is no longer
supported*.

So far all of our testing has
been successful. We verified that we could replicate an SOE rootvg image of AIX
6.1 and 7.1 to DR and successfully boot an LPAR using the replicated disk.
Based on these tests there doesn’t appear to be anything stopping us from using
this method for DR purposes. The following table outlines the different
versions of AIX we tested and the results.

Once the system was booted we
needed to perform some post-boot configuration tasks. These tasks were handled
by two scripts that were called from /etc/inittab. On the source system we
installed the new scripts (in /etc) and added new entries to the /etc/inittab
file. These scripts only run if the systemid matches that of the DR systemid.
Note: only partial contents of each script are shown below…but you get the idea.

echo "$MYNAME: The systemid
$LSATTR_SYSTEMID_DR does not match the expected DR systemid of
$DR_SYSTEMID."

echo "$MYNAME: This script should only
be executed at DR."

echo "$MYNAME: If you are not booting
the system at the DR site, then you can ignore this message."

echo "$MYNAME: No changes have been
performed. Script is exiting."

fi

The ghostdev attribute essentially provides us with a clean ODM
and allows the system to discover new devices and build the ODM from scratch.
If you attempt to boot from a replicated rootvg disk without first setting the ghostdev attribute, your system may fail to boot (hang at LED 554)
because of a new device tree and/or missing devices. You might be able to recover from this
situation (without restoring from mksysb) by performing the steps outlined on
pages 16-20 of the following document (thanks to Dominic Lancaster at IBM for
the presentation).

I received the following
question from an AIX administrator in Germany.

“Hi Chris,

on your blog, you explain how to find out the active value of
num_cmd_elems for an FC adapter by using kdb, so you can decide whether the
value shown by lsattr is active or not ...

I wonder if you can find out the values of fc_err_recov and dyntrk for the
fscsiX device?

# lsattr -El fscsi0

attach        switch        How this adapter is CONNECTED           False
dyntrk        yes           Dynamic Tracking of FC Devices          True
fc_err_recov  delayed_fail  FC Fabric Event Error RECOVERY Policy   True
scsi_id       0x1021f       Adapter SCSI ID                         False
sw_fc_class   3             FC Class for Fabric                     True

I try to use echo efscsi fscsi0 | kdb .. but I can't figure it out..

Can you help me please?”

I did a little research on his behalf
and came up with an answer. However, I’m not at all surprised he had trouble
finding the right information. It's not easy, clear or documented!

I received the following information
from my IBM AIX contacts.

“The following relies on internal structures that are subject to
change.

The procedure was tested on 6100-06, 6100-07, and 7100-01. I don't
have a lab system with physical HBAs and 5.3 at the moment.

Hopefully the same steps should work for 5.3. You may need to
first run efscsi without arguments to load the kdb module before running efscsi
fscsiX.

# kdb

(0)> efscsi fscsi1 | grep efscsi_ddi

struct efscsi_ddi ddi = 0xF1000A060084A080

(0)> dd 0xF1000A060084A080+20 2

F1000A060084A0A0: 0101020202010200 000000B400000028 ...............(

FFDD NNNNNNNN

FF = fc_error_recov: 01=delayed_fail
02=fast_fail

DD = dyntrk: 00=disabled 01=enabled

NNNN=num_cmd_elems - 20 (20 reserved)

e.g. 200 - 20 = 180 = B4

So in
this example, fc_err_recov is set to fast_fail (02), dyntrk is set to yes (01)
and num_cmd_elems is set to 200.“
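The decode arithmetic is easy to get wrong in your head, so here is a small helper (my own, not part of kdb) that converts the NNNN hex value back to the num_cmd_elems setting:

```shell
#!/bin/sh
# Convert the NNNN field from the kdb dump back to num_cmd_elems.
# kdb stores num_cmd_elems minus the 20 reserved elements.
decode_num_cmd_elems() {
    printf '%d\n' $(( 0x$1 + 20 ))
}

decode_num_cmd_elems B4    # 0xB4 = 180, plus 20 reserved = 200
decode_num_cmd_elems 1E0   # 0x1E0 = 480, plus 20 reserved = 500
```

This matches the worked example in IBM’s note (B4 decodes to 200).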

I tested this on a lab system
running AIX 6.1 TL6 and AIX 7.1 TL1. I started with an FC adapter with dyntrk
disabled (set to no), fc_err_recov set to delayed_fail, and num_cmd_elems set
to 500.

# lsattr -El fscsi1

attach        none          How this adapter is CONNECTED           False
dyntrk        no            Dynamic Tracking of FC Devices          True
fc_err_recov  delayed_fail  FC Fabric Event Error RECOVERY Policy   True
scsi_id                     Adapter SCSI ID                         False
sw_fc_class   3             FC Class for Fabric                     True

# lsattr -El fcs1 -a num_cmd_elems

num_cmd_elems 500 Maximum number of COMMANDS to queue to the adapter True
True

# kdb

(0)> efscsi fscsi1 | grep efscsi_ddi

struct efscsi_ddi ddi = 0xF1000A060096E080

(0)> dd 0xF1000A060096E080+20 2

F1000A060096E0A0: 0101020201000100 000001E000000028 ...............(

FFDD NNNNNNNN

OK, let’s break it down. From the kdb output we can determine the
following:

· fc_error_recov is currently set to delayed_fail (FF = 01).

· dyntrk is currently disabled (DD = 00).

· num_cmd_elems is set to 500 (NNNNNNNN = 1E0 hex = 480, plus the 20 reserved
elements).

The same technique can be used to read max_xfer_size; for example, a value of
100000 hex corresponds to 1048576.

The max_xfer_size
for VFC is tricky because it is contained in a structure that can and does
change between SPs and TLs. In 6100-06-01, max_xfer_size is offset
3932 bytes into the structure, so we get the value like this:

Perhaps the easiest way to handle
changes between versions is to use the fact that max_xfer_size immediately
follows num_cmd_elems, which is very unlikely to change. So, knowing that
the structure size does not change by very much, you can grep in the general area:

Attention: just a note about max_xfer_size
and virtual FC adapters. In my experience, if the values for this attribute on
the VIO client do not match those on
the VIO server, then you will have trouble configuring the virtual FC adapters.
Possible side effects may include your system never booting again!

So if I change the value to
0x200000 on the client, without mirroring this value on the VIO server, I may encounter
the following effects:

# rmdev -Rl fcs1

sfwcomm1 Defined
fscsi1 Defined
fcnet1 Defined
fcs1 Defined

# chdev -l fcs1 -a max_xfer_size=0x200000

fcs1 changed

The cfgmgr command will report errors for the FC adapter.

# cfgmgr

Method error (/usr/lib/methods/cfgefscsi -l fscsi1 ):
    0514-061 Cannot find a child device.

Method error (/usr/lib/methods/cfgstorfworkcom -l sfwcomm1 ):
    0514-040 Error initializing a device into the kernel.

Errors similar to the following may appear in the AIX error report.

# errpt | grep fcs

0E0C5B31   0726123812   U S fcs1   Undefined error
8C9E9221   0726123812   I S fcs1   Informational message

You’ll observe messages in
the error report that claim a request from the client was rejected by the VIOS.

If you encounter this
problem, restore the client’s FC adapter attributes to their previous values
before restarting the system. If you don’t, your LPAR may no longer boot
and may hang on LED 554. Change your VIOS first, then update your VIO clients.

Until recently, if you were
configuring a new LPAR with virtual FC adapters you couldn’t force it to log
into the SAN before an operating system (such as AIX) was installed. I’ve
written about this before (see link below). I also offered a way to work around
this issue.

I’ve successfully used this method
on both POWER6 (595) and POWER7 (795) systems. After configuring a new LPAR
profile with a single VFC adapter, the VIOS reported that the client was not
logged into the SAN:

Here’s an example of converting
rootvg file systems from JFS to JFS2 using alt_disk_copy.

My lab system was migrated from AIX
5.3 to 7.1 via nimadm. Unfortunately, nimadm does not convert JFS file
systems to JFS2 during the migration. So, in this case, even though I’ve
migrated to AIX 7.1 (which is a good thing) I’m still left with legacy JFS file
systems in rootvg.
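A quick way to spot the leftover JFS file systems is to filter the lsvg output. The helper below is a sketch of my own; it assumes the standard `lsvg -l` column layout (LV name, type, LPs, PPs, PVs, LV state, mount point):

```shell
#!/bin/sh
# List logical volumes still using legacy JFS (type jfs or jfslog),
# printing the LV name and mount point.
# Usage on AIX:  lsvg -l rootvg | legacy_jfs_lvs
legacy_jfs_lvs() {
    awk '$2 == "jfs" || $2 == "jfslog" { print $1, $7 }'
}
```

If this prints nothing, the volume group is already all JFS2 and there is nothing to convert.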

And because the AIX 5.3 version of
the alt_disk_copy command does not
have the -T option, I can’t convert my JFS file systems to JFS2 before I
migrate to AIX 7.1. So my best option is to migrate to AIX 7.1 and then convert
rootvg to JFS2 file systems. A few hops in the process, but it’s good enough.

aixlpar1 : /
# lsvg -l rootvg

rootvg:
LV NAME      TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5          boot     1    1    1    closed/syncd  N/A
hd6          paging   32   32   1    open/syncd    N/A
hd8          jfslog   1    1    1    open/syncd    N/A
hd4          jfs      4    4    1    open/syncd    /
hd2          jfs      29   29   1    open/syncd    /usr
hd9var       jfs      20   20   1    open/syncd    /var
hd3          jfs      164  164  1    open/syncd    /tmp
hd1          jfs      4    4    1    open/syncd    /home
hd10opt      jfs      4    4    1    open/syncd    /opt
local        jfs      4    4    1    open/syncd    /usr/local
loglv        jfs      4    4    1    open/syncd    /var/log
hd7          sysdump  9    9    1    open/syncd    N/A
hd71         sysdump  9    9    1    open/syncd    N/A
hd11admin    jfs      2    2    1    open/syncd    /admin

aixlpar1 : /
# oslevel -s

7100-01-01-1141

I clone rootvg to a spare disk using
alt_disk_copy with the -T flag, which converts the file systems to JFS2
during the clone, as shown in the output below.

aixlpar1 : /
# alt_disk_copy -d hdisk1 -T

Source boot disk is: hdisk2

jfs2j2: Current data file /image.data moved to /image.data.acct.save.3735802.

Before I reboot the system on the
alternate rootvg, I verify that the cloned volume group now contains JFS2 file
systems only. I “wake up” the altinst_rootvg and run the lsvg command to
confirm the file systems are correct. I then put altinst_rootvg to “sleep”,
reboot the system, and verify all rootvg file systems are mounted as jfs2.

aixlpar1 : /
# alt_rootvg_op -W -d hdisk1

Waking up altinst_rootvg volume group ...

aixlpar1 : /
# lsvg -l altinst_rootvg

altinst_rootvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
alt_hd5        boot     1    1    1    closed/syncd  N/A
alt_hd6        paging   32   32   1    closed/syncd  N/A
alt_hd8        jfs2log  1    1    1    open/syncd    N/A
alt_hd4        jfs2     4    4    1    open/syncd    /alt_inst
alt_hd2        jfs2     29   29   1    open/syncd    /alt_inst/usr
alt_hd9var     jfs2     20   20   1    open/syncd    /alt_inst/var
alt_hd3        jfs2     164  164  1    open/syncd    /alt_inst/tmp
alt_hd1        jfs2     4    4    1    open/syncd    /alt_inst/home
alt_hd10opt    jfs2     4    4    1    open/syncd    /alt_inst/opt
alt_local      jfs2     4    4    1    open/syncd    /alt_inst/usr/local
alt_loglv      jfs2     4    4    1    open/syncd    /alt_inst/var/log
alt_hd7        sysdump  9    9    1    closed/syncd  N/A
alt_hd71       sysdump  9    9    1    closed/syncd  N/A
alt_hd11admin  jfs2     2    2    1    open/syncd    /alt_inst/admin

aixlpar1 : /
# alt_rootvg_op -S altinst_rootvg

Putting volume group altinst_rootvg to sleep ...
forced unmount of /alt_inst/var/log
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr/local
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst/admin
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...

aixlpar1 : /
#

; Reboot on the alternate rootvg hdisk

aixlpar1 : /
# uptime

10:18AM   up 1 min,  1 user,  load average: 0.32, 0.09, 0.03

aixlpar1 : /
# lspv

hdisk1   00c342c637f21a59   rootvg       active
hdisk2   00c342c6161c6b47   old_rootvg

aixlpar1 : /
# df

Filesystem      512-blocks      Free  %Used  Iused  %Iused  Mounted on
/dev/hd4            524288    411568    22%   3040      7%  /
/dev/hd2           3801088    537176    86%  34089     35%  /usr
/dev/hd9var        2621440   2413520     8%   3554      2%  /var
/dev/hd3          21495808  21476312     1%    110      1%  /tmp
/dev/hd1            524288    522456     1%     90      1%  /home
/proc                    -         -      -      -       -  /proc
/dev/hd10opt        524288    266560    50%   5133     15%  /opt
/dev/local          524288    495320     6%    249      1%  /usr/local
/dev/loglv          524288    522040     1%     49      1%  /var/log
/dev/hd11admin      262144    261384     1%      7      1%  /admin

aixlpar1 : /
# lsvg -l rootvg

rootvg:
LV NAME      TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5          boot     1    1    1    closed/syncd  N/A
hd6          paging   32   32   1    open/syncd    N/A
hd8          jfs2log  1    1    1    open/syncd    N/A
hd4          jfs2     4    4    1    open/syncd    /
hd2          jfs2     29   29   1    open/syncd    /usr
hd9var       jfs2     20   20   1    open/syncd    /var
hd3          jfs2     164  164  1    open/syncd    /tmp
hd1          jfs2     4    4    1    open/syncd    /home
hd10opt      jfs2     4    4    1    open/syncd    /opt
local        jfs2     4    4    1    open/syncd    /usr/local
loglv        jfs2     4    4    1    open/syncd    /var/log
hd7          sysdump  9    9    1    open/syncd    N/A
hd71         sysdump  9    9    1    open/syncd    N/A
hd11admin    jfs2     2    2    1    open/syncd    /admin

===

The message “filesystem not converted” (below)
is not related to the JFS to JFS2 conversion. This message refers to whether or
not the file system needs to be changed to use Variable Inode Extents (VIX),
which is the default setting for JFS2 file systems.

Why would I want to convert rootvg
to JFS2 anyway? Well, for starters, it’s generally considered best practice to
use JFS2 as it offers several performance and scalability enhancements over
JFS. For example, you cannot create files greater than 2GB on JFS unless the
file system was created as “large (big) file” enabled, and the JFS file systems
in rootvg were never created as large-file enabled.

Another reason: eventually, JFS will be retired.

Here’s an example of a potential
problem with JFS in rootvg. You try to create a file of a size greater than 2GB
in /tmp (type jfs). Even though the ulimit settings are not restricting the
creation of a file of this size, the JFS file system will not allow it. The
file creation process fails. The bf attribute
for the /tmp file system is set to false.
This indicates the file system is not “large file” enabled.

I’ve shared
my tips for resolving DLPAR problems in the past. So this week, when one of my
colleagues was experiencing an issue with DLPAR, I referred him to my blog post
and suggested he follow the troubleshooting steps. He did so and I went about
my business. Later that same day I asked him how he had fared. He told me that
DLPAR was still not working on his particular AIX LPAR. It was an AIX 5.3
system and he was attempting to add another Virtual Processor to the LPAR. He
expressed his frustration with the situation, so I offered to take a look for
him.

What I
found was that the system was missing an important fileset, one that enables
DLPAR operations on AIX 5.3 systems. The fileset in question was named
csm.client. Without this fileset installed, DLPAR would never work.

I advised
my colleague of the problem and suggested he follow the steps below to resolve
the issue. After he reinstalled the fileset, RMC communication between the HMC
and LPAR was restored and his DLPAR processor add operation completed without
issue.

1. Mount the NIM master's lpp_source file system:

aix53lpar1
: / # mount nim1:/export/lpp_source /mnt

2. Verify that the CSM filesets are not installed and that the IBM.DRM
subsystem is either inoperative or missing.
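To make the check quick to repeat, something like the following can be used. It is a sketch that assumes the usual `lssrc -a` column layout (subsystem, group, PID, status), with the status always in the last column:

```shell
#!/bin/sh
# Report the IBM.DRM subsystem status from `lssrc -a` output,
# or "missing" when the subsystem is not defined at all.
# Usage on AIX:  lssrc -a | drm_status
# If csm.client turns out to be missing, it can be reinstalled from the
# mounted lpp_source, e.g.:  installp -agXd /mnt csm.client
drm_status() {
    awk '$1 == "IBM.DRM" { print $NF; found = 1 }
         END { if (!found) print "missing" }'
}
```

On a healthy LPAR this should print "active"; "inoperative" or "missing" points at the RMC/DLPAR problem described above.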

I’ve received a couple of requests for an example of using a
post migration script with nimadm.
What follows is a simple example of using such a resource with NIM. If you are
not familiar with the nimadm tool then
perhaps you’d like to start first by reading my article on using nimadm
to migrate to AIX 6.1.

The nimadm utility
can perform both pre and post migration tasks. This is accomplished by running
NIM scripts either before or after a migration. The tool accepts the following
flags for pre and post migration script resources:

pre-migration

This
script resource is run on the NIM master, but in the environment of the
client's alt_inst file system that is mounted on the master (this is done by
using the chroot command). This
script is run before the migration begins.

post-migration

This
script resource is similar to the pre-migration script, but it is executed
after the migration is complete.

We are going to focus on post-migration only, although the
configuration is the same for both.

In this example I need to uninstall and install a 3rd
party device fileset for a storage device. I need to perform this task as part
of the migration process. To protect the innocent, I have not named the storage
vendor in this post. But I will say that it was not IBM storage we are dealing
with in this case.

Before we start, we collect all the necessary device
filesets that provide support for this type of storage on AIX and place them
in a local directory on the NIM master. Along with the software, I also place
a copy of my NIM script in the same directory. The script name is XYZpost.ksh.

root@nim1 : /usr/local/XYZ # ls -ltr

total 544
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1001I
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1002U
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1003U
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1004U
-r-xr-xr-x  1 root  system  51200 May 18 16:39  MPIO_1005U
-r-xr-xr-x  1 root  system    715 May 24 16:57  XYZpost.ksh
-rw-r--r--  1 root  system   2310 May 25 14:57  .toc

The contents of my script are simple. This script will
de-install the old device fileset and then immediately install the latest
version of the VendorXYZ’s device fileset. The script will then change the attributes
for the vendor’s storage to more appropriate default values.
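Since the vendor is anonymous, here is only a hypothetical sketch of what XYZpost.ksh might look like: the fileset name is invented, and the commands are wrapped in a dry-run helper so the sketch can be exercised safely (swap `run`'s body for `"$@"` in a real script).

```shell
#!/bin/sh
# Hypothetical XYZpost.ksh sketch. XYZ.mpio.rte is an invented fileset name.
FILESET="XYZ.mpio.rte"
SRCDIR="/usr/local/XYZ"

run() { echo "+ $*"; }   # dry-run wrapper; replace body with "$@" to execute

run installp -u "$FILESET"               # de-install the old device fileset
run installp -agXd "$SRCDIR" "$FILESET"  # install the latest version
# ...followed by chdev/ODM changes to set the vendor's recommended defaults
```

The real script also needs the hashbang line discussed below, or nimadm will refuse to run it.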

At this point I copy the same directory and all of its contents
to the NIM client.

root@nim1 : /usr/local # scp -pr XYZ lparaix01:/usr/local/

…etc…

lparaix01 : /usr/local/XYZ # ls -ltr

total 0
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1001I
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1002U
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1003U
-r-xr-xr-x  1 root  system  51200 Mar 11 2011   MPIO_1004U
-r-xr-xr-x  1 root  system  51200 May 18 16:39  MPIO_1005U
-r-xr-xr-x  1 root  system    715 May 24 16:57  XYZpost.ksh
-rw-r--r--  1 root  system   2310 May 25 14:57  .toc

Make sure that any scripts you write for use with nimadm start with an
appropriate ‘hashbang’ line to announce that it is a shell script and which
shell must be used to execute it, e.g. #!/usr/bin/ksh.
If you forget to do this, nimadm will
fail to execute your script and will report an error message similar to the
following:

/lparaix01_alt/alt_inst/tmp/.alt_mig_chroot_script.11731036: Cannot run a file that does not have a valid format.

The next step is to define the script as a NIM resource so that nimadm can call the resource during the
migration process. I’ve decided to call this new NIM resource, XYZPOST.

This is easily achieved using smit nim_mkres:

root@nim1 : / # smit nim_mkres

  script = an executable file which is executed on a client

Define a Resource

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                        [Entry Fields]
* Resource Name                         [XYZPOST]
* Resource Type                         script
* Server of Resource                    [master]                        +
* Location of Resource                  [/usr/local/XYZ/XYZpost.ksh]    /
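If you prefer the command line to smit, the same resource can be defined with the nim command directly, using the same names as the smit screen:

```shell
# Define the script resource from the command line instead of smit.
nim -o define -t script -a server=master \
    -a location=/usr/local/XYZ/XYZpost.ksh XYZPOST
```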

We can confirm that the NIM script resource is now available
using the lsnim command.

root@nim1 : / # lsnim -t script

XYZPOST   resources   script

root@nim1 : / # lsnim -l XYZPOST

XYZPOST:
   class       = resources
   type        = script
   Rstate      = ready for use
   prev_state  = unavailable for use
   location    = /usr/local/XYZ/XYZpost.ksh
   alloc_count = 0
   server      = master

Now that the script is in place, and defined to NIM, we are
ready to test it. We will migrate the system from AIX 5.3 to AIX 6.1 using
nimadm. Once the migration phase is
complete (phases 1 to 6), the post-migration script will be executed in the NIM
client's nimadm (chroot) environment
on the NIM master. Once this is finished, the NIM client's data is synced back
to the NIM client's alternate disk and the boot image is created. The migration
process is then complete.

We add the -z flag to
our nimadm command line options to
specify the post-migration resource.

In normal operation we would simply let nimadm run all phases in sequence with the following command.
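The exact command depends on your resource names; a representative invocation (the SPOT, lpp_source, cache VG and target disk names here are all assumptions) would look like this:

```shell
# Run all nimadm phases in sequence, calling the XYZPOST script after the
# migration. Resource and disk names are illustrative.
nimadm -c lparaix01 -s spot61 -l lpp61 -d hdisk1 -j nimadmvg -z XYZPOST -Y
```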

Phase 6 has completed successfully. The NIM client's rootvg data
has been migrated from AIX 5.3 to 6.1 on the NIM master. The data has not yet
been synced back to the NIM client.

At this stage we can now run phase 7 separately and ensure that
it performs the required task. We expect it will de-install the device fileset,
install the latest version and change the ODM default attributes for the device
type. Again, you’ll notice that we specify the -P flag for phase 7 only.

Great news! Our script has worked as expected. The old fileset
was de-installed, the new fileset was installed and the PdAt default attributes
were changed successfully.

Note:
You can also review the post migration script output at a later
date if you wish. All nimadm
activities are logged, on the NIM master, to /var/adm/ras/alt_mig/NIMclientname_alt_mig.log
(where NIMclientname is the name
of the NIM client being migrated by nimadm).

With regard to nimadm
log files, please be aware that if you choose to run nimadm in phases (as I’ve
shown in this example), each run will generate a new log file. So in my
case, when I ran phases 1 to 6, this created a log file named
lparaix01_alt_mig.log. When I ran phase 7, the original log file was moved to
lparaix01_alt_mig.log.prev, and a new
log file was created and used for phase 7. Then when I ran phases 8 to 12, the
phase 7 log file was moved to lparaix01_alt_mig.log.prev and a new log file was
used for phases 8-12. For this reason you may want to back up each log file to
a unique file name as you execute each phase group, so that you do not lose any
of the information logged to the .log or .log.prev files.

Now we can complete the rest of the migration and execute the
remaining phases, 8 through 12.

If you run out of space in the root
file system, odd things can happen when you try to map virtual devices to
virtual adapters with mkvdev.

For example, a colleague of mine was
attempting to map a new hdisk to a vhost adapter on a pair of VIOS. The VIOS
was running a recent version of code. He received the following error message
(see below). It wasn’t a very helpful message. At first I thought it was due to
the fact that he had not set the reserve_policy
attribute for the new disk to no_reserve
on both VIOS. Changing the value for that attribute did not help.

I found the
same issue on the second VIOS i.e. a full root file system due to a core file
(from cimserver). I also found no trace of a full file system event in the error
report. Perhaps someone had taken it upon themselves to “clean house” at some
point and had removed entries from the VIOS error log.

Make sure
you monitor file system space on your VIOS. Who knows what else might fail if
you run out of space in a critical file system.
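A simple, cron-able check is to parse `df` output and flag anything near capacity. This sketch assumes the AIX `df -k` column layout (%Used in column 4, mount point in column 7):

```shell
#!/bin/sh
# Print mount points at or above a %Used threshold (default 90).
# Usage on AIX/VIOS:  df -k | full_fs 90
full_fs() {
    awk -v limit="${1:-90}" \
        'NR > 1 { pct = $4; sub(/%/, "", pct); if (pct + 0 >= limit) print $7, $4 }'
}
```

Anything this prints on a VIOS deserves attention before it causes the kind of mkvdev failure described above.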

Are you backing up your AIX systems
over Virtual Ethernet adapters? Of course you are, who isn’t right? Are your
backup server and clients on the same physical POWER system? You are most
likely backing up over Virtual Ethernet to another AIX LPAR that is running
your enterprise backup software, such as TSM or Legato Networker for example.
And you probably have a dedicated private virtual network (and adapters) on
both the clients and the server to handle the traffic for the nightly backups.
The next question is, have you tuned your Virtual Ethernet adapters?

There are several tips available for
tuning your Virtual Ethernet adapters for better performance on AIX. These tips
include changing settings such as MTU size, TCP window sizes, enabling
largesend, etc. I highly recommend the following blog posts from Anthony English
and Nigel Griffiths on this subject:

OK, so you got everything humming
along nicely, your backups are flying over the virtual network (across the POWER
hypervisor) and everybody is happy. After a period of time, you notice that the
backups have started to “slow down”. They are taking longer to finish. The
overall throughput of a backup drops. Some backups start in the evening around
9pm and are still running the next morning at 7am! In some cases you need to
kill the backups or even reboot the backup server LPAR for things to return to
normal.

“What is going on!?” You cry.

Well, there are a number of reasons
why this could be happening. For example, your shared processor pool may be
overwhelmed during the backup window. As we know, Virtual Ethernet adapters require
CPU to do their work. If the CPU pool is running low on available CPU
resources, this could contribute to the problem. And of course there could be
tuning issues with the Virtual Ethernet adapters or the AIX OS in general. Or
there may be issues with other pieces of the infrastructure, like network and
SAN switches, adapters, etc. Perhaps there’s an issue with the applications
and/or databases on the AIX systems? They often have their own mechanisms/tools
for backing up their data to your enterprise backup software. Is the backup
server sized to cope with the load i.e. CPU, memory, disk layout and I/O,
sufficient tape drives, disk storage pools, etc?

So assuming you’ve checked all of
the above (and more), then perhaps you’ve hit a problem that I encountered
recently. In my particular case, backups “over the hypervisor” were slowing
down, without any discernible cause. Initially the backups would be “very fast”
but after a month or so, things would start to slow down dramatically.

We noticed that there were very
large (and increasing) values for “Packets
Dropped”, “Hypervisor Send/Receive
Failures” and “No Resource Errors”
in the output from the netstat –v
command.

ETHERNET STATISTICS (ent1) :
Device Type: Virtual I/O Ethernet Adapter (l-lan)
Hardware Address: 41:ba:13:e7:25:0b
Elapsed Time: 42 days 4 hours 3 minutes 34 seconds

Transmit Statistics:                 Receive Statistics:
--------------------                 -------------------
Packets: 5978589961                  Packets: 26139832411
Bytes: 779465989202                  Bytes: 711051516630458
Interrupts: 0                        Interrupts: 6804561727
Transmit Errors: 0                   Receive Errors: 0
Packets Dropped: 0                   Packets Dropped: 86012309

...

Max Collision Errors: 0              No Resource Errors: 46113807

...

Hypervisor Send Failures: 0
    Receiver Failures: 0
    Send Errors: 0
Hypervisor Receive Failures: 46113807

After some discussion with IBM AIX
support, we discovered that we should increase some of the buffer sizes for
our Virtual Ethernet adapter (the entX device). This would alleviate the “no
resource” errors we’d been experiencing. Looking at the output from the
netstat -v command, we also noticed
that the Medium, Large and Huge buffers had
all reached their maximum values in the past.

...

Receive Information
  Receive Buffers
    Buffer Type          Tiny   Small  Medium  Large  Huge
    Min Buffers           512     512     128     24    24
    Max Buffers          2048    2048     256     64    64
    Allocated             513     535     148     28    64
    Registered            512     510     127     24    13
  History
    Max Allocated         576     951     256     64    64
    Lowest Registered     502     502      64     12    11

...

The advice from IBM
support was to increase these buffers using the chdev command (they also advised that we should reboot for the
changes to take effect):
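The command itself isn't shown above, so here is an illustrative version. The min_buf_*/max_buf_* names are the standard Virtual Ethernet buffer tunables; the values are examples only, not IBM's specific recommendation for our system:

```shell
# Raise the Medium, Large and Huge receive buffer maximums on ent1.
# -P defers the change to the ODM until the reboot IBM support advised.
chdev -l ent1 -a max_buf_medium=512 -P
chdev -l ent1 -a max_buf_large=128 -P
chdev -l ent1 -a max_buf_huge=128 -P
```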

Since implementing this
tuning change (to the adapter on the backup server), we have not had a repeat
of the problem. We will continue to monitor the performance and I’ll be sure to
let everyone know if we have further issues.

This entry is
similar in theme to one of my previous posts
about verifying your hdisk queue_depth settings with kdb. This time we want to
check whether an attribute for a Virtual FC (VFC)
adapter has been modified and whether or not AIX has been restarted since the
change. The attribute I’m interested in is num_cmd_elems.
In AIX environments, this value is often changed from its default setting to
improve I/O performance on SAN-attached storage.

From
kdb you can identify the VFC
adapters configured on an AIX system using the vfcs subcommand. Not only does this tell you what adapters you
have, but it also identifies the VIOS each adapter is connected to and the
corresponding vfchost adapter. Nice!

(0)> vfcs

NAME   ADDRESS             STATE   HOST  HOST_ADAP  OPENED  NUM_ACTIVE
fcs0   0xF1000A00103D4000  0x0008  vio1  vfchost10  0x01    0x0000
fcs1   0xF1000A00103D6000  0x0008  vio2  vfchost10  0x01    0x0000

You can view
the current (running) configuration of a VFC adapter using the kdb vfcs subcommand and the name of the
VFC adapter, for example fcs1:

Using the
output from this command we can determine the current (running) value for a
number of VFC attributes, including num_cmd_elems.

So I start
with an adapter with a num_cmd_elems
value of 200. Both the lsattr command
and kdb report 200 (C8 in hex) for num_cmd_elems.

# lsattr -El fcs1 -a num_cmd_elems

num_cmd_elems 200 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems

num_cmd_elems: 0xC8    location_code: U9119.FHA.87654A1-V20-C10-T1

I change num_cmd_elems to 400 with chdev -P (remember,
the -P flag only updates the AIX ODM, not the running configuration of the
device in the AIX kernel; you must either reboot for the change to take effect
or take the device offline and online again).

#
chdev -l fcs1 -a num_cmd_elems=400 -P

fcs1
changed

Now the lsattr command reports num_cmd_elems is set to 400 in the ODM.
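At this point the ODM and the kernel disagree, which is exactly what kdb lets you catch: lsattr now reports 400, while kdb should still report the old running value of 0xC8 until the adapter is reconfigured or the system is rebooted. The hex conversions are easy to sanity-check:

```shell
#!/bin/sh
# 0xC8 is the old running value reported by kdb; once the change takes
# effect, kdb will show the new value 400 as 0x190.
printf '%d\n' 0xC8     # old running value in decimal -> 200
printf '0x%X\n' 400    # new ODM value in hex -> 0x190
```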