The first time I ran the ‘viostat -adapter’ command, I expected to find non-zero values for kbps, tps, etc., for each vfchost adapter. However, the values were always zero, no matter how much traffic traversed the adapters.

$ viostat -adapter 1 10

...

vadapter:             Kbps      tps    bkread   bkwrtn
vfchost0               0.0      0.0      0.0      0.0

...

vadapter:             Kbps      tps    bkread   bkwrtn
vfchost1               0.0      0.0      0.0      0.0

I wondered if this was expected behaviour. Was the output supposed to report the amount of pass-thru traffic per vfchost? In 2011, I posed this question on the IBM developerWorks PowerVM forum. One of the replies stated:

"viostat
does not give statistics for NPIV devices. The vfchost adapter is just a
passthru, it doesn't know what the commands it gets are."

I
appreciated someone taking the time to answer my question but I was still
curious. I tested the same command again (in 2013) on a recent VIOS level (2.2.2.1),
but I received the same result. It was time to source an official answer on
this behaviour.

Here is the
official response I received from IBM:

1. FC adapter stats in viostat/iostat do not include NPIV.

2.viostat & iostat are an aggregate of all the
stats from the underlying disks, which of course NPIV doesn't have.

There's really no way for the vfchost adapter to monitor I/O,
since it doesn't know what the commands it gets are. He's just a passthru,
passing the commands he gets from the client directly to the physical FC
adapter.

3. You can run fcstat on the VIOS but that has the same issues/limitations mentioned above.

Intent here was that customers would use tools on the client to monitor
this sort of thing.

To summarize the comments from Development:

“viostat does NOT give statistics for NPIV devices.”

This made sense, but I wondered why the tool hadn’t been changed to exclude vfchost adapters from the output (to avoid customer confusion). There's obviously no valid reason to ever display any information for this type of adapter. I also understood that it was expected that I/O would be monitored at the client LPAR level. But I must say that an option for monitoring VFC I/O from a VIO server would be advantageous, i.e. a single-source view of all I/O activity for all VFC clients, particularly when there are several hundred partitions on a frame. The response was:

“…the way the vfchost driver currently works is that it calls iostadd to register a dkstat structure, resulting in the adapter being listed when viostat is called. This is misleading, however, since the vfchost driver does not actually track I/O. The commands coming from the client partition are simply passed as-is to the physical FC adapter, and we don't know if a particular command is an I/O command or not. The iostadd call is left over from porting the code from the vscsi driver, and Development agrees it should probably have been removed before shipping the code.

There has
also been mention of a DCR #MR0413117456 (Title: FC adapter stats in
viostat/iostat does not include NPIV) which you can follow-up with Marketing to
register your interest/track progress if that is something you're interested in
pursuing.”
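
Until something like that DCR materialises, the client-side monitoring the developers suggest is easy enough, because the virtual FC adapters appear as ordinary fcs devices inside the client LPAR. As a rough sketch (fcs0 is just an example device name; adjust for your LPAR):

# fcstat fcs0
# iostat -a 5 3

fcstat reports the traffic and error counters for the (virtual) FC adapter, and the -a flag on iostat adds adapter-level throughput to the usual disk report. On the VIOS itself, fcstat against the physical fcs devices gives you the aggregate picture, with the limitations noted above.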

Last year I was working for a customer that was upgrading several AIX 5.3 systems to AIX 6.1. The migrations were successful for the most part, but we did encounter one issue that took a little time to resolve.

The customer was using nimadm to migrate. This process worked fine; however, on a couple of systems a strange error was encountered after the migration. The LPAR was booted into AIX 6.1 and everything came up fine. The applications were started and users began accessing the system.

The first sign of trouble appeared several days later, when the AIX administrator attempted to configure new storage on the system. He had asked his storage administrator to assign a couple of new disks to his LPAR (via NPIV/VFC). As soon as the storage admin had completed the assignment, the AIX admin ran cfgmgr to detect and configure the new hdisks. Immediately, cfgmgr reported the following error:

Method error (/usr/lib/methods/cfgscsidisk):

0514-023 The specified device does not exist in the

customized device configuration database.

Initially, the AIX team suspected a fault with either the storage device or the zoning of the disk. Both of these items were checked and double-checked and found to be OK. Our next step was to run cfgmgr again, but this time we wanted a greater level of detail captured. To do this, we used the following environment variable to force cfgmgr to be ‘more verbose’.

# export CFGLOG="cmd,meth,lib,verbosity:9"

We ran cfgmgr and went to the /var/adm/ras/cfglog file to view the results with the alog command. However, we noticed that the cfglog file had a size of zero (0) and contained no data.

# cd /var/adm/ras

# ls -l cfglog

-rw-r----- 1 root system 0 May 16 13:22 cfglog

We decided to recreate the cfglog alog file and run mkdev again to reproduce the disk configuration error.

# rm cfglog

# echo "Create cfglog `date`"|alog -t cfg

# mkdev -l hdisk0

Method error (/usr/lib/methods/cfgscsidisk):

0514-023 The specified device does not exist in the

customized device configuration database.

This time we found some useful data in the cfglog file.

# alog -t cfg -o

MS 31981804 28835876 /usr/lib/methods/cfgscsidisk -l hdisk39

M4 31981804 Parallel mode = 0

M4 31981804 Get CuDv for hdisk39

M4 31981804 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 31981804 Get parent CuDv, name=fscsi0

M4 31981804 ..is_mpio_capable()

M4 31981804 Device is MPIO

M4 31981804 ..get_paths()

M4 31981804 Getting CuPaths for name='hdisk39'

M4 31981804 Found 1 paths

M0 31981804 cfgcommon.c 225 mpio_init error, rc=23

MS 28835892 31981568 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 28835892 Parallel mode = 0

M4 28835892 Get CuDv for hdisk0

M4 28835892 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 28835892 Get parent CuDv, name=fscsi0

M4 28835892 ..is_mpio_capable()

M4 28835892 Device is MPIO

M4 28835892 ..get_paths()

M4 28835892 Getting CuPaths for name='hdisk0'

M4 28835892 Found 2 paths

M0 28835892 cfgcommon.c 225 mpio_init error, rc=23

MS 25690326 27328608 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 25690326 Parallel mode = 0

M4 25690326 Get CuDv for hdisk0

M4 25690326 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 25690326 Get parent CuDv, name=fscsi0

M4 25690326 ..is_mpio_capable()

M4 25690326 Device is MPIO

M4 25690326 ..get_paths()

M4 25690326 Getting CuPaths for name='hdisk0'

M4 25690326 Found 2 paths

M0 25690326 cfgcommon.c 225 mpio_init error, rc=23

The configuration method was attempting to configure a disk device of type htcvspmpio (which was correct), but it was unable to configure the device paths (mpio_init error, rc=23). We suspected that the system was missing some sort of device driver support for the type of storage in use.

Cutting a very long story short, we determined, with the help of the IBM AIX support team, that the issue stemmed from “old” AIX installation media used to create the AIX 6.1 TL6 SP5 SPOT and lppsource on the NIM master. Old AIX 6.1 media was originally used (several years ago) to create the NIM resources and was gradually updated over time, all the way up to TL6 SP5.

IBM support identified that the older install media contained a liblpp.a file that was missing the necessary PdPathAt ODM entries. Newer install media contained a fix that added the appropriate entries to bos.rte.cfgfiles.
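
For anyone chasing a similar problem, one simple check (my own suggestion, not part of the official fix; the uniquetype below is simply the one reported in our cfglog) is to query the PdPathAt object class directly and see whether any path attribute stanzas exist for the device type in question:

# odmget -q "uniquetype=disk/fcp/htcvspmpio" PdPathAt

An empty result for an MPIO device type would be consistent with the missing PdPathAt entries described above.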

I tested this in my lab. It worked as advertised. I simply ran the mkwpar command to restore an AIX 5.3 TL12 SP9 image and several minutes later I had an AIX 5.3 vWPAR up and running on my POWER8 system (S824).

I received an email this week from a colleague who worked with me on the NIM Redbook back in 2006. He was experiencing an issue with DSM and NIM. He was attempting to use the dgetmacs command to obtain the MAC addresses of the network adapters on an LPAR. The command was failing to return the correct information.

I experienced this very issue during
the writing of the AIX 7.1 Differences
Guide Redbook. And given that I was in Austin, sitting in the same building
as the AIX development team, I was able to speak with the developers directly
about the issue. At that time they provided me with the following workaround.

First they asked me to check the size
of the /usr/lib/nls/msg/en_US/IBMhsc.netboot.cat message
catalog file.

# ls -l /usr/lib/nls/msg/en_US/IBMhsc.netboot.cat

-rw-r--r--    1 bin      bin        3905 Aug 08 09:54 /usr/lib/nls/msg/en_US/IBMhsc.netboot.cat

They were surprised to find that the
file appeared to be “too small”. They promptly sent me the catalog file from
one of their development AIX 7.1 systems.I replaced the file as follows:

# cd /usr/lib/nls/msg/en_US/

# ls -ltr IBMhsc*

-rw-r--r--    1 bin      bin        3905 Aug 08 09:54 IBMhsc.netboot.cat

# cp -p IBMhsc.netboot.cat IBMhsc.netboot.cat.old

# cp /tmp/lpar1/IBMhsc.netboot.cat.new IBMhsc.netboot.cat

# ls -ltr IBMhsc*

-rw-r--r--    1 bin      bin        3905 Aug 08 09:54 IBMhsc.netboot.cat.old
-rw-r--r--    1 bin      bin       26374 Dec 23 11:24 IBMhsc.netboot.cat

This fixed the problem for me during
the residency.

So I asked my friend to do the same
(after I sent him the message catalog file). He ran the dgetmacs command again and this time it returned the MAC address
for all the network adapters in his LPAR. Success!

A colleague of mine was planning to modify the max_xfer_size attribute on a couple of
FC adapters in one of his AIX LPARs. As he was describing his plan to me, I
asked him how he intended to back out of the change should the LPAR fail to
boot after the modifications. “But, what could
possibly go wrong?” he fired back. I advised him to use multibos to create a standby (backup)
instance of the AIX OS, just in case. He begrudgingly did so, just to keep me
happy.

The next day he told me the following
tale.

He had modified the FC adapter max_xfer_size attribute as planned. First, he checked the current value of the attribute on both adapters.

aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

He’d created a standby AIX instance before
making changes to the adapters. He also prevented multibos from changing the bootlist to the standby boot logical
volume (BLV).
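
For reference, creating the standby instance is a one-liner; he would have run something along these lines (the -X flag simply allows multibos to expand file systems if it needs more space):

# multibos -Xs

Once the standby BOS exists, bootlist -m normal -o shows exactly what the boot list contains, and it can be reset by hand, as shown below.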

Then he manually changed the LPAR's boot list to include the standby BLV.

aixlpar1 : / # bootlist -m normal hdisk2 blv=hd5 hdisk2 blv=bos_hd5

aixlpar1 : / # bootlist -m normal -o
hdisk2 blv=hd5 pathid=0
hdisk2 blv=hd5 pathid=1
hdisk2 blv=bos_hd5 pathid=0
hdisk2 blv=bos_hd5 pathid=1

He carefully recorded the bootlist output, just in case the boot failed with the new max_xfer_size values. He could then use the vdevice name and location to manually select the standby BLV and start the system in an emergency.

As it turned out, the LPAR hung at LED 554 on its next boot. The cause of the 554 hang appeared to be related to the fact that the VIOS physical adapters needed their max_xfer_size value changed to the new value before the client LPAR virtual fibre channel adapters were modified.
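
If you hit the same problem, the order that avoids the hang is to change the physical adapters on the VIOS first and only then the client's virtual FC adapters. A sketch only (fcs0 and the 0x200000 value are placeholders; adjust for your adapters and target size):

On each VIOS, as padmin:
$ chdev -dev fcs0 -attr max_xfer_size=0x200000 -perm
(repeat for each physical FC adapter, then reboot the VIOS at a suitable time)

On the client LPAR:
# chdev -l fcs0 -a max_xfer_size=0x200000 -P
# shutdown -Fr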

Note: Try this on a crash’n’burn system before unleashing its fury on a real AIX system (i.e. one that has users who depend on it!). Always take a mksysb backup before performing this type of activity.
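
The script itself isn't reproduced here. Pieced together from the ksh -x trace that follows, it would have looked roughly like the sketch below; treat this as a reconstruction from the trace, not the original fixmyrootvg.ksh:

#!/usr/bin/ksh
# Sketch reconstructed from the trace: remove the stale ODM entries for each
# rootvg logical volume and for the VG itself, then re-import the volume group
# and rebuild the LVM ODM data from the VGDA on disk.
PV=$(lslv -l hd5 | grep hdisk | head -1 | awk '{print $1}')   # disk holding the BLV
VG=rootvg
lqueryvg -Lp $PV | awk '{ print $2 }' | while read LVname
do
        odmdelete -q "name = $LVname" -o CuAt
        odmdelete -q "name = $LVname" -o CuDv
        odmdelete -q "value3 = $LVname" -o CuDvDr
        odmdelete -q "dependency = $LVname" -o CuDep
done
odmdelete -q "name = $VG" -o CuAt
odmdelete -q "parent = $VG" -o CuDv
odmdelete -q "name = $VG" -o CuDv
odmdelete -q "name = $VG" -o CuDep
odmdelete -q "dependency = $VG" -o CuDep
if [ $VG = rootvg ]; then
        odmdelete -q "value1 = 10" -o CuDvDr    # major number 10 is rootvg's
        odmdelete -q "value3 = $VG" -o CuDvDr
fi
importvg -y $VG $PV     # re-import the VG under the name we want
varyonvg $VG
synclvodm -Pv $VG       # rebuild the logical volume ODM entries from the VGDA
savebase                # save the updated ODM back to the boot logical volume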

aixlpar1 : /tmp # ksh -x fixmyrootvg.ksh
+ + lslv -l hd5
+ grep hdisk
+ head -1
+ awk {print $1}
PV=hdisk0
+ VG=rootvg
+ lqueryvg -Lp hdisk0
+ awk { print $2 }
+ read LVname
+ odmdelete -q name = hd5 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd5 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd5 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd5 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd6 -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = hd6 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd6 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd6 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd8 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd8 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd8 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd8 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd4 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd4 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd4 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd4 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd2 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd2 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd2 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd2 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd9var -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd9var -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd9var -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd9var -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd3 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd3 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd3 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd3 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd1 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd1 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd1 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd1 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd10opt -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd10opt -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd10opt -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd10opt -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = local -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = local -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = local -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = local -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd7 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd7 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd7 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd7 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd11admin -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd11admin -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd11admin -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd11admin -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = rootvg -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q parent = rootvg -o CuDv
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q name = rootvg -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q name = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q dependency = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ [ rootvg = rootvg ]
+ odmdelete -q value1 = 10 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = rootvg -o CuDvDr
0518-307 odmdelete: 0 objects deleted.
+ importvg -y rootvg hdisk0
rootvg
0516-012 lvaryoffvg: Logical volume must be closed. If the logical
        volume contains a filesystem, the umount command will close
        the LV device.
0516-942 varyoffvg: Unable to vary off volume group rootvg.
+ varyonvg rootvg
+ synclvodm -Pv rootvg
synclvodm: Physical volume data updated.
synclvodm: Logical volume hd5 updated.
synclvodm: Logical volume hd6 updated.
synclvodm: Logical volume hd8 updated.
synclvodm: Logical volume hd4 updated.
synclvodm: Logical volume hd2 updated.
synclvodm: Logical volume hd9var updated.
synclvodm: Logical volume hd3 updated.
synclvodm: Logical volume hd1 updated.
synclvodm: Logical volume hd10opt updated.
synclvodm: Logical volume hd7 updated.
synclvodm: Logical volume hd11admin updated.
+ savebase

Hey Presto! The root volume
group is now named rootvg just the
way we like it!

Starting with AIX Version 7.2, the AIX operating system provides the AIX Live Update function which eliminates downtime associated with patching the AIX operating system. Previous releases of AIX required systems to be rebooted after an interim fix was applied to a running system. This new feature allows workloads to remain active during a Live Update operation and the operating system can use the interim fix immediately without needing to restart the entire system. In the first release of this feature, AIX Live Update will allow customers to install interim fixes (ifixes) only. Ultimately it may be possible to use this function to install AIX Service Packs (SPs) and Technology Levels (TLs) without a reboot.

IBM delivers kernel fixes in the form of ifixes to resolve issues that are reported by customers. If a fix changes the AIX kernel or loaded kernel extensions that cannot be unloaded, the host logical partition (LPAR) must be rebooted. To address this issue, AIX Version 7.1 and earlier provided concurrent update-enabled ifixes that allowed deployment of some limited kernel fixes to a running LPAR. Unfortunately, not all ifixes could be delivered as “concurrent update-enabled”. The AIX Live Update solution is not constrained by the same limitations as concurrent update-enabled ifixes. The AIX 7.2 Live Update feature will allow customers to install ifixes without needing to reboot their AIX systems, avoiding downtime for their mission-critical production workloads.
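
For the curious, the mechanics are fairly simple from the administrator's side. The outline below is only a sketch; check the AIX 7.2 documentation for the definitive procedure, and note that the ifix file name and directory are placeholders:

# cp /var/adm/ras/liveupdate/lvupdate.template /var/adm/ras/liveupdate/lvupdate.data
# vi /var/adm/ras/liveupdate/lvupdate.data      (describe the disks and resources Live Update may use)
# geninstall -k -p -d /tmp/ifixes IV12345s5a.150101.epkg.Z
# geninstall -k -d /tmp/ifixes IV12345s5a.150101.epkg.Z

The first geninstall call (with -p) previews the Live Update operation; the second performs it.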

This article (in the link below) will discuss the high-level concepts relating to AIX Live Updates and then provide a real example of how to use the tool to patch a live AIX system. I was fortunate enough to take part in an Early Ship Program (ESP) for AIX 7.2. During the ESP I had the opportunity to test the AIX Live Update feature. I’ll share my experience using this tool in the example that follows.

Technology Level 3, Service Pack 1 for AIX 7.1 introduced a new feature for the mksysb utility. This new feature makes it possible to create consistent mksysb backups of your AIX systems using file system snapshot technology. This feature is also available with Technology Level 9 for AIX 6.1.

Previously, when taking a mksysb image of an AIX system, the administrator would often see messages stating the files that were meant to be backed up were now missing (usually transient temporary files) or occasionally there would be issues backing up files that were currently “in use”. Often the administrator would need to quiesce the system in order to take a “clean” backup of the system. And, depending on the size of the system, the time required to perform the backup (with the system offline) was significant.

A solution was required to provide the administrator with the ability to create a volume group backup without having to quiesce the system (or volume group). This would allow the backup to complete (without missing any files) and provide a consistent and functionally stable backup.

This challenge is easily answered using the file system snapshot technology already built-in to AIX. JFS2 has the ability to create snapshots of a mounted JFS2 file system. This creates a consistent (block-level) image of the file system at a point in time.

The mksysb command has been enhanced to call the snapshot command to create snapshots of the JFS2 file systems (in rootvg) and then use the snapshots to create a mksysb backup. To enable the snapshot feature, use the new -T flag with the mksysb command.

Not surprisingly, a few other AIX ‘backup related’ commands have also benefited from this new feature. The savevg, savewpar, mkcd and mkdvd commands have all been enhanced to accept the new -T flag to enable file system snapshots.
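
As a quick illustration of the same flag on savevg (the volume group name and output file below are placeholders):

# savevg -Tvf /backup/datavg.savevg datavg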

The mksysb snapshot feature creates an external snapshot of each of the mounted JFS2 file systems in the root volume group. This requires additional free space in the volume group to create the external logical volumes for each snapshot file system. When the backup is complete, the snapshots are deleted.

Note: snapshots are only available for JFS2 file systems. JFS file systems in a volume group will be backed up using traditional methods.

Here’s an example of using the -T flag with the mksysb command.

After issuing the mksysb command we soon see the message “Creating snapshots”.

# oslevel -s

7100-03-01-1341

# mksysb -Tvie /mksysb/lpar11-mksysb-snapshot

mksysb -Tvie /mksysb/lpar11-mksysb-snapshot

Creating information file (/image.data) for rootvg.

Creating snapshots.

…

Looking at the output from the df command, we observe several snapshot file systems have been created and mounted.

And we see the corresponding snapshot logical volumes in the root volume group.

# lsvg -l rootvg

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 1 1 closed/syncd N/A

hd6 paging 8 8 1 open/syncd N/A

hd8 jfs2log 1 1 1 open/syncd N/A

hd4 jfs2 16 16 1 open/syncd /

hd2 jfs2 64 64 1 open/syncd /usr

hd9var jfs2 32 32 1 open/syncd /var

hd3 jfs2 16 16 1 open/syncd /tmp

hd1 jfs2 1 1 1 open/syncd /home

hd10opt jfs2 16 16 1 open/syncd /opt

hd11admin jfs2 2 2 1 open/syncd /admin

lg_dumplv sysdump 16 16 1 open/syncd N/A

livedump jfs2 4 4 1 open/syncd /var/adm/ras/livedump

fixeslv jfs2 100 100 1 open/syncd /fixes

mksysblv jfs2 100 100 1 open/syncd /mksysb

fslv00 jfs2 1 1 1 open/syncd N/A

fslv01 jfs2 5 5 1 open/syncd N/A

fslv02 jfs2 1 1 1 open/syncd N/A

fslv03 jfs2 1 1 1 open/syncd N/A

fslv04 jfs2 1 1 1 open/syncd N/A

fslv05 jfs2 1 1 1 open/syncd N/A

fslv06 jfs2 1 1 1 open/syncd N/A

fslv07 jfs2 1 1 1 open/syncd N/A

fslv08 jfs2 13 13 1 open/syncd N/A

fslv09 jfs2 1 1 1 open/syncd N/A

Whilst the mksysb backup is still running, we are able to query the status of each snapshot using the snapshot command.

# snapshot -q /

Snapshots for /

Current Location 512-blocks Free Time

* /dev/fslv00 131072 123392 Mon Dec 2 12:57:11 2013

Snapshots for /usr

Current Location 512-blocks Free Time

* /dev/fslv01 655360 646656 Mon Dec 2 12:57:13 2013

Snapshots for /var

Current Location 512-blocks Free Time

* /dev/fslv02 131072 129024 Mon Dec 2 12:57:14 2013

Snapshots for /tmp

Current Location 512-blocks Free Time

* /dev/fslv03 131072 130048 Mon Dec 2 12:57:16 2013

Snapshots for /home

Current Location 512-blocks Free Time

* /dev/fslv04 131072 130304 Mon Dec 2 12:57:17 2013

Snapshots for /opt

Current Location 512-blocks Free Time

* /dev/fslv05 131072 129536 Mon Dec 2 12:57:18 2013

Snapshots for /admin

Current Location 512-blocks Free Time

* /dev/fslv06 131072 130304 Mon Dec 2 12:57:20 2013

Snapshots for /var/adm/ras/livedump

Current Location 512-blocks Free Time

* /dev/fslv07 131072 130304 Mon Dec 2 12:57:21 2013

Snapshots for /fixes

Current Location 512-blocks Free Time

* /dev/fslv08 1703936 1701376 Mon Dec 2 12:57:22 2013

Snapshots for /mksysb

Current Location 512-blocks Free Time

* /dev/fslv09 131072 130304 Mon Dec 2 12:57:24 2013

From the mksysb man page:

-T = Creates backup by using snapshots. This command applies only to JFS2 file systems. When you specify the -T flag to use snapshots for creating a volume group backup, external JFS2 snapshots are created. Snapshots allow for a point-in-time image of a JFS2 file system and thus, do not require a system to be put into a temporarily inactive state. The size of the snapshot is 2% - 15% of the size of the file system. The snapshot logical volumes are removed when back up is complete. However, snapshots are not removed if a file system already has other snapshots. Additionally, if a file system has internal snapshots, external snapshots cannot be created and thus, snapshots are not used for creating the backup of the file system. The use of the -T flag does not affect any JFS file systems that are present in the volume group that is being backed up. These file systems are backed up in the same manner as done previously.

I was performing a volume group re-org i.e. changing the
INTER-POLICY of a logical volume from minimum to maximum.

# lslv fixeslv | grep INTER
INTER-POLICY:   minimum                RELOCATABLE:   yes

# chlv -e x fixeslv

# lslv fixeslv | grep INTER
INTER-POLICY:   maximum                RELOCATABLE:   yes

I attempted to run the reorgvg
command. I was greeted by the following error message!

# reorgvg tempvg fixeslv
0516-966 reorgvg: Unable to create internal map.

I ran the command again, this time with truss. I found that the /usr/sbin/allocp
command was being called and was failing. I determined this must be because of
a lack of space at the logical volume layer.

# /usr/sbin/allocp -?
/usr/sbin/allocp: Not a recognized flag: ?
0516-422 allocp: [-i LVid] [-t Type] [-c Copies] [-s Size]
        [-k] [-u UpperBound] [-e InterPolicy] [-a InterPolicy

The truss output showed:

statx("/usr/sbin/allocp", 0x2FF21ED8, 76, 0)= 0

statx("/usr/sbin/allocp", 0x20009E70, 176, 020) = 0

kioctl(2, 22528, 0x00000000, 0x00000000)Err#25
ENOTTY

kfork()=
3735812

_sigaction(20,
0x00000000, 0x2FF21F20)= 0

_sigaction(20,
0x2FF21F20, 0x2FF21F30)= 0

kwaitpid(0x2FF21F90,
-1, 6, 0x00000000, 0x00000000) = 3735812

And yes, my volume group was indeed out of free PPs!

# lsvg tempvg
VOLUME GROUP:       tempvg                   VG IDENTIFIER:  00f6027300004c0000000130773bdb73
VG STATE:           active                   PP SIZE:        512 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      99 (50688 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                1                        USED PPs:       99 (50688 megabytes)
OPEN LVs:           1                        QUORUM:         2 (Enabled)
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none

cgaix7[/opt] >

Silly me, it clearly states in the reorgvg man page that there must be at least one free PP in the
volume group for the command to run successfully.

At least one free physical partition
(PP) must exist on the specified volume group for the reorgvg command to run
successfully. For mirrored logical volumes, one free PP per physical
volume (PV) is required in order for the reorgvg command to maintain logical
volume strictness during execution; otherwise the reorgvg command still runs,
but moves both copies of a logical partition to the same disk during its
execution.

So I shrank the file system in question (there was a large amount of allocated but unused file system space, so it was safe to shrink it).
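
Shrinking a JFS2 file system is just a chfs call with a negative size delta; something along these lines (the mount point and amount shown are illustrative only):

# chfs -a size=-4G /fixes

Reducing the file system returns PPs to the volume group, which is exactly what reorgvg needs.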

I can see when my reorgvg failed (rc=1) and when it succeeded (rc=0). This is also a good way of determining when a reorgvg command was issued and when it finished. Of course, an easier way would be to start the reorgvg command with the time command, which produces a nice little summary of the time taken.

# time reorgvg tempvg fixeslv
0516-962 reorgvg: Logical volume fixeslv migrated.

real    3m12.94s
user    0m1.52s
sys     0m4.60s

But if I forgot to use the time command, I can look at the lvmcfg alog file for an answer. In the following example, the reorgvg.sh process is started at 23:49. The entry in the log file begins with an uppercase S. The entry that starts with an uppercase E indicates the end of the reorgvg.sh process. It is the information in the third field that tells me how long the process ran for, in seconds:milliseconds.
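
If you want to pull those records out yourself, the lvmcfg log can be dumped with alog and filtered for the reorgvg entries; for example:

# alog -t lvmcfg -o | grep reorgvg

The S record marks the start, the E record marks the end, and, as noted above, the third field of the E record gives the elapsed time in seconds:milliseconds.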