This entry is similar in theme to one of my previous posts about verifying your hdisk queue_depth settings with kdb. This time we want to check whether an attribute for a Virtual FC (VFC) adapter has been modified and whether or not AIX has been restarted since the change. The attribute I'm interested in is num_cmd_elems. In AIX environments this value is often increased from its default to improve I/O performance on SAN attached storage.

From kdb you can identify the VFC adapters configured on an AIX system using the vfcs subcommand. Not only does this tell you what adapters you have, but it also identifies the VIOS each adapter is connected to and the corresponding vfchost adapter. Nice!

(0)> vfcs
NAME     ADDRESS             STATE   HOST   HOST_ADAP   OPENED   NUM_ACTIVE
fcs0     0xF1000A00103D4000  0x0008  vio1   vfchost10   0x01     0x0000
fcs1     0xF1000A00103D6000  0x0008  vio2   vfchost10   0x01     0x0000

You can view the current (running) configuration of a VFC adapter using the kdb vfcs subcommand and the name of the VFC adapter, for example fcs1.
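One way to do this non-interactively (the same echo-into-kdb form used throughout this post) is:

# echo vfcs fcs1 | kdb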

Using the output from this command we can determine the current (running) value for a number of VFC attributes, including num_cmd_elems.

So I start with an adapter with a num_cmd_elems value of 200. Both the lsattr command and kdb report 200 (C8 in hex) for num_cmd_elems.

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 200 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems:  0xC8      location_code: U9119.FHA.87654A1-V20-C10-T1
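If you'd rather not do the hex conversion in your head, the bc trick used later in this post works just as well here:

# echo "ibase=16 ; C8" | bc
200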

I change num_cmd_elems to 400 with chdev -P (remember, the -P flag only updates the AIX ODM, not the running configuration of the device in the AIX kernel; for the change to take effect you must either reboot or take the device offline and online again).

# chdev -l fcs1 -a num_cmd_elems=400 -P
fcs1 changed

Now the lsattr command reports num_cmd_elems is set to 400 in the ODM.
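The running configuration, however, will still show the old value until the adapter is reconfigured. A quick check (illustrative output, assuming the adapter hasn't been touched since the chdev):

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 400 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems:  0xC8      location_code: U9119.FHA.87654A1-V20-C10-T1

To pick up the new value without a reboot, the adapter (and its child devices) would have to be taken offline and back online, which is only an option if nothing is using the disks behind that adapter:

# rmdev -Rl fcs1
# cfgmgr -l fcs1

After that (or after a reboot), kdb should report 0x190, which is hex for 400.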

Now, say you change the queue_depth on an hdisk with chdev -P. This updates the device's ODM information only, not its running configuration; the new value will take effect the next time the system is rebooted. So now I have a different queue_depth in the ODM compared to the device's current running config (in the kernel).

What if I forget that I've made this change to the ODM and forget to reboot the system for many months? Someone complains of an I/O performance issue... I check the queue_depths and find they appear to be set appropriately, but I still see disk queue full conditions on my hdisks. But have I rebooted since changing the values?

How do I know if the ODM matches the device's running configuration?

For example, I start with a queue_depth of 3, which is confirmed by looking at lsattr (ODM) and kdb (running config) output:

# lsattr -El hdisk6 -a queue_depth
queue_depth 3 Queue DEPTH True

# echo scsidisk hdisk6 | kdb | grep queue_depth
    ushort queue_depth     = 0x3;            <-- In Hex.

Now I change the queue_depth using chdev -P, i.e. only updating the ODM.

# chdev -l hdisk6 -a queue_depth=256 -P
hdisk6 changed

# lsattr -El hdisk6 -a queue_depth
queue_depth 256 Queue DEPTH True

kdb reports that the disk's running configuration still has a queue_depth of 3.

# echo scsidisk hdisk6 | kdb | grep queue_depth
    ushort queue_depth     = 0x3;

Now if I vary off the VG and change the disk queue_depth (this time without the -P flag), both lsattr (ODM) and kdb (the running config) show the same value:

# umount /test

# varyoffvg testvg

# chdev -l hdisk6 -a queue_depth=256
hdisk6 changed

# varyonvg testvg

# mount /test

# lsattr -El hdisk6 -a queue_depth
queue_depth 256 Queue DEPTH True

# echo scsidisk hdisk6 | kdb | grep queue_depth
    ushort queue_depth     = 0x100;          <-- In Hex = Dec 256.

# echo "ibase=16 ; 100" | bc
256

This is one way of checking that you've rebooted (or reconfigured the device) since changing your queue_depth attributes. So far I've only tried this on AIX 6.1 and 7.1.
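If you want to check every disk on a system in one go, something along these lines should work (just a sketch built from the commands above; note the kernel value is printed in hex, and kdb is run once per disk, so it's slow on systems with many hdisks):

# for d in $(lsdev -Cc disk -F name)
> do
>   odm=$(lsattr -El $d -a queue_depth -F value)
>   run=$(echo scsidisk $d | kdb | grep queue_depth)
>   echo "$d ODM=$odm kernel:$run"
> done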

Last year I was working for a customer that was upgrading several AIX 5.3 systems to AIX 6.1. The migrations were successful for the most part, but we did encounter one issue that took a little time to resolve.

The customer was using nimadm to migrate. This process worked fine; however, on a couple of systems a strange error was encountered after the migration. The LPAR was booted into AIX 6.1 and everything came up cleanly. The applications were started and users began accessing the system.

It was several days later, when the AIX administrator attempted to configure new storage on the system, that the first sign of trouble appeared. He had asked his storage administrator to assign a couple of new disks to his LPAR (via NPIV/VFC). As soon as the storage admin had completed the assignment, the AIX admin ran cfgmgr to detect and configure the new hdisks. Immediately, cfgmgr reported the following error:

Method error (/usr/lib/methods/cfgscsidisk):
        0514-023 The specified device does not exist in the
                 customized device configuration database.

Initially, the AIX team suspected there was some fault with either the storage device or the zoning of the disk. Both of these items were checked and double-checked and found to be OK. Our next step was to run cfgmgr again, but this time we wanted a greater level of detail captured. To do this we used the following environment variable to force cfgmgr to be 'more verbose':

# export CFGLOG="cmd,meth,lib,verbosity:9"

We ran cfgmgr and went to the /var/adm/ras/cfglog file to view the results with the alog command. However, we noticed that the cfglog file had a size of zero (0) and contained no data.

# cd /var/adm/ras

# ls -l cfglog

-rw-r----- 1 root system 0 May 16 13:22 cfglog

We decided to recreate the cfglog file (with alog) and run mkdev again to reproduce the disk configuration error.

# rm cfglog

# echo "Create cfglog `date`"|alog -t cfg

# mkdev -l hdisk0

Method error (/usr/lib/methods/cfgscsidisk):
        0514-023 The specified device does not exist in the
                 customized device configuration database.

This time we found some useful data in the cfglog file.

# alog -t cfg -o

MS 31981804 28835876 /usr/lib/methods/cfgscsidisk -l hdisk39

M4 31981804 Parallel mode = 0

M4 31981804 Get CuDv for hdisk39

M4 31981804 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 31981804 Get parent CuDv, name=fscsi0

M4 31981804 ..is_mpio_capable()

M4 31981804 Device is MPIO

M4 31981804 ..get_paths()

M4 31981804 Getting CuPaths for name='hdisk39'

M4 31981804 Found 1 paths

M0 31981804 cfgcommon.c 225 mpio_init error, rc=23

MS 28835892 31981568 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 28835892 Parallel mode = 0

M4 28835892 Get CuDv for hdisk0

M4 28835892 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 28835892 Get parent CuDv, name=fscsi0

M4 28835892 ..is_mpio_capable()

M4 28835892 Device is MPIO

M4 28835892 ..get_paths()

M4 28835892 Getting CuPaths for name='hdisk0'

M4 28835892 Found 2 paths

M0 28835892 cfgcommon.c 225 mpio_init error, rc=23

MS 25690326 27328608 /usr/lib/methods/cfgscsidisk -l hdisk0

M4 25690326 Parallel mode = 0

M4 25690326 Get CuDv for hdisk0

M4 25690326 Get device PdDv, uniquetype=disk/fcp/htcvspmpio

M4 25690326 Get parent CuDv, name=fscsi0

M4 25690326 ..is_mpio_capable()

M4 25690326 Device is MPIO

M4 25690326 ..get_paths()

M4 25690326 Getting CuPaths for name='hdisk0'

M4 25690326 Found 2 paths

M0 25690326 cfgcommon.c 225 mpio_init error, rc=23

The configuration method was attempting to configure a disk device type of htcvspmpio (which was correct), but it was unable to configure the device paths (mpio_init error, rc=23). We suspected that the system was missing some sort of device driver support for the type of storage in use.
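One way to confirm whether the predefined path attributes for this device type actually exist in the ODM (an illustrative check based on the uniquetype shown in the log, not the exact commands we ran at the time) is to query the PdPathAt object class directly:

# odmget -q "uniquetype=disk/fcp/htcvspmpio" PdPathAt

If odmget returns nothing for that uniquetype, the predefined path information is missing, which would be consistent with get_paths()/mpio_init failing.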

Cutting a very long story short, we determined, with the help of the IBM AIX support team, that the issue stemmed from "old" AIX installation media used to create the AIX 6.1 TL6 SP5 SPOT and lppsource on the NIM master. Old AIX 6.1 media had originally been used (several years ago) to create the NIM resources, which were then gradually updated over time, all the way up to TL6 SP5.

IBM support identified that the older install media contained a liblpp.a file that was missing the necessary PdPathAt ODM files. Newer install media contained a fix to add the appropriate entries to the bos.rte.cfgfiles. e.g.