I updated my lab VIOS to the latest fix pack (V2.2.2.1) this week and thought I’d try the new VIOS part command. This new command is an improved version of the existing vios_advisor tool. The major difference between the two is that the new tool is included with the VIOS code and will be updated via new VIOS fix packs. The following link has some information on using the command:

This new tool “Provides performance reports with suggestions for making configurational changes to the environment, and helps to identify areas for further investigation. The reports are based on the key performance metrics of various partition resources that are collected from the Virtual I/O Server (VIOS) environment.” Just like the old VIOS advisor.

I ran the tool for 10 minutes on my idle VIOS, just to see how the new XML report looked.
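If you haven’t used it before, the invocation is simple: run part as padmin with a monitoring interval in minutes, e.g.:

$ part -i 10

When it finishes, it reports the name of the tar file it generated (named after the host, date and time), which contains the XML report and the nmon recording.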

I then scp’ed the tar file to my laptop, extracted it and opened the vios_advisor_report.xml file. This is what the report looked like:

I was also able to open the nmon file using the nmon analyser tool. It produced typical nmon performance graphs, as you’d expect.

So, not only does the new part tool run the VIOS advisor, it also captures nmon performance data at the same time.

This is rather impressive and a great move by IBM. The original VIOS advisor tool was free and, of course, not officially supported by IBM (although the development team was very responsive to requests from users of the tool). The new tool is fully supported by IBM and, as a result, will only get better and better as time goes by. I’m not a fan of the new command name, part (good luck trying to google it!); I still prefer vios_advisor, but hey, what’s in a name, right?

One of my customers was configuring a new AIX 5.3 Versioned WPAR when they came across a very interesting issue. I thought I’d share the experience here, just in case anyone else comes across the problem. We configured the VWPAR to host an old application. The setup was relatively straightforward: restore the AIX 5.3 mksysb into the VWPAR, export the data disk from the Global environment into the VWPAR, import the volume group and mount the file systems. Job done! However, we noticed some fairly poor performance during application load tests. After some investigation we discovered that disk I/O performance was worse in the VWPAR than on the source LPAR. The question was, why?

We initially suspected the customer’s SAN and/or storage subsystem, but both came back clean, with no errors or configuration issues. In the end, the problem was related to a lack of ODM attributes in the PdAt object class, which prevented the VWPAR disk from using the correct queue depth setting.

Let me explain by demonstrating the problem and the workaround.

First, let’s add a new disk to a VWPAR. This will be used for a data volume group and file system. The disk in question is hdisk3.

# uname -W
0

# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Defined Virtual SCSI Disk Drive
hdisk3 Available Virtual SCSI Disk Drive <<<<<<

We set the disk queue depth to an appropriate number, in this case 256.

Note: This value will differ depending on the storage subsystem type, so check with your storage team and/or vendor for the best setting for your environment.
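If you’re not sure what values a disk will accept, lsattr can display the legal settings for the attribute before you change it:

# lsattr -Rl hdisk3 -a queue_depth

This prints the valid range for queue_depth on that device.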

# chdev -l hdisk3 -a queue_depth=256
hdisk3 changed

Using the lsattr command, we verify that the queue depth attribute is set correctly in both the ODM and the AIX kernel.

# lsattr -El hdisk3 -a queue_depth
queue_depth 256 Queue DEPTH True

# lsattr -Pl hdisk3 -a queue_depth
queue_depth 256 Queue DEPTH True

We can also use kdb to verify the setting in the kernel. Remember at this stage, we are concentrating on hdisk3, which is referenced with a specific kernel device address in kdb.
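# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x100;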

From the output above, we can see that the queue depth is correctly set to 0x100 in hex (256 in decimal).

Next, we export hdisk3 to the VWPAR using the chwpar command. The disk, as expected, enters a Defined state in the Global environment. It is known as hdisk1 in the VWPAR.

# chwpar -D devname=hdisk3 p8wpar1

# lswpar -D p8wpar1 | head -2 ; lswpar -D p8wpar1 | grep hdisk
Name     Device Name  Type  Virtual Device  RootVG  Status
-------------------------------------------------------------------
p8wpar1  hdisk3       disk  hdisk1          no      EXPORTED <<<<<<
p8wpar1  hdisk2       disk  hdisk0          yes     EXPORTED

[root@gibopvc1]/ # lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
hdisk1 Available Virtual SCSI Disk Drive
hdisk2 Defined Virtual SCSI Disk Drive
hdisk3 Defined Virtual SCSI Disk Drive

In the VWPAR, we run cfgmgr to discover the disk. We create a new data volume group (datavg) and file system (datafs) for application use (a minimal sketch of the commands is shown below). This is for demonstration purposes only; the customer imported their existing data volume groups on their system.
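The steps were nothing exotic. Something along these lines, assuming default options and an illustrative file system size:

# cfgmgr
# mkvg -y datavg hdisk1
# crfs -v jfs2 -g datavg -m /datafs -A yes -a size=2G
# mount /datafs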

We performed a very simple I/O test in the /datafs file system: we created a 1GB file and timed how long it took. We noticed immediately that the task took longer than expected.

# cd /datafs
# time lmktemp Afile 1024M
Afile

real    0m7.22s <<<<<<<<<<<<<<< SLOW?
user    0m0.04s
sys     0m1.36s

We ran the iostat command, from the Global environment, and noticed that “serv qfull” was constantly non-zero (very large numbers) for hdisk3. Essentially the hdisk queue was full all the time. This was bad and unexpected, given the queue depth setting of 256!
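For reference, the numbers came from iostat’s extended drive report, run against the disk from the Global environment, e.g.:

# iostat -D hdisk3 5

The queue section of that report shows the queue-full statistics for the disk.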

Now comes the interesting part. With a little help from our friends in IBM support, we found (using kdb) that the queue depth was reported as being set to 1 in the kernel, not 256! The hdisk name had also changed from hdisk3 to hdisk1. This happened as a result of exporting hdisk3 to the VWPAR: the disk is known as hdisk1 in the VWPAR (not hdisk3), but the kernel address is the same.
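Repeating the earlier kdb check against the same kernel address confirmed it:

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x1;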

Fortunately, IBM support was able to provide us with a workaround. The first step was to add the missing vparent PdAt entry to the ODM in the Global environment.

# cat addodm_pdat_for_vparent.txt
PdAt:
        uniquetype = "wio/common/vparent"
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0

# odmadd addodm_pdat_for_vparent.txt

# odmget PdAt | grep -p "wio/common/vparent"
PdAt:
        uniquetype = "wio/common/vparent"
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0

We did the same in the VWPAR.

# clogin p8wpar1

# uname -W
11

# odmget PdAt | grep -p "wio/common/vparent"
#

# odmadd addodm_pdat_for_vparent.txt

# odmget PdAt | grep -p "wio/common/vparent"
PdAt:
        uniquetype = "wio/common/vparent"
        attribute = "naca_1_spt"
        deflt = "1"
        values = "1"
        width = ""
        type = "R"
        generic = ""
        rep = "n"
        nls_index = 0

In the VWPAR, we removed the hdisk and then discovered it again, ensuring that the queue depth attribute was set to 256 in the ODM.

# uname -W
11

# rmdev -dl hdisk1
hdisk1 deleted

# cfgmgr

# lspv
hdisk0   00f94f58a0b98ca2   rootvg   active
hdisk1   none               None

# lsattr -El hdisk1 -a queue_depth
queue_depth 256 Queue DEPTH True

# odmget CuAt | grep -p queue
CuAt:
        name = "hdisk1"
        attribute = "queue_depth"
        value = "256"
        type = "R"
        generic = "UD"
        rep = "nr"
        nls_index = 12

Back in the Global environment we checked that the queue depth was set correctly in the kernel. And it was!

# uname -W
0

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x100;

We re-ran the simple I/O test and immediately saw that it completed faster and that the hdisk queue (for hdisk3, as shown by iostat from the Global environment) was no longer full. Subsequent application load tests showed much better performance.