Chris's AIX Blog

I wanted to mention a new AIX feature, available with AIX 7.1 TL3 (and 6.1 TL9), called the 'AIX Virtual Ethernet Link Status' capability. Previous implementations of Virtual Ethernet did not have the ability to detect loss of network connectivity.

For example, if the VIOS SEA was unavailable and VIO clients were unable to communicate with external systems on the network, the Virtual Ethernet adapter would still appear "connected" to the network via the Hypervisor's virtual switch. In reality, however, the VIO client was cut off from the external network.

This could lead to a few undesirable problems: a) Etherchannel (or NIB) configurations needed an IP address to ping in order to force a failover during a network incident, and lacked the ability to auto fail-back afterwards; b) total device failure in the VIOS could not be detected; and c) PowerHA fail-over capability was somewhat reduced, as it was unable to monitor external network reachability.

The AIX VEA Link Status feature provides a way to overcome the previous limitations. The new VEA device will periodically poll the VIOS/SEA using L2 packets (LLDP format). The VIOS will respond with its physical device link status. If the VIOS is down, the VIO client times out and sets the uplink status to down.

To enable this new feature you’ll need your VIO clients to run either AIX 7.1 TL3 or AIX 6.1 TL9. Your VIOS will need to be running v2.2.3.0 at a minimum (recommend 2.2.3.1). There’s no special configuration required on the VIOS/SEA to support this feature. On the VIO client, you’ll find two new device attributes that you can configure/tune. These attributes are:

poll_uplink (yes, no)

poll_uplink_int (100ms – 5000ms)

Here’s some output from the lsattr and chdev commands on my test AIX 7.1 TL3 partition that show these new attributes.
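For example, on a client partition it looks something like this (a sketch; the adapter name ent0, the default interval and the attribute descriptions are my assumptions):

# lsattr -El ent0 | grep poll_uplink
poll_uplink     no    Enable Uplink Polling            True
poll_uplink_int 1000  Time interval for Uplink Polling True

# chdev -l ent0 -a poll_uplink=yes -P
ent0 changed

With the -P flag the change takes effect once the adapter is reconfigured (for example, at the next reboot), after which entstat -d on the adapter should report the uplink (bridge) state.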

This feature is still considered "new", but I'm very interested to see how it will integrate with PowerHA in the future. Perhaps the use of "Single Adapter" configurations with PowerHA will become more robust (allowing PowerHA to track and respond to network events), and possibly more prevalent. You can find more information on this feature here:

I finally got PowerVP installed in my lab environment today. As part of the server agent installation process I needed to install the latest service pack (SP1) for PowerVP. After I downloaded the service pack from IBM Fix Central, I tried to run the install script from the command line on my AIX partition. It failed every time I ran it, no matter what options I supplied to the installer script!

[root@gibbo]/tmp/cg # ./PowerVP.bin.SP1 -i Silent

Preparing to install...

Extracting the installation resources from the installer archive...

Configuring the installer for this system's environment...

Launching installer...

Graphical installers are not supported by the VM. The console mode will be used instead...

The installer cannot run in this UI mode. To specify the interface mode, use the -i command-line option, followed by the UI mode identifier. The valid UI modes identifiers are GUI, Console, and Silent.

The only way I could get the SP installed was to use the GUI installer. I installed VNC server on my AIX partition, connected to the VNC server and then ran the installer script (as shown in the following screenshots). This worked fine.

“A system copy WPAR is a system WPAR that is created by copying the files from the root volume group of an existing AIX system or an AIX system backup image.”

So, you can either use a mksysb image of an LPAR or use an LPAR's current rootvg to create a WPAR. This is interesting. The main difference between this type of WPAR and a standard shared system WPAR is the fact that the System Copy WPAR uses the existing configuration files to create the instance. Files such as /etc/passwd, /etc/hosts and so on are copied into the new WPAR. A standard shared system WPAR would create a fresh AIX instance, e.g. no users, no entries in /etc/hosts, etc.; you'd need to configure everything from scratch.

“A system copy WPAR contains configured files and file systems directly from its source. A system copy WPAR differs from a standard system WPAR because it contains the files and file systems from the root volume group of the source system. A standard WPAR is created as a newly installed system by installing new and unconfigured root parts of filesets into a default set of files.”

Let's try it. I'll create a new System Copy WPAR using a mksysb image of my AIX 7.1 LPAR. I have existing user accounts on the LPAR (gibbo) and I want these (along with other system configuration) to be migrated across to the new WPAR.
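As a sketch, the creation command looks something like this; treat the flags and paths as assumptions to verify against the mkwpar documentation (-t requesting a system copy WPAR, -B pointing at the mksysb image):

# mkwpar -n gibbowpar -t -B /export/mksysb/gibbo_aix71.mksysb

Omitting -B should create the system copy WPAR from the running LPAR's rootvg instead of a backup image.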

This new feature appears to be an efficient method of converting/migrating existing LPARs into WPARs. This may benefit customers who are considering WPARs for their AIX environment and have been looking for an easy way to transition to a WPAR strategy. If you have existing AIX 6.1 and/or 7.1 LPARs at recent TL levels, this may be something to think about.

Number of threads (nthreads)

Specifies the number of threads in threaded mode, where the value of the thread parameter is 1. This value applies only when the thread mode is enabled. The nthreads attribute can be set to any value between 1 and 128. The default value is 7.

Queue size (queue_size)

Specifies the queue size for the Shared Ethernet Adapter (SEA) threads in threaded mode where the value of the thread parameter is 1. This attribute indicates the number of packets that can be accommodated in each thread queue. This value applies only when the thread mode is enabled. When you change this value, the change does not take effect until the system restarts. The queue_size attribute can be set to any value between 2 and 65535. The default value is 8192.

Hash algorithms (hash_algo)

Specifies the hash algorithm that is used to assign connections to Shared Ethernet Adapter (SEA) threads in threaded mode, where the value of the thread parameter is 1. When the hash_algo parameter is set to 0 (the default), an addition operation of the source and destination Media Access Control (MAC) addresses, IP addresses, and port numbers is done. When the hash_algo parameter is set to 1, a murmur3 hash function is done instead of an addition operation. The murmur3 hash function is slower, but it achieves better distribution. This value applies only when the thread mode is enabled.
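If you want to experiment with these attributes, something like the following should work from the padmin shell on the VIOS (a sketch; ent5 is a hypothetical SEA device name, and -perm defers the change, which matters for queue_size since it only takes effect after a restart):

$ chdev -dev ent5 -attr queue_size=16384 -perm
ent5 changed

$ chdev -dev ent5 -attr hash_algo=1
ent5 changed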

Number of concurrent partition mobility operations for the mover service partition (user-settable: True)

concurrency_lvl = 3: Concurrency level (user-settable: True)

lpm_msnap_succ = 1: Create a mini-snap for successful migrations (when a migration ends, the set of information related to a specific migration is gathered and packed on each mover service partition involved in the migration)

Specify fibre channel ports using vios_fc_port_name. Run the lslparmigr command to show a list of available slot IDs for a VIOS partition. Run the migrlpar command to accomplish the following tasks:

Specify virtual slot IDs for one or more virtual adapter mappings.

Validate the specified slot IDs.

Note: You can specify the port name of the Fibre Channel port to be used for creating the Fibre Channel mapping on the source server when you are performing partition migration.

You can use the HMC command line interface to specify the port name. List all the valid port names of the Fibre Channel ports by running the lsnports command. From the list of valid port names, specify the port name that you want to use by running the migrlpar command with the vios_fc_port_name attribute.

The following attributes of the pseudo device can be modified by using the migrlpar command:

num_active_migrations_configured

concurr_migration_perf_level

Run the following HMC command to modify the attribute values of the pseudo device. For example, to set the number of active migrations to 8, run:
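As a sketch, the command should look something like this (the managed system and VIOS names are placeholders; verify the exact syntax against your HMC level):

migrlpar -o set -r lpar -m Server-8233-E8B-SN061AA6P -p VIOS1 -i "num_active_migrations_configured=8"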

The following codes describe the status of a virtual adapter during a partition mobility operation:

The virtual adapter is not ready to be moved. The source virtual Ethernet is not bridged.

2: The virtual adapter can be moved with less capability. Not all virtual local area networks (VLANs) are bridged on the destination, so the virtual Ethernet adapter has less capability on the target system compared to the source system.

3: The stream ID is still in use.

64: The migmgr command cannot be started.

65: The stream ID is invalid.

66: The virtual adapter type is invalid.

67: The virtual adapter DLPAR resource connector (DRC) name is not recognized.

68: The virtual adapter method cannot be started, or it was prematurely terminated.

69: There is a lack of resources (that is, the ENOMEM error code).

80: The storage that is being used by the adapter is specific to the VIOS and cannot be accessed by another VIOS. Hence, the virtual adapter cannot complete the mobility operation.

81: The virtual adapter is not configured.

82: The virtual adapter cannot be placed in a migration state.

83: The virtual devices are not found.

84: The virtual adapter VIOS level is insufficient.

85: The virtual adapter cannot be configured.

86: The virtual adapter is busy and cannot be unconfigured.

87: The virtual adapter or device minimum patch level is insufficient.

88: The device description is invalid.

89: The command argument is invalid.

90: The virtual target device cannot be created because of incompatible backing device attributes. Typically, this is because of a mismatch in the maximum transfer size or SCSI reserve attributes of the backing device between the source VIOS and the target VIOS.

91: The DRC name passed to the migration code is for an adapter that already exists.

You can use the chhwres command, on the HMC, to “free up” allocated processor and memory resources from your logical partitions.

I may want to do this if I need to improve the affinity placement of a partition on a POWER7 system, or if I need to free up resources from Trial Capacity on Demand in order to re-use them with On/Off Capacity on Demand instead.

Here’s an example of “freeing up” some processor resources from a partition.

I’d like to remove the processor allocation from my AIX partition named 750lpar6. This partition is currently active and has 0.5 processing units assigned to it.
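From the HMC CLI, the removal looks something like this (a sketch; 750-SYS is a placeholder for the managed system name):

chhwres -r proc -m 750-SYS -o r -p 750lpar6 --procunits 0.5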

We all know that nimadm and multibos don't play well together (at least that's the case right now). If you're planning on migrating a system to a newer version of AIX (6.1 to 7.1, for example) you must ensure that your BOS LVs are reset to their traditional names and that all standby BOS instances have been removed. It's clear that if you plan on using nimadm to migrate a NIM client to a new version of AIX, you should first remove any and all instances/remnants of multibos on the client system before you start.

What’s not immediately clear is that even the NIM master should be cleansed of all evidence of multibos activity.

For example, let’s say you use multibos to install a new TL and SP on your NIM master e.g. AIX 7.1 TL1 SP4 to AIX 7.1 TL1 SP5. You reboot the system and run from the standby BOS instance (leaving the original instance in rootvg, just in case you need to back out).

You then attempt to migrate a NIM client, using nimadm, to a new version of AIX. The operation fails at phase 8, with the following error message:

+---------------------------------------------------------------+

Executing nimadm phase 8.

+---------------------------------------------------------------+

Creating client boot image.

ls: 0653-341 The file /dev/hd5 does not exist.

At first you might think that this error relates to the NIM client. You check the client system... there's absolutely no evidence of multibos on the system. Good.

The issue lies with the fact that the NIM master is enabled for multibos and running from the standby instance. Restarting the NIM master on the original instance (booting from hd5 rather than bos_hd5) and then removing the standby instance (multibos -R) will resolve the issue. Now nimadm will run without error.
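The cleanup on the NIM master boils down to something like this (a sketch; hdisk0 assumes your rootvg disk):

# bootlist -m normal hdisk0 blv=hd5
# shutdown -Fr
# multibos -R

That is: point the bootlist back at the original BLV (hd5), reboot onto the original instance, and then remove the standby BOS instance.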

When you import a volume group, the importvg command will populate the /etc/filesystems file based on the logical volume minor number order (which is stored in the VGDA on the physical volume/hdisk). If someone manually edits /etc/filesystems, its contents will no longer match the order contained in the VGDA of the physical volume. This can become a problem the next time someone attempts to export and import the volume group. Essentially, they may end up with over-mounted file systems and what appears to be the loss of data!

Here’s a quick example of the problem.

Let’s create a couple of new file systems; /fs1 and /fs1/fs2. I’ll deliberately create them in the “wrong” order.

# mklv -tjfs2 -y lv2 cgvg 1

lv2

# crfs -vjfs2 -dlv2 -Ayes -u fs -m /fs1/fs2

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

# mklv -tjfs2 -y lv1 cgvg 1

lv1

# crfs -vjfs2 -dlv1 -Ayes -u fs -m /fs1

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

Hmmm, lv2 appears before lv1 in the output from lsvg. The first indication of a potential problem!

# lsvg -l cgvg

cgvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

lv2 jfs2 1 1 1 closed/syncd /fs1/fs2

loglv00 jfs2log 1 1 1 closed/syncd N/A

lv1 jfs2 1 1 1 closed/syncd /fs1

Whoops! /fs1 should be mounted before /fs1/fs2!!! Doh!

# mount -t fs

# mount | tail -2

/dev/lv2 /fs1/fs2 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

/dev/lv1 /fs1 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

Data in /fs1/fs2 is now hidden and inaccessible. The /fs1 file system has over-mounted the /fs1/fs2 file system. This could look like data loss i.e. someone removed all the files from the file system.

# df -g | grep fs

/dev/lv2 - - - - - /fs1/fs2

/dev/lv1 0.06 0.06 1% 4 1% /fs1

The file systems are listed in the wrong order in /etc/filesystems as well. Double Doh!

# tail -15 /etc/filesystems

/fs1/fs2:

dev = /dev/lv2

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

/fs1:

dev = /dev/lv1

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

No problem. I’ll just edit the /etc/filesystems file and rearrange the order. Simple, right?

# vi /etc/filesystems

/fs1:

dev = /dev/lv1

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

/fs1/fs2:

dev = /dev/lv2

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

Let’s remount the file systems in the correct order.

# umount -t fs

# mount -t fs

# df -g | grep fs

/dev/lv1 0.06 0.06 1% 5 1% /fs1

/dev/lv2 0.06 0.06 1% 4 1% /fs1/fs2

That looks better now, doesn't it!? I'm happy now... although lsvg still indicates there could be a potential problem here...

# lsvg -l cgvg

cgvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

lv2 jfs2 1 1 1 open/syncd /fs1/fs2

loglv00 jfs2log 1 1 1 open/syncd N/A

lv1 jfs2 1 1 1 open/syncd /fs1

All is well, until one day someone exports the VG and re-imports it, like so:

# varyoffvg cgvg

# exportvg cgvg

# importvg -y cgvg hdisk2

cgvg

# mount -t fs

# mount | tail -2

/dev/lv2 /fs1/fs2 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

/dev/lv1 /fs1 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

Huh? What’s happened here!? I thought I fixed this before!?

Try to avoid this situation before it becomes a problem (for you or someone else!) in the future. If you discover this issue whilst creating your new file systems, remove the file systems and recreate them in the correct order. Obviously, try to do this before you place any data in the file systems. Otherwise you may need to back up and restore the data!

# mklv -tjfs2 -y lv1 cgvg 1

lv1

# crfs -vjfs2 -dlv1 -Ayes -u fs -m /fs1

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

# mklv -tjfs2 -y lv2 cgvg 1

lv2

# crfs -vjfs2 -dlv2 -Ayes -u fs -m /fs1/fs2

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

You may be able to detect this problem, prior to importing a volume group, by using the lqueryvg command. Looking at the output in the “Logical” section, you might be able to ascertain a potential LV and FS mount order issue.

# lqueryvg -Atp hdisk2 | grep lv
0516-320 lqueryvg: Physical volume hdisk2 is not assigned to a volume group.
Logical:  00f603cd00004c000000013ff2fc1388.1 lv2 1
          00f603cd00004c000000013ff2fc1388.2 loglv00 1
          00f603cd00004c000000013ff2fc1388.3 lv1 1

Once you’ve identified the problem you can fix the issue retrospectively (once the VG is imported) by editing /etc/filesystems. Of course, this is just a temporary fix until someone exports and imports the VG again, in which case the mount order issue will occur again.

The essential message here is do NOT edit the /etc/filesystems file by hand when creating file systems.

Note: Try this on a crash'n'burn system before unleashing its fury on a real AIX system (i.e. one that has users that depend on it!). Always take a mksysb backup before performing this type of activity.

aixlpar1 : /tmp # ksh -x fixmyrootvg.ksh
+ + lslv -l hd5
+ grep hdisk
+ head -1
+ awk {print $1}
PV=hdisk0
+ VG=rootvg
+ lqueryvg -Lp hdisk0
+ awk { print $2 }
+ read LVname
+ odmdelete -q name = hd5 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd5 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd5 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd5 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd6 -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = hd6 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd6 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd6 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd8 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd8 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd8 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd8 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd4 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd4 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd4 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd4 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd2 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd2 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd2 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd2 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd9var -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd9var -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd9var -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd9var -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd3 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd3 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd3 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd3 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd1 -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd1 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd1 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd1 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd10opt -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd10opt -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd10opt -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd10opt -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = local -o CuAt
0518-307 odmdelete: 4 objects deleted.
+ odmdelete -q name = local -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = local -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = local -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd7 -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q name = hd7 -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd7 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd7 -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = hd11admin -o CuAt
0518-307 odmdelete: 5 objects deleted.
+ odmdelete -q name = hd11admin -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = hd11admin -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q dependency = hd11admin -o CuDep
0518-307 odmdelete: 1 objects deleted.
+ read LVname
+ odmdelete -q name = rootvg -o CuAt
0518-307 odmdelete: 3 objects deleted.
+ odmdelete -q parent = rootvg -o CuDv
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q name = rootvg -o CuDv
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q name = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ odmdelete -q dependency = rootvg -o CuDep
0518-307 odmdelete: 0 objects deleted.
+ [ rootvg = rootvg ]
+ odmdelete -q value1 = 10 -o CuDvDr
0518-307 odmdelete: 1 objects deleted.
+ odmdelete -q value3 = rootvg -o CuDvDr
0518-307 odmdelete: 0 objects deleted.
+ importvg -y rootvg hdisk0
rootvg
0516-012 lvaryoffvg: Logical volume must be closed. If the logical volume contains a filesystem, the umount command will close the LV device.
0516-942 varyoffvg: Unable to vary off volume group rootvg.
+ varyonvg rootvg
+ synclvodm -Pv rootvg
synclvodm: Physical volume data updated.
synclvodm: Logical volume hd5 updated.
synclvodm: Logical volume hd6 updated.
synclvodm: Logical volume hd8 updated.
synclvodm: Logical volume hd4 updated.
synclvodm: Logical volume hd2 updated.
synclvodm: Logical volume hd9var updated.
synclvodm: Logical volume hd3 updated.
synclvodm: Logical volume hd1 updated.
synclvodm: Logical volume hd10opt updated.
synclvodm: Logical volume hd7 updated.
synclvodm: Logical volume hd11admin updated.
+ savebase

Hey Presto! The root volume group is now named rootvg, just the way we like it!

Starting with AIX 6.1 TL08 and AIX 7.1 TL02 there's a new AIX CPU tuning feature called "Scaled Throughput" mode. This is supported on POWER7 and POWER7+ processors only (do not try this on POWER6!). This new mode has the ability to dispatch workload to more SMT threads per VP, avoiding the need to unfold additional VPs. I've heard it described as being more "POWER6 like". I'm not suggesting that you use this feature; this post simply discusses what this new mode can do.

By default, AIX (on POWER7) operates in "Raw Throughput" mode. This mode provides the best performance per thread per core and offers the best response times, but it utilises more cores (VPs) to process a system's workload. By comparison, "Scaled Throughput" provides a greater level of per-core throughput (processing) by dispatching more SMT threads on a core. This has the effect of utilising fewer VPs/cores. In this mode, more (or all) SMT threads per core will be utilised before workload is dispatched to other VPs/cores in the system.

The schedo tuning command can be used to enable the new mode via a new parameter called vpm_throughput_mode. e.g.

# schedo -p -o vpm_throughput_mode=X

This tunable can be set to one of the following values:

0 = Legacy Raw mode (default).
1 = Scaled or "Enhanced Raw" mode with a higher threshold than legacy.
2 = Scaled mode, use primary and secondary SMT threads.
4 = Scaled mode, use all four SMT threads.

At this stage, this tunable is not restricted, but if you plan on experimenting with it please be careful; make sure you understand how this tuning may impact your system, and always test new tuning in a non-production environment first!

I performed a couple of quick tests today, just to see what impact tuning the parameter would have on an AIX 7.1 TL2 system.

aixlpar1 : / # oslevel -s
7100-02-01-1245

aixlpar1 : / # lsconf | grep Mode
System Model: IBM,9119-FHB
Processor Implementation Mode: POWER 7

I started some CPU intensive workload.

Name        PID      CPU%  PgSp  Owner
ncpu        4718826  15.7  108K  db
ncpu        7143668  15.7  108K  db
ncpu        8126488  15.7  108K  db
ncpu        5832920  15.6  108K  db

The vpm_throughput_mode parameter was left at the default value (0).

# schedo -a | grep vpm_throughput_mode
vpm_throughput_mode = 0

As expected, the workload was evenly dispatched across each of the primary SMT threads of the 4 VPs assigned to the partition, i.e. logical CPUs 0, 4, 8 and 12. None of the secondary or tertiary SMT threads were active. This is the default mode and will provide the greatest raw throughput (performance) per VP, as there's no overhead associated with enabling secondary or tertiary SMT threads.

Topas Monitor for host: aixlpar1    Fri Dec 21 11:06:34 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
0     99.7    0.3    0.0    0.0   0.63
1      0.2    0.5    0.0   99.3   0.12
2      0.0    0.0    0.0  100.0   0.12
3      0.0    0.0    0.0  100.0   0.12
4    100.0    0.0    0.0    0.0   0.63
5      0.0    0.0    0.0  100.0   0.12
6      0.0    0.0    0.0  100.0   0.12
7      0.0    0.0    0.0  100.0   0.12
8    100.0    0.0    0.0    0.0   0.63
9      0.0    0.0    0.0  100.0   0.12
10     0.0    0.0    0.0  100.0   0.12
11     0.0    0.0    0.0  100.0   0.12
12   100.0    0.0    0.0    0.0   0.63
13     0.0    0.0    0.0  100.0   0.12
14     0.0    0.0    0.0  100.0   0.12
15     0.0    0.0    0.0  100.0   0.12

Next, I enabled scaled throughput mode (2). The workload slowly migrated to logical CPUs 4, 5, 8 and 9. So now only two primary SMT threads were active (lcpus 4 and 8) and two secondary threads were active (lcpus 5 and 9). All the processing was being performed by fewer VPs (almost like POWER6).

# schedo -p -o vpm_throughput_mode=2

Topas Monitor for host: aixlpar1    Fri Dec 21 11:07:36 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
0     14.1   60.3    0.0   25.6   0.00
1      5.7   32.8    0.0   61.5   0.00
2      0.0    1.6    0.0   98.4   0.00
3      0.0    3.0    0.0   97.0   0.00
4    100.0    0.0    0.0    0.0   0.47
5    100.0    0.0    0.0    0.0   0.47
6      0.0    0.0    0.0  100.0   0.03
7      0.0    0.0    0.0  100.0   0.03
8    100.0    0.0    0.0    0.0   0.47
9    100.0    0.0    0.0    0.0   0.47
10     0.0    0.0    0.0  100.0   0.03
11     0.0    0.0    0.0  100.0   0.03
12     0.0   50.6    0.0   49.4   0.00
13     0.0   13.3    0.0   86.7   0.00
14     0.0    0.8    0.0   99.2   0.00
15     0.0    0.8    0.0   99.2   0.00

And finally, I tried scaled mode with all four SMT threads (4). All of the workload migrated to a single VP, but all 4 SMT threads were being utilised (primary SMT thread lcpu 0, secondary/tertiary SMT threads 1, 2 & 3). This mode offers lower overall core consumption, but has the (possibly negative) side effect of enabling more SMT threads on a single VP/core, which may not perform as well as the same workload evenly dispatched to 4 individual VPs/cores (on primary SMT threads).

# schedo -p -o vpm_throughput_mode=4

Topas Monitor for host: aixlpar1    Fri Dec 21 11:08:30 2012    Interval: 2

CPU  User%  Kern%  Wait%  Idle%  Physc
0     99.7    0.3    0.0    0.0   0.25
1     99.9    0.1    0.0    0.0   0.25
2    100.0    0.0    0.0    0.0   0.25
3    100.0    0.0    0.0    0.0   0.25
4      0.0   48.8    0.0   51.2   0.00
5      0.0    5.5    0.0   94.5   0.00
6      0.0    3.6    0.0   96.4   0.00
7      0.0    3.3    0.0   96.7   0.00
8      0.0   77.1    0.0   22.9   0.00
9      0.0   52.9    0.0   47.1   0.00
10     0.0   44.8    0.0   55.2   0.00
11     0.0   38.5    0.0   61.5   0.00
12     0.0   51.7    0.0   48.3   0.00
13     0.0    8.9    0.0   91.1   0.00
14     0.0    7.0    0.0   93.0   0.00
15     0.0    7.4    0.0   92.6   0.00

For more information on "Scaled Throughput" mode, take a look at the following presentation:

A colleague of mine was planning to modify the max_xfer_size attribute on a couple of FC adapters in one of his AIX LPARs. As he was describing his plan to me, I asked him how he intended to back out of the change should the LPAR fail to boot after the modifications. "But, what could possibly go wrong?" he fired back. I advised him to use multibos to create a standby (backup) instance of the AIX OS, just in case. He begrudgingly did so, just to keep me happy.

The next day he told me the following tale.

He had modified the FC adapters' max_xfer_size attribute as planned, first checking the current value of the attribute on both adapters.

aixlpar1 : / # lsattr -El fcs0 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

aixlpar1 : / # lsattr -El fcs1 -a max_xfer_size
max_xfer_size 0x100000 Maximum Transfer Size True

He'd created a standby AIX instance before making changes to the adapters. He also prevented multibos from changing the bootlist to the standby boot logical volume (BLV).

Then he manually changed the LPAR's boot list to include the standby BLV.

aixlpar1 : / # bootlist -m normal hdisk2 blv=hd5 hdisk2 blv=bos_hd5

aixlpar1 : / # bootlist -m normal -o
hdisk2 blv=hd5 pathid=0
hdisk2 blv=hd5 pathid=1
hdisk2 blv=bos_hd5 pathid=0
hdisk2 blv=bos_hd5 pathid=1

He carefully recorded the bootlist output, just in case the boot failed with the new max_xfer_size values. He could use the vdevice name and location to manually select the standby BLV to start the system in an emergency.

The cause of the LED 554 hang he subsequently experienced appeared to be related to the fact that the VIOS physical adapters needed their max_xfer_size value changed to the new value before the client LPAR virtual fibre channel adapters were modified.

In a previous post I discussed how you can identify some of the different types of PowerVM Capacity on Demand (CoD) activation keys from IBM.

Recently I had to activate Active Memory Expansion (AME) on a couple of POWER7 systems. I discovered that all of the keys contained a similar string. It appears that if a CoD key contains the string CA1F0000000800, then it is safe to assume it will activate AME for a particular system. e.g.

9741EF3AE6969F17CA1F0000000800419D

937A1240F00F5B05CA1F0000000800413D

And while I'm talking about AME, I thought I'd share this tip as well.

I was performing a demo of AME for my team and wanted to change the AME expansion factor using DLPAR during the demo. I did not want to use the HMC GUI but rather the HMC command line (as it's faster).

To change the expansion factor for an LPAR (that's enabled for AME), you can use the chhwres command from the HMC CLI.

During the demo I highlighted the current (running) expansion factor for the LPAR (using the lshwres command).
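The pair of commands looked roughly like this (a sketch; the managed system and partition names are placeholders, and the attribute names and the -o s flag are assumptions to verify against your HMC level):

lshwres -r mem -m 750-SYS --level lpar --filter "lpar_names=lpar1" -F curr_mem,curr_mem_expansion
chhwres -r mem -m 750-SYS -o s -p lpar1 -a "mem_expansion=1.5"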

Here's an example of converting rootvg file systems from JFS to JFS2 using alt_disk_copy.

My lab system was migrated from AIX 5.3 to 7.1 via nimadm. Unfortunately, nimadm does not convert JFS file systems to JFS2 during the migration. So, in this case, even though I've migrated to AIX 7.1 (which is a good thing) I'm still left with legacy JFS file systems in rootvg.

And because the AIX 5.3 version of the alt_disk_copy command does not have the -T option, I can't convert my JFS file systems to JFS2 before I migrate to AIX 7.1. So my best option is to migrate to AIX 7.1 and then convert rootvg to JFS2 file systems. A few hops in the process, but it's good enough.

aixlpar1 : / # lsvg -l rootvg
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot     1    1    1    closed/syncd  N/A
hd6         paging   32   32   1    open/syncd    N/A
hd8         jfslog   1    1    1    open/syncd    N/A
hd4         jfs      4    4    1    open/syncd    /
hd2         jfs      29   29   1    open/syncd    /usr
hd9var      jfs      20   20   1    open/syncd    /var
hd3         jfs      164  164  1    open/syncd    /tmp
hd1         jfs      4    4    1    open/syncd    /home
hd10opt     jfs      4    4    1    open/syncd    /opt
local       jfs      4    4    1    open/syncd    /usr/local
loglv       jfs      4    4    1    open/syncd    /var/log
hd7         sysdump  9    9    1    open/syncd    N/A
hd71        sysdump  9    9    1    open/syncd    N/A
hd11admin   jfs      2    2    1    open/syncd    /admin

aixlpar1 : / # oslevel -s
7100-01-01-1141

I clone rootvg to a spare disk using alt_disk_copy and the -T flag (which will convert the file systems to JFS2). The process converts the file systems to JFS2, as shown in the output below.

aixlpar1 : / # alt_disk_copy -d hdisk1 -T
Source boot disk is: hdisk2
jfs2j2: Current data file /image.data moved to /image.data.acct.save.3735802.

Before I reboot the system on the alternate rootvg, I verify that the cloned volume group now contains JFS2 file systems only. I "wake up" the altinst_rootvg and run the lsvg command to confirm the file system types are correct. I then put the altinst_rootvg to "sleep", reboot the system, and verify all rootvg file systems are mounted as jfs2.

aixlpar1 : / # alt_rootvg_op -W -d hdisk1
Waking up altinst_rootvg volume group ...

aixlpar1 : / # lsvg -l altinst_rootvg
altinst_rootvg:
LV NAME        TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
alt_hd5        boot     1    1    1    closed/syncd  N/A
alt_hd6        paging   32   32   1    closed/syncd  N/A
alt_hd8        jfs2log  1    1    1    open/syncd    N/A
alt_hd4        jfs2     4    4    1    open/syncd    /alt_inst
alt_hd2        jfs2     29   29   1    open/syncd    /alt_inst/usr
alt_hd9var     jfs2     20   20   1    open/syncd    /alt_inst/var
alt_hd3        jfs2     164  164  1    open/syncd    /alt_inst/tmp
alt_hd1        jfs2     4    4    1    open/syncd    /alt_inst/home
alt_hd10opt    jfs2     4    4    1    open/syncd    /alt_inst/opt
alt_local      jfs2     4    4    1    open/syncd    /alt_inst/usr/local
alt_loglv      jfs2     4    4    1    open/syncd    /alt_inst/var/log
alt_hd7        sysdump  9    9    1    closed/syncd  N/A
alt_hd71       sysdump  9    9    1    closed/syncd  N/A
alt_hd11admin  jfs2     2    2    1    open/syncd    /alt_inst/admin

aixlpar1 : / # alt_rootvg_op -S altinst_rootvg
Putting volume group altinst_rootvg to sleep ...
forced unmount of /alt_inst/var/log
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr/local
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst/admin
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...
aixlpar1 : / #

; Reboot on the alternate rootvg hdisk

aixlpar1 : / # uptime
10:18AM  up 1 min,  1 user,  load average: 0.32, 0.09, 0.03

aixlpar1 : / # lspv
hdisk1  00c342c637f21a59  rootvg      active
hdisk2  00c342c6161c6b47  old_rootvg

aixlpar1 : / # df
Filesystem      512-blocks      Free  %Used  Iused  %Iused  Mounted on
/dev/hd4            524288    411568    22%   3040      7%  /
/dev/hd2           3801088    537176    86%  34089     35%  /usr
/dev/hd9var        2621440   2413520     8%   3554      2%  /var
/dev/hd3          21495808  21476312     1%    110      1%  /tmp
/dev/hd1            524288    522456     1%     90      1%  /home
/proc                    -         -      -      -       -  /proc
/dev/hd10opt        524288    266560    50%   5133     15%  /opt
/dev/local          524288    495320     6%    249      1%  /usr/local
/dev/loglv          524288    522040     1%     49      1%  /var/log
/dev/hd11admin      262144    261384     1%      7      1%  /admin

aixlpar1 : / # lsvg -l rootvg
rootvg:
LV NAME     TYPE     LPs  PPs  PVs  LV STATE      MOUNT POINT
hd5         boot     1    1    1    closed/syncd  N/A
hd6         paging   32   32   1    open/syncd    N/A
hd8         jfs2log  1    1    1    open/syncd    N/A
hd4         jfs2     4    4    1    open/syncd    /
hd2         jfs2     29   29   1    open/syncd    /usr
hd9var      jfs2     20   20   1    open/syncd    /var
hd3         jfs2     164  164  1    open/syncd    /tmp
hd1         jfs2     4    4    1    open/syncd    /home
hd10opt     jfs2     4    4    1    open/syncd    /opt
local       jfs2     4    4    1    open/syncd    /usr/local
loglv       jfs2     4    4    1    open/syncd    /var/log
hd7         sysdump  9    9    1    open/syncd    N/A
hd71        sysdump  9    9    1    open/syncd    N/A
hd11admin   jfs2     2    2    1    open/syncd    /admin

===

The message "filesystem not converted" is not related to the JFS to JFS2 conversion. This message refers to whether or not the file system needs to be changed to use Variable Inode Extents (VIX), which is the default setting for JFS2 file systems.

Why would I want to convert rootvg to JFS2 anyway? Well, for starters, it's generally considered best practice to use JFS2, as it offers several performance and scalability enhancements over JFS. For example, you cannot create files greater than 2GB on JFS unless the file system was created as "large (big) file" enabled, and the jfs file systems in rootvg were never created as large file enabled.

Another reason: eventually, JFS will be retired.

Here's an example of a potential problem with JFS in rootvg. You try to create a file of a size greater than 2GB in /tmp (type jfs). Even though the ulimit settings are not restricting the creation of a file of this size, the JFS file system will not allow it, and the file creation process fails. The bf attribute for the /tmp file system is set to false, which indicates the file system is not "large file" enabled.
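To see it for yourself, check the bf flag and then try to push a file past 2GB (a sketch; the dd sizes are arbitrary, just enough to exceed the limit, and the write should fail despite an unlimited fsize ulimit):

# lsfs -q /tmp | grep bf:
# ulimit -f
unlimited
# dd if=/dev/zero of=/tmp/bigfile bs=1m count=2500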

I've shared my tips for resolving DLPAR problems in the past. So this week, when one of my colleagues was experiencing an issue with DLPAR, I referred him to my blog post and suggested he follow the troubleshooting steps. He did so, and I went about my business. Later that same day I asked him how he had fared. He told me that DLPAR was still not working on his particular AIX LPAR. It was an AIX 5.3 system and he was attempting to add another Virtual Processor to the LPAR. He expressed his frustration with the situation, so I offered to take a look for him.

What I found was that the system was missing an important fileset, one that enables DLPAR operations on AIX 5.3 systems. The fileset in question was named csm.client. Without this fileset installed, DLPAR would never work.

I advised my colleague of the problem and suggested he follow the steps below to resolve the issue. After he reinstalled the fileset, RMC communication between the HMC and LPAR was restored and his DLPAR processor add operation completed without issue.

1. Mount the NIM master's lpp_source file system:

aix53lpar1 : / # mount nim1:/export/lpp_source /mnt

2. Verify that the CSM filesets are not installed and that the IBM.DRM subsystem is either inoperative or missing.
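A sketch of what that check (and the subsequent reinstall) might look like; the lslpp message is paraphrased, and the installp flags are the usual apply/prereq/expand/accept-licenses set:

aix53lpar1 : / # lslpp -l csm.client
lslpp: Fileset csm.client not installed.

aix53lpar1 : / # lssrc -s IBM.DRM

aix53lpar1 : / # installp -agXYd /mnt csm.client

Once csm.client is installed and RMC resynchronises with the HMC, lssrc -s IBM.DRM should show the subsystem as active.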

This entry is similar in theme to one of my previous posts about verifying your hdisk queue_depth settings with kdb. This time we want to check whether an attribute for a Virtual FC (VFC) adapter has been modified and whether or not AIX has been restarted since the change. The attribute I'm interested in is num_cmd_elems. This value is often changed from its default setting in AIX environments to improve I/O performance on SAN-attached storage.

From kdb you can identify the VFC adapters configured on an AIX system using the vfcs subcommand. Not only does this tell you what adapters you have, but it also identifies the VIOS each adapter is connected to and the corresponding vfchost adapter. Nice!

(0)> vfcs
NAME  ADDRESS             STATE   HOST  HOST_ADAP  OPENED  NUM_ACTIVE
fcs0  0xF1000A00103D4000  0x0008  vio1  vfchost10  0x01    0x0000
fcs1  0xF1000A00103D6000  0x0008  vio2  vfchost10  0x01    0x0000

You can view the current (running) configuration of a VFC adapter using the kdb vfcs subcommand and the name of the VFC adapter, for example fcs1:

Using the output from this command, we can determine the current (running) value for a number of VFC attributes, including num_cmd_elems.

So I start with an adapter with a num_cmd_elems value of 200. Both the lsattr command and kdb report 200 (C8 in hex) for num_cmd_elems.

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 200 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems: 0xC8  location_code: U9119.FHA.87654A1-V20-C10-T1

I change num_cmd_elems to 400 with chdev -P (remember, the -P flag only updates the AIX ODM, not the running configuration of the device in the AIX kernel; you must either reboot for the change to take effect, or take the device offline and online again).

# chdev -l fcs1 -a num_cmd_elems=400 -P
fcs1 changed

Now the lsattr command reports num_cmd_elems is set to 400 in the ODM.
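Comparing the two views should show the mismatch that remains until the device is reconfigured; a sketch of the expected output (0xC8 is 200 in hex, while 400 would be 0x190):

# lsattr -El fcs1 -a num_cmd_elems
num_cmd_elems 400 Maximum Number of COMMAND Elements True

# echo vfcs fcs1 | kdb | grep num_cmd_elems
num_cmd_elems: 0xC8  location_code: U9119.FHA.87654A1-V20-C10-T1

After a reboot (or an rmdev/mkdev of the adapter), kdb should report 0x190, confirming the running value matches the ODM.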

Here are some questions I received recently regarding VLAN tagging on the VIO server. My answers follow each question.

“Hi Chris,

Q: I’m trying to understand when, where and why there would be the need to use ‘mkvdev -vlan’ (etc.) on the VIOS, and I’m wondering whether you would be able to clarify this for me, please.

Is it necessary to add the VLAN tag devices to the SEA, or is it sufficient to just have them defined within the Virtual Ethernet itself, which is part of the SEA?”

A: It is sufficient to simply define the VLAN IDs assigned to the Virtual Ethernet adapters associated with the SEA.

“Q: For completeness, on the rare occasions I have done this, I have added the VLANs to the Virtual Ethernet and also as VLAN devices on the VIOS (mkvdev -vlan etc.)”

A: mkvdev -vlan is not necessary unless the VIOS needs to communicate with hosts on different VLANs, i.e. you need an IP address on the VIOS for each VLAN. This does not mean the SEA will bridge this VLAN traffic for VIOCs.
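For the record, that VIOS-side configuration looks something like this (a sketch; ent6 as the SEA, VLAN 55 and the resulting ent8/en8 device names are hypothetical):

$ mkvdev -vlan ent6 -tagid 55
ent8 Available
$ mktcpip -hostname vios1 -inetaddr 10.1.55.10 -netmask 255.255.255.0 -interface en8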

“Q: The reason I started thinking of this is because one of our customers wants to add new VLANs to their SEA, but they’re not running POWER7 hardware. Therefore, the online method would be to add a new Virtual Adapter which contains the new VLAN IDs to the VIOS using DLPAR, then use chdev -dev (etc.) on the SEA to include the new Virtual Ethernet.”

A: Agreed. The “IBM PowerVM Virtualization Managing and Monitoring” Redbook states: “If your system doesn’t support dynamic VLAN modifications and you are modifying the VLAN list of a virtual Ethernet adapter that is configured in a SEA with ha_mode enabled, the HMC will not allow you to reconfigure the list of VLANs on that interface. You will need to add an additional virtual Ethernet adapter and modify the virt_adapters list of the SEA, or modify the profile of both Virtual I/O Servers and re-activate both Virtual I/O Servers at the same time.”

“Q: From the phone call I had, it would appear that the VLAN tags are included on the Virtual Ethernet device, but have not been added to the SEA by running mkvdev -vlan (etc.) on the VIOSs. This leads me to assume that ‘mkvdev -vlan’ is only required if there is a requirement to access the VIOS itself from a particular VLAN. Am I right, or is there something I’m not understanding? I’m unable to find documentation that explains the answer. Do you happen to know?”

A: That is also my understanding (based on my experience). On page 483 of the “IBM PowerVM Virtualization Introduction and Configuration” Redbook, it states: “The addition of VLAN interfaces to the SEA adapter is only necessary if the VIO Server itself needs to communicate on these VLANs”.

“Q: Hi Chris,

We are trying to associate a new entX Virtual Ethernet Trunk Device to an existing SEA. The new device must be configured for VLAN tagging. The existing virtual Ethernet adapter (that is already associated with the SEA) is not configured for VLAN tagging. This device will remain associated with the SEA and continue to pass untagged packets to the already configured network.

Ultimately, the configuration we want would be two entX devices associated with the existing SEA. One entX device is configured for untagged packets and the other entX device is configured for tagging.”

Reply: “Hmm, OK, I see what you are saying. I will give it a go and tell you how it turns out... thanks. OK, finally got around to testing using a VIOS at the DR site. Created a new virtual adapter with PVID 55 and VID 888 (ent9), then added it to the existing SEA as shown below:
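The step being described amounts to updating the SEA's virt_adapters list to include the new trunk adapter, along these lines (a sketch; ent6 as the SEA and ent4 as the existing trunk adapter are hypothetical):

$ chdev -dev ent6 -attr virt_adapters=ent4,ent9
ent6 changed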