Chris's AIX Blog

Further to my previous post on viostat, Dan Braden (from IBM ATS) provided me with one of his new presentations: Monitoring, Measuring Bandwidth And Tuning SAN Storage From AIX. In this document Dan presents several ways of monitoring virtual FC adapter usage statistics.

“I just saw your article in your blog on the subject. There are 3 ways to look at real FC adapter thruput that I'm aware of:

interactive nmon then press ^

nmon recordings run thru the NMON analyzer

fcstat

Slide 45 shows an example where I had only NPIV traffic flowing thru the adapters and I used the "^" and "a" commands in interactive nmon to show both reports. The FC adapter report (getting its statistics from the adapter driver) shows the real thruput, while the adapter report (getting its statistics from the hdisk driver) doesn't show any IO.”

Dan sent a follow up email stating:

“....there is now a new option for fcstat (the -n <WWPN> option) which will display the statistics for a virtual port on the adapter. “

In the most recent VIOS version (2.2.2.2), the fcstat command has been updated to support new options. The -n flag displays statistics at the virtual port level, for a port specified by the WWPN of the virtual adapter, for example:

$ fcstat -n C050760547E90000 fcs0

The -client flag will display the statistics of the virtual adapter per client, for example:
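As a hedged illustration (run as padmin on the VIOS; output omitted here), the invocation is simply:

$ fcstat -client

This should list the virtual FC statistics broken down per client partition, which is handy when several NPIV clients share the same physical fcs adapter.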

After updating my NIM master to AIX 7.1 TL2 SP1 (7100-02-01-1245), I noticed a problem. Whenever I installed a new AIX partition using NIM, the resources allocated to the NIM client were not being de-allocated, even though the installation was completing successfully. Also, if I tried to run my usual 'NIM client reset' script (below), the resources were still allocated.

#!/usr/bin/ksh
#
# Reset a NIM client.
#
if [[ "$1" = "" ]] ; then
        echo "Please specify a NIM client to reset e.g. aixlpar1."
else
        if lsnim -l $1 > /dev/null 2>&1 ; then
                nim -o reset -F $1
                nim -Fo deallocate -a subclass=all $1
                nim -Fo change -a cpuid= $1
        else
                echo "Not a valid NIM client!?"
        fi
fi

For example, here's my NIM client with the lpp_source, mksysb and SPOT resources assigned to it (even though the AIX install completed OK).

root@nim1 : / # lsnim -l aixlpar1
aixlpar1:
   class           = machines
   type            = standalone
   connect         = shell
   platform        = chrp
   netboot_kernel  = 64
   if1             = network1 aixlpar1 0
   cable_type1     = N/A
   Cstate          = ready for a NIM operation
   prev_state      = not running
   Mstate          = currently running
   boot            = boot
   lpp_source      = lpp_sourceaix710105
   mksysb          = aixlpar1-71
   nim_script      = nim_script
   spot            = spotaix710105
   cpuid           = 00C453C75C00
   control         = master
   Cstate_result   = success
   installed_image = aixlpar1-71

My workaround was to use 'smit nim_mac_res' to manually de-allocate resources from the client:

====
De-allocate Network Install Resources

aixlpar1      machines      standalone

> lpp_sourceaix710105      lpp_source
> spotaix710105            spot
> aixlpar1-71              mksysb
====

It appears that others were also experiencing this problem. I found the following thread on the IBM developerWorks AIX user forum:

My team and I have recently been trying to streamline our AIX disaster recovery process. We've been looking for ways to reduce our overall recovery time. Several ideas were tossed around, such as a) using a standby DR LPAR with AIX already installed and using rsync/scp to keep the Prod and DR LPARs in sync, and b) using alt_disk_copy (with the -O flag for a device reset) to clone rootvg to an alternate disk which is then replicated to DR. These methods may work but are cumbersome to administer and (in the case of alt_disk_copy) require additional (permanent) resources on every production system. With over 120 production instances of AIX, the disk space requirements start to add up.

So far we’ve concluded that
the best way to achieve our goal is by using SAN replicated rootvg volumes at
our DR site.

Our current DR process relies
on recovery of AIX systems from mksysb images from a NIM master. All our data
(non-rootvg) LUNs are already replicated to our DR site. The aim was to change the
process and ‘recover’ our AIX images using replicated rootvg LUNs. This will
reduce our overall recovery time at DR (which is crucial if we are to meet the proposed
recovery time objectives set by our business). Based on current IBM
documentation we were relatively comfortable with the proposed approach. The
following IBM developerWorks article (originally
published in 2009 and updated in late 2010) describes “scenarios in which remapping, copying, and reuse of SAN disks is
allowed and supported. More easily switch AIX environments from one system to
another and help achieve higher availability and reduced down time. These
scenarios also allow for fast deployment of new systems using cloning.”

The document focuses on fully virtualised environments
that utilise shared processors and VIO servers. One area where this document is
currently lacking in information is the use of NPIV and virtual fibre channel
adapters in a DR scenario. We reached out to our contacts in the AIX
development space and asked the following question:

“Hoping you can help us find some statements regarding support for a replicated rootvg environment using NPIV/Virtual Fibre Channel adapters? The following IBM developerWorks article discusses VSCSI and we are looking for something similar for NPIV: http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html
My guess is that restrictions similar to those for physical FC adapters will apply here? But I'm hoping that given the adapters are virtual the limitations may be relaxed.
Are you aware of any statement regarding support (or not) for booting from another system using a disk subsystem image of rootvg replicated to another disk subsystem when using NPIV? And what, if any, additional requirements/restrictions may apply when using NPIV?”

We received the following
responses:

“There are some additional considerations when using NPIV for booting from a replicated rootvg. With NPIV the client partition has virtual Fibre Channel adapter ports, but has physical access to the actual (physical) disk devices. There may be an increased chance of needing to update the boot list via the Open Firmware SMS menu. Since the clients have access to the actual disks, you have the possibility of running multipathing software besides AIX MPIO. If you are using multipathing software other than AIX MPIO to manage the NPIV attached disks, then you should contact the vendor that provided the software to check their support statement.

Since one or more of the physical devices will change when booting from an NPIV replicated rootvg, it is recommended to set the ghostdev attribute. The ghostdev attribute will trigger when it detects that the AIX image is booting from either a different partition or server. The ghostdev attribute should not trigger during LPM (Live Partition Mobility) operations. Once triggered, ghostdev will clear the customized ODM database. This will cause detected devices to be discovered as new devices (with default settings), and avoid the issue of missing/stale device entries in the ODM. Since ghostdev clears the entire customized ODM database, this will require you to import your data (non-rootvg) volume groups again, and perform any (device) attribute customization. To set ghostdev, run "chdev -l sys0 -a ghostdev=1". Ghostdev must be set before the rootvg is replicated.
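As a quick, hedged check using standard AIX commands (the attribute name is the one described above), you can set and then verify ghostdev on the source system before replicating rootvg:

# chdev -l sys0 -a ghostdev=1
# lsattr -El sys0 -a ghostdev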

As with virtual devices, the client partition is booting an existing rootvg where the hardware may be different. It's possible some applications have a dependency on tracking the actual physical devices (instead of the data on the disks). For example, PowerHA may keep track of a disk for cluster health checks. If you do have applications that have a dependency on tracking physical devices, then additional setup (of those applications) may be required after the first boot from the replicated rootvg.

We do have
multiple customers using NPIV for such scenarios. I believe most of them
worked with IBM Lab Based Services to assist with implementing such a
configuration, and some of the customers required some custom scripts to
further customize their system after booting from the replicated rootvg.
Those customers set the ghostdev attribute, and had custom scripts to
import their data (non-rootvg) volume groups, and update PowerHA to point to
the new health check disk.

You should
get support for such an NPIV setup with IBM as long as you follow the
considerations listed in the WhitePaper.”

“Development has approved using NPIV to do this for one customer. Below are more detailed requirements for this DR strategy using NPIV. If a Disaster Recovery (DR) environment not using PowerHA Enterprise Edition is used, then we believe the white paper located at http://www.ibm.com/developerworks/aix/library/au-AIX_HA_SAN/index.html provides the guidelines regarding setup, prerequisites, as well as the limitations of such a DR deployment. Deployments as detailed in the white paper are supported by IBM. However, note that such a deployment places many manual responsibilities on the customer to set up and maintain such an environment. IBM expects that the customer carefully manage these manual steps without any mistakes. The white paper does not currently cover using NPIV in such a DR scenario.

We have the following guidelines regarding the configuration, which includes NPIV as an option: All of the system configuration should be virtualized, with the possible exception of disk devices when using NPIV. If NPIV is used, then AIX MPIO must be used as the multi-pathing solution. If multi-pathing software other than AIX MPIO is used, then the vendor of that software must be contacted regarding a support statement. Install AIX (at least the minimum required TLs/SPs for the desired AIX version) and the software stack (middleware and applications) on the primary systems, compatible with the systems at both sites. Primary and secondary sites should be using systems with similar hardware, the same microcode levels, and the same VIOS levels.

Many manual steps are needed to set up the virtual and physical devices accurately on the secondary site VIOS. If Virtual SCSI disks are being used, then discover the unique identification on the primary site and map the disks to the corresponding replication disks. Map the same appropriately on the VIOS at the secondary site. The level of VIOS should support the attribute to open the secondary devices passively. This setting needs to be configured correctly on the VIOS at the secondary site. The operating environment should not have subnet dependencies. Manage the replication relationships accurately. It may be necessary to manually switch the secondary disks to the primary node. Raw disk usage may cause problems. Some middleware products may bypass the operating system and use the disk directly; they might have their own restrictions for this environment (e.g. anything that is device location code or storage LUN unique ID dependent may have issues when the cloned image is restarted on the secondary system with replicated storage).

Set the "ghostdev" attribute using the chdev command (this must be done on the primary). The attribute can be set using the command "chdev -l sys0 -a ghostdev=1". The ghostdev attribute will delete the customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system. If the "ghostdev" attribute is not set, then booting from the alternate site will result in devices in the ODM showing up in a "Defined" or "Missing" state.

After a failover to the secondary site, you may need to reset the boot device list for each LPAR, using the SMS menus of the firmware, before booting the LPAR. Note that this is not an exhaustive list of issues. Refer to the white paper and study it as it applies to your environment. So as long as you are using MPIO, you're OK. If you are not using MPIO and are using some OEM storage, then the storage vendor must also support it.”

While these responses indicate that this form of
recovery is supported by IBM, we were still looking to IBM for clarity on the
support position. It has been noted that other IBM customers have had mixed
responses when contacting AIX support for feedback and assistance with this
type of DR procedure. And it’s not hard to see why when you read statements
like this from the “Supported Methods of Duplicating an AIX
System” document:

“Unsupported
Methods

1. Using a bitwise copy of a rootvg disk
to another disk.

This bitwise copy can be a one-time snapshot copy such as flashcopy, from one
disk to another, or a continuously-updating copy method, such as Metro Mirror.

While these methods will give you an exact duplicate of the installed AIX
operating system, the copy of the OS may not be bootable. A typical scenario
where this is tried is when one system is a production host and there is a
desire to create a duplicate system at a disaster recovery site in a remote
location.

2. Removing the rootvg disks from one
system and inserting into another.

This also applies to re-zoning SAN disks that contain the rootvg so another
host can see them and attempt to boot from them.

Why don't these methods work?

The reason for this is there are many objects in an AIX system that are unique
to it; Hardware location codes, World-Wide Port Names, partition identifiers,
and Vital Product Data (VPD) to name a few. Most of these objects or
identifiers are stored in the ODM and used by AIX commands.

If a disk containing the AIX rootvg in one system is copied bit-for-bit (or
removed), then inserted in another system, the firmware in the second system
will describe an entirely different device tree than the AIX ODM expects to
find, because it is operating on different hardware. Devices that were
previously seen will show missing or removed, and usually the system will
typically fail to boot with LED 554
(unknown boot disk).”

So, as a secondary objective,
we have been working closely with our local IBM representatives to obtain some
surety from IBM that our proposed DR strategy for AIX is fully supported by
both the AIX development and support teams.

With that in mind I'll provide an overview of our new DR approach, and hope that it offers others insight into an alternative method for recovery and also assists IBM in further understanding what some of the "larger" AIX customers are looking for in terms of simplified AIX disaster recovery.

What follows is a detailed
description of our IBM AIX, PowerVM/Power Systems environment, the proposed
recovery steps and other items for consideration.

· Please refer to the following table for a summary of the environment details.

Our Recovery Procedure:

1. Change the sys0 ghostdev attribute value to 1 on the source production AIX system. Set the "ghostdev" attribute using the chdev command (this must be done on the primary); it can be set with "chdev -l sys0 -a ghostdev=1". The ghostdev attribute will delete the customized ODM database on rootvg when AIX detects it has booted from a different LPAR or system.

2. Take note of the source system's rootvg hdisk PVID (see the example after this list).

3. Select the source production rootvg LUN for replication on the HDS VSP.

4. Replicate the LUN from the production site to the DR HDS VSP.

5. In a DR test, suspend HDS replication from production to DR.

6. Assign the replicated LUN to the target LPAR on the DR POWER6 595, i.e. map the LUN to the WWPN of the virtual FC adapter on the DR LPAR.

7. Attempt to boot the DR LPAR using the replicated rootvg LUN. If necessary, enter the SMS menu to update the boot list, i.e. select the correct boot disk, checking for the same PVID as the source host.

8. Once the LPAR has successfully booted, the AIX administrator would configure the necessary devices, i.e. import data volume groups, configure network interfaces, etc. This may also be scripted for execution during the first boot process.

9. Please refer to the following diagrams for a visual representation of the proposed process.
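As a hedged illustration of steps 1, 2 and 7 (all standard AIX commands; the disk names are whatever lspv reports on your system):

# Step 1, on the source system: set ghostdev so the ODM is rebuilt on foreign hardware
chdev -l sys0 -a ghostdev=1

# Step 2, on the source system: record the rootvg disk PVIDs
lspv | grep rootvg

# Step 7, on the DR LPAR after boot: confirm the replicated disk carries the same PVID
lspv | grep rootvg
bootlist -m normal -o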

Some Notes/Caveats:

The following is a list of items that we understand are possible limitations and issues with our new DR process.

· Booting from replicated rootvg disks may fail for several reasons, such as: a) there is unexpected corruption in the replicated LUN image due to rootvg not being quiesced during replication, or b) there is an unidentified issue with the AIX system that is only apparent the next time the system is booted; this could be mis-configuration by the administrator or some other unforeseen problem.

· In the event that an LPAR fails to boot via a replicated rootvg LUN, a backup method is available for recovery. Switching back to a manual NIM mksysb restore provides a sufficient fallback should the replicated rootvg be unusable.

· If the "ghostdev" attribute is not set, then booting from the DR site will result in devices in the ODM showing up in a "Defined" or "Missing" state.

· Once a DR test is completed, the DR LPAR should be de-activated immediately so that SAN disk replication can be restarted between production and DR. Failure to perform this step may result in the DR LPAR failing as a result of file system corruption.

· At present we are using AIX MPIO only. There is discussion of using HDLM in the future. We will contact HDS for a support statement regarding booting from replicated rootvg LUNs with HDLM installed.

· The ghostdev attribute is not implemented in AIX 5.3. AIX 5.3 is no longer supported*.

So far all of our testing has
been successful. We verified that we could replicate an SOE rootvg image of AIX
6.1 and 7.1 to DR and successfully boot an LPAR using the replicated disk.
Based on these tests there doesn’t appear to be anything stopping us from using
this method for DR purposes. The following table outlines the different
versions of AIX we tested and the results.

Once the system was booted we needed to perform some post boot configuration tasks. These tasks were handled by two scripts that were called from /etc/inittab. On the source system we installed the new scripts (in /etc) and added new entries to the /etc/inittab file. These scripts only run if the systemid matches that of the DR systemid. Note: only partial contents of each script are shown below, but you get the idea.

echo "$MYNAME: The systemid $LSATTR_SYSTEMID_DR does not match the expected DR systemid of $DR_SYSTEMID."
echo "$MYNAME: This script should only be executed at DR."
echo "$MYNAME: If you are not booting the system at the DR site, then you can ignore this message."
echo "$MYNAME: No changes have been performed. Script is exiting."
fi
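For context, here is a hedged sketch of how such a check might hang together (the variable values and surrounding logic are hypothetical, modelled on the excerpt above):

#!/usr/bin/ksh
# Hypothetical sketch: only perform DR-specific tasks if we booted at the DR site.
MYNAME=$(basename $0)
DR_SYSTEMID="IBM,02XXXXXXX"                                 # placeholder: systemid of the DR frame
LSATTR_SYSTEMID_DR=$(lsattr -El sys0 -a systemid -F value)  # systemid of the frame we booted on

if [ "$LSATTR_SYSTEMID_DR" = "$DR_SYSTEMID" ] ; then
        : # DR-only tasks would go here, e.g. importvg of data VGs, network changes
else
        echo "$MYNAME: The systemid $LSATTR_SYSTEMID_DR does not match the expected DR systemid of $DR_SYSTEMID."
        echo "$MYNAME: This script should only be executed at DR."
        echo "$MYNAME: If you are not booting the system at the DR site, then you can ignore this message."
        echo "$MYNAME: No changes have been performed. Script is exiting."
fi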

The ghostdev attribute essentially provides us with a clean ODM
and allows the system to discover new devices and build the ODM from scratch.
If you attempt to boot from a replicated rootvg disk without first setting the ghostdev attribute, your system may fail to boot (hang at LED 554)
because of a new device tree and/or missing devices. You might be able to recover from this
situation (without restoring from mksysb) by performing the steps outlined on
pages 16-20 of the following document (thanks to Dominic Lancaster at IBM for
the presentation).

Recent releases of AIX installation media
(for 7.1 and 6.1) now contain the OpenSSH base installation filesets. This is
very handy; we no longer need to download or locate the software from other
sources.

One thing to consider is what this means
for future AIX migrations.

If you are migrating a system (that already
has a version of SSH installed) to AIX 7.1 then you may notice that the first
time you attempt to connect to the server (after the 7.1 migration) the
following ssh message appears:

root@nim1 : / # ssh aixlpar1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
59:68:05:71:60:b5:d1:96:87:df:f6:9c:ca:9a:14:3e.
Please contact your system administrator.
Add correct host key in /.ssh/known_hosts to get rid of this message.
Offending RSA key in /.ssh/known_hosts:17
RSA host key for aixlpar1 has changed and you have requested strict checking.
Host key verification failed.

In the output above I'm attempting to SSH from another system to the newly migrated AIX 7.1 LPAR. This is essentially informing us that the SSH host keys on the AIX 7.1 server don't match the host key stored in the local system's /.ssh/known_hosts file. Something has changed.

Now of course I could simply accept this change and update my known_hosts file, like so:

root@nim1 : / # ssh-keygen -R aixlpar1
/.ssh/known_hosts updated.
Original contents retained as /.ssh/known_hosts.old

With known_hosts updated, I'm able to SSH to the AIX 7.1 system successfully.

cgibson@nim1 : /home/cgibson $ ssh aixlpar1 date
Mon Aug 20 19:44:20 EET 2012

But that's just for my own SSH known_hosts file. What about all the users that connect to this system via SSH/SFTP/SCP? Do I really expect all of them to update their known_hosts file with the new host key information?

This could create problems for automated tasks, like file transfers. If these transfers stop working then there could be "hell to pay". So the question I'm often asked is: what can I do to prevent this from happening in the first place? Luckily there is a way.

In this
example, we are using nimadm to migrate from AIX 5.3 to 7.1. The
AIX 7.1 lpp_source resource was created using the AIX 7.1 installation media
DVDs. All filesets were copied from the DVDs, verbatim, to the new 7.1
lpp_source resource on the NIM master.

First we
verify that the openssh* filesets are in fact in the AIX 7.1 lpp_source on the
NIM master.

root@nim1 : / # nim -o showres lpp_sourceaix710101 | grep -i ssh
  openssh.base.client      5.4.0.6100  I  N usr,root
  openssh.base.client      5.8.0.6101  I  N usr,root
  openssh.base.server      5.4.0.6100  I  N usr,root
  openssh.base.server      5.8.0.6101  I  N usr,root
  openssh.man.en_US        5.4.0.6100  I  N usr
  openssh.man.en_US        5.8.0.6101  I  N usr
  openssh.msg.EN_US        5.8.0.6101  I  N usr
  openssh.msg.en_US        5.4.0.6100  I  N usr
  openssh.msg.en_US        5.8.0.6101  I  N usr

On the NIM client (running AIX 5.3), we verify there is an older version of SSH already installed. The migration will remove these filesets (and the associated /etc/ssh_host_* files). The newer version of SSH will be installed and new ssh_host_key* files will be generated (hence the problem with the remote SSH clients' known_hosts files no longer holding the correct host keys).

Rather than
update these filesets manually after the migration, you can include this step
as a post migration task with nimadm.

An alternative way to work around this problem (after the fact) would be to restore the original ssh_host_key* files from a backup. For example, I copied the original ssh_host_key* files to my home directory before starting the AIX migration.

aixlpar1 : / # cd /etc
aixlpar1 : /etc # cp -pr ssh /home/cgibson/ssh_orig/

In the
output below, I discover that my ssh_host_key* files have all been recreated
during the migration.

aixlpar1 : /etc/ssh # ls -ltr
total 352
-rw-r--r--    1 root     system          1288 May 01 2007  ssh_config
-rw-r--r--    1 root     system          1155 May 04 2007  sshd_banner
-rw-r--r--    1 root     system          2867 Oct 29 2008  sshd_config
-rw-r-----    1 root     system             7 Aug 20 21:00 sshd.pid
-rw-r--r--    1 root     system          2341 Aug 20 21:19 ssh_prng_cmds
-rw-------    1 root     system        132839 Aug 20 21:19 moduli
-rw-r-----    1 root     system           382 Aug 20 21:45 ssh_host_rsa_key.pub
-rw-------    1 root     system          1679 Aug 20 21:45 ssh_host_rsa_key
-rw-r-----    1 root     system           630 Aug 20 21:45 ssh_host_key.pub
-rw-------    1 root     system           965 Aug 20 21:45 ssh_host_key
-rw-r-----    1 root     system           590 Aug 20 21:45 ssh_host_dsa_key.pub
-rw-------    1 root     system           668 Aug 20 21:45 ssh_host_dsa_key

I copy the
original files back to the /etc/ssh directory. The sshd subsystem is also
restarted to pick up the updated ssh_host* files.

aixlpar1 : /etc/ssh # cp -p /home/cgibson/ssh_orig/ssh_host_* .
aixlpar1 : /etc/ssh # ls -ltr
total 352
-rw-r--r--    1 root     system           210 Feb 03 2006  ssh_host_rsa_key.pub
-rw-------    1 root     system           887 Feb 03 2006  ssh_host_rsa_key
-rw-r--r--    1 root     system           319 Feb 03 2006  ssh_host_key.pub
-rw-------    1 root     system           515 Feb 03 2006  ssh_host_key
-rw-r--r--    1 root     system           590 Feb 03 2006  ssh_host_dsa_key.pub
-rw-------    1 root     system           668 Feb 03 2006  ssh_host_dsa_key
-rw-r--r--    1 root     system          1288 May 01 2007  ssh_config
-rw-r--r--    1 root     system          1155 May 04 2007  sshd_banner
-rw-r--r--    1 root     system          2867 Oct 29 2008  sshd_config
-rw-r-----    1 root     system             7 Aug 20 21:00 sshd.pid
-rw-r--r--    1 root     system          2341 Aug 20 21:19 ssh_prng_cmds
-rw-------    1 root     system        132839 Aug 20 21:19 moduli

aixlpar1 : /etc/ssh # stopsrc -s sshd
0513-044 The sshd Subsystem was requested to stop.

aixlpar1 : /etc/ssh # startsrc -s sshd
0513-059 The sshd Subsystem has been started. Subsystem PID is 3997822.

Until recently, if you were
configuring a new LPAR with virtual FC adapters you couldn’t force it to log
into the SAN before an operating system (such as AIX) was installed. I’ve
written about this before (see link below). I also offered a way to work around
this issue.

I’ve successfully used this method
on both POWER6 (595) and POWER7 (795) systems. After configuring a new LPAR
profile with a single VFC adapter, the VIOS reported that the client was not
logged into the SAN:

If you run out of space in the root
file system, odd things can happen when you try to map virtual devices to
virtual adapters with mkvdev.

For example, a colleague of mine was
attempting to map a new hdisk to a vhost adapter on a pair of VIOS. The VIOS
was running a recent version of code. He received the following error message
(see below). It wasn’t a very helpful message. At first I thought it was due to
the fact that he had not set the reserve_policy
attribute for the new disk to no_reserve
on both VIOS. Changing the value for that attribute did not help.

I found the
same issue on the second VIOS i.e. a full root file system due to a core file
(from cimserver). I also found no trace of a full file system event in the error
report. Perhaps someone had taken it upon themselves to “clean house” at some
point and had removed entries from the VIOS error log.

Make sure
you monitor file system space on your VIOS. Who knows what else might fail if
you run out of space in a critical file system.
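As a hedged illustration (standard AIX commands, run from the VIOS root shell via oem_setup_env), a quick health check might look like this:

df -g /                                         # free space in the root file system, in GB
find / -xdev -type f -size +100000 -exec ls -l {} \;   # hunt for large files (e.g. core files) under /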

Starting with AIX 7.1, CSM is no longer supported or available. It has been replaced by Distributed Systems Management (DSM). Section 5.2 of the IBM AIX 7.1 Differences Guide Redbook provides details of the new DSM capabilities.

Fortunately DSM still provides access to the dsh command. I've written about how I've used this utility in the past. The new dsh command (and other tools) are provided in the new DSM filesets named dsm.core and dsm.dsh.

These filesets are NOT installed by default. You must
manually install them. They can be found on your AIX 7.1 media.

If dsh is something you use, then I recommend you read the section on DSM in the Redbook. Also take a look at section 5.2.7, Using DSM and NIM, which describes how you can integrate DSM and NIM and completely automate the installation of AIX:

“The
AIX Network Installation Manager (NIM) has been enhanced to work with the Distributed
System Management (DSM) commands. This integration enables the automatic
installation of new AIX systems that are either currently powered on or off.”

Although I've written about the dsh command before, there's one usage I've not covered, and that is using dsh to manage users across a group of LPARs. In particular, changing a user's password.

Before I go any further, I should state that for the following to work you must first configure ssh keys on your NIM master (or central management AIX system) so that you can communicate with all of your AIX systems via SSH, as root, without being prompted for a password. Read my article on dsh to find out how to do this if necessary.

In the following example, I use dsh from my NIM master. It is my
central point of control for my AIX environment.

My ssh keys for root on my NIM master
have been generated and distributed to all of my LPARs.

root@nim# ssh-keygen -d
Generating public/private dsa key pair.
Enter file in which to save the key (/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /.ssh/id_dsa.
Your public key has been saved in /.ssh/id_dsa.pub.
The key fingerprint is:
ed:18:e9:00:37:13:7c:7c:74:6a:a9:e0:ad:c0:09:a9 root@nim
The key's randomart image is:
+--[ DSA 1024]----+
(randomart rows not reproduced; the original column spacing was lost)
+-----------------+

root@nim# ls -ltra
total 40
-rw-------    1 root     system      214 17 Sep 2010  authorized_keys
drwxr-xr-x    7 root     system     4096 16 Nov 11:43 ..
-rw-r--r--    1 root     system     3615 16 Nov 12:04 known_hosts
-rw-r--r--    1 root     system      601 16 Nov 12:06 id_dsa.pub
-rw-------    1 root     system      672 16 Nov 12:06 id_dsa
drwx------    2 root     system      256 16 Nov 12:06 .

On my AIX LPARs, the authorized_keys file has been updated with
the public ssh key from my NIM master:
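One hedged way to get the key out there (host name and paths as used in this post; this assumes password authentication is still available for the initial copy) is:

root@nim# cat /.ssh/id_dsa.pub | ssh aixlpar1 "cat >> /.ssh/authorized_keys"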

On the NIM master, the root user was configured for the DSH environment. The following entry was placed in root's .profile:

root@nim# cat /.profile

ENV=$HOME/.kshrc

The following entry was placed in root's .kshrc file:

root@nim# cat /.kshrc
export DSH_NODE_RSH=/usr/bin/ssh
export DSH_NODE_LIST=/usr/local/etc/nodes

A /usr/local/etc/nodes
file was created on the NIM master. This file contains a list of each of the
nodes that dsh can communicate with
from NIM:

root@nim# cat /usr/local/etc/nodes

aixlpar1

aixlpar2

aixlpar3

aixlpar4

aixlpar5

aixlpar6

aixlpar7

aixlpar8

aixlpar9

aixlpar10

aixlpar11

The first time that the dsh command is run against a new host, the following message will be displayed. dsh uses the FQDN, and the FQDN needs to be added to the known_hosts file for ssh. Therefore you must first make an ssh connection to the host using its FQDN:

root@nim# dsh uptime
aixlpar1.cg.com.au: Host key verification failed.
dsh: 2617-009 aixlpar1.cg.com.au remote shell had exit code 255

It is necessary to ssh directly to each node using its
FQDN. This step is only required once for each node. For example:

root@nim# ssh aixlpar1.cg.com.au

The authenticity of host 'aixlpar1.cg.com.au (172.1.6.17)' can't be established.

I set the user's password to abc123 using the chpasswd utility. I also remove the ADMCHG flag so that the user is not prompted to change their password on their first logon attempt.

root@nim# dsh 'echo cg:abc123 | chpasswd -c'

I confirm that I can log on as the new user with the specified password, on one of the AIX LPARs.

root@nim# ssh cg@aixlpar1
cg@aixlpar1's password:
Last login: Thu Mar 1 20:05:01 CST 2012 on /dev/pts/1 from aix71
$ id
uid=204(cg) gid=1(staff)

Another nice feature of dsh is the dshbak utility. This utility presents formatted output from the dsh command. For example:

root@nim 520 [/.ssh]# dsh errpt | dshbak
HOST: aixlpar1.cg.com.au
------------------------------------
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
AA8AB241   1116110811 T O OPERATOR      OPERATOR NOTIFICATION
A6DF45AA   1104135011 I O RMCdaemon     The daemon is started.
2BFA76F6   1104134111 T S SYSPROC       SYSTEM SHUTDOWN BY USER
9DBCFDEE   1104134111 T O errdemon      ERROR LOGGING TURNED ON

HOST: aixlpar2.cg.com.au
-------------------------------
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
DE9A52D1   1111012611 I S rmt10         AAA1
4865FA9B   1111012211 P H rmt10         TAPE OPERATION ERROR
DE9A52D1   1110233511 I S rmt0          AAA1
4865FA9B   1110225511 P H rmt0          TAPE OPERATION ERROR
DE9A52D1   1109180311 I S rmt0          AAA1
4865FA9B   1109180011 P H rmt0          TAPE OPERATION ERROR
DE9A52D1   1108180411 I S rmt2          AAA1
4865FA9B   1108180211 P H rmt2          TAPE OPERATION ERROR
DE9A52D1   1108165711 I S rmt6          AAA1
4865FA9B   1108165111 P H rmt6          TAPE OPERATION ERROR
A2205861   1102085311 P S SYSPROC       Excessive interrupt disablement time
F7FA22C9   1031134111 I O SYSJ2         UNABLE TO ALLOCATE SPACE IN FILE SYSTEM
DE9A52D1   1030163411 I S rmt0          AAA1
4865FA9B   1030163411 P H rmt0          TAPE OPERATION ERROR

....etc....

WARNING: Please be VERY CAREFUL when using the dsh command. Issuing
the wrong command can cause damage to all your AIX LPARS!

The dsm.dsh package contains the following utilities:

# lslpp -f dsm.dsh | grep /usr/bin
/usr/bin/dcp -> /opt/ibm/sysmgt/dsm/bin/dcp
/usr/bin/dsh -> /opt/ibm/sysmgt/dsm/bin/dsh
/usr/bin/dping -> /opt/ibm/sysmgt/dsm/bin/dping
/usr/bin/dshbak -> /opt/ibm/sysmgt/dsm/bin/dshbak

If you are a
fan of the dping command, you are
going to be disappointed. Although the command is currently included in the dsm.dsh fileset, it probably won’t be
for much longer.

The command
works, “sort of”:

root@nim# dping aixlpar1
aixlpar1: ping (alive)

But if you run 'dping -a':

root@nim# dping -a
dping: 2651-095 CSM license has expired or has not been accepted. Run csmconfig -L if you have installed a new release.

According to the developers, dping is no longer supported and will eventually be removed from the DSM package. The response from the developers was as follows:

"The reason "dping -a" is failing with the license check is because the command is calling "/usr/bin/runact-api -c IBM.DmsCtrl::::isLicenseValid" and the license is not set. So the command fails. Since CSM is not supported anymore and went end of life."

“...
please consider the dping command as being "deprecated" code pending
removal from the dsm.dsh package.”

When I asked why the command was listed in the AIX 7.1 online documentation if it was no longer available, I was informed: "We are in the process of working with the component owner regarding the DOCs and updating them." At this stage I've not been able to find an alternative command (in AIX). If I find one, I'll update this post.

If you are
planning on migrating to AIX 7.1 please be aware that CSM is no longer
supported or available with AIX 7.1. CSM is now ‘end of life’.

I was working
with a customer recently on a Power Blade that was running the Integrated
Virtualisation Manager (IVM). They’d installed a VIO partition onto the Blade
and had hoped to install a couple of AIX LPARs on the system. However they didn’t
get very far.

As soon as they
attempted to NIM install the LPARs, they would get stuck at trying to ping the
NIM master from the client. Basically, the Shared Ethernet Adapter (SEA) was
not working properly and none of the LPARs could communicate with the external
network. So they asked for some assistance.

The Blade server
name was Server-8406-71Y-SN06BF99Z. The SEA was configured as ent7.

On the network
switch port, the native VLAN (PVID), was configured as 11, with VLAN tag 68
added as an allowed VLAN. If the client LPARs tried to access the network using
a PVID of 68, instead of a VLAN TAG of 68, they would get stuck at the switch
port i.e. the un-tagged packets for 10.1.68.X via PVID 11 would fail. The
packets for 10.1.68.X needed to be tagged with VLAN id 68 in order for the
switch to pass the traffic.

So the question was, how do we add VLAN tags in the IVM environment? If we'd been using an HMC, then this would be simple to fix. Just add the VLAN tags to the Virtual Ethernet Adapter used by the SEA and we'd be done.

We had to use the
lshwres and chhwres commands to resolve this one. First we listed the virtual
adapters known to the VIO server (IVM). At slot 12, we found our SEA adapter
with port_vlan_id set to 68 and addl_vlan_ids set to none.

We needed to
change port_vlan_id to 11 and addl_vlan_ids to 68. We also required
the ieee_virtual_eth value set to 1.

First we removed the existing SEA adapter, as we would not be able to make changes to it while it was "active". We then removed the adapter from slot 12 and re-added it, again at slot 12, with port_vlan_id and addl_vlan_ids set to the desired values.
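The exact command syntax wasn't captured above, so what follows is only a hedged sketch of the sort of thing we ran (IVM restricted shell as padmin; the slot and VLAN numbers are from this example, while the partition ID and attribute list are assumptions to verify against your IVM level before use):

$ lshwres -r virtualio --rsubtype eth --level lpar
$ rmdev -dev ent7
$ chhwres -r virtualio --rsubtype eth -o r --id 1 -s 12
$ chhwres -r virtualio --rsubtype eth -o a --id 1 -s 12 -a "ieee_virtual_eth=1,port_vlan_id=11,\"addl_vlan_ids=68\",is_trunk=1,trunk_priority=1"

After re-adding the trunk adapter, the SEA would need to be recreated on top of it with mkvdev -sea.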

This little tip was passed on to me by
a friendly IBM hardware engineer many years ago.

When entering a capacity on demand (CoD) code into a Power system, you can tell how many processors and how much memory will be activated, just by looking at the code you've been given by IBM.

For example, the following codes, when
entered for the appropriate Power system, will enable 4 processors (POD) and
64GB of memory (MOD). I can also tell* that once the VET code is entered, this
system will be licensed for PowerVM Enterprise Edition (2C28).

While attending the IBM Power Systems Symposium this week, I learnt that starting with AIX 7.1 (and AIX 6.1 TL6) JFS2 logging is disabled during a mksysb restore. You may be familiar with disabling JFS2 logs; if not, take a look at this IBM technical note:
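If you haven't seen it before, here is a hedged refresher (the file system name is hypothetical; log=NULL is the standard JFS2 mount option for this):

# mount -o log=NULL /testfs
# mount | grep testfs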

I’ve been unable to find any official
documentation from IBM that mentions this new enhancement to the mksysb restore
process. However, when I checked my own AIX 7.1 system I found the following
statement/code in the /usr/lpp/bosinst/bi_main script:

I was performing a volume group re-org i.e. changing the
INTER-POLICY of a logical volume from minimum to maximum.

# lslv fixeslv | grep INTER
INTER-POLICY:   minimum                RELOCATABLE:    yes
# chlv -e x fixeslv
# lslv fixeslv | grep INTER
INTER-POLICY:   maximum                RELOCATABLE:    yes

I attempted to run the reorgvg
command. I was greeted by the following error message!

# reorgvg tempvg fixeslv
0516-966 reorgvg: Unable to create internal map.

I ran the command again, this time with truss. I found that the /usr/sbin/allocp
command was being called and was failing. I determined this must be because of
a lack of space at the logical volume layer.

# /usr/sbin/allocp -?
/usr/sbin/allocp: Not a recognized flag: ?
0516-422 allocp: [-i LVid] [-t Type] [-c Copies] [-s Size]
        [-k] [-u UpperBound] [-e InterPolicy] [-a IntraPolicy]

The truss output showed:

statx("/usr/sbin/allocp", 0x2FF21ED8, 76, 0)            = 0
statx("/usr/sbin/allocp", 0x20009E70, 176, 020)         = 0
kioctl(2, 22528, 0x00000000, 0x00000000)                Err#25 ENOTTY
kfork()                                                 = 3735812
_sigaction(20, 0x00000000, 0x2FF21F20)                  = 0
_sigaction(20, 0x2FF21F20, 0x2FF21F30)                  = 0
kwaitpid(0x2FF21F90, -1, 6, 0x00000000, 0x00000000)     = 3735812

And yes, my volume group was indeed out of free PPs!

# lsvg tempvg
VOLUME GROUP:       tempvg                   VG IDENTIFIER:  00f6027300004c0000000130773bdb73
VG STATE:           active                   PP SIZE:        512 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      99 (50688 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                1                        USED PPs:       99 (50688 megabytes)
OPEN LVs:           1                        QUORUM:         2 (Enabled)
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
PV RESTRICTION:     none

cgaix7[/opt] >

Silly me, it clearly states in the reorgvg man page that there must be at least one free PP in the
volume group for the command to run successfully.

2. At least one free physical partition (PP) must exist on the specified volume group for the reorgvg command to run successfully. For mirrored logical volumes, one free PP per physical volume (PV) is required in order for the reorgvg command to maintain logical volume strictness during execution; otherwise the reorgvg command still runs, but moves both copies of a logical partition to the same disk during its execution.

So I shrunk the file system in question (there was a large amount of allocated but unused file system space, so it was safe to shrink it).
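For reference, a hedged example of freeing up PPs by shrinking a JFS2 file system (the mount point and size here are hypothetical):

# chfs -a size=-512M /fixes
# lsvg tempvg | grep "FREE PPs"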

I can see when my reorgvg failed (rc=1) and when it succeeded (rc=0). This is also a good way of determining when a reorgvg command was issued and when it finished. Of course, an easier way would be to start the reorgvg command with the time command. It will produce a nice little summary of the time taken.

# time reorgvg tempvg fixeslv
0516-962 reorgvg: Logical volume fixeslv migrated.

real    3m12.94s
user    0m1.52s
sys     0m4.60s

But if I forgot to use the time command, I can look at the lvmcfg alog file for an answer. In the following example, the reorgvg.sh process is started at 23:49. The entry in the log file begins with an uppercase S. The entry that starts with an uppercase E indicates the end of the reorgvg.sh process. It is the information in the third field that tells me how long the process ran for, in seconds:milliseconds.
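A hedged example of pulling that information back out (alog is a standard AIX command; the grep string is simply what I would search for):

# alog -ot lvmcfg | grep reorgvg.sh

The S and E entries bracket the run, and the third field gives the elapsed time.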

IBM made some announcements today relating to
their latest POWER7 server offerings. The new line of systems includes new
entry level systems and the highly anticipated high-end system, the POWER7 795!
They also officially outlined some of
the new features available in AIX 7.1. You can review the details here.
I’ve discussed some of these new features here
and here.
The official AIX 7.1 announcement details are available here.

The announcement got me thinking about my
recent customer engagements and why some have chosen to deploy AIX into their
IBM POWER environments, while others are considering a Linux on POWER solution.

I’ve found that it usually comes down to a
skills decision more than anything else. Most customers are happy to either
continue working with AIX (if they are existing AIX users) or migrate from
another UNIX OS to AIX. I’ve seen very few customers actually migrate to Linux
on POWER, but I’ve worked with several that have seriously considered it. Those
that have chosen to deploy Linux are doing so purely because they have in-house
Linux skills. They are concerned that migrating to AIX may be too big a jump
for their technical staff. I find this thinking interesting, as most of the
customers I’ve dealt with who run other UNIX OS’s like Tru64, Solaris or HP-UX
are more than happy to migrate to AIX. They believe the move is relatively
minor and doesn’t require massive re-training of their UNIX admins. I tend to
agree.

For me, AIX
is my preferred “Enterprise class” UNIX Operating System. Notice I’m prefacing
this with the words Enterprise class. Don’t get me wrong, I have worked with Linux
systems in both small and large customer environments. It is a great OS. But
I’ve found that it really only fits into environments that have a relatively
small number of users and where significant downtime can be tolerated for
things like operating system maintenance. This doesn’t fit the Enterprise class
of UNIX server OS’s that I’m thinking of here. When I contemplate the word Enterprise, I think of servers and
operating systems that can respond to business demands in terms of performance,
reliability, stability and availability. An Enterprise UNIX can provide all of
these things without compromise. Linux can offer performance and reliability
(in my opinion). However, from what I’ve seen, it lacks features & functions
in the areas of stability and availability. AIX on the other hand ticks all the
boxes. Again, this is just my opinion based on my experiences with both AIX and
Linux in the Enterprise landscape. Others will no doubt have their own
experiences that may or may not match my own.

So when I’m designing an Enterprise UNIX
server environment for a customer, I always start with an AIX on POWER base. If
the customer wants Linux, sure I can look at that too, but I strongly recommend
AIX as my preferred choice for large systems. Most of my customers are running
relatively large SAP/Oracle systems. AIX on POWER is a great combination for
large Enterprise systems. If you need to deploy large database systems that
must service tens of thousands of users (like a big SAP system), then I believe
AIX is the perfect OS on which to provide a platform for these large scale
systems.

AIX is a very mature and powerful UNIX OS. It
has been a major player in the UNIX server market for over 20 years (as shown
below). Some people are just not aware of how mature, robust and stable the AIX
OS has become over the years. There are many impressive aspects of the OS in
the areas of performance, scalability, reliability, management and
administration.

Just looking at some of the administration
capabilities built into AIX are enough for me to always recommend AIX over
Linux (or any other UNIX OS), when it comes to large Enterprise servers.

For example, the System Management Interface Tool (SMIT) can make the UNIX admin's life a lot simpler.
This is an interactive tool that is part of the AIX OS. Almost all tasks that
an AIX administrator may need to perform can be executed using this tool. It is
a text-based tool (there is also an X interface but I recommend sticking with
the text-based menus). Everything it does, it does through standard AIX
commands and Korn shell functions. This feature is especially useful when you
need to automate a repetitive task; you can have SMIT create the proper
command-line sequence, and you can then use those commands in your own script.
My compatriot, Anthony English, has a nice intro to SMIT on his AIX blog.

The
AIX Logical Volume Manager (LVM)
is built into the OS, for free. AIX LVM helps UNIX system administrators manage their
storage in a very flexible manner. LVM allows logical
volumes to span multiple physical volumes. Data on logical volumes appears to
be contiguous to the user, but might not be contiguous on the physical volume.
This allows file systems, paging space, and other logical volumes to be resized
or relocated, span multiple physical volumes, and have their contents
replicated for greater flexibility and availability. It provides capabilities for mirroring data across disks, migrating data across disks and storage subsystems, expanding and shrinking file systems, and more, all of which can be performed dynamically, with no downtime required. The concept, implementation and interface of the AIX LVM is one of a kind. All of its features support the 'continuous availability' philosophy.

One of the biggest reasons that I love AIX
over Linux is the mksysb. It’s built
into the OS and allows you to create a bootable image of your AIX system. This
image can be used to restore a broken AIX system or for cloning other systems.
The cloning feature is truly amazing. You can take an image created on a
low-end system and deploy it on any POWER system, all the way up to the
high-end POWER boxes. This simplifies the installation and cloning process
when you need to install and manage many AIX LPARs. By using an SOE mksysb
image you can deploy consistent AIX images across your Enterprise POWER server
environment.

This brings me to another wonderful feature of
AIX, the Network Installation Manager
(NIM).
NIM is a powerful network installation tool (comparable to Linux Kickstart).
Using NIM you can backup/restore, update and upgrade one or more AIX systems
either individually or simultaneously. This can all be achieved over a network
connection, removing the need for handling physical installation media forever.

Another fine example of AIX's superior OS management tools is multibos. This tool allows an AIX administrator to
create and maintain two separate, bootable instances of the AIX OS within the
same root volume group (rootvg). This second instance of rootvg is known as a
standby Base Operating System (BOS) and is an extremely handy tool for
performing AIX TL and Service Pack (SP) updates. Multibos lets you install,
update and customize a standby instance of the AIX OS without impacting the running and active production instance of the
AIX OS. This is valuable in environments with tight maintenance windows.

When it comes to upgrading the OS to a new
release of AIX, the nimadm
utility can assist the administrator greatly in this task. The nimadm
utility offers several advantages. For example, a system administrator can use nimadm to create a copy of a
NIM client's rootvg and migrate the disk to a newer version or release of AIX.
All of this can be done without
disruption to the client (there is no outage required to perform the
migration). After the migration is finished, the only downtime required will be
a single scheduled reboot of the system.

AIX 6.1 introduced a new capability that most UNIX operating systems are still working on: Concurrent Updates of the AIX Kernel, without a reboot! IBM is always working hard at making AIX an OS
that can provide continuous availability,
even if it needs to be patched. AIX now has the ability to update certain kernel
components and kernel extensions in place, without needing a system reboot. In
addition, concurrent updates can be removed from the system without needing a
reboot. Can you do this with other UNIX OSs?

And that’s just some of the features that make
AIX the only UNIX OS that I recommend
for Enterprise systems. There are many more tools and features that I couldn’t
live without (like alt_disk_install, savevg, installp, WPARs, etc, the list
goes on). If you are new to AIX and you are considering what your next UNIX OS
should be, then I recommend you take a very
close look at AIX.

And finally, the support provided by IBM is first class. Whenever I've needed assistance with an AIX issue or query, I have
always received timely, professional and useful advice. On the rare occasions
where I’ve uncovered a new bug, IBM AIX support have always been quick to
provide me with an interim fix to resolve or workaround a problem. That’s the
sort of support you’d expect for an Enterprise UNIX OS, isn’t it? What’s the
support like from your current UNIX (or Linux) OS vendor?

Linux is still a viable UNIX operating system.
However, I think it’s more suited to certain workloads like small to medium
size mail, web and other utility servers and services. AIX, however, would be
my platform of choice for my 10TB Oracle database running SAP ERP, not just for
performance reasons, but primarily because of the system administration
features of AIX that allow me to support and manage the system without
impacting my customers or enforcing reboots/outages whenever I need to change
something on the system.

All IBM need to do is create their own Linux
distribution (Blue Linux perhaps?)
that has all the features of AIX built in and then I’m sold! But why would
they? We already have AIX.

Whenever I'm building a new AIX system I always make sure to install lsof. I really like the fact that I can quickly list processes that are connected to TCP and UDP ports on my system. For example, to check for the current SSH connections on my system I can run lsof and check port 22 (SSH). Immediately I have a good idea of the existing SSH sessions/connections. I can also check to see if the SSH server (sshd daemon) is running and listening (LISTEN) on my AIX partition.
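For example, a hedged one-liner (standard lsof syntax; output omitted):

# lsof -i :22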

But
sometimes I work on systems that don’t have lsof installed. It may not be practical or appropriate for me to
install it either. So I have
to find another tool (or tools) that will do something similar.

Of course, I could use netstat to check that a server daemon is listening on a particular TCP port and view any established connections. But this doesn't give me the associated process IDs.

$ netstat -a | grep -i ssh
tcp4       0      0  *.ssh               *.*                   LISTEN
tcp4       0     48  aix01.ssh           172.29.131.16.50284   ESTABLISHED

Fortunately,
the rmsock command
can provide that information. So if I wanted to find the process id for the
sshd daemon that is listening on my system I’d do the following. First I need
to find the socket id using netstat*.

# netstat -@aA | grep -i ssh | grep LIST | grep Global
Global   f1000700049303b0 tcp4       0      0  *.ssh          *.*            LISTEN

Then I can use rmsock to discover the process id associated with the socket. In this case it's PID 282700.

$ rmsock f1000200003e9bb0 tcpcb
The socket 0x3e9808 is being held by proccess 282700 (sshd).

Unlike what its name implies, rmsock
does not remove the socket, if it is being used by a process. It just reports
the process holding the socket. Note that the second argument of rmsock is the protocol. It's tcpcb
in this example to indicate that the protocol is TCP. The results of the
command are also logged to /var/adm/ras/rmsock.log.

# tail /var/adm/ras/rmsock.log
socket 0xf100020001c45008 held by process 434420 (writesrv) can't be removed.
socket 0xf100020000663008 held by process 418040 (java) can't be removed.
socket 0xf1000200012ad008 held by process 418040 (java) can't be removed.
socket 0xf100020000dec008 held by process 163840 (inetd) can't be removed.
socket 0xf100020000deb008 held by process 163840 (inetd) can't be removed.
socket 0xf10002000016f808 held by process 192554 (snmpdv3ne) can't be removed.
socket 0xf100020001c51808 held by process 442596 (dtlogin) can't be removed.
socket 0xf1000200012a4008 held by process 418040 (java) can't be removed.
socket 0xf100020000666008 held by process 315640 (java) can't be removed.
socket 0xf100020000deb808 held by process 163840 (inetd) can't be removed.

*Note: In my example I specified the @ symbol with the netstat command. I also grep'ed for the string Global. You may have to do the same if you have WPARs running on your system. In my case I have two active WPARs that both have their own sshd process. My Global environment also has an sshd process, so in total there are three sshd daemons that I can view from the Global environment. By specifying the @ symbol with netstat, I can quickly determine which processes belong to the Global environment and which exist within each WPAR.

Essentially
this program is designed to give IBM customers, ISVs and IBM BPs the
opportunity to gain early experience with the latest release of AIX prior to
general availability. This is a great time to join forces and help IBM mould
the next generation of the AIX OS.

I got involved in the AIX 6 Open Beta back in 2007. It was a worthwhile experience. The time I spent learning new features like WPARs and RBAC put me in a good position when it came time to actually implement these outside of my lab environment. It was also a good opportunity to provide feedback to the IBM AIX development community. Several AIX developers monitored the comments/questions in the Beta Forum and provided advice (and sometimes fixes) for known (and unknown!) issues with the beta release. It also provided the developers with plenty of real world feedback that they could take back to the labs, long before the product was officially released. This certainly helped fix bugs and improve certain enhancements before customers started using the OS in their computing environments.

The Getting Started guide provides useful
information that you will need to know before attempting to install the OS. For
example, the beta code will run on any IBM System p, eServer pSeries or POWER system
that is based on PPC970, POWER4, POWER5, POWER6 or POWER7 processors.

The guide
also describes what new functionality has been included in this release of the
beta. If this program is anything like the AIX 6 beta, there may be more than
one release of the code, with further enhancements available in each release.
The new function in this release includes:

A. AIX 5.2 Workload Partitions for AIX 7 provides the capability to create a WPAR running AIX 5.2 TL10 SP8. This allows a migration path for an AIX 5.2 system running on old hardware to move to POWER7. All that is required is to create a mksysb image of the AIX 5.2 system and then provide this image when creating the WPAR. The WPAR must be created on a system running AIX 7 on POWER7 hardware. This is a very interesting feature, one that I am eager to test (see the sketch after this list).

B. Removal of WPAR local storage device restrictions.

AIX 7 will allow a virtual or physical fibre channel adapter to be exported to a WPAR. The WPAR will essentially own the physical adapter and its child devices. This will allow SAN storage devices to be directly assigned to the WPAR's FC adapter(s). This means it will not be necessary to provision the storage in the Global environment first and then export it to the WPAR. This is also interesting, as we may now be able to assign SAN disk to a WPAR for both rootvg and data volume groups. Maybe even FC tape devices within a WPAR will work?

D. EtherChannel enhancements in 802.3ad mode.

There are some enhancements to AIX 7 EtherChannel support for 802.3ad mode. The enhancement makes sure that the link is LACP ready before sending data packets. If I'm interpreting this correctly, this will ensure that the aggregated link is configured appropriately. If it's not, an error will be logged in the AIX errpt stating that the link is not configured correctly. This can help avoid situations where the AIX EtherChannel is configured but the network switch is not. At present there is very little an AIX administrator can do, as the link will appear to be functioning even if the switch end has not been configured for an aggregated link.
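As a hedged sketch of item A above (the mksysb path and WPAR name are hypothetical, and the versioned WPAR support filesets must be installed first):

# mkwpar -n wpar52 -C -B /export/mksysb/aix52_sys.mksysb
# startwpar wpar52
# clogin wpar52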

And the most important point in the Getting Started guide has to be how to install the AIX beta code! An ISO image of the code is provided for download. The installation steps are straightforward as the image can be installed via a DVD device. Using a media repository on a VIO server could be one way to accomplish this task (see the sketch after the install steps below). Unfortunately, there is no mention of NIM install support yet. Here are the basic steps from the guide:

Installing the AIX Open Beta Driver

The AIX 7 Open Beta driver is delivered as a DVD ISO image, downloaded from the AIX Open Beta web-site, containing a system backup (mksysb) which is restored to install the code.

Once you have downloaded and created the AIX 7 Open Beta media (as described above), follow these steps to install the mksysb.

1. Put the DVD of the AIX Open Beta in the DVD drive. A series of screens/menus will be displayed. Follow the instructions on the screens and make the following selections:

• Type 1 and press Enter to have English during install.

• Type 1 to continue with the install.

• Type 1 to Start Install Now with Default Setting.

2. The system will start installing the AIX 7.0 BETA.

3. Upon completion of the install, the system will reboot. You can then log in as “root”; no password is required.
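
As I mentioned above, one way to present the downloaded ISO to an LPAR is via a Virtual Media Repository on a VIO server. Here's a rough sketch of how I'd go about it; the vhost adapter, repository size and file names are assumptions for illustration only:

# On the VIO server (as padmin): create a media repository and
# import the downloaded AIX 7 Open Beta ISO
mkrep -sp rootvg -size 8G
mkvopt -name aix7beta -file /home/padmin/aix7beta.iso -ro

# Create a file-backed virtual optical device on the client's vhost
# adapter and load the ISO into it
mkvdev -fbo -vadapter vhost0
loadopt -disk aix7beta -vtd vtopt0

# On the client LPAR, the virtual optical device appears as a cd device,
# which can then be selected as the install/boot device in SMS
lsdev -Cc cdrom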

Next, I recommend that you take a look at the Release Notes. They provide a few bits of information that may come in handy when planning for the install, such as:

• The Open Beta code is delivered via an mksysb install image. Migration installation is not supported with the open beta driver.

• The open beta driver does not support the IBM Systems Director Agent.

• When installing on a disk smaller than 15.36 GB, the following warning is displayed: "A disk of size 15360 was specified in the bosinst.data file, but there is not a disk of at least that size on the system." You can safely ignore this warning.

• The image is known to install without issues on an 8 GB disk.

• oslevel output shows V7BETA.

By the
way, if you are unable to find a spare system or LPAR on which to install the
beta, perhaps you can consider using the IBM Virtual Loaner Program (VLP).
They are planning to support LPARs running the AIX 7 Open Beta starting from July 17th. I use the VLP all the time and have found it to be a fantastic way to try new things without the need for, or expense of, my own IBM POWER system. There are some drawbacks, such as not having access to your own dedicated hardware, HMC, VIO server or NIM master, but it's still great if you just want to test something on an AIX system.

I’ll
report back once I’ve got my AIX 7 Open Beta system up and running!

If you are a system administrator who is responsible for managing AIX systems that run SAP, then you've probably had an experience similar to the following.

OK, so one
day my SAP Basis administrator contacts me and says “I can’t start saposcol......can
you please reboot the system?” I quickly reply, “Was there an error message when trying to restart saposcol?”. He
replies, “No”. Again I return very
quickly, “OK, have you checked to see if
there are any shared memory segments left for saposcol?”. Just as quick, he replies “How do I do that?”

Together
we try starting saposcol and what we
find is that it thinks it’s already running (as shown below, PID 327924). But there is no such process!

My
conclusion is that there must be a shared memory segment still allocated for
saposcol. There were many other SAP processes still running happily, so there
were several shared memory segments to sift through. So, what shared memory ID
does saposcol use?

Now, according
to the following website, shared memory key 4dbe is used by saposcol on AIX.

Answer: Sometimes it may be
necessary to remove the shared memory key of saposcol (see point 6). Caution:
Please be very careful! This procedure should be performed only after checking
that saposcol is really not running (see point 4) and only in cases when other
options (see point 6) really do not work! For this, execute command “ipcs -ma”
and note the line that contains saposcol key “4dbe”. You need the shared
memory ID. After that, execute command ipcrm -m ID. Now the command “saposcol
-s” should show that saposcol is not running and that the shared memory is not
attached. The shared memory key will be created automatically by the saposcol when
the collector is next started: “saposcol -l”.

So I run ipcs to check for the existence of key 4dbe, and I find an entry for it.

There are several process IDs 'attached' to this segment. However, only one of them actually exists (PID 2293794).
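
For anyone who hits the same problem, here's roughly the sequence we followed, based on the SAP note above (replace <shmid> with the ID from your own ipcs output, and only remove the segment once you're sure saposcol is not running):

# Confirm saposcol really isn't running
ps -ef | grep -v grep | grep saposcol

# Find the shared memory segment with the saposcol key (4dbe)
ipcs -ma | grep 4dbe

# Remove the stale segment using the shared memory ID from the output above
ipcrm -m <shmid>

# saposcol should now report that it is not running; then restart the collector
saposcol -s
saposcol -l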

Several new features
were mentioned in the launch, but there were two new features that I found
particularly interesting:

- AIX 5.2 WPARs
for AIX 7

- Cluster
Aware AIX

I thought I would
briefly describe each feature in this post.

AIX 5.2 WPARs
for AIX 7

In AIX version 7, administrators will have the capability to create Workload Partitions (WPARs) that run AIX 5.2 inside an AIX 7 operating system instance. This will be supported on the POWER7 server platform. This is pretty cool. IBM have done this to allow some customers that are unable to migrate to later generations of AIX and Power to move to POWER7 whilst keeping their legacy AIX 5.2 systems operational. So for those clients that MUST stay on AIX 5.2 (for various reasons, such as application support) but would like to run their systems on POWER7, this feature may be very attractive. It will help to reduce the effort required when consolidating older AIX 5.2 systems onto newer hardware. It may also reduce some of the risk associated with migrating applications from one version of the AIX operating system to another.

To migrate an existing
AIX 5.2 system to an AIX 7 WPAR, administrators will first need to take a
mksysb of the existing system. Then they can simply restore the mksysb image
inside the AIX 7 WPAR. IBM will also offer limited defect and how-to support
for the AIX 5.2 operating system in an AIX 7 WPAR. These WPARs can, of course,
be managed via IBM Systems Director with the Workload Partitions Manager
plug-in.
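
Based on what has been announced, I'd expect the creation of one of these versioned WPARs to look something like the sketch below (the WPAR name and mksysb path are mine, and since this is pre-GA code the exact flags may change):

# Create a versioned (AIX 5.2) WPAR from a mksysb of the old system.
# Assumes the AIX 5.2 WPARs for AIX 7 (vwpar) filesets are installed.
# -C = create a versioned WPAR, -B = the AIX 5.2 mksysb image to restore
mkwpar -n aix52wpar -C -B /export/mksysb/aix52host.mksysb

# Start the WPAR and check the OS level inside it
startwpar aix52wpar
clogin aix52wpar oslevel -r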

The following figure
provides a visualization of how these AIX 5.2 systems will fit into an AIX 7
WPAR. The WPARs in blue are native AIX 7 WPARs, while the WPARs in orange are
AIX 5.2 WPARs running in the same AIX 7 instance. Pretty amazing really!

I can only speculate as to what other versions of AIX will be supported in this manner in the future. Just imagine an AIX 5.3 WPAR inside AIX 7!

Cluster
Aware AIX

Another very interesting feature
of AIX 7 is a new technology known as “Cluster
Aware AIX”. Believe it or not, administrators will now be able to create a
cluster of AIX systems using features of the new AIX 7 kernel. IBM have introduced
this “in built” clustering to the AIX OS in order to simplify the configuration
and management of highly available clusters. This new AIX clustering has been
designed to allow for:

- The easy creation
of clusters of AIX instances for scale-out computing or high availability.

- Capabilities such as common device naming to help simplify administration.

- Built in event management and
monitoring.

- A foundation for
future AIX capabilities and the next generation of PowerHA SystemMirror.

This does not replace PowerHA, but it does change the way in which AIX traditionally integrates with cluster software like HACMP and PowerHA. A lot of the HA cluster functionality is now available in the AIX 7 kernel itself. However, the mature RSCT technology is still a component of the AIX and PowerHA configuration. I'm looking forward to reading more about this new technology and its capabilities.
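
From the little that has been published so far, it looks like the clustering will be driven by new AIX commands along these lines. This is only a sketch of what I expect, with made-up node and disk names; the repository disk would need to be shared between the nodes:

# Create a two-node cluster with a shared repository disk
mkcluster -n myclust -m nodeA,nodeB -r hdisk9

# List cluster membership and network interface state
lscluster -m
lscluster -i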

These are just two of the many
features introduced in AIX 7. I’m eagerly looking forward to what these
features and others mean for the future of the AIX operating system. It’s
exciting to watch this operating system grow and strengthen over time. I can’t
wait to get my hands on an AIX 7 system so that I can trial these new features.

And speaking of trialing AIX 7, there is good news. IBM plan on running another AIX Open Beta program for AIX 7 in mid-2010. Just as they did with AIX Version 6, customers will be given the opportunity to download a beta version of AIX 7 and trial it on their own systems in their own environment. This is very exciting and I'm really looking forward to it.

I encourage you to read the
official AIX 7 announcement to learn more about the future of the AIX operating
system and what it can do for you and your organization…..for many years to
come!

Just the other day, I needed to use the AIX splitvg command in order to copy some data from one system to another.

I thought I’d share the experience here.

The splitvg command can split a single mirror copy of a fully mirrored volume group into a separate “snapshot” volume group.

From the man page:

The original volume group VGname will stop using the disks that are now part of the snapshot volume group SnapVGname. Both volume groups will keep track of the writes within the volume group so that when the snapshot volume group is rejoined with the original volume group, consistent data is maintained across the rejoined mirror copies. Notes:

1. To split a volume group, all logical volumes in the volume group must have the target mirror copy and the mirror must exist on a disk or set of disks. Only the target mirror copy must exist on the target disk or disks.

2. The splitvg command will fail if any of the disks to be split are not active within the original volume group.

3. In the unlikely event of a system crash or loss of quorum while running this command, the joinvg command must be run to rejoin the disks back to the original volume group.

4. There is no concurrent or enhanced concurrent mode support for creating snapshot volume groups.

5. New logical volumes and file system mount points will be created in the snapshot volume group.

6. The splitvg command is not supported for the rootvg.

7. The splitvg command is not supported for a volume group that has an active paging space.

8. When the splitvg command targets a concurrent-capable volume group which is varied on in non-concurrent mode, the new volume group that is created will not be varied on when the splitvg command completes. The new volume group must be varied on manually.

So, looking at point 4, above, if you are using enhanced concurrent volume groups (for example with PowerHA), then you will not be able to use the splitvg command. This is disappointing as this would have been very handy in some of the large PowerHA systems I have worked with…..perhaps this will be supported in the future?

Anyway, back to my example. I wanted to break off one of the mirrors of a mirrored volume group and then assign the "split" volume group to another host to copy some data off.

The volume group datavg contained two disks, hdisk0 and hdisk3, as shown in the lspv output below.

The new volume group contains a new logical volume (prefixed with fs, i.e. fsfslv00) and a file system (prefixed with /fs, i.e. /fs/data). I can mount this file system, access the data in it, and create and/or modify files (as shown below).
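
Putting it all together, here's roughly the sequence I used, as a sketch. The -i flag makes the split permanent (rather than a rejoinable snapshot), which suited my copy-off use case; the import step on the other host is an assumption about how you'd pick the copy up on the target side:

# datavg is mirrored across hdisk0 and hdisk3; split the second copy
# off into a new, independent volume group called snapvg
splitvg -y snapvg -c 2 -i datavg

# Mount the break-off copy of the file system and check the data
mount /fs/data
ls /fs/data

# To hand the split copy to another host: unmount, vary off and export it,
# re-assign hdisk3 to the target host, then on the target run something
# like: importvg -y snapvg hdiskN ; mount /fs/data
umount /fs/data
varyoffvg snapvg
exportvg snapvg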

During the testing of the migration process we noticed that some of
the sys0 tunables were being reset to their default settings after the
migration had completed. This was rather odd. I’d never had this issue during
an AIX migration in the past.

We noticed the following attributes had changed:

fullcore - was set to true before migration; set to false after migration.

iostat - was set to true before migration; set to false after migration.

maxuproc - was set to 2048 before migration; set to 128 after migration.

The maxuproc value was of particular concern as it has an impact on the number of processes an application (user) can start. So when one of our SAP/Oracle test systems was unable to start because maxuproc was set too low, we were very puzzled. After we discovered that maxuproc was incorrect, we changed it to the appropriate value and restarted SAP/Oracle successfully. We were then very determined to identify the root cause of this issue. We could not see any issue with our migration process (via nimadm) and decided to log a call with IBM.
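
While we waited on the root cause, the quick fix was simply to compare the sys0 attributes against our standard build and put them back (the values below are ours; adjust to suit your environment):

# Check the current values of the affected sys0 attributes
lsattr -El sys0 -a fullcore -a iostat -a maxuproc

# Re-apply our standard settings (these take effect immediately, no reboot)
chdev -l sys0 -a fullcore=true -a iostat=true -a maxuproc=2048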

IBM AIX support were able to assist us in determining the problem.

---------------------

After building more debug methods and performing further debug, which involves multiple restore attempts, we figured out the root cause.

- During second phase of boot, when cfgsys_chrp run, it tries to set all customized values for sys0 device. However, in this process, if and when an error occurs, all customized values for sys0 will be reset to allow the system to boot. (Instead of hang/crash).

- In the case of the customer's scenario, when cfgsys_chrp() tries to set ncargs to value of 30, it fails with an error. Reason being, for AIX 6.1, minimum value for ncargs is 256. If it is less than 256, the kernel returns an error and then cfgsys_chrp "resets" all customized attribute