I was working at a client site today, on a NIM master that I
configured a month or so ago. I was there to install the TSM backup client
software on 30 or so LPARs. Of course I was going to use NIM to
accomplish this task.

The software install via NIM worked for the majority of the LPARs
but I noticed a few of them were failing. This was very odd, as the last time
I’d used the same NIM method to install software, everything was fine.

I suspected that perhaps something had changed on the client
LPARs...maybe with their /etc/niminfo
file for instance. So I performed the following steps to reconfigure the /etc/niminfo file
and configure the nimsh subsystem on
the client LPAR.

lpar1# mv /etc/niminfo /etc/niminfo.old

lpar1# niminit -a master=nim1 -a name=`hostname`

lpar1# stopsrc -s nimsh

lpar1# smit nim_config_services

Configure Client Communication Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                          [Entry Fields]
* Communication Protocol used by client               [nimsh]                +

  NIM Service Handler Options
*   Enable Cryptographic Authentication               [disable]              +
      for client communication?
    Install Secure Socket Layer Software (SSLv3)?     [no]                   +
    Absolute path location for INSTALLP package       [/dev/cd0]             /
         -OR-
    lpp_source which contains INSTALLP package        []                     +

  Alternate Port Range for Secondary Connections
    (reserved values will be used if left blank)
    Secondary Port Number                             []                     #
    Port Increment Range                              []                    +#

The last step failed with the following error message:

0042-358 niminit: The connect attribute may only be assigned a service value
of "shell" or "nimsh".

I checked the NIM client and confirmed it was configured for nimsh. However, I
did notice something odd when I ran the following command:

lpar1# egrep 'nimsh|nimaux' /etc/services
lpar1#

The entries for nimsh
were missing from the /etc/services
file!

Somebody had decided that these entries were not required and had
simply removed them! Gee, thanks so much for that!

After adding the following entries back into the services file,
everything started working again!

nimsh     3901/tcp     # NIM Service Handler
nimsh     3901/udp     # NIM Service Handler
nimaux    3902/tcp     # NIMsh Auxiliary Port
nimaux    3902/udp     # NIMsh Auxiliary Port
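If you hit the same problem, a quick way to put the entries back and bounce the service handler might look like this (a sketch only; adjust for your environment):

lpar1# cat >> /etc/services <<'EOF'
nimsh     3901/tcp     # NIM Service Handler
nimsh     3901/udp     # NIM Service Handler
nimaux    3902/tcp     # NIMsh Auxiliary Port
nimaux    3902/udp     # NIMsh Auxiliary Port
EOF
lpar1# stopsrc -s nimsh; startsrc -s nimsh
lpar1# lssrc -s nimsh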

I’ve also encountered this error when there is another
process (other than nimsh) using
port 3901 or 3902.
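If you suspect a port conflict rather than missing entries, one way to check (a hedged sketch, not something from the original incident) is to find the socket with netstat and map it back to its owning process with rmsock:

lpar1# netstat -Aan | grep -E '\.390[12] '
lpar1# rmsock <socket_address_from_netstat_column_1> tcpcb

The placeholder above is just that: substitute the protocol control block address that netstat reports for port 3901 or 3902.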

Another error message you might encounter, if those entries are either missing
or commented out, appears on the NIM master:

nimmast# nim -o showlog -a log_type=lppchk lpar1

0042-001 nim: processing error encountered on "master":
0042-006 m_showlog: (From_Master) connect Error 0
poll: setup failure

I thought I’d also mention another error message that
can potentially drive you insane (especially if you haven’t had your morning
coffee!). The error doesn’t relate to nimsh
at all but I thought I’d describe it anyway. The message appears when running
the nim -o showlog command against a
client LPAR.

nimmast# nim -o showlog lpar1

0042-001 nim: processing error encountered on "master":
0042-006 m_showlog: (From_Master) connect Error 0
0042-008 nimsh: Request denied - wronghostname

I’ve modified the output a little to make it easier to
identify the problem. Can you see it? I thought so! Upon investigation you may
find that the IP address for the NIM master is resolving to a different
hostname on the client. For example:

On the NIM master:

nimmast# host nimmast
nimmast is 172.29.150.177

nimmast# host 172.29.150.177
nimmast is 172.29.150.177

nimmast# grep 177 /etc/hosts
172.29.150.177    nimmast

On the NIM client:

lpar1# host nimmast
nimmast is 172.29.150.177

lpar1# host 172.29.150.177
wronghostname is 172.29.150.177

lpar1# grep 172.29.150.177 /etc/hosts
172.29.150.177    wronghostname
172.29.150.177    nimmast

In this example, someone placed two host entries in /etc/hosts with the same IP address. The client was resolving the IP address
to an incorrect hostname. This resulted in our nim -o showlog command failing.
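The fix, sketched here rather than taken from the original incident, is simply to remove or correct the stray /etc/hosts entry on the client so that the NIM master's IP address resolves to the right name, then re-test:

lpar1# vi /etc/hosts      <- remove the '172.29.150.177  wronghostname' line
lpar1# host 172.29.150.177
nimmast is 172.29.150.177

If you rely on DNS as well, remember that /etc/netsvc.conf controls the name resolution order on AIX.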

In order to create the WPAR, I needed an AIX 5.2 mksysb file to
supply to the mkwpar command.

Fortunately, I just happened to have an old AIX 5.2 mksysb image
in my archives!

I then executed the following command to build the WPAR:

# mkwpar -n wpar1 -C -B /home/cgibson/AIX5202_64bit-mksysb

The flags to the command are:

-n wparname

Specifies the name for the workload partition to be
created. You must specify a name, either using the -n flag or in a
specification file using the -f flag, unless the -p name or both -w and -o
flags are used.

-B wparbackupdevice

Specifies a device containing a workload partition
backup image. This image is used to populate the workload partition file
systems. The wparBackupDevice parameter is a workload partition image that is
created with the savewpar, mkcd, or mkdvd command. The -B flag is used by the
restwpar command as part of the process of creating a workload partition from a
backup image.

-C

Creates a versioned workload partition. This option
is valid only when additional versioned workload partition software has been
installed.
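Before running mkwpar -C it is worth confirming that the versioned WPAR support software is actually installed in the Global environment. A hedged check (the exact fileset names may differ by release; vwpar.52 is what I would expect for AIX 5.2 support):

# lslpp -l "vwpar.*"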

I was then able to start my new AIX 5.2 WPAR successfully!

# startwpar -v wpar1

Starting workload partition wpar1.

Mounting all workload partition file systems.

Mounting /wpars/wpar1

Mounting /wpars/wpar1/home

Mounting /wpars/wpar1/mksysb

Mounting /wpars/wpar1/nre/opt

Mounting /wpars/wpar1/nre/sbin

Mounting /wpars/wpar1/nre/usr

Mounting /wpars/wpar1/opt

Mounting /wpars/wpar1/proc

Mounting /wpars/wpar1/tmp

Mounting /wpars/wpar1/usr

Mounting /wpars/wpar1/usr/local

Mounting /wpars/wpar1/var

Mounting /wpars/wpar1/var/log

Mounting /wpars/wpar1/var/tsm/log

Loading workload partition.

Exporting workload partition devices.

Exporting workload partition kernel extensions.

Starting workload partition subsystem cor_wpar1.

0513-059 The cor_wpar1 Subsystem has been started.
Subsystem PID is 8388822.

Verifying workload partition startup.

Return Status = SUCCESS.

The WPAR was now in an active state and the associated file
systems were mounted (as shown from the Global
environment).

# lswpar

Name    State  Type  Hostname  Directory      RootVG WPAR
----------------------------------------------------------
wpar1   A      S     wpar1     /wpars/wpar1   no

# mount | grep wpar

/dev/lv00    /wpars/wpar1              jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv01    /wpars/wpar1/home         jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv02    /wpars/wpar1/mksysb       jfs     Jul 26 20:13  rw,log=/dev/loglv00
/opt         /wpars/wpar1/nre/opt      namefs  Jul 26 20:13  ro
/sbin        /wpars/wpar1/nre/sbin     namefs  Jul 26 20:13  ro
/usr         /wpars/wpar1/nre/usr      namefs  Jul 26 20:13  ro
/dev/lv03    /wpars/wpar1/opt          jfs     Jul 26 20:13  rw,log=/dev/loglv00
/proc        /wpars/wpar1/proc         namefs  Jul 26 20:13  rw
/dev/lv04    /wpars/wpar1/tmp          jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv05    /wpars/wpar1/usr          jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv06    /wpars/wpar1/usr/local    jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv07    /wpars/wpar1/var          jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv08    /wpars/wpar1/var/log      jfs     Jul 26 20:13  rw,log=/dev/loglv00
/dev/lv09    /wpars/wpar1/var/tsm/log  jfs     Jul 26 20:13  rw,log=/dev/loglv00

I was curious what the WPAR environment was going to look like, so
I used clogin to access
it and run a few commands.

From the Global environment I confirmed I was indeed on an AIX 7
system.

# uname -W

0

# oslevel

V7BETA

From within the WPAR, I confirmed that I was indeed running AIX
5.2! Wow!

# clogin wpar1

wpar1 : / # oslevel

5.2.0.0

And I could see all 8 logical CPUs (4 hardware threads per POWER7 core, i.e. SMT-4).

wpar1 : / # sar -P ALL 1 5

AIX wpar1 2 5 00F602734C00    07/26/10

wpar1 configuration: lcpu=8 mem=4096MB ent=0.50

20:22:20  cpu  %usr  %sys  %wio  %idle  physc  %entc
20:22:21    0    77     8     1     14   0.01    0.0
            1     1    70     0     29   0.01    0.0
            2     0     1     0     99   0.00    0.0
            3     0     0     0    100   0.01    0.0
            4     0    35     0     65   0.00    0.0
            7     0    28     0     72   0.00    0.0
            U     -     -     0     93   0.47   93.9
            -     0     3     0     97   0.03    0.0

I noticed an interesting device in the lscfg output.

wpar1 : / # lscfg

INSTALLED RESOURCE LIST

The following resources are installed on the
machine.

+/- = Added or deleted from Resource List.

* = Diagnostic support not available.

Model Architecture: chrp
Model Implementation: Multiple Processor, PCI bus

+ sys0      System Object
* wio0      WPAR I/O Subsystem

I also noticed some new and interesting mount points, for example /nre/opt.

wpar1 : / # df

Filesystem   512-blocks      Free  %Used   Iused  %Iused  Mounted on
Global           131072     99928    24%    1424      5%  /
Global           131072    126704     4%      70      1%  /home
Global          1048576   1015560     4%      17      1%  /mksysb
Global           786432    428904    46%    7331     14%  /nre/opt
Global           458752     88400    81%   10020     47%  /nre/sbin
Global          4980736     24872   100%   53698     87%  /nre/usr
Global           131072     63800    52%    1640     11%  /opt
Global                -         -      -       -       -  /proc
Global           131072    125080     5%      52      1%  /tmp
Global          1572864    165744    90%   23183     12%  /usr
Global           524288    494464     6%     154      1%  /usr/local
Global           131072    111512    15%     493      4%  /var
Global           262144    253744     4%      28      1%  /var/log
Global           131072    126832     4%      20      1%  /var/tsm/log

I did have one minor problem when I first tried to start my WPAR,
but that issue was quickly resolved by the AIX developers on the AIX 7 Open
Beta Forum.

Or the PDF. It's not as interesting or fun, as the animations don't work, but you get the idea. I'm sharing the PDF because it appears that you can open this file on Windows fine (in read-only mode), but on a Mac it prompts for a password without an option to open it in read-only mode. Shame.

You are receiving this mail because we have either been in touch regarding
the LPAR2RRD tool about a year ago or you are on the LPAR2RRD mailing list.

I’ve decided to offer professional paid support for LPAR2RRD.
Here are the reasons that led me to this decision:
- I have not touched the source code for nearly 1.5 years (it looks like the
last version is quite stable :) ). No time at all, and a lack of motivation...
- All I am trying to keep up is support for the tool, but my response times
are not something I am proud of
- the tool itself would need a big time investment to implement new
functionality, rewrite the web front end (I would need to hire a web designer
for that), etc.

The only solution I see to keep continuity and not let the project slowly
die is paid support.
Basically, I see it as the last chance for any further development or
support from my side.
I simply cannot manage to do everything else I do and still work
on LPAR2RRD.
If I were to get some money from the project, I would be able to
prioritise my other activities in favour of LPAR2RRD development and
support.

For that reason I’ve created a new LPAR2RRD home here: http://www.lpar2rrd.com
I am about to found a company which would be responsible for support. Based
on your feedback I will decide go or no-go.
I could even imagine working on LPAR2RRD full time if there are enough
support subscribers.

I do not intend to force you to order support. If you are happy with the
current version and have no issues, then that makes me happy too.
Making people happy was the main reason I have spent my free time
developing the tool since 2006. I do not regret that at all! It was fun.

Essentially
this program is designed to give IBM customers, ISVs and IBM BPs the
opportunity to gain early experience with the latest release of AIX prior to
general availability. This is a great time to join forces and help IBM mould
the next generation of the AIX OS.

I got
involved in the AIX 6 Open
Beta back in 2007. It was a worthwhile experience. The time I
spent learning new features like WPARs and RBAC, put me in a good position when
it came time to actually implement these outside of my lab environment. It was
also a good opportunity to provide feedback to the IBM AIX development
community. Several AIX developers monitored the comments/questions in the Beta
Forum and provided advice (and sometimes fixes) for known (and unknown!) issues
with the beta release. It also provided the developers with plenty of real
world feedback that they could take back to the labs, long before the product
was officially released. This certainly helped fix bugs and improve certain
enhancements before customers started using the OS in their computing
environments.

The Getting Started guide provides useful
information that you will need to know before attempting to install the OS. For
example, the beta code will run on any IBM System p, eServer pSeries or POWER system
that is based on PPC970, POWER4, POWER5, POWER6 or POWER7 processors.

The guide
also describes what new functionality has been included in this release of the
beta. If this program is anything like the AIX 6 beta, there may be more than
one release of the code, with further enhancements available in each release.
The new function in this release includes:

AIX 5.2 Workload Partitions
for AIX 7 provides the capability to create a WPAR running AIX
5.2 TL10 SP8. This allows a migration path for an AIX 5.2 system running on old hardware
to move to POWER7. All that is required is to create a mksysb image of the AIX
5.2 system and then provide this image when creating the WPAR. The WPAR must be
created on a system running AIX 7 on POWER7 hardware. This is a very
interesting feature, one that I am eager to test.

B. Removal of WPAR local storage device restrictions.

AIX 7 will allow for exporting a virtual or physical fibre channel
adapter to a WPAR. The WPAR will essentially own the physical adapter and its
child devices. This will allow for SAN storage devices to be directly assigned to
the WPAR's FC adapter(s). This means it will not be necessary to provision the storage
in the Global environment first and then export it to the WPAR. This is also
interesting as we may now be able to assign SAN disk to a WPAR for both rootvg
and data volume groups. Maybe even FC tape devices within a WPAR will work?

D. EtherChannel enhancements in 802.3ad mode.

There are some enhancements to AIX 7 EtherChannel support
for 802.3AD mode. The enhancement makes sure that the link is LACP ready before
sending data packets. If I’m interpreting this correctly, this will ensure that
the aggregated link is configured appropriately. If it’s not, it will provide
an error in the AIX errpt stating that the link is not configured correctly.
This can help avoid situations where the AIX EtherChannel is configured but the
Network Switch is not. At present there is very little an AIX administrator can
do as the link will appear to be functioning even if the Switch end has not
been configured for an aggregated link.

And the most important point in the Getting Started guide has to be how to
install the AIX beta code! An ISO image of the code is provided for download.
The installation steps are straightforward as the image can be installed via a
DVD device. Using a media repository on a VIO server could be one way to
accomplish this task (I’ve sketched that approach after the install steps
below). Unfortunately, there is no mention of NIM install support yet. Here are
the basic steps from the guide:

Installing the AIX Open Beta Driver

The
AIX 7 Open Beta driver is delivered by restoring a system backup (mksysb) of
the code downloaded via DVD ISO image from the AIX Open Beta web-site.

Once you have downloaded and created the AIX 7 Open Beta media (as described
above), follow these steps to install the ‘mksysb’.

1. Put the DVD of the AIX Open Beta in the DVD drive. A series of screens/menus
will be displayed. Follow the instructions on the screens and make the
following selections:

• Type 1 and press Enter to have English during install.

• Type 1 to continue with the install.

• Type 1 to Start Install Now with Default Setting.

2. The system will start installing the AIX 7.0 BETA.

3. Upon completion of the install, the system will reboot. You can then log in
as “root”; no password is required.
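Going back to the virtual media repository idea mentioned earlier, here is a rough sketch of how the downloaded ISO could be presented to a client LPAR from a VIO server (padmin commands; the repository size, file names and vhost adapter are illustrative):

$ mkrep -sp rootvg -size 10G
$ mkvopt -name aix7beta -file /home/padmin/aix7beta.iso -ro
$ mkvdev -fbo -vadapter vhost0
$ loadopt -disk aix7beta -vtd vtopt0

The client LPAR can then be booted from its virtual optical device via SMS.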

Next I
recommend that you take a look at the Release
Notes. It provides a few bits of information that may come in handy when
planning for the install, such as:

· The Open Beta code is being delivered via an “mksysb” install image.
Migration installation is not supported with the open beta driver.

· The open beta driver does not support IBM Systems Director Agent.

· When installing on a disk smaller than 15.36 GB, the following warning is
displayed: "A disk of size 15360 was specified in the bosinst.data file, but
there is not a disk of at least that size on the system." You can safely
ignore this warning.

· The image is known to install without issues on an 8 GB disk.

· oslevel output shows V7BETA.

By the
way, if you are unable to find a spare system or LPAR on which to install the
beta, perhaps you can consider using the IBM Virtual Loaner Program (VLP).
They are planning to support LPARs running the AIX 7 Open Beta starting from
July 17th. I use the VLP all the time and have found it to be a fantastic
way to try new things without the need for, or expense of, my own IBM POWER
system. There are some drawbacks, such as not having access to your own
dedicated hardware, HMC, VIO server or NIM master, but still it’s great if you
just want to test something on an AIX system.

I’ll
report back once I’ve got my AIX 7 Open Beta system up and running!

I received an email this week from a
colleague who worked with me on the NIM
Redbook back in 2006. He was experiencing an issue with DSM and NIM. He was
attempting to use the dgetmacs
command to obtain the MAC address of the network adapters on an LPAR. The
command was failing to return the right information.

I experienced this very issue during
the writing of the AIX 7.1 Differences
Guide Redbook. And given that I was in Austin, sitting in the same building
as the AIX development team, I was able to speak with the developers directly
about the issue. At that time they provided me with the following workaround.

First they asked me to check the size
of the /usr/lib/nls/msg/en_US/IBMhsc.netboot.cat message
catalog file.

# ls -l /usr/lib/nls/msg/en_US/IBMhsc.netboot.cat

-rw-r--r--    1 bin    bin     3905 Aug 08 09:54 IBMhsc.netboot.cat

They were surprised to find that the file appeared to be “too small”. They
promptly sent me the catalog file from one of their development AIX 7.1
systems. I replaced the file as follows:

# cd /usr/lib/nls/msg/en_US/

# ls -ltr IBMhsc*

-rw-r--r--    1 bin    bin     3905 Aug 08 09:54 IBMhsc.netboot.cat

# cp -p IBMhsc.netboot.cat IBMhsc.netboot.cat.old

# cp /tmp/lpar1/IBMhsc.netboot.cat.new IBMhsc.netboot.cat

# ls -ltr IBMhsc*

-rw-r--r--    1 bin    bin     3905 Aug 08 09:54 IBMhsc.netboot.cat.old
-rw-r--r--    1 bin    bin    26374 Dec 23 11:24 IBMhsc.netboot.cat

This fixed the problem for me during
the residency.

So I asked my friend to do the same
(after I sent him the message catalog file). He ran the dgetmacs command again and this time it returned the MAC address
for all the network adapters in his LPAR. Success!

This is something
that I experience on all new Power/AIX systems that I install:

Systems migrated to AIX 5.3 (or later) might experience a double boot

When booting AIX Version 5.3 (or
later) on a system that has previously been running an earlier release of AIX, you
may notice that the system automatically reboots and restarts the boot process.
This is how the firmware processes changed information in the boot image. This
reboot also occurs if the process is reversed. A system previously running AIX
5.3 (or later) that is booting a release of AIX prior to 5.3 goes through the
same process. This
"double boot" occurs only once; if the stored value does not change,
then the second boot does not occur. If you install AIX 5.3 (or later) and
continue to use only that version, this double boot occurs once, and it occurs
only if your system was running a pre-AIX 5.3 release before you boot AIX 5.3
(or later). Systems
that are preinstalled with AIX 5.3 (or later) and use only that version do not
experience the "double boot."

Starting with AIX Version 7.2, the AIX operating system provides the AIX Live Update function which eliminates downtime associated with patching the AIX operating system. Previous releases of AIX required systems to be rebooted after an interim fix was applied to a running system. This new feature allows workloads to remain active during a Live Update operation and the operating system can use the interim fix immediately without needing to restart the entire system. In the first release of this feature, AIX Live Update will allow customers to install interim fixes (ifixes) only. Ultimately it may be possible to use this function to install AIX Service Packs (SPs) and Technology Levels (TLs) without a reboot.

IBM delivers kernel fixes in the form of ifixes to resolve issues that are reported by customers. If a fix changes the AIX kernel or loaded kernel extensions that cannot be unloaded, the host logical partition (LPAR) must be rebooted. To address this issue, AIX Version 7.1, and earlier, provided concurrent update-enabled ifixes that allowed deployment of some limited kernel fixes to a running LPAR. Unfortunately not all ifixes could be delivered as “concurrent update-enabled”. The AIX Live Update solution is not constrained by the same limitations as in the case of concurrent update enabled ifixes. The AIX 7.2, Live Update feature will allow customers to install ifixes without needing to reboot their AIX systems, avoiding downtime for their mission critical, production workloads.

This article (in the link below) will discuss the high-level concepts relating to AIX Live Updates and then provide a real example of how to use the tool to patch a live AIX system. I was fortunate enough to take part in an Early Ship Program (ESP) for AIX 7.2. During the ESP I had the opportunity to test the AIX Live Update feature. I’ll share my experience using this tool in the example that follows.

In my previous post on AIX Live Updates I discussed how to use the geninstall command to perform a non-disruptive (ifix) update on an AIX system. In this post I wanted to show you how to perform the same task using NIM.

NIM can be used to start an AIX Live Update operation on a target machine (NIM client) either from a NIM master or from the NIM client itself (with nimclient).

Note: The AIX Live Update operation started by NIM calls the hmcauth command during the cust operation to authenticate to the NIM client with the HMC by using the HMC passwd file. The NIM master is responsible for obtaining password information from the HMC (using ssh). Without it, NIM clients will not have the password information necessary when running hmcauth as part of the NIM client operation. So, we must first define an hmc object in NIM and create the password file (used when accessing the console). Once this required step has been completed, all clients using NIM live_update have the ability to pass the proper hmc login credentials when configuring 'hmcauth'.

First, I need to install the dsm.core fileset and configure SSH keys between the NIM master and the HMC.
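For reference, here is a hedged sketch of how the HMC password file and the hmc object could be defined on the NIM master (dpasswd and dkeyexch ship with dsm.core; the HMC hostname and credentials below are made up, and the attribute names reflect my reading of the NIM documentation rather than output from this system):

# dpasswd -f /etc/ibm/sysmgt/dsm/config/hmc_passwd -U hscroot -P abc123
# dkeyexch -f /etc/ibm/sysmgt/dsm/config/hmc_passwd -I hmc -H hmc01
# nim -o define -t hmc -a if1="find_net hmc01 0" \
      -a passwd_file=/etc/ibm/sysmgt/dsm/config/hmc_passwd hmc01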

The NIM client must either be defined with or updated to include the Managed System name (Management Source) and LPAR id number.

# smit nim_chmac

Change/Show Characteristics of a Machine

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

Machine Name [AIXmig]

* Hardware Platform Type [chrp] +

* Kernel to use for Network Boot [64] +

Machine Type standalone

Network Install Machine State currently running

Network Install Control State ready for a NIM operation

Primary Network Install Interface

Network Name net1

Host Name [AIXmig]

Network Adapter Hardware Address [0]

Network Adapter Logical Device Name [ent]

Cable Type N/A +

Network Speed Setting [] +

Network Duplex Setting [] +

IPL ROM Emulation Device [] +/

VLAN Tag Priority (0 to 7) [] #

VLAN Tag Identifier (0 to 4094) [] #

CPU Id [00F94F584C00]

Communication Protocol used by client [nimsh] +

NFS Client Reserved Ports [] +

Comments []

Managing System Information

LPAR Options

Identity [88]

Management Source [S824]

# lsnim -l AIXmig

AIXmig:

class = machines

type = standalone

connect = nimsh

platform = chrp

netboot_kernel = 64

if1 = net1 AIXmig 0

cable_type1 = N/A

mgmt_profile1 = hsc02 88 S824 <<< LPAR id 88, Mgmt Src S824

Cstate = ready for a NIM operation

prev_state = ready for a NIM operation

Mstate = currently running

cpuid = 00F94F584C00

Cstate_result = success

I also need to configure an lpp_source for the ifix location (on the NIM master) and the Live Update data file (on the NIM master). This file can reside on the NIM client if you wish but I’ve chosen to manage all the resources on the NIM master.
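For completeness, a hedged sketch of how these two resources could be defined with standard nim define syntax (the names and locations match the listings that follow):

# nim -o define -t lpp_source -a server=master -a location=/nim/lvup/ifix liveupdatefix
# nim -o define -t live_update_data -a server=master \
      -a location=/nim/lvup/lvupdate.data liveupdate_AIXmig

Here is how they look on my NIM master: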

# lsnim -t lpp_source

lpp_sourceaix72 resources lpp_source

liveupdatefix resources lpp_source

# lsnim -l liveupdatefix

liveupdatefix:

class = resources

type = lpp_source

arch = power

Rstate = ready for use

prev_state = unavailable for use

location = /nim/lvup/ifix

alloc_count = 0

server = master

# ls -ltr /nim/lvup/ifix

total 72

-rw-r----- 1 root system 35625 Oct 15 14:50 dummy.150813.epkg.Z

# lsnim -t live_update_data

liveupdate_AIXmig resources live_update_data

# lsnim -l liveupdate_AIXmig

liveupdate_AIXmig:

class = resources

type = live_update_data

Rstate = ready for use

prev_state = unavailable for use

location = /nim/lvup/lvupdate.data

alloc_count = 0

server = master

# ls -ltr /nim/lvup/

total 16

drwxr-xr-x 2 root system 256 Oct 15 14:54 ifix

-r--r----- 1 root system 4289 Oct 15 15:04 lvupdate.data

# tail -20 /nim/lvup/lvupdate.data

# Users need not provide redundant options such as "-a -U -C and -o"

# in the trc_option field for trace stanza.

# Do not add a trace stanza to the lvupdate.data file unless you

# want the live update commands to be traced.

#

general:

mode = automated

kext_check = no

disks:

nhdisk = hdisk0

mhdisk = hdisk1

tohdisk =

tshdisk =

hmc:

lpar_id = 88

management_console = 10.1.50.30

user = hscroot

Now I can perform a preview of the live update operation, from the NIM master. The preview operation will be run on the NIM client called AIXmig.

If you want, you could initiate the live update from the NIM client using the nimclient command. All the resources reside on the NIM master, but the NIM client starts the operation, not the NIM master.
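I won't reproduce the full output here, but the operation itself looks something like the sketch below. Treat the attribute names as my best recollection of the NIM live update interface, so verify them against the documentation for your level of AIX:

# nim -o cust -a lpp_source=liveupdatefix -a live_update=yes \
      -a live_update_data=liveupdate_AIXmig -a filesets=dummy.150813.epkg.Z AIXmig

And the equivalent, started from the NIM client itself:

# nimclient -o cust -a lpp_source=liveupdatefix -a live_update=yes \
      -a live_update_data=liveupdate_AIXmig -a filesets=dummy.150813.epkg.Z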

I’ve
written about multibos before, here and here. But recently I
started experimenting with multibos mksysb migration. A customer asked me how
this worked and apart from a high-level view I wasn’t able to provide any real
world experience, so I thought I’d give it a try. What follows is just a ‘brain
dump’ from my quick test.

First of all
this isn’t really a migration. It simply populates a second instance of
AIX with a higher version. It doesn’t really migrate (or merge) your existing
configuration into the second instance. So I’m not sure how useful this feature
really is right now.

Starting with
5.3 TL9 you can add a 6.1 TL2 (or above) instance. This is done with the new -M
flag. You must be running with the 64-bit kernel.

This isn’t really a migration because it populates the second instance using a
mksysb based on the new release.

In 6.1 TL2 a new flag (-M) was added to the mksysb command which allows you to
create a mksysb for use with multibos. It creates a backup of BOS (/, /usr,
/var, /opt).
bos.alt_disk_install.boot_images must be installed.

It is not advised to run in this environment for an extended period of time.
There could be problems if tfactor or maps are used. Be aware that 6.1 specific
attributes may not be reflected in the standby instance.

So in my
lab environment I have two AIX LPARs. One is running AIX 6.1 and the other
running AIX 7.1.

First I
take a mksysb (with the -M flag) of the AIX 7.1 system to a file. This file
will be called by multibos to populate the second instance.

aix7[/] > mksysb -Mie /data/aix7-mksysb

Creating information file
(/image.data) for rootvg.

Creating list of files to back up.

Backing up 71643 files.....

71643 of 71643 files (100%)

0512-038 mksysb: Backup Completed
Successfully.

aix7[/] > ls -ltr /data

total 4276112
drwxr-xr-x    2 root    system           256 Feb 21 20:59 lost+found
-rw-r--r--    1 root    system    2189363200 Feb 21 21:06 aix7-mksysb

I copied this
file over to my AIX 6.1 system. This was the system that was to be ‘migrated’.
The next step was to perform a preview of the multibos operation.
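For the record, the commands I ran looked roughly like the following. I'm hedging here: -p previews, -X allows file system expansion, and I'm assuming the mksysb image is supplied with the -M flag mentioned above, so check the multibos man page on your level of AIX before copying this:

root@aix6 /# multibos -s -X -p -M /data/aix7-mksysb      <- preview only
root@aix6 /# multibos -s -X -M /data/aix7-mksysb         <- create the standby instance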

Upon
checking my bootlist output, I
noticed (as expected) that the list now contained two extra entries for bos_hd5. These were the boot logical
volume entries for the second instance. If I was to boot from this LV I’d be
booting into AIX 7.1. Cool.

root@aix6 /# bootlist -m normal -o

hdisk0 blv=bos_hd5

hdisk0 blv=bos_hd5

hdisk0 blv=hd5

hdisk0 blv=hd5

So at this
point, I’d created a second instance of AIX running 7.1. My current version of
(running) AIX was AIX 6.1. All I had to do now was reboot the LPAR and let it
restart as an AIX 7.1 system.

root@aix6 /# oslevel -s

6100-01-05-0920

root@aix6 / # shutdown -Fr

The LPAR
rebooted successfully and I found I was now running AIX 7.1, just as I’d hoped.

aix6[/] > oslevel -s

7100-00-01-1037

If I wanted
to go back to AIX 6.1, I would change my bootlist setting again and restart the
LPAR.

Now that
I’ve actually tried this method of migration, I’m not sure I’d actually use it
in its current form.

Although the migration keeps my hostname and IP address, the file systems are
not shared between instances. Most of the target system's configuration is not
retained. For example, any user accounts I create on my AIX 6.1 system would
also need to be created on the existing AIX 7.1 system which I used to create
the AIX 7.1 mksysb image. It reminds me a little of a preservation install.

IBM made some announcements today relating to
their latest POWER7 server offerings. The new line of systems includes new
entry level systems and the highly anticipated high-end system, the POWER7 795!
They also officially outlined some of
the new features available in AIX 7.1. You can review the details here.
I’ve discussed some of these new features here
and here.
The official AIX 7.1 announcement details are available here.

The announcement got me thinking about my
recent customer engagements and why some have chosen to deploy AIX into their
IBM POWER environments, while others are considering a Linux on POWER solution.

I’ve found that it usually comes down to a
skills decision more than anything else. Most customers are happy to either
continue working with AIX (if they are existing AIX users) or migrate from
another UNIX OS to AIX. I’ve seen very few customers actually migrate to Linux
on POWER, but I’ve worked with several that have seriously considered it. Those
that have chosen to deploy Linux are doing so purely because they have in-house
Linux skills. They are concerned that migrating to AIX may be too big a jump
for their technical staff. I find this thinking interesting, as most of the
customers I’ve dealt with who run other UNIX OS’s like Tru64, Solaris or HP-UX
are more than happy to migrate to AIX. They believe the move is relatively
minor and doesn’t require massive re-training of their UNIX admins. I tend to
agree.

For me, AIX
is my preferred “Enterprise class” UNIX Operating System. Notice I’m prefacing
this with the words Enterprise class. Don’t get me wrong, I have worked with Linux
systems in both small and large customer environments. It is a great OS. But
I’ve found that it really only fits into environments that have a relatively
small number of users and where significant downtime can be tolerated for
things like operating system maintenance. This doesn’t fit the Enterprise class
of UNIX server OS’s that I’m thinking of here. When I contemplate the word Enterprise, I think of servers and
operating systems that can respond to business demands in terms of performance,
reliability, stability and availability. An Enterprise UNIX can provide all of
these things without compromise. Linux can offer performance and reliability
(in my opinion). However, from what I’ve seen, it lacks features & functions
in the areas of stability and availability. AIX on the other hand ticks all the
boxes. Again, this is just my opinion based on my experiences with both AIX and
Linux in the Enterprise landscape. Others will no doubt have their own
experiences that may or may not match my own.

So when I’m designing an Enterprise UNIX
server environment for a customer, I always start with an AIX on POWER base. If
the customer wants Linux, sure I can look at that too, but I strongly recommend
AIX as my preferred choice for large systems. Most of my customers are running
relatively large SAP/Oracle systems. AIX on POWER is a great combination for
large Enterprise systems. If you need to deploy large database systems that
must service tens of thousands of users (like a big SAP system), then I believe
AIX is the perfect OS on which to provide a platform for these large scale
systems.

AIX is a very mature and powerful UNIX OS. It
has been a major player in the UNIX server market for over 20 years (as shown
below). Some people are just not aware of how mature, robust and stable the AIX
OS has become over the years. There are many impressive aspects of the OS in
the areas of performance, scalability, reliability, management and
administration.

Just looking at some of the administration capabilities built into AIX is
enough for me to always recommend AIX over Linux (or any other UNIX OS) when it
comes to large Enterprise servers.

For example, the System Management Interface Tool (SMIT) can make the UNIX admin's life a lot simpler.
This is an interactive tool that is part of the AIX OS. Almost all tasks that
an AIX administrator may need to perform can be executed using this tool. It is
a text-based tool (there is also an X interface but I recommend sticking with
the text-based menus). Everything it does, it does through standard AIX
commands and Korn shell functions. This feature is especially useful when you
need to automate a repetitive task; you can have SMIT create the proper
command-line sequence, and you can then use those commands in your own script.
My compatriot, Anthony English, has a nice intro to SMIT on his AIX blog.

The
AIX Logical Volume Manager (LVM)
is built into the OS, for free. AIX LVM helps UNIX system administrators manage their
storage in a very flexible manner. LVM allows logical
volumes to span multiple physical volumes. Data on logical volumes appears to
be contiguous to the user, but might not be contiguous on the physical volume.
This allows file systems, paging space, and other logical volumes to be resized
or relocated, span multiple physical volumes, and have their contents
replicated for greater flexibility and availability. It provides capabilities
for mirroring data across disks, migrating data across disks and storage
subsystems, expanding and shrinking file systems, and more, all of which can be
performed dynamically, with no downtime required. The concept, implementation
and interface of the AIX LVM is one of a kind. All of its features support the
‘continuous availability’ philosophy.
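To give a flavour of what dynamic means here, a few everyday LVM operations (the volume group, file system and disk names are illustrative):

# extendvg datavg hdisk3         <- add a disk to a volume group online
# chfs -a size=+2G /data         <- grow a JFS2 file system while it is mounted
# migratepv hdisk2 hdisk3        <- move data off an old disk with no outage
# mirrorvg rootvg hdisk1         <- mirror the root volume group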

One of the biggest reasons that I love AIX
over Linux is the mksysb. It’s built
into the OS and allows you to create a bootable image of your AIX system. This
image can be used to restore a broken AIX system or for cloning other systems.
The cloning feature is truly amazing. You can take an image created on a
low-end system and deploy it on any POWER system, all the way up to the
high-end POWER boxes. This simplifies the installation and cloning process
when you need to install and manage many AIX LPARs. By using an SOE mksysb
image you can deploy consistent AIX images across your Enterprise POWER server
environment.

This brings me to another wonderful feature of
AIX, the Network Installation Manager
(NIM).
NIM is a powerful network installation tool (comparable to Linux Kickstart).
Using NIM you can backup/restore, update and upgrade one or more AIX systems
either individually or simultaneously. This can all be achieved over a network
connection, removing the need for handling physical installation media forever.

Another fine example of AIX's superior OS management tools is multibos. This tool allows an AIX administrator to
create and maintain two separate, bootable instances of the AIX OS within the
same root volume group (rootvg). This second instance of rootvg is known as a
standby Base Operating System (BOS) and is an extremely handy tool for
performing AIX TL and Service Pack (SP) updates. Multibos lets you install,
update and customize a standby instance of the AIX OS without impacting the running and active production instance of the
AIX OS. This is valuable in environments with tight maintenance windows.

When it comes to upgrading the OS to a new
release of AIX, the nimadm
utility can assist the administrator greatly in this task. The nimadm
utility offers several advantages. For example, a system administrator can use nimadm to create a copy of a
NIM client's rootvg and migrate the disk to a newer version or release of AIX.
All of this can be done without
disruption to the client (there is no outage required to perform the
migration). After the migration is finished, the only downtime required will be
a single scheduled reboot of the system.
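A typical invocation looks something like this (a hedged example; the NIM object names are made up). The client keeps running while a copy of its rootvg is migrated on the master, and only the final scheduled reboot is disruptive:

# nimadm -j nimadmvg -c aixlpar1 -s spot_710 -l lpp_source_710 -d hdisk1 -Y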

AIX
6.1 introduced a new capability that most UNIX operating systems are still
working on: concurrent updates of the AIX kernel... without a reboot! IBM is
always working hard at making AIX an OS
that can provide continuous availability,
even if it needs to be patched. AIX now has the ability to update certain kernel
components and kernel extensions in place, without needing a system reboot. In
addition, concurrent updates can be removed from the system without needing a
reboot. Can you do this with other UNIX OSs?

And that’s just some of the features that make
AIX the only UNIX OS that I recommend
for Enterprise systems. There are many more tools and features that I couldn’t
live without (like alt_disk_install, savevg, installp, WPARs, etc, the list
goes on). If you are new to AIX and you are considering what your next UNIX OS
should be, then I recommend you take a very
close look at AIX.

And finally, the support provided by IBM, is
first class. Whenever I’ve needed assistance with an AIX issue or query, I have
always received timely, professional and useful advice. On the rare occasions
where I’ve uncovered a new bug, IBM AIX support have always been quick to
provide me with an interim fix to resolve or workaround a problem. That’s the
sort of support you’d expect for an Enterprise UNIX OS, isn’t it? What’s the
support like from your current UNIX (or Linux) OS vendor?

Linux is still a viable UNIX operating system.
However, I think it’s more suited to certain workloads like small to medium
size mail, web and other utility servers and services. AIX, however, would be
my platform of choice for my 10TB Oracle database running SAP ERP, not just for
performance reasons, but primarily because of the system administration
features of AIX that allow me to support and manage the system without
impacting my customers or enforcing reboots/outages whenever I need to change
something on the system.

All IBM needs to do is create their own Linux
distribution (Blue Linux perhaps?)
that has all the features of AIX built in and then I’m sold! But why would
they? We already have AIX.

AIX has a new “critical volume group” capability which will monitor for the loss or failure of a volume group. You can apply this to any volume group, including rootvg. If applied to rootvg, then you can monitor for the loss of the root volume group.

This feature may be useful if your AIX LPAR experiences a loss of SAN connectivity, e.g. a total loss of access to SAN storage and/or all SAN switches. Typically, when this happens, AIX will continue to run in memory for a period of time and will not immediately crash. Often you can still log on to the AIX system, but if you attempt to write a file you’ll see an I/O error. But even then the system may (potentially) remain up. When the SAN issue is resolved the AIX system may continue running, with file systems in read-only mode (or not, it depends), but to really resolve the issue you would still need to reboot the AIX LPAR in order for it to regain access to its disks. This can result in the need to run fsck against file systems. Note that the behaviour you encounter will be impacted by a variety of factors, such as the length and type of outage. As always, your mileage may vary!

You can encounter this behaviour with both VSCSI and NPIV SAN booted LPARs. This new AIX VG option, which caters for the scenario described above, is not enabled by default. From the chvg man page:

"y   Enables the critical VG option of the volume group. If the volume group is set
     to the critical VG, any I/O request failure starts the Logical Volume Manager
     (LVM) metadata write operation to check the state of the disk before
     returning the I/O failure. If the critical VG option is set to rootvg and if the
     volume group loses access to the quorum set of disks (or all disks if quorum is
     disabled), instead of moving the VG to an offline state, the node is crashed
     and a message is displayed on the console."

PowerHA also caters for and supports this now, and it should already be enabled by default. You want this feature enabled for your HA clusters so that they respond appropriately to loss of the root volume group and initiate a failover.

AIX LVM has recently added the capability to change a volume group to be known as a critical volume group. Though PowerHA has allowed critical volume groups in the past, that only applied to non-operating system/data volume groups. PowerHA v7.2 now also takes advantage of this functionality specifically for rootvg. If the volume group is set to the critical VG, any I/O request failure starts the Logical Volume Manager (LVM) metadata write operation to check the state of the disk before returning the I/O failure. If the critical VG option is set to rootvg and if the volume group loses access to the quorum set of disks (or all disks if quorum is disabled), instead of moving the VG to an offline state, the node is crashed and a message is displayed on the console. You can set and validate rootvg as a critical volume group by executing the commands shown below. The command only has to be run once, since we are using the CAA distributed command clcmd.

# clcmd chvg -r y rootvg

# clcmd lsvg rootvg |grep CRIT

DISK BLOCK SIZE: 512 CRITICAL VG: yes

DISK BLOCK SIZE: 512 CRITICAL VG: yes"

To test this new feature in my lab, I simulated a disk "failure" or accidental unmapping/removal of a rootvg disk from an LPAR.

On the AIX LPAR, prior to disk failure simulation, I turn on the “CRITICAL VG” option for rootvg.

# oslevel -s

7200-00-00-0000

# lsvg rootvg | grep CRIT

DISK BLOCK SIZE: 512 CRITICAL VG: no

# chvg -r y rootvg

# lsvg rootvg | grep CRIT

DISK BLOCK SIZE: 512 CRITICAL VG: yes

On the VIOS, I unmap the rootvg disk from the corresponding vhost adapter:
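I haven't included my exact VIOS output here, but the step looks something like the following sketch (the adapter and virtual target device names are illustrative; use lsmap to identify the virtual target device backed by the client's rootvg disk first):

$ lsmap -vadapter vhost2
$ rmvdev -vtd vtscsi2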

If you run snap whilst a system dump has been recorded on the dump device (logical volume), then it will be collected and included in the snap data file. If there is a valid dump to collect you’ll see the message highlighted below in the snap output.

If you are on a version of AIX that doesn’t have the snap -Z flag (such as AIX 5.3), then the alternative way to capture snap data without including the dump info is to run snap -a, and when that has finished, remove the /tmp/ibmsupt/dump directory and run snap -c to create the snap pax file.
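In other words, on AIX 5.3 the equivalent of snap -Z is something like this (a sketch of the procedure just described):

# snap -a                        <- gather everything, including any dump data
# rm -rf /tmp/ibmsupt/dump       <- discard the copied dump
# snap -c                        <- re-create the snap pax file without it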

A customer was attempting to use sudo to run some commands on an AIX host. Each time, sudo would fail with the following error message:

$ sudo -l

sudo: no tty present and no askpass program specified

The only information we had was that the host had been rebooted at some point in the last day or so.

We spent some time checking and verifying a bunch of things (specifically making sure that the /etc/sudoers file was intact and did not contain any bad entries) but none of them provided any real clues as to the root cause of the problem.

Problem Debugging and Investigation

We enabled debug for sudo. This involved creating an /etc/sudo.conf file with the following entries:

# vi sudo.conf

Debug sudo /var/log/sudo_debug.log all@debug

Debug sudoers.so /var/log/sudo_debug.log all@debug

Then, as the non-root user, we attempted to run sudo again. This generated some useful data in the /var/log/sudo_debug.log file. You'll notice we grep'ed for the string tty specifically, given that this was part of the sudo error message provided to us.

The following information was of great interest to us! It appeared that some information was "missing" from the /proc file system on the host.

# grep tty /var/log/sudo_debug.log | grep unable

Aug 10 16:20:26 sudo[14352830] unable to resolve tty via /proc/14352830/psinfo: A file or directory in the path name does not exist. @ get_process_ttyname() ./ttyname.c:442

Aug 18 11:19:59 sudo[21889328] unable to resolve tty via /proc/21889328/psinfo: A file or directory in the path name does not exist. @ get_process_ttyname() ./ttyname.c:442

Upon further investigation, we found that /proc did, in fact, appear to be completely empty!! This was not expected or normal on a modern AIX system.

# ls -ltr /proc

total 0

And, after running the mount command, we discovered that, yes, the /proc file system was NOT mounted!

# mount | grep proc

#

Resolution

Next, we attempted to mount /proc. But we couldn't, as it was not a known file system? Curious indeed.

# mount /proc

mount: 0506-334 /proc is not a known file system.

The /proc file system stanza was missing from /etc/filesystems. So, we added it back in and mounted the file system successfully.

# grep -p proc /etc/filesystems

# << No /proc entry found in /etc/filesystems!

# grep -p proc /etc/filesystems

/proc:

dev = /proc

vol = "/proc"

mount = true

check = false

free = false

vfs = procfs

# mount /proc

# mount | grep proc

/proc /proc procfs Aug 18 11:31 rw

# ls -ltr /proc | head -10

total 0

dr-xr-xr-x 8 root system 0 Aug 10 16:21 sys

-r--r--r-- 1 root system 91 Aug 10 16:21 version

dr-xr-xr-x 1 root system 0 Aug 10 16:21 0

dr-xr-xr-x 1 root system 0 Aug 10 16:21 1

dr-xr-xr-x 1 root system 0 Aug 10 16:21 260

dr-xr-xr-x 1 root system 0 Aug 10 16:21 65798

dr-xr-xr-x 1 root system 0 Aug 10 16:21 131336

dr-xr-xr-x 1 root system 0 Aug 10 16:21 196874

dr-xr-xr-x 1 root system 0 Aug 10 16:21 262412

As soon as /proc was mounted again, the non-root user could run sudo without any issues.

$ sudo -l

Password:

User cgibson may run the following commands on 750lpar4:

(ALL) /usr/sbin/slibclean

$

The /proc file system is different from other file systems. It does not reside on disk. The /proc file system resides in AIX memory and is referred to as a virtual file system. The files maintained in /proc represent the running processes on the system.

"I tested in the lab, for a versioned wpar you can use it more like a regular LPAR - extend a disk into rootvg, mirror on it, unmirror from the old one, reduce out and rmdev. No bosboot or bootlist needed. There is no bootset in a versioned wpar for some reason.”

I have a rootvg WPAR that is on one disk. Is there a method to move it to a new disk?

Answer

There may be an occasion where you have created a rootvg WPAR on a specific disk and you want to move the entire WPAR to another disk. One example might be that the original disk is from an older storage enclosure, and you wish to move the WPAR to newly purchased storage, connected to the system.

You can do this by means of an alternate bootset. Similar to how using the alt_disk_copy command in a global LPAR will create a copy of rootvg on another disk, an alternate bootset is a copy of a WPAR's rootvg on another disk.

The example in this technote will use a rootvg wpar that is on a single disk (hdisk11), and has private /opt and /usr filesystems (AKA a "detached" WPAR). This was initially created using these options:

That bootset also believes hdisk9 is "hdisk0" for it, and the other disk is hdisk1. Notice the bootset ID has not changed, bootset 0 is still on (global) disk hdisk11 and bootset 1 on (global) disk hdisk9.

"The viosbr command automatically creates a backup, whenever there are any configuration changes. This functionality is known as the autoviosbr backup. It is triggered every hour, and checks if there are any configuration changes, or any other changes. If it detects any changes, a backup is created. Otherwise, no action is taken. The backup files resulting from the autoviosbr backup are located under the default path /home/padmin/cfgbackups with the names autoviosbr_SSP.<cluster_name>.tar.gz for cluster level and autoviosbr_<hostname>.tar.gz for node level. The cluster-level backup file is present only in the default path of the database node.

The -autobackup flag is provided for the autoviosbr backup functionality. By default, autoviosbr backup is enabled on the system. To disable the autoviosbr backup, use the stop parameter, and to enable it you can use the start parameter. When the autoviosbr backup is disabled, no autoviosbr-related tar.gz file is generated.

To check whether the autoviosbr backup file present in the default path is up to date, you can use the status parameter. To access the cluster-level backup file on any node of the cluster, use the save parameter. This action is necessary as the cluster-level backup file is present in the default path of the database node only.

If the node is part of a cluster, you can use the -type flag to specify the parameter. The parameter can be either cluster or node, depending on whether it is a cluster-level or a node-level backup."

On your VIOS, under oem_setup_env (as root), you'll find the following new entry in root's crontab:

# crontab -l | grep autoviosbr

0 * * * * /usr/ios/sbin/autoviosbr -start 1>/dev/null 2>/dev/null

This entry will check for any configuration changes and generate a new backup if necessary. Here's an example from my VIOS, running 2.2.5.10.

$ ioslevel

2.2.5.10

I create a new virtual optical device, which should trigger a new backup the next time the autoviosbr script runs (once per hour). Prior to creating the device, viosbr shows the autobackup status as Complete (no changes). I ensure that autobackup is configured by stopping and starting it with the viosbr command.

$ viosbr -autobackup stop -type node

Autobackup stopped successfully.

$ viosbr -autobackup start -type node

Autobackup started successfully.

$ viosbr -autobackup status -type node

Node configuration changes: Complete.

$ mkvdev -fbo -vadapter vhost34

vtopt13 Available

Immediately after the vtopt device is created, the autobackup status displays as Pending (something has changed but has not yet been backed up).

$ viosbr -autobackup status -type node

Node configuration changes: Pending.

The autoviosbr file is created in /home/padmin/cfgbackups.

$ r oem

oem_setup_env

# cd /home/padmin/cfgbackups

# ls -tlr

total 72

-rw-r--r-- 1 root staff 12189 May 10 14:00 autoviosbr_s824vio2.tar.gz

On the hour, the autoviosbr script runs, notices that the configuration has changed and generates a new viosbr backup file. The viosbr autobackup status changes to Complete.

Just say you change the queue_depth on an hdisk
with chdev -P. This updates the device's ODM
information only, not its running configuration. The new value will take
effect the next time the system is rebooted. So now I have a different
queue_depth in the ODM compared to the
device's current running config (in the kernel).

What if I
forget that I’ve made this change to the ODM and forget to reboot the system
for many months? Someone complains of an I/O performance issue... I check the
queue_depths and find they appear to be set appropriately but I still see disk
queue full conditions on my hdisks. But have I rebooted since changing the values?

How do I
know if the ODM matches the devices running configuration?

For example,
I start with a queue_depth of 3,
which is confirmed by looking at lsattr
(ODM) and kdb (running config)
output:

# lsattr -El
hdisk6 -a queue_depth

queue_depth 3 Queue DEPTH
True

# echo
scsidisk hdisk6 | kdb | grep queue_depth

ushort queue_depth =
0x3;
< In Hex.

Now I change
the queue_depth using chdev -P, i.e. only updating the ODM.

# chdev -l
hdisk6 -a queue_depth=256 -P

hdisk6
changed

# lsattr -El
hdisk6 -a queue_depth

queue_depth 256 Queue DEPTH
True

kdb reports that the disk's running
configuration still has a queue_depth of
3.

# echo
scsidisk hdisk6 | kdb | grep queue_depth

ushort queue_depth = 0x3;

Now if I varyoff
the VG and change the disk queue_depth,
both lsattr (ODM) and kdb (the running config) show the same
value:

# umount
/test

# varyoffvg
testvg

# chdev -l
hdisk6 -a queue_depth=256

hdisk6
changed

# varyonvg
testvg

# mount
/test

# lsattr -El
hdisk6 -a queue_depth

queue_depth 256 Queue DEPTH
True

# echo
scsidisk hdisk6 | kdb | grep queue_depth

ushort queue_depth =
0x100;
< In Hex = Dec 256.

# echo
"ibase=16 ; 100" | bc

256

This is one way of checking whether you’ve rebooted since you changed your
queue_depth attributes. I’ve tried this on AIX 6.1 and 7.1 only.
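Here's a small ksh sketch that compares the ODM value with the running value for every hdisk on a system. It's my own quick hack based on the lsattr and kdb output shown above, so test it carefully before trusting it (it must be run as root, and kdb output formats can change between AIX levels):

#!/usr/bin/ksh
# Compare queue_depth in the ODM (lsattr) with the live value (kdb) for each hdisk.
for d in $(lsdev -Cc disk -F name); do
    odm=$(lsattr -El $d -a queue_depth -F value)
    hex=$(echo scsidisk $d | kdb | grep 'queue_depth =' | awk -F'0x' '{print $2}' | tr -d '; ')
    run=$(echo "ibase=16; $(echo $hex | tr '[a-f]' '[A-F]')" | bc)
    [ "$odm" -ne "$run" ] && echo "$d: queue_depth is $odm in the ODM but $run in the kernel"
done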

You can collect this data by running Java and specifying the powervp.jar file, as shown below. You need to specify the hostname, username and password for the host where the system level agent resides. In the following example the hostname/IP address is 10.1.1.99 and the username/password is root and mypass1. I found the PowerVP JAR file in the default PowerVP install directory, which (on AIX) is usually /IBM/PowerVP/PowerVP_GUI_Installation/PowerVP/.

This got me thinking. Perhaps I could write a small script to wrap all this up and then schedule it from cron to collect data on a regular basis?

I wrote the beginnings of a basic expect script (shown below) which would allow me to run the script for a specified amount of time (in seconds) and pass the hostname, username and password from the AIX command line. I guess this would work fine from Linux as well?
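Something along the lines of the ksh wrapper below is what I had in mind, rather than the expect version. It's a sketch only: the argument order that powervp.jar expects is an assumption from my own notes, and passing a password on the command line is far from ideal, so treat it purely as an illustration:

#!/usr/bin/ksh
# Usage: powervp_collect.ksh <host> <user> <password> <seconds>
# Starts the PowerVP JAR collecting from <host> and stops it after <seconds>.
HOST=$1; USER=$2; PASS=$3; SECS=$4
JAR=/IBM/PowerVP/PowerVP_GUI_Installation/PowerVP/powervp.jar

java -jar $JAR $HOST $USER $PASS > /tmp/powervp_${HOST}.log 2>&1 &   # argument order is an assumption
PID=$!
sleep $SECS
kill $PID 2>/dev/null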