Here’s an example of converting
rootvg file systems from JFS to JFS2 using alt_disk_copy.

My lab system was migrated from AIX
5.3 to 7.1 via nimadm. Unfortunately, nimadm does not convert JFS file
systems to JFS2 during the migration. So, in this case, even though I’ve
migrated to AIX 7.1 (which is a good thing) I’m still left with legacy JFS file
systems in rootvg.

And because the AIX 5.3 version of the alt_disk_copy command does not have the -T option, I can't convert my JFS file systems to JFS2 before I migrate to AIX 7.1. So my best option is to migrate to AIX 7.1 first and then convert rootvg to JFS2 file systems. A few extra hops in the process, but it's good enough.

aixlpar1 : /
# lsvg -l rootvg
rootvg:
LV NAME     TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5         boot     1     1     1    closed/syncd  N/A
hd6         paging   32    32    1    open/syncd    N/A
hd8         jfslog   1     1     1    open/syncd    N/A
hd4         jfs      4     4     1    open/syncd    /
hd2         jfs      29    29    1    open/syncd    /usr
hd9var      jfs      20    20    1    open/syncd    /var
hd3         jfs      164   164   1    open/syncd    /tmp
hd1         jfs      4     4     1    open/syncd    /home
hd10opt     jfs      4     4     1    open/syncd    /opt
local       jfs      4     4     1    open/syncd    /usr/local
loglv       jfs      4     4     1    open/syncd    /var/log
hd7         sysdump  9     9     1    open/syncd    N/A
hd71        sysdump  9     9     1    open/syncd    N/A
hd11admin   jfs      2     2     1    open/syncd    /admin

aixlpar1 : /
# oslevel -s
7100-01-01-1141

I clone rootvg to a spare disk using alt_disk_copy with the -T flag, which converts the file systems to JFS2 during the copy, as shown in the output below.

aixlpar1 : /
# alt_disk_copy -d hdisk1 -T
Source boot disk is: hdisk2
jfs2j2: Current data file /image.data moved to /image.data.acct.save.3735802.

Before I reboot the system on the alternate rootvg, I verify that the cloned volume group now contains JFS2 file systems only. I "wake up" the altinst_rootvg and run the lsvg command to confirm the file system types are correct. I then put the altinst_rootvg back to "sleep", reboot the system and verify all rootvg file systems are mounted as jfs2.

aixlpar1 : /
# alt_rootvg_op -W -d hdisk1
Waking up altinst_rootvg volume group ...

aixlpar1 : /
# lsvg -l altinst_rootvg
altinst_rootvg:
LV NAME        TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
alt_hd5        boot     1     1     1    closed/syncd  N/A
alt_hd6        paging   32    32    1    closed/syncd  N/A
alt_hd8        jfs2log  1     1     1    open/syncd    N/A
alt_hd4        jfs2     4     4     1    open/syncd    /alt_inst
alt_hd2        jfs2     29    29    1    open/syncd    /alt_inst/usr
alt_hd9var     jfs2     20    20    1    open/syncd    /alt_inst/var
alt_hd3        jfs2     164   164   1    open/syncd    /alt_inst/tmp
alt_hd1        jfs2     4     4     1    open/syncd    /alt_inst/home
alt_hd10opt    jfs2     4     4     1    open/syncd    /alt_inst/opt
alt_local      jfs2     4     4     1    open/syncd    /alt_inst/usr/local
alt_loglv      jfs2     4     4     1    open/syncd    /alt_inst/var/log
alt_hd7        sysdump  9     9     1    closed/syncd  N/A
alt_hd71       sysdump  9     9     1    closed/syncd  N/A
alt_hd11admin  jfs2     2     2     1    open/syncd    /alt_inst/admin

aixlpar1 : /
# alt_rootvg_op -S altinst_rootvg
Putting volume group altinst_rootvg to sleep ...
forced unmount of /alt_inst/var/log
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr/local
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst/admin
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...

aixlpar1 : /
#

; Reboot on the alternate rootvg hdisk

aixlpar1 : /
# uptime
10:18AM   up 1 min,  1 user,  load average: 0.32, 0.09, 0.03

aixlpar1 : /
# lspv
hdisk1          00c342c637f21a59          rootvg          active
hdisk2          00c342c6161c6b47          old_rootvg

aixlpar1 : /
# df
Filesystem       512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4             524288    411568   22%     3040     7% /
/dev/hd2            3801088    537176   86%    34089    35% /usr
/dev/hd9var         2621440   2413520    8%     3554     2% /var
/dev/hd3           21495808  21476312    1%      110     1% /tmp
/dev/hd1             524288    522456    1%       90     1% /home
/proc                     -         -    -         -     -  /proc
/dev/hd10opt         524288    266560   50%     5133    15% /opt
/dev/local           524288    495320    6%      249     1% /usr/local
/dev/loglv           524288    522040    1%       49     1% /var/log
/dev/hd11admin       262144    261384    1%        7     1% /admin

aixlpar1 : /
# lsvg -l rootvg
rootvg:
LV NAME     TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5         boot     1     1     1    closed/syncd  N/A
hd6         paging   32    32    1    open/syncd    N/A
hd8         jfs2log  1     1     1    open/syncd    N/A
hd4         jfs2     4     4     1    open/syncd    /
hd2         jfs2     29    29    1    open/syncd    /usr
hd9var      jfs2     20    20    1    open/syncd    /var
hd3         jfs2     164   164   1    open/syncd    /tmp
hd1         jfs2     4     4     1    open/syncd    /home
hd10opt     jfs2     4     4     1    open/syncd    /opt
local       jfs2     4     4     1    open/syncd    /usr/local
loglv       jfs2     4     4     1    open/syncd    /var/log
hd7         sysdump  9     9     1    open/syncd    N/A
hd71        sysdump  9     9     1    open/syncd    N/A
hd11admin   jfs2     2     2     1    open/syncd    /admin

===

The message "filesystem not converted" that appears in the alt_disk_copy output is not related to the JFS to JFS2 conversion. This message refers to whether or not the file system needs to be changed to use Variable Inode Extents (VIX), which is the default setting for JFS2 file systems.

Why would I want to convert rootvg to JFS2 anyway? Well, for starters, it's generally considered best practice to use JFS2, as it offers several performance and scalability enhancements over JFS. For example, you cannot create files greater than 2GB on JFS unless the file system was created as "large (big) file" enabled, and the JFS file systems in rootvg were never created as large file enabled.

Another reason: eventually JFS will be retired.

Here’s an example of a potential
problem with JFS in rootvg. You try to create a file of a size greater than 2GB
in /tmp (type jfs). Even though the ulimit settings are not restricting the
creation of a file of this size, the JFS file system will not allow it. The
file creation process fails. The bf attribute
for the /tmp file system is set to false.
This indicates the file system is not “large file” enabled.
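
If you want to see this on your own system, the check and the failure are easy to reproduce (a quick illustrative sketch, not taken from the original session). lsfs -q reports the bf attribute for JFS file systems, and dd is simply used here to try to write a file of roughly 3GB; on a JFS /tmp that is not large-file enabled, the dd should fail once the file hits the 2GB limit.

# lsfs -q /tmp
# dd if=/dev/zero of=/tmp/bigfile bs=1024k count=3000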

When you import a volume group, the importvg command will populate the /etc/filesystems file based on the logical volume minor number order (which is stored in the VGDA on the physical volume/hdisk). If someone manually edits the /etc/filesystems, then its contents will no longer match the order contained in the VGDA of the physical volume. This can become a problem the next time someone attempts to export and import a volume group. Essentially they may end up with file systems over-mounted and what appears to be the loss of data!

Here’s a quick example of the problem.

Let’s create a couple of new file systems; /fs1 and /fs1/fs2. I’ll deliberately create them in the “wrong” order.

# mklv -tjfs2 -y lv2 cgvg 1

lv2

# crfs -vjfs2 -dlv2 -Ayes -u fs -m /fs1/fs2

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

# mklv -tjfs2 -y lv1 cgvg 1

lv1

# crfs -vjfs2 -dlv1 -Ayes -u fs -m /fs1

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

Hmmm, lv2 appears before lv1 in the output from lsvg. The first indication of a potential problem!

# lsvg -l cgvg

cgvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

lv2 jfs2 1 1 1 closed/syncd /fs1/fs2

loglv00 jfs2log 1 1 1 closed/syncd N/A

lv1 jfs2 1 1 1 closed/syncd /fs1

Whoops! /fs1 should be mounted before /fs1/fs2!!! Doh!

# mount -t fs

# mount | tail -2

/dev/lv2 /fs1/fs2 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

/dev/lv1 /fs1 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

Data in /fs1/fs2 is now hidden and inaccessible. The /fs1 file system has over-mounted the /fs1/fs2 file system. This could look like data loss i.e. someone removed all the files from the file system.

# df -g | grep fs

/dev/lv2 - - - - - /fs1/fs2

/dev/lv1 0.06 0.06 1% 4 1% /fs1

The file systems are listed in the wrong order in /etc/filesystems as well. Double Doh!

# tail -15 /etc/filesystems

/fs1/fs2:

dev = /dev/lv2

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

/fs1:

dev = /dev/lv1

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

No problem. I’ll just edit the /etc/filesystems file and rearrange the order. Simple, right?

# vi /etc/filesystems

/fs1:

dev = /dev/lv1

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

/fs1/fs2:

dev = /dev/lv2

vfs = jfs2

log = /dev/loglv00

mount = true

type = fs

account = false

Let’s remount the file systems in the correct order.

# umount -t fs

# mount -t fs

# df -g | grep fs

/dev/lv1 0.06 0.06 1% 5 1% /fs1

/dev/lv2 0.06 0.06 1% 4 1% /fs1/fs2

That looks better now, doesn’t it!? I’m happy now.....although, lsvg still indicates there could be a potential problem here…

# lsvg -l cgvg

cgvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

lv2 jfs2 1 1 1 open/syncd /fs1/fs2

loglv00 jfs2log 1 1 1 open/syncd N/A

lv1 jfs2 1 1 1 open/syncd /fs1

All is well, until one day someone exports the VG and re-imports it, like so:

# varyoffvg cgvg

# exportvg cgvg

# importvg -y cgvg hdisk2

cgvg

# mount -t fs

# mount | tail -2

/dev/lv2 /fs1/fs2 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

/dev/lv1 /fs1 jfs2 Jul 19 23:07 rw,log=/dev/loglv00

Huh? What’s happened here!? I thought I fixed this before!?

Try to avoid this situation before it becomes a problem (for you or someone else!) in the future. If you discover this issue whilst creating your new file systems, remove the file systems and recreate them in the correct order. Obviously, try to do this before you place any data in the file systems. Otherwise you may need to back up and restore the data!

# mklv -tjfs2 -y lv1 cgvg 1

lv1

# crfs -vjfs2 -dlv1 -Ayes -u fs -m /fs1

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

# mklv -tjfs2 -y lv2 cgvg 1

lv2

# crfs -vjfs2 -dlv2 -Ayes -u fs -m /fs1/fs2

File system created successfully.

65328 kilobytes total disk space.

New File System size is 131072

You may be able to detect this problem, prior to importing a volume group, by using the lqueryvg command. Looking at the output in the “Logical” section, you might be able to ascertain a potential LV and FS mount order issue.

# lqueryvg -Atp hdisk2 | grep lv

0516-320 lqueryvg: Physical volume hdisk2 is not assigned to

a volume group.

Logical: 00f603cd00004c000000013ff2fc1388.1 lv2 1

00f603cd00004c000000013ff2fc1388.2 loglv00 1

00f603cd00004c000000013ff2fc1388.3 lv1 1

Once you've identified the problem you can fix the issue retrospectively (once the VG is imported) by editing /etc/filesystems. Of course, this is just a temporary fix: the next time someone exports and imports the VG, the mount order issue will reappear.

The essential message here: do NOT rely on hand-editing the /etc/filesystems file to fix the mount order; create the file systems in the correct order in the first place.
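
If you want a quick way to sanity-check an imported volume group for this condition, something along these lines can help. It's a rough illustrative script (not from the original post): it prints the mount points in the order lsvg reports them (which follows the minor number order) and warns if a child file system is listed before its parent.

# lsvg -l cgvg | awk 'NR > 2 && $NF != "N/A" { print $NF }' | awk '
    { mp[NR] = $0 }
    END { for (i = 1; i <= NR; i++)
            for (j = i + 1; j <= NR; j++)
              if (index(mp[i], mp[j] "/") == 1)
                print "WARNING: " mp[i] " appears before its parent " mp[j] }'
WARNING: /fs1/fs2 appears before its parent /fs1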

OK, so I had
a NIM master with two network interfaces, en0 and en1. The en0 interface was connected
to the 192.168.10.0 network and the en1 interface was connected to the 10.1.1.0 network. When the
master was initially configured, we chose en0 as the primary install interface.
The master could now install NIM clients on the 192.168.10.0 network, without
any additional NIM network configuration.

There was
a single NIM network definition for the 192.168.10.0 network (net_192_168_10) in the NIM database.

root@nim1 / # lsnim -c networks
net_192_168_10   networks   ent

Sure
enough, the time came when we needed to install NIM clients on the 10.1.1.0
network. Rather than adding NIM routes, we chose to configure the NIM master
with an additional install interface on the 10.1.1.0 network. Fortunately, the
NIM master was already “directly” connected to the 10.1.1.0 network on its en1
interface. So all we had to do was update the NIM configuration.

First we
added a new entry in the /etc/hosts file for the NIM master’s 10.1.1.0 network
address. The hostname for this interface was nim1i.

root@nim1 / # grep nim1 /etc/hosts
192.168.10.10   nim1
10.1.1.10       nim1i

root@nim1 / # host nim1i
nim1i is 10.1.1.10

Next we
defined a new network install interface for the NIM master via the smit
fastpath, ‘smit nim_mac_if’.

We entered
the hostname of the NIM master on the 10.1.1.0 network i.e. nim1i.

Define a Network Install Interface

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Host Name of Network Install Interface             [nim1i]

We named
the new NIM network, net_10_1_1 and entered the appropriate subnet mask for
this network. We did not enter a default gateway for the client or the NIM
master (it is unnecessary, as the master is “directly” connected to this
network).

Define a Network Install Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Machine Name                                        master
  Network Install Interface
* Cable Type                                          N/A               +
  Network Speed Setting                               []                +
  Network Duplex Setting                              []                +
* NIM Network                                         [net_10_1_1]
* Network Type                                        ent
* Ethernet Type                                       Standard          +
* Subnetmask                                          [255.255.255.0]
  Default Gateway Used by Machine                     []
  Default Gateway Used by Master                      []
* Host Name                                           nim1i
  Network Adapter Hardware Address                    [0]
  Network Adapter Logical Device Name                 []
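
As an aside, I believe the same thing can be done from the command line rather than smit, by defining the new NIM network and then adding a second interface attribute (if2) to the master. Treat the following as a sketch from memory rather than a tested recipe:

# nim -o define -t ent -a net_addr=10.1.1.0 -a snm=255.255.255.0 net_10_1_1
# nim -o change -a if2="net_10_1_1 nim1i 0" master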

There were
now two network definitions in our NIM database. The Nstate attribute was set to ‘ready
for use’, so our definition was OK to use.

root@nim1 / # lsnim -c networks
net_192_168_10   networks   ent
net_10_1_1       networks   ent

root@nim1 / # lsnim -l net_10_1_1
net_10_1_1:
   class      = networks
   type       = ent
   comments   = Generated during definition of nim1i
   Nstate     = ready for use
   prev_state = information is missing from this object's definition
   net_addr   = 10.1.1.0
   snm        = 255.255.255.0

The NIM
master has two network interfaces defined.

root@nim1 / # lsnim -l master | grep if
   if_defined = chrp.64.ent
   if1        = net_192_168_10 nim1 0
   if2        = net_10_1_1 nim1i 0

Now we
could install new NIM clients using this network. We verified this by adding a
new NIM client to the NIM database.

# smit nim_mkmac

Define a Machine

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Host Name of Machine                                [lpar22i]
  (Primary Network Install Interface)

After pressing Enter on the previous smit panel, we were immediately presented with a panel to modify the attributes of the NIM client, and (more importantly) the correct network and hostname were automatically selected as part of the client's definition.

Define a Machine

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* NIM Machine Name                                    [lpar22]
* Machine Type                                        [standalone]      +
* Hardware Platform Type                              [chrp]            +
  Kernel to use for Network Boot                      [64]              +
  Communication Protocol used by client               [nimsh]           +

  Primary Network Install Interface
* Cable Type                                          N/A               +
  Network Speed Setting                               []                +
  Network Duplex Setting                              []                +
* NIM Network                                         net_10_1_1
* Host Name                                           lpar22i
  Network Adapter Hardware Address                    [0]
  Network Adapter Logical Device Name                 []
  IPL ROM Emulation Device                            []                +/
  CPU Id                                              []
  Machine Group                                       []                +

  Managing System Information
    WPAR Options
      Managing System                                 []
    -OR-
    LPAR Options
      Identity                                        []
      Management Source                               []                +

  Comments                                            []

If we had configured NIM incorrectly, i.e. it was not aware of the 10.1.1.0 network, then we would have been prompted for network information before we could configure the NIM client.

This post introduces two new features that I came across recently and found rather interesting. The first relates to PowerVP (VCPU affinity) and the second to POWER8 (Flexible SMT).

I’m particularly impressed by this new feature in PowerVP version 1.1.2 (SP1). You can view CPU and memory affinity information directly from the PowerVP GUI.

From the PowerVP Installation and User Guide:

“If you go to the View menu and select the Display CPU affinity information, the CPU utilization information will be replaced in the Core columns by the partition affinity information for the cores. If you hover your mouse over a core, you will see a tool tip showing the virtual CPU affinity by partition and will see the LPAR ID and the number of virtual CPUs assigned to the partition on that core. This information can be helpful when analyzing the processor affinity of your system. Note that for shared partitions, a partition could have affinity for multiple cores. Also, just because a partition has affinity for a core, that partition will not necessarily be dispatched to that core when it runs. Partition dispatching is performed by the hypervisor, if you want more information on this, refer to documentation on the hypervisor in the IBM Infocenter.”

Once I selected the “Display CPU affinity information” option, I noticed that the cores, shown in the “node drill down” view, showed partition affinity using different colours for each LPAR. Hovering my mouse over a core showed each of the LPAR ids and their associated virtual CPU count assigned to the core.

I was able to do the same with memory. The boxes next to the memory controllers (MC0 or 1) are memory affinity boxes. The colours in these boxes show the percentage of memory that is assigned to a partition on that particular memory controller. Hovering the mouse over this box showed the LPAR id and the percentage of memory assigned to each LPAR. This information may be useful if you are reviewing a particular partition's memory affinity.

To make it easier to read, I was able to obtain a list of LPARs and their associated colours from the Edit menu with the “Select Visible LPARs” option.

I have noticed that if you have partitions configured with dedicated processors, if you click on the LPAR name (in the partition list), PowerVP will highlight the cores with the colour assigned to the dedicated partition. However, if your partitions are configured with shared processors, they are all highlighted with the same colour (blue). At this time, PowerVP will not differentiate between different shared processor pools. Perhaps this feature will appear in the future?

You can learn more about PowerVP from the following Redbook on the topic:

Something else I wanted to mention, that is related to CPU affinity, is Flexible SMT. This new feature is available on POWER8 systems. It is covered in more detail in section 4.2 of the new POWER8 tuning Redbook. What is interesting is that compared to previous generations of POWER processor, the performance characteristics of a thread are the same, regardless of which h/w thread is active. This will allow for more equal execution of work on any thread of the processor. It also means that techniques such as rsets and bindprocessor may no longer be required on POWER8.

On POWER7 and POWER7+, there is a correlation between the hardware thread number (0-3) and the hardware resources within the processor. Matching the thread numbers to the number of active threads was required for optimum performance. For example, if only one thread was active, it was thread0; if two threads were active, they were thread0 and thread1.

On POWER8, the same performance is obtained regardless of which thread is active. The processor balances resources according to the number of active threads. There is no need to match the thread numbers with the number of active tasks. Thus, when using the bindprocessor command or API, it is not necessary to bind the job to thread0 for optimal performance.
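
For those unfamiliar with it, this is the sort of manual placement that is becoming unnecessary on POWER8. A quick illustrative example (the PID is made up): bindprocessor -q lists the available logical processors, bindprocessor binds a process to one of them, and ps -mo THREAD shows the binding in the BND column.

# bindprocessor -q
# bindprocessor 1234567 0
# ps -mo THREAD -p 1234567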

With the POWER8 processor cores, the SMT hardware threads are designed to be more equal in the execution implementation, which allows the system to support flexible SMT scheduling and management.

On POWER8, any process or thread can run in any SMT mode. The processor balances the processor core resources according to the number of active hardware threads. There is no need to match the application thread numbers with the number of active hardware threads.

Hardware threads on the POWER8 processor have equal weight, unlike the hardware threads under POWER7. Therefore, as an example, a single process running on thread 7 would run just as fast as running on thread 0, presuming nothing else is on the other hardware threads for that processor core. AIX will dynamically adjust between SMT and ST mode based on the workload utilization.

I've shared my tips for resolving DLPAR problems in the past. So this week, when one of my colleagues was experiencing an issue with DLPAR, I referred him to my blog post and suggested he follow the troubleshooting steps. He did so and I went about my business. Later that same day I asked him how he had fared. He told me that DLPAR was still not working on his particular AIX LPAR. It was an AIX 5.3 system and he was attempting to add another Virtual Processor to the LPAR. He expressed his frustration with the situation, so I offered to take a look for him.

What I
found was that the system was missing an important fileset. A fileset that
enabled DLPAR operations on AIX 5.3 systems. The fileset in question was named csm.client. Without this fileset
installed DLPAR would never work.

I advised
my colleague of the problem and suggested he follow the steps below to resolve
the issue. After he reinstalled the fileset, RMC communication between the HMC
and LPAR was restored and his DLPAR processor add operation completed without
issue.

1. Mount the NIM master's lpp_source file system:

aix53lpar1 : / # mount nim1:/export/lpp_source /mnt

2. Verify the CSM filesets are not installed and that the IBM.DRM subsystem is either inoperative or missing, as shown in the example below.
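
For example (illustrative commands, not taken from the original session):

aix53lpar1 : / # lslpp -l "csm.client*"
aix53lpar1 : / # lssrc -a | grep rsct
aix53lpar1 : / # lssrc -s IBM.DRM

If csm.client is missing, lslpp will report that no matching filesets are installed, and lssrc will show IBM.DRM as inoperative (or not list it at all).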

I received an email from one of my customers recently that simply said:

“mate …on lpar30, for some reason the IBM.DRM and others are failing to start .. um .. any chance you could have a quick look at that ?”

So I asked, "OK, so this used to work, right?". To which I received a relatively confused reply, ".....yep it did....actually no....it's never worked....or has it???...I'm not sure...".

Based on my experience, the most common issue that prevents DLPAR operations from working are network problems. Before diving into the deep end and trying to debug RSCT, it’s always best to start with the basics. For example, can you ping the HMC from the LPAR? Can you ping the LPAR from the HMC? If either of these tests fails, check the network configuration on both components before doing anything else.

– Check the LPAR communications box in the HMC configuration screen for the LAN adapter that is used for HMC-to-LPAR communication.

– By the way, unlike POWER4 systems, LPARs on POWER5 and POWER6 systems do not depend on host name resolution for DLPAR operations.

– Check routing on the LPAR and the HMC.

– Use ping and the HMC's Test Network Connectivity task to verify the LPAR and the HMC can communicate with each other.

If you check the network and you are happy that the LPAR and the HMC can communicate, then perhaps you need to re-initialise the RMC subsystems on the AIX LPAR. Run the following commands:

# /usr/sbin/rsct/bin/rmcctrl -z

# /usr/sbin/rsct/bin/rmcctrl -A

# /usr/sbin/rsct/bin/rmcctrl -p

Wait up to 5 minutes before trying DLPAR again. If DLPAR still doesn’t work i.e. the HMC is still reporting no values for DCaps, and the IBM.DRM subsystem still won’t start, try using the recfgct command.
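
To check the DCaps values, run lspartition -dlpar from the HMC command line; each managed LPAR should appear with Active:<1> and a non-zero DCaps value. This is from memory, and the output format varies between HMC releases:

hscroot@hmc:~> lspartition -dlpar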

WARNING: The recfgct command referenced above is *not* supported for use by customers without direct IBM support instructions. It erases all RSCT configuration info and makes it look like the node was just installed. This may be fine for DLPAR recycling, but if you have any other products dependent on RSCT on the partition in question, you will be *broken*. In particular, PowerHA 7 will crash, and Tivoli SAMP will have all its cluster info destroyed, partitioning it from the rest of the domain until it can be manually re-added (and it may also crash, depending on the presence of resources). If you find that DLPAR is not working, and all other network checks and even the RMC recycling (-z/-A/-p) does not work, it is strongly recommended that you use the ctsnap command to gather data and contact IBM support. (Capturing iptrace for a few minutes would not be a bad idea either. A complementary tcpdump on the HMC would also be good, but this may not be possible for most customers given HMC's access restrictions.) Then, if you wish to proceed with recfgct and find that it does resolve whatever the problem was, it would be equally wise to gather another ctsnap after the partition is once again connected to the HMC, to compare to the previous one.

CAUTION: Running the recfgct command on a node in a RSCT peer domain or in a Cluster Aware AIX (CAA) environment should NOT be done before taking other precautions first. This note is not designed to cover all CAA or other RSCT cluster considerations so if you have an application that is RSCT aware such as PowerHA, VIOS Storage Pools and several others do not proceed until you have contacted support. If you need to determine if your system is a member of a CAA cluster then please refer to the Reliable Scalable Cluster Technology document titled, "Troubleshooting the resource monitoring and control (RMC) subsystem". http://www-01.ibm.com/support/knowledgecenter/SGVKBA_3.1.5/com.ibm.rsct315.trouble/bl507_diagrmc.htm

Only run the rmcctrl and recfgct commands if you believe something has become corrupt in the RMC configuration of the LPAR. The fastest way to fix a broken configuration or to clear out the RMC ACL files after cloning (via alt_disk migration) is to use the recfgct command.

These daemons should work “out of the box” and are not typically the cause of DLPAR issues. However, you can try stopping and starting the daemons when troubleshooting DLPAR issues.

The rmcctrl -z command just stops the daemons. The rmcctrl -A command ensures that the subsystem group (rsct) and the subsystem (ctrmc) objects are added to the SRC, adds an appropriate entry to the end of /etc/inittab, and then starts the daemons.

The rmcctrl -p command enables the daemons for remote client connections, i.e. from the HMC to the LPAR and vice versa.

If you are familiar with the System Resource Controller (SRC) you might be tempted to use stopsrc and startsrc commands to stop and start these daemons.

Do not do it; use the rmcctrl commands instead.

If /var is 100% full, use chfs to expand it. If there is no more space available, examine subdirectories and remove unnecessary files (for example, trace.*, core, and so forth). If /var is full, RMC subsystems may fail to function correctly.

The polling interval for the RMC daemons on the LPAR to check with the HMC daemons is 5-7 minutes; so you need to wait long enough for the daemons to start up and synchronize.

The Resource Monitoring and Control (RMC) daemons are part of the Reliable, Scalable Cluster Technology (RSCT) and are controlled by the System Resource Controller (SRC). These daemons run in all LPARs and communicate with equivalent RMC daemons running on the HMC. The daemons start automatically when the operating system starts and synchronize with the HMC RMC daemons.

The daemons in the LPARs and the daemons on the HMC must be able to communicate over the network for DLPAR operations to succeed. This is not the network connection between the managed system (FSP) and the HMC; it is the network connection between the operating system (AIX) in each LPAR and the HMC.
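
One quick way to confirm this communication from the AIX side is the rmcdomainstatus command, which lists the management domain peers (i.e. the HMCs) that the LPAR's RMC daemon can currently see. A hedged example:

# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc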

Note: Apart from rebooting, there is no way to stop and start the RMC daemons on the HMC.

The following links also contain some (outdated) information relating to DLPAR verification and troubleshooting. Even though it is quite old, some of it is still relevant today and is a good place to start.

The previous link (above) provides some information relating to the values for DCaps and what they mean (also outdated):

0 - DR CPU capable (can move CPUs)
1 - DR MEM capable (can move memory)
2 - DR I/O capable (can move I/O resources)
3 - DR PCI Bridge (can move PCI bridges)
4 - DR Entitlement (POWER5 can change shared entitlement)
5 - Multiple DR CPU (AIX 5.3 can move 2+ CPUs at once)

0x3f = max, and 0xf is common for AIX 5.2

If you are interested in how HMC and LPAR authentication works with DLPAR, then read on. Otherwise, happy DLPARing!

HMC and LPAR authentication (RSCT authentication)

The diagram below outlines how the HMC and an LPAR authenticate with each other in order for DLPAR operations to work. RSCT authentication is used to ensure the HMC is communicating with the correct LPAR.

Authentication is the process of ensuring that another party is who it claims to be. Authorization is the process by which a cluster software component grants or denies resources based on certain criteria. The RSCT component that implements authorization is RMC. It uses access control list (ACL) files to control user access to resources.

The RMC component subsystem uses cluster security services to map the operating system user identifiers, specified in the ACL file, to network security identifiers to determine if the user has the correct permissions. This is performed by the identity mapping service, which uses information stored in the identity mapping files ctsec_map.global and ctsec_map.local.

The RSCT authorization process in detail:

1. On the HMC: DMSRM pushes down the secret key and HMC IP address to NVRAM when it detects a new CEC; this process is repeated every five minutes. Each time an HMC is rebooted or DMSRM is restarted, a new key is used.

2. On the AIX LPAR: CSMAgentRM, through RTAS (Run-time Abstraction Services), reads the key and HMC IP address out from NVRAM. It will then authenticate the HMC. This process is repeated every five minutes on an LPAR to detect new HMCs and whether the key has changed. An HMC with a new key is treated as a new HMC and will go through the authentication and authorization processes again.

3. On the AIX LPAR: After authenticating the HMC, CSMAgentRM will contact the DMSRM on the HMC to create a ManagedNode resource in order to identify itself as an LPAR of this HMC. CSMAgentRM then creates a compatible ManagementServer resource on AIX. This can be displayed on AIX with the lsrsrc command, e.g.

root@aix6 / # lsrsrc "IBM.ManagementServer"
Resource Persistent Attributes for IBM.ManagementServer
resource 1:
        Name             = "192.168.1.244"
        Hostname         = "192.168.1.244"
        ManagerType      = "HMC"
        LocalHostname    = "10.153.3.133"
        ClusterTM        = "9078-160"
        ClusterSNum      = ""
        ActivePeerDomain = ""
        NodeNameList     = {"aix6"}

4. On the AIX LPAR: After the creation of the ManagedNode and ManagementServer resources on the HMC and AIX respectively, CSMAgentRM grants the HMC permission to access the necessary resource classes on the LPAR. After granting the HMC permission, CSMAgentRM will change its ManagedNode Status on the HMC to 1. (It should be noted that without proper permission on AIX, the HMC would be able to establish a session with the LPAR but would not be able to query for OS information, DLPAR capabilities, or execute DLPAR commands afterwards.)

5. On the HMC: After the ManagedNode Status is changed to 1, LparCmdRM establishes a session with the LPAR, queries for operating system information and DLPAR capabilities, notifies CIMOM about the DLPAR capabilities of the LPAR, and then waits for the DLPAR commands from users.

Prior to AIX 5.3 TL7 and AIX 6.1,
there was an 8 character limit on AIX user passwords. If you need passwords of
greater than 8 characters then you must enable one of the supplied Loadable
Password Algorithms (LPAs). The following table lists the available algorithms
and the limitations of each:

For example, to enable the MD5 algorithm I can modify the /etc/security/login.cfg file with the chsec command as follows:

# chsec -f /etc/security/login.cfg -s usw -a pwd_algorithm=smd5

# tail -2 /etc/security/login.cfg
pwd_algorithm = smd5

This
algorithm (smd5) will allow a password limit of 255 characters. Each of the
available algorithms is listed in the /etc/security/pwdalg.cfg file.

* /usr/lib/security/ssha is a password hashing load module using SHA and
* SHA2 algorithms. It supports password length up to 255 characters.
*
* This LPA accepts three options. The options are separated by commas.

...etc...

Once you've enabled the LPA of your choice, and you set or change a user's password, you'll notice that the /etc/security/passwd stanza for that user will look different when compared to the stanzas of users that have not had their password set or changed using the new LPA:

fred:

password = E7nOaTrrz9Q16

lastupdate = 1330986703

flags = ADMCHG

joe:

password = {smd5}z9JrHDJB$Oq/cZXr0jUyAWvfFyjt161

lastupdate = 1330987903

flags = ADMCHG

In the example above,
user joe’s password has been set using the smd5 algorithm.
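
You can also confirm the active algorithm with lssec rather than reading login.cfg directly:

# lssec -f /etc/security/login.cfg -s usw -a pwd_algorithm
usw pwd_algorithm=smd5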

For those of you who
run PowerHA (HACMP) and are thinking about using one of the LPAs with the clpasswd utility, you may want to
review this APAR first:

The APAR states “HACMP
cluster-wide C-SPOC password administration does not support use of the feature
allowing passwords longer than 8 characters which became available with the
Loadable Password Algorithm as part of AIX 53 TL 7.”

The last time I tested this with PowerHA, the problem was that the password entry in /etc/security/passwd was corrupted/truncated when a user's password was changed using the clpasswd utility.

For example, if the passwd utility is linked to clpasswd and I changed a user's password, the password field appeared to be corrupted/truncated and the user could not log in successfully.

I’ve not tried this again recently but I am curious if the
same behaviour can be expected on a PowerHA system today. When I first encountered
this problem (in 2008) I opened a PMR for the issue. In that call I was told
that the “clpasswd utility is corrupting the encrypted password when
distributing to the nodes, so that a login fails”. I’ll configure a HA
cluster soon and try it again with PowerHA 6.1 and AIX 6.1 and report back with
the results.

UPDATE: I built a HA 6.1 cluster (on AIX 6.1) this afternoon in
my lab and tested this successfully. Based on the tests I’ve performed so far,
it appears that this limitation no longer exists. Thanks to hafeedbk@us.ibm.com for the help on this
one.

The following IBM
tech note has more information on the available Loadable Password Algorithms
and support for longer than 8 character passwords on AIX:

If restoring a workload
partition, target disks should be in available state.

So I tried the command again, this time with the -O flag as suggested in the error message. It failed again, stating that it could not remove cgvg from hdisk5. This was also good. At this point, an experienced AIX admin would look for the existence of a volume group on hdisk5. However, a junior AIX admin might not! :)

Global# mkwpar -O -D rootvg=yes devname=hdisk5 -n rootvgwpar1
mkwpar: 0960-620 Failed to remove cgvg on disk hdisk5.

Global# lsvg -l cgvg
cgvg:
LV NAME   TYPE     LPs   PPs   PVs  LV STATE    MOUNT POINT
cglv      jfs2     1     1     1    open/syncd  /cg
loglv00   jfs2log  1     1     1    open/syncd  N/A

But what if the file systems were already unmounted prior to running mkwpar with the -O flag? What would happen?

So, to test my theory, I unmounted my file system so that the logical volumes in cgvg were all now closed.

This could be a trap for “first time” users of RootVG WPARs! So
look out! :)

Apparently this is working as designed, as the -O flag is really only meant to be used by WPAR tools such as the WPAR Manager.

The man page for mkwpar
states:

-O   This flag is used to force the overwrite of an existing volume group on the given set of devices specified with the -D rootvg=yes flag directive. If not specified, the overwrite value defaults to FALSE. This flag should only be specified once, for its setting will be applied to all devices specified with the -D rootvg=yes flag directive.

After reading about
the latest AIX updates here: http://t.co/bkIpnXkS
I decided to download and install the latest TL & SP for AIX 6.1 and 7.1
and take a peek at some of the latest features.

Here’s what I found
so far.

There appears to be some new integration between NIM and the VIOS. The nim command now has an updateios operation, e.g. nim -o updateios.

So you can update your VIO servers from NIM now. This is nice.

On my lab NIM master
I checked the nim man page and found
the following new information:

NIM [/] # oslevel -s
6100-07-02-1150

NIM [/] # man nim

...

updateios
    Performs software customization and maintenance on a virtual input output
    server (VIOS) management server that is of the vios or ivm type.

updateios

1  To install fixes or to update VIOS with the vioserver1 NIM object name to the
   latest maintenance level, type:

   nim -o updateios -a lpp_source=lpp_source1 -a preview=no vioserver1

   The updates are stored in the lpp_source resource, lpp_source1. Note: The
   updateios operation runs a preview during installation. Running the updateios
   operation from NIM runs a preview unless the preview flag is set to no. During
   the installation, you must run a preview when using the updateios operation
   with updateios_flags=-install. With the preview, you can check if the
   installation is running accurately before proceeding with the VIOS update.

2  To reject fixes for a VIOS with the vioserver1 NIM object name, type:

   nim -o updateios -a updateios_flags=-reject vioserver1

3  To clean up partially installed updates for a VIOS with the vioserver1 NIM
   object name, type:

   nim -o updateios -a updateios_flags=-cleanup vioserver1

4  To commit updates for a VIOS with the vioserver1 NIM object name, type:

   nim -o updateios -a updateios_flags=-commit vioserver1

5  To remove a specific update such as update1 for a VIOS with the vioserver1 NIM
   object name, type:

There’s also mention
of a new resource type, specifically for VIOS mksysbs. This resource type is
called ios_mksysb:

...

ios_mksysb
    Represents a backup image taken from a VIOS management server that is of the
    vios or ivm type.

26  To define a ios_mksysb resource such as ios_mksysb1, and create the
    ios_mksysb image of the vios client as vios1, during the resource definition
    where the image is located in /export/nim/ios_mksysb on the master, type:

    nim -o define -t ios_mksysb -a server=master \
    -a location=/export/nim/ios_mksysb -a source=vios1 \
    -a mk_image=yes ios_mksysb1

This is all starting
to come together now, since the introduction of the new “management” object
class, vios, with AIX 6.1 TL3.

Next I thought I’d
take a look at the TCP Fast Loopback
option. This new option should help to reduce TCP/IP (CPU) overhead when two
(TCP) communication end points reside in the same LPAR. This could be
useful where you have an LPAR running a database and application in the same
LPAR e.g. SAP and Oracle in the same
LPAR. It can also be used when two
or more WPARs, in the same LPAR need to communicate with each other over
TCP/IP.

I turned on this new
feature on my AIX 7.1 LPAR.

AIX7[/] # oslevel -s
7100-01-02-1150

AIX7[/] # netstat -p tcp | grep fastpath
0 fastpath loopback connection
0 fastpath loopback sent packet (0 byte)
0 fastpath loopback received packet (0 byte)

AIX7[/] # no -p -o tcp_fastlo=1
Setting tcp_fastlo to 1
Setting tcp_fastlo to 1 in nextboot file
Change to tunable tcp_fastlo, will only be effective for future connections

AIX7[/] # no -p -o tcp_fastlo_crosswpar=1
Setting tcp_fastlo to 1
Setting tcp_fastlo to 1 in nextboot file
Change to tunable tcp_fastlo, will only be effective for future connections

AIX7[/] # no -a | grep tcp_fast
tcp_fastlo = 1
tcp_fastlo_crosswpar = 1

Initially I did not
see any traffic via the fastpath.

AIX7[/] # netstat -s -p tcp | grep fastpath
0 fastpath loopback connection
0 fastpath loopback sent packet (0 byte)
0 fastpath loopback received packet (0 byte)

So I created two WPARs
in the same LPAR and started transferring files between them via FTP.

Next I had a look at the new support for changing mount options dynamically via remount. I started with a standard JFS2 filesystem, without any additional mount options.

NIM [/] # mount | grep cg
/dev/cglv    /cg    jfs2   Jan 18 21:14 rw,log=/dev/hd8

Then I dynamically remounted it with the rbr (release-behind-read) option. This option will prevent user data pages from being cached after a file is read from this filesystem.

NIM [/] # mount -o remount,rbr /cg

NIM [/] # mount | grep cg
/dev/cglv    /cg    jfs2   Jan 18 21:14 rw,rbr,log=/dev/hd8

Still can't dynamically mount a filesystem with CIO however. But that's OK.

NIM [/] # mount -o remount,cio /cg
mount: cio is not valid with the remount option.

According to the presentation, there are several options that can now be changed dynamically, e.g. atime, rbr, rbw, suid, dev, etc. Take a look at the presentation if you are interested in this new functionality.
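
For example, switching off access time updates on the same filesystem should work in exactly the same way (a hedged example; I'm assuming the noatime form is accepted with remount at this level):

NIM [/] # mount -o remount,noatime /cg
NIM [/] # mount | grep cg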

By the way, just
to make sure, I tried changing the same mount option, dynamically, on an AIX 6.1
TL6 system, and it failed as expected. I’d have to umount and mount the
filesystem to do this on TL6 (or lower).

# oslevel -s
6100-06-04-1112

# mount -o remount,rbr /cg
mount: remount,rbr,log=/dev/loglv00 is not valid with the remount option.

OK, let’s look at the
new LVM Infinite Retry Capability.
Designed to improve system availability, by allowing LVM to recover from
transient failures of storage devices. Sounds interesting!

AIX7[/] # oslevel -s
7100-01-02-1150

The man page for mkvg states the following:

-O y / n
     Enables the infinite retry option of the logical volume.

     n    The infinite retry option of the logical volume is not enabled. The failing I/O of the logical volume is not retried. This is the default value.

     y    The infinite retry option of the logical volume is enabled. The failed I/O request is retried until it is successful.

I think “logical
volume” should be “volume group”. But anyway, I get the idea.

So let’s create a new
VG with infinite retry enabled.

AIX7[/] # mkvg -O y -S -y cgvg hdisk6
cgvg

AIX7[/] # lsvg cgvg
VOLUME GROUP:       cgvg                     VG IDENTIFIER:  00f6048800004c0000000134f4236851
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1599 (204672 megabytes)
MAX LVs:            256                      FREE PPs:       1599 (204672 megabytes)
LVs:                0                        USED PPs:       0 (0 megabytes)
OPEN LVs:           0                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
MIRROR POOL STRICT: off
PV RESTRICTION:     none                     INFINITE RETRY: yes

AIX7[/] #

Now, let’s disable it.

AIX7[/] # chvg -On cgvg

AIX7[/] # lsvg cgvg
VOLUME GROUP:       cgvg                     VG IDENTIFIER:  00f6048800004c0000000134f4236851
VG STATE:           active                   PP SIZE:        128 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1599 (204672 megabytes)
MAX LVs:            256                      FREE PPs:       1599 (204672 megabytes)
LVs:                0                        USED PPs:       0 (0 megabytes)
OPEN LVs:           0                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32768                    MAX PVs:        1024
LTG size (Dynamic): 128 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
MIRROR POOL STRICT: off
PV RESTRICTION:     none                     INFINITE RETRY: no

AIX7[/] #

The man page for mklv states the following:

-O y / n
     Enables the infinite retry option of the logical volume.

     n    The infinite retry option of the logical volume is not enabled. The failing I/O of the logical volume is not retried. This is the default value.

     y    The infinite retry option of the logical volume is enabled. The failed I/O request is retried until it is successful.
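
So the same option is available per logical volume as well. A hedged example of creating an LV with infinite retry enabled in the same volume group (the names are just illustrative):

AIX7[/] # mklv -O y -t jfs2 -y cglv cgvg 1
cglv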

And last, but not
least, let’s take a brief look at Active
System Optimiser (ASO). To be
honest, I’m still not entirely sure how ASO works. But I have no doubt that
more information will be available from IBM soon. According to the presentation
material, ASO can “increase system performance by autonomously tuning system
configuration”. Wow, cool! It focuses on optimizing cache and memory affinity.
Hmmm, interesting. How the heck does it do that!? Only works with POWER7 and
AIX 7.1.

So can I enable this
on my p7 LPAR? Let’s give it a try!

AIX7[/] # oslevel -s
7100-01-02-1150

AIX7[/var/log/aso] # asoo -a
aso_active = 0

AIX7[/var/log/aso] # asoo -p -o aso_active=1
Setting aso_active to 1 in nextboot file
Setting aso_active to 1

AIX7[/var/log/aso] # asoo -a
aso_active = 1

Is the aso daemon running already? Nope.

AIX7[/] # ps -ef | grep aso

AIX7[/] # lssrc -a | grep aso
aso                          inoperative

Can I start it now? Nope.

AIX7[/var/log] # startsrc -s aso
0513-059 The aso Subsystem has been started. Subsystem PID is 7209122.

There may be better ways to do this, and if there are, please let me know. But lately I've been "hacking around" with cloud-init on AIX and trying to make it behave the way I want it to. There were two problems I faced and solved.

My first challenge. The AIX /etc/hosts file isn't updated with the IP address and hostname of the new AIX VM after deployment from PowerVC.

To work-around this niggle*, I added the following short but effective shell script to the Activation Input in the PowerVC GUI.
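
The script itself isn't reproduced here, but to give you an idea, a rough, hypothetical equivalent would simply append the VM's primary IP address and hostname to /etc/hosts if they're not already there (an untested sketch; en0 and the use of ksh are assumptions, not the exact script I used):

#!/usr/bin/ksh
# Add this VM's hostname and primary IP address to /etc/hosts after deployment.
HN=$(hostname)
IP=$(ifconfig en0 | awk '/inet /{ print $2; exit }')
grep -w "$HN" /etc/hosts > /dev/null 2>&1 || echo "$IP $HN" >> /etc/hosts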

And the second niggle. In my customer's lab environment, there's no DNS at all. Everything is /etc/hosts only. Every time they deployed an AIX VM, there was a significant delay for ssh sessions. Even with netsvc.conf set to hosts=local4, the dodgy**, default resolv.conf (search localdomain) forced DNS first, and ssh connections would hang for a minute or so before a login prompt appeared, waiting on name resolution to complete.

Some people told me that I could change "manage_resolv_conf" to false in my cloud-init config file and this would prevent the resolv.conf file from being managed (over-written) by cloud-init. But changing that option did nothing. And I really didn't want a resolv.conf file at all anyway!

What I really wanted was for cloud-init to deploy the AIX VM and to NOT create an /etc/resolv.conf file. But how? Well, I managed to fudge it***. I made a change to the aix.py python script. With this change, the script now writes out an /etc/resolv.conf.cloud file instead. This works OK.

And I guess the next time I install the latest release of cloud-init for AIX, I'll need to modify the script again. But I'm OK with this, as I expect the newer release may actually provide me with a fix to each of the problems I faced.

*niggle: To cause slight but persistent annoyance, discomfort, or anxiety.

Before the change, I was unable to compile anything. Note the license message in the output below.

# /usr/vac/bin/xlc cgc.c

The license for the Evaluation version of IBM XL C/C++ for AIX V11.1 compiler product has expired. Please send an email to compiler@ca.ibm.com for information on purchasing the product. The evaluation license can be extended to 74 days in total by either a) setting an environment variable XLC_EXTEND_EVAL=yes; or, b) specifying a compiler command line option -qxflag=extend_eval. The extended evaluation license will expire on Sat Jun 18 10:26:33 2011. Use of the Program continues to be subject to the terms and conditions of the International License Agreement for Evaluation of Programs, including the accompanying License Information document (the "Agreement"). A copy of the Agreement can be found in the "LicAgree.pdf" and "LicInfo.pdf" files residing in the root directory of the installation media. If you do not agree to the terms and conditions of the Agreement, you may not use or access the Program.

With
the environment variable in place, I was able to compile again.

# /usr/vac/bin/xlc cgc.c

# ./a.out
Hello World!

Of
course I could also export the environment variable as required instead e.g.

# export XLC_EXTEND_EVAL=yes

# /usr/vac/bin/xlc cgc.c

# ./a.out
Hello World!

If
you do place this environment variable in /etc/profile, you may need to restart
any processes on the system that need to call the compiler.

I enjoy it when I open my email in the morning and find a new message with a subject line of “weird one….”! I immediately prepare myself for whatever challenge awaits. Fortunately I do delight in helping others with their AIX challenges so I usually open these emails first and start to diagnose and troubleshoot the problem!

This week I was contacted by someone that was having a little trouble with a mksysb backup on one of their AIX systems.

“Hi Chris,

This one has me stumped, any ideas? I’ll have to log a call I think as I’m not sure why this is happening. I run a mksysb and it just backs up 4 files! I also can’t do an alt_disk_copy that also fails.

My /etc/exclude.rootvg is empty.

# cat /etc/exclude.rootvg
# mksysb -i /mksysb/aixlpar1-mksysb

Creating information file (/image.data) for rootvg.

Creating list of files to back up.

Backing up 4 files

4 of 4 files (100%)
0512-038 mksysb: Backup Completed Successfully.

# lsmksysb -f /mksysb/aixlpar1-mksysb
New volume on /mksysb/aixlpar1-mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Wed Oct 21 22:12:04 EST 2015
Files are backed up by name.
The user is root.
5911 ./bosinst.data
11 ./tmp/vgdata/rootvg/image.info
11837 ./image.data
270567 ./tmp/vgdata/rootvg/backup.data
The total size is 288326 bytes.
The number of archived files is 4.”

This little tip was passed on to me by
a friendly IBM hardware engineer many years ago.

When entering a capacity on demand (CoD) code into a Power system, you can tell how many processors and how much memory will be activated, just by looking at the code you've been given by IBM.

For example, the following codes, when
entered for the appropriate Power system, will enable 4 processors (POD) and
64GB of memory (MOD). I can also tell* that once the VET code is entered, this
system will be licensed for PowerVM Enterprise Edition (2C28).

This is a significant enhancement, as it will allow AIX administrators to install TLs and SPs (and ifixes) without restarting their AIX systems. From the announcement:

"AIX Live Update for Technology Levels, Service Packs, and Interim Fixes

Introduced in AIX 7.2, AIX Live Update is extended in Technology Level 1 to support any future update without a reboot, with either the geninstall command or NIM.

The genld command is enhanced to list processes that have an old version of a library loaded so that processes can be restarted when needed in order to load the updated libraries."

In this post I'll show you how to install updates without rebooting your AIX server. I recommend you first review my original article on Live Updates (from Oct 2015) in order to better understand the Live Update process, how it works and the requirements.

TL and SP Live Update support is delivered in AIX 7.2 TL1 (available November 11th 2016).

One of the biggest differences between the Live Update process for TL/SPs versus ifixes is that you must back up your system prior to the update. This backup will be used in case you need to back out. The easiest way to do this is to create an alternate rootvg (alt disk clone).

On my system, I first applied TL1 for AIX 7.2 and verified the correct level was installed.

root@AIXmig / # oslevel -s

7200-01-00-0000

root@AIXmig / # cat /proc/version

Oct 10 2016

11:53:17

1640C_72D

@(#) _kdb_buildinfo unix_64 Oct 10 2016 11:53:17 1640C_72D

I had several "free" disks that I could use for the Live Update process. I'd need at least 3 disks, one for my alternate rootvg (back out), one for the mirror disk and one for the new rootvg. In this case I used hdisk6 (alt rootvg), hdisk3 (mdisk) and hdisk2 (ndisk). This was specified in the lvupdate.data configuration file. All three disks were large enough to hold a complete copy of my existing rootvg.

root@AIXmig / # lspv

hdisk0 00f94f58cecabed6 rootvg active

hdisk1 00f94f58697b768f datavg active

hdisk2 00f94f58697b7655 None

hdisk3 00f94f58ce74a739 None

hdisk4 00f94f58a3b2f963 None

hdisk5 00f94f58a3b2f9d4 None

hdisk6 00f94f58def77f2c None

root@AIXmig / # cat /var/adm/ras/liveupdate/lvupdate.data

...

disks:

nhdisk = hdisk2

mhdisk = hdisk3

Note, with 7.2 TL1, only two disks need to be specified in the lvupdate.data file. The tohdisk and tshdisk are not needed with TL1 unless you have a paging or dump device outside of rootvg.

I cloned my rootvg to a spare disk (hdisk6) first. I specified the -B flag to ensure the boot list was not changed.
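
The clone command isn't shown above; it would have been along these lines (hedged example):

root@AIXmig / # alt_disk_copy -B -d hdisk6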

If I needed to back out, to the previous level, I could change the boot list to point to the alternate rootvg (hdisk6), and restart the system.

root@AIXmig / # lspv

hdisk0 00f94f58cecabed6 rootvg active

hdisk1 00f94f58697b768f datavg active

hdisk2 00f94f58697b7655 lvup_rootvg

hdisk3 00f94f58ce74a739 None

hdisk4 00f94f58a3b2f963 None

hdisk5 00f94f58a3b2f9d4 None

hdisk6 00f94f58def77f2c altinst_rootvg

root@AIXmig / # bootlist -m normal -o

hdisk0 blv=hd5 pathid=0

hdisk0 blv=hd5 pathid=1

root@AIXmig / # bootlist -m normal hdisk6

hdisk6 blv=hd5 pathid=0

hdisk6 blv=hd5 pathid=1

root@AIXmig / # bootlist -m normal -o

hdisk6 blv=hd5 pathid=0

hdisk6 blv=hd5 pathid=1

root@AIXmig / # at now
shutdown -Fr

Job root.1477276394.a will be run at Mon Oct 24 13:33:14 AEDT 2016.
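
For completeness: with the backup clone taken and lvupdate.data populated, the Live Update itself is launched with geninstall. A hedged sketch (the -k flag requests a Live Update; the source directory path here is an assumption, so check the Knowledge Center for the exact syntax at your level):

root@AIXmig / # geninstall -k -d /export/updates/7200-01 all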

This is very cool technology. Gone are the days of needing to plan reboots shortly after applying a new TL or SP to a critical AIX system. You simply "live update" your system, without disrupting your workloads or your users. This is a win for AIX administrators everywhere!

Please refer to the AIX 7.2 Knowledge Center for more information on Live Update.