Chris's AIX Blog

IBM
recently announced some significant changes to their ‘AIX Release and Service
Delivery’ strategy. Starting February 2011, the following changes will apply
for both AIX 6 and AIX 7.

• Three years of support for each Technology Level (TL).

• A single Technology Level per AIX version, per year.

• Service Packs will be released approximately four times per year, per Technology Level.

Some of the benefits of the new strategy are:

• Longer support for Technology Levels.

– More closely matches client deployment requirements.

• Single Technology Level per year, per version, and about one Service Pack per quarter.

– Fewer updates reduce administrative workload.

• New hardware is supported on prior Technology Levels.

– Easier integration of new hardware into existing environments.

You can
review the official presentation and details here.
I particularly
like this FAQ from the presentation.

Q: Why doesn’t
the new strategy include AIX V5.3?

A: AIX V5.3 is nearing end of life, with the End of Marketing for AIX V5.3 set for April 2011.

So if you
are on AIX 5.3, start planning a migration to AIX 7.1 (or at least AIX 6.1), NOW! I’ve provided some further
information from the presentation below.

Longer Support for each Technology Level

The 2011 Release Strategy extends the support for each Technology Level to up to three years from the introduction of the Technology Level. This means that clients with a Software Maintenance Agreement for the AIX OS will receive IBM defect support during that three-year period without having to move up to the latest Technology Level update.

For example, a Technology Level introduced in the second half of 2010 will be supported through the second half of 2013. Since each Technology Level will be supported for three years and IBM plans to release one Technology Level per year, IBM eventually will support up to six Technology Levels for each AIX release. Please note that due to minor variations in release dates, some Technology Levels will be supported for slightly more than three years and some will be supported for slightly less than three years. A three-year service life for each Technology Level is an objective, not an absolute limit.

Fewer Technology Levels and Fewer Service Packs

Historically, new Technology Levels were the only way to deliver new hardware support to clients. Since 2007, we have supported most new hardware on previous Technology Levels via a service pack. Since this new approach to supporting new hardware has been well proven with the introduction of POWER7 in 2010, we decided to eliminate the “Spring” Technology Level, which reduces the number of updates clients have to deal with. The number of Service Packs has been about five per year, and we want to reduce this to about one per quarter.

Between
Technology Level releases, clients maintain their AIX operating systems by
installing Service Packs (SP) or Interim Fixes for the entire support life of
the Technology Level update. Service Packs are also used to provide support for
newly released hardware.
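For example, you can check which Technology Level and Service Pack an LPAR is currently running with the oslevel command (the output shown here is just an illustration):

$ oslevel -s
7100-01-01-1141

The format is version-TL-SP-buildweek, so this LPAR is at AIX 7.1, TL1, SP1.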

The release dates and frequency of Service Packs are variable due to many factors, including new Technology Level releases, new hardware introduction and the need to deliver software fixes.

In the past
IBM delivered new Service Packs about every eight to twelve weeks or about five
to eight times per year per Technology Level. With the 2011 release strategy,
IBM intends to reduce the number of Service Packs released per year, per AIX
release to about four. As stated earlier, there are many factors that drive the
release of a Service Pack, so releasing four Service Packs per year, per
Technology Level represents a goal and objective, but not an absolute limit.

Note that new
Technology Levels are almost always accompanied very closely by the first
Service Pack for that new Technology Level. This first Service Pack includes
fixes to problems that are discovered between the time the new Technology Level
is released to manufacturing for the media to be replicated and the time the
new Technology Level is actually available to clients.

Summary

The providers of all software, including platform software such as the AIX operating system, must balance the client’s need for stability against the need to enhance the software to provide new functionality. The changes introduced with the AIX OS Release and Service Strategy of 2011 balance those two conflicting goals to help provide our clients with significant improvements in the manageability of the AIX operating system.

The increased service life, better manageability and support for new hardware on older fix levels provided by the new strategy are in direct response to our clients’ requirements.

As stated earlier, this new
strategy represents goals and objectives of IBM and is subject to change. Our
clients understand that substantial changes such as this new strategy are an
indication of the commitment that IBM has to improving the market leading
capabilities of the AIX operating system.

Several new features
were mentioned in the launch, but there were two new features that I found
particularly interesting:

- AIX 5.2 WPARs for AIX 7

- Cluster Aware AIX

I thought I would
briefly describe each feature in this post.

AIX 5.2 WPARs for AIX 7

In AIX version 7, administrators will now have the capability to create Workload Partitions (WPARs) that can run AIX 5.2 inside an AIX 7 operating system instance. This will be supported on the POWER7 server platform. This is pretty cool. IBM have done this to allow some customers that are unable to migrate to later generations of AIX and Power to move to POWER7 whilst keeping their legacy AIX 5.2 systems operational. So for those clients that MUST stay on AIX 5.2 (for various reasons, such as application support) but would like to run their systems on POWER7, this feature may be very attractive. It will help to reduce the effort required when consolidating older AIX 5.2 systems onto newer hardware. It may also reduce some of the risk associated with migrating applications from one version of the AIX operating system to another.

To migrate an existing
AIX 5.2 system to an AIX 7 WPAR, administrators will first need to take a
mksysb of the existing system. Then they can simply restore the mksysb image
inside the AIX 7 WPAR. IBM will also offer limited defect and how-to support
for the AIX 5.2 operating system in an AIX 7 WPAR. These WPARs can, of course,
be managed via IBM Systems Director with the Workload Partitions Manager
plug-in.
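A sketch of that flow, assuming the documented mkwpar flags for versioned WPARs (the mksysb path and WPAR name below are placeholders). On the AIX 5.2 system, create the mksysb image:

# mksysb -i /backups/aix52_lpar.mksysb

Then, on the AIX 7 system, create and start an AIX 5.2 versioned WPAR from that image:

# mkwpar -n aix52wpar -C -B /backups/aix52_lpar.mksysb
# startwpar aix52wpar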

The following figure
provides a visualization of how these AIX 5.2 systems will fit into an AIX 7
WPAR. The WPARs in blue are native AIX 7 WPARs, while the WPARs in orange are
AIX 5.2 WPARs running in the same AIX 7 instance. Pretty amazing really!

I can only speculate as to what other versions of AIX will be supported in this manner in the future. Just imagine an AIX 5.3 WPAR inside AIX 7!

Cluster Aware AIX

Another very interesting feature of AIX 7 is a new technology known as “Cluster Aware AIX”. Believe it or not, administrators will now be able to create a cluster of AIX systems using features of the new AIX 7 kernel. IBM have introduced this built-in clustering to the AIX OS in order to simplify the configuration and management of highly available clusters. This new AIX clustering has been designed to allow for:

- The easy creation of clusters of AIX instances for scale-out computing or high availability (see the sketch after this list).

- Capabilities such as common device naming to help simplify administration.

- Built-in event management and monitoring.

- A foundation for future AIX capabilities and the next generation of PowerHA SystemMirror.
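Here’s the sketch mentioned above: a minimal example of creating a cluster with the AIX 7 mkcluster command. The cluster name, node names and disk names are placeholders:

# mkcluster -n myclu -m nodeA,nodeB -r hdisk2 -d hdisk3,hdisk4
# lscluster -m

The -r flag nominates the cluster repository disk and -d lists the shared cluster disks; lscluster -m then shows the cluster nodes and their state.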

This does not replace PowerHA, but it does change the way in which AIX traditionally integrates with cluster software like HACMP and PowerHA. A lot of the HA cluster functionality is now available in the AIX 7 kernel itself. However, the mature RSCT technology is still a component of the AIX and PowerHA configuration. I’m looking forward to reading more about this new technology and its capabilities.

These are just two of the many
features introduced in AIX 7. I’m eagerly looking forward to what these
features and others mean for the future of the AIX operating system. It’s
exciting to watch this operating system grow and strengthen over time. I can’t
wait to get my hands on an AIX 7 system so that I can trial these new features.

And speaking of trialing AIX 7, there is good news. IBM plan on running another AIX Open Beta program for AIX 7 in mid-2010. Just as they did with AIX Version 6, customers will be given the opportunity to download a beta version of AIX 7 and trial it on their own systems in their own environment. This is very exciting and I’m really looking forward to it.

I encourage you to read the official AIX 7 announcement to learn more about the future of the AIX operating system and what it can do for you and your organization... for many years to come!

There are two new HMC (V7.7.3.0) commands that can force a client Virtual Fibre Channel adapter to log into a SAN. This should make the life of the AIX and SAN administrators easier: they will no longer need to install AIX in order for the new VFC adapters to log into the SAN (although there was an unsupported method* for doing this already; see links below), nor will the SAN admins need to “blind” zone the WWPNs.

The
new lsnportlogin and chnportlogin commands on the 7.730 HMC provide the ability
to utilize the new function in the VIOS. From the Readme:

Added the
chnportlogin and lsnportlogin commands.

1. The chnportlogin command allows you to perform N Port login and logout operations for virtual fibre channel client adapters that are configured in a partition or a partition profile. Use this command to help you in zoning WWPNs on a Storage Area Network (SAN). A login operation activates all inactive WWPNs, including the second WWPN in the pair assigned to each virtual fibre channel client adapter. This feature is particularly useful for Logical Partition Migration. A logout operation deactivates all WWPNs not in use. A successful login of a virtual fibre channel adapter requires that the corresponding virtual fibre channel server adapter exists and that it is mapped.
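For example, from the HMC command line (a sketch; the managed system and partition names are placeholders):

$ chnportlogin -o login -m p750-sys1 -p aixlpar1
$ lsnportlogin -m p750-sys1 --filter "lpar_names=aixlpar1"
$ chnportlogin -o logout -m p750-sys1 -p aixlpar1

The login operation activates the WWPNs so the SAN team can zone them, and the logout deactivates any WWPNs not in use once zoning is complete.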

...timeout: Enables you to specify the initial timeout for a query to a nameserver. The default value is five seconds. The maximum value is 30 seconds. For the second and successive rounds of queries, the resolver doubles the initial timeout and divides it by the number of nameservers in the resolv.conf file.

attempts: Enables you to specify how many queries the resolver should send to each nameserver in the resolv.conf file before it stops execution. The default value is 2. The maximum value is 5.

rotate: Enables the resolver to use all the nameservers in the resolv.conf file, not just the first one....
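A sketch of an /etc/resolv.conf that uses these options (the nameserver addresses are placeholders):

nameserver 10.1.1.10
nameserver 10.1.1.11
options timeout:10 attempts:3 rotate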

Number of threads (nthreads)

Specifies the number of threads in threaded mode, where the value of the thread parameter is 1. This value applies only when the thread mode is enabled. The nthreads attribute can be set to any value between 1 and 128. The default value is 7.

Queue size (queue_size)

Specifies the queue size for the Shared Ethernet Adapter (SEA) threads in threaded mode where the value of the thread parameter is 1. This attribute indicates the number of packets that can be accommodated in each thread queue. This value applies only when the thread mode is enabled. When you change this value, the change does not take effect until the system restarts. The queue_size attribute can be set to any value between 2 and 65535. The default value is 8192.

Hash algorithms (hash_algo)

Specifies the hash algorithm that is used to assign connections to Shared Ethernet Adapter (SEA) threads in threaded mode, where the value of the thread parameter is 1. When the hash_algo parameter is set to 0 (the default), an addition operation of the source and destination Media Access Control (MAC) addresses, IP addresses, and port numbers is done. When the hash_algo parameter is set to 1, a murmur3 hash function is done instead of an addition operation. The murmur3 hash function is slower, but it achieves better distribution. This value applies only when the thread mode is enabled.
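For example, on the VIOS as padmin (a sketch; ent8 is a placeholder SEA device name):

$ lsdev -dev ent8 -attr
$ chdev -dev ent8 -attr hash_algo=1
$ chdev -dev ent8 -attr queue_size=16384 -perm

Note the -perm flag on the queue_size change: as described above, a new queue size only takes effect after a restart.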

Attributes of the mover service partition pseudo device:

num_active_migrations_configured: Number of concurrent partition mobility operations for the mover service partition. User modifiable: True.

concurrency_lvl (default: 3): Concurrency level. User modifiable: True.

lpm_msnap_succ (default: 1): Create a mini-snap (when a migration ends, the set of information related to a specific migration that is gathered and packed on each mover service partition involved in the migration) for successful migrations.

Specify Fibre Channel ports using the vios_fc_port_name attribute. Run the lslparmigr command to show a list of available slot IDs for a VIOS partition. Run the migrlpar command to accomplish the following tasks:

- Specify virtual slot IDs for one or more virtual adapter mappings.

- Validate the specified slot IDs.

Note: You can specify the port name of the Fibre Channel port to be used for creating the Fibre Channel mapping on the source server when you are performing partition migration.

You can use the HMC command line interface to specify the port name. List all the valid port names of the Fibre Channel ports by running the lsnports command. From the list of valid port names, specify the one that you want to use by running the migrlpar command with the vios_fc_port_name attribute.
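A sketch of a validation run that requests a specific VIOS FC port (the system, partition, VIOS and slot values are placeholders, and the five-field virtual_fc_mappings format of client slot/VIOS name/VIOS ID/VIOS slot/FC port name is my reading of the HMC documentation):

$ migrlpar -o v -m srcsys -t dstsys -p aixlpar1 \
    -i '"virtual_fc_mappings=3/vios1//10/fcs0"'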

The following attributes of the pseudo device can be modified by using the migrlpar command:

num_active_migrations_configured

concurr_migration_perf_level

Run the following HMC command to modify the attribute values of the pseudo device, for example, to set the number of active migrations to 8:
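A sketch of such a command, assuming the migrlpar set operation (the managed system name is a placeholder):

$ migrlpar -o set -r sys -m p750-sys1 -a "num_active_migrations_configured=8"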

Virtual adapter mobility return codes:

1: The virtual adapter is not ready to be moved. The source virtual Ethernet is not bridged.

2: The virtual adapter can be moved with less capability. Not all virtual local area networks (VLANs) are bridged on the destination, so the virtual Ethernet adapter has less capability on the target system than on the source system.

3: The stream ID is still in use.

64: The migmgr command cannot be started.

65: The stream ID is invalid.

66: The virtual adapter type is invalid.

67: The virtual adapter DLPAR resource connector (DRC) name is not recognized.

68: The virtual adapter method cannot be started, or it was prematurely terminated.

69: There is a lack of resources (that is, the ENOMEM error code).

80: The storage that is being used by the adapter is specific to the VIOS and cannot be accessed by another VIOS. Hence, the virtual adapter cannot complete the mobility operation.

81: The virtual adapter is not configured.

82: The virtual adapter cannot be placed in a migration state.

83: The virtual devices are not found.

84: The virtual adapter VIOS level is insufficient.

85: The virtual adapter cannot be configured.

86: The virtual adapter is busy and cannot be unconfigured.

87: The virtual adapter or device minimum patch level is insufficient.

88: The device description is invalid.

89: The command argument is invalid.

90: The virtual target device cannot be created because of incompatible backing device attributes. Typically, this is because of a mismatch in the maximum transfer (MTU) size or SCSI reserve attributes of the backing device between the source VIOS and the target VIOS.

91: The DRC name passed to the migration code is for an adapter that already exists.

I’ve been working with a customer recently on
an issue with nimadm. They were
attempting to migrate a system from AIX 5.3 to 7.1 using nimadm. The NIM client AIX
level was 5.3 TL12 SP4 and the NIM master was running AIX 7.1 TL1 SP1.

Given that the error appeared to be related to init_multibos, we assumed the failure was due to some multibos checks being performed by alt_disk_copy on the client. The client system did not have an existing multibos standby instance. So, we tried two things. First, we created a standby instance on the client (multibos -s -X) and re-tried the nimadm operation. This failed. Next, we removed the standby instance (multibos -R) and re-tried the nimadm operation. This worked, and the client then migrated to AIX 7.1 successfully. We re-tried the same operations (i.e. create standby instance, remove standby instance & nimadm) several times, and each worked as expected.

So it appeared that the unofficial workaround for this problem would be to create and then remove a standby multibos instance prior to the nimadm migration. However, the customer has over 200 LPARs that they need to migrate to AIX 7.1. If possible, they would really rather avoid this extra step in the AIX 7.1 migration plan. We’ve made contact with IBM support and are hoping they can assist us in identifying the root cause of the issue and provide us with an official solution to the problem.

And just yesterday we hit the same problem when migrating from AIX 6.1
to 7.1 using nimadm. I’ll update my
blog with any progress we make with this problem. In the meantime, our
unofficial workaround will get us “out of hot water”!

UPDATE (14/12/2011): The simple fix is to remove the /bos_inst directory before attempting the AIX migration. That is:

If you have a multibos image in rootvg, remove it. AIX migrations are not supported on multibos-enabled systems. Ensure all rootvg LVs are renamed to their legacy names. If necessary, create a new instance of rootvg and reboot the LPAR. For example:

# multibos -sXp

# multibos -sX

# shutdown -Fr

Confirm the legacy LV names are now in use (that is, not prefixed with bos_).

# lsvg -l rootvg | grep hd | grep open
hd6         paging    80   160  2   open/syncd    N/A
hd8         jfs2log   1    2    2   open/syncd    N/A
hd4         jfs2      1    2    2   open/syncd    /
hd2         jfs2      7    14   2   open/syncd    /usr
hd3         jfs2      16   32   2   open/syncd    /tmp
hd1         jfs2      1    2    2   open/syncd    /home
hd9var      jfs2      8    16   2   open/syncd    /var
hd7         sysdump   8    8    1   open/syncd    N/A
hd7a        sysdump   8    8    1   open/syncd    N/A
hd10opt     jfs2      8    16   2   open/syncd    /opt

Remove the old multibos instance.

# multibos -R

Unfortunately, it appears that ‘multibos -R’ may not clean up the /bos_inst directory. If this directory exists, the nimadm operation will most likely fail.

Someone installs SAP onto an AIX system and decides to use TCP
port 3901 as an SAP service port. This is the same port used by nimsh. In some
rare cases, nimsh may not be active on the LPAR, which makes it easy for the
SAP installation to hijack port 3901. If nimsh is active, the person installing
SAP may consciously stop nimsh and use port 3901 for SAP anyway. Hopefully that
doesn’t happen. Hopefully, they will talk to the AIX administrator and discuss
the best way forward. Hopefully...

In either case, if the port is taken by SAP, nimsh will no longer work. If you love using NIM as much as I do, this is a real problem! We could revert to using rsh, but no one will do this anymore because of concerns around security. And rightfully so!

The ports used by nimsh (3901 and 3902) are registered with the Internet Assigned Numbers Authority (IANA). These port numbers appear in the /etc/services file.

nimsh     3901/tcp    # NIM Service Handler
nimsh     3901/udp    # NIM Service Handler
nimaux    3902/tcp    # NIMsh Auxiliary Port
nimaux    3902/udp    # NIMsh Auxiliary Port

Considering these port numbers are registered with IANA, we can usually persuade our SAP colleagues to change their SAP installation to use a different port number. However, depending on the skills/experience of the SAP resource, one of two things usually happens: 1) they take an outage, re-install SAP and choose a different port number; or 2) the more experienced/confident SAP Basis resource takes an outage and modifies the instance to use a different port, without reinstalling SAP.

Perhaps SAP need to include a warning in their install notes,
advising customers not to use port 3901 on AIX systems (i.e. best practice)?

Now, if you must change nimsh to use a different port number, it is possible, but not recommended.

To do this, you must change the /etc/services file on the NIM master and the NIM client to reflect the same port numbers for nimsh. This will work until the NIM master or the NIM client has its services file overwritten by an install or fileset update, after which the default values for nimsh will be reinstated.

You would also need to change the services file on all of your NIM clients. Every time you
performed a NIM fileset update, you would need to remember to change the /etc/services
file again. This is painful and bound to catch someone out eventually!

In the following example I’ll demonstrate how to change the port
number used by nimsh.

We start with a typical nimsh configuration using port 3901. On
the NIM client, nimsh is listening on port 3901.
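For example, to move nimsh to port 39011, you would update /etc/services on both the NIM master and the client, then restart the nimsh subsystem (a sketch; nimsh runs under the SRC):

# grep nimsh /etc/services
nimsh     39011/tcp    # NIM Service Handler
nimsh     39011/udp    # NIM Service Handler
# stopsrc -s nimsh
# startsrc -s nimsh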

We can confirm that we have connected to the NIM client on port
39011 by looking at the output from lsof
and netstat. There is a TCP session
established between the master and the client on port 39011.

I don’t know if there are any plans to fix this but I wanted to
share this information in case anyone else encounters this issue. iSCSI is not
all that popular on AIX systems (perhaps this is a good thing?), so it may not
impact enough people to warrant a quick fix.

The nohup command in AIX has the ability to “nohup” an existing process and protect it from hangup signals. Using the -p flag, an existing, running process can be “nohup’ed” and told to ignore SIGHUPs.

"Why is this useful?" I hear you ask.

Well, I'll often start a "long running" process on a system at work, like a backup for instance. I'll sometimes forget to use the nohup command, which would let my long-running job ignore any SIGHUPs and continue running if, say, my ssh session to the server dies. The nohup command also generates a nohup.out file to capture any messages (errors or otherwise) generated by the process, e.g.

$ nohup ./processname &

$ ls -l nohup.out
-rw-------   1 cgibson   staff   0 May 26 19:42 nohup.out

In older releases of AIX, if I forgot to use nohup, I had two choices. I could either stop the process and start it again with nohup (often not desirable or possible), or I could leave my ssh session logged in on my work PC and hope that it didn't power down or crash, or that my ssh connection didn't drop overnight.

Fortunately with more recent releases of AIX, this is not
a problem anymore. If I forget to “nohup” a process, I can use the -p
flag to inform the already running process that it should ignore all hangup
(SIGHUP) signals. This will allow the process to continue running if my ssh
session (terminal) dies and will also capture any output from the process into
the nohup.out file,
so I don’t lose any important output from the process.

Here’s an example. I’m at work, it's 5pm and I've started
a "long running" process named mycmd. This was started
via an ssh (Putty) session from my PC.

Oh no! I've forgotten to start the process with nohup! I don't want to restart my process but I want to protect
it from SIGHUPs, while it executes after hours.

No problem, I'll use nohup to modify the running process, so I can rest assured it will continue to run, even if my ssh connection drops overnight. From home I start a new ssh session on the server and run nohup against the running process:

$ nohup -p 823432

At work the next morning, I discover that my original
Putty session has indeed dropped! But, my process is still happily running! :)

Partial output from topas showing mycmd still running:

Name       PID       CPU%
mycmd      823432    99.3

$ proctree 823432
520394    -ksh
585738    -ksh
823432    /home/cgibson/mycmd

You might find this useful if you ever forget to “nohup”
a process and don’t want to restart it again.

For those who weren't aware... starting with AIX 5.3 TL7, quorum changes are allowed online, without having to varyoff and varyon the volume group. This also means no reboot is required when changing quorum for rootvg. Of course, this also applies to AIX 6.1.
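For example (a quick sketch using rootvg; on these levels the change takes effect immediately):

# lsvg rootvg | grep -i quorum
# chvg -Q n rootvg
# chvg -Q y rootvg

The first command shows the current quorum setting, and chvg -Q n / -Q y disables or enables quorum without a varyoff or reboot.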