I recently enabled the AIX name server caching daemon (netcd) on AIX. The process simply involved starting the netcd subsystem and uncommenting its entry in /etc/rc.tcpip so that the service will automatically start during the next system boot.
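As a rough sketch of the uncommenting step, the helper below removes the leading comment marker from a netcd entry in an rc.tcpip-style file. The function name enable_netcd_entry is hypothetical, and the sketch assumes the entry has the form #start /usr/sbin/netcd "$src_running"; check the actual line on your system before relying on it.

```shell
# Uncomment the netcd start entry in an rc.tcpip-style file.
# Assumption: the commented entry begins with "#start /usr/sbin/netcd".
enable_netcd_entry() {
    rc_file=$1
    tmp_file="${rc_file}.new.$$"
    if sed 's|^#[[:space:]]*\(start[[:space:]][[:space:]]*/usr/sbin/netcd\)|\1|' \
        "$rc_file" > "$tmp_file"; then
        # Write back through the original file so its owner and mode are kept.
        cat "$tmp_file" > "$rc_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}
```

On a real system this would be preceded by a backup such as cp -p /etc/rc.tcpip /etc/rc.tcpip.bak, and followed by startsrc -s netcd to start the subsystem right away.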

This is where the fun stopped - as I soon found that the daemon crashed on certain systems.

I will update this blog entry once the root cause of the crashes has been established. In the meantime, I would recommend running on the latest SP before enabling netcd as there are at least a few known issues, one of which appears fairly serious:

Sometimes I write scripts where it is important or at least desirable to include time-stamps. Why? When scripting, it is good practice to add time-stamps and perform return code validation around system calls and external programs. For more complex activities, I prefer to make use of Perl. Today, however, I was busy writing a tiny script to DLPAR entitled CPU from one LPAR to another, which did not require such complexity, when suddenly, like a stubborn mule, I was struck by an inexplicable refusal to make use of backticks or sub-routines to refresh the time-stamps of variables in the echo statements of my ksh script.

Alas, I started to convulse at the thought of code like this(!):

echo "`date +"%Y-%m-%d %H:%M:%S"`: invoking chhwres on remote HMC."

It was then that I decided to visit the Google fairy, who directed me to a neat enhanced Korn shell (ksh93) feature called a discipline function, explained in more detail here. In short, this allows you to attach a function to a variable, which the shell then runs whenever that variable is referenced (get), assigned (set) or unset. While the possibilities are endless, it provides a straightforward method to ensure that time-stamp variables are updated as and when they are referenced in your shell script.

Here is a simple example:

#!/usr/bin/ksh93

function TIMESTAMP.get {
    .sh.value=$(date +"%Y-%m-%d %H:%M:%S")
}

# $TIMESTAMP will dynamically be updated when referenced.
echo "Time-stamp is $TIMESTAMP."
sleep 2
echo "Time-stamp is $TIMESTAMP."
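The return code validation mentioned at the start of this entry is never shown, so here is a minimal sketch of how it could be paired with time-stamps. It is deliberately written in plain POSIX shell rather than with a discipline function, so that it also runs where ksh93 is not available. The helper names log and run are hypothetical.

```shell
# Print a message prefixed with the current time-stamp.
log() {
    echo "$(date +"%Y-%m-%d %H:%M:%S"): $*"
}

# Run a command, log the invocation, report a non-zero return code
# and hand that return code back to the caller.
run() {
    log "invoking: $*"
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        log "FAILED (rc=$rc): $*"
    fi
    return "$rc"
}
```

A call such as run ls /tmp || exit 1 then produces a time-stamped line for the invocation and stops the script if the command fails. In a ksh93 script the date call inside log could be replaced by the $TIMESTAMP variable from the example above.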

I was doing some health checks when I noticed that some of our VIO (2.1.3) servers were running a version of OpenSSH that has a known security vulnerability. You will hopefully be aware that OpenSSH has been provided by IBM as part of AIX since v6.1, which means that the latest TL and VIO updates will include fixes for SSH and any related security issues.

I will be quick to acknowledge that the simplest approach might be to ensure that regular VIOS software maintenance is instated and to regularly perform VIO server updates of the entire system. But in many production environments it is not possible to do this without extensive planning and delay, so the alternative I would like to discuss is an update of just OpenSSH itself.

What follows here is an outline of a reasonably straightforward process for updating just the IBM OpenSSH software without having to update any other software. Please however note that the file set names and versions used do not apply to the portable version of OpenSSH.

1. Determine if you are running a vulnerable or weak level of OpenSSH (or the dependent OpenSSL file set) that requires an update.

There are various sources to identify SSH vulnerabilities, and I based my information on http://www.openssh.org/security.html. This stipulates that Portable OpenSSH prior to version 5.8p2 is vulnerable to a local host key theft attack described in the portable-keysign-rand-helper.adv advisory.

Oh dear. This system is making use of OpenSSH 5.2 and would be at risk against a local/internal network attack, however unlikely that may be.
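For checking more than a handful of servers, the comparison against 5.8p2 can be scripted. The following is only a sketch: both function names are hypothetical, and it assumes version strings in the portable form major.minor or major.minor followed by p and a patch number (for example 5.2p1), as printed in the banner of ssh -V. It does not understand AIX file set levels.

```shell
# Extract the version from a banner such as
# "OpenSSH_5.2p1, OpenSSL 0.9.8k 25 Mar 2009".
extract_openssh_version() {
    printf '%s\n' "$1" | sed -n 's/^OpenSSH_\([0-9.p]*\).*/\1/p'
}

# Succeeds (returns 0) if the given portable version is older than 5.8p2.
older_than_5_8p2() {
    ver=$1
    major=${ver%%.*}            # "5"   from "5.2p1"
    rest=${ver#*.}              # "2p1" from "5.2p1"
    minor=${rest%%p*}           # "2"
    case $rest in
        *p*) patch=${rest#*p} ;;   # "1"
        *)   patch=0 ;;            # no patch suffix given
    esac
    [ "$major" -lt 5 ] && return 0
    [ "$major" -gt 5 ] && return 1
    [ "$minor" -lt 8 ] && return 0
    [ "$minor" -gt 8 ] && return 1
    [ "$patch" -lt 2 ]
}
```

The two can be combined as older_than_5_8p2 "$(extract_openssh_version "$(ssh -V 2>&1)")" && echo "update required".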

2. Decide on a new OpenSSH file set level.

The typical approach might be to navigate to the OpenSSH on AIX SourceForge page but I elected not to do so since I am more trusting of the actual software included in the AIX technology level and service pack updates.

An (easy/safe) approach is to navigate to the Fileset Information for openssh.base.server page and select the highest level of OpenSSH for the current release of AIX that is active on the VIO server. In other words, I would deem openssh.base.server.6.0.0.6100 (in AIX 6100-08) as the best update choice, since it is the highest level of SSH available for AIX 6.1, which is what a VIO 2.1.3 server runs on. I would however never consider a file set for AIX 7.1 as a candidate on an AIX 6.1 system, even though it may technically happen to be the same.

At this point, you would have to download the service pack or technology level if you have not already done so. Since I have various packages available at my disposal on my NIM servers, I was able to simply navigate to the appropriate directory before continuing.

3. Commit all existing file sets on the VIO server.

It is essential to ensure that all software changes are revertible - this is the golden rule of system administration. I ensured that I have a valid backup of the VIO server before proceeding to commit all existing file sets, as the safest approach to the update would be to only APPLY the new SSH file sets.

and this committed about 260 file sets, which had remained as a backout point from a previous VIO server update, in a short amount of time.

4. Copy the required OpenSSH and OpenSSL file sets to a temporary directory.

I found that it is not possible to simply select and update just the SSH specific file sets from the AIX TL repository, as a preview operation (be safe) indicated that automatic requisite installation would require an update of other file sets like bos.rte.install. While it may be obvious, I will also mention that under no circumstances should one attempt to update a VIO server directly using any AIX technology level software - the only valid upgrade source for a VIO server is the VIO server specific service packs.

Now, to identify the installed openssh and openssl packages that must be updated:

vio1:/# lslpp -l|egrep -e "openss[hl]"

At which point I was able to identify the following file sets to update (language specific file sets may vary):

openssl.man.en_US
openssl.license (needed for licensing purposes)
openssl.base (includes server and client software)
openssh.man.en_US

To identify the names of the actual BFF files that contain the updates, I simply had to run a grep command for each of the above in the .toc, for example:

vio1:/mnt/aix6/6100-07-05-1228# grep "openssh.base.server" .toc

At this point, I was able to identify and copy the file sets to a temporary location:

vio1:/mnt/aix6/6100-07-05-1228# mkdir /tmp/openssh && cp -p U846944.bff /tmp/openssh

...

and after repeating the copy for each of the above file sets I could execute inutoc . from within the temporary directory.
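Repeating the copy by hand is error prone, so the staging step could be wrapped in a small function along these lines. This is a sketch only: the name stage_filesets is hypothetical, the list of BFF files has to come from your own .toc lookups, and inutoc is simply skipped when the command is not present.

```shell
# Copy the named BFF files from a repository directory into a staging
# directory, then rebuild the .toc there if inutoc is available.
stage_filesets() {
    src_dir=$1
    dest_dir=$2
    shift 2
    mkdir -p "$dest_dir" || return 1
    for bff in "$@"; do
        # Stop at the first file that cannot be copied.
        cp -p "$src_dir/$bff" "$dest_dir/" || return 1
    done
    # inutoc only exists on AIX; elsewhere the copy alone is performed.
    if command -v inutoc >/dev/null 2>&1; then
        (cd "$dest_dir" && inutoc .)
    fi
}
```

For the example above the call would be stage_filesets /mnt/aix6/6100-07-05-1228 /tmp/openssh U846944.bff followed by the remaining BFF file names.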

The thing is, in the worst case scenario it is conceivable that the SSH update fails and that the SSH daemon aborts for an inexplicable reason. In such a case only existing SSH sessions will work; new connections cannot be made unless the sshd daemon is active. So it is always a good idea to establish a separate SSH connection as a fallback (and to make sure that the TMOUT setting does not automatically log you out). This session will not be terminated if sshd itself aborts during the update process.

Secondly, the safest approach would also be to back up the SSH keys, authorized_keys, known_hosts and all other SSH configuration files in ~root/.ssh in case anything gets overwritten, so run:

tar cvf /tmp/root-ssh.tar ~root/.ssh

Okay, enough fooling around. To perform the actual update, I first perform a preview update (as I do for any software operation), and then perform the actual installation:
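A minimal sketch of such a preview-then-apply sequence is shown here, assuming the staged file sets live in /tmp/openssh and that installp is used with its standard flags (-a apply, -p preview, -g requisites, -X expand file systems, -d source directory). The helpers run_cmd and update_from_dir and the DRY_RUN variable are hypothetical, and depending on the file sets the -Y flag may also be needed to accept license agreements.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Preview first; only apply (without committing) if the preview succeeds.
update_from_dir() {
    dir=$1
    run_cmd installp -apgXd "$dir" all || return 1   # -p: preview only
    run_cmd installp -agXd "$dir" all                # -a without -c: apply only
}
```

Running DRY_RUN=1 update_from_dir /tmp/openssh prints both commands without touching the system, which is a cheap way to double-check the flags before the real run.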

Now, all that is required is a restart of the SSH daemon. It is managed by the SRC, and can be restarted by the following commands:

stopsrc -s sshd; sleep 3; startsrc -s sshd;

and then testing a new SSH connection to the system. The process associated with this new SSH connection should have a later start time than the actual sshd parent process that we just restarted. Do not forget to restart the SSH daemon, or else you might get a nasty surprise when you reboot the system (if something went wrong).
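Before opening the test connection, the state of the subsystem can also be confirmed from the lssrc output. The helper below is a sketch with a hypothetical name; it assumes the usual lssrc -s layout of a header line followed by one line per subsystem, with the status in the last column.

```shell
# Succeeds if the named subsystem is reported as "active" in lssrc-style
# output read from standard input.
src_is_active() {
    awk -v name="$1" '
        NR > 1 && $1 == name && $NF == "active" { found = 1 }
        END { exit !found }
    '
}
```

It would be used as lssrc -s sshd | src_is_active sshd || echo "sshd is not active".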

In conclusion, this is a very straightforward update process that I approached rather cautiously to be safe. It is actually best to test an update like this on a non-critical environment, but it is not very challenging and it may be useful in case you require the update for compliance or other reasons of irritation.

A customer asks you to migrate a WPAR from one GE (Global Environment) to another. Workload partition manager is not available so it is not possible to show off the zero downtime mobility features, but the customer does not mind a migration using backup and restore.

After requesting a shut-down of the applications, you create a WPAR backup:

savewpar -f /backup/my_wpar.backup my_wpar

It works just fine. You know that /etc/wpars/my_wpar.cf contains the information you would need to manually create the WPAR if problems are encountered. Confidence levels are high!

You then initiate the restore by running restwpar -f /mnt/my_wpar.backup -n my_wpar -d /wpars/my_wpar on the destination GE, but disaster strikes!

Reading through the APAR, it is clear that a modification of image.data is required, since applying the fix and repeating the process on the source GE is not feasible. The image.data file can simply be grabbed from the source system or if it is not available, it will have to be extracted from the WPAR backup file via the following command:

restore -xvf /mnt/my_wpar.backup ./.savewpar_dir/image.data

You are now able to modify the image.data file and change the invalid entries to address the defect, from:

MIRROR_WRITE_CONSISTENCY= 0

LV_SEPARATE_PV= 432

to:

MIRROR_WRITE_CONSISTENCY= on/ACTIVE

LV_SEPARATE_PV= yes

(On my system, I actually had LUNs mapped to another LPAR and the PV numbers for the volume group differed, so all the hdisk numbers had to be modified as well.)
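If several logical volume stanzas are affected, the two substitutions can be applied with sed instead of by hand. This sketch uses a hypothetical function name and only rewrites the exact invalid values shown above (0 and 432); other invalid values, and any hdisk renumbering, still need manual attention. The untouched file is kept with an .orig suffix.

```shell
# Replace the invalid MIRROR_WRITE_CONSISTENCY and LV_SEPARATE_PV values in
# an image.data file, keeping a copy of the original as <file>.orig.
fix_image_data() {
    image_data=$1
    cp -p "$image_data" "$image_data.orig" || return 1
    sed -e 's|^\([[:space:]]*MIRROR_WRITE_CONSISTENCY=\)[[:space:]]*0[[:space:]]*$|\1 on/ACTIVE|' \
        -e 's|^\([[:space:]]*LV_SEPARATE_PV=\)[[:space:]]*432[[:space:]]*$|\1 yes|' \
        "$image_data.orig" > "$image_data"
}
```

After running fix_image_data ./.savewpar_dir/image.data, a diff against the .orig copy shows exactly which lines were changed.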