A special PTF (U866665.bff) provides POWER8 support on AIX 5.3, with the following limitations:

The LPAR must be at AIX 5.3 TL12 SP9 (latest 5.3 release).

The POWER8 LPAR will run in POWER6 compatibility mode and is limited to SMT2 mode. SMT2 mode results in some capacity loss compared to SMT4/SMT8 mode. IBM publishes SMT2 rPerf values that can be used to quantify POWER8 SMT2 capacity.

mksysb – First perform an in-place update of a supported (P5/P6/P7) AIX 5.3 TL12 SP9 LPAR with PTF U866665. The standard mksysb command can then be used to capture a POWER8-capable mksysb image. The mksysb image can then be used to install POWER8 LPARs.

NIM – An AIX 5.3 TL12 SP9 NIM environment must be updated to support POWER8. An AIX 5.3 TL12 SP9 NIM lpp_source must be updated to include PTF U866665. A NIM SPOT must then be created to utilize the updated lpp_source.
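
As a rough sketch of those two steps (assuming the PTF has been copied to /tmp/cg/aix53p8 and using the resource names that appear later in this post; the SPOT location is just an example), the lpp_source can be updated and the SPOT built from it like this:

# nim -o update -a packages=all -a source=/tmp/cg/aix53p8 AIX53TL12SP9
# nim -o define -t spot -a server=master -a source=AIX53TL12SP9 -a location=/export/nim/spot spotAIX53TL12SP9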

All POWER8 systems are supported with the following restrictions:

POWER8 system must be at 840 firmware level.

POWER8 LPAR must be served by a 2.2.4.10 VIOS.

Other restrictions:

Remote restart is not supported with POWER8 AIX 5.3 LPARs.

POWER8 AIX 5.3 LPARs are not supported with a VIOS using Shared Storage Pools.

I tested this in my lab and it all worked as expected. I installed the PTF on an existing AIX 5.3 LPAR on a POWER7 750 and rebooted the partition.

[root@53gibbo]/ # lspv

hdisk0 00f603cdcd9ba6de rootvg active

[root@53gibbo]/ # oslevel -s

5300-12-09-1341

[root@53gibbo]/ # lsconf | more

System Model: IBM,8233-E8B

Machine Serial Number: 1003CDP

Processor Type: PowerPC_POWER7

Processor Implementation Mode: POWER 6

Processor Version: PV_6_Compat

Number Of Processors: 2

Processor Clock Speed: 3550 MHz

CPU Type: 64-bit

Kernel Type: 64-bit

LPAR Info: 3 53gibbo

Memory Size: 2048 MB

Good Memory Size: 2048 MB

Platform Firmware level: AL730_142

Firmware Version: IBM,AL730_142

[root@53gibbo]/tmp/cg/aix53p8 # inutoc .

[root@53gibbo]/tmp/cg/aix53p8 # ls -ltr

total 35808

-rw-r----- 1 root system 8994816 Feb 8 09:35 U866665.bff

-rw-r----- 1 root system 8994816 Feb 8 09:37 bos.mp64.5.3.12.10.U

-rw-r--r-- 1 root system 328592 Feb 8 09:37 .toc

[root@53gibbo]/tmp/cg/aix53p8 # install_all_updates -d .

install_all_updates: Initializing system parameters.

install_all_updates: Log file is /var/adm/ras/install_all_updates.log

install_all_updates: Checking for updated install utilities on media.

install_all_updates: Processing media.

install_all_updates: Generating list of updatable installp filesets.

install_all_updates: The following filesets have been selected as updates

My POWER8 system was an S824 running the latest firmware (840_056), with VIOS running the latest code (2.2.4.10) and the latest HMC code (8.4).

p8vio1:/home/padmin# lsconf | head -20

System Model: IBM,8286-42A

Machine Serial Number: 214F55V

Processor Type: PowerPC_POWER8

Processor Implementation Mode: POWER 7

Processor Version: PV_7_Compat

Number Of Processors: 2

Processor Clock Speed: 3525 MHz

CPU Type: 64-bit

Kernel Type: 64-bit

LPAR Info: 101 82861_vio1

Memory Size: 18432 MB

Good Memory Size: 18432 MB

Platform Firmware level: SV840_056

Firmware Version: IBM,FW840.00 (SV840_056)

On my NIM master I created a new lpp_source and SPOT for AIX 5.3 TL12 SP9 and included the required PTF U866665.

# lsnim -t spot | grep 53

spotAIX53TL12SP9 resources spot

# lsnim -t lpp_source | grep 53

AIX53TL12SP9 resources lpp_source

# nim -o showres spotAIX53TL12SP9 | grep bos.mp64

bos.mp64 5.3.12.10 C F Base Operating System 64-bit

# nim -o showres AIX53TL12SP9_P8 | grep bos.mp64

bos.mp64 5.3.11.2 I b usr,root

bos.mp64 5.3.12.9 S b usr

bos.mp64 5.3.12.10 S b usr

I took a mksysb of the 5.3 LPAR and copied it to my NIM master. I then used the mksysb to perform a NIM mksysb install (restore) to a new partition on the S824. This worked flawlessly. The partition was configured in POWER6 mode with SMT2 only.
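
For reference, the restore can be driven from the NIM master with something along these lines (the mksysb resource name and its location are assumptions; the SPOT and lpp_source are the ones created above):

# nim -o define -t mksysb -a server=master -a location=/export/nim/mksysb/53gibbo_p8.mksysb 53gibbo_mksysb
# nim -o bos_inst -a source=mksysb -a mksysb=53gibbo_mksysb -a spot=spotAIX53TL12SP9 -a lpp_source=AIX53TL12SP9 -a accept_licenses=yes 53gibbo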

[root@53gibbo]/ # oslevel -s

5300-12-09-1341

[root@53gibbo]/ # lsconf | head -10

System Model: IBM,8286-42A

Machine Serial Number: 214F55V

Processor Type: PowerPC_POWER8

Processor Implementation Mode: POWER 6

Processor Version: PV_6_Compat

Number Of Processors: 2

Processor Clock Speed: 3525 MHz

CPU Type: 64-bit

Kernel Type: 64-bit

LPAR Info: 2 53gibbo

Memory Size: 2048 MB

Good Memory Size: 2048 MB

Platform Firmware level: SV840_056

Firmware Version: IBM,FW840.00 (SV840_056)

Console Login: enable

Auto Restart: true

Full Core: false

[root@53gibbo]/ # smtctl

This system is SMT capable.

SMT is currently enabled.

SMT boot mode is not set.

SMT threads are bound to the same virtual processor.

proc0 has 2 SMT threads.

Bind processor 0 is bound with proc0

Bind processor 1 is bound with proc0

proc2 has 2 SMT threads.

Bind processor 2 is bound with proc2

Bind processor 3 is bound with proc2

[root@53gibbo]/ # lparstat -i

Node Name : 53gibbo

Partition Name : 53gibbo

Partition Number : 2

Type : Shared-SMT

Mode : Uncapped

Entitled Capacity : 0.20

Partition Group-ID : 32770

Shared Pool ID : 0

Online Virtual CPUs : 2

Maximum Virtual CPUs : 4

Minimum Virtual CPUs : 1

Online Memory : 2048 MB

Maximum Memory : 4096 MB

Minimum Memory : 1024 MB

Variable Capacity Weight : 128

Minimum Capacity : 0.10

Maximum Capacity : 4.00

Capacity Increment : 0.01

Maximum Physical CPUs in system : 24

Active Physical CPUs in system : 24

Active CPUs in Pool : 14

Shared Physical CPUs in system : 14

Maximum Capacity of Pool : 1400

Entitled Capacity of Pool : 1160

Unallocated Capacity : 0.00

Physical CPU Percentage : 10.00%

Unallocated Weight : 0

Desired Virtual CPUs : 2

Desired Memory : 2048 MB

Desired Variable Capacity Weight : 128

Desired Capacity : 0.20

The example above shows you how to migrate an AIX 5.3 system using NIM and mksysb. But you don’t have to do this, as LPM is fully supported. The migration effort for many customers is as easy as kicking off an LPM operation to move their (updated) AIX 5.3 partitions from their older Power servers to POWER8. Easy!

So I guess customers that still need to run AIX 5.3 on POWER8 now have an alternative to AIX 5.3 Versioned WPARs. But please keep in mind the performance and support considerations when making your decision.

So you want to open a text console session on an LPAR on one of your Power Servers? You could log on to the HMC and run vtmenu from the HMC command line. This works, but if you’re looking for a more efficient way, you might want to consider using the dconsole command on AIX. If, like many, you have a central or main NIM server from which you manage your AIX environment, then the dconsole command is probably a logical tool to add to your AIX admin bag of tricks.

Create (or edit) the /etc/ibm/sysmgt/dsm/nodeinfo file on the NIM master. This contains the LPAR name, the IP address of the HMC, the hardware type-model and serial number of the target Power System, the LPAR ID number and the (default) location of the HMC password file.
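
Purely as an illustration (the entry below is made up, and the exact field syntax and dconsole flags should be checked against the dsm documentation on your system), an entry following the field order described above, and a text console session opened against it, might look something like this:

# cat /etc/ibm/sysmgt/dsm/nodeinfo
aixlpar1|hmc|10.1.50.30|8286-42A*214F55V|4|/etc/ibm/sysmgt/dsm/config/.hmc_passwd
# dconsole -t -n aixlpar1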

On the odd occasion, NIM may report that a resource is allocated to a NIM client when, in fact, it is not. Typically, you’d check whether the resource was allocated for use by any NIM client and, if it was, reset that client; this would normally resolve the issue. But if that doesn’t work, you may need to take an additional action to resolve the problem. This doesn’t happen very often, but it can frustrate you when it does.

Here’s an example of the problem. I try to remove an lpp_source resource but I’m told that it’s still allocated to a client. But it isn’t, I tell you!

# nim -o remove liveupdaterte

0042-001 nim: processing error encountered on "master":

0042-061 m_rmpdir: the "liveupdaterte" resource is currently

allocated for client use

Even lsnim is telling me that the resource is still allocated, somewhere, because alloc_count is set to 1.

# lsnim -Fl liveupdaterte

liveupdaterte:

id = 1447111715

class = resources

type = lpp_source

comments = LIVE

arch = power

Rstate = ready for use

prev_state = verification is being performed

location = /export/nim/cglpp

alloc_count = 1

server = master

After trying to de-allocate the resources by resetting my NIM clients (see my script at the bottom of the page), and still receiving the same error, I was left with little choice but to manually reset the alloc_count value to 0 using the (almost undocumented) /usr/lpp/bos.sysmgt/nim/methods/m_chattr NIM utility.

The release level of the resource is incomplete, or incorrectly specified. The level of the resource can be obtained by running the lsnim -l ResourceName command and viewing the version, release, and mod attributes. To correct the problem, either recreate the resource, or modify the NIM database to contain the correct level using the following command on the NIM master: /usr/lpp/bos.sysmgt/nim/methods/m_chattr -a Attribute=Value ResourceName, where Attribute is version, release, or mod; Value is the correct value; and ResourceName is the name of the resource with the incorrect level specification.
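
Applying that same syntax to the stuck allocation count, the reset (followed by a quick check and the remove operation that previously failed) looks like this:

# /usr/lpp/bos.sysmgt/nim/methods/m_chattr -a alloc_count=0 liveupdaterte
# lsnim -Fl liveupdaterte | grep alloc_count
# nim -o remove liveupdaterte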

One question that comes to mind is how did the NIM resource end up in this state? Most likely it was the result of a failed NIM operation on the lpp_source and NIM client to which it was to be allocated. This can be tricky to pick up and almost always, it’s the next person who tries to use the resource that finds the problem and has no idea what events led up to this point.

As always, use caution when experimenting with this tool. If in doubt, take a backup of your NIM database before you start messing with the attributes, just in case you need it in the future.

Here’s my NIM client reset script. It resets the client and de-allocates any resources assigned to it. It also resets the NIM client cpuid (this is not always required) but I often use the same NIM client to install multiple AIX partitions across several Power servers, so it’s useful to me only (probably)! You can remove that line if need be.
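
A minimal sketch of such a script (taking the client name as its first argument) boils down to a few nim operations:

#!/usr/bin/ksh
# Reset a NIM client and deallocate all of its resources.
CLIENT=$1
nim -Fo reset $CLIENT
nim -o deallocate -a subclass=all $CLIENT
# Clearing the cpuid is optional - handy when one client definition is
# reused to install multiple partitions across several Power servers.
nim -Fo change -a cpuid= $CLIENT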

There’s a new NIM HTTP service handler included with AIX 7.2 (due for release next month, December 2015). This new service is designed “…to help Clients better conform to emerging data center policies restricting the use of NFS, NIM will now have support to apply updates to AIX or install new packages over HTTPs. Initial AIX installs will still require the use of NFS version 3 or the more secure NFS version 4 protocol.

In addition to fileset installs, NIM customization activities such as script execution and file_res copying also support access over HTTPs.

Major Advantages of using HTTP during NIM Management:

All communication occurs over a single http port, so the authorization through a firewall is quite easy to manage.

Actions are driven from the client's end (the install target), so remote access isn't necessary for pushing the commands.

Easy to consume by NIM or other products that currently use the client/server model of NFS.

Able to extend the end-product to support additional protocols (context driven).”

“How Does it Work?

AIX ships a new service handler (in 7.2.0) that provides http access to NIM resources. The service name (defined in /etc/services) is nimhttp and it listens for requests over port 4901. When active, NIM clients attempt file access and/or scripting customization requests from nimhttp. If http access fails or is denied, a failover attempt at NFS client access occurs. Future support will include options to remove NFS client attempts altogether.”

“On startup, the nimhttp service attempts to read the httpd.conf configuration file, located in the default home directory of the user. First time users will notice that starting the service without a configuration file will result in one being created and populated with default service values.”

“document_root

….for now, the key detail to point out is that NIM expects all http accessible files to exist under the path of /export/nim/. This path location is defined as the document_root and cannot be modified at this time. Future enhancements will support multiple document_root paths. The document root path is not limited in depth and may contain many sub-directories. Client requests are able to traverse the path setting by using the enable_directory_listing option. If set to “no”, all files being served must reside in the current working directory of document_root.”

“The default authentication used in nimhttp for client access is a basic protocol handshake and is probably considered by some (if not all) as undesirable. To enable the more secure Digest Authentication method, users must provide valid paths for certificate authority and root certificate files for the server. The certificate authority and root PEM files used in nimhttp are easily created using the existing SSL management option in NIM. Run the following command on the NIM master to create the ssl.cert_authority and ssl.pemfiles used by the nimhttp service:

# nimconfig -c”

I tested this functionality during the AIX 7.2 Early Ship Program.

Warning: The information shown here was collected from testing conducted with beta level code. Some details may change in the final release.

Configuring the service was easy. For the sake of simplicity I chose not to use SSL with the authentication mechanism. With my NIM master already configured, all I need to do is confirm that the NIM client fileset is installed on the master and any client I wish to manage with the HTTP service.

NIM MASTER:

# lslpp -l | grep nim

bos.sysmgt.nim.master 7.2.0.0 COMMITTED Network Install Manager -

bos.sysmgt.nim.client 7.2.0.0 COMMITTED Network Install Manager -

NIM CLIENT:

# lssrc -s nimsh

Subsystem Group PID Status

nimsh nimclient 6554064 active

# lslpp -l | grep nim

bos.sysmgt.nim.client 7.2.0.0 COMMITTED Network Install Manager -

Start the NIMHTTP service on the NIM master. This starts the nimhttpd daemon (on the master only) and creates the default httpd.conf file (in root’s home directory, /).

# startsrc -s nimhttp

0513-059 The nimhttp Subsystem has been started. Subsystem PID is 6685178.

# lssrc -s nimhttp

Subsystem Group PID Status

nimhttp 6685178 active

# ps -ef | grep nimhttp

root 6685178 4194712 0 Nov 10 - 0:00 /usr/sbin/nimhttpd -v

# ls -ltr /httpd.conf

-rw-r--r-- 1 root system 1159 Nov 05 15:31 /httpd.conf

# cat /httpd.conf

#

#---------------------

# http service defines

#---------------------

#

service.name=nimhttp

# Designates the service name used when discovering the listening port for requests (i.e., nimhttp)

#

service.log=/var/adm/ras/nimhttp.log

# Log of access attempts and equivalent responses. Also useful for debug purposes.

#

# service.proxy_port=

# Designates the service port number used when configured as a proxy.

#

# service.access_list=

# White-list of IP (host) addresses which have access to our http file service. All others are denied.

#

#

#---------------------

# http configuration

#---------------------

#

document_root=/export/nim/

# Designates the directory to serve files from.

#

enable_directory_listing=yes

# Allow requests for listing served files/directories under the document root.

#

enable_proxy=no

# Enable the web service to act as a proxy server.

#

ssl.cert_authority=/ssl_nimsh/certs/root.pem

# Designates the file location of the certificate authority used for digital certificate signing.

#

ssl.pemfile=/ssl_nimsh/certs/server.pem

# Designates the file location of the PEM format file which contains both a certificate and private key.

#

I configured a new lpp_source resource (liveupdaterte) on the NIM master. I ensured that all the files for the lpp_source were in the correct location (i.e. /export/nim). This restriction will be lifted in the future, but during my testing the service required all files to be served from /export/nim on the master.

I enjoy it when I open my email in the morning and find a new message with a subject line of “weird one….”! I immediately prepare myself for whatever challenge awaits. Fortunately I do delight in helping others with their AIX challenges so I usually open these emails first and start to diagnose and troubleshoot the problem!

This week I was contacted by someone that was having a little trouble with a mksysb backup on one of their AIX systems.

“Hi Chris,

This one has me stumped, any ideas? I’ll have to log a call I think as I’m not sure why this is happening. I run a mksysb and it just backs up 4 files! I also can’t do an alt_disk_copy that also fails.

My /etc/exclude.rootvg is empty.

# cat /etc/exclude.rootvg
# mksysb -i /mksysb/aixlpar1-mksysb

Creating information file (/image.data) for rootvg.

Creating list of files to back up.

Backing up 4 files

4 of 4 files (100%)
0512-038 mksysb: Backup Completed Successfully.

# lsmksysb -f /mksysb/aixlpar1-mksysb
New volume on /mksysb/aixlpar1-mksysb:
Cluster size is 51200 bytes (100 blocks).
The volume number is 1.
The backup date is: Wed Oct 21 22:12:04 EST 2015
Files are backed up by name.
The user is root.
5911 ./bosinst.data
11 ./tmp/vgdata/rootvg/image.info
11837 ./image.data
270567 ./tmp/vgdata/rootvg/backup.data
The total size is 288326 bytes.
The number of archived files is 4.”

In my previous post on AIX Live Updates I discussed how to use the geninstall command to perform a non-disruptive (ifix) update on an AIX system. In this post I wanted to show you how to perform the same task using NIM.

NIM can be used to start an AIX Live Update operation on a target machine (NIM client) either from a NIM master or from the NIM client itself (with nimclient).

Note: The AIX Live Update operation started by NIM calls the hmcauth command during the cust operation so that the NIM client can authenticate with the HMC by using the HMC password file. The NIM master is responsible for obtaining password information from the HMC (using ssh). Without it, NIM clients will not have the password information necessary when running hmcauth as part of the NIM client operation. So, we must first define an hmc object in NIM and create the password file (used when accessing the console). Once this required step has been completed, all clients using NIM live_update have the ability to pass the proper HMC login credentials when configuring hmcauth.

First, I need to install the dsm.core fileset and configure SSH keys between the NIM master and the HMC.
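
With dsm.core in place, the HMC password file and the NIM hmc object can be created roughly as follows (the object name hsc02 matches the mgmt_profile shown below; the password file path is an assumption, and dpasswd will prompt for the HMC password):

# dpasswd -f /etc/ibm/sysmgt/dsm/config/hmc_passwd -U hscroot
# nim -o define -t hmc -a if1="find_net hsc02 0" -a passwd_file=/etc/ibm/sysmgt/dsm/config/hmc_passwd hsc02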

The NIM client must either be defined with, or updated to include, the Managed System name (Management Source) and LPAR ID number.

# smit nim_chmac

Change/Show Characteristics of a Machine

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

Machine Name [AIXmig]

* Hardware Platform Type [chrp] +

* Kernel to use for Network Boot [64] +

Machine Type standalone

Network Install Machine State currently running

Network Install Control State ready for a NIM operation

Primary Network Install Interface

Network Name net1

Host Name [AIXmig]

Network Adapter Hardware Address [0]

Network Adapter Logical Device Name [ent]

Cable Type N/A +

Network Speed Setting [] +

Network Duplex Setting [] +

IPL ROM Emulation Device [] +/

VLAN Tag Priority (0 to 7) [] #

VLAN Tag Identifier (0 to 4094) [] #

CPU Id [00F94F584C00]

Communication Protocol used by client [nimsh] +

NFS Client Reserved Ports [] +

Comments []

Managing System Information

LPAR Options

Identity [88]

Management Source [S824]

# lsnim -l AIXmig

AIXmig:

class = machines

type = standalone

connect = nimsh

platform = chrp

netboot_kernel = 64

if1 = net1 AIXmig 0

cable_type1 = N/A

mgmt_profile1 = hsc02 88 S824 <<< LPAR ID 88, Mgmt Src S824

Cstate = ready for a NIM operation

prev_state = ready for a NIM operation

Mstate = currently running

cpuid = 00F94F584C00

Cstate_result = success

I also need to configure an lpp_source for the ifix location and a Live Update data resource, both on the NIM master. The data file can reside on the NIM client if you wish, but I’ve chosen to manage all the resources on the NIM master.
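
Both resources can be defined on the master with something like the following (the locations match the lsnim output below):

# nim -o define -t lpp_source -a server=master -a location=/nim/lvup/ifix liveupdatefix
# nim -o define -t live_update_data -a server=master -a location=/nim/lvup/lvupdate.data liveupdate_AIXmig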

# lsnim -t lpp_source

lpp_sourceaix72 resources lpp_source

liveupdatefix resources lpp_source

# lsnim -l liveupdatefix

liveupdatefix:

class = resources

type = lpp_source

arch = power

Rstate = ready for use

prev_state = unavailable for use

location = /nim/lvup/ifix

alloc_count = 0

server = master

# ls -ltr /nim/lvup/ifix

total 72

-rw-r----- 1 root system 35625 Oct 15 14:50 dummy.150813.epkg.Z

# lsnim -t live_update_data

liveupdate_AIXmig resources live_update_data

# lsnim -l liveupdate_AIXmig

liveupdate_AIXmig:

class = resources

type = live_update_data

Rstate = ready for use

prev_state = unavailable for use

location = /nim/lvup/lvupdate.data

alloc_count = 0

server = master

# ls -ltr /nim/lvup/

total 16

drwxr-xr-x 2 root system 256 Oct 15 14:54 ifix

-r--r----- 1 root system 4289 Oct 15 15:04 lvupdate.data

# tail -20 /nim/lvup/lvupdate.data

# Users need not provide redundant options such as "-a -U -C and -o"

# in the trc_option field for trace stanza.

# Do not add a trace stanza to the lvupdate.data file unless you

# want the live update commands to be traced.

#

general:

mode = automated

kext_check = no

disks:

nhdisk = hdisk0

mhdisk = hdisk1

tohdisk =

tshdisk =

hmc:

lpar_id = 88

management_console = 10.1.50.30

user = hscroot

Now I can perform a preview of the live update operation, from the NIM master. The preview operation will be run on the NIM client called AIXmig.

If you want, you could initiate the live update from the NIM client using the nimclient command. All the resources reside on the NIM master, but the NIM client starts the operation, not the NIM master.
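
As a rough sketch (attribute names as I understand them from the AIX 7.2 NIM documentation; for the preview run mentioned above, the appropriate preview flag should also be supplied, so check the cust documentation for your level), the operation looks something like this from the master, with the nimclient equivalent simply dropping the target name:

# nim -o cust -a live_update=yes -a live_update_data=liveupdate_AIXmig -a lpp_source=liveupdatefix -a filesets=dummy.150813.epkg.Z AIXmig
# nimclient -o cust -a live_update=yes -a live_update_data=liveupdate_AIXmig -a lpp_source=liveupdatefix -a filesets=dummy.150813.epkg.Z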

As you are probably aware, in many cases customers have chosen to use IBM FlashSystem storage as the preferred read copy of an LVM mirror on AIX. This gives them enhanced I/O read performance and resiliency. An AIX administrator can add the hdisks to the VG, mirror the data and set the LV scheduling policy to parallel/sequential, indicating that writes will be done to both copies in parallel and reads will come from the first LVM copy, which should be on flash. This is discussed at length in section 5.5.2, Implementing preferred read, Preferred read with AIX, of the following Redbook.

I discovered today that both the mklv and chlv commands have recently been enhanced to support this configuration. Both now have an option for setting the preferred read to the required copy of a logical volume (usually on the flash disk).

From the man page for mklv and chlv, I found the following new information.

# man mklv

-R PreferredRead

Sets read preference to the copy of the logical volume. If the -R flag is specified and if the preferred copy is available, the read operation occurs from the preferred copy. If the preferred copy is not available, the read operations follow the scheduling policy of the logical volume. The PreferredRead variable can be set to a value in the range 0 -3. The default value is 0.

# man chlv

-R PreferredRead

Changes preferred read copy of the logical volume. Always reads from the preferred copy if the preferred copy is available. If the preferred copy is not in available then the reads follows the scheduling policy of the logical volume. The PreferredRead variable can be set to a value ranging from 0 to 3. Setting the PreferredRead variable to 0 disables the preferred read copy of the logical volume.

To my surprise, this is available in AIX 7.1 TL3 SP5.

# oslevel -s

7100-03-05-1524

Now I can specify the preferred copy during creation of the logical volume. And lslv will now show me the preferred read copy.
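
A quick sketch (the LV, VG and hdisk names here are illustrative only): create a two-copy LV with copy 1 preferred for reads, or change an existing LV, and then check it with lslv:

# mklv -y datalv -t jfs2 -c 2 -R 1 datavg 10 hdisk2 hdisk3
# chlv -R 1 datalv
# lslv datalv | grep -i prefer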

Previously, setting the preferred read to the flash storage often required you to mirror a volume group to the flash disks, un-mirror the volume group to make the flash disk the primary LV copy, and then mirror the volume group again so that the non-flash disks became the secondary LV copy. This is no longer necessary: with the new -R option to the chlv command, the preferred read copy can be changed as and when necessary.

Starting with AIX Version 7.2, the AIX operating system provides the AIX Live Update function which eliminates downtime associated with patching the AIX operating system. Previous releases of AIX required systems to be rebooted after an interim fix was applied to a running system. This new feature allows workloads to remain active during a Live Update operation and the operating system can use the interim fix immediately without needing to restart the entire system. In the first release of this feature, AIX Live Update will allow customers to install interim fixes (ifixes) only. Ultimately it may be possible to use this function to install AIX Service Packs (SPs) and Technology Levels (TLs) without a reboot.

IBM delivers kernel fixes in the form of ifixes to resolve issues that are reported by customers. If a fix changes the AIX kernel or loaded kernel extensions that cannot be unloaded, the host logical partition (LPAR) must be rebooted. To address this issue, AIX Version 7.1, and earlier, provided concurrent update-enabled ifixes that allowed deployment of some limited kernel fixes to a running LPAR. Unfortunately, not all ifixes could be delivered as “concurrent update-enabled”. The AIX Live Update solution is not constrained by the same limitations as concurrent update-enabled ifixes. The AIX 7.2 Live Update feature will allow customers to install ifixes without needing to reboot their AIX systems, avoiding downtime for their mission critical, production workloads.

This article (in the link below) will discuss the high-level concepts relating to AIX Live Updates and then provide a real example of how to use the tool to patch a live AIX system. I was fortunate enough to take part in an Early Ship Program (ESP) for AIX 7.2. During the ESP I had the opportunity to test the AIX Live Update feature. I’ll share my experience using this tool in the example that follows.

Here’s a tip I picked up from the IBM support team. I’ve not tested this myself but it looks sound. I will try to test this in my lab soon and report back.

“You can use the SEA poll_uplink method (requires VIOS 2.2.3.4). In this case the SEA can pass up the link status, so no "!REQD" style ping is required any more.

Yes, you can install VIOS 2.2.3.50 on top of 2.2.3.4.

At the moment I'm not aware any official documentation regarding how to configure SEA poll_uplink in PowerHA environment. I was in touch with Dino Quintero (editor of the PowerHA Redbooks) and his team will update the latest PowerHA Redbook with this information soon.

However, it's very easy to enable SEA poll_uplink in PowerHA. Configuration steps:

Enable poll_uplink on ent0 interface (run this command for all virtual interfaces on all nodes):
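
On the AIX client nodes, the attribute is typically set along these lines (the -P flag defers the change until the next reboot, as the adapter will be in use):

# chdev -l ent0 -a poll_uplink=yes -P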

IBM’s PowerVP tool became available in November 2013. It was designed to provide Power Systems administrators with performance information in an enhanced visual format. The aim was to accelerate the identification of performance bottlenecks so that performance analysts could make better decisions based on more detailed and comprehensive data from POWER7 (and POWER8) systems. PowerVP presents both System (frame) and Partition level views of performance data. This has not been possible in the past using any single tool. Administrators would typically need to use many different tools and interfaces to obtain a single, system-wide performance view across an entire CEC and to drill down to all individual partitions.

The tool was originally developed for IBM internal use only (known as Sleuth) which helped the IBM development team with rapid development of prototype technology and performance analysis. After a brief demonstration of the tool during an internal, invitation only, event for customers at IBM Austin, almost all of the customer attendees requested that the tool be made available for use outside of IBM.

In this post I will briefly discuss how to quickly install and configure PowerVP in an AIX environment. I will start by discussing how to install the PowerVP GUI on a Windows laptop and then cover how to install the PowerVP agents on an AIX and/or VIOS partition. I’ll then show you how to monitor your system and collect system-wide metrics for an entire frame by recording and playing back your PowerVP sessions.

To begin, let’s download, extract and install the latest version of PowerVP. Customers that are entitled (PowerVM Enterprise Edition customers) can download the PowerVP software directly from the IBM Entitled Systems Support (ESS) website.

You can be forgiven for thinking that the latest version of PowerVP isn’t available in ESS but if you look closely and expand everything out, you’ll find v1.1.3 is listed for download.

Once you’ve downloaded the software you’ll end up with a PowerVP package that is named something similar to ESD_-_PowerVP_Standard_Edition_v_1.1.3_62015.zip. Extract the zip file and you’ll discover the following directory structure:

To install the PowerVP GUI for Windows, simply run the PowerVP.exe from the Windows folder. Choose PowerVP Client GUI and click next.

When prompted, select to install Liberty for PowerVP as part of the GUI installation. The IBM PowerVP Redbook explains why:

“Starting with Version 1.1.3 PowerVP has a web based GUI. It is packaged in the Web Application Archive (WAR) format and it must be deployed onto an application server. By default, PowerVP GUI uses IBM WebSphere® Application Server Liberty Core. Liberty profile is a new server profile of IBM WebSphere Application Server V8.5. Liberty profile provides all features required to run the PowerVP, it is lightweight, has a small footprint and fast startup time. PowerVP and a configured Liberty profile are packaged into a compressed file. This provides for an easy and efficient distribution and a simplified installation procedure. Because the new PowerVP GUI is web based it is now possible that a single instance of this GUI be accessed by multiple users using web browsers. This eliminates the need to install a console for each PowerVP user and avoids the potential overhead generated by additional performance data requests initiated from multiple consoles. PowerVP users can connect to the web GUI using web browsers. Users must be able to connect to the ports on which the application server is listening. Default port numbers are 9080 for HTTP traffic and 9443 for HTTPS traffic. Port numbers can be changed during the installation process.”

The following diagram, also from the Redbook, provides a visual representation as to where Liberty fits in with the new GUI, the System and Partition level agents.

Once the GUI is installed on your Windows desktop, the next step is to extract the PowerVP agent for AIX/VIOS. To do this you, once again, run the PowerVP.exe installer. Select PowerVP Server Agents and click next.

Select AIX/VIOS and click next.

When prompted for a System Level Agent Hostname or IP Address, you can enter anything here as it is ignored. All we want to do here is extract the installation software, not connect to an agent. I entered localhost, even though this is not where the PowerVP agent will reside. Click next.

Once the install process is complete you’ll find the extracted AIX/VIOS installation filesets in the C:\Program Files (x86)\IBM\PowerVP\PowerVP_Installation\PowerVP_Agent_Installation_Instructions\AIX directory.

You can now transfer these files to your AIX and/or VIOS system of choice: essentially wherever you’d like to install and run the PowerVP server agent, and to any partitions you’d like to monitor with a partition level agent. Many customers have chosen to install the PowerVP System level agent on their VIOS. This seems like a logical place to install it, as these systems are typically always up and available. Ensure that you copy the powervp.1.1.30.bff fileset and the GSKit filesets to the destination system, as both are needed for installation. Of course, you should download and install the latest fixes for PowerVP from the IBM Fix Central site as well.

The agent installation on AIX is very simple. Make sure that your hardware and system firmware support PowerVP before you install it. The IBM Redbook, IBM PowerVP Introduction and Technical Overview REDP-5112-00, has a comprehensive list of supported systems and minimum requirements.

To install the agent on an AIX or VIOS partition, simply copy the filesets to the system and use installp to install both the GSKit and powervp filesets. The GSKit filesets are required for SSL support with PowerVP. Even if you don’t plan on using SSL with PowerVP, these filesets must be installed when powervp.rte is installed, regardless.
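
Assuming the filesets have been copied to /tmp/powervp (and, on a VIOS, that you have switched to the root shell with oem_setup_env first), the install is a one-liner:

# cd /tmp/powervp
# installp -acgXYd . all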

With the agent installed successfully, you can simply start the PowerVP agent. No further configuration is required at this point. The agent will run as a System level agent and allow you to connect to it with the PowerVP client GUI.

However, if you want to configure this agent as a Partition level agent, you need to run the PowerVP iconfig tool to point the Partition agent at an existing System level agent. For example, we could configure the newly installed agent to communicate with an existing System level agent at IP address 10.1.50.59. Then we start the agent using the SPowerVP script. We then confirm that the agent has registered with the System level agent by reviewing the output in the /var/log/powervp.log file on the client partition.

Now that the agent is installed we can connect to it with the PowerVP GUI. You can start the GUI by double-clicking on the PowerVP icon on your Windows desktop. This starts the Liberty server, opens your web browser and connects you to the PowerVP interface.

You should see the following messages as the PowerVP GUI server is started on your Windows desktop/laptop.

Note: Please ensure that Java is in your path on your Windows machine. If it is not, PowerVP will fail to start and you may be presented with an error stating that “javaw” cannot be found. You can check whether Java is in the path by opening a DOS prompt and entering a java command. For example:
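
Something as simple as asking for the version will confirm it:

C:\> java -version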

To connect to the System level agent, click on New Connection and enter the IP address or hostname of the partition running the System level agent, followed by the user name and password of the root user (or padmin if running the agent on a VIOS). Then click Connect.

You will be presented with the PowerVP main panel. From here you can start exploring each of the main views available, such as System Topology, Node Drill Down and Partition Drill Down.

The System Topology view shows the hardware topology of the system we are connected to in the current session. In this view, we can see the topology of a POWER8 S824 with two processor modules. We can see each node has two chips/sockets. We can also see numbers in the boxes which indicate how busy each of the chips is on the system. The lines between the nodes show the traffic on the SMP fabric between each node. If you select the Toggle Buses button, the PowerVP GUI will show lines between the processor module boxes and processor nodes which represent buses. The Toggle Affinity button is intended to show affinity, where every partition has a different colour.

The Node Drill Down view appears when you click on one of the nodes and allows you to see the resources being consumed by the partitions running on the system. In this view we can see this processor module has 12 cores/processors. We can also see lines showing the buses between the chips, along with the memory controllers and the PHB buses, which show traffic to and from our remote I/O. We can also see the connections to the other processor module; these are the SMP connections to other nodes, and they show traffic.

The Partition Drill Down view allows us to drill down on resources being used by a specific partition that we clicked on. This view opens in a new tab in our web browser. In this view, we see CPU, Memory, Disk Iops, and Ethernet being consumed. We can also get an idea of cache and memory affinity (under Detailed LSU Breakdown).

The main panel also provides you with a view of processor utilisation for each LPAR on the system. You can easily sort LPARs based on utilisation to quickly understand which LPARs are consuming the most (or least) CPU across a single system.

Overall system processor utilisation is available from the main panel also. This view provides a graph of total processor utilisation, over time, for the entire POWER system. Directly above this graph useful information is displayed for items such as clock frequency, total cores, platform (AIX, Linux, VIOS or IBM i), system model/serial number and sample rate.

One very useful feature of PowerVP is the ability to record and playback your PowerVP sessions. By clicking on the Start Recording button, PowerVP will start to record your session to your local machine (in my case, my Windows laptop). I can then load this recording at later date for playback inside the PowerVP GUI.

A new feature, in version 1.1.3, now allows you to run the VIOS advisor (part) directly from the PowerVP GUI. When you connect to a System level agent on a VIOS, you will be presented with the VIOS Performance Advisor panel in the GUI. You can configure PowerVP to run the VIOS advisor at a particular time or you can run it on demand. You can also retrieve previously created VIOS advisor reports from the GUI. This is a very nice feature.

When I clicked on the Run Advisor button I noticed, on my VIOS, that a new topas_nmon and part process were started. The process ran for 10 minutes (default) and then a new tar file was created in /opt/ibm/powervp/advisor.

A new tab was automatically opened in my web browser which showed me the VIOS advisor report for my VIOS. Impressive stuff!

Please refer to the PowerVP Redbook for more information on how to configure and use this option.

Several features and functions have changed since the last release of PowerVP. Here is the short list of important changes I’ve encountered so far:

The new PowerVP web interface does not support the older level of PowerVP agents. You will need to update both the System level and Partition level agents to v1.1.3 in order for the new version to function.

The old PowerVP GUI was previously installed on Windows as a PowerVP.exe application. This has been replaced by a launch-powervp.bat file (a shortcut will be created on your desktop). This starts the Liberty server for the GUI. You must select to install Liberty for this file to be installed. The following screenshot lists the contents of my PowerVP GUI installation directory on my Windows laptop.

I also came across this useful tip in the Redbook. PowerVP can record large amounts of data when recording is enabled. So you should make sure you have sufficient space available to store recorded data on your local machine. It is recommended to increase the sample rate from the default of 1 second to reduce the amount of data collected during recording. The sample rate can be changed by editing the /etc/opt/ibm/powervp/powervp.conf file and changing SampleInterval to a larger value. You need only change the sample interval on the System level agent (the Partition level agents pick up the sample interval from the System level agent). Once you’ve modified the powervp.conf file you must restart the PowerVP system level agent (syslet on AIX).

The aim of this post was to help you quickly install and configure PowerVP in your AIX environment. I encourage the reader to review the available PowerVP material from IBM, in particular the PowerVP Redbook, to learn more about the features and functions of the tool. This tool, finally, provides Power System administrators with a single method of obtaining some important performance data in their POWER7/POWER8 systems environment.

One of my customers was configuring a new AIX 5.3 Versioned WPAR when they came across a very interesting issue. I thought I’d share the experience here, just in case anyone else comes across the problem. We configured the VWPAR to host an old application. The setup was relatively straightforward: restore the AIX 5.3 mksysb into the VWPAR, export the data disk from the Global environment into the VWPAR, import the volume group and mount the file systems. Job done! However, we noticed some fairly poor performance during application load tests. After some investigation we discovered that disk I/O performance was worse in the VWPAR than on the source LPAR. The question was, why?

We initially suspected the customer’s SAN and/or the storage subsystem, but both of these came back clean with no errors or configuration issues. In the end, the problem was related to a lack of ODM attributes in the PdAt object class, which prevented the VWPAR disk from using the correct queue depth setting.

Let me explain by demonstrating the problem and the workaround.

First, let’s add a new disk to a VWPAR. This will be used for a data volume group and file system. The disk in question is hdisk3.

# uname -W

0

# lsdev -Cc disk

hdisk0 Available Virtual SCSI Disk Drive

hdisk1 Available Virtual SCSI Disk Drive

hdisk2 Defined Virtual SCSI Disk Drive

hdisk3 Available Virtual SCSI Disk Drive <<<<<<

We set the disk queue depth to an appropriate number, in this case 256.

Note: This value will differ depending on the storage subsystem type, so check with your storage team and/or vendor for the best setting for your environment.

# chdev -l hdisk3 -a queue_depth=256

hdisk3 changed

Using the lsattr command, we verify that the queue depth attribute is set correctly in both the ODM and the AIX kernel.

# lsattr -El hdisk3 -a queue_depth

queue_depth 256 Queue DEPTH True

# lsattr -Pl hdisk3 -a queue_depth

queue_depth 256 Queue DEPTH True

We can also use kdb to verify the setting in the kernel. Remember at this stage, we are concentrating on hdisk3, which is referenced with a specific kernel device address in kdb.
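
The check is the same one shown later in this post (the kernel device address below is the one from that later check; yours will differ):

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth
ushort queue_depth = 0x100;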

From the output above, we can see that the queue depth is set correctly, i.e. to 0x100 hex (256 decimal).

Next, we export hdisk3 to the VWPAR using the chwpar command. The disk, as expected, enters a Defined state in the Global environment. It is known as hdisk1 in the VWPAR.

# chwpar -D devname=hdisk3 p8wpar1

# lswpar -D p8wpar1 | head -2 ; lswpar -D p8wpar1 | grep hdisk

Name Device Name Type Virtual Device RootVG Status

-------------------------------------------------------------------

p8wpar1 hdisk3 disk hdisk1 no EXPORTED <<<<<<

p8wpar1 hdisk2 disk hdisk0 yes EXPORTED

[root@gibopvc1]/ # lsdev -Cc disk

hdisk0 Available Virtual SCSI Disk Drive

hdisk1 Available Virtual SCSI Disk Drive

hdisk2 Defined Virtual SCSI Disk Drive

hdisk3 Defined Virtual SCSI Disk Drive

In the VWPAR, we run cfgmgr to discover the disk. We create a new data volume group (datavg) and file system (datafs) for application use (note: the steps to create the VG and FS are not shown below). This is for demonstration purposes only; the customer imported the data volume groups on their system.
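
For illustration only (the customer's exact commands weren't captured), creating a small test volume group and file system inside the VWPAR might look something like this:

# mkvg -y datavg hdisk1
# crfs -v jfs2 -g datavg -m /datafs -A yes -a size=1G
# mount /datafs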

We perform a very simple I/O test in the /datafs file system. We write/create a 1GB file and time the execution. We noticed immediately that the task took longer than expected.

# cd /datafs

# time lmktemp Afile 1024M

Afile

real 0m7.22s <<<<<<<<<<<<<<< SLOW?

user 0m0.04s

sys 0m1.36s

We ran the iostat command, from the Global environment, and noticed that “serv qfull” was constantly non-zero (very large numbers) for hdisk3. Essentially the hdisk queue was full all the time. This was bad and unexpected, given the queue depth setting of 256!
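
For reference, the extended per-disk statistics (including the queue full counts) can be watched with something like:

# iostat -D hdisk3 5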

Now comes the interesting part. With a little help from our friends in IBM support, using kdb we found that the queue depth was reported as being set to 1 in the kernel and not 256! You’ll also notice here that the hdisk name has changed from hdisk3 to hdisk1. This happened as a result of exporting hdisk3 to the VWPAR. The disk is known as hdisk1 in the VWPAR (not hdisk3) but the kernel address is the same.

Fortunately, IBM support was able to provide us with a workaround. The first step was to add the missing vparent PdAt entry to the ODM in the Global environment.

# cat addodm_pdat_for_vparent.txt

PdAt:

uniquetype = "wio/common/vparent"

attribute = "naca_1_spt"

deflt = "1"

values = "1"

width = ""

type = "R"

generic = ""

rep = "n"

nls_index = 0

# odmadd addodm_pdat_for_vparent.txt

# odmget PdAt | grep -p "wio/common/vparent"

PdAt:

uniquetype = "wio/common/vparent"

attribute = "naca_1_spt"

deflt = "1"

values = "1"

width = ""

type = "R"

generic = ""

rep = "n"

nls_index = 0

We did the same in the VWPAR.

# clogin p8wpar1

# uname -W

11

# odmget PdAt | grep -p "wio/common/vparent"

#

# odmadd addodm_pdat_for_vparent.txt

# odmget PdAt | grep -p "wio/common/vparent"

PdAt:

uniquetype = "wio/common/vparent"

attribute = "naca_1_spt"

deflt = "1"

values = "1"

width = ""

type = "R"

generic = ""

rep = "n"

nls_index = 0

In the VWPAR, we removed the hdisk and then discovered it again, ensuring that the queue depth attribute was set to 256 in the ODM.

# uname -W

11

# rmdev -dl hdisk1

hdisk1 deleted

# cfgmgr

# lspv

hdisk0 00f94f58a0b98ca2 rootvg active

hdisk1 none None

# lsattr -El hdisk1 -a queue_depth

queue_depth 256 Queue DEPTH True

# odmget CuAt | grep -p queue

CuAt:

name = "hdisk1"

attribute = "queue_depth"

value = "256"

type = "R"

generic = "UD"

rep = "nr"

nls_index = 12

Back in the Global environment we checked that the queue depth was set correctly in the kernel. And it was!

# uname -W

0

# echo scsidisk 0xF1000A01C014C000 | kdb | grep queue_depth

ushort queue_depth = 0x100;

We re-ran the simple I/O test and immediately found that the test ran faster and the hdisk queue (for hdisk3, as shown by iostat from the Global environment) was no longer full. Subsequent application load tests showed much better performance.

I was contacted recently by a customer who was attempting to restore an AIX 5.3 Versioned WPAR (VWPAR) from backup using NIM. The restore worked OK but the data was restored to the wrong volume group!

When the VWPAR was created, the -g option was specified with mkwpar to force the creation of the VWPAR file systems in a separate volume group (named wparvg) rather than the default location of the Global root volume group (rootvg).

# mkwpar -g wparvg -n p8vw2 -B /cg/53gibbo.mksysb -C -O

Running lsvg against wparvg confirmed the file systems were in the right location after creation.

# lsvg -l wparvg

wparvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

loglv00 jfs2log 1 1 1 open/syncd N/A

fslv02 jfs2 4 4 1 open/syncd /wpars/p8vw2

fslv03 jfs2 2 2 1 open/syncd /wpars/p8vw2/home

fslv04 jfs2 12 12 1 open/syncd /wpars/p8vw2/opt

fslv05 jfs2 6 6 1 open/syncd /wpars/p8vw2/tmp

fslv06 jfs2 56 56 1 open/syncd /wpars/p8vw2/usr

fslv07 jfs2 12 12 1 open/syncd /wpars/p8vw2/var

Before handing the VWPAR over for production use, the customer wanted to ensure they could successfully backup and recover the VWPAR using NIM. First they took a backup of the VWPAR using NIM. From the NIM master, they created a “savewpar backup image”, as shown below.
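
The equivalent command-line definition on the NIM master would be roughly as follows (the image location is an assumption; p8vw2-backup is the resource name used in the restore below):

# nim -o define -t savewpar -a server=master -a location=/export/nim/p8vw2-backup -a source=p8vw2 -a mk_image=yes p8vw2-backup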

In the Global environment, we then stopped and removed the VWPAR (p8vw2).

# stopwpar -F p8vw2

# rmwpar -F p8vw2

Back on the NIM master, we attempted to restore the VWPAR from the recently created backup image (p8vw2-backup).

# smit nim_wpar_create

p8vw2

Create a Managed Workload Partition

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

* Target Name [p8vw2]

Remain NIM client after install? [yes] +

Specification Resource [] +

WPAR Options

WPAR Name p8vw2

Resource for WPAR Backup Image [p8vw2-backup] +

Resource for System Backup Image [] +

Alternate DEVEXPORTS for installation [] +

Alternate SECATTRS for installation [] +

The restore completed successfully but to our surprise, the VWPAR file systems were in the Global rootvg not wparvg.

# lsvg -l rootvg

rootvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

hd5 boot 1 1 1 closed/syncd N/A

hd6 paging 8 8 1 open/syncd N/A

hd8 jfs2log 1 1 1 open/syncd N/A

hd4 jfs2 6 6 1 open/syncd /

hd2 jfs2 35 35 1 open/syncd /usr

hd9var jfs2 7 7 1 open/syncd /var

hd3 jfs2 2 2 1 open/syncd /tmp

hd1 jfs2 1 1 1 open/syncd /home

hd10opt jfs2 5 5 1 open/syncd /opt

hd11admin jfs2 2 2 1 open/syncd /admin

lg_dumplv sysdump 16 16 1 closed/syncd N/A

livedump jfs2 4 4 1 open/syncd /var/adm/ras/livedump

cglv jfs2 100 100 1 open/syncd /cg

fslv02 jfs2 2 2 1 open/syncd /wpars/p8vw2

fslv03 jfs2 1 1 1 open/syncd /wpars/p8vw2/home

fslv04 jfs2 6 6 1 open/syncd /wpars/p8vw2/opt

fslv05 jfs2 3 3 1 open/syncd /wpars/p8vw2/tmp

fslv06 jfs2 28 28 1 open/syncd /wpars/p8vw2/usr

fslv07 jfs2 6 6 1 open/syncd /wpars/p8vw2/var

# lsvg -l wparvg

wparvg:

LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT

loglv00 jfs2log 1 1 1 closed/syncd N/A

We attempted the restore again but this time we explicitly included a WPAR “Specification Resource”. We did this to ensure that the restwpar process was using the correct specification file.

# smit nim_wpar_create

p8vw2

Create a Managed Workload Partition

Type or select values in entry fields.

Press Enter AFTER making all desired changes.

[Entry Fields]

* Target Name [p8vw2]

Remain NIM client after install? [yes] +

Specification Resource [p8vw2-spec] +

WPAR Options

WPAR Name p8vw2

Resource for WPAR Backup Image [p8vw2-backup] +

Resource for System Backup Image [] +

Alternate DEVEXPORTS for installation [] +

Alternate SECATTRS for installation [] +

We created a WPAR Specification NIM resource. The file was created in the Global environment, using the mkwpar command to write out the VWPAR specification details to a text file. This file was then copied to the NIM master to be used to create the NIM resource.

# mkwpar -e p8vw2 -w -o /tmp/cg/p8vw2_cg.cf
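
On the NIM master the copied file is then defined as a wpar_spec resource, roughly as follows (matching the object shown below):

# nim -o define -t wpar_spec -a server=master -a location=/tmp/cg/p8vw2_cg.cf p8vw2-spec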

# lsnim -l p8vw2-spec

p8vw2-spec:

class = resources

type = wpar_spec

Rstate = ready for use

prev_state = unavailable for use

location = /tmp/cg/p8vw2_cg.cf

alloc_count = 1

server = master

# lsnim -t wpar_spec

p8vw2-spec resources wpar_spec

The specification file contained the volume group name (wparvg) in which each of the VWPAR file systems was located.

# grep vg /tmp/cg/p8vw2_cg.cf

rootvgwpar = "no"

vg = "wparvg"

vg = "wparvg"

vg = "wparvg"

vg = "wparvg"

vg = "wparvg"

vg = "wparvg"

# grep -p vg /tmp/cg/p8vw2_cg.cf

general:

version = "1"

name = "p8vw2"

hostname = "p8vw2"

checkpointable = "no"

directory = "/wpars/p8vw2"

privateusr = "yes"

uuid = "3e7a2bfb-6060-4770-ad7e-4d6b2a84f657"

devices = "/etc/wpars/devexports"

architecture = "none"

ostype = "1024"

xwparipc = "no"

auto = "no"

rootvgwpar = "no"

preserve = "no"

routing = "no"

mount:

logname = "/dev/loglv00"

directory = "/home"

vfs = "jfs2"

vg = "wparvg"

size = "131072"

mount:

logname = "/dev/loglv00"

mountopts = "rw"

directory = "/opt"

vfs = "jfs2"

vg = "wparvg"

size = "786432"

mount:

logname = "/dev/loglv00"

directory = "/var"

vfs = "jfs2"

vg = "wparvg"

size = "786432"

mount:

logname = "/dev/loglv00"

directory = "/tmp"

vfs = "jfs2"

vg = "wparvg"

size = "393216"

mount:

logname = "/dev/loglv00"

directory = "/"

vfs = "jfs2"

vg = "wparvg"

size = "262144"

mount:

logname = "/dev/loglv00"

mountopts = "rw"

directory = "/usr"

vfs = "jfs2"

vg = "wparvg"

size = "3670016"

However, even with the specification file in place, the result was the same and the VWPAR file systems were created in rootvg rather than wparvg.

Note: Both the Global environment and the NIM master were running AIX 7100-03-04-1441.

We were able to request an ifix from AIX support. Once we installed the ifix in the Global environment, the restore process via NIM worked as expected and the VWPAR file systems were recovered in wparvg. We did not need to use the WPAR specification NIM resource.

After updating my AIX 7.1 TL3 system to service pack 4, I noticed that each time I started a new ssh session with this system, there was a noticeable delay before the login prompt was displayed. I initially thought there was a network or host name resolution (DNS) problem, but after thoroughly checking related files, such as /etc/hosts, /etc/resolv.conf and /etc/netsvc.conf, I started looking for a problem elsewhere.

I used truss to assist me in my investigation. I found that each time an ssh client connected to the sshd daemon, sshd would attempt to access a device named /dev/pkcs11. Each time this happened (once per login) there was a significant delay/pause before the ssh session continued to the login prompt. I also noticed that prior to applying SP4, this delay wasn’t present.

I ran truss with the following options (the -d flag provided me with a timestamp for each line of output, which helped me detect the delay!). Immediately after the pkcs11 device was opened, there was a several-second delay before the process continued.
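
A typical invocation with those options might look like this (the -d flag gives the timestamps mentioned above; -f follows forked children, -o writes the trace to a file and -p attaches to the running sshd, whose PID is shown by lssrc -s sshd):

# truss -d -f -o /tmp/sshd.truss -p <sshd_PID>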

I decided to un-install the security.pkcs11 fileset. This solved the issue and my ssh sessions started quickly with the login prompt appearing instantly again.

# installp -u security.pkcs11 -g

...

# time ssh lpar9 date

Wed Jan 14 15:13:56 2015

real 0m0.43s

user 0m0.02s

sys 0m0.01s

I’m still not sure what caused this problem. Prior to SP4, I did not encounter this issue with the security.pkcs11 fileset installed, so I can only assume that there may be some issue with this fileset at the 7.1.3.15 level. Here’s the truss output from a system running a lower level of security.pkcs11 (no delay).

0.1248: kopen("/dev/pkcs11", O_RDONLY) = 3

0.1320: kioctl(3, 2, 0x2FF21848, 0x00000000) = 0

I also found some advice that suggested placing ‘UsePKCS no’ in the /etc/ssh/sshd_config file, but this did not help me in the tests that I conducted.

It was safe for me to remove this fileset as I was not using it for any purpose. Typically, this fileset is required when using special crypto cards in POWER servers.

I’ve been working with a customer recently that required a special kind of disaster recovery capability for their shared storage pool (SSP) environment. The customer had implemented a private cloud solution in their existing POWER environment. The solution consisted of an IBM SmartCloud deployment with IBM Systems Director and VMControl managing the PowerVM deployment of virtual machines (AIX partitions) across the landscape.

The decision to use shared storage pools was driven by the fact that the customer was using SAN storage that could not be managed by VMControl. However, shared storage pool support was available with VMControl. This meant that the customer could continue using their non-IBM disk in their private cloud. The customer quickly noticed the many advantages that SSPs bring to a POWER cloud environment, such as very fast virtual machine deployment using snapshots and linked-clones.

Traditionally, the customer’s disaster (data) recovery process relied on SAN disk replication between its production and disaster recovery sites. In the event of a disaster the customer could recover a production system using a replicated copy of the SAN disks at the disaster recovery site. This also allowed the customer to perform regular disaster recovery tests. In these tests, the disaster recovery site is “fenced” from the real production network and SAN. The customer is then able to “bring up” a complete copy of any or all (replicated) production systems. Because these “copies” of production are fenced off from the real production environment the customer can undertake lengthy disaster recovery test programs without impacting the running production systems.

The customer had used this process for many, many years. They were happy with it and the flexibility it gave them. They wanted to continue using this process for the production SSP based systems.

The customer wanted the ability to recover a (storage) replicated copy of their entire shared storage pool at their disaster recovery (DR) site. We discussed the use of SSP failure group mirroring to cater for their DR requirements of their production AIX partitions. The following diagram shows our initial proposal of SSP mirroring across the production and DR sites.

However, this did not meet their needs. They needed a way to recover their AIX production partitions in isolation from their real production environment. SSP mirroring would not provide them with this flexibility. Essentially, the customer could not start a “copy” of any or all production partitions at the DR site. In an SSP failure group setup, the SSP spans both the production and DR sites (as shown in the previous diagram), meaning the SSP is in production across both sites. SSP mirroring would protect the customer in the event that the production site failed; they could start their production systems at the DR site, but they could not start isolated copies of the production systems in the same SSP. This meant SSP mirroring would break their current DR processes and provide reduced capability compared to their traditional DR method.

Prior to SSP mirroring, one of the major drawbacks of SSPs was the lack of resilience at the storage layer. We were not able to mirror the disks in the pool from one storage subsystem to another (either local or remote). Having a single disk storage subsystem in a shared storage pool was a rather large single point of failure (SPOF). Starting with VIOS 2.2.3.0, we were given the capability to mirror the disks in a SSP from one storage system to another. All of the mirroring was configured and managed from the VIOS in the SSP cluster. The VIO client partitions are not aware that their disks are ‘mirror protected’ in the SSP. There are no changes required in the client partitions as the SSP mirroring feature is completely transparent to them. This feature enabled SSPs to be considered ready for production use.

Another consideration we discussed with the customer regarding SSP mirroring is that the disks in the SSP failure group need to be visible to all the SSP nodes at both the production and DR sites. This is not a viable option in the case of long distances between sites (more than 100KM apart). SSP mirroring uses synchronous writes, which means that any write at the production site must also be completed synchronously on the failure group disks at the DR site before the I/O completes. This introduces a delay in completing I/O, which will have an impact on performance. Fortunately the customer’s two sites were well under 100KM apart, so this consideration did not apply to them, but I mention it simply to make others aware of this point.

After much discussion with our SSP development team at IBM, we were able to provide the customer with an acceptable method of recovering their SSP environment at their DR site using a replica of the storage. What follows is a preview of this new recovery mechanism. This feature is not currently available as part of the standard SSP capability with VIOS 2.2.3.X. It is hoped that we will see this feature officially offered and supported at some stage in 2015. Of course, this is not an announcement and plans are always subject to change.

The following diagram provides a pictorial view of what the customer needed for DR with their SSP systems. As you can see in the diagram, there is a production site and a DR site. At the DR site there is a single SSP (with two VIOS in the cluster). The production and DR VIOS are in two separate shared storage pools.

The SSP development team provided a special ifix for the customer to install on their DR virtual I/O servers (VIOS). They also provided a script and procedure for the customer to follow. We successfully tested this procedure at the customer’s DR site. This was a huge success from the customer’s perspective, as they were able to recover their entire production SSP in a few minutes. They could then selectively start/stop clones of the production AIX partitions at their DR site, without impacting the “real” production partitions at the production site. They were also able to continue using their existing storage replication method (as they had always done for all of their systems) in a DR situation. Not having to dramatically change their existing DR processes and procedures was very important to the customer.

I’ll walk through the procedure now and share some insights (from my lab tests) as we go. It goes without saying that, before we started, the customer ensured that all the production SSP disks (LUNs) were being replicated to their DR site using their standard storage replication facility. The SSP repository disk was not replicated to the DR site; a new disk was used as the repository (repos) disk at the DR site.

The process (described in detail below) consists of a) taking a backup of the production SSP cluster configuration, b) running a “customise” script to modify the saved backup file to replace the production VIOS and hdisk names with the DR VIOS and hdisk names, c) restoring the modified VIOS backup on the DR VIOS at the DR site and finally d) verifying that the cluster has been recovered using the replicated storage at DR.

--

The first step was to take a backup of the production SSP VIOS cluster using the viosbr command. At the primary (production) site we used viosbr to create a backup of the cluster configuration. This generated a backup file named /home/padmin/cfgbackups/sspbackup.mycluster.tar.gz.

$ viosbr -backup -clustername mycluster -file sspbackup

The viosbr backup file was then transferred to one of the DR VIOS. On the selected DR VIOS we copy the backup file (sspbackup.mycluster.tar.gz) to the /home/padmin/backups directory.

We create a new file called nodelist in the backups directory. This file contains a list of the DR VIOS hostnames. These hostnames are used to create the replicated cluster at the remote site.

sspnode1.abc.com
sspnode2.abc.com

Also in the backups directory, we create another new file called disklist. This file contains a list of each of the replicated SSP disks with their unique_ids. These disks are the replicated copies of the SSP pool disks from the primary site. All these disks must be accessible on all DR VIOS nodes.

The next step is to copy the necessary ifix file to the backups directory on the selected DR VIOS. We untar this file, which extracts two files. One file is the customize script and the other is the actual ifix we need to install on both VIOS.

./customize_ssp_backup
./ifix/sspDR_1.140627.epkg.Z

We install the ifix on all the DR VIOS that are going to be part of the SSP setup.

$ updateios -install -dev /home/padmin/backups/ifix -accept

We then run the script to customize the backup file for the new setup at the DR site, and restore the modified backup to recreate the cluster at the DR site.
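
The exact invocation is documented in the procedure supplied with the ifix, so treat the following as a rough sketch only; the script arguments shown here are hypothetical, and the restore uses the standard viosbr cluster restore syntax with the new DR repository disk supplied via -repopvs.

$ ./customize_ssp_backup -f sspbackup.mycluster.tar.gz -n nodelist -d disklist

$ viosbr -restore -clustername mycluster -file /home/padmin/backups/sspbackup.mycluster.tar.gz -repopvs hdiskX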

We can now verify that the new cluster has been configured and recovered correctly. We use the cluster and lu commands to check that the cluster is up and that the logical units are available in the shared storage pool.

$ cluster -status -clustername mycluster

$ lu -list

The SSP is now ready for use at the DR site.

--

Before we tested this procedure in the customer’s environment, I tried it in my lab first. My lab had the following configuration:

1 x POWER7 750 with a single VIOS and a Shared Storage Pool. I considered this my “production” SSP, which I wanted to recover on another POWER system in my lab.

1 x POWER7 PS701 Blade with a single VIOS. No SSP. I treated this system as my “DR” environment on which I wanted to recover a complete copy of my “production” SSP.

My “production” SSP was configured with V7000 storage. I asked my storage administrator to create a copy (via flashcopy) of the nominated SSP disks from my 750 VIOS cluster. He then presented these new (flashcopied) LUNs to the “DR” VIOS on the PS701.

I then ran through the recovery steps (as outlined above). I started by taking a backup of the VIOS on the 750 using the viosbr command.

750vio1 (“production”)

-------------------------------

$ viosbr -backup -clustername VIOCLUSTER -file sspbackup

Backup of this node (750vio1) successful

I discovered that an SSP was already configured on my blade VIOS. I would need to delete this SSP cluster before I could recover the “production” cluster backup on this system. I obtained a list of the current disks in the SSP.

ps701-vios (“DR”)

-----------------------

$ pv -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

FG_NAME: Default

PV_NAME SIZE(MB) STATE UDID

hdisk2 51200 ONLINE 3321360050768019E027CC000000000000~

hdisk3 51200 ONLINE 3321360050768019E027CC000000000000~

I also took note of the existing repository disk.

$ lspv | grep caa

hdisk1 000a366a031ffa37 caavg_private active

I removed any existing client VSCSI mappings for the SSP disks so that I could delete the logical units from the SSP and then delete the cluster.
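
I won’t list every command here, but the clean-up on the VIOS is along these lines (the VTD and LU names below are examples only): list the mappings with lsmap, remove the virtual target devices, remove the logical units and then delete the cluster.

$ lsmap -all

$ rmvdev -vtd vtscsi0

$ lu -remove -lu oldlu1

$ cluster -delete -clustername VIOSSP

Deleting the cluster produced the following output: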

If another cluster is using this disk, that cluster will be destroyed.

Are you sure? (y/[n]) y

WARNING: Force continue.

ERROR: import caavg_private failed.

remove_cluster_repository: Force continue.

ERROR: Cannot varyonvg caavg_private. It does not exist.

ERROR: rc=1, remove caavg_private failed.

remove_cluster_repository: Force continue.

rmcluster: Successfully removed hdisk1.

I entered the oem_setup_env shell and ran cfgmgr to bring in the new (flashed) disks on the “DR” VIOS. Two new disks were configured. I would use these disks with the SSP recovery script and disklist file.

$ oem_setup_env

# cfgmgr

#

# lspv

hdisk0 000a366a68640e58 rootvg active

hdisk1 000a366a031ffa37 None

hdisk2 none None

hdisk3 none None

hdisk4 none None

hdisk5 none None

hdisk6 none None

hdisk7 none None

hdisk8 none None

hdisk9 none None

hdisk10 none None

hdisk11 000a366a6b0d0d56 None

hdisk12 000a366a6b0d0d56 None

hdisk13 000a366a6b0d0d56 None

hdisk14 none None <<<<<<<<

hdisk15 00f603cde9a7b15a None <<<<<<<<

#

I took note of the hdisks and the unique_ids and placed this information in a new file named disklist.
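
The file is essentially one replicated disk per line together with its unique_id; something along these lines (placeholders shown here rather than the full IDs). On AIX, lspv -u from within oem_setup_env is a quick way to list each hdisk with its unique identifier.

hdisk14 <unique_id_of_hdisk14>
hdisk15 <unique_id_of_hdisk15>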

After the successful restore, I was able to display the cluster status and list the logical units in the recovered SSP.

$ cluster -list

CLUSTER_NAME: VIOCLUSTER

CLUSTER_ID: 9fa498e4896611e38d72f603cdcd9c55

$ lu -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

LU_NAME SIZE(MB) UNUSED(MB) UDID

sspdisk1_750lpar11 10240 7251 4f42b83c0ed7e826a4784b204b1c81ea

SNAPSHOTS

sspdisk1_750lpar11_snap1

$ cluster -status -clustername VIOCLUSTER

Cluster Name State

VIOCLUSTER OK

Node Name MTM Partition Num State Pool State

ps701-vios 8406-71Y0310A366A 1 OK OK

$

As expected, both of the “flashed” disks were now assigned to the recovered disk pool.

$ pv -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

FG_NAME: Default

PV_NAME SIZE(MB) STATE UDID

hdisk15 51200 ONLINE 3321360050768019E027CC000000000000~

hdisk14 51200 ONLINE 3321360050768019E027CC000000000000~

The existing hdisk1 had been successfully re-used for the repository disk of the “new” cluster.

$ lspv | grep caa

hdisk1 000a366a031ffa37 caavg_private active

$

The VIOS error report on the “DR” VIOS also confirmed that the SSP cluster was recovered and the services started correctly.

$ errlog

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

228C1EBE 0707110114 I S POOL Cluster state change has occurred

228C1EBE 0707110014 I S POOL Cluster state change has occurred

EDFF8E9B 0707110014 I O StorageRM IBM.StorageRM daemon has started.

3B16518D 0707110014 I S ConfigRM The node is online in the domain indicat

4BDDFBCC 0707110014 I S ConfigRM The operational quorum state of the acti

AFA89905 0707110014 I O cthags Group Services daemon started

$

At this point I was able to map the LU to my VIO client partition and boot it on the PS701. AIX started without an issue and I now had a complete clone of my existing AIX partition (from the 750) running on my POWER blade.
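
For reference, mapping an existing SSP logical unit to a client’s vhost adapter is done with the mkbdsp command; something like the following (vhost0 is just an example adapter name here):

$ mkbdsp -clustername VIOCLUSTER -sp VIOSSP -bd sspdisk1_750lpar11 -vadapter vhost0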

Shared storage pools already provide some fantastic capabilities that are attractive to most POWER customers. Features such as SSP mirroring, snapshots and linked-clones provide some very powerful and flexible configurations that will accelerate virtual machine deployment and management on IBM Power Systems.

I believe the enhancement I’ve discussed here will prove very popular with many of our enterprise AIX and POWER customers. It will provide one more compelling reason for customers to consider shared storage pools in their environments, particularly for disaster recovery situations. There may even be the capability (in the future) to completely automate the entire procedure that I’ve outlined in this article. I hope that I can provide further updates on this capability in early 2015.

It’s possible, with PowerVC, to manage existing AIX and Linux partitions in your environment. This may be useful for customers that have recently installed PowerVC and have existing AIX partitions. The question that is often asked is “Can I manage these existing partitions with PowerVC?”. These existing partitions were deployed outside of PowerVC, often long before PowerVC was even available as a product from IBM. Fortunately, the answer to this question is most likely yes!

Before you add an existing virtual machine to PowerVC, ensure that it meets PowerVC’s requirements for managed virtual machines (refer to the PowerVC documentation for the full list of prerequisites).

Here’s an example of managing an existing partition with PowerVC. Click on the Hosts icon from the PowerVC home screen then click on the “Manage Existing Virtual Machines” icon.

You are presented with a drop-down list of all the (Power Systems) Hosts known to PowerVC. Select the appropriate host where the existing partition currently resides.

Next you’ll be asked if you want to manage any supported virtual machines that are not currently being managed by PowerVC. “Supported” (with PowerVC Standard Edition v1.2.1) means any virtual machine that uses virtual Fibre Channel adapters attached to supported SAN and storage devices (typically Brocade switches and V7000) and/or virtual SCSI adapters backed by disk in a VIOS shared storage pool that is already managed by PowerVC.

Or you can select a specific virtual machine (or machines) to manage. I’ve selected one specific partition I wish to manage with PowerVC. It is an AIX partition that is connected to a shared storage pool (SSP). The SSP is already managed by PowerVC but the AIX partition is not.

A list of virtual machines (partitions) will be displayed next. Select the partition you’d like to manage with PowerVC and click on Manage.

Under the Virtual Machines view you’ll see the partition that you selected. It will appear in an Active and Pending state.

After a minute or two, the partition will change to an Active and OK state. The virtual machine is now under the management of PowerVC. You can now stop, start, restart, resize, migrate, attach volumes to, capture and/or delete the virtual machine through PowerVC.

Just as a point of interest, you can also access the Manage Existing option from the Virtual Machines view in PowerVC.

If you need to remove a partition (virtual machine) from PowerVC management (without deleting the partition) you can do so using the following procedure. From the Hosts view, select the desired Host and under Virtual Machines, select the partition you want to un-manage.

PowerVC will prompt you to confirm the removal of the virtual machine from PowerVC management. This does not delete the virtual machine; it will continue to run without disruption.

If you run snap while a system dump is present on the dump device (logical volume), the dump will be collected and included in the snap data file. If there is a valid dump to collect you’ll see the message highlighted below in the snap output.

If you are on a version of AIX that doesn’t have the snap -Z flag (such as AIX 5.3), the alternative way to capture snap data without including the dump is to run snap -a; when that has finished, remove the /tmp/ibmsupt/dump directory and then run snap -c to create the snap pax file.
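
In other words, on AIX 5.3 the sequence looks something like this:

# snap -a

# rm -rf /tmp/ibmsupt/dump

# snap -c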

Starting with HMC V8R8.1.0.1 there’s a new, enhanced HMC interface available. You are given the choice of using either the Classic or Enhanced interface when you login to the HMC.

If you select ‘Enhanced’ you may find that the traditional DLPAR menu (Dynamic partitioning) has seemingly disappeared! You may be expecting the “classic” DLPAR menu, as shown in the following image.

CLASSIC DLPAR

You will not find the ‘Dynamic partitioning’ option or interface. Instead, in the Enhanced interface, you must select ‘Manage’ (as shown in the following image).

ENHANCED DLPAR

You are presented with a very different DLPAR interface. However, you can perform normal DLPAR operations as you have in the past. In the example (shown below) we can immediately see that the RMC connection to the partition is active and the partition is running. Here we could change the processor settings for the partition. For example, I could change the partition from Uncapped to Capped and then simply select OK or Apply to make this change dynamically.

Learn about the differences between the Classic and Enhanced graphical user interface (GUI) in the Hardware Management Console (HMC).

You can collect this data by running Java and specifying the powervp.jar file, as shown below. You need to specify the hostname, username and password for the host where the system level agent resides. In the following example the hostname/IP address is 10.1.1.99 and the username/password is root and mypass1. I found the PowerVP JAR file in the default PowerVP install directory, which (on AIX) is usually /IBM/PowerVP/PowerVP_GUI_Installation/PowerVP/.
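
The invocation looks something like this (I may not have the argument order exactly right; check the PowerVP documentation for the precise syntax):

# java -jar /IBM/PowerVP/PowerVP_GUI_Installation/PowerVP/powervp.jar 10.1.1.99 root mypass1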

This got me thinking. Perhaps I could write a small script to wrap all this up and then schedule it from cron to collect data on a regular basis?

I wrote the beginnings of a basic expect script (shown below) which would allow me to run the script for a specified amount of time (in seconds) and pass the hostname, username and password from the AIX command line. I guess this would work fine from Linux as well?

I tested this in my lab. It worked as advertised. I simply ran the mkwpar command to restore an AIX 5.3 TL12 SP9 image and several minutes later I had an AIX 5.3 vWPAR up and running on my POWER8 system (S824).
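
As a rough sketch, creating a versioned 5.3 WPAR from a mksysb image looks something like this (the WPAR name and image path below are examples only):

# mkwpar -n wpar53 -C -B /export/mksysb/aix53_tl12sp9.mksysb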

I like using the IBM Microcode Discovery Service (MDS) to perform spot checks on POWER system and adapter firmware levels. This tool gives me a snapshot of a POWER system’s current firmware levels. I can tell almost immediately if I need to update the firmware on the system or any device installed in the server. This report is especially useful when performing a health check on a customer’s system.

The tool is easy to use. First you need to download the latest inventory scout catalog file from the IBM website. The most current microcode catalog can be downloaded here:

Once downloaded, copy the catalog file to an AIX (or VIOS) partition on the physical server. In most cases, a Virtual I/O Server (VIOS) is the most logical place to copy this file. The VIOS typically owns all the physical adapter devices on the system.

# scp catalog.mic padmin@vio1:

padmin@vio1's password:

catalog.mic 100% 263KB 263.2KB/s 00:00

Next, login to the VIOS and move the new catalog file into the correct location for use with the inventory scout (invscout) tool.

# ssh padmin@vio1

padmin@vio1's password:

$ ls -ltr

total 968

drwxr-xr-x 3 padmin staff 256 May 17 21:20 tivoli

-rw-r--r-- 1 padmin staff 2389 Aug 19 04:43 smit.transaction

-rw-r--r-- 1 padmin staff 1290 Aug 19 04:43 smit.script

-rw-r--r-- 1 padmin staff 11247 Aug 27 11:57 smit.log

drwxr-xr-x 2 padmin staff 4096 Sep 15 15:26 SV810_081

-rw-r--r-- 1 root system 5103 Sep 15 15:53 install.log

drwxr-xr-x 2 padmin staff 256 Sep 15 15:53 fixes

-rw-r--r-- 1 padmin staff 176510 Sep 15 18:34 errlog.out

drwxrwxr-- 2 root staff 256 Sep 16 13:25 config

-rw-r----- 1 padmin staff 269478 Sep 17 20:18 catalog.mic

-rw-r--r-- 1 root staff 5143 Sep 17 20:18 ioscli.log

$

$ r oem

oem_setup_env

# cd /var/adm/invscout/microcode

# ls -ltr

total 8

-rw-r--r-- 1 root system 88 May 17 20:26 catalog.mic

# mv catalog.mic catalog.mic.old

# cp /home/padmin/catalog.mic .

# ls -ltr

total 536

-rw-r--r-- 1 root system 88 May 17 20:26 catalog.mic.old

-rw-r----- 1 root staff 269478 Sep 17 20:20 catalog.mic

#

Now all we need to do is run the invscout command to create a new MDS upload file (MUP file).

# invscout

****** Command ---- V2.2.0.20

****** Logic Database V2.2.0.2

Initializing ...

Identifying the system ...

Working ...

Getting system microcode level(s) ...

Scanning for device microcode level(s) ...

117 devices detected; each dot (.)

represents 10 devices processed:

...........

Writing Microcode Survey upload file ...

Microcode Survey complete

The output files can be found at:

Upload file: /var/adm/invscout/vio1.mup

Report file: /var/adm/invscout/invs.mrp

Report file: /var/adm/invscout/invs.mrrup

To transfer the invscout 'Upload file' for microcode

comparison, see your service provider's web page.

#

The new MUP file (vio1.mup) can be uploaded to the MDS tool at the following URL:

Immediately my eyes are drawn to the items highlighted in red. Both the PCIe2 4-Port (10GbE SFP+ & 1GbE RJ45) and PCIe3 RAID SAS Adapter have new firmware available for them. I should update the firmware on these devices! But what fixes are included in the new firmware? Well, directly below the devices report is the ‘Microcode by Type’ table.

I can simply click on the ‘Readme’ hyperlink in the report and a new browser window opens with the readme file for the adapter’s firmware.

So, should I install these fixes right away? How critical are they? Again, the MDS report provides you with a quick way to answer these questions. Under the ‘Severity’ column, you will see several terms that describe the severity of the related fix. If you click on the word ‘Severity’ you’ll be presented with an ‘Understanding the MDS Report’ window.

On this page you’ll find the definition and description of each severity type. This will make it easier for you to ascertain if a fix should be installed ASAP or if it can wait until the next maintenance window.

I often save the MDS report as a PDF and include it in my health check reports. Alternatively, you can save the entire page as an HTML document and send this to the customer. This will ensure the hyperlinks still work and can be referenced by the customer as needed.

By the way, if the microcode catalog file you used to generate the MDS report is not the latest, the MDS tool will warn you. If you see this message, simply download the latest file and send it over to the server where it’s needed and re-run invscout and upload the new MUP file e.g.

“WARNING: the microcode catalog used in surveying host vio1 is now out of date. Version 2014.09.10 was used for the survey and version 2014.09.17 is the most current. We recommend that you either download and install the latest microcode catalog, and run this program again, or run the MDS Applet and allow it to automatically update the catalog.”

I received the following errors whilst running dsh on a NIM master recently.

root@nim1 : / # dsh -waixlpar1 date

0042-053 lsnim: there is no NIM object named "aixlpar1"

The node aixlpar1 is not defined in NIM database.

aixlpar1: Mon Aug 4 14:01:57 EET 2014

I had to set the following environment variable, shown below. Setting DSH_CONTEXT to DSH prevented the dsh command from referring to the NIM database and instead forced it to query a user-defined node list.

root@nim1 : / # export DSH_CONTEXT=DSH

root@nim1 : / # dsh -waixlpar1 date

aixlpar1: Mon Aug 4 14:02:22 EET 2014

root@nim1 : / # env | grep -i dsh

DSH_CONTEXT=DSH

DSH_NODE_RSH=/usr/bin/ssh

root@nim1 : / # dsh -q

DSH:DCP_DEVICE_OPTS=

DSH:DCP_DEVICE_RCP=

DSH:DCP_NODE_OPTS=-q

DSH:DCP_NODE_RCP=/usr/bin/scp

DSH:DSH_CONTEXT=DSH

DSH:DSH_DEVICE_LIST=

DSH:DSH_DEVICE_OPTS=

DSH:DSH_DEVICE_RCP=

DSH:DSH_DEVICE_RSH=

DSH:DSH_ENVIRONMENT=

DSH:DSH_FANOUT=

DSH:DSH_LOG=

DSH:DSH_NODEGROUP_PATH=

DSH:DSH_NODE_LIST=/usr/local/etc/csmnodes.list

DSH:DSH_NODE_OPTS=

DSH:DSH_NODE_RCP=

DSH:DSH_NODE_RSH=/usr/bin/ssh

DSH:DSH_OUTPUT=

DSH:DSH_PATH=

DSH:DSH_REPORT=

DSH:DSH_SYNTAX=

DSH:DSH_TIMEOUT=

DSH:RSYNC_RSH=

Here’s another dsh tip I picked up. By default dsh will use the default port for ssh connections to nodes. For example, by default sshd listens on port 22 on an AIX node. I recently came across a customer environment where they had configured sshd to listen on port 6666 (not the real port number!). They wanted to use dsh from a NIM master which would connect to all the defined nodes in their custom list. When they ran it they got the following error message:

# dsh date

aixlpar1: ssh: connect to host aixlpar1 port 22: Connection refused

dsh: 2617-009 aixlpar1 remote shell had exit code 255

On the AIX node, we could see that sshd was listening on port 6666:

# netstat -a | grep 6666 | grep LIST

tcp6 0 0 *.6666 *.* LISTEN

tcp4 0 0 *.6666 *.* LISTEN

We needed to find a way to force dsh to use a different port number when starting the ssh connection. This was accomplished by setting the DSH_REMOTE_OPTS variable, as shown below.

[root@nim1]/ # export DSH_REMOTE_OPTS=-p6666

[root@nim1]/ # dsh date

aixlpar1: Tue Aug 5 17:37:16 2014

[root@nim1]/ # env | grep DSH

DSH_REMOTE_CMD=/usr/bin/ssh

DSH_NODE_LIST=/etc/ibm/sysmgt/dsm/nodelist

DSH_REMOTE_OPTS=-p6666

DSH_NODE_RSH=/usr/bin/ssh

DSH CONTEXT

The DSH CONTEXT is the in-built context for all the DSH Utilities commands. It permits a user-defined node group database contained in the local file system. The DSH_NODEGROUP_PATH environment variable specifies the path to the node group database. Each file in this directory represents a node group, and contains one host name or TCP/IP address for each node that is a group member. Blank lines and comment lines beginning with a # symbol are ignored. If all nodes are requested for the DSH CONTEXT, a full node list is built from all groups in the DSH_NODEGROUP_PATH directory, and cached in /var/ibm/sysmgt/dsm/dsh/$DSH_NODEGROUP_PATH/AllNodes. This file is recreated each time a group file is modified or added to the DSH_NODEGROUP_PATH directory. Device targets are not supported in the DSH context.
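
For example (the directory path and group name below are made up), you could build a simple node group and then target it with dsh -N:

# export DSH_CONTEXT=DSH

# export DSH_NODEGROUP_PATH=/usr/local/etc/nodegroups

# mkdir -p /usr/local/etc/nodegroups

# cat > /usr/local/etc/nodegroups/webnodes <<EOF
aixlpar1
aixlpar2
EOF

# dsh -N webnodes date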

Following on from my post on March 25th regarding AIX Virtual Ethernet Link Status, it appears that the latest TL and SP for AIX 7.1 (TL3 SP3) allows an administrator to view the (VIOS SEA) physical link status from a virtual Ethernet adapter.

# oslevel -s

7100-03-03-1415

# chdev -l ent0 -a poll_uplink=yes -P

# shutdown -Fr

# entstat -d ent0 | grep PHY

PHYS_LINK_UP

Using sdiff I compared the output from entstat -d (before and after I applied TL3 SP3).

This post introduces two new features that I came across recently and found rather interesting. The first relates to PowerVP (VCPU affinity) and the second to POWER8 (Flexible SMT).

I’m particularly impressed by this new feature in PowerVP version 1.1.2 (SP1). You can view CPU and memory affinity information directly from the PowerVP GUI.

From the PowerVP Installation and User Guide:

“If you go to the View menu and select the Display CPU affinity information, the CPU utilization information will be replaced in the Core columns by the partition affinity information for the cores. If you hover your mouse over a core, you will see a tool tip showing the virtual CPU affinity by partition and will see the LPAR ID and the number of virtual CPUs assigned to the partition on that core. This information can be helpful when analyzing the processor affinity of your system. Note that for shared partitions, a partition could have affinity for multiple cores. Also, just because a partition has affinity for a core, that partition will not necessarily be dispatched to that core when it runs. Partition dispatching is performed by the hypervisor, if you want more information on this, refer to documentation on the hypervisor in the IBM Infocenter.”

Once I selected the “Display CPU affinity information” option, I noticed that the cores, shown in the “node drill down” view, showed partition affinity using different colours for each LPAR. Hovering my mouse over a core showed each of the LPAR ids and their associated virtual CPU count assigned to the core.

I was able to do the same with memory. The boxes next to the memory controllers (MC0 or MC1) are memory affinity boxes. The colours in these boxes show the percentage of memory that is assigned to a partition on that particular memory controller. Hovering the mouse over this box showed the LPAR id and the percentage of memory assigned to each LPAR. This information may be useful if you are reviewing a particular partition’s memory affinity.

To make it easier to read, I was able to obtain a list of LPARs and their associated colours from the Edit menu with the “Select Visible LPARs” option.

I have noticed that if you have partitions configured with dedicated processors and you click on the LPAR name (in the partition list), PowerVP will highlight the cores with the colour assigned to the dedicated partition. However, if your partitions are configured with shared processors, they are all highlighted with the same colour (blue). At this time, PowerVP will not differentiate between different shared processor pools. Perhaps this feature will appear in the future?

You can learn more about PowerVP from the following Redbook on the topic:

Something else I wanted to mention, related to CPU affinity, is Flexible SMT. This new feature is available on POWER8 systems. It is covered in more detail in section 4.2 of the new POWER8 tuning Redbook. What is interesting is that, unlike previous generations of the POWER processor, the performance characteristics of a thread are the same regardless of which hardware thread is active. This allows for more equal execution of work on any thread of the processor. It also means that techniques such as rsets and bindprocessor may no longer be required on POWER8.

On POWER7 and POWER7+, there is a correlation between the hardware thread number (0-3) and the hardware resources within the processor. Matching the thread numbers to the number of active threads was required for optimum performance. For example, if only one thread was active, it was thread0; if two threads were active, they were thread0 and thread1.

On POWER8, the same performance is obtained regardless of which thread is active. The processor balances resources according to the number of active threads. There is no need to match the thread numbers with the number of active tasks. Thus, when using the bindprocessor command or API, it is not necessary to bind the job to thread0 for optimal performance.

With the POWER8 processor cores, the SMT hardware threads are designed to be more equal in the execution implementation, which allows the system to support flexible SMT scheduling and management.

On POWER8, any process or thread can run in any SMT mode. The processor balances the processor core resources according to the number of active hardware threads. There is no need to match the application thread numbers with the number of active hardware threads.

Hardware threads on the POWER8 processor have equal weight, unlike the hardware threads under POWER7. Therefore, as an example, a single process running on thread 7 would run just as fast as running on thread 0, presuming nothing else is on the other hardware threads for that processor core. AIX will dynamically adjust between SMT and ST mode based on the workload utilization.
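
If you do still want to bind work to specific logical CPUs, the usual commands apply; smtctl shows the current SMT mode, bindprocessor -q lists the available logical CPUs, and bindprocessor binds a process to a given logical CPU (the PID below is just an example). The point is simply that on POWER8 there is no longer a performance reason to prefer thread 0 within a core.

# smtctl

# bindprocessor -q

# bindprocessor 1234567 7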

I wanted to mention a new AIX feature, available with AIX 7.1 TL3 (and 6.1 TL9) called the ‘AIX Virtual Ethernet Link Status’ capability. Previous implementations of Virtual Ethernet do not have the ability to detect loss of network connectivity.

For example, if the VIOS SEA is unavailable and VIO clients are unable to communicate with external systems on the network, the Virtual Ethernet adapter would always remain “connected” to the network via the Hypervisor’s virtual switch. However, in reality, the VIO client was cut off from the external network.

This could lead to a few undesirable problems, such as: a) needing to provide an IP address to ping for Etherchannel (or NIB) configurations in order to force a failover during a network incident, with no ability to auto fail-back afterwards; b) being unable to detect a total device failure in the VIOS; and c) reduced PowerHA fail-over capability, as it was unable to monitor the external network “reach-ability”.

The AIX VEA Link Status feature provides a way to overcome the previous limitations. The new VEA device will periodically poll the VIOS/SEA using L2 packets (LLDP format). The VIOS will respond with its physical device link status. If the VIOS is down, the VIO client times out and sets the uplink status to down.

To enable this new feature you’ll need your VIO clients to run either AIX 7.1 TL3 or AIX 6.1 TL9. Your VIOS will need to be running v2.2.3.0 at a minimum (recommend 2.2.3.1). There’s no special configuration required on the VIOS/SEA to support this feature. On the VIO client, you’ll find two new device attributes that you can configure/tune. These attributes are:

poll_uplink (yes, no)

poll_uplink_int (100ms – 5000ms)

You can view and change these new attributes with the lsattr and chdev commands on an AIX 7.1 TL3 partition.
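
Here’s a rough sketch, assuming a virtual Ethernet adapter ent0 (the attribute descriptions and default values shown are indicative and may vary slightly between AIX levels):

# lsattr -El ent0 -a poll_uplink -a poll_uplink_int

poll_uplink     no    Enable Uplink Polling            True

poll_uplink_int 1000  Time interval for Uplink Polling True

# chdev -l ent0 -a poll_uplink=yes -a poll_uplink_int=1000 -P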

This feature is still considered “new” but I’m very interested to see how this will integrate with PowerHA in the future. Perhaps the use of “Single Adapter” configurations with PowerHA will become more robust (allowing PowerHA to track and respond to network events)… and possibly more prevalent. You can find more information on this feature here: