In a previous post
I discussed how you can identify some of the different types of PowerVM Capacity
on Demand (CoD) activation keys from IBM.

Recently I had to enable Active Memory Expansion (AME) on a
couple of POWER7 systems. I discovered that all of the keys contained a similar
string. It appears that if a CoD key contains the string CA1F0000000800 then it is safe to assume it will activate
AME for a particular system, e.g.

9741EF3AE6969F17CA1F0000000800419D

937A1240F00F5B05CA1F0000000800413D
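The check described above can be scripted. This is only a minimal sketch: it assumes the marker string is a reliable indicator (which is my observation, not something IBM documents), and the function name is_ame_key is made up for illustration.

```shell
# Marker string that, based on the observation above, appears in AME keys.
# Assumption: this is an empirical pattern, not a documented key format.
AME_MARKER="CA1F0000000800"

# Hypothetical helper: return 0 if the CoD key contains the marker, 1 otherwise.
is_ame_key() {
    case "$1" in
        *"$AME_MARKER"*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Check the two example keys shown above.
for key in 9741EF3AE6969F17CA1F0000000800419D 937A1240F00F5B05CA1F0000000800413D; do
    if is_ame_key "$key"; then
        echo "$key: looks like an AME key"
    else
        echo "$key: no AME marker found"
    fi
done
```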

And while I’m talking about AME, I thought I’d share this
tip as well.

I was performing a demo of AME for my team and wanted to
change the AME expansion factor using DLPAR during the demo. I did not want to
use the HMC GUI but rather the HMC command line (as it’s faster).

To change the expansion factor for an LPAR (that’s enabled
for AME), you can use the chhwres
command from the HMC CLI.

During the demo I highlighted the current (running)
expansion factor for the LPAR (using the lshwres
command).

I’ve shared
my tips for resolving DLPAR problems in the past. So this week, when one of my
colleagues was experiencing an issue with DLPAR, I referred him to my blog post
and suggested he follow the troubleshooting steps. He did so and I went about
my business. Later that same day I asked him how he had fared. He told me that
DLPAR was still not working on his particular AIX LPAR. It was an AIX 5.3
system and he was attempting to add another Virtual Processor to the LPAR. He
expressed his frustration with the situation, so I offered to take a look for
him.

What I
found was that the system was missing an important fileset, one that
enables DLPAR operations on AIX 5.3 systems. The fileset in question was named csm.client. Without this fileset
installed, DLPAR would never work.

I advised
my colleague of the problem and suggested he follow the steps below to resolve
the issue. After he reinstalled the fileset, RMC communication between the HMC
and LPAR was restored and his DLPAR processor add operation completed without
issue.

1. Mount the NIM master's lpp_source file system:

aix53lpar1 : / # mount nim1:/export/lpp_source /mnt

2. Verify CSM filesets are not installed and the IBM.DRM subsystem is either inoperative
or missing.
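One way to script the verification in step 2 is to parse the output of lssrc. The sketch below is an assumption on my part rather than the exact procedure my colleague followed; the helper name subsys_status is hypothetical, and it relies on the status being the last column of the lssrc listing.

```shell
# Hypothetical helper: read `lssrc -a` style output on stdin and print the
# status of one subsystem, or "missing" if the subsystem is not listed.
# Assumption: the status is the last column of each line.
subsys_status() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }
        END        { if (!found) print "missing" }
    '
}

# Intended usage on the AIX LPAR (not run here):
#   lslpp -l csm.client               # fails if the fileset is not installed
#   lssrc -a | subsys_status IBM.DRM  # prints active, inoperative or missing
```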

I received
an email from one of my customers recently that simply said:

“mate
…on lpar30, for some reason the IBM.DRM and others are failing to start .. um
.. any chance you could have a quick look at that ?”

So I asked, “OK, so this used to work, right?” To which I received a
relatively confused reply: “.....yep it
did....actually no....it’s never worked....or has it???...I’m not sure...”.

Based on my experience, the most common cause of DLPAR
operations failing is a network problem. Before diving into the deep
end and trying to debug RSCT, it’s always best to start with the basics. For example, can you ping the HMC from the
LPAR? Can you ping the LPAR from the HMC? If either of these tests fails, check
the network configuration on both components before doing anything else.

– Check the LPAR communications box in HMC configuration screen for LAN adapter that is used for HMC-to-LPAR communication.

– By the way, unlike POWER4 systems, LPARs on POWER5 and POWER6 systems do not depend on host name resolution for DLPAR operations.

– Check routing on the LPAR and the HMC.

– Use ping and the HMC’s Test Network Connectivity task to verify the LPAR and the HMC can communicate with each other.
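The basic ping test from the LPAR side could be wrapped in a small function like the one below. Treat it as a rough sketch: the host name hmc1 and the helper name check_reachable are placeholders I have made up, and the PING_CMD variable exists only so the function can be exercised without a real network.

```shell
# Command used to probe a host. Override PING_CMD to test without a network.
PING_CMD=${PING_CMD:-"ping -c 1"}

# Hypothetical helper: report whether a host answers a single ping.
check_reachable() {
    if $PING_CMD "$1" >/dev/null 2>&1; then
        echo "$1: reachable"
    else
        echo "$1: NOT reachable"
        return 1
    fi
}

# Intended usage on the LPAR (hmc1 is a placeholder host name):
#   check_reachable hmc1 || echo "check the network before debugging RSCT"
```

The same test needs to be repeated in the other direction, from the HMC to the LPAR.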

If you check the network and you are happy that the LPAR and the HMC
can communicate, then perhaps you need to re-initialise the RMC subsystems on
the AIX LPAR. Run the following commands:

# /usr/sbin/rsct/bin/rmcctrl -z

# /usr/sbin/rsct/bin/rmcctrl -A

# /usr/sbin/rsct/bin/rmcctrl -p

Wait up to 5
minutes before trying DLPAR again. If DLPAR still doesn’t work, i.e. the HMC is
still reporting no values for DCaps and
the IBM.DRM subsystem still won’t
start, try using the recfgct command.

Only run
the rmcctrl and recfgct commands if you believe something has become corrupt in the
RMC configuration of the LPAR. The fastest way to fix a broken
configuration or to clear out the RMC ACL files after cloning (via alt_disk
migration) is to use the recfgct
command.

The RMC daemons should work “out of the
box” and are not typically the cause of DLPAR issues. However, you can try
stopping and starting the daemons when troubleshooting DLPAR issues.

The rmcctrl -z command just
stops the daemons. The rmcctrl -A command adds the subsystem group
(rsct) and the subsystem (ctrmc) objects to the SRC, adds
an appropriate entry to the end of /etc/inittab, and
starts the daemons.

The
rmcctrl -p command enables the
daemons for remote client connections, i.e. from the HMC to the LPAR and vice
versa.

If you are familiar with the System
Resource Controller (SRC) you might be tempted to use stopsrc and startsrc
commands to stop and start these daemons.

Do not do it; use the rmcctrl commands
instead.

If /var is full, RMC subsystems may fail
to function correctly. If /var is 100% full, use chfs
to expand it. If there is no more space available, examine subdirectories
and remove unnecessary files (for example, trace.*, core, and so forth).
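A quick way to check how full /var is from a script is to pull the percentage out of the df output. This is a sketch under assumptions: the helper name pct_used is made up, and it simply takes the first column ending in % on the line for the given mount point, which is %Used in the AIX df -k layout.

```shell
# Hypothetical helper: read df output on stdin and print the percentage used
# for the given mount point. Assumption: the first field ending in "%" on the
# matching line is the space usage, and the mount point is the last field.
pct_used() {
    awk -v mnt="$1" '
        $NF == mnt {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%/, "", $i); print $i; exit }
        }
    '
}

# Intended usage on the AIX LPAR (not run here):
#   df -k /var | pct_used /var
#   chfs -a size=+512M /var     # example increment only; size it to suit
```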

The polling interval for the RMC
daemons on the LPAR to check with the HMC daemons is 5-7 minutes; so you need
to wait long enough for the daemons to start up and synchronize.

The Resource Monitoring and Control
(RMC) daemons are part of the Reliable, Scalable Cluster Technology (RSCT) and
are controlled by the System Resource Controller (SRC). These daemons run in
all LPARs and communicate with equivalent RMC daemons running on the HMC. The
daemons start automatically when the operating system starts and synchronize
with the HMC RMC daemons.

The
daemons in the LPARs and the daemons on the HMC must be able to communicate
over the network for DLPAR operations to succeed. This is
not the network connection between the managed system (FSP) and the HMC; it is
the network connection between the operating system (AIX) in each LPAR and the
HMC.

Note:
Apart from rebooting, there is no way to stop and start the RMC daemons on the
HMC.

The
following links also contain some (outdated) information relating to DLPAR
verification and troubleshooting. Even though it is quite old, some of it is
still relevant today and is a good place to start.

The previous
link (above) provides some information relating to the values for DCaps and
what they mean (also outdated):

0 - DR CPU capable (can move CPUs)

1 - DR MEM capable (can move memory)

2 - DR I/O capable (can move I/O resources)

3 - DR PCI Bridge (can move PCI bridges)

4 - DR Entitlement (POWER5 can change shared entitlement)

5 - Multiple DR CPU (AIX 5.3 can move 2+ CPUs at once)

0x3f = max, and 0xf is common for AIX 5.2
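To make the bit meanings concrete, here is a small sketch that decodes a DCaps value into the capabilities listed above. The function name decode_dcaps is hypothetical, and the bit names come straight from the (outdated) list, so treat the output as a guide only.

```shell
# Hypothetical helper: print the capability name for every bit set in a DCaps
# value, using the bit numbering from the list above (bit 0 to bit 5).
decode_dcaps() {
    mask=$(( $1 ))
    bit=0
    for name in "DR CPU" "DR MEM" "DR I/O" "DR PCI Bridge" \
                "DR Entitlement" "Multiple DR CPU"; do
        if [ $(( mask & (1 << bit) )) -ne 0 ]; then
            echo "bit $bit: $name"
        fi
        bit=$(( bit + 1 ))
    done
}

# 0xf sets bits 0 to 3; 0x3f sets all six bits.
decode_dcaps 0xf
```

A value of 0 prints nothing, which matches the situation where the HMC reports no DLPAR capabilities for the LPAR.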

If you are
interested in how HMC and LPAR authentication works with DLPAR, then read on.
Otherwise, happy DLPARing!

HMC and LPAR authentication (RSCT authentication)

The diagram
below outlines how the HMC and an LPAR authenticate with each other in order
for DLPAR operations to work. RSCT authentication is used to ensure the HMC is
communicating with the correct LPAR.

Authentication
is the process of ensuring that another party is who it claims to be.
Authorization is the process by which a cluster software component grants or
denies resources based on certain criteria. The RSCT component that implements
authorization is RMC. It uses access control list (ACL) files to control user
access to resources.

The RMC
component subsystem uses cluster security services to map the operating system
user identifiers, specified in the ACL file, to network security identifiers to
determine if the user has the correct permissions. This is performed by the
identity mapping service, which uses information stored in the identity mapping
files ctsec_map.global and ctsec_map.local.

The RSCT authorization process in detail:

1. On the HMC: DMSRM pushes down the secret key and HMC IP address to NVRAM when it detects a new CEC; this process is repeated every five minutes. Each time an HMC is rebooted or DMSRM is restarted, a new key is used.

2. On the AIX LPAR: CSMAgentRM, through RTAS (Run-time Abstraction Services), reads the key and HMC IP address out from NVRAM. It will then authenticate the HMC. This process is repeated every five minutes on an LPAR to detect new HMCs and to check whether the key has changed. An HMC with a new key is treated as a new HMC and will go through the authentication and authorization processes again.

3. On the AIX LPAR: After authenticating the HMC, CSMAgentRM will contact the DMSRM on the HMC to create a ManagedNode resource in order to identify itself as an LPAR of this HMC. CSMAgentRM then creates a compatible ManagementServer resource on AIX. This can be displayed on AIX with the lsrsrc command, e.g.

root@aix6 / # lsrsrc "IBM.ManagementServer"

Resource Persistent Attributes for IBM.ManagementServer

resource 1:

Name= "192.168.1.244"

Hostname= "192.168.1.244"

ManagerType= "HMC"

LocalHostname= "10.153.3.133"

ClusterTM= "9078-160"

ClusterSNum= ""

ActivePeerDomain = ""

NodeNameList= {"aix6"}

4. On the AIX LPAR: After the creation of the ManagedNode and ManagementServer resources on the HMC and AIX respectively, CSMAgentRM grants HMC permission to access necessary resource classes on the LPAR. After granting the HMC permission, CSMAgentRM will change its ManagedNode, on the HMC, Status to 1. (It should be noted that without proper permission on AIX, the HMC would be able to establish a session with the LPAR but will not be able to query for OS information, DLPAR capabilities, or execute DLPAR commands afterwards.)

5. On the HMC: After the ManagedNode Status is changed to 1, LparCmdRM establishes a session with the LPAR, queries for operating system information and DLPAR capabilities, notifies CIMOM about the DLPAR capabilities of the LPAR, and then waits for the DLPAR commands from users.
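If you want to confirm from the LPAR which HMCs it has registered with (the ManagementServer resource created in step 3), the lsrsrc output can be summarised with a few lines of awk. This is a sketch that assumes the attribute layout shown in the example output above, in particular that Hostname appears before ManagerType; the helper name list_mgmt_servers is made up.

```shell
# Hypothetical helper: read `lsrsrc IBM.ManagementServer` output on stdin and
# print "ManagerType Hostname" for each resource.
# Assumption: Hostname is listed before ManagerType, as in the example above.
list_mgmt_servers() {
    awk -F'=' '
        {
            key = $1
            val = $2
            gsub(/[ \t"]/, "", key)   # strip spaces and quotes from the name
            gsub(/[ \t"]/, "", val)   # strip spaces and quotes from the value
        }
        key == "Hostname"    { host = val }
        key == "ManagerType" { print val, host }
    '
}

# Intended usage on the AIX LPAR (not run here):
#   lsrsrc IBM.ManagementServer | list_mgmt_servers
```

If nothing is printed, the LPAR has no ManagementServer resource, which suggests the authentication steps above have not completed.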

Here are some questions I received recently regarding VLAN
tagging on the VIO server. My answers are marked “A:”.

“Hi Chris,

Q: I’m trying to
understand when, where and why there would be the need to use ‘mkvdev -vlan’
(etc.) on the VIOS, and I’m wondering whether you would be able to clarify this
for me, please.

Is it necessary to add
the VLAN tag devices to the SEA, or is it suffice to just have them defined
within the Virtual Ethernet itself which is part of the SEA?”

A:
It suffices to simply define the VLAN IDs on the Virtual Ethernet
adapters associated with the SEA.

“Q: For completeness,
on the rare occasions I have done this, I have added the VLAN’s to the Virtual
Ethernet and also as VLAN devices on the VIOS (mkvdev -vlan etc.)”

A:
mkvdev -vlan is not necessary, unless
the VIOS needs to communicate with hosts on different VLANs, i.e. you need an IP
address on the VIOS for each VLAN. This does not mean the SEA will bridge this
VLAN traffic for VIOCs.

“Q: The reason I
started thinking of this is, is because one of our customers wants to add new
VLAN’s to their SEA, but they’re not running Power7 hardware. Therefore, the
online method would be to add a new Virtual Adapter which contains the new VLAN
ID’s to the VIOS using DLPAR, then use chdev -dev (etc.) on the SEA to include
the new Virtual Ethernet.”

A:
Agreed. The “IBM PowerVM Virtualization Managing and Monitoring” Redbook
states: “If your system doesn’t support dynamic VLAN modifications and you are
modifying the VLAN list of a virtual Ethernet adapter that is configured in a
SEA with ha_mode enabled, the HMC will not allow you to reconfigure the list of
VLANs on that interface. You will need to add an additional virtual Ethernet
adapter and modify the virt_adapters list of the SEA, or modify the profile of
both Virtual I/O Servers and re-activate both Virtual I/O Servers at the same
time.”
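The "modify the virt_adapters list" step boils down to appending the new trunk adapter to the existing comma-separated list and passing the result to chdev. Here is a sketch of that; the SEA name ent10 and the existing adapter ent2 are placeholders I have invented, and append_adapter is a hypothetical helper, not a VIOS command.

```shell
# Hypothetical helper: append an adapter to a comma-separated virt_adapters
# list, leaving the list unchanged if the adapter is already in it.
append_adapter() {
    current=$1
    new=$2
    case ",$current," in
        *",$new,"*) echo "$current" ;;
        *)          echo "${current:+$current,}$new" ;;
    esac
}

# Build (but do not run) the chdev command for the VIOS restricted shell.
# ent10 (SEA) and ent2 (existing trunk adapter) are placeholder device names.
new_list=$(append_adapter ent2 ent9)
echo "chdev -dev ent10 -attr virt_adapters=$new_list"
```

On the VIOS, the current value can be read first with lsdev -dev on the SEA and its virt_adapters attribute, so that no existing adapter is dropped from the list.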

“Q: From the phone
call I had, it would appear that the VLAN tags are included on the Virtual
Ethernet device, but have not been added to the SEA by running mkvdev -vlan
(etc.) on the VIOS’s. This leads me to assume that the ‘mkvdev -vlan’ is
only required if there is a requirement to access the VIOS itself from a
particular VLAN. Am I right, or is there something I’m not understanding?
I’m unable to find documentation that explains the answer. Do you happen to
know?”

A:
That is also my understanding (based on my experience). On page 483 of the “IBM
PowerVM Virtualization Introduction and Configuration” Redbook, it states:
“The addition of VLAN interfaces to the SEA adapter is only necessary if the VIO
Server itself needs to communicate on these VLANs”.

“Q: Hi Chris,

We are trying to
associate a new entX Virtual Ethernet Trunk Device to an existing SEA. The new
device must be configured for VLAN tagging. The existing virtual Ethernet
adapter (that is already associated with the SEA) is not configured for VLAN
tagging. This device will remain associated to the SEA and continue to pass
untagged packets to the already configured network.

Ultimately the configuration
we want would be two entX devices associated with the existing SEA. One entX
device is configured for untagged packets and the other entX device is
configured for tagging.”

Reply: “hmm ok I see
what you are saying, I will give it a go and tell you how it turns out...thanks.
ok finally got around to testing using a VIOS at DR site. Created
new virtual adapter PVID 55 and VID 888 (ent9) then added it to the existing
SEA as shown below: