I was working
with a customer recently on a Power Blade that was running the Integrated
Virtualisation Manager (IVM). They’d installed a VIO partition onto the Blade
and had hoped to install a couple of AIX LPARs on the system. However, they didn't get very far.

As soon as they
attempted to NIM install the LPARs, they would get stuck trying to ping the
NIM master from the client. Basically, the Shared Ethernet Adapter (SEA) was
not working properly and none of the LPARs could communicate with the external
network. So they asked for some assistance.

The Blade server
name was Server-8406-71Y-SN06BF99Z. The SEA was configured as ent7.

On the network switch port, the native VLAN (PVID) was configured as 11, with VLAN tag 68 added as an allowed VLAN. If the client LPARs tried to access the network using a PVID of 68, instead of a VLAN tag of 68, they would get stuck at the switch port, i.e. the un-tagged packets for 10.1.68.X would arrive on PVID 11 and fail. The packets for 10.1.68.X needed to be tagged with VLAN ID 68 in order for the switch to pass the traffic.

So the question
was, how do we add VLAN tags in the IVM environment? If we'd been using an HMC,
then this would be simple to fix. Just add the VLAN tags into the Virtual
Ethernet Adapter used by the SEA and we’d be done.

We had to use the
lshwres and chhwres commands to resolve this one. First we listed the virtual
adapters known to the VIO server (IVM). At slot 12, we found our SEA adapter
with port_vlan_id set to 68 and addl_vlan_ids set to none.

We needed to change port_vlan_id to 11 and addl_vlan_ids to 68. We also needed ieee_virtual_eth set to 1.

First we removed the existing SEA device (ent7), as we would not be able to make changes to the underlying virtual adapter while it was “active”. We then removed the virtual Ethernet adapter from slot 12 and re-added it, again at slot 12, with port_vlan_id and addl_vlan_ids set to the desired values.
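For reference, the kind of lshwres/chhwres invocations involved look something like the sketch below. The managed system name, slot 12 and the VLAN IDs are taken from the article; the partition ID of 1 (the VIOS itself) and the trunk attributes are my assumptions, so check the syntax against your IVM level before relying on it.

$ lshwres -r virtualio --rsubtype eth --level lpar -m Server-8406-71Y-SN06BF99Z

$ chhwres -r virtualio --rsubtype eth -m Server-8406-71Y-SN06BF99Z -o r --id 1 -s 12

$ chhwres -r virtualio --rsubtype eth -m Server-8406-71Y-SN06BF99Z -o a --id 1 -s 12 -a "ieee_virtual_eth=1,port_vlan_id=11,addl_vlan_ids=68,is_trunk=1,trunk_priority=1"

The SEA itself (ent7) is removed first (rmdev -dev ent7) and recreated afterwards with mkvdev -sea, using the new default PVID of 11.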

How can I determine whether vscsi0 or vscsi1 is currently carrying data between the VIOS and the VIOC?

Thanks a lot!

By default the Virtual I/O Client (VIOC) will use the first VIOS for
all VSCSI traffic. You can confirm which VSCSI adapter is being used by
starting nmon (on the VIOC) and
pressing the ‘a’ key. This will show which vscsi adapter is currently in use.
e.g.

Use the lspath command to determine the current path priority for a hdisk.
By default it will be set to 1 for both adapters, meaning that traffic will go
via the first path. Priority 1 is the highest priority, and you can define a
priority from 1 to 255.

# lspath -AE -l hdisk2 -p vscsi2
priority  1  Priority  True

# lspath -AE -l hdisk2 -p vscsi3
priority  1  Priority  True

You can change the path priority of an hdisk using the chpath command. For example, we change the priority for vscsi2 to 255, so that vscsi3 now has the highest priority:

# chpath -l hdisk2 -a priority=255 -p vscsi2

# chpath -l hdisk2 -a priority=1 -p vscsi3

This will change the path priority for hdisk2 so that its traffic will use vscsi3 instead of vscsi2.

If you run out of space in the root
file system, odd things can happen when you try to map virtual devices to
virtual adapters with mkvdev.

For example, a colleague of mine was
attempting to map a new hdisk to a vhost adapter on a pair of VIOS. The VIOS
was running a recent version of code. He received the following error message
(see below). It wasn’t a very helpful message. At first I thought it was due to
the fact that he had not set the reserve_policy
attribute for the new disk to no_reserve
on both VIOS. Changing the value for that attribute did not help.
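For context, the commands involved on each VIOS looked something like the following sketch (hdisk5, vhost0 and the virtual target device name are illustrative, not the customer's actual names):

$ lsdev -dev hdisk5 -attr reserve_policy

$ chdev -dev hdisk5 -attr reserve_policy=no_reserve

$ mkvdev -vdev hdisk5 -vadapter vhost0 -dev vtd_hdisk5

It was the mkvdev step that returned the unhelpful error.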

I found the
same issue on the second VIOS i.e. a full root file system due to a core file
(from cimserver). I also found no trace of a full file system event in the error
report. Perhaps someone had taken it upon themselves to “clean house” at some
point and had removed entries from the VIOS error log.

Make sure
you monitor file system space on your VIOS. Who knows what else might fail if
you run out of space in a critical file system.

Recently I
had the pleasure of configuring a couple of POWER7 720s for a customer. Each
720 was to host roughly 12 LPARs. There were two VIO servers and a NIM
server per 720.

Everything
went along nicely and according to plan. In a few days we had both systems
built. My steps for building the systems were almost identical to those
described by Rob McNelly in a recent post.

All four
VIO servers were running the latest VIOS code i.e. 2.2.0.10-FP-24 SP-01. All the client LPARs were running AIX 6.1 TL6 SP3. Each VIOS was
configured with two FC paths to the SAN and the SAN storage device was an IBM DS5020.

Native AIX
MPIO was in use on the VIOS and the AIX LPARs. I did not deploy SDDPCM on the
VIOS as this is currently unsupported with the DS5020.

Once the
LPAR builds were complete we performed a number of “integration tests”. These typically
involve disconnecting network and SAN cables from each VIOS and observing how
the VIOS and LPARs respond to and recover from these types of conditions.

One of the
integration tests required that BOTH fibre cables be disconnected from the first VIOS, to confirm that the client LPARs were not impacted, i.e. that all
I/O travelled via the second VIOS.

During the test we noticed the following:

I/O on
the client LPAR would hang for approximately 5 minutes. Our test was simple enough: create a file in a file system. Eventually the I/O would continue
after the 5 minute delay.

The VIOS
became sluggish and took some time to respond. The lspath command would take a very long time to return (or in some cases it would never
return). During one test, the VIOS actually hung and had to be restarted
(however we were not able to reproduce this issue again).

When we
reconnected both the cables, the paths would recover on the first VIOS
after another 5 minutes. On the client however, it took roughly 20 minutes
for the paths to recover. And yes, the hcheck_interval
was set to 60 on all disks.

What was even more puzzling was that if we simply rebooted the first VIOS,
everything worked as expected i.e. the client LPARs were not impacted, I/O continued
as normal and when the first VIOS was back up, the paths on the client LPARs
recovered quickly.

After
doing some research, we discovered the following post on the IBM developerWorks AIX forum:

This post
highlighted a number of things we needed to check and also confirmed several
decisions we’d made during the design process, such as SDDPCM not being
supported with DS5020 storage and VIOS (this was good, as some people were starting to believe we should have installed SDDPCM to resolve this problem; I'd only be happy to do that if it was a supported combination, and it's not).

Finally we
found the following IBM tech note that related directly to our issue.

"For active/passive storage device, such as DS3K, DS4K, or DS5K if
complete access is lost to the storage device, then it may take greater than 5 minutes to fail I/O. This feature is for
Active/Passive storage devices, which are running with the AIX Default A/P PCM. This includes DS3K, DS4K, and DS5K family of
devices.”

The new feature was
described as follows.

“Added feature which health checks
controllers, when an enabled path becomes unavailable, due to transport problems.
By default this feature is DISABLED.
To enable this feature set the following ODM attributes for the active/passive storage
device. Enabling this feature, results
in faster I/O failure times.

cntl_delay_time:
Is the amount of time in seconds the storage device's controller(s) will be
health checked after a transport failure. At the end of this period, if
no paths are detected as good, then all pending and subsequent I/O to the
device will be failed, until the device health checker detects a failed path
has returned.

cntl_hcheck_int: The first controller health check will only be issued after a storage fabric transport failure has been detected. cntl_hcheck_int is the amount of time, in seconds, after which the next controller health check command will be issued. This value must be less than the cntl_delay_time (unless set to "0", disabled).

If you wish to allow the storage
device 30 seconds to come back on the fabric (after leaving the fabric), then
you can set cntl_delay_time=30 and
cntl_hcheck_int=2.

The device, /dev/hdisk#, must not be in use, when setting the ODM values (or
the chdev "-P" option must be used, which requires a
reboot).

CAUTION: There are cases where the storage device may reboot both of the controllers and become inaccessible for a period of time. If the controller health check sequence is enabled, then this may result in an I/O failure. It is recommended to make sure you have a mirrored volume to fail over to if you are running with controller health check enabled (especially with a cntl_delay_time under 60 seconds)."

And, as I suspected, the issue was related to the type of storage we were using. It appears the I/O delay was attributable to the following attributes on the DS5020 hdisks on the VIOS:

After
making the changes to the hdisks on all VIOS, I performed the same test i.e.
disconnected BOTH fibre cables from the first VIOS and continued to write a
file to a file system on the client LPARs. By modifying these values on all the
DS5020 disks, on all the VIO servers, the I/O delay was reduced to seconds
rather than five minutes!

The following
attributes were used for the hdisks and adapters in the final configuration.

I updated my
lab VIOS to the latest fix pack (V2.2.2.1)
this week and thought I’d try the new VIOS part
command. This new command is an improved version of the existing vios_advisor tool. The major difference between the two is that the new tool is included with the VIOS code and will be updated via new VIOS fix packs. The following link has some
information on using the command:

This new
tool “Provides performance reports with
suggestions for making configurational changes to the environment, and helps to
identify areas for further investigation. The reports are based on the key
performance metrics of various partition resources that are collected from the Virtual I/O Server (VIOS)
environment.” Just like the old VIOS advisor.

I ran the
tool for 10 minutes on my idle VIOS, just to see how the new XML report looked.
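A ten-minute sample is started with a command along these lines (the interval is specified in minutes):

$ part -i 10

When it completes, part leaves a tar file (containing the advisor XML report and, as noted below, nmon data) in the working directory.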

I then scp’ed the tar file to my
laptop, extracted it and opened the vios_advisor_report.xml
file. This is what the report looked like:

I was also able to open the nmon file using the nmon analyser tool. It produced typical
nmon performance graphs as you’d
expect.

So, not only does the new part tool run the VIOS advisor, it also captures nmon performance data at the same time.

This is rather impressive and a great
move by IBM. The original VIOS advisor tool was free and of course not
officially supported by IBM (although the development team were very responsive
to requests from users of the tool). The new tool is fully supported by the IBM
team and as a result will only get better and better as time goes by. I’m not a
fan of the new command name, part (good luck trying to google it!); I still prefer vios_advisor, but hey, what's in a name, right?

“The virtual optical device seems
to use max_transfer size as 0x40000 (256 KB) internally, which is not allowed
to be modified. Virtual optical CD's max_transfer seems to limit vhost
adapter's max_transfer size. So if you want to increase max_transfer size for
virtual SCSI disks, you should separate them from virtual optical SCSI CD by
configuring those two groups under different vhost adapters”

Until recently, if you were
configuring a new LPAR with virtual FC adapters you couldn’t force it to log
into the SAN before an operating system (such as AIX) was installed. I’ve
written about this before (see link below). I also offered a way to work around
this issue.

I’ve successfully used this method
on both POWER6 (595) and POWER7 (795) systems. After configuring a new LPAR
profile with a single VFC adapter, the VIOS reported that the client was not
logged into the SAN:
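As a quick check, the NPIV login state can be queried from the VIOS with something like the following (vfchost0 is an illustrative adapter name); an adapter whose client has never logged in typically shows a status of NOT_LOGGED_IN:

$ lsmap -npiv -vadapter vfchost0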

The first
time I ran the 'viostat -adapter' command I expected to find
non-zero values for kbps, tps, etc, for each vfchost adapter. However, the values were always zero, no matter
how much traffic traversed the adapters.

$ viostat -adapter 1 10

...

vadapter:      Kbps     tps    bkread    bkwrtn
vfchost0        0.0     0.0       0.0       0.0

...

vadapter:      Kbps     tps    bkread    bkwrtn
vfchost1        0.0     0.0       0.0       0.0

I wondered
if this was expected behaviour. Was the output supposed to report the amount of
pass-thru traffic per vfchost? In 2011, I posed this question on the IBM developerWorks
PowerVM forum. One of the replies stated:

"viostat
does not give statistics for NPIV devices. The vfchost adapter is just a
passthru, it doesn't know what the commands it gets are."

I
appreciated someone taking the time to answer my question but I was still
curious. I tested the same command again (in 2013) on a recent VIOS level (2.2.2.1),
but I received the same result. It was time to source an official answer on
this behaviour.

Here is the
official response I received from IBM:

1. FC adapter stats in viostat/iostat do not include NPIV.

2. viostat & iostat are an aggregate of all the stats from the underlying disks, which of course NPIV doesn't have.

There's really no way for the vfchost adapter to monitor I/O,
since it doesn't know what the commands it gets are. He's just a passthru,
passing the commands he gets from the client directly to the physical FC
adapter.

3. You can run fcstat on the VIOS but that has the same issues/limitations mentioned above.

The intent here was that customers would use tools on the client to monitor this sort of thing.

To summarize the comments from Development:

viostat does NOT give statistics for NPIV devices.

This made
sense but I wondered why the tool hadn’t been changed to exclude vfchost adapters from the output (to
avoid customer confusion). There's obviously no valid reason to ever display any
information for this type of adapter. I also understood that it was expected
that I/O would be monitored at the client LPAR level. But I must say that an
option for monitoring VFC I/O from a VIO server would be advantageous i.e. a
single source view of all I/O activity for all VFC clients; particularly when
there are several hundred partitions on a frame. The response was:

“…the way the vfchost driver currently works is that it calls iostadd to register a dkstat structure, resulting in the adapter being listed when viostat is called. This is misleading, however, since the vfchost driver does not actually track I/O. The commands coming from the client partition are simply passed as-is to the physical FC adapter, and we don't know if a particular command is an I/O command or not. The iostadd call is left over from porting the code from the vscsi driver, and Development agrees it should probably have been removed before shipping the code.

There has
also been mention of a DCR #MR0413117456 (Title: FC adapter stats in
viostat/iostat does not include NPIV) which you can follow-up with Marketing to
register your interest/track progress if that is something you're interested in
pursuing.”

I came across this strange LPM issue recently. Thought I’d share it with you.

All the customer's VIOS were configured with the viosecure level set to high.
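For reference, the security level is applied and reviewed on a VIOS with commands along these lines:

$ viosecure -level high -apply

$ viosecure -view -nonint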

When the VIOS is configured with a security profile set to high, PCI or DoD, a new feature is enabled during LPM. This new feature is called “Secure LPM”. As a result when you initiate an LPM operation, the “Secure LPM” feature automatically enables the VIOS firewall and configures a secure (ipsec) tunnel for all LPM traffic over the network.

Before we start LPM, the VIOS firewall is off:

$ viosecure -firewall view

IPv4 Firewall OFF

ALLOWED PORTS

Local Remote

Interface Port Port Service IPAddress Expiration

Time(seconds)

--------- ---- ---- ------- --------- ---------------

And it's activated after we start LPM:

$ viosecure -firewall view

IPv4 Firewall ON

ALLOWED PORTS

Local Remote

Interface Port Port Service IPAddress Expiration

Time(seconds)

--------- ---- ---- ------- --------- ---------------

In the customer’s case, we moved a partition from a 795 running a pair of VIOS at 2.2.1.3 to another 795 running another pair of VIOS at 2.2.2.1. This worked OK. When we attempted to move the partition back, we received an error stating that the source MSP had rejected the request.

HSCLA230

The mover service partition on the source managed system has rejected the request to stop the migration. Verify that the migration state of the partition is Migration Starting, and try the operation again.

After a while we discovered that when the LPM operation started, the source VIOS would enable its firewall and we would immediately lose connectivity (i.e. our SSH session to the VIOS would hang). We also found that RMC connectivity between the source VIOS and the HMC dropped. Somehow the VIOS firewall was blocking network connectivity to pretty much everything.

Eventually, after much digging we found that the source VIOS was the victim of an errant firewall rule (configured on the source VIOS itself).

0 permit 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 yes all any 0 any 0 both both no all packets 0 all 0 none Default Rule

We were then able to move the partition back to the original frame. We are still searching for an answer to how this happened. The IBM support team believe the rule must have been created by an administrator at some point. Of course, the administrator claims that is untrue.

I’ve been working with a customer recently that required a special kind of disaster recovery capability for their shared storage pool (SSP) environment. The customer had implemented a private cloud solution in their existing POWER environment. The solution consisted of an IBM SmartCloud deployment with IBM Systems Director and VMControl managing the PowerVM deployment of virtual machines (AIX partitions) across the landscape.

The decision to use shared storage pools was driven by the fact that the customer was using SAN storage that could not be managed by VMControl. However, shared storage pool support was available with VMControl. This meant that the customer could continue using their non-IBM disk in their private cloud. The customer quickly noticed the many advantages that SSPs bring to a POWER cloud environment, such as very fast virtual machine deployment using snapshots and linked-clones.

Traditionally, the customer’s disaster (data) recovery process relied on SAN disk replication between its production and disaster recovery sites. In the event of a disaster the customer could recover a production system using a replicated copy of the SAN disks at the disaster recovery site. This also allowed the customer to perform regular disaster recovery tests. In these tests, the disaster recovery site is “fenced” from the real production network and SAN. The customer is then able to “bring up” a complete copy of any or all (replicated) production systems. Because these “copies” of production are fenced off from the real production environment the customer can undertake lengthy disaster recovery test programs without impacting the running production systems.

The customer had used this process for many, many years. They were happy with it and the flexibility it gave them. They wanted to continue using this process for the production SSP based systems.

The customer wanted the ability to recover a (storage) replicated copy of their entire shared storage pool at their disaster recovery (DR) site. We discussed the use of SSP failure group mirroring to cater for their DR requirements of their production AIX partitions. The following diagram shows our initial proposal of SSP mirroring across the production and DR sites.

However, this did not meet their needs. They needed a way to recover their AIX production partitions in isolation to their real production environment. SSP mirroring would not provide them with this flexibility. Essentially the customer could not start a “copy” of any or all production partitions at the DR site. In a SSP failure group setup, the SSP spans both the production and DR sites (as shown in the previous diagram). Meaning the SSP is in production across both sites. SSP mirroring would protect the customer in the event that the production site was to fail; they could start their production systems at the DR site, however they could not start isolated copies of the production systems in the same SSP. This meant SSP mirroring would break their current DR processes and provide them with reduced capability when compared to their traditional DR method.

Prior to SSP mirroring, one of the major drawbacks of SSPs was the lack of resilience at the storage layer. We were not able to mirror the disks in the pool from one storage subsystem to another (either local or remote). Having a single disk storage subsystem in a shared storage pool was a rather large single point of failure (SPOF). Starting with VIOS 2.2.3.0, we were given the capability to mirror the disks in a SSP from one storage system to another. All of the mirroring was configured and managed from the VIOS in the SSP cluster. The VIO client partitions are not aware that their disks are ‘mirror protected’ in the SSP. There are no changes required in the client partitions as the SSP mirroring feature is completely transparent to them. This feature enabled SSPs to be considered ready for production use.

Another consideration we discussed with the customer regarding SSP mirroring is that the disks in the SSP failure group need to be shared across all the SSP nodes at both the production and DR sites. This is not viable where there are long distances between sites (more than 100KM apart). SSP mirroring uses synchronous writes, which means that any write at the production site must also be written synchronously to the failure group disks at the DR site. This introduces a delay in completing I/O, which has an impact on performance. Fortunately the customer's two sites were far less than 100KM apart, so this consideration did not apply to them, but I mention it simply to make others aware of this point.

After much discussion with our SSP development team at IBM, we were able to provide the customer with an acceptable method of recovering their SSP environment at their DR site using a replica of the storage. What follows is a preview of this new recovery mechanism. This feature is not currently available as part of the standard SSP capability with VIOS 2.2.3.X. It is hoped that we will see this feature officially offered and supported at some stage in 2015. Of course, this is not an announcement and plans are always subject to change.

The following diagram provides a pictorial view of what the customer needed for DR with their SSP systems. As you can see in the diagram, there is a production site and a DR site. At the DR site there is a single SSP (with two VIOS in the cluster). The production VIOS and the DR VIOS are in two separate shared storage pools.

The SSP development team provided a special ifix for the customer to install on their DR virtual I/O servers (VIOS). They also provided a script and procedure for the customer to follow. We successfully tested this procedure at the customer's DR site. This was a huge success from the customer's perspective as they were able to recover their entire production SSP in a few minutes. They could then selectively start/stop clones of the production AIX partitions at their DR site, without impacting the “real” production partitions at the production site. They were also able to continue using their existing storage replication method (as they had always done for all of their systems) in a DR situation. Not having to dramatically change their existing DR process and procedures was very important to the customer.

I'll walk through the procedure now and share some insights (from my lab tests) as we go. It goes without saying that, before we started, the customer ensured that all the production SSP disks (LUNs) were being replicated to their DR site using their standard storage replication facility. The SSP repository disk was not replicated to the DR site. A new disk was used as the repos disk at the DR site.

The process (described in detail below) consists of a) taking a backup of the production SSP cluster configuration, b) running a “customise” script to modify the saved backup file to replace the production VIOS and hdisk names with the DR VIOS and hdisk names, c) restoring the modified VIOS backup on the DR VIOS at the DR site and finally d) verifying that the cluster has been recovered using the replicated storage at DR.

--

The first step was to take a backup of the production SSP VIOS cluster using the viosbr command. At the primary (production) site we used viosbr to create a backup of the cluster configuration. This generated a backup file named /home/padmin/cfgbackups/sspbackup.mycluster.tar.gz.

$ viosbr -backup -clustername mycluster -file sspbackup

The viosbr backup file was then transferred to one of the DR VIOS. On the selected DR VIOS we copy the backup file (sspbackup.mycluster.tar.gz) to the /home/padmin/backups directory.

We create a new file called nodelist in the backups directory. This file contains a list of the DR VIOS hostnames. These hostnames are used to create the replicated cluster at the remote site.

sspnode1.abc.com
sspnode2.abc.com

Also in the backups directory, we create another new file called disklist. This file contains a list of each of the replicated SSP disks with their unique_ids. These disks are the replicated copies of the SSP pool disks from the primary site. All these disks must be accessible on all DR VIOS nodes.
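The unique_id of each candidate disk can be read on the DR VIOS with something like the following (hdisk14 is an illustrative name):

$ lsdev -dev hdisk14 -attr unique_id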

The next step is to copy the necessary ifix file to the backups directory on the selected DR VIOS. We untar this file, which extracts two files. One file is the customize script and the other is the actual ifix we need to install on both VIOS.

./customize_ssp_backup
./ifix/sspDR_1.140627.epkg.Z

We install the ifix on all the DR VIOS which are going to be part of the SSP setup.

$ updateios -install -dev /home/padmin/backups/ifix -accept

We run the script to customize the backup file for the new setup at the DR site.
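The customised backup is then restored on the DR VIOS. The exact invocation is dictated by the supplied procedure, but it follows the standard viosbr cluster-restore form, something like this (hdisk1 here simply stands in for the new repository disk at the DR site):

$ viosbr -restore -clustername mycluster -file /home/padmin/backups/sspbackup.mycluster.tar.gz -repopvs hdisk1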

We can now verify the new cluster has been configured and recovered correctly. We use the cluster and lu commands to check the cluster is up and that the logical units are available in the shared storage pool.

$ cluster -status -clustername mycluster

$ lu -list

The SSP is now ready for use at the DR site.

--

Before we tested this procedure in the customer’s environment, I tried it in my lab first. My lab had the following configuration:

1 x POWER7 750 with a single VIOS and a Shared Storage Pool. I considered this my “production” SSP, which I wanted to recover on another POWER system in my lab.

1 x POWER7 PS701 Blade with a single VIOS. No SSP. I treated this system as my “DR” environment on which I wanted to recover a complete copy of my “production” SSP.

My “production” SSP was configured with V7000 storage. I asked my storage administrator to create a copy (via flashcopy) of the nominated SSP disks from my 750 VIOS cluster. He then presented these new (flashcopied) LUNs to the “DR” VIOS on the PS701.

I then ran through the recovery steps (as outlined above). I started by taking a backup of the VIOS on the 750 using the viosbr command.

750vio1 (“production”)

-------------------------------

$ viosbr -backup -clustername VIOCLUSTER -file sspbackup

Backup of this node (750vio1) successful

I discovered that an SSP was already configured on my blade VIOS. I would need to delete the SSP cluster before I could recover the “production” cluster backup on this system. I obtained a list of the current disks in the SSP.

ps701-vios (“DR”)

-----------------------

$ pv -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

FG_NAME: Default

PV_NAME SIZE(MB) STATE UDID

hdisk2 51200 ONLINE 3321360050768019E027CC000000000000~

hdisk3 51200 ONLINE 3321360050768019E027CC000000000000~

I also took note of the existing repository disk.

$ lspv | grep caa

hdisk1 000a366a031ffa37 caavg_private active

I removed any existing client VSCSI mappings for the SSP disks so that I could delete the logical units from the SSP and delete the cluster.
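In my lab this clean-up amounted to commands along these lines (the virtual target device and LU names are illustrative):

$ rmvdev -vtd vtscsi0

$ lu -remove -clustername VIOSSP -lu testlu1

$ cluster -delete -clustername VIOSSP

The messages below were produced as the cluster was removed: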

If another cluster is using this disk, that cluster will be destroyed.

Are you sure? (y/[n]) y

WARNING: Force continue.

ERROR: import caavg_private failed.

remove_cluster_repository: Force continue.

ERROR: Cannot varyonvg caavg_private. It does not exist.

ERROR: rc=1, remove caavg_private failed.

remove_cluster_repository: Force continue.

rmcluster: Successfully removed hdisk1.

I entered the oem_setup_env shell and ran cfgmgr to bring in the new (flashed) disks on the “DR” VIOS. Two new disks were configured. I would use these disks with the SSP recovery script and disklist file.

$ oem_setup_env

# cfgmgr

#

# lspv

hdisk0 000a366a68640e58 rootvg active

hdisk1 000a366a031ffa37 None

hdisk2 none None

hdisk3 none None

hdisk4 none None

hdisk5 none None

hdisk6 none None

hdisk7 none None

hdisk8 none None

hdisk9 none None

hdisk10 none None

hdisk11 000a366a6b0d0d56 None

hdisk12 000a366a6b0d0d56 None

hdisk13 000a366a6b0d0d56 None

hdisk14 none None <<<<<<<<

hdisk15 00f603cde9a7b15a None <<<<<<<<

#

I took note of the hdisks and the unique_ids and placed this information in a new file named disklist.

After the successful restore, I was able to display the cluster status and list the logical units in the recovered SSP.

$ cluster -list

CLUSTER_NAME: VIOCLUSTER

CLUSTER_ID: 9fa498e4896611e38d72f603cdcd9c55

$ lu -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

LU_NAME SIZE(MB) UNUSED(MB) UDID

sspdisk1_750lpar11 10240 7251 4f42b83c0ed7e826a4784b204b1c81ea

SNAPSHOTS

sspdisk1_750lpar11_snap1

$ cluster -status -clustername VIOCLUSTER

Cluster Name State

VIOCLUSTER OK

Node Name MTM Partition Num State Pool State

ps701-vios 8406-71Y0310A366A 1 OK OK

$

As expected, both of the “flashed” disks were now assigned to the recovered disk pool.

$ pv -list

POOL_NAME: VIOSSP

TIER_NAME: SYSTEM

FG_NAME: Default

PV_NAME SIZE(MB) STATE UDID

hdisk15 51200 ONLINE 3321360050768019E027CC000000000000~

hdisk14 51200 ONLINE 3321360050768019E027CC000000000000~

The existing hdisk1 had been successfully re-used for the repository disk of the “new” cluster.

$ lspv | grep caa

hdisk1 000a366a031ffa37 caavg_private active

$

The VIOS error report on the “DR” VIOS also confirmed that the SSP cluster was recovered and the services started correctly.

$ errlog

IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION

228C1EBE 0707110114 I S POOL Cluster state change has occurred

228C1EBE 0707110014 I S POOL Cluster state change has occurred

EDFF8E9B 0707110014 I O StorageRM IBM.StorageRM daemon has started.

3B16518D 0707110014 I S ConfigRM The node is online in the domain indicat

4BDDFBCC 0707110014 I S ConfigRM The operational quorum state of the acti

AFA89905 0707110014 I O cthags Group Services daemon started

$

At this point I was able to map the LU to my VIO client partition and boot it on the PS701. AIX started without an issue and I now had a complete clone of my existing AIX partition (from the 750) running on my POWER blade.
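The mapping itself uses the standard shared storage pool commands; a sketch of the sort of invocation involved is shown below (vhost0 is an illustrative server adapter name; the cluster, pool and LU names are from the listings above):

$ mkbdsp -clustername VIOCLUSTER -sp VIOSSP -bd sspdisk1_750lpar11 -vadapter vhost0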

Shared storage pools already provide some fantastic capabilities that are attractive to most POWER customers. Features such as SSP mirroring, snapshots and linked-clones provide some very powerful and flexible configurations that will accelerate virtual machine deployment and management on IBM Power Systems.

I believe the enhancement I’ve discussed here will prove very popular with many of our enterprise AIX and POWER customers. It will provide one more compelling reason for customers to consider shared storage pools in their environments, particularly for disaster recovery situations. There may even be the capability (in the future) to completely automate the entire procedure that I’ve outlined in this article. I hope that I can provide further updates on this capability in early 2015.