About this blog

The AIXpert Blog is about the AIX operating system from IBM running on POWER-based machines called Power Systems, and the software related to it, such as PowerVM for virtualisation, PowerVC for deploying VMs and PowerSC for security, plus performance monitoring and nmon.

Shared Storage Pool thin provisioning is pretty cool and saves a lot of disk space. Effectively the 100's of GBs of unused disk space in 100's of Virtual Machines (LPARs) are brought together in one place in the Shared Storage Pool and can then be used for real. All this without the use of clever disk subsystems. The risk is that you use all the disk space of the pool, and the next VM that tries to write to a new chunk of disk (a chunk is 64MB) fails to get it and gets disk errors. Thus monitoring the free space and getting alerts is very important. But the alerts may not work as you expected. The "alert" command sets the threshold simply enough but then it gets complicated:
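To make the risk concrete, here is a toy model of thin provisioning in Python. It is only a sketch of the idea described above, not how the Shared Storage Pool is actually implemented: the class and method names are made up for illustration, and the only figure taken from the text is the 64MB chunk size.

```python
CHUNK_MB = 64  # chunk size quoted in the text


class PoolFullError(Exception):
    """Raised when the pool has no free chunk left to hand out."""


class ThinPool:
    """Toy model of a thin-provisioned pool (hypothetical, for illustration)."""

    def __init__(self, pool_size_mb):
        self.total_chunks = pool_size_mb // CHUNK_MB
        # (disk_name, chunk_index) pairs that are backed by real pool space
        self.allocated = set()

    def free_chunks(self):
        return self.total_chunks - len(self.allocated)

    def write(self, disk_name, offset_mb):
        """Back the chunk containing offset_mb with real space on first write."""
        key = (disk_name, offset_mb // CHUNK_MB)
        if key in self.allocated:
            return  # chunk already backed, no new pool space needed
        if self.free_chunks() == 0:
            # This is the failure the VM would see as disk errors.
            raise PoolFullError(f"no free chunk for {disk_name} at {offset_mb} MB")
        self.allocated.add(key)
```

Writes to chunks that are already backed keep working when the pool is full; it is only the first write to a new chunk that fails, which is why the problem can appear suddenly in a VM that has been running happily for weeks.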

First, we found a bug, for which there is now a fix that I have tested (see below for more information on the bug).

Second, we have the Virtual I/O Servers in a cluster and the one reporting the Alert could be any of them. There is a very sound reason for this but to the VIOS administrator it looks effectively chosen at random.

Third, the message is in the VIOS error log, which you access via errlog. Its output flies off the screen unless you use "errlog | more" - how very DOS like! errlog is very similar to the AIX errpt command. Then you have to use "errlog -ls | more" to find the details.

Fourth, it is not clear that it is an alert for going over the threshold. Neither the current threshold percentage nor the current free space percentage is in the message!

Note: There are other log entries that are very similar but have "Storage Pool Up Event" and don't say "Threshold Exceeded." They are NOT free space low alerts.
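If you capture the detailed log output to a file, picking out the real low free space alerts can be scripted. The sketch below is Python and rests on an assumption: that the detailed listing separates entries with a line made only of dashes, as AIX errpt detailed output does. The function name is made up for illustration, and the only strings taken from the text are "Threshold Exceeded" and "Storage Pool Up Event".

```python
import re


def threshold_entries(errlog_text):
    """Return the log entries whose text mentions 'Threshold Exceeded'.

    Assumption: entries are separated by a line consisting only of dashes.
    Entries such as 'Storage Pool Up Event' are dropped because they do
    not contain the phrase.
    """
    entries = re.split(r"(?m)^-{10,}[ \t]*$", errlog_text)
    return [entry.strip() for entry in entries if "Threshold Exceeded" in entry]
```

Run it over the saved output from each VIOS in the cluster, since any of them may be the one holding the entry.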

I keep thinking of those jokes about a black cat, at night, in a dark coal bunker! I am sure the developers have not tried to hide the alert messages but that is what we have got - high impact warning messages well hidden in the cluster.

Making the low free space issue more visible

So even if you know the free space has just gone below the limit, it is a lot of work finding the alert - not what I would call proactively letting you know there is a problem that needs fixing urgently. Various ideas are covered in the movie; for example, some people escalate the VIOS system logs to other tools and could find the error condition there.

We could have cron-based scripts sending Shared Storage Pool free space stats by email, either regularly or only when the free space is low.
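The "only when the free space is low" variant might look something like the Python sketch below. It only builds the message; getting the free space percentage and actually sending the mail (for example with smtplib) are left out. The function name, parameters and message wording are all hypothetical, and it assumes the threshold is expressed as a free space percentage.

```python
from email.message import EmailMessage


def build_low_space_mail(pool_name, free_pct, threshold_pct, sender, recipient):
    """Return an EmailMessage if free space is below the threshold, else None.

    Assumption: both percentages describe FREE space in the pool.
    """
    if free_pct >= threshold_pct:
        return None  # nothing to report, so a cron job stays quiet
    msg = EmailMessage()
    msg["Subject"] = (
        f"SSP {pool_name}: free space {free_pct:.1f}% "
        f"is below {threshold_pct:.1f}%"
    )
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Shared Storage Pool {pool_name} has {free_pct:.1f}% free space, "
        f"below the {threshold_pct:.1f}% limit."
    )
    return msg
```

Unlike the log entry, the message carries both the current free space and the threshold, so whoever receives it can see at a glance how urgent it is.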

One alternative was using Systems Director 6.3 (ISD) - which I tried. I discovered all four VIOS, gained access and ran Inventory - this only took a few minutes. Then I set the free space threshold percentage just below the current use and started the client VMs writing to new large files. Before I could change to the Systems Director browser to check, it had already detected the alert event and reported it on the Problem panel. See below:

Alternatively we could use "cron" to regularly make checks and escalate email messages.

The movie also covers a script or two to reformat the "lssp" and "alert" command output to calculate percentages and amount of over-commit.
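The arithmetic behind such a script is simple. The Python sketch below is not the script from the movie; it takes plain numbers rather than parsing real command output, and the function and field names are made up for illustration. It also assumes that over-commit means the total size of the virtual disks handed out to VMs minus the real size of the pool.

```python
def pool_summary(pool_size_mb, free_mb, provisioned_mb):
    """Return used/free percentages and over-commit for a pool.

    Assumption: over-commit = space promised to VMs minus real pool size,
    reported as 0 when the pool is not over-committed.
    """
    if pool_size_mb <= 0:
        raise ValueError("pool_size_mb must be positive")
    used_mb = pool_size_mb - free_mb
    return {
        "used_pct": 100.0 * used_mb / pool_size_mb,
        "free_pct": 100.0 * free_mb / pool_size_mb,
        "overcommit_mb": max(0, provisioned_mb - pool_size_mb),
    }
```

For example, a 1000MB pool with 250MB free and 1500MB of virtual disks handed out is 75% used, 25% free and over-committed by 500MB.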

The developers found the bug that caused the alerts not to be captured pretty quickly, and then worked up an efix (emergency fix) for us.

We then installed this on all VIOS nodes and checked the fix.

It worked first time and every time.

This will get released at some point - customers need to raise a PMR to get access to this fix.

One important point to remember is to install VIOS fixes with the updateios command - NOT the AIX emgr command (like I did, oops!).

I don't know the official number or release mechanism but the file I used included a number: 823942

If you or IBM Support have trouble identifying the fix ask them to contact me: Nigel Griffiths IBM UK.

When I get more information, I will add it here.

The next movie may be for SSP2 snapshots and perhaps Live Partition Mobility. I would like to encourage lots of people to try out Shared Storage Pools - I know the technology was not developed to help out poor hard working Power Systems techies like myself, but it is so simple to use, quick compared to mucking around with LUNs and Zones, and very flexible too, as it opens up LPM to every LPAR I will create in 2012. Saves me time, saves disk space, makes for a flexible VM environment - it is a win:win:win.