nmon - new online Physical CPU Graphs arrive for latest AIX 6.1

I have just upgraded my EMEA Power Systems Advanced Technical Support internal Wiki Apache Web-server to the very latest AIX level, which is AIX 6.1 Technology Level 6 Service Pack 5 or AIX6.1TL6sp5 for short!

Then I noticed that the nmon at this AIX level has been updated - I worked on the code for the internal prototype of this new feature some time ago and it has arrived in the official topas/nmon code at this release - COOL. The latest AIX 7.1 service pack has this new option too.

The problem: As the number of CPUs goes up with each generation of Power processor, it gets hard to monitor all the CPUs. nmon displays logical CPUs, so you get four for each physical CPU (core) when you have SMT=4 switched on. For the mighty Power 795 with all 256 CPUs in one partition (unlikely but this is just for example) - you get 1024 lines of output - I don't know anyone with a screen big enough. And I do hear about customers with 128 CPU LPARs.

What we need on super large and also smaller partitions is:

Auto-Scaling so that we can see the tiny peaks when not busy and still monitor large peaks when the uncapped partition goes well over Entitlement.

We want a slightly longer-term view with the last 70 snapshots on the screen so we can see the peakiness of the work. This is better than dozens of rapidly moving individual CPU graphs and lots of numbers.

To minimise the screen updates the plotted graphs are not moved - just the update point.

Also the average seen on the screen would be useful.

Know the Entitlement, so we know when the actual current Physical CPU used is going over it.

A nice peaceful green colour!
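The wish list above amounts to a fixed-width history with a moving update point. Here is a minimal Python sketch of that idea, not the actual topas/nmon code (which is written in C). The class name `CpuHistory`, its methods and the default Entitlement of 1.0 are all made up for illustration; only the 70-column width comes from the text.

```python
class CpuHistory:
    """Fixed-width history of Physical CPU samples (hypothetical sketch)."""

    def __init__(self, width=70, entitlement=1.0):
        self.width = width              # number of snapshots kept on screen
        self.entitlement = entitlement  # assumed value, set per partition
        self.samples = [None] * width   # None = column not filled yet
        self.cursor = 0                 # the update point: next column to write

    def add(self, physical_cpu):
        # Only the update point moves; older columns stay where they are.
        self.samples[self.cursor] = physical_cpu
        self.cursor = (self.cursor + 1) % self.width  # wrap to the left edge

    def average(self):
        # Average of the data points currently shown on the screen.
        filled = [s for s in self.samples if s is not None]
        return sum(filled) / len(filled) if filled else 0.0

    def over_entitlement(self):
        # True when the most recent sample is above the Entitlement.
        last = self.samples[(self.cursor - 1) % self.width]
        return last is not None and last > self.entitlement
```

Because only one column changes per update, a screen redraw needs to touch just the old and new update-point columns.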

So with this new level of AIX, you get a new Physical CPU monitor graph which you switch on with l# (that is lowercase L and hash). See below for a period when the CPU is running between zero and 1 physical CPU used and the scale matches that. The screen is updated every 2 seconds (the default; hit minus or plus to halve or double that) so we are looking at 70 x 2 = 140 seconds = a little over the last two minutes:
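The window arithmetic is easy to check for other refresh rates. This is a small Python sketch; the function names are hypothetical, and only the 70 columns, the 2 second default and the halve/double behaviour come from the text.

```python
COLUMNS = 70  # snapshots shown across the graph


def window_seconds(interval=2.0, columns=COLUMNS):
    """Seconds of history visible on screen for a given refresh interval."""
    return columns * interval


def halve(interval):
    """Refresh interval after hitting minus."""
    return interval / 2


def double(interval):
    """Refresh interval after hitting plus."""
    return interval * 2
```

With the default, `window_seconds(2.0)` gives 140 seconds. After one press of plus the screen covers 280 seconds, and after one press of minus it covers 70.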

What if the peaks are all small and in the bottom 20% of the screen?

Below, you can see that in a quieter period the left hand scale has been changed to zero to 0.5 CPUs - when it makes a scale change it flashes up "RESCALING" at the bottom:

And what about when the peaks get very very small - less than a single character high, as above?

Then below, when the machine is really totally idle, it is rescaled to zero to 0.62 of a CPU, so you still get to see the tiny peaks:

What happens when the partition gets very busy and over Entitlement?

Here below, you see the peak in work on my Web-server and the scale is zero to 1 CPU. If the peaks get low as the white bar moves across and cycles round, then the graph is rescaled and flashes "RESCALING" at the bottom as it does it. The same happens if the peaks go over the top of the graph.
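One way to think about the rescaling is as a simple rule applied to the highest peak on screen. The Python sketch below is an assumption for illustration and not the rule nmon really uses: it doubles the scale while the peak is over the top and halves it while the peak sits in the bottom 20%. The names `choose_scale`, `MIN_SCALE` and `low_fraction` are hypothetical.

```python
MIN_SCALE = 0.01  # hypothetical floor so a totally idle graph stops shrinking


def choose_scale(peak, current_scale, low_fraction=0.2):
    """Return (new_scale, rescaled) for the highest peak currently on screen."""
    new_scale = current_scale
    # Peak is over the top of the graph: grow until it fits.
    while peak > new_scale:
        new_scale *= 2
    # Peak is stuck in the bottom of the graph: shrink so it becomes visible.
    while new_scale > MIN_SCALE and peak < new_scale * low_fraction:
        new_scale /= 2
    # rescaled is the cue to flash "RESCALING" at the bottom of the screen.
    return new_scale, new_scale != current_scale
```

With these assumed thresholds the two loops cannot fight each other: after growing, the peak is above half of the new scale, and after shrinking it stays below 40% of it.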

Advanced features

1. If you hit "0" (zero) - the graph is reset to the front (left most column) and the graph will start again, rescaling until it's correct.
2. If left running, the update line (white "|" line) moves to the right edge and then wraps to the left and starts overwriting the oldest data.
3. Note the plus ("+") on the update line - this shows the average of the data points shown.