xx_perf_watcher for monitoring waits during short time intervals

A while ago I worked on a performance troubleshooting case where frequent short time degradation of IO performance on NetApp storage was suspected to be the root cause. The problem was to get some proof as looking at the averages of IO service time was not alarming enough. I decided to write a tool that could be used to monitor wait times for any DB wait event in short intervals, e.g. so I could get measures for db file sequential/scattered read performance each second

This is what I thought it needed to be capable of:

It should be lightweight and easy to set up;

the results have to be visible real-time;

it should be possible to spool results to file;

it should allow monitoring any wait event;

it should allow to define interval length between measurement points.

In the end I came up with the solution - a pipelined function, with interval size and name of wait event as parameters, that “queries” the performance metrics using a simple select statement, making it possible to spool the results into a file and seeing results real-time. You can take a look at it in the video below. Continue reading if you’re interested in seeing the source code and reading some explanations on key implementation tricks that made this possible.

OK, here are some details on how the tool is written:

It’s a pipelined function that returns SYSTEM.PA_VARCHAR2_240_TBL_TYPE type result, I chose this type as it was available on 10g and 11g and this way I didn’t need to create my own data types. Function is pipelined therefore it allows constructing a SELECT statement to return data from function that passes the results via PIPE ROW statement. Choosing this construction allows to use query syntax to get the results, note setting arraysize to 1 is really important to see the real-time results, SQL*Plus is fetching rows in arraysize sets and not displaying them until the whole set is received.

(Updated on 5-Apr-2013): Looks like SYSTEM.PA_VARCHAR2_240_TBL_TYPE was not a good type to use as it’s specific to e-Business Suite database. Iče adjusted the code below to create the required types just to get the tool running for now. I’ll also revise the blog post in near future to validate it. I just tested the new version on 11.2.0.3 by creating all objects as SYS and it worked nicely.

I’ve also created a function XX_PERF_WATCHER.SLEEP on top of DBMS_LOCK.SLEEP, so that it could be included in the same NESTED LOOP sampling technique to set the required sampling interval.

The package works on 10g and 11g (I’ve tested it on 10.2.0.4, 10.2.0.5, 11.1.0.7, 11.2.0.1 and 11.2.0.2), but as the definition of X$KSLEI differs on 10g and 11g, I had to use compiler directives $IF, $ELSIF, $ELSE, $END to separate 10g code from 11g code.

As the function samples X$ table, I installed the package using SYS, unfortunately I didn’t manage to figure out how to get similar functionality without need of creating any DB objects, so if any of you know some ways, please let me know!

And finally here is the full code of the pl/sql package (click on the “show source” link), just remember, try it out in the test environment if you decide to give it a go!