A More Complex Statspack Example - Part 1


Following on from the last Statspack example, up popped an example at work this week of another common reason I use Statspack - comparing the performance of different environments. It's also a nice illustration of some of Statspack's limitations.

Because a Statspack report contains a lot of information and this particular exercise threw up a few common problems, I'm going to break it up into several postings. In fact, my overall analysis was prefixed with the following.

"I'm sorry this reply is so long and sprawling and doesn't contain any easy answers."

Mmmm, quite.

When you're a DBA for a large organisation, it's only a matter of time before one of your software suppliers claims that the performance they're getting in their test environment is substantially better than in your test environment. The initial stages of this discussion usually involve sending them lists of init.ora parameters (this is *very* important), lists of indexes, export dumps of your schema and so it goes on.

Eventually, though, the only sensible solution is to analyze the performance of the same job in the two different environments. My preference here is to use event 10046 or DBMS_MONITOR/DBMS_SESSION to trace the actions and wait events (in fact, there's an example from the same application here) but that can be a little tricky if the application isn't instrumented and uses connection pools. Sometimes it's difficult to narrow down which session you should be tracing. Without wanting to be too disparaging, I also need to think about how experienced the vendor's technical people are with tracing. Bitter experience leans towards 'not very'.
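For reference, this is roughly what the tracing side looks like; it's a minimal sketch rather than the exact commands, and note that DBMS_MONITOR only arrived in 10g, so on a 9.2 system you're restricted to the ALTER SESSION approach (or something like DBMS_SYSTEM.SET_EV for another session).

-- Trace your own session, including wait events and bind values (level 12)
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the statements or job steps of interest ...

ALTER SESSION SET EVENTS '10046 trace name context off';

-- From 10g onwards you can trace someone else's session instead; the
-- sid and serial# below are placeholders taken from V$SESSION
BEGIN
   DBMS_MONITOR.session_trace_enable(
      session_id => 123, serial_num => 456,
      waits => TRUE, binds => FALSE);
END;
/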

In those cases Statspack can be useful, particularly if we're using test environments and we can guarantee that the only job running is the one we're interested in. This is what Mogens Norgaard means when he says :-

"The one exception is when only one thing was running ... then the
summary becomes just a summary of that one form, report or whatever."

in this blog which focusses on the limitations of Statspack. That's what I suggested to the vendor - we would both enable Statspack snapshot collection, run the same job with a before and after snapshot and compare the resulting reports.
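In practice that boils down to something like the following on each system (a sketch that assumes Statspack is already installed under the usual PERFSTAT schema):

-- As the Statspack owner, take a snapshot immediately before the job starts
CONNECT perfstat
EXECUTE statspack.snap;

-- ... run the job to completion ...

-- Take the closing snapshot
EXECUTE statspack.snap;

-- Generate the report, supplying the begin and end snap ids when prompted
@?/rdbms/admin/spreport.sql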

Anyway, let's start delving into these reports. I usually open the pair in different windows and work my way through them, flicking back and forward between the two. Most of the information in this blog will be obfuscated, but I've kept the important details intact.

The first thing I noticed about the reports was that we're running Oracle 9.2.0.7 and the vendor is running 9.2.0.1. That difference might turn out to be minor, but I've hit enough 9.2.0.x bugs to know that it might not be. Having looked through both reports completely, I don't think it's the cause of the performance difference in this case but I've suggested they might want to upgrade to a later version.

Here's the next section, which shows the report period. At first I was a little confused by this because we have 15-minute snapshots, so I used the snapshot taken just before the job started and the nearest one after it finished, whereas the vendor's report covered a much longer period. On our system the job took about 12 hours.

The reason for the anomaly was that they had left the job running overnight and had only taken their closing snapshot the next morning, hours after it had finished. However, as their system was running nothing after the job completed, I could make allowances for this. I would have preferred them to have a report covering the exact period of the job, though.

In fact, you might think at this point that I've broken one of the cardinal rules of sensible Statspack use. A 12 hour report?!? What good could that be? Think of the detail hidden in those 12 hours! Well, it's not as bad as it sounds, because the job cycles round a few different sections multiple times and, even though it's a very long business task, it is a single business task and I want to compare the entire task running on two different environments. If I do want to drill down to a narrower period, I can use the multiple 15-minute snapshots.
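Drilling down is then just a question of picking a pair of snap ids that bracket the period I'm interested in, for example (again assuming the standard PERFSTAT schema):

-- List the available snapshots and when they were taken
SELECT snap_id, snap_time
  FROM perfstat.stats$snapshot
 ORDER BY snap_id;

-- Then re-run the report with a narrower pair of begin/end snap ids
@?/rdbms/admin/spreport.sql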

Next the Cache Sizes. This section is one of a few (including the list of non-default parameters at the end of the report) that don't help me solve any performance problem, but offer a quick sanity check of the two environments.
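Of course, the same sanity check is easy enough to run directly against each database; something along these lines (a rough sketch, and the relevant parameter names will depend on whether db_cache_size or the older db_block_buffers is in use):

-- Quick comparison of the main memory-related parameters on each system
SELECT name, value
  FROM v$parameter
 WHERE name IN ('db_cache_size', 'db_block_buffers',
                'shared_pool_size', 'log_buffer', 'sga_max_size');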

Why are our caches so much bigger than the vendor's? Because we've been working on this problem for a little while now and the vendors just keep asking us to increase cache sizes in the hope that everything will magically perform better. If in doubt, bump everything up a bit! (I'll leave it to the reader to guess how successful this has proved.)

This is the stage where I started to have doubts. Look at the difference in redo size per second and per transaction.

Now that I'm writing down how I tend to use Statspack, I realise that redo size is a figure I focus on because I think it gives a pretty good indication of workload. If both these databases are running the same job, why the difference, particularly the massive difference in redo size per transaction? (In fact, it looks like our system is doing about 50% more work than the vendor's when you look at the various load profile values.)
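For what it's worth, the per-transaction figures in the Load Profile are just the change in the underlying system statistic between the two snapshots divided by the number of transactions (user commits plus user rollbacks) over the same period, so it's easy enough to sanity-check the raw numbers yourself; a rough sketch against V$SYSSTAT:

-- Cumulative totals since instance startup; the difference between two
-- samples gives the same per-second and per-transaction figures that
-- appear in the Load Profile
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('redo size', 'user commits', 'user rollbacks');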

A commenter suggested using an RDA report for this kind of comparison instead. Interesting idea and I confess I'd never thought of that. I think the reason is that, given most external vendors struggle with anything more complex than a list of init.ora parameters, they would just be overwhelmed by an RDA report!

Seriously, it takes them a long time to swallow even a well-explained Statspack analysis.

The other thing is that it's very little effort to give them what they want with a brief comment that I don't think it will help them and why.
