Oracle’s greatest performance feature July 14, 2009

With oracle 10 onwards, you get what I feel is the greatest single step forward for general oracle database performance since…Ohhhh, I give up. I think it is more significant than anything else I’ve seen and I started with Oracle 6.

What is it? It’s the automated statistics gathering job. If you don’t know what this is, I’ll give you a brief summary in a few moments.
If you don’t agree with me, I’d love to know what you would nominate as the best single performance improvement since Oracle 6. {Or even before, for those of you even greyer and more world-weary than I}. This is after all, just an opinion and I’m open to changing my mind.

Now for that promised and brief {fairly brief} description of the automated stats gathering and why I love it {but please do not interpret this as an attempt to stop you telling me you alternatives, I really would like to know}.

By default, every work-day night, an Oracle database that is 10g or higher will spend from 10pm collecting table, index and column stats for you. This stats gathering job will keep going until it has collected all that it thinks it needs or the clock hits 6am, at which point it stops. At the weekend, your busy little databases will work even harder, from midnight Saturday {ie 1 second after Friday ended} to the end of Sunday gathering stats, if need be. Again, the job will stop once it has gathered the stats it thinks your system needs.

The job gathers stats on tables/indexes in all schemas, including SYSTEM, SYS and those other odd internal ones like DBSNMP and WMSYS, as well as all your own schemas. It gathers stats only on tables that have changed by 10% or more since last stats were gathered, or that have been truncated or created since the last run. For each such table, this job will sample a proportion of said table that it thinks is needed to give reliable stats. For each table it also gathers stats on each index {I’ll skim over a slight issue in respect of sample size for indexes}. When a table has it’s stats gathered, Oracle will even make a stab at gathering the correct level a column statistics for each column, based on whether you ever use the column in joins or where clause {ie there is some sense to collecting detailed histogram stats only for those columns it would help}.

Finally in my brief description, and something that not all DBAs and developers appreciate, is that this automatic job stores the stats as they were before it gathered new ones, so you can go back to previous stats if you so wish {and it was not more than a month ago}.

This process, this nightly job, has flaws. Some of it’s decisions can be poor. It can gather stats you might not want it it and it can mess up. BUT! It does run regularly and it does gather generally beneficial stats for all tables, indexes, partitions, sub-partitions, columns. All of them. {mostly} .

Prior to this automatic job, many sites’ object stats were in very poor health.

Some sites did not gather table/index stats at all.

Many sites gathered stats only occasionally.

Some sites would gather stats on a few tables and none on most.

This last situation, stats on some tables none on many others, condemned the cost based optimiser to having to make decisions based on a very poor information and was/is very common under 9. I like to think of it as taking a map of the UK and removing all the roads and then trying to drive from London to Manchester. So long as you keep going generally North and west, you will get there. Eventually.

{I have a private theory that the number of hints in code is in proportion to the three situations above, the most being on systems with “stats on only a few tables”.}

This is why I think that job is the biggest step forward for performance. It ensures there is at least some information for every table and every index and that for most of them the data is correct to within 10%. Thus it allows the CBO to work with pretty accurate information and all that clever maths has a good chance of working.

Yes, if you know more about stats and performance and your system, you can do better than the automated job alone, but for the majority of sites it is a step forward. Especially for those sites that lack strong DBA/Developer expertise. Ie, the majority :-).

I know from talking to some Oracle support people that their lives have got quieter. I’m told that they get a lot fewer calls about SQL performance from Oracle 10 and 11 systems. Mind you, the usual “fix” of getting the customer to just gather some stats has gone out the window, which is a shame as it fixed most issues. All in all, it has been a massive boon to them.

So, I think it is the greatest step forward.

I also hate it. I hate that automated stats job. It makes silly decisions, it breaks, it is poorly documented and it makes a dog’s dinner of very big, very active or very odd systems. In fact it is rubbish. But very, very, very beneficial rubbish. I love that pile of rubbish.

Like this:

LikeLoading...

Related

I think the only problem with this job is gathering histograms by default which can be evil for 10G especially for databases which were upgraded to 10G. If you are not experienced enough, problems caused by this default histogram gathering option, can make you think CBO is stupid or buggy, because prior to 10g you were always solving problems by manually refreshing stale stats so what is the problem now :))

This job has been one of the major causes of bad performance since we upgraded to 10g. Before, we knew exactly when stats were gathered, how, for how long. where, etc.
Now, for example:
1- In our deelopment dbs that get a cold backup overnight, it’s odds on this thing will still be running when the backup kicks-in. Result? Incomplete stats, no alerting errors.
2- The catch word you used, “mostly”, defines its actions. It is not complete, it does not necessarily gather stats on all necessary objects, nor the correct ones.
3- The last thing I want in a stable system is to introduce an upsetting factor, such as changing the decisions of the CBO through different stats.

Fact is: the CBO is far from perfect. We often have to “help” it either through hints – if we have access to source code, a rarer and rarer proposition – or through a set of stats that makes it behave properly and most importantly: predictably. All out the window with the overnight stats gathering.

Having said that, I also reckon it is indeed a good thing. Provided one knows its limitations and plans accordingly. We have turned it off in just about all our dbs, with few exceptions. But for a very good and specific reason and always replaced by some policy or process.

In the absence of such conscious changes, I’d rather have it going than nothing. Hence why it is such a good idea.

Well – I think Oracle should have introduced the 11g SQL Plan Management way earlier … like with the introduction of the CBO itself !

Like Noons said, before we knew when the stats were gathered … now, you come in the office one morning and all hell breaks lose because an exec plan changed because the stats gathering job decided it was a good idea to put histograms on some tables…

The first thing we do is adjust the job so it doesn’t create any new histograms. (bind variables and histograms just don’t mix well in our situation … and if we have partitions, we disable bind peeking)
And we *try* to monitor exec plan changes … basically doing what Oracle should have done years ago… (oh and we need awr for that so they’ll make you pay for their own mess)

I can’t tell you how wonderful a day is when an exec plan changes on a query that is run by 6000 users every so many minutes ;-)

I know being “negative” isnt productive but:
The “statistics” and “CBO” subjects have eaten up soo much valuable time and brain-capacity of everybody…
And we still end up improving the access-paths with actual real-world knowledge, better indexes, better SQL, or even locked/hinted plans.
(I feel a blog-rant coming… )

I vowed somewhere around 8174 or 9202 not to get sucked into “cbo”: it was too much geeky/tweaky. But I ended up spending (wasting?) a lot of time on cbo-related problems.

The CBO is very advanced, very clever, but sometimes just trying to be too clever (or I’m too simple to see…).
It is also a nice money-spinner for the trouble-shooters, and that sometimes includes me, in the role of fixer/mercenary.

My main, simple message: Avoid Trouble.
CBO and stats are often a solution to a problem that can be avoided by having better, simpler, more clear access paths to data. Read up on Richard Foote, Tom Kyte and others. Think about the access yourself, upfront, rather then fixing it with CBO tweaks after go-live.

The intro of the Timed wait-events And the bright idea of UTLB/ESTAT were the defining point in monitoring and troubleshooting Oracle.
They were the foundation for all further work right up to AWR/ASH and the beloved GC graph (with thx to the TORA tool as well).

But I think we are now creeping off scope a bit. Martin started the post more or less onto CBO. And a brilliant piece of engineering that is, CBO.

Ahh, Noons, Kurt, that rogue SQL statement is what gets your blood moving without the need for caffeine! :-) {and yes, I have had many an interesting morning as a result of them}.

As I said, I hate the stats job, I love the stats job. I kind of figure that if, as the DBA for a system {systems, who has a DBA per system these days}, you wrote some code and procedures for gathering stats before 10, you should use it after you have gone to 10 – or at least intelligently use it in conjunction with the automated stats job. Lock stats {or delete stats and then lock stats} on key tables, copy stats to new partitions, gather stats manually on really tricky tables…

If the system is big, complex or highly active then the automated job is likely not to be good enough and those are the systems “problem solvers” tend to end up working on. For all those average oracle databases out there, I think it saves more pain than it gains…Maybe.

Piet, I’m happy for suggestions outside of the CBO arena, people might think there is some other more significant improvement – and I have to say I might agree with you and Graham on the Wait Events information. A major shift in how we look at performance.

I think in the love/hate equation (and it’s clear that most people have an element of both) I’d come down on the hate side.

The defaults are all wrong – all those damn histograms combined with bind variable peeking – horrible!

The timing is terrible. Yes I know 10pm seems like a good idea for a batch maintenance window – that’s why so many systems already use it! Duh!

Don’t introduce a significant new feature with such little fanfare and without, for example, fixing bind variable peeking *before* introducing shed-loads of new histograms without warning!

Really, this job has probably solved many problems on those small workgroup databases which are in the majority but those databases aren’t the critical ones so performance problems aren’t such a big deal. This job has caused many problems on the critical databases – the ones were problems are a big deal. Yes, I know, if we’re going to upgrade, we should check all of these features. But site after site I have to switch this damn job off left, right and centre because people simply didn’t know it was on and were collecting their own stats in tandem! Chaos. I hit my latest example of this just the other week.

Finally, it gives people an excuse to never think about stats again – not a good idea!

Nah, I hate it. I think the fact that Oracle felt the need to publish a white paper about it shows just how many problems this has caused their customers.

Yes, but look at all the great response I’ve had Doug! All those people being passionate!

Like I keep saying, the job is flawed and it certainly has issues for bigger, more demanding systems. But then, if you have these bigger, more demanding systems you should be looking after your stats as a major consideration of your implementation. Before this job came along, I kept seeing systems and problems where people were saying the CBO was crud {which some bits of it are} but it was because they had garbage stats – if they had collected decent stats they would have had far fewer issues to deal with and, in fact, gathering decent stats fixed many “problems”. The number of times I saw people using “rule” hints to fill in for the fact they had not collected good stats at the right time. Then you asked them what the rules were for the rule based optimizer – and they did not know {not old and cranky enough}. So they were fixing a problem the wrong way with a sticking plaster they did not understand.

So, what IS the best performance improvement of the last decade or so? {I think we have established that automatic stats is not popular with a lot of people}.

No mention for AWR Doug? But you love pictures now?! I think it’s up there as it makes access to other things so much easier and more immediate and managers understand pcitures, so you can SHOW them the problem, SHWO them when it started and SHOW them it’s fixed. They don;t care how it’s fixed of course.

When did Oracle bring in hashing? Later Oracle 7 or 8? I can’t remember.

At the UKOUG do I have to buy the beer as I started the discusion, or do I get free beer? :-)

Hash clusters can be the dog’s proverbials. I worked with some DBAs in Saudi who took care of a Model 204 (IIRC) database which was based almost entirely on hash tables. Stunning performance if you’re looking for single key look-ups. Funny how rarely they’re used in Oracle-land.

1- Function-based indexes. This has saved so much pain in managing extra columns in tables, to me it’s a given.

2- Partitioning. I fell in love with it after Jonathan’s excellent description in his book. First met it in 8.0, where it was kinda clunky but workable. In 11g, it’s getting pretty darn near to ideal. And it has such an impact in performance in this day and age of multi-billion row tables! (besides, I think Doug will love this one as well: close to home, eh?;)

Dang, once again I’m gonna have to pass on meeting some of ya folks upfront. Still stuck down-under, kids/family/dogs/cat/etc stop me from attending all these overseas functions!

Ah well, maybe in another 5 years or so. Meanwhile guys, have a few cold ones for me!

God, partitioning. Why didn’t I think about that? Seems so obvious when you say it.

Interesting that you picked two physical design features and you’re right. Good design and design possibilities probably trump all cool features which I suppose is what Practical Oracle 8i was all about. In fact there are so many features and possibilites these days that if people just knew more about what was available we’d probably see better systems. I remember a presentation at last year’s OOW when one of the Oracle guys (Juan Loiza, I think) was saying something along the lines of … some people think we’re done with the database server but we always have work to do there. (from memory)

As for meeting up, it looks like I’ll just have to haul my *rse down to that part of the world. I have built up a few air miles and the missus is very keen on Oz. As long as there’s an advanced spider sweep party ;-)

Partitioning… Well, It’s a cool extension and you can’t really get into the terabyte scale without them, but I see them much more as a management improvement not performance. In fact, I growl in presentations when people say Partitions give better performance – I’ve seen the opposite toooo many times. Your query is going to find half a dozen rows in your 100 partition table and you can’t partition prune? That’s 100 index’s to do a “quick” on scan boys and girls. Besides, it’s an extra cost Enterprise Edition level feature, I take of points for Oracle,s unfriendly stance on that one.

I’m with you and Mr de Visser on design – keep it simple, keep it smart.