optimizer mode parameter, hint and missing statistics

On older versions at least, using an all_rows hint with the rule-based optimizer is not necessarily the same as setting optimizer_mode to all_rows.

Edit:
Note that I’m deliberately avoiding dynamic sampling and newer features like cardinality feedback, just showing how a little oddity in code paths for slightly different all_rows scenarios might lead to different estimates in older versions.

This resulted from a conversation with a colleague that started something like this:

Colleague: Is the default cardinality for a temporary table 2000?
Me: No, it’s related to block size so you should see 8168.
Colleague: No, I see 2000.

This was observed on 9.2.0.8 using a global temporary table but similar observations are possible with collections, external tables and even normal heap tables.

Yes, 9.2.0.8 is an old version, yes rule-based optimizer is now defunct, but this will remain relevant for the many systems that will continue to run old versions for years to come.

It is reasonably well-known that the default cardinality for collections and global temporary tables is related to block size – an overhead*. So, with an 8k block size, it is common to see a cardinality estimate of 8168, as shown below:

So it seems that rule-based optimizer + all_rows hint drops into a different path in the optimizer which uses a default cardinality of 2000 rows.

The same behaviour is not true in the latest versions.

In 11gR2 you can still force the deprecated RBO but if you hint all_rows then you seem to go down a consistent path (i.e. consistent estimate of 8168 provided no dynamic sampling or cardinality feedback), as you might have expected above.

*Going back to the default value of 8168, this is calculated from the formula of

num_of_blocks * (block_size – cache_layer) / avg_row_len

And for collections and global temporary tables without any additional statistics, num_of_blocks and avg_row_len default to 100 so we’re left with (block_size – cache_layer) with the cache_layer typically evaluating to 24.

In terms of the general application of this default formula, until recently I didn’t appreciate that in the absence of statistics AND dynamic sampling, that the actual number of blocks is used (if there is an actual number of blocks).

One of the reasons that it’s easy to have missed this formula is that in any recent version, dynamic sampling will kick in by default and so you have to explicitly prevent it. That’s my excuse anyway.

Like this:

Related

3 Responses to optimizer mode parameter, hint and missing statistics

Dom, this default estimate can be circumvented by using the dynamic sampling. I redid the first part of your example with optimizer_dynamic_sampling=0 (off) and 2 (objects without statistics are sampled). The latter improves the estimate dramatically:

The first result, without the dynamic sampling, has the very same estimate as the one in your post. With the dynamic sampling, CBO gets much more accurate statistics and the estimates are much, much better.

Dynamic sampling is a fabulous feature, probably one of my top 5 performance tuning features. But judgement is reserved on whether features such as cardinality feedback and sql plan baselines will eventually change that.

Whilst we’re talking about dynamic sampling, I also mentioned collections with respect to the defaults for cardinality. Dynamic sampling does not work with collections until something like 11.1.0.7.

Also, on the same subject and whilst we’ve also mentioned newer functionality, I know you’ve been having a good look at new features in 11gR2 recently. Note that up to 11.2.0.2 at least, cardinality feedback will not kick in if dynamic sampling has been applied.

Note that I’ve added a little edit to the article just to clarify that I’m deliberately avoiding dynamic sampling here but I don’t want to undermine your valuable comments on its benefits.