I suggest that if you still have any questions regarding this webinar then you can comment on the post over there and I'll try to address them.

If you interested in more stuff like that, then stay tuned as there are more of them planned covering many of the aspects that I deliberately left out or mentioned only briefly in this basic introduction.

Thursday, April 12, 2012

I hope the technical difficulties with my new mic (which of course work flawlessly during several test runs) weren't too severe so that you could still understand what I said.

As usual when I do presentations in English it took me far too long so probably some of you missed the end of the session. If you did you may want to check the webinar recording once available at the webinar archive at AllThingsOracle.com, because we had a very good and extensive Q+A session with excellent questions asked.

Thanks again to AllThingsOracle.com and James Murtagh for hosting the event.

An archive containing the presentation along with the demo scripts and screenshots of the live demos will be made available at AllThingsOracle.com, but if you can't wait until it's published there, you can download the package also from here.

Tuesday, April 10, 2012

Oracle 11g added Extended Statistics support for column groups in order to detect correlated columns for filter predicates using an equal comparison.

Note that Oracle 11g also added the ability to use the number of distinct keys of a composite index as an upper limit for the cardinality estimates for matching column predicates, which means that the optimizer is now capable of detecting correlated columns without the explicit addition of Extended Statistics / Column Groups.

The "composite index" feature was also backported to 10.2.0.4 and 10.2.0.5 but needs to be activated explicitly via a FIX_CONTROL. You can read more about that on Jonathan Lewis' blog.

The "Column Group" feature (either by explicit Extended Statistics or via a matching composite index) however has a weakness in a very special case: If there is only a single distinct value in the statistics then the "out-of-range" detection of the optimizer is not working correctly.

As you can see the optimizer still thinks to return "all" rows even in obvious cases where no rows will be returned.

In 11g the problem does not appear if there is more than a single distinct value, however the 10g backport seems to produce similar silly estimates even with more than a single distinct value.

One possible workaround is to generate a histogram on the column group in 11g. This way the out-of-range detection works again as expected for the single distinct value case.

It's also possible to generate a histogram on one or more of the affected columns each to avoid the problem, but this "breaks" the column group and correlation detection and hence might lead to worse cardinality estimates in the case of more than a single distinct value.

Note that when generating histograms you need to carefully check if you run into the problems described here: If you end up with a histogram containing only a single bucket but the underlying basic column statistics show more than a single distinct value then in principle unfortunately again the same issue will show up as demonstrated in the other blog post.

Note that this inconsistency between basic column statistics (Number Of Distinct Values, NDV) and the corresponding histogram is a consequence of how 11g gathers statistics when using the AUTO_SAMPLE_SIZE default.