“Oracle 10g enhancements

Oracle 10g greatly aided the optimization of large SQL workloads with the introduction of dynamic sampling and root-cause optimization with dbms_stats. Oracle acknowledged that the root cause of sub-optimal SQL execution plans related to the quality of CBO statistics, and they introduced enhancements to dbms_stats to allow for automatic histogram creation and the gather_system_stats procedure to collecting all-important external information, most notably the average disk access times for index access (sequential reads) and full-scan access (scattered reads).”

What, if anything, is wrong with the above quote from the book?

The point of blog articles like this one is not to insult authors who have spent thousands of hours carefully constructing an accurate and helpful book, but instead to suggest that readers investigate when something stated does not exactly match what one believes to be true. It could be that the author “took a long walk down a short pier”, or that the author is revealing accurate information which simply cannot be found through other resources (and may in the process be directly contradicting information sources you have used in the past). If you do not investigate in such cases, you may lose an important opportunity to learn something that could prove to be extremely valuable.

As I stated in a comment in the earlier blog article, I would like to clarify that my review on the Amazon site is not intended to suggest that people should not buy the book. As the publisher’s website states that the book is written for senior Oracle DBAs, I suggest that senior DBAs, Oracle Certified Masters, and other people who are able to research the book’s contents *should* buy the book and post reviews of the book that highlight specific content of the book indicating whether or not that content is correctly stated. Such feedback will be a benefit to the Oracle community, and could help to improve Oracle Database books that are published in the future. I think that we need to keep a positive outlook in order to see things improve.

Actions

Information

6 responses

That time let’s start at the end:
* it’s simply wrong to suggest indices are accessed (only) by sequential reads. Also the link between FTS and scattered reads is not that 100% as it could be assumed after reading this sentence.
* the quality of statistics is not THE root cause for bad execution plans. Probably it’s A root cause, but not the only. I’d like to give you at least 2 other possible root causes:
(not yet) implemented features in CBO
complex skewed data: If working with statistics (as CBO is doing) there will always be place for a tuple of data which is not represented by the statistics well and therefore the particular execution plan will be suboptimal.
Just a little starter ;-)

Impressive answer. Reading from the start of the quote I identified a couple of other problems with the quote – I will let someone else mention those problems (hint: are those items Oracle Database 10g enhancements?).

One final thought. Is dynamic sampling only helpful in cases involving “large SQL workloads”, or can it also be helpful if during the last statistics collection a table contained 0 rows, and that table (at the time of a SQL statement execution) now contains 220 rows (such a problem was mentioned in the “Expert Oracle Practices” book)? For that matter, what is a “large SQL workload”? SQL is a language, so would a large SQL workload be a SQL statement that is 10,000 characters long?

– can it also be helpful if during the last statistics collection a table contained 0 rows, and that table (at the time of a SQL statement execution) now contains 220 rows (such a problem was mentioned in the “Expert Oracle Practices” book)

I would say it is not helpful in this case either. The documentation states:

“In both the serial and parallel cases, the database performs dynamic sampling when existing statistics are not sufficient:

* Missing statistics

When one or more of the tables in the query do not have statistics, the optimizer gathers basic statistics on these tables before optimization. In this case, the statistics are not as high-quality or as complete as the statistics gathered using the DBMS_STATS package. This tradeoff is made to limit the impact on the compile time of the statement.

* Collected statistics cannot be used or are likely to lead to poor estimates

For example, a statement may contain a complex predicate expression, but extended statistics are not available (see “Extended Statistics”). Extended statistics help the optimizer get good quality cardinality estimates for complex predicate expressions. Dynamic sampling can compensate for the lack of extended statistics.”

In this case the table has statistics, even though they are not correct, therefore dynamic sampling won’t kick in in the general case. I imagine there are some exceptions depending on query complexity and such. I did a single table test that agreed with the Oracle documentation:

I think that I need to better clarify this statement:
“or can it also be helpful if during the last statistics collection a table contained 0 rows, and that table (at the time of a SQL statement execution) now contains 220 rows (such a problem was mentioned in the “Expert Oracle Practices” book)?”

What I meant by the above is that statistics were collected on the table when it last contained 0 rows (a victim of the automatic nightly stale statistics collection procedure). If statistics exist for a table, the default level of dynamic sampling will not cause a dynamic sample of the table to take place when parsing a SQL statement that accesses the table, and that could be a significant problem when the table contains 220 rows later in the day (there is a Metalink article that describes this behavior, but I cannot find it at the moment – the referenced problem/case study starts on page 306 of the “Expert Oracle Practices” book). Based on tests that I performed, it is still possible that dynamic sampling might take place if parallel query is enabled for the SQL statement and the query is executed in a recent Oracle Database release (see https://hoopercharles.wordpress.com/2010/11/12/dynamic-sampling-changes/ and Metalink ID 1102413.1) – and that agrees with your comment statement.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater than and Less than signs in code sections are often interpretted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: