Tuesday, February 21, 2012

There is a lot more to say about Dynamic Sampling and indexes, and I'll try to cover these basics in my Dynamic Sampling series on AllThingsOracle.com, but two recent discussions on the OTN forums and on Charles Hooper's blog prompted me to publish this blog post.

These discussions revolved around the following issues with Dynamic Sampling and indexes:

1. CREATE INDEX On Empty Table

There is an interesting change in behaviour that has been introduced with release 11.2. Whereas pre-11.2 versions do not automatically gather index statistics on empty tables, 11.2 does so. So a simple CREATE TABLE followed by a CREATE INDEX command (or an ALTER TABLE add constraint that implicitly/explictly creates indexes) will now lead to a table without statistics, but index statistics with every value set to 0 (rows, distinct keys, clustering factor, leaf blocks etc.)

Since Oracle 10g a CREATE INDEX command automatically computes statistics for an index (since Oracle 9i you could add optionally a COMPUTE STATISTICS clause to the CREATE INDEX command). The interesting point is that there is no "NOCOMPUTE STATISTICS" clause, so you cannot prevent this from happening via the syntax.

There is a way to prevent it, but only indirectly by locking the table statistics. Funny enough, if the table statistics are locked, there is again no way to overwrite the lock by using a FORCE keyword or similar as part of the CREATE INDEX command, whereas DBMS_STATS offers such a FORCE option.

Note that creating constraints requiring indexes as part of the CREATE TABLE command shows a different behaviour: The indexes do not have statistics in this case.

This shouldn't be too relevant for real-life cases, but is probably more important to setups of test cases / demonstrations. Of course it can become relevant to real-life deployment tasks that add new objects to a database. If statistics are not handled properly afterwards you now end up with tables missing statistics but indexes with 0 statistics.

Of course when deliberately leaving a table without statistics you need to do something about it anyway, because otherwise the default statistics job (since 10g) will turn this table into a potential threat by gathering statistics during the maintenance window where such tables might be in some state that is not representative for the actual workload during processing.

Either the default statistics job has to be disabled or re-configured via the AUTOSTATS_TARGET option of the SET_GLOBAL_PREFS / SET_PARAM procedures of DBMS_STATS, or the table statistics need to be locked individually via LOCK_TABLE_STATS.

So a reasonable order of commands for such a table at deployment time could look like this:

CREATE TABLE...

EXEC DBMS_STATS.LOCK_TABLE_STATS(...)

CREATE INDEX...

This way the statistics will be handled consistently for both table and indexes.

Note that all this doesn't apply to Global Temporary Tables (GTTs) as they are not analyzed by the default statistics job and creating an index on them doesn't automatically gather statistics for that index either, even in 11.2.

The change introduced in 11.2 can be seen by simply trying the following in 11.2 and pre-11.2 versions to see the difference:

The reason why this change in behaviour is particularly relevant is the fact that the optimizer behaves inconsistently if a table doesn't have statistics but a suitable index does.

If Dynamic Sampling gets used (very likely from 10g on because the table is missing statistics and the default level of 2 will trigger Dynamic Sampling for such a table in this case) the optimizer will not only obtain basic statistics information about the table but also overwrite the existing index statistics, but only partially.

The problem is that the existing index leaf blocks statistics get overwritten by the Dynamic Sampling code, but the clustering factor is not.

So although the index has statistics gathered (there is no (NOT ANALYZED) mentioned for the index) the Dynamic Sampling updates the index stats "blocks".

This is not consistent with the expected behaviour of Dynamic Sampling, because by default it is not supposed to overwrite existing statistics. If you want to force Dynamic Sampling to do so you need to specify the (not really) documented table level hint DYNAMIC_SAMPLING_EST_CDN.

However, another pretty important number, the clustering factor of the index, doesn't get updated. Since the clustering factor is important for accessing the table via an index any execution plan that needs to visit the table in such a way will now be costed inconsistently: The index access part will be based on the index block statistics updated by Dynamic Sampling whereas the table access part will be costed using the clustering factor (and potentially other existing index statistics left unmodified by Dynamic Sampling).

This will potentially lead to rather unreasonable plans, as the following test case demonstrates:

set echo on linesize 200

drop table t;

purge table t;

create table t ( c1 number not null, c2 number not null, c3 varchar2(300) not null);

create index t_idx on t(c2);

create index t_pk on t(c1);

insert into tselect rownum as c1, 1 as c2, rpad('A',300,'A') as c3from dualconnect by level<=100000;

Notice how the optimizer for the first execution plan gets a reasonable cardinality estimate from the Dynamic Sampling (due to the increased level it is spot on in this case) but still thinks that reading 100,000 rows from a 100,000 table via an index is a good idea.

While the index access is costed reasonably based on the updated index blocks statistics, the table access is "cost-free" due to the clustering factor left unmodified at 0.

When deleting the index statistics we can see that a default clustering factor of 800 gets assumed (you can see this in the optimizer trace file), which still makes the index access more favourable.

Only with index statistics gathered a more reasonable plan gets selected.

There is a bug filed against 11.2 (bug no 12942119 "Suboptimal plan if index has statistics but table has no statistics and dynamic sampling occurs") that seems to address this issue, but it is not clear from the bug description what exactly the bug fix does. It hasn't made it into the 11.2.0.3 patch set but it is available as one-off patch and part of some Windows platform intermediate patch sets.

3. 11.2.0.3 Change In Behaviour

The 11.2.0.3 patch set introduces another interesting change in behaviour: As I've just demonstrated even with index statistics missing and hence consistent optimizer behaviour the default clustering factor assumed might still favour unreasonable execution plans.

There is a bug fix 12399886:"OPTIMIZER STILL USES DEFAULT INDEX STATISTICS AFTER DYNAMIC SAMPLING WAS USED" that is by default enabled in 11.2.0.3 that addresses this issue and uses a different clustering factor derived from the Dynamic Sampling results (it turns out to be based on the blocks of the table, not the rows, see below for more details).

This is a good thing in principle because the cost estimates now might be closer to reality, but as always this might cause a lot of headaches after applying the patch sets because execution plans might change. This applies of course only to those scenarios that rely on Dynamic Sampling and can make use indexes.

Note that the inconsistent behaviour described under 2. is still there in 11.2.0.3, so tables without statistics having index statistics gathered still don't mix very well.

Here are the execution plans that I got from 11.2.0.3 for above test case:

Notice 11.2.0.3 suffers from the same cost-free table access with the zero index statistics in place, but after deleting the index statistics a full table scan will be chosen. It looks like that the new clustering factor is based on table blocks * 8 rather than a hard coded value of 800. Both constants (8, 800) might depend on the default block size - these tests were performed on 8KB.

In particular the latter closes an instrumentation gap that always bothered me in the past: So far it wasn't easy to answer the question which session used to allocate TEMP space in the past. Of course it is easy to answer while the TEMP allocation was still held by a session by looking at the corresponding V$ views like V$SORT_USAGE, but once the allocation was released answering questions like why was my TEMP space exhausted three hours ago was something that couldn't be told by looking at the information provided by Oracle.

So these two columns now should allow to easily analyze PGA memory and TEMP space issues that happened in the past.

Time Model And I/O Delta

Furthermore there are two new classes of columns: TM_DELTA_TIME / TM_DELTA_CPU_TIME / TM_DELTA_DB_TIME as well as the DELTA_TIME plus the remaining DELTA_*_IO_* columns. These allow to analyze which sessions contributed to what extend to the overall Time Model statistics DB_TIME as well as the IO related statistics. Again a nice addition that allows for more sophisticated analysis based solely on ASH data. In the past such analysis was only possible when persisting the session level statistics.

Reasons For Non-Shared Cursors

So far the V$SQL_SHARED_CURSOR view represented a user-friendly output of a bitmask (taken from X$KKSCS) telling why child cursors couldn't be shared, although unfortunately many of the "reasons" are not documented very well. With every release Oracle adds more columns to the view trying to keep up with the ever growing number of reasons why child cursors couldn't be shared.

With now more than 60 possible documented reasons determining why a child cursor wasn't shared becomes more and more cumbersome and can only be done reasonably by using some kind of script that turns this view into some more human-readable format, for example like Dion Cho did here or MOS note "High SQL Version Counts - Script to determine reason(s) ID 438755.1".

In 11.2.0.2 another new column has been introduced: REASON.

This column is in XML format and seems to be an attempt to allow for a more straightforward understanding why a cursor wasn't shared. I don't know if there are any Oracle tools that already process this column, but at present it looks like there is still room for improvement. First of all the XML generated isn't even valid XML, so creating a XMLType based on the contents regularly fails with parsing errors, mainly due to a missing root node of the document in my cases.

But also the generated content seems to be questionable, for example 11.2.0.2 added the new column USE_FEEDBACK_STATS indicating that a new child cursor was deliberately created due to the new Cardinality Feedback feature. Nevertheless the REASON column only shows a generic "OptimizerEnvironmentMismatch" reason. Although this isn't wrong it would helpful if the column at least showed the same level of information that could be told from looking at the individual columns.

Nevertheless the new column seems to be a move into the right direction. In particular it looks like that it not only states the different reasons but also further details for each reason that might allow getting a better understanding why cursors weren't shared. At present these details sometimes look quite cryptic, however time will tell if these will be more user-friendly or someone will come up with further explanations what these mean.

Certainly I've already seen cases where the REASON column explicitly stated optimizer related parameters that were different and hence allowed a straightforward understanding why cursors weren't shared rather than just indicating a Y in one of the remaining columns.