Wednesday, February 18, 2009

EXPLAIN PLAN on DDLs

Update May 2009: Inspired by Jonathan Lewis' post about index sizes I've revisited this and added some more cases that covers Jonathan's points and some more experiments regarding 11g's virtual columns and bitmap indexes.

Although it's mentioned in the official documentation of the EXPLAIN PLAN command, I believe it's a not so well known fact that you can EXPLAIN the following DDL commands:

CREATE TABLE
CREATE INDEX
ALTER INDEX REBUILD

What is the benefit of doing so? First of all, in the case of a complex CREATE TABLE AS SELECT (CTAS) statement you'll get the execution plan, and when using the Cost Based Optimizer this information includes the estimated number of rows that are going to be generated and the estimated time it takes (from 10g on; prior to 10g you need to convert the reported cost into time yourself if you like).
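As a minimal sketch (the table name T1 and the predicate are made up for illustration), explaining a CTAS and displaying the result could look like this:

```sql
-- hypothetical example: T1 is assumed to exist with gathered statistics
explain plan for
create table ctas_demo
as
select * from t1 where object_id > 100;

-- show the plan, including row and (from 10g on) time estimates
select * from table(dbms_xplan.display);
```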

Of course, usually this plan is supposed to correspond to the plan of the SELECT statement alone, provided that you're using the default ALL_ROWS optimizer mode, but I have already encountered situations where the plans were different, although this was usually due to some buggy behaviour.

In case of a CREATE INDEX DDL you'll again get a cost and time indication, along with a row estimate. Unfortunately it doesn't show an estimate of the TEMP space required for sorting. Additionally it doesn't work very well for function-based indexes, where the number of rows actually indexed might be far less than the number of rows in the underlying table. The same applies to NULLABLE columns: it would be nice if this information were used to come up with the correct number of rows that the index will cover.

As you will see in the sample script provided below, the optimizer's cardinality estimates are not used to their full extent when generating indexes, in particular function-based indexes. The cardinality estimate for a query can differ significantly from that of a corresponding index expression.

Starting with Oracle 10.2 you'll get an indication of the size of the index, based on the dictionary statistics, in the "Note" section, so the estimate is only as good as your statistics allow for; in addition, the points above regarding the accuracy of the estimate in case of NULL values or function-based indexes apply. The size estimate is obviously based on the average column length recorded in the statistics.
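To see that size estimate, display the plan of an explained CREATE INDEX; in 10.2 and later the Note section includes a line along these lines (the exact wording and the value shown vary by version and statistics; the table and index names here are made up):

```sql
explain plan for
create index t1_idx on t1 (object_id);

select * from table(dbms_xplan.display);

-- Note section (example, actual value depends on the statistics):
--   - estimated index size: 65536 bytes
```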

Interestingly, the CREATE TABLE AS SELECT explanation doesn't provide such a size estimate, which could be quite handy, too.

Explaining an ALTER INDEX REBUILD shows similar information to that of a CREATE INDEX, but it doesn't show the size estimate, which is a pity because it might provide an indication of the size reduction that a rebuild could achieve. Of course you can help yourself by explaining the corresponding CREATE INDEX statement.
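A sketch of this workaround (index and table names assumed for illustration):

```sql
-- explaining the rebuild shows cost/rows/time,
-- but no size estimate in the Note section
explain plan for
alter index t1_idx rebuild;

select * from table(dbms_xplan.display);

-- workaround: explain the corresponding CREATE INDEX instead,
-- which does show the size estimate in the Note section (10.2+)
explain plan for
create index t1_idx on t1 (object_id);

select * from table(dbms_xplan.display);
```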

One potentially interesting piece of information is what is going to be used to perform the index creation or rebuild; e.g. the database might be able to read the data from another index rather than performing a full table scan (although in that case you should check whether the index isn't redundant).

Other DDLs like ALTER INDEX COALESCE are obviously not supported, and a simple CREATE TABLE doesn't add any value, so it's supported but useless.

The following script shows some test results of EXPLAIN PLAN on DDLs in 11.1.0.7:

SQL> -- it looks like that a certain overhead
SQL> -- is included in the estimated index size
SQL> -- otherwise this should fit into a
SQL> -- single 64kb extent
SQL> explain plan for
  2  create index explain_ddl_test_idx
  3  on explain_ddl_test (object_id);

SQL> -- what about function-based indexes?
SQL> -- this is the default 5% hard-coded
SQL> -- selectivity for functions
SQL> -- by the optimizer
SQL> explain plan for
  2  select
  3  *
  4  from
  5  explain_ddl_test
  6  where
  7  case when object_id = 1 then 1 else null end is not null;

In 10gR2, I can see a DBMS_SPACE.CREATE_INDEX_COST procedure which does provide estimates of USED_BYTES and ALLOC_BYTES (but not time and TEMP space). Rudimentary testing shows that it does rely on table (and column) statistics.
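A simple call could look like this (the DDL text and the table name T1 are made up for illustration; the procedure needs representative table statistics to be in place):

```sql
set serveroutput on

declare
  l_used_bytes  number;
  l_alloc_bytes number;
begin
  -- estimate the space needed for the index described by the DDL text
  dbms_space.create_index_cost(
    ddl         => 'create index t1_idx on t1 (object_id)',
    used_bytes  => l_used_bytes,
    alloc_bytes => l_alloc_bytes
  );
  dbms_output.put_line('used bytes:      ' || l_used_bytes);
  dbms_output.put_line('allocated bytes: ' || l_alloc_bytes);
end;
/
```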

I haven't tested it yet, but looking at the procedure definition it may just run an EXPLAIN PLAN under the covers, since it requires a plan table, and the information about used and allocated bytes could actually be taken from the EXPLAIN PLAN result (the BYTES column and the OTHER_XML information).

If it does so the results should be consistent with what I've mentioned here.