Monday, August 4, 2008

Oracle's exchange partition feature is a handy method for loading mass data. Data is usually being loaded into an "exchange" table that mirrors the attributes of a single partition or subpartition of the actual table.

Before 10g it probably didn't matter if you gathered statistics on the exchange table before performing the exchange or afterwards on the actual table, because the "SIZE AUTO" option of the method_opt parameter that was already in 9i was rarely used.

But starting with Oracle 10g the "SIZE AUTO" option is used as default when gathering table resp. column statistics. "SIZE AUTO" works in a two-level approach: In the first step it uses "column usage monitoring", another feature introduced in Oracle 9i to gather the information which columns potentially could benefit from histograms as they are used in WHERE clauses with certain kind of predicates, e.g. equal comparisons. In the next step a histogram is generated to detect if the column values are skewed or have significant gaps (which raised an interesting question about what are "gaps" in VARCHAR based data), and if they do, the histogram is stored in the statistics (this part corresponds to the "SIZE SKEWONLY" option behaviour).

So if you want to take advantage of new default "SIZE AUTO" feature, it is crucial that the statistics are gathered on the actual table because otherwise the "column usage monitoring" won't work as expected. It needs the workload of the actual table, not the one of the exchange table.

As already mentioned, from 10g on the SIZE AUTO option is the default option if you haven't changed it using DBMS_STATS.SET_PARAM resp. DBMS_STATS.SET_*_PREFS in 11g.

SQL> SQL> -- flush the monitoring info, so we can see it in col_usage$SQL> -- otherwise it may take up to 15-20 minutes to see the latest changesSQL> -- DBMS_STATS.GATHER_*_STATS will flush this as wellSQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

SQL> SQL> -- So if you exchange now you won't benefitSQL> -- from any histograms potentially being generatedSQL> -- due to column usage and skewnessSQL> alter table size_auto_test exchange subpartition pkey_1_101 with table exchange_test;

So it can be seen from this sample that gathering statistics on the exchange table uses the workload of the exchange table and therefore the SIZE AUTO option doesn't work as expected. Gathering statistics on the actual partitioned table makes use the column workload information and generates histograms based on both workload and skewness.

Another interesting observation is that the column usage in sys.col_usage$ is maintained on table level rather on partition or subpartition level. Although I've explicitly queried on subpartition level the monitoring info refers to the table object and not to the (sub-)partition subobjects. This means that if your data in the partitions is used differently by queries depending on the way you gather statistics you might end up with histograms on particular partitions that are unnecessary. This can have an impact on both the time it takes to gather the statistics as for each histogram a separate gather step resp. query is required and the execution plans and cardinality estimates can be influenced by the existing histograms.