Sunday, June 1, 2008

Aggregated statistics on partition or global level are by definition of worse quality than actually gathered statistics, mainly because you can not derive each and every statistic by aggregation (e.g. number of distinct column values), nevertheless there might be applications where they are sufficient and maintaining them is far less resource intensive than actually gathering them.

Aggregated statistics are only maintained when calling DBMS_STATS.GATHER_*_STATS if two conditions are met: For all underlying (sub-)partition objects statistics are available (either gathered or aggregated themselves in case of subpartitioning) and there are no actual statistics gathered on the upper level. In case actual statistics have been gathered, these take always precedence over aggregated statistics which means they are never overwritten by aggregation. You can distinguish between aggregated and gathered statistics by checking the column "GLOBAL_STATS" e.g. in the *_STATISTICS dictionary views. If it says "YES", the statistics are gathered and won't be overwritten by aggregation. In case of "NO" the statistics are based on aggregation and can be updated by aggregating the underlying statistics (provided that for all underlying objects statistics are available).

Up to 10gR2 the documentation of ALTER TABLE EXCHANGE PARTITION mentiones that "All statistics of the table and partition are exchanged, including table, column, index statistics, and histograms. Oracle Database recalculates the aggregate statistics of the table receiving the new partition."

This is only correct for the first part, the second part regarding the automatic aggregation of statistics is a documentation bug. In the beginning of partitioning support with Oracle8 according to Oracle this was true but due to customer complaints that this automatic re-aggregation took a significant amount of time (probably in case there were numerous partitions or subpartitions) it has been removed from the default code. There is a (more or less) documented (internal) parameter "_MINIMAL_STATS_AGGREGATION" that has been introduced in Oracle 8i to control the aggregation of statistics. According to the documentation available the following applies: "If the parameter is FALSE, the aggregate statistics will be recomputed. If the parameter is TRUE, the statistics will not be recomputed.". See here the documentation. So you have to be a bit cautious when using this parameter as the setting is counter-intuitive. You have to set it to the non-default value of FALSE to activate the aggregation when exchanging partitions.

While coming across this issue a documentation bug has been raised with Oracle that should correct the last part of the first quoted paragraph regarding the default aggregation rule (so that the documentation correctly mentions that no aggregation is performed by default).

Now looking at the latest available documentation of 11gR1, the particular paragraph reads like this:

"Statistics and histograms on the table or partition are not exchanged. Use the DBMS_STATS package to reaggregate statistics or create a histogram for the table receiving the new partition."

This is again quite interesting as now both parts have been modified. It says now not only that no aggregation takes place but the statistics are not exchanged at all. This would mean a significant change in functionality in 11gR1 compared to pre-11 versions.

I think it is a good idea at this point to come up with a simple test case to check what the actual behaviour is in the current releases 10gR2 Patch Set 3 (10.2.0.4) and 11gR1 (11.1.0.6).

SQL> SQL> -- The exchange table now holds twice the data of subpartition pkey_1_101SQL> SQL> -- Now we get rid of global stats and use instead aggregated statistics on partition and global levelSQL> -- Otherwise exchanging subpartitions will never affect the partition or global level statsSQL> -- Gathered statistics on partition or global level take precedence over aggregated statistics, check the column GLOBAL_STATSSQL> -- If it shows YES, then the statistics have been gathered, if it shows NO then these are aggregated statisticsSQL> -- Aggregated stats are created/updated whenever all partitions/subpartitions of the lower level have statisticsSQL> -- and one of the lower level partitions/subpartitions are analyzed AND the parent level does not have global statsSQL> SQL> exec dbms_stats.delete_table_stats(user, 'partition_test', cascade_parts=>false)

SQL> SQL> -- The exchange table now holds twice the data of subpartition pkey_1_101SQL> SQL> -- Now we get rid of global stats and use instead aggregated statistics on partition and global levelSQL> -- Otherwise exchanging subpartitions will never affect the partition or global level statsSQL> -- Gathered statistics on partition or global level take precedence over aggregated statistics, check the column GLOBAL_STATSSQL> -- If it shows YES, then the statistics have been gathered, if it shows NO then these are aggregated statisticsSQL> -- Aggregated stats are created/updated whenever all partitions/subpartitions of the lower level have statisticsSQL> -- and one of the lower level partitions/subpartitions are analyzed AND the parent level does not have global statsSQL> SQL> exec dbms_stats.delete_table_stats(user, 'partition_test', cascade_parts=>false)

This simple test shows that the documentation is unfortunately still or again wrong since it is obvious that 11g still behaves like 10gR2 including the option to aggregate the statistics when setting the "_minimal_stats_aggregation" parameter.

So in my opinion the mentioned paragraph in the official documentation could be written like the following:"All statistics of the table and partition are exchanged, including table, column, index statistics, and histograms. Oracle Database does not recalculate any aggregated statistics of the table receiving the new partition by default. If you want to have this aggregation to happen automatically each time you exchange a partition, set the parameter '_minimal_stats_aggregation' to FALSE. The default setting of the parameter is TRUE which means that it prevents the aggregation from happening."

3 comments:

I know this is a long time to be posting a comment on this, particularly as I read the post for the first time so long ago!

But I was having a discussion at work last week when someone said that "_minimal_stats_aggregation" is documented in 11g and when we were discussing references to that effect, your blog post cropped up.

However, although I agree with your post that the documentation is wrong and that was your main point, I think there is an interesting side-effect that because you draw attention to it being documented, people might think it's a bit different to other underscore parameters. Although you post a doc link, it's not immediately clear from the post that it's documented in the "Data Cartridge Developer's Guide" under "Using Extensible Optimizer".

I'm sure you would agree that this is documenting a pretty esoteric area of the RDBMS and that it doesn't really make it a 'documented' parameter in the same sense as non-underscore parameters that are documented in the 'Reference' manual are.

Maybe it seems I'm splitting hairs, but I was worried when someone used this as an argument for Oracle advocating it's use in standard environments so I thought it was worth a comment.

Since this is a quite old post and I always have trouble editing them without loosing the whole formatting I don't know yet if I manage to update it to point to your comment in the text.

There is this rule "never touch an underscore parameter in your production environment without confirmation from Oracle Support" and that certainly applies here as well - I was probably quite enthusiastic when I found the parameter being mentioned in the official documentation.