Monday, January 14, 2008

Oracle 9i introduced the "subquery factoring" feature (the WITH clause), and also offers a optimizer feature called "complex view merging". Subquery factoring allows the database to optionally "materialize" the result of a WITH clause as global temporary table (GTT), which can be beneficial if the WITH clause is referenced more than once in the main statement.Complex views are view definitions or inline views that contain e.g. distinct or group by clauses.

Oracle 10g in addition introduced the "Cost based query transformation" (CBQT) feature that allows Oracle to work out if e.g. applying complex view merging is beneficial by comparingthe cost of the transformed with the cost of the original query, whereas 9i - if possible - always applies the complex view merging even if the transformed query is not superior to the original one in terms of costs.

In order to take advantage of the potential benefit of "subquery factoring" one might come to the conclusion that it's a good idea to use the subquery factoring as often as possible, since in theory it should just offer advantages. In case the instantiation of the query result is not used the plan should be not worse than the one generated when using traditional views/inline views.

Unfortunately test cases show that the usage of subquery factoring actually disables certain options otherwise available to the optimizer, in particular the "Cost based query transformation" optimizer feature seems to be skipped if the subquery factoring is used.

This is a pitty as it means that you have to be careful with the usage of subquery factoring, because rather than potentially improving the performance it might make things actually worse under certain circumstances.

This still holds true even for the current Oracle release 11g, it looks like things haven't changed there.

The following test case ran on Oracle 11.1.0.6.0 Windows 32-Bit (default database installation) shows the different execution plans generated by the optimizer when optimizing two similar statements using traditional inline views and subquery factoring.

Since the second table just contains a fraction of the first query, it is rather obvious that it is probably better to first join the two tables in order to eliminate most of the rows and afterwards apply the group by to the greatly reduced result set. This can be achieved by the "Complex view merging" feature and the cost of the merged statement should reflect this as it should be less than the cost of the unmerged statement that first applies the group by to the larger table and afterwards performs the join, since the group by in this case does not reduce the result very much.

As a side note it's worth to mention that when using the parallel query feature things might be quite different in case you have a HASH distribution used by the "Hash Join" operation and the join key criteria is very skewed. Since this results in a skewed distribution of the data to the parallel query slaves which means that most of the work will be done by only a few of the parallel slaves rendering the parallel operation rather inefficient in this particular case it could be more advantageous to potentially reduce the skewness first by applying the grouping before joining the resulting set.

The generated plan is more expensive than the one generated using the statement without subquery factoring. In order to get the same plan as before, the "merge" hint can be used to force the view merge transformation to take place:

This results in the cheaper plan seen before, but the optimizer does not come up with this plan on its own. Examining the 10053 optimizer trace of the subquery factoring statement, the following output can be found multiple times:

So this seems to confirm the assumption that the usage of the subquery factoring effectively disables some other features of the optimizer, in this particular case the complex view merging of the cost based query transformation.

As of now you probably have to manually test which of the two approaches might offer the best performance when considering the subquery factoring feature. It is advisable to use the subquery factoring option not too blindly, especially if the query contains views to merge.