Wednesday, January 27, 2010

When your projection is not cost-free

First of all I have to admit that the title of this post is a bit misleading - the projection is almost never "cost-free". Indirectly cost will be generated by accessing rows from row sources to obtain any columns selected, and with the introduction of System Statistics, Oracle even assigns a direct CPU cost to accessing a particular column of a row - as it has been pointed out by Joze Senegacnik and is demonstrated e.g. in Christian Antognini's "Troubleshooting Oracle Performance" book on page 117.

However, Oracle obviously treats work that has to be performed as part of the projection differently than work that has to be performed as part of the selection part.

In particular user-defined functions or scalar subqueries will not be accounted for costing when calculating the overall query cost.

This holds even true for user-defined functions that have a cost assigned via the Extensible Optimizer framework. For more information regarding the "Extensible Optimizer" framework, read Adrian Billington's article on oracle-developer.net or refer e.g. to our latest "Expert Oracle Practices" OakTable book where Joze Senegacnik dedicated a whole chapter to this topic.

Of course the best way to deal with this situation is to avoid using user-defined functions or scalar subqueries and replace them with appropriate joins/subqueries. In most cases this yields the best performance, provided that the optimizer comes up with a reasonable execution plan, and also solves the issue of non-contributed work, because regular join constructs will be considered in the cost calculation.

However, what to do of you're in the situation that you can't simply change the query (e.g. third party vendor application)?

If a user-defined function or scalar subquery is used as part of the selection clause the cost-based optimizer will make use of any cost associated with the function, or evaluated for the scalar subquery, whereas the same construct used as part of the projection will not be taken into account for the cost.

In particular in the case of user-defined functions that perform costly operations, for example recursive SQL, and are not declared as deterministic (or are really of undeterministic nature) this can make a significant difference.

The problem with functions that are not declared as deterministic is that the built-in caching feature of Oracle that can help with scalar subqueries or functions declared as deterministic (since Oracle 10.2) can not be used to alleviate the potentially resource-intensive numerous calls to the function.

In case of more complex queries that make use of views, the view merging transformations applied to the query therefore can lead to quite different work performed by a query.

The code creates a table holding 40,000 rows and a simple index with a bad clustering factor. A user-defined function that allows to generate logical I/O based on that index is created and associated a default cost (and selectivity) using ASSOCIATE STATISTICS. The function deliberately uses the DBMS_RANDOM package to simulate a non-deterministic behaviour. Each BATCH_ID in the table covers 400 rows, with 100 batch_ids in total.

It is not that obvious from the "Projection" information, but in this case the function (the "VAL" of the "Projection" in operation id 4) is evaluated before the join takes place.

Notice that the (still undocumented) "NO_PLACE_GROUP_BY" hint is required from 11g on to prevent the optimizer from getting too clever with this kind of statement. The GROUP BY is used in this case to simplify the result set processing aggregating it into a single row, but in 11g by default the new GROUP BY placement pushes the GROUP BY into the view, effectively solving the issue of excessive function calls by simply reducing the row source sizes that subsequently get joined. Since this is not supposed to be point of this demonstration, the clever trick of pushing the group by into the view is prevented. However it is interesting to note how new features of the optimizer can help to solve problems by side-effects.

The problem described here can still be seen in 11g without any hints when not using a GROUP BY clause to aggregate the result set.

If you run this with statistics_level set to ALL and check the runtime statistics (DBMS_XPLAN.DISPLAY_CURSOR), you'll notice a significant difference between merging the view T2 or not. By default Oracle will merge the view, and obviously perform the "projection" as part of the HASH GROUP BY operation after joining the data, leading to 160,000 calls to the function, each generating three logical I/Os (when using 1 as parameter to the function).

So obviously it is a good idea in this particular case to not merge the view, but although a I/O cost has been explicitly assigned to the function, you can see that both execution plans have exactly the same cost and Oracle happily merges the view.

The costing looks different when using the function as part of the projection clause:

Note that I had to change the function parameter to something that refers to a column expression, otherwise the optimizer treats "generate_lio(1)" as "independent" and adds a FILTER operation that evaluates "generate_lio" exactly once.

It is now obvious from the execution plan that the cost of executing the function is considered:

It is also interesting to note that although the subquery is executed as part of the VIEW projection step in operation id 7, the scalar subquery is still shown at top level of the query starting with operation id 1. It would be more accurate to show it as child of operation id 7 in this particular case, but this is probably not supported by EXPLAIN PLAN at present.

At runtime however, the outcome is different from the function case, mainly due to the filter optimization / subquery caching feature, which also makes the subquery implicitly deterministic - it will only get executed as many times as there are distinct number of input values, which is the A_RANDOM column in this case.

In both cases the subquery will be executed only approx. 400 times, because there are only 400 distinct values in the generated row source.

It is obvious from the "Buffers" column that the scalar subquery has been executed at different steps of the execution plan.

However in cases where you're not that lucky and the filter optimization doesn't work that efficiently, there still might be a significant difference between the merged and unmerged view variant of the query.

Again, when moving the subquery to the selection, the cost calculation looks quite different:

It is interesting that the FILTER operation is executed "late" (and therefore potentially more often) - since it is only depending on the T1.RUN_ID and T1.A_RANDOM column it could be executed "earlier" while processing the T1 row source, and in fact this can be achieved by adding the PUSH_SUBQ hint to the subquery. I haven't investigated this further, but may be the optimizer doesn't cost the different subquery pushing options when explicitly requesting to not unnest it (the NO_UNNEST hint) - without the NO_UNNEST hint the subquery is transformed into a join in this particular case.

It can be seen that in this case the optimizer used a "worst case" approach estimating that the scalar subquery gets executed many, many times. Very likely the cost increase can be explained by 160,000 times the cost of the scalar subquery (which might be less than 3 and gets rounded up in the EXPLAIN PLAN output) due to the "late" execution.

So it is not obvious to me why the optimizer treats these two cases differently - in case of the selection it uses a "worst-case" approach, but why this is not used in case of the projection is not clear to me.

In summary, you need to be careful in particular when using functions as part of the projection clause and complex queries - the order of evaluation might make a significant difference to the overall query performance.

As already mentioned, the best way to deal with such constructs is to avoid them. If you can not, you first should evaluate if the function can be declared as deterministic (will be cached from 10.2 on), or if you can use the workaround of wrapping the function call into a scalar subquery (select f(x) from dual) to take advantage of the subquery caching feature, which makes the function also implicitly deterministic.

if you are referring to the "no_place_group_by" hint, then yes, I think this does the same and effectively disables this optimization. There are certain bugs in 11.1 that you might hit and therefore support might suggest this setting.

It will affect the performance in those cases where you currently benefit from the new optimization - there is no way to tell how much impact this will have on your performance without further analysis.

If your system was already running on 10g, then your SQLs were already running without that optimization since it was introduced in 11.1 - may be this can give you a clue about the impact, but it is not necessarily a fair comparison since you might benefit from other optimizer changes introduced in 11.1.

For more information about that optimization, you can e.g. read the blog post by Jonathan Lewis.