Friday, December 23, 2011

So this will be my Oracle-related Christmas present for you: a prototype implementation that extends the DBMS_XPLAN.DISPLAY_CURSOR output, hopefully making it more meaningful and easier to interpret. It is a simple standalone SQL*Plus script with the main functionality performed by a single SQL query. I've also demoed this during my recent "optimizer hacking sessions".

DBMS_XPLAN.DISPLAY_CURSOR together with the Rowsource Statistics feature (enabled via SQL_TRACE, the GATHER_PLAN_STATISTICS hint, STATISTICS_LEVEL set to ALL, or controlled via the corresponding hidden parameters "_rowsource_execution_statistics" and "_rowsource_statistics_sampfreq") has allowed, since Oracle 10g, a sophisticated analysis of the work performed by a single SQL statement.
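For reference, a minimal way to produce such a Rowsource Profile looks like the following sketch (the table name T1 is just a placeholder, and the hidden parameters should only be touched on test systems):

```sql
-- Enable rowsource statistics for the current session
alter session set statistics_level = all;

-- Alternatively, via the hidden parameters: a sample frequency of 1
-- gives reliable timings at maximum overhead (test systems only)
-- alter session set "_rowsource_execution_statistics" = true;
-- alter session set "_rowsource_statistics_sampfreq" = 1;

select count(*) from t1;

-- Show the runtime statistics of the last execution
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```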

Of course you'll appreciate that it doesn't go as far as the Real-Time SQL Monitoring feature added in Oracle 11g (only available with Enterprise Edition plus the Diagnostic and Tuning Packs), which is "always on", provides similar (and much more) information while a statement is executing, and doesn't require reproducing the execution with the corresponding hints / parameters set.

Without the Tuning Pack it's usually necessary to reproduce the execution, because the overhead of the Rowsource Statistics is significant and it therefore doesn't make sense to have them always enabled. Unfortunately Oracle 11g gathers the same information "always on" anyway - but you're only allowed to access it if you have the Tuning Pack license.

But for users without the corresponding licenses DBMS_XPLAN.DISPLAY_CURSOR together with Rowsource Statistics is still a very valuable tool.

However during my seminars and consulting at client sites I've realized that people quite often struggle to interpret the output provided for several reasons:

1. They have problems interpreting the execution plan in general - here I refer in particular to the flow of execution and the underlying execution mechanics

2. They have problems in identifying the operations that are responsible for the majority of the work due to the cumulative nature of the work-related figures provided like Elapsed Time, Logical I/O, Physical Reads etc.

3. They are potentially misled when trying to identify those steps in the execution plan that are subject to cardinality mis-estimates by the optimizer - the single most common reason for inefficient execution plans - due to the way the optimizer shows the number of estimated rows for operations that are executed multiple times (for example the inner row source of a Nested Loop join).

I've tried to address all of the above points (and even more) with this prototype implementation. In fact, point 1 above has already been addressed by Adrian Billington's XPLAN wrapper utility, which adds the Parent ID and Order of Execution to the DBMS_XPLAN output, and I've picked up that idea of injecting additional information into the output with this prototype - so kudos to Adrian for his great idea and implementation.

Apart from any home-grown scripts there have probably been numerous attempts to address points 2 and 3 above, the latest one I know of being Kyle Hailey's DISPLAY_CURSOR post and his "TCF query" provided in the same article. I've included his TCF-GRAPH and LIO-RATIO information, so also thanks to Kyle for posting this.

I plan to eventually turn this into a SQL statement analysis "Swiss-army knife" for non-Tuning Pack users with more sophisticated formatting options (for example specifying which columns to show and in which order) and the ability to combine the information with the ASH info available from the Diagnostic Pack license (similar to the output provided by the Real-Time SQL Monitoring text mode).

However I believe that this prototype is already quite helpful and therefore decided to publish it as it is.

Let's have a look at what the extended output has to offer by creating a couple of sample Rowsource Profiles.

Examples

The first example is deliberately kept as simple as possible to explain the basic functionality by performing a full table scan.

The first thing that becomes obvious is the fact that you need a veeery wide display setting to see all the columns provided :-)

As you can see if you call the script without any parameters it will try to pick up the last statement executed by the current session and call DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST formatting option. Further options can be found in the documentation provided with the script.
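So a typical invocation looks like this sketch (the SQL_ID shown is a made-up placeholder; the parameters are those described in the script header):

```sql
set linesize 400 trimspool on

-- No parameters: pick up the last statement of the current session
@xplan_extended_display_cursor

-- Or explicitly for a given SQL_ID; CHILD_NUMBER 0 is then assumed
@xplan_extended_display_cursor fztx9yhrgs14c
```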

To the left you can see the "Pid" and "Ord" columns that Adrian added in his original XPLAN wrapper script - these show the Parent Id as well as the Order of Execution. Note that this Order of Execution is only correct for the common cases - it doesn't cater for the various exceptions to the general rules and can therefore be misleading. You'll find below an example that demonstrates this.

Furthermore, in addition to what is provided out of the box by DBMS_XPLAN.DISPLAY_CURSOR with the ALLSTATS LAST option, you see the following columns:

A-Time Self: This is the time spent on the operation itself. For leaf operations this corresponds to the A-Time, but for all non-leaf operations it is obtained by subtracting the time spent on all direct descendant operations from the time shown for the operation. Please note that if you use a lower rowsource sample frequency (for example as set by the GATHER_PLAN_STATISTICS hint) the A-Time information will be pretty wrong and misleading. You need to set the sample frequency to 1 to get stable time information reported - of course this means that the overhead of the rowsource sampling is maximized.

Bufs Self/Reads Self/Writes Self: These are the corresponding self-operation statistics, obtained in the same way as just described.

Graphs: The self-operation work shown relative to the total work performed. Note that the "total" is determined by querying the MAX value found in the statistics rather than picking the top-most cumulative value. This is because for queries that are cancelled or performed using Parallel Execution the top-most value may either not be populated at all or may differ from the values accumulated by the Parallel Slaves. So there are cases where the graphs may be wrong and misleading - treat them carefully.

LIO Ratio: This is simply the number of logical I/Os recorded for the particular operation divided by the number of rows generated by the row source - in other words, the LIOs required per row. As usual, care should be taken when interpreting a ratio, but in general a high value here might indicate that there are more efficient ways to generate the data, like a more selective access path. It can be very misleading for aggregation operations, for example - a COUNT(*) will potentially show a huge LIO Ratio but that doesn't indicate a problem by itself.

TCF Graph: "Tuning by Cardinality Feedback" - this is a graph in a different style: it shows either plus or minus signs, and each plus / minus corresponds to one order of magnitude of difference between the estimated and the actual rows. Plus stands for underestimates, minus for overestimates. So two plus signs indicate that the actual number of rows was 100 times greater than the estimated number, and similarly two minus signs would indicate an overestimate by a factor of 100. Note that this information will be partially misleading with Parallel Execution, because an operation that is only started once with serial execution might be started several times to obtain the complete result set when executed in parallel. Also, cancelled queries might show misleading information here - see the "E-Rows*Sta" column description for an explanation why.

E-Rows*Sta: This is the estimated number of rows times the Starts column. It tries to address point 3 above, where the simple comparison of E-Rows and A-Rows can be very misleading - a difference doesn't indicate a problem at all if the operation has been started a corresponding number of times. If a query gets cancelled then this column might still differ from A-Rows simply because the operation wasn't run to completion. Also for Parallel Execution this information needs to be treated carefully, because an operation that is only started once with serial execution may be started many times when executed in parallel.
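The basic idea behind these "self" statistics can be sketched with a query like the following (a simplification of what the script actually does - it ignores Parallel Execution and the other special cases discussed below):

```sql
-- For each plan operation, subtract the direct children's cumulative
-- elapsed time from the operation's own cumulative elapsed time
select p.id
     , p.parent_id
     , p.operation
     , p.last_elapsed_time
     , p.last_elapsed_time
       - nvl((select sum(c.last_elapsed_time)
                from v$sql_plan_statistics_all c
               where c.sql_id       = p.sql_id
                 and c.child_number = p.child_number
                 and c.parent_id    = p.id), 0) as a_time_self
  from v$sql_plan_statistics_all p
 where p.sql_id       = '&sql_id'
   and p.child_number = &child_number
 order by p.id;
```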

Looking at the above example it becomes obvious that all of the logical and physical I/O has been caused by the full table scan, of course, but with the increased STATISTICS_LEVEL setting you can see that the SORT AGGREGATE operation also required some time - presumably CPU time due to instrumentation overhead - whereas the top-most operation didn't account for any work at all. The cardinality estimate is also spot on.

Have I already mentioned that you need a veery wide display setting :-) ??

Anyway here we can see a couple of interesting points:

- An example of a parent operation requiring a significant amount of time - in this case a WINDOW SORT operation that spills to disk (see the Used-Tmp and Writes columns)

- A problem with the cardinality estimates as indicated by the TCF Graph. In this case it is the reason for an inefficient index-based access path. Note that the LIO Ratio isn't indicating this problem here very clearly

- The majority of the logical I/O (and time and work) is caused by the random access to the table, again caused by the bad choice of the optimizer due to the wrong cardinality estimates

This is another case where a more efficient execution plan could be found if the cardinality estimate was in the right ballpark - you can see this pretty clearly in the "TCF Graph" column. Due to the strong underestimate several bad choices have been made: reading all rows from T by ROWID rather than simply performing a full table scan, and again an index-driven random access to T1 which drives up the logical I/O unnecessarily. This is a crafted example that minimizes the logical and physical I/O due to the good clustering of T1 relative to the data returned by the driving row source - a more real-life bad clustering together with larger table sizes would have turned this into a more or less infinitely running query.

It is also an example of how simply looking at E-Rows and A-Rows can be misleading: check operations 28 and 29. A-Rows is 1000K but E-Rows is 1 - should this be worrying? Not at all, if you look at the "E-Rows*Sta" column: the operation has been started 1000K times, hence the estimate is spot on.
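The TCF Graph logic described above can be sketched like this (a simplified illustration on DUAL, not the script's exact expression): with E-Rows*Sta = 10 and A-Rows = 1000 the underestimate spans two orders of magnitude, so two plus signs are shown.

```sql
-- Number of signs = order of magnitude of the ratio between
-- E-Rows*Starts and A-Rows; '+' = underestimate, '-' = overestimate
select case
         when a_rows >= e_rows_ts
         then rpad('+', floor(log(10, a_rows / e_rows_ts)), '+')
         else rpad('-', floor(log(10, e_rows_ts / a_rows)), '-')
       end as tcf_graph
  from (select 10 as e_rows_ts, 1000 as a_rows from dual);
```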

The "LIO Ratio" for operation 23 is 16 - this means it took 16 LIOs on average to generate a single row and might indicate that there are more efficient ways to generate those rows than a full table scan.

Here is the same query, but this time with a bad clustering of T1 - I've cancelled it after 40 seconds to give you an example of how you can use DBMS_XPLAN.DISPLAY_CURSOR without the need to run a statement to completion.

The result is similar to the previous one, but you can see that the increased number of physical reads on the T1 table segment slowed down the execution significantly.

Here is an example of Parallel Execution. Note that I strongly recommend the Real-Time SQL Monitoring feature if you have to deal a lot with Parallel Execution, because it is offering much more insight and information than DBMS_XPLAN.DISPLAY_CURSOR.

The important points to consider when dealing with Parallel Execution are:

- You need to use ALLSTATS instead of ALLSTATS LAST in order to get meaningful output. ALLSTATS LAST would only show the activity of the Query Coordinator. However, ALLSTATS shows the statistics cumulated over all executions so far, so if the query has been executed multiple times you will see more than the statistics of the last execution. If you want to ensure you see only the statistics of the last execution you need to create a new cursor, for example by adding a simple comment that makes the SQL statement text unique
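A sketch of the procedure just described (the comment text and the PARALLEL hint are arbitrary examples):

```sql
-- Make the statement text unique so a fresh cursor (and hence fresh
-- cumulative statistics) is created for this run
select /* px_run_1 */ /*+ parallel(t1 4) */ count(*) from t1;

-- Use ALLSTATS (not ALLSTATS LAST) to include the slave statistics
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS'));
```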

- You can see in the output that the "TCF Graph" and "E-Rows*Sta" columns can be misleading for Parallel Execution - the full table scan has been divided into 52 chunks executed by four parallel slaves, hence the Starts column shows 52, but the cardinality estimate of 1000K rows was spot on rather than wrong by a factor of 52

- The elapsed time information for the parts executed in parallel is not the wall clock time but the accumulated time spent by all parallel slaves, hence the Graphs will be partially misleading due to the MAX value found

- In this case also the "Reads Self" column seems to indicate reads by the SORT AGGREGATE operation - this looks questionable, too

The last example shows that it takes just a simple scalar subquery to make the output misleading again - so be aware that there are some exceptions (like scalar / early filter subqueries, certain Parallel Execution plans etc.) to the rules of how to interpret execution plans, and any automated interpretation of such plans is therefore usually misled:

The scalar subquery is shown as a child operation of the root node (or in 10g, with the missing ID = 0 operation in V$SQL_PLAN_STATISTICS(_ALL), as an independent operation with no parent at all) and according to the usual rules would therefore be executed first (see the "Ord" column), but this is not true - the execution starts with the first leaf of the main branch of the plan (the full table scan of T1).

Note that not only is the "Ord" column wrong - the fact that the SORT AGGREGATE operation includes the work performed by the scalar subquery is also not interpreted correctly by the remaining logic calculating the operation self statistics.

The Script

Below you can find the current version of the script. If you're too lazy to copy&paste (and because I don't have a fancy "copy to clipboard" button) you can also download the script from here.

Of course I'm interested in feedback. This prototype is not tested much yet, so expect glitches and problems. If you get back to me with reproducible cases I'll try to address them and publish updated versions of the script.

A final note: This tool comes for free but with no warranties at all. Use at your own risk.

Happy rowsource profiling (and holiday season)!

set echo off verify off termout off

set doc off
doc
-- --------------------------------------------------------------------------------------------------
--
-- Script:      xplan_extended_display_cursor.sql
--
-- Version:     0.9
--              December 2011
--
-- Author:      Randolf Geist
--              oracle-randolf.blogspot.com
--
-- Description: A free-standing SQL wrapper over DBMS_XPLAN. Provides access to the
--              DBMS_XPLAN.DISPLAY_CURSOR pipelined function for a given SQL_ID and CHILD_NUMBER
--
--              This is a prototype for an extended analysis of the data provided by the
--              Runtime Profile (aka. Rowsource Statistics enabled via
--              SQL_TRACE = TRUE, STATISTICS_LEVEL = ALL or GATHER_PLAN_STATISTICS hint)
--              and reported via the ALLSTATS/MEMSTATS/IOSTATS formatting option of
--              DBMS_XPLAN.DISPLAY_CURSOR
--
-- Versions:    This utility will work for all versions of 10g and upwards.
--
-- Required:    The same access as DBMS_XPLAN.DISPLAY_CURSOR requires. See the documentation
--              of DISPLAY_CURSOR for your Oracle version for more information
--
--              The script directly queries
--              1) V$SESSION
--              2) V$SQL_PLAN_STATISTICS_ALL
--
-- Credits:     Based on the original XPLAN implementation by Adrian Billington
--              (http://www.oracle-developer.net/utilities.php resp.
--              http://www.oracle-developer.net/content/utilities/xplan.zip)
--              and inspired by Kyle Hailey's TCF query
--              (http://dboptimizer.com/2011/09/20/display_cursor/)
--
-- Features:    In addition to the PID (the PARENT_ID) and ORD (the order of execution; note that
--              this doesn't account for the special cases so it might be wrong) columns added by
--              Adrian's wrapper, the following additional columns over ALLSTATS are provided:
--
--              A_TIME_SELF        : The time taken by the operation itself - this is the operation's
--                                   cumulative time minus the direct descendant operations' cumulative time
--              LIO_SELF           : The LIOs done by the operation itself - this is the operation's
--                                   cumulative LIOs minus the direct descendant operations' cumulative LIOs
--              READS_SELF         : The reads performed by the operation itself - this is the operation's
--                                   cumulative reads minus the direct descendant operations' cumulative reads
--              WRITES_SELF        : The writes performed by the operation itself - this is the operation's
--                                   cumulative writes minus the direct descendant operations' cumulative writes
--              A_TIME_SELF_GRAPH  : A graphical representation of A_TIME_SELF relative to the total A_TIME
--              LIO_SELF_GRAPH     : A graphical representation of LIO_SELF relative to the total LIO
--              READS_SELF_GRAPH   : A graphical representation of READS_SELF relative to the total READS
--              WRITES_SELF_GRAPH  : A graphical representation of WRITES_SELF relative to the total WRITES
--              LIO_RATIO          : Ratio of LIOs per row generated by the row source - the higher this
--                                   ratio the more likely there could be a more efficient way to generate
--                                   those rows (be aware of aggregation steps though)
--              TCF_GRAPH          : Each "+"/"-" sign represents one order of magnitude based on the
--                                   ratio between E_ROWS_TIMES_START and A-ROWS. Note that this will be
--                                   misleading with Parallel Execution (see E_ROWS_TIMES_START)
--              E_ROWS_TIMES_START : The E_ROWS multiplied by STARTS - this is useful for understanding
--                                   the actual cardinality estimate for child operations getting executed
--                                   multiple times. Note that this will be misleading with Parallel
--                                   Execution
--
--              More information including demos can be found online at
--              http://oracle-randolf.blogspot.com/2011/12/extended-displaycursor-with-rowsource.html
--
-- Usage:       @xplan_extended_display_cursor.sql [sql_id] [cursor_child_number] [format_option]
--
--              If both the SQL_ID and CHILD_NUMBER are omitted the previously executed SQL_ID and
--              CHILD_NUMBER of the session will be used
--
--              If the SQL_ID is specified but the CHILD_NUMBER is omitted then CHILD_NUMBER 0 is
--              assumed
--
--              This prototype does not support processing multiple child cursors like DISPLAY_CURSOR
--              is capable of when passing NULL as CHILD_NUMBER to DISPLAY_CURSOR. Hence a
--              CHILD_NUMBER is mandatory, either implicitly generated (see above) or explicitly
--              passed
--
--              The default formatting option for the call to DBMS_XPLAN.DISPLAY_CURSOR is ALLSTATS
--              LAST - extending this output is the primary purpose of this script
--
-- Note:        You need a veeery wide terminal setting for this prototype, something like
--              linesize 400 should suffice
--
--              This tool is free but comes with no warranty at all - use at your own risk
--
-- --------------------------------------------------------------------------------------------------
#

Wednesday, December 7, 2011

If you consider the usage of Table Functions then you should be aware of some limitations to the optimizer calculations, in particular when considering a join between a Table Function and other row sources.

As outlined in one of my previous posts you can and should help the optimizer to arrive at a reasonable cardinality estimate when dealing with table functions, however doing so doesn't provide all necessary inputs to the join cardinality calculation that are useful and available from the statistics when dealing with regular tables.

Therefore even when following the recommended practice regarding the cardinality estimates it is possible to end up with some inaccuracies. This post will explain why.

Join Cardinality Basics

In order to appreciate the problem that can be encountered, let's have a quick walkthrough of the basic information the optimizer uses to calculate a join cardinality. A very simplified version of the join selectivity formula boils down to:

Join Selectivity = 1 / max(num_distinct(t1.c1), num_distinct(t2.c2))

I've omitted the NULL (and histogram) case and hence simplified the formula further. Furthermore I'll restrict the showcase here to a single join column.

There is another piece of information that is evaluated but not obvious from the above formula: the low and high values of the join columns. If the join columns do not overlap at all the join cardinality will be calculated as 1.

Finally this join selectivity will be multiplied by the (filtered) cardinality of the two row sources to arrive at the join cardinality:

Join Cardinality = Join Selectivity * cardinality t1 * cardinality t2
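To make this concrete, here is a worked example with numbers similar to the test case used below - T1 with 10000 rows and 10000 distinct values in the join column, T2 with 20 rows and 20 distinct values, and a join selectivity of 1 over the greater num_distinct of the two join columns:

```
Join Selectivity = 1 / max(10000, 20)    = 1/10000
Join Cardinality = 1/10000 * 10000 * 20  = 20
```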

So for this simplified basic join cardinality formula the following information is required from the statistics (if available):

- the number of distinct values (num_distinct) of the join columns

- the number of NULLs (num_nulls) of the join columns

- the low and high values of the join columns

- the (filtered) cardinality of both row sources

So we can see that the number of distinct values is one of the crucial inputs to the join cardinality calculation (again: I deliberately keep things simple here and for example omit NULLs). Other inputs are the min and max values of the join columns - this can be seen by again slightly modifying the example:

So it's interesting to see that the dynamic sampling code detected that the column T2.ID is used as part of a join and therefore not only determined the cardinality of the table but also the num_distinct and num_nulls information of the join column.

So this is a case where, even with dynamic sampling, the join cardinality estimate is incorrect and differs from what you get with actually gathered statistics. Whether this is a deliberate design decision to keep the footprint of the dynamic sampling query as low as possible, or an omission that could be fixed by a bug / enhancement request, I don't know - but if you were asking yourself whether dynamic sampling is a reasonable replacement for actual statistics, here is a case where it doesn't produce the same result as basic table and column statistics.

Joins With Table Functions

Now that the basics have been clarified, let's move on to table functions. As outlined in the past and in one of my previous posts you shouldn't use table functions without helping the optimizer to come up with a reasonable cardinality estimate.

But as you have just seen, a proper join cardinality estimation requires more than a reasonable cardinality estimate for each row source. Let's see what this means when attempting to join table functions with other row sources.

For that purpose I create the following simple table function that generates a simple set of data controlled by its input parameters:

drop type t_num_list;

drop function f_tab_pipelined;

create or replace type t_num_list as table of number;

create or replace function f_tab_pipelined(
  in_start in number default 1,
  in_end   in number default 10,
  in_val   in number default null
) return t_num_list pipelined
is
begin
  for i in in_start..in_end loop
    pipe row(coalesce(in_val, i));
  end loop;
  return;
end f_tab_pipelined;
/

So I can define the start and end values, which will be returned with a step size of one. If the third parameter is provided it is returned instead of the current loop value, resulting in a single distinct value repeated end - start + 1 times; otherwise the number of distinct values equals the number of rows generated.
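For illustration, here are the two modes of the function (COLUMN_VALUE is the generic pseudo-column Oracle provides for a collection of scalars):

```sql
-- 10 rows, 10 distinct values: 1, 2, ..., 10
select column_value from table(f_tab_pipelined(1, 10));

-- 10 rows, a single distinct value repeated: 42, 42, ..., 42
select column_value from table(f_tab_pipelined(1, 10, 42));
```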

From 11.1 on dynamic sampling of table functions is supported, so let's simulate the cases I've just demonstrated with real tables.
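Such a test could look like the following sketch (table T1 and its FK column are the regular-table side of the join as used in the examples; the DYNAMIC_SAMPLING hint on the table function alias enables the 11.1 table function sampling):

```sql
-- 20 rows / 20 distinct values from the table function, joined to T1.FK
select /*+ dynamic_sampling(t2, 2) */
       count(*)
  from t1
     , table(f_tab_pipelined(1, 20)) t2
 where t1.fk = t2.column_value;
```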

First the case with 20 distinct values of T2 and overlapping join column values:

So that's a pity: the dynamic sampling code for table functions at present doesn't recognize the need to gather join column statistics, so the optimizer simply doesn't know about the num_distinct / num_nulls figures of the join column generated by the table function.

It certainly would be nice if Oracle enhanced the dynamic sampling code so that we have at least the same level of information available as with regular tables.

So what does that mean for the Join Cardinality estimate of the optimizer? It looks like the Join Selectivity formula, when dealing with Table Functions, could be extended like this:

Join Selectivity = 1 / coalesce(greater(num_distinct(t1.c1), num_distinct(t2.c2)), num_distinct(t1.c1), num_distinct(t2.c2), 100)

So the "greater" function will return NULL if any of the operands are NULL. In this case it seems to use a non-null num_distinct if found, and if none of them are defined, resort to a hard-coded default of 100 resulting in a default selectivity of 1/100.

Of course you'll appreciate that the same applies to Table Functions with regards to the non-overlapping join columns - the optimizer doesn't have a clue about these low and high values from the dynamic sampling performed hence it cannot detect such a case.

If you'd like to see the default case described in the above formula in action, a simple join of two table functions suffices:
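A sketch of such a join (both row sources sampled; since no num_distinct is known for either side, the estimate falls back to the default selectivity of 1/100):

```sql
select /*+ dynamic_sampling(a, 2) dynamic_sampling(b, 2) */
       count(*)
  from table(f_tab_pipelined(1, 10000)) a
     , table(f_tab_pipelined(1, 20)) b
 where a.column_value = b.column_value;
```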

So dynamic sampling gets the row source cardinality right, but the join cardinality is way off. If you repeat the same exercise with regular tables and dynamic sampling the correct join cardinality of 20 will be estimated because Oracle detects the 10000 distinct values of T1.FK.

If you really had the need to perform corrective actions, with regular tables you could resort to the special hints that Oracle uses in SQL Profiles - which are of course undocumented, so you'd have to use them at your own risk. Christian Antognini published a very nice paper about SQL Profiles some time ago that also covers the internal details, including the hints introduced for that purpose.

In our case the hint that would allow us to provide the missing information would in particular be COLUMN_STATS.

Unfortunately the optimizer obviously refuses to apply those hints to table functions - they only work with regular tables, so this doesn't help either.

This might also explain why the dynamic sampling code doesn't bother to query that kind of information for Table Functions - maybe the optimizer at present cannot process such information and it would therefore be useless to gather it anyway.

Summary

If you would like to use Table Functions and also join them to other row sources you'll have to carefully check the join cardinality estimates of the optimizer, because some crucial information required for a proper join cardinality calculation is not available when dealing with Table Functions. This is also true when using 11g features like Table Function dynamic sampling.