Case Study

A recent thread on the ODC database forum highlighted a case where the optimizer was estimating 83,000 for a particular index full scan when the SQL Monitor output for the operation showed that it was returning 11,000,000 rows.

Apart from the minor detail that the OP didn’t specifically ask a question, the information supplied was pretty good. The OP had given us a list of bind variables, with values, and the SQL statement, followed by the text output of the Monitor’ed SQL and, to get the predicate section of the plan, the output from a call to dbms_xplan. This was followed by the DDL for the critical index and a list of the stats for all the columns in the index.

Here’s the critical line of the plan (from the SQL Monitor report) followed by its predicate section (from the dbms_xplan output, but cosmetically enhanced) and some details of the columns used in the predicate:

I’ve included the full Monitor output at the end of the posting, or you could visit the ODC page if you want to see it, but if we look at just this line we can see that the index full scan starts running in the first second of the query (‘Start Active’), runs once (‘Execs’) and, as the OP said, retrieved 11M rows in that one scan compared to an estimated 83,917.

When we examine the predicate section we can understand why the optimizer could make such a large error – the SQL requires Oracle to combine two columns from the table with various bits of bind variables to construct a date which is then compares with a couple of constant dates derived from several input bind variables using range based comparisons.

This is an example of Oracle using a fixed estimate of 5% for the selectivity of “unknown range-based comparison” – but with two comparisons the selectivity becomes 5% of 5% = 0.25% (i.e. 1/400).

If we look at the column definitions and stats we see that we seem to have 5 possible years and 12 possible months (which could mean a range as small as 3 years and 2 months) – so a selectivity of 1/400 would be in the right ballpark if we were querying for a date range of roughly 4.5 days. Working the figures the other way around – if 83,917 is 1/400 of the data then there are about 33.5M rows in the table and we are querying for something more like 1/3 of the table.

Observations

I find it curious that the optimizer used an “index full scan” to fetch a huge amount of data from the index when there is no requirement for sorting (there is a subsequent “hash unique”, rather than “sort unique nosort”). I would have expected an “index fast full scan” so I am curious to know if some optimizer parameters have been fiddled with to get the optimizer to bypass the fast full scan. Possibly a change in parameter settings would result in a very different plan.

The names of the bind variables are of the form “SYS_B_nn” – which means that the original query has been subject to the effects of forced cursor sharing. Since we are apparently expecting to identify and manipulate millions of rows this looks like the type of query where you don’t want to use cursor sharing. If the session can set “cursor_sharing=exact” before running the query, or inject the hint /*+ cursor_sharing_exact */ into the query then perhaps we’d get a better estimate of rows (and a better plan). If hinting or setting session parameters is possible then setting optimzer_dynamic_sampling to level 3, or possibly 4, might be sufficient.

The messy expression combining monthand yearis a crippling handicap to the optimizer – so fixing the query to make the literals visible isn’t actually going to help. This is Oracle 12c, though – so we could add a virtual date column (declared as invisible to avoid the threat of inserts that don’t specify column lists) and gather stats on it. The combination of virtual column and literal values might give the optimizer the information it really needs. Here’s a little script to demonstrate:

I’ve created a table with a million rows with data going back roughly 5 years from current date, which means I need roughly 550 rows per day. I’ve then created histograms on the month and yearcolumns to match the original posting. Now I’ll set up the bind variables and values specified by the OP and run a simple query to show the date information that the bind variables give, and the 1/400 selectivity of the OP’s predicate:

Note: in this and subsequent text I’ve had to use .le. to represent “less than or equal to” and .ge. to represent “greater than or equal to”. in the execution plans

This shows us that the first row in my table has a date component of 1st Dec 2013, while the date range required by the OP was one year’s worth of data between 1st Oct 2017 and 30th Sept 2018. The optimizer’s estimate of 2,500 rows out of 1M is the 1/400 we expect.

Let’s test the effect of running the query using literals (i.e. in the OP’s environment stop the “cursor_sharing = force” effect):

The optimizer sees that the expression involving monthand yearmatches the virtual column definition, and evaluates the two date expression to produce simple constants and gives us a cardinality estimate in the right ballpark.

Conclusion

Cursor sharing and “big” queries don’t mix. If you have queries that have to manipulate large volumes of data then the overhead of optimising each one separately is likely to be insignificant, and the threat of cardinality errors introduced by bind variables being re-used could be significant.

If you have to make use of an existing (bad) table definition, and can’t managed to write predicates that allow the optimizer to use existing column statistics, remember that you might be able to create a virtual (and invisible) column that captures the necessary definition thereby allowing you to give Oracle some statistics about the necessary predicate.

Footnote

In case you didn’t want to scan through the ODC page, here’s the full SQL Monitor output for the original query: