Brain Teaser: Why is this Query Performing a Full Table Scan

September 14, 2011

While taking a look at the OTN threads this morning, I found an interesting test case in which the OP asked why a query used a full table scan rather than an index range scan. The OP wanted the query to use the index without adding a hint to the query. So, why doesn’t the OP’s test case use an index range scan? Just for fun, I will state that my first two guesses were not quite on target.

A slightly modified table creation script of the setup for the OP’s test case:
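The original script did not survive here; a minimal sketch consistent with the discussion that follows (the table name T1, column names, and row counts are my assumptions, not the OP's exact values) might look like this:

```sql
-- Hypothetical reconstruction of the test case: a large table in which nearly
-- every row has STATUS = 'VALID' and only a handful of rows are 'INVALID'.
CREATE TABLE T1 (
  ID      NUMBER NOT NULL,
  STATUS  VARCHAR2(8) NOT NULL,
  PADDING VARCHAR2(200)
);

INSERT INTO T1
SELECT
  ROWNUM,
  DECODE(MOD(ROWNUM, 100000), 0, 'INVALID', 'VALID'),
  RPAD('A', 200, 'A')
FROM
  DUAL
CONNECT BY LEVEL <= 1000000;

COMMIT;

CREATE INDEX IND_T1_STATUS ON T1(STATUS);
```

The heavily skewed STATUS column is what makes the test case interesting: a query for the rare 'INVALID' value should favor the index, yet the optimizer chose a full table scan.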


I did not consider a bitmap index. Can you explain how it could help in this particular case? Is it just because bitmap indexes are typically used when there are few distinct values in a column, or is there some other bit of information embedded in a bitmap index that would help the optimizer improve the cardinality estimates?
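For reference, the bitmap index suggestion would look something like the following sketch (the index names are assumptions); keep in mind that bitmap indexes require the Enterprise Edition and are generally a poor fit for tables subject to concurrent single-row DML:

```sql
-- Hypothetical bitmap index on the low-cardinality STATUS column,
-- replacing the existing B*Tree index.
DROP INDEX IND_T1_STATUS;

CREATE BITMAP INDEX BIT_T1_STATUS ON T1(STATUS);
```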

Alan, excellent point. The OP stated that he was running Oracle Database 10.2.0.4.5. I wonder why no one else mentioned that fact in the OTN thread until this blog article was published. By the way, this was my first initial guess.

Nice suggestion. Those were my first and second thoughts for allowing the index to be used. In testing on Oracle Database 11.2.0.2, a histogram was generated, yet the execution plan remained unchanged. That is a definite problem if the statistics sample size is not large enough.

That certainly is a different approach – something that I had not considered. I believe that I first saw a solution like the above in one of Tom Kyte’s books. The function based index will be very small in size, which should increase the chances that the index would be used. As shown in the execution plan that you posted below, I think that you just demonstrated the point that Randolf made – find a way to fix the optimizer’s statistics calculations.

… Certainly I would review the cardinality estimate of the optimizer, compare it to data distribution you’ve already shown and ask then myself if there is something I could do to bring the estimate closer to the reality…

Good comments to help the original poster. Using an old saying: “Give a man a fish, feed him for a day; teach a man to fish, feed him for life.” :-)

By the way, nice job in this Usenet posting – my comment apparently completely missed the mark, and your comment was right on target (I might have made it to that point after a couple of weeks of research ;-) ):

The poster has to make sure a histogram is created on STATUS. Otherwise the density is 0,5 (1/num_distinct) and that will favour the FTS. One way that will likely result in a histogram is to use DBMS_STATS.GATHER_TABLE_STATS as Stew pointed out.
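In SQL terms, one way to explicitly request a histogram on the STATUS column (the schema and table names are assumed from the discussion) is:

```sql
-- Gather table statistics and explicitly request a histogram on STATUS.
-- SIZE 254 allows up to 254 buckets; with only two distinct values,
-- a frequency histogram should result.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME    => USER,
    TABNAME    => 'T1',
    METHOD_OPT => 'FOR COLUMNS STATUS SIZE 254');
END;
/
```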

But then, if you are on 11, the way the CBO uses “NewDensity” (which is 0,5 in this case) for unpopular values will make it use the FTS again. I ran into this problem before and I still don’t understand why this was done. It seems to have such a huge negative impact for a lot of queries. I wish I were intelligent enough to understand Alberto Dell’Era’s posts on this subject… :-)

Good points, but I think that the DENSITY might still show 0.5 (or 0,5 depending on your NLS settings) even when a histogram is created on the column. See my comment at the following link and the follow-ups by Martin Preiss and Niall Litchfield:

I was surprised to see that the DENSITY had not changed in my test, but I supposed that issue could have been caused by a lack of workload before collecting the statistics or because of the default sample size.

—

I wonder if the optimizer simply selected the full table scan access path because the calculated cost of that access path was less than the calculated cost of the index access path. Of course, that again comes back to Randolf’s point about the accuracy of the statistics.

No one here suggested changing the OPTIMIZER_INDEX_COST_ADJ parameter, as was suggested in the OTN thread? That parameter just might work (but what else will it break in the process?).

So you seem to have ended up with a sampled frequency histogram (5414 rows sampled) that consisted of a single popular value – which simply means that, due to the sampling, the histogram missed the very rare INVALID value. It’s a bad side effect of AUTO_SAMPLE_SIZE in 11g that it uses 100% sampling for the table and basic column statistics but sometimes scales down the sample size for histograms. This has probably been implemented for performance reasons, so as not to lose too much time on the additional gathering iterations required for each histogram, but it can lead to inconsistent statistics – as you can see from the other parts of the 10053 trace file, the basic column statistics covered both values (NDV: 2).
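The inconsistency described above can be observed directly by comparing the basic column statistics with the histogram contents (object names assumed from the discussion); a sketch:

```sql
-- Basic column statistics: NUM_DISTINCT should report 2 even when the
-- sampled histogram missed the rare value.
SELECT
  NUM_DISTINCT,
  DENSITY,
  HISTOGRAM
FROM
  USER_TAB_COL_STATISTICS
WHERE
  TABLE_NAME  = 'T1'
  AND COLUMN_NAME = 'STATUS';

-- Histogram endpoints: a sampled frequency histogram may show only a
-- single bucket, covering just the popular 'VALID' value.
SELECT
  ENDPOINT_NUMBER,
  ENDPOINT_VALUE
FROM
  USER_TAB_HISTOGRAMS
WHERE
  TABLE_NAME  = 'T1'
  AND COLUMN_NAME = 'STATUS';
```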

Now combine this inconsistency with the fact that since Oracle 10.2.0.4, a value not found in a frequency histogram no longer gets a cardinality estimate of 1, as it did in the past, but instead half the cardinality of the least popular value – and you end up with a cardinality estimate of 50% of the table, since you have only a single popular value in the histogram that covers 100% of the data…

This bad cardinality estimate drives up the cost of the index access so you end up with the FTS being favoured by the optimizer.

This behaviour can be influenced with a FIX_CONTROL (5483301) if it causes consistent trouble in one of your applications. See Jonathan’s post on this and the corresponding comments. The fix control lets you return to the old behaviour – missing values will get a cardinality estimate of 1.
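At the session level, the toggle looks roughly like the following (a sketch only – the underscore parameter is unsupported, so verify the behaviour on your release and involve Oracle Support before using it in production):

```sql
-- Disable fix 5483301 for this session: a value missing from a frequency
-- histogram again receives a cardinality estimate of 1 (the pre-10.2.0.4
-- behaviour).
ALTER SESSION SET "_fix_control" = '5483301:0';

-- Re-enable the fix (the default behaviour).
ALTER SESSION SET "_fix_control" = '5483301:1';
```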

Repeat your test case on 11.2 with estimate_percent => NULL or 100 and you should end up with a frequency histogram covering both values and therefore get both a correct cardinality estimate and an index usage.
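The suggested re-gather, expressed in SQL (names assumed from the discussion), would look like:

```sql
-- Re-gather with a computed (100%) sample so the frequency histogram sees
-- both STATUS values, not just the popular one.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    OWNNAME          => USER,
    TABNAME          => 'T1',
    ESTIMATE_PERCENT => NULL,  -- NULL = compute; 100 behaves equivalently
    METHOD_OPT       => 'FOR COLUMNS STATUS SIZE 254');
END;
/
```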

Which leads back to my initial reply – check the cardinality estimates – bad cardinality estimates are the most common issue with bad execution plans.

Thanks a lot for your answer, which hit the nail on the head. I noticed the difference in the CBO trace file with estimate_percent => NULL vs. the default value yesterday, but was confused that the basic statistics knew about the 2 distinct values in both cases.

Great explanations! (I think that it is safe to say that I would not have been able to dig as deeply into this answer as Randolf did; I seem to have forgotten some of these fine details.)

—

I think that everyone who commented in this blog article was certainly headed in the right direction (just some of you were moving a bit faster in that direction). It is interesting to see how such a simple case of “why isn’t Oracle using my index” has brought out so much helpful information on the topic.

One thing to bear in mind with generic “why isn’t my query using this index” type questions, is that we’re often talking about production code on live systems and that the query in question isn’t the only query using the table/s in question.

So, unless there is a system-wide issue with the statistics on the table/s then I’m not comfortable using a solution with a system-wide scope – i.e. statistics and histograms.

So, an alternative is dynamic sampling, hinted at the statement level.
As nothing comes for free, you should use the lowest level that makes a difference, bearing in mind that there are certain sanity checks at the different levels that kick in and prevent dynamic sampling from being considered. These can often be overridden by using the additional dynamic_sampling_est_cdn hint (which was documented in 9.2 but has since gone from the manuals).
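A statement-level sketch of the hinted approach (the table alias and sampling level are assumptions; dynamic_sampling_est_cdn is no longer documented, so test it on your release before relying on it):

```sql
-- Request dynamic sampling at level 4 for T1 only, and additionally ask the
-- optimizer to use the sampled cardinality estimate for the table.
SELECT /*+ dynamic_sampling(t1 4) dynamic_sampling_est_cdn(t1) */
  *
FROM
  T1
WHERE
  STATUS = 'INVALID';
```

Because the hints are embedded in the one problem statement, the scope of the change is limited to that statement, avoiding the system-wide side effects of changed statistics or parameters.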

I think with this example everyone has pretty much hit the nail on the head with the appropriate hammer, but I just wanted to make the point that calculating statistics or changing histograms has a scope beyond a single problem SQL statement. But then we all know that, don’t we?

Dom offers some excellent big picture advice – you might fix one problem with a global change (such as adding a histogram, changing an index definition, changing statistics, or changing the OPTIMIZER_INDEX_COST_ADJ parameter), but you just might open up a whole new set of problems by making that global change in a production system.

Let’s take another look with the big picture in mind. We do not know if the OP is running the Enterprise Edition of Oracle Database, the Standard Edition, or something else. First, let’s recreate the test case, this time collecting statistics using DBMS_STATS (I am using a 100% sample size, but that should not be necessary):

That is interesting. The stored outline was used with the default value for the OPTIMIZER_INDEX_COST_ADJ parameter; because of the stored outline, the displayed cost of the execution plan is the same as when the OPTIMIZER_INDEX_COST_ADJ parameter was set to 40, and the child number is 0, just as it was when the OPTIMIZER_INDEX_COST_ADJ parameter was initially at its default value.

Cardinality feedback monitoring may be enabled in the following cases: tables with no statistics, multiple conjunctive or disjunctive filter predicates on a table, and predicates containing complex operators that the optimizer cannot accurately compute selectivity estimates for.

But this led me to the final answer to the question of what the OP should do. Answer: EXEC DBMS_STATS.DELETE_TABLE_STATS('<SCHEMA_NAME>','T1');

“Cardinality feedback monitoring may be enabled in the following cases: tables with no statistics, multiple conjunctive or disjunctive filter predicates on a table, and predicates containing complex operators that the optimizer cannot accurately compute selectivity estimates for. In some cases, there are other techniques available to improve estimation; for instance, dynamic sampling or multi-column statistics allow the optimizer to more accurately estimate selectivity of conjunctive predicates. In cases where these techniques apply, cardinality feedback is not enabled. However, if multi-column statistics are not present for the relevant combination of columns, the optimizer can fall back on cardinality feedback.”

Thanks for the link to the Oracle optimizer group’s blog article. That quote explains why cardinality feedback did not take effect (I read the article in the past, but it seems that I forgot the rules that determine what will cause cardinality feedback to take place).
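One way to check whether cardinality feedback actually triggered a reparse (the SQL_TEXT filter here is an assumption, adjust it to match the test query) is to look at the cursor-sharing diagnostics; a sketch:

```sql
-- USE_FEEDBACK_STATS = 'Y' indicates that a new child cursor was created
-- because cardinality feedback adjusted the optimizer's estimates.
SELECT
  SC.SQL_ID,
  SC.CHILD_NUMBER,
  SC.USE_FEEDBACK_STATS
FROM
  V$SQL_SHARED_CURSOR SC,
  V$SQL S
WHERE
  SC.SQL_ID = S.SQL_ID
  AND SC.CHILD_ADDRESS = S.CHILD_ADDRESS
  AND S.SQL_TEXT LIKE 'SELECT%T1%STATUS%';
```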

Let’s test a couple of the suggestions to see if they work in my 11.2.0.2 test environment:

I must admit that I was a little side-tracked with my previous comment, and partially forgot that I was trying to see if cardinality feedback would show up in the execution plans. The reason why I was side-tracked? Notice in the SQL*Plus output in my previous comment where an index range scan was selected in every case without the dynamic sampling note in the autotrace generated execution plans… I was thinking: “neat, the methods mentioned by Marcus and Dom to help the optimizer decide to use the index access paths worked.”

Side-tracked by the index access?
Talking about the statement at line 64, given that:
– you’ve deleted the table stats
– dropped the outline
– there should be no dynamic sampling
– cardinality feedback does not apply on run 1
is there anything else that catches your eye?

Yes, line 64 in the test case that I posted above (SELECT /*+ dynamic_sampling(0) */ ) that specifically requested no dynamic sampling, was not (apparently) influenced by cardinality feedback, and was not influenced by a modified OPTIMIZER_INDEX_COST_ADJ parameter – that index access path surprised me. I might experiment a bit more later to see if I can determine what happened.

:)
So, in this case, the absence of a note saying “dynamic sampling used” does not really mean that dynamic sampling was not used.
It might make more sense in light of what I said in my first comment about dynamic_sampling_est_cdn.

The dynamic_sampling hint applies to the selectivity of the single table access predicates.
The “Note” also seems to only apply to that selectivity sampling.

By using dynamic_sampling(0) we turned off the selectivity sampling (which allowed cardinality feedback to kick in on the subsequent execution if there was one).

But, there’s no way to turn off the dynamic sampling of the cardinality.
For example, there is no hint dynamic_sampling_no_est_cdn or no_dynamic_sampling_est_cdn.

So, you then get a standard 1% selectivity estimate on the dynamically sampled cardinality rather than the “1” we might have expected with no stats and apparently no sampling.

It’s obvious, I suppose, once you realise what must be going on, which I certainly didn’t even notice originally.

> So, in this case, the absence of a note saying “dynamic sampling used” does not really mean that dynamic sampling was not used.
> It might make more sense in light of what I said in my first comment about dynamic_sampling_est_cdn.

> The dynamic_sampling hint applies to the selectivity of the single table access predicates.
> The “Note” also seems to only apply to that selectivity sampling.

> By using dynamic_sampling(0) we turned off the selectivity sampling (which allowed cardinality feedback to kick in on the subsequent execution if there was one).

> But, there’s no way to turn off the dynamic sampling of the cardinality.
> For example, there is no hint dynamic_sampling_no_est_cdn or no_dynamic_sampling_est_cdn.

Have you verified above? It would mean that you get dynamic sampling for base table cardinality estimates without a note “dynamic sampling used” in the plan output.

I don’t think this is the case. The above figures are actually based on hard-coded defaults – starting from the segment size in blocks, Oracle multiplies this by a default number of rows per block based on the default block size. It’s a bit of a coincidence that these defaults actually lead to a base table cardinality that is close to the actual one used in this example.

There is no dynamic sampling taking place, not even for the base table cardinality, although your explanation sounds quite right with the dynamic_sampling vs. dynamic_sampling_est_cdn.

If you can show me evidence of dynamic sampling taking place here from a 10053 trace file, only then I’ll buy it.

As a side note – this case here can be used as an example where dynamic sampling results can be rejected – however the case needs some modification:
– No index eligible for dynamic sampling (e.g. unindexed complex expression used as filter predicate)
– Underlying statistics available

In such a case, if you request dynamic sampling on top of this for better selectivity estimates, it will be rejected in many cases, even up to quite high levels of dynamic sampling, because Oracle does not find any rows satisfying the filter predicate in the sample and hence rejects the dynamic sampling results until you reach a certain threshold where the majority of table blocks gets sampled.

What I’m trying to say here is that there are cases where dynamic sampling doesn’t help as expected – and the data distribution here represents such an edge case.

You’re right about the cardinality.
I had expected to find a cardinality note in the 10053 trace file, and it is not there; I didn’t verify before leaping to a conclusion.
A case of 1 + 1 = 3 and a lesson in being thorough.

Regarding your comments on the rejection of dynamic sampling, that was the point of my original comment mentioning this as a possible solution – that if underlying statistics are available then this example SQL won’t use dynamic sampling if you just hint it with /*+ dynamic_sampling() */ but that you could force it using dynamic_sampling_est_cdn. i.e. it can help here but you need the additional hint.

I was referring, I think, to a slightly different case, where the dynamic sampling is actually done but the result is rejected (and corresponding output can be found in the 10053 trace file) – but you might be correct that the treatment is different when the dynamic_sampling_est_cdn hint is used in addition – something I need to test.

I thought you were referring to the case where dynamic sampling does not kick in because statistics have been gathered and the default level 2 therefore doesn’t satisfy the checks.

Of course it also depends on whether you mean cursor or table level dynamic sampling with your hint…

You’re right – I was talking about the scenario where the checks fail and sampling is not done.
You were talking about the scenario where the dynamic sampling is actually done but the results rejected.

Hints for Posting Code Sections in Comments

********************
When the spacing of text in a comment section is important for readability (execution plans, PL/SQL blocks, SQL, SQL*Plus output, etc.) please use a <pre> tag before the code section and a </pre> tag after the code section:

<pre>

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;
TODAY
---------
01-MAR-12

</pre>
********************
When posting test case samples, it is much easier for people to reproduce the test case when the SQL*Plus line prefixes are not included - if possible, please remove those line prefixes. This:

SELECT
SYSDATE TODAY
FROM
DUAL;

Is easier to execute in a test case script than this:

SQL> SELECT
2 SYSDATE TODAY
3 FROM
4 DUAL;

********************
Greater-than and less-than signs in code sections are often interpreted as HTML formatting commands. Please replace these characters in the code sections with the HTML equivalents for these characters: