The issue:

DB: 11.2.0.2.0 – 64bit
I have a small query with a little error, which causes big troubles.
The relevant part of the query is
WHERE ….
AND inst_prod_type=003
AND setid=’COM01′

but INST_PROD_TYPE is VARCHAR2.

this leads to filter[ (TO_NUMBER(“INST_PROD_TYPE”)=3 AND “SETID”=’COM01′) ]

based on this TO_NUMBER ( I guess!) the optimiser takes a fix selectivity of 1%.

Can someone tell me if this 1% is right? Jonathan Lewis “CBO Fundamentals” on page 133 is only talking about character expressions.

Unfortunately there are only 2 distinct values of INST_PROD_TYPE so this artificial [low] selectivity leads to my problem:
An INDEX SKIP SCAN on PS0RF_INST_PROD is choosen. (columns of PS0RF_INST_PROD: INST_PROD_TYPE, SETID, INST_PROD_ID )

After fixing the statement to
AND inst_prod_type=’003′
another index is used and the statement performs as expected.

Now I have no problem, but want to find the optimizers decisions in my 10053 traces.

The Important Bits of Information

From Martin’s email we need to pay close attention to:

Predicate of “inst_prod_type=003″ where INST_PROD_TYPE is VARCHAR2 (noting no single quotes around 003)

In this case the estimated number of rows is spot on – 1 million rows. Now lets regather stats and because of our queries using C1 predicates, it will become a candidate for a histogram. We can see this from sys.col_usage$.

And whammy! Note that the implicit cast [ filter(TO_NUMBER(“C1″)=003) ] now has an estimate of 1 row (when we know there is 1 million).
So what is going on here? Let’s dig into the optimizer trace for some insight:

Using prorated density: 0.000000 of col #1 as selectvity of out-of-range/non-existent value pred

The presence of the histogram and the implicit conversion of TO_NUMBER(“C1″)=003 causes the CBO to use a density of 0 because it thinks it’s a non-existent value. The reason for this is that TO_NUMBER(“C1″)=003 is the same as TO_NUMBER(“C1″)=3 and for the histogram the CBO uses TO_CHAR(C1)=’3′ and 3 is not present in the histogram only ‘003’ and ‘100’.

Dumb Luck?

So, what if the predicate contained a number that was not left padded with zeros, say 100, the other value we put in the table?

I’ve written before on choosing an optimal stats gathering strategy but I recently came across a scenario that I didn’t directly blog about and think it deserves attention. As I mentioned in that previous post, one should only deviate from the defaults when they have a reason to, and fully understand that reason and the effect of that decision.

Understanding METHOD_OPT

The METHOD_OPT parameter of DBMS_STATS controls two things:

on which columns statistics will be collected

on which columns histograms will be collected (and how many buckets)

It is very important to understand #1 and how the choice of METHOD_OPT effects the collection of column statistics.

Depending on if the table is partitioned or subpartitioned, and depending on what GRANULARITY the stats were gathered with, the latter two of those views may or may not be populated.

The Bane of METHOD_OPT: FOR ALL INDEXED COLUMNS

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. Allow me to explain. Using MENTOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' (a common METHOD_OPT I see) tells DBMS_STATS: “only gather stats on columns that participate in an index and based on data distribution and the workload of those indexed columns decide if a histogram should be created and how many buckets it should contain“. Is that really what you want? My guess is probably not. Let me work through a few examples to explain why.

Notice that in both cases only column PK has stats on it. Columns A,B,C,D and E do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

Now one might think that is no big deal or perhaps they do not realize this is happening because they do not look at their stats. Let’s see what we get for cardinality estimates from the Optimizer for a few scenarios.

Notice the E-Rows estimate for T1. The Optimizer is estimating 10,000 rows when in reality there is 500,227. The estimate is off by more than an order of magnitude (50x). Normally the calculation for the cardinality would be (for a one table single equality predicate):
number of rows in T1 * 1/NDV = 1,000,000 * 1/2 = 500,000
but in this case 10,000 is the estimate. Strangely enough (or not), 10,000 is exactly 0.01 (1%) of 1,000,000. Because there are no column stats for T1.A, the Optimizer is forced to make a guess, and that guess is 1%.

As you can see from the 10053 trace (below), since there are no statistics on the column, defaults are used. In this case they yield very poor cardinality estimations.

Yikes. In this case the cardinality estimate is 100 when the actual number of rows is 124,724, a difference of over 3 orders of magnitude (over 1000x). Where did the 100 row estimate come from? In this case there are two equality predicates so the selectivity is calculated as 1% * 1% or 0.01 * 0.01 = 0.0001. 1,000,000 * 0.0001 = 100. Funny that. (The 1% is the default selectivity for an equality predicate w/o stats.)

Doh! The cardinality estimate is now 5, but the actual number of rows being returned is 109,816. Not good at all. The Optimizer estimated 5 rows because it used a default selectivity of 1% (for A=1) * 1% (for B=3) * 5% (for D+E > 50) * 1,000,000 rows. Now can you see why column statistics are very important? All it takes is a few predicates and the cardinality estimation becomes very small, very fast. Now consider this:

What is likely to happen in a data warehouse where the queries are 5+ table joins and the fact table columns do not have indexes?

Would the Optimizer choose the correct driving table?

Would nested loops plans probably be chosen when it is really not appropriate?

Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

Non representative (or missing) statistics lead to

Poor cardinality estimates which leads to

Poor access path selection which leads to

Poor join method selection which leads to

Poor join order selection which leads to

Poor SQL execution times

Take 2: Using the Defaults

Now I’m going to recollect stats with a default METHOD_OPT and run through the 3 execution plans again:

As you can see, the first two queries have spot on cardinality estimates, but the the third query isn’t as good as it uses a column combination and there are no stats on D+E columns, only D and E individually. I’m going to rerun the third query with dynamic sampling set to 4 (in 10g it defaults to 2) and reevaluate the cardinality estimate.

I hope this little exercise demonstrates how important it is to have representative statistics and that when statistics are representative the Optimizer can very often accurately estimate the cardinality and thus choose the best plan for the query. Remember these points:

Recent statistics do not necessarily equate to representative statistics.

Statistics are required on all columns to yield good plans – not just indexed columns.

You probably should not be using METHOD_OPT => 'FOR ALL INDEXED COLUMNS SIZE AUTO', especially in a data warehouse where indexes are used sparingly.

Dynamic Sampling can assist with cardinality estimates where existing stats are not enough.

The author of the original post had stated that the table in question contains about 16 million rows and states: “The table contains three IDEAL columns for bitmap indexes the first of which
may have only two, the second three and the third four distinct values. I was planning to change the index type on these columns to BITMAP [from B-tree].” To keep the focus of this post narrow, I’m only going to discuss whether or not one should consider bitmap indexes for queries, and not discuss potential update related issues.

The Data

For this experiment, I’m going to create a table that has three columns with the given NDV from above and add in a few extra filler columns to pad it out a bit. Since I do not know the exact table structure, I’ll just go with a simple example. In reality, the posters table may be wider, but for this example, it is what it is.

The original post did not contain any queries or predicates, so for the purpose of this example I’m going to assume that there are exactly three predicates, one on each of column A, B and C, and that each predicate is a single equality (e.g. A=1 and B=1 and C=1). Looking at the data distribution from the query above, we observe there are approximately three different grouping counts: the lower around 333,000 the middle around 666,000 and the upper around 1,300,000. I will choose tuples from each of these groupings for the three test cases.

Query A

select *
from bm_test
where a=1 and b=1 and c=1;
333292 rows selected.

As you can see from the execution plans, Query A and B use the bitmap indexes and Query C uses a Full Table Scan. Of the 16,000,000 rows, Query A returns 333,292 (2.08%), Query B returns 666,130 (4.16%) and Query C returns 1,332,121 rows (8.33%). I think it is important to note that the change in the execution plan from index access to table scan is due to the costing, not directly due to the percentage of data returned.

Execution Times

I’m going to gather two sets of execution times. The first will be with a cold buffer cache, and the second with a warm buffer cache. All elapsed times are in seconds.

Query

Execution Plan

Cold Cache

Warm Cache

A

Bitmap Index

38

3

B

Bitmap Index

40

4

C

FTS

16

16

As you can see from the execution times, there is a significant difference (approx. 11x) between the cold and warm cache executions of each Query A and Query B. The other observation is that Query C (FTS) is faster than Query A (Index Access) on a cold cache. We surely need to account for this. One observation I made (from iostat) is that the I/O throughput rate for Query A and Query B was around 23MB/s while the I/O rate for Query C was around the 55MB/s range during the cold cache execution. None of the queries used the Parallel Query Option.

Lets take a look at the tkprof output from both the cold and warm cache executions of Query A and see if we can find where the time is being spent. The traces were collected using event 10046, level 8.

As you can see the majority of the time was spent on db file sequential read doing the 99,722 physical reads. This explains the difference in elapsed time between the cold and warm cache executions of Query A: it comes down to physical I/O. But why does Query C run in half the time that Query A runs in when the cache is cold, given that Query C is doing a FTS and Query A is not? Shouldn’t the FTS plan be slower than the index plan?

As you can see with Query C, the read size is 128 blocks or 1MB (128 blocks * 8k block), the largest I/O that Oracle will issue. This explains the difference in the observed I/O throughput (23MB/s vs. 55MB/s): the bitmap index plan reads the table 1 block at a time, and the FTS reads (most of) it 128 blocks at a time. It makes good sense that if the read throughput rate is ~2x (23MB/s vs. 55MB/s) then the execution time would be ~0.5 as long (38 seconds vs. 16 seconds). The larger I/O size will have a higher throughput rate compared to a smaller I/O size. The exact breakdown of the multi-block reads are:

If we look at the tkprof output again from Query A, we see there are 99,722 waits on db file sequential read. Of those 99,722 waits, 98,681 are on the table (grep is our friend here using the raw trace file and the event and object number), the remaining are for the indexes. This tells us that 98,681 out of 102,592 total blocks of the table were retrieved, just 1 block at a time. Basically we have done a very inefficient full table scan. This explains our two observations: 1) why the FTS is faster than the index access plan with a cold cache and 2) why the FTS has a higher read throughput than the index access plan. It all comes down to efficient physical I/O.

The Big Picture

Just because a column has a low NDV does not necessarily mean it is an ideal candidate for a bitmap index. Just like B-tree indexes, bitmap indexes are best leveraged when the combination of them makes it very selective (returns only a small number of rows). The classic example of using a bitmap index on a gender column (male/female) is a horrible one in my opinion. If there are only two values, and there is an even distribution of data, 50% selectivity is too large and thus not a good candidate for a bitmap index. Would you use any index to access 50% of a table?

Bitmap indexes can be very useful in making queries run fast, but if the BITMAP CONVERSION TO ROWIDS returns a large list of rowids, you may find that a FTS (or partition scan) may yield better performance, but may use more I/O resources. It comes down to a trade off: If there is a high buffer cache hit rate for the objects in the bitmap plans, it will run reasonably fast and requite less physical I/O. If the objects are unlikely to be in the buffer cache, a FTS will yield better performance as long as it is not bottlenecked on I/O bandwidth.

I recently came across a query in which the Optimizer was making a poor cardinality estimate, which in turn caused inefficient join type, which in turn caused the query to run excessively long. This post is a reenactment of my troubleshooting.

The Suspect SQL

The original SQL was quite large and had a fairly complex plan so I simplified it down to this test case for the purpose of this blog post:

select [...]
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
al1.business_type in ('002', '003', '007', '009') and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%' or
al1.cust_po like '%TEST%'));

Looking at the plan I would surmise that the cardinality estimate for the fact table (line 4) must surely be greater than 9. To find out how far the Optimizer’s guess is off, I ran this query which contained all the filter predicates for FACT_TABLE:

select count(*)
from fact_table al1
where al1.region = '003' and
al1.order_type = 'Z010' and
(not (al1.cust_po like '%MATERIAL HANDLING%' or
al1.cust_po like '%SECURITY%' or
al1.cust_po like '%SUMMER OF CREATIVITY%' or
al1.cust_po like '%TEST%'));
COUNT(*)
----------
1,324,510

As you can see, the cardinality estimate in this case is way off. The Optimizer estimated 9 rows and in reality the query returns 1.3 million rows, only a difference of 6 orders of magnitude (10^6 or 1,000,000). How could this be? Let’s try and understand why and where the cardinality estimate went wrong.

Bite Size Chunks

I find the easiest way to debug these issues is to use start with one predicate then add one predicate at a time, noting the cardinality estimate and comparing it to the actual cardinality value.

With the addition of the NOT LIKE predicate we start to see a bit of a difference. This plan has a 20x reduction from the previous cardinality estimate (1,456,000/72,803 = 20). Let’s add one more NOT LIKE predicate and see what we get.

With the addition of the third NOT LIKE predicate the cardinality estimate has dropped from 3,640 to 182, another 20x reduction. Looks like we may have found the issue. Each NOT LIKE predicate appears to result in a 20x reduction (5% selectivity) from the previous estimate. The original query had all four NOT LIKE predicates on it and had a cardinality estimate of 9. If we work the math: 182 * 5% = 9.

Looking at the query we can see there are four NOT LIKE predicates each with a leading and trailing wild card (%). Since DBMS_STATS does not gather table column information on parts of strings, each of the NOT LIKE predicates will have a default selectivity guess of 5%. Given this query has four NOT LIKE predicates, the total reduction for those four predicates will be 5%^4 = 1/160,000 = 0.00000625 which is quite significant, and in this case not representative and the root cause of the original query’s suboptimal access and join type.

Dynamic Sampling To The Rescue

Dynamic Sampling was designed with cases like this in mind. That is, cases where the Optimizer has to resort to selectivity guesses and could very well guess poorly. Substring guessing is not simply done as the substring could appear anywhere in the string. Let’s see what the cardinality estimate is when I add a dynamic_sampling hint to the query.

With a level 4 dynamic_sampling hint the Optimizer estimates 1.6 million rows, very close to the actual value of 1.3 million rows. This estimate is surely close enough to give us the optimal access and join type such that the original query should perform optimally.

Summary

There are case where the Optimizer guesses and sometimes can guess poorly. When this is the case, dynamic sampling can be used to give the Optimizer a better cardinality guess. Dynamic sampling is probably best suited for queries that run minutes or longer, such that the overhead of the dynamic sampling query is only a fraction of the total run time. In these cases, the minimal overhead of dynamic sampling can well out weigh the cost of a suboptimal plan.

Disclaimer

The views or opinions expressed on this blog are my own and do not necessarily reflect the views or opinions of current or past employers. The views or opinions expressed by visitors on this blog are theirs solely and may not reflect mine.