Why Is My Index Not Being Used No. 2 Quiz (Quicksand) October 3, 2011

I have a table that has 1M rows with dates that span 2000 days, all evenly distributed (so there are 500 rows per day for the mathematically challenged). All stats are 100% accurate and I have an index on the date column.

And now everything has gone wrong. I’m still getting the same 500 rows but the CBO is choosing an inefficient FTS. The estimates are now way way off, with the CBO expecting 1500, not 500 rows to be returned.

The cardinality estimation is different because BETWEEN is rewritten as >= .. <=, which means the end points are included in the range. Because of this, the end point selectivity (1/2000 for each end point) also needs to be included in the estimation of the overall cardinality.
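As a quick illustration of the rewrite (a sketch only: the table name and date literals are assumptions, with hist_date being the indexed date column mentioned later in the thread):

-- The optimiser treats this ...
SELECT *
FROM   quiz_table
WHERE  hist_date BETWEEN '01-JAN-2011' AND '01-JAN-2011';

-- ... as this pair of closed-range predicates, so both end points count:
SELECT *
FROM   quiz_table
WHERE  hist_date >= '01-JAN-2011'
AND    hist_date <= '01-JAN-2011';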

However, the NLS_DATE_FORMAT does need to match the format of the date string you are passing in for the example to work.
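For example (the format mask here is an assumption; use whatever matches the literal in the quiz):

ALTER SESSION SET nls_date_format = 'DD-MON-YYYY';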

I’m just trying to work out why the cardinality estimates are so different and cause such a dramatic plan change. The problem I have is that for the full table scan, 1500/1000000 is 0.15% of the records, whilst for the index, 500/1000000 is 0.05%. I would not have thought that was sufficient to change from index to FTS. It is as if the index is just not being considered at all.

As Dom says above, the clustering factor of this index is bad (1000000 = number of rows in table), so it is not a good candidate to start with.

The selectivity changes by the addition of 1/num_distinct for each end point in moving from the open ranges to the closed ranges.

num_distinct = 2000 for this table.

So, in moving from open bounded ranges to closed bounded ranges the selectivity changes by the addition of 1/2000 + 1/2000. For the 1000000 row table this increases cardinality by 1000000 * 2/2000 = 1000. This is where the estimated cardinality of 1500 comes from.
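Putting the numbers in one place (a sketch of the arithmetic only, not the optimiser's exact internal formula):

-- open range   (> / <):    base selectivity only                 -> ~500 rows
-- closed range (>= / <=):  base selectivity + 1/2000 + 1/2000    -> ~1500 rows
SELECT 1000000 * (500/1000000 + 1/2000 + 1/2000) AS estimated_cardinality
FROM   dual;    -- returns 1500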

The optimiser therefore predicts a cost of 1507, which is just enough to put it over the full table scan cost of 1344, and hence the full table scan is estimated to be the best plan.
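If you want to see the two costs for yourself, something along these lines should do it (the table name and the literals are assumptions):

EXPLAIN PLAN FOR
SELECT * FROM quiz_table WHERE hist_date BETWEEN '01-JAN-2011' AND '01-JAN-2011';

SELECT * FROM TABLE(dbms_xplan.display);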

My theory is that if the clustering factor were reduced from a value of NUM_ROWS to NUM_BLOCKS, this would make the index more attractive.
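Something like the following would do it, as a sketch only (the index name and the block count are assumptions, not values from the quiz):

BEGIN
  dbms_stats.set_index_stats(
    ownname => user,
    indname => 'QUIZ_DATE_IDX',
    clstfct => 3000);   -- roughly the number of table blocks, rather than the number of rows
END;
/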

Since the dates are truncated, all the dates in the table are rounded to the day. Oracle decides that the volume of the table to be scanned has exceeded the threshold at which a full table scan is more efficient than the index scan (5.26%).
You have three times the data because one query uses greater than and less than, while the other uses BETWEEN, i.e. greater than or equal to and less than or equal to.
You should be able to raise or lower this threshold with the optimizer index costing parameter (see the sketch below) to see when it switches back to the index scan rather than the full table scan.
It would be more interesting if you could include the other row counts as well, from 500 rows up to 1500 rows, perhaps to find the number of rows at which the full table scan kicks in.
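Presumably the parameter meant here is optimizer_index_cost_adj (an assumption on my part); for example:

ALTER SESSION SET optimizer_index_cost_adj = 50;   -- make index access paths look cheaper
ALTER SESSION SET optimizer_index_cost_adj = 100;  -- back to the default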

As Mark says, the addition of a histogram on the indexed column will reduce the cost sufficiently to choose the index over the full table scan.
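For example, a histogram could be gathered like this (a sketch only; the table name is an assumption, hist_date being the indexed column):

BEGIN
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'QUIZ_TABLE',
    method_opt => 'FOR COLUMNS hist_date SIZE 254',
    cascade    => true);
END;
/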

Nevertheless, the clustering factor still makes the index an unattractive offer. If you have a system where the overhead of a histogram is prohibitive, changing the clustering factor would be a cheap and efficient alternative.

Surely changing the clustering factor just hides the fact that the order of the data in the table is not conducive to efficient index access? And by changing it, aren’t we encouraging the use of an index that we shouldn’t? (Even though the slight issue with the cardinality estimate shows the tipping point).

If it is the nature of the application/data that it arrives in this order and more often than not we want date ranges of this data, then might this not be a candidate for an IOT? We might have to mess with the PK to do it, but it will at least give us nicely ordered data.

E.g. assuming id would normally be a PK even though it’s not in the original definition
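A sketch of what that might look like (the original table definition isn't shown here, so the columns other than id and the date are assumptions):

CREATE TABLE quiz_iot (
  id        NUMBER,
  hist_date DATE,
  padding   VARCHAR2(100),
  CONSTRAINT quiz_iot_pk PRIMARY KEY (hist_date, id)
)
ORGANIZATION INDEX;

Leading the primary key with the date means rows for the same day are stored together, which is exactly the clustering the date range queries want.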

(1) I don’t get a full table scan in my 10.2.0.4.0 Oracle release.
(2) I am wondering why you are comparing a string with a date.
(3) When I use dynamic_sampling I can get an accurate estimation on the table but not on the index.
(4) As Dom has already pointed out, it’s clear that the index clustering_factor (1000000) has a dramatic value that might have damaged the desirability of the index to the CBO. The index is such that the sys_op_lbid function, when applied to it (see the sketch below), shows 377 keys per leaf block scattered across 2652 leaf blocks, with the rest of the keys located in one further leaf block.

That said, I am not sure that the difference between E-Rows and A-Rows comes from this dramatic CF value; it may instead come from the fact that the available statistics do not reflect the real scattering of the hist_date column, making the CBO unable to produce accurate estimations.
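For reference, the way sys_op_lbid is typically used to get the keys-per-leaf-block figures quoted above is something like this (a sketch; the table name and the index object_id bind are assumptions):

SELECT sys_op_lbid(:index_object_id, 'L', t.rowid) AS leaf_block_id,
       COUNT(*)                                    AS keys_per_leaf
FROM   quiz_table t
WHERE  hist_date IS NOT NULL
GROUP  BY sys_op_lbid(:index_object_id, 'L', t.rowid);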

Same result with the wrong query. The difference for me is the BETWEEN clause: the BETWEEN ... AND operator is equivalent to two conditions using the "greater than or equal to" and "less than or equal to" operators respectively.

Hi all,
I was finally able to reproduce this issue in another DB.
I agree with Donatello and his analysis; the cardinality is the main factor behind the plan change as far as the CBO is concerned.
…. and now we wait for the answer from Richard…. ;-)

In Richard’s example, the CF is equal to the number of rows in the table.

A CF close to the number of blocks in the table, on the other hand, would leave the (wrong) selectivity estimate unchanged, but instead of a FTS (with the BETWEEN predicate) the access path would be an index range scan …

So I think we should point out two different things:

1) the cardinality estimate is wrong, as per the calculation that I showed