Indexes and NOT Equal (Not Now John) August 13, 2008

The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

So there’s a trade-off between ensuring the CBO makes reasonable decisions while ensuring it makes its decisions in a timely and resource efficient manner. Database performance could be directly impacted if these trade-offs are not managed effectively.

Therefore, there are all sorts of short cuts and assumptions that are coded into the CBO to make its life a little easier. However, these short cuts can sometimes be problematic if they’re not recognised and handled appropriately.

One of these little short cuts worth noting is how the CBO deals with NOT EQUAL (and NOT IN) conditions …

Typically when we have a condition where we just say NOT EQUAL, we’re basically suggesting we’re interested in the vast majority of possible values with the exception of the value specified in the NOT EQUAL condition. We want most values but not if it’s this particular value.

For example, a condition where we state something such as:

WHERE TEXT <> ‘BOWIE’

means we want all the other possible values of TEXT, just not those with the specific value of ‘BOWIE’. In other words, we’re typically interested in the vast majority of possible values when we specify a NOT EQUAL condition.

However, we all know that typically, Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a Full Table Scan if most rows are going to be returned anyways.

Therefore the CBO simply ignores indexes when costing a NOT EQUAL condition. Why bother going to all the overhead of calculating the cost of using an index to retrieve the vast majority of rows when a Full Table Scan is going to be the cheaper alternative in the vast majority of such cases. So the CBO doesn’t even bother trying and ignores all indexes that could potentially be used to retrieve the rows based on the NOT EQUAL condition.

But what if the data isn’t evenly distributed and the NOT EQUAL condition actually retrieves only a relatively small proportion of the rows. What if most rows actually have the value specified in the NOT EQUAL condition and the remaining rows constitute a relatively small proportion of the remaining rows ?

When the CBO ignores indexes, it ignores indexes in all cases. Even if 99.99% of rows match the value in the NOT EQUAL condition and there’s only a handful of remaining rows to actually be retrieved, the code path in the CBO is still followed and indexes are ignored regardless. The reason possibly being such queries could be re-written to avoid the use of the NOT EQUAL condition and so its use is still suggesting a large selectivity.

The refusal of the CBO to consider an index with a NOT EQUAL condition can easily be illustrated.

First, let’s create a table and populate a TEXT column with the same value, ‘BOWIE’:

We note that Oracle has decided to not use the index but use a FTS instead. If we look at the relevant parts of the 10053 trace, we note that the CBO did not even cost or consider using the index. The index was basically ignored and not considered at all:

You can try to hint the query but the CBO will still ignore any RANGE SCAN operation because the CBO can’t know what all other possible potential values that are not ‘BOWIE’ might be (remembering the statistics may not necessarily be accurate). It can perform a FULL INDEX SCAN but this means reading all the leaf nodes that contain all the unwanted ‘BOWIE’ values and so it still an inefficient option:

Or, if there a many different distinct values that are not ‘BOWIE’ but which in total still constitute a relatively small percentage of the total rows, then it could be re-written as follows which can make use of the index in an effective manner by concatenating two separate index range scans:

How about “select text from bowie where text ‘BOWIE’” … you’d imagine that the CBO would head straight for the index for that one if the table average row length were big enough to make an FTS less efficient than a full or fast full index scan.

Yes, very good point, thank you. This is actually something I cover in my index seminar when I discuss bitmap indexes.

If you have a set of rowids and you want to elliminate some of them based on a NOT condition on another index, bitmap indexes can do this very well. It can scan the bitmap looking for all the rowids you don’t want and effectively subtracts them from a set of rowids you do want.

However, a single NOT condition will not be serviced via a bitmap index.

Richard –
another good article.
You gave a good example of turning a negative into a positive (“” to ).
How about an in list predicate?
for example…
select deal_nbr
from big_table
where deal_nbr not in (select deal_nbr from deal_lookup_t)
there is an index on the big_table deal_nbr not being used in this form.

Do recommend a common approach to such a query? I’ve played with various forms of inline views and such but can’t get it to where i want/need.

Another way I have come across to allow efficient access for NOT type predicates is through a CASE construct. I have seen this work well for a grocer that needed to report “NOT tobacco” sales at department level… something like:
… (case when department = 1 then ‘tobacco’ else ‘non-tobacco’ end) = ‘non-tobacco’ …

From at least a theoretical point of view it is in fact possible to build an example so that performing an FULL INDEX SCAN will be an efficient option. It suffice just to modify a little bit your example:

As discussed, if the number of rows of interest are low and you are excluding lots of distinct values, rewrite the query if possible to select the actual values of interest or introduce a new flag column that can identify such rows.

Depending on how the index is accessed, the size of the index is of a lesser concern if it can outperform other alternatives. The best way to make the CBO “relax” and for it not used a FTS inappropriately is to give it accurate statistics on which to base its decisions (or to give it the generally uncommon “take it easy, sit back and have a cup of tea” hint).

[…] Richard Foote’s blog on indexing a not equal to predicate is useful and gives me the idea of trying to amend the query ACTION <> ‘D’ to ACTION >’D’ and adding an index on ACTION and seeing how that works out. Looks like my invisible index blog entry will come in useful for that […]