The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

So there’s a trade-off between ensuring the CBO makes reasonable decisions while ensuring it makes its decisions in a timely and resource efficient manner. Database performance could be directly impacted if these trade-offs are not managed effectively.

Therefore, there are all sorts of short cuts and assumptions that are coded into the CBO to make its life a little easier. However, these short cuts can sometimes be problematic if they’re not recognised and handled appropriately.

One of these little short cuts worth noting is how the CBO deals with NOT EQUAL (and NOT IN) conditions …

Typically when we have a condition where we just say NOT EQUAL, we’re basically suggesting we’re interested in the vast majority of possible values with the exception of the value specified in the NOT EQUAL condition. We want most values but not if it’s this particular value.

For example, a condition where we state something such as:

WHERE TEXT <> ‘BOWIE’

means we want all the other possible values of TEXT, just not those with the specific value of ‘BOWIE’. In other words, we’re typically interested in the vast majority of possible values when we specify a NOT EQUAL condition.

However, we all know that typically, Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a Full Table Scan if most rows are going to be returned anyways.

Therefore the CBO simply ignores indexes when costing a NOT EQUAL condition. Why bother going to all the overhead of calculating the cost of using an index to retrieve the vast majority of rows when a Full Table Scan is going to be the cheaper alternative in the vast majority of such cases. So the CBO doesn’t even bother trying and ignores all indexes that could potentially be used to retrieve the rows based on the NOT EQUAL condition.

But what if the data isn’t evenly distributed and the NOT EQUAL condition actually retrieves only a relatively small proportion of the rows. What if most rows actually have the value specified in the NOT EQUAL condition and the remaining rows constitute a relatively small proportion of the remaining rows ?

When the CBO ignores indexes, it ignores indexes in all cases. Even if 99.99% of rows match the value in the NOT EQUAL condition and there’s only a handful of remaining rows to actually be retrieved, the code path in the CBO is still followed and indexes are ignored regardless. The reason possibly being such queries could be re-written to avoid the use of the NOT EQUAL condition and so its use is still suggesting a large selectivity.

The refusal of the CBO to consider an index with a NOT EQUAL condition can easily be illustrated.

First, let’s create a table and populate a TEXT column with the same value, ‘BOWIE’:

We note that Oracle has decided to not use the index but use a FTS instead. If we look at the relevant parts of the 10053 trace, we note that the CBO did not even cost or consider using the index. The index was basically ignored and not considered at all:

You can try to hint the query but the CBO will still ignore any RANGE SCAN operation because the CBO can’t know what all other possible potential values that are not ‘BOWIE’ might be (remembering the statistics may not necessarily be accurate). It can perform a FULL INDEX SCAN but this means reading all the leaf nodes that contain all the unwanted ‘BOWIE’ values and so it still an inefficient option:

Or, if there a many different distinct values that are not ‘BOWIE’ but which in total still constitute a relatively small percentage of the total rows, then it could be re-written as follows which can make use of the index in an effective manner by concatenating two separate index range scans: