Indexing A Column With Just One Distinct Value (All The Madmen) August 10, 2011

When one thinks about a column that might benefit from being indexed, one generally considers columns with lots of different values so that the selectivity of the column is such that relatively few rows get selected, making the index appealing to the cost based optimizer.

There are of course many exceptions to this generalisation and there are times when an index is the perfect and most efficient method when selecting 100% of all data, even if it involves a normal index range scan reading every row out of the table.

There are also times when a column that has very few distinct values should be indexed, even with a standard B-tree index.

In this example, I’ll index a column that only has just the 1 single value and the CBO will choose to use the index gladly.

Just going to create my standard little table and create an index on the CODE column:

Note that the index statistics clearly shows it only has the 1 distinct value. Remember, NULLS are not indexed by default in B-Tree indexes and as a result the index is tiny with just the one leaf block:

Therefore, Oracle with accurate statistics and without requiring any histograms has all the information it needs to know when selecting rows that contain the one and only distinct value of our CODE column, that it will only actually be selecting 100 rows out of the 1 million rows in the table.

The CBO has the rows estimate/cardinality spot on (100) and has decided that the index is indeed the most efficient access path to select the 100 rows of interest.

With flag and bollean type columns, it might be worth some consideration simply storing the one value when one value is relatively rare, such that the resultant index can take advantage of not having to store any of the corresponding NULL values.

Even columns with just the one distinct value can potentially benefit from being indexed …

It all comes down as always to the relative costs associated with accessing the rows via the index (so the clustering factor as well as the overall number of rows is critical) vs. the cost of other alternatives (such as a full table scan).

This, of course, is one of the great benefits of function-based indexes (when you can also modify the SQL) – you can take the multi-valued status column from Brian Tkatch and create as many function-based indexes on it as you like – one for each of the “rare” values – defining each index to expose the rows in a given status.