One of the great myths in Oracle is that bitmap indexes are only suitable and should only be used with columns that have so-called low cardinality (few distinct) values. A classic example of this myth being propagated is in this latest Burleson news item, “Oracle bitmap index maximum distinct values“, from Feb 16 2010 (article itself updated January 8, 2010):

It says “As the number if distinct values increases, the size of the bitmap increases exponentially, such that an index with 100 values may perform thousands of times faster than a bitmap index on 1,000 distinct column values.”

Over 10,000 distinct values – At this point, performance is ten times slower than an index with only 100 distinct values”

Now this of course is all generalised nonsense. Not only can a column with 10,000+ distinct values be perfect as a bitmap index but it can be considerably smaller than a bitmap index with only a handful of distinct values, even with columns of the same size and data type

A very simple example to demonstrate. First, I’m going to create a table with 1 million rows and have a CODE column that has 10,000 distinct values and a TYPE column with just 4 distinct values:

OK, so it’s a BLEVEL 2 index with 1752 leaf blocks. Let’s now compare it with an equivalent bitmap index. As the column only has 4 distinct values, it should be perfect as a bitmap index and much smaller than the B-Tree:

Indeed it is smaller. It’s now just 84 leaf blocks in size, down from the previous 1752 leaf blocks. The Blevel has even reduced to just 1.

OK, let’s attempt something really silly and outrageous. Let’s create a bitmap index on the CODE column, a column with 10,000 distinct values. I know, I’m crazy to even suggest such a thing as the resultant bitmap will simply be “HUGE” right ?

Well, would you look at that. It’s not “huge” at all, it’s just a tiny 52 leaf blocks !! In fact, it’s actually smaller and just 62% the size of the TYPE bitmap index that only had 4 distinct values.

So a bitmap index with 10,000 distinct values is actually smaller, more compact and more efficient that a bitmap index with just 4 distinct values !!

Why so small ?

Because the size and efficiency of a bitmap index doesn’t just depend on the number of distinct values but a whole range of other factors as well, not least the size and the clustering of the data in the table. Clue: I inserted the data into my BIG_BOWIE table in a very careful manner. However, one does need to actually understand how bitmap indexes work and how they actually store data to appreciate and determine whether a column is suitable for a bitmap index.

In short, there is no limit to the number of distinct values by which a column is suitable for a bitmap index. You could have a column with 10s of millions of distinct values (in say a billion row table) and a bitmap index might be perfectly suitable and significantly smaller than an equivalent B-Tree index. This is because a B-Tree index needs to store each and every occurence of a column value (unless the index is compressed) as well as a corresponding rowid whereas a Bitmap index might only need to store each distinct column value once with just 2 corresponding rowids. The savings in space can be massive, even if there are relatively few repeated occurences of each distinct value on average.

The next time you unfortunately read that bitmap indexes should only be used with very “few” distinct values and would be “huge” otherwise, well hopefully you’ll appreciate that’s simply not correct.