Unique Bitmap Indexes Part I (Unnatural Selection) March 24, 2010

As I’ve discussed previously, a Bitmap index can be considered over a B-tree index (where concurrent DML is not an issue) even if there are potentially tens of millions of distinct values, in a table that has say hundreds of millions of rows.

However, if a column is unique or “approaches” uniqueness, then one has gone too far and the bitmap index is going to be larger and less efficient than an equivalent b-tree index. So you wouldn’t consider a bitmap index on a column with a million distinct values if the table itself only has in the vicinity of a million rows as well.

To understand why a column approaching uniqueness shouldn’t be considered as a bitmap index, one only needs to understand the structure and corresponding differences of index entries in both bitmap and b-tree indexes.

I’ll begin by creating a simple table and populating it with a million rows.

We notice the length of these first 5 index entries are all 11 bytes (len=11).

An index entry from this Unique index basically consists of the indexed value (col 0) which is 2 bytes in length in the above sample plus the following overhead:

2 bytes for flags and locks
6 bytes for the rowid
1 byte for the index column length

So there’s a total of 9 bytes of overhead per index entry in this index in addition to the index value itself. Note also there’s an index entry for each and every indexed value. This is always the case for a non-compressed b-tree index.

If we now compare this with an equivalent Non-Unique index on the same column:

We notice the index is now somewhat larger than the equivalent Unique index, with there now being 1999 leaf blocks, an increase of 124 leaf blocks. A block dump of a leaf block reveals the key difference:

As I’ve discussed previously, Oracle makes the Non-Unique index effectively unique by adding the rowid as an additional indexed column within the index entry (col 1 is this additional index column comprising the rowid). There are therefore 2 columns in the index entry, not just the one (denoted by col 0 and col 1). This ensures all duplicate indexed values are subsequently sorted in rowid order within the index and can be efficiently accessed as necessary.

The consequence of this subtle difference is that an additional byte is now required to store the length of the rowid column and so the total overhead increases from 9 bytes to 10 bytes per index entry. The overall length of an index entry has therefore increased from 11 to 12 byes (len=12) and this results in the overall increase of 124 leaf blocks in the index, required to effectively store these additional 1 million bytes.

The index entry structure is now somewhat different. We now have an index that has not 1 column (as in the Unique index) or 2 columns (as in the Non-unique index) but 4 columns in the index entry. As before, we still have the index column value of 2 bytes but we now have the following overheads per index entry:

2 bytes for flags and locking (as before)
1 byte for the indexed column length (as before)
6 bytes for a rowid index column (col 1) stating the start of a range of rowids that are covered by the particular index entry
1 byte for the length of this start rowid index column
6 bytes for a rowid index column (col 2) stating the end of a range of rowids that are covered by the particular index entry
1 byte for the length of this end rowid index column
1 byte for the bitmap bit sequence column (col 3) required for all the bits referencing rows within the above rowid ranges
1 byte for the length of this bitmap column

So the total overhead for each of the 5 index entries listed above is now 19 bytes, not 9 or 10 bytes as for the equivalent b-tree indexes. The length of an index entry is therefore 21 bytes in total, not 11 or 12 bytes as for the equivalent b-tree indexes.

A few important points to note.

As the columns are effectively unique, the number of index entries are the same for both b-tree and bitmap indexes. A key advantage of a bitmap index over a b-tree index is that for each distinct value, a single index entry is sufficient to cater for a range of rowids, potentially covering the whole table. For example, a specific column value with 100 duplicates may only need the one index entry for the column value within a bitmap index, but would require 100 different index entries within a (non-compressed) b-tree. However, as the column in the above example is unique, there are no duplicate values and so this potential saving is not possible in this bitmap index.

Notice also the size of the bitmap string for each index entry is actually tiny, just a single byte, even though there are 1 million rows in the table. It doesn’t even look like it’s using a million bits to store the necessary bitmap string information for each index entry. This is because for each index entry, only one bit is ever set to 1 (“true”), all other occurrences are logically false as only 1 row in the table ever has the specific index value. Remember, the column values are effectively unique.

Therefore, Oracle can use a very narrow range of rowid ranges for each index entry and effectively not bother storing details for the vast majority of the possible rowid ranges within the table as there’s only one bit that’s of interest and it only corresponds to a specific part of the table. Even in cases where there might just be a duplicate here or there, most values would be zeros (false) regardless and can be compressed extremely efficiently (topic for another day).

Although many folks commonly think otherwise (see original Burleson article for a perfect example of the following misperception), if a column which is unique or is approaching uniqueness is indexed via a bitmap index, the overheads associated with the bitmap string in the index entry is usually very minimal as by definition most bit values are logically “false” (0), with only the rare “true” (1) bit value needing to be stored and reference.

The issue is not necessarily with the overheads associated with just the bitmap string per se but also with the other overhead components, namely the additional rowid and column length bytes.

In short, the bitmap index can’t be any more efficient that use just 1 byte to store the necessary bitmap string information (plus 1 byte for the bitmap string length), however 19 bytes of overhead is the minimum required, mainly because of the requirement to store 2 rowids instead of 1 rowid and for all the additional index column length bytes. If the bitmap index needs to cater for a wider range of rowids and for more occurrences of 1s (true) values, then the overheads associated with the bitmap sequence can of course be much more considerable than the 1 byte (again, a topic for another day).

Therefore, the bitmap index is going to be significantly less efficient if the indexed values are unique or near unique as there’s all this additional overhead per index entry without the subsequent savings by not having to store separate index entries for duplicates column values. There needs to be at least some measure of duplication within a column for a bitmap index to have some chance of being the more efficient when compared to an equivalent b-tree index.

However, how many duplicate values within a column are actually necessary for a bitmap index to be considered and be viable alternative ? The answer is far fewer than many may think (again see original Burleson article for a common misunderstanding in this respect), although this question will be addressed in a future post on the subject.

Share this:

Like this:

Related

Richard,
I know you said it’s a topic for another day, but could you throw some light on the encoding of the bitmap string (col3 in your bitmap index dump)? I mean, given that the rowid range covered is 8 rows, and that a byte has 8 bits, I would have expected to see the values 00, 01, 02, 04, and 08 in the bitmap block dump, not 00, 01, 02, 03, and 04. Why is that?
And I’m impressed that a bitmap of 00 is used to encode a bitstring of seven zeros and a one! Thinking about it, there appears to be no need to use 00 to represent a string of zeros – the mere presence of the index entry means at least one bit must be 1, so we could use the value 00 to encode something useful :-)

The reason why is simply because Oracle uses a single byte Hex value to represent a bit string pattern with the number of O bits that preceed a 1 bit value. For a range of 8 rowids that contain a single 1 bit, basically:

Richard,
thank you again for the broad explanation of the topic. I have one additional question: you write: “In part this is because the data is extremely well clustered however by far the more important consideration here is that for each index entry, only one bit is ever set to 1 (“true”) …”. I understand that the size of the bitmap is small because Zeros are compressed, but I don’t see were the clustering becomes important if we do not have any groups of subsequent 1/True – values. I assumed that the physical order of the table would not be relevant for a bitmap index on a unique column – is this wrong?

This blog piece started life as a discussion on how relatively few duplicate values are necessary for a bitmap to outperform a btree but I decide to change tack part way through and discuss a column that is totally unique instead. In the example I finally used, the clustering of the data within the table will indeed make no difference to the overall size of the bitmap index I created.

I’ll discuss “approaching” uniqueness and what that actually means and the impact of the clustering of the data in a different way in a future post. I’ll also cover how the CF is meaningless as captured for bitmap indexes as well.

I’ve change the text in the post to remove the error. Thanks for pointing it out, much appreciated.

Is it possible to extend the discussion regarding composite bitmap indexes block level storage. I haven’t find any document explaining about internal storage of composite indexes . If you have written any blogs regarding the topic, please point me to the necessary links.

These are restricted ROWIDs, whereas by “common representation” you probably mean the 10-byte extended ROWIDs. To convert a restricted rowid to an extended rowid you need the data object id, as demonstrated above. You can (in most cases) find the data object id of the table from file# and block# in dba_extents, but if you’re dumping indexes, you should already know on which table they are defined.