Concatenated Bitmap Indexes Part I (Two Of Us) May 6, 2010

Although Bitmap Indexes are commonly created on one column, you can create multi-column, concatenated Bitmap indexes as well.

Many of the same issues and factors in deciding to create a single, multi-column index vs. several, single column indexes apply to Bitmap indexes as they do with B-Tree indexes, although there are a number of key differences to consider as well.

The first difference to note is that a bitmap index doesn’t have as many index entries as there are rows in the table (with not null values), as with B-Tree indexes. A Bitmap index can potentially just have only as many index entries as there are distinct values for the indexed columns. This is one of the main reasons why Bitmap indexes can be considerably smaller than equivalent B-Tree indexes. A newly created Bitmap index only needs to have multiple index entries for the same column value if the associated index entry is greater than 1/2 a block size. If an index entry were to be larger than 1/2 a block size, Oracle creates another Bitmap index “piece” for the same indexed value, with a bitmap column covering a different range of rowids. (Note: additional Bitmap index pieces can be created based on subsequent DML, especially in earlier versions of Oracle. To be discussed at a later point in time).

Another thing to note regarding a concatenated Bitmap index is that the potential number of index entries is a product of distinct combinations of data of the indexed columns. For example, if two columns have 100 distinct values each, then as separate Bitmap indexes, they potentially may have as few as 100 index entries each. However, when combined as a concatenated Bitmap index, there may be a minimum of just 100 index entries only if there are just 100 different combinations of data between the columns (ie. there is a 1 to 1 relationship between column values). If there are however say 100 x 100 = 10,000 maximum combinations of data, there will be 10,000 index entries as a minimum in the associated concatenated Bitmap index.

A key point though is that if there are many more combinations of data when stored in a concatenated index, the occurrence of each distinct value will be far less within the table, meaning there will be many more “0” (not true) values in the corresponding bitmap column for each index entry and so can be compressed more effectively and likely use substantially less space than a corresponding index entry in a single column Bitmap index.

A simple little example to illustrate. To start, I’m going to create a 1M row table that has an ID and a CODE Colum, each with 100 distinct values.

In this first example, there’s an implicit 1 to 1 relationship between these 2 columns (eg. they always have the same corresponding values) such that there’s also only 100 distinct combinations of ID and CODE. Additionally, the values are distributed evenly throughout the table so the effective clustering of the data in relation to the index is awful.

Firstly, even though there are only 100 distinct values for each indexed column or columns, there are actually 400 index entries in these indexes. This means there are on average 4 Bitmap index pieces for each distinct indexed value. Oracle can’t fit the associated index entry for a specific value within 1/2 a block (in this example, the block size is 8k). In fact, it takes approximately two 8K index leaf blocks it fit all the index data for a specific value and it therefore requires 4 Bitmap index pieces per indexed value.

If we look at a partial block dump of a leaf block from say the BOWIE_1_I:

We notice there are actually 2 index entries within the block. Each index entry is for the same indexed value (both col 0 have identical values) but because the associated bitmap column (col 3) is so large (3926 and 3925 byes respectively), we need 4 index entries on average to store the bitmap data for each specific indexed value in order for each piece to not exceed the 1/2 block size limit.

Remember, for 100 distinct values in a 1M row table, that’s approximately 10,000 occurrences for each distinct value that need to somehow be mapped within the index. Oracle needs 4 index entries per value to fit the necessary bitmap information with the index such that no single index entry exceeds the 1/2 block size limit.

The next thing to note is that the size of the concatenated Bitmap index is actually about the same size of each of the individual single column Bitmap index (200 leaf blocks). Therefore, the overall storage required to store the two columns in one Bitmap is only half of that required to store the columns as separate Bitmap indexes.

We notice the overall length of an index entry is practically identical to those of the single column index. The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length. Considering the length of the bitmap column (col 4) is in the order of 3920 bytes for each index entry, an additional 3 bytes here or there is trivial and so doesn’t impact the overall size of the Bitmap index.

OK, let’s look at a slightly different example. The table is again 1M rows with the overall data being similar. However, I’m making a few subtle differences. Firstly, the data for the ID is actually perfectly clustered and is ordered in exactly the same manner as the index. Additionally, the distribution of data between the columns is such that there are now 100 x 100 = 10,000 distinct combinations of ID and CODE values.

We see a number of key differences when compared to the Bitmap indexes in the first example. Firstly, the Bitmap index for the ID column is tiny, just 25 leaf blocks compared to the 200 leaf blocks required previously. Additionally, there are only 100 index entries, rather than the 400 previous index entries. This is due to the fact the data is perfectly clustered within the table and as such, all the “1”s (true) are all grouped together and all the “0”s (false) are likewise grouped together and can be compressed very efficiently. The overall size of the bitmap has now reduced such that it can all fit comfortably within 1/2 a block and so just the 1 index entry is necessary for each indexed value.

We see that the bitmap column (col 3) is now only 1797 bytes and the overall index entry is 1818 bytes, which comfortably fits within 1/2 an 8K block.

The Bitmap index for the CODE remains unchanged because its values are still poorly clustered and distributed throughout the entire table.

The concatenated Bitmap index is now significantly larger than is was previously (417 leaf blocks, up from 200 leaf blocks), primarily because we now have 10,000 distinct values to deal with rather than just 100. However, as the occurrences of each of these 10,000 distinct values is so much rarer (only 100 occurrences per distinct combination of values), the associated bitmap column is going to be relatively small and well compressed for each Bitmap index entry.

If we look at a complete index entry from a partial block dump of the concatenated Bitmap index:

We notice the index entries are relatively small at just 326 bytes even though we have 2 indexed columns (col 0 and col 1). The size of the bitmap column (col 4) is just 302 bytes, only a fraction of that of the single column Bitmap indexes. With so few 1s (true) to contend with, the resultant bitmap is far more efficiently compressed than with the single Bitmap column indexes as it has far more 0s (false) than can be effectively compressed.

A concatenated Bitmap index can potentially use less or more space than corresponding single column indexes, it depends on the number of index entries that are derived and the distribution of the data with the table.

Julian Dykes presentation is mandatory reading for anyone interested in Bitmap Index internals. Of course, sometimes it’s difficult to understand all the concepts from just a powerpoint presentation, that’s why seeing and listening live to these types of folks in action can be so rewarding.

You said:
“The storage required to store the additional indexed column (col 1) within the index entry is just 3 byes in the above example, 2 bytes for the numeric index value and 1 byte for its length.”

If I look at index entry,

col 1; len 2; (2): c1 02

It’s saying: column name, length of the data it’s storing and actual data. How you’re concluding that 1 byte is being used to store it’s length part. Also, how we interpret the len 2 part i.e., len 2