Bitmap Index Degradation After DML Prior To 10g (Beauty and the Beast) May 25, 2010

Bitmap Indexes have a bad reputation with regard to being problematic and suffering from severe degradation following heavy DML operations, especially larger amounts of insert activity. Bitmap indexes have been known to grow substantially and require periodic rebuilds following such insert activity.

While this was certainly the case in previous versions of Oracle, things have dramatically improved since version 10g. I thought it might be worthwhile explaining why Bitmap Indexes had such issues prior to 10g and how things are now so much better, in many cases making Bitmap index rebuilds unnecessary.

To start with, a demo on an Oracle 9.2.0.7 database. I’m first going to create a simple little table with 1M rows, with a Bitmap index on a CODE column with 1,000 distinct values. The values are effectively inserted into the table in an evenly distributed manner throughout the entire table.

We notice that the Bitmap index has 1000 index entries, one for each index value. There are 500 leaf blocks which means we can only fit 2 index entries in each leaf block on average. If we look at a partial block dump of the first leaf block:

I’ve listed just the first portions of each of the 2 index entries in the leaf block. Each indexed value is different (col 0) and each have a bitmap string of 2998 bytes in size (col3). Therefore in an 8K block, Oracle can’t indeed fit a 3rd index entry in a block, hence why there are only 2 index entries per block. The bitmap string column is quite large because in a 1M row table, there are 1000 occurences of each indexed value littered throughout the table that need to be referenced.

OK, next I’m going to insert just 1 additional row. It has a CODE value of 1, which is the minimum existing CODE value and so should be indexed in the first index leaf block. Let’s see what impact this has on the index statistics:

The crucial point here is that although the value (1) already existed within the index, Oracle has nonetheless created an additional index entry (1001 NUM_ROWS up from 1000). If look at a partial dump of the leaf block now:

We notice there is indeed a new index entry (row#1) which only has a rowid range (col 1 and col 2) of just 8 rows (between 02 02 d6 2e 00 00 and 02 02 d6 2e 00 07). So rather than somehow modify the existing index entry, Oracle has created a new index entry with a rowid range of just 8 consecutive rowids.

This means we should now be able to add an additional 7 rows with a CODE value of 1, which providing they can all fit within the same table block, will be associated with this allocated rowid range.

We notice we have yet another index entry (row#2) covering another narrow 8 row range.

Basically prior to 10g, if a new index value didn’t fit within an existing bitmap index rowid range for the corresponding indexed value, a new index entry is added and all the overheads associated with it, with a default range of just 8 rowids. This is an extremely costly process as the new index entry not only has to store the index value itself again, but additionally 2 rowids and corresponding length bytes and the necessary bitmap string column as well. Additionally, in all likelihood, the rows in the specific rowid range are quite likely to contain differing values in the index columns and so will in turn require new index entries. I’ve just inserted a whole bunch of CODE values of 1 in the above example. In a randomly inserted column value, this might be quite a rare event as each subsequent indexed value is quite likely to differ for each new consecutive row.

If we create a table and index first and then populate the table with the 1 million rows, the number of index entries and size of index itself becomes huge …

Because the clustering is so bad, each new row for a given index value isn’t within an existing 8 rowid window and so each and every row in the table requires its own bitmap index entry. As a result, the Bitmap index is huge at 5,347 leaf blocks and has a full1M NUM_ROWS. A partital block dump shows just how inefficient this index is:

We can see that each and every indexed value (col 1) in this partial leaf block dump is the same (hex c1 02) with each only having a rowid range (col 1 and col 2) that covers the bare minimum 8 row range. It’s about as inefficient a Bitmap index as you can get with 1000 distinct values in a 1M row table.

so where part ii then? i didnt know bitmap updates had changed in 10g. i’m old school and was once accustomed to having to rebuild hash keys and isams because they couldnt slot new index entries into the correct blocks when those blocks were filled, which was immediate, thus one got physical overflow, much like row chaining in tables.I’ve worked extensive testing with bitmaps in 25Tb+ DW and no performance increase in bitmaps, appends only. in the past, probably oracle9 i found entire database indexed using bitmaps :) somebody read it in the manuals. that person also built an entire app using shell scripting. i know that btree branch & leaf blocks can rearrange themselves on the fly, during DML activity, without rebuilds. can bitmaps now do the same 10g+?