Bitmap Indexes and Not Equal Part II (Sheep) July 7, 2011

An excellent comment/question by mdinh made me realise my demos in Part I might be a little extreme in returning 0 rows and perhaps give the false impression that Not Equal conditions are only considered or applicable if no rows are returned. This is not the case and with the bitmap index now considered with Not Equal conditions, the choice of whether or not to actually use the index as usual comes down to the comparative costs associated with the available plans.

So, I’ll expand on my demo a tab by introducing a new value for the FLAG column:

OK, so now we have some 100 rows which have a value of FLAG which are not equal to 42, which are evenly distributed among all 5 CODE values. I’ve created a histogram however on the FLAG column as the 2 values (1 and 42) are not evenly distributed.

We notice a few key points. Firstly, as we have a histogram on the FLAG column and the data is perfectly evenly distributed among the CODE values, the CBO has got the estimated cardinality of 20 rows spot on. So all things being equal, we can have some confidence the CBO has done the right thing and selected the most efficient execution plan.

We also notice that the cost has now gone up considerably to 46 (up from 3) but it’s still significantly less than the cost of 761 associated with a Full Table Scan. Therefore, the CBO has still chosen the same execution plan with the two bitmap indexes returning the 20 rows, as it did when it returned none in the previous example.

Yep, cheap with Bitmap Indexes says the CBO. Note the difference here though is that the CBO uses a BITMAP MERGE to first get all possible rowid values of FLAG that are > 42 and then uses a BITMAP AND operation in combination with the CODE Bitmap index to get all rowids that match from both Bitmap indexes. However, as it evaluates the Bitmap Index on the FLAG index first and there are no index entries with a value > 42, it doesn’t have to actually worry about the CODE condition as no rows can possibly be returned. Therefore a very tiny 2 consistent gets are all that are necessary.

The following will looking for anything < than 42, remembering we now have 20 rows that meet this condition:

Yep, also cheap with Bitmap Indexes, using the same plan as the previous > than example but using more consistent gets as there are a number of rows that need to be accessed this time (although all in the same data block).

We now get a combination of both previous plans, concatenated together. Note in this case, it’s actually a cheaper and more efficient alternative to the first Not Equal example. Having got the setup for this demo, you can of course create the same demo yourselves and have a play and experiment. Me, I’m now off to watch Cadel Evans win the Tour De France :)

With 2 distinct values, it doesn’t matter whether you specify 2 or 5 as Oracle will only use as many buckets as there are distinct values and ignore the rest. Personally, I think it’s actually good practice to specify a few more than necessary in case an extra value or two sneaks in someday and you suddenly start using height based histograms that could change things unexpectedly.

Oracle [determines] the maximum number of rows that could potentially fit within a block and assigns a bit for every possible rowid (I’ll expand on this point in my next post).

Will that explanation be in the next, next post? I’m very interested in understanding that mechanism. With variable length columns being the norm, it seems like the difference between the actual number of rows/block and the theoretical rows/block would be quite different. Right? I know you mentioned a lot of the consecutive zeros would get compressed but why would they all be consecutive? Wouldn’t the theoretical rowids be evenly interspersed between existing rowids?

Ok, so I thought that the “row” portion of the rowid was an offset and therefore the “unused” rowids could exist between actual rowids but if that portion is a counter then of course they will all be at the “end”. i.e. a block COULD hold 100 rows but actually holds 25, there will be zeros for potential rowids 26-100. Is that right?