Rebuilding Indexes and the Clustering Factor Quiz (One Of The Few) September 20, 2011

Today’s question has been prompted by various recent comments regarding the Clustering Factor (CF) of an index and how to change the CF requires a reorg of the underlining table.

It used to be quite a common myth that if the CF of an index was greater that “X” or based on some nonsensical formula the CF was greater than “Y”, then rebuilding the index somehow made everything better again. I believe it’s now much more commonly accepted that rebuilding an index does not change the CF of an index in any way. By rebuilding an index, the actual order of the index entries remains unchanged as does the order of the rows within the table and so the resultant CF can’t possibly change.

Pick any index, no matter how bad the CF or badly fragmented the index or table, take fresh statistics and after rebuilding the index (and fresh statistics if compute statistics on the index isn’t performed), the CF will remain the same.

However, there are nearly always exceptions …

Give an example of when rebuilding an index will significantly change the actual CF of an index, assuming 100% accurate statistics are gathered before/after the index rebuild on a table with no DML.

Here are my (sensible this time, I hope) thoughts:
Since th CF depends only on two things – the order of the key values and the order of the rowids, a rebuild must change one of these in order to affect the CF.
We’re not allowed to move the table rows around, so that leaves only the order of the keys.
A rebuild can change that order if:
a) it reverses it, as David showed, or
b) a parameter has changed that affects the ordering of the keys – NLS parameters come to mind, probably in conjunction with a function-based index
In any case, we would end up with e different index, so the rebuild would not be safe.

For non-unique indexes the rowids are part of the key, but we can affect their order.
For bitmap indexes, two rowids per entry are part of the key and can be changed via a rebuild, but the CF has no meaning there.

Sorry, there is no Oracle for Maemo, so I cannot test any of this right now.

I’ve never seen a convincing explanation for why bitmap indexes should be treated like this. It is still provably (and intuitively) more efficient to physically cluster data segment rows by bitmap indexed columns, and that ought to make a difference to the cost calculation for bitmap index plans.

Unfortunately however, the answer is incorrect and does not qualify for a number of reasons.

Basically, the DOP does not in fact change the CF, it only changes the manner in which the stats are collected, which in turn can result in inaccurate statistics, rather than a change of the CF as such.

The wording in my question was quite clear, “Assuming 100% accurate statistics”, you simply have forced Oracle to sample 100% of data but in a more efficient manner that doesn’t result in 100% accurate stats.

It might all be worth a separate blog piece to discuss, but to prove my point, simply look at the CF after you perform the rebuild as the stats are automatically recalculated since 10g and before you use dbms_stats. You will see then that the CF is totally unchanged. When you actually collect the stats, the degree has now changed to 16, which means Oracle has no accurate method to merge the CF from the parallel processes which results in a less accurate figure.

So it’s not the rebuild that changed the CF it was recalculating stats with a degree greater than 1. You didn’t need to rebuild the index to do this, a simple alter index blah parallel 16 would have sufficed.

So, no good. If I accept this answer, than I would have to accept any answer that didn’t collect 100% accurate stats.

it seems that there are already two correct answers (rebuild reverse, rebuild parallel) so I want to add a wrong one: blocksize. Or more precisely: the blocksize of the index segment. A change of the blocksize of the table segment would change the clustering factor (that would be a physical reorganization) – but the blocksize of the index segment has no impact on the clustering factor.

No, rebuilding an index with a different DOP doesn’t count as a valid answer. The index entries still end up in the same order and so you end up with the same CF. You just collect less accurate stats when you use dbms_stats but you would end up with the same CF if you looked at the stats right after the rebuild.

Assuming that the Clustering Factor can be changed by a simple index rebuild under specific circumstances, then I would guess at a table with many deleted rows in it. Prior to the rebuild but after the deletion of many rows, the index could be quite large and sparse, with most leaf blocks having very few valid entries in them. This might lead to a high (poor) Clustering Factor (a guess on my part).

Following an index rebuild, the leaf pages would now be dense and fully populated with entries, which might decrease the value of the Clustering Factor (better clustering), assuming the data in the table was physically ordered by the columns in the index.

I can also see that if “statistics were up to date” before the rebuild, then the index would still be in order with regard to the data in the table, and so the Clustering Factor might be “correct” anyway, and not change. So just a guess from me.

What might change things more is if the index allowed duplicates and there were a great many duplicate values in the table before the deletion. And the deletion deleted all the duplicates but one. Is it possible that the index with many duplicates would not have them in any particular order within the same values in the leaf blocks, and so Clustering Factor could be poor (high value). Post deletion there are no duplicates, and the index rebuild would be perfect with a good Clustering Factor (low value).

I think you’re off the mark on this, as the algorithm for clustering factor would not depend on the index being sparse — as I understand it the index is processed in a range scan from low to high and the clustering factor is incremented by 1 for every rowid that represents a different data segment block to the prior rowid.

The number of index leaf blocks scanned would not be a consideration for the clustering factor, although it could well affect the cost determination for an index range scan (and maybe a single value lookup as well, albeit indirectly through affecting the b-level).

As I said, it was a total guess on my part. “Clutching at straws” as it were, trying to quickly think up anything that might have an impact on an index’s structure when rebuilt. Good to have it confirmed that deleting rows has no impact on the index statistics, and hence Oracle’s likelihood of choosing it.

in my understanding deleted rows in the index structure should have no impact on the clustering factor because the factor only tells the cbo “how likely contiguous rows from an index range scan will point to the same table block” (Randolf Geist: http://oracle-randolf.blogspot.com/2010/01/clusteringfactor-what-if-analysis.html). In Randolf’s Blog there is a query using some dbms_rowid functions to determine the clustering factor for an index without the need to actually create the index (and there’s a similar query in Christian Antognini’s Performance Tuning book too). So the clustering factor is only a change count for table block access when range scanning the index.

Hi Richard,
I tryed to do some tests for create a couple of situation in 10.2.0.5.0.
In the first example I tryed to alter index with rebuild reverse and CF is changed.
In the second I tryed drop and recreate the table but in this case CF remains the same.
Therefore the CF can be change but in my first example in worst.
I’ve for this table 37771 blocks and a CF for the index 108557, after rebuild it’s worsened 114889

It all comes down to Oracle not being able to look at a single rowid in the index entry to see if the block id of the rowid in the next index entry changes. There are the 2 rowids that cover a whole range of possible rowids and so it’s impossible to determine the CF by simply looking at the index itself.

So Oracle basically goes here’s an index entry that covers a whole range of rowids, increment CF by 1, here’s another index entry that covers a whole range of rowids, increment CF by 1, and so on for all index entries.

The processing would have to be totally different and Oracle just hasn’t ever bothered with it, because it would be quite a tricky thing to do no doubt with some accuracy. It can’t just look at all the 1 bits and see how clustered they are because the 0s in between could be other column values or phamton rows that don’t really exist. So it would have to pop down to the table itself all the time which would be really expensive.

But I agree, it would be a useful thing to have for bitmap indexes as the “real” CF is so critical to the efficiency of using the index.

Hmmm, I’m not convinced that it would be that difficult — the internal processing is more complex of course, but the statistics gathering process doesn’t have to deal with that because it just runs SQL statements. The clustering factor can be calculated from a simple query optimised to use the index regardless of it’s internal organisation, as Randolf Geist demonstrated.

Arguably there might be a more useful way of calculating the clustering factor for bitmap indexes that takes account of a reduced likelihood of encountering range-based predicates, though, as the current method would probably be a little prejudiced against high cardinality bitmap indexes.

CBO fundamentals page 71 says “You can often reduce the number of leaf_blocks (and, very occasionally, the blevel) of an index by
rebuilding the index; but rebuilding an index has no effect on the clustering_factor”.

Page 68 describes how clustering factor is determined and based on that I am really curios to know how clustering factor can change with index rebuild and no DML activity.