It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ? February 13, 2008

A common myth or mis-perception is that when deciding how to order the columns in a concatenated, multi-column index, one should avoid placing low cardinality columns in front.

For example, if you want to create an index on two columns, column ID which has many many distinct values and column CODE which has very few distinct values, create the index as (ID, CODE) as it’ll be far more efficient than a corresponding index on (CODE, ID).

The reasoning goes that by creating the (CODE, ID) index, one decreases the performance and efficiency of using the index as Oracle will have to scan through multiple index leaf blocks containing the low cardinality column, until it eventually finds the specific index entries of interest.

Or so the theory goes …

In actual fact, there’s no real difference in navigating to the specific leaf block of interest for an index on (ID, CODE) compared to an index based on (CODE, ID), providing both indexed columns are known.

The important fact that’s missed is that the branch index entries contain column entries based on all indexed columns, or at least on as much as is necessary to uniquely identify the required navigational path. Therefore, Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know.

The only slight overhead that an index based on (CODE,ID) will have is that these branch index entries are going to be somewhat larger as it will likely require both columns for the branch index entries but likely only the one column the other way around. However, branch blocks usually take up a small percentage of the overall index structure and this (usually) minor overhead is very unlikely to make a difference to the index height.

This demo on Index Column Cardinality Order shows how Oracle navigates to a specific leaf block of interest in the same manner and with the same costs, regardless of the ordering of low and high cardinality columns in the index. It also shows and describes a couple of index branch block dumps to highlight how Oracle uses the column values to define the necessary navigational path.

So the high cardinality column shouldn’t necessarily be the column given leading column status.

In actual fact there are a number of good reasons why the low cardinality column could be considered as the better option as the leading column. For a start, the index can be compressed much more efficiently if the leading column has lower cardinality. Also, an Index Skip Scan can at least be considered if the leading column has lower cardinality.

Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration, as is the clustering factor of the leading column.

All good discussions for another day🙂

Share this:

Like this:

Related

But picture changes if you want find by ID column only (SELECT * FROM ziggy_stuff WHERE id = 4242). Index by (ID, CODE) produced the same 6 CRs, but 22 CRs when index build by (CODE, ID), due Index Skip Scan. Therefore, in many cases preferable to place high cardinality columns in front.

As I clearly stated, “Oracle can directly navigate to the leaf block of interest, no matter the index order, providing all index column values are know“.

I also clearly stated “Of course, the ordering of columns in an index can be very significant and can make a huge difference to the possible efficiency of an index for other key reasons as well. Whether the leading column is always going to be a known value is an important consideration”

The issue you’ve raised has nothing really to do with the cardinality of the column and everything to do with the fact in the “bad” performing example, the leading column is unknown and Oracle is forced into the relatively inefficient Index Skip Scan.

If the leading column is unknown, the index will be either inefficient at best or totally useless at worst, regardless of the cardinality of the columns.

In many (if not all cases) it’s preferrable to place “columns with a known value” in front …

Therefore, you only have 10000 index entries which only requires 31 leaf blocks which can all be referenced by just the one branch block (which therefore happens to be the root block).

If you look in dba_indexes, you’ll find the index will only have a BLEVEL = 1, meaning the index only consists of the root block and the 31 leaf blocks. There are no intermediate branch blocks in your index.

What about updates? for example I have a composite index with 3 columns in a 50M table. ucol1+ccol2+scol3
ucol1=unchanged column, primary key
ccol2= changed column that changes by batch 300k to 500k rows
scol3= sometimes changes, 1 row at a time by OLTP

most of the queries are using col1+col2+col3, Iam trying to tune the batch update and am thinking that we should move col2 as the last column in the index. ccol2 is a status col so goes from created to executed to completed, 90% of the rows will be completed.

Do you think moving col2 as last in the order will help the batch updates? not 100% sure as i think it will always delete/insert index block rather than update…what do you think, will it help batch updates? if so why? if not why? Thank you

I’m just about to pop on a plane so sorry for the short response but I would suggest it wouldn’t make much different either way as in both scenarios, the entire index entry is having to be deleted and re-inserted elsewhere with the index structure (even if it’s logically within the same leaf block).

It’s one of those things you could always benchmark although the option of dropping / marking unusable the index and then rebuilding after the batch process, may improve the batch process by enough to justify the additional overheads of such a move.

I think that you will find several index myths repeated in Mr. Burleson’s latest Definite Reference book. Could you devote a couple of articles to address those myths – which would be helpful information in the event that someone searches for supporting evidence for what is printed as fact in the book? It appears that much of the coverage of indexes in the book starts at page 906:http://books.google.com/books?id=hiOhVSO-EFcC&pg=PA906

There is also a bit more starting at page 521.

For example:
On page 982, we find that a certain way for a DBA to get fired is to waste computing resources due to keeping an excessive number of indexes.
On page 985, we learn that a composite index cannot be used to police a primary key constraint.

Blimey, that has got to be the funniest read I’ve had in a long time !! The number of errors and inconsistencies are just staggering, it reads like some kinda tragic comedy.

I think my favourite bit is where he recommends dropping a so-called duplicate index which could very well result in the crippling of the associated database. I love to know what a “unique foreign key” is😉 I also love how he recommends on one hand caching all your indexes in a 32K cache and later says databases really should be as much as 60% indexes, meaning you should of course simply just cache 60% of your entire database but then says random I/Os waste memory and so store such tables in a 2K block whereas it’s perfectly OK to waste memory for the random I/Os for all your indexes !!

The bit where he has a pretty diagram listing all the factors that influences Oracle’s choice of index (such as raid level and striping) but forgets to mention index stats is a classic.

I actually dispel many of the myths listed in the book already in previous posts (large block size myth, bitmap low cardinality myth, etc. etc.) although perhaps it might be worth going back to basics with some of the elementary errors (such as how pctfree and freelists really work for indexes, PK and indexes, etc. etc.) for newbies who might read the book and know no better.

Based on what I’ve read, I give it a 1/2 star out of five, the 1/2 star only because of its comic value😉

Thank you for taking some time to examine the technical accuracy of the index related content in that book.

I agree that you have already addressed many of the false statements in previous blog articles and in Usenet posts dating at least as far back as 2003 (yes, some of those mistakes pointed out in Usenet posts from 2003/2004 were copied into the book). I assume that suggests that at least one person working with Oracle Database still believes those points to be true, and hopes to convince others of the same by printing the information in book form.

Your posts have a very fluid, logical, consistent, and easy to follow writing style. If you have the time, I think that it would be a beneficial to the Oracle community if you were to write a back to the basics series of blog articles.

I can’t believe someone who had the correct explanation detailed to them years ago (such as how Oracle leaf blocks actually split and that no, it’s not possible for an index to have a greater depth in one part of the index from another part of the index) would still repeat this type of nonsense😦

Page 727: “Hence an Oracle index may have four levels, but only in those areas of the index where the massive inserts have occured“.

And:

“Note Oracle indexes will spawn to a fourth level only in areas of the index where a massive insert has occured, such that 99% of the index has three levels but the index is reported as having four levels“.

What rubbish and I explained why to him way way back in Feb 2003 when he made the same mistake: