I’ve already written a few posts regarding concatenated indexes and things to consider (and not consider) when deciding the column order of a concatenated (composite) index.

A comment I see from time to time is that the whole question of column order within an index is now somewhat redundant anyways as Oracle since 9i has introduced the Index Skip Scan access path. Prior to 9i, if the leading column of an index wasn’t specified in a predicate, the index was effectively ignored by the CBO. However, if the leading column isn’t referenced now, Oracle can use the index anyways via an Index Skip Scan access path.

So the column order in a concatenated index doesn’t matter that much now, right ?

Well, not quite ….

An Index Skip Scan can only actually be used and considered by the CBO in very specific scenarios and is often an indicator there’s either a missing index or an existing index has the columns in the wrong order.

If the leading column of an index is missing, it basically means the values in subsequently referenced columns in the index can potentially appear anywhere within the index structure as the index entries are sorted primarily on the leading indexed column. So if we have column A with 100,000 distinct values and column B with 100,000 distinct values and an index based on (A,B), all index entries are sorted primarily on column A and, within a specific value of column A, sorted by column B. Therefore if we attempt a search on just column B = 42, these values could potentially appear anywhere within the index structure and so the index generally cannot be used effectively.

However, what if the leading column actually contained very few distinct values ? Yes, the subsequent column(s) values could appear anywhere within the index structure BUT if these subsequent columns have relatively high cardinality, once we’ve referenced the required index entries for a specific occurrence of a leading column value, we can ignore all subsequent index row entries with the same leading column value. If the leading column has few distinct values, this means we can potentially “skip” several/many leaf blocks until the leading column value changes again, where we can again “probe” the index looking for the subsequent indexed column values of interest.
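The “probe once per distinct leading value” behaviour described above can be sketched in Python. This is purely an illustrative model of the logic, not Oracle’s actual implementation; the table data and the three distinct values of A are made up for the example:

```python
import bisect

# Hypothetical index on (A, B): entries sorted by A, then by B within each A.
# Column A has only 3 distinct values; column B is highly selective.
entries = sorted((a, b) for a in (1, 2, 3) for b in range(100_000))

def skip_scan(entries, b_value):
    """Logically probe the index once per distinct leading value: for each
    distinct A, binary-search straight to (A, b_value) and skip everything
    else stored under that A value."""
    matches, probes = [], 0
    distinct_a = sorted({a for a, _ in entries})
    for a in distinct_a:
        probes += 1
        i = bisect.bisect_left(entries, (a, b_value))
        while i < len(entries) and entries[i] == (a, b_value):
            matches.append(entries[i])
            i += 1
    return matches, probes

matches, probes = skip_scan(entries, 42)
# 3 probes, one per distinct A, each landing directly on the (A, 42) entry
```

With only 3 distinct leading values, 3 cheap probes replace a scan of the whole structure; with 100,000 distinct leading values, the same logic would degenerate into 100,000 probes.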

So if we have a leading column with few distinct values, we may be able to use the index “relatively” efficiently by probing the index as many times as we have distinct leading column values.

On the other hand, if the leading column has relatively high cardinality then an Index Skip Scan is not a viable option. An index can generally store many hundreds of index entries per index leaf block, depending on the block size and the average size of the index row entries of course. So if the leading column were to change just once on average within the subsequent index leaf block, Oracle would be forced to scan the next index leaf block anyways as it may contain the required index row entry.

For Oracle to be able to “skip” an index leaf block, the leaf block must contain nothing but the same leading column value, one that last changed in a preceding leaf block. Hopefully, there are many leaf blocks where the leading column value doesn’t change and so hopefully there are many leaf blocks that can potentially be “skipped”.

Therefore, the cardinality of the leading column is crucial for an Index Skip Scan to be viable. In the example above where we had 100,000 distinct values for columns A and B, unless the table is massive, it’s unlikely an Index Skip Scan will be viable regardless of which column is the first column in the index. However, if column B only had 10 distinct values, then an index based on (B,A) may very well be able to use an Index Skip Scan whereas an index on (A,B) would not.

Note though that an Index Skip Scan must probe the index at least as many times as there are distinct values of the leading column. This will not be as efficient as an index that only requires the one index probe. Therefore although a query with a predicate based on A=42 could use an Index Skip Scan with an index on (B,A) assuming column B had few distinct values, an index on (A,B) or (A) would be more efficient as it would only require the one index probe.

However, if the performance of index (B,A) were “good enough” and/or a search on just A=42 was uncommon, then the index on (B,A) may be quite adequate and an index on (A,B) may be unnecessary. The index on (B,A) would also be able to handle queries based on columns A and B and queries based on just column B (providing the CBO determined the selectivity acceptable, which it might for unevenly distributed rare values of column B).

No, an Index Skip Scan doesn’t mean we don’t need to consider the column order of an index. If anything, it’s something else that needs to be considered and along with index compression, is another reason why low cardinality leading index columns have advantages.

Absolutely. If all the required columns can be retrieved directly from an index, Oracle has the option of treating the index as a “skinny” version of the table. A Fast Full Index Scan (which is just an index version of a Full Table Scan) may result in less LIOs than a Full Table Scan and so is selected by the CBO.

BTW, I’m really really pleased to see someone have an idea, develop a quick demo to illustrate the idea and see whether the idea indeed holds water. Well done :)

In the event that the data is not on the INDEX, but, the searched COLUMN is in the second COLUMN of the INDEX, and all we want is one record, wouldn’t a FAST FULL INDEX SCAN plus one get to the TABLE be faster than a FULL TABLE SCAN, just because of the amount of blocks grabbed?

My test showed this not to be the case. (Though, I didn’t check block numbers; I’m not as familiar with all that.)

Looks like my copy and paste needs a little work. I also forgot to CREATE the INDEX in this example, but I did test it with an INDEX both yesterday, and again just now. The same basic: CREATE INDEX ziggy_stuff_id_code_i ON ziggy_stuff(id, code); after the second exec statement. Yet, it still does not use the INDEX.

Hi Brian, I agree that such a plan could very well be the way to go in a scenario such as this one, but as Jonathan points out, it’s not one considered by the CBO.

I suspect the reason is that it would simply be yet another thing for the CBO to calculate and consider, further complicating an already complex process, both in costing the FFIS and in the processing involved during the actual FFIS.

Also, such an execution plan would in the majority of cases be comfortably beaten by a traditional index range scan using an index with a suitable leading column. So although the plan would possibly be better than a FTS, in most cases you would want to change it anyways to use a suitable index.

But I’m only guessing out loud …

Jonathan, what can I say, except what a great book !!

However, I must say I do feel a little like I did when I was a kid back in Manchester, England after the fellowship of the ring had broken up, with Frodo and Sam going one way and poor Merry and Pippin captured by the Orcs. I was dying to know what was going to happen next but I had to wait for weeks for the next book in the series to be available from the local library …

I guess what I’m trying to say is hurry up and write “The Two Towers” ;)

Brian, Richard: “In the event that the data is not on the INDEX, but, the searched COLUMN is in the second COLUMN of the INDEX, and all we want is one record, wouldn’t a FAST FULL INDEX SCAN plus one get to the TABLE be faster than a FULL TABLE SCAN, just because of the amount of blocks grabbed?”

My theory why Oracle does not do this would be that your description contains an assumption which Oracle cannot possibly know. The assumption is in “one get to the TABLE”: even for a multi column UNIQUE index there can be potentially many matches for this type of query. As far as I understand statistics they do not provide this type of information as there are no histograms that cover multiple columns. Am I missing something?

Don’t forget, Oracle has column stats so it knows the cardinality of columns. So if the second column has on average just two occurrences of each distinct value (num of rows / distinct values), it would know there is likely only to be two visits (or whatever) to the table.

I’ll try to repeat in order to help me think and prove that I got it: assuming an index with columns (A, B). We have 20 distinct A’s and 10 distinct B’s. We look for B=x and know that on average there must be 2 index entries per B. Correct?

Well, that was way too easy – why did I not think of this? I had my mind in histograms all the way down when it is so easy. Grumble. Stupid me.

Not quite. The discussion is based on the fact we don’t know the first column A value. Therefore, the number of table visits would simply be number of not null rows divided by 10, the distinct values of B.

If we knew both A and B, then Oracle by default calculates the number of distinct combinations to be 20×10=200, and so the selectivity of an A and B combination is 1/200, unless we’ve collected extended statistics with 11g.
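As a quick illustration of that default arithmetic (the NDV figures are from the example above; num_rows is an invented table size for the sake of the sums):

```python
num_rows = 200          # hypothetical table size for illustration
ndv_a, ndv_b = 20, 10   # distinct values of columns A and B, as above

# Predicate on B alone: expected table visits = num_rows / NDV(B)
rows_for_b = num_rows / ndv_b               # 200 / 10 = 20 expected rows

# Predicate on both A and B: without extended stats, the CBO assumes
# NDV(A) * NDV(B) distinct combinations
combined_ndv = ndv_a * ndv_b                # 20 * 10 = 200
combined_sel = 1 / combined_ndv             # selectivity of 1/200
rows_for_a_and_b = num_rows * combined_sel  # 1 expected row
```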

If you have an Index Skip Scan operation, it means there is a more efficient way to access the data (i.e. just one scan of a specific index rather than several).

If the column(s) used were actually the leading columns of an index, it would be more efficient. The question that needs to be answered is would the likely performance benefit outweigh the costs of having another index, assuming the current index is being used efficiently by other processes. Or is the performance “good enough” and the extra resource “low enough” to not warrant the creation of an additional index.

Yes, you can improve performance, but is it worth doing so? It’s a question that should be asked and answered if you encounter an Index Skip Scan operation.

I have a question concerning “index skip scan”. After running a few experiments, I found out that the prefix column cardinality is not the only issue for the “Index Skip Scan”. The tests I ran are the following:

I first made a table called bbb which has 2 columns, col1 and col2, and I created a composite index idx_bbb on those two columns:
CREATE INDEX idx_bbb ON bbb(col1,col2);

And then after inserting data into the bbb table (from an empty state), I gathered table stats using DBMS_STATS.

And for checking the result, I used explain plan, with the following statement:
explain plan for select * from bbb where col2=4;

and to recheck whether the Skip Scan was chosen or not, I looked at the 10053 event’s trace file.

As you see in test 2, even though col1’s distinct values are only 3 and col2’s distinct values are 1000, an index skip scan wasn’t executed. I wonder how this could be?

Oh yeah, one more thing: when I deleted/didn’t collect statistics, I also found that “index skip scan” is not even considered in the access path calculation (after looking at the trace file). Is there any documentation that would clear this up? Does Oracle mean to do it? I couldn’t find any clue in Oracle’s documentation. Or is it just me who is having this case?

If I’m not mistaken, although not collected, Oracle would assign the statistics their default values, right? So I thought that “index skip scan” could be considered as well in the access path calculation.

1) The cardinality of the first column needs to be low so that Oracle will only need a “few” probes of the index, such that it can indeed skip leaf blocks to the point of making the index worthwhile

AND

2) The overall selectivity of the index skip scan needs to be sufficient so that its associated costs are less than the other alternatives.

The cost of the skip scan is very roughly: blevel of the index × cardinality of the first column(s) to be skipped, plus selectivity of the indexed columns × cardinality of the first column(s) to be skipped, plus selectivity of the table × clustering factor of the index.
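For readability, that rough formula can be written out term for term as a function. This is a literal transcription of the sentence above, not Oracle’s exact costing code, and all the inputs in the example call are hypothetical placeholders:

```python
def rough_skip_scan_cost(blevel, ndv_skipped, index_sel, table_sel,
                         clustering_factor):
    """Very rough skip scan cost, per the description above:
    blevel x NDV of the skipped leading column(s)
    + index selectivity x NDV of the skipped leading column(s)
    + table selectivity x clustering factor of the index."""
    return (blevel * ndv_skipped
            + index_sel * ndv_skipped
            + table_sel * clustering_factor)

# Hypothetical inputs: blevel 1, 3 distinct leading values,
# index selectivity 0.001, table selectivity 0.001, clustering factor 100
cost = rough_skip_scan_cost(1, 3, 0.001, 0.001, 100)
```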

I suspect in example 2, the table at 1000 rows is only a few blocks and is cheaper to read than probing the index 3 times to find possible rows of interest.

Without stats, certainly if Oracle resorts to the RBO, then a skip scan won’t be considered. Not sure what you mean by “Oracle would assign the statistics to their default values”? Depending on version and options, no stats on the table and index means Oracle will either use dynamic sampling or the RBO.

Thank you for the skip scan cost… As I thought, the cardinality is not the only main issue… But too bad the exact selectivity calculation remains a hidden mystery…

Oh yeah, I’m sorry for forgetting to mention the Oracle version I’m using. I’m currently using Oracle 11g on Red Hat Enterprise Linux 4, and I also verified that Oracle’s optimizer_mode remains “all_rows”, so I think Oracle doesn’t resort to the RBO (or is there any parameter that should be considered as well?)

Oh yeah, I also turned off dynamic sampling (by setting optimizer_dynamic_sampling to 0) on purpose to verify whether an index skip scan could be used without collecting statistics.

And about the “default statistics” I’ve mentioned, please refer to the link below

From what I inferred, if no statistics are gathered, Oracle would use those numbers (or formulas) as “default statistics”. That’s why I’m wondering why Oracle wouldn’t at least consider an index skip scan in its access path calculation even though there are “default statistics”.

Couple of points from your questions. In the example with 1000 rows, note the table with an 8K block size has 2 blocks. That will have a rough cost of 2 associated with it (1 for the extent map, 1 to read the 2 blocks with a multiblock read).

The index, which has a blevel of 1, will require at least a cost of 3 (to probe the index 3 times, once for each value of col1) plus at least 1 to read the table for the expected table cardinality. The skip scan loses out because it’s basically more “costly” than the full table scan: the FTS cost of 2 is less than the skip scan cost.
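The arithmetic of that comparison can be sketched in a few lines of Python, using the rough costs just described (illustrative only, not the CBO’s exact formula):

```python
# Rough full table scan cost for the 1000-row, 2-block table:
# 1 for the extent map + 1 multiblock read covering both blocks
fts_cost = 1 + 1

# Rough skip scan cost for the index with blevel 1 and 3 distinct
# values of the leading column col1
blevel = 1
ndv_leading = 3
probe_cost = ndv_leading * blevel   # at least 3: one probe per col1 value
table_visits = 1                    # at least 1 table block for expected rows
skip_scan_cost = probe_cost + table_visits

assert fts_cost < skip_scan_cost    # so the CBO picks the full table scan
```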

The second point is that for a skip scan to work, we need to have very few distinct values for the leading column. In fact, we need at least as many repeated occurrences of a value on average as we can fit index entries in an average leaf block, else we’ll need to visit each and every leaf block, which could be performed via a full index scan at the very least much more cheaply than a skip scan.
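That break-even point can be expressed as a back-of-envelope check. The entries-per-leaf-block figure and the sample numbers below are hypothetical:

```python
def skip_scan_can_skip(num_rows, ndv_leading, entries_per_leaf_block):
    """A leaf block can only be skipped if it holds nothing but one leading
    value, so on average each leading value must span at least one full
    leaf block's worth of index entries."""
    avg_repeats = num_rows / ndv_leading
    return avg_repeats >= entries_per_leaf_block

# Assuming roughly 200 index entries fit in a leaf block:
skip_scan_can_skip(1_000_000, 10, 200)       # 100,000 repeats per value: OK
skip_scan_can_skip(1_000_000, 100_000, 200)  # ~10 repeats per value: no
```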

With these default stats, the values are such that there is no way we can possibly have a cardinality for a leading column so low as to make a skip scan viable. Therefore, if these default stats are the best stats we can get, then there’s no point even considering a skip scan, as a full index scan will always, at the very least, beat a skip scan.

Where I come from ? Not many people know my nationality this fast though (LOL, just kidding)

Thanks for clearing many things up. Oh yeah, I found that when index skip scan is considered, there would be 3 values : SS sel, ANDV, and SS io.

I’m just guessing and confirming: SS sel obviously stands for skip scan selectivity, right? But how about the other two? I once saw in Jonathan Lewis’s writing that ANDV stands for Actual Number of Distinct Values; do you have any clue about how to calculate this?
And about SS io, I’m guessing it stands for Skip Scan IO, but the thing that bothers me is that it always has the same value as ANDV. I haven’t found it yet, but is there any case that would make SS io’s value and ANDV’s different?

I have an issue which is slightly more involved (index partitions and skip scan for the third index column).

The statement is “DELETE FROM T where (Y=:b0 and Z=:b1)”

Of the four indexes the only one that is expensive is the one that gets used in very rare cases:

IND_P_1 (W,X,Y,Z), global partition on all four columns

For the three other indexes, they are all of the form (…,Y,Z) and they all perform reasonably well.

I do have the 10053 trace, SQLT report, etc. which I’m trying to get my head around, but I wonder if something can be said about how Oracle goes about calculating costs for the skip scan on the THIRD column in the case of a partitioned index.

Yes (as the SQLT / Tuning Advisor says), why not create an index on (Y,Z)? Our client says no. We may try to use SQL Profiles or Stored Outlines instead, since (for the packaged app) we can’t change the code in this case.

The likely correct solution is to create the index on Y,Z. Any idea why the client says no ?

The number of distinct possible combinations of W and X will determine how many times Oracle estimates it will need to traverse the index. So if there are 10 distinct values of W and 10 distinct values of X, that’s 100 possible combinations (unless you possibly have extended stats on 11g) and 100 times the index needs to be accessed. 100 and 100 distinct values and that’s 10,000 index traverses.
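The probe-count arithmetic above is trivial but worth making explicit (assuming no extended stats, as noted):

```python
def estimated_probes(ndv_w, ndv_x):
    """Without extended stats, the CBO assumes every (W, X) combination
    is possible, so each combination is a potential index probe."""
    return ndv_w * ndv_x

estimated_probes(10, 10)     # 100 possible combinations to probe
estimated_probes(100, 100)   # 10,000 index traverses
```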

In summary, it’s a very unlikely event and beaten by the index on just the Y and Z columns.

Part of my response was what I say in my response to Gary below: num_skip_scans seems to be independent of the number of columns to skip!

Concerning the reluctance to create the really good index I do not have the explanation. Fear that something else will break I would think.

Now, in the packaged application there were no partitioned indexes. So someone must have decided to create them, probably due to some other performance problem. Maybe they are reluctant to change, thinking that what was fixed by the partitioned index will break.

Is there a tool (in 10gR2) to estimate the impact of creating another index?

I’ll check the constraints, but I would be surprised if that was the case because when run with the bad partitioned index the execution plan shows that all four partitions are scanned. Anyway, for a global hash partition I don’t see how the values would be found in a single partition?

Coming back to the choice of the optimizer, here is a part of the 10053 trace file (from an execution where one of the three good indexes was chosen):

I’m surprised to see that the number of skips, which is very close to the cost, is the same in each case in spite of the fact that for the bad index there are two columns to skip and for the three others only one. Btw, for the bad index the second column does have 2929 distinct values, as is true for the first column of the other indexes (not the same column though). The first column of the bad index has in addition around 1800 distinct entries.

The problem is on a 10.2 database. Thinking maybe that the number of skips for the case of multiple leading columns was an evolving feature, I devised a simple test on an 11.1 test instance. But again the 10053 showed the same number of skips independent of the number of columns to skip.

To make things clear, when run with the bad index the DELETE goes from < 10 seconds to 5-15 minutes and the number of gets and reads is very much higher. Clearly the estimated cost is way off.

I’m packing for yet another trip so I don’t have the time at the moment to investigate this properly. However, on the surface it does look odd. Using only the first column to determine the number of skip scans it needs to perform makes little sense, as the third column of interest can potentially appear anywhere within the range of the first column due to the fact everything is ordered within the first column by the missing second column. Therefore, in theory, no index leaf block can be skipped, making the exercise pointless (and very inefficient).

However, I need to have a play and investigate. On what column(s) is the table partitioned ? Also, are you saying that the other 3 indexes only have the one leading missing column and it’s just the index with the 2 missing columns that’s problematic ?

This article is excellent.
Have you ever encountered situations where execution plans have changed from Full Table Scan to Index Skip Scan access after an 11g migration? Is the COST of a Skip Scan cheaper in 11g than in 10g?

I can’t say I have noticed a difference with 11g but, that said, I very rarely come across Index Skip Scan execution plans. So my lack of visibility in this area is certainly no indication that there have been no changes.

Do you have an example of where such a change has taken place and the relative costs between 10g and 11g (and which version of 11g)?