October 23, 2012

Skip Scan

A recent question on OTN asked how you could model a case where Oracle had the choice between a “perfect” index for a range scan and an index that could be used for an index skip scan and choose the latter path even though it was clearly (to the human eye) the less sensible choice. There have been a number of wierd and wonderful anomalies with the index skip scan and bad choice over the years, and this particular case is just one of many oddities I have seen in the past – so I didn’t think it would be hard to model one (in fact, I thought I already had at least two examples somewhere in my library – but I couldn’t find them).

Take a data set with two columns, call them id1 and id2, and create indexes on (id1), and (id2, id1). Generate the id1 column as a wide range of cyclic values, generate the id2 set with a small number of repetitive values so that a large number of physically adjacent rows hold the same value. The clustering_factor on the (id1) index will be very large, the clustering_factor on the (id2, id1) index will be relatively small because it will be controlled largely by the repetitive id2 value. Here’s the data set:

If you examine the code you will see that id2 repeats each value 10,000 times, and as it does so id1 ranges from 0 to 9,9999; this means the index on (id2, id1) will essentially be listing the data in exactly the order it appears in the table, while each entry in the index on (id1) will either be jumping 10,000 rows down the table from the previous one, except of the cases where it jumps from the end of the table to the start of the table. As a quick sanity check on the indexes I’ve created – it can be quite easy to fool yourself that you’ve done what you’ve want when the indexes prove otherwise (for reference, there are 3,872 blocks in the table):

Now all we have to do is run a suitable query – so here’s a simple query that does the “wrong” thing (in 10.2.0.5). I’ve shown the query (unhinted, then hinted to use the “correct” index) with the output from autotrace, but the plan that appears from autotraceis the same plan that actually gets executed at run-time. I’ve disabled CPU costing to reduce the risk of variation in results, but I see the same effects when any reasonable values for system statistics are enabled. The text is a straight cut-n-paste from an SQL*Plus session

I have to say I was expecting to have to play around a little bit with unpeeked bind variables, or scalar subqueries in the where clause to get Oracle to pick the wrong index, but in the latest version of 10.2.0.5 this wasn’t necessary. (I chose this version, by the way, because the question on OTN related to 10gR2.)

If you think carefully about the arithmetic of the skip scan execution plan, you’ll notice a couple of oddities. The cost of the index probe is 288, but the optimizer should know from the column statistics that there are 100 distinct values for id2 and might, therefore, calculate the cost as 100 (plus a bit), because it will have to find 100 leaf blocks.

The second oddity is that the cost of finding the estimated 299 rows in the table seems to have been calculated by applying the table selectivity (derived from the predicate id1 between 501 and 502) to the clustering_factor of the (id2, id1) index – which isn’t a good idea given the extreme difference between the two available clustering factors. Without looking at the (id1)clustering_factor, perhaps Oracle should have taken the cost of (id2 = constant and id1 between 501 and 502) and multiplied the result by the number of distinct values of id2.

If you’re wondering why the estimated cardinality is (nearly) 300, by the way, this is simply the standard selectivitity calculation: (range required by predicate)/(total range of column) + 1/num_distinct + 1/num_distinct that you get from a between clause. Of course, once you’ve got a model – especially a model that seems to do the wrong thing – you might as well run it on as many sensible versions of Oracle as possible to see how things change over time. My example didn’t change on 11.1.0.7, but on 11.2.0.3 the cost of the skip scan increased slightly, just enough to make the “expected” range scan appear. I didn’t bother to pursue this in detail, butI did have a couple of other scenarios to play around with that I might write up some time soon.

I’ll leave you with one interesting thought. The skip scan is, in effect, a query that probes the index for every possible value of the leading edge – so what to do you think the costs will show if I write a query that explicity runs my range scan on id1 for every value that is currently in id2. (Logically it’s not the same query, of course, but coincidentally it is temporarily equivalent.)

SQL> set autotrace traceonly explain
SQL>
SQL> select
2 *
3 from t1
4 where
5 id2 in (
6 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
7 10,11,12,13,14,15,16,17,18,19,
8 20,21,22,23,24,25,26,27,28,29,
9 30,31,32,33,34,35,36,37,38,39,
10 40,41,42,43,44,45,46,47,48,49,
11 50,51,52,53,54,55,56,57,58,59,
12 60,61,62,63,64,65,66,67,68,69,
13 70,71,72,73,74,75,76,77,78,79,
14 80,81,82,83,84,85,86,87,88,89,
15 90,91,92,93,94,95,96,97,98,99
16 )
17 and id1 between 501 and 502
18 ;
Execution Plan
----------------------------------------------------------
Plan hash value: 2879882323
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 296 | 6512 | 102 (0)| 00:00:02 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 296 | 6512 | 102 (0)| 00:00:02 |
|* 3 | INDEX RANGE SCAN | T1_I2 | 297 | | 100 (0)| 00:00:02 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("ID2"=0 OR "ID2"=1 OR "ID2"=2 OR "ID2"=3 OR "ID2"=4 OR "ID2"=5
OR "ID2"=6 OR "ID2"=7 OR "ID2"=8 OR "ID2"=9 OR "ID2"=10 OR "ID2"=11 OR
"ID2"=12 OR "ID2"=13 OR "ID2"=14 OR "ID2"=15 OR "ID2"=16 OR "ID2"=17 OR
"ID2"=18 OR "ID2"=19 OR "ID2"=20 OR "ID2"=21 OR "ID2"=22 OR "ID2"=23 OR
"ID2"=24 OR "ID2"=25 OR "ID2"=26 OR "ID2"=27 OR "ID2"=28 OR "ID2"=29 OR
"ID2"=30 OR "ID2"=31 OR "ID2"=32 OR "ID2"=33 OR "ID2"=34 OR "ID2"=35 OR
"ID2"=36 OR "ID2"=37 OR "ID2"=38 OR "ID2"=39 OR "ID2"=40 OR "ID2"=41 OR
"ID2"=42 OR "ID2"=43 OR "ID2"=44 OR "ID2"=45 OR "ID2"=46 OR "ID2"=47 OR
"ID2"=48 OR "ID2"=49 OR "ID2"=50 OR "ID2"=51 OR "ID2"=52 OR "ID2"=53 OR
"ID2"=54 OR "ID2"=55 OR "ID2"=56 OR "ID2"=57 OR "ID2"=58 OR "ID2"=59 OR
"ID2"=60 OR "ID2"=61 OR "ID2"=62 OR "ID2"=63 OR "ID2"=64 OR "ID2"=65 OR
"ID2"=66 OR "ID2"=67 OR "ID2"=68 OR "ID2"=69 OR "ID2"=70 OR "ID2"=71 OR
"ID2"=72 OR "ID2"=73 OR "ID2"=74 OR "ID2"=75 OR "ID2"=76 OR "ID2"=77 OR
"ID2"=78 OR "ID2"=79 OR "ID2"=80 OR "ID2"=81 OR "ID2"=82 OR "ID2"=83 OR
"ID2"=84 OR "ID2"=85 OR "ID2"=86 OR "ID2"=87 OR "ID2"=88 OR "ID2"=89 OR
"ID2"=90 OR "ID2"=91 OR "ID2"=92 OR "ID2"=93 OR "ID2"=94 OR "ID2"=95 OR
"ID2"=96 OR "ID2"=97 OR "ID2"=98 OR "ID2"=99) AND "ID1">=501 AND "ID1"<=502)

Unfortunately the costing isn’t consistent with the skip scan costing.

Footnote: I haven’t looked at the 10053 trace files for these plans – they take a lot of time to read and don’t often help very much, so it’s only the sort of thing I do if I think it’s necessary to solve an important production problem or (if I’m doing some R&D) it looks like it might be entertaining.

Related

Most recently, I was sorry that “skip scan” uses are too limited. For example, it would be great to use “skip scan” if needed to get distinct id2 from your test table T1 or Top N id1 for each id1, but the oracle has no such mechanism yet.
Although I found a tolerable workaround for this with recursive subquery factoring, but it would be much better able to cbo it myself. I wrote about it here: http://orasql.org/2012/09/21/distinct-values-by-index-topn

I have logged a SR on MOS to address clustering factor when index SS access path used, couple of weeks ago. Applying “table_sel * CF” is really wrong idea to determine the cost of finding rows in table blocks. I have suggested development to use MIN(“table_sel * CF(t1_i2) + NDV(id2), table_sel * table_num_rows) instead. Your idea is very simple and probably much smarter, however there is probably reason oracle is not using this straightforward formula.

There is a lot of changes in IDNEX_SS costing in 11gR2. The fix 9195582 (introduced in 11.2.0.2) caused performance issue of our application, since estimate for index scan (but only index part) will be no more than leaf blocks. So it decreased (probably in correct manner) cost of index blocks visits, but made CF effect even worse.

Interesting also to note that the index range scan operation necessitate 100 more logical reads (301) when compared to the skipping operation of the same index (210). It might be due to the number of time (Starts=100) the index range scan has been started while the index skip scan has been started only once. Knowing that the index skip scan is more expensive because it requires a special pinning and it has to go up and down the branch levels of the index, the above situation is not suggesting such a conclusion. Isn’t it?

I have also tried to compress the index T1_I2 to see if, for the original query, the index range scan will be chosen automatically but it hasn’t.

One of the problems of creating small tests to demonstrate a principle is that the sample data may introduce boundary conditions if you try to use the same data to extend the investigation. The contradiction between predicted cost and the actual buffer gets on the two cases arises, in part, because of the differences between the generic range scan model and the specific data set I’ve constructed; and in past because of the differences between the skip scan model and its physical implementation. (Physical implementation, of course, explains why the cost model for subquery filters can be very misleading – the optimizer doesn’t allow for scalar subquery caching.)

The difference in buffer gets is becausse of the root and branch pinning that takes place as you run the skip scan and travel up and down the index; so the block visits are still there, but they don’t cause the same level of latch activity that the iteration does, and aren’t “gets”. It would be instructive to finish off the comparison by creating a “driving table” just the 100 rows in it to drive a nested loop join and see how that affected the cost of probing the table 100 times.