The CBO and Indexes: OPTIMIZER_INDEX_COST_ADJ Part I July 8, 2009

In the previous entry regarding The CBO and Indexes, we saw how the CBO with a query that selected 5 distinct values in an IN list, representing 5% of the data, decided to use a FTS because the costs of doing so were less than that of using a corresponding index. These costs (using the I/O costing model) represented the number of expected I/Os and the FTS was basically going to perform fewer I/Os than the index. Less I/Os, less cost and so the FTS was selected as the preferred access path.

However, by default, the CBO when determining these costs via the I/O costing model makes two very important assumptions which may not necessarily be true.

Assumption one is that all I/Os are likely to be “physical I/Os” which all need to be costed and taken into account.

Assumption two is that all I/Os are costed equally, even though the size of a multiblock I/O performed typically during a FTS is larger and so potentially more costly than a single block I/O usually associated with an index access.

Today, I’m only going to focus on this second assumption.

Now, when performing and processing data from a multiblock I/O as performed during a FTS operation, it’s typical for such operations to be more resource intensive than that of a single block I/O as performed during an index range scan, as the associated overheads are likely be greater such as having to read more actual data off the disk, having to transfer more data into the SGA, having to process more data in each associated block, etc.

Therefore, not all I/Os are equal. However, by default the CBO ignores all these possible differences and costs all I/Os associated with a FTS (multiblock) and an index (single block) as being equivalent or the same.

Now, this hardly seems fair or indeed accurate and desirable when determining the true cost differences between an index and a FTS. Shouldn’t the fact that a single block I/O is likely to be less resource intensive and take less elapsed time to process be taken into consideration when determining these relative costs ?

Enter the optimizer_index_cost_adj parameter.

The purpose of this parameter is simply to “adjust” the corresponding costs associated with an index to (hopefully) more accurately reflect the relative I/O costs between using an index and a FTS. If for example a single block I/O only takes 1/2 the time and resources to perform compared to a multiblock I/O, shouldn’t these associated I/O cost differences be reflected when determining whether or not to use an index and perhaps reduce the index related costs by 1/2 as a result ?

This parameter has a very simple impact on how the CBO costs the use of an index based access path. It takes the value of the optimizer_index_cost_adj as a percentage and adjusts the cost of an index related range scan access path to only be the corresponding percentage of the total index cost. By default, it has a value of 100 meaning that a single block I/O is 100% when compared to that of a multiblock I/O which in turn means that the index related I/O costs are treated the same as that of a multiblock FTS I/O. A default value of 100 therefore has no effect on the overall cost of using an index related access path.

However, if the optimizer_index_cost_adj only has a value of (say) 25, it means that all single block I/O are only 25% as costly as that of a multiblock I/O and so index related range scan costs are adjusted to be only 25% of that of the total index access path cost.

Going back to the previous demo where the FTS was selected, I calculated the cost of using the index when retrieving the 5% of data to be:

We notice of couple of key differences. The first obvious difference is that the plan has changed and that the CBO has now decided to use the index. The second difference is the associated cost relating to the use of the index. Previously, it was calculated as being 80 but now it only has a cost of 20. The maths is pretty simple as with an optimizer_index_cost_adj = 25, we need only mutliply the previous total with 0.25:

Note also that just the index range scan cost component was previously 2 + 5 x ceil(0.01 x 602) = 37, but is now also adjusted to 37 x 0.25 which rounds to 9.

Basically by setting the optimizer_index_cost_adj = 25, we have effectively reduced the overall cost of using the index based execution path down from 80 to just 20, to just 25% of the previous total index cost.

The cost of the FTS remains unchanged at 65. The index access path at just 20 is now less than the FTS alternative and so the index is now chosen by the CBO.

Yes, all these numbers and costs make sense when one understands how the CBO performs its calculations and the effect of setting the optimizer_index_cost_adj parameter to a non-default value.

The optimizer_index_cost_adj parameter can therefore obviously have a very significant impact in the behaviour and subsequent performance of the database as the CBO will reduce (or maybe increase) the actual costs of index related access paths by the percentage denoted in the optimizer_index_cost_adj parameter. It can potentially dramatically increase (or decrease) the likelihood of an index access path being chosen over a FTS.

There are typically 3 very different ways in which this parameter is set, which I’ll list in increasing order of preference.

1) Set it arbitrarily to a very low figure such that indexes reign supreme as their associated costs get adjusted to such a low figure by the CBO that a FTS access path has little chance of being chosen (for example, here’s a suggestion to set it to a magical value of 12). Generally a very bad thing to do in any database …

2) Set it to a value that the DBA determines is an approximate percentage of the costs associated with a single block I/O when compared to a multiblock I/O. An improvement of option 1), but I still prefer the next option 3) …

3) Leave it at the default value of 100 such that it has no impact and the CBO does not use it to adjust the cost of an index access path

I’ll explain in Part II a sensible approach in setting the optimizer_index_cost_adj parameter and why option 3 is the preferred option with any currently supported version of Oracle.

Related

Indeed. All examples thus far have deliberately used the IO costing model. Chat about system stats and the CPU costing model to come. Note though that the impact in index costs is generally minimal although the same might not be the case for the FTS …

However, the test isn’t necessarily so simple. Remember you potentially have many many tables and many many indexes, living on all different tablespaces and parts of your file systems, with all sorts of different loads at different times of the day or week.

Therefore, in a “simple” test, how do you ensure that the cost of a particular single and multiblock I/O is really indicative of your system ? What if that particular index or table lives on a slower disk or on a slower part of a disk. What if during the test, the disk was particularly hot but not so during most of the day. What if a particular table is flushed frequently and remains in storage cache but not so other I/Os. Etc.Etc.

Therefore, you need to average these things out to get a “typical”, “average”, “general” value of the relative costs of these sorts of I/Os.

And that’s the problem with a “one value for the entire system”, generic value for these sorts of things. Hopefully, it’ll be close enough for most scenarios so any test or measurement needs to be global and general enough to not be unduly impacted by any specific issue.

I see. So in order to be accurate, it would need to be an ongoing test, averaging over a period of time. Similar to line conditioning performed by some DSL/Cable modems to find appropriate window size and whatnot.

That’s right. It’s not going to be a perfect scenario, it can’t be when you have just one set of global values but the greater the sample, the more accurate the “average” , indicative result.

However of course, when you take these measurements are important as well as I/O characteristics may change as the database load and processes change. So systems stats taken at night (during batch processing loads) might not be as indicative as when they’re taken during the day (during more OLTP processing loads).

So you might need different system stats for different database processing loads.

Because not only the index part was made cheaper the optimizer now chooses and index range scan which is infact silly because all the rows are retrieved and so all the table blocks are accessed anyway.

I disagree. I think it’s quite a good way to calculate the cost but you have given a good example of what feeding the CBO bad information can do.

A few key points to your example.

Firstly, the index obviously has a really good clustering factor as the total cost of your second plan with the index is just 31 which would have been around 310 with the full costings in place. A cost of 310 to read a 50,000 row table means the CF is excellent and the index would likely only have to read each block 1 or 2 times at most.

Next point is that the effective db_file_multiblock_read_count must be quite low. The FTS has a cost of 63 which means it’s only reading about 800 rows or so per I/O which in turns means it’s not reading too many blocks per multiblock I/O, I estimate about 8-9 blocks.

Thirdly, you have set a really really low optimizer_index_cost_adj to just 10.

The problem here is not that the way the CBO is costing things is wrong, but that when you have an index with a really good CF AND a db_file_multiblock_read_count that is relatively low AND a likely inaccurate optimizer_index_cost_adj that is way under-estimating the relative costs of a single block I/O vs. these smallish multiblock I/Os, the CBO is going to make the wrong decision.

If by reading the table with a FTS the CBO is going to use 63 I/Os that are 10 times more expensive than the 310 I/Os used by the index plan, then the CBO is going to think it quicker to use the index.

It’s that simple, the CBO doesn’t have the smarts to say but wait, we’re reading the whole table with methods anyways.

The fix is to set perhaps a larger and more effective db_file_multiblock_read_count AND to set the optimizer_index_cost_adj parameter to a more suitable value (or leave both values alone and let system stats and the defaults look after themselves).

Then the CBO will calculate the more appropriate costs with the formulas.

The second formula is basically the formula for the optimizer_index_caching parameter, which I’ll discuss in my next blog post. It truly only impacts I/Os associated with just the index part of the execution path as it only considers the caching of index related blocks.

However the I/Os associated with accessing the table blocks when using an index path are all also single block I/Os, that’s why all the associated I/Os costs are adjusted for the optimizer_index_cost_adj parameter. It therefore adjusts the costs of the index access path, not just the I/Os associated with index blocks.

The parameter could possibly be called the optimizer_singleblock_io_cost_adj and it would perhaps make more sense🙂