The CBO and Indexes: Introduction Continues … June 15, 2009

OK, I previously briefly covered how the CBO calculates the basic cost of an index range scan. Yes, those cardinality/rows values in the execution plans are vitally important as they highlight whether or not the CBO has used the appropriate selectivity values in the index costing formula. And yes, the associated cost values are meaningful and potentially useful as they determine the actual costs associated with the execution plan in terms of the expected number of I/Os the CBO estimates will be required (when using the IO costing model and often the CPU costing model as well).

I’m just going to look at another example now using the same table setup as before, but this time running an SQL query that has 5 distinct values in an IN list predicate on our demo table (again, follow the link to see the query and formatted execution plan).

The first thing we notice in this example, is that Oracle has decided to use a FTS rather than use the index on the ID column. Considering we’re only after 5 values out of the possible 100 values, some may not see this as expected behaviour, especially considering the index has such a good Clustering Factor. Basically Oracle is deciding to access each and every block below the HWM of the table, retrieving all 100% of the rows in the table, only to ultimately discard 95% of them.

It certainly appears at first glance to be a more “costly” option than using the index to directly access just the 5% of rows we’re interested in …

The first thing to check is the estimated cardinality figures, to see if the CBO has miscalculated the expected number of rows it needs to retrieve. However, as the statistics have just been fully computed and that the ID column has perfectly even distributed values, we notice the cardinality figures are again spot on. The query returns 10,000 rows and indeed the rows estimate in the execution plan is exactly10,000 rows. The calculation is simply 0.01 (density of column) x 200,000 (rows) x 5 (values in select list) = 10,000.

Let’s now calculate the cost of using the index using our index costing formula, using the CEIL function this time😉

So the cost of using an index range scan to retrieve 5% of the rows in our example comes to a total of 80.

If we look at the cost of the FTS in our explain plan in the above link, we notice the cost is just 65. 65 is less than 80, so the FTS wins.

So how did the CBO come to a cost of just 65 when it has to read all 659 blocks in the table ?

Well, the first missing piece of information is the value of the db_file_multiblock_read_count parameter because this governs how many blocks Oracle will attempt to read within a single logical multiblock I/O call. Remember, when performing a FTS, Oracle knows it has to read all the table related blocks below the HWM and so rather than reading one tiny little block at a time, it does so more efficiently by reading multiple blocks at a time. This is the fundamental advantage of the FTS over the index range scan which can only ever access the one block at a time.

The next thing to note is that it’s very unlikely that Oracle will actually read the full 16 blocks at a time as there are a number of factors that prevents this from occurring. Extent boundaries is one classic example (a multiblock read can not span across extent boundaries) but the more common issue is a block within the table already being stored in the buffer cache. Rather than storing the same block at the same consistent point twice in memory, Oracle breaks up the multiblock read and only reads up to the block that is already cached in the buffer cache. Therefore, for Oracle to actually read the entire table using the full 16 block multiblock I/Os, it would mean there are no cached blocks from the table currently in the buffer cache, an unlikely event.

Therefore, Oracle doesn’t use the full 16 value when determining the number of expected multiblock I/Os, but a modified “fudge” value which equates to approximately 10.4. for a MBRC of 16. Again, Jonathan Lewis in his excellent “Cost-Based Oracle Fundamentals” book discusses all this is some detail.

Remember also that Oracle needs to access the segment header as part of a FTS as I explained is some detail in my “Indexes and Small Table” series. So that’s an additional single block I/O on top of the multiblock I/Os.

The 65 cost for the FTS does make sense when one understands a little how this value is derived by the CBO …

As the FTS can read big chunks of the table at a time whereas the index range scan can only read each necessary block one at a time, the FTS can indeed read the table and retrieve the required 5% of data in fewer LIOs and so has the lesser associated cost than the index.

Now there are a few issues with all of this. Firstly, is the db_file_multiblock_read_count actually a valid and correct setting as this directly impacts not only the actual size of the multiblock read operations but critically, the associated costs relating to FTS operations (and indeed Fast Full Index Scans as well) ?

Also, is it really correct and valid to assume the cost of a multiblock I/O to be the same and equal to the cost of a single block I/O ? Surely, the process of performing a single block I/O is likely to be “cheaper” than that of a multiblock I/O and yet the CBO treats both types of I/Os as having the same fundamental “cost”.

Also the CPU overheads of having to access each and every row in each and every block is likely going to be more significant than the CPU required to access just specific data from specific blocks when using an index.

Perhaps, the more “expensive” index range scan might actually be a better alternative than the FTS if these factors were taken into consideration ?

Now this may indeed be true, if these factors were correctly taken into consideration. However, this may also indeed be quite false and the FTS may really truly be the better and more efficient alternative and attempts to force the use of the index may be inappropriate and ultimately more expensive.

I’ll next discuss some really bad (although still very common) methods of making the CBO favour indexes, using generally inappropriate so-called “Silver Bullets” …

Share this:

Like this:

Related

Hi Richard
Very good note. But I am a bit confused with the closing paragraph. It appears that both FTS or Index scan can be favored depeding on certain conditios (and you have arguments in favor of both and I understad them). But how does the optimizer understand these? It uses the formulae above to determie them. But again, I can cheat the optimizer to use a FTS instead of a Index scan (by increasing hte db_multiblock_read_count).
I am stating the obvious here probably, but is it not b est to gather stats and let optimizer choose the best execution plan. Can I conclude that from this because in your final paragraphs you seem to have arguments in favor of both FTS and Index scan.

That’s correct, either the FTS or the Index Scan might actually be the more appropriate (in terms of less resources and/or less response time).

Providing the database is set up with the most appropriate db_file_multiblock_read_count AND the CBO has a reasonable value by which to base an average sized mutliblock read AND the CBO has a reasonable idea on the comparative costs between a single block read and a multiblock read AND it has a reasonable idea on the related CPU costs AND of course it has reasonably accurate segment stats to determine the correct selectivity, then the CBO will likely pick the correct access path.

In my demo, the only thing we can reasonably say is that it has the correct stats but for the rest, we’ll who’s to say. Assuming everything I’ve mentioned is correctly set up in the database, then yes the FTS is likely the better option.

And there are good ways and bad ways of trying to ensure the CBO has the rest of this necessary information. As I’ll discuss.

Yes (although there could always be the odd variance). The CBO doesn’t consider blocks currently in memory when an SQL is parsed and the caching characteristics of a segment might of course change over time as the same execution plan is reused.

However, I predict segment level caching stats is possibly not that far away …

I am very wondering how did you calculate the “fudge” value? I did read the Jonathan Lewis’ excellent book “Cost-Based Oracle Fundamentals”. I see that Oracle might not read the exact number of MBRC because of the intended block might be found in the memory. So, Oracle has to break the read up to the that point then continue. I guess this “fudge” value has been derived because of this situation. Am I right? However, I cannot understand much. I am trying to figure out how the cost calculation of the full table scan can be done. Appereantly, it is impossible to calculate it without “fudge”value. So, do you have any idea that how can we calculate it?

Yes, you’re right. However, these values can be found on the web and indeed in Jonathan’s books.

However, this is all a little dated now with system statistics and the CPU costing model becoming more the norm. To breakdown how the CBO calculates the cost of the FTS, it’s basically the number of blocks in the table divided by the MBRC system statistic + the CPU component, which is derived from the (%CPU) value in the execution plan.