RE: Which plan is better - what COST really means?

From: Wolfgang Breitling <breitliw@xxxxxxxxxxxxx>

To: J.Velikanovs@xxxxxxxx

Date: Sat, 15 Jan 2005 11:10:07 -0700

At 10:00 AM 1/15/2005, you wrote:
>Which type of IO, in your opinion, does the optimizer take as the base
>type for cost estimating?
>As far as I know, different types of operations (FS, IRS, IQS, IFS, FILTER)
>execute IO differently. I mean the CPU utilization of a single IO unit
>differs between operations.
>
>Another question: if cost is IO, how does the optimizer estimate
>sorting/hashing etc. in IO units?
The "cost" in the CBO's calculations is basically the count of the number
of blocks that need to be examined in order to get the result. For example,
for a fully qualified primary key access to a row, one block in each of the
index levels needs to be examined, from the root over the branch blocks to
the leaf, and then the data block needs to be examined; ergo the cost is
"LVLS + 1", the height of the index plus one. For a full table scan, each
block up to the HWM needs to be examined. The CBO is aware of multiblock
reads, so the number of blocks is divided by db_file_multiblock_read_count
to come up with a basic cost for the FTS. However, the CBO is also aware
that not every multiblock read will be set up to read dfmrc blocks, so it
applies a fudge factor to the basic FTS cost => table-scan-cost (tsc) =
blocks/dfmrc * fudge-factor.
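Those two formulas can be sketched in a few lines of Python (the function names and the fudge-factor value are my illustrative assumptions, not Oracle internals):

```python
def unique_index_access_cost(index_levels: int) -> int:
    """Fully qualified PK lookup: one block per index level on the way
    from root to leaf, plus the one table block -> "LVLS + 1"."""
    return index_levels + 1

def full_scan_cost(blocks_below_hwm: int, dfmrc: int,
                   fudge: float = 1.6) -> float:
    """FTS cost = blocks / dfmrc * fudge-factor.
    The fudge factor stands in for the CBO's adjustment for multiblock
    reads that come up short of dfmrc blocks; 1.6 is a placeholder
    value, not Oracle's actual adjustment."""
    return blocks_below_hwm / dfmrc * fudge

print(unique_index_access_cost(2))   # index of height 2 -> cost 3
print(full_scan_cost(10000, 16))     # 10000-block table, dfmrc = 16
```

Note how raising dfmrc lowers the computed FTS cost without changing how many blocks the scan actually reads - the first hint that cost and work are not the same thing.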
These were the easy cases: the CBO needed one block or all blocks of the
table. If you have a predicate where an unknown number of rows qualify, the
CBO must first estimate how many rows will qualify. That is the selectivity
of the predicate, and this is where the first assumption comes in: that
each of the num_distinct possible predicate values will yield the same
num_rows/num_distinct (= FilterFactor = selectivity) rows - unless you
collected a frequency histogram, in which case the CBO has the exact
numbers. Once it has the row estimate, the CBO needs to convert that into a
block estimate, its cost "currency". If the base access method is a FTS,
the cost is the FTS cost as explained above; otherwise it uses the
clustering_factor of the index as a measure of how clustered vs. scattered
the rows found via the index are. If they are clustered (a low clustering
factor), the row estimate translates into a lower block estimate than if
they are scattered. But remember that this is a statistic and therefore an
aggregate. The clustering factor measures how many block transitions there
are when you progress from one index leaf value to the next. But if your
predicates don't include all the trailing index columns, then the rows
you're seeking may be far more scattered than the clustering_factor for the
full index indicates. It is even worse if you are missing leading index
columns (skip scan).
To answer Juri's question about the costs of sorting/hashing: as long as
the CBO can assert from the size of the sort/hash that no spill to the temp
tablespace is necessary (that is where the sort/hash area sizes play a
role), the sort and hash costs seem to be constant at 2 and 1 respectively
(from what I see in 9.2). Beyond that, the number of blocks to be
written/read (i.e. the cost "currency") gets estimated from the number of
sort and merge runs, or hash probes, required and then tacked on to the
cost.
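The in-memory case can be captured in a trivial sketch (the constants are the empirical 9.2 observations above, not documented behavior):

```python
def in_memory_workarea_cost(operation: str) -> int:
    """Cost added for a sort or hash that fits entirely within the
    sort/hash area, i.e. never spills to the temp tablespace.
    Constants as observed in 9.2. Once the workarea spills, the CBO
    instead estimates the temp blocks written/read by the sort-merge
    runs or hash probes and adds those - not modeled here."""
    return {"sort": 2, "hash": 1}[operation]

print(in_memory_workarea_cost("sort"))   # -> 2
print(in_memory_workarea_cost("hash"))   # -> 1
```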
I hope you appreciate by now how reality can differ from even the best
attempts by the CBO at estimating the cost (i.e. single-block IOs) of a
query. That is why it is virtually impossible to compare the costs of two
plans, even for the same query, even though the cost is by design not a
meaningless number. You have to know the details of how those costs were
derived. Assume just for a moment that you have a single-table SQL
statement with a single range predicate which results in an index range
scan with cost N. If I set optimizer_index_cost_adj=50, the cost will come
out as N/2. Would you assume that just by doing that the query now runs
twice as fast?
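That effect is a plain linear scaling of the index access cost, nothing more (the parameter is real; the function wrapping it is mine):

```python
def adjusted_index_cost(index_cost: float,
                        optimizer_index_cost_adj: int = 100) -> float:
    """optimizer_index_cost_adj is a percentage: 100 (the default)
    leaves index access costs alone, 50 halves them. Only the number
    the CBO compares changes - the work the query does stays the same."""
    return index_cost * optimizer_index_cost_adj / 100

print(adjusted_index_cost(40, 50))   # N = 40 is reported as 20.0
```

Halving the reported cost merely biases the CBO toward index plans when it compares alternatives; it does not make a given index range scan read half as many blocks.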
Regards
Wolfgang Breitling
Centrex Consulting Corporation
http://www.centrexcc.com
--
http://www.freelists.org/webpage/oracle-l