When a SQL
statement is executed, the database must convert the query
into an execution plan and choose the best way to retrieve the
data. For Oracle, each SQL query has many choices for
execution plans, including which index to use to retrieve
table row, what order in which to join multiple tables
together, and which internal join methods to use (Oracle has
nested loop joins, hash joins, star joins, and sort merge join
methods). These execution plans are computed by the Oracle
cost-based SQL optimizer commonly known as the CBO.

The choice of executions plans made by the Oracle SQL
optimizer is only as good as the Oracle statistics. To always
choose the best execution plan for a SQL query, Oracle relies
on information about the tables and indexes in the query.

Starting with the introduction of the dbms_stats
package, Oracle provides a simple way for the Oracle
professional to collect statistics for the CBO. The
old-fashioned analyze table and dbms_utility methods
for generating CBO statistics are obsolete and somewhat
dangerous to SQL performance because they don't always capture
high-quality information about tables and indexes. The CBO
uses object statistics to choose the best execution plan for
all SQL statements.

The dbms_stats utility does a far better job in
estimating statistics, especially for large partitioned
tables, and the better stats result in faster SQL execution
plans. Andrew Holdsworth of Oracle Corporation notes
that dbms_stats is essential to good SQL performance,
and it should always be used before adjusting any of the
Oracle optimizer initialization parameters:

?The payback from good
statistics management and execution plans will exceed any benefit of
init.ora tuning by orders of magnitude?

To fully appreciate dbms_stats, you need to examine
each of the major directives. Let?s take a close look at each
directive and see how it is used to gather top-quality
statistics for the cost-based SQL optimizer.

The options parameter

Using one of the four provided methods, this option governs
the way Oracle statistics are refreshed:

gather auto?Reanalyzes objects which
currently have no statistics and objects with stale
statistics (Using gather auto is like combining
gather stale and gather empty.)

Note that both gather stale and gather auto
require monitoring. If you issue the alter table xxx
monitoring command, Oracle tracks changed tables with the
dba_tab_modifications view, which allows you to see the
exact number of inserts, updates, and deletes tracked since
the last analysis of statistics.

The estimate percent option

The following estimate_percent argument is a new way to
allow Oracle?s dbms_stats to automatically estimate the
best percentage of a segment to sample when gathering
statistics:

estimate_percent =>
dbms_stats.auto_sample_size

You can verify the accuracy of the automatic statistics
sampling by looking at the dba_tables sample_size
column. It is interesting to note that Oracle chooses between
5 and 20 percent for a sample size when using automatic
sampling. Remember, the better the quality of your statistics,
the better the decision of the CBO.

The method_opt option
The method_opt parameter for dbms_stats
is very useful for refreshing statistics when the
table and index data change. The method_opt
parameter is also very useful for determining
which columns require histograms.

In some cases, the distribution of values within
an index will effect the CBOs decision to use an
index versus perform a full-table scan. This
happens when a where clause has a disproportional
amount of values, making a full-table scan cheaper
than index access.

Oracle histograms statistics can be created when
you have a highly skewed index, where some values
have a disproportional number of rows. In the real
world, this is quite rare, and one of the most
common mistakes with the CBO is the unnecessary
introduction of histograms in the CBO statistics.
As a general rule, histograms are used when a
column's values warrant a change to the execution
plan.

To aid in intelligent histogram generation, Oracle
uses the method_opt parameter of
dbms_stats. There are also important new
options within the method_opt clause,
namely skewonly, repeat and auto:

The skewonly option is very time-intensive
because it examines the distribution of values for
every column within every index.

If dbms_stats discovers an index whose
columns are unevenly distributed, it will create
histograms for that index to aid the cost-based
SQL optimizer in making a decision about index
versus full-table scan access. For example, if
an index has one column that is in 50 percent of
the rows, a full-table
scan is faster than an index scan to retrieve
these rows.

--*************************************************************
-- SKEWONLY option?Detailed analysis
--
-- Use this method for a first-time analysis for
skewed indexes
-- This runs a long time because all indexes are
examined
--*************************************************************

If you need to reanalyze your statistics, the
reanalyze task will be less resource intensive
with the repeat option. Using the repeat
option will only reanalyze
indexes with existing histograms, and will not
search for other histograms opportunities. This is
the way that you will reanalyze you statistics on
a regular basis.

The auto option within dbms_stats is
used when Oracle table monitoring is implemented
using the alter table xxx monitoring;
command. The auto option, shown in
Listing D, creates histograms based upon data
distribution and the manner in which the column is
accessed by the application (e.g., the workload on
the column as determined by monitoring). Using
method_opt=>?auto? is similar to using the
gather auto in the option parameter of
dbms_stats.

Oracle allows for parallelism when collecting CBO
statistics, which can greatly speed up the time
required to collect statistics. A parallel
statistics collection requires an SMP server with
multiple CPUs.

Better execution speed

The dbms_stats utility is a great way to
improve SQL execution speed. By using
dbms_stats to collect top-quality statistics,
the CBO will usually make an intelligent decision
about the fastest way to execute any SQL query.
The dbms_stats utility continues to improve
and the exciting new features of automatic sample
size and automatic histogram generation greatly
simplify the job of the Oracle professional.

> Will dbms_stats someday detect sub-optimal table join orders from a workload,
and create appropriate histograms?
If histograms exist, then they were either automatically created because the
columns met the criteria defined on page 10-11 of the document, or manually
created. If they were created automatically, then is probable they will
influence the plan for the better.

Sub-optimal join orders are generally the result of poor cardinality estimates.
Histograms are designed to help with cardinality estimates where data skew
exists.

Keeping statistics: What is the current "official" policy regarding statistics retention?

The old CW was that the DBA should collect a deep, representative sample, and
keep it, only re-analyzing when it's "a difference that makes a difference"?

I don't know if there is an "official" policy per se, but I will offer my
professional opinion based on experience. Start with the dbms_stats defaults.
Modify as necessary based on plan performance. Use dynamic sampling and/or
dbms_sqltune or hints/outlines where appropriate (probably in that order).
Understand the problem before attempting solutions.

There are a couple of cases that I would be mindful of:

1) Experience has shown that poor plans can be a result of under estimated NDV
with skewed data and DBMS_STATS.AUTO_SAMPLE_SIZE
(or too small of a sample). This has been addressed/enhanced in 11g. In 10g it
requires choosing a fixed sample size that yields an accurate enough NDV to get
the optimal plan(s). The sample size will vary case by case as it is data
dependent.

2) Low/High value issues on recently populated data, specifically with
partitioned tables. If the partition granule size is small (say daily or
smaller) the default 10% stale might be too little. It may be best to gather
partition stats immediately after loading, or set them manually. It's better to
have stats that are an over estimate on the number of rows/values than an under
estimate. For example, its better to have a hash join on a small set of data
than a nested loops on a large set.

Note:This Oracle
documentation was created as a support and Oracle training reference for use by our
DBA performance tuning consulting professionals.
Feel free to ask questions on our
Oracle forum.

Verify
experience!Anyone
considering using the services of an Oracle support expert should
independently investigate their credentials and experience, and not rely on
advertisements and self-proclaimed expertise. All legitimate Oracle experts
publish
their Oracle
qualifications.

Errata? Oracle technology is changing and we
strive to update our BC Oracle support information. If you find an error
or have a suggestion for improving our content, we would appreciate your
feedback. Just e-mail:
and include the URL for the page.