Reliably predicting when to
reorganize your tables and indexes is an important decision, and one
that you can be the first to automate. This has become the ?Holy
Grail? for many Oracle professionals, with
numerous articles and
debates about predicting the benefit from a reorg. Some purists
say that it's better to design the table/indexes never to fragment
in the first place, but the reality of the Oracle DBA is that high
attrition, vendor packages and the inability to control the
developers makes this a very important real-world problem.

It is clear that Oracle will soon
leverage upon their wonderful time-series performance repository,
the AWR, and create predictive models:

In an OracleWorld 2003
presentation titled ?Oracle Database 10g: The Self-Managing
Database? by Sushil Kumar of Oracle Corporation, Kumar
states that the new Automatic Maintenance Tasks (AMT) Oracle10g
feature will "automatically detect and re-build sub-optimal
indexes.?

In a paper titled ?Metric
Baselines: detecting and explaining performance events in em
10gr2? (Presented at the
RMOUG 2005
Training Days), John Beresniewicz of Oracle Corporation notes
that the use of "baselines to capture and adapt thresholds to
expected time-dependent workload variations" is a core
feature of the next release of Oracle.

The Objective:

The goal is to create a reliable
predictive model that will suggest tables and indexes which will
measurably benefit from reorganization, predict the reduction in I/O
(logical I/O - consistent gets and physical I/O - physical
reads) after the reorganization, and suggest changes that will
prevent a reoccurrence of the fragmentation (i.e. new pctfree,
new blocksize, etc.):

AWR (STATSPACK) has a history
of how the tables were accessed by historical SQL (dba_hist_sql_plan,
stats$sql_plan, etc.), including the I/O and CPU costs
associated with each step of SQL execution.

We can see the current
internal structure of every table/index (e.g. chain_cnt,clustering_factor, etc.)

Most databases have repeating
patterns of behavior and historical SQL is usually
representative of future activity. The model should be able to
see if the table/index history has simply by examining the
historical access patterns.

The steps to solving this problem
are simple, but the process logic is complex:

1 - Gather table/index
structure information - There are
many examples
of routines that can perform a alter index xxx validate
structure command and store the results in a table for
analysis.

2 - Analyze historical
patterns of usage - We can alter the plan.sql
script to access the stats$sql_plan table and see exactly
how table/indexes are accessed by our SQL, and how often too:

But how can we find these rules?
Do the rules have to make-sense? No, they don't! Scientists
commonly create models (The Oracle data mining tools) that scan
historical data and identify statistically significant correlations
(within 2 standard deviations of the mean value). For example, the
popular MMPI
test is a set of 500 true/false questions that accesses personality
with remarkable validity, and it's results are accepted in all U.S.
courts. Their test-base consists of hundreds of thousands of
subjects,. with a pre-diagnosed mental disorder (see
DSM IV).
By comparing their responses to seemingly innocuous questions (e.g.
"I read the editorials in the newspaper every day") a proven
predictive model was created (Federal
courts have affirmed the MMPI as a scientifically valid) and
accepted procedure for personality assessment.

For example, the subjects
preference to take showers vs. baths is an extremely reliable
measure of self-esteem. Do we know why? No. Do we care? Not
really. All that is proven is that this correlation is a
statistically reliable predictor of feelings of self-worth. We see
the exact same scientific principle applied to Oracle data mining
(ODM) tools. For example, we might find-out that people with red
hair buy a disproportionate amount of skin care products. Knowing
"why" is not important. What's important is knowing that the data
supports the assertion. Also useful is the book "Unobtrusive
Measures", which shows creative techniques for finding "hidden"
significant metrics.

In sum, rules don't have to be
proven true to be statistically reliable, and exceptions do not make
the rule invalid. For example, if two out of every 1,000
read-haired people don't buy skin care products, we still have a
model with a 99% predictive quality.

Some possible Rules for your
AI engine:

To get you started, here is a list
of possible conditions that may contribute to table/index
fragmentation. Remember, you don't need to pre-justify your rule
set. The only thing that counts is that your rules make accurate
predictions.

Index fast full scans will run faster after
index reorganization whenever the ?density? of the index entries
becomes greater. In other words, it takes less time to read
100,000 entries from a 100 block index than reading the entries
from a 500 block index.

Multi-block Index scans (INDEX_FFS), will run faster
when the table data blocks are arranged in index-key order and when
the data blocks have a high number of row entries (as evidenced
by clustering_factor in dba_indexes).

Large-table full-table scans may run faster
after table reorganization, especially when the table has excessive chained or
relocated rows, or low block density after massive DML activity (updates
and deletes).

Proving the Benefit:

?Full-table scans - The predictive model will
have a rate, expressed as a function of the number of rows and the
amount of I/O (both logical and physical) that are required to
access the rows via a ?db file scattered read? (Usually an index
fast-full scan, a full-table scan, etc.).

The
report output should show the forecasted reduction in I/O after the
table was reorganized.

?Multi-block Index range scans - The predictive
model should be able to see the size of each index range scan (the
number of rows returned), the number of blocks that were required to
fetch the rows, and the number of logical and physical I/O.

?Index Rebuild - The predictive model should
capture the ?speed? of index fast-full scans, multi-block index
range scans, and show the degradation over time.

Sparsity of the index blocks (using dba_hist_sqlstat
and dba_hist_sql_plan)

Rate of time (expressed in blocks/index row values) required to
perform an index fast-full scan

Finding the Data:

The Oracle 10g AWR (and STATSPACK
in previous releases) provides a time-series database. This
time-series performance details can be used to create accurate
predictions of the real benefit of reorganizing tables and indexes.
Specifically, the AWR can measure, between any two snapshot values:

1 - The change in the speed of
SQL execution steps over-time.

In
this prototype below, we see the change in the speed of full-table
scans:

The
AWR contains important information about the data distribution and
density of the data blocks within each Oracle segment:

Table Segment Report:

FREELISTDATE SEGEMENT BLOCKS NUM_ROWS LENGTH

6/20/05
10:00 AM CUST_TABLE 100,000 500,000,000 77

Sample Index Report:

Average Average Average

Index Average Average Leaf Leaf Row Average Leaf Block

Name Date Height Blocks Rows Length Leaf Blocks Length

CUST_IDX 6/19/05 1 236 12 234 1 7,996

The Method:

The solution does not have to
generate the syntax for the reorg, just suggest table and index
names, predict the improvement, and suggest changes to the parameter
setting/table structure to prevent a subsequent reorg:

Table reorgs can be done with dbms_redefinition,
CTAS, or ?alter table move? syntax.

Index reorgs are performed by rebuilding the
index with the ?alter index xxx rebuild? command.

Boolean rules and process control flow can be
implemented in PL/SQL stored procedures.

When can we "prove" a benefit from an index
rebuild? Here, Robin Schumacher
proves that an index that is rebuilt in a larger tablespace will
contain more index entries be block, and have a flatter structure:

"As you can see, the amount of logical reads has
been reduced in half simply by using the new 16K tablespace and accompanying
16K data cache."

In an OracleWorld 2003
presentation titled ?Oracle Database 10g: The Self-Managing
Database? by Sushil Kumar of Oracle Corporation, Kumar
states that the new Automatic Maintenance Tasks (AMT) Oracle10g
feature will "automatically detect and re-build sub-optimal
indexes.?

This
Kim Floss article shows the Oracle 10g segment advisor recommending a
rebuild of an index:

?The page lists all the segments
(table, index, and so on) that constitute the object under review. The
default view ("View Segments Recommended to Shrink") lists any segments that
have free space you can reclaim.?

You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts.

��

Burleson is the American Team

Note:This Oracle
documentation was created as a support and Oracle training reference for use by our
DBA performance tuning consulting professionals.
Feel free to ask questions on our
Oracle forum.

Verify
experience!Anyone
considering using the services of an Oracle support expert should
independently investigate their credentials and experience, and not rely on
advertisements and self-proclaimed expertise. All legitimate Oracle experts
publish
their Oracle
qualifications.

Errata? Oracle technology is changing and we
strive to update our BC Oracle support information. If you find an error
or have a suggestion for improving our content, we would appreciate your
feedback. Just e-mail:
and include the URL for the page.