JL Computer Consultancy

Trendy new ways with Statspack.

Sept 2004

From time to time (though not very often) I find that a site I’m
visiting has installed Statspack and is using dbms_job
to take regular snapshots of their system-level performance. There are various
ways in which a string of snapshots can be a useful source of strategic
information, so when I find the data available I often end up writing a quick
SQL statement to pick up trending information about just one or two critical
statistics.

Typically, I would produce a report that took some raw data (say ‘db
file sequential read’ waits) from stats$system_event, and
converted it into something that showed the number of events in each 30 minutes
(or whatever the client’s interval was set to) for the last 24
hours. If the results looked interesting,
I would then extend the code a little and compare data from the corresponding
period the previous week against data for this week to see if there was a
pattern or an anomaly.

Browsing the internet recently, I discovered that most of the published SQL
for doing this type of thing seemed to employ a fairly outdated and inefficient
style of code that used a self-join on two copies of the table that was to be
reported. (One exception showed up in Tim Gorman’s paper on Statspack at RMOUG 2004, which can also
be found on his own website
– no doubt there are others.) The following code fragment gives you the
general idea:
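As a sketch of that old-style approach (hedged: I have assumed the default perfstat schema names, a one-day time range, and the ‘db file sequential read’ event – adjust to taste), two copies of stats$system_event are joined on consecutive snap_ids:

```sql
rem
rem     Hypothetical sketch of the old-style self-join approach:
rem     two copies of stats$system_event joined on consecutive
rem     snap_ids. Table and column names are the standard
rem     perfstat definitions.
rem

select
        to_char(ss1.snap_time,'Mon-dd hh24:mi:ss')      time_stamp,
        se2.total_waits - se1.total_waits               snap_waits,
        round(
                (se2.time_waited_micro - se1.time_waited_micro)/10000,
                2
        )                                               snap_waited
from
        perfstat.stats$snapshot         ss1,
        perfstat.stats$system_event     se1,
        perfstat.stats$system_event     se2
where
        ss1.snap_time   between sysdate - 1 and sysdate
and     se1.snap_id     = ss1.snap_id
and     se1.event       = 'db file sequential read'
and     se2.snap_id     = se1.snap_id + 1
and     se2.dbid        = se1.dbid
and     se2.instance_number = se1.instance_number
and     se2.event       = se1.event
order by
        ss1.snap_id
;
```

Note the two visits to stats$system_event – that second pass is exactly the cost the analytic approach below avoids.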

There is a better way – analytic functions: a feature that,
co-incidentally, made its public appearance in the same version of Oracle as statspack
(see footnote) and happens to be ideally suited to
‘trend-like’ reporting of statspack results. (For a short note
on analytic functions that I wrote in 2000, follow this link, for
a more complex example that I published in the magazine of the UK Oracle Users’ Group, follow this link.)

Here’s a bare-bones example, picking up the ‘db
file sequential read’ figure from the system event history. The
site I wrote this at was taking snapshots every half-hour, so each line of the
report shows the activity in the last 30 minutes. The idea is simple – we
start with a query that gets data from a time-range, then uses the lag()
function to align a copy of the values from the ‘previous’
timestamp reported against the values for the ‘current’ timestamp.
Then we can wrap that query into an inline view and report the difference between
the previous and current values as the work done in the period.

rem
rem     Simple example of accessing statspack tables with
rem     analytic functions. The effect of lag(,1) is to
rem     report the values from the ‘previous’ row against
rem     the current row.
rem
rem     Note: to do the job properly, you should pre-query
rem     v$instance and v$database to get the instance number
rem     and DBID into substitution (or bind) variables.
rem

column snap_waits       format 999,999,990
column snap_waited      format 999,999,990
column wait_average     format 999,999.99

select
        to_char(snap_time,'Mon-dd hh24:mi:ss')          time_stamp,
        total_waits - prev_waits                        snap_waits,
        time_waited - prev_time                         snap_waited,
        round(
                (time_waited - prev_time) /
                decode(
                        (total_waits - prev_waits),
                        0, null,
                        (total_waits - prev_waits)
                ),
                2
        )                                               wait_average
from
        (
        select
                -- starting query: get current and previous values
                -- note: time_waited_micro is the version 9 column
                -- and time_waited is the version 8 column.
                -- The division by 10,000 converts 9i values to centiseconds.
                ss.snap_time,
                se.total_waits,
                lag(se.total_waits,1)
                        over (order by se.snap_id)      prev_waits,
                se.time_waited_micro/10000              time_waited,
                lag(se.time_waited_micro/10000,1)
                        over (order by se.snap_id)      prev_time
        from
                perfstat.stats$snapshot         ss,
                perfstat.stats$system_event     se
        where
                ss.snap_time    between sysdate - 1 and sysdate
        and     se.snap_id      = ss.snap_id
        and     se.dbid         = ss.dbid
        and     se.instance_number = ss.instance_number
        and     se.event        = 'db file sequential read'
        )
order by
        snap_time
;

Once you have applied the basic lag() approach, you can start
getting sophisticated about the way in which you report data. In principle, you
could (for example) write a report that took two Monday sets of data, and
reported them side by side so that you could check whether the pattern from one
Monday repeated the next Monday; you could even add a third set of figures that
reported the differences in absolute value or percentages. Personally I rarely
bother to write code to produce such prettily formatted output; I would just
report the two days one after the other and judge changes by eye – but
for management reports, you may want to clean the report up and make it easier
to interpret.
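As a sketch of that side-by-side idea (hedged: "half_hour_deltas" is assumed to be the lag() query above wrapped as a view or inline view, and the column aliases and seven-day offset are my own choices):

```sql
rem
rem     Hypothetical sketch: report last Monday's figures next
rem     to this Monday's by joining the half-hourly deltas to
rem     themselves with a seven-day offset on snap_time.
rem

select
        to_char(this.snap_time,'hh24:mi')       time_slot,
        last.snap_waits                         last_monday,
        this.snap_waits                         this_monday,
        this.snap_waits - last.snap_waits       change
from
        half_hour_deltas        this,
        half_hour_deltas        last
where
        to_char(last.snap_time,'hh24:mi') = to_char(this.snap_time,'hh24:mi')
and     trunc(last.snap_time)             = trunc(this.snap_time) - 7
order by
        this.snap_time
;
```

The join on the formatted time of day assumes the snapshot job fires at consistent times each day; if snapshot times drift by a few seconds you would need to round to the nearest half hour instead.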

If you do get interested in this type of manipulation of statspack data,
remember that the indexes on the (default) perfstat schema are just primary key
indexes designed for the way that statspack works. If you are leaving
a large history in place, you may want to create an index on stats$snapshot
with the columns (dbid, instance_number, snap_time) with compress 2 so that you
can pick your target time range efficiently. (You may choose to forget about
the first two columns if you are using statspack for just one database and
instance, of course.)
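For example (a sketch only – the index name is my own invention, and you should check the impact on the statspack purge job before adding indexes to a production perfstat schema):

```sql
create index perfstat.stats$snapshot_time
        on perfstat.stats$snapshot(dbid, instance_number, snap_time)
        compress 2
;
```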

As an example of the output – the script above produced the following
results at a site which scheduled statspack every half hour, and
(luckily) didn’t have any extra snapshots fired off manually in the
interval reported.

TIME_STAMP        SNAP_WAITS  SNAP_WAITED WAIT_AVERAGE
--------------- ------------ ------------ ------------
May-19 17:30:22          307          158          .51
May-19 18:00:25            0            0
May-19 18:30:23            4            2           .5
May-19 19:00:25       67,654       35,544          .53
May-19 19:30:25          266          100          .38
May-19 20:00:28          718          303          .42
May-19 20:30:24          702          288          .41
May-19 21:30:23          670          326          .49
May-19 22:00:31        1,093          741          .68
May-19 22:30:25        1,208          655          .54
May-19 23:00:27        1,356          819           .6
May-19 23:30:23        1,373          728          .53
May-20 00:00:28       12,301       12,617         1.03
May-20 00:30:23        1,406          924          .66
May-20 01:00:30       15,740       13,790          .88
May-20 01:30:26        1,422          742          .52
May-20 02:00:24            0            0
May-20 02:30:22        9,456       10,347         1.09
May-20 03:00:27       18,415       10,471          .57
May-20 03:30:28       13,901        7,855          .57
May-20 04:00:28       90,948       58,901          .65
May-20 04:30:25            5            0            0
May-20 05:00:24            0            0
May-20 05:30:24            0            0
May-20 06:00:23            1            0            0
May-20 06:30:23            1            0            0
May-20 07:00:28            0            0
May-20 07:30:22            0            0
May-20 08:00:37            0            0
May-20 08:30:26          110           67          .61
May-20 09:00:31           21            9          .43
May-20 09:30:25            4            4            1
May-20 10:00:26        1,370        5,716         4.17
May-20 10:30:24           23           14          .61
May-20 11:00:30            1            0            0
May-20 11:30:28           37           30          .81
May-20 12:00:30            7            3          .43
May-20 12:30:24           93           50          .54
May-20 13:00:26       55,317       39,901          .72
May-20 13:30:27        3,991        1,475          .37
May-20 14:00:26       11,335        5,201          .46
May-20 14:30:24            3            0            0
May-20 15:00:26            3            2          .67
May-20 15:30:23            4            2           .5
May-20 16:00:26           64           66         1.03
May-20 16:30:23            5            3           .6
May-20 17:00:29           85           81          .95

47 rows selected.

As you can see, the site has a couple of evening and early morning batch
jobs; but, most importantly for their end-users, something happened around
lunch time on this day that thrashed the I/O sub-system quite
severely. Running the report for a week,
it became apparent that whatever it was, it happened every weekday some time
between 12:00 and 2:00, and did about the same amount of work every time
– so the next step was simply to wait until midday the following day and
chase down any sessions that suddenly got aggressive at around that time.

Footnote:

I couldn’t remember when statspack first appeared in
production, so I had to call on Graham Wood (one of the key players in the
creation of statspack) to check, and he told me that although it had been in use
within Oracle for a very long time, it finally went production (largely due
to the efforts of Connie Green) in Oracle 8.1.6 – the same release that
gave us the analytic functions.

Update 2006

I’ve recently discovered a good argument for sticking with something
like the old-style code: analytic functions are not available
with Standard Edition. If you are in this position there are two important
thoughts to consider. (Correction courtesy of Tom Kyte – they
are part of Standard Edition in 9i – but the second point may still be
useful anyway for performance and clarity of code.)

First – the typical old-style code usually has a
predicate like ‘and se2.snap_id = se1.snap_id + 1’ to connect
together the data extracted from the two separate calls to the history tables.
But the snap_id is generated from a sequence, and there is a
fair chance that on a busy production system the sequence will be flushed
from the library cache from time to time, losing sequence values. If you
want to depend on getting consecutive snap_ids, you will probably want to
modify the stats$snapshot_id sequence to nocache.
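That change is a one-liner (bearing in mind that nocache makes every nextval call hit the data dictionary – a cost you are unlikely to notice at one snapshot per half hour):

```sql
alter sequence perfstat.stats$snapshot_id nocache;
```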

Secondly – if you don’t keep the snapshot history small, the work
done in joining a history table to itself may be a bit resource-intensive. If
you are running Oracle 9i, you may get better performance, even allowing for
the global temporary table that gets created and populated, by taking advantage
of subquery factoring (which I think is available in Standard Edition) to extract
the relevant subset of the history, materialize it, then join it to itself, for
example:

rem
rem     Sample code to extract a small section of snapshot
rem     history into a (cursor-level) global temporary table
rem

with event_history as (
select
        /*+ materialize */
        ss.snap_id,
        ss.snap_time,
        se.total_waits,
        se.time_waited_micro/10000      time_waited
from
        perfstat.stats$snapshot         ss,
        perfstat.stats$system_event     se
where
        ss.dbid = 1663614895
and     ss.instance_number = 1
and     ss.snap_time between sysdate - 1 and sysdate
/**/
and     se.dbid = ss.dbid
and     se.instance_number = ss.instance_number
and     se.snap_id = ss.snap_id
and     se.event = 'db file sequential read'
)
select
        ev1.snap_id,
        to_char(ev1.snap_time,'Mon-dd hh24:mi:ss')      snap_time,
        ev2.total_waits - ev1.total_waits               total_waits,
        round(ev2.time_waited - ev1.time_waited,2)      time_waited
from
        event_history   ev1,
        event_history   ev2
where
        ev2.snap_id = ev1.snap_id + 1
order by
        ev1.snap_id
;

The following is an example of the sort of execution plan you get (in 9.2.0.6,
using explain plan and the dbms_xplan package). Notice that statspack
(in this version of Oracle, at least) has an ideal index for uniquely
identifying a wait by name in a given snapshot. Note also how my dbid
and instance_number
have been transferred across the join by means of transitive closure
– always check the execution paths and performance traps of code you
download from the internet before you try it on production.