All about Performance of Oracle and other relational databases on Linux and UNIX based systems. How to measure it, analyse it, tune it, and manage it over time.

Thursday, 21 November 2013

Basic Performance Analysis using AWR data

I've previously covered a number of queries that can be combined
to pull out key statistics from the Oracle AWR data for each
snapshot it makes, which you could use to analyse the performance of a
system. One example would be to use this to look at how the workload
changes during the day, and to identify the period of peak activity.
Another use would be to see if there were any periods experiencing
significantly higher waits than at other times.

I'll now go
through how you can use some of the AWR data to do an overall
performance analysis of a system. But before we do that, we need to
review some basics on performance tuning, just to clarify some things
and avoid potential confusion.

Performance Tuning Basics

I'm of the same school of tuning as people like Cary Millsap
and Jeff Holt and others - generally speaking you should be tuning
specific performance problems as experienced by individual Oracle
sessions, and not tuning the system as a whole. This is because any
"improvements" you make to the system that do not benefit the slow
sessions are completely wasted and pointless. For instance, the system
may be experiencing slow disk I/O's, but it might be that the reported
slow interactive response time by some users is due to very high CPU
usage resulting from a poor execution plan for the particular SQL
statement that session is executing. Making the disk I/O's happen
quicker might "improve" things for the system as a whole, but it will
have no impact at all on the CPU bound SQL statement that is the cause
of slow response times on an end user's screen.

That said, there
are times where system level tuning is appropriate or when you can only
do system level tuning, and that is where AWR can help as it gathers
system wide activity data. For session level tuning you need to look
elsewhere - such as ASH (Active Session History) or using SQL Trace.

There are some big assumptions being made when we do system level tuning:

- All sessions are executing the same or similar SQL statements, so a system wide "average" view across all sessions is valid.

- Any "poor performance" on the system impacts all sessions equally, so even one rogue SQL statement can slow down all other SQL statements being executed from other sessions.

- The system is under a high enough load that performance problems are due to interactions between the SQL statements; poorly performing individual SQL statements are assumed to have been identified earlier under initial testing.

Provided these or similar assumptions hold true, then you can do system level tuning.

System Level Tuning Measurements

A
computer system exists to do work, and its performance is really about
how long it takes to do that work and which resources it uses while
doing it. Performance is an issue when a task takes too long to
complete.

We need to measure both sides of this – the input work
requests, and the output usage of the system resources that results. We
can only do performance tuning properly when we know both of these, as
this lets us see the impact of any changes we make - does resource usage
go down or not for the same input workload?

Measuring the input
workload is best done at business or application level transactions.
And this is best achieved by instrumenting the application software
itself to record both the type of transaction it is performing and the
elapsed time it takes when submitted. However, very few applications,
if any, are instrumented this way.

This leaves a database level
measurement as the only viable and consistently available way of
recording the input workload, and in my view the best available
measurement is the number of SQL statements executed per second. While
this is very imperfect as a measure of the input workload, it is the
nearest we are going to get to it, and all input application
transactions result in SQL statements being executed. So it does
correlate with input workload.
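As a sketch of one way to get this measurement (not necessarily the exact query from my earlier posts, and assuming a single instance), the executions per second can be derived from the cumulative "execute count" statistic in DBA_HIST_SYSSTAT:

```sql
-- Sketch only: SQL executions per second for each AWR snapshot.
-- DBA_HIST_SYSSTAT values are cumulative since instance startup,
-- so the per-snapshot delta is taken with LAG.
select snap_id,
       round(delta / nullif(secs, 0)) execs_per_sec
from (select s.snap_id,
             s.value - lag(s.value) over
               (partition by s.dbid, s.instance_number
                order by s.snap_id) delta,
             (cast(sn.end_interval_time as date)
              - cast(sn.begin_interval_time as date)) * 86400 secs
      from dba_hist_sysstat s
           join dba_hist_snapshot sn
             on  sn.snap_id = s.snap_id
             and sn.dbid = s.dbid
             and sn.instance_number = s.instance_number
      where s.stat_name = 'execute count')
where delta is not null
order by snap_id;
```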

For a measurement of workload on
the system - the output resulting from the execution of the SQL
statements - the best measurement is probably Average Active Sessions.
AAS is simply the Database Time ("DB Time" from System Time Model)
divided by elapsed time for the period over which Database Time was
measured. The Database Time value is a measurement of the amount of
elapsed time sessions were active executing SQL statements, and includes
both active execution time (running on a CPU) and waiting time (for
disk I/O or any other wait event). AAS indicates how many sessions were
active at the same time executing SQL statements i.e. the level of
concurrency on your system.
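The calculation can be sketched like this for a single snapshot interval, assuming :begin_snap and :end_snap are consecutive snapshot ids ("DB time" in DBA_HIST_SYS_TIME_MODEL is cumulative and in microseconds):

```sql
-- Sketch only: Average Active Sessions between two consecutive snapshots.
-- AAS = DB time delta (seconds) / elapsed interval (seconds).
select round((e.value - b.value) / 1e6
             / ((cast(sn.end_interval_time as date)
                 - cast(sn.begin_interval_time as date)) * 86400), 1) aas
from dba_hist_sys_time_model b,
     dba_hist_sys_time_model e,
     dba_hist_snapshot sn
where b.stat_name = 'DB time'
  and e.stat_name = 'DB time'
  and b.snap_id = :begin_snap
  and e.snap_id = :end_snap
  and sn.snap_id = :end_snap
  and e.dbid = b.dbid
  and e.instance_number = b.instance_number
  and sn.dbid = e.dbid
  and sn.instance_number = e.instance_number;
```

For example, 3,600 seconds of DB time accumulated over an 1,800 second snapshot interval gives an AAS of 2.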

AAS correlates with the number of
CPU's on your system - if all SQL statements executed with no waiting at
all they would only use up CPU resource and your transaction rate would
be limited by the number of CPU's in the system. If AAS is less than
the number of CPU's in your system then you are not reaching capacity.
If AAS is close to or more than the number of CPU's then you have an
overloaded system and waiting of some form or another is occurring.

To
make real use of the AAS value we also want to know the percentage of
time spent waiting during each period i.e. total wait time as a
percentage of database active time (DB time again). The Percentage Wait
(which I label Wait%) indicates the efficiency of the SQL statement
execution - if this waiting was eliminated then that wait time would be
removed from the elapsed time of each SQL statement executed. This is
only relevant when AAS is high enough. When the level of concurrency is
very low you can get various anomalies, and there will always be some
waiting somewhere. Again remember the assumptions I stated earlier -
the system must be under a high enough load to experience performance
problems due to the combined workload itself.
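Wait% can be sketched the same way, as the difference between "DB time" and "DB CPU" from the time model, expressed as a percentage of DB time (again only a sketch, for a single instance and two consecutive snapshots):

```sql
-- Sketch only: Wait% = (DB time - DB CPU) / DB time between two
-- consecutive snapshots. Both time model values are cumulative.
select round(100 * ((et.value - bt.value) - (ec.value - bc.value))
             / nullif(et.value - bt.value, 0)) wait_pct
from dba_hist_sys_time_model bt,  -- DB time at begin snapshot
     dba_hist_sys_time_model et,  -- DB time at end snapshot
     dba_hist_sys_time_model bc,  -- DB CPU at begin snapshot
     dba_hist_sys_time_model ec   -- DB CPU at end snapshot
where bt.stat_name = 'DB time' and bt.snap_id = :begin_snap
  and et.stat_name = 'DB time' and et.snap_id = :end_snap
  and bc.stat_name = 'DB CPU'  and bc.snap_id = :begin_snap
  and ec.stat_name = 'DB CPU'  and ec.snap_id = :end_snap
  and et.dbid = bt.dbid and et.instance_number = bt.instance_number
  and ec.dbid = bc.dbid and ec.instance_number = bc.instance_number
  and bc.dbid = bt.dbid and bc.instance_number = bt.instance_number;
```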

That's it - 3
simple measurements should be enough to tell you whether your system is
overloaded or not, and whether it is performing efficiently. These can
be easily extracted from AWR (see later in this post), and can be
graphed in your favourite spreadsheet tool to spot trends and anomalies.

Using these measurements

If
you have a system wide bottleneck / capacity limit then checking on the
total system resource usage during a period of time will hopefully
identify any such bottlenecks. As stated, if AAS is high enough (I
would say more than half of the number of CPU's), and the Wait% is also
high (say 50% or more), then you have poor performance and it can be
improved.

At this point I would use the expanded version of the
AWR query that I have built up over the previous posts to get all of the
data out of AWR for each snapshot, and load it again into a spreadsheet
or something else. Within this AWR data is a breakdown of the Oracle
Wait Time by Wait Class, which tells you which "type" of wait was
consuming most of the wait time. This will primarily be one of Disk
(I/O), Network, Cluster, Commit (Redo), or Concurrency (locking).
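The wait class breakdown itself can be sketched from DBA_HIST_SYSTEM_EVENT (again a sketch, not the expanded query from the previous posts; events first seen after the begin snapshot are ignored here):

```sql
-- Sketch only: total wait time by wait class between two consecutive
-- snapshots, largest first. TIME_WAITED_MICRO is cumulative.
select e.wait_class,
       round((sum(e.time_waited_micro) - sum(b.time_waited_micro)) / 1e6)
         wait_secs
from dba_hist_system_event b,
     dba_hist_system_event e
where b.snap_id = :begin_snap
  and e.snap_id = :end_snap
  and e.dbid = b.dbid
  and e.instance_number = b.instance_number
  and e.event_id = b.event_id
  and e.wait_class <> 'Idle'
group by e.wait_class
order by wait_secs desc;
```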

Knowing
the wait class, you can then query AWR for SQL statement execution
statistics, sum the time attributable to that wait class per SQL
statement, and sort by that summed wait time. I showed how to do this
in my previous post, summing the number of disk reads per SQL statement
executed in each snapshot. This way you easily get to see the top SQL
by that wait type, and see how much time they spent waiting relative to
the total wait time.
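For example, if the dominant class were Disk (User I/O), the top SQL could be pulled from DBA_HIST_SQLSTAT, whose *_DELTA columns are already per-snapshot deltas (a sketch, not the query from the previous post):

```sql
-- Sketch only: top 10 SQL statements by I/O wait time in one AWR
-- snapshot. IOWAIT_DELTA is in microseconds.
select *
from (select sql_id,
             round(iowait_delta / 1e6, 1) io_wait_secs,
             executions_delta executions,
             disk_reads_delta disk_reads
      from dba_hist_sqlstat
      where snap_id = :snap_id
      order by iowait_delta desc)
where rownum <= 10;
```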

Equally
you could just rank all SQL statements by their total wait time, to see
those that experienced the worst waits whatever type of wait they were.
Wait time for a SQL statement would be calculated as its elapsed time
minus its CPU time.
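That calculation can also be sketched directly against DBA_HIST_SQLSTAT:

```sql
-- Sketch only: top 10 SQL by derived wait time (elapsed minus CPU)
-- in one AWR snapshot. The *_DELTA columns are in microseconds.
select *
from (select sql_id,
             round((elapsed_time_delta - cpu_time_delta) / 1e6, 1) wait_secs,
             round(elapsed_time_delta / 1e6, 1) elapsed_secs,
             executions_delta executions
      from dba_hist_sqlstat
      where snap_id = :snap_id
      order by elapsed_time_delta - cpu_time_delta desc)
where rownum <= 10;
```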

AWR System Performance Query

Here is a
query that gives you just the 3 key measurements mentioned before for a
system from the AWR data. It reports all the measurements for yesterday
- modify the date range constraint to report on different periods.
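The query itself appears to be missing from this copy of the post. The sketch below shows one way such a query could be written - it is not the author's original, and assumes a single instance:

```sql
-- Sketch only, not the original query from the post. Reports the three
-- key measurements per AWR snapshot for yesterday. Time model values
-- are cumulative microseconds; "execute count" is a cumulative count;
-- both are turned into per-snapshot deltas with LAG.
with tm as (
  select snap_id, dbid, instance_number, stat_name,
         value - lag(value) over
           (partition by dbid, instance_number, stat_name
            order by snap_id) delta
  from dba_hist_sys_time_model
  where stat_name in ('DB time', 'DB CPU')
), ec as (
  select snap_id, dbid, instance_number,
         value - lag(value) over
           (partition by dbid, instance_number
            order by snap_id) delta
  from dba_hist_sysstat
  where stat_name = 'execute count'
)
select sn.snap_id,
       to_char(sn.end_interval_time, 'DD/MM HH24:MI') snap_end,
       round(ec.delta
             / ((cast(sn.end_interval_time as date)
                 - cast(sn.begin_interval_time as date)) * 86400))
         execs_per_sec,
       round(dbt.delta / 1e6
             / ((cast(sn.end_interval_time as date)
                 - cast(sn.begin_interval_time as date)) * 86400), 1)
         aas,
       round(100 * (dbt.delta - cpu.delta) / nullif(dbt.delta, 0))
         wait_pct
from dba_hist_snapshot sn
     join tm dbt on  dbt.snap_id = sn.snap_id
                 and dbt.dbid = sn.dbid
                 and dbt.instance_number = sn.instance_number
                 and dbt.stat_name = 'DB time'
     join tm cpu on  cpu.snap_id = sn.snap_id
                 and cpu.dbid = sn.dbid
                 and cpu.instance_number = sn.instance_number
                 and cpu.stat_name = 'DB CPU'
     join ec  on  ec.snap_id = sn.snap_id
              and ec.dbid = sn.dbid
              and ec.instance_number = sn.instance_number
where sn.begin_interval_time >= trunc(sysdate) - 1
  and sn.begin_interval_time <  trunc(sysdate)
order by sn.snap_id;
```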
