SQL Stats is important performance data that should be checked during the performance analysis
of a SQL statement. This article describes a method of using time analysis
and resource consumption analysis together to study SQL Stats:
verifying whether a SQL statement has a performance issue,
identifying the possible root cause,
and choosing the corresponding SQL tuning strategies.
It also illustrates a series of SQL Stats patterns in Oracle 11g.

Introduction

SQL Stats, also called 'SQL execution statistics', refers to
the performance data contained in the dynamic view V$SQLSTATS,
persisted in AWR, and exposed via DBA_HIST_SQLSTAT.
V$SQLSTATS contains real-time performance data for all SQL statements
loaded in the shared pool. For SQL statements with considerable performance impact,
Oracle's AWR snapshot process periodically saves the performance data
in V$SQLSTATS to DBA_HIST_SQLSTAT as historical performance data,
which enables us to see the trend of SQL Stats over a period of time.

SQL Stats includes two basic categories of information about how a SQL statement runs on the database:

How long a SQL statement has been running and where it spends most of its time

Amount of resources a SQL statement has consumed and the size of the returned result set

The differentiation between these two categories of information lays the
foundation of our two analytical models for SQL Stats: time analysis and resource consumption analysis.
The ultimate goal of SQL performance tuning is to ensure that SQL statements use the least time
and resources to accomplish their tasks. Time analysis and resource consumption analysis
provide two different angles from which to verify whether a SQL statement has a performance bottleneck.
SQL tuning opportunities should not be constrained by whether someone complains about
a SQL statement running too long. A skilled performance tuner always keeps a close eye
on the SQL statements that remain top resource consumers and applies tuning strategies
proactively. Time analysis and resource consumption analysis also give
us complementary views of how SQL statements scale by striking a balance between time and resources.
This is very useful for tuning SQL statements on database servers
that have limited CPU cycles or inadequate I/O bandwidth.

Methodology

We propose using two analytical models together as a method to analyze
a SQL statement’s performance data in SQL Stats.
Each analytical model includes a set of variables under study.
As shown in table 1 and table 2, the major variables are listed with their corresponding fields in SQL Stats.
These variables are the most common players in sketching the performance landscape of a SQL statement,
and our discussion will focus on them.

When using our two analytical models to examine these variables, there are three basic guidelines:

SQL Stats is performance data collected at the database instance level.
Oracle's AWR report has a Top SQL section displaying several lists of SQL statements
ranked by variables, such as 'SQL ordered by Elapsed Time', 'SQL ordered by Buffer Gets', etc.
These lists not only provide an entry point for choosing which SQL statements to analyze first,
but also enable us to study one SQL statement's variables in the context of their database-wide rankings.
It is always beneficial to identify the SQL statements with the highest performance impact and work on them first.
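To make the entry-point idea concrete, here is a minimal sketch in Python (not the article's downloadable SQL scripts) that ranks a few V$SQLSTATS-style rows the way the AWR 'SQL ordered by ...' lists do; the sql_id values and figures are invented:

```python
# Hypothetical rows mimicking a few V$SQLSTATS fields (times in microseconds).
rows = [
    {"sql_id": "a1b2c3d4e5f6g", "elapsed_time": 120_000_000, "buffer_gets": 9_000_000},
    {"sql_id": "h7i8j9k0l1m2n", "elapsed_time": 480_000_000, "buffer_gets": 1_200_000},
    {"sql_id": "o3p4q5r6s7t8u", "elapsed_time": 60_000_000,  "buffer_gets": 15_000_000},
]

# Rank by each variable separately, as the AWR Top SQL lists do.
top_by_elapsed = sorted(rows, key=lambda r: r["elapsed_time"], reverse=True)
top_by_gets    = sorted(rows, key=lambda r: r["buffer_gets"], reverse=True)

print(top_by_elapsed[0]["sql_id"])  # highest elapsed time
print(top_by_gets[0]["sql_id"])     # highest buffer gets
```

Note that the two rankings can name different statements, which is exactly why both analytical models deserve attention.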

SQL Stats is time-based data, and we need to study how the variables change over time.
We believe that when a SQL statement is having a performance issue, its performance data will be skewed
compared to previously accepted normal periods. Therefore, SQL Stats is more meaningful when
reviewed over a continuous time frame in which changes in the variables can be revealed.
It is common for a SQL statement to show abrupt changes in one or two variables within a short period of time,
signaling a performance issue.
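The idea of watching for abrupt changes can be sketched as follows; this is illustrative Python, and both the snapshot values and the three-times-baseline threshold are hypothetical choices, not anything Oracle prescribes:

```python
def flag_spikes(per_snapshot_values, factor=3.0):
    """Return indexes of snapshots whose value exceeds `factor` times
    the average of all preceding snapshots (a crude spike detector)."""
    spikes = []
    for i in range(1, len(per_snapshot_values)):
        baseline = sum(per_snapshot_values[:i]) / i
        if baseline > 0 and per_snapshot_values[i] > factor * baseline:
            spikes.append(i)
    return spikes

# Buffer gets per AWR snapshot for one SQL_ID; the last snapshot jumps.
buffer_gets_per_snapshot = [1000, 1100, 950, 1050, 9000]
print(flag_spikes(buffer_gets_per_snapshot))  # -> [4]
```

In practice the per-snapshot deltas would come from DBA_HIST_SQLSTAT; the point is simply that a variable is judged against its own recent history.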

Each analytical model involves multiple variables, and it is important to investigate
the correlations among them. We believe that some variables are driving variables,
while others depend on these driving variables and usually change in a synchronized manner with them.
The changes in variables can be better understood if we put them in the context of their correlated relationships,
and SQL tuning strategies are more effective when they target the driving variables.
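As an illustration of checking whether a suspected dependent variable tracks its driving variable, one can correlate their per-snapshot values; this is a sketch with invented numbers, not a prescribed method:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Per-snapshot values: executions (candidate driving variable)
# versus buffer gets (candidate dependent variable).
executions  = [10, 20, 15, 40, 35]
buffer_gets = [1000, 2100, 1450, 4100, 3500]
r = pearson(executions, buffer_gets)
print(round(r, 3))  # close to 1.0: buffer gets move in step with executions
```

A coefficient near 1.0 supports treating executions as the driving variable here, so tuning effort would target the call frequency rather than the per-execution cost.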

We are going to elaborate on our analytic technique in detail.
All examples are from an Oracle 11g database and are shown in the graphic interface of DBspeed (www.dbspeed.com)
for better presentation. The corresponding SQL scripts are also downloadable from its website.

SQL with SQL Plan Change

SQL plan change is notorious for causing performance issues. However, the SQL plan is not included
in either of our two analytical models. Do we miss anything here? Not really.
A SQL plan change is considered to have a negative performance impact only if the new plan
requires more resources and/or more time for running the same SQL statement. As shown in figure 1,
the performance impact of a SQL plan change needs to be evaluated by our two analytical models, so that
we can compare different plans for the same SQL statement and verify which plan is optimal.

Detecting a SQL plan change provides another good entry point for starting our analyses.
As shown in figure 2, SQL Stats not only helps us discover when a new plan came into use, but also enables us
to identify the flip-flop situation where a SQL statement frequently switches between multiple plans.

SQL with High Elapsed Time

Elapsed time, often called response time, is the total time a SQL statement uses for its executions.
Time analysis is based on the theoretical model of

elapsed time = service time + wait time

The wait time is designed to be expressed in the variables cluster wait time,
concurrency wait time, and application wait time in order to demonstrate the impact of wait events.
We believe that under optimal conditions a SQL statement spends most of its time on either CPU or I/O,
not in these three wait classes. These variables should not be significant, and if they are,
they could be indicators of a performance issue. One example is shown in figure 3.
Cluster wait time is specific to SQL statements running on RAC databases and
should be trivial in a healthy, well-performing RAC configuration. A sudden increase in cluster wait time
could be a signal of cluster-wide issues such as node eviction and cluster reconfiguration.
Another example is shown in figure 4. Locking issues are one of the most important contributors
to an increase in application wait time.
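A minimal sketch of this time analysis over one hypothetical statistics row follows; the keys mirror real V$SQLSTATS column names (times in microseconds), but the values and the 10% significance threshold are invented for illustration:

```python
# One hypothetical V$SQLSTATS row (times in microseconds).
stats = {
    "ELAPSED_TIME":          100_000_000,
    "CPU_TIME":               35_000_000,
    "USER_IO_WAIT_TIME":      40_000_000,
    "CLUSTER_WAIT_TIME":       2_000_000,
    "CONCURRENCY_WAIT_TIME":   3_000_000,
    "APPLICATION_WAIT_TIME":  20_000_000,
}

elapsed = stats["ELAPSED_TIME"]
for component in ("CPU_TIME", "USER_IO_WAIT_TIME", "CLUSTER_WAIT_TIME",
                  "CONCURRENCY_WAIT_TIME", "APPLICATION_WAIT_TIME"):
    pct = 100.0 * stats[component] / elapsed
    # Flag the three wait classes (not user I/O) when they exceed 10%.
    flag = "  <-- significant wait class?" if (
        component.endswith("WAIT_TIME")
        and component != "USER_IO_WAIT_TIME"
        and pct > 10.0
    ) else ""
    print(f"{component:<25}{pct:5.1f}%{flag}")
```

With these made-up numbers, application wait time at 20% of elapsed time would be flagged for investigation, much like the locking example in figure 4.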

However, we have noticed that SQL Stats has limitations in collecting accurate timing for concurrency
wait time, which leads to situations where high elapsed time cannot be accounted for by its subcomponents.
Figures 5 and 6 show one example in which a SQL statement had latch contention on the database buffer cache.
Figures 7 and 8 display another example in which a SQL statement had latch contention on the library cache.
Both wait events belong to the concurrency wait class, but their wait time did not get recorded in SQL Stats.
It is important to keep in mind that SQL Stats is only one aspect of the performance data available,
and we should always cross-check our analyses using wait events, ASH reports, AWR reports, SQL trace, etc.

The service time primarily refers to CPU time and user I/O wait time.
A SQL statement is considered either CPU bound or I/O bound depending on whether its CPU time
or its I/O wait time makes up the majority of its elapsed time. We believe that the subcomponent
which makes up the majority of the elapsed time is the potential performance bottleneck.
Resource consumption analysis can provide more diagnostic information: CPU time can be driven up
by various other resource-related variables,
while I/O wait time is closely related to the disk reads variable, which points to physical I/O.
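The CPU-bound versus I/O-bound classification can be sketched as a simple majority test; the function and its numbers are hypothetical illustrations, not Oracle definitions:

```python
def classify(cpu_time, user_io_wait_time, elapsed_time):
    """Label a statement by whichever service-time subcomponent
    makes up the majority of its elapsed time."""
    if cpu_time >= elapsed_time / 2:
        return "CPU bound"
    if user_io_wait_time >= elapsed_time / 2:
        return "I/O bound"
    return "no dominant subcomponent -- check the wait classes"

print(classify(cpu_time=80, user_io_wait_time=10, elapsed_time=100))
print(classify(cpu_time=15, user_io_wait_time=75, elapsed_time=100))
```

The third branch matters: when neither component dominates, the elapsed time may be hiding in the wait classes, or in concurrency waits that SQL Stats failed to record, as noted above.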

SQL with High Buffer Gets

When Oracle requires a data block, it first checks whether the block is already in the database buffer cache.
If the block is found there, Oracle reads it; this process is called a buffer get. If the block is not found,
Oracle must first bring it from disk into the buffer cache; this process is called a disk read.
From the database's standpoint, a buffer get is logical I/O, while a disk read is physical I/O.
A buffer get may or may not require a disk read.
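One way to quantify the relationship between buffer gets and disk reads for a single statement is a per-SQL buffer cache hit ratio; this is an illustrative sketch with invented numbers, and, as discussed later, direct path reads can distort such a ratio:

```python
def buffer_hit_ratio(buffer_gets, disk_reads):
    """Fraction of block requests satisfied from the buffer cache
    without needing a physical read."""
    if buffer_gets == 0:
        return None  # statement did no logical I/O
    return (buffer_gets - disk_reads) / buffer_gets

# 10,000 block requests, 400 of which had to be read from disk first.
print(buffer_hit_ratio(10_000, 400))  # -> 0.96
```

A high ratio only says the cache absorbed most requests; very high buffer gets with a high ratio can still burn CPU and cause memory-structure contention, as the next paragraphs explain.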

SQL statements with high buffer gets are always a concern for database performance tuners, for the following reasons:

A large number of buffer gets usually also causes a large number of disk reads when the requested buffers
are not in the buffer cache: each cache miss requires an I/O to load the block into the buffer cache.
This increases the I/O workload and can create an I/O path bottleneck. When tables with a large amount of data are accessed,
or the database buffer cache is inadequate, or both, SQL statements with high buffer gets
usually show high disk reads because not all data blocks fit in the buffer cache and
Oracle needs to swap data between memory and disk.

In situations where the majority of data blocks already exist in the database buffer cache and disk reads are low,
a large number of buffer gets puts pressure on Oracle's memory structures, causing contention and burning CPU cycles.
As shown in figure 9, it is common for SQL statements with high buffer gets to also have high CPU time
and to encounter buffer cache-related wait events such as 'free buffer waits', 'buffer busy waits', etc.

The key tuning strategies for SQL statements with high buffer gets include ensuring optimal SQL plans
with efficient access paths so that fewer data blocks are visited, and reducing unnecessary executions
that drive up buffer gets.

SQL with High Disk Reads

From the database's standpoint, disk reads are physical I/O which bring data blocks
from disk into Oracle's memory structures. Disk reads require the processing power of the I/O subsystem
and may also need access to the physical storage tier.
Therefore, disk reads can have compounding effects on database performance.

We need to keep an open mind about the relationship between buffer gets and disk reads:
one may or may not lead to the other.

A buffer get may not require a disk read, but high buffer gets can cause high disk reads.

When Oracle uses direct path read operations, the disk reads incurred may or may not be counted as buffer gets.

A direct path read occurs when Oracle reads data blocks from disk directly into the process's private memory (PGA),
bypassing the database buffer cache. Oracle can use direct path reads against either a datafile or a tempfile.
When Oracle performs direct path reads against a datafile, in situations such as reading LOB segments,
scanning a table or index with parallel queries, or performing a full table scan on a (sufficiently)
large table (an 11g new feature), the disk reads incurred are still counted as buffer gets,
and high disk reads will cause high buffer gets. Figure 10 shows
the SQL Stats pattern for a parallel query performing a full table scan using direct path reads.

Another common situation where Oracle uses direct path reads is reading sort segments in a
tempfile, and the disk reads incurred there are not counted as buffer gets. For SQL statements which perform
a large amount of data sorting, it is common to see more disk reads than buffer gets, as well as
high CPU time due to the sorting algorithm. An example is shown in figure 11.

SQL statements with high disk reads are usually I/O bound. Our tuning strategies include:

Tune database parameters such as filesystemio_options and db_writer_processes to facilitate
the most efficient I/O processing at the OS level.

Ensure a healthy I/O subsystem and watch the average wait time of I/O-related wait events such as 'db file sequential read'
and 'db file scattered read' (My Oracle Support Note 1275596.1).

If it is the buffer gets that cause the high disk reads, we use the same tuning strategies as for high buffer gets.

Pay special attention to direct path read operations. Oracle 11g changed the heuristics
for choosing between direct path reads and reads through the database buffer cache for serial table scans
(My Oracle Support Note 793845.1), which makes databases more likely to have SQL statements with high disk reads.
We not only need to ensure optimal SQL plans,
but also need to verify whether the direct path reads actually perform better than reads through the buffer cache.

SQL with High CPU Time

CPU spikes are tightly coupled with database performance issues because they are signs of contention
and starvation of system resources. When troubleshooting CPU spikes on database servers, SQL statements with high CPU time
should be checked unless we know that the top CPU consumers come from outside the database.

CPU-bound SQL statements have the potential to be the root cause of CPU spikes. The key tuning strategies include:

SQL statements which adopt algorithms that consume CPU cycles aggressively (sort, merge, hash, etc.)
can have high CPU time and directly contribute to CPU spikes. As shown in figures 12 and 13, merge join cartesian
is notorious for causing CPU spikes on database servers. To reduce CPU time for this type of SQL statement,
we need to ensure optimal SQL plans and try to use less CPU-intensive algorithms.

CPU time can also be driven up by other variables, such as high buffer gets or high parse calls causing
contention on Oracle's memory structures. To address CPU spikes caused by this type of SQL statement,
we need to identify the driving variables and apply the corresponding tuning strategies.

SQL with High Parse Calls

The parse calls variable in SQL Stats refers to the number of times Oracle parses a SQL statement before executing it.
The purpose of a parse call is to load the SQL statement into the shared pool, optimize it, and generate a SQL plan for execution.

Oracle doesn't need to issue a parse call before each execution, and there are different types of parse calls.
When Oracle receives a SQL statement and tries to run it, the possibilities are:

Oracle searches the shared pool but cannot find the same SQL statement there.
Oracle has to optimize and generate a SQL plan.
This parse call is a hard parse.

Oracle searches the shared pool and finds the same SQL statement there. However, Oracle needs to test whether the cursor can be reused.
If the cursor cannot be reused for any reason, Oracle will create another child cursor,
optimize it, and possibly generate another SQL plan.
This parse call is called a cursor authentication.

Oracle searches the shared pool and finds the same SQL statement there.
Oracle simply reuses the SQL plan and there is no optimization.
This parse call is called a soft parse.

Oracle uses the session cursor cache or PL/SQL cursor cache as a shortcut to find the
SQL statement's location in the shared pool and use it directly, eliminating both the shared pool search and the optimization.
There is no parse call in this situation.

As shown in figure 14, SQL Stats provides another three variables to help us understand the types of parse calls.
The loads and invalidations variables tell whether a cursor has been invalidated and reloaded, which helps us verify
whether a parse call becomes a hard parse. The version count tells how many child cursors a SQL statement
has in the shared pool, indicating whether cursor authentication creates child cursors.

We need to investigate SQL statements with high parse calls because parse calls,
whether soft or hard, have a performance impact.
Excessive hard parses cause latch contention on the shared pool and can therefore lead to CPU spikes on a database server.
Our tuning goal is to achieve a 'parse once, execute many times' scenario.

The tuning strategies can be applied either on the application side or on the database side:

On the application side, commonly adopted techniques to minimize the impact of parsing include
using prepared statements with bind variables to avoid hard parses, and using client-side cursor caching to avoid parse calls altogether.

On the database side, we can adjust database parameters such as cursor_sharing and session_cached_cursors to change parsing behavior.
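The application-side technique can be illustrated with sqlite3 as a stand-in database (not Oracle); the point carried over is that with a bind variable the statement text stays identical across executions, so the database can reuse one parsed statement instead of hard parsing new literal SQL each time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "NEW"), (2, "SHIPPED"), (3, "NEW")])

# One statement text, many executions with different bind values.
# Had we concatenated the status into the SQL string, each distinct
# value would produce a distinct statement text to parse.
sql = "SELECT count(*) FROM orders WHERE status = ?"
results = {status: conn.execute(sql, (status,)).fetchone()[0]
           for status in ("NEW", "SHIPPED")}
print(results)  # -> {'NEW': 2, 'SHIPPED': 1}
conn.close()
```

In Oracle the same principle applies with `:name` bind variables; literal-laden SQL is exactly what drives the hard-parse and shared-pool latch problems described above.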

SQL with High Executions

The executions variable in SQL Stats refers to how many times a SQL statement runs against the database.
A SQL statement with high executions has the potential to incur performance issues because:

The increase of executions can drive up all other resource variables, as shown in figure 15.

High executions can cause more contention and concurrency issues.

The key tuning strategies include:

We need to keep an eye on SQL statements which show a pattern of high executions
and verify them in the context of application logic: what is the business logic behind making so many calls to the same SQL statement?
OLTP databases are characterized by short transactions, and SQL statements
with high executions are unavoidable there by design. In that case, ensuring optimal SQL plans for those statements becomes critical.

We also need to pay attention to situations where a SQL statement's executions suddenly increase significantly
within a short period of time. This raises a red flag pointing to performance issues if scalability is constrained.
Some SQL statements demand a large amount of resources to run, and
a change in their executions can impact performance database wide.
While this is a more common situation in data warehouse environments,
the cause should still be investigated; for example, there could be a temporary jump in executions for processing a backlog.

SQL with No Rows Processed

The rows processed variable is the total number of rows in the result set returned by a SQL statement.
Some SQL statements have rows processed directly linked to their executions,
while others show no pattern because different input parameters produce different result sets.

We need to pay attention to SQL statements with no rows processed and understand them in the context of application logic.
Some SQL statements return no rows by design. However, some SQL statements fail to produce a result set due to unexpected situations,
such as when errors occur but the application is unaware of them,
or when the business logic has changed over time and there is no data left to process.
We need to investigate these situations when we observe SQL statements with no rows processed,
as shown in figure 16. Running only the correct SQL statements keeps applications efficient
and benefits the entire database by reducing unnecessary workload.

SQL statements with no rows processed do not necessarily show rows processed as zero, as the
two following examples show. For SQL statements whose executions should exhibit a fixed ratio to rows processed,
it is easier to detect when some rows don't get processed. As shown in figure 17, the SQL statement was
an insert statement whose executions should have a one-to-one relationship with rows processed. However,
it did not get processed correctly every time and ended up with far fewer rows processed than executions.
Further investigation revealed that a unique constraint on the target table prevented insertion of duplicate
values, but the application code did not check what values were actually being inserted.
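For statements expected to process one row per execution, the shortfall check described above can be sketched as follows (illustrative Python with invented numbers):

```python
def rows_shortfall(executions, rows_processed, expected_rows_per_exec=1):
    """Number of expected rows that were never processed, for statements
    whose executions should map to rows processed at a fixed ratio."""
    return executions * expected_rows_per_exec - rows_processed

# 5,000 single-row insert executions, but only 4,200 rows were inserted:
# 800 executions apparently processed nothing and deserve investigation.
print(rows_shortfall(5_000, 4_200))  # -> 800
```

A persistent nonzero shortfall for a single-row insert is exactly the figure 17 pattern: executions that silently fail (here, hypothetically, on a unique constraint) while the application keeps calling.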

Conclusion

The objective of SQL performance tuning is to ensure that a SQL statement uses the least amount
of time and resources to accomplish its task. Time analysis and resource consumption analysis provide
us with different angles from which to understand SQL Stats, identify a problem SQL statement, clarify problem symptoms,
and find root causes, enabling us to apply different tuning strategies for effective solutions.
Both analytical models examine multiple variables in the context of their database-instance-wide rankings,
a continuous time frame, and their correlated relationships. We should also keep in mind that SQL tuning is
a joint performance tuning effort from both the database side and the application side.