Overview of Data Gathering

To effectively diagnose performance problems, statistics must be available. Oracle generates many types of cumulative statistics for the system, sessions, and individual SQL statements. Oracle also tracks cumulative statistics on segments and services. When analyzing a performance problem in any of these scopes, you typically look at the change in statistics (delta value) over the period of time you are interested in. Specifically, you look at the difference between the cumulative value of a statistic at the start of the period and the cumulative value at the end.

Cumulative values for statistics are generally available through dynamic performance views, such as the V$SESSTAT and V$SYSSTAT views. Note that the cumulative values in dynamic views are reset when the database instance is shutdown. The Automatic Workload Repository (AWR) automatically persists the cumulative and delta values for most of the statistics at all levels except the session level. This process is repeated on a regular time period and the result is called an AWR snapshot. The delta values captured by the snapshot represent the changes for each statistic over the time period. See "Automatic Workload Repository".

Another type of statistic collected by Oracle is called a metric. A metric is defined as the rate of change in some cumulative statistic. That rate can be measured against a variety of units, including time, transactions, or database calls. For example, the number database calls per second is a metric. Metric values are exposed in some V$ views, where the values are the average over a fairly small time interval, typically 60 seconds. A history of recent metric values is available through V$ views, and some of the data is also persisted by AWR snapshots.

A third type of statistical data collected by Oracle is sampled data. This sampling is performed by the active session history (ASH) sampler. ASH samples the current state of all active sessions. This data is collected into memory and can be accessed by a V$ view. It is also written out to persistent store by the AWR snapshot processing. See "Active Session History (ASH)".

A powerful tool for diagnosing performance problems is the use of statistical baselines. A statistical baseline is collection of statistic rates usually taken over time period where the system is performing well at peak load. Comparing statistics captured during a period of bad performance to a baseline helps discover specific statistics that have increased significantly and could be the cause of the problem.

AWR supports the capture of baseline data by enabling you to specify and preserve a pair or range of AWR snapshots as a baseline. Carefully consider the time period you choose as a baseline; the baseline should be a good representation of the peak load on the system. In the future, you can compare these baselines with snapshots captured during periods of poor performance.

Oracle Enterprise Manager is the recommended tool for viewing both real time data in the dynamic performance views and historical data from the AWR history tables. Enterprise manager also is able to capture operating system and network statistical data that can be correlated with AWR data.

Database Statistics

Database statistics provide information on the type of load on the database, as well as the internal and external resources used by the database. This section describes some of the more important statistics.

Wait Events

Wait events are statistics that are incremented by a server process/thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention.

The wait classes are based on a common solution that usually applies to fixing a problem with the wait event. For example, exclusive TX locks are generally an application level issue and HW locks are generally a configuration issue.

The following list includes common examples of the waits in some of the classes:

Time Model Statistics

When tuning an Oracle system, each component has its own set of statistics. To look at the system as a whole, it is necessary to have a common scale for comparisons. Because of this, most Oracle advisories and reports describe statistics in terms of time. In addition, the V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views provide time model statistics. Using the common time instrumentation helps to identify quantitative effects on the database operations.

The most important of the time model statistics is DBtime. This statistics represents the total time spent in database calls and is a indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

DBtime is measured cumulatively from the time that the instance was started. Because DBtime it is calculated by combining the times from all non-idle user sessions, it is possible that the DBtime can exceed the actual time elapsed since the instance started up. For example, a instance that has been running for 30 minutes could have four active user sessions whose cumulative DBtime is approximately 120 minutes.

The objective for tuning an Oracle system could be stated as reducing the time that users spend in performing some action on the database, or simply reducing DBtime. Other time model statistics provide quantitative effects (in time) on specific actions, such as logon operations and hard and soft parses.

Active Session History (ASH)

The V$ACTIVE_SESSION_HISTORY view provides sampled session activity in the instance. Active sessions are sampled every second and are stored in a circular buffer in SGA. Any session that is connected to the database and is waiting for an event that does not belong to the Idle wait class is considered as an active session. This includes any session that was on the CPU at the time of sampling.

Each session sample is a set of rows and the V$ACTIVE_SESSION_HISTORY view returns one row for each active session per sample, returning the latest session sample rows first. Because the active session samples are stored in a circular buffer in SGA, the greater the system activity, the smaller the number of seconds of session activity that can be stored in the circular buffer. This means that the duration for which a session sample appears in the V$ view, or the number of seconds of session activity that is displayed in the V$ view, is completely dependent on the database activity.

As part of the Automatic Workload Repository (AWR) snapshots, the content of V$ACTIVE_SESSION_HISTORY is also flushed to disk. Because the content of this V$ view can get quite large during heavy system activity, only a portion of the session samples is written to disk.

By capturing only active sessions, a manageable set of data is represented with the size being directly related to the work being performed rather than the number of sessions allowed on the system. Using the Active Session History enables you to examine and perform detailed analysis on both current data in the V$ACTIVE_SESSION_HISTORY view and historical data in the DBA_HIST_ACTIVE_SESS_HISTORY view, often avoiding the need to replay the workload to gather additional performance tracing information. The data present in ASH can be rolled up on various dimensions that it captures, including the following:

Operating System Statistics

Operating system statistics provide information on the usage and performance of the main hardware components of the system, as well as the performance of the operating system itself. This information is crucial for detecting potential resource exhaustion, such as CPU cycles and physical memory, and for detecting bad performance of peripherals, such as disk drives.

Operating system statistics are only an indication of how the hardware and operating system are working. Many system performance analysts react to a hardware resource shortage by installing more hardware. This is a reactionary response to a series of symptoms shown in the operating system statistics. It is always best to consider operating system statistics as a diagnostic tool, similar to the way many doctors use body temperature, pulse rate, and patient pain when making a diagnosis. To help identify bottlenecks, gather operating system statistics for all servers in the system under performance analysis.

CPU Statistics

CPU utilization is the most important operating system statistic in the tuning process. Get CPU utilization for the entire system and for each individual CPU on multi-processor environments. Utilization for each CPU can detect single-threading and scalability issues.

Most operating systems report CPU usage as time spent in user space or mode and time spent in kernel space or mode. These additional statistics allow better analysis of what is actually being executed on the CPU.

On an Oracle data server system, where there is generally only one application running, the server runs database activity in user space. Activities required to service database requests (such as scheduling, synchronization, I/O, memory management, and process/thread creation and tear down) run in kernel mode. In a system where all CPU is fully utilized, a healthy Oracle system runs between 65% and 95% in user space.

The V$OSSTAT view captures machine level information in the database making it easier for you to determine if there are hardware level resource issues. The V$SYS_TIME_MODEL supplies statistics on the CPU usage by the Oracle database. Using both sets of statistics enable you to determine whether the Oracle database or other system activity is the cause of the CPU problems.

Virtual Memory Statistics

Virtual memory statistics should mainly be used as a check to validate that there is very little paging or swapping activity on the system. System performance degrades rapidly and unpredictably when paging or swapping occurs.

Individual process memory statistics can detect memory leaks due to a programming failure to deallocate memory taken from the process heap. These statistics should be used to validate that memory usage does not increase after the system has reached a steady state after startup. This problem is particularly acute on shared server applications on middle tier machines where session state may persist across user interactions, and on completion state information that is not fully deallocated.

Disk Statistics

Because the database resides on a set of disks, the performance of the I/O subsystem is very important to the performance of the database. Most operating systems provide extensive statistics on disk performance. The most important disk statistics are the current response time and the length of the disk queues. These statistics show if the disk is performing optimally or if the disk is being overworked.

Measure the normal performance of the I/O system; typical values for a single block read range from 5 to 20 milliseconds, depending on the hardware used. If the hardware shows response times much higher than the normal performance value, then it is performing badly or is overworked. This is your bottleneck. If disk queues start to exceed two, then the disk is a potential bottleneck of the system.

Network Statistics

Network statistics can be used in much the same way as disk statistics to determine if a network or network interface is overloaded or not performing optimally. In today's networked applications, network latency can be a large portion of the actual user response time. For this reason, these statistics are a crucial debugging tool. See "Using Dynamic Performance Views for Network Performance".

Operating System Data Gathering Tools

Table 5-1 shows the various tools for gathering operating statistics on UNIX. For Windows NT/2000, use the Performance Monitor tool.

Table 5-1 UNIX Tools for Operating Statistics

Component

UNIX Tool

CPU

sar, vmstat, mpstat, iostat

Memory

sar, vmstat

Disk

sar, iostat

Network

netstat

Interpreting Statistics

When initially examining performance data, you can formulate potential theories by examining your statistics. One way to ensure that your interpretation of the statistics is correct is to perform cross-checks with other data. This establishes whether a statistic or event is really of interest.

Some pitfalls are discussed in the following sections:

Hit ratios

When tuning, it is common to compute a ratio that helps determine whether there is a problem. Such ratios include the buffer cache hit ratio, the soft-parse ratio, and the latch hit ratio. These ratios should not be used as 'hard and fast' identifiers of whether there is or is not a performance bottleneck. Rather, they should be used as indicators. In order to identify whether there is a bottleneck, other related evidence should be examined. See "Calculating the Buffer Cache Hit Ratio".

Wait events with timed statistics

Setting TIMED_STATISTICS to true at the instance level directs the Oracle server to gather wait time for events, in addition to wait counts already available. This data is useful for comparing the total wait time for an event to the total elapsed time between the performance data collections. For example, if the wait event accounts for only 30 seconds out of a two hour period, then there is probably little to be gained by investigating this event, even though it may be the highest ranked wait event when ordered by time waited. However, if the event accounts for 30 minutes of a 45 minute period, then the event is worth investigating. See "Wait Events Statistics".

Note:

Timed statistics are automatically collected for the database if the initialization parameter STATISTICS_LEVEL is set to TYPICAL or ALL. If STATISTICS_LEVEL is set to BASIC, then you must set TIMED_STATISTICS to TRUE to enable collection of timed statistics. Note that setting STATISTICS_LEVEL to BASIC disables many automatic features and is not recommended.

If you explicitly set DB_CACHE_ADVICE, TIMED_STATISTICS, or TIMED_OS_STATISTICS, either in the initialization parameter file or by using ALTER_SYSTEM or ALTERSESSION, the explicitly set value overrides the value derived from STATISTICS_LEVEL.

Comparing Oracle statistics with other factors

When looking at statistics, it is important to consider other factors that influence whether the statistic is of value. Such factors include the user load and the hardware capability. Even an event that had a wait of 30 minutes in a 45 minute snapshot might not be indicative of a problem if you discover that there were 2000 users on the system, and the host hardware was a 64 node machine.

Wait events without timed statistics

If TIMED_STATISTICS is false, then the amount of time waited for an event is not available. Therefore, it is only possible to order wait events by the number of times each event was waited for. Although the events with the largest number of waits might indicate the potential bottleneck, they might not be the main bottleneck. This can happen when an event is waited for a large number of times, but the total time waited for that event is small. The converse is also true: an event with fewer waits might be a problem if the wait time is a significant proportion of the total wait time. Without having the wait times to use for comparison, it is difficult to determine whether a wait event is really of interest.

Idle wait events

Oracle uses some wait events to indicate if the Oracle server process is idle. Typically, these events are of no value when investigating performance problems, and they should be ignored when examining the wait events. See "Idle Wait Events".

Computed statistics

When interpreting computed statistics (such as rates, statistics normalized over transactions, or ratios), it is important to cross-verify the computed statistic with the actual statistic counts. This confirms whether the derived rates are really of interest: small statistic counts usually can discount an unusual ratio. For example, on initial examination, a soft-parse ratio of 50% generally indicates a potential tuning area. If, however, there was only one hard parse and one soft parse during the data collection interval, then the soft-parse ratio would be 50%, even though the statistic counts show this is not an area of concern. In this case, the ratio is not of interest due to the low raw statistic counts.

The Automatic Workload Repository (AWR) collects, processes, and maintains performance statistics for problem detection and self-tuning purposes. This data is both in memory and stored in the database. The gathered data can be displayed in both reports and views. See "Workload Repository Views" and "Workload Repository Reports".

The statistics collected and processed by AWR include:

Object statistics that determine both access and usage statistics of database segments

Time model statistics based on time usage for activities, displayed in the V$SYS_TIME_MODEL and V$SESS_TIME_MODEL views

Some of the system and session statistics collected in the V$SYSSTAT and V$SESSTAT views

SQL statements that are producing the highest load on the system, based on criteria such as elapsed time and CPU time

Active Session History (ASH) statistics, representing the history of recent sessions activity

AWR automatically generates snapshots of the performance data once every hour and collects the statistics in the workload repository. You can also manually create snapshots, but this is usually not necessary. The data in the snapshot interval is then analyzed by the Automatic Database Diagnostic Monitor (ADDM). See "Automatic Database Diagnostic Monitor".

AWR compares the difference between snapshots to determine which SQL statements to capture based on the effect on the system load. This reduces the number of SQL statements that need to be captured over time.

The space consumed by the Automatic Workload Repository is determined by several factors:

Number of active sessions in the system at any given time

Snapshot interval

The snapshot interval determines the frequency at which snapshots are captured. A smaller snapshot interval increases the frequency, which increases the volume of data collected by the Automatic Workload Repository.

Historical data retention period

The retention period determines how long this data is retained before being purged. A longer retention period increases the space consumed by the Automatic Workload Repository.

The Automatic Workload Repository space consumption can be reduced by the increasing the snapshot interval and reducing the retention period. When reducing the retention period, note that several Oracle self-managing features depend on AWR data for proper functioning. Not having enough data can affect the validity and accuracy of these components and features, including the following:

Automatic Database Diagnostic Monitor

SQL Tuning Advisor

Undo Advisor

Segment Advisor

If possible, Oracle Corporation recommends that you set the AWR retention period large enough to capture at least one complete workload cycle. If your system experiences weekly workload cycles, such as OLTP workload during weekdays and batch jobs during the weekend, you do not need to change the default AWR retention period of 7 days. However if your system is subjected to a monthly peak load during month end book closing, you may have to set the retention period to one month.

Under exceptional circumstances, the automatic snapshot collection can be completely turned off by setting the snapshot interval to 0. Under this condition, the automatic collection of the workload and statistical data is stopped and much of the Oracle self-management functionality is not operational. In addition, you will not be able to manually create snapshots. For this reason, Oracle Corporation strongly recommends that you do not turn off the automatic snapshot collection.

It is important that you create baselines from the Automatic Workload Repository to capture typical performance periods. The baselines, which are specified by a range of snapshots, are preserved for comparisons with other similar workload periods when performance problems occur.

The STATISTICS_LEVEL initialization parameter must be set to the TYPICAL or ALL to enable the Automatic Workload Repository. If the value is set to BASIC, you can manually capture AWR statistics using procedures in the DBMS_WORKLOAD_REPOSITORY package. However, because setting the STATISTICS_LEVEL parameter to BASIC turns off in-memory collection of many system statistics, such as segments statistics and memory advisor information, manually captured snapshots will not contain these statistics and will be incomplete.

In addition to the data collection by the AWR, Automatic Optimizer Statistics Collection is performed by the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure as a scheduled job of the Maintenance Window. See "Automatic Statistics Gathering".

Perform a variety of tasks through the pull-down Actions menu, including creating additional snapshots, preserved snapshot sets from an existing range of snapshots, or an ADDM task to perform analysis on a range of snapshots or a set of preserved snapshots.

To modify AWR settings, click the Edit button. On the EditSettings page, you can set the SnapshotRetention period and SnapshotCollection interval.

See Also:

Oracle Enterprise Manager Concepts and Oracle Enterprise Manager online help for information about monitoring and diagnostic tools available with Oracle Enterprise Manager

Managing Snapshot and Baseline Data with APIs

While the primary interface for managing the Automatic Workload Repository is the Oracle Enterprise Manager Database Control, monitoring functions can be managed with procedures in the DBMS_WORKLOAD_REPOSITORY package.

Snapshots are automatically generated for an Oracle database; however, you can use DBMS_WORKLOAD_REPOSITORY procedures to manually create, drop, and modify the snapshots and baselines that are used by automatic database diagnostic monitoring. Snapshots and baselines are sets of historical data for specific time periods that are used for performance comparisons.

Creating Snapshots

You can manually create snapshots with the CREATE_SNAPSHOT procedure if you want to capture statistics at times different than those of the automatically generated snapshots. For example:

BEGIN
DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT ();
END;
/

In this example, a snapshot for the instance is created immediately with the flush level specified to the default flush level of TYPICAL. You can view this snapshot in the DBA_HIST_SNAPSHOT view.

Dropping Snapshots

You can drop a range of snapshots using the DROP_SNAPSHOT_RANGE procedure. To view a list of the snapshot Ids along with database Ids, check the DBA_HIST_SNAPSHOT view. For example, you can drop the following range of snapshots:

In the example, the range of snapshot Ids to drop is specified from 22 to 32. The optional database identifier is 3310949047. If you do not specify a value for dbid, the local database identifier is used as the default value.

Active Session History data (ASH) that belongs to the time period specified by the snapshot range is also purged when the DROP_SNAPSHOT_RANGE procedure is called.

Modifying Snapshot Settings

You can adjust the interval and retention of snapshot generation for a specified database Id, but note that this can affect the precision of the Oracle diagnostic tools.

The INTERVAL setting affects how often in minutes that snapshots are automatically generated. The RETENTION setting affects how long in minutes that snapshots are stored in the workload repository. To adjust the settings, use the MODIFY_SNAPSHOT_SETTINGS procedure. For example:

In this example, the retention period is specified as 43200 minutes (30 days) and the interval between each snapshot is specified as 30 minutes. If NULL is specified, the existing value is preserved. The optional database identifier is 3310949047. If you do not specify a value for dbid, the local database identifier is used as the default value. You can check the current settings for your database instance with the DBA_HIST_WR_CONTROL view.

Creating and Dropping Baselines

A baseline is created with the CREATE_BASELINE procedure. A baseline is simply performance data for a set of snapshots that is preserved and used for comparisons with other similar workload periods when performance problems occur. You can review the existing snapshots in the DBA_HIST_SNAPSHOT view to determine the range of snapshots that you want to use. For example:

In this example, 270 is the start snapshot sequence number and 280 is the end snapshot sequence. peakbaseline is the name of baseline and 3310949047 is an optional database identifier. If you do not specify a value for dbid, the local database identifier is used as the default value.

The system automatically assign a unique baseline Id to the new baseline when the baseline is created. The baseline Id and database identifier are displayed in the DBA_HIST_BASELINE view.

The pair of snapshots associated with the baseline are retained until you explicitly drop the baseline. You can drop a baseline with the DROP_BASELINE procedure. For example:

In the example, peakbaseline is the name of baseline and FALSE specifies that only the baseline is dropped. TRUE specifies that drop operation should remove the pair of snapshots associated with baseline along with the baseline. 3310949047 is an optional database identifier.

Workload Repository Views

Typically, you would view the AWR data through Oracle Enterprise Manager screens or AWR reports. However, you can view the statistics with the following views:

Workload Repository Reports

You can view the AWR reports with Oracle Enterprise Manager or by running the following SQL scripts:

The awrrpt.sql SQL script generates an HTML or text report that displays statistics for a range of snapshot Ids.

The awrrpti.sql SQL script generates an HTML or text report that displays statistics for a range of snapshot Ids for a specified database and instance.

To run an AWR report, a user must be granted the DBA role.

The reports are divided into multiple sections. The HTML report includes links that can be used to navigate quickly between sections. The content of the report contains the workload profile of the system for the selected range of snapshots.

Note:

If you run a report on a database that does not have any workload activity during the specified range of snapshots, calculated percentages for some report statistics can be less than 0 or greater than 100. This result simply means that there is no meaningful value for the statistic.

Running the awrrpt.sql Report

To generate a text report for a range of snapshot Ids, run the awrrpt.sql script at the SQL prompt:

@$ORACLE_HOME/rdbms/admin/awrrpt.sql

First, you need to specify whether you want an HTML or a text report.

Enter value for report_type: text

Specify the number days for which you want to list snapshot Ids.

Enter value for num_days: 2

After the list displays, you are prompted for the beginning and ending snapshot Id for the workload repository report.

Enter value for begin_snap: 150
Enter value for end_snap: 160

Next, accept the default report name or enter a report name. The default name is accepted in the following example:

Enter value for report_name:
Using the report name awrrpt_1_150_160

The workload repository report is generated.

Running the awrrpti.sql Report

If you want to specify a database and instance before entering a range of snapshot Ids, run the awrrpti.sql script at the SQL prompt to generate a text report:

@$ORACLE_HOME/rdbms/admin/awrrpti.sql

First, specify whether you want an HTML or a text report. After that, a list of the database Ids and instance numbers displays, similar to the following: