Tales From Everyday's Oracle Database Development and -Administration

Monitoring Oracle Performance with AppDynamics

One of my customers uses AppDynamics as their tool of choice for Application Performance Management of a multi-tier, B2B/B2C application environment with several Oracle Databases at the back end. This tool gives a graphic overview of the application infrastructure, discovers (JDBC or .net) database queries automatically and provides metrics on the response time for those database calls as well as metrics from the servers where the application components run on.

To my surprise, I found out that there are so-called “Machine Agents” that can be run on the servers to collect performance and other metrics, but there wasn’t any component for collecting metrics from the Oracle RDBMS itself. Thus, born was the idea to create such a component (“Monitor” in the AppDynamics terminology)!

What’s it all about?

The basic idea was to enable application administrators to gain insight on database performance as an additional source of information about the overall performance of their application infrastructure. Usually, application administrators aren’t DBAs and aren’t necessarily versatile with the interpretation of AWR and ASH reports. Furthermore, many larger organizations practice a separation of duties, meaning that an application admin has to call the DBA in charge (or – even worse – file a ticket) to find out if a performance issue of the application has its roots within the database instance or not.

On the DBA side, there are tools like “Oracle Enterprise Manager” that provide extensive insight into the database. But on one hand, those tools don’t show the complete picture of every link in the chain of a business transaction. And on the other hand, their usage is often restricted to DBAs only and therefore not visible to other stakeholders.

So, wouldn’t it be great to have a common repository of performance metrics from every system component? Combined with a front-end that allows for public dashboards and for drill-down into the collected data? Free of surplus charges and including the Java sources for your own customisation? Well – it’s here!

First of all, let’s take a look at a screenshot to demonstrate what this monitor is doing:

The three colored lines above display the metric “Executions Per Sec” (the number of SQL executes in a second) on three different Oracle instances in this system. This was shot during the ramp-up phase of a test scenario where you can see the different workloads unfold on each instance. The AppDynamics Machine Agent calls its monitors every 60 s (default, but configurable), thus, metrics are sampled on a 60s-basis.

License Restrictions

Many useful metrics are already aggregated in the Active Session History (ASH) or stored in the Automatic Workload Repository (AWR). There’s one problem, however: Usage of ASH and AWR requires an additional license for Oracle’s Diagnostics Pack. Not every organization is willing to pay the additional license fees, as they can add up quite significantly on multi-CPU hardware.

So there should be an alternative to ASH/AWR or at least an option not to use views that are part of the diagnostics pack.

V$SYSMETRIC to the rescue!
This view displays the system metric values captured for the most current time interval for both the long duration (60-second) and short duration (15-second) system metrics. And with a monitor that samples every 60 seconds, the long duration metrics are a perfect fit!

To explain this in more detail: Before the advent of ASH, Oracle’s performance metrics consisted mostly of cumulative statistics that were counted up from zero at instance startup. To make sense of those metrics, one usually had to build differences of two snapshots, dividing them by the time that passed inbetween the snapshots (e.g., by using Statspack, which still works in 11g, btw). With ASH, Oracle implemented a mechanism of sampling system metrics every second. Those metrics are then aggregated in v$sysmetric for 15- and 60-s-intervals. Values older than 60 s can be retrieved from the DBA_HIST_% views (which are subject to licensing).

But why do all the snapshot and aggregation stuff yourself when – leaving some limitations aside – Oracle already does that for you? So, to get a v1 of that monitor up and running fast, v$sysmetric became the preferred source of information, as it provides us with ASH metrics but without the cost for ASH. The “H” part of it, the historicising and rollup, can be done by the AppDynamics Controller.

Information Hiding

In their business best-seller “re:work“, the authors mention how important it is to decide on which features not to include in a product to maintain usability. The same decision had to be made here: The Oracle kernel provides an abundance of metrics (in the hundreds, see the Oracle docs for an idea), but would an application admin really need all of them, not to mention the ability to interpret them? Which metrics would in turn be necessary to get a good picture about what is happening inside the instance?

For v1.0, we decided to collect 42 metrics (Douglas Adams reference partly intended :-) that should mainly serve two purposes:

Provide throughput data to build baselines on. Those baselines can be used to assess throughput changes after changing application code or system configuration.

Provide further information to identify and alert on performance bottlenecks.

These are the metrics that we chose to collect in the first version. Some of them are scaled up by 100, because AppDynamics displays integers only, which is bad for small numbers with fractional values:

Activity

Active Sessions Current: Number of active sessions at the point in time when the snapshot was taken.

Average Active Sessions: Average number of active sessions within the last 60 s. This is maybe the single most important DB load metric and a good starting point for a drill-down.

Average Active Sessions per logical CPU (*100): This shows the average load the database imposes on each logical CPU (i.e. cores or hyperthreads). Values above 100 (more than 1 waiting DB session per CPU) indicate a higher demand for resources than the host can satisfy. This often marks the beginning of quickly rising response times.