Friday, August 6, 2010

Workload that is performed via jobs in the database is accounted twice in the system time model that was introduced with Oracle 10g.

So if you execute significant workload via DBMS_JOB or DBMS_SCHEDULER, any system time model statistic (DB Time, DB CPU etc.) that gets recorded for that workload is doubled.

This bug is not particularly relevant since your top workloads will still be the same top workloads, because all other statistics (like Elapsed Time, CPU, Buffer Gets etc.) are not affected by the bug.

I mention it here only because, as of the time of writing, the bug (see below for details) can't yet be found in the bug database on My Oracle Support, and I recently came across several AWR reports where the majority of the workload was generated via job processes, so the time model statistics were effectively doubled.

It might serve as an explanation if you sometimes wonder why an AWR or Statspack report only captures 50% or less of the recorded total DB Time or DB CPU, and where this unaccounted time has gone. If a significant part of the workload during the reporting period has been performed by sessions controlled via DBMS_JOB or DBMS_SCHEDULER, then most of the unaccounted time is probably not unaccounted at all; instead, the time model statistics are wrong.

So if you have such an otherwise unexplainable unaccounted DB Time / DB CPU etc. you might want to check if significant workload during the reporting period was executed via the job system. Note that I don't say that this is the only possible explanation of such unaccounted time - there might be other reasons like uninstrumented waits, other bugs etc.

Of course all the percentages that are shown in the AWR / ADDM / Statspack reports that refer to "Percentage of DB Time" or "Percentage of DB CPU" will be too small in such cases.

If the majority of workload during the reporting period has been generated by jobs then you can safely assume that the time model statistics have to be divided by 2 (and the percentages have to be doubled). If you have a mixture of jobs and regular foreground sessions then it will be harder to derive the correct time model statistics.
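The correction described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it assumes that job workload is counted exactly twice, and that you can estimate the share of the real DB Time that came from jobs (for example from ASH sample counts, since ASH is not affected). The function names and the `job_fraction` parameter are hypothetical and not part of any Oracle tool.

```python
def corrected_time_model(reported_seconds, job_fraction):
    """Estimate the real time model value from the reported one.

    Assumption: job workload is counted exactly twice, so
        reported = foreground + 2 * job = real * (1 + job_fraction)
    where job_fraction is the share of the *real* time spent in jobs.
    """
    if not 0.0 <= job_fraction <= 1.0:
        raise ValueError("job_fraction must be between 0 and 1")
    return reported_seconds / (1.0 + job_fraction)


def corrected_percentage(component_seconds, reported_seconds, job_fraction):
    """Recompute a 'Percentage of DB Time' figure against the corrected total."""
    real_total = corrected_time_model(reported_seconds, job_fraction)
    return 100.0 * component_seconds / real_total


# All workload run via jobs: the total is halved, the percentage doubles.
print(corrected_time_model(200.0, 1.0))
print(corrected_percentage(50.0, 200.0, 1.0))

# Half of the real workload run via jobs (hypothetical mixture).
print(corrected_time_model(150.0, 0.5))
```

With `job_fraction` set to 1 this reduces to the simple "divide by 2" rule; for a mixture the result is only as good as the estimate of the job share.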

Note that the "Active Session History" (ASH) is not affected by the bug - in my tests the ASH reports were always consistent regarding the DB Time (respectively the number of samples) and CPU time information.

The following simple test case can be used to reproduce the issue at will. Ideally you should have exclusive access to the test system since any other concurrent activity will affect the test results.

You might want to check how long the 1000000000 iterations of the simple PL/SQL loop take on your particular CPU - on my test system this takes approx. 46 seconds to complete.

The first version assumes that a PERFSTAT user with an installed STATSPACK is present in the database since STATSPACK doesn't require an additional license. An AWR variant follows below.

As you might have guessed, my single CPU test system did not gain a second CPU when performing the same task via DBMS_SCHEDULER / DBMS_JOB, yet in that case the time model reports (almost) 2 seconds of DB Time / DB CPU per second, and correspondingly (almost) 2 active sessions.
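The plausibility check behind this observation can be written down as a small Python sketch: DB CPU per second of elapsed time cannot really exceed the number of CPUs. The function names and the 5% tolerance are assumptions made for illustration, and the figures in the usage lines are hypothetical values based on the approx. 46 second runtime mentioned above, not actual report output.

```python
def average_active_sessions(db_time_seconds, elapsed_seconds):
    """DB Time divided by elapsed time gives the average number of active sessions."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return db_time_seconds / elapsed_seconds


def db_cpu_exceeds_capacity(db_cpu_seconds, elapsed_seconds, cpu_count,
                            tolerance=0.05):
    """Return True if the reported DB CPU is more than the CPUs could deliver.

    The tolerance (an arbitrary 5% here) allows for small measurement
    differences between snapshot times and statistics.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    cpu_seconds_per_second = db_cpu_seconds / elapsed_seconds
    return cpu_seconds_per_second > cpu_count * (1.0 + tolerance)


# Hypothetical figures: a 46 second CPU-bound task on a single CPU,
# reported as 92 seconds of DB Time / DB CPU when run via a job.
print(average_active_sessions(92.0, 46.0))
print(db_cpu_exceeds_capacity(92.0, 46.0, cpu_count=1))
```

A report that fails this check on an otherwise idle system is a hint that the time model figures, rather than the workload, are off.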

I have reproduced the bug on versions 10.2.0.4, 11.1.0.7 and 11.2.0.1 but very likely all versions supporting the time model are affected.

A (non-public) bug "9882245 - DOUBLE ACCOUNTING OF SYS MODEL TIMINGS FOR WORKLOAD RUN THROUGH JOBS" has been filed for it, but a fix is not available yet, so as far as I know it is not part of any available patch set / PSU.

Note that there seems to be a different issue with the DB CPU time model component: If you have a system that reports more CPUs than sockets (for example a Power5, Power6 or Power7 based IBM server that reports 16 sockets / 32 CPUs) then the DB CPU component gets reduced by approximately 50%, which means it is effectively divided by 2.

In combination with the above bug, this means that you end up with a doubled DB Time component for tasks executed via jobs, while the DB CPU time model component is in the right ballpark, since the doubled DB CPU time gets divided by 2.
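How the two effects combine can be illustrated with a short Python sketch. It is a simplification: it assumes exact factors of 2 and 0.5, whereas the reduction described above is only approximately 50%. The function and parameter names are hypothetical.

```python
def reported_time_model(actual_db_time, actual_db_cpu,
                        via_job, more_cpus_than_sockets):
    """Model what the time model would report under the two issues.

    Assumptions (simplified): workload run via jobs is counted exactly
    twice, and DB CPU is exactly halved on systems that report more
    CPUs than sockets.
    """
    job_factor = 2.0 if via_job else 1.0
    cpu_factor = 0.5 if more_cpus_than_sockets else 1.0
    reported_db_time = actual_db_time * job_factor
    reported_db_cpu = actual_db_cpu * job_factor * cpu_factor
    return reported_db_time, reported_db_cpu


# Hypothetical CPU-bound task of 46 seconds, run via a job on such a system:
# DB Time is doubled, DB CPU ends up close to the real value.
print(reported_time_model(46.0, 46.0, via_job=True, more_cpus_than_sockets=True))

# The same task in a regular foreground session on such a system:
# DB Time is correct, DB CPU is halved.
print(reported_time_model(46.0, 46.0, via_job=False, more_cpus_than_sockets=True))
```

The point of the sketch is that a plausible looking DB CPU figure does not rule out the job bug on such systems, because the two errors cancel out for that component only.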

I don't know if the bug fix also covers this issue, so you might want to keep this in mind when checking any time model based information.