May 26, 2009

CPU used

From time to time users of Statspack on the newer versions of Oracle are surprised to see the “CPU Time” in the “Top N Timed Events” section of the report looking very different from the “BUSY_TIME” that appears in the “OS Statistics” part of the report.

There are various reasons why the numbers can differ, but one of the reasons is simple and highly beneficial – prior to 10g Oracle usually updated time figures at the end of each database call; but from 10g there are some views where time is updated more regularly.

We can see this difference with a simple test (in this case using the session-based views v$sesstat (old style) and v$sess_time_model (new style)) by executing a long-running, CPU-intensive query (such as the code from my “kill CPU” routine) in one session, then monitoring it from another.

After starting my killer query – I ran the following statement roughly once every second:
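A query along the following lines produces that output (this is a sketch, not necessarily the original statement; the literal 42 stands in for the SID of the busy session, and the time-model value, which is reported in microseconds, is converted to centiseconds to match v$sesstat):

```sql
-- Compare the two CPU figures for one session (replace 42 with the
-- SID of the session running the CPU-intensive query).
-- v$sesstat reports centiseconds; v$sess_time_model reports
-- microseconds, so the latter is divided by 10,000 to match.
select sn.name, ss.value
from   v$sesstat ss, v$statname sn
where  ss.statistic# = sn.statistic#
and    sn.name       = 'CPU used by this session'
and    ss.sid        = 42
union all
select stm.stat_name, round(stm.value / 10000)
from   v$sess_time_model stm
where  stm.stat_name = 'DB CPU'
and    stm.sid       = 42;
```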

This produced a string of results, which I’ve trimmed down to show only the snapshots where the critical figures change:

NAME                           VALUE
------------------------- ----------
CPU used by this session        3456
DB CPU                          3452

NAME                           VALUE
------------------------- ----------
CPU used by this session        3456
DB CPU                          3988

NAME                           VALUE
------------------------- ----------
CPU used by this session        3456
DB CPU                          4588

NAME                           VALUE
------------------------- ----------
CPU used by this session        3456
DB CPU                          5187

NAME                           VALUE
------------------------- ----------
CPU used by this session        5739
DB CPU                          5735

Notice how the “DB CPU” from v$sess_time_model increases every six seconds, while the “CPU used by this session” from v$sesstat changes only at the end of the test when the killer query completes.

As a quick check: if your BUSY_TIME is much larger than the CPU time in the Top N Timed Events, it’s possible that some CPU-intensive SQL was running during the period but didn’t complete within it. Conversely, if your CPU Time is much larger than your BUSY_TIME, it’s possible that a long-running, CPU-intensive query that started in the previous period completed in this one.

(There are other reasons for the difference, of course, but it’s worth remembering that this one is a possibility).


Another reason is that v$osstat reports the OS-level, system-wide CPU usage, while Oracle’s v$sysstat and v$sys_time_model report the CPU usage of the current instance only. So if you have other things running on the server, you’ll see a discrepancy.
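The two sources can be put side by side with something like this (a sketch; note that v$osstat times are in centiseconds while the time model is in microseconds):

```sql
-- Server-wide CPU figures (centiseconds) - these cover every
-- process on the machine, not just this Oracle instance.
select stat_name, value
from   v$osstat
where  stat_name in ('BUSY_TIME', 'USER_TIME', 'SYS_TIME');

-- CPU recorded by this instance only (microseconds).
select stat_name, value
from   v$sys_time_model
where  stat_name = 'DB CPU';
```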

I guess it would have been sensible to make it clear at the start that the OS CPU isn’t restricted to just the single instance’s CPU and therefore ought to be greater than the instance CPU – but I forgot that this wasn’t something that everyone would be aware of. Thanks for highlighting the point.

Of course, there’s a more significant point: people might expect the “DB CPU” from the “Time Model” section to be similar to the “CPU Time”/“CPU used by this session” – and the difference in the moment of recording is more relevant to that comparison.

I have a 9.2.0.8 database on a Solaris 10 server. I review Statspack reports on a weekly basis; since last week the CPU Time has dropped to 10% of what it used to be, and I’ve seen an increase in controlfile and log file waits… CPU Time dropped from 90% to 60%!

What could be the reason for that?
I know the workload from the application hasn’t changed, neither anything else on the Statspack reports.

I can imagine a lot of people reading your note and thinking: “How can a drop from 90% to 60% be the same as dropping to one tenth of what it used to be?” I think I can guess what you’re looking at when you say this – but it’s not obvious. I can also guess why this might happen – but it’s not compatible with no change in the workload, application, or Statspack output.

If you want to email me a pair of one-hour reports from before and after the change that you think are good examples of the difference, I’ll take a quick look at them – and write a note about them if they demonstrate a useful principle.

Just to clarify, the CPU Time in seconds dropped to 10% of its previous value (from 100 seconds an hour to 10 seconds an hour).
That impacted the Top 5 Wait Events, causing the controlfile and log file wait events to increase as a percentage but not in the number of waits; so the CPU Time as a % of Total Elapsed Time dropped to 60%.

Jonathan, I’ve just sent you an e-mail with Statspack reports so you can check them out.

Could a decrease like that in the “CPU used by this session” statistic mean that the database is not being used?

I’ve had a quick look at an early and a late report. First thought: 100 CPU seconds in one hour is virtually idle anyway.

But looking at where that CPU time came from (I checked “SQL ordered by buffer gets” as a first cross-check), most of it is coming from a couple of queries that look like monitoring queries (queries against v$lock, v$session, and v$sql) and don’t look like application code.

These disappear from the later statspack. So maybe the person who was checking on the database decided it wasn’t worth the effort.

If you check the times you mention, the average wait time for each of those events hasn’t changed – it’s just that the reduction in CPU usage pushes their total time higher up the list and makes their percentage contribution higher.

I think you’re right! I’m going to investigate that further!
The explanation was simple, then… a monitoring script was disabled, the CPU it had been using was freed, and now the instance is more idle than before.

Hi,
If, in AWR, we have CPU Time = 36,000s and the server has 46 CPUs, then the total CPU available for the 30-minute elapsed time is 46 * 30 * 60 = 82,800s.
So the % of CPU taken by Oracle in 30 minutes is 36,000 / 82,800 * 100 = 43.47%.
Is this calculation right?
Also, if the busy time is 70%, is it right to say that 70 - 43.47 = 26.53% of the CPU was taken by non-Oracle processes?

Your calculations look correct. However, to follow up the 70% question, I would also look at the OS Statistics section to cross-check the BUSY_TIME, and the split between SYS_TIME and USER_TIME – I like to ensure that different sources of information give consistent values before I jump to any conclusions.
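For reference, the arithmetic above can be checked straight from SQL*Plus:

```sql
-- 36,000 CPU seconds as a fraction of 46 CPUs * 30 minutes * 60 seconds
select round(36000 / (46 * 30 * 60) * 100, 2) as pct_of_capacity
from   dual;
-- approximately 43.48 (43.47 if truncated rather than rounded)
```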

If I’m right, “CPU used by this session” is measured in tens of milliseconds and user calls under ten milliseconds are not added to this metric whereas “DB CPU” is measured in microseconds.

Could that difference in accuracy explain why “DB CPU” could be significantly higher than “CPU used by this session”?
Would that mean that a lot of user calls complete in less than 10 milliseconds in the following example?

I think it’s impossible to say – and detailed interpretation may depend on Version and Platform.

I would assume that Oracle used a rounding mechanism to convert whatever time it acquired for CPU, so that the boundary case would be 5 ms rather than 10 ms, and we might hope that on average the errors summed to zero across all sessions. We might also question whether Oracle populated the session value for “CPU used by this session” by accumulating figures like “how many hundredths of a CPU second have I used since last time” or whether it echoed the O/S result for “CPU time currently recorded against this process”. The former would tend to introduce a cumulative error that might explain your results.

A couple of reference points:

a) you could query v$sess_time_model and compare it with v$sesstat to see if there were particular sessions that showed the largest discrepancy. This might give you some further ideas on the cause.
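For example, something like the following (a sketch; both figures are converted to seconds, since v$sesstat reports centiseconds and v$sess_time_model reports microseconds):

```sql
-- Per-session discrepancy between the two CPU figures, largest first.
select ss.sid,
       round(ss.value / 100, 2)  as cpu_used_secs,  -- v$sesstat (centiseconds)
       round(stm.value / 1e6, 2) as db_cpu_secs     -- time model (microseconds)
from   v$sesstat ss, v$statname sn, v$sess_time_model stm
where  sn.statistic#  = ss.statistic#
and    sn.name        = 'CPU used by this session'
and    stm.sid        = ss.sid
and    stm.stat_name  = 'DB CPU'
order by abs(ss.value / 100 - stm.value / 1e6) desc;
```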

b) when I ran your query on the next 11.2.0.3 instance I used, the error was the other way round – i.e. DB CPU was smaller (by about 5%) than CPU used. (I also had a few sessions which recorded CPU used but showed zero DB CPU.)