January 5, 2011

Shared Server – 2

Although they are becoming increasingly rare (thanks largely to Web-based applications taking over the world), a few of the systems I get called in to review are still using Shared Server technology (formerly known as Multi-threaded Server / MTS), and I have to say that there are a couple of nice “overview” features supporting this technology that I would like to see in the AWR or Statspack reports. These are the views that allow you to see how the workload is being shared out and what the time distribution looks like, and I’ll be taking a look at these views over the course of three or four blog notes.

The first view is v$reqdist – a view for which there is no match in dedicated server technology. This view gives you a simple histogram of how the shared servers have spent their time in the database. Here’s a simple query of the view, with the results from a system I was looking at a little while ago:

According to the view definition, bucket N records the number of operations (for example a single “fetch” call) that took less than 4 * (2^N) hundredths of a second to complete. So this output shows 285 million operations took less than 4/100 second to complete, 482 thousand operations took less than 16/100 second to complete, and 28 operations took up to 81.92 seconds to complete. (In fact, anything taking longer than 81.92 seconds also falls into the last bucket.)
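To make the bucket arithmetic concrete, here is a small Python sketch that tabulates the upper limit of each bucket. It assumes only the formula quoted above and that the buckets are numbered 0 to 11, which is what a top limit of 81.92 seconds implies; the function name `bucket_limit_cs` is purely illustrative and has nothing to do with Oracle.

```python
# Sketch: upper limits of the v$reqdist buckets, assuming the formula
# 4 * 2^N hundredths of a second and buckets numbered 0 to 11.

def bucket_limit_cs(n):
    """Upper limit of bucket n in centiseconds (illustrative helper)."""
    return 4 * 2 ** n


if __name__ == "__main__":
    for n in range(12):
        limit = bucket_limit_cs(n)
        print(f"bucket {n:2d}: under {limit:5d} cs ({limit / 100:.2f} s)")
```

Bucket 0 comes out at 4 centiseconds, bucket 2 at 16 centiseconds and bucket 11 at 8192 centiseconds, i.e. 81.92 seconds, matching the figures quoted above.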

There are a few drawbacks to using this view, of course. The first is that it accumulates data since instance startup – and neither Statspack nor the AWR use it in their snapshots – so you have to write your own code to do interval analysis.
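Because the counts are cumulative, interval analysis comes down to taking two snapshots of the view and subtracting them bucket by bucket. A minimal sketch of that subtraction in Python, assuming each snapshot has already been fetched into a dictionary mapping bucket number to count; the function name and the sample figures are invented for illustration and do not come from any real system.

```python
# Sketch: per-bucket difference between two snapshots of v$reqdist.
# Each snapshot is assumed to be a dict of {bucket_number: count}.

def reqdist_delta(before, after):
    """Return {bucket: count_after - count_before} for every bucket in `after`.

    Raises ValueError if a count has gone down, which would suggest the
    instance was restarted between the two snapshots.
    """
    delta = {}
    for bucket in sorted(after):
        change = after[bucket] - before.get(bucket, 0)
        if change < 0:
            raise ValueError(f"bucket {bucket} went backwards: restart?")
        delta[bucket] = change
    return delta


if __name__ == "__main__":
    # Invented sample figures, purely to show the shape of the result.
    start = {0: 1000, 1: 0, 2: 40}
    end = {0: 1750, 1: 0, 2: 52}
    print(reqdist_delta(start, end))
```

The check for a negative change is a design choice rather than a necessity: after an instance restart the counters start again from zero, and a silent negative delta would be easy to misread.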

Secondly, the choice of buckets isn’t ideal: a “fastest bucket” of 4 centiseconds doesn’t really help very much when you’ve got a system where you think most tasks are going to be very quick, and a “slowest bucket” of only 82 seconds doesn’t really highlight the massively slow jobs that you might want to know about. Perhaps this second loss of detail is deliberate – after all, everything you do through shared servers is supposed to be very quick, so perhaps all you really need to know is that some of your tasks are taking more than a few seconds – after which it doesn’t really matter how long they are taking.

The final problem with this view is that it loses data. Notice the zero that appears as the count in bucket 1 (the 4/100 to 8/100 second bucket) – that looks suspiciously out of place. Given the numbers in the buckets either side you can’t help feeling that there should have been some tasks falling into that bucket. And there is a bug (though not one I can find on MOS/Metalink): tasks taking 4/100 to 8/100 second fall into the 0 – 4/100 second bucket.

If you want to demonstrate this for yourself, on your own platform and version of Oracle, it’s quite easy to do. From an SQL*Plus session connected through a shared server, simply query v$reqdist before and after running a script that looks like the following:

Pick a few variations in the sleep time – and see which bucket your sleeps are recorded in. (The call to disable serveroutput is there so that you don’t see the effects of the dbms_output calls that SQL*Plus would otherwise issue after each call to the database – these would show up in the 0-4/100 bucket.)
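When choosing sleep times it helps to know which bucket each one ought to land in according to the view definition, so that any mismatch stands out. A hedged Python sketch, assuming the 4 * 2^N centisecond limits and a top bucket of 11 that catches everything slower; `expected_bucket` is an illustrative name, and the sketch models only the definition, not what Oracle actually records.

```python
# Sketch: bucket that the v$reqdist definition predicts for an elapsed
# time, assuming limits of 4 * 2^N centiseconds and a catch-all bucket 11.

def expected_bucket(elapsed_cs, top_bucket=11):
    """Return the lowest bucket whose limit exceeds elapsed_cs."""
    for n in range(top_bucket):
        if elapsed_cs < 4 * 2 ** n:
            return n
    return top_bucket  # anything slower falls into the last bucket


if __name__ == "__main__":
    for cs in (2, 5, 10, 100, 10000):
        print(f"{cs:6d} cs -> bucket {expected_bucket(cs)}")
```

By this reckoning a sleep of 5/100 second belongs in bucket 1, so if the before-and-after counts show it in bucket 0 you are seeing the behaviour described above. The round trip is likely to take slightly longer than the sleep itself, so it is safer to pick sleep times comfortably inside a bucket rather than right on a boundary.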

There is actually another little problem with this view – it’s another of those little glitches that appear when you experiment with the SYS account – the view doesn’t seem to record the time spent in the virtual circuit when you were connected as SYS (although other views would show the traffic).

Update: Having created a simple piece of demonstration code to show that bug with the bucket usage, it occurred to me recently to use it for a stress test – and I found another bug with v$reqdist when running 11.1.0.6 on Windows XP. It looks as if, with sufficient concurrent sessions (leading to a backup in the COMMON queue), Oracle loses track of how to compute round-trip time. I haven’t pursued this very closely yet, but it looks almost as if Oracle starts to use the wrong start and end times to calculate the round-trip time, initially subtracting the start time of message N-1 from the end time of message N and gradually slipping further and further out of sync until it’s using times from widely separated messages.

It is so obvious that v$reqdist was designed in Oracle 7 and that nobody has touched it since, and I think nobody ever will :) 0.04s? Not a big deal in 1995, but a very long time today :-)

I personally hate shared server and I’m very happy I have never been forced to use it in a production system. We have a system with 10 000 connected users using old-fashioned client-server architecture, but thanks to the AIX OS we didn’t have to use shared server (SOLARIS or HP UX would have had many more problems handling 10 000 processes).