5 Answers

This depends very much on the type of application you run. If you've got applications which are very trigger-happy with syscalls, you can expect to see high context-switch rates. If most of your applications idle around and only wake up when there's activity on a socket, you can expect to see low context-switch rates.

System calls

System calls cause context switches by their very nature. When a process makes a system call, it asks the kernel to take over from its current point of execution, do something the process isn't privileged to do itself, and then return control to the same spot when it's done.

When we look at the definition of the write(2) syscall from Linux, this becomes very clear:

NAME
write - write to a file descriptor
SYNOPSIS
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
DESCRIPTION
write() writes up to count bytes from the buffer pointed to by buf to the file
referred to by the file descriptor fd. [..]
RETURN VALUE
On success, the number of bytes written is returned (zero indicates
nothing was written). On error, -1 is returned, and errno is set
appropriately.
[..]

This basically tells the kernel to take over operation from the process, move up to count bytes, starting from the memory address pointed at by buf, to file descriptor fd of the current process, and then return to the process and tell it how it went.

A nice example to show this is the dedicated game server for Valve Source based games, hlds. http://nopaste.narf.at/f1b22dbc9 shows one second worth of syscalls done by a single instance of a game server which had no players on it. This process takes about 3% CPU time on a Xeon X3220 (2.4 GHz), just to give you a feeling for how expensive this is.

Multi-Tasking

Another source of context switching might be processes which don't do syscalls, but need to get moved off a given CPU to make room for other processes.

A nice way to visualize this is cpuburn. cpuburn doesn't do any syscalls itself; it just iterates over its own memory, so it shouldn't cause any context switching.

Take an idle machine, start vmstat, and then run a burnMMX (or any other test from the cpuburn package) for every CPU core the system has. You should have full system utilization by then, but hardly any increased context switching. Then try to start a few more processes. You'll see that the context-switching rate increases as the processes begin to compete over CPU cores. The amount of switching depends on the process/core ratio and the multitasking resolution of your kernel.

My moderately loaded webserver sits at around 100-150 switches a second most of the time, with peaks into the thousands.

High context switch rates are not themselves an issue, but they may point the way to a more significant problem.

edit: Context switches are a symptom, not a cause.
What are you trying to run on the server? If you have a multiprocessor machine, you may want to try setting CPU affinity for your main server processes.

Alternatively if you are running X, try dropping down into console mode.

edit again: at 16k cs per second, each CPU is averaging two switches per millisecond - that is half to a sixth of the normal timeslice. Could he be running a lot of IO-bound threads?

edit again post graphs: Certainly looks IO-bound. Is the system spending most of its time in SYS when the context switches are high?

edit once more: High iowait and system time in that last graph - completely eclipsing the userspace. You have IO problems.
What FC card are you using?

edit: hmmm. any chance of getting some benchmarks going on your SAN access with bonnie++ or dbench during deadtime? I would be interested in seeing if they have similar results.

edit: Been thinking about this over the weekend and I've seen similar usage patterns when bonnie is doing the "write a byte at a time" pass. That may explain the large amount of switching going on, as each write would require a separate syscall.

I'm still not convinced that a high context-switch rate is not a problem - I'm talking about high as in 4k to 16k, not 100-150.
– Xerxes, May 29 '09 at 5:11

None of our servers run any X. I agree with you on the IO-wait problem, and the relationship between that and the CS. The HBA card is not a suspect though, because we use the same card on the other hundred or so servers... Conclusion: I blame the SAN team's crappy EVA SAN that they desperately try to defend all the time. Note that a high IO-wait is not always reason to be alarmed; if most processes on a machine are IO-bound, it's expected that the server will have nothing better to do than idle.
– Xerxes, May 29 '09 at 7:31

On second thought - the 4th graph attached shows that it's not really as close as I thought at first. Not exactly an eclipse by any means. I still blame the SAN though. =)
– Xerxes, May 29 '09 at 7:38

There's no rule of thumb. A context switch is just the CPU moving from processing one thread to another. If you run lots of processes (or a few highly threaded ones) you'll see more switches.
Luckily, you don't need to worry about how many context switches there are -- the cost is small and more or less unavoidable.

Actually, the cost of a context switch is expensive. This is even worse on virtual machines - we did some testing a few months ago that showed that one of the biggest causes of poor VM performance was context switching.
– Xerxes, May 29 '09 at 2:13

In fact, in any modern (multi-tasking) operating system, the minimization of context-switching is a very significant optimization task. Do you have any sources to back up your claim that the cost is small?
– Xerxes, May 29 '09 at 2:23

Sorry, are you talking about minimising context switches from the perspective of OS development? Having nothing to do with such development I have no opinion on the benefits of designing a system to minimise CS :) If you are talking about minimising context switches on a server, the issue is that mitigating context switches introduces latency in other places. E.g. reducing the number of processes on a machine means you have to move those processes to another machine, which means communication occurs over a network, which is much slower!
– Alex Jurkiewicz, May 29 '09 at 3:02

I believe your definition of context switches is flawed; they also happen when a system call is performed, even if it returns to the same thread. Applications optimize against this with various tricks. For example, Apache needs to get the system time very often; for that purpose one thread calls localtime repeatedly and stores the result in shared memory. The other threads only have to read from RAM and do not incur a context switch when doing so.
– niXar, May 29 '09 at 16:26

I'm more inclined to worry about the CPU time spent in system state. If it's close to 10% or higher, that means your OS is spending too much time doing context switches. Even though moving some processes to another machine is much slower, it's worth doing.

Things like this are why you should try to keep performance baselines for your servers. That way, you can compare things you notice all of a sudden with things you have recorded in the past.

That said, I have servers running (not very busy Oracle servers, mainly), which are steady around 2k with some 4k peaks. For my servers, that is normal, for other people's servers that might be way too low or too high.

I definitely agree with keeping a baseline, and we have nagios data going back for long periods - the problem with this server is that it's new blood, only been around for a short while. In addition, it's running enterprise (read: crap) software - Teamsite - just to add to the undefined-variable list. I still prefer sar (personal preference), so I'll configure it to keep more than the default (two weeks) and see how it goes.
– Xerxes, May 29 '09 at 8:42

Using sar in combination with rrdtool (which it looks like your graphs come from) can be an easy means of keeping your data (or at least abstracts of it) for a long time.
– wzzrd, May 29 '09 at 9:45