Performance impact: driving up context switches/sec

Too many context switches per second are considered bad for database performance, but how many is too many has never been clear. With the core counts of new servers going up rapidly, it becomes even less clear how we should evaluate this counter to help understand SQL Server behavior in the environments we support. Recognizing that any attempt to rehash what has already been said and recommended out there would more likely be a disservice than a service, I’d like to look at it from a different angle, and hopefully contribute to its understanding with some data points.

Personally, I subscribe to the belief that one of the best ways to understand a behavior is to be able to create and manipulate the behavior on demand. And it naturally follows to ask: how can we drive up the value of the System\Context Switches/sec counter with a SQL Server workload?

Knowing how SQL Server schedules its tasks, I’d expect to be able to drive up context switches/sec by running a lot of very small tasks.

And that is indeed the case. Here is how it goes.

I first create two stored procedures that do essentially nothing on the server side. These are just null transactions. (By the way, the parameters in these procs don’t mean anything. They are there because the client program I use expects them and I’m too lazy to modify the client program. Plus, modifying the client code would add absolutely no value.)

Then, I simulate 200 concurrent users by starting 200 threads on a client, with each thread calling these two procs in an infinite loop with no wait between the calls. The following chart shows the sustained values of the Context Switches/sec counter on a DL580 G7 with four E7-4870 processors as different numbers of cores are enabled. In all cases, hyperthreading is enabled. Note that each E7-4870 has 10 cores.
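The post doesn’t include the client code, so here is a minimal sketch of what such a driver might look like, assuming Python. The `call_null_proc` stub and its name are hypothetical stand-ins for the actual remote procedure call (on the real client, each call would go over the wire, e.g. via ODBC):

```python
import threading
import time

def call_null_proc(counter):
    """Hypothetical stand-in for the real RPC. On the actual client each
    call would go to the server, where the procedure body does nothing."""
    counter[0] += 1

def worker(stop_event, counter):
    # Each simulated user calls the two null procs in a tight loop
    # with no think time between the calls.
    while not stop_event.is_set():
        call_null_proc(counter)  # "proc 1"
        call_null_proc(counter)  # "proc 2"

def run_workload(n_threads=200, duration_s=1.0):
    stop = threading.Event()
    counters = [[0] for _ in range(n_threads)]  # one counter per thread, no sharing
    threads = [threading.Thread(target=worker, args=(stop, c)) for c in counters]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return sum(c[0] for c in counters)  # total calls issued

if __name__ == "__main__":
    print(run_workload(n_threads=8, duration_s=0.2))
```

Because each server-side task completes almost immediately, every call ends in a short scheduling event on the server, which is what drives the Context Switches/sec counter up.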

With this approach, the value of the Context Switches/sec counter is driven into the 200,000-to-250,000-per-second range. These are pretty high numbers. I have no idea whether they can be driven even higher with a different approach, but I know that I have not seen the counter approach this level in any real production environment. If you have, let us know what kind of values you have seen and with what kind of workload.

I should also report that this null transaction workload fails to push the total processor time very high. See the following chart:

The maximum %Processor Time (_Total) that can be reached by this workload is ~24%. And this holds not only across the different core counts, but also no matter how many concurrent users (threads) are submitting the transactions.

It is worth noting, and it is evident from the chart, that the %Privileged Time (_Total) accounts for a very large percentage of the %Processor Time (_Total). In a real production environment, this would spell trouble. With this null transaction workload, I don’t know whether this should be expected and is by design, or something is not behaving properly and the %Privileged Time should be much lower. But I do know that when the transactions are actually doing something useful (e.g. by including some non-trivial SELECT statements), we’ll see the %Privileged Time (_Total) value go down rather quickly. For instance, with the workload used in this previous post, the %Privileged Time (_Total) is typically around 1% while %Processor Time (_Total) is near 100%. And with that workload (which is doing a lot more useful work in its transactions), the Context Switches/sec counter is typically observed to be less than 49,000.

How useful are these data points? I’m not really sure. Hey, at least we know that this particular workload can drive up the Context Switches/sec counter. And if this starts a conversation, it would be a plus.


Comments

I constantly monitor context switches on our main system. I do this as it is a graph I can monitor, though I have no idea what is high or otherwise for my system. I have 4 x quad-core processors (no HT) and I see a range typically between 10k and 25k context switches/sec. As we are web facing I don't actually have a direct concept of users, but typically at any one time I have up to 450 users connecting with about 200 connections on my server.

I ran a query across one of our online performance monitoring repositories which captures perfmon from various live SQL systems.

I found a few servers with surprisingly high peak context switching stats. A couple were 400k+, and each of these was a physical cluster node. The virtual instance stats were nowhere near as high (even combined), and the Win03/SQL05 systems were much worse than the Win08/SQL08 systems. I wonder whether this indicates faults with older clustering?

Otherwise, there were many systems in the high tens of thousands and a few above 100k. Interestingly, not all of these 100k+ systems are known high-tx-rate systems, so I'm going to try and identify why they had such high peak context switching per sec. I'm guessing some were anomalies, but trend analysis should confirm.

I haven't looked into %Privileged time tonight, as it took hours to just run the above query - the repository on this server is 550+ billion rows so this analysis takes a while.

If you're interested in digging any deeper or correlating with other stats, let me know & I'll run other queries if necessary tomorrow.

I'm curious about your 400k+ context switches/sec stats. Are these sustained values (i.e. sustained over several consecutive samples)? And what kind of hardware is this? A lot of servers may not even be able to sustain this level of context switches.

Yes, it would be nice to find out what such high values may be correlated with, especially if they are sustained (not transient) values. To narrow the scope, could you verify whether the high context switch values in your environment generally correlate with high %Privileged Time? Sorry, I don't mean to add more work, but since you asked.

Let me know if you want any more. Personally, I think your DL580 G7 could do massively higher than your 250k context switches, but the problem is your SQL queries couldn't drive it much higher due to the nature of SQL queries (TDS interruption, NIC activity, etc.), even though your queries are not accessing data. Then, there's the whole SQLOS / WinNT task co-ordination layer as well.

If you wrote a multi-threaded program to run in executive mode with many more process threads than logical processors you'd get massively higher numbers than 250k. I'd love to try driving a DL580 G7 to see how far it could go but I only have access to live ones, none in labs unfortunately. Our lab servers are all smaller at the moment.

Come to think of it, if we take the general recommendation of 5000 context switches/sec per processor as the threshold, on an 80-processor machine (such as a DL580 G7 with four E7-4870s and HT enabled), I should expect the threshold to be 400,000. If we limit that to cores only, the threshold for the total context switches/sec would be 200,000. And given that's just the threshold, we should expect to be able to push the number higher on the machine. But of course, that recommendation is pretty old, and I don't know if it still applies on a modern multi-core/many-core machine.
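For what it's worth, the arithmetic above works out as follows (the 5,000-per-processor figure is the old rule of thumb being questioned, not a verified threshold):

```python
# Old rule of thumb: ~5,000 context switches/sec per processor.
threshold_per_cpu = 5_000

sockets, cores_per_socket = 4, 10              # four E7-4870s, 10 cores each
logical_cpus = sockets * cores_per_socket * 2  # hyperthreading doubles the count

print(threshold_per_cpu * logical_cpus)                # 400000, counting logical processors
print(threshold_per_cpu * sockets * cores_per_socket)  # 200000, counting physical cores only
```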

The other thing is that I'm more interested in how high a SQL workload can push the Context Switches/sec counter, as that helps in understanding the SQL Server behavior. Still, knowing how high it could go in general is definitely useful in developing a sense of scope and magnitude.

The highest context switches/sec I saw outside a cluster node (on a regular SQL machine) was ~150k. However, I only queried the past month from one of our monitoring servers, and I'm not sure whether there were higher numbers earlier; it's possible there were much higher values previously.

The machine that was doing the 150k is an HP DL580 G6 with high-spec CPUs. I could look up the details, but I'm not sure offhand exactly which SKUs. It's a known high-OLTP-workload box which requires regular fine tuning, and we've been working with it for years, so I know it's a clean server without any other software running on it. Hence I'm fairly sure the context switches are all generated by the SQL workload.

About the high level of %Privileged Time (_Total): I would speculate that it accounts for the work needed to do all these context switches, thread/process rescheduling, cache prefetches, SMP stuff, NUMA stuff, etc. If you assume that it takes around 1 us of CPU time to do one context switch, with 250,000 context switches/s you get 250 ms/s spent on just that. Divide it by the number of cores and this explains your chart.
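This back-of-the-envelope estimate can be written out explicitly. Note that the ~1 µs per-switch cost is the commenter's assumption, not a measured figure, so the result is an order-of-magnitude estimate only:

```python
switches_per_sec = 250_000
cost_per_switch_s = 1e-6   # assumed ~1 microsecond of CPU time per context switch

# Total CPU time spent on context switching per wall-clock second:
cpu_seconds_per_sec = switches_per_sec * cost_per_switch_s  # ~0.25 s/s, i.e. 250 ms/s

# The _Total counters average across all logical processors:
logical_cpus = 80  # DL580 G7: 4 sockets x 10 cores x 2 (HT)
pct_contribution = 100.0 * cpu_seconds_per_sec / logical_cpus

print(round(cpu_seconds_per_sec * 1000))  # ~250 ms of CPU time per second
print(round(pct_contribution, 2))         # contribution to the averaged counter, in %
```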

Hi Guru (Kevin?), the higher context switches/sec scenario relates to a cluster node, which had only a single instance running on it at the time it peaked at 600k context switches/sec. There was no change from normal in query activity, i.e. no material increase in the number of queries or logical page reads/sec, and also no increased NIC activity. It appears to be some sort of system anomaly relating to clustering, and similar observations were also curiously made on other cluster nodes.

The lower context switches / sec scenario was definitely from an increased amount of SQL activity & it specifically related to some online index relocation work that was being performed when it peaked. I suspect that the increase in context switches / sec comes from the threads going in & out of I/O calls during the reindexing.

Joe, I imagine so, but I will run a query over the perf data for the instances which experienced extremely high context switching and come back to you. It might take a couple of days, but I'll follow it up when I can.

Usually, I notice high CPU shows up as scheduler yield, CXPACKET, etc. wait types. But there are times we have the same waits but low CPU. I've usually found the culprit, but Linchi's experiment above makes me wonder if I've ever been fooled by a high context switching scenario that wasn't causing high CPU.