December 8, 2011

Takeaway: Is the wait type CXPACKET bothering you? It means parallelism. Here’s how to find the queries that are giving you this grief.

The Wait Type CXPACKET

Since SQL Server 2005, we've had dynamic management views (DMVs). These views help monitor database health, and they're the first place I go when troubleshooting problems. For example, the view sys.dm_os_wait_stats gives you an aggregated picture of where the waits are. Often you'll come across the wait type CXPACKET. Microsoft explains:
"Occurs with parallel query plans when trying to synchronize the query processor exchange iterator. If waiting is excessive and cannot be reduced by tuning the query (such as adding indexes), consider adjusting the cost threshold for parallelism or lowering the degree of parallelism."

Hmmm… I know what those words mean individually… Anyway, if you keep searching, you'll see others explain: "CXPACKET means parallelism". We're getting a bit closer now. If a query runs with multiple threads, and one thread completes before the others, then SQL Server reports the wait type CXPACKET for that thread.
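If you want to see this for yourself, a query along these lines (a sketch, not part of the original post) shows where your server's wait time is going. The columns are real columns of sys.dm_os_wait_stats; the TOP 10 cutoff is just an arbitrary choice:

```sql
-- Where is SQL Server spending its wait time overall?
SELECT TOP 10
    wait_type,
    wait_time_ms,
    waiting_tasks_count,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Or look at CXPACKET specifically:
SELECT wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';
```

If CXPACKET lands near the top of the first result set, read on.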

It’s like several chefs are making soup. If one chef’s only job is to add the cilantro, there’s going to be some waiting involved. So is the extra help worth the trouble?

SQL Server thought it could use more cooks for a particular "recipe"… Maybe it's time to make simpler recipes. (The answer is almost always yes, especially for OLTP systems.) SQL Server doesn't use parallelism often: it only comes up with a multi-threaded query plan when it needs to, i.e. when the plan is estimated to be costly enough. It's usually those queries with the high estimated cost that need to be fixed; the CXPACKET wait type is a symptom of that.
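For context, "costly enough" means the plan's estimated cost exceeds the server-wide "cost threshold for parallelism" setting, whose default is 5. You can check your server's value like this (checking, not changing, the setting):

```sql
-- 'cost threshold for parallelism' is an advanced option,
-- so it must be made visible before it can be inspected.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Plans only go parallel when their estimated cost exceeds
-- this value (default 5).
EXEC sp_configure 'cost threshold for parallelism';
```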

Those High Cost Queries

So which queries are those? Again, I like to go to the DMVs. The one I like to look at is sys.dm_exec_query_stats, except that it doesn't have a flag called uses_parallelism. For the definitive way to find parallel queries, you have to scan and parse all the query plans that are cached, using sys.dm_exec_query_plan.
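That thorough approach looks something like this (a sketch of the technique, not the post's original query); it pulls every cached plan and checks its XML for a parallel operator:

```sql
-- Warning: this scans and parses the entire plan cache, which can
-- be slow and CPU-hungry on a busy server.
WITH XMLNAMESPACES
    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    cp.plan_handle,
    qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
-- A plan is parallel if any operator has Parallel="1" in its XML.
WHERE qp.query_plan.exist('//RelOp[@Parallel="1"]') = 1;
```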

But that takes forever, so I don't recommend it. Recently I found out that we don't really need it. Parallel queries are interesting: they're one of the only kinds of query that can use more CPU time than actual elapsed time. So if that's true, then we just need to examine the queries where CPU time exceeds elapsed time.
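In query-stats terms, that cheap filter looks roughly like this (my sketch; total_worker_time is the DMV's name for CPU time, and both columns are in microseconds):

```sql
-- Candidate parallel queries: more CPU time burned than
-- wall-clock time elapsed. No plan XML parsing required.
SELECT TOP 20
    qs.total_worker_time,   -- CPU time, microseconds
    qs.total_elapsed_time,  -- wall-clock time, microseconds
    qs.execution_count,
    qs.sql_handle
FROM sys.dm_exec_query_stats AS qs
WHERE qs.total_worker_time > qs.total_elapsed_time
ORDER BY qs.total_worker_time DESC;
```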

But it might not be true. For example, a parallel query might suffer from blocking, long enough that the elapsed time ends up longer than the total CPU time. That's an interesting question: how good an indicator of parallelism is the ratio between elapsed time and CPU time?

(total_elapsed_time < total_worker_time) vs. uses_parallelism

I’m lucky. I’ve got a large set of query data that can help me.

This chart needs a little explanation. Pick a dot in the chart above. It represents a set of queries whose ratio of elapsed time to CPU time is the same. Look at the value on the x-axis to see what that common ratio is. Say it's 1.5: that dot represents a set of queries whose elapsed time is exactly 50% greater than their CPU time. Now look at its value on the y-axis. You might see 2%. That means 2% of those queries use parallelism.

I expected a slope, not a waterfall, and here's what that means to me: knowing that a query's worker time is greater than its elapsed time is a great indicator of parallelism, and knowing by how much doesn't change that.

CXPACKET? Here’s What’s Next

This query lists the top 20 worst parallel queries cached in your database.

I’ve written it for SQL Server Management Studio. In SSMS, you can click on the query and the query plan to explore these queries further.

I've added a filter to weed out queries whose average CPU time is less than a millisecond. I'm sure you won't miss those stats. Microsoft reports CPU time in microseconds, but it's only accurate to milliseconds, so you can get false positives if you don't ignore the blazing-fast queries. That's why I added that filter.
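The post's original query isn't preserved here, but a reconstruction of the kind of query it describes, including the millisecond filter, might look like this (the [Query Text] alias and the statement-offset arithmetic follow the usual sys.dm_exec_query_stats pattern):

```sql
-- Top 20 likely-parallel queries by total CPU, with clickable
-- query text and query plan columns in SSMS.
SELECT TOP 20
    qs.total_worker_time / 1000  AS total_cpu_ms,
    qs.total_elapsed_time / 1000 AS total_elapsed_ms,
    qs.execution_count,
    -- Extract just this statement from the batch text
    -- (offsets are in bytes of nvarchar, hence the / 2).
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS [Query Text],
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qs.total_worker_time > qs.total_elapsed_time
  -- Ignore queries averaging under a millisecond of CPU:
  -- microsecond-level rounding makes them false positives.
  AND qs.total_worker_time / qs.execution_count > 1000
ORDER BY qs.total_worker_time DESC;
```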

Interesting – but I have a question: you seem to automatically equate parallel with "bad." Parallel does not automatically imply badness. How do you separate the good parallel queries from the ones where the parallel processing actually does harm and a single-threaded plan would be better?

Fantastic, another great post Michael. I particularly enjoyed the novel use of the good ole "Too many cooks…" analogy.

Merrill, in the post Michael alludes to the fact that it's more geared toward analysing the occurrence of parallel queries in OLTP workloads.

Far be it from me to put words into the man's mouth, but my own understanding is that in such scenarios it is particularly undesirable to see a parallel plan being produced, given that the goal is short-lived/fast transactions which are often performing relatively small data reads/writes. Seeing a parallel query plan produced for an OLTP workload is often an indicator of a poor plan choice, such as a poorly written query or missing indexes.

Michael, I am getting a number of false positives with your code (i.e. queries where there isn't parallelism in the plan). It seems to happen most often when the execution count is high, so I would guess it's down to a rounding effect somewhere in the arithmetic.

If you take the difference between the total CPU time and the total elapsed time, divide that by the number of executions, and get 0, then these plans don't have any parallelism operators. Adding "( ( total_worker_time - total_elapsed_time ) / execution_count ) * 1.0 > 1000" to your WHERE clause gives me results with no false positives.

@Fatherjack, Yes, I can confirm that. I struggled with that, trying to find the best way to mitigate that scenario and I think your change works well. I’ve updated the post. Thanks!

@Merrill, @John, John’s right. CXPACKET isn’t bad in itself just like blocking isn’t bad in itself. It’s excessive blocking and excessive CXPACKET that’s the problem. In my head, the audience for this article is someone who is managing an OLTP database and who sees a lot of parallelism. This means some query is doing a lot of work. That’s not bad in itself. Sometimes queries need to do a lot of work. The question that should be asked is whether this query is doing too much work. Before you can answer that, you need to know which queries those are. My hope is that this post helps with that.

This is a good start. One thing to note is that although you are identifying plans that have parallelism in their query plan, these may contain multiple statements, of which only some have the parallelism operators, and those may account for a small percentage of the overall CPU of the batch.
Actual example: a stored procedure with 2 statements in the plan, one of them with 99.9% of the CPU and the other 0.1%. It was only the 0.1% (CPU) statement that had parallelism operators (this was a clustered index table scan, as it was retrieving 70% of the table's rows), so in this case it was the expected result.

Your [Query Text] column shows the statement that uses the parallelism operators; in this example that statement was only 0.1% of the batch's total CPU.
So we need to be wary that parallelism may *not always* be the reason for the high CPU usage; you will need to look further.

Of course this is just one example, and in many cases the parallel statement will account for the majority of the CPU and be causing performance issues for a variety of reasons (thread skew, blocking, or other wait conditions, etc.).

Hi Neil, the query stats are for the statement and the query plan is for the batch, so it's important to look at the query text to help pinpoint the query in the plan with parallelism.

"High-CPU queries using parallelism" is strongly related to "queries that cause CXPACKET". So if troubleshooting CPU is the concern, I would use the top-20 query without the filter. But for digging into CXPACKET waits, I've found the filter works well.