CXCONSUMER Is Harmless? Not So Fast, Tiger.

In Theory

Let’s say you’ve got a query, and the point of that query is to take your largest customer/user/whatever and compare their activity to smaller whatevers.

You may also stumble upon this issue if your query happens to capture populations like that by accident.

For our purposes, let’s look at a query that compares Jon Skeet to 1000 users with the lowest activity. To make this easier, I’m not going to go through with the comparison; I’m just going to set up the aggregations we’d need to do that.
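The post doesn’t show the query itself, but a rough sketch of what that setup might look like, assuming the Stack Overflow sample database (where Jon Skeet’s Users.Id is 22656), could be something like this:

```sql
/* Hypothetical sketch, not the post's actual query.
   Assumes the Stack Overflow database; Users.Id = 22656 is Jon Skeet. */

/* Aggregate Jon Skeet's post activity */
SELECT u.Id, COUNT_BIG(*) AS records
FROM dbo.Users AS u
JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.Id
WHERE u.Id = 22656
GROUP BY u.Id

UNION ALL

/* Aggregate the same thing for the 1000 least-active users */
SELECT lo.Id, COUNT_BIG(*) AS records
FROM (
    SELECT TOP (1000) u.Id
    FROM dbo.Users AS u
    ORDER BY u.Reputation ASC, u.Id
) AS lo
JOIN dbo.Posts AS p
    ON p.OwnerUserId = lo.Id
GROUP BY lo.Id;
```

The important part isn’t the exact shape of the query — it’s that one side of the work touches one enormous user and the other side touches many tiny ones, which is exactly the kind of skewed population that makes parallel row distribution lopsided.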

The wait stats for it are pretty boring. Some SOS_SCHEDULER_YIELD, some MEMORY_ALLOCATION_EXT.

Stuff you’d expect, for amounts of time you’d expect (lots of very short waits).

In Closing

This isn’t a call to set MAXDOP to 1, or tell you that parallelism is bad.

Most of the time, I feel the opposite way. I think it’s a wonderful thing.

However, not every plan benefits from parallelism. Parallel plans can suffer from skewed row distribution, exchange spills, and certain spooling operators.

Today, it’s hard to track stuff like this down without capturing the actual plan or specifically monitoring for it. This information isn’t available in cached plans.
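If you want to catch it in the act yourself, one option is to poll the waiting tasks DMV while the query runs. A minimal sketch (standard DMV columns, nothing exotic):

```sql
/* Poll this while the suspect query is running.
   Long CXCONSUMER waits piled up on a few execution contexts
   can be a hint that rows are skewed across threads. */
SELECT owt.session_id,
       owt.exec_context_id,
       owt.wait_type,
       owt.wait_duration_ms
FROM sys.dm_os_waiting_tasks AS owt
WHERE owt.wait_type = N'CXCONSUMER'
ORDER BY owt.wait_duration_ms DESC;
```

It’s point-in-time only, so you have to be looking while the query is actually stuck — which is exactly the problem with waits that don’t end up in cached plans.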

This also isn’t something that’s getting corrected automatically by any of SQL Server’s cool new robots. This requires a person to do all the legwork.

One other way is to use sp_BlitzFirst/sp_BlitzWho to look at wait stats. If you see queries running that are spending long periods of time waiting on CXCONSUMER, you just might have a thread skew issue.
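As a sketch of what that looks like (parameter choices here are just examples — check the First Responder Kit documentation for the full option list):

```sql
/* Sample the server's waits over a 30-second window */
EXEC dbo.sp_BlitzFirst @Seconds = 30;

/* See what's running right now, and what it's waiting on */
EXEC dbo.sp_BlitzWho;
```

If the same sessions keep showing up with big CXCONSUMER numbers across samples, that’s your cue to go get the actual plan and look at rows-per-thread.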

If you blindly follow random internet advice to ignore this wait, you might be missing a pretty big performance problem.

In Updating

This query is now about 17 hours into running. Through the magic of live query plans, I can see that it’s stuck in one particular spot:

I missed the last 30 seconds or so of the query running, which is why the CXCONSUMER waits here don’t quite line up with the total query CPU time, but they’re very close. Why doesn’t that wait show up in the query plan? I have no idea.

What really gummed things up was the final Nested Loops Join to Posts.

That’s a 13 digit number of rows for a database that doesn’t even have 50 million total rows in it.

Insert comma here

Bottom Line: Don’t go ignoring those CXCONSUMER waits just yet.

Thanks for reading!

Brent says: we were writing new parallelism demos for our PASS Summit pre-con and our Mastering Server Tuning class to show EXECSYNC, and we kept coming across wait stats results that just didn’t line up with what Microsoft has reported about “harmless” waits. This is going to be a really fun set of demos to share in class.

I have run into this several times recently on SQL Server 2017 Enterprise. It’s highly annoying, and it’s certainly not a “harmless” wait stat. Has anyone come across a good method of troubleshooting this? Our solution was to re-engineer the entire query from the ground up. It seems to have fixed it, but it’s a lot of work, and we still don’t know what the cause of the query wait was. And yes, I looked at the plan, and it didn’t tell me anything concrete.

Not really on topic, but: I remember when the CXCONSUMER wait type was announced, many people (including me) believed it would be used just to recognize the control thread (thread ID 0) and make CXPACKET waits more meaningful. Unfortunately, the control thread can register either CXCONSUMER or CXPACKET, so I think there is still no 100% sure method to exclude control thread waits. Or am I wrong?