ErlandOur 'max worker threads' is set at the default "0"Next time I see this condition I will query the sys.dm_os_workers tables directly to get the count.

Since the Activity Monitor in SQL Server 2008 is so bad I usually take a quick look at it then run a couple of different queries that I found on this forum to really see what is going on.I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.

ELLEN-610393 (10/17/2013)I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.

Yes, it sounds like there is a join problem. Please post your queries, although it may take some days before I get the time to look at them. But maybe someone who is not at PASS can step in.

dwilliscp (10/16/2013)So if you have one SPID, but 700 rows, that means the server is trying to do parallel processing? They all show CXPacket.. So how do I track down whay this is going on? (about twice per week)?

You and Ellen are completely unrelated, aren'y you? I am just wondering if we are trying to to solve one or two problems here.

Where do you see 700 rows? CXPacket waits token of a parallel query, where some threads have completed their job and are waiting for some other thread to complete. Typically this happens when the optimizer makes a misestimation on how to partition the data over the threads.

To see what is going on, the first step is to find which querythe process is running. My own favourite tool is beta_lockinfo, but then again I wrote it. You find it on http://www.sommarskog.se/sqlutil/beta_lockinfo.html. This gives you a snapshot on what is going on in the system, including current query and plan.

Unrelated ... The following is one of the queries that I use to show what is going on... when the server is not responding, and unlike the worker query, this is the one that showed me the 700 rows with the same SPID (all generated by the same query), and I could not see one that was not CXPACKET..but maybe I should have filtered out the CXPACKET wait types after I noticed the SPID creating all the issues.

When in doubt, tinker. Tinker with what you are allowed to change (which doesn't sound like much). Can you set MAXDOP to 4? MAXDOP 0 can be a killer as it'll bulk up on threads for a query on the assumption nothing else is going to run. A few milliseconds later when 100 similar queries have been executed, VOILA! Thousands of extra threads. For my dollar, even with hyperthreading MAXDOP should be set no higher than the number of physical cores - half that in AMD environments or busy Intel environments.Usually things are disk bound anyway. Sure would be nice to install a few SSDs and use them for TEMPDB - that can make a huge difference everywhere.

Oh - and bad statistics. That could indirectly cause a lot of parallelism and extra threads.

Here is a potential problem. This join is good in most cases, but not if you use MARS (Multiple Active Result Sets) or SOAP requests, in which case there may be multiple requests for the same session id.

If we ignore that, and assume that a single request is generating 700 worker threads, that is is excessive. You need to look into the query plan for that query, and investigate if you can add indexes or improve the query.

There has been some other suggestions for diagnostic queries. I can offer beta_lockinfo, whicb you find at http://www.sommarskog.se/sqlutil/beta_lockinfo.html. This gives you both the query, the plan as well as a bunch of other information. And itjoins the various DMVs correctly. :--)

ELLEN-610393 (10/17/2013)ErlandOur 'max worker threads' is set at the default "0"Next time I see this condition I will query the sys.dm_os_workers tables directly to get the count.

Since the Activity Monitor in SQL Server 2008 is so bad I usually take a quick look at it then run a couple of different queries that I found on this forum to really see what is going on.I have wondered if maybe there is a bad join in one of the queries. If you have time and are willing, I could post the two queries that I use for you to see. On the surface they both look correctly formed but there was one table/view that I am not familiar with so I took the poster's word for the correctness. For "normal" processes the number of records returned is accurate so it gave me the feeling that the number for the massive records could be correct.

If one query is generating several hundred worker threads that really sounds like you have MAXDOP set to 0 (the default). Since you know the problematic query apparently, just take that query and append OPTION (MAXDOP 4) to the end of it and fire it off in a few query windows. If it doesn't cause a couple thousand worker threads you've found your problem.

Here is the problem. Sysprocess does not have one per process, it has one per process and execution context id. And sometimes there can even be multiple rows for ecid = 0. Since you already have one per task, you get a complete explosion here.