Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Today we experienced a degradation in performance on our production sql server. Durring the time this occurred we logged several "The query processor could not start the necessary thread resources for parallel query execution" errors. The reading that I've done suggests that this has to do with how many CPUs to use when executing a complex query. However when I checked during the outage our CPU Utilization was only at 7%. Is there something else this could be referring too that I haven't come across yet? Is this a likely culprit of the performance degradation or am I chasing a red herring?

What is the value of max degree of parallelism configured and how many processors do you have currently on the server along with NUMA configuration ? You can use coreinfo.exe from sysinternals to find out number of processors and NUMA configuration.
–
KinJul 29 '13 at 18:35

2 Answers
2

Few months ago, I faced similar situation wherein the MAXDOP setting was default and a run away query exhausted all the worker threads.

As Remus pointed out this is called worker thread starvation.

There will be a memory dump created on your server when this condition occured.

If you are on 2008R2+SP1 and up then sys.dm_server_memory_dumps will give you the dump file location as well.

Now back to the problem :

There is 1 scheduler monitor thread per NUMA node and since you have 2 NUMA nodes there will be 2 scheduler monitor threads which are responsible for health checking of all the schedulers every 60 secs for that particular NUMA node while making sure that the scheduler is stuck or not.

Each time a new work request is pulled from the schedulers worker queue, the work processes counter is incremented. So if the scheduler has work request queued and has not processes one of the work requests in 60 sec the scheduler is consider stuck.

Due to a run-away query or extensive parallelism, there arises a condition of worker threads begin exhausted as all the threads are occupied by that single run-away query or excessive prolonged blocking and no work can be done unless that offending process gets killed.

Your best bet is to first tune your Max Degree of Parallelism setting. Default of 0 means SQL Server can use all available CPU’s for parallel processing and there by exhausting all the worker threads.

There are many reasons that can lead to exhaustion of worker threads :

Extensive long blocking chains causing SQL Server to run out of worker threads

Extensive parallelism also leading to exhaustion of worker threads

Extensive wait for any type of "lock" - spinlocks, latches. An orphaned spinlock is an example.

Refer to my answer here that will show you how you can calculate MAXDOP value for your server instance.

is there anything that would be indicative of a run awway query? Anything I can use to attempt to identify queries that are at risk of this?
–
LumpyJul 29 '13 at 21:22

Suggest you to look at the wait stats info to find out where it hurts. Also, look at sys.dm_os_schedulers --> current_tasks_count, runnable_tasks_count, current_workers_count and active_workers_count as well as sys.dm_os_wait_stats and sys.dm_os_waiting_tasks
–
KinJul 29 '13 at 21:45

There could be several reasons. Most likely is that you were out of workers. See max_worker_threads. The condition is called 'worker stravation'. The workers could be stolen by any one of multiple means (none of which would result in high CPU utilization, btw), like having many requests blocked or doing stupid things in CLR (eg. HTTP requests).

The symptom you see is the victim of the problem, not the cause. We cannot recommend a solution w/o knowing the cause. You need to collect perf counters, DMVs and check the ERRORLOG for more info.