There's one thing that you should note about this deadlock right off the bat: all of the participants (the "process" nodes in the -T1222 output) are from the same session identifier (spid) and the same batch (sbid).They each have a different thread ID (kpid and ecid).Each "process" entry in this deadlock represents a different worker thread, but all of these worker threads are running part of a single large parallel query that was submitted by spid 51.

Another interesting thing about this deadlock is in the resource list: most deadlocks involve lock resources ("pagelock", "keylock", etc), but this one only deals with "exchangeEvent" and "threadpool" resources.

Deadlocks centering around exchangeEvent resources have been given the name "intra-query parallelism deadlock". (I know -- it just drips 'sexy', doesn't it?)They may be accompanied by this error message -- sent to the client app only, not logged in the SQL errorlogs:

An "exchangeEvent" resource indicates the presence of parallelism operators in a query plan.The idea is that the work for an operation like a large scan, sort, or join is divided up so that it can be executed on multiple child threads.There are "producer" threads that do the grunt work and feed sets of rows to "consumers".Intra-query parallel requires signaling between these worker threads: the consumers may have to wait on producers to hand them more data, and the producers may have to wait for consumers to finish processing the last batch of data.Parallelism-related waits show up in SQL DMVs as CXPACKET or EXCHANGE wait types (note that the presence of these wait types is normal and simply indicates the presence of parallel query execution -- by themselves, these waits don't indicate that this type or any other type of deadlock is occurring).

Wherever you have threads waiting for resources, there is a risk that they will end up in a circular blocking chain (thread A holding resource X and waiting for resource Y, thread B holding resource Y and waiting for resource X).The synchronization objects used in parallel query execution are no exception; in rare cases, the threads running a single query can end up deadlocking with one another.Most intra-query parallelism deadlocks are considered bugs, although some of them can be risky bugs to fix so a fix may not be possible.If you run into one and you're already on the latest SQL service pack, your best bet may be to investigate workarounds.Luckily, this type of deadlock is relatively uncommon, and in most cases it's possible to work around the problem by eliminating parallelism in the query.Try one of these two approaches:

Workaround #1: Add an index or improve the query to eliminate the need for parallelism.In most cases, the use of parallelism in a query indicates that you have a very large scan, sort, or join that isn't supported by proper indexes.If you tune the query, you will often find that you end up with a much quicker and more efficient plan that doesn't use parallelism, and therefore isn't subject to this type of problem. Of course, in some queries (DSS/OLAP-type queries, in particular) it may be difficult to eliminate all large scans.

Workaround #2: Force single-threaded execution with an "OPTION (MAXDOP 1)" query hint at the end of the query.If you can't modify the query, you can apply the hint to any query with a plan guide (assuming that you're running SQL 2005 or later).

"threadpool" Deadlock Resources

A process waiting for a "threadpool" resource is actually waiting for a worker thread.There are a finite number of threads in SQL's thread pool, and if they are all in use, new requests must wait for an in-progress task to complete and free up a thread.Thread pool waits (in DMVs, "THREADPOOL" or "UMSTHREAD" waittype) are typically a side effect of a massive resource contention problem -- most commonly, a large blocking chain.You should investigate what is tying up all of your worker threads, and eliminate that bottleneck.While it's not clear from this deadlock output alone, I suspect that in this case there may have been many other large untuned queries using lots of parallel threads, so excessive parallelism itself may have been the cause of the thread starvation.

"resourceWait" Deadlock Resources

(UPDATE: The following info on "resourceWait" was added to this post 15 Oct 2009.)

A process waiting for a "resourceWait" resource is waiting for a "resource semaphore". Resource semaphores are typically used to govern memory used for query sorts and hashes. So the following wait graph:

Spid A is waiting for a shared key lock, but is blocked by Spid B, who holds an exclusive lock on this key. Spid B is waiting for more memory to run his query (and, eventually, to release the X key lock), but he is blocked by other spids, including Spid A, who are currently holding all of the memory available for this type of operation.

You could tackle this by looking for tuning opportunities in the queries run by both deadlock participants. They are probably running a query plan that involves a hash or sort operation. Remove this through indexing or query changes and you should eliminate the query's need to wait for a query memory grant. You could also try throwing RAM at the problem, but keep in mind that query workspace memory, the memory used for sorts and hashes, must be drawn from "visible buffer pool". On a 32-bit box, visible bpool is limited to approximately 2GB (3GB if you are running with /3GB). If SQL already has this much memory available to it, adding more won't help.

Caveats

Just because you see “exchangeEvent” resources in your deadlock graph doesn’t necessarily mean that you are facing an intra-query parallelism deadlock.Sometimes the engine includes extraneous resources in the deadlock graph.This makes it important to find out how the waiters relate to one another so that you can determine which of the resources is an essential part of the circular blocking chain.The post http://blogs.msdn.com/bartd/archive/2006/09/09/Deadlock-Troubleshooting_2C00_-Part-1.aspx steps you through a deconstruction of -T1222 output so that you can get a clearer understanding of the relationships.As a rule of thumb, if there are any lock resources in your deadlock output (pagelock, keylock, rowlock) along with the exchangeEvent resources, you should suspect that the exchangeEvent resources are non-essential and that you are probably facing a “normal” deadlock.

Hey Bart, great write up. Thanks for taking the time to do it. I recently had this problem in an ETL query where we were using an exclusive table lock to speed things up. I figured out that the spid was blocking itself and applied the maxdop change before I found your article, but it seems to me that this problem would happen 100% of the time if you had a parallelized query with a table lock. So, in an effort to leave a google breadcrumb for someone else to find this page more easily, I wanted to comment to add some keywords so that future sql nerds could find this page by searching for their error message. Without further ado:
was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim
TABLOCKX