Avoiding Deadlock

Deadlock is a permanent blocking of a set of threads that are
competing for a set of resources. Just because some thread can make progress
does not mean that a deadlock has not occurred somewhere else.

The most common error that causes deadlock is self deadlock, also called recursive deadlock.
In a self deadlock, a thread tries to acquire a lock that the same thread already
holds. Recursive deadlock is very easy to program by mistake.

For example, assume
that a code monitor has every module function grab the mutex lock for the
duration of the call. Then, any call between the functions within the module
protected by the mutex lock immediately deadlocks. If a function calls code
outside the module that circuitously calls back into any method protected
by the same mutex lock, the function deadlocks too.
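The following is a minimal sketch of the self deadlock just described, not code from any real module. The names module_lock, module_init(), module_inner(), and module_outer() are hypothetical. The sketch assumes an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) so that the relock attempt returns EDEADLK and the example terminates. With a default mutex, the same call would typically block forever.

```c
#define _POSIX_C_SOURCE 200809L   /* expose PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical module lock; all names here are illustrative only. */
static pthread_mutex_t module_lock;

/* Create the module lock as an error-checking mutex so that a relock
 * by the owning thread returns EDEADLK instead of hanging. */
int module_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&module_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Like every function in the module, takes the lock for the whole call. */
int module_inner(void)
{
    int rc = pthread_mutex_lock(&module_lock);
    if (rc != 0)
        return rc;              /* EDEADLK if the caller already holds it */
    /* ... work on module state ... */
    rc = pthread_mutex_unlock(&module_lock);
    assert(rc == 0);            /* this thread owns the lock here */
    return rc;
}

/* Calls another module function while still holding the module lock. */
int module_outer(void)
{
    int rc = pthread_mutex_lock(&module_lock);
    if (rc != 0)
        return rc;
    rc = module_inner();        /* self deadlock, reported as EDEADLK */
    pthread_mutex_unlock(&module_lock);
    return rc;
}
```

Calling module_inner() directly succeeds, while module_outer() reports the nested acquisition. An error-checking mutex is a debugging aid for finding the mistake. The mutex type does not remove the underlying design problem.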

The solution for this kind of deadlock is to avoid calling functions
outside the module that might depend on this module through some path. In
particular, avoid calling functions that call back into the module without
first reestablishing invariants and dropping all module locks before making the
call. Of course, after the call completes and the locks are reacquired, the
state must be verified to be sure the intended operation is still valid.
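The pattern of dropping the module lock before calling outside code, and then verifying state after reacquiring the lock, might look like the following sketch. The queue module, its generation counter, and the callback approve_with_reentry() are all hypothetical names invented for illustration. A generation counter is only one possible way to detect that the state changed during the call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module state, all protected by queue_lock. */
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_len = 0;
static unsigned long queue_generation = 0;   /* bumped on every change */

void queue_add(void)
{
    pthread_mutex_lock(&queue_lock);
    queue_len++;
    queue_generation++;
    pthread_mutex_unlock(&queue_lock);
}

int queue_length(void)
{
    pthread_mutex_lock(&queue_lock);
    int len = queue_len;
    pthread_mutex_unlock(&queue_lock);
    return len;
}

/* Removes one item if the outside function approve() agrees.
 * approve() might call back into this module, so it is never
 * called while queue_lock is held. */
bool queue_remove_if(bool (*approve)(int len))
{
    assert(approve != NULL);
    for (;;) {
        pthread_mutex_lock(&queue_lock);
        if (queue_len == 0) {
            pthread_mutex_unlock(&queue_lock);
            return false;
        }
        int len = queue_len;
        unsigned long gen = queue_generation;
        /* Invariants hold here, so the lock can be dropped. */
        pthread_mutex_unlock(&queue_lock);

        bool ok = approve(len);          /* reentry is safe: no lock held */

        pthread_mutex_lock(&queue_lock);
        if (gen != queue_generation) {
            /* State changed during the call; the answer is stale. */
            pthread_mutex_unlock(&queue_lock);
            continue;
        }
        if (ok) {
            queue_len--;
            queue_generation++;
        }
        pthread_mutex_unlock(&queue_lock);
        return ok;
    }
}

/* Example outside callback that reenters the module exactly once. */
static bool reentered = false;
bool approve_with_reentry(int len)
{
    if (!reentered) {
        reentered = true;
        queue_add();                     /* calls back into the module */
    }
    return len > 0;
}
```

If queue_remove_if() held queue_lock across the call to approve(), the reentrant queue_add() would self deadlock. Here the first answer is discarded because the queue changed, and the operation is retried against the new state.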

Another kind of deadlock occurs when two threads each hold a lock that
the other thread needs. Suppose that thread 1 acquires mutex lock A and thread 2
acquires mutex lock B. Then thread 1 tries to acquire mutex lock B and thread 2
tries to acquire mutex lock A. Thread 1 cannot proceed while blocked waiting for
mutex lock B. Thread 2 cannot proceed while blocked waiting for mutex lock A.
Nothing can change. So, this condition is a permanent blocking of the threads,
and a deadlock.

This
kind of deadlock is avoided by establishing an order in which locks are acquired,
a lock hierarchy. When all threads always acquire locks
in the specified order, this deadlock is avoided.
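One way to impose a lock hierarchy is sketched below. The account structure and the transfer() function are hypothetical. Ordering the two locks by address is an assumption made for this example. Any fixed, program-wide order, such as a documented ranking of lock names, serves the same purpose.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object with its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

void account_init(struct account *acct, long balance)
{
    assert(acct != NULL);
    pthread_mutex_init(&acct->lock, NULL);
    acct->balance = balance;
}

/* Moves amount between two accounts. Both locks are always taken
 * in ascending address order, whichever account is the source. */
void transfer(struct account *from, struct account *to, long amount)
{
    assert(from != NULL && to != NULL);
    if (from == to)
        return;                         /* nothing to move, one lock only */

    pthread_mutex_t *first = &from->lock;
    pthread_mutex_t *second = &to->lock;
    if ((uintptr_t)first > (uintptr_t)second) {
        pthread_mutex_t *tmp = first;   /* swap into hierarchy order */
        first = second;
        second = tmp;
    }

    pthread_mutex_lock(first);
    pthread_mutex_lock(second);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
}
```

With this ordering, one thread running transfer(&a, &b, ...) and another running transfer(&b, &a, ...) both request the same lock first, so neither can hold one lock while waiting for the other.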

Adherence to a strict order of lock acquisition is not always optimal.
For instance, suppose that thread 2 has made many assumptions about the state of
the module while holding mutex lock B. Giving up mutex lock B to acquire mutex
lock A and then reacquiring mutex lock B, in that order, causes the thread to
discard its assumptions. The state of the module must be reevaluated.

The blocking synchronization primitives usually have variants
that attempt to get a lock and fail immediately if the lock cannot be acquired. An
example is pthread_mutex_trylock(). This behavior of the primitive
variants allows threads to violate the lock hierarchy when no contention occurs.
When contention occurs, the held locks must usually be discarded and the locks
reacquired in order.
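The try-then-back-off approach can be sketched as follows. The locks lock_a and lock_b and the helper acquire_a_while_holding_b() are hypothetical. The sketch assumes a hierarchy in which lock_a must be acquired before lock_b.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical hierarchy: lock_a is acquired before lock_b. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Called with lock_b held. On return, both locks are held.
 * Returns 0 if lock_b was held the whole time.
 * Returns 1 if lock_b was dropped and reacquired, in which case the
 * caller must reevaluate any state that lock_b protects. */
int acquire_a_while_holding_b(void)
{
    int rc = pthread_mutex_trylock(&lock_a);
    if (rc == 0)
        return 0;                   /* no contention: out-of-order is safe */
    assert(rc == EBUSY);            /* only contention is expected here */

    /* Contention: back off and reacquire in hierarchy order. */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);
    return 1;
}
```

The return value matters: when it is 1, another thread might have changed the module while lock_b was released, so earlier assumptions no longer hold.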

Deadlocks Related to Scheduling

Because the order in which locks are acquired is not guaranteed, a problem
can occur where a particular thread never acquires a lock.

This problem usually happens when the thread holding the lock releases
the lock, lets a small amount of time pass, and then reacquires the lock.
Because the lock was released, the appearance is that the other thread should
acquire the lock. But, nothing blocks the thread holding the lock. Consequently,
that thread continues to run from the time the thread releases the lock until
the time the lock is reacquired. Thus, no other thread is run.

You can usually solve this type of problem by calling sched_yield(3C) just
before the call to reacquire the lock. The sched_yield() function allows other
threads to run and to acquire the lock.

Because the time-slice requirements of applications are so variable,
the system does not impose any requirements. Use calls to sched_yield() to
make threads share time as you require.
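A minimal sketch of yielding between the release and the reacquisition of a lock follows. The names work_lock, work_done, and busy_worker() are hypothetical. Note that sched_yield() only gives other threads a chance to run. The function does not guarantee that a particular waiting thread acquires the lock next.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical shared counter protected by work_lock. */
static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static long work_done = 0;

/* Thread start routine: arg points to a long holding the number of
 * short critical sections to run. */
void *busy_worker(void *arg)
{
    assert(arg != NULL);
    long rounds = *(long *)arg;

    for (long i = 0; i < rounds; i++) {
        pthread_mutex_lock(&work_lock);
        work_done++;                    /* short critical section */
        pthread_mutex_unlock(&work_lock);

        /* Give waiting threads a chance to run and take the lock
         * before this thread asks for it again. */
        sched_yield();
    }
    return NULL;
}
```

Two or more threads can be started on busy_worker() with pthread_create(). Without the sched_yield() call, a thread might run through many iterations before any other thread gets the lock.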

Locking Guidelines

Follow these simple guidelines for locking.

Try not to hold locks across long operations like I/O where
performance can be adversely affected.

Do not hold locks when calling a function that is outside
the module and might reenter the module.

In general, start
with a coarse-grained approach, identify bottlenecks, and add finer-grained
locking where necessary to alleviate the bottlenecks. Most locks are held
for short amounts of time and contention is rare. So, fix only those locks
that have measured contention.

When using multiple locks, avoid deadlocks by making sure
that all threads acquire the locks in the same order.
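As an illustration of the first guideline, the sketch below copies shared data while holding the lock and performs the I/O only after releasing the lock. The names status_lock, status_msg, status_set(), and status_write() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical shared status string protected by status_lock. */
static pthread_mutex_t status_lock = PTHREAD_MUTEX_INITIALIZER;
static char status_msg[64] = "idle";

void status_set(const char *msg)
{
    assert(msg != NULL);
    pthread_mutex_lock(&status_lock);
    snprintf(status_msg, sizeof status_msg, "%s", msg);   /* truncates if long */
    pthread_mutex_unlock(&status_lock);
}

/* Writes the current status to out. The lock is held only for the
 * copy, not for the potentially slow write. Returns the fputs() result. */
int status_write(FILE *out)
{
    assert(out != NULL);
    char snapshot[sizeof status_msg];

    pthread_mutex_lock(&status_lock);
    memcpy(snapshot, status_msg, sizeof snapshot);
    pthread_mutex_unlock(&status_lock);

    return fputs(snapshot, out);        /* I/O happens outside the lock */
}
```

The cost is one extra copy. In exchange, a slow or blocked output stream cannot stall other threads that only need to update the status.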

Finding Deadlocks

The Sun Studio Thread Analyzer is a tool that you can use to find deadlocks
in your program. The Thread Analyzer can detect potential deadlocks as well
as actual deadlocks. A potential deadlock does not necessarily occur in a
given run, but can occur in any execution of the program depending on the
scheduling of threads and the timing of lock requests by the threads. An actual
deadlock is one that occurs during the execution of a program and causes the
threads involved to hang. The whole process might or might not hang as a result.