Content series:

This content is part # of # in the series: Common threads

This content is part of the series:Common threads

Stay tuned for additional content in this series.

Mutex me!

In my previous article, I talked about threaded code that did unusual and unexpected things. Two threads each incremented a global variable twenty times. The variable was supposed to end up with a value of 40, but ended up with a value of 21 instead. What happened? The problem occurred because one thread repeatedly "cancelled out" the increment performed by the other thread. Let's take a look at some corrected code that uses a mutex to solve the problem:

Comprehension time

If you compare this code to the version in my previous article, you'll notice the addition of the calls pthread_mutex_lock() and pthread_mutex_unlock(). These calls perform a much-needed function in threaded programs. They provide a means of mutual exclusion (hence the name). No two threads can have the same mutex locked at the same time.

This is how mutexes work. If thread "a" tries to lock a mutex while thread "b" has the same mutex locked, thread "a" goes to sleep. As soon as thread "b" releases the mutex (via a pthread_mutex_unlock() call), thread "a" will be able to lock the mutex (in other words, it will return from the pthread_mutex_lock() call with the mutex locked). Likewise, if thread "c" tries to lock the mutex while thread "a" is holding it, thread "c" will also be put to sleep temporarily. All threads that go to sleep from calling pthread_mutex_lock() on an already-locked mutex will "queue up" for access to that mutex.

pthread_mutex_lock() and pthread_mutex_unlock() are normally used to protect data structures. That is, you make sure that only one thread at a time can access a certain data structure by locking and unlocking it. As you may have guessed, the POSIX threads library will grant a lock without having put the thread to sleep at all if a thread tries to lock an unlocked mutex .

For your enjoyment, four znurts re-enact a scene from recent pthread_mutex_lock() calls

The thread in this image that has the mutex locked gets to access the complex data structure without worrying about having other threads mess with it at the same time. The data structure is in effect "frozen" until the mutex is unlocked. It's as if the pthread_mutex_lock() and pthread_mutex_unlock() calls are "under construction" signs that surround a particular piece of shared data that's being modified or read. The calls act as a warning to other threads to go to sleep and wait their turn for the mutex lock. Of course this is only true if your surround every read and write to a particular data structure with calls to pthread_mutex_lock() and pthread_mutex_unlock().

Why mutex at all?

Sounds interesting, but why exactly do we want to put our threads to sleep? After all, isn't the main advantage of threads their ability to work independently and in many cases simultaneously? Yes, that's completely true. However, every non-trivial threads program will require at least some use of mutexes. Let's refer to our example program again to understand why.

If you take a look at thread_function(), you'll notice that the mutex is locked at the beginning of the loop and released at the very end. In this example program, mymutex is used to protect the value of myglobal. If you look carefully at thread_function() you'll notice that the increment code copies myglobal to a local variable, increments the local variable, sleeps for one second, and only then copies the local value back to myglobal. Without the mutex, thread_function() will overwrite the incremented value when it wakes up if our main thread increments myglobal during thread_function()'s one-second nap. Using a mutex ensures that this doesn't happen. (In case you're wondering, I added the one-second delay to trigger a flawed result. There is no real reason for thread_function() to go to sleep for one second before writing the local value back to myglobal.) Our new program using mutex produces the desired result:

If this code were in a single-threaded program we'd expect the thread_function() code to execute in its entirety. The execution would then be followed by the main thread code (or the other way around). In a threaded program without mutexes, the code can (and often will, thanks to the sleep() call) end up executing in the following order:

When the code executes in this particular order, the main thread's modification to myglobal gets overwritten. We then end up with an incorrect value at the end of our program. If we were manipulating pointers we'd probably end up with a segfault. Notice that our thread_function() thread executes all its instructions in order. It's not as if thread_function() does anything out of order. The problem is that we have another thread performing the other modifications to the same data structure effectively at the same time.

Inside threads 1

Before I explain how to figure out where to use mutexes, I'll offer a little insight into the internal working of threads. Here's our first example:

Let's say you have a main thread that creates three new threads: threads "a", "b", and "c". Let's say that thread "a" is created first, thread "b", second and thread "c" last.

After the first pthread_create() call completes, you can assume either that thread "a" exists or that it has finished and is now stopped. After the second pthread_create() call, both the main thread and thread "b" can assume that thread "a" exists (or has stopped).

However, immediately after the second create() call returns, the main thread can't assume which thread (a or b) will actually start running first. Although both threads exist it's up to the kernel and threads library to give them a slice of CPU time. And there is no hard rule as to who goes first. Now, it's very likely that thread "a" will start executing before thread "b", but it isn't guaranteed. This is especially true on a multi-processor machine. If you write code that assumes that thread "a" will actually start executing its code before thread "b" starts, you'll end up with a program that works 99% of the time. Or worse, a program that works 100% of the time on your machine but 0% of the time on your client's quad-processor server.

Another thing that we can learn from this example is that the threads library preserves the order of code execution for each individual thread. In other words, those three pthread_create() calls will in fact execute in the order that they appear. From the main thread's perspective, all the code is executing in order. Sometimes we can take advantage of this to optimize parts of our threaded programs. For instance, in the above example, thread "c" can assume that thread "a" and "b" are running or have already terminated. It does not have to worry about the possibility that threads "a" and "b" haven't been created yet. You can use this logic to optimize your threaded programs.

Inside threads 2

OK, now for another hypothetical example. Let's say we have a bunch of threads that are executing the following code:

myglobal=myglobal+1;

Do we need to lock and unlock the mutex before and after the increment respectively? Some of you may say "no". The compiler will after all very likely compile the above assignment into a single machine instruction. As you know a single machine instruction cannot be interrupted "mid-stream". Even hardware interrupts will respect the atomicity of machine instructions. Because of this tendency it's tempting to leave out the pthread_mutex_lock() and pthread_mutex_unlock() calls entirely. Don't do it.

Am I being a wimp? Not really. First, you shouldn't assume that the
above assignment would be compiled as a single machine instruction
unless you personally verify the machine code yourself. Even if you
insert some inline assembly to ensure that the increment happens
atomically -- or even if you wrote the compiler yourself! -- you may still have problems.

Here's why. Using a single
inline assembly opcode will probably work wonderfully on a
uni-processor machine. Each increment will happen atomically, and
chances are you'll get the desired result. But a multi-processor
machine is another story. On a multi-CPU
machine, you can have two separate processors executing the above
assignment at nearly (or at times exactly) the same time. And remember
that this memory modification will need to trickle down from L1 to L2
cache, and then to main memory. (An SMP machine is not
just an additional processor; it also has special hardware that
arbitrates access to RAM.) In the end, you'll really have no idea
which CPU "wins" the race of writing to main memory. To produce
predictable code, you'll want to use mutexes. Mutexes will insert a
"memory barrier," which ensures that the writes to main memory occur in
the order the threads lock the mutex.

Consider an SMP architecture that updates main memory in 32-bit blocks.
If you are incrementing a 64-bit integer without mutexes, the
uppermost 4 bytes might come from one CPU and the other four might come
from another. Bummer! Worst of all, using poor technique will
probably make your program bomb out once in a blue moon or at
3 AM on an important client's system.
David R. Butenhof covers the
possible permutations of not using mutexes in his book, Programming
with POSIX Threads (see Related topics at the end of this article).

Many mutexes

If you place too many mutexes, your code won't have any kind of
concurrency and will run slower than a single-threaded solution. If
you place too few, your code will have weird and embarrassing bugs.
Fortunately, there is a middle ground. First
of all, mutexes are used to serialize access to *shared data*. Don't
use them for non-shared data, and don't use them if your program's
logic ensures that only one thread is accessing a particular data
structure at a single time.

Second of all, if you are using shared data, use mutexes for both
reading and writing. Surround your read and write sections with
pthread_mutex_lock() and pthread_mutex_unlock(), or use them any time a
program's invariant is temporarily broken. Learn to view your code
from the perspective of a single thread and make sure each individual
thread in your program has a consistent, friendly view of memory.
It'll probably take you several hours of writing your own code to get
the hang of mutexes, but they will soon become second nature and
you'll be able to use them properly without thinking *too* much.

Using the calls: initialization

OK, now it's time to see all the different ways to use mutexes.
First, we'll start with initialization. In our thread3.c example, we
used a static initialization method. This involved
declaring a pthread_mutex_t variable and assigning it the constant
PTHREAD_MUTEX_INITIALIZER:

pthread_mutex_t mymutex=PTHREAD_MUTEX_INITIALIZER;

That's pretty easy. But you can also create a mutex dynamically. Use this dynamic method whenever your
code allocates a new mutex using malloc(). In this case, the static
initialization method won't do, and the routine pthread_mutex_init()
should be used:

As you can see, pthread_mutex_init accepts a pointer to an
already-allocated region of memory to initialize as a mutex. As a
second argument, it can also accept an optional pthread_mutexattr_t
pointer. This structure can be used to set various mutex attributes.
But usually these attributes are not needed, so specifying NULL is the norm.

Whenever you initialize a mutex using pthread_mutex_init(), it should
be destroyed using pthread_mutex_destroy(). pthread_mutex_destroy()
accepts a single pointer to a pthread_mutex_t and frees any resources
allocated to the mutex when it was created. Be sure to note that
pthread_mutex_destroy() does not free the memory used to store
the pthread_mutex_t. It's up to you to free() your own memory. Also remember that both pthread_mutex_init() and pthread_mutex_destroy() return zero
on success.

Using the calls: locking

pthread_mutex_lock(pthread_mutex_t *mutex)

pthread_mutex_lock() accepts a single pointer to a mutex to lock. If
the mutex already happens to be locked, the caller will go to sleep.
When the function returns, the caller will be woken up (obviously),
and the caller will also now hold the lock. This call either returns zero on
success or a non-zero error code on failure.

pthread_mutex_unlock(pthread_mutex_t *mutex)

pthread_mutex_unlock() complements pthread_mutex_lock() and unlocks a
mutex that the thread has already locked. You should always unlock a
mutex that you've locked as soon as safely possible (to increase
performance). And you should never unlock a mutex that you don't hold
a lock for (otherwise, the pthread_mutex_unlock() call will fail
with a non-zero EPERM return value).

pthread_mutex_trylock(pthread_mutex_t *mutex)

This call is handy when you want to lock a
mutex while your thread is doing something else (because the mutex is
currently locked). When you call pthread_mutex_trylock() you'll
attempt to lock the mutex. If the mutex is currently unlocked
you'll get the lock, and this function will return zero. However, if
the mutex is locked this call won't block. Rather, it will return a
non-zero EBUSY error value. Then you can go about your business and
try to lock at a later time.

Waiting on conditions

Mutexes are necessary tools for threaded programs, but they can't do
everything. What happens, for instance, if your thread is waiting for
a certain condition to appear in shared data? Your code could
repeatedly lock and unlock the mutex, checking for any changes to the
value. At the same time it will also quickly unlock the mutex so that others can make
any necessary changes. But this is a horrible approach because this thread
will need to busy-loop to detect a change in a reasonable time frame.

You could put the calling thread to sleep for a little bit, say three
seconds between each check, but then your threaded code
wouldn't be optimally responsive. What you really need is a way to put a
thread to sleep while it waits for some condition to be met. Once the condition is met you
need a method to wake up the thread(s) that are waiting for that
particular condition to become true. If you can do this, your threaded code will be
really efficient and it won't tie up valuable mutex locks. That's precisely
what POSIX condition variables can do for you!

And POSIX condition variables are the subject of my next
article, where I'll show you exactly how to use them. Then you'll have all
the resources to create sophisticated threaded programs that model
work crews, assembly lines, and more. I'm going to pick up the pace
in the next article now that you're getting more familiar with
threads. I'm hoping this will allow me to squeeze in a reasonably
sophisticated threaded program at the end of the next article. And
speaking of waiting on conditions, I'll see you then!