Recursive Locks Will Kill You!

by Ed on October 14, 2008

A while back, at Adobe, I ran into a most evil bug that took me hours to track down. In the end, it was the end results of people violating what (to me) are the three tenets of threaded programming. This post is what I ended up establishing as the thread safety guidelines for our project.

Before digging into it, I should say this: threads are evil. If you don’t believe this, then I believe you are made of living tissue over metal endoskeleton. You should endeavor to avoid threads at all cost. In most cases, timers are your best friend. Know them. Love them.

Of course, there are times when you need to use threads. I needed to add a few threaded operations to my iPhone application to make sure the UI stayed responsive. On the project at Adobe, we were doing thumbnailing of images, definitely something you don’t want happening on the main thread.

1. Avoid recursive locks

A recursive lock is a lock which allows the same thread that locked it to lock it again. The advantage is that you don’t need to worry about what’s locked and what’s not on any particular thread. They also make it easy to retrofit locks into existing code.

The problem though is they most often end up being used to lock large pieces of code for indeterminate amounts of time. This makes things very coarse-grained. They also lead to issues where if you didn’t unlock the exact right amount of times, you will leave a thread holding the lock forever. It’s very easy for this to happen. And these problems are nearly impossible to find because the lock that was taken is usually very far away from where the hang/deadlock happens later.

Better to use a real honest-to-goodness lock. This means that if a thread tries to lock a lock that’s already, well, locked, you’ll deadlock. This is good. I like this, just as I like crashes. I know, you think I’m crazy, but if you deadlock on your own thread, then you know you just did something bad. Most likely, you’ve called out of a function when locked to another function that ends up needing the same resource.

Most people, when confronted with this, switch to recursive locks. Problem solved, right? Wrong. You’ve just opened yourself up to a world of hurt, mostly due to what I said above. So what do you do? Well, what I typically do is split the function into two. Consider:

That’s it. So basically, put the locks on the outermost (public) layer and do whatever you need to while locked on the internal layer. Yes, it’s more typing, but it’s really clarifies exactly what is going on.

The other thing about non-recursive locks like pthread_mutex is that is also forces the next two points on you, which you probably would want to do anyway.

2. Keep it short and sweet

You should hold your locks for as short a period of time as is necessary or practical. Using a non-recursive lock helps to enforce this rule. This is convenient since contention will be reduced. On the other hand, don’t go insane and lock every little thing. Make your locks coarse-grained enough to avoid many trips to the kernel via locking, but fine-grained enough to know that contention is low between threads so they aren’t running choppily.

3. Call out when locked, go to jail

Never, ever lock a mutex and then call out to some other function (unless it’s some other utility function on the same object that expects the resource to be locked). This is a deadlock waiting to happen, as that callout might end up calling something that also wants that lock. Now you’re deadlocked. Use of recursive locks over large bodies of code almost always makes this a certainty to occur.

(As a side note, it is fine to use a recursive lock if you follow 2 & 3 religiously and can guarantee correctness, as then the locality of the locks should make things obvious where problem are when there is a problem. But I still prefer non-recursive. It forces you to get it right, and enough can already go wrong with threaded programming.)

My original document also had notes about other topics, but they were more specific to our project. I think the above has universal relevance, however. As I originally mentioned, our project was violation all three rules, but I’m pretty sure the root cause was #1. All other evil tends to stem from there.

I think if you follow these three rules, you’ll take a lot of the guesswork out about what’s locked when, and you’ll find problems immediately (it will just deadlock, which is typically an easy fix). This should allow you to focus on the more important issues and sidestep the nastiness that might otherwise befall you.

Hi man,
sorry to say this, but I would say you are talking full BS! You seem not have understood what exactly locking is. Your both examples are wrong, wrong wrong! Check it again!
And: recursive locks are very useful.