Introduction

The code is a simple implementation of a readers/writers lock that supports reentrance and lock escalation: a thread holding a read lock can request and be granted write access, provided that no other thread is holding the read lock, and a thread holding a write lock is also granted read access.

Background

The Windows synchronization primitives do not include support for readers/writers locking. Sometimes it is useful to allow read access to multiple threads at once, so that a thread that only needs to read the data does not have to block just because other threads are reading it too. The risk of data corruption arises only when the data is altered. Write access must be exclusive (with respect to other writers and to any reader), but read access can be shared between readers. Allowing multiple reader threads to share the lock allows for greater concurrency and reduces the risk of deadlocks. The existing implementations I could find didn't support reentrancy, which was key to avoiding deadlocks in the application I was working on.

Using the code

The code is straightforward to use: either call ClaimReader/ClaimWriter and later ReleaseReader/ReleaseWriter, or use the auto-lock classes AutoLockReader and AutoLockWriter:
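A minimal usage sketch follows. The lock class name (ReaderWriterLock here) and the constructor signatures of the auto-lock classes are assumptions based on the method names above, not copied from the download:

#include "ReaderWriterLock.h"

ReaderWriterLock g_lock;           // shared lock instance; class name assumed

void ReadSomething()
{
    g_lock.ClaimReader();          // shared access with other readers
    // ... read the protected data ...
    g_lock.ReleaseReader();
}

void WriteSomething()
{
    g_lock.ClaimWriter();          // exclusive access
    // ... modify the protected data ...
    g_lock.ReleaseWriter();
}

void ReadWithAutoLock()
{
    AutoLockReader reader(g_lock); // assumed to claim in the constructor
    // ... read the protected data ...
}                                  // released automatically in the destructor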

Points of interest

When a thread requests a write lock after having obtained a read lock, the reader lock is released and the thread waits until there are no readers before the write lock is granted. This is needed to avoid the deadlock that would occur if two threads held the read lock and simultaneously requested a write lock. The release and reacquire on escalation can, however, lead to data corruption: information the thread gathered while holding the reader lock may no longer be consistent with the state of the protected data once the write lock is granted, because other writer threads may have acquired the lock while the thread was upgrading from read to write. To avoid this, the escalating thread must be aware that, after requesting the write lock, it has to re-read whatever information about the protected data it needs for its write operation. If this behavior is not desired, ClaimWriterNoEscalating() can be used, in which case the lock throws an ImplicitEscalationException if it finds that the thread already holds the read lock.
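The sketch below illustrates both paths, reusing the (assumed) g_lock object from the earlier sketch; how the claim/release calls must be balanced after an implicit escalation is also an assumption:

extern ReaderWriterLock g_lock;    // same (assumed) lock object as above

void UpdateWithImplicitEscalation()
{
    g_lock.ClaimReader();
    // ... inspect the protected data ...
    g_lock.ClaimWriter();          // reader lock is released while waiting
    // Another writer may have run in the meantime, so re-read whatever
    // information the write operation depends on before modifying the data.
    g_lock.ReleaseWriter();
    g_lock.ReleaseReader();        // assumed: the reader claim is still balanced
}

void UpdateStrict()
{
    // Throws ImplicitEscalationException if this thread already holds the
    // read lock, instead of silently dropping and re-taking it.
    g_lock.ClaimWriterNoEscalating();
    // ... modify the protected data ...
    g_lock.ReleaseWriter();
}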

Performance

I've run a simple benchmark program (included in the source example code) and the result is shown in the following graph:

The fully reentrant lock is about 3 times slower than a simple Windows critical section, which is not too bad, and it is faster than the .NET ReaderWriterLock. The biggest performance gain comes from avoiding heavy-weight OS synchronization primitives and instead using an active wait plus Sleep(0).
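The active-wait idea can be sketched roughly as follows; this is a generic spin-then-yield loop to show the technique, not the article's actual code:

#include <windows.h>

// "Active wait plus Sleep(0)": spin on an interlocked flag and give up the
// rest of the time slice instead of blocking on a kernel object.
void SpinAcquire(volatile LONG* flag)
{
    // 0 = free, 1 = taken
    while (InterlockedCompareExchange(flag, 1, 0) != 0)
        Sleep(0);                  // yield to other ready threads
}

void SpinRelease(volatile LONG* flag)
{
    InterlockedExchange(flag, 0);
}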

History

1-10-06

First release.

1-12-06

Measured .NET ReaderWriterLock.

1-20-06

Use of spin lock and thread-local storage for an 8x speedup; timeouts; release of the reader lock when escalating, with an option to allow or disallow implicit escalation.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

I've posted an update to the article that adds a ClaimWriterNoEscalating() method that throws an exception if the thread already holds the read lock. ClaimWriterAllowEscalating() instead lets the "implicit" escalation happen and releases the read lock before waiting for the write lock. I'll try using the latter method in my application, as it should be fine provided that the thread that gets the write lock is aware that some other thread may have gotten hold of the lock while it was waiting for the escalation to take place.
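A rough sketch of how a caller might use the escalating variant; the g_lock object and the claim/release pairing are assumptions, and the revalidation after the escalation is the caller's responsibility:

extern ReaderWriterLock g_lock;            // assumed lock object

void EscalateAndRevalidate()
{
    g_lock.ClaimReader();
    // ... decide that a modification is needed ...
    g_lock.ClaimWriterAllowEscalating();   // read lock is released while waiting
    // Re-check the state: another writer may have changed the data before
    // the write lock was granted.
    // ... modify the data ...
    g_lock.ReleaseWriter();
    g_lock.ReleaseReader();                // pairing assumed, as above
}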

The problem isn't the optimization per se; it's the entire concept of switching a reader lock to a writer lock without first releasing the reader lock. It is theoretically impossible to do such a thing without causing deadlock. Let me illustrate with a code walkthrough:

1. Let's start with threads 1 and 2 both having called ClaimReader.
2. Next, both threads call ClaimWriter.
3. Both threads get their thread id.
4. One of the threads (the winner) wins the race to get the critical section (it doesn't matter since both are exactly the same).
5. The winner thread checks to see if it is the only reader. It isn't, the loser thread is also a reader.
6. The winner thread checks to see if it is already a writer. It isn't.
7. The winner thread checks to see if there are no other active readers or writers. There are. The loser thread is an active reader.
8. The winner thread unlocks the critical section (without having changed the state in any way!).
9. The winner thread waits for the spin event.
10. At this point it is likely that the loser thread acquires the critical section, although it doesn't matter.
11. The new winner goes back to step 5.

There is no way out of the loop. Not only are the threads deadlocked, but they are consuming some CPU going around the loop over and over, which to many developers would make it appear as if deadlock has NOT occurred.
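A minimal sketch of that scenario, against an implementation that tries to escalate without first releasing the read lock (the class and method names are the ones used in the article):

#include <windows.h>
#include <process.h>
#include "ReaderWriterLock.h"

ReaderWriterLock g_lock;           // class name assumed

// Both threads run the same code: claim the read lock, then try to escalate
// to a write lock while the other thread is still a reader.
unsigned __stdcall EscalatingThread(void*)
{
    g_lock.ClaimReader();
    g_lock.ClaimWriter();          // each thread waits for the other reader: deadlock
    g_lock.ReleaseWriter();
    g_lock.ReleaseReader();
    return 0;
}

int main()
{
    HANDLE h[2];
    h[0] = (HANDLE)_beginthreadex(NULL, 0, EscalatingThread, NULL, 0, NULL);
    h[1] = (HANDLE)_beginthreadex(NULL, 0, EscalatingThread, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);   // never returns
    return 0;
}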

Implementation issues aside, there is no general solution that works in all cases, and there never will be. The deadlock is avoidable, but the irreconcilable contention is not. When I implemented a read/write lock some time ago, I introduced a third type of lock, an upgradable (I called it "escalatable") read lock. The idea is that the following types of concurrency are supported: any number of plain readers may share the lock, a single upgradable reader may coexist with the plain readers, and a writer excludes everyone else.

This is incidentally exactly the way boost::shared_mutex works. The advantage is that expensive read/modify/write operations need not block plain readers until they are actually performing the write. At the same time, it is guaranteed that an upgradable read lock can be converted to a write lock atomically (barring starvation), so there is no risk that the expensive operation suddenly fails halfway through because of contention.
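For comparison, the Boost.Thread equivalent looks roughly like this; boost::upgrade_lock and boost::upgrade_to_unique_lock are the standard Boost.Thread types, while the protected map is only an illustration:

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>
#include <map>
#include <string>

boost::shared_mutex g_mutex;
std::map<std::string, int> g_data;

void ExpensiveReadModifyWrite(const std::string& key)
{
    // Upgradable ownership: coexists with plain readers, but only one thread
    // may hold it at a time, so the later upgrade cannot deadlock.
    boost::upgrade_lock<boost::shared_mutex> readLock(g_mutex);

    std::map<std::string, int>::const_iterator it = g_data.find(key);
    int newValue = (it != g_data.end()) ? it->second + 1 : 1;

    // Atomically convert to exclusive ownership; plain readers are blocked
    // only while the actual write is performed.
    boost::upgrade_to_unique_lock<boost::shared_mutex> writeLock(readLock);
    g_data[key] = newValue;
}                                  // exclusive and upgradable ownership released here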

My implementation additionally offered upgrade operations for plain read locks, but those could fail. Instead of deadlocking, however, the code would detect this condition and throw an exception in the losing (i.e. later) thread. There are some corner cases where the best approach isn't obvious. For example, if a reader (thread A) tries to upgrade its lock, but there is already another thread B with a guaranteed upgradable lock, the function could fail immediately. But maybe thread B is actually going to release the lock without upgrading; in that case, it would have been better to just block A until B releases its lock. The optimum strategy depends on how probable it is for thread B to actually upgrade.

That works if you know you will never have more than a few reader/writer locks, but you can't count on an arbitrary number of TLS slots being available--there is a limit, it isn't very large, and if you are writing this in a DLL, you have no way of knowing how many TLS slots other DLLs in the process have used until you run out.

Alright, sure, it's over 1K on Windows NT systems, but if you think you need more there are always other ways...

It's certainly possible to have a single TLS value and simply allocate an array per thread containing the maximum number of r/w locks you'll need.... (manage them in any number of ways)... heck, make the array a few million big if you want and just use the vm functions and don't commit the memory.
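A rough Win32 sketch of that idea, reserving a large per-thread array up front and committing pages only as lock indices are actually touched; the sizes and helper names are made up for illustration, and error handling is omitted:

#include <windows.h>

static DWORD g_tlsIndex = TLS_OUT_OF_INDEXES;  // one slot for the whole module

bool InitLockTls()
{
    g_tlsIndex = TlsAlloc();
    return g_tlsIndex != TLS_OUT_OF_INDEXES;
}

// Returns a pointer to this thread's per-lock state, indexed by lock id.
// The array is only reserved, not committed, so unused entries cost no RAM.
LONG* GetThreadLockState(unsigned lockId)
{
    const SIZE_T maxLocks = 1 << 20;           // illustrative upper bound
    LONG* base = (LONG*)TlsGetValue(g_tlsIndex);
    if (base == NULL)
    {
        base = (LONG*)VirtualAlloc(NULL, maxLocks * sizeof(LONG),
                                   MEM_RESERVE, PAGE_READWRITE);
        TlsSetValue(g_tlsIndex, base);
    }
    // Commit (idempotently) the page that holds this lock's counter.
    VirtualAlloc(base + lockId, sizeof(LONG), MEM_COMMIT, PAGE_READWRITE);
    return base + lockId;
}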

The speed increase from switching to TLS data to track the thread's read state, rather than keeping track with a multiset that makes calls to new/malloc and its own calls to the TLS functions, is probably deserving of another rev, even if it's just a special case for people needing only a small number of r/w locks.

Yes. I believe a proper implementation could rely on getting *one* TLS slot, because that would fail when the DLL is loaded instead of at some random point later. I wouldn't advise making the arrays "a few million big" because you would very quickly run out of virtual address space. Physical memory would be fine, but there IS a limit of 2GB of virtual address space and many systems in use today are already pushing up against that boundary. If you have 50 threads using R/W locks (this is reasonable in many systems--you're using R/W locks so you must have more than 2 threads!), and used 2 million 4-byte values for each thread to track R/W lock access, that's 400MB of virtual address space which is a very significant chunk. A proper implementation would definitely have to be more complicated. If you're not worried about an implementation for general use, this type of simplified system is fine (except for the deadlock problem mentioned in my other thread).

The results in the graph are just the execution times of the demo project (click on "Download demo project" at the top of the page). To test the different locks, change the typedef at the bottom of ReaderWriterLock.h.
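For example, the typedef might be switched between the lock classes like this; the exact class names used in the download are assumptions:

// At the bottom of ReaderWriterLock.h: pick which lock the benchmark uses.
// typedef CriticalSectionLock LockToTest;    // plain Win32 critical section
typedef ReaderWriterLock LockToTest;          // the reentrant r/w lock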