Rants of an ADO.NET Dev

Monthly Archives: February 2012

Over the past few weeks I’ve been covering various ways to fix a race condition bug. Answer 0 was a very basic solution where we were using a lock to serialize access, and with Answer 1 we significantly improved the performance without sacrificing safety by using the “Copy, Check, Continue” pattern.

However, what you may have noticed from the code is I had a variable called _disposed that I was using it to indicate that we have nulled out _foo – but what if _foo was IDisposable? And we had to make sure that we don’t call any method on it after we disposed it?

But where is the fun in that?

The most obvious thing that we could do is to simply use Answer 0 – by putting locks around the usage of _foo we can be certain that _disposed accurately reflects if _foo has been disposed or not, but again this is a trivial solution that has a performance cost.

An alternative is to implement Answer 1 and then to catch any ObjectDisposedExceptions that are thrown. This allows us to be “lazy” with our thread safety, and so get back some performance. The problem with this solution is that we can’t be sure that using _foo will throw an ObjectDisposedException: it is entirely possible that _foo will null out one of its fields and we would end up with another race condition similar to the one we were originally trying to solve; or it may just throw the wrong exception type (for example, using a closed\disposed SqlConnection will result in an InvalidOperationException).

Teamwork is key

What makes Answer 0 safer compared to Answer 1 is that both threads are aware of what each other are doing, and they are coordinating their actions. If we were simply concerned that access to _foo would be serialized under a lock, then we could switch to a ReaderWriterLockSlim, and have Thread1 obtain a ReadLock and Thread2 obtain the WriteLock. However, if there are few threads trying to access _foo at once, then you’ll probably find that an uncontested lock provides better performance than the ReaderWriterLockSlim.

An alternative to this is to implement similar mechanics to a ReaderWriterLockSlim but using faster primitives without any locks. Firstly, there are two things our ReaderWriterLockSlim alternative needs to keep track of: the number of readers in the reader lock, and if the writer lock is in used – this suggests that we will need an integer for the readers, but a bool for the writer. Secondly, if the writer lock is held, then no readers may enter the lock; but the writer can not do its work until all readers have finished reading. So, putting this together, we already have our writer lock bool (_disposed), and we can introduce the int for the readers (_activeReaders). Readers must then Increment _activeReaders when they start, and decrement it when they end – but must not do any work if _disposed is true. Similarly, the writer set _disposed to true when it starts and then waits for the readers to finish (i.e. _activeReaders becomes 0).

There are a couple things that you should note about this solution:

We expect there to only ever be one thread “writing” at a time (otherwise you need to synchronize access to _disposed – or switch it for an integer, increment it and then only disposed if it equals 1)

Once we have “written” no other locks can be taken again

I’ve used Interlocked.Increment\Decrement to manipulate _activeReaders (this is because incrementing\decrementing is not guaranteed to be atomic)

I’ve had to disable warningCS0420: “a reference to a volatile field will not be treated as volatile”, but it is safe to do so because the Interlocked APIs are “volatile aware”.

If you didn’t understand the title, then this post isn’t for you.If you think you understood, and you think that this may help you with your debugging – then turn back now, you’ve gone completely the wrong way.

I’ve put this disclaimer here since, unless you are interested in how the JIT works (in which case, skip this and read on!), the only reason you are getting the assembly from the JIT is because you believe that the JIT compiled something incorrectly (or, at least, that was why I was investigating this – although it turned out that we were hitting an issue due to Out of Order Execution which lead to a race condition which had a window of a couple of CPU instructions – which just goes to show how awesome our stress test is).

As a note (if you want to try this at home) this uses the SpotTheDefect[0] code, which has been modified to initialize _foo to null (ensuring that the application crashed) and I’m trying to get the IL and disassembly for the Thread1 method

Then we can find the method table for the class that contains the method we want. In my case, I’m looking for the Program class which is under the SpotTheDefect1 namespace in the SpotTheDefect1 assembly:

You will notice that the methods are marked wither “JIT”, “PreJIT” or “NONE” – this indicates if the JIT has compiled the method (“JIT”), if NGEN compiled the method ahead of time (“PreJIT”) or if the method is yet to be compiled (“NONE”). I’m interested in Thread1, which has been compiled by the JIT, so now I can dump out the method descriptor (I’ll also show later on what happens if you try to get a method that is yet to be compiled).

A couple of things to note: Firstly, the IL tends to match up with the actual code quite accurately (unless you are using compiler magic like yield return or await/async), and I’m using !U to dump the disassembly which (if you have the symbols) will show you what line of code generated which assembly commands (although the JIT may rearrange sections of code (especially for try/catch/finally statements) – so don’t expect the assembly to always work out as nicely as it did above). You may have also noticed that the assembly is quite verbose, contains debugging information (“CORINFO_HELP_DBG_IS_JUST_MY_CODE”) and some unnecessary NOPs (e.g. line 28 of my code was turned into a single NOP) – this is because I was using a debug build of SpotTheDefect1.exe, if I was using the release build, then a lot of these instructions would be optimized away (or not generated in the first place).

Finally, if we were interested in the assembly for Thread2, we could try to dump of the method descriptor

A couple of weeks ago I asked you to play a little game of “Spot the Defect”, and recently I posted the first of three posts with the answers, which focused on the output of the program, what the bug was and a possible way to fix it. This first fix focused on using locks to serialize access and, while locks in .Net are very fast, they are not free and prevent your application from taking full advantage of multi-core hardware. So now we are going to explore a lock free, safe alternative – but first we need to take a step back and remember how objects work in .Net (and most other Object Orientated runtimes)

Can you give me some pointers?

Usually when we code in an object orientated language and we “new up” an object we have a mental model that says that our variable contains the actual object. Similarly, if we then set that object to null, then we believe that we are removing the object from existence – but this is not the case. The variable we assigned the object to does not contain the actual object, but rather it holds a pointer to the actual object in the heap, and so setting that variable to null merely means that we are clearing the pointer, but the object will still reside in the heap until the Garbage Collector (GC) comes along to remove it. Additionally, all we need to do to ward off the GC from destroying an object is to maintain a reference (i.e. a pointer) to it.

Copy, check, continue

(There is probably already a name for this “pattern”, but I couldn’t really find it – so this will have to do. If you have a better name, or find the real name, let me know.)

So, to avoid taking locks, there are a few things that we need to do: firstly, we need to create out own reference to the object that _foo is pointing to such that we don’t have to rely on using _foo (which could become null at any time) and to prevent the GC from eating the object. The easiest way to do this is to make a copy of _foo (remember, that is copying the pointer, not the object). Which leads to my second point: once we’ve made the copy of _foo, we need to check that our copy isn’t null (because _foo has already been set to null). Finally, if our copy isn’t null, then we can continue to do whatever we planned to do.

A couple of notes about the code (you may want to come back to these after reading the code)

The performance overhead to Thread1 is minimal (we allocate a pointer on the stack, copy a pointer to it and then check if it is 0 – in reality the compiler will probably optimize fooLocal away and just use a register, further reducing the overhead)

If you were really concerned about performance, you could now drop the _disposed variable as well (since it’s not really need it)

Scheduling Conflict

Most of the time, the application would have kicked off both threads, and these threads would run in their entirety without ever being preempted. Additionally, since we started Thread1 before Thread2, in all likelihood Windows would have scheduled and ran the threads in that order. However, it is entirely possible that Windows could decide to preempt either thread at any stage in our code, or execute them in any order. So, in order to see “Disposing” before “Hello, World!” we would need Thread2 to go first and then be preempted just after it wrote to the console to allow Thread1 to run in its entirety. The NullReferenceException is the opposite: If Thread1 is preempted just after it checks _disposed and enters its if-statement, then this allows Thread2 to “race in” and set _foo to null, causing Thread1 to hit a NullReferenceException when it resumes.

This kind of bug is especially difficult to diagnose because of how “tight” the timing is. You would see the exception intermittently at best (you can try this by wrapping it in a for-loop), and it would occur even less frequently if there was a debugger attached or if you added more diagnostics\tracing (simply because there becomes more “safe” lines of code to be preempted at, so the timing window becomes even tighter). And even if, somehow, you did manage to get a crash dump with the exception, you would need to dig through all of your code to try to find how you ended up crashing (which, if you’re not thinking about how threads can affect each other, is about as effective as bashing your head against a wall).

When in doubt, lock

At the beginning of this post, I described this as the “trivial” solution, but that doesn’t mean that you should discredit it – although locks seem like the most heavy handed approach, they are usually the safest as well. For those of you unfamiliar with C#’s lock keyword (or the MonitorEnter\Exit that it wraps), this is a language construct that acts similar to a critical section or a mutex, except that you do not create a synchronization primitive to manage the state of the lock, but rather you lock on an object (and the Monitor maintains the lock state). Take a minute to think about that: you must lock on an object – not a primitive, not a struct, and not null. This complicates matters for our code, since the only object we have is _foo, and we can’t guarantee that it isn’t null. We also can’t lock on the ‘this’ object because we are in a static method (not that you should ever lock on ‘this’ or any other object that is publically visible, as this can lead to unexpected deadlocks if other code decides to lock on your object or its properties\return values). The common solution to this problem is to add a dedicated ‘lock object’ that is only ever used for locking and is guaranteed to not be null.

Many developers shy away from locks as they are viewed as performance issues and the leading cause of hangs. While a lock is not free, you’d be surprised just how fast a lock is, especially when there is no contention on it. Deadlocks, however, always remain a problem – but, I can say from experience: it is much easier to reason about the ordering that locks are taken and released, than to consider and find the type of bug I’ve described above.

So, this “trivial” solution is to add a new “lock object”, and then to serialize access to _foo: