John Hughes is co-founder and CEO of Quviq AB, and the originator of
Quviq QuickCheck. From 2002 to 2005 he led a major research project in
software verification, funded by the Swedish Strategic Research
Foundation. This led to the development of Quviq QuickCheck in Erlang.
Before turning to Erlang, John was deeply involved in the design of
Haskell from the start, and co-chaired the committee that defined the
current language standard.

How I found five lurking race conditions in mnesia with 200 lines of QuickCheck code

Race conditions are among the worst kinds of problem to debug: they tend to appear only rarely (and unrepeatably), often arise only in long-running cases in production, and leave little evidence of what went wrong. Erlang is not immune to race conditions, despite its excellent support for concurrency, and they can cause rare, intermittent failures in OTP libraries such as mnesia, the OTP database. Mnesia is known to fail "once every month or two" in production, and race conditions are one likely cause.

One reason race conditions are not found earlier is that it is hard to write unit tests that might reveal them, so they are usually not found until integration testing at the earliest. Moreover, the unit tests needed are necessarily concurrent, and can be intricate and non-obvious. Recently we extended QuickCheck with features to generate concurrent unit tests that provoke race conditions, and to shrink failing tests to minimal examples. With fewer than 200 lines of QuickCheck code, I was able to generate concurrent tests of dets (the disk storage layer of mnesia) and provoke five separate race conditions in short order. I'll explain the technique, which is easy to apply to any software specified by a QuickCheck state machine.