Using Hypothesis

Posted on February 2, 2015

One of the difficulties when writing unit tests is picking input data (or test cases) that expose all of the potential bugs in your code. Often, you write test cases to catch the bugs you know exist (or thought about guarding against), but miss the inputs that would trigger the bugs still lurking in your code. An alternative to this is property-based testing, where rather than choosing inputs yourself, you let the computer choose inputs for you.

Property-based testing is an alternative approach to unit testing where rather than describing specific input values and the results of executing operations on those values, you instead write a test that should be true of all input data of a certain form, and then let the test framework feed in data until it either finds an example that fails, or runs out of attempts to do so. The grand-daddy of all property-testing libraries is QuickCheck, but re-implementations exist in many languages.

This testing technique is most directly applicable to pure functions (where the output of the function depends only on the input), but can be used to generate test data for many other types of tests as well.

For instance, in the edX LMS, we have several functions to encode and decode strings with / in them.
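The original code samples aren't reproduced here, but the shape of the problem can be sketched with a pair of hypothetical helpers (illustrative stand-ins, not the actual edX functions) and a Hypothesis round-trip property:

```python
from hypothesis import given
from hypothesis import strategies as st


# Hypothetical stand-ins for the edX encode/decode helpers -- not the
# real implementations. encode_slashes makes a name safe to embed in a
# path; decode_slashes is meant to invert it.
def encode_slashes(name):
    return name.replace('/', '_')


def decode_slashes(encoded):
    # BUG: underscores that were already in the original string are
    # also turned into slashes, so the round trip is not an inverse.
    return encoded.replace('_', '/')


# The property: decoding an encoded string returns the original.
# (Modern Hypothesis spells the input as st.text(); the 2015-era API
# passed types directly, as in @given(str).)
@given(st.text())
def test_decode_inverts_encode(name):
    assert decode_slashes(encode_slashes(name)) == name
```

Whether a given run of this test fails depends on whether the generator happens to produce a string containing an underscore, which is why the failure can be intermittent.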

But run the same test again, and it passes! That highlights one of the biggest issues with property-based testing: it relies on generating enough input data to catch the bug. As the space of inputs grows, so does the time it takes to explore it.

With Hypothesis, we can combat this by increasing the number of examples.

Now the test consistently detects our bug (the number of examples and test timeout can both be tuned to limit the amount of time spent in the test suite).

An advantage of this randomized testing over our fixed list of strings is that it will exercise characters we might not have thought to include in our attempt at an exhaustive list. If we change the set of characters used to encode the /, our tests won’t need to change. However, the test is still limited by what characters the data generator can produce. If we switch @given(str) to @given(unicode), the test no longer identifies the bug, because Hypothesis uses a data generator for unicode that includes only numbers and ASCII characters (and no symbols such as /). This seems like a questionable choice to me, but was perhaps made to limit the search space to “text-like” strings. There is always a tradeoff between breadth and depth in property-based testing, because there is only finite time to generate new test data. By limiting the number of characters used to generate strings, we can more completely explore the space of a given string length.

One might also consider injecting generated strings into a list of ddt items.

This would give some of the advantages of property-based testing. However, one facility it wouldn’t provide is test-case shrinking. Hypothesis, like QuickCheck before it, will attempt to reduce a failing test case to the smallest possible counterexample for the property. This is important, especially when your test generation code can initially produce very, very large input data.
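Shrinking can be seen directly with `hypothesis.find`, which searches for an example satisfying a predicate and then minimizes it with the same machinery applied to failing tests (the slash helpers are again hypothetical; the alphabet is restricted so the search reliably succeeds):

```python
from hypothesis import find
from hypothesis import strategies as st


def encode_slashes(name):
    return name.replace('/', '_')


def decode_slashes(encoded):
    # BUG: also rewrites underscores that were in the original input.
    return encoded.replace('_', '/')


def round_trip_fails(s):
    return decode_slashes(encode_slashes(s)) != s


# find() locates an input satisfying the predicate, then shrinks it.
# Under these helpers a string fails the round trip exactly when it
# contains '_', so the shrunk result is a short string of underscores.
smallest = find(st.text(alphabet='ab/_'), round_trip_fails)
```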

I think that Hypothesis may have a place in the edX testing ecosystem. The methods covered in this post would benefit, and there are likely other properties that we could test as well, especially with a little investment in data generation. For instance, we could generate random courses with the installed XBlocks, and validate that import/export are inverses. We might also be able to test stateful code using Hypothesis’ stateful testing mechanism (which I hope to explore in a future post).

My name is Calen Pennington. I'm a software developer and lead architect at edX,
father to a two-year-old, part-time Haskell hacker, board/card/video-gamer. This blog will primarily
focus on the first of those.