Friday, November 1, 2013

Test stream programming using Haskell's `QuickCheck`

pipes is a stream programming library built on a foundation of basic category theory. The core of the library consists of five categories that all intersect in a single streaming data type, and the library's contract is the set of associated category laws.

For example, one such category is the "respond category", which pipes uses to implement for loops and ListT. The two key operations are (/>/), which is the category composition operator, and respond, which is the identity. These must satisfy the following category laws:
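Spelled out for the two operations just named, these are the usual identity and associativity laws of any category. They are written here as equations between Kleisli arrows f, g and h, not as executable code:

```
-- Left identity
respond />/ f = f

-- Right identity
f />/ respond = f

-- Associativity
(f />/ g) />/ h = f />/ (g />/ h)
```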

Previously, I described how I manually proved the category laws using equational reasoning, which elicited a strong response from readers that I should also verify the laws empirically using Haskell's QuickCheck library. After all, QuickCheck shines at automated property-based testing. Why not just fire up QuickCheck and test something like:

> quickCheck $ \f -> f />/ respond == f

However, this leads to several problems:

You can't compare pipes for Equality

QuickCheck can't Show pipes when it discovers a counter-example

You can't generate Arbitrary pipes

The last is the most challenging problem of the three to solve: how do we generate random pipes to test? This has to be done in such a way that it exercises the system and efficiently discovers corner cases.

Randomizing pipes

I decided to try encoding a random pipe as a random sequence of operations (specifically Kleisli arrows). This actually works out quite well, because we already have two natural operations built into the core bidirectional API: request and respond. Both of them allow information to pass through twice, as illustrated by the following ASCII diagrams:
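As a rough illustration of what "a sequence of Kleisli arrows" means here, the sketch below chains request and respond with (>=>) and then closes both ends so the chain can be run. It assumes the pipes package is installed. The names ProxyK, chain and runClosed, and the two handlers that add 1 and multiply by 10, are hypothetical and chosen only to make the two-way flow visible: each action sends one value out and gets one value back.

```haskell
import Control.Monad ((>=>))
import Data.Functor.Identity (Identity, runIdentity)
import Pipes.Core (Proxy, request, respond, runEffect, (/>/), (\>\))

-- A Kleisli arrow whose four interface types are all Int, so that
-- request and respond can be chained in any order (hypothetical alias).
type ProxyK = Int -> Proxy Int Int Int Int Identity Int

-- One fixed chain; a random chain is just a random list of such steps.
chain :: ProxyK
chain = respond >=> request >=> respond

-- Close both ends so the chain can be run:
--   every request n is answered with n + 1  (stand-in for upstream)
--   every respond n is answered with n * 10 (stand-in for downstream)
runClosed :: ProxyK -> Int -> Int
runClosed p n = runIdentity (runEffect ((upstream \>\ (p />/ downstream)) n))
  where
    upstream   x = return (x + 1)
    downstream x = return (x * 10)
```

With these assumed handlers, runClosed chain 1 evaluates to 110: respond sends 1 and gets 10 back, request sends 10 and gets 11 back, and the final respond sends 11 and gets 110 back.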

When we compose proxies using something like (>+>) (the bidirectional generalization of (>->)), we conceptually place these random chains side by side and match inputs with outputs. For example, if we generate the following two random sequences of request and respond:

This comparison is pure because the Writer [] base monad is pure, so we can pass it as a suitable property that QuickCheck can test. Well, almost...

We also need to be able to Show the randomized values that we selected so that QuickCheck can print any counter-examples it discovers. The solution, though, is pretty simple. We can use an intermediate representation that is just an enumeration. This just stores placeholders for each action in our chain, and these placeholders are Showable:
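A minimal sketch of this idea follows, assuming the pipes, transformers and QuickCheck packages are installed. Everything named here (Step, step, chain, observe, prop_assoc, checkAssoc) is hypothetical and not taken from the pipes test suite. The choice of Log and Inc as the two extra actions follows the log and inc actions this post refers to. Closing both ends with return, so that requests and responds are simply echoed back, is an assumption made to keep the example runnable.

```haskell
import Control.Monad ((>=>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer (Writer, runWriter, tell)
import Pipes.Core (Proxy, request, respond, runEffect, (/>/), (>+>), (\>\))
import Test.QuickCheck (Arbitrary (..), arbitraryBoundedEnum, quickCheck)

-- Showable placeholder for each action in a chain.
data Step = Request | Respond | Log | Inc
    deriving (Bounded, Enum, Eq, Show)

-- Pick each placeholder uniformly; lists of steps reuse QuickCheck's
-- list instance, which also shrinks by dropping elements.
instance Arbitrary Step where
    arbitrary = arbitraryBoundedEnum

type ProxyK = Int -> Proxy Int Int Int Int (Writer [Int]) Int

-- Interpret one placeholder as a Kleisli arrow.
step :: Step -> ProxyK
step Request = request
step Respond = respond
step Log     = \n -> lift (tell [n]) >> return n
step Inc     = \n -> return (n + 1)

-- Interpret a whole list of placeholders as one chained arrow.
chain :: [Step] -> ProxyK
chain = foldr (\s k -> step s >=> k) return

-- Close both ends by echoing values back, start from 0, and return
-- the final value together with everything that was logged.
observe :: ProxyK -> (Int, [Int])
observe p = runWriter (runEffect ((return \>\ (p />/ return)) 0))

-- Associativity of (>+>), compared through the pure observation.
prop_assoc :: [Step] -> [Step] -> [Step] -> Bool
prop_assoc s1 s2 s3 =
    observe ((p1 >+> p2) >+> p3) == observe (p1 >+> (p2 >+> p3))
  where
    p1 = chain s1
    p2 = chain s2
    p3 = chain s3

checkAssoc :: IO ()
checkAssoc = quickCheck prop_assoc
```

Because the property takes lists of Step rather than proxies, QuickCheck can print a failing case directly, for example as [Request,Log].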

QuickCheck then generates 100 random test cases and verifies that all of them obey the associativity law:

>>> main
+++ OK, passed 100 tests.

However, this is still not enough. Perhaps my randomization scheme is simply not exercising corner cases sufficiently well. To really convince myself that I have a good randomization scheme I must try a few negative controls to see how effectively QuickCheck uncovers property violations.

For example, let's suppose that some enterprising young NSA employee were to commit a modification to the identity pipe pull' that logs the second value flowing upstream. We could set up a test case to warn us if the modified pull' function failed to obey the identity law:

Not only does QuickCheck detect violations, but it also goes out of its way to shrink the violation to a minimal reproducing test case. In this case, the way you read the QuickCheck output is that the minimal code necessary to trigger the violation is when:

p1 = respond
p2 = request >=> request

In other words, QuickCheck detects a spurious log on the left-hand side of the equation if p2 requests two values and p1 responds with at least one value. Notice that if p1 did not respond with at least one value then the left pipeline would terminate before p2's second request and avoid triggering the log statement.
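To make this reading concrete, here is a hedged reconstruction of the counter-example, assuming the pipes and transformers packages are installed. The definition of pull' below is only a guess at what "log the second value flowing upstream" could look like. ProxyK and observe are hypothetical helpers that echo requests and responds at the outer ends and start from 0. Only p1 and p2 come from the counter-example above.

```haskell
import Control.Monad ((>=>))
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer (Writer, runWriter, tell)
import Pipes.Core (Proxy, pull, request, respond, runEffect, (/>/), (>+>), (\>\))

type ProxyK = Int -> Proxy Int Int Int Int (Writer [Int]) Int

-- Assumed tampered identity: forward one round trip faithfully, log
-- the second value that flows upstream, then behave like pull.
pull' :: ProxyK
pull' = request >=> respond >=> \a' -> lift (tell [a']) >> pull a'

-- Echo values at both outer ends and collect the log (hypothetical).
observe :: ProxyK -> (Int, [Int])
observe p = runWriter (runEffect ((return \>\ (p />/ return)) 0))

-- The minimal counter-example reported above.
p1, p2 :: ProxyK
p1 = respond
p2 = request >=> request

-- Left- and right-hand sides of the identity law being checked.
lhs, rhs :: (Int, [Int])
lhs = observe (p1 >+> pull' >+> p2)
rhs = observe (p1 >+> p2)
```

Under these assumptions lhs evaluates to (0,[0]) while rhs evaluates to (0,[]), so the two sides differ only in the spurious log entry produced after p2's second request.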

QuickCheck can do this kind of minimization because of purity. Since our test case is pure, QuickCheck can safely run it repeatedly as it tries to shrink the counter-example, without having to worry that repeated runs will interfere with each other because of side effects or statefulness.

Here's another example, where we accidentally wrote pull' wrong and inserted one request too many:

This instructs p1 and p2 to exchange information twice. This triggers pull' to accidentally request one value too many after the second exchange and terminate early before p2 can call inc.

Examples like these give me confidence that permutations on these four actions suffice to build most useful counter-examples. I could probably even narrow it down to three commands by eliminating log, but for now I will keep it.

Conclusion

I believe this is the first example of a useful Arbitrary instance for randomizing a stream programming data type. This allows pipes to test more powerful properties than most stream programming libraries, which would normally settle for randomizing input to the system instead of randomizing the flow of control.

I want to thank Csernik Flaviu Andrei who took the time to write up all the pipes laws into a QuickCheck suite integrated with cabal test. Thanks to him you can now have even greater confidence in the correctness of the pipes library.

The next step would be to machine-check the pipes laws using Agda, which would provide the greatest assurance of correctness.

So, technically `>+>` both sides and the top. I just didn't show it in the arrow diagrams of `Pipes.Core` because then the arrows got too cluttered. In the diagram for `>+>` composition there should be an arrow from the downstream pipe's first upstream argument to the upstream pipe's first input.