1 Haskell as an ultimate "smoke testing" tool

OR

Using QuickCheck as DIY test data generator

1.1 Preface

Recently, my wife approached me with the following problem: her team
had to test their re-implementation (in Java) of part of a huge
software system previously written in C++. The original system is poorly
documented, and only a small part of the sources was available.

Among other things, they had to write a parser for a home-brewed DSL
designed to describe data structures. The DSL is a mix of ASN.1 and BNF
grammars; it describes the structure of some data records and the simple
business rules relevant to processing those records. The DSL is not
Turing-complete, but it allows users to define their own functions and
to specify math and boolean expressions on fields, and it was designed
as "ASN.1 on steroids".

The problem was that their implementation (in JavaCC) of this DSL parser
was based on the single available description of the DSL grammar,
which was presumably incomplete. They tested the implementation on the
few examples available, but the question remained how to test the parser
on a large set of data in order to be fairly sure that "everything
works".

1.2 The fame of QuickCheck

My wife observed me during the last (2005) ICFP contest and was amazed
at the ease with which our team tested our protocol parser and
printer using QuickCheck. So she asked me whether it would be possible
to generate pseudo-random test data in a similar manner for use
"outside" of Haskell.

"Why not?" I thought. After all, I found it quite easy to generate
instances of 'Arbitrary' for quite complex data structures.

1.3 Concept of the Variant

The task was formulated as follows:

The task is to generate test datasets for the external program. Each dataset consists of several files, each containing one "record".

A "record" is essentially a Haskell data type.

We must be able to generate pseudo-random "valid" and "invalid" data, to test that the external program consumes all "valid" samples and fails to consume all "invalid" ones. Any deviation from this behavior signifies an error in the external program.

Let's capture this notion of "valid" and "invalid" data in a type
class:
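As a minimal sketch, such a class could carry one generator per kind of data. The method names `valid` and `invalid`, the `Number` representation and its accepted range (0..999) are assumptions made for illustration; they are not taken from the original module.

```haskell
import Test.QuickCheck (Gen, choose, generate, oneof, vectorOf)

-- Hypothetical class: one generator for well-formed values,
-- one for ill-formed values of the same type.
class Variant a where
  valid   :: Gen a
  invalid :: Gen a

-- Hypothetical basic type: assume the DSL accepts numbers in 0..999 only.
newtype Number = Number Int deriving (Show, Eq)

instance Variant Number where
  valid   = Number <$> choose (0, 999)
  -- Either below or above the assumed accepted range.
  invalid = Number <$> oneof [choose (-1000, -1), choose (1000, 100000)]

-- Oracle for this sketch: does a Number fall inside the assumed range?
isValidNumber :: Number -> Bool
isValidNumber (Number n) = n >= 0 && n <= 999

main :: IO ()
main = do
  goods <- generate (vectorOf 5 (valid :: Gen Number))
  bads  <- generate (vectorOf 5 (invalid :: Gen Number))
  putStrLn ("valid:   " ++ show goods)
  putStrLn ("invalid: " ++ show bads)
```

Keeping both generators in one class means every type that can appear in a record knows how to produce both kinds of sample.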

The careful reader will have already spotted that once we have hand-coded the instances of 'Variant' for a few "basic" types (like 'Name', 'Number', 'OutputType' etc.), defining instances of 'Variant' for more complex datatypes becomes easy, though quite tedious. A set of simple helpers comes to the rescue.

1.4 Helper tools

As you can see, we consider an instance of a data type to be "invalid" if at least one of the arguments to its constructor is "invalid", whereas a "valid" instance must have all of its constructor arguments "valid". This calls for some permutations:
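One way to sketch these permutations for a two-argument constructor is shown below. The name `proper2` follows the naming used later in this text; `bad2`, the method names `valid` and `invalid`, the `Field` record and the sample values are assumptions made for illustration.

```haskell
import Test.QuickCheck (Gen, choose, elements, generate, oneof, vectorOf)

-- Hypothetical class, as assumed throughout this sketch.
class Variant a where
  valid   :: Gen a
  invalid :: Gen a

-- "Valid" result: every constructor argument is valid.
proper2 :: (Variant a, Variant b) => (a -> b -> c) -> Gen c
proper2 f = f <$> valid <*> valid

-- "Invalid" result: every combination with at least one invalid argument.
bad2 :: (Variant a, Variant b) => (a -> b -> c) -> Gen c
bad2 f = oneof
  [ f <$> invalid <*> valid
  , f <$> valid   <*> invalid
  , f <$> invalid <*> invalid
  ]

-- Hypothetical basic types and a compound record built from them.
newtype Name   = Name String deriving (Show, Eq)
newtype Number = Number Int  deriving (Show, Eq)
data Field     = Field Name Number deriving (Show, Eq)

instance Variant Name where
  valid   = Name <$> elements ["id", "amount", "owner"]
  invalid = Name <$> elements ["", "9lives", "two words"]

instance Variant Number where
  valid   = Number <$> choose (0, 999)
  invalid = Number <$> choose (-1000, -1)

-- The compound instance needs no hand-written generator logic.
instance Variant Field where
  valid   = proper2 Field
  invalid = bad2 Field

-- Oracle for this sketch: both components must be well-formed.
isValidField :: Field -> Bool
isValidField (Field (Name s) (Number n)) =
  s `elem` ["id", "amount", "owner"] && n >= 0 && n <= 999

main :: IO ()
main = do
  goods <- generate (vectorOf 3 (valid :: Gen Field))
  bads  <- generate (vectorOf 3 (invalid :: Gen Field))
  mapM_ print goods
  mapM_ print bads
```

The same pattern extends to constructors of higher arity, at the cost of one helper pair per arity and a number of invalid combinations that doubles with each extra argument.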

You see that we could control size, nature and destination of each test dataset. This approach was taken to produce test datasets for the task I described earlier. The final Haskell module had definitions for 40 Haskell datatypes, and the topmost datatype had a single constructor with 9 fields.
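For illustration, a driver with these three knobs (how many samples, which generator, which directory) might look like the sketch below. The function `writeDataset`, the file naming scheme and the output directories are hypothetical, and plain `Int` generators stand in for the `valid` and `invalid` generators of a real record type.

```haskell
import Control.Monad (zipWithM_)
import System.Directory (createDirectoryIfMissing)
import System.FilePath ((</>))
import Test.QuickCheck (Gen, choose, generate, vectorOf)

-- Hypothetical driver: draw n samples from gen, render each one to text
-- and write it to its own file under dir (one record per file).
writeDataset :: (a -> String) -> FilePath -> Int -> Gen a -> IO [FilePath]
writeDataset render dir n gen = do
  createDirectoryIfMissing True dir
  samples <- generate (vectorOf n gen)
  let paths = [ dir </> ("record" ++ show i ++ ".txt") | i <- [1 .. n] ]
  zipWithM_ writeFile paths (map render samples)
  return paths

main :: IO ()
main = do
  -- Stand-ins for "valid" and "invalid" generators of a real record type.
  good <- writeDataset show ("out" </> "valid")   5 (choose (0, 999 :: Int))
  bad  <- writeDataset show ("out" </> "invalid") 5 (choose (-1000, -1 :: Int))
  mapM_ putStrLn (good ++ bad)
```

Keeping valid and invalid samples in separate directories lets the external program's test harness know which outcome to expect for each file.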

This proved to be A Whole Lot Of Code(tm), and the "instance Variant ..." declarations made up a good 30% of the total. Since most of them were variations on the "oneof [proper Foo, proper2 Bar, proper4 Baz]" theme, I started looking for a way to simplify or automate the generation of such instances.

1.6 Deriving Variant instances automagically

I took a post made by Bulat Ziganshin on the TemplateHaskell mailing list, which showed how to derive instances of 'Show' automatically, and hacked it to derive instances of 'Variant' in much the same way: