Pre-conj Interview: Steve Miner

Steve Miner’s talk at the conj is about automatically creating test.check generators from a data-driven schema.

Background

Steve Miner is the author of the Herbert schema library. In Herbert, schemas are presented as plain EDN. They describe the structure and type of data. You can take a value and validate that it conforms to the schema. He is going to present a way to generate test.check generators from the schema. This way, you can do runtime checks and test-time checks. A good intro to Herbert is Steve Miner’s lightning talk from Clojure/conj last year. Reid Draper gave a talk about test.check at Clojure/West.

Why it matters

test.check contains a set of combinators to create new generators from existing generators. But combinators are not very easy to read or write when the data they generate is fairly complex. Being able to derive them automatically from a declarative, data-driven schema is one way to make that easier.

Besides being easier, it also unifies the contracts (schema validation in function preconditions) with generative testing. The next domino would have to be building core.typed type signatures from the same data.

Interview

PurelyFunctional.tv: How did you get into Clojure?

Steve Miner: I think it was around 2008 when Paul Graham announced his new programming language, Arc. I was reading about Arc, when I came across a comment about Clojure and decided to take a look. I was very impressed with Rich Hickey’s intro video. The immutability and concurrency features really resonated with what I was hoping to find in a new language. The Java integration made Clojure a practical language with a huge ecosystem of tools and libraries. I was using Java at work and I managed to do a bit of Clojure for a side project, but mostly I was just dabbling. A couple of years later, I decided to work on my own with Clojure full-time.

PF.tv: Can you describe Herbert for those who have never used it? Why would someone be interested?

SM: Herbert is a schema language for edn. The goal is to have a convenient language for describing the shape of Clojure data. I started out with an informal notation that I used for my internal documentation, for example, describing a map with certain required keys and the corresponding value types. I think you can guess what {:name str :num int} means as a schema. It turned out that with a little work, that informal notation could be used as a pattern language with a simple API for testing conformance. A Herbert schema is itself just edn data, open to all your Clojure tools. More recently, I added the ability to generate test.check generators from Herbert schemas, which makes it easy to generate test data.
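
To make the shape {:name str :num int} concrete, here is a minimal sketch of what it expresses, written by hand with plain Clojure and test.check. It deliberately does not use Herbert's API: the names person? and person-gen are hypothetical, chosen only for illustration, and the sketch assumes test.check is on the classpath. The predicate stands in for a conformance check, and the generator is the kind of thing a schema-driven tool would aim to derive automatically.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Hypothetical hand-written predicate for the shape {:name str :num int}.
;; This is NOT Herbert's API, only an illustration of what the schema says.
(defn person? [m]
  (and (map? m)
       (string? (:name m))
       (integer? (:num m))))

;; Hand-written test.check generator that produces maps of that shape.
(def person-gen
  (gen/hash-map :name gen/string
                :num  gen/int))

;; Draw 20 sample maps and check each one against the predicate.
(every? person? (gen/sample person-gen 20))
```

Writing the predicate and the generator separately means keeping two descriptions of the same shape in sync, which is the duplication a single data-driven schema is meant to avoid.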

PF.tv: That sounds nice. Would you mind explaining test.check generators a bit for those who don’t know?

SM: Reid Draper ported QuickCheck from Haskell to Clojure, and it became a contrib library called test.check.

Test.check is a property-based testing tool. A property is basically an invariant that should hold true over a range of input values. The test.check library gives you combinators that allow you to define generator functions which create data of specified types with optional constraints. The idea is to think about the whole range of possible inputs that your system should handle. Test.check then automatically tests across a random sample of generated data, probably generating example data that you might not have considered in your typical unit testing. If it finds a failure case where the desired property does not hold, test.check is smart enough to regenerate test cases so as to shrink the failure example to a reasonable size. That helps you isolate the cause of the failure.
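
As a rough illustration of the ideas above, the following sketch defines one property that holds and one that is deliberately false, so that shrinking can be observed. It uses only test.check's own namespaces; the property names sort-idempotent and always-sorted are made up for this example.

```clojure
(require '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; A property that should hold: sorting twice is the same as sorting once.
(def sort-idempotent
  (prop/for-all [v (gen/vector gen/int)]
    (= (sort v) (sort (sort v)))))

;; Run the property against 100 randomly generated vectors.
(tc/quick-check 100 sort-idempotent)

;; A deliberately false property: "every vector is already sorted".
(def always-sorted
  (prop/for-all [v (gen/vector gen/int)]
    (= v (sort v))))

;; On failure, the result map contains the original failing input under :fail
;; and a shrunk counterexample under [:shrunk :smallest].
(def failure (tc/quick-check 100 always-sorted))
(get-in failure [:shrunk :smallest])
```

The shrunk counterexample is typically a very short unsorted vector, which is much easier to reason about than the larger random vector that first triggered the failure.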

PF.tv: So you’re able to automatically create generators from your schemas, which can also be used as contracts on your function arguments. Has having both improved your bug rate?

SM: I don’t have any numbers, but subjectively I think it’s helped. For me, Herbert schemas are primarily documentation tools, which help me to keep track of my data. That being said, I often test schema conformance in preconditions or asserts, especially with new code or when I’m trying to debug a problem.
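
A conformance check in a precondition might look roughly like the sketch below. It uses Clojure's built-in :pre conditions with a hand-written predicate standing in for a schema check; person? and greet are hypothetical names, and no Herbert function is assumed.

```clojure
;; Hypothetical predicate standing in for a schema conformance check
;; on the shape {:name str :num int}.
(defn person? [m]
  (and (map? m)
       (string? (:name m))
       (integer? (:num m))))

;; The :pre condition runs before the body and throws an AssertionError
;; when the argument does not have the expected shape.
(defn greet [person]
  {:pre [(person? person)]}
  (str "Hello, " (:name person) " (#" (:num person) ")"))

(greet {:name "Ann" :num 7})
```

Preconditions are only evaluated while *assert* is true, so they can be switched off for production builds once the code has settled.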

Of course, I still make errors in specifying schemas and sometimes my properties aren’t exactly correct the first time. Particularly with new property-based tests, I have to look carefully at failures in case the bug is actually in the test. My hope is that schema-based generators will make property-based testing easier to use.

Using test.check definitely improves my confidence that I’m finding bugs in testing and avoiding regression errors. It’s been a great way to catch bugs in my own Herbert library.

PF.tv: So you have a runtime check to make sure the function arguments conform to certain schemas. And you have a generative test that exercises a large space of that schema. Sounds pretty good to me!

But it sounds like you’re saying the primary benefit is more for you or other readers of your code. Can you elaborate on that?

SM: My approach started with a notation designed to help me keep Clojure data structures straight in my mind. I wanted something simple and terse, what I called a “whiteboard compatible” notation. My goal was that the notation should look something like the data it was supposed to represent as opposed to code or a type system.

So I began with documentation in mind, and I still think of that as the primary benefit. Once I got the idea of implementing conformance testing against formal schemas, the project became more about the code.

PF.tv: In what other areas do you see schemas playing a part? The first thing that comes to mind is writing core.typed type annotations. Anything else?

SM: There’s some conceptual overlap between schemas and type systems, but I see core.typed as a much more ambitious project. Herbert schemas only cover edn data and don’t deal with function types, for example. The Datomic database naturally has a schema language, so it would be interesting to see if Herbert could be useful for data modeling. In the near term, I plan to extend Herbert so that it supports the Transit datatypes.

PF.tv: What resources would you recommend to a beginner who wanted to make the most of your talk?