Wednesday, December 30, 2015

enable bounds-checking to throw an exception if you index out of bounds or:

disable bounds-checking to improve performance.

Both of these options are still unsatisfactory, though. Even if your program fails fast with an exception, your program still failed.

Fortunately, Haskell programmers can now select a third option:

verify at compile time that you never index a vector out of bounds.

A new tool named Liquid Haskell makes this sort of verification possible. Liquid Haskell is a customizable static analysis tool for the Haskell language that eliminates a wide variety of programming errors at compile time that would normally be difficult to eliminate using Haskell's ordinary type system.

However, the Liquid Haskell tool still needs a lot of polish. I've been using Liquid Haskell for the last couple of months and in my experience the tool is just starting to become "usable in anger". I'm saying this as somebody who has reported eight issues with the tool so far and encountered a ninth issue while writing up this post.

I'll illustrate how Liquid Haskell works by:

implementing a binary search algorithm for Haskell's Vector type,

using Liquid Haskell to statically verify the complete absence of out-of-bound indexing, and then:

Liquid Haskell
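To have something to check, here is a minimal sketch of a binary search over a Vector sorted in ascending order. It follows the shape this post describes (a binarySearch entry point and a recursive loop over the bounds lo and hi), but the exact code is my own. The bounds on lo and hi are only documented in a comment, and the empty-Vector case is not handled yet, so calling it on an empty Vector fails at runtime:

```haskell
import Data.Vector as Vector

-- Search a Vector sorted in ascending order for x, returning its index
binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v = loop x v 0 (Vector.length v - 1)

-- Precondition (informal, only a comment so far):
--   0 <= lo <= hi < length v
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi
    | x < pivot = if lo < mid then loop x v lo (mid - 1) else Nothing
    | x > pivot = if mid < hi then loop x v (mid + 1) hi else Nothing
    | otherwise = Just mid
  where
    -- Midpoint of the current search range
    mid   = lo + (hi - lo) `div` 2
    pivot = v ! mid
```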

We can run this code through the Liquid Haskell tool in order to locate all potential out-of-bounds indexes. To install the tool we need an SMT solver on our $PATH, which Liquid Haskell uses to automate its deduction process. In my case, I installed the Z3 solver by downloading the latest stable release from this page (Version 4.4.1 at the time of this writing):

Okay, so we got four errors on our first check. How concerned should we be?

Usually these errors fall into three categories:

Missing preconditions

We fix these by formally documenting the preconditions of our own functions using Liquid Haskell's type system

Missing postconditions

We fix these by formally documenting the postconditions of functions we use, also using Liquid Haskell's type system

Genuine bugs (i.e. your code is wrong even when all preconditions and postconditions are correctly documented)

We fix these by correcting our code

When we first begin, most of the errors will fall into the first two categories (missing preconditions or postconditions), and as things progress we may discover errors in the third category (genuine bugs).

Preconditions

Liquid Haskell documents the preconditions and postconditions of many Haskell functions "out-of-the-box" by providing "refined" type signatures for these functions. You can find all of the built-in refined type signatures here:

(!) is Haskell's infix operator for indexing into a vector. Normally, the ordinary Haskell type signature for this operator would look like this:

(!) :: Vector a -> Int -> a

... which you can read roughly as saying: "The first (left) argument to this infix operator is a Vector containing elements of type a (where a can be any type). The second (right) argument to this infix operator is the index you want to retrieve. The result is the retrieved element."

Oops! This operator is not safe because indexing can fail at runtime with an exception if you index out of bounds.

This is where Liquid Haskell comes into play. Liquid Haskell lets you write richer type signatures that document preconditions and postconditions. These type signatures resemble Haskell type signatures except that you can decorate them with logical predicates, like this:

assume ! :: v : Vector a -> { i : Int | 0 <= i && i < vlen v } -> a

This is an example of a precondition. The above refined type says: "The index, i, must be non-negative and must be less than the length of the vector, v, that you supplied for the first argument".

So how does Liquid Haskell know what the length of the vector is? Remember that Liquid Haskell needs to somehow verify this precondition at compile time even though we don't necessarily know what the input will be at compile time. In fact, Liquid Haskell actually doesn't know anything about Vectors at all, let alone their lengths, unless we teach it!

Postconditions

We can teach Liquid Haskell about Vector lengths by introducing a new "measure" named vlen:

measure vlen :: Vector a -> Int

We only provide the type of vlen; there is no code associated with it. Just treat vlen as an abstract type-level placeholder for the length of the Vector. We can use vlen to give a more refined type to Vector.length:

assume Vector.length :: v : Vector a -> { n : Int | n == vlen v }

This is an example of a postcondition. You can read this type signature as saying: "We know for sure that whatever Int length returns must be the length of the Vector".

The assume keyword indicates that we haven't proven the correctness of this refined type. Instead we are asserting that the type is true. Any time you study a Liquid Haskell program you need to stare really hard at any part of the code that uses assume, since that's an escape hatch that might compromise any safety guarantees.

Not all postconditions require the use of assume. In fact, Liquid Haskell can automatically promote a restricted subset of Haskell safely to the type level. Unfortunately, this subset does not include operations on Vectors, which is why we must assert the correctness of the above refined type.

Safety

Let's verify that the refined types for (!) and length work correctly using a small test example. We'll begin with a program that is not necessarily safe:

import Data.Vector as Vector

example :: Vector a -> a
example v = v ! 2

We run the above program through the liquid type-checker and get a type error (as expected) since the type-checker cannot verify that the input Vector has at least three elements:

There are two ways we can fix our program. The best solution is to add a precondition to the type of our example function. We can specify that our function only works for Vectors that have at least 3 elements:
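A sketch of that precondition, using the vlen measure introduced above. The {-@ ... @-} line is the Liquid Haskell signature; the rest is ordinary Haskell:

```haskell
import Data.Vector as Vector

-- Liquid Haskell reads this refined signature; GHC sees only a comment
{-@ example :: { v : Vector a | 3 <= vlen v } -> a @-}
example :: Vector a -> a
example v = v ! 2   -- index 2 is in bounds whenever vlen v >= 3
```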

We document this in a second parallel type signature embedded within a Haskell comment. Liquid Haskell is designed to be backwards compatible with the Haskell language and is not yet a formal language extension supported by ghc, which is why Liquid Haskell types have to live inside comments.

Once we add that precondition, the Liquid Haskell type-checker verifies that our program is correct.
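The other way to fix the program is a runtime check on the length. A minimal sketch, where returning Maybe (with Nothing for Vectors shorter than three elements) is my own choice of fallback:

```haskell
import Data.Vector as Vector

-- No refined signature: the length check itself establishes the bound
example :: Vector a -> Maybe a
example v =
    if 3 <= Vector.length v
    then Just (v ! 2)   -- on this branch, 3 <= vlen v holds
    else Nothing
```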

Notice how this program requires no Liquid Haskell type signature or type annotation. Liquid Haskell is smart enough to figure out that the precondition for (!) was already satisfied by the runtime check.

This is Liquid Haskell's way of saying: "I can't prove that mid is within the vector bounds". Liquid Haskell will even explain its reasoning process, showing what refinements were in scope for that expression:

This is an example of the first class of errors: missing preconditions. Liquid Haskell can't read our comment so Liquid Haskell has no way of knowing that lo and hi are within the vector bounds. Therefore, Liquid Haskell can't conclude that their midpoint, mid, is also within the vector bounds.

This is very easy to fix. We can transform our unstructured comment describing preconditions into a refined type for the loop function:
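A sketch of such a refined type, assuming loop takes the search key x, the Vector v, and the bounds lo and hi in that order. It spells out 0 <= lo <= hi < vlen v; the Haskell definition of loop itself does not change:

```haskell
{-@ loop
      :: Ord a
      => x  : a
      -> v  : Vector a
      -> lo : { i : Int | 0 <= i && i < vlen v }
      -> hi : { j : Int | lo <= j && j < vlen v }
      -> Maybe Int
@-}
```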

Liquid Haskell incorporated the new information we supplied through the refined type. The type-checker then worked backwards from loop's preconditions and found a problem in our code via the following reasoning process:

The type of loop says that the lo argument must be less than the length of the vector

We supplied 0 for the lo argument to loop

Therefore the Vector v must have a length of at least 1

But we never proved that the Vector has at least one element!

This is an example of the third class of errors: a genuine bug. I introduced this bug when I first wrote up this algorithm and Liquid Haskell caught my mistake. I never thought about the case where the Vector was empty. Many programmers smarter than me would carefully consider the corner case where the Vector was empty, but I use tools like Liquid Haskell so that I don't need to be smart (or careful).

In this case we don't want to refine the type of binarySearch to require non-empty Vector inputs since we want to support binary searches for Vectors of all sizes. Instead, we will add a special case to handle an empty Vector input:
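A sketch of the fix: binarySearch returns Nothing for an empty Vector before it ever calls loop, so the call loop x v 0 (Vector.length v - 1) only happens when the Vector has at least one element. The loop below is my own sketch, carrying a refined type that requires 0 <= lo <= hi < vlen v:

```haskell
import Data.Vector as Vector

-- Search a Vector sorted in ascending order for x, returning its index
binarySearch :: Ord a => a -> Vector a -> Maybe Int
binarySearch x v =
    if Vector.length v == 0
    then Nothing                           -- special case: empty Vector
    else loop x v 0 (Vector.length v - 1) -- here vlen v >= 1, so 0 is in bounds

{-@ loop
      :: Ord a
      => x  : a
      -> v  : Vector a
      -> lo : { i : Int | 0 <= i && i < vlen v }
      -> hi : { j : Int | lo <= j && j < vlen v }
      -> Maybe Int
@-}
loop :: Ord a => a -> Vector a -> Int -> Int -> Maybe Int
loop x v lo hi
    | x < pivot = if lo < mid then loop x v lo (mid - 1) else Nothing
    | x > pivot = if mid < hi then loop x v (mid + 1) hi else Nothing
    | otherwise = Just mid
  where
    -- Midpoint of the current search range
    mid   = lo + (hi - lo) `div` 2
    pivot = v ! mid
```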

Liquid Haskell already knows that both lo and hi are within the Vector bounds, but cannot deduce that their midpoint must also lie within the Vector bounds.

This is an example of the second class of errors: missing postconditions. In this case the issue is a deficiency of Liquid Haskell's built-in Prelude. Without going into too many details, the built-in refinement for the div function uses integer division, but for some reason type-level division does not provide sufficient information for Liquid Haskell to deduce that the midpoint lies between lo and hi. I'm actually still in the process of narrowing down the precise problem before reporting this on the Liquid Haskell issue tracker, so I may be interpreting this problem incorrectly.

Edit: Ranjit Jhala explained that you can fix this by supplying the --real flag or by adding this pragma to the top of the file:

{-@ LIQUID "--real" @-}

import Data.Vector as Vector

...

Once you supply this flag then the postcondition for div correctly satisfies the type-checker. The previous version of this post used an assume to enforce the relevant postcondition, which is no longer necessary with this flag.

This one took me an hour to figure out, and the error message originates from Liquid Haskell's termination checker! I only figured this error out because I already knew that Liquid Haskell had a built-in termination checker, and even then I had to first minimize the code example before I fully understood the problem.

Liquid Haskell supports termination checking by default, which is actually pretty awesome despite the confusing error. Termination checking means that Liquid Haskell verifies that our code never endlessly loops. In other words, Liquid Haskell transforms Haskell into a "total" programming language (i.e. a programming language where computation always halts). You can also disable the termination checker completely or disable the checker for selected functions if you find the check too restrictive.

The termination checker proves termination by looking for some sort of "well-founded metric" that shows that the function will eventually terminate. This "well-founded metric" is usually an Int that decreases each time the function recurses and the recursion halts when the Int reaches 0.
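As a small, hypothetical illustration, countdown below recurses on its only Int argument, n, which shrinks by one on every call and is still non-negative whenever the recursive call happens. Liquid Haskell should therefore be able to use n as the well-founded metric without any annotation:

```haskell
-- Count down from n to 1; the recursive call passes n - 1,
-- which is smaller than n and, because n > 0 on that branch, still >= 0
countdown :: Int -> [Int]
countdown n =
    if n <= 0
    then []
    else n : countdown (n - 1)
```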

By default, Liquid Haskell guesses and tries to use the first Int argument to the recursive function as the well-founded metric, which was the lo argument in our example. You get an error message like the above when Liquid Haskell guesses wrong. In our case, lo is not a suitable "well-founded" metric because lo does not decrease every time the function recurses. Quite the opposite: lo either stays the same or increases.

However, we are not limited to using arguments as well-founded metrics. We can create our own custom metric that is a composite of the given arguments. In this case we do know that hi - lo always decreases on every iteration and we can instruct Liquid Haskell to use that as the well-founded metric using the following syntax:
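In Liquid Haskell the metric follows the refined type after a /. A sketch for loop, assuming the argument order x, v, lo, hi and the bounds 0 <= lo <= hi < vlen v; only the final / [hi - lo] line distinguishes it from a signature without a custom metric:

```haskell
{-@ loop
      :: Ord a
      => x  : a
      -> v  : Vector a
      -> lo : { i : Int | 0 <= i && i < vlen v }
      -> hi : { j : Int | lo <= j && j < vlen v }
      -> Maybe Int
      / [hi - lo]
@-}
```

Because the refinement already guarantees lo <= hi, the metric hi - lo is never negative, and each recursive call (either lo moving up past mid or hi moving down below mid) shrinks it by at least one.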

You can further eliminate the possibility of integer overflow through the use of refinement types, but Liquid Haskell does not provide these extra refinements to protect against overflow by default. Instead you must opt-in to them by adding your own refinements.
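A common overflow hazard in binary search is the midpoint formula itself. This base-only sketch (the function names are hypothetical) contrasts (lo + hi) `div` 2, which can wrap around for large Int values, with lo + (hi - lo) `div` 2, which stays in range whenever 0 <= lo <= hi:

```haskell
-- Midpoint computed directly: lo + hi can exceed maxBound and wrap
-- around to a negative number (GHC's Int arithmetic wraps on overflow)
naiveMid :: Int -> Int -> Int
naiveMid lo hi = (lo + hi) `div` 2

-- Overflow-free midpoint, assuming 0 <= lo <= hi: hi - lo cannot
-- overflow and the result always lies between lo and hi
safeMid :: Int -> Int -> Int
safeMid lo hi = lo + (hi - lo) `div` 2
```

With lo = maxBound - 10 and hi = maxBound - 2, naiveMid returns a negative number while safeMid returns maxBound - 6.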

Liquid Haskell is designed in the same spirit as Haskell: check as much as possible with as little input from the programmer as possible. Despite all the bugs I run into, I keep coming back to this tool because of the very high power-to-weight ratio for formal verification.

Also note that the final program is still valid Haskell code so we don't sacrifice any compatibility with the existing Haskell toolchain by using Liquid Haskell.

Wednesday, December 9, 2015

I wanted to share a few quick ways that beginning Haskell programmers can contribute to the Haskell ecosystem. I selected these tasks according to a few criteria:

They are fun! These tasks showcase enjoyable tricks

They are easy! They straightforwardly apply existing libraries

They are useful! You can probably find something relevant to your project

For each task I'll give a brief end-to-end example of what a contribution might look like and link to relevant educational resources.

This post only assumes that you have the stack build tool installed, which you can get from haskellstack.com. This tool takes care of the rest of the Haskell toolchain for you so you don't need to install anything else.

Contribution #1: Write a parser for a new file format

Writing parsers in Haskell is just about the slickest thing imaginable. For example, suppose that we want to parse the PPM "plain" file format, which is specified like this [Source]:

Each PPM image consists of the following:

A "magic number" for identifying the file type. A ppm image's magic number is the two characters "P3".

Whitespace (blanks, TABs, CRs, LFs).

A width, formatted as ASCII characters in decimal.

Whitespace.

A height, again in ASCII decimal.

Whitespace.

The maximum color value (Maxval), again in ASCII decimal. Must be less than 65536 and more than zero.

A single whitespace character (usually a newline).

A raster of Height rows, in order from top to bottom. Each row consists of Width pixels, in order from left to right. Each pixel is a triplet of red, green, and blue samples, in that order. Each sample is represented as an ASCII decimal number.

The equivalent Haskell parser reads almost exactly like the specification:
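A sketch of such a parser using Text.ParserCombinators.ReadP from base, so it needs no extra packages. The parsing library is my choice, and names like RGB, PPM and parsePPM are hypothetical. Each step mirrors one line of the specification; unlike the specification, this sketch tolerates extra whitespace before the first pixel, and it does not handle '#' comments:

```haskell
import Data.Char (isDigit, isSpace)
import Text.ParserCombinators.ReadP

-- One pixel: a triplet of red, green and blue samples
data RGB = RGB Int Int Int
    deriving (Eq, Show)

data PPM = PPM
    { ppmWidth  :: Int
    , ppmHeight :: Int
    , ppmMaxval :: Int
    , ppmRows   :: [[RGB]]   -- Height rows, each Width pixels long
    } deriving (Eq, Show)

-- Whitespace (blanks, TABs, CRs, LFs): one or more characters
whitespace :: ReadP ()
whitespace = () <$ munch1 isSpace

-- A number formatted as ASCII characters in decimal
decimal :: ReadP Int
decimal = read <$> munch1 isDigit

-- A pixel's red, green and blue samples, separated by whitespace
pixel :: ReadP RGB
pixel = RGB <$> decimal <* whitespace <*> decimal <* whitespace <*> decimal

ppm :: ReadP PPM
ppm = do
    _ <- string "P3"            -- the magic number
    whitespace
    width <- decimal
    whitespace
    height <- decimal
    whitespace
    maxval <- decimal
    -- Maxval must be more than zero and less than 65536
    if 0 < maxval && maxval < 65536 then return () else pfail
    _ <- satisfy isSpace        -- a single whitespace character
    -- Height rows, each consisting of Width pixels
    rows <- count height (count width (skipSpaces *> pixel))
    skipSpaces
    eof
    return (PPM width height maxval rows)

-- Succeeds only if the whole input is one well-formed plain PPM image
parsePPM :: String -> Maybe PPM
parsePPM input = case readP_to_S ppm input of
    [(image, "")] -> Just image
    _             -> Nothing
```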

I use "long form" in quotes because the entire code is around 60 lines long.

Contribution #2: Write a useful command-line tool

Haskell's turtle library makes it very easy to write polished command-line tools in a tiny amount of code. For example, suppose that I want to build a simple command-line tool for managing a TODO list stored in a todo.txt file. First I just need to provide a subroutine for displaying the current list:

Amazingly, you can delete all the type signatures from the above program and the program will still compile. Try it! Haskell's type inference and fast type-checking algorithm make it feel very much like a scripting language. The combination of type inference, fast startup time, and polished command line parsing makes Haskell an excellent choice for writing command-line utilities.

You can learn more about scripting in Haskell by reading the turtle tutorial, written for people who have no prior background in Haskell programming:

Contribution #3: Client bindings to a web API

Haskell's servant library lets you write very clean and satisfying bindings to a web API. For example, suppose that I want to define a Haskell client to the JSONPlaceholder test API. We'll use two example endpoints that the API provides.

A GET request against the /posts endpoint returns a list of fake posts:
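JSONPlaceholder posts carry userId, id, title and body fields. A sketch of a matching Haskell record, assuming the aeson library's Generic-based deriving (the type name APost comes from this post; the rest is my own sketch):

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Prelude hiding (id)   -- the JSON field is literally named "id"
import Data.Aeson (FromJSON, ToJSON, decode, encode)
import Data.Text (Text)
import GHC.Generics (Generic)

-- One post from the JSONPlaceholder /posts endpoint;
-- decode and encode convert between APost values and JSON bytes
data APost = APost
    { userId :: Int
    , id     :: Int
    , title  :: Text
    , body   :: Text
    } deriving (Show, Eq, Generic, FromJSON, ToJSON)
```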

The last line instructs the Haskell compiler to auto-derive conversion functions between APost and JSON.

Now we just encode the REST API as a type:

-- We can `GET` a list of posts from the `/posts` endpoint
type GetPosts = "posts" :> Get '[JSON] [APost]

-- We can `POST` a list of posts to the `/posts` endpoint
-- using the request body and get a list of posts back as
-- the response
type PutPosts = ReqBody '[JSON] [APost] :> "posts" :> Post '[JSON] [APost]

type API = GetPosts :<|> PutPosts

Conclusion

Suppose that you write up some useful code and you wonder: "What's next? How do I make this code available to others?". You can learn more by reading the stack user guide which contains complete step-by-step instructions for authoring a new Haskell project, including beginning from a pre-existing project template: