Formatting with first-class types

Stringly-typed APIs under strict supervision.

It all started with an innocuous snippet of Idris posted on Twitter. But before we get into details, we need to talk about format strings and first-class types.

Format strings are an archetypal example demonstrating
dependent typing. It’s a simple but telling use case.
In most languages we program in we don’t have access to first-class types. As a result, we have to rely on various tools to help us program with format strings. Those tools may be embedded in our compilers, IDEs, or other analyzers. In any case, they are not intrinsic to the language.

Here, for instance, a C compiler is warning us of an invalid call to printf.

The code is invalid: there’s no fourth argument, and the two which remain are never set. In spite of errors, the compiler accepts the file without complaints.
Unless our IDE warns us earlier, we have to count on feedback from our test suite or static code analysers like SonarJava.

In a language with first-class types the situation changes.
Types and the number of arguments to a function like printf can be determined at compile time based on its first parameter: the format string.

Idris is a language with first-class types. In fact, the format string problem appears as an example in Edwin Brady’s Type-Driven Development with Idris. I can’t recommend this book enough; get a copy if you have a chance.

The first five chapters of the book offer a gentle and unassuming introduction to the language. In chapter six the author leads us through a process of defining a printf
function which will fail to compile unless arguments match the specification in the format string. We can experiment with the printf in the REPL:

What’s important to note is that the rules you see above are not embedded into the language, the standard library, or the compiler. We specify them ourselves by using Idris' type system.
How? Let’s take a look, loosely following the solution from the book.

The implementation begins with a definition of a type representing our format string. We will limit ourselves to formats with %d and %s specifications.

data Format = Number Format
| Str Format
| Lit String Format
| End

We represent a format string "%d is a %s" as a value Number (Lit " is a " (Str End)).
Specifically:

it begins with a number passed as an argument to printf,

then it continues with a literal constant string " is a ",

then it expects a string passed as another argument to printf, and

it ends.

Given a Format like the one above we want to compute the type of our printf function. We need a function for this.
We’ll call it PrintfType, and its implementation might look as follows.

If the format is a Number, expect an Int argument and continue computing the type. If it’s a Str, expect a String argument. If it’s a literal, no argument is necessary but we still need to continue parsing; the rest of the format might need some arguments. Finally, at the end of the format we end up with a regular String.

Using the REPL, we can verify that the type for our format is correct. It’s a type describing a function, taking an Int and a String and returning a String.

Using unpack we transform the given string into a list of characters. Such a list is much easier to process and pattern-match against than a normal string. After unpacking we parse the list using the following toFormat function.

lit keeps reading from its first argument and accumulates read characters in the second one. Upon encountering a '%' we know we have a complete literal. Using pack — the inverse of unpack — we transform what we have back into a string.
We return the string in a Lit, with the rest of parsing being handled by a call to toFormat.

Let’s try it out by parsing a string into a Format and then computing its PrintfType.

The implementation of the printf function remains an open topic, but we’ve solved the most interesting part of the problem. Code using our printf will fail to compile if our arguments don’t match the format string. Using exactly the same approach we could bullet-proof our functions for building
SQL statements.

At this point, I can’t recommend Type-Driven Development with Idris enough. In chapter six you will find a far more precise discussion of printf’s implementation, including an alternative definition of toFormat.

Having an understanding of how printf works, we are well-equipped to attack the next problem: format strings with precision specifications, such as "%0.3f".

We’ll also feature the aforementioned snippet from Twitter.
But for now, let’s take a break, have a walk outside, and let what we saw sink in. We’ll be back soon. Until then!

TAGS

Jan works at INNOQ in Berlin. He has worked on a number of software systems built with functional programming languages such as Clojure and Erlang. He enjoys sharing his experience and regularly speaks at software conferences. He is always happy to answer your questions related to functional programming and property-based testing.

Services

Never miss out on interesting articles, events and podcasts on architecture, development and technology trends!

I accept the storage and processing of my email address in order to receive the monthly INNOQ Newsletter. This acceptance can be revoked via email to info@innoq.com or by clicking the corresponding links in the newsletter emails. I've read %{privacy_link} of innoQ Deutschland GmbH and accept them.

I accept the storage and processing of my email address in order to receive the monthly INNOQ Newsletter. This acceptance can be revoked via email to info@innoq.com or by clicking the corresponding links in the newsletter emails. I've read the privacy terms of innoQ Deutschland GmbH and accept them.