Thursday, April 30, 2009

Scala has a flexible and powerful syntax that is very convenient when building libraries. Among the best examples of Scala's syntactic power put to excellent use are the parser combinator libraries in the standard distribution. Code that uses Scala's parser combinators looks remarkably like BNF (the standard notation for describing formal language grammars) but is actually an executable program which parses the language it describes.

What I want to write about today are pretty printer combinators. If a parser takes a stream of characters and turns them into an abstract syntax tree, a pretty printer does the opposite: it turns some tree into a string, subject to some constraints (to keep the output "pretty"). I've found myself needing to programmatically generate Scala code, but I want the output to be human-readable as well as valid Scala syntax. There is no equivalent to BNF for pretty printers, but my goal is to create a library of combinators to encode pretty printing rules in a custom syntax.

A Pretty Printer Algebra

Lets put aside concerns about syntax for now and think about the underlying mechanics of pretty printers. Phil Wadler, the same person who popularized parser combinators, describes an algebra for pretty printers in his paper "A prettier printer". Without getting into the details of why this is an "algebra", let's take a look at how Wadler's insights might be encoded in Scala.

This code describes four kinds of documents. The Empty document is self-explanatory. The Line document represents a new line. The Text document represents raw text (which MAY NOT contain new lines). The Cons document joins together two other documents. Given these rules, it's trivial to define a layout method that turns a Doc into a String.

These building block are sufficient to encode any document with any layout. The only problem is, well, they're not very useful. For example, we had to manually specify the indentation level of the code inside the brackets for each line independently. Keeping track of details like this is tedious, error-prone, and mechanical; it's just the kind of thing we should let computers do for us. With this use-case in mind, let's add a new kind of document:

case class Nest (n: Int, d: Doc) extends Doc

The Nest document is actually a transformation on an existing document. It modifies the preexisting document by adding n spaces after every line found in the enclosed document. We can encode the same document we did before, with the same final layout, without manually adding spaces but instead using our new Nest document:

The second case gives the key function of nesting: it adds n spaces after every newline. The rest of the cases state that nesting does nothing to Text and Empty, that it applies to both documents inside a Cons, and that nested Nests just add together.

There's still the issue of syntax. We'd like something a little prettier than the mess of nested parentheses we have now. Let's add an implicit, a few shortcuts, and a convenient definition for block which puts stuff in braces and automatically idents it.

We've assembled enough functionality to have a rudimentary pretty printer combinator. With this we can build pretty printers from very simple combinators. There's still a lot of work to be done: higher-level combinators, columns and alignment, and the ability to describe groups of layouts and to choose among them the one that best fits the width of the page. We'll go through all this and more in Part 2 of Combinators for Pretty Printers.

3 comments:

In part 2, perhaps you could discuss scala.text.Document. Purely by cooincidence while evaluating Scala, I was browsing the API and randomly looked at that package. What most struck we was that the extent of the scaladoc was a single sentence: "A basic pretty-printing library, based on Lindig's strict version of Wadler's adaptation of Hughes' pretty-printer."

It fills me with dread that a package could get into Scala's standard library with such documentation.