Sunday, January 4, 2009

Reading Camlp4, part 2: quotations

Last time we saw how Camlp4 represents OCaml abstract syntax trees. We also saw some small examples of using Camlp4 quotations to get the AST corresponding to a piece of OCaml code. In this post we'll build a simple code-generating tool that uses quotations in more complicated ways.

The tool takes constructor names on stdin, one per line. It prints an OCaml module on stdout containing a type definition with the given constructors, a function to_string that converts a constructor of the type to the string of its name, and a function of_string that does the reverse. (Not the most useful program ever written.) We'll go through it line by line:

open Camlp4.PreCast

let _loc = Loc.ghost in

We start out with some boilerplate. Recall that Camlp4.PreCast gets us the Ast module containing the AST datatypes (along with some helper functions we'll see below); and that we need a location bound to _loc since the expanded quotations mention it.

The next part just reads lines from stdin until EOF and puts them in a list; it has nothing to do with Camlp4.
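A minimal sketch of such a loop in plain OCaml follows. The name read_lines is made up for illustration, and the `exception` match case assumes OCaml 4.02 or later; it is not necessarily how the tool itself does it.

```ocaml
(* Read lines from a channel until End_of_file, preserving their order. *)
let read_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* In the tool this would be applied to stdin, for example:
     let cons = read_lines stdin in ...                      *)
```

Accumulating in reverse and calling List.rev at the end keeps the loop tail-recursive, which only matters for very long inputs.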

Printers.OCaml.print_implem
<:str_item<

Now we're into the meat. We've begun a str_item quotation (recall that this is a module body), and we're passing it to a function that will pretty-print it to stdout. The module Printers.OCaml comes from camlp4/Camlp4/Printers/OCaml.ml. (This module is called to pretty-print OCaml code when you run Camlp4; toward the bottom you can see some useful command-line options.)

It's a bit confusing with the interjected explanations, but we are still in the quotation (up to the >> at the end of the program). This part generates the variant type definition. If the input is the lines "Foo", "Bar", "Baz", we want to generate type t = Foo | Bar | Baz.
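To make the target concrete, here is roughly what the whole generated module would look like for that input, written out by hand. The type definition comes from the text; the function bodies follow the description of to_string and of_string given earlier. The catch-all case in of_string, and its error message, are assumptions of this sketch.

```ocaml
type t = Foo | Bar | Baz

(* Constructor to the string of its name. *)
let to_string = function
  | Foo -> "Foo"
  | Bar -> "Bar"
  | Baz -> "Baz"

(* The reverse direction; the fallback case is an assumption. *)
let of_string = function
  | "Foo" -> Foo
  | "Bar" -> Bar
  | "Baz" -> Baz
  | _ -> invalid_arg "bad string"
```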

This might be a good time to run the above type definition through Camlp4 to see what the AST looks like. You'll see that the body of the definition begins with Ast.TySum, then contains the branches of the variant collected by Ast.TyOr's. Each branch is an Ast.TyId with an Ast.IdUid inside containing the constructor name. We want to come up with the same thing for an arbitrary list of constructor names.

Reading from the inside out, we take the list of constructor names and map a ctyp quotation over it. In this quotation we see an antiquotation. Antiquotations let you fill in code so you can use a quotation as a template. Here the tag uid: turns the string c into an Ast.IdUid inside an Ast.TyId. So we have a list of Ast.TyId (Ast.IdUid "Foo")'s.

Now we call a helper function Ast.tyOr_of_list. There are a bunch of these in camlp4/Camlp4/Struct/Camlp4Ast.mlast, which is where the Ast module comes from. This one collects the elements of the list with Ast.TyOr. (As we saw with Ast.StSem/Ast.StNil, the AST doesn't use regular OCaml lists, but builds list-like things out of data constructors.)
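The idea of building a list-like value out of data constructors can be illustrated with a toy model in plain OCaml. The types below are simplified stand-ins, not Camlp4's actual definitions: the real constructors also carry a location, and the name ty_or_of_list and the right-nested shape it produces are assumptions of this sketch.

```ocaml
(* Toy stand-ins for a few of the Ast constructors mentioned above;
   the real ones live in Camlp4's Ast module and take a Loc.t. *)
type ident = IdUid of string

type ctyp =
  | TyNil
  | TyId of ident
  | TyOr of ctyp * ctyp
  | TySum of ctyp

(* Collect a list of branches into nested TyOr nodes. *)
let rec ty_or_of_list = function
  | [] -> TyNil
  | [t] -> t
  | t :: ts -> TyOr (t, ty_or_of_list ts)

(* The body of the type definition for constructors Foo, Bar, Baz. *)
let body =
  TySum
    (ty_or_of_list
       (List.map (fun c -> TyId (IdUid c)) ["Foo"; "Bar"; "Baz"]))
```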

Finally we wrap an Ast.TySum around the whole thing, and use an antiquotation to insert it as the body of the type definition. (Here no tag is needed on the antiquotation; this is usually the case if you're inserting an AST rather than a string.)
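Put together, the steps just described might look like the following standalone sketch. It uses a fixed list cons in place of the lines read from stdin, and it assumes the file is preprocessed with a Camlp4 that understands quotations (for example camlp4of) and linked against the Camlp4 library; treat it as a reconstruction from the prose rather than the tool's exact code.

```ocaml
open Camlp4.PreCast

(* The expanded quotations refer to a location named _loc. *)
let _loc = Loc.ghost

(* Stand-in for the constructor names read from stdin. *)
let cons = ["Foo"; "Bar"; "Baz"]

let () =
  Printers.OCaml.print_implem
    <:str_item<
      type t =
        $Ast.TySum (_loc,
           Ast.tyOr_of_list
             (List.map (fun c -> <:ctyp< $uid:c$ >>) cons))$
    >>
```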

Now that we've seen how things work, the functions are pretty easy to understand. There are a few new elements: A pattern match (in either a match or function expression) consists of an Ast.McArr for each arrow (which we make with the match_case quotation), collected by Ast.McOr's (which we make with the Ast.mcOr_of_list helper).

The `str antiquotation turns a string into a quoted OCaml string, escaping any special characters.
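A sketch of how the two functions could be generated along these lines is shown below. The helper names to_string_cases and of_string_cases are hypothetical, the list of constructors is fixed rather than read from stdin, and the same build assumptions apply (a quotation-aware Camlp4 such as camlp4of, plus the Camlp4 library). This version emits no catch-all case for of_string, so the generated match is not exhaustive; a real tool would presumably append one.

```ocaml
open Camlp4.PreCast

let _loc = Loc.ghost

(* One match case per constructor, from constructor to string. *)
let to_string_cases cons =
  Ast.mcOr_of_list
    (List.map (fun c -> <:match_case< $uid:c$ -> $`str:c$ >>) cons)

(* One match case per constructor, from string to constructor. *)
let of_string_cases cons =
  Ast.mcOr_of_list
    (List.map (fun c -> <:match_case< $`str:c$ -> $uid:c$ >>) cons)

let () =
  let cons = ["Foo"; "Bar"; "Baz"] in
  Printers.OCaml.print_implem
    <:str_item<
      let to_string = function $to_string_cases cons$
      let of_string = function $of_string_cases cons$
    >>
```

Note that the same uid: and `str: antiquotations appear on both sides of the arrow; the match_case quotation decides whether each one becomes a pattern or an expression.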

If you run a sample match through Camlp4 you'll see constructors like Ast.PaId and Ast.PaStr. These are for patterns on the left side of match cases. When we use the match_case quotation we don't have to worry about it; we can use the same antiquotations on the left and right, and the appropriate constructor is parsed out.

Quotations and antiquotations are a slick way to generate OCaml code, but it can be pretty hard to figure out how to use them. The first thing to realize is that you don't have to: anything you can do with Camlp4 you can do by working directly with the AST constructors (although you will find it tedious). As you learn your way around you can replace explicit AST code with quotations.

To access the full power of quotations we'll need to dive into the OCaml parser. But that's going to have to wait until the next post.