A practical Template Haskell Tutorial

This tutorial explores the Glasgow Haskell Compiler's compile-time meta programming in Template Haskell. It motivates use cases for meta programming and explains the different Template Haskell features on simple toy programs. The aim is to give an overview of Template Haskell's functionality in an example-driven manner.

Introduction

Template Haskell (TH) is the standard framework for doing type-safe, compile-time meta programming in the Glasgow Haskell Compiler (GHC). It allows writing Haskell meta programs, which are evaluated at compile-time, and which produce Haskell programs as the results of their execution.

Template Haskell was conceived by Tim Sheard and Simon Peyton Jones[1] by drawing on the ideas of Lisp macros, but in the typed setting of Haskell. Since then, the original implementation has evolved quite a bit[2][3]. Most notably, in 2007 Geoffrey Mainland added support for quasi quoting[4], which makes the embedding of domain specific languages into the Haskell host language much easier.

As it exists today, Template Haskell has two main areas of application: Haskell code generation at compile-time and facilitating the embedding of domain specific languages.

As a code generator, Template Haskell empowers a user to write many, syntactically different, programs all at once by means of a single meta program. All that is needed is a uniform, algorithmic description to create the different result programs. And the meta program then precisely implements the algorithm to compute all the different result programs as its output. This proves useful for example to avoid writing the same repetitive, boilerplate code over and over again. To this end, Template Haskell is used (among many others) in the aeson library to automatically derive a data type's ToJSON and FromJSON instances for JSON serialization; and in the lens library to mechanically create a data type's lenses.

As a framework for creating domain specific languages (EDSLs), Template Haskell allows a user to embed programs written in another programming language inside of a Haskell program. This enables writing parts of the program in the concrete, domain specific syntax of a different programming language. It has the benefit to think about -- and to express -- domain specific problems in the language best suited for the task. In particular, it lets a user focus on the domain specific problem and removes all additional language burdens induced by inconvenient syntax, unsuited control constructs, etc. Programs from the embedded language are parsed and translated into corresponding (but syntactically heavier) Haskell code at compile-time by Template Haskell. In this sense, (e.g.,) the shakespearean template languages from the shakespeare library use Template Haskell at their core. They expose succinct domain specific languages to write HTML, CSS, and Javascript code inside of a Haskell based web application.

Template Haskell by Examples

In this section, we will review the Template Haskell features to write meta programs. The first set of examples show-cases Template Haskell's potential as a code generator; the second set of examples highlights its facilities to create embedded domain specific languages (EDSLs). All examples require GHC's language extension TemplateHaskell to be enabled.

To avoid confusion in the sequel, we distinguish between meta programs and object programs. Meta programs are the Haskell programs that run at compile-time and which generate Template Haskell object programs as the results of their execution; they are the programs that devise or manipulate other programs by some algorithmic means. Object programs, on the other hand, are the Template Haskell programs manipulated and built by the Haskell meta programs at compile-time.

Template Haskell as a Code Generator

As an introductory example, consider Haskell's Prelude function curry::((a,b)->c)->a->b->c, which converts a function taking a pair to its curried equivalent. Unfortunately, there are no Prelude functions that provide the same currying functionality for functions taking arbitrary -tuples. Moreover, having to write more than a few of these functions manually is, while trivial, a very repetitive and cumbersome task. Instead we wish to generate needed curry3, curry5, or curry8 functions through a single meta program on demand. Template Haskell lets us do just this. The idea is to write a meta function curryN::Int->QExp which, given a number n >= 1, constructs the source code for an -ary curry function:

For input , meta function curryN builds a lambda abstraction LamE that pattern matches against a function f and argument variables x1, x2, ..., xn; in its body, it then applies function f to the -tuple (x1,x2,...,xn) derived from the pattern matched variables. The names used to refer to the variables f and x1 through xn are generated monadically by function newName::String->QName to always generate fresh names not used anywhere else. Hence, the value returned by curryN is a monadic computation of type Q Exp. When executed, this monadic computation yields an expression Exp representing the object program of an -ary curry function. For example, (curryN3) returns a monadic computation that yields an expression representing a curry3 function of type ((a,b,c)->d)->a->b->c->d in abstract syntax.[5]

To run a meta program like curryN at compile-time, we enclose it with Template Haskell's splice operator $ by writing (e.g.,) $(curryN3). This evaluates the meta program curryN 3 and puts the resulting object program \fx1x2x3->f(x1,x2,x3) in place of the splice. In general, the splice operator $ can be applied to any monadic Q computation, hereby performing this computation at compile-time and inserting the resulting object program as real Haskell code. To ensure type safety, meta programs to be run are type checked beforehand to indeed yield a valid Template Haskell object program. Hence, only imported, fully-typechecked meta programs can be run via the splice operator $: in particular, we have to evaluate the meta program $(curryN 3) in a separate module to where curryN is defined.

To generate function declarations for the first curry functions, we can devise a further meta program on top of curryN as follows:

Note that in this case, genCurries returns a list of top-level function declarations that bind the anonymous lambda abstractions built by curryN. Also, to name the function bindings, we use function mkName::String->Name instead of newName::String->QName. The reason is that here we want to generate functions curry1 to curry20 with exactly the prescribed names, so they can be captured and referred to from other parts of the program.

Evaluating Haskell (meta) programs at compile-time and splicing in the generated object programs as regular Haskell code is the first central building block of Template Haskell. The two other core mechanisms are exhibited by the implementations of curryN and genCurries: algebraic data types and the quotation monad Q.

First, object programs created by Template Haskell are represented as regular algebraic data types, describing a program in the form of an abstract syntax tree. The Template Haskell library provides algebraic data types Exp, Pat, Dec, and Type to represent Haskell's surface syntax of expressions, patterns, declarations, and types, respectively. Virtually every concrete Haskell syntactic construct has a corresponding abstract syntax constructor in one of the four ADTs. Furthermore, all Haskell identifiers are represented by the abstract Name data type. By representing object programs as regular algebraic data types (and thus as data), normal Haskell can be used as the meta programming language to build object programs.

Second, TH object programs are built inside the quotation monad Q. This monad is performed by the splice operator "$" at compile-time as part of evaluating the meta program. In the examples so far, the Q monad was only needed to provide fresh identifiers with function newName::String->QName for the generated Haskell expressions. The other main feature that requires a monadic construction of object programs is reification, which allows to query compile-time information during the object program's construction. We will explain reification in detail later.

Thus, Template Haskell's core functionality constitutes evaluating object programs with "$" and building them from algebraic data types inside the quotation monad Q. However, constructing object programs in terms of their abstract syntax trees is quite verbose and leads to clumsy meta programs. Therefore the Template Haskell API also provides two further interfaces to build object programs more conveniently: syntax construction functions and quotation brackets.

Syntax construction functions directly relate to the syntax constructors from the algebraic data types Exp, Pat, Dec, and Type for representing Haskell code. However, they hide the monadic nature of building object programs. For example, recall our definition of the genCurries meta function from above:

To use the object program generated by the sub call to curryN in the larger context of the returned function declaration, we have to first perform curryN and bind its result to cury. The reason is that we have to account for curryN's generation of fresh names before we can continue. Using syntax construction functions instead of data constructors, however, abstracts from the monadic construction of genCurries, thus making its code a little shorter:

The new funD, clause, and normalB functions directly correspond to the formerly used FunD, Clause, and NormalB constructors. The only difference lies in their types:

FunD::Name->[Clause]->Dec

funD::Name->[QClause]->QDec

Clause::[Pat]->Body->Clause

clause::[QPat]->QBody->QClause

NormalB::Exp->Body

normalB::QExp->QBody

While the syntax constructors work with raw TH expressions, the syntax construction functions expect their monadic counterparts. They construct a TH object program directly in Q, thus freeing the API consumer from doing the monadic wrapping and unwrapping manually. For every syntax constructor, there is a corresponding monadic syntax construction function provided.

On top of syntax construction functions, quotation brackets are a further shortcut for representing Haskell code. They allow to specify an object program using just regular Haskell syntax by enclosing it inside oxford brackets [| .. |]. That way, object programs can be specified yet much more succinctly. For example, a meta program building a Haskell expression for the identity function is still quite verbose, if expressed with either ADTs or syntax construction functions:

genId::QExpgenId=dox<-newName"x"lamE[varPx](varEx)

Using quotation brackets, writing the same meta program can be abbreviated much further as:

In addition, there is support for quoting Haskell (value and type) identifiers as corresponding Names inside Template Haskell. This allows to refer to regular Haskell identifiers from within TH object programs. For example, writing 'genId yields a TH Name referring to the genId identifier. Similarly, ''Q gives a Name referring to the Q type identifier.

Generic Maps

As a second example that uses both syntax construction functions as well as quotation brackets, let's consider a meta program mapN::Int->QDec to build "generic" map functions at compile-time. Invoking $(mapN1) should generate the well-known standard function map::(a->b)->[a]->[b]; evaluating $(mapN2) should splice in a binary map function of type (a->b->c)->[a]->[b]->[c], and so on.[6]

mapN::Int->QDecmapNn|n>=1=funDname[cl1,cl2]|otherwise=fail"mapN: argument n may not be <= 0."wherename=mkName$"map"++showncl1=dof<-newName"f"xs<-replicateMn(newName"x")ys<-replicateMn(newName"ys")letargPatts=varPf:consPattsconsPatts=[[p|$(varPx):$(varPys)|]|(x,ys)<-xs`zip`ys]apply=foldl(\gx->[|$g$(varEx)|])first=apply(varEf)xsrest=apply(varEname)(f:ys)clauseargPatts(normalB[|$first:$rest|])[]cl2=clause(replicate(n+1)wildP)(normalB(conE'[])) []

The implementation of mapN is very much in the spirit of meta function curryN from the first example. For instance, evaluating splice $(mapN3) splices in the following map function at compile-time:

map3f(x:xs)(y:ys)(z:zs)=fxyz:map3fxsyszsmap3____=[]

Nonetheless, meta function mapN exhibits a couple of new Template Haskell features: First, quotation brackets and splices are used in several places to abbreviate the object program construction. For example, helper definition apply used to generate map3's body fxyz:map3fxsyszs shows the use of quotation brackets; it also highlights how splicing ($) and quotes ([|..|]) cancel each other out. Second, identifier quotes (namely, '[]) are used to create an object program Name that refers to Haskell's built-in list constructor []. Third, the example advertises how all three APIs for building Template Haskell object programs can be interleaved. The lowermost verbose API of building a raw TH data value inside the quotation monad Q can be abbreviated, where possible, with syntax constructor functions and quotation brackets.

Lastly, the mapN example exemplifies how Haskell's static scoping is extended to object programs. The scoping principle for object programs is just as in normal Haskell: Identifiers are bound to their lexically enclosing binders in scope at the point the object program is defined. Quotation brackets and splices don't alter static scopes, even though splices may bring an object program into scope at a location, where a conflicting closure is present. For example, consider this snippet:

x::Intx=42static::QExpstatic=[|x|]plus42::Int->Intplus42x=$static+x

Here the occurrence of x in static refers to the global identifier x that is lexically in scope during its definition. Splicing in static into a different scope later where a different local x is present (i.e., plus42's local identifier x), doesn't alter the link between static's x and the global identifier x.

The only exception to static scoping in Template Haskell are the names generated by mkName::String->Name. These names implement dynamic scoping and can be captured in spliced-in code. Changing the previous snippet to

results in the identifier x spliced in by $dynamic to be bound to the closest x in scope. Hence, its binder is times2's local identifier x and not the global x.

Reification

The final major Template Haskell feature not yet described is program reification. Briefly, reification allows a meta program to query compile-time information about other program parts while constructing the object program. It allows the meta program to inspect other program pieces to answer questions such as: "what's this variable's type?", "what are the class instances of this type class?", or "which constructors does this data type have and and how do they look like?". The main use case is to generate boilerplate code which auto-completes manually written code. A prime example is to generically derive type class instances from bare data type definitions.

Moreover, suppose we want to derive Functor instances for all of these types. Deriving these instances manually is straightforward, but writing them all out by hand is quite cumbersome. Especially since writing a Functor instance follows the same pattern across all of the above types and in fact any type T a.

To make a type constructor T an instance of Functor, one needs to implement method fmap::(a->b)->Ta->Tb. Its definition is hereby precisely determined by parametricity and the functor laws: By parametricity, all values of type a must be replaced according to the provided function with values of type b. Furthermore, by the functor laws, all other shapes of the input value of type T a must be preserved when transforming it to the output value of type T b.

Meta function deriveFunctor::Name->Q[Dec] below implements the idea of this algorithm:

dataDeriving=Deriving{tyCon::Name,tyVar::Name}deriveFunctor::Name->Q[Dec]deriveFunctorty=do(TyConItyCon)<-reifyty(tyConName,tyVars,cs)<-casetyConofDataD_nmtyVarscs_->return(nm,tyVars,cs)NewtypeD_nmtyVarsc_->return(nm,tyVars,[c])_->fail"deriveFunctor: tyCon may not be a type synonym."let(KindedTVtyVarStarT)=lasttyVarsinstanceType=conT''Functor`appT`(foldlapply(conTtyConName)(inittyVars))putQ$DerivingtyConNametyVarsequence[instanceD(return[])instanceType[genFmapcs]]whereapplyt(PlainTVname)=appTt(varTname)applyt(KindedTVname_)=appTt(varTname)

Given the name of a type constructor (e.g. Result, List, etc.), deriveFunctor derives the code for this type constructor's Functor instance. For example, running the splice $(deriveFunctor''Tree) generates the following code:

Meta function deriveFunctor shows reification in action. It calls function reify::Name->QInfo on the input type constructor's name to yield information about this data type's definition. Using reify, it thus learns whether the data type was defined using the data or newtype keyword, which constructors it defines and what their shapes are. Based on the learned structure, deriveFunctor is then able to generate a suitable definition of fmap and its different clauses via the auxiliaries genFmap, genFmapClause, and newField, defined below. These auxiliary definitions generate one fmap clause for each of the data type's constructors. And each clause then transforms its constructor by recursively modifying all of the constructor's fields of type a through fmap's function f, while retaining all other shapes.

In more detail, deriveFunctor works as follows. First, via reify it observes the input data type's name tyConName, its declared type variables tyVars, and its exposed constructors cs. It then determines the data type's right-most type variable tyVar and stores it together with the data type's type constructor name tyConName in the Q monad's user state. This state information is retrieved later again from inside auxiliary definition newField. Next, deriveFunctor derives a Functor's fmap definition using auxiliary genFmap. For each of the input data type's value constructors cs, genFmap generates an fmap clause using helper function genFmapClause. The latter recursively maps the provided function f::a->b over all of a constructor's fields of type a, while leaving all other fields untouched. Each field is hereby modified through f or left unchanged by auxiliary newField based on the field's type: if a field's type is a (which is stored in the retrieved tyVar inside function newField), then f needs to be applied to it; otherwise it needs to remain unchanged.

In an analogous manner to deriveFunctor, a function deriveFoldable::Name->Q[Dec] can be devised to derive a data type's Foldable instance. All that is needed is to provide a definition for function foldMap::Monoidm=>(a->m)->Ta->m. Again, foldMap's definition follows directly from a data type's bare definition, which can be observed by means of reification. This highlights particularly how the functionality offered by Template Haskell provides a low-level API into the GHC compiler to manipulate abstract syntax trees at compile-time. This mechanism is quite powerful and even allows to simulate some of GHC's offered language extensions, e.g., -XDeriveFunctor and -XDeriveFoldable, to be implemented as a library on top of Template Haskell.

To see Template Haskell's potential for building an EDSL, consider the problem of pattern matching text with regular expressions. Suppose, as part of a Haskell program we need to devise many different regular expressions and use them to pattern match text fragments. Regular expressions are easily defined by an algebraic data type capturing their structure, as well as an evaluator checking whether a regular expression matches some input string. [7]

The evaluator match is hereby based on the concept of derivatives[8]: an initial regular expression r matches an input string s, if r matches the first character of s and its derivative regular expression (derivr) matches the remainder of s:

The RegExp data type and the match function solve the initially posed problem of providing regular expressions in Haskell. However, specifying regular expressions in abstract syntax is extremely tedious. For example, consider defining a regular expression for checking the wellformedness of email addresses ending with the top level domain .com. In its usual concrete syntax, such a regular expression is easily defined as ([a-z]|[0-9])*@([a-z]|[0-9])*.com, but writing it in terms of the RegExp dataype is verbose and unintuitive. Moreover, parsing functions like

compile::String->RegExp, or

compile'::String->EitherCompileErrorRegExp

do not remedy the problem of working with regular expressions in concrete syntax. Due to "compiling" regular expressions at run time, they don't provide any compile-time type-safety guarantees that the input raw expression is wellformed; thus they lead to either run time exceptions for illformed regular expressions (e.g., compile) or induce a tedious handling for compiled regexes (e.g., compile').

To preserve type safety and yet to be able to use regular expressions conveniently, we want to embed the concrete regular expression syntax into the Haskell host language. This can be done via Template Haskell's quasi quotes and furthermore enabling the QuasiQuotes extension. This allows defining quasi quotes for regular expressions, denoted [regex| .. |], where anything inside the quasi quote is considered part of an embedded regular expression language. Using quasi quotes, we can then specify the regex for email addresses from above naturally as follows:

Writing ${alphaNum} interpolates the regex referred to by alphaNum into the larger regex validDotComMail'. In essence, this means that we can define our own notion of splicing values from the Haskell meta language into the embedded object language of regular expressions. We can go further and even allow to run Haskell code when interpolating with ${..}. For example, refining our wellformedness check for .com mail addresses, we might want to ensure at least one character to occur on either side of the "@" symbol:

Here, plus corresponds to the usual regex combinator that requires a given regex to occur at least once. Note how plus is defined as a regular Haskell function and then used inside of the embedded regex language to build the regular expression for validDotComMail''.

Intuitively, a quasi quote like [regex| .. |] converts an embedded language's concrete syntax to Haskell code at compile-time. It is defined by a quasi quoter, which is a parser for the embedded language. Its task is to parse the embedded language's syntax into a corresponding Template Haskell expression and then to splice this expression as real Haskell code in place of the quasi quote. The conversion of embedded language code to corresponding Haskell code hereby happens before typechecking the Haskell module. Hence, trying to splice in malformed embedded language fragments will raise a Haskell type error at compile-time.

The quasi quoter regex for our embedded language of regular expressions can be defined as follows:

regex::QuasiQuoterregex=QuasiQuoter{quoteExp=compile,quotePat=notHandled"patterns",quoteType=notHandled"types",quoteDec=notHandled"declarations"}wherenotHandledthings=error$things++" are not handled by the regex quasiquoter."compile::String->QExpcompiles=caseP.parseregexParser""sofLefterr->fail(showerr)Rightregexp->[e|regexp|]

to parse raw strings of the embedded language into the different categories of Haskell syntax. In this example, however, we only want to splice embedded regular expressions into the context of Haskell expressions, so we only define the quoteExp parser in the regex quasi quoter. This parser compiles an embedded regular expression given as a string into a corresponding Template Haskell expression.

Compilation by the compile function proceeds in two stages: First, we parse the input string regex into a corresponding RegExp value. Second, we encode this RegExp value as a Haskell expression in Template Haskell's Q Exp type. It is the second step that allows us to interpolate variables (or even code) from the Haskell host language into the EDSL for regular expressions.

Parsing a raw regular expression into a corresponding RegExp value is a routine task using (e.g.) the parsec library:

Most of the lifting is a direct encoding of the syntactic structure of the RegExp value; the only interesting case is when lifting the regular expression variable Varvars. In this case, we treat the words in the string vars as referring to identifiers from the Haskell host language, which we apply in a left associative manner to each other. Doing this enables interpolation of Haskell identifiers or even simple forms of Haskell expressions into our EDSL of regular expressions as shown by the regexes validDotComMail', and validDotComMail'' above.

These two steps constitute the conversion of raw string regular expressions into Template Haskell expressions inside of the compile function and define the regex quasiquoter. Whenever we write a quasi quote like [regex|..|] in a Haskell expression context, regex's parser quoteExp converts the regex EDSL into a Template Haskell expression Q Exp and splices in the result as a wellformed RegExp value. This example shows how Template Haskell and quasi quotes can be used to define a type-safe, domain specific language for regular expressions.

Shakespearean Templates

In much the same manner as in the last example, Template Haskell and quasi quotes are used in Michael Snoyman's shakespeare library[9][10]. It defines embedded templating languages for working with the internet's web languages from within a Haskell web application. In particular, the shakespeare library provides the template languages Hamlet, Cassius, and Julius for writing embedded HTML, CSS, and Javascript code, respectively. All three templating languages internally work quite similarly to the previous example's EDSL for regular expressions: quasi quotes allow one to write HTML, CSS, or JavaScript code in concrete (though slightly modified) syntax inside of Haskell. Moreover, identifiers from the Haskell host language as well as code fragments can be interpolated into the template languages at compile-time. In the remainder we will briefly show-case the shakespeare library's templating language Hamlet for creating HTML documents; the other templating languages Cassius and Julius are similar.

To create and output a simple web page from inside a Haskell application, the following is enough:

Running this Haskell program, outputs an HTML page as specified by the Hamlet templating language, embedded through quasi quote [hamlet|..|] in function webPage. Hamlet closely resembles real HTML syntax, but is even more terse: instead of a closing HTML tag, Hamlet uses indentation to indicate the span of the tag. Furthermore, Hamlet allows to interpolate code or identifiers from the Haskell host language when creating an HTML template. Interpolation of Haskell code into Hamlet is done by writing #{ .. }. In the above example, the HTML page's title and content are interpolated from Haskell identifiers. Note particularly how in the webpage's title we uppercase the interpolated title using Haskell's Text.toUpper function inside of the Hamlet language.

In addition to this standard interpolation, Hamlet can also interpolate links by writing @{..}. These links are specified as values of the Page data type inside the template and the mkUrls render function translates them to real URLs later. Hamlet's URL interpolation is often described as creating "type-safe URLs". One reason is that, just like with normal variable interpolation, all interpolated links have to exist and be type correct at compile-time; in this case, links must be values of the Page data type. Hence, as soon as a link's constructor shape is changed, the compiler statically forces us to update all references to this link as well. Furthermore, there is only one distinct place in the code to maintain or update a link's raw URL, thus minimizing the risk of dead URLs.

For example, suppose we want to add more external links to our web page. We could model this fact by changing the Page data type to

Doing just these changes, will then cause a compile-time error in our webPage template, since we haven't updated the Github reference to our newly adjusted link structure. Hence, the compiler reminds (and in fact forces) us to update all locations in the code that used the old Github link to now use the new ExternalGithub (as well as optionally the ExternalHaskell, etc.) links.

Finally, Hamlet allows to use some control constructs like if-conditionals, for-loops, and let-bindings to embed basic business logic into a webpage's template. Michael Snoyman gives a gentle (and much more in-depth) introduction to shakespearean templates and Yesod[9][11].

↑Note that meta function curryN cannot be written in normal Haskell per se as the type for a generated -ary curry function depends on . Thus, the definition of curryN requires dependent types to be expressed in Haskell, a feature not yet present. However, there already exist ingenious alternatives to "faking" dependent types in Haskell; see for instance this paper for a solution to simulate functions like curryN without dependent types).

↑Note that -ary maps are better written using Applicative Functors and ZipLists, as this allows to define them first-class from within regular Haskell. For understanding Template Haskell as a code generator, this example is still useful though.