Writing a template language with Clojure

Mustache is a very sweet
template system inspired by ctemplate and el. Ever wondered how Mustache
is implemented, or how to write your own template system? This post will
teach you the basics.

NOTE: although this post discusses abstract syntax trees and lexical
analysis, don’t be intimidated: no experience other than a basic knowledge of
Clojure and regular expressions is needed to read, understand, and
implement your own template language.

Get the code

This blog post is based on gabo. I
suggest you take a look at the README, clone the project, or at least
play with it a little in the REPL. The code is also available on
Clojars.

Introduction

Rendering templates is a three-step process:

Step 1: Lexical Analysis

The purpose of the lexical analysis step is to convert a template
string into a list of tokens.

Step 2: Parsing

The second step, parsing, is in charge of converting the list of
tokens produced by the lexical analysis step into an abstract syntax
tree.

Step 3: Rendering

The final step takes as input an abstract syntax tree and some input data,
and renders that data using the template as its structure.

Now that we have a high level understanding of the steps required to
compile and render templates let’s get into detail.

Step 1: Lexical Analysis

The main purpose of the lexical analysis step is to implement the
tokenize function, which takes a template as input and converts it into
a list of tokens. A token is a very simple data structure (usually a
tuple) representing one meaningful lexical element of the template. A
typical programming language will have string tokens, reserved keyword
tokens, number tokens, and so on.
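In gabo’s case, a token can be sketched as a small vector whose first element is the token type. The shapes below match the REPL examples later in this post, but treat them as illustrative:

```clojure
;; Illustrative token shapes: a type keyword followed by its value.
[:literal "Hi "]   ; a chunk of plain text
[:symbol "name"]   ; a {{name}} substitution
```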

The tokenize function

The tokenize function is a very simple one: it takes a
template string as input and loops until it reaches the end of the template.
On every iteration the tokenize-chunk function is called, which tries
to match the beginning of the remaining template to a token. When a match is
found, the new token is conjoined onto the tokens vector and the matched
string is removed from the front of the remaining template.
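The loop just described can be sketched as follows. This is an illustrative reconstruction rather than gabo’s exact source, and it assumes tokenize-chunk returns a [matched-string token] pair as shown further below:

```clojure
;; Sketch of tokenize: accumulate tokens until the template is consumed.
(defn tokenize
  [template]
  (loop [remaining template
         tokens    []]
    (if (empty? remaining)
      tokens
      (let [[matched token] (tokenize-chunk remaining)]
        (recur (subs remaining (count matched))
               (conj tokens token))))))
```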

As you probably guessed, tokenize-chunk is the function that does the
actual heavy lifting of matching strings to tokens.

when-match, what is that?

You probably noticed the use of the when-match macro. The implementation
for the when-match macro can be found
here.

Example usage of when-match:

```clojure
user=> (when-match "a long string"
  #_=>   [match #"long"] (keyword match)
  #_=>   [wont-match #"string"] (keyword wont-match))
:long
```

when-match takes a string as its first argument, followed by n pairs of forms.
For each pair, the macro tries to match the regular expression in the first
form against the string. If a match is found, the symbol preceding the regular
expression is bound (via let) to the match and the second form is evaluated.
Regular expression matching uses re-find under the hood.

If there is no match then the same will be attempted with the next pair
of forms. I suggest you play around with when-match in your REPL to
get a solid understanding of how it works.
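To make the semantics concrete, here is a minimal sketch of how such a macro could be written; gabo’s actual implementation (linked above) may well differ:

```clojure
;; Sketch of a when-match-style macro: tries each [sym regex] / body
;; pair in order, binding sym to the first successful re-find result.
;; Note: the string expression is re-evaluated on each attempt, which
;; is fine for a sketch but not for side-effecting expressions.
(defmacro when-match
  [s & clauses]
  (when (seq clauses)
    (let [[[sym regex] body & more] clauses]
      `(if-let [~sym (re-find ~regex ~s)]
         ~body
         (when-match ~s ~@more)))))
```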

Going back to the tokenize-chunk function: the implementation is very
simple once you understand when-match. Each regex matches one type
of token; when a token is matched, a pair of [string token] is
returned, where string is the actual string that was matched and token
is the corresponding token constructed from it.

Example: tokenize-chunk at work.

```clojure
user=> (tokenize-chunk "Hi {{name}}, how are you?")
["Hi " [:literal "Hi "]]
user=> (tokenize-chunk "{{name}}, how are you?")
["{{name}}" [:symbol "name"]]
user=> (tokenize-chunk ", how are you?")
[", how are you?" [:literal ", how are you?"]]
```

Notice how this example tokenizes "Hi {{name}}, how are you?"
chunk by chunk, matching each chunk to a token.

We are done with the lexical analysis step. Let’s now move on to the
parsing step.

Step 2: Parsing

Let us recall that the purpose of this step is to produce an abstract
syntax tree, or AST. Think of an AST as a tree-like representation of our
template which we can later use to actually render templates.

Consider a template which renders a country and every city in the
country. We will first tokenize it and then manually convert the tokens
into an AST.
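That might look like the following. The template syntax is gabo’s, but the exact token shapes are an assumption on my part, inferred from the build-ast code below:

```clojure
user=> (tokenize "{{country}}: {{#cities}}{{.}}{{/cities}}")
[[:symbol "country"]
 [:literal ": "]
 [:iter-init "cities" :default]
 [:symbol "."]
 [:iter-end]]

;; Manually converted into an AST: the :iter-init / :iter-end pair
;; collapses into a single nested :iter node.
[[:symbol "country"]
 [:literal ": "]
 [:iter "cities" :default [[:symbol "."]]]]
```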

Let’s now take a look at the build-ast function, which takes a
list of tokens as input and outputs an AST.

```clojure
(defn- build-ast
  "Builds an abstract syntax tree given a list of tokens as produced by tokenize"
  [tokens]
  (loop [tokens tokens
         ast []]
    (if (empty? tokens)
      ast
      (let [token (first tokens)]
        (cond (or (is-literal token) (is-symbol token))
              (recur (rest tokens) (conj ast token))

              (is-iter-init token)
              (let [sub-list (find-iter-sub-list tokens)
                    [_ token-val separator] token]
                (recur (drop (+ 2 (count sub-list)) tokens)
                       (conj ast [:iter token-val separator (build-ast sub-list)])))

              :else (throw (unexpected-token-exception token)))))))
```

The build-ast function is a bit complex, but let’s break it down into
a series of steps:

It loops through all the tokens while accumulating the ast, which will
be the result of the function.

The if in line 6 simply checks whether any tokens are left; when there
are none, we are done, so just return the ast.

If there are tokens left, then line 8 binds token (via let) to the first
token in the iteration.

The cond has two cases and an :else:

Case 1: the “simple” case, if the token is a literal or symbol
simply add token to the ast.

Case 2: the “hard” case is evaluated when an :iter-init token is
found.
The find-iter-sub-list function returns all tokens between the
:iter-init and :iter-end pair.
The recur call then drops all tokens found in the sub-list, plus 2
for the :iter-init and :iter-end tokens themselves.
The result of recursively calling build-ast on the sub-list of tokens
is conjoined onto the ast as a nested :iter node.

else case: if something other than an :iter-init, :symbol or
:literal is found, then throw an exception because we found an
unexpected token.
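Assuming the token shapes used earlier (an assumption on my part; check gabo’s source for the exact forms), a REPL session for build-ast might look like:

```clojure
user=> (build-ast [[:symbol "country"]
                   [:literal ": "]
                   [:iter-init "cities" :default]
                   [:symbol "."]
                   [:iter-end]])
[[:symbol "country"] [:literal ": "] [:iter "cities" :default [[:symbol "."]]]]
```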

This finishes our explanation of the parsing step. I have tried to write
the simplest possible parser, hopefully you think so too. A more complex
one would, for example, include better error detection mechanisms.

Step 3: Rendering

We have arrived at the final step in our template rendering system. This
step builds upon steps 1 and 2: it takes an AST and a context as
arguments and returns a rendered template. A context is simply a map
where every key is a keyword and every val is a number, a string, or a
collection of contexts.
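For instance, a context might look like this (the keys and values are hypothetical):

```clojure
;; Every key is a keyword; vals are numbers, strings, or collections.
{:name    "Ada"
 :age     36
 :numbers [1 2 3 4 5]}
```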

```clojure
;; The {{.}} refers to the item that is being iterated.
user=> (def tokens (parse "{{#numbers}}Number:{{.}}{{/numbers}}"))
user=> tokens
[[:iter "numbers" :default [[:literal "Number:"] [:symbol "."]]]]
user=> (eval-tree tokens {:numbers [1 2 3 4 5]})
"Number:1,Number:2,Number:3,Number:4,Number:5"
```

Let us now look at the implementation of eval-tree:

```clojure
(defn eval-tree
  "Evaluates a compiled template as a tree with the given context"
  [tree ctx]
  (cond (is-literal tree)
        (second tree)

        (is-symbol tree)
        (let [[_ token-val] tree]
          (if (= token-val ".")
            (str ctx)
            (get ctx (keyword token-val) "")))

        (is-iter tree)
        (let [[_ token-val separator sub-tree] tree
              coll (get ctx (keyword token-val) [])]
          (->> (map #(eval-tree sub-tree %) coll)
               (interpose (if (= :default separator) "," separator))
               (apply str)))

        ;; Only executed once: the first time eval-tree is called, no subsequent
        ;; recursive call will go through this branch.
        (coll? tree)
        (apply str (map #(eval-tree % ctx) tree))))
```

eval-tree runs through each node in the AST recursively as follows:

If a :literal token is found, return the token’s value.

If a :symbol is found then check if its value is "."

Case token-val is ".": return the complete context as a string.

Case token-val is not ".": look for token-val as a keyword
inside the context map and return that.

If an :iter tree is found then:

coll is bound (via let) to the collection inside the context that matches
the :iter’s token-val.

Recursively call eval-tree on every node in the sub-tree.

Interpose the previous result with the separator, which is ","
by default.

Finally concatenate all “sub-templates” and separators.

The final branch, (coll? tree), is actually only evaluated once: the first
time eval-tree is called. ASTs produced by parse are not
actually trees; they are lists of trees. This step is equivalent to the
previous branch, minus interposing with separators.
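As an illustration, here is eval-tree applied to a hand-built AST with a nested :iter node (the context values are hypothetical):

```clojure
user=> (eval-tree [[:symbol "country"]
                   [:literal ": "]
                   [:iter "cities" :default [[:symbol "."]]]]
                  {:country "Japan" :cities ["Tokyo" "Osaka"]})
"Japan: Tokyo,Osaka"
```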

Wrapping things up

Now that we have implemented tokenize, parse, and eval-tree, the
render function is very simple:

```clojure
(defn render
  "Compiles and evaluates the template with the given context"
  [template ctx]
  (eval-tree (parse template) ctx))
```
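Putting it all together, rendering the template from the tokenizer examples would look something like this (the context value is hypothetical):

```clojure
user=> (render "Hi {{name}}, how are you?" {:name "Ada"})
"Hi Ada, how are you?"
```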

Hooray!

I hope you now have a good understanding of how to write your own
template system. If you have any questions or suggestions, or just want to
chat, you can e-mail me at fernandohur at gmail.com.