Approach: Building a toy template engine in Python

TL;DR Let's build a simplistic template engine and explore how it works under the hood – in case you were ever wondering.

If you wish to dive straight into the code, then Github is your friend.

Language design

Our language is pretty basic. We will have two types of tags, variables and blocks.

1
2
3
4
5
6
7

<!-- variables are surrounded by `{{` and `}}` --><div>{{my_var}}</div><!-- blocks are surrounded by `{%` and `%}` -->
{% each items %}
<div>{{it}}</div>
{% end %}

Most blocks need to be closed as in the example and the closing tag is just the
word end.

Our template engine will be able to handle basic loops and conditionals. We will
also add support for callables in blocks – personally, I find it handy being
able to call arbitrary Python functions in my templates.

In the example above, people is the collection and it refers to the current
item in the iteration. Dotted paths in names will resolve to nested dictionary
attributes. Using ’..’ we can access names in the parent context.

Conditionals

Conditionals need no explanation. Our language will support if and else constructs,
and the following operators: ==, <=, >=, !=, is, >, <.

Theory

Before delving into the details of how our engine will compile and render the templates
we need to talk a bit about how we will represent a compiled template in memory.

Compilers use Abstract Syntax Trees
to represent the structure of a computer program. ASTs are the product of lexical analysis on the source code.
An AST has a lot of advantages over source code since it does not contain unnecessary
textual elements such as delimiters; furthermore, nodes in the tree can be enhanced with
attributes without altering the actual code.

We will parse and analyse the template and build such a tree to represent the compiled template.
To render the template we will walk the tree, pass the correct context into each node and spit
out HTML.

Tokenizing the template

The first step in parsing the template is splitting the contents into fragments. Each
fragment can either be arbitrary HTML or a tag. To split the contents we will
use regular expressions and the split()
function.

Let’s analyze TOK_REGEX a bit. We can see that there’s an option between
a variable tag and a block tag. That makes sense – we want to split on either
variables or blocks. We surround the option with capturing parenthesis to include
the matching text in the resulting tokens. The ? inside the options is for
non-greedy repetition. We want our regex to be lazy and stop on first match,
so, for example, we can extract variables inside blocks. This
is a very nice explanation of how to control the greediness of regular expressions.

We will encapsulate each fragment of text in a Fragment object. This object
will determine the fragment type and prepare the fragment for consumption by
the compile function. Fragments can be one of four types:

Building the AST

Once we’ve tokenized the template content it’s time to go through each fragment and build
the syntax tree. We will use a Node class to serve as the base class for tree nodes, and
then create concrete subclasses for each possible node type. A subclass
should provide implementations for process_fragment() and render().
process_fragment() is used to further parse the fragment contents and store
necessary attributes on the Node object. render() is responsible for converting
the node to HTML using the provided context.

A subclass could optionally implement enter_scope() and exit_scope() which
are hooks that get called by the compiler during compilation and should perform
further initialization and cleanup. enter_scope() is called when the node
creates a new scope (more on this later), and exit_scope() is called when the node’s
scope is about to get popped off the scope stack.

In order to determine the node type (and hence instantiate the correct node class) we will look at the
fragment type and text. Text and Variable fragments directly translate to text nodes and variable nodes.
Block fragments need a bit more processing – their type is denoted by the
block command which is the first word in the fragment text. For example, the fragment:

1

{%eachitems%}

is a block node of ‘each’ type since the block command is each.

A node can also create a scope. During compiling, we keep track of the current
scope and new nodes are added as children of that scope. Once we encounter a correct
closing tag we close the scope, pop it off the scope stack and set the top of the stack
as the new current scope.

Rendering

The last step in the pipeline is to render the AST to HTML. We visit all the
nodes in the AST and call render() passing as a parameter the context given
to the template. During rendering we will need to deduce whether we are dealing
with literals or context variable names that need to be resolved. For this, we will use
ast.literal_eval() which
can safely evaluate strings containing Python code:

If we are dealing with a context variable name instead of a literal then we need
to resolve it by searching for its value in the context. We need to take care of dotted names
and names that reference the parent context. Here’s our resolving function which is
the final piece of the puzzle: