I am writing a simple parser framework in Swift, mostly as a learning exercise. Specifically, I am writing a recursive descent parser. Having never been a fan of the traditional lex/yacc parser generators, I first came across recursive descent in the Boost Spirit C++ library. Since then, the framework I have used most is FParsec for F#. What I'm doing here borrows heavily from FParsec.

This post is simply going to describe the basic features and key classes of this framework, working up to a demonstration showing how we can build a simple parser for S-expressions. You can find the project's code on GitHub: https://github.com/ollie-williams/SwiftParsec . You'll see that the code in this post is simplified in order to focus on the main points and not get sidetracked by details.

An implementation of Parser is a unit that translates characters into an output of type Target. The CharStream class, as its name suggests, supplies a stream of Unicode characters representing the input we want to parse. To save space, I'll omit the details of CharStream for now; its behavior should be pretty obvious from context. The return type of Parser.parse is an Optional<Target> (that is, Target?). A value signals success and nil signals failure.
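The definitions of Parser and CharStream are not included in this post. Below is a minimal sketch of how they might look in current Swift. It assumes CharStream is a cursor over an array of Characters. Its members head, position, advance and startsWith are hypothetical names chosen for these sketches, not the project's actual API. AnyChar is likewise a made-up example of a conforming parser.

```swift
// Minimal stand-in for CharStream (hypothetical API): a cursor over Characters.
final class CharStream {
    private let chars: [Character]
    var position = 0                                   // index of the next unread character

    init(_ text: String) { chars = Array(text) }

    // The next character, or nil at the end of the input.
    var head: Character? { position < chars.count ? chars[position] : nil }

    // Consume `count` characters.
    func advance(_ count: Int = 1) { position += count }

    // True if the unread input begins with `prefix`.
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}

// A parser turns characters into a Target; nil signals failure.
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// Hypothetical example conformance: consumes any single character.
struct AnyChar: Parser {
    func parse(_ stream: CharStream) -> Character? {
        guard let c = stream.head else { return nil }  // fail at end of input
        stream.advance()
        return c
    }
}
```

Swift infers each conformance's Target from the return type of parse, so AnyChar.Target is Character.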

Basic parsers

Let's demonstrate an implementation of Parser with something simple: a constant string parser.

Constant.parse checks whether the leading characters in the stream match a specific string value. If they do, it pulls those characters off the stream via the advance method and returns the value wrapped in an Optional, indicating success. If the head of the stream doesn't match, parsing fails: the stream is left unaltered, and nil is returned to indicate this failure to the caller.
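Here is a sketch of Constant that follows that description. It repeats a condensed copy of the hypothetical CharStream and Parser stand-ins so it compiles on its own. The stream methods startsWith and advance(_:) are assumptions, not the project's confirmed API.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    func advance(_ count: Int = 1) { position += count }
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// A basic parser that matches one fixed string.
struct Constant: Parser {
    let value: String
    init(_ value: String) { self.value = value }

    func parse(_ stream: CharStream) -> String? {
        guard stream.startsWith(value) else { return nil }  // no match: stream untouched
        stream.advance(value.count)                          // consume the matched characters
        return value
    }
}
```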

Constant is a parser which directly converts characters into a value. I refer to such parsers as basic parsers, as these are the building blocks from which more complex parsers are constructed. Another example of a basic parser is Integer:
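The post's Integer code is not reproduced here. The sketch below assumes Integer accepts an optional leading minus sign followed by ASCII digits and converts the text with Int(_:). If no valid number is found, it rewinds the stream so that a failure consumes nothing, as Constant does.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    var head: Character? { position < chars.count ? chars[position] : nil }
    func advance(_ count: Int = 1) { position += count }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// A basic parser producing an Int from an optional "-" and one or more digits.
struct Integer: Parser {
    func parse(_ stream: CharStream) -> Int? {
        let start = stream.position
        var text = ""
        if stream.head == "-" { text.append("-"); stream.advance() }
        while let c = stream.head, c.isASCII, c.isNumber {
            text.append(c)
            stream.advance()
        }
        // Int(_:) rejects "", "-" and out-of-range values; rewind in those cases.
        guard let n = Int(text) else { stream.position = start; return nil }
        return n
    }
}
```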

FollowedBy is a combinator, meaning a parser built out of other parsers. It connects two parsers "in series": to parse the whole, we must successfully parse both parts, one after the other.
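Here is one way to write FollowedBy as a generic struct over two parsers, with the pair of their results as its Target. The sketch rewinds the stream when the second parser fails. That is a design choice of this sketch, made so that every parser keeps Constant's contract of leaving the stream unaltered on failure. The CharStream and Constant stand-ins are repeated so the block compiles alone.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    func advance(_ count: Int = 1) { position += count }
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}
struct Constant: Parser {
    let value: String
    init(_ value: String) { self.value = value }
    func parse(_ stream: CharStream) -> String? {
        guard stream.startsWith(value) else { return nil }
        stream.advance(value.count)
        return value
    }
}

// Runs `first`, then `second`; succeeds only if both do.
struct FollowedBy<P1: Parser, P2: Parser>: Parser {
    let first: P1
    let second: P2
    func parse(_ stream: CharStream) -> (P1.Target, P2.Target)? {
        let start = stream.position
        guard let a = first.parse(stream) else { return nil }
        guard let b = second.parse(stream) else {
            stream.position = start          // rewind so a failure consumes nothing
            return nil
        }
        return (a, b)
    }
}
```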

FParsec makes liberal use of the fact that, in F#, one can define custom operators. This makes building up quite large parsers succinct and straightforward. Swift also has this facility, which means I can crib even more goodness from FParsec. In the case of FollowedBy, we define the infix operator ~>~.
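Below is a sketch of the ~>~ declaration in current Swift syntax, where associativity is set through a precedencegroup. The group name SequencePrecedence is an assumption. The hwparser shown is a hypothetical stand-in, since the post's own definition is not available. It was chosen so that its Target matches the ((String, String), String) type discussed next.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    func advance(_ count: Int = 1) { position += count }
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}
struct Constant: Parser {
    let value: String
    init(_ value: String) { self.value = value }
    func parse(_ stream: CharStream) -> String? {
        guard stream.startsWith(value) else { return nil }
        stream.advance(value.count)
        return value
    }
}
struct FollowedBy<P1: Parser, P2: Parser>: Parser {
    let first: P1
    let second: P2
    func parse(_ stream: CharStream) -> (P1.Target, P2.Target)? {
        let start = stream.position
        guard let a = first.parse(stream) else { return nil }
        guard let b = second.parse(stream) else { stream.position = start; return nil }
        return (a, b)
    }
}

precedencegroup SequencePrecedence {
    associativity: left   // a ~>~ b ~>~ c means (a ~>~ b) ~>~ c
}
infix operator ~>~ : SequencePrecedence

func ~>~ <P1: Parser, P2: Parser>(lhs: P1, rhs: P2) -> FollowedBy<P1, P2> {
    FollowedBy(first: lhs, second: rhs)
}

// Hypothetical stand-in for the post's hwparser: three parsers in series.
let hwparser = Constant("hello") ~>~ Constant(" ") ~>~ Constant("world")
```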

Because we declared ~>~ to be left associative, the type hwparser.Target is a tuple of tuples: ((String, String), String). Frequently, when connecting parsers in series, we don't want to keep both sides. For these cases there are also the operators ~> and >~, which keep only the output of the left-hand and right-hand side, respectively.
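One way to get ~> and >~ is to run FollowedBy and then keep one side of the resulting pair. The sketch below does this with a small hypothetical Pipe parser, which applies a function to another parser's result. Pipe, SequencePrecedence and the parenthesized example are assumptions made for illustration.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    func advance(_ count: Int = 1) { position += count }
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}
struct Constant: Parser {
    let value: String
    init(_ value: String) { self.value = value }
    func parse(_ stream: CharStream) -> String? {
        guard stream.startsWith(value) else { return nil }
        stream.advance(value.count)
        return value
    }
}
struct FollowedBy<A: Parser, B: Parser>: Parser {
    let first: A, second: B
    func parse(_ stream: CharStream) -> (A.Target, B.Target)? {
        let start = stream.position
        guard let a = first.parse(stream) else { return nil }
        guard let b = second.parse(stream) else { stream.position = start; return nil }
        return (a, b)
    }
}
// Hypothetical helper: transforms another parser's result on success.
struct Pipe<P: Parser, Output>: Parser {
    let inner: P
    let transform: (P.Target) -> Output
    func parse(_ stream: CharStream) -> Output? { inner.parse(stream).map(transform) }
}

precedencegroup SequencePrecedence {
    associativity: left
}
infix operator ~> : SequencePrecedence   // keep the left result
infix operator >~ : SequencePrecedence   // keep the right result

func ~> <A: Parser, B: Parser>(lhs: A, rhs: B) -> Pipe<FollowedBy<A, B>, A.Target> {
    Pipe(inner: FollowedBy(first: lhs, second: rhs)) { $0.0 }
}
func >~ <A: Parser, B: Parser>(lhs: A, rhs: B) -> Pipe<FollowedBy<A, B>, B.Target> {
    Pipe(inner: FollowedBy(first: lhs, second: rhs)) { $0.1 }
}

// Groups as ("(" >~ "x") ~> ")" and keeps only "x".
let parenthesized = Constant("(") >~ Constant("x") ~> Constant(")")
```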

Another useful combinator is Alternate, which connects two parsers in parallel. As with FollowedBy, an alternate parser can be created with an operator, in this case |. If we define

let p1 = ...
let p2 = ...
let alt = p1 | p2

alt attempts to parse a stream by first applying p1 to it. If p1.parse returns nil, p2 is attempted instead.
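Here is a matching sketch of Alternate and an overload of Swift's existing | operator. Both alternatives must produce the same Target, and the where clause enforces this. Because a failed parser leaves the stream unaltered, p2 starts from the same position p1 did. The boolean parser is a hypothetical example, since the post's example program isn't shown.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    func advance(_ count: Int = 1) { position += count }
    func startsWith(_ prefix: String) -> Bool {
        let p = Array(prefix)
        return position + p.count <= chars.count
            && chars[position..<position + p.count].elementsEqual(p)
    }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}
struct Constant: Parser {
    let value: String
    init(_ value: String) { self.value = value }
    func parse(_ stream: CharStream) -> String? {
        guard stream.startsWith(value) else { return nil }
        stream.advance(value.count)
        return value
    }
}

// Tries `first`; if it fails, tries `second` on the same input.
struct Alternate<P1: Parser, P2: Parser>: Parser where P1.Target == P2.Target {
    let first: P1
    let second: P2
    func parse(_ stream: CharStream) -> P1.Target? {
        if let result = first.parse(stream) { return result }
        return second.parse(stream)
    }
}

// Overload `|` for parsers whose outputs share a type.
func | <P1: Parser, P2: Parser>(lhs: P1, rhs: P2) -> Alternate<P1, P2>
    where P1.Target == P2.Target {
    Alternate(first: lhs, second: rhs)
}

// Hypothetical example: accept either keyword.
let boolean = Constant("true") | Constant("false")
```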

Recursion

I said this is a recursive descent parser. So where's the recursion? That's the least obvious part of the puzzle. To demonstrate it, let's try to write a less trivial parser: one for Lisp's S-expressions. Here is an example of an S-expression:

As you can see, Expr is a recursive type, since the children member is itself an array of Exprs. There are two kinds of expression. Leaves have no children and appear as stand-alone identifiers; in our example, a, b, and c are leaves. A function call consists of a function symbol and zero or more child expressions. For instance, (add a (g b)) is a function call whose function symbol is add and whose children are a and (g b).
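The post's definition of Expr is not available here. The sketch below represents it as an enum with one case per kind of expression. The case names, MakeCall and description are assumptions. MakeLeaf follows the name the post uses later. Because children are stored in an array, the enum can refer to itself without the `indirect` keyword.

```swift
enum Expr: Equatable, CustomStringConvertible {
    case leaf(String)                 // a stand-alone identifier such as `a`
    case call(String, [Expr])         // a function symbol plus zero or more children

    static func MakeLeaf(_ name: String) -> Expr { .leaf(name) }
    static func MakeCall(_ function: String, _ children: [Expr]) -> Expr {
        .call(function, children)
    }

    // Leaves have no children.
    var children: [Expr] {
        if case .call(_, let kids) = self { return kids }
        return []
    }

    // Renders the expression back into S-expression syntax.
    var description: String {
        switch self {
        case .leaf(let name):
            return name
        case .call(let function, let args):
            return "(" + ([function] + args.map { $0.description }).joined(separator: " ") + ")"
        }
    }
}

let example = Expr.MakeCall("add", [.MakeLeaf("a"), .MakeCall("g", [.MakeLeaf("b")])])
```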

Let's start building up a parser for Expr by building a parser for identifiers:

I have made use of the helper functions manychars and many1chars, as well as satisfy. For the sake of space, I won't describe these here, and I hope their usage and behavior are clear from context. skip is a very primitive skipper that simply jumps over spaces; a full skipper would need to account for other kinds of whitespace. identifier then treats any sequence of one or more characters that are neither spaces nor parentheses as an identifier. It also does some housekeeping for us by skipping any whitespace that follows the string we want.
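Since the helpers are not shown, the following is a guess at how satisfy, manychars, many1chars, skip and identifier could fit together. The Satisfy and ManyChars types, the minimum parameter, and the use of ~> to discard trailing spaces are all assumptions. FollowedBy, a hypothetical Pipe helper and ~> are repeated so the block compiles alone.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    var head: Character? { position < chars.count ? chars[position] : nil }
    func advance(_ count: Int = 1) { position += count }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// Hypothetical satisfy: consumes one character for which `predicate` holds.
struct Satisfy: Parser {
    let predicate: (Character) -> Bool
    func parse(_ stream: CharStream) -> Character? {
        guard let c = stream.head, predicate(c) else { return nil }
        stream.advance()
        return c
    }
}
func satisfy(_ predicate: @escaping (Character) -> Bool) -> Satisfy { Satisfy(predicate: predicate) }

// Gathers repeated matches of a Character parser into a String;
// fails, consuming nothing, if there are fewer than `minimum` matches.
struct ManyChars<P: Parser>: Parser where P.Target == Character {
    let element: P
    let minimum: Int
    func parse(_ stream: CharStream) -> String? {
        let start = stream.position
        var text = ""
        var matched = 0
        while let c = element.parse(stream) { text.append(c); matched += 1 }
        guard matched >= minimum else { stream.position = start; return nil }
        return text
    }
}
func manychars<P: Parser>(_ p: P) -> ManyChars<P> where P.Target == Character {
    ManyChars(element: p, minimum: 0)
}
func many1chars<P: Parser>(_ p: P) -> ManyChars<P> where P.Target == Character {
    ManyChars(element: p, minimum: 1)
}

struct FollowedBy<A: Parser, B: Parser>: Parser {
    let first: A, second: B
    func parse(_ stream: CharStream) -> (A.Target, B.Target)? {
        let start = stream.position
        guard let a = first.parse(stream) else { return nil }
        guard let b = second.parse(stream) else { stream.position = start; return nil }
        return (a, b)
    }
}
struct Pipe<P: Parser, Output>: Parser {
    let inner: P, transform: (P.Target) -> Output
    func parse(_ stream: CharStream) -> Output? { inner.parse(stream).map(transform) }
}
precedencegroup SequencePrecedence {
    associativity: left
}
infix operator ~> : SequencePrecedence
// a ~> b: parse both in sequence, keep only a's result.
func ~> <A: Parser, B: Parser>(lhs: A, rhs: B) -> Pipe<FollowedBy<A, B>, A.Target> {
    Pipe(inner: FollowedBy(first: lhs, second: rhs)) { $0.0 }
}

// skip jumps over spaces only; other whitespace is not handled.
let skip = manychars(satisfy { $0 == " " })
// One or more characters that are neither spaces nor parentheses, then trailing spaces.
let identifier = many1chars(satisfy { $0 != " " && $0 != "(" && $0 != ")" }) ~> skip
```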

Since leaf expressions are just identifiers, they are easily parsed with a pipe that performs the translation:

let leaf = identifier |> Expr.MakeLeaf
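The |> operator itself is not defined in the text. Here is a sketch, assuming a hypothetical Pipe parser that applies a function to another parser's result only when that parser succeeds. The Identifier struct is a compact stand-in for the post's identifier, and Expr is trimmed to what leaf needs. Both are assumptions.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    var head: Character? { position < chars.count ? chars[position] : nil }
    func advance(_ count: Int = 1) { position += count }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// Hypothetical Pipe: runs `inner`, then applies `transform` to its result.
struct Pipe<P: Parser, Output>: Parser {
    let inner: P
    let transform: (P.Target) -> Output
    func parse(_ stream: CharStream) -> Output? {
        inner.parse(stream).map(transform)   // the function is applied only on success
    }
}

precedencegroup PipePrecedence {
    associativity: left
}
infix operator |> : PipePrecedence

func |> <P: Parser, Output>(parser: P, transform: @escaping (P.Target) -> Output) -> Pipe<P, Output> {
    Pipe(inner: parser, transform: transform)
}

// Hypothetical compact stand-in for the post's identifier parser.
struct Identifier: Parser {
    func parse(_ stream: CharStream) -> String? {
        var name = ""
        while let c = stream.head, !" ()".contains(c) { name.append(c); stream.advance() }
        guard !name.isEmpty else { return nil }         // nothing consumed on failure
        while stream.head == " " { stream.advance() }   // skip trailing spaces
        return name
    }
}

enum Expr: Equatable {
    case leaf(String)
    case call(String, [Expr])
    static func MakeLeaf(_ name: String) -> Expr { .leaf(name) }
}

let identifier = Identifier()
let leaf = identifier |> Expr.MakeLeaf
```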

Parsing a function call is where it gets interesting, because the grammar is recursive: a function call's children are themselves expressions, which may in turn be function calls. To manage this we can use the LateBound parser:
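The post's LateBound code is not shown. Below is a minimal sketch, assuming LateBound stores an optional parsing closure that is assigned after construction; the property name implementation is hypothetical. Bracketed is a made-up parser that counts nested square brackets. Its inner parser is the LateBound, which is afterwards pointed back at Bracketed itself.

```swift
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    var head: Character? { position < chars.count ? chars[position] : nil }
    func advance(_ count: Int = 1) { position += count }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// Hypothetical LateBound: a parser whose implementation is supplied after it is created.
final class LateBound<Target>: Parser {
    var implementation: ((CharStream) -> Target?)?
    func parse(_ stream: CharStream) -> Target? {
        implementation?(stream)   // fails (returns nil) until an implementation is assigned
    }
}

// Hypothetical example: "[", optional nested brackets, "]"; returns the nesting depth.
struct Bracketed: Parser {
    let inner: LateBound<Int>
    func parse(_ stream: CharStream) -> Int? {
        let start = stream.position
        guard stream.head == "[" else { return nil }
        stream.advance()
        let innerDepth = inner.parse(stream) ?? 0     // the contents are optional
        guard stream.head == "]" else { stream.position = start; return nil }
        stream.advance()
        return innerDepth + 1
    }
}

let depth = LateBound<Int>()              // created before the parser that uses it
let bracketed = Bracketed(inner: depth)
depth.implementation = bracketed.parse    // close the loop
```

LateBound is a class here on purpose. Bracketed stores its inner parser by value, so only a reference type lets it see the assignment made afterwards.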

LateBound enables us to define "loopy" parsers because it decouples the creation of an object satisfying Parser from the actual function implementing the parsing operation. Armed with LateBound we can complete our S-expression parser:
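Putting the pieces together, here is one way the complete S-expression parser could be written. It is a self-contained sketch, not the post's code. CharStream, Satisfy, Many, Pipe, MakeCall, the two precedence groups, and the word, lparen and rparen helpers are all assumptions. The parentheses are matched with Satisfy rather than Constant to keep the listing short. The key step is the last line: expr starts as an empty LateBound, is used inside call, and only then receives its implementation.

```swift
// Minimal stand-ins for the post's CharStream and Parser (assumed shapes).
final class CharStream {
    private let chars: [Character]
    var position = 0
    init(_ text: String) { chars = Array(text) }
    var head: Character? { position < chars.count ? chars[position] : nil }
    func advance() { position += 1 }
}
protocol Parser {
    associatedtype Target
    func parse(_ stream: CharStream) -> Target?
}

// Basic parser: one character satisfying a predicate.
struct Satisfy: Parser {
    let test: (Character) -> Bool
    func parse(_ s: CharStream) -> Character? {
        guard let c = s.head, test(c) else { return nil }
        s.advance()
        return c
    }
}
// Greedy repetition; fails (and rewinds) with fewer than `minimum` matches.
struct Many<P: Parser>: Parser {
    let item: P, minimum: Int
    func parse(_ s: CharStream) -> [P.Target]? {
        let start = s.position
        var items: [P.Target] = []
        while let x = item.parse(s) { items.append(x) }
        if items.count < minimum { s.position = start; return nil }
        return items
    }
}
struct FollowedBy<A: Parser, B: Parser>: Parser {
    let first: A, second: B
    func parse(_ s: CharStream) -> (A.Target, B.Target)? {
        let start = s.position
        guard let a = first.parse(s) else { return nil }
        guard let b = second.parse(s) else { s.position = start; return nil }
        return (a, b)
    }
}
struct Alternate<A: Parser, B: Parser>: Parser where A.Target == B.Target {
    let first: A, second: B
    func parse(_ s: CharStream) -> A.Target? { first.parse(s) ?? second.parse(s) }
}
struct Pipe<P: Parser, Output>: Parser {
    let inner: P, transform: (P.Target) -> Output
    func parse(_ s: CharStream) -> Output? { inner.parse(s).map(transform) }
}
// A class, so the assignment to `implementation` is seen by every parser holding it.
final class LateBound<Target>: Parser {
    var implementation: ((CharStream) -> Target?)?
    func parse(_ s: CharStream) -> Target? { implementation?(s) }
}

precedencegroup PipePrecedence {
    associativity: left
    higherThan: AdditionPrecedence   // binds tighter than `|`
}
precedencegroup SequencePrecedence {
    associativity: left
    higherThan: PipePrecedence       // `a ~>~ b |> f` pipes the whole sequence
}
infix operator ~>~ : SequencePrecedence
infix operator ~> : SequencePrecedence
infix operator >~ : SequencePrecedence
infix operator |> : PipePrecedence
func ~>~ <A: Parser, B: Parser>(a: A, b: B) -> FollowedBy<A, B> { FollowedBy(first: a, second: b) }
func ~> <A: Parser, B: Parser>(a: A, b: B) -> Pipe<FollowedBy<A, B>, A.Target> { Pipe(inner: a ~>~ b) { $0.0 } }
func >~ <A: Parser, B: Parser>(a: A, b: B) -> Pipe<FollowedBy<A, B>, B.Target> { Pipe(inner: a ~>~ b) { $0.1 } }
func | <A: Parser, B: Parser>(a: A, b: B) -> Alternate<A, B> where A.Target == B.Target { Alternate(first: a, second: b) }
func |> <P: Parser, O>(p: P, f: @escaping (P.Target) -> O) -> Pipe<P, O> { Pipe(inner: p, transform: f) }

enum Expr: Equatable {
    case leaf(String)
    case call(String, [Expr])
    static func MakeLeaf(_ name: String) -> Expr { .leaf(name) }
    static func MakeCall(_ parts: (String, [Expr])) -> Expr { .call(parts.0, parts.1) }
}

let skip = Many(item: Satisfy { $0 == " " }, minimum: 0)
let word = Many(item: Satisfy { !" ()".contains($0) }, minimum: 1) ~> skip
let identifier = word |> { String($0) }
let lparen = Satisfy { $0 == "(" } ~> skip
let rparen = Satisfy { $0 == ")" } ~> skip
let expr = LateBound<Expr>()   // placeholder for the recursive rule
let leaf = identifier |> Expr.MakeLeaf
let call = lparen >~ identifier ~>~ Many(item: expr, minimum: 0) ~> rparen |> Expr.MakeCall
expr.implementation = (leaf | call).parse   // close the loop
```

With this sketch, expr.parse(CharStream("(add a (g b))")) yields .call("add", [.leaf("a"), .call("g", [.leaf("b")])]). Two caveats apply. Leading whitespace before the first token is not skipped. Also, the closure stored in expr.implementation holds a parser that in turn holds expr, which forms a reference cycle. That is harmless for a grammar that lives as long as the program.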
