When we last left our hero, we had built enough of a recursive descent parser to parse Lisp's S-expressions. In this post I want to show you what I've added to the framework since then: some useful building blocks, including the regex leaf parser, and a demonstration of how easy it is to create a JSON parser with the framework.

Regular expressions (or regexes) are a great way to model a wide variety of syntactic patterns. Not only that, but regular expression matchers are a built-in component of most computing environments, including Swift. While it might not be a great idea (or even possible) to build a complex parser out of just regular expressions, they can be a very handy way of building one up. Accordingly, I have added the Regex leaf parser.

The parser attempts to match the characters at the start of the input stream against the regular expression in pattern and returns the matching string if successful. It is a powerful leaf parser that can succinctly replace a lot of patterns that previously used multiple combinators, including identifier:

let identifier = regex("[^\\(\\), ]+")

I would encourage the use of regex wherever it is a natural fit, because it pushes much of the complexity onto the platform's regex matcher (which is, presumably, more mature, stable, and efficient than my infant code). Regular expression syntax is well known and well documented, and the ease with which complex patterns are built up is another reason to prefer it when possible. The identifier parser above, for example, is only as simple as it is because it was too much hassle to do something smarter. With regex we can go a lot further with ease. For instance, a C-like identifier syntax is parsed by:

let clike_identifier = regex("[_a-zA-Z][_a-zA-Z0-9]*")

This would have been messy to write using just Satisfy, Many, and FollowedBy.
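To make the idea concrete, here is a rough sketch of how a regex leaf parser could be built on Foundation's NSRegularExpression. It is not the framework's actual code: the StringParser shape and the name regexSketch are made up for illustration, and the real parser works on the framework's own stream type rather than a Substring.

```swift
import Foundation

// Hypothetical parser shape (an assumption for this sketch): consume a prefix
// of the input and return the matched text plus the unconsumed remainder.
typealias StringParser = (Substring) -> (match: String, rest: Substring)?

// Sketch of a regex leaf parser built on NSRegularExpression.
func regexSketch(_ pattern: String) -> StringParser {
    // try! is tolerable in a sketch: a malformed pattern is a programmer error.
    let re = try! NSRegularExpression(pattern: pattern)
    return { input in
        let text = String(input)
        let whole = NSRange(text.startIndex..<text.endIndex, in: text)
        // .anchored only allows a match that begins at the start of the range.
        guard let m = re.firstMatch(in: text, options: [.anchored], range: whole),
              let r = Range(m.range, in: text) else { return nil }
        return (match: String(text[r]), rest: text[r.upperBound...])
    }
}

let clikeIdentifier = regexSketch("[_a-zA-Z][_a-zA-Z0-9]*")
let result = clikeIdentifier("foo_1 bar")
```

The important detail is the anchoring: a leaf parser must only succeed when the match starts at the current position, otherwise it would silently skip over input.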

sepby

I added the sepby combinator which, like a lot of this, is also stolen from FParsec. sepby is used to parse a sequence of items of type T into an array [T], where there is a known pattern separating the items. There are three cases to consider when parsing lists:

An empty list, which results in an empty array.

A singleton, which contains no separators, and yields an array of length 1.

A list of n items, which are separated by n − 1 instances of the separator pattern.

Although we could parse this with existing operators, it would be a big, clumsy expression for such a common construction. With sepby, parsing something like a comma-separated list of words is now straightforward.
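The three cases above can be sketched with plain closures. This is only an illustration under my own assumptions: the Parser typealias, literal, word, and sepbySketch are hypothetical stand-ins, not the framework's types, and FParsec's real sepBy differs in its details.

```swift
// Hypothetical parser shape (an assumption for this sketch): a parser consumes
// a prefix of the input and returns a value plus the unconsumed remainder.
typealias Parser<T> = (Substring) -> (value: T, rest: Substring)?

// Leaf parser for a literal string.
func literal(_ s: String) -> Parser<String> {
    return { input in
        guard input.hasPrefix(s) else { return nil }
        return (value: s, rest: input.dropFirst(s.count))
    }
}

// Leaf parser for a run of one or more letters.
func word() -> Parser<String> {
    return { input in
        let letters = input.prefix(while: { $0.isLetter })
        guard !letters.isEmpty else { return nil }
        return (value: String(letters), rest: input.dropFirst(letters.count))
    }
}

// Sketch of sepby: zero or more items separated by `separator`.
func sepbySketch<T, S>(_ item: @escaping Parser<T>,
                       _ separator: @escaping Parser<S>) -> Parser<[T]> {
    return { input in
        // Empty list: no first item, so succeed without consuming anything.
        guard let first = item(input) else { return (value: [], rest: input) }
        var values = [first.value]
        var rest = first.rest
        // Singleton and longer lists: continue only while a separator is
        // followed by another item, so a trailing separator is left unconsumed.
        while let sep = separator(rest), let next = item(sep.rest) {
            values.append(next.value)
            rest = next.rest
        }
        return (value: values, rest: rest)
    }
}

let words = sepbySketch(word(), literal(","))
let parsed = words("alpha,beta,gamma")
```

Note that the sketch never fails: an input with no items at all is treated as the empty list, which matches the first case.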

Number parsing is mostly straightforward, but I've added the option of a strict floating-point number, which only treats a sequence of characters as a floating-point number if it cannot be interpreted as an integer.
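One way to picture the strict option is the sketch below. The function floatSketch and its strict flag are hypothetical, and the grammar is deliberately tiny (an optional minus sign, digits, and an optional fraction, with no exponent), so treat it as an illustration of the rule rather than the framework's number parser.

```swift
// Hypothetical helper, not the framework's API: reads a number from the start
// of `input` and returns it with the unconsumed remainder.
func floatSketch(_ input: Substring, strict: Bool) -> (value: Double, rest: Substring)? {
    var idx = input.startIndex

    // Advance idx over a run of ASCII digits; report whether any were seen.
    func digits() -> Bool {
        let start = idx
        while idx < input.endIndex, input[idx].isASCII, input[idx].isNumber {
            idx = input.index(after: idx)
        }
        return idx > start
    }

    if idx < input.endIndex, input[idx] == "-" { idx = input.index(after: idx) }
    guard digits() else { return nil }

    var hasFraction = false
    if idx < input.endIndex, input[idx] == "." {
        let dot = idx
        idx = input.index(after: idx)
        // A dot with no digits after it is not part of the number: back up.
        if digits() { hasFraction = true } else { idx = dot }
    }

    // Strict mode: text that reads as an integer is not a float.
    if strict && !hasFraction { return nil }
    guard let value = Double(input[..<idx]) else { return nil }
    return (value: value, rest: input[idx...])
}

let strictPi = floatSketch("3.14 rest", strict: true)
let strictInt = floatSketch("42", strict: true)
let lenientInt = floatSketch("42", strict: false)
```

With strict set, "42" is left for an integer parser to claim, which matters when both parsers are offered as alternatives at the same point in a grammar.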

JSON parser

JavaScript Object Notation (JSON) is a popular format for serializing data[1]. With as much of the parsing framework as we now have, a JSON parser is remarkably easy to implement. First, some definitions:

Notice that JSValue is self-referential, and that JSObject and JSValue are mutually recursive. It should therefore come as no surprise to see that the implementation of JSParser makes use of LateBound to deal with these loopy constructs:
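For readers who want to see the shape of the problem, here is a cut-down sketch. Only the names JSValue and JSObject come from this post; the enum cases, the Parser typealias, LateBoundSketch, and the toy grammar (single digits and nested arrays, no objects or strings) are all assumptions made for illustration and are much smaller than a real JSON parser.

```swift
// Assumed layout of the definitions: JSObject mentions JSValue and vice versa.
typealias JSObject = [String: JSValue]

enum JSValue: Equatable {
    case null
    case bool(Bool)
    case number(Double)
    case string(String)
    case array([JSValue])
    case object(JSObject)
}

// Hypothetical parser shape for this sketch.
typealias Parser<T> = (Substring) -> (value: T, rest: Substring)?

// Stand-in for the idea behind LateBound: a parser whose body is set later,
// so other parsers can refer to it before it is fully defined.
final class LateBoundSketch<T> {
    var inner: Parser<T>?
    func parse(_ input: Substring) -> (value: T, rest: Substring)? {
        return inner?(input)
    }
}

let jsValue = LateBoundSketch<JSValue>()

// A single-digit number, to keep the sketch short.
func digit(_ input: Substring) -> (value: JSValue, rest: Substring)? {
    guard let c = input.first, let n = c.wholeNumberValue else { return nil }
    return (value: .number(Double(n)), rest: input.dropFirst())
}

// An array of values: this refers back to jsValue, which is the loop.
func arrayOfValues(_ input: Substring) -> (value: JSValue, rest: Substring)? {
    guard input.first == "[" else { return nil }
    var rest = input.dropFirst()
    var items: [JSValue] = []
    if let first = jsValue.parse(rest) {
        items.append(first.value)
        rest = first.rest
        while rest.first == ",", let next = jsValue.parse(rest.dropFirst()) {
            items.append(next.value)
            rest = next.rest
        }
    }
    guard rest.first == "]" else { return nil }
    return (value: .array(items), rest: rest.dropFirst())
}

// Close the loop now that both alternatives exist.
jsValue.inner = { input in digit(input) ?? arrayOfValues(input) }

let nested = jsValue.parse("[1,[2,3],[]]")
```

The late-bound object exists before any grammar rule is written, so the array rule can mention it freely; the final assignment ties the knot.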

Next

Hopefully this JSON example has demonstrated some of the power of this approach. However, there are two big gaps in the framework's capabilities right now. The first is the ability to parse infix notation. Although it is possible to parse infix with a vanilla recursive descent parser, it's painful. If this parsing framework is to be a serious contender in which to implement a real programming language, we'll need to be able to handle infix (and postfix) notation, as well as the prefix-heavy style that recursive descent is so good at. The second omission is that there is no error handling. If there's an error in an input stream, we simply return nil. That isn't good enough, especially if we aspire to being the front end of a compiler. How might we add useful error handling?

If I can reach the keyboard over my distended abdomen in the next week, I'll get on to addressing these issues.

[1] It does use a lot of double quotes, which is a pain, but at least it's not XML.
