Support

Language

Parser Combinators in Swift

This talk was delivered live in March 2017 at try! Swift Tokyo. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.

Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones.

My name is Yasihiro Inami; I work at the LINE Corporation, a messenger app company. It is an honor to be a speaker at the try! Swift conference, with great developers from all over the world. In this talk I present one of the most fantastic techniques in functional programming: parser combinator.

Parser is an analyzer that takes a string as an input and creates a syntax tree. It consists of two parts: the lexer (takes a long string and converts it into small chunks, called tokens), and the syntactic analyzer (takes the tokens and creates a Syntax Tree).

When we do parsing, there are mainly two approaches: bottom-up and top-down parsing.

For example, there is a string in the bottom A = B + C * 2 ; D = 1. How do we parse this, and in what order?

In bottom-up parsing, we take a closer look at the bottom tree first. We keep reading the text, we can expand the tree and, lastly, we reach the top. The parsing starts from the bottom to the top.

In top-down parsing, you read the partial text and, at some point (a bit earlier than bottom-up parsing), the top-down parser starts to predict what tree should come at the top. And it keeps guessing down to the bottom.

The parser combinator is a recourse in this parser, and it is a top-down parser. If you have ever read the Apple Swift Code before, the code is in leaf parse, and it also uses the same technique in C++. This is known as DSL imperative way.

We prepare a simple struct called Parser (this is a generic Parser<A>). It is a container of the closure (let parse). It takes String as input, and, in this case, the output is the tuple. The tuple consists of “output” and “remaining input”. Because the parsing sometimes fails, I use the Optional.

This structure is called state monad, and it may combine the monad or optional monad together. From here is monad’s showtime. It is not difficult; relax.

The pure is instantiating the parser. It parses the closure later. Inside that closure, you regain the tuple, which means this parser is always successful returning the variable <A>, and it does not consume the input. For the empty, the nil is returned. That means this parser always fails.

First, you receive the input, you see the closure, and the input is passed as an argument. You parse that with the given parser p. If that parsing succeeds: you get a tuple of it, you extract the result part, and apply f. And that f is the result of the second parser. You use that parser for the main input, in this case, input2. Otherwise, if the first parser fails in the first place, you would just return nil. That means the whole parser fails (because the first parsing fails).

Once you have this fmap operator, you can immediately get the functor fmap. In Swift, we do not use $, instead we use a ^ sign (we cannot use $ for the custom operator).

// Applicative sequential applicationfunc<*><A,B>(p:Parser<A->B>,q:Parser<A>)->Parser<B>{returnp>>-{finf<^>q}}// Sequence actions, discarding the value of the second argumentfunc<*<A,B>(p:Parser<A>,q:Parser<B>)->Parser<A>{returnconst<^>p<*>q}// Sequence actions, discarding the value of the first argumentfunc*><A,B>(p:Parser<A>,q:Parser<B>)->Parser<B>{returnconst(id)<^>p<*>q}

The first one is a function application for monadic structure.

The second one is a sequence action (it looks like an arrow pointing to the left). It tries parser p then q; it uses both. But only uses the left side of the output of p; it discards the output of q.

The last of our operators is the opposite. We try parser p then q. It is the same as before, but it only uses the output of q (not p).

Here is the satisfy parser I implemented (it is fundamental). It examines the input. You decompose it using the uncons function, it is the head and tail. Examining the head part with a predicate: if that predicate succeeds, you get the decomposed tuple; otherwise, the poor parser fails.

You can also create any() parser (can parse any character), or digit() (can only parse the numeric number, and numeric character, or some specific character), and char(c: Character) parser. It is readable, simple:

Here are some important parsers called many and many1. The parser p is given, and they use that parser p many times. The first many parser uses the parser p from zero occurrence to many occurrences, and it never fails. The many1 parser must be able to parse at least one occurrence of p, otherwise it fails. They are mutually recursive.

Here are skipping parsers: skipMany and skipMany1. It looks similarly to the previous many and many1, but notice that the result part is now an empty void, an empty tuple (meaningless output).

It seems similar to using this parser for the grounds, but it is more efficient, it has better performance than many and many1. And we can implement important particles, skipSpaces. As you can see, the implementation skipSpaces() is common when we want to parse things. We want to skip spaces, but we do not care whether it is top or line with feet.

Let’s add symbol(str: String) and natural() parsers. You might notice that there is a middle parser in the middle that is sandwiched 🍞 byskipSpaces() parser. With applicative, I do not like operators pointing this direction. It first skips the head white space, then uprising middle parser, and lastly, skips the tail space. But it only uses the output of the middle parser. It does not care about the spaces, it only cares about the string (like a hamburger! 🍔). For the simple parser, the relevant (delicious) part (meat) is a string parser. For natural parser, the relevant part is many1(digit())) (again, like a hamburger!), which converts the result party from string to integer. Only three lines, sometimes one line of code.

For those who want more on parsing, I developed TryParsec. It is a monadic parser combinator for this try! Swift conference (nowhere else!). It is inspired by Haskell, and especially the Attoparsec and Aeson libraries. It supports CSV, XML, and JSON. Technically, it can parse LL(*), with backtracking by default. This library is not using the do track catch in Swift. This Tryparsec… does not actually try anything, it does not use any try keywords, but please try it (because this is the try! Swift conference). This library is not open source yet (I will, right after this conference).

In TryParsec, the enum JSON is already built in (common, basic type). Let’s say we have a JSON String, how do we parse? All we need to do is call parseJSON, and you immediately get the JSON enum tree. Simple, easy.

In this process, I do not use any NSJSONSerialization, or any object that is not involved in this process. It is all written in pure Swift. But we are not interested in this structure. This is an intermediate representation. We want our custom model to be mapped to it. We need JSON decoding and encoding features (which we all love ❤️), and Swift has about 20-30 libraries for doing so. Fortunately, TryParsec, is the 31st JSON decoding and encoding library. 💯

You have to implement the conformance to the FromJSON protocol. You implement the static func fromJSON, and you use the helper message code fromJSONObject. This is common writing code applicative style. If you have ever tried the JSON library called Argo, TryParsec uses the same technique. Call function decode, and you get the mapped result.

TryParsec supports JSON, CSV and XML. It is simple, readable, and easy to create your own parsers (please try it). I also have the Xcode playground (you can also play around with it).

But let me point out some caveats:

It does not have good performance yet (compared to NSJSONSerialization).

FromJSON / ToJSON protocols (the mapping) are limited. Any library (Argo and other great libraries) has limitations… because Swift is currently limited. FromJSON / ToJSON does not work in some nested structures.

Hopefully, Swift 3 will solve these problems. Swift is now open source, it is evolving every day, and we have a great community.

Q: Cool stuff. I have two questions, you can pick which one you want to answer. The first one is, is there any support for error? If the parsing fails can you annotate your parser combinators to say, “if this part fails, maybe I am going to throw this error here”. The second question is, how did you do the ToJSON, do you have a printer combinator library?

Yasuhiro: I will answer the second question first. There is no printed JSON yet. I think I need to implement that. Now it is just one line JSON string. And the second one, is an error porting is important. But I did not implement it, because the performance is already bad at this point. I need to improve that first, and then hit our importing. Sorry about that.

This talk was delivered live in March 2017 at try! Swift Tokyo. The video was recorded, produced, and transcribed by Realm, and is published here with the permission of the conference organizers.

Yasuhiro Inami

Yasuhiro is an iOS developer at LINE Corporation. While creating iPhone apps such as messenger, camera, news app in his work, he also spends time on making open source projects like ReactKit and SwiftTask. He is a big fan of Apple, Swift, and Hearthstone. You can find him at Battle.net or GitHub.