The content of the lecture is closely related to chapter 8, "Functional Parsers", of the book Programming in Haskell by Graham Hutton. It starts out with the definition of a type for a parser and a few very basic parsers.

I’m totally amazed by the simplicity, the elegance and the compositional nature of these examples. I find it impressive how primitive yet powerful these simple parsers are, because they can easily be combined to form more complex and very capable parsers.

As an exercise, out of curiosity, and because of old habits I implemented the examples from the book in C#.

At the end of this post there will be an ultimate uber-cool example: a parser for arithmetic expressions.

The parser type

The type is just a function that takes a string as input and returns a list of tuples, each pairing a value of any type with a string:

public delegate List<Tuple<T, String>> Parser<T>(String inp);

The first item of the tuple represents a parsed value and the second item is the remaining unconsumed part of the input string. The fact that the return type is a list indicates that the result can also be empty if the parser fails. Otherwise, if the parser succeeds, the result will always be a singleton list.

Basic parsers

Here are the three most basic parsers.

Return always succeeds and returns the value that it was instantiated with as well as the unconsumed input string.

Failure always fails and therefore returns an empty list.

Item returns the first character of the input string as well as the remaining unconsumed input.
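
The three descriptions above could be sketched roughly as follows. This is a minimal sketch, not necessarily the original listing: the class name Parsers and the Item field follow the usage later in this post, while the exact signature of Failure and the Parse extension method (mirroring the "abc".Parse(p) calls) are assumptions. The delegate is repeated so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Parser<T> delegate defined above.
public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class Parsers
{
    // Always succeeds with the given value and leaves the input untouched.
    public static Parser<T> Return<T>(T value) =>
        inp => new List<Tuple<T, string>> { Tuple.Create(value, inp) };

    // Always fails: the empty list signals failure (signature is an assumption).
    public static Parser<T> Failure<T>() =>
        inp => new List<Tuple<T, string>>();

    // Consumes the first character; fails on empty input.
    public static readonly Parser<char> Item = inp =>
        string.IsNullOrEmpty(inp)
            ? new List<Tuple<char, string>>()
            : new List<Tuple<char, string>> { Tuple.Create(inp[0], inp.Substring(1)) };

    // Assumed helper: applies a parser to an input string.
    public static List<Tuple<T, string>> Parse<T>(this string inp, Parser<T> p) => p(inp);
}
```

With these definitions, "abc".Parse(Parsers.Item) yields a singleton list holding 'a' and the remaining input "bc", while parsing the empty string with Item yields an empty list.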

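Note: In C# we don’t really need a separate Parse function to apply a parser to an input string, because a parser is already a delegate that can be invoked directly. I think Haskell needs it because the parsers will be instances of the Monad type class.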

Sequencing

Next we need a function to combine multiple parsers. This can be done with Bind which takes a parser and a function that returns another parser based on the result of the first parser. If the first parser fails, the whole computation will fail. Otherwise the function will be applied to the result of the first parser and return a parser that then will be applied to the unconsumed remaining input string.
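
A minimal sketch of Bind as described above, assuming the Parser<T> delegate from earlier (repeated here so the snippet compiles on its own). The Select and SelectMany methods are an assumption about how LINQ query syntax gets enabled for parsers: the C# compiler translates from ... from ... select into calls to methods with these names, so they are defined in terms of Bind and Return.

```csharp
using System;
using System.Collections.Generic;

public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class ParserMonad
{
    public static Parser<T> Return<T>(T value) =>
        inp => new List<Tuple<T, string>> { Tuple.Create(value, inp) };

    // Runs p; on failure the whole computation fails, otherwise the parser
    // produced by f is applied to the input that p left unconsumed.
    public static Parser<U> Bind<T, U>(this Parser<T> p, Func<T, Parser<U>> f) =>
        inp =>
        {
            var result = p(inp);
            if (result.Count == 0)
                return new List<Tuple<U, string>>();
            return f(result[0].Item1)(result[0].Item2);
        };

    // Used by a query with a single "from ... select ...".
    public static Parser<U> Select<T, U>(this Parser<T> p, Func<T, U> f) =>
        p.Bind(x => Return(f(x)));

    // Used by queries with several "from" clauses.
    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> p, Func<T, Parser<U>> f, Func<T, U, V> project) =>
        p.Bind(x => f(x).Bind(y => Return(project(x, y))));
}
```

The design choice here is that only Bind knows about success and failure; Select and SelectMany merely rearrange calls to it, which is what lets several parsers be sequenced in query syntax.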

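// Parse three characters in sequence and keep only the first and the third.
var p =
    from x in Parsers.Item
    from _ in Parsers.Item
    from z in Parsers.Item
    select new string(new[] { x, z });
var result = "abc".Parse(p);
// All input is consumed, so the remaining string is empty.
Check.That(result).ContainsExactly(Tuple.Create("ac", String.Empty));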

Choice

Another way of combining parsers is the Choice function, which applies one parser and, if it fails, applies another one to the same input instead.
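
A minimal sketch of such a Choice function, assuming the Parser<T> delegate from earlier (repeated so the snippet compiles on its own). Writing it as an extension method is an assumption made for readability.

```csharp
using System;
using System.Collections.Generic;

public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class ParserChoice
{
    // Tries p first; only if p fails is q applied to the same, unchanged input.
    public static Parser<T> Choice<T>(this Parser<T> p, Parser<T> q) =>
        inp =>
        {
            var result = p(inp);
            return result.Count > 0 ? result : q(inp);
        };
}
```

Note that the second parser starts from the original input: a failed first attempt consumes nothing.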

CharP, a parser that matches one given character, can now be composed into a parser that matches a whole string:

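public static Parser<string> StringRec(string str)
{
    // An empty string always matches; otherwise match the first character
    // and then, recursively, the rest of the string.
    return String.IsNullOrEmpty(str)
        ? Return(String.Empty)
        : from a in CharP(str[0])
          from b in StringRec(str.Substring(1))
          from result in Return(str)
          select result;
}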
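
StringRec relies on CharP, whose definition is not shown above. One way it could look is sketched below, as an assumption: a general predicate parser (called Sat here, after Hutton's sat, a hypothetical name in this C# version) and CharP as a special case of it. To keep the snippet short, Sat inspects the input directly instead of being composed from Item and Bind.

```csharp
using System;
using System.Collections.Generic;

public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class CharParsers
{
    // Consumes one character if it satisfies the predicate; fails otherwise.
    public static Parser<char> Sat(Func<char, bool> predicate) =>
        inp => !string.IsNullOrEmpty(inp) && predicate(inp[0])
            ? new List<Tuple<char, string>> { Tuple.Create(inp[0], inp.Substring(1)) }
            : new List<Tuple<char, string>>();

    // Matches exactly the given character.
    public static Parser<char> CharP(char c) => Sat(x => x == c);
}
```

For example, CharParsers.CharP('a') succeeds on "abc" with 'a' and the remaining input "bc", and fails on "xbc".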

Two other very useful parsers are Many and Many1, which apply a parser as many times as possible: Many allows zero or more successful applications, whereas Many1 requires at least one.
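
A minimal sketch of the two as mutually recursive functions, following the definitions in Hutton's book. The List<T> result type, the private helper copies of Return, Bind and Choice, and the Digit parser used for demonstration are assumptions made so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class Repetition
{
    private static Parser<T> Return<T>(T value) =>
        inp => new List<Tuple<T, string>> { Tuple.Create(value, inp) };

    private static Parser<U> Bind<T, U>(this Parser<T> p, Func<T, Parser<U>> f) =>
        inp =>
        {
            var r = p(inp);
            return r.Count == 0 ? new List<Tuple<U, string>>() : f(r[0].Item1)(r[0].Item2);
        };

    private static Parser<T> Choice<T>(this Parser<T> p, Parser<T> q) =>
        inp =>
        {
            var r = p(inp);
            return r.Count > 0 ? r : q(inp);
        };

    // Zero or more: try Many1 and fall back to an empty result list.
    public static Parser<List<T>> Many<T>(Parser<T> p) =>
        Many1(p).Choice(Return(new List<T>()));

    // One or more: one application of p, followed by Many(p) on the rest.
    public static Parser<List<T>> Many1<T>(Parser<T> p) =>
        p.Bind(v => Many(p).Bind(vs =>
        {
            var all = new List<T> { v };
            all.AddRange(vs);
            return Return(all);
        }));

    // Demonstration parser (hypothetical): matches a single decimal digit.
    public static readonly Parser<char> Digit = inp =>
        !string.IsNullOrEmpty(inp) && char.IsDigit(inp[0])
            ? new List<Tuple<char, string>> { Tuple.Create(inp[0], inp.Substring(1)) }
            : new List<Tuple<char, string>>();
}
```

Applied to "12ab", Repetition.Many(Repetition.Digit) collects '1' and '2' and leaves "ab" unconsumed; applied to "ab" it succeeds with an empty list, whereas Many1 fails. The recursion only unfolds while parsing, because the call to Many sits inside a lambda.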

The problem with recursion

In C# we have a little problem here. StringRec is defined as a recursive function, and Many and Many1 are defined as mutually recursive functions, which means that they are defined in terms of each other. This is a beautiful and elegant thing. But every nested call consumes a stack frame and C# does not guarantee tail-call elimination, so stack overflow exceptions are likely to happen even with relatively small input strings (more than a few thousand characters).
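
One way around this is to replace the recursion with a loop. The following is a sketch under that assumption, not necessarily the original implementation: Many keeps applying the parser and accumulates the results in a list, and Many1 is expressed in terms of Many. The check for a parser that consumes nothing is an added safeguard against an endless loop.

```csharp
using System;
using System.Collections.Generic;

public delegate List<Tuple<T, string>> Parser<T>(string inp);

public static class IterativeParsers
{
    // Zero or more applications of p, without any recursion.
    public static Parser<List<T>> Many<T>(Parser<T> p) =>
        inp =>
        {
            var values = new List<T>();
            var rest = inp;
            while (true)
            {
                var r = p(rest);
                // Stop when p fails or when it succeeded without consuming input.
                if (r.Count == 0 || r[0].Item2.Length == rest.Length) break;
                values.Add(r[0].Item1);
                rest = r[0].Item2;
            }
            return new List<Tuple<List<T>, string>> { Tuple.Create(values, rest) };
        };

    // One or more: run Many and fail if nothing was collected.
    public static Parser<List<T>> Many1<T>(Parser<T> p) =>
        inp =>
        {
            var r = Many(p)(inp);
            return r[0].Item1.Count > 0 ? r : new List<Tuple<List<T>, string>>();
        };
}
```

The stack depth no longer grows with the length of the input, so strings of tens of thousands of characters are not a problem for this version.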

This looks almost like an unfold operation over the input string. But I couldn’t really figure out how to factor out an underlying pattern here. I’m not sure whether that is possible at all, or whether it even makes sense. (Maybe the list concatenation can be abstracted out?)