February 16th, 2011

It has been a long time since my computer science classes on parsing at university, where I learned to build a basic parser in Haskell using parser combinators based on Parsec. I always found this an interesting subject, but writing your own parser is not something most programmers, including me, do in daily development, especially if you’re a web developer.

But this week there were two instances where I thought “having a parser for this would be quite handy”: converting WikiText to HTML and building a markup tree from BBCode. Instead of dusting off my Haskell college notes, I set out to look for some examples in C#.

For WikiText I found this example: Parsing WikiText using a stack (dotnetperls). I adjusted the code to be able to parse MediaWiki text, which has a different syntax than the one used in the example.
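
To give an idea of the stack technique, here is a minimal sketch in Python. It is not the dotnetperls code or the adapted C# version, and it makes some loud assumptions: it only handles the MediaWiki markers '' (italic) and ''' (bold), and the name wikitext_to_html is made up for this illustration. The stack remembers which tags are currently open, so a marker either closes the tag on top of the stack or opens a new one.

```python
import re
from html import escape

# Assumption: only these two MediaWiki markers are supported in this sketch.
MARKERS = {"'''": "b", "''": "i"}
# The longer marker comes first so ''' is not read as '' followed by '.
MARKER_RE = re.compile(r"'''|''")


def wikitext_to_html(text):
    """Convert bold/italic WikiText markers to HTML using a stack of open tags."""
    out = []
    stack = []  # tags that have been opened but not yet closed
    pos = 0
    for match in MARKER_RE.finditer(text):
        # Emit the plain text before the marker, escaped for HTML.
        out.append(escape(text[pos:match.start()], quote=False))
        tag = MARKERS[match.group()]
        if stack and stack[-1] == tag:
            # The marker matches the innermost open tag: close it.
            stack.pop()
            out.append(f"</{tag}>")
        else:
            # Otherwise treat the marker as opening a new tag.
            stack.append(tag)
            out.append(f"<{tag}>")
        pos = match.end()
    out.append(escape(text[pos:], quote=False))
    # Close whatever is still open, innermost first.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(wikitext_to_html("''italic'' and '''bold'''"))
```

Badly nested markers and the combined ''''' (bold italic) marker are not treated specially here; a real MediaWiki converter needs many more rules for headings, links, lists and templates.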

For BBCode I wanted to try another technique: recursive descent parsing. I based my parser on Eric White’s nice blog post series on the subject of writing a recursive descent parser with C# + LINQ. The result is a parse tree, which can be used, for example, to convert BBCode to HTML.
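
The idea of recursive descent can be sketched in a few lines of Python. This is not the C# + LINQ parser described above, just a minimal illustration under stated assumptions: only the lowercase tags [b], [i] and [u] without attributes are recognised, and every name in it (Text, Tag, Parser, parse, to_html) is hypothetical. Each grammar rule becomes one method: nodes := (text | element)* and element := open nodes close.

```python
import re
from dataclasses import dataclass, field
from html import escape

# Assumption: a tiny tag set, mapped to the HTML element used when rendering.
KNOWN = {"b": "strong", "i": "em", "u": "u"}
TOKEN = re.compile(r"\[(/?)([a-z]+)\]")


@dataclass
class Text:
    value: str


@dataclass
class Tag:
    name: str
    children: list = field(default_factory=list)


def tokenize(source):
    """Split the input into ("open" | "close" | "text", value) tuples."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(source):
        if m.group(2) not in KNOWN:
            continue  # unknown tags stay part of the surrounding text
        if m.start() > pos:
            tokens.append(("text", source[pos:m.start()]))
        tokens.append(("close" if m.group(1) else "open", m.group(2)))
        pos = m.end()
    if pos < len(source):
        tokens.append(("text", source[pos:]))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_nodes(self):
        # nodes := (text | element)*  -- stops at a closing tag or end of input
        nodes = []
        while (tok := self.peek()) is not None and tok[0] != "close":
            if tok[0] == "text":
                self.pos += 1
                nodes.append(Text(tok[1]))
            else:
                nodes.append(self.parse_element())
        return nodes

    def parse_element(self):
        # element := open nodes close  -- a missing close is tolerated
        name = self.tokens[self.pos][1]
        self.pos += 1
        children = self.parse_nodes()
        if self.peek() == ("close", name):
            self.pos += 1
        return Tag(name, children)

    def parse_document(self):
        nodes = []
        while self.peek() is not None:
            nodes.extend(self.parse_nodes())
            if self.peek() is not None:
                # A stray closing tag at the top level is kept as literal text.
                nodes.append(Text(f"[/{self.tokens[self.pos][1]}]"))
                self.pos += 1
        return nodes


def parse(source):
    return Parser(tokenize(source)).parse_document()


def to_html(nodes):
    """Walk the parse tree and render it as HTML."""
    parts = []
    for node in nodes:
        if isinstance(node, Text):
            parts.append(escape(node.value))
        else:
            element = KNOWN[node.name]
            parts.append(f"<{element}>{to_html(node.children)}</{element}>")
    return "".join(parts)


print(to_html(parse("[b]bold [i]both[/i][/b] plain")))
```

Because the tree is built first and rendered afterwards, the same parse result could feed other outputs (plain text, a different markup) by writing another walker next to to_html.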

I wrote the BBCode parser just for fun. There is an existing stable .NET open source parser available: Codekicker.BBCode. I could not find a similar project for WikiText parsing in .NET.