Parser Combinator in real life

I had to write an application that handles 2200 messages per second.
This application indexes data and compresses it with 7z.

And here is why I used a parser combinator.

I will cover the usage of the awesome fszmq and a custom 7z interop in a later post.

Disclaimer

I first attempted to use LINQ to XML, which loads the whole document into memory element by element. When a big element is loaded (holding data that I don't even want to index), this causes LOH (Large Object Heap) problems.

The XmlProvider is based on LINQ to XML, so I had the same problem.

Both are very useful, and using a parser combinator is overkill when there is no reason for it :)

The Problem

All my documents are XML documents that contain information to store and organize for the indexing process, plus other data to simply forward (to the 7z compression part).

Two elements of the XML contain logs of partner request/response data.
These two elements can exceed 10 MB, and I don't want to load this kind of message entirely because of LOH and GC problems (the average is near 1200 msg/s :)).

When I checked the LINQ to XML implementation, I saw that it loads the document entirely into memory, element by element, with the XmlReader.

I remember how useful LINQ and FSharp.Data were in this kind of case, but for performance reasons they are not OK here!

After struggling with an if/else/pattern matching approach, I tried the parser combinator style.
Instead of parsing chars, the context used by the parser is the XmlReader.
I don't want to rewrite a full XML parser, because parsing XML is too hard and XmlReader does that job well.
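To make the memory concern concrete, here is a minimal sketch of how XmlReader can stream the text of a big element in small chunks instead of materializing it as one large string. It is not the code of the application: the helper names `copyElementText` and `copyLog`, the element name `log`, and the 4096-char buffer are assumptions for illustration, and it only handles plain text or CDATA content.

```fsharp
open System.IO
open System.Xml

// Hypothetical helper: copy the text of the current element to a writer
// chunk by chunk, so the payload never becomes one big string on the LOH.
let copyElementText (reader: XmlReader) (output: TextWriter) =
    let buffer : char[] = Array.zeroCreate 4096
    if not reader.IsEmptyElement then
        // Move from the start tag to its first child node.
        reader.Read() |> ignore
        while reader.NodeType = XmlNodeType.Text || reader.NodeType = XmlNodeType.CDATA do
            if reader.CanReadValueChunk then
                let mutable count = reader.ReadValueChunk(buffer, 0, buffer.Length)
                while count > 0 do
                    output.Write(buffer, 0, count)
                    count <- reader.ReadValueChunk(buffer, 0, buffer.Length)
            else
                // Fallback for readers that cannot read in chunks.
                output.Write(reader.Value)
            // Advance past the text node that was just copied.
            reader.Read() |> ignore

// Usage on a tiny document; the element name "log" is made up.
let copyLog () =
    let xml = "<doc><id>42</id><log>big payload</log></doc>"
    use reader = XmlReader.Create(new StringReader(xml))
    use output = new StringWriter()
    reader.ReadToFollowing("log") |> ignore
    copyElementText reader output
    output.ToString()
```

In a real pipeline the writer would be the stream feeding the compression part rather than a StringWriter, which is used here only to keep the sketch self-contained.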

A Solution in Functional style

I have to index a lot of documents, so composing a parser should be easy.
The solution should offer maximum flexibility to compose things together.

The parser wraps a function over a context of type 'x, where 'x could be the XmlReader, and the run function just unwraps the parser function and applies it to the given context.
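A minimal sketch of what such a type and its run function may look like is shown below. The exact definition used in the application is not reproduced here, so the `Parser` shape (a function returning an option), the `element` primitive, and the `readId` usage are assumptions for illustration.

```fsharp
open System.IO
open System.Xml

// Assumed shape: a parser wraps a function from a context 'x
// to an optional result 'a (None means the parser failed).
type Parser<'x, 'a> = Parser of ('x -> 'a option)

// run unwraps the parser function and applies it to the given context.
let run (Parser f) (context: 'x) = f context

// Hypothetical primitive: succeed if the reader is on a start tag
// called `name`, and return the text content of that element.
let element (name: string) : Parser<XmlReader, string> =
    Parser (fun (reader: XmlReader) ->
        if reader.MoveToContent() = XmlNodeType.Element && reader.Name = name then
            // Reads the text and leaves the reader just after the end tag.
            Some (reader.ReadElementContentAsString())
        else
            None)

// Usage: the context is an XmlReader over a tiny document.
let readId () =
    use reader = XmlReader.Create(new StringReader("<id>42</id>"))
    run (element "id") reader
```

Note that XmlReader is forward-only: when a primitive like this one succeeds, the context has moved on, which is exactly the kind of detail to keep in mind when composing parsers.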

>>. is an operator that takes two parsers and returns the result of the right parser.
If you understand the meaning of the dot, you will intuitively see that .>> does the same, except that it returns the left result.
.>>. returns both the left and right results.
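The three operators can be sketched as follows. This is one possible implementation, written in the style of FParsec's operators of the same names, not the code of the application; the `Parser` type is repeated so the block stands alone, and `map`, `openTag`, `element`, and the sample document are assumptions for illustration.

```fsharp
open System.IO
open System.Xml

// Assumed parser shape, repeated here so the block is self-contained.
type Parser<'x, 'a> = Parser of ('x -> 'a option)
let run (Parser f) context = f context

// Run p, then q on the same context, and keep both results.
let (.>>.) (p: Parser<'x, 'a>) (q: Parser<'x, 'b>) : Parser<'x, 'a * 'b> =
    Parser (fun ctx ->
        match run p ctx with
        | None -> None
        | Some a ->
            match run q ctx with
            | None -> None
            | Some b -> Some (a, b))

// Transform the result of a parser.
let map f p = Parser (fun ctx -> run p ctx |> Option.map f)

// The dot points at the side whose result is kept.
let (>>.) p q = (p .>>. q) |> map snd
let (.>>) p q = (p .>>. q) |> map fst

// Hypothetical primitives over XmlReader.
let openTag (name: string) : Parser<XmlReader, unit> =
    Parser (fun (r: XmlReader) ->
        if r.MoveToContent() = XmlNodeType.Element && r.Name = name then
            r.ReadStartElement()
            Some ()
        else
            None)

let element (name: string) : Parser<XmlReader, string> =
    Parser (fun (r: XmlReader) ->
        if r.MoveToContent() = XmlNodeType.Element && r.Name = name then
            Some (r.ReadElementContentAsString())
        else
            None)

// Usage: enter <doc>, drop that result, and keep both child values.
let parseDoc () =
    let xml = "<doc><id>42</id><name>abc</name></doc>"
    use reader = XmlReader.Create(new StringReader(xml))
    run (openTag "doc" >>. (element "id" .>>. element "name")) reader
```

With this sketch, `parseDoc ()` yields `Some ("42", "abc")`. Replacing `.>>.` with `.>>` would still consume both elements but keep only the id.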

The pros

Hey, it is composition: I can reuse parsers and compose them to create new ones!

It fixes a real issue (I had started writing code that parses XML with XmlReader, and ran into issues that were hard to solve in the existing code).

The cons

It was my first attempt, so the first parser was harder to write, but the others were very easy to compose (the biggest part was understanding how XmlReader works and when the context has changed or not!).

The use of operators and applicatives separates the definition of a function from its execution. When you are debugging the run function, you have to deal with a stack containing operator names (like GreaterGreaterEqual). Inlining the operators could help, but it makes debugging more difficult.

For now, in my experience, I have delivered multiple fixes and multiple parser compositions, and changing my habits was not a big deal; in the end, the code was easy to fix and maintain.