Crushing Haskell like a Tin Can

Parsing Market Data with Ragel, clang and GHC primops

Feb 9th, 2012

From time to time, I have the need to parse relatively small, fixed-width binary messages. Like an ITCH 4.1 MoldUDP64 packet from our buddies over there at NASDAQ. Parsing should also be reasonably quick. And I’m lazy. ~~Writing parsers by hand~~ Maintaining handwritten parsers is no fun.

So I’m going to define our parser in Ragel, a parser generator for regular languages (think: regular expressions). It targets C/C++ and some other languages I don’t care about. Like Objective-C, D, Java and Ruby. We’ll focus on C99.

This example parser will handle a cut-down view of the ITCH spec. The full source for this post is available over on GitHub. We’re looking for order executions:

    machine ITCHv41;

    # Common fields for both Order Executed messages
    nanoseconds           = any{4} >{ nanos    = __builtin_bswap32(*(const uint32_t*)(p)); };
    orderExecutedShares   = any{4} >{ shares   = __builtin_bswap32(*(const uint32_t*)(p)); };
    orderExecutedMatchNum = any{8} >{ matchNum = __builtin_bswap64(*(const uint64_t*)(p)); };
    orderExecutedRefNum   = any{8} >{ refNum   = __builtin_bswap64(*(const uint64_t*)(p)); };
    orderExecutedCommon   = orderExecutedRefNum orderExecutedShares orderExecutedMatchNum;

    # 4.5.1 Order Executed Message
    orderExecuted = 'E' %{ type = 1; } nanoseconds orderExecutedCommon;

    # 4.5.2 Order Executed With Price Message
    orderExecutedPrintable = 'Y' %{ printable = true; } | 'N' %{ printable = false; };
    orderExecutedPrice     = any{4} >{ price = __builtin_bswap32(*(const uint32_t*)(p)); };
    orderExecutedWithPrice = 'C' %{ type = 2; }
                             nanoseconds orderExecutedCommon
                             orderExecutedPrintable orderExecutedPrice;

    main := orderExecuted | orderExecutedWithPrice;
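For anyone who hasn’t used Ragel: the `>{ ... }` actions above fire on entering a field, when `p` points at its first byte, so each one is just a big-endian load. A minimal hand-written sketch of the same decode for the Order Executed message might look like the following. The `OrderExecuted` struct and the `load_be32`, `load_be64` and `decode_order_executed` names are made up for illustration and are not part of the original source. It also swaps the direct pointer cast for `memcpy` to avoid unaligned reads. Like the Ragel version, it assumes a little-endian host and the gcc/clang byte-swap builtins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the fields the Ragel actions assign. */
typedef struct {
    uint32_t nanos;
    uint64_t refNum;
    uint32_t shares;
    uint64_t matchNum;
} OrderExecuted;

/* Read a big-endian 32-bit field. memcpy sidesteps unaligned access;
   the byte swap assumes a little-endian host. */
static uint32_t load_be32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return __builtin_bswap32(v);
}

/* Same idea for 64-bit fields. */
static uint64_t load_be64(const uint8_t *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return __builtin_bswap64(v);
}

/* Decode 'E' nanoseconds refNum shares matchNum (1 + 4 + 8 + 4 + 8 = 25 bytes),
   in the field order the Ragel machine defines.
   Returns false on a short buffer or a different message type. */
static bool decode_order_executed(const uint8_t *buf, size_t len, OrderExecuted *out) {
    if (len < 25 || buf[0] != 'E') return false;
    out->nanos    = load_be32(buf + 1);
    out->refNum   = load_be64(buf + 5);
    out->shares   = load_be32(buf + 13);
    out->matchNum = load_be64(buf + 17);
    return true;
}
```

This is exactly the sort of offset bookkeeping that gets tedious to maintain by hand once there are a couple dozen message types, which is the case for letting Ragel generate it.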

Ragel’s compiler does a bunch of optimization and pumps out a hot mess of GOTOs that regular C compilers like gcc, icc and clang eat for breakfast every day. This parser really isn’t very exciting:

Particularly when compared to a complete, validating ITCH parser:

An autovectorizing compiler can turn these state machines into machine code on par with some of the finest hand-rolled parsers. clang/LLVM does a decent job, adequate for now… but it also has some magic just hidden below the surface for Haskell developers. Namely an LLVM backend.

Typically we’d go ahead and consume parsers like these in GHC with a vanilla foreign import. But even with an unsafe import there is a relatively fixed amount of overhead due to switching calling conventions and loading out parameters. Normally this isn’t such a big deal, but we’re dealing with a lot of packets during simulation, billions and billions, and I’d like to dedicate some CPU to a task more productive than parsing.

To adopt GHC’s calling convention, we need to make our C code look enough like Haskell so they’ll play nice together. The first step is to define a function signature that looks like a regular STG function. The same thing that GHC generates. Like so:

This is still a ccall function but we’ll fix that later. There is currently no way to define this as cc10 (LLVM’s internal name for GHC’s calling convention) in clang.

Step two is to jump to the return address, which lives on top of the STG stack (the sp argument)… with the desired arguments, like the results of parsing, in tow. This is a regular function call that gets converted to a tailcall later on by llc, LLVM’s native compiler, when using cc10.

    // define a function pointer type that matches the STG calling convention
    typedef void (*HsCall)(int64_t*, int64_t*, int64_t*,
                           int64_t, int64_t, int64_t, int64_t, int64_t, int64_t,
                           int64_t*,
                           float, float, float, float,
                           double, double);

    // Invoke the parser defined in our Ragel code
    %% write exec;

    const HsCall fun = (HsCall)sp[0];

    // and then "return" our parameters as an unboxed tuple back to Haskell land
    return fun(baseReg, sp, hp,
               type, nanos, shares, price, matchNum, refNum,
               spLim,
               undef, undef, undef, undef, undef, undef);
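The control flow is easier to see in isolation. Below is a small sketch that fakes the round trip in plain C: a one-slot "stack" whose top holds the address of a continuation, and a stand-in parser that "returns" by calling it. Everything here is an assumption made for illustration: `STG_PARAMS`, `fake_continuation`, `fake_parser`, `run_demo`, the `seen` array and the hard-coded values are all invented, and the parameter names are just labels for the slots in the `HsCall` typedef. It runs under the ordinary C calling convention, so there is no cc10, no register pinning and no guaranteed tail call; it only demonstrates the jump-to-`sp[0]` pattern.

```c
#include <stdint.h>

/* Parameter layout copied from the HsCall typedef: Base, Sp, Hp, R1..R6,
   SpLim, F1..F4, D1, D2. The parameter names are illustrative labels. */
#define STG_PARAMS                                                          \
    int64_t *baseReg, int64_t *sp, int64_t *hp,                             \
    int64_t r1, int64_t r2, int64_t r3, int64_t r4, int64_t r5, int64_t r6, \
    int64_t *spLim, float f1, float f2, float f3, float f4, double d1, double d2

typedef void (*HsCall)(STG_PARAMS);

/* Hypothetical scratch space so the result can be inspected afterwards. */
static int64_t seen[6];

/* Stand-in for the Haskell continuation whose address sits at sp[0]. */
static void fake_continuation(STG_PARAMS) {
    (void)baseReg; (void)sp; (void)hp; (void)spLim;
    (void)f1; (void)f2; (void)f3; (void)f4; (void)d1; (void)d2;
    seen[0] = r1; seen[1] = r2; seen[2] = r3;
    seen[3] = r4; seen[4] = r5; seen[5] = r6;
}

/* Stand-in for the parser entry point, with hard-coded values in place of
   the Ragel-generated code. */
static void fake_parser(STG_PARAMS) {
    (void)r1; (void)r2; (void)r3; (void)r4; (void)r5; (void)r6;
    const int64_t type = 1, nanos = 100, shares = 200, price = 0;
    const int64_t matchNum = 300, refNum = 400;

    /* Fetch the continuation from the top of the stack and call it with
       the results in the R1..R6 slots. */
    const HsCall fun = (HsCall)sp[0];
    fun(baseReg, sp, hp, type, nanos, shares, price, matchNum, refNum,
        spLim, f1, f2, f3, f4, d1, d2);
}

/* Build a one-slot fake stack and run the round trip. */
static void run_demo(void) {
    int64_t stack[1] = { (int64_t)(intptr_t)&fake_continuation };
    fake_parser(0, stack, 0, 0, 0, 0, 0, 0, 0, 0,
                0.0f, 0.0f, 0.0f, 0.0f, 0.0, 0.0);
}
```

After `run_demo()`, `seen` holds the six values in the order they were passed, which is the order the Haskell side would see them in the unboxed tuple. Storing a function address in an `int64_t` slot is not strictly portable C, but gcc and clang accept it on the usual 64-bit targets.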

Then run the LLVM IR that clang emits through sed to fix up the calling convention for the code generator… this is the magic part. Note: this is also overly general and will break any legit C calls. llc then pumps out an object file that GHC will link with later:

The last bit is to create a foreign primop import in Haskell. Many messages don’t fit within the 5 free registers (R2-R6) that are available here and need to be partially loaded onto the stack. In this example I’m just discarding the ‘printable’ flag to make everything fit in registers. Managing the stack is more involved. Perhaps I’ll cover it in a future post.