Haskell programmers tend to spend far less time with debuggers than programmers in other languages. Partly this is because for pure code debuggers are of limited value anyway, and Haskell’s type system is so strong that a lot of bugs are caught at compile time rather than at runtime. Moreover, Haskell is a managed language – like Java, say – and errors are turned into exceptions. Unlike in unmanaged languages such as C “true” runtime errors such as segmentation faults almost never happen.

I say “almost” because they can happen: either because of bugs in ghc or the Haskell runtime, or because we are doing low level stuff in our own Haskell code. When they do happen we have to drop down to a system debugger such as lldb or gdb, but debugging Haskell at that level can be difficult because Haskell’s execution model is so different from the execution model of imperative languages. In particular, compiled Haskell code barely makes any use of the system stack or function calls, and uses a continuation passing style instead (see my previous blog posts Understanding the Stack and Understanding the RealWorld). In this blog post I will explain a technique I sometimes use to help diagnose low-level problems.

Since I work on OSX I will be using lldb as my debugger. if you are using gdb you can probably use similar techniques; The LLDB Debugger shows how gdb and lldb commands correlate, and the ghc wiki also lists some tips. However, I have no experience with scripting gdb so your mileage may vary.

Description of the problem

As our running example I will use a bug that I was tracking down in a client project. The details of the project don’t matter so much, except that this project happens to use the GHC API to compile Haskell code—at runtime—into bytecode and then run it; moreover, it also—dynamically, at runtime—loads C object files into memory.

Finding a call chain

The lack of a suitable backtrace in lldb is not surprising, since compiled Haskell code barely makes use of the system stack. Instead, the runtime maintains its own stack, and code is compiled into a continuation passing style. For example, if we have the Haskell code

none of which is particularly informative. However, stepping manually through the program we do first see function A on the (singleton) call stack, then function B, and finally function C. Thus, by the time we reach function C, we have discovered a call chain A, B, C—it’s just that it involves quite a bit of manual work.

Note that if you are using the threaded runtime, you may have to select which thread you want to step through before calling sf:

(lldb) thread select 4
(lldb) sf

Tweaking the script

You will probably want to tweak the above script in various ways. For instance, in the application I was debugging, I wanted to step into each assembly language instruction but over each C function call, mostly because lldb was getting confused with the call stack. I also added a maximum step count:

You may want to tweak this step into/step over behaviour to suit your application; certainly you don’t want to have a call trace involving every step taken in the Haskell RTS or worse, in the libraries it depends on.

Back to the example

Rather than printing every step along the way, it may also be useful to simply remember the step-before-last and show that on a crash; often it is sufficient to know what happened just before the crash. Indeed, in the application I was debugging the call stack just before the crash was:

Which is a lot more helpful than the backtrace, as we now have a starting point: something went wrong when running the bytecode interpreter (remember that the application was compiling and running some Haskell code at runtime).

To pinpoint the problem further, we can set a breakpoint in interpretBCO and run sf again (the way we defined sf it steps over any C function calls by default). This time we get to:

and if you compare this with the code loaded into memory it all becomes clear. The jump instruction in the object file

jmpq _puts

contains a symbolic reference to puts; but the jump in the code that we are about to execute in fact jumps to the next instruction in memory:

0x1024a421c: jmpq 0x1024a42210x1024a4221: ...

In other words, the loaded object file has not been properly relocated, and when we try to call puts we end up jumping into nowhere. At this point the bug was easily resolved.

Further Reading

We have barely scratched the surface here of what we can do with lldb or gdb. In particular, ghc maintains quite a bit of runtime information that we can inspect with the debugger. Tim Schröder has an excellent blog post about inspecting ghc’s runtime data structures with gdb, and Nathan Howell has written some extensive lldb scripts to do the same, although they may now be somewhat outdated. See also the reddit discussion about this blog post.