An early look at ThreadScope, a tool for profiling concurrent and parallel Haskell programs

GHC 6.12 has the ability to generate event logs for your Haskell programs that contain concurrency- and parallelism-related information from the runtime. These event logs can be parsed using the ghc-events package, and visualised using ThreadScope (developed by Donnie Jones, Simon Marlow and Satnam Singh). I’ve had a play with the in-development version of these tools (GHC 6.12 is currently a release candidate, and ghc-events and ThreadScope are under active development) and wanted to share my experience.

To use ThreadScope at the moment, you need to have both a GHC 6.12 install (to run your program and generate the event logs) and a GHC 6.10 install (to run ThreadScope). This is because the build system of gtk2hs has not yet been updated to work with GHC 6.12 — eventually everything should build on GHC 6.12.

On your GHC 6.12 install, you then compile your program with the -eventlog flag. I’m using my boids program which is already cabalised, so I just had to add -eventlog to my ghc-options: field. When you run the compiled program, you need to pass the -ls option to the RTS, and you’ll probably also want to pass -N as usual to get it running on multiple cores (in GHC 6.12, if you don’t give -N a number, it uses all your cores). So the full command-line might be boids +RTS -ls -N. This will generate a .eventlog file for you — which can get quite large: my boid program (which admittedly has a lot of concurrency) generated a 100MB file, so I scaled back the number of cycles it was performing by a factor of 10.

You can then load the event log in ThreadScope. The eventlog of my quick first run of the boids program gave me this in ThreadScope (click to see more detail):

Each green bit is code executing, and each orange bit is garbage collection (GC). The two cannot overlap (this would be termed a concurrent garbage collector, but Haskell has a parallel garbage collector), so all the orange is bits where your code is not executing — this is confirmed if you zoom in.

If your ThreadScope profile looks like a green millipede with lots of orange legs, GC is happening very often. One common way to alleviate this problem is to increase the heap size by passing +RTS -H100M (or similar) to your Haskell program.

I added this greater heap size to my options, and when I re-ran I got this:

The green is now denser, and the program runs quicker. (My previous benchmarking results already used this increased heap size, but it’s nice to see the visual explanation of how it helps.) This zoomed out view gives a useful overview: I could see the rough profile of my GC, and I can see that all eight cores are being used roughly equally and constantly. Something I have learnt about ThreadScope is that while this zoomed out view is useful, the really interesting stuff only appears when you zoom in (and turning on labels helps too). Here is a teaser shot of a zoomed-in view of that file with labels turned on:

You can now see information on thread migrations — and you can see that my seemingly dense blocks of green actually contain some gaps in them. Intuitively, reducing the white and increasing the density of the green will make my program go faster. I’m hoping to become a bit more familiar with the GHC RTS to understand how to reduce those white gaps, both for my boids but also for other CHP programs. ThreadScope certainly looks like a very promising tool for increasing concurrent and parallel Haskell performance.

Like this:

Related

Try pulling the latest ThreadScope sources – yesterday I added an activity graph to the display so you can see how much CPU utilisation you’re really getting.

Also watch out when increasing the heap size, sometimes it can actually make the program slower. One reason is because the small nursery size fits in the cache, but I need to investigate some other odd effects I’ve seen too (which I wouldn’t have noticed if it weren’t for ThreadScope :-).