Progression: Supporting Optimisation in Haskell

Progression

I have recently been working on optimising CHP. In order to optimise a program, you need a set of benchmarks/tasks for which you want to improve performance. Then, you need to record how long the original version takes to complete the benchmark, record how long a changed version takes to complete the benchmark and compare the two to see if your change was sufficiently beneficial. This process is tedious, and staring at two columns of numbers is not always instructive, so I constructed a small Haskell library to help with optimisation. I’ve called it Progression, and it is now available on Hackage. Progression allows you to specify some benchmarks, run them, and graph their performance against each other; so what you get is a graph like this:

Each line is a different version of my CHP library, and each group of points is a different benchmark. (The versions were made in the order purple, blue, green, red; I ultimately stuck with the green version.)

Using Progression

Progression uses the excellent Criterion library for doing the benchmarking, so it is used in a similar fashion to Criterion. You construct a small wrapper program that defines/imports your benchmarks and passes them to the Progression library to be run. For example, here is the end of my file containing my CHP benchmarks:

My runAll function turns the second part of each pair from a CHP () type into IO (), then I map the (Criterion) function bench over the list, use the (Criterion) function bgroup to join them all together, then pass them to the (Progression) function defaultMain. I compile this file into an executable.

When you run the program, you can either pass in settings via the command-line arguments, or if they are not present, you will be prompted with an interactive tab-completing prompt (thanks to the easy-to-use Haskeline library). There are three main settings:

1. which benchmark groups you want to run (if you only have one group, you won’t be prompted),
2. what to label the recording you are about to make, and
3. which previous labels you want to compare against in the graph.

Once you’ve entered these two or three items, Criterion runs all the benchmarks and the results are stored in a CSV file (using the handy txt-sushi library; this way you can easily inspect the data by hand or process it further in a spreadsheet program). That file is then combined with the previously stored CSV files and fed to gnuplot. (Unfortunately the gnuplot binding in Haskell is not in great shape at the moment, so I’m using a plain system call to run it; if you want graphs, make sure you have gnuplot installed on your system.)

The program creates a graph like the one shown at the beginning of the post, by default in the file plot.png (but again, configurable by command-line option). If, later on, you just want to graph previous results against each other, you can do that by running the program with “-m graph”. On the graph you get points plotted at the means (staggered for each benchmark to aid readability and comparison) and error bars for the bounds that Criterion gives. By default these are the 95% confidence intervals, but that is configurable, too (by passing through an option to Criterion).

Installing Progression

1. Make sure gnuplot is installed on your system and available in your path. Gnuplot is very likely to be in your package manager if you’re on Linux.
2. cabal update && cabal install progression

Note that by default, Criterion uses the Chart library for plotting (a feature of Criterion that Progression does not use). Chart in turn depends on gtk2hs, which can be problematic for some people to install. If you get messages about cairo not being installed and you don’t want to install gtk2hs, you can install Criterion without plotting support (and thus without the gtk2hs dependency). The command cabal install criterion -f-Chart should do this (the minus between f and Chart is crucial). Unfortunately, it seems that you must then install progression manually by downloading it and running Setup.lhs; I had hoped cabal install progression would work, but that seems to attempt to satisfy the Chart dependency even though criterion is by then installed.

Issues with the 0.1 release

I realise that the graph is technically invalid. I shouldn’t connect the points with a line because the X-axis is discrete, not continuous. However, without the line (i.e. with just points and error bars) it’s much less readable at a glance, and a bar chart with error bars didn’t seem too readable either when I tried it. The graph display still isn’t perfect though; it works best when you have benchmarks that take roughly similar times, and if you make one huge saving on one benchmark, as I did (a factor of about 100), this throws off the whole display. Normalising all the times, so that one of the versions has all its times normalised to one, would partially fix the problems. Also, if you run a lot of benchmarks, the CSV files do start to litter the directory; I wonder if I should store them in a subdirectory.
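To make the normalisation idea concrete, here is a minimal sketch of what it might look like. It is not part of Progression: the Recording type, the function name and the benchmark names in main are all made up for illustration. Each mean is divided by the baseline's mean for the same benchmark, so the baseline sits at 1.0 everywhere and a factor-of-100 saving shows up as 0.01 rather than flattening the other points.

```haskell
import Data.Maybe (mapMaybe)

-- | One recording: a label (e.g. a version name) and the mean time in
-- seconds for each benchmark. Illustrative type, not Progression's own.
type Recording = (String, [(String, Double)])

-- | Divide every mean by the baseline recording's mean for the same
-- benchmark, so the baseline plots at 1.0 everywhere. Benchmarks that are
-- missing from the baseline, or whose baseline mean is not positive, are
-- dropped.
normaliseAgainst :: Recording -> Recording -> Recording
normaliseAgainst (_, base) (label, times) = (label, mapMaybe scale times)
  where
    scale (name, t) = case lookup name base of
      Just b | b > 0 -> Just (name, t / b)
      _              -> Nothing

main :: IO ()
main = do
  -- Hypothetical labels, benchmark names and timings.
  let original = ("original", [("fork", 2.0), ("simpleChoice", 0.5)])
      tweaked  = ("tweaked",  [("fork", 1.0), ("simpleChoice", 0.005)])
  mapM_ (print . normaliseAgainst original) [original, tweaked]
```

The error bars would need the same scaling, which this sketch leaves out.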

Pipe Dream

That’s the summary of Progression. I hope that Progression helps people who are optimising Haskell programs (especially alongside profiling and tools such as ThreadScope, that can help pinpoint possible candidates for optimisation). But the work-flow is not perfect. Often, not all the benchmarks are written and complete when you start benchmarking; you both make changes and add new benchmarks as you work. Currently, when graphing, Progression ignores any benchmarks for which it does not have data for every recording being graphed (this should perhaps be fixed). Ideally, you would re-run the old version of your library/program (for example, the original version before you began optimising) with the benchmarks to get some data. For this, we need access to an old version. All developers store their projects in version control (and with Haskell, that’s typically darcs) so we could automatically pull out the old version from there rather than making the programmer do it themselves.
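The current graphing rule mentioned above (a benchmark is only plotted if every recording being graphed has data for it) amounts to intersecting the benchmark names across recordings. The following is a rough sketch of that rule with an invented Recording type and invented names; it is not Progression's actual code.

```haskell
import Data.List (intersect)

-- | One recording: a label and the mean time for each benchmark.
-- Illustrative type, not Progression's own.
type Recording = (String, [(String, Double)])

-- | Names of the benchmarks for which every recording has a measurement.
commonBenchmarks :: [Recording] -> [String]
commonBenchmarks []   = []
commonBenchmarks recs = foldr1 intersect (map (map fst . snd) recs)

-- | Restrict each recording to the benchmarks shared by all of them.
restrictToCommon :: [Recording] -> [Recording]
restrictToCommon recs =
  [ (label, filter ((`elem` keep) . fst) times) | (label, times) <- recs ]
  where
    keep = commonBenchmarks recs

main :: IO ()
main = do
  -- Hypothetical data: "c" was added after the "v1" recording was made,
  -- so it is dropped from the comparison.
  let v1 = ("v1", [("a", 1.0), ("b", 2.0)])
      v2 = ("v2", [("a", 0.9), ("b", 1.5), ("c", 3.0)])
  mapM_ print (restrictToCommon [v1, v2])
```

Re-running the old version with the newer benchmarks, as described above, is what would stop "c" being dropped here.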

So perhaps what we would do is dig through the version history for the latest tag starting “OPT:” (which is the “original” for this current round of optimisation), then re-run that version with the latest benchmarks. In fact, why restrict it to just the original? We could look back to find the last tag with “OPT:”, then work forwards from there, looking for patches, marking out those starting “BENCH:” (the ones that introduce new benchmarks). We would then try and create versions of your program from the OPT tag forwards, with all the different versions since then, but always featuring all the BENCH patches. This would give us complete benchmark data for all old versions, and would also mean you wouldn’t have to tell Progression what the labels are, it could just use the patch names/dates. We could also try different combinations of patches (if you’ve recorded A, B, and C, where B depends on A, try A, A&B, A&B&C, A&C) to see how different independent changes combine together to find the fastest combination.
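As a rough illustration of the first part of that idea, the sketch below works on nothing more than a list of patch/tag names, oldest first. It does not talk to darcs at all, it ignores patch dependencies and commutation, and the function names and the example history are invented. It finds the latest “OPT:” entry, separates the “BENCH:” patches that follow it from the ordinary ones, and lists the versions to benchmark: every prefix of the ordinary patches, each with all the BENCH patches applied.

```haskell
import Data.List (inits, isPrefixOf, partition)

-- | Given patch/tag names oldest first, find the latest name starting
-- "OPT:" and split everything after it into benchmark patches ("BENCH:"
-- prefix) and ordinary patches. Nothing if there is no "OPT:" entry.
sinceLastOpt :: [String] -> Maybe (String, [String], [String])
sinceLastOpt history =
  case break ("OPT:" `isPrefixOf`) (reverse history) of
    (_, [])                -> Nothing
    (newerReversed, tag:_) ->
      let (benches, others) =
            partition ("BENCH:" `isPrefixOf`) (reverse newerReversed)
      in Just (tag, benches, others)

-- | One candidate version per prefix of the ordinary patches, each one
-- always including every BENCH patch.
candidateVersions :: [String] -> [String] -> [[String]]
candidateVersions benches others = [ benches ++ prefix | prefix <- inits others ]

main :: IO ()
main = do
  -- Hypothetical history, oldest first.
  let history =
        [ "initial import"
        , "OPT: start of optimisation round"
        , "speed up channel reads"
        , "BENCH: add simpleChoice benchmarks"
        , "avoid extra locking"
        ]
  case sinceLastOpt history of
    Nothing                      -> putStrLn "no OPT: tag found"
    Just (tag, benches, others) -> do
      putStrLn ("baseline: " ++ tag)
      mapM_ print (candidateVersions benches others)
```

Trying different combinations of independent patches would additionally need the dependency information that only darcs has, so it is left out of this sketch.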

I’m not sure how much work that would be, but it sounds like it might form quite a useful tool. Or, it might turn out to be too over-the-top and awkward — it might be best to just stick to Progression’s current straightforward design. Comments are welcome below, as are any questions or feedback about the current release of Progression.

> Unfortunately the gnuplot binding in Haskell is not
> in great shape at the moment, so I’m using a plain
> system call to run it — if you want graphs, make sure
> you have GNUplot installed on your system

I hadn’t looked into it, no. Progression was originally a bash script, calling a Criterion executable, txt-sushi and GNUplot from the command line (and I’m fairly familiar with GNUplot syntax). I moved it into a Haskell program but didn’t look much at the plotting. I’ll have a look at Chart, since Criterion pretty much depends on it anyway. Edit: one problem with Chart, which I now remember is why I hadn’t looked at it, is that it depends on gtk2hs. gtk2hs completely refuses to build on my GHC 6.12 machine at the moment, so I can’t use Chart on there.

This program fills a noticeable gap in profiling tools; I hacked together similar functionality for benchmarking “iteratee”. My design is a bit different; I have one script to run multiple versions of code (with different command-line options if necessary) and a second program that calculates/displays output. I have one question, one comment, and one suggestion/feature request:

1. Why do you create your own CSV files instead of using Criterion’s directly?

2. How does Progression behave with more data? My basic data set has about 25 different codebases (including differing compiler options). I can’t fit all of that data on one chart cleanly. Although I am using bar charts, maybe a plot like this would be better.

3. For my work, an absolutely necessary feature is the ability to reorganize output by transformations on the group/benchmark names. Could something like this be added to Progression?

1. The stored bench-whatever.csv files are written directly by Criterion; I added that feature to Criterion primarily for this purpose, and Bryan O’Sullivan took the patches 🙂

2. You are prompted to enter what benchmarks you want on the graph. You could enter all 25 items, but I think any graph with that much data isn’t going to be clear. When I was using it, I generally compared the latest one or two with the best one or two before that; once one version had better performance than another, I stopped including the worse one on the graph. I think once you have a lot of data, it becomes more of an issue with work-flow than the tool itself, but if Progression can somehow be improved to help more then that’s great.

3. I’m not completely clear what you’re asking; are you saying you want to re-order the placement of the benchmarks on the X-axis according to a user-supplied sorting function? Even if that’s not what you’re suggesting, that’s something I’d like to add. You can see on my example graph that a simple lexicographical ordering of names puts my simpleChoice benchmarks in the order 10, 2, 5, where obviously I’d prefer 2, 5, 10.
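For what it’s worth, the ordering I’d prefer could be had from a “natural” sort that compares runs of digits as numbers. This is only a sketch of one possible sorting function, not something Progression does, and the exact benchmark names are assumed for the example:

```haskell
import Data.Char (isDigit)
import Data.List (sortOn)

-- | A name split into runs of digits (compared numerically) and runs of
-- other characters (compared as text).
data Chunk = Text String | Number Integer
  deriving (Eq, Ord, Show)

-- | Break a name into alternating text and number chunks.
chunks :: String -> [Chunk]
chunks [] = []
chunks s@(c:_)
  | isDigit c = let (ds, rest) = span isDigit s
                in Number (read ds) : chunks rest
  | otherwise = let (ts, rest) = break isDigit s
                in Text ts : chunks rest

-- | Sort names so that embedded numbers are ordered by value.
naturalSort :: [String] -> [String]
naturalSort = sortOn chunks

main :: IO ()
main =
  -- Assumed names; prints ["simpleChoice2","simpleChoice5","simpleChoice10"]
  print (naturalSort ["simpleChoice10", "simpleChoice2", "simpleChoice5"])
```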

John Lato

February 9, 2010 at 12:56 pm

Thanks for these answers, it seems I was slightly mistaken about a few things. Thank you for submitting the patches for Criterion to output .csv files; they’re very useful.

Re: 3, Not exactly. A single implementation, e.g. “hashKey”, can have multiple sets of benchmarks and should be represented as “hashKey1” and “hashKey2”. The Criterion benchmark outputs appear as “pure/maps/bench1” and “monadic/maps/bench1” instead of just “maps/bench1” and “monadic/bench1” as they do for other implementations.

After I implemented a means to accomplish this, it turns out to be more flexible, and more useful, than I originally expected. As an example, I can organize one chart per benchmark with the implementations along the x-axis, or one chart per implementation with the benchmarks along the x-axis, with some simple command-line flags.

In my case it’s not a process of incremental optimization, but rather deciding which of several alternate implementations offers the best overall performance. So I really would like to compare all the variants together, at least initially.

Does Progression keep track of which benchmarks have been run? That is, if you have a current .csv file and the benchmark file hasn’t changed, will it use the cached version?