2009-05-26

My personal computers are pretty old and/or slow. I have an old PowerBook G4 and a newishEEE PC. The PowerBook was top of its class when I got it with most options maxed out. Alas, that was five years ago. The EEE PC is by definition not a top performer, nor does it try to be. I find that when the two machines perform similarly in day-to-day tasks, at least when the EEE PC is in “Super performance” mode.

Truth is that for most of the mundane stuff I do these two machines perform acceptably. I’m obviously not going to be watching any 1080p movies on them, or enjoying the latest games (from a productivity standpoint I’m not sure these are such bad things), but most everything else works fine.

Where I feel the performance does hurt me is when compiling with GHC (or any compiler for that matter, it’s just that GHC is the one I use most). Often I spend way to much of my precious little private developer time waiting for the compiler to finish. This is in stark contrast to the situation at work where I run GHC on a pretty zippy Dell PowerEdge Blade Server.

In the near future I expect to be spending more time developing at home and want to be able to do so more efficiently, preferably on par with the situation at work. I could obviously buy shiny new hardware but being a miser (in case you couldn’t already tell based on my hardware) I’m looking for alternatives that would allow me to avoid or defer a hefty up-front investment. One such alternative I’m considering is to rent compute capacity in the Amazon Elastic Compute Cloud (EC2).

EC2 compute capacity is sold in the form of instances at an hourly rate ranging from $0.10 to $0.80 depending on capacity/performance plus some small-change for bandwidth and persistent storage. While there are a couple of hurdles to overcome in order to leverage EC2 as a development workstation the first thing I want to do is to make sure it is a goal worthy of pursuing in the first place, i.e. will I get the desired performance gains at a reasonable price?

To at least begin to answer this question I’ve done some informal benchmarking of the aforementioned systems, excluding the pricier EC2 instances. All the regular benchmarking caveats apply and to reinforce the unscientificity of it all I’m not going to bother providing complete specs for the systems. Here are the fundamentals:

I figure the results are probably more interesting that the details of the tests so here they are, the systems are ordered by increasing performance which happened to be consistent across the tests:

Benchmark results, times in seconds, shorter times are better.

astro-tables build

highlighting-kate build

fad test suite

PowerBook G4

292

(848)

28

EEE PC

291

643

18

EC2 Small

171

519

15

Dell PowerEdge

75

260

6

EC2 Medium

55

172

4

The EEE PC was in “Super Performance” mode during the tests and the PowerBook was at its highest CPU speed. All times are the “real” time as measured by the Unix time command and lower numbers are naturally better.

As can be seen an Amazon EC2 Small instance is only marginally faster than the EEE PC. An EC2 High-CPU Medium instance on the other hand is significantly faster than the zippy Dell PowerEdge Blade server. Is either one a good deal? Good question, I think a case could be made either way depending on your priorities but I’m not going to tackle that today.

If you care about the details of the tests read on, if not please move on to your next blog of choice!

astro-tables build

This package currently consists of a single automatically generated 4000-line monster of a module1. The code is pretty straight-forward but the module takes ages to compile, almost certainly due to me giving the type checker an unnecessarily hard time. I have a trivial rewrite on my todo-list which I expect will shorten the compilation time dramatically but the current form comes in kind of handy for the purposes of this benchmark. The Git repo is git://github.com/bjornbm/astro-tables.git and the commit used in the benchmarking was e63b8978833878526870b2101697197ff64af593. I made sure the dependencies were already installed and ran time cabal install.

highlighting-kate build

From recent memory I knew that John MacFarlane’s highlighting-kate package has a hefty number of modules (the majority of which are also automatically generated) that take a fair amount of time to compile. I downloaded version 0.2.4 from hackage, made sure all dependencies were already installed, and ran time cabal install --flags=executable.

I ran into one snag with this test: the build wouldn’t complete with GHC 6.10.3 on the PowerBook G4 due to some problem with pcre-light (which tends to give me headaches on pretty much every platform). This particular headache2 I was unable to resolve and had to run the test using GHC 6.10.1 on the PowerBook G4.

fad test suite

Finally I did a runtime performance (as opposed to compilation) benchmark: running the test suite of the fad library. The Git repo is git://github.com/bjornbm/fad.git and the commit used was cd2965a6741291570930e4bf6e9f8f9ab64ccadd. I ran ghc --make Test and then time ./Test.

Anonymous #2: By using Elastic Block Storage (EBS) you can get persistent state pretty transparently (except for the root volume). But you would still want to just leave your instance up during the entire programming session.