Getting started with Google Caliper

Earlier this week, I came across a few references to a micro-benchmarking tool called Google Caliper. As I was working on enhancing the micro-benchmarking features in VLPT, I thought this would be a good time to give another framework a try - no point in going to the effort of writing the enhancements if something else out there already does the job well, after all.

It takes a little bit of legwork to get up and running with Caliper, so I thought I'd blog the journey in case it can help anyone else.

So it's actually pretty simple - you can probably get here in less than 5 minutes if you don't have to go digging around. Sadly the User Guide on the Google Code wiki doesn't have much content, so it does take a bit of work.
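To have something to measure, I knocked together a trivial benchmark in the old 0.5 style - as far as I can tell from the examples, you extend SimpleBenchmark and name your methods time<Something>(int reps). Mine looked roughly like this (StringConcatBenchmark is just a throwaway example of my own):

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class StringConcatBenchmark extends SimpleBenchmark {

    // Caliper passes in the repetition count; the whole loop is what gets timed.
    public void timeStringBuilder(int reps) {
        int dummy = 0;
        for (int i = 0; i < reps; i++) {
            StringBuilder sb = new StringBuilder();
            sb.append("Hello, ").append("world");
            dummy += sb.length(); // consume the result so the loop can't be eliminated
        }
        if (dummy == 0) throw new AssertionError();
    }

    public static void main(String[] args) {
        Runner.main(StringConcatBenchmark.class, args);
    }
}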

The Linux advice looks sound, but I'm running on Windows 7. The home directory maps to C:\Users\James for me, so that's where I need to create the .caliperrc file. I pasted in all three lines directly from the web page.
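For reference, the three lines are of this shape - the values below are placeholders, the real ones are generated for you on the results webapp page:

# Caliper API key for your.name@example.com
postUrl=http://microbenchmarks.appspot.com:80/run/
apiKey=00000000-0000-0000-0000-000000000000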

We don't have to make any changes to the runtime configuration - just hit run again and wait for the test to complete. You should see an extra line at the end of the output:

Hitting the link should take you to the results (it took a little while to load for me):

So that's pretty good - nice and straightforward. What I really like about their results webapp is the box plots on the results.

They do a great job of explaining the various aspects on the online result page - basically you get five juicy facts in one lovely bar:

The quickest time the code executed in: this is the leftmost part of the T on the error bar

Three percentile markers, at the 25th, 50th and 75th percentiles: this is the pair of lighter-coloured areas at the end of the bar

The worst time: the far right-hand part of the T on the error bar
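Put together, I read the bar roughly like this (my own ASCII sketch, not Caliper's actual rendering):

quickest                        25%  50%  75%       worst
   |----------------------------[====|====]-----------|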

My example code isn't great for highlighting these things as the results are so consistent. But for a real example (I'm going to do one for VL Logging appenders soon) it provides invaluable data:

The quickest time is especially relevant for Java - it likely represents the best-case time your code takes to execute, with all other aspects (HotSpot compilation, GC, etc.) minimised in the equation

The maximum is interesting but not so useful - there is a chance it represents the worst of all worlds: if a major GC occurred during that execution, it'll be very high indeed

Their online results system has some other nice features for tracking changes to your benchmarks over successive runs as well: simply run the test again and the results will be shown alongside the previous runs.

One interesting aspect of their implementation is how many test runs they carry out per test. It's not 100% clear, but I think it's something along these lines (for the default MicrobenchmarkInstrument; there's a sketch of this after the list):

A warm-up period is defined, during which the test method is called repeatedly until that period has elapsed (instrument.micro.options.warmup=10s)

The warm-up code tries to come up with a good estimate of the number of repetitions possible in 500ms (instrument.micro.options.timingInterval=500ms)

The test then runs in a loop which exits when enough time has passed, or when we have collected the target number of results (instrument.micro.options.maxTotalRuntime=15s, instrument.micro.options.reportedIntervals=9)

Each pass of the loop runs with a slightly different number of repetitions (the estimate scaled by one of these factors: 0.7, 0.9, 1.1, 1.3)

The test loop can also exit early if the normalised standard deviation of the results is less than a tolerance setting (instrument.micro.options.shortCircuitTolerance=0.01)

So you end up with up to 9 results, each based on the average over a run with a different number of repetitions, and you know each benchmark run will take at most around 15 seconds.
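To make that concrete, here's a rough Java sketch of the loop as I've described it - my own reconstruction from the option names above, not actual Caliper source, so treat it as illustrative only:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;

public class MeasurementLoopSketch {

    interface TimedMethod {
        void time(int reps);
    }

    // My reconstruction of the default instrument's measurement loop, using
    // the option values quoted above. Returns nanoseconds-per-rep samples.
    static List<Double> measure(TimedMethod method) {
        long warmupNanos = SECONDS.toNanos(10);          // ...warmup=10s
        long intervalNanos = MILLISECONDS.toNanos(500);  // ...timingInterval=500ms
        long maxTotalNanos = SECONDS.toNanos(15);        // ...maxTotalRuntime=15s
        int reportedIntervals = 9;                       // ...reportedIntervals=9
        double shortCircuitTolerance = 0.01;             // ...shortCircuitTolerance=0.01
        double[] repFactors = {0.7, 0.9, 1.1, 1.3};

        // Warm up: call the method until the warm-up period elapses, counting
        // calls so we can estimate how many reps fit in one timing interval.
        long warmupStart = System.nanoTime();
        long calls = 0;
        while (System.nanoTime() - warmupStart < warmupNanos) {
            method.time(1);
            calls++;
        }
        long warmupElapsed = System.nanoTime() - warmupStart;
        long baseReps = Math.max(1, calls * intervalNanos / warmupElapsed);

        // Measure: loop until we have enough results or too much time has passed.
        List<Double> results = new ArrayList<>();
        Random random = new Random();
        long start = System.nanoTime();
        while (results.size() < reportedIntervals
                && System.nanoTime() - start < maxTotalNanos) {
            // Scale the rep count by a slightly different factor each pass.
            double factor = repFactors[random.nextInt(repFactors.length)];
            int reps = (int) Math.max(1, Math.round(baseReps * factor));

            long t0 = System.nanoTime();
            method.time(reps);
            results.add((System.nanoTime() - t0) / (double) reps);

            // Short-circuit if the results have already settled down.
            if (results.size() >= 3
                    && normalisedStdDev(results) < shortCircuitTolerance) {
                break;
            }
        }
        return results;
    }

    static double normalisedStdDev(List<Double> xs) {
        double mean = xs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = xs.stream()
                .mapToDouble(x -> (x - mean) * (x - mean)).average().orElse(0);
        return Math.sqrt(variance) / mean;
    }
}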

Cloning from Git

I might have picked a bad day to hit the bleeding edge, as it looks like the project is transitioning from 'old' Caliper to 'new' Caliper. Things of note:

The results are available as raw JSON output files, but I couldn't see a way to turn them into a report - it'd be great if there were a way of generating a static HTML report locally, especially for sensitive internal performance-testing results

For now I'll be sticking with the Maven 0.5rc1 version, but the progress on the trunk looks good once they've wired everything back together in the 'new' version.
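For reference, the dependency I'm using looks like this (double-check Maven Central for the exact version string):

<dependency>
  <groupId>com.google.caliper</groupId>
  <artifactId>caliper</artifactId>
  <version>0.5-rc1</version>
</dependency>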

Where to go from here

The Command Line Options page is a good one to read through to get an idea of the different features on offer - I've not tried them all, so it's not 100% obvious whether they'll work with 'new' or 'old'

The approach of starting a new VM for each test is a really great way of ensuring clean results - it also means you can provide different low-level JVM parameters on each run

This could potentially be very handy for checking results on longer-running, lower-level tests - GC tuning, for example
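For instance, the Command Line Options page describes a -J prefix for supplying alternative sets of JVM arguments; if I've read it correctly, something along these lines would run the benchmark once per collector (untested by me, and I'm not sure whether 'new' Caliper honours it):

java com.google.caliper.Runner StringConcatBenchmark "-Jgc=-XX:+UseSerialGC,-XX:+UseParallelGC"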

The default number of runs is only 9 (defined in global.caliperrc) - you can change this by overriding the default in your ~/.caliperrc file:

# Caliper ultimately records only the final N measurements, where N is this value.
instrument.micro.options.reportedIntervals=9