Dear list,
During the past few days I have spent a lot of time trying to figure out how to write Criterion
benchmarks so that the results don't get skewed by lazy evaluation. I want to benchmark different
versions of an algorithm that does numerical computations on a vector. For that I need to create an
input vector containing a few thousand elements. I decided to create random data, but that really
doesn't matter - I could just as well have used infinite lists instead of random ones.
My problem is that I am not certain whether I am creating my benchmark correctly. I wrote a
function that creates the data like this:

    dataBuild :: RandomGen g => g -> ([Double], [Double])
    dataBuild gen = (take 6 $ randoms gen, take 2048 $ randoms gen)

And I create the benchmark like this:

    bench "Lists" $ nf L.benchThisFunction (L.dataBuild gen)

The question is how to generate the data so that its evaluation is not included in the benchmark. I
already asked this question on StackOverflow (
http://stackoverflow.com/questions/12896235/how-to-create-data-for-criterion-benchmarks#comment17466915_12896235 )
and was advised to use evaluate + force. After spending a day testing this approach I came to the
conclusion that it does not seem to influence the results of the benchmark in any way (I tried
things like unsafePerformIO + threadDelay). On the other hand, I looked into the criterion sources
and I see that the benchmark code is run like this: evaluate (rnf (f x))
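
For reference, a minimal sketch of the evaluate + force suggestion mentioned above. It assumes only
base and the deepseq package. The names dataBuild and benchThisFunction here are hypothetical
stand-ins for the functions in the L module, and deterministic values replace `randoms gen` so that
the sketch does not depend on the random package.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)

-- Hypothetical stand-in for L.dataBuild: deterministic values are used
-- instead of `randoms gen` (an assumption made for this sketch only).
dataBuild :: ([Double], [Double])
dataBuild = (take 6 xs, take 2048 xs)
  where
    xs = map fromIntegral [1 :: Int ..]

-- Hypothetical stand-in for L.benchThisFunction.
benchThisFunction :: ([Double], [Double]) -> Double
benchThisFunction (coeffs, signal) = sum coeffs * sum signal

main :: IO ()
main = do
  -- `force` reduces the value to normal form once it is demanded;
  -- `evaluate` demands it at this point in the IO sequence.
  input <- evaluate (force dataBuild)
  -- `input` now contains no unevaluated thunks, so a benchmark such as
  --   bench "Lists" $ nf benchThisFunction input
  -- would no longer have to build the lists while it is being timed.
  print (benchThisFunction input)
```
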
I am a Haskell newbie and perhaps I am not interpreting this correctly, but to me it looks as though
criterion does not evaluate the possibly unevaluated argument x before running the benchmark, and
instead only evaluates the final result. Can someone explain exactly how this works and how I
should write my benchmarks so that the results are correct?
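
To make the question concrete, here is a small sketch, independent of criterion, of what an
expression of the form evaluate (rnf (f x)) forces. The names x and f are hypothetical, and
Debug.Trace is used only to make the evaluation of x visible. Under ordinary lazy evaluation the
trace message should appear once, during the first call, which would suggest that only the first
run pays for an unevaluated argument.

```haskell
import Control.DeepSeq (rnf)
import Control.Exception (evaluate)
import Debug.Trace (trace)

-- Hypothetical lazy input: the trace message is emitted (on stderr)
-- when the list is demanded for the first time.
x :: [Double]
x = trace "building x" (map fromIntegral [1 :: Int .. 2048])

-- Hypothetical function under test.
f :: [Double] -> Double
f = sum

main :: IO ()
main = do
  -- First run: forcing the result of `f x` also forces as much of `x`
  -- as `f` needs, so building the list is part of this evaluation.
  evaluate (rnf (f x))
  -- Second run: `x` is shared and already evaluated, so no list
  -- construction happens here.
  evaluate (rnf (f x))
  putStrLn "done"
```
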
Janek