7 thoughts on “Benchmarking mistakes, part three”

When testing two sets of code, or making subsequent test runs on the same code, it is desirable to use the same sequence of “random” values each time so that every run processes identical inputs. You can do this by specifying a fixed seed for the RNG.
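A minimal sketch of what that looks like (Python here; the seed constant and the sizes are arbitrary choices for illustration):

```python
import random

SEED = 12345  # arbitrary fixed seed; any constant works

# Both benchmark runs construct their generator from the same seed,
# so they draw the exact same sequence of "random" inputs.
rng_a = random.Random(SEED)
rng_b = random.Random(SEED)

inputs_a = [rng_a.randint(0, 10**6) for _ in range(1000)]
inputs_b = [rng_b.randint(0, 10**6) for _ in range(1000)]
assert inputs_a == inputs_b  # identical input sets for both runs
```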

Depending upon the algorithm, I would think using a random seed would sometimes be a good thing and sometimes not. If a given sequence of random numbers will end up yielding the same general sequence of operations in the algorithms being tested, using a fixed seed may be good. On the other hand, if one is trying to compare the relative performance of e.g. two very different sorting algorithms, it would be possible that a fixed seed might consistently yield a sequence that favors one or disfavors the other. A random seed may sometimes yield such sequences too, of course, but a fixed seed that did so consistently would pose a bigger problem. I would suggest that even if one uses random seeds, one should log the seeds that are actually used so that particular results can be reproduced (e.g. if one run of a program is excessively slow, retrying it with the same seed would reveal whether the slowdown was an algorithmic problem or a system hiccup).
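A rough sketch of that log-the-seed idea (Python; the helper name and the logging destination are illustrative assumptions, not a prescribed API):

```python
import random

def make_logged_rng(seed=None):
    """Pick a fresh seed per run unless one is supplied, but always
    log it so any anomalous run can be reproduced exactly."""
    if seed is None:
        seed = random.SystemRandom().getrandbits(64)
    print(f"RNG seed for this run: {seed}")  # or write to your test log
    return random.Random(seed)

rng = make_logged_rng()  # fresh seed, logged
inputs = [rng.random() for _ in range(1_000_000)]
# ... benchmark the algorithm on `inputs` ...

# To reproduce a suspicious run, re-run with the logged seed:
# rng = make_logged_rng(seed=1234567890123456789)
```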

Sending random inputs to the algorithm is a type of fuzzing. Although fuzzing is usually applied in security testing, it could work quite well in performance testing too (although I’ve never heard anyone mention it in that context). In both cases, the goal is to eliminate the biases that can be present in hand-written test inputs.
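A sketch of what performance fuzzing might look like, combined with per-trial seeds for replay (Python; `algorithm_under_test` is a hypothetical stand-in for whatever is being measured):

```python
import random
import time

def algorithm_under_test(data):
    # hypothetical stand-in for the code being benchmarked
    return sorted(data)

rng = random.Random()  # unseeded: different inputs every run
worst = (0.0, None)

for trial in range(100):
    seed = rng.getrandbits(32)           # per-trial seed, kept for replay
    trial_rng = random.Random(seed)
    data = [trial_rng.randint(0, 10**9) for _ in range(10_000)]

    start = time.perf_counter()
    algorithm_under_test(data)
    elapsed = time.perf_counter() - start

    if elapsed > worst[0]:
        worst = (elapsed, seed)          # remember the slowest input

print(f"slowest trial: {worst[0]:.4f}s, reproducible with seed {worst[1]}")
```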

Normally, when doing such testing, you will still record the seed. If you are using truly random data (e.g., from random.org), you could store the generated entropy itself, or wrap your random generator so that it can record and later replay the numbers it produces.
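A minimal sketch of such a record-and-replay wrapper (Python; the class name and interface are assumptions for illustration):

```python
import random

class RecordingRandom:
    """Wraps a random source, records every value drawn, and can
    replay a recorded stream later (useful for truly random data
    that has no seed to re-use)."""
    def __init__(self, source=None, replay=None):
        self._source = source or random.SystemRandom()
        self._replay = list(replay) if replay is not None else None
        self.recorded = []

    def random(self):
        if self._replay is not None:
            value = self._replay.pop(0)    # replay mode: return stored values
        else:
            value = self._source.random()  # record mode: draw and remember
        self.recorded.append(value)
        return value

# First run: record the stream alongside the benchmark results.
rng = RecordingRandom()
first = [rng.random() for _ in range(5)]

# Later run: replay the exact same stream to reproduce the test.
replayed = RecordingRandom(replay=rng.recorded)
assert [replayed.random() for _ in range(5)] == first
```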