4/8/2013

I recently (December 2012) switched jobs from being an Engine Programmer at Insomniac Games, to being a Software Engineer at Google. At Google, I’m now working on Skia, which is the 2D graphics library that underlies Chrome, Firefox and parts of Android. One of the first things that came to my attention when I arrived was the pseudorandom number generator (PRNG) used to generate test cases for regressions — namely, that there was some concern that it wasn’t very random. In particular, the least significant bit flips back and forth, and possibly other lower-order bits weren’t very random. “Aha,” I thought, “Here’s something small and isolated that I can take on as a new hire. And it won’t take long; I can just draw on the random number chapter from our book.” Turns out that was partially right.

The first step was determining how good or bad the original PRNG was. Looking at it, it was clear that it was a linear congruential generator (LCG), and LCGs have notoriously poor randomness in their lower-order bits. They also fail the spectral test: if you use them to generate points in space, the points tend to fall along visible planes. Not so good for generating tests for a graphics library. But how to verify this?
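The low-order-bit problem is easy to demonstrate. Here's a minimal sketch: any LCG with a power-of-two modulus and odd multiplier and increment has a least significant bit that simply alternates with period 2. (The constants below are the classic Numerical Recipes pair, used purely for illustration; they are not Skia's actual parameters.)

```python
# Illustrative LCG: power-of-two modulus, odd multiplier and increment.
# Constants are from Numerical Recipes, not Skia's actual generator.
M = 2**32
A = 1664525       # odd multiplier
C = 1013904223    # odd increment

def lcg(seed, n):
    """Return the first n outputs of the LCG starting from seed."""
    x = seed
    out = []
    for _ in range(n):
        x = (A * x + C) % M
        out.append(x)
    return out

# The least significant bit flips back and forth with period 2:
# next_lsb = (odd * x + odd) mod 2 = (x + 1) mod 2.
low_bits = [x & 1 for x in lcg(12345, 16)]
print(low_bits)  # → [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

The same argument shows the bottom k bits have period at most 2^k, which is exactly the kind of structure the Gorilla test catches on low bit positions.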

In the book I discuss some batteries of tests, such as Marsaglia’s Diehard and Brown’s DieHarder. In doing some research I found a couple more. First, there’s TestU01, which is a more extensive suite released in 2009. And Marsaglia and Tsang have created a smaller set of tests called tuftests, which they claim are nearly as rigorous as the large suites — i.e. if a PRNG passes tuftests, it’s highly likely to pass Diehard. For our case, we aren’t trying to find the ultimate PRNG, just something fast and reasonably random. So tuftests should suffice.

Tuftests consists of only three tests: the GCD test (as opposed to the GDC test, which probably involves giving a coherent talk after a night of parties), the Gorilla test, and the Birthday Spacings test. The GCD test generates a large series of pairs of pseudorandom variables, computes their greatest common divisors, and then compares the distribution of the number of iterations needed to compute each GCD against the distribution expected for truly random variables. The Gorilla test is a variant of the Monkey test: random strings of bits are generated using a single bit position of a random variable, this is done multiple times, and the count of strings not generated should fall in a normal distribution. The Birthday Spacings test generates a set of birthdays in a “year”, and then determines the spacings between them, which should fall in a Poisson distribution. For all of these, the generated distribution is compared with the expected distribution using a chi-square, Anderson-Darling, or similar method.
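The core of the GCD test can be sketched in a few lines. This is a simplification: the real tuftests implementation uses fixed sample sizes and compares the histogram against tabulated expected frequencies with a chi-square statistic, while this sketch just collects the raw iteration counts (the function and parameter names are my own, and I'm using Python's built-in generator as a stand-in for the PRNG under test).

```python
# Simplified sketch of the GCD test's core loop: for many pseudorandom
# pairs, count the Euclid iterations needed to reach the GCD, and build
# a histogram of those counts.  The full test would then compare this
# histogram against the expected distribution via chi-square.
import random

def gcd_steps(u, v):
    """Number of Euclidean division steps to compute gcd(u, v)."""
    steps = 0
    while v:
        u, v = v, u % v
        steps += 1
    return steps

def collect_gcd_histogram(n, bits=32, seed=1):
    rng = random.Random(seed)  # stand-in for the PRNG under test
    hist = {}
    for _ in range(n):
        u = rng.getrandbits(bits) or 1  # avoid the degenerate zero case
        v = rng.getrandbits(bits) or 1
        k = gcd_steps(u, v)
        hist[k] = hist.get(k, 0) + 1
    return hist

hist = collect_gcd_histogram(10000)
mean_steps = sum(k * c for k, c in hist.items()) / 10000
```

For random 32-bit pairs the mean iteration count lands near 18–19; a bad generator shows up as a histogram that is visibly shifted or too narrow relative to the expected curve.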

Testing the LCG used by Skia, it was clear that it failed the Gorilla test for all but the highest-order bits (it may have also failed the Birthday test, I don’t recall now). So the belief that Skia’s PRNG needed improvement was verified. The next step was to find a decent replacement that doesn’t require a lot of overhead (important for testing on mobile devices). I’ll cover my search for that next time.

I have not yet moved the golden master image tests and the benchmark tests over to the new random number generator, as that will involve some pain to recalibrate them (using the new RNG will produce different images and times). However, I did update the unit tests, and it exposed a couple of bugs that hadn’t previously been found, because the original RNG didn’t produce broad enough coverage. So it’s been worth it, I think.

I remember trawling through the standard includes at Google and looking at the state of FP code there. They did have a Mersenne Twister (MT) implementation, so most of my objections to the linear congruential generator were set aside: “If you need good numbers, use the MT code.” Good to know there are people still questioning the fundamentals.

Moving from an LCG to a XorShift moves you from a dependency chain length of 2 to 6. The upside is that you’ll go from failing most (if not all) of SmallCrush to passing more of it. The 32-bit SHR3 (13,17,5) fails SmallCrush tests 1, 2, 6, and 10. Adding an integer counter (via xor or add), like an integer Weyl generator, increases the dependency chain to 7, but you’ll pass all of SmallCrush and fail Crush tests 17, 61, and 72. The 64-bit XorShift fails one SmallCrush test (#8), and adding a counter makes it pass both SmallCrush and Crush. MT will likewise fail a couple of Crush tests (fixable in the same manner), and costs quite a bit more in time and memory.
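A sketch of the combination described above: a 32-bit xorshift step with the (13,17,5) shift triple, plus a Weyl sequence (a counter incremented by a fixed odd constant each step) mixed into the output. The seed and the Weyl increment below are illustrative choices of mine, not values from any particular implementation; the only hard requirement on the increment is that it be odd, so the counter has full 2^32 period.

```python
# 32-bit xorshift (SHR3-style, shifts 13/17/5) combined with a Weyl
# counter.  Seed and Weyl increment are illustrative; the increment
# just needs to be odd for the counter to have full period.
MASK = 0xFFFFFFFF

def shr3_weyl(seed=2463534242, weyl_inc=0x9E3779B9, n=8):
    x, w = seed, 0
    out = []
    for _ in range(n):
        # xorshift step: three shift/xor operations on the 32-bit state
        x ^= (x << 13) & MASK
        x ^= x >> 17
        x ^= (x << 5) & MASK
        # Weyl step: counter advances by a fixed odd constant
        w = (w + weyl_inc) & MASK
        # combine the two streams for the output
        out.append((x + w) & MASK)
    return out

print(["%08x" % v for v in shr3_weyl()])
```

The extra add is one more link in the dependency chain, which matches the cost described above, and it breaks up the linear structure that trips the xorshift on a handful of Crush tests.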