Perfecting Simulations – The Quest for a Perfect Random Number
Generator

PRNG Evaluator Applet

The attached
applet generates a set of 10,000 floating point pseudo random numbers
(PRNs) between 0 and 10. Each
time the "Run Trial" button is pressed it creates a new trial
by generating a new set. The "Reset" button resets the number of
trials. These are used in calculating the % rejects (see below).
The top plot shows a histogram of the 10,000 data points.

The applet provides five different pseudo random
number generators (PRNGs). The "Uniform Distr" option uses a standard pseudo
random number generator from the Java language. The "Normal Distr"
option is also from Java but generates pseudo random numbers with
a Normal Distribution. "LFSR" stands for linear feedback shift
register. It’s a pseudo random number generator which has been
used in cell phone communications. The applet has three different
configurations for this PRNG.

All three systems are referred to as pseudo
random number generators since they generate numbers from
equations which repeat over a long period of time.

Three different hypothesis tests are set up so that rejecting Ho implies that the data is not
random. Since alpha is 5%, Ho should be rejected about
5% of the time. (See the write up at left for more information.)

Fig.2A shows the distribution of consecutive
first digit runs in the 10,000 PRNs. Fig. 2B is a time
plot showing the order in which the PRNs were generated. Patterns
in these plots indicate that the PRNs are not truly random.

In probability and statistics, simulations are a big
deal. They’re used for things like confirming theories and studying all
kinds of scientific phenomenon from social sciences to physics. These
simulations are often, somewhat romantically, referred to as Monte Carlo
simulations because, figuratively speaking, they depend on a roll of the
dice. More correctly, they depend on multiple rolls of the dice or the
generation of multiple random numbers.

In the casino, consistently getting desired results
depends largely on whether one is playing for the casino or playing for
one’s self. While the games may appear random they're actually biased in
favor of the casino. In Monte Carlo simulation bias is deadly. Even a
subtle pattern in the random number generator (RNG) can cause significant
errors. Ironically, simulations with excellent randomness in their RNGs
have the advantage.

Large scale Monte Carlo simulations have huge
appetites for random numbers. For example, Monte Carlo techniques can
simulate the seemingly random motion of oxygen molecules diffusing through
plastic food packaging. Tracking the behavior of particles on a molecular
level can require supercomputers to generate billions of random numbers.

Obviously, rolling dice billions of times isn't an option and so computers rely on pseudo random number generators (PRNGs).
These use equations programmed into software that can generate
huge quantities of numbers in short periods of time. While the
numbers they generate appear random, in reality they aren't.

PRNGs start with a seed value and will eventually
repeat all the numbers they generate in exactly the same order. Putting in
the same seed value will give precisely the same set of random numbers. On
large scale Monte Carlo simulations, care has to be taken to make sure
that the PRNG cycle is significantly longer than the quantity of random
numbers needed or the pattern in the PRNG cycle can show up as an error
producing pattern in the simulation results.

Seed values for PRNGs are generally chosen based on
some type of random process. This can be as simple as using the time of
day in seconds when the simulation was started, or may involve using a
device which responds to random events in the environment, such as cosmic
rays.

One might ask why these other systems of generating
random numbers aren’t used instead of PRNGs. Time in seconds obviously has
to be used sparingly since it tends to constantly increase not to mention
follows a cycle. A background radiation detector is a better choice but
requires lots of care to set up and maintain. Its calibration can drift or
it can end up responding to non-random events like radon levels. These can
be altered by simply opening or closing a window.

PRNGs are generally better from a setup and
maintenance standpoint than other systems. They are also superior for
debugging or verifying software. By keeping track of the seed value, it’s
possible to repeat a simulation result and evaluate if the result was due
to a software problem or the values of the pseudo random numbers used.

The attached applet contains five different types of
random number generators along with several different ways to evaluate
them. The Uniform Distr option uses a standard pseudo random number
generator from the Java language. The Normal Distr option is also from
Java but generates pseudo random numbers with a Normal Distribution. LFSR
stands for linear feedback shift register. It’s a PRNG which has been used in cell phone communications.

LFSRs are made up primarily of digital devices called
shift registers. Each shift register can hold a value of either zero or
one. The reading from each of the registers can be combined into a binary
number and converted into base 10 PNRs. The Basic LFSR uses the bits from
the SRs in the order they’re connected. The LFSR Modified uses the bits
from the SRs in random order and the LFSR Dual 64s generates PRNs
alternately from two different LFSRs with 64 SRs in each. These two LFSRs
are set up with different initial values.

Note that the number of SRs can be altered with the
“Adjust LSFS” button. This was done to illustrate the effects of having a
short repeat rate. The repeat cycles of the various options are as
follows:

SRs Repeat Cycle

4 14

8 255

10 1023

16 65536

32 Approx 4.29 x 10^9

64 Approx 1.84 x 10^19

The low number SRs have a loss of resolution
since very few bits are used when deriving the PRNs from them.

Three of the ways to evaluate PRNG use hypothesis
tests to evaluate the shape, spread, and central tendency of the
distribution generated by the PRNGs. These respectively use chi squared, f, and z statistics.

In each case the null hypothesis is
that the numbers are truly random. Rejecting the null is evidence that the
numbers are not generated by a random source.

Alpha is 5 % for all the hypothesis tests. This
implies that the null hypothesis would be rejected 5 % of the time even if
the numbers were generated by a true random number generator. A
consistently lower or higher reject rate is evidence that the PRNG does
not truly generate random numbers. In this respect the chi and f-test
reject rates are too low. However, this is probably only a reflection of
the fact that the range of numbers is constrained.

The applet generates sets of 10,000
consecutive pseudo random numbers and displays them in a histogram and
time plot. These are useful for spotting patterns. Bars in a histogram of
random numbers should show a variation in height but have no unusually
high or low bars. The time plot should be jagged and random looking.

Anyone who has studied coin flipping knows that the
way to detect fake data is to look at runs of heads. Typically, fakers
know that long runs are unlikely and incorrectly eliminate them. In
“Benford’s Law Part 1 – How to Spot Fraud” we found out that the first
digit of real data is not randomly distributed but the first digit of
random numbers is randomly distributed. Hence having a uniform
distribution of first digit runs is a powerful sign that numbers are
indeed random.

Figure 1) Applet Output for Standard
Java PRNG :

Z-test failure rate = 8.7 %, There is no
apparent pattern in either the first digit repeats or time
plot.

Figure 1) Applet Output for 64 SR LFSR PRNG :

Z-test failure rate = 36 %, There are
patterns in both the first digit repeats and time
plot.

Figure 3) Applet Output for Improved 64
SR LFSR PRNG :

Z-test failure rate = 33.6 %, There is no
apparent pattern in either the first digit repeats or time
plot.

The applet has a plot showing
different length runs of first digits. In this case the first digit is
defined as the digit in the ones place. For example, the first digit of
0.954 would be a zero while the first digit of 9.540 would be a 9. The
first digit runs for random numbers would be evenly distributed among the
digits. Run lengths of three (blue bars) should be more frequent than run
lengths of four (purple bars) and they should be more frequent than run
lengths of five (red bars).

Play with the applet and it quickly
becomes apparent that there’s no all-inclusive test for screening random
numbers. A PRNG can look great in one test and terrible in another.
Figure 1 shows that while the standard Java PRNG shows no apparent pattern
in the first digit runs or time plot it fails the z-test 8.7% of the time
instead of the expected 5% rate.

Figure 2 shows that the basic 64 SR LFSRs
first digit repeat plots have excessive repeats for 0 and 9 but no repeats
for any other digits. It also show a pattern in the time plot as well as
failing the z-test an excessive 36% of the time. In the basic Figure 3
shows that the LFSR system can be greatly improved by combining the binary
values of the SRs in random order. While this improves performance, the
improved LFSR system still fails the z-test an excessive 33.6% of the
time. The applet clearly illustrates that true randomness is hard to find.

Unfortunately, the problem of finding useful PRNGs
and confirming their randomness is even more complicated than our applet
indicates. Remember, the applet only looks at 10,000 consecutive numbers
at a time. Compare this to a cycle of over 1.8 x 10^19 for a 64 SR LFSR
and it’s clear that the applet’s evaluation has only scratched the
surface. If we could generate a billion random numbers a second, which far
exceeds the rate of even high end desktop computers, it would take roughly
585 years to generate the entire cycle! Even in University and government
research centers, finding and evaluating PRNG for Monte Carlo Simulations
is a challenge.