Random Number Generator data and analysis

Posted by Ea! on 10/10

I've done some testing for the randomness of our random number
generator. This message is fairly technical in nature, if you get lost
reading it, the bottom line is that our random number generator is more
than adequate for the mud.
There's three methods available for us to use in generating random
numbers. The first, which we use, is the rand() function. It's got a
lot of advantages -- it's fast, it's widely available (code portability
is one of the long term goals of the coding department), and it's easy
to use.
The second method is to read numbers from /dev/random. /dev/random is
a source of random numbers that exists under Linux (and a few other
operating systems) that is quite good at generating really random
numbers. It's main disadvantage is that it's slow. Really slow. Too
slow for use with the mud, most likely. It's also not as widely
available.
The third method is to read random numbers from /dev/urandom -- a
psuedo-random number generating source that's one step better than
rand() and faster than /dev/random.
I ran a file of random numbers generated by the three sources (the
calls to rand() were done in a similar way to how the code uses it, the
others were just generated directly) through a program that's designed
to determine the "randomness" of the data. Since truely random numbers
can show sequences, what it's really determining is the entropy in the
data.
First, rand():
>Entropy = 7.999977 bits per byte.
>
>Optimum compression would reduce the size
>of this 7649861 byte file by 0 percent.
>
>Chi square distribution for 7649861 samples is 238.63, and randomly
>would exceed this value 75.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.4981 (127.5 = random).
>Monte Carlo value for Pi is 3.140184600 (error 0.04 percent).
>Serial correlation coefficient is 0.000308 (totally uncorrelated = 0.0).
And /dev/urandoms:
>Entropy = 7.999976 bits per byte.
>
>Optimum compression would reduce the size
>of this 7475200 byte file by 0 percent.
>
>Chi square distribution for 7475200 samples is 247.97, and randomly
>would exceed this value 50.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.4452 (127.5 = random).
>Monte Carlo value for Pi is 3.145153652 (error 0.11 percent).
>Serial correlation coefficient is -0.000082 (totally uncorrelated = 0.0).
And /dev/randoms (with a smaller file -- the larger the file, the
more acurate the numbers are likely to be).
>Entropy = 7.999884 bits per byte.
>
>Optimum compression would reduce the size
>of this 1638925 byte file by 0 percent.
>
>Chi square distribution for 1638925 samples is 263.72, and randomly
>would exceed this value 50.00 percent of the times.
>
>Arithmetic mean value of data bytes is 127.6365 (127.5 = random).
>Monte Carlo value for Pi is 3.140514142 (error 0.03 percent).
>Serial correlation coefficient is 0.001704 (totally uncorrelated =
>0.0).
The best of the tests is the Chi square distribution: 50% is "random",
0% or 100% is totally non-random. Anything between 10% and 90% is
basically random.
I'd say that rand() is good enough for what we're doing -- the extra
speed is more helpful to the mud than the extra little bit of
randomness that we would get by switching to either of the other
methods. Unless we were doing cryptography that's sensitive to bad
random numbers, I couldn't see any real advantage to switching.
-Ea!

From: Mac
Friday, October 06 2000, 06:10AM
I did my own lil test, from about 50 pretyped scan/cointoss's
to try duplicate the constant delay between random calls as
you'd get in pk. I got runs of 5, 7 and 8, I don't know
if that's unusual in the real world but I had noticed runs
like this in the past, I would even deliberately wait for
my skill delay to pass before executing the same command
again if the previous attempt failed.

From: Danar
Friday, October 06 2000, 03:08PM
The runs you found are statistically unimportant, Mac.

From: Infidel
Friday, October 06 2000, 03:27PM
WTF? statistically unimportant? this isn't being done
for fun but to locate a problem that's been around for
ages, of course when you run a random # generator over
time it'll even out but these runs are significant and
as far as I know, hasn't been checked for.

From: Mac
Friday, October 06 2000, 04:31PM
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm
If you can't reproduce runs of 7 and 8 out within 50 or so
attempts as I did, try another runs test at the same intervals
as the MUDs skill lag. It sure don't seem right to me.

From: Ea!
Friday, October 06 2000, 05:41PM
The program I used to verify the randomness does, in fact, check
for runs (runs are a lack of entropy).
Just to give numbers, the data set I was looking at was about
equivilent to slightly over 7 million coin tosses.
Frankly, we pretty much directly call the standard random function
in Linux. If there's a problem, I'd say that it's over 95% likely
that it's a bug in the Linux libraries and not in the mud code.
The test I ran are pretty convincing mathematically.
-Ea!

From: Ea!
Friday, October 06 2000, 05:54PM
One other thing to mention -- the random number generator is called
so frequently within the code that any runs that you're seeing aren't
going to be proper runs -- for each dieroll, there are tons of calls
in between.
However, the way that the random number generator works, there's a chance
that the random numbers would be in a sequence -- when the list of random
numbers got used up, it would start over again.
Now, the odds of this coinciding with ticks, pulses, or any other
measurment within the code must be astronomically low. However, as there
was a theoretical chance, I made an adjustment to the code to have
it use a different set of random numbers when the random number list is
used up. That's the change to the RNG that went in earlier
today. It now never re-uses the same random seed.
(That's a really simplistic and cut down description of some of the
internals of rand(), quite probably not even the implementation used
on modern computers, but since it was a chance, I figured it was worth
changing.)
-Ea!

From: Craven
Friday, October 06 2000, 09:03PM
well, I don't know if theres a problem with the generator or not,
but from the few times I've helped test things on test muds or
played really early in the monring etc, the fewer people on, the more
things tend to be -normal-. So maybe its like you say, since its
accessed so often by the time you roll your 2nd round your back to the
same place you were last round and you get the same number. I don't
know but it sure seems like the fewer people accessing it, the more
random it is.

From: Mac
Saturday, October 07 2000, 02:09AM
Well, the thing I'm suspicious of is that the random function
isn't based on previous calls but the computers clock alone
meaning it wouldn't matter how many times its called inbetween
and that if you call it at specific intervals the seed used
may have certain characteristics that can produce similar
results to those at the last or even recent intervals.

From: Ea!
Saturday, October 07 2000, 04:51PM
rand() is not based soley on the clock of the computer.
-Ea!

From: Christopher
Tuesday, October 10 2000, 04:34AM
Frankly I wouldn't care if the random numbers were generated
by Kaige picking a number from 1-10.
I've not a problem with number gens.
I've personally seen one of my characters
knock down an opponent every headbutt
and the next opponent (same mob type)
makes me miss every headbutt