Build Your Own Simple Random Numbers

Liam O’Connor got me thinking about the best way to explain the idea of a pseudo-random number generator to new programmers. This post is my answer. If you already understand them, there won’t be anything terribly new here. That said, I enjoy clean examples even for easy ideas, so if you do too, then read on!

Note: The title may have caused some confusion. I’m not suggesting you use the trivial algorithms provided here for any purpose. Indeed, they are intentionally over-simplified to make them more understandable. You should read this as an explanation of the idea of how generating random numbers works, and then use the random number generators offered by your operating system or your programming language, which are far better than what’s provided here.

The Problem

Suppose you’re writing a puzzle game, and you need to choose a correct answer. Or suppose you are writing a role-playing game, and need to decide if the knight’s attack hits the dragon or deflects off of its scales. Or you’re writing a tetris game, and you need to decide what shape is going to come next. In all three of these situations, what you really want is a random number. Random numbers aren’t the result of any formula or calculation; they are completely up to chance.

Well, here’s the sad truth of the matter: computers can’t do that. Yes, that’s right. Picking random numbers is one of those tasks that confound even the most powerful of computers. Why? Because computers are calculating machines, and we just said that random numbers aren’t the result of any calculation!

Of course, you’ve probably played games on a computer before that seem to pick numbers at random, so you may not believe me. What you’re seeing, though, aren’t really random numbers at all, but rather pseudo-random numbers. Pseudo-random numbers are actually the result of a mathematical formula, but one designed to be so complicated that it would be hard to recognize any pattern in its results!

Writing a Pseudo Random Number Generator

A lot of smart people actually spend a lot of time on good ways to pick pseudo-random numbers. They try a bunch of different complicated formulas, and try to make sure that patterns don’t pop up. But we can build a simple one pretty easily to pick pseudo-random numbers from 1 to 10. Here it is, in the programming language Haskell:

random i = 7 * i `mod` 11

Since it’s a function, it needs to have an input. It then multiplies that input by 7, and then finds the remainder when dividing by 11. We’ll give it the previous number it picked as input, and it will give us back the next one. Suppose we start at 1. Then we get the following:

Let’s look at the range of answers. Since the answer is always a remainder when dividing by 11, it’ll be somewhere between 0 and 10. But it should be pretty easy to convince ourselves that if the number we give as input is between 1 and 10, then 0 isn’t a possibile answer: if it were, then we’d have found two numbers, both less than 11, that multiply together to give us a multiple of 11. That’s impossible because…. 11 is prime. So we’re guaranteed that this process picks numbers between 1 and 10. It seems to pick them in a non-obvious order with no really obvious patterns, so that’s good. We appear to have at least a good start on generating random numbers.

Notice a couple things:

We had to pick somewhere to start. In this case, we started out by giving an input of 1. That’s called the seed. If you use the same seed, you’ll always get the exact same numbers back! Why? Because it’s really just a complicated math problem, so if you do the same calculation with the same numbers, you’ll get the same result.

To get the next number, we have to remember something (in our case, the last answer) from the previous time. That’s called the state. The state is important, because it’s what makes the process give you different answers each time! If you didn’t remember something from the last time around, then you’d again be doing the same math problem with the same numbers, so you’d get the same answer.

Doing Better By Separating State

Unfortunately, our random number generator has a weakness: you can always predict what’s coming next, based on what came before. If you write tetris using the random number generator from earlier, your player will soon discover that after a line, they always get an L shape, and so on. What you really want is for your game to occasionally send them a line followed by a T, or even pick two lines in a row from time to time!

How do we do this? Well, the next answer that’s coming depends on the state, so our mistake before was to use the previous answer as the state. The solution is to use a state that’s bigger than the answer. We’ll still be looking for random numbers from 1 to 10, but let’s modify the previous random number generator to remember a bigger state. Now, since state and answer are different things, our random function will have two results: a new state, and an answer for this number.

That says take the input, multiply by 7, and find the remainder mod 101. Since 101 is still prime, this will always give answers from 1 to 100. But what we really wanted was a number from 1 to 10, just like the one we had before. That’s fine: we’ll just take the ones place (which is between 0 and 9) and treat 0 as 10. The tens place doesn’t really change the answer at all, but we keep it around to pass back in the next time as state.

Excellent! Now instead of going in a fixed rotation, some numbers are picked several times, and some haven’t been picked yet at all (but they will be, if we keep going), and you can no longer guess what’s coming next just based on the last number you saw. In this random number generator, the seed was still 1, and the state was a number from 1 to 100.

People who are really interested in good random numbers sometimes talk about the period of a pseudo-random number generator. The period is how many numbers it picks before it starts over again and gives you back the same sequence. Our first try had a period of 10, which is rather poor. Our second try did much better: the period was 100. That’s still pretty far off, though, from the random number generators in most computers, the period of which can be in the millions or billions.

Real World Pseudo-Random Number Generators

Our two toy pseudo-random number generators were fun, but you wouldn’t use them in real programs. That’s because operating systems and programming languages already have plenty of ways to generate pseudo-random numbers. And those were created by people who probably have more time to think about random numbers than you do! But some of the same ideas come up there. For example, consider this (specialized) type signature for the random function in the Haskell programming language:

random :: StdGen -> (Int, StdGen)

Look familiar? StdGen is the state, and choosing a random Int gives you back the Int, and a new StdGen that you can use to get more pseudo-random numbers! Many programming languages, including Haskell, also have “global” random number generators that remember their state automatically (in Haskell, that is called randomIO), but under the covers, it all comes down to functions like the ones we’ve written here… except a lot more complex.

Where To Get a Seed

We’ve still left one question unanswered: where does the seed come from? So far, we’ve always been using 1 for the seed, but that means that each time the program runs, it will get the same numbers back. So we end up with a similar situation to what we saw before, where players will realize that a game starts with the same sequence of random events each time.

To solve this problem, the seed should come from somewhere that won’t be the same each time. Here are two different ways to seed a random number generator.

Mostly, pseudo-random number generators are seeded from a clock. Imagine if you looked at the second hand on a clock, used it to get a number from 1 to 60, and used that for your seed. Then the game would only act the same if it started at the same number of seconds. Even better, you could take the number of seconds since some fixed time in the past, so you’d get an even bigger difference in seeds. (Entirely by coincidence, computers often use the number of seconds since January 1, 1970.)

You might try to get a good seed from details of the way the user uses your program. For example, you can look at the exact place the user first clicks the mouse, or exactly how much time passes between pressing keys. They will most likely not be exact, and click a few pixels off or type ever so slightly slower, even if they are trying to do exactly the same thing. So, again, you get a program that acts differently each time. This is called using entropy.

Most of the time, using the computer’s built-in clock is okay. But suppose you’re making up a code word. It would be very bad if someone could guess your code word just by knowing when you picked it! (They would also need to know how your computer or programming language picks random numbers, but that’s not normally kept secret; they can probably find that out pretty easily.) Computer security and privacy often depends on picking unpredictable random numbers — ones that people snooping on you won’t be able to guess. In that case, it’s important that you use some kind of entropy, and not just the clock. In fact, when security is at stake, you can use entropy to modify the state as well, to make sure things don’t get too predictable. Most operating systems have special ways of getting “secure” random numbers that handle this for you.

Another example of entropy:

If you play the game Dragon Warrior for the Nintendo, but use an emulator instead of a real Nintendo, then you can save a snapshot of your game before you fight a monster, memorize what the monsters are going to do, and figure out exactly the right way to respond. When you load the game from the snapshot and try again, as long as you do the same things, the monster will respond in exactly the same way! That’s because the snapshot saves the state of the random number generator, so when you go back and load from the snapshot, the computer picks the same random numbers. So if a fight against a monster is going well but you make a disastrous move at the end, you can load your snapshot and repeat the exact same fight up to that point.

The same trick doesn’t work in Dragon Warrior 2 (or later ones), though! Why not? Because the company that makes the game started using entropy in their sequel. So now little things like exactly how long you wait between pressing buttons will change the game. Since you can’t possibly time everything exactly the same down to hundredths or thousandths of a second, the task is hopeless, and you have to just take your chances and trust to luck.

So as you can see, random numbers can become a very tricky topic. But ultimately it’s all just a complicated formula, a seed, and a state.

Like this:

Related

5 Comments

Concerning video game emulation:
Tool assisted speed runs use the entropy in actions to their advantage. By recording the input and subtling adjusting timing, they can do things like always get a critical hit, or control what items are dropped. See: http://tasvideos.org/901M.html for an example.

If you are wanting a good PRNG that has passed a huge battery of statistical tests, look no further than Mersenne Twister.

However, and an important point this article fails to mention, is that PRNGs are NOT good enough for areas where the security and secrecy of the numbers is critical to proper operation. Generating cryptographic keys for conducting secure communications is a good example. Your starting seed(s) have to come from reliably random sources and each new number can’t be predicted if any of the previous sequence has been compromised and none of the previous sequence should be predictable if the current sequence is compromised. This is the domain of a CSPRNG.

However, finding source code for a CSPRNG is tough. This looks promising though: