Strong Cryptography Using Linux's Random Number Generator

When building secure systems, having a source of random numbers is essential. Without it, most cryptographic systems break down and the privacy and authenticity of communications between two parties can be subverted. We cover why secure systems require random numbers in more detail in this blog post. CloudFlare’s servers require a good source of randomness for authentication and to assure perfect forward secrecy in SSL.

Random numbers are hard to come by on a computer. Internally, computers are deterministic machines that follow instructions and are required to do so in a predictable manner. Uncertainty and unpredictability are not built in: there is no easy way to tell a computer to go flip a coin. Randomness inside a computer has to come from its interactions with the outside world.

Consumer computers and mobile devices have a number of sensors that provide unpredictable input. The timing of keystrokes and mouse movements of a user will have some degree of randomness if measured closely enough. Noise from microphones and cameras can also provide a lot of randomness. Mobile devices have even more sources including fluctuating wifi signals, motion sensor and GPS information.

Most of these sensors are not available on servers where random numbers are needed most. This is especially true for servers that run in virtualized environments that might not have access to a precise system clock. For CloudFlare’s servers, we currently rely on the random number generator built into the Linux operating system.

Linux is one of the most popular operating systems in the world. It serves as the operating system for everything from the web servers and data centers of many of the largest sites in the world (Google, Facebook, Amazon, Apple, etc.), to desktop computers (Ubuntu, Chrome OS, etc.) to embedded devices (smart TVs, Android, etc.). CloudFlare’s software is built on the solid foundation of the Linux operating system kernel.

Linux itself provides a random number service so that any program has access to random numbers at any time. Luckily for us, Linux is open source software and we can learn how it works by reading the code.

Entropy and Randomness

Not all randomness is created equal. For computer security, random numbers need to be hard to guess. The unpredictability of numbers is quantified in a measure called entropy.

A balanced coin toss provides one bit of entropy. An unbalanced coin toss provides less than one bit, since it’s much easier to guess when you know the bias. Flipping a coin with heads on both sides provides no entropy, since the result of a coin toss can be guessed with absolute certainty.
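The coin examples can be checked with the standard Shannon entropy formula. The following is a minimal sketch; the function name coin_entropy_bits and the 0.9 bias are chosen here purely for illustration.

```python
import math

def coin_entropy_bits(p_heads):
    """Shannon entropy, in bits, of one toss of a coin that lands heads with probability p_heads."""
    entropy = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0:  # an outcome that never happens contributes nothing
            entropy -= p * math.log2(p)
    return entropy

print(coin_entropy_bits(0.5))   # balanced coin: 1.0
print(coin_entropy_bits(0.9))   # biased coin: about 0.47
print(coin_entropy_bits(1.0))   # two-headed coin: 0.0
```

A coin that lands heads nine times out of ten still looks like a coin toss, yet each flip carries less than half a bit of entropy.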

Entropy is distinct from statistical randomness. Looking at the statistical properties of a stream of numbers does not guarantee that the stream contains any entropy. For example, the digits of Pi look random by almost any statistical measure, but contain no entropy since there is a well known formula to calculate them and perfectly predict the next value.

Similarly, large numbers do not always have high entropy. You can take a small random number and turn it into a large random number and the entropy remains the same. For example, take a random number from 1 to 16 and compute its cryptographic hash with an algorithm like SHA-1. The resulting 160 bit number looks very random, but it is one of only 16 possible values. Guessing the number is just as easy as guessing a random number from 1 to 16.
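To make the SHA-1 example concrete, here is a small sketch using Python's hashlib. Encoding the number as decimal text before hashing is an assumption made for this illustration, and the helper names are hypothetical.

```python
import hashlib

def hash_small_number(n):
    """Return the SHA-1 digest (hex) of a small integer encoded as decimal text."""
    return hashlib.sha1(str(n).encode("ascii")).hexdigest()

def guess_secret(target_digest):
    """Recover the secret by trying every candidate from 1 to 16."""
    for candidate in range(1, 17):
        if hash_small_number(candidate) == target_digest:
            return candidate
    return None

secret_digest = hash_small_number(11)      # 160 bits that look random
print(len(secret_digest) * 4)              # 160
print(guess_secret(secret_digest))         # 11, found in at most 16 tries
```

The attacker never has to invert SHA-1. Enumerating the 16 possible inputs is enough.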

For a cryptographic key, the amount of entropy used to create it determines how hard it is to guess. A 128 bit key created from a source with 20 bits of entropy is no more secure than a 20 bit key. A good source of entropy is necessary to create secure keys.

Take a Dip in the Pool

On Linux, the root of all randomness is something called the kernel entropy pool. This is a large 4096 bit number kept private in the kernel’s memory. There are 2^4096 possibilities for this number so it can contain up to 4096 bits of entropy depending on how guessable it is.

Numbers derived from this entropy pool will look random to any observer. The more numbers derived from the pool an observer sees, the more information they have about the pool. This makes guessing the entropy pool number easier for an outsider. Deriving random numbers and exposing them to the world therefore reduces the entropy of the pool. Conversely, changing the entropy pool number with a random transformation will make it less guessable and increase its entropy.

This is the key to how random number generation works on Linux. If randomness is needed, it’s derived from the entropy pool. When available, other sources of randomness are used to stir the entropy pool and make it less predictable. I will describe the details of this below.

The kernel keeps a rough estimate of the number of bits of entropy in the pool. You can check the value of this estimate through the following command: cat /proc/sys/kernel/random/entropy_avail.

A healthy Linux system with a lot of entropy available will return a value close to the full 4096 bits of entropy. If the value returned is less than 200, the system is running low on entropy.
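The same check can be done from a program. This is a minimal sketch that reads the file named above. The function name is hypothetical, the threshold of 200 is the rule of thumb from this post, and the code returns None on systems where the file does not exist.

```python
from pathlib import Path

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")

def read_entropy_estimate(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if the file is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None

estimate = read_entropy_estimate()
if estimate is None:
    print("entropy estimate not available on this system")
elif estimate < 200:
    print(f"{estimate} bits: the system is running low on entropy")
else:
    print(f"{estimate} bits available")
```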

The Kernel is Watching You

I mentioned that the system takes other sources of randomness and uses them to stir the entropy pool. This is achieved using something called a timestamp.

Most systems have precise internal clocks. Every time that a user interacts with a system, the value of the clock at that time is recorded as a timestamp. Even though the year, month, day and hour are generally guessable, the millisecond and microsecond are not and therefore the timestamp contains some entropy. Timestamps obtained from the user’s mouse and keyboard, along with timing information from the network and disk, each contain a different amount of entropy.
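As a rough illustration of where the entropy in a timestamp lives, the sketch below records the clock at a few moments and separates the predictable high-order part from the jittery low-order bits. The busy loop is only a stand-in for real keyboard, network or disk events, keeping 8 bits is an arbitrary choice, and the clock resolution varies between platforms.

```python
import time

def low_order_bits(timestamp_ns, bits=8):
    """Keep only the least significant bits of a timestamp, the part that is hard to guess."""
    return timestamp_ns & ((1 << bits) - 1)

def sample_event_timestamps(count):
    """Stand-in for real events: record the clock at `count` moments."""
    samples = []
    for _ in range(count):
        sum(range(1000))                 # a little work whose duration varies slightly
        samples.append(time.time_ns())   # nanoseconds since the epoch
    return samples

for ts in sample_event_timestamps(5):
    print(ts >> 8, low_order_bits(ts))   # predictable high part, jittery low part
```

The high-order part barely changes between samples, which is why only the fine-grained timing is worth anything to the pool.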

How does the entropy found in a timestamp get transferred to the entropy pool? Simple: use math to mix it in.

Just Mix It In

A fundamental property of entropy is that it mixes well. If you take two unrelated random streams and combine them, the new stream cannot have less entropy than either of the originals. Taking a number of low entropy sources and combining them can result in a high entropy source.

That means you can add timing entropy to an entropy pool with the right combination function. One of the simplest such functions is the logical exclusive or (XOR).

Even if one source of bits does not have much entropy, there is no harm in XORing it into another, unrelated source. The entropy of the result never decreases. In the Linux kernel, a combination of XORs is used to mix timestamps into the main entropy pool.
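The effect of XOR on entropy can be worked out for single bits. The sketch below assumes two independent biased bits; the probabilities 0.9 and 0.8 and the function names are chosen only for illustration.

```python
import math

def bit_entropy(p_one):
    """Shannon entropy, in bits, of a single bit that is 1 with probability p_one."""
    return -sum(p * math.log2(p) for p in (p_one, 1.0 - p_one) if p > 0)

def xor_probability(p_a, p_b):
    """Probability that a XOR b is 1 when a and b are independent bits."""
    return p_a * (1.0 - p_b) + (1.0 - p_a) * p_b

p_a, p_b = 0.9, 0.8                       # two independent, biased sources
p_mixed = xor_probability(p_a, p_b)       # 0.9*0.2 + 0.1*0.8 = 0.26
print(bit_entropy(p_a))                   # about 0.47
print(bit_entropy(p_b))                   # about 0.72
print(bit_entropy(p_mixed))               # about 0.83, higher than either input
```

If one of the inputs is completely predictable (probability 0 or 1), the XOR simply passes the other input's entropy through unchanged. The calculation relies on the two bits being independent; XORing a stream with a copy of itself would cancel it out.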

Generating Random Numbers

Cryptographic applications require very high entropy. If a 128 bit key is generated with only 64 bits of entropy then it can be guessed in 2^64 attempts instead of 2^128 attempts. That is the difference between needing a thousand computers running for a few years to brute force the key versus needing all the computers ever created running for longer than the history of the universe to do so.
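The gap between 2^64 and 2^128 is easier to appreciate with numbers. This back-of-the-envelope sketch assumes 1,000 machines each testing 100 million keys per second. Both figures are assumptions picked for illustration, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
AGE_OF_UNIVERSE_YEARS = 1.38e10          # roughly 13.8 billion years

def years_to_search(entropy_bits, machines, guesses_per_second):
    """Years needed to try every one of 2**entropy_bits keys."""
    total_guesses = 2 ** entropy_bits
    seconds = total_guesses / (machines * guesses_per_second)
    return seconds / SECONDS_PER_YEAR

# Assumed figures: 1,000 machines, each testing 100 million keys per second.
print(years_to_search(64, 1_000, 1e8))    # roughly 5.8 years
print(years_to_search(128, 1_000, 1e8))   # roughly 1e20 years
```

Under these assumptions, the 128 bit search takes billions of times longer than the age of the universe, so adding more machines does not change the conclusion.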

Cryptographic applications require close to one bit of entropy per bit. If the system’s pool has fewer than 4096 bits of entropy, how does the system return a fully random number? One way to do this is to use a cryptographic hash function.

A cryptographic hash function takes an input of any size and outputs a fixed size number. Changing one bit of the input will change the output completely. Hash functions are good at mixing things together. This mixing property spreads the entropy from the input evenly through the output. If the input has more bits of entropy than the size of the output, the output will be highly random. This is how highly entropic random numbers are derived from the entropy pool.

The hash function used by the Linux kernel is the standard SHA-1 cryptographic hash. By hashing the entire pool and applying some additional arithmetic, 160 random bits are created for use by the system. When this happens, the system lowers its estimate of the entropy in the pool accordingly.
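The idea of hashing the pool and lowering the estimate can be sketched with a toy model. This is not the kernel's algorithm: the ToyPool class, the use of SHAKE-256 to fill and stir the state, and the fixed seed are simplifications invented for this example. Only the 4096 bit pool size and the 160 bit SHA-1 output come from the description above.

```python
import hashlib

class ToyPool:
    """A toy entropy pool: 512 bytes (4096 bits) of state plus an entropy estimate."""

    def __init__(self, seed, estimate_bits):
        # Stretch the seed to fill the pool. A real pool is filled by mixing in events.
        self.state = hashlib.shake_256(seed).digest(512)
        self.estimate_bits = estimate_bits

    def extract(self):
        """Hash the whole pool, stir the hash back in, and return 160 output bits."""
        digest = hashlib.sha1(self.state).digest()          # 20 bytes = 160 bits
        # Feed the digest back so the next extraction sees a different pool.
        self.state = hashlib.shake_256(self.state + digest).digest(512)
        self.estimate_bits = max(0, self.estimate_bits - 160)
        return digest

pool = ToyPool(b"example seed, not secret", 4096)
first, second = pool.extract(), pool.extract()
print(len(first) * 8)        # 160
print(first != second)       # True
print(pool.estimate_bits)    # 3776
```

Each extraction hands out 160 bits and debits the estimate by the same amount, which is why a busy system can drain the pool faster than events refill it.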

Running Out of Entropy

One of the dangers a system faces is running out of entropy. When the system’s entropy estimate drops to around the 160 bit level, the length of a SHA-1 hash, things get tricky.

Linux exposes two interfaces for random data that behave differently when the entropy level is low. They are /dev/random and /dev/urandom. When the entropy pool becomes predictable, both interfaces for requesting random numbers become problematic.

When the entropy level is too low, /dev/random blocks and does not return until the level of entropy in the system is high enough. This guarantees high entropy random numbers. If /dev/random is used in a time-critical service and the system runs low on entropy, the delays could be detrimental to the quality of service.

On the other hand, /dev/urandom does not block. It continues to return the hashed value of its entropy pool even though there is little to no entropy in it. This low-entropy data is not suited for cryptographic use.
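From a program's point of view, both interfaces are read like ordinary files. The sketch below reads from /dev/urandom only, since a read from /dev/random could block. The helper name is hypothetical, and the fallback to os.urandom is an addition so the example also runs on systems without the device file.

```python
import os

def read_urandom(num_bytes, device="/dev/urandom"):
    """Read num_bytes from the non-blocking device; fall back to os.urandom elsewhere."""
    try:
        with open(device, "rb") as source:
            return source.read(num_bytes)
    except OSError:
        return os.urandom(num_bytes)

key = read_urandom(16)      # 16 bytes = 128 bits for a key
print(key.hex())
```

The call returns immediately whatever the entropy estimate is, which is exactly the trade-off described above.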

The solution to the problem is to simply add more entropy into the system.

Hardware Random Number Generation to the Rescue?

Intel’s Ivy Bridge family of processors has an interesting feature called “secure key”. These processors contain a special piece of hardware that generates random numbers. The single assembly instruction RDRAND returns allegedly high entropy random data derived on the chip.

It has been suggested that Intel’s hardware random number generator may not be fully random. Since it is baked into the silicon, that assertion is hard to audit and verify. As it turns out, even if the numbers generated have some bias, they can still help as long as this is not the only source of randomness in the system. Even if the random number generator itself has a back door, the mixing property of randomness means that it cannot lower the amount of entropy in the pool.

On Linux, if a hardware random number generator is present, the Linux kernel will use the XOR function to mix the output of RDRAND into the hash of the entropy pool.

Third-Party Entropy Generators

Hardware random number generation is not available everywhere, and the sources of randomness polled by the Linux kernel itself are somewhat limited. For this situation, a number of third-party random number generation tools exist. Examples of these are haveged (http://www.issihosts.com/haveged/), which relies on processor cache timing, and audio-entropyd (http://www.vanheusden.com/aed/) and video-entropyd (http://www.vanheusden.com/ved/), which work by sampling the noise from an external audio or video input device. By mixing these additional sources of locally collected entropy into the Linux entropy pool, the entropy can only go up.

A Diversity of Sources

The main thing to understand is that better randomness comes through diversity. Taking a variety of sources of random data and mixing them together results in better random numbers. For servers, this should include data local to the machine (hardware random number generator, network timing) along with sources derived externally in a safe location.

Looking Ahead

In addition to the sources described above, there are many other sources of random numbers to be harvested. These include lava lamps, space noise and the quantum properties of light. CloudFlare is working on a system to provide high quality random numbers to all of our servers by adding new sources into the system Linux currently provides. As these systems come online over the coming months, we will share the details with the community.