A while back, a couple of BitFolk customers mentioned to me that they were having problems running out of entropy.

A brief explanation of entropy as it relates to computing

Where we say entropy, we could in layman’s terms say “randomness”. Computers need entropy for a lot of things, particularly cryptographic operations. You may not think that you do a lot of cryptography on your computer, and you personally probably don’t, but for example every time you visit a secure web site (https://…) your computer has to set up a cryptographic channel with the server. Cryptographic algorithms generally require a lot of random data and it has to be secure random data. For the purposes of this discussion, “secure” means that an attacker shouldn’t be able to guess or influence what the random data is.

Why would an attacker be able to guess or influence the random data if it is actually random? Because it’s not actually random. The computer has to get the data from somewhere. A lot of places it might be programmed to get it from may seem random but potentially aren’t. A silly implementation might just use the number of seconds the computer has been running as a basis for generating “random” numbers, but you can see that an attacker can guess this and may even be able to influence it, which could weaken any cryptographic algorithm that uses the “random” data.

Modern computers and operating systems generate entropy based on events like electrical noise, timings of data coming into the computer over the network, what’s going on with the disks, etc. fed into algorithms — what we call pseudo-random number generators (PRNGs). A lot of data goes in and a relatively small amount of entropy comes out, but it’s entropy you should be able to trust.

That works reasonably well for conventional computers and servers, but it doesn’t work so well for virtual servers. Virtual servers are running in an emulated environment, with very little access to “real” hardware. The random data that conventional computers get from their hardware doesn’t happen with emulated virtual hardware, so the prime source of entropy just isn’t present.

When you have an application that wants some entropy and the system has no more entropy to give, what usually happens is that the application blocks, doing nothing, until the system can supply some more entropy. Linux systems have two ways for applications to request entropy: there’s /dev/random and /dev/urandom. random is the high-quality one. When it runs out, it blocks until there is more available. urandom will supply high-quality entropy until it runs out, then it will generate more programmatically, so it doesn’t block, but it might not be as secure as random. I’m vastly simplifying how these interfaces work, but that’s the basic gist of it.

What to do when there’s no more entropy?

If you’re running applications that want a lot of high-quality entropy, and your system keeps running out, there’s a few things you could do about it.

Nothing

Well, no, not really. If you’re running a busy site with a lot of HTTPS connections then you probably don’t want it to be waiting around for more entropy when it could be serving your users. Another one that tends to use all the entropy is secure email – mail servers talking to each other using Transport Layer Security so the email is encrypted on the wire.

Use real hosting hardware

Most of BitFolk’s customers are using it for personal hosting, this problem is common to virtual hosting platforms (it’s not a BitFolk-specific issue), and BitFolk doesn’t provide dedicated/colo servers, so arguably I don’t need to consider this my problem to fix. If the customer could justify greater expense then they could move to a dedicated server or colo provider to host their stuff.

Tell the software to use urandom instead

In a lot of cases it’s possible to tell the applications to use urandom instead. Since urandom doesn’t block, but instead generates more lower-quality entropy on demand, there shouldn’t be a performance problem. There are obvious downsides to this:

If the application author wanted high-quality entropy, it might be unwise to not respect that.

Altering this may not be as simple as changing its configuration. You might find yourself having to recompile the software, which is a lot of extra work.

You could force this system-wide by replacing your /dev/random with /dev/urandom.

Customers could get some more entropy from somewhere else

It’s possible to feed your own data into your system’s pseudo-random number generator, so if you have a good source of entropy you can help yourself. People have used some weird and wonderful things for entropy sources. Some examples:

The problem for BitFolk customers of course is that all they have is a virtual server. They can’t attach web cams and sound cards to their servers! If they had real servers then they probably wouldn’t be having this issue at all.

BitFolk could get some entropy from somewhere else, and serve it to customers

BitFolk has the real servers, so I could do the above to get some extra entropy. I might not even need extra entropy; I could just serve the entropy that the real machines have. If it wasn’t for the existence of the Simtec Electronics Entropy Key then that’s probably what I’d be trying.

I haven’t got time to be playing about with sound cards listening to static, webcams in boxes and things like that, but buying a relatively cheap little gadget is well within the limit of things I’m prepared to risk wasting money on.

Customers would need to trust my entropy, of course. They already need to trust a lot of other things that I do though.

Entropy Key

Entropy Keys are very interesting little gadgets and I encourage you to read about how they work. It’s all a bit beyond me though, so for the purposes of this series of blog posts I’ll just take it as read that you plug in an Entropy Key into a USB port, run ekeyd and it feeds high quality entropy into your PRNG.

I’d been watching the development of the Entropy Key with interest. When they were offered for cheap at the Debian-UK BBQ in 2009 I was sorely tempted, but I knew I wasn’t going to be able to attend, so I left it.

Then earlier this year, James at Jump happened to mention that he was doing a bulk order (I assume to fix this same issue for his own VPS customers) if anyone wanted in. Between the Debian BBQ and then I’d had a few more complaints about people running out of entropy so at ~£30 each I was thinking it was definitely worth exploring with one of them; perhaps buy more if it works.

How much entropy do I have anyway?

Before stuffing more entropy in to my systems, I was curious how much I had available anyway. On Linux you can check this by looking at /proc/sys/kernel/random/entropy_avail. I think this value is in bytes, and tops out at 4096. Not hard to plug this in to your graphing system.

Click on the following images to see the full-size versions.

Typical host server, no Entropy Key

Here’s what some typical BitFolk VM hosting servers have in terms of available entropy.

That’s pretty good. The available entropy hovers close to 4096 bytes all the time. It’s what you’d expect from a typical piece of computer hardware. The weekly view shows the small jitter:

The lighter pink area is the highest 5-minute reading in each 30 minute sample. The dark line is the lowest 5-minute reading. You can see that there is a small amount of jitter where the available entropy fluctuates between about 3250 and 4096 bytes.

Here’s a couple of the other host servers just to see the pattern:

No surprises here; they’re all much the same. If these were the only machines I was using then I’d probably decide that I have enough entropy.

Typical general purpose Xen-based paravirtualised virtual machine

Here’s a typical general purpose BitFolk VPS. It’s doing some crypto stuff, but there’s a good mix of every type of workload here.

These graphs are very different. There’s much more jitter and a general lack of entropy to begin with. Still, it never appears to reach zero (although it’s important to realise that these graphs are at best 5-minute averages, so the minimum and maximum values will be lower and higher within that 5-minute span) so there doesn’t seem to be a huge problem here.

Virtual machines with more crypto

Here’s a couple of VMs which are doing more SSL work.

This one has a fair number of web visitors and they’re all HTTPS. You can see that it’s even more jittery, and spends most of its time with less than 1024 bytes of entropy available. It goes as low as ~140 bytes from time to time, and because of the 5-minute sampling it’s possible that it does run out.

Again, this one has some HTTPS traffic and is faring worse for entropy, with an average of only ~470 bytes available. I ran a check every second for several hours and available entropy at times was as low as 133 bytes.

Summary so far

BitFolk doesn’t have any particularly busy crypto-heavy VMs so the above was the best I could do. I think that I’ve shown that virtual machines do have less entropy generally available, and that a moderate amount of crypto work can come close to draining it.

Based on the above results I probably wouldn’t personally take any action since it seems none of my own VMs run out of entropy, although I am unsure if the 133 bytes I measured was merely as low as the pool is allowed to go before blocking happens. In any case, I am not really noticing poor performance.

Customers have reported running out of entropy though, so it might still be something I can fix, for them.

Where next?

Next:

See what effect using an Entropy Key has on a machine’s available entropy.

Assuming it has a positive effect, see if I can serve this entropy to other machines, particularly virtual ones.

Can I serve it from a virtual machine, so I don’t have customers interacting with my real hosts?

Does one Entropy Key give enough entropy for everyone that wants it?

Can I add extra keys and serve their entropy in a highly-available fashion?

Those are the things I’ll be looking into and will blog some more about in later parts. This isn’t high priority though so it might take a while. In the meantime, if you’re a BitFolk customer who actually is experiencing entropy exhaustion in a repeatable fashion then it’d be great if you could get in touch with me so we can see if it can be fixed.

Good point, I forgot about that. I think that’s because a certain amount of entropy is required to create a process.

With 0.25 seconds between “cat” processes it made the available entropy jump around a lot but not really run out. I ran it with 0.1 seconds and I was able to bring available entropy down to hover around 0.

I have an ekey plugged into a machine now and when I tried it on that one, it stayed mostly around 4096. I saw it drop to 3084 for a fraction of a second before it went back up again. That’s encouraging, if nothing else.

At ~0430 UTC I shut down the ekeyd so it wasn’t getting any entropy from the ekey. I don’t yet understand why the available entropy dropped to ~400 bytes. Unfortunately I have no graphs of this machine from before the ekey was ever plugged in. It could be the normal state for it to have only ~400 bytes available, although this would make it unique amongst my host machines.

In the next hour that graph should show a little dip, which is where I ran the “watch 0.1 cat …”.

mksh now has a cat(1) builtin in the current version (not yet released)… so no process creation You could of course put it into some self-contained dæmon – which can use the ioctl instead of procfs, too.

Firstly, thanks for this great writeup, very useful. I manage a number of non-virtualised servers that nonetheless suffer from severe entropy-starvation. It may be because they are not very highly loaded IO-wise or because I try to be privacy conscious and encourage all my clients to use SSH/SSL/TLS as much as possible. But anyway, I needed a solution. The end result is that I’ve set up an OpenVPN network across all my servers and distribute entropy over EGD from one of them equipped with an EntropyKey. This solves the unencrypted-TCP problem, and with topped up entropy pools on all machines, OpenVPN should be very secure

On a side note, the default entropy pool on linux currently is 4096 *bits*, not even bytes. That seems very limited, but of course the size of this buffer is only relevant if you have spikes in your entropy demand (or dips in its generation).

[…] Modern computers and operating systems generate entropy based on events like electrical noise, timings of data coming into the computer over the network, what’s going on with the disks, etc. fed into algorithms — what we call pseudo-random number generators (PRNGs). A lot of data goes in and a relatively small amount of entropy comes out, but it’s entropy. […]