
CalTrumpet writes "Our research group recently spoke at Black Hat USA on the topic of cloud computing security. One of the interesting outcomes of our research was the discovery that the combination of virtualization technologies and public system images results in a problem for random number generation on guest operating systems. This is especially true for Linux, since its PRNG uses only a small set of entropy-gathering events, and virtual Linux images often generate SSH host keys within seconds of their initial boot. The slides are available; the PRNG vulnerability material begins at slide 63."

TPM chips have their downsides, but one thing they do offer is a cryptographically secure RNG. It's completely understandable not to trust it 100%, but you can use the random number stream it puts out as a good addition to the /dev/random entropy pool.

Assign all of the virtual servers a unique 256-bit ID. XOR that with 256 bits of input from any USB device that measures the real world, and send it through a hash algorithm. USB devices are easy for virtual servers to access.

Perhaps better: give each server a 256-bit seed as above, but have the host server distribute 256 bits at startup time taken from a microphone and run through a hash algorithm.
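The ID-plus-sensor scheme above can be sketched in a few lines (Python; `os.urandom` stands in for the USB sensor reading, and the ID derivation is made up for illustration). Note that the security rests entirely on the sensor bytes -- XORing in a public ID adds uniqueness, not secrecy:

```python
import hashlib
import os

def derive_seed(vm_id: bytes, sensor_bytes: bytes) -> bytes:
    """XOR a unique 256-bit per-VM ID with 256 bits of sensor input,
    then hash the result down to a 256-bit seed."""
    assert len(vm_id) == 32 and len(sensor_bytes) == 32
    mixed = bytes(a ^ b for a, b in zip(vm_id, sensor_bytes))
    return hashlib.sha256(mixed).digest()

# Hypothetical ID; os.urandom() stands in for the USB measurement.
vm_id = hashlib.sha256(b"vm-instance-0001").digest()
seed = derive_seed(vm_id, os.urandom(32))
print(len(seed))  # 32 bytes = 256 bits
```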

Unless you wanted all of the servers (and all of the VMs on each server) to have *different* entropy sources, which was the whole point of TFA. Unless you run a lot of single-device racks each in their own room you're just going to end up with an expensive way to get exactly the same "random" data on each machine.

There's also some correlation between things like disk activity and the sound output of the machine; there may be some entropy available in the ambient sound -- it may even be chaotic -- but it's certainly limited.

There are plenty of free webcams that will let you look at, say, the Eiffel Tower, or even out at some guy's yard to watch his grass grow. Couldn't those free feeds be used to generate the random number?

First, real-world images are not very random just by virtue of being part of the real world; random things also need to happen in them. This is particularly true of mostly-static images like you'd see in 24/7 web cams -- there is not much entropy available.

Second, most of the reason we want random data for seeding purposes is because the seed needs to be something an attacker cannot derive. The output of a truly random number generator cannot be predicted by a remote attacker, but publicly available video streams most certainly can, so any source that sends the same data to more than one person is not suitable for things like cryptography. Frankly that's the whole point of the article: if there are many VMs on the same host, or many real hosts on the same hardware and network, started at the same time, and using the same source for entropy, they will all generate the same "random" number.

Finally, this is a well-solved problem. Many CPUs and motherboards include a hardware RNG that is perfectly sufficient both in terms of randomness and speed for typical PRNG seeding needs. VIA has had one directly in all their CPUs for a long time, Intel includes one in their firmware hubs, and I'm sure there are similar options on most other architectures. Using that on-board RNG to individually seed each VM/host would solve the problem described in the article. There's no reason to try to invent ways to get random data unless you have very specific requirements not met by the existing solutions, as you're quite likely to come up with something inferior either in design or implementation.
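For what it's worth, on Linux the kernel typically exposes such an on-board RNG as /dev/hwrng. A hedged sketch of seeding from it, falling back to the kernel pool if no hardware device is present on the machine:

```python
import os

def hw_seed(n: int = 32) -> bytes:
    """Read n bytes from the motherboard/CPU RNG via the kernel's
    /dev/hwrng interface, falling back to os.urandom if no hardware
    RNG is available on this machine."""
    try:
        with open("/dev/hwrng", "rb") as f:
            return f.read(n)
    except OSError:
        return os.urandom(n)

print(len(hw_seed()))  # 32
```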

The Eiffel Tower hasn't changed that much in the past 110 years... MIT (IIRC) used to have a random data feed generated from a webcam and a lava lamp. I guess each datacenter ought to invest in a lava lamp...

Correction: after a quick Google, it seems that it was SGI [wired.com], and that they actually patented the thing (which may or may not mean something depending on your jurisdiction). The site was (and apparently still is) lavarnd.org [lavarnd.org].

You don't need a mic. The resistor noise on the sound card inputs is present and of secure quantum origin, regardless of whether a microphone is plugged in. The microphone noise is louder, but it's much harder to determine how much secure entropy is present. Why trust it when you don't have to? There's plenty available for most purposes without it. The Turbid [av8n.com] program does this in an efficient and secure manner (and they have a paper discussing the details, along with the relevant proofs, for the curious).

That's even worse than the microphone idea -- every server and VM within 20 miles of you would have the same source for "random" data.

And that's ignoring the fact that clouds aren't random at all; they change in very well-understood ways given the wind, humidity, temperature, etc. The path, shape and density of clouds can be predicted in general terms a week in advance, and pretty specifically over the course of a few minutes.

Don't forget that each of these techniques varies with the quality of the hardware, as well as its sensitivity and spatial relation to the source of the stimulus. If I put a cheap mic in a high-traffic area (office/street), add some white noise (salt), and limit key generation to traffic hours, you could get a decent seed. Make a large number of salted seeds when you can and salt again per host.

While you are right that using these methods for realtime key generation could be predictable gene

You're assuming that more noise is equivalent to more entropy -- that may or may not be true. People typically walk at a fairly even cadence and speak in a certain frequency range. Traffic noise has a predictable Doppler shift and a fairly well-characterized frequency distribution. And most importantly, the data isn't secret, so someone else could just slap up a second mic next to yours and record the data.

Regardless, it's far from an optimal solution even if you assume that no dedicated hardware RNG is available.

You are correct that more noise does not contribute to better entropy; distribution and change over time do. I only suggested sampling the data at its 'richest' points, which would be during office hours or in areas where people move and talk. Random numbers are just that: how can you say 3 random numbers generated at 3 different times are any less random than a random number generated at a given point in time? The sources of entropy are the most important, and while the hardware you mention is professional grade

They do? It's secure crypto hardware... what's evil about that? Yes, you have scary evil like Palladium, but you don't have to install it if you don't want to. And if machines take control you can always disable the device from the BIOS (given you don't care about any data whose encryption keys are stored only in the module).

Black box testing isn't all that productive for RNGs. You can check distribution and very simple patterns, but beyond that it's a major headache. White box testing makes things much easier. Yay source code!

Most of the RNG chips publish pretty good specifications on the design of their entropy source, the amount of real entropy it provides, and the circumstances in which that entropy level might be reduced. There could be implementation or production errors of course, just like there could be runtime or compiler errors with software, but the design is available for perusal and has been analyzed.

Black box testing isn't all that productive for RNGs. You can check distribution and very simple patterns, but beyond that it's a major headache. White box testing makes things much easier. Yay source code!

Good luck testing the validity of a random generator from source code. This is major Ju-Ju. Generated randomness is deep black majick. It's *waaaay* simpler and more efficient to just test the output over X runs for a very large X.

That's not to say that source code isn't useful to check for glaring mistakes, but if you want to check the validity of an algorithm by looking at it, you'd better be a professional mathematician with specific interests.
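For the record, the crude distribution checks mentioned above are easy to write. Here's a minimal monobit frequency test (Python sketch): passing it says almost nothing about a generator, but failing it is damning:

```python
import os

def monobit_proportion(data: bytes) -> float:
    """Fraction of 1-bits in the stream: should sit very close to
    0.5 for any decent generator over a large sample."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones / (8 * len(data))

print(monobit_proportion(b"\x00" * 100))  # 0.0: a dead source fails instantly
print(abs(monobit_proportion(os.urandom(100_000)) - 0.5) < 0.01)  # True
```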

TPM doesn't buy you anything in a VM - the virtualized environment has to trust the host that it's getting the correct certificates and data. The VM doesn't have direct access to the TPM, and the TPM won't export private keys. Also, there's only one set of keys per TPM, so multiple/portable VMs aren't realistic.
I'm in favor of the tools TPM offers users (vice content producers) but I don't believe this is a good fit.

Well, clearly that "Linux" thing is a toxic gas weapon being used by the reds. Ya, I'd worry about them blowing up a chemical weapon in the clouds. They obviously got the technology from the Nazis (no, not a candidate for Godwin's law).

I don't know about you, but I'm grabbing my M1 Garand and heading down to the shelter under the house. Once that Linux stuff clears, they'd better have thought twice about attackin' my good ol' US of A.

Well, you asked what they would have thought 50 years ago, didn't you? :)

I think of some primitive post-human civilization struggling to industrialize amid the ruins of the heat-dead universe.

There's little solid matter left. Nobody really knows why; the legends tell of ancient, sprawling empires releasing great monsters that consume worlds and deliver energy to fuel their eons-old wars in the cold between the stars. Several human colonies survived the Last Scourge. One even knew something of their people's history. This colony of merchant-scholars thrived in an old space-borne city drifting about a great lightyears-long dust cloud inexplicably left untouched by the wars. The city was old, very old, built by a generation of master engineers who etched their likenesses in the great canvases of the city's impervious white construction. Quiet machinery lurked untouched in the mysterious depths of the undercity, seen only by outcasts wandering alone through those vast echoing chambers.

The city provided everything the civilization needed. Somehow (so much seemed like magic to them that even the usually-curious humans grew bored of speculation) their reservoirs filled with water, their air recycled, and their waste disappeared down bottomless shafts. All of their needs were filled, but they craved expansion and exploration. They were able to harvest some limited chemical energy from the food supplied by the city, and build using scrap. Still, entropy was a problem in the dust cloud of Linux.....

So, I was mostly just giving him shit because of his name.
If you want a more serious debate, here's my best shot:
The instructions you described are all relatively easy to define a generally useful specification for.
My main point was that every application has differing standards of randomness that are required. Do you need real quantum-mechanical randomness, or just a CSPRNG? How many bits of random data do you need, and how frequently? I'm assuming that the request is for real quantum-mechanical randomness. I find it hard to imagine defining a good spec for such hardware component, especially since the vast majority of applications don't actually require quantum-mechanical randomness, and the ones that do are likely to have very specific requirements.
Anyways, besides the fact that it's tough to come up with good requirements for such a feature, I bet it's really tough to implement as well. I know just barely enough about hardware implementations to be dangerous, so someone who knows for real, please correct me if I'm wrong. Anyways, circuits that exhibit quantum-mechanical randomness are, as far as I know, essentially the same as circuits that cause metastability [wikipedia.org] in transistors. Because of the need to control for such problems, implementing such circuits on the same die as a normal digital circuit would likely be very expensive in terms of both die area and yield.

First, the cost of computing truly random numbers is way too high for that, unless you are performing an iterative approach to random number generation (and then you have the problem of predictability). It could be done, but you'd be pumping a lot of hardware into computing values that would be thrown away 99.9%+ of the time.

Secondarily, if your PRNG algorithm is broken, you're stuck replacing the hardware. At least a bad software PRNG can be replaced.

That said, hardware PRNG is provided in many modern systems by a TPM [wikipedia.org]. It lacks the performance problems associated with your solution, since it only generates random numbers on demand. It still has the problem of a potential exploit being discovered leading to expensive hardware upgrades, but to my knowledge that has not been a problem to date.

Why can't the CPU contain a register which holds a random number which is updated with every clock cycle?

First, the cost of computing truly random numbers is way too high for that

Computers are deterministic. Truly random numbers cannot be computed, they can only be provided by special hardware (something that can measure shot noise or thermal noise, a camera pointed at a lava lamp, a movement detector in Schrodinger's cat's box).

That's why you don't do pseudo-random numbers, but real randomness from thermal noise or shot noise or some other quantum effect (cats and lava lamps don't fit on ICs).
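One classic trick for turning a biased-but-independent physical source (thermal noise, shot noise) into unbiased bits is Von Neumann debiasing. A sketch:

```python
def von_neumann(bits):
    """Von Neumann debiasing: read independent-but-biased raw bits
    in pairs; emit 0 for (0,1), 1 for (1,0), and discard (0,0) and
    (1,1). The output is unbiased if the input bits are independent."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # (0,1) -> 0, (1,0) -> 1
    return out

print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0]))  # [0, 1]
```

The price is throughput: at least half the raw bits are thrown away, more if the source is heavily biased.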

That said, hardware PRNG is provided in many modern systems by a TPM.

And at some level, the randomness generator on the TPM almost certainly has an interface of "read this special register every X clock cycles" (because how else would you interface with your special hardware?).

It lacks the performance problems associated with your solution, since it only generates random numbers on demand.

If it's implemented in hardware (as it must be, to get true randomness), it's always running and there is no "on demand".

It still has the problem of a potential exploit being discovered leading to expensive hardware upgrades, but to my knowledge that has not been a problem to date.

That's why you don't do pseudo-random numbers, but real randomness from thermal noise or shot noise or some other quantum effect (cats and lava lamps don't fit on ICs).

A small radiation source/detector, like the ones in smoke detectors, can work just fine for this purpose. Since radiation is the result of quantum interactions, the output is truly random due to the nature of the universe.

Only if you demand perfect randomness (for which there is little practical use in typical computers). And even then "perfect" only means "perfectly preserving randomness" not "correctly detecting every single event/non-event". Given the relative simplicity of a radiation detector "perfect" or some very close equivalent thereto is probably not all that unrealistic anyway.
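A sketch of one interval-comparison extraction scheme for a decay detector (the approach popularized, as I understand it, by services like HotBits): compare successive inter-arrival times and discard ties. Since either ordering of two exponentially distributed intervals is equally likely, the bits come out unbiased:

```python
def bits_from_intervals(intervals):
    """Compare successive pairs of decay inter-arrival times:
    t1 < t2 -> 0, t1 > t2 -> 1, ties discarded. Either ordering is
    equally likely for a Poisson process, so the bits are unbiased."""
    out = []
    for i in range(0, len(intervals) - 1, 2):
        t1, t2 = intervals[i], intervals[i + 1]
        if t1 < t2:
            out.append(0)
        elif t1 > t2:
            out.append(1)
    return out

print(bits_from_intervals([3.1, 5.2, 7.4, 2.2, 4.0, 4.0]))  # [0, 1]
```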

There are CPUs (or more often, chipsets) that provide RNGs, along with a few other hardware implementations of crypto algorithms. Most of them are meant for smaller computers, though, like the VIA C3. I wish they were more widespread and used.

Many do. VIA has had CPU-integrated dual-oscillator hardware RNGs for a long time. Intel firmware hubs also commonly contain a hardware RNG, as do other motherboard architectures.

They aren't very fast sources of random data -- it's actually pretty hard to get truly random data, even outside the world of desktop CPUs -- but they are fast enough to provide a relatively long seed for a PRNG within seconds of boot. Assuming you use a reasonable PRNG, providing a truly random seed is sufficient to let the PRNG generate secure output indefinitely.
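The seed-once-then-generate idea can be illustrated with a toy hash-based generator (Python sketch; a real system should use a vetted DRBG design, not this):

```python
import hashlib

class HashDRBG:
    """Toy generator: SHA-256 over (seed || counter). One truly
    random 256-bit seed yields an indefinite deterministic stream;
    anyone without the seed cannot predict it."""
    def __init__(self, seed: bytes):
        self.seed = seed
        self.counter = 0

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            block = hashlib.sha256(
                self.seed + self.counter.to_bytes(8, "big")).digest()
            out += block
            self.counter += 1
        return out[:n]

a, b = HashDRBG(b"\x01" * 32), HashDRBG(b"\x01" * 32)
print(a.read(16) == b.read(16))  # True: same seed, same stream
```

Which is exactly the vulnerability in reverse: identical seeds on 500 cloned VMs mean identical "random" streams.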

Generating SSH keys involves interaction via at least keyboard and possibly mouse at a terminal. Surely that basic premise is enough to provide enough entropy for the pseudo-random generator. Also, the date and time (as sources of randomness) can't be virtualized, of course.

If you use PuTTY, yes. OpenSSH, at least, doesn't require anything in particular, just a sufficient amount of entropy. On a properly configured system, moving a mouse or banging randomly on the keyboard might feed entropy -- but then, so would plugging a microphone into the sound card, or any number of other things.

And as Kaseijin mentions, this is about host keys. Especially in a virtualized environment, you can't assume any sort of human interaction when these keys are generated.

No, the only true random number is 17. This was asserted by several mathematicians who used several lines of reasoning (one rather like this [flickr.com]). Then you get the random sequence 17,17,17... and the random rational 0.17171717... and lots of other perfectly good random numbers. Though you probably shouldn't use them as a source for cryptographically strong random numbers.

The nice thing about Linux is that you can develop whatever entropy-producing process you want and write its output to /dev/urandom to add more entropy to the pool. For instance, a boot script could issue an HTTP request to a website backed by a hardware random-number generator (with access control limiting it to machines in the cloud by IP range). It is something to be worried about, though.

Java code that does cryptography or generates UUIDs (in the hope that they will be a truly universal key for something) operates under similar problems. JavaScript is even worse; all it has is the time, perhaps the user's window-size (not very random if maximised) and mouse-movements, and the built-in random() method, which is not expected to be of cryptographic quality.

Interesting idea, though I would recommend HTTPS (pre-shared self signed cert would be sufficient for in-house use). If predictability is the problem you're trying to avoid, you want to skirt the packet sniffers.

By the way, why write to /dev/urandom, and not /dev/random? Doesn't /dev/urandom act as a front for /dev/random, except when the entropy pool is empty (at which point it goes pseudo-random)? Just curious.

Actually, /dev/random and /dev/urandom have their own, separate secondary pools that are fed off of a main pool when entropy is "depleted" in the second-level pools. This is an area of research for us as well, since Linux's entropy estimation algorithm fails in situations where the timing deltas of entropy-gathering events (IRQs and disk IOs) are actually predictable, so it's possible that the second-level pools are not being refreshed at appropriate times.

If you write to /dev/urandom, it goes into the primary pool by tradition. This is what the rc scripts do on bootup with the random seed file on disk.
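That boot-time pattern -- writing saved or gathered bytes into the pool -- looks roughly like this (Python sketch; note that a plain write stirs the data in but does NOT credit the kernel's entropy estimate, which requires the RNDADDENTROPY ioctl and root):

```python
import os

def mix_in(data: bytes, pool: str = "/dev/urandom") -> int:
    """Write bytes into the kernel pool. A plain write mixes the
    data into the pool but does not increase the entropy count;
    crediting requires the RNDADDENTROPY ioctl (root only).
    Returns the number of bytes written."""
    fd = os.open(pool, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)

# e.g. at boot: mix_in(open("/var/lib/random-seed", "rb").read())
```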

BTW, it's absolutely the wrong solution to get entropy from another source on the network (for many reasons, but one is that you can't do a secure HTTPS handshake without, you guessed it, unguessable random numbers). The whole point here is that we are looking for a way for 500 Linux instances on EC2 to have different entropy pools before the kernel completes boot. The only possible solution is for the hypervisor (Xen for Amazon) to provide a simulated HW RNG that pulls entropy from a real HW RNG or from an entropy daemon in the hypervisor.

The best way to learn about Linux RNG basics is Gutterman et al.'s Analysis of the Linux Random Number Generator [iacr.org]. Several of the issues they describe have been addressed, such as their PFS concerns, but their description of the entropy pools is still accurate.

BTW, it's absolutely the wrong solution to get entropy from another source on the network (for many reasons, but one is that you can't do a secure HTTPS handshake without, you guessed it, unguessable random numbers). The whole point here is that we are looking for a way for 500 Linux instances on EC2 to have different entropy pools before the kernel completes boot.

If we're talking about a VM, what's wrong with setting up a point-to-point link with the host machine and accessing an entropy source over that?

CONFIG_HW_RANDOM_VIRTIO enables it. It's been there for quite a while.

We could easily support it in KVM but I've held back on it because, to really solve the problem, you would need to use /dev/random as an entropy source. I've always been a bit concerned that one VM could starve another by aggressively consuming entropy.

I guess you could set it as an option, but the threshold between a useful amount of entropy and what it would take to starve another often overlaps, so it wouldn't be much help in any but the most controlled situations--which is exactly when you wouldn't need the option.

I don't understand how entropy consumption is fundamentally different than I/O consumption or memory consumption, or why it would need a different solution to the problem of competing demands for scarce resources.
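Treated as an ordinary scarce resource, a hypothetical per-guest token-bucket quota on entropy reads might look like this (Python sketch; the class name, rates, and byte-denominated tokens are all invented for illustration):

```python
class EntropyQuota:
    """Token bucket: each guest may draw at most `rate` bytes of
    host entropy per second, with bursts up to `burst` bytes, so one
    aggressive consumer cannot starve its neighbors."""
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = 0.0

    def allow(self, nbytes: int, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

q = EntropyQuota(rate=64, burst=128)
print(q.allow(128, now=0.0))  # True: initial burst
print(q.allow(1, now=0.0))    # False: bucket empty
print(q.allow(64, now=1.0))   # True: one second refills 64 tokens
```

Denied requests would simply block or fall back to the guest's own pool, the same way I/O schedulers throttle a noisy neighbor.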

I'd like some evidence that cloud computing is a fad. Tens of thousands of companies, in dozens of industries, do not list "computing hardware, availability, and capacity management" as a core competency, making them prime cloud customers.

It is a tool in the bucket. That's what it is. There will be a huge growth spurt, then they'll realize that it won't solve everything. Then they will cut back and still use it until they find something better.

Seems to me this could be solved via the "Guest Additions" module that most virtualization packages recommend you install in the guest OS. Use the GA to inject some entropy from the host system into the guest system's entropy pool. The host CPU's TSC register would probably be an excellent source.

If you "need" cloud computing, then you're bright enough to install an entropy daemon on one of the machines and maybe even slap a hardware-based RNG on it (probably worth sourcing a VIA or similar just for this purpose, to be honest). It's not hard.

Anything else, your "randomness" really doesn't matter and the standard entropy will be just fine.

A bold assertion. I assume you're thinking of TCP sequence numbers or similar. Otherwise, I call bullshit on the "ANY".

And the entropy provided by being connected to a network in any way, shape or form is enough for that purpose.

Even in general, unless you're generating LOTS of SSH/SSL keys on some kind of automated process schedule, you're fine, and that's the sort of task that should be pushed out to a dedicated entropy machine.

Otherwise, every ADSL router etc. in the WORLD would be worthless - no keyboard or mouse input to provide entropy.

If you "need" cloud computing, then you're bright enough to install an entropy daemon on one of the machines and maybe even slap a hardware-based RNG on it (probably worth sourcing a VIA or similar just for this purpose, to be honest). It's not hard.

Err... yes, it is. Where does your entropy daemon get its entropy from? How do you install the hardware given that you're running in a VM hosted on somebody else's machine, located in somebody else's datacentre? This is an issue that can only be solved by the hypervisor.

"The term cloud computing is useless" said Stamos. "It's way overused. It's mostly about gathering venture capital or selling your products."

Yes. Because no one on the Internet has any use for gathering venture capital or selling products.

It IS an overused term, but you're not testing some product or how people are using it; you're really just testing the security models of various operating systems to determine which are more ready to support the concepts that people grouped together and called "cloud computing". There were a lot of concepts grouped together to comprise the "Web 2.0" label too... and that cliche was just as derided for being overused. And yet, websites that aren't all Ajaxed up or don't use CSS seem pretty old-fashioned these days.

That said, the question I have is how ready for those "cloud computing" concepts is Windows, really? How much of that security model is using the proper approach to securing a transaction instead of just shutting down that path altogether?

This is not a "cloud" problem. This is a virtual server and image problem. Clouds have nothing to do with virtual servers. If you use a service like NewServers.com, you can get dedicated physical servers for your cloud, on-demand and at hourly prices.

While the origin of the issue is the virtualization layer, it is more specifically a cloud problem because most IaaS/VPS providers use standard images with a public random seed file, so everybody's machine initializes up to the same state (RTC + random seed + handful of interrupts).

This is not a "cloud" problem. This is a virtual server and image problem. Clouds have nothing to do with virtual servers. If you use a service like NewServers.com, you can get dedicated physical servers for your cloud, on-demand and at hourly prices.

Expanding on the other answer you've gotten, here's the basic problem:

I can take a virtual server, install an image with a well-known PRNG seed in it, and use it for a little while. While it's used the PRNG is updated by entropy in an unpredictable way, resulting eventually in an unpredictable state.

I did a system wipe and rebuild (re-installing CentOS from scratch) and SSH'ed in and... got no warning. The system's SSH keys were identical to those of the previous build. Needless to say, I generated a local set and uploaded them.

I set up and scanned a number of virtual machines for a network security class Spring of this year. I noticed the following was the typical output of nmap when scanning the virtual host (in this case the VM was Fedora Core 10 hosted on CentOS 5.3 running a 2.6.29 custom kernel):

This is pretty old-hat. First, the host-keys issue inside pre-generated images is a very obvious one, although I'm not too surprised that companies aren't considering it. RNG issues aren't quite as obvious, but they're not super-secret either, anyone with any amount of background in security has been aware of this for a while.

In fact, questions regarding RNGs have even surfaced in the ##xen IRC channel (freenode.net), because it is a very important issue to some. In particular, those with the need for hardware RNG solutions have come seeking assistance.

I'm certainly not minimizing the issue, just noting that it isn't really a new one at all. More than anything, the point is that the average systems administrator has been slow to realize this, and developers even more so.

This problem has been solved: use EntropyBroker [vanheusden.com]: a physical machine gathers entropy data and distributes it to the virtual machines.
If I remember correctly, KVM has a special driver for feeding the VM with entropy data from the host system.

OpenSSL has a cryptographically secure random number generator. I know not everything uses it but doesn't (Open)SSH?

No. By default, OpenSSH will use the system's pseudo-random number generator, but you can also make it use prngd [sourceforge.net] or EGD (the Entropy Gathering Daemon) instead. Whether either is more "secure" than the kernel's built-in RNG, I am not qualified to say.

Like Web 2.0, it has at least one or two specific meanings. The problem is getting specific -- a little knowledge is a dangerous thing, and managers can be very fuzzy (clouded?) about "cloud computing" if they don't understand the difference between calling Gmail a "cloud app" and calling Amazon Web Services a "compute cloud".

is that it doesn't exist. It's a farce, a meaningless buzzword, just like web 2.0.

A more appropriate word would be servers.

You miss the point. We aren't talking about servers, and any ordinary server-provision system wouldn't have the problem highlighted in TFA. We are talking about servers that are initialised on-demand, with a pay-by-the-hour pricing model, so that individual OS installations typically only run for a few hours at a time before being shut down and essentially wiped back to the base install.

Well... I screwed up and got my threading mixed up, so I thought you'd replied to a different post. D'oh.

All I was saying is that, based on recent discussions on xen-devel concerning TSC synchronisation when physical CPUs are scheduled onto virtual CPUs in a VM, the value might as well be a random number for all the use that it is. I assumed you were making a joke about that, but obviously you were replying to a different post than I thought you were.

A book of random numbers is great for statistics. If that's your use there's no need to do anything else, and RAND's book is still a good choice.

But the value of random numbers in things like cryptography is that they are unpredictable. If everyone is using the same list, the numbers are entirely predictable and therefore useless. A typical example is the hybrid cryptosystem used in public-key encryption -- the computer picks a random number for use as the secret key for the shared-secret cypher, and encrypts that key with the recipient's public key.
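The hybrid structure can be sketched with a toy symmetric cipher (Python; a SHA-256 counter keystream stands in for a real cipher, and the public-key wrapping of the session key is elided). The point: if `session_key` is predictable, everything downstream is broken:

```python
import hashlib
import os

def keystream(key: bytes, n: int) -> bytes:
    """Toy SHA-256 counter-mode keystream (a real cipher like AES
    belongs here in practice)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor_crypt(key: bytes, msg: bytes) -> bytes:
    """XOR with the keystream; applying it twice decrypts."""
    return bytes(m ^ k for m, k in zip(msg, keystream(key, len(msg))))

# The public-key step (wrapping session_key for the recipient) is
# elided; what matters here is that session_key is unpredictable.
session_key = os.urandom(32)
ct = xor_crypt(session_key, b"attack at dawn")
print(xor_crypt(session_key, ct))  # b'attack at dawn'
```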

There. Fixed it for you. Works better if the VM server has a high volume entropy source, but even if not it is still pretty damn good.

Except this is somewhat harder to do if you're running a service where you provide virtual machines that run OS images from unknown sources, that could be running basically any OS/distribution the user wishes, with the image using practically any file system that has ever been created.