Piercing a key defense found in cloud environments such as Amazon's EC2 service, scientists have devised a virtual machine that can extract private cryptographic keys stored on a separate virtual machine when the two reside on the same piece of hardware.

The technique, unveiled in a research paper published by computer scientists from the University of North Carolina, the University of Wisconsin, and RSA Laboratories, took several hours to recover the private key for a 4096-bit ElGamal-generated public key using the libgcrypt v.1.5.0 cryptographic library. The attack relied on "side-channel analysis," in which attackers crack a private key by studying the electromagnetic emanations, data caches, or other manifestations of the targeted cryptographic system.

One of the chief selling points of virtual machines is their ability to run a variety of tasks on a single computer rather than relying on a separate machine to run each one. Adding to the allure, engineers have long praised the ability of virtual machines to isolate separate tasks, so one can't eavesdrop or tamper with the other. Relying on fine-grained access control mechanisms that allow each task to run in its own secure environment, virtual machines have long been considered a safer alternative for cloud services that cater to the rigorous security requirements of multiple customers.

"In this paper, we present the development and application of a cross-VM side-channel attack in exactly such an environment," the scientists wrote. "Like many attacks before, ours is an access-driven attack in which the attacker VM alternates execution with the victim VM and leverages processor caches to observe behavior of the victim."

The attack extracted an ElGamal decryption key that was stored on a VM running the open-source GNU Privacy Guard. The code that leaked the tell-tale details to the malicious VM is the latest version of the widely used libgcrypt, although earlier releases are also vulnerable. The scientists focused specifically on the Xen hypervisor, which is used by services such as EC2. The attack worked only when both attacker and target VMs were running on the same physical hardware. That requirement could make it harder for an attacker to target a specific individual or organization using a public cloud service. Even so, it seems feasible that attackers could use the technique to probe a given machine and possibly mine cryptographic keys stored on it.

The attacker VM first fills the cache with its own instructions, then gives up execution and hopes that the target VM will run next on the same core, and moreover that the target is in the process of running the square-and-multiply operation. If it is, the target will cause a few cache-line-sized blocks of the attacker's instructions to be evicted from the cache. Which blocks are evicted is highly dependent on the operations that the target conducts.
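A toy sketch of one such round may help. This is a made-up cache model in Python, not real timing code; actual attacks infer eviction from memory-access latency rather than inspecting the cache directly:

```python
# Toy model of one prime+probe round. Real attacks infer eviction from
# access latency; this sketch only tracks which attacker lines the
# victim displaces, using a pretend direct-mapped cache with 8 sets.
NUM_SETS = 8

def prime(cache):
    # Attacker fills every cache set with its own lines.
    for s in range(NUM_SETS):
        cache[s] = 'attacker'

def victim_runs(cache, touched_sets):
    # The victim's execution evicts attacker lines in the sets it touches.
    for s in touched_sets:
        cache[s] = 'victim'

def probe(cache):
    # Attacker re-reads its lines; in a real attack, a slow reload
    # corresponds to a set where its line is gone.
    return sorted(s for s in range(NUM_SETS) if cache[s] != 'attacker')

cache = {}
prime(cache)
victim_runs(cache, {1, 4, 6})  # which sets depends on the victim's code path
print(probe(cache))  # -> [1, 4, 6]
```

The attacker learns nothing about the victim's data directly; it only sees which of its own lines were displaced, and that pattern is what encodes the victim's activity.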

The technique allows attackers to acquire fragments of the cryptographic "square-and-multiply" operation carried out by the target VM. The process can be difficult, since some of the fragments can contain errors that have the effect of throwing off an attacker trying to guess the contents of a secret key. To get around this limitation, the attack compares thousands of fragments to identify those with errors. The scientists then stitched together enough reliable fragments to deduce the decryption key.
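For the curious, here is a minimal Python sketch of textbook square-and-multiply, with a hypothetical `trace` list standing in for the side channel. The operation sequence maps one-to-one onto the exponent's bits, which is exactly why the fragments are worth stealing:

```python
def square_and_multiply(base, exp, mod):
    # Textbook left-to-right square-and-multiply. The 'trace' list is a
    # stand-in for the side channel: a 0 bit costs a Square, a 1 bit a
    # Square followed by a Multiply.
    result, trace = 1, []
    for bit in bin(exp)[2:]:
        result = (result * result) % mod       # square (every bit)
        if bit == '1':
            result = (result * base) % mod     # multiply (1 bits only)
        trace.append('SM' if bit == '1' else 'S')
    return result, trace

val, trace = square_and_multiply(7, 0b101101, 1000003)
# Reading the trace back recovers the exponent bit-for-bit:
recovered = int(''.join('1' if op == 'SM' else '0' for op in trace), 2)
assert recovered == 0b101101 and val == pow(7, 0b101101, 1000003)
```

In the real attack the trace arrives in short, noisy fragments, which is why the error-filtering and stitching steps described above are needed.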

The researchers say it's the first demonstration of a successful side-channel attack on a virtualized, multicore server. Their paper lists a few countermeasures administrators can take to stop the key leakage. One is to avoid co-residency altogether and instead use a separate, "air-gapped" computer for high-security tasks. Two additional countermeasures are the use of side-channel-resistant algorithms and core scheduling that prevents attacker VMs from sharing, and thereby observing, the cache behavior of another virtual machine. Future releases of Xen already include plans to modify the way so-called processor "interrupts" are handled.
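To illustrate what a side-channel-resistant algorithm looks like, here is a hedged Python sketch of Montgomery-ladder exponentiation: every bit triggers the same multiply/square pair, so the operation schedule no longer encodes the key. (Real constant-time code must also eliminate the data-dependent branch shown here; this only illustrates the uniform schedule.)

```python
def ladder_pow(base, exp, mod, bits):
    # Montgomery-ladder exponentiation: every iteration performs one
    # multiply and one square regardless of the key bit, so the
    # square/multiply sequence observed by a cache spy is identical
    # for every exponent of the same length.
    r0, r1 = 1, base % mod
    for i in reversed(range(bits)):
        if (exp >> i) & 1:
            r0, r1 = (r0 * r1) % mod, (r1 * r1) % mod
        else:
            r1, r0 = (r0 * r1) % mod, (r0 * r0) % mod
    return r0

assert ladder_pow(7, 0b101101, 1000003, 6) == pow(7, 0b101101, 1000003)
```

Contrast this with the textbook routine the attack targets, where a multiply happens only on 1 bits and the trace spells out the key.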

While the scope of the attack remains limited, the research is important because it opens the door to more practical attacks in the future.

"This threat has long been discussed, and security people generally agree that it's a concern," Matthew Green, a cryptographer at Johns Hopkins University, wrote. "But actually implementing such an attack has proven surprisingly difficult."

Promoted Comments

I get up in the morning, look at myself in the mirror, and think "Hey there, you are a pretty bright person."

Then I read about the people figuring out these kinds of attacks/issues and I think "Wow, I'm amazed I have enough intelligence to tie my own shoes."

I am absolutely flabbergasted at the intelligence of some of these people (and the occasional Ars commenter). And I *really* appreciate these kinds of articles that take such a complex and fascinating concept and explain it in a way that I can, at least in a rudimentary way, understand.

How is it possible for a virtual machine to get a read on electromagnetic emanations? Isn't that something you'd need special hardware for? Or do I misunderstand it completely?

That's just an example of side channels in general, not of this particular one, which measures cache eviction. In particular a very famous side channel is the very clear electromagnetic signal that a monitor gives off which varies with the pattern it is displaying. Find an analogue AM radio and an old monitor (the older the louder, generally). Scroll through the stations slowly as you minimize and maximize a window and you will hear distinct fizzles and pops. This is NOT only a problem in theory: you can play music this way! http://www.erikyyy.de/tempest/

Side channels. It's always the freaking side channels. Being able to recover a 4kbit key in less than a day is Serious Business.

Very interesting piece. The whole question of whether logically separated systems provide enough security is not a new one, but in this economic climate the temptation to reduce IT costs and move to virtualization or fully into the cloud is perhaps stronger. Research of this kind gives an important extra perspective to that decision.

Very interesting work. For something like AWS, this definitely argues for having crypto performed on dedicated residency systems – it'd be extremely interesting to know whether ELBs are cohosted with user VMs – but the writing on the wall is clear long term: the entire crypto stack needs to migrate to time-invariant algorithms to avoid the many varieties of timing attacks researchers have proven.

I do wonder whether we'll see cloud providers offering hardware crypto modules. It'd be an interesting risk shift, but I very much like the idea of using private keys which never leave an HSM.

I am amazed at the ways people figure out how to gain secrets. A completely secure system that has network access can't possibly exist.

It can; the problem is that the difficulty of creating that security increases exponentially as you increase the number of available interactions. For example, if all the machine does is receive a series of "yes" or "no" inputs, it is extremely easy to secure. If, OTOH, it is storing databases, delivering up webpages, hosting virtual servers, or accessing remote data, it suddenly becomes considerably harder to secure, as the range of valid interactions becomes considerably larger, and so does the possible space for exploits.

In other words, it is very easy to secure even a networked computer, but it is very hard to do anything useful with a secure computer or to secure a useful computer.

Quote:

In other words, it is very easy to secure even a networked computer, but it is very hard to do anything useful with a secure computer or to secure a useful computer.

Correct me if I'm wrong, but couldn't this analogy apply to physical security (especially in the post-9/11 USA) as well? It becomes a balancing act of security hassle vs. functionality. If a super-secure system (digital or physical) is devised, how usable (for everyday tasks) is it?

Great to see Ars doing a write-up on this - thanks, Dan. Really interesting stuff, and fascinating to see it working in practice - since clearly the news isn't that these kind of side-channel attacks are possible but that they can be practical.

I feel compelled to add a couple of things from the original post, though. Rather than re-word, I'll just quote wholesale:

Quote:

First: there's a reason these researchers did this with libgcrypt and Elgamal, and not, say OpenSSL and RSA (which would be a whole lot more useful). That's because libgcrypt's Elgamal implementation is the cryptographic equivalent of a 1984 Stanley lawnmower engine -- it uses textbook square-and-multiply with no ugly optimizations to get in the way. OpenSSL RSA decryption, on the other hand, is more like a 2012 Audi turbo-diesel: it uses windowing and CRT and blinding and two different types of multiplication, all of which make it a real pain in the butt to deal with.

Secondly, this attack requires a perfect set of conditions. As proposed it works only with two VMs, and requires specific training on the target hardware. This doesn't mean that the attack isn't viable (especially since cloud services probably do use lots of identical hardware), but it does mean that messiness -- the kind you get in real cloud deployments -- is going to be more of an obstacle than it seems.

My completely unprofessional opinion is that of the two, the second is likely to be harder for an attacker to navigate, and I'd be interested to see this pursued further by the researchers. If you're on the same machine as ten others, does the signal-to-noise ratio increase by a factor of ten? My instinct is to say yes, but I wouldn't be completely surprised if there's some way of reducing the odds in that situation.
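As an aside, the "blinding" Green lists among OpenSSL's complications is easy to sketch. The toy key and helper below are illustrative only, not OpenSSL's actual code: the ciphertext is multiplied by r^e before the private-key exponentiation, so the secret operation runs on a randomized value.

```python
import math
import random

def blinded_decrypt(c, d, n, e, rng):
    # Blind: pick random r, multiply ciphertext by r^e mod n.
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    blinded = (c * pow(r, e, n)) % n
    # The secret-key exponentiation now sees a randomized input,
    # decorrelating its timing/cache trace from the real ciphertext.
    m_blinded = pow(blinded, d, n)
    # Unblind: m_blinded == m * r (mod n), so strip r off again.
    return (m_blinded * pow(r, -1, n)) % n

# Textbook toy key (p=61, q=53): n=3233, e=17, d=2753. Toy numbers only!
n, e, d = 3233, 17, 2753
m = 123
c = pow(m, e, n)
print(blinded_decrypt(c, d, n, e, random.Random(7)))  # -> 123
```

Because the attacker can no longer correlate what it observes with a ciphertext it chose, chosen-input side-channel strategies get much harder.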

It'd be interesting to see these techniques applied to non-x86 systems (like true mainframes) and see what results they produce. Reading through the paper, some parts seem to be very x86-dependent.

The other thing that stands out is that the attack depends on flexible hardware allocation between VMs. An exclusive lock of a VM to selected processors would mitigate most of the risk. As pointed out by the paper, disabling SMT (Hyper-Threading) logically reduces the attack vector. I'm kinda surprised at how much changing the clock speed also affects this attack.

Not mentioned in the paper is how this attack would work with multiple instances running in a cluster, in particular the scenario of one attacker VM encountering another attacker VM on the same hardware.

Quote:

Correct me if I'm wrong, but couldn't this analogy apply to physical security (especially in the post-9/11 USA) as well? It becomes a balancing act of security hassle vs. functionality. If a super-secure system (digital or physical) is devised, how usable (for everyday tasks) is it?

A super-secure system will never have functionality beyond possibly storage or mundane tasks. The catch is that computers which have a lot of functionality, making them the most vulnerable, are the ones that have network access, do online banking, store personal/ professional information, and are more than likely attached to users who haven't the slightest clue how vulnerable they make themselves.

The more complex fix would be to upgrade the CPU to be more protective of its cache, unless you can somehow do that in the hypervisor.

I do not believe the CPU will allow the contents of the cache to be read back out. The impression I got is they are using timing measurements to determine which of their pages needed to be reloaded into the cache, which in turn tells them something about the task that was running previously.

This is a difficult thing to prevent, short of turning off the CPU cache (which decimates performance). The only other thing I can think of would be to prevent the VM from knowing how many clock cycles a task takes, or any other high performance timing. Which can cause other problems, I am sure.

Skimming the analysis linked to, it appears this attack may not be as useful as it seems here.

Quote:

Really interesting stuff, and fascinating to see it working in practice - since clearly the news isn't that these kind of side-channel attacks are possible but that they can be practical.

I'm not really sure that breaking a known-bad crypto implementation in a hand-picked setup necessarily rises to the level of "practical."

Quote:

My completely unprofessional opinion is that of the two, the second is likely to be harder for an attacker to navigate, and I'd be interested to see this pursued further by the researchers. If you're on the same machine as ten others, does the signal-to-noise ratio increase by a factor of ten? My instinct is to say yes, but I wouldn't be completely surprised if there's some way of reducing the odds in that situation.

There has been lots of research into de-noising side channel signals, it's really just a matter of more samples/processing. Reducing signal-to-noise by a factor of 1000 (at least for this attack, which has a very strong signal) would only be a minor impediment. The real issue is that as your required number of samples increases you run the risk of hitting the brick wall of key rotation.
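The more-samples point is easy to demonstrate with a simulation (made-up latency numbers, Gaussian noise): individually the "hit" and "miss" distributions overlap heavily, but their averages separate cleanly, since averaging n samples shrinks the noise by roughly sqrt(n).

```python
import random
import statistics

def classify(samples, threshold):
    # Decide hit vs. miss from the *average* of many noisy probes.
    return statistics.mean(samples) > threshold

rng = random.Random(42)
HIT, MISS, NOISE_SD, N = 10.0, 18.0, 8.0, 400  # made-up latency units

hit_samples = [HIT + rng.gauss(0, NOISE_SD) for _ in range(N)]
miss_samples = [MISS + rng.gauss(0, NOISE_SD) for _ in range(N)]

# A single sample is nearly useless (the distributions sit ~1 sigma
# apart), but 400 samples give a ~0.4 standard error on the mean.
assert not classify(hit_samples, 14.0)   # fast: line was still cached
assert classify(miss_samples, 14.0)      # slow: line had been evicted
```

As the comment notes, the real limit isn't noise but how many samples you can gather before the victim rotates its keys.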

Quote:

In other words, it is very easy to secure even a networked computer, but it is very hard to do anything useful with a secure computer or to secure a useful computer.

Correct me if I'm wrong, but couldn't this analogy apply to physical security (especially in the post-9/11 USA) as well? It becomes a balancing act of security hassle vs. functionality. If a super-secure system (digital or physical) is devised, how usable (for everyday tasks) is it?

@karolus Read: Security+, CISSP, CEH, etc. Security HAS ALWAYS been the balance of security vs. usability. A completely secure system is possible: it's powered off, in a vault, encased in cement, underground, and wired with explosives, with no connectivity to the outside world.

@baloroth And frankly, to say it's "easy to secure a computer that does nothing but yes/no answers" defeats the broader understanding that everything a computer does IS a yes/no answer. Everything is either I/O, 1/0, On/Off, Yes/No. It's the complex interaction of multiple such questions that results in usable information. So no, it is not possible to completely secure a system if it's networked; limiting the interactions just increases the difficulty of the attack.

@The Article: This is where we see a difference in security levels based on processor architecture and utilization. This attack could effectively be negated by using a processor architecture in which each processor core has a dedicated L1 cache, and by giving each VM access to a dedicated core. Thus, in a quad-processor dodeca-core server you could run 48 VMs without L1-cache side-channel access being an issue.

Then the question: Can similar action be taken against L2 cache (and L3 cache when existing?)

From the research paper (I wanted to see if it answered the above question): "For example, the L1 caches contain the most potential for damaging side-channels [36], but these are not shared across different cores. An attacker must therefore try to arrange to frequently alternate execution on the same core with the victim so that it can measure side-effects of the victim's execution."

The paper goes on to mention how multiple virtual CPUs can make the side-channel attack quicker, as they allow for inter-processor interrupts. (For the non-comp-sci crowd: every exchange between the processor and a peripheral/processing device is initiated by what is known as an interrupt request, essentially a signal that says to the processor, "Excuse me, I'd like a moment of your time.")

I don't see any specific mention of the L2 cache in the paper, though some of the cited reference material covers the L2 cache. Likewise, only one use of the word "layer" in the paper suggests that I am not missing L2 under an alternative meaning.

Next question, is there a way to trick the DMA channels to access information as it is processed in RAM, or a way to do to the RAM what they have done to the L1 Cache, thus providing information through the same side-channel style attack of observing and processing the changes forced by other virtual machines?

Again, I checked the paper and found no specific references, but I only skimmed it quickly for information about "RAM." The issue I see here is the massive amount of RAM such a server can physically contain, which would push the signal-to-noise ratio considerably (exponentially) toward the noise side compared with the cache level, which tends to be measured in KB, not GB or TB.

As said in the article, there is nothing really new here since these sort of attacks have been done before between privileged and unprivileged processes. The only question was the feasibility of a practical implementation, which was solved through no small feat of engineering.

There are basically two solutions:

1) Choose better security primitives that don't leak side channel info. Many algorithms that use lookup tables have similar attacks, but other types of timing attacks are possible. Hardware may help if the implementation is good.

2) Isolate VMs better. In this case it would probably require each VM to have private CPUs (or sockets in case there's L3 cache). That may defeat the purpose of virtualization in some cases, but work in others.

In any case, a VM is only one more layer of security, not a guarantee of anything.
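The core-isolation idea in solution 2 can be approximated at the OS level. The Linux-only sketch below pins the current process to a single CPU; hypervisors expose the same knob as vCPU pinning (e.g., Xen's `xl vcpu-pin`).

```python
import os

# OS-level sketch of the dedicated-core countermeasure: pin this process
# to one CPU so it never shares a core (and that core's private cache)
# with another workload. Uses the Linux-only scheduler-affinity API.
allowed = os.sched_getaffinity(0)   # CPUs the scheduler may use for us
pinned = {min(allowed)}             # pick one and give up the rest
os.sched_setaffinity(0, pinned)
print(os.sched_getaffinity(0))      # now reports just the pinned CPU
```

The cost, as the comment says, is that dedicating cores or sockets per VM eats into exactly the consolidation that makes virtualization attractive.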

Quote:

Next question, is there a way to trick the DMA channels to access information as it is processed in RAM, or a way to do to the RAM what they have done to the L1 Cache, thus providing information through the same side-channel style attack of observing and processing the changes forced by other virtual machines?

Interesting question. Maybe with a multi-socket NUMA machine and VM memory deduplication (e.g. KSM) it would even be possible to use timing attacks on which node the physical RAM resides?

Unlikely to work in practice, since RAM pages are a lot larger than cache lines (the data is probably on the same page), and the data would have to be very cold not to reside in any caches. Not to mention other - possibly easier - side channel attacks that deduplication opens.

Aside from the practical difficulties of landing on target host hardware, I see that the target must also be executing crypto code at the time of attack. Either it sits there doing lots of encryption, or the attacker needs to be able to trigger an encryption operation on the target to be able to interrupt it reliably enough to get measurements. I'd consider it nigh impossible to target a given organization's infrastructure this way.

I have a hard time believing that a timing attack can actually work in a production environment. I would expect natural entropy to sufficiently corrupt the results of such an attack. Has there ever been a real world example of this?

Quote:

In other words, it is very easy to secure even a networked computer, but it is very hard to do anything useful with a secure computer or to secure a useful computer.

Correct me if I'm wrong, but couldn't this analogy apply to physical security (especially in the post-9/11 USA) as well? It becomes a balancing act of security hassle vs. functionality. If a super-secure system (digital or physical) is devised, how usable (for everyday tasks) is it?

I am reminded of Hollywood computers, where any good software company or government agency will perform all of its most secure functions on a machine hidden behind so much layered security that no one can ever interact with it.

Quote:

In other words, it is very easy to secure even a networked computer, but it is very hard to do anything useful with a secure computer or to secure a useful computer.

This is just NOT true. It's easy to secure a computer, even a networked computer AND do useful work on it.

The real problem is most programmers and software developers are clueless about security, and/or are careless, and/or don't care enough to actually secure their software. Some have the attitude that "you want security use a server, not a desktop".

The best way for Linux distros to secure their software is to use a hardened toolchain and hardened kernel with a complete setup of grsecurity.

Quote:

Aside from the practical difficulties of landing on target host hardware, I see that the target must also be executing crypto code at the time of attack. Either it sits there doing lots of encryption, or the attacker needs to be able to trigger an encryption operation on the target to be able to interrupt it reliably enough to get measurements. I'd consider it nigh impossible to target a given organization's infrastructure this way.

It's an interesting attack, though, and they only get better.

A web server running a high-traffic HTTPS site will be doing a LOT of encryption. If it's only running a single site, then any fragments of SSL activity recovered will all be related.

Another server being used for copying files with SSH will also have a lot of crypto operations running through it.

If you happen to know the IP address of the server, and there's a fair chance you do since you're running a VM guest on the same host as the target VM guest, you can trigger SSL activity simply by connecting and initiating a handshake, even if you fail any authentication after the key exchange.