Tuesday, June 02, 2009

More Thoughts on CPU backdoors

I've recently exchanged a few emails with Loic Duflot about CPU-based backdoors. It turns out that he recently wrote a paper about hypothetical CPU backdoors and also implemented some proof-of-concept ones using QEMU (for he doesn't happen to own a private CPU production line). The paper can be bought here. (Loic is an academic, and so he must follow some of the strange customs of the academic world, one of them being that papers are not freely published, but rather sold on a publisher's website… Heck, even we, the ultimately commercialized researchers, still publish our papers and code for free.)

Let me stress that what Loic writes about in the paper are only hypothetical backdoors, i.e. no actual backdoors have been found on any real CPU (ever, AFAIK!). What he does is consider how Intel or AMD could implement a backdoor, and then simulate this by implementing those backdoors inside QEMU.

Loic also focuses only on local privilege escalation backdoors. You should not underestimate a good local privilege escalation, however: such things could be used to break out of any virtual machine, like VMware, or potentially even out of a software VM, e.g. the Java VM.

The backdoors Loic considers are somewhat similar in principle to the simple pseudo-code one-liner backdoor I used in my previous post about hardware backdoors, only more complicated in the actual implementation, as he took care of a few important details that I naturally didn't concern myself with. (BTW, the main message of my previous post was how cool a technology VT-d is, being able to prevent PCI-based backdoors, and not how doomed we are because of potential Intel- or AMD-induced backdoors.)

Some people believe that processor backdoors do not exist in reality, because if they did, the competing CPU makers would be able to find them in each other's products, and would then likely cause a "leak" to the public about such backdoors (think: black PR). Here people make the assumption that AMD or Intel is technically capable of reverse engineering each other's processors, which seems to be a natural consequence of them being able to produce them.

I don't think I fully agree with that assumption, though. Just the fact that you are capable of designing and producing a CPU doesn't mean you can also reverse engineer one. Just the fact that Adobe can write a few-hundred-megabyte application doesn't mean they are automatically capable of reverse engineering similar applications of that size. Even if we assumed it were technically feasible to use an electron microscope to scan and map all the electronic elements of a processor, there would still remain the problem of interpreting how all those hundreds of millions of transistors actually work.

Anyway, here are a few more thoughts about the properties of hypothetical backdoors that Intel or AMD might use (or be using).

First, I think that in such a backdoor scenario everything besides the "trigger" would be encrypted. The trigger is something that you must execute first in order to activate the backdoor (e.g. the CMP instruction with particular, i.e. magic, values in some registers, say EAX, EBX, ECX, EDX). Only then does the backdoor get activated, and e.g. the processor auto-magically escalates into Ring 0. Loic considers this in more detail in his paper. So, my point is that all the attacker's code that executes afterwards (think of it as the shellcode for the backdoor, specific to the OS) is fetched by the processor in an encrypted form and decrypted only internally, inside the CPU. That should be trivial to implement, while at the same time it should complicate any potential forensic analysis afterwards: it would be highly non-trivial to understand what the backdoor has actually done.
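To make the idea concrete, here is a toy Python model of such a trigger. All the magic constants and names are invented for illustration; a real backdoor would of course live in silicon, not in software:

```python
# Toy model of a hypothetical CPU backdoor trigger: a CMP instruction
# executed with magic register values flips the simulated CPU into Ring 0.
# All constants are invented for illustration only.

class BackdooredCPU:
    MAGIC = (0xDEADBEEF, 0xBABECAFE, 0x41414141, 0x42424242)

    def __init__(self):
        self.ring = 3            # start as an unprivileged process
        self.backdoor_armed = False

    def cmp_insn(self, eax, ebx, ecx, edx):
        """Normally CMP only sets flags; here it also checks the trigger."""
        if (eax, ebx, ecx, edx) == self.MAGIC:
            self.backdoor_armed = True
            self.ring = 0        # silent privilege escalation
        return eax - ebx         # ordinary CMP semantics (flags source)

cpu = BackdooredCPU()
cpu.cmp_insn(1, 2, 3, 4)         # an innocent compare: nothing happens
assert cpu.ring == 3
cpu.cmp_insn(0xDEADBEEF, 0xBABECAFE, 0x41414141, 0x42424242)
assert cpu.ring == 0             # the backdoor fired
```

Note how, to any observer who doesn't know the magic tuple, the trigger is indistinguishable from an ordinary comparison.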

Another crucial thing for a processor backdoor, I think, would be some sort of anti-replay protection. Normally, if a smart admin had been recording all the network traffic, and also all the executables that ever got executed on the host, chances are that he or she would catch the triggering code and the shellcode (which might be encrypted, but still). So, no matter how subtle the trigger is, it is still quite possible that a curious admin will eventually find out that some tetris.exe somehow managed to break out of a hardware VM and did something strange, e.g. installed a rootkit in the hypervisor (or that some Java code somehow was able to send over all the DOCX files from our home directory).

Eventually the curious admin would find the strange CPU instruction (the trigger) after which all the strange things happened. Now, if the admin were able to take this code, replicate it, and post it to Daily Dave, then, assuming his message passed the moderator (Hi Dave), he would effectively compromise the processor vendor's reputation.

An anti-replay mechanism could ideally be some sort of challenge-response protocol used in the trigger. So, instead of always putting 0xdeadbeaf, 0xbabecafe, and 0x41414141 into EAX, EBX and EDX and executing some magic instruction (say CMP), you would have to put in a magic value that is the result of some crypto operation, taking the current date and a magic key as input:

Magic = MAGIC (Date, IntelSecretKey).

The obvious problem is: how can the processor obtain the current date? It would have to talk to the south bridge at best, which is 1) nontrivial, 2) observable on the bus, and 3) spoofable.
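As a toy sketch of the date-based variant, with HMAC-SHA256 standing in for the unspecified MAGIC function and an invented key, both sides would compute something like:

```python
import hmac, hashlib

# Invented stand-in for the vendor's secret; the real MAGIC function
# and key (if any such thing exists) are of course unknown.
INTEL_SECRET_KEY = b"hypothetical-vendor-key"

def magic(date: str, key: bytes = INTEL_SECRET_KEY) -> bytes:
    """Magic = MAGIC(Date, IntelSecretKey), modeled here as
    HMAC-SHA256 truncated to 128 bits."""
    return hmac.new(key, date.encode(), hashlib.sha256).digest()[:16]

# Operator and CPU derive the same magic only if they agree on the date --
# which is exactly the problem: the CPU has no trusted date source.
assert magic("2009-06-02") == magic("2009-06-02")
assert magic("2009-06-02") != magic("2009-06-03")
```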

A much better idea would be to equip the processor with some sort of EEPROM memory, big enough to hold one 64-bit or maybe 128-bit value. Each processor would get a different value flashed there when leaving the factory. Now, in order to trigger the backdoor, the processor vendor (or the backdoor operator, think: NSA) would have to do the following:

1) First execute some code that reads this unique value stored in the EEPROM of the particular target processor, and sends it back to them,

2) Now, they could generate the actual magic for the trigger:

Magic = MAGIC (UniqueValueInEeprom, IntelSecretKey)

3) ...and send the actual code to execute the backdoor and shellcode, with the correct trigger embedded, based on the magic value.

Now, the point is that the processor would automatically increment the unique number stored in the EEPROM, so the same backdoor-exploiting code would not work twice on the same processor (while at the same time it would be easy for the NSA to send another exploit, as they know what the next value in the EEPROM should be). Also, such a customized exploit would not work on any other CPU, as the assumption was that each CPU gets a different value at the factory, so again it would not be possible to replicate the attack and prove that the particular code had ever done anything wrong.
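The three steps above could be sketched like this (again a Python toy model: HMAC stands in for MAGIC, and all values and names are invented):

```python
import hmac, hashlib

KEY = b"hypothetical-vendor-key"   # invented stand-in for IntelSecretKey

def magic(counter: int) -> bytes:
    """Magic = MAGIC(UniqueValueInEeprom, IntelSecretKey)."""
    return hmac.new(KEY, counter.to_bytes(8, "big"), hashlib.sha256).digest()[:16]

class CPUWithEeprom:
    def __init__(self, unique_value: int):
        self.eeprom = unique_value   # flashed at the factory, per-CPU

    def read_eeprom(self) -> int:    # step 1: exploit reads the value out
        return self.eeprom

    def try_trigger(self, m: bytes) -> bool:
        if hmac.compare_digest(m, magic(self.eeprom)):
            self.eeprom += 1         # auto-increment: this magic is burned
            return True
        return False

cpu = CPUWithEeprom(unique_value=0x1234)
m = magic(cpu.read_eeprom())         # step 2: operator computes the magic
assert cpu.try_trigger(m)            # step 3: the exploit works once...
assert not cpu.try_trigger(m)        # ...but replaying it fails
```

The operator, knowing the increment rule, can always compute the next valid magic; the admin who recorded the old exploit cannot replay it.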

So, the moment I learn that processors have built-in EEPROM memory, I will start seriously thinking that there are backdoors out there :)

One thing that bothers me with all those divagations about hypothetical backdoors in processors is that I find them pretty useless at the end of the day. After all, by talking about those backdoors, and how they might be created, we do not make it any easier to protect against them, as there simply is no possible defense here. Nor does it make it any easier for us to build such backdoors (if we wanted to become the bad guys for a change). It might only be of interest to Intel or AMD, or whatever other processor maker, but I somewhat feel they have already spent much more time thinking about it, and chances are they can only laugh at what we are saying here, seeing how unsophisticated our proposed backdoors are. So, my Dear Reader, I think you've just been wasting time reading this post ;) Sorry for tricking you into this, and I hope to write something more practical next time :)

@vierito5: your statement seems to be incoherent: if you believe there are even more sophisticated backdoors than I propose, then why are you so sure they would be leaked out? How would you approach "detection" even of the backdoors I proposed?

I didn't mean to say more sophisticated, only more effective in a real world where people don't bother about them. I think hardware backdoors will increase in the future and a lot of people will become interested in this field, but we are already full of software backdoors.

As you say, Intel and AMD should take very careful steps if they intend to include such technologies, because they are studying each other's designs even though they probably can't fully reverse engineer them.

Hardware reverse engineering and chip analysis are growing, so I'm sure these kinds of 'features' would be leaked at some point if they existed.

Suppose that Intel had included a hardware backdoor but AMD hadn't; that would reach the front page of digg.com. If an "average user" knows that, even without being sure of it, he's going to jump and buy AMD CPUs; he cares about it and he can choose. But here's the point: that "average user" also knows that very often trojans and backdoors are included in the cracks he uses every day to make his non-genuine Windows apps work, and he doesn't care at all!! He is still going to use them.

What I want to say is that, for the backdoor operator, let's say... the NSA, it is more effective (not more sophisticated, of course) to include one in the bittorrent of a non-genuine version of Windows than to make, let's say, Intel include a backdoor in their chips. It is just easier! And people would still download and install that Windows version. There are tons of cracks filled with malware; the average user doesn't care, he only wants to use the software for free. That's what makes that kind of backdoor more effective.

As for your approach to CPU backdoors, I do think they would need something like a built-in EEPROM, as you have explained very well.

Don't most modern processors have reprogrammable microcode? I think that interface is usually cryptographically protected, but if you can reprogram the microcode for unprivileged instructions into performing privileged operations, isn't that the same thing? Isn't that *already* a backdoor for the vendor?

@anon: An FPGA device is hardware indeed, but if you create something *for* an FPGA device, this something is a "program" for how the FPGA should configure the actual gates. So, an FPGA-based application is more software than hardware to me (with the difference that there is no instruction interpretation being done by any CPU).

Also, what King et al. did was again just a simulation -- it wasn't a processor that you could put into people's machines (replacing mainstream CPUs). There is a very long road from what they did to producing a *real* CPU -- a road counted in billions of dollars.

Chip makers have always implemented undocumented features in their processors: CPU bugs, back doors, and so on, and there is no reason to believe that they cannot embed such disguised features or backdoors. The question has to do with this: can we determine whether these features are malicious or not?

However, I believe smart card chips can make these CPU-based backdoors more realistic in our world. RAM, ROM, EEPROM, all there lol..

It's not a religious belief lol... I am saying (based on the information I know) that certain chips on smart cards may offer better hardware resources and motivations for implementing these backdoors, and this may be more realistic given the daily use of these chips in every aspect of our life. However, maybe it is worth learning more about smart cards and turning this belief into reality. Additionally, don't forget there are many papers and arguments (some you referred to) that are actually based on hypothetical situations...

Hi Joanna. Well, about the Springer policy, it is just the academic way of managing scientific results. Papers must be reviewed in order to be sure that the results are valid, and that has a cost for Springer (note that the authors, reviewers and editors are working for free. Ultimately it is not your case :-). I remember the time when you yourself were asking for US $200,000 to disclose your code. Can you assert that you have published everything at the present time? :-)

I like the challenge-response idea. I have another proposal for a CPU backdoor that could also be implemented in this fashion.

Instead of implementing the CPU backdoor in terms of magic register values/instructions, which require you to be able to execute code on the target machine (unfortunately not that hard with all the bugs out there...), one could perhaps implement it in the memory fetch mechanism of the CPU. The CPU fetches data and code from memory in 16-byte blocks. At the simplest, one could define a special, magic 16-byte block that, when fetched by the CPU, would cause it to do something special, like entering ring 0.

Such a mechanism might be more easily exploited than the code/register approach. For instance, one might be able to trigger it simply by submitting a TCP/IP packet containing the properly aligned prefix. One might even be able to trigger it via a web form. Firewall software would probably not offer any protection, since the firewall would have to fetch the TCP/IP packet from memory in order to inspect it, thereby triggering the exploit itself, even if it rejects the packet. Automatic prefetching by the CPU makes this much more difficult (if not impossible) for software to control. Alignment of the data shouldn't be a problem, since one can simply submit 16 different packets with the magic value aligned in different ways. Or the values could be defined so that they work even if they are rotated or split over a memory boundary (a simple state machine inside the CPU could keep track of it).

If one were to implement such a mechanism, one might as well make it more flexible ;) The mechanism could cause the CPU to enter a special attack mode where the data stored in the memory addresses that follow would be executed in ring 0, or it could be a more complicated protocol where the contents of magic values get accumulated in a memory store inside the CPU and executed when a final trigger prefetch is encountered. Likely, the trigger wouldn't be one specific value. Rather, there might be many magic values, all identified by having some special cryptographic structure. For instance, it could be that the last 8 bytes of a block would have to be a MAC over the first 8 bytes, using a special MAC key contained inside the CPU. The first 8 bytes (likely encrypted themselves) could then be used for transmitting information to the CPU about the actual exploit, enabling more complex behavior (i.e. logging of 'interesting' data over longer periods of time, with support for some type of back channel).
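The MAC-over-half-a-block idea could be sketched like this (a Python toy model; truncated HMAC-SHA256 stands in for whatever primitive the CPU would actually use, and the key is invented):

```python
import hmac, hashlib

MAC_KEY = b"hypothetical-cpu-mac-key"   # invented; would live inside the CPU

def is_magic_block(block: bytes) -> bool:
    """A fetched 16-byte block is 'magic' if its last 8 bytes are a MAC
    (here: truncated HMAC-SHA256) over its first 8 bytes."""
    assert len(block) == 16
    tag = hmac.new(MAC_KEY, block[:8], hashlib.sha256).digest()[:8]
    return hmac.compare_digest(block[8:], tag)

def make_magic_block(payload: bytes) -> bytes:
    """What the backdoor operator would craft: 8 payload bytes + their MAC."""
    assert len(payload) == 8
    return payload + hmac.new(MAC_KEY, payload, hashlib.sha256).digest()[:8]

blk = make_magic_block(b"\x01" * 8)
assert is_magic_block(blk)
assert not is_magic_block(b"\x00" * 16)   # ordinary data never triggers
```

With an 8-byte tag, the chance of an arbitrary block accidentally passing the check is about 2^-64, which is why normal workloads would never trip it.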

Of course, the system has to be made so that the mechanism is never triggered accidentally. This could be achieved in many ways. Since an attacker will likely be able to feed an essentially unlimited number of fetches to the victim CPU (due to the ease of getting memory fetched on the victim CPU), one could make a protocol requiring several specific magic values to be sent one after another to make the CPU enter the mode where it would permit execution of functional magic values.

It doesn't even stop at the CPU. One could imagine similar exploits in the chipset - the chipset could have a mini-CPU inside that is activated by these memory patterns. Since the chipset has access to all hardware (including network card) and memory it would be easy for it to transmit a memory dump to another computer etc.

Just some thoughts. I agree that it is in some sense useless to talk about these things, since it's hard to detect them, at least as long as they are not activated. On the other hand, it is good to create awareness of what is theoretically possible. I have no reason to believe that this has or has not been implemented in commercial CPU's (there are many concerns in addition to the technical ones), but since it is definitely technically possible, it seems likely that it will happen some day. Maybe all these blog posts about it might actually make it happen sooner rather than later ;)

I have an idea for a possible remedy, at least in security-critical setups. A CPU cannot really do any damage by itself (after all, it is just a piece of silicon with some pins!); it relies on having other hardware around it to carry out its commands. If the behaviour of the CPU were specified (as it used to be) down to exactly how it behaves in response to different programs (i.e. in which order it fetches memory, when it prefetches, etc.), it should be possible for independent manufacturers to produce identical CPU's that could run at the same time. A computer would have to be equipped with at least two CPU's from different vendors. A couple of extra "surveillance chips" (from different vendors) could monitor the behaviour of the CPU's: both CPU's take the same input, so they should always give the same output if they are bug-free and honest. The chips should have a way to halt execution and warn the operator in case of a discrepancy. To avoid this happening too often due to honest bugs, all new CPU's should be carefully screened against reference CPU's with a wide range of workloads, to make sure they are in fact close to bug-free, at least under most use. This is much easier to show than that they don't contain deliberate exploits, which by design would be hard to create a test case for without knowing about them.

This would also lead to many other benefits -- hopefully less buggy CPU's. The redundancy could also be used to guarantee against hardware failure, as is done today, but with CPU's from the same vendor. If halted execution is undesirable, the device could contain CPU's from 3 or more vendors, so it could continue execution even if an exploit/bug was triggered.

Maybe not realistic for common PC's, but for certain uses (military etc.) it might be useful.
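A toy sketch of the lockstep idea, with the CPU's modeled as pure Python functions and divergence detected by comparing their outputs:

```python
def run_lockstep(cpus, program):
    """Feed identical input to CPU's from different vendors; any divergence
    in observable output halts execution (the 'surveillance chip' role)."""
    outputs = [cpu(program) for cpu in cpus]
    if len(set(outputs)) != 1:
        raise RuntimeError("CPU outputs diverged -- possible bug or backdoor")
    return outputs[0]

# Two hypothetical CPU's: one honest, one with a hidden trigger value.
honest_cpu = lambda prog: sum(prog)
backdoored = lambda prog: sum(prog) + (1 if prog == [0xDEAD] else 0)

assert run_lockstep([honest_cpu, backdoored], [1, 2, 3]) == 6   # they agree
try:
    run_lockstep([honest_cpu, backdoored], [0xDEAD])            # they diverge
except RuntimeError:
    pass   # the backdoor is caught the moment it changes behaviour
```

The scheme only catches a backdoor at the moment it actually alters observable behaviour; a dormant one stays invisible, which matches the point above about detection.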

Loic, in his article (the one I referred to in my post), actually mentions this potential solution. In practice I don't expect this to happen for any general-purpose processors, e.g. IA32-based ones. When you look at AMD vs. Intel and comparable processor generations, they are dramatically different -- they support different kinds of extensions, etc.

@Alfred: Isn't it just a question of how you define hardware? As such the discussion isn't very interesting. I don't see any "natural" definition that makes FPGA's distinct. "Standard" CMOS CPU's are made from masks that can be represented to full precision on a computer. The masks are derived from the same synthesis languages used for FPGA's (of course with differences in the type of synthesis done). In both cases, the richness of the language in question is mapped to something simple, be it LUT configurations (FPGA's) or transistor configurations (CMOS). It's just a different set of primitives used for realizing the design. Furthermore, standard CPU's are filled with microcode and software themselves, blurring the distinction even further.

@Joanna. I agree. If inspection with an electron microscope truly were possible, it might be possible to audit the VHDL code and synthesis tools, repeat the synthesis, and compare the results to what is observed from random samples of different CPU batches (to verify they are produced from the correct VHDL code).

Actually all CPU's have back doors (at least since I've been in the industry, back to the P1 days). The back doors are fully documented for those of us building/designing system boards who sign system manufacturer NDAs. Their primary purposes are debugging and wear-leveling the boards. The back doors give you full access to the BIOS, CPU firmware, all CPU states, and direct memory access outside of the OS. The back door access is actually hardware implemented, and by the agreements that manufacturers sign, the hardware to access the back door is not supposed to be installed on production boards. That is why when you buy system boards (even the "Ultra Deluxe" models) you will see missing componentry around the CPU. There have been some exceptions where a manufacturer screwed up and left the required back door hardware on the production runs. There have been no publicly recorded screw-ups like this (that I know of) since one of the old P4 Abit models (circa 2003-4).

To the guy above: the microcode you are thinking of is a CPU-specific extension of the BIOS. It is never actually written to the CPU. By design the CPU should be flashed only once in its existence, after the electro-thermal diagnostics during manufacture. The BIOS microcode tells the system how to avoid running into specific documented errata that would cause system issues.

@Rob: Firstly, these backdoors are hardware features; they can't (hopefully) be activated by software means like the ones discussed here. I think even on commercial boards you can put an ICE between the CPU and socket to analyze bus/memory traffic etc., but it would require physical modification of the machine. An ICE is quite expensive too! However, I wonder what would happen if an ICE'd machine was used for TXT, i.e. whether the CPU would refuse to enter SMX mode with the ICE there, or whether it would give different PCR values. If not, then one could abuse e.g. remote attestation into believing you were running some code and then modify it using the ICE. Same for DRAM "ICE's".

On microcode you are actually wrong. Since the Pentium Pro, Intel's CPU's have had a microcode update feature allowing Intel to correct various errata. In Intel's errata sheets you will sometimes notice the sentence "The BIOS may carry a fix for this erratum" or something like that. These microcode updates are incorporated into BIOS updates that are installed by the end user. The BIOS then updates the CPU on every boot (the effect only lasts until the CPU is reset). Intel has fixed errata that way.

Unfortunately many mainboard manufacturers 'abandon' their boards quite quickly and don't produce new BIOS updates containing the microcode updates. For this reason, both Linux and Windows have a microcode update driver that can install the microcode updates even after the system has booted. The OS simply has a big database of microcode for various CPU's and will check whether it has newer code than what is currently loaded.

The microcode itself is encrypted (at least on Intel) and probably signed or MAC'ed by some means to prevent unauthorized updates. I'm not aware of any exploits of the feature.

AMD also has this feature, and at least in the past (2004-ish) the updates were not encrypted, and as far as I've read some researchers succeeded in loading custom updates, sometimes causing erratic behaviour of the CPU, crashes etc. I don't think they succeeded in doing anything productive with it.