May 30, 2007

The Panasonic R/W/T/Y series is a set of ultralight notebooks, also available in a US model. I like the Y4-Y7 for their large screen, light weight, and long battery life.

There are a couple of things needed to get suspend/resume working on these models in FreeBSD. To get the backlight on and video working again, we need to trigger the 0xc000 video BIOS reset. Set the following in /etc/sysctl.conf:

hw.acpi.reset_video=1

The built-in mouse is stuck once we resume. We need to reinitialize it by setting the following in /boot/loader.conf. This should really be done automatically since the reinitialization should work on most systems. But more testing will have to be done before that is enabled by default.

hint.psm.0.flags="0x2000"

With these changes, I can suspend/resume to RAM successfully. For more info, check the handbook.

May 28, 2007

There are a number of things to try when developing such attacks, depending on the device and countermeasures present. We’ll assume that the attacker has possession of several instances of the device and a moderate budget. This limits an attacker to non-invasive and slightly invasive methods.

Timing attacks work at the granularity of entire device operations (request through result) and don’t require any hardware tools. However, hardware may be used to acquire timing information, for example, by using an oscilloscope and counting the clock cycles an operation takes. I call this observation point external since only information about the entire operation (not its intermediate steps) is available. All software, including commonly used applications and operating systems, needs to be aware of timing attacks when working with secrets. The first published timing attack was against RSA, but any kind of CPU access to secret data can reveal information about that data (e.g., through cache misses).
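As a minimal sketch of how a whole-operation measurement can leak a secret, consider an early-exit comparison. The function name leaky_equal and the step counter are hypothetical; the counter stands in for elapsed time so the effect is deterministic. Python's hmac.compare_digest is shown as one comparison designed not to exit early.

```python
import hmac

def leaky_equal(secret: bytes, guess: bytes):
    """Early-exit comparison. Returns (match, bytes_examined).

    bytes_examined is a stand-in for running time: it grows with the
    number of leading bytes the attacker guessed correctly.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps
    return True, steps

if __name__ == "__main__":
    secret = b"secret"
    # A wrong first byte is rejected after one step, a right one after two.
    print(leaky_equal(secret, b"xxxxxx"))
    print(leaky_equal(secret, b"sxxxxx"))
    # compare_digest is intended to take the same time wherever the mismatch is.
    print(hmac.compare_digest(secret, b"sxxxxx"))
```

An attacker who can observe the step count (or the time it represents) can recover the secret one byte at a time instead of guessing it all at once.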

A common misconception is that noise alone can prevent timing attacks. Boneh et al. disproved this handily when they mounted timing attacks against OpenSSL over a WAN. If there is noise, just take more measurements. Since the noise is random but the key is constant, the noise averages out as the sample size grows.
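The averaging argument can be sketched with a small simulation. Everything here is assumed for illustration: the timing units are arbitrary, the noise is Gaussian with a standard deviation 20 times larger than the secret-dependent difference, and the names sample and estimated_delta are hypothetical.

```python
import random
import statistics

NOISE_SD = 20.0      # assumed measurement noise, much larger than the signal
SECRET_DELTA = 1.0   # assumed extra time when the secret-dependent path runs

def sample(slow_path, rng):
    """One simulated timing measurement in arbitrary units."""
    base = 1000.0 + (SECRET_DELTA if slow_path else 0.0)
    return base + rng.gauss(0.0, NOISE_SD)

def estimated_delta(n, rng):
    """Difference between the mean timings of the two cases, n samples each."""
    slow = statistics.fmean(sample(True, rng) for _ in range(n))
    fast = statistics.fmean(sample(False, rng) for _ in range(n))
    return slow - fast

if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 1000, 100000):
        # The estimate should settle near SECRET_DELTA as n grows.
        print(n, round(estimated_delta(n, rng), 3))
```

The error of the estimate shrinks roughly with the square root of the number of samples, so hiding a fixed difference behind noise only raises the attacker's measurement cost.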

Power, EM, thermal, and audio side channel attacks measure more detailed internal behavior throughout an operation. If the intermediate state of an operation is visible in a timing attack, I classify it as an internal side channel attack as well (e.g., Percival’s cache timing attack). The granularity of measurement is important. Thus, thermal and audio attacks are less powerful given the slow response of the signal compared to the speed of the computation. In other words, they have built-in averaging.

Simple side channel attacks (i.e. SPA) involve observing differences in behavior within a single sample. The difference in height of the peaks of power consumption during a DES operation might indicate the number of 1 bits in the key for that particular round. Since most crypto is based on an iterative model, similarities and differences between each iteration directly reflect the secret data being processed.
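Here is a minimal sketch of the iteration idea. It swaps the DES example for square-and-multiply modular exponentiation, a different but commonly used illustration that fits in a few lines. The "trace" is just a string of S (square) and M (multiply) markers standing in for what a power trace might show; the function names are hypothetical.

```python
def modexp_with_trace(base, exponent, modulus):
    """Left-to-right square-and-multiply.

    Returns (result, trace), where trace records 'S' for each squaring
    and 'M' for each multiplication, as a stand-in for a power trace.
    """
    result = 1
    trace = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)

def exponent_from_trace(trace):
    """Read the exponent off one trace: 'SM' is a 1 bit, a lone 'S' is a 0 bit."""
    bits = []
    i = 0
    while i < len(trace):
        if trace[i:i + 2] == "SM":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)

if __name__ == "__main__":
    _, trace = modexp_with_trace(7, 0b1011, 1000003)
    print(trace, exponent_from_trace(trace))
```

Each loop iteration handles one secret bit, and whether the extra multiply happens is visible in a single trace. No statistics are needed.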

Differential side channel attacks (i.e. DPA) are quite a bit different. Instead of requiring an observable, repeatable difference in behavior, any slight variation in behavior can be leveraged using statistics and knowledge of cipher structure. It would take an entire series of articles to explain the various forms of DPA, but I’ll summarize by saying that DPA can automatically extract keys from traces that individually appear completely random.

Glitch attacks (aka fault induction) involve deliberately inducing an error in hardware behavior. They are usually non-invasive but occasionally partially invasive. If power lines are accessible, the power supply can be subjected to a momentary excessive voltage or a brown-out. Removing decoupling capacitors can magnify this effect. If IO lines are accessible, they can be subjected to high-frequency analog signals in an attempt to corrupt the logic behind the IO buffer. But usually these approaches can be prevented by careful engineering.

Most glitch attacks use the clock line since it is especially critical to chip operation. In addition to over-voltage, complex high-frequency waveforms can induce interesting behavior. Flip-flops and latches have a timing parameter called “setup and hold” which indicates how long a 0 or 1 bit needs to be applied before the hardware can remember the bit. High frequency waveforms at the edge of this limit cause some flip-flops to register a new value (possibly random) and others to keep their old value. Natural manufacturing variances mean this is impossible to prevent. Pulse, triangle, and sawtooth waveforms provide more possibilities for variation.
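A toy model of the setup-time effect, under loud assumptions: each flip-flop gets its own required stable time drawn from a Gaussian (a stand-in for manufacturing variance), and a flip-flop that misses its requirement simply keeps its old value, although real hardware may also capture a random one. The names and numbers are hypothetical.

```python
import random

def make_flipflops(count, rng, nominal_ns=1.0, spread_ns=0.05):
    """Required stable time (ns) for each flip-flop before it captures its input.

    The Gaussian spread is an assumed stand-in for manufacturing variance.
    """
    return [rng.gauss(nominal_ns, spread_ns) for _ in range(count)]

def clock_edge(old_bits, new_bits, required_ns, stable_ns):
    """Each flip-flop captures its new bit only if the input was stable long enough."""
    return [new if stable_ns >= need else old
            for old, new, need in zip(old_bits, new_bits, required_ns)]

if __name__ == "__main__":
    rng = random.Random(7)
    need = make_flipflops(32, rng)
    old, new = [0] * 32, [1] * 32
    for stable_ns in (2.0, 1.0, 0.0):
        captured = clock_edge(old, new, need, stable_ns)
        print(stable_ns, sum(captured), "of 32 flip-flops captured the new value")
```

With plenty of time every bit is captured, with none every bit is stale, and right at the limit the register ends up holding a mix of old and new bits that depends on the individual chip.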

Optical and EM glitch attacks induce faults using radiation. Optical attacks are partially invasive in that the chip has to be partially removed from its package (decapping). EM attacks can usually penetrate the housing. The nice thing about this glitching approach is that individual areas of the chip can be targeted, like RAM which is particularly vulnerable to bit flips. Optical attacks can be done using a flash bulb or laser pointer.

May 25, 2007

We had a pretty good time at Zeitgeist but boy was it chilly. We’re going to try meeting inside this time at 21st Amendment, downtown SF. Date is June 20, 7 pm, details at http://www.sockpuppet.org/baysec/. See you there!


May 24, 2007

The first step in understanding glitch attacks is to look at how hardware actually works. Each chip is made up of transistors that are combined to produce gates and then high-level features like RAM, logic, lookup tables, state machines, etc. In turn, those features are combined to produce a CPU, video decoder, coprocessor, etc. We’re most interested in secure CPUs and the computations they perform.

Each of the feature blocks on a chip is coordinated by a global clock signal. Each time it “ticks”, signals propagate from one step to another and among the various blocks. The speed of propagation is based on the chip’s architecture and physical silicon process, which together determine how quickly the main clock can run. This is why every CPU has a maximum (but not minimum) megahertz rating.

In hardware design, the logic blocks are made up of multiple stages, very much like CPU pipelining. Each clock cycle, data is read from an internal register, passes through some combinational logic, and is stored in a register to wait for the next clock cycle. The register can be the same one (i.e. for repeated processing as in multiple rounds of a block cipher) or a different one.

The maximum clock rate for the chip is constrained by the slowest block. If it takes one block 10 ns to propagate a signal from register to register, your maximum clock rate is 100 MHz, even if most of the other blocks are faster. If this is the case, the designer can either slice that function up into smaller blocks (but increase the total latency) or try to redesign it to take less time.
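The arithmetic can be sketched in a few lines. The block delays below are made-up numbers, apart from the 10 ns case from the text, and max_clock_mhz is a hypothetical helper.

```python
def max_clock_mhz(stage_delays_ns):
    """Maximum clock rate in MHz, limited by the slowest register-to-register path."""
    return 1000.0 / max(stage_delays_ns)

if __name__ == "__main__":
    # One 10 ns block limits the whole chip, even though the others are faster.
    before = [10.0, 4.0, 6.0]
    print(max_clock_mhz(before))

    # Splitting the 10 ns block into two 5 ns stages: the 6 ns block is now
    # the limit, so the clock can run faster.
    after = [5.0, 5.0, 4.0, 6.0]
    print(max_clock_mhz(after))

    # The split function now takes two clock periods at the new rate, which
    # is longer than the single 10 ns period it took before.
    period_ns = max(after)
    print(2 * period_ns, "ns latency for the split block, up from 10.0 ns")
```

The clock period is set by the slowest remaining block, so the split function pays two periods of 6 ns each. That is the latency cost mentioned above.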

A CPU is made up of multiple blocks. There is logic like the ALU or branch prediction, RAM for named registers and cache, and state machines for coordinating the whole show. If you examine an assembly instruction datasheet, you’ll find that each instruction takes one or more clocks and sometimes a variable number. For example, branch instructions often take more clock cycles if the branch is taken or if a branch predictor mispredicted it. As signals propagate between each block, the CPU is in an intermediate state. At a higher level, it is also in an intermediate state during execution of multi-cycle instructions.

As you can see from all this, a CPU is very sensitive to all the signals that pass through it and their timing. These signals can be influenced by the voltage at external pins, especially the clock signal since it is distributed to every block on a chip. When signals that have out-of-spec timing or voltage are applied to the pins, computation can be corrupted in surprisingly useful ways.

May 7, 2007

(First in a series of articles on attacking hardware and software by inducing faults)

One of the common assumptions software authors make is that the underlying hardware works reliably. Very few operating systems add their own parity bits or CRC to memory accesses. Even fewer applications check the results of a computation. Yet when it comes to cryptography and software protection, the attacker controls the platform in some manner and thus faulty operation has to be considered.

Fault induction is often used to test hardware during production or simulation runs. It was probably first observed when mildly radioactive material that is a natural part of chip packaging led to random memory bit flips.

When informed that an attacker in possession of a device can induce faults, most engineers respond that nothing useful could come of that. This is similar to the response when buffer overflows were first discovered in software (“so what, the software crashes?”). I often find this “engineering mentality” gets in the way of improving security, even to the point of insisting that you must prove exploitability before fixing a problem.

“We have improved on Differential Fault Analysis. Rather than needing about 200 faulty ciphertexts to recover a DES key, we need between one and ten. We can factor RSA moduli with a single faulty ciphertext. We can also reverse engineer completely unknown algorithms; this appears to be faster than Biham and Shamir’s approach in the case of DES, and is particularly easy with algorithms that have a compact software implementation such as RC5.”

This is quite a powerful class of attacks, and is sometimes applicable to software-only systems as well. For instance, a signal handler can often be triggered remotely, inducing faults in execution if the programmer wasn’t careful.
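To make the RSA claim in the quote concrete, here is a minimal sketch of the widely described fault attack on RSA signatures computed with the Chinese Remainder Theorem. It is not necessarily the quoted authors' exact method. The primes are toy values far too small for real use, the fault is modeled as a single flipped bit in one half of the computation, and all names are hypothetical.

```python
from math import gcd

# Toy parameters, assumed for illustration only.
P, Q, E = 1009, 1013, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)

def sign_crt(m, fault=False):
    """RSA signature via CRT. With fault=True, one bit of the mod-Q half is flipped."""
    sp = pow(m, D % (P - 1), P)
    sq = pow(m, D % (Q - 1), Q)
    if fault:
        sq ^= 1                      # the induced glitch
    # Garner recombination: the result is sp mod P and sq mod Q.
    h = (pow(Q, -1, P) * (sp - sq)) % P
    return (sq + Q * h) % N

def recover_factor(m, faulty_sig):
    """The faulty signature is still correct mod P, so the gcd exposes P."""
    return gcd((pow(faulty_sig, E, N) - m) % N, N)

def sign_checked(m, fault=False):
    """Countermeasure sketch: verify the result before releasing it."""
    s = sign_crt(m, fault)
    if pow(s, E, N) != m:
        raise ValueError("signature check failed, refusing to output")
    return s

if __name__ == "__main__":
    m = 123456
    bad = sign_crt(m, fault=True)
    print(recover_factor(m, bad) == P)
```

One faulty output is enough because it is right modulo one prime and wrong modulo the other. Checking the result before releasing it, as in sign_checked, is the kind of verification that few applications perform.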

Of course, glitch attacks are most applicable to smart cards, HSMs, and other tamper-resistant hardware. Given the movement to DRM and trusted computing, we can expect to see this category of attack and its defenses become more sophisticated. Why rob banks? Because that’s where the money is.

May 4, 2007

It seems some people still miss the point about my previous post — the focus is on the misleading PR approach, not the contents of the talk or speaker’s ability. So in that vein, let’s compare the two articles, both post-talk and pre-talk (same author, same publication, two weeks apart.)

“Jack plans to show how his attack could be used to make changes to the firmware of a router so that it injects a malicious code into any executable files downloaded from the Internet” (i.e. this talk)

The second article gets it right. It has enough details to know the general type of attack being discussed, downplays the hype, and lacks the misleading focus on JTAG. If the first article had never been written, I wouldn’t be discussing any of this.

The important thing to note is that the same author wrote both, so the only difference had to be the information that was provided to him. It was easy for me to recognize the PR influence since previous companies I’ve worked at have done the same thing. Security researchers, please make the effort to provide accurate details when announcing your talk, despite pressure from your PR department to overhype it or withhold information necessary to even know the topic.