One of the challenges I looked at during the last PlaidCTF was 'drmless', and since I hadn't seen a write-up yet I thought it'd be nice to publish one. Again, this is cross-posted on my blog and the int3pids blog.

For this challenge, we were provided with a drmless.tgz file containing a few things:

.drmlicense : A 16 byte file with some hexadecimal contents

cool_story.enc : A 'cool story' encrypted.

cooler_story.enc : An even 'cooler story' also encrypted.

drmless : an ELF binary.

readme.txt : challenge instructions.

If we read the instructions, we see the following text:

Here's a cool story from PPP!
We wrote an even cooler story, but you need to pay
us if you want to unlock it. TEN THOUSAND DOLLAR.

If we run 'drmless cool_story.enc' after extracting the archive on a 32 bit Linux machine we get a nice decrypted file. If we attempt to do the same on the cooler_story.enc file, we are told that this is a binary file and asked whether we want to view it anyway. Sounds like the 'less' program, doesn't it?

So we have a file we can decrypt, and one we cannot. The objective is to decrypt the other one, presumably by altering the drmlicense or bypassing it in some way. Let's look at the binary to find out what's going on.

At startup, the binary loads the .drmlicense file and reads 16 bytes into the 'license' global buffer:

So apparently 'drmless' uses this aes_wb_decryptor function to decrypt the data, and then XORs it with the license. It is interesting to note that only the input buffer is passed to the decryptor, which means that it either uses a hardcoded key or it is stored in some global variable.

Also, it is interesting to note that the name indicates this is probably a whitebox implementation. This means the implementation is obfuscated in an attempt to withstand static/dynamic analysis, and most likely the key is mixed into the algorithm itself in some way.

In any case, I have read documentation on whitebox cryptography before, and also analyzed some implementations of it. Based on that experience, I had some ideas on how to approach a whitebox implementation, but I also knew that I should probably focus on the stuff around the whitebox before diving into the whitebox itself.

Just for fun, I looked at the implementation and quickly saw that each round of AES is split into several functions, which are all called sequentially.

This just decrypts the first 64 bytes of a file and performs some checks on it. If the checks pass, the function returns 1, otherwise it returns 0. At this point I strongly suspected this was all the protection that was to be found. So I decided to set my drm license to all-zeros, force the 'drmprotected' function to return 1 and dump the data into a file.

I did this using this vtrace script and running it with the cool and the cooler story. The script still requires you to go through the whole output by pressing 'space' until reaching the end, in the same way you'd navigate through a file with 'less'.

After this, I used xortool from Hellman to analyze the output files. When run with the 'cool story' it led to the original .drmlicense contents... so I ran it against the cooler story and used the resulting key to decrypt the output:
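The key-recovery step can be sketched as follows. This is not xortool itself, only a minimal illustration of the same idea, assuming a repeating 16-byte XOR key (the license length) and assuming that the most frequent plaintext byte in every key position is a space (0x20), which is typical for English text. The function names and defaults are hypothetical.

```python
from collections import Counter


def recover_xor_key(data: bytes, key_len: int = 16, most_common_plain: int = 0x20) -> bytes:
    """Guess a repeating XOR key by frequency analysis.

    Assumption: in every key position, the most frequent plaintext byte
    is `most_common_plain` (a space for English text).
    """
    key = bytearray()
    for i in range(key_len):
        column = data[i::key_len]            # all bytes XORed with key[i]
        top_byte, _ = Counter(column).most_common(1)[0]
        key.append(top_byte ^ most_common_plain)
    return bytes(key)


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (encrypts and decrypts alike)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

The guess only works when the dump is long enough for the statistics to hold in each of the 16 columns; with a short or non-textual plaintext some key bytes would come out wrong and need manual correction.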

Last Friday I delivered a presentation at Insomni'hack about embedded security on how to break 'modern' embedded systems using fault injection and side channel analysis.

In this post, I will summarize my presentation and provide links to additional reading material and, when possible, open source software/hardware references.

Traditional embedded systems are awfully insecure in a hostile environment: they usually run unauthenticated code from flash/rom memories, contain a JTAG debugging interface allowing runtime control, etc. In addition to the lack of runtime protection, traditional systems store data in flash-like memories either in the clear or encrypted with a key that you can also find in the firmware.

Thus, in this scenario the integrity and confidentiality of both data and code can be easily compromised by an attacker with access to the device. Modern systems, in contrast, are most often based on a complex System on Chip solution as shown in this figure:

SoC based system

In such a system, several techniques are used in an attempt to solve these security challenges:

- Secure Boot: this mechanism refers to a device that boots from internal code (e.g. an internal boot ROM) and performs some kind of authentication of the firmware to be loaded on the device. Obviously, for this to work the system needs to guarantee that the internal boot ROM cannot be easily tampered with and that code is properly authenticated.

- Secure data storage: in order to protect sensitive data, modern devices often include a secure storage in the form of some internal flash or One-Time-Programmable (OTP) memory. Access control rules are implemented on this storage in order to ensure sensitive data is only accessible to those parts of the chip that require it.

- Key stores: As a specialization of secure data storages, key stores allow the system to securely store cryptographic keys. Often these stores allow the main CPU to write keys into them and to instruct the on-chip cryptographic coprocessors to use them. Therefore, in this way keys remain secret even if one achieves runtime control of the system.

- Memory protection: Many modern systems implement a protection mechanism for their main DRAM memory. In order to prevent attackers from reading data off RAM chips, sniffing the bus or modifying the data easily, they implement DRAM scrambling / encryption mechanisms. These mechanisms are usually quite weak due to timing restrictions, but nonetheless pose some barrier to the attacker.

- Debug interface protection: In addition to the above, debug interfaces such as JTAG are not (supposed to be) left open on secure systems. Since usually the system developer wants to be able to debug the target during development time and in case of errors, the most common solution involves setting up a password protection scheme.

Thus, the JTAG interface is usually locked until a password/key is presented in some way. This can be either a hardcoded key, device-specific key, or even a challenge-response protocol.

The remainder of the presentation (and thus this post) focuses on how to get to these secret keys starting from a device implementing all these mechanisms.

Of course, depending on how well these mechanisms are implemented, it might very well be possible to achieve this by means of logical attacks (e.g. overflows, improper bound checking, etc.).

However, for the sake of argument we are going to assume that this is not possible and see what we can do at the more physical/hardware level.

Step 1: Achieving runtime control

Say we have no easy external interface providing unauthenticated data to the target that we can exploit (i.e. no browsers, no networking, no filesystem parsing... ).

What do we do? If you have been reading this blog before, you probably know I am thinking about fault injection. As I described in that post, we can make a device fail by bringing it just outside its normal operating conditions. This can be achieved by modifying the voltage supply to introduce short glitches (aka VCC glitching), by injecting optical energy (laser/optical fault attacks), EM energy, etc.

Now, if we time our faults precisely at the moment when the internal BootROM is verifying the integrity of external code, we might be able to bypass it and run unauthenticated code.

In the absence of specific countermeasures, it is almost always possible to achieve this. Therefore, it is important to implement appropriate protection and detection mechanisms in order to guarantee the integrity of boot code. See this post for references on how to do this.

Also, keep an eye on Die Datenkrake, which promises an open source hardware and software design that can be used for fault injection purposes amongst others.

Step 2: Getting to the keys

So once we have runtime control of the target, we are usually allowed to encrypt/decrypt data at will. However, what we really want is to obtain the key so we can share it with our friends, sell it to the highest bidder, or just post it on twitter for the lulz.

So again, if we cannot find an easy way to obtain the keys through logical means, we will resort to side channel information. We can do this in two ways: by monitoring the chip or by injecting faults on it.

Step 2.a : Keys through side channel information

When a chip is functioning, information about its operation is leaked to the surrounding environment. Think of it as hearing stuff that you are not supposed to be hearing (e.g. what your neighbours were doing last night 😉 😉 ).

Even though you are not supposed to hear it, you actually do hear it. So you know what they are doing, and you make assumptions about what's happening in there.

The same happens with chips. When a chip is functioning, it takes more or less energy from the power supply depending on what it is doing. It also takes more or less time to compute, creates stronger or weaker EM signals (remember tempest?) around it or emits tiny photons depending on the activity it is carrying out.

Now, this activity is of course related to the process it is performing (e.g. encrypting some data) but also to the data it is using (e.g. input data, keys, output data...).

So what you do is ask the chip to use that precious key it is not willing to show you, and you monitor it while it is doing so. Kind of like the polygraph tests three-letter-agencies like to perform on super-bad terrorists.

How do we link this to the keys you are asking? Here is the trick: we ask the chip to perform lots of operations with the key using random input data. At the same time, we record side channel traces (let's say power consumption traces, using an oscilloscope).

Now, due to the fact that the power consumption is linked to the data used for the computations (and thus the key), we will use statistics to find out what the key is. What we do is we split the key in small chunks used by the algorithm (e.g. in DES 6 bits of key are mixed with 6 bits of data and fed into the S-boxes).

We refer to these internal results as intermediate values, and we create a model of how these values will leak to the power consumption. For instance, we assume that the Hamming Weight (HW) of these values, i.e. the number of bits set to one, is leaked in the power consumption. This means that whenever these values are computed, the consumed power varies depending on their HW (and probably also other variables).

Now, since there is a dependency between the intermediate values and the power consumption, we can use statistics to find out the amount of dependency. Since those intermediate values depend on small chunks of key, we can try all the possibilities and find out how much statistical dependency there is between the power traces and those intermediate values. The key guess showing more statistical dependency will probably be the correct key.
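The statistical step described above can be sketched in a few lines. This is a simulation under stated assumptions, not a real measurement setup: the target is a toy 4-bit key chunk mixed with 4 bits of input and fed into the 4-bit S-box of the PRESENT cipher (chosen only because it is small), the leak is modelled as the Hamming Weight of the S-box output plus Gaussian noise, and Pearson correlation is used as the measure of dependency. All function names are hypothetical.

```python
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small toy target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_traces(key_chunk, count, noise=0.5, seed=1):
    """Simulated measurements: HW of the S-box output plus Gaussian noise."""
    rng = random.Random(seed)
    inputs = [rng.randrange(16) for _ in range(count)]
    leaks = [hamming_weight(SBOX[p ^ key_chunk]) + rng.gauss(0, noise)
             for p in inputs]
    return inputs, leaks


def recover_key_chunk(inputs, leaks):
    """Return the 4-bit guess whose predicted HW correlates best with the leaks."""
    def score(guess):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in inputs]
        return pearson(predicted, leaks)
    return max(range(16), key=score)
```

With real traces, each trace holds many samples and the correlation is computed per sample point; the sketch collapses every trace to the single point where the intermediate value leaks.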

If we repeat this process for all the key chunks, then we obtain the whole key. Now you might be asking... how do I do this at home? Well, if you are a security test lab you can buy products like Inspector (from my employer ;-p ) or CRI Workstation.

But if you are not, you can take a look at this page which contains information from a presentation delivered at BlackHat EU 2013: http://www.newae.com/blackhat. You can also take a look at OpenSCA, a Matlab-based framework for side channel analysis.

Step 2.b: Getting keys through fault injection

So as I said above we can also get keys through fault injection. Depending on how the system is designed, it might be possible that the memory cells containing the keys are actually mapped in the memory space of the system.

In those cases, an access control mechanism is placed in between the bus initiators and the bus target (i.e. the memory itself) in order to identify whether the request is allowed or not.

As you can imagine, glitching this access control check would result in access to the secret keys for initiators that should otherwise have no access to them.

But sometimes you won't be as lucky and you won't have the ability to request a read of the key (even if it was denied) and attempt to bypass the access control checks... simply because the key is not mapped anywhere in memory.

What do you do then? DFA is the answer, my friend. With Differential Fault Analysis you can inject faults into the cryptographic algorithm itself, and by observing the changes in the output you can recover the secret keys. See this post for more information.

Conclusion and further reading

As you can see, when a device is under the control of an attacker (aka user in some cases ;-p ), there are a number of attacks that need to be considered in addition to the usual software/logical vulnerabilities.

By abusing environmental variables such as power/energy supply and consumption, radiation, etc. one can exploit hardware that would otherwise be secure.

Therefore, as an embedded system designer it is important to protect your code and hardware. Even in the presence of countermeasures, it is critical to test those countermeasures and verify that they actually do what you intended them to do (the company I work for can help you there 😉 ).

In order to protect against SCA attacks, you need to make the environmental variables statistically independent of your secret data or hide that dependency somehow. You must know that most countermeasures are covered by CRI's patents, so you might want to check with them before you implement them in your products.

In order to protect against fault injection, you need to introduce redundancy in your computations and make your software AND hardware resilient to induced errors. You can take a look at this paper for some ideas on how to do this at the application level.

Additionally, for more reading material on the subjects you can take a look at CHES (mostly side channel analysis) and FDTC (mostly fault injection) from the last couple of years. The DPA book is also a very good read with lots of background information on DPA attacks.

Of course, if you have any concrete inquiries you can address me in the comments, on twitter or elsewhere.

Last week I started playing with the exploit exercises from the Fusion VM at exploit-exercises.com. The first level was a straightforward stack overflow without any mitigations. Next came one with ASLR for the stack, which was easy to bypass with a simple jmp *esp found in the main binary. Next level up added NX, so I had to resort to ROP, which was also simple by using ROPGadget to generate a payload based off libc (and fixing up the chain due to a bug in ROPGadget 3.3).

Now, level04 was slightly more difficult. It is a web server compiled with stack cookies (SSP), position independent code (PIC/PIE) and configured to run with ASLR and NX. So this gives us a few protections we need to bypass or work around somehow.

Because of this, I thought it was interesting to share and maybe request some feedback... who knows, maybe there are easier solutions and I am just complicating my life 🙂 You can also find a copy of this post at int3pids.com.

Additionally, the web server generates a random password which you need to provide in order to hit the vulnerability and control the instruction pointer. Looking at the validate_credentials function, we see this code:

So there is a clear timing leak on line 208 which tells you how many bytes were wrong. Additionally, the base64_decode function seems to take the length of the output buffer but then this is what it does with it:

So, it just overwrites it with the computed output buffer and goes ahead and decodes it. Thus, we have a stack based buffer overflow, but in order to exploit it we first need to recover the random password so that the function returns and pops the return address off the stack.

Discovering the password is fairly easy thanks to the timing leak: we just brute force byte by byte, measuring the time to see if we found the right character or not. In my case, the timing difference is below 0.001 when we hit a good value and above it when we do not, so I just check for that:
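A minimal sketch of this byte-by-byte search, not the original script, might look like this. It is a simplified variant that keeps the fastest candidate for each position instead of comparing against a fixed threshold. The `measure` callable, the alphabet and the filler character are assumptions made for illustration: `measure` is expected to send a request with the candidate password and return the elapsed time.

```python
import string


def recover_password(measure, length,
                     alphabet=string.ascii_letters + string.digits,
                     filler="A"):
    """Recover a secret one character at a time from a timing oracle.

    `measure(candidate)` is a hypothetical callable that submits `candidate`
    and returns the elapsed time; fewer wrong characters means less time.
    """
    known = ""
    for position in range(length):
        padding = filler * (length - position - 1)
        # Time every candidate for this position and keep the fastest one.
        timings = {c: measure(known + c + padding) for c in alphabet}
        known += min(timings, key=timings.get)
    return known
```

Against a real server the measurements are noisy, so each candidate would normally be timed several times and the minimum or median used.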

So this piece will find us the password. What's next? Well, we can use this to overwrite some data and hopefully hijack EIP. But... as I mentioned above, there is a catch: there is a stack cookie protecting EIP.

Since we can feed any binary data of any size (thanks base64 decode 🙂 ), we can brute force the stack cookie byte by byte. When we see a stack corruption, we guessed an incorrect value. When the program goes on normally, we hit the right result. Again, this is the code implementing that part:
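The byte-by-byte search can be sketched as below; this is an illustration rather than the original exploit code. The `survives` oracle is hypothetical: it stands for sending an overflow that overwrites only the first bytes of the secret and reporting whether the child answered normally. The `known` parameter allows starting from bytes that are already known.

```python
def brute_force_bytes(survives, length=4, known=b""):
    """Recover `length` secret bytes one at a time.

    `survives(guess)` is a hypothetical oracle: it overwrites only the first
    len(guess) bytes of the secret with `guess` and returns True when the
    forked child answers normally (no stack-smashing abort).
    """
    recovered = bytes(known)
    while len(recovered) < length:
        for candidate in range(256):
            attempt = recovered + bytes([candidate])
            if survives(attempt):
                recovered = attempt
                break
        else:
            raise RuntimeError("no candidate survived; the secret may have changed")
    return recovered
```

Because each byte is confirmed independently, a 4-byte value needs at most 4 * 256 = 1024 requests instead of 2^32.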

At this point, we can reliably set EIP since we know the stack cookie value. However, that's not very helpful since everything is randomized thanks to full ASLR and a PIC binary. The first thing I tried here was overwriting the last byte of the saved EIP to land on a call to printf() or some other function returning data. This would help me obtain some data from the stack, and then I could create a ROP payload based on the leaked data.

However, then I hit another problem: ebx is used by the code to contain a reference to the binary loading address + some offset. This is done to achieve position independent code. Unfortunately, the exploit also overwrites ebx, which was stored by validate_credentials just between the stack cookie and the return address.

So what I did next was guessing ebx in the same way I did for the stack cookie. However, here I started with the known last byte (the lowest 12 bits of the binary are not randomized) and brute forced the rest. Again, the code is very similar to the previous one:

This code first obtains a 'normal response'. Next, it starts bruteforcing ebx byte by byte, comparing the response with the 'expected response'. If it matches, the right value was found, else it iterates to the next candidate.

Now, with this done we are able to compute the base for the binary by subtracting 0x4118 from the leaked ebx value. After finishing the exploit I realized that this was actually not needed, since the stack smashing detection code prints out a very helpful memory map into stderr, which is redirected to our socket. Anyway, with this code the exploit doesn't rely on that leak so it would work even if stderr is not redirected to our socket.
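As a small worked example of that computation: the offset 0x4118 comes from the text, while the leaked value used in the usage note below is made up for illustration. Since the lowest 12 bits of the load address are not randomized, a page-alignment check is a cheap sanity test on the brute-forced value.

```python
EBX_OFFSET = 0x4118   # offset of the leaked ebx value from the image base (from the text)
PAGE_MASK = 0xFFF     # the low 12 bits of the base are not randomized


def image_base(leaked_ebx: int) -> int:
    """Compute the load address of the binary from the leaked ebx value."""
    base = leaked_ebx - EBX_OFFSET
    if base & PAGE_MASK:
        raise ValueError("base is not page aligned; the leaked value is probably wrong")
    return base


def rebase(base: int, offset: int) -> int:
    """Absolute address of something located at `offset` inside the image."""
    return base + offset
```

For a hypothetical leak of 0xb7725118, `image_base` gives 0xb7721000, and any gadget or GOT entry can then be addressed with `rebase(base, offset)`.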

At this point I had two options: use this to leak the libc or make a ROP payload based on the binary. Since I already had a payload for libc made with help of ROPGadget (which I had to correct due to some bug by the way), I decided to return into write and leak the first GOT entry to obtain the libc base. Therefore, the final stages of my exploit look like this:

After this, a nice shell is running for us. With only one catch: an alarm is set by the webserver, and it will trigger a SIGALRM signal soon. So what I do is just execute trap '' SIGALRM; as the first command so that the signal is ignored. The full exploit code is here for you to play with 🙂

Conclusion

Even in the presence of quite a few countermeasures (fully randomized address space, non-executable data areas, stack smash protection), it is still possible to achieve reliable code execution.

In this case, we have abused the fact that the vulnerable server does not call execve() for every new child to brute-force the stack cookie and also to discover the base address of the executable itself. After this is done, it is basically game over.

However, these techniques have two requirements: the address space has to remain constant between requests (albeit random every time we restart the whole server) and we need to be able to partially corrupt the data with fully arbitrary values. If these requirements were not met, another info-leak bug would have been required in order to obtain this data.

Thus, as an application programmer it is always recommended to call fork()+execve() instead of just fork(), so that the OS re-randomizes the whole address space.

So you read my last post and were left wondering how the heck you would be able to inject temporary faults into hardware devices? Here is your answer 🙂

In that post, I explained how to extract keys from cipher implementations assuming we could somehow inject faults during the execution of the cipher. Besides DFA attacks, I also said we could achieve something similar to what we do with software protections (i.e. modifying control flow, bypassing checks, etc.) using fault injection techniques. I thought it was worth giving a few examples of how to inject faults in real hardware to complete the picture.

When hardware is designed, it is engineered to work under certain conditions of temperature, input voltage ranges, clock frequencies, etc. The hardware is tested under those conditions and is supposed to function in that range... and there are no guarantees that it will operate correctly if you bring it outside them.

I guess you follow my reasoning already 🙂 So if we want to inject faults into hardware, a good place to start looking is exactly in those gray areas around the operating conditions. Of course, we want the chip to be functioning properly most of the time, and we want it to fail at the precise moment at which it is computing something sensitive (say a secure boot check, or an RSA-CRT signature). Thus, we probably need to have some control over the timing, and inject the fault only temporarily.

In this post I am introducing from an intuitive perspective three ways of injecting faults: voltage, clock and laser/optical glitching.

Voltage glitching

The first example I want to touch on is that of voltage (or VCC) glitching. In this case, we typically run the chip at its nominal voltage (say 3.3V), and whenever we want to inject a fault, we drop voltage down to e.g. 1V.

Example of voltage glitching. The supply voltage is set to 0.8V during a short moment of time.

At this moment, the input voltage to certain gates within the chip will be too low due to the lack of supply voltage. Thus, these gates will receive an input voltage which is below the threshold that indicates whether the signal is a zero or a one, no matter what value it was supposed to be.

Then we increase the voltage again to the nominal voltage of 3.3V, and we have a functioning chip that just failed to execute one of its operations. For instance, it failed to execute a conditional jump and fell through to the code that we wanted to have executed.

The trick here is to find the proper parameters for the glitch: voltage drop (do we go to 0V, to 0.4V, to 1V ...), length of the glitch (a few nanoseconds, a few microseconds?) and the timing. Typically, if voltage drop and length of glitch are too small, the chip will function properly. If they are too large, it will just die (mute, reset, or maybe even physically damaged). Of course, if the timing is wrong then the attacker will never see the effects he wants to see.
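The parameter search is usually automated. A minimal sketch of such a sweep is shown below; `try_glitch` is a hypothetical driver for whatever glitching hardware is used, and its three return values are assumptions that mirror the outcomes described above.

```python
import itertools


def sweep_glitch_parameters(try_glitch, voltages, lengths_ns, offsets_ns):
    """Try every parameter combination and collect the ones that worked.

    `try_glitch(voltage, length_ns, offset_ns)` is a hypothetical driver for
    the glitching hardware. It is assumed to return "normal" (no effect),
    "reset" (target muted or rebooted) or "success" (desired fault seen).
    """
    hits = []
    for voltage, length_ns, offset_ns in itertools.product(
            voltages, lengths_ns, offsets_ns):
        if try_glitch(voltage, length_ns, offset_ns) == "success":
            hits.append((voltage, length_ns, offset_ns))
    return hits
```

In practice each combination is repeated many times, since a given set of parameters typically succeeds only in a fraction of the attempts.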

As a protection against this kind of glitching, most modern smart cards and some embedded devices incorporate voltage sensors that detect whether the voltage went below a certain value or not. However, this attack is still effective against a wide range of products.

Clock glitching

Clock glitching is similar to VCC glitching in the sense that it affects another critical parameter of the chip that can be controlled by the attacker. In this case, what we do is inject spurious clock cycles that are way shorter than the original clock cycle.

Example of clock glitching. A very short spurious clock cycle is inserted at the beginning of a normal cycle.

Since the internal logic of the chip operates based on its clock, a short clock cycle will trigger a new operation before the results of the previous one were completely computed or propagated through the device.

Imagine you have to multiply two values, and then add a third value to the result. Normally, multiplying values takes longer than adding them up. Thus, the clock period for a chip that only performs these two operations would be long enough for the multiplication to complete and its result to be ready at the input of the next stage, since that is the critical operation.

Now, if I tell you to start adding up before you received the multiplication result, you will be using invalid (old?) data instead of the correct result. Thus, you will fail at computing the correct result.
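The multiply-then-add example can be written down as a toy model. The 8 ns multiplier delay and the notion of a 'stale' product are assumptions chosen for illustration; real hardware may latch partially settled values rather than a clean old one.

```python
def multiply_then_add(a, b, c, clock_period_ns,
                      stale_product=0, multiplier_delay_ns=8.0):
    """Toy model of a two-stage datapath computing a * b + c.

    If the clock period is shorter than the multiplier's propagation delay,
    the adder samples whatever was on the wire before (`stale_product`).
    """
    if clock_period_ns >= multiplier_delay_ns:
        product = a * b            # the result had time to settle
    else:
        product = stale_product    # the adder samples the old value
    return product + c
```

With a 10 ns period, `multiply_then_add(6, 7, 1, 10)` uses the real product; with a 2 ns glitch cycle the adder works on the stale value and the final result is wrong.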

Clock glitching exploits exactly that situation. Again, finding the right parameters in this case is the key to success.

As for hardware level protections, frequency sensors as well as using internally generated clocks (using on-die oscillators) are generally the most common ways to protect against clock glitching.

Additionally, fast clocks make these attacks less practical for attackers, since they need to inject even faster clock cycles and synchronize their attack at a higher speed.

This is why clock glitching is less effective nowadays: most high-end smart cards use their own on-die clocks, and embedded systems require much higher clock speeds.

Optical glitching

After clock and VCC glitching, we move to the real king of current fault injection attacks. Optical fault injection, or most commonly Laser fault injection, uses a light beam to inject faults into semiconductor devices.

How is this possible? Well, light (physicists, don't kill me!) basically consists of a number of photons carrying a certain amount of energy. Roughly, when these photons reach a semiconductor (typically silicon in electronic devices), their energy is absorbed by the semiconductor.

Given enough energy, electrons that would otherwise be still within the semiconductor will start to move, creating current. So, for our chips, this means that some of the transistors in the chip will actually switch when they should not!

The big difference between this fault injection technique and the previous ones is that in this case we actually have spatial selectivity (or resolution): we can choose which parts of the chip we attack by pointing the laser beam to them.

Of course, this is very powerful but at the same time it adds extra complexity to the attack: now you need to find the sensitive spots in the chip. As before, there are a number of parameters one needs to play with in order to successfully inject faults: glitch timing, glitch length, wavelength of the injected light and amount of energy injected.

Also, this attack is semi-invasive: we need to open up the chip package so that the light radiation can reach the die. Otherwise, the light will be blocked by the package or the plastic around the smart card die. Thus, this attack provides additional power at the cost of additional complexity, as usual.

In terms of hardware level protections, this is also the most difficult attack to prevent. Typically light sensors are scattered around the chip, but manufacturers cannot place sensors everywhere (that's expensive!) so there are always open spots.

At the end of the day, fault injection protection requires a combination of hardware and software prevention and detection mechanisms: typically sensors at the hardware level and double-checks and redundancy at the software side.

Due to the difficulty of completely preventing these kinds of attacks, fault injection attacks are nowadays one of the main threats to secure hardware. Additionally, this difficulty together with the physical nature of the attacks also means that simulating them is typically not enough to assure appropriate protection levels, making fault injection testing key for secure hardware.

So, after more than a year without writing anything here, I was bored today and thought it would be nice to share a new piece on attacking cryptographic implementations here 🙂

Differential Fault Analysis (DFA) attacks are part of what is known as fault injection attacks. That is, they are based on forcing a cryptographic implementation to compute incorrect results and attempting to take advantage of them. With fault injection attacks (also often called active side channel attacks) one can achieve things like unauthenticated access to sensitive functionality, bypassing secure boot implementations, and basically bypassing any security checks an implementation performs.

With DFA attacks, one is able to retrieve cryptographic keys by analyzing correct/faulty output pairs and comparing them. Of course, this assumes you are able to inject faults somehow... which is often true in hardware implementations: gaming consoles, STBs, smart cards, etc. At the software level, one can achieve similar things by debugging the implementation and changing data or by patching instructions... but this is something we have been doing for a long time, haven't we? 🙂 I often say that fault injection attacks are the analog version of 'nopping' instructions out in a program, although we often do not know exactly what kind of faults we are injecting (i.e. we often miss a fault model, but we still successfully attack implementations in this way).

There are ways to protect against this kind of attack as an application programmer, but this is not the objective of this post. In the remainder of this post, I will explain two powerful DFA attacks on two modern cryptographic algorithms: RSA and (T)DES. For some information on protecting from these attacks as a programmer, take a look at these slides. If there is some interest, I will outline the most common techniques to perform fault attacks in a future post.
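As one well-known illustration of recovering a key from a correct/faulty output pair, the sketch below simulates the classic fault attack on RSA signatures computed with the Chinese Remainder Theorem (often called the Bellcore attack). It may or may not be the exact variant discussed in this post. The primes are textbook-sized toy values and the fault is simulated by flipping one bit in one half of the computation; the function name and parameters are hypothetical.

```python
from math import gcd


def rsa_crt_sign(m, p, q, d, fault_in_p_half=False):
    """RSA signature using the CRT; optionally corrupt the mod-p half."""
    s_p = pow(m, d % (p - 1), p)
    s_q = pow(m, d % (q - 1), q)
    if fault_in_p_half:
        s_p ^= 1                       # simulated fault: flip one bit
    # Garner recombination of the two halves.
    h = ((s_p - s_q) * pow(q, -1, p)) % p
    return s_q + q * h


p, q, e = 61, 53, 17                   # textbook-sized toy parameters
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # modular inverse (Python 3.8+)
message = 65

good = rsa_crt_sign(message, p, q, d)
bad = rsa_crt_sign(message, p, q, d, fault_in_p_half=True)

# Both signatures agree modulo q but differ modulo p, so their
# difference is a multiple of q only and the gcd with n reveals q.
recovered_factor = gcd(abs(good - bad), n)
```

A single faulty signature is enough here: once one prime factor of n is known, the private key follows directly.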

After a long long while, it's time to go on with our crypto series. Last time we talked about the RSA cryptosystem, and we learned its security is based on the integer factorization problem (plus the DL problem for message secrecy). Today, we'll continue with public key cryptosystems: we'll look into Elliptic Curve Cryptography.

Elliptic Curves

If we are talking about Elliptic Curve Cryptography, first we need to define what an Elliptic Curve is. Mathematically, an Elliptic Curve is a curve with the following equation (the general Weierstrass form):

y^2 + a1*x*y + a3*y = x^3 + a2*x^2 + a4*x + a6

This means that every point (x,y) for which the above expression is satisfied will be part of the curve. However, it turns out in our case we can simplify the equation because the curves we'll be using can generally be written as:

y^2 = x^3 + ax + b

Such a curve, over the reals (i.e. x and y are real numbers) and with a=-3, b = 1, looks like this:

Elliptic Curve y^2 = x^3 - 3x +1 over the real numbers

What makes these curves special is that we can define an abelian group with them. To do that, we define the point at infinity and an addition law. The addition law is depicted in the following picture from Wikipedia:

Elliptic Curve Addition law

As you can see, if you want to add two points P and Q, you draw a line through them. The third point where this line intersects the curve is -(P+Q). Then, you just need to invert this point (negate the y coordinate) to obtain the final result.

Of course, we have special cases. If the point is added to itself, the line is defined as the tangent to the curve at that point, since intuitively the tangent touches the curve 'twice' at that point.

If we add a point to its inverse, we get a vertical line... and that's a problem because it will never intersect the curve a third time. Here is where the point at infinity comes to the rescue. The point at infinity is simply 'up there' (and 'down there'), and is the zero element of the group.

Elliptic Curves for Cryptography

We have defined above what an elliptic curve looks like over the reals, and how to perform additions of two points. Obviously, when addition is defined we also have multiplication for free: just add a point to itself several times in a row (although you can do it in smarter and more efficient ways).

But how do we use it for cryptography? I mean, where is the difficult problem here? Actually, the difficult problem is again the discrete logarithm problem. In this case, we define it as follows:

Given a curve E and all its parameters, a base point P and a point Q=nP, obtain n.

And how is this difficult in the curves defined above, you might be thinking... The truth is we do not use real curves in ECC, but we use curves over finite fields instead. We can do it over prime fields GF(p), or we can do it over binary fields GF(2^n). I'll look only at GF(p) here, but similar concepts apply (although the simplified expression I defined above is slightly different in that case).

So, the curve I depicted previously taken over GF(8761) looks like this:

Elliptic Curve y^2 = x^3 -3x+1 over GF(8761)

Messy, huh? Exactly the same addition laws apply here, but now when you add two points you draw a line... and when the line gets out of the GF(p) x GF(p) plane it wraps around and comes back from the other side. It is a little more difficult to depict and to visualize, but the concept is the same as before. And now you probably start seeing why this is difficult to solve...
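As a rough sketch of how this looks in code, the snippet below implements the same addition law over GF(8761), the 'smarter' multiplication (double-and-add), and a brute-force solver for Q = nP. The base point G = (0, 1) and all function names are assumptions of mine for illustration. I did not check the order of G, and the brute force only works because this field is tiny.

```python
P_MOD = 8761      # the prime from the plot above
A, B = -3, 1      # y^2 = x^3 - 3x + 1
G = (0, 1)        # assumed base point: 1^2 = 0 - 0 + 1 holds for any modulus

def on_curve(P):
    if P is None:                       # None is the point at infinity
        return True
    x, y = P
    return (y * y - (x**3 + A * x + B)) % P_MOD == 0

def inv(n):
    """Modular inverse via Fermat's little theorem (P_MOD is prime)."""
    return pow(n % P_MOD, P_MOD - 2, P_MOD)

def add(P, Q):
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                     # P + (-P) = infinity
    if P == Q:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD   # tangent
    else:
        s = (y2 - y1) * inv(x2 - x1) % P_MOD          # chord
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def mul(n, P):
    """Compute nP with double-and-add instead of n - 1 additions."""
    result, addend = None, P
    while n > 0:
        if n & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        n >>= 1
    return result

def brute_force_dlog(P, Q):
    """Find some n with nP = Q by trying every multiple (toy sizes only).
    Assumes Q really is a multiple of P, otherwise it never terminates."""
    R, n = None, 0
    while R != Q:
        R = add(R, P)
        n += 1
    return n

Q = mul(1234, G)
n = brute_force_dlog(G, Q)   # feasible here, hopeless for real curve sizes
```

With a 256 bit prime instead of 8761, mul is still fast, while brute_force_dlog would have to walk through an astronomically large number of points.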

Why Elliptic Curves?

Now you might be wondering... why do we use Elliptic Curve cryptography at all? What are the benefits? The answer is that ECC allows us to use smaller keys than other algorithms like RSA / 'normal' DL systems for the same level of security.

This is because the best known general methods for solving the DL problem on Elliptic Curves are of exponential complexity, while for the other systems we know subexponential methods. Hence, the DL problem on Elliptic Curves is believed to be more difficult than the equivalent base problems for other public key cryptosystems.

Now that we know how elliptic curves are used in cryptography and what benefits they have over traditional

Elliptic Curve Diffie-Hellman

So, if you remember from when we talked about Diffie-Hellman, this is a key exchange protocol that relies on the Discrete Logarithm problem (and the Diffie-Hellman assumption). Usually this is done over a finite field GF(p), but now we have just defined a group based on Elliptic Curves which we can use as well.

In this case, Alice has a private key d_A and a public key Q_A = d_A*G, where G is the base point. Similarly, Bob has d_B and Q_B = d_B*G. Alice and Bob exchange public keys, and then each of them can compute a common point d_A*Q_B = d_B*Q_A = d_A*d_B*G.

This protocol relies on the assumption that the DL problem is infeasible in the elliptic curve (which requires a base point G of high order) and the Diffie-Hellman assumption.
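The exchange can be sketched in a few lines of Python. This is a toy under stated assumptions: I reuse the curve y^2 = x^3 - 3x + 1 over GF(8761) from before, pick G = (0, 1) as base point without checking its order, and the names d_a, q_a, d_b, q_b are mine. It only shows that both sides end up with the same point; it is in no way secure.

```python
import secrets

P_MOD, A = 8761, -3   # toy curve y^2 = x^3 - 3x + 1 over GF(8761)
G = (0, 1)            # assumed base point (order not checked, toy only)

def add(P, Q):
    """Point addition; None represents the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if P == Q:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, P_MOD - 2, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, P_MOD - 2, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def mul(n, P):
    """Double-and-add scalar multiplication."""
    result = None
    while n > 0:
        if n & 1:
            result = add(result, P)
        P = add(P, P)
        n >>= 1
    return result

# Each side picks a private scalar and publishes scalar * G
d_a = 1 + secrets.randbelow(8000)
q_a = mul(d_a, G)
d_b = 1 + secrets.randbelow(8000)
q_b = mul(d_b, G)

# After exchanging q_a and q_b, both compute the same point
shared_a = mul(d_a, q_b)
shared_b = mul(d_b, q_a)
```

An eavesdropper sees G, q_a and q_b, and would need to solve the DL problem to get d_a or d_b.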

Other ECC algorithms

Besides the EC Diffie-Hellman algorithm defined above, there are several other algorithms based on Elliptic Curves. For example, one could compute digital signatures using Elliptic Curve DSA or Elliptic Curve Nyberg Rueppel. Each algorithm has its own details, but the important problem used as a foundation for each of them is the Discrete Logarithm problem over Elliptic Curves as we have defined it here.

Beware, however, that similarly to other algorithms, ECC algorithms also rely on other conditions. For example, for ECDSA (and DSA) there is a secret parameter that must be unique for every signature, and two signatures with the same value for this parameter will reveal your secret key. As usual, if you implement cryptography, you need to be aware of the requirements and limitations or you will certainly screw up (toc toc SONY!).
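To see why a repeated secret parameter is fatal, here is a small sketch of the algebra. An ECDSA signature is (r, s) with s = k^-1 * (h + r*d) mod n, where k is the per-signature secret, d the private key, h the message hash and n the order of the base point. To keep the snippet short I do not use a real curve: n is just a prime standing in for the group order, and r is derived from k with a hash instead of being the x coordinate of kG. Those two simplifications, and all names, are assumptions of this sketch. The recovery step itself is the same as for real signatures.

```python
import hashlib

n = 2**127 - 1   # a prime standing in for the order of the base point

def inv(x):
    return pow(x % n, n - 2, n)

def h(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(d, k, msg):
    # Stand-in: in real ECDSA, r is the x coordinate of kG reduced mod n
    r = h(b"stand-in for x(kG)" + k.to_bytes(16, "big"))
    s = inv(k) * (h(msg) + r * d) % n
    return r, s

def recover_key(msg1, sig1, msg2, sig2):
    """Recover the private key from two signatures that share the same k."""
    (r1, s1), (r2, s2) = sig1, sig2
    assert r1 == r2, "different r means a different k was used"
    # s1 - s2 = k^-1 * (h1 - h2)  =>  k = (h1 - h2) / (s1 - s2)
    k = (h(msg1) - h(msg2)) * inv(s1 - s2) % n
    # s1 * k = h1 + r * d         =>  d = (s1 * k - h1) / r
    return (s1 * k - h(msg1)) * inv(r1) % n

d = 0x1234567890ABCDEF1234567890ABCDEF % n   # the 'secret' key
k = 0x0F0E0D0C0B0A09080706050403020100 % n   # reused for both signatures
sig1 = sign(d, k, b"first message")
sig2 = sign(d, k, b"second message")
recovered = recover_key(b"first message", sig1, b"second message", sig2)
```

Two signatures, one subtraction and two modular inversions are all it takes.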

Sometime before the weekend I was discussing Padding Oracles with a friend and somehow it came up that there was no public tool using timing information for this kind of attack.

I had seen that Thai and Juliano mentioned timing leaks in their talk at EkoParty, but since AFAIK there was no public tool available I decided to look into it. Also, some weeks ago I added the CBC-R encryption part to my scripts, in order to be able to encrypt arbitrary information as long as we are able to control the IV.

So in this post I'm going to write about these two things: CBC-R encryption and a web based padding oracle attack script using timing information.

It's been a long time since I last wrote anything here... I've had to travel for work and have been doing other things and didn't find the right moment to write about anything useful. But a few days ago I decided to take a look at Padding Oracle attacks after hearing about them several times, and I thought it would be nice to share with you guys.

Padding Oracle attacks were introduced in 2002 in paper [1]. After that, several other papers have presented similar attacks based on the same concept for other padding schemes. In this post, I will restrict myself to the padding scheme presented in [1] and will add a list of references at the end for further reading.

Oracles and Padding Oracles

In cryptography, an oracle is basically a black box that responds to queries. For instance, an encryption oracle will encrypt any input you give to it under a certain key. A random oracle will always respond with uniformly random data, and this is useful to model protocols based on hash functions.

In our case, we are interested in padding oracles (PO). A PO is a sort of black box that decrypts an input message and tells you whether the padding was correct or not. For instance, think of an application that receives an input ciphertext and decrypts it using a block cipher.

Since a block cipher encrypts data in chunks of a given size, whenever the data to be encrypted does not fit exactly in a number of chunks it needs to be padded. Thus, after decryption our example application will check the padding applied and will throw an exception if the padding is incorrect. If it is correct, it will just continue with its normal processing.

As you can see, this application is an example of a padding oracle because it is telling us whether the padding is correct or not. We can check the behavior of the application and when we observe an exception we know that the padding was incorrect. When we do not, we know it was correct.

Mounting attacks based on padding oracles

So, how can we use such a PO to mount an attack on the system? The answer is it depends on the underlying cipher mode and on whether it is used properly or not, of course. Assuming the application uses a cipher in CBC mode, and it does only use encryption but no authentication, we can feed the application with specially crafted ciphertext and make assumptions about the plaintext based on the response.

Before going into the attack method, let's define the padding system we are going to look at. The PKCS#5 standard defines a padding scheme that works as follows: If the message length is a multiple of the block length, the padding scheme adds an extra block with all bytes set to the number of bytes in a block. Otherwise, the remaining bytes will be set to the number of padding bytes that need to be added to have an exact number of blocks.

If the block size is 8 bytes, then for a 7-byte message you will add 1 byte set to 01; for a 6-byte message, 2 bytes set to 02, and so on. I bet you get the idea 🙂
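In Python, the padding scheme described above can be sketched like this. The function names are mine, and padding_ok is exactly the kind of check that a padding oracle leaks the result of.

```python
def pad(msg: bytes, block_size: int = 8) -> bytes:
    """Append n bytes of value n, where n is between 1 and block_size."""
    n = block_size - len(msg) % block_size
    return msg + bytes([n]) * n

def padding_ok(data: bytes, block_size: int = 8) -> bool:
    """Check that data ends with n bytes of value n."""
    if not data or len(data) % block_size != 0:
        return False
    n = data[-1]
    return 1 <= n <= block_size and data[-n:] == bytes([n]) * n

def unpad(data: bytes, block_size: int = 8) -> bytes:
    if not padding_ok(data, block_size):
        raise ValueError("bad padding")   # this is what the oracle reveals
    return data[:-data[-1]]
```

Note that a message that already fills the last block still gets a whole extra block of padding, otherwise unpad could not tell padding from data.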

Now, when you decrypt in CBC mode, the process works as described earlier in this blog: first you decrypt a block, and then XOR it with the previous ciphertext block (or the IV if it is the initial block). This means that if you take a random data block, append a target ciphertext block to it, and feed the result into the padding oracle, you will know whether the decryption of your ciphertext block XORed with your random data adheres to the padding scheme or not.

So, once the oracle accepts the padding, we know that the decrypted message XORed with our random data ends with either 1 byte set to 0x01, or 2 bytes set to 0x02, or 3 bytes set to 0x03, etc. In the most likely case, you'll be lucky with the first option. However, you still need to check, since it might be that by chance you got one of the other options.

How do you know then? Easy: start modifying bytes one at a time and feed them to the Oracle. To make it generic, start from the left-most byte. Modify it, and check with the oracle if the padding is incorrect. If it is incorrect, that means this byte was part of the padding, so the whole thing should be all 0x08's (assuming 8 byte blocks). If it is correct, continue with the next byte until you see that the padding turns incorrect or you reach the last byte.

In the description above, b is the number of bytes in a block. In steps 1 to 3 it is discovering the last byte as I explained above. It creates b random bytes, and modifies only the last one until it gets a message with the right padding. Then it replaces this last byte with the correct value.

Now, using this value for r it has a good padding, so it starts modifying the bytes from the top and checking if this changes the results. If the result is not affected in any of the cases, then the last byte was set to 0x01 after decryption, and this means the last byte of the decrypted plaintext is our random byte rb XORed with 0x01. The same holds for the other cases, only that you need to XOR the random bytes with the appropriate value.
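Here is a minimal sketch of that last-byte search. It is not the code from my package: to keep it self-contained, the oracle is simulated with a secret value standing for the block cipher decryption of the target block, so oracle(r) answers exactly what a real application would answer for the two-block message r || target. The names make_oracle and last_bytes are made up for this example.

```python
import os

B = 8  # block size in bytes

def padding_ok(block):
    n = block[-1]
    return 1 <= n <= len(block) and block[-n:] == bytes([n]) * n

def make_oracle(intermediate):
    """Simulated padding oracle; intermediate plays the role of D(target)."""
    def oracle(r):
        return padding_ok(bytes(a ^ b for a, b in zip(r, intermediate)))
    return oracle

def last_bytes(oracle):
    """Recover the trailing byte(s) of D(target) using only oracle answers."""
    r = bytearray(os.urandom(B))
    # Try all values for the last byte until the padding is accepted
    for guess in range(256):
        r[-1] = guess
        if oracle(bytes(r)):
            break
    # Find out how long the accepted padding is: modify bytes from the left
    for j in range(B - 1):
        probe = bytearray(r)
        probe[j] ^= 0xFF
        if not oracle(bytes(probe)):
            # Byte j is the first padding byte, so the padding is B - j bytes
            pad_len = B - j
            return bytes(x ^ pad_len for x in r[j:])
    # Nothing broke the padding, so it was a single 0x01
    return bytes([r[-1] ^ 0x01])

secret = os.urandom(B)                    # unknown to the attacker
found = last_bytes(make_oracle(secret))   # at least the last byte of secret
```

Most of the time found is a single byte, but when the random block happens to produce a longer padding you get several bytes for free.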

So the above algorithm works for the last byte, and if you are lucky for some other bytes as well. How do we turn it into a block decryption oracle?

Quite simple: assuming you know the final X bytes, generate a random block whose last X bytes, after XORing with those known bytes, are all set to X+1. Now, the final block ends with an unknown byte followed by X bytes set to X+1... so now you start trying values to XOR with that unknown byte until you get a good padding response. At that point, you know that this unknown byte XORed with the random data you supplied must give X+1 as a result to form a good padding (remember, good padding = Y last bytes set to a value Y).

So now you know one more byte, and you only need to iterate all the way till the end of the block to finish the decryption. The following algorithm was provided in the original paper:

In this case, ak with k=j,...,b are the bytes you know. In step 1 you generate the random bytes that would set the final bytes to the appropriate value. Then you add some random numbers to them, and loop through the possible values for the unknown byte until you find the one that creates a proper padding (step 4). From this one, you construct the value for the unknown byte and return it in step 5.
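The whole block decryption loop can be sketched as follows. Again, this is an illustration and not my DecryptionOracle class: the oracle is simulated from a known plaintext block and previous ciphertext block, and the names are hypothetical. For the very last byte I use a simplified check (flip the byte before it and ask again) to rule out an accidental longer padding.

```python
import os

B = 8  # block size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def padding_ok(block):
    n = block[-1]
    return 1 <= n <= len(block) and block[-n:] == bytes([n]) * n

def make_oracle(intermediate):
    """Simulated padding oracle; intermediate plays the role of D(target)."""
    return lambda r: padding_ok(xor(r, intermediate))

def decrypt_block(oracle):
    """Recover all B bytes of D(target), from the last byte to the first."""
    known = b""                         # recovered tail of D(target)
    while len(known) < B:
        pad = len(known) + 1            # padding value we are aiming for
        j = B - pad                     # index of the unknown byte
        r = bytearray(os.urandom(B))
        r[j + 1:] = bytes(a ^ pad for a in known)   # known tail decrypts to pad
        for guess in range(256):
            r[j] = guess
            if not oracle(bytes(r)):
                continue
            if pad == 1:
                # Rule out an accidental longer padding such as 02 02
                r[j - 1] ^= 0xFF
                still_ok = oracle(bytes(r))
                r[j - 1] ^= 0xFF
                if not still_ok:
                    continue
            known = bytes([guess ^ pad]) + known
            break
    return known

# Simulated target: in CBC, plaintext = D(target) XOR previous ciphertext block
plaintext = b"secret!\x01"
prev_block = os.urandom(B)
oracle = make_oracle(xor(plaintext, prev_block))

recovered = xor(decrypt_block(oracle), prev_block)
```

This needs at most 256 oracle queries per byte (plus a few for the last byte), which is a lot cheaper than brute forcing the key.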

Implementing the attack in Python

To make sure I understood what I explained above in the right way, I created a small python class able to decrypt any input block given a padding oracle. The class constructor takes an input block size together with a padding oracle in the form of a function returning a boolean value.

This means you can construct your own padding oracle function in python, which will simply return True or False depending on whether the padding was correct or not. This function can be a simple POC to test the concept or can be a function using a padding oracle present on a web site or whatever you can think of. See the presentation by Juliano Rizzo and Thai Duong ([3]) linked at the end for ideas ;).

By the way, the code quality might not be very good and some things might be done easier in Python than I've done them. I'm pretty new to Python so if you see anything you think should be changed let me know ;-). As for the code, you can do whatever you like with it.

You can see it works for my test case. I also tested it using AES encryption instead of DES and it works fine there too, which gives me enough confidence that the concepts above are correctly explained and the tool works appropriately 🙂

You can download the whole package with my DecryptionOracle class and the test cases here.

Hope you enjoyed it and I'm looking forward to seeing your feedback in the comments!

References

There is more to padding oracle attacks than exposed here. If you want to know more, you can check the following resources. I might be implementing other attacks as well in the future, but the fastest way to know more will be to read these papers:

Here I come with yet another post about the DNIe. In the previous posts, we have seen how the device authentication procedure works and how to use the resulting keys to perform secure messaging. Now it's time to see how to ask the device to perform a hash on the input data and how to perform electronic signatures on it.

I'll start off with the description of the standard and continue with an explanation on how the DNIe drivers do it. Yes, you are reading it right, they use different APDUs than the ones defined in CWA14890, at least in the OpenSC module I'm using as a base for this analysis.

I wanted to share with you guys the little challenge I prepared for the Campus Party Europe. The wargame was organized by SecurityByDefault and took place during the last couple of days.

I was asked to prepare a cryptography challenge for it, and I delivered a little problem that became the level 4 challenge in the crypto category. The problem is based around RSA with 2048 bit keys and AES in ECB mode with 128 bit keys.

The idea was to give some real crypto instead of the typical break-classic-crypto or find-the-needle-in-the-haystack challenges. Of course, I am not asking you to factor an RSA-2048 modulus (well, I am, in a way...) nor to break AES in a mathematical sense because that is not feasible nowadays. You have to find the trick ;-).

Want to challenge yourself? Give it a try!

I'll leave the challenge here, and the solution will be published in SecurityByDefault in some time. If you have questions or want to share ideas with me you can use the comments, but please do not spoil the solution for other readers!

These are the instructions:

Dear agent,

In one of our missions we have intercepted an email containing a file encrypted with AES in ECB mode with a 128 bit key. Together with the file there was what we suspect is the AES key encrypted with a 2048 RSA key, which we found to be as follows:

Although it was a tough mission, our Operations team did a great job and was able to provide the following information on the target:

- It uses a cryptographic device that contains a 1024 bit modular exponentiation accelerator
- The device uses the same key for decryption and for signature generation

In addition, the Operations team modified the hardware used by our target and was able to collect a pair of RSA signatures over the same data. One of these signatures contains a fault injected thanks to our hardware modification, while the other one is the correct signature. These are the signature values:

Unfortunately, the team was not able to obtain the private RSA key nor decrypt the AES file. It is critical for the mission to obtain the contents of the encrypted file. Your task is to obtain the contents of the AES file.

Good luck!

PS: All RSA operations are RAW operations. This means no padding, just modular exponentiation. For keys smaller than the modulus, the padding is null (i.e. zero bytes).