Oppida/NoSuchCon challenge

Intro

In April 2013, Oppida proposed a challenge associated with NoSuchCon 2013.
The challenge was designed by Eloi Vanderbéken (@elvanderb) and consisted of a PE binary embedding a white-box AES implementation.
It was a "keygen-me" type of challenge.
The challenge was broken by a few people and some write-ups are available:

To break the challenge, one has to find which input to the white-boxed AES produces the desired output, such that MD5(nickname) == AES(serial), i.e. to invert the AES block so that serial=AES^-1(MD5(nickname)).
Since this white-box is made of look-up tables, it could be inverted step by step, round by round, from output to input, but the AES key itself was left unbroken.
Shiftreduce's presentation points to the BGE attack, a now classical attack against the no less classical Chow white-box AES implementation, but this implementation is quite different from Chow's and, AFAIK, Shiftreduce didn't recover the key.
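
To illustrate why a design made of bijective look-up tables can be walked backwards without knowing the key, here is a minimal Python sketch. It is a toy under loud assumptions: the three "layers" are random byte permutations I made up for the occasion (the real rounds mix 4 bytes at a time through xor tables), and the names `invert_table`, `forward` and `backward` are mine, not taken from the challenge or from any write-up.

```python
import hashlib
import random

def invert_table(table):
    """Return the inverse of a 256-entry bijective look-up table."""
    inverse = [0] * 256
    for x, y in enumerate(table):
        inverse[y] = x
    return inverse

# Toy stand-in for the white-box: three layers of random byte permutations.
# (Assumption for illustration only, this is NOT the structure of the challenge.)
rng = random.Random(2013)
layers = []
for _ in range(3):
    table = list(range(256))
    rng.shuffle(table)
    layers.append(table)
inverse_layers = [invert_table(t) for t in layers]

def forward(block):
    """Apply every layer in order, byte by byte (plays the role of AES(serial))."""
    for table in layers:
        block = bytes(table[b] for b in block)
    return block

def backward(block):
    """Undo the layers from output to input (plays the role of AES^-1)."""
    for inverse in reversed(inverse_layers):
        block = bytes(inverse[b] for b in block)
    return block

# The keygen relation: find serial such that forward(serial) == MD5(nickname).
target = hashlib.md5(b"nickname").digest()
serial = backward(target)
assert forward(serial) == target
```

Note that nothing here recovers a key: the tables are simply inverted as black boxes, which is all a keygen needs.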

Because I got curious about white-boxes only recently, I looked around at what was publicly available and decided to investigate this one, a bit late for the competition I admit ;-)

The advantage of starting late is that I can rely on the wonderful work published by those three people and concentrate directly on the white-box itself, sparing me the need to peel the onion first (it would have made me cry for sure).
After the challenge period, Eloi even published the code of his generator, which is very useful for studying it and generating our own challenges (under Linux)!

Structure

The drawing depicts the details of the computation of one AES round, more precisely just the first byte of the state (and the next three, which depend on the same 4 input bytes).
The first and last rounds are different:

The first round is similar to the inner rounds, but the tables applying the SBox here also apply the initial round key, in a so-called TBox, and the input is not encoded by [math]X_2[/math] but by a so-called external encoding [math]F[/math], which has to be inverted.

The last round, as usual, doesn't contain the MixColumns operation and is therefore much simpler, but another external encoding [math]G[/math] is applied to the output.

Notations of the drawing (please forgive my lack of math strictness):
All datapaths are 8-bit wide

So this design makes use of 9 random substitution tables (which you don't find as such in the binary but which are used during generation) and their inverses:

[math]F, G, B_0, B_1, B_2, B_3, X_0, X_1, X_2[/math]

The same substitutions are re-used in all rounds (otherwise the 3 large xor tables would have to be duplicated many times).
In the end, the binary contains these tables:

3 xor tables of 256*256 values

9*16*4 round tables of 256 values

16 final round tables of 256 values
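
As a quick sanity check of what this list weighs in the binary, the arithmetic can be written down in a few lines of Python. The only assumption is that each table entry takes one byte, which seems consistent with all datapaths being 8-bit wide.

```python
# Table counts as listed above; one byte per entry is an assumption
# based on the 8-bit datapaths.
xor_tables_bytes = 3 * 256 * 256        # 3 xor tables of 256*256 values
round_tables_bytes = 9 * 16 * 4 * 256   # 9 rounds * 16 groups * 4 tables of 256 values
final_tables_bytes = 16 * 256           # 16 final round tables of 256 values

total_bytes = xor_tables_bytes + round_tables_bytes + final_tables_bytes
print(f"{total_bytes} bytes = {total_bytes // 1024} KiB")  # 348160 bytes = 340 KiB
```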

A curiosity of the design when compared with classical Chow is that each round key is applied before MixColumns and is therefore split into 4 random shares (recombined later by the xors). So the first round tables already contain the first two round keys.
If you want to compare with Chow, see A Tutorial on White-box AES by James A. Muir and zoom in on this picture:

Attack

Because the internal encoding [math]X_2[/math] is the same between all rounds, one may choose to attack a reduced version of AES with only 3 round keys, but the problem is that the external encodings are completely unknown.
I propose another attack that works against any of the intermediate rounds and doesn't require a ton of equations because... I hate math.
Here I attack Round 3, but any round between 2 and 9 will do.
Let's guess the encoding of one single value [math]P=0[/math] in [math]X_2[/math], the encoding table used between rounds.
The encoding of [math]P[/math] is [math]X[/math]: [math]X_2(P)=X[/math].
Then we try to find the corresponding key of the 3rd round (so the 4th round key):

There are 16 groups of round tables for each round, taking 4 bytes as input and producing one byte (one round table group here is actually the 4 sub-round tables combined with the 3 xor tables)

To attack one group, we fix the 4 input bytes as [math]X\!:\!X\!:\!X\!:\!X[/math] so the decoded input is supposed to be [math]0\!:\!0\!:\!0\!:\!0[/math]

We compute the encoded output byte out and compare it with [math]X[/math]

If the output [math]Y \neq X[/math], we create a new input [math]Y\!:\!X\!:\!X\!:\!X[/math] and chain executions of that group until the output [math] = X[/math], at which point we know the decoded output is [math]P=0[/math]

The number of iterations required to reach an output [math] = X[/math] is then compared with a clean implementation without encodings, so here going from [math]0\!:\!0\!:\!0\!:\!0[/math] to [math]0[/math], to check which key candidates k require the same number of iterations
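
The steps above can be sketched in Python on a toy model. Everything here is an assumption made for illustration: `TOY_SBOX` is a random permutation and not the AES SBox, `clean_group` is a simplified stand-in for one round table group with a single key byte, and `X2` is a random encoding; none of this is Eloi's generator or the actual round function. What the sketch does show is the trick itself: since the encoded group is the clean group conjugated by [math]X_2[/math], the number of chained executions is the same on both sides.

```python
import random

rng = random.Random(1)

def random_perm():
    perm = list(range(256))
    rng.shuffle(perm)
    return perm

def invert(perm):
    inverse = [0] * 256
    for x, y in enumerate(perm):
        inverse[y] = x
    return inverse

TOY_SBOX = random_perm()   # stand-in permutation, NOT the AES SBox
X2 = random_perm()         # hypothetical inter-round encoding
X2_INV = invert(X2)

def clean_group(a, b, c, d, k):
    """Toy group without encodings: 4 input bytes and a key byte give 1 output byte."""
    return TOY_SBOX[a] ^ TOY_SBOX[b] ^ TOY_SBOX[c] ^ TOY_SBOX[d] ^ k

def encoded_group(a, b, c, d, k):
    """Same group as the attacker sees it: inputs and output encoded with X2."""
    return X2[clean_group(X2_INV[a], X2_INV[b], X2_INV[c], X2_INV[d], k)]

def chain_length(group, x):
    """Start from x:x:x:x, feed the output back into the first byte,
    and count the executions needed until the output equals x."""
    first = x
    for n in range(1, 257):
        out = group(first, x, x, x)
        if out == x:
            return n
        first = out
    raise ValueError("group is not a bijection in its first byte")

SECRET_K = 0x5A            # key byte hidden inside the encoded tables (toy value)
X = X2[0]                  # the attacker's guess for the encoding of P=0 (here: the right one)

observed = chain_length(lambda a, b, c, d: encoded_group(a, b, c, d, SECRET_K), X)
candidates = [
    k for k in range(256)
    if chain_length(lambda a, b, c, d, k=k: clean_group(a, b, c, d, k), 0) == observed
]
assert SECRET_K in candidates
```

A single count may leave more than one candidate in `candidates`; in this toy, nothing beyond the iteration count is used to narrow them down.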

Example requiring 3 iterations:

Clean group implementation:

So from each initial encoding guess we typically get something like this (i = i-th byte, n = number of iterations):

Recovery of the external encodings is left as an exercise for the reader ;-)
Fun fact: if, on the clean implementation, we don't look for [math]0\!:\!0\!:\!0\!:\!0[/math] to [math]0[/math] but for [math]x\!:\!x\!:\!x\!:\!x[/math] to [math]x[/math] where [math]x[/math] is arbitrarily chosen, we still find the right round key but with a wrong [math]X_2[/math] mapping.

Thanks for your patience if you have read this far, and feel free to share your thoughts with me (@doegox).
And thanks to Eloi for this great challenge and to Axel, Arnaud and Guy, who have shared their write-ups!

Epilogue

Before being able to mount the attack, we had to recover the actual tables, and that's not that simple.
Axel's version contains the tables, but still unsorted, as sorting them was not needed for him to break the challenge:

Table names and positions are random

Code is flattened

Intermediate variables are taken in arbitrary order from a reusable pool, including output buffer

Steps are calculated out of order, only the order imposed by dependencies is preserved

Half of the tables are actually tables to function snippets in the original code, but Axel's version already removed that obfuscation layer, sigh!

So to properly rename the tables, one needs to:

Rewrite the code in static single assignment (SSA) form, to be able to reorder it without conflicts