I'm new to cryptography and have just encountered AES and substitution-permutation networks (SPNs). I would like to know how the S-box, the permutation step and the MixColumns step help improve security. It seems to me (perhaps mistakenly) that the security is provided solely by the key and that the substitution, permutation and MixColumns steps are redundant.

I know I must be wrong, but can't put my finger on it.

Also, is there a small example of AES which can be implemented by hand?
I think seeing it would help with my understanding of the concept.

2 Answers

The S-box and linear round operations would be entirely redundant if AES were only ever used to encrypt 128 bits per key (i.e. if you changed keys for every 128-bit block), because then you would effectively have a one-time pad, which is secure just by XORing the key with the message.
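To make the one-time-pad point concrete, here is a minimal Python sketch. The helper name `xor_bytes` and the sample messages are my own illustration, not part of AES. It shows that a fresh random 128-bit key XORed into one block decrypts correctly, and that reusing the same key for a second block leaks the XOR of the two plaintexts, which is why this only works for one block per key.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

# One 128-bit block, one fresh 128-bit key: a one-time pad.
block = b"sixteen byte msg"            # exactly 16 bytes
key = secrets.token_bytes(16)          # used once, then thrown away
ciphertext = xor_bytes(block, key)
decrypted = xor_bytes(ciphertext, key)

# Reusing the same key for a second block breaks the scheme:
# the key cancels out and the attacker learns block XOR other.
other = b"another 16B text"
ciphertext2 = xor_bytes(other, key)
leak = xor_bytes(ciphertext, ciphertext2)
```

Here `leak` equals `xor_bytes(block, other)` no matter what the key was, so anything beyond a single block per key needs more than a bare XOR.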

However, ciphers like AES are based on computational complexity (rather than information-theoretic security), and purport to securely encrypt plaintext that vastly exceeds the length of the key. The claim of security is that realistic opponents, using realistic amounts of computational resources, have only a vanishingly small chance of recovering any information about the plaintext or the key. In other words, recovering unknown plaintext that has been encrypted with AES under an unknown key is supposed to be a computationally hard problem.

If AES did not have highly non-linear S-boxes, then the problem would be very easy, as daniel points out in his answer. This is because AES would then be an entirely linear system over GF(2), the finite field with two elements (strictly speaking an affine one, because of the key additions), and given a small amount of known plaintext (just one or two blocks' worth should be sufficient) we could use Gauss-Jordan elimination to solve the system and thereby recover the unknown key (or at least enough of it to decrypt and encrypt whatever we liked).
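As a rough illustration of how weak an S-box-free cipher is, here is a toy Python sketch. Everything in it is an assumption made for the example: the 32-bit block, the `linear_layer` function, the four rounds and the independent round keys are invented and are not AES. It does not even need Gauss-Jordan elimination: because the whole cipher is affine over GF(2), the effect of the key collapses into a single constant mask, which one known plaintext/ciphertext pair reveals.

```python
import secrets

MASK32 = 0xFFFFFFFF
ROUNDS = 4

def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32

def linear_layer(x: int) -> int:
    """Toy GF(2)-linear mixing: rotate, then XOR the high half into the low half."""
    x = rotl32(x, 7)
    return x ^ (x >> 16)

def linear_layer_inv(y: int) -> int:
    """Inverse of linear_layer."""
    x = y ^ (y >> 16)
    return rotl32(x, 32 - 7)   # rotating left by 25 undoes rotating left by 7

def encrypt(p: int, round_keys: list) -> int:
    """Key addition and linear mixing only: there is no S-box anywhere."""
    s = p
    for rk in round_keys[:-1]:
        s = linear_layer(s ^ rk)
    return s ^ round_keys[-1]

def decrypt(c: int, round_keys: list) -> int:
    s = c ^ round_keys[-1]
    for rk in reversed(round_keys[:-1]):
        s = linear_layer_inv(s) ^ rk
    return s

# The attacker never sees secret_keys, only one known (plaintext, ciphertext) pair.
secret_keys = [secrets.randbits(32) for _ in range(ROUNDS + 1)]
zero_keys = [0] * (ROUNDS + 1)
known_p = 0x01234567
known_c = encrypt(known_p, secret_keys)

# Affine structure: E_k(p) = E_0(p) XOR mask, where mask depends only on the key.
key_mask = known_c ^ encrypt(known_p, zero_keys)

# Decrypt a fresh ciphertext using only public code and the mask.
target_p = secrets.randbits(32)
target_c = encrypt(target_p, secret_keys)
recovered = decrypt(target_c ^ key_mask, zero_keys)
```

The attacker ends up with `recovered == target_p` without ever learning the individual round keys, which matches the "enough of it to decrypt and encrypt whatever we liked" remark above.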

If AES did not have the linear operations (the byte permutation step and the MixColumns step), then it would be easy for a different reason. Those steps provide 'diffusion', whereby information about one part of the 128-bit block is 'mixed' with information from other parts of the block. If there were no mixing, then each 8-bit section (byte) of the 128-bit block would be 'segregated' from every other byte. So differences in one byte of the input plaintext would result in differences in only that same byte of the output ciphertext (which can reveal information about the plaintext). Also, you would only have to guess the small part of the key that interacts with a single byte in order to 'decrypt' that byte, and it is much easier to brute-force a small part of the key than the entire key.
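The byte-segregation problem can be sketched in Python as follows. This is a toy under stated assumptions, not AES: `SBOX` is a fixed pseudo-random permutation standing in for a real S-box, the same 16-byte key is reused in every round instead of a proper key schedule, and all function names are made up for the example. Each round only adds the key and substitutes bytes, with no permutation or MixColumns-like step.

```python
import random
import secrets

# Stand-in 8-bit S-box: a fixed pseudo-random permutation, NOT the AES S-box.
SBOX = list(range(256))
random.Random(2024).shuffle(SBOX)
ROUNDS = 4

def encrypt_byte(b: int, k: int) -> int:
    """Encrypt one byte; it only ever touches its own key byte."""
    for _ in range(ROUNDS):
        b = SBOX[b ^ k]
    return b

def encrypt_no_diffusion(block: bytes, key: bytes) -> bytes:
    """Key addition + S-box per round, with no mixing between bytes."""
    return bytes(encrypt_byte(b, k) for b, k in zip(block, key))

key = secrets.token_bytes(16)

# 1) A change in plaintext byte 3 shows up only in ciphertext byte 3.
p1 = bytes(16)
p2 = bytearray(p1)
p2[3] ^= 0x01
c1 = encrypt_no_diffusion(p1, key)
c2 = encrypt_no_diffusion(bytes(p2), key)
changed = [i for i in range(16) if c1[i] != c2[i]]

# 2) Each key byte can be brute-forced on its own from a few known pairs.
known_plain = [secrets.token_bytes(16) for _ in range(3)]
pairs = [(p, encrypt_no_diffusion(p, key)) for p in known_plain]

def candidates_for_key_byte(i: int) -> list:
    """All guesses for key byte i that are consistent with every known pair."""
    return [g for g in range(256)
            if all(encrypt_byte(p[i], g) == c[i] for p, c in pairs)]

candidates = [candidates_for_key_byte(i) for i in range(16)]
```

The search costs at most 16 × 256 = 4096 guesses of `encrypt_byte` per known pair, instead of the 2^128 guesses a brute-force search over the whole key would need.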

Together with the S-box, this diffusion means that after just a few rounds, every single bit of the output is a very complex, high-degree, non-linear function of every single bit of the input and the entire key. Highly non-linear systems of equations are much harder to solve than linear systems of equations.

That said, the reason why AES uses that particular S-box and those particular linear steps has to do with the methods of cryptanalysis in use at the time it was designed. AES was carefully designed to resist two very powerful types of attack called differential cryptanalysis and linear cryptanalysis (and some related variants) that broke many block ciphers in the generation prior to AES. To have an 'intuitive sense' of why AES is the way it is, you need to understand those two attacks.
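For a first taste of the differential side, the following Python sketch may help. It is only an illustration under assumptions of my own: `SBOX4` is an example 4-bit S-box (not AES's 8-bit one), `LINEAR4` is a plain bit rotation used as a deliberately bad "S-box", and `max_differential` is a helper name invented here. Differential cryptanalysis looks for input differences that lead to a particular output difference unusually often, so the function counts, for every non-zero input difference, how often the most common output difference occurs.

```python
from collections import Counter

# Example 4-bit S-box, chosen only because it is small enough to inspect.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# A 4-bit left rotation: a permutation, but linear over GF(2).
LINEAR4 = [((x << 1) | (x >> 3)) & 0xF for x in range(16)]

def max_differential(sbox: list) -> int:
    """Largest count, over all non-zero input differences dx, of inputs x
    that produce the same output difference sbox[x] ^ sbox[x ^ dx]."""
    best = 0
    for dx in range(1, len(sbox)):
        counts = Counter(sbox[x] ^ sbox[x ^ dx] for x in range(len(sbox)))
        best = max(best, max(counts.values()))
    return best

worst_case_linear = max_differential(LINEAR4)
worst_case_sbox = max_differential(SBOX4)
```

For the linear map every input difference produces one fixed output difference for all 16 inputs, so `worst_case_linear` is 16 and differences pass through with certainty. A well-chosen S-box keeps this maximum small, so that any differential an attacker tries to follow through many rounds holds only with low probability.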

SubBytes (the S-box layer) adds non-linearity to the cipher, which prevents an attacker from writing the cipher as a linear system of equations and solving it. Also notice that AddRoundKey XORs the round key into the state, so the mixing applied to the state in the next round makes sense: it mixes key material as well as plaintext.