Suppose that we do not generate initialization vectors randomly when using AES in CBC mode. Instead, the IV starts at all zeroes and we increment it by 1 each time a message is encrypted. How can this cause a problem? Could you explain with an example?

2 Answers

A block cipher like AES is deterministic: for the same key and the same input, you get the same output. Since real-life data tends to exhibit redundancy, this makes ECB, the most basic cipher mode, quite weak. CBC tries to avoid that issue by "randomizing" the input blocks, through an XOR with the previous encrypted block: the idea is that the output of a block cipher should look "random enough". AES is still deterministic, but the randomization ensures that the risk of collision remains very low (AES has 128-bit blocks, so the first collision should be encountered after encrypting, on average, about 2^64 blocks, which is 256 exabytes, aka "an awful lot of data"; this will not happen often). Since the first block has no previous block, we need an "artificial previous block", and that's the IV. Therefore, the IV should be "random looking".
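The chaining described above can be sketched in a few lines. This is a toy model, not real AES: `enc_block` is a hash-based stand-in whose only relevant property is determinism (same key and input, same output), which is all the argument needs.

```python
import hashlib

def xor16(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy deterministic stand-in for the AES block cipher
    # (NOT a real cipher; determinism is all this demo relies on).
    return hashlib.sha256(key + block).digest()[:16]

def cbc_encrypt(key: bytes, iv: bytes, blocks: list) -> list:
    # Each plaintext block is XORed with the previous ciphertext
    # block before encryption; the IV plays "previous block" for
    # the first one.
    out, prev = [], iv
    for b in blocks:
        prev = enc_block(key, xor16(prev, b))
        out.append(prev)
    return out

key = b"k" * 16
pt = [b"A" * 16, b"B" * 16]
# Same key and plaintext, two different IVs -> different ciphertexts:
c1 = cbc_encrypt(key, b"\x00" * 16, pt)
c2 = cbc_encrypt(key, b"\x01" * 16, pt)
assert c1 != c2
```

The last two calls show why the IV matters at all: with a fixed IV, encrypting the same message twice would produce identical ciphertexts.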

A first basic observation is that a simple +1 policy on the IV may fail to be random enough. Imagine that you encrypt messages which contain some data as a list of "key = value" pairs (e.g. the standard encoding of a Java Properties object), and that all the messages begin with userPassword = thepassword (for various user passwords). With AES, and its 16-byte blocks, note that the first block ends immediately after the first letter of the password. So the successive first blocks will be almost identical, save for their last byte. Suppose that the IV and messages are:

If you work out the values for the first block passed as input to the AES for each message, you will find that for the first and fifth message you end up with the same block (29648fbe541ea05ca9a35c6b02536ee7). So the first sixteen bytes of the two encrypted messages will be identical. By spotting that equality, the attacker can then deduce exactly how the first bytes of the two passwords differ. By observing many successive messages, the attacker may accumulate quite a lot of such equations on the unknown data. This is exactly the kind of leakage that good encryption should avoid.
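This collision can be checked without running AES at all: CBC feeds IV XOR plaintext_block into the cipher, and since the cipher is deterministic, equal inputs yield equal ciphertext blocks. The passwords and counter positions below are hypothetical (messages 1 and 13 are chosen so that the counter difference, 0x0c, exactly matches 'm' XOR 'a'); they just illustrate the same mechanism as the example above.

```python
def xor16(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def counter_iv(n: int) -> bytes:
    # All-zero IV, incremented by 1 per message.
    return n.to_bytes(16, "big")

# "userPassword = " is 15 bytes, so byte 16 of each message is the
# first password character (hypothetical passwords for illustration).
msg1  = b"userPassword = mysecret"   # message 1, IV counter = 0
msg13 = b"userPassword = apple123"   # message 13, IV counter = 12

# 'm' XOR 'a' == 0x0c == 12, so the two CBC inputs collide:
in1  = xor16(counter_iv(0),  msg1[:16])
in13 = xor16(counter_iv(12), msg13[:16])
assert in1 == in13
# Deterministic cipher + equal inputs => equal first ciphertext blocks,
# so an eavesdropper learns 'm' XOR 'a' without knowing the key.
```

The attacker never decrypts anything; the equality of two ciphertext blocks alone hands him the XOR of the two password bytes.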

The situation is made much worse in the presence of chosen-plaintext attacks: these are attacks where the villain gets to choose part of the plaintext. In the example described above, suppose that the attacker is one of the users, and that his password appears in the first message. Thus, the attacker knows that his password begins with an 'm'; by observing the encryption of the successive messages, he then infers that the password in the fifth message begins with an 'a'. Moreover, the attacker may change his password and have another go, getting information on other passwords, and so on.

The recently advertised BEAST attack builds on a CPA, in a more realistic scenario involving a Web browser, an HTTPS connection, and a "secure" PayPal cookie. The cornerstone of the attack is that data in an SSL connection is broken into "records", each record being encrypted in CBC mode, and the last block of a record is used as the IV for the next. The attacker forces some data of his own into the connection, observes the first record, and computes from that the data he then adds (which will be encrypted as the second record), so that the first block of that data XORed with the IV yields a predictable value. The details are intricate, but they end up revealing the secure cookie. For the attack to work, all the attacker needs is a predictable IV, which he gets by observing records (TLS 1.1 fixes that by adding a per-record random IV).

To sum up, the BEAST attack demonstrates that CBC with an IV that can be predicted by an attacker capable of chosen-plaintext attacks is weak. A +1 policy for IV management leads to IVs that are trivially predictable. Therefore...

One should note that while CBC requires random, uniform, unpredictable IVs, not all block cipher modes of operation are that picky. Many newer modes, especially those which combine encryption and integrity checking, can live with a non-repeating IV even if the IV values are totally non-random (e.g. EAX or GCM); for them, a +1 policy on the IV is perfectly fine. In some protocols, this allows for an implicit IV (a sequence number, for successive messages sent over a wire), which may save a bit of bandwidth.
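The implicit-IV idea is simply to derive the nonce from state both sides already share. A minimal sketch (the 12-byte nonce size matches GCM's usual nonce length; the derivation itself is an illustrative assumption, not any specific protocol's):

```python
def implicit_nonce(seq: int) -> bytes:
    # Both sender and receiver track the record sequence number, so
    # the nonce never needs to be transmitted; for nonce-based AEAD
    # modes (e.g. GCM, EAX) uniqueness under one key is all that is
    # required, not randomness.
    return seq.to_bytes(12, "big")

# A counter trivially never repeats within its range:
nonces = {implicit_nonce(i) for i in range(1000)}
assert len(nonces) == 1000
```

With CBC this would be unsafe (predictable IV, as above); with a nonce-based AEAD mode it is standard practice.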

Not using a random IV with CBC mode is a vulnerability recognized as CWE-329.

The IV is almost always known to the attacker, and ideally this value is useless without the secret key.

However, if the attacker knows what the IV will be for a given plaintext message, or if the attacker can control the message, then he can pre-compute all possible keys for that message+IV combination. The ciphertext produced by this message+IV+key can then be looked up in the precomputed table to obtain the secret key in a very short period of time.

Precomputing all keys is not really feasible for large key sizes (like AES). This can be used for distinguishing previously (or later) sent messages encrypted by the same key, though.
– Paŭlo Ebermann, Nov 21 '11 at 20:34

@Paŭlo Ebermann you're right, but it weakens the system, and that's why we have CWE-329.
– rook, Nov 21 '11 at 20:47