Archive

There is one essential usage restriction: never encrypt two different messages with the same pseudorandom data, or even the same truly random data.

Using P for plaintext, C for ciphertext, R for (pseudo)random, and for XOR this makes the encryptions:

C1 = (P1)XOR(R)
C2 = (P2)XOR(R)
The enemy can intercept both ciphertexts, so he does:

X = (C1)XOR(C2) = (P1)XOR(R)XOR(P2)XOR(R)
and the Rs cancel out!

This is extremely good for the attacker who now has:

X = (P1)XOR(P2)
This is very weak indeed. Because the Rs cancel out, the attacker gets the same result no matter what R is used, so the encryption no longer has any real effect and the quality of the random numbers becomes irrelevant.

From that point on, the attacker need not worry about the encryption, only about properties of the plaintexts. If he knows or can guess part of either plaintext, he can immediately infer the corresponding part of the other. A zero byte in X means the corresponding bytes in P1 and P2 are equal. Other simple relations exist and can readily be exploited. He can also play a sort of ping-pong; make some inferences or guesses about one text, see what that tells him about the other, make some inferences there, see what that tells him about the first text, and so on.

Given moderately strong knowledge of plaintext properties — for example, that it is English or Russian text — P1P2 can be broken using techniques such as frequency counting that have been well-known for centuries and can be done with pencil and paper; you don’t even need a computer. See History of cryptography for details. Even with weaker knowledge of the input format, it is still easily breakable.

Of course in a real application, such as the NSA and British attack on Soviet one-time pads called VENONA, the attack may be far more difficult. Suppose you have a substantial archive of intercepted messages and you know or suspect that the enemy sometimes reuses the random material, There is still a huge amount of work to do to discover what was reused where.

On the other hand, the attack might be easy. In some versions of Word and Excel, Microsoft mis-used RC4 [6]; when multiple versions of an encrypted file were saved, the same pseudorandom stream was used each time. In this case the attacker has more to go on since he knows the texts are multiple versions of the same document. Also, there may be more than two versions.

Microsoft also mis-used RC4 in their first version of PPTP, the point-to-point tunneling protocol used to implement VPNs [7]. They used the same key for encryption in both directions on a connection; the attacker can just XOR the two message streams together. This was fixed in their next version of the protocol.

There is a rather nice graphical illustration of this attack at Cryptosmith.

Rewrite attacks

Stream ciphers (including one-time pads) are inherently vulnerable to a rewrite attack. If an attacker knows some plaintext and has intercepted the matching ciphertext, then he can discover that portion of the pseudorandom data. This does not matter if the attacker is just a passive eavesdropper. It gives him no plaintext he didn’t already know and we don’t care that he learns some pseudorandom data. We will never re-use that data, for reasons given in the previous section. For a one-time pad, having a portion of the truly random key tells him absolutely nothing about the rest of the key. For a stream cipher, having some of pseudorandom generator’s output does not allow an attack if the generator is well designed.

However, an active attacker who knows the plaintext can recover the pseudorandom data, then use it to encode whatever he chooses. If he can get his version delivered instead of yours, this is a disaster. If you send “attack target 372 at dawn”, the delivered message can be anything the same length; perhaps the enemy would prefer “withdraw East immediately”. An active attacker with only a reasonable guess at the plaintext can try the same attack. If the guess is correct, this works and the attacker’s bogus message is delivered. If the guess is wrong, a garbled message is delivered.

The attack fails if an effective cryptographic authentication mechanism is used at either packet or message level. That mechanism detects the bogus message and either discards it or warns the user. Note, however, that while authenticating the players at channel setup time is a required step to prevent man-in-the-middle attacks, it does not prevent this attack. To stop rewrite attacks, you need authentication later on, at packet or message level.

Even ignoring whatever authentication is used, this attack is generally not practical against high-speed systems with a good synchronisation mechanism, such as military systems or network encryption boxes. Performing the attack in real time — fast enough to get a bogus message delivered without losing synchronisation — would be very difficult. Systems such as TLS that use a stream cipher for a slower and less synchronised connection are more vulnerable; they definitely need authentication. Systems such as Microsoft Word encryption which apply stream ciphers to static data are even more vulnerable.