Suppose Alice wants to send encryptions (under a one-time pad) of $m_1$ and $m_2$ to Bob over a public channel. Alice and Bob have a shared key $k$; however, both messages are the same length as the key $k$. Since Alice is extraordinarily lazy (and doesn't know about stream ciphers), she decides to just reuse the key.

Well, with a one-time pad you combine a randomly generated key with the plaintext to form the ciphertext. If the key is used more than once, couldn't you work out how the key and plaintext combine to form the ciphertext, and then exploit this to deduce some letters? From there, common cryptanalysis techniques (letter frequency, bigrams, etc.) could do the rest. This might help: cs.utsa.edu/~wagner/laws/pad.html
– Mr_CryptoPrime, Jul 13 '11 at 7:01

7 Answers

There is a great graphical representation of the possible problems that arise from reusing a one-time pad. Reusing the same key multiple times is called giving the encryption 'depth' - and it is intuitive that the more depth given, the more likely it is that information about the plaintext is contained within the encrypted text.

The process of 'peeling away' layered texts has been studied, as ir01 mentions, and those methods improve with more layers.

This picture illustrates things beautifully. I guess the spirit of my question was "how would you actually do the statistical analysis once you have $m_1 \oplus m_2$"; a respectable cryptographer would probably say something like "that's trivial".
– Elliott, Jul 14 '11 at 0:52

There are two methods: statistical analysis (also called frequency analysis) and pattern matching.
Note that for statistical analysis Eve should compute frequencies for $aLetter \oplus aLetter$ using some tool like this. A real historical example using frequency analysis is the VENONA project.

EDIT: A statistical analysis of $aLetter \oplus aLetter$ like this tells you the following:
If each character has distribution $X$, then the two characters behind a given value of $c_1 \oplus c_2$ are $c_1$ and $c_2$ with probability $P$.
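To make that concrete, here is a minimal Python sketch of the idea (not the tool linked above). It assumes the two plaintext characters are independent draws from the same distribution, and it estimates that distribution from a toy sample string. The names `SAMPLE_TEXT`, `char_frequencies`, and `rank_pairs` are made up for this illustration; a real attack would use frequencies from a large corpus.

```python
from collections import Counter

# Toy sample used only to estimate character frequencies (an assumption;
# a real analysis would use a large corpus of the expected message type).
SAMPLE_TEXT = "the quick brown fox jumps over the lazy dog and then the other one"

def char_frequencies(sample: str) -> dict:
    """Relative frequency of each character in the sample."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

def rank_pairs(xor_value: int, freq: dict) -> list:
    """Rank ordered pairs (c1, c2) with ord(c1) ^ ord(c2) == xor_value.

    Each pair is weighted by freq[c1] * freq[c2] (independence assumption)
    and the weights are normalised so they sum to 1.
    """
    scored = []
    for c1, p1 in freq.items():
        c2 = chr(ord(c1) ^ xor_value)
        if c2 in freq:
            scored.append((c1, c2, p1 * freq[c2]))
    total = sum(p for _, _, p in scored)
    ranked = sorted(scored, key=lambda item: item[2], reverse=True)
    return [(c1, c2, p / total) for c1, c2, p in ranked]

freq = char_frequencies(SAMPLE_TEXT)
ranked = rank_pairs(ord("t") ^ ord("e"), freq)
for c1, c2, p in ranked[:4]:
    print(repr(c1), repr(c2), round(p, 3))
```

Since the XOR alone cannot tell you which character belongs to which message, each candidate shows up twice, once as (c1, c2) and once as (c2, c1).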

While keystream reuse in stream ciphers and one-time pads has been a
well known problem for several decades, the risk to real systems has
been underappreciated. Previous techniques have relied on being able
to accurately guess words and phrases that appear in one of the
plaintext messages, making it far easier to claim that “an attacker
would never be able to do that.” In this paper, we show how an adversary
can automatically recover messages encrypted under the same
keystream if only the type of each message is known (e.g. an HTML page
in English). Our method, which is related to HMMs, recovers the most
probable plaintext of this type by using a statistical language model
and a dynamic programming algorithm. It produces up to 99% accuracy on
realistic data and can process ciphertexts at 200ms per byte on a
$2,000 PC. To further demonstrate the practical effectiveness of the
method, we show that our tool can recover documents encrypted by
Microsoft Word 2002.

Since the key is used more than once here, an attack called crib dragging can be used against the ciphertext.

A blog post which could give you a greater understanding on the implementation part is located at travisdazell.blogspot.in/2012/11/many-time-pad-attack-crib-drag.html:

Many Time Pad Attack - Crib Drag

The one time pad (OTP) is a type of stream cipher that is a perfectly secure method of encryption. It's very simple to implement and is perfectly secure as long as the key is truly random and its length is greater than or equal to the length of the message. That key-length requirement is its major downfall. However, it also requires that the key never be used more than once. This tutorial shows what happens when you re-use a key to encrypt more than one message. I also show how to uncover the plain-text of two messages that have been encrypted with the same key, without even knowing the key. I use a method called crib dragging.

Let's begin with a brief description of OTP and how it works. Let's take the following message and key:

message = "Hello World"
key = "supersecret"

If we convert both the message and key to hex strings, we get the following:

message = "48656c6c6f20576f726c64"
key = "7375706572736563726574"

If we do a simple XOR of the two hex strings we get the following cipher-text:

cipher-text = "3b101c091d53320c000910"

If we XOR the cipher-text with the key, we can recover the plain-text. That's how OTP works. Without the key, you have no way of uncovering the plain-text.
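The example above can be reproduced with a few lines of Python. This is just a sketch of the XOR step; the helper name `xor_bytes` is made up for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (stops at the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b"Hello World"
key = b"supersecret"

cipher_text = xor_bytes(message, key)
print(message.hex())                 # 48656c6c6f20576f726c64
print(key.hex())                     # 7375706572736563726574
print(cipher_text.hex())             # 3b101c091d53320c000910

# XORing with the same key again undoes the encryption.
print(xor_bytes(cipher_text, key))   # b'Hello World'
```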

Let's consider what happens when you have two messages encrypted with the same key. Take the following two messages and key:

message1 = "Hello World"
message2 = "the program"
key = "supersecret"

If we convert each message and the key to hex strings, and then encrypt each message using a simple XOR with the key, we'll get the following cipher-texts:

cipher-text1 = "3b101c091d53320c000910"
cipher-text2 = "071d154502010a04000419"

Let's say that all we have is the two cipher-texts and the knowledge that they were encrypted with a supposed OTP; however, they were both encrypted with the same key. To attack this encryption and uncover the plain-text, follow the steps below.

1. Guess a word that might appear in one of the messages.
2. Encode the word from step 1 to a hex string.
3. XOR the two cipher-text messages.
4. XOR the hex string from step 2 at each position of the XOR of the two cipher-texts (from step 3).
5. When the result from step 4 is readable text, we guess the English word and expand our crib search.
6. If the result is not readable text, we try an XOR of the crib word at the next position.
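The steps above can be sketched in Python roughly as follows. The helper names (`xor_bytes`, `crib_drag`, `looks_readable`) and the "letters and spaces only" readability test are assumptions made for this illustration; a real attack would use a better test for readable text.

```python
import string

# Crude readability test: ASCII letters and spaces only (an assumption).
READABLE = set((string.ascii_letters + " ").encode())

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

def crib_drag(xored: bytes, crib: bytes) -> list:
    """Steps 4 and 6: XOR the crib against every position of xored."""
    results = []
    for offset in range(len(xored) - len(crib) + 1):
        window = xored[offset:offset + len(crib)]
        results.append((offset, xor_bytes(window, crib)))
    return results

def looks_readable(candidate: bytes) -> bool:
    """Step 5: decide whether a candidate could be plain text."""
    return all(b in READABLE for b in candidate)

# Set-up: two messages encrypted under the same key (values from the text).
key = b"supersecret"
c1 = xor_bytes(b"Hello World", key)
c2 = xor_bytes(b"the program", key)

# From here on, the attacker uses only c1 and c2.
xored = xor_bytes(c1, c2)                       # step 3
hits = [(off, cand) for off, cand in crib_drag(xored, b"the")
        if looks_readable(cand)]
print(hits)  # [(0, b'Hel'), (8, b'tel')]
```

Note that offset 8 also passes this crude filter even though neither message has "the" there, so every hit still needs a human (or a language model) to judge whether it can be expanded sensibly.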

Step 1 seems difficult (guessing a word that might appear in one of the messages), but when you think about it, the word "the" is the most commonly used English word. So, we'll start with assuming "the" is in one of the messages. After encoding "the" as a hex string, we'll get "746865". That takes care of steps 1 and 2. If we XOR the two cipher-texts, we'll get the following result:

cipher-text1 XOR cipher-text2 = "3c0d094c1f523808000d09"
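As a quick sanity check, here is a short Python sketch (the helper name `xor_bytes` is made up for illustration) that recomputes both cipher-texts from the messages and key above and confirms that their XOR equals the XOR of the two plain-texts, so the key has dropped out entirely.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

message1 = b"Hello World"
message2 = b"the program"
key = b"supersecret"

c1 = xor_bytes(message1, key)
c2 = xor_bytes(message2, key)
xored = xor_bytes(c1, c2)

print(c1.hex())     # 3b101c091d53320c000910
print(c2.hex())     # 071d154502010a04000419
print(xored.hex())  # 3c0d094c1f523808000d09

# The key cancels: c1 XOR c2 is the same as message1 XOR message2.
print(xored == xor_bytes(message1, message2))  # True
```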

The next step is to XOR our crib word "746865" at each position of the XOR of the cipher-texts. What we'll do is slide "746865" along each position of "3c0d094c1f523808000d09" and analyze the result. After the first XOR, we get the following result:

3c0d09 XOR 746865 = 48656c

When we convert the hex string "48656c" to ASCII, we get the following text, "Hel". This takes us to step 5 from above. Because this looks like readable text, we can assume that the word "the" is in the first position of one message. If we didn't get readable text, we would slide the crib "746865" one position to the right and try again (and keep repeating until the end of "3c0d094c1f523808000d09").

Note that we don't know which message contains the word "the". It could be in either message1 or message2. Next, we need to guess what the word "Hel" is when fully expanded. It could be "Help", "Hello", etc. If we guess "Hello" and convert it to a hex string, we get "48656c6c6f". We then XOR it with the XOR of the two cipher-texts (just like we did with "the"). Here's the result:

3c0d094c1f XOR 48656c6c6f = 7468652070

"7468652070" when converted to ASCII, is "the p". We then repeat the process, guessing what "the p" might be when expanded and then XOR that result with the XOR of the cipher-texts. Granted, guessing what "the p" might expand to is not super easy, but you get the idea. If we were to guess "the program", convert it to a hex string, and XOR it with the XOR of the cipher-texts, we'll get "Hello World".

This is called crib dragging. My suggestion is to first try " the " (note the spaces before and after). Most cipher-texts that you'll try cracking will contain that word somewhere in the text. If the crib drag yields gibberish at every position, then " the " is most likely not in either of the plain-text messages. In that case, try another commonly used English word or phrase and keep trying until the result looks like readable text. Then you can expand your guess and keep XORing until you uncover the plain-text messages.
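The expansion steps from the walkthrough can be checked with a small Python sketch. The helper name `apply_guess` is made up for illustration; the hex string is the XOR of the two cipher-texts given earlier.

```python
def apply_guess(xored: bytes, guess: bytes, offset: int = 0) -> bytes:
    """XOR a guessed plain-text fragment against xored, starting at offset.

    If the guess is right for one message, the result is the matching
    fragment of the other message.
    """
    window = xored[offset:offset + len(guess)]
    return bytes(x ^ g for x, g in zip(window, guess))

xored = bytes.fromhex("3c0d094c1f523808000d09")

print(apply_guess(xored, b"the"))          # b'Hel'
print(apply_guess(xored, b"Hello"))        # b'the p'
print(apply_guess(xored, b"the program"))  # b'Hello World'
```

Each correct guess for one message reveals the same number of characters of the other, which is what lets you alternate between the two messages and extend the crib a little at a time.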

When you just XOR the ciphertexts with each other, what you get is in fact the XOR of the two cleartexts.

f(a) ⊕ f(b) = a ⊕ b, where f(x) = x ⊕ k is encryption under the shared key k

And after that point, all that's left is to use statistical analysis, as ir01 has mentioned.
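Spelled out, assuming both ciphertexts have the form $c_i = m_i \oplus k$ with the same key $k$ (the names $c_1$, $c_2$ are introduced here just for illustration):

$$c_1 \oplus c_2 = (m_1 \oplus k) \oplus (m_2 \oplus k) = m_1 \oplus m_2 \oplus (k \oplus k) = m_1 \oplus m_2 \oplus 0 = m_1 \oplus m_2$$

This uses only the facts that XOR is associative and commutative and that $k \oplus k = 0$, so the argument holds no matter how the key was generated.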

In fact, early cell phones used a somewhat similar encryption scheme. They had a one-byte key (if my memory serves me well) which was used to XOR the voice in blocks. Thus, an attacker could XOR the encrypted voice stream with a copy of itself shifted by one byte. The key cancels, leaving the clear voice signal XORed with a shifted copy of itself. That is very easy to crack, even easier than the XOR of two separate cleartexts.
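Here is a small Python sketch of why the shift trick works, under the assumption that every byte really is XORed with the same one-byte key, as recalled above. The key value `0x5A` and the sample data are hypothetical stand-ins, not details of any actual phone system.

```python
def xor_with_byte(data: bytes, key_byte: int) -> bytes:
    """Encrypt by XORing every byte with the same one-byte key."""
    return bytes(b ^ key_byte for b in data)

def xor_with_shifted_self(data: bytes) -> bytes:
    """XOR each byte with its right-hand neighbour (shift by one byte)."""
    return bytes(a ^ b for a, b in zip(data, data[1:]))

plain = b"voice samples"              # hypothetical stand-in for audio data
cipher = xor_with_byte(plain, 0x5A)   # hypothetical key byte

# (p[i] ^ k) ^ (p[i+1] ^ k) == p[i] ^ p[i+1], so the key disappears.
print(xor_with_shifted_self(cipher) == xor_with_shifted_self(plain))  # True
```

With only 256 possible keys, simply trying every key byte would of course work as well; the shift trick just shows that the key never needs to be found at all.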

Also, as Tangurena mentioned, the Soviet message traffic was decrypted due to the fact that one-time-pads had been re-used. See the Wikipedia article on the VENONA Project.

Each zero in m1⊕m2 indicates a matching character at that position. These are known as coincidences. The number of coincidences can indicate what language the messages are written in, since different languages have different character frequency distributions. (Random data should have coincidences 1/26 of the time, about 3.8%, if using only lowercase letters, whereas English should be around 6 to 7%.)
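Counting coincidences is a one-liner. As a rough sketch (the function names are made up, and the input is the XOR of the cipher-texts for "Hello World" and "the program" under the same key, taken from the crib-dragging walkthrough quoted in another answer):

```python
def count_coincidences(xored: bytes) -> int:
    """Number of positions where the two plain-texts had the same byte."""
    return sum(1 for b in xored if b == 0)

def coincidence_rate(xored: bytes) -> float:
    """Fraction of positions that are coincidences."""
    return count_coincidences(xored) / len(xored)

xored = bytes.fromhex("3c0d094c1f523808000d09")

print(count_coincidences(xored))  # 1
print(coincidence_rate(xored))    # 1 of 11 positions, about 0.09
```

Eleven bytes is far too short for the rate to mean anything; the language-identification idea only becomes useful with much longer messages.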

Other than that, you could XOR common words in various locations against m1⊕m2. If the result makes sense (i.e., isn't a bunch of gibberish unprintable ASCII characters) then you have found a possible match for both original plain texts at that location. With enough persistence it's very possible you could extract meaningful information. You might start with a word like 'the' and go from there, and maybe score the results using an English trigram distribution.
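A very rough Python sketch of such scoring is below. The trigram set is a tiny hand-picked list used as a stand-in, not a real English trigram distribution, and the names `COMMON_TRIGRAMS` and `score_candidate` are made up for illustration.

```python
# Hypothetical stand-in for a trigram distribution: a few trigrams that are
# frequent in English text when spaces are kept. A real attack would use
# probabilities estimated from a large corpus.
COMMON_TRIGRAMS = {" th", "the", "he ", "and", "nd ", "ing", "ion", "ent"}

# Printable ASCII range, space through tilde.
PRINTABLE = set(range(0x20, 0x7F))

def score_candidate(candidate: bytes) -> int:
    """Return -1 for unprintable output, else the number of common trigrams."""
    if any(b not in PRINTABLE for b in candidate):
        return -1
    text = candidate.decode("ascii").lower()
    return sum(1 for i in range(len(text) - 2)
               if text[i:i + 3] in COMMON_TRIGRAMS)

print(score_candidate(b"nd th"))         # 2
print(score_candidate(b"xztyb"))         # 0
print(score_candidate(b"\x00\x0d\x09"))  # -1
```

Longer cribs give more trigrams to score, which is one reason a phrase like ' the ' separates real matches from noise better than a single letter would.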

Actually it's 1/52 since we have both lower and upper case, and even higher if you consider punctuation and other symbols. It wouldn't make much sense to start with 'the' because it's unlikely the word will align in both messages. On the other hand checking for it is not expensive so you might as well go for it. Looking for 'e' alone is much more likely to yield fruitful results, and then you proceed to find digrams, trigrams etc.
– rath, Jul 30 '13 at 7:48

I specified 'if using only lowercase letters' in the post. It's irrelevant if 'the' (actually ' the ' with spaces on each end is a better phrase to start with) matches up in both plain texts, just that it exists in one of the plain texts. If in m1 you have ' the ', then XORing ' the ' in the same position in m1⊕m2 will reveal the corresponding text in m2. You can't do this with individual characters because you have to be able to judge whether the result is random letters like 'xztyb' (thus not a match at that location) or maybe some letters like 'nd th' which would show up relatively often.
– AndrewH, Jul 31 '13 at 0:12