Cryptopals challenge 6: Break repeating-key XOR

This challenge isn’t conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you’re probably just fine up to Set 6.

1. Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.

2. Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:

this is a test

and

wokka wokka!!!

is 37. Make sure your code agrees before you proceed.

3. For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.

4. The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.

5. Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.

6. Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.

7. Solve each block as if it was single-character XOR. You already have code to do this.

8. For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.

This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR (“Vigenere”) statistically is obviously an academic exercise, a “Crypto 101” thing. But more people “know how” to break it than can actually break it, and a similar technique breaks something much more important.

No, that’s not a mistake.

We get more tech support questions for this challenge than any of the other ones. We promise, there aren’t any blatant errors in this text. In particular: the “wokka wokka!!!” edit distance really is 37.

This challenge was tough for me. I wasn’t familiar with Hamming distance, and also got a bit confused in steps 3, 4, and 6. While researching my confusion, I found this blog post extremely helpful, and it probably saved me lots of frustration.

Solving this challenge took multiple steps: Read and decode the file contents, implement a loop to cycle through the potential keysize range, write a function to calculate Hamming distance, normalize the Hamming distance, break the ciphertext into chunks the length of the determined keysize and transpose the blocks, and then implement single-key XOR brute forcing.
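The Hamming distance function from that list can be sketched in a few lines. This is a minimal version, assuming both inputs are `bytes` of equal length; the name `hamming_distance` is illustrative.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # XOR leaves a 1 bit wherever the two inputs differ; count those bits.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# The check from the challenge text:
print(hamming_distance(b"this is a test", b"wokka wokka!!!"))  # 37
```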

“For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE”

My first attempt at this was incorrect. For each candidate keysize, I took only the first and second keysize worth of bytes, ran them through the Hamming distance function, and divided the result by the keysize. For example, if the ciphertext was asdfghjklzxcvbnm and my keysize was 2, I would take ‘as’ and ‘df’, send them to the Hamming distance function, divide the result by 2, and then move on to the next keysize and repeat. My mistake was comparing only the first two blocks, which is too small a sample to be reliable. After some research, I instead split the ciphertext into keysize-sized chunks, took the first two chunks and got their Hamming distance, and then repeated this process for all chunks. Here is what the code looks like:
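As a minimal sketch of that pairwise comparison: the names `hamming_distance` and `chunk_distances` are illustrative, and comparing each chunk with the one right after it is an assumption about how the pairs are formed.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    # Number of differing bits between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def chunk_distances(ciphertext: bytes, keysize: int) -> list:
    """Normalized Hamming distance between each chunk and the next one."""
    # Stop early enough that every chunk is exactly keysize bytes long.
    chunks = [ciphertext[i:i + keysize]
              for i in range(0, len(ciphertext) - keysize + 1, keysize)]
    # Dividing by keysize makes distances comparable across keysizes.
    return [hamming_distance(a, b) / keysize
            for a, b in zip(chunks, chunks[1:])]


# With the example above: 'as' vs 'df' differ in 5 bits, and 5 / 2 = 2.5.
print(chunk_distances(b"asdfghjklzxcvbnm", 2)[0])  # 2.5
```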

These normalized distances were then averaged across all of the chunk pairs. Each result was stored in a dictionary that was appended to a list initialized at the beginning of the function:
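A sketch of the averaging and bookkeeping might look like the following. The function name `rank_keysizes` and the dictionary keys are hypothetical, and sorting the list so the most likely keysize comes first is an added convenience.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    # Number of differing bits between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def rank_keysizes(ciphertext: bytes, min_size: int = 2, max_size: int = 40) -> list:
    """Return one dict per keysize, sorted by average normalized distance."""
    results = []
    for keysize in range(min_size, max_size + 1):
        chunks = [ciphertext[i:i + keysize]
                  for i in range(0, len(ciphertext) - keysize + 1, keysize)]
        distances = [hamming_distance(a, b) / keysize
                     for a, b in zip(chunks, chunks[1:])]
        if not distances:
            continue  # ciphertext too short to compare two chunks of this size
        results.append({"keysize": keysize,
                        "avg_distance": sum(distances) / len(distances)})
    # Smallest average distance first: the most likely keysize leads the list.
    return sorted(results, key=lambda r: r["avg_distance"])
```

Calling `rank_keysizes(ciphertext)[:3]` would give the three most likely candidates. Multiples of the true key length tend to score about as well as the key length itself, so it is worth keeping more than one candidate.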

The whole purpose of this step is to find the correct (or at least the most likely) keysize. If you are interested in how this method works, or in another way to determine the keysize, read up on the Kasiski examination (an alternative method for determining keysize) and the Friedman test (closely related to what is implemented for this challenge) here.

Break the ciphertext into chunks the length of the determined keysize and transpose the blocks

It took me a while to figure out what I was supposed to do at this point. I decided to turn to a book that I previously worked through when learning Python, Hacking Secret Ciphers with Python, by Al Sweigart, which has a chapter dealing with hacking a Vigenere cipher. The purpose of this step (transposing the blocks) is to recover the key one character at a time. As an example, let’s look at encrypting a message with the same key as in the previous challenge.

>>> message = b'the cat in the hat'
>>> key = b'ICE'

Since the key is only three characters long, every third character will be encrypted with the same key byte.

For example, the following highlighted characters will be encrypted with ‘I’:
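One way to see the grouping is to slice the message with a step equal to the key length. This is a small illustration rather than part of the solution itself.

```python
message = b"the cat in the hat"
key = b"ICE"

# Bytes at positions 0, 3, 6, ... meet 'I'; positions 1, 4, 7, ... meet 'C';
# positions 2, 5, 8, ... meet 'E'.
for offset, key_byte in enumerate(key):
    print(chr(key_byte), message[offset::len(key)])

# I b't tnhh'
# C b'hc  ea'
# E b'eait t'
```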

If this isn’t clear, take a look in the aforementioned book at how a Vigenere cipher is created and cracked, and it should give you a better understanding of how and why the bytes are transposed in this way.
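Transposing the ciphertext applies the same idea to the encrypted bytes: block i collects every byte that was XORed with key byte i, so each block becomes a single-byte XOR problem. Below is a minimal sketch; `repeating_key_xor` and `transpose_blocks` are illustrative names.

```python
def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the key byte at the same position modulo key length.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def transpose_blocks(ciphertext: bytes, keysize: int) -> list:
    """Block i holds bytes i, i + keysize, i + 2 * keysize, and so on."""
    return [ciphertext[i::keysize] for i in range(keysize)]


ciphertext = repeating_key_xor(b"the cat in the hat", b"ICE")
blocks = transpose_blocks(ciphertext, 3)

# Undoing the XOR on the first block with 'I' recovers every third
# plaintext byte, starting from the first one.
first = bytes(b ^ ord("I") for b in blocks[0])
print(first)  # b't tnhh'
```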

Implement single-key XOR brute forcing

I’ve already explained the code to do this in a previous post, so I won’t go through it again, but each block created in the previous step is run through the single-key XOR brute forcing code. Here is the full script:
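A compact end-to-end sketch of the steps described in this post follows. It is a reconstruction under stated assumptions, not the original script: all function names are hypothetical, the frequency table holds approximate values, the scoring is a simple frequency sum, and only the single best keysize is used.

```python
import base64

# Approximate relative frequencies of common English characters (assumed values).
ENGLISH_FREQ = {
    " ": 13.0, "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8,
    "m": 2.4, "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5,
    "v": 1.0, "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.1, "z": 0.07,
}


def hamming_distance(a: bytes, b: bytes) -> int:
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def english_score(data: bytes) -> float:
    # Higher means the bytes look more like English text (case-insensitive).
    return sum(ENGLISH_FREQ.get(chr(b).lower(), 0.0) for b in data)


def guess_keysize(ciphertext: bytes, min_size: int = 2, max_size: int = 40) -> int:
    best_size, best_avg = None, float("inf")
    for keysize in range(min_size, max_size + 1):
        chunks = [ciphertext[i:i + keysize]
                  for i in range(0, len(ciphertext) - keysize + 1, keysize)]
        pairs = list(zip(chunks, chunks[1:]))
        if not pairs:
            continue
        # Average Hamming distance per byte over all adjacent chunk pairs.
        avg = sum(hamming_distance(a, b) for a, b in pairs) / (len(pairs) * keysize)
        if avg < best_avg:
            best_size, best_avg = keysize, avg
    if best_size is None:
        raise ValueError("ciphertext is too short for the given keysize range")
    return best_size


def solve_single_byte_xor(block: bytes) -> int:
    # Try all 256 key bytes and keep the one with the most English-looking output.
    return max(range(256), key=lambda k: english_score(bytes(b ^ k for b in block)))


def repeating_key_xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def break_repeating_key_xor(ciphertext: bytes, min_size: int = 2, max_size: int = 40):
    keysize = guess_keysize(ciphertext, min_size, max_size)
    # Transposed block i contains every byte encrypted with key byte i.
    key = bytes(solve_single_byte_xor(ciphertext[i::keysize]) for i in range(keysize))
    return key, repeating_key_xor(ciphertext, key)


def load_ciphertext(path: str) -> bytes:
    # The challenge file is Base64-encoded; the path is whatever you saved it as.
    with open(path, "rb") as f:
        return base64.b64decode(f.read())


# Example usage (the file name is a placeholder):
# key, plaintext = break_repeating_key_xor(load_ciphertext("6.txt"))
```

Because multiples of the true key length score about as well as the key length itself, a more robust version would try several of the lowest-scoring keysizes instead of only the best one.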

I also went ahead and made this into a tool, which you can find on my GitHub. I made a few modifications to the above code, mainly selecting the 5 shortest normalized edit distances and processing each of them. The tool allows you to read the data from a file or as an argument to a parameter, as well as decode the input using Base64, hex, and/or URL encoding. Feedback is welcome.

About me…

I’m an information security professional with a focus on offensive security. My day job is in penetration testing, but I also have experience in host defense, audit, and system administration. When I’m not doing that, I enjoy coding, building things in the AWS cloud, and ultra running.