This blog post presents my solution to the Reverse Keygen by GordonBM on the “reverser’s playground” at crackmes.de. I first used IDA Pro to get the CIL representation of the key generator. You can see the full listing of the key generator function here.

The code responsible for generating the key is in the method pump. This function receives the message as the first argument. It generates the key in either simple or hard mode, depending on the check box status passed as the second argument. The function returns the key to the caller, which simply displays the value by copying it to the message text box. These are the first few lines of pump:

The snippet converts the message string to an array of characters and stores the result in rgchMessage. It also initializes two result variables: szResult (an empty string) and a second one (an empty array with the same size as the message). After that, the code either continues with the hardcore key generator (when the checkbox is set) or jumps to the simple mode key generation at loc_4A1. Let’s start with the easy version.

Simple Mode

Let’s first investigate the simple version, which starts at the label loc_4A1:

This snippet doesn’t do much. We are probably looking at a for loop that iterates over all characters in szMessage. After the for loop, the method pump returns the string in szResult. Here is the code in pseudo C#:
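As a minimal sketch of that loop in Python, assuming (as the decoder described next suggests) that simple mode only appends the decimal ASCII code of each character. The name `pump_simple` is made up for this illustration and does not come from the crackme:

```python
def pump_simple(message):
    """Sketch of simple mode: concatenate the decimal ASCII code of every character."""
    result = ""
    for ch in message:           # the for loop over all characters of the message
        result += str(ord(ch))   # append the character code as decimal digits
    return result


print(pump_simple("Go"))  # G = 71, o = 111, so this prints 71111
```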

Reversing the encrypted text is straightforward. The only tricky part is differentiating between the two-digit ASCII codes and the three-digit ones. The latter always start with 1, while the former never do (the ASCII codes 10 to 19 are not printable). The following Python script first tokenizes the key into ASCII codes, and then transforms those codes back to characters:
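A minimal sketch of such a decoder, assuming the key is nothing more than the concatenated decimal codes of printable ASCII characters (32 to 126). The function name `decode_simple` is chosen here for illustration:

```python
def decode_simple(key):
    """Split a simple-mode key into ASCII codes and turn them back into text."""
    codes = []
    pos = 0
    while pos < len(key):
        # Codes 100..126 start with '1' and have three digits;
        # codes 32..99 have two digits and never start with '1'.
        width = 3 if key[pos] == "1" else 2
        codes.append(int(key[pos:pos + width]))
        pos += width
    return "".join(chr(code) for code in codes)


print(decode_simple("71111114100111110" + "6677"))  # prints GordonBM
```

The split only works because no printable character has a code between 10 and 19, so the leading digit alone decides the token width.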

So the hardcore mode, just like the simple version, operates on the ASCII codes of the letters and concatenates the results. This time, however, there is an additional method ran whose return value is added to the ASCII character codes. Also, the code uses doubles instead of integers. Let’s try to decompile ran:

So to summarize: The hardcore mode also iterates over all characters and takes the ASCII codes. But then it adds random noise with the function ran. The hardest part about reversing this key generation algorithm is definitely the uncertainty of this noise. The next section investigates this noise in more detail.

Learning about the Noise

Strings of Length 1

Let’s start with the easiest case: strings of length 1. The for loop is executed exactly once. The expression (2*iCounter) ^ 6 (where ^ is XOR) evaluates to 6. The random function rnd.Next(0, 2) evaluates to either 0 or 1. This means the result of ran is either 0 or 18.

Strings of Length 2

The for loop gets executed twice for two-letter words. After the first pass, dbV0 holds either 0 or 18 as seen before. For the next iteration, (2*iCounter) ^ 6 evaluates to 4. The random function rnd.Next(0, 2) again is 0 or 1. So if dbV0 was 0 after the first iteration, we get either 0 or 12. If, on the other hand, dbV0 was 18, we get either 162 or 174.
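The numbers quoted in this post (18 for one letter, 162 and 174 for two letters, a maximum of 1146402 for six letters) are all consistent with the update `dbV0 = 9*dbV0 + 3*((2*iCounter) ^ 6)*rnd.Next(0, 2)`. The following Python sketch uses that recurrence. It is inferred from those values and from the example key, not copied from the decompiled listing, and it uses integers where the original uses doubles. Passing the message length to `ran` and drawing fresh noise for every character are likewise assumptions, and `pump_hardcore` is a name made up here:

```python
import random


def ran(length, rng=random):
    """Noise for a message of the given length (reconstructed recurrence, see above)."""
    value = 0
    for i in range(length):
        bit = rng.randint(0, 1)                      # stands in for rnd.Next(0, 2): 0 or 1
        value = 9 * value + 3 * ((2 * i) ^ 6) * bit  # ^ is XOR, as in C#
    return value


def pump_hardcore(message, rng=random):
    """Sketch of hardcore mode: ASCII code plus fresh noise for every character."""
    return "".join(str(ord(ch) + ran(len(message), rng)) for ch in message)


print(pump_hardcore("GordonBM"))  # output differs from run to run
```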

Strings of Length n

The following recursive Python function generates all possible noise values for a given length:
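A minimal sketch of such a function, assuming the recurrence `9*value + 3*((2*i) ^ 6)*bit` with `bit` in {0, 1}. That recurrence is inferred from the noise values quoted in this post rather than taken from the decompiled code. Both outcomes of the random bit are followed at every step:

```python
def ran_values(length, i=0, value=0):
    """Return the set of all noise values ran can produce for a message length."""
    if i == length:
        return {value}
    term = 3 * ((2 * i) ^ 6)  # contribution when the random bit is 1 (^ is XOR)
    return (ran_values(length, i + 1, 9 * value)            # random bit was 0
            | ran_values(length, i + 1, 9 * value + term))  # random bit was 1


for n in range(1, 6):
    print(n, len(ran_values(n)))
```

Returning a set rather than a list drops the duplicates that appear when the XOR term is zero.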

The number of noise values doubles for each extra letter. The only exception is the step from 3 to 4 characters, which both lead to 8 noise values. This anomaly is due to the XOR expression, which becomes zero for iCounter = 3.

Not only does the number of potential noise values increase, the number of digits also varies more and more. For 6-letter strings, for example, the noise could be anything from 0 up to the 7-digit value 1146402. So an additional problem is to know how many characters the message has. The next section examines the mean key length for different message sizes.

Average Key Length

To guess how many characters were in the message, we need a statistic for the average key length given a certain message length. The following Python script simply generates all potential noise values with the method ran_values shown above. It then adds the average ASCII code value. For the average ASCII code, I’m assuming the characters of the message are within the range 32 to 126. This range includes all printable characters. The mean character would therefore have a code of 79. This is, of course, a very rough estimate. Better algorithms would determine the mean based on average messages. But to just guess the string length it should be fine. Here’s a script that lists the average length of the key for different message lengths up to 14:
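A minimal sketch of such a script. It carries its own iterative `ran_values`, based on the recurrence `9*v + 3*((2*i) ^ 6)*bit` inferred from the noise values quoted in this post. Truncating the average to an integer is also an assumption, and `average_key_length` is a name chosen here:

```python
MEAN_ASCII = 79  # midpoint of the printable range 32..126


def ran_values(length):
    """All noise values for a message length (reconstructed recurrence)."""
    values = {0}
    for i in range(length):
        term = 3 * ((2 * i) ^ 6)  # ^ is XOR
        values = {9 * v + bit * term for v in values for bit in (0, 1)}
    return values


def average_key_length(length):
    """Mean digits per character times message length, truncated to an integer."""
    values = ran_values(length)
    total_digits = sum(len(str(MEAN_ASCII + v)) for v in values)
    return length * total_digits // len(values)


for n in range(1, 15):
    print(n, average_key_length(n))
```

The number of noise values roughly doubles with every extra character, so the enumeration grows exponentially with the message length.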

Note that the code takes a while to complete. But the values need to be computed only once and can later be hard coded into the reverse key generator.

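| message length | avg. key length |
|---|---|
| 1 | 2 |
| 2 | 5 |
| 3 | 9 |
| 4 | 16 |
| 5 | 23 |
| 6 | 33 |
| 7 | 44 |
| 8 | 56 |
| 9 | 72 |
| 10 | 90 |
| 11 | 110 |
| 12 | 132 |
| 13 | 156 |
| 14 | 182 |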

Given the average key length we can estimate the length of the message.

Guessing the Message Length

The example key given by GordonBM is 9247109931023928283286380308924708453882326686447837 and has 52 characters. The closest value in the above table is 56, which corresponds to a message length of 8. The real message “GordonBM” indeed has 8 characters. The following Python snippet returns the best guess for the message length based on the hardcoded results from the previous section:
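A minimal sketch of that lookup, with the table values hardcoded. The names `AVG_KEY_LENGTH` and `guess_message_length` are chosen here for illustration:

```python
# Average key length per message length, copied from the table above.
AVG_KEY_LENGTH = {
    1: 2, 2: 5, 3: 9, 4: 16, 5: 23, 6: 33, 7: 44,
    8: 56, 9: 72, 10: 90, 11: 110, 12: 132, 13: 156, 14: 182,
}


def guess_message_length(key):
    """Return the message length whose average key length is closest to len(key)."""
    return min(AVG_KEY_LENGTH, key=lambda n: abs(AVG_KEY_LENGTH[n] - len(key)))


example = "9247109931023928283286380308924708453882326686447837"
print(guess_message_length(example))  # 52 digits are closest to 56, so this prints 8
```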

With the above code we can guess the expected length of the message. Given this information we can then calculate the noise values. The “only” remaining part is to actually crack the key. I’m doing this with brute force.

Brute-Forcing the Message

With the length information of the message and, more importantly, the resulting noise values, we can brute force the message. The following algorithm starts at the beginning of the key and iterates over all potential noise values. It then checks if any character from the ASCII_RANGE = [32, 126] could have produced the key at hand. If yes, the algorithm advances the resulting number of digits and repeats the procedure. If it manages to reach the end of the key with all valid message characters (meaning they are in the ASCII_RANGE), then the function tests if the length of the message checks out and simply prints the message to stdout:
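A minimal sketch of such a search, not the original script. It assumes the noise recurrence `9*v + 3*((2*i) ^ 6)*bit` inferred from the values quoted in this post, and it returns the candidates instead of printing them. It also memoizes on the position in the key and the number of characters still missing, which is an addition made here to keep the search fast. The name `crack` is made up:

```python
from functools import lru_cache

ASCII_RANGE = (32, 126)  # printable characters


def ran_values(length):
    """All noise values for a message length (reconstructed recurrence)."""
    values = {0}
    for i in range(length):
        term = 3 * ((2 * i) ^ 6)  # ^ is XOR
        values = {9 * v + bit * term for v in values for bit in (0, 1)}
    return values


def crack(key, length):
    """Return every message of the given length that could have produced key."""
    noise = ran_values(length)
    max_digits = len(str(ASCII_RANGE[1] + max(noise)))

    @lru_cache(maxsize=None)
    def suffixes(pos, chars_left):
        if chars_left == 0:
            return ("",) if pos == len(key) else ()
        if key[pos:pos + 1] in ("", "0"):  # key exhausted, or token with leading zero
            return ()
        found = []
        for width in range(2, max_digits + 1):
            if pos + width > len(key):
                break
            token = int(key[pos:pos + width])
            for code in range(ASCII_RANGE[0], ASCII_RANGE[1] + 1):
                if token - code in noise:  # this character plus some noise fits
                    for rest in suffixes(pos + width, chars_left - 1):
                        found.append(chr(code) + rest)
        return tuple(found)

    return sorted(set(suffixes(0, length)))


example = "9247109931023928283286380308924708453882326686447837"
candidates = crack(example, 8)
print(len(candidates), "GordonBM" in candidates)
```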

Because the message is fairly long (and the set of potential noise values correspondingly large), we get back not just one message but 64 slightly different ones. Among them is the original message “GordonBM”. But without additional knowledge about the message we can’t do better than provide the extensive list of potential messages.