Encryption 101: How to break encryption

Continuing on in our Encryption 101 series, where we gave a malware analyst’s primer on encryption and demonstrated encryption techniques using ShiOne ransomware, we now look at what it takes to break an encryption. In order for something as powerful as encryption to break, there needs to be some kind of secret flaw. That flaw is often a result of an error in implementation.

There are a number of things that can go wrong for someone who is implementing encryption. What’s difficult is being able to identify and analyze the methods a programmer used for encryption and look for any weaknesses to exploit.

Locating encryption algorithms

Before you can even attempt to find the weakness, you must first know what was the encryption algorithm being used. A lot of times, it’s as simple as looking at the API calls. If this is the case, it can be quite simple to identify the algorithm. This was the case for the previous ShiOne walkthrough.

There are times, however, where the encryption is statically compiled into the malware or even a custom written encryption algorithm is used. When this is the case, you must be able to understand the inner workings of encryption algorithms to be able to identify code.

A file’s content will be encrypted and written back into the file, so a quick method to narrow down the general region where the encryption lies is to simply xref the ReadFile and WriteFile API calls. The encryption implementation will likely be performed between these two points.

Identifying encryption code

When looking for statically compiled encryption code, as we mentioned, you will not have the luxury of searching for any API calls. A basic understanding of some of the low-level details of how these encryption algorithms work will be necessary.

Starting off, below, we have the high-level flow of AES algorithm. In general, most synchronous encryption algorithms have a similar flow to this; the differences may be the types of mathematical operations performed, but the core concepts remain the same. So, understanding AES will be enough of a starting point to help identify other types going forward in a real-world analysis.

With AES, being that it is a symmetric encryption algorithm, it performs a series of mathematical and logical operations on three things working together:

Plaintext data to be encrypted

Static bytes that are part of the algorithm (lookup table)

The key used for encryption

Depending on the flavor of AES and key size, the flow will be slightly different. In the picture above, you see a loop involving a few blocks:

Add key

Shift rows

Sub bytes

Mix columns

What is happening in these steps is the file data is read into a matrix of a fixed number of bytes. In this case, it’s 16 bytes, but depending on the algorithm, it could be anything. Here are the rounds of steps:

The add key round XORs the key data against the matrix of input data.

The shift rows round rolls the data using a shift operation. What I mean by rolling is the following: 4 5 2 1. If the roll shifted left one count, it would become 5 2 1 4. Rolled again it would become 2 1 4 5.

The sub bytes round involves a static array of bytes built into the algorithm. Each byte of data from previous steps is used as the index to a lookup array. Thus, there is a static substitution occurring. You can think of it as similar to an enum in programming.

In the mix columns round, the bytes in the matrix are manipulated by some mathematical operations and linear transformations, and result in each byte of the matrix being different now.

Each set of these four series of operations is considered one round. AES can have 10 to 14 rounds. This means that when you are looking for the encryption code inside of a binary, it will likely be a long function with a lot of repetitive-looking code. This is one aspect that can help you identify it as encryption code when looking though the binary.

Here is another example of a round of encryption, likely from a different flavor of AES or similar synchronous crypto:

As you can see, the order of operations is a bit different. These kinds of details are not too important to us because we are not cryptographers. In general, we are not looking to find the weakness in AES algorithm itself, we are looking to find a weakness in the implementation. The reason for going into such detail on the inner workings of AES is only to give you an understanding of how it works so that you can identify it in code when you see it in the wild.

I will point you to a previous analysis we did of the Scarab ransomware. This was an example from which the code above was taken. They were encrypting files using statically compiled AES—no API calls. We had to do some research on the inner workings of various encryption methods to be able to properly identify what the algorithm was actually doing.

The details on the number of sets of these operations in this function was one of the main indicators to us as to which algorithm this code belongs.

I am including this image from the previous article once again just to remind about many encryption methods are being used in a single ransomware. This is good to keep an eye out for and not to be confused when you find multiple encryptions being used. Here, we have the flow chart showing the file encryption but also the algorithm that encrypts the previous key. Although it is not the encryption that is modifying the file itself, it will be what is used to keep the file encryption key secure. Both areas are points of weakness when looking to break encryption.

The point is that any number of combinations of encryption can technically be used, as it is up to the author. You must be able to understand and identify each one and the role it plays in the overall scheme. It may be that a single encryption usage was implemented incorrectly and can be broken, and it may be a combination of a few things that together cause a hole in the overall scheme.

Random number generators

A good starting point when looking for weaknesses in encryption is by looking at the encryption key generators, which in most cases are just some form of a random number generator.

If you have ever read anything about encryption, you will likely have come across someone mentioning the importance of the random number generator. The reason for this is that if you can force the output of a random number generator to reproduce the same value that was generated during a previous encryption, you will likely be able to recreate the original encryption keys.

An example of this is shown below. The system time is being used as the seed for a weak random number generator.

For the most part, any computer algorithm can only perform a finite series of operations. If the inputs to a function are the same, the output must also be the same. It is quite logical. In the case of random generators, the ingenuity is in taking enough inputs to seed the random value so that the output is not east to recreate. For example, some weak generators take the time of day as an input. Although this is, in a way, obscure, the conditions can definitely be recreated. What is necessary is to use enough semi-random inputs to give you enough entropy.

As you can see above, a more solid random generator may sample audio data, in addition to the time of day, and use mouse input and a number of other elements to try to make the inputs as random as possible. This requires an unreasonable amount of operations to brute force or recreate.

Theoretical process of cracking weak RNG

Here is a theoretical example for ransomware using a weak generator referred to as RNG. Suppose that the ransomware used a RNG-seeded with the current time in microseconds and the encryption is a standard algorithm. These are the basic steps for an attack:

Network admin analyzes the ransomware and sees that the public key, which was used to encrypt, is used as the victim ID for the ransomware.

The network admin knows roughly the time at which the infection occurred on his network, possibly by looking at the network logs. Let’s say it occurred sometime between 10:00:00am and 10:00:10am—a 10-second window.

Since the RNG uses the time in microseconds, that leaves him with 10,000,000 possible seeds.

The admin then says to himself, “If the ransomware used time as seed value x, then the encryption code produces the key pair value KEYx. “

He incrementally uses microseconds, one by one, starting at 10:00:00, to perform the key pair creation using some standard software.

Now he checks to see if that matches the public key (victim ID) he has obtained.

Nope, it did not match. That means the RNG did not use x (10:00:00AM) as the seed.

He tries again with x+1 and so on, until he reaches the final microsecond before 10:00:10am.

Eventually, a match will be made—the generated public key will match the victim ID.

He will now know that the private key generated is the same as the one which was generated during the encryption of his hard drive.

Now he can take that private key, run it through his off-the-shelf decryption software, and have the original file back.

In this scenario, a brute force attack is completely within reason. Now, if the RNG used milliseconds, in combination with the number of processes running at the given time, that adds a bit more complexity. It would take the initial 10,000,000 possibilities multiplied by the range of potential processes running on the machine. You can assume it might be somewhere between 5 and 25 processes. So now, that initial 10,000,000 attempts becomes 200,000,000. It is still iterate-able, but has added more complexity. You get the point.

If you add enough parameters, or parameters with a lot of possible outcomes, the number will eventually become so big where a brute force attempt would not be possible in your lifetime, as shown below.

Decryption in practice

Below is a list of a few examples of ransomware that were successfully broken and the methods used.

7ev3n, XORist, Bart: Weak encryption algorithm

Petya: Mistakes in cryptography implementation

DMA Locker, CryptXXX: Weak key generator

Cerber: Server-side vulnerability

Chimera: Leaked keys

Weak encryption algorithm

The DES algorithm was developed in the 1970s and was widely used for encryption. It is now considered a weak encryption algorithm because of its key size. The amount of bits generated as the key for an encryption algorithm is one of the considerations for the strength of an algorithm. For example, there was a contest to crack a 40-bit cipher which was won by a student using a few hundred machines at his university. It took only three and half hours. The bigger the size of the key, the harder it will be to crack an encryption—that is, without knowing anything about it.

Not to say that the common analyst has access to such resources, but I just wanted to give you a better understanding of why an encryption algorithm might be considered weak.

Often times, you can get an initial idea of what the encryption method might be by simply looking at a visualization of the files.

As you can see here, there is low entropy, and the data within the encrypted file shows similarities to the original plaintext. This could be the result of a XOR type of cypher, and a few others

Let’s compare this to a file encrypted with a another algorithm. You will be able to tell the difference in the high entropy result from encryption:

The file visualization can also be a good starting point when researching to see if a given ransomware is able to be decrypted, in that it can help you to figure out what the algorithm is and if it is a know weak one. It can also point you in the direction of which part of the process you may be looking to attack to break the encryption. But as you saw from the list above, Cerber was broken by exploiting a server-side vulnerability. So although the encryption itself was strong, a side channel was attacked in order to create a decryptor.

Conclusion

In this post, we covered the need for identification and classification of the encryption algorithm used in order to look for weaknesses. We then went through a primer on identifying what the code might look like. We covered various weaknesses that can potentially be exploited and walked through a theoretical example of a scenario where a network admin might be able to decrypt the ransomware.

Tune in for part four, the final part of our Encryption 101 series, where we will go through the code of a weak ransomware and walk through, line by line, the process of creating a decryptor.

April 24, 2012 - WARNING: The information included in this tutorial could be used for malicious purposes in the wrong hands, please expect to be yelled at by people who think you are a bad guy if you start talking about this or asking questions. Also, please use responsibly. Hello everyone! Today I am going to give a detailed...

April 24, 2012 - Security Level: Light Purpose: To hide who you are while performing research through your browser. Benefits: Hide your IP Easy to set up Can be run off of a USB stick Drawbacks: Drive-by attacks can still lead to the infection of your host system. Can only hide traffic going out of HTTP port(s). Not meant...

April 27, 2012 - Security Level: Medium Purpose: To hide who you are while performing research through your browser AND protecting your host system from drive-by download attacks. Benefits: Hide your IP Protect the host system by running in a virtual environment Execute malware in a safe environment (non-traffic capture) Drawbacks: Not as easy to setup Need to gather...

April 27, 2012 - Security Level: High / Hardcore Purpose: To hide who you are while performing research through your browser AND protecting your host system from drive-by download attacks AND being able to perform dynamic malware analysis and capture malicious traffic moving between the malware and the C&C. (Whew, that’s a lot of ANDs. =D) Benefits: Hide your...

May 22, 2012 - Since December of 2011, the spread of malicious advertisements, or “Malvertisements”, has drastically increased. Along with this trend is the increased spread of some pretty nasty malware. One in particular is called Happili, an adware trojan that installs a browser extension to re-direct legitimate search queries to ad sites.