Decrypting Malicious PDF Documents Part Two

Patrick Wardle, Director of Synack Research

This is the second blog post in a two-part series discussing how to decrypt PDF documents for analytical purposes. Click here for part one.

Decryption

Guess what? “It’s hammer, decryption time!” Section 7.6.2 of [5] verbosely describes the encryption algorithm for protecting PDF documents. Since the PDF specifies AES, a symmetric cryptographic algorithm, the same sequence of steps can be performed to decrypt the document, given knowledge of the encryption key. Instead of simply regurgitating the mind-numbing content of the PDF encryption specification, it seemed more productive to illustrate the decryption of a PDF document step by step, in a programmatic manner. Of course, if a step is unclear, or more details are desired, feel free to consult the official documentation from Adobe (we won’t judge).

The first step in decrypting a PDF document is to generate what Adobe calls the ‘file key.’ To generate this key, four pieces of information are required: the file ID, the owner password hash, the user password, and the document’s permissions flag. Of course, if the user password is not blank, the PDF cannot be decrypted (discounting brute-force/cryptanalysis attacks).

The file ID is stored in the PDF document’s trailer (see figure 0x1). Specifically, it is the first value of the /ID object. For the PDF document referenced in this blog (encrypted.pdf), the full ID is:

Given a helper function (GetValueForKey()) that can extract the first value stored under a key, the following code will extract the file ID:

(figure 3) extracting the PDF’s file ID
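As a rough illustration of this step (a sketch, not the code from the original figure), the Python below pulls the first element of the /ID array out of raw PDF bytes. The function name extract_file_id() and the sample trailer are made up for this example, and it assumes the ID is written as a hex string (<...>), which is common but not guaranteed.

```python
import re
from binascii import unhexlify


def extract_file_id(pdf_bytes: bytes) -> bytes:
    """Return the first element of the /ID array as raw bytes.

    Hypothetical helper: only handles IDs written as hex strings.
    """
    match = re.search(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]+)>", pdf_bytes)
    if match is None:
        raise ValueError("no hex-encoded /ID array found")
    # Whitespace is allowed inside a PDF hex string, so strip it first.
    hex_digits = re.sub(rb"\s+", b"", match.group(1))
    return unhexlify(hex_digits)


# Made-up trailer, for illustration only (not the ID of encrypted.pdf).
sample_trailer = (
    b"trailer\n<< /Size 12 /ID [<00112233445566778899AABBCCDDEEFF> "
    b"<0F0E0D0C0B0A09080706050403020100>] >>"
)
print(extract_file_id(sample_trailer).hex())
```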

The next piece of information required for generating the file key is the owner password hash, stored within the PDF encryption object. This value, accessed via the ‘/O’ key, should be extracted and converted into 32 bytes of binary data. This extraction is a touch tricky, as the password hash is stored as an ASCII string containing escaped values that must be manually unescaped or converted [3]. For example, a ‘\r’ is stored as two characters (ASCII ‘\’, hex 0x5c, and ASCII ‘r’, hex 0x72). This two-byte sequence must be converted into the single-byte carriage return character (hex 0x0d).

The following code illustrates one way to achieve the correct extraction of the owner’s password hash:

(figure 4) extracting and unescaping the owner’s password hash (‘/O’)
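One possible way to do the unescaping is sketched below in Python. The function name unescape_pdf_string() is hypothetical, and the escape table follows my reading of the PDF literal-string rules (named escapes, up to three octal digits, and backslash-newline as a line continuation); it expects the bytes found between the outer parentheses of the ‘/O’ value.

```python
# Backslash escapes defined for PDF literal strings: escaped byte -> real byte.
SIMPLE_ESCAPES = {
    0x6E: 0x0A,  # \n  line feed
    0x72: 0x0D,  # \r  carriage return
    0x74: 0x09,  # \t  tab
    0x62: 0x08,  # \b  backspace
    0x66: 0x0C,  # \f  form feed
    0x28: 0x28,  # \(
    0x29: 0x29,  # \)
    0x5C: 0x5C,  # \\
}


def unescape_pdf_string(raw: bytes) -> bytes:
    """Convert the escaped body of a PDF literal string into raw bytes."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] != 0x5C:          # ordinary byte, copy as-is
            out.append(raw[i])
            i += 1
            continue
        i += 1                      # skip the backslash
        if i >= len(raw):
            break
        nxt = raw[i]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        elif 0x30 <= nxt <= 0x37:   # octal escape, one to three digits
            value = digits = 0
            while digits < 3 and i < len(raw) and 0x30 <= raw[i] <= 0x37:
                value = value * 8 + (raw[i] - 0x30)
                i += 1
                digits += 1
            out.append(value & 0xFF)
        elif nxt in (0x0A, 0x0D):   # backslash + end-of-line: continuation
            i += 1
            if nxt == 0x0D and i < len(raw) and raw[i] == 0x0A:
                i += 1
        else:                       # unknown escape: drop the backslash
            out.append(nxt)
            i += 1
    return bytes(out)


print(unescape_pdf_string(b"0\\r\\050").hex())  # 300d28
```

After unescaping, the ‘/O’ value should be exactly 32 bytes long, which makes for a cheap sanity check.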

When the code in figure 4 is run on the encrypted.pdf document, it produces the following output: hash is: 0x30e714592e1ff1fe285a601b413c71caf3b989576b72ac51371cf782a7f32f0b

Assuming that the user’s password is blank, the final piece of required information for generating the file key is the value of the permissions flag (under the ‘/P’ key), also found within the PDF’s encryption object. Though stored as a string representation of a signed integer (e.g., -3392), it must be extracted as an unsigned 32-bit integer:

(figure 5) extracting and converting the value from ‘/P’
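The conversion itself is a one-liner in Python. The sketch below (with a hypothetical helper name, permissions_to_u32()) masks the signed value down to 32 bits and shows the little-endian byte layout that is later fed into the key hash.

```python
import struct


def permissions_to_u32(p_value: str) -> int:
    """Reinterpret the signed /P string as an unsigned 32-bit integer."""
    return int(p_value) & 0xFFFFFFFF


p = permissions_to_u32("-3392")
print(hex(p))                        # 0xfffff2c0
print(struct.pack("<I", p).hex())    # c0f2ffff (little-endian bytes)
```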

With all three pieces of required information in hand (again, ignoring the user password, as it’s assumed to be blank), the file key can now be generated. The key generation algorithm is described in Adobe’s documents, under the section ‘Computing an Encryption Key’ on page 18 [8]. In short, the user’s password is padded up to 32 bytes with a known, static password pad, then hashed together with the owner password hash, the value of the permissions flag, and the file ID. This hash is then re-hashed 50 times. Since the user’s password is blank, all 32 bytes of the password pad are used. Because the password pad is publicly known, an encrypted PDF document (with a blank password) can be fully decrypted.

(figure 6) generating the file key
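A minimal Python sketch of this key derivation follows. It is my reading of the specification rather than the original figure's code: the hash is MD5, the pad bytes are the ones published in the PDF specification, and the function name, the 16-byte default key length, and the encrypt_metadata switch are assumptions made for illustration (the re-hash loop applies to security handler revision 3 and later).

```python
import hashlib
import struct

# The fixed 32-byte password pad published in the PDF specification.
PASSWORD_PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def compute_file_key(owner_hash: bytes, permissions: int, file_id: bytes,
                     user_password: bytes = b"", key_length: int = 16,
                     encrypt_metadata: bool = True) -> bytes:
    """Derive the file key (assumes security handler revision >= 3)."""
    # Pad (or truncate) the user password to exactly 32 bytes.
    padded = (user_password + PASSWORD_PAD)[:32]

    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(owner_hash)                                   # the /O value
    md5.update(struct.pack("<I", permissions & 0xFFFFFFFF))  # /P, little-endian
    md5.update(file_id)                                      # first /ID element
    if not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()

    # Re-hash the first key_length bytes 50 times.
    for _ in range(50):
        digest = hashlib.md5(digest[:key_length]).digest()
    return digest[:key_length]


# Placeholder inputs, for illustration only.
key = compute_file_key(bytes(32), -3392, bytes(16))
print(key.hex())
```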

Adobe describes a way to validate that the computed key is in fact correct, that is to say, that it will successfully decrypt the PDF document. In short, the user’s password (again padded to 32 bytes with the password pad) and the PDF’s file ID are hashed. The resulting digest is first decrypted with RC4, using the generated file key as the decryption key. The decrypted data is then decrypted 19 more times, each time with a key generated by XOR’ing every byte of the computed file key with the iteration counter (1 through 19 in the specification). The final result of this hash/decryption loop should match the first 16 bytes of the user’s password hash (the ‘/U’ value) in the PDF’s encryption object; the remaining 16 bytes of that value are arbitrary padding.

This algorithm is of course much easier to understand in code:

(figure 7) validating the file key
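The check can be sketched in Python as shown below. This is an illustration under stated assumptions, not the original figure: it uses a small pure-Python RC4 so that nothing beyond the standard library is needed, it assumes a blank user password, and it follows the specification's wording, where the XOR counter runs from 1 to 19 and the comparison is made against the first 16 bytes of the ‘/U’ value. The function names are hypothetical.

```python
import hashlib

PASSWORD_PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    s = list(range(256))
    j = 0
    for i in range(256):                       # key scheduling
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:                          # keystream generation
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def expected_user_entry(file_key: bytes, file_id: bytes) -> bytes:
    """Compute the 16 bytes that /U should start with (blank password)."""
    result = rc4(file_key, hashlib.md5(PASSWORD_PAD + file_id).digest())
    for counter in range(1, 20):               # 19 further RC4 passes
        round_key = bytes(b ^ counter for b in file_key)
        result = rc4(round_key, result)
    return result


def file_key_is_valid(file_key: bytes, file_id: bytes, user_entry: bytes) -> bool:
    """Compare the computed value with the first 16 bytes of /U."""
    return expected_user_entry(file_key, file_id) == user_entry[:16]


# Round trip with made-up values: build a /U value, then validate against it.
demo_key, demo_id = bytes(range(16)), b"\xaa" * 16
demo_u = expected_user_entry(demo_key, demo_id) + bytes(16)
print(file_key_is_valid(demo_key, demo_id, demo_u))  # True
```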

Once the file key has been generated and verified, decryption can (almost) commence. According to Adobe’s documents [5], PDF objects are individually encrypted, each with a unique key. More key generation, you ask? Why yes!

With knowledge of the file key, generating an object key is fairly straightforward. In short, the generated file key is hashed with the object number, the object version (or ‘generation’) number, and the hardcoded salt value 0x73416C54 (‘sAlT’). The object number and version number are simply the first two integers found at the start of the object declaration. The following code illustrates how to generate an object key for an encrypted PDF object:

(figure 8) generating an object key
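A short Python sketch of the object key derivation is given below. The details beyond the paragraph above come from my reading of the specification and should be treated as assumptions: the hash is MD5, only the low three bytes of the object number and the low two bytes of the generation number are used (least significant byte first), and the key is truncated to at most 16 bytes. The function name is hypothetical.

```python
import hashlib
import struct

AES_SALT = b"sAlT"  # 0x73 0x41 0x6C 0x54, appended only for AES-encrypted objects


def compute_object_key(file_key: bytes, object_number: int,
                       generation: int, use_aes: bool = True) -> bytes:
    """Derive the per-object key from the file key."""
    material = (
        file_key
        + struct.pack("<I", object_number)[:3]   # low 3 bytes, little-endian
        + struct.pack("<I", generation)[:2]      # low 2 bytes, little-endian
    )
    if use_aes:
        material += AES_SALT
    # The key is the first min(n + 5, 16) bytes of the digest.
    return hashlib.md5(material).digest()[:min(len(file_key) + 5, 16)]


# For an object declared as "7 0 obj", with a placeholder file key.
print(compute_object_key(bytes(16), 7, 0).hex())
```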

With an object key in hand, an encrypted PDF object can finally be decrypted! Recall that this PDF (as specified in the PDF’s encryption object) is encrypted with AES. More specifically, it uses AES in CBC mode, with the first 16 bytes of the encrypted PDF object as the initialization vector (IV). Given an object key, the following code will decrypt an encrypted PDF object:

(figure 9) decrypting a PDF object
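The sketch below shows one way this could look in Python. It assumes the third-party PyCryptodome package is installed (imported as Crypto) and that the plaintext ends with standard block padding, where each padding byte holds the padding length; the helper names are made up for this example. The IV split and the padding removal are plain Python, so they can be tried without the library.

```python
def split_iv(encrypted: bytes):
    """Split an encrypted object into its 16-byte IV and the ciphertext."""
    if len(encrypted) < 32 or len(encrypted) % 16 != 0:
        raise ValueError("expected a 16-byte IV plus whole 16-byte AES blocks")
    return encrypted[:16], encrypted[16:]


def strip_padding(plaintext: bytes) -> bytes:
    """Remove the trailing padding (each pad byte equals the pad length)."""
    pad = plaintext[-1]
    if not 1 <= pad <= 16 or plaintext[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid padding (wrong key?)")
    return plaintext[:-pad]


def decrypt_object(object_key: bytes, encrypted: bytes) -> bytes:
    """AES-CBC decrypt one PDF string or stream with its object key."""
    # Assumption: PyCryptodome is available; imported lazily so the
    # helpers above work without it.
    from Crypto.Cipher import AES

    iv, ciphertext = split_iv(encrypted)
    cipher = AES.new(object_key, AES.MODE_CBC, iv)
    return strip_padding(cipher.decrypt(ciphertext))


iv, body = split_iv(bytes(range(48)))
print(len(iv), len(body))                      # 16 32
print(strip_padding(b"hello" + b"\x0b" * 11))  # b'hello'
```

A padding error after decryption is a useful hint that the object key (or the object/generation numbers used to derive it) is wrong.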

The ability to decrypt PDF objects within an encrypted PDF document is a powerful technique, as it can reveal previously inaccessible (malicious) components.

Often, attackers will store malicious components such as media files (e.g., a flash component to exploit CVE-2010-1297), shellcode, or javascript in compressed PDF stream objects. If the document (and thus these objects) is encrypted, a naive signature-based scanner (that does not support decryption) would fail to detect any maliciousness. Using the techniques described in this blog, however, the objects can be decrypted and their true nature revealed. While extracting and decompressing streams is beyond the scope of this blog post, here are some of the more interesting objects decrypted out of the encrypted PDF document.

First is a stream that contains the malicious flash file (to exploit CVE-2010-1297):

(figure 10) decrypted stream with malicious flash component

Then, a stream that contains exploit-related javascript:

(figure 11) decrypted stream with malicious javascript

Conclusion

Malicious PDF documents are a highly effective way of infecting an attacker’s target. Although the AV industry attempts to detect such documents, attackers can often thwart these efforts by utilizing Adobe’s built-in encryption. This blog post detailed how PDF documents encrypted with a blank password can be programmatically decrypted. As was shown, once decrypted, the PDF’s malicious intentions were clearly revealed…attackers, the next move is yours!