Decrypting Malicious PDF Documents Part One

Patrick Wardle, Director of Synack Research

This is the first blog post in a two-part series discussing how to decrypt PDF documents for analytical purposes. Click here for part two.

Background

Malicious PDF documents are one of the most common methods of gaining access to a remote system. The software that renders PDF documents is complex and, as such, suffers from a seemingly unending stream of vulnerabilities [1]. Moreover, even patched vulnerabilities often remain quite potent, as users tend to update extremely slowly…if at all. Armed with a malicious PDF, an attacker can deliver it to a victim in a variety of (often anonymous) ways. For example, via socially engineered emails (containing a PDF attachment, or links to hosted documents), targeted victims can easily be coerced into viewing malicious documents, thus infecting themselves [2].

The anti-virus (AV) industry is well aware of the PDF threat vector and, as such, has integrated automatic scanning of PDF documents into its products. To avoid detection and hide malicious content, attackers have concocted a myriad of anti-detection mechanisms, including the use of Adobe’s built-in PDF encryption scheme. Even today, such encryption proves problematic for AV scanners, many of which are thwarted by it. These scanners generally look for known exploit or malware signatures. Of course, if a PDF document is encrypted and the AV scanner cannot automatically decrypt it, the exploit will remain hidden and the PDF document will not be flagged as malicious.

This failure of detection is illustrated by a malicious encrypted PDF document (encrypted.pdf, MD5: 306d7e608a52121aa4508e9901e4072e) that will be referenced throughout the remainder of this blog post. Though it contains a known exploit that is almost five years old (a malicious Flash component that triggers CVE-2010-1297), the encrypted document is detected by only about 50% of the AV scanners on VirusTotal, even when reanalyzed:

(figure 0) VirusTotal’s ~50% detection rate (July 2014)

Such a low detection rate is clearly unacceptable. The goal of this blog post, therefore, is to shed light on encrypted malicious PDF documents and to comprehensively describe how they may be programmatically decrypted (if encrypted with a blank password). Once decrypted, the PDF document may then be scanned or manually analyzed to reveal its true nature. While PDF decryption has been described in other sources such as [3] and [4], these sources are somewhat dated or light on details. As such, it seemed useful to provide an updated, comprehensive, instructional analysis of encrypted PDFs and to illustrate programmatic decryption capabilities.

Encryption

Generally, to recover the plaintext of any encrypted content, knowledge of the decryption key is necessary. To protect documents (such as PDFs), this key is usually derived from a user-specified password. When the encrypted document is (re)opened, the rendering software (e.g., Acrobat Reader) will prompt for the original password. If it is not provided, the document will remain, for all intents and purposes, ‘undecryptable’.

The requirement of knowing the decryption key would seem problematic to an attacker. What is the point of sending out an encrypted malicious PDF document if the victim cannot open it? As it turns out though, the cryptographic algorithm used to encrypt Adobe’s PDF documents (described in ‘Document management — Portable document format’) [5] supports blank passwords. In this scenario, the encryption is completely transparent to end users: a document can be both encrypted and subsequently opened by the PDF reader software without a password. This is highly beneficial to an attacker, who gains the advantages of encryption (e.g., hiding the exploit signature, hindering analysis, etc.) while ensuring that the victim can still open the document and infect themselves.
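To make the blank-password case concrete: the standard security handler described in [5] does not use the raw password directly. Before key derivation, it pads (or truncates) the password to exactly 32 bytes using a fixed, publicly documented padding string. For a blank password the result is simply the padding string itself, so nothing secret is involved. The sketch below shows only that padding step. The function name pad_password is illustrative, and the sketch assumes the older RC4/AESV2 handlers (revisions 2 through 4), not newer AES-256 schemes.

```python
# Fixed 32-byte padding string from the PDF specification's
# standard security handler (password-based encryption).
PAD = bytes.fromhex(
    "28bf4e5e4e758a4164004e56fffa0108"
    "2e2e00b6d0683e802f0ca9fe6453697a"
)

def pad_password(password: bytes) -> bytes:
    """Pad or truncate a password to exactly 32 bytes.

    Illustrative helper: appends the padding string and keeps
    the first 32 bytes of the result.
    """
    return (password + PAD)[:32]

# A blank password pads to the padding string itself.
blank = pad_password(b"")
print(blank.hex())
```

Since the padded blank password is a constant, anything derived from it depends only on values stored in the document itself.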

Of course, if the reader software (e.g., Acrobat Reader) can decrypt an encrypted PDF document without a user-supplied password, so can anything else. This includes a Python script (snippets presented below) that can automatically decrypt such PDF documents. Armed with such a script, an analyst can remove the encryption (or ‘armor’) of the document, revealing its malicious components.

As shown on page 70 of [5], encrypted PDF documents contain a reference to an encryption object in the file’s trailer. (Note: for a great introduction to the anatomy of PDF documents, see [6].) Using Didier Stevens’ pdf-parser.py [7], the trailer of a document (e.g., encrypted.pdf) can be dumped:

(figure 1) the encrypted PDF’s ‘encryption object’

As figure 1 shows, the PDF’s trailer contains a dictionary with a key-value pair that indirectly references the PDF’s encryption object (55, 0). Dumping this encryption object reveals another dictionary, containing various encryption parameters:

(figure 2) dumping the PDF encryption object
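For readers who want to reproduce this lookup without pdf-parser.py, the two steps (find the ‘/Encrypt’ reference, then pull out the referenced object) can be approximated with regular expressions. This is a rough sketch, not a real PDF parser: it assumes ‘/Encrypt’ is an indirect reference in a classic plaintext trailer, and that the referenced object is not stored inside an object stream. The function names and the sample bytes are invented for illustration; only the object number (55, 0) mirrors the document discussed here.

```python
import re

def find_encrypt_ref(pdf_bytes: bytes):
    """Return (object number, generation) from '/Encrypt N G R', or None."""
    match = re.search(rb"/Encrypt\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def extract_object(pdf_bytes: bytes, obj_num: int, gen: int):
    """Return the raw body between 'N G obj' and 'endobj', or None."""
    # The lookbehind stops '55 0 obj' from matching inside '155 0 obj'.
    pattern = rb"(?<!\d)%d\s+%d\s+obj(.*?)endobj" % (obj_num, gen)
    match = re.search(pattern, pdf_bytes, re.DOTALL)
    return match.group(1).strip() if match else None

# Hypothetical fragment of a PDF (not the actual sample's contents).
sample = (
    b"55 0 obj\n<< /Filter /Standard /V 4 /R 4 /P -1340 >>\nendobj\n"
    b"trailer\n<< /Size 56 /Root 1 0 R /Encrypt 55 0 R >>\n"
)

ref = find_encrypt_ref(sample)
body = extract_object(sample, *ref)
print(ref, body)
```

On a real file, pdf_bytes would come from open(path, "rb").read(). A return value of None from find_encrypt_ref suggests either that the document is not encrypted or that it breaks one of the assumptions above.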

Section 7.6 of Adobe’s PDF specification [5] describes these encryption parameters. For purposes of programmatic decryption, only a few of the parameters are relevant. The ‘/O’ and ‘/U’ key-value pairs are referred to as the owner and user password hashes. These, along with their use in the decryption of the PDF document, are described below. The ‘/P’ value, though not fully described in the Adobe documentation, is shown there being used in decryption. Online sources refer to it as the permissions flag, “specifying the allowed operations” [4]. Its use is also described later in this post. The final relevant value, the ‘/V’ key-value pair, is “a code specifying the algorithm to be used in encrypting and decrypting the document” [5]. A value of four indicates that the security handler defines the use of decryption in the document, “using the rules specified by the CF, StmF, and StrF entries” [5]. In other words, the name and version of the cryptographic algorithm are specified elsewhere, in other parameters. In this particular PDF document, the ‘/CF’ dictionary contains an ‘/AESV2’ value, indicating that the Advanced Encryption Standard (AES) algorithm should be used.
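As a rough illustration of how a script might read these parameters, the sketch below pulls ‘/V’, ‘/R’, ‘/P’ and the crypt filter method (‘/CFM’) out of a raw encryption dictionary. It also converts ‘/P’ into the four bytes (low-order byte first) that the key derivation in [5] consumes. The dictionary bytes are a made-up example rather than the values from encrypted.pdf, the helper names are hypothetical, and regex matching stands in for proper PDF parsing.

```python
import re
import struct

# Crypt filter method names defined in the PDF specification.
CFM_ALGORITHMS = {"V2": "RC4", "AESV2": "AES-128 (CBC mode)"}

def parse_encrypt_params(obj_body: bytes) -> dict:
    """Extract integer /V, /R, /P and the /CFM name, where present."""
    params = {}
    for key in (b"V", b"R", b"P"):
        # Requiring whitespace after the key avoids matching names
        # that merely start with the same letter.
        m = re.search(rb"/" + key + rb"\s+(-?\d+)", obj_body)
        if m:
            params[key.decode()] = int(m.group(1))
    m = re.search(rb"/CFM\s*/(\w+)", obj_body)
    if m:
        params["CFM"] = m.group(1).decode()
    return params

def permissions_bytes(p: int) -> bytes:
    """Encode /P as a 32-bit unsigned value, low-order byte first."""
    return struct.pack("<I", p & 0xFFFFFFFF)

# Hypothetical encryption dictionary, for illustration only.
encrypt_dict = (
    b"<< /Filter /Standard /V 4 /R 4 /P -1340 "
    b"/CF << /StdCF << /CFM /AESV2 /AuthEvent /DocOpen >> >> "
    b"/StmF /StdCF /StrF /StdCF >>"
)

params = parse_encrypt_params(encrypt_dict)
algorithm = CFM_ALGORITHMS.get(params.get("CFM"), "unknown")
p_bytes = permissions_bytes(params["P"])
print(params, algorithm, p_bytes.hex())
```

The mask in permissions_bytes matters because ‘/P’ is usually written as a negative signed integer, while the key derivation expects its unsigned 32-bit representation.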