January 13, 2010

This post is about hiding an evil PDF inside a saintly PDF. The objective is to embed a PDF into another PDF and make the reader parse the embedded one without user intervention. If we manage to do this, we’ll be able to ‘filter’ the embedded file and hide it behind some PDF encoding filters (FlateDecode, Crypt, etc.), making it invisible from the outside. And last, since we’ll be using miniPDF.py, we’ll pass everything through the (unfinished) obfuscated version of the miniPDF.py lib, here.

Hey! But, can we embed files into a PDF at all? Well as stated here …

PDF32000:2008::7.11.4 Embedded File Streams

If a PDF file contains file specifications that refer to an external file and the PDF file is archived or transmitted, some provision should be made to make sure that the external references will remain valid. One way to do this is to arrange for copies of the external files to accompany the PDF file. Embedded file streams (PDF 1.3) address this problem by allowing the contents of referenced files to be embedded directly within the body of the PDF file. This makes the PDF file a self-contained unit that can be stored or transmitted as a single entity. (The embedded files are included purely for convenience and need not be directly processed by any conforming reader.)

.. YES we can. There are probably other ways to embed files, as in the relatively new PDF ‘collection’ thing, but that’s another story.
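To make that concrete, here is a minimal sketch of what an embedded file stream object can look like on disk. It is standalone Python 3 and does not use miniPDF.py; the helper name `embedded_file_stream`, the object number 5 and the "AAAA" payload are placeholders chosen for illustration.

```python
import zlib


def embedded_file_stream(obj_number, payload):
    """Return the bytes of an indirect EmbeddedFile stream object.

    The payload is deflated, so the stream declares /Filter /FlateDecode
    and /Length is the size of the compressed data.
    """
    compressed = zlib.compress(payload)
    header = (
        "%d 0 obj\n"
        "<< /Type /EmbeddedFile /Filter /FlateDecode /Length %d >>\n"
        "stream\n" % (obj_number, len(compressed))
    ).encode("ascii")
    return header + compressed + b"\nendstream\nendobj\n"


# Hypothetical object number and placeholder payload.
obj = embedded_file_stream(5, b"AAAA")
```

A reader that honours the filter inflates the stream data to get the original file bytes back; anything that only looks at the raw file sees the compressed form.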

I) Embed a PDF into a PDF

OK, let’s start! The first thing we need is a clean PDF to hide. It needs to be one with a correct xref and a clean overall file structure. So, for a start we hide a good PDF; then we’ll see how to embed a bad one. There is a clean, minimalistic, text-displaying PDF generated in this post; the PDF is here.

Now we need to construct the host PDF. We are not really interested in putting anything in it, so let’s construct an empty PDF (mostly as done for the JS-to_PDF post, here).

As in the earlier post first we import the lib and create a PDFDoc object representing a document in memory …

from miniPDF import *
#The PDF document
doc= PDFDoc()

Prepare the Pages dictionary, which is in charge of linking to the pages..

(2) Now we’ll construct the FileSpec dictionary for it.

As stated in the rather confusing PDF32000:2008.1::7.11.3 (File Specification Dictionaries), a file specification dictionary for an embedded file needs to have these entries in it…

| Key | Type | Value |
|---|---|---|
| Type | name | The type of PDF object that this dictionary describes; shall be Filespec for a file specification dictionary. |
| F | string | A file specification string of the form described in PDF32000:2008.1::7.11.2, “File Specification Strings.” |
| EF | dictionary | A dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. The value of each such key shall be an embedded file stream (see 7.11.4, “Embedded File Streams”) containing the corresponding file. If this entry is present, the Type entry is required and the file specification dictionary shall be indirectly referenced. The F and UF entries should be used in place of the DOS, Mac, or Unix entries. |

So, my version of the FileSpec dictionary follows.

We need a dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. And then put that under the EF tag in the Filespec dictionary. Damn! This is confusing. Basically we need a dictionary that looks like this…

<< /F N 0 R >>

Where “N 0 R” refers to the EmbeddedFile stream object. Here you have the code..

# 'embedded' is the EmbeddedFile stream object that "N 0 R" points to
embeddedlst = PDFDict()
embeddedlst.add('F', PDFRef(embedded))

Let’s construct the actual Filespec dictionary. Note that I’ve hardcoded the name to ‘file.pdf’ and that this should be revisited if we are trying to embed more than one file.
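For reference, here is a rough sketch of how the finished Filespec dictionary could serialize, written as standalone Python 3 (3.7+, so dictionaries keep insertion order) rather than with miniPDF.py. The helpers `pdf_dict` and `pdf_ref` and the object number 5 are assumptions made for illustration, not part of the library.

```python
def pdf_dict(entries):
    """Serialize a mapping of key -> already-serialized PDF value."""
    body = " ".join("/%s %s" % (key, value) for key, value in entries.items())
    return "<< %s >>" % body


def pdf_ref(obj_number, generation=0):
    """Serialize an indirect reference such as '5 0 R'."""
    return "%d %d R" % (obj_number, generation)


# Hypothetical object number of the EmbeddedFile stream.
embedded_obj_number = 5

# The inner dictionary that goes under the EF key.
ef = pdf_dict({"F": pdf_ref(embedded_obj_number)})

# The Filespec dictionary itself, with the hardcoded name 'file.pdf'.
filespec = pdf_dict({"Type": "/Filespec", "F": "(file.pdf)", "EF": ef})
print(filespec)
```

The three keys are the ones from the table above: Type, F (the file name string) and EF (the dictionary pointing at the embedded stream).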

WE HAVE EMBEDDED A FILE!!!

The still incomplete PDF with an embedded file containing “AAAA” is demonstrated here, and it actually has something under the ‘paper clip’, check it out …

II) Jump to the embedded PDF with GoToE

Now that we have added an embedded PDF to a PDF, we’ll want to jump to it without user intervention and (why not) without JavaScript.

For this we’ll set up a GoToE action and link it to the OpenAction or some other trigger dictionary in the document.
An action dictionary defines the characteristics and behaviour of an action, and it is described in PDF32000:2008.1::12.6.2 (Action Dictionaries).

Embedded go-to actions give a complete facility for linking between a file in a hierarchy of nested embedded files and another file in the same or a different hierarchy. The GoToE action is described in PDF32000:2008.1::12.6.4.4 (Embedded Go-To Actions), but basically they have this look…
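As a rough illustration, going only by the entries the spec lists for this action (S, D, NewWindow, and a target dictionary T with R and N), such a dictionary could be serialized as below. This is standalone Python 3, not miniPDF.py code; the helper name `goto_embedded_action` and the `/D [0 /Fit]` destination are assumptions, and 'file.pdf' is the name hardcoded in the Filespec earlier.

```python
def goto_embedded_action(target_name, new_window=False):
    """Serialize a GoToE action that targets a child embedded file.

    /R /C says the target is a child of the current document and
    /N names it as it appears in the embedded files name tree.
    """
    target = "<< /R /C /N (%s) >>" % target_name
    return (
        "<< /Type /Action /S /GoToE /D [0 /Fit] /NewWindow %s /T %s >>"
        % ("true" if new_window else "false", target)
    )


action_text = goto_embedded_action("file.pdf")
print(action_text)
```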

Setting the NewWindow tag to True or False may change how the reader opens the hidden file. Funny things may happen when run from inside a browser (!).

OK, all we have left is linking this action to some trigger that wouldn’t draw the user’s attention.. well, we have OpenAction, but let’s try something a lil different now. Let’s put one of those AA trigger dictionaries on our single dummy page in the host PDF. That’s done with something like this…

# 'O' in a page's AA dictionary fires the action when the page is opened
page.add('AA', PDFDict({'O': PDFRef(action)}))

And finally render it out to stdout…

print(doc)  # :)

And since the script expects the PDF to hide as a command-line parameter, we can use it like this…
python embeddPDF.py evil.pdf > goodness.pdf

III) The virustotal.com test

It’s time for the virustotal.com test. I’ll try to hide the evilness of some PDF by embedding it into one of our host PDFs, as described previously, and see what happens.

I’m tired, so I’ll pick a not-so-evil PDF from my previous post. It is a small PDF with a JavaScript OpenAction featuring an obscene heap spray, usually easily detected by AVs. It gave this result on virustotal.com: a score of 14 out of 41.

Now let’s embed it with our embeddPDF.py… I got this PDF. When passed to virustotal.com, it got detected by 2 of 41 AVs. Here you have the result. Damn! 0 out of 41 seems to be hard to get. Let’s try again, but this time using the obfuscated miniPDF.py version piled on top of embeddPDF.py. I got this PDF. I passed it to virustotal… and got

-danger- !! 0/41 !! -danger-

No AV has detected it!!

I suppose there are 1mill ways to accomplish this but it still feels g00d! The results here.

Disabling JavaScript disables nothing but JavaScript. It will, though, block JavaScript bugs and also exploits of any kind that use JavaScript for heap spraying or some other memory massaging technique. But remember that really, really bad guys don’t use JavaScript for their exploits.
In this post I used JS just as a demo payload, not for the main functioning of the obfuscation.

Just gave this a try using the exploit PDF mentioned in the article, and VirusTotal is giving me 16/41 for the embedded file vs 23/41 for the plain document, so it looks like the process still has some merit but is now being detected by most of the major vendors.

Oba!!! That’s great news, right?!
But I still get 7/41 here -> http://bit.ly/9QD8hP. Those are in fact false positives, since the file doesn’t do any harm. I should have taken some stats on the AVs’ progress on this.

Hey feliam, forgive my ignorance if this is a stupid question, but did you ever actually get the javascript in the embedded pdf to execute? I’ve tried numerous times with just a simple js popup alert box, as well as a heap spray exploit with a calc spawn payload but had no luck. I tried it with pdfs I had created, as well as editing testx.pdf that you supplied (made sure to update the xref table as well) but still had no luck.

It is kinda pointless though – with the eyes of a former pro malware researcher – as AV engines will easily unpack all of your layers (embeddedfile, flatedecode). And ur 0/41 danger ratio is impressive, but did U check if it even worked? Same for 2/41, or your goal was just to put something-malicious-looking inside a PDF, and see how many scanners didnt notice it?

This is old now and I’m not sure if this particular technique still works. Probably adobe has stopped opening the embedded file from the OpenAction/GoToE action.
In any case I wouldn’t go as far as saying they will ‘easily unpack’ every layer. They have to put together a complete parser that may handle all corner cases. Consider combining static filter/encryption, pdf actions, javascript, flash, xfa, xslt and xpath together. AV-zilla alert!

3 years ago this worked like a charm. I could put whatever I wanted inside and no AV reported anything.
This was just a PoC showing that at that point bypassing AV was easy.
My hypothesis: at any given time there are 2347896238946 ways of doing this in pdf and no AV will save you from targeted attacks. Nah, I’m probably wrong.