I have a project where I would like to let users view a document (PDF) in their browser by clicking a link or a thumbnail of the file. They would also be able to download the file. I've just begun researching the vulnerabilities associated with PDF files, but here is what I have gathered so far:

PDFs can contain executable code (for example, embedded JavaScript) that could carry out exploits or help identify other exploitable weaknesses. PDFs are not unique in this regard.

To protect against malicious PDF actions, browsers now open PDF files in a sandbox, which can be more secure than a local viewer. Browsers that open PDF files in a sandbox:

Chrome: yes
Firefox: yes
Edge: ?
IE: ?
Safari: ?

I plan to store the PDF files on the file system outside the web root, with each file's metadata indexed in a database.
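A minimal sketch of how that layout might be served, in Python, assuming the database stores an application-generated file name rather than the uploader's original name (the STORAGE_ROOT path, the naming pattern and both helper names are hypothetical). The stored name is validated and resolved strictly inside the storage directory, and the response headers declare the content type explicitly so the browser does not have to guess it:

```python
import os
import re

# Hypothetical storage directory outside the web root; adjust to your layout.
STORAGE_ROOT = "/srv/app-data/pdfs"

# Assumption: stored names are generated by the application (e.g. a UUID plus
# ".pdf"), never taken verbatim from the uploader.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.pdf")


def resolve_stored_pdf(stored_name: str, storage_root: str = STORAGE_ROOT) -> str:
    """Map a name looked up in the database to a path inside storage_root."""
    if not _SAFE_NAME.fullmatch(stored_name):
        raise ValueError("unexpected stored file name")
    root = os.path.realpath(storage_root)
    path = os.path.realpath(os.path.join(root, stored_name))
    # Defence in depth: the resolved path must still live under the root
    # (this also catches symlinks that point elsewhere).
    if os.path.commonpath([root, path]) != root:
        raise ValueError("path escapes storage root")
    return path


def pdf_response_headers(display_name: str, download: bool = False) -> dict:
    """Headers for sending a stored PDF; display_name comes from the metadata DB."""
    # Keep only conservative characters so the name cannot break the header.
    safe = re.sub(r"[^A-Za-z0-9._ -]", "_", display_name) or "document.pdf"
    disposition = "attachment" if download else "inline"
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'{disposition}; filename="{safe}"',
        # Tell the browser not to second-guess the declared content type.
        "X-Content-Type-Options": "nosniff",
    }
```

With `inline`, the browser's built-in viewer displays the file; `attachment` asks it to download the file instead, which covers both the viewing and the downloading use cases.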

Communication will be over SSL.

Questions:

Does opening a PDF file in a browser sandbox eliminate the risks of displaying PDFs in the browser?

Are there file formats better suited than PDF (from a security standpoint) for the use case described?

What types of server-side validation could/should I perform when displaying a PDF document?

PDF problems are generally only an issue on old (long-unpatched) systems.
– dandavis, Feb 10 at 0:59

@dandavis Have you looked at the CVEs for PDF products from the last couple of years? They may tell a slightly different story.
– multithr3at3d, Feb 10 at 3:30

@multithr3at3d: I just reviewed all PDF mentions from 2019; they are all relatively minor problems like DoS or integer overflows, or they relate to third-party PDF tools, not official distros. It's not perfect, but it's no longer the wide-open backdoor it was years ago.
– dandavis, Feb 10 at 17:56

1 Answer

Assuming the browser makes things safe for you, you shouldn't have anything to do on your end (more or less), unless you are implementing a browser yourself.

One thing I've seen, though, is processors that remove scripts from such file formats (especially MS-Word/Excel files). Then you know that, at least, they won't execute anything on the client's machine. Of course, as a result, the file may not display as the author intended.

Other formats

On the Internet, I'd just use HTML. However, if your clients need to print your data, a PDF file is better, as it is more likely to come out of the printer as expected.

Otherwise, pretty much any file format can include a script. RTF may be more limited in that area, but I haven't checked that format in ages.

PDF validation

You should definitely make sure that the file really is a PDF file, and refuse it if it is not. For that test, use a mechanism similar to what the file command-line tool does, which inspects the file's contents rather than its name:

file <your-file>.pdf

Just checking that a file has a given extension (.pdf) would not help one bit.
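As a rough illustration of a content-based check, here is a Python sketch (the function name is hypothetical). It ignores the file name entirely and looks at the leading bytes, since PDF files begin with a `%PDF-` header followed by a version number. It only tells you that the file claims to be a PDF, not that it is well-formed or harmless:

```python
def looks_like_pdf(path: str) -> bool:
    """Content-based check, similar in spirit to `file`: the extension is ignored."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    # Expect "%PDF-" followed by a version such as "1.7" or "2.0".
    return head.startswith(b"%PDF-") and head[5:6].isdigit()
```

A file named `invoice.pdf` that actually contains a GIF fails this check, while a genuine PDF passes even if it was uploaded with a different extension.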

The best technique is for the server to read the entire file, for example by running it through a PDF parser that extracts the text into a .txt file. If that works, you can assume the file is a PDF and not some other random format.
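A full parse needs a real PDF library or tool (poppler's pdftotext, for instance, performs the kind of text extraction described here). As a cheaper, stdlib-only stand-in, the Python sketch below reads the whole file and checks for the header plus the `startxref` / `%%EOF` trailer markers that a complete PDF ends with, which catches truncated or partial uploads. The size limit and names are assumptions, and passing this check says nothing about whether the file is malicious:

```python
import os

# Hypothetical upper bound on accepted files; pick a value that fits your use case.
MAX_BYTES = 50 * 1024 * 1024


def passes_structure_check(path: str, max_bytes: int = MAX_BYTES) -> bool:
    """Cheap whole-file sanity check; NOT a substitute for a real PDF parser."""
    if os.path.getsize(path) > max_bytes:
        return False
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.startswith(b"%PDF-"):
        return False
    # The final trailer ("startxref", a byte offset, then "%%EOF") sits at the
    # end of the file (incremental updates append new ones), so only the tail
    # is inspected.
    tail = data[-1024:]
    eof = tail.rfind(b"%%EOF")
    startxref = tail.rfind(b"startxref")
    return eof != -1 and startxref != -1 and startxref < eof
```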

Running it through a PDF parser won't help: malicious PDF files are actual PDF files, so they will check out normally.
– Conor Mancone, Feb 10 at 2:05

@ConorMancone You're right. It will accept valid, albeit possibly malicious, PDF files. But it will at least weed out files that are not PDFs, or are only partial PDFs, which can also cause issues. It's a small bonus, I'd say. It won't help much with security, though, unless the validation can tell you whether the file contains scripts; then it becomes more powerful.
– Alexis Wilke, Feb 10 at 4:35
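To get some of that extra power, a server could at least look for name objects commonly associated with active content, in the spirit of Didier Stevens' pdfid tool. The Python sketch below is a heuristic under stated assumptions: the keyword list is illustrative rather than exhaustive, `#xx` escapes in names are decoded, but anything inside compressed streams (including object streams) is invisible to it, so an empty result does not prove the file is script-free:

```python
import re

# Illustrative (not exhaustive) list of names tied to scripts, automatic
# actions, launching programs, embedded files and forms.
RISKY_NAMES = {
    "JavaScript", "JS", "OpenAction", "AA", "Launch",
    "EmbeddedFile", "RichMedia", "XFA", "SubmitForm", "ImportData",
}

# A PDF name: "/" followed by characters that are neither whitespace nor delimiters.
_NAME = re.compile(rb"/([^\s/<>\[\]()%{}]+)")
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def _decode_name(raw: bytes) -> str:
    # Names may hide characters as "#xx" escapes, e.g. /J#61vaScript == /JavaScript.
    decoded = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def find_risky_names(data: bytes) -> set:
    """Return the risky names that appear in the raw (uncompressed) bytes."""
    found = (_decode_name(m.group(1)) for m in _NAME.finditer(data))
    return {name for name in found if name in RISKY_NAMES}
```

A non-empty result could be used to reject the upload or to route it through a script-stripping processor like the ones mentioned in the answer.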