Mark Stephens has been working with Java and PDF since 1999 and has diversified into HTML5, SVG and JavaFX.

He also enjoys speaking at conferences and has been a Speaker at user groups, Business of Software, Seybold and JavaOne conferences. He has a very dry sense of humor and an MA in Medieval History for which he has not yet found a practical use.

Strange PDF file of the week

October 19, 2010

I continue to be impressed by the sheer variety of PDF files we come across, even after 10 years of writing a PDF parser.

Today we had a PDF file created with a tool called PdfGenLib. Here is a sample of what it produces:

000003 0 obj
<<
/Type /Outlines
/First 000004 0 R
/Last 000004 0 R
/Count 1
>>
endobj

000004 0 obj
<<
/Title (Documents)
/Parent 000003 0 R
>>
endobj

000005 0 obj

What is interesting is that it appears to be hard-coded to write every object number as a six-digit, zero-padded value (i.e. 000001 0 R rather than 1 0 R). The PDF reference does not say that you cannot do this, but it adds nothing to the file except extra bytes, so no other tool we have come across does it.
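One way a parser can cope with this is to read the digits of an object reference as an integer rather than comparing the raw text, so that 000004 0 R and 4 0 R end up as the same key. The following is a minimal Java sketch of that idea, not how our parser actually does it: the class and method names (ObjectRefParser, findRefs) are hypothetical, the regular expression is a simplification of real PDF tokenizing, and the record type needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified, regex-based scan for indirect references.
public class ObjectRefParser {

    // Matches "4 0 R" as well as "000004 0 R": \d+ accepts any run of digits,
    // including leading zeros.
    private static final Pattern REF = Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+R\\b");

    /** Object number / generation pair for an indirect reference. */
    public record Ref(int objectNumber, int generation) { }

    /** Returns every indirect reference found in a fragment of PDF text. */
    public static List<Ref> findRefs(String pdfText) {
        List<Ref> refs = new ArrayList<>();
        Matcher m = REF.matcher(pdfText);
        while (m.find()) {
            // Integer.parseInt("000004") returns 4, so padded and unpadded
            // forms normalize to the same object number.
            int objectNumber = Integer.parseInt(m.group(1));
            int generation = Integer.parseInt(m.group(2));
            refs.add(new Ref(objectNumber, generation));
        }
        return refs;
    }

    public static void main(String[] args) {
        String sample = "/First 000004 0 R /Last 000004 0 R /Parent 3 0 R";
        for (Ref r : findRefs(sample)) {
            System.out.println(r.objectNumber() + " " + r.generation() + " R");
        }
    }
}
```

Running main prints 4 0 R twice, then 3 0 R: the padded and unpadded references are indistinguishable once parsed, which is what any lookup table keyed on object number needs.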

It also meant that some clever code I wrote last week, to allow for the first object in the PDF being object 0 0 R rather than object 1 0 R, had to be modified. Still, it means I will never be short of work 😉

Have you had any similar experiences with ‘strange’ PDF files?

This post is part of our “Understanding the PDF File Format” series. In each article, we discuss a PDF feature, bug, gotcha or tip. If you wish to learn more about PDF, we have 13 years' worth of PDF knowledge and tips in our series index.
