I got a sample PDF from the Internet; its content is listed below.

%PDF-1.7
1 0 obj
<<
/Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<<
/Type /Outlines
/Count 0
>>
endobj
3 0 obj
<<
/Type /Pages
/Kids [4 0 R]
/Count 1
>>
endobj
4 0 obj
<<
/Type /Page%
/Parent 3 0 R
/MediaBox [0 0 612 792]
/Contents 5 0 R
/Resources
<< /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 48 >>
stream
BT
/F1 24 Tf
100 700 Td
(Hello World)Tj
ET
endstream
endobj
6 0 obj
[/PDF /Text]
endobj
7 0 obj
<<
/Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /MacRomanEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000381 00000 n
0000000485 00000 n
0000000518 00000 n
trailer
<<
/Size 8
/Root 1 0 R
>>
startxref
642
%%EOF

Adobe Reader DC opens the document without any problem, but when I close it without making any changes, it asks whether I want to save the document. Searching online, I learned that the source of the problem is the cross-reference table: the byte offsets of all the objects, as well as the offset of xref itself, are wrong. I used a hex editor to find the correct byte offsets and fixed them, and the problem went away.
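The manual hex-editor fix could also be scripted. The following is only a minimal sketch, not a general repair tool. It assumes a single-revision file with one classic xref table (no xref streams or incremental updates), LF or CRLF line endings, and no stream data that happens to contain a line starting with "N G obj". The function name rebuild_xref is made up for this illustration.

```python
import re

# "N G obj" at the start of a line marks the beginning of an indirect object.
OBJ_RE = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)
# The xref keyword on its own line ("startxref" does not match, it starts with "s").
XREF_RE = re.compile(rb"^xref\s", re.MULTILINE)


def rebuild_xref(pdf: bytes) -> bytes:
    """Return a copy of pdf with recomputed xref offsets and startxref value."""
    # Map object number -> (byte offset of "N G obj", generation number).
    found = {int(m.group(1)): (m.start(), int(m.group(2)))
             for m in OBJ_RE.finditer(pdf)}

    xref_start = XREF_RE.search(pdf).start()
    trailer_start = pdf.index(b"trailer", xref_start)
    startxref_start = pdf.rindex(b"startxref")

    size = max(found) + 1
    # Every entry is exactly 20 bytes: 10-digit offset, space, 5-digit
    # generation, space, "n" or "f", then a 2-byte end-of-line (space + LF).
    entries = ["0000000000 65535 f \n"]  # object 0 heads the free list
    for num in range(1, size):
        if num in found:
            offset, gen = found[num]
            entries.append(f"{offset:010d} {gen:05d} n \n")
        else:
            # Simplification: a strict writer would link missing numbers
            # into the free list instead of emitting a bare free entry.
            entries.append("0000000000 00000 f \n")
    table = f"xref\n0 {size}\n" + "".join(entries)

    # Everything before the table is untouched, so the offsets stay valid.
    return (pdf[:xref_start]
            + table.encode("ascii")
            + pdf[trailer_start:startxref_start]
            + f"startxref\n{xref_start}\n%%EOF\n".encode("ascii"))
```

To try it, read the file in binary mode, pass the bytes to rebuild_xref, and write the result to a new file rather than overwriting the original.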

But my question is this: if xref is used to locate the objects in the byte stream, the PDF above will certainly lead to the wrong places. It looks to me as though Adobe Reader does not rely on xref, since it opens the above PDF without any problem. But if that is the case, what is the use of xref?

For Adobe Reader to repair the wrong offsets, it must search for the objects directly in the document rather than rely on xref. So, back to my question: if everyone (including Adobe) has to search for objects directly, what is the use of xref?

The cross-reference table is used as an optimization to allow software to quickly locate important data objects, but it is possible to locate these objects just by scanning the contents of the file.
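The two ways of locating an object can be sketched roughly as follows. This is not how Adobe Reader is implemented, only an illustration of the idea: try the offset recorded in the xref table first, and fall back to scanning the file when that offset does not land on the expected "N G obj" header. The names read_xref and locate are hypothetical, and the sketch assumes a single classic xref table with one subsection.

```python
import re

ENTRY_RE = re.compile(rb"(\d{10}) (\d{5}) ([nf])")
HEADER_RE = re.compile(rb"xref\s+(\d+)\s+(\d+)\s+")


def read_xref(pdf: bytes) -> dict:
    """Follow startxref to the table; return {object number: byte offset}."""
    m = re.search(rb"startxref\s+(\d+)", pdf[-1024:])
    if m is None:
        return {}
    header = HEADER_RE.match(pdf, int(m.group(1)))
    if header is None:
        return {}  # startxref itself points to the wrong place
    first, count = int(header.group(1)), int(header.group(2))
    table = {}
    for i, entry in enumerate(ENTRY_RE.finditer(pdf, header.end())):
        if i >= count:
            break
        if entry.group(3) == b"n":  # "n" = in use, "f" = free
            table[first + i] = int(entry.group(1))
    return table


def locate(pdf: bytes, num: int, gen: int = 0):
    """Return (offset, how), where how is 'xref', 'scan' or 'missing'."""
    marker = b"%d %d obj" % (num, gen)
    offset = read_xref(pdf).get(num)
    # Fast path: a direct seek, valid only if the offset lands on the header.
    if offset is not None and pdf.startswith(marker, offset):
        return offset, "xref"
    # Slow path: scan the whole file for the object header.
    m = re.search(rb"^" + marker + rb"\b", pdf, re.MULTILINE)
    if m:
        return m.start(), "scan"
    return None, "missing"
```

With a correct table, locate answers from the fast path without reading the body of the file; with wrong offsets it still finds the object, but only after a scan whose cost grows with the size of the file.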

When the PDF format was introduced in the 1990s, machines were slow enough that this speed-up was necessary. With modern hardware, the optimization is not nearly as critical, except for extremely large PDFs, but the xref table must still be correct for a PDF to be syntactically valid.
