Series Article Index

Having worked as developers specializing in PDF and HTML5 solutions since 1999, the team at IDR Solutions has regularly blogged about its adventures over the years, in pursuit of being the first to make many features of its PDF and HTML5 solutions available on the Java platform.

With so much information available, it became clear that all the archived articles needed to be gathered in one location so you can easily use them for reference.

2 Replies to “Series Article Index”

I am working with USN ship deck logs provided by the National Archives. My objective is to extract the page images for image processing, in order to locate certain features on the page(s). My software is able to locate the image stream (stream . . . endstream) and its length in the PDF.

In early versions of the PDFs, the pages consisted of JPEG images embedded within the PDF. In the latest PDFs, the images are stored in what appears to be an LZ77 format. I can extract and inflate the image stream using zlib, but the resulting image looks like salt-and-pepper noise.

Are you aware of any other information defined in the PDF that might be needed for the inflate operation?

BTW, the image stream starts with 0x78 0x9C; that is, there is no header in the stream before those two bytes.
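A note on the question above: 0x78 0x9C is itself the standard two-byte zlib header (deflate, 32 KB window, default compression), so nothing is missing in front of it. A salt-and-pepper result is consistent with the inflated bytes being raw pixel samples, possibly still predictor-filtered, rather than a self-describing image file. In the PDF specification their layout is given by the image dictionary (/Width, /Height, /BitsPerComponent, /ColorSpace) and, if present, /DecodeParms (/Predictor, /Colors, /Columns, /BitsPerComponent). /Filter may also be an array, with FlateDecode followed by another filter. The following is a minimal Java sketch, assuming a FlateDecode stream with PNG predictors (/Predictor 10 to 15), that checks the zlib header, inflates with java.util.zip.Inflater and undoes the per-row PNG filters. The class and method names are hypothetical, and the demo data is made up.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateImageSketch {

    /** True if the first two bytes form a valid zlib header (e.g. 0x78 0x9C). */
    static boolean hasZlibHeader(byte[] data) {
        if (data.length < 2) return false;
        int cmf = data[0] & 0xFF, flg = data[1] & 0xFF;
        // Compression method 8 = deflate; the 16-bit header must be a multiple of 31.
        return (cmf & 0x0F) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    /** Inflates a zlib stream (header included), as FlateDecode requires. */
    static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater(); // default mode expects the zlib header
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break; // truncated
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    /** Undoes PNG predictors: every row starts with one filter-type byte. */
    static byte[] undoPngPredictor(byte[] in, int columns, int colors, int bpc) {
        int bpp = Math.max(1, (colors * bpc + 7) / 8);  // bytes per pixel, at least 1
        int rowLen = (columns * colors * bpc + 7) / 8;    // bytes per row, without filter byte
        int rows = in.length / (rowLen + 1);              // a trailing partial row is ignored
        byte[] out = new byte[rows * rowLen];
        byte[] prev = new byte[rowLen];                   // the row above row 0 is all zeros
        for (int r = 0; r < rows; r++) {
            int base = r * (rowLen + 1);
            int filter = in[base] & 0xFF;
            byte[] cur = new byte[rowLen];
            for (int i = 0; i < rowLen; i++) {
                int x = in[base + 1 + i] & 0xFF;
                int a = i >= bpp ? cur[i - bpp] & 0xFF : 0;  // left
                int b = prev[i] & 0xFF;                       // up
                int c = i >= bpp ? prev[i - bpp] & 0xFF : 0; // upper-left
                int pred;
                switch (filter) {
                    case 0: pred = 0; break;              // None
                    case 1: pred = a; break;              // Sub
                    case 2: pred = b; break;              // Up
                    case 3: pred = (a + b) / 2; break;    // Average
                    case 4: pred = paeth(a, b, c); break; // Paeth
                    default: throw new IllegalArgumentException("bad PNG filter " + filter);
                }
                cur[i] = (byte) (x + pred);
            }
            System.arraycopy(cur, 0, out, r * rowLen, rowLen);
            prev = cur;
        }
        return out;
    }

    static int paeth(int a, int b, int c) {
        int p = a + b - c;
        int pa = Math.abs(p - a), pb = Math.abs(p - b), pc = Math.abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        return pb <= pc ? b : c;
    }

    /** Demo-only helper: zlib-compresses bytes so the example is self-contained. */
    static byte[] zlibCompress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        // Hypothetical 2x2, 8-bit grayscale image; both rows use filter 2 (Up).
        byte[] filtered = {2, 10, 20, 2, 5, 5};
        byte[] stream = zlibCompress(filtered);
        System.out.println(hasZlibHeader(stream)); // prints true
        byte[] pixels = undoPngPredictor(inflate(stream), 2, 1, 8);
        System.out.println(Arrays.toString(pixels)); // prints [10, 20, 15, 25]
    }
}
```

With real files, columns, colors and bpc would come from /DecodeParms. The resulting samples would then be laid out as /Height rows of /Width pixels in the image's /ColorSpace. If /Predictor is absent or 1, the inflated bytes are already the samples.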

If it is a DCTDecode block, my guess is that it is not RGB. In that case you would need to post-process it. I would recommend using a tool like Photoshop or iText’s RUPS to drill down and see what is happening.
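One common way a DCTDecode image ends up "not RGB" is a four-component (CMYK) JPEG, which many viewers and decoders display with wrong colours. The following is a minimal, non-colour-managed Java sketch of the post-processing step. It assumes the JPEG has already been decoded into interleaved 8-bit C, M, Y, K samples. The `inverted` flag is an assumption for files that store inverted CMYK values, and the class and method names are hypothetical. A faithful conversion would use the embedded ICC profile instead.

```java
import java.util.StringJoiner;

public class CmykToRgbSketch {

    /**
     * Naive CMYK to RGB for interleaved 8-bit samples (C, M, Y, K per pixel).
     * No ICC profile is applied, so colours are only approximate.
     * If inverted is true, each sample is flipped (255 - v) before conversion.
     */
    static byte[] cmykToRgb(byte[] cmyk, boolean inverted) {
        int pixels = cmyk.length / 4;
        byte[] rgb = new byte[pixels * 3];
        for (int p = 0; p < pixels; p++) {
            int[] v = new int[4];
            for (int k = 0; k < 4; k++) {
                int s = cmyk[p * 4 + k] & 0xFF;
                v[k] = inverted ? 255 - s : s;
            }
            int white = 255 - v[3]; // lightness left after the black (K) ink
            for (int ch = 0; ch < 3; ch++) {
                // R, G, B are each reduced by their complementary ink (C, M, Y) and by K.
                rgb[p * 3 + ch] = (byte) ((255 - v[ch]) * white / 255);
            }
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Hypothetical pixels: no ink at all, then full cyan with no black.
        byte[] cmyk = {0, 0, 0, 0, (byte) 255, 0, 0, 0};
        StringJoiner sj = new StringJoiner(" ");
        for (byte b : cmykToRgb(cmyk, false)) sj.add(Integer.toString(b & 0xFF));
        System.out.println(sj); // prints 255 255 255 0 255 255
    }
}
```

If the colours still look like a photographic negative after this step, trying the opposite `inverted` setting is a quick check before inspecting the file in a tool such as RUPS.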