Where native and natural coincide

Hermitian and Applewhite’s JPEG

Hermitian: Talk about missing the workflow ! The Applewhite original was not a JPEG but rather a PDF. Remember the totally missing camera METADATA in the Applewhite PDF ? No !!! I didn’t think that you would remember. I don’t recall that you ever posted any camera METADATA from a JPEG that Applewhite created with his camera.

Actually, his original was a jpg which was turned into a PDF. The jpg can be extracted from the PDF, which I did, and I showed you the relevant metadata. Have you been asleep? As to me never posting the metadata, again you are wrong…

Hermitian: What !!! You can’t possibly mean that your freetoy JPEG extractor tool looked into some JPEG that you obtained from an unnamed source and it didn’t find any camera METADATA for Applewhite ? And then there was also no camera METADATA in Applewhite’s PDF image of the pale-Blue reporter’s handout copy.

It’s not the tool but what you do with it. I have come to realize that you have no idea how to extract data from a PDF, unless Illustrator supports it?

The metadata, unremarkably, matches metadata found in other AP photographs. Nothing too exciting here. If only Hermitian had taken the time and effort to do the logical thing…

Hermitian: Now let’s get down to reality on the work flow. So we already know that the Applewhite’s “PDF file “ap_obama_certificate_dm_110427.pdf” was created and produced using Adobe Acrobat 8.26.” Hence we know that this PDF is an Adobe PDF file. We also know that this file loads OK with Adobe Illustrator CS6 and Adobe Illustrator CC. Of course it also opens without error in Adobe XI Pro. And the same applies to every other program that I have that is compatible with Adobe PDF files.

It seems to me that Hermitian does not appreciate that a PDF contains bitmaps or JPEGs. In this case the JPEG is easily extracted. While he is struggling to get his expensive tools to do what I can do with freeware, I have shown how there is just no support for his claims. Once again our poor friend was embarrassed by his lack of investigative research.

That’s because Adobe Photoshop can only open and save Photoshop PDF files.

Again, wrong. Adobe Photoshop can open any PDF but has special capabilities if you save the PDF using Photoshop and remember to turn on the special setting

Hermitian: And common sense would dictate that the AP doesn’t ever furnish its original PDF files to any news outlet.

Yes, so why would you expect Applewhite to make a PDF for a small town newspaper? 🙂 Common sense indeed.

Hermitian still does not get it. The same jpeg that was used to create the AP PDF was used to create the Muscatine PDF. The same metadata… I cannot understand why he overlooked such a simple workflow.

You really should have paid more attention to what my findings were… You’re miles behind right now.

How embarrassing… Our friend does not even know how to extract a simple jpeg from a PDF.

Even though I have shown him several ways of doing so… Well, you can lead a horse to water, but you cannot make him drink.

Hermitian, again, thank you for your contributions as they place in nice contrast your musings versus my hypotheses.

I explained to Hermitian how to extract a jpeg from a PDF here and showed him the relevant metadata. He still appears to be utterly confused as to the workflow: Scott Applewhite shot his pictures and uploaded them to the AP office, from where they were distributed. In addition, someone took the photograph and embedded it into a PDF for distribution.
All the times line up quite nicely:

“Again, wrong. Adobe Photoshop can open any PDF but has special capabilities if you save the PDF using Photoshop and remember to turn on the special setting”

The special setting “Maintain Photoshop Editing Capabilities” has nothing to do with compatibility with other Adobe Products. Rather, if you turn on the special setting then the Photoshop PDF may not be compatible with earlier versions of Photoshop. When the setting is turned on, Photoshop warns the user that the PDF may not be compatible with older versions of Photoshop.

And besides, all your blathering doesn’t remove the fact that the Applewhite PDF of the pale-Blue background LFCOLB will not open in Photoshop CS6 or CC. End of story !

Rather, if you turn on the special setting then the Photoshop PDF may not be compatible with earlier versions of Photoshop. When the setting is turned on, Photoshop warns the user that the PDF may not be compatible with older versions of Photoshop.

Is that your best understanding? Let me guess, you do not often use Photoshop. But then again, neither do I, and I understand… Weird… This ain’t rocket science.

Customer: I’d like the BBQ spare ribs with a side of fries
Hermie: We don’t carry bleu cheese on our menu, sir
Customer: Uh, that’s okay, I just wanted ribs and fries
Hermie: I already told you, we don’t have bleu cheese, sir
Customer: I don’t want bleu cheese. I want to order the ribs!
Hermie: I’m sorry that we don’t have bleu cheese. I don’t control the menu.
Customer: Oh, whatever. I’ll have a burger and chips instead.
Hermie: We don’t have onion rings on our menu either.
Customer: AAAAAAAAAAAAAAAAAAAAAAAAAAAAARGH!

The special setting “Maintain Photoshop Editing Capabilities” has nothing to do with compatibility with other Adobe Products. Rather, if you turn on the special setting then the Photoshop PDF may not be compatible with earlier versions of Photoshop. When the setting is turned on, Photoshop warns the user that the PDF may not be compatible with older versions of Photoshop.

Interesting but no claim was made of compatibility.

And besides, all your blathering doesn’t remove the fact that the Applewhite PDF of the pale-Blue background LFCOLB will not open in Photoshop CS6 or CC. End of story !

Is there any reason it should? Why exactly do we care that a file created by ABC News won’t open in a program not used to create the file?

So then the AP caption writer back at the AP headquarters was the one who turned it into a PDF. Here’s the METADATA from the PDF.

<snip>

So he’s the one who ran the JPEG through Acrobat 8.26 to turn it into a PDF. And none of that JPEG METADATA that you keep bragging about leaked through to the PDF METADATA.

Hermie’s actually getting close to what happened! Still off a bit, though.

And then the AP sent the original JPEG rather than the PDF to the Muscatine Journal.

Yes! Exactly! Then the people at Muscatine Journal got rid of the blue and cropped it, and because they used Photoshop, the JPEG METADATA leaked through to the PDF METADATA, as you so quaintly put it.

And then the AP sent the PDF only to ABC along with the Green background WH LFCOLB.

Oh, so close! Not quite right, though. But I think you’ve got enough to get a passing grade for once. What actually happened is that the AP sent the JPEG to ABC. It was the guy at ABC, not the AP caption writer in DC, who ran the JPEG through Acrobat 8.26 to turn it into a PDF, in the process losing the JPEG METADATA. (ABC later added the WH LFBC downloaded from the White House site.)

Actually, one minor quibble remains. AP didn’t send the JPEG so much as ABC, the Muscatine Journal, and a host of other sites downloaded it from the AP site. But that’s just semantics.

Hermitian is often so close, but he cannot quite make the connection and thus jumps to conclusions not supported by the data, and when people try to help him, he gets all upset. Yes, being wrong is easy; admitting to it is much harder.

“It’s not the tool but what you do with it. I have come to realize that you have no idea how to extract data from a PDF, unless Illustrator supports it?”

“The metadata, unremarkably, matches metadata found in other AP photographs. Nothing too exciting here. If only Hermitian had taken the time and effort to do the logical thing…”

“Hermitian: Now let’s get down to reality on the work flow. So we already know that the Applewhite’s “PDF file “ap_obama_certificate_dm_110427.pdf” was created and produced using Adobe Acrobat 8.26.” Hence we know that this PDF is an Adobe PDF file. We also know that this file loads OK with Adobe Illustrator CS6 and Adobe Illustrator CC. Of course it also opens without error in Adobe XI Pro. And the same applies to every other program that I have that is compatible with Adobe PDF files.”

“It seems to me that Hermitian does not appreciate that a PDF contains bitmaps or JPEGs. In this case the JPEG is easily extracted. While he is struggling to get his expensive tools to do what I can do with freeware, I have shown how there is just no support for his claims. Once again our poor friend was embarrassed by his lack of investigative research.”

I have attempted to duplicate your JPEG extraction results using several different independent methods but all of the methods failed to produce your results. In reporting my results, I will do so only with a brief description. Where possible, I will avoid posting METADATA for obvious reasons.

As a preliminary check, I opened the file ap_obama_certificate_dm_110427.pdf in 010 Editor and searched for the label YcbCr within the text file. The search returned zero hits. I then switched to HEX mode and searched for the term 59 43 62 43 72. Again the search returned zero hits. I concluded that this PDF file did not contain your YcbCr label.

On 06/20/2013 I opened the file “ap_obama_certificate_dm_110427.pdf” in Adobe Illustrator CS6. Opening the links panel I found only one file, a single embedded file. The image icon for this one file indicated that it was a .PSD Photoshop file. The single link to this original file was broken. The image resolution was 200 PPI x 200 PPI. The pixel dimensions were W = 2698 pixels ; H = 3234 pixels. The page size of this image had been reduced by a scale factor of 36% for both x and y dimensions. The page size of the reduced image was W = 13.49 in. x 16.17 in. The image had been placed into the PDF with zero rotation.

Applying the new unembed command the file “AI_Image.psd” was produced within Adobe Illustrator CS6. Adobe Illustrator assigns the file name to the files that it creates. The unembedded file replaced the original file in the links list. The AI_Image.psd file is also automatically written to disk. Opening this Photoshop file in Photoshop CS6 I found an image of pixel dimensions W = 2700 pixels ; H = 3236 pixels. This converts to inch dimensions of W = 13.50 in. ; H = 16.180 in. The image resolution was again 200 PPI x 200 PPI. These page dimensions were slightly larger in the .PSD format. Opening this PSD file in 010 Editor, I searched for the label YCBCr in the text file and got zero hits. Changing the view to HEX, I also searched for the label 59 43 62 43 72 and again got zero hits. I concluded that this label was not in the embedded image file.

Comparing the METADATA for the files ap_obama_certificate_dm_110427.pdf and AI_Image.psd I found that the original METADATA from the PDF file had been replaced entirely. The METADATA in the unembedded file AI_Image.psd contained only METADATA written by Adobe Illustrator CS6 and Photoshop CS6.

I had originally applied the next method sometime after Jul 4, 2013 when I first posted on the NBC site. Unfortunately, I trash-canned those files so I repeated the analysis today.

Opening the file “ap_obama_certificate_dm_110427.pdf” in Adobe Illustrator CC I executed File/Save As/SVG. This operation creates a small file ap_obama_certificate_dm_110427.svg of file size 32 Kb and one external JPEG file “73B46875.jpg” of size 646 Kb. The JPEG file is assigned a random number name by Illustrator. The original unnamed image file in the links list was replaced by the file name “73B46875.jpg” with a link to the external file “73B46875.jpg” on disk.

Then opening the file ap_obama_certificate_dm_110427.svg in Illustrator CC the certificate image page size in pixels is W = 2700 pixels ; H = 3239 pixels. The inch dimensions of the page are W = 13.49 in. ; H = 16.1753 in. The single file 73B46875.jpg is a single linked file in the links panel. The pixel resolution is 200 PPI x 200 PPI. The image was placed in the PDF at a 100% scale. The image was not rotated.

I then opened 73B46875.jpg in 010 Editor. In the HEX mode the label “JFIF” was found in the first line. The word “Ducky” appeared in the second line. And the word “Adobe” was seen in the third line. This file has all the attributes of a JFIF formatted file.

I then searched this same file for the YcbCr text and the 59 43 62 43 72 HEX and the searches returned zero hits.

The METADATA from 73B46875.jpg contained none of the METADATA from ap_obama_certificate_dm_110427.pdf.

For the third and final method I extracted a .jpg file directly from the AP PDF file. This was done by means of my PDF code parser. This parser can extract the bitmaps in six different file formats, one of which is .jpg.

I named the extracted .jpg file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg”.
Opening this file in Photoshop CC I found a pale-Blue certificate image with page size in pixels W = 2698 pixels ; H = 3234 pixels. However, the page dimensions were W = 37.472 in. ; H = 44.917 in. The pixel resolution was 72 PPI x 72 PPI. I will be posting more comments about the larger page size and lower resolution later when I post the same results for the archive copy of birth-certificate-long-form.pdf.

I then opened this JPG file in 010 Editor and found the JFIF label in the first line in both the text and HEX modes. This JPG file has all the attributes of a JFIF formatted image file. I then searched for YcbCr in text mode and 59 43 62 43 72 in HEX mode and twice got zero hits.

As a double check, I applied a JPG binary template to the .jpg file and again searched for the same text and HEX terms. Both searches returned zero hits.

I then opened the file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg” in Photoshop CC to read the .jpg file info.

Again, the file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg” contained no METADATA from the file “ap_obama_certificate_dm_110427.pdf” that would reveal the original creation date, or the original PDF creator or PDF producer.

However, the same PDF code parser did report the original METADATA from the file “ap_obama_certificate_dm_110427.pdf” when the text of the AP PDF METADATA was written to the window of the parser.

As you recall, I had previously searched the Xerox 7535 file “wh-lfbc-scanned-xerox-7535-wc.pdf” for the text label YCbCr and its HEX equivalent 59 43 62 43 72. Both searches returned zero hits.

As a preliminary check, I opened the file ap_obama_certificate_dm_110427.pdf in 010 Editor and searched for the label YcbCr within the text file. The search returned zero hits. I then switched to HEX mode and searched for the term 59 43 62 43 72. Again the search returned zero hits. I concluded that this PDF file did not contain your YcbCr label.

Let me help you out here. If you check out the Object that encodes the JPEG, you will realize that Xerox uses ‘double compression’ and applied FlateDecode to the DCTDecode.

Therefore you need to properly extract the object.

Life is simpler with the Preview-created versions, since Apple is smart enough not to apply FlateDecode to DCTDecode-encoded objects.

So what you need to do is extract the binary code for the object which contains the jpeg. You can do this with a hex editor or by using qpdf. Once you extract the object, you need to undo the Flate compression (strictly speaking, ‘inflate’ rather than ‘deflate’ it), and the resulting object is a jpg.

Again, the tools are just part of the equation, you have to know how to apply them properly.
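
To see why a plain search of the PDF turns up nothing when FlateDecode sits on top of DCTDecode, here is a minimal Python sketch. The byte string is a hand-built stand-in for a JPEG (start marker, a padded application segment, a comment segment, end marker), not data taken from any of the files discussed, and the variable names are mine; only the standard `zlib` module is assumed.

```python
import zlib

# Hypothetical stand-in for an embedded JPEG: SOI, a padded APP15 segment,
# a COM segment carrying the text "YCbCr", then EOI. Not real scanner output.
padding = b"\xff\xef" + (2 + 512).to_bytes(2, "big") + bytes(512)
comment = b"\xff\xfe" + (2 + 5).to_bytes(2, "big") + b"YCbCr"
fake_jpeg = b"\xff\xd8" + padding + comment + b"\xff\xd9"

# What the PDF stores when the filter chain is FlateDecode over DCTDecode:
# the JPEG bytes wrapped in a zlib (Flate) stream.
stored_stream = zlib.compress(fake_jpeg)

# The hex search term from the thread, which spells b"YCbCr".
needle = bytes.fromhex("59 43 62 43 72")

found_in_raw = needle in stored_stream      # searching the bytes as stored
inflated = zlib.decompress(stored_stream)   # undo the Flate layer
found_after_inflate = needle in inflated    # searching the recovered JPEG

print("starts with JPEG SOI marker:", inflated[:2] == b"\xff\xd8")
print("found in stored stream:", found_in_raw)
print("found after inflating:", found_after_inflate)
```

With a real file you would first cut the stream bytes out of the object (hex editor or qpdf, as described above) and feed those to `zlib.decompress` in place of `stored_stream`.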

As a preliminary check, I opened the file ap_obama_certificate_dm_110427.pdf in 010 Editor and searched for the label YcbCr within the text file. The search returned zero hits. I then switched to HEX mode and searched for the term 59 43 62 43 72. Again the search returned zero hits. I concluded that this PDF file did not contain your YcbCr label.

As expected, for two reasons: 1) the ABC LFBC PDF was not created by a Xerox scan, and 2) the comment should only be found in the extracted JPEG, not the PDF.

On 06/20/2013 I opened the file “ap_obama_certificate_dm_110427.pdf” in Adobe Illustrator CS6. Opening the links panel I found only one file, a single embedded file. The image icon for this one file indicated that it was a .PSD Photoshop file. The single link to this original file was broken. The image resolution was 200 PPI x 200 PPI. The pixel dimensions were W = 2698 pixels ; H = 3234 pixels. The page size of this image had been reduced by a scale factor of 36% for both x and y dimensions. The page size of the reduced image was W = 13.49 in. x 16.17 in. The image had been placed into the PDF with zero rotation.

Sounds about right. Take an image in the standard 72 ppi resolution (one pixel per point) and scale it by 36% and you get a 200 ppi image. And there’s no particular reason for there to be a rotation, since this wasn’t scanned on a Xerox WorkCentre.
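
The arithmetic can be checked in a few lines of Python. This is only a sketch using the pixel counts and the 36% scale factor quoted above, with 72 ppi assumed as the unscaled default; `Fraction` keeps the division exact, and the variable names are mine.

```python
from fractions import Fraction

width_px, height_px = 2698, 3234   # pixel dimensions reported for the image
native_ppi = 72                    # assumed default: one pixel per PDF point
scale = Fraction(36, 100)          # placement scale reported by Illustrator

# Shrinking the placed image to 36% packs the same pixels into less space.
effective_ppi = native_ppi / scale

placed_w_in = width_px / effective_ppi
placed_h_in = height_px / effective_ppi
unscaled_w_in = Fraction(width_px, native_ppi)
unscaled_h_in = Fraction(height_px, native_ppi)

print(f"effective resolution: {float(effective_ppi):.0f} ppi")
print(f"placed size: {float(placed_w_in):.2f} x {float(placed_h_in):.2f} in")
print(f"unscaled size: {float(unscaled_w_in):.3f} x {float(unscaled_h_in):.3f} in")
```

The placed size comes out at the 13.49 in. x 16.17 in. reported from Illustrator, and the unscaled size at the 37.472 in. x 44.917 in. that Photoshop shows for the 72 PPI extraction, so both sets of numbers describe the same 2698 x 3234 pixel image.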

Applying the new unembed command the file “AI_Image.psd” was produced within Adobe Illustrator CS6. Adobe Illustrator assigns the file name to the files that it creates. The unembedded file replaced the original file in the links list. The AI_Image.psd file is also automatically written to disk. Opening this Photoshop file in Photoshop CS6 I found an image of pixel dimensions W = 2700 pixels ; H = 3236 pixels. This converts to inch dimensions of W = 13.50 in. ; H = 16.180 in. The image resolution was again 200 PPI x 200 PPI. These page dimensions were slightly larger in the .PSD format. Opening this PSD file in 010 Editor, I searched for the label YCBCr in the text file and got zero hits. Changing the view to HEX, I also searched for the label 59 43 62 43 72 and again got zero hits. I concluded that this label was not in the embedded image file.

Interesting that it added a one-pixel border. The lack of a YCbCr comment is expected, since this file was not generated by a Xerox WorkCentre.

Comparing the METADATA for the files ap_obama_certificate_dm_110427.pdf and AI_Image.psd I found that the original METADATA from the PDF file had been replaced entirely. The METADATA in the unembedded file AI_Image.psd contained only METADATA written by Adobe Illustrator CS6 and Photoshop CS6.

This is unsurprising, since the unembedding process you used appears to alter the embedded file (as evidenced by the different canvas size)

I had originally applied the next method sometime after Jul 4, 2013 when I first posted on the NBC site. Unfortunately, I trash-canned those files so I repeated the analysis today.

Opening the file “ap_obama_certificate_dm_110427.pdf” in Adobe Illustrator CC I executed File/Save As/SVG. This operation creates a small file ap_obama_certificate_dm_110427.svg of file size 32 Kb and one external JPEG file “73B46875.jpg” of size 646 Kb. The JPEG file is assigned a random number name by Illustrator. The original unnamed image file in the links list was replaced by the file name “73B46875.jpg” with a link to the external file “73B46875.jpg” on disk.

Then opening the file ap_obama_certificate_dm_110427.svg in Illustrator CC the certificate image page size in pixels is W = 2700 pixels ; H = 3239 pixels. The inch dimensions of the page are W = 13.49 in. ; H = 16.1753 in. The single file 73B46875.jpg is a single linked file in the links panel. The pixel resolution is 200 PPI x 200 PPI. The image was placed in the PDF at a 100% scale. The image was not rotated.

Interesting. Once again, this changed the canvas size of the file, so we know it is not identical to the original.

I then opened 73B46875.jpg in 010 Editor. In the HEX mode the label “JFIF” was found in the first line. The word “Ducky” appeared in the second line. And the word “Adobe” was seen in the third line. This file has all the attributes of a JFIF formatted file.

Well, it’s good to see that Illustrator can save properly formatted JFIF/JPEG files.
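
For anyone wondering where those labels live, here is a rough Python sketch that walks the header segments of a JPEG and prints each marker with the first bytes of its payload. The sample bytes are built by hand for illustration (APP0, APP12 and APP14 are the markers where JFIF, Ducky and Adobe tags are conventionally written, and the payload lengths are placeholders), so this is not a dump of 73B46875.jpg, and the function names are mine.

```python
import struct

def list_segments(jpeg):
    """Return (marker, first five payload bytes) for each header segment."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG")
    segments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):   # EOI or start of scan: headers are over
            break
        # The two-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        segments.append((marker, payload[:5]))
        pos += 2 + length
    return segments

def segment(marker, payload):
    """Build one marker segment (helper for the made-up sample below)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

# Hypothetical header layout, built by hand for illustration only.
sample = (b"\xff\xd8"
          + segment(0xE0, b"JFIF\x00" + bytes(9))   # APP0
          + segment(0xEC, b"Ducky" + bytes(4))      # APP12
          + segment(0xEE, b"Adobe" + bytes(7))      # APP14
          + b"\xff\xd9")

for marker, ident in list_segments(sample):
    print(f"FF{marker:02X}", ident)
```

A comment segment would show up in the same walk under marker FFFE, which is why a comment can only be looked for once the JPEG itself has been pulled out intact.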

I then searched this same file for the YcbCr text and the 59 43 62 43 72 HEX and the searches returned zero hits.

Again, as expected, since 1) this is not from a file created by a Xerox WorkCentre, and 2) the file has been altered by Illustrator.

The METADATA from 73B46875.jpg contained none of the METADATA from ap_obama_certificate_dm_110427.pdf.

Not surprising.

For the third and final method I extracted a .jpg file directly from the AP PDF file. This was done by means of my PDF code parser. This parser can extract the bitmaps in six different file formats, one of which is .jpg.

I named the extracted .jpg file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg”.
Opening this file in Photoshop CC I found a pale-Blue certificate image with page size in pixels W = 2698 pixels ; H = 3234 pixels. However, the page dimensions were W = 37.472 in. ; H = 44.917 in. The pixel resolution was 72 PPI x 72 PPI. I will be posting more comments about the larger page size and lower resolution later when I post the same results for the archive copy of birth-certificate-long-form.pdf.

Sounds right. Scale a 72 ppi image by 36% and you get a 200 ppi image, and the page sizes also match up to the 36% scaling factor (and that statement was redundant, by the way).

I then opened this JPG file in 010 Editor and found the JFIF label in the first line in both the text and HEX modes. This JPG file has all the attributes of a JFIF formatted image file. I then searched for YcbCr in text mode and 59 43 62 43 72 in HEX mode and twice got zero hits.

As expected, since this file was not created by a Xerox WorkCentre.

As a double check, I applied a JPG binary template to the .jpg file and again searched for the same text and HEX terms. Both searches returned zero hits.

As expected, since this file was not created by a Xerox WorkCentre.

I then opened the file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg” in Photoshop CC to read the .jpg file info.

Again, the file “ap_obama_certificate_dm_110427 parser Extracted JPEG.jpg” contained no METADATA from the file “ap_obama_certificate_dm_110427.pdf” that would reveal the original creation date, or the original PDF creator or PDF producer.

Well, the pdf was created from the jpeg, and the METADATA was only for the pdf, so not surprising that they don’t match.

However, the same PDF code parser did report the original METADATA from the file “ap_obama_certificate_dm_110427.pdf” when the text of the AP PDF METADATA was written to the window of the parser.

That’s an interesting trick, but I’m not sure it has any relevance.

As you recall, I had previously searched the Xerox 7535 file “wh-lfbc-scanned-xerox-7535-wc.pdf” for the text label YCbCr and its HEX equivalent 59 43 62 43 72. Both searches returned zero hits.

As expected, because the YCbCr comment is found in the embedded JPEG, not the PDF itself. Try extracting the JPEG using your parser and then searching the extracted JPEG for the comment.
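
The extract-then-search step can be sketched in Python as follows. This is a deliberately naive scan, assuming flat stream dictionaries and a line break before `endstream`; real PDFs can use nested dictionaries and object streams, for which a proper tool such as qpdf is the safer route. The two sample objects are made up for the demonstration, and the function name `embedded_jpegs` is mine.

```python
import re
import zlib

# Naive pattern: a flat (non-nested) stream dictionary, then the stream data.
STREAM_RE = re.compile(
    rb"<<((?:(?!<<|>>).)*)>>\s*stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

def embedded_jpegs(pdf_bytes):
    """Return the JPEG bytes of every DCTDecode stream the pattern finds."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, data = match.groups()
        if b"/DCTDecode" not in dictionary:
            continue
        if b"/FlateDecode" in dictionary:
            # Flate layered over DCT: inflate first to get the JPEG back.
            data = zlib.decompressobj().decompress(data)
        found.append(data)
    return found

# Hypothetical miniature "JPEG": SOI, a COM segment reading YCbCr, EOI.
jpeg = b"\xff\xd8" + b"\xff\xfe\x00\x07YCbCr" + b"\xff\xd9"

# Two made-up image objects: one DCT only, one Flate over DCT.
plain = (b"1 0 obj\n<< /Subtype /Image /Filter /DCTDecode >>\nstream\n"
         + jpeg + b"\nendstream\nendobj\n")
double = (b"2 0 obj\n<< /Subtype /Image /Filter [/FlateDecode /DCTDecode] >>\n"
          b"stream\n" + zlib.compress(jpeg) + b"\nendstream\nendobj\n")

for extracted in embedded_jpegs(plain + double):
    print(extracted[:2] == b"\xff\xd8", b"YCbCr" in extracted)
```

Either way the search is run on the recovered JPEG bytes, not on the PDF as stored; the only difference between the two cases is whether an inflate step comes first.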

“Why do you expect a Xerox specific JPEG comment to be found in the AP pdf?…”

1. The purported Xerox specific JPEG comment YCbCr is found in the WH LFCOLB.

2. One doesn’t have to deflate or extract anything to find it there.

3. The YCbCr is non-functional in the WH LFCOLB PDF.

4. You claim that your Xerox forger put the YCbCr label there.

5. But the label YCbCr is not found anywhere in the file wh-lfbc-scanned-xerox-7535-wc

6. Both files are PDF files.

7. I have successfully extracted the bitmap from “ap_obama_certificate_dm_110427.pdf” three different ways. One method produced a .PSD file; the other two produced JFIF-formatted files.

8. There is absolutely nothing wrong with my extracted bitmaps.

9. The label YCbCr is not unique to Xerox in any way.

So you are claiming that the Xerox in the WH was used to produce the WH LFCOLB PDF but was not used to produce the file “ap_obama_certificate_dm_110427.pdf” or to make the Xerox copies of the Obama certified LFCOLB for the reporters handout package ?

You have failed to show this. I have shown that Xerox creates these comments, and thus they corroborate my workflow.

Hermitian: So you are claiming that the Xerox in the WH was used to produce the WH LFCOLB PDF but was not used to produce the file “ap_obama_certificate_dm_110427.pdf” or to make the Xerox copies of the Obama certified LFCOLB for the reporters handout package ?

You realize that embedded JPEG comments do not transfer to paper copies? But yes, the Xerox was not used to create the AP PDF.

Thanks for proving my points. Very helpful. Together we may figure out the truth🙂

I have provided you, once again, with the simple steps to recreate the embedded JPEG. Any luck with that? Need help installing python?

1. The purported Xerox specific JPEG comment YCbCr is found in the WH LFCOLB.

Glad to see you finally acknowledge that fact.

2. One doesn’t have to deflate or extract anything to find it there.

Wrong! You have to extract the JPEG from the PDF (but IIRC, due to Preview, you don’t have to deflate)

3. The YCbCr is non-functional in the WH LFCOLB PDF.

Agreed.

4. You claim that your Xerox forger put the YCbCr label there.

The only JPEGs found (in a survey spanning several thousand) with that particular comment were in documents known or believed to have been produced by Xerox WorkCentre machines.

5. But the label YCbCr is not found anywhere in the file wh-lfbc-scanned-xerox-7535-wc

Like with the WH LFBC PDF, you have to extract the JPEG first. You also have to DeFlate it.

6. Both files are PDF files.

7. I have successfully extracted the bitmap from “ap_obama_certificate_dm_110427.pdf” three different ways. One method produced a .PSD file; the other two produced JFIF-formatted files.

True (but as I noted, two of those ways produced files clearly modified from the original embedded file)

8. There is absolutely nothing wrong with my extracted bitmaps.

Other than two of them are modified from the original.

9. The label YCbCr is not unique to Xerox in any way.

Except that no-one has been able to find JPEGs from a non-Xerox source with that comment, while it is present in all the Xerox WorkCentre JPEGs.

So you are claiming that the Xerox in the WH was used to produce the WH LFCOLB PDF but was not used to produce the file “ap_obama_certificate_dm_110427.pdf” or to make the Xerox copies of the Obama certified LFCOLB for the reporters handout package ?

We’ve been saying that for months! Pay attention, for fuck’s sake. Aside from the photocopying/printing the packets, the Xerox WorkCentre had no part in creating the various files descended from the original Applewhite file. Taking a digital photo of a photocopy created by a Xerox WorkCentre will not insert METADATA or internal comments from the Xerox into the picture.