Where native and natural coincide

MS – Orly v Democrat – Henry Blake Affidavit – Part 2

Henry Blake, in his affidavit, observes what he believes to be artifacts in a document. I believe the data point to a workflow in which PFU ScanSnap Manager software captured the scan from a Fujitsu ScanSnap scanner, and the scan was subsequently imported into the PDF using Adobe Acrobat 9.51 Paper Capture. Nothing suggests anything nefarious, even though the presence of the invisible objects has not been fully explained. They neither add to nor detract from the facts. Let’s start with the most useful document, which was submitted as an attachment to Document 35 filed in the case:

The Catalog object (Obj 1) shows that it contains /Outlines, an /AcroForm and 4 /Pages

A PDF document may optionally display a document outline on the screen, allowing the user to navigate interactively from one part of the document to another. The outline consists of a tree-structured hierarchy of outline items (sometimes called bookmarks), which serve as a visual table of contents to display the document’s structure to the user. The user can interactively open and close individual items.

Outlines are used to group sets of pages in a “Table of Contents”. Two such outlines are identified: one marked “FUDDY LETTER 5-26-12”, which contains 3 pages, and one marked “Fuddy Letter Attachment”, which contains 1 page (the birth certificate).
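As a rough illustration of what "the Catalog contains /Outlines, an /AcroForm and /Pages" means at the byte level, here is a minimal Python sketch. The dictionary text and the object numbers in it are made up for the example; they are not copied from 35-1.pdf.

```python
import re

# Hypothetical catalog dictionary. The object numbers (2, 30, 40) are
# invented for illustration and are NOT taken from the court document.
CATALOG = b"<< /Type /Catalog /Pages 2 0 R /Outlines 30 0 R /AcroForm 40 0 R >>"

def catalog_refs(raw: bytes) -> dict:
    """Map each /Key that holds an indirect reference to its object number."""
    pairs = re.findall(rb"/(\w+)\s+(\d+)\s+\d+\s+R", raw)
    return {key.decode(): int(num) for key, num in pairs}

refs = catalog_refs(CATALOG)
print(sorted(refs))
```

Each entry points at another object in the file, which is why the outline tree and the page tree can be followed starting from Obj 1.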

There are 7 BI/EI inline image objects with varying colors, each a single byte with ImageMask turned off. The cm operator scales and translates each image to form a rectangle.
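To see how a cm matrix turns a tiny image into a rectangle, here is a small Python sketch. A PDF image is painted into the unit square, and the six cm operands map that square onto the page. The operand values below are hypothetical, not the ones found in the file.

```python
def apply_cm(matrix, point):
    """Apply a PDF matrix [a b c d e f] to a point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)

def image_rectangle(matrix):
    """Corners of the unit square after the cm transform."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [apply_cm(matrix, p) for p in corners]

# Hypothetical operands for "120 0 0 14 72 700 cm":
# scale to 120 x 14 units, then translate to (72, 700).
rect = image_rectangle((120, 0, 0, 14, 72, 700))
print(rect)
```

With a matrix of this shape, the first and fourth operands give the width and height and the last two give the lower-left corner, so even a one-pixel image ends up covering a full rectangle.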

There are also 14 line-draw objects (invisible lines). In addition, there is evidence of incomplete OCR. Nothing really suggests any nefarious purpose or action, only the scanning of the document and its import into a PDF using the Paper Capture plug-in.

First the JPEG background is rendered, which was scanned. The color change in the text label shows how the color was poorly captured by the scanner.

Subsequently, several text blocks are written to the PDF. They are written in an invisible color, which allows them to be selected and copied. Much of the text is marked as ‘suspect’, indicating OCR errors.
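One common way for a PDF to carry selectable but unseen OCR text is text rendering mode 3 (the "3 Tr" operator, which neither fills nor strokes the glyphs). Whether this particular file uses that exact mechanism is an assumption on my part; the sketch below only shows how one could check a content stream for it. The sample stream is invented.

```python
import re

# Hypothetical content stream fragment, not extracted from 35-1.pdf.
STREAM = b"BT /F1 10 Tf 3 Tr 72 700 Td (DEPARTMENT) Tj ET"

def text_render_modes(stream: bytes) -> list:
    """Return every text rendering mode set with the Tr operator."""
    return [int(m) for m in re.findall(rb"(\d)\s+Tr\b", stream)]

def has_invisible_text(stream: bytes) -> bool:
    """Mode 3 means the text is neither filled nor stroked."""
    return 3 in text_render_modes(stream)

print(has_invisible_text(STREAM))
```

This is a naive pattern match; a real parser would tokenize the stream so that a "3 Tr" inside a string literal is not counted.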

Thus, more words were selectable than are flagged as suspects. Why would anyone apply OCR to a document page, find that most of the text was not found and converted, and then only correct a few words that were marked as suspects but not all of them that were flagged? And then why would that person also not enter the remaining text which was not found and converted by OCR?

In other words, Hermitian, you have a few questions but nothing that really shows any fraudulent action.
Just an import using the Adobe Paper Capture plug-in, which happens to also do OCR, which of course is very useful for the first three pages.

Yes, this is necessarily true, except in the case where the OCR software tags all the text it found as suspect. You certainly won’t find fewer selectable words than words flagged as suspect.

Why would anyone apply OCR to a document page, find that most of the text was not found and converted, and then only correct a few words that were marked as suspects but not all of them that were flagged?

What are you babbling about? I don’t see any evidence that anyone corrected words marked as suspect. You do realize that if the OCR software finds a close enough match to both the individual letters and the internal dictionary, it makes the word selectable without marking it as suspect, right? Oh yeah, that would require you to actually understand something about which you are opining.

And then why would that person also not enter the remaining text which was not found and converted by OCR?

How about a simpler workflow. Secretary scans the printout using a scanner with OCR turned on by default. Secretary doesn’t look to see if OCR was accurate (and may not even be aware it was on). Secretary uploads pdf to court.

Actually, NBC is probably right (I cross-posted with him). Run it through Adobe Paper Capture to OCR the first three pages. Who cares if the BC is OCR’d properly? It’s the text of the accompanying letter that needs to be right, if they even bothered to examine that.

Examination of the internal code structure of court document 35-1.pdf yielded two important findings regarding the LFCOLB PDF image on page 4.

1. Firstly, the 7 hidden Black rectangles were identified as “image masks” with “overprint off”. These Black rectangles are also “inline” bitmap images.

2. Secondly, the 14 hidden line objects were identified as “vector objects (stroked)” with “overprint off”.

Each of these 21 added objects occupies a single layer in the layer stack. The flattened bitmap image (of the page 2 LFCOLB PDF image from document 15-1.pdf) occupies the bottommost layer. Consequently, each of the 21 added objects is above the flattened LFCOLB layer in the stack.

Then, typically, the Black rectangles would overprint the Green LFCOLB image but in this case the Black rectangles are set to “knockout”. Normally a bitmap image using Black should be set to overprint.

These line and rectangle objects would not have been added by a normal scan of a paper document in a typical office environment. Instead the objects would have been added by a human using a vector graphics program. Typically, these types of objects are added when text or other graphic elements are manually placed on the page. The added lines and Black rectangles are utilized as an aid to the placement and alignment of other objects.

You may have overlapping colored objects in your PDF document, for example text or an image on a colored background. If so, you can specify what should happen with these colors when they are printed:

Knockout, meaning that the colors of the object in the foreground cut out the area underneath. In other words, the background color is erased and the resulting color will be the foreground color.
Overprint, meaning that the colors of the object are printed on top of the back­ground colors. The resulting color is a combination of the foreground and the background color.
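The difference between the two settings can be sketched with a deliberately simplified CMYK model in Python. The rule used for overprint here (a zero ink channel in the foreground leaves the background ink untouched) is a simplification chosen for illustration, and the color values are hypothetical.

```python
def knockout(fg, bg):
    """Foreground replaces whatever was underneath."""
    return fg

def overprint(fg, bg):
    """Simplified model: a foreground channel with no ink keeps the
    background ink; any non-zero foreground channel replaces it."""
    return tuple(f if f > 0 else b for f, b in zip(fg, bg))

# Hypothetical CMYK values (cyan, magenta, yellow, black), range 0..1.
GREEN = (0.6, 0.0, 0.8, 0.0)
BLACK = (0.0, 0.0, 0.0, 1.0)

print(knockout(BLACK, GREEN))   # only the black ink remains
print(overprint(BLACK, GREEN))  # black ink laid over the green inks
```

In this model, overprinted Black keeps full black ink on top of the green inks, so it still prints black, while any lighter foreground color would visibly mix with the background.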

Overprint actually blends the colors – it is a transparency method. It is not what you normally do with solid color bitmap images. Certainly not with scanned images. With scanned images, you would expect knockout to be used. I doubt anything but the highest-end scanners would use overprint.

Frankly, if I saw a purportedly scanned pdf with “overprint on” I would strongly suspect that the image had been altered.

A caveat: if you overprint with Black, it will still come out black, and this is sometimes preferred for printing to avoid misregistration (where the inkjet colors don’t quite align, thus leaving weird colors at the edges of objects). This is true only for true Black; anything else will blend colors.

Assuming that the Obama LFCOLB PDF image on page 4 of the court document 35-1.pdf was created by means of a human operator scanning a printout of page 2 of court document 15-1.pdf in a Fujitsu ScanSnap #S1500 scanner into Acrobat 9 (with OCR turned on) then OCR would assign each word to one of the three following categories:

1. Those words which are deciphered and made selectable

2. Those words which are deciphered but are flagged as suspect for errors – these words are also made selectable

3. Those words which are not deciphered – these words are not made selectable
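The three categories can be written down as a small Python data model, which also makes one consequence explicit: every suspect word is selectable, so the selectable count can never be smaller than the suspect count. The word-to-category assignments below follow the examples given for page 4 in this thread; the undeciphered entry is a placeholder.

```python
from enum import Enum

class OcrStatus(Enum):
    CLEAN = "deciphered, selectable"
    SUSPECT = "deciphered, flagged as suspect, selectable"
    UNDECIPHERED = "not deciphered, not selectable"

def is_selectable(status: OcrStatus) -> bool:
    return status is not OcrStatus.UNDECIPHERED

# Illustrative sample only; "<small form text>" is a placeholder.
words = {
    "OF": OcrStatus.CLEAN,
    "61": OcrStatus.CLEAN,
    "II": OcrStatus.CLEAN,
    "Case": OcrStatus.SUSPECT,
    "Page": OcrStatus.SUSPECT,
    "<small form text>": OcrStatus.UNDECIPHERED,
}

selectable = [w for w, s in words.items() if is_selectable(s)]
suspect = [w for w, s in words.items() if s is OcrStatus.SUSPECT]
print(len(selectable), len(suspect))
```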

The Obot claim is that this assumed work flow produced the LFCOLB image which comprises page 4 of court document 35-1.pdf. However, much of the text appearing on page 4 of court document 35-1.pdf was not deciphered and thus fell into category 3.

Of the certificate words on the page 4 LFCOLB that were deciphered, most were marked as suspect. Those words (or characters) which were made selectable but were not flagged as suspect include “OF”, “61” (in the certificate number) and the typed Roman numeral “II”. The words (or numbers) “Case”, “Filed 05/04/12”, “Page”, which are part of the original case label (i.e. the Green label), were also marked as suspect. However all of the text of both case labels was deciphered and made selectable.

A significant finding of the inspection (of page 4 of document 35-1.pdf) within Adobe Acrobat XI Pro was that none of the form text was deciphered by the purported OCR except for the words “STATE”, “HAWAII”, “CERTIFICATE OF LIVE BIRTH”, and “DEPARTMENT OF HEALTH”. The deciphered words are in the largest font printed on the certificate form. None of the smaller text printed on the form was deciphered and made selectable.

These results are atypical because the OCR algorithms included with the various versions of Adobe Acrobat typically detect more words than not – as do most of the popular OCR programs. Two popular programs are ABBY PDF Transformer Pro 3.0, and PDF-XChange Viewer Pro version 2.5.

For reference, I applied the ABBY PDF Transformer 3.0 program to the original WH LFCOLB PDF image. This PDF utility does both OCR and MRC. I turned the MRC off and scanned for OCR only. The ABBY OCR algorithm deciphered all of the typed text except for the word “Male”. The OCR scan also failed to decipher the form text “Sex”, “6a.”, “6c.”, “8.”, “20.”, “Other”, and in box 22 “Date Accepted by Reg.”, and the date stamp “AUG -8 1961”. The WH LFCOLB file is a one-page PDF file.

I also applied PDF-XChange Viewer Pro version 2.5 to scan the WH LFCOLB PDF image for OCR. All of the typed text was made selectable. The form text that was not made selectable included “Sex”, “6a.”, “6c.”, “8.”, “20.” and the Reg. General’s date stamp “AUG -8 1961”. All of the smallest form text was made selectable.

I also applied the ABBY PDF Transformer 3.0 OCR algorithm to document 15-1.pdf. Page 2 of document 15-1.pdf is identical to the WH LFCOLB image except for the case label added to the top edge of the page. The OCR algorithm deciphered the entire case label, and all of the typed text except for the one word “Male”. Additionally the form text (or numbers) “Sex”, “6a.”, “6c.”, “8.”, “20.”, “Other”, “Date Accepted by Reg.” and the associated date “AUG -8 1961” were not deciphered. These OCR results (except for the added case label) are the same as for the WH LFCOLB image. Both pages of document 15-1.pdf were scanned for OCR.

Finally I also applied the ABBY PDF Transformer 3.0 OCR program to the four-page document 35-1.pdf. The scan deciphered both case labels and found all of the typed text with the exception of the “X” in the No box within form box 7g. The form text that was not deciphered included “5a. Month”, “5b. Hour”, “6b. Island”, “Town Limits”, “Island”, “7d. Street Address”, “ district”, “7g. Is Residence on a Farm or Plantation?”, “Mother”, “17b. Date Last Worked”, “Signature of Parent”, “Informant”, “Parent”, “Other”, “18b. Date of Signature”, “hour stated”, “M.D.”, “22. Date Accepted by Reg. General”, “AUG -8 19”. Additionally, the following warning was returned by the scan: “Page 4 Warning Check the document language”.

The difference in image resolution of the mostly text layer of the WH LFCOLB PDF image and the uniform resolution of the page 4 LFCOLB PDF image likely explains why less text was detected in this trial OCR scan of page 4 of document 35-1.pdf than the scans of the WH LFCOLB and the page 2 LFCOLB. The resolution (150 PPI) of the page 4 LFCOLB PDF image (last page of 35-1.pdf) is lower than the resolution (300 PPI) of the mostly text layer of the WH LFCOLB PDF image (and the page 2 LFCOLB PDF image). The smallest form text of the page 4 LFCOLB PDF image would be the most affected by the reduced resolution.
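The effect of halving the resolution is easy to put in numbers. The Python sketch below converts a type size in points to a height in pixels; the 6 pt figure for the small form text is an assumed value for illustration, not a measurement of the certificate.

```python
def text_height_px(point_size: float, ppi: float) -> float:
    """Height in pixels of type of a given point size (72 points per inch)."""
    return point_size * ppi / 72

# Assumed size for small form text; not measured from the document.
SMALL_TEXT_PT = 6

at_150 = text_height_px(SMALL_TEXT_PT, 150)
at_300 = text_height_px(SMALL_TEXT_PT, 300)
print(at_150, at_300)
```

At 150 PPI, type of that size gets half as many pixel rows as at 300 PPI, which leaves an OCR engine far less detail per character.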

The Obot claim is that page 4 of document 35-1.pdf was created by a scan of a paper copy of page 2 of document 15-1.pdf. The METADATA from document 35-1.pdf indicates that the PDF document was created by PFU ScanSnap Manager 5.0.21 #S1500 and produced by the Adobe Acrobat 9.51 Paper Capture Plug-in. Thus the document would have been created by means of a Fujitsu ScanSnap S1500 scanner and Adobe Acrobat 9. The PDF document would have been created using the “PDF from scanner” mode in Acrobat 9 in a customized scan with “Make Searchable (Run OCR)” and “Optimized Scanned PDF” options selected.
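The Creator and Producer values quoted above live in the PDF's Info dictionary. A minimal Python sketch of reading them follows; the two strings are the ones reported in this thread, but the surrounding dictionary layout is an assumption about how they might appear in the file.

```python
import re

# Values as reported above; the exact dictionary layout is assumed.
INFO = (b"<< /Creator (PFU ScanSnap Manager 5.0.21 #S1500) "
        b"/Producer (Adobe Acrobat 9.51 Paper Capture Plug-in) >>")

def info_field(raw: bytes, key: str):
    """Return the literal string stored under /key, or None if absent."""
    m = re.search(rb"/" + key.encode() + rb"\s*\(([^)]*)\)", raw)
    return m.group(1).decode("latin-1") if m else None

print(info_field(INFO, "Creator"))
print(info_field(INFO, "Producer"))
```

The Creator names the application that made the original content and the Producer names the software that wrote the PDF, which is why the pair reads as "scanned with ScanSnap Manager, written out by Paper Capture".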

If this indeed was the actual workflow, then the results from the trial OCR scans of the three LFCOLB PDF images reported herein do not explain how the assumed workflow could have yielded the observed poor results. The reported results from the trial scans indicate that OCR should have detected most of the text on the page 4/11 LFCOLB but it did not.

This was first detected when the page 4/11 LFCOLB PDF image was opened in Adobe Acrobat XI Pro and the “Find All Suspects” tool was applied. The “Select Text” tool was also utilized. Much of the text on page 4 of 35-1.pdf was found to be not selectable. Of the selectable text, most was also flagged as suspect. The words “Case”, “Filed 05/04/12” and “Page” in the original case label (i.e. the Green label) were flagged as suspect. However, both case labels were entirely selectable. Of the identified words and numbers that were made selectable by the purported OCR only the word “OF”, the number “61” and the Roman numeral “II” were not flagged as suspect.

The findings reported herein indicate that the particular words on page 4 of document 35-1.pdf that were made selectable did not result solely from the application of OCR. Rather it is more likely that human intervention also occurred. Otherwise, why was only the largest printed text on the certificate form made selectable? Then, more importantly, why was none of the smaller form text made selectable in this purported OCR scan?

If not this scenario, then the peculiar internal structure of the page 4 PDF image must have defeated the Adobe Acrobat OCR scan. This scenario is also unlikely assuming that the PDF file was created by first scanning a paper document to create a flattened bitmap image and then embedding this bitmap image into a single layer within a PDF document.

The reality is much simpler. The rectangles do not show up because of the double image mask layer which, because of its settings, removes anything printed inside and outside of it other than the base image layer.
The OCR’ed text is rendered ‘invisible’ by setting it to invisible.

Nothing that suggests any manual alterations so far, just a simple scan and capture.

A significant finding of the inspection (of page 4 of document 35-1.pdf) within Adobe Acrobat XI Pro was that none of the form text was deciphered by the purported OCR except for the words “STATE”, “HAWAII”, “CERTIFICATE OF LIVE BIRTH”, and “DEPARTMENT OF HEALTH”. The deciphered words are in the largest font printed on the certificate form. None of the smaller text printed on the form was deciphered and made selectable.

It all depends on the quality of the image that went into the Paper Capture plug-in. Given the poor color of the scanned-in document, it should not come as a surprise that most of the text was not captured. Furthermore, most of the text that was captured was captured with misspellings.

I did not present the OCR part but again, it all points to simple OCR of a poorly scanned input file, nothing more, nothing less.

Your workflow misses the poor quality of the original scan. GIGO so to speak….

For instance Department of Health is OCRed as

DlPAlTMEMT Of HEA.LTH

Obama is recognized as Obaka, Honolulu as Ronolu1ll and so on
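To put a rough number on how close these misreads are to the intended words, here is a short Python sketch using the standard library's difflib. The similarity ratio is just a convenient yardstick I picked for illustration; it is not how Paper Capture decides what to flag as suspect.

```python
from difflib import SequenceMatcher

# OCR output versus intended text, as listed above.
PAIRS = [
    ("DlPAlTMEMT Of HEA.LTH", "DEPARTMENT OF HEALTH"),
    ("Obaka", "Obama"),
    ("Ronolu1ll", "Honolulu"),
]

def similarity(ocr_text: str, intended: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, ocr_text.lower(), intended.lower()).ratio()

for ocr_text, intended in PAIRS:
    print(f"{ocr_text!r} vs {intended!r}: {similarity(ocr_text, intended):.2f}")
```

The misreads are recognizably near their targets but not exact, which is the usual signature of OCR run on a poor image.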

Nothing out of the ordinary so far. I bet that the blocks and lines are created by the capture software to assist in recovery of the ‘form’ look&feel.

The Obot claim is that page 4 of document 35-1.pdf was created by a scan of a paper copy of page 2 of document 15-1.pdf. The METADATA from document 35-1.pdf indicates that the PDF document was created by PFU ScanSnap Manager 5.0.21 #S1500 and produced by the Adobe Acrobat 9.51 Paper Capture Plug-in. Thus the document would have been created by means of a Fujitsu ScanSnap S1500 scanner and Adobe Acrobat 9. The PDF document would have been created using the “PDF from scanner” mode in Acrobat 9 in a customized scan with “Make Searchable (Run OCR)” and “Optimized Scanned PDF” options selected.

I propose a much more sensible workflow:

1. Scanned using the Fujitsu ScanSnap S1500 software
2. Imported into Acrobat using Paper Capture, together with the 3 pages of the letter, creating two sets of internal documents, just as observed.

These results are atypical because the OCR algorithms included with the various versions of Adobe Acrobat typically detect more words than not – as do most of the popular OCR programs. Two popular programs are ABBY[sic] PDF Transformer Pro 3.0, and PDF-XChange Viewer Pro version 2.5.

Ah, most. Implying not all. Therefore, we should be careful to use the actual OCR program used to create the file. Do we happen to know which one?

The Obot claim is that page 4 of document 35-1.pdf was created by a scan of a paper copy of page 2 of document 15-1.pdf. The METADATA from document 35-1.pdf indicates that the PDF document was created by PFU ScanSnap Manager 5.0.21 #S1500 and produced by the Adobe Acrobat 9.51 Paper Capture Plug-in.

By your own admission, we do! So you used the same program to test this out, right?

For reference, I applied the ABBY[sic] PDF Transformer 3.0 program to the original WH LFCOLB PDF image.
…
I also applied PDF-XChange Viewer Pro version 2.5 to scan the WH LFCOLB PDF image for OCR.
…
I also applied the ABBY[sic] PDF Transformer 3.0 OCR algorithm to document 15-1.pdf.
…
Finally I also applied the ABBY[sic] PDF Transformer 3.0 OCR program to the four-page document 35-1.pdf.

FAIL!

Also, I note that the PDF-XChange Viewer did better than the ABBYY PDF Transformer. This should have clued you in that some programs perform worse than others. I’d be especially wary of cheap programs that are packaged with third-party scanner software. Kind of like the difference between Illustrator and Mac Preview.

By default, if you scan an image and convert it to PDF, you get a Image Only PDF.

To make your PDF indexable, searchable and the content “copy & pasteable”, you need to run the file through the Paper Capture plug-in.

* You need to scan the images at least using 300 dpi for the OCR to be effective.

Once you converted the PDF (and fixed up any OCR errors), the text would be selectable in Acrobat Reader with the “Text Select Tool”.

Wait a second. Paper Capture isn’t effective on anything under 300dpi? What was the dpi of the 35-1 LFBC again?

The difference in image resolution of the mostly text layer of the WH LFCOLB PDF image and the uniform resolution of the page 4 LFCOLB PDF image likely explains why less text was detected in this trial OCR scan of page 4 of document 35-1.pdf than the scans of the WH LFCOLB and the page 2 LFCOLB. The resolution (150 PPI) of the page 4 LFCOLB PDF image (last page of 35-1.pdf) is lower than the resolution (300 PPI) of the mostly text layer of the WH LFCOLB PDF image (and the page 2 LFCOLB PDF image). The smallest form text of the page 4 LFCOLB PDF image would be the most affected by the reduced resolution.

So by your own admission, the resolution explains why less text was detected. Now apply that to a program that requires a minimum resolution twice that of the image, and you can see why hardly anything got picked up.

Could you please explain why anyone would go to the trouble of creating a new copy of the LFBC in some graphics software when all you would have to do is print and scan the WH LFBC? This discussion is just asinine. I anticipate your answer because it is always interesting to see how Birthers can rationalize and make believe even the silliest claims are within the realm of the probable.

“Could you please explain why anyone would go to the trouble of creating a new copy of the LFBC in some graphics software when all you would have to do is print and scan the WH LFBC? This discussion is just asinine. I anticipate your answer because it is always interesting to see how Birthers can rationalize and make believe even the silliest claims are within the realm of the probable.”

From my affidavit:

The two case labels applied to the top edge of the (page 4/11) LFCOLB PDF image imply that it was created from the pre-existing (page 2/8) PDF image. If true, then the (page 4/11) LFCOLB PDF image would be identical to the (page 2/8) LFCOLB PDF image except for the second case label added to the top edge of the document. The (page 2/8) LFCOLB PDF already existed in PDF format as page two of Document 15-1.pdf. Thus there would be no need to create another Obama LFCOLB PDF image in the same law case.

Assuming all of this to be stipulated, then the task of creating the (page 4/11) LFCOLB PDF image could have been most easily accomplished as follows. To create the (page 4/11) LFCOLB PDF image requires only that page 2 of Document 15-1.pdf be extracted into a separate one-page PDF document and a second case label added above the first. This could all be accomplished in Adobe Acrobat which was available to whoever created Document 35-1.pdf. The four steps are:

1. Document 15-1.pdf is opened in Adobe Acrobat and then page 2 is extracted into a separate one-page PDF document.

2. The object containing the existing Document 15-1 case label is then selected and its color is changed from bright Blue to light Green.

3. The bright-Blue case label for Document 35-1 is then typed above the light Green case label of Document 15-1 within the margin of the (page 4/11) LFCOLB PDF image.

4. The resulting (page 4/11) LFCOLB one-page PDF image file is then merged with the three-page Tepper-to-Fuddy letter to create the Document 35-1.

The resulting (page 4/11) LFCOLB PDF image would then be identical to the (page 2/8) LFCOLB image except for the added second case label. The Blue second case label contains the Document number 35-1.

The (page 2/8) LFCOLB PDF image is identical to the WH LFCOLB image except for the first case label added to the top edge of the page. This first case label contains the document number 15-1.

It follows that, under this preferred work flow, the (page 4/11) PDF image would then be identical to the WH LFCOLB PDF image except for the two added case labels.

Had this preferred work flow been applied to the creation of the (page 4/11) LFCOLB PDF image, then there would be a solid chain of evidence between the (page 2/8) LFCOLB PDF image and the (page 4/11) LFCOLB PDF image. Unfortunately, the collective findings reported herein indicate that this preferred work flow was not followed in this case.

So why don’t you ask the forger why he scanned a printout, RC? He would be the only one who knows why he did that instead of following my preferred work flow.

So it is your claim that the 7 Black rectangles and 14 line elements were created in a MRC scan?

And stop putting words in my mouth!

I would love to hear from you what you think happened in creating this file and its relevance. After all, you wrote an affidavit…

No, there was no MRC scan in this case. The rectangles and line elements are ‘hidden’ through a clever trick which I believe is not easily applied through regular tools and is part of the attempt by the Paper Capture plug-in to recover the fields and lines found in the document. With limited success, of course.

“By default, if you scan an image and convert it to PDF, you get a Image Only PDF”

Depends on the work flow. If you “Place” a bitmap file onto a new document in Adobe Illustrator, then you get a PDF document with a link to the external bitmap file. The PDF becomes an image only PDF if you first embed the bitmap image and then save the file.

“So by your own admission, the resolution explains why less text was detected. Now apply that to a program that requires a minimum resolution twice that of the image, and you can see why hardly anything got picked up.”

I OCR scanned the page 4 LFCOLB PDF image with ABBYY PDF Transformer 3.0 and the scan deciphered much more text than your “imaginary OCR scan” of the same document. For example, my trial scan deciphered much of the small form text whereas your purported scan did not. None of the form text is selectable on page 4 of document 35-1.pdf except for “STATE OF HAWAII CERTIFICATE OF LIVE BIRTH DEPARTMENT OF HEALTH”. All of the small form text was not deciphered and therefore was not made selectable.

“Of the identified words and numbers that were made selectable by the purported OCR only the word “OF”, the number “61” and the Roman numeral “II” were not flagged as suspect.”

I listed all the selectable words for page 4 of document 35-1.pdf in my affidavit.

Finally, I also applied the ABBYY PDF Transformer 3.0 OCR program to the four-page document 35-1.pdf. The scan deciphered both case labels and found all of the typed text with the exception of the “X” in the “No” box within form box 7g. The form text that was not deciphered included “5a. Month”, “5b. Hour”, “6b. Island”, “Town Limits”, “Island”, “7d. Street Address”, “ district”, “7g. Is Residence on a Farm or Plantation?”, “Mother”, “17b. Date Last Worked”, “Signature of Parent”, “Informant”, “Parent”, “Other”, “18b. Date of Signature”, “hour stated”, “M.D.”, “22. Date Accepted by Reg. General”, “AUG -8 19”. Additionally, the following warning was returned by the scan: “Page 4 Warning Check the document language”.

The image resolution for this trial OCR scan would be the same as for your “imaginary OCR scan”.

Consequently, you have to prove that the internal PDF code structure of the page 4/11 LFCOLB PDF image defeated the OCR scan within the Adobe Capture Plug-in in order to sell your theory that the observed selectable words were entirely a result of an OCR scan.

The Adobe Paper Capture Plug-in is now incorporated into Adobe Acrobat XI Pro. The current version of the Paper Capture Plug-in is version 11. Consequently, to access the Adobe Paper Capture Plug-in, you must select “Create PDF from Scanner”, then “Custom Scan”, and then “Make Searchable (Run OCR)”. All of this is done from within Adobe Acrobat XI Pro.

Can we file this under “I am an inept researcher therefore it is a forgery”?

Well, he is suffering from the problem that he has concluded forgery even when there is a better explanation. Based on this, his other explorations somehow also include a forger, even though there is neither a logical reason nor evidence to reach such a conclusion.