Convert scanned PDF to searchable PDF without losing color

This article shares tips on how to convert a scanned PDF to a searchable PDF without losing color. VeryPDF PDF to Text OCR Converter Command Line v3.0 offers different ways to create a searchable PDF depending on the input file. Some input PDFs contain both scanned pages and editable pages, while others contain only scanned pages. If you want to convert such a PDF to a searchable PDF without losing the original color, you can try the option -ocrmode <int>.

-ocrmode 1 vs -ocrmode 4

-ocrmode <int> accepts four values. To retain color, you can use -ocrmode 1 or -ocrmode 4. The following comparison covers these two modes, both of which generate a color PDF with searchable text:

| | -ocrmode 1 | -ocrmode 4 |
|---|---|---|
| supported input formats | scanned PDF | PDF and images |
| text in output PDF | vector-based, searchable | raster-based, searchable |
| quality of the magnified text | high quality | loses clarity |
| text layer | under original PDF pages | hidden |
| original PDF pages | retained | removed |
| original color | retained | retained |

When to use -ocrmode 1?

If the input PDF contains only scanned pages, -ocrmode 1 is recommended, as in pdf2txtocr.exe -ocr -ocrmode 1 ocr.pdf ocr1.pdf, where

-ocrmode 1 means to recognize text in the scanned PDF and insert a new text layer under the original PDF pages.

ocr.pdf represents the input file.

ocr1.pdf stands for the output file.
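If you call the converter from a script, the command line above can be assembled programmatically. The following is only a minimal sketch in Python, not part of the VeryPDF product: the function names build_mode1_command and convert_mode1 are hypothetical, and the sketch assumes pdf2txtocr.exe is on the PATH. It uses only the options shown in this article.

```python
import subprocess
from pathlib import Path


def build_mode1_command(input_pdf, output_pdf, exe="pdf2txtocr.exe"):
    """Build the argument list for an -ocrmode 1 conversion (nothing is executed here)."""
    # The article recommends -ocrmode 1 for scanned PDF input only.
    if Path(input_pdf).suffix.lower() != ".pdf":
        raise ValueError("-ocrmode 1 expects a scanned PDF as input")
    return [exe, "-ocr", "-ocrmode", "1", str(input_pdf), str(output_pdf)]


def convert_mode1(input_pdf, output_pdf):
    """Run the converter; assumes pdf2txtocr.exe is available on PATH."""
    subprocess.run(build_mode1_command(input_pdf, output_pdf), check=True)


# Print the same command line as in the text above.
print(" ".join(build_mode1_command("ocr.pdf", "ocr1.pdf")))
```

Keeping the command builder separate from the function that runs it makes it possible to check the arguments without having the converter installed.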

The illustrations below show the effects of conversion from a scanned PDF to searchable PDF. The text in the result PDF can be magnified by any amount without lowering quality.

Fig. 1 Input scanned PDF

Fig. 2 After using -ocrmode 1    Fig. 3 Magnified 16 times

[Tips] -ocrmode 1 only recognizes text in scanned PDF files. It can’t recognize text in image files. If the input PDF has editable pages, two text layers might appear: the newly created one and the one that belongs to the original editable pages. Such problems can be solved by using -ocrmode 4.
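The choice between the two modes can be summarized as a small decision rule. The sketch below is an illustration in Python, not a feature of the converter: choose_ocrmode is a hypothetical helper, and the has_editable_pages flag must be supplied by the caller because this sketch does not inspect the PDF. The image suffixes are limited to the two image types that appear in this article.

```python
from pathlib import Path

# Image types used in this article's examples; other formats are not assumed here.
IMAGE_SUFFIXES = {".tif", ".png"}


def choose_ocrmode(input_file, has_editable_pages=False):
    """Return 1 or 4, following the recommendations in this article."""
    suffix = Path(input_file).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        # -ocrmode 1 cannot read image files, so images go to -ocrmode 4.
        return 4
    if suffix == ".pdf":
        # Editable pages would end up with two text layers under -ocrmode 1.
        return 4 if has_editable_pages else 1
    raise ValueError(f"unsupported input type: {suffix or input_file}")


print(choose_ocrmode("ocr.pdf"))
print(choose_ocrmode("ocr.pdf", has_editable_pages=True))
print(choose_ocrmode("ocr.png"))
```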

When to use -ocrmode 4?

-ocrmode 4 is provided for converting an image to searchable PDF, a scanned PDF to searchable PDF, and a PDF with some searchable pages to editable PDF. When you use -ocrmode 4 to convert a scanned PDF, the text in the result PDF is raster-based, so it loses clarity when magnified. The illustrations below show the effects:

Fig. 4 After using -ocrmode 4    Fig. 5 Magnified 16 times

The following are two command lines for conversion from scanned PDF to searchable PDF:

pdf2txtocr.exe -ocr -ocrmode 4 -bitcount 24 ocr.pdf color.pdf

pdf2txtocr.exe -ocr -ocrmode 4 ocr.pdf grey.pdf

The illustrations below show the effects of the two command lines:

Fig. 7 1st command line    Fig. 8 2nd command line

The following are for conversion from image to PDF:

pdf2txtocr.exe -ocrmode 4 ocr.tif color.pdf

pdf2txtocr.exe -ocrmode 4 ocr.png color.pdf

[Tips] When converting an image to PDF, -ocr is not required, as in the last two command lines above. When converting a scanned PDF, -ocr must appear, as in the first two command lines above. Moreover, to retain the original color when creating a searchable PDF from a scanned PDF with -ocrmode 4, you need to use -bitcount 24. Otherwise, the result PDF will be grey, as in Fig. 8.
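The rules in this tip can be captured in a short helper that assembles the -ocrmode 4 command line. This is a sketch in Python under stated assumptions: build_mode4_command and its keep_color parameter are hypothetical names, and the helper adds -bitcount 24 only for PDF input, because the article shows that option only in the scanned PDF example. It builds the argument list and does not run the converter.

```python
from pathlib import Path


def build_mode4_command(input_file, output_pdf, keep_color=True, exe="pdf2txtocr.exe"):
    """Build the argument list for an -ocrmode 4 conversion (nothing is executed here)."""
    is_pdf = Path(input_file).suffix.lower() == ".pdf"
    args = [exe]
    if is_pdf:
        # -ocr is required for scanned PDF input and omitted for images.
        args.append("-ocr")
    args += ["-ocrmode", "4"]
    if is_pdf and keep_color:
        # Without -bitcount 24 the result PDF comes out grey.
        args += ["-bitcount", "24"]
    args += [str(input_file), str(output_pdf)]
    return args


# Reproduce the command lines from this section.
for source, target, color in [
    ("ocr.pdf", "color.pdf", True),
    ("ocr.pdf", "grey.pdf", False),
    ("ocr.tif", "color.pdf", True),
    ("ocr.png", "color.pdf", True),
]:
    print(" ".join(build_mode4_command(source, target, keep_color=color)))
```

The resulting list can be passed to subprocess.run if pdf2txtocr.exe is installed and on the PATH.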

One Response to Convert scanned PDF to searchable PDF without losing color

Thanks for your message. The following products can all convert scanned PDF files to searchable PDF files. The output PDF files will contain a hidden text layer, and you can open the OCRed PDF files in Adobe Reader and search the text content properly,