OCR, or optical character recognition, is a widely used technology for reading text from scanned image documents. Traditionally, this is done by comparing regions of the scanned image with stored bitmaps of characters, and the matches that are found are used to create editable and searchable files. Rather than relying on such a heuristic model, modern systems are increasingly adopting a mathematical model in which formulas compare sets of data from the scanned image with sets from the stored images. When the similarity reaches a certain threshold, the scanned region is taken to be the text that the matched data represents and is converted accordingly.
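
The comparison-and-threshold idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 3x3 bitmaps, the `TEMPLATES` table, the `similarity` measure (fraction of matching pixels), and the 0.8 threshold are hypothetical and far simpler than what a real OCR engine uses.

```python
# Hypothetical 3x3 character bitmaps: 1 = ink, 0 = background.
TEMPLATES = {
    "I": [(0, 1, 0), (0, 1, 0), (0, 1, 0)],
    "L": [(1, 0, 0), (1, 0, 0), (1, 1, 1)],
    "T": [(1, 1, 1), (0, 1, 0), (0, 1, 0)],
}


def similarity(a, b):
    """Return the fraction of pixels that agree between two same-sized bitmaps."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def recognize(glyph, templates=TEMPLATES, threshold=0.8):
    """Return the best-matching character, or None if no match reaches the threshold."""
    best_char, best_score = None, 0.0
    for char, template in templates.items():
        score = similarity(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None
```

With these assumed templates, a slightly damaged "L" such as `[(1, 0, 0), (1, 0, 0), (1, 1, 0)]` agrees with the stored "L" on 8 of 9 pixels and is accepted, while an empty bitmap stays below the threshold for every template and is rejected.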

What is compression software?

Compression software reduces the size of data by removing redundancies and irrelevancies and retaining only those parts that cannot be eliminated without loss of information. The two types of compression are lossy and lossless. In lossy compression, size matters more than quality, and some loss of data is accepted as long as the file size is reduced. This produces much smaller files at lower quality. Lossless compression, on the other hand, does not discard any data, even if the file size remains larger.
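
The difference between the two types can be shown with a small sketch. It assumes Python's standard `zlib` module as a stand-in for a lossless compressor, and a toy `quantize` function (a hypothetical helper, not taken from any particular product) as a stand-in for a lossy step.

```python
import zlib


def lossless_round_trip(data: bytes) -> tuple[bytes, bytes]:
    """Compress with zlib and decompress again; the result equals the input."""
    compressed = zlib.compress(data, 9)  # 9 = strongest compression level
    restored = zlib.decompress(compressed)
    return compressed, restored


def quantize(pixels, step=16):
    """Toy lossy step: round each grey value down to a multiple of `step`.

    The fine detail thrown away here cannot be recovered afterwards.
    """
    return [(p // step) * step for p in pixels]
```

For example, a highly repetitive input such as `b"scanned page " * 500` comes back byte for byte from `lossless_round_trip` while the compressed form is much shorter, whereas `quantize([0, 7, 18, 255])` gives `[0, 0, 16, 240]`, so the original values 7, 18, and 255 are lost for good.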

How is compression software used with OCR?

Documents created by OCR can often be quite large, especially in the PDF format. PDF is a portable file format that is very useful for keeping file structure and content intact, which can result in a very large file. Such a large file is not easy to use, store, or transfer, and therefore compression software is used with OCR to reduce the file sizes of PDF documents. Such software can reduce the file size by as much as 50 per cent, which helps save disk space and bandwidth.
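
To make the 50 per cent figure concrete, the saving can be worked out with a short calculation. The function name and the file sizes below are hypothetical and chosen only for illustration.

```python
def percent_reduction(original_bytes: int, compressed_bytes: int) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (original_bytes - compressed_bytes) / original_bytes * 100


# Hypothetical example: a 10 MB OCR PDF compressed down to 5 MB.
original = 10 * 1024 * 1024
compressed = 5 * 1024 * 1024
saved = original - compressed          # bytes no longer stored or transferred
reduction = percent_reduction(original, compressed)
```

In this assumed case `reduction` is 50.0 and `saved` is 5 MB per document, a saving that applies both to disk space and to every transfer of the file.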