Comparing different OCR packages

Optical Character Recognition (OCR) is a widely used technology for extracting text from scanned documents or camera images. Both open source and commercial OCR software are available. In this article, we compare the best of the available open source and commercial OCR packages.

A general comparison of different OCR software is given in this wiki article, but it includes no sample data on OCR accuracy. The best available open source OCR software is (1) Tesseract, but its accuracy is still not good enough to match the commercial packages. From the list of commercial OCR software we selected (2) ABBYY FineReader and (3) Maestro.

We processed an image with each of these three packages and compared the raw text output.

From the comparison given above, we can clearly see that ABBYY FineReader and Maestro OCR are far better than standard Tesseract: they detect more characters and detect them more accurately. We followed the same process for several images and found that ABBYY FineReader and Maestro OCR consistently outperformed Tesseract.
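A comparison of raw text output like this one can be made quantitative when a hand-typed reference transcript of the image is available. The following is a minimal Python sketch of one way to do that, not the procedure used for this article. The function names `char_accuracy` and `rank_engines` and the sample strings are hypothetical, and the score is a simple in-order character match computed with the standard library's `difflib`, which is only one of several possible accuracy measures.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters that the OCR output reproduces in order."""
    if not reference:
        # An empty reference is only matched perfectly by an empty output.
        return 1.0 if not ocr_output else 0.0
    # autojunk=False disables difflib's heuristic that ignores frequent
    # characters in long sequences, which would distort the score.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def rank_engines(reference: str, outputs: dict) -> list:
    """Return (engine name, score) pairs sorted from best to worst score."""
    scores = {name: char_accuracy(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder strings for illustration only, not real OCR results.
    reference_text = "Optical Character Recognition"
    engine_outputs = {
        "engine_a": "Optical Character Recognition",
        "engine_b": "0ptical Charactcr Recogniton",
    }
    for name, score in rank_engines(reference_text, engine_outputs):
        print(f"{name}: {score:.2%}")
```

Running the same scoring over several images and averaging per engine would give a figure that can be compared across packages, with the caveat that this measure does not penalize extra characters the engine inserts.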

However, the commercial packages are quite expensive, so it is worth checking whether Tesseract's performance can be improved. Working on Tesseract may be more cost-effective than purchasing one of the commercial options.

Note: Read how we made Tesseract perform as well as, or even better than, these commercial packages here.