Third-party software integration: OCR Tesseract

Tesseract is an open source OCR engine adopted and developed by Google, and it works very well. It reads TIFF documents natively and achieves a high recognition rate on images scanned at 300 dpi and converted to lineart (1-bit color).
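To make the 300 dpi / lineart recommendation concrete, here is a minimal Python sketch (standard library only). It computes the pixel size of a page scanned at a given resolution and converts grayscale pixels to 1-bit lineart with a fixed global threshold. The plain-list image representation, the helper names and the threshold of 128 are assumptions for illustration only. Real scanning or image tools usually perform this conversion for you, often with smarter (adaptive) thresholding.

```python
# Minimal sketch: what "300 dpi" and "lineart (1 bit)" mean for an OCR input.
# Assumptions (hypothetical, not part of Tesseract): the page size is given in
# millimetres, and the image is a grayscale grid (0 = black, 255 = white)
# stored in plain Python lists instead of being loaded with an image library.
from __future__ import annotations

MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm: float, height_mm: float, dpi: int = 300) -> tuple[int, int]:
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


def to_lineart(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Convert grayscale pixels to 1-bit lineart using a fixed global threshold.

    Pixels darker than `threshold` become 1 (ink), all others 0 (paper).
    The default of 128 is a hypothetical choice, not a Tesseract setting.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    print(pixels_at_dpi(210, 297))             # A4 at 300 dpi -> (2480, 3508)
    print(to_lineart([[0, 200], [127, 128]]))  # -> [[1, 0], [1, 0]]
```

An A4 page at 300 dpi is therefore roughly 2480 x 3508 pixels; halving the resolution halves each dimension and leaves a quarter of the pixels for the recognizer to work with.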

If you are using Debian or Ubuntu, installation is much simpler:

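Add the PPA, install the latest Tesseract, and then remove the PPA again, as it contains many bleeding-edge packages:

$ sudo add-apt-repository ppa:alex-p/notesalexp
$ sudo apt-get update
$ sudo apt-get install tesseract-ocr
$ sudo add-apt-repository -r ppa:alex-p/notesalexp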
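Once Tesseract is installed, an application can integrate it by calling the `tesseract` command-line program. The following is a hedged Python sketch, assuming the binary is on `PATH` and accepts the common `tesseract <image> <outputbase> -l <lang>` form, writing its result to `<outputbase>.txt`. The helper names (`tesseract_available`, `build_command`, `ocr_image`) are hypothetical.

```python
# Minimal sketch of calling the tesseract command-line program from Python.
# Assumption: the binary is on PATH as "tesseract" and accepts the form
# "tesseract <image> <outputbase> -l <lang>", writing <outputbase>.txt.
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable can be found on PATH."""
    return shutil.which("tesseract") is not None


def build_command(image: str, output_base: str, lang: str = "eng") -> list[str]:
    """Build the argument list; tesseract appends '.txt' to output_base itself."""
    return ["tesseract", image, output_base, "-l", lang]


def ocr_image(image: str, lang: str = "eng") -> str:
    """Run tesseract on `image` and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = str(Path(tmp) / "result")
        # check=True raises CalledProcessError if tesseract exits with an error;
        # capture_output=True keeps its progress messages off our console.
        subprocess.run(build_command(image, output_base, lang),
                       check=True, capture_output=True)
        return Path(output_base + ".txt").read_text(encoding="utf-8")
```

For example, `ocr_image("page.tif")` would return the text recognized in a hypothetical `page.tif`. Writing the output into a fresh temporary directory keeps concurrent calls from overwriting each other's result files.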

There is also another interesting free OCR application called OCRopus. It has many improvements over Tesseract but is still at an early stage of development. The latest release (0.3.1) is quite usable and works very well, but it has to be compiled from source, which is currently a difficult task. Visit http://code.google.com/p/ocropus/ for more info.

Compile from source code

You can download the source code from http://code.google.com/p/tesseract-ocr/ and compile it yourself. Also download the language files you need and extract them into the same folder as the application.
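Tesseract selects language data with its `-l` option. As a sketch, assuming the downloaded files are named `<code>.traineddata` (for example `eng.traineddata`) and were extracted into a single directory, an application could check which languages are present before running OCR. Combining several languages with `+` (e.g. `eng+deu`) is supported by recent Tesseract releases; the directory layout and the function names here are hypothetical.

```python
# Minimal sketch: discover which language files are available and build the
# value for tesseract's -l option. Assumptions: language files follow the
# "<code>.traineddata" naming (e.g. eng.traineddata), they were extracted into
# one directory, and several languages are combined with "+" (eng+deu).
from __future__ import annotations

from pathlib import Path


def installed_languages(tessdata_dir: str) -> list[str]:
    """Return the sorted language codes found in `tessdata_dir`."""
    # Path.stem strips the suffix: "eng.traineddata" -> "eng".
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def lang_option(requested: list[str], tessdata_dir: str) -> str:
    """Return an "eng+deu"-style value, failing early if a language is missing."""
    available = set(installed_languages(tessdata_dir))
    missing = [code for code in requested if code not in available]
    if missing:
        raise FileNotFoundError(f"missing language files: {', '.join(missing)}")
    return "+".join(requested)
```

Failing before tesseract is launched gives a clearer error message than the engine's own complaint about a missing data file.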