GOCR is a command-line tool for text recognition that Joerg Schulenburg has developed since 2000. The program is configurable and "trainable", and it gives good results especially for sans-serif fonts. It is a pure character recognition program that works independently of language. With gocr.tcl there is also a graphical user interface, although it is no longer quite up to date. xsane uses GOCR as its text recognition program by default, and OcrGui makes some of its options available in a graphical user interface. Many OCR frontends can use GOCR (e.g. ocrodjvu, OCRFeeder, gscan2pdf).
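
As a rough illustration of the command-line use described above, the following sketch wraps a single GOCR call. The function name `gocr_image` and the file names are placeholders chosen for this example; `-i` and `-o` are GOCR's input and output file options.

```shell
#!/bin/sh
# Sketch: run GOCR on one image and write the recognized text to a file.
# gocr_image is a hypothetical helper name, not part of GOCR.
gocr_image() {
    input="$1"    # image to read, e.g. a PNM file
    output="$2"   # text file to write

    # Refuse to run if the input image is missing or unreadable.
    if [ ! -r "$input" ]; then
        echo "cannot read input file: $input" >&2
        return 2
    fi
    # Refuse to run if GOCR is not installed.
    if ! command -v gocr >/dev/null 2>&1; then
        echo "gocr not found" >&2
        return 127
    fi
    gocr -i "$input" -o "$output"
}
```

Calling `gocr_image scan.pnm scan.txt` would then write the recognized text to scan.txt, assuming scan.pnm exists.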

pdfocr is a program that turns scanned PDF originals into searchable documents. The script, written in Ruby, uses the OCR program tesseract-ocr for text recognition by default, or optionally Cuneiform for Linux or OCRopus, and uses hocr2pdf from ExactImage to merge the original with the recognized text. pdftk and pdfimages are also used.

tesseract-ocr is a command-line program for text recognition. Originally developed by Hewlett-Packard as a commercial program from 1984 to 1995, its code was released in 2005. Google supports the development because an open-source solution was needed for creating e-books. The program supports a number of Western European and Asian languages, such as Vietnamese. tesseract-ocr is a pure character recognition program: it offers no layout analysis and outputs plain text; from version 3.00 on it can also output hOCR. The text recognition can be "trained".
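
The two output formats mentioned above can be sketched with a small wrapper. The function name `ocr_image`, the file names, and the default language `eng` are assumptions made for this example. The call form `tesseract IMAGE OUTPUTBASE -l LANG [hocr]` is tesseract's usual command line, where the trailing `hocr` names a config file that switches the output to hOCR.

```shell
#!/bin/sh
# Sketch: recognize text in one image with tesseract.
# ocr_image is a hypothetical helper name, not part of tesseract.
ocr_image() {
    input="$1"        # image file to read
    outbase="$2"      # output base name; tesseract appends the extension itself
    lang="${3:-eng}"  # language code; English is an assumed default
    format="${4:-}"   # pass "hocr" for hOCR output (version 3.00 or later)

    # Refuse to run if the input image is missing or unreadable.
    if [ ! -r "$input" ]; then
        echo "cannot read input file: $input" >&2
        return 2
    fi
    # Refuse to run if tesseract is not installed.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found" >&2
        return 127
    fi
    if [ "$format" = "hocr" ]; then
        tesseract "$input" "$outbase" -l "$lang" hocr
    else
        tesseract "$input" "$outbase" -l "$lang"
    fi
}
```

With an existing image, `ocr_image scan.png scan_out` would produce plain text, and `ocr_image scan.png scan_out eng hocr` would produce hOCR instead.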

Examples

//LINUX ==> MULTIPLE IMAGES TO MULTI-PAGE PDF (the number of PDF pages depends on the number of input images)