What is OCR?

OCR (Optical Character Recognition) is a useful machine vision capability. OCR lets you recognize and extract text from images so that it can be processed further or stored. This is very useful for handling scans and pictures of text, for instance when working with invoices, scanned forms and signage.

We’ve looked at several APIs for OCR, evaluating them based on:

Accuracy – we tried them all with the picture below to make sure they clearly recognize the text.

Price – we outline the price per call of the different APIs.

Special capabilities – some of the APIs we’ve covered have special capabilities that make them better suited to specific tasks, like scanning invoices or recognizing logos.

We used the following image to try out the APIs, as it contains a lot of text in different styles and sizes, as well as some graphics that could confuse an API.

The Best OCR APIs

The Microsoft Computer Vision API is a comprehensive set of computer vision tools, spanning capabilities like generating smart image thumbnails, recognizing celebrities in images and describing the content of images using AI.

The text recognition works well, and returns the text divided into regions. Each region has lines, and each line has words, which contain the actual text. This division is convenient for understanding the structure of the content in the image, though if you just need the text as one large string and don’t care about positioning, you’ll have to write extra code to join the pieces.
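As a rough illustration of that extra code, here is a minimal sketch that joins a regions / lines / words response into a single string. The key names ("regions", "lines", "words", "text") and the sample response are assumptions made for this example; check them against the response you actually get back.

```python
def flatten_ocr_text(response: dict) -> str:
    """Join every word of a regions -> lines -> words response into one string.

    Assumed layout (not confirmed by this article):
    {"regions": [{"lines": [{"words": [{"text": "..."}]}]}]}
    """
    region_texts = []
    for region in response.get("regions", []):
        line_texts = []
        for line in region.get("lines", []):
            # Words in a line are joined with single spaces.
            words = [word["text"] for word in line.get("words", [])]
            line_texts.append(" ".join(words))
        # Lines in a region are separated by newlines.
        region_texts.append("\n".join(line_texts))
    # Regions are separated by a blank line.
    return "\n\n".join(region_texts)


# Hypothetical sample response, for illustration only.
sample = {
    "regions": [
        {"lines": [
            {"words": [{"text": "Hello"}, {"text": "world"}]},
            {"words": [{"text": "Second"}, {"text": "line"}]},
        ]},
        {"lines": [{"words": [{"text": "Footer"}]}]},
    ]
}

print(flatten_ocr_text(sample))
```

Separating regions with a blank line keeps a hint of the layout; if you only want a flat stream of words, join everything with a single space instead.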

Price

The free tier for Microsoft’s API will give you 5,000 requests per month. The API has 3 paid plans:

The SemaMedia API is a dedicated OCR platform with a single function – Image OCR. It also has a “sister” API – Video OCR – which is optimized for extracting text from videos (more on that later).

The SemaMedia API also requires manually setting the language with each request (using the lang parameter). In scenarios where the language is known this should actually improve the accuracy, as it lets the API compare the recognized words with the dictionary (when using the df=True option).

Accuracy

The API handled the supplied image very well. It returns an array of results, each a region of text with a position in the image, as well as the text result.

Special Features

The SemaMedia platform also supports video OCR with the Video OCR API. According to the docs, video OCR is an analysis cascade which includes video segmentation (hard-cut), video text detection/recognition, and named entity recognition from video text (NER is a free add-on feature). The result of this analysis enables automatic video retrieval and indexing as well as content-based video search in video archives. The docs point to a detailed example on SemaMedia’s demo website.

Price

The free tier for SemaMedia’s API will give you 100 requests per month. The API has 3 paid plans:

The Taggun API is a specialized OCR API, targeted directly at scanning invoices and receipts. This is useful because the API not only recognizes the text in the image, it also recognizes the structure of the invoice and returns parsed data like totalAmount, taxAmount, merchantName and so on.

Accuracy

When calling the simple receipt processing endpoint, the API returns an accuracy score with each piece of information returned. Sometimes the score is 0 and the information is missing. However, when the information is there, it is usually accurate.

The label-by-label accuracy can be used to ask users to fill in fields that were not properly recognized in the scanned invoice.
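A minimal sketch of that idea is shown below: it collects the fields whose value is missing or whose score falls under a threshold, so the app can prompt the user for just those. The per-field shape (a dict with "data" and "confidenceLevel" keys), the 0.5 threshold and the sample result are assumptions for illustration, not details confirmed here.

```python
def fields_needing_review(result: dict, threshold: float = 0.5) -> list:
    """Return names of parsed fields that are missing or scored below threshold.

    Assumes each parsed field looks like {"data": ..., "confidenceLevel": ...};
    entries without that shape are ignored.
    """
    needs_review = []
    for name, field in result.items():
        if not isinstance(field, dict) or "confidenceLevel" not in field:
            continue  # not a scored field in the assumed format
        missing = field.get("data") is None
        low_score = field["confidenceLevel"] < threshold
        if missing or low_score:
            needs_review.append(name)
    return needs_review


# Hypothetical parsed receipt, for illustration only.
sample_result = {
    "totalAmount": {"data": 12.5, "confidenceLevel": 0.92},
    "taxAmount": {"confidenceLevel": 0},
    "merchantName": {"data": "ACME", "confidenceLevel": 0.3},
}

print(fields_needing_review(sample_result))
```

The threshold is a product decision: a higher value asks users to confirm more fields, a lower value trusts the API more.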

Price

The Taggun API has a free plan that includes 50 requests per month, and a paid plan costing $90 that includes 1,000 monthly requests.

The Cloudmersive OCR API is a nifty tool for simple text extraction from images. It has only one endpoint, Image to Text, and returns all the text in the image as one string rather than by regions. This is useful when transcribing a big blob of text (from a book or paper) where only the text itself is needed.

Accuracy

The API was pretty accurate, and successfully transcribed most words in the document.

Price

The free tier for the Cloudmersive API will give you 50,000 requests per month. The API has 3 paid plans:

Using the /detectText endpoint with the supplied image, the Google Cloud Vision API identified the text well. The response contains a textAnnotation field which holds the different word segments in the image, with their text and location. This can be very handy for highlighting specific words in the image (for instance, brand names or words from a list).

The API also returns a fullTextAnnotation field which contains the entire text in the image as a single string, as well as the detected language of the document.
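The word-highlighting use case could be sketched roughly as follows: given the list of word segments, keep the ones that match a word list and compute a rectangle for each. The per-entry keys ("description" for the text, "boundingPoly" with "vertices" for the location) and the sample data are assumptions for this example; adjust them to the response you receive.

```python
def find_words(annotations: list, targets: set) -> list:
    """Return (text, (left, top, right, bottom)) for segments matching targets.

    Assumes each annotation looks like
    {"description": "word", "boundingPoly": {"vertices": [{"x": .., "y": ..}]}}.
    Matching is case-insensitive.
    """
    wanted = {t.lower() for t in targets}
    matches = []
    for ann in annotations:
        text = ann.get("description", "")
        if text.lower() not in wanted:
            continue
        vertices = ann.get("boundingPoly", {}).get("vertices", [])
        if not vertices:
            continue  # no location available, nothing to highlight
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        matches.append((text, (min(xs), min(ys), max(xs), max(ys))))
    return matches


def _box(left, top, right, bottom):
    """Build four corner vertices for the hypothetical sample below."""
    return {"vertices": [{"x": left, "y": top}, {"x": right, "y": top},
                         {"x": right, "y": bottom}, {"x": left, "y": bottom}]}


# Hypothetical word segments, for illustration only.
sample_annotations = [
    {"description": "Acme", "boundingPoly": _box(10, 10, 60, 30)},
    {"description": "invoice", "boundingPoly": _box(70, 10, 150, 30)},
    {"description": "Total", "boundingPoly": _box(10, 80, 70, 100)},
]

print(find_words(sample_annotations, {"acme", "total"}))
```

The returned rectangles can be passed to any drawing library to outline the matched words on the original image.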

Price

The API includes 1,000 free API calls per month, and charges $1.50 for each subsequent 1,000 requests (as of April 2018).
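To get a feel for what that means at different volumes, here is a small estimate based on the figures above. It assumes usage past the free allowance is billed proportionally per request; if billing is actually done in whole blocks of 1,000, the result would need to be rounded up accordingly.

```python
from decimal import Decimal

FREE_CALLS_PER_MONTH = 1000          # free allowance quoted above
PRICE_PER_THOUSAND = Decimal("1.5")  # USD per 1,000 extra requests (April 2018)


def estimate_monthly_cost(requests: int) -> Decimal:
    """Estimate the monthly cost in USD for a given number of requests.

    Assumption: requests beyond the free allowance are billed pro rata,
    not in whole blocks of 1,000.
    """
    billable = max(0, requests - FREE_CALLS_PER_MONTH)
    return Decimal(billable) * PRICE_PER_THOUSAND / Decimal(1000)


for volume in (500, 2000, 11000):
    print(volume, estimate_monthly_cost(volume))
```

Decimal is used rather than float so that currency amounts are not affected by binary rounding.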

Special Features

The Google Cloud Vision API also has an OCR-related endpoint called /detectLogos. Given an image that contains brand logos, this endpoint can identify the brands they belong to. During our testing, this endpoint easily identified logos for top brands.