I have some images of answered tests. I am working on recognizing the answers to the multiple choice questions on each test. An example:

I think the problem can be separated into two sub-problems:
1.) Identify and segment the multiple choice answers from the rest of the test.
2.) Classify and recognize each answer as one of the letters A, B, C, D, E.

For the second sub-problem, there exist libraries like Tesseract; alternatively, I can train a neural network to distinguish among the five letters, which is not hard.

However, I am not sure how to approach the first sub-problem. The most important part is identifying the region that contains the multiple choice answers (the red box region in the example). Since multiple choice answers are typically written in a table with two rows and number_of_questions + 1 columns, my first step is to identify this table in the image.

What's the best way to identify such a table? I am currently thinking about training a neural network to identify it. Should I use template matching or other methods instead? Which one is better?

$\begingroup$Can I ask if it is just the red square bit you are interested in "recognising" automatically? Or the whole document? If it's just the red square, you can achieve it with a very simple classification scheme that simply recognises capital letters at a single scale but you will have to make sure that all documents are shot under identical conditions.$\endgroup$
– A_A Mar 8 '16 at 10:19

$\begingroup$@A_A yes, it's just the red square bit I am interested in recognising. However, I can't control the condition each would be shot under. The photos will be taken over the phone. And the position of the multiple choice answer chart varies among different tests.$\endgroup$
– wenxi Mar 9 '16 at 1:16

$\begingroup$@A_A what classification scheme do you have in mind?$\endgroup$
– wenxi Mar 9 '16 at 1:21

1 Answer

This can be approached with template matching (normalised cross-correlation). What this basically does is look for a template, a small image representing one of the response letters, inside a (usually bigger) image. The result returned by this operation looks like an image, but it essentially has peak values wherever the template matches the image.

Transferred to this problem, this means that one cross-correlation would have to be carried out for each possible answer letter and each resulting image examined for "peaks".

Of course, this assumes that the letters appear at a single scale throughout the document. Otherwise, cross correlations would have to be applied (via an appropriate method) at different scales too.
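As a minimal sketch of such a multi-scale sweep (the test page and letter template are replaced here by a synthetic square pattern so the snippet is self-contained; the candidate scales are made-up placeholders):

```python
import numpy as np
from skimage.feature import match_template
from skimage.transform import rescale

# Synthetic stand-in for the test page: a 9x9 bright square on a dark page
image = np.zeros((100, 100))
image[40:49, 60:69] = 1.0

# Synthetic stand-in for the letter template: the same 9x9 square,
# centred in a 15x15 patch
template = np.zeros((15, 15))
template[3:12, 3:12] = 1.0

# Sweep a few candidate scales and keep the strongest correlation peak
best_peak, best_scale, best_loc = -1.0, None, None
for scale in (0.5, 0.75, 1.0, 1.25):
    t = rescale(template, scale)
    U = match_template(image, t, pad_input=True)
    peak = U.max()
    if peak > best_peak:
        best_peak = peak
        best_scale = scale
        best_loc = np.unravel_index(U.argmax(), U.shape)

print(best_scale, round(best_peak, 3), best_loc)
```

Since the pattern appears at exactly the template's native size, the sweep should settle on scale 1.0 with a near-perfect peak.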

There are two components to be carefully managed here:

Quality and scale of the image:

This can be roughly controlled by instructing users to align the top left corner of their camera's field of view with the corner of the paper. This constrains the distance to the page, although the effective scale still depends on the lens of the mobile phone.

Quality and scale of the template:

This depends on #1 and the handwriting of the users. But on average, roughly equal-size fonts can be used (as shown further below).

from matplotlib.pyplot import imshow, show
from skimage.io import imread
from skimage.feature import match_template

# Load the image as grayscale (match_template expects 2-D arrays)
Q = imread("tst4.jpg", as_gray=True)
# Load the template, here the letter A from the Sans font,
# approximately at the same size as the hand-written letters
A = imread("letter2.png", as_gray=True)
# Cross-correlate; pad_input=True makes the output the same size as Q
U = match_template(Q, A, pad_input=True)
# Show the result
imshow(U)
show()

(Please note, a more elaborate version of that is available via this link)

The result U looks like this:

For a Q that looks like this:

And an A that looks like this:

In the U image, please note that the red colour corresponds to high peaks (very close to 1.0). Just putting a threshold on the pixel values of U will return the centers of the 'A' symbols.
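That thresholding step can be sketched with scikit-image's peak_local_max (the U below is a synthetic stand-in for the real correlation output, with hand-placed peak values):

```python
import numpy as np
from skimage.feature import peak_local_max

# Synthetic stand-in for the match_template output U
U = np.zeros((60, 60))
U[20, 15] = 0.95   # strong match: an 'A' centre
U[40, 45] = 0.92   # another strong match
U[10, 50] = 0.40   # weak, spurious response

# Keep only local maxima above a confidence threshold,
# at least min_distance pixels apart
centers = peak_local_max(U, min_distance=5, threshold_abs=0.8)
print(centers.tolist())
```

Only the two strong peaks survive the 0.8 threshold; the weak response is discarded.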

Of course, other formations seem to have high correlation as well in that image. These are "false-positives", that is, places where the algorithm THINKS there is an "A" but in reality there is not.

There are two ways by which these false positives can be reduced (and the system's reliability increased):

Apply the template matching over a smaller area (this is now possible if the top left corner is roughly aligned with the edge of the paper)

Add more features to the classification operation than just the peak of the cross-correlation between the template and the image to make it more robust. This is probably the next level in complexity and more information can be provided if needed.
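For the first of these, a hypothetical helper that crops the search to the part of the page where the answer table is expected might look like the following (the fractional bounds are made-up placeholders, not measurements from the question's example):

```python
import numpy as np

# Hypothetical helper, assuming the top left corner of the shot is
# roughly aligned with the paper: keep only the fraction of the page
# where the answer table is expected to appear.
def crop_search_region(image, top=0.6, bottom=0.95, left=0.05, right=0.95):
    h, w = image.shape[:2]
    return image[int(round(top * h)):int(round(bottom * h)),
                 int(round(left * w)):int(round(right * w))]

page = np.ones((200, 100))   # stand-in for a grayscale page image
roi = crop_search_region(page)
print(roi.shape)             # (70, 90)
```

Running the template matching only on this region both speeds it up and removes false positives from the rest of the document.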

$\begingroup$Thanks, the instruction is very detailed. I will give it a try. And let you know the results ;)$\endgroup$
– wenxi Mar 15 '16 at 12:19

$\begingroup$@wenxi Thank you, glad to hear it was helpful. There are some details to do with inverting the image (negative) and slightly adjusting its contrast which are very simple preprocessing operations to improve the cross correlation output, I can expand on those if required later.$\endgroup$
– A_A Mar 17 '16 at 5:57