06.23.18

The Omega Pack and Optical Character Recognition

We would like to scan all our clients’ bank statements using optical character recognition. Experience suggests that our typical bank statement is a photocopy of a photocopy with ticks and scribbles, and maybe a coffee stain and a dead insect. Shouldn’t we just type it in?

Let’s give the score “alpha” to pristine original bank statements with a running total on every line, which makes them easy to scan by OCR. A pack of such statements may be called an Alpha Pack.

Let’s give the score “omega” to the worst bank statement we actually get. It’s a skewed photocopy of a photocopy where every number has been ticked off, several annotations have been added, several running totals are missing, and so on. A pack of statements where the first statement scores alpha, the second beta, the third gamma, and so on until the last scores omega, may be called an Omega Pack. Technologists may recognise that we are about to talk about Graceful Degradation.

So what is our OCR system like with an Omega Pack? How soon will it be before we abandon OCR and resort to typing it in? For us these are key questions. Any fool can design an OCR system to process Alpha Packs. Our average statements score between epsilon and sigma. We aim to be able to scan them all anyway.

Besides the computer, our other principal resource is the human visual system, which is good at pattern recognition and at seeing the big picture, and we aim to exploit it. The spreadsheet version of each scanned bank statement will be displayed on a blink comparator, one at a time, in a simple “in your face” style so the accounts clerk can look back and forth between the display and the original statement. In addition, if a running total is wrong by a penny or two, the computer can detect and highlight this, but it leaves the fix to the clerk, who overtypes the wrong numbers after inspecting the original paper statement.

If some numbers are missing off the end of a statement, the human will spot this easily and can just add them. The computer will find this impossible to fix. If a number is wrong within a statement, the computer will find this easy to spot, while the human will struggle. Computer and human have complementary abilities which we will exploit.
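The check that lets the computer spot a wrong number within a statement can be sketched in a few lines of Python. This is only an illustration: the `Row` class, its field names and the choice of holding amounts as whole pence are assumptions made for the example, and the function only flags rows, leaving the fix to the clerk.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical five column row; all amounts are whole pence.
    date: str
    narrative: str
    paid_out: int
    paid_in: int
    balance: int  # the printed running total


def flag_running_total_errors(opening_balance, rows):
    """Return the indices of rows whose printed running total
    disagrees with the arithmetic on that row."""
    flagged = []
    previous = opening_balance
    for index, row in enumerate(rows):
        expected = previous + row.paid_in - row.paid_out
        if expected != row.balance:
            flagged.append(index)
        # Carry the printed balance forward, so a single misread amount
        # does not cause every later row to be flagged as well.
        previous = row.balance
    return flagged


statement = [
    Row("01 Jun", "RENT", 50000, 0, 50000),
    Row("02 Jun", "SALARY", 0, 120000, 170000),
    Row("03 Jun", "SHOP", 1299, 0, 168710),  # balance misread by 9p
]
print(flag_running_total_errors(100000, statement))
```

Note that a misread balance, as opposed to a misread amount, will normally flag the following row too, which is exactly the kind of pattern the clerk can resolve at a glance against the paper original.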

With this system, we are able to process statements scoring between alpha and sigma, let’s say. Statements scoring between tau and omega will need to be typed in, but we have a system to do this quickly by the column. After entering numbers, the system can use “datepointing” to ask us to only enter the dates of material transactions (it interpolates the rest), and it can use “narrative prediction” to make a first guess at the narratives so we only need to overtype the narratives that are wrong.

When things go wrong, OCR systems tend to go haywire. If the reader has ever tried to use an OCR device, this was probably his or her first experience of it. We use Able2Extract OCR software which, in our experience, is “best of breed” at spotting tables of information and never goes crazy, but then we force the software to stick to a rigid five column template (date, narrative, number, number, running total). This stamps on the haywire problem. Every bank has its own template which the clerk must select, but selecting the wrong template for a common British bank statement merely means that column widths need to be adjusted, which is easy. It could be said that we are using the OCR software like a glorified bulk OCR pen scanner.
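To show what a rigid five column template means in practice, here is a small Python sketch. It does not use or describe the Able2Extract interface; it simply assumes that some OCR step has already produced words with a horizontal position, and that a bank's template is nothing more than four column boundaries. The names `COLUMNS`, `assign_to_columns`, `paid_out` and `paid_in` are invented for the example.

```python
from bisect import bisect_right

# The five fixed columns; "paid_out" and "paid_in" stand for the two
# number columns of the template.
COLUMNS = ("date", "narrative", "paid_out", "paid_in", "balance")


def assign_to_columns(words, boundaries):
    """Sort one line of OCR words into the five fixed columns.

    words      -- list of (x_position, text) pairs for a single line
    boundaries -- four ascending x positions separating the columns
    """
    if len(boundaries) != len(COLUMNS) - 1:
        raise ValueError("a five column template needs four boundaries")
    cells = {name: [] for name in COLUMNS}
    for x, text in sorted(words):
        # bisect_right counts how many boundaries lie at or left of x,
        # which is the index of the column the word falls into.
        cells[COLUMNS[bisect_right(boundaries, x)]].append(text)
    return {name: " ".join(parts) for name, parts in cells.items()}


line = [(10, "01"), (25, "Jun"), (80, "CARD"), (120, "PAYMENT"),
        (310, "12.99"), (520, "1687.01")]
print(assign_to_columns(line, boundaries=[60, 300, 400, 500]))
```

Whatever the scanner finds, it can only ever land in one of the five columns, and choosing the wrong template amounts to nothing worse than a wrong `boundaries` list that the clerk adjusts.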

We assert our control in the horizontal direction with the five column template. In the vertical direction, our blink comparator display will pick up extraneous material (“rubbish”) from above and below the working area. Since many working areas begin and end with lines like “Balance brought forward” and “Balance carried forward”, we can easily program our blink comparator to highlight the working area for the benefit of the clerk, but to leave it to the clerk to take the actual decision to delete the rubbish. The clerk could instead choose a different working area before deletion, but the software usually gets it right first time.

Our software can recognise and deal with errors like “alance brought forward” and “etails” which may appear occasionally. We can simulate these errors by overtyping on OCR output and then we program our own software to fix or tolerate them.
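One way to tolerate a damaged marker such as “alance brought forward” is approximate string matching. The sketch below uses Python's standard `difflib` module for this; the 0.85 threshold, the function names and the idea of matching on the narrative column alone are assumptions for illustration rather than a description of our actual software. It only proposes a working area, and the decision to delete anything stays with the clerk.

```python
from difflib import SequenceMatcher

START_MARKER = "balance brought forward"
END_MARKER = "balance carried forward"


def looks_like(text, marker, threshold=0.85):
    """True if text is close enough to marker to be an OCR-damaged copy."""
    cleaned = text.strip().lower()
    return SequenceMatcher(None, cleaned, marker).ratio() >= threshold


def find_working_area(narratives):
    """Return (start, end) row indices of the proposed working area.

    Either value is None if the corresponding marker was not found.
    """
    start = end = None
    for index, narrative in enumerate(narratives):
        if start is None:
            if looks_like(narrative, START_MARKER):
                start = index
        elif looks_like(narrative, END_MARKER):
            end = index
            break  # stop at the first closing marker after the start
    return start, end


scanned = [
    "Sort code 00-00-00",
    "alance brought forward",  # first letter lost by the scanner
    "RENT",
    "SALARY",
    "Balance carried forward",
    "Page 1 of 2",
]
print(find_working_area(scanned))
```

The threshold has to be high enough that the two markers are not mistaken for each other, since they share most of their letters.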

The process of picking out and tidying up the working area we call idealisation, and the result is an ideal bank statement where the running total is always correct and there is no extraneous matter. Normally idealisation only takes a minute or so and is much faster than typing it in. Towards the back of the Omega Pack idealisation is going to take a bit longer, but not out of proportion to the degree of deterioration of the statement. This is called Graceful Degradation and it is the basic aim.

Ideal bank statements can then be easily processed by the rest of our OCR system. Actually, if we deliberately leave a few errors in, but not too many, our software can make repairs by reference to the running total, so we have massive defence in depth. However, since the computer is just a stupid machine, we recommend that the clerk does not rely on it too much.
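A repair by reference to the running total might look something like the following Python sketch. The rule used here is an assumption chosen for illustration: an amount is rewritten only when the balance on its own row is confirmed by the next row, and anything less clear cut is left alone for the clerk. Each row is a plain dictionary with a signed `amount` (receipts positive, payments negative) and a `balance`, both in whole pence, which simplifies the two number columns into one.

```python
def repair_amounts(opening_balance, rows):
    """Return (new_rows, repaired_indices).

    An amount that disagrees with the running total is replaced by the
    difference between consecutive balances, but only when the next row
    confirms that this row's balance was scanned correctly.
    """
    new_rows = []
    repaired = []
    previous = opening_balance
    for index, row in enumerate(rows):
        implied = row["balance"] - previous
        if row["amount"] != implied and index + 1 < len(rows):
            following = rows[index + 1]
            balance_confirmed = (
                row["balance"] + following["amount"] == following["balance"]
            )
            if balance_confirmed:
                row = {**row, "amount": implied}
                repaired.append(index)
        new_rows.append(row)
        previous = row["balance"]
    return new_rows, repaired


scanned = [
    {"amount": -50000, "balance": 50000},
    {"amount": 12000, "balance": 170000},  # a zero was lost: should be 120000
    {"amount": -1299, "balance": 168701},
]
fixed, repaired = repair_amounts(100000, scanned)
print(repaired, [row["amount"] for row in fixed])
```

The last row of a statement is never repaired under this rule because nothing follows it to confirm its balance, which is one more reason not to rely on the machine too much.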

So what is OCR software like with the Omega Pack that we encounter in the real world? As we said, any fool can produce an OCR system to process an Alpha Pack. We need something that gives sensible results when things are less than perfect. If we work through an Omega Pack, we are likely to find that the scanning of narrative is the first to suffer as the statements progressively deteriorate because there is no independent way to check narratives. At some point we may abandon scanning of narratives and just scan dates and numbers. The alternative technology is then Narrative Prediction where we predict future narratives from what has been entered already, and then overtype the predictions that turn out to be wrong. In some cases, the narratives are so predictable that no overtyping whatsoever is needed, so NP can be a good backup technology. Sometimes it is actually better than OCR and an accountancy firm which chooses to use NP all the time is not necessarily at any disadvantage.
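As a rough illustration of Narrative Prediction, the Python sketch below guesses each narrative from the amount alone, reusing the most recent narrative that was entered against the same amount. This matching rule is an assumption made for the example and is deliberately crude; the point is only that regular items such as rent or salary can be predicted, leaving the clerk to overtype the rest.

```python
def predict_narratives(history, amounts):
    """Guess a narrative for each new amount.

    history -- (amount_in_pence, narrative) pairs already entered,
               oldest first
    amounts -- new amounts needing a narrative
    Returns one guess per amount; an empty string means no guess.
    """
    last_seen = {}
    for amount, narrative in history:
        last_seen[amount] = narrative  # later entries overwrite earlier ones
    return [last_seen.get(amount, "") for amount in amounts]


entered = [
    (-50000, "RENT"),
    (120000, "SALARY"),
    (-50000, "RENT LANDLORD"),
]
print(predict_narratives(entered, [-50000, 120000, -999]))
```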

As we continue to work through the Omega Pack, dates will begin to suffer. Dates which are obviously wrong are just deleted and then interpolated from nearby successfully-scanned dates. Once we are deleting all the dates, we can resort to just scanning numbers and use a “datepointing” system. This asks the user to enter the dates of material items, and also the first date of every new bank statement. It then completes the dates by interpolation.
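The interpolation step can be sketched in Python as below. The source of the known dates does not matter to the sketch: they may be successfully scanned dates or dates entered by datepointing. Spreading the missing dates evenly by row position between two known dates is an assumption for illustration, as is copying the nearest known date onto any rows before the first or after the last known one.

```python
from datetime import date, timedelta


def interpolate_dates(dates):
    """Fill None entries from the known dates around them.

    dates -- one entry per row: a datetime.date, or None if missing
    """
    known = [i for i, d in enumerate(dates) if d is not None]
    if not known:
        return list(dates)  # nothing to interpolate from
    filled = list(dates)
    # Rows outside the known range take the nearest known date.
    for i in range(known[0]):
        filled[i] = dates[known[0]]
    for i in range(known[-1] + 1, len(dates)):
        filled[i] = dates[known[-1]]
    # Rows between two known dates are spread evenly by row position,
    # rounding down to a whole number of days.
    for left, right in zip(known, known[1:]):
        span_days = (dates[right] - dates[left]).days
        for i in range(left + 1, right):
            offset = span_days * (i - left) // (right - left)
            filled[i] = dates[left] + timedelta(days=offset)
    return filled


rows = [date(2018, 6, 1), None, None, date(2018, 6, 7), None]
for d in interpolate_dates(rows):
    print(d.isoformat())
```

Interpolated dates are only estimates, which is why the dates of material items are entered rather than guessed.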

Continuing through the Omega Pack, numbers are the last to go because bank statements have a running total which we can use to check them and highlight obvious errors. Once we cannot even scan numbers, we must resort to typing them in, for which we have a “single sweep numeric keypad” (SSNK) system. Both OCR and SSNK can be used on the same run of bank statements, and can be intermixed with datepointing and Narrative Prediction, so we have lots of options.

Some client records consist of handwritten transactions for which OCR is useless. We see these as merely omega quality bank statements for which we would use SSNK, datepointing and NP. Gringotts Bank provide handwritten statements as a matter of course, while the printed statements of some banks are little better.