OCR and Scanning

Optical Character Recognition [OCR] and
scanning are often confused. OCR includes scanning of fields to be read.
Scanning takes an image of a page or line without intelligently
recognising the data scanned. OCR was originally developed for
automatically processing very high volumes of paper such as cheques or
utility bill remittance stubs. The characters in the fields that could be
recognised were not ordinary type fonts but were specialised character
sets designed specifically for OCR. These fonts were known as OCR-A and
OCR-B.

The early OCR readers were large, cumbersome,
noisy pieces of equipment. Moving volumes of ‘used’ paper past a read head
at high speed and accurately reading the desired fields is a non-trivial
task. Forms design is critical. Paper weight, colour, layout, and box size
have to be carefully controlled. In processing, forms have to be
accurately aligned, fields to be read have to be identified and registered
and characters have to be machine readable. Where an item could not be
read accurately, the form was dropped into a ‘reject’ pocket on the reader.
The rejects were then re-processed by key punching. These systems were
cost-effective from the late 1960s/early 1970s although acquisition and
running costs were high. A resident engineer was needed for daily
operations.

Later in the 1970s, OCR readers became more
reliable and efficiency was improved by automating reject repair. This was
done by integrating a key-to-disk minicomputer with the OCR reader.
Rejected characters were displayed as read on a terminal screen. The
operator corrected the rejected character by visual verification. A second
development was the ability to read handprint [not handwriting]. These
systems were called Mixed Media systems because they could capture data
both by OCR and by keying.
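
The reject-repair step described above can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of any actual system: the `ReadChar` type, the per-character confidence score, the 0.80 threshold and the `ask_operator` callback are all assumptions introduced here. The source says only that rejected characters were displayed as read and corrected by the operator.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReadChar:
    char: str          # character as read by the OCR reader
    confidence: float  # hypothetical score from 0.0 to 1.0 (an assumption)


# Assumed cut-off below which a character counts as a reject.
REJECT_THRESHOLD = 0.80


def repair_field(
    chars: List[ReadChar],
    ask_operator: Callable[[int, str], str],
    threshold: float = REJECT_THRESHOLD,
) -> str:
    """Return the field text, sending rejected characters to an operator.

    ask_operator receives the position and the character as read, and
    returns the character the operator keys after visual verification.
    """
    repaired = []
    for position, read in enumerate(chars):
        if read.confidence < threshold:
            # Reject: show the character as read and take the keyed correction.
            repaired.append(ask_operator(position, read.char))
        else:
            # Accepted: keep the machine-read character unchanged.
            repaired.append(read.char)
    return "".join(repaired)
```

In use, `ask_operator` would stand in for the terminal screen. Mixing machine-read and keyed characters in a single field is the sense in which such a system captures data both by OCR and by keying.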

In 1978, British Rail installed the UK’s first
mixed media system with handprint recognition for reading timesheets for
payroll. Over the next 20 years, OCR was continuously developed so that
more standard type fonts could be read and handprint became widely
available. The readers became smaller, more reliable and less costly. In
1998 the same conceptual system approach using commodity rather than
proprietary hardware was used to read Cattle Passports for the UK’s Cattle
Tracing System designed in the aftermath of the Mad Cow Disease [BSE]
crisis. The only real common feature was the challenge of
setting up and maintaining the systems to a high standard, although the new
technology was orders of magnitude easier to work with than the old.
Nevertheless, OCR remains a specialist technology.

Scanning in general grew out of photocopying,
with advances in laser technology and digitisation improving the quality
and tractability of the scanned image. A good example of scanning being
used in data capture is the Atomic Weapons Establishment’s payroll system
using desk-top scanners connected to PCs. These scanners had recognition
software and were ‘state-of-the-art’ at the time.

The Case Studies are the only ones to survive.
They are broadly representative and indicative of the use of the
technology. By the 1990s desk-top scanners were so common they were rarely
documented as systems. In the early 1990s software systems were available
such as ROCC’s SEECHECK Forms Processing Solutions that ran on commodity
hardware and provided comprehensive facilities for all scanning-related
and keying-related data capture. From a separate department in an
organisation using somewhat esoteric technology, data capture had become
just another desk-top task.