Quick post on OCR and PDF recognition

I’m sitting at the Wednesday workshop “Breaking Down Barriers and Enabling Access: (Dis)Ability in Writing Classrooms and Programs” and in a small roundtable we’re talking about accessibility of written materials for those who use screen readers.

One important tool is Optical Character Recognition (OCR). When you scan a document to share with students, you can scan it with recognized text, which allows it to be read with screen readers.

There are specialty programs for disabled people that will convert scanned PDFs into OCR’d PDFs so that they can be read by a screen reader. You can also have disability services help you scan directly to OCR. However, you can do it yourself as well. Here’s a quick guide to OCRing a text using Adobe: http://blogs.adobe.com/acrobat/acrobat_ocr_make_your_scanned/

Tips for ensuring OCR’d PDFs actually work:

It’s necessary to check that your OCR’d PDFs actually work! One option, as Sushil Oswal suggested, is to download the Window-Eyes screen reader that Microsoft is now offering (for free!) and listen to the file itself.

This might not work for everyone, however. Suppose you’re Deaf or hard of hearing: how can you tell how accurate an OCR’d PDF is?

One thing I’ve discovered from my years of scanning and listening to readings using Kurzweil 3000 is that a poorly scanned PDF will be translated to the computer as gibberish. For example, if you scan something that’s been underlined by hand, the OCR will not be able to distinguish the letters. Or, sometimes lowercase Ls are turned into 1s, etc.

A simple way to check the accuracy of your OCR scan is to try copying the text of the PDF and pasting it into another program, like MS Word. Once you’ve OCR’d a text, you should be able to highlight and copy any text that has been recognized and integrated into the file. Whatever appears when you paste the text should be exactly what the program thinks the PDF says, so if it’s full of errors, you can expect that the student listening to the file will hear those errors. If the text copies into other programs without errors, then the audio should read without errors too. Take it from me, listening to a poorly scanned PDF is deeply frustrating.
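If you have a long reading and don’t want to eyeball every pasted line, the copy-and-paste check can be roughly automated. The sketch below is only an illustration, not part of any OCR tool: the function names `suspicious_words` and `error_rate` are my own, and the two heuristics (a digit sandwiched between letters, like a lowercase L read as a 1, and “words” made mostly of symbols) are assumptions about what bad OCR tends to look like. It will miss plenty of errors and will sometimes flag legitimate text, so treat the result as a hint about where to look.

```python
import re

# Assumed heuristic: a digit sitting between two letters ("he1lo") often
# means the OCR misread a letter such as a lowercase L as the digit 1.
SANDWICHED_DIGIT = re.compile(r"[A-Za-z][0-9]+[A-Za-z]")


def suspicious_words(text):
    """Return words from pasted OCR text that look like recognition errors."""
    flagged = []
    for word in text.split():
        # Drop ordinary punctuation at the edges of the word.
        core = word.strip(".,;:!?\"'()[]")
        if not core:
            continue
        alnum = sum(ch.isalnum() for ch in core)
        if SANDWICHED_DIGIT.search(core):
            flagged.append(core)
        elif len(core) >= 4 and alnum / len(core) < 0.5:
            # Mostly symbols: likely gibberish from underlining or smudges.
            flagged.append(core)
    return flagged


def error_rate(text):
    """Fraction of whitespace-separated words flagged as suspicious."""
    words = text.split()
    if not words:
        return 0.0
    return len(suspicious_words(text)) / len(words)


if __name__ == "__main__":
    # Paste the text you copied out of the PDF in place of this sample.
    pasted = "He1lo wor1d, this is c1ear"
    print(suspicious_words(pasted))
    print(error_rate(pasted))
```

To use it, copy the text out of the OCR’d PDF, paste it in place of the sample string, and run the script. A long list of flagged words suggests the scan should be redone before you share it.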