For the past few months, we have been blogging about our research into
how to handle scanned documents at CourtListener, because a number of
courts have a habit of releasing their opinions as scans. Previously,
when this happened, we couldn’t get the text out of the document, and
as a result it was impossible for anybody to find these cases by
searching the site.

Obviously, this is a bad situation for our users, so we are excited to
announce that as of today we have a new Optical Character Recognition
(OCR) system for extracting the text from scanned documents. We’re
currently extracting the text from an additional 10,000 opinions that
were previously unsearchable, and going forward we’ll do this
automatically as we receive cases from the courts.

This further expands the breadth of our coverage, and we hope you find
it useful!