12 April 2007

It's easy to become apprehensive about the massive and growing power of Google. After all, its operating plan is essentially to know everything about everything that happens online - and, as a consequence, offline. I certainly share those concerns, but it's also important to note that the company continues to make moves that contribute to the free software commons. The latest, in the words of the project's announcement:

We're happy to announce the OCRopus OCR Project, a Google-sponsored project to develop advanced OCR technologies in the IUPR research group, headed by Prof. Thomas Breuel at the DFKI (German Research Center for Artificial Intelligence, Kaiserslautern, Germany).

The goal of the project is to advance the state of the art in optical character recognition and related technologies, and to deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use. In addition, we are structuring the system in such a way that it will be easy to reuse by other researchers in the field.

We are initially targeting Linux x86 and x86/64 and are developing under Ubuntu 6.10. The code should be easily portable to other Linux distributions and other platforms. If you're interested in taking responsibility for another platform, please let us know.

OCR is an area where free software still lags somewhat behind proprietary code, so Google's latest gift to the community is highly welcome - even if ultimately it will help the company know even more about documents, and hence about us. (Via Matt Asay.)

About Me

I have been a technology journalist and consultant for 30 years, covering the Internet since March 1994, and the free software world since 1995.

One early feature I wrote was for Wired in 1997: The Greatest OS that (N)ever Was.
My most recent books are Rebel Code: Linux and the Open Source Revolution, and Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine and Business.