May 23, 2009

Dr. von Ahn has also created a free system called reCaptcha (recaptcha.net), now used by about 120,000 sites including Ticketmaster, Craigslist, Facebook, Twitter and The New York Times.

The system has an unusual twist that provides an added benefit to projects that are digitizing books and papers in archives: the source of the wiggly images that people must decipher is not random. The images are drawn from books and other media that are being digitized in mass projects, but that machines haven’t been able to read because, for instance, the page is wrinkled.

Automatic character recognition software tells the people having the work scanned which words it cannot read. These are the words that reCaptcha farms out and, once they are interpreted, returns to the original document. In this way, word by word, most of the mystery words are deciphered, in this case by humans. “We are digitizing about 25 million words per day by having people type in captchas,” Dr. von Ahn said.
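For readers who want to see the loop in concrete terms, here is a minimal Python sketch of the process described above: flag the words the recognition software could not read, collect what people typed, and put the agreed answer back in the text. It is an illustration only, not reCaptcha's actual code. The function names, the confidence scores, the 0.5 threshold and the rule that at least two people must agree are all assumptions made for the example.

```python
from collections import Counter


def flag_unreadable(ocr_words, threshold=0.5):
    """Return positions of words whose (hypothetical) OCR confidence is below threshold."""
    return [i for i, (_, confidence) in enumerate(ocr_words) if confidence < threshold]


def resolve(answers, min_votes=2):
    """Return the most frequently typed answer if at least min_votes people agree, else None.

    The agreement rule is an assumption for this sketch, not a documented reCaptcha rule.
    """
    if not answers:
        return None
    word, count = Counter(a.strip() for a in answers).most_common(1)[0]
    return word if count >= min_votes else None


def fill_in(ocr_words, human_answers, threshold=0.5, min_votes=2):
    """Replace flagged words with the human-agreed reading; leave unresolved words as scanned."""
    result = [text for text, _ in ocr_words]
    for i in flag_unreadable(ocr_words, threshold):
        decided = resolve(human_answers.get(i, []), min_votes)
        if decided is not None:
            result[i] = decided
    return result


# Made-up scan of four words: (text as read by the machine, confidence).
page = [("the", 0.98), ("qu1ck", 0.20), ("brown", 0.95), ("f0x", 0.30)]
# Made-up answers typed by people, keyed by word position.
answers = {1: ["quick", "quick", "quack"], 3: ["fox"]}

print(fill_in(page, answers))
```

In this made-up run the second word is corrected because two people agree on it, while the fourth stays as scanned until more answers arrive, which mirrors the point that most, not all, of the mystery words end up deciphered.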