Digitizing in Little Bits

CMU Researcher Uses eCommerce Tool To Digitize Books
6/4/2007
By Paul McCloskey
A researcher at Carnegie Mellon University has found a way to turn the process by which people register at commercial websites into a method for digitizing books, the Associated Press reported.
The method puts to better use the time and effort people spend deciphering the short word puzzles used to confirm a registration: instead of solving a throwaway puzzle, users key in text from print materials that need digitizing.
The word puzzles are known as CAPTCHAs, short for “completely automated public Turing test to tell computers and humans apart.”
Because computers can’t reliably decipher the distorted letters and numbers, a correct answer indicates that a real person is using the website.
CMU researchers estimated about 60 million CAPTCHA puzzles are solved every day, taking about 10 seconds each. Researchers have now come up with a way for people to type in snippets of books when registering at a site to help speed up the process of putting texts online.
“Humanity is wasting 150,000 hours every day on these,” said Luis von Ahn, an assistant professor of computer science at Carnegie Mellon, who helped develop the original CAPTCHA system.
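The figures above can be checked with a quick back-of-the-envelope calculation. The short Python sketch below uses only the two numbers reported in the article (about 60 million puzzles a day, about 10 seconds each); the function and constant names are illustrative and not part of any CMU system.

```python
# Rough check of the reported figures; both inputs are approximate
# ("about") values taken from the article, not measured data.
PUZZLES_PER_DAY = 60_000_000   # CMU estimate of CAPTCHAs solved daily
SECONDS_PER_PUZZLE = 10        # approximate time to solve one puzzle


def hours_spent_per_day(puzzles: int, seconds_each: float) -> float:
    """Total human time spent on puzzles in one day, in hours."""
    return puzzles * seconds_each / 3600  # 3600 seconds per hour


total_hours = hours_spent_per_day(PUZZLES_PER_DAY, SECONDS_PER_PUZZLE)
print(f"{total_hours:,.0f} hours per day")  # prints "166,667 hours per day"
```

With these rounded inputs the total comes to roughly 167,000 hours a day, in the same range as the 150,000 hours von Ahn cites. An average of 9 seconds per puzzle, for instance, would give exactly 150,000 hours, so the difference is within what the approximate inputs allow.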
Von Ahn is working with the Internet Archive, which runs several book-scanning projects, to put CAPTCHAs to work on digitization instead. The Archive scans 12,000 books a month and sends von Ahn image files of text that its computers cannot recognize. The files are split up into single words that can be used as CAPTCHAs at sites all over the Internet.
Paul McCloskey, “CMU Researcher Uses eCommerce Tool To Digitize Books,” Campus Technology, 6/4/2007, http://www.campustechnology.com/article.aspx?aid=48372