Wednesday, April 29, 2009

This week Time Magazine announced the results of its online Time 100 poll, a list of the most influential people of the past year. Past lists have included Barack Obama, Steve Jobs, and Al Gore. This year’s winner: “moot,” the creator of an underground bulletin board called 4chan. Few people had heard of moot before, so it comes as no surprise that the poll was completely manipulated by members of the site he created. At first the poll had almost no protection against abuse, so members of 4chan wrote a program that voted for moot millions of times :)

Time Magazine eventually added reCAPTCHA to the poll, but unfortunately it was too late. Before then, members of 4chan had written a program that submitted over 20 million votes. Once Time added reCAPTCHA, the ballot-stuffing program stopped working completely, and the members of 4chan were forced to spend thousands of human hours typing reCAPTCHAs by hand. Despite all this effort (some of them individually spent 40+ hours per week typing CAPTCHAs), they managed to submit only about 200,000 more votes, roughly one hundredth of what the program had submitted and a small number compared to the votes the other candidates received.

Use reCAPTCHA from the start on your polls and you will significantly raise the bar for spammers to be effective.
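For sites wiring this up, the key design point is that the CAPTCHA check happens on the server before a vote is counted, so a script that cannot solve CAPTCHAs never touches the tally. The sketch below is a toy poll that assumes a hypothetical `verify(response, remote_ip)` callback; in a real deployment that callback would call reCAPTCHA's server-side verification service, whose endpoint and parameters should come from the official documentation rather than from this example.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


class CaptchaGatedPoll:
    """Toy poll that counts a vote only after a CAPTCHA check passes."""

    def __init__(self, candidates: list[str], verify: Callable[[str, str], bool]):
        # `verify(captcha_response, remote_ip)` is a hypothetical hook; a real
        # site would call the CAPTCHA provider's server-side verification here.
        self.candidates = set(candidates)
        self.verify = verify
        self.tally: Counter[str] = Counter()
        self.rejected = 0

    def vote(self, candidate: str, captcha_response: str, remote_ip: str) -> bool:
        if candidate not in self.candidates:
            raise ValueError(f"unknown candidate: {candidate}")
        # Check the CAPTCHA before touching the tally, so a scripted client
        # that cannot solve CAPTCHAs never gets a vote counted.
        if not self.verify(captcha_response, remote_ip):
            self.rejected += 1
            return False
        self.tally[candidate] += 1
        return True
```

With a stand-in verifier such as `lambda resp, ip: resp == "solved"`, every vote from a client that cannot produce a valid response is rejected, which is the property that stopped the ballot-stuffing program described above.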

Sunday, February 8, 2009

Sunday, December 14, 2008

Every day we serve over 30 million randomly chosen pairs of words from scanned books and newspapers to users around the world. Although we heavily filter the words presented to avoid offensive combinations (there are over 1,000 words in our block list), some amusing pairs slip through. Below are some of our favorites. All are real examples emailed by users.

While obtaining tickets for a concert:

What can we say?

Made an unlucky user insult themselves:

Marital advice:

This is one of the all-time best: The user emailed the site 20 minutes later complaining he had followed the instructions to wait, but nothing was happening.
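The word filtering mentioned at the start of this post can be sketched as a set lookup. This hypothetical helper (its name and signature are illustrative, not reCAPTCHA's code) drops every word that appears in a block list and then samples two distinct words from what remains. Because each word is checked on its own, two individually harmless words can still form an unfortunate pair, which is how examples like the ones above slip through.

```python
from __future__ import annotations

import random


def pick_word_pair(
    words: list[str], block_list: set[str], rng: random.Random
) -> tuple[str, str]:
    """Pick two distinct random words, skipping any word on the block list.

    The check is per word, so it cannot catch an offensive *combination*
    of two words that are each acceptable on their own.
    """
    allowed = [w for w in words if w.lower() not in block_list]
    if len(allowed) < 2:
        raise ValueError("need at least two allowed words")
    first, second = rng.sample(allowed, 2)
    return first, second
```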

Sunday, December 7, 2008

One of our main goals when we launched reCAPTCHA was to provide a system accessible to visually impaired individuals (who surf the Web using screen-reading software). Most other CAPTCHAs do not provide an audio alternative, and therefore block blind people from freely navigating the Web. We're proud that reCAPTCHA has always had an audio alternative.

Today we are announcing a significantly improved audio CAPTCHA that is both easier for humans than our previous one and, most importantly, by far the most secure audio CAPTCHA we know of.

Like many other audio CAPTCHAs, our previous version consisted of distorted spoken digits. We collected recordings of thousands of voices saying the digits zero through nine, and formed audio CAPTCHAs by concatenating digits from different speakers and adding distorting noise in the background. To keep the audio CAPTCHA secure, our distortions were quite heavy. We now believe that even such heavy distortions are not enough when audio CAPTCHAs are restricted to spoken digits or letters.
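The construction of the previous digit-based CAPTCHA can be sketched roughly as follows. The sketch represents clips as plain lists of float samples, picks a random speaker's recording for each digit, and adds uniform random noise as a stand-in for the real distortions, which the post does not specify. The function name, the `noise_level` parameter, and the data layout are all hypothetical.

```python
from __future__ import annotations

import random

# Hypothetical data layout: a clip is a list of mono samples in [-1.0, 1.0].
Clip = list[float]


def make_digit_captcha(
    recordings: dict[int, list[Clip]],
    length: int,
    noise_level: float,
    rng: random.Random,
) -> tuple[str, Clip]:
    """Build a digit CAPTCHA; returns (expected answer, noisy audio)."""
    answer = "".join(str(rng.randrange(10)) for _ in range(length))
    audio: Clip = []
    for ch in answer:
        # Each digit comes from a randomly chosen speaker's recording.
        audio.extend(rng.choice(recordings[int(ch)]))
    # Add uniform background noise, then clamp to the valid sample range.
    noisy = [
        max(-1.0, min(1.0, s + rng.uniform(-noise_level, noise_level)))
        for s in audio
    ]
    return answer, noisy
```

However much noise is added, each segment of the output still encodes one of only ten symbols, so an attacker only has to classify short segments into a small set of classes; a vocabulary that small is one reason digit-only audio CAPTCHAs are easier to attack.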

Although we have not seen anybody abuse our previous audio CAPTCHA in the wild, we have taken preventive measures against this potential attack. Today we are releasing a new audio CAPTCHA that is significantly more secure and, in particular, not susceptible to Jenn's attack. In fact, breaking this new audio CAPTCHA would require major advances in speech recognition technology.

Instead of using spoken digits or letters, our new audio CAPTCHA presents entire spoken sentences or phrases that the best speech recognition algorithms failed to recognize. In other words, it uses the same idea as the standard visual reCAPTCHA: we play audio from old-time radio shows that speech recognition software could not decipher correctly, and then use the answers of humans solving these CAPTCHAs to transcribe the shows. Not only is this audio CAPTCHA more secure, it also has a positive side effect. Just as the visual reCAPTCHA has helped digitize billions of printed words so far, we expect the audio version to help transcribe large amounts of historical audio content.
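The post does not say how human answers are combined into a transcript. One common approach, and purely an assumption here, is to accept a transcript for a clip only once several independent users have typed the same normalized text. The sketch below does that with a hypothetical `agreement_needed` threshold; it omits details such as checking each user against a clip whose answer is already known.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class ClipTranscriber:
    """Collects human answers for unrecognized clips and accepts a transcript
    once `agreement_needed` identical normalized answers arrive (hypothetical)."""

    def __init__(self, agreement_needed: int = 3):
        self.agreement_needed = agreement_needed
        self.answers: defaultdict[str, Counter[str]] = defaultdict(Counter)
        self.transcripts: dict[str, str] = {}

    @staticmethod
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivially different answers match.
        return " ".join(text.lower().split())

    def submit(self, clip_id: str, answer: str) -> str | None:
        """Record one answer; return the accepted transcript, if any."""
        if clip_id in self.transcripts:
            return self.transcripts[clip_id]
        key = self.normalize(answer)
        self.answers[clip_id][key] += 1
        if self.answers[clip_id][key] >= self.agreement_needed:
            self.transcripts[clip_id] = key
            return key
        return None
```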

You can hear the new audio CAPTCHA by going here and clicking on the audio button. You'll hear a short clip with people speaking and will have to type what they are saying. To account for spelling mistakes and homophones, the verification algorithm uses a phoneme-based encoding and allows a small number of mistakes.
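The verification step can be approximated as follows. This sketch swaps in American Soundex, an old spelling-based phonetic code, as a stand-in for a real phoneme encoding (reCAPTCHA's actual encoding is not described here). It encodes each word of the user's answer and of the expected transcript, then accepts the answer if the two code sequences differ by at most `max_errors` word-level edits. Both the choice of Soundex and the default tolerance of one edit are assumptions for illustration.

```python
from __future__ import annotations

# American Soundex digit groups; vowels, y, h and w have no code.
_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after it counts again
    return code.ljust(4, "0")


def encode(text: str) -> list[str]:
    """Soundex code for every word that contains at least one letter."""
    return [c for c in (soundex(w) for w in text.split()) if c]


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two sequences of word codes."""
    prev_row = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        row = [i]
        for j, y in enumerate(b, 1):
            row.append(min(
                prev_row[j] + 1,             # delete x
                row[j - 1] + 1,              # insert y
                prev_row[j - 1] + (x != y),  # substitute (free if codes match)
            ))
        prev_row = row
    return prev_row[-1]


def matches(answer: str, expected: str, max_errors: int = 1) -> bool:
    """Accept the answer if its word codes are within max_errors edits."""
    return edit_distance(encode(answer), encode(expected)) <= max_errors
```

Homophones such as "their" and "there" both encode to T600 and therefore compare as equal, while a single misheard word ("fox" versus "box") costs one edit and is still accepted under the default tolerance.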

We'll be rolling this update out to all of our users over the next few weeks. For now, if you are using our custom theme option, we ask that you update the instructions for the audio CAPTCHA to say something along the lines of "type what you hear".

After a year and a half of running reCAPTCHA, we finally had time to start a blog.

Perhaps the best way to begin is with a rundown of our milestones: we have been covered by NPR, the Wall Street Journal, the Boston Globe, the Guardian, Wired, and hundreds of other outlets; we published a paper in the journal Science about the accuracy of reCAPTCHA transcriptions; over 75,000 Web sites have signed up to use our service (including household names like Facebook, Ticketmaster, and Craigslist); and to date over 300 million people (more than 5% of the world's population!) have helped us digitize content from the New York Times and the Internet Archive. So far, close to 5 billion words have been served.