recaptcha

This talk from the 2012 LayerOne conference outlines how the team build Stiltwalker, a package that could beat audio reCAPTCHA. We’re all familiar with the obscured images of words that need to be typed in order to confirm that you’re human (in fact, there’s a cat and mouse game to crack that visual version). But you may not have noticed the option to have words read to you. That secondary option is where the toils of Stiltwalker were aimed, and at the time the team achieved 99% accurracy. We’d like to remind readers that audio is important as visual-only confirmations are a bane of visually impaired users.

This is all past-tense. In fact, about an hour before the talk (embedded after the break) Google upgraded the system, making it much more complex and breaking what these guys had accomplished. But it’s still really fun to hear about their exploit. There were only 58 words used in the system. The team found out that there’s a way to exploit the entry of those word, misspelling them just enough so that they would validate as any of up to three different words. Machine learning was used to improve the accuracy when parsing the audio, but it still required tens of thousands of human verifications before it was reliably running on its own.

Though Time won’t admit it, their poll on the most influential person was hacked. Moot, the founder of 4chan is rated #1. Not only that, but if you read the first letters of the poll results, you get “Marblecake also the game”. This refers to the IRC Chanel where many 4channers congregate as well as “the game” an internet meme. This article is very interesting as it delves into the details of the attack. Focusing mainly on what happened when the autovoting software was shut down due to reCaptcha. you’ve probably seen reCaptcha before. It presents you with two words, made difficult to read by strange kearning, warping, and squiggles. If you can read it, you’re most likely a human. Anon, a common name for 4channers, first tried to hack reCaptcha.

Their attempt at hacking reCaptcha relies on the process reCaptcha uses to identify words. It presents you with two words, one of which it already knows. The other is compared to a database of common responses to that word. Anon decided that if they entered “penis” enough times, they could flood the database allowing their autovoter to function again. This, though clever, was unsuccessful. They eventually settled on manual voting. This was taking too much time, they feared they would never reach their goals. To help with this, they built a simple interface that would preload several reCaptchas and cue up votes. This streamlining allowed them to squeak in the votes they needed to accomplish this.

It’s also worth noting that Time didn’t close the vote entries when the poll closed. They removed the poll from their site, but the streamlined vote software was still working. Anon is a powerful force of nature. If only we could harness it to cure cancer or HIV.

This was certainly the last thing we expected to see today. [ShaunF] has created a Greasemonkey script to bypass the captcha on filehosting site Megaupload. It uses a neural network in JavaScript to do all of the OCR work. It will auto submit and start downloading too. It’s quite a clever hack and is certainly helped by the simple 3 character captcha the site employs. Attempting to do the same thing with ReCAPTCHA has proven much more difficult.