Cracking ReCAPTCHA


I was browsing Dark Reading today and came across an article they published four days ago. A researcher has broken reCAPTCHA, a CAPTCHA tool that many websites use to tell the difference between a human and a computer. It is designed to prevent automated programs from creating accounts in bulk, which in most cases is done in order to abuse a particular service.

A researcher earlier this month demonstrated how he solved Google's reCAPTCHA program even after recent improvements made to the anti-bot and anti-spam tool by the search engine giant.

Chad Houck, an independent researcher, also released the algorithms he wrote to crack reCAPTCHA. Houck had published a white paper on the hack prior to presenting his research at Defcon in Las Vegas, and says that Google made several fixes to reCAPTCHA that defeated several of his algorithms before he was scheduled to give his presentation. He then quickly came up with a few additional approaches with his algorithms, and says he was able to beat the updated reCAPTCHA 30 percent of the time.

"[ReCAPTCHA] has never been wholly secure. There are always ways to crack it," says Houck, whose algorithms have been available online since Defcon. "The information [about the research] is out there. Google still hasn't changed it, which kind of surprises me." Google, however, thus far has not seen any signs of this being actively used in the wild.

…

ReCAPTCHA, which was originally created by Carnegie Mellon University and later purchased by Google, basically protects websites from bots and spam by generating distorted text or words that humans can read, but software or optical character readers cannot. The words used by the reCAPTCHA program come from books that are being digitized. The program, which runs on many major websites as a way to validate that the user on the site is a human and not an automated bot or spammer, presents the user with two real words to type into a box, one of which is for verification and the other for digitization purposes.

…

Just how difficult would it be for a bad guy to exploit this? "As long as you know how to program well enough, it would take a day to implement my algorithms," he says.

I would say that this somewhat qualifies as news. On the one hand, reCAPTCHA nicely dovetails with Google’s mission to digitize all of the world’s books (that the publishers will let them). While people are busy solving these CAPTCHAs, at the same time they are putting books into digital format which assists in their redistribution. In essence, Google is killing two birds with one stone – they are preventing abuse of their systems, and at the same time capturing information in preparation for its dispersal to everyone else (or as one Objectivist put it, the only resource that requires redistribution is knowledge).

ReCAPTCHA has become very popular and a lot of sites use it because it is free and it is (was) secure. However, on the flip side, the fact that a CAPTCHA is broken doesn’t really qualify as news. We have known for years that CAPTCHAs are broken and this has been accomplished by a couple of different methods:

Spammers (or malware authors) hire people offshore in developing countries and pay them to create accounts. In essence, the spammers are still absorbing a cost, but they have circumvented the problem of deciphering unreadable text while keeping the cost/benefit ratio in their favor. The clock is ticking on this mechanism, though. As more and more countries are lifted out of poverty, wages there will go up. Eventually it will not be cost effective for spammers to outsource labor in this fashion.

Of course, it will be decades before that finally happens, so while the clock is ticking, it’s a very long clock.

Software already exists that breaks CAPTCHAs. This is kind of the point of my post. These types of security measures have had pseudo-effectiveness. There are lots and lots of abusive accounts being created on Hotmail, Yahoo, and Gmail, and abusive content hosted on Windows Live Spaces, SkyDrive, Yahoo Groups and Google Blogspot. All of these services are free to sign up with and all of them are protected by CAPTCHAs. However, we continue to see lots of spam, malware, and other sundry subterfuge on these services. Working backwards, it doesn't take a genius to figure out that the spammers have found a way to break the CAPTCHAs used to protect those sites.

To be sure, all of these services periodically go back and update their CAPTCHA algorithms, and the spammers are defeated for a while. However, the spammers react, tweak their own software, and eventually go back to breaking the algorithms used to stop them from abusing the service. Now, when I say 'break', I mean that they are successful maybe 10% of the time. However, if something is successful 10% of the time and you can do it over and over at essentially no cost, it basically means that you have succeeded in breaking the protection.
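To put a rough number on why a 10% solver counts as a break, here is a minimal Python sketch. It assumes each attempt is independent and succeeds with the same probability, which is a simplification (a real service may rate-limit or block repeat offenders). The function names are my own, made up for illustration.

```python
import math


def p_at_least_one(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Average number of tries per solved CAPTCHA (geometric distribution)."""
    return 1.0 / p


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of tries whose overall success chance reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.10  # the "maybe 10% of the time" figure from the paragraph above
    print(f"Average tries per account: {expected_attempts(p):.0f}")
    print(f"Chance of success within 10 tries: {p_at_least_one(p, 10):.1%}")
    print(f"Tries needed for a 99% chance: {attempts_needed(p, 0.99)}")
```

Under those assumptions a bot needs about 10 tries per account on average, has roughly a 65% chance of getting through within 10 tries, and reaches a 99% chance by 44 tries. For software that can retry in a loop, that is barely a speed bump.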

So, I take issue with the idea that this is news in the sense that it is "new", or that we haven't seen this before. What makes this newsworthy is that a service that was supposed to serve the dual purpose of implementing security and saving the world might not be able to do both after all.