Breaking and fixing the world’s primary spam defence: CAPTCHA

The basic concept of a captcha is to distort a string of text so that automated scripts cannot discern the individual letters. As you are well aware, this usually involves a funky background, horizontal and vertical lines, or squishing, rotating, and blurring the text until it’s almost unrecognizable. The idea is that the human brain is very, very good at picking obfuscated words out of the mess, while computers are not.

The problem is, there is no standard way of making a captcha — everyone does it differently — and until this Stanford research project, there hadn’t been a systematic study of which obfuscation method is the best. To this end, the team created Decaptcha, a script that analyses, breaks apart, puts back together, and ultimately solves most captchas on the market.

Pre-processing

The first step in attacking each variety of captcha is pre-processing, which basically involves removing any non-letter content — colorful or distracting backgrounds, or crisscrossing lines — from the image. For the most part, this seems to be surprisingly trivial: a Gibbs de-noising algorithm reliably removes noise and lines from a captcha, and if that fails, a Hough Transform usually works (images above).
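To make the line-removal idea concrete, here is a minimal sketch of a Hough Transform in plain Python. It is not Decaptcha's code: the function names `strongest_line` and `erase_line`, the toy image, and the half-pixel tolerance are all assumptions made for illustration. Every dark pixel votes for each line that could pass through it, the best-supported line is taken to be the distractor, and the pixels on it are erased.

```python
import math

def strongest_line(pixels, n_theta=180):
    """Hough voting: return (votes, rho, theta) of the best-supported line.

    Each dark pixel (x, y) votes for every line
    x*cos(theta) + y*sin(theta) = rho that passes through it,
    with theta sampled in n_theta steps over [0, pi).
    """
    accumulator = {}
    for x, y in pixels:
        for step in range(n_theta):
            theta = math.pi * step / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(rho, theta)] = accumulator.get((rho, theta), 0) + 1
    (rho, theta), votes = max(accumulator.items(), key=lambda item: item[1])
    return votes, rho, theta

def erase_line(pixels, rho, theta, tolerance=0.5):
    """Drop every pixel lying within `tolerance` of the given line."""
    return {
        (x, y) for x, y in pixels
        if abs(x * math.cos(theta) + y * math.sin(theta) - rho) > tolerance
    }

# Hypothetical toy image: two vertical "letter" strokes crossed by one
# horizontal distractor line at y = 5.
strokes = {(4, y) for y in range(1, 9)} | {(12, y) for y in range(2, 8)}
line = {(x, 5) for x in range(20)}
image = strokes | line

votes, rho, theta = strongest_line(image)
cleaned = erase_line(image, rho, theta)
print(votes, "pixels voted for the strongest line;", len(cleaned), "pixels remain")
```

Note that erasing the line also clips the two stroke pixels it crosses. A real solver would have to repair those gaps before trying to recognize the letters, and would more likely rely on an image-processing library than on a hand-rolled accumulator like this one.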


http://twitter.com/m0r1arty James Moriarty

I’m fairly certain Xrumer and the like have been breaking Captchas for a while already, nice to see the academics explaining how they’ve been doing it though – sadly it means spammers’ll up their game.

Good for the immune system of the internetz, but bad for the usability of us simple carcinoma cells using it.

I’m not sure this will ever kick off as an alternative, however it is pretty pleasant: captchathedog.com

I’m back in the UK for a bit Seb, looking forward to reading ET more often – really like how the site is going these days – Well done to you and your team!!

http://www.mrseb.co.uk Sebastian Anthony

Hola! I’m actually in the US at the moment — in New Jersey for a few months (back soon, though).

There are definitely lots of good alternatives to the text captchas, and I’m sure they’ll be explored in due course. The main problem, as far as I know, is that you can ultimately pay people to break the captchas for you. There are internet farms in China where you pay X cents per captcha…

Thanks!

http://pulse.yahoo.com/_T63YOKOKOKPOY3UVPUMI563QK4 Aokay

There would be a side benefit of stamping out CAPTCHAs. People with deteriorated vision — like me — wouldn’t be driven nuts! Also, it’s very hard to see the threat to some of the websites using the damned things. I predict that somebody in the US will soon be unleashing ADA against them. And don’t try to cool me out by telling me about audio alternatives — I have bad hearing too!! [ggg]

http://www.mrseb.co.uk Sebastian Anthony

Yea, the eyesight/hearing thing is a serious issue, and I’m not sure there’s an easy way around it :(

Anonymous

Fascinating article – but why does Sebastian Anthony presume that spam bots are only or even primarily found in Russia or China? Can he present any evidence at all for this rather convenient assumption? As is commonly known, it is extremely difficult to determine where bots originate; is Mr Anthony’s ascription of Russian or Chinese origins to them based upon anything more than prejudice? As to internet farms which employ people to break captchas, my understanding has been that these are primarily located in India (note this Ars Technica article from 2008: http://arstechnica.com/security/news/2008/09/captchas-flummox-bots-but-may-be-doomed-by-captcha-farmers.ars). What evidence can Mr Anthony adduce to demonstrate that these farms have moved to China?…

Henri

http://www.mrseb.co.uk Sebastian Anthony

Does that article say that such farms are primarily located in India? I’m sure internet farms exist there — they exist in any country where people can earn more online than from real-world jobs — but I doubt India is the mecca.

I mention Russia because it famously has a ton of pharmaceutical spam barons, and a lot of botnets seem to have Russian connections. I mention China because of its history with game (WoW) farming. To be honest, I don’t know which country has the most spam bots — again, I suspect they’re quite evenly spread out amongst the developing world.

Thanks for taking the time to comment!

Anonymous

Sebastian, the Ars Technica article to which I linked above is based upon – and links to – a ZDNet blog by Dancho Danchev dated 29 August 2008 (http://www.zdnet.com/blog/security/inside-indias-captcha-solving-economy/1835), which provides some detail on what Danchev calls the Indian «captcha-solving economy». Admittedly, a great deal may have changed in the three years since the blog was published, but if, as you say, you have no concrete evidence that this lucrative branch has moved to China or Russia, would not the wisest course be to abstain from speculation in that direction?…

Use of this site is governed by our Terms of Use and Privacy Policy. Copyright 1996-2015 Ziff Davis, LLC. PCMag Digital Group. All Rights Reserved. ExtremeTech is a registered trademark of Ziff Davis, LLC. Reproduction in whole or in part in any form or medium without express written permission of Ziff Davis, LLC. is prohibited.