While most of this ongoing series on WTF Blog Clutter has focused on the blog sidebar and design elements, a big source of clutter is the continued use of CAPTCHAs on comment forms, in the misguided belief that they stop comment spammers. NOT.

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It was created on the premise that humans can read distorted letters and numbers that computers can’t, so automated scripts and bots can’t leave comments on your blog. Pass the test and you’ve earned the right to comment. Except that computers have been breaking and bypassing CAPTCHA techniques with ease for years.

Websites use Captchas in an attempt to disrupt the spam and malware economy – but they are not working. “Spammers and malware authors are able to break Captcha process,” says Carl Leonard, a threat research manager at Websense Security Labs. “As a result, we’ve seen an increase in the amount of mail sent out from reputable mail services such as Gmail, Hotmail and Windows Live Mail, and an increase in the number of blogs that host malicious content, or content that the spammers wish to advertise.” Email accounts on such services are particularly valuable because spam filters cannot block them without also blocking genuine mail.

Techniques to break Captcha are nothing new. First, if a human can read an image then the chances are that software can do the same thing. In 2005, a software developer, Casey Chesnut, wrote a Captcha-breaking algorithm and demonstrated it by posting automated comments to nearly 100 blogs to expose their vulnerability. In response to this kind of attack, Captcha authors have devised tests that are harder to solve. Images may be more squiggly than they used to be, making them harder to break but also more troublesome for legitimate users. Other ideas include 3D Captcha, relying on object recognition rather than character recognition; or framing questions that are trivial for humans to answer but hard for software to parse. Some approaches work better than others, but there are a number of inherent problems. One is that many Captchas are inaccessible to the visually impaired, and will fall foul of accessibility legislation unless there is an alternative. Another snag is that spammers may play their trump card: using humans.

WTF CAPTCHAs

When I talked about fighting comment spam in WordPress at WordCamp Dallas this year, I loved the audience’s reaction to the images I showed of WTF CAPTCHAs I’ve collected over the years. The crowd laughed and applauded. They had experienced their own WTF moments facing these ridiculous combinations of letters and numbers that stand between you and leaving a comment or taking the next step.

I’ve found these ugly images on some of the most beautiful blog designs ever. Designs that have me swooning with envy at their clean, clear colors and design elements. Combined with great content, these are nearly orgasmic creations, until the phone rings, there’s a knock at the door, and everything screeches to a halt when I scroll down to leave a comment and find one of those blotchy, out-of-focus, clunky CAPTCHA scripts. CAPTCHA interruptus.

I’ve tried to stare down many a CAPTCHA trying to solve its mystery. I’ll tilt my head to the side, the other side, and even consider flipping my monitor or myself upside down to figure them out. I’ve spent so much time trying to figure out the CAPTCHA message I’m supposed to solve that I’ve often forgotten what I wanted to say in the first place.

When I haven’t been able to clearly see what it is asking me to do, out of desperation I’ve clicked the “If you can’t see this, click here” link only to have the entire page reload – and lose my comment! ARGH. WTF!

Don’t insult our intelligence with one of those dumb torture test quizzes, either. How much is 4+6? What is my name? What is the name of this blog? Are you a spammer?

Come on, folks! These are CAPTCHAs in disguise and they have been broken for ages. Any decent comment spam attacker can bypass these, and if they haven’t, they probably will by tomorrow. You can’t keep these updated fast enough to outwit the comment spammers.

I’ve talked to a lot of bloggers and blog readers about CAPTCHAs and torture tests. A lot of them say that they won’t comment on blogs with those features installed.

Those who do often face an unfriendly white page showing only another version of the CAPTCHA that says, “Wrong Answer. Please try again.” Or they get “Thank you for your comment” with no way back to the blog unless they hit the browser’s Back button. Even then, you probably won’t see your comment until you refresh the page. You won’t know whether it was accepted, is in moderation, or anything else. Do you have the time, and enough familiarity with your browser, to do that?

CAPTCHAs and torture test quizzes are painful to users, and to the inexperienced blog reader, they can be exceptionally frustrating. Why bother when they don’t work?

Is There Hope for CAPTCHAs?

While I believe that CAPTCHAs have no place on blogs in the comment box, there is still a great need to improve security when it comes to logins and registration. These are highly sensitive areas. You don’t want just anyone to log into your blog or website and create havoc. There have been improvements in this area, but when it comes to the comment box – let comments come without a ticket to the dance. Everyone’s welcome to join in.

The article goes on to explain that some, like Microsoft, continue to invest in improving their CAPTCHA systems even as attackers work harder at breaking them. Meanwhile, community-based protection such as Akismet and Defensio Anti-Spam is actually doing a better job of keeping comment spammers at bay.

That said, the internet is a long way from adopting this level of security, and there is always a danger that whatever steps the industry takes to improve authentication, the scammers will keep up with innovations of their own.

The answer from Matt Mullenweg, co-founder of WordPress, is to focus on the content rather than the user. His Akismet system for preventing spam comments relies on a combination of secret algorithms and community reports, and has proved remarkably effective.

“Ultimately Captchas are useless for spam because they’re designed to tell you if someone is ‘human’ or not, but not whether something is spam or not. Just because something came from a real human being doesn’t mean it isn’t spam, which is why content-based solutions like Akismet are the only long-term solution to the spam problem.”

If you are really worried about comment spam on your blog, I’ve long recommended a multi-layered approach, though I’m finding that Akismet does the job alone extremely well on my WordPress.com blog. By working together I believe we can protect each other from comment spam, but we also need to do more.

What do you think about CAPTCHAs in other settings such as wikis? I know that if we turned off the CAPTCHA mechanism for registration and left the system pretty much open, we’d be flooded with bot edits in Chinese for Warcraft Gold Mining. And I’ve seen a number of untended wikis that have basically been overrun with those edits. You really don’t have to use the CAPTCHA if you’ve registered, verified your e-mail address, and aren’t inserting links into your edits; otherwise, CAPTCHA it is. That seems to work, but still… Every barrier to editing is annoying and may potentially hurt a project, so it isn’t ideal.
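The rule described in this comment is simple enough to write down. Registered users with a verified e-mail address skip the CAPTCHA unless their edit adds links, and everyone else always gets one. Here is a minimal Python sketch under stated assumptions: the `Editor` type, `needs_captcha`, and the crude `https?://` link check are hypothetical illustrations, not the API of MediaWiki or any of its extensions.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately crude link detector: counts "http://" and
# "https://" occurrences in the wikitext.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


@dataclass
class Editor:
    registered: bool
    email_verified: bool


def count_links(text: str) -> int:
    return len(LINK_PATTERN.findall(text))


def needs_captcha(editor: Editor, new_text: str, old_text: str = "") -> bool:
    """Decide whether an edit should be challenged with a CAPTCHA.

    Trusted editors (registered and e-mail verified) skip the CAPTCHA
    unless the edit contains more links than the previous revision.
    Everyone else always gets the CAPTCHA.
    """
    trusted = editor.registered and editor.email_verified
    adds_links = count_links(new_text) > count_links(old_text)
    return (not trusted) or adds_links


if __name__ == "__main__":
    anon = Editor(registered=False, email_verified=False)
    member = Editor(registered=True, email_verified=True)
    print(needs_captcha(anon, "Fixed a typo"))              # True
    print(needs_captcha(member, "Fixed a typo"))            # False
    print(needs_captcha(member, "See http://example.com"))  # True
```

Comparing link counts keeps the sketch short, but it does not catch an edit that swaps one existing link for a spam link. A real wiki would need to compare the actual URLs.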

The fact that spammers target MediaWiki sites is particularly frustrating, as MediaWiki’s default is rel=nofollow. (You can turn it off, but considering the nature of wikis and what you could be linking to, it just doesn’t seem smart in terms of PageRank.) There just doesn’t seem to be any inherent benefit to spamming wikis because of that.

CAPTCHAs for wikis are often registration-based defenses, so I recommend finding whatever tool is available for protecting wiki registration. It might just be a CAPTCHA. I understand how horrible that is; I work on many wikis.

As for “nofollow”, it doesn’t work on anything, so it doesn’t matter whether you have it turned on or not. It doesn’t hold weight any more, as no one other than Google adopted it, and even they don’t take it seriously. A lot of stuff marked nofollow is indexed by Google. It might not get PageRank, but if it can be found in a search, who cares?

I know it’s frustrating, but if CAPTCHAs aren’t working, then what do you use on a wiki? I’m not sure about that. Anyone?

For wikis, if CAPTCHAs don’t work, you’re basically left with three options:

1. Make sure you patrol like hell to remove wiki spam, delete articles created by spambots, and block robots from accessing special pages where you can see the history of early articles. Encourage contributors to patrol. Ban bot-created accounts as soon as they are created. This is really time-intensive.

2. Require that anyone who edits verify their e-mail address before editing. Still patrol, because you’ll get human-edited wiki spam or people who just can’t follow the rules about content allowed on a wiki.

3. Use CAPTCHAs for people who haven’t verified their e-mail addresses.

One is kind of the ideal, but really time- and resource-intensive. kei.ki does that. Two just puts too many barriers to editing in front of a lot of people. If you just want to correct a spelling error (eror to error, for example), then having to register is a turn-off. Blah.

There’s no real solution for wikis, with two exceptions. One is wiki software with so little market share that nobody has written bots for it. The other is a wiki protected from editing by a shared password. (I believe the latter is the case for BarCamp’s PBWiki-based wiki. You need the password to edit.)

And ouch. I had read that nofollow wasn’t used much, but if it isn’t, why hasn’t it been deprecated as a feature in MediaWiki? (And that is digressing.)

I absolutely agree with Matt. CAPTCHAs are absolutely useless nowadays. Somebody in a foreign country will break them anyway. Sure, they’ll “reduce” spam, but they won’t eliminate it. Plus, they’ll just annoy your users.

Spammers are getting more and more sophisticated. We spend a lot of our time analyzing their behavior, and let me tell you that they’re getting smart, very smart! We can’t rest on our laurels; we have to evolve with them.

Of course, the big problem with filtering systems such as Akismet is that there’s always the theoretical possibility of false positives. Even worse, a lot of scripts (possibly even Akismet?) don’t properly inform users when their comment has been identified as spam, so they’re left confused and may even write and submit it again.

I haven’t used Akismet in a while as several false positives worried me, plus I didn’t have the time to regularly dig through all the filtered comments and work out which if any were legitimate.

While I do believe CAPTCHAs are nearing the end of their life, it’s worth bearing in mind that they do have two positive aspects:

1. They virtually eliminate comment moderation for the blog author
2. If you use one unique to your site, it’s very unlikely to be broken and at the very least will last several months

Theoretical possibility? No, it’s a fact. Akismet and the rest do the best they can with what they have. They rely on their own data and research and on information they collect from others. So there will be problems along the way, as some people report legitimate comments as spam and others remove spam comments from the queue. It’s the nature of a collaborative method. But it works great even with those issues to deal with.

As for the two positive aspects you list, the article states that CAPTCHAs do not eliminate comment moderation. Spammers know how to break through, so comment moderation remains a choice. As for the last, a unique system sounds great, but I’ve seen them broken in minutes. There are only so many ways to come up with a unique system. If you spread it across multiple sites and make it available to the public, it will be broken fast, because it is no longer unique.

I’m sure someone will come up with a system that will bring back the CAPTCHA. Until then, I vote we work as hard as possible to put attackers, sploggers, and spammers out of business before they reach the web, rather than trying to stop them once they get on. That would be the perfect world, wouldn’t it? :D

They can be broken quickly, but that’s a very different matter to whether or not they will be broken quickly. Breaking a unique CAPTCHA is generally not done automatically, so the popularity of your blog is a very big factor in whether or not it’ll be broken. And if you put up a fight and change it after being broken, you may win a war of wills quite quickly.

A lot of people could very happily get on with an easy-to-read unique CAPTCHA if their blog is one of the less popular ones, which of course describes a huge number of blogs. Because an unbroken CAPTCHA essentially eliminates comment spam, it also reduces moderation workload significantly. That makes CAPTCHAs more suitable for those with little time to dedicate to moderation.

I’m only really making sure there’s another perspective to this, because a CAPTCHA could be more appropriate for some people based on their specific circumstances. I’ve seen several blogs (including one of my own) receiving many thousands of visits and dozens of comments per month, and thanks to their unique CAPTCHAs the spam throughput was almost non-existent.

Probably worth noting is that I no longer use CAPTCHAs myself. I do believe the comment submission process should be as easy for users as possible, even if it does burden blog authors more. But not everyone is like me and takes their blog so seriously they’re prepared to sift through comments checking for false positives every day. I think the “Using CAPTCHAs is evil!” tone of this article is slightly too hard on such people.

While we’re on the subject, does anyone know anything that can present a notification to people if their comment has been caught by Akismet? As far as I can tell it just goes into a black hole, which is very unintuitive for those unfortunate enough to be falsely identified as spammers.

A simple ‘Click here if your comment hasn’t appeared!’ message after posting would help a lot, making it easy for people to seek help. I get people who find my contact page and let me know anyway, but I’d imagine many either get confused and run away, or try to post it again thinking there was an error.
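The “tell people what happened” idea can be sketched as a function that picks the message shown after posting, based on what the moderation or spam filter decided. This is a minimal Python sketch: the `CommentStatus` values, `post_submit_message`, and the `/comment-help` path are hypothetical, and real blog software and spam-filter plugins use their own status names and hooks.

```python
from enum import Enum


class CommentStatus(Enum):
    # Hypothetical status values; real blog software uses its own names.
    APPROVED = "approved"
    PENDING = "pending"  # held for moderation
    SPAM = "spam"        # caught by the spam filter


def post_submit_message(status: CommentStatus, help_url: str) -> str:
    """Return the message shown after posting, so no comment vanishes silently."""
    if status is CommentStatus.APPROVED:
        return "Thanks! Your comment has been published."
    if status is CommentStatus.PENDING:
        return "Thanks! Your comment is awaiting moderation, so there's no need to repost it."
    # Spam: stay neutral rather than announcing "flagged as spam", but give
    # a falsely flagged reader a way to ask for help.
    return f"Your comment hasn't appeared? Click here: {help_url}"


if __name__ == "__main__":
    for status in CommentStatus:
        print(status.value, "->", post_submit_message(status, "/comment-help"))
```

Keeping the spam message neutral is a design choice. It gives falsely flagged readers a path to help without confirming to a spammer that the filter caught them.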

For the time being I just have an ‘If your comment doesn’t appear immediately after posting, please use the comment restoration form’ message below the comment submit button and hope people will see it. The form is naturally a simple contact form that asks almost nothing of the user beyond a name. Even that isn’t necessary: the time of the submission is enough for me to know to look in the spam tray and reveal it.
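The restoration form only needs a submission time (and optionally a name), so finding the caught comment amounts to a time-window search of the spam tray. Here is a minimal Python sketch under stated assumptions. The `HeldComment` record, `find_candidates`, the 15-minute default window, and the sample data are all hypothetical. The sketch is not tied to any real blog platform’s database or to Akismet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class HeldComment:
    author: str
    submitted_at: datetime
    text: str


def find_candidates(
    spam_tray: List[HeldComment],
    reported_time: datetime,
    name: Optional[str] = None,
    window_minutes: int = 15,
) -> List[HeldComment]:
    """Return spam-tray comments submitted within the window around reported_time.

    If the reader gave a name, keep only comments by that author
    (case-insensitive). Results are ordered closest-in-time first.
    """
    window = timedelta(minutes=window_minutes)
    matches = [c for c in spam_tray if abs(c.submitted_at - reported_time) <= window]
    if name:
        matches = [c for c in matches if c.author.lower() == name.lower()]
    return sorted(matches, key=lambda c: abs(c.submitted_at - reported_time))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    tray = [
        HeldComment("Alice", datetime(2008, 5, 1, 12, 0), "Great post!"),
        HeldComment("cheap-pills", datetime(2008, 5, 1, 12, 10), "Buy now"),
        HeldComment("Carol", datetime(2008, 5, 1, 14, 0), "I disagree"),
    ]
    for c in find_candidates(tray, datetime(2008, 5, 1, 12, 3)):
        print(c.author, c.submitted_at)
```

Filtering by name is optional here. That matches the point that the submission time alone is usually enough to narrow the spam tray down to a handful of comments.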

Currently, comments often land in Akismet’s queue because someone reported them as comment spam or because they used a keyword combination that triggered a reaction. The only way to get your comment out is for the blogger to find it and remove it from the queue. If you think your comment has been caught, ask the bloggers whose blogs you’ve commented on to search their queues and mark it as NOT SPAM.

I think it would be nice to get a notification, but then they’d have to notify EVERYONE in the queue, including comment spammers. That’s a lot of work.

A message to the blogger when a comment doesn’t appear is a nice idea. Want to write the Plugin for that? :D
