F**K CAPTCHA

March 25, 2011

Using a CAPTCHA is a way of announcing to the world that you’ve got a spam problem, that you don’t know how to deal with it, and that you’ve decided to offload the frustration of the problem onto your user-base. As statements go, that’s pretty lame.

If you ran a high street store, you wouldn’t force your customers to mop the floor before you serve them, on account of the people who came in earlier with muddy boots. That mud is your problem, not theirs. The same goes for spam. reCAPTCHA bothers me the most because it tries to sugar coat users’ frustration and make it palatable to site owners. Helping digitise and preserve literature is a worthy goal, but it’s a task that’s utterly at odds with what your users are trying to do at that very moment.

Sometimes site owners seem to think they really need CAPTCHAs, having been hurt by spam in the past. Without hard evidence, it can be difficult to persuade them otherwise. Well, here’s some good news – I recently got chatting to Chris Korhonen of Animoto, who’s kindly shared some data that could help you talk your clients around.

In case you don’t know, Animoto is a web app that allows users to create video compositions from their photos, video clips and music. According to their press releases, their registered user-base grew from 300k users in August 2009 to 2 million users in November 2010. Roughly speaking, that’s 2,400 new registrations a day – giving them plenty of data to run quantitative research.

In Q1 2009 they ran a simple experiment, looking at the impact of CAPTCHA on registration completion. This is what their signup form looked like at the beginning of the study:

Users were directed to the sign-up form directly from the homepage, before they could interact with the product. As you can see, there was a CAPTCHA at the bottom of the form (powered by reCAPTCHA). With this design, they had a conversion rate of roughly 48%. They then removed the CAPTCHA, which boosted the conversion rate to 64%. In conversion rate lingo, that’s an uplift of 33.3%! They replaced the CAPTCHA with honeypot fields and timestamp analysis, which has apparently proven very effective at preventing spam while being completely invisible to the end user.
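Animoto haven’t published their implementation, but the general approach is straightforward. Here’s a minimal sketch of the two checks combined – the field name and time threshold are my own assumptions, not theirs:

```python
import time

HONEYPOT_FIELD = "website"   # hidden field a real user never sees (assumed name)
MIN_SECONDS = 3              # bots typically submit almost instantly (assumed threshold)

def looks_like_spam(form, rendered_at, now=None):
    """Return True if a submission trips the honeypot or timestamp check.

    form        -- dict of submitted field names to values
    rendered_at -- server timestamp recorded when the form was rendered
    """
    now = time.time() if now is None else now
    # A real user never sees the hidden field, so any value means a bot.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # A human takes at least a few seconds to fill in a signup form.
    if now - rendered_at < MIN_SECONDS:
        return True
    return False
```

The `rendered_at` timestamp needs to be stored server-side (or cryptographically signed in a hidden field) so that bots can’t simply forge it.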

To quote Chris:

“We left the test running until the results were statistically significant to a 99% confidence level. We’ve followed the same testing methodology with other bits and pieces – removing demographic fields, moving things around, and so on, but nothing has moved things more than a couple of percent.”

Got any evidence of your own about CAPTCHAs and conversion rates? Comments, please…

Edit #1: For some reason this article has hit the front page of Hacker News and is getting quite a lot of traffic. I should mention that yes, I acknowledge CAPTCHAs are of course sometimes unavoidable. That doesn’t mean, however, that we should ever feel good about using them, nor should we fool ourselves that users don’t mind them.

However, in a few cases, CAPTCHA can work well as a deliberate barrier when you don’t want optimum conversion.

For instance, I run a website where we have comments on articles. Since we started using CAPTCHA, the proportion of crappy comments has decreased because people have to make more of an effort to leave one.

If you just want to leave a driveby, stupid comment, the effort required isn’t worth it. If you really want to say something interesting, it is. These days, most comments are longer and better thought out than they were before.

Most of the time, CAPTCHA is an unnecessary barrier to conversion, but it’s not always bad.

A honeypot field is a hidden field that a real user wouldn’t see, but a bot would. So on the example form in this post, a honeypot field could be a “state” field, with a corresponding drop-down list of all the US states. Then you hide the field and its label with JavaScript or CSS (ideally obfuscating the fact that you have hidden them – the smart spammers actually detect hidden fields now). A real user of the site would never fill out those fields (because they are hidden), but an automated program that’s just crawling the markup would. If the form is submitted with “state” filled out – bam – it’s a spammer.
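To make that concrete, a minimal sketch of such a honeypot might look like this (the class name and hiding technique are illustrative, not from any particular site):

```html
<!-- Honeypot: hidden from real users, so it must come back empty. -->
<style>
  /* Off-screen positioning is less obvious to bots than display:none,
     which smarter spam scripts now check for. */
  .hp-field { position: absolute; left: -9999px; }
</style>

<p class="hp-field" aria-hidden="true">
  <label for="state">State (leave this field blank)</label>
  <select id="state" name="state" tabindex="-1" autocomplete="off">
    <option value="" selected></option>
    <option value="AL">Alabama</option>
    <option value="AK">Alaska</option>
    <!-- … -->
  </select>
</p>
```

Server-side, any submission where `state` is non-empty gets rejected. The `aria-hidden="true"` and `tabindex="-1"` attributes are one way to keep screen readers and keyboard users away from the trap.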

Note though, that there are accessibility concerns here. Screenreaders like JAWS would pick up on the honeypot fields and legitimate visually impaired users may have trouble with the form. I’m not sure if there are any best practices out there about that.

tetfsu

March 25, 2011

There are a couple of things that you should consider before dismissing the idea of CAPTCHA…

1) CAPTCHA is sometimes used not to prevent spam, but to help prevent brute-force attacks aimed at stealing customer data. CAPTCHA keeps someone from using an automated mechanism to crack into a customer-focused site and get at their sensitive data.

2) Using a honeypot field is a great idea for users who are sighted. But for a blind or low-sighted user who uses a screen reader (a computer), this can cause an issue. In general CAPTCHA causes issues for these users, but programs like reCAPTCHA that offer an alternative audio feature can still be used.

I agree that using CAPTCHA for a simple interaction is onerous in most cases, but I use the examples above more for user account creation, forgotten passwords/usernames and other account-servicing situations.

Gro

March 25, 2011

Lord Inglip is not pleased…

John Strickler

March 25, 2011

I never thought about honeypot fields before. Instead of hiding the field, you could just give it a fixed position off the screen. The issue with screen readers still exists, but it’s at least one more hurdle for the bots to overcome. Cha ching!

In the UK, blocking people who fill in honeypot fields and thus locking out all users of screenreader software would be illegal under the Disability Discrimination Act. I’m guessing that the Americans With Disabilities Act would similarly be a problem.

So it’s not just a case of “Oh well, I don’t care about those blind people, I just wanna block spammers!” – you’re creating a liability for your organisation.

Also, under the UK legislation, “offering a service” covers free services too – so you can’t make an excuse that you’re not charging for something.

So anyone got any ideas of how you create a honeypot that allows real sight-impaired people through but catches spammers?

Chris –
“Worth noting though, that there is a difference between simple sign up forms and forms that process credit cards, where the repercussions of spam can be much more serious”

From the business’s point of view, though, the repercussions of dropping genuine users on a credit card form are also much more serious – i.e. lost revenue, and possibly a loss of repeat customers.

CAPTCHAs should only ever be implemented after a rigorous cost/benefit analysis. Too often I see website owners allowing developers and techies to make a decision to add one as a quick technical fix to a problem, without considering or even understanding what the repercussions might be.

Dylan

March 25, 2011

So it looks like your method of filtering is to manually go through and filter and moderate comments (i.e. the spam for enlargement and the follow-up “suck it, lol”). While that’s all well and good on privately owned sites and small personal blogs, this kind of filtering wouldn’t be a viable solution for large-scale operations such as a big community forum or a branch website of a company. In fact, any website where articles are fetching tens of thousands of page views or more wouldn’t be able to go through and meticulously moderate quality. It’s just not a legitimate solution.

Hey, if you like manually moderating comments and posts for quality, more power to ya! I love an in depth discussion.

I guess my point is, in regards to generating traffic, if you’re going to slam CAPTCHA and reCAPTCHA, you have to equally slam honeypots (…unless you really, really hate the visually impaired). But those systems are in place to create a legitimate level of quality that doesn’t require a staff all its own. It’s not perfect, but that’s why it’s there.

In various projects I use honeypot fields and other tricks to change the default form submission – for example, the names of fields can be changed by adding a date-time stamp, among other small hacks. And no problem with spam bots.
I think this is the right way.

CAPTCHAs are useless. I’ve been a programmer for 25 years and I’m too stupid for every second CAPTCHA. 50% of them are in Hebrew, Russian, Chinese or some other unreadable letters, and the rest are math formulas. I can put in anything I want and it doesn’t fail.

Most of the forum-post apps I have seen used for boosting a site’s PR via bot-generated forum posts already have a place to put your username and password for DeCaptcha, a service that charges mere pennies to automatically break reCAPTCHA. I agree that the trolls a CAPTCHA will stop are likely the lazy ones.

I use a variation of the honeypot that I believe is accessible as well.

The basic form includes an extra field with a simple question, that has to be answered.

This makes it accessible.

I then have some javascript that answers the question and hides the field. So for users that have javascript, they don’t even know the question was there.

So far it works quite well. The only spam I have seen get through looks like manually posted product promotions.

Like all ideas, if it gets too popular the spammers will get wind and work out ways to hack it. So shhhh, don’t tell anyone ;-)
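For what it’s worth, the server-side half of that scheme could be sketched like this – the question, field name and expected answer are invented for illustration:

```python
# The form shows a simple question, e.g. "What colour is grass?",
# with a text input. Client-side JavaScript fills in the answer and
# hides the field, so users with JS never see it; users without JS
# (including many screen reader users) can answer it by hand.
CHALLENGE_ANSWER = "green"   # hypothetical expected answer

def passes_challenge(form):
    """Accept the submission only if the challenge was answered."""
    return form.get("challenge", "").strip().lower() == CHALLENGE_ANSWER
```

Bots that merely crawl the markup submit the field empty (or stuffed with junk) and get rejected, while both scripted and non-scripted humans pass.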

Vlad

March 26, 2011

On honeypots and screen readers: I don’t see a problem here. Just state in plain English in the field description that it should not be filled in. The majority of users won’t see it, bots won’t understand it, and those who hear it will know to ignore it.

bryan

March 27, 2011

I’m confused about the timestamp analysis thing because (admittedly I’m skimming) it seems to assume there’s a limit to how long you should take to fill out a form, and if you go over that, it means you’re a bot? Wouldn’t the opposite be true?

Bryan – it’s the other way around – bots are typically very fast, humans typically take much longer. It’s an arms race, though. CAPTCHAs are like nukes – they are known to be quite effective (at blocking bots), but they cause a lot of collateral damage (by annoying or turning away users).

bryan

March 28, 2011

“Bryan – it’s the other way around – bots are typically very fast, humans typically take much longer.”

That’s what I thought, I found the tutorial confusing on this issue due to these parts:

“In other words, your visitor will have a limited amount of time (specified by you) to fill in the form and send it.”
and in “Checking the Form” which says

* Has too much time elapsed?

but the logic on that seems backwards – shouldn’t I want to verify the form is not submitted too quickly? I might also check that it is not waiting too long to submit, but I am going to assume that a bot is more likely to submit too quickly than too slowly.

Also, the code seems to want things to go fast (I don’t know PHP very well, so I could be wrong on this):

Problem is that whenever a popular site (like a blog platform) comes up with a smart captcha idea, it gets cracked and they have to go back to recaptcha or whatever. :(

The tricks like those in the mentioned links work well for little personal sites only.

Neej

April 8, 2011

Captchas really aren’t going to be beaten – sorry. The methods outlined here, as mentioned, are already detected and accounted for by some of the more advanced link spamming applications.

Of course, experienced link spammers either make their spam not look like spam, or use techniques that make it impractical to find.

For example if I wanted to link spam blogs using Scrapebox I’d search for blogs to do with topic A then leave a comment that was relevant to topic A. It looks like a legitimate comment.

Or if I was link spamming forums using Xrumer – a practice that really only works effectively in large link-wheel-type backlinking campaigns, despite the hype surrounding Xrumer in the BH SEO world: the forum links point to other, higher-PR pages containing links to the money page I’m promoting – I make new accounts, leave them for a few days, and make a few innocent posts, usually in the Introductions or anything-goes forum. Then, after a few months, when the posts are buried deep and hence unlikely to be dug up, I go back and add my links to the account profile.

Good luck wiping this practice out.

If anything, CAPTCHAs *help* serious link spammers, because they provide a barrier to newbs who either can’t or won’t cough up the dollars to solve them through DeCaptcha, Death By Captcha and other similar services. Meaning less spam, and more chance of our links remaining undetected and giving maximum link juice.

The timestamp thing is a nightmare: guess who it hits? Not bots, who are fast. It hits screen reader users, dyslexics, and the same kinds of people who have issues with the CAPTCHAs in the first place. Timed forms are a known barrier. Avoid them. Oh, and now you require scripting to have a client communicate with a server, breaking the whole idea the web was built upon (HTML and HTTP). Good job! Let’s just build the whole form in Flash and be done with it. All the cool people are doing it.

Honeypots don’t screw with screen reader users if you aren’t stupid about them. It’s a form input, meaning it gets a label (what? you don’t use labels? please stop building forms then), and like every good label it should tell the user what to do (go ahead and user-test: if people keep filling stuff in because they are conditioned to fill in inputs, then go ahead and tell them what to fill in and filter that out in the back). I use honeypots and they don’t block out screen reader users (unless they can’t read at all, in which case, they aren’t filling out my forms in the first place… so note: I’ve seen English honeypots (and English skip links) on non-English forms/pages. Lolwut).
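That “tell them what to do and filter it in the back” idea could be sketched like this – the field name and instructed word are my own invention:

```python
# The honeypot's visible label says e.g. "Leave this blank, or type
# the word 'human'". Users who are conditioned to fill in every input
# type the instructed word, which the back end treats as legitimate;
# bots stuffing links or junk into the field get rejected.
ALLOWED_VALUES = {"", "human"}

def honeypot_ok(form):
    """Accept empty or instructed values; reject anything else."""
    return form.get("nickname", "").strip().lower() in ALLOWED_VALUES
```

This keeps the honeypot fully labelled and screen-reader friendly, at the cost of one extra line of filtering on the server.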

The backend’s job is to filter the spam, not the client’s. I never understood this idea that filtering stuff on the client end was somehow good, secure, or worth the hassling of users. Kinda like expecting everyone to have JavaScript running. Let’s go further and assume everyone is running IE on Windows with the Flash plugin. After all, anyone who doesn’t clearly is too poor to buy your products anyway, so you sure don’t want your time wasted by those sorts of ruffians and scoundrels and crazy paranoid Stallman types, lol.

Anyway except for the jQuery bull, excellent article. I fight CAPTCHAs every day, but I bother less and less. WebVisum plugin for the win.

Regarding the audio alternative currently used in reCAPTCHAs – it is completely unusable.

This is because the amount of background noise and word distortion makes it impossible for humans to guess. Just select “get an audio challenge” on their demo page: http://www.google.com/recaptcha/learnmore

Great article. We just launched a new product here in Sweden, and before launching the website we had some thoughts about using reCaptcha, but we ended up using honeypot fields instead.
The main reason for this – other than reCaptcha being freakin’ annoying – is that the majority of our user base is “older” people (50+). And we really didn’t want to add more friction for them when contacting us or signing up.

CAPTCHA can be really helpful sometimes, though. For example, on login forms, if the user enters the wrong password too many times, you can show a CAPTCHA instead of blocking the user after X errors (something you might still want to do eventually).

Our IT department are pushing for us to use CAPTCHA on the email form on our site, but we are resisting because of the impact on the average user. For now, we’ve said we’ll look at it again if spam becomes an issue. So far the email filters do their job and we only see a few spammy emails a day, so it’s at a manageable level.

The article, and the comments, give me some data to back up our view point, and some alternatives for the next conversation.

What about using an unlock widget like the one on an iPhone? The user just has to swipe the ‘Unlock’ slider to prove they’re human. That would work for people using a mouse or tablet, though I’m not sure what I’d do about keyboard-only navigation. A screen reader wouldn’t work either. Perhaps I’d use a different mechanism for screen reader and keyboard-only users. I need something quite robust for a financial system.