There is no evil like reCAPTCHA

Many things start out as a mere annoyance and eventually grow into somewhat of an affliction. One particularly dark and insidious thing has more than reared its ugly head in recent years, and is now far more accurately described as an epidemic disease.

18 Comments

I hate reCAPTCHA with a deep-rooted passion. It’s insidious, annoying, probably doesn’t work, and all you’re doing is helping Google by playing the role of a dumb bot. It’s dreadfully dystopian.

Yes X 1000. It is annoying and sometimes broken causing denial of service. It enables google to track users. To make things worse it’s preinstalled in lots of content management systems (including osnews when you guys first switched to wordpress). I will protest the use of google recaptcha everywhere I encounter it!

If your forum is for an interest group (favorite computer, car or whatever) you could add a simple question in plain English (or whatever language your forum uses) that all users of your forum would know the answer to – it will fool the spambots. Example: “please write the name of the computer that we discuss on this forum, it starts with X and has five letters”.
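A minimal sketch of how such a forum-specific question could be checked on the server, assuming a made-up forum and a made-up answer. The question text, the `ACCEPTED_ANSWERS` set and the function names are all hypothetical and only illustrate the idea:

```python
import unicodedata

# Hypothetical challenge for an imaginary forum; question and answer are made up.
CHALLENGE_QUESTION = (
    "Please write the name of the computer we discuss on this forum "
    "(it starts with A and has five letters)."
)
ACCEPTED_ANSWERS = {"amiga"}


def normalize(answer: str) -> str:
    """Fold Unicode variants, strip surrounding whitespace, ignore case."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def passes_challenge(answer: str) -> bool:
    """Return True when the submitted answer matches an accepted one."""
    return normalize(answer) in ACCEPTED_ANSWERS
```

Normalizing before comparing means a real user who types " Amiga " is not rejected over capitalization or stray spaces. This only stops generic spambots; anyone who targets the forum specifically can read the question too.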

If you run any site on the internet which accepts user input, you will find yourself inundated with spam… Having a captcha is reasonably effective at stopping or seriously reducing it.

Blocking known abusive addresses no longer works, you will end up blocking every TOR exit node, most public VPN services and any ISP that uses CGNAT, which will then block a large number of otherwise innocent users.
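To illustrate the collateral damage, here is a small sketch using Python's standard `ipaddress` module. The visitor names and addresses are invented (the addresses come from documentation-only ranges), and `203.0.113.7` simply stands in for one public address shared by many subscribers behind a CGNAT:

```python
import ipaddress

# Hypothetical blocklist: one address that was seen sending spam.
BLOCKLIST = [ipaddress.ip_network("203.0.113.7/32")]

# Hypothetical visitors; three of them share the same CGNAT address.
VISITORS = {
    "spammer": "203.0.113.7",
    "alice": "203.0.113.7",    # innocent, same shared address
    "bob": "203.0.113.7",      # innocent, same shared address
    "carol": "198.51.100.20",  # unrelated address
}


def is_blocked(address: str) -> bool:
    """Return True when the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKLIST)


def blocked_visitors() -> list[str]:
    """Names of every visitor caught by the blocklist, sorted."""
    return sorted(name for name, addr in VISITORS.items() if is_blocked(addr))


print(blocked_visitors())  # ['alice', 'bob', 'spammer']
```

One abusive user behind the shared address is enough to lock out everyone else behind it, which is the problem described above.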

If you run any site on the internet which accepts user input, you will find yourself inundated with spam… Having a captcha is reasonably effective at stopping or seriously reducing it. Blocking known abusive addresses no longer works, you will end up blocking every TOR exit node, most public VPN services and any ISP that uses CGNAT, which will then block a large number of otherwise innocent users.

Actually, the second part of your post is a huge reason I object to google recaptcha specifically, it blocks a large number of otherwise innocent users. And by “block” I mean it asks you to identify god damned storefronts, fire hydrants, and cars in a photo in an endless loop no matter how many times you get it right. I find google recaptcha annoying, but the far bigger problem with it is when it results in a denial of service to legitimate humans including myself. That’s a primary mission failure.

You note that IP reputation can hurt innocent users, but you should realize that google recaptcha also uses IP reputation along with device tracking/fingerprinting. People with active google devices/accounts are given the green light immediately, but people like me who block google tracking and ads are punished for it with impossibly difficult recaptchas across the web. 🙁

By standardizing website access mechanisms to use google’s recaptcha mechanism, they’ve created a monoculture that spammers are exploiting (there are now extremely cheap 3rd party services that solve google’s captchas for spammers). Since solving the recaptcha is no longer a reliable indicator, and google knows this, it’s been forced to fall back to IP reputation based solutions and google tracking/account login data. In order to keep the number of spammers down (ie those who can solve the captchas) they have to increase the false positives, but that blocks humans. Google has lost 🙁
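The trade-off between letting spammers in and blocking humans can be shown with a toy example. The scores below are invented for illustration only; they are not real reCAPTCHA data, and the single-threshold model is an assumption rather than a description of how google actually scores visitors:

```python
# Hypothetical risk scores (0 = surely human, 1 = surely bot); not real data.
HUMANS = [0.05, 0.10, 0.30, 0.45, 0.60]
BOTS = [0.40, 0.55, 0.70, 0.85, 0.95]


def outcomes(threshold: float) -> tuple[int, int]:
    """Return (humans wrongly blocked, bots wrongly admitted) when every
    visitor scoring at or above `threshold` is blocked."""
    humans_blocked = sum(score >= threshold for score in HUMANS)
    bots_admitted = sum(score < threshold for score in BOTS)
    return humans_blocked, bots_admitted


for threshold in (0.8, 0.5, 0.35):
    print(threshold, outcomes(threshold))
# 0.8 (0, 3)
# 0.5 (1, 1)
# 0.35 (2, 0)
```

As long as some bots look more human than some humans, lowering the threshold to keep the last bots out necessarily blocks more real people.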

The conclusion is usually the same: avoid the sites that use recaptcha, keep trying, change IPs, etc. It all dances around the roots of the problem: captchas are bad for accessibility, they’re too difficult for humans, too easy for bots, and they cause new failure modes. They are bad for the web.

The main reason recaptcha specifically is effective in stopping spam today has virtually nothing to do with the “captcha” mechanism. This just isn’t effective against today’s spammers who have no trouble getting past the captcha and google might as well get rid of it. What’s actually blocking spammers on websites that have recaptcha installed is google’s IP reputation & google account cookie tracking.

Unfortunately webmasters are often under the false impression that the captcha is what’s stopping bots and letting users through, but that’s been untrue for a couple of years now. Solving captchas is a solved problem for spammers. I won’t link any here, but there are plenty of services today where you can buy access to recaptcha bypassing APIs for cheap. The main thing keeping spammers out is IP reputation and cookie tracking.

To quote the OP, bert64:

Blocking known abusive addresses no longer works, you will end up blocking every TOR exit node, most public VPN services and any ISP that uses CGNAT, which will then block a large number of otherwise innocent users.

And he’s right, innocent users do get snagged in this sort of blocking.

All very interesting reading but none of that is actual data that “it blocks a large number of otherwise innocent users.”

It blocks some quantity of users, but all captchas do. The question is whether it is blocking an unacceptable number of users. But what neither of us has is quantifiable data on how many users that is.

Sure, I get that you want to gauge the problem with more accuracy on a nice little chart. You want to know how many times captchas falsely concluded that a human was a bot, I’d like to know too, but ironically the data you are asking for is something that even google doesn’t have because if they had an algorithm to determine when the captcha fails, they could just use that algorithm to fix the captcha. How would google even poll for that? They could have a little checkbox on the captcha asking if the user was wrongfully blocked or wrongfully given access, that’d be cute, haha. For these reasons the data you are seeking is almost unknowable. But just because a statistic is unknowable doesn’t mean it’s not a problem for lots of users especially when you see lots of complaints online.

Hypothetically, what proportion of google recaptcha trouble for humans is acceptable to you?

Every captcha in history has gone through the same lifecycle. At first it works well and blocks all the existing bot tools on the market. Then the bot authors up their game and manage to solve the captcha. The captcha then needs to become more difficult and more tedious to block the bots. This cycle continues until the captcha becomes impractically difficult for humans. This is always going to be the outcome of every mainstream captcha. Google only manages to mask this (ie hide it from users) by using traditional IP reputation heuristics and account cookies to allow google users to either skip the captcha or get a low difficulty version of it….but that’s not the captcha working, that’s the heuristics that predetermined the user was safe.

So although google’s own registered users are covered by recaptcha, the problem is for users that don’t want to have google accounts or to be tracked by google. Technically that makes us a niche population. We’re generally happy to avoid google properties and use ad-blockers and tracker-blockers on other websites to cut ties with google. However when it comes to recaptcha that’s a major problem because the more effectively one blocks google’s web-tracking bugs, the more recaptcha’s heuristics treat us as bots. For better or worse, google’s recaptcha heuristics discriminate against humans that avoid google. Webmasters that deploy google recaptcha either don’t care, or don’t realize what’s going on. 🙁

reCAPTCHA is also one of a small number of components used by third party sites that steadfastly refuse to work on Firefox. Embedded Google Maps are another. I hope the EU investigates these anti-competitive practices, but am not holding my breath.

I don’t get it. It’s simple and easy and reduces spam. Especially the newer “Click here” buttons. I mean the alternative is to do second factor with a hardware key. That would work, but would be pretty annoying and potentially expensive.

I don’t get it. It’s simple and easy and reduces spam. Especially the newer “Click here” buttons.

It would be one thing if it worked reliably for all humans, but if you don’t have a dedicated private IP, don’t log into google, clear cookies after every session, block trackers, etc (all things I’m guilty of on a regular basis) then sometimes it can be extremely difficult and even impossible to pass recaptcha. I understand the spam problem, but google recaptcha has caused me enough grief that I try to avoid sites that force me to use it. I can’t make anyone stop using it, but it’s absolutely not a panacea as some imagine it to be.

Yeah, especially those of us with vision impairments or no vision at all. At least Google’s solution has audio for those like us.

Yeah it’s good they offer that.

Sometimes when I’m about to give up on the visual captcha after answering correctly over and over again and have no hope of getting in, I try the audio one and it lets me in right away. I think google’s difficulty calibrations for the audio and visual captchas are independent from one another. Since few spammers use the audio captcha, the same rate of spam blocking can be achieved with very low audio captcha difficulty.

I hate them because I can never figure out what the damn thing is supposed to be (or perhaps it just doesn’t work). Seriously, there was an NPR report a while ago that some team had made an AI that could crack them something like 95% of the time.

Which means for them to work, they have to make designs that *humans can’t figure out either* – which kind of defeats the point.

What about the “select all bridges” type CAPTCHA that is used in a number of places (which is also Google I think, but whatever...)?

I wonder (or maybe I read somewhere) whether the real purpose of the reCAPTCHA that has two different strings is to get humans to unwittingly help them with spots that Google Books’ massive automatic book-scanning machinery can’t figure out, because a page was accidentally pinched or puckered or something during the super-fast scanning phase. (HOW do you manage to make a mechanical robot that can scan probably tens or hundreds of books a day totally automatically?!)