How reCAPTCHA works and how to mess with it!

The weakness of reCAPTCHA lies in the fact that it serves two purposes at once: stopping spam and digitizing printed texts (the words that OCR software could not read). What this means is that every captcha contains two words: one that the computer already knows and will check your answer against, and one that it hopes you will transcribe for it. In other words, reCAPTCHA only needs one word of your captcha to be correct for your captcha to be accepted.

You will be given two words: [Real, Fake] or [Fake, Real]. The fake word is unknown to the computer and can be replaced with anything.
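The check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the function names, the `control_index` parameter, and the normalization step are all assumptions made for the example.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def check_captcha(answers: list[str], control_index: int, control_word: str) -> bool:
    """Accept the response if the answer at the control position matches.

    Hypothetical sketch: the answer at the other position (the unknown word)
    is never compared against anything, because the server has no reference
    text for it yet.
    """
    if len(answers) != 2:
        return False
    return normalize(answers[control_index]) == normalize(control_word)


# The control word is in position 0; the second answer is ignored by the check.
print(check_captcha(["morning", "anything at all"], 0, "morning"))
# The control word is in position 1 and was typed wrong, so the check fails.
print(check_captcha(["whatever", "mourning"], 1, "morning"))
```

In this sketch, whatever is typed for the unknown word has no effect on the result, which is exactly the property the rest of this post relies on.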

Let me show you a few examples of what ALWAYS FAKE words look like:

Numbers:

Symbols:

Words with accents or punctuation:

Indecipherable text:

Almost always fake:

Words with the following are usually fake with few exceptions. Sometimes you will get surprises, however.

Inverted colors:

Odd or non-matching fonts:

Deformation caused by scanning:

Things to remember:

1. The fake word is usually the one that is blurrier and harder to read, even if only by a little. Sometimes, however, it is the one that is unusually clean and easy to read, since the quality of the scanned words varies greatly.

2. Real words usually use the same type of font throughout as they are computer generated, while the appearance and font for fake words can vary greatly as they are scanned from multiple sources. See the “Odd or non-matching fonts” above for examples.

3. The fake word is usually thicker, bolder, and blacker. But sometimes it is also thin and long.

4. The letters of a fake word are more likely to sit in a straight line or along a smooth curve, as the word is scanned from printed material. The letters of a real word are more likely to be wavy and a bit jumbled up, because they have been distorted by a computer.

5. You’ll sometimes get words with lots of noticeable dots around them. They are obviously scanned from books and therefore fake.

6. Practice! When you start out, you’ll have difficulty identifying which word is real, but after doing a few dozen, you’ll be proficient in picking out fakes, and this will be a great time saver for you.

Update: I’ve been seeing a lot of comments like… “Yeah, how noble! Screwing over a worthy project. What a time saver!”
Guys, I’m not asking you to do that every time you see a reCAPTCHA. If you do have the time to read indecipherable text and fill it in, then great, thank you. But think about it: if you’re late for something and you have to download a file or leave a comment before you go, will you keep refreshing the reCAPTCHA until you find a word that you can read?
You get my point.

Ok. Who is going to do that? You? It’s not difficult work, so let’s say you make minimum wage… and then we’ll stuff you in a library for a few years, and your job is to transcribe every word of every book in the library. Granted, that’s one library. Have fun.

When you digitize a book to make it available for free, you can add text boxes for the uncertain words inside the viewer, so that people who are actually reading that book fix them, taking the context into account! That’s the way to do this accurately!

Applying distortion to an already hard-to-read piece of text, seen entirely out of context, is NOT the way to do it accurately.

What they are doing is digitizing books that they don’t make available for free in text format. They’re stealing people’s effort, just to avoid paying some poor Chinese or Indian.

At the same time, they’re clinical psychopaths of some sort. They played the idiots like a fiddle – the idiots think that recaptcha somehow uses effort that would have been wasted anyway. It does not; it just adds more effort, which it steals.

Recaptcha is silly. I already have to enter a stupid word in order to access websites, retain access to the websites, register, log in, check my settings, change my settings, and now I have to enter two? Essentially doubling my frustrations?

If it was used to combat spam, sure, I see the point, but it’s not. Fuck recaptcha.

To digitize books, did you not get that the first time around? For you, entering 2 words instead of 1 is gonna take 6 seconds instead of 3. But the entire system digitizes more books than anything else out there, so it’s little cost for great benefit.

i’m often on omegle and i have to type that sh*t like every 90 seconds. i’m pretty much following your guide and if anyone wants to blame someone, then blame omegle for abusing that system, not me. it’s a good thing in theory only!

What he’s saying is that reCaptcha doesn’t show a word only ONCE. The word that you ‘fucked up’ was correctly typed by a large number of people, and this number will ALWAYS be larger than the number of people who ‘fuck up’ the word.
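The majority argument in the comment above can be illustrated with a small Python sketch. It is an assumption-laden toy, not the real system: the function name `consensus` and the thresholds `min_votes` and `min_share` are invented for the example, and the actual service may have weighed answers differently.

```python
from collections import Counter
from typing import Optional


def consensus(transcriptions: list[str], min_votes: int = 3,
              min_share: float = 0.6) -> Optional[str]:
    """Return the agreed reading of an unknown word, or None if undecided.

    Hypothetical rule: wait for at least `min_votes` non-empty answers, then
    accept the most common one only if it holds at least `min_share` of them.
    """
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if len(cleaned) < min_votes:
        return None
    word, count = Counter(cleaned).most_common(1)[0]
    return word if count / len(cleaned) >= min_share else None


# Four honest answers outvote one junk answer.
print(consensus(["upon", "upon", "asdf", "Upon", "upon"]))
```

Under a rule like this, a single junk answer only matters when honest answers are scarce or split, which is the point the commenter is making.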

No, it doesn’t make sense to mess with reCaptcha because you are going to have to use a captcha method anyway. The websites that use reCaptcha could have used any captcha method. You would still have had to input words; reCaptcha just utilizes your wasted effort. It’s like the new invention being tried out to use power generated by people walking over sidewalks in a city. You would still have used the same amount of effort to walk down the street, only now it’s not wasted effort.

Did you even read the article? See, the ‘captcha’ in recaptcha is an ordinary captcha with computer-generated nonsense that wastes the same amount of your effort as other captchas do, and solving it does not help digitize books. There’s also the second word, which does not protect against bots, but solving it deprives some poor sod of a job.

Here’s what I wonder: People have to ask for a new captcha when they can’t read a word. So, what happens to that unreadable one? It stays in the system until read, right? But would that mean that the percentage of unreadable words begins to climb until, one day, maybe 20 years from now, almost everything’s been digitized, and we only have the unreadable crap left?
I don’t usually troll, but this was fun. : )

Can someone please tell me why I, yes the user, not the PC, cannot see the words? When I go to buy tix, for instance, and the reCAPTCHA screen comes up, there are no words! Is my laptop too old or running an old browser? Please help…

I freaking hate this whole word thing. No matter how many times I enter the stupid words, they never work, so I can never post comments on blogs. Besides, they no longer even have a real word. They are all completely random letters now. Total crap if you ask me.

Websites get a free widget to validate human users.
Google gets the text analysis of the end user.
The end user gets to post his comment (or whatever) without having to register or otherwise enter personal data.

One thing I hate about Captcha is that it says to enter the two words, but invariably one of them isn’t a word at all. Add to that the fucked up manner in which they are displayed, and it ends up that more than once my brilliant insights were denied to a forum because I just gave up trying to figure out what the fuck the fucking fuck was.