Now, the captchas provided by the site aren’t very “hard” to solve (in fact, they’re downright bad – some examples are below):

But there are many interesting parts here:

The HTML5 Canvas getImageData API is used to get at the pixel data from the captcha image. Canvas lets you draw an image onto a canvas and then read its pixel data back out.

The script includes an implementation of a neural network, written in pure JavaScript.

The pixel data, extracted from the image using Canvas, is fed into the neural network in an attempt to divine the exact characters being used – in a sort of crude form of Optical Character Recognition (OCR).
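To make the pixel data mentioned in the points above a little more concrete, here is a minimal sketch of how one pixel is read from an ImageData-shaped object. `getPixel` and `sample` are names made up for this illustration. The object is built by hand so the snippet runs outside a browser; in a page it would come from `getImageData`.

```javascript
// Read one pixel from an ImageData-shaped object ({width, height, data}).
// `getPixel` is a hypothetical helper, not part of the Canvas API.
function getPixel(imageData, x, y) {
  const i = (y * imageData.width + x) * 4; // 4 bytes per pixel: R, G, B, A
  const d = imageData.data;
  return { r: d[i], g: d[i + 1], b: d[i + 2], a: d[i + 3] };
}

// In a browser the object would come from something like:
//   const ctx = canvas.getContext("2d");
//   ctx.drawImage(img, 0, 0);
//   const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
// Here a 2x1 stand-in (one red pixel, one blue pixel) is built by hand.
const sample = {
  width: 2,
  height: 1,
  data: new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]),
};
const second = getPixel(sample, 1, 0); // the blue pixel
```

The same `x*4 + y*4*width` arithmetic shows up in every loop of the script quoted below.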

If we crack open the source code we can see how it works. A lot of it comes down to how the captcha is implemented. As I mentioned before it’s not a very good captcha. It has three letters, each in a separate color, drawn from the 26 letters of the alphabet, and they’re all in the same font.

The first step is pretty clear: The captcha is copied into the canvas and then converted to grayscale.

[js]function convert_grey(image_data){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      // Offset of pixel (x, y): 4 bytes (R, G, B, A) per pixel, row by row
      var i = x*4+y*4*image_data.width;
      // Weighted sum of the red, green and blue channels
      var luma = Math.floor(image_data.data[i] * 299/1000 +
        image_data.data[i+1] * 587/1000 +
        image_data.data[i+2] * 114/1000);
      image_data.data[i] = luma;
      image_data.data[i+1] = luma;
      image_data.data[i+2] = luma;
      image_data.data[i+3] = 255;
    }
  }
}[/js]
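As a quick worked example of the weighting used in convert_grey, the sketch below applies the same formula to a few pure colours. The helper name `luma` and the `greys` table are made up for this illustration.

```javascript
// Same integer weights as convert_grey above; `luma` is a name chosen
// for this sketch, not a function from the script.
function luma(r, g, b) {
  return Math.floor(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000);
}

const greys = {
  red: luma(255, 0, 0),   // 76
  green: luma(0, 255, 0), // 149
  blue: luma(0, 0, 255),  // 29
  black: luma(0, 0, 0),   // 0
};
```

Green contributes the most and blue the least, so differently coloured letters generally end up as different grey levels. That is what lets the next step pick each character out by a single grey value.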
The canvas is then broken apart into three separate pixel matrices - each containing an individual character (this is quite easy to do - since each character is a separate color, they're broken apart just based upon the different colors used).
[js]filter(image_data[0], 105);
filter(image_data[1], 120);
filter(image_data[2], 135);[/js]
[js]function filter(image_data, colour){
  for (var x = 0; x < image_data.width; x++){
    for (var y = 0; y < image_data.height; y++){
      var i = x*4+y*4*image_data.width;
      // Turn all the pixels of the certain colour to white
      if (image_data.data[i] == colour) {
        image_data.data[i] = 255;
        image_data.data[i+1] = 255;
        image_data.data[i+2] = 255;
      // Everything else to black
      } else {
        image_data.data[i] = 0;
        image_data.data[i+1] = 0;
        image_data.data[i+2] = 0;
      }
    }
  }
}[/js]
Finally any extraneous noisy pixels are removed from the image (providing a clear character). This is done by looking for white pixels (ones that've been matched) that are surrounded (above and below) by black, un-matched, pixels. If that's the case then the matching pixel is simply removed.
[js]var i = x*4+y*4*image_data.width;
var above = x*4+(y-1)*4*image_data.width;
var below = x*4+(y+1)*4*image_data.width;
if (image_data.data[i] == 255 &&
    image_data.data[above] == 0 &&
    image_data.data[below] == 0) {
  image_data.data[i] = 0;
  image_data.data[i+1] = 0;
  image_data.data[i+2] = 0;
}[/js]
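The snippet above is only the body of the check. A minimal sketch of how it might sit inside a full pass over the image follows. `removeNoise` is a hypothetical name, and two details are choices of this sketch rather than anything shown in the script: neighbours are read from a snapshot of the image, and rows outside the image count as black.

```javascript
// Remove white pixels that have black directly above and below.
// Assumption: the image is already pure black/white, as produced by filter().
// Reads come from a snapshot so an earlier removal cannot affect later checks;
// rows outside the image are treated as black. Both are choices of this sketch.
function removeNoise(imageData) {
  const { width, height, data } = imageData;
  const before = Uint8ClampedArray.from(data);
  const red = (x, y) =>
    y < 0 || y >= height ? 0 : before[(y * width + x) * 4];
  for (let x = 0; x < width; x++) {
    for (let y = 0; y < height; y++) {
      if (red(x, y) === 255 && red(x, y - 1) === 0 && red(x, y + 1) === 0) {
        const i = (y * width + x) * 4;
        data[i] = data[i + 1] = data[i + 2] = 0;
      }
    }
  }
  return imageData;
}
```

Because only the pixels directly above and below are checked, a one-pixel-high horizontal stroke would also be erased. That is acceptable only because the letters in this captcha are drawn with thicker strokes.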
We're getting really close to having a shape that we can feed into the neural network, but it's not completely there yet. The script then goes on to do some very crude edge detection on the shape. The script looks for the top, left, right, and bottom-most pixels in the shape and turns it into a rectangle - and converts that shape back into a 20 by 25 pixel matrix.
[js]cropped_canvas.getContext("2d").fillRect(0, 0, 20, 25);
var edges = find_edges(image_data[i]);
cropped_canvas.getContext("2d").drawImage(canvas, edges[0], edges[1],
  edges[2]-edges[0], edges[3]-edges[1], 0, 0,
  edges[2]-edges[0], edges[3]-edges[1]);
image_data[i] = cropped_canvas.getContext("2d").getImageData(0, 0,
  cropped_canvas.width, cropped_canvas.height);[/js]
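The body of find_edges isn't quoted here, so the following is only a guess at what such a function might look like: a bounding-box scan over the white pixels. The name `findEdges`, the `[left, top, right, bottom]` return order (inferred from how `edges` is used above) and the inclusive coordinates are all assumptions.

```javascript
// Hypothetical bounding-box scan; not the script's actual find_edges.
// Returns [left, top, right, bottom] (inclusive) of all white pixels,
// or null when the image contains no white pixel at all.
function findEdges(imageData) {
  const { width, height, data } = imageData;
  let left = width, top = height, right = -1, bottom = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (data[(y * width + x) * 4] === 255) {
        if (x < left) left = x;
        if (x > right) right = x;
        if (y < top) top = y;
        if (y > bottom) bottom = y;
      }
    }
  }
  return right < 0 ? null : [left, top, right, bottom];
}
```

With inclusive coordinates the box is `right - left + 1` pixels wide. The script uses `edges[2]-edges[0]` as the width, and the excerpt doesn't show whether its right and bottom edges are inclusive.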
So - after all this work, what do we have? A 20 by 25 matrix containing a single rectangle, drawn in black and white. Terribly exciting.
That rectangle is then reduced even further. A number of strategically-chosen points are then extracted from the matrix in the form of "receptors" (these will feed the neural network). For example, a receptor might look at the pixel at position 9x6 and see if it's "on" or not. A whole series of these states are computed (far fewer than the full 20x25 grid - a mere 64 states) and fed into the neural network.
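A minimal sketch of the receptor idea, assuming each receptor is just an (x, y) point that reads as 1 when the pixel is white. Apart from 9x6, the coordinates are invented for illustration; the script's real list has 64 entries that aren't reproduced here.

```javascript
// Hypothetical receptor positions as [x, y] pairs; only [9, 6] comes
// from the text, the others are made up for this sketch.
const RECEPTORS = [[9, 6], [3, 12], [15, 20]];

// Turn an ImageData-shaped object into an array of 0/1 receptor states.
function readReceptors(imageData, receptors) {
  return receptors.map(([x, y]) =>
    imageData.data[(y * imageData.width + x) * 4] === 255 ? 1 : 0
  );
}

// A blank 20x25 image with a single white pixel at (9, 6).
const glyph = { width: 20, height: 25, data: new Uint8ClampedArray(20 * 25 * 4) };
glyph.data[(6 * 20 + 9) * 4] = 255;
const states = readReceptors(glyph, RECEPTORS); // [1, 0, 0]
```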
The question that you should be asking yourself now is: Why not just do a straight pixel comparison? Why all this mess with the neural network? Well, the problem is that, with all of this reduction of information, a lot of ambiguity exists. If you run the online demo of this script you’re more likely to find the occasional failure from the straight pixel comparison than from running it through the network. That being said, for most users, a straight pixel comparison would probably be sufficient.
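The post doesn't show how the demo's pixel comparison works. One plausible reading, sketched here purely as an assumption, is to count how many receptor states differ from a stored template per letter and pick the closest one. The templates are invented and shortened to 8 values.

```javascript
// Count positions where two equal-length 0/1 arrays differ.
function mismatches(a, b) {
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) count++;
  }
  return count;
}

// Return the label whose template differs from `observed` in the fewest places.
function closestTemplate(observed, templates) {
  let best = null;
  for (const [label, template] of Object.entries(templates)) {
    const distance = mismatches(observed, template);
    if (best === null || distance < best.distance) {
      best = { label, distance };
    }
  }
  return best;
}

// Hypothetical 8-value templates; a real table would hold 64 values per letter.
const TEMPLATES = {
  A: [0, 1, 1, 0, 1, 1, 1, 1],
  B: [1, 1, 0, 1, 1, 0, 1, 1],
};
const guess = closestTemplate([0, 1, 1, 0, 1, 0, 1, 1], TEMPLATES);
// guess.label is "A": one mismatch against A, three against B
```

Every mismatch counts the same here, so a couple of noisy receptors can tip the result toward the wrong letter. A trained network can weight some receptors more heavily than others, which may be why it fails less often.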

The next step is attempting to guess the letter. The network is being fed with 64 boolean inputs (collected from one of the extracted letters) along with another series of pre-computed values. One of the concepts behind how a neural network works is that you pre-seed it with some of the results from a previous run. It’s likely that the author of this script simply ran it again and again and collected a whole series of values to get an optimal score. The score itself may not have any particular meaning (other than to the neural network itself) but it helps to derive the value.

When the neural net is run it takes the 64 values that’ve been computed from one of the characters in the captcha and compares them against a single pre-computed letter of the alphabet. It continues in this manner, assigning a score for each letter of the alphabet (a final result might be ‘A 98% likely’, ‘B 36% likely’, etc.).
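The per-letter scoring can be sketched roughly as follows. This is a deliberately simplified single-layer version with invented weights, shrunk to 3 inputs and 2 letters. The script's actual network shape and its pre-computed values aren't shown in this post, so treat `scoreLetters` and everything passed to it as hypothetical.

```javascript
// Squash any real number into the range (0, 1).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Score every letter: weighted sum of the inputs plus a bias, then sigmoid.
// Returns the letters sorted from most to least likely.
function scoreLetters(inputs, weights, biases, labels) {
  return labels
    .map((label, k) => {
      const z = weights[k].reduce((sum, w, j) => sum + w * inputs[j], biases[k]);
      return { label, score: sigmoid(z) };
    })
    .sort((a, b) => b.score - a.score);
}

// Invented weights: one row per letter, one column per input.
const ranked = scoreLetters(
  [1, 0, 1],
  [[2, -1, 2], [-2, 1, -2]],
  [0, 0],
  ["A", "B"]
);
// ranked[0] is the best guess; here "A" scores well above "B".
```

The scores are computed independently per letter, so they don't have to add up to 100%, which matches the ‘A 98%’, ‘B 36%’ style of result described above.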

Going through the three letters in the captcha the final result is devised. It’s not 100% perfect (I wonder if better scores would be achieved if the letter wasn’t turned into a featureless rectangle before all these computations) but it’s pretty good for what it is – and pretty amazing considering that it’s all happening 100% in the browser using standards-based technology.

As a note – what’s happening here is rather instance-specific. This technique *might* be able to work on a few more poorly-constructed captchas, but beyond that the complexity of most captchas just becomes too great (especially so for any client-side analysis).

I’m absolutely expecting some interesting work to be derived from this project – it holds a lot of potential.

It would be awesome if this script was sending all the actual captcha images and their human-read equivalents back to some database that would serve as further training data. I’m sure spammers would pay big money for that. ;)

It would be nice if we could harvest our (human) successful CAPTCHA interactions to train a neural net. An existing neural net could use back prop. or something based on the correctness of its guesses when presented with a CAPTCHA. Perhaps it could sit passively as a FF extension, churning into silent action when a CAPTCHA is presented.

I’m curious: w/ getImageData, is it possible to use other types of algorithms for, say, form factor detection (i.e. faces, objects, etc.)? I have to imagine this type of work is not extremely fast, but with the nn aspect it seems like there are a bunch of things you can do beyond simple CAPTCHA … any thoughts? I have not played around w/ canvas much yet but I think this post piqued my interest big time.

@Jon Baer: Speed is likely going to be slower when compared to say, php image processing. But there’s still a lot you can do. One site, canvaspaint.org, has attempted to recreate MS Paint using the canvas element.

Also, I’m currently working on a site which generates collages from user-uploaded images. Using getImageData on each pixel of the original image, I use flickr’s API to find photos which match the hex color. It’s still mostly hacked together, but it’s certainly another good example of what can be done. Check it out here. (Also, here’s an example collage using the google logo.)

@John Resig: Very, very cool find. The neural network aspect of this really interests me. Just a note: I think you meant to point your link to “the source code” (in the sentence “If we crack open the source code we can see how it…”) to userscript 38736, (not 3873).

“A pretty amazing piece of JavaScript dropped yesterday and it’s going to take a little bit to digest it all.”

Oh, thank you. I actually wrote most of the code in December, I just updated it a few times this month. I’m somewhat amazed at the sudden publicity and praise since I consider myself only mediocre in my ability to use javascript and to use neural networks. I’m sure it could be improved a lot by someone who really knows what they’re doing. There really isn’t even a good reason to convert it to greyscale in the javascript implementation either, I just held it over when porting it from my python version.

For some reason I’m not too impressed, cause this kinda sounds like a hack to me. Meaning this may work on a particular instance of captcha, but could easily be overridden. If there were some principles there that made it more generalizable then I’d really be impressed.

I think the kind of image and letter recognition software needed to crack most captchas is beyond the scope of javascript. There have been some impressive gains lately in the rapid feed forward modeling of human vision. It is more likely that we will see these complex algorithms in external software breaking turing tests.

Do you have an alternative to captchas that should be used instead? Or are you one of the nefarious few who profit from automated spamming of forums across the internet? Or perhaps you simply enjoy spam posts more than legitimate ones? I surely hope it’s not one of the latter two options.

Hey, thanks to whoever created this. I guess we can expect them to start using the impossible-to-read garbage that Google, Yahoo and everyone else is using. It’s nice to be able to do it in one guess instead of three or four, but I guess it’s too important to hack everything.

Oh, come on. There are more than enough empirical studies that show that getting in the way of user access *degrades* community discussion. Captchas have been broken as a concept for a while now – just because we don’t have a good answer for how to fix the problem doesn’t mean we have to put blinders on and claim that they’re a good answer.

@podunk: It’s much better if a guy cracks a captcha and publishes the code, possibly helping the site change to something more secure, than letting some spammer develop something like this secretly.
Read http://slashdot.org/features/980720/0819202.shtml

This work fits a fairly established pattern: old ideas implemented in new, seemingly inappropriate language, to great amazement. Actionscript went through a similar phase about 8 years ago with various drag/drop interactive widgetry. This one’s an application of well understood visual search techniques that have got to be at least 20 years old.

The interesting bit is always the new context granted by the language. What does javascript or greasemonkey get you, here? It runs in a browser, on the web, and can be made to send results back to some central location, so there’s a new kind of results sharing David Bolter mentions above. It also stretches the boundaries of javascript somewhat, which I expect will result in more attention being paid to image/pixel manipulation by engine writers, just like all that Praystation business in 2000/01 led Macromedia to more seriously consider the use of Actionscript for desktop-like interactions.

This is a very clever use of Javascript.
At least it is good to see advanced programming in Javascript that is not DOM, UI or performance related.

It reminds us that:
– we now have the means to do advanced stuff in the browser (thanks canvas in that case)
– Javascript is a powerful language and you can implement advanced algorithms like in any other language
– Javascript performance in the browser got better and allows expensive computation
– people’s minds are amazing

Just google for “CAPTCHA Breaker” or pwntcha and you’d have much more advanced captcha breakers capable of much more. But javascript won’t be able to handle other captchas very well because you need good linear algebra libraries to break harder captchas. Even a normalization step such as PCA requires eigenvectors. This means that JS will be wildly inappropriate.

I remember a while back some friends of mine working on some code to try to figure out how to record license plates as a project as we drove down the road. So you’d be on your way to work, and you could have your car computer tell you how many times it has recorded the cars around you on previous occasions.

The OCR was one trick to it, the other was that different states have different color combinations.

I never would have imagined that somebody could write this code in javascript, that’s impressive.

I certainly agree that this example code is excellent and a worthy contribution to the arms race between spammers and those who want to protect forums from them. I simply would like to know if anyone denigrating captchas in general has an idea that is better than getting rid of captchas altogether. I’ve seen too many unprotected forums and services decline at the hands of spammers to accept that the solution is to have no mechanism whatsoever.

This neural network could easily be set up to learn of its own accord, via detection of the ‘success’ and ‘failure’ states on the response after submitting the CAPTCHA-“protected” form. When an input is confirmed as either successful or failed, only then does it become of any value for use with future inputs…so to “teach” a neural network you either need to tell it what’s right or wrong (exhaustively) or give it the ability to make that decision itself (which can be done quite easily in this case!).
In terms of handling CAPTCHAs of greater complexity…just up the number of inputs (and of course the number of NN layers), will take longer but _can be done_!
PS: I

how could you transform this to make it a web based application?
I have a php method because php can read grayscale images too, but I was wondering how you could put javascript on a site and make it work like the online example and then auto-get download (I just want to know how I can put this to work in a web application by itself without needing to use grease monkey)
-greetz-

When you first posted you had a nice list of tools you used to prettify the source code. It appears as if you have elided this bit since the author has opened up the source. Could you please list those tools again? They sounded handy.

how about the 25Billion simple things that humans could answer in an instant, assuming you’re capable of actually using the internet you should as well be able to. like for instance: won plus won iz how many?

now take that simple statement, apply whatever idiotic imaging you want to, put the statement “interpret the following” “decipher what follows” “figure this out” “the answer is”, make all of them completely independent, and have, let’s just say for reference’s sake, google (i’d assume they’d like to assist in spam prevention, especially if they could make some profit from the process, as well as the capacity to do so) centralize distribution with simple logarithms routed through other partnership sites using a dedicated portion of resources, and thus potential for profiting from use of resources as well as no added cost to end user…

It is irrelevant how hard you make captchas because of the supply and demand effect. Spammers have a great need to do their spamming and an entire industry is being born from captchas. Captcha images are being routed to India to be broken by humans who are paid 80 cents per 1000 images broken. So you can come up with a million ideas to stop it but in the end it doesn’t matter because you will have humans cracking captchas by the thousands.

@Dean: If the spammer uses real people no kind of Turing test can protect the system, but by forcing the spammer to hire people you’re raising the costs and lowering his profit. And at least we give more people a job :P