How robots and spiders are causing issues, how to stop them. We can also talk about Completely Automated Public Turing Test To Tell Computers And Humans Apart - their use, their compliance issues, porn proxies, PWNtcha and other ways to defeat them.

Am I missing something? Parsing this instruction text is trivial. Here is a quick-and-dirty bookmarklet that will select the necessary choices for you and submit the form (I didn't bother making sure that it works in anything other than Firefox).

It's just a lot uglier since I didn't know there was a .previousSibling property....

Text manipulation is the easiest thing a computer could do, there is no way you could write instructions a computer could not understand - because you yourself have to have instructions by which to generate the rules.
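To make that concrete, here is a rough sketch of how little it takes to parse an instruction phrase. This is not the bookmarklet from the post above. The phrase format ("Select two apples and one melon"), the function names and the { id, label } box objects are all assumptions made for illustration; the real demo may word things differently.

```javascript
// Assumed instruction format: "Select two apples and one melon".
const NUMBER_WORDS = { one: 1, two: 2, three: 3, four: 4, five: 5 };

// Crude singularization so "apples" and "apple" compare equal.
// Applied to both the phrase and the labels, so odd results still match up.
function normalize(word) {
  return word.toLowerCase().replace(/ies$/, "y").replace(/s$/, "");
}

// Turn an instruction phrase into a map of item name -> how many to select.
function parseInstruction(text) {
  const wanted = {};
  const pattern = /\b(one|two|three|four|five)\s+([a-z]+)/gi;
  for (const match of text.matchAll(pattern)) {
    const count = NUMBER_WORDS[match[1].toLowerCase()];
    const name = normalize(match[2]);
    wanted[name] = (wanted[name] || 0) + count;
  }
  return wanted;
}

// Pick the checkbox ids to tick, given boxes as hypothetical { id, label } objects.
function chooseBoxes(text, boxes) {
  const wanted = parseInstruction(text);
  const chosen = [];
  for (const box of boxes) {
    const name = normalize(box.label);
    if (wanted[name] > 0) {
      chosen.push(box.id);
      wanted[name] -= 1;
    }
  }
  return chosen;
}

// Usage with made-up boxes:
const boxes = [
  { id: "c1", label: "apple" },
  { id: "c2", label: "cherry" },
  { id: "c3", label: "apple" },
  { id: "c4", label: "melon" },
];
console.log(chooseBoxes("Select two apples and one melon", boxes));
```

In a real page the labels would come from the DOM (label text, class names or image names), which is exactly the information the later posts talk about hiding.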

Oh, and the line

document.forms[0].elements[i].checked = true;

should probably be changed to

document.forms[0].elements[i].click();

because there are cases where the same object name is given twice, and so you have to uncheck it.

I changed the code above slightly because of the changes jungsonn made in the demo.

kuza, I think the same instruction coming twice was simply a bug; it doesn't seem to appear any more. Anyway, I think the correct response was still to select the checkbox, so replacing .checked=true with .click() would be wrong.

Oh I didn't read it here ^^ sorry, yes I changed a few things, not much though.
But I heard from Heyes that it wasn't ready or something; I never actually looked at the source code that much.

Edit:
Yeah, seems to work indeed. I heard that he wanted to implement random id tags but that it would break screen reader compatibility, so I'll have to wait for it.

This is what Gareth emailed me before I posted the script:

Quote
Ronald, There's a ton of extra security features I could add to it but I
guess I just wanted to get it out there and see if anyone can break
it.

The weakness of this captcha is the passphrase, because if you
wrote a script which read the phrase and then decided which boxes
to get based on the words then you could gain the key. This could
be prevented by assigning random id's to the input boxes but then
screen readers wouldn't know which label went to which checkbox.

So do you guys have an idea how to fix this without random IDs? I think that's pretty tough to do.

Is this user friendly? What if someone is just plain dumb and doesn't know the difference between a melon, apple and citrus etc. I mean when it says to select 'two apples' someone might choose the cherries thinking they are 2 apples.

As for the obvious weakness pointed out in previous posts, enough said. There is also the fact that the images are not random, so a script can fetch the images on the page and identify them by their MD5 hash or even their file size: a file of 855 bytes is an orange. Random images must be used, otherwise this is as easy to break as a captcha built from one fixed image per letter.
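A minimal sketch of that lookup, assuming Node.js and its built-in crypto module. The function names are made up for illustration, and the byte strings are placeholders standing in for real image files that someone labelled by hand once.

```javascript
const crypto = require("crypto");

// MD5 is enough here: the goal is telling files apart, not security.
function fingerprint(bytes) {
  return crypto.createHash("md5").update(bytes).digest("hex");
}

// Build a lookup from fingerprint -> label out of hand-labelled images.
function buildTable(labelledImages) {
  const table = new Map();
  for (const { label, bytes } of labelledImages) {
    table.set(fingerprint(bytes), label);
  }
  return table;
}

// Identify a freshly downloaded image; undefined means it was never seen before.
function identify(table, bytes) {
  return table.get(fingerprint(bytes));
}

// Placeholder bytes, not real images.
const table = buildTable([
  { label: "orange", bytes: Buffer.from("fake-orange-image-bytes") },
  { label: "apple", bytes: Buffer.from("fake-apple-image-bytes") },
]);
console.log(identify(table, Buffer.from("fake-orange-image-bytes"))); // known image
console.log(identify(table, Buffer.from("something-else"))); // unseen image
```

The same table could be keyed on bytes.length instead of the hash, which is the file size variant; it is cheaper but collides as soon as two images happen to have the same size.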

>> Text manipulation is the easiest thing a computer could do, there is no way you could write instructions a computer could not understand - because you yourself have to have instructions by which to generate the rules.

That says it all. In my opinion an image captcha is harder to break than this. A good image captcha at least makes the attacker fail some of the time, whereas with this method a properly written script can break it with a 100% success rate.

Sure, that's pretty obvious, but it's modified now. It got random classes added to it, making it harder to regex the classes and their values. But anyone is allowed to try. It can be built in a way that makes it very hard to write a script for, but that probably involves a lot of code juggling on the fly.

I found another flaw, a pretty easy one: the session is not destroyed after submitting, and thereby it's possible to continuously reload the page.
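One way to close that hole is to make every challenge single-use. This is only a sketch, assuming Node.js and an in-memory store; the names pending, issueChallenge and checkAnswer are hypothetical and have nothing to do with how the demo is actually written.

```javascript
const crypto = require("crypto");

// Hypothetical in-memory store: challenge id -> expected answer.
const pending = new Map();

// Create a challenge and remember the answer that solves it.
function issueChallenge(expectedAnswer) {
  const id = crypto.randomBytes(16).toString("hex");
  pending.set(id, expectedAnswer);
  return id;
}

// Every submission consumes the challenge, whether it was right or wrong,
// so reloading the page with an old id gets nowhere.
function checkAnswer(id, answer) {
  if (!pending.has(id)) return false; // unknown or already used
  const expected = pending.get(id);
  pending.delete(id);
  return answer === expected;
}
```

A second submission with the same id then fails even if the answer is correct, because the entry is already gone.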

Well, whether normal visual CAPTCHAs are better depends on the implementation. There are enough ways to break them without OCR, like using the session ID of the generated image for example.

Gareth Heyes Wrote:
-------------------------------------------------------
> Anyone got any suggestions on how I can improve
> it?

You could do all the positioning via CSS and position each element individually, rather than having one class per fruit, since then all an attacker needs to do is correlate a class name to a fruit. You also need to use random filenames for the images so that we can't just look up the background to find out what fruit it is.

But this is still attackable, because unless you have different images for each fruit, it is as easy as downloading the images, hashing them, and comparing them to the hashes we have for the images. One way to solve this problem is to alter the images slightly so that simple comparisons don't work, but this can be defeated with a bit of work, e.g. by checking that say 90% of pixels are the same, or something. The other is to have lots of different images for the same fruit. And I mean LOTS of different images.
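The "90% of pixels are the same" check could look roughly like this. It is a sketch under assumptions: both images are already decoded into flat RGBA arrays of the same size (the layout canvas pixel data uses), and the per-channel tolerance of 8 is an arbitrary illustrative value. The function names are made up.

```javascript
// Fraction of pixels whose four RGBA channels all differ by at most `tolerance`.
// Both inputs are flat RGBA arrays (4 values per pixel) of equal length.
function similarity(a, b, tolerance = 8) {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("expected two RGBA arrays of equal length");
  }
  const pixels = a.length / 4;
  if (pixels === 0) return 1;
  let matching = 0;
  for (let i = 0; i < a.length; i += 4) {
    let close = true;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) close = false;
    }
    if (close) matching++;
  }
  return matching / pixels;
}

// Treat two images as the same fruit if enough pixels match.
function sameImage(a, b, threshold = 0.9) {
  return similarity(a, b) >= threshold;
}
```

This only survives small per-pixel noise. Scaling, cropping or shifting the image would need a smarter comparison, which is the "bit of work" part.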

Also, don't have all the same fruits in a row, because then if you determine what one of the images in a row is, you automatically know that the other two are of the same type.

But yeah, other than image recognition - there isn't much that anyone has found which can work as a Turing test.....

Yes, I thought about using absolute positioning via CSS as well - position numbers, images and checkboxes independently. Which still doesn't stop the script from reading out positions and determining which checkbox belongs to which number/image. And with the canvas tag in Firefox I can get pixel data for the images which allows fingerprinting images without depending on file names or requesting enhanced privileges for file download.

So clearly it fails to be an effective CAPTCHA. As one can see from the constant battle over image CAPTCHAs, I don't think that's the way to go here. I do think I can get very close to a good CAPTCHA with JavaScript, but yeah, that's the thing we don't want I guess. However, I am busy testing a few screen readers; I really want to see how much JavaScript they can handle, because I read that JAWS (with a share of 50%) can read/execute JavaScript, so its capabilities are worth checking out.

I was more thinking of trying to make a dynamically generated, convoluted system, where there are nests and quirks everywhere, and where they all use different types of positioning, and floating, etc, so that you either need a way to read the image and checkbox locations from the page, or pretty much implement a CSS parser.

It can still be broken of course by doing something like ripping Firefox's rendering engine, or something, but it would be much harder than simple js.

kuza55, both Internet Explorer and Firefox allow you to run high-privilege code (HTA in Internet Explorer and extensions in Firefox) so creating a CAPTCHA solver in JavaScript that has all the DOM features to its avail is not a hurdle. That's also why a JavaScript-based CAPTCHA that jungsonn is proposing won't work.

Sure. Lest we forget: to be fair, there is a difference in the attack angle here. It's different if one does it manually or automated with a robot. I don't know a single robot that can execute the scripts I write. Botmaster (which seems to be one of the best) can only read certain form JavaScript, but not execute it.

If anyone knows about a robot which can do this, I'd really like to hear it.

Jungsonn, I think that these robots don't exist because there is currently little use for them. But automating a browser is easy as I wrote above, e.g. HTA in Internet Explorer is basically a regular HTML page that is allowed to load web pages from anywhere and manipulate their DOM. It would be less efficient than a Perl-based spam bot but still efficient enough to do more than enough harm.

nEUrOO, which costs do you mean? Development costs are lower. Deploying the bot to a botnet is also easier (one simple file, and Internet Explorer is installed everywhere - at least if you look at botnets). Performance can be improved by creating multiple instances.

trev Wrote:
-------------------------------------------------------
> kuza55, both Internet Explorer and Firefox allow
> you to run high-privilege code (HTA in Internet
> Explorer and extensions in Firefox) so creating a
> CAPTCHA solver in JavaScript that has all the DOM
> features to its avail is not a hurdle. That's also
> why a JavaScript-based CAPTCHA that jungsonn is
> proposing won't work.

I know that, but they don't expose all the info you might need to figure out a convoluted CSS setup. Unless there's some feature I'm missing whereby you can get relative pixel values for elements?

kuza55, you simply look at offsetLeft/offsetTop properties of elements - that's their position after all CSS applied. You can also look at runtime CSS property values but that should be more complicated.
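A rough sketch of that idea: pair each image with the checkbox rendered closest to it, using nothing but offsetLeft/offsetTop. The function names are made up. Since the offsets are relative to each element's offsetParent, the sketch sums them up the chain to get page coordinates; it ignores borders on the parents, so treat the result as approximate.

```javascript
// Sum offsets up the offsetParent chain to get (approximate) page coordinates.
function pagePosition(el) {
  let x = 0;
  let y = 0;
  for (let node = el; node; node = node.offsetParent) {
    x += node.offsetLeft;
    y += node.offsetTop;
  }
  return { x, y };
}

// For each image, find the checkbox rendered closest to it.
function pairByPosition(images, checkboxes) {
  return images.map((img) => {
    const p = pagePosition(img);
    let best = null;
    let bestDist = Infinity;
    for (const box of checkboxes) {
      const q = pagePosition(box);
      const dist = Math.hypot(p.x - q.x, p.y - q.y);
      if (dist < bestDist) {
        best = box;
        bestDist = dist;
      }
    }
    return { image: img, checkbox: best };
  });
}
```

In a browser you would feed it something like the page's img elements and checkbox inputs. It does not care how nested or convoluted the CSS is, because it only looks at where things ended up after layout.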

trev Wrote:
-------------------------------------------------------
> kuza55, you simply look at offsetLeft/offsetTop
> properties of elements - that's their position
> after all CSS applied. You can also look at
> runtime CSS property values but that should be
> more complicated.

Sorry, I'm joining this conversation late, but if you look at SpiderMonkey you can use a full-fledged rendering engine without having to do anything super tricky. It's kludgy, but it would easily defeat any client-side obfuscation. Ultimately this isn't much different than the HTA solution as described above, but still. I have very little confidence in this sort of solution as it relies entirely on code obfuscation, and that happens to be what computers are highly efficient at solving.

Actually, SpiderMonkey is only a JavaScript engine, without rendering or DOM. But if you meant to say XULRunner you are right, and writing a bot on top of that is a matter of one day (assuming that you don't know anything about it and must read your way through the documentation). Only reason I talked about HTA above - its runtime is widely distributed which saves you a 4 MB download.

Ah yes, thank you, trev. I haven't played around with either one (no time lately) so that's why I got my names crossed. But yes, HTA is a nice solution to the problem. A problem that, ultimately, I think is easily solved. In fact, this is even easier than most captchas because most captcha solving relies on OCR, which is a bit of black magic anyway.