How robots and spiders are causing issues, how to stop them. We can also talk about Completely Automated Public Turing Test To Tell Computers And Humans Apart - their use, their compliance issues, porn proxies, PWNtcha and other ways to defeat them.

I've just released this today. It's a technical CAPTCHA targeted at high-tech audiences like sla.ckers. It still needs some work, but I thought it was an interesting concept.

I used code errors as the method because although a computer can parse syntax errors it usually cannot understand the code behind it. Is this easily breakable? I could introduce more randomisation and different code blocks if so.

I don't know if this is easily breakable for a computer, but the biggest problem is that it's definitely not user-friendly, since it takes too long to solve, even for slackers I think (or maybe I'm just not qualified ;)

Yeah sorry I added the instructions after :)
I'm working on a few bugs; the code randomisation isn't working quite right yet.

Update...
I think I've fixed it now; it should produce more random code and remove most false positives. Any suggestions for implementing other languages would be appreciated, as would any possible attacks and weaknesses.

And am told that I'm a robot. Your check doesn't like passing a variable to a function when that call is itself passed to another function (i.e. function(passed) is fine, but function(function(passed)) or function(function(function(passed))) isn't cool with your algorithm). This is annoying: that is perfectly valid, and the most logical fix for the code. Why would I remove the function that the variable is being passed to?? :\ Just a bug.

I like the idea, although it does have the side effect of limiting your audience to those capable of answering it - a bit elitist and un-beginner-friendly. (but that may be what that site wants)

The only tricky part I see is making a question base that's big enough and unique enough that an attacker can't just archive all permutations.
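To put rough numbers on "big enough", here is a back-of-the-envelope sketch in Python. Every figure in it is made up for illustration (the template count, the variable-name pool, the error kinds and the request rate are all assumptions, not Codetcha's real settings); it only shows how the size of the question base follows from the number of independent random choices.

```python
from math import prod

def question_space(choices_per_slot):
    """Number of distinct challenges when every slot is chosen independently."""
    return prod(choices_per_slot)

# Hypothetical generator settings (assumed, not taken from Codetcha):
# 20 code templates, 3 variables with 8 possible names each,
# 5 kinds of injected error, 12 possible error positions.
slots = [20, 8, 8, 8, 5, 12]
total = question_space(slots)
print(total)  # 614400

# At an assumed 10 requests per second, fetching every permutation takes:
hours_to_archive = total / 10 / 3600
print(round(hours_to_archive, 1))  # 17.1
```

With settings like these the whole base could be archived in under a day, so the generator would need more independent choices rather than just more templates.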

Another huge boost to complexity would be to make the code an image, so an attacker is first forced to OCR all of it, with any errors likely causing its whole answer to break. You can even add code comments with OCR-busting characters ( /*blāhbÌah º/ echo 'still a comment' */ )
That might cause the OCR to think º/ is really */ ... just a thought.

'Course an image requires a re-think on how a human enters corrected code (maybe an imagemap to click on the problematic character? maybe do it all in Flash?)

Well, it is an interesting concept, especially because it completely eliminates the value of all the work a spammer may have previously put into CAPTCHA-breaking programs (by this I mean that he won't be able to re-use the stuff he made for other sites, because those probably use image and/or audio CAPTCHAs).

In this line of thinking, turning the whole thing into an image is not the way to go, as a spammer will probably be able to deal with the OCR part easily, because that's the kind of stuff he is probably good at. So unless you distort the whole thing as you would distort a CAPTCHA, you might as well not bother at all (and if you do bother, then it'll take ages for anyone to solve the CAPTCHA!). Plus, after OCR-ing the image a spammer would have to fix the code, so he'll probably fix OCR-errors in this step anyways...

On the other hand, the downside to this idea is that even if you're wiping out everything OCR-related a spammer might know, you're giving him new tools he can use to break the CAPTCHA: there are a lot of code validators out there, and some of them are EXTREMELY powerful (and they should be: they were designed to check and automatically correct millions of lines of code!). So you might stop spammers for a while, but I don't think this approach would hold for too long on its own, as a determined spammer may be able to adapt an existing code validator to suit his needs...
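As a rough illustration of the validator worry, here is a minimal sketch of a brute-force repair loop. It is not an attack on Codetcha itself: it uses Python's built-in compile() as a stand-in syntax checker, the broken snippet is invented, and the helper names (compiles, single_edit_repairs) are hypothetical. It tries every one-character deletion and every insertion of a closing bracket, and keeps whatever compiles.

```python
INSERTABLE = ")]}"  # assumed repair alphabet: closing brackets only

def compiles(source):
    """Return True if the stand-in syntax checker accepts the source."""
    try:
        compile(source, "<captcha>", "exec")
        return True
    except SyntaxError:
        return False

def single_edit_repairs(source):
    """Collect every one-character deletion or insertion that compiles."""
    repairs = []
    for i in range(len(source)):
        candidate = source[:i] + source[i + 1:]  # delete character i
        if compiles(candidate) and candidate not in repairs:
            repairs.append(candidate)
    for i in range(len(source) + 1):
        for ch in INSERTABLE:
            candidate = source[:i] + ch + source[i:]  # insert ch at i
            if compiles(candidate) and candidate not in repairs:
                repairs.append(candidate)
    return repairs

broken = "x = (1 + 2\nprint(x)\n"  # invented example: missing ")"
repairs = single_edit_repairs(broken)
for fixed in repairs:
    print(repr(fixed))
```

Note that the loop finds several syntactically valid repairs, not just the intended one (dropping the opening bracket also compiles, for instance). A syntax checker alone can't tell which one the CAPTCHA expects, so the strength of the scheme depends on how strictly the answer is compared against the intended fix.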

In any case, I think there's one thing you'll positively gain from this kind of CAPTCHA: no more stupid comments like ":)" in your blog!

I've improved it loads now. I can't see how a debugger could be written to parse the code, as it requires some understanding of how the code works in order to fix it. The code is much lighter now, 50% smaller :) I plan to release the code as soon as I'm happy with it.

I like the idea, Gareth. I understand that code is used to limit your audience, and it should serve that purpose well, but I think the general idea has merit for a wider-audience captcha as well.

In particular, have a captcha where the user must take an English sentence and fix grammar-related issues. This is something human brains should be naturally good at, but difficult (how difficult?) for a robot. To make it somewhat harder for bots, but easier for humans, you could use self-referencing sentences. Something along the lines of:

"Insert two not tree comas into this sentence an fix any words that art not spelled correctly."

The captcha should only use misspellings that also happen to be words, otherwise it is trivial for a bot to detect the misspelling. Unfortunately, people (myself included) suck at grammar, plus it would be difficult to ensure unique solutions, so I don't think such a captcha would ever get much traction.
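To illustrate the point about misspellings, here is a small sketch of the dictionary check a bot would run. The word list is a tiny hand-made stand-in (a real bot would load a full dictionary), the second test sentence is invented, and unknown_words is a hypothetical helper name.

```python
import re

# Tiny stand-in word list, just big enough for the two sentences below.
WORDS = {
    "insert", "two", "not", "tree", "three", "comas", "commas", "into",
    "this", "sentence", "an", "and", "fix", "any", "words", "that",
    "art", "are", "spelled", "correctly",
}

def unknown_words(sentence):
    """Return the words a plain dictionary lookup would flag."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in WORDS]

real_word_errors = ("Insert two not tree comas into this sentence an fix "
                    "any words that art not spelled correctly.")
non_word_errors = "Insert two not thre commaz into this sentance."

print(unknown_words(real_word_errors))  # []
print(unknown_words(non_word_errors))   # ['thre', 'commaz', 'sentance']
```

The first sentence passes a plain lookup untouched, so the bot needs some grammar or context model to find the errors. The second one gives every error away immediately.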

Another possibility that might be easier for humans to solve would be to just have self-referencing sentences which contain instructions on what to do, e.g.

"capitalize all words which are not verbs or nouns from this sentence"
"identify the last letter of the second word and remove all instances of this letter from this sentence"
"remove any word from this sentence that contains more letters than the last word".

I think this variation has more merit, though it still suffers from the drawback of being language-specific. On the other hand, if the user is reading your website in language X, then they can reasonably be expected to solve a captcha presented in language X as well, no? Another potential problem is that it might be too difficult to automate the random generation of such a captcha.

I actually played with the idea of sentences before:-
http://www.thespanner.co.uk/2007/07/12/return-of-the-heyes-captcha/

But in the end I decided it was a weak form of CAPTCHA because:-
a) Computers can parse text really easily
b) It was weak against brute force attacks

The weakness in your example would be for a regular expression or algorithm to be developed which simply replaced the word instructions with programmatic ones.
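A minimal sketch of what that could look like, using two of the example sentences quoted above. The regular expressions, the handler names and the solve() dispatcher are all made up for illustration; a real attacker would need one rule per instruction template the generator can produce.

```python
import re

def remove_longer_than_last(sentence):
    """Drop every word that has more letters than the last word."""
    words = sentence.split()
    limit = len(words[-1])
    return " ".join(w for w in words if len(w) <= limit)

def remove_last_letter_of_second_word(sentence):
    """Strip every occurrence of the second word's final letter."""
    letter = sentence.split()[1][-1]
    return sentence.replace(letter, "")

# Each rule pairs an (assumed) instruction pattern with its programmatic form.
RULES = [
    (re.compile(r"remove any word .* more letters than the last word"),
     remove_longer_than_last),
    (re.compile(r"last letter of the second word and remove all instances"),
     remove_last_letter_of_second_word),
]

def solve(sentence):
    """Return the answer for a recognised instruction, or None."""
    for pattern, handler in RULES:
        if pattern.search(sentence):
            return handler(sentence)
    return None

print(solve("remove any word from this sentence that contains "
            "more letters than the last word"))
# any word from this that more than the last word
```

Once a rule exists it solves every sentence built from that template, so the defence comes down to how many distinct templates there are.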

However using a programming language makes it more difficult to solve for the computer because it needs to understand the context of the code to produce the correct output. I think various languages could be added quite easily and all they would require would be a syntax checking engine.

Update....
I've updated it again, slightly reduced the settings and set it to "low" randomisation for now. The script has 3 modes (low, medium, high); I've not tried it on high yet :)

Update again...
Added a randomisation setting option, which allows you to switch between low, medium and high.

In its current state, I'm pretty confident I can write a captcha solver for it, at least on low. (High looks like a nightmare, lol)
When you get the kinks out of it to a point where I can take a fair swing at it, let me know... I'd love to try ^^

For those testing Codetcha, I should point out that it's seemingly developed using Firefox. You'll get some buggy behavior in IE7, so use FF.

It's now REALLY buggy - 4/4 tests on medium and low have all returned that I am a robot, despite the code being perfectly valid and the most logical solution to the captcha. Oh, and I haven't tried high yet.. it looks insane.

Phew, finally found what was going on with the function call. I had forgotten to include the $ in the code mirror, so every reference to a function parameter was being treated as a constant. Doh! Fixed now though :) I switched the errors off and ran loads of tests to make sure each result matched, and it appears to be fine now.

I've also added 2 new random generations, which include if statements, so automation should be more difficult now. Thanks for testing, everyone! Please let me know if you find any other problems or have any suggestions.

UPDATE...
I've increased the generation settings to make code scanning and guessing more difficult. I'd be interested to see if someone can automate an attack against it now, so please have a go and let me know. The code is getting pretty close to release now and I plan a WordPress plugin and a generic plugin.