Posted by Soulskill on Monday February 20, 2012 @01:27PM
from the soon-you-will-need-to-authenticate-in-person dept.

Orome1 writes "After creating the 'Decaptcha' software to solve audio CAPTCHAs, Stanford University's researchers modified it and turned it against text and, quite recently, video CAPTCHAs with considerable success. Video CAPTCHAs have been touted by their developer, NuCaptcha, as the best and most secure method of spotting bots trying to pass themselves off as human users. Unfortunately for the company, researchers have managed to prove that over 90 percent of the company's video CAPTCHAs can be decoded by using their Decaptcha software in conjunction with optical flow algorithms created by researchers in the computer vision field of study."

There must still be computational areas in the visual domain where we humans are way more efficient.

Even if that is the case, there is still a relatively straightforward attack on captchas: the mafia porn site. It is generally easier to use a mechanical turk to decode captchas than to attack captchas algorithmically.

I've always thought that going with a higher level thinking would be harder to break. Instead of copying letters from an image you have to identify a set of images that is easy for a person but more difficult for a computer. Think children's picture book type deal. Can a computer reliably tell a dog from a cat from a cow?

I think that's a pretty good thought. I'd extend it with perhaps one of those, "which of these things doesn't belong" type of setups (which may have been what you meant). It could then show pictures of a banana, an apple, an orange, some grapes, and a baseball hat. I don't know, perhaps there is a way to solve these easily by computer. But I know the stupid text CAPTCHAs that I had to go through yesterday to sign up for one site were so "obfuscated" that I couldn't read them either and I had to click the

I know I've seen this idea before. I wonder why I've never actually seen it implemented anywhere. It seems pretty easy to do, too. Collect images (either drawings or pictures), and assign tags. For example, an apple might have the tags 'apple', 'fruit', 'food', and 'red'. Then when the system generates a captcha, it picks a random tag in its database, and finds 4 images with that tag, and 1 without. The user should be able to pick out which image isn't a 'fruit' or 'red'.
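The tag scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IMAGES` table, the file names, and the helper names `make_challenge` and `check_answer` are all made up here, and a real deployment would need a far larger database so that an attacker cannot simply memorize it.

```python
import random

# Hypothetical image database: file name -> set of tags (illustrative data only).
IMAGES = {
    "apple.png": {"apple", "fruit", "food", "red"},
    "banana.png": {"banana", "fruit", "food", "yellow"},
    "orange.png": {"orange", "fruit", "food"},
    "grapes.png": {"grapes", "fruit", "food"},
    "cherry.png": {"cherry", "fruit", "food", "red"},
    "hat.png": {"hat", "clothing", "red"},
    "cow.png": {"cow", "animal"},
}

def make_challenge(images, rng=random, matching=4):
    """Pick a random tag, then `matching` images with it and one image without it."""
    all_tags = sorted(set().union(*images.values()))
    # A tag is usable only if enough images carry it and at least one does not.
    usable = [
        t for t in all_tags
        if sum(t in tags for tags in images.values()) >= matching
        and any(t not in tags for tags in images.values())
    ]
    tag = rng.choice(usable)
    with_tag = [name for name, tags in images.items() if tag in tags]
    without_tag = [name for name, tags in images.items() if tag not in tags]
    odd_one = rng.choice(without_tag)
    shown = rng.sample(with_tag, matching) + [odd_one]
    rng.shuffle(shown)  # hide the odd image's position
    return tag, shown, odd_one

def check_answer(odd_one, clicked):
    """The user passes by clicking the one image that lacks the tag."""
    return clicked == odd_one
```

The server would keep `odd_one` in the session and show the user only the shuffled `shown` list. Whether to display the tag is a design choice; hiding it gives the "which of these doesn't belong" variant.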

Nah, it's just as easy for a machine to recognize an animal as it is to recognize a character. And we're getting to the point where any question that has an objective answer can be answered by a search engine.

There must still be computational areas in the visual domain where we humans are way more efficient.

On your left, you will see 21st century purely organic brains. Their limited capacity neural networks had not yet been mechano-electrically enhanced with additional storage, high speed neuronal interconnects, broad EM spectrum sight, or even simple wireless intercourse, or "telepathy" as the luddites of the past initially called it.

On your right, you will see the first machine intelligence construct which exceeded human levels of complexity. Not to worry, the intelligence that once inhabited this form

You just need to make the decoding much harder than the encoding. There must still be computational areas in the visual domain where we humans are way more efficient.

You fall into the same tired old trap of believing this is some kind of arms race: a game of escalation. It's not. It's a matter of finding the things that computers are not very good at; which is usually about context, and more specifically, culture. In other words, it's not the visual domain, but cultural markers where computers are simply unable to compete with humans. The danger then, is alienation. You have to target your audience carefully, and in a localized manner.

The problem is not creating things which are hard for computers to decode, it is creating things which are hard for computers to decode but easy for humans. That is why captchas will ultimately fail: they rely on the idea that there is something that human brains can understand which computers cannot decode, but which computers can still generate.

As soon as computers are as capable as people, captchas are no longer necessary. Then computers can directly detect and block unwanted behaviour. With the added advantage that it can block that behaviour even if real humans do it.

Except that the ability to solve a simple puzzle may be different from the ability to recognize spam (which is what we are really trying to stop here). Even if you had a computer that was better at solving CAPTCHAs than humans are, you might still be unable to detect the specific class of unwanted behavior that you were trying to defend against. Now, if the CAPTCHA was asking you to label a series of short messages "spam" or "not spam," then perhaps your point would hold...except that it would be far too

Yes, let's make a stupid law that you can't use a computer to do audio and image analysis. I'm sure we'll have some sort of airtight clause about "only for CAPTCHAs" that will prevent that law from being perverted to stop legitimate uses of image recognition. I mean, we wouldn't want anyone but the federal government doing video analysis, would we?

What does breaking CAPTCHAs really do that's so bad to society? Comment quality goes down due to spam? a ticket scalper buys up a bunch of tickets to an event o

..if your user can interact with it, they can screw with it. The nature of HTTP and the web is a stateless environment, one has to impress state onto it for things like secure transactions and sessions. Basically, you need to come up with a test that randomly checks to see if the input is coming from a person; all without breaking the experience of the web browser, or the web in general. It's an arms race, and things are even again; another advantage bites the dust.

The CAPTCHA is worthless against an army of Indians being paid just pennies a pop to break them. The only thing CAPTCHAs do is annoy the script kiddies. Far better success would be had by doing pattern recognition on sign-ups instead.

Words cannot express the rage I felt when I needed to register an XBox Live account to play a game I purchased because of the stupid G4WL DRM nonsense. I spent around 10 minutes on the bloody captcha because it differentiated capitals, lowercase, numbers, and symbols. It was the most absurd captcha system I've seen to date. Was it an O or a 0? A lowercase L or an uppercase I? Was that a dollar sign or just some lines thrown in to distort the word further? An M or a W flipped on its side (was a 90 degree squiggle t

We have to face the fact that CAPTCHA is just a temporary measure anyway. Software is rapidly approaching the ability to do anything online that your average human can. While computers rapidly increase in capability, the average human stays the same. Eventually the only way to tell a computer from a human will be that the humans are easier to confuse.

And yet there are still those Craigslist 'employment' ads that promise 400/week for 5-10 hours 'work' spamming newsgroups and such. If it were automated, those 'jobs' would be lost. No big deal, really, cause when it's all said and done, it works out to about 25 cents an hour.

It doesn't matter too much which problem researchers focus on - they are solving the problem of human (and then superhuman) capabilities in this area. Captchas are nice because you have a self-funding opponent creating test data for you.

If you have a small-ish site that caters to a niche community where your target audience will share some knowledge that non-target folks don't have, a riddler where you can set the questions can work great. Just structure your questions in such a way that the answer is non-obvious in an automated way to all but the best AI engines.

And even that isn't as clear-cut as you might think. Most people probably think that ATI is superfluous, but if so, they're wrong. If you say "ATI, nVidia and Intel", you don't need to mention AMD because it's implied; thus AMD is superfluous.

If you make a question unambiguous enough, computers can answer it too. You can overwhelm a computer system by the sheer amount of ways to ask things, but then you need a human, who in the long run can't produce captchas as quickly as a computer can fail them.

It's just an illustration, but just like it can be hard for humans to decipher a captcha, it could be hard to understand the logic -- Intel, AMD and NVIDIA are all companies where ATI was actually purchased by AMD and would thus make it superfluous.

If it were easy to answer, it would be easy for automation to crack it.

ReCAPTCHA needs to be retired. OCR is getting too good. ReCAPTCHA, remember, is using images from book scanning, ones that the OCR system couldn't recognize.
When ReCAPTCHA started, the text presented was usually an English word. Now, if the book scanning OCR system can't figure out something, it's probably not an English word. You're lucky if it's a sequence of characters found on an A-Z keyboard. People have reported ink blots, mathematical formulas, and Cyrillic.

Worse, ReCAPTCHA's idea of the "right" answer is crowdsourced. It's possible for bots to pollute the ReCAPTCHA database, by providing the same wrong answer more than once. You only have to get one of the words right, so if you can read one, a junk response for the other works. This goes into the database as a vote for the "right answer", to be presented to someone else later. I sometimes type "whatever" when one of the images is unreadable.

You're missing an opportunity to add words to past texts. I always type "bunga-bunga". My hope is that someday in the far future, a scholar of historic literature will be scratching his head wondering why all these old books have the phrase bunga-bunga thrown in at random places.

Another reason I recently realized that recaptchas are useless: The whole idea is that one of the words could be read by a robot from the start to be included in the rotation. Now, granted, they've modified the word to try and anti-robot it, but the fact remains that at some point it was readable; the other "word" never was. Thus it had a limited lifespan until the spambots caught up in OCR to Google's bots.

I've spoken with the founder of ReCAPTCHA about this when he came to campus for a talk several years ago. It's both the expected end game and seen as a victory ("we forced OCR to become usable with market pressures").

Don't worry, they have other puzzles in the queue that need machine comprehension models.

Worse, ReCAPTCHA's idea of the "right" answer is crowdsourced. It's possible for bots to pollute the ReCAPTCHA database, by providing the same wrong answer more than once. You only have to get one of the words right, so if you can read one, a junk response for the other works. This goes into the database as a vote for the "right answer", to be presented to someone else later. I sometimes type "whatever" when one of the images is unreadable.

Not just bots - humans can (unintentionally) do it as well. Sparkfun (an electronics hobbyist site) recently had a giveaway in order to stress test their servers. Several thousand people were solving CAPTCHAs as quickly as possible. There was a noticeable drop in the accuracy of the answers required, since a lot of people were taking shortcuts in entering them.

Because image recognition research is beneficial in many areas. Also, Captchas are mostly snake oil as there are tons of Indians willing to be paid next to nothing to break thousands and thousands of Captchas for the spammers anyway.

Because "focusing on Captchas" is dealing with image recognition directly? Besides, improving OCR to break Captchas directly helps improve the ability to OCR old and badly scanned works. Also, do you think that if these people did stop working on it that no one else will? Isn't it better for the good guys to be showing us the weaknesses rather than the bad guys exploiting it due to everyone being ignorant of the flaws? You can't fix flaws if you stop people from researching into them.

Well, the whole CAPTCHA system is itself flawed - it's putting all the data in one place. The only way to make it harder would be to have multiple data sources for users to have to put information through - e.g. not simply one CAPTCHA to verify, but 3 or 4 separately loaded, and all independent of each other. (Even 2 would be an improvement.)

Still, it would only be a matter of time before the bots figured out how to track all the CAPTCHAs and thereby defeat it yet again.

Actually, the entire reason we have captcha is because the techniques you just listed don't work any more. Bots learned to run Javascript and ignore hidden fields years ago. Even if the bots could not do those things, it still wouldn't matter because whoever codes the routine to submit the form will pick up on those things. The best you can do is make it inconvenient enough that they will pick another target instead. But if you are Yahoo or Google or Wordpress, that won't deter them.

I agree with these sorts of solutions to stop bots. It works on sites I've put together because none of them were very high profile for spam attacks. Get a site that is worth it for spammers to crack and they probably will.

IIRC, eBay does all sorts of javascript loads and frequently changes its HTML layouts to keep screen scrapers from crawling auctions. This cuts down on the problem, but people are still able to find a way to do it if they want it enough.

I NEED one of these captcha solver programs. When I try to register for a website or forum, many of the captchas are so unreadable that it takes me 20 minutes of trying to get one right, and there is NO PHONE NUMBER to call their technical support to register me by telephone.

What about charging 10-15 seconds of CPU time with some arbitrarily hard code? It seems like everyone agrees that CAPTCHAs are an arms race that the good guys can't win, so why not make it unprofitable to solve the CAPTCHA replacement on a large scale?
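One common way to charge CPU time is a hashcash-style proof of work, sketched below. The comment does not name a scheme, so this is an assumption, and the function names (`make_puzzle`, `solve`, `verify`) are invented for the illustration. The client must find a nonce whose SHA-256 hash has a given number of leading zero bits, while the server checks the result with a single hash.

```python
import hashlib
import os

def make_puzzle(difficulty_bits=16):
    """Server side: issue a random challenge string and a difficulty."""
    return os.urandom(16).hex(), difficulty_bits

def leading_zero_bits(digest):
    """Count the leading zero bits of a bytes digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def solve(challenge, difficulty_bits):
    """Client side: brute-force a nonce. Expected work doubles with each extra bit."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
        nonce += 1

def verify(challenge, difficulty_bits, nonce):
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits
```

The default of 16 bits is arbitrary. Hitting a target such as 10-15 seconds would mean tuning `difficulty_bits` per device class, and the solve time is random around its average rather than fixed.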

What about charging 10-15 seconds of CPU time with some arbitrarily hard code?

A major obstacle to this is that you have to make the puzzle easy enough that your users on lower-end or mobile devices still have the necessary computation power to complete the puzzle in a reasonable time. Malicious organizations behind the spam will just put more hardware into their attack, typically by using the compromised machines in botnets. They'll also optimize the code, and parallelize the attack by performing the computation for multiple attempts on multiple CPU cores, while your code has to wo

Let's now imagine a perfect world in which you create a check that actually takes 15 seconds to complete. They can still do that 5,760 times per day.

The point of this proposal is not to stop spam entirely, but to keep the rate at which spam can be sent down to manageable levels. If a spammer can only send 5760 spam messages per day, that is a big improvement -- right now spammers are limited only by bandwidth, and can send tens of thousands of messages per day.

The state of OCR has changed little in over a decade, at least at the consumer end. I've tried the top software like Acrobat Pro and Omnipage and hardware solutions from Xerox, HP, Fujitsu, etc. The text can be printed clear as day, with no flaws, and yet the OCR programs all fail to get above, I'd say, 70% accuracy. Maybe it's different in the commercial world, where one can afford a $25,000 glorified copier, but I've been unable to find anything you can buy from Amazon or the like that will reliably scan

There are a variety of low-tech techniques that can be more effective than using Captchas or even "security questions", especially when you mix and match. You don't have to annoy your legitimate users, or make them jump through hoops. One trick is to include a "honeypot input" in your form. Give it a tantalizing name attribute such as "username", give it visibility of "hidden" (with CSS from a style-sheet), and when validating your form simply check to see if any values have been entered. If it's non-empty,
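The comment above is cut off in the source, so the following is only a guess at the server-side half of the honeypot trick. It assumes that a non-empty honeypot value is meant to flag the submission as a bot. The function name `looks_like_bot` and the `login_name` field in the usage note are hypothetical; only the decoy name "username" comes from the comment.

```python
def looks_like_bot(form, honeypot_field="username"):
    """Return True when the CSS-hidden honeypot input came back with a value.

    `form` is a plain dict of submitted field names to string values.
    A human never sees the hidden input, so it should arrive empty or missing.
    """
    return bool(form.get(honeypot_field, "").strip())
```

For example, `looks_like_bot({"username": "", "login_name": "alice"})` is false, while a bot that fills in every field it finds would trip the check. The real user name has to live under a different field name, here the made-up `login_name`.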

The key with CAPTCHAs is diversification, just like the key to avoiding disease in biological specimens is avoiding a monoculture. If there were 15000 different CAPTCHA methods, it wouldn't be profitable to create CAPTCHA tools that would each only work on some small subset. There are a lot of low population sites I use that check whether I'm a human with some unique set of hoops through which I must jump. The effectiveness of those hoops comes from the fact that they're often unique to that site, not a lump of code used by thousands of different sites. Diverse CAPTCHA breaking might require something like Watson, which isn't going to be available to spammy types in the near future.

Have the captcha page display some really good porn video footage - drawn from a huge repository of suitable images (say, the rest of the internet). The clips are fairly long (say 3-5 mins or so). To pass the captcha the user merely has to click on a button at the right time. So, if the user clicks right away, it's a bot. If there is a suitable pause (say 3-5 mins), then it's more likely human :P

I have to wonder just who Stanford is trying to help out with this research. Captchas may be annoying, but when their research makes its way to the script kiddies and the industry comes up with a new solution, does anyone really think the new solution won't be even more annoying?

I always thought that was pretty secure because the machine couldn't tell which picture was a cat? What about combining video and cat captcha. 4 videos, one of which is a cat. But it could be a close video, or a zoomed out one where the cat is running around. A computer really shouldn't be able to decode that. Use a large enough database and they'll never solve it.

If I have to start watching videos just to sign up for some forum or so, then the sign-up is probably just not going to happen. Your idea sounds several orders of magnitude more annoying than the already highly annoying captchas in use (with ReCAPTCHA at the top of the annoyances - most of them are simply unreadable).

Cat captcha has been around for a while, but it doesn't even have to be a video. It could be 4 animated gifs. A computer would have a hard time deciphering an animated gif of a cat running across a room. But for a human they're very easy. Easier than recaptcha.

I got to chat with Luis von Ahn, co-creator of the Captcha and reCaptcha, and it turns out he's a surprisingly idealistic guy. Taking inspiration from people in gyms pedaling and going nowhere, he hoped to actually *do* something with the brainpower needed to solve a reCaptcha (he said something along the lines of, "actually your brain is doing a pretty amazing thing -- translating an image to text.") Maybe digitizing the archives of the New York Times and ancient manuscripts isn't world hunger or world pea

Made to look like a captcha, with the text, "What is 2+3?" Spammers read the captcha and submit that back. Normal people type 5. "What site is this?" is another good one. Heck, you don't even need to make it look like a captcha; it's just funnier that way. One site I mod used to get dozens of spam threads a day, until a couple years ago they added a box to the end of their registration with one of a handful of questions like "what do seal clubbers club" (answer: seals), or "what is the first letter of the a