
Avoiding CAPTCHA

Currently on my website I've been using CAPTCHA for private message conversation reply forms, commenting forms, and almost every form page on the website (that creates new records).

It gets pretty expensive for the server to have to create, draw and store (the text in the db) the CAPTCHA-related info every time a form is rendered (even if the form is never filled out).

I've thought about having a spam filter or an anti-flooding system for the website, but I'm not 100% sure how to implement it. Should I create a table for all the recent requests? Should I create a minimum time delay (say 30 seconds) between form posts?

Even if I don't use CAPTCHA anymore, there's still the problem of bots posting data onto the website automatically ... as long as they space their posts out to fit the time delay.
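
For what it's worth, the "minimum time delay between form posts" idea could look roughly like the sketch below. It is in Python purely for illustration, and it assumes an in-memory dict stands in for the "recent requests" table; `PostThrottle`, `allow` and the 30-second default are made-up names and values, not anything from the site in question.

```python
import time


class PostThrottle:
    """Reject a form post if the same user posted less than `min_delay` seconds ago."""

    def __init__(self, min_delay=30.0, clock=time.monotonic):
        self.min_delay = min_delay
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_post = {}     # user id (or IP) -> time of the last accepted post

    def allow(self, user_id):
        now = self.clock()
        last = self.last_post.get(user_id)
        if last is not None and now - last < self.min_delay:
            return False        # too soon: reject, keep the old timestamp
        self.last_post[user_id] = now
        return True
```

On a real site the dict would presumably be a db table or a cache keyed by user id or IP, and `allow()` would be called before a new record is created. As noted above, a patient bot that waits out the delay still gets through, so this only limits volume.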

You could also pregenerate a bunch of them and store them on the filesystem, e.g.
1.jpg
2.jpg
3.jpg
Then store the text for each in the db, which will be very low overhead since you just pick one randomly.

If you're really concerned that someone will write a bot which a human spoon-feeds the answers to, so that it can later recognise one of these known images on its own, you could run a non-realtime garbage collection and regeneration routine. The images gradually get cycled out and don't last long, so the bot rarely gets a chance to see a known image.
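
A rough sketch of the pregenerated pool with a cleanup pass, in Python for illustration only. It assumes the image files already exist on disk and only tracks the filename, answer and creation time; `CaptchaPool` and its method names are hypothetical, and the dict stands in for the db table.

```python
import random
import time


class CaptchaPool:
    """Pool of pregenerated CAPTCHA images: filename -> (answer, created_at)."""

    def __init__(self, max_age=3600.0, clock=time.time, rng=None):
        self.max_age = max_age      # seconds an image may stay in rotation
        self.clock = clock
        # SystemRandom draws from the OS, so picks are hard to predict.
        self.rng = rng or random.SystemRandom()
        self.entries = {}

    def add(self, filename, answer):
        self.entries[filename] = (answer, self.clock())

    def pick(self):
        # Cheap per request: just choose one known file (raises IndexError if empty).
        return self.rng.choice(list(self.entries))

    def check(self, filename, guess):
        entry = self.entries.get(filename)
        return entry is not None and entry[0].lower() == guess.strip().lower()

    def expire_old(self):
        """Non-realtime cleanup: drop images older than max_age and return their names."""
        cutoff = self.clock() - self.max_age
        stale = [name for name, (_, created) in self.entries.items() if created < cutoff]
        for name in stale:
            del self.entries[name]
        return stale
```

`expire_old()` would run from a cron job or similar, followed by whatever routine regenerates fresh images and calls `add()` for each, so nothing expensive happens while a form is being rendered.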

On my site, I am removing CAPTCHA as much as I can. It's very annoying to the end user.
Only the registration form has a CAPTCHA. If the input form is only available to logged-in users, then I guess there's not much need for a CAPTCHA, as a secure login can deter a lot of bots.

If, for some reason, you ever need to have a CAPTCHA check on every form, you can store a temporary cookie after the user has solved one CAPTCHA to remember that the user is a human.
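
One possible way to do the temporary cookie so that it can't simply be forged is to sign it. The sketch below is Python for illustration and assumes an HMAC-signed value tied to the session id with an expiry time; `SECRET_KEY`, `make_human_token` and `is_human_token_valid` are made-up names, and the 15-minute lifetime is an arbitrary choice.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical; load a real secret from configuration


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_human_token(session_id, ttl=900, now=None):
    """Cookie value issued after one solved CAPTCHA; valid for `ttl` seconds."""
    expires = int((time.time() if now is None else now) + ttl)
    payload = f"{session_id}:{expires}"
    return f"{payload}:{_sign(payload)}"


def is_human_token_valid(token, session_id, now=None):
    try:
        sid, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return False                      # not in the expected three-part shape
    expected = _sign(f"{sid}:{expires}")
    # Constant-time comparison; bytes avoid errors on non-ASCII input.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    if sid != session_id:
        return False                      # token was issued for another session
    return (time.time() if now is None else now) < int(expires)
```

The form handler would skip the CAPTCHA only when `is_human_token_valid()` returns True for the current session. This only saves the user from repeated CAPTCHAs; it does nothing about flooding by itself.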

Good idea, but you would still need to have a spam or flood filter on the backend to filter any brute force traffic.

Let's say that a user proves that he is not a machine and then gets the privilege of not having to decode letters from an image. What if someone then brute forces using that account? No CAPTCHA = no security.

OK, so let's say that you do have something on the backend to filter the spam. This works, but you would need to create some sort of flood control system. This kind of counters the idea of using CAPTCHA, because with a CAPTCHA image, as long as you enter the characters correctly you're fine (post as much as you want).

True, but if we're gonna talk about defending against human-assisted bots, we're gonna be talking for a long time.

Flood control and CAPTCHA are different things. Relying on the natural flood control effect of a CAPTCHA only limits you to the speed at which a human can enter characters into a text field while looking at an image. It would probably take me all of 15 minutes to write a bot that presents me with your CAPTCHA image and a text box. I solve the question, and my bot forwards the answer to your site for me and includes a spam message. It then immediately fetches the next image and presents it to me again. I could probably get a message sent every 5 seconds that way, because it doesn't take long for me to read an image and type, and there isn't any form of flood control.

Well, most sites don't run into a problem of flooding. Flooding one site with a bunch of advertisements at one time is not effective when your point is to advertise. One SQL query could delete a spammer's work. Unless you fear that you will receive continual targeted attacks on your site, it should not be an issue.

Most sites have the problem with entirely automated bots that scan the Internet for forms to fill out. One CAPTCHA is good enough to defeat these bots. You don't even need a complicated CAPTCHA to defeat these.

I'm just suggesting there comes a point where you need to draw the line, and that point is usually around when you start trying to defend against humans (or human-assisted bots).

If someone wants your site badly enough to code a bot specifically to defeat your proprietary system, how much of a difference will it make if they only need to give it the answer once, after which it can use the "authorized" cookie to post repeated messages, compared with having to supply a new answer for each message? Probably not much, but maybe the site really is that desirable, and requiring a human to babysit the bot would make it financially unfeasible if the goal is to send massive amounts of posts.

Perhaps a random chance (say 1 in 5) could determine whether a CAPTCHA appears. This way, if a bot does get to shoot a ton of requests, it might get a few requests in before the random CAPTCHA appears. However, as soon as the bot hits a CAPTCHA and gets it wrong, it has to keep entering characters until it gets them right. If more than 10 errors are made on the CAPTCHA, then that user account is blocked for a few minutes.
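
Something like the following sketch, in Python for illustration. It assumes per-user state kept in memory, a 5-minute block for "a few minutes", and that a correct answer clears the error count (that last part is an assumption, not something stated above); `RandomCaptchaGate` and its method names are made up.

```python
import random
import time


class RandomCaptchaGate:
    """Show a CAPTCHA on a random fraction of requests; block after too many errors."""

    def __init__(self, chance=0.2, max_errors=10, block_seconds=300.0,
                 clock=time.monotonic, rng=None):
        self.chance = chance              # 0.2 = roughly 1 request in 5
        self.max_errors = max_errors
        self.block_seconds = block_seconds
        self.clock = clock
        self.rng = rng or random.SystemRandom()
        self.must_solve = set()           # users with a pending CAPTCHA
        self.errors = {}                  # user -> wrong answers so far
        self.blocked_until = {}           # user -> time the block ends

    def is_blocked(self, user):
        until = self.blocked_until.get(user)
        return until is not None and self.clock() < until

    def needs_captcha(self, user):
        if user in self.must_solve:
            return True                   # pending until solved correctly
        if self.rng.random() < self.chance:
            self.must_solve.add(user)
            return True
        return False

    def record_result(self, user, solved):
        if solved:
            self.must_solve.discard(user)
            self.errors.pop(user, None)   # assumption: success resets the count
            return
        self.must_solve.add(user)
        self.errors[user] = self.errors.get(user, 0) + 1
        if self.errors[user] > self.max_errors:   # "more than 10 errors"
            self.blocked_until[user] = self.clock() + self.block_seconds
            self.errors[user] = 0
```

A request handler would check `is_blocked()` first, then `needs_captcha()`, and call `record_result()` once the answer has been verified.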

I was thinking of doing something like this on my server:
- I generate the images and store them in my DB (I have quite a few, 1000+)
- When I display an image, I select a random one and crop it at some random position (so 0-10 px of the image margins can go away)
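
The random crop in the second step could be computed along these lines. This is a Python sketch that only works out the crop box; `random_crop_box` is a made-up helper, and it assumes each of the four margins independently loses 0-10 px.

```python
import random


def random_crop_box(width, height, max_margin=10, rng=random):
    """Return (left, top, right, bottom) with 0..max_margin px trimmed from each side."""
    if width <= 2 * max_margin or height <= 2 * max_margin:
        raise ValueError("image too small for the requested margin")
    left = rng.randint(0, max_margin)             # randint is inclusive on both ends
    top = rng.randint(0, max_margin)
    right = width - rng.randint(0, max_margin)
    bottom = height - rng.randint(0, max_margin)
    return (left, top, right, bottom)
```

The tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop()` takes, in case that library is used for the actual cropping. The point is that the same stored image rarely produces identical bytes twice, which makes matching by file hash harder.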

When users take actions that update something (my problem is making sure users can't update stuff too much), I add to a counter. If they update more than X things in Y time, I lock their account for 1h and show a CAPTCHA image.
They get Z tries at the CAPTCHA (every time they fail, the CAPTCHA changes, just in case they can't read it), and if they get it right, the account gets unlocked and their counter is reset.
If they get it wrong, they could unlock their profile via some email or something.

The idea is that normal users never see this CAPTCHA unless they update stuff too much, and then they see it once every X minutes.
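
Roughly, the counter and lock described above might be sketched like this, in Python for illustration. X, Y and Z become `max_updates`, `window` and `max_tries`; the class and method names are made up, state is kept in memory, and the 1h expiry of the lock is left out to keep it short.

```python
import time
from collections import defaultdict, deque


class UpdateLimiter:
    """Lock a user who makes more than X updates in Y seconds; Z CAPTCHA tries to unlock."""

    def __init__(self, max_updates, window, max_tries, clock=time.monotonic):
        self.max_updates = max_updates        # X
        self.window = window                  # Y, in seconds
        self.max_tries = max_tries            # Z
        self.clock = clock
        self.updates = defaultdict(deque)     # user -> timestamps of recent updates
        self.tries_left = {}                  # locked user -> CAPTCHA tries remaining

    def record_update(self, user):
        """Return True if the update is allowed, False if the account is locked."""
        if user in self.tries_left:
            return False
        now = self.clock()
        recent = self.updates[user]
        while recent and now - recent[0] >= self.window:
            recent.popleft()                  # forget updates outside the window
        recent.append(now)
        if len(recent) > self.max_updates:
            self.tries_left[user] = self.max_tries
            return False
        return True

    def submit_captcha(self, user, correct):
        """Return 'unlocked', 'retry' (show a new image) or 'email' (out of tries)."""
        if user not in self.tries_left:
            return "unlocked"
        if self.tries_left[user] <= 0:
            return "email"
        if correct:
            del self.tries_left[user]
            self.updates[user].clear()        # counter reset on success
            return "unlocked"
        self.tries_left[user] -= 1
        return "retry" if self.tries_left[user] > 0 else "email"
```

A normal user who stays under the limit never triggers `submit_captcha()` at all, which is the whole idea.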

Note that my problem is for people using bots on one of my browser games, so this system is not designed to stop people from gathering information from my site.

Let me clarify (I lost track of who was the OP earlier):
You would only remember the CAPTCHA bit for the current session for an X amount of time.

However, what I said wasn't really to solve your problem. It was an addendum to khuramyz's comment about how annoying CAPTCHAs are to end users. I assume the biggest load to your servers for CAPTCHAs would result from the many guests that visit that have no intent of submitting a form. Your server would generate a CAPTCHA image regardless, so my addendum doesn't directly solve your problem.

Are you really expecting targeted attacks against every form on your website?

So many sites use CAPTCHA, and sometimes the CAPTCHA is not clear, so some visitors will just leave. So I think if you want more users, don't use CAPTCHA. If you want to limit the number of users, you should use it.