Damn spam bots

Several years ago, there was a spike in new account creations. Normally, I wouldn't have noticed, except there was a corresponding spike in spam in the forums. Pretty much everything on this website is custom built, including the forums. I like to reinvent the wheel. Don't criticize me. I was somewhat amused that someone went through the trouble of writing a bot specifically to post on RA, but mostly I was annoyed because it meant I had to divert my attention to dealing with the problem.

I installed reCAPTCHA as a deterrent. A CAPTCHA is an image containing distorted text that you (a person) can easily recognize and type into a text field, but that is impossible or very hard for a computer to read, which is how websites can keep the bots out. As soon as I uploaded the change, new account creation went back to its normal rate.

Over the years, the spammers improved their pattern recognition algorithms and were soon able to recognize the CAPTCHAs with relative ease. To combat the improvements, CAPTCHAs became progressively harder to read. Ironically, the bots can now solve the CAPTCHAs faster than humans. The CAPTCHAs were so hard to read that I received complaints from potential new users.

Earlier this year, I removed reCAPTCHA because I felt it was hindering account creation. I created some homegrown bot detection schemes, which consisted mainly of a "honey pot". It is a hidden password field that a person can't see. The bots, on the other hand, don't know the password field is hidden and fill it with a password. If this password field contains text, then RA knows a bot is trying to create an account. RA also logs these attempts so I can see what the spammers are up to. It turned out that a bot would attempt to create an account about every 30 to 60 seconds.
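For anyone curious what a honey pot check looks like on the server side, here is a minimal sketch in Python. It is not RA's actual code, which isn't shown in this thread; the field name `password_confirm` and the logger name are made up for illustration.

```python
import logging

logger = logging.getLogger("signup")

# Hypothetical name for the hidden decoy field; not RA's actual field name.
HONEYPOT_FIELD = "password_confirm"


def is_bot(form: dict) -> bool:
    """Return True when the hidden honey pot field contains any text."""
    trapped = bool((form.get(HONEYPOT_FIELD) or "").strip())
    if trapped:
        # Log the attempt so the site owner can see what the bots are up to.
        logger.warning("honey pot filled by user %r", form.get("username"))
    return trapped
```

A person never sees the decoy, so it arrives empty and `is_bot` returns False; a bot that fills every password field it finds gets flagged.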

All was good until two days ago, when the number of new accounts spiked again. The servers were still catching these bots, but some of them were successfully getting through. I think the spammers realized there's a honey pot and started adjusting their bots to avoid it. It is not a perfect scheme, even though the servers randomly designate one of the password fields as the honey pot so that the spammers never know which one is real.
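The random designation could be sketched roughly like this in Python. The field names `pw_1` and `pw_2` and the idea of keeping the choice in a per-form state are assumptions for illustration, not a description of how RA stores it.

```python
import secrets

# Hypothetical field names; a human sees only the real one.
PASSWORD_FIELDS = ("pw_1", "pw_2")


def new_form_state() -> dict:
    """Pick the honey pot at random each time the signup form is rendered."""
    honeypot = secrets.choice(PASSWORD_FIELDS)
    real = next(name for name in PASSWORD_FIELDS if name != honeypot)
    # Kept on the server (for example in the session) to check the submission.
    return {"honeypot": honeypot, "real": real}


def classify(form: dict, state: dict) -> str:
    """Return "bot" if the decoy was filled, otherwise "ok" or "invalid"."""
    if form.get(state["honeypot"]):
        return "bot"
    return "ok" if form.get(state["real"]) else "invalid"
```

With only two fields, a bot that fills exactly one of them at random would still pick the real one about half the time, which is one way the scheme can leak.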

Late last night, I implemented my new spam deterrence scheme. The "Create" account button is disabled for 15 seconds. There's a countdown next to it as a visual cue for when it will be enabled. I figure that a person will take about that long to fill out the form. It stopped the bots immediately. The servers now also log how long it took to fill out the form. Since the bots don't click on the "Create" button to send the data back to RA, they take about 1 to 2 seconds to complete the form. Any form submission that takes less than 15 seconds is a bot.
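Since the bots skip the button entirely, the 15-second rule has to be enforced on the server. One possible way to do that, sketched in Python: hand out a signed timestamp when the form is rendered and check it on submission. The signing approach, the `SECRET` value, and the function names are assumptions for illustration, not necessarily how RA measures the time.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # placeholder key, an assumption
MIN_SECONDS = 15  # the delay described in the post


def _sign(issued: str) -> str:
    return hmac.new(SECRET, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create a signed render-time token to embed in the signup form."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def seconds_to_fill(token: str, now=None):
    """Return elapsed seconds, or None if the token was tampered with."""
    issued, _, signature = token.partition(":")
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return None
    current = time.time() if now is None else now
    return current - int(issued)


def looks_like_bot(token: str, now=None) -> bool:
    """Flag submissions that are forged or faster than MIN_SECONDS."""
    elapsed = seconds_to_fill(token, now)
    return elapsed is None or elapsed < MIN_SECONDS
```

Signing the timestamp keeps a bot from simply claiming the form was rendered a minute ago; the elapsed time can also be logged for every submission.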

I estimated the bots created about 450 accounts in the last 2 days. It's not many in the grand scheme of things. So far, the spammers haven't done anything with these accounts. Maybe it's because the spammers intended to use these accounts to spam the forums, but the custom forums software is confusing the bots.

For now, I have the upper hand but I don't expect it to last long. Pretty soon, I will need to come up with a new scheme. One obvious solution is to force the user to verify his/her email address before creating the account. I don't like this idea because I don't want the potential new user to wait for the email before he/she can use RA. If you have ideas on how to deter bots, please let me know.

The perseverance of spammers is to be admired. Imagine what they can accomplish if they apply their ingenuity for something good instead of being the constant parasites of society.

eric

Eric, I don't know if you're creating some of these deterrents with your own innovation but if things like the "honeypot" are unique I hope you are protecting your IP because this is all very neat.

IMHO, if it saves you work, why not go with email verification? There are a LOT of sites that do this. With most, the email arrives so quickly that if I have to open another window to log into gmail it is already waiting for me, I click, and am verified. It adds perhaps 10-15sec to the whole process and I am never deterred from a site by needing to do this.

However, I have encountered a small number of instances where the email takes many minutes or even a few hours to arrive. I have always assumed this is because someone has to manually trigger the email to be sent. THIS is annoying and a deterrent.

As long as the emails go right out I don't see why this would be a problem for people. I'd rather your time go into positive improvements to RA than deterring spammers.

"If you want to be a bad a$s, then do what a bad a$s does. There's your pep talk for today. Go Run." -- Slo_Hand

The honey pot idea is not new. It only works if the spammers don't know there is one. Since they figured it out, I figure I can talk about it. Randomizing the honey pot was my enhancement. Obviously nothing is foolproof and I don't expect a 100% success rate. I just want the number of real accounts to be more than fake ones.

You're right that eventually, I may have to use email verification. Even that is not perfect. When I moved my servers to a new ISP, I inherited a new set of IP addresses. That resulted in many mail servers, including hotmail.com, rejecting emails from RA. My ISP had to vouch for me that I am not a spammer, and it took a couple of weeks to resolve.

I have one more idea on detecting spammers. Not sure if it will work. After that, I might have to seriously consider email verification.

Google is fast! I wanted to check the reference of your quote and guess which page is the #1 search result? And apparently, I time traveled too because Google claimed the thread was created 12 hours ago.

IMHO, if it saves you work, why not go with email verification? There are a LOT of sites that do this. With most, the email arrives so quickly that if I have to open another window to log into gmail it is already waiting for me, I click, and am verified. It adds perhaps 10-15sec to the whole process and I am never deterred from a site by needing to do this.

However, I have encountered a small number of instances where the email takes many minutes or even a few hours to arrive. I have always assumed this is because someone has to manually trigger the email to be sent. THIS is annoying and a deterrent.

As long as the emails go right out I don't see why this would be a problem for people. I'd rather your time go into positive improvements to RA than deterring spammers.

+1, though I really don't even mind if I have to wait an hour or two to join a worthwhile site, to be perfectly honest.

I've noticed in the past week or two that my favorite cycling BB has been inundated with new spammers...of the variety that have to post several times before they have full membership and don't have to have their posts in moderation. These spam preparation posts are almost always the same--they resurrect an old thread and post some random mumbo-jumbo that only sort of relates to the thread and contains questionable use of the English language. Often they will have some random link in their signature. I assume these are the work of bots, too.

'15 Goals:

Like spaniel said, honey pot is a neat idea... protect it. On a side note, how are you hiding the password field? If it's an HTML hidden field, bots can recognize it easily.

For a new idea, how about a virtual keyboard? The only way to fill out the form is to click letters/digits on the virtual keyboard. You could force the users to enter a field or two that way. Also, I have seen some virtual keyboards randomly change positions of the digits making it even better.

Another idea, which I am not completely sure would work, is to ask the user to make random mouse movements in a specific area of the screen. The mouse movement generates a number, which they have to enter to register. I am not sure whether bots can simulate mouse movements.

I am wondering how the email verification will prevent the bots (not challenging, just asking). Could they not use something like http://www.guerrillamail.com/ or http://mailinator.com/?

running is somewhat like playing golf to me. crappy shots all day long, ready to give it up & wondering why I'm trying so hard just to get this stupid little ball into a stupid little hole but then out of the blue comes a monster drive or a long putt that actually gets into the cup. bingo! that one shot keeps me going for the rest of day no matter how crappy I continue to play & gets me back out again on another day. strange. -- skyedog

Rendering the password box with JavaScript after the page loads (rather than in HTML) might also work.

The success of your strategy will likely depend on whether someone is actively customizing a bot to register for RA. There are a lot of auto-register programs. Many of them deploy effective strategies to very common anti-spam tactics (like CAPTCHA). If the spammers are mindlessly running a sophisticated bot, you stand a chance at beating them. If there's an eager coder behind the bot personally trying to foil your plans, it is a _lot_ harder.

There is KittenAuth. Pick out the kitten from the grid of 9 random pictures. It is usually followed by marking the originating IP address and disallowing more than N failed attempts in a day. Success depends on how large your set is, and whether there is an eager coder behind the bot. (Often bot writers will take human-challenges and post them on pr0n sites for humans to solve, then leverage the answers with the bot.)
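The "no more than N failed attempts in a day" part could look something like this in Python. It is an in-memory sketch only; the class name, the default of 5 failures, and the 24-hour sliding window are assumptions, and a real site would keep the counts somewhere persistent.

```python
from collections import defaultdict, deque


class FailedAttemptLimiter:
    """Block an IP once it has too many failed challenges within the window."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 86400.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def _prune(self, ip: str, now: float) -> None:
        # Drop failures that have aged out of the window.
        timestamps = self._failures[ip]
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()

    def record_failure(self, ip: str, now: float) -> None:
        self._prune(ip, now)
        self._failures[ip].append(now)

    def is_blocked(self, ip: str, now: float) -> bool:
        self._prune(ip, now)
        return len(self._failures[ip]) >= self.max_failures
```

Call `record_failure` whenever a challenge is answered wrong and check `is_blocked` before showing the next one.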

To keep the theme, maybe you use pictures of runners.

Natural language questions are cheap, but often eliminate non-native English speakers. "Which letter of the alphabet sounds like 'tea'?" But again, it is easily defeated using the pr0n solver.

Personally, I'm surprised the 15sec timer isn't more successful. It slows down the bot enough to make it hardly worthwhile to attack the site.

Good luck.

2014 Goals: sub-3 Marathon

Current Status 11/10: Back to building up miles. Junk feels mostly okay. Kinda.

Virtual keyboard is a great idea! Each virtual key would map to a random letter or number such that the spammers can't guess what the correct sequence would be. It's a bit of hassle for the user because it's not automatic, but it's simple enough to stop the bots.
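A rough Python sketch of the server side of that mapping, under some assumptions: the key ids (`k0`, `k1`, ...), the lowercase-plus-digits alphabet, and the function names are invented for illustration, and the page is assumed to submit key ids rather than the characters themselves.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def new_keyboard() -> dict:
    """Map opaque key ids to characters, reshuffled for every form render."""
    ids = [f"k{i}" for i in range(len(ALPHABET))]
    secrets.SystemRandom().shuffle(ids)
    # The page shows each character on a button but submits only its key id.
    return dict(zip(ids, ALPHABET))


def decode(clicked_ids: list, keyboard: dict) -> str:
    """Turn the submitted key ids back into the text the user clicked."""
    return "".join(keyboard[key_id] for key_id in clicked_ids)
```

The mapping stays on the server with the rest of the form state. It only helps if the bot can't read the key labels out of the page, so the labels would probably need to be drawn as images or similar.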

The email verification works by slowing down the spammers. Most of the time, spammers would use fake email addresses so they will never receive the verification email. I doubt they'll set up their own mail server because they'll get spammed by the verification emails that their bots generate. Besides, such a server can be easily shut down.
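For reference, the basic mechanics of email verification are small. A minimal Python sketch, assuming an in-memory `pending` table in place of a database and leaving out the actual sending of the email; the function names are made up.

```python
import secrets

# token -> email address awaiting confirmation (stand-in for a database table)
pending = {}


def start_signup(email: str) -> str:
    """Create an unguessable token to put in the verification link."""
    token = secrets.token_urlsafe(32)
    pending[token] = email
    return token


def confirm(token: str):
    """Return the verified email once, or None for unknown or reused tokens."""
    return pending.pop(token, None)
```

A bot that signs up with a fake address never sees the token, so the account is never activated.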

Maybe enhance how you hide the hidden password field. That is, they may be able to read the visible or display property in HTML, or even in CSS, but you might be able to do the hiding via JavaScript, which would be more troublesome for the computer program to read.

Or if they're just guessing, go to 100-1 ratio of fake password fields to real ones.

I suppose you could randomize the names/ids of the text fields, as another idea. But, they need nearby text that is meaningful to humans still -- could use some tricks to move the labels far away in the html, and position them via css, or javascript.
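Randomizing the field names could be sketched like this in Python. The logical names, the `f_` prefix, and the helper names are assumptions; the mapping would be generated per form render and kept on the server.

```python
import secrets

# Hypothetical logical field names for the signup form.
LOGICAL_FIELDS = ("username", "email", "password")


def new_field_names() -> dict:
    """Give every logical field a fresh random name for this form render."""
    return {logical: "f_" + secrets.token_hex(8) for logical in LOGICAL_FIELDS}


def translate(form: dict, names: dict) -> dict:
    """Map a submitted form back to logical names, ignoring unknown fields."""
    reverse = {random_name: logical for logical, random_name in names.items()}
    return {reverse[key]: value for key, value in form.items() if key in reverse}
```

A bot that hard-codes `password` as the field name would then submit nothing the server recognizes.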

PS: A honeypot is a concept that has been public for decades, I think, so we probably don't need to get too excited about patenting it now.

MTA: Whilst I was typing this, the virtual keyboard suggestion appeared, which is much cleverer than my ideas.

I looked into various deterrence methods and KittenAuth was one of them. It's a neat idea, but also has the problem of requiring the user to understand English. It's not a problem right now because RA hasn't been localized yet. I'm trying to come up with a method to allow users to help with the localization effort but that's a completely different project.

Like you said, picture identification has several weaknesses. The most obvious is the size of the image set. An attacker may do a brute force attack by downloading all the pictures. In a 3 x 3 set of pictures, the bots still have a 1 in 9 chance of getting through. It's not perfect, but it cuts down most of the junk.
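To put a number on the 1 in 9 chance when a bot gets several tries, here is a small Python calculation. It assumes the bot guesses uniformly at random and that each attempt is independent, which is a simplification.

```python
def pass_probability(grid_size: int = 9, attempts: int = 1) -> float:
    """Chance that a randomly guessing bot gets through at least once."""
    miss = 1 - 1 / grid_size  # probability a single guess is wrong
    return 1 - miss ** attempts
```

With a 3 x 3 grid, one attempt gives 1 in 9, and five attempts give roughly a 45% chance of at least one success, which is why picture challenges tend to be paired with a cap on failed attempts.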

I'm currently tracking the IPs of the bots, although I doubt that's useful. I suspect many of these IPs are from compromised computers that are part of botnets. I recognize that this is a virtual arms race in which site owners will always be on the defensive. The 15-second delay does slow them down.