Stopping Spam Comments in Drupal 7 — An Overview of Modules

Getting comments on your articles and such is great: you know people are reading your stuff, and helpful visitors will expand, refine and correct your posts. It’s all good. Except when it isn’t. When it isn’t, you get tons of spam submissions and feel like you’re spending all your time sifting through them. Fortunately, Drupal offers a multitude of ways to combat spam comments, some of which work better in some contexts than others. The most popular of these methods can get really annoying. I just abhor CAPTCHAs, but because they tend to be the default, I’ll get them out of the way first with a quick overview of CAPTCHA options for Drupal. Then I’ll list some of my preferred methods that don’t annoy people as much.

Stopping Spam by Annoying People

The easiest way to stop spam comments on a Drupal site is to require registration — only logged-in users can post. In practice, this mostly means no comments. I’m not signing up for an account on your site just to post a comment and you’re not signing up for an account on my site just to post a comment. So you stop comment spam, but you also stop comments!

The Dreaded CAPTCHA

The next easiest is to install some type of CAPTCHA on your site: your users have to solve a puzzle in order to post. Unfortunately, the spambots have gotten so good at solving CAPTCHAs that the puzzles keep getting harder, and I’m increasingly finding them impossible to solve myself. I have to reload five or ten times to find one I can solve. Often, I just go away. As the bots improve, they will eventually be better than humans at solving this type of CAPTCHA. In this particular arms race, human users will eventually lose. That said, CAPTCHA is sometimes a useful tool, and Drupal has several CAPTCHA modules.

CAPTCHA Pack is an add-on for the CAPTCHA module that adds several additional challenge types.

The reCAPTCHA module integrates the reCAPTCHA system, which shows two words, one known and one unknown. You solve the known one to prove you’re human, and your answer for the unknown one helps transcribe it. The words are typically taken from scanned text documents, which get transcribed this way one word at a time. The first big success was transcribing the New York Times archive.

Text CAPTCHA — a little less annoying, this uses a logic question to test the user, such as “The number of body parts in the list face, knee and six is?”

Not So Fast is a variation on CAPTCHA, I guess, but a very different one. I’m putting it here anyway because it requires actual action from users to prove they are human. If an anonymous user enters an email address that has not yet been approved for posting, that user gets an email with a link they must click before the comment is approved. In the comment queue, the admin can see which comments come from verified email addresses and which do not. Since bots will almost never supply a working address and then click the confirmation link, this should stop most of them.
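To make the confirm-by-email idea concrete, here is a minimal Python sketch of the general flow, not Not So Fast’s actual code. It assumes a server-side secret and signs each email address with HMAC, so the confirmation link can carry a token that only the site could have issued. The names `make_token`, `confirm` and `comment_status` are hypothetical, and the approved list lives in memory purely for illustration.

```python
import hashlib
import hmac
import secrets

# Hypothetical site secret; a real site would load this from configuration.
SECRET_KEY = secrets.token_bytes(32)

# In-memory stand-in for a database table of approved addresses.
approved_emails: set[str] = set()


def normalize(email: str) -> str:
    # Treats addresses case-insensitively, a simplifying assumption.
    return email.strip().lower()


def make_token(email: str, key: bytes = SECRET_KEY) -> str:
    """Token to embed in the confirmation link mailed to `email`."""
    message = normalize(email).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def confirm(email: str, token: str) -> bool:
    """Handle a click on the confirmation link; approve the address if the token matches."""
    if token.isascii() and hmac.compare_digest(make_token(email), token):
        approved_emails.add(normalize(email))
        return True
    return False


def comment_status(email: str) -> str:
    """Comments from approved addresses are published; the rest wait in the queue."""
    return "published" if normalize(email) in approved_emails else "queued"
```

In use, the site mails a link containing `make_token(email)` and calls `confirm(email, token)` when it is clicked. Because the token is derived from a secret, a bot cannot approve an address without actually receiving the email.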

Stop Annoying People and Start Annoying Spambots

Thankfully there are a lot of alternatives to putting CAPTCHAs on your site, and I’ve generally found them to be effective, often in combination with one another. Some use hidden form fields to trick bots; some require a “proof of work” demonstrating that a human is on the other end (keystrokes, an elapsed time or something similar); some use third-party systems to analyze your comments after submission; some check IP addresses against known spammers, and so forth. All of these tend to be a lot less annoying and intrusive to users than logins and CAPTCHAs, while still being quite annoying to spambots.

In addition, the Spambot module checks users against a blacklist of known spammers. It is aimed less at comment spam per se than at user registration spam, which amounts to much the same thing if you require registration to comment but is quite different if you have a more open, blog-style commenting setup. From the project page for Spambot: “Spambot protects the user registration form from spammers and spambots by verifying registration attempts against the Stop Forum Spam online database.”

Finally, the Cloudflare reverse proxy service also acts as a spam blocker to some extent. All requests to your site are routed through Cloudflare. The main purpose is to speed up your site by serving cached resources from a geo-distributed network of servers, but they also check the requester IP against blacklists of known spammers and simply block the request if they find it suspicious.

The one issue with all services of this sort is that you may be catching legitimate users who, for whatever reason, trip the flags that cause Mollom or Akismet or Cloudflare to block them. Most services have some fallback. In Cloudflare’s case, they put up a big “Blocked!” page and the user has to choose to send a message to the webmaster. I’ve never received such a message, and I rediscovered Cloudflare one day, after having investigated it and then forgotten about it, because I was blocked from a site as a “threat” to something or other. Generally I’ve been pleased with Cloudflare, Mollom and Akismet, but I have no real measure of the number of false positives they cause. It’s not huge: I haven’t seen traffic or comments collapse after adding them. But there are no doubt some false positives.

Relevant Drupal Modules

Mollom. It should be said that not everyone is a fan of Mollom. Check out Randy Fay’s complaint (and read the comments on that post, which contain some great info; thanks to kiamlaluno’s comment in this Drupal Answers thread for the reference) as well as Randy’s follow-up issue in the Mollom module issue queue. These issues concern mainly the false positive problem, and though they are about Mollom specifically, I’m not sure Mollom is any worse than the others. The Cloudflare filter is the only one I’ve personally tripped.

Cloudflare. This module is completely optional: Cloudflare functions well without it. What it does is make sure visitor IPs are reported correctly (rather than the IP of the Cloudflare proxy) and integrate Drupal’s IP banning with your Cloudflare account. Note that on Apache the IP issue can also be handled server-side by installing mod_cloudflare, which is likely much more efficient than doing it on the Drupal end. So if you don’t need the additional integration and want better performance, change your server setup and do without this module.
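The IP reporting problem comes down to this: behind Cloudflare, the connecting address belongs to Cloudflare, and the visitor’s own address arrives in the `CF-Connecting-IP` request header. Here is a minimal Python sketch of recovering it, assuming you trust that header only when the request really comes from a Cloudflare address. The ranges below are placeholder documentation networks, not Cloudflare’s; substitute the list Cloudflare publishes. `client_ip` is a hypothetical name.

```python
import ipaddress

# Placeholder documentation ranges (RFC 5737 / RFC 3849), NOT Cloudflare's.
# Replace with the IP ranges Cloudflare currently publishes.
PROXY_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def client_ip(remote_addr: str, headers: dict[str, str]) -> str:
    """Return the visitor's IP, trusting CF-Connecting-IP only from a known proxy."""
    peer = ipaddress.ip_address(remote_addr)
    if not any(peer in net for net in PROXY_NETWORKS):
        return remote_addr  # direct connection: the peer is the visitor
    lowered = {name.lower(): value for name, value in headers.items()}
    forwarded = lowered.get("cf-connecting-ip", "").strip()
    try:
        return str(ipaddress.ip_address(forwarded))
    except ValueError:
        return remote_addr  # header missing or malformed: keep the proxy address
```

Checking the peer address first is the important design choice: without it, anyone could send a forged `CF-Connecting-IP` header straight to your server and choose their own IP, which would defeat any IP-based banning.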

Third-party services like Mollom and Akismet are virtually transparent to your users, yet they still block the vast majority of spam submissions to your site. Past a certain volume, though, you will have to pay a fee, so to stay below the allowed volume you may need to take other measures as well.

Hidden Field (a.k.a. Honeypot) and Proof of Work Solutions

You’re probably familiar with the honeypot method of fighting email scrapers: you create a hidden link that a human user can’t see, you block bots from crawling it in your robots.txt file, and ipso facto, anything that arrives there is a bad bot. Form submission honeypots work in a similar way. You create a field on a form that sounds like it should be filled in, then hide it from regular users with CSS. Since most form submission bots don’t parse the CSS, they don’t know the field is hidden, and they fill it in. So if the field contains anything at all, you can guess that a bot submitted the form. Several Drupal modules offer this.
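A minimal Python sketch of the honeypot check shows how little is involved. The field name `homepage_url`, the `hp-wrap` class and the function name are hypothetical, and the stylesheet is assumed to hide the wrapper; the Drupal modules have their own field names and markup.

```python
# Hypothetical trap field: a name a bot will happily fill in.
HONEYPOT_FIELD = "homepage_url"

# The stylesheet is assumed to contain:  .hp-wrap { display: none; }
HONEYPOT_MARKUP = (
    '<div class="hp-wrap">'
    f'<label for="{HONEYPOT_FIELD}">Homepage</label>'
    f'<input type="text" id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" '
    'value="" autocomplete="off" tabindex="-1">'
    "</div>"
)


def looks_like_bot(form: dict[str, str]) -> bool:
    """Humans never see the field, so any non-blank value points to a bot."""
    # A missing field is treated as blank here; stricter setups might reject it.
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```

`tabindex="-1"` and `autocomplete="off"` keep keyboard users and browser autofill from wandering into the hidden field and being mistaken for bots.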

Proof of work methods, sometimes called “Hash Cash” methods, are similar in that they are invisible to the user. They can use JavaScript to verify that actual keystrokes or clicks were recorded, or they can embed a timestamp in the form. If the form is submitted without any keystrokes, or faster than a person could plausibly fill it in, the module assumes that only a bot could have done it.
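The timestamp variant can be sketched in Python as follows; the keystroke variant runs in the browser and is not shown. This is an illustration under assumptions, not any particular module’s code: the secret, the 5-second minimum and the one-hour expiry are made-up values. The timestamp is signed with HMAC so a bot cannot simply submit an old time to look patient.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from configuration
MIN_SECONDS = 5          # assumed: faster than this looks automated
MAX_SECONDS = 60 * 60    # assumed: older forms are treated as stale


def _sign(timestamp: int) -> str:
    return hmac.new(SECRET_KEY, str(timestamp).encode("ascii"), hashlib.sha256).hexdigest()


def form_token(now: Optional[float] = None) -> str:
    """Value for a hidden field, generated when the comment form is rendered."""
    ts = int(time.time() if now is None else now)
    return f"{ts}:{_sign(ts)}"


def reject_submission(token: str, now: Optional[float] = None) -> bool:
    """True if the token is malformed or forged, or the form came back too fast or too late."""
    try:
        ts_text, signature = token.split(":", 1)
        ts = int(ts_text)
    except ValueError:
        return True
    if not (signature.isascii() and hmac.compare_digest(_sign(ts), signature)):
        return True
    elapsed = (time.time() if now is None else now) - ts
    return not (MIN_SECONDS <= elapsed <= MAX_SECONDS)
```

The optional `now` argument exists only so the logic can be tested without waiting; in production both functions would use the real clock.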

I’ve generally found proof of work and honeypot methods to be fairly effective and since they’re just adding a little bit of data to the form in some way, they are generally compatible with third-party methods like Mollom and Akismet as well.

Relevant Drupal Modules

Hashcash — an old favorite in both its Drupal and WordPress iterations. This was, as far as I know, the original proof of work spam preventer and it has worked well for me on several sites.
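The general hashcash idea is easy to sketch: the client must find a nonce whose hash begins with a set number of zero bits, which takes many attempts, while the server checks the answer with a single hash. This Python version uses SHA-256 and a simple `challenge:nonce` string. The original Hashcash used SHA-1 and its own stamp format, and the Drupal module does the client-side work in JavaScript, so treat this as an illustration of the technique rather than the module’s implementation. The challenge string is assumed to be issued per form by the server.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode("utf-8")).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: try nonces until the hash has `difficulty` leading zero bits."""
    for nonce in itertools.count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash confirms the client did the work."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```

Each extra bit of difficulty doubles the expected work: at 16 bits a client needs about 65,000 hashes on average, trivial for one visitor but costly for a bot posting thousands of comments.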

BOTCHA is a hidden field spam blocker that can be used side by side with the CAPTCHA module for belt-and-suspenders protection (this may be true of the other hidden field modules, but only BOTCHA specifically suggests it on its project page). BOTCHA combines “multiple recipes” such as adding bogus fields, shuffling fields and labels, and using JavaScript to create a hash on the submit button (the project’s TODO file lists some of the recipes already implemented). So BOTCHA includes fairly advanced honeypot-style recipes as well as a variety of proof-of-work recipes.

Behavior-Based Methods

Behavior-based methods may look at past behavior (for example, consulting lists of known bad IPs) or detect current bad behavior (for example, following links on your site that are invisible to humans and blocked by robots.txt, which indicates a rogue bot). There are many different approaches here, but they all share the basic characteristic of using actual behavior to stop bots.

Bad Behavior implements the Bad Behavior library of PHP scripts. In brief, “Bad Behavior pioneered an HTTP fingerprinting approach. Instead of looking at the spam, we look at the spammer. Bad Behavior analyzes the HTTP headers, IP address, and other metadata regarding the request to determine if it is spammy or malicious.” The creator encourages you to consider it a “first line of defense” and couple it with Akismet, Mollom or whatever you prefer.
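To show what “looking at the spammer” means in practice, here is a toy Python header check. The three rules are hypothetical examples chosen for illustration; they are not Bad Behavior’s rule set, which is far more extensive and carefully tuned to avoid false positives.

```python
# Illustrative rules only; Bad Behavior's real rule set is far more extensive.
SCRIPT_AGENTS = ("curl/", "python-requests/", "libwww-perl/", "wget/")


def suspicious_headers(method: str, headers: dict[str, str]) -> list[str]:
    """Return the reasons, if any, that a request's headers look automated."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    reasons = []
    agent = h.get("user-agent", "")
    if not agent:
        reasons.append("missing User-Agent")
    elif agent.lower().startswith(SCRIPT_AGENTS):
        reasons.append("scripting-library User-Agent")
    if method.upper() == "POST" and "accept" not in h:
        reasons.append("POST without Accept header")
    return reasons
```

Returning reasons instead of a single yes/no makes it easier to log why a request was flagged, which helps when you go hunting for false positives.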

http:BL implements http:BL on your site, blacklisting visitors by IP, catching scrapers and contributing to the Project Honey Pot database.
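http:BL is a DNS-based lookup. You query a name built from your Project Honey Pot access key and the visitor’s IPv4 address with its octets reversed, under `dnsbl.httpbl.org`. A listed IP resolves to `127.<days>.<threat>.<type>`, where the last octet is a bitmask of visitor types. This Python sketch builds the query and decodes the answer. The function names are hypothetical, and the access key used in the example is made up.

```python
import ipaddress
import socket
from typing import Optional

# Bit flags in the last octet of an http:BL answer (0 means search engine).
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build the DNS name to look up: key, reversed IPv4 octets, dnsbl.httpbl.org."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return f"{access_key}.{'.'.join(reversed(octets))}.dnsbl.httpbl.org"


def parse_httpbl_answer(answer: str) -> dict:
    """Decode a 127.<days>.<threat>.<type> answer."""
    first, days, threat, vtype = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError(f"unexpected http:BL answer {answer!r}")
    if vtype == 0:
        # For search engines the third octet identifies the engine, not a threat score.
        return {"search_engine": True, "days": days, "threat": 0, "types": []}
    types = [name for bit, name in VISITOR_TYPES.items() if vtype & bit]
    return {"search_engine": False, "days": days, "threat": threat, "types": types}


def lookup(access_key: str, ip: str) -> Optional[dict]:
    """Live lookup; None means the IP is not listed (or DNS failed)."""
    try:
        answer = socket.gethostbyname(httpbl_query_name(access_key, ip))
    except socket.gaierror:
        return None
    return parse_httpbl_answer(answer)
```

For example, `httpbl_query_name("abcdefghijkl", "192.0.2.10")` gives `abcdefghijkl.10.2.0.192.dnsbl.httpbl.org`. What threat score and age justify blocking is a policy decision for your site.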

The venerable Spam module is not yet out for Drupal 7, but it warrants a mention because it has been an old stalwart for so long, and as of August 24, 2012, it was getting close to a usable Drupal 7 release. Unlike Bad Behavior, it does look at the spam itself: it checks for links it knows to be spammy, known spammer IPs and the like, and it can auto-ban spammer IPs.

Block Anonymous Links blocks any comment from an anonymous user that includes links, straight away. That’s not so good if you want anonymous users to be able to include links, but 99% of the time, only spammers add links to your comments.
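The Block Anonymous Links rule fits in a few lines. This is a hypothetical Python version, not the module’s code. The pattern is a rough assumption that catches plain URLs, `www.` hosts, HTML `<a>` tags and BBCode `[url]` tags; a determined spammer could still write a link it misses (for example, “example dot com”).

```python
import re

# Rough link detector: http(s) URLs, bare www. hosts, <a> tags, BBCode [url] tags.
LINK_PATTERN = re.compile(r"(https?://|www\.|<a\s|\[url)", re.IGNORECASE)


def block_comment(body: str, is_anonymous: bool) -> bool:
    """Block anonymous comments that contain anything resembling a link."""
    return is_anonymous and LINK_PATTERN.search(body) is not None
```

Logged-in users are left alone, which matches the trade-off described above: you lose legitimate anonymous links in exchange for dropping most link spam outright.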

BlogSpam connects to the BlogSpam service, which checks comments against known bad IPs, looks for URLs that appear in blog spam (these are more stable than the spammers’ IP addresses), checks that the commenter’s email address has a valid MX record and so forth.
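Checking linked URLs instead of IPs works because spammers can rotate proxies cheaply but keep promoting the same domains. A minimal Python sketch of that idea follows; it is not the BlogSpam service’s logic, and the blocklist entries are hypothetical placeholders. It extracts every linked hostname and matches it, or any parent domain, against the list.

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

# Hypothetical blocklist; BlogSpam maintains its own data on the service side.
BLOCKED_DOMAINS = {"spam-pills.example", "cheap-watches.example"}


def linked_domains(body: str) -> set[str]:
    """Hostnames of every http(s) URL in a comment body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname  # already lowercased by urllib
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def hits_blocklist(body: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if any linked host is a blocked domain or a subdomain of one."""
    for host in linked_domains(body):
        labels = host.split(".")
        # "a.b.spam-pills.example" -> also test "b.spam-pills.example", "spam-pills.example", ...
        if any(".".join(labels[i:]) in blocked for i in range(len(labels))):
            return True
    return False
```

Matching parent domains matters because spammers often spin up throwaway subdomains under a domain that is already listed.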

Reactive Methods

By reactive methods, I mean things that help you, after the fact, stop a spammer from coming back by banning their IP. I don’t find these very useful these days, since spammers use so many different proxies, but over time you might catch a good percentage of them.

Relevant Drupal Modules

Go Away displays the IP address on all anonymous comments and provides a form for entering an IP and banning it.
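Underneath any IP-banning form is a simple membership test. Here is a minimal Python sketch with a hypothetical `BanList` class. Accepting whole CIDR ranges as well as single addresses is an addition of this sketch, not something Go Away is described as doing; it helps a little against spammers who hop between neighboring addresses.

```python
import ipaddress


class BanList:
    """Store banned addresses or networks and check visitors against them."""

    def __init__(self) -> None:
        self._networks: list = []

    def ban(self, ip_or_cidr: str) -> None:
        # A single address becomes a /32 (or /128); strict=False tolerates host bits.
        self._networks.append(ipaddress.ip_network(ip_or_cidr, strict=False))

    def is_banned(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network (and vice versa).
        return any(addr in net for net in self._networks)
```

A linear scan is fine for a few hundred entries; a site banning thousands of ranges would want an indexed store instead.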

Where to Start?

Personally, I like to start with the least intrusive methods, things like Hashcash and Mollom that don’t annoy my users. Also, think about your overall site architecture. CAPTCHA can have issues with aggressive caching modules like Boost, which serve anonymous users static copies of pages, so you’ll have to try some modules and then test them as an anonymous user to make sure you can still comment. On the one hand, you’ll have to ask yourself whether you’re making it too difficult to comment and discouraging participation; on the other, remember that dealing with comment spam has a cost. If you spent all that time writing good content for your site instead, what could you accomplish and how many new people could you bring in?

Help!

Do you know a good Drupal spam-smashing module? Please talk about it in the comments and help others!

This is a great review of captcha / spam-blocking options. Thanks for putting it together!

Are you aware of any problems with combining multiple solutions? Maybe something like BOTCHA + Bad Behavior + http:BL? I’m trying to find a method that covers as many bases as possible without bugging the user or having to use Mollom (as per the Randy Fay thread you already mentioned).

I’m not aware of any problems with multiple solutions. In fact, I pretty much always have at least two things going. As I mentioned, I prefer Proof of Work solutions, so I usually use some Hash Cash solution along with Mollom or Akismet.

About Raised by Turtles

There’s not much to say. This site is maintained (to put it charitably!) by Tom Lambert. Sometimes I have things to write about. Some of those I actually do write about. A proportion of those might even be worth writing about. If you don't agree, nobody shackled you to a keyboard. A more interesting blog is just one click away.

If you have a question, comment or injurious insult, go ahead and send me an email.
