It used to be a trickle of two or three a day, but lately we’ve been hammered with hundreds of automated SPAM submissions through the cformsII contact form plugin we use with WordPress. The contact form feeds into a Google Groups mailing list that forwards to the whole team.

99% of the SPAM advertised a dozen designer brand knock-offs: North Face, Gucci handbags, Burberry, etc. This stuff isn’t hard to filter by keyword. First we looked at Google Groups to see if there was a keyword filter to protect group lists from SPAM, naughty language, or whatever. Surprisingly, nothing.

Next we checked the cformsII plugin. As versatile and flexible as it is, there’s no easy way to add a banned word list. There are a couple of antispam features, but the honeypot requires CSS changes to the site theme and we’d die before burdening you with a captcha to contact us.

cformsII allows us to use a regular expression to evaluate each field of the submission form. Searching the support forum turned up tons of requests for a simple regex to filter SPAM words. Each was met with suggestions to search the forum for some epic post on the topic, but we couldn’t find it anywhere.

First we worked up a regex to match SPAM words with The Regex Coach. The problem is that such a regex matches the SPAM words and evaluates to true, so only forms containing these words would be accepted. We needed a way to negate the regex result: the equivalent of ! or NOT in most languages like C, PHP, Perl, Basic, etc.
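To make the problem concrete, here is a minimal Python sketch of what goes wrong when a plain "match the SPAM words" regex is used as a field validator. The word list and the `field_is_valid` helper are hypothetical, made up for illustration; they are not the cformsII API or our real list, and cformsII itself runs on PHP rather than Python.

```python
import re

# Hypothetical word list for illustration only; not the real list.
SPAM_WORDS = ["gucci", "burberry", "north face"]

# A plain pattern that matches any of the words, case-insensitively,
# on word boundaries.
spam_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in SPAM_WORDS) + r")\b",
    re.IGNORECASE,
)

def field_is_valid(text: str) -> bool:
    """Mimic a validator that accepts a field only when the regex matches."""
    return spam_pattern.search(text) is not None

print(field_is_valid("Cheap Gucci handbags here"))  # True: the SPAM gets through
print(field_is_valid("Hi, I have a question"))      # False: real mail is rejected
```

The result is exactly backwards, which is why the pattern itself has to express "none of these words appear".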

Here’s the final regex we’re using after much painful tinkering. ^ and $ anchor the regex to the start and end of the field. (?i) makes it case insensitive. (?:(?! and .)* wrap the list of SPAM words in a negative lookahead that is checked at every character, which negates the result so only messages without these words are allowed. \b( and )\b enclose the list of bad words that are rejected from the message body and name; \b defines the word boundary.
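For anyone who wants to try the structure outside the plugin, here is a rough Python sketch of a pattern built from the pieces described above. The word list is a hypothetical stand-in, not our real list, and `is_clean` is just a helper name for this example. Two things differ from the form field version and are assumptions of the sketch: Python wants (?i) at the very start of the pattern rather than after ^, and the sketch passes re.DOTALL so that . also crosses line breaks in a multi-line message.

```python
import re

# Hypothetical word list for illustration only; not the real list.
SPAM_WORDS = ["gucci", "burberry", "north face"]

alternation = "|".join(re.escape(w) for w in SPAM_WORDS)

# (?i)            case-insensitive (must come first in Python)
# ^ ... $         anchor to the whole field
# (?:(?! X ).)*   consume one character at a time, but only where X
#                 does not start at that position
# \b(?: ... )\b   the banned words, on word boundaries
CLEAN_MESSAGE = re.compile(
    r"(?i)^(?:(?!\b(?:" + alternation + r")\b).)*$",
    re.DOTALL,  # assumption: let "." span newlines in the message body
)

def is_clean(text: str) -> bool:
    """Return True only when none of the banned words appear in text."""
    return CLEAN_MESSAGE.match(text) is not None

print(is_clean("Hi, I have a question about your site"))  # True
print(is_clean("Cheap GUCCI handbags"))                   # False
print(is_clean("Line one\nLine two"))                     # True
```

Because of the \b word boundaries, a banned word only counts when it stands alone, so a longer word that merely contains one is still accepted.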

Hope this helps out anyone else dealing with repetitive SPAM from the same few bots. Obviously it’s not a perfect solution for everything, but it stopped the flood instantly without modifying site themes or forcing you to use a captcha on the contact form.

This entry was posted on Saturday, November 16th, 2013 at 1:58 am and is filed under site.

4 Responses to “Killing cformsII contact form SPAM with regex”

A few suggestions which may possibly help –
You might consider something like what I see on many sites, where they ask you a simple question with pictures (often the question itself is a picture). Simple things like “What is 1+4?”. I think that’s much less annoying than some cryptic, disguised, contorted alpha-numeric code that you have to strain to read and often get wrong.
Another good example (but don’t tell him I said so ;) I’ve seen is over on the Ultrakeet site, look down below http://ultrakeet.com.au/contact/

If you want a nice simple way to deal with RegEx in almost any flavour, then consider the very powerful RegEx utils ‘RegExMagic’ and ‘RegExBuddy’ over at http://www.just-great-software.com/
These could save you a lot of time and hair-pulling.

Yes, Ahmad’s a pretty funny character, he comes up with some good stuff.
With the ‘question on a picture’ stuff I was talking about, they usually have a bunch of pictures that are put up randomly asking you simple questions, but ones that might confuse a script, so even if they have good OCR, it would probably get it wrong anyway.