Notes from the field in the War on Spam

Bad Behavior 2.1 and 3.0 Roadmap

When I released Bad Behavior 2, I noted that due to time constraints I was unable to complete everything on the roadmap. Most of that is because spammers have dramatically stepped up their activity in recent weeks and the new version provides greatly improved protection against their attacks. Part of it is that, because this is an unpaid project, I can only devote so much spare time to it.

Now that Bad Behavior 2.0 has stabilized, it’s time to update the roadmap in preparation for the next minor (2.1) and major (3.0) releases.

Before I go into the roadmap, I need to digress a bit and explain (again) something a lot of people may not be aware of.

Bad Behavior is open source software, released under the GNU General Public License, copies of which you can find all over the Internet or included with the program. (I also make exceptions for linking it to non-open-source software such as ExpressionEngine; contact me if you are in this situation.) You don’t have to pay a cent to download or use it. However, developing it still costs me time and money. Killing blog spam has been mostly a labor of love rather than a source of cash, and as such it has to take a back seat to more pressing concerns, like anything that generates revenue.

I’ve been pretty successful at maintaining a roughly weekly rate for incremental updates (new spambots, bug fixes, etc.) since the 2.0 release, and with your support, financial and otherwise, I’ll be able to continue that. I think most of the bugs have been worked out at this point, though, so it’s time to look forward.

If you see any problems with the roadmap, or think it could be improved, feel free to comment on it.

Bad Behavior 2.1

Bad Behavior 2 was a ground-up refactoring of the core of the system. Much to my surprise, it wound up being both smaller and faster than the 1.x version, though people more experienced than I am could have told me that (and one did). For many people it’s orders of magnitude faster. But the early release meant that several things were left unfinished, and those are what I want to address in 2.1.

First off is the modular architecture. While I made much progress on this, and it’s now near its final form, one more thing will need to change: The parts of the system which are specific to a particular software package (ExpressionEngine, MediaWiki, WordPress, etc.) and those which are user-customizable need to be further separated, so that the core can be updated independently of the wrapper which connects it to your software.

While this isn’t a major issue for WordPress, the architecture of MediaWiki, which has no good way for an extension to save settings, and that of ExpressionEngine, which virtually requires such an approach, are forcing the issue.

By Bad Behavior 2.1, you will have in essence two packages: A core download and a platform-specific download, each of which can be updated separately. While this introduces a bit of complexity, at least for initial installations, it will make updates much easier for most people, as well as allow for several more interesting things down the road. The ExpressionEngine port already uses this approach; to install it, you have to download the Bad Behavior 2 core as well as the EE extension and then integrate them. Ultimately I’ll have a packaging system in place which will make the initial download easier by combining the two into a single download for those who want it, and a core-only download for those who are updating.
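To make the core/wrapper split concrete, here is a minimal sketch in Python. It is not Bad Behavior's actual code: the names `PlatformWrapper`, `Core`, `InMemoryWrapper` and the `blocked_agents` setting are all hypothetical, and the user-agent test is only a placeholder for the real checks. The point is that the core talks to the host platform only through a small interface, so either side can be replaced without touching the other.

```python
from dataclasses import dataclass, field
from typing import Protocol


class PlatformWrapper(Protocol):
    """What the core needs from the host platform (hypothetical interface)."""

    def load_settings(self) -> dict: ...

    def log(self, message: str) -> None: ...


@dataclass
class Core:
    """Platform-independent part; updated separately from any wrapper."""

    version: str
    wrapper: PlatformWrapper

    def screen(self, user_agent: str) -> bool:
        """Return True if the request may proceed (placeholder check only)."""
        settings = self.wrapper.load_settings()
        blocked = settings.get("blocked_agents", [])
        allowed = not any(token in user_agent for token in blocked)
        if not allowed:
            # Logging goes through the wrapper, which knows where logs live.
            self.wrapper.log(f"blocked: {user_agent}")
        return allowed


@dataclass
class InMemoryWrapper:
    """Stand-in for a WordPress, MediaWiki or ExpressionEngine wrapper."""

    settings: dict
    messages: list = field(default_factory=list)

    def load_settings(self) -> dict:
        return self.settings

    def log(self, message: str) -> None:
        self.messages.append(message)
```

With this shape, a platform that cannot easily store settings only has to supply its own `load_settings`, and a new core release drops in without changes to the wrapper.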

For 2.0, I had proposed an administrative screen which would appear inside the host platform and provide various services such as being able to search Bad Behavior’s logs for specific spammers or for potential false positives. This will be complete for WordPress by 2.1. I had planned a MediaWiki special page, but discovered to my dismay that no accurate developer documentation exists for this, so it is on hold indefinitely, until someone updates and/or corrects the documentation on meta (which Brion tells me is wrong and should not be relied on) or provides new documentation. I also plan to provide this for ExpressionEngine, assuming my developer license is still any good.

I also noted that I planned a type of screener which would help sort legitimate browsers from those which were sending spam. This screener, which uses a combination of JavaScript and cookies, was partially implemented in 2.0, but the checks which actually make it work aren’t enabled, as I had not fully debugged them by release. It also turned out to be difficult (maybe impossible) to implement part of it for MediaWiki. I hope to have the screener working for WordPress and MediaWiki by 2.1 and for other platforms at a later date.
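One way a JavaScript-plus-cookie screener can work is sketched below in Python. This is an assumption about the general technique, not a description of the checks in Bad Behavior: the cookie name `bb_screener`, the use of an HMAC over the client IP, and all function names are hypothetical. The page carries a small script that sets a cookie; a real browser runs the script and returns the cookie on its next request, while a simple bot that ignores JavaScript does not.

```python
import hashlib
import hmac

COOKIE_NAME = "bb_screener"  # hypothetical cookie name


def issue_token(secret: bytes, client_ip: str) -> str:
    """Derive a token tied to the client, so it cannot be reused elsewhere."""
    return hmac.new(secret, client_ip.encode(), hashlib.sha256).hexdigest()


def screener_snippet(token: str) -> str:
    """JavaScript to embed in the page; only script-running clients set the cookie."""
    return f'<script>document.cookie = "{COOKIE_NAME}={token}; path=/";</script>'


def passes_screener(cookies: dict, secret: bytes, client_ip: str) -> bool:
    """Check the cookie sent back with a later request (e.g. a comment POST)."""
    expected = issue_token(secret, client_ip)
    supplied = cookies.get(COOKIE_NAME, "")
    # Compare as bytes in constant time; a missing cookie simply fails.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

A check like this cannot stand alone, since legitimate visitors may have JavaScript or cookies disabled, so it would more plausibly be one signal among several rather than a hard block.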

Bad Behavior 3.0

It’s a bit early to say exactly what the next major version of Bad Behavior will look like. But one thing is likely to come down the pipeline.

Many people have asked for Bad Behavior to automatically update itself whenever a new version comes along. After the necessary architectural reworking is done for 2.1, it will be possible to provide a framework for Bad Behavior to update itself. I’d like some comments on this, as I can foresee that some people might not like the software updating itself. Should the feature be off by default, or on by default?

Other things on the to-do list

Various bits of documentation need to be updated. I need to host installation instructions, or links thereto, for any platform to which Bad Behavior has been ported, and several of those (such as phpBB and Movable Type) are missing right now.

I need to follow up with some people who have ported Bad Behavior 1 to other platforms in the past and either get them to update their work or let me know that they can’t (e.g. Drupal and DotClear).

Some people have complained that the error messages displayed to people who are blocked aren’t thorough enough or don’t explain well enough how to resolve the problem. I edit these on an ongoing basis whenever I become aware of a particular issue, but with the wide variety of proxy servers out there (and it virtually always is a proxy of some type) it’s difficult to just sit down and provide specific directions for every one of them. I may need some of you to contribute directions on reconfiguring specific proxy servers, and requests for these will likely be posted here in the near future.

End notes

Bad Behavior must continue to keep up with spammers as they attempt to adapt and find new ways to post their automated garbage. As I noted last year, this has been at most a minor issue, as there is only so much the spammers can do while maintaining their high rates of spamming (now 100,000 or more spams in a single run is not unusual, and one spammer I’ve blocked can send 1,000,000 in a day). Bad Behavior attempts to drive up the cost of link spamming by blocking as many automated spammy requests as possible, forcing the spammers to resort to MUCH slower manual methods, or ideally, give up and find more honest work.

While this has actually worked, the spammers have begun to adapt. I am seeing a rise in spam being delivered through botnets of compromised Windows computers running various bits of malware which take over Internet Explorer, and occasionally even Firefox, to do their dirty work. The screener, which I expect to complete by 2.1, should take care of the vast majority of these.

But it remains an ongoing problem, and I’ve set up a separate project whose purpose is to locate and disable these botnets and ultimately cut the flow of spam right from its source. I can’t say much more about this project right now, but a few of you will hear from me about it in the next week or so, and hopefully in a few months I can release more information generally.

If you think this roadmap looks good, and want to accelerate the development of Bad Behavior, or the botnet project, contribute financially and I’ll be able to devote more time to it. And by all means, if you think I left something out that should be in the roadmap, please let me know. And yes, I know a lot of you are flat broke, so even if you are unable to contribute financially, please leave your comments.

I have an idea for the auto-update feature. Perhaps the update download should be automatic, but applying the update should be manual (and easily switched to automatic). Here is how I envision it working:

BB downloads an update and sends an email listing the changes to the appropriate admin. The admin examines the change list and the update, then runs the patch script. Admins who decide to automate the process edit your autoupdate script, uncommenting the lines that autorun the patch script and disabling (or not) the email.
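The workflow described above could be sketched roughly as follows, in Python. Everything here is hypothetical: the `Updater` class, its fields, and the version strings are illustration only, and the download and email steps are simulated with in-memory state rather than real network or mail calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Updater:
    installed: str
    auto_apply: bool = False  # off by default: the admin runs the patch by hand
    send_email: bool = True
    staged: Optional[str] = None  # downloaded but not yet applied
    outbox: list = field(default_factory=list)  # stands in for sent email

    def on_new_release(self, version: str, changes: list) -> None:
        """Always download (stage) the update; apply only if automated."""
        self.staged = version
        if self.send_email:
            summary = "; ".join(changes)
            self.outbox.append(f"Bad Behavior {version} downloaded: {summary}")
        if self.auto_apply:
            self.apply_staged()

    def apply_staged(self) -> bool:
        """The 'patch script' step; returns False if nothing is staged."""
        if self.staged is None:
            return False
        self.installed, self.staged = self.staged, None
        return True
```

Under these assumptions, `Updater("2.0.6")` stages a new release and waits for `apply_staged()`, while `Updater("2.0.6", auto_apply=True, send_email=False)` corresponds to the admin who has uncommented the autorun lines and turned the email off.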

Comment by Scott Killen | August 5, 2006

1. I think auto-update should be off by default, but perhaps with the flexibility in WP, you could actually prompt the user to enable it when they click the activate button.

2. “it’s difficult to just sit down and provide specific directions for every one of them. I may need some of you to contribute directions”. So make your personal MW public, and have those “I got blocked” keys link directly to a wiki page.

The problem with linking to a wiki in the error page is that virtually everyone who sees it is a spammer. At this point we’re talking maybe one in ten million hits might be an actual person, and the reason they are seeing the message is either because their computer is full of viruses and malware, or because they have done something unusual to their configuration (and already know what they’ve done and how to fix it).

What’s needed are instructions for those extremely rare people who have done something unusual to their configuration, but don’t know what they’ve done or how to fix it. (And this is maybe one in a billion hits at this point.)

It’s funny you posted this now, because after our e-mail exchange earlier I spent half the night painfully extracting enough information on MediaWiki extension development to write a basic add-on to Bad Behavior 2.0.6 which displays log entries in a human-readable special page. I’m calling it Bad Behavior 2 Extended, since it’s likely I’ll try to add in some additional useful stuff in the future.

Interested MediaWiki users should check my post about it at http://neurophyre.livejournal.com/420286.html which also contains a handy link to my 5-step guide to blocking spam in MediaWiki. I’ve not had a SINGLE successful linkspam insertion on http://www.umasswiki.com/ in 6 months or more, and it’s well-publicized. Bad Behavior has been an important part of that success.

Incidentally, I look forward to testing BB2 on MediaWiki 1.8.0 which came out today. I should have results by this weekend, and I’ve added more information to the spurious whitespace bug on their Bugzilla.

Okay, even weirder is you posted this on August 5 and Livejournal just snarfed it up now so that I saw it after I finished writing the initial MediaWiki thing.

Anyway, what I’d like to see for the future roadmap is either finer granularity of control over what checks BB uses, or a ‘relaxed’ mode to complement ‘strict’ and ‘normal.’ Namely, ‘relaxed’ would probably avoid DNSBLs and stick to other forms of analysis.
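As a sketch of what that request might look like in practice, the Python below maps each mode to a set of checks and lets individual checks be switched off. The check names (`headers`, `user_agent`, `dnsbl`, `extra`) and the contents of each mode are assumptions made up for illustration; the only detail taken from the suggestion above is that ‘relaxed’ leaves out DNSBL lookups.

```python
# Hypothetical check names; only "relaxed omits dnsbl" comes from the suggestion.
CHECKS_BY_MODE = {
    "relaxed": frozenset({"headers", "user_agent"}),
    "normal": frozenset({"headers", "user_agent", "dnsbl"}),
    "strict": frozenset({"headers", "user_agent", "dnsbl", "extra"}),
}


def enabled_checks(mode: str, disabled=()) -> frozenset:
    """Return the checks to run for a mode, minus any individually disabled."""
    if mode not in CHECKS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return CHECKS_BY_MODE[mode] - frozenset(disabled)
```

For example, `enabled_checks("normal", disabled=["dnsbl"])` yields the same set as `enabled_checks("relaxed")`, which shows that per-check control would cover the ‘relaxed’ mode as a special case.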

Having two separate packages is interesting; would it then be possible to download the core and place it in /home/ while downloading appropriate wrapper packages to place in /home/domain1.com/, /home/domain2.com/ and so on? That would further simplify updating, I think, while actually helping to reduce complexity for the user (updating the core once would fix several, perhaps dozens, of domains at one time).