The motivation behind spam (unsolicited commercial email or "bad" email) is quite simple. Spammers send out millions of emails so they can sell something - almost anything! For spam to be worthwhile to spammers, all that is required is for one person in a thousand or even one person in a million to buy something.

The biggest challenge with spam detection is how to programmatically block spam but not block ham (good email). Incorrectly identifying ham as spam is called a false positive in the spam industry. In reality, the only "thing" that can tell whether an email is spam or not is the person who is reading it! What might be considered spam by one person could well be considered ham by another person. That's the dilemma. How do you keep the bad mail out, let good mail (electronic airplane tickets, customer orders and queries) in and not have false positives?

So what are SURBLs and why are they so good at identifying spam?

The best way to think of SURBLs (Spam URI Realtime Blocklists) is as backwards RBLs (see below for more information about Realtime Blackhole Lists). Instead of blocking the mail servers and domains that send spam, which is what RBLs do, SURBLs block destination web site addresses: the destinations that the spammers want you to visit. From www.surbl.org, "SURBLs differ from most other RBLs in that they're used to detect spam based on message body URIs (usually web sites). Unlike most other RBLs, SURBLs are not used to block spam senders. Instead they allow you to block messages that have spam domains which occur in message bodies."

What this means is that SURBLs hit spammers where it hurts! It is easy for spammers to forge sender addresses, so the sending side of a message is a moving target. It is much harder for them to maintain large numbers of destination web sites. A SURBL is a list of the destination URLs and domains that appear in spam emails, so to evade it a spammer would need to create and manage hundreds or thousands of destination web sites, and that cuts into the spammer's return on investment (ROI). Remember that a spammer's primary objective is to get a person to go to a destination web site to buy something! Because a spam email needs a destination URL to be of any value to the spammer, SURBLs are the most effective recent advance in spam detection: comparing the destination links in an email against a SURBL raises the accuracy of spam detection dramatically.
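The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it. It assumes the list is queried over DNS under the zone multi.surbl.org, and that a name that resolves means "listed". The function names, the simplified URL pattern and the two-label domain shortcut are all assumptions made for this example.

```python
import re
import socket
from typing import Callable, Set

# Assumption: the combined SURBL list is queried under this DNS zone.
SURBL_ZONE = "multi.surbl.org"

# Simplified pattern: captures the host part of http/https links only.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def extract_domains(body: str) -> Set[str]:
    """Return the domain of every link found in a message body.

    Simplification: keeps only the last two labels (example.com), which
    is wrong for suffixes such as co.uk; a real filter needs a suffix list.
    """
    domains = set()
    for host in URL_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        if len(labels) >= 2:
            domains.add(".".join(labels[-2:]))
    return domains


def dns_listed(query_name: str) -> bool:
    """True if the name resolves, which a DNS blocklist uses to mean 'listed'."""
    try:
        socket.gethostbyname(query_name)
        return True
    except OSError:
        return False


def listed_domains(
    body: str,
    is_listed: Callable[[str], bool] = dns_listed,
    zone: str = SURBL_ZONE,
) -> Set[str]:
    """Return the link domains in the body that the blocklist reports as listed."""
    return {d for d in extract_domains(body) if is_listed(f"{d}.{zone}")}
```

The `is_listed` parameter is there so the DNS call can be swapped for a local stub when testing. A filter would typically raise a message's spam score when `listed_domains` returns a non-empty set, rather than rejecting the message outright.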

Just how good are SURBLs in helping identify spam?

Depending on the SURBL or combination of SURBLs, and without using any other strategies, SURBL spam detection rates can be as high as 90%, with false positive rates running from 0.001% to 0.4%. Accuracy gets much higher when SURBLs are combined with the other strategies that are available. According to Darryl Bleau, developer of GEE Whiz, the popular GroupWise and NetMail spam filtering software, "by adding SURBL support to GEE Whiz 2.x, spam detection has increased significantly. And it is the last few percentage points that make the biggest difference. More importantly, because SURBLs are so much better suited to targeting spam, the decrease in false positives is most appreciated by end users and administrators." See www.surbl.org/links.html for additional information and comments regarding how adding SURBL support has dramatically improved spam detection and decreased false positives.
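To put those percentages in concrete terms, the short calculation below applies them to a mail system. The monthly volumes are hypothetical figures chosen only for illustration; the rates are the ones quoted above.

```python
def expected_count(message_count: int, rate_percent: float) -> float:
    """Expected number of messages affected at a given percentage rate."""
    return message_count * rate_percent / 100.0


# Hypothetical monthly volumes, chosen only for illustration.
HAM_PER_MONTH = 100_000
SPAM_PER_MONTH = 500_000

low_fp = expected_count(HAM_PER_MONTH, 0.001)  # lowest false positive rate quoted
high_fp = expected_count(HAM_PER_MONTH, 0.4)   # highest false positive rate quoted
caught = expected_count(SPAM_PER_MONTH, 90.0)  # 90% detection rate

print(f"Good mail wrongly blocked: about {low_fp:.0f} to {high_fp:.0f} messages")
print(f"Spam caught: about {caught:.0f} of {SPAM_PER_MONTH}")
```

At these volumes the gap between the best and worst false positive rates is the gap between roughly one lost message a month and roughly four hundred, which is why the choice of list matters as much as the detection rate.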

The best thing about SURBLs is that they are part of a growing array of new strategies that are starting to win the battle against spammers. SURBLs decrease the return on investment for spammers. And the primary objective of anti-spam strategies is to reduce spam-generated clicks to such a low level that spammers can no longer make money doing what they do! When that occurs, the spam problem will be solved.

So, are RBLs and other SpamAssassin techniques no longer required?

RBLs and SpamAssassin work with SURBLs and other strategies to create a well-rounded spam detection solution. The best and most respected spam detection products and techniques are built on the open source project and collaborative information available at www.spamassassin.org.

Spam detection products built on SpamAssassin have accumulated a large number of detection strategies over a short period of time. These strategies include blacklists, RBLs (Realtime Blackhole Lists), header and text analysis, and Bayesian filtering. Most spam detection products use combinations of these and other strategies.
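One common way to combine strategies is to let each test add points to a running score and treat the message as spam once the total crosses a threshold. The sketch below illustrates that idea only. The rule names, point values and tests are invented for this example and are not taken from any real rule set; the threshold of 5.0 is chosen to match what is commonly cited as SpamAssassin's default required score.

```python
from typing import Callable, List, Tuple

# (rule name, points, test). All three rules are hypothetical.
Rule = Tuple[str, float, Callable[[str], bool]]

RULES: List[Rule] = [
    ("URI_ON_BLOCKLIST", 3.5, lambda m: "cheap-pills.example" in m.lower()),
    ("SUSPICIOUS_PHRASE", 2.0, lambda m: "act now" in m.lower()),
    ("MANY_EXCLAMATIONS", 1.0, lambda m: m.count("!") >= 3),
]


def score_message(message: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    hits = [(name, points) for name, points, test in rules if test(message)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(message: str, threshold: float = 5.0) -> bool:
    """Classify as spam when the combined score reaches the threshold."""
    total, _ = score_message(message)
    return total >= threshold
```

Because no single rule reaches the threshold on its own, a legitimate message that happens to trip one test still gets through, which is the point of combining strategies instead of relying on any one of them.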

SURBLs are part of an evolving list of spam detection strategies. Historically, one of the first strategies used by spam detection software was to create lists of the domains and email addresses of known spammers to be blocked (blacklists). This was quite effective in the early days when spam techniques were primitive. The biggest challenge with this strategy was the amount of administration time required to keep the blacklists up to date. Many companies and individuals overcame part of this challenge by sharing blacklists, thereby developing comprehensive ones. Blacklists evolved over time to include complex regular expression (REGEX) statements that take out entire ranges of domain names based on the content of the sending email address or domain name.
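A regular-expression blacklist of the kind described above might look like the following sketch. The addresses and domains are placeholders under the reserved .example suffix, and the patterns are illustrations, not entries from any real blacklist.

```python
import re

# Hypothetical blacklist entries, from most specific to most general.
BLACKLIST_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"^spammer@mailer\.example$",        # one exact sender address
        r"@(.+\.)?bulk-mailer\.example$",    # a domain and all of its subdomains
        r"@mail\d+\.offers\.example$",       # a numbered range: mail1, mail2, ...
    )
]


def is_blacklisted(sender: str) -> bool:
    """True if the sender address matches any blacklist pattern."""
    return any(p.search(sender) for p in BLACKLIST_PATTERNS)
```

The third pattern shows why regular expressions reduced the administrative load: one line covers every numbered host, where a plain list would need a new entry each time the spammer added a server.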

A second strategy used by spam detection software came about as a result of spammers using open relays (improperly configured "innocent" email servers that would allow anyone to send mail through them) to generate huge volumes of spam. To detect spammers using open relays, the technique of RBL blocking became common. Before allowing an email into a system, the spam detection software would check the sending server's address against an RBL service and act on the result (block the email, or increase its SpamAssassin score and with it the probability of the spam being blocked). The biggest challenge with this strategy was the number of legitimate email servers that were identified as open relays and the amount of legitimate mail that was blocked as a result (false positives).
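An RBL check of this kind is usually a DNS query: the octets of the connecting server's IPv4 address are reversed and prepended to the list's zone name, and a name that resolves means the address is listed. The sketch below follows that convention. The zone dnsbl.example is a placeholder, not a real service, and the two-point penalty and function names are assumptions made for illustration.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dns_listed(query_name: str) -> bool:
    """True if the name resolves, which the list uses to mean 'listed'."""
    try:
        socket.gethostbyname(query_name)
        return True
    except OSError:
        return False


def adjust_score(
    score: float,
    sender_ip: str,
    zone: str = "dnsbl.example",  # placeholder zone, not a real list
    penalty: float = 2.0,         # hypothetical points added when listed
    is_listed: Callable[[str], bool] = dns_listed,
) -> float:
    """Raise the spam score when the sending server appears on the list."""
    if is_listed(dnsbl_query_name(sender_ip, zone)):
        return score + penalty
    return score
```

Adding to the score instead of rejecting outright is one way to soften the false positive problem: a legitimate server that has been listed by mistake can still deliver mail if nothing else about the message looks like spam.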

A third strategy used by spam detectors is to use header and text analysis to block mail based on words or phrases, and to incorporate a strategy like Bayesian filtering that allows spam products to "learn" to differentiate between spam and ham. The biggest advantage of Bayesian textual classification is that an administrator can build up a library of spam and ham that is specific to their own environment. As a first-generation textual classifier, Bayesian filtering was very successful.
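The "learning" step can be illustrated with a small naive Bayes classifier. This is a textbook sketch, not the algorithm of any particular product: it counts how often each word appears in the spam and ham an administrator has supplied, then labels a new message by whichever class makes its words more probable. The class name, tokenizer and smoothing choice are assumptions made for this example.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one message to the library; label is 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens: List[str], label: str) -> float:
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        # Smoothed prior, so an empty library does not cause log(0).
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for token in tokens:
            # Add-one smoothing, with one extra slot reserved for unseen words.
            score += math.log((counts[token] + 1) / (total_words + vocab + 1))
        return score

    def classify(self, text: str) -> str:
        """Return 'spam' or 'ham', whichever scores higher (ties go to ham)."""
        tokens = tokenize(text)
        if self._log_score(tokens, "spam") > self._log_score(tokens, "ham"):
            return "spam"
        return "ham"
```

A short usage example, with made-up training messages:

```python
f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("your airplane ticket is attached", "ham")
f.train("customer order query about ticket", "ham")
print(f.classify("buy cheap pills"))        # spam
print(f.classify("ticket for your order"))  # ham
```

Because the counts come entirely from the messages supplied, two sites with different mail will end up with different classifiers, which is the environment-specific advantage described above.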

The next article in this series deals with a new Advanced Textual Classifier strategy that improves on the original Bayesian Filtering strategy.