1. It requires zero interaction from the user.
2. It produces zero false positives (good messages identified as bad) and zero false negatives (bad messages identified as good).
3. It is transparent; that is, you only ever see good messages and never even need to be aware that spam exists.

That’s it. Not much of a shopping list, is it? Of course, the “SpamSplatter 3000” hasn’t been invented yet (and if it ever is, I want a piece of the action), but it does give us a frame of reference when looking for the best filter we can find.

Let’s take each point in turn:

It requires zero interaction from the user

There are two kinds of filters that currently come close to this ideal: Bayesian filters and Community filters. Bayesian filters strip messages down to small “word bites”, or tokens, and maintain a database containing lists of good and bad tokens. When a new message is encountered, the filter strips it down to tokens, compares them against the database, and applies a formula based on the probability theorem of the English mathematician and minister Thomas Bayes. Over time, the Bayesian filter “learns” the characteristics of spam messages.
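As a rough illustration of the “database of good and bad tokens”, here is a minimal sketch. The class name `TokenDatabase`, the tokenizing regular expression and the choice to count each token once per message are all assumptions made for this example, not the design of any particular product.

```python
import re
from collections import Counter


class TokenDatabase:
    """Hypothetical token store: how many good and bad messages contain each token."""

    def __init__(self) -> None:
        self.good: Counter[str] = Counter()
        self.bad: Counter[str] = Counter()
        self.good_messages = 0
        self.bad_messages = 0

    @staticmethod
    def tokenize(text: str) -> set[str]:
        # Lower-case runs of letters, digits, apostrophes and "$"; a set means
        # each token is counted at most once per message.
        return set(re.findall(r"[a-z0-9'$]+", text.lower()))

    def learn(self, text: str, is_spam: bool) -> None:
        # Add the message's tokens to the spam ("bad") or good side.
        tokens = self.tokenize(text)
        if is_spam:
            self.bad.update(tokens)
            self.bad_messages += 1
        else:
            self.good.update(tokens)
            self.good_messages += 1
```

After `db.learn("Lose weight FAST!", is_spam=True)`, `db.bad["lose"]` is 1 and `db.good["lose"]` is still 0. Scoring a new message against these counts is the job of the Bayesian formula itself.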

Community filters work on a simple voting system whereby every user who receives a spam message “votes” it as spam. This information is stored on a central server, and when enough votes have been received, the message is banned for all users in the community.
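The voting idea can be sketched in a few lines. Everything here is a hypothetical illustration: the `CommunityServer` class, the vote `threshold`, and the use of a SHA-256 hash of the normalised text as a message “fingerprint” are assumptions. Real community services use their own (often fuzzier) fingerprinting so that slightly altered copies of a spam still match.

```python
import hashlib
from collections import defaultdict


class CommunityServer:
    """Hypothetical central server that tallies "this is spam" votes."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # votes needed before a message is banned
        self.voters: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def fingerprint(message: str) -> str:
        # Copies of the same spam should map to the same key, so normalise
        # case and whitespace before hashing.
        normalised = " ".join(message.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def vote_spam(self, user: str, message: str) -> None:
        # Storing voters in a set means each user counts only once.
        self.voters[self.fingerprint(message)].add(user)

    def is_banned(self, message: str) -> bool:
        votes = self.voters.get(self.fingerprint(message), set())
        return len(votes) >= self.threshold
```

With `threshold=3`, a message is banned for everyone only once three different users have voted it as spam; repeated votes from the same user do not count twice.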

As can be seen, user interaction with these types of filters is mainly limited to a two-button operation (correcting wrongly identified messages), and the more accurate the filter, the less often those buttons are used.

OK, so that’s pretty good. Not exactly zero interaction, but if the filter is accurate enough, it should come pretty close. That brings us to point two:

It produces zero false positives or negatives

This is the area on which most spam filter development is concentrating, and things are getting pretty good nowadays. It is not at all unusual to see an efficient modern filter achieve an accuracy of 96% or better. It is, of course, far better to have a false negative than a false positive: if good mail can end up in your killed-mail folder, you will never be able to tear yourself away from checking it!
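To see why the kind of error matters as much as the headline accuracy figure, here is a small sketch with made-up counts (not measurements of any real filter). The function name `filter_stats` and the two example filters are hypothetical; both are 96% accurate on 600 spam and 400 good messages, but one makes only false negatives and the other only false positives.

```python
def _ratio(numerator: int, denominator: int) -> float:
    # Avoid dividing by zero when a category is empty.
    return numerator / denominator if denominator else 0.0


def filter_stats(true_pos: int, false_pos: int,
                 true_neg: int, false_neg: int) -> dict[str, float]:
    """Summarise a spam filter's results.

    true_pos:  spam correctly killed       false_pos: good mail wrongly killed
    true_neg:  good mail correctly passed  false_neg: spam wrongly passed
    """
    total = true_pos + false_pos + true_neg + false_neg
    return {
        "accuracy": _ratio(true_pos + true_neg, total),
        # Share of good mail that ended up in the killed-mail folder.
        "false_positive_rate": _ratio(false_pos, false_pos + true_neg),
        # Share of spam that still reached the inbox.
        "false_negative_rate": _ratio(false_neg, false_neg + true_pos),
    }


# Two hypothetical filters, each 96% accurate on 600 spam + 400 good messages.
lets_spam_through = filter_stats(true_pos=560, false_pos=0, true_neg=400, false_neg=40)
kills_good_mail = filter_stats(true_pos=600, false_pos=40, true_neg=360, false_neg=0)
```

Both filters report an accuracy of 0.96, yet the second one throws away 10% of your good mail (a false-positive rate of 0.1), while the first merely lets about 7% of spam through.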

Spam Filters Explained

Written by Alan Hearnshaw

What do they do? How do they work? Which one is right for me?

Spam is a very real problem that many people have to deal with on a daily basis. For those who have decided to do something about it and have started to investigate the options available in spam filtering, this article provides a brief introduction to those options and the types of spam filters available.

Despite the bewildering array of spam filters available today, all claiming to be the best “of its kind”, there are really just five filtering methodologies in general use, and all products rely on one, or a combination, of these:

Content Based Filters

These filters scan the contents of the message and look for tell-tale signs that it is spam. In the early days of spamming, it was quite simple to look out for “Kill Words” such as “Lose Weight” and mark a message as spam if one was found.
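A basic kill-word filter of this kind amounts to little more than a case-insensitive substring search. The word list and the function name `contains_kill_word` below are hypothetical; real products ship much larger, curated lists.

```python
# Hypothetical kill-word list; real products ship far larger, curated lists.
KILL_WORDS = ("lose weight", "free money", "viagra")


def contains_kill_word(body: str, kill_words: tuple[str, ...] = KILL_WORDS) -> bool:
    """Return True if any kill word appears in the message body (case-insensitive)."""
    text = body.lower()
    return any(word in text for word in kill_words)
```

`contains_kill_word("LOSE WEIGHT in 10 days!")` returns True, but `contains_kill_word("L0se Welght in 10 days!")` returns False, which is exactly the weakness spammers soon learned to exploit.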

Very soon, though, spammers got wise to this and started resorting to all kinds of tricks to get their messages past the filters. The days of “obfuscation” had begun. We started getting messages containing the phrase “L0se Welght” (notice the zero for “o” and the “l” for “i”) and even more bizarre, and sometimes quite ingenious, variations. This rendered basic content-based filters somewhat ineffective, although there are one or two on the market now that are clever enough to “see through” these attempts and still provide good results.
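One simple way a content filter might “see through” look-alike characters is to map both the message and the kill words onto a canonical spelling before comparing them. This is a minimal sketch, not how any named product works: the look-alike table, the `canonical` helper and the word-boundary check are all assumptions. Because “l” and “i” both collapse to “i”, “Welght” and “Weight” become identical.

```python
import re

# Hypothetical look-alike table: each obfuscation character collapses onto one
# canonical letter. "l", "1", "|" and "!" all become "i".
_CANON = str.maketrans({
    "0": "o", "1": "i", "l": "i", "|": "i", "!": "i",
    "3": "e", "4": "a", "@": "a", "5": "s", "$": "s",
})

KILL_WORDS = ("lose weight", "free money")


def canonical(text: str) -> str:
    """Lower-case, map look-alikes and collapse runs of whitespace."""
    return " ".join(text.lower().translate(_CANON).split())


def contains_kill_word(body: str) -> bool:
    text = canonical(body)
    # Kill words go through the same mapping; \b stops "close" matching "lose".
    return any(
        re.search(rf"\b{re.escape(canonical(word))}\b", text)
        for word in KILL_WORDS
    )
```

With this normalisation, “L0se Welght” and “FREE   M0NEY” are caught, while “Close weight limits apply” is not. Spammers have many more tricks (inserted punctuation, HTML, images), so a table like this is only a starting point.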

Bayesian Based Filters “The Reverend Bayes comes to the rescue”

Born around 1701, the son of a London minister, Thomas Bayes developed a formula for determining the probability of an event in the light of evidence related to it; in other words, a way of updating a probability as new evidence comes in.
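In modern notation, Bayes’ theorem as it is usually applied to spam filtering reads as follows, where S stands for “the message is spam”, ¬S for “the message is good” and W for “the message contains the word W” (symbols chosen here for illustration):

```latex
P(S \mid W) = \frac{P(W \mid S)\,P(S)}{P(W \mid S)\,P(S) + P(W \mid \lnot S)\,P(\lnot S)}
```

The probabilities on the right-hand side are exactly what a filter can estimate from its collection of known good and bad messages.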

Bayesian filters “learn” by studying known good and bad messages. Each message is split into single “word bites”, or tokens, and these tokens are placed into a database along with how often each is found in each kind of message. When a new message arrives to be tested by the filter, it too is split into tokens, and each token is looked up in the database. By extrapolating from the results and applying a form of the good reverend’s formula known as a “Naive Bayesian” formula, the filter gives the message a “spamicity” rating, and the message can be dealt with accordingly.
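The scoring step can be sketched as follows. This is one common naive-Bayes combining rule, not necessarily the exact formula any particular product uses; the tokenizer, the 0.01 to 0.99 clamp on per-token probabilities, the neutral 0.5 for unseen tokens and the assumption of equal prior odds for spam and good mail are all choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    # Lower-case "word bites"; a set counts each token once per message.
    return set(re.findall(r"[a-z0-9'$]+", text.lower()))


def token_probability(token: str, bad: Counter, good: Counter,
                      n_bad: int, n_good: int) -> float:
    """Estimated probability that a message containing `token` is spam."""
    b = bad[token] / n_bad if n_bad else 0.0     # share of spam containing token
    g = good[token] / n_good if n_good else 0.0  # share of good mail containing it
    if b + g == 0:
        return 0.5  # never seen before: neutral
    # Clamp so no single token is ever treated as absolute proof.
    return min(0.99, max(0.01, b / (b + g)))


def spamicity(text: str, bad: Counter, good: Counter,
              n_bad: int, n_good: int) -> float:
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p)).

    "Naive" because tokens are treated as independent; equal prior odds of
    spam and good mail are assumed. Sums of logs avoid float underflow.
    """
    log_spam = log_good = 0.0
    for token in tokenize(text):
        p = token_probability(token, bad, good, n_bad, n_good)
        log_spam += math.log(p)
        log_good += math.log(1.0 - p)
    diff = log_good - log_spam
    # Numerically stable form of 1 / (1 + exp(diff)).
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))


# Tiny made-up training set.
spam = ["free viagra now", "free money now"]
good_mail = ["meeting on friday", "lunch on friday"]
bad_counts = Counter(t for m in spam for t in tokenize(m))
good_counts = Counter(t for m in good_mail for t in tokenize(m))
```

Here `spamicity("Free money!", bad_counts, good_counts, 2, 2)` comes out very close to 1, “Meeting on Friday” very close to 0, and a message made only of unseen words scores a neutral 0.5. Working with sums of logarithms rather than raw products keeps long messages from underflowing to zero.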

Bayesian filters are typically capable of achieving very good accuracy rates (>97% is not uncommon), and require very little ongoing maintenance.

Whitelist/Blacklist Filters “Who goes there, friend or foe?”

This very basic form of filtering is seldom used on its own nowadays, but can be useful as part of a larger filtering strategy.

A “whitelist” is nothing more than a list of e-mail addresses from which you wish to accept communications. A whitelist filter would accept messages only from these people, and all others would be rejected.

A “blacklist”, conversely, is a list of e-mail addresses, and sometimes IP addresses (the numeric addresses that identify computers on a network), from which communications will not be accepted.
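Both kinds of list can be combined in a single check. In this minimal sketch the lists, the function name `check_sender` and its `whitelist_only` flag are hypothetical; the addresses use reserved example domains and the 203.0.113.0/24 documentation range. Checking the blacklist first is a design choice, so that a listed address or network is always refused.

```python
from ipaddress import ip_address, ip_network

# Hypothetical lists using reserved example domains and a documentation range.
WHITELIST = {"alice@example.com", "bob@example.org"}
BLACKLIST_ADDRESSES = {"offers@spam.example"}
BLACKLIST_NETWORKS = [ip_network("203.0.113.0/24")]


def check_sender(sender: str, sender_ip: str, whitelist_only: bool = False) -> str:
    """Return "reject", "accept" or "unknown" for a message's sender."""
    sender = sender.strip().lower()
    ip = ip_address(sender_ip)
    # Blacklist first: a listed address or network is always refused.
    if sender in BLACKLIST_ADDRESSES or any(ip in net for net in BLACKLIST_NETWORKS):
        return "reject"
    if sender in WHITELIST:
        return "accept"
    # A pure whitelist filter rejects everyone else; as part of a larger
    # strategy the message is handed on to other filters instead.
    return "reject" if whitelist_only else "unknown"
```

Returning “unknown” rather than rejecting outright is what lets these lists serve as one stage in a larger filtering strategy, with content-based or Bayesian filters deciding the remaining cases.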