Marcel's Linux Walkabout: But I Don't Like Spam!

Are you drowning in a sea of spam (unsolicited email)? Hoping somebody somewhere will throw you a life preserver? Never fear, rescue is at hand. Join Marcel Gagne on his Linux Walkabout, as he introduces you to your Linux system's new best friend: the SpamAssassin.

To those of you who, upon reading the title of this article, share a mix of
anger and Monty Python nostalgia all rolled into one, count me among your
numbers. For all those others who have never seen the famous Monty Python Spam
sketch, or have never eaten the spiced ham luncheon meat, Spam for you is simply
unsolicited email. Incidentally, the term Spam, when referring to unsolicited
email, was actually coined from the Monty Python sketch rather than from
Hormel's meat product.

More and more, Spam is robbing us of our productivity, forcing us to wade
through increasingly large numbers of unwanted junk in order to deal with the
messages that are truly important. I am quite certain that in my zeal to delete
my junk email, I have more than once accidentally deleted a valid message. The
noise-to-signal ratio is getting far too high.

Just how much Spam are we getting, anyway? Well, let me give you a
frightening quote. "Predicted number of spam e-mails per inbox per year by
2006: 1,500." That quote comes from the August 2002 issue of Linux
Journal. And according to some figures, Spam already accounts for 36% of all
the email we receive. These figures should be enough to make you consider taking
drastic measures. Contemplation of just how drastic those measures might be
probably had something to do with how the package featured in today's
Walkabout got its name. Justin Mason's SpamAssassin is like saying
"NO" to Spam in a big way.

Aside from a great name that kind of sums up how many of us feel about Spam,
just what is SpamAssassin? Simply put, it is a mail filter that attempts to
identify spam using text analysis and several Internet-based realtime
blacklists. SpamAssassin doesn't actually delete mail; instead, it
marks it for easy identification so that it can then be filtered into a special
folder (you don't want to automatically delete messages that might be
genuine). When you have some free time, have a quick look through the collection
of messages and delete what you don't need.

I've been running SpamAssassin on my system for several weeks now, and I
must say that I am very impressed. The project's website claims 99.94%
accuracy in identifying Spam. I'm not sure if it is quite that high, but I
would certainly agree to 95% accuracy. To get some relief from Spam,
start by taking a little walkabout of your own to
http://www.spamassassin.org,
where you'll find the latest source distribution. Since this is all Perl
code, building the software does require that you have Perl installed on your
system. All we have to do now is build SpamAssassin, which, thankfully, is
frightfully easy.
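
A Perl source distribution like this one usually builds with the familiar perl Makefile.PL, make, make install routine. What follows is a minimal sketch of those steps under that assumption, not the project's official instructions. The run and build_perl_module helpers are hypothetical names made up for this example, and setting DRY_RUN=1 prints the steps instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: build and install the Perl module unpacked in directory $1.
# The body runs in a subshell, so the cd does not change the caller's directory.
build_perl_module() (
    run cd "$1" &&
    run perl Makefile.PL &&
    run make &&
    run make install    # needs root to install system-wide
)
```

Once you have unpacked the tarball, something like DRY_RUN=1 build_perl_module Mail-SpamAssassin-x.y (the directory name is a placeholder for whatever version you downloaded) shows the four steps. Drop DRY_RUN and run it as root to do the real thing.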

As the above installation instructions imply, this is all Perl code, and as
such may require some prerequisites. The most significant of these is the
Net::DNS set of modules. On my system, I found that I needed to
install the Time::HiRes module as well. Unfortunately, those packages
may have prerequisites of their own. The easiest way to deal with
this mess is to use Perl's CPAN shell.

perl -MCPAN -e shell

Upon issuing this command (you should be doing this as root, by the way),
you'll be at a cpan> prompt. One by one, enter the following
commands. After each line, the cpan> prompt will return, waiting for
you to enter the next command.

o conf prerequisites_policy ask
install Net::DNS
install Time::HiRes
quit

Some of you may have already wondered whether you could just do the same
thing with the whole SpamAssassin install, since it, too, is a Perl module. Well
done! Go to the front of the class. Just remember that to get the latest and
greatest, you should still visit the SpamAssassin website. At the
cpan> prompt, you can just type these commands.

o conf prerequisites_policy ask
install Mail::SpamAssassin
quit

With the installation complete, I needed to test it with a spam email
message. A quick look at my /var/spool/mail/marcel file (my inbox)
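showed that one had just arrived. Well, well, well... I saved the message (your
basic Nigerian billions-of-dollars scam spam) to a file of its own, then copied
that file to a temp file. The source name below is only a placeholder; use
whatever name you saved the message under.

cp saved-message.txt /tmp/spam.test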

Then, I ran SpamAssassin against it.

spamassassin -t < /tmp/spam.test > /tmp/spam.out

The -t tells SpamAssassin to run in test mode. That means it
won't do anything to your mailbox; it simply writes its results to standard
output, which the redirect sends to a file. You can also choose to leave out the
redirect ("> /tmp/spam.out") and the whole thing will be displayed on your screen.

Have a look at the output and you'll discover some new and interesting
mail headers.
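
If you'd rather not scroll through the whole message, you can pull out just the headers SpamAssassin added. This is a rough sketch, assuming the added headers all begin with X-Spam- (X-Spam-Status and X-Spam-Flag are the usual ones; the exact set depends on your version). The spam_headers function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the X-Spam-* headers of the message in file $1.
spam_headers() {
    awk '
        /^$/     { exit }                  # a blank line ends the header section
        /^[ \t]/ { if (show) print; next } # continuation of the previous header
                 { show = ($0 ~ /^X-Spam-/); if (show) print }
    ' "$1"
}
```

Running spam_headers /tmp/spam.out against the test output from above should list only the lines SpamAssassin contributed, including any indented continuation lines.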

If you don't happen to have a handy piece of spam, you can use the
sample-spam.txt file included in the distribution. Now that you are
comfortable with what is happening, how do you get SpamAssassin to start
protecting your sanity? In other words, how do we make this real? Dealing with
messages on a one-off basis is more trouble than it's worth. You want your
whole network protected.

On a number of Linux distributions with a standard Sendmail install, you will
find that procmail has been installed as well. Its job is to make it
possible to pre-process mail coming into the system or individual mailboxes. The
system-wide procmail configuration file is /etc/procmailrc. The
user's home directory ($HOME) may also have its own configuration
file, called .procmailrc. The file will be read and the rules within it
applied as messages come in. By default, mail winds up in your inbox unaltered.
These rules may be a set of instructions to deliver mail to pre-defined folders,
redirect it to another mailbox based on the subject line, or any number of
things. The procmail configuration file is where all this magic takes place.

To process incoming mail through SpamAssassin, we need to add a couple of
simple lines to our procmailrc file (or the local
~/.procmailrc).

:0fw
| /usr/bin/spamassassin -P

That is all there is to it. When messages come in, they are tagged and
identified with a highly noticeable *****SPAM***** before the
message. You can then use your email client's filter rules to automatically
move these messages into a folder for later analysis. On my KMail setup,
I have a rule that automatically moves Spam into a folder called
caughtspam (see Figure 1). Every few days, I take a rapid-fire tour
through the folder, happily scanning for valid messages (almost never happens)
and deleting the offenders.
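
If you would rather have the mail server do the filing, procmail itself can handle it. The recipe below is only a sketch. It assumes your SpamAssassin adds an X-Spam-Status: Yes header to the mail it tags, and that caughtspam is an mbox-style folder in procmail's mail directory. It belongs after the recipe that pipes mail through SpamAssassin.

```
# File anything SpamAssassin tagged into the caughtspam folder.
# The trailing colon on ":0:" asks procmail to use a lock file.
:0:
* ^X-Spam-Status: Yes
caughtspam
```

Doing it this way means the Spam never reaches the inbox at all, whichever mail client you happen to use.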

Of course, some things that look like spam may not be. There are some things
that you asked for and that you wanted to get. For instance, I get daily news
updates from a variety of tech info sites. Along with the stories, these emails
do contain some product advertisements. Since I still want to see them despite
their commercial content, I must tell SpamAssassin to ignore them. In the
/etc/mail/spamassassin directory, you'll find a file called
local.cf, where you can do just that. By default, there is nothing but
a couple of comments in the file. To flag an address as being from a
"good" site as opposed to a "bad" site, use
the whitelist_from parameter.

# Add your own customisations to this file.
# See 'man Mail::SpamAssassin::Conf'
# for details of what can be tweaked.
#
whitelist_from *@protected_news_site.com
whitelist_from someuser@good_news_site.com

When SpamAssassin goes through your mail to decide whether something is Spam
or not, it assigns a score to each item it finds and adds up the total. The
default is to declare something as Spam if it hits a score of 5. I found this to
be a little low (too much non-Spam being caught), and changed mine to 7. The
threshold is set with the required_hits parameter, in the same local.cf file
you modified above, and is something you may wish to experiment with.
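
In local.cf, that change is a single line. The value 7 is simply what works on my system; treat it as a starting point rather than a recommendation.

```
# Raise the Spam threshold from the default of 5 to 7
required_hits 7
```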

Within a few days, you will have pretty much taken care of all the messages
that you really do want coming through. You can then relax and once again get
used to that wonderful feeling of an inbox filled with messages from people you
actually wanted to hear from. It is a great feeling.

Next time on the Walkabout, I'm going to take you into the hinterlands;
the backwaters; the deepest, darkest corners of the Linux universe. While it may
not be quite that scary, it's best not to take any chances. Better stock up
on that programmer food, just in case.