Stop Spam!

SPAM IS THEFT.

This essay describes what spam is,
why it's a problem, ways you can counter it today,
the bigger picture (how it will need to be countered long-term), and
a few links to related information.

What is Spam?

Spam is unsolicited bulk email (also called
unsolicited mass email); it's the (automated)
bulk nature of it that is so offensive, as discussed next.
A few people try to limit the definition of spam to only cover
commercial spam, but
it doesn't matter whether the spam is commercial or not when the
sheer volume of spam makes it impossible to use the equipment you purchased.

Why is Spam a Problem?

Spamming is another form of stealing and trespass;
spammers make other people pay for their messages
without permission of the recipient.
For example, spam steals a great deal of my time, uses up bandwidth and storage
space without my permission,
and makes it hard for legitimate email to reach me.
And it hurts others; my web pages once made it
easy for people to contact me (using "mailto:" links);
the volume of spam I get has forced me
to remove that capability, making it unnecessarily harder for
others to contact me.
Many organizations no longer post their email addresses, because their
legitimate email is overwhelmed by spam.
One estimate finds that spam costs
6 billion
Euros a year and the cost is rising.
Those costs include the bandwidth and disk space that spam uses up, which the
recipient has to pay for.

A
Washington Post article dated March 13, 2003 also discusses spam.
They quote Brightmail Inc.'s report that
roughly 40% of all e-mail traffic in the United States is spam
(up from 8% in late 2001 and nearly doubling in the past six months), and
Ferris Research Inc.'s report that
spam will cost U.S. organizations more than $10 billion this year.
They also quote
Robert Mahowald, research manager for IDC;
his firm estimates that for a company with 14,000 employees,
the annual cost to fight spam is $245,000 and that "there's no end in sight."

Spam can even kill.
There is a
U.S. Secret Service
Advisory on 419 schemes that particularly discusses the
"Nigerian" letter (the advisory is called the Nigerian Advance Fee Fraud Overview;
the letter is a kind of "419" fraudulent scheme).
The advisory notes that
in June of 1995 an American was murdered in Lagos, Nigeria,
while pursuing a 4-1-9 scam; that numerous other foreign nationals
have been reported as missing; that this particular type of fraud
grosses hundreds of millions of dollars annually;
and that the monetary losses are continuing to escalate.
This kind of fraud doesn't need email, but it's a common spam letter
precisely because the criminals don't need to pay "up front" to send spam,
so spam has enabled this deadly fraud to ensnare and hurt far more people
than it could otherwise.
Certainly, people shouldn't fall for such frauds, but since criminals
are allowed to send spam to everyone on the planet,
spam allows these criminals to exploit the naive and those who
momentarily let their guard down.
The notion that you can steal other people's resources to support scams
is ludicrous.

The IETF (which develops Internet standards)
describes why spam is a problem in documents such as
RFC 2635.
The IETF document
RFC 2505
gives more information to mail administrators on how to deal with spam.

Spam has caused me, personally, loss of important data.
All email sent to me during August 13-20, 2002, was lost forever
due to a torrent of spam.
Spam caused me to lose more email in November 2002.
Again, spam steals the ability to effectively use email systems
from their rightful users;
it's way past time for legislators to understand that spam is theft.

Since receivers pay the bulk of the costs for spam, spam use
will continue to rise until effective technical and legal countermeasures
are deployed, or until people can no longer use digital communications.
I hope that legislatures in particular will realize the threat and
help work to counteract it.
In the meantime, technical approaches without
legal help will need to be put in place.

Lots of people have various anti-spam suggestions, and I believe
that defense in depth is a good idea.
However, there are a few ideas I'm particularly fond of:
email passwords,
challenge-response passwords, and statistical analysis (e.g., Bayesian).

The SpamBayes project
has researched how to improve classification of spam vs. ham messages
when examining the message contents (as well as developing an implementation).
They began with the naive Bayesian approach, but have been
evaluating variations on the technique to improve it further.
Their tests show that examining pairs of words is actually
less effective than examining words individually, which is an interesting
result.
They have also developed a different algorithm for combining the
probabilities of individual words, the chi-squared approach, which
in their tests produces even better results.
Here is their description in their own words:

The chi-squared approach produces two numbers -
a "ham probability" ("*H*") and a "spam probability" ("*S*").
A typical spam will have a high *S* and low *H*, while a ham will
have high *H* and low *S*.
In the case where the message looks entirely unlike anything the system's been trained on, you can end up with a low *H* and low *S* - this is the code saying "I don't know what this message is". Some messages can even have both a high *H* and a high *S*, telling you basically that the message looks very much like ham, but also very much like spam. In this case spambayes is also unsure where the message should be classified, and the final score will be near 0.5.
So at the end of the processing, you end up with three possible results - "Spam", "Ham", or "Unsure". It's possible to tweak the high and low cutoffs for the Unsure window - this trades off unsure messages vs possible false positives or negatives.
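
The combining step described above can be sketched in a few lines of Python. This is only my reading of the chi-squared approach, not the SpamBayes code itself: the function names, the clamping value, and the 0.2/0.9 cutoffs are assumptions chosen for illustration, and the per-word spam probabilities are taken as given.

```python
import math

def chi2q(x2, v):
    """Chance that a chi-squared value with v (even) degrees of freedom is >= x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)  # underflows to 0.0 for very large m, which is the right limit
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi_combine(word_probs, eps=1e-6):
    """Return (S, H, score) from a list of per-word spam probabilities."""
    n = len(word_probs)
    if n == 0:
        return 0.0, 0.0, 0.5  # no evidence at all: unsure
    # Clamp so that log(0) can never occur (eps is an assumed value).
    probs = [min(max(p, eps), 1.0 - eps) for p in word_probs]
    S = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return S, H, (S - H + 1.0) / 2.0

def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map the final score to the three results; the cutoffs are assumed defaults."""
    if score >= spam_cutoff:
        return "Spam"
    if score <= ham_cutoff:
        return "Ham"
    return "Unsure"
```

A message whose words all look spammy yields a high S and low H; a message mixing strongly spammy and strongly hammy words yields both high, and the score lands near 0.5. Moving ham_cutoff and spam_cutoff apart widens the Unsure window.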

A selected set from the
newsgroup news.admin.net-abuse.sightings might be useful for initial
training, though it can't be used directly
(a spammer will try to fill it with useful messages to disable filtering).
I really like this filtering approach and approaches like it.

I'd like to see mail browsers add a "SPAM" button that can do
a number of configurable actions, and has a useful default.
I suggest as the default that it save the message in a
"past spam" folder, and occasionally invoke a naive Bayesian
statistical analysis program (as Graham describes)
to create a filter for the future
(then filter out email with a high probability of being spam).
Perhaps it could optionally do other things, such as forward a copy
to a list of email addresses (e.g., your local "abuse" account, the
newsgroup news.admin.net-abuse.sightings, and
email addresses of well-known spam killers), or call on other
spam-checking tools such as SpamAssassin
to check it.
It would be great if the spam message could be forwarded to an abuse
account of the sending ISP; determining what that ISP is could be difficult.
Perhaps there could be a checkbox beside each action like
"don't do it when you press SPAM", "do it when you press SPAM",
or "confirm before doing it when you press SPAM" - that way,
you could get rid of chain letters without sending them to net-abuse.
The advantage of these first two approaches is
that it doesn't matter if spammers
know this is happening, and they can be implemented without requiring some
Godlike and expensive central authority.
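
To make the default action of the "SPAM" button concrete, here is a rough sketch of a naive Bayesian filter trained from a "past spam" folder. It is a simplification in the spirit of Graham's description, not his exact algorithm; the class name, the tokenizer, the 0.4 value for unseen words, the 0.01/0.99 clamps, and the choice of the 15 most telling words are all assumptions that a real mail reader would tune.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z0-9$'-]+")

def tokens(text):
    """Distinct lower-cased words in a message (a deliberately crude tokenizer)."""
    return {t.lower() for t in TOKEN.findall(text)}

class SpamButtonFilter:
    def __init__(self):
        self.spam_counts = Counter()  # word -> number of spam messages containing it
        self.ham_counts = Counter()   # word -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def mark_spam(self, text):
        """What pressing the SPAM button would do by default."""
        self.spam_counts.update(tokens(text))
        self.n_spam += 1

    def mark_ham(self, text):
        self.ham_counts.update(tokens(text))
        self.n_ham += 1

    def word_prob(self, word):
        b = self.spam_counts[word] / max(self.n_spam, 1)
        g = self.ham_counts[word] / max(self.n_ham, 1)
        if b + g == 0:
            return 0.4  # never-seen word: assumed mildly innocent
        return min(0.99, max(0.01, b / (b + g)))

    def spam_probability(self, text, top=15):
        # Use only the words whose probability is furthest from neutral.
        probs = sorted((self.word_prob(w) for w in tokens(text)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top]
        spam = ham = 1.0
        for p in probs:
            spam *= p
            ham *= 1.0 - p
        return spam / (spam + ham)
```

A mail reader could call mark_spam whenever the button is pressed, call mark_ham on messages the user keeps, and then divert incoming mail whose spam_probability is above some high threshold.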

Another approach is to use various blackhole lists, which identify
locations that allow spam to be sent, and then summarily throw away all
email from those locations.
If the location is used for non-spam, eventually the location will have an
incentive to stop the spam.
Obviously, this has various risks, such as incorrectly identifying
spam sources, getting a site off the list, and finding ways to
maintain and distribute the list.
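
As a small illustration of the blackhole idea, the sketch below checks a sending address against a locally held list of blocked networks. It also builds the lookup name used by the DNS convention through which many such lists are distributed (the address octets reversed, followed by the list's zone). The zone "bl.example.org" and the addresses are placeholders, not a real list, and no network query is made.

```python
import ipaddress

def dnsbl_query_name(ip, zone="bl.example.org"):
    """Name to look up in DNS to ask a list about an IPv4 address.

    The zone is a placeholder; a listed address conventionally resolves,
    and an unlisted one does not.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

class LocalBlackhole:
    """A blackhole list held locally as a collection of networks."""

    def __init__(self, networks):
        self.networks = [ipaddress.ip_network(n) for n in networks]

    def is_listed(self, ip):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks)
```

For example, LocalBlackhole(["192.0.2.0/24"]).is_listed("192.0.2.99") is true, so mail arriving from that address would be thrown away. The risks noted above (wrong listings, getting off the list) apply to whoever maintains the list, not to this lookup.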

There are other ideas, too.
Tools like SpamAssassin,
while necessarily imperfect,
do a reasonable job at detecting spam.
There are tip-offs to spam that aren't always caught by generic tools;
for example, if you only speak
languages that use Latin characters (such as English),
it's very likely that an email subject line in an Asian language is spam
(and often reprehensible spam, too, like child pornography - why anyone stands
up for the right to transmit child pornography spam is beyond my
understanding).
Various
approaches to implement stamps (either money or computational time)
make some sense, although they would require extremely
widespread deployment to work.
If S/MIME and PGP were more widely deployed, and keys were widely available,
you could accept only encrypted email, which would at least make spamming
slightly harder; see my
article on how to easily distribute
email keys.
One group is even using
haiku to counter spam.
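
The "computational time" kind of stamp mentioned above can be sketched as a small proof-of-work puzzle: the sender must find a value whose hash starts with a given number of zero bits, which is costly to do for millions of recipients but cheap for the receiver to check. The stamp layout (recipient:date:nonce), the use of SHA-1, and the 12-bit difficulty are assumptions for illustration only; this is not the format of any deployed scheme.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Number of zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def mint_stamp(recipient, date, bits=12):
    """Sender's side: search for a nonce that makes the stamp's hash start with enough zeros."""
    for nonce in count():
        stamp = f"{recipient}:{date}:{nonce}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def check_stamp(stamp, recipient, bits=12):
    """Receiver's side: a single hash confirms the work was done for this recipient."""
    return (stamp.startswith(recipient + ":")
            and leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits)
```

Each extra bit doubles the sender's expected work while the receiver's check stays a single hash. A real scheme would also need to reject reused stamps and stale dates, and, as noted above, would only help once nearly everyone used it.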

Another simple approach is to sort email so that email that is probably ham
(not spam) is sorted first or placed in a separate box.
For example, email from people you've already
sent email to (or in some other way identified as trusted)
could be identified as ham.
Of course, email sources can be forged, and spammers could use
viruses to send spam email from trusted sources,
but this makes a spammer's job a little harder.
The general approach of defining a list of email addresses that
are not spammers for you is called a "whitelist".
A simple way to create a whitelist is to start with the
contents of an addressbook and saved email messages.
Another approach is to require a codeword to be placed in each email
before you'll accept it;
the codeword is then published on a website as a shrouded image
(so that humans can read the codeword, but spammers' automated
email address harvesters cannot).
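
The whitelist idea above is simple enough to sketch. The code below assumes messages are available as raw text; it builds a whitelist from an address book plus the recipients of mail you have already sent, and then sorts an inbox so that mail from whitelisted senders comes first. The function names are made up for this example, and, as noted above, the "From" line it relies on can be forged.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

def build_whitelist(address_book, sent_messages):
    """Trusted addresses: the address book plus everyone you have written to."""
    trusted = {addr.lower() for addr in address_book}
    for raw in sent_messages:
        msg = message_from_string(raw)
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        trusted.update(addr.lower() for _, addr in getaddresses(headers) if addr)
    return trusted

def sort_ham_first(inbox, whitelist):
    """Stable sort: messages from whitelisted senders first, everything else after."""
    def is_unknown(raw):
        _, sender = parseaddr(message_from_string(raw).get("From", ""))
        return sender.lower() not in whitelist
    return sorted(inbox, key=is_unknown)
```

Nothing is deleted here; unknown senders are only moved down, so a forged or mistaken entry costs little.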

I believe that email reading programs, such as
Mozilla, will include
stronger and stronger anti-spam technical measures over time using
methods such as these.

As I noted before, an "opt-out" list is not really a great idea.
Why should I have to sign up on a list just so I can use the email
account I'm already paying for?
However, it may be that legislatures will be unwilling to establish
strong anti-spam legislation without one, and for the future I believe
it'll be important to enact strong anti-spam legislation
(I'll discuss why legislation is important in a moment).

If the world must have an "opt-out" list, there needs to
be a single opt-out list that doesn't help spammers, and costs
nothing for non-spammers (whose resources are, after all, being stolen in
the first place).
Such a list must also allow whole domains to opt-out,
not just individual users, since some domains' connections have poor
bandwidth or are very expensive - even some spam can take them down.

Here's the best way I can think of to handle a bad situation:
a non-profit organization (with a .org address) or
government agency without a conflict of interest
would create and maintain a database of HASHES of
email addresses that do NOT want spam
(say MD5 and SHA-1 hashes of canonicalized email addresses,
e.g., all lower case; an entire site could be represented by "@mycompany.com").
Anyone can download the database, for a fee.
Anyone must be able to add or remove their email address from the list for FREE
(and it must always be free); they just need to subscribe/unsubscribe,
and then confirm through a separate email that the request really came from them
(for example, the list could email them a
temporary password that they must send back to complete the request;
entire sites could require "root" or "postmaster" to represent them).
The confirmation would need to be by email, since otherwise many spammers
would simply forge messages to remove everyone from the list.
It's critical that users can add or remove themselves for free; why
should I pay an additional tax just to help the freeloaders who are
exploiting others' email addresses?
Then legislation can be enacted that imposes serious penalties for
any spam sent to addresses on the "no-spam" list.
Capturing the database wouldn't do any good for a spammer;
it would only provide hashes and date/time stamps.

Note that this requires almost no resources; adding/removing names can
be done via the web, the "database" can be trivial
(a text file listing timestamp, action (FORBID or PERMIT spam), and
email hashes), and the implementation program can be
trivial (a few hundred lines of code at most).
Database download or query rights fees (say, $10,000 per 10 million
email messages checked) could pay for the whole thing.
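
To show how little is needed, here is a minimal sketch of the hashed list described above: a text log of timestamp, action (FORBID or PERMIT), and hash, plus the check a sender would run. For brevity it uses only SHA-1 (the text suggests both MD5 and SHA-1), canonicalizes by trimming and lower-casing, and represents a whole site as "@domain"; the function names and the log layout are assumptions, and the email confirmation step is left out.

```python
import hashlib
import time

def canonical(address):
    """Canonical form of an address: trimmed and all lower case."""
    return address.strip().lower()

def address_hash(address):
    return hashlib.sha1(canonical(address).encode("utf-8")).hexdigest()

def log_line(action, address, timestamp=None):
    """One database line: timestamp, action, hash (never the address itself)."""
    assert action in ("FORBID", "PERMIT")
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{ts} {action} {address_hash(address)}"

def load_forbidden(lines):
    """Replay the log in order; the latest action for each hash wins."""
    forbidden = set()
    for line in lines:
        _, action, digest = line.split()
        if action == "FORBID":
            forbidden.add(digest)
        else:
            forbidden.discard(digest)
    return forbidden

def may_send(address, forbidden):
    """False if the address, or its whole domain, is on the no-spam list."""
    addr = canonical(address)
    domain = "@" + addr.rpartition("@")[2]
    return (address_hash(addr) not in forbidden
            and address_hash(domain) not in forbidden)
```

A bulk mailer would call may_send for each recipient before sending. Someone who downloads the file sees only hashes and timestamps.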

This approach isn't foolproof; spammers can use password cracking techniques
to figure out at least some of the database contents.
More likely, many spammers will simply ignore the list, and find the
names just like they do now.
But stiff fines for ignoring opt-out lists might cut back a few spammers.

I will say that this sort of opt-out approach hasn't fared well
in the past;
Shut up and Eat Your Spam
discusses the history and bad faith of spammers.
To be practical, this requires legislation; spammers wouldn't voluntarily
use an effective list (doing so would eliminate the point of spam).
Of course, the whole notion that you have to sign into a database to
prevent theft is a wrong-headed notion in the first place.

In the bigger picture, I believe there need to be both laws forbidding
spam (all spam, not just commercial spam)
and technical means to help enforce them.
We need both law and technology; each needs the support of the other.

Laws don't solve the whole problem, but if spam were illegal, much
more could be done to reduce spam to a much smaller level.
Murder still happens - even though it's illegal - but the legal
system acts as a deterrent, helping to reduce the occurrence of murder.
Some spammers spam simply because it's legal where they are; if it
were illegal, they would stop.
Spammers who are willing to perform illegal acts would find it more difficult.
Such laws must be international, but that's actually quite possible.
Countries that fail to enact anti-spam laws could find their entire
country blacklisted (no one else would accept their email), and that
would act as a strong incentive to enact and enforce
anti-spam legislation as well.
If the top ten spammers were hit hard
(say, taking all their assets the first time and throwing them in jail
on a repeat offense), spam would go down remarkably -
because the worst
offenders would stop, and the others would not want to be next.
This is already starting to happen through local laws;
a
Washington resident has been awarded $250,000 against a spammer, and
spammers who filed a frivolous lawsuit to quiet opposition
are finding that they may pay dearly for it.

I believe that the solution is quite workable: make unsolicited bulk
email illegal.
If you send unsolicited bulk email, you need to at least pay a fine for
each unsolicited email sent.
It's always complicated to define new laws, but it's clearly workable
(several governments have already done so!).
"Unsolicited" can be defined, e.g., as messages that you didn't request and
that are not related to an ongoing or recent (within one year)
business transaction or relationship.
Customers must be able to opt-out of messages from businesses they DO
work with.
The definition of "bulk" could simply be a large number, like
at least 1000 recipients; nobody needs to send a message to that many
individuals unsolicited.
That means that opt-in mailing lists are fine, since you sign up for them.
By "email" I mean any communication capable of communicating with many
people, such as Internet email, cell phone text messages, and so on.
Note that websites don't have a problem with such laws,
since web users have to
perform an action (such as clicking on a link) to see the data, and thus
are requesting to see that particular data.
Many governments are already moving this way.
People who create spam viruses should be accountable for all the spam they
generate, as well as for illegally using others' computers.

But what if governments are unable to do what I believe is the right thing?
Is there a partial position that could help, temporarily, until
they get more effective legislation enacted?
At the least, laws could be put in place to require all
unsolicited bulk email to have a standard marking
that can be easily identified mechanically
(e.g., the first characters in the "Subject" line, or its equivalent,
must be exactly the four characters "ADV:")
and to make forging "from" information
for the purpose of spamming illegal.
This would make it trivial to filter out spam, and would make it much
easier to use "whitelists".
Some U.S. state laws do at least one of these things
(see Spamlaws), or at least
do it for commercial and/or pornographic spam.
Laws should cover all spam; what is "commercial" or "pornographic" can
be subjective, and other spam still steals services, so there's no
reason to treat different kinds of spam differently.
Do you really want massive amounts of pro-Nazi spam, for example,
even if it isn't commercial?
Simply covering all unsolicited bulk email would be more appropriate
and would be much easier to enforce.
With those laws (required marking and valid return addresses), people
could at least begin to throw away spam if they want to, which I suspect
will be true for almost everyone.
Perhaps everyone will choose to throw away spam - but if that's true, then
that's a consumer choice, and spammers have no right to object to
true consumer choice.
Those who don't want spam
from other countries where these laws aren't passed can
simply throw away all email from those countries ("blackholing") -
at least this gives users of email a choice!
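
With a required marking, the filter really is trivial. The sketch below assumes raw message text and the "ADV:" convention described above (the first four characters of the Subject line); the function names are made up for this example.

```python
from email import message_from_string

def is_marked_bulk(raw_message):
    """True if the Subject line begins with exactly the four characters 'ADV:'."""
    subject = message_from_string(raw_message).get("Subject", "")
    return subject.startswith("ADV:")

def split_inbox(inbox):
    """Return (kept, discarded) according to the marking alone."""
    kept = [m for m in inbox if not is_marked_bulk(m)]
    discarded = [m for m in inbox if is_marked_bulk(m)]
    return kept, discarded
```

This only works against senders who obey the marking law; unmarked spam still needs the other measures discussed earlier.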

The U.S. Federal Trade Commission (FTC) has already begun
suing spammers who
disobey existing laws; these are generally fraudulent acts
such as deceptive misuse of others' trademarks,
false return addresses,
and adding users to spam lists when they ask to be taken off the list.
But until laws make spam illegal (or at least, unmarked spam illegal),
the FTC cannot take legal actions against spammers in general.

Spam has become so obnoxious that
even the
Direct Marketing Association (a group whose purpose is to steal time and
resources from others) agrees that commercial spam with
forged "from" fields should be made illegal in the U.S. by
federal law.
Perhaps the DMA itself is finding that it's having trouble using email!
For example,
the DMA uses email addresses like
membership@the-dma.org and
Presiden@the-dma.org;
either they get piles of spam, or they have to put a big spam filter on them.
Of course, if the association for spam is having trouble using email
because of spam, then it's clear that spam is out of control.
As I noted above,
some states already forbid at least commercial spam from forging their
"from" addresses, as part of their anti-fraud laws, so enacting at least
that requirement across the U.S. shouldn't be that hard.
The DMA's acceptance of legislation against fraudulent spam
would leave many problems unsolved; for example, its approach still
relies on opt-out and doesn't include the ADV convention
required by several states.
In other words, the DMA assumes that everyone needs several billion emails
a day unless they spend their lives sending opt-out messages individually
to every organization on earth.
After all, if the DMA has its way, every organization will be sending
out cheap emails to everyone on Earth, repeatedly, preventing people
from using their own email accounts.
Ridiculous.