Spam filter for mbox style spools
(a work in progress)

It is not finished, but it works well enough for me that I run it
from cron.

What I do is set up four POP3 mailboxes.

mbox
mbox_keep
mbox_spam
mbox_unsure

All my mail goes to the 'mbox' mailbox. The 'mbox_keep' mailbox is the one
that I actually read. That is, this is the account that I point my mail client at.
The spam filter takes all mail from the 'mbox' mailbox and sorts it into the three
other mailboxes.

spam_filter.py mbox

produces the following files in mbox format: mbox_keep, mbox_unsure, mbox_spam
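
The splitting step can be sketched with Python's standard mailbox module.
This is a minimal illustration, not the code in spam_filter.py: the function
name split_mbox and the classify callable are hypothetical, and only the
output file names (mbox_keep, mbox_spam, mbox_unsure) come from the
description above.

```python
import mailbox


def split_mbox(path, classify):
    """Copy each message in `path` into <path>_keep, <path>_spam or <path>_unsure.

    `classify` is a hypothetical callable that takes one message and
    returns 'keep', 'spam' or 'unsure'.
    """
    source = mailbox.mbox(path)
    # mailbox.mbox creates the output files if they do not exist yet.
    outputs = {name: mailbox.mbox(f"{path}_{name}")
               for name in ("keep", "spam", "unsure")}
    counts = {name: 0 for name in outputs}
    try:
        for message in source:
            verdict = classify(message)
            outputs[verdict].add(message)
            counts[verdict] += 1
    finally:
        # Write pending messages to disk and release the files.
        for box in outputs.values():
            box.flush()
            box.close()
        source.close()
    return counts
```

The source spool is left untouched in this sketch; whether the real script
empties 'mbox' after sorting is not stated here.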

mbox_keep

Everything here matched the WHITELIST or WHITEWORDS.

mbox_spam

Everything here matched the BLACKLIST or was bigger than the size limit.

mbox_unsure

Everything here matched neither list and is smaller than the size limit.
In practice this is mostly spam. You may want to check from time to time to
see if anything interesting is in here.
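
The three rules above amount to a small decision function. The sketch below
is one way to write it. The function name, the argument names and the order
of the checks (whitelist before blacklist) are assumptions, as is sending a
message of exactly the limit size to mbox_unsure.

```python
def sort_message(in_whitelist, has_whiteword, in_blacklist, size, size_limit):
    """Return the name of the mailbox a message should be filed in."""
    # Assumption: a whitelist or whiteword match wins over everything else.
    if in_whitelist or has_whiteword:
        return "mbox_keep"
    # Blacklisted senders and oversized messages are treated as spam.
    if in_blacklist or size > size_limit:
        return "mbox_spam"
    # Not on either list and not over the size limit.
    return "mbox_unsure"
```

For example, sort_message(False, False, False, 2000, 100000) returns
"mbox_unsure".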

WHITELIST

This is a list of email address patterns. They can be full email addresses
or regular expression patterns. One pattern per line.

tom@example.net
.*example.org
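
A pattern file like this can be loaded and applied with the re module. The
helper names below are hypothetical. Matching the whole address with
fullmatch and ignoring case are assumptions about how the patterns are
meant to be used.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-blank line."""
    return [re.compile(line.strip(), re.IGNORECASE)
            for line in text.splitlines() if line.strip()]


def matches_any(address, patterns):
    """Return True if the whole address matches at least one pattern."""
    # Assumption: a pattern must match the full address, so a plain
    # address such as tom@example.net only matches itself.
    return any(p.fullmatch(address) for p in patterns)


patterns = load_patterns("tom@example.net\n.*example.org\n")
print(matches_any("bob@example.org", patterns))
```

Note that an unescaped dot matches any character, so ".*example.org" also
matches an address ending in "exampleXorg". Write ".*example\.org" if that
matters.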

BLACKLIST

This is a list of email address patterns, in the same format as the WHITELIST.

sue@example.net
jim@example.net
.*example.com

WHITEWORDS

This is a list of words or phrases. One per line.

My Project Name
bypass_password

BLACKWORDS

The spam filter catches obfuscations such as "V. I. A. G. R. A". It also
converts accented look-alike characters such as vïágrã to plain ASCII before
checking the BLACKWORDS list.
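
Both clean-up steps can be approximated with the standard library. This is a
rough sketch of the idea, not the filter's actual code. All three function
names are hypothetical, and the rule for joining spaced-out letters (three
or more single letters separated by spaces or punctuation) is an assumption
that may join ordinary text in rare cases.

```python
import re
import unicodedata

# Three or more single letters separated by spaces, dots, dashes,
# underscores or asterisks, for example "V. I. A. G. R. A".
SPACED_LETTERS = re.compile(r"\b(?:[A-Za-z][\s.\-_*]+){2,}[A-Za-z]\b")


def to_ascii(text):
    """Strip accents: decompose characters, then drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def collapse_spaced_letters(text):
    """Join spaced-out single letters back into one word."""
    return SPACED_LETTERS.sub(
        lambda m: re.sub(r"[^A-Za-z]", "", m.group(0)), text)


def contains_blackword(text, blackwords):
    """Return True if the cleaned, lower-cased text contains any blackword."""
    cleaned = collapse_spaced_letters(to_ascii(text)).lower()
    return any(word.lower() in cleaned for word in blackwords)
```

With these helpers, to_ascii("vïágrã") gives "viagra" and
collapse_spaced_letters("V. I. A. G. R. A") gives "VIAGRA".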