Block the Word "Sex" in All Incoming Emails

Dear Sir.

Is there any way I can block the word "sex" or any other bad word, so that anyone who tries to send me an email with that word in the subject or the body of the message is blocked or denied? I am using sendmail-8.9.3-15.

Please let me know, because I am getting so many stupid emails all the time.

Not easily with that version of sendmail. You could use a procmail filter, but the only other way for a sendmail of that vintage would be by writing code to be included in the check_compat() routine. There are a number of procmail filters, like the one at http://www.spambouncer.org/.

Be careful about keyword matches. It's entirely too easy to wind up rejecting valid email. For example, it seems safe to reject mail containing "sex", but improper rule construction can also block emails containing Middlesex, sextant, etc. And no matter how well the keyword filters are constructed, spam will still get through.
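To make the Middlesex problem concrete, here is a small Python sketch (not a sendmail or procmail rule, just an illustration). The function names are made up for this example. It contrasts a naive substring test with a word-boundary match.

```python
import re

def naive_match(text):
    # Substring test: finds "sex" anywhere, even inside other words.
    return "sex" in text.lower()

# \b anchors the match at word boundaries, so "Middlesex" and
# "sextant" no longer match, but the standalone word still does.
WORD_RE = re.compile(r"\bsex\b", re.IGNORECASE)

def word_match(text):
    return WORD_RE.search(text) is not None

if __name__ == "__main__":
    for sample in ["Train to Middlesex", "A brass sextant", "Free sex pics"]:
        print(sample, naive_match(sample), word_match(sample))
```

Note that the word-boundary version only fixes the embedded-word case. It will still flag any legitimate message that uses the word on its own.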

My personal preference is to reject mail that clearly appears to have originated from spammers (known spam sites, open mail relays, sent by bulk mailers, or came from dialup or other personal accounts) and live with the other spam that slips through. There are a number of resources that you can take advantage of to safely reject spam, like RBL+ (mail-abuse.org, orbz.org, etc).
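For anyone curious how a DNS-based blocklist lookup generally works: the sending IP's octets are reversed, the list's zone is appended, and the resulting name is looked up in DNS. If it resolves, the address is listed. A rough Python sketch follows. The zone "dnsbl.example.org" is a placeholder, not a real list, and the function names are invented for this example.

```python
import socket

def dnsbl_query_name(ip, zone):
    # Reverse the IPv4 octets and append the blocklist zone,
    # e.g. 192.0.2.10 -> 10.2.0.192.<zone>
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not an IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    # A listed address resolves to some A record;
    # an unlisted one fails to resolve.
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

if __name__ == "__main__":
    # Placeholder zone; substitute the zone of the list you actually use.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

In practice the mail server does this check itself at connection time, so a script like this is mainly useful for testing whether a given address is listed.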

I now know of a good solution for spam control that will work with this or any other version of sendmail. Take a look at http://www.sng.ecs.soton.ac.uk/mailscanner/index.html and http://www.spamassassin.org/ (they work together for mail filtering). I've got that combo and a virus scanner running on a number of mail servers now, ranging from small ones with a few hundred users handling a few thousand messages per day all the way up to servers with 15K users handling 600K messages per day inbound only. The stuff works and works very, very well. For large servers (>50K messages/day or lots of large emails) it does require some tweaking to keep from being swamped.

I think SpamAssassin is too heavy-weight to use in a high-volume environment (because it is written in Perl). For high-volume systems, I would recommend filtering based on procmail scripts... There is a lot less overhead involved.

But, if SpamAssassin works for you, and you can throw the necessary hardware at it, then go right ahead!

What do you consider high volume? As I stated, I manage servers that support 15K users with 600K messages inbound every day. Most folks would agree that that qualifies as pretty high volume for a single server. A typical box for that kind of load would be a dual 2GHz box with 1GB or more memory and SCSI disks, which certainly isn't all that big of a server nowadays. Load average for the filter box will typically run between 3 and 4.

And there's a world of difference between the kind of filtering you can do with procmail and the context-sensitive, score-based filtering that SpamAssassin does. Simple keyword filtering is rarely acceptable because of the risk of false positives. Context/score-based filtering may let some things through, but it is very unlikely to block valid messages that just happen to have one of the objectionable keywords. For example, consider the following:

"We could see the baby in the ultrasound image but we still don't know what sex it is."

That could be a perfectly acceptable message between a daughter or son and one of their relatives or close friends and a simple keyword filter would block it, but SpamAssassin or any other sophisticated scanner wouldn't.
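As a toy illustration of the score-based idea in Python: each rule that fires adds points, and a message is only treated as spam once the total crosses a threshold. The rules, weights and threshold below are invented for this example and are not SpamAssassin's actual rule set.

```python
import re

# (pattern, points) pairs; all values are made up for illustration.
RULES = [
    (re.compile(r"\bsex\b", re.IGNORECASE), 1.5),
    (re.compile(r"click here", re.IGNORECASE), 2.0),
    (re.compile(r"100% free", re.IGNORECASE), 2.0),
    (re.compile(r"!{3,}"), 1.0),
]
THRESHOLD = 5.0

def score(text):
    # Sum the points of every rule whose pattern appears in the text.
    return sum(points for pattern, points in RULES if pattern.search(text))

def is_spam(text):
    return score(text) >= THRESHOLD

if __name__ == "__main__":
    ultrasound = ("We could see the baby in the ultrasound image "
                  "but we still don't know what sex it is.")
    junk = "Hot sex pics!!! 100% free, click here"
    print(score(ultrasound), is_spam(ultrasound))
    print(score(junk), is_spam(junk))
```

The ultrasound message trips only the one keyword rule and stays well under the threshold, while the junk message trips several rules at once. A single keyword never decides the outcome on its own.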

Well, with the kind of machine you're running, you can certainly afford some overhead :). Keep in mind that a lot of people don't think a mail server needs to be a hefty machine (or many times, they think it doesn't need to be a dedicated machine, either), so they end up running it on a P3-500 with 256 MB of memory.

I do not use extensive keyword filtering in my procmail scripts. Keywords need context to know if they're really spam or not, and are highly error prone (although I challenge you to come up with a legitimate use of the phrase "naked furry barnyard animals" -- one of the keyword phrases actually _in_ my filters :)

Rather, my filters are based on recognizing spam patterns and the structure of the spam message itself, instead of fixating on keywords (which is all SpamAssassin does, albeit in a more intelligent way).

In your example above, the message from a relative shouldn't ever get caught by a spam filter. One of the most important aspects of spam filtering is to have a good, comprehensive list of "known good" senders (a whitelist or greenlist), so that innocuous phrases (as above) don't get caught. While I wouldn't mind receiving an email from my wife with the word "sexy" in it, I can guarantee you that I don't want to see that word in an email from a stranger ("Come see our sexy naked furry barnyard animals!").

Only a minimal amount of filtering should be done on messages from people in your whitelist / greenlist (minimal rather than none, since spammers love to fake the headers & make you think you sent yourself a message), while lots of filtering should apply to people you don't know.
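One simple way to picture the two tiers, sketched in Python: known senders are not exempt from filtering, they just need a much higher spam score before a message is blocked. The addresses, thresholds and function names here are hypothetical, and this is not how my procmail scripts are actually written.

```python
# Hypothetical whitelist / greenlist of known-good senders.
KNOWN_GOOD = {"wife@example.com", "mom@example.com"}

# Made-up thresholds: lenient for known senders, strict for strangers.
KNOWN_THRESHOLD = 10.0
STRANGER_THRESHOLD = 3.0

def threshold_for(sender):
    # Compare case-insensitively and ignore surrounding whitespace.
    if sender.strip().lower() in KNOWN_GOOD:
        return KNOWN_THRESHOLD
    return STRANGER_THRESHOLD

def should_block(sender, spam_score):
    # spam_score is assumed to come from whatever scoring the filter does.
    return spam_score >= threshold_for(sender)

if __name__ == "__main__":
    print(should_block("wife@example.com", 4.0))
    print(should_block("stranger@example.net", 4.0))
```

Because the sender address can be forged, the known-sender tier still has a ceiling: a message that scores badly enough is blocked no matter who it claims to be from.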

By the way, when I first implemented spam filtering, I used both SpamAssassin and SpamBouncer (procmail based). SpamBouncer caught more spam, but also had more false positives... I wasn't happy with either one -- that's why I wrote my own scripts and started scanning for mail structure, rather than keywords... Personally, I think it works better than either one (no false positives, very, very, very little spam gets through)...

Well, I did say that the machine mentioned above is the config I'd use for a site that handles lots of traffic. And yes, one can use a 500MHz machine w/256MB & IDE disk(s) as a mail server. But I'm reasonably sure that such a machine would die a horrible death if presented with the kind of mail load that the fast machine above sees. I do have small machines, all the way down to 300MHz boxes, running as mail relays and/or mail servers at sites where that processing power is appropriate.

FYI, the machine/site detailed above runs a loadav of 2-3 when the scanner and virus check are not used, i.e., in a pure mail relay configuration. Turning on the scanner only raises the loadav to 3-4, so the Perl code isn't as much of a burden as it might seem to be.
