
Content based filtering

Content filtering is often hard to explain to people, and I’m not sure I’ve yet come up with a good way to explain it.

A lot of people think content reputation is about specific words in the message. The traditional explanation is that words like “Free” or too many exclamation points in the subject line are bad and will be filtered. But the words themselves aren’t the issue; the issue is that those words are often found in spam. These days filters are a lot smarter than that. Rather than just looking at individual words, they look at the overall context of the message.

Even when we’re talking content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is measured: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

Does the mail have hashbusters? Hashbusters are blocks of text, sometimes invisible to the recipient, that are put in an email in order to break some types of filtering. Ways to hide text include placing it in HTML comments and making the text color the same as the background color.
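To make the two hiding tricks above concrete, here is a rough sketch of how a check for them might look. It is not how any particular filter works; the names `HiddenTextFinder` and `find_hidden_text` are made up for illustration, and it only looks at inline `style` attributes.

```python
import re
from html.parser import HTMLParser

# Assumption: only inline styles are inspected. The lookbehind keeps
# "color" from matching inside "background-color".
COLOR = re.compile(r"(?<![-\w])color\s*:\s*([^;]+)", re.IGNORECASE)
BACKGROUND = re.compile(r"background(?:-color)?\s*:\s*([^;]+)", re.IGNORECASE)


class HiddenTextFinder(HTMLParser):
    """Collects HTML comments and tags whose text and background colors match."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.same_color_tags = []

    def handle_comment(self, data):
        # Any non-empty comment is recorded; a real filter would be pickier.
        if data.strip():
            self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        fg = COLOR.search(style)
        bg = BACKGROUND.search(style)
        if fg and bg and fg.group(1).strip().lower() == bg.group(1).strip().lower():
            self.same_color_tags.append(tag)


def find_hidden_text(html):
    """Return the comments and same-color tags found in an HTML body."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return {"comments": finder.comments, "same_color_tags": finder.same_color_tags}
```

Plenty of legitimate mail contains comments (Outlook conditional comments, for example), so a hit here is only a hint that the message deserves a closer look.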

Does the mail have valid HTML? Spammers have frequently used invalid HTML tags as a way to avoid filters by breaking up content or as hashbusters.
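As an illustration of the made-up-tag trick, the sketch below lists tags that aren’t in a small allowlist. The allowlist `KNOWN_TAGS` and the function name `find_unknown_tags` are assumptions for this example only; it is not a real HTML validator, and real filters certainly do more than this.

```python
from html.parser import HTMLParser

# Assumption: a deliberately short list of tags commonly seen in email HTML.
KNOWN_TAGS = {
    "html", "head", "body", "title", "meta", "style", "table", "tbody",
    "tr", "td", "th", "p", "div", "span", "a", "img", "br", "hr", "b",
    "i", "u", "strong", "em", "h1", "h2", "h3", "ul", "ol", "li",
    "font", "center",
}


class UnknownTagFinder(HTMLParser):
    """Records every opening tag whose name is not in KNOWN_TAGS."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in KNOWN_TAGS:
            self.unknown.append(tag)


def find_unknown_tags(html):
    """Return the unrecognized tag names in the order they appear."""
    finder = UnknownTagFinder()
    finder.feed(html)
    finder.close()
    return finder.unknown
```

A message like `<p>Che<zqx>ap me<qqq>ds</p>` renders as ordinary text in most mail clients, but the bogus tags split the words apart for anything matching on the raw source.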

Does this mail contain malicious content? These filters look for virus signatures or code that may compromise a recipient’s computer. Very few legitimate mailers have mail caught in virus filters, but every incoming mail is still scanned for viruses or malicious code.

Does this mail look like a phish? These filters look at the domains and authentication, but also look for common words and tricks phishers use. This is the filter most likely to catch legitimate mail, typically mail that uses tracking links while showing a different URL in the visible text of the link. An example of this kind of trigger is <a href="http://tracking.example.com/login.html">http://paypal.com</a>. Making sure there aren’t URLs, email addresses or hostnames in the text portion of a link generally avoids this kind of filter.
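If you want to check your own templates for this before sending, something along these lines could work. It is a minimal sketch, not a description of any real phishing filter: `LinkAuditor`, `find_mismatched_links` and the `HOSTLIKE` pattern are all hypothetical, and the hostname comparison is an exact match, so `www.example.com` versus `example.com` would be flagged too.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: anything shaped like "name.tld" in link text counts as a hostname.
HOSTLIKE = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.IGNORECASE)


class LinkAuditor(HTMLParser):
    """Finds links whose visible text names a different host than the href."""

    def __init__(self):
        super().__init__()
        self.href = None      # href of the <a> currently open, if any
        self.text = []        # visible text collected inside that <a>
        self.mismatches = []  # (visible text, href) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            visible = "".join(self.text).strip()
            target = (urlparse(self.href).hostname or "").lower()
            shown = HOSTLIKE.search(visible)
            if shown and shown.group(1).lower() != target:
                self.mismatches.append((visible, self.href))
            self.href = None


def find_mismatched_links(html):
    """Return (visible text, href) pairs where the two hostnames differ."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.mismatches
```

Run against the example link above, it reports the `http://paypal.com` text paired with the `tracking.example.com` href. A link whose text is plain wording such as "Read more" is never reported, which is the simple fix described above.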

Is this an industry with a bad reputation? The most obvious examples here are payday loans. There are so many horrible players in the online payday loan industry that it doesn’t really matter how good or clean individual mailers are. Payday loans are filtered heavily. Stock and financial messages also have challenges because there are so many pump-n-dump spammers out there.

Changing content can improve delivery for a while. But if that content was flagged because of user complaints or bad recipient profiles, the content filters will catch up. Continuing to attempt to evade filters by changing content can result in IP-based filtering.

These are just a few of the things companies look at when evaluating content.

Phishing filters know that <a href="http://phishing.com/paypal">paypal.com</a> is the sort of thing phishers do, and a sign of untrustworthy mail. To a mechanical filter, that looks just the same as <a href="http://esp.com/clicktrack/abcde">listowner.com</a>.

<a href="http://listowner.com/">listowner.com</a> is fine – but if you use that in your content and your ESP is providing click-tracking for you, it’ll get rewritten into something that looks like the link above, and phishing filters will get upset. Avoiding having URLs or hostnames in the readable text of the link is one simple way to avoid that.

Insightful article! Traditional email marketing is becoming difficult as spam filters are becoming more & more strict. Please elaborate on this article though, I feel that this topic warrants more than just 5 points 🙂

You can't technical your way out of the bulk folder. I wrote that a year and a half ago, and it's even more true today. Filters at the big webmail providers continue to evolve to meet new threats and new spamming techniques. Sending technically perfect mail won't get your mail into the inbox. Recipients have to want the mail and interact with the mail for good delivery.