A while back we installed FuzzyOcr, a SpamAssassin plugin that uses OCR programs to try to detect image spam, which was surging at the time. That worked fine for a while, but spammers switched to obfuscated images that OCR systems couldn't easily read. However, it didn't matter that much, because:

a) Most of the new spam machines started being listed on RBLs, so they would get spam scores regardless of the image analysis
b) Some SA rules were added by the regular sa-update that gave a score to the general form of a message with an attached gif

It seems these two combined in most cases would get the spams over the 5 point threshold that's the default "Normal" level protection.

Now the problem is that the RBL stuff doesn't work nearly as well for people forwarding from another service to FM. Basically, in quite a few cases SpamAssassin will only look at the network "edge" where the email entered our system, because you can't trust headers beyond that. In the case of forwarding services, that means the forwarding service itself is checked against RBLs, which isn't very helpful. I made a change a while back to SA that tries to help with that by defining some "trusted" forwarding servers. If we find those in the headers, we scan back through them to the IP of the machine that entered that system. The current list of trusted systems is:

Note that being a "trusted" system doesn't mean we don't spam check it, it just means that we parse back through the Received headers to find what server delivered the email to that service, rather than using that service's IP. This improves RBL checks enormously because there's no reason for any of the above services to be on an RBL.

To avoid forgery issues, we look at the IP address the Received header shows the email came from, and do a reverse and forward DNS lookup to see that they match, and then see that it's a host within one of those domain names above.
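As a rough illustration of that check, here is a minimal Python sketch (the real change lives inside SA, which is Perl, and may well differ). The function names and the injectable `reverse`/`forward` resolvers are assumptions made up for this example, and it only handles IPv4.

```python
import socket
from typing import Callable, Iterable, List, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def in_trusted_domain(hostname: str, trusted_domains: Iterable[str]) -> bool:
    """True if hostname is one of the trusted domains or a host inside one."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in trusted_domains)


def is_trusted_relay(
    ip: str,
    trusted_domains: Iterable[str],
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse lookup, forward lookup, compare, then check the domain."""
    hostname = reverse(ip)
    if hostname is None:
        return False
    # The forward lookup must lead back to the IP seen in the header,
    # otherwise the reverse DNS entry could simply be a lie.
    if ip not in forward(hostname):
        return False
    return in_trusted_domain(hostname, trusted_domains)
```

The resolvers are passed in as parameters only so the logic can be exercised without touching real DNS.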

This has helped with those forwarders, but of course not with all the forwarding systems people have set up. So I had a quick look at whether we could improve the image spam scanning. An hour of fiddling, and I found a set of transforms that does amazingly well on almost all the current obfuscated image spams out there. Check out this dir:

You can see some original, and some "fixed" versions. Feeding the fixed versions into OCR software usually gives some meaningful result.

I rolled this out to our spam scanning machines for all incoming email yesterday (note I haven't rolled it out to the machines that handle Pop Links yet), and I notice that overnight I got no image spams at all in my Junk Mail folder. Hopefully a trend that continues.

When a user is using a forwarding service, there is a specific SMTP envelope rcpt address that the user set at the forwarder to forward to. I have several forwarders and each forwards to a specific subdomain@alias address. It would be helpful if there was a way for a user to declare that a certain incoming address is expecting mail from a certain forwarding host, and then trust the next Received headers from these hosts when they send to that particular address (though I'm not sure how easy it is to implement). Most forwarding services I use insert several "Received" headers for several internal transfers.

SpamCop has their "mailhosts" configuration system for making their system learn about the forwarders each user uses, but I guess parsing "Received" headers is their specialty.

I've been seeing a massive increase in spam reaching my inbox and it all has a spam score of 0! So now I know why - it's all being forwarded through 2 university mail systems. Rob - can you somehow add "trusted" .edu mail forwarders?

Currently adding a trusted forwarder is manual only, however I was thinking of adding an option on the Options -> Spam/Virus protection screen so people could add custom hostnames. Still, I'd like to make sure that the most common forwarders are included so things just "work" for most people.

If you email me at robm AT fastmail DOT fm with forwarder details, I'll think about adding them.

Re the X-Spam-source header. My plan had been to integrate this with the improved SA Received header tracking, I just hadn't got to it yet.

FYI, I only received 2 image spam messages over the entire weekend in my Junk Mail folder, which means all the rest must have been recognised and given a score > 10. Normally I get a dozen in my Junk Mail folder.

Originally posted by robmueller: "If you email me at robm AT fastmail DOT fm with forwarder details, I'll think about adding them."

I think a ".edu forwarder" usually means a .forward file in a user's homedir on unix, so allowing this as a "trusted forwarder" amounts to either allowing the whole .edu tld and also things like .ac.cc or .edu.cc (where cc stands for country code), or listing lots of individual universities and many hosts within each university. My own forwarders are ams.org, openu.ac.il and my ISP (and I'm not sure if it can be "trusted").

It's silly using RBLs that block email sources that send any legitimate mail (eg hotmail/yahoo/etc), even if they do send some spam, because you're just randomly blocking some machines and users. I think RBLs are best when they just block known insecure machines that should never be sending email.

I think having the per-user option is the best way to go in the long term, but I'm happy to add forwarders people use if they email me.

Originally posted by robmueller: "I won't add isps to trusted hosts, since they are actually indirectly the source of most spam with their users on dsl networks with compromised machines. ..."

I thought that the "trusted hosts" was just about being able to trust a host to put in correct "Received" headers. So even if an ISP's outgoing SMTP server relays spam sent by broadband subscribers it usually can be trusted to prepend correct "Received" headers to the email it relays.

Anyway, how do you know where to stop accepting "Received" headers? Most forwarders I've seen add several such headers.

Here the forwarder added 4 "Received" headers so if the forwarder can be trusted the bottom one shows the originating IP (personally I trust these headers and crop all but the bottom one from spam reported using SpamCop).

Basically spamassassin already includes a Received: header parser that tries to break a Received header down into a common format (for the interested, it's a several hundred line function that tries to match against many, many different formats since there's no standard Received format).

Now by default, spamassassin will search back through the Received headers to find the IP that delivered the message into the local system. All we do is extend how far it keeps searching back so that it also passes through the extra "trusted" hosts.
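To make the idea concrete, here is a small Python sketch of that search, not the actual SA code. It assumes the Received headers have already been parsed into dicts with "rdns" and "ip" keys, newest first; that dict layout, the name `find_rbl_hop` and the `is_trusted` callback are all invented for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical parsed form of one Received header: {"rdns": ..., "ip": ...}
Hop = Dict[str, str]


def find_rbl_hop(hops: List[Hop], is_trusted: Callable[[Hop], bool]) -> Optional[Hop]:
    """Walk back from the newest hop and return the one to check against RBLs.

    hops[0] is the machine that delivered the message to our own system.
    While the sending machine is a trusted forwarder, we believe the next
    (older) header it wrote and keep going. We stop at the first machine
    that is not trusted. If every hop is trusted we fall back to the oldest.
    """
    candidate: Optional[Hop] = None
    for hop in hops:
        candidate = hop
        if not is_trusted(hop):
            break
    return candidate
```

Anything older than the returned hop was written by a machine we have no reason to believe, so it is ignored.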

We look at the rdns value (mail01.ams.org), and see if it's in our trusted host list. It's not, so we strip the /^[^.]*\./ from the front to get "ams.org" and try again. This is in our trusted host list.
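In Python, that lookup might be sketched roughly as below. The function name and the sample trusted set are made up for the example, and repeating the strip until no dot is left is my assumption; the description above only shows a single strip.

```python
import re
from typing import Optional, Set

# Same pattern as above: the first label and its trailing dot.
LEADING_LABEL = re.compile(r"^[^.]*\.")


def match_trusted_host(rdns: str, trusted: Set[str]) -> Optional[str]:
    """Return the trusted-list entry that covers rdns, or None."""
    name = rdns.lower().rstrip(".")
    while name:
        if name in trusted:
            return name
        stripped = LEADING_LABEL.sub("", name, count=1)
        if stripped == name:
            # No dot left to strip, so nothing in the list matched.
            return None
        name = stripped
    return None


# "mail01.ams.org" is not listed, "ams.org" is, so this prints "ams.org".
print(match_trusted_host("mail01.ams.org", {"ams.org"}))
```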

Now we need to check that the header isn't forged, and that the IPs are actually real. So we do a DNS lookup on mail01.ams.org to get the IPs.

Code:

$ dig +short mail01.ams.org
130.44.1.106

And we see this does match the IP in the header, so we trust this Received header, and move on to the next one. Repeating this process would get us to:

So actually SA does a pretty good job of going down all the Received headers that represent internal mail transfers inside an ISP, given only the ISP's domain. So if you can get it to "trust" a user-specified domain, the user would only have to specify the ISP's domain, and not any internal email hosts, for it to work. I'd prefer it to be limited to a specified email address only, since the way forwarding works is that a user gives the forwarder a particular email address to forward to, so there is no reason to "trust" the forwarder except for the specific address that email is forwarded to.

Originally posted by robmueller: "With that definition, it doesn't really matter whether it's a specific email address or not for the hosts you "trust""

I understand that the definition of "trusted" here is just "trusted to report the correct IP address in the 'Received' header".

But then if, for instance, I am a client of inter.net.il (which I would bet several FastMail users are; it's one of the biggest ISPs in Israel), I might have them forward my email to FastMail. If I can then tell FastMail "trust inter.net.il" when parsing "Received" headers, and I get spam from a broadband subscriber of inter.net.il, sent directly to FastMail or to any forwarder I "trust", then I would also trust the "Received" line that the spammer put in, provided the spammer correctly uses the rdns of the IP address that sends the spam. The spammer can then indicate a forged source in a forged header that I "trust". For instance, in the second example that I posted above, the one with the bottom "Received" header saying: Received: from SHIVUK-NET-5 (Hosting-IGLD-192-248.inter.net.il [213.8.192.248] ...
the spammer can identify as Hosting-IGLD-192-248.inter.net.il which would pass the "trust test" and then the spammer can forge another "Received" line that lets Hosting-IGLD-192-248.inter.net.il tell us that the email originated from somewhere else. This is a side effect of the fact that anything with suffix inter.net.il would be "trusted" if inter.net.il is trusted. I don't know if limiting "trust" per address set to receive forwarded mail can solve it, but it can limit this problem to spam sent within the forwarding ISP.

"I won't add isps to trusted hosts, since they are actually indirectly the source of most spam with their users on dsl networks with compromised machines."

If you use an ISP for forwarding, then you're unfortunately out of luck at the moment with this. However you are right, you can narrow down the problem by making trust a tuple of "host/rcpt-to-address" that's trusted rather than just "host". When I add "per user trusted hosts", I'll keep it in mind...
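Just to sketch what that tuple idea could look like (purely hypothetical, nothing like this exists yet): a rule is a pair of forwarder domain and envelope recipient, and a host only counts as trusted for mail addressed to the matching recipient. The names, the rule format and the example address below are all invented for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical per-user rule: (forwarder domain, envelope rcpt-to address)
TrustRule = Tuple[str, str]


def is_trusted_for_recipient(rdns: str, rcpt_to: str, rules: Iterable[TrustRule]) -> bool:
    """True only if rdns falls under a rule's domain AND rcpt_to matches it."""
    host = rdns.lower().rstrip(".")
    rcpt = rcpt_to.lower()
    for domain, rule_rcpt in rules:
        domain = domain.lower()
        if rcpt != rule_rcpt.lower():
            continue
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Made-up rule: trust inter.net.il only for mail sent to one forwarding alias.
rules = [("inter.net.il", "fwd@alias.example.com")]
print(is_trusted_for_recipient("mail.inter.net.il", "fwd@alias.example.com", rules))    # True
print(is_trusted_for_recipient("mail.inter.net.il", "other@alias.example.com", rules))  # False
```

This doesn't stop forged headers from machines inside the forwarding ISP, but it does limit the exposure to mail sent to the one address that actually expects forwarded mail.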