How Hotmail Filters Junk Mail

Microsoft launched a new MSN Postmaster Web site on which the company released details about how it filters junk mail destined for Hotmail accounts. The company also launched a new Web site that lets companies view how much junk mail destined for Hotmail inboxes originates from their networks. If you're interested in a quick, basic description, read our news story, "Microsoft Shares Hotmail's Antispam Data." If you want more detail, read on.

First of all, Hotmail uses what Microsoft calls SmartScreen technology, which the company describes as a probability-based system. While Microsoft didn't come right out and say it, SmartScreen sounds a lot like other commonly used Bayesian filtering systems. As you might know, Bayesian filtering is a process of collecting statistical data on words commonly found in legitimate email and in junk email. After enough data is collected, the filter can become very effective at detecting junk mail by calculating a probability based on the words contained in any given email message. Microsoft relies on end users to provide feedback about the email they receive to help build a database that can be used to calculate probabilities.
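To make the Bayesian idea concrete, here is a minimal sketch of a word-based naive Bayes filter. It is not SmartScreen, whose internals Microsoft did not publish; the function names (train, junk_probability), the whitespace tokenizer, and the add-one smoothing are all assumptions chosen for illustration. The labeled messages stand in for the feedback that end users provide.

```python
import math
from collections import Counter


def train(messages):
    """Count, per class, how many messages contain each word.

    messages: iterable of (text, is_junk) pairs, e.g. user feedback.
    Returns (counts, totals), both keyed by True (junk) / False (legitimate).
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_junk in messages:
        totals[is_junk] += 1
        # Use a set so a word counts once per message, however often it repeats.
        counts[is_junk].update(set(text.lower().split()))
    return counts, totals


def junk_probability(text, counts, totals):
    """Estimate the probability that text is junk.

    Assumes at least one junk and one legitimate message were seen in training.
    """
    # Start from the prior odds of junk vs. legitimate mail, in log space.
    log_odds = math.log(totals[True] / totals[False])
    for word in set(text.lower().split()):
        # Add-one smoothing keeps unseen words from producing zero probabilities.
        p_junk = (counts[True][word] + 1) / (totals[True] + 2)
        p_good = (counts[False][word] + 1) / (totals[False] + 2)
        log_odds += math.log(p_junk / p_good)
    # Convert log odds back to a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))
```

Trained on a handful of labeled messages, the sketch scores a message made of words seen only in junk above 0.5 and one made of words seen mostly in legitimate mail below 0.5. A production filter would need far more data, better tokenization, and a tuned threshold.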

Hotmail also uses Symantec Brightmail, which relies on a "probe network" that is essentially a collection of about 200,000 email addresses. The addresses aren't used for any purpose other than to attract junk mail, so any email received in those mailboxes is most certainly junk mail. Other Hotmail filters check messages to ensure they conform to common Internet standards as defined by the relevant RFCs. Hotmail also allows users to configure their own whitelists and blacklists of email addresses.

A couple of other filters in use are Microsoft's Sender ID technology and IronPort Systems' Bonded Sender. Sender ID is basically a system that provides domain authentication to ensure that email actually came from the domain the sender claims. Bonded Sender is a program in which senders post a bond in order to be considered legitimate senders of email. You can think of the program as a verified whitelist that can be used as the basis for the assumption that email from program participants is not unwanted junk mail.
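The domain authentication step can be pictured as a lookup: does the connecting server's IP address fall within the networks that the claimed domain says may send its mail? The sketch below shows only that comparison. It is a simplification under stated assumptions, not the Sender ID specification: real deployments publish their policy in DNS, whereas here a hypothetical in-memory table (AUTHORIZED_NETWORKS) and a hypothetical function name (sender_is_authorized) stand in for it.

```python
import ipaddress

# Hypothetical stand-in for records a domain would publish in DNS:
# each domain maps to the networks allowed to send mail on its behalf.
AUTHORIZED_NETWORKS = {
    "example.com": ["192.0.2.0/24"],
}


def sender_is_authorized(claimed_domain, connecting_ip, records=AUTHORIZED_NETWORKS):
    """Return True or False if the domain published a policy, None if it did not."""
    networks = records.get(claimed_domain.lower())
    if networks is None:
        # No published policy, so the check can neither pass nor fail.
        return None
    ip = ipaddress.ip_address(connecting_ip)
    return any(ip in ipaddress.ip_network(net) for net in networks)
```

A message claiming to come from example.com passes only when it arrives from an address inside 192.0.2.0/24. A domain with no entry returns None, which a filter would presumably treat as neutral rather than as a failure.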

Microsoft also made a new service available, Smart Network Data Services, that is designed to let companies know how their networks fare in terms of mail delivery to Hotmail inboxes. For example, an ISP could request access to Smart Network Data Services and then look up its IP addresses. The results might show how much email came from those addresses as well as the results of the delivery attempts. Results could include how much of the email was considered to be junk and which IP addresses sent the junk mail. Such information could help ISPs locate computers that are part of botnets or accounts used by spammers.
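As a rough illustration of how an ISP might act on that kind of per-address data, the sketch below tallies the share of junk per sending IP address and flags the addresses above a threshold. The record format (pairs of IP address and a junk flag), the function names, and the default threshold values are hypothetical; they do not reflect the actual format or fields of Smart Network Data Services reports.

```python
from collections import defaultdict


def junk_rate_by_ip(records):
    """records: iterable of (ip, was_junk) pairs. Returns {ip: fraction judged junk}."""
    sent = defaultdict(int)
    junk = defaultdict(int)
    for ip, was_junk in records:
        sent[ip] += 1
        if was_junk:
            junk[ip] += 1
    return {ip: junk[ip] / sent[ip] for ip in sent}


def suspicious_ips(records, threshold=0.5, min_messages=3):
    """List IPs that sent at least min_messages and whose junk rate exceeds threshold."""
    records = list(records)  # allow two passes even if given a generator
    volume = defaultdict(int)
    for ip, _ in records:
        volume[ip] += 1
    rates = junk_rate_by_ip(records)
    return sorted(
        ip for ip, rate in rates.items()
        if volume[ip] >= min_messages and rate > threshold
    )
```

The minimum-volume guard is a design choice: an address that sent a single message that happened to be judged junk is weak evidence of a compromised machine, while a high junk rate across many messages is a much stronger signal.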

Another useful service that anyone can use is SenderBase, provided by IronPort Systems. SenderBase is a set of email-related data collected from over 50,000 organizations around the world. The data reveals which domains and IP addresses sent the most junk mail over a 24-hour period. A quick look at the data on May 26 showed that the most heavily used networks for sending spam were Comcast Cable, whose users sent 403.2 million messages, and Yahoo!, whose users sent 304.7 million messages. Hotmail was in sixth position, with some 109.5 million junk messages sent during the same period.