The Reverend vs Spam

What does spam have to do with a Presbyterian minister who died in 1761? That minister, Thomas Bayes, was also a statistician, and he formulated the theorem that bears his name: Bayes’ Theorem. It lets us adjust the probability of an event when we get new information related to that event. For email, that means we can update the probability that a message is spam after looking at its contents.

Without Bayes’ Theorem we could look at our inbox and see that on an average day we get a certain amount of spam. The probability that any given email is spam would be the number of spam emails divided by the total number of emails. For example, if I received 100 emails today and 70 of them were spam, then the chance that any random email is spam is 70/100 = 70%. But without more information we cannot say that one email is more likely to be spam than another. The probability is the same for all of them, so it does not help us sort them.

But what if we have some new information? What if we know that spam emails often have the word FREE in them? A Bayesian spam filter looks through the text of each email. If it finds the word FREE, then by Bayes’ Theorem the probability that the email is spam goes up.

Spam filters are all a little bit different and look at a variety of things to help them make predictions. Most have a large list of words, each with its own weight, that they look for. Other signals might be that the email has a lot of HTML tags, that the image to text ratio is high, or that the subject is in ALL CAPS. The filter is also constantly learning, adjusting the weights based on what users have marked as spam.

In the end the filter gives each email a probability. If that probability is above the filter’s threshold, the email is automatically marked as spam and put in your spam folder, and you do not have to deal with it.
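To make the idea concrete, here is a minimal Python sketch of a word-based filter. It is not how any particular spam filter works. The word list, the likelihood numbers, and the 0.9 threshold are all made up for illustration, and only the 70% starting probability comes from the inbox example above. It also treats words as independent and ignores words that are absent, which real filters may handle differently.

```python
import re
from math import exp, log

# Starting probability: 70 of 100 emails were spam (the inbox example).
PRIOR_SPAM = 0.70

# Hypothetical values: word -> (P(word | spam), P(word | not spam)).
WORD_LIKELIHOODS = {
    "free": (0.60, 0.05),
    "winner": (0.30, 0.01),
    "meeting": (0.02, 0.30),
}


def spam_probability(text, prior=PRIOR_SPAM, likelihoods=WORD_LIKELIHOODS):
    """Update the prior with every listed word found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Bayes' Theorem in odds form: posterior odds = prior odds * likelihood ratio.
    log_odds = log(prior / (1 - prior))
    for word, (p_given_spam, p_given_ham) in likelihoods.items():
        if word in words:
            log_odds += log(p_given_spam / p_given_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + exp(-log_odds))


def is_spam(text, threshold=0.9):
    """Mark the email as spam if its probability is above the threshold."""
    return spam_probability(text) > threshold


print(round(spam_probability("Get your FREE prize"), 3))  # 0.966
print(round(spam_probability("hello there"), 3))          # 0.7 (no listed words)
```

With these made-up numbers, seeing FREE moves an email from 70% to roughly 97% likely to be spam, while a word that is more common in legitimate mail, such as "meeting", pulls the probability down.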

Thanks Reverend!

If you are interested in two slightly more detailed examples of Bayes’ Theorem, keep reading.

WARNING: MATH BELOW
vvvvvvvvvvvvvvvvvvvvvvvvv

The basic equation is:

P(A|B) = [P(B|A) * P(A)] / P(B)

P(A) and P(B) are the probabilities of A and B by themselves.
P(B|A) is the probability of B if we know A.
P(A|B) is the probability of A if we know B, which is what we are trying to determine.
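
As a small illustration, the equation can be written as a one-line Python function. The function name `bayes` and the numbers in the usage line are hypothetical, chosen only to show the arithmetic. Exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_b_given_a * p_a / p_b


# Made-up inputs: P(B|A) = 1/2, P(A) = 1/5, P(B) = 1/4.
print(bayes(Fraction(1, 2), Fraction(1, 5), Fraction(1, 4)))  # 2/5
```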

An intuitive example: if you pull a card from a deck of cards, what is the probability that it is a King? We know there are 4 Kings in a deck of 52 cards, so the probability that we drew a King, P(King), is 4/52 or 1/13. Now what if we had some new information about the card? What if we knew that the card we drew was a face card? Now we can use Bayes’ Theorem.

We already know that P(King) = 1/13. There are 12 face cards in a deck of 52 cards, which means that P(Face) = 12/52 or 3/13. The last part is P(Face|King). We know that if we drew a King then it is definitely a Face card, so P(Face|King) = 1. Plugging these parts into Bayes’ Theorem gives us:

P(King|Face) = (1 * 1/13) / (3/13) = 1/13 * 13/3 = 1/3

Having the extra information about the card changes the probability that it is a King.
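
One way to check the card example is to build the deck in Python and count. This sketch assumes a standard 52-card deck with Jack, Queen and King as the face cards. The variable names are only illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
FACE_RANKS = {"J", "Q", "K"}

deck = list(product(RANKS, SUITS))                    # 52 (rank, suit) pairs
kings = [card for card in deck if card[0] == "K"]
faces = [card for card in deck if card[0] in FACE_RANKS]

p_king = Fraction(len(kings), len(deck))              # 4/52 = 1/13
p_face = Fraction(len(faces), len(deck))              # 12/52 = 3/13
p_face_given_king = Fraction(1)                       # every King is a face card

# Bayes' Theorem: P(King|Face) = P(Face|King) * P(King) / P(Face)
via_bayes = p_face_given_king * p_king / p_face

# Direct count: the share of face cards that are Kings.
via_counting = Fraction(len(kings), len(faces))

print(via_bayes, via_counting)  # 1/3 1/3
```

Both routes give 1/3, which is what we would expect: 4 of the 12 face cards are Kings.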

A counterintuitive example is testing for a disease. Suppose we know that 1% of people have a disease and that the test for it is 90% accurate. If you have the test done and it comes back positive, what is your chance of having the disease? The intuitive, but wrong, answer is that you have a 90% chance of having the disease. Let’s use Bayes’ Theorem again.

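The tricky part of this one is P(Positive Test), the overall chance of getting a positive result. Of the 1% of people who have the disease, the test will correctly give a positive result 90% of the time. But of the 99% who do not have the disease, it will incorrectly give a positive result 10% of the time. P(Positive Test) is the sum of those two cases, so the resulting formula is:

P(Disease|Positive Test) = (0.90 * 0.01) / (0.90 * 0.01 + 0.10 * 0.99) = 0.009 / 0.108 = 0.0833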

That means that instead of a 90% chance of having the disease, you really only have an 8.33% chance of having it. In this case, taking into account how rare the disease is, and not just the accuracy of the test, makes a very big difference in how the test results should be interpreted.
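
The same calculation as a short Python sketch. It assumes, as the example does, that "90% accurate" means the test catches 90% of people who have the disease and wrongly flags 10% of people who do not. The function and parameter names are illustrative.

```python
from fractions import Fraction


def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(Disease | Positive Test) by Bayes' Theorem."""
    true_positives = sensitivity * prevalence                  # sick and positive
    false_positives = false_positive_rate * (1 - prevalence)   # healthy but positive
    # P(Positive Test) is the sum of both ways to get a positive result.
    return true_positives / (true_positives + false_positives)


result = p_disease_given_positive(
    prevalence=Fraction(1, 100),
    sensitivity=Fraction(9, 10),
    false_positive_rate=Fraction(1, 10),
)
print(result)                          # 1/12
print(f"{float(result) * 100:.2f}%")   # 8.33%
```

It can help to think in head counts. Out of 10,000 people, 100 have the disease and 90 of them test positive. Of the 9,900 who do not have it, 990 also test positive. So only 90 of the 1,080 positive results, or 1 in 12, belong to people who are actually sick.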