Monday, June 24, 2013

Let's say you go to your favorite local bar, and do what most people do in bars: propose marriage to anyone of the opposite sex (that you find attractive enough, whatever that means to you).

When everything's said and done, there's only two ways it can go: Yes (success) or No (failure).
Furthermore, let us, for the sake of argument, assume that you only approach those you believe will say yes (perhaps to save yourself some embarrassment, although the reason doesn't matter).

Let's also say you'd like to keep track of your success rate, so you count the number of people you've approached, as well as the number of "Yes" you got (you don't stop as soon as you get a Yes, perhaps because you want the best Yes possible).

The hit rate is the proportion of success you've had:

Hit Rate = (Number of Yes) / (Number of Yes + Number of No) = (Number of Yes) / (Number of People Asked)

This seems like a good measure for success, except you've forgot something crucial: there's two ways things can go wrong:

You asked someone and got a No (that's the above situation).

You didn't asked someone that would have said Yes.

When you ask someone and they say Yes, that's called a true positive. When you ask someone and they say No, it's a false positive: positive because you thought they would say Yes, and false because you were wrong.

When you don't ask someone (because you think they will say No), and they would have said no, it's called a true negative. If, however, you never asked but they would have said Yes, that's a missed opportunity, a false negative: negative because you thought they would say No, false because you were wrong.

It may seem strange to talk about what would happen if we did something we never did, but the context in which this is performed in science is usually an instrument that is supposed to detect something. For example, if we do a test for a certain disease, a "positive" means that the diseases is there, and a "negative" means that the disease is not there (which to call "positive" and "negative" is largely a matter of definition, it is just a binary classification). We could thus verify the correctness of the "guesses" by using samples for which we know the true state (or by repeating the experiment multiple times, assuming we know the probability of success is more than 50%).

If the experiment succeeds it's a "true", otherwise "false". We can visualize the four options: The instrument (or human) "guesses", and the classification happens according to that guess and the real world state.

The danger in situations like the marriage proposals is that we don't observe the Outcome of the Predicted No, since we never experience that alternate reality. Thus the True Negatives and False Negatives are hidden from view, and we can't measure the proportion of missed opportunities:

This measure is interesting in our bar example but usually not used in other contexts. In general, a table of confusion can be used to list all prediction/outcomes. What should be clear is that the Hit Rate is not necessarily the best measure of success in general, as it doesn't account for False Positives.

Likewise, an instrument that is good at detecting a disease has few False Negatives, but what about the amount of people that are healthy but the instrument incorrectly classifies as having the disease (False Positives)? This is usually less of a priority in this case, since it is seems less bad to classify healthy people as sick than sick people as healthy.

Whether it is the False Positives or False Negatives we would like to detect best depends on how we describe the problem: in this case we call healthy people "negatives" and sick people "positives" (it thus has nothing to do with having a "positive" or "negative" outlook on things!).

It is usually a matter of prioritizing: if the machine can aggressively detect traces of a disease, it may also incorrectly find patterns that are in fact not due to the disease. Put differently, the machine is sensitive, meaning that has a high hit rate.

The "opposite" of sensitivity (hit rate) is specificity: how many healthy people that are classified as being healthy.