Monday, April 21, 2014

A common conditional probability puzzle -- Suppose there's a test for HIV (or another virus). If you carry the virus, there's a 99% chance the test will correctly identify it, with a 1% chance of a false negative (FN). If you aren't a carrier, there's a 95% chance the test will come up clear, with a 5% chance of a false positive (FP). To my horror, my result comes back positive. Many would immediately assume there's a 99% chance I'm infected. That intuition is, as in many probability puzzles, incorrect.

In short, Pr(IsCarrier | Positive result) depends on the prevalence of HIV.

Suppose that, in a population of 100 million people, the prevalence of HIV is X (a fraction between 0 and 1). This X is related to what I call the "pool distribution", a fixed, fundamental property of the population, to be estimated.
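To make the dependence on X concrete, here is Bayes' rule written out with the test's numbers from above (99% detection rate, 5% false positive rate). The labels Carrier, NotCarrier and Positive are just my shorthand for the events.

```latex
\Pr(\text{Carrier} \mid \text{Positive})
  = \frac{\Pr(\text{Positive} \mid \text{Carrier})\, X}
         {\Pr(\text{Positive} \mid \text{Carrier})\, X
          + \Pr(\text{Positive} \mid \text{NotCarrier})\, (1 - X)}
  = \frac{0.99\, X}{0.99\, X + 0.05\, (1 - X)}
```

The 99% figure only comes out as the answer if the second term in the denominator is negligible, which requires X to be close to 1.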

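The notation is non-intuitive. I feel a more intuitive perspective is "Does TruePositive dominate FalsePositive or vice versa?" Among all the people who test positive, the true positives come from the carriers (99% of them) and the false positives come from the non-carriers (5% of them). As explained in [[HowToBuildABrain]], if X is very low, then FalsePositive dominates TruePositive, so most of the positive results are false positives.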
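The TruePositive vs FalsePositive comparison can be sketched as a head count over the 100 million people. This is only a rough illustration: the function name `count_positives` and the sample prevalence values are my own assumptions, not real HIV statistics; only the 99% and 5% rates come from the puzzle.

```python
def count_positives(prevalence, detection_rate=0.99, false_positive_rate=0.05,
                    population=100_000_000):
    """Return (true_positives, false_positives, Pr(carrier | positive)).

    prevalence is X, the fraction of the population carrying the virus.
    """
    carriers = population * prevalence
    non_carriers = population - carriers
    true_positives = carriers * detection_rate             # carriers flagged correctly
    false_positives = non_carriers * false_positive_rate   # non-carriers flagged wrongly
    posterior = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, posterior


if __name__ == "__main__":
    # Hypothetical prevalence values, chosen only to show the trend.
    for x in (0.001, 0.01, 0.05, 0.5):
        tp, fp, posterior = count_positives(x)
        dominant = "TruePositive" if tp > fp else "FalsePositive"
        print(f"X={x:<6} TP={tp:>12,.0f} FP={fp:>12,.0f} "
              f"Pr(carrier|positive)={posterior:.3f}  dominant: {dominant}")
```

With X = 0.01, for instance, there are 990,000 true positives against 4,950,000 false positives, so a positive result means only about a 1 in 6 chance of being a carrier.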