Saturday, February 10, 2018

Probabilities of one-time events don't have error margins

Quanta Magazine's puzzle columnist Pradeep Mutalik wrote an amusing and sensible piece, "When Probability Meets Real Life", with three probability puzzles. They're a bit more ordinary and less controversial than the Sleeping Beauty problem or the Monty Hall problem, but they touch on some general principles, too.

He says that scientifically inclined people often try to apply probabilistic reasoning in their everyday lives. It's not perfect, but it may be helpful.

In the first problem, Mutalik shows that Bayesian, i.e. perceived, probabilities often change as new evidence arrives. Someone has sadly fallen out of an airplane. The probability of death is 90%. Fortunately, he had a parachute: 5%. Sadly, it didn't open: 99.9%. Happily, there was a haystack directly below him: 40%. But there was also a pitchfork there: 99.99%. Happily, he avoided the pitchfork: 40%. But he avoided the haystack, too: 99.999%. ;-) You may give better numbers.
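To make the updating in this story concrete, here is a minimal Python sketch, assuming Bayes' theorem in odds form (posterior odds = prior odds × Bayes factor). It computes the Bayes factor that each twist would have to carry to move one of the playful estimates above to the next one. The names `odds` and `implied_bayes_factors` are hypothetical, and the probabilities are just the joke's numbers, not data.

```python
# Sketch: express each probability in the falling-man story as odds and compute
# the Bayes factor (likelihood ratio) each new piece of evidence would need in
# order to move the previous estimate to the next one.
# The probabilities are the playful numbers from the story, not real data.

def odds(p: float) -> float:
    """Convert a probability into odds in favor, p / (1 - p)."""
    return p / (1.0 - p)

# (piece of evidence, estimated probability of death after learning it)
story = [
    ("fell out of an airplane", 0.90),
    ("had a parachute", 0.05),
    ("parachute didn't open", 0.999),
    ("haystack below", 0.40),
    ("a pitchfork there, too", 0.9999),
    ("avoided the pitchfork", 0.40),
    ("avoided the haystack, too", 0.99999),
]

def implied_bayes_factors(steps):
    """Return (evidence, Bayes factor) pairs for every update after the first."""
    factors = []
    for (_, p_prev), (label, p_next) in zip(steps, steps[1:]):
        # Posterior odds = prior odds * Bayes factor, so solve for the factor.
        factors.append((label, odds(p_next) / odds(p_prev)))
    return factors

for label, factor in implied_bayes_factors(story):
    print(f"{label:28s} Bayes factor ~ {factor:.3g}")
```

Factors far above 1 push toward death, factors far below 1 push away from it; multiplying the initial odds by all of them recovers the final 99.999%.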

In Bayesian inference, updating is the key process, as he correctly writes. He also says:

If you always rely on the most reliable and objective “data-driven” probability estimates, keeping track of possible uncertainties, the final probability number you arrive at will be the best possible.

The latest probability estimate is the best one. However, we should add: so far. If the event has already taken place, the truth value of the proposition has been decided, and you have learned the outcome, then the only correct probabilities are 0% and 100%. Hard evidence trumps any vague previous calculations! On the other hand, if the outcome hasn't been decided yet, no probability estimate can be considered final. Another piece of evidence may always arrive in the future!

When you consider the pitchfork beneath the guy who fell out of the airplane, you are dealing with ad hoc, surprising twists. In such contexts, "keeping track of possible uncertainties" is very hard and sometimes impossible. One should still try to know or imagine as many possible twists, such as the pitchforks, as one can, but some twists will remain unpredictable surprises.

But even when the new evidence is "less surprising" in character than pitchforks, one can't really know in advance how much it will change the Bayesian probability. A sufficient amount of clear evidence may turn a near-0% probability into a near-100% one or vice versa.
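A minimal sketch of how clear evidence can flip a near-0% probability into a near-100% one, assuming (hypothetically) independent pieces of evidence that each multiply the odds by the same likelihood ratio. The helpers `update` and `updates_needed` are illustrative names, not anything from Mutalik's column.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """One Bayesian update in odds form: posterior odds = prior odds * LR."""
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

def updates_needed(prior: float, target: float, likelihood_ratio: float) -> int:
    """Number of independent pieces of evidence, each with the same likelihood
    ratio, needed to push the probability from `prior` to at least `target`."""
    if not (0.0 < prior < 1.0 and 0.0 < target < 1.0):
        raise ValueError("prior and target must lie strictly between 0 and 1")
    if likelihood_ratio <= 1.0:
        # Such evidence never raises the probability, so the loop would not end.
        raise ValueError("likelihood ratio must exceed 1 to raise the probability")
    p, n = prior, 0
    while p < target:
        p = update(p, likelihood_ratio)
        n += 1
    return n

print(updates_needed(0.01, 0.99, 10.0))
print(updates_needed(0.01, 0.99, 2.0))
```

With a likelihood ratio of 10 per piece, four pieces move 1% past 99%; with a weaker ratio of 2, it takes fourteen. A ratio below 1 would push the probability toward 0% instead, which is why the helper refuses it rather than looping forever.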

In particle physics and other parts of science, experimenters may measure a probability, e.g. the probability that a Higgs boson produced at the LHC decays to two photons, and this probability may be quoted with an error margin. But that's because they are really measuring a function of the parameters in the laws of physics, and the event is almost perfectly repeatable. We have controlled experiments that may be assumed to start from the same initial conditions every time.

So in this case, the laws of physics imply a particular value of the probability that may in principle be measured – if you have sufficiently precise apparatuses and run the experiment very many (ideally "infinitely many") times. The two conditions are needed to suppress the systematic and statistical errors, respectively.
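The statistical half of that statement can be illustrated with a toy simulation: a repeatable experiment whose outcome has a fixed probability, measured as a frequency f with the usual binomial standard error sqrt(f(1 - f)/N). The value `P_TRUE = 0.002` is an arbitrary illustrative number, not the measured Higgs diphoton branching ratio, and the simulation assumes perfectly identical, independent runs.

```python
import math
import random

def estimate_probability(p_true: float, n_trials: int, rng: random.Random):
    """Simulate n_trials identical, independent runs of an idealized experiment
    in which an outcome occurs with fixed probability p_true. Return the
    measured frequency and its binomial standard error sqrt(f(1-f)/n)."""
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_true)
    freq = hits / n_trials
    std_err = math.sqrt(freq * (1.0 - freq) / n_trials)
    return freq, std_err

rng = random.Random(2018)  # fixed seed so the run is reproducible
P_TRUE = 0.002  # illustrative value only, not a measured branching ratio
for n in (10_000, 1_000_000):
    f, se = estimate_probability(P_TRUE, n, rng)
    print(f"N = {n:>9,}: measured {f:.5f} +/- {se:.5f}")
```

Going from N = 10,000 to N = 1,000,000 shrinks the standard error by a factor of about 10, i.e. as 1/sqrt(N). The sketch has no systematic error at all; a miscalibrated apparatus would bias every run in the same way, and no number of repetitions would remove that bias.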

But consider uncontrolled experiments or one-time events where you don't know everything about the initial conditions or the environment (which is also an assumption, and therefore a kind of initial or boundary data), such as the particular man who fell out of the airplane. There it is not possible to calculate the precise probability from the laws of physics, because the task isn't precisely well-defined: you are ignorant of the initial conditions. So a precise number quantifying the probability doesn't really exist. You just can't find a precise answer to an imprecise question! Your Bayesian probabilities are subjective estimates of a probability that can't be calculated precisely, not even in principle!

So we often rightfully demand error margins accompanying measurements and estimates because that's how science works. But I believe that Bayesian probabilities of uncontrollable events – in which the initial or boundary conditions, i.e. the definition of the problem, aren't perfectly specified – shouldn't be required to have error margins because they can't have a particular one. Well, maybe I should weaken the statement a bit. We should still try to quantify how likely it is that surprises will change the probability, and by how much. But we should acknowledge that there's no canonical and precise way of doing so – there's no canonical quantitative method to deal with a game whose rules are unknown.

Car or airplane

In the second problem, Mutalik asks you to assume that car crashes and airplane crashes are the only causes of death; without them, you would live for millions of years. Now, what is your life expectancy if you travel 10,000 miles a year by car or by airplane? The death rate is 0.2 per 10 billion flight miles for airplanes and 150 per 10 billion vehicle miles for cars. In this counting, the airplane is 750 times safer. (I think that this huge ratio is misleading because the rates are given per vehicle-mile and per aircraft-mile, and many more people die when an average airplane crashes. So the airplane is not this much safer. But I will ignore that and assume everyone drives or pilots his own car or airplane.)

Because the risk per mile is constant, the probability of survival decreases exponentially with time. If you invert the numbers above, you see that the life expectancy is 10/0.2 = 50 billion miles and 10/150 = 0.067 billion miles, respectively. Divide these by 10,000 miles per year and you get the life expectancy of the traveling man: 5 million years by airplane and 6,700 years by car. So the probability of death within 1 million years is 1 - exp(-0.2), about 18%, with the airplane choice, but the probability of surviving 1 million years in cars is exp(-1 million / 6,700) = exp(-150). It's negligible. You almost certainly die within a period of time that is not much longer than those 6,700 years – a lifetime as long as one million years is too much to dream about if you use cars.
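The arithmetic of this paragraph in a short Python sketch, assuming a constant death rate per mile as in the puzzle, so that the waiting time until death is exponentially distributed and the expectancy is the inverse of the annual death rate. The constant and function names are illustrative choices, not from the column.

```python
import math

MILES_PER_YEAR = 10_000
DEATHS_PER_10_BILLION_MILES = {"airplane": 0.2, "car": 150.0}

def life_expectancy_years(deaths_per_10bn_miles: float) -> float:
    """Mean time to death when the only risk is a constant per-mile death rate
    (an exponential waiting time): miles to death divided by miles per year."""
    miles_to_death = 10e9 / deaths_per_10bn_miles
    return miles_to_death / MILES_PER_YEAR

def survival_probability(deaths_per_10bn_miles: float, years: float) -> float:
    """Probability of surviving `years` years: exp(-years / expectancy)."""
    return math.exp(-years / life_expectancy_years(deaths_per_10bn_miles))

for mode, rate in DEATHS_PER_10_BILLION_MILES.items():
    tau = life_expectancy_years(rate)
    p_survive = survival_probability(rate, 1e6)
    print(f"{mode:8s} expectancy {tau:,.0f} years, "
          f"P(survive 1 million years) = {p_survive:.3g}")
```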

Ethnicity of a sample: trust population ratios or unreliable tests?

In the third problem, variation A, Mutalik tells you about a country with 80% French and 20% Arabs (Mutalik calls them ethnic groups "one" and "two", so we had to solve this sub-puzzle first) – the country is probably known as France (they love to oversimplify a lot, except for the -aioux at the end of the words). Both ethnic groups have the same rate of a rare disease. 80 French and 20 Arabs are sampled. One sample is found to be positive (the person is ill), but the ethnicity has to be kept secret. Someone privately runs an ethnicity test on this sample; the test is 75% reliable, and it says that the person is Arab. What is the probability that the ill person is actually Arab?

Before the ethnicity test, the random ill person was 80% likely to be French and 20% likely to be Arab, i.e. 4-to-1 odds in favor of French. The ethnicity test increases the probability that the person is Arab. If I interpret the "success rate" correctly, it says that if the actual ethnicity is French, there is a 75% chance that the test will say French and a 25% chance that it will say Arab, and vice versa for an Arab.

We know that the disease is spread uniformly, so the probability that the only sufferer was French was 80%. Inside this 80%, the joint probabilities (3/4 and 1/4 of it) are 60% that he's French and the test says French, and 20% that he's French and the test says Arab. Inside the remaining 20% where he's Arab, the joint probabilities that the test says Arab or French are 15% and 5%, respectively.

The test said he was Arab, which means that we're either in the 20% group (French, test says Arab) or in the 15% group (Arab, test says Arab) described in the previous paragraph. So this 20% + 15% = 35% piece of the pie becomes our whole pie. The conditional probability that the person is Arab, given the Arab result of the unreliable test, is therefore 15/35 = 3/7, or about 43%. Did I get it correctly? So the answer is below 50%. I think the problem was designed to teach the lesson that "you shouldn't trust unreliable tests too much", because the answer obtained by trusting the test is less than 50% likely to be correct here.
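The same pie-slicing in code: a minimal Python sketch using exact fractions, assuming, as above, that the test's 75% reliability is the same for both groups. The function name is hypothetical; plugging in variation A's numbers reproduces 3/7.

```python
from fractions import Fraction

def posterior_arab_given_test_says_arab(prior_arab: Fraction,
                                        reliability: Fraction) -> Fraction:
    """Bayes' theorem for the ethnicity puzzle. `reliability` is the probability
    that the test reports the true ethnicity, assumed equal for both groups."""
    prior_french = 1 - prior_arab
    # Joint probabilities of (true ethnicity, test says "Arab").
    arab_and_says_arab = prior_arab * reliability            # variation A: 1/5 * 3/4 = 15%
    french_and_says_arab = prior_french * (1 - reliability)   # variation A: 4/5 * 1/4 = 20%
    # Condition on the observed result by renormalizing the "says Arab" slice.
    return arab_and_says_arab / (arab_and_says_arab + french_and_says_arab)

p = posterior_arab_given_test_says_arab(Fraction(20, 100), Fraction(3, 4))
print(p, float(p))
```

Setting the reliability to 1/2 (a coin flip) returns the prior unchanged, as it should for a test that carries no information.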

Variation B involves the France of 2040. Yes, French and Arabs each make up 50% of the population. But you still have 80 French samples and 20 Arab samples, one positive result, and again an unreliable 75% test that says "Arab". It's a very similar problem, and the solution could be exactly the same. Is it the same? I leave it to you.