probability

I understand inferential statistics reasonably well but I’m a rookie at Bayesian statistics. What’s the difference? Well, I’m glad you asked.

Inferential statistics allow us to infer some conclusion. Let’s say we want to study a hypothesis that Scientific American put forth last year: Men who do more housework have less sex. That’s an interesting thought and one that men – especially married men – might want to investigate.

Where do we start? First, we need to define the population. Let’s say we’re only interested in men in America. So, the hypothesis becomes:

Men in America who do more housework have less sex.

Now, all we have to do is interview all the men in America about the chores they do and how often they have sex. There are roughly 147 million men in America so this is going to take some time. Indeed, it will take so long that our attitudes toward sex and housework might change in the interim. The study would be useless.

We need a faster way. We might take a random sample of men and interview them. We gather the data and find that the hypothesis is true in the sample. The men we interviewed who did more chores also had less sex.

Now we have to ask ourselves, what’s the probability that the sample accurately represents the population? We found an inverse correlation (more housework, less sex) in the sample. Can we infer that this is also true in the population?

To do this, we need to play with probabilities. We randomly selected the men in the sample, which means the sample probably represents the population (but maybe not). Generally speaking, larger random samples are more likely to accurately represent the population.
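To see why larger random samples are more trustworthy, here's a minimal simulation sketch. Everything in it is an assumption for illustration: the made-up population of 20,000 men, the weak built-in link between housework and sex, and the sample sizes of 30 and 1,000. It draws many random samples of each size and measures how much the sample correlation bounces around.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def make_population(size, rng):
    """Hypothetical population: more housework, slightly less sex, plus noise."""
    population = []
    for _ in range(size):
        housework = rng.uniform(0, 20)               # hours per week (made up)
        sex = 6 - 0.1 * housework + rng.gauss(0, 2)  # times per month (made up)
        population.append((housework, sex))
    return population

def spread_of_sample_correlations(population, sample_size, trials, rng):
    """Standard deviation of the sample correlation over repeated random samples."""
    results = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)
        xs, ys = zip(*sample)
        results.append(pearson(xs, ys))
    return statistics.stdev(results)

rng = random.Random(42)  # fixed seed so the run is repeatable
population = make_population(20_000, rng)
population_r = pearson(*zip(*population))

spread_small = spread_of_sample_correlations(population, 30, 200, rng)
spread_large = spread_of_sample_correlations(population, 1000, 200, rng)

print(f"population correlation: {population_r:.2f}")
print(f"spread with samples of 30:   {spread_small:.3f}")
print(f"spread with samples of 1000: {spread_large:.3f}")
```

The smaller the spread, the closer a typical sample lands to the true population value. Small samples can easily miss the correlation or exaggerate it.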

We can also calculate how likely it is that a correlation this strong would turn up in a random sample purely by chance, if no such correlation existed in the population. By general agreement, if that probability is less than 5%, we declare that the finding is statistically significant. In other words, we judge that the finding is probably real and not just the luck of the draw in the way the sample fell out. We infer that it exists in the population as well as the sample.
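One common way to put a number on this is a permutation test. The sketch below is not the method used in the Scientific American study, and the ten data points are invented for illustration. It shuffles the sex figures at random, which destroys any real link to housework, and counts how often chance alone produces a correlation at least as strong as the one we observed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, shuffles=5000, seed=0):
    """Share of random shuffles whose correlation is at least as strong as observed."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)  # copy so the caller's data is left untouched
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    return extreme / shuffles

# Invented sample of ten men (hypothetical numbers, not real data).
housework_hours = [2, 4, 5, 7, 8, 10, 12, 14, 15, 18]
sex_per_month = [7, 6, 6, 5, 6, 4, 5, 3, 4, 2]

r = pearson(housework_hours, sex_per_month)
p = permutation_p_value(housework_hours, sex_per_month)
print(f"sample correlation: {r:.2f}")
print(f"share of shuffles at least this strong: {p:.4f}")
```

If that share comes out below 0.05, the finding clears the usual bar for statistical significance.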

Two things to note here:

Statistical significance has to do with probability, not size. It’s not the same as saying, “Tom is significantly smarter than Joe”. A statistically significant difference may be quite small.

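A significant finding can still be flat out wrong. With a five per cent threshold, about one study in twenty will turn up a “significant” correlation even when none exists in the population. Yikes! The five per cent threshold is a convention, and it’s especially common in the social sciences. Where the stakes are higher, as in some medical research, researchers may use a one per cent or one-tenth of one per cent threshold to declare statistical significance.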
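To see where the one-in-twenty figure comes from, here's a rough simulation sketch. It runs thousands of imaginary studies on a population where housework and sex are completely unrelated, then counts how many of them would still be declared significant. The sample size of 30 and the cutoff of 0.361 (the standard table value for significance at the five per cent level with 30 people) are my assumptions for the example.

```python
import random

# Assumption: smallest |r| that counts as significant at the 5% level
# (two-sided) for a sample of 30, taken from a standard critical-value table.
CRITICAL_R_N30 = 0.361

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def false_positive_rate(studies=4000, sample_size=30, seed=1):
    """Share of no-effect studies that still cross the significance cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        # Two independent columns of numbers: no real relationship at all.
        housework = [rng.gauss(0, 1) for _ in range(sample_size)]
        sex = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson(housework, sex)) > CRITICAL_R_N30:
            hits += 1
    return hits / studies

rate = false_positive_rate()
print(f"{rate:.1%} of the no-effect studies looked significant")
```

The rate should land near five per cent. Tighten the cutoff and the false alarms become rarer, at the price of missing more real effects.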

Note that our finding – which rests on what’s called a frequentist probability – represents a point in time. It’s essentially a snapshot. We noted that, if our study takes too long, conditions might change and invalidate the study. Indeed, the Scientific American study cites data collected from 1992 to 1994. Perhaps conditions have changed since then. So how accurate is the finding today?

That’s where Bayesian statistics come into play. They allow us to add in new information as it becomes available. Let’s say, for instance, that women’s attitudes toward men who do housework evolve over time. We could factor in the new information and recalculate the probabilities.
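Here's a toy sketch of what "factoring in new information" can look like, using one of the simplest Bayesian setups (a Beta belief about a proportion, updated by survey counts). The survey numbers are invented, and the quantity being estimated, the share of women who hold some particular attitude toward men who do housework, is just a stand-in.

```python
def update(belief, yes, no):
    """Add new survey counts to a Beta(alpha, beta) belief and return the result."""
    alpha, beta = belief
    return (alpha + yes, beta + no)

def expected_share(belief):
    """Mean of a Beta(alpha, beta) distribution: our best single guess."""
    alpha, beta = belief
    return alpha / (alpha + beta)

# Start with no opinion either way: Beta(1, 1) treats every share as equally likely.
prior = (1, 1)

# Hypothetical older survey: 30 of 50 women hold the attitude.
after_old_survey = update(prior, yes=30, no=20)

# Hypothetical newer survey: only 20 of 60 do. Fold it in without starting over.
after_new_survey = update(after_old_survey, yes=20, no=40)

print(f"estimate after the old survey: {expected_share(after_old_survey):.3f}")
print(f"estimate after the new survey: {expected_share(after_new_survey):.3f}")
```

Each update starts from the previous belief, so the estimate shifts as evidence accumulates. This toy case is just addition; realistic models are where the heavy computing comes in.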

Bayesian statistics are complicated and hard to compute. We’ve only been using them widely in the recent past as more powerful computers have become available. Still, they can help us work out complex problems that used to be way beyond our capabilities.

I’ll write more about Bayesian probabilities as I learn more. In the meantime, I’m going to sell the vacuum cleaner.

Last week a man was swallowed by a sinkhole while sleeping in Florida. This week, I’m more worried about sinkholes in Florida than I am about driving on icy roads in Colorado. Is that logical?

It’s not logical but it’s very real. Sometimes a story is so vivid, so unexpected, so emotionally fraught, and so available that it dominates our thinking. Even though it’s extremely unlikely, it becomes possible, maybe even probable in our imaginations. As Daniel Kahneman points out, “The world in our heads is not a precise replica of reality.”

What makes a phenomenon more real in our heads than it is in reality? Several things. It’s vivid — it creates a very clear image in our mind. It’s creepy — the vivid image is unpleasant and scary. It’s a “bad” death as opposed to a “good” death. We read about deaths every day. When we read about a kindly old nonagenarian dying peacefully after a lifetime of doing good works, it seems natural and honorable. It’s a good death. When we read about someone killed in the prime of life in bizarre or malevolent circumstances, it’s a “bad” death. A bad death is much more vivid than a good death.

But what really makes an image dominate our minds is availability. How easy is it to bring an instance to mind? If the thought is readily available to us, we deem it to be likely. What’s readily available? Anything that’s in the popular media and the topic of discussion with friends and colleagues. If your colleagues around the water cooler say, “Hey, did you hear about the guy in the sinkhole?” you’ve already begun to blow it out of proportion.

Availability can also compound itself in what Kahneman calls an availability cascade. The story itself becomes the story. Suppose that a suspicious compound – let’s call it chenesium – is found in the food supply. Someone writes that chenesium causes cancer in rats when administered in huge doses. Plus, it’s a vividly scary form of cancer – it affects the eyeballs and makes you look like a zombie. People start writing letters to the editor about the grave danger. Now it’s in the media. People start marching on state capitals, demanding action. The media write about the marching. People read about the marching and assume that where there’s smoke, there’s fire. The Surgeon General issues a statement saying the danger is minimal. But the populace – now worked into a frenzy – denounces her as a lackey of chenesium producers. Note that the media are no longer writing about chenesium. Rather, they’re writing about the controversy surrounding chenesium. The story keeps growing because it’s a good story. It’s a perfect storm.

So, what to do? Unfortunately, facts don’t matter a whole lot by this point. As Kahneman notes (quoting Jonathan Haidt), “The emotional tail wags the rational dog.” The only thing to do is to let it play out … sooner or later, another controversy will arise to take chenesium’s place.

At the personal level, we can spare ourselves a lot of worry by pondering the availability bias and remembering that facts do matter. We can look up the probability of succumbing to a sinkhole. If we do, we’ll realize that the danger is vanishingly small. There’s nothing to worry about. Still, I’m not going to Florida anytime soon.