Sunday, January 30, 2011

Consider the following sentence: "You ate the blueberries because your fingers are stained." What is odd about it is that ordinarily, when we say "X because Y", we mean "Y is the cause of X". For example, "The window broke because the baseball hit it" means that the baseball hitting the window caused it to break. But in this case, the sentence surely doesn't mean that your fingers being stained caused you to eat the blueberries. Now one might object that it's a weird sentence, and that instead it should be "I believe you ate the blueberries because your fingers are stained." But the original version is not confusing to an English speaker, and people sometimes do speak this way. Language is a complicated business. And language about causality is particularly tricky.

It is well known that correlation does not imply causation. But when scientific studies are reported in the media, this dictum is often forgotten. Professor Jon Mueller at North Central College in Naperville, Illinois, has compiled a great set of links to news articles reporting scientific findings. Some of the headlines for these articles suggest causal relationships and some do not. Clicking through to the actual news articles shows that the purported causal relationships are often a stretch, to say the least. For example:

The researchers found children who watched two to four hours of TV were 2.5 times more likely to have high blood pressure compared with those who watched less than two hours of television a day. Those who watched more than 4 hours per day were 3.3 times more likely to have hypertension.

In other words, this was an observational study, which established a correlation between watching a lot of TV each day and having high blood pressure. Contrary to the headline, the study did not show that the TV watching was the cause of the high blood pressure. For convenience let's rework the headline, while preserving its causal sense:

TV watching increases the probability of high blood pressure. (1)

The causal implication can be removed by writing:

TV watchers have a higher probability of high blood pressure. (2)

In a wiki entry on causal language, Gustavo Lacerda points out that action words often express causality. Note that in the present example, in order to remove the causal aspect of (1), it was necessary to change the verb "watching" into the noun "watchers" and the verb "increases" into the comparative adjective "higher".

Interestingly, there is a Bayesian formulation that sounds closer to (1):

Being a TV watcher increases the probability that a child has high blood pressure.

Note that this version has the verb "increases", like (1), but not the verb "watching". Instead it's expressed as "being a TV watcher", which indicates group membership rather than action or choice. It is this information about group membership that is used to update the probability of high blood pressure, following the Bayesian recipe.
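To make the Bayesian recipe concrete, here is a minimal sketch of the update in Python. The numbers are hypothetical and chosen only for illustration; they are not taken from the study, and the function and variable names are my own.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence


# Hypothetical inputs (assumptions, not study results).
prior_hbp = 0.05          # P(high blood pressure) before knowing TV habits
p_tv_given_hbp = 0.60     # P(TV watcher | high blood pressure)
p_tv_given_no_hbp = 0.30  # P(TV watcher | no high blood pressure)

# Learning that the child belongs to the "TV watcher" group updates the
# probability. Nothing here says that watching TV causes anything.
posterior_hbp = posterior(prior_hbp, p_tv_given_hbp, p_tv_given_no_hbp)
print(f"prior: {prior_hbp:.3f}  posterior: {posterior_hbp:.3f}")
```

The probability goes up only because group membership is informative. The same arithmetic would apply whether or not TV watching has any causal effect on blood pressure.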

Prediction and causality

Prediction can sound a lot like causation. Consider this statement:

If you exercise, you're less likely to have a heart attack. (3)

Does this mean:

People who exercise are less likely than people who don't to have a heart attack. (4)

or does it mean:

The act of exercising reduces your chances of having a heart attack. (5)

It seems quite ambiguous. On the one hand, "if you exercise" sounds like a statement about your choice simply to exercise instead of not exercising, which supports interpretation (5). On the other hand, "if you exercise" identifies you as a person who exercises, and that may predict your risk of heart attack, perhaps due to another behaviour common among people who exercise, such as healthy eating. This would support interpretation (4).
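The gap between (4) and (5) can be illustrated with a toy model, sketched below in Python. In this model, which is entirely hypothetical, a background trait ("healthy habits") makes people both more likely to exercise and less likely to have a heart attack, while exercise itself has no effect at all. All names and probabilities are assumptions made up for the illustration.

```python
# Hypothetical probabilities, chosen only for illustration.
P_HEALTHY = 0.5                       # P(person has healthy habits)
P_EXERCISE = {True: 0.8, False: 0.2}  # P(exercises | healthy habits?)
P_ATTACK = {True: 0.05, False: 0.15}  # P(heart attack | healthy habits?)
# Note: P_ATTACK does not depend on exercise in this toy model.


def p_healthy(h):
    return P_HEALTHY if h else 1 - P_HEALTHY


def p_exercise(e, h):
    return P_EXERCISE[h] if e else 1 - P_EXERCISE[h]


def observed_risk(exercises):
    """P(heart attack | exercises): what an observational study sees, as in (4)."""
    joint = sum(p_healthy(h) * p_exercise(exercises, h) * P_ATTACK[h]
                for h in (True, False))
    marginal = sum(p_healthy(h) * p_exercise(exercises, h)
                   for h in (True, False))
    return joint / marginal


def intervention_risk(exercises):
    """P(heart attack) if everyone is made to exercise (or not), as in (5)."""
    # Forcing the behaviour does not change who has healthy habits, and in
    # this model the risk depends only on habits, so the argument is unused.
    return sum(p_healthy(h) * P_ATTACK[h] for h in (True, False))


print("observed:    ", observed_risk(True), observed_risk(False))
print("intervention:", intervention_risk(True), intervention_risk(False))
```

In this sketch, people who exercise have a clearly lower observed risk than people who don't, so (4) is true, yet making someone exercise changes nothing, so (5) is false. Real data would not come with the model attached, which is exactly why the distinction is hard to settle from observation alone.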

Natural language allows ambiguities. It's convenient to leave things out because everyone knows what we mean. Don't they? Not necessarily. Certainly, when it comes to causality, ambiguity can lead to a mess of trouble. In ordinary speech, the distinction between correlation and causation is often blurred. Statement (3) above is ambiguous about the comparator: less likely than whom to have a heart attack? Less likely than people who don't exercise? Less likely than you would be if you chose not to exercise?

It seems to me that causal language is almost a worst-case scenario for this kind of ambiguity. Many people would see the concern as unimportant. And yet evidence and beliefs about causation are at the foundation of any intervention, whether in health care, education, social programs, economics, what have you. The media and politicians routinely use misleading causal language. But it's difficult even when we try to be clear!

Tuesday, January 11, 2011

In an op-ed in today's issue of the Los Angeles Times, Michael Shermer wrote about the rush "to find the deep underlying causes of shocking events", with reference to the shooting in Tucson, Arizona and the recent mass bird deaths.

Shermer made some good points, but parts of his argument were flawed. For example, he cited statistics from the National Institute of Mental Health to argue that unbalanced people are not uncommon, and

Given these statistics, events such as the shooting in Tucson are bound to happen, no matter how nicely politicians talk to one another on the campaign trail or in Congress, no matter how extreme tea party slogans are about killing government programs, and no matter how stiff or loose gun control laws are in this or that state. By chance — and nothing more — there will always be people who do the unthinkable.

In other words, he is pointing out what he sees as an inevitability, and then attributing it to chance. But an inevitability is the opposite of chance: it is a systematic pattern. And a systematic pattern is precisely what we can hope to change. I tend to agree with Shermer that "there will always be people who do the unthinkable." But surely we ought to do what we can to make such occurrences as rare as possible, and to reduce the harms as much as we can.

Shermer finishes his piece as follows:

... as often as not, events in life turn on chance, randomness and statistical probabilities that are largely beyond our control. So calls for "an end to all overt and implied appeals to violence in American politics" — such as that just issued by MoveOn.org — may make us feel better, but they will do nothing to alter the inevitability of such one-off events in the future.

By definition, "one-off events" are unpredictable and idiosyncratic. And yet Shermer says they are inevitable. The apparent confusion here is between statistical probabilities that can be used to make fairly certain predictions at the population level, and the virtual impossibility of prediction at the micro level. For example, age- and sex-specific incidence rates of different types of cancer are carefully tabulated by the CDC, and we can use these rates to predict the number of people who will be diagnosed with cancer this year. But we can't predict well who those people will be. There are, however, patterns. We learned that smoking causes lung cancer (and heart disease, and emphysema, ...) and through reduced smoking rates we have seen reductions in mortality. Perhaps we do have some control after all.
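The contrast between population-level and individual-level prediction can be put in numbers. The sketch below, in Python, uses a made-up population size and a made-up annual incidence rate (not CDC figures), and it assumes that cases occur independently so that the yearly count is binomial.

```python
import math

# Hypothetical inputs (assumptions, not CDC figures).
population = 1_000_000
annual_rate = 0.005  # chance that any one person is diagnosed this year

# Population level: under a binomial model the yearly count is predictable.
expected_cases = population * annual_rate
std_cases = math.sqrt(population * annual_rate * (1 - annual_rate))
relative_uncertainty = std_cases / expected_cases

print(f"expected cases: {expected_cases:.0f} (give or take about {std_cases:.0f})")
print(f"relative uncertainty in the count: {relative_uncertainty:.1%}")

# Individual level: for any one person, all we can say is that the chance
# is annual_rate, which is far from telling us who will be diagnosed.
print(f"chance for any given individual: {annual_rate:.1%}")
```

With these inputs the total is pinned down to within a percent or two, while the prediction for any single person remains almost uninformative. And if the rate itself can be lowered, as it was with smoking, then the predictable total moves with it.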