The answer to your title question is never. However, causation is something different from correlation. If you want to measure causation, you could design an experiment to do just that. Most of us have conducted enough experiments to conclude that hitting your thumb with a hammer causes pain; none of us ever tried to measure the correlation between the two.
– François G. Dorais♦ Apr 25 '10 at 7:07

I am voting to close, even though I am not a statistician and the two answers so far seem to indicate that it is possible to say something of a mathematical nature on the subject. However, in the end, an answer to the question as stated cannot be mathematical in nature. As always, if you disagree, explain why here so that closure can perhaps be staved off (and if the discussion turns long, move it to meta).
– Harald Hanche-Olsen Apr 25 '10 at 12:03


I vote against closing, even though the question could have been phrased somewhat more precisely. The importance of our understanding how to perform causal inference cannot be overestimated. Any improvement in our current understanding of this matter could be used to combat many kinds of ills in the world, such as diseases and crime. A very important general question is how to begin with a large amount of observational data about, say, how disease X is caused -- say 10,000 people's health status and auxiliary variables -- and extract the most likely guesses as to the etiology of X.
– Daniel Asimov Apr 25 '10 at 22:26


This is a very important question in applied mathematics/statistics. People have indeed tried to suggest various mathematical criteria for causation. I find it hard to understand why people regard the question as borderline or even want to close it.
– Gil Kalai Apr 27 '10 at 4:59

7 Answers

Many scientific disciplines are such that experiments are impossible. The effects of childhood abuse, for example. Scientists who study such matters are generally very sensitive to the fact that the empirical work they do cannot prove causation in the way that controlled studies can.

What a well-designed correlative study can prove or disprove is that, of a long list of proposed explanations for some effect, one is the most likely. For example, you can make a study that determines which of the claims "A causes C" or "B causes C" is more likely. If you do enough of these, against every conceivable other thing that might affect C, then you can reasonably assert that you have evidence that A causes C, but you have to remember that it is a different kind of evidence from in-lab controlled experiments. (Sometimes it is better evidence: lab environments can be poor approximations of the actual world.) ((Yet another kind of evidence would be a proposed mechanism. If you can tell a convincing yarn about why A causes C, which builds on a series of well-established causal relationships, then you may claim you have evidence for your causal assertion. I think that this kind of evidence is generally the weakest, but it is often the one that people like the best, since most people understand the world through stories.))

But it is important to remember that the mere existence of a correlation need not have much to do with a causal relationship. Here's one example that comes to mind. It is a fact that a history of childhood victimization predicts (is correlated with) lower adulthood weight (the effect is weak, but statistically significant). However, the best guess for the relationship between childhood victimization and weight is that victimization causes the victim to be overweight in adulthood. This result shows up in the empirical data after you "hold other variables constant". In particular, childhood victimization strongly correlates with tobacco use, and tobacco is known to make people thinner. But if you correlate child-abuse victimization with weight within either the tobacco-using population or the tobacco-non-using population, you do see a correlation between abuse and obesity. Again, it is always possible that there are other, better explanations for this effect, and to test them you go out in the field and measure all your proposed variables. My understanding of the result "child abuse causes obesity" is that it is more likely than any other explanation that people have thought of, in the sense that every variable scientists have thought of to hold constant or covary, and all the other things they can do in the statistical models, seems to lead to the asserted conclusion.

Learning causal relationships (i.e. a directed acyclic graph where $A \to B$ means "$A$ causes $B$") from observational data is a kind of causal inference. In general, it is not possible. However, the following two conditions, phrased in the terminology of graphical models, are sufficient for models without confounding variables.

Causal Markov assumption: each node is conditionally independent of its non-effects, direct or otherwise, given its direct causes (i.e. parents).

Faithfulness: the only conditional independencies in the true distribution arise from d-separation in the true causal DAG.

Judea Pearl, Clark Glymour, and Peter Spirtes have done excellent work in this area. Since you are asking your question on MO, you are presumably interested in mathematically-oriented discussion, such as the one in Koller and Friedman's Probabilistic Graphical Models.
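To make the conditional-independence reasoning these assumptions license concrete, here is a minimal sketch in Python/NumPy (the linear-Gaussian chain and the variable names are my own illustration, not taken from any of the books above): in a chain $A \to B \to C$, $A$ and $C$ are strongly correlated, but become (partially) uncorrelated once you condition on $B$ — exactly the d-separation signature that constraint-based algorithms such as PC search for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
a = rng.normal(size=n)
b = 2 * a + rng.normal(size=n)   # A -> B
c = 3 * b + rng.normal(size=n)   # B -> C

def partial_corr(x, y, z):
    """Correlation of x and y after linearly residualizing both on z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

marginal = np.corrcoef(a, c)[0, 1]    # strongly nonzero: A and C are dependent
conditional = partial_corr(a, c, b)   # near zero: A is independent of C given B
```

Of course, as the comments below point out, such independence patterns constrain the DAG only up to its Markov equivalence class; they do not by themselves orient every edge.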

I guess this answer shows I went a little too far in my last paragraph.
– Benoît Kloeckner Apr 25 '10 at 8:11


When I see this approach, I always wonder whether arrow directions in a Bayesian network have any bearing on the direction of causality. After all, you can take any tree-structured Bayesian network, pick any node as root, and re-orient all the edges to point away from that root to get the same model with different causality semantics.
– Yaroslav Bulatov Aug 15 '10 at 0:10

See Dawid's "Beware of the DAG!" for an elaboration of this criticism.
– Yaroslav Bulatov Aug 15 '10 at 0:28

First, 1) is certainly not necessary, since the correlation can be 0 while Y is a function of X. Indeed, correlation detects only affine relations between X and Y; see the Wikipedia article http://en.wikipedia.org/wiki/Correlation_and_dependence for examples and a discussion of the relation between correlation and dependence.
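A quick numerical sketch of this point (the symmetric-distribution setup is just one convenient example, not the only one): take $X$ symmetric about 0 and $Y = X^2$. Then $Y$ is a deterministic function of $X$, yet their correlation is essentially zero, because $\operatorname{Cov}(X, X^2) = E[X^3] = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # symmetric about 0
y = x ** 2                    # Y is a deterministic function of X

r = np.corrcoef(x, y)[0, 1]   # close to 0: correlation misses this non-affine dependence
```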

Second, the classical example to have in mind when addressing this question is the case where there is a hidden variable Z, whose value implies (in the sense of causality) those of X and Y. Then it can happen that X and Y are entirely dependent (although they could very well be independent), but neither X implies Y nor Y implies X.

Also, note that logical implication need not follow from causality (if you have a wet umbrella in your hand, I can deduce that it rained, however it is not your umbrella that caused the rain).

Last, I would say that causality does not seem to be a mathematical matter, at least not in the domain of probability and statistics. To establish causality, you usually need some knowledge of how the world works.

When people don't have knowledge about the causality, they go to statistics asking for help. Then it probably becomes a mathematical matter. In particular, to figure out the causality, we have to perform "actions" and estimate the model from the intervened dataset instead of the original observed dataset. How to estimate the true distribution from the intervened data without any bias is an important mathematical task.
– pacificmoth Apr 25 '10 at 18:54

You might want to look at tests for Granger causality, which formalise the idea that "X causes Y if X occurs before and is correlated with Y". The tests suffer from the same problem mentioned elsewhere: they can't rule out a third event or process Z which causes both X and Y.
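A bare-bones version of the idea (a toy simulation with plain NumPy least squares, rather than a packaged Granger test; the coefficients are my own choices): x "Granger-causes" y if adding lagged x materially reduces the error of predicting y from its own past, while the reverse regression shows no such improvement.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20_000
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    # x drives y with a one-step lag; x itself is pure noise
    y[t] = 0.8 * x[t - 1] + 0.3 * y[t - 1] + rng.normal()

def rss(target, lag_cols):
    """Residual sum of squares of a least-squares fit on the given lag columns."""
    X = np.column_stack([np.ones(len(target))] + lag_cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.sum((target - X @ beta) ** 2))

# Does lagged x improve prediction of y beyond y's own past? (yes, substantially)
restricted = rss(y[1:], [y[:-1]])
full = rss(y[1:], [y[:-1], x[:-1]])
improvement = (restricted - full) / restricted

# The reverse direction shows essentially no improvement
rev_restricted = rss(x[1:], [x[:-1]])
rev_full = rss(x[1:], [x[:-1], y[:-1]])
rev_improvement = (rev_restricted - rev_full) / rev_restricted
```

Note that this asymmetry would look exactly the same if a hidden process drove x first and y slightly later, which is the confounding caveat mentioned above.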

Correlation (assuming its existence is correctly detected) does imply shared dependency on one or more other variables. From enough data you can also detect whether $Y$ is (close enough to being) a function of $X$ alone, and absence of causation, where some sets of variables are independent of others. Beyond that you need to consider specific mechanical models of what-causes-what and the data available may or may not be enough to answer that. As others mentioned there are formal analyses of this problem by people doing statistics and machine learning, e.g., in Pearl's book.

One approach is to say that X causes Y if the shortest description (Kolmogorov complexity) of P(X,Y) consists of a separate description of P(X) and a deterministic function f: X -> Y. Since Kolmogorov complexity is uncomputable, it is replaced by more practical notions of description length. See "Inferring deterministic causal relations" from UAI'10 for an example.
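A crude numerical proxy for this asymmetry (my own toy sketch, not the algorithm from the UAI'10 paper): fit an equally flexible regression in each direction and compare normalized residual errors as a stand-in for description length. For $y = x^3 + \text{noise}$, the true direction admits a cheap description, while the reverse direction must approximate a cube root, which a polynomial of the same degree does poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.uniform(-1, 1, size=n)
y = x ** 3 + 0.05 * rng.normal(size=n)   # true direction: X -> Y

def normalized_residual(inp, out, degree=3):
    """Residual std of a degree-3 polynomial fit, scaled by the output's std."""
    coeffs = np.polyfit(inp, out, degree)
    resid = out - np.polyval(coeffs, inp)
    return resid.std() / out.std()

forward = normalized_residual(x, y)   # small: the causal direction fits cheaply
reverse = normalized_residual(y, x)   # larger: the cube root resists a cubic fit
```

The asymmetry depends on the model class being equally expressive in both directions; real description-length methods formalize this much more carefully.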