Criteria for Causality

Isolating cause and effect in a controlled experiment is relatively easy. For example, suppose a headache medicine is administered to a sample of subjects who have headaches, while a placebo is administered to a second group of headache sufferers who are statistically no different from the first. If, after a certain period, the headaches of the first group are reduced or have disappeared while those of the second group persist, then the curative effect of the headache medicine is clear.
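The logic of this controlled comparison can be sketched with simulated, wholly hypothetical data: two comparable groups, one receiving the medicine and one a placebo, compared on cure rate. The cure probabilities below are assumptions chosen for illustration.

```python
# Sketch of a controlled experiment: treatment group vs. placebo group.
# All numbers are hypothetical; only the comparison logic matters.
import random

random.seed(42)  # make the simulation reproducible

def simulate_group(n, cure_prob):
    """Return how many of n subjects are cured, each with probability cure_prob."""
    return sum(random.random() < cure_prob for _ in range(n))

n = 200
cured_medicine = simulate_group(n, 0.80)  # assumed effect of the medicine
cured_placebo = simulate_group(n, 0.30)   # assumed placebo response

print(f"Medicine group cure rate: {cured_medicine / n:.2f}")
print(f"Placebo group cure rate:  {cured_placebo / n:.2f}")
```

Because the groups are statistically alike before treatment, the difference in cure rates can be attributed to the medicine itself.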

For analysis with observational data, the task is much more difficult. Researchers (e.g., Babbie, 1986) have identified three criteria:

The first requirement in a causal relationship between two variables is that the cause precede the effect in time, or that the causal order be clearly established by logic.

The second requirement in a causal relationship is that the two variables be empirically correlated with one another.

The third requirement for a causal relationship is that the observed empirical correlation between the two variables not be the result of a spurious relationship.

The first and second requirements simply state that, in addition to empirical correlation, the relationship has to be examined in terms of sequence of occurrence or deductive logic. Correlation is a statistical tool and can be misused without the guidance of a logic system. For instance, it is possible to correlate the outcome of a Super Bowl (National Football League versus American Football League) with interesting artifacts such as fashion (skirt length, popular color, and so forth) and weather. However, logic tells us that coincidence or spurious association cannot substantiate causation.
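The point can be illustrated numerically with invented data: two unrelated quantities that both happen to trend over the same period will show a near-perfect Pearson correlation, even though neither causes the other. The series below are fabricated solely for this illustration.

```python
# Coincidental correlation: two invented, unrelated series that both trend
# over the same years. High |r| here implies nothing about causation.
years = list(range(1970, 1980))
skirt_length_cm = [60, 58, 57, 55, 54, 52, 50, 49, 47, 45]            # invented
ice_cream_sales = [100, 104, 109, 111, 118, 121, 127, 130, 136, 140]  # invented

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(skirt_length_cm, ice_cream_sales)
print(f"r = {r:.3f}")  # strongly negative, yet no causal link exists
```

Any two roughly monotonic series over the same interval will correlate this way, which is exactly why correlation alone cannot establish causation.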

The third requirement is a difficult one. There are several types of spurious relationships, as Figure 3.8 shows, and it can be a formidable task to show that an observed correlation is not due to a spurious relationship. For this reason, it is much more difficult to prove causality with observational data than with experimental data. Nonetheless, checking for spurious relationships is necessary for scientific reasoning, and doing so yields higher-quality findings from the data.

Figure 3.8. Spurious Relationships

In Figure 3.8, case A is the typical spurious relationship between X and Y, in which X and Y have a common cause Z. Case B is the case of an intervening variable, in which the real cause of Y is an intervening variable Z rather than X. In the strict sense, X is not a direct cause of Y; however, since X causes Z and Z in turn causes Y, one could claim causality if the sequence is not too indirect. Case C is similar to case A, but instead of X and Y having a common cause as in case A, both X and Y are indicators (operational definitions) of the same concept C. A correlation between them follows logically, but causality should not be inferred.
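Case A can be demonstrated with a small simulation (the parameters here are assumed for illustration): Z is the common cause, and X and Y each depend only on Z plus independent noise. X never influences Y, yet the two come out strongly correlated.

```python
# Simulation of case A: X <- Z -> Y, with no direct X -> Y link.
import random

random.seed(7)  # reproducible draws

n = 500
z = [random.gauss(0, 1) for _ in range(n)]    # common cause Z
x = [zi + random.gauss(0, 0.5) for zi in z]   # X caused by Z only
y = [zi + random.gauss(0, 0.5) for zi in z]   # Y caused by Z only, not by X

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

r_xy = pearson_r(x, y)
print(f"corr(X, Y) = {r_xy:.2f}")  # high, despite no causal link from X to Y
```

Controlling for Z (for example, by examining X and Y within narrow bands of Z) would make this correlation largely disappear, which is the standard way to expose a common-cause relationship.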

An example of a spurious causal relationship due to two indicators measuring the same concept is Halstead's (1977) software science formula for program length:

N = n1 log2 n1 + n2 log2 n2

where

N = estimated program length

n1 = number of unique operators

n2 = number of unique operands

Researchers have reported high correlations between actual program length (actual lines-of-code count) and the length predicted by the formula, sometimes as high as 0.95 (Fitzsimmons and Love, 1978). However, as Card and Agresti (1987) show, both the formula and actual program length are functions of n1 and n2, so the correlation exists by definition. In other words, both the formula and the actual lines-of-code count are operational measurements of the concept of program length. Moreover, one must conduct actual n1 and n2 counts for the formula to work, and those counts are not available until the program is complete or almost complete. Therefore, the relationship is not one of cause and effect, and the predictive usefulness of the formula is limited.
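The length estimator itself is straightforward to compute. A minimal sketch follows; the operator and operand counts are hypothetical, since, as noted above, real counts are not available until the program is nearly complete.

```python
# Halstead's (1977) software science length estimator:
#   N = n1*log2(n1) + n2*log2(n2)
# where n1 = unique operator count, n2 = unique operand count.
import math

def halstead_length(n1, n2):
    """Estimated program length from unique operator/operand counts."""
    return n1 * math.log2(n1) + n2 * math.log2(n2)

# Hypothetical counts for a small program, for illustration only.
estimate = halstead_length(10, 20)
print(f"Estimated length N = {estimate:.1f}")
```

Because the counts that feed the formula are themselves measurements of the finished program, the high correlation with actual lines of code is built in rather than predictive.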