Monday, March 31, 2008

Following up on Part I, we now look at a recent empirical study whose discussion provides what I consider a particularly thoughtful exposition on causal inference with longitudinal survey data.

As someone who studies adolescent drinking and how it may be affected by parental socialization, I took notice of an article by Stephanie Madon and colleagues entitled “Self-Fulfilling Prophecy Effects of Mothers’ Beliefs on Children’s Alcohol Use…” (Journal of Personality and Social Psychology, 2006, vol. 90, pp. 911-926).

The substantive findings, linking mothers’ beliefs at earlier waves to children’s drinking at later waves, were interesting. However, what really stood out to me was the thoughtful discussion near the end of the article about which criteria for causality the longitudinal correlational design could and could not satisfy, and what other lines of argument could be marshaled on behalf of causal inferences. Within the larger Discussion section, Madon and colleagues’ consideration of causal inference appears under the heading “Interpreting Results From Naturalistic Studies.”

Madon et al.’s discussion notes the main inferential strength of their design, namely that it can “…rule out the possibility that the dependent variable exerted a causal influence on the predictor variable because measurement of the predictor variable is temporally antecedent to changes in the dependent variable.” It also acknowledges the design’s major limitation, the possibility of unmeasured third variables: “…the potential omission of a valid predictor raises the possibility that mothers based their beliefs on a valid predictor of adolescent alcohol use that was not included in the model. If this were to occur,… [mothers’] self-fulfilling effects would be smaller than reported.”
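To make the third-variable worry concrete, here is a toy simulation in Python. Everything in it is an assumption for illustration, not Madon et al.’s model or data: a hypothetical unmeasured variable `u` influences both the mother’s earlier belief and the child’s later drinking, and all relations are linear with made-up coefficients. Regressing drinking on belief alone should overstate the belief effect; adding `u` as a control should recover it.

```python
import random

def slope(x, y):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def two_predictor_coefs(x1, x2, y):
    """OLS coefficients (b1, b2) of y on x1 and x2 (intercept included),
    solved from the 2x2 normal equations on mean-centered data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate(n=20_000, true_effect=0.2, confound=0.5, seed=1):
    """Hypothetical data: u -> belief, u -> drinking, belief -> drinking."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(n)]        # unmeasured third variable
    belief = [ui + rng.gauss(0, 1) for ui in u]    # earlier-wave maternal belief
    drinking = [true_effect * b + confound * ui + rng.gauss(0, 1)
                for b, ui in zip(belief, u)]       # later-wave drinking
    naive = slope(belief, drinking)                # u omitted from the model
    adjusted, _ = two_predictor_coefs(belief, u, drinking)  # u controlled
    return naive, adjusted

if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"belief effect, u omitted:    {naive:.2f}")
    print(f"belief effect, u controlled: {adjusted:.2f}")
```

With these made-up settings the estimate that omits `u` should land near 0.45 (analytically, cov(belief, drinking) / var(belief) = 0.9 / 2), more than double the true 0.2, which is the direction of bias the quoted passage describes.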

The authors then proffer “several reasons why we believe that the self-fulfilling prophecy interpretation is more compelling” than a third-variable account.

First, Madon and colleagues claim, their preferred interpretation is consistent with established experimental findings of self-fulfilling prophecies. “Although this convergence does not prove that our results reflect self-fulfilling prophecies, confidence in the validity of a general conclusion increases when naturalistic and experimental studies yield parallel findings.”

Second, the authors cite the breadth of “theoretically and empirically supported predictors” they used as control variables. This step, they claim, reduces the chance of spurious, third-variable causation.

Third, the authors note that maternal beliefs exerted greater predictive power over time, whereas the non-maternal predictors explained less variance over time. The extension of this argument, as I understand it, is to ask: how likely is it that some as-yet-unmeasured third variable tracks perfectly with maternal beliefs and not at all with the predictors that served as control variables?

Through their own empirical findings and through their citation of two broader literatures, adolescent drinking (the source of their control variables) and self-fulfilling prophecies (the source of their experimental foundation), Madon and colleagues provide a great deal of evidence that is consistent with a parenting effect.

Depending on one’s perspective, to use a football analogy, one might conclude that the Madon et al. study has advanced the ball to the opponents’ 40-yard line, or 20-yard line, or even 1-yard line. But most would probably agree that it takes a true experiment to get the ball into the end zone.

Wednesday, March 26, 2008

For more than a decade, I've taught a course on structural equation modeling (SEM). The technique, nearly always used with survey data (although in theory also applicable to experimental data), involves drawing shapes to represent variables and arrows between the shapes to represent the researcher's proposed flow of causation between the variables. Regression-type path coefficients are then generated to assess the strength of the hypothesized relations. Though the unidirectional arrows in the diagrams and the tone often used in reports of SEM analyses (e.g., "affected," "influenced," "led to") imply causality, such an inference cannot, of course, be supported with non-experimental data. This is especially true of cross-sectional data (i.e., where all variables are measured concurrently). As will be discussed later, causal inferences can be supported to a greater degree -- though not completely -- with longitudinal data.
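A minimal Python sketch of why the arrows themselves carry no evidential weight in cross-sectional data. It assumes the simplest possible path model, one variable pointing to another, with both standardized; the variable names and the 0.6 coefficient are illustrative only. Although the data are generated with `x` causing `y`, the standardized path coefficient is the same whichever way the arrow is drawn, because in both cases it equals the correlation between the two variables.

```python
import random
import statistics

def standardize(values):
    """Convert to z-scores using the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def standardized_path(cause, effect):
    """Standardized regression (path) coefficient for the arrow cause -> effect."""
    zc, ze = standardize(cause), standardize(effect)
    return sum(a * b for a, b in zip(zc, ze)) / sum(a * a for a in zc)

rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(5_000)]
y = [0.6 * xi + rng.gauss(0, 1) for xi in x]   # data generated with x causing y

forward = standardized_path(x, y)    # model drawn as x -> y
backward = standardized_path(y, x)   # model drawn as y -> x
print(f"x -> y: {forward:.3f}   y -> x: {backward:.3f}")
```

Both numbers agree to floating-point precision, so nothing in the fitted coefficient favors one causal direction over the other.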

It is in this context that I read the chapter on SEM in Judea Pearl's (2000) book Causality: Models, Reasoning, and Inference. Pearl, a UCLA professor with expertise in artificial intelligence, logic, and statistics, writes at a level that is, quite frankly, well over my head. I did, however, find his SEM chapter relatively accessible, so that is the chapter I will discuss.

Pearl's apparent thesis in this chapter is that contemporary SEM practitioners are too quick to dismiss the possibility of drawing causal conclusions from the technique (much as I did in my opening paragraph). Early on, in fact, Pearl writes that "This chapter is written with the ambitious goal of reinstating the causal interpretation of SEM" (p. 133).

Pearl reviews a number of writings on SEM that he feels "...bespeak an alarming tendency among economists and social scientists to view a structural equation as an algebraic object that carries functional and statistical assumptions but is void of causal content" (p. 137). He further notes:

The founders of SEM had an entirely different conception of structures and models. Wright (1923, p. 240) declared that "prior knowledge of the causal relations is assumed as prerequisite" in the theory of path coefficients, and Haavelmo (1943) explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment. Likewise, Marschak (1950), Koopmans (1953), and Simon (1953) stated that the purpose of postulating a structure behind the probability distribution is to cope with the hypothetical changes that can be brought about by policy.

An interpretation of the above paragraph that would make the causal interpretation of SEM defensible, in my view, would be as follows: If the directional relations one is modeling with non-experimental (survey) data have previously been demonstrated through experimentation, then SEM can be a useful tool for estimating quantitatively how much of an effect a policy change could have on some outcome criterion. It's when we start talking about "assumed" or "hypothetical" experimental support that things get dicey.
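As a hedged illustration of the "policy change" reading, consider a hypothetical linear structural model in which a policy variable X affects an outcome Y both directly (path c) and through a mediator M (paths a and b). If, and only if, the arrows are causally correct, setting X one unit higher changes Y by c + a*b. The Python sketch below checks that arithmetic by simulating the intervention directly; the coefficient values are invented, and nothing in the code can tell us whether the arrows are right, which is exactly the point at issue.

```python
import random
import statistics

def predicted_total_effect(a, b, c, delta=1.0):
    """Change in Y implied by the path coefficients when X is *set* to X + delta
    in the assumed model  M = a*X + e_M,  Y = b*M + c*X + e_Y."""
    return delta * (c + a * b)

def mean_outcome_under_intervention(x_value, a, b, c, n=10_000, seed=0):
    """Simulate the assumed model with X fixed at x_value and return mean Y."""
    rng = random.Random(seed)   # same seed -> same disturbances for every x_value
    ys = []
    for _ in range(n):
        m = a * x_value + rng.gauss(0, 1)
        ys.append(b * m + c * x_value + rng.gauss(0, 1))
    return statistics.fmean(ys)

a, b, c = 0.5, 0.4, 0.1   # hypothetical path coefficients
simulated = (mean_outcome_under_intervention(1.0, a, b, c)
             - mean_outcome_under_intervention(0.0, a, b, c))
print(predicted_total_effect(a, b, c), round(simulated, 6))
```

Reusing one seed for both runs (common random numbers) means the difference reflects only the change in X, not sampling noise.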

As I noted above, however, contemporary SEM practitioners would probably be more comfortable with suggestions of causation if the data were collected longitudinally (more specifically, with a panel design, in which the same respondents are tracked over time). Of the three major criteria for demonstrating causality (correlation, time-ordering, and the elimination of spurious "third variable" explanations), longitudinal studies are clearly capable of demonstrating the first two; provided that the most plausible third-variable candidates are measured and controlled for, the approximation to causality should be good (my research methods lecture notes cover these criteria in more detail).

MacCallum and colleagues (1993, Psychological Bulletin, 114, 185-199), while acknowledging some limitations to longitudinal studies, succinctly summarize why they're useful: "When variables are measured at different points in time, equivalent models in which effects move backward in time are often not meaningful" (p. 197).
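MacCallum and colleagues' point underlies the common two-wave cross-lagged panel analysis. The Python sketch below is a toy version under invented assumptions: two variables measured at two waves, linear relations, and a true effect running only from X at wave 1 to Y at wave 2. Each wave-2 variable is regressed on both wave-1 variables; because wave-2 scores cannot cause wave-1 scores, the remaining directional question is which cross-lagged path is nonzero. Like any regression on non-experimental data, these estimates would still be biased by an unmeasured third variable.

```python
import random

def two_predictor_coefs(x1, x2, y):
    """OLS coefficients (b1, b2) of y on x1 and x2 (intercept included),
    solved from the 2x2 normal equations on mean-centered data."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [b - m2 for b in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def simulate_panel(n=20_000, seed=7):
    """Hypothetical two-wave panel with a true X1 -> Y2 effect of 0.3
    and no Y1 -> X2 effect."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [0.4 * a + rng.gauss(0, 1) for a in x1]            # wave-1 correlation
    x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]            # Y1 does not affect X2
    y2 = [0.5 * b + 0.3 * a + rng.gauss(0, 1) for a, b in zip(x1, y1)]
    _, x1_to_y2 = two_predictor_coefs(y1, x1, y2)   # Y2 on Y1 (stability) and X1
    _, y1_to_x2 = two_predictor_coefs(x1, y1, x2)   # X2 on X1 (stability) and Y1
    return x1_to_y2, y1_to_x2

if __name__ == "__main__":
    x1_to_y2, y1_to_x2 = simulate_panel()
    print(f"X1 -> Y2: {x1_to_y2:.2f}   Y1 -> X2: {y1_to_x2:.2f}")
```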

In the upcoming Part II, we highlight a recent empirical study whose discussion provides a particularly thoughtful exposition on causal inference with longitudinal survey data.

Friday, March 7, 2008

As is taught in beginning research methods courses, the only technique that allows for causal inference is the true experiment, and the linchpin of experimentation that permits such inference is random assignment. In the social and behavioral sciences, experiments typically are conducted in university laboratories, with established subject pools to ensure the availability of participants.
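Why random assignment is the linchpin can be shown with a small Python simulation. It assumes a hypothetical confounder (labeled `motivation`) that raises the outcome and, without randomization, also makes people more likely to end up in the treatment group; the true treatment effect is set to 1.0. Under self-selection, the simple treated-minus-control difference absorbs the confounder's influence; under coin-flip assignment, the confounder is balanced across groups on average and the difference should land near 1.0.

```python
import random
import statistics

def treated_minus_control(randomize, n=20_000, effect=1.0, seed=3):
    """Difference in mean outcomes, treated minus control, for hypothetical data."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)              # hypothetical confounder
        if randomize:
            gets_treatment = rng.random() < 0.5   # coin-flip assignment
        else:
            gets_treatment = motivation > 0       # self-selection on the confounder
        outcome = effect * float(gets_treatment) + 2.0 * motivation + rng.gauss(0, 1)
        (treated if gets_treatment else control).append(outcome)
    return statistics.fmean(treated) - statistics.fmean(control)

print("self-selected:", round(treated_minus_control(randomize=False), 2))
print("randomized:   ", round(treated_minus_control(randomize=True), 2))
```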

Scientists lacking such resources may thus have a hard time conducting experiments, even if they wanted to. Others may simply develop a preference for survey research or other non-experimental methods such as archival research and content analysis; such techniques generally cannot be used to assess causality, but offer potential advantages in terms of mapping onto more natural, realistic situations encountered in daily life. I myself, for whatever reason, gravitated to survey research over the years, even though laboratory experimentation was a major part of my graduate-school experience.

Now, however, even scholars lacking an affinity for the lab may have opportunities to address causation in their research. In what appears to be a growing trend, clever researchers are noticing random assignments in real-world settings and seizing upon them to conduct causal studies from afar.

As one example, University of Durham anthropologists Russell Hill and Robert Barton realized that in Olympic "combat" sports such as boxing and wrestling, competitors are assigned at random to wear either red or blue outfits. The finding that red-clad competitors won more often than their blue counterparts can thus be interpreted causally (although showing that outfit color has some causal effect does not necessarily mean that the effect is large).
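Because the colors are assigned at random, a natural null hypothesis is that red and blue each win half the bouts, and an exact binomial test asks how surprising an observed red-win count would be under that null. The Python sketch below assumes each bout is an independent 50/50 trial under the null, which is a simplification; the counts in the usage line are hypothetical, not Hill and Barton's data.

```python
from math import comb

def two_sided_binomial_p(red_wins, bouts):
    """Exact two-sided p-value for H0: P(red wins a bout) = 0.5.
    The null distribution is symmetric, so double the upper tail beyond
    the larger of the two win counts (capped at 1)."""
    k = max(red_wins, bouts - red_wins)
    upper_tail = sum(comb(bouts, i) for i in range(k, bouts + 1)) / 2 ** bouts
    return min(1.0, 2 * upper_tail)

# Hypothetical counts, for illustration only.
print(two_sided_binomial_p(red_wins=60, bouts=100))
```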

As another example, readers of the 2005 book Freakonomics (by Steven Levitt and Stephen Dubner) may remember how Levitt and colleagues drew upon the Chicago Public Schools' (CPS) use of random assignment in the district's school-choice program when more students wanted to attend a particular school than it could accommodate. As Levitt and Dubner wrote (p. 158):

In the interest of fairness [to applicants of the most competitive schools], the CPS resorted to a lottery. For a researcher, this is a remarkable boon. A behavioral scientist could hardly design a better experiment in his laboratory...

The random aspect thus allowed the researchers to make the causal conclusion that:

...the students who won the lottery and went to a "better" school did no better than equivalent students who lost the lottery and were left behind.
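Because the lottery, not the family, decided who got a seat, a simple comparison of winners with losers estimates the effect of winning a seat, and the lottery itself supplies the reference distribution for a significance test: re-run it many times on the pooled applicants and see how often a difference as large as the observed one arises by chance. The Python sketch below shows that permutation (randomization) test; the outcome lists are placeholders, and this is not necessarily the analysis Levitt and colleagues ran.

```python
import random
import statistics

def lottery_effect(outcomes_won, outcomes_lost):
    """Mean outcome of lottery winners minus mean outcome of losers."""
    return statistics.fmean(outcomes_won) - statistics.fmean(outcomes_lost)

def permutation_p(outcomes_won, outcomes_lost, reps=5000, seed=0):
    """Two-sided p-value from re-running the lottery on the pooled applicants."""
    rng = random.Random(seed)
    observed = abs(lottery_effect(outcomes_won, outcomes_lost))
    pooled = list(outcomes_won) + list(outcomes_lost)
    n_won = len(outcomes_won)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)   # a fresh hypothetical lottery draw
        if abs(lottery_effect(pooled[:n_won], pooled[n_won:])) >= observed:
            extreme += 1
    return (extreme + 1) / (reps + 1)

# Placeholder outcome scores, invented for illustration.
winners = [52, 47, 55, 49, 50, 53]
losers = [51, 48, 54, 50, 49, 52]
print(lottery_effect(winners, losers), permutation_p(winners, losers))
```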

In yet another example, Levitt's fellow University of Chicago faculty member, law professor Cass Sunstein, seized upon the fact that appellate cases within the federal circuit courts are heard by random three-judge panels from the larger pool of judges within each geographic circuit. Does a judge appointed by a Democratic (or Republican) president show variation in how often he or she votes on the bench in a liberal or conservative direction, depending on whether he or she is joined in a case by judges appointed by presidents of the same or opposing party? That is the type of question Sunstein and his colleagues can answer.
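Because panel membership is random, the partisan makeup of a judge's two colleagues is, in effect, randomly assigned, so even a simple tabulation can carry causal weight. The Python sketch below assumes a hypothetical record format (the judge's appointing party, the two colleagues' appointing parties, and whether the vote was liberal) and computes one group's liberal-vote rate by the number of same-party colleagues; it is not Sunstein and colleagues' actual coding scheme or data.

```python
from collections import defaultdict

def liberal_rate_by_panel(votes, party):
    """Liberal-vote rate for judges appointed by `party`, keyed by how many of
    their two panel colleagues were appointed by the same party (0, 1, or 2).

    Each record is (judge_party, (colleague_party, colleague_party), voted_liberal),
    a hypothetical format chosen for illustration."""
    tallies = defaultdict(lambda: [0, 0])   # same-party colleagues -> [liberal, total]
    for judge_party, colleagues, voted_liberal in votes:
        if judge_party != party:
            continue
        same_party = sum(1 for c in colleagues if c == party)
        tallies[same_party][0] += int(voted_liberal)
        tallies[same_party][1] += 1
    return {k: liberal / total for k, (liberal, total) in sorted(tallies.items())}

# Toy records, invented for illustration.
votes = [
    ("D", ("D", "D"), True),
    ("D", ("D", "R"), True),
    ("D", ("R", "R"), False),
    ("D", ("R", "R"), True),
    ("R", ("R", "R"), False),
]
print(liberal_rate_by_panel(votes, "D"))
```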

Yale law professor (and econometrician) Ian Ayres, whose 2007 book Super Crunchers I recently finished reading, refers to this phenomenon as "piggyback[ing] on pre-existing randomization" (p. 72). The book contains additional examples of the approach.