Sunday, December 28, 2008

The December 2008 issue of Psychological Methods includes an article by Joseph Schafer and Joseph Kang, entitled, "Average Causal Effects From Nonrandomized Studies: A Practical Guide and Simulated Example" (abstract).

The article addresses the seemingly age-old issue of how best to approximate a causal inference when participants' levels of the purported "causal" variable have not been assigned at random. Although the article contains a fair amount of jargon and technical formulas, the key foundation appears to boil down to the following quote:

In a typical observational study... it is unlikely that [treatment condition] will be independent of [individuals' outcome scores]. The treatments may have been selected by the individuals themselves, for reasons that are possibly related to the outcomes. With observational data, a good estimate of the [average causal effect] will make use of the covariates... to help account for this dependence (p. 281).

Via a large simulation study, Schafer and Kang compare nine different approaches for covariate-based adjustment, including analysis of covariance (ANCOVA), regression, matching, propensity scores, and weighting schemes. Near the end of the article, the authors present a section entitled "Lessons Learned," containing practical recommendations.

As social-science research articles continue to use increasingly sophisticated statistical and analytical methods, Schafer and Kang's article should be a useful resource for researchers looking to remain current with state-of-the-art approaches.

Sunday, December 21, 2008

Today's Los Angeles Times has an article on the failure of several recent randomized clinical trials to show benefits of taking vitamin supplements for disease prevention. In my view, the article gives the layperson a good explanation of how experimental (i.e., RCT) and observational (i.e., correlational) studies differ in their ability to support causal inferences. It also discusses some problems, specific to vitamin research, that complicate the interpretation of experimental/RCT studies, even though they're designed to hold extraneous factors constant.

First, here are some examples of the article's discussion of experimental vs. observational research:

Randomized clinical trials are designed to test one factor at a time, but vitamins and minerals consumed as part of a healthy diet work in concert with each other.

"You don't eat a food that just has beta carotene in it," said Dr. Mary L. Hardy, medical director of the Simms/Mann UCLA Center for Integrative Oncology...

Also:

"The observational studies that originally linked vitamins to better health may have been biased because people who take supplements are often healthier overall than people who don't.

"They tend to be more physically active, better educated, eat better diets, and tend not to be smokers," [the NIH's Paul] Coates said. "So you can't say for certain it's your item of interest that causes" the health benefit.

The issue that's specific to vitamin studies is as follows, focusing on the need for a "placebo" or "control" group that does not receive the treatment being tested:

For several reasons, researchers say, vitamins don't lend themselves to randomized controlled trials. Chief among them is that there is no true placebo group when it comes to vitamins and minerals, because everyone gets some in their diet.

"For drugs, someone either has [the anti-impotence drug] Cialis in their system or he doesn't have Cialis," said Paul Coates, director of the NIH Office of Dietary Supplements in Bethesda, Md. But with vitamins, "there's a baseline exposure that needs to be taken into account. It makes the challenge of seeing an improvement more difficult."

My (Alan's) initial reaction to this statement was that extraneous exposure to vitamins from food consumption should not necessarily compromise researchers' ability to make a causal inference about the vitamin supplements. The experimental group would be ingesting vitamin pills plus vitamins in food, whereas the control group would be ingesting placebo (sugar) pills plus vitamins in food. Assuming the food intake to be similar in the experimental and control groups -- which is what random assignment to groups is supposed to buy us -- then the only difference between the groups should be the vitamin vs. placebo pills.

However, there's yet another complication:

A vitamin's benefit may become apparent only if people aren't getting enough of it. That could explain why vitamin D has been linked to reduced rates of heart disease, cancer and diabetes.

"Most people are vitamin-D-deficient, and that's not true for vitamin E," [the USDA's Jeffrey] Blumberg said.
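Both points can be illustrated with a small simulation. The sketch below is only a toy model: the outcome scale, the dietary-intake distribution, the size of the pill's effect, and the deficiency threshold are all invented for illustration and are not taken from the Times article or any vitamin trial. It compares a pill arm with a placebo arm when everyone also gets vitamins from food, first assuming the pill helps everyone and then assuming it helps only people whose dietary intake is low.

```python
import random
import statistics


def estimated_effect(n_per_arm=5000, pill_effect=2.0,
                     deficiency_threshold=None, seed=0):
    """Difference in mean outcome (pill arm minus placebo arm) in a simulated trial.

    Everyone gets some vitamin from food. If deficiency_threshold is given,
    the pill helps only people whose dietary intake falls below it.
    All numbers are hypothetical.
    """
    rng = random.Random(seed)

    def outcome(gets_pill):
        diet = rng.gauss(10.0, 3.0)  # baseline intake from food (made-up units)
        benefit = 0.0
        if gets_pill and (deficiency_threshold is None
                          or diet < deficiency_threshold):
            benefit = pill_effect
        # Health score rises with diet, plus any benefit from the pill, plus noise.
        return 0.5 * diet + benefit + rng.gauss(0.0, 1.0)

    pill_arm = [outcome(True) for _ in range(n_per_arm)]
    placebo_arm = [outcome(False) for _ in range(n_per_arm)]
    return statistics.mean(pill_arm) - statistics.mean(placebo_arm)


everyone_benefits = estimated_effect()
only_deficient_benefit = estimated_effect(deficiency_threshold=7.0)
print(f"pill helps everyone:       {everyone_benefits:.2f}")
print(f"pill helps only deficient: {only_deficient_benefit:.2f}")
```

In the first scenario, randomization balances dietary intake across the arms, so the difference in means should land close to the assumed pill effect of 2.0 despite the baseline exposure. In the second, only the roughly one in six simulated participants whose intake is more than one standard deviation below the mean gain anything, so the average effect shrinks sharply and would be harder to detect in a smaller trial.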

Wednesday, December 3, 2008

The following Call for Papers appeared on the SEMNET (Structural Equation Modeling) discussion list:

CAUSALITY IN THE SCIENCES

A volume of papers on causality across the sciences edited by Phyllis McKay Illari, Federica Russo and Jon Williamson under contract with Oxford University Press.

http://www.kent.ac.uk/secl/philosophy/jw/2009/cits/

This book will contain original research papers that deal with causality and causal inference in the various sciences and with general questions concerning the relationship between causality, probability and mechanisms. Some chapters will be invited contributions; others will be submitted to a call for papers. All papers will be subject to a reviewing process.

1st November 2009: notification of acceptance of papers for publication.

1st December 2009: deadline for final version of papers accepted for publication.

THE VOLUME

The volume will run to about 600 pages and will be subdivided into the following parts:

Introduction

Health Sciences

Social Sciences

Natural Sciences

Psychology and Neurosciences

Computer science and statistics

Causality, probability and mechanisms

ORGANISATION

This volume is organised by the Centre for Reasoning at the University of Kent. It is associated with the Causality in the Sciences series of conferences, and with the research project Mechanisms and Causality funded by the Leverhulme Trust.

Tuesday, November 25, 2008

A collaborative effort of the individuals named below, summarized by Alan...

Jackie Wiersma, who successfully defended her Ph.D. research at Texas Tech University a few months ago, just had her officially approved dissertation posted in the university's online repository (click here to read the dissertation). Alan and Bo each served on Jackie's committee (along with Chairperson Judith Fischer and Kitty Harris), Bo via speakerphone given his move from Texas Tech to Penn State a couple years ago.

Jackie used longitudinal data, which can be of some help in strengthening one's argument for particular directions of causality (see here and here). To strengthen some of Jackie's arguments further, however, her committee recommended the use of a technique known as propensity scores (Bo, in particular, played a key role by finding online resources such as this one on the technique).

Jackie's study of adolescent and young-adult drinking involved a number of hypotheses. For simplicity of presentation, the remainder of this entry focuses on one that predicted an individual's level of drinking as an adolescent would be associated with his or her romantic partner's drinking when the focal individual was a young adult. In other words, would being a drinker as an adolescent propel someone to select a relatively heavy drinker for a romantic partner in the future?

The key predictor variable -- in this case, adolescents' own drinking status -- is discussed in the dissertation analogously to being "assigned" to a "treatment" condition, even though such drinking status is measured as it occurs naturalistically (known in epidemiology as an "observational" variable). The main ideas involving the propensity scores are discussed in the following excerpts from Jackie's dissertation:

For the selection hypothesis, the first prediction was that assignment to group (adolescent drinker and nondrinker) would be related to partner drinking in young adulthood. Thus, it is important to take into account the possible covariates of the assignment to adolescent drinking (p. 56)...

Within the proposed hypotheses, the groups (drinkers and nondrinkers) should show differences in the differentiating variables, thus, this study examined relevant background information (e.g., parental drinking, sensation seeking, peer drinking, college enrollment) that might plausibly affect the group with which individuals are identified. A propensity score is a measure from 0 to 1 of the likelihood of being in one of two designated groups. In this study, a score of 0 means a high probability of a person being a drinker and a 1 score means a person is a nondrinker. These scores were created in SAS using logistic regressions to predict "drinker" versus "nondrinker" (p. 57).

There are many approaches that are used for propensity score matching to adjust for group differences. For this study, the stratifying propensity scores approach was used... After this step, implementing regression models, one can compare the drinker group to the nondrinker group without worrying about the impact of any baseline differences on selection into the groups (Lowe, 2003) (p. 57).

One limitation of the way propensity scores were implemented here was that, in trying to create drinker and non-drinker groups that did not differ on the other covariates, a substantial loss of cases occurred. The following is a hypothetical example (which may have approximated what actually happened). Finding a large number of adolescent drinkers who had high values on traditional predictors of adolescent drinking (e.g., peer drinking) is easy; finding non-drinkers with similarly high levels on the traditional predictors is not. Thus, the number of participants in the former group would have to be shaved down to match the cell size for the latter group. For this reason, the propensity-score analyses were de-emphasized in Jackie's dissertation, in favor of controlling for covariates by entering them as individual predictors in regression analyses.
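For readers who want to see the mechanics, here is a minimal sketch of propensity-score stratification on simulated data. It is not a reconstruction of Jackie's analysis: the dissertation used SAS with several covariates, whereas this example uses Python, a single invented covariate (labeled peer drinking), a hand-rolled logistic regression, and made-up effect sizes. The score here is the estimated probability of being a drinker, which is the reverse of the coding described in the excerpt. All function names are hypothetical.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, t, iterations=25):
    """Fit P(t=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n=5000, true_effect=1.0, seed=1):
    """Simulated data: the covariate drives both group membership and the outcome."""
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)                           # e.g., peer drinking
        ti = 1 if rng.random() < sigmoid(1.5 * xi) else 0  # self-selected "drinker"
        yi = true_effect * ti + 2.0 * xi + rng.gauss(0.0, 1.0)
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y


def naive_difference(t, y):
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return statistics.mean(treated) - statistics.mean(control)


def stratified_estimate(x, t, y, n_strata=5):
    """Average the within-stratum group differences, weighted by stratum size."""
    b0, b1 = fit_logistic(x, t)
    scores = [sigmoid(b0 + b1 * xi) for xi in x]
    order = sorted(range(len(x)), key=lambda i: scores[i])
    total, used = 0.0, 0
    for s in range(n_strata):
        idx = order[s * len(order) // n_strata:(s + 1) * len(order) // n_strata]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if not treated or not control:
            continue  # a stratum missing one group cannot be used; its cases are lost
        total += len(idx) * (statistics.mean(treated) - statistics.mean(control))
        used += len(idx)
    return total / used


x, t, y = simulate()
naive = naive_difference(t, y)
stratified = stratified_estimate(x, t, y)
print(f"true effect 1.00, naive {naive:.2f}, stratified {stratified:.2f}")
```

Under these assumptions the naive comparison badly overstates the effect, because drinkers also score higher on the covariate. Stratifying on the estimated score should remove most of that bias, though not all of it with only five strata. The `continue` branch shows where cases would be lost when a stratum contains only one group, which is a milder version of the sample-size problem described above.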

The fact that propensity scores did not work out in this particular instance should not be taken to disparage the technique. Rubin (1997), in an introduction to propensity scores for the non-expert, discusses the general superiority of propensity scores to ordinary regression as a way of controlling for potentially confounding variables. Indeed, Jackie's attempts to use propensity scores should be considered a learning experience for all involved with her research.

References

Lowe, E. D. (2003). Identity, activity, and the well-being of adolescents and youths: Lessons from young people in a Micronesian society. Culture, Medicine and Psychiatry, 27, 187-219.

Saturday, August 2, 2008

The essence of designing research to permit causal inference is that the investigator can arrange things so that a single element (independent variable) can be isolated that distinguishes the treatment of two groups. Then, when the groups are later compared on an outcome (dependent) variable, a causal inference becomes inescapable because the groups differed in only one way (their respective form of specific treatment) leading up to the outcome.

In experimental studies, techniques including random assignment to conditions of the IV, alternative activities to occupy the control group's time and efforts, and double-blindness, help ensure as best as possible that the treatment of the experimental and control groups differs only on the one essential element that is of scientific interest.

In non-experimental research, isolating the key differentiating factor between the experience of two groups is much more difficult. We may observe groups in society that appear to differ on Factor A (of interest to us), but they may also differ on Factors B, C, D, etc. For example, we may be interested in how our Factor A, type of school attended (public or private), affects the outcome of standardized test scores. Children who differ on Factor A may also, however, tend to differ on family income (Factor B), neighborhood environment (Factor C), etc. Ultimately, therefore, any difference seen in the final outcome measure between children who attended public and private schools cannot be pinpointed definitively to have been caused by school type (Factor A), because Factors B and C also distinguished the groups' experiences.

Writing on the New York Times "Freakonomics" blog, Justin Wolfers reviews a study by Scott Carrell and Mark Hoekstra on whether the presence of a disruptive child in a classroom can adversely affect the learning and behavior of the other children. To obtain an objective measure of children's likely disruptiveness, Carrell and Hoekstra examined official records looking for children who came from a home in which there was an allegation of domestic violence. Here are some key excerpts from Wolfers's entry (bold emphasis added by me):

Around 70 percent of the classes in their sample have at least one kid exposed to domestic violence. The authors compare the outcomes of that kid’s classmates with their counterparts in the same school and the same grade in a previous or subsequent year — when there were no kids exposed to family violence — finding large negative effects.

Adding even more credibility to their estimates, they show that when a kid shares a classroom with a victim of family violence, she or he will tend to under-perform relative to a sibling who attended the same school but whose classroom had fewer kids exposed to violence. These comparisons underline the fact that the authors are isolating the causal effects of being in a classroom with a potentially disruptive kid, and not some broader socio-economic pattern linking test scores and the amount of family violence in the community.

In a research area such as this, where random-assignment experiments are impossible to conduct, Carrell and Hoekstra have thus done their best to hold everything constant -- the school, the grade level, and, via the sibling component, children's home environment -- so that observed differences in children's school performance can more confidently be attributed to the putative effect of having a disruptive classmate.
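The logic of the sibling comparison can be sketched with simulated data. The example below is a toy model, not Carrell and Hoekstra's specification: the variable names, the two-siblings-per-family setup, and every coefficient are assumptions chosen for illustration. An unobserved family factor raises both a child's exposure to disruptive classmates and lowers the child's test score, so a pooled regression is biased, while differencing within sibling pairs removes anything the siblings share.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_families(n_families=5000, true_effect=-1.0, seed=4):
    """Each family has two siblings who share an unobserved home environment."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        home = rng.gauss(0.0, 1.0)  # shared family disadvantage (unobserved)
        kids = []
        for _ in range(2):
            exposure = 0.8 * home + rng.gauss(0.0, 1.0)  # disruptive classmates
            score = true_effect * exposure - 2.0 * home + rng.gauss(0.0, 1.0)
            kids.append((exposure, score))
        families.append(kids)
    return families


families = simulate_families()

# Pooled comparison: ignores family, so the home environment confounds the slope.
all_exposure = [e for kids in families for e, _ in kids]
all_scores = [s for kids in families for _, s in kids]
pooled = ols_slope(all_exposure, all_scores)

# Sibling comparison: differencing within a family cancels the shared home term.
diff_exposure = [kids[0][0] - kids[1][0] for kids in families]
diff_scores = [kids[0][1] - kids[1][1] for kids in families]
within = ols_slope(diff_exposure, diff_scores)

print(f"true effect -1.00, pooled {pooled:.2f}, within-family {within:.2f}")
```

With these made-up numbers the pooled slope should come out roughly twice as negative as the true effect, while the within-family slope should sit close to -1.0. The sibling design only removes factors the siblings share; anything that differs between siblings and also tracks classroom assignment would still bias the estimate.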

Sibling designs are used fairly often in social science research. We hope to provide more in-depth discussion of this approach in our future postings.

Monday, June 30, 2008

Correlation and causality reached the U.S. Supreme Court last week -- or at least the written dissent of one justice -- as a 5-4 majority interpreted the U.S. Constitution's Second Amendment to confer an individual or personal right to gun ownership, as opposed to only a collective right (i.e., belonging to "a well-regulated militia...").

Cases such as this are supposed to be decided on constitutional issues, in terms of the history and meaning of the document. However, as sometimes happens, policy issues such as whether gun-control laws are good or bad for society find their way into the discourse.

Shown below is a passage from a New York Times article, which quotes Justice Stephen Breyer's attempt to make sense of empirical studies of guns and crime (Breyer's full dissenting opinion is available here).

According to the study, published last year in The Harvard Journal of Law and Public Policy, European nations with more guns had lower murder rates. As summarized in a brief filed by several criminologists and other scholars supporting the challenge to the Washington law, the seven nations with the most guns per capita had 1.2 murders annually for every 100,000 people. The rate in the nine nations with the fewest guns was 4.4.

Justice Breyer was skeptical about what these comparisons proved. “Which is the cause and which the effect?” he asked. “The proposition that strict gun laws cause crime is harder to accept than the proposition that strict gun laws in part grow out of the fact that a nation already has a higher crime rate.”

Whatever positions individuals might take on gun-control legislation, I hope most would agree that careful examination of the direction of causality from inherently correlational studies -- like that exhibited by Breyer -- is a good thing.

8-9 September: 2 days of tutorials on causality, probability and their use in science.

10-12 September: CAPITS 2008, a 3-day conference on causality and probability in the sciences.

15-19 September: a week of advanced research seminars on causality and probability.

Thursday, May 22, 2008

The following message was sent to the SEMNET listserve discussion group:

Dear colleagues,

We would like to kindly invite you to the Symposium on Causality 2008, scheduled for July 17th to 19th in Dornburg (near Jena), Germany. The symposium brings together different traditions of analysis of causal effects (regression-based analyses, analyses based on propensity scores, analyses with instrumental variables) to discuss the state-of-the-art in the analysis of causal effects, with a special focus on non-standard designs and problems (missing data, non-compliance, multilevel designs, regression discontinuity designs).

The symposium will be structured along seven focus presentations by leading proponents in different fields of causality research. Each focus presentation will be discussed and supplemented by two invited discussants, followed by an open discussion among all participants. Focus presentations will be given by Donald B. Rubin, Thomas D. Cook, William R. Shadish, Rolf Steyer, Steven G. West, Christopher Winship and Michael E. Sobel.

There will also be ample room for participants to present and discuss their research during the symposium. Participants who want to present their research findings are asked to register for the symposium no later than June 15 and submit a title and an abstract for their presentation together with their registration. The mode of presentation (oral presentation or poster) will be determined by the organization committee depending on the total number and quality of the submissions. Other participants should register no later than June 29.

The registration fee for the symposium is 80 Euros including a daily bus transfer from Jena to Dornburg and refreshments during the conference. You can also register for the conference dinner for an additional 30 Euros. To register, please visit our webpage [English, German], where you can also find additional information about the contents and structure of the symposium. If you have any questions do not hesitate to contact us.

Tuesday, May 6, 2008

The March 2008 issue of Developmental Psychology contains a special series of around 15 methodologically and statistically oriented articles (Table of Contents). Three of the articles explicitly refer in their titles to causal inference, and some of the others may have relevant ideas as well. The three titles mentioning causation are as follows:

At this stage, I have only skimmed through these (and other) articles in the issue. The techniques of "instrumental variables" and "matching" have, of course, been around for many years. I will be interested to see in greater depth what new contributions these articles make with such established techniques. Only within the past six months did I first hear the term "propensity score;" in skimming the many articles in the issue that use propensity scores, however, I've learned that this approach, too, has been around for decades!

Causal inference from nonexperimental data clearly is a complex, tricky endeavor. Perhaps it is for this reason that the kinds of techniques discussed in the special series have needed a quarter-century or longer to be absorbed, tested in different contexts, and diffused across disciplines.

Monday, April 7, 2008

Commentary continues to come in on the criteria for causality. This latest note is from Les Hayduk. He has agreed to my reprinting of these lightly edited comments, which he originally posted in full to the SEMNET discussion forum on Saturday, April 5, 2008, with the subject heading: “correlation-causality blog – improvements.” Dr. Hayduk requests that continuing discussion of his comments take place on the SEMNET forum (click here for an introduction to SEMNET). – Alan

I had a look at the blog Alan provided (see below) and found this easily readable, traditional, and in some ways extremely UN-helpful. I will pick up on two of the things that seem standard, but that slant people's thinking in ways that are unhelpful, and hence where I see improvements are possible. The two matters I will take on are: experiments as the supposed benchmark/gold-standard against which SEM is to be evaluated (I doubt this), and the conditions for causality (2 of the 3 traditional conditions are wrong, the third is imprecise).

There is some substantial artificiality of comparing single experiments and single SEM studies, but I skip this for the moment, though I suspect it may eventually become an important matter.

Some failings of experiments: 1) random assignment of cases (say people) to groups should result in the groups being SIGNIFICANTLY different on 5 out of every 100 characteristics, in the long run. (SEM analysis of the experimental data can include potentially problematic variables, to see if they happen to be among the 5%, if the experimenters are not too proud to combine experiments with SEM.)

2) Experiments minimize, but do NOT statistically control for any remaining measurement error. Such control can and should be done by SEM statistics. This is NOT a feature of the experiment, but involves the statistics that could be connected to the experiment. Notice that comparing experiments and SEM is implicitly comparing two different things: the methods, and the statistics that usually go along with the methods.

SEM should be used IN CONJUNCTION WITH experimentation, so I see Alan as (possibly unknowingly) working against the helpful combining of SEM with experimentation.

Within a single experiment the mechanisms of action WITHIN the study are usually NOT well-investigated with experiments, but can be much better done in a single SEM (via inclusion of indicators of the appropriate/anticipated intervening causal structures).

Model testing is LESS well done in experiments than in SEM. SEM has an overall model test, and experiments do not usually have a comparable test (if ANOVA, or regression, or mean-differences are used as the statistical procedures). These procedures provide parameter tests parallel to those in SEM, but they have no parallel to the OVERALL MODEL TEST in SEM. Often experimenters are not even aware that they do not have an overall model test comparable to SEM's overall model test.

Enough on this for now, so I will move to the criterion for causality. Here is a quote from Alan's blog:

…contemporary SEM practitioners would probably be more comfortable with suggestions of causation if the data were collected longitudinally (more specifically, with a panel design, in which the same respondents are tracked over time). Of the three major criteria for demonstrating causality, longitudinal studies are clearly capable of demonstrating correlation and time-ordering; provided that the most plausible “third variable” candidates are measured and controlled for, the approximation to causality should be good…

Time sequence is NOT required for causation. Causes can go “both ways simultaneously” – there are such things as reciprocal causes (e.g. Rigdon, 1995, Multivariate Behavioral Research 30(3): 359-383) and variables can even cause themselves (for example, see my 1996 book chapter 3, or Hayduk, 1994, Journal of Nonverbal Behavior 18:245-260).

Correlation is NOT required. Suppressor effects can result in a variable causing another variable, and yet other parts of the causal system can counteract the causal covariance contribution, so the covariance between the variables is zero. (See Duncan, 1975, Introduction to Structural Equation Models, page 29 [equation for Greek-rho-subscript23] and realize that one term in the equation can be positive and the other of equal-magnitude yet negative.)

[Moderator’s note: See also Dean Keith Simonton’s discussion of suppressor variables and causal inference, in the posting immediately below.]

“Third variable” control should refer to MANY variable control – there can be many common causes, and many correlated causes that influence the two variables, and even reciprocal effects where the jargon of “third variables” is not quite correct. The full causal structure should be attended to, including misplacement of causally downstream variables to upstream locations. The issue here is the full proper causal specification of the model, not something connected to just third variables.

I notice you mentioned Judea Pearl's work. [Interested readers are encouraged to] have a look at the SEMNET archive for the comments Judea Pearl provided to SEMNET some years back [and] discussion of some of Pearl's work in SEM 2003, 10(2):289-311, which was designed to help SEM people understand one part of Pearl's book that directly connects to SEM.
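The first "failing" listed in the comments above, that random assignment should leave groups significantly different on about 5 of every 100 characteristics, is easy to check by simulation. The sketch below is an illustration under simplifying assumptions that are not part of the original comments: the characteristics are independent standard-normal variables (real traits are usually correlated), the two groups have 100 cases each, and "significant" means a two-sided large-sample test at the .05 level (|z| > 1.96).

```python
import math
import random


def mean_and_variance(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, var


def share_significant(n_characteristics=2000, n_per_group=100, seed=2):
    """Share of characteristics on which two randomly formed groups differ at p < .05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_characteristics):
        # Both groups are drawn from the same population, as under random assignment.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        mean_a, var_a = mean_and_variance(a)
        mean_b, var_b = mean_and_variance(b)
        z = (mean_a - mean_b) / math.sqrt(var_a / n_per_group + var_b / n_per_group)
        if abs(z) > 1.96:
            hits += 1
    return hits / n_characteristics


share = share_significant()
print(f"share of characteristics with a 'significant' group difference: {share:.3f}")
```

The share should hover around .05, which is simply the false-positive rate of the test. Randomization guarantees balance only on average across many hypothetical assignments, not in any one realized sample.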

Sunday, April 6, 2008

In response to some of my recent posts, I received a nice note from Dean Keith Simonton, in which he applied the concept of suppressor variables to the three traditional criteria for causality. With Dean's permission, here is his comment. -- Alan

I was reading your posting when I came across the following page, where you state that there are three criteria for the inference of causality, the first being correlation. Correlation is specified as a necessary but not sufficient standard for causal inference.

This I believe is incorrect. When I teach causal modeling I emphasize a paradoxical version of the commonplace statement that "correlation does not prove causation," namely that "no correlation does not prove no causation." Both are equally true.

The problem is this: Not only can third variables generate spuriously non-zero correlations but they can also generate spuriously zero correlations. Only after we control for these attenuating effects will we discover that the (partial) correlation (or regression coefficient) is actually non-zero. Not only can this happen, but we even have a name for this consequence: suppression. Third variables that enlarge rather than reduce the association between two variables are suppressor variables.

Admittedly, suppression is often seen as something to be avoided. This is especially true when suppression yields standardized partial regression coefficients that are greater than one (or less than minus one) or when the relationship between the two variables changes sign (e.g., from a significantly positive correlation to a significantly negative beta). Often such effects can be seen as artifacts of poor measurement or design (e.g., excessive collinearity among the independent variables measured by tests with numerous shared items).

Yet it can also happen that suppression leads to a superior understanding of the underlying causal process. Sometimes the true model operates in such a fashion that it produces a zero bivariate correlation between two variables that are actually causally related. In such instances, no correlation does not prove no causation.

The example I use in class is the equation I've been developing over the years to predict the greatness assessments of US presidents.* It turns out that one of the best predictors in a 6-variable multiple regression equation is whether or not a president was assassinated while in office. Yet assassination does not have a significant zero-order correlation. How can this be? Well, another major predictor of leader performance is duration of tenure in office, and this variable quite understandably has a negative correlation with assassination. On the average, assassinated presidents have shorter tenures. So the positive association between tenure duration and the global leadership assessment masks the positive impact of assassination. Only when both are put into the same equation will the causal impact of assassination emerge. In addition, the predictive power of tenure duration is increased because its true effect size is no longer obscured by assassination. In the literature, this is sometimes called "cooperative suppression" (a term that seems inappropriate in the current example!).

I could give other empirical illustrations, but the foregoing should suffice. Two variables can have a causal relation even in the absence of a non-zero correlation. Zero-order correlations can be spuriously small as well as spuriously large. This outcome is especially likely in the complex causal networks that likely underlie real-world phenomena. Hence, the three conditions for causal inference from correlational data are misspecified. They probably reduce to two: temporal priority and a non-zero correlation after controlling for all reasonable third variables.

*The original 6-variable equation was published in Simonton, D.K. (1986). Presidential personality: Biographical use of the Gough Adjective Check List. Journal of Personality and Social Psychology, 51, 149-160. An update of the entire research program will appear in Simonton, D.K. (in press). Presidential greatness and its socio-psychological significance: Individual or situation? Performance or attribution? In C. Hoyt, G. R. Goethals, & D. Forsyth (Eds.), Leadership at the crossroads: Psychology and leadership (Vol. 1). Westport, CT: Praeger.
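The suppression pattern described in the comments above (and in the later post on zero covariance despite causation) can be reproduced with simulated data. This sketch does not use the presidential data or the published equation; the three variables are continuous stand-ins and the coefficients are invented so that the two pathways cancel exactly. Here x plays the role of the suppressed cause (assassination in the example above), z the correlated predictor (tenure), and y the outcome (rated greatness).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def correlation(a, b):
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


def two_predictor_ols(x, z, y):
    """Coefficients on x and z from regressing y on both (with intercept)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    bx = (szz * sxy - sxz * szy) / det
    bz = (sxx * szy - sxz * sxy) / det
    return bx, bz


def simulate(n=20000, seed=3):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # x lowers z, and z raises y, so the indirect path is 1.0 * (-0.5) = -0.5 ...
    z = [-0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    # ... which exactly offsets the direct effect of +0.5 from x to y.
    y = [0.5 * xi + 1.0 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    return x, z, y


x, z, y = simulate()
zero_order = correlation(x, y)
coef_x, coef_z = two_predictor_ols(x, z, y)
print(f"zero-order correlation of x with y: {zero_order:.3f}")
print(f"coefficient on x controlling for z: {coef_x:.3f}")
```

Under these assumptions the zero-order correlation between x and y should be close to zero, yet the coefficient on x with z in the equation should recover the direct effect of about 0.5. The exact cancellation is contrived; in real data the offsetting paths would rarely balance this neatly, but partial masking works the same way.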

Monday, March 31, 2008

Following up on Part I, we now look at a recent empirical study whose discussion provides what I consider a particularly thoughtful exposition on causal inference with longitudinal survey data.

As someone who studies adolescent drinking and how it may be affected by parental socialization, I took notice of an article by Stephanie Madon and colleagues entitled “Self-Fulfilling Prophecy Effects of Mothers’ Beliefs on Children’s Alcohol Use…” (Journal of Personality and Social Psychology, 2006, vol. 90, pp. 911-926).

The substantive findings of longitudinal relations between mothers’ beliefs at earlier waves and children’s drinking at later waves were interesting. However, what really stood out to me was the thoughtful discussion near the end of the article about which criteria for causality the longitudinal correlational design could and could not satisfy, and what other lines of argument could be marshaled on behalf of causal inferences. Within the larger Discussion section, Madon and colleagues’ consideration of causal inference appears under the heading “Interpreting Results From Naturalistic Studies.”

Madon et al.’s discussion notes the major thing their design does: “…rule out the possibility that the dependent variable exerted a causal influence on the predictor variable because measurement of the predictor variable is temporally antecedent to changes in the dependent variable.” It also acknowledges the major limitation of the design, possible unmeasured third variables: “…the potential omission of a valid predictor raises the possibility that mothers based their beliefs on a valid predictor of adolescent alcohol use that was not included in the model. If this were to occur,… [mothers’] self-fulfilling effects would be smaller than reported.”

The authors then proffer “several reasons why we believe that the self-fulfilling prophecy interpretation is more compelling” than a third-variable account.

First, Madon and colleagues claim, their preferred interpretation is consistent with established experimental findings of self-fulfilling prophecies. “Although this convergence does not prove that our results reflect self-fulfilling prophecies, confidence in the validity of a general conclusion increases when naturalistic and experimental studies yield parallel findings.”

Second, the authors cite the breadth of “theoretically and empirically supported predictors” they used as control variables. This step, they claim, reduces the chance of spurious, third-variable causation.

Third, the authors note that maternal beliefs exerted greater predictive power over time, whereas the non-maternal predictors explained less variance over time. The extension of this argument, as I understand it, would then be to ask, How likely is it that any as-yet-unmeasured, potential third variable tracks perfectly with maternal beliefs and not at all with the predictors that served as control variables?

Through both their empirical findings and their citation of the broader literatures on adolescent drinking (for their control variables) and self-fulfilling prophecies (for the experimental foundation), Madon and colleagues provide a great deal of evidence that is consistent with a parenting effect.

Depending on one’s perspective, to use a football analogy, one might conclude that the Madon et al. study has advanced the ball to the opponents’ 40-yard line, or 20-yard line, or even 1-yard line. But most would probably agree that it takes a true experiment to get the ball into the end zone.

Wednesday, March 26, 2008

For more than a decade, I've taught a course on structural equation modeling (SEM). The technique, nearly always used with survey data (although in theory also applicable to experimental data), involves drawing shapes to represent variables and arrows between the shapes to represent the researcher's proposed flow of causation between the variables. Regression-type path coefficients are then generated to assess the strength of the hypothesized relations. Though the unidirectional arrows in the diagrams and the tone often used in reports of SEM analyses (e.g., "affected," "influenced," "led to") imply causality, such an inference cannot, of course, be supported with non-experimental data. This is especially true of cross-sectional data (i.e., where all variables are measured concurrently). As will be discussed later, causal inferences can be supported to a greater degree -- though not completely -- with longitudinal data.

It is in this context that I read the chapter on SEM in Judea Pearl's (2000) book Causality: Models, Reasoning, and Inference. Pearl, a UCLA professor with expertise in artificial intelligence, logic, and statistics, writes at a level that is, quite frankly, well over my head. I did find his SEM chapter relatively accessible, however, so that is the part I will discuss.

Pearl's apparent thesis in this chapter is that contemporary SEM practitioners are too quick to dismiss the possibility of drawing causal inferences from the technique (much as I did in my opening paragraph). Early on, in fact, Pearl writes that, "This chapter is written with the ambitious goal of reinstating the causal interpretation of SEM" (p. 133).

Pearl reviews a number of writings on SEM that he feels, "...bespeak an alarming tendency among economists and social scientists to view a structural equation as an algebraic object that carries functional and statistical assumptions but is void of causal content" (p. 137). He further notes:

The founders of SEM had an entirely different conception of structures and models. Wright (1923, p. 240) declared that "prior knowledge of the causal relations is assumed as prerequisite" in the theory of path coefficients, and Haavelmo (1943) explicitly interpreted each structural equation as a statement about a hypothetical controlled experiment. Likewise, Marschak (1950), Koopmans (1953), and Simon (1953) stated that the purpose of postulating a structure behind the probability distribution is to cope with the hypothetical changes that can be brought about by policy.

An interpretation of the above paragraph that would make the causal interpretation of SEM defensible, in my view, would be as follows: If the directional relations one is modeling with non-experimental (survey) data have previously been demonstrated through experimentation, then SEM can be a useful tool for estimating quantitatively how much of an effect a policy change could have on some outcome criterion. It's when we start talking about "assumed" or "hypothetical" experimental support that things get dicey.

As I noted above, however, contemporary SEM practitioners would probably be more comfortable with suggestions of causation if the data were collected longitudinally (more specifically, with a panel design, in which the same respondents are tracked over time). Of the three major criteria for demonstrating causality, longitudinal studies are clearly capable of demonstrating correlation and time-ordering; provided that the most plausible "third variable" candidates are measured and controlled for, the approximation to causality should be good (these latter two linked documents are from my research methods lecture notes).

MacCallum and colleagues (1993, Psychological Bulletin, 114, 185-199), while acknowledging some limitations to longitudinal studies, succinctly summarize why they're useful: "When variables are measured at different points in time, equivalent models in which effects move backward in time are often not meaningful" (p. 197).

In the upcoming Part II, we highlight a recent empirical study whose discussion provides a particularly thoughtful exposition on causal inference with longitudinal survey data.

Friday, March 7, 2008

As is taught in beginning research methods courses, the only technique that allows for causal inference is the true experiment, and the linchpin of experimentation that permits such inference is random assignment. In the social and behavioral sciences, experiments typically are conducted in university laboratories, with established subject pools to ensure the availability of participants.

Scientists lacking such resources may thus have a hard time conducting experiments, even if they wanted to. Others may simply develop a preference for survey research or other non-experimental methods such as archival research and content analysis; such techniques generally cannot be used to assess causality, but offer potential advantages in terms of mapping onto more natural, realistic situations encountered in daily life. I myself, for whatever reason, gravitated to survey research over the years, even though laboratory experimentation was a major part of my graduate-school experience.

Now, however, even scholars lacking an affinity for the lab may have opportunities to address causation in their research. In what appears to be a growing trend, clever researchers are noticing random assignments in real-world settings and seizing upon them to conduct causal studies from afar.

As one example, University of Durham anthropologists Russell Hill and Robert Barton realized that in Olympic "combat" sports such as boxing and wrestling, competitors are assigned at random to wear either red or blue outfits. The finding that red-clad participants won more often than did their blue counterparts can thus be interpreted causally (although showing that outfit color has some causal effect does not necessarily mean that the effect is large).
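
To see why the random assignment matters so much in an example like this, here is a minimal simulation sketch in Python. Everything in it is hypothetical: the function names, the unobserved "skill" variable, and all of the numbers are illustrative assumptions of mine, not data or methods from Hill and Barton's study. The true effect of outfit color is deliberately set to zero, so any red-versus-blue difference that appears when competitors choose their own color is spurious.

```python
import random
from statistics import mean


def estimate_effect(assign, n=20000, true_effect=0.0, seed=1):
    """Return mean(outcome | red) - mean(outcome | blue) for simulated competitors.

    `assign` is a function deciding who wears red. All values are hypothetical.
    """
    rng = random.Random(seed)
    red, blue = [], []
    for _ in range(n):
        skill = rng.gauss(0, 1)  # unobserved "third variable"
        wears_red = assign(skill, rng)
        # The outcome depends on skill, on color (only via true_effect), and on noise.
        outcome = skill + true_effect * wears_red + rng.gauss(0, 1)
        (red if wears_red else blue).append(outcome)
    return mean(red) - mean(blue)


def self_selected(skill, rng):
    # Assumption: more skilled competitors tend to choose red for themselves.
    return skill + rng.gauss(0, 1) > 0


def randomized(skill, rng):
    # A coin flip decides the color, so skill cannot influence the assignment.
    return rng.random() < 0.5


if __name__ == "__main__":
    print("self-selected difference:", round(estimate_effect(self_selected), 3))
    print("randomized difference:   ", round(estimate_effect(randomized), 3))
```

Under these assumptions, the self-selected comparison shows a sizable red-versus-blue gap even though color does nothing, whereas the randomized comparison hovers near zero. That is the sense in which a naturally occurring coin flip lets a difference in outcomes be read causally.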

As another example, readers of the 2005 book Freakonomics (by Steven Levitt and Stephen Dubner) may remember Levitt and colleagues' drawing upon the Chicago Public Schools' use of random assignment in the district's school-choice program when more students wanted to go to a particular school than could be accommodated. As Levitt and Dubner wrote (p. 158):

In the interest of fairness [to applicants of the most competitive schools], the CPS resorted to a lottery. For a researcher, this is a remarkable boon. A behavioral scientist could hardly design a better experiment in his laboratory...

The random aspect thus allowed the researchers to make the causal conclusion that:

...the students who won the lottery and went to a "better" school did no better than equivalent students who lost the lottery and were left behind.

In yet another example, Levitt's fellow University of Chicago faculty member, law professor Cass Sunstein, seized upon the fact that appellate cases within the federal circuit courts are heard by random three-judge panels from the larger pool of judges within each geographic circuit. Does a judge appointed by a Democratic (or Republican) president show variation in how often he or she votes on the bench in a liberal or conservative direction, depending on whether he or she is joined in a case by judges appointed by presidents of the same or opposing party? That is the type of question Sunstein and his colleagues can answer (here and here).

Yale law professor (and econometrician) Ian Ayres, whose 2007 book Super Crunchers I recently finished reading, refers to this phenomenon as "piggyback[ing] on pre-existing randomization" (p. 72). The book contains additional examples of its use.

Thursday, January 31, 2008

Sportswriter Allen Barra has carved out a niche for himself as a contrarian, as evidenced by the title of his 1995 book, That’s not the way it was: (Almost) everything they told you about sports is wrong. Seemingly whenever he gets the chance, Barra likes to take on conventional wisdom in sports.

Viewers of the Super Bowl football game this upcoming Sunday will likely hear the announcers cite statistics purporting to show that some early development in the game may presage a victory by one team over the other. These are the kinds of pronouncements Barra loves to challenge.

One of the lines of thinking he attacked in his book was that, “You need a strong running game to win in pro football.” As noted by Barra, fans often hear a claim of the form: When running back A runs for 100 yards in a game, his team wins some high percentage of the time (see this example involving the running back Ahman Green). Yet, statistical analyses by Barra and colleagues did not appear to show that the better teams rushed the ball better than did the poorer teams. Eventually, Barra reached the following conclusion:

What we finally came to discover was that football people were confusing cause with effect in regard to running the ball…Stated as simply as possible, the good teams weren’t winning so much because Tony Dorsett (or Walter Payton or Roger Craig or whoever) was rushing for 100-plus yards – the runners were getting their 100-plus yards because the teams were winning. Teams with a sizable lead in the second half have the luxury of running far more plays on the ground than their opponents; this not only allows them to avoid sacks and interceptions that could help their opponents get back into the game, it allows them to eat up the clock by keeping the ball on the ground (pp. 173-174).

Among the statistical findings presented by Barra were that, “Most playoff teams led in most of their regular season games by halftime,” and that, “Most playoff teams get as much as two-thirds of their rushing yards in the second half when they already have a lead…” (p. 174).

Several years after the publication of Barra’s book, in 2003, I attended an informal gathering of academics and sportswriters in Scottsdale, Arizona to discuss the application of statistics and research methodology to sports decision-making. Another possible case of football spuriosity that came up was the likely correlation between throwing interceptions and losing games. Were such a correlation to be confirmed, many observers would probably interpret it as the throwing of interceptions causing a team to lose (i.e., by giving the opponent good field position and/or killing one’s own drives). Following the same logic as in the above example, it could be that a team falls behind for reasons having nothing to do with interceptions, but once behind, throws a lot of risky passes in an attempt to catch up, which get… intercepted!
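
The reverse-causation logic in the rushing example can be illustrated with a small simulation sketch in Python. The setup is entirely hypothetical (the function names, yardage figures, and score distributions are my own illustrative assumptions, not Barra's data): rushing yards never enter the formula for the final score, yet teams that are ahead at halftime run more in the second half.

```python
import random


def simulate_games(n_games=20000, seed=0):
    """Simulate (rushing_yards, won) pairs in which rushing has no effect on winning."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        halftime_lead = rng.gauss(0, 10)  # points ahead (+) or behind (-) at halftime
        first_half_rush = rng.gauss(50, 15)
        # Assumption: a team with a lead keeps the ball on the ground in the second half.
        second_half_rush = rng.gauss(70 if halftime_lead > 0 else 40, 15)
        rushing = first_half_rush + second_half_rush
        # The final margin depends only on the halftime lead plus noise, never on rushing.
        final_margin = halftime_lead + rng.gauss(0, 10)
        games.append((rushing, final_margin > 0))
    return games


def win_rate(games, predicate):
    """Share of games won among those whose rushing total satisfies `predicate`."""
    selected = [won for rushing, won in games if predicate(rushing)]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    games = simulate_games()
    print("win rate with 100+ rushing yards:", round(win_rate(games, lambda r: r >= 100), 3))
    print("win rate with under 100 yards:   ", round(win_rate(games, lambda r: r < 100), 3))
```

Even though rushing plays no causal role in this toy world, games with 100 or more rushing yards are won noticeably more often than the rest, simply because leading produces both the rushing yards and the win. The same structure would apply to interceptions, with the sign flipped: trailing produces both the risky passes and the loss.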

The book traces roughly 150 years of history of how prominent statisticians developed concepts to help solve practical problems. Of greatest relevance to this blog is Chapter 18, “Does Smoking Cause Cancer?” Given the inability to conduct true, random-assignment experiments on human subjects, epidemiological researchers were left in the 1950s and ’60s with a variety of case-control (retrospective) studies, prospective studies, and studies of animals and tissue cultures.

Arguably the central figure in the book is Sir Ronald A. Fisher (1890-1962). A prolific contributor to methodology and statistics (e.g., experimental design, analysis of variance and covariance, degrees of freedom, time-series analyses, sample-to-population inference, maximum likelihood), Fisher is portrayed as a fervent skeptic of a causal connection between tobacco and lung cancer.

As part of his historical review, Salsburg writes:

Fisher was dealing with a deep philosophical problem – a problem that the English philosopher Bertrand Russell had addressed in the early 1930s, a problem that gnaws at the heart of scientific thought, a problem that most people do not even recognize as a problem: What is meant by “cause and effect”? Answers to that question are far from simple (p. 183).

Salsburg then reviews proposals (and their failures) for conceptualizing cause and effect, including the use of symbolic logic, and “material implication” (suggested by Russell and elaborated by Robert Koch).

Each study is flawed in some way. For each study, a critic can dream up possibilities that might lead to bias in the conclusions. Cornfield and his coauthors assembled thirty epidemiological studies run before 1958… As they point out, it is the overwhelming consistency across these many studies, studies of all kinds, that lends credence to the final conclusion. One by one, they discuss each of the objections. They consider [Mayo Clinic statistician Joseph] Berkson’s objections and show how one study or another can be used to address them… (p. 190).

Cornfield and colleagues did this for other critics’ objections, as well.

Another research milestone Salsburg cited, which looks interesting to me, is the development of a set of criteria for matching in case-control studies, attributed to Alvan Feinstein and Ralph Horwitz. These authors’ contributions are cited in this 1999 review of epidemiologic methods by Victor J. Schoenbach.

Tuesday, January 15, 2008

Welcome to our new blog on correlation and causality. For four academic years (2003-04 to 2006-07 inclusive) we both taught research methods in the Department of Human Development and Family Studies (HDFS) at Texas Tech University. Bo has now moved to Penn State. Alan, who arrived at Texas Tech for the 1997-98 year, remains there.

Each of us has used a “validities” framework in teaching our methods courses (cf. Campbell & Stanley, 1963; Cook & Campbell, 1979), where internal validity deals with causal inference, external validity with generalizability, and construct validity with whether a test measures what it purports to measure (Linda Woolf has a page that concisely summarizes the different types of validity).

Bo once said that, with reference to academics at least, internal validity is what gets him up in the morning and motivates him to come in to work. Alan’s passion is perhaps more evenly split between external validity (on which he operates another website) and internal validity.

On this blog, we seek to raise and discuss various issues pertaining to correlation and causality, much like we did during our frequent conversations at Texas Tech. In fields that study human behavior in “real world” settings, many potentially interesting phenomena are off-limits to the traditional experimental design that would permit causal inferences, for practical and ethical reasons.

Does the birth of a child increase or decrease couples’ marital/relationship satisfaction? Does growing up with an alcohol-abusing parent damage children’s development of social skills? How does experiencing a natural disaster affect residents’ mental and physical health?

For none of these questions could researchers legitimately assign individuals (or couples) at random to either receive or not receive the presumed causal stimulus. Much of our discussion, therefore, will be aimed at formulating ideas for how to make as strong a causal inference as possible, for a given research question.

By raising issues of how researchers might approach a given research question from the standpoint of internal validity, we hope to fulfill a “seeding” process, where our initial commentaries will be generative of further discussion and suggestions. We are thus permitting (and encouraging!) comments on this blog, for this purpose. We hope to learn as much (or more) from you, as you might learn from us.

In addition, we’ll write about stories in the news media that raise causal questions and review scholarly articles and books that do the same.

We recognize that issues of causality are implicated in a wide variety of academic disciplines. At the outset at least, we will probably stick closely to fields such as psychology, sociology, and HDFS. Later on, we hope to expand into other domains such as philosophy and legal studies (within the law, many states have homicide or wrongful death statutes with wordings that allude to situations in which someone “causes the death” of another).

We invite you to visit this blog often and chime in with comments when the feeling strikes. Requests from readers to write lead essays as guest contributors will be considered (or we may even invite some of you to do so).