
Appraising Evidence on Program Effectiveness: Do Virginity Pledges Cause Virginity? (original critique)

In a recent publication on appraising evidence on program effectiveness (Constantine & Braverman, 2004), we used the Bearman and Brueckner (2001) virginity pledge study to illustrate the problem of dubious claims of program effectiveness that are erroneously based on non-experimental correlational data. After reviewing the logical and statistical basis of the authors' claims regarding the effect of virginity pledging on delaying sexual intercourse, we concluded that "... in considering the original question--Do virginity pledges cause initiation of sexual intercourse to be delayed?--the answer remains that they might or might not. This particular study adds little or nothing to our knowledge of this wished-for effect" (p. 241).

The following text is the full
critique excerpted from Constantine and Braverman (2004).
For comments and questions about this critique, and our
replies, see the critique
FAQ.

Causation, Correlation, and
Alternative Explanations

The essential nature of causation and the types of
evidence necessary to demonstrate a causal relation, such
as the effect of a program on an outcome, have been long
debated (e.g., Bunge, 1979; Mackie, 1980; McKim and
Turner, 1997). In spite of these ongoing debates, it is
safe to say that a pragmatic view of causation is most
appropriate to intervention effectiveness studies. Most
program outcomes of interest are the result of numerous
and interacting causes--some that are potentially
changeable (such as home environment) and some that are
much less so (such as genetic influences). What we expect
of the best interventions is to partially influence some
outcomes, under specific conditions and circumstances, by
modifying one or more causal factors. But how do we know
when this has happened?

According to the nineteenth-century philosopher John
Stuart Mill, at least three criteria must be invoked in
justifying causal claims: (1) association (or
correlation--the cause is related to the effect), (2)
temporality (the cause comes before the effect), and (3)
elimination of plausible alternative explanations (other
plausible explanations for an effect are considered and
ruled out). The key is that all three are necessary, yet
sometimes the second and often the third of these
criteria are neglected in the design and interpretation
of evaluation studies. And even when the third criterion
is explicitly addressed by the evaluation, it is often
arguable whether or not a sufficient number of the most
likely plausible explanations have been considered.

It is widely recognized that correlation does not
necessarily imply causation, yet erroneous causal
attributions are commonly made based on association or
correlation alone. Consider the potential conclusion that
adolescents' levels of psychological attachment to their
families are a cause of observed differences in problem
behavior levels, based on a correlation between these two
variables. Although this conclusion might in fact be
valid, the correlation alone does not provide sufficient
supportive evidence for its validity; any number of
alternative explanations could fit the observed
relationship. For example, lower levels of problem
behavior might strengthen family attachment rather than
the other way around. Or a third factor, such as patterns
of parental conflict, might independently influence both
attachment and problem behavior.

When both of Mill's first two conditions (association
and temporality) hold, it can be even more tempting to
erroneously infer causation without considering other
plausible explanations. As an example, consider the
National Longitudinal Study of Adolescent Health
(commonly known as the Add Health study) (Resnick and
others, 1997). This large correlational study has
yielded innumerable associations among adolescent
behaviors, background conditions, health outcomes, and
other factors. And because it was longitudinal, involving
linked measurements over time from the same participants,
some of these associations have been examined for the
temporality expected for a cause-and-effect relation. Yet
little effort has been invested in addressing the third
critical criterion for causality: identifying and ruling
out plausible alternative explanations.

A compelling illustration is provided by a widely
publicized Add Health study conclusion that virginity
pledge programs "cause virginity," that is, delay
initiation of sexual intercourse (Bearman and Brueckner,
2001). Complex statistical methods, such as survival
analysis and logistic regression, were used to reach this
conclusion. Several qualifications regarding the program
setting were appropriately discussed, most notably that
to have an effect, the pledge must occur in a community
of other pledgers that is neither too small nor too large
relative to the total student population in the school.
The authors, however, neglected sufficient consideration
of plausible alternative explanations. Foremost among
these would be the possibility that a pre-existing
disinclination to initiate sex might have been a primary
causal factor behind both signing the pledge and delaying
intercourse. If true, this alternative explanation
implies that signing the virginity pledge serves as a
marker to identify those youth who delay intercourse for
any number of other reasons and that, in the absence of
pledging, the pattern of sexual initiation would be
largely unchanged. This alternative arises from the
likelihood of a strong self-selection effect, meaning
that participants determine for themselves whether they
will be part of the intervention group (in this case,
those who pledged) or the control group (those who did
not). It is likely that pre-existing differences between
those who chose to pledge and those who didn't--most
notably differences in the intent to delay
intercourse--are not only related to the intervention
group assignment but are arguably among its most
important determinants.
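The self-selection problem can be made concrete with a small simulation. The sketch below is purely hypothetical--it is not the authors' analysis and its numbers are not the study's data. It assumes only that a single latent disinclination to initiate sex raises both the probability of pledging and the probability of delaying intercourse, while pledging itself has zero causal effect. A naive comparison of pledgers to nonpledgers nonetheless shows a large apparent "pledge effect."

```python
import random

random.seed(1)

def simulate(n=100_000):
    """Simulate youth whose latent disinclination to initiate sex
    drives both pledging and delayed intercourse. Pledging itself
    has no causal effect in this model."""
    delayed_pledge, delayed_nopledge = [], []
    for _ in range(n):
        disinclination = random.random()            # latent trait, 0..1
        pledges = random.random() < disinclination  # more disinclined -> more likely to pledge
        delays = random.random() < disinclination   # same trait drives delay; pledge plays no role
        (delayed_pledge if pledges else delayed_nopledge).append(delays)
    return (sum(delayed_pledge) / len(delayed_pledge),
            sum(delayed_nopledge) / len(delayed_nopledge))

p_pledge, p_nopledge = simulate()
print(f"delay rate, pledgers:    {p_pledge:.2f}")   # ~0.67
print(f"delay rate, nonpledgers: {p_nopledge:.2f}") # ~0.33
```

The roughly two-to-one difference in delay rates arises entirely from who chooses to pledge, not from any effect of pledging--exactly the alternative explanation the pledge study failed to rule out.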

A statistical adjustment procedure intended to remove
the effect of self-selection was described in an appendix
to the article, but this procedure was both logically and
statistically inadequate (see Pedhazur and Schmelkin,
1991, pp. 295-296 for a discussion on the futility of
this type of adjustment). Instead, the researchers'
conclusions regarding a pledge effect suggest a criterion
for causality of post hoc ergo propter hoc ("after
this, therefore because of this"), a fundamental fallacy
of logic, known since classical times, that involves
inferring a causal relation on the basis of correlation
and temporality alone.

This study, nevertheless, has generated extensive
media coverage and policy discussion (see, for example,
Boyle, 2000; Nesmith, 2001; Schemo, 2001; Willis, 2001)
and has had a substantial influence on federal policy
about sexuality education. Prior to this study, the U.S.
Department of Health and Human Services had required as
performance measures for the evaluation of
federally-funded abstinence education programs "the
proportion of program participants who have engaged in
sexual intercourse" and the birth rate of female program
participants (Federal Register, 2000). Two years later,
on the heels of extensive media attention to Bearman and
Brueckner's (2001) study, these sexual behavior and birth
rate measures were replaced with the "proportion of youth
who commit to abstain from sexual activity until
marriage" (Department of Health and Human Services,
2002). Thus, virginity pledging has become the primary
behavioral outcome to be measured.

If one reads the various critiques and summaries of
the pledge study and its conclusions, it is remarkable to
find no mention of the obvious plausible alternative
explanation of a pre-existing disinclination among
pledgers. Instead, the critiques tend to focus on the
limited conditions under which the intervention is
believed to be effective and the negative side effects
observed (for example, that pledgers who break the pledge
were less likely to use contraception than nonpledgers).
Yet in considering the original question--Do virginity
pledges cause initiation of sexual intercourse to be
delayed?--the answer remains that they might or might
not. This particular study adds little or nothing to our
knowledge of this wished-for effect.

-- (from pages 239-241)

Experimental and
Quasi-Experimental Designs

The pledge study example sets the stage for a brief
review of experimental and quasi-experimental study
designs. A more comprehensive overview of this topic is
provided by Reichardt and Mark (1998), and a definitive
coverage can be found in Shadish, Cook, and Campbell
(2002). The critical missing design element in the pledge
study was a controlled manipulation, that is, random or
other controlled assignment of the pledge program to some
schools or classrooms and not others. Random assignment
would be characteristic of a true experimental design,
whereas nonrandom assignment strategies could be part of
a quasi-experimental design. By contrast, the virginity
pledge study design was purely correlational, in that no
manipulation of intervention delivery across schools,
classrooms, or other units took place. With a good
experimental or quasi-experimental design, the plausible
alternative explanation for the claimed pledge effect could
have been ruled out or rendered unlikely, and then the
potential effectiveness of pledging could have been
examined more appropriately. The admonition "no causation
without manipulation" (commonly attributed to Paul
Holland) might be somewhat exaggerated for effect, yet it
is a useful heuristic for raising a red flag whenever one
encounters claims of program effects based on
self-selected participation in an intervention program.
Correlational designs do have a variety of appropriate
and important uses, such as developing hypotheses to be
tested in subsequent studies. However, their utility in
eliminating plausible alternative explanations is
limited.
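Why controlled manipulation helps can likewise be sketched with hypothetical numbers (again, an illustration, not the study's data). If assignment to a null "pledge program" is random, the latent disinclination that drives delay is balanced across groups, so a naive group comparison correctly shows no effect:

```python
import random

random.seed(2)

def simulate(n=100_000):
    """Randomly assign a 'pledge program' with zero causal effect.
    Random assignment is independent of the latent disinclination
    that drives delay, so the groups are comparable."""
    delayed = {True: [], False: []}
    for _ in range(n):
        disinclination = random.random()           # latent trait, 0..1
        assigned = random.random() < 0.5           # random assignment, independent of the trait
        delays = random.random() < disinclination  # program has no causal effect
        delayed[assigned].append(delays)
    return (sum(delayed[True]) / len(delayed[True]),
            sum(delayed[False]) / len(delayed[False]))

p_treat, p_control = simulate()
print(f"delay rate, assigned:     {p_treat:.2f}")
print(f"delay rate, not assigned: {p_control:.2f}")
```

Both rates come out essentially equal: randomization severs the link between the confounding trait and group membership, which is precisely what self-selected participation cannot do.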

-- (from pages 241-242)

Transparency and
Accessibility

A preliminary draft of the virginity pledge study
discussed earlier was released by the authors in July
2000--six months prior to its official publication date
(January 2001) in the American Journal of Sociology.
However, this printed journal was not mailed to libraries
or subscribers until June of 2001, creating, in effect, a
one-year interval in which the report was extensively
discussed in the national media and among policy
advocates on both sides of the issue, yet without access
to the final published version of the article. Many
discussants were limited to commenting on the press
release or the few selected details reported in the
media.