There are many different types of quasi-experimental designs that have a variety of
applications in specific contexts. Here, I'll briefly present a number of the more
interesting or important quasi-experimental designs. By studying the features of these
designs, you can come to a deeper understanding of how to tailor design components to
address threats to internal validity in your own research contexts.

The Proxy Pretest Design

The proxy pretest design looks like a
standard pre-post design. But there's an important difference. The pretest in this design
is collected after the program is given! But how can you call it a pretest if it's
collected after the program? Because you use a "proxy" variable to estimate
where the groups would have been on the pretest. There are essentially two variations of
this design. In the first, you ask the participants to estimate where their pretest level
would have been. This can be called the "Recollection" Proxy Pretest Design. For
instance, you might ask participants to complete your measures "estimating how you
would have answered the questions six months ago." This type of proxy pretest is not
very good for estimating actual pre-post changes because people may forget where they were
at some prior time or they may distort the pretest estimates to make themselves look
better. However, there may be times when you are interested not so much in where they were
on the pretest but rather in where they think they were. The recollection proxy pretest
would be a sensible way to assess participants' perceived gain or change.

The other proxy pretest design uses archived records to stand in for the pretest. We
might call this the "Archived" Proxy Pretest design. For instance, imagine that
you are studying the effects of an educational program on the math performance of eighth
graders. Unfortunately, you were brought in to do the study after the program had already
been started (a too-frequent case, I'm afraid). You are able to construct a posttest that
shows math ability after training, but you have no pretest. Under these circumstances,
your best bet might be to find a proxy variable that would estimate pretest performance.
For instance, you might use the student's grade point average in math from the seventh
grade as the proxy pretest.
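With an archived proxy pretest in hand, a common analysis strategy is to adjust the posttest comparison for the proxy, in the style of an analysis of covariance. The sketch below is only an illustration: the data are invented, and seventh-grade math GPA is the hypothetical proxy variable.

```python
# Hypothetical analysis for an archived proxy pretest: use 7th-grade math
# GPA as a covariate when comparing posttest math scores. All data invented.
import numpy as np

rng = np.random.default_rng(3)
n = 80
gpa = rng.uniform(1.5, 4.0, n)                      # proxy pretest
treated = rng.integers(0, 2, n)                     # 1 = got the program
post = 10 * gpa + 5 * treated + rng.normal(0, 4, n) # simulated posttest

# ANCOVA via ordinary least squares: post = b0 + b1*gpa + b2*treated.
# The coefficient on the treatment indicator is the proxy-adjusted
# program effect estimate.
X = np.column_stack([np.ones(n), gpa, treated])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
print(f"proxy-adjusted program effect: {beta[2]:.2f}")
```

The adjustment is only as good as the proxy: if seventh-grade GPA is a poor stand-in for true pretest math ability, the estimate will still be biased.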

The proxy pretest design is not one you should ever select by choice. But, if you find
yourself in a situation where you have to evaluate a program that has already begun, it
may be the best you can do and would almost certainly be better than relying on a
posttest-only design.

The Separate Pre-Post Samples Design

The basic idea in this design (and its
variations) is that the people you use for the pretest are not the same as the people you
use for the posttest. Take a close look at the design notation for the first variation of
this design. There are four groups (indicated by the four lines), but the first two come
from one nonequivalent group and the other two come from a second nonequivalent group
(indicated by the subscripts next to N). Imagine that you have two
agencies or organizations that you think are similar. You want to implement your study in
one agency and use the other as a control. The program you are looking at is an
agency-wide one and you expect that the outcomes will be most noticeable at the agency
level. For instance, let's say the program is designed to improve customer satisfaction.
Because customers routinely cycle through your agency, you can't measure the same
customers pre-post. Instead, you measure customer satisfaction in each agency at one point
in time, implement your program, and then measure customer satisfaction in the agency at
another point in time after the program. Notice that the customers will be different
within each agency for the pre and posttest. This design is not a particularly strong one.
Because you cannot match individual participant responses from pre to post, you can only
look at the change in average customer satisfaction. Here, you always run the risk of
nonequivalence not only between the agencies but also within each agency, because the pre
and post groups consist of different people. For instance, if you have different types of clients at
different times of the year, this could bias the results. You could also look at this as
having a proxy pretest on a different group of people.
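Because pre and post respondents cannot be matched, the analysis has to work at the level of group means. One reasonable sketch, with invented satisfaction scores, is a difference-in-differences comparison across the two agencies:

```python
# Hypothetical illustration: analyzing the separate pre-post samples design
# as a difference-in-differences on group means. Satisfaction scores are
# invented; different customers are sampled at pre and post in each agency.
import math
import numpy as np

rng = np.random.default_rng(0)
# Four independent samples: no individual can be matched pre to post.
program_pre  = rng.normal(3.2, 0.8, 100)   # program agency, before
program_post = rng.normal(3.7, 0.8, 100)   # program agency, after
control_pre  = rng.normal(3.1, 0.8, 100)   # comparison agency, before
control_post = rng.normal(3.2, 0.8, 100)   # comparison agency, after

# Program effect estimate: the change in the program agency minus the
# change in the comparison agency (a difference in differences of means).
did = (program_post.mean() - program_pre.mean()) - \
      (control_post.mean() - control_pre.mean())

# Standard error combines the variances of the four independent samples.
se = math.sqrt(sum(x.var(ddof=1) / x.size for x in
                   (program_pre, program_post, control_pre, control_post)))
z = did / se
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
print(f"DiD estimate = {did:.2f}, z = {z:.2f}, p = {p:.4f}")
```

Note that this comparison of means does nothing to rule out the within-agency nonequivalence problem described above; it only quantifies the average change.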

The second example of the separate pre-post
sample design is shown in design notation at the right. Again, there are four groups in
the study. This time, however, you are taking random samples from your agency or
organization at each point in time. This is essentially the same design as above except
for the random sampling. Probably the most sensible use of this design would be in
situations where you routinely do sample surveys in an organization or community. For
instance, let's assume that every year two similar communities do a community-wide survey
of residents to ask about satisfaction with city services. Because of costs, you randomly
sample each community each year. In one of the communities you decide to institute a
program of community policing and you want to see whether residents feel safer and have
changed in their attitudes towards police. You would use the results of last year's survey
as the pretest in both communities, and this year's results as the posttest. Again, this
is not a particularly strong design. Even though you are taking random samples from each
community each year, it may still be the case that the community changes fundamentally
from one year to the next and that the random samples within a community cannot be
considered "equivalent."

The Double Pretest Design

The Double Pretest is a very strong
quasi-experimental design with respect to internal validity. Why?
Recall that the Pre-Post Nonequivalent Groups Design (NEGD) is
especially susceptible to selection threats to internal
validity. In other words, the nonequivalent groups may be different in some way before the
program is given and you may incorrectly attribute posttest differences to the program.
Although the pretest helps to assess the degree of pre-program similarity, it does not
tell us if the groups are changing at similar rates prior to the program. Thus, the NEGD
is especially susceptible to selection-maturation threats.

The double pretest design includes two measures prior to the program. Consequently, if
the program and comparison group are maturing at different rates you should detect this as
a change from pretest 1 to pretest 2. Therefore, this design explicitly controls for
selection-maturation threats. The design is also sometimes referred to as a "dry
run" quasi-experimental design because the double pretests simulate what would happen
in the null case.
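The "dry run" logic can be made concrete with a small sketch. Here the two pretests are simulated (all numbers invented), and the check is simply whether the two groups gained at different rates before the program began:

```python
# Hypothetical check for a selection-maturation threat in a double pretest
# design: compare each group's change from pretest 1 to pretest 2, both of
# which occur before the program. Scores are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 60
# Both groups drift upward by about 2 points between the two pretests here,
# so no differential maturation is built into the simulated data.
program_pre1 = rng.normal(50, 10, n)
program_pre2 = program_pre1 + rng.normal(2, 3, n)
control_pre1 = rng.normal(55, 10, n)
control_pre2 = control_pre1 + rng.normal(2, 3, n)

# "Dry run": the program-vs-control gap in pre-program gains estimates how
# much the groups were already diverging before any treatment was given.
gain_gap = (program_pre2 - program_pre1).mean() - \
           (control_pre2 - control_pre1).mean()
print(f"Pre-program gain difference: {gain_gap:.2f}")
# A gap near zero supports attributing later posttest differences to the
# program rather than to different maturation rates.
```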

The Switching Replications Design

The Switching Replications
quasi-experimental design is also very strong with respect to internal validity. And,
because it allows for two independent implementations of the program, it may enhance
external validity or generalizability. The design has two groups and three waves of
measurement. In the first phase of the design, both groups are pretested, one is given the
program and both are posttested. In the second phase of the design, the original
comparison group is given the program while the original program group serves as the
"control". This design is identical in structure to it's randomized
experimental version, but lacks the random assignment to group. It is certainly
superior to the simple pre-post nonequivalent groups design. In addition, because it
assures that all participants eventually get the program, it is probably one of the most
ethically feasible quasi-experiments.

The Nonequivalent Dependent Variables (NEDV) Design

The Nonequivalent Dependent Variables (NEDV)
Design is a deceptive one. In its simple form, it is an extremely weak design with respect
to internal validity. But in its pattern matching variations, it opens the door to an
entirely different approach to causal assessment that is extremely powerful. The design
notation shown here is for the simple two-variable case. Notice that this design has only a
single group of participants! The two lines in the notation indicate
separate variables, not separate groups.

The idea in this design is that you have a program designed to change a specific
outcome. For instance, let's assume you are doing training in algebra for first-year
high-school students. Your training program is
designed to affect algebra scores. But it is not designed to affect geometry scores. And,
pre-post geometry performance might be reasonably expected to be affected by other
internal validity factors like history or maturation. In this case, the pre-post
geometry performance acts like a control group -- it models what would likely have
happened to the algebra pre-post scores if the program hadn't been given. The key is that
the "control" variable has to be similar enough to the target variable to be
affected in the same way by history, maturation, and the other single
group internal validity threats, but not so similar that it is affected by the
program. The figure shows the results we might get for our two-variable algebra-geometry
example. Note that this design only works if the geometry variable is a reasonable proxy
for what would have happened on the algebra scores in the absence of the program. The real
allure of this design is the possibility that we don't need a control group -- we can give
the program to all of our sample! The problem is that in its two-variable simple version,
the assumption of the control variable is a difficult one to meet. (Note that a double-pretest version of this design would be
considerably stronger).

The Pattern Matching NEDV Design. Although the two-variable NEDV
design is quite weak, we can make it considerably stronger by adding multiple outcome
variables. In this variation, we need many outcome variables and a theory that tells us
how much each variable will be affected by the program (from most to least). Let's reconsider
the example of our algebra program above. Now, instead of having only an algebra and
geometry score, we have ten measures that we collect pre and post. We expect that the
algebra measure would be most affected by the program (because that's what the program was
most designed to affect). But here, we recognize that geometry might also be affected
because training in algebra might be relevant, at least tangentially, to geometry skills.
On the other hand, we might theorize that creativity would be much less affected, even
indirectly, by training in algebra and so our creativity measure is predicted to be least
affected of the ten measures.

Now, let's line up our theoretical
expectations against our pre-post gains for each variable. The graph we'll use is called a
"ladder graph" because if there is a correspondence between expectations and
observed results we'll get horizontal lines and a figure that looks a bit like a ladder.
You can see in the figure that the expected order of outcomes (on the left) is mirrored
well in the actual outcomes (on the right).

Depending on the circumstances, the Pattern Matching NEDV design can be quite strong
with respect to internal validity. In general, the design is stronger if you have a larger
set of variables and you find that your expectation pattern matches well with the observed
results. What are the threats to internal validity in this design? Only a factor (e.g., an
historical event or maturational pattern) that would yield the same outcome pattern can
act as an alternative explanation. And, the more complex the predicted pattern, the less
likely it is that some other factor would yield it. The problem is that the more complex
the predicted pattern, the less likely you are to find that it matches your observed data.

The Pattern Matching NEDV design is especially attractive for several reasons. It
requires that the researcher specify expectations prior to institution of the program.
Doing so can be a sobering experience. Often we make naive assumptions about how our
programs or interventions will work. When we're forced to look at them in detail, we begin
to see that our assumptions may be unrealistic. The design also requires a detailed
measurement net -- a large set of outcome variables and a detailed sense of how they are
related to each other. Developing this level of detail about your measurement constructs
is liable to improve the construct validity of your study. Increasingly, we have
methodologies that can help researchers empirically develop construct networks that
describe the expected interrelationships among outcome variables (see Concept
Mapping for more information about how to do this). Finally, the Pattern Matching NEDV
is especially intriguing because it suggests that it is possible to assess the effects of
programs even if you only have a treated group. Assuming the other conditions for the
design are met, control groups are not necessarily needed for causal assessment. Of
course, you can also couple the Pattern Matching NEDV design with standard experimental or
quasi-experimental control group designs for even more enhanced validity. And, if your
experimental or quasi-experimental design already has many outcome measures as part of the
measurement protocol, the design might be considerably enriched by generating
variable-level expectations about program outcomes and testing the match statistically.

One of my favorite questions to my statistician friends goes to the heart of the
potential of the Pattern Matching NEDV design. "Suppose," I ask them, "that
you have ten outcome variables in a study and that you find that all ten show no
statistically significant treatment effects when tested individually (or even when tested
as a multivariate set). And suppose, like the desperate graduate student who finds in
their initial analysis that nothing is significant, that you decide to look at the
direction of the effects across the ten variables. You line up the variables in terms of
which should be most to least affected by your program. And, miracle of miracles, you find
that there is a strong and statistically significant correlation between the expected and
observed order of effects even though no individual effect was statistically
significant. Is this finding interpretable as a treatment effect?" My answer is
"yes." I think the graduate student's desperation-driven intuition to look at
order of effects is a sensible one. I would conclude that the reason you did not find
statistical effects on the individual variables is that you didn't have sufficient
statistical power. Of course, the results will only be interpretable as a treatment effect
if you can rule out any other plausible factor that could have caused the ordering of
outcomes. But the more detailed the predicted pattern and the stronger the correlation to
observed results, the more likely the treatment effect becomes the most plausible
explanation. In such cases, the expected pattern of results is like a unique fingerprint
-- and the observed pattern that matches it can only be due to that unique source pattern.
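To make the order-of-effects idea concrete, here is a minimal sketch of such a test using Spearman's rank correlation. The ten pre-post gain scores are invented purely for illustration:

```python
# Illustrative pattern-matching test: correlate the theoretically predicted
# ordering of outcomes with the observed pre-post gains. Data are invented.
from scipy import stats

# Theoretical prediction: 10 = most affected (algebra), 1 = least (creativity).
predicted_effect = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
# Hypothetical observed pre-post gains for the same ten measures.
observed_gain = [8.1, 6.9, 7.2, 5.0, 4.4, 4.8, 3.1, 2.5, 2.8, 1.2]

# Spearman's rho measures how well the observed ordering of gains
# matches the predicted ordering of effects.
rho, p = stats.spearmanr(predicted_effect, observed_gain)
print(f"rho = {rho:.2f}, p = {p:.4f}")   # rho = 0.96: a strong pattern match
```

A single strong rank correlation like this can be significant even when none of the ten individual pre-post effects reaches significance on its own.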

I believe that the pattern matching notion implicit in the NEDV design opens the way to
an entirely different approach to causal assessment, one that is closely linked to
detailed prior explication of the program and to detailed mapping of constructs. It
suggests a much richer model for causal assessment than one that relies only on a
simplistic dichotomous treatment-control model. In fact, I'm so convinced of the
importance of this idea that I've staked a major part of my career on developing pattern
matching models for conducting research!

The Regression Point Displacement (RPD) Design

The Regression Point Displacement (RPD)
design is a simple quasi-experimental strategy that has important implications, especially
for community-based research. The problem with community-level interventions is that it is
difficult to do causal assessment, to determine if your program made a difference as
opposed to other potential factors. Typically, in community-level interventions, program
costs preclude our implementing the program in more than one community. We look at
pre-post indicators for the program community and see whether there is a change. If we're
relatively enlightened, we seek out another similar community and use it as a comparison.
But, because the intervention is at the community level, we only have a single
"unit" of measurement for our program and comparison groups.

The RPD design attempts to enhance the single program unit situation by comparing the
performance on that single unit with the performance of a large set of comparison units.
In community research, we would compare the pre-post results for the intervention
community with a large set of other communities. The advantage of doing this is that we
don't rely on a single nonequivalent community; instead, we use results from a
heterogeneous set of nonequivalent communities to model the comparison condition and then
compare our single site to this model. For typical community-based research, such an
approach may greatly enhance our ability to make causal inferences.

I'll illustrate the RPD design with an
example of a community-based AIDS education program. We decide to pilot our new AIDS
education program in one particular community in a state, perhaps a county. The state
routinely publishes annual HIV positive rates by county for the entire state. So, we use
the remaining counties in the state as control counties. But instead of averaging all of
the control counties to obtain a single control score, we use them as separate units in
the analysis. The first figure shows the bivariate pre-post distribution of HIV positive
rates per 1000 people for all the counties in the state. The program county -- the one
that gets the AIDS education program -- is shown as an X and the remaining control
counties are shown as Os. We compute a regression line for the control cases (shown in
blue on the figure). The regression line models our predicted outcome for a county with any
specific pretest rate. To estimate the effect of the program we test whether the
displacement of the program county from the control county regression line is
statistically significant.
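The displacement test can be sketched as a regression of posttest on pretest rates for the control counties, followed by a test of the program county's deviation against the prediction error of the line. All of the county rates below are fabricated for illustration:

```python
# Sketch of an RPD analysis with made-up county HIV rates: regress post on
# pre for the control counties, then test whether the program county is
# significantly displaced from that regression line.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40                                     # number of control counties
pre = rng.uniform(1.0, 6.0, n)             # HIV+ rate per 1000, pretest
post = 0.9 * pre + 0.5 + rng.normal(0, 0.3, n)

program_pre, program_post = 3.5, 2.4       # program county (displaced down)

# Fit the control-county regression line.
slope, intercept, *_ = stats.linregress(pre, post)
resid = post - (slope * pre + intercept)
s = np.sqrt(np.sum(resid**2) / (n - 2))    # residual standard error

# Standard error of a new prediction at the program county's pretest value.
se_pred = s * np.sqrt(1 + 1/n + (program_pre - pre.mean())**2 /
                      np.sum((pre - pre.mean())**2))
displacement = program_post - (slope * program_pre + intercept)
t = displacement / se_pred
p = 2 * stats.t.sf(abs(t), df=n - 2)
print(f"displacement = {displacement:.2f}, t = {t:.2f}, p = {p:.4f}")
```

A significant negative displacement here would suggest the program county's posttest HIV rate fell below what the control counties' pre-post relationship predicts.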

The second figure shows why the RPD design was given its name. In this design, we know
we have a treatment effect when there is a significant displacement
of the program point from the control group regression line.

The RPD design is especially applicable in situations where a treatment or program is
applied in a single geographic or organizational unit (e.g., a state, county, city,
hospital, or hospital unit) rather than to separate individuals, where there are lots of other units available as control
cases, and where there is routine measurement (e.g., monthly, annually) of relevant
outcome variables.