2
Seeking Strong Evidence

The committee’s charge was to consider the scientific evidence on teacher preparation and to design an agenda for the research that is needed to provide the knowledge for improving that preparation. We found many different kinds of evidence that relate to teacher preparation: as we sifted through the available work, we repeatedly confronted questions about evidentiary standards. At times we struggled to agree on whether particular kinds of information constituted evidence and on the sorts of inferences that could be drawn from different kinds of evidence. This chapter describes the issues we identified and our approach to them.

APPROACHES TO RESEARCH DESIGN AND EVIDENCE

Much has been written about the problems of conducting research in education, specifically about the appropriateness of various research designs and methods and ways to interpret their results. In general, we are in agreement with the approach to research in education described in the National Research Council (NRC) (2002a) report Scientific Research in Education. In particular, that report identified six principles that should guide, but not dictate, the design of research in education:

1. Pose significant questions that can be investigated empirically.
2. Link research to relevant theory.
3. Use methods that permit direct investigation of the question.
4. Provide a coherent and explicit chain of reasoning.
5. Replicate and generalize across studies.
6. Disclose research to encourage professional scrutiny and critique.

The application of these principles to questions about teacher preparation poses particular conceptual and empirical challenges:

• There are no well-formed theories that link teacher preparation to student outcomes.

• The complex nature of schooling children makes it difficult to identify empirically the role of teacher preparation among the many intertwined influences on student outcomes.

• The use of strict experimental design principles can be problematic in some educational settings. Teacher candidates are sorted into teacher preparation programs in nonrandom ways, just as beginning teachers are nonrandomly sorted into schools and groups of students; consequently, it is difficult to control for all the important factors that are likely to influence student outcomes.

Improving learning outcomes for children is a complex process. Both common sense and sophisticated research (e.g., Sanders and Rivers, 1996; Aaronson, Barrow, and Sander, 2003; Rockoff, 2004; Rivkin, Hanushek, and Kain, 2005; Kane, Rockoff, and Staiger, 2006) indicate that teachers have enormously important effects on children's learning and that the quality of teaching explains a meaningful proportion of the variation in achievement among children. However, understanding that teachers are important to student outcomes and understanding how and why teachers influence outcomes are very different; our charge required us to think carefully about the evidence of the effects of teacher preparation. Student learning is affected by numerous factors besides teaching, many of which are beyond the control of the educational system. Even the factors that are affected by education policy involve intricate interactions among teachers, administrators, students, and their peers.1

1 We note the progress that has been made in exploring causal relationships in education in new work supported by the Department of Education and in work synthesized by the What Works Clearinghouse (see http://ies.ed.gov/ncee/wwc/ [September 2009]).

Disentangling the role that teachers play in influencing student outcomes is difficult, and understanding the ways in which teacher education influences student outcomes is much more difficult. The design and the delivery of teacher education are connected to outcomes for K-12 students through a series of choices made by teacher educators and by teacher candidates in their roles as students and, later, as teachers. Identifying the empirical effects of teacher preparation on student outcomes poses many of the problems that arise in most social science research, including (1) developing empirical measures of the important constructs, (2) accounting for the heterogeneous behavioral responses of individuals, and (3) addressing the nonrandom assignment of treatments (teacher preparation) in the observable data. As in other social science research, the challenge of developing convincing evidence of the causal relationship between the preparation of teacher candidates and the outcomes of their K-12 students places strong demands on theory, research designs, and empirical models.

Some of these challenges are illustrated in Figure 2-1. Teacher candidates bring certain abilities, knowledge, and experiences with them as they enter teacher preparation programs. These differences likely vary within and across programs. The candidates then experience a variety of learning opportunities as part of their teacher education. Again, these experiences and the resulting knowledge and skills likely vary within and across programs. After completing their training, candidates who pursue teaching likely enter classrooms that vary greatly within and across schools on a variety of dimensions, including the characteristics of students, the curriculum, the school climate, and the neighborhood climate. Each source of variation affects individual student achievement; taken together, they complicate the search for the empirical relationship between teacher education and student outcomes.

[Figure: boxes for prospective teachers (academic ability, prior experiences), teacher preparation (selectivity, intensity, content, timing), state requirements (teacher certification, teacher education program), district policies, schools (leadership, class size, mentoring, salaries, induction, facilities, hiring, learning environment, other teacher policies, professional development), the school workforce (teacher knowledge, skills, and practices), and the student environment (student ability and motivation, peers, family, neighborhood, student attributes), all feeding into student outcomes.]
FIGURE 2-1 A model of the effects of teacher preparation on student achievement.
SOURCE: Adapted from Boyd et al. (2006, p. 159).

We note that establishing the chain of causation is a challenge not only in the field of education. Researchers and policy makers in medicine and in many other social science fields struggle to design studies that yield dependable results when carried out in real-world circumstances and to make sound decisions in the absence of a clear or complete evidentiary base (see, e.g., Sackett et al., 1996; Murnane and Nelson, 2007; Kilburn and Karoly, 2008; Leigh, 2009). Common to many fields is the challenge of thinking systematically about different sorts of evidence related to a question involving complex interactions of human behavior; such evidence may vary not only in strength but also in the mechanisms (or potential causes) about which it provides information. It is also the case that in many social science fields, as well as education, researchers have worked creatively to develop quasi-experimental research designs and other ways of making use of available data in order to examine empirical questions in complex real-world circumstances. Understanding of the nature of scientific evidence in education (and other fields) is evolving, and the diverse methodological backgrounds of the committee members enabled us to consider the issues broadly.2 We considered all of these issues as we weighed different kinds of studies and other available resources related to teacher preparation.

2 Katz and Singer (2007) developed an approach that uses abductive reasoning to integrate complex sets of qualitative and quantitative evidence related to a set of competing hypotheses. This approach integrates the different types of evidence by considering both their relative empirical strength and their relevance to the different hypotheses. The result is a systematic account of the overall strength of the evidentiary base for each of the competing hypotheses, which may clearly identify the hypothesis with the strongest overall explanatory power. This approach was developed in the context of the literature on expert systems and has been applied in military intelligence, medical diagnosis, and molecular biology, among other fields, but has not been widely used in the social sciences (see also Singer, 2008).

CAUSAL EVIDENCE

At the heart of many differences of opinion about the available research on teacher preparation (as on many topics in education) are questions about the strength of causal inferences to be made from it. One important purpose of research on teacher preparation is to provide an empirical basis for changes to policy and practice regarding the structure, content, and timing of teacher preparation, and many research methods can contribute to causal understanding. Causal understanding is built upon a body of research that usually begins with descriptive analysis, empirical efforts to identify correlations, and the development of competing theories of behavior. Refinements and adjustments are made to theoretical and empirical models as alternative explanations are explored. Ultimately, causal understanding rests on a converging body of evidence that includes research designs that support causal inference, such as random assignment of subjects to treatment and control conditions.

Although there has rightly been much focus recently on research that uses random assignment and other methods that produce direct causal evidence, qualitative and other quantitative methods that describe institutions, participants, and outcomes also provide valuable information. Descriptive methods (whether qualitative or quantitative) shed light on the factors and forces that may affect student outcomes. Correlation does not necessarily imply causation, but it may provide useful guidance in ruling out competing alternative explanations. When combined with theory, such methods contribute to the identification and development of research hypotheses and point to methods that can identify a causal relationship between a policy and an outcome. In addition, descriptive analyses can provide information that explains the chain of behaviors that leads to various student outcomes. Because different methods have different strengths and weaknesses, it is important to seek converging evidence that draws on multiple methods. We expand below on the challenges of identifying causal pathways, as well as the strengths and weaknesses of different research methods, in the context of a hypothetical example about the impact of coursework on teachers of mathematics.

The Complexity of Analysis: An Example

Suppose policy makers are interested in understanding whether teacher preparation that includes rigorous mathematics coursework leads to higher mathematics achievement among students. A variety of research designs could be used to explore this relationship. One would be to identify the teacher preparation programs in which math content is more rigorous than in other programs and to compare the math achievement of students taught by graduates of the more rigorous programs with that of students taught by teachers who completed less rigorous programs. However, without careful statistical controls, this simple comparison may well yield a misleading result because the other characteristics of teachers, their students, and schools that influence achievement outcomes may themselves be correlated with teachers' mathematics preparation.

To see how misleading results can occur, consider that parents have control over which schools their children attend; teachers have control over the schools in which they choose to work; and principals have control over the teachers they hire. Each of these choices influences which teachers teach which students, and there is evidence that these choices lead to quite different outcomes across schools in a variety of settings (see, e.g., Betts, Reuben, and Danenberg, 2000; Lankford et al., 2002; Clotfelter et al., 2006; Peske and Haycock, 2006). Teachers who score higher on measures of academic ability and teacher preparation are usually found in schools in which students come from more advantaged backgrounds and score better on achievement tests. This nonrandom sorting of teachers and their characteristics to students with systematically different achievement complicates the identification of the effect of preparation on student outcomes. Thus, one may not be able to draw valid conclusions about the effects of treatments from studies that do not adequately control for these confounding effects.

Researchers sometimes propose the inclusion of readily available measures in a statistical model to control for these kinds of differences, but this approach may not resolve the uncertainty. A variety of other important but very difficult-to-measure characteristics of students, teachers, or schools can also confound efforts to understand the effects of teachers' mathematics preparation. For example, parents who believe that education is very important to their children are likely not only to seek out schools where they believe their children will receive the best education, but also to provide other advantages that improve their children's achievement. Teachers also make choices about how they approach their teaching, and those who make investments in rigorous mathematics preparation may also be more likely to engage in other activities (such as professional development) that support their students' learning. Many teachers find it rewarding to work in schools where students are inquisitive and have good study skills. Principals in these schools will have strong applicant pools for vacancies and will hire the most qualified applicants. As a result, one cannot determine whether better prepared teachers lead to better student outcomes or whether the presence of students who are very likely to succeed is among the school characteristics that attract and retain better prepared teachers. Because rigorous math preparation is correlated with such difficult-to-measure attributes of teachers, students, and schools, isolating the causal effect of rigorous mathematics preparation is very difficult. The difficulty may not be overcome even with the assistance of multiple regression models that control for a long list of readily available variables. This is the challenge that often leads researchers to turn to various other research designs.
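
To make the sorting problem concrete, the following small simulation is a hypothetical sketch: the variables, magnitudes, and data-generating process are all invented for illustration, not drawn from the report. It shows how a naive comparison of means overstates the effect of preparation when better-prepared teachers are sorted toward advantaged students.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # simulated students

# Unmeasured advantage (family resources, school climate, and so on).
advantage = rng.normal(size=n)

# Nonrandom sorting: teachers with rigorous math preparation are more
# likely to work in schools serving already-advantaged students.
p_rigorous = 1 / (1 + np.exp(-1.5 * advantage))
rigorous_prep = rng.random(n) < p_rigorous

# Assume the true effect of rigorous preparation is 0.10 test-score SDs.
TRUE_EFFECT = 0.10
score = TRUE_EFFECT * rigorous_prep + 0.8 * advantage + rng.normal(size=n)

# A naive comparison of group means attributes the advantage gap to
# preparation and lands far above the true 0.10.
naive = score[rigorous_prep].mean() - score[~rigorous_prep].mean()
print(f"true effect:    {TRUE_EFFECT:.2f}")
print(f"naive estimate: {naive:.2f}")
```
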
Randomized and Quasi-Experimental Designs

The strongest case for a causal relationship can often be made when randomization is used to control for differences that cannot be easily measured and controlled for statistically. In the context of our example about the impact of the teachers' mathematics coursework, the ideal case would be to randomly assign teacher candidates to different forms of mathematics coursework and then to randomly assign those teachers to different schools and students. If such a research design were both feasible and implemented well, a strong evaluation of the effectiveness of rigorous math preparation would be possible. When assignment is truly random, any possible effects of variation in the treatment (e.g., in the rigor of mathematics content preparation) will be evident despite variation in all of the other possible influences on student achievement. That is, any observed effect of the treatment on the outcome could be assumed to be a result of the treatment, rather than of other factors that may be unmeasured but are correlated with the treatment. For many research questions and contexts, random assignment experimental designs provide very strong internal validity when implemented well, and they are thus sometimes referred to as the "gold standard" of evaluation techniques.
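
To see what randomization buys, a brief continuation of the hypothetical simulation above (same invented data-generating process) replaces nonrandom sorting with a coin flip; the simple difference in means then recovers the assumed true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
advantage = rng.normal(size=n)

TRUE_EFFECT = 0.10
# Assignment is now a coin flip, independent of the unmeasured advantage.
random_prep = rng.random(n) < 0.5
score = TRUE_EFFECT * random_prep + 0.8 * advantage + rng.normal(size=n)

# The same simple difference in means now recovers roughly 0.10.
estimate = score[random_prep].mean() - score[~random_prep].mean()
print(f"randomized estimate: {estimate:.2f}")
```
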
However, randomized experimental designs have some potential shortcomings. Most significant is that such designs are not feasible or appropriate in some education settings. To continue the example above, experimentally manipulating the extent of mathematics content preparation a teacher receives may be difficult, but it is possible. Randomly assigning those teachers across a wide range of student abilities may prove more difficult. Random assignment is also susceptible to other potentially confounding factors, such as when some parents respond to their perceptions about teacher quality by adjusting other factors, such as their own contributions to student achievement. Also troubling is the challenge of accounting for the important individualized interactions that occur between teachers and students that lead to high student achievement. Finally, many experiments are designed tightly around a particular treatment and counterfactual case (that is, hypotheses about what the circumstances would be without the treatment). If well executed, the experiment provides very strong evidence for this specific question, but typically such experiments have limited ability to generalize to other settings (external validity) (Shadish, Cook, and Campbell, 2002; Morgan and Winship, 2007). In addition, random assignment experiments are expensive and time-consuming to carry out, which places practical constraints on the information this method has been able to provide. In general, random assignment is an important and underused research design in education; however, its strength, the possibility of clearer information about causal relationships, must be weighed against the difficulties in carrying out and generalizing from such studies.
Other research designs, often called quasi-experimental, use observational data to capitalize on naturally occurring variation (Campbell and Stanley, 1963; Cook and Campbell, 1986). These methods use varying approaches to attempt to mimic the ways in which randomized designs control the variation.3 For example, a technique called regression discontinuity analysis, an alternative method developed for the evaluation of social programs and more recently applied to educational interventions, uses statistical procedures to correct for possible selection bias and support an estimate of causal impact (see, e.g., Bloom et al., 2005). Such approaches work well when they can convincingly rule out competing explanations for changes in student achievement that might otherwise be identified as an effect of the factor being investigated. Thus these approaches work best when researchers have a strong understanding of the underlying process (in this case, both the content of teacher preparation and the other forces that shape the relationship between teacher preparation and K-12 student achievement) and can explore the validity of competing explanations (National Research Council, 2002a, p. 113; Morgan and Winship, 2007). Theory and descriptive analysis, including quantitative and qualitative studies, as well as the opinions of experts, all contribute to such an understanding.

3 For a more detailed discussion of quasi-experimental designs, see Shadish, Cook, and Campbell (2002); Morgan and Winship (2007).
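
As a rough sketch of the regression discontinuity idea (again hypothetical: the placement-exam rule, bandwidth, and effect size are invented for illustration), suppose admission to rigorous mathematics coursework depended on an exam cutoff. Candidates just above and just below the cutoff should be otherwise comparable, so a jump in outcomes at the cutoff estimates the causal effect there.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
exam = rng.uniform(-1, 1, size=n)   # running variable, cutoff at 0
treated = exam >= 0                 # sharp rule: rigorous course iff exam >= 0

TRUE_JUMP = 0.10                    # assumed effect at the cutoff
outcome = 0.5 * exam + TRUE_JUMP * treated + rng.normal(scale=0.5, size=n)

# Local linear fits on each side of the cutoff, within a small bandwidth.
h = 0.2
left = (exam >= -h) & (exam < 0)
right = (exam >= 0) & (exam <= h)
fit_left = np.polyfit(exam[left], outcome[left], 1)
fit_right = np.polyfit(exam[right], outcome[right], 1)

# The estimate is the jump between the two fitted lines at the cutoff.
jump = np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)
print(f"estimated jump at cutoff: {jump:.2f}")  # approximately 0.10
```
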
A method that attempts to control for many alternative explanations, and that is receiving increasing attention especially in examining issues of teacher preparation, is the value-added model of student achievement. We examine this method in a bit more detail.

Value-Added Models

Value-added modeling is a method for using data about changes in student achievement over time as a measure of the value that teachers or schools have added to their students' learning. The appeal of value-added methods is the promise they offer of using statistical techniques to adjust for unmeasured differences across students. Doing so would make it possible to identify a measure of student learning that can be attributed to individual teachers and schools. This approach generally requires very large databases because researchers must statistically isolate the student achievement data from data on other student attributes that could affect achievement (such as the students' prior achievement or the characteristics of their peers) to control for the confounding factors identified in Figure 2-1. The models typically include a variety of controls intended to account for many of the competing explanations of the link between teacher preparation and student achievement. The recent availability of district and statewide databases that link teachers to their students' achievement scores (Crowe, 2007) has made this analysis feasible. However, there is substantial debate in the research community about this approach (Kane and Staiger, 2002; McCaffrey et al., 2003; Rivkin, 2007; National Research Council, 2010).
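
The report discusses value-added models only conceptually; as a minimal sketch of the underlying regression (simulated data; the variable names, controls, and magnitudes below are invented assumptions), one can regress students' current scores on prior achievement plus an indicator for each teacher and read the teacher coefficients as estimated value added.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_teachers, class_size = 50, 30
true_va = rng.normal(scale=0.15, size=n_teachers)  # assumed teacher effects

rows = []
for t in range(n_teachers):
    prior = rng.normal(size=class_size)  # prior achievement
    score = 0.7 * prior + true_va[t] + rng.normal(scale=0.5, size=class_size)
    rows.append(pd.DataFrame({"teacher": t, "prior": prior, "score": score}))
df = pd.concat(rows, ignore_index=True)

# One dummy per teacher absorbs that teacher's contribution to score gains;
# operational models add many more controls (peers, demographics, etc.).
fit = smf.ols("score ~ prior + C(teacher)", data=df).fit()
va_hat = fit.params.filter(like="C(teacher)")  # effects relative to teacher 0
print(va_hat.head())
```

Even in this idealized setup the estimates are noisy, and in real data the nonrandom sorting described earlier re-enters through unmeasured student attributes, which is exactly the concern taken up below.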

There are concerns that value-added methods do not adequately disentangle the role of individual teachers or their characteristics from other factors that influence student achievement. That is, they may not statistically control for the full range of potential confounding factors depicted in Figure 2-1 (Rothstein, 2009). Accurate identification of these effects is complicated by the nonrandom assignment of teachers to schools and students, both across and within schools. In addition, there are concerns about measures of student outcomes and accurate measurement of teacher preparation attributes. For example, there is concern that commonly used measures of student achievement may assess only a portion of the knowledge and skills that are viewed as important for students to learn. It may also be the case that currently available measures of what constitutes teacher preparation are inadequate representations of the underlying concepts. Another concern is that student achievement tests developed in the context of high-stakes accountability goals may provide a distorted understanding of the factors that influence student achievement. While the tests themselves may be well designed, the stakes attached to their results may cause teachers to focus disproportionately on students who are scoring near the cut-points associated with high stakes, at the expense of students who are performing substantially above or below those thresholds.

All of these are important concerns that may affect the ability to draw inferences from the estimated model. As with any research design, value-added models may provide convincing evidence or limited insights, depending on how well the model fits the research question and how well it is implemented. Value-added models may provide valuable information about effective teacher preparation, but not definitive conclusions, and they are best considered together with other evidence from a variety of perspectives.

Qualitative and Descriptive Analyses

Qualitative and descriptive analyses also have much to contribute. Proper interpretation of the outcomes of experimental designs and of the statistical approaches described above depends on the clear identification of the treatment and its faithful implementation. Theory, case studies, interpretive research, descriptive quantitative analysis, expert judgment, interviews, and observational protocols all help to identify promising treatments and can provide important insights about the mechanisms by which a treatment may lead to improved student outcomes. For example, if it appears that stronger mathematics preparation for teachers is associated with improved math outcomes for students, there is good reason to broaden and deepen the analysis with additional descriptive evidence from other contexts and, ultimately, to develop research designs to investigate the potential causal links.

As detailed in later chapters, descriptive research on teacher education is more plentiful than other forms of information. Indeed, many policy and program initiatives related to teacher quality and preparation have emerged in the past 20 years, and there has been a great deal of interest in the content and effects of teacher education. Professional societies in the academic disciplines have taken seriously their responsibility to offer guidance both about what students should learn and about the knowledge and skills teachers need in order to develop that learning in their students, and they have drawn on both research and the intellectual traditions of their fields in doing so. We return in subsequent chapters to questions about what can be concluded from this literature, but these contributions to the discourse on teacher preparation have identified promising approaches and pointed to the mechanisms that seem likely to have the greatest influence on teacher quality. They also allow the field to refine testable hypotheses and to develop sophisticated, nuanced questions for empirical study.

CONCLUSION

Although there has been a great deal of research on teacher education (for summaries, see Wilson, Floden, and Ferrini-Mundy, 2001; Cochran-Smith and Zeichner, 2005; Darling-Hammond and Bransford, 2005), few issues are considered settled. As a result, the field has produced many exciting research projects that are exploring a variety of ways of gaining evidence. For many questions, researchers are grappling with fundamental issues of theory development, formulating testable hypotheses, developing research designs to empirically test these theories, trying to collect the necessary data, and examining the properties of a variety of emerging empirical models.

Given the dynamic state of the research, we chose to examine a range of research designs, bearing in mind the norms of social science research, and to assess the accumulated evidence. Some research methods have greater internal or external validity than others, but each has limitations; polarized discussions that focus only on the strengths or weaknesses of a particular method have contributed little to understanding of important research questions. We concluded that the accumulated evidence from diverse methods applied in diverse settings would increase our confidence in any particular findings.

Ideally, policy makers would base policy on a body of strong empirical evidence that clearly converges on particular courses of action. In practice, policy decisions are often needed before the research has reached that state. Public scrutiny of deficiencies in teacher preparation has inspired many new program and policy initiatives that have, in turn, generated a great deal of information. Unfortunately, like most innovations in education, many of these initiatives have not been coupled with rigorous research programs to collect good data on these programs, the fidelity of their implementation, or their effects. Thus, although policy makers may need to make decisions with incomplete information, the weaker the causal evidence, the more cautiously they should approach these decisions and the more insistent they should be about supporting research efforts to study policy experiments. We return to this point in Chapter 9 in our discussion of a proposed research agenda.