
Thursday, November 22, 2012

Data Driven Nonsense from Harvard and the Gates Foundation

A new Gates-funded Harvard study has found that teachers in the Los Angeles Unified School District (LAUSD) vary substantially in quality (more than in other districts) and that the district disproportionately places inexperienced teachers in lower-performing classrooms (as in other districts). The study, Human Capital Diagnostic, was done by the Strategic Data Project (SDP), which is connected with Harvard University’s Center for Education Policy Research.

The biggest
problem with this study is that it is a bunch of nonsense.

Let’s start with the authors’ most profound claim: the best teachers in LAUSD provide the equivalent of eight additional months of instruction during the school year compared with the district’s worst teachers. Since their research was based entirely on student scores on the California Standards Tests (CSTs), the high-stakes exams used to rank schools, and the top teachers in the study were the ones with the largest student gains on those tests, what they are really saying is that the best teachers provided the equivalent of eight additional months of test prep.

Big wow!

The authors state that there is “no specific cut-off for determining whether an effect size is large or small,” but they assert that an effect size of 0.2 standard deviations is considered large in education research. The study found that the difference between a 25th and a 75th percentile teacher is one-quarter of a standard deviation (0.25). This would be significant if it were based on a meaningful measurement of teacher effectiveness. Unfortunately, all it really says is that some teachers are better than others at squeezing out student gains on an otherwise lousy exam. It does not tell us whether their students are becoming self-motivated, independent learners or competent critical thinkers and problem-solvers. Furthermore, the study provided no explanation for how it determined that 0.25 standard deviations was equivalent to eight months of instruction.

The authors also claim that Teach for America (TFA) and Career Ladder teachers have larger effects on their students than other novice teachers, by 0.05 and 0.03 standard deviations respectively, and they even attribute a gain of one to two months of additional learning to these relatively small differences. They make similar claims for National Board Certified teachers, whose students’ test gains were 0.03-0.07 standard deviations higher than those of other teachers. Yet if an effect size of 0.2 is considered large in education, then effect sizes of 0.03-0.07 ought to be considered small or even insignificant.
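How does a gap measured in standard deviations become “eight months of instruction”? The report never says, but one common approach in the research literature is to divide the effect size by an estimate of how much students typically gain in a year (in standard deviation units) and multiply by the length of the school year. The sketch below shows that arithmetic; the annual-growth figure (0.30 SD) and school-year length (9.5 months) are assumptions chosen for illustration, not values taken from the study.

```python
# A minimal sketch of the kind of standard-deviation-to-"months of learning"
# conversion such claims rest on. The annual-growth figure and school-year
# length below are illustrative assumptions, NOT numbers from the report.

def sd_to_months(effect_sd, annual_growth_sd=0.30, school_year_months=9.5):
    """Convert an effect size (in student-level standard deviations) into
    'months of learning', given an assumed typical annual gain in SD units."""
    return effect_sd / annual_growth_sd * school_year_months

# The headline gap between 25th- and 75th-percentile teachers:
print(round(sd_to_months(0.25), 1))   # ~7.9 "months" under these assumptions

# The much smaller TFA / Career Ladder / National Board differences:
for gap in (0.03, 0.05, 0.07):
    print(gap, "SD ->", round(sd_to_months(gap), 1), "months")
```

Under those particular assumptions, the 0.25 gap works out to roughly eight months and the 0.03-0.07 gaps to one or two, which only underscores the point: the headline “months of learning” figures depend entirely on an annual-growth conversion factor the report does not state.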

While these effect sizes may be insignificant, the fact that this was being researched in the first place is not. TFA provided 13% of the district’s new hires over the past six years (according to the study’s authors), and it would be of great interest to the district’s administrators to show that the investment was worthwhile. So let’s assume for the sake of argument that the difference between TFA recruits and other novice teachers was significant. What would this mean? TFA teachers may in fact be more willing than other novice teachers to work long, unpaid overtime hours and to substitute “drill and kill” style teaching for quality student-centered instruction, both of which could produce higher test scores without improving the quality of student learning.

Perhaps a bigger problem with this study (like all studies and reforms based on student test data) is that numbers are not the only relevant type of data in education, and sometimes not even the best. Esther Quintero, writing for the Shanker Blog, talks about the “streetlight effect,” from the parable of the drunk who searches for his lost wallet under the streetlight, not because he lost it there, but because the light is better there and it would be easier to find it if it happened to be there. Student test data is easy to access now that it is required of every district in the U.S. under No Child Left Behind (NCLB); it is under the streetlight. Yet at best it is only a proxy or very rough estimate of teacher quality, since it considers only a small part of what teachers are expected to do.

Quintero
also correctly points out that NCLB has helped to institutionalize what counts
as data. “Scientifically-based research” is now limited to standardized test
scores, which, as it turns out, are not particularly scientific. Case studies,
ethnographies, teacher observations and portfolios, and other qualitative data
are considered unacceptable.

One promising finding from the study was that teacher performance after two years is a good predictor of future effectiveness. In other words, the current system of giving tenure to teachers after two years of good evaluations makes sense. Teachers are not getting worse after two years. Novice teachers are not better than veterans and should not have the right to bump them during layoffs, and LAUSD is not top-heavy with a bunch of cranky veterans who can no longer teach.