The Buck Stops Here

Saturday, January 28, 2012

Don't Believe the "Defenders" of Teachers: Teachers Do Matter

You often see education commentators suggesting that bad school performance is almost entirely the fault of poverty and other external factors, not the fault of poor teaching. In making this claim, commentators often point to the share of variance in student test scores that is allegedly "explained" by teachers. For example, Anthony Cody says, "Even Eric Hanushek, the economist who has done more to advance these evaluation systems than anyone, admits that teachers only account for around ten percent of the variability in student test scores."

Family and income are surely important, but the "10% of variance" argument is wrong for at least two reasons:

First, in statistical terms, saying that teachers account for 10% of the variance in student test scores does NOT mean that teachers are unimportant. Wrong, wrong, wrong. (At the end of the blog post, I say more about what explaining variance means.)

The eminent Harvard professors Rosenthal and Rubin explained this in a 1982 article, "A Simple, General Purpose Display of Magnitude of Experimental Effect," Journal of Educational Psychology 74 no. 2: 166-69 (that article isn't available online, but is described here).

As luck would have it, Rosenthal and Rubin address the precise example of a case wherein 10% of the variance was explained:

We found experienced behavioral researchers and experienced statisticians quite surprised when we showed them that the Pearson r of .32 associated with a coefficient of determination (r²) of only .10 was the correlational equivalent of increasing a success rate from 34% to 66% by means of an experimental treatment procedure; for example, these values could mean that a death rate under the control condition is 66% but is only 34% under the experimental condition. We believe . . . that there may be a widespread tendency to underestimate the importance of the effects of behavioral (and biomedical) interventions . . . simply because they are often associated with what are thought to be low values of r².

By analogy, saying that teacher quality explains 10% of the variance would be equivalent to saying that teachers can raise the passing rate from 34% to 66%. That's nothing to sneeze at, and it certainly isn't a reason for teachers to throw up their hands in dismay at the hopelessness of their task.
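The arithmetic behind those two percentages is short enough to check by hand. Here is a minimal sketch of the Rosenthal and Rubin display, assuming its usual form: take r as the square root of the variance explained, then put the two success rates at 50% minus and plus half of r. The function name `besd` is my own label, not anything from their article.

```python
import math

def besd(r_squared):
    """Return the (lower, higher) success rates implied by a given r-squared,
    assuming the display centers both groups on a 50% overall success rate."""
    r = math.sqrt(r_squared)          # r-squared of .10 -> r of about .32
    return 0.5 - r / 2, 0.5 + r / 2   # rates sit r/2 below and above 50%

low, high = besd(0.10)
print(f"r = {math.sqrt(0.10):.2f}")
print(f"success rate: {low:.0%} -> {high:.0%}")  # prints 34% -> 66%
```

Note that the display assumes a 50/50 split overall; it is a way of picturing the size of a correlation, not a prediction about any particular school.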

Second, the fact that teachers account for 10% of variance NOW, given a particular set of data points, tells us little or nothing about the true causal importance of teachers. As Richard Berk explains in his book Regression Analysis: A Constructive Critique, "Contributions to explained variance for different predictors do not represent the causal importance of a variable." In other words, 10% isn't a Platonic ceiling on what teachers can accomplish.

A simple hypothetical example makes this clear: Imagine that all teachers in a school were of equal quality. Given equal teachers, any variation in student test scores would automatically have to arise from something other than differing quality of teaching. So a regression equation in that context might tell us that demographics explain a huge amount of the variation in test scores, while teaching quality explains nothing. But it would be completely wrong to conclude that demographics are inherently more important than teaching quality, or even that teaching quality doesn't matter. The exact opposite might be the case, for all that such a regression could tell us.
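To put toy numbers on that hypothetical, here is a small sketch. Everything in it is made up for illustration: four students, a teacher effect of 20 points per unit of quality, a family effect of 5 points per unit of background, and the two factors deliberately arranged to be uncorrelated so that the variance shares add up cleanly.

```python
from statistics import pvariance

TEACHER_EFFECT = 20.0  # hypothetical: points gained per unit of teacher quality
FAMILY_EFFECT = 5.0    # hypothetical: points gained per unit of family background

def teacher_variance_share(teacher_quality, family_background):
    """Share of test-score variance that comes from the teacher term."""
    teacher_part = [TEACHER_EFFECT * q for q in teacher_quality]
    family_part = [FAMILY_EFFECT * f for f in family_background]
    scores = [t + f for t, f in zip(teacher_part, family_part)]
    return pvariance(teacher_part) / pvariance(scores)

family = [-1.0, 1.0, -1.0, 1.0]

# Teachers differ in quality: they "explain" most of the variance.
varied_share = teacher_variance_share([-1.0, -1.0, 1.0, 1.0], family)

# Every teacher is equally good: they "explain" none of it,
# even though the 20-point causal effect has not changed at all.
equal_share = teacher_variance_share([1.0, 1.0, 1.0, 1.0], family)

print(f"varied teachers: {varied_share:.0%}")
print(f"equal teachers:  {equal_share:.0%}")
```

The causal effect of teaching is four times the family effect in both runs; only the spread of teacher quality differs.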

Moreover, if all teachers became twice as effective as they are now, there would still be variance among teachers and variance among student test scores, and teachers collectively might still "account" for a "small" amount of variance, but student performance might be much higher. The fact that teachers account for 10% of variance today (as large as that actually is) simply does not give us any sort of limit on how much student achievement could rise if the mean teacher effectiveness shifted sharply to the right.
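The same kind of toy example shows why the variance share puts no ceiling on achievement. In this sketch (again with invented numbers: a 20-point teacher effect, a 5-point family effect, and a two-unit improvement applied to every teacher), the mean teacher effectiveness shifts to the right while the spread among teachers stays the same.

```python
from statistics import mean, pvariance

TEACHER_EFFECT = 20.0  # hypothetical points per unit of teacher quality
FAMILY_EFFECT = 5.0    # hypothetical points per unit of family background

def summarize(teacher_quality, family_background):
    """Return (mean score, share of score variance from the teacher term)."""
    teacher_part = [TEACHER_EFFECT * q for q in teacher_quality]
    family_part = [FAMILY_EFFECT * f for f in family_background]
    scores = [t + f for t, f in zip(teacher_part, family_part)]
    return mean(scores), pvariance(teacher_part) / pvariance(scores)

family = [-1.0, 1.0, -1.0, 1.0]
quality_now = [-1.0, -1.0, 1.0, 1.0]
quality_better = [q + 2.0 for q in quality_now]  # every teacher improves by 2 units

mean_before, share_before = summarize(quality_now, family)
mean_after, share_after = summarize(quality_better, family)

print(f"mean score: {mean_before:.0f} -> {mean_after:.0f}")
print(f"teacher share of variance: {share_before:.0%} -> {share_after:.0%}")
```

The average score rises by 40 points, yet the share of variance "explained" by teachers is exactly what it was before.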

So the would-be defenders of teachers can breathe a sigh of relief: value-added modeling might still be a shaky idea for several other reasons, but there's no need to denigrate the potential of teachers.
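For reference, the "proportion of variance explained" by a regression is conventionally written as below. This is a reconstruction in the usual notation, based on the description of the numerator and denominator in the paragraphs that follow: Y_i are the actual values, Ŷ_i the values predicted by the regression line, and Ȳ the average.

```latex
R^2 = \frac{\sum_{i} \left( \hat{Y}_i - \bar{Y} \right)^2}
           {\sum_{i} \left( Y_i - \bar{Y} \right)^2}
```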

What does this mean? The denominator is calculated by taking all the individual Y's (in the education context, all of the student test scores that you're trying to explain), subtracting the average Y value, squaring all of the differences, and adding up all of the squared values. In the context of the following graph, the denominator gives us a measure of the total squared distance (in the vertical direction) that all of the red dots deviate from the average Y value.

The numerator tells us how far the regression line's predicted Y values deviate from the average Y value. The regression line predicts that the Y values will lie along the line itself, which obviously isn't exactly true. So the predicted Y values (that's what the little ^ sign over the Y means) have the average Y value subtracted, each difference is squared, and then all the squared differences are added up.

All in all, the "proportion of variance explained" figure is just a way to represent how close a regression line based on X will come to the actual red dots in the graph, compared to how close a line based on just the average red dot will come.
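The calculation described above can be sketched in a few lines. This is only an illustration with five invented data points and an ordinary least-squares line through them; the function name `r_squared` and the data are mine, not drawn from any real test-score dataset.

```python
from statistics import mean

def r_squared(xs, ys):
    """Proportion of variance in ys explained by a least-squares line on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Ordinary least-squares slope and intercept for a single predictor.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * x for x in xs]
    # Numerator: squared distance of the predicted Ys from the average Y.
    explained = sum((p - y_bar) ** 2 for p in predicted)
    # Denominator: squared distance of the actual Ys from the average Y.
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total

xs = [1, 2, 3, 4, 5]   # made-up predictor values
ys = [2, 1, 4, 3, 5]   # made-up outcomes (the "red dots")

print(f"proportion of variance explained: {r_squared(xs, ys):.2f}")  # 0.64
```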

For the same reason that correlation is not causation, accounting for variance does not provide an upper limit for the true causal importance of a variable. As noted above, the level of variance "explained" is a bad way to determine how important X actually is. See D'Andrade and Hart, for example.

UPDATE: See Cosma Shalizi's post on explaining variance.

Jeff Buckley's "Corpus Christi Carol"

Tuesday, January 17, 2012

The New Groupthink

Our schools have also been transformed by the New Groupthink. Today, elementary school classrooms are commonly arranged in pods of desks, the better to foster group learning. Even subjects like math and creative writing are often taught as committee projects. In one fourth-grade classroom I visited in New York City, students engaged in group work were forbidden to ask a question unless every member of the group had the very same question.

I'm not sure why group seating in classrooms seems to have caught on so strongly. As a parent, I know that children are better behaved (if only by necessity) when they're not sitting close enough to bother someone else, mark on the other child's paper, etc.

Saturday, January 07, 2012

A Case Study in Bias

Two studies came out comparing the performance of schools or teachers.

In the first case, Raj Chetty, John Friedman, and Jonah Rockoff came up with just about the most extensive and sophisticated study of teachers' value-added that I've ever seen. As highlighted in the New York Times, the study includes estimates for how much high-quality teachers improve their students' income years later, and also (see pp. 29 ff.) includes a new way to check for bias by looking at how cohorts of students change performance when a high or low value-added teacher arrives from somewhere else. Very cool.

But such a study, implying that some teachers are better than others, and that teacher quality can be revealed by how well their students do on tests (conditioning on prior achievement and student demographics), is disturbing to some people. Diane Ravitch tweeted at least 67 times the day the study came out, trying to undermine the study by questioning its lack of peer review (so far), the way in which it was conducted, and the very project of looking at test scores in the first place.

In the second case, there's a group called Educate Now in Louisiana that released a PDF chart (available here) that merely lists the schools in New Orleans, identified by whether they are Recovery School District schools or voucher-accepting private schools, and then lists what percentage of students score above basic in English and Math in grades 3-5. That's all. No attempt to control for the individual students' prior achievement, no attempt to control for any student demographic variables such as poverty, no attempt to control for the fact that students are eligible for vouchers only if they had been attending a failing public school, no statistical analysis whatsoever.

This is as primitive as it gets, and is a horrible way to judge the merit of voucher schools (as I explained here).

Did Diane Ravitch tweet 67 times criticizing this purported attempt to compare voucher schools to public schools? No: right in the midst of her incessant criticism of an immeasurably superior study, she sent out one tweet that said, "How did voucher schools in New Orleans do?" followed by a link.

Ravitch here displays the worst sort of intellectual bias: when what looks like one of the best studies out there doesn't fit her ideology, she acts as if it is far more questionable than the baloney that she otherwise is happy to plug. To be sure, it's OK to ask questions about the new value-added study, what it means, how it was done, and whether it was oversold in the media. But it's not OK to pass along a worthless analysis of the merits of vouchers.

Anti-reformers need to think a bit more carefully about whether they want someone as their standard-bearer who doesn't know the difference between good and bad research (or, worse, who doesn't care).