Appellate Court Gets It Wrong on NYC Teacher Data

Here's something you won't read too often in RHSU: "UFT president Michael Mulgrew is right." But he is. Just today, a New York state appellate court ruled that New York City must release reports that show value-added data on a teacher-by-teacher basis, with teachers' names attached. I agree with Mulgrew that this is an unfortunate decision.

New York City issues the reports in question to about 12,000 teachers annually, covering teachers in fourth through eighth grades whose kids take the state reading and math assessments. The value-added model in question incorporates a variety of factors, including student absenteeism, race, class size, and so forth. The result is good and useful data that ought to be incorporated into management decisions--but that shouldn't be released like this.

Several media organizations had sued for access to the individual teacher data. The appellate court ruled for the media outfits, determining that teachers' names did not fall within six exemptions protecting personal privacy under the law. The court explained, "Balancing the privacy interests at stake against the public interest in disclosure of the information...we conclude that the requested reports should be disclosed. Indeed, the reports concern information of a type that is of compelling interest to the public, namely, the proficiency of public employees in the performance of their job duties."

I disagree that this data, released in this fashion, serves a compelling public interest. There I find myself agreeing with Mulgrew and endorsing the UFT's announcement that it will appeal. Mulgrew's response to the decision focused especially on the hefty standard errors in the measurement. He said, "Experts agree that an 'accountability' measure with a 58-point swing -- like the DOE's teacher data system -- is worse than useless. Parents and teachers need credible, accurate assessments rather than guesswork." While I think he is engaging in a bit of hyperbole here, his larger point is broadly on target--and there are several other problems that Mulgrew could and should have flagged.

As I argued a year ago in response to the L.A. Times analysis, explaining why--at least at this point--I think it's a bad idea to release teacher-level data with names attached:

Given my taste for mean-spirited measures, and the impressive journalistic moxie it showed, I really wanted to endorse the LAT's effort. But I can't. Now, don't get me wrong. I'm all for using student achievement to evaluate and reward teachers and for using transparency to recognize excellence and shame mediocrity. But I have three serious problems with what the LAT did.

First, as I've noted here before, I'm increasingly nervous at how casually reading and math value-added calculations are being treated as de facto determinants of "good" teaching. As I wrote back in April, "There are all kinds of problems with this unqualified presumption. At the most technical level, there are a dozen or more recognized ways to specify value-added calculations. These various models can generate substantially different results, with a third of each result varying with the specifications used. When used for a teacher in a single classroom, we frequently only have 20 or 25 observations (if that). The problem is that the correlation of such results year after year is somewhere in the .25 to .35 range."

Second, beyond these kinds of technical considerations, there are structural problems. For instance, in those cases where students receive substantial pull-out instruction or work with a designated reading instructor, LAT-style value-added calculations are going to conflate the impact of the teacher and this other instruction. How much of this takes place varies by school and district, but I'm certainly familiar with locales where these kinds of "nontraditional" (something other than one teacher instructing 20-odd students) arrangements account for a hefty share of daily instruction. This means that teachers who are producing substantial gains might be pulled down by inept colleagues, or that teachers who are not producing gains might look better than they should. Currently, there is nothing in the design of data systems that can correct for these kinds of common challenges. At a minimum, in the case of LAUSD, I would like to see data on how much of the relevant instruction is provided by the teachers in question--rather than by colleagues.

Third, there's a profound failure to recognize the difference between responsible management and public transparency. Transparency for public agencies entails knowing how their money is spent and how they're faring, and expecting organizational leaders to report on organizational performance. It typically doesn't entail reporting on how many traffic citations individual LAPD officers issued or what kind of performance review a National Guardsman was given by his commanding officer. Why? Because we recognize that these data are inevitably imperfect, limited measures and that using them sensibly requires judgment. Sensible judgment becomes much more difficult when decisions are made in the glare of the public eye.

So, where do I come out? I'm for the smart use of value-added by districts or schools. I'm all for building and refining these systems and using them to evaluate, reward, and remove teachers. But I think it's a mistake to get in the business of publicly identifying individual teachers in this fashion. I think it confuses as much as it clarifies, puts more stress on primitive systems than they can bear, and promises to unnecessarily entangle a useful management tool in personalities and public reputations.

Sadly, this little drama is par for the course in K-12. In other sectors, folks develop useful tools to handle money, data, or personnel, and then they just use them. In education, reformers taken with their own virtue aren't satisfied by such mundane steps. So, we get the kind of overcaffeinated enthusiasm that turns value-added from a smart tool into a public crusade. (Just as we got NCLB's ludicrously bloated accountability apparatus rather than something smart, lean, and a bit more humble.) When the shortcomings become clear, when reanalysis shows that some teachers were unfairly dinged, or when it becomes apparent that some teachers were scored using sample sizes too small to generate robust estimates, value-added will suffer a heated backlash. And, if any states get into this public I.D. game (as some are contemplating), we'll be able to add litigation to the list. This will be unfortunate, but not an unreasonable response--and not surprising. After all, this is a movie we've seen too many times.