Through the lens of social science, eduwonkette takes a serious, if sometimes irreverent, look at some of the most contentious education policy debates in this opinion blog. (Find eduwonkette's complete archives prior to Jan. 6, 2008 here.)

The NYC Teacher Experiment Revisited

Over at the Ed Sector, there's some confusion about my concern with the ethics of the NYC teacher experiment (see here). To be clear, my problem is not that NYC is collecting value-added data. As I have written before, standardized tests have a role to play in teacher assessment alongside holistic evaluation of teachers' effectiveness. But as eduwonk himself noted, the methodological issues are hairy and as yet unresolved.

The concern expressed in my earlier post was that this experiment was conducted in secret and, in my opinion, in violation of generally accepted human subjects policies. The entire enterprise of social science relies on potential study participants trusting researchers to minimize risks and fully disclose the purpose of their study. Every time a gaffe like this happens, it undermines researchers' ability to build trust with study participants in the future. Let's review the chronology:

1) In September, an academic experiment headed by two very talented researchers, Jonah Rockoff (Columbia Business School) and Tom Kane (Harvard Grad School of Ed), was announced. It was presented as an experiment intended to generate academic knowledge, not to inform human resources decisions in real time. (You can watch a video of a study recruitment session here.)

2) Academic research is bound not only by common sense research ethics, but by the conventions of university Institutional Review Boards. What this means is that when academic researchers conduct research intended to produce generalizable knowledge - i.e. if researchers want to publish off of these data - the experiment has to proceed within generally accepted research ethics and a university IRB has to approve it. (Even if this was not an academic research project, the DOE should have notified teachers of an intervention of potential consequence for them. After all, the data are not just being collected, but distributed to principals in the experiment's treatment group.)

IRBs are primarily concerned with the harm that researchers could do to subjects by intervening in their lives, and applicants to IRBs must demonstrate that their project poses minimal risks, that participants have been notified of these risks, and that participants have consented to the research. Teachers did not need to consent in this case, as they are government employees and their employers can collect whatever data they want.

However, it is difficult for me to understand how one could justify not notifying teachers in the study. After all, the information given to their principal - which, given the ongoing methodological problems with value-added, may or may not be accurate - has the potential to permanently change their principals' perceptions of them and their future employment prospects. Moreover, this treatment is not being applied universally to NYC teachers. By simply having the bad luck to be selected into the study's treatment condition, some teachers are affected and others are not.

It is important to note that a "live experimental" study like this one is different from the secondary data analysis studies that eduwonk cites. He wrote:

By that logic, all these various studies with panel data, choice studies using lotteries, etc...all constitute human experimentation and are wrong.

Studies based on secondary data analysis are fundamentally different - and are treated differently by IRBs - because researchers are analyzing "dead" data that have no effect on real people's lives. Ongoing research projects in which interventions are made in real people's lives are held to a different standard. And should be.

3) According to Edwize and the NYT article, teachers were not notified of the study. What went wrong is that at some point this went from an academic study to a human resources project that Chris Cerf wants to take prime time. Perhaps he misspoke, or the NYT article had this wrong, but it appears that these data, collected under the auspices of an academic research study, may be used as early as June. As eduwonk noted, simply gathering the data is not a problem. The problem is that under the cover of "academic research," data are being given to principals in ways that affect teachers' future employment without teachers' knowledge.

The irony, of course, is that none of this would be a big deal if the project had been announced to teachers. When I watched the recruitment session video back in September, it didn't seem like a big deal at all. I bookmarked that this was an interesting experiment conducted by two researchers whose work is first rate, and assumed that the experiment would proceed under normal conditions (i.e. full disclosure of the study). For reasons I don't fully understand, it didn't. And here we are.

There's much more to say about the methodological and broader philosophical issues with value-added measures. I'll follow up with a post on these issues later.

Eduwonk responds:

Her position here would be a lot more compelling if (a) this were an actual experiment in the way she and other anti-Klein partisans are seeking to describe it rather than what it is. In addition --and again-- the fact is that we don't know what they are doing with the data so at this point all these leaps to various consequences are unfounded.

But we do know what they are doing with the information, at least in the context of this experiment (and, as I have explained above, it is an experiment). Principals in the treatment group are given value-added data reports on each of their teachers. These principals' perceptions of teachers' academic effectiveness are thus affected - correctly or incorrectly - by this information. Saying "principals can't use it" is like trying to strike evidence from the record in a courtroom. Jurors' perceptions are already influenced, and the damage is done.


3 Comments

The rhetorical strategy of Rotherham & Carey is classic. (a) eduwonkette compared the NYC experiment to Tuskegee; (b) Tuskegee was awful, and the comparison is inappropriate and preposterous; (c) therefore, whatever concerns eduwonkette might have are also inappropriate and preposterous. Of course eduwonkette never did actually say that Tuskegee was comparable to NYC, and in fact went out of her way to say that she wasn't equating the two; but somehow saying that was interpreted as exactly the opposite.

Another study that's an interesting parallel with regard to human subjects concerns is the old Rosenthal and Jacobson (1968) study Pygmalion in the Classroom. In Pygmalion, teachers were told that a handful of students in their classes had been identified as late bloomers based on a standardized test, and the performance of students was tracked over time to see if the students who were (randomly) identified as late bloomers did better academically than those who were not. So the experimental manipulation was presumably of teachers' expectations for students' performance (although there are questions about whether the experiment actually did manipulate teachers' expectations in the way that was claimed).

Would an IRB allow such a study now? I doubt it, largely due to concerns about the impact on students. And yet the students in Pygmalion aren't the research participants; the teachers are. This looks a lot like the Rockoff-Kane scenario to me, where principals are getting information that is intended to manipulate their expectations for, and evaluations of, teachers' performance. The presumed difference is that the data being given to principals are in fact accurate and objective, whereas the data provided to the teachers in Pygmalion were fictitious. But the assumption that the data provided to principals truly are an accurate indicator of their contribution to student learning is highly questionable, both in the abstract, based on the methodological challenges in value-added assessment, and in how the data are likely being represented to the principals in the reports they are receiving.

This looks a lot like the Rockoff-Kane scenario to me, where principals are getting information that is intended to manipulate their expectations for, and evaluations of, teachers' performance. The presumed difference is that the data being given to principals are in fact accurate and objective, whereas the data provided to the teachers in Pygmalion were fictitious.

But the thing that confuses me is that for the Pygmalion experiment to be interesting, the data had to be fictitious. You had to know that any observed correlation was the result of the information given teachers, and not the abilities of the kids themselves.

But the NYC experiment gives principals potentially meaningful data, and then looks for what???

The reporting on this has been kind of spotty, but here's my guess. Both treatment and control groups of principals will be asked to rate teachers, and within the treatment group, the researchers can see if there is a correlation between the value-added measure and principals' ratings. (If so, that's evidence that the principals are incorporating the value-added information into their ratings of teachers' "skill" or "quality".) Then, next year, it will be possible to see if the ratings of the principals in the treatment group are a better predictor of next year's value-added score than the ratings of the principals in the control group. If so, then this year's value-added information has increased the ability of a principal to judge a teacher's future performance--at least in terms of that teacher's contribution to the standardized test score on which the value-added info is based.
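The two-part design this commenter guesses at can be sketched in a small simulation. Everything here is hypothetical -- the group sizes, the noise levels, and the assumption that treatment principals put some weight on the value-added report are all made up for illustration, not drawn from the actual NYC study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000  # hypothetical number of teachers per group

def simulate(rng, n, weight_on_va):
    """Simulate one group of teachers and their principals' ratings.

    Each teacher has a stable 'true effectiveness'; the yearly value-added
    scores are noisy readings of it. The principal's rating blends the
    principal's own noisy observation with the year-1 value-added report,
    weighted by `weight_on_va` (0 for control principals, who never see it).
    """
    true_skill = rng.normal(0, 1, n)
    va_year1 = true_skill + rng.normal(0, 1, n)  # noisy value-added, year 1
    va_year2 = true_skill + rng.normal(0, 1, n)  # noisy value-added, year 2
    own_view = true_skill + rng.normal(0, 1, n)  # principal's own observation
    rating = (1 - weight_on_va) * own_view + weight_on_va * va_year1
    return rating, va_year1, va_year2

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Treatment principals see the value-added reports; control principals don't.
t_rating, t_va1, t_va2 = simulate(rng, n, weight_on_va=0.5)
c_rating, c_va1, c_va2 = simulate(rng, n, weight_on_va=0.0)

# Part 1: within the treatment group, do ratings track the year-1 reports?
print("treatment rating vs. year-1 VA:", round(corr(t_rating, t_va1), 2))
print("control   rating vs. year-1 VA:", round(corr(c_rating, c_va1), 2))

# Part 2: do treatment-group ratings better predict NEXT year's value-added?
print("treatment rating vs. year-2 VA:", round(corr(t_rating, t_va2), 2))
print("control   rating vs. year-2 VA:", round(corr(c_rating, c_va2), 2))
```

Under these (invented) assumptions, treatment ratings correlate more strongly with the year-1 reports than control ratings do, and they also predict year-2 value-added somewhat better -- which is exactly the signature the commenter suggests the researchers would look for. Of course, the same Part 1 correlation is also what the original post worries about: it is the treatment principals' perceptions being moved by the reports.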