Over at the Brookings Institution’s education blog, Paul Bruno offers a thoughtful critique of Overregulation Theory (OT), the idea that government regulations on school choice programs can undermine their positive effects. Bruno argues that although OT is “one of the most plausible explanations” of the negative results that two studies of Louisiana’s voucher program recently found, it is not “entirely consistent with the available evidence” and “does not by itself explain substantial negative effects from vouchers.”

I agree with Bruno–and have stated repeatedly–that the studies’ findings do not conclusively prove OT. That said, I believe both that OT is consistent with the available evidence and that it could explain the substantial negative effects (though I think it’s likely there are other factors at play as well). I’ll explain why below, but first, a shameless plug:

On Friday, March 4th at noon, the Cato Institute will be hosting a debate over the impact of regulations on school choice programs featuring Patrick Wolf, Douglas Harris, Michael Petrilli, and yours truly, moderated by Neal McCluskey. If you’re in the D.C. area, please RSVP at this link and join us! Come for the policy discussion, stay for the sponsored lunch!

Is the evidence consistent with Overregulation Theory?

Bruno notes that the difference in enrollment trends between participating and non-participating private schools is consistent with OT. Participating schools had experienced declining enrollment in the decade before the voucher program was enacted, whereas non-participating schools had slightly increasing enrollment on average. This is consistent with OT’s prediction that better schools (which were able to maintain or grow their enrollment) would be more likely to eschew the vouchers due to the significant regulatory burden, while lower-performing schools (which were losing students) were more desperate for students and funding, and were therefore more willing to jump through the voucher program’s regulatory hoops. However, Bruno calls this evidence into question:

For one thing, the authors of the Louisiana study specifically check to see if learning outcomes vary significantly between schools experiencing greater or lesser prior enrollment declines, and find that they do not. (Bedrick acknowledges this, but doubts there was enough variation in the enrollment trends of participating schools to identify differences.)

We should be skeptical of the explanatory value of the study’s enrollment check. There is no good reason to assume that the correlation between enrollment trends and performance among the small sample of participating schools (which had significantly negative growth, on average) is the same as among all private schools in the state. Making such an assumption is like a blind man holding onto the trunk of an elephant and assuming that he’s holding a snake.

The study does not show the variation in enrollment trends among the participating and non-participating schools, but we could imagine a scenario where the enrollment trend among participating schools ranged, say, from -25% to +5% while the range at non-participating schools was -5% to +25%. As shown in the following charts (which use hypothetical data), there may be a strong correlation between enrollment trends and outcomes among the entire population, while there is little correlation in the subset of participating schools.
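This is the familiar range-restriction problem, and a quick simulation makes it concrete. Everything below is invented for illustration (the “quality” score is a stand-in for school performance, and the participation rule is a caricature of OT’s selection story): in a simulated population where enrollment trends and quality are strongly correlated, the correlation is markedly attenuated once we look only at the truncated subset of schools that opt in.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical population of private schools: ten-year enrollment trend (%)
# and a quality score that is strongly related to it in the full population.
trend = rng.normal(0, 10, n)
quality = 0.8 * trend + rng.normal(0, 6, n)

# OT's selection story, caricatured: only schools losing enrollment badly
# are desperate enough to accept the regulatory burden.
participating = trend < -10

r_all = np.corrcoef(trend, quality)[0, 1]
r_sub = np.corrcoef(trend[participating], quality[participating])[0, 1]

print(f"correlation, all private schools:   {r_all:.2f}")
print(f"correlation, participating subset:  {r_sub:.2f}")
```

The exact numbers are arbitrary; the point is that a within-subset “check” can come up empty even when the population-wide relationship is strong, because selection has squeezed most of the variation out of the subset.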

In short, looking at the relationship between enrollment growth and performance in the narrow subset of participating schools doesn’t necessarily tell us anything about the relationship between enrollment growth and performance generally. Hence the study’s “check” that Bruno cites does not provide evidence against OT.

Is there evidence that regulations improve performance?

Bruno also cites evidence that regulations can have a positive impact on student outcomes:

Joshua Cowen of Michigan State University also points out that there is previous evidence of positive effects from accountability rules on voucher program outcomes in other states (though regulations may differ in Louisiana).

The Cowen article considers the impact of high-stakes testing imposed on the Milwaukee voucher program during a multi-year study of that program. The “results indicate substantial growth for voucher students in the first high-stakes testing year, particularly in mathematics, and for students with higher levels of earlier academic achievement.” But is this strong evidence that regulations improve performance? One of the authors of both the original Milwaukee study and the cited article–JPG all-star, Patrick Wolf–cautions against over-interpreting these results:

Ours is one study of what happened in one year for one school choice program that switched from low-stakes testing to high-stakes testing. As we point out in the report, it is entirely possible that the surge in the test scores of the voucher students was a “one-off” due to a greater focus of the voucher schools on test preparation and test-taking strategies that year. In other words, by taking the standardized testing seriously in that final year, the schools simply may have produced a truer measure of students’ actual (better) performance all along, not necessarily a signal that they actually learned a lot more in the one year under the new accountability regime.

If we had had another year to examine the trend in scores in our study we might have been able to tease out a possible test-prep bump from an effect of actually higher rates of learning due to accountability. Our research mandate ended in 2010-11, sadly, and we had to leave it there – a finding that is enticing and suggestive but hardly conclusive.

It’s certainly possible that the high-stakes test improved actual learning. But it’s also possible–and, I would argue, more probable–that changing the stakes just meant that the schools responded to the new incentive by focusing more on test-taking strategies to boost their scores.

For that matter, even if it were true that the regulations actually improved student learning, that does not contradict Overregulation Theory. Both advocates and skeptics of the regulations believe that schools respond to incentives. Those of us who are concerned about the impact of the regulations don’t believe that they can’t improve performance. Rather, our concern is that regulations imposed from above are less effective at improving performance than the incentives created by direct accountability to parents in a robust market in education, and may have adverse unintended consequences.

To explain: We’re concerned that regulations forbidding the use of a school’s preferred admissions standards or requiring the state test (which is aligned to the state curriculum) might drive away better-performing schools, leaving parents to choose only among the lower-performing schools. We’re concerned that price controls will inhibit growth, providing schools with an incentive only to fill empty seats rather than to scale up. We’re concerned that mandatory state tests will inhibit innovation and induce conformity. None of these concerns rule out the possibility (or, indeed, the likelihood) that over time, requiring private schools to administer the state test and report the results and/or face sanctions based on test performance will improve the participating schools’ performance on that test.

Again: we agree that schools respond to incentives. We just think the results of top-down incentives are likely to be inferior to the results of bottom-up choice and competition, which have proved to be powerful tools in so many other fields for spurring innovation and improving quality.

Can Overregulation Theory alone explain the negative results in Louisiana?

Bruno argues:

[E]ven if regulation prevented all but the worst private schools from participating, this would explain why students did not benefit from transferring into them, but not why students would transfer into them in the first place.

So Overregulation Theory might be part of the story in explaining negative voucher effects in Louisiana, but it is not by itself sufficient. To explain the results we see in the study, it is necessary to tell an additional story about why families would sort into these apparently inferior schools.

Bruno offers a few possible stories–that parents select schools “that provide unobserved benefits,” that the voucher program “induced families to select inferior schools,” or that parents merely “assume any private school must be superior to their available public schools”–but any of these can be consistent with OT. Indeed, the second story Bruno offers is practically an extension of OT: if the voucher regulations truncate supply so that it is dominated by low-quality schools, and the government gives false assurances that they have vetted those schools, then it is likely that we will see parents lured into choosing inferior schools.

That’s not to say that there are no other factors causing the negative results. It’s likely that there are. (I find Douglas Harris’s argument that the private schools’ curricula did not align with the state test in the first year particularly compelling, though I don’t think it entirely explains the magnitude of the negative results.) We just don’t have any compelling evidence that OT is wrong, and OT can suffice to explain the negative results.

I will conclude as I began: expressing agreement. I concur with Bruno’s assessment that “it is likely that the existing evidence will not allow us to fully adjudicate between competing hypotheses.” Indeed, it’s likely that future evidence won’t be conclusive either (it rarely is), but I hope that further research will shed more light on this important question. Bruno concludes by calling for greater efforts to “understand how families determine where their children will be educated,” noting that understanding how and why parents might make “sub-optimal — or even harmful” decisions will help “maximize the benefits of school choice while mitigating its risks.” These are noble goals, and I share Bruno’s desire to pursue them. I just hope that policymakers will approach what we learn with a spirit of humility about what they can accomplish.

In a major news development, today the Heartland Institute described JPGB as a “widely read education reform-pop culture blog.” After all these years of struggling for recognition as a major voice in the pop culture world, at long last our toil and struggle have been vindicated.

Oh, and they have this podcast I did on the Win-Win report showing that the research consistently supports school choice. If you’re, you know, into that kind of thing.

In case you forgot what that column of zeros on the right looks like, here it is again.

As you can see from their research results pictured above, the mood ring is capable of identifying a variety of student emotional states that could affect the learning environment. Teachers need to be particularly wary of the “hungry for waffles” mood because it is sometimes followed by the “flatulence” or “full bladder” mood.

Besides, mood rings are pretty groovy. And they can’t be any dumber than these Q Sensor bracelets.

The foundation has given $1.4 million in grants to several university researchers to begin testing the devices in middle-school classrooms this fall.

The biometric bracelets, produced by a Massachusetts startup company, Affectiva Inc, send a small current across the skin and then measure subtle changes in electrical charges as the sympathetic nervous system responds to stimuli. The wireless devices have been used in pilot tests to gauge consumers’ emotional response to advertising.

Gates officials hope the devices, known as Q Sensors, can become a common classroom tool, enabling teachers to see, in real time, which kids are tuned in and which are zoned out.

And now the Gates Foundation is extending that foolish enterprise to include measuring Galvanic Skin Response as a proxy for student engagement. This simply will not work. The extent to which students sweat is not a proxy for engagement or for learning. It is probably a better proxy for whether they are seated near the heater or next to a really pretty girl (or handsome boy).

Galvanic Skin Response has already been widely used as part of the “scientific” effort to detect lying. And as any person who actually cares about science knows — lie detectors do not work. Sweating is no more a sign of lying than it is of student engagement.

I’m worried that the Gates Foundation is turning into a Big Bucket of Crazy. Anyone who works for Gates should be worried about this. Anyone who is funded by Gates should be worried about this. If people don’t stand up and tell Gates that they are off the rails, the reputation of everyone associated with Gates will be tainted.

Andrew Coulson has replied to Sherman Dorn on the productivity implosion chart. Turns out that I had been using an old version of the chart, and Professor Dorn has conceded the larger point over the broad sweep of the spending and academic trends, but who doesn’t enjoy a tussle over methods?


I just wanted to add a few thoughts to my post yesterday. Readers may be wondering what is wrong with using science to identify the best educational practices and then implementing those best practices. If they are best, why wouldn’t we want to do them?

Let me answer by analogy. We could use science to identify where we could get the highest return on capital. If science can tell us where the highest returns can be found, why would we want to let markets allocate capital and potentially make a lot of mistakes? Government could just use science and avoid all of those errors by making sure capital went to where it could best be used.

Of course, we tried this approach in the Soviet Union and it failed miserably. The primary problem is that science is always uncertain and susceptible to corruption. We can run models to measure returns on capital, but we have uncertainty about the models and we have uncertainty about the future. Markets provide a reality test to scientific models by allowing us to choose among competing models and experience the consequences of choosing wisely or not. Science can advise us, but only choice, freedom, and experience permit us to benefit from what science has to offer.

And even more dangerous is that in the absence of choice and competition among scientific models, authorities will allow their own interests or preferences to distort what they claim science has to say. For an excellent example of this, check out the story of Lysenko and Soviet research on genetics. For decades Soviet science was compelled to hold that acquired characteristics could be inherited.

In Education Myths I argued that we needed to rely on science rather than our direct experience to identify effective policies. Our eyes can mislead us, while scientific evidence has the systematic rigor to guide us more accurately.

That’s true, but I am now more aware of the opposite failing — believing that we can resolve all policy disputes and identify the “right way” to educate all children solely by relying on science. Science has its limits. Science cannot adjudicate among the competing values that might attract us to one educational approach over another. Science usually tells us about outcomes for the typical or average student and cannot easily tell us about what is most effective for individual students with diverse needs. Science is slow and uncertain, while policy and practice decisions have to be made right now whether a consensus of scientific evidence exists or not. We should rely on science when we can but we also need to be humble about what science can and can’t address.

I was thinking about this while reflecting on the Gates Foundation’s Measuring Effective Teachers Project. The project is an ambitious $45 million enterprise to improve the stability of value-added measures while identifying effective practices that contribute to higher value-added performance. These are worthy goals. The project intends to advance those goals by administering two standardized tests to students in 8 different school systems, surveying the students, and videotaping classroom lessons.

The idea is to see if combining information from the tests, survey, and classroom observations could produce more stable measures of teacher contributions to learning than is possible by just using the state test. And since they are observing classrooms and surveying students, they can also identify certain teacher practices and techniques that might be associated with greater improvement. The Gates folks are using science to improve the measures of student progress and to identify what makes a more effective teacher.

This is a great use of science, but there are limits to what we can expect. When identifying practices that are more effective, we have to remember that this is just more effective for the typical student. Different practices may be more effective for different students. In principle science could help address this also, but even this study, with 3,000 teachers, is not nearly large enough to produce a fine-grained analysis of what kind of approach is most effective for many different kinds of kids.

My fear is that the researchers, their foundation backers, and, most importantly, the policymaker and educator consumers of the research are insensitive to these limitations of science. I fear that the project will identify the “right” way to teach and that it will then be used to enforce that right way on everyone, even though it is highly likely that there are different “right” ways for different kids.

Unfortunately, Vicki Phillips misread her own Foundation’s report. On p. 34 the correlation between test prep and value-added is positive, not negative. If the study shows any relationship between test prep and student progress, it is that test prep contributes to higher value-added. Let’s leave aside the fact that these were simply a series of pairwise correlations and not the sort of multivariate analysis that you would expect if you were really trying to identify effective teaching practices. Vicki Phillips was just plain wrong in what she said. Even worse, despite having the error pointed out, neither the Gates Foundation nor the New York Times has considered it worthwhile to post a public correction. Science says what I say it says.
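Why does the pairwise-versus-multivariate distinction matter? Because a simple correlation can point one way while a regression that controls for a confounder points the other. Here is a minimal fabricated example (none of these numbers come from the Gates data; the confounder and effect sizes are pure invention): if test prep happens to be concentrated in advantaged classrooms, the raw correlation between test prep and value-added can be positive even when, holding advantage fixed, test prep is associated with lower value-added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Invented confounder: overall classroom advantage (resources, prior achievement).
advantage = rng.normal(0, 1, n)
# Suppose advantaged classrooms also happen to do more test prep...
test_prep = 0.9 * advantage + rng.normal(0, 0.4, n)
# ...while test prep itself, holding advantage fixed, lowers value-added.
value_added = 1.0 * advantage - 0.5 * test_prep + rng.normal(0, 0.5, n)

# Pairwise correlation, as in a simple table of correlations: positive.
r = np.corrcoef(test_prep, value_added)[0, 1]

# Multivariate analysis: regress value-added on test prep AND advantage.
X = np.column_stack([np.ones(n), test_prep, advantage])
coefs, *_ = np.linalg.lstsq(X, value_added, rcond=None)

print(f"pairwise correlation of test prep and value-added:  {r:+.2f}")
print(f"regression coefficient on test prep (controlled):   {coefs[1]:+.2f}")
```

The point is not that this is what happened in the Gates data — it is that a table of pairwise correlations cannot, by itself, settle what an “effective teaching practice” is.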

And this is the greatest danger of a lack of humility in the application of science to public policy. Science can be corrupted so that it simply becomes a shield disguising the policy preferences of those in authority. How many times have you heard a school official justify a particular policy by saying that it is supported by research when in fact no such research exists? This (mis)use of science is a way for authority figures to tell their critics, “shut up!”

But even if the Gates report had conducted multivariate analyses on effective teaching practices and even if Vicki Phillips could accurately describe the results of those analyses, the Gates project of using science to identify the “best” practices is doomed to failure. The very nature of education is that different techniques are more effective in different kinds of situations for different kinds of kids. Science can identify the best approach for the average student but it cannot identify the best approach for each individual student. And if students are highly varied in their needs, which I believe they are, this is a major limitation.

But as the Gates Foundation pushes national standards with new national tests, they seem inclined to impose the “best” practices that science identified on all students. The combination of Gates building a national infrastructure for driving educator behavior while launching a gigantic scientific effort to identify the best practices is worrisome.

There is nothing wrong with using science to inform local practice. But science needs markets to keep it honest. If competing educators can be informed by science, then they can pick among competing claims about what science tells us. And they can learn from their experience whether the practices that are recommended for the typical student by science work in the particular circumstances in which they are operating.

But if the science of best educator practice is combined with a national infrastructure of standards and testing, then local actors cannot adjudicate among competing claims about what science says. What the central authorities decide science says will be infused in the national standards and tests and all must adhere to that vision if they wish to excel along these centralized criteria. Even if the central authority completely misunderstands what science has to say, we will all have to accept that interpretation.

I don’t mean to be overly alarmist. Gates has a lot of sensible people working for them and there are many barriers remaining before we fully implement national standards and testing. My concern is that the Gates Foundation is being informed by an incorrect theory of reform. Reform does not come from science identifying the right thing to do and then a centralized authority imposing that right thing on everyone. Progress comes from decentralized decision-makers having the freedom and motivation to choose among competing claims about what is right according to science.