In the meantime, regardless of one’s opinion on whether the “Florida formula” is a success and/or should be exported to other states, the assertion that the reforms are responsible for the state’s increases in NAEP scores and FCAT proficiency rates during the late 1990s and 2000s not only violates basic principles of policy analysis, but it is also, at best, implausible. The reforms’ estimated effects, if any, tend to be quite small, and most of them are, by design, targeted at subgroups (e.g., the “lowest-performing” students and schools). Thus, even large impacts are no guarantee to show up at the aggregate statewide level (see the papers and reviews in the first footnote for more discussion).

DiCarlo obviously has formal training in the statistical dark arts, and the vast majority of academics involved in policy analysis would probably agree with his point of view. What he lacks, however, is an appreciation of the limitations of social science.

Social scientists are quite rightly obsessed with issues of causality. Statistical training quickly reveals to the student that people are constantly making ad hoc theories about some X resulting in some Y without much proof. Life abounds with half-baked models of reality and incomplete understandings of phenomena, which have a consistent and nasty habit of proving quite complex.

Social scientists have developed powerful statistical methods to attempt to establish causality; techniques like random assignment and regression discontinuity can illuminate causal questions. These types of studies can bring great value, but it is important to understand their limitations.

DiCarlo, for instance, reviews the literature on the impact of school choice in Florida. Random assignment school choice studies have consistently found modest but statistically significant test score gains for participating students. Some react to these studies with a bored “meh.” DiCarlo helps himself along in reaching this conclusion by citing some non-random-assignment studies. More problematically, he fails to understand the limitations of even the best studies.

For example, even the very best random assignment school choice studies fall apart after a few short years. Students don’t live in social science laboratories but rather in the real world. Random lotteries can divide students into nearly identical groups, with the main difference being that one group applied for but did not get to attend a charter or private school. They cannot, however, stop students in the control group from moving around.

Despite the best efforts of researchers, attrition immediately begins to degrade control groups in random assignment studies. Usually, after three years, they are spent. Those seeking a definitive answer on the long-term impact of school choice on student test scores are in for disappointment. Social science has very real limits, and in this case, it is only suggestive. Choice students tend to make small but cumulative gains year by year, which tend to become statistically significant around year three, right around when the random assignment design falls apart. What’s the long-term impact? I’d like to know too, but it is beyond the power of social science to tell us, leading us to look for evidence from persistence rates.
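The attrition dynamic described above can be made concrete with a toy simulation. This is purely illustrative and not drawn from any actual study; the sample size and annual attrition rate are invented for the sketch.

```python
# Toy illustration (invented numbers): how attrition erodes the control
# group in a lottery-based school choice study over time.

N = 1000                 # hypothetical control-group students at baseline
ANNUAL_ATTRITION = 0.15  # hypothetical share lost to tracking each year

control_remaining = N
for year in range(1, 6):
    # Each year, a fixed fraction of the remaining controls moves away
    # or otherwise drops out of the study's data collection.
    control_remaining = int(control_remaining * (1 - ANNUAL_ATTRITION))
    print(f"Year {year}: {control_remaining} of {N} controls still tracked")
```

At a 15% annual loss, fewer than two-thirds of the original control group is still being tracked by year three, which is exactly the window in which cumulative test score gains tend to become statistically detectable.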

So let’s get back to DiCarlo, who wrote “The reforms’ estimated effects, if any, tend to be quite small, and most of them are, by design, targeted at subgroups (e.g., the “lowest-performing” students and schools). Thus, even large impacts are no guarantee to show up at the aggregate statewide level.” This is true but fails to recognize the poverty of the social science approach itself.

DiCarlo mentions that “even large impacts are no guarantee to show up at the aggregate statewide level.” This is a reference to the “ecological fallacy,” which teaches us to employ extreme caution when traveling between individual-level and aggregate-level data. Read the above link if you want to know all the brutally geeky reasons why this is the case; take my word for it if you don’t.
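The dilution point can be illustrated with a back-of-the-envelope calculation (all numbers invented for illustration): even a solid gain for a targeted subgroup can translate into barely visible movement in the statewide average.

```python
# Toy calculation (invented numbers): a 5-point gain for a targeted
# subgroup that makes up 20% of test takers moves the statewide
# average by only 1 point.

low_share, high_share = 0.20, 0.80      # hypothetical subgroup weights
low_before, high_before = 200.0, 260.0  # hypothetical scale scores

effect = 5.0                            # hypothetical gain, targeted subgroup only
low_after = low_before + effect
high_after = high_before                # untargeted group unchanged

agg_before = low_share * low_before + high_share * high_before
agg_after = low_share * low_after + high_share * high_after

print(f"Aggregate before: {agg_before}, after: {agg_after}")
print(f"Aggregate gain: {agg_after - agg_before:.1f} points")
```

And the reverse inference is just as treacherous: a one-point statewide gain tells you nothing, by itself, about which individuals improved or why.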

DiCarlo is correct that connecting the individual-level data (e.g., the studies he cites) to aggregate-level gains is a dicey business. He fails, however, to appreciate the limitations of the studies he cites and the fact that the ecological fallacy problem cuts both ways. In other words, while the individual-level evidence is generally positive, we simply don’t know the relationship between individual policies and aggregate gains.

We know, for instance, that we have a positive study on alternative certification and student learning gains. We do not, and essentially cannot, know how many NAEP point gains, if any, resulted from this policy. The proper reaction for a practical person interested in larger student learning gains should be summarized as “who cares?” The evidence we have indicates that the students who had alternatively certified teachers made larger learning gains. Given the lack of any positive evidence associated with traditional teacher certification, that’s going to be enough for most fair-minded people.

The individual impact of particular policies on gains in Florida is not clear. What is crystal clear, however, is that there were aggregate-level gains in Florida. You don’t need a random assignment study or a regression equation, for instance, to interpret the FCAT Level 1 reading percentages (aka illiterate) shown above. When you see the percentage of African American students scoring at the lowest of five achievement levels drop from 41% to 26% on a test with consistent standards, it is little wonder that policymakers around the country have emulated the policy, despite DiCarlo’s skepticism.

I could go on and bomb you with charts showing improving graduation rates, NAEP scores, Advanced Placement passing rates, etc., but I’ll spare you. The point is that there are very clear signs of aggregate-level improvement in Florida, as well as a large number of studies at the individual level showing positive results from individual policies.

With large aggregate gains and plenty of positive research, the reasonable course is not to avoid doing any of the Florida reforms, but rather to do all of them. In the immortal words of Freud, sometimes a cigar really is just a cigar.

Matt DiCarlo is taken to task for looking at the micro-level studies and not seeing an aggregate result that looks like the macro numbers for Florida. But it’s a leap of faith to argue that the reforms work as a package, as you suggest (“do all of them”). The evidence does not say that either. Perhaps it’s a reasonable course of action, as you note, but we should be clear about what the evidence is actually telling us here.

And the discussion of school choice studies is confused. Random assignment experiments do not constrain the control group to stay in public schools so that researchers can assess whether attending private (or charter) schools improves outcomes. Whatever the control group does is part of the counterfactual, including going to other schools of choice. Your comments suggest social science methods are too limited because the control group is mobile, but I would argue the design is answering the question of whether one set of schools yields better outcomes than the other schools the control group students can attend. Maybe you are interested in knowing what happens if all of one group is constrained to attend one type of school and all of another group is constrained to attend another type, but this is a fictional world being represented, and the point of asking that question is unclear.

It is best to consider both individual- and aggregate-level evidence independently of each other. They are both important, but they may be utterly unrelated. Skeptics need an explanation of Florida’s aggregate progress other than the reforms. A literature review of micro-level data cannot provide one due to the ecological fallacy problem, which cuts both ways.

The main problem with the random assignment studies, as I understand it, is attrition in the control group: kids lose the lottery and then move, making it very difficult for researchers to keep track of them. There are others who read the blog who know much more about this from painful firsthand experience, but the bottom line is that our test score evaluations all have a short time horizon. David Figlio used a regression discontinuity approach to study the Florida tax credit, but it eventually ran into data problems as well.

I’m always on board to affirm the limits of science. But I think you err in suggesting we have no findings on the long-term academic impact of vouchers. We don’t have test scores, but we do have high school graduation rates (random assignment studies in DC and NYC, plus good evidence in Milwaukee as well).

I may have failed to make this clear, but I’d just like to note that the main purpose of my post was not to argue about NAEP, but rather to provide a balanced review of the available high-quality research on the Florida reforms. I discussed 15-20 strong studies (many of which, by the way, are positive, with standard caveats).

We obviously differ in our interpretation of NAEP cohort changes. That’s fine, and worth discussing (we’ve done this before, if I recall), but I think we can agree that there’s better evidence out there, and that this work merits attention.

Along those lines, at risk of imposing my preferences on people commenting on my writing, I’d be interested in hearing reactions to the substantive review – for example, whether I missed any good papers, or if my characterization of the research on each component was incomplete or unfair.

Thanks for your response. It is great to have a formidable opponent with a sense of humor. I don’t think your review of the micro-level studies was unbalanced. I prefer random assignment charter school studies, but I don’t know of any from Florida yet, so that falls into the quibble category.

The point that I was trying to make was that even the best micro-level research has very real limits and that we don’t, and as far as I understand can’t, have a good handle on how the micro and aggregate levels ultimately relate. That leaves us in the position of gathering as much information as we can and drawing the best conclusions possible.

Nobody, least of all me, would object to the argument that we should gather as much information as possible and draw the best conclusions we can.

However, if you believe that NAEP cohort changes are valid policy evidence, that policies should be evaluated and exported en masse, and that a literature review is “splitting hairs,” then we are so far apart in terms of premises and general perspective that it’s probably best to simply agree to disagree.