October 8, 2008

The WWC falls down on the job again

The What Works Clearinghouse (WWC) does a noble job of identifying much of the junk science that plagues education research and masquerades as the real thing. The WWC, however, is not without its faults. I have noted at least two instances in which the WWC has given its imprimatur to very questionable research.

In August, the WWC released a report on Reading Mastery-- one of the most researched reading programs in existence. Despite the fact that other reputable organizations have found that much of the Reading Mastery research base passes scientific muster, the WWC did not find a single study that met its standards. Clearly something was amiss.

The author of Reading Mastery, Zig Engelmann, has just weighed in on the WWC's latest shenanigans -- Machinations of What Works Clearinghouse. Basically, Zig says that the WWC failed to locate a large portion of the extant post-1985 Reading Mastery research base, improperly excluded the entirety (38 studies) of the pre-1985 research base, and used dubious criteria for excluding at least one study it did consider. I suggest you read the whole thing. I'll elaborate on two points that Zig raises.

Dubious Rationale for Excluding Pre-1985 Research

The WWC arbitrarily limits its research review to studies reported no earlier than 1985 (unless the WWC principal investigator deems the study important enough to report). This 1985 cut-off makes little sense. Beginning reading performance hasn't changed much since 1985. In fact, we have readily available evidence that it hasn't changed much since as early as 1971. That evidence is the NAEP Long-Term Trend in Reading test data (not to be confused with the plain ol' NAEP test, which changes frequently). Here's a graph of the performance of nine-year-olds (4th grade):

As you can see, the performance of nine-year-olds in reading has stayed remarkably flat during the period 1971 - 2004, with little difference between pre-1985 scores and post-1985 scores. My back-of-the-envelope calculation is that the change between 1971 and 1999 is less than a quarter of a standard deviation, i.e., not educationally significant. In fact, scores in 1980 were higher across the board than they were in 1999. Only after 1999 do scores rise above the 1980 high-water mark.
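For anyone who wants to redo that back-of-the-envelope calculation, here is a minimal sketch in Python. It only shows the arithmetic: divide the change in scale score by the standard deviation of the scores. The function name and the numbers in it are hypothetical placeholders, not actual NAEP figures, so plug in the published scale scores and standard deviation before drawing any conclusion.

```python
def standardized_change(score_start, score_end, sd):
    """Express a change in scale-score points in standard-deviation units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (score_end - score_start) / sd


# Hypothetical placeholder values, NOT actual NAEP figures.
example_start = 200.0  # assumed average scale score in the earlier year
example_end = 208.0    # assumed average scale score in the later year
example_sd = 40.0      # assumed standard deviation of student scores

change = standardized_change(example_start, example_end, example_sd)

# The post's rule of thumb: under a quarter of an SD is not educationally significant.
is_small = abs(change) < 0.25

print(f"Change: {change:.2f} SD; under a quarter SD: {is_small}")
```

With these placeholder inputs the change works out to 8 / 40 = 0.20 of a standard deviation, which would fall under the quarter-SD threshold used above.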

Since we have reliable data going back to 1971 showing similar performance in early reading, there is no compelling reason to arbitrarily set the cut-off at 1985. The rationale the WWC offers is lame:

... the fact that preschool enrollment has increased, combined with the fact that more preschool and kindergarten programs run full-day, means that students in the early grades may be better prepared to receive reading instruction today than students 25 years ago. Moreover, it is possible that any changes in reading readiness over this period may not have been evenly distributed, since differences in reading ability by socioeconomic status and race are apparent at the kindergarten level . . . Any of these changes could have implications for the effectiveness of an intervention. If school readiness has increased, then an intervention that was effective 25 years ago may not be effective in more recent years. (p. 2, Appendix A)

Perhaps the WWC hasn't heard, but there isn't any evidence that preschool, full-day kindergarten, and Head Start provide any lasting effects that don't quickly fade out. In fact, all of the potential causes given by the WWC (none of which has been confirmed by research) must be superficial and superfluous to reading performance, since the NAEP data show that none of them has had a significant effect on reading performance.

This is a somewhat embarrassing admission coming from the WWC, what with its lofty evidentiary standards and all. I also suggest you read Zig's evisceration of this argument, which concludes:

The assertion that the children are better prepared now and therefore what was effective 25 years ago might not be effective now is logically impossible. Lower performers make all the mistakes that higher performers make. They make additional mistakes that higher performers don’t make and their mistakes are more persistent, more difficult to correct. Therefore, if the program is easier for them now because of their higher degree of undefined “readiness,” they will make fewer mistakes and progress through the program sequence faster.

...

[B]eginning reading for grades K–3 is stable because nothing of significance has changed in the last 40 years. The instructional goal is the same—to teach children strategies and information that would permit them to read material that could be easily covered with a vocabulary of 4,000 words. The frequency of these words has not changed. The syntax of the language has not changed significantly. For these reasons, the content of the first four levels of Reading Mastery has not changed over the years.

I am not aware of any properly conducted scientific research which has a shelf life of only 20 years. Research doesn't go bad. I'm not going to stop taking penicillin-based drugs while the research gets updated because the basic research was conducted 80 years ago. And I see little reason for the WWC to exclude any properly conducted research on Reading Mastery, such as Project Follow Through, or on any other educational program for that matter.

Dubious Confounding Factors

It's bad enough that the WWC failed to even locate, much less consider, a majority of the extant Reading Mastery research. It's even worse that they set an arbitrary cut-off date that excluded at least 38 studies on Reading Mastery. However, improperly excluding a study (which otherwise meets all the selection criteria) based on the fact that the new teachers were provided initial training goes beyond the pale.

The RITE study (Carlson and Francis, 2002), which involved 9300 students and 277 teachers (Zig claims that it is "probably the second largest instructional study ever conducted (after Project Follow Through)"), met all of the WWC's exceedingly high selection criteria. However, the WWC excluded the study because "support [was] provided to teachers through the RITE program," which the WWC believes to be a confounding factor. Here's the confounding "support" the teachers received:

This support consisted of summer training, less than two hours of monitoring during the year, and help from a designated trainer. Nearly half of the teachers (137) were in their first year of teaching Reading Mastery. The training focused on how to provide positive reinforcement, how to correct specific errors, how to organize and manage the classroom so that one small group is in reading instruction while the other two groups are engaged in independent work and are not disrupting the instruction... The teachers were trained to teach Reading Mastery exactly the way the [Teacher's] Guide describes it, with all the technical details in place.

This is not only a ridiculous reason for excluding an otherwise acceptable study, but it also goes against the WWC's own protocol, which permits the inclusion of "commercial programs and products that [have] an external developer who: Provides technical assistance (e.g., provides instructions/guidance on the implementation of the intervention)." (p. 6, Protocol)

The WWC excluded many other otherwise acceptable Reading Mastery studies based on "confounding factors." I wonder how many of those confounding factors were related to initial training, as in the RITE study. I know that more than one study was excluded because the control group initially performed at least half a standard deviation above the Reading Mastery group, yet despite this advantage, the Reading Mastery group outperformed the control group by the end of the study. I'm thinking that the magnitude of the effect size more than compensates for the reliability issue caused by the initial discrepancy, which favored the control group.

In any event, there you have it. The WWC failing to do their job properly yet again. This is beginning to become a pattern.

(A) To ensure that student and other stakeholder needs are understood and addressed, the school district or school shall:

...use assessment results and the value-added progress dimension to make informed decisions about curriculum, instruction, assessment, and goals;...Monitor ... instructional materials to determine their effectiveness in helping students meet performance objectives;

(B) The school district or school shall implement a district-wide curriculum and instructional program that is characterized by systematic planning, articulation, and evaluation. The school district's curriculum shall be developed with input from and dialogue with parents, community members, and other stakeholders.

If it were a matter of choosing between what Zig says and WWC says, Zig is the clear winner. But the matter of WWC is much more fundamental than the instances of Reading Mastery and Reading Recovery.

Let's buy the Institute of Medicine’s declaration, “A systematic review is a scientific investigation that focuses on a specific question…” WWC begs the “specific question” of reading instruction: teaching children to read any text with understanding equal to what they would have if the communication were spoken.

The cumulative evidence indicates that Reading Mastery reliably “works.” There are other “programs” that also deliver “readers.”

But what’s with WWC? It’s built on the proposition that the sole legitimate “design” is a randomized control comparative experiment. That’s an ex cathedra proposition not stipulated by god, the pope, or any scientific authority (outside of education). To the contrary. Over 40 years ago, Julian Stanley and Don Campbell showed conclusively that no situation in education meets the statistical requirements underlying this design. They invented a number of quasi-experimental designs to try to deal with the matter. Murray Sidman described an alternative logic of N=1 experimental designs even more applicable to instruction.

The mix of ideology and ineptitude has created a condition that would be comical were it not tragic. Out of one side of its mouth, the government contends that all the reading programs in schools are based on the “new science of reading.” Out of the other side of its mouth, the government is contending that very widely used programs such as Houghton-Mifflin and Open Court have no or very weak evidence that they work. On the output side the government is requiring the use of tests that are sensitive only to SES differences and not to instructional differences and mandating that schools and teachers register cumulative annual gains that are statistically impossible.

Zig says, “We need to engage in a full-fledged assault on WWC.” The thing is, there is no “we” and no operationalization of an “assault.”

sheds no light on "What Works." But the study relies on self reports of principals, "reading coaches" and teachers as indicators of "What's Happening" and on third and fourth grade standardized test scores at the school level from state-reported data bases as the indicator of "What is our children learning."

"What's happening" in Reading First schools is much the same as what's happening in non-Reading First schools. And the few differences in "What's being learned" between RF and non-RF schools amount to only 2-3 percentage points.

The most telling finding is that 28% of K-3 kids were receiving "interventions." But with the "science" and the mandates, a lot of kids aren't being taught how to read. That's bad news. Even worse news: no more is known about how to accomplish the aspiration of NCLB than when the legislation was enacted.

About D-Ed Reckoning

The primary problem with K-12 education today is the problem of dead reckoning--an estimate based on little or no information. We don't know what a good K-12 education system is because we've never seen one operating. A good education system is one that is capable of educating almost every child.