Extreme school ratings

The U.S. Department of Education recently granted Ohio relief from No Child Left Behind’s (NCLB) most awkward mandates. To receive this relief, Ohio was required to present a school accountability plan that would put its kids on a college- and career-ready path. Ohio’s NCLB waiver request proposes a revamped accountability system based on three indicators of school quality: (1) student achievement, (2) student growth, and (3) achievement gap closure. The three indicator scores (reported as percentages) are summed and averaged—each given equal weight—to determine a school’s overall performance.[1]
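The equal-weight averaging can be sketched in a few lines of code; the function name and indicator values below are illustrative, not actual ODE data:

```python
def overall_rating(achievement, growth, gap_closure):
    """Equally weighted average of the three indicator percentages."""
    return (achievement + growth + gap_closure) / 3

# A hypothetical building scoring 85, 90, and 75 on the three indicators:
print(round(overall_rating(85, 90, 75), 1))  # 83.3
```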

The proposed system’s third indicator, gap closure, is a newly conceived measure of how well nationally defined student subgroups (e.g., racial, economically disadvantaged, special education, English language learners) perform on standardized tests relative to a state-designated baseline test score—an annual measurable objective (AMO). Any school building with more than 30 students in a subgroup must report that subgroup’s scores.
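This piece does not spell out ODE’s exact scoring formula, but a minimal sketch—assuming the rating is simply the share of reportable subgroups (those with more than 30 students) meeting the AMO—might look like this:

```python
def gap_closure_rating(subgroups):
    """subgroups: list of (student_count, met_amo) pairs for one building.
    Returns the percent of reportable subgroups (over 30 students) meeting
    the AMO, or None if no subgroup is reportable. Hypothetical formula,
    not ODE's actual method."""
    reportable = [met for count, met in subgroups if count > 30]
    if not reportable:
        return None
    return 100 * sum(reportable) / len(reportable)

# One reportable subgroup that meets the AMO yields 100 percent;
# five reportable subgroups, none meeting it, yield 0 percent.
print(gap_closure_rating([(120, True)]))      # 100.0
print(gap_closure_rating([(45, False)] * 5))  # 0.0
```

Under such a formula, a building’s score depends heavily on how many subgroups it must report—a point the simulated results below bear out.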

To gauge how schools would fare under the proposed accountability system, the Ohio Department of Education (ODE) simulated schools’ performance using 2010-11 data. The simulated results, however, call into question the validity of the gap closure indicator as currently designed.

The following charts show ODE’s simulated results. Consider first the distribution of Ohio school buildings’ overall ratings (Figure 1). The vertical axis indicates the number of school buildings that received a given rating, and the horizontal axis shows the rating scale, expressed as a percentage. Most school ratings fall within a relatively narrow band between 75 and 95 percent; the distribution is roughly normal, though skewed leftward (mean = 73 percent, standard deviation = 16.9).

Figure 1: Overall school building ratings relatively evenly distributed around mean. Distribution of overall school building ratings, ODE simulated results using 2010-11 data.

Source: Ohio Department of Education and author’s calculations. Note: A higher percentage for a building’s rating—the horizontal axes in Figures 1 and 2—corresponds to a higher grade, just as in student grading (e.g., an “A” is 90 to 100 percent). The overall school building rating comprises three equally weighted indicators: (1) student achievement, (2) student growth, and (3) gap closure.

Now consider how school buildings are distributed according to Ohio’s proposed gap closure indicator (Figure 2). Again, the vertical axis indicates the number of school buildings, while the horizontal axis indicates a school’s gap closure rating. The distribution differs sharply from that of the overall ratings: gap closure ratings are spread nearly evenly across the entire rating scale (mean = 64 percent, standard deviation = 33.5). Moreover, a large number of schools fall at the extreme margins of the distribution; for example, 890 of the 3,275 buildings received a 100 percent rating, while 320 received a rating of 25 percent or less.
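The share at the extremes follows directly from those counts:

```python
# Buildings at the extreme margins of the gap closure distribution,
# using the counts from ODE's simulated 2010-11 results cited above.
extreme = 890 + 320   # rated 100 percent, or 25 percent and below
total = 3275
print(f"{extreme / total:.1%}")  # 36.9%, i.e., more than one in three
```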

Figure 2: More than one in three schools rated at extreme margins (indicated in brighter red, 100 percent or under 25 percent) for gap closure indicator. Distribution of gap closure rating by building, ODE simulated results using 2010-11 data.

The distribution of gap closure ratings seems unusual. Why don’t we see a more balanced, normally distributed dispersion of school ratings, similar to the overall ratings, with schools gravitating toward the mean? Moreover, should we conclude that the 890 schools that received a 100 percent rating are marvelously narrowing achievement gaps, while the 320 buildings that received less than 25 percent are miserably failing?

These questions warrant a closer examination of the schools at the extremes. Perhaps we’ll find that the top-performing schools have only one or a few subgroups to educate, while those at the bottom of the distribution must educate students across many subgroups.

A preliminary scan of schools supports the hypothesis. Take Midway Elementary School, a rural all-White school: it received a 100 percent gap closure rating because it met the test score benchmark for its one subgroup—White students. Meanwhile, the urban Charles Mooney Elementary School in Cleveland received a 0 percent gap closure rating, as it did not meet the state standards for any of its five subgroups.

Narrowing achievement gaps for disadvantaged subgroups is a legitimate educational objective, and ODE is right to include gap closure in the annual school and district report cards. But school ratings should also reflect degree of difficulty. In a follow-up piece, we plan to examine the schools at the extreme margins in more detail, to ascertain whether those at the top of the distribution (100 percent ratings) are simply schools with only one or a few subgroups, while those at the bottom are schools with numerous subgroups.

If some schools do in fact receive 33 percent of their overall rating points virtually free—simply because they have few subgroups—ODE should consider adjusting its gap closure rating formula. Perhaps it could upwardly adjust the ratings of high-subgroup schools based on their number of racial minority or special education students. This would amount to a degree-of-difficulty adjustment. (Think figure skating scores: missing a triple axel is punished less than missing a single axel.) Another alternative may be to reduce the weight of the gap closure indicator below 33 percent for single- or low-subgroup school buildings. These adjustments would help ensure that low-subgroup schools are not unfairly rewarded and high-subgroup schools are not excessively punished in their overall building ratings.
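To make the reweighting idea concrete, here is one way such an adjustment could be sketched; the three-subgroup cutoff and the reduced one-sixth weight are purely hypothetical, not an ODE proposal:

```python
def adjusted_overall(achievement, growth, gap_closure, n_subgroups):
    """Shrink the gap closure weight for low-subgroup buildings and
    redistribute it evenly to achievement and growth.
    Cutoff and weights are hypothetical."""
    gap_weight = 1 / 3 if n_subgroups >= 3 else 1 / 6
    other_weight = (1 - gap_weight) / 2
    return other_weight * (achievement + growth) + gap_weight * gap_closure

# A one-subgroup building with a "free" 100 on gap closure gains less
# from that indicator than a five-subgroup building with the same scores:
print(round(adjusted_overall(70, 70, 100, n_subgroups=1), 1))  # 75.0
print(round(adjusted_overall(70, 70, 100, n_subgroups=5), 1))  # 80.0
```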