As states submit ESSA plans, policymakers must design responsible school rating systems

Is it possible to summarize a school’s performance, fairly and accurately, with a single rating? Schools are complex organizations that serve a variety of purposes, and measuring their progress toward these goals is notoriously tricky. Can we really say that this school is a “B+” and that school down the street is a “D”? Should we?

Education policymakers across the country are grappling with these questions as states develop their plans to implement the Every Student Succeeds Act (ESSA). Eleven states have submitted plans to the Department of Education already, providing a first glimpse at how states are handling the thorny issue of summative performance measures.

Let’s take a look at summative school ratings, with a particular focus on what research suggests about their effects and implications.

Background and debate on school ratings

Summative ratings, like school letter grades or star ratings, serve multiple purposes. State accountability systems use these ratings as the basis for sanction or intervention, while the public uses these ratings to compare and choose schools, decide where to live, evaluate school board members, and more.

Creating a single, summative school rating is a multi-step process. It requires policymakers to:

1. Identify criteria for evaluating schools
2. Select measures that map to those criteria
3. Determine how much weight to give to each measure
4. Collect data
5. Compute scores and convert them to a rating

Each step of this process is subject to debate and disagreement. Steps 1 and 2 are vulnerable to criticisms that today’s school performance measures are flawed or limited in scope. Step 3 is vulnerable to disputes about how to weigh measures like growth and proficiency, with a political challenge that different weighting schemes create different winners and losers. Step 5 is vulnerable to broader objections that school performance is multi-dimensional and reductive evaluations are misleading. We might worry, for example, that a school that is good for one child would be bad for another child, and that reductive evaluations could wrongly imply to schools that either “all is well” or “all must change.”
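To make steps 3 and 5 concrete, here is a minimal illustrative sketch of a weighted-index rating. The measure names, weights, and grade cut scores are hypothetical, invented for illustration; they are not drawn from any state's actual plan.

```python
# Illustrative sketch of a summative rating: combine measure scores into a
# weighted index (step 3), then convert the index to a letter grade (step 5).
# All measure names, weights, and cut scores below are hypothetical.

MEASURE_WEIGHTS = {
    "proficiency": 0.4,
    "growth": 0.4,
    "graduation_rate": 0.2,
}

# Cut scores mapping a 0-100 index to a letter grade (hypothetical).
GRADE_CUTS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def summative_rating(scores: dict) -> str:
    """Combine measure scores (each on a 0-100 scale) into a weighted
    index, then convert the index to a letter grade."""
    index = sum(MEASURE_WEIGHTS[m] * scores[m] for m in MEASURE_WEIGHTS)
    for cut, grade in GRADE_CUTS:
        if index >= cut:
            return grade
    return "F"

# A school strong on growth but weaker on proficiency:
print(summative_rating({"proficiency": 62, "growth": 91, "graduation_rate": 85}))
```

The sketch also shows why weighting is politically contested: shifting weight from proficiency toward growth would raise this hypothetical school's grade, so different weighting schemes create different winners and losers.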

A glance at the handling of summative ratings in the first ESSA state plans submitted, compiled by Andrew Ujifusa at EdWeek, reveals the different ways that states have navigated these issues. Among the proposals are: A–F letter grades for New Mexico and Tennessee; star ratings for Nevada and Washington, D.C.; color-coded targets for Vermont; named tiers for Illinois (from “Exemplary School” to “Lowest-Performing School”) and Massachusetts (from “Tier 1” to “Tier 6”); and a numerical index for Connecticut and New Jersey (with New Jersey’s scores including a percentile ranking). This variation is nothing new. One review by the Education Commission of the States in 2013 and another by Richard Welsh in 2016 show a wide variety of approaches, with many of the early ESSA plans consistent with prior state policies.

How might summative ratings affect students and schools?

The rationale underlying summative school ratings is that they provide a clear signal of quality to pierce the “smog” of data that can otherwise cloud people’s decisionmaking. This argument enjoys support from research in cognitive psychology and consumer behavior. Studies on topics ranging from hospital evaluations to gourmet jam purchases indicate that decisionmaking can suffer when the amount of information we receive exceeds our capacity to process it. Simplified ratings can help to make school quality information more manageable. Evaluative ratings, like color-coded letter grades, also might elicit emotional responses that motivate and organize decisionmaking.

To design summative ratings responsibly, policymakers need to consider the effects on both the users of school ratings (including parents and the public) and the subjects of school ratings (including administrators and teachers).

Parents are perhaps the most active users of school ratings, particularly while choosing schools for their children. At least in theory, a school-choosing parent confronts a multi-step process very much like the process described above. She decides which school characteristics matter most to her, what information to collect about those characteristics, and how to make tradeoffs. She collects data and makes reductive judgments about how well schools would serve her child. The process can be daunting, especially if the data tools are overwhelming, and might cause her to shun this “rational” school choice process and seek shortcuts for her decisionmaking. A summative school rating, if it reflects parents’ priorities, can make this process manageable. Indeed, a study from Charlotte-Mecklenburg found that providing simplified performance reports led more parents to request high-scoring schools. Another particularly interesting study—from Rebecca Jacobsen, Jeffrey Snyder, and Andrew Saultz—showed that people randomly assigned to see school performance represented in letter grades perceived larger quality differences between high-scoring and low-scoring schools than people assigned to see numerical performance indices, performance rankings (e.g., “advanced” and “basic”), or the percentage of students proficient.

School performance ratings also affect the behaviors of those who work in schools. Whether these effects are positive or negative depends on how well the behaviors rewarded by rating systems align with the behaviors we would like schools to pursue. Florida’s “A+” accountability system, with its use of A–F school letter grades (and corresponding consequences), appears to have induced meaningful changes in instructional practices and improved performance at low-scoring schools. On the other hand, research on “bubble effects” in K-12 education and on institutional responses to U.S. News & World Report rankings in higher education warns that the combination of intense accountability pressure and gameable accountability metrics can lead people to cut corners. While accountability systems can generate pressure with or without summative ratings, ratings can intensify those pressures through increased visibility and heightened stakes.

The way forward

It seems, then, that the question of whether to create a summative rating depends on what kind of rating the state’s political process would create. A good summative rating—one that encourages constructive practices and evaluates schools by how they serve their communities rather than which communities they serve—is better than no summative rating, while no summative rating is better than a bad one. A clear signal to pierce the smog of school data can be valuable but only if it guides people in helpful directions.

Some states might opt for a more hands-off approach and provide school dashboards without ratings to accompany them. This approach is tempting. It invites people to make their own judgments, based on their own criteria, and addresses the fundamentally correct objection that summative ratings cannot do justice to schools. However, this comes at a real cost. By refusing to rate schools, a state does not obviate the need for others to do so. It simply asks parents and the public to do it themselves. That is a deceptively hard task, and a comprehensive dashboard without a focal point like a rating can frustrate and confuse more than it clarifies.

Other states might opt for a middle-ground approach and provide different ratings for different domains. For example, a school might receive an “A” for academic growth but a “C” for extracurricular opportunities. Done carefully, this could work. Done carelessly, it could lose the simplifying value of summative ratings or unintentionally create impressions of equivalent importance across the rated domains. If policymakers believe that academic growth is more important than extracurricular opportunities, then the school ratings should reflect that in both their calculation and presentation.

Successfully designing a school rating system requires not only determining how to measure school performance but also navigating political considerations and understanding the psychology of simplified ratings. We will see through states’ ESSA plans whether they are up to the task.

The Brown Center Chalkboard launched in January 2013 as a weekly series of new analyses of policy, research, and practice relevant to U.S. education.

In July 2015, the Chalkboard was re-launched as a Brookings blog in order to offer more frequent, timely, and diverse content. Contributors to both the original paper series and current blog are committed to bringing evidence to bear on the debates around education policy in America.