Most states give value-added measures a specific and relatively high weight (usually 35–50 percent). Some states do not specify a weight but employ a matrix by which different combinations of value-added scores, observations, and other components generate final ratings; in these systems, value-added scores still tend to be a driving component.
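
To make the matrix approach concrete, here is a minimal sketch in Python. The rating categories and cell values are invented for illustration and do not reflect any particular state's rubric; the point is that a combination matrix can let value-added dominate even when no explicit weight is stated.

```python
# Hypothetical combination matrix: rows are value-added categories,
# columns are observation categories, cells are final ratings.
FINAL_RATING = {
    ("low",  "low"):  "ineffective",
    ("low",  "mid"):  "ineffective",
    ("low",  "high"): "developing",
    ("mid",  "low"):  "developing",
    ("mid",  "mid"):  "effective",
    ("mid",  "high"): "effective",
    ("high", "low"):  "effective",
    ("high", "mid"):  "highly effective",
    ("high", "high"): "highly effective",
}

def final_rating(value_added: str, observation: str) -> str:
    """Look up the final rating for one teacher."""
    return FINAL_RATING[(value_added, observation)]

# Moving across a row (better observations) changes the outcome less than
# moving down a column (better value-added), so value-added drives results.
print(final_rating("low", "high"))   # "developing": strong observations
                                     # cannot offset a low value-added score
```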

Because there will be minimal variation between districts, there will be little opportunity to test whether outcomes differ for different designs. If, instead, some districts set lower weights while others experiment with going higher, such variation could be useful in assessing whether and why different configurations lead to divergent results, and this information could then be used to make informed decisions about increasing or decreasing weights in the future.

No matter what the weight of value-added measures may be on paper, their actual importance will depend in no small part on the other components chosen and how they are scored. Consider an extreme hypothetical example: if an evaluation is composed of value-added data and observations, with each counting for 50 percent, and a time-strapped principal gives all teachers the same observation score, then value-added measures will determine 100 percent of the variation in teachers' final scores.
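
The arithmetic of this hypothetical is easy to verify. The sketch below uses invented scores and assumes final scores are a simple 50/50 weighted average; because the observation component is constant, it contributes no variance, and value-added accounts for all of the variation in final scores.

```python
from statistics import pvariance

# Invented value-added scores for five teachers (any spread will do).
value_added = [1.0, 2.0, 3.0, 4.0, 5.0]

# A time-strapped principal gives every teacher the same observation score.
observation = [3.0] * len(value_added)

# Final score: 50 percent value-added, 50 percent observation.
final = [0.5 * va + 0.5 * ob for va, ob in zip(value_added, observation)]

print(pvariance(observation))  # 0.0 -- a constant component adds no variance
print(pvariance(final))        # 0.5, which is 0.25 * pvariance(value_added)

# Share of final-score variance attributable to value-added: 1.0 (100 percent).
print(pvariance(final) / (0.25 * pvariance(value_added)))
```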

System designers must pay close attention to how raw value-added scores are converted into evaluation ratings and how those ratings are distributed in relation to other components. This attention is particularly important given that value-added models, unlike many other measures (such as observations), are designed to produce a spread of results—some teachers at the top, some at the bottom, and some in the middle. Other measures often do not produce much of a spread.

Some states and districts that have already determined scoring formulas do not seem to be paying much attention to this issue. They are instead taking the easy way out. For example, they are offering little or no guidance on how districts might calibrate the scoring to suit the other components they choose.

Don't ignore error—address it.

Although the existence of error in value-added data is discussed continually, there is almost never any discussion, let alone action, about whether and how to address it. There are different types of error, although they are often conflated.

Some of the imprecision associated with value-added measures is systematic. For example, there may be differences between students in different classes that are not measurable, and these differences may cause some teachers to receive lower (or higher) scores for reasons they cannot control (Rothstein, 2009). In practice, systematic error is arguably no less important than random error—statistical noise due largely to small samples. Even a perfect value-added model would generate estimates
with random error.
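
The two kinds of error behave differently, and a small simulation makes the contrast visible. The numbers below are invented for illustration (this is not an actual value-added model): the spread of estimates caused by random error shrinks as samples grow, while the systematic component (the bias) persists no matter how much data accumulates.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.0  # the teacher's "true" value-added
BIAS = 0.3         # systematic error, e.g., unmeasured class differences
NOISE_SD = 1.0     # student-level noise that produces random error

def estimate(n_students: int) -> float:
    """Average of noisy, biased student-level signals for one class."""
    signals = [TRUE_EFFECT + BIAS + random.gauss(0, NOISE_SD)
               for _ in range(n_students)]
    return sum(signals) / n_students

for n in (25, 250, 2500):
    estimates = [estimate(n) for _ in range(1000)]
    mean = sum(estimates) / len(estimates)
    spread = (sum((e - mean) ** 2 for e in estimates) / len(estimates)) ** 0.5
    # The spread (random error) shrinks with n; the mean stays near
    # TRUE_EFFECT + BIAS, so the systematic error never averages away.
    print(f"n={n:4d}  mean estimate={mean:+.2f}  spread={spread:.3f}")
```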

Think about the political polls cited
almost every day on television and in

newspapers. A poll might show a politician’s approval rating at 60 percent, but
there is usually a margin of error accompanying that estimate. In this case, let’s
say it is plus or minus four percentage
points. Given this margin of error, we
can be confident that the “true” rating is
somewhere between 56 and 64 percent
(though more likely closer to 60 than to 56 or 64).

In polls, this confidence interval is usually relatively narrow because polling companies use very large samples, which reduces the chance that anomalies will influence the results. Classes, on the other hand, tend to be small—a few dozen students at most. Thus, value-added estimates—especially those based on one year of data and small classes—tend to have very large margins of error.

If you were told that a politician's approval rating was 60 percent, plus or
minus 30 percentage points, you would
laugh off the statistic. You would know
that it is foolish to draw any strong conclusions from a rating so imprecise. Yet this is exactly what states and districts are doing with value-added estimates. It is at least defensible to argue that these estimates, used in this manner, have no business driving high-stakes decisions.

There are relatively simple ways