The state gathers FCAT reading and math scores for each student in tested grades and, using a complex formula, predicts how each student should perform on the year’s tests. Then if students perform below or above that predicted level, the difference is attributed to their teacher’s effect on learning.

If a teacher doesn’t teach reading or math, their VAM scores come from schoolwide averages or from the reading and math scores of the students they taught, regardless of the subject the teacher actually teaches.

For instance, a math teacher’s aggregate VAM score of 0.05 means that his or her students made 5 percent more growth than expected on math FCATs, thanks to the teacher. But a music teacher’s VAM scores may be a combination of their students’ math and reading scores.

Standard errors work like this.

They should be used to create “confidence intervals” — another term for margins of error — around a score.

For instance, if a teacher’s value-added score of 5 has a standard error of plus or minus 2, that means 5 is an estimate and the teacher’s “true” score could be some other number.

You use the standard error to build a range of possible scores, a “confidence interval,” around 5.

If you want to be 95 percent certain, your confidence interval must extend about twice the standard error in each direction — in this case, about 4 points above and below 5.

We can be 95 percent certain that the teacher’s true VAM score lies somewhere between 1 and 9.

It’s common to seek a 95 percent confidence range, but lower levels are also used. Multiplying the standard error by 1, for instance, produces about a 68 percent confidence range.
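The arithmetic above can be sketched in a few lines of Python. This is a generic illustration of confidence intervals, not the state’s actual formula; the 1.96 multiplier is the standard normal value behind the “about twice the standard error” rule for 95 percent confidence.

```python
# Build a confidence interval around a score from its standard error.
# Illustrative only; this is not Florida's VAM computation.

def confidence_interval(score, standard_error, multiplier=1.96):
    """Return (low, high) bounds; 1.96 gives ~95% confidence, 1.0 gives ~68%."""
    margin = multiplier * standard_error
    return (score - margin, score + margin)

low95, high95 = confidence_interval(5, 2)       # ~95%: about 1 to 9
low68, high68 = confidence_interval(5, 2, 1.0)  # ~68%: 3 to 7
```

With a score of 5 and a standard error of 2, the 95 percent interval runs from roughly 1 to 9, matching the example above; tightening to 68 percent confidence narrows it to 3 through 7.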

VAM confidence

How confident should you be in VAM scores? Pay attention to their standard errors. Here are the percentages of VAM scores with standard errors larger than scores:

School districts likely will mislabel teachers as “effective” or “unsatisfactory,” teacher advocates say, if school officials rely on the state’s newly released teacher value-added scores, which measure a teacher’s impact on student learning.

The scores, which extend to many digits beyond a decimal point, aren’t as precise as they look. They come with an important caveat called the standard error.

It is a statistical way to measure how stable or variable a set of numbers is, to help people decide how certain they can be of the numbers.

Up to 73 percent of the state’s teacher value-added scores, or VAM scores, have high standard errors, and up to half may be large enough to make the scores too close to call, experts said.

That means a teacher who is viewed as above average or below average may be average after all.

“These standard errors are so large that … it would be unfair to sort teachers into categories based on these numbers,” said Donna Mohr, a statistics professor at the University of North Florida who specializes in growth curves and statistical applications.

By state law, VAM scores count for up to half of a teacher’s annual review.

“Highly effective” and “effective” teachers can get raises, but “unsatisfactory” and “needs improvement” teachers can lose their jobs.

“You have the potential of taking a really good teacher and, because they don’t fit the formula exactly right, you’re telling them to hit the road,” said Mark Pudlow, a Florida Education Association spokesman.

But many school districts are ignoring the standard errors, including several First Coast districts. State officials insist that standard errors don’t invalidate VAM scores and can be used to make more accurate decisions about teachers.

‘false reliability’

“VAM used in conjunction with the standard error does differentiate among levels of teacher performance with respect to student growth,” said Joe Follick, a Florida Department of Education spokesman.

Mohr fears that by forgetting standard errors school districts are “assigning a false reliability” to VAM scores.

“I wouldn’t say that makes them invalid,” she said. “It’s just that the standard errors are a huge grain of salt you should be attaching to the [teacher] scores.”

With every statistic, there is uncertainty. As soon as you begin measuring anything, experts say, you change it, introducing variability or errors.

That is why VAM scores are called estimates.

“Standard errors express the statistical uncertainty in the VAM scores,” said Harold Doran, director of psychometrics in the assessment division of American Institutes for Research (AIR), the Washington, D.C., firm that Florida is paying nearly $4 million for VAM scores and analyses.

“No measure is perfect,” he said. “Using the standard error is one guard against over-reliance on statistical data. We encourage our clients to make good use of all data, not just the VAM score.”

How high is too high?

Each of Florida’s VAM scores has a standard error, which can be high or low based on how much data — student test scores, for instance — is available.

A teacher with 30 students would have a higher standard error than a teacher with 60 students. If a VAM score is based on three years of data it likely will have a lower standard error than one with two years of data.

The more data, the less chance for error or variability.
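The relationship between the amount of data and the size of the error can be sketched for the simplest case, the average of a set of scores: the standard error shrinks with the square root of the sample size. This is a general statistical fact, not Florida’s VAM formula, and the numbers below are made up for illustration.

```python
# For a simple mean, the standard error shrinks as sample size grows:
# se = standard deviation / sqrt(n). Hypothetical numbers, for illustration.
import math

def standard_error_of_mean(std_dev, n_students):
    return std_dev / math.sqrt(n_students)

standard_error_of_mean(10, 30)  # ~1.83 with 30 students
standard_error_of_mean(10, 60)  # ~1.29 with 60 students: more data, less error
```

Doubling the number of students does not halve the error, but it does reliably reduce it, which is why a score built on three years of data tends to carry a smaller standard error than one built on two.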

But how high is too high for a standard error?

Standard errors are used to set “confidence intervals,” which are like margins of error around a VAM score.

something to watch

If a teacher has a VAM score of 5 and the standard error is +/-2, then to be 95 percent certain her district should assume the teacher’s true score is somewhere between 1 and 9. That is the confidence interval.

The higher the standard error, the wider the confidence interval.

Kata Mihaly, an economist at the Rand Corporation, says that if a VAM score’s confidence interval includes zero, its standard error may be too high for the score to be distinguishable from an average teacher’s score with similar students.
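Mihaly’s test can be written down directly: check whether zero falls inside the score’s confidence interval. This is a sketch of that check, assuming roughly normal errors (hence the 1.96 multiplier for 95 percent confidence); it is not the state’s or Rand’s actual code.

```python
# Flag VAM scores whose ~95% confidence interval includes zero,
# i.e. scores statistically indistinguishable from an average teacher's.
# Illustrative sketch; assumes approximately normal errors.

def indistinguishable_from_average(score, standard_error, multiplier=1.96):
    margin = multiplier * standard_error
    return score - margin <= 0 <= score + margin

indistinguishable_from_average(5, 2)       # False: interval ~1 to ~9 excludes zero
indistinguishable_from_average(0.05, 0.1)  # True: interval spans zero
```

A score of 5 with a standard error of 2 clears the test, but a score of 0.05 with a standard error of 0.1 does not: its interval spans zero, so the teacher may simply be average.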

In Duval, Clay and St. Johns counties, about 74 percent of the 2012-13 aggregated VAM scores have such confidence intervals.

Mohr says those districts could find it hard to rank teachers accurately.

“If you have 20 teachers in a school and you rank them … they all might be excellent teachers,” Wood said. “Even the teacher with the lowest scores might still be an excellent teacher.”

That’s why, in addition to VAM, Nassau uses other measures, such as observation data and the number of students making certain test scores, in each teacher’s evaluations.

Duval Schools Superintendent Nikolai Vitti said his district takes VAM scores and then adds other student growth data and scores from other tests to be certain about teacher evaluations. The district also is creating its own tests beyond reading and math.

“I only have confidence that reading and math VAM provides some insight into teacher impact,” he said.

Whether a teacher’s VAM score is high or low, a standard error larger than the score could mean there’s not enough evidence to draw conclusions from it.

source of concern

Across the state, 47 percent of the standard errors are higher than their related, aggregated VAM scores. Similar percentages prevail in Clay, Duval and Putnam schools’ data.

In Putnam County people are beginning to question flaws in the state’s evaluation systems, said Deborah N. Decubellis, director of human resources.

“The standard error concerns us when it is greater than the teacher’s overall score,” she said.

“We also have a real concern when a positive VAM score is achieved and the school is rated as an F school.”

Other districts are like St. Johns, which factors both the standard error and the VAM score into evaluations.

“It’s run through several formulas,” spokeswoman Christina Langston said.

VAM scores need more data to back them up, Mohr said, perhaps five to 10 years’ worth instead of the current three.

“It’s going to take a number of years before you get the kind of stability in the numbers that would allow you to reasonably sort people into certain categories,” she said.
