Standardized Testing: A Decade in Review

There are those who think American public education is in shambles and needs to be completely remade, arguing that the standardized tests mandated by No Child Left Behind will be a vital tool in turning around U.S. schools: They say student promotion and graduation, teacher pay and employment, and school funding should all be tied to NCLB tests. As someone who spent many years working in the assessment industry -- not to mention someone who has been reading the newspaper for the last ten years -- I can't say I understand that idea. What exactly has occurred in standardized testing over the last decade that justifies such belief in large-scale assessments, or such blind faith in the completely unregulated, massively profitable industry that writes and scores NCLB tests?

We know that testing data can be manipulated to tell any story. We know that a school administration -- by making test questions easier or lowering cut scores -- can portray improvement in its classrooms even when such improvement doesn't really exist, as happened most recently in 2009 in the New York City schools. We know that "rogue" teachers or administrators -- by erasing incorrect student answers and changing them to correct ones -- can show student achievement even if there is no such achievement, as scandals in Atlanta and Detroit during 2010 both revealed and the current erasure debacle in Washington, D.C. also seems to show. And we know, from my book, that the testing companies fudge numbers all the time, whether reliability numbers (to show the industry is doing a more "standardized" job than it really is); validity numbers (to show the industry is doing a more accurate job than it really is); or score distribution numbers (when test scoring companies work to ensure student results match the predictions of their own psychometricians).

Psychometricians, of course, are the rock stars of the testing world, omniscient statisticians doing a job virtually no one comprehends. While I don't claim to understand their mysterious math, I do find it odd that during my long career writing and scoring tests I only once laid eyes on a psychometrician, and that was during a pick-up soccer game on the grassy grounds of ETS. Never when I wrote tests, or scored tests, or met with teachers to discuss those tests, did I see a psychometrician, meaning the most important people in the testing industry are people who don't often know what the tests look like and don't usually see the students' answers to them.

On top of what seems a general dubiousness about the numbers produced by standardized tests, we also know the testing industry regularly screws up. In the last decade or so, scoring errors have occurred on tests returned to students in Arizona (1999-2000), Washington (2000), Virginia (2005), Florida (2006), South Carolina (2008), and Minnesota (2010), not to mention Indiana, Illinois, Connecticut, etc. In 2000, a scoring error by NCS-Pearson (now Pearson Educational Measurement) led to 8,000 Minnesota students being told they failed a state math test when they did not, in fact, fail it (some of those students weren't able to graduate from high school on time because of it). In 2004, ETS erroneously informed over 4,000 teachers they had failed a PRAXIS exam that they had actually passed, leading to lost jobs and lawsuits aplenty. In 2006, Pearson again erred, giving lower scores than were deserved to more than 4,000 students taking the SAT, with the company making the excuse (apparently with a straight face) that its blunder resulted from "abnormally high moisture content" in that year's score sheets. Keep in mind, also, that most of those errors were discovered only after a test-taker complained about a score, not when any company voluntarily disclosed the problem, raising questions about the legitimacy of every other test administered over the last 10 years.

Even without errors as obvious as the ones above, in my career, there seemed to be a major disconnect between the profit motive of the testing industry's major players (Pearson Education, McGraw-Hill, Riverside Publishing, ETS K-12, DRC) and any altruistic goals for American education. For the many years I scored students' tests, I saw an industry primarily focused on meeting deadlines and completing contracts, with the importance of the correct scores being put on tests seeming to come in second to the rush to get any score put on them. My work in test development was no different, with the companies that employed me apparently willing to take huge shortcuts in developing tests because meeting a contract's deadline was clearly more important than the quality of any assessment.

Last year, I was amazed to see the management of a publishing company giving its test developers only four weeks to produce K-12 assessments for the Detroit Public Schools (a school system now bankrupt, but then willing to pay millions to a testing company); later, however, that short time-frame looked like a leisurely vacation compared to the breakneck pace at which the company next worked its employees, when the test development staff was required to pound out more than 200 Common Core Standard tests over the next two months. Two hundred tests is probably more than a not-for-profit like ETS has developed in its entire history, but in a rush to address the new CCS market and get their hands on "Race to the Top" money that company worked its employees nearly to exhaustion and seemed willing to go to any length to write those tests: They recycled items used many times on previous tests, re-aligned items to link them to academic standards they were only sort of linked to, hired people with neither teaching nor testing experience to work as full-time test developers, employed any consultant off the streets willing to work, and re-hired testing vendors previously fired for the poor quality of their work (one of those vendors celebrated its renewed contract by immediately advertising on Craigslist, hoping to find anyone at all willing to write test questions for $8 each).

It's not like questions about the efficacy of the testing industry haven't long been raised, since even before the dramatic increase in testing that has recently resulted from NCLB. In 2001, a New York Times story about testing errors quoted various employees at a test-scoring factory in Iowa City who doubted the quality of the work being done: "There was a lack of personnel, a lack of time, too many projects, too few people," one said, while another noted the surfeit of work she faced meant she was concerned about "[her] ability, and the ability of the scorers, to continue making sound decisions and keeping the best interest of the students in mind."

In 2002, Amy Weivoda raised similar concerns at Salon.com, noting her experience scoring tests "led [her] to believe with absolute certainty that standardized tests are an utter waste of money and valuable teaching time, and that they measure nothing more than a state government's willingness to waste money." She pointed out that some of her test-scoring colleagues "were so spectacularly sociopathic they could find no other work. Some of the scorers had earned their degrees in prison." In 2009, a test scorer in Jacksonville, Florida wrote a two-act play about his career, a drama he said highlights the "silliness" of testing.

In 2010, Dan DiMaggio cited many of the same issues in The Monthly Review, writing of test scoring being standardized only in its "mystifying training process, supervisors who are often more confused than the scorers themselves, and a pervasive inability of these tests to foster creativity and competent writing." That dismay with the test scoring process was found again in a 2011 article in the Minneapolis City Pages, a story that concludes with one of the scorers commenting that the limitations of standardized testing were obvious to all: "Nobody is saying, 'I'm doing good work, I'm helping society,'" she says. "Everyone is saying, 'This isn't right.'"

The City Pages story ends with a quote from a Pearson Education spokesman, in which the company man notes the complaining scorers were "people who have a very limited exposure and narrow point of view on what is truly a science." Lest anyone buy too heavily into the "science" of standardized testing, it's telling to note that 2009 audits performed by the United States Department of Education of tests in Tennessee and Florida found identical problems to those the scorers detailed (not to mention other problems as well). Even if one concedes that all the complaints above are about test-scoring and not test development, it's important to remember that the open-ended questions on tests that are scored by humans seem to be the sort of "next generation" assessments the Obama administration is moving towards.

With even the President recently deriding the emphasis on "filling in bubbles" that results from multiple-choice tests, the current education reform agenda instead seems to be aimed at tests that address critical thinking skills, including "students' ability to read complex text, complete research projects, excel at classroom speaking and listening assignments, and work with digital media." Impressive jargon indeed, but every one of those tasks needs to be assessed by a living, breathing human being, and there's more than enough evidence that the living, breathing human beings currently doing that job either don't do it very well or don't think it can be done very well.

If we recognize from the evidence above that the testing industry seems ill-prepared to score all the student responses filled with incredibly complex thinking that the "next generation" of tests will surely generate, I can imagine only two other ways those students' answers will be assessed. Either classroom teachers will be hired to score the answers to those tests, or the student responses will be "read" and scored by the new automated scoring technologies powered by artificial intelligence. The problem with teachers scoring the tests, of course, is that the education reformers believe this country's teachers can't be trusted to make decisions about kids, so that option seems unlikely. More likely is that those new tests will be scored by those vaunted automated scoring technologies, machines that can assess student answers to open-ended questions without being able to actually read them. Of note, no one is claiming those computers can read, only that it has been proven statistically that those automated systems can score student tests as accurately as do the temporary employees currently doing the job. That argument, of course, is presented as a defense of the new technologies, although I'm not sure how much solace we should find in knowing that a computer that can't read a student's answer understands it just as well as does some bored slacker being paid slave wages to give only fleeting glances to the work.

For the last 10 years in this country, we've regularly seen standardized test results that can't be believed and standardized testing companies that can't be trusted. Still, the United States seems to be heading towards taking the decisions about American education out of the hands of American educators and instead placing that sacred trust in the welcoming arms of an industry run entirely without oversight and populated completely with for-profit companies chasing billions of dollars in business. In fact, even though the testing industry has proven consistently incompetent over the last ten years, we seem to be moving towards expanding our emphasis on it. To me that comes across as naïve, although a less forgiving person might think that entrusting public education to a bunch of bumbling, for-profit companies falls more towards the unethical/immoral/mercenary end of the spectrum.

In any case, when next some standardized test scores are found to be incorrect or fraudulent (because they will), or some standardized testing company commits or tries to cover up another egregious error (because they will), perhaps then we can admit large-scale assessment isn't the panacea it's often been touted to be. Perhaps then we can concede that an educational philosophy based on a system of national standardized tests isn't any Brave New World of American education; it's just a bad idea that even the Chinese are already turning away from as being too inefficient and antiquated.