One train leaves Station A at 6 p.m. traveling at 40 miles per hour toward Station B. A second train leaves Station B at 7 p.m. traveling on parallel tracks at 50 m.p.h. toward Station A. The stations are 400 miles apart. When do the trains pass each other?

Entranced, perhaps, by those infamous hypothetical trains, many educators in recent years have incorporated more and more examples from the real world to teach abstract concepts. The idea is that making math more relevant makes it easier to learn.

That idea may be wrong, if researchers at Ohio State University are correct. An experiment by the researchers suggests that it might be better to let the apples, oranges and locomotives stay in the real world and, in the classroom, to focus on abstract equations, in this case 40 (t + 1) = 400 – 50t, where t is the travel time in hours of the second train. (The answer is below.)

“The motivation behind this research was to examine a very widespread belief about the teaching of mathematics, namely that teaching students multiple concrete examples will benefit learning,” said Jennifer A. Kaminski, a research scientist at the Center for Cognitive Science at Ohio State. “It was really just that, a belief.”

Dr. Kaminski and her colleagues Vladimir M. Sloutsky and Andrew F. Heckler did something relatively rare in education research: they performed a randomized, controlled experiment. Their results appear in Friday’s issue of the journal Science.

Though the experiment tested college students, the researchers suggested that their findings might also be true for math education in elementary through high school, the subject of decades of debates about the best teaching methods.

In the experiment, the college students learned a simple but unfamiliar mathematical system, essentially a set of rules. Some learned the system through purely abstract symbols, and others learned it through concrete examples like combining liquids in measuring cups and tennis balls in a container.

Then the students were tested on a different situation (what they were told was a children’s game) that used the same math. “We told students you can use the knowledge you just acquired to figure out these rules of the game,” Dr. Kaminski said.

The students who learned the math abstractly did well with figuring out the rules of the game. Those who had learned through examples using measuring cups or tennis balls performed little better than might be expected if they were simply guessing. Students who were presented the abstract symbols after the concrete examples did better than those who learned only through cups or balls, but not as well as those who learned only the abstract symbols.

The problem with the real-world examples, Dr. Kaminski said, was that they obscured the underlying math, and students were not able to transfer their knowledge to new problems.

“They tend to remember the superficial, the two trains passing in the night,” Dr. Kaminski said. “It’s really a problem of our attention getting pulled to superficial information.”

The researchers said they had experimental evidence showing a similar effect with 11-year-old children. The findings run counter to what Dr. Kaminski said was a “pervasive assumption” among math educators that concrete examples help more children better understand math.

But if the Ohio State findings also apply to more basic math lessons, then teaching fractions with slices of pizza or statistics by pulling marbles out of a bag might prove counterproductive. “There are reasons to think it could affect everyone, including young learners,” Dr. Kaminski said.

Dr. Kaminski said even the effectiveness of using blocks and other “manipulatives,” which have become more pervasive in preschool and kindergarten, remained untested. It has not been shown that lessons in which children learn to count by using blocks translate to a better understanding of numbers than a more abstract approach would have achieved.

The Ohio State researchers have begun new experiments with elementary school students.

Other mathematicians called the findings interesting but warned against overgeneralizing. “One size can’t fit all,” said Douglas H. Clements, a professor of learning and instruction at the University of Buffalo. “That’s not denying what these guys have found, whatsoever.”

Some children need manipulatives to learn math basics, Dr. Clements said, but only as a starting point.

“It’s a fascinating article,” said David Bressoud, a professor of mathematics at Macalester College in St. Paul and president-elect of the Mathematical Association of America. “In some respects, it’s not too surprising.”

As for the answer to the math problem at the top of this article, the two trains pass each other at 11 p.m. at the midway point between Stations A and B. Or, using the abstract approach, t = 4.
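For readers who want to check the article's arithmetic, here is a minimal Python sketch that solves the equation quoted above, 40 (t + 1) = 400 – 50t. The function name and its parameters are hypothetical and chosen only for illustration; the numbers are the ones given in the train problem.

```python
from fractions import Fraction

def meeting_time(distance, speed_a, speed_b, head_start):
    """Hours after the second train departs at which the trains pass.

    Rearranges speed_a * (t + head_start) + speed_b * t = distance for t.
    """
    return Fraction(distance - speed_a * head_start, speed_a + speed_b)

# Values from the problem: 400 miles apart, 40 and 50 m.p.h.,
# and the first train leaves one hour before the second.
t = meeting_time(distance=400, speed_a=40, speed_b=50, head_start=1)
miles_from_a = 40 * (t + 1)   # distance covered by the first train
clock_hour_pm = 7 + t         # the second train left at 7 p.m.

print(t, miles_from_a, clock_hour_pm)
```

With these inputs t comes out to 4, the first train has covered 200 of the 400 miles, and the clock reads 11 p.m., which matches the answer printed in the article.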

Up in smoke goes the educrats’ mantra about hands on and story solving for math.

I love how the educrats go on and on about how we don’t know what the future holds. Therefore we must use reform math to prepare our children for the unknown applications they will encounter later in life.

What a bunch of bunk. It lacked common sense then, now it has been disproved.

It seems the critics/parents were right all along; teach the concepts to children so they can be applied in any future situation.

Now, Regina, will you pack your bags and leave us alone. Oh, and take the dopey principals who have bought into your pseudo math with you.

The results of the election also show that other schools do not want to change the methodology they are using right now (mostly TRADITIONAL), so they voted against unification. Maybe we will be lucky enough for the BOE to look at the results by ZONE and realize TERC is not what Travell school wants and deserves. Our children, ALL children, deserve the same education, and they are not receiving it. We pay the same tax % in the east as in the west. Why not give the same quality education to the east and to the west?

Sorry 241, but the results were loud and clear: TERC WON! And as the BOE hacks on this blog stated over and over again, “the kids at Travell are stupid”... time to let go; maybe a tutor, home school or a private school...

It seems Mr. Obama’s friend Bill Ayers of Weather Underground fame is now seeking revolutionary change by another means.

Query whether people like Bill Ayers will expect the White House doors to be thrown open to them in the event Mr. Obama is elected.

Putting national implications to one side, though, the following information and related open-ended question also seem relevant to the Ridgewood district’s current struggles with a certain willful, inscrutable administrator currently populating Cottage Place. Enjoy!

In fact, he is a tenured Distinguished Professor of Education at the University of Illinois, Chicago.

I haven’t heard if Obama has corrected himself on this. However, this is what I’m interested in:

The more pressing issue is not the damage done by the Weather Underground 40 years ago, but the far greater harm inflicted on the nation’s schoolchildren by the political and educational movement in which Ayers plays a leading role today.

[…]

Instead of planting bombs in public buildings, Ayers now works to indoctrinate America’s future teachers in the revolutionary cause, urging them to pass on the lessons to their public school students.

[…]

Ayers’s influence on what is taught in the nation’s public schools is likely to grow in the future. Last month, he was elected vice president for curriculum of the 25,000-member American Educational Research Association (AERA), the nation’s largest organization of education-school professors and researchers. Ayers won the election handily, and there is no doubt that his fellow education professors knew whom they were voting for. In the short biographical statement distributed to prospective voters beforehand, Ayers listed among his scholarly books Fugitive Days, an unapologetic memoir about his ten years in the Weather Underground. The book includes dramatic accounts of how he bombed the Pentagon and other public buildings.

(Sol Stern in the City Journal)

Maybe the media should be questioning Obama and McCain about their views on Ayers in this influential position. Some readers might believe doing so would be a demonstration of “gotcha” politics, but I really would like to hear their answers.

Interesting what one can find when one finally decides to overturn a couple of innocuous-looking stones, isn’t it?

(Begin quote)

Sol Stern

Obama’s Real Bill Ayers Problem

The ex-Weatherman is now a radical educator with influence.

23 April 2008

Barack Obama complains that he’s been unfairly attacked for a casual political and social relationship with his neighbor, former Weatherman Bill Ayers. Obama has a point. In the ultraliberal Hyde Park community where the presidential candidate first earned his political spurs, Ayers is widely regarded as a member in good standing of the city’s civic establishment, not an unrepentant domestic terrorist. But Obama and his critics are arguing about the wrong moral question. The more pressing issue is not the damage done by the Weather Underground 40 years ago, but the far greater harm inflicted on the nation’s schoolchildren by the political and educational movement in which Ayers plays a leading role today.

A Chicago native son, Ayers first went into combat with his Weatherman comrades during the “Days of Rage” in 1969, smashing storefront windows along the city’s Magnificent Mile and assaulting police officers and city officials. Chicago’s mayor at the time was the Democratic boss of bosses, Richard J. Daley. The city’s current mayor, Richard M. Daley, has employed Ayers as a teacher trainer for the public schools and consulted him on the city’s education-reform plans. Obama’s supporters can reasonably ask: If Daley fils can forgive Ayers for his past violence, why should Obama’s less consequential contacts with Ayers be a political disqualification? It’s hard to disagree. Chicago’s liberals have chosen to define deviancy down in Ayers’s case, and Obama can’t be blamed for that.

What he can be blamed for is not acknowledging that his neighbor has a political agenda that, if successful, would make it impossible to lift academic achievement for disadvantaged children. As I have shown elsewhere in City Journal, Ayers’s politics have hardly changed since his Weatherman days. He still boasts about working full-time to bring down American capitalism and imperialism. This time, however, he does it from his tenured perch as Distinguished Professor of Education at the University of Illinois, Chicago. Instead of planting bombs in public buildings, Ayers now works to indoctrinate America’s future teachers in the revolutionary cause, urging them to pass on the lessons to their public school students.

Indeed, the education department at the University of Illinois is a hotbed for the radical education professoriate. As Ayers puts it in one of his course descriptions, prospective K–12 teachers need to “be aware of the social and moral universe we inhabit and . . . be a teacher capable of hope and struggle, outrage and action, a teacher teaching for social justice and liberation.” Ayers’s texts on the imperative of social-justice teaching are among the most popular works in the syllabi of the nation’s ed schools and teacher-training institutes. One of Ayers’s major themes is that the American public school system is nothing but a reflection of capitalist hegemony. Thus, the mission of all progressive teachers is to take back the classrooms and turn them into laboratories of revolutionary change.

Unfortunately, neither Obama nor his critics in the media seem to have a clue about Ayers’s current work and his widespread influence in the education schools. In his last debate with Hillary Clinton, Obama referred to Ayers as a “professor of English,” an error that the media then repeated. Would that Ayers were just another radical English professor. In that case, his poisonous anti-American teaching would be limited to a few hundred college students in the liberal arts. But through his indoctrination of future K–12 teachers, Ayers has been able to influence what happens in hundreds, perhaps thousands, of classrooms.

Ayers’s influence on what is taught in the nation’s public schools is likely to grow in the future. Last month, he was elected vice president for curriculum of the 25,000-member American Educational Research Association (AERA), the nation’s largest organization of education-school professors and researchers. Ayers won the election handily, and there is no doubt that his fellow education professors knew whom they were voting for. In the short biographical statement distributed to prospective voters beforehand, Ayers listed among his scholarly books Fugitive Days, an unapologetic memoir about his ten years in the Weather Underground. The book includes dramatic accounts of how he bombed the Pentagon and other public buildings.

AERA already does a great deal to advance the social-justice teaching agenda in the nation’s schools and has established a Social Justice Division with its own executive director. With Bill Ayers now part of the organization’s national leadership, you can be sure that it will encourage even more funding and support for research on how teachers can promote left-wing ideology in the nation’s classrooms—and correspondingly less support for research on such mundane subjects as the best methods for teaching underprivileged children to read.

The next time Obama—the candidate who purports to be our next “education president”—discusses education on the campaign trail, it would be nice to hear what he thinks of his Hyde Park neighbor’s vision for turning the nation’s schools into left-wing indoctrination centers. Indeed, it’s an appropriate question for all the presidential candidates.

Sol Stern is a contributing editor of City Journal and the author of Breaking Free: Public School Lessons and the Imperative of School Choice.

The New Jersey Department of Education provides a web page containing a list of links relating to “Language Arts Literacy”.

While the list is admittedly alphabetical, it is nonetheless at least a little funny (ha-ha, “isn’t that sweet justice” funny) to this observer that Chicago-based Mr. Bill Ayers’ AERA organization and Ridgewood-based Ms. Botsford’s ASCD organization are listed together at the top of the official links page (found at http://www.state.nj.us/education/aps/cccs/lal/assoc.htm).

Note the innocuous descriptions of the two organizations, each of which has its own rather aggressive public agenda not necessarily in line with the best interests of New Jersey’s school-age children, IMHO.

(Begin Quote)

American Educational Research Association (AERA)

AERA is concerned with improving the educational process by encouraging scholarly inquiry related to education and by promoting the dissemination and practical application of research results.

Association for Supervision and Curriculum Development (ASCD)

ASCD is an international, nonprofit, nonpartisan education association committed to the mission of forging covenants in teaching and learning for the success of all learners. ASCD provides professional development in curriculum and supervision; encourages research, evaluation, and theory development; and disseminates information on education issues.

A quick internet search reveals that the heavy buy-in of Ms. Botsford’s ASCD organization with respect to “Authentic Assessment” may have its intellectual roots in work published in 1999 by the American Educational Research Association (AERA), the same organization that recently elected Bill Ayers a vice president, in concert with the American Psychological Association (APA) and the National Council for Measurement in Education (NCME).

Standardized testing plays an increasingly important role in the lives of today’s students and educators. The U.S. No Child Left Behind Act (NCLB) requires assessment in math and literacy in grades 3–8 and 10 and, as of 2007–08, in science once in grades 3–5, 6–9, and 10–12. Based on National Center for Education Statistics enrollment projections, that will be roughly 68 million tests per year, simply to meet the requirements of NCLB. Such an intense focus on assessment, with real consequences attached for students and educators, makes it imperative that policymakers understand the complexities involved with assessment and in using assessments as part of high-stakes accountability policies.

As policymakers continue to establish and revise state and national assessment and accountability systems, two overarching questions must be addressed:

Do current tests supply valid and reliable information? What happens to such assessments when high stakes are attached to the outcomes?

The American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council for Measurement in Education (NCME) have jointly released The Standards for Educational and Psychological Testing (1999), a detailed set of guidelines on assessment use. Within these guidelines, the associations note that although tests, “when used appropriately, can be valid measures of student achievement,” decisions “about a student’s continued education, such as retention, tracking, or graduation, should not be based on the results of a single test but should include other relevant and valid information” (APA, 2001, paras. 9, 14). In a position supported by its Leadership Council, ASCD takes a similar stance (see box).

ASCD Adopted Position on High-Stakes Testing, 2004

Decision makers in education—students, parents, educators, community members, and policymakers—all need timely access to information from many sources. Judgments about student learning and education program success need to be informed by multiple measures. Using a single achievement test to sanction students, educators, schools, districts, states/provinces, or countries is an inappropriate use of assessment. ASCD supports the use of multiple measures in assessment systems that are

Fair, balanced, and grounded in the art and science of learning and teaching;

Reflective of curricular and developmental goals and representative of content that students have had an opportunity to learn;

Used to inform and improve instruction;

Designed to accommodate nonnative speakers and special-needs students; and

Valid, reliable, and supported by professional, scientific, and ethical standards designed to fairly assess the unique and diverse abilities and knowledge base of all students.

Complexities in Assessment

On both the individual and system levels, assessment poses issues worthy of consideration.

Individual Assessment. Multiple forms of assessment are important because of the potential effect of human error within even well-designed systems. Researchers at the National Board on Educational Testing and Public Policy found that human error in testing programs occurs during all phases of testing (from design and administration to scoring and reporting), and that such errors can have a significant negative effect on students when high-stakes decisions are made.

In 1999, researchers found that individuals involved in the assessment process made numerous errors across the different phases of the assessment process, resulting in significant negative consequences. For example, 50 students were wrongly denied graduation; 8,668 students were needlessly required to attend summer school; and 257,000 students were misclassified as limited-English-proficient (Rhodes & Madaus, 2003). In January of 2003, more than 4,000 teacher candidates were incorrectly failed on their certification tests due to an ETS scoring error (Clark, 2004).

Systemic Assessment. Using test results to evaluate educational systems is also problematic. As highlighted in a recent presentation at ETS (Raudenbush, 2004), the general concept of using tests for this purpose assumes there is a causal relationship between the system (treatment) and the test score (outcomes); however, assessment systems as currently designed are not structured to determine causation (there are no comparison or control groups). The assessment systems assume that school effects cause any differentiation in scores, but those differences could be the result of other, uncontrolled-for variables, such as the effect of previous schools or the effect of wealth or community characteristics (Popham, 2003; Turkheimer, Haley, Waldron, D’Onofrio, & Gottesman, 2003). According to Raudenbush, using school-mean proficiency results (NCLB’s basic accountability mechanisms) to evaluate schools is “scientifically indefensible,” and although value-added assessment (which measures year-to-year gain) addresses some issues, it, too, presents a flawed analysis of schoolwide performance, particularly when there are transitions between schools or significant differences in earlier educational experiences.

High-Stakes Accountability

The addition of high-stakes consequences to assessment systems in order to motivate change in educator behavior adds one more serious degree of complexity. High-stakes accountability mechanisms generally rely on operant theories of motivation that emphasize the use of external incentives (punishments or rewards) to force change (Ryan & Brown, in press). Other theories of motivation, however, suggest that such reliance on external incentives will result in negative and unintended consequences (Ryan & Brown, in press; Ryan & Deci, 2000). Operant approaches to motivation focus on behaviors (that is, the reward or punishment is designed to cause behavioral change), but the testing movement focuses on outcomes (the achievement of specific scores) regardless of behavior change. These conflicting goals result in a situation where the ends (higher test scores) become more important than the means (changes in educator behavior) used to achieve those ends. In other words, because the rewards and punishments stemming from the testing program are attached to conditions that educators may not have control over (including school and classroom resources, community poverty, social supports, and so on), educators are left to make changes in variables they do control (such as student enrollments, test administration, and classroom instruction).

As predicted by Ryan and Brown, the change in these variables is complex and includes consequences that policymakers could not have intended, such as narrowing the curriculum and associated training to tested subjects (Berry, Turchi, Johnson, Hare, Owens, & Clements, 2003; Moon, Callahan, & Tomlinson, 2003), increased push-out of underperforming students (Lewin & Medina, 2003), and increased manipulation of test administration (Rodriguez, 1999). A recent survey conducted by the National Board on Educational Testing and Public Policy found that 75 percent of teachers thought that state-mandated testing programs led teachers in their school to teach in ways that contradict their own ideas of good educational practice (Pedulla, 2003).

Assessment Types, Uses, and Scoring

Because much of the responsibility for the use of assessments resides with the users, it is important that policymakers understand in general what tests can and cannot do, as well as the appropriate ways in which tests might be used as part of an accountability system.

At best, tests are an incomplete measure of what a student knows and can do. A final score measures only the student’s performance relative to the sample of items included on that specific test. This is why educators argue for the use of multiple measures in evaluating students—so that a more complete picture of the student can be generated. Educators use assessments that cover a variety of purposes and measure differing levels of knowledge, skills, and abilities. For an assessment to work well, it must be consistent with the instructions of the test maker. Using a test for a purpose for which it was not intended can result in invalid or unreliable outcomes. The same is true regarding use of a test that has not been fully validated, or using tests where the scoring parameters have been set for political or public relations purposes rather than measurement purposes.

Thus, it is critical that the appropriate assessments and measures be used for the identified policy or educational goals. Three general areas to consider when examining assessments are test type (such as achievement tests or aptitude tests), test use (for diagnostics, placement, or formative or summative evaluation), and the scoring reference (raw scores, norm-referenced scores, or criterion-referenced scores).

Test Type. Achievement and aptitude tests, although similar, attempt to measure two different concepts. Achievement tests generally measure the specific content a student has (or has not) learned, whereas aptitude tests attempt to predict a student’s future behavior or achievement (Popham, 2003). Although student outcomes on these tests may be related, it would be inappropriate to use the tests interchangeably because they measure different constructs. The SAT is an example of an aptitude test that is frequently misused by policy activists to make content-focused judgments or comparisons of student achievement.

Test Use. Tests are used to help diagnose areas of student strength and weakness, as well as specific learning difficulties. Tests can also be used to guide school readiness and placement decisions, and to make formative or summative evaluations. Formative evaluations are structured assessments designed to gauge the progress of students as measured against specific learning objectives. Such assessments are used to help guide instruction so that teachers and students have a general idea of what learning outcomes have been achieved, and where further focus is needed. Summative assessments, on the other hand, are used to evaluate achievement at the end of specific educational programs (for example, mathematics achievement at the end of grade 10).

Scoring. The way in which tests are designed to have scores reported—as norm-referenced or criterion-referenced—also plays a key role in test usage. Norm-referenced tests are designed to result in a score spread, so that students can be compared to their peers and placed in a hierarchy by percentage. Scores reported from a norm-referenced test, therefore, are broken out in such a way as to ensure that half of the test takers score in the top 50 percent, and half score in the bottom 50 percent. Because the goal is to differentiate between test takers, when test items are created and validated, items that are too easy—or too hard—are discarded because they fail to differentiate between students. Even if a norm-referenced test is created from a set of state standards, it is exceptionally difficult to use such a test as a summative assessment because important content items may have been discarded in the test building process for being deemed too easy or too hard (Popham, 2003; Linn & Gronlund, 1995).
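To make the idea of placing students in a hierarchy by percentage concrete, here is a minimal Python sketch of a percentile rank. The function name, the mid-rank treatment of tied scores, and the sample norm-group scores are all illustrative assumptions; none of them comes from the brief or from any particular testing program.

```python
def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half.

    This mid-rank convention is one common choice; test publishers
    may define percentile ranks differently.
    """
    below = sum(1 for s in norm_scores if s < score)
    equal = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)

# Hypothetical norm group of eight raw scores.
norm_group = [52, 61, 61, 70, 74, 80, 88, 93]
rank_of_74 = percentile_rank(74, norm_group)
print(rank_of_74)
```

The same raw score would earn a different percentile rank against a different norm group, which is why a norm-referenced score says more about a student's standing among peers than about mastery of specific content.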

Criterion-referenced tests, however, do try to focus specifically on student outcomes relative to a fixed body of knowledge. Criterion-referenced tests can result in the majority of students scoring above, or below, a specified cut score. And, in fact, a criterion-referenced test should be positively (or negatively) skewed, depending on the success of the students and teachers in addressing the body of content from which the test has been constructed. State assessments designed to measure the achievement of students relative to the state’s content standards should be criterion-referenced.

Test scores are also occasionally reported in raw scores, which are simply the total of correct responses. Unfortunately, the raw score is frequently misinterpreted because it is reported without interpretation. A test that is particularly difficult (or easy) may have an unusually low (or high) average score. Without knowing the context of the test or the scoring, it is impossible to make a judgment as to what the raw scores say about the performance of test takers.

Scores should be interpreted in terms of the specific tests from which they were derived. In other words, student scores on a reading test should not be taken to represent students’ general ability to read; rather, the scores should be examined only in light of the skills the assessment was intended to measure. For instance, a reading test that measures a student’s ability to sound out words would not tell us how well a student comprehends the main idea in a paragraph of text.

Scores should be interpreted in light of all the student’s relevant characteristics. A student’s score on a specific test may be influenced by many variables, including language background, education, cultural background, and motivation. A low score does not necessarily indicate that the student does not know the material or that the system has failed to engage the student.

Scores should be interpreted according to the type of decisions to be made. Test scores should not be generalized to actions beyond the original purpose of the test.

Scores should be interpreted as a band of possible scores, rather than an absolute value. Because tests are only an approximate measure of what a student actually knows and can do, the student’s true abilities may differ from the measured score. Most tests include a measure of standard error, which can be used to help determine where a student’s true score may lie. For example, the true score for a student scoring a 68 on a test with a 4-point standard error is likely to fall within the range of 64 to 72.
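The band in the example above is simply the observed score plus or minus one standard error. A minimal Python sketch of that arithmetic follows; the function name and the `n_errors` parameter are hypothetical, added only to show how a wider band could be requested.

```python
def score_band(observed, standard_error, n_errors=1):
    """Return (low, high) bounds around an observed test score.

    n_errors is an illustrative knob: 1 reproduces the one-standard-error
    band in the text; a larger value gives a more cautious, wider band.
    """
    margin = n_errors * standard_error
    return (observed - margin, observed + margin)

# The example from the text: a score of 68 with a 4-point standard error.
low, high = score_band(68, 4)
print(low, high)
```

Read this way, two students scoring 66 and 70 on such a test have overlapping bands, so the four-point gap between them may not reflect a real difference in what they know.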

Scores should be verified by supplementary evidence. This is perhaps the single most important admonition for test users. No test can ensure the accurate measure of a student’s true performance; other evidence should be examined. Allowing students to retake the same test does not provide supplementary evidence of performance. Instead, alternative measures, such as classroom performance, should be used to help make accurate determinations of student abilities.

Constructing Assessment Systems

In constructing assessment systems, test makers can draw from a variety of item types and formats, depending on the type of assessment being created and its purpose. For example, although selected-response tests (such as multiple-choice tests) are easy to score and offer a reasonable measure for vocabulary, facts, or general principles and methods, they are less useful for measuring complex achievement, such as the application of principles or the ability to generate hypotheses or conduct experiments. Such complex abilities require more complex item constructs, such as those found on constructed-response tests, which may include essay questions or actual performance assessments.

On the other hand, performance and portfolio assessments (authentic assessments) allow students to more intentionally demonstrate their competence. Although such assessments may resemble traditional constructed-response tests, their goal is to mirror tasks that people might face in real life. For example, they might require students to demonstrate writing competence through a series of polished essays, papers, or poems (depending on the type of writing being assessed), or to design, set up, run, and evaluate a science experiment. Other types of performance assessment include speeches, formal presentations, or exhibits of student work.

Portfolio assessments, although similar to performance assessments, are designed to collect data over time and can also include measures from traditional assessments. The goal of portfolios is to allow teachers, students, and evaluators to gauge student growth by examining specific artifacts that students have created. Students in British Columbia, for example, are required to present a Graduation Portfolio Assessment, which accounts for 4 of the 80 course credits required to be awarded a diploma (BC Ministry of Education, 2004). The portfolio documents student work in grades 10–12 in six domains: Arts and Design, Community Involvement and Responsibility, Education and Career Planning, Employability Skills, Information Technology, and Personal Health. Although districts have approached the requirement in different ways, Surrey School District, which has the largest enrollment in British Columbia, is helping students create electronic portfolios that will provide Web-accessible evidence of their academic performance. In Providence, Rhode Island, the Met School has gone one step further and eliminated grades and traditional tests altogether, evaluating student work completely through publicly presented portfolios (Washor & Mojkowski, 2003).

Constructed-response tests—including performance and portfolio assessments—provide a richer evaluation of students, but they are much more time-consuming for teachers, students, and evaluators; they are also more expensive and difficult to administer and score in a large-scale standardized manner. Connecticut school officials are currently in a dispute with the U.S. Department of Education regarding assessment costs, because they don’t want to “dumb down” their constructed-response tests by dropping writing components that require hand scoring (Archer, 2005). Even so, the educational richness inherent in authentic assessments suggests that policymakers should take seriously the possibility of incorporating a deep evidence base in assessment and accountability models.

Assessment and Ethics

The ethical practices related to testing and assessment further complicate the picture. As highlighted by Megargee (2000), the ethical responsibilities for assessment are split between the test developer and the test user—the developer being responsible for ensuring the tests are scientifically reliable and valid measures, and the user for “the proper administration and scoring of the test, interpretation of the test scores, communication of the results, safeguarding the welfare of the test takers, and maintaining the confidentiality of their test records” (p. 52). This separation of ethical responsibility between test makers and consumers results in a loophole that allows commercial test makers to sell assessments to clients even when they know the tests will be misused. Additionally, although the education profession has taken responsibility for creating ethical standards, it currently has no mechanisms for enforcement.

Conclusions

Policymakers face a daunting challenge in designing school assessment and accountability systems; however, professionals in assessment have worked hard to provide the basic outline for policies that can support positive assessment systems. These systems cannot be implemented cheaply, and when cost-saving compromises are made, serious damage to both individuals and systems (school and assessment) can result. Therefore, policymakers should work to carefully understand (and adjust for) the trade-offs they make as they seek to create cost-effective accountability systems. It is not an overstatement to say that the lives of individual students will be positively—or negatively—affected by the decisions policymakers make.

In an effort to increase both the instructional use of assessments and public confidence in such systems, states should work to keep these systems transparent, allowing relevant stakeholders to review test content and student answer sheets. Teachers, parents, and students cannot use test data to improve instruction or focus learning if they are denied access to detailed score reports. In fact, states may be required to give such information to parents. Washington State officials recently decided to give parents access to student tests and booklets because they determined that under the Family Educational Rights and Privacy Act (FERPA), exams were defined as part of a student’s educational records and, therefore, must be made available to parents—and to students once they reach 18 years of age (Houtz, 2005).

Professional associations and psychometricians have focused on creating standards for test use (AERA, APA, & NCME, 1999), some of which have been delineated here. Due to the split between assessment creators and consumers regarding ethical responsibilities for test usage, as well as the lack of professional enforcement mechanisms, it is imperative that policymakers incorporate the recommendations of assessment professionals as they create systems that use evidence from standardized and large-scale assessment programs.

Recent Origins of Standardized Testing

Much of the theory and many constructs undergirding standardized assessments evolved from work done on standardized intelligence testing. British psychologist Sir Francis Galton, French psychologist Alfred Binet, and an American from Stanford University, Lewis Terman, are generally credited as the fathers of modern intelligence testing (Megargee, 2000). The work of Terman and Binet ultimately resulted in the Stanford-Binet Intelligence Scale, which is still in use today. The SAT—an aptitude test (a test that attempts to predict a student’s future achievement)—came into being in 1926 to help predict a student’s likely success in college, and the Graduate Record Examinations (GRE) were introduced a decade later. In 1939, David Wechsler introduced an intelligence scale that broke intelligence into discrete pieces, in this case verbal and nonverbal subtests. The first large-scale use of standardized intelligence testing occurred in the U.S. military during World War I, when more than 1,700,000 recruits were tested to determine their role (as officers or enlisted men) or denote them as unable to serve. Standardized achievement tests, which attempt to measure the specific knowledge and skills that a student currently possesses (and not general intellectual ability or potential for future achievement), came into widespread use in the 1970s through minimum competency testing (Popham, 2001).

The evolution of intelligence testing has been turbulent, with researchers still debating whether intelligence is a single construct referred to as “g” (Gottfredson, 1998) or consists of many different intelligences, as Gardner’s theory of multiple intelligences posits: linguistic, logical-mathematical, musical, spatial, bodily-kinesthetic, interpersonal, intrapersonal, and naturalist (Checkley, 1997). In addition to debates about how to define intelligence, scientists are trying to determine how much of it—if any—is hereditary and how much is learned—that is, influenced positively or negatively by the environment in which a person exists. One recent study, for example, found that the effects of poverty on intelligence could overwhelm any genetic differences, emphasizing the complex nature of intelligence (Turkheimer, Haley, Waldron, D’Onofrio, & Gottesman, 2003).

Historically, intelligence testing has also been used in ways that many people today find offensive. The eugenics movement of the early to mid-20th century used intelligence testing to identify individuals who were “feebleminded” (or had other deficiencies) so that they could be institutionalized or placed in basic-skills tracks (Stoskopf, 1999b). Eugenic policies were created to “strengthen” the genetic makeup of Americans, and scientists who supported these policies provided the impetus for U.S. immigration restrictions in the 1920s and sterilization laws that were in effect through the 1960s—resulting in the sterilization of, at a minimum, 60,000 individuals (Reilly, 1987). As recently as last year, a candidate for U.S. Congress from Tennessee, James Hart, garnered almost 60,000 votes running on a platform of eugenics (Associated Press, 2004; Hart, 2004; McDowell, 2004).

Early IQ testing, which was greatly affected by culturally biased items, also resulted in the tracking of African American children into low-level courses and vocational schools, on the basis of the assumption that they had generally low mental abilities (Stoskopf, 1999a). In 1923, Carl Brigham, who later helped create the SAT, published A Study of American Intelligence, which alleged on the basis of U.S. Army testing that intelligence was tied to race. Brigham recanted his findings in 1930; however, his work was used extensively to provide “scientific” evidence for racist policies in the 1920s (Stoskopf, 1999a).

[Extensive bibliography omitted]

Dan Laitsch is an assistant professor in the Faculty of Education at Simon Fraser University in British Columbia, Canada, and is coeditor of the International Journal for Education Policy and Leadership.