Abstract

The SAT was conceived with bias in its content, context, and values, to promote and maintain a status quo based on race, gender, and, to a lesser degree, class. The idea was to identify and reward potential leaders based not just on the accident of birth, but also on intellectual giftedness. However, race, class, and gender were never successfully separated from the criterion of giftedness, and therefore from the selection of those students destined for success. In many ways, the SAT has succeeded, and continues to succeed, in ways that fuel the debate about bias. This paper traces the beginnings of the SAT and follows its controversial existence through to the present.

Introduction

The testing double-edged sword

Tests are widely used in today's society in a variety of areas at both the elementary and secondary levels. The tests offered may be local to the school, regional to the state, or national in scope. Results of local tests are, in many instances, used to evaluate students in order to provide remedial education or to stream them correctly into gifted programs. National tests, however, are used to identify academically gifted students for academic rewards such as college admission and national merit awards. Many writers have pointed to the correlation between academic preparation opportunities and performance on these tests. In the current climate of accountability, test results are increasingly being used to determine the fortunes of schools and their districts in terms of merit pay, funding, scorecards, and even autonomy.

Despite the high stakes increasingly placed on these tests, they have always been mired in controversy over biased uses and content. This paper focuses on the Scholastic Aptitude Test (SAT), also called the Scholastic Ability Test, the Scholastic Achievement Test, and the Scholastic Assessment Test, in exploring the issue of bias. One has to wonder whether the inability to settle on a name points to a lack of certainty about what the test actually measures and an ongoing conflict with what it purports to measure, as well as about the consequences of these measures for test takers. The paper presents the view that this test was conceived to satisfy biased social values. Today, this bias remains and continues to be perpetuated through the context in which the test is given and the content it uses to make high-stakes decisions regarding the future of millions of high school seniors each year. The paper also looks at some of the social consequences associated with the continued use of this test.

Overview of Definitions

This paper is about test bias in terms of context, content, and values. It is only fitting to begin with a definition of key terms, namely validity and bias.

Validity

According to the Standards (1999), validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of the test. It is "an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment" (Messick, 1989, page 13). The Standards goes on to describe validity as a unitary concept: the degree to which all accumulated evidence supports the intended interpretation for the proposed uses of the scores. The validation process requires the collection of evidence from various sources and also requires that the meaning of test scores be revised as new evidence about the meaning of the scores becomes available.

Scores from high-stakes tests, such as the SAT, carry with them life-changing consequences, as the associated interpretations and uses function as gatekeepers to social opportunities. These social opportunities include employment selection and promotion and access to educational opportunities. Given the level of consequences that the interpretations and uses of SAT scores carry with them, one question that must be asked is "should so much weight be placed solely on these scores?" The purpose of the SAT is to predict the ability of test takers to succeed at the college level, through the prediction of freshman grade point average (GPA). Accordingly, it is also appropriate to extend the definition of validity to include the different types of evidence.

Construct Validity

This is the degree to which a test measures the construct it is intended to measure. Constructs cannot be directly observed; accordingly, a valid measure should:

Be strongly correlated with other measures of the same construct and with measures of constructs that are theoretically related

Properly discriminate between groups who vary in ability in terms of the construct being measured

Not be correlated with constructs that are unrelated to the construct of interest.

"Construct validity is evaluated by investigating what qualities a test measures, that is, by determining the degree to which certain explanatory concepts or constructs account for performance on the test" (Messick, 1989, page 16).
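The first and third criteria listed above can be checked with simple correlations. The sketch below is illustrative only: the scores are invented, the variable names (verbal_test, reading_grade, shoe_size) are hypothetical, and pearson_r is a helper written for this example rather than anything taken from the Standards or from Messick.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six test takers (invented for illustration).
verbal_test = [420, 480, 510, 560, 610, 680]   # the test under study
reading_grade = [62, 70, 71, 80, 84, 93]       # measure of a related construct
shoe_size = [9, 7, 10, 8, 7, 9]                # measure of an unrelated construct

# Convergent evidence: should be strongly positive.
convergent = pearson_r(verbal_test, reading_grade)
# Discriminant evidence: should be close to zero.
discriminant = pearson_r(verbal_test, shoe_size)

print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

A high convergent correlation together with a near-zero discriminant correlation would count as one piece of construct evidence; it would not by itself establish that the test measures the intended construct.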

Predictive Validity

This is the degree to which test scores accurately predict measures to be obtained at a later time. "Predictive validity indicates the extent to which an individual's future level on the criterion evidence is predicted from prior test performance" (Messick, 1989, page 16). Therefore, scores on a test that predicts freshman grade point average would be considered valid if they do indeed successfully and consistently predict performance on that criterion.
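As a rough illustration of what predicting freshman GPA from test scores involves, the sketch below fits a least-squares line to invented data and reports the share of GPA variance the scores account for. The data, the variable names, and the helpers fit_line and r_squared are assumptions made for this example and are not drawn from any actual SAT validity study.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y predicted from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Proportion of variance in y accounted for by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical test scores and later freshman GPAs (invented for illustration).
test_scores = [900, 1000, 1100, 1200, 1300, 1400]
freshman_gpa = [2.4, 2.9, 2.7, 3.2, 3.1, 3.6]

slope, intercept = fit_line(test_scores, freshman_gpa)
r2 = r_squared(test_scores, freshman_gpa, slope, intercept)

print(f"predicted GPA = {intercept:.2f} + {slope:.4f} * score, R^2 = {r2:.2f}")
```

The closer R^2 is to 1, the stronger the predictive evidence; the invented data here are far tidier than real admissions data would be.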

Content Validity

According to Messick (1989), content validity is evaluated by showing how well the content of the test samples the class of situations or subject matter about which conclusions are to be drawn. It is the degree to which items on a test measure the intended content as well as the degree to which the content adequately represents and is relevant to the domain of interest. "Test content refers to the themes, wording, and format of the items, tasks or questions on a test, as well as the guidelines for procedures regarding administration and scoring" (Standards, 1999).

Table 1 shows Messick's model describing the facets of validity. The evidential/interpretation facet, which comprises construct validity, provides evidence as to the appropriateness of test interpretation and test use. The evidential/test use facet provides evidence that the interpretation of the test results is appropriate for a specific applied purpose and setting. The consequential facets describe the outcome of the use of the test in terms of its value implications and social consequences, which include existing and potential adverse impacts on sub groups. The four facets are not mutually exclusive; adverse impacts on sub groups affect the meaning of test scores, which in turn affects the meaning of the construct.

                      Test Interpretation    Test Use

Evidential Basis      Construct Validity     Construct validity + Relevance/utility

Consequential Basis   Value Implication      Social Consequences

Table 1 Messick (1989) Facets of Validity

Bias

"Bias in a test is a slant in the way a test measures what it is intended to measure: it is a systematic error that disadvantages the test performance of one group compared to another" (Sheppard, 1981). Bond (1981) outlines essential ways in which a test can be biased.

In terms of the items themselves, as in the case of analogies whose content, values, and meanings are embedded in a socio-cultural context that is more familiar to one group, placing other groups at a disadvantage because of their lack of exposure to that context. Here, construct bias exists because the test is no longer measuring the intended construct, such as verbal ability, but is instead measuring familiarity with the words and context in question. This has been one of the strongest criticisms of the SAT and similar tests. Many have argued that these tests "draw upon language, terms, expressions, and values familiar to white, middle-class America, but relatively unfamiliar to black, Hispanics, and other distinct cultural groups" (Bond, 1991).

In terms of the use of these tests in employment and college admissions, the issue being that selection and prediction are far more complex than the information from a single test can capture. If a biased test is used for selection, the likely result is systematic over-prediction of the favored group's performance and systematic under-prediction of the non-favored group's performance.
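Over- and under-prediction of this kind is often examined by fitting a single prediction line to all test takers and then looking at each group's average prediction error. The sketch below does this with invented data; the group labels "A" and "B", the scores, the outcomes, and the helper fit_line are hypothetical and serve only to illustrate the idea.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y predicted from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical records: (group, test score, later performance).
records = [
    ("A", 1000, 2.6), ("A", 1100, 2.8), ("A", 1200, 3.0), ("A", 1300, 3.2),
    ("B", 900, 2.8), ("B", 1000, 3.0), ("B", 1100, 3.2), ("B", 1200, 3.4),
]

# One common line fitted to everyone, as a single selection rule would imply.
scores = [score for _, score, _ in records]
outcomes = [outcome for _, _, outcome in records]
slope, intercept = fit_line(scores, outcomes)

# Residual = actual performance minus predicted performance.
residuals = {}
for group, score, outcome in records:
    predicted = intercept + slope * score
    residuals.setdefault(group, []).append(outcome - predicted)

mean_residual = {g: sum(r) / len(r) for g, r in residuals.items()}

for group, value in sorted(mean_residual.items()):
    label = "under-predicted" if value > 0 else "over-predicted"
    print(f"group {group}: mean residual {value:+.3f} ({label})")
```

A negative mean residual means the common line predicted more than the group actually achieved (over-prediction); a positive one means the group did better than its scores suggested (under-prediction).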

For a test to be regarded as "unbiased" or "fair", differential item functioning should not take place. Mean scores on the test's items should be alike for subgroups of interest who are alike with respect to overall ability on the construct of interest. There are potentially great social consequences for high-stakes, widely used biased tests. These include the lack of access to social rewards such as higher education and employment opportunities, which leads to a systematic disenfranchisement of the sub-groups against whom the test is biased.
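One common way to screen an item for differential item functioning is the Mantel-Haenszel procedure, which compares two groups' success on the item within bands of matched overall score. The paper does not name a specific method, so the choice of this procedure, the counts, and the names mantel_haenszel_odds_ratio, biased_item, and fair_item are all assumptions made for illustration.

```python
def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across ability strata.

    Each stratum is a tuple of counts:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    A value near 1 suggests no DIF; a value above 1 means the item favors
    the reference group among test takers of matched overall ability.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    return numerator / denominator


# Hypothetical counts for one item, grouped by low, middle, and high
# total test score (invented for illustration).
biased_item = [
    (20, 30, 10, 40),
    (30, 20, 20, 30),
    (40, 10, 30, 20),
]
fair_item = [
    (20, 30, 20, 30),
    (30, 20, 30, 20),
    (40, 10, 40, 10),
]

alpha_biased = mantel_haenszel_odds_ratio(biased_item)
alpha_fair = mantel_haenszel_odds_ratio(fair_item)

print(f"biased item: {alpha_biased:.2f}, fair item: {alpha_fair:.2f}")
```

Matching on overall score is what separates differential item functioning from a simple difference in group means: the comparison is made only between test takers of similar measured ability.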

Present Day SAT Issues and Social Consequences

The 2001 SAT scores still reflect differences across racial lines, as can be seen in table 1.

*There is a DNA-based similarity between American Indians and central Asian Turkic peoples, as well as similarities in their IQs.

On average, blacks are scoring 200 points below whites and, with the exception of Asians, substantial gaps exist between whites and other minority groups.

Another of the main objections to the use of the SAT is the accusation of class bias, which seems to stir just as much passion as the issue of race. The question that remains is how much difference there is between the two issues in terms of the negatively affected population.

Issues of Class Bias

The advent of the GI Bill sharply changed the college attendance demographic. Formerly reserved as a privilege for the elite, college attendance is now commonly viewed as a right. This resulted in diverse groups of applicants applying each year to American colleges. One of the results was the explosion of growth in the tertiary education sector, especially in the availability of various types of institutions of higher education. The extent of this growth practically guarantees anyone desirous of a college education the opportunity to attend.

Taken from US Census Bureau Web site at http://www.census.gov/hhes/income/income99/99tablea.html

Table 4

The question becomes whether the SAT measures "aptitude" or "ability", which seems to be race- and class-based, with the aim of providing opportunity based on merit, or whether it serves to perpetuate social inequity.

Poorer kids are more likely to attend ill-equipped schools, to have inexperienced teachers and over-crowded classrooms, to be tracked into vocational streams, and to have less educated parents. All of this means less access to college prep courses, both private and through the school system. If the SAT is the gate-keeper of merit-based opportunity for social mobility, then it would seem that a significant portion of poor students will be trapped in the conditions that they were born into, if getting out is by means of advanced education. Minorities, who are typically over-represented at or below the poverty line, will continue to be over-represented in the lower SES groups and will also have to deal with the inherently white, male, middle-class socio-cultural biases upon which the test is built.

Some Social Consequences

Access to Selective Colleges

Minority and poor students, even those with strong academic records, will continue to have unequal access to selective colleges. This violates the very intent behind the creation of the SAT as a common yardstick of ability, one that gives students from poorer and more humble circumstances the opportunity to overcome their poor start. In short, non-white and poor students, because of much lower scores, will continue to be systematically deprived of admission to the more selective colleges.

Differential and Alternate Admissions Policies

While the black/white comparison shows a significant gap in scores favoring whites, when Asians are included, they have by far the highest scores in both the math and reading portions of the test. However, according to Trusheim (1988), many colleges either discounted or did not use the SAT for black student admission. It may be argued that this practice has remained in large part because universities need to recruit racially balanced freshman classes. Some schools (nearly 400) either no longer use the SAT for admissions or make it an optional criterion. Such unequal credential requirements, or the lack of use of a "common yardstick", go against the very purpose for which the SAT was created, yet they are becoming increasingly common. A list of these schools is available at http://www.fairtest.org/univ/optional.htm.

Access to Scholarships

Where scholarships are awarded on the basis of SAT and Preliminary SAT (PSAT) scores, low-scoring students will not be eligible for these opportunities. Minority and poor students are over-represented in the low-scoring groups, resulting in their having less access to these opportunities than their middle- and upper-class white peers. If scholarships are intended for promising students from more humble circumstances, these very students are the least likely to benefit. They will be more likely than their better-off peers to have to finance their education using other, more expensive financial aid such as student loans, or they may have to limit their college choices to those that are cheaper and less selective.

Perpetuation of Racial Stereotypes

The racial stereotypes, which were echoed in the eugenics sentiments of the SAT creators, will continue to be perpetuated as minorities and poor students are, and will continue to be, systematically excluded from the social mobility opportunities afforded by a college education. If the opportunity to climb the social ladder is afforded through education, the gains made by those on the lower rungs will continue to be painfully slow and may not keep pace with the widening gap between the haves and the have-nots.

Issues of Gender Bias

Gender bias in testing can arise in several areas:

Test content, in which many more men than women are referred to or depicted and, where women are depicted, they are typically shown in lower status situations

Test context, in which questions are set in experiences more familiar to men

Test validity, in which women's academic capabilities are under-predicted while those of men are over-predicted

Test use, in which women's access to educational opportunities is diminished or restricted by an institution's reliance on test scores which under-predict their abilities.

Historically, one of the strongest arguments used by opponents of the use of the SAT is its bias against women. Critics have argued that this bias has affected, and continues to affect, access to educational opportunities, especially scholarships.

The purpose of the SAT is to predict freshman year performance. "In 1988, women's SAT scores were 56 points lower than men's: 12 points in the Verbal Section - where women excelled until 1972 when men began to outscore them - and 43 points in the Math Section" (Rosser, 1989, page 4). It is believed that the gradual change in test content to include more science and business related items at the expense of arts and humanities resulted in a verbal gender gap by 1986, which favored men. This trend has continued to current times as shown in table 4, which shows a substantial gender gap in scores on the 2001 test.

Some Social Consequences

Scholarships

The National Merit Scholarships, one of the most prestigious competitions in the United States, uses PSAT results as the sole criterion to select semifinalists. The predictive validity issue, which favors males, has resulted in the pool of semifinalists being predominantly male, despite females' equal or better academic performance in high school. This, in effect, denies female students access to millions of dollars in scholarship funds. As mentioned earlier, the consequences to ethnic minority females are greater. This restricted access to scholarships extends beyond National Merit to include any merit-based award, including sports scholarships.

Additionally, females may become less likely to be placed in advanced placement (AP) courses and less inclined to apply to more selective colleges and universities, which translates into a disadvantage in college placements. These consequences ultimately impact career choices and opportunities, resulting in fewer women being in prestigious careers. If the trend is similar to high school and college performance, it may be argued that, given the opportunity, they would likely have performed equally to or, in some cases, better than their male counterparts.

These consequences are compounded for females of color and those from poorer backgrounds.

Perpetuation of Gender Stereotypes

The belief that boys are inherently more capable in some areas of study, such as mathematics, science, and business, is perpetuated. In many cases, girls are still being socialized away from these fields and, where they persist, their low SATM scores may prevent them from applying to more selective colleges and competitive programs. The end result is that females are, and will continue to be, under-represented in lucrative professions, especially in science, technology, and business, in which, if their college performance holds true, they would more likely have been successful.

The consistently lower scores may be viewed as an assault on girls' self-confidence. The constant announcement of the lower scores may lead girls to believe that they will not do as well, fulfilling the prophecy.

Girls are socialized to follow rules and to guess less than boys. Many believe that this lack of guessing contributes to score disparities.

Conclusion

The issue of the bias of the SAT and its consequences for females and minorities continues to be the subject of wide debate. Despite the increasing number of institutions that no longer require scores for admission, a significant number still rely on these scores in their admission formulae. ETS has tried, and continues to try, to include items considered fair to the various sub groups of interest. However, the increasing gap in scores, as well as the strong correlation with income, seems to indicate that the goal of fairness is still in the distance. The resulting lack of access to more selective colleges and programs will continue for the groups who are adversely affected by the use of this test.

References

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: Author (Copies available at AERA, 1230 17th Street, N.W., Washington, DC 20036)