New evidence that SAT hurts blacks

Roy Freedle is 76 now, with a research psychologist's innate patience. He knows that decades often pass before valid ideas take root. When the notion is as radical as his, that the SAT is racially biased, an even longer wait might be expected. But after 23 years the research he has done on the surprising reaction of black students to hard words versus easy words seems to be gaining new respectability.

Seven years ago, after being discouraged from pursuing his findings while working for the Educational Testing Service, Freedle published a paper in the Harvard Educational Review that won significant attention.

He was retired from ETS by then. As he expected, his former supervisors dismissed his conclusions. Researchers working for the College Board, which owns the SAT, said the test was not biased. But the then president of the University of California system, a cognitive psychologist named Richard C. Atkinson, was intrigued. He asked the director of research in his office to replicate Freedle's study.

Now, in the latest issue of the Harvard Educational Review, the two scholars who took on that project have published a paper saying Freedle was right about a flaw in the SAT, even in its current form. They say "the SAT, a high-stakes test with significant consequences for the educational opportunities available to young people in the United States, favors one ethnic group over another."

"The confirmation of unfair test results throws into question the validity of the test and, consequently, all decisions based on its results," said Maria Veronica Santelices, now at the Pontificia Universidad Catolica de Chile in Santiago, and Mark Wilson of UC Berkeley. "All admissions decisions based exclusively or predominantly on SAT performance--and therefore access to higher education institutions and subsequent job placement and professional success--appear to be biased against the African American minority group and could be exposed to legal challenge."

Researchers at the College Board and ETS don't like this new paper any more than they liked Freedle's in 2003. Laurence Bunin, the College Board vice president in charge of the SAT, said the Santelices-Wilson study is "fundamentally flawed." He pointed out that it had not yet been peer reviewed. He said the scholars' conclusions were "wrong and irresponsible and a disservice to students, parents and colleges," and were based on "a very small, limited and unrepresentative sample."

College Board spokeswoman Kathleen Steinberg said the Harvard Educational Review declined the College Board's offer of a response to the paper, but plans to publish a criticism of the paper by ETS researcher Neil Dorans, as well as a response by Freedle himself.

They are discussing a complex topic, full of psychometric terms and concepts I am not competent to judge. Back in 2003, when I wrote a long article for the Atlantic Monthly on Freedle's work, I relied heavily on him and other experts to explain what they were talking about. Much of it had to do with a method of test analysis called differential item functioning, or DIF (rhymes with cliff). Psychometricians like Freedle and his colleagues at ETS, which was then managing the SAT, looked at how test-takers of different ethnicities, matched at each scoring level (those who had scored 360 on the SAT verbal test, then those who had scored 380, and so on), did on each item.

At each level of ability, but particularly in the lower-scoring groups, white students on average did better than blacks on the easier items, whereas blacks on average did better than whites on the harder ones. (Whites, however, as a group did better overall.)
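For readers curious what that matched comparison looks like mechanically, here is a minimal sketch. The data, the 20-point band width, and the item names are all invented for illustration; this is not ETS's actual procedure.

```python
# Illustrative sketch of the matched-group idea behind DIF analysis.
# Toy data and band widths are invented; this is not ETS's procedure.
from collections import defaultdict

def dif_by_band(records, item):
    """Group test-takers into 20-point total-score bands, then compute
    each group's proportion correct on `item` within every band."""
    bands = defaultdict(lambda: defaultdict(list))
    for r in records:
        band = (r["total_score"] // 20) * 20   # e.g. 360, 380, 400 ...
        bands[band][r["group"]].append(r[item])
    return {band: {g: sum(v) / len(v) for g, v in groups.items()}
            for band, groups in bands.items()}

# Toy data: four test-takers, all landing in the 360-379 band.
records = [
    {"group": "A", "total_score": 365, "q1": 1},
    {"group": "A", "total_score": 370, "q1": 1},
    {"group": "B", "total_score": 362, "q1": 0},
    {"group": "B", "total_score": 371, "q1": 1},
]
print(dif_by_band(records, "q1"))  # {360: {'A': 1.0, 'B': 0.5}}
```

A real DIF analysis would use far larger matched samples and a formal statistic such as Mantel-Haenszel, but the grouping logic is the same.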

This was unexpected. The deeper Freedle got into it, the more uncomfortable his supervisors seemed to be with his work. He had to revise one paper more than 11 times before they allowed him to publish it.

Hard questions, those that produced more wrong answers, tended to have longer, less common words. Easy questions tended to have shorter, more common words. Freedle thought this was key to the relative success African American students had with the harder ones. Simpler words tended to have more meanings, and in some cases different meanings in white middle class neighborhoods than they had in underprivileged minority neighborhoods, he concluded. This, he said, could help explain why African American students did worse on questions with common words than on questions that depended on harder, but less ambiguous words they studied at school.

On average, he said, black students were performing only slightly above matched-ability whites on hard questions. But averages did not submit applications to colleges. Individual students did. Some of those individuals, he discovered, would have gotten a boost of a hundred points or more on the SAT if the score was weighted toward the hard items. He proposed that the College Board offer a supplement to SAT scores, called the Revised-SAT, or R-SAT, which would be calculated based only on the hard items. This, he said, would "greatly increase the number of high-scoring minority individuals."
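Mechanically, Freedle's R-SAT proposal is just a rescoring rule. Here is a hypothetical sketch, with invented item names and difficulty labels:

```python
# Hypothetical sketch of Freedle's R-SAT proposal: report a supplementary
# score computed from the hard items only. Item names and difficulty
# labels are invented for illustration.

def r_sat_score(responses, difficulty):
    """Percent correct counting only items labeled 'hard'."""
    hard = [item for item, level in difficulty.items() if level == "hard"]
    return 100 * sum(responses[i] for i in hard) / len(hard)

difficulty = {"q1": "easy", "q2": "easy", "q3": "hard", "q4": "hard"}
responses  = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}

# A test-taker who misses both easy items but answers both hard ones
# scores 100 on this supplement despite 50% correct overall.
print(r_sat_score(responses, difficulty))  # 100.0
```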

In their paper, Santelices and Wilson rule out Freedle's suggestion that the bias he found in the test might affect all kinds of multiple-choice questions, or minorities other than blacks. But they did find it in sentence completion and reading comprehension sections of the SAT.

Saul Geiser was the director of research in Atkinson's office originally given the assignment to look into Freedle's theory. Eventually he arranged for Santelices, then a doctoral candidate at UC Berkeley, to do the research as her PhD thesis, working with Wilson, a UC Berkeley psychometrician who had also been asked to look at Freedle's work.

Geiser said he thinks the two researchers did a good job. He does not agree with Bunin's criticisms of their work. He said he, like Freedle, wants more research on why blacks and whites answer these questions differently, so that any unfair disadvantages for blacks can be removed.

He said he thought the College Board, in particular, should "get over the denial" of any merit to what Freedle has discovered. That may take a while. The College Board, after all, may be right that the SAT is unbiased.

But the new paper means more researchers are likely to go more deeply into what Freedle has found, and eventually settle the question of what should be done about it.

Read Jay's blog every day at http://washingtonpost.com/class-struggle.
Follow all the Post's Education coverage on Twitter, Facebook and our Education web page, http://washingtonpost.com/education.

I'm disinclined to pay $10 to read the article. Jay, can you provide an example of a question with a common word that has such a drastically different meaning in black culture than in white culture that it interferes with correctly answering the question? It seems unlikely to me that, for example, the SAT would ask a question like, Complete the following sentence:

When I was growing up, I enjoyed the fireworks in my _____.

A. Hometown
B. Hood
C. ...
D. ...

We constantly hear about bias on tests - the New Haven firefighters' exam springs to mind - but I have yet to see one question that has been pointed to as biased, much less one where someone can explain just how it is biased.

I find it all extremely unlikely. I wish you had quoted some of the key points, Jay.

I know that the obvious criticism was that on harder words, those with partial knowledge of the definition would be more likely to be led into a trap, whereas someone who had no clue what the word meant would guess randomly. This could lead to fewer than 20% of those with more knowledge getting it right, whereas a random guess would lead to 20% getting it right.

That still seems far more obvious.
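That trap argument can be made concrete with a toy model (all probabilities here are invented): on a five-choice item with one attractive wrong answer, a test-taker drawn toward the trap can end up below the 20% a pure guesser gets.

```python
# Toy model of the partial-knowledge trap (all probabilities invented):
# a five-choice item with one attractive wrong answer. A pure guesser
# picks uniformly; a partially informed test-taker is drawn to the trap.

def p_correct(p_trap):
    """Chance of a right answer when the trap is chosen with probability
    p_trap and the remaining probability is spread evenly over the other
    four choices, one of which is correct."""
    return (1 - p_trap) / 4

print(round(p_correct(0.20), 3))  # 0.2 -- uniform guessing over 5 choices
print(round(p_correct(0.60), 3))  # 0.1 -- partial knowledge, lured by trap
```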

And of course, it's not as if the ACT has less of an achievement gap than the SAT does, and the ACT doesn't test vocabulary.

But Jay, look at the big picture! The testing that hurts Blacks more than anything is the primitive NCLB testing. The testing that you support also hurts everyone else, but the sad truth is that data-driven accountability is more damaging to children in poor schools - the kids it was designed to help. Ironically, the best side of NCLB could have been the flip side of its most destructive side. Had disaggregated test scores been used for a Consumer Reports-style accounting of how well schools were addressing the Achievement Gap, we might be seeing real progress.

But getting back to the SAT, you should link up the nuanced discussion of words in context and the importance of common knowledge, reread Hirsch's piece on why there is no such thing as a reading test, and use your bully pulpit for diagnostic testing, as well as a rejection of that obscene concept that a "culture of accountability" cannot damage the kids of all races in poor schools.

In response to tomsing's comment: Your example is at best uninformed and more likely racist. The research did not say the test used cultural slang.

However, your question would be flawed even if "neighborhood" were used instead of "hood," since "hometown" and "neighborhood" could both be correct answers. Also, you should note that the words "hometown" and "neighborhood" (or its slang "hood") don't have the same meaning.

The problem is you would not know what words might have different meanings, in part because your insight into African American, or inner city, culture likely comes only from TV cliches.

However, when I graduated from high school there were at least two definitions for the noun "pick": is it a farm tool or a device for hair maintenance?

I think there's an argument to be made for having an SAT-R, not just because it would be fairer to African-Americans but also because it would be a more useful tool for all bright kids.

I missed a single question on the verbal portion of the SAT and it dropped my score way down. Had I taken a different version of the test with a question of similar difficulty (as measured by the results of the group pre-testing), I might've gotten the 800. I don't know, however, since it was the fall of my 12th-grade year and the next administration of the SAT wasn't until after my application deadlines had passed. So I never bothered to re-take it.

Had the proposed SAT-R been available, it would've presumably been a more accurate reflection of where I stood vis-a-vis other bright students. The elite colleges could have better faith that the 800 scorer is actually smarter than the 770 scorer, rather than it being a matter of luck as with the current test.

Once again we see race injected into everything. The SAT is a test of the ability to think and process information; the outcomes are not race related but ability related. The math involves comprehension of concepts and their application. As a parent, it is frustrating to me that we continue to excuse the failure of the educational system in the US by blaming "race". The truth is we are failing the minority students by continuously lowering standards and expectations and accepting mediocre results. I am a Latina mother and my children had better have a reason for bringing home a "C" and for not taking challenging classes. I had teachers tell me a "C" is OK; my answer: no, that is not acceptable. This translates to lower GPAs and lower test scores. Please stop cheating these kids of their futures by committing the crime of low expectations. Let them all compete on a level playing field, not on one ruled by special rules.

"He [Freedle] proposed that the College Board offer a supplement to SAT scores, called the Revised-SAT, or R-SAT, which would be calculated based only on the hard items." This, he said, would "greatly increase the number of high-scoring minority individuals."

Without judging the validity of Mr. Freedle's thesis, there is merit in reporting a subcategory score for the most difficult tasks when a single score is derived from several subcategories of tasks ranging from "easy" to "hardest". Those involved in analysis of alternatives where numerical scoring is used to rank-order alternatives recognize this as the Aggregation Problem.

Basically the Aggregation Problem involves the relative mix of "easy", "hard", "harder", and "hardest" tasks used to compute an overall score. If one uses too many "easy" tasks to compute the score, then the relative merits of alternatives are not well differentiated (e.g., a clearly superior alternative might score only a few percentage points better than lesser alternatives).

If too many of the "hardest" tasks are used to compute a single score, then the relative merits of the different alternatives are exaggerated (e.g., a clearly superior alternative might be rated twice as good as lesser alternatives even though in reality it is not 100% better, say, when costs are factored in).

By presenting a score derived from the hardest tasks alongside the overall score derived from all tasks, a decision maker can gain valuable insight into the relative merits of the alternatives under consideration.
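Here is a toy illustration of that Aggregation Problem (the category scores and weights are invented): the same pair of alternatives looks close together or far apart depending on how the easy and hardest categories are weighted into one score.

```python
# Toy illustration of the Aggregation Problem (scores and weights are
# invented). Two alternatives are compared under an easy-heavy and a
# hard-heavy weighting of the same per-category scores.

def overall(scores, weights):
    """Weighted average of per-category scores."""
    return sum(scores[c] * w for c, w in weights.items()) / sum(weights.values())

alt_a = {"easy": 95, "hardest": 90}   # clearly superior on hard tasks
alt_b = {"easy": 93, "hardest": 45}

easy_heavy = {"easy": 9, "hardest": 1}
hard_heavy = {"easy": 1, "hardest": 9}

# The easy-heavy mix compresses the gap; the hard-heavy mix stretches it.
print(round(overall(alt_a, easy_heavy) - overall(alt_b, easy_heavy), 1))  # 6.3
print(round(overall(alt_a, hard_heavy) - overall(alt_b, hard_heavy), 1))  # 40.7
```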

Here are some of the simple words he mentioned in his 2003 paper: horse, snake, canoe, golf. And some of the hard ones: vehemence, anathema, sycophant, intractable. I didn't see any words there that could be confused, but as he said, that was just a preliminary theory to try to explain an unexpected result. He has said there has to be much more research to figure out what, if anything, is going on.

I think some of the comments made above reflect a misunderstanding of what DIF is. DIF is not when members of one group do worse on the test as a whole (that's DTF, Differential Test Functioning). DIF is when candidates of equal overall ability have different chances of success on specific items (i.e., questions). In other words, some items function differently for different groups of people, so the same trait is not being measured and the results are not comparable.

As an example from a vocabulary test I've used as a research instrument:
"We are _________ to public demand."
This item shows very large DIF between males and females of the same overall ability. If a male candidate had a probability of success of 50% on that item, a female candidate of the same overall ability had only a 21% probability of success. This item is much, much more difficult for females than males, for reasons that are not obvious and it would not be acceptable in an operational test used for high-stakes purposes.
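One standard way to put such a gap on a scale (a common psychometric convention, not necessarily the analysis used for that particular test) is to express each group's success probability in log-odds units and take the difference. The 50% and 21% figures are from the example above.

```python
# Express each matched group's probability of success in log-odds
# (logits) and take the difference. The 50%/21% figures come from the
# example in the comment; the logit metric is a standard convention.
import math

def logit(p):
    return math.log(p / (1 - p))

p_male, p_female = 0.50, 0.21
dif_logits = logit(p_male) - logit(p_female)
print(round(dif_logits, 2))  # 1.32 -- the item is ~1.3 logits harder for matched females
```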

Having a single bad item in a test is usually not enough to distort the overall results (i.e. it doesn't result in DTF), but when there are systematic differences in large numbers of items, DIF needs to be addressed. This is not dumbing down the test, as comments above seem to imply, it's ensuring that candidates are being treated fairly.

jsump, I certainly mean no racism by my comment. The point of my example was, I don't think the people writing the questions at the SAT would be oblivious to a particular definition of a word. I don't think they'd say, "Well, a hood is either a part of a car or a garment, so it's not the right answer. Let's include it."

Lots of words have alternate meanings. We call them homonyms. "Pick" could also refer to a basketball play, or a choice, and probably more, and in both of our examples it might function as a noun or a verb. That doesn't mean that, in context, the specific meaning shouldn't be perfectly clear, and the SAT holds both a black kid and a white kid responsible for knowing the one definition that is applicable to the question at hand.

Jay, thanks for your response. I find it hard to believe that a black kid from a poor neighborhood doesn't know that horse is not just a basketball game, or that a snake is not just an underhanded person. I have no idea what other meanings the researchers might propose for canoe and golf, but again, lots of words are homonyms/homophones, and in context - reading is all about context - it should be clear.

If Freedle said in 2003 that more work is needed, good for him for not jumping to conclusions and confusing correlation with causation. But you quote the authors of the current study as saying that this is evidence that the SAT favors one ethnic group over another, and that admissions based on the SAT are biased against blacks. And your own headline states that there is new evidence that the SAT hurts blacks. I have a problem with all of that. In reality, all there is is evidence that certain groups of black students don't perform as well as certain groups of white students. It doesn't show bias, and it doesn't show that the test hurts any one group.

Ultimately, the SAT is marketed as a tool to predict college success (not sure how that's defined, exactly). If you want to prove or disprove that it's biased against, say, black people, it seems like the thing to do is compare the college success of the black kids at a certain score point with the college success of the general population at that score point. If there's a disparity - if, all else being equal, black kids are more (or less) successful in college than their SAT scores would indicate - then that is something the College Board should address.

This independent study of racial bias in the SAT is very important. The University of California researchers who (re)examined the data and concluded Freedle was correct noted:

"The confirmation of unfair test results throws into question the validity of the test and, consequently, all decisions based on its results. All admissions decisions based exclusively or predominantly on SAT performance—and therefore access to higher education institutions and subsequent job placement and professional success—appear to be biased against the African American minority group and could be exposed to legal challenge."

That's very strong language for an academic journal. This study will present a profound challenge to institutions which still rely heavily on the SAT to determine undergraduate admissions or scholarship awards. It will be a powerful resource for advocates of test-optional admissions and our allies who are trying to dismantle test-based barriers to equal opportunity in higher education.

Who are these UC researchers who conducted this "independent study" and "(re)examined" the data? How do I know that they didn't start out with an assumption of "racial bias in the SAT" of some unknown/undefined type and then went looking for statistical patterns in the data to support their going-in assumption?

If I conducted a double blind test, would another group of researchers come to the same conclusion as these UC researchers using the same data if these other researchers didn't know that they were studying racial bias and/or the SAT?

Humans are great at seeing patterns in complex phenomena where no pattern actually exists. Some scientists say that this pattern-seeking bias in humans is genetic in nature, and they credit it for human survivability over the millennia. ("Forewarned is forearmed," even if a lot of false positives result.)

But as I understand it, the most likely reason why blacks, Hispanics, and Asians are getting higher marks on the questions is because they know *none* of the answers and are randomly guessing. Whites of similar ability, on the other hand, are somewhat more familiar with the words, so are more likely to fall for a wrong answer trap--thus putting their rate at lower than 20% (which is what blacks are receiving).

Cite: http://www.kimberlyswygert.com/archives/000447.html

So what Freedle wants to do is put together a test in which blacks can randomly guess and do better. If true, is that what we should do?

Jay, as I said, I don't think anyone ever questioned Freedle's data. It was his recommendations that were problematic. So what does this new piece say about that?

What do you mean by biased? What is the "right" distribution of scores?

Blacks already get a preference equivalent to several hundred points because of affirmative action. Doesn't that correct for the bias in the SAT?

Also, didn't the College Board find that blacks with a given score had worse college GPAs on average? Most people think GPAs are a better measure of a person's academic achievement and are in fact better predictors of future success. So doesn't that mean the SAT is biased in FAVOR of blacks?

"But as I understand it, the most likely reason why blacks, Hispanics, and Asians are getting higher marks on the questions is because they know *none* of the answers and are randomly guessing."

That would be easy to check through item "discrimination" or "fit" statistics. I doubt that ETS would be careless enough to overlook obvious routine analysis like that.

This seems to be a much more fundamental issue where the "construct" being tested is not the same for different cultural groups. The measuring stick being used is not stable, so test scores are not directly comparable.

Did you read the link I included? She links to the actual College Board response, and if I understand it, that's exactly what happened--that he didn't consider that.

As I understand it, here's what the situation is:

Blacks (and, I think, Hispanics and Asians) get 22% right on hard problems, whereas whites get 31% right. Given that there are five possible answers, the *obvious* possibility that no one has dismissed that I've seen is that minorities are more likely to do straight guessing when faced with difficult words. For people with extremely weak vocabularies, a random guess is going to result in a *better* outcome than using weak knowledge, because weak knowledge means the tester is more likely to fall for one of the SAT's many traps.

What Freedle proposes is to create a test that's incredibly difficult, thereby giving everyone a chance to randomly guess, which would thus reduce the performance gap.

Moreover, Freedle wants to drop everything *except* difficult vocabulary, because it's only when people guess randomly that minorities do almost as well as whites, and the random guessing is only prevalent on vocab questions.

I would like to see Freedle's hypothesis applied to the critical analysis and math sections of the SAT. All of my African-American friends and colleagues have very strong vocabularies. The reason may be based on cultural factors, a culture that includes an oral tradition and church attendance, for example. But there also could be a geographic reason, since many African-Americans who take the SAT are based in urban and suburban areas, while more Whites come from rural areas where parents were poorly educated and where the Internet was slow to make inroads.

I don't think that blacks and other minorities have some inherent disability when taking the SAT. Rather, the other races that they are competing against have advantages that are not always available to them. Not trying to be racist or stereotypical, but blacks receive the lowest average income of any race in America. They cannot afford expensive SAT prep and frequently lack an emphasis on education. Look at the chart posted by afsljafweljkjlfe on Math SAT scores. Now look at this chart on the average family income.

and again at the average SAT score (2009)
White 1581
Asian 1623
Native American 1448
Hispanic 1362
African American 1276

They all correspond nearly identically. I doubt the problem has anything to do with homonyms, as that only affects 10 questions in the reading sections, so it would not cause a drop that severe, and it definitely is not some intrinsic disability. Rather, poorer students are typically educated in the worst school districts. Because of their sub-par education, their SAT scores fall. It's as simple as that.

The data may be correct, but the interpretation is only one of many, some already articulated above. Here's another. Jay summarizes the distinction thus: "Hard questions, those that produced more wrong answers, tended to have longer, less common words. Easy questions tended to have shorter, more common words."

I have no reason to doubt that longer, less common words are correlated with more wrong answers overall (and conversely). But that's not the only relevant distinction. English words are highly polysemous (having many meanings), but shorter and more common words likely will have a lot more of them. So those nominally harder questions are more likely to be you-know-it-or-you-don't; maybe you have no idea what "allele" means, but you don't have to choose among many meanings you do know in order to pick the one appropriate to the context. It could be true, probably is, that the collection of meanings-you-do-know is smaller on average for variously disadvantaged groups, but it could also still be true that the ability to pick out the most appropriate one (and do it quickly!) reflects a real difference in intellectual facility. (If you do crossword puzzles, you know it's mostly the clues that make them hard, not so much the words.) If the DIF merely reveals such a difference that is correlated with race, then it is not per se evidence of bias.

I don't know how these various hypotheses could be distinguished empirically, but surely it is premature to proclaim one of them as proven.

Does anyone think this is more of a socio-economic issue as opposed to a race issue? In full disclosure, we are a black family, I'd say upper middle class, and live in a county with good schools. Our daughter recently graduated and will be attending a very competitive state university. She took the SAT, ACT, and AP classes, and graduated with over a 4.0 GPA. Despite her "blackness" I'd say she did quite well. So why is that the case? Is it b/c of our income, where we live and how she was raised? Does the researcher think a white student that comes from an improvised area would do better? I'm not an educator, don't play one on TV and didn't stay at a Holiday Inn Express last night, but as a parent please explain why my child should be held to a lower standard because of her race.

Jay, you're a hack for quoting this paper. Take a look at what version of the SAT these numbskulls were examining. Take a look at how many questions they analyzed. See if the same questions, analyzed on different tests, give the same or different results.

"Does the researcher think a white student that comes from an improvised area would do better?"

Actually, poor white students do much better on average than wealthy black students on most tests. Here's documentation on the SAT from the Journal of Blacks in Higher Education:
http://www.jbhe.com/features/49_college_admissions-test.html

That doesn't mean your child should be held to a lower standard and of course, what is true for the group isn't always true for the individual. But poverty doesn't explain the test score gap.

I have yet to see an actual question from the SAT exams in question that demonstrates the alleged bias. Until I see otherwise, I will consider reports such as this to be pure politically motivated, junk statistical pseudo-science.

"Does the researcher think a white student that comes from an improvised area would do better?"

Actually, poor white students do much better on average than wealthy black students on most tests. Here's documentation on the SAT from the Journal of Blacks in Higher Education:
http://www.jbhe.com/features/49_college_admissions-test.html

That doesn't mean your child should be held to a lower standard and of course, what is true for the group isn't always true for the individual. But poverty doesn't explain the test score gap.
******
Appreciate the link to the data in your response. Equally appreciated the group/individual comment (it holds true across the board). WOW! Very interesting article, but I'd like to see more recent data; the article is from 2005, and theoretically some of the information it derives from is even older, given the time to review input and compile the analysis. I'm very curious to see if there has been a change worthy of mentioning. There have been so many insensitive comments (not yours) on this subject. It's a tough pill to swallow; sometimes I just get so enraged when individuals are evaluated as the group. I don't think we need a different test, we need a change in how we educate. It's sad to think we may never achieve educational equality in this country.

Eaglechk, the "poor whites do better than wealthy blacks" metric is extremely robust and holds true across every cognitive test and has been known for 30 years or more. If it ever *didn't* hold true, the news would have been trumpeted to the skies. Rest assured, newer data shows the same gap or we would have heard otherwise.

In another thread, I posted NAEP data that showed whites eligible for free lunch outscoring blacks and Hispanics who were not eligible for free lunch. That was from 2008, I think. But it's been true forever, on every test you can think of. They usually hide this data because it's discouraging. For example, the SAT used to put the breakdown by race *and* income on their website, but they don't anymore. You have to be a researcher to get it now.

I got this interesting email from a veteran public school teacher in LA:
"
Before I started reading this article, my reaction to the headline was, in the current vernacular, WTF?

After reading it, I agree with it 100%; anyone who teaches in an urban school should know that this SAT challenge is a symptom of the core issue: language use and exposure. The incessant drive to 'differentiate instruction' comes from mostly this language disparity. Whenever I'm instructing directly, I must be very careful and thorough with my words, and check students' understanding constantly. While I may be a 'Black male teacher,' my cultural background is almost as disparate from my students as the 23 year old White female 1st year teacher from Orange County. In South LA/Watts words like 'slang,' 'flip,' and 'abroad' may have subtle but significantly different meanings than they do in other neighborhoods/communities, which compounded over time would account for the SAT disparity.

Freedle's work also confirms my simple advice to students who want to do better on the SAT: "read, and write, read and write, read and write, read and write." If you're taking an 'SAT-prep' class in HS, that's way too late.

While I classify the challenges Blacks have on the SAT as ethical versus cultural, Freedle's conclusions I can accept."

I've been a bubble test coach for many years, and this claim seems totally plausible to me. One thing, for instance, that makes the GRE Verbal much harder than the SAT is that it not only has more obscure words, but does a lot of the stuff Freedle talks about on a more intentionally tricky level—like using "nice" in its semi-archaic definition of "precise" and having another answer choice that almost but not quite would work with it meaning "pleasant". I've never seen anything that seems tricky in this way on the SAT, but then again, I'm a middle-class white guy from the suburbs, and if the study is correct, then of course I wouldn't notice it.

Incidentally, though I've never noticed the pattern talked about in this article in particular, (though I suppose now I'll unavoidably start looking for it,) I have tutored a fair number of black students from a range of class backgrounds, and my experience mostly seems to confirm Claude Steele's theory of "stereotype threat": the common difficulty for me has been that they tend to be overly tenacious about going with the "right" way of doing the problem (solving the algebra, canceling out all the Pis, etc.) instead of quickly shrugging and switching to Plan B (plugging in some numbers, saying Pi is 3ish and seeing what answers are just way off, etc.), because solving it that way seems like an admission of defeat rather than a shortcut to victory.

Excerpt: 'The Harvard Educational Review has published a research article by Maria Veronica Santelices (Pontificia Universidad Católica de Chile) and Mark Wilson (University of California, Berkeley) that is critical of the Differential Item Functioning (DIF) analyses used in the construction of the SAT®. Unfortunately, this work is deeply flawed. It utilizes only partial data sets, focuses on a student sample that lacks representation and diversity, and draws conclusions that do not match the data. Simply stated, this research does not withstand scrutiny.'

Jay, I thank you for this article. I am a 48 year old African American woman. While achieving academic scholarships for both undergrad and grad schools based on my grades, ACT and GMAT scores, I did not score as well on the SAT, which was always a mystery to me. It should be noted I was in accelerated reading programs even in grade school, and have gone on to have a very successful corporate career. There is nothing to fear in providing fair and consistent standards and a level playing field. It makes us stronger as a nation when, as Americans, our best players are at bat no matter their gender or ethnic background. The only fear is in artificially upholding mediocrity and celebrating it as achievement. This practice is what weakens us as a nation competing in a global economy, and creates a burden we must all carry when those who are unfairly denied opportunities must be taken care of one way or another by those who have.

Use of the SAT for university admissions is well-covered, again, in Bowen, Chingos, and McPherson's recent book on university completion, "Crossing the Finish Line". http://press.princeton.edu/titles/8971.html

IIRC, undermatching, by which they mean insufficient aspiration to matriculate to higher completion-rate schools, accounts for some under-completion among socioeconomically disadvantaged minorities. But, once again, the direct role of SAT scores? Zip. Grades carry the predictive weight, and that is what admissions committees use.
Me speaking here: The function of the SAT? To indicate TO APPLICANTS the level of cumulative academically oriented learning acquired by students at a post-secondary institution. Whatever my quarrels with gaming the Challenge Index, there's no doubt that working hard in a few or several AP courses has greater impact on SAT scores than enrollment in the parent company's Kaplan Test Prep course. But the admissions office will make greater use of the toughness of the schedule, performance in the courses, AP tests, and subject-matter tests than it will of the SAT.

I'd suggest instead looking at significant rate-of-learning differentials in HS, increasingly estimable without selectivity bias, as school systems have forced the PSAT on all students by removing cost barriers to testing in the sophomore, junior (and even freshman) HS years.

I'm sorry I have not studied the paper under review. I hope all readers of JM's discussion will note carefully the specific and delimited impact of the bias in the SAT the blogger reports. Delimited, because the exam is little used in determining admissions.

If black examinees do better relative to examinees generally on word recognition involving harder, formal academic vocabulary items than on vocabulary items involving (theoretically) easier, common vocabulary items, then we might hypothesize that they'd do better relative to examinees generally on complete tests involving generally more difficult items (such as the GRE, MCAT, LSAT, GMAT and the like) than less difficult tests (such as the SAT and ACT) -- after controlling for differential selection effects associated with higher dropout rates during college for black students than for students generally. That doesn't now appear to be the case.

Mr. Freedle and subsequent researchers are to be commended for their effort in analyzing the differential performance of ethnic groups, but we have yet to see evidence that tests devoid of easier items (for the general population) are more differentially predictive than are current tests (including the SAT), which in numerous studies have been found either to accurately predict or to "overestimate" the average performance of black students when a general prediction formula is used.

It would appear, from what we now know, that Mr. Freedle and the more recent researchers cited have produced findings that have prompted considerable ado over very little of pragmatic significance.

But I don't think we have yet seen whether tests embodying Mr. Freedle's ideas are more or less valid than the conventional tests to which he objects for predicting academic performance. Perhaps ETS or ACT, using existing files containing both item-level responses and criterion data (e.g., self-reported GPA), might consider "reconstructing" SAT scores using Freedle-type criteria for item selection, and conducting a "differential validity" study for ethnic groups using the "reconstructed" test scores and appropriately modified versions of "regular" scores, respectively.

However, given the magnitude and longevity of the "score gap" (in this instance, the average for black examinees tending to be substantially below the general average on admission tests regardless of type) it seems unlikely that modifying the item composition of tests (according to established test-construction specifications) will lead to its reduction--more's the pity.

Let's cut right to it, instead of getting involved in the psychometrics and how to tweak a small portion of an exam, that portion of which matters little, and which accounts for little.

To put it in terms of outcomes, not acronym-loaded parameters and procedures: how much do the supposed findings account for under-enrollment of any group in professions, say engineering? In biomedical sciences and applications? In digital engineering? In retail supply-chain management at any level?

How much in the enormous within-group but between-sex difference in success in matriculating to and completing university for some ethnic groups?

Not a bit; not a tinker's damn. (There's nothing to stop anyone who can text-message from looking up and learning an anachronism for "pissant", pronounced today "piss - ant", for emphasis.) That's the magnitude of real-world effect.

SAT scores overpredict college success for blacks, and blacks are typically admitted with significantly lower SAT scores than whites or Asians. Those two facts seem to override any "bias" on the SAT, although I feel that any viable college applicant ought to know the usual meanings of common words. Period.

By definition, blacks and whites are equally good at randomly guessing on multiple choice questions. So, the more difficult the question and thus the higher the percentage of students who randomly guess, the narrower the white-black differential.

If you made all the questions impossibly esoteric, the white-black gap would disappear. If you made them all unbelievably easy, the white-black gap would also disappear. But when you make them a reasonable mix of difficulty in order to maximize the predictive value of the SAT, you wind up with a white-black gap -- because there is also a white-black gap in real-world performance (Roth et al., 2003). http://linkinghub.elsevier.com/retrieve/pii/S0021901003015012
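The guessing argument above is easy to check with a toy simulation. The sketch below is my own illustration, not anything from the Roth paper: each "examinee" knows the answer outright when their (hypothetical) ability exceeds the item's difficulty and otherwise guesses among five choices, so on trivially easy items both groups score 1.0, and on impossibly hard items both converge on the 0.2 chance rate -- the gap only appears in between.

```python
import random

random.seed(42)

def score(ability, difficulty, n_items=10_000, n_choices=5):
    """Fraction correct: answer is known if ability >= difficulty,
    otherwise the examinee guesses at random among n_choices."""
    correct = 0
    for _ in range(n_items):
        if ability >= difficulty:
            correct += 1
        elif random.random() < 1 / n_choices:
            correct += 1
    return correct / n_items

# Two hypothetical groups whose mean "ability" differs (0.6 vs 0.4).
for difficulty, label in [(0.0, "trivially easy"),
                          (0.5, "moderate"),
                          (1.0, "impossibly hard")]:
    gap = score(0.6, difficulty) - score(0.4, difficulty)
    print(f"{label}: score gap = {gap:.2f}")
```

On the easy and hard extremes the printed gap is at or near zero; only the moderate items separate the two groups, which is the commenter's point about why a "reasonable mix of difficulty" produces a gap.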

I finally managed to get a copy of the Santelices and Wilson paper and gave it a quick skim last night. It replicates Freedle's work, but addresses criticisms of his methodology, coming to the same conclusions.

Reading many of the comments above, it's clear that DIF is frequently misunderstood. DIF is not when there is an overall gap between black and white candidates (i.e. an external difference in results). DIF is an internal measure, where candidates with the same overall score perform differently on particular items, resulting in white and black candidates with different ability receiving the same overall score. By altering the ratios of items favoring white or black candidates, different persons would be accepted or rejected into college. Given the high-stakes nature of these tests, it's perfectly reasonable to require test developers to justify the choice of content of their tests and show that it's non-discriminatory.
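For anyone unfamiliar with the mechanics, here is a minimal sketch of what "matching on overall score" means. This is my own illustration of the core idea, not the procedure ETS actually uses (their operational DIF analyses are based on Mantel-Haenszel statistics): for each total-score stratum, compare the proportion of each group answering a given item correctly.

```python
from collections import defaultdict

def dif_by_matched_score(responses_a, responses_b, item):
    """Per-stratum difference in proportion correct on one item,
    matching candidates from the two groups on total test score.
    Each response is a list of 0/1 item scores."""
    def by_score(responses):
        buckets = defaultdict(list)
        for resp in responses:
            buckets[sum(resp)].append(resp[item])
        return buckets

    a, b = by_score(responses_a), by_score(responses_b)
    diffs = {}
    for s in set(a) & set(b):          # only strata both groups occupy
        p_a = sum(a[s]) / len(a[s])
        p_b = sum(b[s]) / len(b[s])
        diffs[s] = p_a - p_b           # > 0: item favors group A at this score level
    return diffs
```

With this framing, an overall score gap between groups produces no DIF at all if equally able candidates perform the same on every item; DIF appears only when an item behaves differently for candidates at the same score level, which is exactly the distinction the comment above draws.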

The day we can stop comparing black people to white people will be a fabulous day! However, the correlation is an interesting one. Based on the degree of feedback, I believe most [non-black] readers are uncomfortable with the black children doing better than white children on the hard questions. I suggest those individuals unruffle their feathers and allow the educational experts to do their jobs.