a) Note that there is no data for the class of 2018. Perhaps removing this data is one way that Williams keeps track of who it gave this data to and, therefore, who it can go after for leaking it to me.

b) The number of students ranges from 185 for the class of 2011 to 391 for the class of 2017. Since around 1,250 applicants are admitted to Williams each year, we definitely don’t have the complete data.

c) It is interesting to see data for applicants that we admitted — I assume that everyone in this data was admitted — but who chose not to enroll.

d) Would you believe a 230 point difference between Asian-American and African-American SAT scores among Williams students?

30 Responses to “Admissions Data”

Unsurprised but disappointed you’d do that analysis. This is a subset of the data and you don’t know if it is a random subset. As such, it’s really not appropriate to do that type of comparison.

It’s also private data about current students that you’re not supposed to have or share. Ethically, this is data you should not present as you have done. For example, had #4 on the excerpt you posted enrolled at Williams, it would not be hard to believe that class of 2019, Asian + from British Columbia is enough info to identify a specific student.

Basic research ethics would demand you take down this post and at minimum anonymize the raw data. Basic statistics would say that if you don’t know how the sample was selected, you’d be far more cautious inferring anything from it.

I think Sigh is correct on a number of counts. First, as noted, the data is not “anonymized” (is that a word?) and could be linked to specific students.

Equally important, without knowing how the specific students were selected in the data you were given, trying to draw generalized conclusions about the class as a whole is impossible. It’s possible, for example, that the data were specifically selected to try to emphasize (exaggerate?) differences between groups as shown in your graph.

It is often said that the “smoking-gun” statistics for data like this are:
% of Asian Americans with an 800 math SAT who applied and were rejected
% of White students with an 800 math SAT who applied and were rejected (also, geographical distribution of this by state would be of interest)
% of URMs with an 800 math SAT who applied and were rejected

How do you think these percentages look at Harvard? At MIT? At Williams?
Leaked data on these particular statistics would make headlines in every national newspaper.

Agree with the above, plus: The first thing you should have done is ask the administration about the data. If they refused to comment, you could have shared some of it (after ensuring individual confidentiality), but they should at least have the chance to explain what it is, such as whether it is random, and how best to interpret it.

While we are at it, it would be interesting to look at your dataset for the % of Asian Americans with an SAT math score of 800 vs. whites, and compare it with the baseline for the two races from College Board data (this data is publicly available online). This would give you a first-order approximation of whether it is harder for Asians to get in compared to baseline expectations.

The question of much more interest is white male SAT math score vs. Asian female SAT math score. I do not have a link for this, but I recall seeing statistics that Asian females score higher than white males on the math GRE. Of course, the College Board does not disclose data broken down by gender AND race, primarily because it would be embarrassing for them. To put it bluntly, it is not that all girls are bad at math. Only white ones are.

Food for thought:
If it turned out that the % of Asian girls with a high math SAT is higher than the % of white males with a high math SAT, would that mean that Williams is holding Asian girls interested in STEM to a higher standard compared to white males interested in STEM?
How does that square with Williams’ professed support for closing the STEM gap? Should someone in the administration be held accountable for that?

Just goes to show that if Williams wants to define itself by high SAT scores (best school discussion earlier) it will have to admit a lot more Asians and a lot fewer blacks. Colleges in California that admit only on the basis of scores are heavily Asian. Not suggesting the school does that, but Asians have a very strong academic focus and work ethic, not genetic superiority. Would be interesting to see how many Asian athletes there are; it’s hard to study all the time and play sports (violin is not a sport). Fact of the matter is, this information is really just confirmatory, so why can’t we admit it and say we are doing it for societal benefit (or whatever reason you choose) or to create a more rounded group of people?

I opened up the data set. We are talking about 2,110 individuals whose data is now available for inspection. I’m not sure that having completely random data would do much to improve the significance of the mean scores reported above.

That being said, it appears to me that Williams College staff have leaked to Ephblog damning information documenting the unjust racial discrimination the school practices against Asian students.

The idea that racial discrimination against these Asian students is justified by the absurd suggestion that they are less interesting or less likely to make the campus a better place than lower scoring black students is deeply offensive.

It is clear to me that there are a lot of Asian students who “deserve” to be at Williams College who are not and a lot of black students who do not “deserve” to be at Williams College who are.

I expect that as more information leaks out this evil, intolerable, racial discrimination will come to an end. As it stands, there are a lot of Asian students and parents who are right to be furious at this disgusting revelation.

The discrimination claim rests on the assumption that higher SAT scores mean a better student/candidate. I’m pretty sure plenty of academics would dispute that assumption. If you take the position that the best candidates are not simply the ones with the best SAT scores, then the admissions issues become much more complicated.

So, Whitney,
Can you please list a metric better/more objective than the SAT to assess the incoming students to prove discrimination? Or is it that such a metric does not exist, so that the discrimination ex ante cannot be proven? What would be the hypothetical evidence that would satisfy you?

Can you please list a metric better/more objective than the SAT to assess the incoming students to prove discrimination?

That’s a fair question, to which I don’t have a good answer. I’m not sure there is a good way of objectively measuring the “best students,” but I’m doubtful that SAT alone is a good measure. If it were, we could probably file the application on a post card with an SAT score and a list of grades and be done with it. I think there are simply too many different dimensions to “intelligence” to be able to capture them on a single test. For example, is someone who is fluent in 4 languages smarter than someone who has mastered college level thermodynamics? (I know that was the single hardest class I ever took at Williams. I think it was Chem 301 or maybe 303 at the time, and it met at 8:00 am three days a week with Prof. Peacock-Lopez, who was new at the College at the time.) I don’t know the answer, though I suspect there are more people in the world who meet the first test than the second.

I think the SAT (or the ACT) is a good tool, because it allows for some apples-to-apples comparisons amongst prospective students of wildly differing backgrounds. But I don’t think Williams would serve itself

I was very intrigued by DDF’s suggestion last week of asking each professor to name the 3 or 4 “best students” that he or she knew in each graduating class, without trying to set forth criteria to make those selections.

Let X be the admissions rate for whites who scored 800 on the math SAT. Let X/Y be the admissions rate for Asians who scored 800 on the math SAT. Is there a value of Y at which you would say “ok, there is something very fishy happening here”? Is it 2? Is it 4? Is it 8? Is it 64? Is it 10000000?

What is the value of Y at Williams? At Harvard? We can be fairly confident it is 1 at Caltech.
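To make the question concrete, here is a minimal sketch of how Y could be computed from applicant counts among 800 scorers. The function names and every count in the usage example are hypothetical, made up purely for illustration; none of them come from the leaked data or from any school.

```python
def admit_rate(admitted: int, applied: int) -> float:
    """Admissions rate for one group: admitted / applied."""
    if applied <= 0:
        raise ValueError("applied must be positive")
    return admitted / applied


def rate_ratio(white_admitted: int, white_applied: int,
               asian_admitted: int, asian_applied: int) -> float:
    """Return Y, where X is the white rate and X / Y is the Asian rate.

    Y = 1 means equal rates; Y = 2 means the white rate is twice as high.
    """
    x = admit_rate(white_admitted, white_applied)
    asian_rate = admit_rate(asian_admitted, asian_applied)
    if asian_rate == 0:
        raise ValueError("Asian admit rate is zero; Y is undefined")
    return x / asian_rate


# Hypothetical counts, for illustration only.
y = rate_ratio(white_admitted=40, white_applied=100,
               asian_admitted=20, asian_applied=100)
print(y)  # 2.0
```

Note that Y is only a ratio of two rates. By itself it says nothing about why the rates differ, and with small counts it will bounce around a lot from year to year.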

1) On “ethics,” there seems to be confusion. First, researchers do have a duty of confidentiality when they promise to keep data secret as part of the process of receiving access to it. I, personally, have access to other sorts of Williams data that I do not make public, precisely for this reason. Second, reporters have no such duty. If a source gives you data, then you can use it. A source gave me this data — in fact, the source encouraged me to make it public! — and so I have shared it with you. I had no duty to keep this information secret, even if it includes data which is not very well “masked.”

Anyone who argues otherwise should explain why the New York Times does not have a duty to keep the data it gets (like e-mails and spreadsheets from Wikileaks) private, especially when the data contains information that identifies specific people.

2) Other responses:

Equally important, without knowing how the specific students were selected in the data you were given, trying to draw generalized conclusions about the class as a whole is impossible.

Maybe, maybe not. Depends on the details.

It’s possible, for example, that the data were specifically selected to try to emphasize (exaggerate?) differences between groups as shown in your graph.

It’s also possible that the sky will be purple when you wake up tomorrow, but that would not be the way to bet.

a) You really think that there is some secret alt-rightist with access to this data who would take the trouble of doing this? I doubt it!

b) Even a secret alt-rightist would have no reason to do so! This data is completely consistent with everything else I have ever read about elite college admissions. You really think that, for example, the average SAT score is equal between Asians and Blacks at Williams? Ha! Recall that we know, for certain, that the average white/black SAT score gap is 200 points at Amherst.

c) It is really, really hard to take a given set of data, and then pull out a biased sub-sample, without leaving any clues.

Sorry, David, but just because you are trying to play journalist does not make you one. If you were, you would know that journalists actually do have ethical standards and do not simply publish all the information that they encounter. The privacy of individuals is one of the fundamental aspects of journalistic ethics, and documents that are acquired are routinely redacted before release to protect privacy. Even the emails from Wikileaks were redacted before publishing (though unredacted versions may be available on Wikileaks itself), and those involve public figures, for which privacy standards are much lower.

I have deleted the raw data at least temporarily, both because the arguments above about student-confidentiality have merit and because a college official, whom I respect, asked me to.

Question to JAS/ZSD/sigh/others: If identifying information were removed, would you still object to making the data available? (I could do this easily by removing city information and recategorizing all non-US students as “foreign.” This would mean that there would be many rows for any combination of class/ethnicity/gender/country, thereby making it impossible to identify any particular student.)
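A rough sketch of that masking step, plus a check on the “many rows for any combination” claim, might look like the following. The column names (`class_year`, `ethnicity`, `gender`, `city`, `state`, `country`) and the sample rows are hypothetical, since the real file's layout is not shown here. Blanking `state` for foreign rows is an extra assumption beyond the proposal above, added because a province could still point to one person.

```python
from collections import Counter

# Hypothetical column names; the real file's layout is an assumption.
KEY_FIELDS = ("class_year", "ethnicity", "gender", "country")


def mask_row(row: dict) -> dict:
    """Drop city and collapse every non-US country into 'foreign'."""
    masked = {k: v for k, v in row.items() if k != "city"}
    if masked.get("country") != "US":
        masked["country"] = "foreign"
        # Assumption: also blank state/province for foreign rows.
        masked["state"] = None
    return masked


def group_sizes(rows: list) -> Counter:
    """Count rows for each class/ethnicity/gender/country combination."""
    return Counter(tuple(r[f] for f in KEY_FIELDS) for r in rows)


def smallest_group(rows: list) -> int:
    """Size of the rarest combination; 1 means someone is still unique."""
    return min(group_sizes(rows).values())


# Made-up rows, for illustration only.
rows = [
    {"class_year": 2019, "ethnicity": "Asian", "gender": "F",
     "city": "Vancouver", "state": "BC", "country": "Canada"},
    {"class_year": 2019, "ethnicity": "Asian", "gender": "F",
     "city": "Seoul", "state": None, "country": "South Korea"},
    {"class_year": 2019, "ethnicity": "White", "gender": "M",
     "city": "Boston", "state": "MA", "country": "US"},
]
masked = [mask_row(r) for r in rows]
print(smallest_group(masked))  # 1
```

In this toy example the two foreign rows merge into one group of two, but the lone US row is still unique, so the check reports 1. How large the smallest group needs to be before release is a judgment call that the masking itself does not settle.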

I would also leave the race/gender/state combo for large states (at least NY and Texas). One of the common complaints is that students from the Northeast get admitted due to virtue-signaling things on their application while students from the flyover states get admitted due to merit.

Why should student applications be private? The college and university system exists for the good of civil society, and is paid for by taxes and through tax exempt endowments. (I understand that this particular data set is confidential, but why should it be?)