A few years ago I put up a post, WORDSUM & IQ & the correlation, as a “reference” post. Basically if anyone objected to using WORDSUM, a variable in the General Social Survey, then I would point to that post and observe that the correlation between WORDSUM and general intelligence is 0.71. That makes sense, since WORDSUM is a vocabulary test, and verbal fluency is well correlated with intelligence.

But I realized that over the years I’ve published many posts using the GSS and WORDSUM without ever explicitly laying out the distribution of WORDSUM scores, which range from 0 (0 out of 10) to 10 (10 out of 10). I’ve used categories like “stupid, interval 0-4,” but often only mentioned the percentiles in the comments after prompting from a reader. This post is to fix that problem for good, and will serve as a reference for the future.

First, please keep in mind that I limited the sample to the year 2000 and later. The N is ~7,000, but far lower for some of the variables crossed. Therefore, I invite you to replicate my results. After the charts I will list all the variables, so if you care you should be able to replicate the results, with all the sample sizes displayed, in ~10 minutes. I am also going to attach a csv file with the raw table data. As for the charts, they are simple.

- The x-axis is a WORDSUM category, ranging from 0 to 10

- The y-axis is the percent of a given demographic class who received that score. I’ve labelled some of them where the chart doesn’t get too busy

All of the charts have a line which represents the total population in the sample (“All”).
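For readers who want to rebuild the chart data themselves, here is a minimal sketch of the calculation behind each line: the percent of a group receiving each score from 0 to 10, plus a pooled “All” line. The group names and scores are made up for illustration and are not GSS data.

```python
from collections import Counter

def score_distribution(scores):
    """Return the percent of respondents at each WORDSUM score, 0 through 10."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    total = len(scores)
    return {s: 100.0 * counts.get(s, 0) / total for s in range(11)}

# Made-up scores for two hypothetical demographic classes (not GSS data).
groups = {
    "Group A": [3, 5, 6, 6, 7, 7, 7, 8, 9, 10],
    "Group B": [2, 4, 5, 5, 6, 6, 7, 7, 8, 9],
}
# The "All" line pools every respondent in the sample.
groups["All"] = groups["Group A"] + groups["Group B"]

distributions = {name: score_distribution(s) for name, s in groups.items()}
for name, dist in distributions.items():
    # One y-value per x-axis category (score 0..10).
    print(name, [round(dist[s], 1) for s in range(11)])
```

Each group’s percentages sum to 100, which is why small groups produce jumpy lines: a single respondent can move a point by several percent.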

First, I use the General Social Survey. Second, I use the WORDSUM variable, a 10-question vocabulary test which has a correlation of 0.71 with general intelligence. My curiosity is about differences across white ethnic groups by region. To do this I use the ETHNIC variable, which asks respondents where their ancestors came from by nation. I omitted some nations because of small sample size, and amalgamated others.

Scandinavian = Denmark, Norway, Sweden, Finland (yes, I know that Finland is not part of Scandinavia, Jaakkeli!)

British = England, Wales, Scotland

Next we need to break it down by region. The REGION variable uses the Census divisions. You can see them to the left. I combined a few of these to create the following classes:

Northeast = New England, Middle Atlantic

Midwest = E North Central, W North Central

South = W S Central, E S Central, South Atlantic

West = Pacific, Mountain
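The groupings above amount to two lookup tables. A minimal sketch, using the labels exactly as written in this post (the codes the GSS itself uses for ETHNIC and REGION are not reproduced here, and the function names are hypothetical):

```python
# Ethnic amalgamations listed in the post: group -> ancestral nations.
ETHNIC_GROUPS = {
    "Scandinavian": ["Denmark", "Norway", "Sweden", "Finland"],
    "British": ["England", "Wales", "Scotland"],
}

# Census divisions collapsed into the four regional classes.
DIVISION_TO_REGION = {
    "New England": "Northeast",
    "Middle Atlantic": "Northeast",
    "E North Central": "Midwest",
    "W North Central": "Midwest",
    "W S Central": "South",
    "E S Central": "South",
    "South Atlantic": "South",
    "Pacific": "West",
    "Mountain": "West",
}

# Invert the ethnic table so each nation maps to its amalgamated group.
NATION_TO_GROUP = {
    nation: group for group, nations in ETHNIC_GROUPS.items() for nation in nations
}

def recode_region(division):
    """Map a Census division label to Northeast, Midwest, South, or West."""
    return DIVISION_TO_REGION[division]

def recode_ethnicity(nation):
    """Map a nation to its amalgamated group, or keep the nation as its own group."""
    return NATION_TO_GROUP.get(nation, nation)

print(recode_region("Mountain"), recode_ethnicity("Finland"), recode_ethnicity("Germany"))
```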

The key method I used is to look for mean vocabulary test scores by ethnicity and region. I also later broke down some of these ethnic groups by religion. Finally, all bar plots have 95 percent confidence intervals. This should give you a sense of the sample sizes for each combination.
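To see why the confidence intervals convey sample size, here is a rough sketch of a mean with a 95 percent interval. It assumes a simple unweighted normal approximation (mean plus or minus 1.96 standard errors), which may not be exactly what the GSS interface computes; the scores are made up.

```python
import math
import statistics

def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval.

    Assumption: unweighted data and z = 1.96 for a 95 percent interval.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean, mean - z * sem, mean + z * sem

# Made-up scores: the same pattern at two different sample sizes.
small_sample = [5, 6, 7, 8]
large_sample = small_sample * 25  # 100 respondents

for label, sample in [("n=4", small_sample), ("n=100", large_sample)]:
    mean, lo, hi = mean_with_ci(sample)
    print(f"{label}: mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The means are identical, but the interval for the small sample is several times wider, which is what a long error bar on one of the bar plots is telling you.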

First let’s break it down by race/ethnicity and compare it by region to get a reference:

Finally, let’s separate by religion for Germans and Eastern Europeans:

I include the last plot because these reports of nationality have to be taken with a consideration for the structure they may mask. People in the United States whose ancestors came from Poland fall into two large categories: people of Jewish heritage whose identity as ethnic Poles was contested (recall that Jews often spoke Yiddish, a Germanic language, as their first language), and Roman Catholic Slavs. I suspect many of those in the “None” category are also Jews by culture, if not religion.

Second: there is a tendency of people of all ethnic groups to have lower vocabulary scores if they are from the South or Midwest. This tendency is in many cases outside of the 95 percent confidence interval. It’s especially striking in the three groups with huge sample sizes in all regions: Germans, Irish, and British. Irish here includes both Scots-Irish and those of Irish Catholic background. Not only are the sample sizes for these groups large, but the roots of these groups in some of these regions go rather far back. In particular, the division between the people of British ancestry goes back centuries in the North vs. South divide.

How to understand this? There are a lot of complicating factors. But as outlined in Albion’s Seed and The Cousins’ Wars the divisions between the Anglo-Celtic folkways run deep and long. If a time traveler from the 18th century arrived in the United States today and were asked which region was the heart of intellectual ferment they would correctly guess New England. Early Puritan New England was the first universal-literacy society in the world. This was to some extent a matter of conscious planning. The leaders of the New England colonies enforced limitations upon who could emigrate to their dominion. Religious exclusions and persecutions in this region are well known, but there was also a policy of rejecting the settlement of those who were perceived to be possible burdens upon the community. New England then selected for a middle class migration out of East Anglia and the port towns of southwest England. But the fathers of the early colony also rejected the transfer of the privileges of the blood nobility from the motherland, thereby throwing up a barrier to the migration of the aristocracy.

In contrast the lowland South received a more representative selection of the British class strata. The younger sons of the British nobility and self-styled gentlemen arrived to make their mark, as did those who became indentured servants and even slaves. A class society on the model of southwestern England recapitulated itself in this region. As for the uplands, what became Appalachia, an influx of Scots-Irish came to dominate the scene by the middle of the 18th century, disembarking in Philadelphia and pushing along the spine of the high country down to the Deep South.

Conflicts between these “Anglo” groups framed the terms of debate over the 18th and 19th centuries. They were to some extent at the root of the Age of Sectionalism. Today because of the salience of race, and the prominence of the later wave of migration in the late 19th and early 20th century which remained vibrant in living memory for most, these early divisions have moved out of sight. But they still remain. The difference between Germans in Texas and the Anglos of Southern extraction remains to this day, but note that Germans exhibit the same regional differences in vocabulary score as Anglos. Why? This may be a case where the original cultural substratum has an outsized impact (the dialect of eastern New England, made famous by the Catholic Irish of Boston, is descended from East Anglian English!).

Of course there might be a genetic difference. Intelligence is a quantitative trait, so it would be trivial to generate two populations which are genetically similar, but very different in trait value, simply through selection. In the 1630s ~20,000 Puritans settled New England. For various reasons there was very little migration over the next century and a half. By 1780 New England’s population was 700,000, almost all through natural increase (not only was New England the world’s first universal literacy society, but its fertility was the highest in the late 17th century).
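To put the natural increase in perspective, the two population figures imply a compound annual growth rate. This is a back-of-the-envelope sketch: it assumes a start date of 1640 (the end of the 1630s migration), which is my choice and not something stated above, and it ignores the small amount of later migration.

```python
import math

def annual_growth_rate(start_pop, end_pop, years):
    """Compound annual growth rate that takes start_pop to end_pop over `years`."""
    return (end_pop / start_pop) ** (1.0 / years) - 1.0

start_pop = 20_000    # ~20,000 Puritan settlers
end_pop = 700_000     # New England's population in 1780
years = 1780 - 1640   # assumption: count from the end of the 1630s migration

rate = annual_growth_rate(start_pop, end_pop, years)
doubling_time = math.log(2) / math.log(1 + rate)

print(f"implied growth: {100 * rate:.2f}% per year")
print(f"implied doubling time: {doubling_time:.1f} years")
```

Under these assumptions the population grew between 2 and 3 percent a year, doubling in a bit under three decades, sustained for well over a century.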

Finally, there’s the issue of disease and pathogen load. Endemic hookworm infection does seem likely to have made Southerners, of both races, relatively indolent and lethargic in comparison to Northerners. Who knows what pathogens simply fall below our radar?

Overall I think that a more fine-grained and detailed exploration of these topics is warranted. Our public discussion is too coarse, and data-thin.

(Republished from Discover/GNXP by permission of author or representative)

WORDSUM is a variable in the General Social Survey. It is a 10 word vocabulary test. A score of 10 is perfect. A score of 0 means you didn’t know any of the vocabulary words. WORDSUM has a correlation of 0.71 with general intelligence. In other words, variation of WORDSUM can explain 50% of the variation of general intelligence. To the left is a distribution of WORDSUM results from the 2000s. As you can see, a score of 7 is modal. In the treatment below I will label 0-4 “Dumb,” 5-7 “Not Dumb,” and 8-10 “Smart.” Who says I’m not charitable? You also probably know that general intelligence has some correlation with income and wealth. But to what extent? One way you can look at this is inspecting the SEI variable in the GSS, which combines both monetary and non-monetary status and achievement, and see how it relates to WORDSUM. The correlation is 0.38. It’s there, but not that strong.
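The step from a correlation to “variation explained” is just squaring the coefficient. A quick check of the two correlations quoted above (the function name is mine):

```python
def variance_explained(r):
    """Share of variance accounted for by a linear relationship with correlation r."""
    return r ** 2

correlations = [
    ("WORDSUM vs. general intelligence", 0.71),
    ("WORDSUM vs. SEI", 0.38),
]

for label, r in correlations:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.3f}")
```

So 0.71 corresponds to roughly half the variance, while 0.38 corresponds to only about 14 percent, which is why the SEI relationship is “there, but not that strong.”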

To further explore the issue I want to focus on two GSS variables, WEALTH and INCOME. WEALTH was asked in 2006, and it has a lot of categories of interest. INCOME has been asked since 1974, but unfortunately its highest category is $25,000 and more, so there’s not much information at the non-low end of the scale (at least in current dollar values).

Below you see WEALTH crossed with WORDSUM. I’ve presented both column and row percentages, each adding up to 100%. Then you see INCOME crossed with WORDSUM. I’ve just created two income categories: low (less than $25,000) and non-low ($25,000 and more). Additionally, since the sample sizes were large I restricted the INCOME sample to those 50 years and older.
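The difference between row and column percentages is easy to mix up, so here is a small sketch of both, using the 0-4, 5-7, and 8-10 bins defined earlier. The respondents are made up, with numbers chosen only so the arithmetic is easy to follow; they are not GSS results.

```python
from collections import Counter

def wordsum_class(score):
    """Bin a WORDSUM score into the three classes used in this post."""
    if score <= 4:
        return "Dumb"
    if score <= 7:
        return "Not Dumb"
    return "Smart"

# Made-up (income category, WORDSUM score) pairs, not GSS data.
respondents = (
    [("Low", s) for s in (3, 4, 5, 6, 9)]
    + [("Non-low", s) for s in (2, 5, 6, 7, 8, 9, 10, 10)]
)

counts = Counter((income, wordsum_class(score)) for income, score in respondents)

def percentages(counts, by):
    """Percent of each cell within its income row (by=0) or WORDSUM column (by=1)."""
    totals = Counter()
    for key, n in counts.items():
        totals[key[by]] += n
    return {key: 100.0 * n / totals[key[by]] for key, n in counts.items()}

row_pct = percentages(counts, by=0)  # "of those with low income, what share are smart?"
col_pct = percentages(counts, by=1)  # "of those who are smart, what share are low income?"

print("Low income who are smart:", row_pct[("Low", "Smart")])
print("Smart who are low income:", col_pct[("Low", "Smart")])
```

The two questions share a numerator (the low-income smart) but divide by different totals, so they only coincide when the row and column totals happen to match, as they do in this toy data.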

Of those with low income, about 1 out of 5 are smart. And of those who are smart, 1 out of 5 are poor. Remember, this is for those above the age of 50, not college students. I thought perhaps retirees might be skewing this. Constraining the sample to ages 50-64 does change the results noticeably: 1 out of 5 of the poor remain smart, but only 1 out of 10 of the smart are poor. As for the rich dumb, you have to look to wealth. It is notable to me that there’s a big drop-off at more than $500,000 in wealth. And a large fraction of those with wealth in the $100,000 to $500,000 range are dumb. I think we might be seeing the 2000s real estate boom.

…It’s not supposed to be an exact measure of IQ by profession by any means, as it is based entirely on average annual income figures. In other words, it’s an income table with the values converted to IQ scores….

…the following table estimates average IQ scores by occupation solely on the basis of the Career Cast mid-level income figures. The median salary (of a paralegal assistant) is taken to correspond to an IQ of 100. One standard deviation is assumed to be 15 IQ points….
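The quoted description gives the two anchors of the conversion (the median salary maps to 100, one standard deviation is 15 points) but not the exact formula. A minimal sketch, assuming a plain linear z-score conversion; the job labels and salaries below are hypothetical and are not the Career Cast figures.

```python
import statistics

def income_to_iq(income, median_income, sd_income):
    """Convert an income to an 'IQ' score: median -> 100, one income SD -> 15 points.

    Assumption: a linear conversion; the quoted source does not spell out its formula.
    """
    return 100 + 15 * (income - median_income) / sd_income

# Hypothetical mid-level salaries, for illustration only.
incomes = {
    "Job A": 30_000,
    "Job B": 45_000,
    "Job C": 50_000,
    "Job D": 60_000,
    "Job E": 150_000,
}

values = list(incomes.values())
median_income = statistics.median(values)
sd_income = statistics.pstdev(values)

estimated_iq = {
    job: income_to_iq(income, median_income, sd_income)
    for job, income in incomes.items()
}
for job, iq in estimated_iq.items():
    print(f"{job}: {iq:.0f}")
```

Because incomes are right-skewed, a linear conversion like this stretches the top of the scale much further from 100 than the bottom, which is consistent with the implausibly high values at the head of the table.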

You can see the full list at the Audacious Epigone’s place, but here’s a selection I found of interest:

| Occupation | Estimated IQ from median income |
|---|---|
| Surgeon | 234 |
| Physician | 161 |
| CEO | 148 |
| Dentist | 140 |
| Attorney | 128 |
| Petroleum engineer | 126 |
| Pharmacist | 126 |
| Physicist | 125 |
| Astronomer | 125 |
| Financial planner | 123 |
| Nuclear engineer | 121 |
| Optometrist | 121 |
| Aerospace engineer | 120 |
| Mathematician | 120 |
| Economist | 117 |
| Software engineer | 117 |
| School principal | 116 |
| Electrical engineer | 115 |
| Web developer | 115 |
| Construction foreman | 115 |
| Geologist | 114 |
| Veterinarian | 114 |
| Mechanical engineer | 113 |
| Biologist | 111 |
| Statistician | 111 |
| Architect | 111 |
| Chemist | 109 |
| Stockbroker | 109 |
| Registered nurse | 107 |
| Historian | 107 |
| Philosopher | 106 |
| Accountant | 106 |
| Farmer | 105 |
| Zoologist | 104 |
| Author | 103 |
| Undertaker | 103 |
| Librarian | 103 |
| Anthropologist | 103 |
| Dietician | 102 |
| Archeologist | 102 |
| Physiologist | 102 |
| Teacher | 102 |
| Police officer | 101 |
| Actor | 101 |
| Electrician | 100 |
| Paralegal | 100 |
| Plumber | 100 |
| Clergy | 98 |
| Social worker | 97 |
| Carpenter | 97 |
| Machinist | 96 |
| Nuclear decontamination technician | 96 |
| Welder | 95 |
| Roofer | 95 |
| Bus driver | 95 |
| Agricultural scientist | 95 |
| Typist | 94 |
| Travel Agent | 93 |
| Butcher | 92 |
| Barber | 90 |
| Janitor | 90 |
| Maid | 88 |
| Dishwasher | 88 |

Off the top of my head, I would say that the highest disjunction in the low income direction would be clergy. This is especially true for Roman Catholic and mainline Protestant denominations in the United States, which have moderately stringent educational prerequisites for their clerics. I assume that the biggest in the other direction are surgeons and medical doctors, who enter a market where there’s less and less real price signalling, and where labor controls the supply of future labor as well as influencing the range of services that competing professions (e.g., nurses) can provide.


Every time I use the WORDSUM variable from the GSS people will complain that a score on a 10-question vocabulary test is not a good measure of intelligence. The reality is that “good” is too imprecise a term. The correlation between adult IQ and WORDSUM = 0.71. The source for this number is a 1980 paper, The Enduring Effects of Education on Verbal Skills. I’ve reproduced the relevant table…

Estimated Correlations for Variables in a Model of Enduring Effects of Education for White, Native-Born People 25 to 72 Years Old in the Contemporary [1970s] United States

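| | Child IQ | Age | Sex | Father’s Educ | Father’s SEI | Educ | Adult IQ | WORDSUM |
|---|---|---|---|---|---|---|---|---|
| Child IQ | - | 0 | 0 | 0.31 | 0.30 | 0.51 | 0.80 | - |
| Age | - | - | 0.026 | -0.304 | -0.130 | -0.304 | -0.42 | -0.005 |
| Sex | - | - | - | -0.054 | 0.058 | 0.050 | 0 | -0.121 |
| Father’s Educ | - | - | - | - | 0.488 | 0.469 | 0.30 | 0.302 |
| Father’s SEI | - | - | - | - | - | 0.347 | 0.31 | 0.285 |
| Educ | - | - | - | - | - | - | 0.66 | 0.511 |
| Adult IQ | - | - | - | - | - | - | - | 0.71 |
| WORDSUM | - | - | - | - | - | - | - | - |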

Obviously since the WORDSUM test was not given to those under 18 you can’t calculate the correlation between childhood IQ and WORDSUM score. Additionally, I suspect since 1980 there’s been a bit more cognitive stratification by education. I notice in the GSS sample that there are many older people, especially women, who have high WORDSUM scores but no college education. In the younger age cohorts this pattern is not as evident because if you are intelligent the probability is much higher that you’ll obtain a university education.

A correlation of 0.71 is not mind-blowing; there’s a substantial gap between IQ and WORDSUM even in terms of their linear relationship. But I think it’s good enough to show that WORDSUM is a serviceable substitute for a more rigorous measure of g in the absence of alternatives, and not so clumsy a proxy as to be useless. Though that call is up to you, and readers are free to disagree with the methodology of the model used to obtain this correlation. Additionally, I would point out that WORDSUM is a subset of the vocabulary subsection of the Wechsler Adult Intelligence Scale. WORDSUM is in effect a slice of an IQ test.
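One way to make “not mind-blowing” concrete is to ask how much scatter remains when IQ is predicted from WORDSUM with a straight line. This sketch assumes the usual IQ standard deviation of 15 and the textbook formula for residual spread under a linear fit; neither comes from the 1980 paper.

```python
import math

def residual_sd(r, sd=15.0):
    """Standard deviation of prediction errors from a linear fit with correlation r.

    Assumption: outcome SD of 15 (the conventional IQ scale) and a linear relationship.
    """
    return sd * math.sqrt(1.0 - r ** 2)

for r in (0.0, 0.38, 0.71, 1.0):
    print(f"r = {r:.2f}: residual SD = {residual_sd(r):.1f} IQ points")
```

At r = 0.71 the residual spread is still a little over 10 points, against 15 with no information at all. That is a real improvement for comparing group averages, but a poor basis for guessing any one individual’s IQ from ten vocabulary questions.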

I am bookmarking this post so that in the future I can simply place a link in the comment threads in response to objections to WORDSUM.