Bermuda (population: 64,700) is a tiny British Overseas Territory in the North Atlantic Ocean, some 600 miles from the East Coast of the United States. Even though Bermuda is 1000 miles from the Caribbean Sea, it shares a number of sociological similarities with the Caribbean island nations, and it is an associate member of the Caribbean Community. Its economy, much like those of the Cayman Islands and The Bahamas, is largely based on finance and tourism, and Bermuda likewise enjoys one of the highest standards of living in the world.

According to the 2000 census, Bermuda is 54.8% black and 34.1% white. IQ and the Wealth of Nations (2002) did not include intelligence data for Bermuda, but IQ and Global Inequality (2006) reported an IQ of 90, the average of two studies. In this post I discuss some overlooked data which suggest that Bermudian blacks have an IQ very close to 100, and that there is no IQ gap between black and white Bermudians. There are also some overlooked test data which suggest otherwise, leaving us with some uncertainty about how to interpret the conflicting research.

Sandra Scarr is one of the very few social scientists to offer serious, original research into the causes of race differences in intelligence. Her Minnesota Transracial Adoption Study might be the single best empirical work on the topic. She also, improbably, managed to remain mostly respectable among other social scientists by walking a careful line with her hereditarianism. In her tribute to Arthur Jensen, she admitted that this acrobatic feat involved summarizing her own research in ways that weren’t entirely scrupulous:

My colleagues and I reported the [MTRAS] data accurately and as fully as possible, and then tried to make the results palatable to environmentally committed colleagues. In retrospect, this was a mistake … We should have been agnostic on the conclusions. (Scarr, 1998)

Indeed, Scarr’s research on race is cited by her “environmentally committed colleagues” just as readily as it is cited by Jensenists. This is partly because the new behavioral genetic methods for examining race differences were so crude that her results were ambiguous and open to conflicting interpretations.

But one of Scarr’s findings has fallen into obscurity and has been cited by neither environmentalists nor hereditarians. The only rhetorical use of this finding was in Scarr (1987), where she boasted about her contributions to the non-genetic theory of black-white IQ differences:

…In 1967, I began a program of research that continues today, employing five previously unused strategies to study the sources of racial differences in intellectual performance: (1) studies of individual differences within the U.S. black population by the twin method; (2) the study of genetic markers of degrees of African ancestry and their relation to intellectual differences within the U.S. black population; (3) the study of transracial adoption, with Richard Weinberg, by which socially classified black children are reared in the cultural environment sampled by the tests and the school; (4) cross-cultural studies in which black children are or are not socially disadvantaged; and (5) educational intervention programs with young children to test idea about reaction range and malleability. Evidence against a racial genetic hypothesis … has come from all five sources. (p. 222)

Evidence for the racial genetic hypothesis also comes from all five sources. Scarr proceeds to describe her cross-cultural evidence against the genetic hypothesis:

In Bermuda, we found that black children have IQ scores at the norm for white children in the United States at age 2; at age 4 their average IQ score is 99, and by sixth grade they score 2 years above U.S. white children in vocabulary, reading, and math on the California Achievement Test, a culturally loaded instrument to be sure! These findings amazed Bermudian officials as much as us.

Results from cognitive tests before the age of 3 are more often described as Developmental Quotients (DQs) rather than Intelligence Quotients (IQs), because these scores are influenced by non-cognitive dimensions such as motor development. Although white children typically score higher than black children on some measures before the age of 2, such as vocabulary, black children often match or exceed white children on developmental cognitive tests (Bayley, 1965). But African-American deficits on IQ tests are large by the age of 3 (Peoples et al, 1995), so if Bermudian children are, in fact, matching white American children on cognitive tests at age 4, that is certainly an underappreciated challenge for hereditarians. Unfortunately, Scarr provided no citations in her article, and when I first read it (around 2004), Google Scholar and Google Books weren’t around to help me search for this mysterious Bermudian study. Several years later, in 2007, I found a different study that strongly supported Scarr’s assertion.

Bermuda and The Adult Literacy and Life Skills Survey

“Literacy” is typically measured as a binary trait: either you can read or you can’t read. Lynn and Vanhanen (2006) correlate this kind of literacy with national IQ in IQ and Global Inequality:

The adult literacy rate is the percentage of people ages 15 and above who can, with understanding, read and write a short, simple statement related to their everyday life (Human Development Report, 2002, p. 272). Statistical data on adult literacy are in most cases estimations, which may be based on censuses or school enrollment statistics. (p. 82)

The correlation between national IQ and literacy is 0.66 (p. 103). But their data show that some undeveloped nations with the lowest IQs have high literacy levels. Humans appear capable of learning to read and write across the entire nonpathological range of IQ, but this does not preclude qualitative differences in literacy:

According to our hypothesis, differences in national IQs may explain a significant part of the contemporary global inequalities in literacy, although it is quite possible that, ultimately, the adult literacy rate will rise to near 100 percent in all societies. It should be taken into account, however, that there may be significant differences in the quality of literacy between countries, although the percentages are the same. (p. 111)

“Quality of literacy”, like the phrase “quality of schooling” used in some economics papers, is really a euphemism for ability differences. The Educational Testing Service (ETS) has already created “functional literacy” tests of this kind for national and international comparisons. Linda Gottfredson (2002) argues that these tests are “a surrogate measure of g” (pp. 359-367). They do not measure the ability to read as such (performance does not improve when the questions are delivered orally) but rather core mental faculties such as comprehension and reasoning:

… the [Adult Literacy and Life Skills Survey] defines skills along a continuum of proficiency. There is no arbitrary standard distinguishing adults who have or do not have skills. For example, many previous studies have distinguished between adults who are either “literate” or “illiterate”. Instead, the ALL study conceptualizes proficiency along a continuum and this is used to denote how well adults use information to function in society and the economy. (Desjardins et al., 2005, p. 15)

Americans were surveyed with the National Adult Literacy Survey (NALS) in the early 1990s, and large samples were subsequently tested in nearly 30 countries with its modification, the International Adult Literacy Survey (IALS). The Adult Literacy and Life Skills Survey (ALL), another international successor to the NALS, compares representative adult population samples from seven regions: Bermuda, Canada, Italy, the United States, Norway, Switzerland, and Nuevo Leon, Mexico.

The four ALL sub-tests are prose literacy, document literacy, numeracy, and problem solving.

Bermuda gave a strong performance on the ALL, placing third behind Norway and Switzerland. Its scores were slightly ahead of those of Canada and the United States, and far ahead of those of Italy and Nuevo Leon.

Three of these nations (the US, Italy, and Norway) also participated in the TIMSS 2003 international achievement test, which included our “Greenwich IQ”, the United Kingdom. Using these three nations as a bridge between the two tests gives Bermuda an Achievement Quotient (AQ) of 99 (Table I).

Table I: Achievement test scores in Bermuda

Admin | Sample | Age | N | Test | AQ | Reference
2003 | A | 16-65 | 2696 | ALL | 99 | Desjardins et al., 2005
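The post does not spell out how the three-nation bridge was built. One common approach is mean-sigma linear linking: fit the line that maps the bridge nations’ ALL scores onto their TIMSS scores by matching means and standard deviations, push Bermuda’s ALL score through that line, and express the result relative to the UK on a 15-point scale. The sketch below assumes that approach; the function names, the UK reference SD, and all of the scores are hypothetical placeholders, not the published ALL or TIMSS figures.

```python
from statistics import mean, pstdev


def mean_sigma_link(bridge_x, bridge_y):
    """Return (slope, intercept) mapping scale X onto scale Y.

    Mean-sigma linking: choose the line that gives the bridge nations
    the same mean and standard deviation on both scales.
    """
    slope = pstdev(bridge_y) / pstdev(bridge_x)
    intercept = mean(bridge_y) - slope * mean(bridge_x)
    return slope, intercept


def bridged_aq(target_x, bridge_x, bridge_y, ref_y, ref_sd_y):
    """Project a score on test X onto test Y, then express it as an AQ
    relative to a reference population (AQ 100, SD 15 on test Y)."""
    slope, intercept = mean_sigma_link(bridge_x, bridge_y)
    projected_y = slope * target_x + intercept
    return 100 + 15 * (projected_y - ref_y) / ref_sd_y


# Hypothetical placeholder scores, NOT the published ALL or TIMSS values.
all_bridge = [270.0, 280.0, 290.0]    # three bridge nations, ALL scale
timss_bridge = [470.0, 490.0, 510.0]  # same nations, TIMSS scale
aq = bridged_aq(285.0, all_bridge, timss_bridge, ref_y=500.0, ref_sd_y=80.0)
print(f"Bridged AQ: {aq:.1f}")
```

With real data you would plug in each bridge nation’s mean on both tests. With only three bridge points, the fitted line is sensitive to any single nation, which is one reason to treat a bridged AQ as approximate.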

Should the functional literacy test count as an achievement test or an intelligence test? At the very least, the ALL and PISA include problem-solving subtests that are not obviously related to learned material. These subtests seemingly have a greater conceptual claim on intelligence than, say, the Peabody Picture Vocabulary Test, the 10-item WORDSUM test from the GSS, or even several of the Wechsler subtests. From a psychometric standpoint, these tests are also better constructed for international comparisons (e.g. more thoroughly checked and corrected for test bias). I will nevertheless classify them as achievement tests for now, since specialists have neither validated nor widely recognized them as intelligence tests. But the ALL certainly still qualifies as evidence that Bermuda has an intelligence level comparable to that of Western Europe and its global diaspora.

Furthermore, the ALL classified Bermudians and Americans by race, which lets us compare the functional literacy scores of blacks and whites in both nations on the same test (Riley, 2006, p. 11; Rivera-Batiz, 2008, p. 16). Black-white gap? Nope. Table II shows the Achievement Quotients for all four groups, normalized against the UK TIMSS results. The U.S. gap is .74 SD, while the Bermuda gap is an invisible .03 SD.
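To put those standardized gaps on the familiar 15-point AQ scale, multiply by 15. A minimal sketch; the function names are illustrative, and only the two gap values come from the text.

```python
AQ_SD = 15.0  # AQ/IQ-style scale: mean 100, SD 15


def gap_in_sd(mean_a, mean_b, sd):
    """Standardized gap: difference in group means divided by a common SD."""
    return (mean_a - mean_b) / sd


def sd_to_points(gap_sd, scale_sd=AQ_SD):
    """Re-express a standardized gap in points on an IQ-style scale."""
    return gap_sd * scale_sd


# The two gaps reported in the text.
for country, gap in [("United States", 0.74), ("Bermuda", 0.03)]:
    print(f"{country}: {gap:.2f} SD = {sd_to_points(gap):.2f} AQ points")
```

On this scale the U.S. gap works out to roughly 11 AQ points and the Bermudian gap to under half a point.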

It is perhaps noteworthy, though, that the subtest scores still show a pattern familiar from cross-cultural testing: Bermudian blacks scored highest on verbal, lower on numeracy, and lowest on abstract thinking. The only statistically significant difference between blacks and whites in the Bermuda sample was that whites scored higher on problem solving (likely the most g-loaded subtest).

John Loehlin once noted that even though full-scale IQ scores did not vary with ancestry in Sandra Scarr’s crude admixture study (Scarr et al, 1977), the subtest patterns did, in fact, show a genetic correlation: black ancestry correlated with memory performance, while white ancestry correlated with abstract problem solving (Loehlin, 2000, pp. 187-188). Compare this with the international subtest patterns:

African blacks show the same test profile as US and Jamaican blacks, for example with strengths on perceptual and short term memory tasks and weakness on tests of abstract reasoning (this is for matched total IQ, remember).

The ALL survey also shows that functional literacy is strongly related to income and education in all the nations. Additionally, the data reveal no sex difference in Bermuda.

Finally, the ALL survey shows performance differences among adults aged 16-65. Between the two cohorts born roughly in 1947 and 1982, the average ALL score increases by 7.46 AQ points. (This is further evidence that the ALL is an intelligence test, since math and reading tests have not been shown to exhibit such gains.) This gain is also close to the 0.3 IQ points per year inflation rate seen on standard intelligence tests. Bermuda and the United States show the smallest score gains of the seven nations (6.51 and 3.48 points, respectively), and Bermudian scores even dip somewhat in the youngest cohort (Riley, 2006, p. vi).

(I can find nothing to suggest that Scarr’s California Achievement Test data for Bermuda were ever published. But at least through the 1980s and into the 1990s, the CAT was routinely administered to all government school children in Bermuda. There is no good reason to reject Scarr’s assertion that Bermudian middle schoolers outscored American children on this test of reading and mathematics.)

Lynn summarizes the two Bermuda studies as follows:

Bermuda. A study by Sandoval, Zimmerman, and Woo–Sam (1983) reported an IQ of 88 for a sample of 161 7–11-year-old children in Bermuda tested with the WISC–R. Scarr and McCartney (1988) have reported a study of 125 4 year-olds given the Stanford Binet. The sample was approximately representative of the racial mix, consisting of 61 percent Africans and 37 percent Europeans (Phillips’, 1996). The IQ of the sample was 92. The average of the two studies gives an IQ of 90 for Bermuda.

Sandoval et al. (1983) take their data from an unpublished doctoral dissertation (Astwood, 1974). I have requested this document from the university’s library, and I will update this post if it contains any data not reported by Sandoval. Lynn incorrectly reports the sample size from this paper, which is 92, not 161. He also incorrectly reports the IQ, which is 89, not 88. Sandoval et al. analyze the verbal subtest responses for item difficulty bias and conclude that the test is fair for use in Bermuda.

Sandra Scarr and Kathleen McCartney collaborated on several studies in Bermuda, each using a different sample of children. The first published IQ data come from McCartney et al (1982). The purpose of this research was to study the effects of day care quality on child outcomes. Bermuda was chosen as an ideal setting for this question because most Bermudian children spend the majority of the day in group day care facilities while their parents work (50% were found to be enrolled in day care in the first year of life, 84% by age 2, and 90% by age 3). Additionally, parents choose day care facilities based on proximity to their workplace rather than on reputation or quality. Both factors help filter out the unknown selection effects that can contaminate typical day care studies. A total of 159 children who had attended nine different day care centers since infancy were tested at age 4 with the Peabody Picture Vocabulary Test-Revised, and their guardians (almost all mothers) were given the same test. Of the children, 130 were black, 21 were white, and 7 were Portuguese.

The children’s mean IQ was 82.8, and the mothers’ was 85. Data are not reported by race, but the authors state that “White mothers scored higher on the PPVT” (p. 138), and mother’s IQ correlated .33 with mother’s ethnicity. For comparison, the correlation between race and Wordsum IQ in the General Social Survey is .22. The authors find that the mother’s race is highly correlated with child IQ, while day care variables are not:

Age of entry into group care and number of hours spent in group care in the first three to four years of life had no significant effect of [sic] PPVT scores, nor did differences in the qualities of the day care environments (p. 140)

Scarr & McCartney (1988) report Bermudian data for the Mother-Child Home Program, an intervention in which trained teachers visit mothers at home dozens of times over two years and show them ways to play and interact with their children that are hypothesized to benefit IQ development.

Lynn reports a sample size of 125 for this study, but some of the children dropped out or died, so the child sample is actually 117. The mothers were also tested with the vocabulary subtest of the Wechsler Adult Intelligence Scale, which provides data for 117 more people. Lynn reports an IQ of 92, but the IQ scores reported in this paper are nowhere close to this number: the SBIS IQs are 106.6 for the treatment group and 103.1 for the control group (p. 539). A Flynn adjustment of 2 points gives an IQ of 101 for the control group. The maternal scores on the WAIS subtest were significantly lower (see Table III). The MCHP treatment program showed no significant effects on child intelligence.

Scarr’s third and largest IQ study on the island, conducted in cooperation with the Bermuda government, involved testing nearly every 2- and 4-year-old with the Stanford-Binet Intelligence Scale. This was the Islandwide Screening, Assessment, and Treatment Program. A total of 1020 children were tested, some 86% of the children in the targeted age range on the entire island. This is almost certainly the study Scarr was referring to in her 1987 editorial, but the full data from this study were never published.

Data from smaller subsamples of this screening program are reported by Scarr et al (1994). Of the children screened, 75.5% passed the screening assessment for cognitive and language disabilities (p. 206), and the mean SBIS IQ of a small random sample of these children was 102.5. Another 10.1% failed the screen for cognitive disability; two small intervention groups of these children averaged an IQ of 82. The remaining 14.4% failed the screen for language disability; two small intervention groups of these children averaged 92. A weighted average of these subsamples gives us an IQ of 100 for Bermuda, very close to the value reported by Scarr (1987). However, the Stanford-Binet norms used by these researchers were 12 years out of date, and there is no indication that the reported data were corrected for this. A Flynn adjustment of -3.6 points gives us a somewhat lower IQ of 96.
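The Flynn adjustment is simple arithmetic: about 0.3 points of norm inflation per year (the rate cited in the ALL discussion above), times the 12 years by which the norms were outdated, gives 3.6 points. A sketch, treating a constant 0.3-point annual rate as an assumption; the function name is illustrative.

```python
FLYNN_RATE = 0.3  # assumed IQ points of norm inflation per year


def flynn_adjust(iq, years_since_norming, rate=FLYNN_RATE):
    """Deflate an IQ scored against outdated norms by rate * years."""
    return iq - rate * years_since_norming


# Screening subsamples: weighted IQ of 100 on norms about 12 years old.
print(f"{flynn_adjust(100, 12):.1f}")
# Scarr's (1987) figure of 99, corrected the same way.
print(f"{flynn_adjust(99, 12):.1f}")
```

Applied to the 99 in Scarr (1987), the same correction gives roughly the 95 used in the concluding remarks.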

These four studies give us eight IQ samples for Bermuda (Table III). A weighted average of the six normal samples (that is, excluding the two Mother-Child Home Program treatment-group samples coded SA) gives an IQ of 89 for Bermuda.

Table III: IQ test scores in Bermuda

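Admin | Sample | Age | N | Test | IQ | Reference
1980 | A | 4 | 159 | PPVT | 83 | McCartney et al, 1982
1980 | A | Adult | 159 | PPVT | 85 | McCartney et al, 1982
1975 | A | 7-11 | 92 | WISC-R | 89 | Sandoval et al, 1983
1978 | SA | 4 | 78 | SBIS | 104 | Scarr & McCartney, 1988
1978 | A | 4 | 39 | SBIS | 101 | Scarr & McCartney, 1988
1978 | SA | 28 | 78 | WAIS | 91 | Scarr & McCartney, 1988
1978 | A | 28 | 39 | WAIS | 92 | Scarr & McCartney, 1988
1985 | A | 4 | 108 | SBIS | 96 | Scarr et al, 1994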
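The weighted average of 89 quoted just before Table III can be reproduced from the table itself. This sketch assumes the two rows coded SA are the Mother-Child Home Program treatment-group samples excluded from the six “normal” samples (that reading reproduces the 89); the row descriptions are labels added here, not the authors’ wording. It also reruns the check from the concluding remarks, replacing the Scarr et al (1994) subsample with the full screening sample (N = 1020) at a hypothetical Flynn-corrected IQ of 95.

```python
# Rows of Table III as (sample code, N, IQ, description).
TABLE_III = [
    ("A", 159, 83, "PPVT, children (McCartney et al, 1982)"),
    ("A", 159, 85, "PPVT, mothers (McCartney et al, 1982)"),
    ("A", 92, 89, "WISC-R, children (Sandoval et al, 1983)"),
    ("SA", 78, 104, "SBIS, children, SA group (Scarr & McCartney, 1988)"),
    ("A", 39, 101, "SBIS, children (Scarr & McCartney, 1988)"),
    ("SA", 78, 91, "WAIS, mothers, SA group (Scarr & McCartney, 1988)"),
    ("A", 39, 92, "WAIS, mothers (Scarr & McCartney, 1988)"),
    ("A", 108, 96, "SBIS, children (Scarr et al, 1994)"),
]


def weighted_mean_iq(rows):
    """Sample-size-weighted mean IQ over (code, n, iq, label) rows."""
    total_n = sum(n for _, n, _, _ in rows)
    return sum(n * iq for _, n, iq, _ in rows) / total_n


# Assumption: the six "normal" samples are the rows coded A.
normal = [row for row in TABLE_III if row[0] == "A"]
print(f"Six normal samples: {weighted_mean_iq(normal):.1f}")

# Sensitivity check: the Scarr et al (1994) row is last in `normal`;
# swap it for the full screening sample at a hypothetical IQ of 95.
with_screening = normal[:-1] + [("A", 1020, 95, "islandwide screening")]
print(f"With full screening sample: {weighted_mean_iq(with_screening):.1f}")
```

Because the inputs are the rounded IQs from Table III, the results should be read to the nearest whole point.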

Concluding Remarks

My own IQ average for Bermuda is actually a little lower than Richard Lynn’s. The weighted average would obviously be higher if Sandra Scarr had published her data from the islandwide screening of Bermudian children. I have requested the unpublished numbers from Scarr (who has so far ignored my emails), but I’m not going to freely extrapolate study details from the vague claims in her 1987 article, which doesn’t even clearly indicate that she is referring to the screening project. But even assuming that Scarr’s claim of an IQ of 99 was for the islandwide screening sample (N=1020), the deviation IQs from this project reported by Scarr et al (1994) do not suggest that a Flynn correction was made for the outdated Stanford-Binet norms. This correction would give an IQ of 95 for the screening project, and the weighted average IQ for Bermuda would still be only 92. Given the approximate demographics of the island, this would be consistent with a white IQ of 100 and a black IQ of 88.

The ALL adult literacy scores are a significant obstacle to this interpretation. The ALL seems more like an IQ test than an achievement test, and the study is so much larger and better conducted than all the other intelligence studies that it feels like it should be weighted accordingly. On the other hand, we might partly reconcile the conflicting data if we treat the ALL as an achievement test: large IQ-achievement discrepancies were also found for Cuba and the Dominican Republic (which is not to imply that those discrepancies aren’t a huge problem as well). This would also be consistent with Sandra Scarr’s (unverifiable) claim that Bermudian sixth graders were outscoring US norms on the California Achievement Test during the 1980s.

The lack of race differences in the ALL is yet another problem. The ALL showed no Bermudian race differences in functional literacy, even among 60-year-olds. However, there are large sociological gaps between blacks and whites in Bermuda, much as there are in the United States. (I don’t intend to summarize non-cognitive gaps in my HVGIQ series, but a detailed review of black-white differences in Bermuda can be found in Mincy et al (2009), and a more recent, concise summary appeared in an article in a major Bermuda newspaper.) There are large B-W differences in earnings, crime, employment, and educational attainment in Bermuda. It seems implausible that all the same racial gaps would show up in Bermuda and the US and yet have almost completely opposite causes. (Of course, someone might similarly argue that this makes discrimination-type explanations more plausible for both places!)

Further, even though the precise gaps were not reported, race was the strongest predictor of IQ scores in McCartney et al (1982, p. 138). This brings us back to the question of why five out of six samples showed the IQ scores one would more reasonably expect for an island that is two-thirds black and one-third white. Given this and the large B-W social disparities in Bermuda, I’m inclined to accept these results over the ALL, but I confess to a significant amount of uncertainty about all these conflicting test scores.

࿔࿔࿔

REFERENCES
Astwood, N.C. (1974). A comparison of American and Bermudian children on the Wechsler intelligence scale for children-revised. Unpublished doctoral dissertation, Adelphi University, USA.

Mincy, R.B., Jethwani-Keyser, M., & Haldane, E. (2009). A Study of Employment, Earnings, and Educational Gaps between Young Black Bermudian Males and their Same-Age Peers. NY, USA: Columbia University School of Social Work.

8 Comments

Ya, this doesn’t look terribly good for a hereditarian hypothesis. See pages 13-14 here also. Bermudans consistently score at around the 47th percentile of the US Terra Nova normative sample (in math and reading). (The Terra Nova is the California Achievement Test.) Based on the 2000 NAEP Main assessment, the US national score was 0.33 SD below the White score. Assuming a rough equivalence between Terra Nova and NAEP, the Bermudian score is about 0.4 SD below the US White score, i.e., AQ 94. If White Bermudans had an AQ of 100, the lowest AQ that Black Bermudans could have would be 90.

You’ll have to forgive me for not reading the entire 53 page ALL report (Riley, 2006), but while looking at how the participants were chosen, I notice this problem they encountered:

“However, during this extended data-gathering period, two major unexpected setbacks were encountered. Firstly, a general election was called in July 2003, with both political candidates and interviewers calling on some of the same households. This led to some householders rejecting the visits of the interviewers. Secondly, on 5 September 2003, the worst hurricane in Bermuda’s recent history, Hurricane Fabian, struck the island causing millions of dollars in damage. Householders were, therefore, preoccupied with getting their lives back to normal.”

The first setback seems innocuous enough, but the second one, the hurricane, is of interest. Perhaps the hurricane produced an elite sample? It’s reasonable to assume that the households least burdened financially by the hurricane would have been of higher SES than the general population.

The larger 300 page report linked above (Desjardins, 2005) goes into more details about the sampling procedures. Survey weights are used to correct for potential nonresponse problems like that (p. 326). But Bermuda also had the highest response rate of any country (82%, p. 327). And that’s actually a lot higher than the response rate of most of the other nations (e.g. Switzerland (40%) or Norway (56%)).

It’s odd that Riley (2006) says the hurricane interrupted data collection, because the hurricane struck in September 2003, but the Desjardins report says that Bermudian data collection ended in August 2003 (p. 322).

This actually belongs in the comments of another post (Fuzzy Wuzzy was a bear…) but I didn’t want it to get lost in the archives.

A while back I did an analysis of Hispanic immigrant children’s performance as measured in the New Immigrant Survey, which used four subtests of the Woodcock-Johnson III Tests of Achievement. I was reminded of it because the test scores seemed similar to those mentioned in the Fuzzy Wuzzy post. Anyway, I blogged the results, and while I have since taken down the blog, the Wayback Machine remembers.

The Latino immigrant kids scored a composite of around 94 at age 12, regardless of whether they were tested in English or Spanish, against a (white?) mean of 100. Non-Latino immigrant children averaged around 96.

I don’t know enough about the literature to know if these results add anything, but I thought I would mention them.

“Described by the Washington Post as an “African-American who appears to be white,”[5] Butterfield goes out of his way to tell people that he is African American. He has noted having grown up on the “black” side of town, and led civil rights marches. He is proud of his Black identity.[6] and is a member of the Congressional Black Caucus.”