At what age does the cognitive ability gap between blacks and whites first appear? At what age does the black-white ability gap stop growing?

Knowing the answers to these questions is vital to understanding the etiology of the black-white ability gap, especially if this gap has an environmental cause. However, the only scholarly work that attempts to investigate these issues is John Loehlin’s Race Differences in Intelligence (1975), which is nearly 40 years old. So I will update and expand upon that review here on Human Varieties by summarizing all available measurements of African American cognitive ability from early infancy to age 3; I will also discuss the relevance of this data to current debates in the social sciences.

Current Research Opinions

The onset and development of the black-white intelligence gap hasn’t generated much serious scholarly interest. Sociologist Meredith Phillips notes that: “Research on how the black-white test score gap changes with age is sparse and contradictory.” (Phillips et al, 1998, p. 231)

A good example of this is that even the late Arthur Jensen, the world’s foremost expert on the black-white intelligence gap, wrote two contradictory opinions on the development of the gap.

One of his last major articles, in collaboration with the late J. Philippe Rushton, argued that the full gap is apparent at age 3, and that there is no further expansion:

“Racial-group differences in IQ appear early. For example, the Black and the White 3 year-old children in the standardization sample of the Stanford–Binet IV show a 1 standard deviation mean difference after being matched on gender, birth order, and maternal education (Peoples, Fagan, & Drotar, 1995). Similarly, the Black and the White 2 1⁄2- to 6-year-old children in the U.S. standardization sample of the Differential Aptitude Scale have a 1 standard deviation mean difference … (Lynn, 1996). The size of the average Black–White difference does not change significantly over the developmental period from 3 years of age and beyond (see Jensen, 1974, 1998b).” (Rushton & Jensen, 2005, pp. 240-241. Emphasis always mine.)

Their first reference in support of the bolded statement—(Jensen, 1974)—uses black siblings from a California school district, in lieu of longitudinal data, to look for a cumulative deficit in IQ scores as black children get older. The older siblings did not show lower IQs than their younger siblings over the age range from 6-12, which suggests a general stability in black IQ through the early years of schooling. This is supportive of the bolded claim. However, the second citation—Jensen’s magnum opus, The g Factor (1998)—is not supportive of the bolded claim. I’ll simply quote Jensen’s entire short discussion of this issue:

“Age Variation. Black infants score higher than white infants on developmental scales that depend mainly on sensorimotor abilities. Scores on these infant scales have near-zero correlation with IQ at school age, because the IQ predominantly reflects cognitive rather than sensorimotor development. Between ages three and five years, which is before children normally enter school, the mean W-B IQ difference steadily increases. By five to six years of age, the mean difference is about 0.70σ (eleven IQ points), then approaches about 1σ during the elementary school years, remaining fairly constant until puberty, when it increases slightly and stabilizes at about 1.2σ. The latest (1986) Stanford-Binet IV norms show a W-B difference in prepubescent children that is almost five IQ points smaller than the W-B difference in postpubescent children. (The W-B difference is 0.80σ for ages 2 through 11 as compared with 1.10σ for ages 12 through 23.) This could constitute evidence that the mean W-B difference in the population is decreasing. Or it could simply be that the W-B difference increases from early to later childhood. The interpretation of this age effect on the size of the W-B mean difference remains uncertain in this instance, as it is based entirely on cross-sectional rather than longitudinal data. Both kinds of data are needed to settle the issue. The cause of variation in the mean IQ of different age groups all tested within the same year (a cross-sectional study) may not be the same as the cause of variation (if any) in mean IQ of the same group of individuals when tested at different ages (a longitudinal study).” (Jensen, 1998, pp. 358-359)

So according to Rushton and Jensen (2005), the black-white gap is fully formed by age 3 and does not widen thereafter, but according to Jensen (1998), much of the gap develops between ages 3 and 5, and more than 1/3 of the gap forms after blacks enter school. Both sources even use the same standardization data from the Stanford-Binet IV to make contradictory arguments. (I suspect the explanation for this is that J.P. Rushton was responsible for most of the content in their collaborative papers during the 2000s.)

Oddly enough, the scholarly view was probably closer to Rushton and Jensen (2005) in 1998, and closer to Jensen (1998) in 2005! For example, Christopher Jencks and Meredith Phillips’ The Black-White Test Score Gap (1998) also argued that the B-W gap was about 1 SD by age three, without much further growth during primary school. In contrast, three popular papers published in the mid-2000s argued that performance gaps have narrowed appreciably among young preschool age children since the 1980s, and that much of the gap occurs after children enter school (Fryer & Levitt, 2004; Fryer & Levitt, 2006; Dickens & Flynn, 2006a).

Fryer and Levitt (2004; 2006) analyze math and reading scores in the Early Childhood Longitudinal Study (which started tracking kindergarteners in 1998). They find a B-W gap of 0.64 SD at school entry, which gradually expands to 0.83 SD by the end of third grade. The two economists are much more optimistic in the earlier 2004 paper, which only follows performance through the 1st grade. The gap at school entry is smaller in the ECLS than in previous longitudinal datasets (cf. Phillips et al, 1998), and it virtually disappears with a modest number of sociological controls, which Fryer and Levitt believe means that the gap has narrowed significantly since the 80s and 90s. But the 2006 paper—which follows the same children through the 3rd grade—takes on a more somber tone because the gap grows steadily each year, and their socioeconomic controls cease making a large difference in the size of the gap.

Dickens & Flynn (2006a) analyze nine standardization samples from four different major IQ tests, and they also find a narrowing gap over time. A literal reading of their source data would seem to suggest that the gap is now about .73 SD among young children (e.g. the latest Stanford-Binet standardization from 2001 shows an 11 point gap among children younger than 7, while the latest Wechsler standardization from 2002 shows an 11 point gap among 7 year olds). But Dickens & Flynn pool the data and extrapolate the “real” black-white gap from the trend lines. This gives them an estimated IQ of 95.4 for black 4 year olds in 2002, and an IQ of 90.5 for black 12 year olds.

Charles Murray subsequently looked for evidence of a narrowing gap in the Woodcock-Johnson battery, and found further support for Jensen’s (1998) original model of the B-W gap in early childhood:

“In infancy, the B–W difference can be close to zero (Fryer & Levitt, 2004). The difference rises through the preschool years, usually reaching about 0.70σ on full-scale IQ batteries by 5 to 6 years of age, then rising within a few years to about 1.0σ where it stabilizes for the rest of elementary school (Jensen, 1998). The results from the Woodcock–Johnson standardizations follow the common pattern for IQ tests. The B–W differences for children tested at age five [is] 0.57σ … rising immediately thereafter to [0.87σ ] at age 6 …” (Murray, 2007, p. 4)

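So here we have two competing models of black-white IQ differences in early childhood. The stability model (Rushton & Jensen, 2005; Jencks & Phillips, 1998) posits that the black-white IQ gap is already about 1 SD by age 3, and does not grow significantly larger after children enter school. This is also the model that Loehlin et al (1975) found the most empirical support for in the 1970s. The growth model (Jensen, 1998; Fryer & Levitt, 2006; Dickens & Flynn, 2006a; Murray, 2007) posits that the gap is smaller in the preschool years and widens substantially after children enter school.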

Neither model needs to be completely true, or universally true. It’s certainly possible that the stability model fit the data better in the 1970s, but has not held since the 1990s.

Evaluating Growth vs. Stability

Fryer & Levitt’s (2004; 2006) dataset, the ECLS, is a large, representative, and recent sample, but their analysis does not provide strong evidence that racial gaps grow during elementary school. At least three different research teams have shown more stable gaps in the same dataset, all noting that Fryer & Levitt’s expanding gaps are a consequence of methodological choices (Murnane et al, 2006; Koretz & Kim, 2007; Bond & Lang, 2012). Murnane et al (2006) also examine math and reading gaps in another good dataset from the 1990s, the NICHD (National Institute of Child Health and Human Development Study), and do not find growing gaps in the first three years of school.

A meta-analysis of 8 different surveys of schoolchildren conducted between 1965 and 1992 showed that math, reading, and vocabulary gaps only increased by .14 SD, on average, between the 1st and 12th grade (Phillips et al, 1998, p. 236). This is a comparatively small divergence, but it’s still larger than zero. It’s worth noting, then, that achievement test gaps shouldn’t necessarily exhibit the same patterns as intelligence test gaps. In fact, a reasonable prediction from an initial intelligence gap between two groups is that there will be an increasing divergence in learned skills. The effect of intelligence on learning plausibly increases over time, since newer, more abstract skillsets are dependent on the numerous less abstract skills that one has already assimilated. Fryer and Levitt (2006) actually wrestle with this idea (pp. 273-279), but ultimately do not know how to test it.

More direct evidence, of course, can come from looking at performance on intelligence tests instead of on achievement tests. Jensen’s (1998) claim that the IQ gap grows from .70 SD at age 6 to 1 SD during elementary school and 1.2 SD by young adulthood was actually taken from Audrey Shuey’s data in The Testing of Negro Intelligence (1966). This can’t be treated as an accurate summary of the literature. Even the data in Shuey’s book doesn’t really support lower scores during preschool: Shuey notes that the IQ gap for preschoolers in studies published between 1922 and 1945 was 9 points, while the IQ gap for preschoolers in the studies from 1945-1965 was 16.5 points (p. 30). There was also a follow-up volume to this book covering all the data published between 1966 and 1979: The Testing of Negro Intelligence (Vol. 2) (Osborne & McGurk, 1982). While the first volume had 17 studies for children ages 2-6, the second volume listed 49 studies for this age group. Osborne and McGurk find a gap of 20 IQ points for preschoolers (1.33 SD), which is larger than their gaps for school children and adults (p. 291). The combined estimates of both volumes do not support a growth model. Of course, both books are old, and there is no volume 3 to document possible trends over the last several decades.

Dickens & Flynn (2006a) is therefore the most appropriate study to answer this question, because it looks at recent intelligence data from nationally representative samples. And, as I noted above, the most recent norms for the Stanford-Binet and the Wechsler Intelligence Scale for Children do lend credence to the growth model. Dickens and Flynn compare their estimates to Jensen’s: “Using Shuey’s 1966 data, Jensen (1998) estimated a gap of 0.70 standard deviations in early childhood, 1.00 standard deviations in middle childhood, and 1.20 standard deviations in early adulthood. Our current estimates are 0.31 (age 4), 0.63 (age 12), and 0.87 (age 18).” (Dickens & Flynn, 2006b, p. 923)

James Flynn repeats this estimate—a 4.6 point gap at age 4—as a basic fact in his book What is Intelligence? (2007, p.123):

“By 2002 the mean IQ of black American children aged 4 had risen to 95.4. This puts them less than 5 points below white 4-year-olds at 100. However, by the age of 24, blacks lose fully 12 points and sink to 83.4, almost 17 points below whites. In other words, they lose 0.60 points per year as they age.”
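The arithmetic in Flynn's passage can be checked directly. The sketch below uses only the figures quoted above and assumes, as the passage does, a white mean fixed at 100; the variable names are illustrative.

```python
# Figures quoted from Flynn (2007, p. 123); the white mean is fixed at 100.
WHITE_MEAN = 100.0
black_iq_age_4 = 95.4
black_iq_age_24 = 83.4

gap_age_4 = WHITE_MEAN - black_iq_age_4     # points below the white mean at age 4
gap_age_24 = WHITE_MEAN - black_iq_age_24   # points below the white mean at age 24

# Total decline between the two ages, and the implied decline per year.
points_lost = black_iq_age_4 - black_iq_age_24
loss_per_year = points_lost / (24 - 4)

print(round(gap_age_4, 1), round(gap_age_24, 1))
print(round(points_lost, 1), round(loss_per_year, 2))
```

The rounded values match the quoted "less than 5 points", "almost 17 points", 12 points lost, and 0.60 points per year.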

According to Dickens & Flynn (2006b) there is no real data which contradict this estimate: “No recent data pose a serious challenge to our current [estimate] of Black IQ: 95.4 at age 4…” (p. 924)

A Meta-Analysis of African-American IQ at Age 3

My own view on this matter—informed by the scholarly review Race Differences in Intelligence (1975), and the studies by Montie & Fagan (1988) and Peoples et al (1995)—is that the full one standard deviation IQ gap between blacks and whites is apparent by at least 36 months (age 3) on standard IQ tests. But the first reference draws its conclusions from only a small number of older studies, and the results from the two more recent references may not be typical. No systematic review has ever been written on IQ gaps by age 3.

So over the past several weeks, I’ve combed the research literature for all existing data. I started with Audrey Shuey’s early literature review (1966), which contains 17 studies for preschool age children between 1922 and 1965. Only two of these studies were for children age 3 (Rhoads et al, 1945, & Horton & Crump, 1962). For this analysis I only included studies that reported, at minimum, a clear mean IQ and a sample size, which excluded the Horton study (the paper suggests that, compared with whites, twice as many black 3 year olds scored below the 4th percentile on the Merrill Palmer Mental Test in 1958).

Next I turned to Osborne & McGurk’s (1982) horribly organized follow-up volume, which included 49 studies for preschool age children published between 1966-1979. This yielded about 8 more usable studies.

Then I turned to Google Scholar and Google Books, combining the names of all known IQ tests that are designed to evaluate three year olds (“stanford-binet”, “mccarthy scales”, etc) with standard race and age terms (“black”, “negro”, “african-american”, “36 months”, “age 3”, “3 years”, etc). This yielded maybe 100 promising leads, though most books and research papers did not contain the desired data.

Next I analyzed two national surveys with available data for children in this age group.

I also emailed out requests to the publishers of the IQ tests with standardization data for 3 year olds. In several cases I have not yet received a reply, but one major publisher did write back to say that they don’t release their ethnic data to researchers under any circumstances. I will update this post in the future if any norm data is forthcoming.

Together these methods yielded 29 usable published studies. I excluded obviously redundant samples, although I did leave in a few borderline cases (e.g. Milgram & Ozer, 1967; & Milgram, 1971.)

Ultimately my dataset includes 35 different samples of children born between 1936 and 2000. There is IQ data for 2569 black children and 2762 white children, age 3. This information is reported in Table 1. Here’s a guide to reading the table:

Admin is the year the tests were administered. If the administration date wasn’t reported in the paper, I used the year prior to publication.

Birth is the birth year of the children. This was often decided by subtracting three years from the known or estimated administration date.

Age is reported in months. If the paper just said three years old, this is listed as 36 months. All ages from 30 to 41 months were included as age 3.

Sample is a representativeness rating: “D” is for disadvantaged samples, selected on the basis of low socioeconomic status or other handicaps; “P” is for privileged samples, selected for higher than average socioeconomic status or other advantages; “N” is for normal samples that are not obviously or egregiously unrepresentative.

Test is the IQ test; the various test abbreviations are listed under Table 1. The next six columns are the IQs, standard deviations (SD) and sample sizes (N) for the black and white children tested. The d column is the standardized mean difference between the white and the black samples in the same study. I used 15 as a general standard deviation rather than the individual SDs in each sample.

Reference is the citation information. Different samples are separated by the alternating colored bands. Many references included IQ data for two or more tests on the same sample. Sometimes separate references reported different test data for the same sample (e.g. Berlin 1995 and Brooks-Gunn 1996), so they share the same colored band. One reference (Slaughter, 1983) contains four rows, because the same sample was tested at age 32 months with two different tests, and again at 41 months with two different tests.
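The d column described in the guide above can be sketched in a few lines. This is a minimal illustration, assuming d is simply the white mean minus the black mean divided by the general SD of 15; the function name is hypothetical, and the example means are the ones reported in the Results section for the 20 samples that include both groups.

```python
GENERAL_SD = 15.0  # the post uses 15 for every sample, not each sample's own SD


def standardized_difference(white_mean, black_mean, sd=GENERAL_SD):
    """Return the white-minus-black mean difference in SD units."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (white_mean - black_mean) / sd


# Means reported in the Results section for the 20 two-group samples.
d = standardized_difference(100.8, 85.4)
print(round(d, 2))
```

Using one fixed SD keeps d comparable across samples whose own SDs are small-sample estimates, at the cost of ignoring real differences in spread.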

Six of the 35 samples are my own analyses. The five samples labeled ‘Malloy 2013a’ are data from the CNLSY79 (Children of the National Longitudinal Survey of Youth 1979). I am not the first person to publish the IQ scores of the youngest children in this sample. Farkas & Beron (2004) reported that blacks score 17.2 points below whites on the PPVT in this dataset at age 36 months (p. 478). More recently, Bond & Lang (2012) reported a slightly smaller, 14.6 point gap for 3-year-olds in this dataset (p. 13). The reported values from these two papers are inconsistent, and neither paper contains specific sample sizes or separate scores by birth year, so I re-analyzed the data.

The CNLSY contained data for 284 black 3-year-olds and 533 white 3-year-olds born between 1983 and 1991. The average black score was 79.6 and the average white score was 96.3. This is a difference of 16.7 points (1.11σ), which falls between the scores reported by Farkas & Beron (2004) and Bond & Lang (2012).

The sample labeled ‘Malloy 2013b’ is my analysis of the Early Head Start Research and Evaluation dataset, which includes the largest sample size in Table 1. Once again, I am not the first person to publish the early IQ scores from this sample (Love et al., 2002; Coon, 2007; Berlin et al, 2009; Raikes et al., 2010), but I needed slightly more information than what was reported in these papers.

Results

I applied Flynn Effect adjustments to the data when possible, but the specific administration dates and test norms are not always reported. The weighted average IQ for the 14 disadvantaged black samples is 84.9, the average for the 16 normal samples is 86.5, and the average for the 5 privileged black samples is 99.4. The average IQ of all 35 samples is 86.7.
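The two steps mentioned above, a Flynn Effect adjustment followed by a weighted average, might look roughly like the sketch below. Several things here are assumptions rather than the post's stated method: the rate of 0.3 points per year is a commonly cited figure and not one given in the post, the weights are assumed to be sample sizes, and the input rows are hypothetical and do not come from Table 1.

```python
from dataclasses import dataclass

# ASSUMPTION: 0.3 IQ points per year is a commonly cited Flynn Effect rate;
# the post does not say which rate it used.
ASSUMED_POINTS_PER_YEAR = 0.3


@dataclass
class Sample:
    mean_iq: float   # reported mean IQ
    n: int           # number of children tested
    admin_year: int  # year the test was administered
    norm_year: int   # year the test was normed


def flynn_adjusted(sample, rate=ASSUMED_POINTS_PER_YEAR):
    """Lower a mean by `rate` points for each year the norms are out of date."""
    return sample.mean_iq - rate * (sample.admin_year - sample.norm_year)


def weighted_mean(samples, rate=ASSUMED_POINTS_PER_YEAR):
    """Sample-size-weighted average of the adjusted means."""
    total_n = sum(s.n for s in samples)
    if total_n == 0:
        raise ValueError("samples contain no children")
    return sum(flynn_adjusted(s, rate) * s.n for s in samples) / total_n


# HYPOTHETICAL inputs for illustration only; these are not rows from Table 1.
demo = [
    Sample(mean_iq=90.0, n=50, admin_year=1970, norm_year=1960),
    Sample(mean_iq=84.0, n=150, admin_year=1990, norm_year=1990),
]
print(round(weighted_mean(demo), 2))
```

Weighting by sample size keeps a handful of very small studies from dominating the average, which matters when most samples have fewer than 100 children.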

A majority of these samples contain control groups of whites who were tested at similar times, and under similar conditions. When we compare the 20 samples with both blacks and whites, we get a difference of almost exactly 1 standard deviation: the black IQ is 85.4 and the white IQ is 100.8 (15.4 points/1.03σ). A slightly different version of this is comparing the 16 normal black samples with the 13 normal white samples, which gives us virtually the same result: black IQ=86.5, white IQ=102.3 (15.8 points/1.05σ)

The 4 middle class white samples averaged an IQ of 106.6. This gives us a gap of 7.2 (.48σ) for the privileged samples. Similarly, the Early Head Start Project, the only sample with both disadvantaged blacks and whites, gives us an IQ gap of 7.8 (.52σ).

The test most frequently given to these early samples is the Stanford-Binet. The weighted average IQ of the 18 black samples tested on the Stanford-Binet is 88.5, while the average for the 9 samples of whites on this test is 104.6. This is a 16.1 point difference (1.07σ).

The second most commonly administered test at this age is the Peabody Picture Vocabulary Test. The average IQ of the 17 black samples tested on the PPVT is 79, while the average of the 9 white samples is 94.3. This is a 15.3 point difference (1.02σ).

Figure I is a scatterplot of these early black IQ scores over time by birth cohort. The oldest study tilted the trendline slightly downward, so I removed this outlier, and the five-decade trendline then appears completely flat. There appears to be no discernible pattern of increasing or decreasing scores over time. I also looked at the B-W gap over time (the d column). Once again the initial trendline suggested an increasing gap, but removing the same outlier reveals no changes over time.
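The sensitivity check described above, fitting a trendline with and without the oldest study, can be sketched as follows. The post does not say how its trendline was fitted, so ordinary least squares is an assumption here, and the (birth year, mean IQ) pairs are hypothetical values chosen to show the effect, not data from Table 1.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# HYPOTHETICAL (birth_year, mean_iq) pairs for illustration only.
points = [(1936, 96.0), (1965, 86.0), (1975, 86.0), (1985, 86.0), (1995, 86.0)]

# Drop the sample with the earliest birth cohort, as the post does.
oldest = min(points, key=lambda p: p[0])
trimmed = [p for p in points if p is not oldest]

slope_all = ols_slope(*zip(*points))
slope_trimmed = ols_slope(*zip(*trimmed))
print(slope_all < 0, slope_trimmed == 0.0)
```

A single early point far from the other cohorts has high leverage, so it can set the sign of the slope on its own; comparing the two fits shows how much of the trend depends on it.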

Figure I: Black IQ, Age 3: 1960s-2000s

Conclusion

Arthur Jensen argued that the B-W IQ gap increases from 0.70 SD at age 6 to 1.2 SD in adolescence (Jensen, 1998). On the other hand, Arthur Jensen argued that the B-W IQ gap is about 1 SD as early as age 3 and does not grow significantly larger over time (Rushton & Jensen, 2005).

As usual, Arthur Jensen was right.

Fryer & Levitt’s (2004; 2006) popular research showing growing gaps in the first three years of school in the ECLS is probably not a genuine trend. A meta-analysis of 8 large surveys showed only slightly widening gaps on achievement tests from the 1st to the 12th grade (Phillips et al, 1998). Other researchers have not only found patterns contrary to Fryer and Levitt in other recent datasets, but also in the same dataset used by Fryer and Levitt (Murnane et al, 2006; Koretz & Kim, 2007; Bond & Lang, 2012).

Farkas & Beron (2004) used longitudinal data from the CNLSY to track children’s scores on the PPVT from early childhood through junior high school. They showed that PPVT gaps were at least 1 SD at age 3, and did not continue to grow after children entered kindergarten, or as they progressed through school (Figure II).

Much like Fryer & Levitt (2004), Dickens & Flynn (2006a) argue that performance gaps have dramatically narrowed among young children since the 1980s, and that gaps grow much wider after children enter school. Extrapolating from recent standardizations of the Stanford-Binet and Wechsler intelligence tests, Dickens and Flynn argue that the IQ gap among black and white four year olds since the year 2000 is only 4.6 points (.31σ). They even boldly state that “no recent data pose a serious challenge” to this estimate. (Dickens & Flynn, 2006b, p. 924)

This post adds evidence relevant to these interrelated issues by confirming for the first time that a gap of 1 full standard deviation is already apparent on IQ tests at 36 months of age, and that there has been no obvious convergence in this early performance difference over time.

If there is a 1 SD gap at age 3, this precludes an IQ gap that has room to grow much wider during school, unless A) the B-W school age gap is larger than we previously thought, or B) the IQ gap actually shrinks between ages 4-6, and then grows wider again later on. Neither of these theories is particularly compelling.

Evidence concerning later gaps will have to wait, though; my next post moves in the opposite direction. Part two of this post will discuss research opinions on the measurement of intelligence before 36 months, and summarize the data on cognitive gaps from early infancy through age 2.

࿔࿔࿔

REFERENCES

Bakeman, R., & Brown, J.V. (1980). Early interaction: consequences for social and mental development at three years. Child Development, 51, 437-447.

Campbell, F.A., & Ramey, C.T. (1995). Cognitive and school outcomes for high-risk African-American students at middle adolescence: Positive effects of early intervention. American Educational Research Journal, 32, 743-772.

Horton, C.P., & Crump, E.P. (1962). Growth and Development XI. Descriptive Analysis of the Backgrounds of 76 Negro Children Whose Scores are above or below Average on the Merrill-Palmer Scale of Mental Tests at three Years of Age. Journal of Genetic Psychology, 100, 255-265.

Koretz, D., & Kim, Y.S. (2007). Changes in the Black-White Test score Gap in the Elementary School Grades (CSE Report 715). National Center for Research on Evaluation, Standards, and Student Testing, University of California, Los Angeles.

Murray, C. (2007). The magnitude and components of change in the black–white IQ difference from 1920 to 1991: A birth cohort analysis of the Woodcock–Johnson standardizations. Intelligence, 35, 305-318.

Raikes, H.H., Chazan-Cohen, R., Love, J.M., & Brooks-Gunn, J. (2010). Early Head Start impacts at age 3 and a description of the age 5 follow-up study. In A.J. Reynolds, A.J. Rolnick & M.M. Englund (Eds.), Childhood Programs and Practices in the First Decade of Life: A Human Capital Integration (pp. 99-118). NY, USA: Cambridge University Press.

Rhoads, T.F., Rapoport, M., Kennedy, R., & Stokes, J. (1945). Studies on the growth and development of male children receiving evaporated milk: II. Physical growth, dentition, and intelligence of white and negro children through the first four years as influenced by vitamin supplements. Journal of Pediatrics, 26, 415-454.

Slaughter, D. T. (1982). Longitudinal Assessment of the Intelligence of Black Infants, Ages 22 to 41 Months. In G.A. McWhorter (Ed.) Studies on Black children and their families: papers presented at 6th National Council for Black Studies Conference 1982 (pp. 1-32). Afro-American Studies and Research Program, University of Illinois.

3 Comments

That the B-W test score gap is completely established by age 3 refutes environmentalist claims across the board. B-W differences are a common feature of childhood environments (e.g. effort in school, tv watching, diet, peer groups, extracurriculars). That after all that, for ages 3 through 18, the score gap remains unchanged shows pretty convincingly that none of that stuff really matters.

It was disappointing to see Arthur Jensen get such an important fact wrong. It had always seemed to me that A.J. was one of the few (along with Linda Gottfredson) who had a reasonable grasp of IQ, race, and its context and importance, but I might have to revise that view.

The sample sizes here are extremely low. 80% of them are under 100. The largest two are based on your own unpublished analyses. Dickens & Flynn, Murray, and Jensen (1998) all base their conclusions on standardization samples.