Jensen and Bias: An Exchange

In response to:

I have read with interest and some amazement Stephen Jay Gould’s review of Jensen’s book on Bias in Mental Testing [NYR, May 1]. I am surprised, to begin with, that a technical book on a rather difficult psychometric aspect of psychology should be reviewed by a non-psychologist. Technical knowledge and competence in this field would seem to be primary requisites for a proper review, and as the contribution by Gould makes quite clear, he is little more than an amateur in this field. It seems doubtful whether you would have asked an eminent theologian to review a book by Dr. Gould on evolution; books on psychology require a degree of knowledge and expertise which outsiders simply do not possess.

This is quite clear when we consider just one question, namely the usefulness or otherwise of the concept of general intelligence. Gould states, correctly, that different solutions of the rotational problem in factor analysis lead to different answers, but he fails to mention that experimental studies, results of applied psychological work, and many other sources of evidence favor the notion of a strong general factor of intelligence as compared with alternative solutions. Professor Quinn McNemar in his presidential address to the American Psychological Association cited a number of these reasons, and Professor Philip Vernon, a former President of the British Psychological Society, performed a similar service in the pages of the American Psychologist. Dr. Gould prefers his own untutored and prejudiced interpretation to those of the experts, without giving any reason; this is hardly a sign of a review written in the proper impartial scientific spirit, and in full knowledge of the facts.

One line of evidence which has become very important in recent years, and which supports the hypothesis of the strong general factor of intelligence, is work with non-cognitive tests, such as measures of reaction time, speed and sensory discrimination, and in particular EEG evoked potentials. None of these are tainted by cultural factors, and all are reactions to simple sensory stimuli, yet they all correlate highly with IQ as measured by traditional tests. In our own recent work EEG evoked potentials show correlations with Wechsler IQ as high as does the Wechsler IQ with, say, the Binet IQ. In other words, non-cognitive tests of this kind, which aim to disclose the biological basis of intelligence, correlate as highly with IQ tests as these correlate with each other. No assessment of the hypothesis of the existence of intelligence as a powerful factor can be considered meaningful which leaves out some of the most important sources of evidence supporting it.

Employment of a reviewer who is not only not an expert in the field in question, but whose previous writings have marked him out as a bitter opponent of the author of the book reviewed, is not the only index that suggests that whoever commissioned the review was more interested in propaganda than in truth. Infinitely worse than anything Gould has to say is the cartoon showing Arthur Jensen in Brown Shirt uniform, with a Hitler moustache and haircut. This truly is guilt by association with a vengeance, and the cartoon could take its place with honor in the collection of similar works of art published by Streicher in Der Stürmer, the Nazi journal notorious for its obscene drawings ridiculing political opponents and Jews alike. It is sad to see a journal having some pretensions to respectability descending into such depths.

Let me end by saying that Gould’s belief that Jensen is retreating from his position that racial differences in IQ are not only observable, but may be due to genetic causes, is quite wrong. He did not deal with this problem in the book under review at all, but if he had he could have pointed to the important fact that recent work has shown Japanese and Chinese groups to be very significantly superior on (white-made!) IQ tests to white groups. This reduces to shambles any attempt to account for observed differences in terms of biased tests, higher socio-economic status, better schooling, etc. Gould of course doesn’t mention these facts; they do not fit easily into his preconceived ideas. May I suggest that next time a psychological book is up for review, a psychologist be employed to review it, and that the choice be made of one who is not known for his hostility to the author in question. Journals like yours should be impartial, not given to propagandist exercises of this kind.

H.J. Eysenck

Institute of Psychiatry

Denmark Hill, London, England

To the Editors:

Stephen Jay Gould makes an heroic effort to distinguish “statistical” bias from “vernacular” bias in his review of Arthur R. Jensen’s Bias in Mental Testing, but he has done such a good job of explaining what “S-bias” is that he has convinced me that “S-bias” is exactly what ordinary people have in mind when they say a test is biased, and exactly what Judges Robert Carter and Robert F. Peckham had in mind when they threw out police tests and IQ tests. They meant that if blacks scored lower in these tests it was because the tests were unfair to blacks, in a way in which they were not unfair to whites. The issue in neither case was environment versus heredity, a subject which exercises Professor Gould but is not relevant to the question of whether (as Judge Robert Carter had to decide) a pencil-and-paper police test is one good way of deciding whether police applicants have certain capacities useful to being a policeman, or whether (as Judge Peckham had to decide) an IQ test is a useful component in a decision as to whether a child should be put in a class for slow children.

Stephen Jay Gould has imported into the vernacular notion of bias something that is just not part of it. When we say an employer is “biased” we mean he treats a black differently from a white, and by a different standard: there is no implication that he is treating that black differently because he thinks blacks are naturally inferior, or because they have Southern accents, or for any other reason. And when we say a test is “biased,” we imply nothing as to whether the people against whom it is biased perform differently because of heredity, environment, or what-not. So if Jensen has demonstrated that there is no “S-bias” in intelligence tests he has demonstrated something that is not at all trivial. A good number of Federal judges, and the lawyers who argue before them, believe there is “statistical bias” in tests, that they will not predict as well whether blacks will make good policemen, or firemen, as they will for whites, and it is for this reason that these tests are being broadly superseded by quotas. If they are wrong, that is a matter of no small significance.

Nathan Glazer

Cambridge, Massachusetts

Stephen Jay Gould replies:

Dr. Eysenck’s remarkable letter contains five paragraphs (its only indisputable aspect). The first is an ad hominem attack upon me. The second makes a substantive claim, but supports it only by citing authority and completely avoiding either fact or argument. The third—the only meat in a sandwich surrounded by too much very stale (if not moldy) bread—makes a point, to which I will return. The fourth (insofar as it concerns me) is another ad hominem attack. The fifth combines yet another ad hominem attack with a substantive point, to which I will also return.

This smokescreen of vituperation shrouds a rather surprising thing—that Dr. Eysenck has chosen to say not a word about either of my two major criticisms of Arthur Jensen’s work: (1) my claim that the “bias” identified by Jensen as absent from mental tests is an unfamiliar statistical entity bearing no relationship to our vernacular meaning of bias as unfair assessment based upon cultural and environmental differences; (2) my argument that “intelligence,” depicted as a unitary and quantified entity, represents an unwarranted reification of correlations that need not be causal or, if causal, need not be viewed as innate. Let me then move to the only substantive points that Eysenck does raise.

Third paragraph: Eysenck would validate a notion of general intelligence by using the fallacious argument criticized in (2) above, but not discussed explicitly by him. He argues that IQ would be affirmed as a measure of “intelligence” if it correlates strongly with basic, neurological reactions of the brain that cannot (so he claims) be attributed to cultural or environmental differences. These include: (1) reaction time (in which an experimenter measures how long it takes a subject to react to a stimulus—time from seeing a flashing light to pushing a button, for example); and (2) EEG evoked potentials (in which an experimenter attaches electrodes to a subject’s head and records the timing and intensity of electrical responses within his brain to various stimuli). Eysenck then makes a specific claim about such studies—that they correlate as highly with IQ tests as various IQ tests correlate with each other. Ignoramus that I am, I dare not venture into this area of psychological professionalism. So let me, instead, simply cite the contrary opinions of an expert—namely, Arthur Jensen. In Bias in Mental Testing, the book that I reviewed, Jensen summarizes (p. 314) many studies on the correlation of Wechsler IQ with Binet IQ, the two tests that Eysenck chooses as his standard. Average correlation is 0.77 for Binet with the Wechsler adult scale (WAIS) and 0.80 for Binet with WISC (Wechsler intelligence scale for children). These high correlations are scarcely surprising since all these tests use similar material, and are constructed with the same end in mind.

On the correlation of IQ with reaction time (an area of Jensen’s primary research), Jensen writes (p. 691): “Neither I nor anyone else, to my knowledge, has been able to get correlations larger than about -0.4 to -0.5 between choice RT [reaction time] and IQ, with typical correlations in the -0.3 to -0.4 range, using reasonable-sized samples.” (The correlations are appropriately negative because short reaction time is supposed to accompany more intelligence. But they are much lower than the Wechsler-Binet correlation, not equal to it as Eysenck claims. The correlation coefficient, by the way, is a peculiar statistic with a highly asymmetrical distribution that compresses differences at the upper end and magnifies them at the lower end. Thus, a correlation of 0.4 is not “half as good” as one of 0.8, but substantially less intense.)
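One standard way to make that compression concrete, offered here as an illustration rather than as a calculation from the review, is Fisher’s z transformation (z = atanh r), which statisticians use to put correlations on a scale where their sampling distribution is roughly normal. A minimal Python sketch:

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient: atanh(r)."""
    return math.atanh(r)

# Two equal steps of 0.1 in r: one near the bottom of the scale, one near the top.
low_step = fisher_z(0.2) - fisher_z(0.1)
high_step = fisher_z(0.9) - fisher_z(0.8)

print(f"r from 0.1 to 0.2 spans {low_step:.3f} z-units")
print(f"r from 0.8 to 0.9 spans {high_step:.3f} z-units")
```

The same nominal step in r covers more than three times as many z-units near 0.9 as near 0.1, which is one sense in which r squeezes together differences at the upper end of its range.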

Jensen is even more dubious about the literature on evoked potential, for he writes (p. 709): “The AEP [average evoked potential] and IQ research picture soon becomes a thicket of seemingly inconsistent and confusing findings, confounded variables, methodological differences, statistically questionable conclusions, unbridled theoretical speculation, and, not surprisingly, considerable controversy.” The only correlations he cites between IQ tests and evoked potentials average -0.28 with none higher than -0.35 (again, appropriately negative since less time between a stimulus and responding brain waves supposedly records more intelligence—but again vastly less than the Wechsler-Binet correlation, not equal to it). The -0.28 may be statistically “significant,” but vernacular and statistical meanings of the word “significant” are quite unrelated. A statistically “significant” correlation is not necessarily a strong one, but only one that can be adequately discriminated from a value of zero, or no correlation.
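The gap between “significant” and “strong” can be sketched in Python. The sample sizes below are hypothetical, not taken from the studies Jensen reviews, and the test uses Fisher’s normal approximation rather than the exact t test. The point is only that the same r of -0.28 can fail or pass the conventional 5 percent threshold depending on how many subjects were tested, while the strength of the relationship does not change.

```python
import math

def fisher_z_test(r, n):
    """Approximate z statistic for the null hypothesis of zero correlation,
    using Fisher's transformation: atanh(r) * sqrt(n - 3).
    |z| > 1.96 corresponds roughly to two-sided significance at the 5% level."""
    return math.atanh(r) * math.sqrt(n - 3)

r = -0.28  # average evoked-potential correlation cited above
for n in (30, 200):  # hypothetical sample sizes, for illustration only
    z = fisher_z_test(r, n)
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n}: z = {z:.2f} ({verdict} at about the 5% level)")
```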

The relevant measure, in this case, is the coefficient of determination, or r² (the correlation coefficient times itself). An r of -0.28 means that variation in evoked potential accounts for a whopping 8 percent (-0.28 × -0.28, or 0.0784) of the variation in measured IQ! Jensen then casts further doubt upon the literature of evoked potentials (p. 709): “Visual and auditory AEPs seem to yield quite different, even contrary results, visual latencies usually being negatively correlated with IQ, and auditory latencies being positively correlated. The directions of correlations also seem to flip-flop according to whether the IQs of the sample involved in the study are distributed mostly in the below-average range or mostly in the above-average range of IQs.”
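The arithmetic of the coefficient of determination, applied to the correlations quoted in this reply, can be sketched as follows. The labels are descriptive only, and -0.4 is taken as the upper end of Jensen’s “typical” reaction-time range.

```python
def coefficient_of_determination(r):
    """r squared: the share of variance in one measure accounted for by the other."""
    return r * r

# Correlations quoted in this reply (descriptive labels only).
cited = {
    "Binet vs WAIS (average)": 0.77,
    "Binet vs WISC (average)": 0.80,
    "choice RT vs IQ (largest reported)": -0.5,
    "choice RT vs IQ (typical, upper end)": -0.4,
    "evoked potential vs IQ (average)": -0.28,
}

for label, r in cited.items():
    r2 = coefficient_of_determination(r)
    print(f"{label:38s} r = {r:+.2f}   r^2 = {r2:.4f} ({r2:.1%})")
```

Squaring also shows why 0.4 is not “half as good” as 0.8: it accounts for 16 percent of the variance against 64 percent, a quarter as much.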

But even if the correlations were as high as Eysenck claimed, what would it mean? It wouldn’t validate a notion of inborn general intelligence. Who can say that childhood nutrition (both gastronomical and educational) does not affect the growing brain and induce variation equally recorded by reaction time and performance on mental tests?

Fifth paragraph: I am aware of the data on superior performance of oriental children on IQ tests constructed by caucasians, but regard it as ludicrous to claim that such a result demonstrates innate, genetic differences in average intelligence between human groups. Putting a group other than one’s own on top may obviate a charge of overt racism, but this is a personal matter that has nothing to do with the validity of the general argument. Human groups differ in measured, mean IQ. This is an empirical fact; the data on orientals merely add one more case to the compendium. We still do not know how to assign these mean differences to the various theoretical factors that might account for them. Can we be sure that patterns of rearing and cultural traditions of Chinese and Japanese homes do not account for the small advantages recorded by their children in IQ tests derived by whites? After all, these well-meaning whites have tried for more than half a century to eliminate from their tests those items that clearly embody WASP culture. But what is left is not a map of the genes.

Eysenck’s ad hominem smokescreen reminds me of LBJ trying to quell his Vietnam critics by claiming that his volumes of confidential facts should compel trusting silence. Arguments can be flawed, in their premises and internal logic, beyond any power of confirming numbers to validate. Jensen’s book is so flawed—and you don’t have to be a professional psychometrician to find the errors. If only card-carrying colleagues could understand and interpret each other, what would intellectual life be worth?

In contrast, Nathan Glazer does treat one of the major points I raised in my review. But I fear that he has misunderstood me. S-bias is a much narrower and numerical thing than he imagines. A test is S-biased if and only if the same score leads to a different assessment of what the test supposedly predicts (school grades or job performance) as a function of membership within a group—if, for example, the same IQ of 100 scored by both a black and a white applicant led us to predict better job performance for the black. Judge Carter tossed out the police test because too few blacks scored above the cut-off point, not because blacks with the same scores as whites were treated differently. He argued that the lower black mean probably records culture and environment rather than inborn endowment. This is V-bias. Jensen proves only that this narrow S-bias either doesn’t exist in mental tests, or can be recognized and eliminated when it does exist. He admits that V-bias cannot be excluded as a cause of mean differences in test scores between groups. Thus he invalidates the notorious argument of his 1969 article.