The Estimation of Pre-Morbid Cognition

By Bill Masters

“Imagine someone saying, ‘But I know how tall I am’ and laying his hand on top of his head to prove it.”
L. Wittgenstein

Introduction

For the defense, neuropsychology has become a bane. Increasingly, neuropsychologists are in court testifying that this or that defendant damaged the brain of this or that plaintiff. What’s disconcerting is that, almost invariably, these diagnoses of brain damage are inferred not from hard signs but, instead, from non-specific symptoms and plaintiff’s responses to various neuropsychological tests.

For instance, in the personal injury case, the typical neuropsychological assessment involves the following basic steps:

1. The neuropsychologist administers a battery of sophisticated neuropsychological tests to establish plaintiff’s current level of cognition;

2. The neuropsychologist then estimates, using one or more imprecise methods, plaintiff’s pre-injury level of cognition;

3. If plaintiff’s current level of cognition is lower than her pre-injury level, the neuropsychologist diagnoses a deficit in cognition;

4. The neuropsychologist then identifies and rules out alternative causes of that deficit;

5. After ruling out alternate causes of plaintiff’s cognitive deficit, the neuropsychologist assigns the cause of the deficit to the trauma for which defendant is allegedly responsible.

What is most striking in this diagnostic process is step 2 in light of step 1: the lack of symmetry between the rigor of assessing the current level of cognition, on the one hand, and the rigor of estimating the pre-injury level of cognition, on the other. Neuropsychologists, to assess cognition accurately, expend enormous efforts developing state-of-the-art cognitive tests. The most popular neuropsychological test, the Wechsler Adult Intelligence Scale, for example, has evolved from the Wechsler-Bellevue (1939) to the WAIS (1955) to the WAIS-R (1981) to the WAIS-III (1997). Continuing refinements in these sophisticated tests are considered essential to assess cognition accurately. Yet this same rigor is not applied when estimating pre-injury cognition, despite the need for a similar level of sophistication. This lack of symmetry is the neuropsychologist’s Achilles’ heel.

· ·

Two basic methods exist for assessing whether a plaintiff’s pre-injury cognition differs from her current level of cognition. In one, the comparison of measures is direct; in the other, it is indirect. Direct comparison involves comparing the results of the same kind of test or battery of tests for cognition given before the injury with those given currently. (The pre-injury testing should have been sufficiently recent so that aging does not become a confounding variable.) For example, a year before her accident plaintiff was administered the WAIS-R. (The WAIS-R comprises 11 subtests: six verbal scale subtests used to establish Verbal IQ and five performance scale subtests used to establish Performance IQ.) Her scores were as follows: Full Scale IQ (FSIQ): 112; Verbal IQ (VIQ): 116; Performance IQ (PIQ): 106. After her accident, she was again administered the WAIS-R. Her scores were as follows: FSIQ: 100; VIQ: 102; PIQ: 98. Because her pre-injury FSIQ was twelve points higher than her post-injury FSIQ (and the mean test-retest variability for FSIQ is only 6.2 points), the neuropsychologist infers that the accident caused plaintiff’s deficit in cognition.
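The arithmetic behind this direct comparison can be laid out in a short sketch. It uses only the scores and the 6.2-point figure given in the example; the function names are illustrative, and treating "drop larger than mean retest variability" as the decision rule is an assumption made here to mirror the neuropsychologist's inference, not a validated clinical criterion.

```python
# Scores from the example in the text (WAIS-R administered before and after).
pre_injury = {"FSIQ": 112, "VIQ": 116, "PIQ": 106}
post_injury = {"FSIQ": 100, "VIQ": 102, "PIQ": 98}

# Mean test-retest variability for FSIQ, as cited in the text.
FSIQ_RETEST_VARIABILITY = 6.2


def score_drops(pre, post):
    """Return the pre-injury minus post-injury difference for each scale."""
    return {scale: pre[scale] - post[scale] for scale in pre}


def exceeds_retest_variability(drop, variability):
    """True when a drop is larger than ordinary retest variation.

    Assumption: a simple 'greater than' rule, used only for illustration.
    """
    return drop > variability


drops = score_drops(pre_injury, post_injury)
fsiq_flag = exceeds_retest_variability(drops["FSIQ"], FSIQ_RETEST_VARIABILITY)
print(drops, fsiq_flag)
```

The sketch shows only that the twelve-point drop is larger than the cited retest figure; whether the accident caused the drop is a separate question.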

But this method is direct only to the extent the measure of pre-injury cognition is substantially similar to or significantly correlated with the measure of current cognition. Some neuropsychologists would substantially relax the “similarity requirement.” To them, direct comparison includes comparison of a variety of cognitive tests, even those whose correlations are low.

For example, after plaintiff’s accident her neuropsychologist administered the WAIS-R to estimate her current level of cognition. The results were FSIQ: 103; VIQ: 104; PIQ: 102. For a standard of comparison, the neuropsychologist then searched plaintiff’s pre-injury testing history for tests that may have been administered, such as the Wechsler Intelligence Scale for Children (WISC and WISC-R); Stanford-Binet Intelligence Scale; Wechsler Preschool and Primary Scale of Intelligence (WPPSI); Wechsler Adult Intelligence Scale (WAIS); Scholastic Aptitude Test (SAT); Graduate Record Examination (GRE); Armed Forces Qualification Test (AFQT); Armed Forces Classification Battery (AFCB); and General Ability Test (GAT). She eventually found the results of plaintiff’s SAT. The results were verbal, 507; mathematics, 702; for a total score of 1209. Comparing these pre-injury SAT results with the post-injury WAIS-R results, the neuropsychologist inferred that plaintiff had a cognitive deficit and that the deficit was caused by plaintiff’s accident.

This comparison does not intuitively seem particularly “direct.” The SAT is touted as an aptitude test designed to measure the capacity to learn what is taught in college. An aptitude test such as the SAT is closely related to tests of intelligence, such as the WAIS-R. Both purport to measure a relatively constant capacity to learn. Critics of the SAT argue, however, that it is a measure of achievement, not aptitude, and that the correlation between results on the SAT and academic performance in college is very modest (r is less than .50). What, in fact, then is the correlation between results on the SAT and results on the WAIS-R? Surprisingly, no study exists which establishes that correlation. We would expect, as a rule of thumb, a very modest correlation (r < .50), but can we be sure?

Comparing the results of the SAT with the results of the WAIS-R seemed suspect, an intuition that once pursued proved correct. But now suppose that before the accident, plaintiff had been administered the WAIS. The results were FSIQ: 112; VIQ: 116; PIQ: 106. After the accident, she was administered the WAIS-R. The results were FSIQ: 103; VIQ: 108; PIQ: 98. Because plaintiff’s FSIQ after the accident was lower than her FSIQ before the accident, the neuropsychologist inferred that the accident caused plaintiff to have a deficit in cognition. Now, this conclusion does not seem particularly suspect, but we need to ask, what is the relationship between the WAIS and the WAIS-R? In answer, we find that the mean scores on the WAIS-R are not directly comparable to those on the WAIS. WAIS verbal, performance and full scale IQs are about 7, 8, and 9 points higher, respectively, than those on the WAIS-R.

So use of WAIS-R results in significantly lower estimates of FSIQ than use of WAIS. It would be tempting for plaintiff’s neuropsychologist to compare, without adjustment, plaintiff’s pre-injury performance on WAIS with her post-injury performance on WAIS-R. If, in fact, plaintiff had no cognitive deficit, her pre-injury performance on WAIS would likely be higher than her post-injury performance on WAIS-R, enabling her neuropsychologist to use the disparity as evidence of deficit in cognition.
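The effect of comparing the two tests without adjustment can be checked with a short sketch. It uses the scores from the example and the approximate 7-, 8-, and 9-point offsets stated above; subtracting a flat offset is a simplifying assumption made here for illustration, not a published conversion procedure, and the function names are hypothetical.

```python
# Approximate amounts by which WAIS IQs exceed WAIS-R IQs, per the text.
WAIS_TO_WAIS_R_OFFSET = {"VIQ": 7, "PIQ": 8, "FSIQ": 9}

# Scores from the example: WAIS before the accident, WAIS-R after.
pre_wais = {"FSIQ": 112, "VIQ": 116, "PIQ": 106}
post_wais_r = {"FSIQ": 103, "VIQ": 108, "PIQ": 98}


def to_wais_r_scale(wais_scores, offsets=WAIS_TO_WAIS_R_OFFSET):
    """Shift WAIS scores down by a flat offset (an illustrative assumption)."""
    return {scale: score - offsets[scale] for scale, score in wais_scores.items()}


def differences(pre, post):
    """Return pre minus post for each scale."""
    return {scale: pre[scale] - post[scale] for scale in pre}


raw_gap = differences(pre_wais, post_wais_r)
adjusted_gap = differences(to_wais_r_scale(pre_wais), post_wais_r)
print("unadjusted:", raw_gap)
print("adjusted:  ", adjusted_gap)
```

On these numbers the apparent eight- and nine-point deficits shrink to zero or one point once the pre-injury scores are placed on the WAIS-R scale.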

· ·

Most often the neuropsychologist will not have a basis for direct comparison. So she will need to resort to comparisons considered “indirect.” Indirect comparisons have essentially four forms: (1) current ability (or pattern‑analytic) estimates; (2) demographically‑based regression (or actuarial) estimates; (3) best performance (or human judgment) estimates; and (4) estimates using combinations of (1) and (2).

Current Ability Estimates

Two ways exist to assess current ability: (1) analysis of the “scatter” (defined as intersubtest variability) of certain WAIS-R subtests and (2) assessment of reading and vocabulary skills using specialized tests such as the National Adult Reading Test (NART) and the North American Adult Reading Test (NAART or NART-R).

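Analysis of scatter rests on a distinction between “hold” subtests, those on which performance is thought to withstand injury to the brain, and “don’t hold” subtests, those on which performance is thought to decline. An index developed from that distinction to evaluate pre-injury cognition is called the “deterioration quotient.”[1] That quotient is calculated as follows: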

$$\mathrm{DQ} = \frac{\text{Hold Subtests} - \text{Don't Hold Subtests}}{\text{Hold Subtests}}$$

As is apparent from the relationship of the variables in this index, the lower the score or value on the don’t hold subtests, the higher the value of DQ, and, by inference, the more likely there is a deficit in cognition.
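A minimal sketch of the index follows, reading the quotient as the hold score minus the don't hold score, divided by the hold score. The subtest totals used are hypothetical values chosen only to show the direction of the relationship; they are not drawn from any test manual.

```python
def deterioration_quotient(hold_total, dont_hold_total):
    """Return (hold - don't hold) / hold for summed subtest scores."""
    if hold_total <= 0:
        raise ValueError("hold_total must be positive")
    return (hold_total - dont_hold_total) / hold_total


# Hypothetical totals: the hold score is fixed while the don't hold score falls.
HOLD_TOTAL = 40
dq_equal = deterioration_quotient(HOLD_TOTAL, 40)
dq_lower = deterioration_quotient(HOLD_TOTAL, 30)
print(dq_equal, dq_lower)
```

With the hold total fixed, lowering the don't hold total from 40 to 30 raises DQ from 0 to 0.25.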

But analysis of scatter, as a method of estimating premorbid cognition, is based on the following faulty assumptions:

• Significant “scatter” does not occur in the WAIS-R in the normal population. (In fact, significant “scatter” occurs in normal populations on the WAIS-R);

• The test-retest reliabilities for subtests of the WAIS-R are near perfect. (In fact, the test-retest reliabilities for WAIS-R subtests are less than perfect);

• The standard errors of measurement of the subtest scaled scores are insignificant. (In fact, the standard errors of measurement of the subtest scaled scores can be significant);

• The correlation between the hold subtests and FSIQ is 1.0. (In fact, the correlation between the hold subtests and FSIQ is less than unity);

• The correlations across the hold subtests are high. (In fact, the correlation across most pairs of subtests is low.)[2]

As a result of these faults, many neuropsychologists consider this method of estimation to be inadequate.[3]

Reading and Vocabulary Skills

Another way to assess “current abilities” is to identify and assess plaintiff’s reading and vocabulary skills. These skills are highly correlated with general intelligence. But what’s more, because these skills are overlearned, they are less likely to deteriorate upon injury to the brain.

Tests have been developed to measure these verbal skills. Of particular interest for assessing pre-injury cognition are tests of the ability to pronounce certain words. These tests include the National Adult Reading Test (NART and NART 2) and the North American Adult Reading Test (NAART or NART-R). All these tests measure an individual’s ability to pronounce correctly words that have unconventional pronunciations. Using words with unconventional pronunciations is designed to nullify the effects of previous familiarity with such words independent of present ability to analyze and employ them.

NART uses 50 “irregular” words familiar to Britons that cannot be pronounced through the use of common phonetic rules. Such words include “gaoled,” “radix,” and “demesne.” But NART estimates FSIQ on WAIS, not WAIS-R. So use of NART to assess performance on WAIS-R would result in an overestimation of pre-injury FSIQ (favoring plaintiffs).

NART 2 is also standardized on Britons. But it estimates FSIQ on WAIS-R. Even so, because they are standardized on Britons, NART and NART 2, if used with Americans, would tend to result in more errors and hence a lower value for estimated pre-injury FSIQ (favoring defendants).

NAART (or NART-R) uses 61 irregular words familiar to North Americans (Canadians and Americans). Words included are, for example, “psalm,” “hiatus,” “drachm” and “campanile.” It, rather than NART or NART 2, is appropriate for use with English-speaking Americans.

Is performance on NAART (or NART-R) highly correlated with FSIQ? The results of studies designed to answer this question are equivocal. Performance by Americans on NAART (NART-R) and FSIQ correlated at .66 in one study,[4] at .70 in another study,[5] but at only .46 in yet another study.[6]
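One way to read these coefficients is to square them, since the square of a correlation gives the proportion of variance in one measure that is shared with the other. The sketch below applies that standard relationship to the three correlations reported above; nothing else is assumed about the underlying studies.

```python
# Correlations between NAART (NART-R) performance and FSIQ reported in the text.
reported_correlations = [0.66, 0.70, 0.46]


def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r


shares = [round(shared_variance(r), 2) for r in reported_correlations]
for r, share in zip(reported_correlations, shares):
    print(f"r = {r:.2f} -> shared variance = {share:.2f}")
```

Even the highest of the three correlations leaves about half of the variance in FSIQ unaccounted for, and the lowest leaves nearly four-fifths.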

NAART (NART-R) has additional shortcomings. NAART overestimates FSIQ for those with FSIQs below 100 and underestimates FSIQ for those with FSIQs over 110.[7] NAART (NART‑R) may also not be indicative of general intelligence because it does not predict WAIS-R PIQ. NAART (NART-R) is also not particularly useful in estimating pre-injury cognition in those with impaired skills in use of the English language. That is, it underestimates pre-injury FSIQ in those with poor literacy skills, speech impairments, or dyslexia, or in those for whom English is a second language.

Nor is NAART (NART-R), based on studies of NART, particularly useful in estimating pre-injury cognition in those with dementia. In those cases, NAART (NART-R) would likely underestimate pre-injury FSIQ. In cases of moderate dementia, NAART (NART-R) would likely underestimate pre-injury FSIQ by about 15 IQ points.[8]

Demographically-Based Estimates

Another method of assessing pre-injury levels of cognition is with the use of a demographic formula. “Demographic” data such as age, sex, race, education and occupation are crunched with multiple regression techniques to arrive at the formula. The formula keyed to the WAIS is called the Wilson index and that keyed to WAIS-R is called the Barona index. Of the two, the Barona index is more appropriate.[9] The Barona index takes the following form:

$$\text{Estimated FSIQ} = 54.96 + 0.47(\text{age}) + 1.76(\text{sex}) + 4.71(\text{race}) + 5.02(\text{education}) + 1.89(\text{occupation}) + 0.59(\text{region})$$

Each demographic variable, such as age or education, takes the coded value applicable to the particular individual being evaluated. For example, a 52-year-old white male with 20 years of education employed as an attorney in the state of Washington would have an estimated FSIQ of 118.66, within a range of 94.38 to 142.94, based on the following values: 54.96 + 0.47(6) + 1.76(2) + 4.71(3) + 5.02(6) + 1.89(6) + 0.59(3) ± 24.28.
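The arithmetic in this example can be verified with a short sketch. The constant, coefficients, coded values, and the ± 24.28 band are taken from the worked example; pairing each coefficient with a variable name (age, sex, race, education, occupation, region, in that order) is an assumption made here for readability, and the dictionary keys are illustrative labels rather than an official coding scheme.

```python
INTERCEPT = 54.96

# Coefficients in the order they appear in the worked example.
# Assumption: the variable names attached to them are illustrative labels.
COEFFICIENTS = {
    "age": 0.47,
    "sex": 1.76,
    "race": 4.71,
    "education": 5.02,
    "occupation": 1.89,
    "region": 0.59,
}

# Coded values used in the text for the 52-year-old attorney.
attorney_codes = {
    "age": 6,
    "sex": 2,
    "race": 3,
    "education": 6,
    "occupation": 6,
    "region": 3,
}

HALF_WIDTH = 24.28  # the +/- band stated in the example


def demographic_estimate(codes):
    """Sum the intercept and each coefficient times its coded value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * codes[name] for name in COEFFICIENTS)


point = demographic_estimate(attorney_codes)
low, high = point - HALF_WIDTH, point + HALF_WIDTH
print(f"estimate {point:.2f}, range {low:.2f} to {high:.2f}")
```

The sketch reproduces the stated range of 94.38 to 142.94 around a point estimate of 118.66, a span of nearly 49 IQ points.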

An assumption underlying the Barona index is that values of the formula correlate highly with WAIS-R IQ in the normal population. But this assumption is faulty. The Barona index, for instance, placed subjects in their correct WAIS-R IQ range only 38%, 24% and 36% of the time for VIQ, PIQ and FSIQ, respectively. So the Barona index does not predict IQ better than chance.[10]

Demographic formulas provide estimates, not exact predictions. So they should be viewed as providing a range within which the true pre-injury level of cognition lies. But this range is huge. The standard error of estimate (SEE) for FSIQ, using the Barona index, is 12.14. Two standard errors on either side of the estimate create a broad range of ± 24 IQ points within which the true pre-injury FSIQ would be expected to be found.

Demographic regression methods are further limited because they tend to predict scores toward the population mean, and so are less able to estimate IQs of specific individuals. The Barona index overestimates pre-injury IQ in those with low IQs and underestimates pre-injury IQ in those with high IQs. So legitimate use of this formula is limited to those who fall in the middle of the IQ spectrum. As one researcher remarked, “When evaluated in terms of the classification of individual patients, * * * the accuracy of * * * estimates [based on demographic formula] is seen to be distressingly low.”[11]

This index also tends to overestimate pre-injury levels of cognition in those with psychiatric disorders.[12] So be alert to the plaintiff’s neuropsychologist who uses this index to estimate pre-injury cognition in a plaintiff with either a low or high FSIQ or with a psychiatric disorder.

Best Performance Estimates

The third method of estimating pre-injury cognition is called the “best performance method” (BPM). In applying this method, the neuropsychologist scours the plaintiff’s “history” in search of what the neuropsychologist considers to be a marker of plaintiff’s best cognitive performance “whether it be the highest score or set of scores, unscoreable behavior not necessarily observed in a formal testing situation, or evidence of premorbid achievement. . .”.[13] That highest level of cognitive functioning then becomes the standard against which current levels of cognitive function are measured. (Some neuropsychologists, when referring to the “best performance method,” refer to the practice of using plaintiff’s highest WAIS‑R subtest scores to determine pre-injury general intelligence.)

The BPM involves three search parameters. First is what the neuropsychologist takes to be the scope of the plaintiff’s relevant history. Second is, within that history, what she takes to be the appropriate markers of cognitive performance. And third is, of those markers of cognitive performance, which she considers to be the most relevant indicia of best performance. Obviously, these parameters provide the neuropsychologist with significant interpretative space within which to maneuver in estimating pre-injury cognition.

As a result, the BPM has a significant potential for forensic misuse. This potential is illustrated by a rough analogy: Imagine a baseball historian looking through the batting averages of a player for his highest season average and then concluding that the highest season average represents his overall batting ability, the standard against which all other indicators of his hitting performance must be measured. Later, games in which his batting performance fell below this standard would be interpreted to indicate periods of physical disability. This analogy illustrates that the BPM is subject to well‑documented types of judgment error, particularly confirmatory bias and disregard of the phenomenon of “regression to the mean,” or variation due to chance.

BPM is based on the following faulty assumptions:

• In the healthy population, any single test or skill probably provides a reasonable estimate of performance on all other kinds of intellectual tasks. This assumption is unwarranted. Considerable intra-individual scatter in test scores occurs in healthy people.

• An individual cannot perform at a higher level than his or her biological capacity will permit. This assumption is also unwarranted. The highest test score will probably have a component of significant positive error and the lowest test score will have a component of significant negative error. All else being equal, the component of error in extreme scores will increase as the number of tests increases. So an observed score will be higher than the true score for about half of all subjects and in this sense higher than biological capacity.[14]

• The marker(s) selected to exemplify this individual’s best performance correlate highly with FSIQ. This assumption may or may not be true.

• Little or no scatter exists among WAIS-R subtest scores. This assumption is false; significant scatter occurs in normal and clinical populations on the WAIS-R.[15]

The BPM will systematically overestimate pre-injury cognition.[16] So it is not surprising that use of the BPM is the hallmark of the plaintiff-oriented neuropsychologist.
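Why picking the highest score overestimates can be illustrated with a small simulation. It assumes a person whose true level is identical on every test and whose observed scores differ only by independent, normally distributed measurement error; the true score of 10, the error spread of 1.5 scaled-score points, and the test counts are hypothetical values chosen for illustration, not figures from the literature.

```python
import random
import statistics


def mean_best_score(true_score, error_sd, n_tests, trials=5000, seed=0):
    """Average, over many simulated examinees, of the highest observed score.

    Assumption: every test shares the same true score, and each observed
    score is the true score plus independent normal error.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best_scores = []
    for _ in range(trials):
        observed = [rng.gauss(true_score, error_sd) for _ in range(n_tests)]
        best_scores.append(max(observed))
    return statistics.mean(best_scores)


TRUE_SCORE = 10.0  # hypothetical scaled score
ERROR_SD = 1.5     # hypothetical spread of measurement error

best_of_1 = mean_best_score(TRUE_SCORE, ERROR_SD, n_tests=1)
best_of_3 = mean_best_score(TRUE_SCORE, ERROR_SD, n_tests=3)
best_of_11 = mean_best_score(TRUE_SCORE, ERROR_SD, n_tests=11)
print(round(best_of_1, 2), round(best_of_3, 2), round(best_of_11, 2))
```

Under these assumptions a single score is right on average, but the best of several scores sits above the true level, and the gap widens as more tests are searched for a "best performance."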

Estimates from a Combination of Methods

The fourth method of estimating pre-injury cognition is some combination of the foregoing methods. The current combinations include the following: (1) Demographic variables and select subtests of WAIS-R; and (2) demographic variables and tests of reading and vocabulary skills such as NART-R.

An idea behind these combinations is that the weakness in one of the methods is offset by the strength of the other method. That is, the scores on the demographic formula predict the mean or average scores of a group of people. The weakness of the demographic formula is that it tends to underestimate high IQ scores and overestimate low IQ scores. For example, if you administered the WAIS-R to a room of attorneys of the same age, race, and gender, you would have varied test results for FSIQ. To estimate the FSIQs of these same attorneys using a demographic formula, you would equate the FSIQ to the mean score of the group.

But in estimating IQ for forensic purposes, we are interested not in the average score of a group of demographically similar people, but in a particular plaintiff’s score. In this regard, select use of subtests of WAIS-R has the alleged strength that the results of such subtests are specific to the plaintiff being evaluated. But does the strength of using individual specific measures of cognition, in fact, compensate for that weakness in the use of the demographic formula alone?

In considering answers to that question, we might be justifiably skeptical. First, we notice that the demographic formulas include independent variables for education and occupation; both are correlated with vocabulary and reading ability, and so are not statistically independent. Why would adding independent variables significantly correlated with independent variables presently in the demographic formula increase predictive accuracy of estimating pre-injury cognition? The answer is that perhaps unshared variance exists in these variables that relate to FSIQ.

What, then, is the appropriate measure of predictive accuracy? Is it enough that the multiple correlation coefficient R (or the coefficient of determination, R²) between the actual pre-injury WAIS-R FSIQ and the FSIQ predicted by the combined method exceeds the R (or R²) obtained when FSIQ is predicted by the demographic formula alone, by the WAIS-R subtest scores alone, or by the scores on NART-R alone? If so, then adding more independent variables (even variables which are not statistically independent) will result in the model appearing more predictive. This is simply because R (or R²) cannot decrease, and almost always increases, when more independent variables are added to the equation. As a result, R (or R²) is an inadequate measure of predictive accuracy. Instead, we would want to have an increased “adjusted R²” and a reduced SEE before believing that the combination formula improves prediction.[17]
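The difference between the two measures can be seen in a short sketch using the standard adjustment, which penalizes the coefficient of determination for each added predictor. The sample size, predictor counts, and the .50 and .51 values are hypothetical numbers chosen for illustration; they are not taken from any of the studies cited.

```python
def adjusted_r_squared(r_squared, n_subjects, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    degrees_of_freedom = n_subjects - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1 - (1 - r_squared) * (n_subjects - 1) / degrees_of_freedom


# Hypothetical case: 50 subjects; adding three predictors to a
# six-predictor formula nudges R^2 from .50 to .51.
N_SUBJECTS = 50
before = adjusted_r_squared(0.50, N_SUBJECTS, n_predictors=6)
after = adjusted_r_squared(0.51, N_SUBJECTS, n_predictors=9)
print(round(before, 3), round(after, 3))
```

In this hypothetical the unadjusted figure rises while the adjusted figure falls from about .43 to about .40, which is why a bare increase in R does not show improved prediction.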

Demographic Variables and Subtests of WAIS-R

This approach is a statistical synthesis of actuarial and clinical prediction. Two examples of this approach are the Vanderploeg equations[18] and the Oklahoma Premorbid Intelligence Estimation.[19] The Vanderploeg equation for FSIQ using the Vocabulary subtest of WAIS-R is as follows:

This formula would predict the pre-injury FSIQ of a 52-year-old white attorney with a Vocabulary scaled score of 17 to be between 114.77 and 149.03.

Unfortunately, this equation overestimates the IQs of normal subjects, underestimates the IQs of normal elderly people, and is not superior to the Barona equation.[20]

The Oklahoma Premorbid Intelligence Estimation (OPIE) is a method of combining demographic variables and current WAIS-R Vocabulary and Picture Completion subtests to estimate pre-injury cognition. (In the Vocabulary subtest, the examinee is asked to define 35 vocabulary words of increasing difficulty; in the Picture Completion subtest, the examinee is asked to tell what important part is missing from 20 incomplete pictures of human features, familiar objects, or scenes arranged in increasing order of difficulty). This formula is as follows:

This formula would predict the pre-injury FSIQ of a 52-year-old white attorney with a Vocabulary raw score of 68 and Picture Completion raw score of 16 to be between 78.89 and 105.53.

This estimate of FSIQ seems low. A Vocabulary raw score of 68 is equivalent to a scaled score of 17 where the mean scaled score is 10. A Picture Completion raw score of 16 is equivalent to a scaled score of 10, the mean scaled score. Should those scores on these WAIS-R subtests at or above the mean generate a range of FSIQ that falls, for the most part, below the mean FSIQ score of 100?
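How much of each predicted range lies below the mean FSIQ of 100 can be computed directly from the endpoints given in the text. The sketch below measures only the share of the interval's length that falls under the cutoff; it assumes nothing about how likely any point within the range is, and the function name is illustrative.

```python
def share_of_range_below(low, high, cutoff):
    """Fraction of the interval [low, high] that lies below the cutoff."""
    if high <= low:
        raise ValueError("high must exceed low")
    clipped = min(max(cutoff, low), high)  # keep the cutoff inside the interval
    return (clipped - low) / (high - low)


MEAN_FSIQ = 100

# Ranges stated in the text for the same 52-year-old attorney.
opie_share = share_of_range_below(78.89, 105.53, MEAN_FSIQ)
vanderploeg_share = share_of_range_below(114.77, 149.03, MEAN_FSIQ)
print(round(opie_share, 2), round(vanderploeg_share, 2))
```

Roughly four-fifths of the OPIE range falls below 100, while none of the Vanderploeg range does, even though both describe the same hypothetical plaintiff.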

The weakness of combined measures using subtests of WAIS-R is that considerable evidence exists that performance on those subtests declines with neurological deterioration.[21]

Demographic Variables and Language Skills

The language skills combined with demographic variables are either vocabulary or the ability to pronounce certain words. The weakness of measures of language skills based on vocabulary is that vocabulary is sensitive to neurological deterioration. However, the ability to pronounce certain words is less so. As a result, a word-pronunciation test such as NART-R is often combined with a demographic formula to produce a combined regression equation that is hoped to be more predictive of FSIQ.

Studies testing this hypothesis are equivocal. One study purported to demonstrate a significant increase in predictive power for FSIQ with a change in R² of .055 and a SEE of 8.83.[22] But the sample used did not include the neurologically impaired, a serious flaw. Another study purported to demonstrate no significant increase in predictive power. In that study, when all demographic variables were forced into the FSIQ regression equation along with the NART-R score, the percentage of variance accounted for increased by only 3 percentage points.[23]

Conclusion

Neuropsychologists have no precise methods with which to estimate the FSIQ of a particular plaintiff. Direct methods of estimation are usually unavailable and when they are used, neuropsychologists rely on measures typically not substantially similar to or significantly correlated with the measure used to estimate current levels of cognition.

Indirect methods of estimation are equally imprecise, too imprecise for the kinds of fine distinctions forensic neuropsychologists often assert in court. Each type of indirect method should be viewed with suspicion as the product of forensic neuropsychologists intent on shielding the Achilles’ heel of their evaluations. Estimates relying on measures of current ability are imprecise. Analysis of scatter is based on the faulty assumption that significant scatter does not occur in the normal population. Word pronunciation tests such as NAART have an equivocal correlation with FSIQ and are not useful in a variety of circumstances, especially with those who have poor language skills or dementia. Estimates relying upon demographic formulas are also imprecise. These formulas do not predict FSIQ better than chance, generate huge ranges of FSIQ within which the true FSIQ lies, and, although the most objective method available, are among the least used in estimating pre-injury cognition.[24] Estimates relying upon combinations of current ability and demographic variables are also imprecise. These formulas are insufficiently validated and either underestimate or overestimate FSIQ, depending on the formula used. Finally, estimates using the best performance method are the most imprecise: they are designed to overestimate FSIQ in order to substantiate plaintiffs’ claims of neurological injury.

In short, the neuropsychologist has a choice of using two basic methods of estimation: actuarial (or demographic) or clinical. Both are inadequate to the task. The actuarial method contains substantial errors and the clinical method is prey to significant, largely ineradicable errors in judgment.[25]

· ·

Estimates of pre-injury cognition using one or more of these imprecise techniques are often proffered in litigation as clinical evidence. Proffered clinical evidence that is unreliable or invalid should be inadmissible under rules of evidence such as FRE 702.[26] FRE 702, for example, requires that proffered scientific evidence in the form of expert opinion be “knowledge.” “Knowledge” has been defined negatively as more than subjective belief or unsupported speculation, and positively as any body of known facts or truths accepted as such on good grounds. Within that set of beliefs characterized as knowledge, the belief must also fall within that subcategory of beliefs characterized as “scientific” knowledge. Scientific knowledge is defined as belief derived by the scientific method, that kind of method based on generating hypotheses and testing them to see if they can be falsified.[27]

An important consideration in assessing the evidential reliability of a particular scientific technique is its “known or potential rate of error.”[28] If that rate of error is too high, the technique lacks validity. Clinical data from an invalid technique should be inadmissible.

This kind of analysis for validity is increasingly being applied to clinical assessments. Those assessments must be genuinely scientific as distinct from being unscientific speculation offered by a genuine scientist. Not only must the expert’s methodology have been objectively and independently validated but the expert’s inferences and opinions must not be too significantly underdetermined by the clinical data.[29]

These strict requirements of evidential reliability pose a problem for forensic neuropsychologists assessing pre-injury cognition. Simply, neuropsychologists, facing the forensic task of assessing pre-injury cognition and lacking a scientifically adequate technique or methodology for doing so, will do what no scientist would do: they will guess.

From the foregoing analysis of the reliability of techniques which neuropsychologists use to estimate pre-injury cognition, obviously the rate of error of these techniques is significant. The rate of error is equivalent to or, at best, slightly better than a rate that is random. That is a rate of error too great to be compatible with sound scientific methodology. Indeed, a technique with lower error rates than these techniques is the polygraph, and the results of polygraph tests are infrequently admitted into evidence.[30] As a result, in cases of alleged brain injury, the lesson for the defense is this: try hard to block the introduction into evidence of a neuropsychologist’s estimates of pre-injury cognition.

Federal Rule of Evidence 702 provides: If scientific, technical, or other specialized knowledge will assist the trier of fact to understand the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill, experience, training, or education, may testify thereto in the form of an opinion or otherwise.