Thursday, March 31, 2011

Monahan, J., & Walker, L. (2011). Twenty-Five Years of Social Science in Law. Law and Human Behavior, 35(1), 72-82.

In this essay, we take the publication of the seventh edition of the casebook Social Science in Law (2010) as an opportunity to reflect on continuities and changes that have occurred in the application of social science research to American law over the past quarter-century. We structure these reflections by comparing and contrasting the original edition of the book with the current one. When the first edition appeared, courts’ reliance on social science was often confused and always contested. Now, courts’ reliance on social science is so common as to be unremarkable. What has changed—sometimes radically—are the substantive legal questions on which social science has been brought to bear.

Courts occasionally permit psychologists to present expert evidence in an attempt to help jurors evaluate eyewitness identification evidence. This paper reviews research assessing the impact of this expert evidence, which we argue should aim to increase jurors' ability to discriminate accurate from inaccurate identifications. With this in mind we identify three different research designs, two indirectly measuring the expert's impact on juror discrimination accuracy and one which directly assesses its effect on this measure. Across a total of 24 experiments, three have used the superior direct methodology, only one of which provides evidence that expert testimony can improve jurors' ability to discriminate between accurate and inaccurate eyewitness identifications.

Juror and jury research is a thriving area of investigation in legal psychology. Basic ANOVA and regression, well known to psychologists, are inappropriate for analysing many types of data from this area of research. This paper describes statistical techniques suitable for some of the main questions asked by jury researchers. First, we discuss how to examine manipulations that may affect levels of reasonable doubt and how to measure reasonable doubt using the coefficients estimated from a logistic regression. Second, we compare models designed for analysing data like those which often arise in research where jurors first make categorical judgments (e.g., negligent or not, guilty or not) and then, depending on their response, may make another judgment (e.g., award, punishment). We concentrate on zero-inflated and hurdle models. Third, we examine how to take into account that jurors are nested within juries using multilevel modelling. We illustrate each of the techniques using software that can be downloaded for free from the Internet (the package R) and provide a web page that gives further details for running these analyses.
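The first of these techniques can be sketched in a few lines. This is a minimal illustration, not the authors' code: the coefficients below are hypothetical, and in practice they would be estimated from juror verdict data (e.g., with R's glm or a comparable routine).

```python
import math

def conviction_probability(evidence_strength, b0, b1):
    """Predicted probability of a guilty verdict from a fitted logistic
    regression: logit(p) = b0 + b1 * evidence_strength."""
    logit = b0 + b1 * evidence_strength
    return 1.0 / (1.0 + math.exp(-logit))

def evidence_threshold(p, b0, b1):
    """Evidence strength at which the predicted probability of conviction
    equals p -- one way to locate an implicit 'reasonable doubt' threshold
    on the evidence scale."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical coefficients for illustration only:
b0, b1 = -4.0, 0.8
p = conviction_probability(5.0, b0, b1)      # probability at evidence strength 5
threshold = evidence_threshold(0.5, b0, b1)  # 50% conviction point
```

Inverting the fitted equation, as `evidence_threshold` does, is what lets the logistic coefficients be read as a measure of where jurors place reasonable doubt.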

One small portion of the paper explained why composite/cluster scores from IQ tests often are higher (or lower) than the arithmetic mean of the tests that comprise the composite. This observation often baffles test users.

I would urge those who have pondered this question to read that section of the report. And THEN, be prepared to be blown away by an instructional video Joel posted at his blog, where he leads you through a visual-graphic explanation of the phenomenon. Don't be scared by the geometry or some of the terms. Just sit back, relax, and recognize, even if all the technical stuff is not your cup of tea, that there is an explanation for this score phenomenon. And when colleagues ask, just refer them to Joel's blog.

It is brilliant and worth a view, even if you are not a quantitatively oriented thinker.

Below is a screen capture of the start of the video.

An issue/question that has surfaced (not for the first time) is why markedly discrepant subtest scores that form a composite can still be considered valid indicators of the construct domain. Often clinicians believe that if there is a significant and large discrepancy between tests within a composite, the total score should be considered invalid.

The issue is complex and was touched on briefly in our report and in the NASP and CHC threads by Joel Schneider. Here I mention just ONE concept for consideration.

Below is a 2-D MDS analysis of the WJ III Cog/Ach tests for subjects aged 6-18 in the norm sample. MDS, like factor analysis, finds structure in data. This 2-D model is based on an analysis of the tests' correlation matrix. What I think is a major value of MDS, and other spatial statistics, is that one can "see" the numerical relations between tests. Although the metrics are not identical, the visual-spatial map of the WJ III tests does, more or less, mirror the intercorrelations between the tests.
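For readers curious about the mechanics, a 2-D map like this can be computed with classical (Torgerson) MDS directly from a correlation matrix. This is a minimal sketch using a toy four-test correlation matrix with made-up values (two tightly correlated "verbal" tests and two loosely correlated "visual" tests), not the actual WJ III data:

```python
import numpy as np

# Toy correlation matrix (hypothetical values, not WJ III):
# tests 1-2 are a cohesive pair (r = .8); tests 3-4 are loosely related.
R = np.array([[1.0, 0.8, 0.3, 0.2],
              [0.8, 1.0, 0.3, 0.2],
              [0.3, 0.3, 1.0, 0.4],
              [0.2, 0.2, 0.4, 1.0]])

# Convert correlations to distances: highly correlated tests land close
# together on the map.
D = np.sqrt(2.0 * (1.0 - R))

# Classical MDS: double-center the squared distances ...
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# ... then use the top-2 eigenvectors (scaled) as 2-D coordinates.
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1][:2]
coords = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# The cohesive pair should sit closer together than the loose pair.
d_verbal = np.linalg.norm(coords[0] - coords[1])
d_visual = np.linalg.norm(coords[2] - coords[3])
```

The resulting map behaves exactly as described in the post: the highly intercorrelated pair clusters tightly while the loosely correlated pair spreads out.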

So... take a look at the Gc, Grw, or Gq tests in this MDS map. All of these tests cluster closely together, and inspection of their intercorrelations finds high correlations among all measures. Conversely, look at the large amount of spatial territory covered by the WJ III Gv tests. Also look at the Ga tests (note that no red line connects Auditory Attention, AA, in the lower right-hand quadrant, with the other Ga tests). Furthermore, even though most of the Gsm tests are relatively cohesive or tight, Memory for Sentences sits farther from the other Gsm tests.

IMHO, these visual-spatial maps, which mirror intercorrelations, tell us that in humans, not all cognitive/achievement domains include narrow abilities that are highly intercorrelated. I call it "ability domain cohesion." Clearly the different Gv abilities measured by the WJ III Gv tests indicate that the Gv domain is less cohesive (less tight) than the Gc or Grw domains. This does not suggest the tests are flawed; instead it tells us about the varying degrees of cohesiveness present in different ability domains.

Thus, for ability domains that are very broad in terms of domain cohesion (e.g., Gv and Ga in this MDS figure), wildly different test scores (e.g., between WJ III Spatial Relations, SR, and Picture Recognition, PR) may be valid and simply reflect the inherent lower cohesiveness (tightness) of these ability domains in human intelligence. If a person's Gv SR and PR scores differ significantly, and each score provides a valid indication of the person's relative standing on the measured ability, then combining them is appropriate and yields a valid estimate of the Gv domain, which by nature is broad; people will often display significant within-domain variability.

Bottom line: composite scores produced by subtests that are markedly different are likely valid estimates of their domains; it is simply the nature of human intelligence that some of these domains are more tight or cohesive than others.

Sunday, March 27, 2011

Should psychologists engage in the practice of calculating simple arithmetic averages of two or more scaled or standard scores from different subtests (pseudo-composites) within or across different IQ batteries? Dr. Joel Schneider and I (Dr. Kevin McGrew) say "no."

Do psychologists who include simple pseudo-composite scores in their reports, or make interpretations and recommendations based on such scores, have a professional responsibility to alert recipients of psychological reports (e.g., lawyers, the courts, parents, special education staff, other mental health practitioners, etc.) of the potential amount of error in their statements when simple pseudo-composite scores are the foundation of some of their statements? We believe "yes."

Simple pseudo-composite scores, in contrast to norm-based scores (i.e., composite scores with norms provided by test publishers/authors, e.g., the Wechsler Verbal Comprehension Index), contain significant sources of error. Although they have intuitive appeal, this appeal cloaks hidden sources of error in the scores, with the amount of error being a function of a combination of psychometric variables.

In the report we offer recommendations and resources that allow users to calculate psychometrically sound pseudo-composites when they are deemed important and relevant to the interpretation of a person's assessment results.

Finally, understanding the sources of error in simple pseudo-composite scores provides an opportunity for practitioners to understand the paradoxical phenomenon frequently observed in practice where norm-based or psychometrically sound pseudo-composite scores are often higher (or lower) than the subtest scores that comprise the composite. The "total does not equal the average of the parts" phenomenon is explained conceptually, statistically, and via an interesting visual explanation based on trigonometry.
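The arithmetic behind the "total does not equal the average of the parts" phenomenon can be shown in a few lines. This is a simplified sketch assuming two equally weighted subtests with a common intercorrelation; the numbers are illustrative and not drawn from any published test:

```python
import math

def composite_z(z_scores, r):
    """Norm-based composite z for k equally weighted subtests sharing a
    common intercorrelation r: the sum of the subtest z scores divided by
    the SD of that sum, sqrt(k + k*(k-1)*r)."""
    k = len(z_scores)
    return sum(z_scores) / math.sqrt(k + k * (k - 1) * r)

# Two subtests each one SD above the mean (IQ-metric scores of 115),
# intercorrelated r = .60:
z = composite_z([1.0, 1.0], r=0.6)
composite_iq = 100 + 15 * z                   # back to the IQ metric
mean_iq = 100 + 15 * (1.0 + 1.0) / 2          # simple average of the parts

# composite_iq (about 117) exceeds mean_iq (115): because the subtests are
# imperfectly correlated, the composite's SD is smaller than the sum of the
# subtest SDs, so the same raw deviation converts to a more extreme z.
```

The lower the intercorrelation, the larger the divergence between the norm-based composite and the simple average, which is exactly why arithmetic averaging (the pseudo-composite) misleads.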

Abstract

The publishers and authors of intelligence test batteries provide norm-based composite scores based on two or more individual subtests. In practice, clinicians frequently form hypotheses based on combinations of tests for which norm-based composite scores are not available. In addition, with the emergence of Cattell-Horn-Carroll (CHC) theory as the consensus psychometric theory of intelligence, clinicians are now more frequently “crossing batteries” to form composites intended to represent broad or narrow CHC abilities. Beyond simple “eye-balling” of groups of subtests, clinicians at times compute the arithmetic average of subtest scaled or standard scores (pseudo-composites). This practice suffers from serious psychometric flaws and can lead to incorrect diagnoses and decisions. The problems with pseudo-composite scores are explained and recommendations made for the proper calculation of special composite scores.

Saturday, March 19, 2011

A very nice, concise overview of recent research and theoretical discussions of the possibility of a personality g-factor...akin to "g" (general intelligence), as well as the findings of possible "plasticity" and "stability" higher-order factors above the Big 5 personality traits.

Tuesday, March 15, 2011

I have had a number of people send me copies of this article (see abstracts and journal info below), especially those who do work related to Dx of MR/ID in Atkins death penalty cases.

The abstract is self-explanatory--the authors conclude that the WAIS-III four-factor structure is not validated in an MR/ID population. I can hear a lawyer now--"so Dr. __________, according to MacLean et al. the WAIS-III doesn't measure the same abilities in individuals with MR/ID...so aren't your results questionable?"

A close read of the article suggests the results should be taken with a serious grain of salt. In fact, the discussion section is devoted primarily to the various methodological and statistical reasons why the published four-factor model may not have fit.

As is often the case when dealing with samples of convenience (the authors' own words), especially samples of individuals at the lower end of the ability continuum, the variables often show significant problems with non-normality and skew. Both are present in this sample. Given that we are dealing with SEM-based statistics, the problem is actually one of not meeting the assumption of multivariate normality. The variables also showed restricted SDs (a restricted range of talent), a condition that dampens the correlations in a matrix.
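The dampening effect of a restricted range of talent can be quantified with the classic Thorndike Case 2 range-restriction formula. A minimal sketch with illustrative values:

```python
import math

def restricted_r(r, u):
    """Correlation expected in a range-restricted sample (Thorndike Case 2),
    where u = restricted SD / unrestricted SD on the selection variable and
    r is the correlation in the unrestricted population."""
    return r * u / math.sqrt(1.0 - r**2 + (r**2) * (u**2))

# A correlation of .70 in the full population, observed in a sample whose
# SD is half the population SD (u = .5), shrinks to roughly .44:
r_full = 0.70
r_sample = restricted_r(r_full, 0.5)
```

Since factor models are fit to the correlation (or covariance) matrix, correlations attenuated this way directly degrade absolute fit, which is the point made above about samples drawn from the low end of the ability continuum.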

While doing extensive modeling research at the Institute for Community Integration at the University of Minnesota, an institute devoted to individuals with MR/ID/DD, I was constantly faced with data sets with these problems. As a result, I routinely obtained model fit statistics that were much lower than the standard acceptable rules of thumb for model fit, reflecting the limited statistical and distributional robustness of such sample data. The best way to overcome the resultant low model fits (after trying transformations of the variables to different scales) was to compare the fit of competing models. The best-fitting model, when compared to competing models, may still show a relatively poor absolute fit value (when compared to the standard rules of thumb), but by demonstrating that it was the best among the alternatives, the case could be made that it was still the best possible model given the constraints of the sample data.

This leads to the MAJOR flaw of this study. Although the authors discuss the sample problems above, they tested only one model: the WAIS-III four-factor model. They then looked at the absolute values of the fit statistics and concluded that the four-factor model was not a good fit. Since the standard rules of thumb for the absolute magnitude of fit statistics may no longer hold in samples with statistical and distributional problems, they should have specified competing models (e.g., two-factor, CHC, single-factor) and then compared the relative model fit statistics before rendering a conclusion.

Finally, as the authors correctly point out, the current results, even with the flaws noted above, may simply reflect the well-established finding that cognitive abilities are less differentiated in lower-functioning individuals and more differentiated in higher-functioning individuals. This is Spearman's Law of Diminishing Returns (SLODR). [Click here for an interesting recent discussion of SLODR]

Bottom line for the blogmaster: I judge the authors' conclusions to be overstated for the reasons noted above, particularly the failure to compare the four-factor model to alternative models. It is very possible that the four-factor model is the best-fitting model given the statistical and distributional constraints of the underlying sample data.

Abstract

Intellectual assessment is central to the process of diagnosing an intellectual disability and the assessment process needs to be valid and reliable. One fundamental aspect of validity is that of measurement invariance, i.e. that the assessment measures the same thing in different populations. There are reasons to believe that measurement invariance of the Wechsler scales may not hold for people with an intellectual disability. Many of the issues which may influence factorial invariance are common to all versions of the scales. The present study, therefore, explored the factorial validity of the WAIS-III as used with people with an intellectual disability. Confirmatory factor analysis was used to assess goodness of fit of the proposed four factor model using 13 and 11 subtests. None of the indices used suggested a good fit for the model, indicating a lack of factorial validity and suggesting a lack of measurement invariance of the assessment with people with an intellectual disability. Several explanations for this and implications for other intellectual assessments were discussed.

SAGE Open, our new open access publication, has received more than 200 submissions since launching on January 1, with new articles being submitted daily. Be a part of this groundbreaking publication and prepare your manuscript today.

SAGE Open publishes peer-reviewed, original research and review articles in an interactive, open-access format. Articles may span the full spectrum of the social and behavioral sciences and the humanities. Find out more, including manuscript submission guidelines, at www.sageopen.com.

Why publish in SAGE Open?

Quick review and decision times for authors

Speedy, continuous-publication online format

Global distribution of your research via SAGE Journals Online, including enhanced online features such as public usage metrics, comments features, subject categories, and article ranking and recommendations

Friday, March 11, 2011

While at the NASP conference I ran across a number of excellent poster papers on number sense, a hot topic in mathematics and individual differences research. Today I received a PDF of one of the papers I requested: a review of the literature on the various components that have been mentioned regarding number sense (34 different elements). The poster paper can be viewed by clicking here (the authors gave me permission).

Of interest to me is the factor structure of number sense. Although 34 elements may be mentioned in definitions, how many latent dimensions really exist? My reading has suggested two. This is a nice paper and the references are awesome for anyone looking to get up to speed in this area.

Thursday, March 10, 2011

According to the most widely accepted Cattell–Horn–Carroll (CHC) model of intelligence measurement, each subtest score of the Wechsler Intelligence Scale for Adults (3rd ed.; WAIS–III) should reflect both 1st- and 2nd-order factors (i.e., 4 or 5 broad abilities and 1 general factor). To disentangle the contribution of each factor, we applied a Schmid–Leiman orthogonalization transformation (SLT) to the standardization data published in the French technical manual for the WAIS–III. Results showed that the general factor accounted for 63% of the common variance and that the specific contributions of the 1st-order factors were weak (4.7%–15.9%). We also addressed this issue by using confirmatory factor analysis. Results indicated that the bifactor model (with 1st-order group and general factors) better fit the data than did the traditional higher order structure. Models based on the CHC framework were also tested. Results indicated that a higher order CHC model showed a better fit than did the classical 4-factor model; however, the WAIS bifactor structure was the most adequate. We recommend that users do not discount the Full Scale IQ when interpreting the index scores of the WAIS–III because the general factor accounts for the bulk of the common variance in the French WAIS–III. The 4 index scores cannot be considered to reflect only broad ability because they include a strong contribution of the general factor. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
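For readers unfamiliar with the Schmid-Leiman transformation the abstract refers to, its core is a pair of simple matrix operations on the first- and second-order loadings. This is a toy sketch with hypothetical loadings, not the French WAIS-III values:

```python
import numpy as np

# Hypothetical loadings: four subtests on two first-order group factors,
# which in turn both load on a general factor (second order).
F = np.array([[0.8, 0.0],
              [0.7, 0.0],
              [0.0, 0.8],
              [0.0, 0.6]])          # first-order pattern loadings
g = np.array([0.9, 0.8])            # second-order loadings on the general factor

# Schmid-Leiman orthogonalization: each subtest's general-factor loading is
# its first-order loading times that factor's loading on g; the group factors
# retain only the residualized (g-free) portion.
general = F @ g
group = F * np.sqrt(1.0 - g**2)

# Share of common variance attributable to g vs. the group factors:
var_g = np.sum(general**2)
var_group = np.sum(group**2)
pct_g = var_g / (var_g + var_group)
```

Even in this toy example the general factor absorbs most of the common variance, which is the pattern the study reports for the French WAIS-III (63% of common variance on g, with weak specific first-order contributions).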

Objective: Comparability of meaning of neuropsychological test results across ethnic, linguistic, and cultural groups is important for clinicians challenged with assessing increasing numbers of older ethnic minorities. We examined the dimensional structure of a neuropsychological test battery in linguistically and demographically diverse older adults. Method: The Spanish and English Neuropsychological Assessment Scales (SENAS), developed to provide psychometrically sound measures of cognition for multiethnic and multilingual applications, was administered to a community dwelling sample of 760 Whites, 443 African Americans, 451 English-speaking Hispanics, and 882 Spanish-speaking Hispanics. Cognitive function spanned a broad range from normal to mildly impaired to demented. Multiple group confirmatory factor analysis was used to examine equivalence of the dimensional structure for the SENAS across the groups defined by language and ethnicity. Results: Covariance among 16 SENAS tests was best explained by five cognitive dimensions corresponding to episodic memory, semantic memory/language, spatial ability, attention/working memory, and verbal fluency. Multiple Group confirmatory factor analysis supported a common dimensional structure in the diverse groups. Measures of episodic memory showed the most compelling evidence of measurement equivalence across groups. Measurement equivalence was observed for most but not all measures of semantic memory/language and spatial ability. Measures of attention/working memory defined a common dimension in the different groups, but results suggest that scores are not strictly comparable across groups. Conclusions: These results support the applicability of the SENAS for use with multiethnic and bilingual older adults, and more broadly, provide evidence of similar dimensions of cognition in the groups represented in the study. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

Research suggests that executive functioning skills may enhance the school readiness of children from disadvantaged homes. Questions remain, however, concerning both the structure and the stability of executive functioning among preschoolers. In addition, there is a lack of research addressing potential predictors of longitudinal change in executive functioning during early childhood. This study examined the structure of executive functioning from fall to spring of the preschool year using a multimethod battery of measures. Confirmatory factor analyses revealed a unidimensional model fit the data well at both time points, and tests of measurement invariance across time points indicated that children's mean latent executive functioning scores significantly improved over time. Verbal ability was a significant predictor of longitudinal change in executive functioning. Theoretical implications and directions for future research are discussed. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

We've updated our list of psychologists (plus a few stray neuroscientists, therapists, students and psych-bloggers) who Tweet. Follower counts were correct as of Friday 4 March 2011. Compare with the previous list compiled in November 2010. The Digest editorial team are in purple highlight.

Thanks to Ben Watson for updating the follower counts. If you'd like to be added to future iterations of the list please add your full name and Twitter tag to comments. Future additions to the list must be fully-qualified psychologists. Also, we're restricting the list to individuals, so no organisations please.

About Me

Dr. Kevin McGrew is Director of the Institute for Applied Psychometrics (llc). Additional information, including potential conflicts of interest resulting from commercial test development or other consultation, can be found at The MindHub(TM; http://www.themindhub.com ). General email contact is iap@earthlink.net.