Progressing a Paradigm Shift in Psychometrics

From PsychWiki - A Collaborative Psychology Wiki

This entry calls for assistance in dealing with a problem that the author, and before him his father, have, between them, been struggling with for 75 years.

More specifically, we would greatly welcome suggestions for alternative ways of thinking about individual differences, references to the work of other researchers who have made significant progress toward developing an alternative conceptual and measurement framework, and suggestions re potential collaborators in taking the work forward.

At the time of writing, it is not entirely clear how the editors of the PsychWiki propose to advance their objective of holding “Virtual Lab Meetings”.

Whereas the main Wikipedia specifically discourages original research and personal points of view, a “Lab meeting” starts precisely with one of these and attracts comments like “No. That’s not the correct way to think about it. Try this ….”. Or “You mean, you’ve never heard of XYZ’s outstanding work …”. Such comments provoke lateral thinking even though one disagrees with them.

So how to organise a similar conversation on the Web? Our suggestion is that readers leave this lead page much as it is but copy it into the “Discussion” page and then enter their comments at the precise places where they belong in the text. Subsequent readers can then enter new comments of their own or agree or disagree with the entries of earlier commentators.

The Problem

The fundamental problem with the current psychometric methods and models used by psychologists was noted by Spearman (the father of g) almost a century ago:

“Every normal man, woman, and child is … a genius at something … It remains to discover at what … This must be a most difficult matter, owing to the very fact that it occurs in only a minute proportion of all possible abilities. It certainly cannot be detected by any of the testing procedures at present in current usage.”

Of course, many will doubt whether the statement is true.

Unfortunately, as Spearman hints, its truth or otherwise cannot be established via research conducted with measures developed to meet the most widely held expectations.

However, assuming for a moment that it might be true, Spearman’s statement, in essence, calls on us to identify precisely what, specifically, each individual is good at. This cannot be done by offering a “profile” over, say, 1 (IQ), 2 (Eductive and Reproductive ability), or 16 (16 PF) “variables”.

To do what we are enjoined to do, we would have to develop something akin to the Linnaeus-Darwin classificatory framework in Biology or Dalton’s Atomic theory in Chemistry. That is, we would have to abandon our attempt to “measure” everyone, after the manner of physicists, on a limited number of “variables” and, instead, embrace the descriptive procedures of biology and chemistry.

Our own work to date (see eg Raven & Stephenson, 2001) suggests that we need to adopt a two-stage (not a two-factor) framework to do this effectively and that the requisite framework is at loggerheads with what have, in the past, been taken to be the basic canons of psychometrics.

One must first identify precisely what it is that an individual is strongly and intrinsically motivated to do. The kinds of things people are strongly predisposed to do are legion and range from developing new scientific theories, through putting people at ease, to robbing banks and creating political turbulence. So we need some kind of classificatory framework (atomic theory) to tell us, by analogy, which of these are “elements” and which are “compounds”.

Then we need to be able to identify all the components of competence that can be brought to bear to undertake the chosen activity effectively. These include things like sensitivity to the fleeting feelings on the edge of consciousness which indicate the germ of an idea or a potential problem, ability to use the unconscious, ability to persuade others to help, the ability to initiate 'experimental interactions with the environment' and learn from the effects of the action, and the ability to persist over a long period of time. It is, however, essential to note that these are all difficult and demanding activities which no one is going to engage in unless they care very strongly about the activity they are undertaking. It therefore makes no sense to try to assess people’s ability to do such things outwith the context of an activity they are strongly and intrinsically motivated to undertake.

In other words, it makes no sense (although psychologists have tried to do so in the past) to seek to assess such things as “intuitive thinking”, “creativity”, “self-confidence”, or even “the ability to think” in any general way. One must always ask “In relation to what?”. People who are highly creative in finding ways of disrupting classroom or social activity are unlikely to display much creativity in art. People who are confident that they can deal with drunks in the street may lack all self-confidence if confronted with an academic task.

Thus Spearman may well be right (and, indeed, the available evidence suggests that he is right). Given an appropriate developmental environment, everyone is a genius at something. Everyone is creative at something. Everyone is a superb intuitive thinker about something. The question is “In relation to what?”

But, clearly, one cannot assess these abilities as we have tried to do in the past. It just doesn’t make sense. This does not mean that the concepts are meaningless. To draw an analogy: all animals are geniuses at something. Some are good at being dogs. Others at being butterflies. One could develop a couple of thousand scales to assess all animals on “dogginess”, “frogginess”, “flyiness” and so on. But that would be kinda stupid. Furthermore, to be good at being a dog, a frog, a fly (etc.) most animals need a heart, lungs, brain, eyes etc. (Compare: To be good at causing havoc in a classroom one needs to be intuitively sensitive to what is going on, to be creative, to study and learn from the effects of one’s actions, to be confident, to persist despite punishment, etc.) It is true that the quality of hearts, lungs, etc. within a species might usefully be assessed. But it would not make any kind of sense to seek to differentiate between animals per se in terms of their “heartiness”, “lunginess”, or “eyeiness”. By the same token it does not make sense to seek to assess the strength of intuition, the ability to think, self-confidence, or persistence outwith the context of a task that the individual finds engaging.

But note something else. The success with which people are able to undertake their chosen tasks is likely to depend on how many of these independent and substitutable components of competence they bring to bear. Do they turn their emotions into the task, and anticipate obstacles and think of ways round them, and learn from the effects of their actions, and persist over a long period of time? These components of competence therefore operate cumulatively and substitutively, not as an “internally consistent” factor. To predict people’s probable success at their chosen tasks, one needs something more like a multiple regression coefficient based on finding out whether they engage in each of a number of, probably relatively independent, but cumulatively necessary, components of competence.
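The contrast between an “internally consistent” factor and cumulative, substitutable components can be illustrated with a small simulation. What follows is a minimal sketch under assumptions of our own, not an analysis of real data: each simulated person either does or does not engage in each of five components (independently, with probability 0.5), and success is taken to be the number of components engaged in plus random noise. The function names, the number of components, and the equal weighting are all hypothetical, and a simple count stands in for the multiple regression mentioned in the text.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-person item-score lists."""
    k = len(rows[0])
    item_variances = sum(
        sample_variance([row[i] for row in rows]) for i in range(k)
    )
    total_variance = sample_variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def simulate(n_people=2000, n_components=5, seed=0):
    """Assumed model: independent 0/1 components; success = count + noise."""
    rng = random.Random(seed)
    rows, success = [], []
    for _ in range(n_people):
        components = [
            1 if rng.random() < 0.5 else 0 for _ in range(n_components)
        ]
        rows.append(components)
        success.append(sum(components) + rng.gauss(0, 0.5))
    return rows, success


if __name__ == "__main__":
    rows, success = simulate()
    counts = [sum(row) for row in rows]
    print("internal consistency (alpha):", round(cronbach_alpha(rows), 3))
    print("count vs. success (r):", round(pearson(counts, success), 3))
```

Under these assumptions the components show almost no internal consistency, so a conventional item analysis would discard the scale, yet the simple count of components tracks success closely. Other assumptions (unequal weights, partly correlated components) would change the numbers but not the logic of the comparison.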

It follows from these remarks that there is no way in which we can get to where we need to get to starting from the traditional preoccupations, thoughtways, and assumptions of psychometricians.

But we haven’t yet finished with what we can learn from Spearman because he went on to make two further observations:

1. The kinds of tests from the correlations between which his g had emerged “had no place in schools” because they did not encourage teachers to educate – to “draw out” - the diverse talents of their pupils – ie to create multiple individualised developmental programmes that harness each pupil’s idiosyncratic motives and lead them to develop the components of competence mentioned above so that they end up with one or other of the seemingly endless distinctive talents they could possess. Stated the other way round, the use of these tests results in schools being destructive, and therefore unethical, institutions because they lead most teachers (and society) to overlook and fail to develop and utilise most people’s most important talents.

2. These tests had little construct validity. There is, for example, no sense in which a typical “science” test measures scientific competence. It measures the ability to recall, for a brief period of time, a quickly-to-be-forgotten smattering of unrelated scraps of temporary scientific knowledge and routinised techniques.

Ignoring these observations constitutes nothing less than a crime against humanity. Here’s why:

Firstly, because, by failing to identify people’s talents, the tests in current use deprive people of opportunities to gain recognition for, develop and utilise those talents. But they do not just harm people as individuals because, as Kanter (1985) and others have shown, people contribute in very different ways to the emergent properties of the groups that make the main contributions to the development and survival of organisations. One person is good at noticing a problem, another good at publicising it, another at intervening in the organisation to raise the funds needed to do something about it, another at orchestrating espionage activity to find information relevant to solving it from within other organisations, another at creating the wider political turbulence required to get legislation enacted to create a market for the product, another at smoothing out arguments between people within the group … and so on. These contributions do not show up in most staff appraisal and guidance, placement, and development exercises, and still less are they called for in most job descriptions and selection practices. Worse, these neglected talents are the very ones that are required to introduce the radical changes in our way of life that are required if we are to survive as a species. What could be more unethical than contributing to the extinction of our species and the destruction of the planet as we know it?

Secondly, neglecting them contributes directly to the production of thousands of studies which, while billed as contributing to “Evidence Based Practice”, are scientifically incompetent and lead to policy decisions that are highly unethical.

Here are a couple of examples from education and “health care”.

Many “progressive” educational activities seek to nurture qualities like self-confidence, problem-solving ability, initiative, and the ability to understand and intervene in organisations. Furthermore, they seek to nurture different qualities in different children. Since there are no good measures of outcomes like self-confidence, ability to work with others, or ability to understand and intervene in organisations, most comparative evaluations utilise only traditional measures, mostly just of “The 3Rs”. Since the “progressive” teachers did not set out to produce higher reading scores (at least as conventionally measured), their pupils do no better on these tests than pupils who have studied in other programmes. Politicians take this as a signal to close the programmes. Yet these are the very educational activities that are required to nurture the talents that are needed to transform our society in such a way that our species will have a chance of survival. (Readers interested in thinking about how the concepts and methods of organisational psychology can be deployed to contribute to the movement toward a learning society should go to Creating_a_Learning_Society:_What_have_Organisational_Psychologists_to_Offer?)

But this is actually the least of the problems these programmes pose for evaluation because many of the teachers concerned set out to nurture different – and idiosyncratic – qualities in different children. Most evaluation designs offer no prospect of documenting growth in multiple, divergent, directions.

The net result of evaluations conducted according to traditional criteria and practice is that neither the beneficial results of the “progressive” programmes nor the detrimental effects of conventional “educational” activities show up. So everyone thinks that the “progressive” programmes are ineffective; a waste of time and money. This results in the perpetuation of conventional educational activities which inflict serious damage on most pupils and, in the long term, on society.

At one level, these unscientific and unethical conclusions stem from the utilisation of what amount to arbitrary selections of measures. At another level they stem from the use of tests that lack construct validity: the use of tests that do not measure reading ability in any meaningful sense of the word, still less problem solving ability or self confidence. And, at yet another level, they stem from an even more fundamental problem: effective educational and developmental activities transform people. Participants do not become more or less of something. Latent motives or talents are stumbled upon and capitalised upon. Ideally, all of the participants develop in different ways.

These problems become still more intractable in the context of attempts to evaluate adult developmental activities. People not only learn how to contribute in very different ways to eg community development projects, they also establish networks of relationships with others which enable them to be different people; to be effective. Thus effectiveness is not, after all, an individual characteristic. But how is this to show up in the evaluations? People have become more competent and confident because, and only because, they are working in a context which respects, harnesses, and builds on their individual motives and talents.

Although it is a digression here, it is not entirely irrelevant to note that these are not the last of the problems that need to be addressed by anyone who wishes to undertake a serious evaluation of such programmes. This is because many of those concerned have done things like resolve personal problems that have bugged them all their lives. They have met husbands, wives, friends who have, through their complementarity, enabled them to be more competent individuals, more fulfilled people. Such changes cannot show up on conventional, internally-consistent, psychometric tests.

Although it would not be appropriate to discuss them here, the problems these observations pose for evaluation models as such become even more serious when they are applied to the evaluation of such things as psychotherapy and health care more generally … especially when such evaluations are made mandatory and required for such things as Evidence Based Health Care Treatments and “Payment by results”. Once this is done, they make it virtually impossible to even think about, let alone discuss, activities that aim to enhance well being.

Previous attempts to solve the problem

The problems just articulated have, in reality, been noticed by huge numbers of people … but they have not been articulated so crisply as they were by Spearman or discussed in as much detail as here. They have mostly found expression simply as seething rage in conversations in staff common rooms and at meetings of innovators.

One set of attempts to find a solution to the problem emerges in schemes to promote the collection of folios of work by pupils in schools. Most of these collapse. This is partly because the activities on which the folios are based are too school-oriented to have much chance of revealing pupils’ motives and talents, and it is partly because they confront their users with the task of sifting through quantities of material and trying to “make sense” of it … ie trying to develop a conceptual classificatory framework of their own and relate the mass of “data” to it. (Of course, these are not the main reasons why such efforts fail. These have much more to do with being driven out by National curricula and mandatory high stakes testing programmes.)

Over the past few years, there has been much talk around the “strengths” movement, most often associated with the name of Seligman. This is not the place to launch into a critique of this work. Suffice it to say that it does not really seem to engage with the multiple talents framework that can be discerned behind our earlier discussion of “progressive education”.

The most promising attempt to come to terms with the problems that have been mentioned seems to stem from the work of those of David McClelland’s colleagues who developed the scoring system for McClelland’s variant of the Thematic Apperception Test known as the “Test of Imagination”.

Very few researchers have studied this scoring framework (which constitutes the operational definition of the terms used; what their actual meaning is) with any care. Most simply impose lay interpretations on what are, in reality, technical terms used in a very specific (and, as it happens, misleading) way. And most of the critiques of the framework that are bandied around simply echo those made by others who, in the early days, without thinking about the actual implications of the operational definitions of the terms or the limitations of conventional psychometric thinking, sought to apply unexamined received opinion relating to “standards” in test construction (including such things as internal consistency and validity) to a framework with which they had made no real effort to familiarise themselves. (It is interesting that it has taken half a century for the inappropriateness of applying notions derived from Classical Test Theory to tests developed using Item Response Theory to become even moderately well known.) In spite of attempted developments in psychometrics, the basic problem is that of treating Hilbert space as if it were Cartesian space. Furthermore, computer programs calculate on an absolute scale. The problem is therefore more one of interpretation than of applying seemingly sophisticated psychometrics.

The McClelland Framework

It is not possible here to present the original framework or the author’s reconceptualisation of it in any detail. For that, the reader must turn to eg Raven & Stephenson (eds) (2001). However, in essence, what the scoring system does is first ask “What specific kinds of activity is this person strongly intrinsically motivated to undertake?” and then, and only then, “In relation to that kind of activity (ie not ‘in general’) which, and more importantly, how many, of a specifiable number of difficult and demanding activities that are likely to enable this individual to undertake that kind of activity effectively does he or she undertake spontaneously?”
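The two-stage logic just described can be sketched as a simple data structure. This is a hypothetical illustration of the ordering of the two questions, not the published scoring system: the class name, the field names, and the list of component labels (borrowed loosely from the examples given earlier in this entry) are our own assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical component labels, loosely based on examples in the text.
COMPONENTS = frozenset({
    "attends_to_feelings",
    "anticipates_obstacles",
    "persuades_others_to_help",
    "learns_from_effects_of_action",
    "persists_over_time",
})


@dataclass
class CompetenceStatement:
    """Stage 1: the activity the person is strongly motivated to undertake.
    Stage 2: the components of competence shown in relation to it."""

    valued_activity: str
    components_shown: set = field(default_factory=set)

    def __post_init__(self):
        # Reject labels outside the assumed list, e.g. a "general" trait.
        unknown = set(self.components_shown) - COMPONENTS
        if unknown:
            raise ValueError(f"unknown components: {sorted(unknown)}")

    def score(self) -> int:
        """Number of components displayed for this activity only."""
        return len(self.components_shown)


if __name__ == "__main__":
    statement = CompetenceStatement(
        valued_activity="putting people at ease",
        components_shown={"anticipates_obstacles", "persists_over_time"},
    )
    print(statement.valued_activity, statement.score())
```

The point of the structure is that a score exists only as a property of a statement that already names a valued activity; there is no way to record, say, “persistence” for a person in general.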

At this point we may repeat a couple of points made earlier:

1. The kinds of activity people may be strongly predisposed to undertake are legion. So we need some kind of classificatory framework akin to atomic theory in Chemistry to tell us which are elements and which compounds and perhaps to impose some kind of meaningful grouping (not “factor” based) on the list.

2. The cumulative and substitutable components of competence that people need to bring to bear to undertake their chosen activities effectively seem to be more limited in number, but they run across cognitive, affective, and conative domains. It is easy to say that they involve cognitive activities like making plans, anticipating obstacles, and “thinking” of ways round the obstacles. Unfortunately, even these, nominally cognitive, activities are, in reality, complex activities having cognitive, affective and conative components. For example, “thinking” means paying attention to feelings (affective) on the edge of consciousness, trying to make them explicit (conation), and initiating action to “test” emergent, usually unverbalised, “hypotheses” and learning (again not usually in a formal, verbalised, manner) from the effects of that action more about the problem itself and the effectiveness of the trial solution. The use of affect involves much more than turning one’s emotions into the task. (The source of virtually all “cognitive” activity is primarily affective.) Persuading people to help demands complex social skills involving knowledge, sensitivity, planning etc. Yet none of these difficult and demanding abilities can meaningfully be assessed across tasks but only in the context of a task that is individually engaging.

In short, the dominant thoughtways, not only of psychometrics but also of psychology in general, mislead.

From our current viewpoint, the attempt by McClelland and his colleagues to produce “motivation” “profiles” was unfortunate. If we are right that people’s motivational predispositions are legion yet very specific and persistent over the life cycle, then what one needs is a statement, akin to a chemical description of a substance, about each individual identifying which specific activities he or she is strongly motivated to carry out and which components of competence he or she brings to bear to carry them out.

Having said that, it is immediately obvious that what will happen as the individual is placed in different environments can be modelled in the same way. Chemical substances may be transformed by their environments. They combine with other substances to form new substances having emergent properties that cannot be predicted from summing the components. (Copper is, in a sense, transformed by sulphuric acid to yield copper sulphate which has properties that cannot be predicted in any linear sense by summing the properties of copper, sulphur, and oxygen.) As psychologists, we need to think more carefully both about how people are transformed by their environments and about how they get together with others to form groups having emergent properties that are not the sum of their individual characteristics.

But, then again, although we have not given the necessary background to explicate the statement here, a biological and ecological analogy is probably more appropriate than a chemical one. Ask yourself where biologists would have got to if they had tried to classify all the animals and plants in terms of their “scores” on 1, 2, or 16 variables, their environments in terms of 10, and then study the effects of the environments on the animals using multiple regression techniques! No. What is needed is something analogous to an ecological framework in which one maps multiple feedback loops between developing organisms and their environments.

What we would like from readers

As has been indicated, we are none too sure how to progress from the position outlined above. We would therefore greatly welcome any references to the work of researchers who have made significant progress over and above that outlined above. This might include alternative formulations of the problem. We would also greatly welcome any suggestions anyone is able to make toward developing an appropriate classificatory framework. And suggestions readers might have regarding whom we might contact in our search for collaborators who might help us to take the work forward would also be greatly welcomed.

Forthcoming Symposium

The issues raised above will be discussed in a symposium entitled “Serious Errors in the Evaluation of Individuals and Programmes Arising from the Use of Tests yielding Arbitrary Metrics and from the deployment of Arbitrary selections of Measures (and their amelioration with the aid of Item Response Theory)”, which will be held at the conference organised by the International Test Commission in Liverpool, England, from July 13 to 16, 2008. See http://www.itc2008.com

References

Only a few of the relevant references can be cited here. More can be found on the author’s website: http://www.eyeonsociety.co.uk although this is still, very much, in the course of development.