Composite versus multiple wellbeing scores

Can anyone meaningfully evaluate their ‘life as a whole’? If they can, how do they do it, and in what ways might this information be useful?

The Enlightenment and the rhetoric of ‘greatest happiness’

Ever since Francis Hutcheson coined the memorable phrase ‘the greatest happiness of the greatest number’ in 1726 as a way of distinguishing good deeds and policies from bad ones, the possibility of ‘maximising’ happiness or wellbeing has been mooted. The phrase, despite its excellent intentions, does not actually make logical sense: even if you could know what the ‘greatest happiness’ was and how to measure it (which you cannot, because happiness is far too diverse, messy, and uncertain for that), you obviously cannot ‘maximise’ two different goods at once.

A few decades after this, Jeremy Bentham began popularising Hutcheson’s phrase. Bentham realised that if people went beyond ‘happiness is what counts’ to a belief that ‘happiness can be counted’, the ‘maximum happiness’ idea would be a lot more influential. So Bentham theorised that it should be possible to measure happiness using a ‘felicific calculus’, a simple sum of ‘pleasures minus pains’.

However, any attempt at such calculation must:

a) select only a few from an infinite number of possible indicators,

b) translate information on those indicators into numbers, and

c) choose one of an infinite number of possible rationales for combining numerical indicators of incommensurable goods into a single aggregate number.
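The three steps above can be sketched as a toy calculation in the spirit of Bentham’s ‘pleasures minus pains’ metaphor. Every indicator name, score, and combining rule here is a hypothetical illustration, not anything Bentham actually specified:

```python
# Toy sketch of a 'felicific calculus' style aggregation.
# All indicators, numbers, and the combining rule are hypothetical.

# (a) select a few indicators from an infinity of possibilities
indicators = {"pleasure_intensity": 7, "pain_intensity": 3, "duration_hours": 2}

# (b) the selections above have already been translated into numbers

# (c) pick ONE of many possible combining rationales: here, a simple
# 'pleasures minus pains' difference scaled by duration
def felicific_score(ind):
    return (ind["pleasure_intensity"] - ind["pain_intensity"]) * ind["duration_hours"]

print(felicific_score(indicators))  # a different rationale at step (c) would yield a different number
```

The point of the sketch is that each of the three steps is a free choice: changing the indicator set, the numeric translation, or the combining rule changes the ‘answer’, which is precisely why a single aggregate number cannot settle anything.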

Bentham’s felicific calculus was obviously a polemical, attention-seeking metaphor. It is doubtful whether he seriously believed anyone would try to come up with a valid measure that would capture the complex meanings and inevitably ambiguous evaluations of wellbeing in a single aggregate number.

Today’s massive statistical efforts to develop and use aggregate measures of wellbeing have been so successful in generating public interest that there is a risk that, as with GDP, people forget these are extremely rough-and-ready tools that can offer only suggestive glimpses of wellbeing. All aggregate indicators need to be kept in perspective and complemented by a variety of other ways of learning about the complex processes involved in generating and evaluating wellbeing.

Nonetheless, probably the most surprising finding of happiness research is that most people worldwide, when asked about their overall happiness or life satisfaction, are willing and able to offer an aggregate self-ranking. In doing so, some mental trick must allow respondents to assign implicit weightings to a wide variety of incommensurable goods. Understanding how this aggregation happens is one of the more intriguing challenges of wellbeing research.

Aggregate wellbeing and composite indices

Nowadays, most countries have regular surveys of happiness or life satisfaction which assume that people, at least in aggregate, can provide valuable information about the experience of ‘life as a whole’. But most national-level planners and researchers would also like more detailed information about the many domains in which people experience wellbeing or illbeing.

So for some purposes, statistical or planning agencies combine scores on several different assessment criteria into an aggregate wellbeing score. Without aggregation, what you have is many separate indications of how well people are doing, and you must work out which of these matter and why. But if you do aggregate the scores, you must either pretend that each score is equally important, or find some criterion for ‘weighting’ the scores by their relative importance to overall wellbeing.

Aggregation is always controversial because it involves questionable assumptions about the relative importance and statistical ‘weighting’ of different scores and dimensions.

Messy experience and neat reports

Any claim to quantify overall happiness or wellbeing involves mental aggregation of a variety of incommensurable values, even if that aggregation is only implicit. Mentally, people assess how happy they are, what kind of ‘balance’ they strike between positive and negative emotions, and whether the wellbeing of a population is higher than it was before, or lower than that of some other population.

The UK charity New Philanthropy Capital promotes an online survey-based tool designed to help charities, schools and other organisations show the difference they make to the lives of young people. It quantifies aggregate impact across eight aspects of subjective wellbeing: self-esteem, emotional wellbeing, resilience, and satisfaction with friends, family, community, school life, and life overall.

A nice illustration of the difficulty of survey-based statistical aggregation came from Oxfam’s experiment in developing a ‘Humankind Index’ to compare information on wellbeing, with an emphasis on the subjective self-evaluations of disadvantaged populations. Different ‘weights’ were assigned to several subdomains of prosperity so as to reflect ‘priority’ on the basis of the responses by the 3,000 people they consulted.
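To make the mechanics concrete, a consultation-derived weighting of this kind feeds into a composite roughly as follows. The subdomain names, weights, and scores below are invented for illustration and are not Oxfam’s actual figures; the only feature echoed from the text is that ‘feeling good’ carries less than one-fifth the weight of housing:

```python
# Hypothetical sketch of a weighted composite in the style of the
# Humankind Index. None of these numbers are Oxfam's actual figures.
weights = {"decent_housing": 0.30, "secure_work": 0.25, "good_health": 0.25,
           "community": 0.15, "feeling_good": 0.05}  # sums to 1.0

def composite_index(domain_scores, weights):
    # each domain scored 0-10; the composite is the weighted sum
    return sum(domain_scores[d] * weights[d] for d in weights)

scores = {"decent_housing": 6, "secure_work": 5, "good_health": 7,
          "community": 8, "feeling_good": 2}
print(round(composite_index(scores, weights), 2))
```

Notice that with a weight of 0.05, even a dramatic change in ‘feeling good’ barely moves the composite, which is exactly the kind of consequence the weighting decision quietly builds in.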

Surprisingly, the consultation process led to the conclusion that ‘feeling good’ mattered least: respondents rated it less than one-fifth as important as having decent housing. It would seem that this interesting consultative attempt to stir up interest in subjective life-evaluation and domain evaluation ran afoul of the perceived necessity to provide aggregate statistical information. It simply does not seem realistic that Oxfam’s informants would agree that ‘feeling good’ is so trivial as to be very much less important than housing. What kind of implicit theory of ‘importance’ might have guided their responses in arriving at that figure? And in any case, what did they think the point of housing was, if not to play some role in helping people to feel good?

In short, even though the process of devising measurement tools and analytical approaches may well help people think about wellbeing more carefully than they otherwise would, the results of combining several different numerical reports are most unlikely to give us useful information.