Self-reported measures of happiness are growing in popularity as alternatives to GDP. This column presents a novel statistical critique of the validity of comparing such measures across groups. Since monotonic transformations of individuals’ happiness levels can reverse average happiness rankings between countries, no meaningful comparison can be made without assumptions on the distribution of happiness.

Economists have long known that GDP is an imperfect measure of well-being. In addition to missing nonmarket transactions, it ignores environmental degradation, the quality of social interactions, and many other outcomes of economic interest. But at least since Easterlin (1974) some economists have gone further, and challenged the view that per capita GDP and well-being are positively related. This has fostered calls to replace or downplay GDP as a measure of national success, while augmenting or replacing it with measures of subjective well-being, a position supported in some governmental circles. Others have argued for the purposeful contraction of economies (e.g. Victor 2010, Kallis et al. 2012). Subjective well-being measures have also been used to challenge ‘common knowledge’ on topics ranging from whether being wheelchair-bound reduces happiness to whether students should prefer ‘Podunk U’ or Harvard.

We view many of these conclusions as premature. They have been reached without an adequate recognition of the limits of happiness scales. Our goal is not to provide a litany of concerns – which can be done for most social scientific metrics – but rather to establish that standard happiness measures cannot rank the average happiness of two groups without strong additional assumptions about the underlying distribution of happiness. We expect that even if we achieve consensus on these assumptions, many comparisons will remain unranked.

Any ranking can be reversed

One common measure of happiness asks respondents to evaluate their happiness on a three-point scale. The General Social Survey, for example, asks: “Taken all together, how would you say things are these days – would you say that you are very happy, pretty happy, or not too happy?” Researchers typically code “very happy” as 2, “pretty happy” as 1, and “not too happy” as 0.

But surely not everyone who reports being very happy is equally happy. Even if everyone interprets the question identically, some very happy people will be “extremely happy” while others will be only a little happier than “pretty happy.” These responses represent individuals placing their happiness (a continuous variable) into discrete categories. For example, an individual may decide she is “pretty happy” if her happiness is 60-80, and “very happy” if it is above 80. Oswald (2008) refers to this as the “reporting function.” With three intervals, if all individuals use the same reporting function we can, without loss of generality, define the bottom interval as happiness below zero and the top interval as happiness above 1.
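The idea of a reporting function can be made concrete with a minimal sketch. It assumes the normalisation just described (cutoffs at 0 and 1); the function name `report` and the treatment of values exactly on a cutoff are illustrative choices, not part of any survey's definition.

```python
from bisect import bisect_right

# Assumed cutoffs, following the normalisation in the text:
# below 0 -> "not too happy", 0 to 1 -> "pretty happy", above 1 -> "very happy".
CUTOFFS = [0.0, 1.0]
LABELS = ["not too happy", "pretty happy", "very happy"]


def report(happiness: float) -> str:
    """Map a continuous happiness level to the category a respondent reports."""
    return LABELS[bisect_right(CUTOFFS, happiness)]


# Two respondents with very different happiness levels give the same answer.
print(report(1.05), "|", report(4.0))
print(report(0.5), "|", report(-2.0))
```

The categorical answer records only which interval a respondent falls in, so everything about the distribution within each interval has to come from assumptions.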

Consider now the example in Table 1, which shows happiness reports from two fictitious groups. Group A’s happiness appears to stochastically dominate Group B’s. Any values we assign that maintain the ranking of the categories show that Group A is happier on average. But what happens when we treat the responses as representing intervals?

Table 1. An example

The natural solution is to use ordered logit or probit, but it is important to normalise correctly. Any comparison of two groups’ happiness levels must assume their reporting functions are the same. If one group answers “very happy” only when “extraordinarily happy” and the other answers “very happy” when “quite happy,” it is impossible to use these responses to compare the two groups’ average levels of underlying happiness. Thus we need to normalise the two cutoffs and not the mean and variance, as is common.

Using these normalisations and ordered probit, we find that Group B is, in fact, happier on average (-0.14 v. -0.18). The intuition is as follows. The two groups have the same number of respondents who are “not too happy” but Group A has more “very happy” respondents. If happiness in each group is normally distributed, Group A’s distribution must have a higher variance. Since Group A’s variance is larger, its least happy members are less happy than those in Group B. When we estimate the mean, this ends up swamping the fact that its most happy members are happier. Stochastic dominance in the categorical responses is not sufficient to rank average happiness.
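The mechanics can be sketched in a few lines. With three categories and the two cutoffs fixed at 0 and 1, a normal distribution has exactly two free parameters, so matching the observed shares should give the same answer as the ordered probit fit. The shares below are hypothetical (they are not taken from Table 1); they were chosen so that both groups have 55% “not too happy” and Group A has more “very happy” respondents, and with them the fitted means land close to the figures quoted above. The function name `fit_normal` is illustrative.

```python
from statistics import NormalDist


def fit_normal(share_low: float, share_mid: float) -> tuple[float, float]:
    """Return (mean, sd) of the normal distribution whose mass below 0 is
    share_low and whose mass between 0 and 1 is share_mid."""
    std = NormalDist()
    z0 = std.inv_cdf(share_low)              # equals (0 - mean) / sd
    z1 = std.inv_cdf(share_low + share_mid)  # equals (1 - mean) / sd
    sd = 1.0 / (z1 - z0)
    mean = -z0 * sd
    return mean, sd


# Hypothetical shares (not too happy, pretty happy); the remainder is very happy.
mean_a, sd_a = fit_normal(0.55, 0.25)  # Group A: 20% very happy
mean_b, sd_b = fit_normal(0.55, 0.30)  # Group B: 15% very happy

print(f"Group A: mean = {mean_a:.2f}, sd = {sd_a:.2f}")
print(f"Group B: mean = {mean_b:.2f}, sd = {sd_b:.2f}")
```

Because more than half of each group sits below the lower cutoff, the fitted mean is negative, and the group that needs the wider distribution to fit its larger top category ends up with the lower mean.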

Moreover, we can easily reverse our conclusion about whether Group A or B is happier on average. Just as any monotonic transformation of a valid utility function represents the same preference ordering, any monotonic transformation of the happiness distribution is consistent with the same distribution of reported happiness. If we transform the happiness function so that h’ = exp(h), we obtain a log-normal (right-skewed) distribution of happiness. A right-skewed distribution increases dispersion among the happiest and decreases it among the least happy. Since Group A has more individuals in the happiest category, this transformation increases Group A’s happiness more and proves sufficient to reverse the ordering.
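A short sketch makes both halves of this argument visible. The means and standard deviations below are assumed illustrative values, not the column's data; the helper `category_shares` is hypothetical. Under h’ = exp(h) the cutoffs 0 and 1 become exp(0) and exp(1), so the reported shares do not change, while the mean of the transformed variable is exp(mean + sd²/2), the standard log-normal formula.

```python
from math import exp, log
from statistics import NormalDist


def category_shares(dist, cutoffs, to_latent=lambda k: k):
    """Shares below, between and above two cutoffs. to_latent maps a cutoff
    on the transformed scale back to the original scale h."""
    lo, hi = (dist.cdf(to_latent(k)) for k in cutoffs)
    return (lo, hi - lo, 1 - hi)


# Assumed illustrative parameters: h ~ Normal(mean, sd) in each group.
groups = {"A": NormalDist(-0.18, 1.40), "B": NormalDist(-0.14, 1.10)}

shares_h, shares_exp, mean_h, mean_exp = {}, {}, {}, {}
for name, dist in groups.items():
    shares_h[name] = category_shares(dist, (0.0, 1.0))
    # Same respondents, transformed scale: cutoffs move to exp(0) and exp(1).
    shares_exp[name] = category_shares(dist, (exp(0.0), exp(1.0)), to_latent=log)
    mean_h[name] = dist.mean
    mean_exp[name] = exp(dist.mean + dist.stdev ** 2 / 2)  # log-normal mean
    print(name, [round(s, 3) for s in shares_h[name]],
          [round(s, 3) for s in shares_exp[name]],
          round(mean_h[name], 2), round(mean_exp[name], 2))
```

With these assumed values, Group B has the higher mean on the original scale and Group A has the higher mean after the transformation, even though the category shares are the same in both cases.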

This is not the result of a carefully crafted example. Starting from the results of two ordered probit estimates, there is always an exponential transformation that reverses which of the groups has higher mean happiness. If Group A has a higher mean and lower variance under the normal, there is always some c > 0 such that the transformation h' = exp(ch) makes Group B happier on average. If Group A has a higher mean and higher variance under the normal, there is always some c > 0 such that the transformation h' = -exp(-ch) makes Group B happier on average.
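The reason such a c always exists can be sketched with the normal moment-generating function. If h is normal with mean m and standard deviation s, the mean of exp(ch) is exp(cm + c²s²/2) and the mean of -exp(-ch) is -exp(-cm + c²s²/2). The variance term grows with c² while the mean term grows only with c, so for large enough c the variance difference decides the ranking. Comparing the exponents gives the threshold 2 × (difference in means) / (difference in variances) in both cases. The function names and the numerical values below are illustrative assumptions.

```python
from math import exp


def mean_right(c, mu, sd):
    """Mean of exp(c*h) for h ~ Normal(mu, sd): a right-skewed transformation."""
    return exp(c * mu + (c * sd) ** 2 / 2)


def mean_left(c, mu, sd):
    """Mean of -exp(-c*h) for h ~ Normal(mu, sd): a left-skewed transformation."""
    return -exp(-c * mu + (c * sd) ** 2 / 2)


def reversal_threshold(mu_hi, sd_hi, mu_lo, sd_lo):
    """Value of c above which the group with the lower normal mean overtakes.
    Assumes mu_hi > mu_lo and unequal variances. Use mean_right when the
    higher-mean group has the lower variance, mean_left otherwise."""
    return 2 * (mu_hi - mu_lo) / abs(sd_hi ** 2 - sd_lo ** 2)


# Case 1 (assumed values): higher mean with lower variance, right-skewed transform.
hi1, lo1 = (-0.14, 1.10), (-0.18, 1.40)
c1 = reversal_threshold(*hi1, *lo1)

# Case 2 (assumed values): higher mean with higher variance, left-skewed transform.
hi2, lo2 = (0.5, 1.5), (0.3, 1.0)
c2 = reversal_threshold(*hi2, *lo2)

print(f"case 1 threshold c = {c1:.3f}, case 2 threshold c = {c2:.3f}")
```

Evaluating `mean_right` or `mean_left` for both groups at a value of c slightly below and slightly above the threshold shows the ranking flipping.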

Reassessing the link between GDP and happiness

Are GDP and happiness really unrelated? In Bond and Lang (2014), we used ordered probit to estimate mean happiness in the US in each year of the General Social Survey from 1972 to 2006. Figure 1 plots the relation between average happiness and real per capita GDP and confirms the ‘Easterlin Paradox’: the relation is negative, although not statistically significant. However, as shown in Figure 2, the variance of happiness also declines with per capita GDP. Thus, if happiness is sufficiently left-skewed, the Easterlin Paradox is resolved. If h' = -exp(-2.6h), the relation between per capita GDP and happiness is not only positive but statistically significant, as shown in Figure 3.

Figure 1. Mean happiness and GDP per capita

Figure 2. Standard deviation of happiness and GDP per capita

Figure 3. Left-skewed happiness with no Easterlin Paradox

What about cross-country comparisons? Using a four-point scale, the World Values Survey assessed happiness in 57 countries. Since few respondents report being in the lowest category, we combined the bottom two categories, which maintains comparability with our results from the GSS. As shown in Table 2 – if happiness is normally distributed – Mexico, Trinidad and Tobago, Great Britain, Ghana, and Colombia are the happiest countries, suggesting the relation between development and happiness is weak or negative. A right-skewed log-normal transformation (h' = exp(2h)) strengthens this result, replacing Great Britain and Colombia with Guatemala and South Africa. However, if happiness is left-skewed (h' = -exp(-2h)), the happiest five countries become New Zealand, Sweden, Canada, Norway, and Great Britain, all members of the OECD. Note that some rankings are unaffected by the skewness of the happiness distribution within this range: it is very difficult to make Iraq look like a happy place. However, there is great variation among others. Ghana goes from the happiest to the third least happy depending on which of the two transformations we choose.

Table 2. Ranking of countries by mean happiness

Source: 2005 World Values Survey.

Happiness research going forward

How can researchers draw appropriate conclusions from happiness data given the problems we have raised?

We could assume we know the policy-relevant distribution of happiness. However, this takes definitive stands on key issues in the field. For example, since wealth and income are highly right-skewed, assuming happiness is normally distributed almost necessarily requires that the marginal effect of income/wealth on happiness is strongly decreasing.

Alternatively, we could assume that only the categories themselves are policy relevant. We could then use stochastic dominance to rank groups based on their categorical responses. While this approach has intuitive appeal, it is easy to construct examples where it leads to the wrong conclusion if the known underlying data are placed in discrete categories. For example, actors have high mean wages but most have low incomes.
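For reference, ranking by stochastic dominance on the categories alone can be sketched as a comparison of cumulative shares. The function name `dominates` and the example shares are hypothetical; shares are listed from the lowest category to the highest.

```python
from itertools import accumulate


def dominates(shares_x, shares_y, tol=1e-12):
    """True if X first-order stochastically dominates Y: X never has more
    cumulative mass at or below any category, and has strictly less somewhere."""
    cdf_x = list(accumulate(shares_x))
    cdf_y = list(accumulate(shares_y))
    weakly = all(fx <= fy + tol for fx, fy in zip(cdf_x, cdf_y))
    strictly = any(fx < fy - tol for fx, fy in zip(cdf_x, cdf_y))
    return weakly and strictly


# Hypothetical shares: (not too happy, pretty happy, very happy).
group_a = (0.55, 0.25, 0.20)
group_b = (0.55, 0.30, 0.15)
print(dominates(group_a, group_b), dominates(group_b, group_a))

# Many pairs cannot be ranked at all under this criterion.
group_c = (0.30, 0.40, 0.30)
group_d = (0.20, 0.60, 0.20)
print(dominates(group_c, group_d), dominates(group_d, group_c))
```

The criterion uses only the categories, so it says nothing about how happiness is spread within each interval, which is where the reversals discussed earlier come from.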

Of course, as the number of points on the scale becomes large, they can be treated as points rather than intervals, but the assumption of a common reporting function becomes problematic. Using the conventional Cantril (1965) scale with eleven intervals raises significant technical problems since we have only two free normalisations but cannot identify any moments beyond the first two.

We hold out hope that researchers can reach agreement on what constitute reasonable distributions of happiness. For example, we propose the ‘Tolstoy Assumption’: that happiness is left-skewed (“All happy families are alike; each unhappy family is unhappy in its own way”). If a researcher reaches a conclusion requiring happiness to be right-skewed, she must also conclude that per capita income and happiness do not rise together. Regardless, happiness researchers must be explicit about their underlying distributional assumptions when drawing conclusions from subjective well-being data.

If we accept the Tolstoy Assumption, some results (e.g. the positive effect of Moving-to-Opportunity on happiness) are reinforced while, as we have shown, others are reversed. Until we recognise and address the unique problems associated with the categorical data used to estimate happiness, it is premature to apply them to draw strong conclusions that inform policy.