For Whom the Bell Holds

I can only recognize the occurrence of the normal curve … as a very abnormal phenomenon. — Karl Pearson (1901)

Widely believed and rarely questioned is the notion that human characteristics, including in particular measures of performance, are distributed along a symmetric bell-shaped curve. There is a small handful of folks who score high and a comparable handful who score low, with most folks bunched together in the middle. Said another way, average scores are typical scores, and the number of extreme scores in one direction is about the same as the number of extreme scores in the other direction.

A recent article by Ernest O’Boyle and Herman Aguinis (2012) is therefore worthy of note, because it challenges the ubiquity of the bell-shaped curve as a description of human performance. These researchers looked at a large variety of objective performance measures from such domains as entertainment, science, politics, and sports and found that their distributions rarely fit a bell curve*.

And it was not just that the distributions were skewed or looked like a squished, dented, and/or stretched bell. Instead, most distributions had an altogether different shape, one technically described as a power function or a Paretian distribution, with many more extreme cases than would be expected were performance scores distributed in a bell-shaped fashion. Most people scored well below the arithmetic average, whereas a few scored well above.
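As a rough illustration of that last point, the sketch below simulates scores from a bell-shaped distribution and from a Paretian one, then counts how many fall below the arithmetic average. The parameters (mean 100, standard deviation 15, Pareto shape 3), the seed, and the helper name share_below_mean are assumptions made for the example; none of them come from O'Boyle and Aguinis.

```python
import random
import statistics


def share_below_mean(scores):
    """Return the fraction of scores strictly below the arithmetic mean."""
    mean = statistics.fmean(scores)
    return sum(s < mean for s in scores) / len(scores)


rng = random.Random(42)  # arbitrary fixed seed so the run is repeatable
n = 100_000

# Hypothetical bell-shaped scores (mean 100, SD 15).
bell_scores = [rng.gauss(100, 15) for _ in range(n)]
# Hypothetical Paretian scores; shape 3 is an assumption for illustration.
paretian_scores = [rng.paretovariate(3) for _ in range(n)]

bell_share = share_below_mean(bell_scores)
paretian_share = share_below_mean(paretian_scores)

print(f"Bell-shaped: {bell_share:.0%} of scores are below average")
print(f"Paretian:    {paretian_share:.0%} of scores are below average")
```

Under these assumptions, about half of the bell-shaped scores sit below average, while roughly seven in ten of the Paretian scores do, because a few very large values pull the mean upward.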

For example, consider entertainers nominated for an Emmy Award. The typical nominee had but one nomination, and a notable minority of nominees accounted for the lion’s share of nominations. The good news is that there are superstar entertainers, and probably more than we might expect given our bell-shaped preconceptions. The less than good news is that most entertainers are below average. The same conclusions hold for other domains of human performance and — perhaps — for many other human characteristics as well**.

So what does it all mean?

First, as those of you who have taken a statistics course may remember, a bell-shaped curve is often described as a “normal” distribution, and the implication follows that other distributions are somehow abnormal. When we find that actual scores do not fit a bell shape, we suspect a problem with the sample or with the assessment of whatever characteristic is of interest but not with our assumptions about the population from which the sample is obtained.

Second, pointed-headily but importantly, the inferential statistics often used by social scientists to establish whether research results should be taken seriously or not are based on the assumption that “true” scores are distributed normally, i.e., in bell-shaped fashion. If this assumption is unwarranted, as it appears to be in many cases, then the conclusions based on the typical statistical tests may be wrong, resulting in false positives or false negatives, as the case may be***. Oh my.

Third, more generally and also importantly, the way we think about performance is challenged. Consider the grades we receive in school or the evaluations we receive at work. If those handing them out assume that performance is bell-shaped, then they may assign evaluations or grades accordingly, to fit the bell curve and not the reality.

Some “typical” students or workers may be inaccurately upgraded to average, whereas others doing quite well may be inaccurately downgraded to avoid assigning ostensibly inflated grades or evaluations, even though they may be appropriate. Indeed, in some schools or workplaces, a bell-shaped distribution may even be mandated by the powers-that-be, never mind what students or workers are actually doing. Oh my.

Fourth, we need to think differently about positive psychology. If a goal of positive psychology is to study people who do extremely well, then there may be more such people than we might expect. That’s good and makes positive psychology research easier to do because appropriate research participants are easier to find. However, if another goal of positive psychology people is to help “average” people to do better, it needs to be recognized that most people are below average. That’s bad because interventions, even those that “work” as intended, need to be reconceptualized. Once again, oh my.

So, the paper by O’Boyle and Aguinis has implications for theory, research, and practice. We should ask for whom the bell holds. With apologies to John Donne (and Ernest Hemingway), I conclude that it may not often hold for thee or for me.

* Why is the assumption of a bell-shaped curve so firmly established? It seems to describe rather well the distribution of certain biological characteristics — e.g., height or weight — as well as measures of human performance under highly-constrained circumstances—e.g., working on an assembly line or filling out a self-report survey. Moreover, bell-shaped curves have desirable mathematical properties and apparently captured the fancy of many early statisticians, who generalized their applicability beyond what was warranted.

** These are not exotic points. Most of us know that the “average” income in a workplace likely describes the pay of almost no one because lots of workers make very little, whereas some, those 1%-ers decried by the various Occupy protests, make staggering amounts. And although families in a community may on “average” have 2.3 children, finding a family that fits that generalization is of course impossible, no matter how little we think of our neighbors' teenaged sons. The practical problem for society is when these points are not so obvious and policy decisions are then based on them.

*** How many erroneous conclusions based on violations of normality are established in the research literature is not clear, at least to me, although I assume (hope?) that there are relatively few. Within limits, many inferential tests assuming normal distributions are robust in the wake of violations. In any event, researchers should routinely look at the distributions of their measures and when indicated check conclusions based on statistical tests assuming normal distributions against conclusions based on other statistical tests that make no such assumptions.
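One way to act on that last suggestion is a permutation test, which builds its reference distribution by reshuffling group labels instead of assuming normality. The sketch below is a minimal version under stated assumptions: the function name permutation_p_value, the number of permutations, the seed, and the two small data sets are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    No normality assumption is made: the null distribution comes from
    repeatedly reshuffling the pooled scores into two groups.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical skewed scores: most are low, a few are high.
control = [1, 1, 1, 2, 2, 3, 4, 8]
treated = [score + 100 for score in control]  # an exaggerated, made-up effect

p_different = permutation_p_value(control, treated)
p_identical = permutation_p_value(control, control)

print(f"control vs. treated: p = {p_different:.4f}")
print(f"control vs. itself:  p = {p_identical:.4f}")
```

If a conclusion from a test that assumes normality and a conclusion from a check like this one disagree, that is a signal to look more closely at the distribution of the measure.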

This is interesting to me, both as a teacher and as a positive psychology wellbeing coach. I believe the coaching model is great - and tends to serve those who have different financial "bells" than those who may benefit most from improved wellbeing. That is, save health plan money, etc. Different interventions . . . yes, I think it may require some new pilot studies as we broaden what we do to the whole population.

Academy Award nominees are a very bad example. Simply being in a leading or supporting role in a serious film puts you in the top little slice of acting ability for the human race. By looking at just nominees you are going to be looking at that top slice of the curve, and so of course it will look like a power law function, even if acting ability is actually a bell curve. Better examples please?

The figure accompanying the blog entry contains frequency distributions. If many more individuals were included at the left (i.e., those with little ability), I think a power function would still be found, maybe even more strikingly. But I suppose this is an empirical issue. Thanks for the comment.

I remember an engineering professor talking about the distribution of grades in difficult math classes. He said the distribution isn't bell-shaped, it's more like a camel with a hump on either side and almost nothing in the middle. When we asked him why, he replied, "either you get it or you don't."

Just read your blog. I have a PhD in statistics. There seems to be one thing that you are missing here. Normality does not apply to the distribution of scores or performance or behavior, etc. If you are trying to use normality to describe distributions of your data itself, then you will often be wrong!! This is NOT what we are teaching our students in statistics courses (although granted this is often what they think they hear).

The principles of normality apply to means or averages across samples--again not to the individual data points in a sample. The theory says that if you have a large enough sample, then no matter the distribution of the population-level data, the AVERAGE of the data will approximately follow a normal distribution (across hypothetical samples). This DOESN'T mean that as the sample gets bigger you can use a normal curve to describe the individuals in the data. If the data are not normal to begin with, a normal curve will NEVER be appropriate to describe the population-level data. If you use one anyway, you are misusing statistics.

For example, the grades issue has come up. What I'm saying is that if grades really followed the "two hump and no middle" distribution described in the comment above, no matter the class size, you CANNOT use a normal curve to describe the individuals in the population (estimate percentiles, etc). However, for a large class size one CAN use a normal assumption to compare the performance of the class average against a fixed standard or with another large class. Again this is where normality starts to kick in--when using the average.
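The distinction between individual grades and class averages can be sketched with a small simulation. Everything specific here is an assumption made for illustration: the two humps centered at 35 and 85 with a spread of 8, the class size of 100, the 2,000 hypothetical classes, the seed, and the helper names. The check used is the share of values within half a standard deviation of the mean, which is about 0.38 for a normal distribution.

```python
import random
import statistics


def share_near_center(values, width_in_sd=0.5):
    """Fraction of values within `width_in_sd` standard deviations of the mean.

    For a normal distribution this is about 0.38 when width_in_sd is 0.5.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= width_in_sd * sd for v in values) / len(values)


def camel_grade(rng):
    """One hypothetical grade from a two-hump ("get it or don't") distribution.

    Grades are not clipped to the 0-100 range; this is only a sketch.
    """
    center = 85 if rng.random() < 0.5 else 35
    return rng.gauss(center, 8)


rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable
class_size, n_classes = 100, 2000
classes = [[camel_grade(rng) for _ in range(class_size)] for _ in range(n_classes)]

individual_grades = [grade for one_class in classes for grade in one_class]
class_averages = [statistics.fmean(one_class) for one_class in classes]

individual_share = share_near_center(individual_grades)
average_share = share_near_center(class_averages)

print(f"Individual grades near the center: {individual_share:.0%}")
print(f"Class averages near the center:    {average_share:.0%}")
```

Under these assumptions, very few individual grades land in the middle, so a normal curve describes the students badly, while the class averages come close to the 38% a normal curve predicts.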

I take your point that the data in a sample will rarely appear to be normally distributed, but one of the parametric assumptions applied to many statistical analyses is that the population from which the sample is drawn has a normal distribution. If this assumption is violated, then non-parametric statistics should be employed, and the data should be described with medians and quartiles rather than means and standard deviations.

Whether the mean or the median is used, if the statistic is accumulated across many samples of the same size, then the sampling distribution of that statistic will be symmetrical, but will not be normally distributed - it will be leptokurtic, which is to say that the curve will be narrower and taller than a normal distribution.

Many of the datasets that I deal with, both my own and those of colleagues, are clearly non-normal and better described by a power curve. Sometimes the distribution can be attributed to characteristics of the measure or the sample, but sometimes there is not a clear explanation. So it seems important that statisticians and statistical psychologists consider carefully what assumptions can be safely made about the population data. Unfortunately, few published papers include any description of the sample data and the degree to which it conforms to assumptions of normality. Lacking this information in the published record, the situation will be difficult to investigate.

I am working on a line of investigation that suggests that while the bell curve may tend to describe the distribution of human characteristics, the power curve better represents the outcome of the operation of our economy and society. This would describe a non-optimal process.