That’s Just Not Normal – Power Laws

Most people are familiar with the idea of a bell curve. This statistical concept describes the distribution of outcomes for many commonly observed processes and natural phenomena. A classic example is the normal distribution of human height. The average American adult male is approximately 5’9″ tall. About 68% of men are within 3″ of that mean (i.e. 5’6″ to 6′), and about 95% are within 6″ of it (5’3″ to 6’3″). Therefore, only about 5% of adult American males are either taller than 6’3″ or shorter than 5’3″. In a normal distribution, values cluster heavily around the mean, and outliers are rare. The mean and median are equal; that is, as many people are above the average as below it. Additionally, the entire range of values is relatively small: the tallest person ever to have lived was less than 5 times as tall as the shortest. Other measurements that roughly fit a normal distribution include SAT and IQ scores. The graphic below shows our height example, with the classic bell-shaped curve.
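The 68% and 95% figures can be checked with a few lines of Python. This is a minimal sketch that assumes heights are exactly normal with a mean of 69 inches (5’9″) and a standard deviation of 3 inches; the standard deviation is not stated in the text but is implied by the 68% figure, and the function names are purely illustrative.

```python
import math

MEAN_IN = 69.0  # 5'9" expressed in inches (from the text)
SD_IN = 3.0     # assumed standard deviation, implied by the 68% figure


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_within(delta, mean=MEAN_IN, sd=SD_IN):
    """Fraction of the population within +/- delta inches of the mean."""
    return normal_cdf(mean + delta, mean, sd) - normal_cdf(mean - delta, mean, sd)


within_3 = share_within(3.0)  # roughly 0.68
within_6 = share_within(6.0)  # roughly 0.95
print(f"within 3 in: {within_3:.1%}, within 6 in: {within_6:.1%}")
```

Under these assumptions, the share outside the 6-inch band is about 5%, split evenly between the tall and short tails.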

But there is another commonly observed type of distribution with a very different set of characteristics. This class of distributions is governed by a concept known as power laws. In contrast to the clustering around the mean seen in normal distributions, power law distributions are skewed: a small number of outcomes have dramatically higher values than the rest of the population, and typically most values fall below the mean. Let’s look at a quick example. The size of U.S. cities is representative of a power law distribution. A small number of very large cities account for a big percentage of the overall population, while a very large number of small cities represent a small portion of it. Out of 25,000 places, just the top 20 cities contain roughly 10% of the population. Also, unlike height, the ratio of the largest city to the smallest spans several orders of magnitude. New York, with 8.24 million people, is roughly 2 million times larger than Lost Springs, Wyoming. The graphic below shows the classic shape of a power law distribution.

Other well-known examples of power law distributions are the popularity of surnames, best-selling books and the most popular websites. In each case, a small number of “top performers” (think Smith or Gladwell or Google) are dramatically larger than a huge number of marginal performers. A good way to think of power law distributions is that a handful of the largest items account for a markedly disproportionate share of the combined value of the overall distribution.

Let’s examine our surname example. As with city populations, the top 20 surnames in the U.S. account for over 8% of the population. The leading name, Smith, accounts for over 1% of the total by itself. At the other end of the scale, there are tens of thousands of surnames with fewer than 2,500 occurrences, making each of them less than 1/1,000 as common as Smith.

A curious feature of power law distributions is that many follow a rule known as Zipf’s law. This idea, proposed by the linguist George Kingsley Zipf in 1935, states that the frequency of an item is inversely proportional to its rank. That is, the 2nd most popular item will be about 1/2 the size of the highest ranked item, and the 3rd most popular will be about 1/3 the size of the leading item. Let’s look at a table of the largest U.S. cities as an example:

| City | Population | Zipf’s Law Ratio | Zipf’s Predicted Population | Error |
|---|---|---|---|---|
| New York | 8,244,910 | | | |
| Los Angeles | 3,819,702 | 1/2 | 4,122,455 | 7% |
| Chicago | 2,707,120 | 1/3 | 2,748,303 | 1% |
| Houston | 2,145,146 | 1/4 | 2,061,228 | -4% |
| Philadelphia | 1,536,471 | 1/5 | 1,648,982 | 7% |
| Phoenix | 1,469,471 | 1/6 | 1,374,152 | -7% |
| San Antonio | 1,359,758 | 1/7 | 1,177,844 | -15% |
| San Diego | 1,326,179 | 1/8 | 1,030,614 | -29% |
| Dallas | 1,223,229 | 1/9 | 916,101 | -34% |
| San Jose | 967,487 | 1/10 | 824,491 | -17% |

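While not a perfect match, the sequence does roughly follow the predictions of the law. The cities ranked second through sixth fall within about 7% of their predicted populations, although the error grows to as much as 34% further down the list. Later refinements of Zipf’s formula produce even more accurate results.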
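The predicted populations and errors in the table can be reproduced with a short script. This sketch assumes the error column is computed as (predicted − actual) / predicted, which is the convention that matches the table’s figures; the function name `zipf_table` is illustrative only.

```python
# Populations as listed in the table, ordered by rank.
cities = [
    ("New York", 8_244_910),
    ("Los Angeles", 3_819_702),
    ("Chicago", 2_707_120),
    ("Houston", 2_145_146),
    ("Philadelphia", 1_536_471),
    ("Phoenix", 1_469_471),
    ("San Antonio", 1_359_758),
    ("San Diego", 1_326_179),
    ("Dallas", 1_223_229),
    ("San Jose", 967_487),
]


def zipf_table(ranked):
    """Return (name, actual, predicted, error) rows, using the top item as the base."""
    top = ranked[0][1]
    rows = []
    for rank, (name, actual) in enumerate(ranked, start=1):
        predicted = top / rank                     # Zipf: size is proportional to 1 / rank
        error = (predicted - actual) / predicted   # assumed error convention
        rows.append((name, actual, predicted, error))
    return rows


rows = zipf_table(cities)
for name, actual, predicted, error in rows[1:]:    # skip the rank-1 baseline row
    print(f"{name:<13}{actual:>10,}{predicted:>12,.0f}{error:>8.0%}")
```

Because every prediction is anchored to New York alone, a single unusual value at rank 1 shifts the whole column, which is one reason fitted variants of the law tend to match real data more closely.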

Why do some phenomena follow a normal distribution while others follow a power law distribution? Phenomena with normal distributions are driven by the following dynamics:

Bounding constraints that inhibit growth or change

Slow growth across time leading to a limited range of values

Independence of events from one another

Related to simple measurements and repeatable processes

Phenomena that follow a power law distribution are driven by the following dynamics:

Lack of natural bounding constraints to inhibit geometric growth

Significant growth over time leading to very large ranges of values

Inter-connectivity, dependency or relationships between items (typically described as a network effect)

Related to highly dynamic, complex systems

Let’s look again at our representative examples of each distribution type, through the lens of these dynamics. The height of humans is constrained by biological and genetic factors that limit its growth. Consequently, average height has not changed significantly over time: over the last 1,000 years, it has changed by just a few inches. Each person’s height is essentially an independent and simple measurement.

The size of cities, by contrast, has grown dramatically over the last several centuries. The population of New York in 1723 was 7,248, less than 1/1,000 of its current size. The city was able to keep growing rapidly by incorporating adjacent land and building taller structures. Its growth was highly dynamic and complex, based on the countless decisions of millions of people and institutions.
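The contrast between the two sets of dynamics can be illustrated with a toy simulation. The sketch below is not a model of real heights or real cities; it assumes two simple hypothetical processes. In the first, each value is the sum of many small, independent, bounded increments. In the second, a “rich get richer” process, each newcomer either founds a new city (with an assumed probability of 1%) or joins an existing city chosen in proportion to its current size, a crude stand-in for the network effect described above.

```python
import random
import statistics


def additive_sizes(n_items=1000, n_steps=100, seed=1):
    """Each value is the sum of many small, independent, bounded increments."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_steps)) for _ in range(n_items)]


def rich_get_richer_sizes(n_arrivals=20000, p_new=0.01, seed=1):
    """Each arrival founds a new city with probability p_new; otherwise it
    joins an existing city chosen in proportion to that city's current size."""
    rng = random.Random(seed)
    sizes = [1]      # sizes[i] is the population of city i
    residents = [0]  # one entry per person, holding that person's city index
    for _ in range(n_arrivals):
        if rng.random() < p_new:
            sizes.append(1)
            residents.append(len(sizes) - 1)
        else:
            # Picking a random person picks a city in proportion to its size.
            city = rng.choice(residents)
            sizes[city] += 1
            residents.append(city)
    return sizes


def spread(values):
    """Ratio of the largest value to the median value."""
    return max(values) / statistics.median(values)


heights = additive_sizes()
city_sizes = rich_get_richer_sizes()
print(f"additive:        max/median = {spread(heights):.2f}")
print(f"rich-get-richer: max/median = {spread(city_sizes):.1f}")
```

In the additive process the largest value stays close to the median, while in the rich-get-richer process the largest city ends up many times the size of the typical one, mirroring the difference between the height and city examples.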

Beyond their being a mathematical curiosity, why should the average professional care about power laws? Here are a number of reasons:

Application of Statistical Tests – Any time you apply a statistical test to a dataset, it’s important to understand how the data is distributed. Many statistical tools only work correctly when the data follows the distribution type they assume.

Diversity of Products – Chris Anderson coined the term “The Long Tail” to describe elements of the new digital economy. He showed how many product offerings historically followed a classic power law distribution. For example, a small number of top albums garnered a large percentage of overall record sales, and a few blockbuster movies dominated the box office. In the traditional economy, limits on record store shelf space or the number of theaters kept many low popularity works from attaining adequate distribution. Anderson argues that in the digital world, with unlimited virtual “shelf space”, low popularity items will play an increasing role. Producers will be able to cater effectively to a wider range of tastes with tailored offerings.

Risk Management – A common mistake that many institutions have made is treating disruptive events as normally distributed phenomena. This has led them to underweight the impact of these events. In fact, many of these types of events (e.g. natural disasters, financial crises) are better represented by power law distributions. That is, the impact of a few rare events is dramatically greater than that of the average event. Consequently, the probability of a major disruptive event is much higher than a normal distribution would suggest. As an example, if one uses a normal distribution to model the performance of the stock market, the infamous crash of 1987 would be a statistical impossibility. That is, the predicted likelihood of such an event would be infinitesimal, eliminating it from planning scenarios. However, that same meltdown is a rare but expected event when the performance of the market is modeled as a power law distribution.

Power Law Trend – We appear to be moving to a world where power laws describe an increasing number of important phenomena. Factors such as globalization and social media create more interconnectedness and complexity, in turn leading to power law behavior. Being able to recognize power law driven phenomena will allow you to perform sounder analysis of data and make better management decisions.
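To put rough numbers on the risk management point above, the sketch below compares the chance of a 1987-sized one-day drop under the two kinds of model. Every parameter here is an assumption made for illustration and is not taken from this article: a typical daily move of 1%, a crash of roughly 22.6% (the approximate one-day decline of the Dow on October 19, 1987), a power law tail exponent of 3, and a calibration in which both models agree on the chance of a drop beyond 2%.

```python
import math

DAILY_SD = 1.0     # assumed: typical daily market move, in percent
CRASH_DROP = 22.6  # approximate one-day Dow decline on Oct 19, 1987, in percent
ALPHA = 3.0        # assumed power law tail exponent
ANCHOR = 2.0       # assumed: both models agree on the chance of a drop beyond 2%


def normal_tail(drop, sd=DAILY_SD):
    """P(daily drop > drop) if returns are normal with mean 0."""
    return 0.5 * math.erfc(drop / (sd * math.sqrt(2.0)))


def power_law_tail(drop, alpha=ALPHA, anchor=ANCHOR):
    """P(daily drop > drop) with a power law tail beyond the anchor,
    calibrated to match the normal model at the anchor."""
    if drop < anchor:
        raise ValueError("the power law tail is only defined beyond the anchor")
    return normal_tail(anchor) * (anchor / drop) ** alpha


p_normal = normal_tail(CRASH_DROP)
p_power = power_law_tail(CRASH_DROP)
print(f"normal model:    {p_normal:.1e} per trading day")
print(f"power law model: {p_power:.1e} per trading day")
```

Under these assumptions the normal model assigns the crash a probability so small that it would effectively never occur, while the power law model treats it as something to expect once in a few centuries of trading days. The exact figures depend heavily on the assumed parameters; the gap of many orders of magnitude between the two models is the point.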