6.
What does it mean?
Let $p(x)\,dx$ be the fraction of cities with a population between $x$ and $x + dx$.
If this histogram is a straight line on log-log scales, then $\ln p(x) = -\alpha \ln x + c$, where $\alpha$ and $c$ are constants.
Hence $p(x) = C x^{-\alpha}$, where $C = e^{c}$.
Distributions of this form are said to follow a power law.
The constant $\alpha$ is called the exponent of the power law.
We typically don't care about $c$.

9.
Alleged power-law phenomena
The frequency of occurrence of unique words in the novel Moby Dick by Herman Melville
The numbers of customers affected by electrical blackouts in the United States between 1984 and 2002
The number of links to web sites found in a 1997 web crawl of about 200 million web pages
The number of hits on web pages
The number of papers scientists write
The number of citations received by papers
Annual incomes
Sales of books and music; in fact, anything that can be sold

12.
The power law distribution
The power-law distribution is $p(x) \propto x^{-\alpha}$, where $\alpha$, the scaling parameter, is a constant.
The scaling parameter typically lies in the range $2 < \alpha < 3$, although there are occasional exceptions.
Typically, the entire process doesn't obey a power law.
Instead, the power law applies only for values greater than some minimum $x_{\min}$.

19.
Distributional properties
For any power law with exponent $\alpha > 1$, the median is defined: $x_{1/2} = 2^{1/(\alpha-1)}\, x_{\min}$
If we use a power law to model wealth distribution, then we might be interested in the fraction of wealth in the richer half:
$$\frac{\int_{x_{1/2}}^{\infty} x\,p(x)\,dx}{\int_{x_{\min}}^{\infty} x\,p(x)\,dx} = \left(\frac{x_{1/2}}{x_{\min}}\right)^{-\alpha+2} = 2^{-(\alpha-2)/(\alpha-1)}$$
provided $\alpha > 2$, so that the integrals converge.
When the wealth distribution was modelled using a power law, $\alpha$ was estimated to be 2.1, so $2^{-0.091} \approx 94\%$ of the wealth is in the hands of the richer 50% of the population.
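
The arithmetic on this slide can be checked with a minimal Python sketch. The function names are illustrative and not part of the talk; the formulas are the median and richer-half expressions stated on the slide.

```python
def median(alpha, xmin):
    """Median of a continuous power law; defined for alpha > 1."""
    if alpha <= 1:
        raise ValueError("the median requires alpha > 1")
    return 2 ** (1 / (alpha - 1)) * xmin


def wealth_in_richer_half(alpha):
    """Fraction of total wealth held by the richer half; needs alpha > 2."""
    if alpha <= 2:
        raise ValueError("the integrals only converge for alpha > 2")
    return 2 ** (-(alpha - 2) / (alpha - 1))


if __name__ == "__main__":
    # alpha = 2.1 is the wealth exponent quoted on the slide.
    print(median(2.1, xmin=1.0))
    print(wealth_in_richer_half(2.1))
```

With $\alpha = 2.1$ the second function returns roughly 0.94, matching the 94% quoted on the slide.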

21.
Top-heavy distribution & the 80/20 rule
Pareto principle (also known as the 80/20 rule, the law of the vital few, or the principle of factor sparsity): for many events, roughly 80% of the effects come from 20% of the causes.
For example, the distribution of world GDP:
Population quantile    Income
Richest 20%            82.70%
Second 20%             11.75%
Third 20%               2.30%
Fourth 20%              1.85%
Poorest 20%             1.40%
Other examples are:
80% of your profits come from 20% of your customers
80% of your complaints come from 20% of your customers
80% of your profits come from 20% of the time you spend
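
To relate the 80/20 rule to a power-law exponent, the richer-half calculation can be generalised. For a continuous power law with $\alpha > 2$, the richest fraction $P$ of the population holds a fraction $W = P^{(\alpha-2)/(\alpha-1)}$ of the wealth. This formula is not on the slide; it is an assumption here, though it reduces to the richer-half result when $P = 1/2$. The sketch below, with illustrative function names, solves for the exponent at which a given rule holds exactly.

```python
import math


def wealth_share(top_fraction, alpha):
    """Share of wealth held by the richest `top_fraction`; needs alpha > 2."""
    if alpha <= 2:
        raise ValueError("the integrals only converge for alpha > 2")
    return top_fraction ** ((alpha - 2) / (alpha - 1))


def alpha_for_rule(top_fraction, share):
    """Exponent at which the richest `top_fraction` holds `share` of the wealth."""
    # Solve share = top_fraction ** r for r = (alpha - 2) / (alpha - 1),
    # then invert r to recover alpha.
    r = math.log(share) / math.log(top_fraction)
    return (2 - r) / (1 - r)


if __name__ == "__main__":
    alpha = alpha_for_rule(0.2, 0.8)   # exact 80/20 rule
    print(alpha, wealth_share(0.2, alpha))
```

Under these assumptions an exact 80/20 split corresponds to an exponent a little above 2, inside the typical range $2 < \alpha < 3$.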

23.
Scale-free distributions
The power law distribution is often referred to as a scale-free distribution.
A power law is the only distribution that looks the same regardless of the scale on which we measure $x$.
For any $b$, we have $p(bx) = g(b)\,p(x)$.
That is, if we increase the scale by which we measure $x$ by a factor of $b$, the shape of the distribution $p(x)$ is unchanged, except for a multiplicative constant.
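
A small numerical check of the scale-free property, as a sketch: for $p(x) = C x^{-\alpha}$ the ratio $p(bx)/p(x)$ equals $b^{-\alpha}$ whatever $x$ is, whereas for an exponential density (used here only as a contrast, not taken from the talk) the ratio depends on $x$. The function names and parameter values are illustrative.

```python
import math


def power_law(x, alpha=2.5, c=1.0):
    """Unnormalised power law C * x^(-alpha)."""
    return c * x ** (-alpha)


def exponential(x, rate=1.0):
    """Exponential density, used only as a non-scale-free contrast."""
    return rate * math.exp(-rate * x)


def rescaling_ratio(density, x, b):
    """p(bx) / p(x): a function of b alone only if p is scale free."""
    return density(b * x) / density(x)


if __name__ == "__main__":
    for x in (1.0, 10.0, 100.0):
        print(x, rescaling_ratio(power_law, x, 2.0), rescaling_ratio(exponential, x, 2.0))
```

The power-law column is constant in $x$, so here $g(b) = b^{-\alpha}$; the exponential column is not.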

26.
Random numbers
The discrete case is a bit more tricky, because the CMF cannot be inverted in closed form.
Instead, we have to solve the CMF numerically by "doubling up" and a binary search.
So for a given u, we first bound the solution to the equation via:
    x2 := xmin
    repeat
        x1 := x2
        x2 := 2 * x1
    until P(x2) < 1 - u
Basically, the algorithm tests whether the solution lies in [x, 2x), starting with x = xmin.
Once we have the region, we use a binary search.
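
The doubling-up and binary-search steps can be sketched in Python as follows. Several things here are assumptions rather than content from the slide: $P(x)$ is taken to be the complementary CMF $\Pr(X \ge x) = \zeta(\alpha, x)/\zeta(\alpha, x_{\min})$ of the discrete power law, the Hurwitz zeta function is approximated with a short direct sum plus an Euler-Maclaurin tail to keep the example dependency-free, and all function names are illustrative.

```python
import random


def hurwitz_zeta(s, q, n_terms=50):
    """Approximate sum over k >= 0 of (k + q)^(-s), for s > 1 and q > 0."""
    head = sum((q + k) ** -s for k in range(n_terms))
    a = q + n_terms
    # Euler-Maclaurin estimate of the remaining terms from k = n_terms onwards.
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12
    return head + tail


def survival(x, alpha, xmin):
    """P(x) = Pr(X >= x) for the discrete power law (assumed form)."""
    return hurwitz_zeta(alpha, x) / hurwitz_zeta(alpha, xmin)


def sample_discrete(u, alpha, xmin):
    """Map a uniform draw u in [0, 1) to a discrete power-law variate."""
    target = 1.0 - u
    # Doubling up: stop as soon as P(x2) < 1 - u.
    x2 = xmin
    while True:
        x1 = x2
        x2 = 2 * x1
        if survival(x2, alpha, xmin) < target:
            break
    # Binary search, keeping the invariant P(x1) >= 1 - u > P(x2).
    while x2 - x1 > 1:
        mid = (x1 + x2) // 2
        if survival(mid, alpha, xmin) >= target:
            x1 = mid
        else:
            x2 = mid
    return x1


if __name__ == "__main__":
    rng = random.Random(1)
    print([sample_discrete(rng.random(), alpha=2.5, xmin=1) for _ in range(10)])
```

The search returns the largest integer $x$ with $P(x) \ge 1 - u$, so each value $x$ is produced with probability $P(x) - P(x+1)$.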

32.
Method 2
Similar to method 1, but:
Don't bin, just plot the data CDF
Then use least squares to estimate $\alpha$
Using linear regression is a bad idea:
Error estimates are completely off
It doesn't even provide a good point estimate of $\alpha$
On the bright side, you do get a good $R^2$ value
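
To make Method 2 concrete, here is a minimal sketch of the procedure the slide warns against. It assumes the quantity plotted is the complementary CDF $\Pr(X \ge x)$, which for a continuous power law falls off as $x^{-(\alpha-1)}$, so the fitted slope is converted to an exponent via $\hat{\alpha} = 1 - \text{slope}$. The sampler and the function names are illustrative, not from the talk.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Inverse-transform draws from a continuous power law on [xmin, inf)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def alpha_from_cdf_regression(data):
    """Least-squares line through the log-log empirical complementary CDF."""
    xs = sorted(data)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # For the i-th smallest point (0-based), Pr(X >= x) is (n - i) / n.
    log_s = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_s = sum(log_s) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ls - mean_s) for lx, ls in zip(log_x, log_s))
    slope = sxy / sxx
    return 1.0 - slope


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_power_law(2000, alpha=2.5, xmin=1.0, rng=rng)
    print(alpha_from_cdf_regression(data))
```

Rerunning with different seeds and sample sizes gives a feel for how the estimate behaves; the regression's own standard errors should not be trusted, since the points on a CDF plot are not independent.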

36.
Estimating $x_{\min}$
Recall that the power-law pdf is
$$p(x) = \frac{\alpha-1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$$
where $\alpha > 1$ and $x_{\min} > 0$.
$x_{\min}$ isn't a parameter in the usual sense: it's a cut-off in the state space.
Typically power laws are only present in the distributional tails.
So how much of the data should we discard so our distribution fits a power law?

39.
Estimating $x_{\min}$: method 3
Minimise the distance between the data and the fitted model CDFs:
$$D = \max_{x \ge x_{\min}} |S(x) - P(x)|$$
where $S(x)$ is the CDF of the data and $P(x)$ is the theoretical CDF (the Kolmogorov-Smirnov statistic).
Our estimate $\hat{x}_{\min}$ is then the value of $x_{\min}$ that minimises $D$.
Use some form of bootstrapping to get a handle on the uncertainty of $x_{\min}$.
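
A minimal sketch of method 3 for continuous data follows. The slide does not say how $\alpha$ is fitted for each candidate cut-off; this example assumes the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ described by Clauset, Shalizi and Newman. The `min_tail` safeguard and all function names are illustrative, and the bootstrap step is left out.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Continuous maximum-likelihood exponent for data >= xmin (assumed estimator)."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """D = max |S(x) - P(x)| over a sorted tail, with P the power-law CDF."""
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (-(alpha - 1.0))
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - model), abs(i / n - model))
    return d


def estimate_xmin(data, min_tail=10):
    """Try each data point as xmin and keep the one with the smallest D."""
    xs = sorted(data)
    best = None
    for start in range(len(xs) - min_tail + 1):
        xmin, tail = xs[start], xs[start:]
        if tail[-1] == xmin:
            continue  # degenerate tail: alpha cannot be fitted
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best  # (xmin_hat, alpha_hat, D)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]
    print(estimate_xmin(data))
```

The scan is quadratic in the sample size, which is fine for a sketch but worth restructuring for large data sets.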

42.
Word distributions
Suppose we type randomly on a typewriter.
We hit the space bar with probability $q_s$ and a letter with probability $q_l$.
If there are $m$ letters in the alphabet, then $q_l = (1 - q_s)/m$.
The distribution of word frequency has the form $p(x) \sim x^{-\alpha}$.
http://activerain.com/
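
The random-typing model is easy to simulate. The sketch below uses illustrative names and parameter values, none of which come from the talk: it presses keys according to $q_s$ and $q_l = (1 - q_s)/m$, splits the result on spaces, and counts how often each word occurs.

```python
import random
from collections import Counter


def letter_probability(q_space, m):
    """Probability of any one particular letter: q_l = (1 - q_s) / m."""
    return (1.0 - q_space) / m


def type_randomly(n_keys, q_space, alphabet, rng):
    """Press n_keys random keys and return word counts."""
    keys = []
    for _ in range(n_keys):
        if rng.random() < q_space:
            keys.append(" ")
        else:
            keys.append(rng.choice(alphabet))  # each letter equally likely
    # split() drops the empty "words" produced by consecutive spaces.
    return Counter("".join(keys).split())


if __name__ == "__main__":
    counts = type_randomly(200_000, q_space=0.2, alphabet="abcde", rng=random.Random(3))
    for word, count in counts.most_common(10):
        print(word, count)
```

Plotting the counts against their rank on log-log axes is one way to inspect the shape of the resulting frequency distribution.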

47.
Random walks
With a bit of algebra, we get:
$$f_{2n} = \frac{\binom{2n}{n}}{(2n-1)\,2^{2n}}$$
For large $n$, Stirling's approximation gives
$$f_{2n} \simeq \sqrt{\frac{1}{\pi n (2n-1)^2}}$$
So as $n \to \infty$, we get $f_{2n} \sim n^{-3/2}$.
So the distribution of return times follows a power law with exponent $\alpha = 3/2$!
Tenuous link to phylogenetics
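
The exact expression and its large-$n$ behaviour can be compared numerically. This sketch takes $f_{2n}$ to be the probability that the walk first returns to its starting point at step $2n$, which is an assumption based on the "return times" wording; the function names are illustrative. The constant $1/(2\sqrt{\pi})$ in the asymptotic form follows from Stirling's approximation $\binom{2n}{n} \approx 4^n/\sqrt{\pi n}$.

```python
import math


def first_return_probability(n):
    """Exact f_2n = C(2n, n) / ((2n - 1) * 2^(2n))."""
    # Integer arithmetic first, one true division at the end, so large n is safe.
    return math.comb(2 * n, n) / ((2 * n - 1) * 4 ** n)


def asymptotic(n):
    """Leading large-n behaviour: n^(-3/2) / (2 * sqrt(pi))."""
    return n ** -1.5 / (2.0 * math.sqrt(math.pi))


def loglog_slope(n1, n2):
    """Slope of log f_2n against log n between two values of n."""
    f1, f2 = first_return_probability(n1), first_return_probability(n2)
    return (math.log(f2) - math.log(f1)) / (math.log(n2) - math.log(n1))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        print(n, first_return_probability(n), asymptotic(n))
    print(loglog_slope(1000, 10000))
```

The slope between large values of $n$ approaches $-3/2$, the exponent quoted on the slide.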

49.
Phase transitions and critical phenomena
Suppose we have a simple lattice. Each square is coloured with probability $p = 0.5$.
We can look at the clusters of coloured squares. For example, the mean cluster area, $s$, of a randomly chosen square:
If a square is white, then the area is zero
If a square is coloured, but surrounded by white, then the area is one
etc.
When $p$ is small, $s$ is independent of the lattice size.
When $p$ is large, $s$ depends on the lattice size.

50.
Phase transitions and critical phenomena
[Figure: example lattices for p = 0.3, p = 0.5927..., and p = 0.9]
As we increase $p$, the value of $s$ also increases.
For some $p$, $s$ starts to increase with the lattice size.
This is known as the critical value, $p = p_c = 0.5927462\ldots$
If we calculate the distribution $p(s)$, then when $p = p_c$, $p(s)$ follows a power-law distribution.
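
The lattice experiment can be sketched in a few lines of Python. This version assumes clusters are formed by nearest-neighbour (up, down, left, right) contact and no wrap-around at the edges; the function names are illustrative. Because a randomly chosen square lands in a cluster of area $k$ with probability proportional to $k$, the mean cluster area is the sum of squared cluster areas divided by the number of squares.

```python
import random
from collections import deque


def random_lattice(size, p, rng):
    """size x size grid; each square is coloured (True) with probability p."""
    return [[rng.random() < p for _ in range(size)] for _ in range(size)]


def cluster_sizes(lattice):
    """Areas of nearest-neighbour clusters of coloured squares (flood fill)."""
    n_rows, n_cols = len(lattice), len(lattice[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    sizes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not lattice[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                i, j = queue.popleft()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if (0 <= a < n_rows and 0 <= b < n_cols
                            and lattice[a][b] and not seen[a][b]):
                        seen[a][b] = True
                        queue.append((a, b))
            sizes.append(size)
    return sizes


def mean_cluster_area(lattice):
    """Average over all squares of the area of the cluster containing it."""
    n_squares = len(lattice) * len(lattice[0])
    return sum(k * k for k in cluster_sizes(lattice)) / n_squares


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.3, 0.5927, 0.9):
        for size in (32, 64):
            print(p, size, mean_cluster_area(random_lattice(size, p, rng)))
```

Comparing the output across lattice sizes for each $p$ is a direct way to see where $s$ starts to depend on the lattice size.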

52.
Forest fire
This simple model has been used as a primitive model of forest fires.
We start with an empty lattice and trees grow at random.
Every so often, a forest fire strikes at random.
If the forest is too connected, i.e. large $p$, then the forest burns down.
So, it is argued, the forest density oscillates around $p = p_c$.
This is an example of self-organised criticality.
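
A toy simulation of this model, as a sketch under stated assumptions: one growth attempt per time step at a random square, a lightning strike with a fixed probability per step (the hypothetical `lightning_prob` parameter), and fire spreading instantly through nearest-neighbour trees. None of these details are specified on the slide, and the function names are illustrative.

```python
import random


def burn_cluster(forest, row, col):
    """Burn the connected cluster of trees containing (row, col); return its size."""
    size = len(forest)
    forest[row][col] = False
    stack = [(row, col)]
    burned = 0
    while stack:
        i, j = stack.pop()
        burned += 1
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < size and 0 <= b < size and forest[a][b]:
                forest[a][b] = False
                stack.append((a, b))
    return burned


def simulate(size, steps, lightning_prob, rng):
    """Return the tree density after each time step."""
    forest = [[False] * size for _ in range(size)]
    n_trees = 0
    densities = []
    for _ in range(steps):
        # Growth: a tree appears at a random square if it is empty.
        r, c = rng.randrange(size), rng.randrange(size)
        if not forest[r][c]:
            forest[r][c] = True
            n_trees += 1
        # Lightning: occasionally strike a random square.
        if rng.random() < lightning_prob:
            r, c = rng.randrange(size), rng.randrange(size)
            if forest[r][c]:
                n_trees -= burn_cluster(forest, r, c)
        densities.append(n_trees / size ** 2)
    return densities


if __name__ == "__main__":
    history = simulate(size=50, steps=100_000, lightning_prob=0.001, rng=random.Random(0))
    print(min(history[20_000:]), max(history[20_000:]))
```

The printed range of densities after a warm-up period can be compared with $p_c$ for different values of `lightning_prob`.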

53.
Future work
There isn't even an R package for power law estimation.
Writing this talk I have (more or less) written one.
Use a Bayesian change point model to estimate $x_{\min}$ in a vaguely sensible way.
Use RJMCMC to change between the power law and other heavy-tailed distributions.

References
A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. http://arxiv.org/abs/0706.1062
M.E.J. Newman. Power laws, Pareto distributions and Zipf's law. http://arxiv.org/abs/cond-mat/0412004