Zipf law and the firm size distribution: a critical discussion of popular estimators

Abstract

The upper tail of the firm size distribution is often assumed to follow a Power Law. Several recent papers, using different estimators and different data sets, conclude that the Zipf Law, in particular, provides a good fit, implying that the fraction of firms with size above a given value is inversely proportional to the value itself. In this article we compare the asymptotic and small sample properties of different methods through which this conclusion has been reached. We find that the family of estimators most widely adopted, based on an OLS regression, is in fact unreliable and basically useless for appropriate inference. This finding raises doubts about previously identified Zipf behavior. Based on extensive numerical analysis, we recommend the adoption of the Hill estimator over any other method when individual observations are available.

Keywords

This work was partially supported by the Italian Ministry of University and Research, grant PRIN 2009 “The growth of firms and countries: distributional properties and economic determinants”, prot. 2009H8WPX5.

JEL Classification

L11 C15 C46 D20

Appendix

To maintain comparability with Gabaix and Ibragimov (2011), in Tables 4 and 5 we extend their analysis of AR(1) and MA(1) data to all the estimators considered in this article, although these types of time dependence are more compelling for applications in finance. We design the Monte Carlo experiments exactly as in Gabaix and Ibragimov (2011).

Table 4

AR(1) data with sub-asymptotic deviation from Zipf Law

| c | ρ | Top 50: Hill | Top 50: Rank −1/2 | Top 50: Rank | Top 50: CDF | Top 50: PDF | Top 500: Hill | Top 500: Rank −1/2 | Top 500: Rank | Top 500: CDF | Top 500: PDF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0 | 1.022 (0.07) | 1.010 (0.05) | 0.923 (0.12) | 0.964 (0.68) | 0.605 (0.80) | 1.002 (0.05) | 0.998 (0.05) | 0.978 (0.08) | 0.973 (0.66) | 0.864 (0.50) |
| | | (0.147) (0.149) | (0.202) (0.195) | (0.185) (0.182) | (0.055) (0.238) | (0.138) (0.124) | (0.045) (0.045) | (0.063) (0.063) | (0.062) (0.063) | (0.033) (0.131) | (0.070) (0.099) |
| 0.0 | 0.5 | 1.119 (0.31) | 1.174 (0.15) | 1.077 (0.15) | 1.170 (0.82) | 0.718 (0.53) | 1.163 (0.78) | 1.124 (0.51) | 1.102 (0.46) | 1.111 (0.84) | 1.000 (0.44) |
| | | (0.161) (0.253) | (0.235) (0.321) | (0.215) (0.297) | (0.051) (0.355) | (0.142) (0.186) | (0.052) (0.100) | (0.071) (0.146) | (0.070) (0.145) | (0.032) (0.223) | (0.065) (0.162) |
| 0.0 | 0.8 | 1.315 (0.58) | 1.483 (0.41) | 1.369 (0.35) | 1.485 (0.94) | 0.891 (0.43) | 1.306 (0.89) | 1.261 (0.74) | 1.237 (0.72) | 1.260 (0.88) | 1.120 (0.65) |
| | | (0.190) (0.452) | (0.297) (0.559) | (0.274) (0.515) | (0.079) (0.602) | (0.149) (0.334) | (0.059) (0.199) | (0.080) (0.268) | (0.078) (0.264) | (0.040) (0.348) | (0.071) (0.273) |
| 0.5 | 0.0 | 1.046 (0.09) | 1.024 (0.05) | 0.935 (0.11) | 0.974 (0.68) | 0.615 (0.77) | 1.157 (0.90) | 1.084 (0.22) | 1.061 (0.15) | 1.014 (0.66) | 0.915 (0.26) |
| | | (0.151) (0.154) | (0.205) (0.200) | (0.187) (0.187) | (0.056) (0.244) | (0.141) (0.128) | (0.052) (0.056) | (0.069) (0.076) | (0.067) (0.075) | (0.036) (0.148) | (0.082) (0.118) |
| 0.5 | 0.5 | 1.142 (0.33) | 1.189 (0.16) | 1.091 (0.15) | 1.184 (0.83) | 0.729 (0.51) | 1.303 (0.97) | 1.199 (0.67) | 1.175 (0.63) | 1.154 (0.84) | 1.050 (0.46) |
| | | (0.165) (0.264) | (0.238) (0.330) | (0.218) (0.305) | (0.051) (0.364) | (0.145) (0.192) | (0.058) (0.122) | (0.076) (0.169) | (0.074) (0.167) | (0.034) (0.245) | (0.075) (0.185) |
| 0.5 | 0.8 | 1.342 (0.59) | 1.507 (0.42) | 1.390 (0.37) | 1.508 (0.94) | 0.910 (0.43) | 1.442 (0.95) | 1.339 (0.80) | 1.313 (0.78) | 1.315 (0.89) | 1.186 (0.67) |
| | | (0.194) (0.471) | (0.301) (0.577) | (0.278) (0.532) | (0.079) (0.621) | (0.152) (0.347) | (0.065) (0.236) | (0.085) (0.303) | (0.083) (0.298) | (0.042) (0.385) | (0.079) (0.308) |
| 0.8 | 0.0 | 1.182 (0.28) | 1.108 (0.05) | 1.010 (0.08) | 1.036 (0.68) | 0.673 (0.57) | 1.475 (1.00) | 1.313 (0.94) | 1.284 (0.91) | 1.141 (0.70) | 1.053 (0.24) |
| | | (0.171) (0.184) | (0.222) (0.235) | (0.202) (0.219) | (0.062) (0.281) | (0.158) (0.153) | (0.066) (0.074) | (0.083) (0.110) | (0.081) (0.109) | (0.049) (0.201) | (0.112) (0.168) |
| 0.8 | 0.5 | 1.255 (0.47) | 1.268 (0.21) | 1.162 (0.18) | 1.251 (0.85) | 0.786 (0.42) | 1.699 (1.00) | 1.442 (0.91) | 1.410 (0.89) | 1.298 (0.85) | 1.205 (0.56) |
| | | (0.181) (0.313) | (0.254) (0.376) | (0.232) (0.347) | (0.054) (0.408) | (0.159) (0.223) | (0.076) (0.176) | (0.091) (0.241) | (0.089) (0.237) | (0.043) (0.317) | (0.106) (0.259) |
| 0.8 | 0.8 | 1.466 (0.66) | 1.611 (0.48) | 1.485 (0.42) | 1.609 (0.95) | 0.988 (0.42) | 1.896 (1.00) | 1.613 (0.90) | 1.580 (0.89) | 1.513 (0.89) | 1.404 (0.72) |
| | | (0.212) (0.556) | (0.322) (0.659) | (0.297) (0.607) | (0.082) (0.703) | (0.166) (0.397) | (0.085) (0.354) | (0.102) (0.428) | (0.100) (0.420) | (0.053) (0.514) | (0.107) (0.429) |

Note: Estimates of the tail index for the AR(1) process $Y_i = \rho Y_{i-1} + \epsilon_i$, with innovations $\epsilon_i$ following $P(X > x) = x^{-1}\left(1 + c\,(x^{-1} - 1)\right)$, $x > 1$, $c \in [0, 1)$. Results are based on 10,000 Monte Carlo simulations with sample size N = 2000 and varying tail width (top–50 vs. top–500 observations), for different values of c and ρ. CDF and PDF estimates are computed with 15 bins. For each combination, the first line reports the point estimate of the tail index averaged over the replications and, in parentheses, the fraction of replications in which the null of unitary tail index is rejected (at the 5% significance level); the second line shows, in parentheses, the theoretical standard errors (usual OLS standard errors for the CDF and PDF estimators; propagated via Taylor expansion of the asymptotic variance as in Gabaix and Ibragimov (2011) for the Rank −1/2 estimator; as given in Eq. 2.4 for the Hill estimator) followed by the sample standard errors.

Table 5

MA(1) data with sub-asymptotic deviation from Zipf Law

| c | θ | Top 50: Hill | Top 50: Rank −1/2 | Top 50: Rank | Top 50: CDF | Top 50: PDF | Top 500: Hill | Top 500: Rank −1/2 | Top 500: Rank | Top 500: CDF | Top 500: PDF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0 | 1.022 (0.07) | 1.010 (0.05) | 0.923 (0.12) | 0.964 (0.68) | 0.605 (0.80) | 1.002 (0.05) | 0.998 (0.05) | 0.978 (0.08) | 0.973 (0.66) | 0.864 (0.50) |
| | | (0.147) (0.149) | (0.202) (0.195) | (0.185) (0.182) | (0.055) (0.238) | (0.138) (0.124) | (0.045) (0.045) | (0.063) (0.063) | (0.062) (0.063) | (0.033) (0.131) | (0.070) (0.099) |
| 0.0 | 0.5 | 1.065 (0.17) | 1.077 (0.11) | 0.987 (0.15) | 1.048 (0.74) | 0.650 (0.69) | 1.073 (0.40) | 1.052 (0.21) | 1.031 (0.18) | 1.021 (0.73) | 0.920 (0.38) |
| | | (0.154) (0.202) | (0.215) (0.277) | (0.197) (0.257) | (0.056) (0.328) | (0.139) (0.164) | (0.048) (0.066) | (0.067) (0.095) | (0.065) (0.094) | (0.035) (0.175) | (0.068) (0.129) |
| 0.0 | 0.8 | 1.075 (0.20) | 1.077 (0.12) | 0.988 (0.17) | 0.998 (0.73) | 0.598 (0.67) | 1.078 (0.43) | 1.055 (0.22) | 1.034 (0.19) | 0.999 (0.69) | 0.893 (0.30) |
| | | (0.155) (0.218) | (0.215) (0.292) | (0.198) (0.272) | (0.067) (0.340) | (0.164) (0.197) | (0.048) (0.068) | (0.067) (0.098) | (0.065) (0.097) | (0.038) (0.167) | (0.090) (0.130) |
| 0.5 | 0.0 | 1.046 (0.09) | 1.024 (0.05) | 0.935 (0.11) | 0.974 (0.68) | 0.615 (0.77) | 1.157 (0.90) | 1.084 (0.22) | 1.061 (0.15) | 1.014 (0.66) | 0.915 (0.26) |
| | | (0.151) (0.154) | (0.205) (0.201) | (0.187) (0.187) | (0.056) (0.244) | (0.141) (0.128) | (0.052) (0.055) | (0.069) (0.076) | (0.067) (0.075) | (0.036) (0.148) | (0.082) (0.118) |
| 0.5 | 0.5 | 1.087 (0.20) | 1.091 (0.11) | 0.999 (0.15) | 1.059 (0.75) | 0.660 (0.66) | 1.221 (0.96) | 1.131 (0.47) | 1.108 (0.39) | 1.062 (0.73) | 0.968 (0.32) |
| | | (0.157) (0.209) | (0.218) (0.284) | (0.200) (0.264) | (0.057) (0.336) | (0.141) (0.168) | (0.055) (0.080) | (0.072) (0.112) | (0.070) (0.111) | (0.038) (0.197) | (0.078) (0.149) |
| 0.5 | 0.8 | 1.097 (0.22) | 1.090 (0.13) | 1.000 (0.17) | 1.009 (0.72) | 0.608 (0.64) | 1.226 (0.96) | 1.133 (0.48) | 1.110 (0.40) | 1.037 (0.68) | 0.944 (0.21) |
| | | (0.158) (0.225) | (0.218) (0.300) | (0.200) (0.279) | (0.068) (0.348) | (0.167) (0.202) | (0.055) (0.083) | (0.072) (0.115) | (0.070) (0.114) | (0.041) (0.188) | (0.102) (0.151) |
| 0.8 | 0.0 | 1.182 (0.28) | 1.108 (0.05) | 1.010 (0.08) | 1.036 (0.68) | 0.673 (0.57) | 1.475 (1.00) | 1.313 (0.94) | 1.284 (0.91) | 1.141 (0.70) | 1.053 (0.24) |
| | | (0.171) (0.184) | (0.222) (0.235) | (0.202) (0.219) | (0.062) (0.281) | (0.158) (0.153) | (0.066) (0.074) | (0.083) (0.110) | (0.081) (0.109) | (0.049) (0.201) | (0.112) (0.168) |
| 0.8 | 0.5 | 1.208 (0.36) | 1.169 (0.14) | 1.070 (0.15) | 1.125 (0.77) | 0.717 (0.52) | 1.588 (1.00) | 1.368 (0.92) | 1.338 (0.89) | 1.198 (0.75) | 1.114 (0.40) |
| | | (0.174) (0.246) | (0.234) (0.325) | (0.214) (0.301) | (0.060) (0.378) | (0.155) (0.196) | (0.071) (0.112) | (0.087) (0.164) | (0.085) (0.162) | (0.049) (0.265) | (0.109) (0.215) |
| 0.8 | 0.8 | 1.216 (0.37) | 1.167 (0.15) | 1.070 (0.16) | 1.075 (0.73) | 0.669 (0.51) | 1.599 (1.00) | 1.371 (0.92) | 1.341 (0.89) | 1.169 (0.70) | 1.097 (0.28) |
| | | (0.175) (0.263) | (0.233) (0.343) | (0.214) (0.318) | (0.073) (0.394) | (0.181) (0.232) | (0.072) (0.117) | (0.087) (0.169) | (0.085) (0.167) | (0.056) (0.257) | (0.138) (0.218) |

Note: Estimates of the tail index for the MA(1) process $Y_i = \epsilon_i + \theta \epsilon_{i-1}$, with innovations $\epsilon_i$ following $P(X > x) = x^{-1}\left(1 + c\,(x^{-1} - 1)\right)$, $x > 1$, $c \in [0, 1)$. Results are based on 10,000 Monte Carlo simulations with sample size N = 2000 and varying tail width (top–50 vs. top–500 observations), for different values of c and θ. CDF and PDF estimates are computed with 15 bins. For each combination, the first line reports the point estimate of the tail index averaged over the replications and, in parentheses, the fraction of replications in which the null of unitary tail index is rejected (at the 5% significance level); the second line shows, in parentheses, the theoretical standard errors (usual OLS standard errors for the CDF and PDF estimators; propagated via Taylor expansion of the asymptotic variance as in Gabaix and Ibragimov (2011) for the Rank −1/2 estimator; as given in Eq. 2.4 for the Hill estimator) followed by the sample standard errors.

For the AR(1) DGP, we generate R = 10,000 random samples of size N = 2,000 from the AR(1) process

$$Y_i = \rho Y_{i-1} + \epsilon_i,$$

with $\epsilon_i$ drawn from Eq. 4.1, for a given combination of the values of c and ρ. To each sample we apply all the estimators for two different tail widths, i.e. including either the top–50 or the top–500 observations in the tail. We then repeat the Monte Carlo tests for different values of the parameters.
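As an illustration of this design, the following minimal Python sketch draws innovations by inverting the survival function reported in the note to Table 4, $P(X > x) = x^{-1}(1 + c(x^{-1} - 1))$, builds one AR(1) sample, and extracts the top-k observations. The function names, the initial condition $Y_0 = \epsilon_0$ and the optional burn-in are assumptions made here for illustration; they are not taken from the article or from Gabaix and Ibragimov (2011).

```python
import numpy as np

def draw_innovations(n, c, rng):
    """Draw n values with P(X > x) = x^{-1} (1 + c (x^{-1} - 1)), x > 1.

    With t = 1/x the survival function reads c t^2 + (1 - c) t = u,
    where u is uniform on (0, 1]. The positive root is written in a
    form that is numerically stable and also covers c = 0 (pure Pareto).
    """
    u = 1.0 - rng.random(n)                      # uniform on (0, 1]
    b = 1.0 - c
    t = 2.0 * u / (b + np.sqrt(b * b + 4.0 * c * u))
    return 1.0 / t

def simulate_ar1(n, c, rho, rng, burn_in=0):
    """One sample from Y_i = rho * Y_{i-1} + eps_i.

    Assumption: the recursion starts at Y_0 = eps_0; burn_in initial
    values can be discarded to reduce dependence on that choice.
    """
    eps = draw_innovations(n + burn_in, c, rng)
    y = np.empty_like(eps)
    y[0] = eps[0]
    for i in range(1, len(eps)):
        y[i] = rho * y[i - 1] + eps[i]
    return y[burn_in:]

def top_k(y, k):
    """The k largest observations, in decreasing order."""
    return np.sort(y)[::-1][:k]

rng = np.random.default_rng(12345)
sample = simulate_ar1(n=2000, c=0.5, rho=0.8, rng=rng)
tail_50, tail_500 = top_k(sample, 50), top_k(sample, 500)
```

Each estimator would then be applied to `tail_50` and `tail_500`, and the whole procedure repeated over the R replications and the grid of (c, ρ) values.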

In Table 4, we report the average of the point estimates across the 10,000 runs, together with the asymptotic (theoretical) and sample standard errors, as well as the rejection rates of a t-test (at the 5% level) of the true null of unitary tail index performed at each run. First, consider the sensitivity to the AR(1) structure, setting aside the impact of the sub-asymptotic correction (i.e., set c = 0 and vary ρ), and take the case in which the top–50 observations are considered. Although all the rejection rates are above the theoretical 5%, the results provide a clear ranking. First, the CDF and PDF estimators both severely over-reject. Second, among the other three estimators, the Rank −1/2 outperforms the others. However, if we take the top–500 observations in the tail, then the frequency at which the true null of unitary tail index is mistakenly rejected rapidly grows to above 50% for all the estimators. Similar conclusions emerge when we let both c and ρ vary at the same time.
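The quantities reported in each cell (average point estimate, sample standard error, and rejection rate of the unit-tail-index null) can be sketched as follows for the Hill and Rank −1/2 estimators in the baseline case c = 0, ρ = 0. Several ingredients are assumptions of this sketch rather than the article's exact definitions: the Hill estimator uses the k-th largest observation as threshold with standard error ζ̂/√(k−1), which may differ from the article's Eq. 2.4; the Rank −1/2 estimator regresses log(rank − 1/2) on log(size) with standard error ζ̂√(2/k), following our reading of Gabaix and Ibragimov (2011); and the t-test uses the normal critical value 1.96.

```python
import numpy as np

def hill(sample, k):
    """Hill estimate and asymptotic s.e. from the k largest observations.

    Assumption: the k-th largest value serves as threshold, leaving
    k - 1 exceedances; other conventions differ by terms of order 1/k.
    """
    x = np.sort(sample)[::-1][:k]
    zeta = (k - 1) / np.log(x[:-1] / x[-1]).sum()
    return zeta, zeta / np.sqrt(k - 1)

def rank_minus_half(sample, k):
    """OLS of log(rank - 1/2) on log(size); the slope is minus the tail index."""
    x = np.sort(sample)[::-1][:k]
    ranks = np.arange(1, k + 1)
    slope, _ = np.polyfit(np.log(x), np.log(ranks - 0.5), 1)
    zeta = -slope
    return zeta, zeta * np.sqrt(2.0 / k)

def monte_carlo(n_rep=1000, n=2000, k=50, seed=0):
    """Mean estimate, sample s.e. and rejection rate of H0: zeta = 1."""
    rng = np.random.default_rng(seed)
    estimators = {"Hill": hill, "Rank -1/2": rank_minus_half}
    est = {name: [] for name in estimators}
    rej = {name: [] for name in estimators}
    for _ in range(n_rep):
        sample = 1.0 / (1.0 - rng.random(n))     # i.i.d. Pareto, zeta = 1
        for name, estimator in estimators.items():
            zeta, se = estimator(sample, k)
            est[name].append(zeta)
            rej[name].append(abs(zeta - 1.0) / se > 1.96)
    return {
        name: (np.mean(est[name]), np.std(est[name], ddof=1), np.mean(rej[name]))
        for name in estimators
    }

results = monte_carlo(n_rep=500)
for name, (mean, sd, rate) in results.items():
    print(f"{name}: mean={mean:.3f} sample s.e.={sd:.3f} rejection rate={rate:.2f}")
```

Replacing the i.i.d. draws with dependent AR(1) data leaves the estimators unchanged but invalidates the i.i.d. standard errors, which is the mechanism behind the over-rejection documented in Table 4.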

Table 5 repeats the analysis for the MA(1) process

$$Y_i = \epsilon_i + \theta \epsilon_{i-1},$$

with $\epsilon_i$ distributed according to Eq. 4.1. As before, we simulate R = 10,000 random samples of size N = 2,000 with varying c and θ, and again compare the behavior of the estimators for different tail widths (top–50 and top–500 observations). The findings for θ = 0 necessarily replicate those for the AR(1) process with ρ = 0, since both cases reduce to i.i.d. innovations. Further, if we switch off the sub-asymptotic correction (i.e. set c = 0 and vary θ), we observe that, first, the CDF and PDF estimators are once again unreliable, with very high rejection rates. Second, although rejection rates are above the theoretical 5% for all the methods, the Rank and Rank −1/2 estimators perform better (smaller rejection rates) than the other methods. The Rank performs slightly better if the tail includes the top–50 observations, while the Rank −1/2 is slightly better for the top–500 observations. Third, the patterns are similar when we let c and θ vary together. If anything, we notice that the rejection rates associated with all the estimators rapidly increase to above 20% if we take the top–500 observations in the tail. Conversely, they depend less on the parameters in the top–50 exercise.
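A short sketch can make the relation between the two designs concrete: given the same innovations, the MA(1) series with θ = 0 and the AR(1) series with ρ = 0 coincide, which is why the first rows of Tables 4 and 5 agree. The helper names `ar1` and `ma1`, the initial condition of the AR(1) recursion, and the use of pure Pareto innovations (c = 0) are assumptions of this illustration.

```python
import numpy as np

def ar1(eps, rho):
    """Y_i = rho * Y_{i-1} + eps_i, started at Y_0 = eps_0 (an assumption)."""
    y = np.empty_like(eps)
    y[0] = eps[0]
    for i in range(1, len(eps)):
        y[i] = rho * y[i - 1] + eps[i]
    return y

def ma1(eps, theta):
    """Y_i = eps_i + theta * eps_{i-1}; one innovation is used up as the lag."""
    return eps[1:] + theta * eps[:-1]

rng = np.random.default_rng(0)
eps = 1.0 / (1.0 - rng.random(2001))   # Pareto innovations with unit tail index

y_ma = ma1(eps, theta=0.5)             # N = 2000 dependent observations
# With theta = 0 and rho = 0 both processes return the innovations themselves.
same_baseline = np.allclose(ma1(eps, 0.0), ar1(eps, 0.0)[1:])
print(len(y_ma), same_baseline)
```

Unlike the AR(1) case, the dependence in the MA(1) series does not extend beyond one lag, which is consistent with the milder distortions reported in Table 5 relative to Table 4.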