Monday, August 26, 2013

Benford's Law

Benford's Law, also called the First-Digit Law, refers to the frequency distribution of digits in many (but not all) real-life sources of data. In this distribution, the digit 1 occurs as the leading digit about 30% of the time, while larger digits occur in that position less frequently: 9 is the first digit less than 5% of the time. This distribution of first digits matches the widths of the intervals between gridlines on a logarithmic scale. Benford's Law also covers the expected distribution of digits beyond the first, which approaches a uniform distribution.

This result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple orders of magnitude.
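The percentages above can be reproduced from the usual statement of the law, P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The post does not spell that formula out, so treat this as a minimal sketch of the standard form; the function name `benford_probability` is just an illustrative choice.

```python
import math


def benford_probability(d):
    """Expected share of values whose leading digit is d, under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    # Standard form of the law: P(d) = log10(1 + 1/d).
    return math.log10(1 + 1 / d)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The nine probabilities sum to 1, since the logarithms telescope to log10(10).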

2 Comments:

This isn't a "law." It's just basic mathematics. If you say "pick a number between 1 and 9," then there will be a uniform distribution across the entire decade: 1/9 for all possible leading digits. But if you pick a number between (say) 1 and 3, then the decadal distribution is: 1/3, 1/3, 1/3, 0, 0, 0, 0, 0, 0. Similarly, if you pick a number between 1 and 20, the first-digit distribution is: 11/20, 1/10, 1/20, 1/20, 1/20, 1/20, 1/20, 1/20, 1/20.
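These fractions are easy to check by brute force. The following is a small sketch that assumes "pick a number between 1 and N" means a whole number chosen uniformly from 1 to N inclusive; `leading_digit_distribution` is a made-up helper name.

```python
from collections import Counter
from fractions import Fraction


def leading_digit_distribution(n):
    """Exact share of each leading digit (1 to 9) among the integers 1..n."""
    # Count how many of the integers 1..n start with each digit.
    counts = Counter(int(str(k)[0]) for k in range(1, n + 1))
    return [Fraction(counts[d], n) for d in range(1, 10)]


# Show the distribution for the three ranges discussed above.
for n in (9, 3, 20):
    print(n, [str(p) for p in leading_digit_distribution(n)])
```

For N = 20, the digit 1 leads in 1 and in 10 through 19, which is 11 of the 20 numbers; the digit 2 leads in 2 and 20.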

So clearly the distribution of leading digits given an arbitrary random range is not uniform. Proving that the right distribution is logarithmic (base 10) is a little harder, but you can do a simulation if you like. In any case, Benford's "law" is just a standard scaling argument.
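One way such a simulation might look is sketched below. It rests on an assumption the comment does not make explicit: the upper end of each range is drawn evenly across orders of magnitude (here from 1 up to 10^6), and the number is then picked uniformly from 1 to that upper end. The names `simulate_leading_digits` and `max_exponent` are illustrative only.

```python
import math
import random
from collections import Counter


def simulate_leading_digits(samples, max_exponent=6, seed=0):
    """Share of each leading digit when numbers are drawn from random ranges."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        # Assumption: the range's upper end is spread evenly over orders of magnitude.
        upper = int(10 ** rng.uniform(0, max_exponent))
        # Pick a whole number uniformly from 1..upper and record its first digit.
        value = rng.randint(1, upper)
        counts[int(str(value)[0])] += 1
    return {d: counts[d] / samples for d in range(1, 10)}


# Compare the simulated shares with the logarithmic values log10(1 + 1/d).
frequencies = simulate_leading_digits(100_000)
for d, share in frequencies.items():
    print(d, f"simulated {share:.3f}", f"log10 {math.log10(1 + 1 / d):.3f}")
```

Under these assumptions the simulated shares should lean heavily toward low digits and sit close to the logarithmic values, though not exactly on them; a simulation like this illustrates the scaling argument rather than proving it.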

If I understood what you're saying, it's basically the explanation I was grasping for but, despite its simplicity, for some reason couldn't quite articulate.

Is the explanation:

You'd expect a big chunk of the magnitudes we encounter to be in the 1-2 range, another chunk to be in the 1-3 range, another to be in the 1-4 range, etc. So, basically, we should expect fewer 2s than 1s, fewer 3s than 2s, etc.