Following Benford's Law, or Looking Out for No. 1

By MALCOLM W. BROWNE

Published: August 4, 1998

DR. THEODORE P. HILL asks his mathematics students at the Georgia Institute of Technology to go home and either flip a coin 200 times and record the results, or merely pretend to flip a coin and fake 200 results. The following day he runs his eye over the homework data, and to the students' amazement, he easily fingers nearly all those who faked their tosses.

''The truth is,'' he said in an interview, ''most people don't know the real odds of such an exercise, so they can't fake data convincingly.''

There is more to this than a classroom trick.

Dr. Hill is one of a growing number of statisticians, accountants and mathematicians who are convinced that an astonishing mathematical theorem known as Benford's Law is a powerful and relatively simple tool for pointing suspicion at frauds, embezzlers, tax evaders, sloppy accountants and even computer bugs.

The income tax agencies of several nations and several states, including California, are using detection software based on Benford's Law, as are a score of large companies and accounting businesses.

Benford's Law is named for the late Dr. Frank Benford, a physicist at the General Electric Company. In 1938 he noticed that pages of logarithms corresponding to numbers starting with the numeral 1 were much dirtier and more worn than other pages.

(A logarithm is an exponent. Any positive number can be expressed as some base number, such as 10, raised to a fractional exponent; that exponent is the number's logarithm. Published tables permit users to look up logarithms corresponding to numbers, or numbers corresponding to logarithms.)

Logarithm tables (and the slide rules derived from them) are not much used for routine calculating anymore; electronic calculators and computers are simpler and faster. But logarithms remain important in many scientific and technical applications, and they were a key element in Dr. Benford's discovery.

Dr. Benford concluded that it was unlikely that physicists and engineers had some special preference for logarithms starting with 1. He therefore embarked on a mathematical analysis of 20,229 numbers drawn from 20 sets of data, including such wildly disparate categories as the areas of rivers, baseball statistics, numbers in magazine articles and the street addresses of the first 342 people listed in the book ''American Men of Science.'' All these seemingly unrelated sets of numbers followed the same first-digit probability pattern that the worn pages of logarithm tables suggested. In all cases, the number 1 turned up as the first digit about 30 percent of the time, more often than any other.

Dr. Benford derived a formula to describe this. If absolute certainty is defined as 1 and absolute impossibility as 0, then the probability that any digit ''d'' from 1 through 9 will be the first digit is log to the base 10 of (1 + 1/d). This formula predicts the frequencies of first digits found in many categories of statistics.
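For readers who want to check the formula, here is a minimal Python sketch that evaluates log to the base 10 of (1 + 1/d) for each possible first digit. The function name `benford_probability` is our own label for illustration, not something taken from Dr. Benford's work.

```python
import math


def benford_probability(d: int) -> float:
    """Probability that d is the first digit, per the formula log10(1 + 1/d)."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


if __name__ == "__main__":
    # Print the predicted share for each first digit, as a percentage.
    for digit in range(1, 10):
        print(f"{digit}: {benford_probability(digit):.1%}")

    # The nine probabilities should add up to certainty (1).
    print("total:", round(sum(benford_probability(d) for d in range(1, 10)), 10))
```

The nine values add up to 1 because the terms multiply out to log to the base 10 of 10, and the value for the digit 1 comes to about 30 percent, in line with what Dr. Benford observed.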

Probability predictions are often surprising. In the case of the coin-tossing experiment, Dr. Hill wrote in the current issue of the magazine American Scientist, a ''quite involved calculation'' revealed a surprising probability. It showed, he said, that the overwhelming odds are that at some point in a series of 200 tosses, either heads or tails will come up six or more times in a row. Most fakers don't know this and avoid guessing long runs of heads or tails, which they mistakenly believe to be improbable. At just a glance, Dr. Hill can see whether or not a student's 200 coin-toss results contain a run of six heads or tails; if they don't, the student is branded a fake.
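The coin-toss claim can be checked without the ''quite involved calculation.'' The Python sketch below is one way to do it, and is not necessarily the method Dr. Hill used: it tracks, toss by toss, the probability of each current streak length among sequences that have not yet produced a long run. It assumes a fair coin, and the function names `prob_run_at_least` and `longest_run` are hypothetical names chosen for this illustration.

```python
def prob_run_at_least(n_tosses: int, run_len: int) -> float:
    """Chance that n fair tosses contain run_len or more identical faces in a row."""
    if n_tosses < 1 or run_len < 2:
        raise ValueError("need n_tosses >= 1 and run_len >= 2")
    # streak[j] = probability that the current streak has length j + 1
    # and no run of run_len has occurred so far.
    streak = [0.0] * (run_len - 1)
    streak[0] = 1.0  # after the first toss the streak length is 1
    for _ in range(n_tosses - 1):
        nxt = [0.0] * (run_len - 1)
        nxt[0] = 0.5 * sum(streak)  # face changes: streak resets to 1
        for j in range(run_len - 2):
            nxt[j + 1] = 0.5 * streak[j]  # face repeats: streak grows by 1
        # A repeat from the longest tracked streak completes a run of
        # run_len, so that probability mass is dropped from the table.
        streak = nxt
    return 1.0 - sum(streak)


def longest_run(tosses: str) -> int:
    """Length of the longest run of identical symbols in a record like 'HHTH'."""
    longest = current = 0
    previous = None
    for toss in tosses:
        current = current + 1 if toss == previous else 1
        longest = max(longest, current)
        previous = toss
    return longest


if __name__ == "__main__":
    print(prob_run_at_least(200, 6))
    print(longest_run("HHTHHHHHHT") >= 6)
```

Calling `prob_run_at_least(200, 6)` gives the odds that a genuine record of 200 tosses contains such a run, while `longest_run` applies the at-a-glance check to a single student's record.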

Even more astonishing are the effects of Benford's Law on number sequences. Intuitively, most people assume that in a string of numbers sampled randomly from some body of data, the first non-zero digit could be any number from 1 through 9. All nine numbers would be regarded as equally probable.

But, as Dr. Benford discovered, in a huge assortment of number sequences -- random samples from a day's stock quotations, a tournament's tennis scores, the numbers on the front page of The New York Times, the populations of towns, electricity bills in the Solomon Islands, the molecular weights of compounds, the half-lives of radioactive atoms and much more -- this is not so.

Given a string of at least four numbers sampled from one or more of these sets of data, the chance that the first digit will be 1 is not one in nine, as many people would imagine; according to Benford's Law, it is 30.1 percent, or nearly one in three. The chance that the first digit will be 2 is only 17.6 percent, and the probabilities decline smoothly for each successive digit through 9, which has only a 4.6 percent chance.

A strange feature of these probabilities is that they are ''scale invariant'' and ''base invariant.'' Scale invariance means that the units do not matter: the pattern is the same whether the numbers are based on the dollar prices of stocks or their prices in yen or marks, or even expressed in terms of stocks per dollar. Base invariance means that the same logarithmic pattern holds when the numbers are written in a number base other than 10. Provided there are enough numbers in the sample, the first digit of the sequence is more likely to be 1 than any other.
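Scale invariance is easy to try out. The Python sketch below uses the first 1,000 powers of 2 as a stand-in data set, a sequence whose first digits are known to follow the logarithmic pattern closely, and then multiplies every value by a conversion factor, much as one would when switching currencies. The data set, the factors 3, 7 and 144, and the function names are all assumptions made for illustration; none of them come from the studies described here.

```python
import math
from collections import Counter

# Predicted first-digit shares from the formula log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n: int) -> int:
    """Leading digit of a non-zero integer."""
    return int(str(abs(n))[0])


def first_digit_frequencies(values):
    """Observed share of each first digit, 1 through 9, in a list of integers."""
    counts = Counter(first_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


def max_deviation(values) -> float:
    """Largest gap between an observed first-digit share and the predicted one."""
    freqs = first_digit_frequencies(values)
    return max(abs(freqs[d] - BENFORD[d]) for d in range(1, 10))


if __name__ == "__main__":
    # Illustrative data only: powers of 2 standing in for a list of prices.
    prices = [2 ** n for n in range(1, 1001)]
    for factor in (1, 3, 7, 144):  # hypothetical conversion factors
        scaled = [factor * p for p in prices]
        print(f"factor {factor}: max deviation {max_deviation(scaled):.4f}")
```

If the pattern is scale invariant, the deviation from the predicted shares should stay small for every factor rather than growing once the units change.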

The larger and more varied the sampling of numbers from different data sets, mathematicians have found, the more closely the distribution of numbers approaches what Benford's Law predicted.