Math Detectives

Millions of people use social media sites to keep in touch with friends and family. According to the U.S. Department of Justice, some social media users use stolen identities in order to post inflammatory images and advertisements online. Benford’s Law is a mathematical concept that can help identify these fake accounts.

Mathematician Ted Hill once gave his students an assignment. They could either flip a coin 200 times and record the results, or simply pretend to flip a coin and write down 200 fake results. To their surprise, Hill could usually tell which lists were fake just by glancing at them. It turns out that Hill was using math, not magic.

“The truth is, most people don’t know the real odds of such an exercise, so they can’t fake data convincingly,” Hill explained.

Hill used a mathematical idea called Benford’s Law to spot the fakes. Benford’s Law looks at the first digit, or leading digit, of each number in a set. For example, the numbers 12 and 150 have the same leading digit, which is 1. In contrast, the number 750 has the leading digit 7. Benford’s Law says something unexpected about a set of numbers: the digits 1 through 9 do not each have the same chance of being a leading digit. In a real set of numbers, 1 is the leading digit 30.1 percent of the time. Each of the digits 2 through 9 is then a leading digit less often than the one before it.
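The percentages quoted for Benford’s Law come from a short formula that the article does not spell out: the chance that a number’s leading digit is d equals log10(1 + 1/d). The Python sketch below simply evaluates that formula for each digit. The function and variable names are made up for illustration.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected share for each leading digit, as a percentage rounded to one decimal.
benford_table = {d: round(100 * benford_probability(d), 1) for d in range(1, 10)}

for d, pct in benford_table.items():
    print(f"{d}: {pct} percent")
```

The first line printed is `1: 30.1 percent`, and each later digit gets a smaller share, ending with 4.6 percent for 9.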

Looking Out for Number One

In 1881, astronomer Simon Newcomb needed to make a difficult calculation. He consulted a book of logarithmic tables. Newcomb noticed that the earlier pages of the book, which listed numbers beginning with 1, were more worn than the pages for numbers beginning with 2. “That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones,” Newcomb observed. “The first significant figure is oftener 1 than any other digit, and the frequency diminishes up to 9.”

Newcomb’s observation was rediscovered in 1938 by physicist Frank Benford, for whom the law is now named. Benford examined 20,229 numbers drawn from 20 different sets of data. The sets came from a wide range of sources, including baseball statistics, numbers in magazine articles, and lists of street addresses. Benford confirmed Newcomb’s observation. He found that 1 was the leading digit about 30 percent of the time, more than any other digit. In comparison, 2 was the leading digit 17.6 percent of the time, while 9 was the leading digit only 4.6 percent of the time.

Benford’s Law can help determine whether a set of numbers is real or fake. By using Benford’s Law, and through what he called a “quite involved calculation,” Ted Hill was able to make predictions about a series of 200 coin tosses. The calculation predicted that in a series of 200 coin tosses, either heads or tails will very likely come up 6 or more times in a row at some point. The students who faked their coin tosses believed that it was unlikely for heads or tails to appear so many times in a row. If a list did not include a run of 6 or more identical results, Hill could conclude the list was probably fake.
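The coin-toss claim can be checked directly. The sketch below is not Hill’s “quite involved calculation.” It is an ordinary probability count, assuming a fair coin, that tracks how long the current streak is after each toss. The helper names `prob_run_at_least` and `longest_run` are invented for this example.

```python
def prob_run_at_least(n_tosses: int, run_length: int) -> float:
    """Chance that n fair coin tosses contain a streak of `run_length` or more
    identical results (heads or tails)."""
    if run_length < 2:
        raise ValueError("run_length must be at least 2")
    if n_tosses < run_length:
        return 0.0
    # alive[r] = probability that no long streak has happened yet
    # and the current streak is exactly r tosses long (index 0 is unused).
    alive = [0.0] * run_length
    alive[1] = 1.0  # after the first toss the streak length is 1
    for _ in range(n_tosses - 1):
        nxt = [0.0] * run_length
        nxt[1] = 0.5 * sum(alive)          # toss differs: streak restarts at 1
        for r in range(1, run_length - 1):
            nxt[r + 1] = 0.5 * alive[r]    # toss matches: streak grows by 1
        # A streak of run_length - 1 that grows again reaches the target,
        # so that probability is dropped from the "alive" states.
        alive = nxt
    return 1.0 - sum(alive)

def longest_run(tosses: str) -> int:
    """Length of the longest streak of identical characters, e.g. in 'HHTTTH'."""
    best = current = 0
    previous = None
    for toss in tosses:
        current = current + 1 if toss == previous else 1
        best = max(best, current)
        previous = toss
    return best

chance = prob_run_at_least(200, 6)
print(f"Chance of a streak of 6 or more in 200 tosses: {chance:.3f}")
print(longest_run("HHTTTH"))  # 3
```

Under the fair-coin assumption the chance comes out well above 90 percent, which is why a list of 200 tosses with no streak of 6 stands out.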

It turns out Benford’s Law has uses other than predicting coin tosses. Accountant Mark Nigrini uses a system based on Benford’s Law to discover when people put false information on income tax forms. If the leading digits on a tax return follow Benford’s Law, with the smaller digits appearing more often than the larger ones, it is likely that the numbers reported are accurate. However, if the digits 5 or 6 appeared as leading digits more often than 1 or 2, Nigrini said, “I think I’d call someone in for a detailed audit.”
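To make the idea concrete, here is a small Python sketch that tallies leading digits in a list of amounts. It is not Nigrini’s system, whose details the article does not give. The rule in `worth_a_second_look` is a toy rule based only on his quote, and both lists of amounts are invented.

```python
from collections import Counter

def leading_digit(value):
    """Return the first nonzero digit of a number, or None if the number is zero."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

def leading_digit_shares(values):
    """Share of the values that start with each digit 1 through 9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

def worth_a_second_look(values):
    """Toy rule (an assumption, not Nigrini's method): True when 5 and 6
    together lead more often than 1 and 2 together."""
    shares = leading_digit_shares(values)
    return shares[5] + shares[6] > shares[1] + shares[2]

# Made-up amounts, for illustration only.
plausible_amounts = [1200, 1850, 230, 15, 3100, 110, 19, 27, 4300, 160]
odd_amounts = [5200, 610, 5900, 560, 6400, 51, 130, 67, 580, 6100]

print(worth_a_second_look(plausible_amounts))  # False
print(worth_a_second_look(odd_amounts))        # True
```

Ten numbers are far too few for a real test. The lists are kept short here only so the digits are easy to check by eye.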

Benford’s Law does have limitations. It works best when certain requirements are met. First, the law works better with sets of 500 or more numbers. Second, each of the digits 1 through 9 must be able to occur as a leading digit, so the numbers cannot be confined to a narrow range. For example, a list of the heights of high school football players could not follow Benford’s Law, because most high school football players are between 5 and 6 feet tall, and none are between 1 and 2 feet tall. As a result, 5 and 6 will appear more often than 1 or 2. Last, even when a set of numbers does not follow Benford’s Law, that is not proof that the set is fake. The law can, however, alert someone to check a list again for errors or for false information.
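The height example can be seen in a few lines of Python. The sketch below compares two lists: made-up heights between 5.0 and 6.9 feet, and the first 1,000 powers of 2, a sequence that spans hundreds of orders of magnitude and is well known to follow Benford’s Law. The names `first_digit` and `digit_shares` are illustrative.

```python
from collections import Counter

def first_digit(value):
    """Leading digit of a positive number that is at least 1."""
    return int(str(value)[0])

def digit_shares(values):
    """Share of the values that start with each digit 1 through 9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Invented heights: 5.0, 5.1, ..., 6.9 feet. Every value starts with 5 or 6.
heights_in_feet = [5 + i / 10 for i in range(20)]

# 2**0 through 2**999: the values range from 1 up to a 301-digit number.
powers_of_two = [2 ** n for n in range(1000)]

height_shares = digit_shares(heights_in_feet)
power_shares = digit_shares(powers_of_two)

print("Heights starting with 1:", height_shares[1])
print("Heights starting with 5 or 6:", height_shares[5] + height_shares[6])
print("Powers of 2 starting with 1:", power_shares[1])
print("Powers of 2 starting with 9:", power_shares[9])
```

The heights never start with 1, so they cannot match the law no matter how honest the list is. The powers of 2 start with 1 about 30 percent of the time.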

Protecting Social Media

Recently, researchers have been able to use Benford’s Law to find fake user accounts on social media sites like Facebook, Instagram, or Twitter. Fake social media accounts can have real consequences.

Jonathan Albright, a research director at Columbia University, said the Justice Department’s indictment shows how a useful social media site like Facebook can also be used as a weapon. “Facebook built incredibly effective tools which let Russia profile citizens here in the U.S. and figure out how to manipulate us,” Albright said. “Facebook, essentially, gave them everything they needed.” It is believed that as many as 150 million Americans encountered this Russian propaganda through social media sites like Facebook and Instagram.

If Benford’s Law can help people spot fake coin tosses and fraudulent tax returns, can it also help people detect fake social media accounts? University of Maryland professor Jen Golbeck has said the answer is yes.

This graph illustrates Benford’s Law. The law says that in a set of numbers, the digit 1 will appear as the first digit of a number about 30 percent of the time. Each of the digits 2 through 9 will then be a leading digit less often than the one before it. If a set of numbers does not follow Benford’s Law, it can be a sign that it is fake.

Friend Counts Offer a Clue

According to Golbeck, a list of real Facebook users should be expected to follow Benford’s Law. Recall that in a genuine set of numbers, you can expect 1 to be the leading digit about 30 percent of the time. Therefore, if you examined 1,000 Facebook users, you would expect about 300 of them, or 30 percent, to have a friend count with the leading digit 1. In other words, about 30 percent of Facebook users should have 1, 10-19, 100-199, or 1,000-1,999 people in their friend lists.

To determine whether a social media user is a real person, Golbeck counted not only the number of that user’s friends, but also the number of friends each of those friends had. When examining her final friends-of-friends lists, Golbeck found that the vast majority of accounts on Facebook, Twitter, and other social media sites followed Benford’s Law. On Twitter, however, Golbeck found 170 accounts that did not follow Benford’s Law. These accounts, she concluded, were the fake ones.
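A rough Python sketch of this kind of check follows. It is not Golbeck’s actual procedure or statistic, which the article does not describe. The deviation measure, the 0.05 cutoff, and both lists of friend counts are assumptions chosen only to illustrate the idea of comparing an account’s friends-of-friends counts with Benford’s Law.

```python
import math
from collections import Counter

# Expected share of each leading digit under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_deviation(friend_counts):
    """Average gap between the observed leading-digit shares and Benford's Law."""
    digits = Counter(int(str(c)[0]) for c in friend_counts if c > 0)
    total = sum(digits.values())
    if total == 0:
        raise ValueError("need at least one positive friend count")
    return sum(abs(digits[d] / total - BENFORD[d]) for d in range(1, 10)) / 9

# Hypothetical cutoff; a real study would have to justify its own.
SUSPICION_CUTOFF = 0.05

# Synthetic account A: its friends' friend counts are spread evenly
# (on a log scale) from 10 up to 10,000.
spread_out_counts = [int(10 ** (1 + 3 * (i + 0.5) / 300)) for i in range(300)]

# Synthetic account B: its friends' friend counts all sit between 500 and 799.
clustered_counts = [500 + i for i in range(300)]

for name, counts in [("A", spread_out_counts), ("B", clustered_counts)]:
    gap = benford_deviation(counts)
    verdict = "worth checking" if gap > SUSPICION_CUTOFF else "looks ordinary"
    print(f"Account {name}: deviation {gap:.3f}, {verdict}")
```

Account A lands close to Benford’s Law, while account B, whose friends all have similar friend counts, does not. As the article notes, a mismatch is a reason to look closer, not proof that an account is fake.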

In an online paper published in 2015, Golbeck drew the following conclusion: “Nearly every last one of the 170 accounts mentioned above appeared to be engaged in suspicious activity. Some accounts were spam, but most were part of a network of Russian bots [online robots] that posted random snippets of literary works or quotations, often pulled arbitrarily from the middle of a sentence. All the Russian accounts behaved the same way: following other accounts of their type, posting exactly one stock photo image, and using a different stock photo image as the profile picture.”

“While we are currently investigating the purpose of these bot accounts’ existence, their deviation from Benford’s Law made it quite easy to identify their highly unusual behavior,” she added. “Of the 170 accounts, only 2 seemed to belong to legitimate users.”

Fake accounts do not only affect politics. In September 2017, a group of researchers at the University of Iowa found more than 50 websites that offered users fake “likes” for their posts in exchange for access to their accounts. The users’ account information was then used to falsely “like” even more online posts. The researchers found that such sites were able to create as many as 100 million fake “likes” on Facebook between 2015 and 2016. If a post has more likes, Facebook’s algorithm makes it likely that the post will be seen by more people. As more people see it, the information posted can seem more legitimate or important, even if it isn’t true.

“When you become part of this network, you can say ‘Give me likes on this post,’ and as soon as you request it, you get thousands of likes on a specific post,” said Zubair Shafiq, a professor of computer science at the University of Iowa in Iowa City who identified these networks.

Now that you have learned about how likes can be faked, you may want to think twice about what looks like popular opinion on social media. All those likes may not be what they appear!