An Algorithm to Identify the Source of Viruses and Facebook Rumors

A team of EPFL scientists has developed an algorithm that can identify the source of an epidemic or information circulating within a network, a method that could also be used to help with criminal investigations.

Investigators are well aware of how difficult it is to trace an unlawful act to its source. The job was arguably easier with old, Mafia-style criminal organizations, as their hierarchical structures more or less resembled predictable family trees.

In the Internet age, however, the networks used by organized criminals have changed. Innumerable nodes and connections escalate the complexity of these networks, making it ever more difficult to root out the guilty party. EPFL researcher Pedro Pinto of the Audiovisual Communications Laboratory and his colleagues have developed an algorithm that could become a valuable ally for investigators, criminal or otherwise, as long as a network is involved. The team’s research was published August 10, 2012, in the journal Physical Review Letters.

Finding the source of a Facebook rumor

“Using our method, we can find the source of all kinds of things circulating in a network just by ‘listening’ to a limited number of members of that network,” explains Pinto. Suppose you come across a rumor about yourself that has spread on Facebook and been sent to 500 people — your friends, or even friends of your friends. How do you find the person who started the rumor? “By looking at the messages received by just 15-20 of your friends, and taking into account the time factor, our algorithm can trace the path of that information back and find the source,” Pinto adds. This method can also be used to identify the origin of a spam message or a computer virus using only a limited number of sensors within the network.
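
The idea of tracing a rumor back from a handful of observers can be illustrated with a toy sketch. It is not the estimator from the published paper, whose details are not given in this article. It assumes, purely for illustration, that a message crosses exactly one link per time unit, and the names `hop_distances` and `estimate_source`, like the tiny network itself, are hypothetical. Under that assumption, arrival time minus distance from the true source is the same at every observer, so the candidate with the smallest spread in those differences is the best guess.

```python
from collections import deque
from statistics import pvariance


def hop_distances(graph, start):
    """Breadth-first search: hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def estimate_source(graph, arrival_times):
    """Return the node whose distances best explain the observed arrival times.

    Assumption (not from the article): one hop takes one time unit, so
    arrival_time - distance equals the unknown start time at every observer
    when the candidate is the true source.
    """
    best_node, best_score = None, float("inf")
    for candidate in graph:
        dist = hop_distances(graph, candidate)
        if any(observer not in dist for observer in arrival_times):
            continue  # candidate cannot reach every observer
        offsets = [t - dist[obs] for obs, t in arrival_times.items()]
        score = pvariance(offsets)  # 0 means a perfectly consistent start time
        if score < best_score:
            best_node, best_score = candidate, score
    return best_node


# Hypothetical friendship network (undirected, as adjacency lists).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a", "f", "g"],
    "d": ["b", "h"],
    "e": ["b"],
    "f": ["c"],
    "g": ["c"],
    "h": ["d"],
}

# Only three of the eight members are "listened to"; the rumor left "b" at time 3.
arrival_times = {"h": 5, "e": 4, "f": 6}

print(estimate_source(graph, arrival_times))  # b
```

Real propagation delays are random rather than fixed, so a practical method has to weigh noisy timings instead of looking for an exact match. The sketch only shows why a few well-placed observers and their timestamps can be enough to single out one node.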

Tracing the propagation of an epidemic

Out in the real world, the algorithm can be employed to find the primary source of an infectious disease, such as cholera. “We tested our method with data on an epidemic in South Africa provided by EPFL professor Andrea Rinaldo’s Ecohydrology Laboratory,” says Pinto. “By modeling water networks, river networks, and human transport networks, we were able to find the spot where the first cases of infection appeared by monitoring only a small fraction of the villages.”

The method would also be useful in responding to terrorist attacks, such as the 1995 sarin gas attack in the Tokyo subway, in which poisonous gas released in the city’s subterranean tunnels killed 13 people and injured nearly 1,000 more. “Using this algorithm, it wouldn’t be necessary to equip every station with detectors. A sample would be sufficient to rapidly identify the origin of the attack, and action could be taken before it spreads too far,” says Pinto.

Identifying the brains behind a terrorist attack

Computer simulations of the telephone conversations that could have occurred during the terrorist attacks on September 11, 2001, were used to test Pinto’s system. “By reconstructing the message exchange inside the 9/11 terrorist network extracted from publicly released news, our system spit out the names of three potential suspects — one of whom was found to be the mastermind of the attacks, according to the official enquiry.”

The validity of this method has thus been proven a posteriori. But according to Pinto, it could also be used preventatively, for example to understand an outbreak before it gets out of control. “By carefully selecting points in the network to test, we could more rapidly detect the spread of an epidemic,” he points out. It could also be a valuable tool for advertisers who use viral marketing strategies, leveraging the Internet and social networks to reach customers. For example, this algorithm would allow them to identify the specific Internet blogs that are the most influential for their target audience and to understand how the articles on those blogs spread throughout the online community.