Data sifted from Facebook wiped after legal threats

Source: unknown; Author: 陈咒; Date: 2017-11-08 04:01:26

By Jim Giles (Image: Pete Warden)

Legal threats from Facebook have led to the destruction of a social science dataset about to be released to researchers.

Lawyers from the social networking site contacted Pete Warden, an entrepreneur based in Boulder, Colorado, in February after he announced plans to release data he had collected from the public profiles of 210 million Facebook users.

Warden says that Facebook threatened legal action if he did not delete the data. He duly destroyed all the records, saying he did not have the funds to contest a lawsuit.

Warden’s records included a “social graph”, a representation of all the friend connections between users in the dataset. This would have been a powerful research tool for social scientists and others interested in how people interact.

More than 50 researchers had requested copies of the dataset, says Warden, after he had blogged about making it available. He had already used the graph to show how the social connections of the 120 million US users his data covered were apparently concentrated in regional clusters. Some researchers wanted to combine Warden’s data with other sources, such as census records, to probe the link between factors such as income, mobility, employment and social connections.

Warden obtained the data by writing “crawler” software that harvested information from Facebook profile pages which could be viewed without logging in to the site. He gathered users’ names, locations, friends and interests, but planned to remove names and use other anonymisation methods to prevent specific profiles being linked to individuals.

In compiling his data without seeking permission, Warden had violated the site’s terms of service, said a Facebook spokesperson, adding: “Warden was extremely cooperative with Facebook from the moment we contacted him and he abandoned his plans.”

Researcher Ben Zhao at the University of California, Santa Barbara, compiled a dataset of 10 million Facebook profiles in 2008. He says that he notified Facebook beforehand and that his relations with the company have been amicable.

But Warden’s decision not to notify Facebook is not unprecedented. Many websites host a small text file called “robots.txt” which is read by crawler software, such as that used by search engines, to determine what parts of the site they can and can’t access (see New Scientist’s robots.txt file).

Rather than read the terms of service, researchers often refer to that file to determine if crawler software will work on that particular site. Facebook’s robots.txt file doesn’t prohibit the use of crawler software to sift through public profiles.

Joseph Bonneau at the University of Cambridge, UK, took that approach before using crawler software on Facebook to study online security. “I have never asked for nor received permission, and this is the case for the vast majority of researchers,” he says. “A lot of researchers have data from Facebook. The legal status of this data is not clear, but Facebook has much more legal resources than researchers do.”

In acting to stop Warden releasing his data, Facebook may have feared a user backlash over privacy, or that the data could be misused. A marketing company could have tried to de-anonymise it to send targeted spam,