Developer Shares Story of Being Threatened by Facebook for Crawling

Pete Warden, a former software engineer at Apple who is now working on his own start-up, posted an interesting story about how Facebook threatened to sue him for crawling the social network. I reached out to both Warden and Facebook for more details, but so far have only received a response from Facebook, which calls the incident a "violation of our terms."

But first, Warden's story. Read the whole thing in his words here for more context about what he wanted to do with the data. To make a long story short, he was building a tool to bring data from email and various social networks into one place to make it easier for users to manage their contacts, and he crawled Facebook. He says he checked Facebook's robots.txt, and that "they welcome the web crawlers that search engines use to gather their data," so he wrote his own. He was able to obtain data like which pages people were fans of and links to a few of their friends. He created a map showing how different countries, states and cities were connected to each other and released it so that others could use the information. Once Facebook caught wind of this, they threatened legal action. Warden writes:

Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission.

Obviously this isn't the way the web has worked for the last 16 years since robots.txt was introduced, but my lawyer advised me that it had never been tested in court, and the legal costs alone of being a test case would bankrupt me. With that in mind, I spent the next few weeks negotiating a final agreement with their attorney. They were quite accommodating on the details, such as allowing my blog post to remain up, and initially I was hopeful that they were interested in a supervised release of the data set with privacy safeguards. Unfortunately it became clear towards the end that they wanted the whole set destroyed.
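For readers unfamiliar with the robots.txt check Warden describes, here is a minimal sketch in Python of how a crawler might consult a site's rules before fetching a page. It uses the standard library's urllib.robotparser. The sample rules, the "ExampleBot" user agent and the example.com URLs are made up for illustration; they are not Facebook's actual robots.txt, and this is not Warden's code. As the dispute shows, passing this check says nothing about whether a site's terms of service permit crawling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
# This is NOT Facebook's actual file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/profile/123",
                "https://example.com/private/data"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url) else "disallowed"
        print(url, "->", verdict)
```

A real crawler would typically download the live file first (RobotFileParser also offers set_url() and read() for that) and re-check it periodically, since the rules can change.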

Facebook Public Policy Communications Manager Andrew Noyes tells WebProNews, "Pete Warden aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others. Warden was extremely cooperative with Facebook from the moment we contacted him and he abandoned his plans."

"We have, and will continue to, act to enforce our terms of service where appropriate," adds Noyes.

Noyes pointed to Facebook's Statement of Rights and Responsibilities, which states that "You will not collect users' content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission." That's under the safety section, by the way.

"I'm bummed that Facebook are taking a legal position that would cripple the web if it was adopted (how many people would Google need to hire to write letters to every single website they crawled?)," concludes Warden. "And a bit frustrated that people don't understand that the data I was planning to release is already in the hands of lots of commercial marketing firms, but mostly I'm just looking forward to leaving the massive distraction of a legal threat behind and getting on with building my startup."

Hearing some of what both parties have to say on the issue, what are your thoughts? Discuss here.

If we hear back from Warden or if Facebook offers us more insight into the situation, which I'm told may still happen, I'll update this article.