Google's Orkut Personal Information Offered Outside Orkut

After the Orkut social network service was shut down a few days following its launch, it was rumored that Google was working to upgrade features to prevent the site's data from being mined.

Certainly at least one company got a good chunk of data, as shown by the Orkut Personal Network Geomapper. It lets you look up anyone in the Orkut database at the time the information was mined, then see their connections.

For example, curious about Google cofounder Larry Page's network? Here are his personal connections, mapped pleasantly across the United States.

The service is definitely cool, helping you visualize your network of friends geographically. It's the type of feature you'd expect Google itself to offer. Perhaps those running Orkut will take note and offer something similar internally, using current data.

So far, I haven't found that any personal data is being revealed beyond the names of your connections. However, there's no doubt that personal data is part of the information that was mined, which is an embarrassment for Google.

Certainly, anyone within Orkut can find some of this data directly. But the amount of data Orkut provides to those outside someone's personal network can be limited. It's possible that no such limits are part of this database.

Bear in mind that the data is apparently fairly old. But the fact that Orkut was mined once leaves fears that it could be mined again, or worse.

In a last bit of irony, Google has been indexing pages from the Geomapper site -- nearly 700, at last count. In fact, the Geomapper site itself is currently ranked 10th on Google for a search on orkut.

Source Of Data

Rolan Yang, creator of the Geomapper site, said he gained the information from an acquaintance.

"Roughly two months ago, an associate passed me a file which contained information believed to be [the] Orkut web site. I do not know how he obtained the information, and I am not sure I want to know. My best guess would be that the information was scraped by some sort of search engine spider since it only contained data which is visible from one's public Orkut profile (username, location if given, friends, etc). Since subscribers are continually signing on in real-time, capturing an entire mirror of the website is not likely to be possible," he said, via email.

Yang added that the site wasn't originally intended for use outside a small number of people he knows:

"The Orkut Personal Network GeoMapper was written out of personal scientific curiosity and its use was meant for only a small circle of friends. Unfortunately someone leaked the URL to a 'blog' network after which all hell broke loose!," he wrote.

Google Blogoscoped is one such blog that passed along the URL back on April 29. Using Feedster, I can see other blogs have picked it up since at least April 25.

Google Reaction?

When this story was first posted, Yang said he'd not been contacted by Google about his site.

"Why have Google's lawyers not contacted me yet? I can't say for sure, but would you think that a lawsuit filed in protest of 'spidering and caching of website information' might just be in conflict of interest with their primary business?," he wrote.

(NOTE: About a week later, Google issued a cease-and-desist letter. See this article for more about that.)

Orkut's terms do have provisions against "using any robot, spider, site search/retrieval application, or other device to retrieve or index any portion of the orkut.com service" and a litany of other things that it might consider unauthorized use. But Yang says he didn't do the actual data acquisition, and if he's not an Orkut member, he wouldn't appear to be bound by its terms.

Geomapper User Reaction

Yang said he's received mostly positive comments since the site went live in the past few weeks, along with a few negative ones relating to privacy. His response was that these people may not realize they already agreed that such data might be reused, at least by Google itself, in other forms, as per the Orkut terms of service:

By submitting, posting or displaying any Materials on or through the orkut.com service, you automatically grant to us a worldwide, non-exclusive, sublicenseable, transferable, royalty-free, perpetual, irrevocable right to copy, distribute, create derivative works of, publicly perform and display such Materials. That said, our use of your personal information is governed by our Privacy Policy and we will never rent, sell or share your personal information with any third party for marketing purposes without your express permission.

That's not giving Yang the right, of course. But it does highlight that material some assumed might be kept only within Orkut could potentially be used by Google itself in other ways, assuming these also don't violate Orkut's privacy policy or its FAQ clarifying that it doesn't own the information submitted.

Yang also said that he's using the meta robots tag to prevent search engines from indexing the maps created on his service. As noted above, this hasn't stopped Google from indexing nearly 700 of these pages.

When I looked at what was indexed, such as this example, I did see the tag on the live page. But in Google's cached version, it is absent. It may be that the tags were added after the pages were originally crawled. If so, then the pages should get removed at Google and other search engines in the near future. The site itself, of course, will remain online.
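For readers who want to run the same live-versus-cached comparison themselves, here is a minimal sketch in Python. It is not how Yang or Google checks pages. The class and function names and the two sample pages are hypothetical, made up for illustration, and the sketch assumes the tag takes the common form of a meta element named "robots" with a comma-separated content value such as "noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # A valueless attribute arrives as None, so fall back to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives declared in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def blocks_indexing(html):
    """True if the page asks search engines not to index it."""
    directives = robots_directives(html)
    return "noindex" in directives or "none" in directives


# Hypothetical stand-ins for a live page and an older cached copy.
live_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
cached_page = "<html><head><title>Map</title></head></html>"

print(blocks_indexing(live_page))    # True
print(blocks_indexing(cached_page))  # False
```

A live page that blocks indexing alongside a cached copy that does not would fit the explanation that the tag was added after the crawl, though it can't prove it.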

NOTE: This story was originally published on May 6, updated with comments from Geomapper on May 7 and with Google's action on May 18.