A Revised Taxonomy of Social Networking Data
Lately I've been reading about user security and privacy -- control, really -- on social networking sites. The issues are hard and the solutions harder, but I'm seeing a lot of confusion in even forming the questions. Social networking sites deal with several different types of user data, and it's essential to separate them.
Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again -- revised -- at an OECD workshop on the role of Internet intermediaries in June.
* Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
* Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
* Entrusted data is what you post on other people's pages. It's basically the same stuff as disclosed data, but the difference is that you don't have control over the data once you post it -- another user does.
* Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it's basically the same stuff as disclosed data, but the difference is that you don't have control over it, and you didn't create it in the first place.
* Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
* Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you're likely gay yourself.
There are other ways to look at user data. Some of it you give to the social networking site in confidence, expecting the site to safeguard the data. Some of it you publish openly and others use it to find you. And some of it you share only within an enumerated circle of other users. At the receiving end, social networking sites can monetize all of it: generally by selling targeted advertising.
Different social networking sites give users different rights for each data type. Some are always private, some can be made private, and some are always public. Some can be edited or deleted -- I know one site that allows entrusted data to be edited or deleted within a 24-hour period -- and some cannot. Some can be viewed and some cannot.
It's also clear that users should have different rights with respect to each data type. We should be allowed to export, change, and delete disclosed data, even if the social networking sites don't want us to. It's less clear what rights we have for entrusted data -- and far less clear for incidental data. If you post pictures from a party with me in them, can I demand you remove those pictures -- or at least blur out my face? (Go look up the conviction of three Google executives in Italian court over a YouTube video.) And what about behavioral data? It's frequently a critical part of a social networking site's business model. We often don't mind if a site uses it to target advertisements, but are less sanguine when it sells data to third parties.
As we continue our conversations about what sorts of fundamental rights people have with respect to their data, and more countries contemplate regulation on social networking sites and user data, it will be important to keep this taxonomy in mind. The sorts of things that would be suitable for one type of data might be completely unworkable and inappropriate for another.

export

sort

Rianne Kaptein, Pavel Serdyukov, and Jaap Kamps. Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, page 839--840. New York, NY, USA, ACM, (2010)