Why crawling emails is a privacy problem

The people who keep proclaiming that privacy is dead tend to be the ones who gain the most from the death of privacy, Danah Boyd (from Microsoft Research) pointed out at the SXSW conference last month. Both Google and Microsoft are backing the Digital Due Process coalition, which is asking the US government to strengthen privacy laws so that documents stored in cloud services are as private as they would be on an American's hard drive (and to protect location-based information gathered by various services), meaning law enforcement agencies would need a warrant to access them rather than just a subpoena. (I talked about some of the implications of the current rules for document metadata last year.)

That didn't stop Tim O'Brien, Microsoft senior director of platform strategy, from pledging at the SaasCon conference that Microsoft wouldn't scan Hotmail messages to tailor ads to users, which is hard to read as anything other than a criticism of Gmail. Google defended the practice to InfoWorld by saying it's just like scanning for spam and no human ever reads the Gmail messages. Yes, but.

No one else gets to read your Gmail messages, but Frank Shaw, corporate VP for Microsoft corporate communications (translation: the man deciding what the Microsoft message should be), gave us an interesting example of why that doesn't mean they stay private. If you submit an article to a peer-reviewed journal, you can use Google AdWords to find out whether it's been accepted or rejected around a week before you get the official reply. (We're waiting for one of our academic acquaintances to have a paper up for approval so we can try this out.) You don't have to spend any money; just check whether it's more expensive to get an ad targeted to the title of your article plus the word 'accepted' or the title plus 'rejected'. The one that costs more reveals the fate of your submission.

How come? Gmail is so popular that the discussion of the approval is pretty much guaranteed to end up in a Gmail message. Once that message has been scraped for ad relevancy, AdWords knows that the keyword for the title of your article is more valuable with one of those two words, because someone has expressed an interest in that combination. It's an unintended consequence of the omnivorous machine learning that powers Google, and of the fact that it's really hard to anonymise unique identifiers completely.

Privacy isn't about secrets, Danah Boyd says; it's about control. Every system that gets to extract information from your data limits the amount of control you have. Free services walk a fine line between extracting enough value from their users to stay in business and taking away so much control that the users go elsewhere. Google is betting that the value of the ads you get in your email or next to your search results outweighs the rare cases when its ad mining system could reveal something that was meant to stay private. Is that a balance you're happy with?