Privacy, Anonymity, and Hockey

The Hamburg Data Protection Officer has released a statement declaring Google Analytics’ compliance with German privacy laws. Sadly, this comes with the caveat that users are going to have to delete their accounts and create new ones, so that they don’t retain any illegal data.

The German laws are a pretty good idea in concept: prevent companies from collecting personal data from users without their consent. We already know that Google does this (through search), so nervousness over Google’s overall behavior may be warranted. But is the same nervousness warranted over Google Analytics?

I have to wonder sometimes if people tend to forget the difference between privacy and anonymity, and how each affects our lives. Google Analytics does seem to infringe somewhat on our privacy, but at the same time it keeps us anonymous, and I think that, in this case, this is much more important.

How does GA work?

We know how GA collects data, and if you’ve read our RUGA series then you, too, know. It’s pretty basic:

You view a page and JavaScript executes

The execution of JavaScript assigns you a “session” number which defines a visit

Attached to this session number is data about:

what pages you viewed

how long you stayed on them

technical details (OS, screen resolution, etc.)

purchasing details

etc.

This data is then sorted by account and aggregated.

Aggregate numbers are reported.

At no time other than in initial data collection is any personally identifiable information collected, and even then there is nothing more than an IP address (which tells a company little more than your ISP and rough location). Even this, however, is stripped when data is aggregated.
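The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea (a random ID per visit, data attached to that ID, and aggregation that discards the ID and the IP address), not Google’s actual implementation. The names `Session`, `record_page_view`, and `aggregate` are hypothetical, invented for this sketch.

```python
import uuid
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Session:
    """One visit, keyed by a random ID rather than by anything about the person."""
    ip_address: str  # seen at collection time only (hypothetical field)
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    page_views: list = field(default_factory=list)


def record_page_view(session, path, seconds_on_page):
    """Attach one page view to the session it happened in."""
    session.page_views.append((path, seconds_on_page))


def aggregate(sessions):
    """Collapse sessions into per-page totals; IPs and session IDs are dropped."""
    views = Counter()
    seconds = Counter()
    for session in sessions:
        for path, secs in session.page_views:
            views[path] += 1
            seconds[path] += secs
    return {
        path: {"views": views[path], "avg_seconds": seconds[path] / views[path]}
        for path in views
    }


# The same visitor returning later gets a brand-new session ID.
first_visit = Session(ip_address="203.0.113.7")
second_visit = Session(ip_address="203.0.113.7")
record_page_view(first_visit, "/pricing", 30)
record_page_view(second_visit, "/pricing", 90)
record_page_view(second_visit, "/checkout", 45)
print(aggregate([first_visit, second_visit]))
```

The report that comes out of `aggregate` contains only page paths and numbers. Nothing in it ties the two visits together or points back to the address they came from.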

The truth is, most websites have no interest in knowing what you did. That’s uninteresting, statistically irrelevant data. They want to know what their users en masse did. This information is useful for modeling a great user experience, simplifying purchasing processes, and making sure that users who want to buy from you can.

You can, theoretically, collect personally identifiable information via a few different hacks, but this violates Google’s TOS and Google will shut down accounts for it. Further, the people who want user-specific information are not going to use GA. It simply doesn’t have the feature set for any decent per-user analysis.

Now, of course, some people still do this. I noticed a while back that Groupon was including users’ email addresses in the UTM data being reported when users clicked links from their emails: a real no-no. Looks like they got busted for that though, as recently they’ve replaced it with a large “user” string. However, this was a user error, not a GA fault. This was someone using the software to do something it wasn’t built to do.
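If you want to check your own campaign links for this kind of mistake, a rough sketch like the following would flag it. The function name `params_leaking_email` and the example URLs are made up for illustration (I don’t know which parameter Groupon actually used), and the pattern is a deliberately loose “looks like an email” test, not a full address validator.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Loose heuristic: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def params_leaking_email(url):
    """Return the names of query parameters whose value looks like an email address."""
    query = urlsplit(url).query
    # parse_qsl decodes percent-encoding, so "%40" is seen as "@".
    return [
        name
        for name, value in parse_qsl(query, keep_blank_values=True)
        if EMAIL_PATTERN.search(value)
    ]


# Hypothetical campaign links: one leaks an address, one uses an opaque token.
bad_link = "https://example.com/deal?utm_source=newsletter&utm_content=jane.doe%40example.com"
good_link = "https://example.com/deal?utm_source=newsletter&utm_content=u8f3a9c"
print(params_leaking_email(bad_link))
print(params_leaking_email(good_link))
```

Running a check like this over outgoing email templates before they ship is cheap, and it catches the problem before any address reaches a report.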

Privacy vs. Anonymity

The Panopticon

There’s a fairly well-known thought experiment used in privacy studies. Jeremy Bentham dreamed up a prison system which he referred to as “The Panopticon”: a method for using observation, or gaze, as a means of control. The concept was simple: make the inmates of an imaginary prison think that you’re watching them all the time. If they think this, then they will behave, whether or not someone is actually there.

This has since become the model for the ‘horror story’ surveillance state: everything you do is associated back to you, nothing you do is ever anonymous. It’s also the template for much of Britain’s current surveillance system.

Ironically, this model is often used when talking about privacy, but what it really is about is anonymity. Since each prisoner is easily identified, and their actions easily associated back onto them, none are anonymous. However, the system is surprisingly private: the inmates aren’t actually being watched consistently, so no one really knows what they’re doing most of the time.

The Panoptic Society

In comparison, here’s something that happened in Vancouver a while back.

On June 15th the Canucks (Vancouver’s hockey team) made it to the finals. As the crowd packed tighter and tighter downtown to see the game, it got pushy, people started getting squished, and some people started getting downright violent.

Pulled back, it’s not so bad. The mass of the crowd is visible, and general actions by the group can be seen. Zoom in a little more and you can see more detail. Zoom in a bunch and you can tell exactly what’s happening, identified by person.

If you look carefully enough at the full site, you can just see the edge of my face in the middle right. You can see that, by that point, I was trying to get out (a friend was going into panic attacks after being squished between people so hard that she was lifted off her feet). Each action happening while this is taken is recorded, each person identifiable. It’s not like Bentham’s model per se, since all data is recorded, and thus no action is private. None are anonymous, and none are private. Of course this was only one shot, but as the night progressed the presence of cameras and smartphones produced a myriad of photos and videos clearly identifying individuals. The result is more anonymous than Bentham’s thought experiment, but also far, far less private.

The Analytic Model

Google Analytics is the other end of the scale. To build a metaphor, let’s say we took a zoomed in view of the crowd.

Google Analytics doesn’t record personally identifiable information. Instead, it obfuscates it under a “session ID”. You might compare this to blurring everyone’s faces.

However, there is one even more important difference: since data is stored per session, you don’t even have all of your data stored under the same “mask” of a session ID. Each new session gives you a new mask. In our ‘crowd photo’ metaphor it would be as if you gained a new face in each photo.

Then since GA presents everything in aggregate you wouldn’t even get to look at it in this detail. What you would see would be:

And so all you really get is a broad idea of how people are behaving.

This is the opposite of the Bentham model: not private, but anonymous. Actions of the group can be seen, and people’s behaviors are observed, but their identities are obfuscated and their actions generalized as part of the crowd.

This is a far less oppressive idea than either of the previous models. Bentham’s model entails direct manipulation, and the panoptic society provides much opportunity for abuse, while the Google Analytics model provides little more than the ability for people to understand how groups behave. Privacy here is not the problem. Anonymity is.