Saturday, June 12, 2010

Rethinking analysis of Google's AP data capture

In "Rethink things in light of Google's Gstumbler report," Kim Cameron asks that we rethink our analysis of Google's wireless data capture in light of the third-party analysis of the gstumbler data capture software. In particular he seems to have a particular fondness for the phrase "wrong," "completely wrong," and "wishful thinking" when referring to my comments on the topic. In my defense, I will say that there was no "wishful thinking" going on in my mind. I was just examining the published information rather than jumping to conclusions -- something that I will always advocate. In this case, after examining the published report, it does appear that those who jumped to conclusions happened to be closer to the mark, but I still think they were wrong to jump to those conclusions until the actual facts had been published.

I read through the entire report and have to say that the information in the report is quite different than the information that had been published at the time I expressed my opinions on the events at hand. The differences include:

We had been led to believe that Google had only captured data on open wireless networks (networks that broadcast their SSIDs and/or were unencrypted). The analysis of the software shows that to be incorrect -- Google captured data on every network regardless of the state of openness. So no matter what the user did to try to protect their network, Google captured data that the underlying protocols required to be transmitted in the clear.

We had been led to believe that Google had only captured data from wireless access points (APs). Again the analysis shows that this was incorrect -- Google captured data on any device for which it was able to capture the wireless traffic for (AP or user device). So portable devices that were currently transmitting as the Street View vehicle passed would have their data captured.

One factor that is potentially in the user's favor is that the typical wireless configuration would encourage portable devices to transmit at just enough power for the AP to hear them (devices on wireless networks do not talk directly to each other). Depending upon the household configuration, it is possible (probable?) that a number of devices would not be transmitting strongly enough for them to be detected from a vehicle out in the middle of the street. However, if Google had a big honking antenna on the vehicle with lots of gain in the right frequencies, it could have detected every device within the house.

Given this new information I would have to agree that Google has clearly stepped into the arena of doing something that could be detrimental to the user's privacy.

That said, however, we need to be a little careful about the automatic assumption that the intent was to put all of this data into some global database. In fact, the way the data was captured -- the header of every data packet was captured, many of which would contain duplicate information -- makes it clear that Google intended to do some post-processing of the data. One could hope that they would use this post-processing step to restrict the data making it into any general, world-wide database. Of course, we don't know whether or not they would do this and even if they would, they still have that raw data capture which contains information that could clearly be used to the users detriment.

In addition, the fact that we know that Google did this, doesn't preclude the fact that others can be doing this (or have already done this) without publicizing that they have done so -- especially those who do intend to use this information for nefarious purposes.

We should take this incident as a wake-up call to start building privacy into the foundations of our programs and protocols.