Earlier this month, Google’s official engineering blog confessed that the company’s Street View cars and bikes have “inadvertently” gathered personal data in transit on unencrypted Wi-Fi networks for the past three years (see the post: Wi-Fi Data Collection). As chronicled in major news stories in the past three weeks, Google’s actions are under scrutiny by government regulators everywhere (see links to news stories at the end of this post).

This is a topic close to my heart because my research group has been conducting similar surveys of wireless signals for the past five years as part of a project funded by the US National Science Foundation. Here’s a picture of our own slightly less obtrusive Wi-Fi sampling car in South Central Los Angeles in 2005. (On second thought, we shouldn’t have chosen a black SUV. Too scary.)

Our project was research, not commerce, so thanks to something called the National Research Act of 1974, we needed ethics approval from a university panel of researchers and civilians before we could begin. We had to investigate, explain, and justify the privacy implications of our study to that panel, the university's "institutional review board," before we started doing anything.

My argument to the board went like this: We want to count the presence and absence of Wi-Fi networks, and we want to uniquely identify them so that we can tell where Wi-Fi exists and where it doesn’t. (This is the same thing Google wants to do. Other companies do it too, like Microsoft and Skyhook Wireless.) A main commercial motivation for this kind of project is to improve GPS accuracy (try it: http://loki.com/). Our research motivation was to understand the evolution and diffusion of computer networks.

This is akin to doing a survey of telephone adoption by counting telephone poles. We can do this research from public streets and sidewalks. We are looking at unencrypted information that is broadcast in the clear to everyone anyway (called the “management frame” — this information is what creates the list of available Wi-Fi access points that is on the upper right (Mac) or lower right (PC) of your laptop). We don’t look at the content of the transmissions.
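To make concrete what "broadcast in the clear" means, here is a minimal sketch (not our actual survey code; the frame bytes are hand-built for illustration) of how a sniffer pulls an SSID out of a raw 802.11 beacon, which is the management frame every access point transmits to advertise itself:

```python
def parse_beacon_ssid(frame):
    """Extract the SSID from a raw 802.11 beacon (management) frame.
    Returns None if the frame is not a beacon."""
    fc = frame[0]
    ftype = (fc >> 2) & 0b11       # frame type: 0 = management
    subtype = (fc >> 4) & 0b1111   # subtype: 8 = beacon
    if ftype != 0 or subtype != 8:
        return None
    # 24-byte MAC header, then a 12-byte fixed beacon body (timestamp,
    # beacon interval, capabilities), then tagged parameters.
    i = 24 + 12
    while i + 2 <= len(frame):
        tag, length = frame[i], frame[i + 1]
        if tag == 0:  # tag 0 is the SSID element
            return frame[i + 2 : i + 2 + length].decode(errors="replace")
        i += 2 + length
    return None

# A hand-built beacon frame advertising the SSID "home":
beacon = (
    bytes([0x80, 0x00])              # frame control: management / beacon
    + b"\x00\x00"                    # duration
    + b"\xff" * 6                    # addr1: broadcast destination
    + b"\x00\x11\x22\x33\x44\x55"    # addr2: transmitter
    + b"\x00\x11\x22\x33\x44\x55"    # addr3: BSSID
    + b"\x00\x00"                    # sequence control
    + b"\x00" * 8                    # timestamp
    + b"\x64\x00"                    # beacon interval
    + b"\x01\x04"                    # capability info
    + bytes([0x00, 0x04]) + b"home"  # tag 0 (SSID), length 4
)
print(parse_beacon_ssid(beacon))  # -> home
```

Everything this function reads is addressed to everyone by design; there is no user content anywhere in it.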

Although our equipment looks different from your laptop (and works faster and on more channels), our code does essentially the same thing that your laptop does when you open it in a new place. It listens to see if there is any Wi-Fi around. That’s it. To me, it didn’t seem like a difficult ethical case to make. Indeed we easily passed ethics review and our research was declared exempt from further review.

To give you an example of what we see, here is a screenshot from Kismet, a popular open source wireless sniffer. (We use a slightly modified version.)

[Kismet screenshot.]

Google was trying to do the same thing that my wireless research group was doing — again, no ethical problems there. However, they claim to have “inadvertently” also listened to the content of communications. (This is called “payload” data.) Here’s the problem with the story we’re getting from Google: the word “inadvertently.”

I see no way that this could be inadvertent. Continuing my earlier metaphor: If your plan is to count telephone poles how would you “inadvertently” tap telephone lines and transcribe everything that you hear? The two actions are quite different. Of course I don’t know how Google wrote its software for these capture platforms. With our team we use slightly modified versions of open source wireless tools. It is possible to use tools like these to save the “payload” data from wireless systems. There are legitimate engineering reasons for doing so if you are trying to improve the performance of your network. That’s why these tools exist.

However, I don't understand how we could ever have "inadvertently" done that. It isn't like stumbling over a banana peel or forgetting to turn off a light switch (or a debug variable). Even sampling some of the payload data would produce roughly ten times as much data as listening only to the management frames. Do you ever go to the store for a can of soda and "inadvertently" fill your cart with ten cans? I didn't think so.

If you inadvertently started buying 10x as many groceries as you wanted, I bet you’d notice. I bet it would take you less than three years to notice, too.

The only interpretation I can think of is that the word “inadvertently” is being applied by the legal department. The real chain of events is probably that the coders at Google intentionally designed these systems to act in this (illegal) way, but they didn’t understand the legal and PR implications. Programmers may have set it up on their own initiative and not briefed anyone else who could have seen this disaster looming, but that isn’t “inadvertent.” And it isn’t a “programming error” — another phrase that is being used in the press.

From here Google looks pretty guilty. Now in the most recent news reports it looks like they are trying to destroy the data as quickly as possible as a way out of the scandal. But looking at the data would make it even clearer that its collection wasn’t accidental. So in this case destroying the private data may not be a way to protect the privacy of those they snooped on — instead it seems like a way to protect Google’s nontraditional use of this word: “inadvertently.”

12 Responses to “Confessions of a Spy Car Driver”

The problem is that I can think of many reasons they would wish to capture data from private networks, some legitimate, some not so legit.

But I don't think Google intended to use that data for any of those reasons; I think they just cast a wide net to capture as much data as possible for later analysis. The people doing the collections are just drivers; they don't have the capacity to analyze the data.

I think they “inadvertently” captured the extra data on open networks, not for nefarious reasons. From the reports it seems that the data is fragmented, gleaned from drive-by, broad spectrum capture. Useless for most purposes. Then when someone with a brain saw the PR implications, they figured the right thing to do was destroy all the data that could hold personal information.

Did you log the output of kismet? If you did, you may well have “inadvertently gathered personal data” if the names of the networks reflected information about the owner (as, for example, several of the networks on my street do). It’s quite possible to interpret Google’s confession as over-caution by lawyers rather than an admission of extreme guilt…

Michael, thanks for the comment. I think we are agreeing, right? My point was that the engineers involved wouldn't "inadvertently" capture the payloads. I just don't see it. And I don't see how it could be a "programming error." They are trying to make it seem like an engineering mistake, when in fact it was probably engineers INTENTIONALLY capturing the data, and not realizing the implications of doing so. So "inadvertently" seems like quite a dodge!

That’s a good one — I love funny SSIDs (network names). And many of them are personal. (The most common SSID that is not a default setting is “home” but many people use their own last name.) My favorite SSID of all time:

SSID: keepondriving

My research group indeed found lots of streets where people used the SSIDs to have joking conversations with each other. Example SSIDs on one street:

SSID: JohnIsGay
SSID: NoDaveIsSuperGay

Here’s another one:

SSID: FuckOff
SSID: AreYouHavingABadDayJenna

But as a *public* identifier (that’s what the ID stands for) which is broadcast in the clear in order to *advertise* your access point, there’s no way to consider the SSID a secret. You can put personal information in the SSID if you want to, but if you do, you know you’re advertising it. Or you should know.

The wiretap act in the US, which Google has violated if it kept the payloads, makes payloads off-limits because looking at them involves intercepting an electronic communication when you aren’t “an addressee or [the] intended recipient” [quote from 18 USC 2511 (3)(a)]. I am the intended recipient of the management frame, and so is Google. And so are you.

I'm not splitting hairs; this is really clear-cut. That's the point of the SSID and the management frame: to advertise the signal and its settings. If we got mad at Google for collecting SSIDs with their spy cars, that would be like being mad at them for reading billboards on the highway.

But the *payload* is different, even if you don’t bother to encrypt it. It is never intended for the Google spy car.

And, my big point with the post: I'm never going to "inadvertently" capture either the SSID or the payload. If I mean to get it, I'll get it. If I don't, I won't.


I agree with your assessment of this case, Christian. You do not inadvertently shop for things you do not need, and a company does not inadvertently spy on data. Corporate behaviour is rational, well-calculated, and intentional, although this rationality frequently or always has irrational effects and, as Horkheimer and Adorno knew, in extreme forms turns into barbaric fascist irrationality. To spy on user data is well-calculated; it can only be for economic purposes. The effects of this and other highly rational behaviour by Google can in the end have irrational consequences, turning into a full-scale corporate dictatorship in which all private data is expropriated and owned by companies.