While the rest of us have generally been enjoying the sunshine and warm weather for the past few weeks, there has been a permanent cloud over Mountain View, as the storm over Google's capturing of Wi-Fi content with its Street View cars has developed.
That storm now threatens significant reputational damage to Google, not least …

COMMENTS

Hmm...

'Surreptitious' - my android phone has made it quite clear that it can geo-locate using local wifi-networks. The idea that this was a clandestine operation is nonsense.

'with this geo-validation of IP addresses it could now give you an ad for a clothing store two streets away'

except with dynamic ips - which most are - this information is out of date quickly.

'If you or I were to wander around London recording the contents of communications from politicians, retailers and the general public, it is fair to assume we would be arrested and prosecuted in short order.'

Why? Under what law is it illegal to record public conversations? Am I legally obliged to forget everything I have overheard?

Re: Hmm

"'Surreptitious' - my android phone has made it quite clear that it can geo-locate using local wifi-networks."

How long for, though? People change ISPs and replace routers for other reasons all the time, constantly degrading the accuracy. And after they're all destroyed by the 2013 solar flare, it won't work at all. Or does Google plan to remap the whole world regularly?

except

that most of your activities on google are attached to a unique identifier. You could (and will) change ip, but once it knows that demostennese@gmail.com is attached to that ip, which at that time was given to a machine at 10 downing street, you will never be unknown again. Yes, you could wipe your pc, go to a different location and get a new identifier, but one slip-up and it is connected to you again.

Everyone's missing the point...

I'm amazed that nobody's caught on to the real story here. Yes, IP addresses can be dynamic. MAC addresses, on the other hand, are globally unique, and in the case of mobile phones or laptops likely have a one-to-one relationship with their owner. Your MAC address can identify you.

Some key points about Wi-Fi:

1) Wi-Fi packets are essentially split into header and body.

2) The header contains the MAC addresses of the source and destination hardware

3) Even if you encrypt your wireless connection, only the body is encrypted. The MAC addresses are unencrypted
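The three points above can be sketched in a few lines of Python. This is a simplified illustration, not a capture tool: it assumes a basic 24-byte 802.11 data-frame header (no QoS field, no fourth address), and the function names and the sample frame are made up for the example. It shows that the address fields sit in front of the body and can be read whether or not the "protected" bit says the body is encrypted.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_header(frame: bytes) -> dict:
    """Split a simplified 802.11 data frame into header fields and body.

    Assumes the basic 24-byte header layout: frame control (2), duration (2),
    three 6-byte addresses, sequence control (2).
    """
    if len(frame) < 24:
        raise ValueError("frame shorter than the 24-byte basic header")
    frame_control, _duration = struct.unpack_from("<HH", frame, 0)
    return {
        # Bit 14 of frame control is the Protected Frame flag (body is encrypted).
        "protected": bool(frame_control & 0x4000),
        "receiver": format_mac(frame[4:10]),      # address 1
        "transmitter": format_mac(frame[10:16]),  # address 2
        "address3": format_mac(frame[16:22]),
        "body": frame[24:],                       # the only part encryption covers
    }

# Made-up sample frame: data frame, to-DS, protected bit set.
sample = (
    bytes([0x08, 0x41])                 # frame control
    + bytes([0x00, 0x00])               # duration
    + bytes.fromhex("001122334455")     # address 1 (access point)
    + bytes.fromhex("66778899aabb")     # address 2 (sending device)
    + bytes.fromhex("001122334455")     # address 3
    + bytes([0x00, 0x00])               # sequence control
    + b"\x9f\x3c\x17"                   # stand-in for an encrypted body
)

info = parse_header(sample)
print(info["transmitter"], "protected body:", info["protected"])
```

The addresses come out in the clear even though the body is flagged as encrypted, which is the point being made in (3).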

Some key points from the report that Google itself released about its mapping software:

1) Google stored the MAC addresses of ALL DEVICES it found.

2) This includes your wireless routers, and any devices connected to it at the time

3) These addresses were captured regardless of whether you encrypted your wireless connection

4) Google's software listened to ALL wireless traffic it heard, irrespective of whether the SSID was broadcast or not

This means Google has a record of all the MAC addresses of every network it drove past, irrespective of whether you broadcast your wireless SSID and irrespective of whether you chose to encrypt your connection. Google can use your MAC address to identify, within metres, your home address if you were connected to your wireless network when it drove past.

This is the real story. Google claims collecting unencrypted data was a mistake; let's give them the benefit of the doubt. They fully accept that they intentionally set out to collect everyone's network identifier, and nobody seems to care or notice because they're all worried about the fragmented and probably useless payload data.

Microsoft's Kim Cameron has an excellent blog which has been picking apart, step by step, Google's actions and their ramifications: http://www.identityblog.com/

Re: So what?

"You do understand that your MAC address never makes it beyond your router? When you use your web browser to connect to a Web Server, that Web Server doesn't know your MAC address?

In other words - so what if Google connected your MAC address?"

I do understand that. MAC addresses have typically been useless, as they're local identifiers. But they are globally unique, and once you've got a database of a majority of them linked to geolocation, a whole swathe of previously impossible ideas are now feasible.
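As a rough illustration of what "globally unique" means here: a MAC address is six octets, the first three of which are normally a manufacturer prefix (the OUI), and one bit of the first octet marks whether the address was assigned by the manufacturer or set locally in software. The sketch below only decodes those fields; the function name and the sample addresses are invented for the example, and uniqueness is the intent of the scheme rather than something this code can verify.

```python
def describe_mac(mac: str) -> dict:
    """Decode the structural fields of a colon-separated MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected six colon-separated octets")
    return {
        # First three octets: the manufacturer prefix (OUI) for burned-in addresses.
        "oui": ":".join(f"{b:02x}" for b in octets[:3]),
        # Bit 0x02 of the first octet: set when the address is locally administered.
        "locally_administered": bool(octets[0] & 0x02),
        # Bit 0x01 of the first octet: set for group (multicast) addresses.
        "multicast": bool(octets[0] & 0x01),
    }

burned_in = describe_mac("00:1A:2B:3C:4D:5E")   # made-up example address
software_set = describe_mac("02:00:00:00:00:01")

print(burned_in)
print(software_set)
```

An address with the locally administered bit clear is the kind that stays with the hardware for its lifetime, which is why it behaves like a long-lived identifier once it is tied to a location.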

Now first of all, this isn't specific to Google. I'm sure Google would 'do no evil'(!), but we have to accept that if we allow one company to do this, then others will follow. And they may not have such noble intentions.

If you have a database of geolocated MAC addresses, you could make that publicly accessible. I could then attend an international conference at a hotel, use the public Wi-Fi and scan the MAC addresses of everyone else using the hotspot, and use the database to find their addresses whilst making the fairly-safe assumption that they're not at home.

While I'm reluctant to use the 'think of the children!' argument, it's an easy one to highlight the issue. If a kid in a park has a mobile phone with WiFi switched on, I can get its MAC address and get their address from the database, which immediately gives me a way to gain their trust.

Those are two very simple examples to highlight a point. It doesn't take much to realise that being able to work out the home address of any WiFi enabled device (read: user) is a serious privacy issue. Nobody's really talked about the effects it could have, because it's never been possible in the past. I'm not saying it's wrong and we should never do it, but the implications need to be seriously thought through and regulated.

Sound and fair comment

I'll start with the 'I'm no lawyer but...' disclaimer; one feature of a Corporation, I think especially in US law, is that it is a means to make an organisation the embodiment of a person in terms of rights and responsibilities - if you as an individual could reasonably expect to be banged up as a result of ripping data out of the airwaves then any corporation should be just as liable.

Well now...

"If you as an individual could reasonably expect to be banged up as a result of ripping data out of the airwaves then any corporation should be just as liable."

That's the funniest thing I have read in a very long time. I thought the past few decades made these matters perfectly clear to everyone: corporations have more rights and freedoms than do ordinary citizens. Sad, but proven repeatedly to be true...

Sound and Fair?

I fail to see why a person or corporation would be liable for any legal proceeding based on collecting publicly available data. If you don't want other people to know your MAC address, you probably shouldn't use devices that broadcast it to the entire world outside your house.

It is not unlike using a walkie talkie and then complaining that other people can surf to the same channel you are using and listen to your conversation. If you don't like it, use something more secure.

What people will certainly argue is that there are no options which allow them to encrypt the header which Google was reading from. To that I say, GREAT! You have just identified a huge security flaw and a customer need. Now go start a router company that encrypts all the personal data and blow Linksys/DLink/Etc out of the water.

The joy of American business.

It's awesome! Consolidation in any given sector into a small number of large companies with nearly unlimited resources means that you not only require huge start up capital to enter any established market, but that you can't be too innovative, or you'll get squished.

The joy of American business is that you can prevent competition and raise the barriers to entry and there are no regulators with enough of a spine to prevent it!

Eh?

"It is absurd to suggest that the development team would then create software outside the boundaries of those specifications."

If you really believe that, then you are truly uninformed.

I've been involved in commercial software development for over 30 years and it wouldn't surprise me if every project that I have been involved with had software written outside the boundaries of the specification. One of the main measures of the effectiveness of software development processes has to be the measure of its ability to prevent developers from following their own lead and going right out of scope; I don't think that there is a process that could be 100% effective at this.

Unfortunately, that comment put me off continuing to read what appeared to be quite a sensible article.

Sir

You might want to read the rest, then..

Be serious. In your "30 years", how often have you come across one of those "out of spec" areas that would lead directly to criminal charges?

Make no mistake, what Google did here is the privilege of the police and (almost) secret services. If you allow a company to get away with breaking the law with gay abandon, well, look at what the economy presently looks like to see what effect it can have.

Google committed a crime. "Oops" is not going to undo that, and although I would agree with you that some things may have moved out of spec, you seem to ignore in this comment that this work would have required an awful LOT to have moved outside spec - front end as well as back end. That in itself requires collusion of those managing development, and I'm sorry - with my experience (also in the 30 year bracket) I didn't buy that "rogue code" explanation for a single second.

In addition, if this was off spec I'd like an explanation for why Google then attempted to patent the very act. This was either arrogance or the most stupid timing ever (or possibly both, come to think of it). Whatever it was, an "accident" it was not, and the fact they have been trying to sell it as such tells me they knew damn well what they were doing, and validated my original cynicism.

In short, I think the "do no evil" paint has by now truly washed off. About time too.

@mccp

"If you really believe that, then you are truly uninformed ... it wouldn't surprise me if every project that I have been involved with had software written outside the boundaries of the specification."

If you really believe that, it wouldn't surprise me if you frequently lose contracts. Clearly you are incompetent. Can you please supply your real name so that I can ensure I never employ you? Project deadlines are tight and we don't have the time or money for sloppy cowboys like you to waste.

@AC 11.41

Well bully for you. I'm with the original poster. Software gets written, libraries get written, libraries get reused. These libraries have more code in them than is needed for the particular requirements. Time doesn't permit fixing libraries to remove code (and, after all, the code works, so why fix it?).

So, there you go - library reuse, which people have been trying to get others to do for years to reduce dev. time, leads directly to this problem.

I'm not saying Google are not wrong, but there is a perfectly reasonable sequence of events that leads to this being an accident (more reasonable in fact than the doing it on purpose explanation). You can even apply this to the testing regime. They will have been testing for the results they need i.e. SSID and MAC addresses. So they use the library which provides the ability to extract this data. And it works. So now they have code to capture the data they need, and have tested it and have got out the data they need. What they haven't done is realise that there is other data being recorded, because they reused a library, and they only tested the BIT THEY WERE USING.
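To make that sequence concrete, here is a minimal sketch of the scenario. Everything in it is hypothetical: the `WifiScanner` class, its `raw_log` attribute and the sample frame are invented for illustration and say nothing about what Google's actual library looked like. It only shows how a test of the wanted output can pass while a reused library quietly keeps more than was asked for.

```python
class WifiScanner:
    """Hypothetical reused library: returns what callers ask for,
    but also keeps every raw frame it is given."""

    def __init__(self):
        self.raw_log = []  # side effect the reusing team never asked for

    def scan(self, frames):
        results = []
        for frame in frames:
            self.raw_log.append(frame)  # stores everything, payload included
            results.append((frame["ssid"], frame["mac"]))
        return results


frames = [
    {"ssid": "HomeNet", "mac": "00:11:22:33:44:55", "payload": b"private bytes"},
]
scanner = WifiScanner()

# The reusing team's test: it checks only the bit they were using, and it passes.
assert scanner.scan(frames) == [("HomeNet", "00:11:22:33:44:55")]

# The check nobody wrote: is anything else being retained?
print(len(scanner.raw_log), "raw frame(s) retained, payload included")
```

A test that asserts on the return value tells you nothing about the side effect; you would only catch it by testing for what the code should NOT be storing.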

AC 2 AC

@AC2

Fortunately, I don't need you to hire me (and I'm far too senior to need a job offer from some jumped-up developer).

It's unfortunate that there are so many developers out there who seem to think this kind of behaviour is acceptable. That's why 90% of developers give the other 10% a bad name. I'm guessing that you are in that 90%, or just possibly a project manager who thinks that he's one of the boys, instead of their manager.

<smugness>I have a very well-paid and secure job because I'm actually able to deliver projects free of the cowboy practices which you and your fellow incompetents clearly hold so dear.</smugness>

as a software developer...

I agree!

for starters, as a software developer, I know one thing about data collection.....log everything in its raw form. This means if you decide later you actually wanted something else, then you don't have to go recollect your data. You may not actually use all the data, and the data may not be in a state you can easily mine, but you still log every bit of information you get. That's data collection 101 :-)

Someone probably forgot to tell the software devs they were dealing with sensitive information and they simply thought they were being clever and proactive by collecting everything.... and generally, that kind of decision would be a developer decision.

Judging by this article...

The guy who wrote it has his head up his arse.

"We were immediately unconvinced that this activity could have been carried out accidentally and having been involved in large technology projects for the better part of fifteen years, it seemed untenable to me that this “rogue code” could have found its way into the project and been deployed without anyone knowing it was there." This paragraph is absolute bollocks. I've been involved with projects for more than 15 years, and I can think of a number of ways this code would be there by accident. Pity no-one has written an article that points this out, but I guess there wouldn't be any money in that.

No accident.

Software Development

You may want to revisit your section on software development. What you describe is a view in principle of a very rigid development process, as might be designed by an auditor. It bears little to no relation to how things work in the real world.

Let me point out a few of your biggest errors:

"It is absurd to suggest that the development team would then create software outside the boundaries of those specifications."

No, it's not. The general consensus seems to be that developers shine when given opportunities to push boundaries. Now in shops that develop software for public consumption, that (sometimes) takes second place to producing a quality product, but for internal development for companies that are pushing to be on the cutting edge, they'll allow their developers a much longer leash.

"Any data which could not be explained by those technical specifications would raise alarms and be investigated. That is the whole point of testing software before it is deployed - to ensure that it is doing what it was designed to do and that it is stable."

The first does not follow from the second. The point of testing software is to ensure that it does what it was designed to do, and that it is stable. But very rarely does testing reach to proving that the software does NOTHING BUT what it was designed to do, which is the gist of your first sentence.

"But in the interests of objectivity, even if we accept that this code was not noticed during the testing stage (which really is stretching the realms of possibility), once a project has been deployed testing continues on live data. This is important because once a project is deployed in the real world it often behaves differently to how it behaves in a lab environment. Resource efficiency needs to be checked, external factors need to be controlled or at least mitigated and data has to be accurate. This means that even if all the above stages failed to notice the data being generated by this code, once in a live environment it would be impossible to miss."

This is the one which proved to me that you don't know the real world of software development AT ALL. Deployed projects do have testing, but that is usually reduced to minimal amounts to avoid causing performance problems. Resource efficiency monitoring and error logging would be about it. Given the likely relative sizes of the different types of data being collected, most compression effort and troubleshooting of space usage would likely be focused on the photographic component.

Your entire piece also entirely ignores one standard development practice which goes a long way to explain how code ends up in projects without the managers ever knowing it's there. And it's this practice that Google themselves claim caused this issue: the use of external libraries. Google's story is that the Wi-Fi library they used in the StreetView project was developed in their labs as an experimental project, and was included by the StreetView development team because it did what they needed, and they were either unaware or unconcerned that it collected more data than they needed. I'm not saying that I buy this story, but the fact that you don't even mention it puts a huge question mark on your understanding of this issue.

Finally, there is the issue of patents:

"Then on June 3rd 2010 as a result of ongoing class action suits in the US it emerged that Google had filed a patent application for similar technology in 2008, this reinforced our opinion that this could not have been rogue code. In order for a patent application to be filed, it seemed obvious to us that Google's legal department would have had to review the technology and submit the application. This also would suggest that the project had been funded which in itself would require the attention of managers, designers, developers and testers."

Software development companies try to patent EVERYTHING THEY DO -- even experimental stuff that they have no intention of actually using. They do it because they know every other software development company is trying to patent everything THEY do, and patent portfolios are used both offensively and defensively in this business. So the fact that Google applied for a patent means only that they developed the software, not that they ever intended to use it.

Your determination that Google did this deliberately is based on some very flawed (some might say naive) views of software development. There also seems to be some indication of bias -- you seem to be avoiding any points that lessen your case that it was deliberate.

I don't agree with what Google did, and I don't know whether they did it deliberately or not. But I know that you have not done the analysis necessary to determine whether or not they did it deliberately.

ace post

Just like to add that this

> those four core stages of design, development, testing and deployment

but for the rest of the article I would have taken as a kind of dilbert-esque irony. Stage 1 is ... I don't know how to put it. I can't believe Mr. Hanff has done much work in IT. Stage 3 is a perfunctory add-on, always.

BTW in a poorly designed and poorly managed database (most of them), junk accretes. The client usually finds out only when they run out of space.

Mens rea

@Steven Knox : You wrote :

"""

"Then on June 3rd 2010 as a result of ongoing class action suits in the US it emerged that Google had filed a patent application for similar technology in 2008, this reinforced our opinion that this could not have been rogue code. In order for a patent application to be filed, it seemed obvious to us that Google's legal department would have had to review the technology and submit the application. This also would suggest that the project had been funded which in itself would require the attention of managers, designers, developers and testers."

Software development companies try to patent EVERYTHING THEY DO -- even experimental stuff that they have no intention of actually using. They do it because they know every other software development company is trying to patent everything THEY do, and patent portfolios are used both offensively and defensively in this business. So the fact that Google applied for a patent means only that they developed the software, not that they ever intended to use it.

The author is correct to point out that the patent is the smoking gun.

Regardless of the reasons a company patents a technology, the fact that they did file the patent establishes mens rea (guilty mind), meaning that they knew about the technology and that their legal dept as well as product managers had to know that the code exists. Since one can show that Google could find value in having such code (legal or otherwise) and that they understood its potential by filing the patent, they should also have known that it violated multiple countries' laws.

There is enough evidence to suggest that Google acted in an illegal and reckless manner.

Can you say circumstantial evidence?

Look at it this way... You're holding a smoking gun and are standing over a dead body. While forensics evidence can't show that you fired the gun, and that there were no witnesses as to what happened... regardless of what you say, your prints are on the murder weapon and you are at the scene of the crime at the time of death. If the prosecution can show mens rea, that you had a motive to kill the victim... you will end up being charged.

@Ian Michael Gumby: yerrmebessmate

If they are moot then they in turn enmooten any point Mr. Hanff made about software development at google. This seemed pretty key to his argument though.

> the fact that they did file the patent establishes mens rea

As I see it, it establishes that they filed a patent, no more. If patenting it somehow did establish guilt, then the patenting of it was the incriminating act, not particularly the use which followed from it. That seems paradoxical, especially as you say ...

> There is enough evidence to suggest that Google acted in an illegal and reckless manner.

... which is evidence (I accept) but not proof. There may be a subtle point I'm missing here.

> Look at it this way... You're holding a smoking gun and are standing over a dead body. While forensics evidence can't show that you fired the gun, and that there were no witnesses as to what happened... regardless of what you say, your prints are on the murder weapon [...]

Murder weapon -> it was a weapon of murder -> a deliberate act of unlawful killing as determined by a court -> you are guilty of murder as decided by a court. No? Yes? So why are you using this example as circumstantial evidence? You are presuming the desired conclusion. Seems like sloppy argument.

Not knocking the work of Mr. Hanff in general BTW.

I wonder how rational this is going to be in the morning. Where's the whiskey icon anyway?

Well....

I'd still say that whether or not professional project development processes work in the real world or not is no excuse for breaching many laws in many countries. Bearing in mind the potential fallout from this, is it really good business sense, either on paper or in the real world, to give your devs a sufficiently long leash with which to hang their employers?

And are you so naive to think that Google did do this purely by accident? Surely if it was an unnecessary and experimental feature it would've been culled to preserve the bandwidth required to fire all the StreetView data back to HQ?

To say that the author's interpretation of project development somehow detracts from the fact that Google would have known what they were collecting and should be penalised merely demonstrates how little you know of, or care about, the privacy implications both immediate and for the future.

@Adam Salisbury

If I may continue to defend Mr Knox

> I'd still say that whether or not professional project development processes work in the real world or not is no excuse for breaching many laws in many countries

That's the point. They *may be* that adequate excuse. It's about intent. Not denying google should pay a price for this, but the difference between murder & manslaughter is significant and reflected in the sentences, surely (but IANAL).

> [...] really good business sense [...?]

No, but business decisions are sometimes totally irrational. No, I can't explain it, but it is so. You'd hope larger corps would have more sense but IME it's not assured.

> And are you so naive to think that Google did do this purely by accident?

Gah, stop presuming guilt! Myself & Steve Knox are not defending google, just saying the case is not yet made.

> Surely if it was an unnecessary and experimental feature it would've been culled to preserve the bandwidth required to fire all the StreetView data back to HQ?

Steve Knox addressed this point already: "Given the likely relative sizes of the different types of data being collected, most compression effort and troubleshooting of space usage would likely be focused on the photographic component." It's a strong point too. Also see my point about DBs growing.

Since when was ignorance of the law an excuse...

"Your determination that Google did this deliberately is based on some very flawed (some might say naive) views of software development. There also seems to be some indication of bias -- you seem to be avoiding any points that lessen your case that it was deliberate."

I always thought "ignorance of the law" was never an excuse to break the law....

@Ian Michael Gumby

I hope making analogies is not part of your employment; you just failed miserably at it.

"Look at it this way... You're holding a smoking gun and are standing over a dead body. While forensics evidence can't show that you fired the gun, and that there were no witnesses as to what happened... regardless of what you say, your prints are on the murder weapon and you are at the scene of the crime at the time of death. If the prosecution can show mens rea, that you had a motive to kill the victim... you will end up being charged. Does that make sense?"

No one is contesting whether Google collected the data or not, they are contesting whether it happened knowingly or not. Let's try to reshape your analogy a little, eh?

You're holding a smoking gun and standing over a dead body... at a gun range, where the person was running around behind the target area. Did you shoot them on purpose? There is no way to tell if there was no witness.

If Google did in fact try to file a patent for this technology, and was turned down or advised against it due to the possible illegality of it, that clearly shows they had NO motive to do it on purpose. Google did not get as large and powerful as it did by being stupid. If you try to patent something and get told no, you don't go out and do it. You'll obviously get caught.

The circumstantial evidence points to it being an accident, unless you are presuming supreme incompetence and stupidity from one of the brightest and most intelligent companies in the world.

Re: IP Addresses

The title is required, and must contain letters and/or digits

At date/time X, IP address Y, which probably did some sort of google search during the period of capturing, hence is associated with permanent cookie Z.

So google now knows cookie Z came from IP address Y at this street address.

Now your dynamic IP kicks in and your IP changes. But guess what? Your google cookie is still Z, so now you have IP Address A, but still cookie Z, and since they know the address of Y that also used cookie Z, then the overwhelming likelihood is that IP address A is also from the same physical address as Y. And again when the IP Address changes to B, C etc, you still have cookie Z as the Primary Key across all the IP changes. This is assuming you use google searches, which the majority of people do, and that you don't block cookies, which the majority of people don't do.
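The join described above is easy to sketch. This is only an illustration of the reasoning, with made-up data and a made-up function name; it assumes a list of (IP, street address) pairs from the drive-by capture and a list of (IP, cookie) pairs seen later by the search service, and it says nothing about what any real system does.

```python
def link_ips_to_location(sightings, requests):
    """Infer a street address for every IP that shares a cookie with a sighted IP.

    sightings: list of (ip, street_address) pairs from the drive-by capture.
    requests:  list of (ip, cookie) pairs seen by the search service.
    """
    address_by_ip = dict(sightings)

    # Pass 1: a cookie seen from a sighted IP inherits that IP's street address.
    address_by_cookie = {}
    for ip, cookie in requests:
        if ip in address_by_ip:
            address_by_cookie[cookie] = address_by_ip[ip]

    # Pass 2: every other IP that presents the same cookie gets the same address.
    inferred = {}
    for ip, cookie in requests:
        if cookie in address_by_cookie:
            inferred[ip] = address_by_cookie[cookie]
    return inferred


# Made-up data: IP "Y" was sighted once; cookie "Z" then shows up from A and B.
sightings = [("203.0.113.5", "12 Example Street")]
requests = [
    ("203.0.113.5", "Z"),    # IP Y
    ("203.0.113.77", "Z"),   # IP A, after the dynamic address changed
    ("198.51.100.9", "Z"),   # IP B
    ("192.0.2.1", "Q"),      # unrelated user, never sighted
]

inferred = link_ips_to_location(sightings, requests)
print(inferred)
```

The cookie is doing the work of the primary key: the IP only has to be matched to a location once.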

Now, there are exceptions to this, such as if you are using a mobile device transiently etc. But the vast majority of the information collected would be from static devices.

@Max 6

Max 6 makes the point that seems to be missing from most of these conversations. You are choosing, by your own free will, to broadcast this information to everyone within a certain radius. It isn't our fault if we chose to listen in. If you didn't want us to, you probably shouldn't have broadcast it.

But Daddy why does granny smell of wee?

How will google tie the private IP address of a WiFi user to the IP address which communicates with the outside world? (hint 192.168.x.x or 10.x.x.x)

What fraction of people who are so stupid as to not encrypt their Wifi connection will be using the ISP assigned address? (almost none I'd say)

Hmmm some of this article smells of piss coupled with a dose of lets make the story bigger than it is.

Yes I do object to google collecting the information and more so about capturing the private data of idiots.

As for locating people via the IP address assigned to them by their ISP I can assure you that it's reasonably accurate in the UK.

From memory I believe it is perfectly legal to capture the Beacon from a wireless router and the data contained within it (you are making a public broadcast after all) but capturing user data or authenticating with it are a completely different matter.

Eh?

"How will google tie the private IP address of a WiFi user to the IP address which communicates with the outside world? (hint 192.168.x.x or 10.x.x.x)"

That doesn't sound right. If you request a website, you want it sent to your ISP assigned IP address. Which has to be included in every outgoing packet. The internal IP address is needed only for your router to pass incoming packets to the correct computer, or whatever.

Maybe on your network

Not IP address

It may be possible that there is some routing traffic in what they captured, but indeed, most WiFi enabled kit is router based and would thus use network address translation.
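For anyone unsure what network address translation does to the addresses in question, here is a toy model. It is a sketch under stated assumptions: the `NatRouter` class, the port numbering and the sample addresses are all invented, and real routers track far more state. The only thing it shows is that the Wi-Fi side of the router sees the private (192.168.x.x or 10.x.x.x) address, while the outside world sees the router's public one.

```python
import ipaddress

# The private ranges set aside by RFC 1918.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(ip: str) -> bool:
    """True if the address falls in one of the RFC 1918 private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918)

class NatRouter:
    """Toy NAT: rewrites private source addresses to one public address."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.table = {}          # public port -> (private ip, private port)
        self.next_port = 40000   # arbitrary starting port for the example

    def outbound(self, src_ip: str, src_port: int):
        """Return the (ip, port) the outside world sees for this packet."""
        if not is_rfc1918(src_ip):
            return src_ip, src_port
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (src_ip, src_port)
        return self.public_ip, public_port

    def inbound(self, dst_port: int):
        """Map a reply back to the private host that opened the connection."""
        return self.table[dst_port]


router = NatRouter("203.0.113.7")              # made-up public address
seen_outside = router.outbound("192.168.1.10", 51515)
print("on the Wi-Fi side: 192.168.1.10:51515, on the internet side:", seen_outside)
```

So a packet sniffed off the air carries the private source address, and the mapping to the public one lives only in the router's table.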

However, the SSID is another matter - pick that up plus GPS and you have a location match. Personally, I have a feeling Apple is collecting these associations too as my iPhone knows far too quickly where I am, even without a satellite view. In addition, if you collect data from an open network you also have access to it - one packet with payload INJECTED towards some Google collection system and you'd have the public IP address. Sure, it'll be part of a pool but such addresses do not change as quickly as many seem to believe - few people kill their routers for a good 30 minutes every night to pick up a new IP address.

Last but not least, you can also intercept MAC addresses. *That* tends to tie you to a system more than network presence per se. All you now need is one Google app that picks up MAC address whilst using Google services and they'll know it's you, cookie destruction and Google sharing notwithstanding. Now imagine Google selling THAT data - the moment you go online, the LAN could query Google for your name and then market directly to you. *Not* good.

I'd love to see what these clowns have been doing. Maybe they will have to return part of their NSA funding now..

MAC addresses are not routable

I wish people would read some basic networking books before coming out with statements like google knows your MAC address so it'll be able to trace your packets to you. All google can do is match the MAC address of your router to a GPS location. - Handy tool if you are wandering around sniffing WIFI but without GPS.

Contrary to popular belief MAC addresses, though unique, do not pass through routers. In fact every time a packet goes through a router the MAC address changes to that of the router.
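A toy model of that hop-by-hop rewriting, for illustration only. The `Frame` class, the `forward` function and all the addresses are invented, and NAT is deliberately left out so that the only thing changing is the layer-2 addressing.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str

def forward(frame: Frame, router_mac: str, next_hop_mac: str) -> Frame:
    """A router re-frames the packet: new MAC addresses, same IP addresses."""
    return replace(frame, src_mac=router_mac, dst_mac=next_hop_mac)


# Made-up addresses: laptop -> home router -> next router upstream.
on_wifi = Frame(
    src_mac="aa:aa:aa:aa:aa:01",   # the laptop
    dst_mac="bb:bb:bb:bb:bb:01",   # home router, Wi-Fi side
    src_ip="192.0.2.10",
    dst_ip="198.51.100.20",
)
after_router = forward(
    on_wifi,
    router_mac="bb:bb:bb:bb:bb:02",    # home router, outward-facing side
    next_hop_mac="cc:cc:cc:cc:cc:01",  # next router along the path
)

print("before:", on_wifi.src_mac, "after:", after_router.src_mac)
```

After the first hop the laptop's MAC address is gone from the frame; only someone within radio range of the Wi-Fi side ever sees it.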

MAC addresses are not routable

True, but that doesn't mean Google or others cannot determine your router's MAC address (e.g. http://test-geolocation.appspot.com/). The point is that the router MAC address acts like a cookie you cannot delete. Whilst the Internet facing IP address may change regularly and cookies may be deleted, the router's MAC address is likely to remain for at least a couple of years. It is the linkage of all the retrieved information that is significant. You may be blocking cookies and not allowing various active web content but if someone else using the same WLAN is less careful, then they will have leaked your location in addition to their own (because you will both be using the same Internet facing IP).