Why Google keeps your data forever, tracks you with ads

In a conversation with Ars Technica, Google's top privacy people defend the …

Not many companies could get away with defending controversial data retention practices by saying that the data is needed to "learn from good guys, fight off bad guys, [and] invent the future." But that's how Google sees itself and its practices—not surprising from a company that would give itself an unofficial motto like "don't be evil."

I had the chance recently to sit down with two of Google's top privacy people: deputy general counsel Nicole Wong and security/privacy engineer Alma Whitten. While the "good guy/bad guy" and "don't be evil" quotes may seem too cute by half to some, Wong and Whitten made a strong pitch for the truth of both slogans. In their view, Google really is fighting the good fight when it comes to your online privacy.

Anonymization and its discontents

Google logs an astonishing amount of data, including the search logs from its flagship product. It keeps this data indefinitely, so searching for a combination of your wife's name and your address and "rat poison in her cereal" is not a particularly smart idea (though search users do this sort of thing anyway).

But the company does "anonymize" this data eventually. The last octet of the IP address is wiped after nine months, which means there are 254 possibilities for the IP address in question (.0 and .255 are reserved addresses). After 18 months, Google anonymizes the unique cookie data stored in these logs.
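To make the nine-month step concrete, here is a minimal Python sketch of last-octet truncation. It is an illustration under assumptions, not Google's actual code: the function names are hypothetical, and the sample address comes from a range reserved for documentation.

```python
import ipaddress
from typing import List


def anonymize_last_octet(ip: str) -> str:
    """Zero the final octet of an IPv4 address (hypothetical helper)."""
    addr = ipaddress.IPv4Address(ip)
    masked = int(addr) & 0xFFFFFF00  # keep the first three octets only
    return str(ipaddress.IPv4Address(masked))


def candidate_hosts(anonymized: str) -> List[str]:
    """List the usable host addresses that could sit behind a truncated IP."""
    network = ipaddress.IPv4Network(f"{anonymized}/24")
    # hosts() skips the network (.0) and broadcast (.255) addresses.
    return [str(host) for host in network.hosts()]


truncated = anonymize_last_octet("203.0.113.77")
print(truncated)                        # 203.0.113.0
print(len(candidate_hosts(truncated)))  # 254
```

The truncated record still narrows a user down to one of 254 addresses, which is why the step reduces precision rather than removing the link to a network entirely.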

But Whitten, who was involved in Google's decisions on such issues, said that Google has done the best it can to keep the retention period to a minimum while still extracting maximum value from that data... and that this "value" isn't just to Google but also to users.

"Wonderful things that can be done with an abundance of data," she said. When Google's teams began looking at the data retention issue a few years back, they "started with zero" and tried to see if they could make it work. They could not; Google would lose the ability to do too many useful things.

Search data is mined to "learn from the good guys," in Google's parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google's famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine. Without the algorithms, Google Translate wouldn't be able to support less-used languages like Catalan and Welsh.

Data is also mined to watch how the "bad guys" run link farms and other Web irritants so that Google can take countermeasures.

Google eventually settled on anonymizing the IP address after nine months, though even here, "we believe that we have lost the ability to do things," said Whitten.

Web users don't mind being tracked?

Instead of cutting the data retention period further, Google is more focused on 1) transparency and 2) keeping the data locked down safely. The company believes that when users know what Google keeps and why it keeps it—and when they have the chance to opt out—users are often happy to let Google do its thing.

Wong points to behavioral advertising, which Google jumped into last year. This sort of advertising relies on a vast ad network across many sites, and the ads record a visitor's unique cookie. Google can collate this data on the back end and compile a list of interest categories associated with a particular user cookie; since most users never clean their cookies, this works well as a general ad targeting mechanism.
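As a rough illustration of that back-end collation, the toy Python sketch below groups ad requests by cookie and counts interest categories. Everything in it is an assumption made for the example: the site names, the category mapping, and the function names are hypothetical, and the article does not describe how Google's real pipeline is built.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from sites in an ad network to interest categories.
SITE_CATEGORIES = {
    "trekking.example": "adventure travel",
    "safari.example": "adventure travel",
    "recipes.example": "cooking",
}


def build_profiles(ad_requests):
    """Collate (cookie_id, site) ad requests into per-cookie category counts."""
    profiles = defaultdict(Counter)
    for cookie_id, site in ad_requests:
        category = SITE_CATEGORIES.get(site)
        if category is not None:
            profiles[cookie_id][category] += 1
    return profiles


def top_interests(profiles, cookie_id, limit=3):
    """Return the most frequently seen categories for one cookie."""
    counts = profiles.get(cookie_id, Counter())
    return [category for category, _ in counts.most_common(limit)]


requests = [
    ("cookie-1", "trekking.example"),
    ("cookie-1", "safari.example"),
    ("cookie-1", "recipes.example"),
    ("cookie-2", "recipes.example"),
]
profiles = build_profiles(requests)
print(top_interests(profiles, "cookie-1"))  # ['adventure travel', 'cooking']
```

Note that the profile is keyed on the cookie, not on a name, which is why clearing cookies resets it.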

"We believe there is real value to seeing ads about the things that interest you," she wrote. "If, for example, you love adventure travel and therefore visit adventure travel sites, Google could show you more ads for activities like hiking trips to Patagonia or African safaris. While interest-based advertising can infer your interest in adventure travel from the websites you visit, you can also choose your favorite categories, or tell us which categories you don't want to see ads for."

Choosing your favorite categories, and opting out of behavioral ads altogether, is made possible by Google's Ads Preferences manager. The site gets limited use; despite the hundreds of millions who use Google services or are served Google ads, only "tens of thousands" visit the Ads Preferences site each week, I'm told. One might assume that these would be the most motivated "opt-outers," those who actually understand what behavioral advertising is, know how it works, and hate it with a passion.

The Google folks insist that this isn't actually what happens when people visit the Ads Preferences page. Compared to the number of people who choose to opt out entirely, four times more people merely edit their categories, while ten times more people do nothing at all.

This could mean several things (are most users just confused about the options and simply do nothing?), but Google takes it as vindication of its willingness to be transparent about what it does, and its willingness to put users in control. Certainly, there are other companies that could take a page from the Google playbook. The Ads Preferences manager makes it simple to opt out with a single click, but this only applies to one browser; Google has also built a browser plugin that can remember the setting across browsers and after cookie purges.

Given the sheer amount of hate directed at Google-owned DoubleClick that erupted in our recent comment thread on ad blocking, though, it looks like Google still has some way to go before it convinces the geekerati that its opt-out behavioral targeting practices truly aren't "evil."

As Google services rack up increasing amounts of data on users, the company's strategy for reassuring users is based on such transparency, user control, and data safety. Whitten stresses with pride that Google's data doesn't leak, and Wong notes how aggressively the company pushed back against a broad Department of Justice data request in 2005.

Not that I really care if a company tracks my internet searches that much, but if there's one company I would be willing to trust it's Google. At least... as long as Page and Brin are at the helm. After they get old and decide to step down, all bets are off.

I found out yesterday that I can hide ads with a user style sheet. They still load; I don't have to see them.

And still be tracked by Google.

No thanks from me. hosts file-based blocking ftw.

Out of our discussion yesterday I pretty much found out you can google any company like "google opt-out", "yahoo opt-out", "microsoft opt-out" and they all have pages that explain in great detail how you can opt out of their behavioral tracking.

I also found a cool Firefox plugin called TACO that automatically opts you out of 90 different companies' tracking. Seems like it's right up a lot of people's alleys.

What an utter load of rubbish. I have been working in the online industry since the mid 90s, and none of these entities - Google, Doubleclick, Yahoo, et al - are 'good guys'. They are businesses designed to generate profits and increase shareholder value. And as such they are responsible to no one but said shareholders.

Don't kid yourselves: your information is very valuable. That is why it should always be private. Anybody that has had any experience with any type of oppression will tell you that. Don't trust anybody with any of your private information. It isn't being paranoid, it is being wise. The world's track record is a good place to start. Trust me, life isn't usually in your favor. Especially when someone wants to make a buck off you or claim some type of authority. I can't believe how naive some people are. Don't believe me? Google yourself someday. I think you will find we have already thrown the baby out with the bath water. Don't ever take your freedom and who you are for granted. There is a reason someone wants to know everything about you. It isn't because you're so special.

This opt-out feature seems absurd. In order to not have behaviorally-based ads displayed, I have to positively and uniquely identify myself to google. So they're tracking everything, all my searches etc., more efficiently, and retaining the data indefinitely.

Huh? I only mildly care about seeing the ads. It's the tracking I want to opt out of! This feature is worthless, no wonder nobody uses it. A complete red herring.

I don't let google set cookies on my machine. I don't use gmail. I know that there's still IP tracking, and the other crap like browser configuration signatures and keystroke timing signatures as has been covered here recently, but there's little I can do about that.

In an intelligently-run society that respected privacy, that kind of online tracking should require opt-in! Don't worry, I'm not holding my breath.

Again... Can't they just hash the IP/cookie pairs? I understand the uniqueness requirement for analysis, but (properly) hashed values work perfectly well in data mining, and I speak from direct experience in this field.

If they need regionalization, they should aggregate the data at time of hashing. I bet they do this already, you don't just query every <country>/<city> IP record for giga or tera sized data sets.
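A minimal Python sketch of what this comment proposes, assuming a keyed hash (HMAC-SHA256) stands in for "properly hashed": a plain unkeyed hash of an IPv4 address could be reversed by trying all of the roughly 4.3 billion possible addresses. The function names, the record layout, and the idea that a region label is resolved before the raw IP is discarded are all assumptions for illustration, not a description of any real logging pipeline.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret kept separate from the logs; without it the tokens
# cannot be recomputed from guessed IP/cookie pairs.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(ip: str, cookie_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an IP/cookie pair to a stable token that hides the raw values."""
    message = f"{ip}|{cookie_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def to_log_record(ip, cookie_id, region, query, key=SECRET_KEY):
    """Build a log entry that keeps a coarse region but not the raw IP.

    `region` is assumed to have been looked up from the IP beforehand.
    """
    return {
        "user": pseudonymize(ip, cookie_id, key),
        "region": region,
        "query": query,
    }


first = to_log_record("203.0.113.77", "cookie-1", "ES/Barcelona", "hiking boots")
second = to_log_record("203.0.113.77", "cookie-1", "ES/Barcelona", "trail maps")
print(first["user"] == second["user"])  # True: same user token, no raw IP stored
```

The same pair always yields the same token, so per-user analysis still works. This is pseudonymization rather than full anonymization, though, since the query text itself can still identify someone.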

Problem with this is I have to log into Google to not be tracked by Google, since when I'm not logged in I assume it's not applying my preferences. On the one hand it does sound kind of weird that I'd have to log in not to be tracked; on the other, opt-in only doesn't work. It doesn't work because for one thing no one would bother even if they didn't mind, and for another the "bad guys" they want to keep from mucking up their results would also not sign up. Annoying catch-22s.

How so? Most content generation is backed by some kind of reward, be it improved standing, reputation, or cold hard cash.

One of the few ways of getting said cash on the net is through advertising. It is one of the ways you can maintain a broad variety of sites and voices on the internet. Condemning advertising per se is a stupid move.

Don't kid yourselves: your information is very valuable. That is why it should always be private. Anybody that has had any experience with any type of oppression will tell you that. Don't trust anybody with any of your private information. It isn't being paranoid, it is being wise. The world's track record is a good place to start. Trust me, life isn't usually in your favor. Especially when someone wants to make a buck off you or claim some type of authority. I can't believe how naive some people are. Don't believe me? Google yourself someday. I think you will find we have already thrown the baby out with the bath water. Don't ever take your freedom and who you are for granted. There is a reason someone wants to know everything about you. It isn't because you're so special.

No, your information is valuable to you, but it is only worth a few bucks tops to an advertising company (and more likely just a few cents). Shoot, even actual dangerous identification material, such as CC numbers, etc., only goes for a few bucks a pop on the grey and black markets. Value comes only if a retailer/criminal identifies you as a worthwhile target/victim; until then we are each just one number among billions.

That's not to say you shouldn't worry about privacy, you should, just know that your privacy is only important to you.

Listen, if you guys want to be the real good guys, list out some options for us. Just say: if you want this service and that service, you need to allow us to keep your data for _ months, and if you set 0 months or nothing kept, you only get some things.

Seriously how hard can it be to give people the option to be not seen or tracked? Unless you are doing more than what you are willing to admit.

I have never really understood the "OMFG Google sees what sites I am visiting" paranoia.

If Google wants to track my browsing history to show me more targeted advertising, I say let them. I'd rather see ads for products/services that interest me rather than just a random collection of ads. It's better than seeing an abundance of feminine hygiene commercials like regular TV.

Another thing to keep in mind is that as advertising gets more targeted, the need for flashy/obnoxious ads is somewhat reduced. Advertisers don't need to rely as much on gimmicks if the subject of the ad is more interesting to you.

Another important point: Remember that unless you sign up for a Google account, they have no way to track YOU. They can track your browser instance and IP address, but that's about it. Neither one of these things is especially unique, so it's pretty useless, except for statistical purposes. If you don't want Google tracking you, don't sign up for a Google account.

Every web site you visit is logging your accesses anyway. Server logs have been around since the dawn of the web.

I know there is value in this kind of data but I'll do my best to evade tracking just as I would evade people from my local supermarket if they decided to start following me around to see where else I shopped, "just to improve your shopping experience in our store". Yeah, it's not the same thing but it's equally as distasteful to me.

For those who use lots of Google services, by all means opt in and contribute with your data, but I stopped using Google for search a while back and the only Google service I use now is Earth (desktop) and Maps (iPhone).

They want to hold onto it because it's very valuable; no surprise there. I like the fact that they are open about what data they have about you and that they do not hold it forever. But honestly, 18 months, 6 months, who cares.

In the end it comes down to "Do I trust them to handle it securely?" And the answer is yes for two reasons.

a) Google, unlike other companies (Microsoft, Sidekick data...), is competent when it comes to storing your data. Apart from some lost Gmail emails I have not heard about a bigger security leak whatsoever.

b) Having the trust of hundreds of millions of users is worth A LOT of money to them. So they will not squander that frivolously.

Both of these things need to go together. A company needs to have a substantial monetary interest in keeping your trust and it needs to know what it is doing. I don't think that Microsoft is more evil but I don't think that they are competent. Other companies, esp. smaller companies, don't have that much to lose. So Google may not be perfect but it definitely is the company I trust with my data the most.

So, people who are paranoid and blocking ads because they feel their privacy is being violated...

Two things. First, you honestly think you're that important as an individual? Get over yourself. Aggregated, you are worthwhile. As an individual you're still the paranoid loser you were every other day of your life.

Second, do you wear a mask when you visit the store? People can see what you buy and if it's a small store, the owner knows "Gee, Bill Smith sure has a foot odor problem" and that is far more information than google knows about you.

The site gets limited use; despite the hundreds of millions who use Google services or are served Google ads, only "tens of thousands" visit the Ads Preferences site each week, I'm told. One might assume that these would be the most motivated "opt-outers," those who actually understand what behavioral advertising is, know how it works, and hate it with a passion.

As others have pointed out, this is exactly the fallacy that Google wants people to believe. Motivated "opt-outers" are the ones most likely to have at the very least disabled the cookies required by Google's Preferences stuff. Quite the opposite of transparency, this is a perfect example of Google's attempts to obfuscate and manipulate the truth. This system allows them to say to the press, or Congress, or whomever, "Gee, we have an opt-out mechanism and no one uses it. Obviously most people must be okay with what we're doing." Never mind that it's completely disingenuous to make people positively identify themselves to the tracking system in order to supposedly not be tracked.

Make no mistake. Google can talk circles around the issue until they're blue in the face, but the simple fact remains that Google makes billions of dollars a year from advertising revenue, and advertisers pay more for better-targeted ad placements. That's the primary motivation for everything Google does. Period.

"We believe there is real value to seeing ads about the things that interest you," she wrote.

She believes incorrectly. I do not want to see ads.

QFT. I don't really care if the ads are targeted or not. I'll ignore them all the same. I'm not going to buy something from some shady Google ad from god knows who. If I want to buy something, I'll go find a reputable retailer myself and buy it.

And you're surprised?

1. I didn't know about this until I read about it in this article.
2. Having learned of it, I went searching at the Google site and couldn't find it.
3. Having followed your helpful link to it, I ponder how "mass market" a feature is if it takes an ars link to find it.
4. Visiting the page, finding the "Opt out" button I click it, and nothing happens (page reloads, looking the same).
5. Had to click it again (get a page explaining opt-out; upon return to the ad prefs page, "opt out" has changed to "opt in").
6. All things considered, you wouldn't expect people to visit this page very often, no matter how motivated; article-cited "tens of thousands visit the page; hundreds of millions of searches" sounds remarkably close to "everyone visits the page, once."