Search Privacy At Google & Other Search Engines

There have been some pretty scary statements made about Google and the privacy of search requests recently. You may have heard that Google was nominated for a "Big Brother" award. You may also have read that Google knows everything you ever searched for. Should you be frightened? Is it time to boycott Google to protect yourself, as blogger Gavin Sheridan called for last month?

Relax. Yes, there are privacy issues to be aware of when you do a search at Google. However, these issues are just as much a concern at the other search engines you visit. More importantly, the fear that you personally could be tracked isn't realistic for the vast majority of users, at least by Google itself.

In this article, we'll take a closer look at just what exactly Google knows about you, when you come to do a search -- and see why you needn't be so worried, for the moment.

It's no wonder that people are getting worried about Google and search privacy after reading statements like these:

"Google builds up a detailed profile of your search terms over many years. Google probably knew when you last thought you were pregnant, what diseases your children have had, and who your divorce lawyer is."

The reality is that Google doesn't know who you are personally, as a named individual. Its use of cookies, which is hardly unique, doesn't give it a magical ability to somehow see your face and know your name, through your computer screen.

Instead, all Google knows is that particular browser software, on a particular computer, made a request. A cookie does give it the ability to potentially see all requests made by that particular browser software, over time. However, Google still doesn't know who was sitting at the browser when the request was made.

In short, when I do a search at Google, this is all it knows that identifies "me" uniquely, an anonymous number:

740674ce2123e969

No name, no address, no telephone number. In fact, if someone else sits at my computer, Google can't even tell that someone new is now searching.

What Does Google Record?

Let's step back a moment, and see what happens when you come into Google, to understand how that unique cookie number is given to you and, importantly, tells Google nothing about who you personally are.

Let's assume you've never been to Google.com before. You visit the site and search for "cars." What does Google record?

As stated in its privacy policy, Google makes a note of things like the time you visited, your internet address and the type of browser you are using. All this is recorded in what's known as a log file. It's also standard practice for web servers to keep track of this information, so Google isn't doing anything unusual.

What does this look like? Here's a simplified example of how our search for "cars" might appear in Google's logs:

740674ce2123e969 (my unique cookie ID, assigned to my browser the first time I visited)
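To make the idea of a log entry more concrete, here is a minimal Python sketch of the kind of record described above. The field names, the timestamp and the browser string are assumptions made up for illustration, not Google's actual log format; only the search term, the cookie ID and the internet address come from this article.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class SearchLogEntry:
    """One hypothetical log record; the field names are illustrative only."""
    timestamp: datetime      # time of the visit
    internet_address: str    # IP address, or the host name it resolves to
    browser: str             # browser type reported by the software
    query: str               # what was searched for
    cookie_id: str           # anonymous ID assigned on the first visit


entry = SearchLogEntry(
    timestamp=datetime(2003, 4, 1, 12, 0, tzinfo=timezone.utc),  # made-up time
    internet_address="inktomi1-lng.server.ntl.com",
    browser="ExampleBrowser/1.0",                                 # made-up value
    query="cars",
    cookie_id="740674ce2123e969",
)

# None of the recorded fields is a name, postal address or telephone number.
personal_fields = {"name", "address", "telephone"}
recorded_fields = {f.name for f in fields(entry)}
print(recorded_fields & personal_fields)  # prints set()
```

The point of the sketch is simply that every field describes a piece of software and a connection, not a person.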

My Internet Address

From the key information above, if Google wants to know who exactly I am, the most important element is my internet address (this is called my IP address, when not "resolved" and turned into a domain name):

inktomi1-lng.server.ntl.com

As you can see, that address says nothing about me being personally Danny Sullivan. NTL is a large internet access provider in the United Kingdom that I use to access the web. The internet address represents the name of the NTL computer that is serving my requests. (For the curious, the reason the familiar search name Inktomi is mentioned is probably a remnant from the time when Inktomi provided internet caching services to ISPs such as NTL).

NTL itself could look at its records, then know that I personally connected to the web. However, it doesn't pass my name on to Google. This means that at most, all Google knows is that an NTL user came to visit.
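As a rough illustration of how little a resolved address gives away, the Python sketch below pulls the provider's domain out of a host name. The function name `provider_domain` is hypothetical, and the "last two labels" rule is a deliberate simplification that would be wrong for suffixes such as co.uk; it is only meant to show that the address points to an ISP, not to a person.

```python
def provider_domain(host_name: str) -> str:
    """Return the last two labels of a host name, such as 'ntl.com'.

    Naive on purpose: it does not handle suffixes like 'co.uk'.
    """
    labels = host_name.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


print(provider_domain("inktomi1-lng.server.ntl.com"))  # prints ntl.com
```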

It is true that in some cases, a person's internet address might be tied more personally to them. For example, perhaps you work at a company where everyone's computer is given a name that matches their own. Then your internet address perhaps could look like this:

danny.sullivan.searchenginewatch.com

Such situations are generally rare. They also still don't provide a guarantee that you, personally, are sitting at a computer that uses your name as part of its internet address. Nevertheless, this is a good reason for system administrators not to make internet addresses linked to personal data like names.

What About The Cookie?

Why would Google want to know who you are, anyway? One reason is because some people set preferences, such as to see more than 10 pages at a time or to only see results in English. It's also helpful for Google's own internal reasons to know how often unique users come to its site and how they behave when searching.

Unfortunately, Google can't just depend on your internet address to know if it has seen you, an anonymous but unique searcher, before. For example, imagine I go offline for a few hours, then reconnect. Now my NTL address might be slightly different. Or if I have trouble getting a good connection with NTL, I might switch over to my AOL account. In all these cases, I'm the same person but with three different internet addresses. That means to Google, I'm three different people.

A cookie solves this problem, which is why so many sites use them. With a cookie, no matter what ISP I use to connect, Google knows that it has seen my particular browser before. This is because it places a unique numeric ID within a portion of my browser designed for these. Then, anytime my browser talks to Google, it sends along my unique ID, so Google remembers who I am -- at least, to the degree that I'm a unique browser on a particular computer.

Google still doesn't know who I am personally, of course. Indeed, if my wife sits down at my computer and searches, Google has no idea that the age and sex of the person searching has suddenly changed. It still sees the same cookie, since she's using my browser.

In addition, if I use my laptop computer while traveling, that has a different ID. So far as Google is concerned, I'm a different person. Similarly, if I have Netscape Navigator and Internet Explorer on my computer, Google will give each browser software its own unique ID. This means that if I switch browsers, I become a different person to Google.
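The behavior described in this section can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Google's servers actually work: the `SearchSite` and `Browser` classes are hypothetical, the internet addresses are placeholder values, and the 16-hex-digit ID format is only chosen to resemble the example ID shown earlier.

```python
import secrets
from typing import Optional, Set


class SearchSite:
    """Toy model of a site that hands out anonymous cookie IDs."""

    def __init__(self) -> None:
        self.seen_addresses: Set[str] = set()
        self.seen_cookie_ids: Set[str] = set()

    def handle_request(self, internet_address: str, cookie_id: Optional[str]) -> str:
        """Record a request and return the cookie ID the browser should keep."""
        if cookie_id is None:
            # First visit from this browser: assign a random 16-hex-digit ID.
            cookie_id = secrets.token_hex(8)
        self.seen_addresses.add(internet_address)
        self.seen_cookie_ids.add(cookie_id)
        return cookie_id


class Browser:
    """Stores whatever cookie the site gave it and sends it back each time."""

    def __init__(self) -> None:
        self.cookie_id: Optional[str] = None

    def search(self, site: SearchSite, internet_address: str) -> None:
        self.cookie_id = site.handle_request(internet_address, self.cookie_id)


site = SearchSite()
desktop_browser = Browser()

# Same browser, three different connections (placeholder addresses).
for address in ("192.0.2.10", "192.0.2.77", "198.51.100.5"):
    desktop_browser.search(site, address)

print(len(site.seen_addresses))   # 3: by address, it looks like three visitors
print(len(site.seen_cookie_ids))  # 1: by cookie, it is one browser

# A second browser (another program, or a laptop) gets its own ID.
laptop_browser = Browser()
laptop_browser.search(site, "192.0.2.10")
print(len(site.seen_cookie_ids))  # 2: the site now counts two "people"
```

Note that nothing in the model records who is typing. Whoever sits down at `desktop_browser` sends the same ID, which is exactly why the cookie identifies a browser rather than a person.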

The Big Brother Nomination

From what's been explored above, you can see it is possible that Google might know all the searches that a particular browser made. However, that's a far cry from having a personal profile of what a named individual searched for.

So how did we get from Google using anonymous cookies to a situation where some people seem to believe that at any moment, Google can order up your personal search history? This has mainly emerged out of the nomination of Google by the Google Watch web site for Privacy International's 2003 US Big Brother Awards.

My related article, Google And The Big Brother Nomination, examines in detail the accusations that were made against Google in relation to privacy by Google Watch. It's a long article, so let me summarize to say that I did not feel any of the accusations held up as a sign that Google itself was abusing privacy.

In addition, it's important to note that while Google was nominated for the Big Brother awards, Privacy International itself did not select Google as a finalist. Clearly, Google was not seen as a large threat to privacy.

Nevertheless, some people have taken the mere fact that Google was nominated as a sign of wrongdoing, which is especially bad given that anyone could nominate any company for the award. Even worse, some have misunderstood accusations in the nomination to mistakenly assume that Google must have personally identifiable profiles of them.

So How DO You Know It's Me Personally?

"Personally identifiable information." It's an important phrase to understand. It's used in the privacy world to say that some company has information that honestly and truly lets them know to some degree who you personally are.

For instance, in the situation with Google's cookie, it knows nothing personal about you. As previously said, it only knows that a particular and anonymous browser that it has seen before made a request.

In contrast, let's say you are a Yahoo member, where you've filled out a registration form giving your name, address, age and other information. Assuming you didn't lie when filling out that form, Yahoo really does know who you are, to some degree, when you've "logged in" to the web site. Unlike Google, Yahoo has some "personally identifiable information."

Moreover, because Yahoo knows who you are personally when logged in, it has the capability to know what you personally have searched for in the past, any time you've logged in. The same is true for any search site where users can create personal accounts.

Given this, worrying about anonymous cookies, such as Google's, as a potential search privacy threat is almost misplaced. Instead, any search site with a user registration scheme represents a more serious concern.

The Rise & Fall Of Yahoo Impulse Mail

For example, at the end of last year, Yahoo was considering what would have been the first significant use of personalized search history in the industry. A program called Yahoo Impulse Mail would have delivered targeted email ads based on what you searched for.

The program came to light in a report from last summer's Direct Marketing Days conference. Through the program, anyone with a free email account at Yahoo who hadn't opted-out of getting third party ads might receive targeted email based on their search habits. Marketers wouldn't send this directly, but instead it would come via Yahoo.

In other words, let's say you had searched for "cars" recently. As a result of this, Yahoo might send you an email ad on behalf of a company like Ford, since based on your search habits, Yahoo knew you were interested in cars.

When I followed up with a Yahoo spokesperson about the proposed program last year, I was reassured that Yahoo would not share any personal information with advertisers. So while it might know my search habits, it wouldn't tell someone like Ford about me personally. Instead, it would serve as a middleman.

"We have a strict policy of keeping user information within Yahoo. Yahoo does not sell or rent user information to third parties," said Yahoo spokesperson Diana Lee.

I was also told that the information would be only used in "aggregate" form, implying anonymity in being mixed in with many others. However, it wasn't clear to me what protection this really was supposed to provide. My impression was that Yahoo would aggregate a series of terms linked to a particular email ad, then those ads would go out to the mass of users who searched for them. If so, this didn't negate the fact that individual users would still be monitored personally, in some way.

Another defense of the program was that people could "opt out" of getting these targeted emails.

"The key thing is that people have opted in and signed onto their email account before doing the search," said Alan Thompson, senior producer, in a story about the program. "We're being very conservative about privacy."

To me, this wasn't reassuring. Sure, you could opt-out of receiving the email. However, Yahoo gave you no ability to opt-out of search monitoring while logged in as a user. To keep your searches private from Yahoo, you instead needed to overtly log out as a Yahoo member.

So what happened with the program? Today Yahoo says it was never formally announced, despite the earlier story positioning it this way. In addition, Yahoo says it has no plans to unveil it. The company says it was simply an idea that some Yahoo staffers were considering, one that was tested briefly but that Yahoo ultimately decided not to push ahead with.

"We explore different ideas all the time. It doesn't mean all of them go to fruition," said Lee.

Privacy & Personalized Search Results

So the first major mining of personal search histories failed to happen. That doesn't mean it won't come up again. Indeed, we'll almost certainly see such mining happen when some search engine finally gets brave enough to try personalized search results.

In such a system, a search engine might feed you results customized based on your age, sex and other demographic or personal information. In order to do so, you'd need to register with the service and agree that it would monitor your searches and listing selections.

I first wrote about the potential of such systems back in 1998, as well as raised the potential privacy issues involved:

"I've long been expecting for some journalist to track down what a politician is searching for while at work, in the way that some have sought video rental records," I wrote at the time.

Personalized search remains in the future. Indeed, Google acquired a firm called Outride back in September 2001 that was working on a personalized search solution. However, Google has since done nothing public in the space. Nor are any other major search engines suggesting that they are about to do so. User concerns about privacy have been a key factor in holding this search advancement back, they've said in the past.

Time For Better Privacy Policies?

Clearly some people are worried about the privacy of their search requests, be it at Google, Yahoo or any other search engine. Google's taken the most heat over this issue lately, and largely without cause.

Nevertheless, a useful thing that the entire search industry might do is to reexamine their privacy policies and consider expanding them to provide more specifics about what exactly happens with search data on a personal basis.

Below, I'll look at the policies of just two search engines, Yahoo and Google, to illustrate some potential changes and problems. Chris Sherman is also planning a recap across several other search engines, in the coming weeks, for SearchDay.

Yahoo's Search Privacy Policy

When I looked at the Yahoo Impulse Mail program last year, another defense Yahoo offered about the proposed program was that its privacy policy made clear that searches were monitored. Let's take a look at the portion of that policy that's about search:

"When visitors conduct a search on Yahoo, we keep track of which search terms are popular. You can save your searches and access them from your My Yahoo page. Advertising shown to you may be related to the search term you entered"

That's the extent of Yahoo's use of search requests. In no way does it suggest that Yahoo can associate your searches with your personal profile, a capability the company has, though it is not something Yahoo says it currently does.

The fact that you can save searches does obviously suggest that those particular saved searches could be linked to you. However, this is also an action you would have explicitly chosen to do.

Finally, as for showing advertising, this has historically been done based not on your personal profile but rather by customizing, in real time, the ads you see based on the terms you entered.

Why isn't the policy more detailed? My explanation is that until now, neither Yahoo nor any of the other search engines has felt enough consumer concern to go into more depth.

For instance, Yahoo has an entire page about its use of cookies, with paragraphs and paragraphs of information. The same is true for its use of web beacons or web bugs. But search privacy is covered in just a single paragraph? What gives? The answer is that consumers have had a lot of concerns about cookie use and web bugs, especially given both scare stories and real abuses that have happened.

In contrast, search privacy has simply not been raised as an issue. I would argue this is largely because no one has abused it in any way. While that may continue, the allegations raised about Google mean that the entire search engine industry will need to better explain what happens with search data.

Google's Search Privacy Policy

For example, as stated, some people believe that Google has personalized search histories for them. It doesn't, and it should say as much in its privacy policy. Instead, it simply says:

Individually identifiable information about you is not willfully disclosed to any third party.

That implies that Google has personal information about anyone searching at the site and may cause more worry than relief. Google does later say:

Google does not collect any unique information about you (such as your name, email address, etc.) except when you specifically and knowingly provide such information. Google notes and saves information such as time of day, browser type, browser language, and IP address with each query.

That's more helpful, but imagine a section that explains specifically what Google does with searches. It could look something like this:

When you search at Google, information is recorded along with the search conducted, such as the time of day, browser type you used, your internet address and an anonymous user ID provided by our cookie.

Personal information, such as your name or email address, is not recorded. Google does not require such information to be provided in order to search the web. It may be collected if you use other Google services, such as Google Groups. However, no personal information collected is ever linked with the anonymous ID assigned to your search requests.

Google never provides search histories to third parties, unless required to by law. In these cases, the search histories provided carry no personal information about you.

OK, that's a rough idea, doesn't cover everything and certainly could be written better. However, a policy like this provided by Google and others would certainly help consumers better understand officially what happens when they search. For a similar idea, check out the nice privacy policy that Yahoo maintains about internet address or IP logging.

As for services that do record personal information with searches, I could well see them providing an option to opt-out of this. Or, if they won't allow an opt-out, they might have to provide better guarantees about how that information is safe-guarded or destroyed over time.

The Google Account

You may have noticed that in my mention of a Google privacy policy revision, I made note of personal information perhaps being collected for non-search activities, such as using Google Groups. In addition to that, those who want to have a Google API key or take part in Google Answers need to have a Google Account, something introduced in the middle of last year.

Google says that having an account gives you a separate cookie that is not recorded or linked with your searches. The same is true if you take part in Google's advertising programs, which require personal information. However, having a Google Account does open the possibility that in the future, you could see that linked to search queries. If so, privacy implications will need to be addressed by Google.

Trusting The Companies

As seen, searching in a situation where a company only knows you by cookie ID keeps you anonymous even to that company. Instead, it's only when you "log in" through a user registration system that you need to have a heightened awareness that you've given up your anonymity.

Once you register with Yahoo and sign in to our services, you are not anonymous to us.

Nor is giving up your anonymity bad, assuming you trust the company you do this with. As Yahoo says:

"Yahoo has long been a brand and company that consumers trust. Consumers voluntarily register to log in and provide personal information because they trust us and we work very hard to continue to maintain that trust through robust notice and meaningful choice," said Lee.

However, should you decide for whatever reason that you want to regain your anonymity, then do log out before doing some searches. That will offer you better protection, at least from the companies.

"If you decide you are searching for something and don't want us to know, it's very easy for a user to log-off," Lee said.

Lee also stressed that Yahoo doesn't currently maintain search "profiles" of its users and instead finds that when it comes to search, it's not effective to try and target ads based on what people search for over time -- their search behavior. Instead, it is far better to target on a per-search basis, as is done through paid listings, an activity that can be done without profiling.

The Government's Missing Key

Finally, back to Google. While Google lacks any personal information to tie into search requests, the information it records could conceivably do this if a government agency got a hold of it.

For instance, this is the horror story Google Watch tells:

"The fact that you record unique cookie ID, plus IP number, plus date and time, makes much of your information "identifiable." Authorities can also do a "sneak and peek" search of a Google user's hard drive when he isn't home, retrieve a Google cookie ID, and then get a keyword search history from you for this ID."

Sure, this is absolutely possible, along with other things detailed in my Google And The Big Brother Nomination article. But even then, it doesn't show conclusively who conducted searches, assuming that more than one person lives in the house. Nor do searches in and of themselves mean anything. Is someone a terrorist because they searched for "weapons of mass destruction" or "osama bin laden?" If so, then millions will need to be arrested, as plenty of ordinary people have looked for these things.

It's also important to note that even without a Google cookie ID, a government authority, as long as they know your ISP, could go to the ISP directly to see records of what you've searched on at Google, as well as anywhere else you've been on the web.
