Which is pretty exciting as both are huge leaps towards what we’ve envisioned as a “datatrust” in various blog posts and our white paper. Well except for maybe the “trust” part. (Especially given our experiences with Yahoo here and here.)

A few more points to contemplate:

Now that the Promised Land of collating all the world’s data approaches on the horizon, will that change people’s willingness to make data publicly accessible? What I share on my personal website might not be okay when it rears its head in new contexts I never intended. As we’ve said elsewhere, when talking about privacy, context is everything.

What about ownership? Both Yahoo! and Google may only temporarily cache the data insofar as is needed to serve it up. But, in effect, they will become the gatekeepers to all of our public data, data you and I contribute to. So the question remains: what about ownership?

There’s still a lot of data that’s *not* publicly accessible. Possibly some of the most interesting and accurate data out there. How will we get at that? Case in point, Facebook just shut down a new app that allowed you to extract your personal “Facebook Newsfeed” and make it public via an RSS feed, citing, what else? Privacy concerns. (Not to mention the fact that access to Facebook data is generally hamstrung by privacy.)

For people who choose not to be tracked, Google developed a plug-in that persists even after cookies are cleared. Most other systems for opt-out rely on cookies. Given that most people who are concerned about their privacy clear their cookies periodically, it was important to EFF that Google’s opt-out mechanism would remain even if all cookies were cleared.

Even more interesting was Google’s decision to link the caption “Ads by Google” to a page that explains the behavioral targeting technique, along with a list of the interest categories that have been assigned to you. In other words, Google is making more transparent what they know, or think they know, about you. You can then choose to remove some of those interest categories or to opt out of tracking altogether.

As Zimmer points out, Google could show more fine-grained detail regarding what they know about you. But it’s still a fascinating step for a major corporation to take. Even better, Google isn’t the only one creating pages that show users how they’re being viewed for marketing purposes.

BlueKai and eXelate Media run “behavioral exchanges,” selling information to companies about website visitors. Like Google, they both provide pages, here and here, where people can choose to opt out of tracking altogether. Otherwise, they can monitor and edit which interests are associated with them.

It’s hard to know how “transparent” all this really is to people who are not tech and privacy geeks. Ultimately, companies need to improve data collection practices for everyone, not just people who care enough to find out. And I would argue that it can’t be a model where a select few can just opt out and protect themselves, and the companies can continue to do anything they want to do with everyone else’s data. But it’s still a new way of managing your life online that doesn’t require as much investment in self-education and time as many of the other methods described by EFF in its Surveillance Self-Defense Site.

Will this model become the dominant one in online tracking? Compare the transparency of these companies with RealAge, an online quiz that’s just been outed as selling information to pharmaceutical companies who want to market directly to quiz takers. What most consumers find instinctively distasteful is a feeling of being fooled. RealAge claimed that it protected privacy by not giving personally identifiable information to the companies and that it is “providing value in return for the information” with ads that might interest the quiz takers, but it’s not the kind of value RealAge users consciously “paid” for. What BlueKai, eXelate Media, and Google have shown is an understanding that for many people, their privacy is violated not just when a company knows such-and-such information is associated with Mr. Tom Smith, but when any of that information is being collected and shared without the full knowledge and consent of Tom Smith.

It’s obvious why RealAge chose to be vague about where their profits came from: would 27 million people have taken the test if the website had declared prominently that the information would be sold to pharmaceutical companies? But it’s hard to see how sustainable that business model is. Presumably, BlueKai and eXelate Media, as well as Google, will also get somewhat less data with their more transparent strategy. But which model of business will still be around ten, twenty, fifty years in the future?

Please allow me to now further simplify by summarizing Darnton’s analysis:

> The Enlightenment represented the dawn of a new age of learning, built on the free-ish exchange of ideas in letters and books.

> The enlightened founders of the United States limited copyright to 28 years, recognizing the necessity of both protecting authors’ rights and advancing public knowledge. Life expectancy was much shorter then, but a young author could reasonably expect his or her book to fall out of copyright within his or her lifetime.

> The 1998 Sonny Bono Copyright Term Extension Act extends copyright to the life of the author plus seventy years. That means the books now entering the public domain date roughly to the 1920s, and all the authors are dead.

> Google has been digitizing millions of books. Some of them are in the public domain, some are still copyrighted, and the largest portion are copyrighted but out of print, and therefore largely out of reach.

> A draft settlement between Google and publishers promises to bring the texts of these books to the people, at low cost (at your home computer) or no cost (at public and university libraries which purchase a license). This archive could quickly become the world’s largest library, bar none.

> This exciting archive could represent a Digital Republic of Learning that would have made Diderot (editor of the great eighteenth-century Encyclopédie) salivate.

> While there have been some similar efforts by not-for-profit groups like the Open Content Alliance, Google Books will eat their lunch.

> The draft agreement between Google and publishers has problems: libraries would be limited to a single computer terminal with access to the archive, and users would have to pay to print copyrighted material.

> The biggest problem, however, is this:

> “What will happen if Google favors profitability over access? Nothing, if I read the terms of the settlement correctly. Only the registry, acting for the copyright holders, has the power to force a change in the subscription prices charged by Google, and there is no reason to expect the registry to object if the prices are too high.”

It’s interesting to consider this scenario. In the short life of Google, most criticism has come from a smallish cadre of geeks. Under different management, could the company ever do anything to make your mom mad?

Everyone is in a tizzy with the news that Google is slashing its data-retention period from 18 months to nine. To be more specific, Google will “anonymize IP addresses on our server logs after 9 months.” The announcement, though, only highlights for me the lack of clarity around the word “anonymize” and the general lack of information about what these data retention policies are actually doing for users’ privacy.

Data retention is a big issue for some privacy advocates, on the theory that something like the AOL privacy scandal wouldn’t have happened if AOL hadn’t been storing the search queries to begin with. But as we’ve stated before, we at CDP don’t think data deletion is the answer. In fact, we’re concerned that announcements like the one today from Google can actually further confuse consumers about what’s at stake.

To begin with, Google isn’t promising to delete its data after nine months, just to “anonymize” it. The company knows that the word “anonymize” can mean quite a lot of things, and even says so: “We haven’t sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months…”

Google is being prodded by the European Union’s stricter regulations around privacy, but even the EU directive on data retention only states, “Such data must be erased or made anonymous when no longer needed for the purpose of the transmission of a communication, except for the data necessary for billing or interconnection payments.” No clear directive on what “made anonymous” means.

When AOL made its search query data public, the company thought it had “anonymized” it. Same when Netflix released its data. That didn’t stop people from identifying individuals in the “anonymized” data sets. I trust that Google’s engineers are not using AOL’s and Netflix’s “anonymization” techniques, but it’s clear that focusing so much on the length of time data is retained draws attention away from what happens after the nine months are up.
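To make concrete why “anonymize” is such a slippery word, here is a minimal sketch, my own illustration rather than Google’s actual method, of two common ways companies “anonymize” IP addresses in logs. Both sound protective, but each leaves something a re-identification effort could use:

```python
import hashlib

def truncate_ip(ip: str) -> str:
    """Zero out the last octet. Up to 256 users collapse into one bucket,
    but the /24 network (often a small geographic area or single ISP
    segment) is preserved."""
    octets = ip.split(".")
    octets[-1] = "0"
    return ".".join(octets)

def pseudonymize_ip(ip: str, secret: str) -> str:
    """Keyed hash. The token looks random, but the same IP always maps
    to the same token, so all of one user's queries remain linkable."""
    return hashlib.sha256((secret + ip).encode()).hexdigest()[:12]

ip = "203.0.113.42"  # example address from a documentation range
print(truncate_ip(ip))               # 203.0.113.0
print(pseudonymize_ip(ip, "key"))    # stable pseudonym, still linkable
```

Truncation preserves rough geography; a stable hash preserves linkability across a user’s entire search history. Neither deletes what made the AOL data set so revealing, which is exactly why a promise to “anonymize” after nine months says little on its own.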

Cuil, the new search engine, launched with much fanfare this past week. It’s been blogged about all over the place already, so I’m not going to analyze how its results compare to Google’s. I’m more curious about its privacy policy, which trumpets that it collects NOTHING, nada, zip, zilch.

The two news items together highlight the problem at the heart of our ongoing search for more privacy online. Despite all the handwringing over online data collection, especially by big search engines, people love seeing the data that gets collected, even when they’re not advertisers. We want to see how often we’re mentioned in Twitter, or what parts of the world are searching for topics we blog about. It’s not hard to imagine more serious research and analysis being applied to this data and real social good coming out of it.

I’ve never found very compelling the National Rifle Association’s argument, “Guns don’t kill people; people kill people.” But I find myself wanting to say something similar about data collection: “Data collection doesn’t violate privacy; irresponsible people and laws violate privacy.” Shutting down data collection altogether can’t be the answer.

“It’s what we do. Our corporate mission is to organize the world’s information and make it universally accessible and useful. Health information is very fragmented today, and we think we can help. Google believes the Internet can help users get access to their health information and help people make more empowered and informed health decisions. People already come to Google to search for health information, so we are a natural starting point. In addition, we have a lot of experience storing and managing large amounts of data and developing consumer products that offer a positive and simple user experience.”

I thought their mission, as a corporation, was to maximize profits for their shareholders.

The answer to Question #6 is even worse:

“Much like other Google products we offer, Google Health is free to anyone who uses it. There are no ads in Google Health. Our primary focus is providing a good user experience and meeting our users’ needs.”

But we all know that “other Google products” that are free make money through advertising. And there are “no ads in Google Health”?

In launching Google Health, Google has clearly acknowledged that health information is even more sensitive than the personal information the company has been assiduously collecting up to this point. Although it glosses over the differences between its other applications and Google Health, promising to “conduct our health service with the same privacy, security, and integrity users have come to expect in all our services,” the mere fact that it doesn’t have advertising trumpets that Google is trying to differentiate Google Health from something like Gmail.

But the harder Google tries to assure me that there is no advertising and that the service is free, the harder it is for me to believe there are truly no costs to me. Clearly, there is a real value to providing secure online access to personal health records. Medical records, for the appropriate people, should be accessible, transferable, and plain legible, as anyone who has tried to read a doctor’s handwriting can attest. So why would someone give me something for nothing?

According to the Wall Street Journal, Google is not ruling out advertising in the future, and in the meantime, it hopes Google Health will simply drive more users to Google in general. Perhaps Google itself doesn’t quite know where Google Health will go. But given how easy it is to imagine nightmare scenarios of what can happen with this kind of information, I want the company that’s collecting and storing it to have a better story about why it’s doing this.

Even as Google has become the most coveted place to work, to the extent that even its cafeteria gets media coverage, it’s also getting increasingly negative attention as a potentially sinister force. The New Yorker recently published an article with rather vague speculation about the way Google might take over the world. Now, we hear that Microsoft is trying to buy Yahoo so they can together fight Google. (Isn’t it funny that Microsoft is seeing another company as the big, bad world-dominator?) More and more, people are starting to wonder, “What exactly is Google up to?”

But given that we can’t read the minds of Sergey Brin and Larry Page, perhaps what we should be looking at is the conflict of interest inherent in Google’s business model. Google’s stated mission as a company is to organize the world’s information and make it universally accessible and useful. But are Google’s customers really the individuals searching for information, or are they the advertisers who actually increase Google’s revenues and stock value? To be fair, Google makes a respectable effort to separate advertising from “legitimate,” as in “non-jerry-rigged,” search results. But after ten years, the Google search experience is pretty much the same as it’s always been. Has Google been working really hard on tools to help people find better information faster, or has it been working really hard on tools to help advertisers better target potential customers?

Google doesn’t have to be evil to be troubling. It may have started out with the purest of intentions, but it’s hampered itself with the conflict of interest at the heart of its operations. Law professor Tim Wu, as quoted in the New Yorker, said it straight: “I predict that Google will end up at war with itself.”