
Echoing other conversations I’ve seen around the web (wait for 12/14 to post), Scott pings the Big Concept: while the folks building Google now are clearly well intentioned, we are together creating an asset in Google that may someday be out of our collective control. In Salon’s blog, Scott Rosenberg comments:

But Google is a public company. The people leading it today will not be leading it forever. It’s not inconceivable that in some future downturn Google will find itself under pressure to “monetize” its trove of books more ruthlessly.

Today’s Google represents an extremely benign face of capitalism, and it may be that the only way to get a project of this magnitude done efficiently is in the private sector. But capitalism has its own dynamic, and ad-supported businesses tend to move in one direction — towards more and more aggressive advertising.

Since we are, after all, talking about digitizing the entire body of published human knowledge, I can’t help thinking that a public-sector effort — whether government-backed or non-profit or both — is more likely to serve the long-term public good. I know that’s an unfashionable position in this market-driven era. It’s also an unrealistic one given the current U.S. government’s priorities.


Charles Ferguson writes a lengthy and clearly considered piece on Google for Tech Review, focusing on the Microsoft angle and concluding that the only way Google can truly “win” is by controlling a new architecture of computing through the time-honored approach of proprietary APIs. Ferguson argues that the search wars are about to enter a major battle for control of the standards that simplify the increasingly heterogeneous world of search, and that in such a battle, Microsoft is far better suited.

I enjoyed reading this piece, and I am sure I will read it again and again, to more fully consider its argument. But I find myself disagreeing with the premise – why, in this world of the web, do we need to be bound by this winner-takes-all approach to the world? It works in a resource-constrained world of homogeneous PCs – once a consumer has purchased his Windows box, he’s not going to easily purchase an emerging competitor – but somehow, it really doesn’t strike me as the right metaphor for a Web 2.0 world. I do agree that Google would be well served to make its service more of a platform, and that APIs are the way to go. But I’d really be interested in what Tim O’Reilly has to say about this piece, or Tim Bray, or any number of other folks. I’ll keep my eye out…meanwhile, do read the piece. It’s a worthy provocation.

Some folks have been calling me and together we’ve been pondering the implications of the Google Print announcement. And one drop dead obvious thing dawned on me during the conversations.

This is so obvious as to be almost embarrassing to restate, but this program marks a major departure in Google’s overall approach to search. After all, what has been the presumptive model till now? If it’s on the web and publicly available, it’s in the index. That’s why we called it web search, after all. But Gary Price and Chris Sherman, among many others, have reminded us how vast and darkly lit the invisible web is – all that information trapped in the amber of password-protected databases, or crumbling film libraries, or ….books.

Now other companies have taken significant steps toward illuminating these dark corners of the world’s knowledge web – Yahoo with its CAP program, Amazon with A9 and Search Inside the Book. And Google has long claimed that its mission was to go beyond the web and crawl the world’s information, wherever it lay.

But Google was, until now, the world’s purest web search engine. What, I wonder, are the implications of tens of millions of book pages entering this once pure space? (Google has announced that the results will be included in the index, not separated out in a vertical book search engine.)

Why am I on about this? Well, it comes down to the essence of what – so far – has made Google Google: the ranking paradigm. Here’s a sketch from the book I am working on:

In essence, academic publishing is a flawed but useful system of peer review incorporating ranking, citation, and annotation as core concepts. Fair enough. So what?

Well, in short, it was Tim Berners-Lee’s attempt to address the drawbacks of this system (through network technology and hypertext) that led to his creation of the World Wide Web (4), and it was Larry Page and Sergey Brin’s attempts to make Berners-Lee’s World Wide Web better that led to Google.

Which brings us back to Page, and his original research work focusing on backlinks. He reasoned that the entire web was loosely based on the premise of citation and annotation – after all, what was a link but a citation, and what was the text describing that link but annotation?

The point I’m making is this: Google was born of, by, and in the web, as an extremely clever algorithm which noticed the relationships between links, and exploited those relationships to create a ranking system which brought order and relevance to the web. Google’s job was not to build the web, its job was to organize it and make it accessible to us.
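For the curious, here is a minimal sketch of that link-as-citation idea. It is a textbook-style power iteration, not Google’s actual implementation, and everything in it is an assumption made for illustration: the function name `pagerank`, the damping value of 0.85, the iteration count, and the four-page toy web. Note what happens to a page that nobody links to: it gets only the baseline score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by treating each link as a citation.

    `links` maps a page name to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three interlinked pages plus one page
# that links out but that no other page cites.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "unlinked_page": ["home"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.4f}")
```

In this toy web the heavily cited page ends up on top and the uncited page ends up at the bottom, which is the whole trick: the ranking comes from the links other people have already made.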

But all this new Print material, well, it’s never been on the web before. It’s Google who is actively bringing it to us. How, therefore, does Google rank it, make it visible, surface it, and, importantly, monetize it? If a philanthropist were to drop the entire contents of the Library of Congress onto the web, Google would ultimately index it, and as folks linked to the content, that content would rise and fall as a natural extension of everything else on the web. But in this case, Google itself is adding content to the web, and is itself surfacing the content based on keywords we enter. This is a new role – one of active creator, rather than passive indexer.

This means, in short, that Google is making editorial decisions about how to surface this new content, decisions it can’t claim are based on the founding principle of its mission – PageRank. Sure, there are straightforward keyword matching techniques, and over time the web will deep link those book pages – each page in Print has a unique URL. But really, the magic of what made Google Google – the existing link structure of the web – is entirely non-existent with these newly surfaced print pages. By extension, the same will be true for any new media brought into the index – be it movies, music, radio, television, photos, you name it. That’s why I’m so interested in what role Google will play in monetizing this content (see here and here) and why I am so fascinated with this media v. technology angle.

I guess the net net of all this is that this move by Google, which I think is monumental, marks a shift in who the company is in the world. It’s no longer simply an indexer of the world’s knowledge web. Google Print is a clear declaration that it’s a builder of it as well.

The NYT now reports on Google’s program to digitize some of the world’s most important libraries, and it is truly an amazing project. Google was founded at Stanford in partial association with that university’s digital library effort, so this must be a pretty proud day for Stanford, which is a participant, as well as the original Googlers. John Markoff spoke to Larry Page:

Mr. Page said yesterday that the project traced to the roots of Google, which he and Mr. Brin founded in 1998 after taking a leave from a graduate computer science program at Stanford where they worked on a “digital libraries” project. “What we first discussed at Stanford is now becoming practical,” Mr. Page said.

The details: Google is working with Stanford, the University of Michigan, Harvard, Oxford, and the New York Public Library to make millions of books available in its index. For now the project is in pilot phase, but there are hopes and expectations this will go big in the next few years. A source told me the project was originally named Google Library, but for now it will exist under the Google Print moniker. An example of Google Print is here. The screenshot at left is what I was provided by Google for today’s launch.

The implications here are significant. First, the idea that the world’s knowledge, as held through books and libraries, is opening up to all via a web browser cannot be overstated. It’s one thing to have an original copy of The Origin of Species on the shelves, where students and interested parties have to travel to find it. It’s another to have it available to everyone via a search index and your web browser. Second, this move clearly puts Google in the category of innovator when it comes to adding information to their index. But it also raises significant business model questions, ones that are both exciting and unanswered. I brought them up in an earlier post:

A very interesting case will be Google Print. As that program expands, and it’s rumored that it will, dramatically, a number of questions arise. How will Google monetize out-of-copyright books? If it indeed does bring tens of thousands of out-of-print books onto the web and into its index, will it allow others to access and index that new treasure trove, or will it act more like a traditional media company, which would “own” that resource for itself? How will it choose what it brings into the index – those that might sell? Those that somehow are the most “in demand” by some measurable standard? With regard to books that are in print, will it limit itself to being solely an organizational tool supported by AdWords, or will it start to take a vig for books that are sold via the Google Print service (in fact, maybe it does already and I’m simply unaware of it – any publishers out there, let me know!)? And will the print model scale to television and movies or music?

Google Print already monetizes a selection of in-copyright books via advertising, and shares some of those revenues with the publishers. But it’s a very short distance between that and, say, an affiliate link to Amazon or any other bookseller for a cut of an in-copyright sale. It’s also a very short route to the on-demand publishing of an out-of-print and out-of-copyright book with a company that is set up to do such a deal, and I am aware of at least one about to launch that will provide just such a service. Of course, if you want an ebook, that can be arranged as well. For out-of-copyright books, the tail is extraordinarily long, and quite possibly very profitable. In other words, this could well be a step toward diversifying Google’s revenue streams away from advertising and into direct sales and/or subscriptions – i.e., the content business. As one source who is familiar with the industry tells me, Google is not doing this only out of the kindness of its heart – there is a lot of money to be made in selling books, in particular books with no copyright.

I did ask Adam Smith, a manager of the Print program at Google, how Google will decide which books get scanned first. He said quite forthrightly that he did not have a good answer for me on that yet. I’ve heard from others that for now it’s pretty random, but the question is important. As to whether Google will allow anyone else to index the books they scan, I am pretty sure the answer is no. After all, Amazon is also scanning books, and I am sure they aren’t letting others in on their hard work. I’ll repost if that turns out to be inaccurate. And of course there are other efforts, including Project Gutenberg and the Internet Archive. But now, we have a commercial giant that has both a mission-based reason (organize the world’s information and make it accessible) and a commercially viable reason to bring this information to the world. As David Hayes, a copyright lawyer at Fenwick who worked on this deal and whom I’ve known from my own work with his firm, put it: “This will create a revolutionary new information location tool that should be a benefit to the whole world.” I for one applaud the effort – it’s an example of enlightened capitalism, and I hope it thrives.

Bill Gates has a Google thing. When I asked him about the search competition last summer, he turned on the sarcasm. “We’ll never be as cool as them. Every conference you go to, there they are dressed in black, and no one is cooler!” Clearly Gates’s dander was up, not only because the Google upstarts were eating his lunch, but they were press darlings as well. Behind the rant was a taunting subtext: watch me. Bill, you see, had been busy figuring how to get his lunch back.

I am under embargo* on the details of this until later tonight, but this just came to me, without my asking, from a very reliable source. To honor the embargo, I will reserve my analysis and thoughts for later, and simply reprint the text of a note which was sent to certain parties at Harvard today. It describes what has been called (by the NYT) “Project Ocean” – a pilot project to scan and make searchable the contents of some of the world’s most prestigious libraries. I went into some of the issues this raises toward the bottom of this recent post (where I talk about Google Print). For the entire text of the email, click on the extended entry. Snippets:

As all of us know, Harvard’s is the world’s preeminent university library. Its holdings of over 15 million volumes are the result of nearly four centuries of thoughtful and comprehensive collecting. While those holdings are of primary importance to Harvard students and faculty, we have, for several years, been considering ways to make the collections more useful and accessible to scholars around the world….

Harvard University is embarking on a collaboration with Google that could harness Google’s search technology to provide to both the Harvard community and the larger public a revolutionary new information location tool to find materials available in libraries. In the coming months, Google will collaborate with Harvard’s libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University’s extensive library system. Google will provide online access to the full text of those works that are in the public domain. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. As of 9 am on December 14, an FAQ detailing the Harvard pilot program with Google will be available at http://hul.harvard.edu….

If the pilot is deemed successful, Harvard will explore a long-term program with Google through which the vast majority of the University’s library books would be digitized and included in Google’s searchable database. Google will bear the direct costs of digitization in the pilot project….

* To be clear, my rule on embargoes is this: I promise not to report anything I’ve been told by the organization that requests the embargo until the embargo time, but if similar information comes to me through third-party sources, I will report that information. I will not, however, use that third-party information as an excuse to disclose any information still under embargo.

Look what comes up as an AdWord when you type in GOOG, which is the stock ticker for Google. Clearly, some enterprising real estate agent thinks the folks at Google are checking their net worth and thinking about buying a house near work….