Google: The Ten Years Stories

In the past two weeks nearly every press outlet on the planet has called me asking for thoughts on where Google is going and how Google got to where it is. The reason? Google turns 10 years old, according to most estimates, this weekend.

I’ve talked to as many folks as I can (after all I was a journalist covering technology for quite some time) but I did have to turn down a few given how busy life gets after the summer holidays. In any case, I’ll post links to all the Ten Year stories I find here (not just ones I’m quoted in!), starting with the Daily Telegraph in London:

29 thoughts on “Google: The Ten Years Stories”

Congratulations to Google. It’s an awesome company. I love their products and I am happy that this company exists, since its innovations are a great benefit to humankind. In addition, I can’t understand the hysteria about its alleged efforts to dominate the world, since every person is free to decide whether or not to use Google services.

I remember when I grew up in the NYC metropolitan area there was a radio station that espoused: “dare to be different”.

Nowadays, it seems like the tech industry will let you wear any color t-shirt you like — as long as it’s black. ;D

But like Robin Williams (quoting Robert Frost), we should dare to take the “road less traveled” — even if that appears to be “the road not traveled”:

This used to be not only possible in the United States, but it even used to be the hallmark of “freedom” that made the United States special. Now, it seems that Americans from Benjamin Franklin to Milton Friedman would turn in their graves to hear people say that they are not free to choose which road to take.

Presently, people seem to act like the only way to get information out of the Internet is to go by one particular Bible.

One algorithm from one source — well, maybe one source for text, another for audio, and one for video.

People talk about the “network effect” — and thereby support the idea that a larger catalog is better than a smaller catalog.

However, this is probably false, for the same reason that a full-text search of the Library of Congress (a library of deposit, but nonetheless for the “purposes” of Congress) is a poor way to find an article about oncology. It might make more sense to “search” the NLM for such information, or even better an abstracting / indexing service focused on oncology (and therefore using the access vocabulary used in the field of oncology).

Using a “one-size-fits-all” algorithm for comparing appliances with hotels simply doesn’t make any sense whatsoever. Nonetheless, that is what Google wants, and for some reason people seem to feel that they have “no alternative” but to use one (“universal”) algorithm from one source, rather than to use different algorithms for different kinds of information.

It is in fact a grotesque aberration of what the Internet is all about (namely a decentralized, distributed network). Instead of using the Internet the way it was designed (i.e., as a multitude of narrowly targeted information sources), people are behaving as if they have no choice but to accept what one single media empire is “dictating” (e.g., see also http://battellemedia.com/archives/004598.php ).

Presently, people seem to act like the only way to get information out of the Internet is to go by one particular Bible. One algorithm from one source

Ok, got it. I could not agree more. It’s scary, the way people don’t think twice about the diversity of their information sources.

What gives our liberal culture strength is not that my Bible (or Google) is better than your Bible (or Live or Ask or whatever), but that we have the ability to understand and engage with multiple Bibles.

You want to know, frankly, what makes me tired? The fact that Web Browsers ask for search engine defaults. Even the new Google Chrome. Why should I have to pick a default? Why should there even exist a default? There shouldn’t.

My web browser should be able to send my query out to 10 different search engines, simultaneously, and then synthesize the results. Or show me the results of one, but show me how different (or diverse, unique, standard, commonplace, etc.) the results were from one engine to the next. Basically, the web browser should be able to give ME real-time statistics and feedback on the quality of multiple search engines, based on my own browsing behavior.

Picking, or having to pick, a default is similar to what you say, about one bible to rule them all. It’s a broken model, a broken architecture. And most of the tech industry is happy to propagate it, rather than really innovate, because they can’t imagine anything else.
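To make the multi-engine idea above a bit more concrete, here is a minimal sketch of what the "synthesize and compare" step might look like once a browser already has ranked result lists back from several engines. Everything here is an assumption for illustration: the engine names and URLs are made up, actually querying the engines is left out, and reciprocal rank fusion and Jaccard overlap are just one reasonable choice each for merging and for measuring how different two engines are.

```python
from itertools import combinations


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked URL lists: each engine adds 1 / (k + rank) to a URL's score."""
    scores = {}
    for results in rankings.values():
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically so output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


def jaccard_overlap(a, b):
    """Share of URLs two result lists have in common (1.0 = identical sets)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(rankings):
    """Overlap score for every pair of engines, keyed by (engine, engine)."""
    return {
        (x, y): jaccard_overlap(rankings[x], rankings[y])
        for x, y in combinations(sorted(rankings), 2)
    }


# Hypothetical top-3 results from three hypothetical engines.
rankings = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u2", "u5", "u1"],
}

merged = reciprocal_rank_fusion(rankings)
overlap = pairwise_overlap(rankings)
print(merged)
print(overlap)
```

A low overlap score between two engines would be exactly the kind of feedback described above: a signal that the engines disagree on this query and that looking at only one of them hides something.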

Well, AFAIK, there are about 1×10^100 information sources possible per TLD (give or take a couple zeros 😉). That’s a whole lot. Seems to me like a couple hundred thousand should suffice, but you’re right: 1 is rather “parochial” (and that’s why I don’t pay any attention to that, because I am free to choose!).

I think where we disagree, nmw, is that I still see the need for general purpose search engines. For example, what if my information need is to find cultural and historical influences for the Czechoslovak Velvet Revolution in 1989 — to find actual evidence for these influences, not simply to find some wikipedia page?

Would I want a history search engine vertical? A music search engine vertical? A philosophy vertical? A theatre/playwright vertical? A political vertical? Some other keyword vertical? No. I want something general purpose that spans all types of knowledge. Done in a better way than Universal Search, because that is just a glomming of computer-data-type verticals, rather than topical verticals.

But there is still a need for a search engine that is general. Not for shoes, not for houses, but for really seeking to understand the world.

But again, Google itself doesn’t do that. They are a navigational-based search engine (find the “Home Depot” home page), rather than an informational-based search engine. They really are not set up to support my Velvet Revolution information seeking behavior.

What, my information need (about the Velvet Revolution) sounds like scifi to you? That should not be the case at all. It’s what 40+ years of Information Retrieval as a field of study has been designed to do. It’s only in the last 10 years (happy anniversary) that the web has really turned its back on that sort of information seeking behavior, and left those of us wishing to find that sort of information high and dry.

Nick Carr had a recent article on the question of whether Google is making us stupid. My answer to that is yes.. because it only supports one type of information seeking behavior, and one type alone: Surface skimming of information. Top 10, and then move along, nothing to see here, buddy.

It doesn’t have to be like that. The search engine itself can be designed to elicit deeper engagement with the information, deeper exploration, deeper understanding. That’s not scifi. That’s a conscious decision not to implement the entirely-possible

But there is still a need for a search engine that is general. Not for shoes, not for houses, but for really seeking to understand the world.

Let me clarify: There is still a need for the shoes and houses vertical. I hastily said there wasn’t, but it’s not what I meant. I do agree with you that we need more of those sorts of things.

I am saying, though, that we *also* need some general purpose engines, too. And not just keyword verticals. Those general engines should allow us to go deeper on general knowledge.

But what we don’t need, and where I think we agree, is a search engine that tries to be both a shoe vertical, and a general informational engine, and really ends up doing neither that well at all. What ends up happening is that the search engine turns into a Home Depot-finding engine. And defaults to Wikipedia on everything else.

I’m not saying that the computer has to *make up* the answer. I’m just saying that the computer has to help me *find* the answer. Or, find the set of pages, which contain the snippets, which snippets when assembled allow me to synthesize the answer.

But the main issue here is that your information need is not navigational. You’re not trying to find the one home page for some known item. You’re actually trying to find out.. you are searching for.. information related to some topic.

This is what information retrieval (“search”) is all about. Finding information that is relevant to your need. It’s not scifi. It’s what lots, and lots of people are doing. There are whole hosts of techniques, both algorithms and interfaces, that exist to deal with that sort of question. Has Google ever given us *any* of those algorithms or interfaces? No. Instead, they would rather tweak whitespace.

Here is another example: Suppose I am a lawyer in the Enron case, and I want to find emails from that Enron email data set that show evidence of individuals knowingly and actively engaged in market manipulation.

If I were to slap a Google Enterprise appliance onto that data set, how well do you think I would do in searching for that information? Do you think I could just type “market manipulation”, look through results 1..10 and be done? Do you think the individuals in question are actually going to use that phrase in their emails? (“Hey, Barry, it’s Jim. How is it going with your market manipulation today? Wanna grab some lunch later on?”)

No, the Google Enterprise device would absolutely and utterly fail on this query. And on every other single query of that same nature. Because they design their systems to only support a single type of information seeking behavior: Navigation. They do nothing to support informational-oriented needs.

So this stuff does exist. It’s not scifi. It is still general-purpose search, and not shoes or houses verticals. There also exists real consumer need for support of these types of queries (Andrei Broder, in his 2002 paper, estimated that between 35% and 50% of all queries were of this “informational” type.)

And yet there has been a consistent, 10-year-long failure to deliver systems that support this type of information seeking behavior. Happy birthday, indeed. Yay.

If I were to slap a Google Enterprise appliance onto that data set, how well do you think I would do in searching for that information?

Continuing with this line of reasoning: Do you think Google would ever pop up a “one box” for queries springing from this Enron-related information need, i.e. a “market manipulation onebox”?

The very idea of a “market manipulation onebox” is ridiculous, isn’t it? Because oneboxes are designed toward surface skimming information seeking behavior. They’re designed to support navigational queries. Rather than topical, informational queries.

But oneboxes are all we get from Google, by way of interface innovation that supports our searching behaviour. That’ll never help me find the emails I need to find, in my “market manipulation” information need.

Or spell correction. Google has been bragging for years about their spelling correction algorithms and suggestions, as if they were the be-all and end-all of information seeking interface and algorithm design, the pinnacle of search engine achievement.

So now, again, suppose I have this Enron-related information need.

If I type in the query “market manpiulation”, will Google do any better in helping me find the information I seek? At best, they’ll say “Did you mean ‘market manipulation’?” and then still get me results 1..10 filled with nothing relevant.

What good is spelling correction, when the underlying system still doesn’t support the type of query I’m issuing?

If I type in the query “market manpiulation”, will Google do any better in helping me find the information I seek? At best, they’ll say “Did you mean ‘market manipulation’?” and then still get me results 1..10 filled with nothing relevant.

I am a trained information scientist and have been doing research on “natural language” information retrieval for almost 2 decades, so I know very well what you are talking about.

In one of my earlier research focuses, I studied standardization of document type description (this was before the Dublin Core, and it was primarily focused on corporate records management).

When you say “email”, then that is a “type” of record (and it has different content — both WRT information and also WRT evidence). The “meaning” of each document format has been traditionally a field of study which spans that of the historian and that of the legal scholar — only recently have information scientists (and also computer scientists) become more involved in this field (which has traditionally been referred to as “diplomatics”).

This field remains one of the most promising areas for vast advances in information retrieval — but it is also an enormously complex area. To give you an example of this: If I send an “email” via my email software and ISP, is that the same type of message as if I were to send the same message using the GMail service? Would it be interpreted as the same in a court of law? How is GMail different than, say, “text messaging” via skype? Why would a researcher be interested in restricting the search to “emails”?

Perhaps Google is doing research in this area — but I doubt it (because AFAIK, they are primarily employing software engineers, rather than people proficient in information science, information retrieval and/or archives management).

All in all, I think you are expecting Google to come up with “solutions” for something they have very little understanding of.

I am a trained information scientist and have been doing research on “natural language” information retrieval for almost 2 decades, so I know very well what you are talking about.

Oops! Sorry to have lectured you on something you already know. It’s just that so few people on the web.. bloggers.. average users.. even lots of Googlers.. really know or understand or have even thought about lots of these issues.

When you say “email”, then that is a “type” of record (and it has different content — both WRT information and also WRT evidence).

Content and evidence of an email might be different from, say, a corporate memo, or a press release, or a web page, or whatever.

But no matter what the content type, there is still a need.. across emails, across internal memos, across press releases, inside of web pages, to find information relevant to one’s need, such as whether or not that email, memo, etc. is somehow related to the “market manipulation” topic. It’s not all just about homepage finding.

The same is true of the web in general. Sometimes you really need to find information related to a topic, such as the Velvet Revolution, rather than just find the home page of the Velvet Revolution. Is there even such a thing as a home page for a 19-year-old political movement? No. That’s why there need to be information retrieval systems to help you find/retrieve all the information that is scattered across the web.

This field remains one of the most promising areas for vast advances in information retrieval — but it is also an enormously complex area.

I agree. But isn’t this what Google is doing? Don’t they claim to be working on solving some of the toughest problems in the search industry? Aren’t they claiming to constantly innovate? Isn’t 70% of their 20,000-person workforce (14,000 people) working on these problems? That’s a whole helluva lot of brainpower. Most of which seems to be spent tweaking whitespace 😉

Perhaps Google is doing research in this area — but I doubt it (because AFAIK, they are primarily employing software engineers, rather than people proficient in information science, information retrieval and/or archives management).

I have a friend who is also an information scientist. This friend recently gave a talk at Google, and told them (explained to them) all about relevance feedback.. which is a decades-old idea that you must of course be aware of. My friend said that 95% of the people in the room had no idea, had never heard, of relevance feedback.

So you must be correct.. they aren’t employing people who can make search better. They’re employing software engineers, who know more about threading and memory management than they do about information seeking, retrieval, and organization.
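For anyone who hasn't run into relevance feedback before, here is a minimal sketch of one classic formulation, the Rocchio update, in which the searcher marks a few results as relevant or not and the system reweights the query accordingly. The weights (alpha, beta, gamma) are commonly cited textbook values and are an assumption here, as are the whitespace tokenizer and the invented document snippets, which are not taken from any real data set.

```python
from collections import Counter


def term_vector(text):
    """Very rough bag-of-words vector: lowercase, split on whitespace."""
    return Counter(text.lower().split())


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    new_query = {term: alpha * weight for term, weight in query.items()}
    for docs, sign_weight in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                # Each document contributes its share of the group's mean.
                new_query[term] = (
                    new_query.get(term, 0.0) + sign_weight * weight / len(docs)
                )
    # Terms pushed to zero or below are dropped from the new query.
    return {term: w for term, w in new_query.items() if w > 0}


# Invented snippets standing in for results the searcher has judged.
query = term_vector("market manipulation")
judged_relevant = [term_vector("shift load california prices")]
judged_non_relevant = [term_vector("market lunch menu")]

expanded = rocchio(query, judged_relevant, judged_non_relevant)
print(sorted(expanded.items(), key=lambda item: -item[1]))
```

The point of the toy example is that the expanded query picks up words the relevant document actually uses, even though nobody in it ever writes “market manipulation”.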

But this is the root of my constant questioning and clarification seeking: If Google’s goal is to organize the world’s information, why aren’t they doing it? Why are they instead developing operating systems for mobile phones? I wish someone would say why.

IMHO: At one moment in time, they had a good idea (a link-based search engine). But we’re already past that moment (even Google says the links don’t work, so they manipulate the results and have also come out with a special “pretend” tag for links that are supposedly meaningless, “nofollow”).

Google’s time has passed: the only reason people pay attention to them is that they have trained millions of people to click on links for them (and advertisers still think this is worth spending money on).

I agree. But isn’t this what Google is doing? Don’t they claim to be working on solving some of the toughest problems in the search industry? Aren’t they claiming to constantly innovate? Isn’t 70% of their 20,000-person workforce (14,000 people) working on these problems? That’s a whole helluva lot of brainpower. Most of which seems to be spent tweaking whitespace 😉

John, in your book, didn’t Brin or Page claim that search was only 5% solved? Or maybe 10%? Somewhere around there? And when did you do those interviews? 2003-2004?

How did Google go from 5% to 95% from 2003 to 2008? Did they really make that huge of a leap forward? Considering that from 1998 to 2003 they went from 0% to 5%.. the huge leap in relevance that the initial Google engine offered.. I find it very, very hard to believe that they’ve done the other 95% – 5% = 90% over the past five years, and none of us have really noticed!