This is wrong on so many levels it’s hard to know where to begin. Google doesn’t devalue things it touches. It increases their value by making them easier to find and access. Google increases your audience as a content creator, which is the most important asset you have. It takes a special kind of cluelessness to claim that something that increases your biggest asset “devalues” your business. Thomson’s mistake seems to be that he’s confusing “price” and “value” which is a bit scary for the managing editor of a business publication. Yes, the widespread availability of news may push down the price (that’s just supply and demand), but it doesn’t decrease the value at all. It opens up more opportunities to capture that value.

In a word, no. And he’s wrong on so many levels that it’s hard for me to know where to begin! But I’ll try.

He’s right that Google makes it easy to find a news article, but only in the limited sense that it’s easy to find if you’re explicitly looking for it. That’s only a marginal improvement on the pre-Google world. Google also makes it easy for readers to find commodity information on a particular subject–and frankly, the real innovation there is Wikipedia. Google has never made serious investments in supporting exploratory search.

Google doesn’t do much to help users appreciate the differentiation among competing sources for news–or for products in general. For users, this may achieve a satisficing outcome–with minimal effort, they obtain the gist of the news from good-enough sources. But for content creators, this is commoditization: because the interface de-emphasizes the differentiation, users perceive a set of undifferentiated choices.

Masnick complains that Thomson is confusing price and value, but in fact Masnick is confusing value with breadth of distribution. There are numerous examples where controlling distribution increases value: first-class seating, peer-reviewed publication, and even Google’s PageRank measure. In fact, to the extent that Google helps identify the best sources of information, it adds value. But Google destroys far more value by reducing the notion of value to a single, scalar (i.e., one-dimensional) measure.

By analogy, think of what has happened to the retail industry as comparison shoppers started using online aggregators to compare competitors on price, but not much else. Other dimensions of utility started to lose value–most notably, customer service. Retailers have suffered, and consumers suffer too, no longer able to make trade-offs based on the utility they assign to dimensions that they can no longer observe. What shopbots have done for retail, Google has done for everyone, but most of all for media.

One can reasonably ask why publishers don’t simply opt out of Google, using robots.txt to turn away Google’s crawlers. The answer is that they can’t unless their competitors opt out too. Google has lowered the value of content by persuading everyone, en masse, to offer packaging that masks the content providers’ differentiation. Like Wal-Mart, they’ve made consumers happy with lower prices, but don’t be surprised that some content providers are concerned about being strong-armed out of business (cf. Vlasic Pickles).
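For readers unfamiliar with the mechanics, here is a minimal sketch of what opting out looks like. The robots.txt content, the example.com URL, and the "OtherBot" crawler name are illustrative assumptions, not any publisher's actual configuration; the check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn away Google's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://example.com/news/story.html"  # placeholder URL
    print("Googlebot allowed:", can_crawl("Googlebot", story))
    print("OtherBot allowed:", can_crawl("OtherBot", story))
```

The technical step is trivial; the point in the text is that the business decision is not, since a publisher that opts out alone simply cedes search traffic to competitors that stay in.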

There’s no point in whining about it, and I commend media providers who are struggling to create value under such hostile conditions. I also know the media players have made many of their own mistakes to help get them into this pickle, not least of which was collectively giving Google so much leverage over them. But let’s dispense with the myth that Google’s gale of creative destruction is creating value for media providers. At best, Google is creating value at their expense.

47 responses so far ↓

I’ve read this post 3 times now, and at best, all I can figure out that you’re saying is that Google destroys value by preventing news publications from tricking people into focusing on them instead of others.

I’m sorry, but I’ve yet to see a market that is made better off by inefficiencies.

Making the overall distribution of information more efficient only increases opportunity for publishers, increasing the overall size of the market and then, yes, increasing value.

But it’s up to those publishers to capture that value. So, sure, I could see a very weak argument that it “decreases” value, but only for publications too stupid to figure out how to capture the increased market handed to them.

Mike, I appreciate the response, and I apologize if my argument is not clear. I’ll try to restate it more succinctly.

If I understand your argument correctly, it’s that Google makes it possible for news publications to achieve greater distribution, and that is intrinsically valuable.

But I think you’re ignoring a key part of the equation: Google’s approach to content aggregation and search encourages people to see news (and the world in general) through a very narrow lens in which it’s hard to tell things apart. The result is ultimately self-fulfilling: it becomes more important to publications to invest in search engine optimization than to create more valuable content.

Or are you suggesting that investing in content is stupid, and smart publications recognize that SEO is everything? I can’t believe that’s what you mean, but don’t you see that the market being “handed” to them comes with that very string attached?

Not to mention that Google has played a key role in creating an environment where very few people can charge for content, but instead have to rely on ad-supported models. It’s not that people don’t value the content; rather, they’ll happily take it from someone else for free–and they are mostly unaware of either the differences among news organizations (which admittedly are disappearing as the news business implodes and can’t afford to invest in better reporting / analysis) or the attention costs associated with being forced to look at ads.

I actually disagree greatly that Google makes people view things through a narrow lens — and I can speak from experience on this. Google does push up more *relevant* things to the top, as based on links in to them. In other words, it’s using the fact that people think a certain source is more valuable to make more people aware of that.

What that’s absolutely done for us is vastly increase the size of our audience, not because they all come via Google, but because they start by coming in via Google… AND THEN STICK AROUND.

People saying Google devalues the content seem to assume that Google search is the sole interface by which people read news. That’s just not true.

Finally, I disagree that Google “has created an environment where very few people can charge for content.” It’s just basic economics as to why few people can charge for content. It’s not Google, it’s efficient markets in action. I’ve been banging the free content economics drum since before Google even existed. It’s not Google. It’s an efficient market. Blaming Google is all about missing the opportunity.

I think we’re at least clarifying where we disagree. I have strong feelings about relevance, which I actually shared with Google recently (slides / video). But I suspect there we’ll simply agree to disagree.

I also concede that some people have benefited from Google. I give a fair amount of thought to SEO when I wrap up my blog posts, and I benefit as a result. Any zero-sum game has winners and losers.

As for your questioning my assumption that “Google search is the sole interface by which people read news”, you’re right, but only in that it’s not 100%. For the average media site, 61% of visits come from organic search results. That isn’t just Google, but it’s proportional to search engine market share, so Google’s 2/3 of the market translates to about 41%.
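The 41% figure is simply the product of the two shares quoted above. A quick check, assuming (as the comment does) that Google's portion of organic search traffic is proportional to its market share:

```python
# Figures quoted in the comment above; the proportionality is an assumption.
organic_search_share = 0.61   # share attributed to organic search
google_market_share = 2 / 3   # Google's share of the search market

google_share = organic_search_share * google_market_share
print(f"{google_share:.1%}")  # roughly 41% once rounded
```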

All in all, that doesn’t sound like a lot of people reading news by sticking around. In fact, it suggests that Google often takes people to places where they then decide not to stick around. I’m not sure whether that implies Google is commodifying content or simply sending users to low-value sites!

Interesting. Well, I can tell you that the number of folks coming to us via Google is more in the 20% range — so I’d like to think that the quality of our content makes more people stick around and come back on their own later. Google search sends a lot of traffic our way, but we get a lot more from RSS aggregators. But… who knows. 🙂

On the economics front… this one’s a bit out of date, but gives you a good picture:

It sounds like our blogs have similar traffic distributions, only that your numbers are probably a couple of orders of magnitude higher than mine. 🙂 Perhaps that is a measure of quality, or perhaps of the breadth of topics covered.

In my own case, I suspect that I don’t have to compete that hard on quality, since there really aren’t any comparable media outlets (MSM, blogs, etc.) that I’m competing with. That would certainly explain the value I create through differentiation, which Google is neither helping nor hurting.

It does make for an interesting conundrum, that Google ideally favors high-quality sites, and yet the highest quality sites are the ones where mostly people come back on their own. Is Google helping people discover those new sites? That’s not the case for me: I mostly rely on word of mouth from colleagues I trust–both offline and online. But I’ll admit the possibility that Google sometimes acts as a “discovery engine” for others–even though I feel their very approach to relevance is designed for known-item search rather than discovery.

In any case, thanks for the link about the economics of free. If I read it correctly, you are arguing that anything with zero marginal distribution costs should be made available for free. Not surprisingly, creators who face very non-zero fixed costs find the idea unappealing, and aren’t convinced that they can solve the problem by making all of their profits from selling goods or services that are inherently scarce.

I don’t think it’s “protectionism” to try to create scarcity in order to recoup fixed costs. In fact, such artificial scarcity is essential for many public goods, the easiest example I can think of being “selling” endowed chairs, building names, etc. to philanthropists. Non-profit institutions that depend on donors manage their inventory of naming opportunities carefully to maximize donations–often so that they can use those donations to provide free services to others.

Perhaps that’s totally different than what you had in mind. But my point is that removing “artificial” scarcity may actually destroy a public good. That’s why some people cling to seemingly obsolete business models like charging for content, even when technology and cultural pressure have them cornered.

Also, while it’s top of mind, I want to thank you for engaging me in this discussion here. It’s a real treat, even if neither of us is making much headway in persuading the other of our respective points of view.

To a neo-classical economist who subscribes to the concepts of marginal utility and the subjective theory of value, for example, “value” and “price” are essentially identical. If a consumer is not willing to pay any positive price for an article, then the only cost is his opportunity cost. Opportunity cost to one side, we might fairly say that the article has no value to the consumer. The point of mark-to-market accounting is that something’s value is well approximated by where it trades.

But I’m not an economist. They may know the meaning of the terms they use, but we non-economists often conclude that those tidy terms end up being the basis for oversimplified and boring conversations about some world that’s not terribly close to our own messy one.

I’ve long been a reader of Mike’s thoughts on free, including the modestly named “Grand Unified Theory On The Economics Of Free.” In that, he writes about rethinking our products and services in terms of their benefits. Focused on those benefits, Mike then recommends that we charge only for the scarce products and services while we give away the non-scarce ones (or, as Daniel notes, “public goods,” defined as being non-rivalrous and non-excludable).

Well, woe to the sellers of benefits that are expressed only in public goods! 🙂

But Daniel’s quite right, I think, that google de-emphasizes differentiation, boiling the news down to pagerank-derived relevance and decreasing its aggregate scarcity. Google takes pages–bundles of words and links–and flattens them into SEO projects. One reason that’s especially brutal to news articles is that, in addition to being a kind of non-rivalrous and non-excludable good, they’re also experience goods, meaning readers don’t fully know they’re worth reading till after they’ve read them. (Unlike a pair of pants, you can’t try before you buy.) Because news is an experience good, it would be really useful for users of news to be able to find articles and posts based on other criteria they care about. (That profound lack of information up front is decidedly *not* part of an efficient market, by the way.)

So, very many users may well come from google and stick around at high-quality sites like techdirt. And that number of users may even be larger than it would have been otherwise–perhaps if lycos or alta vista had stayed at the top. Let’s even say that google is the best at guiding the *most* users to techdirt. Even so, it’s not at all clear that google’s method is the best method for guiding to techdirt the users who are the *most likely* users to stick around. Anyhow, this is where exploratory search comes in and has the potential of allocating users to articles more efficiently. My sense is that there’s value in that.

I was just having this discussion with my wife. The problem, as I see it, is that–with Google (or any search engine)–I can only find the content I already know I’m interested in. I can search on, say, “stimulus” and find scores of good articles.

But if I’m reading a hard-copy newspaper, I’m not only reading the stuff I’m already interested in, I’m scanning each page, at least for the headlines, and maybe coming across something I had no idea was out there…and is still fascinating.

Case in point: Thursday, on Darwin’s birthday, the Philly Inquirer had a terrific article on the new research that is suggesting an evolutionary model for cancer. I’d never have found that on line because I wouldn’t have known to look for it.

Your point is well taken. But I hope you agree that you can also scan the pages of an online newspaper–there’s nothing unique about printed paper. And there’s no reason a search engine couldn’t offer you a richer interface for exploration.

For example, take a look at the online version of The Guardian, a leading newspaper in the United Kingdom:

Fascinating discussion — was Google the villain or just the first instrument here?

My POV here comes from having worked with a few large media cos in the early 2000s. I can say that very few saw the commoditization of similar, relevant news items, and many overvalued their brand enormously. Those who questioned brand value in an age of free content were gently replaced by those who worshiped the masthead, and believed every piece of their remnant ad stock was, in fact, premium.

Am also wondering (will this boot me from the Noisy Community, I wonder?) if Google in addition to becoming a verb, has come to define the search experience, so that the Guardian page Daniel links to above does not meet expectations for “results”?

You’ll have to try harder than that to get yourself booted from The Noisy Community!

I agree with you that media companies have their share of the blame in this mess. They are guilty of complacency and, as you said, of overvaluing their brands. In fact, their eagerness to be indexed by search engines may have, in part, been from overestimating the draw of their brand names.

Thus, at the very least, I concede that Google’s victims lined up for the slaughter, and they weren’t exactly exhibiting healthy lifestyles even before that. But that doesn’t change the fact that Google’s effect on them has been a devaluing one. I’m not trying to make a moral judgment here, but rather to establish the historical facts.

As for your point about the Guardian page, you’re right that it’s not the search experience as defined by Google. You can compare it to search at the New York Times, which is powered by Google:

Fascinating discussion — was Google the villain or just the first instrument here?

Perry: At first, Google was “just the first instrument”, as you say. But as time went on, and Google continued to not offer any sort of exploratory search, and continued only offering 10 blue links at a time, Google became (imho) the “villain”, as you say.

Go back and read what Daniel wrote, in the original blog post:

There are numerous examples where controlling distribution increases value: first-class seating, peer-reviewed publication, and even Google’s PageRank measure. In fact, to the extent that Google helps identify the best sources of information, it adds value. But Google destroys far more value by reducing the notion of value to a single, scalar (i.e., one-dimensional) measure.

So the answer is both: Google was the first instrument, but then became the “villain” (your word) by refusing to implement any kind of user/retrieval dialogue (either algorithms or interface) that goes beyond a single-dimensional relevance measure.

Google is a satisficing engine, not a search engine. A real search engine, a real information seeking and exploration system, would do so much more.

“Google Page Rank supposedly makes qualitative distinctions between content by measuring quantitative links to content, but in reality it doesn’t work that way–not enough of the time, anyway. I can see this from my own posts: sometimes I want to find a previous post of mine among the thousands that I’ve previously written. So I start digging through Google using keywords that I think will unearth the post. What I end up finding much of the time are my most popular posts related to those keywords, and often not the actual content I’m seeking. Given that some of my best content hasn’t necessarily been the most linked-to content, I struggle to find it.”

A good search engine should be transparent enough and interactive enough to allow us to turn off the “popularity” filter when we want to, so as to actually be able to find the good content that we want, when we need it.

It’s funny that so many arguments end up at bemoaning Google’s lack of transparency. It’s farcical that a best seller entitled “What Would Google Do?” praises Google for being transparent. And it’s wonderfully ironic that Matt Asay (who is one of the leading advocates of open source) and I are on the same side, defending the WSJ against the pro-Google onslaught.

I did see at least one commenter on Matt’s blog making a Google = Wal-Mart argument similar to the one I made here–not sure if he made it independently. In any case, it’s heartening to know that there are at least a few of us on this side of the arguments. Not that popularity implies the validity of the argument, but a bit of moral support makes me feel warm and fuzzy.

a) I know of quite a few publishers (magazines) that the internet (search) saved from extinction

b) creating good content and seo/sem/marketing on the internet in general are not in opposition: they go hand in hand, if link based rankings are just a popularity contest then providing good content helps your rankings

c) yes, there are ways search could work better for all people who don’t know how to properly use what we have available now, but

d) we can’t expect google or any other search engine to ‘evaluate’ content the way tribal elders, universities or publishers used to: remember, they want to ‘organize information’ not ‘create a repository of knowledge’

e) so what are the choices? either algorithmic (text/link based ranking) or editorially driven (‘exploratory’) search of Mahalo? is there any other option? (btw, I really like the Home Depot – if I remember correctly – sample you’ve mentioned)

Jacek, glad you made it here and watched the talk! I’m on my way to being an internet star!

In any case, your points are well taken. I’m sure some publications made the best of search as a distribution mechanism, and I know that SEO is not entirely independent of content quality–although I know equally well that it is a major distraction and expense for many content creators, as evidenced by the size of the SEO industry.

But I think you take a narrow view of exploratory search. It doesn’t have to be editorially driven. Moreover, as evidenced by Duck Duck Go (I swear they don’t pay me for product placement!), you can bootstrap on the already-available human-generated content, like Wikipedia.

The difference between best-first search and exploratory search is that the former aspires to deliver an optimal result ranking without consulting the user, while the latter aspires to maximize the user’s control through interaction. I think the latter is inherently a more robust approach, even if the former works well enough for the easy use cases.

The difference between best-first search and exploratory search is that the former aspires to deliver an optimal result ranking without consulting the user, while the latter aspires to maximize the user’s control through interaction.

Yes, I second that definition.

Or, to state what Daniel is saying in another way: One (of many) possible implementations of exploratory search would be to give users on/off control over the signals (aka features) that go into a best-first ranking. In this manner, users can customize the single, global best-first ranking into a multiplicity of ever-changing best-first rankings. And if they don’t like what they are seeing after switching one signal on or off, they can interactively toggle it and keep going.

So with the Matt Asay example above, he should be given enough control to turn off the pagerank signal, on this particular search.

In this way, you could still have a search engine that was algorithmically-driven. But you could give the user enough control to alter that algorithm, in real time, based on the user’s own information need. Matt Asay didn’t need the most popular articles, for that particular search. He should be able to say that.
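As a rough illustration of the on/off idea, here is a minimal sketch. The signal names (`text_match`, `popularity`), the scores, and the equal weights are made up for the example; nothing here reflects how any real search engine computes or combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    signals: dict[str, float]  # hypothetical per-document signal scores in [0, 1]


def rank(docs: list[Doc], weights: dict[str, float], enabled: set[str]) -> list[Doc]:
    """Best-first ranking that uses only the signals the searcher has switched on."""
    def score(doc: Doc) -> float:
        return sum(weights[name] * doc.signals.get(name, 0.0) for name in enabled)
    return sorted(docs, key=score, reverse=True)


# Two posts matching the same keywords: one heavily linked, one not.
docs = [
    Doc("Popular post", {"text_match": 0.6, "popularity": 0.9}),
    Doc("Obscure post", {"text_match": 0.9, "popularity": 0.1}),
]
weights = {"text_match": 1.0, "popularity": 1.0}

default_ranking = rank(docs, weights, enabled={"text_match", "popularity"})
without_popularity = rank(docs, weights, enabled={"text_match"})

print([d.title for d in default_ranking])
print([d.title for d in without_popularity])
```

With both signals on, the heavily linked post wins; with popularity switched off, the better textual match rises to the top, which is the kind of control the Matt Asay example calls for.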

So notice, the choice is not between 100% algorithmic and 100% Mahalo. In fact, I think that algorithmic vs. user-ranked is an orthogonal dimension to what we’re talking about here. The more important distinction, imho, is between “user knows best” (exploratory search) and “search engine designers know best” (Google).

I think you can have exploratory search that relies either on algorithmic or user underpinnings. And I think you can have a “we know better than you do” Google approach that also is either algorithmically- or human-grounded. (For an example of the latter: It’s basically Mahalo with no user interaction/feedback.) The key distinction here is as Daniel says: How much control is given to, and how much interaction is elicited from, the searcher? We need to start getting away from the world where Google is everyone’s nanny.

Sorry, let me restate that last paragraph, using terminology that is a little clearer. I’ll italicize my changes:

I think you can have exploratory search that relies either on algorithmic or human underpinnings. And I think you can have a “we know better than you do” Google approach that also is either algorithmically- or human-grounded. (For an example of the latter: It’s basically Mahalo with no searcher interaction/feedback.) The key distinction here is as Daniel says: How much control is given to, and how much interaction is elicited from, the searcher? We need to start getting away from the world where Google is everyone’s nanny.

Well, maybe we agree more than disagree after all. Google is not 100% algorithmic, nor is Mahalo fully editorial (if it wasn’t for Google, Mahalo probably wouldn’t know how to come up with their content).

a) @Daniel “But I think you take a narrow view of exploratory search. It doesn’t have to be editorially driven.”

I really have no idea how it could work without editors. Based on what? ‘Wisdom of folks’, ‘semantic web’ data-about-data, natural language processing? These things don’t work, can’t work unless you employ lots of statistics, which means you are implementing pattern recognition, machine learning etc algorithms.

b) “The difference between best-first search and exploratory search is that the former aspires to deliver an optimal result ranking without consulting the user, while the latter aspires to maximize the user’s control through interaction.”

Excellent definition: yes, there is a need to ‘put queries back into some kind of context’, but is this about search? or just user interface letting her de-de-contextualize the search phrase? (this is what I believe G is trying to do with ‘personalization’ of search, again, approach taking control away from the searcher). I’m in favor of ‘more power to the user’, letting me switch on/off ranking factors. At the same time I’d like to see some kind of filter that would save me from going through 100s of pages of BS (and here again, G is the best we have, and it isn’t good enough, not even close)

@Jeremy “I think you can have exploratory search that relies either on algorithmic or human underpinnings.”

Exactly, so are we talking about search itself or the interface features? If it means we just clarify the query (using ‘exploratory search features’), but the query is still some kind of a text object (with all synonymy and private ‘search history’, and ‘other-folks-search-and-bounce-history’ etc) we are not doing anything different unless there is ‘human editor’ involved somewhere in the process.

d) But if we reduce ‘the exploratory thing’ to ‘intention clarifying interactions with search engine’, then how could it help (or do any damage to) the value of things touched (or un-touched) by Google? Don’t really want to say that, but wouldn’t it just make more opportunities for smart ‘seo’? And the transparency thing: G can’t be any more transparent about their algorithms, that would ruin everything for the user (yes, again, thanks to ‘seo’s, the greedy ones).

Well, maybe we agree more than disagree after all. Google is not 100% algorithmic, nor is Mahalo fully editorial (if it wasn’t for Google, Mahalo probably wouldn’t know how to come up with their content).

Yes, true.. but that’s still not the important distinction that I (and I think also Daniel, though I don’t want to put words in his mouth) am drawing here. How much algorithm vs. how much human organization is, I think, relatively independent of the real issue, which is the question of how much control the searcher is given, to change, alter, negotiate with, refactor, disagree with, or amplify whatever it is the search engine gives him or her by default.

So whether that default is based on algorithmic organization (Google) or on human organization (Mahalo), that doesn’t matter so much, from my perspective. The point is: Can the searcher take whatever is given to them and do something unique with it?

Exactly, so are we talking about search itself or the interface features?

The search or the interface? I think that is a bit of an unnecessary dichotomy. In order for there to be an interface that does anything, it has to be supported/supportable, somehow, by the back end. And if the back end allows any kind of dialogue/refactoring, that has to be reflected somehow in the interface. So I think my answer is: both.

For example, a week or two ago I talked about a Yahoo research labs project, in which you, the searcher, were given a slider widget after you had gotten the results to your text query. At the poles of the slider were two categories/genres: “commercial” and “non-commercial”. As you moved the slider one direction or the other, your results list would automatically and dynamically re-arrange itself, to push the commercial results down and the non-commercial links up, or vice versa.

Now the brains behind that slider could have been implemented either algorithmically or by humans. Yahoo could have had some machine learning algorithm doing commercial vs. non-commercial binary classification of every web page. On the other hand, it could also have been human-driven.. meaning that the classification was based on human(s) decision about what type of page it was. Either way, that doesn’t really matter. What matters is the search engine then exposes that information to the searcher in some way, and lets the searcher fiddle with it.
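To make the slider idea concrete, here is a minimal sketch. It is not the Yahoo implementation, which is not described in any detail here; the `commercial` score, the `strength` parameter, and the scoring formula are assumptions chosen for illustration. The `commercial` label could come from a classifier or from human judgment, and the re-ranking code does not care which.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevance: float   # baseline relevance to the text query, in [0, 1]
    commercial: float  # 0 = non-commercial, 1 = commercial (machine- or human-assigned)


def rerank(results: list[Result], slider: float, strength: float = 1.0) -> list[Result]:
    """Re-order results for a slider position in [-1, 1].

    -1 favors non-commercial pages, +1 favors commercial pages,
    and 0 leaves the baseline relevance ordering untouched.
    """
    def score(r: Result) -> float:
        # (2 * commercial - 1) maps the label onto [-1, 1] so it lines up with the slider.
        return r.relevance + strength * slider * (2 * r.commercial - 1)
    return sorted(results, key=score, reverse=True)


results = [
    Result("Online store", relevance=0.8, commercial=0.9),
    Result("Encyclopedia entry", relevance=0.7, commercial=0.1),
    Result("Product review", relevance=0.6, commercial=0.5),
]

for position in (-1.0, 0.0, 1.0):
    print(position, [r.title for r in rerank(results, position)])
```

Moving the slider only changes the weight on a signal the engine already holds; the searcher decides how much that signal should count for this particular query.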

Does that make sense? I’m not saying that everything that is exploratory search is a slider. I am just using that as one example.

But maybe I don’t quite understand what you are asking. Did I answer your question?

Don’t really want to say that, but wouldn’t it just make more opportunities for smart ’seo’?

So, I’m still not quite sure that I’ve understood your question.. but no, I do not think that it will make more opportunities for smart seo. The reason I think this is that seo cannot be everything, to everyone, all the time. For example in the Yahoo commercial vs. non-commercial slider experiment.. it could very well be that some seo does a really good job of making their page look like a commercial page. But then when a searcher sets the slider to “non-commercial”, that seo page will disappear from the rankings. The seo will have failed. The opposite holds true: If the seo tries to make the page look non-commercial, then that seo is going to miss all the searchers who set the slider to the commercial setting.

I think it is near impossible for the seo to make the page look both commercial and non-commercial at the same time. So I am not as worried.

“So whether that default is based on algorithmic organization (Google) or on human organization (Mahalo), that doesn’t matter so much, to my perspective. The point is: Can the searcher take whatever is given to them and do something unique with it.”

OK, and I care more about the other thing: the default settings (and again, on another level, the default settings after the searcher makes one or two ‘exploratory search’ steps).

“The search or the interface? I think that is a bit of an unnecessary dichotomy. ”

Agreed. At some point it is the same thing, ‘the search experience or activity’ includes both sides. But I’m much interested in what happens/shows up on the page after ‘I choose the best possible (most relevant to my intentions) query/factor combination’. This is what I’ve been calling ‘search’ v ‘interface features’.

Can’t agree on the ‘seo’ thing. Smart seo – in your example – would just have more domains/pages, some plainly commercial (like a store), some hybrid (like a directory of products/reviews/service providers, heavily marketing his own offer), and a blog with discreet advertising.

OK, and I care more about the other thing: the default settings (and again, on another level, the default settings after the searcher makes one or two ‘exploratory search’ steps).

Ok, I have no idea what you mean anymore. To me, “default settings” means that you’re *not* doing exploratory search. So what does “default setting after the searcher makes one or two exploratory search steps” mean? I’m not trying to be difficult; I really don’t understand what you’re getting at.

But I’m much more interested in what happens/shows up on the page after ‘I choose the best possible (most relevant to my intentions) query/factor combination’. This is what I’ve been calling ‘search’ v ‘interface features’.

What shows up after you’ve chosen the best possible combination is hopefully what you were seeking. Let’s go back to the Matt Asay quote that I mentioned above. I’ll reproduce it here for your ease:

“Google Page Rank supposedly makes qualitative distinctions between content by measuring quantitative links to content, but in reality it doesn’t work that way–not enough of the time, anyway. I can see this from my own posts: sometimes I want to find a previous post of mine among the thousands that I’ve previously written. So I start digging through Google using keywords that I think will unearth the post. What I end up finding much of the time are my most popular posts related to those keywords, and often not the actual content I’m seeking. Given that some of my best content hasn’t necessarily been the most linked-to content, I struggle to find it.”

So in Matt’s example, what has shown up is the article that he is looking for.

But to get to that article, he has to be able to tell Google to turn off the pagerank/popularity aspect of its ranking algorithm, because that is interfering too much with his ability to find what he needs. So to me, the more interesting part is how the searcher interacts with the engine, and how the engine then interacts with the searcher, in order for both sides to come to an understanding of what it is that the searcher needs to find. In other words, how does the search engine help the searcher determine the best possible query/factor combination? It’s an iterative, joint process.

Can’t agree on the ‘seo’ thing. Smart seo – in your example – would just have more domains/pages, some plainly commercial (like a store), some hybrid (like a directory of products/reviews/service providers, heavily marketing his own offer), and a blog with discreet advertising.

I have to re-disagree with you on the seo thing, and I am going to use your example to disagree with you.

Currently, there is already/only a single default ranked list for everyone, which already makes seo incredibly easy. And so many results pages are filled with seo from top to bottom.

If you add any more searcher/engine interaction, that only increases the number of variables, which makes things more difficult for the seo. In fact, I proposed the addition of only a single variable: the commercial vs. non-commercial variable. And your response was that this was easily game-able, because a smart seo could fork their original single page/domain into three separate domains: one commercial, one non-commercial, and one hybrid.

Ok, so let’s assume a branching factor of three. And now let’s furthermore assume that a good exploratory search engine has at least ten different ways of communicating with its user, ten different ways of transparently letting the user refactor results. Examples include popular vs. non-popular, media rich vs. media poor (lots of images or not), recent vs. old (how long has the information been around), etc.

A good exploratory search engine would have way more than ten of these. But let’s assume these ten.

So in order for an seo to successfully rank well under all these options, he or she will need to create 10^3 = 1000 different pages/domains. An annual domain registration fee of $10/year now costs the seo $10,000/year.

And this is assuming a very moderate factor of 10. What if there are 50? Or 100? 50^3 = 125,000 different pages/domains, which would cost the seo $1.25 million per year. Assuming the seo even had time to create and optimize all those pages!

Will there still be seos that are successful at every single factor, and that are willing to shell out $10,000 or even $1.25 million? Maybe yes. But whereas there are now hundreds or thousands of seos all trying to optimize for the single, global, default ranking, there will only be 1 or 2 seos even able, much less willing, to optimize for thousands of non-default rankings. So the end result is that even if 1 or 2 seos get really good at it, that is orders of magnitude less than the dozens or hundreds of seos that are now already good at it. Total seo spam might not go to zero, but it goes way, way down.
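
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names are hypothetical, and the $10 fee is just the figure assumed above. It follows the factors-cubed estimate used in this comment. If you instead count every combination of three settings across n independent factors, you get 3^n, which is never smaller and grows much faster, so the numbers above are if anything conservative.

```python
def pages_needed(num_factors: int) -> int:
    """Page/domain count as estimated in the comment: num_factors cubed."""
    return num_factors ** 3


def all_combinations(num_factors: int, settings_per_factor: int = 3) -> int:
    """Alternative reading (an assumption): every combination of settings,
    with `settings_per_factor` choices for each independent factor."""
    return settings_per_factor ** num_factors


def annual_cost(num_pages: int, fee_per_domain: float = 10.0) -> float:
    """Yearly registration cost, assuming one domain per page."""
    return num_pages * fee_per_domain


if __name__ == "__main__":
    for n in (10, 50, 100):
        pages = pages_needed(n)
        print(f"{n} factors: {pages:,} pages, ${annual_cost(pages):,.0f}/year")
    # The combinatorial reading for ten factors, for comparison.
    print(f"3^10 combinations: {all_combinations(10):,}")
```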

The way this all relates to the main topic of the post is in the following sentence from the post, wherein Daniel writes:

But Google destroys far more value by reducing the notion of value to a single, scalar (i.e., one-dimensional) measure.

That is what all this is about. That’s why we talk about interactive, exploratory search, because it allows for the formation of multi-dimensional measures, rather than one-dimensional, Googalian measures. That’s why we talk about seo.. because a one-dimensional measure makes seo easier, and therefore fills more of the results pages with seo. If you add more dimensions, you make seo a lot harder, as I just explained.

Yes, that is the punch line exactly. The way to unlock the value of content is to expose it to users in its multi-faceted glory, giving users the power to filter, sort, and explore based on the features that matter to them. Conversely, the way to commodify that value is to reduce each document to a few key words and a scalar authority score.

I wish it were that simple. I wish my search engine were that well-educated, smart, up-to-date in my field, sexy librarian (‘research assistant’). For 2500 years logic hasn’t been able to formalize the grammar of natural language; you assume it could be done with contextualized meaning?

What we have available is some data and statistics. The best thing that came out of this is the Google translation service and, as you know, it isn’t working. So yes, search as you present it is possible, but only in very small, closed systems with highly formalized ontologies (or in the editorial environment of Home Depot; excellent, I want that for my internet store). There is just not enough science/technology to make it work right on the web. I’d also argue this kind of technology is impossible in principle, but this would require a paper, not a blog comment, to demonstrate (well, I’m not positive about it either).

To explain:
a) default settings: let’s say you use temporal factors in Google (limiting your results to documents found this year), what do you get? One document or a list of documents? Here is another set of ‘default’ results, another ‘one-dimensional’ serp on your exploratory path.

b) Matt Asay’s problem is trivial: he just doesn’t remember enough about the post he is looking for. Exploratory search wouldn’t help him at all: every step is a highly uncontrollable and unpredictable exclusion of some (unknown) documents, and a little misstep at the beginning of the process would make it impossible to find the right post. (You employ this way of thinking when talking about ‘harder and harder seo’, which leads me to:)

d) there would be less competition for any set of ‘exclusionary factors/steps’, making it much easier to create/optimize content to meet any given set of requirements (known ones!) Economy of it? I hear Univ of Phoenix spent half a billion dollars on internet marketing last year.

Sorry about not being able to say these things in a more ‘clear and distinctive’ way. It may be because most of the points I’m trying to make seem obvious to me. I’m grateful to you for motivating me to think more about them.

@Daniel
As a champion of faceted search I would be interested to hear what your opinions were on how to handle the problem of when “fuzzy” meets facets.
What I mean by this is facets are great when the initial criteria is simple but, if the criteria is in any way fuzzy (e.g. a “fuzzy spelling” operator is used or asking for “more like this” on an existing document/piece of text) then the results list exhibits long-tail characteristics with rapidly declining quality/relevance. With a result list like this how do you begin to summarise using facets – do you
a) Summarise the entire result set (knowing there are a lot of irrelevant results)
b) Take a sample of the top results (but how/where do you choose to draw the line between relevant and irrelevant?).
With either of these choices how do you satisfactorily explain to the naive user what trade-offs are being made here?

Bringing this back to Google – I would argue that even for straight-forward queries there is a long tail of irrelevant content based on spammy content and general noise so the summarisation problem applies here too.
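
To make options (a) and (b) concrete, here is a small Python sketch. Everything in it is made up for illustration: the scores, the `source` facet, and the 0.5 cutoff are assumptions, not the output of any real engine. It only shows how the facet summary changes depending on where the line is drawn.

```python
from collections import Counter


def facet_counts(results, facet, min_score=0.0):
    """Count facet values over results scoring at least `min_score`.

    min_score=0.0 summarises the entire result set (option a);
    a higher cutoff summarises only the top of the list (option b).
    """
    return Counter(r[facet] for r in results if r["score"] >= min_score)


# Hypothetical fuzzy-match results with a long, low-relevance tail.
results = [
    {"score": 0.95, "source": "news"},
    {"score": 0.90, "source": "news"},
    {"score": 0.85, "source": "blog"},
    {"score": 0.20, "source": "forum"},
    {"score": 0.15, "source": "forum"},
    {"score": 0.10, "source": "forum"},
]

whole_set = facet_counts(results, "source")                # option (a)
top_only = facet_counts(results, "source", min_score=0.5)  # option (b)

if __name__ == "__main__":
    print("entire set:", dict(whole_set))
    print("top only:  ", dict(top_only))
```

In this toy data the tail dominates the summary of the entire set, while the cutoff version hides it completely, which is exactly the trade-off that is hard to explain to a naive user.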

This is too big a question to try to answer in the comments, and is actually an active area of research at Endeca.

The short answer is that you can’t know for sure whether results are relevant to a user without reading a user’s mind, but you can analyze a result set for its coherence, plus you may have a lot of freedom in how you arrive at a result set from the query. Part of the implication there is that there may be room for negotiating the query interpretation process, even before we start thinking about slicing and dicing the results for the query. This is what I mean by clarification before refinement.

I wish it were that simple. I wish my search engine were that well-educated, smart, up-to-date in my field, sexy librarian (‘research assistant’). For 2500 years logic hasn’t been able to formalize the grammar of natural language; you assume it could be done with contextualized meaning? What we have available is some data and statistics. The best thing that came out of this is the Google translation service and, as you know, it isn’t working. So yes, search as you present it is possible, but only in very small, closed systems with highly formalized ontologies (or in the editorial environment of Home Depot; excellent, I want that for my internet store). There is just not enough science/technology to make it work right on the web. I’d also argue this kind of technology is impossible in principle, but this would require a paper, not a blog comment, to demonstrate (well, I’m not positive about it either).

Jacek: I think you are reading too much into what I am saying. I am not saying that we need to have natural language understanding in order to do all of this. I am not saying we need formalized ontologies. I am saying that this is something that should (and can) be done using only the data and statistics that we already have.

Let me explain it this way: Google publicly states that they have.. what.. something like 400 different “signals” (aka “features”) that go into their unidimensional ranking algorithm. Google takes 400 little statistics.. like the frequency of a term in a document, the number (and quality) of links going in to a document, the frequency of updates to a document, the length of time the domain has been in existence, the diversity or information entropy of nearby links, etc… and combines all those statistics in the hopper to produce a single, unidimensional ranking. All that I am saying is that Google should “open up” that list of ingredients, and let the user decide for him or herself which of those signals to use, which not to use, and which to use, but in an inverse manner. For example, I think by default Google likes to give you a list ordering that, ceteris paribus, prefers more recently updated pages to older pages. All I am saying is that you should be able to switch that around, and say that you’re looking for older, not-recently-updated pages. I don’t know why you think that we need formalized ontologies to be able to do something like that. That’s as trivial and as simple as dirt to do. Google already has the data and statistics to do it one way. All I’m saying is that they should make their engine open and transparent enough so that the user himself can choose to do it the other way.
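
Here is a rough Python sketch of what I mean by using a signal, ignoring it, or reversing it. To be clear about the assumptions: the signal names, the values, and the simple weighted sum are all invented for illustration. I have no idea how Google actually combines its signals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    signals: dict  # hypothetical signal name -> value normalized to [0, 1]


def rerank(docs, controls):
    """Order docs by a user-controlled combination of signals.

    `controls` maps a signal name to +1 (use it), 0 (ignore it),
    or -1 (use it in reverse). A plain sum is an assumption.
    """
    def score(doc):
        return sum(direction * doc.signals.get(name, 0.0)
                   for name, direction in controls.items())
    return sorted(docs, key=score, reverse=True)


docs = [
    Doc("fresh and popular", {"recency": 0.9, "popularity": 0.8}),
    Doc("old but popular", {"recency": 0.1, "popularity": 0.9}),
    Doc("old and obscure", {"recency": 0.2, "popularity": 0.1}),
]

# The engine's one default: recent and popular first.
default_order = rerank(docs, {"recency": 1, "popularity": 1})

# The user's choice: prefer older pages and switch popularity off.
older_first = rerank(docs, {"recency": -1, "popularity": 0})

if __name__ == "__main__":
    print([d.title for d in default_order])
    print([d.title for d in older_first])
```

The data and the statistics are identical in both calls. Only the controls differ, and those are what I am asking to have handed to the user.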

To explain: a) default settings: let’s say you use temporal factors in Google (limiting your results to documents found this year), what do you get? One document or a list of documents? Here is another set of ‘default’ results, another ‘one-dimensional’ serp on your exploratory path.

Ok, I think I understand you now. To answer you: What you get is another list of documents. If that’s what you’re calling a “default” list, then yes, that’s what you get. But I think you’re confusing the meaning of “one-dimensional”. Yes, any single list is one-dimensional. But giving the user the ability to reorder that list in various ways, using various combinations of the ranking “signals” that go into that list, makes the totality multi-dimensional. For example, suppose you type in the query [north american songbirds]. Right now, all you get from Google is really a one-dimensional serp. But if Google then gives you a set of tools, to let you turn on/off (or reverse!) the popularity signal, turn on/off (or reverse!) the recency signal, turn on/off (or reverse!) whatever other signals go into the Google ranking engine, then you are able to take that one-dimensional list that comes from the [north american songbirds] query, and walk through that list in a multiplicity of ways. Yes, every way that you choose is technically still a linear traversal. But that’s not what we’re talking about here. We’re talking about the totality of ways one could choose to walk through a ranked list. Right now, Google only gives you one way. Google’s way. Google thinks it knows best how to rank [north american songbirds], and does not give you any control over changing the way things are ranked. THAT is what I mean by one-dimensional.

b) Matt Asay’s problem is trivial: he just doesn’t remember enough about the post he is looking for. Exploratory search wouldn’t help him at all: every step is a highly uncontrollable and unpredictable exclusion of some (unknown) documents, and a little misstep at the beginning of the process would make it impossible to find the right post. (You employ this way of thinking when talking about ‘harder and harder seo’, which leads me to:)

I have to completely disagree here. It is exactly because Matt Asay doesn’t remember enough about the post that he is looking for that exploratory search would help him. Yes, you are correct.. every exploratory step is uncontrolled and not necessarily completely predictable. But that’s what exploration is! It is all about venturing off into the unknown, seeking new pathways, being able (enabled by the control that should be given to you by the search engine) to seek new pathways, and discovering/uncovering (“endeca”, Daniel? 😉) what you could not have discovered otherwise. A little misstep at the beginning would not make it impossible to find the right post, because exploratory search is not exclusionary. Exploratory search is multiple, overlapping pathways. Exploratory search is uncovering patterns in the data that you would not have seen, yourself, if all you were doing was walking down Google’s one-dimensional list, one link at a time. Exploratory search lets you try a lot of different directions that you otherwise could not. Look, given Matt Asay’s state of knowledge, tell me how he currently is able to solve his problem, using Google? Currently, he has to walk down the entire list of 102,000 documents, one document at a time, in the “most popular” order that Google gives to him. You really think that’s the best solution to his problem? I do not. Matt himself also does not.

d) there would be less competition for any set of ‘exclusionary factors/steps’, making it much easier to create/optimize content to meet any given set of requirements (known ones!) Economy of it? I hear Univ of Phoenix spent half a billion dollars on internet marketing last year.

I still don’t understand this point that you are making. If there is less competition, what that means is that the top 1-2 results are filled with seo.. rather than the top 30 results, as is currently the case. If I only have to go past 2 results to start getting to non-seo-ed content, that is much better than the current Google approach, in which — because of Google’s one-dimensional, immutable ranking procedure — I have to go through 30 results to start getting to non-seo-ed content. Right?

Sorry about not being able to say these things in a more ‘clear and distinctive’ way. It may be because most of the points I’m trying to make seem obvious to me. I’m grateful to you for motivating me to think more about them.

Likewise. I thoroughly enjoy our discussion, and am glad that you also derive value from it.

In fairness to Google, it’s not clear that Google could deliver this flexibility and maintain its current performance characteristics. That said, I wonder if they put too much effort into scale and speed, and not enough effort into the actual functionality being delivered. In any case, they could be more transparent even if they don’t provide as much flexibility as might be ideal.

In unfairness to Google (hehe), we need to think about this in a “TCO” manner. What is the total cost to the user in terms of the amount of time it takes to find the information they need? If Google gives me a unidimensional results list in 200ms, but it takes me 10 minutes to walk through the top 230 results to find what I need, then the overall cost to the user is much worse than with a search engine that takes 5000ms to return my results, but then lets me walk through them in a much more intelligent manner.. and thus I discover the information I need within 5 minutes.
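
A back-of-the-envelope Python sketch of that comparison, using only the numbers from the paragraph above. The function name is hypothetical, and treating each session as a single query is a simplifying assumption.

```python
def session_cost_seconds(engine_latency_ms, user_time_s, queries=1):
    """Total time to find the answer: engine latency for every query
    issued, plus the time the user spends working through results."""
    return queries * engine_latency_ms / 1000 + user_time_s


# Fast engine, slow session: 200 ms response, 10 minutes of scanning.
fast_engine = session_cost_seconds(200, 10 * 60)

# Slow engine, fast session: 5000 ms response, 5 minutes of exploring.
slow_engine = session_cost_seconds(5000, 5 * 60)

if __name__ == "__main__":
    print(f"fast engine session: {fast_engine:.1f} s")
    print(f"slow engine session: {slow_engine:.1f} s")
```

The engine latency barely registers next to the user's time in either case, which is the whole point.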

I talk about this a lot more in the comments section of Greg Linden’s blog, and give some links to some early Information Retrieval evaluation papers that talk about “total cost” or “overall efficiency” of a session, rather than of a single iteration:

You’re preaching to the converted. My point is only that Google can’t just turn these features on without a lot of system implications. So, from their perspective, it’s a trade-off. But I’m with you in questioning whether they’re optimizing for the right performance measures.

Point well taken; yes, it will take extra effort from Google to implement these things. But why not? They have 20,000 employees, after all. And they say that 70% of their effort is spent on search. I’m convinced that 14,000 people could implement this.

I would take it a step further, and question not only whether they’re optimizing for the right performance measure.. but also question whether it should even be a goal to optimize for any single performance measure, as they seem to be doing. IR has a long history of acknowledging the differences and tradeoffs between, for example, a precision-oriented and recall-oriented search. Optimizing for one often degrades the other.
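
For readers who don’t live in IR, here is the tradeoff in a few lines of Python. The document IDs and the two result sets are made up for illustration. The definitions are the standard ones: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one retrieved set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection in which four documents are relevant.
relevant = {"d1", "d2", "d3", "d4"}

# A precision-oriented search returns few results, all of them good.
narrow = precision_recall(["d1", "d2"], relevant)

# A recall-oriented search returns everything relevant, plus noise.
broad = precision_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant
)

if __name__ == "__main__":
    print("narrow (precision, recall):", narrow)
    print("broad  (precision, recall):", broad)
```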

So what Google really should do is have some way of engaging the user in a dialogue, and figuring out what type of search the user is engaged in, what modality would best suit the user’s intent. And then once that modality is determined, they should swap-out their retrieval algorithms, in real time, and shift to one of multiple different algorithms, each of which has been optimized for a different type of task.

Sometimes there just isn’t a convenient wikipedia category (DuckDuckGo), department (Amazon) or keyword that can be used to clarify the intent.

As an example, I may be interested in the tensions between government and newly nationalized banks over the bonus culture that taxpayers are now funding (a hot topic in the UK currently). There is no keyword or category that exemplifies this so using example documents that cover this and saying find “more like this” is a valid technique. This is the sort of fuzzy, statistical “bag of words” approach that the Autonomy marketing dept choose to peddle under the god-awful banner “meaning based computing”.
Your “refine” comment seemed to suggest you avoid this form of information retrieval because it gives you the problems I outlined when summarising results. Is this a fair statement?

That would mark a pretty clear dividing line between the Endeca and Autonomy approaches.

Rather than seeking ways to avoid that fuzzy form of retrieval I was more interested in ways of accommodating it into a faceted search UI. Do you see them as incompatible?

[…] strikes me as brutally accurate. As much as I criticize the ad-supported model in general and Google’s role in devaluing online content in particular, I think that Elgan does a great job of explaining what may be one of the news […]

[…] Porter does make some points that I agree with. His characterization that Google is “a parasite that creates nothing, merely offering little aggregation, lists and the ordering of information generated by people who have invested their capital, skill and time” is a caricature, but not entirely off base. What he’s missing, of course, is that this “creating nothing” is a significant technical feat. But I agree that Google’s relationship to content creators is often parasitic. […]

[…] that Google is sending lots of traffic to publishers. The problem is that Google has also helped devalue that content, while at the same time taking a plum spot in the value chain. As I’ve said […]

[…] years ago (and as Josh cites in his post), Google commodifies everything (my bad for not citing him here). Needless to say, I agree with Josh that: What we need is a search experience that lets us […]