I’ve just finished reading A Taxonomy of Web Search by Andrei Broder, written largely while the author was CTO of Alta Vista (and using AV query data), and published after he moved to IBM Research in 2001.

The paper has a trove of references to other papers, which is good for my work, and it has a singular thesis: that not all web searches are equal. Broder sets out to dispel the notion that all searches are “informational” in nature. He instead maintains that many are “transactional” or “navigational” in nature. These two seemingly obvious categories are in fact relatively new to the academic field of Information Retrieval (IR), which developed largely in the context of large islands of data (i.e., in the 70s/80s), rather than in the web era.

What I like about this paper is the use of the word “intent” – which over the years I’ve come to use quite a bit (see my last column on video advertising over the internet, in which I rant once again on “intent over content”, or my post on The Database of Intentions). Intent is behind every kind of search, Broder says, but “there is no assumption … that this intent can be inferred with any certitude from the query.” Ay, there’s the rub… To get at that intent, Broder employed a short survey on the site.

A few fun facts from Broder’s analysis of the survey responses and related log data:
– nearly 15% of searchers wish for “a good collection of links on a subject” as opposed to “a good document.”
– 12% of queries in the log data used were sexual in nature.
– nearly 25% of searchers were looking for “a specific website that I already had in mind.”
– an estimated 36% of searchers were looking for transactional information – what Broder calls “the intent to perform some web-mediated activity.”

Broder concludes that the next generation of search engines will need to take into account this new taxonomy of intent – transactional and navigational as well as informational. Given that this paper was published in late 2001, it’s interesting to see how the major engines are already on that path – with Yahoo’s focus on shopping being one of the best examples.

A very kind reader (I’ll buy the drinks next time) has forwarded me eBay’s recent analyst day presentation (from earlier in the month). This reader’s comment: “The focus on search was VERY new…analysts were trying to figure out the significance.” Indeed, one of the slides declares: “eBay is Search!”

A couple of things come to mind when reviewing the slides (there were more than 300, covering the entire business, both US and int’l). First, eBay knows that the faster and more relevant they can make their internal search, the better their margins. They understand that they have a marketplace of people with very specific intents, looking for very specific things. The easier eBay can make it for folks to find what they want (whether it’s a bid, an item, or a comparison), the better their bottom line looks. Toward this end, eBay has built its own internal search engine called Voyager, which is optimized for eBay users. One of the slides in the analyst presentation boasts: “We are world class at analyzing our user base.”

Second and possibly more important is the role of search in customer acquisition. eBay has a world class IT solution in place to monitor tens of thousands of paid keywords across the web, each with its own P&L and analytics. I can’t confirm this, but I would not be surprised if eBay is Google’s largest customer, something that probably makes both companies uncomfortable, because each can analyze the other’s data and mine it for competitive edge. In the presentation eBay also notes the power of natural or algorithmic search (the “pure” results) – the company says it is revamping its entire site to optimize for natural search. Now that’s quite a statement. Again, major thanks to my source for this information.

Among the analysts still covering the internet after the great wipeout of 2001, Safa Rashtchy, of Piper Jaffray, has received the most notice as an early and ardent supporter of search. His company will soon come out with a new report forecasting online advertising revenues (Safa is credited with pegging the paid search segment of the market at a widely reported figure of nearly $7 billion by 2007). In his recent newsletter, which summarizes a conference Piper hosted on online advertising, Safa predicts revenues will sharply accelerate, to more than $15 billion by 2008. Also, he predicts that online ad revenues will match 2000’s number of $8.1 billion – the height of the boom – by next year. My guess: he’s wrong – when the counting’s done, 2004 will beat 2000 by a comfortable margin.

I’m offline most of today, but enough folks have passed me these two sites as interesting cultural outgrowths of search that I’ll post them here for your review. The first, GoogleRace, plugs into the Google API and ranks each candidate by keyword searches. Some of the top searches are “large penis” and “George Bush is going to lose”. (Thanks, Kenny.)

Googlehouse, on the other hand, is more understated. This is an attempt to make a cultural commentary through images – the site polls the Google Image search database and pulls up images that fit various parts of an imaginary house. Click around a bit, if you’ve got the time. This kind of stuff usually pushes my MEGO button, but…give it a looksee. (Thanks Tim…)

And another note. Thanks to this blog, many readers are now starting to ping me with interesting stuff, tidbits, even insights for my book. I really appreciate and encourage this. I’m at jbat@battellemedia.com, keep those cards and letters coming.

So I printed out three papers suggested by Gary Price in this post. I read the third one first, and didn’t find it earth shattering, though there were a few interesting tidbits. The paper is titled: “U.S. Versus European Web Searching Trends” by Amanda Spink and Bernard Jansen (Penn St. Univ) and Seda Ozmutlu & Huseyin C. Ozmutlu (Uludag University). Basic conclusions: US searchers tended to use fewer words in queries, and tended to have shorter search sessions overall. Also, European users tended to look at more query results than US searchers, who viewed fewer results per query. (This buttresses the stereotype that US citizens are more impatient and less deliberative than their European counterparts.)
Also consistent with stereotype was a comparison of general topic categories searched for by each group. For US searchers, the #1 topic, with nearly 25% of the overall searches, was “Commerce, travel, employment, or economy.” That category was #3 for European searchers, with only 12.3% of the searches. The Europeans’ #1 category was “People Places and Things.” Also, it seems that Europe (recall this was in 2001) was still on a learning curve for tech, as the #2 search category was “Computers or the Internet.” That category was #4 for the US during the same period. Also telling: European searchers were more than 4 times more likely to look for “Performing or Fine Arts” than US users, and not surprisingly, “Sex or Pornography” was two places higher on the European list, coming in at #4.
The study goes on to conclude, though not very forcefully, that there are noticeable differences between US and European searchers, but the authors don’t claim it’s necessarily a cultural thing; it may well be the distinction in the engines themselves, as much as anything. This study left me wanting more, and happy they have continued this kind of work. (I’ll be reviewing this latest find soon.)

Looksmart today announced a major update to FindArticles, a great idea whose execution so far I can’t quite endorse. The service – which has its own tab on the Looksmart site – has “over 3.5 million articles from over 700 publications.” While I’ve not explored it fully, a search for various folks who I know are subjects of major magazine pieces turns up a boatload of BusinessWire-type press releases. Not sure how that matches up with expectations of “articles from 700 publications…” but I hope they clean it up, because it could be a great service.

A new study says the Yellow Pages will continue to grow but…there’s a catch. I’ve taken to quoting the current size of the Yellow Pages market – $26 billion or so – as proof there’s a lot of growth left in search. After all, if non-local search is about $2-3 billion this year, and is poised to undermine the local Yellow Pages market due to its nascent push into local search, that’s a rather large market to grow into. But a study by the Kelsey Group, made public by eMarketer today, says that the Yellow Pages themselves will also grow in the next five years, to $36 billion. However, a full 23% of that 2008 revenue – about $10 billion – will be “digital directory” Yellow Pages – yup, paid local search.
OK, so the math is: Yellow Pages is a $26 billion biz now, growing to $36 billion by 2008, a $10 billion increase. Local Yellow Pages “Digital Directory” is – well not much now (as far as I can tell it’s not even in the current $26 billion number), but it will be $10 billion by 2008. Given that the local search piece of the business is quite small today, and the online yellow pages is probably no more than $300 million, if that, what this study really seems to say is that the lion’s share of ALL growth in Yellow Pages will be digital. In other words, there would be NO growth in the Yellow Pages market if it weren’t for online search. Now that I can buy into.
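For anyone who wants to check the back-of-the-envelope math, here is a minimal Python sketch using only the figures quoted above. The function and variable names are mine, and it assumes (as I guessed above) that digital revenue is not counted in today's $26 billion. One wrinkle worth noting: 23% of $36 billion works out to roughly $8.3 billion rather than a round $10 billion, so the sketch uses the percentage as stated.

```python
# Figures as quoted in the post, in billions of US dollars.
YP_TODAY = 26.0            # current Yellow Pages market
YP_2008 = 36.0             # projected 2008 market
DIGITAL_SHARE_2008 = 0.23  # share of 2008 revenue from "digital directory"


def growth_breakdown(total_today, total_future, digital_share_future):
    """Split projected growth into digital and print portions.

    Assumption (not confirmed by the study): digital revenue is not
    counted in today's total, so all of today's total is print.
    """
    total_growth = total_future - total_today
    digital_future = total_future * digital_share_future
    print_future = total_future - digital_future
    print_growth = print_future - total_today
    return {
        "total_growth": total_growth,
        "digital_2008": digital_future,
        "print_2008": print_future,
        "print_growth": print_growth,
        "digital_share_of_growth": digital_future / total_growth,
    }


if __name__ == "__main__":
    result = growth_breakdown(YP_TODAY, YP_2008, DIGITAL_SHARE_2008)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With these inputs, digital comes to about $8.3 billion in 2008 and print grows by only about $1.7 billion over five years, so roughly 83% of the projected growth is digital. Plug in the rounded $10 billion instead and print growth drops to zero, which is the point of the paragraph above.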

The Boston Globe also has a “worm turns” piece here, and one of the key sources in the piece takes offense at how his quotes were used…Another source (Dave Winer) also feels wronged… I find this kind of journalism, where reporters try so hard to make quotes fit into a preconceived notion of how a story should play, the most irritating thing in business reporting today. In this case, it’s not the Google Is Wonderful Look At All The Lava Lamps angle, it’s the opposite: Google Is Making Enemies And Is Too Big For Its Britches. Neither is right. Ugh.

Fortune this week proves that the worm can turn in the mainstream media’s coverage of all stories, even one that for years has proven my predictions wrong. At least 18 months ago I was cluck-clucking to the communications honchos at Google that they should “beware the backlash.” They were getting too much good press, and at some point the media always wakes up and eats its young. But Google enjoyed the longest free ride I’ve seen in recent history – even when those same PR honchos essentially went dark and refused to give anyone much access. The same story kept getting written, again, and again, and again….
This Fortune piece isn’t a hit, but it does rehash the negative bits with at least as much ardor as the positive ones. The piece sets up like this: At Fortune we did some *real* reporting (the implication being that all the stories before were stage-managed affairs), and we found out that the company that everyone’s been lauding for the past three years is…complicated, contradictory, and not exactly perfect. Not a rocket science conclusion, but it manages to make the company seem a bit more human. The piece states some very old stuff as fresh (that the company has recently grown arrogant – this is new?), has tidbits of news known only to a very few insiders (that Bill Joy and Google flirted but eventually did not come to terms), and a couple of off-the-record sources who are also investors saying stuff like: “Google has a lot of momentum, but its current position is probably not defensible.” (Yahoo holds 5% of Google…).
The piece is a fine round up of where things stand, but I can’t help feeling a bit empty – so much of the story was stuff we already knew, but had to be in there because of Fortune’s large readership, not all of which could be assumed to be avid followers of all things Google. That’s the problem with mainstream media coverage – it has to speak to everyone. The story lacked an analysis of business models, of the industry, and any deep discussion of the larger phenomenon Google represents. And the negative bits, while something of a first, had a twinge of gossip and/or sour grapes to them. Overall, I’m not sure this piece moved my view of Google one way or another. How about you all?

I’m starting to read more academic papers, research presented by various professors and the like (including some folks who are technologists working in the industry). One of the things I find fascinating about the search business is how quickly it’s turned from an academic pursuit – with all the implications of open, non-commercial sharing of findings – to one driven by clusters of high-powered nerds hitched to a particular corporation’s R&D machine. I’ve been asking around on this idea and found most landlocked geeks agree – Nutch aside, a good percentage of search research has by and large been silo’d – many of the best minds are at Google, Microsoft, Yahoo, and a few others. I’ll wager that among the 500 or so top engineers at these corporations, there ain’t a hell of a lot of sharing going on. Robust peer review between bare-knuckle competitors? Prolly not.
But it was not always so. Recall that both Yahoo and Google came out of the Stanford CS department, for the most part. Same for Excite (Joe and Graham). Lycos was midwifed by CMU, and out of Berkeley came Inktomi. Anyway, you get the picture. A lot of innovation came out of the publish-and-peer-review culture of the university setting, and many of the folks who drove that culture are now suited up, so to speak, in one corporate silo or another.
In any case, before they joined up, many of them wrote wonderful academic papers they shared with all in the name of progress (some still do). And there are still plenty of great academic researchers banging away on the database of intentions, though certainly they don’t benefit from owning their own slice of it like the majors do (many borrow data from the majors and perform analysis on that). So as I work on the book, I’ll be posting reviews of some of the papers I read – the interesting bits, so to speak. I’ll title each entry “The Search Papers: Cute Name Here” for ease of use, or more likely, as a clear caveat that discussion of academic Mumbo-J will follow. The first will be out in the next day or so, a wonderfully predictable little study (found via Gary Price, of course) comparing European and US search patterns from 2001 FAST and Excite (pre-Chapter 11) data. Hope you like it.