Vijay Chittoor's blog about interesting internet trends and more.

September 19, 2008

Ebay selling Stumbleupon? Social bookmarking and search

I have been following the story of eBay selling StumbleUpon, just a year and a half after acquiring it for $75 million. Around the time of the acquisition, many people speculated that eBay wanted a foothold in the search market, which is otherwise largely dominated by Google. This GigaOm article was one of those making the case: "The toolbar, if you ask StumbleUpon users provides more useful and productive results, than say Google. By marrying the toolbar to Skype client, eBay can do an end run around Google’s dominance of the search business. A simple search box inside Skype client is all it would take."

Interestingly, many people believe that social bookmarking sites were the first form of search to use "tagging", and that tagging is therefore a revolutionary new technique in search. After all, what could be better than letting humans figure out which pages to tag with keywords like "barack obama", instead of letting algorithms do this job? The inherent assumption is that conventional search engines don't use any form of tagging.

However, this view of the world is flawed. I'd like to argue that search engines like Google use human tagging in a big way in their search rankings. It's just a different kind of tagging: the words that publishers on the web use to refer to the documents they link to. Conventionally, this is called "anchor text" in search engine lingo, but if you look at it, it serves the same purpose as tagging on bookmarking sites like StumbleUpon. The only difference is that StumbleUpon and Del.icio.us expand the scope of the tagging to users who might not be writing or blogging on the web.

Let me take a couple of examples to make this more concrete. Search for "Sarah Palin" on Yahoo, and take a look at any of the links. Then try the search "link: <url>" on Yahoo. This will show you all the webpages with links pointing to the URL you're analyzing. If you look at the text of these links, you'll see that many of them refer to the URL with the words "Sarah Palin" or variants thereof. For example, doing a "view source" on http://abfreedom.blogspot.com reveals this link: <a href="http://www.johnmccain.com/about/governorpalin.htm">Sarah Palin</a>
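To make the "anchor text as tag" idea concrete, here is a minimal sketch in Python of how anchor text could be harvested from page source and grouped by the URL it points to. It uses only the standard library's html.parser; the names AnchorTextCollector and anchor_tags are made up for this illustration, and this is in no way a description of how Google or Yahoo actually process links.

```python
from collections import defaultdict
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collects (href, anchor text) pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> element we are currently inside
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def anchor_tags(pages):
    """Map each linked-to URL to the set of anchor texts used for it."""
    tags = defaultdict(set)
    for html in pages:
        collector = AnchorTextCollector()
        collector.feed(html)
        for href, text in collector.links:
            if text:
                tags[href].add(text.lower())
    return dict(tags)


if __name__ == "__main__":
    # The first snippet is the link quoted above; the second is invented.
    pages = [
        '<a href="http://www.johnmccain.com/about/governorpalin.htm">Sarah Palin</a>',
        '<p>More on <a href="http://www.johnmccain.com/about/governorpalin.htm">'
        "Governor Palin</a></p>",
    ]
    for url, texts in anchor_tags(pages).items():
        print(url, sorted(texts))
```

Each linking page contributes its own wording, so the target URL ends up with a small set of human-chosen labels, much like the tags a bookmarking site collects.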

If you compare the search results on a popular keyword like "Sarah Palin", you'll find that search engines like Google and Yahoo have many more results, as well as more relevant ones, than social bookmarking sites like StumbleUpon or Del.icio.us. This difference is even more stark when you move to more tail-ish keywords: e.g., try "Miruts Yifter", which produces just 1 result on StumbleUpon but more than 15,000 relevant ones on Google. And most of those 15,000 results are linked from other pages on the web with the anchor text (or tag) "Miruts Yifter".

Essentially, Google gets to use tags much more than Stumbleupon. And to top it all, they could also crawl Stumbleupon pages and use the tags available on those.

But the key thing to note is that anchor text/tags are just one component of the ranking mechanism on classic search engines. Link popularity, text on the page/title/url and many other components go into the final ranking.
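As a rough illustration of "one component among many", here is a toy scoring function in Python that adds an anchor-text signal to title, body and link-popularity signals. Everything in it is an assumption made for this sketch: the WEIGHTS values, the page fields (anchor_texts, title, body, inlink_count) and the simple weighted sum are invented, and real search engines use far more signals and far more sophisticated combinations.

```python
import math

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"anchor": 3.0, "title": 2.0, "body": 1.0, "links": 0.5}


def toy_score(query, page):
    """Combine several signals into one score for a page (toy model)."""
    q = query.lower()
    # How many inbound links use the query as (part of) their anchor text.
    anchor_hits = sum(1 for text in page["anchor_texts"] if q in text.lower())
    # Whether the query appears in the page title.
    title_hit = 1 if q in page["title"].lower() else 0
    # How often the query appears in the page body.
    body_hits = page["body"].lower().count(q)
    # Dampened count of inbound links, a crude stand-in for link popularity.
    link_popularity = math.log1p(page["inlink_count"])
    return (
        WEIGHTS["anchor"] * anchor_hits
        + WEIGHTS["title"] * title_hit
        + WEIGHTS["body"] * body_hits
        + WEIGHTS["links"] * link_popularity
    )


if __name__ == "__main__":
    # Both pages are invented examples.
    profile = {
        "anchor_texts": ["Miruts Yifter"],
        "title": "Miruts Yifter profile",
        "body": "Miruts Yifter won gold.",
        "inlink_count": 1,
    }
    portal = {
        "anchor_texts": ["click here"],
        "title": "Runners",
        "body": "A page that mentions Miruts Yifter once.",
        "inlink_count": 1000,
    }
    for name, page in (("profile", profile), ("portal", portal)):
        print(name, round(toy_score("Miruts Yifter", page), 2))
```

The point of the sketch is only that a page tagged well by its inbound links still competes on the other signals, and that tags alone do not decide the order.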

It's just naive to assume that social bookmarking sites will be able to build better search just based on "tagging".

However, these sites do provide value to users in other ways: 1) a permanent store of users' bookmarks, 2) the ability to discover people interested in a similar topic, and 3) a fun browsing experience. For all these reasons, social bookmarking sites aren't going to go away, but don't expect them to develop into full-fledged search engines. eBay's rumored decision to sell StumbleUpon might be a reflection of this realization.

"It's just naive to assume that social bookmarking sites will be able to build better search just based on tagging."

I think you overstate.

I have to *really* squint to see arbitrary webpages as tags. The major keywords on a webpage are often not linkified; instead what's linkified is words like 'this' or 'click here'. There's a huge difference between writing and writing with the intent of tagging. The former requires text mining to process, the latter 'merely' good HCI.

Google has much that StumbleUpon does not, but it is also missing one data source. Bookmark-style tagging is available to anyone, not just the writer of the page. You don't have to own a webpage to tag links. The population of people willing to type tags into amazon or etsy or yelp or delicious far exceeds the number of people maintaining webpages that Google crawls. Indeed, it's closer to the number of people _using_ Google.

I'm not saying StumbleUpon will slay Google, just that the comparison is plausible and has subtleties.

Comparing #results on stumbleUpon and google is unfair for several reasons:

a) It focuses on a specific implementation of bookmarking engine and search engine.