Bing's Not Too Shabby for Natural Language Searches

Lately, many people have been experimenting with Bing, Microsoft’s (s msft) new search engine offering, which we covered here. I agree with the many people pointing out improvements that Bing still needs, such as blog search and more varied results for basic keywords. However, not everyone realizes that Bing draws on powerful search technology from an open source-focused company that Microsoft acquired last year: Powerset. As I covered in this post, the Powerset technology underlying Bing introduces some powerful features that many people aren’t trying. You may find them useful.

Powerset originally got noticed in the search community because it leveraged Hadoop, which is an open-source software framework that allows clusters of computers to make very quick work of mining large data sets. (Hadoop also powers Yahoo’s (s yhoo) search engine.) Powerset’s idea was to leverage Hadoop to improve natural language searching, where you type in questions in sentence form instead of using keywords. If you’ve followed natural language searching, you’ll know that it’s had a rocky, and generally unsuccessful, history.
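To give a rough feel for the style of processing Hadoop is known for, here is a minimal, single-machine sketch of the map, shuffle, and reduce steps in plain Python. It is an illustration only: it does not use Hadoop’s actual API, the function names are hypothetical, and it says nothing about how Powerset’s own pipeline was built. On a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence on a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, just to exercise the functions.
    print(word_count(["who invented the telephone", "the telephone rings"]))
```

Because each document is mapped independently and each word is reduced independently, the work splits cleanly across machines, which is where the speed on large data sets comes from.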

Hadoop gave Powerset more speed in sifting through the mountains of possibly relevant results that natural language queries return. Powerset doesn’t underlie all of Bing, but its unique facility with natural language searching is in Bing, and you may find it useful if you haven’t tried it.

Here are some example queries to try at Bing.com, to get a sense for how it works:

You’ll notice that many of the results that come back for these types of searches at Bing are mined from Wikipedia, because Powerset has always specialized in Wikipedia search. Entries on Wikipedia aren’t always correct, so you have to take results with a grain of salt. Still, Microsoft has extended Bing’s facility with natural language search beyond just Wikipedia, and it can be quite good at providing quick answers to natural language questions.

If these features aren’t being used (as you indicated), doesn’t that imply people aren’t all that interested? I realize natural language search is the holy grail of search, but not necessarily while typed.

The average length of searches has continued to inch its way up over the years as people become more comfortable with defining search criteria in large data sets (e.g., the Internet). However, new users typically only type one or two words, not entire sentences. Average users will type a few words, but not complete sentences. And super users will type several keywords and occasionally use additional qualifiers or boolean logic.

I’m just having trouble understanding the situation that creates the need for natural language searching within the current user’s workflow. A clever technology, no doubt. Important for the next step, absolutely. But I’d rather ask Hal verbally or simply type a few cleverly selected keywords.