An update on Powerset, the natural language search engine

June 22, 2007, 12:50 AM, by Matt Marshall

Powerset is a quixotic search engine company here in San Francisco that has convinced itself it can take on Google.

While Powerset gears up to release its search engine publicly later this year, it hopes to nurture an army of 50,0000 early testers, or “Powerlabbers,” to bang on different parts of it beforehand — the idea being that the converts will not only improve the product, but will help push it at launch.

To lure those volunteers, Powerset seeks to produce a hail of fun projects for them to work on — more on this below.

As reported, Powerset is significant because its search engine aims to understand phrases, not merely individual words as Google does. Powerset’s approach is potentially powerful. However, it will take significant mass education before people change their keyword-based search habits.

Earlier this week, we spent another hour and a half with Powerset to learn about its latest progress. The company is still secretive, but it is planning to open up considerably over the coming weeks.

For the testing phase, Powerset aims to stick volunteer testers on bite-sized pieces of its search problem. It is homing in on sixteen different topical areas – ranging from entertainment to travel and porn – and in each of these areas wants users to provide feedback on its results.

So for example, in the area of entertainment, if a user asks: “Who won an Academy Award in 2001?” Powerset finds that easy. It will produce answers like Halle Berry, who won the award for Monster’s Ball (see image below). But if you ask “What is the most recent movie Halle Berry starred in?” the engine may break down. Powerset tracks the range of questions posed by users, which creates feedback about what is and isn’t working. That way, Powerset hopes to be prepared in key, popular topical areas before launch.


Take another example, travel. See below for the topical page. Volunteers suggest ideas for useful search themes, and they vote to push the best ideas to the top.

A sign of Powerset’s readiness to think differently is its approach to Web architecture. Powerset will base its site on Ruby on Rails, a new, edgy framework liked by engineers for its nimbleness. But Rails is controversial because some say it can’t handle vast amounts of traffic efficiently. Few big-traffic sites have been built on it.

The company that released the framework, 37Signals, has used it for four applications, including its popular Basecamp. CNET’s Chow and Chowhound and, most recently, the popular messaging site Twitter are also built on it.

This is not a company led by one or two brilliant co-founders. Rather, it is a team of what is now dozens of engineers who, to the outsider, seem to share a single quality: a sort of wide-eyed, ebullient confidence, embodied by the relentlessly upbeat chief executive himself, Barney Pell. His two co-founders, Steve Newcomb and Lorenzo Thione, share the same trait. Or, if they have doubts, they try not to show them. That’s why they may pull something off.

Natural language search, as Powerset’s approach is called, faces an enormous challenge. The sheer number of phrases and semantic senses that can be intended by searchers is overwhelming.

Breaking the problem into bite-sized pieces makes sense.

Powerlabs, the name given to the topical test features, launches in September and is taking sign-ups now.

Powerset will specifically target high-school teachers for training on how to use its search engine. Once recruited, the teachers would pass that knowledge on to their students.

In return, Powerset hopes to get feedback on its main search engine. See below for an example of a query: “Who proved Fermat’s last theorem?” Powerset provides a big blue feedback box. This way, if Powerset provides a poor result, testers can alert Powerset’s engine to the shortcoming.

Powerset is also working with databases to fill its result pages with more information. We’ve been told Powerset has partnered with Metaweb’s Freebase (first reported by TechCrunch, which misspelled the name), though Powerset wouldn’t comment. In the entertainment example above, it pulls the “meta” information stored in Metaweb about Halle Berry into a widget. The widgets are useful, even if they’re not part of the main search engine technology. Powerset hopes to let bloggers embed the widgets in their blogs when they write about related material.

Another example of metadata in use is the result below about Steve Jobs and the iPod: you’ll see it pulls in bio information and videos.