Tuesday, April 03, 2007

Recap of recent posts

There has been a flurry of long posts on papers and lectures here in the last week. It might have been a bit overwhelming. I would not be surprised if your eyes glazed over on a Monday morning -- ugh, too much to read -- and the posts passed you by.

But there is some really good stuff in there. In case you missed them, I wanted to highlight a few key posts on particularly interesting topics:

"Knowledge extraction from search queries" talks about a couple papers out of Google on extracting facts from the Web. Question answering -- correctly answering questions such as "How old is Larry Page?" -- is an important and promising path to improving web search. This Google work is particularly unusual in that they propose using query logs and the information in them to help with knowledge extraction and question answering.

"The end of federated search?" and "Google and the deep web" discuss Google's efforts to crawl the deep web, data normally hidden in private databases behind html forms. Deep web data would make web search more comprehensive and, because the data often is well structured, could be particularly useful for improving question answering. The key part of the Google work is that it rejects a common technique of accessing deep web data in real-time, instead proposing copying everyone else's data to Google's servers.

"More on data center in a trailer" talks about Microsoft's and others' efforts to factory-install thousands of computers in a shipping container and the efficiencies gained from that approach.