Misc Links

Wednesday, August 15, 2007

Wired has an article about spock.com, a people search engine that combines crawled and user added content. From the few searches I did, looks like this is good for celebrity names than a regular person with web content. For instance, searching a name like "David Smith" produces these results. Of the top 10 results, only 3 of them actually have the name "David Smith" or something closer and the first result is not one of them. Compare this with a general purpose search engine like Google. Among a dozen random NLP/ML academic names (professors) I tried, it only got Jason Eisner and Tom Mitchell correct. One reason for this poor recall is probably they don't get content from user home pages.(Some sites where this data is derived from include MySpace, Friendster, IMDB, Wikipedia, ratemyprofessors.com, etc.)

Nevertheless, this website is a representative of interesting KDD-style problems that one could do with people names. It is also interesting as people names that we look for fall in the "long tail" without sufficient data to support calling for clever machine learning techniques.