EntityCube project gathers Web info on everyone

Ever tried creating a Wikipedia entry for yourself only to have it promptly deleted because you’re not famous? Well, on Microsoft Research’s beta EntityCube site, everyone is famous.

The site, developed by Microsoft Research Asia, gathers people’s information from all over the Web and organizes it into Wikipedia-like entries. Search EntityCube for “Steve Ballmer” and there it all is: the Microsoft CEO’s birthplace, height, recent news mentions, (fake) Twitter accounts, colleagues and more. Plus graphical representations of online hype and personal connections. Cool.

But try searching for someone not famous, such as me, “Nick Eaton,” and it’s clear EntityCube isn’t a finished product. The only things EntityCube gets right about me are some out-of-date biographical information (taken from my YouTube profile) and a few blog posts I’ve written.

And if you try EntityCube’s name disambiguation, I’m not even one of the eight people EntityCube finds named Nick Eaton. Considering my relatively high presence on the Internet, since I work for Web-only seattlepi.com and have been involved with digital mainstream media for a number of years, the results are surprisingly inaccurate. True, I used to work for The Spokesman-Review in Spokane, but I don’t anymore. And I am not an assistant director at Alexander Howden Marine & Energy Ltd. in the United Kingdom.

“EntityCube is an entity search and summarization engine,” the MSR project site states, “which automatically summarizes the Web for the long tail, not just celebrities!”

“Not just celebrities” my butt.

A screenshot of EntityCube with my annotations.

OK, so I know I’m not the only Nick Eaton in the world; the world doesn’t revolve around me. I’m not surprised EntityCube grabbed info about other Nick Eatons. I didn’t expect it to get everything about me correct. But c’mon, I’m not even one of the disambiguated Nick Eatons?

EntityCube is the English-language version of MSR Asia’s Renlifang, which translates from Chinese into, well, “entity cube.” Renlifang has been well received in China, attracting about 1 million users on peak days, Microsoft said.

CNET News’ Ina Fried reported on EntityCube on Monday, noting that its name is similar to the “entity cards” Microsoft has added to Bing search results. Search Bing for, say, “Washington State University” and a card-like module with basic information tops the results.

“The project is coming out of Microsoft’s research arm, but it would seem to be highly relevant to where the company’s Bing efforts are headed,” Fried wrote. “Something like EntityCube could conceivably allow Microsoft to expand that beyond the types of well-known people, such as musicians, for whom it currently offers summaries.”

The need for collecting and understanding Web information about a real-world entity (such as a person or a product) currently is fulfilled manually through search engines. But the information about a single entity might appear in thousands of Web pages. Even if a search engine could find all the relevant Web pages about an entity, the user would need to sift through all the pages to get a complete view of the entity. EntityCube is an entity search and summarization system that efficiently generates summaries of Web entities from billions of crawled Web pages. The summarized information is used to build an object-level search engine about people, locations, and organizations and explore their relationships.
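To make that description a little more concrete, here is a minimal Python sketch of the aggregation step it implies: collecting scattered mentions of a name and merging them into one summary. This is not EntityCube's actual code. The function name `summarize_entities`, the tuple layout, the sample mentions and the placeholder URLs are all assumptions made up for illustration.

```python
from collections import defaultdict


def summarize_entities(mentions):
    """Merge (name, attribute, value, source_url) mentions into one summary per name.

    Hypothetical helper for illustration only; it groups purely by name string.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for name, attribute, value, _source_url in mentions:
        grouped[name][attribute].add(value)
    # Convert the nested sets into sorted lists for stable, readable output.
    return {
        name: {attr: sorted(values) for attr, values in attrs.items()}
        for name, attrs in grouped.items()
    }


# Hypothetical mentions; the URLs are placeholders, not real crawl results.
mentions = [
    ("Nick Eaton", "employer", "seattlepi.com", "https://example.com/page1"),
    ("Nick Eaton", "employer", "Alexander Howden Marine & Energy Ltd.", "https://example.com/page2"),
    ("Steve Ballmer", "employer", "Microsoft", "https://example.com/page3"),
]

summaries = summarize_entities(mentions)
print(summaries["Nick Eaton"]["employer"])
```

Because the sketch keys on the name alone, both employers end up under a single “Nick Eaton,” which is exactly the kind of mix-up described above. A real system would need a disambiguation step to decide which mentions belong to which person before merging them.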

A Microsoft spokesperson told me EntityCube currently is “just a research project and no plans for expanding have been announced.” Both EntityCube and Renlifang are in beta.