Using Wikipedia to Improve Search Quality

Bill Slawski put up an interesting post over the weekend titled “Can Web Search Use Wikipedia to Understand References to Names?” In it, Bill discusses a paper by Microsoft researcher Silviu Cucerzan. The gist of the paper is that a search engine can use Wikipedia as a cross-referencing source: when it sees a name like “Bush” in a document, Wikipedia can help it work out which Bush is being referred to (George W. Bush, his father, Reggie Bush, or someone else).

In principle, the paper discusses how the context in which a name is used in a web document can be compared to the context in which that name is used on Wikipedia. Simplistically put, if the reference to “Bush” appears on a site about the New Orleans Saints, the likelihood that it’s about Reggie Bush is quite high. The search engine can use an external reference source, such as Wikipedia, as a method of validation, by trying the various Wikipedia pages for people with the last name Bush and noting which references they have in common with the document.

For example, the Wikipedia page for Reggie Bush and the web page being analyzed probably both use phrases like New Orleans Saints, football, running back, etc. By developing this sense of context, the search engine can classify the web page more accurately, even if the page never uses the running back’s full name. So if the user searches for Reggie Bush, the search engine will know that this particular web page can be considered relevant to the query.
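To make the idea of comparing contexts concrete, here is a minimal sketch in Python. It is not the method from Cucerzan's paper, just a simple bag-of-words comparison using cosine similarity. The candidate descriptions in `CANDIDATES` are short hypothetical stand-ins for the text of the real Wikipedia pages, and the names `tokenize`, `cosine`, and `disambiguate` are made up for this illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical stand-ins for Wikipedia page text; a real system would
# work from the full articles, not one-line summaries.
CANDIDATES = {
    "Reggie Bush": "American football running back who played for the New Orleans Saints",
    "George W. Bush": "43rd president of the United States and former governor of Texas",
    "George H. W. Bush": "41st president of the United States and former director of the CIA",
}

# A tiny stopword list, assumed here only to keep common words from matching.
STOPWORDS = {"the", "of", "and", "for", "a", "an", "who", "in", "on", "to"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def disambiguate(page_text, candidates=CANDIDATES):
    """Return the candidate whose description shares the most context
    with the page, along with the score for every candidate."""
    page = Counter(tokenize(page_text))
    scores = {
        name: cosine(page, Counter(tokenize(description)))
        for name, description in candidates.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    page = "Bush rushed for two touchdowns as the Saints running back helped New Orleans win"
    best, scores = disambiguate(page)
    print(best)
    for name, score in scores.items():
        print(f"{name}: {score:.3f}")
```

In this toy example, the page shares words such as “saints”, “running”, “back”, “new”, and “orleans” with the Reggie Bush description and nothing with the other two, so Reggie Bush comes out on top. A production system would presumably weight rare terms more heavily and draw on much richer signals than raw word overlap.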

It makes for interesting reading, and provides some insight into the types of analysis that search engines perform. What makes this even more striking is that it is just one of thousands of such scenarios that search engines have to deal with. It’s a complicated process, indeed.