Article Structure

Abstract

Named Entity Disambiguation (NED) refers to the task of mapping different named entity mentions in running text to their correct interpretations in a specific knowledge base (KB).

Introduction

Named entities (NEs) have received much attention over the last two decades (Nadeau and Sekine, 2007), mostly focused on recognizing the boundaries of textual NE mentions and classifying them as, e.g., Person, Organization or Location.

Topics

named entity

Named Entity Disambiguation (NED) refers to the task of mapping different named entity mentions in running text to their correct interpretations in a specific knowledge base (KB).

Page 1, “Abstract”

Named entities (NEs) have received much attention over the last two decades (Nadeau and Sekine, 2007), mostly focused on recognizing the boundaries of textual NE mentions and classifying them as, e.g., Person, Organization or Location.

Page 1, “Introduction”

Named Entity Disambiguation

Page 1, “Introduction”

The second line of approach is collective named entity disambiguation (CNED), where all mentions of entities in the document are disambiguated jointly.

Page 2, “Related Work”

cos: The cosine similarity between the named entity textual mention and the KB entry title.

Page 3, “Solution Graph”

Named Entity Selection: The simplest approach is to select the highest ranked entity in the list for each mention mi according to equation 5, where R could refer to Rm or R5.

Page 4, “Solution Graph”
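The selection step described above can be sketched as a per-mention argmax over candidate scores. This is a minimal illustration, not the paper's code; the names and data layout are our assumptions, and `rank` stands in for whichever ranking R is in use (e.g. Rm or the PageRank-based ranking).

```python
# Hypothetical sketch of highest-ranked-candidate selection per mention.
def select_entities(candidates, rank):
    """candidates: dict mapping each mention to its list of candidate KB entries.
    rank: dict mapping (mention, candidate) pairs to a ranking score R.
    Returns the highest-ranked candidate for each mention."""
    return {
        mention: max(cands, key=lambda c: rank[(mention, c)])
        for mention, cands in candidates.items()
    }
```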

Our results show that PageRank in conjunction with re-ranking by initial confidence score can be used as an effective approach to collectively disambiguate named entity textual mentions in a document.

confidence score

Each candidate has an associated initial confidence score, also detailed below.

Page 3, “Solution Graph”

Initial confidence scores of all candidates for a single NE mention are normalized to sum to 1.

Page 3, “Solution Graph”
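The per-mention normalization described above amounts to dividing each candidate's score by the sum over that mention's candidate list, yielding a probability distribution. A minimal sketch (function and data names are illustrative, not from the paper):

```python
def normalize_confidences(scores):
    """Normalize one mention's candidate confidence scores to sum to 1.
    scores: dict mapping candidate KB entries to raw confidence values."""
    total = sum(scores.values())
    return {cand: s / total for cand, s in scores.items()}
```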

One is a setup where a ranking based solely on different initial confidence scores is used.

Page 4, “Solution Graph”

We used the best initial confidence score (Freebase) for re-ranking.

Page 5, “Solution Graph”

Our results show that PageRank in conjunction with re-ranking by initial confidence score can be used as an effective approach to collectively disambiguate named entity textual mentions in a document.

edge weights

(initial node and edge weights set to 1, edges being created wherever Ref or J Prob are not zero).

Page 5, “Solution Graph”

In the first experiment, referred to as PR1, initial confidence is used as the initial node rank for PR and edge weights are uniform; as in the PR baseline, edges are created wherever Ref or J Prob are not zero.

Page 5, “Solution Graph”
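The PR1 setup above can be read as a personalized PageRank: initial confidence scores serve as both the starting ranks and the teleport distribution, while edges carry uniform weight. The sketch below is our reconstruction under those assumptions; the damping factor and iteration count are illustrative defaults, not the paper's reported settings.

```python
# Minimal personalized-PageRank sketch of PR1 (assumed details noted above).
def pagerank(nodes, edges, init, d=0.85, iters=50):
    """nodes: list of node ids; edges: set of undirected (u, v) pairs;
    init: dict node -> initial confidence (assumed to sum to 1)."""
    neighbours = {n: [] for n in nodes}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    rank = dict(init)
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Teleport back to the initial confidence distribution, plus
            # uniform shares of each neighbour's current rank (uniform edges).
            inflow = sum(rank[m] / len(neighbours[m]) for m in neighbours[n])
            new[n] = (1 - d) * init[n] + d * inflow
        rank = new
    return rank
```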

In our second experiment, PRC, entity coherence features are tested by setting the edge weights to the coherence score and using uniform initial node weights.

Page 5, “Solution Graph”

edge weighting approaches, where for each approach edges were created only where the coherence score according to the approach was nonzero.

Page 5, “Solution Graph”

We also investigated a variant, called J Prob + Ref, in which the Ref edge weights are normalized to sum to 1 over the whole graph and then added to the J Prob edge weights (here edges result wherever J Prob or Ref scores are nonzero).
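The J Prob + Ref combination described above can be sketched directly: Ref weights are normalized over the whole graph and added to the J Prob weights, with an edge wherever either score is nonzero. Names and data layout are our assumptions for illustration.

```python
# Hypothetical sketch of the J Prob + Ref edge-weighting variant.
def jprob_plus_ref(jprob, ref):
    """jprob, ref: dicts mapping an edge (u, v) to its score.
    Ref weights are normalized to sum to 1 over the whole graph first."""
    total_ref = sum(ref.values())
    edges = set(jprob) | set(ref)  # edge exists if either score is nonzero
    return {
        e: jprob.get(e, 0.0)
           + (ref.get(e, 0.0) / total_ref if total_ref else 0.0)
        for e in edges
    }
```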

cosine similarity

cos: The cosine similarity between the named entity textual mention and the KB entry title.

Page 3, “Solution Graph”

ijim: While the cosine similarity between a textual mention in the document and the candidate

Page 3, “Solution Graph”

The cosine similarity between “Essex” and “Danbury, Essex” is higher than that between “Essex” and “Essex County Cricket Club”, which is not helpful in the NED setting.

Page 3, “Solution Graph”
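The "Essex" observation above can be reproduced with a simple token-set cosine similarity: the mention overlaps fully with the two-token title "Danbury, Essex" but only partially with the four-token "Essex County Cricket Club", so the surface score points the wrong way for NED. This is a sketch of one plausible formulation, not necessarily the paper's exact feature computation.

```python
import math
import re

def cos_sim(a, b):
    """Cosine similarity over the token sets of two strings."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / math.sqrt(len(ta) * len(tb))
```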

ctxt: The cosine similarity between the sentence containing the NE mention in the query document and the textual description of the candidate NE in the KB (we use the first section of the Wikipedia article as the candidate entity description).
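The ctxt feature above compares full text spans rather than titles, so a bag-of-words cosine over term-count vectors is a natural reading. The sketch below uses raw term counts as one plausible formulation; the paper may weight terms differently.

```python
import math
import re
from collections import Counter

def ctxt_sim(sentence, description):
    """Cosine similarity between the term-count vectors of the sentence
    containing the NE mention and the candidate's KB description."""
    v1 = Counter(re.findall(r"\w+", sentence.lower()))
    v2 = Counter(re.findall(r"\w+", description.lower()))
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```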