Find it Fast

Semantic Entity Extraction and Linking for Annotation and Ontology Evolution

Sabita Acharya

Bio:

Sabita is a PhD student in Computer Science at the University of Illinois at Chicago. Her research interests include Natural Language Processing, Data Mining, and Semantic Web. In her free time, she loves hiking, travelling, and reading mystery-detective novels.

Project Description:

A number of entity linking tools take unstructured (or sometimes semi-structured) text, extract terms (often noun phrases), and link those extracted entities to entities in a knowledge base. We have a tool, currently named Linkipedia, that addresses this challenge; its use in one setting is described at http://nlp.cs.rpi.edu/paper/bioel.pdf. In the DataONE setting, our updated toolset leverages existing knowledge sources, including DBpedia and a number of ontologies relevant to earth science, to support entity linking from descriptions of data. We use the tool in the DataONE project to link portions of textual descriptions to ontology terms, and then use those linking results to provide automatic annotation. While the results are promising, annotation accuracy could be improved. The tool suite also includes several other components, among them a noun phrase extractor. The linking component takes text and proposes appropriate links to knowledge base and ontology items. Combined with the noun phrase extractor, it can take a description and identify noun phrases that do not link to any term in the known ontologies, which is one way of identifying gaps in the ontology.
This project will package existing components into a web service for automatic annotation and ontology gap analysis. It will also attempt to improve on the suggested annotations and links.
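
The gap-analysis idea described above can be illustrated with a minimal Python sketch. It is not the Linkipedia implementation: it assumes the noun phrases have already been extracted, and it stands in for real entity linking with a simple case-insensitive label match. The function names, the toy labels, and the example.org identifiers are all hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and spacing."""
    return " ".join(phrase.lower().split())


def link_phrases(noun_phrases, ontology_labels):
    """Split phrases into linked ones (phrase -> term ID) and unlinked gap candidates.

    Exact label matching is an assumption made for illustration; a real
    linker would rank candidates using context and knowledge-base structure.
    """
    index = {normalize(label): term_id for label, term_id in ontology_labels.items()}
    linked, gaps = {}, []
    for phrase in noun_phrases:
        term_id = index.get(normalize(phrase))
        if term_id is not None:
            linked[phrase] = term_id
        else:
            gaps.append(phrase)  # no ontology term found: a possible gap
    return linked, gaps


# Hypothetical toy ontology and noun phrases from a dataset description.
ontology_labels = {
    "Soil Temperature": "http://example.org/onto/SoilTemperature",
    "Precipitation": "http://example.org/onto/Precipitation",
}
noun_phrases = ["soil temperature", "leaf litter depth", "precipitation"]

linked, gaps = link_phrases(noun_phrases, ontology_labels)
print("Linked:", linked)
print("Gap candidates:", gaps)
```

Phrases that end up in the gap list are only candidates: some will be genuinely missing concepts, while others may be synonyms or spelling variants of existing terms that a stronger linker would resolve.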

DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement. Acknowledgement: This material is based upon work supported by the National Science Foundation under Grant Numbers 0830944 and 1430508. Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.