Disease Named Entity Recognition by Machine Learning Using Semantic Type of Metathesaurus

Zhong Huang and Xiaohua Hu

Abstract—Named Entity Recognition (NER) has been an active research field in biomedical text mining. In past years, much attention has focused on semantic types related to proteins, genes, and other named entities in the biology domain. Recognition of human disease named entities in the literature, however, has received far less attention. Compared with NER solutions targeting protein and gene named entities, existing machine learning solutions do not reach the same level of precision and recall for disease named entities. Machine learning based NER for disease named entities has largely relied on local features of the tokens in a sentence, integrating their linguistic, orthographic, morphological, and local contextual characteristics. In this paper, we use sentence-level semantic contextual information as one of the discriminative features for disease named entity (NE) recognition. Our method takes advantage of disease-related semantic types in the UMLS Metathesaurus through fuzzy dictionary lookup. The results show promise for improving the performance of current disease NER methods.