Abstract

Word sense disambiguation (WSD) is the problem of selecting a sense for a word from a set of predefined possibilities. This is a significant problem in the biomedical domain, where a single term may denote a gene or a protein, or may be an ambiguous abbreviation. In this paper, we evaluate SENSATIONAL, a novel unsupervised WSD technique, in comparison with two popular learning algorithms: support vector machines (SVM) and K-means. Based on the accuracy measure, our results show that SENSATIONAL outperforms SVM and K-means by 2% and 17%, respectively. In addition, we develop a polysemy-based search engine and an experimental visualization application that utilizes SENSATIONAL’s clustering technique.

Background

There are three types of WSD techniques (Ide and Veronis 1998): supervised learning, unsupervised learning, and knowledge-based WSD. Supervised techniques require manually-labeled examples of each ambiguous term in the data set in order to predict the correct sense of the same word in a new context. These labeled examples constitute the training material, from which the learning algorithm builds a classification scheme that maps feature-encoded inputs to their appropriate sense label or category. The result of this training is a classifier that can be applied to future instances of the ambiguous word.

As a machine-learning classification problem, WSD has several characteristics that distinguish it from other traditional classification problems in NLP. First, due to the difficulty of manually labeling examples of ambiguous terms, there is usually little training data available for the WSD task. For example, the data set of ambiguous biomedical terms available from the National Library of Medicine (NLM) contains only 100 examples of each term used in context. Second, the number of senses of an ambiguous word can be quite large. Take the word cold as an example: it has more than 10 meanings according to the Merriam-Webster Online Dictionary. By contrast, in other NLP classification problems, such as POS tagging (part-of-speech tagging, which marks up each word in a text as corresponding to a particular part of speech), a word usually has only one or two POS’s in a single language. Finally, the features used for classification, which are extracted from the context of a target word, usually include lexical features. Without proper feature-selection criteria, the number of possible lexical features available to a machine learning algorithm can be very large, while the frequencies with which they occur in the data sets can be very low.
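To illustrate why the lexical feature space grows so quickly, a common encoding records each word in a fixed window around the target, tagged with its positional offset, so that every distinct (offset, word) pair becomes a feature. The window size and feature format below are illustrative assumptions, not the paper's actual feature set:

```python
import re

def context_features(sentence, target, window=3):
    """Collect lexical features from a +/-window token span
    around each occurrence of the target word."""
    tokens = re.findall(r"\w+", sentence.lower())
    feats = set()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                # Feature = positional offset plus surface word,
                # e.g. "-1:a" or "+2:winter".
                feats.add(f"{j - i:+d}:{tokens[j]}")
    return feats

print(sorted(context_features(
    "The patient caught a cold last winter", "cold", window=2)))
```

Because each feature pairs an offset with a vocabulary word, the feature space is roughly (2 × window) × vocabulary size, yet any single instance activates only a handful of them, which produces exactly the large-and-sparse feature profile described above.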