Using element words to generate (multi)words for the SPECIALIST Lexicon [Poster].

The SPECIALIST Lexicon has been distributed annually by the National Library of Medicine (NLM) since 1994. Lexical records are used for part-of-speech (POS) tagging, indexing, information retrieval, concept mapping, etc. in many Natural Language Processing (NLP) projects, such as Lexical Tools, MetaMap, SemRep, UMLS Metathesaurus, and ClinicalTrials.gov. This paper describes a new systematic approach to identifying single words and multiwords from MEDLINE through the use of element words. Element words are lowercase single words that contain no punctuation and are not stopwords. Results show an accelerated growth of the Lexicon, particularly an increase in multiword records. Hence, improvement in recall or precision can be anticipated in NLP projects using the SPECIALIST Lexicon and its applications.
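
The definition of an element word given above can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the whitespace tokenization, and the function names `is_element_word` and `element_words` are all assumptions made for illustration, and the abstract does not say how digits or non-ASCII punctuation are treated.

```python
import string

# Hypothetical, deliberately tiny stopword list; the abstract does not
# specify which stopword list is actually used.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "to", "is", "for", "with"})


def is_element_word(token: str) -> bool:
    """Return True if token is a lowercase single word, has no punctuation,
    and is not a stopword (the three conditions stated in the abstract)."""
    if not token or any(ch.isspace() for ch in token):
        return False  # empty, or more than a single word
    if token != token.lower():
        return False  # contains an uppercase character
    if any(ch in string.punctuation for ch in token):
        return False  # contains ASCII punctuation (assumed definition)
    return token not in STOPWORDS


def element_words(text: str) -> list[str]:
    """Split text on whitespace (an assumed tokenization) and keep only
    the tokens that qualify as element words."""
    return [tok for tok in text.split() if is_element_word(tok)]
```

For example, under these assumptions `element_words("the protein kinase C inhibitor, in vitro")` keeps `protein`, `kinase`, and `vitro`: `the` and `in` are stopwords, `C` is not lowercase, and `inhibitor,` carries trailing punctuation because the sketch does not strip it before testing.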