MOLEX

MOLEX (MOrphological LEXicon) is a morphological normalization module that enables morphologically aware search and thus improves search performance. This is particularly important for Croatian, a morphologically complex language. The module uses a morphological lexicon to conflate the inflectional variants of a word into a single representative form (the lemma). A wide-coverage lexicon has been acquired automatically from raw corpora based on a hand-crafted morphology model. Because the morphology model uses a representation framework that can be readily applied to other languages, developing morphological normalization modules for other languages is easy and cost-effective.
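
As a rough illustration of lexicon-based normalization, the following minimal Python sketch maps inflected forms of the Croatian noun "kuća" (house) to their lemma so that a query and a document match regardless of inflection. The lexicon entries and the names TOY_LEXICON, normalize and matches are hypothetical and chosen for this example only; they are not the MOLEX interface or its data.

```python
# Toy form-to-lemma lexicon. Hypothetical entries for illustration only;
# a real lexicon would hold millions of word forms.
TOY_LEXICON = {
    "kuća": "kuća",    # nominative singular
    "kuće": "kuća",    # genitive singular / nominative plural
    "kući": "kuća",    # dative / locative singular
    "kuću": "kuća",    # accusative singular
    "kućom": "kuća",   # instrumental singular
    "kućama": "kuća",  # dative / locative / instrumental plural
}


def normalize(token, lexicon=TOY_LEXICON):
    """Return the lemma of a token, or the lowercased token if it is unknown."""
    key = token.lower()
    return lexicon.get(key, key)


def matches(query, document, lexicon=TOY_LEXICON):
    """True if every normalized query term occurs among the normalized document terms."""
    doc_lemmas = {normalize(t, lexicon) for t in document.split()}
    return all(normalize(t, lexicon) in doc_lemmas for t in query.split())


if __name__ == "__main__":
    print(normalize("Kućom"))
    print(matches("kuća", "živim u velikoj kući"))
```

Normalizing both the query and the indexed text to lemmas is what lets a search for one form retrieve documents containing another. Unknown words fall back to their lowercased surface form in this sketch.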

Specification & Features

Morphological normalization of Croatian nouns, verbs, and adjectives

High-coverage lexicon (covering over 3.5M word forms), constructed semi-automatically from a large representative corpus

Produces a MULTEXT-East morphosyntactic description of each input word form, providing information about the word form's case, gender, number, etc.
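
To show what a morphosyntactic description carries, the sketch below decodes a positional noun tag such as "Ncfsi" into named features. It assumes the usual MULTEXT-East layout for nouns (category, type, gender, number, case, one character each); the exact tag format returned by MOLEX is not specified here, and the names NOUN_POSITIONS and decode_noun_msd are hypothetical.

```python
# Assumed positional layout of a MULTEXT-East noun tag after the leading "N".
# This is an illustrative assumption, not a documented MOLEX output format.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {
        "n": "nominative", "g": "genitive", "d": "dative", "a": "accusative",
        "v": "vocative", "l": "locative", "i": "instrumental",
    }),
]


def decode_noun_msd(msd):
    """Decode a positional noun tag into a dict of feature names and values."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun tag: {msd!r}")
    features = {"category": "noun"}
    # Pair each position after "N" with its attribute; "-" marks an unset value.
    for (name, values), code in zip(NOUN_POSITIONS, msd[1:]):
        if code != "-":
            # Unknown codes are kept as-is rather than guessed.
            features[name] = values.get(code, code)
    return features


if __name__ == "__main__":
    print(decode_noun_msd("Ncfsi"))
```

Under these assumptions, "Ncfsi" would describe a common feminine noun in the instrumental singular, which is the reading expected for a form like "kućom".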