Part-of-Speech Tagging

In linguistics, part-of-speech tagging (POS tagging or POST) is the process of marking up
each word in a text as corresponding to a particular part of speech, based on both its
definition and its context, i.e. its relationship with adjacent and related words
in a phrase, sentence, or paragraph.

Part-of-speech tagging is hard because many words can represent more than one part of speech
depending on context (for example, "fish" can be a noun or a verb), and because some parts of
speech are complex or unspoken, which is not rare in natural languages.

Here we implement a part-of-speech tagger based on hidden Markov models (HMMs) in Java.
Compared to more advanced algorithms (e.g. those based on maximum entropy classifiers
or conditional random fields), this implementation is very fast while providing comparable
accuracy.
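
To make the HMM approach concrete, the following is a minimal sketch of Viterbi decoding for a first-order HMM tagger. It is not the implementation described here: the class name HmmTaggerSketch, the two-tag tagset, the two-word vocabulary, and all probabilities are invented for illustration, whereas a real tagger would estimate its parameters from an annotated corpus and handle unknown words.

```java
import java.util.Arrays;
import java.util.List;

/** Toy first-order HMM tagger (hypothetical; every number below is made up). */
final class HmmTaggerSketch {
    static final String[] TAGS = {"NOUN", "VERB"};
    static final List<String> VOCAB = Arrays.asList("people", "fish");

    // PI[i]: probability that a sentence starts with tag i.
    static final double[] PI = {0.8, 0.2};
    // A[i][j]: probability of tag j following tag i.
    static final double[][] A = {{0.3, 0.7}, {0.6, 0.4}};
    // B[i][w]: probability that tag i emits word w (indexed as in VOCAB).
    static final double[][] B = {{0.6, 0.4}, {0.1, 0.9}};

    /** Returns the most probable tag sequence for the sentence (Viterbi, in log space). */
    static String[] tag(String... sentence) {
        int n = sentence.length;
        int k = TAGS.length;
        if (n == 0) return new String[0];

        // Map words to vocabulary indices; this sketch rejects unknown words.
        int[] obs = new int[n];
        for (int t = 0; t < n; t++) {
            obs[t] = VOCAB.indexOf(sentence[t]);
            if (obs[t] < 0) throw new IllegalArgumentException("unknown word: " + sentence[t]);
        }

        double[][] score = new double[n][k]; // best log-probability of a path ending in tag j at t
        int[][] back = new int[n][k];        // previous tag on that best path

        for (int j = 0; j < k; j++) {
            score[0][j] = Math.log(PI[j]) + Math.log(B[j][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int j = 0; j < k; j++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < k; i++) {
                    double s = score[t - 1][i] + Math.log(A[i][j]);
                    if (s > bestScore) { bestScore = s; best = i; }
                }
                score[t][j] = bestScore + Math.log(B[j][obs[t]]);
                back[t][j] = best;
            }
        }

        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) last = j;
        }
        String[] tags = new String[n];
        for (int t = n - 1; t >= 0; t--) {
            tags[t] = TAGS[last];
            last = back[t][last];
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tag("fish")));           // [NOUN]
        System.out.println(Arrays.toString(tag("people", "fish"))); // [NOUN, VERB]
    }
}
```

With these toy parameters, "fish" on its own is tagged as a noun, but after "people" it is tagged as a verb, which illustrates how the transition probabilities let context resolve an ambiguous word. Decoding takes time proportional to the sentence length times the square of the number of tags, which is one reason HMM taggers tend to be fast.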