Czech HMM-based Tagger (using full morphology)

Download

Description

The HMM-based Tagger is an implementation of the Czech tagger developed
at UFAL. To work, the tagger requires preprocessing by a Czech morphological
module with very high coverage. This module covers a superset of the Czech
"HM" morphology. The morphological module and the tagger are supplied in two
independent packages as binary executables, together with all necessary
precompiled Czech data.
Input must be encoded in ISO Latin 2 (iso-8859-2) and conform to the usual
csts.dtd definition; output is produced in the same form (ISO Latin 2
encoding, csts.dtd).
(As is the case with many of the tools provided with PDT 1.0, both executables
also accept - and then produce - a "simplified SGML", which is not a real,
valid SGML, but simply contains at least the tags for words, punctuation,
and sentence breaks, one item per line.)
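
Because both executables expect ISO Latin 2, text stored in another
encoding (UTF-8 is a common case today) has to be converted before it is
marked up and passed to the morphological module. The following is a
minimal sketch of such a conversion, not part of the distribution; the
function names and the file arguments are hypothetical and chosen only
for illustration. It uses Python's standard "iso-8859-2" codec and fails
on characters that ISO Latin 2 cannot represent, rather than silently
replacing them.

```python
import unicodedata


def to_latin2(text: str) -> bytes:
    """Encode text as ISO Latin 2 (iso-8859-2).

    The text is first normalized to NFC so that letters written as
    base character + combining mark become single precomposed characters.
    Raises UnicodeEncodeError if a character has no ISO Latin 2 code.
    """
    return unicodedata.normalize("NFC", text).encode("iso-8859-2")


def from_latin2(data: bytes) -> str:
    """Decode ISO Latin 2 bytes (e.g. tagger output) back to a string."""
    return data.decode("iso-8859-2")


def convert_file(src_path: str, dst_path: str) -> None:
    """Convert a UTF-8 file to ISO Latin 2 (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_latin2(text))


sample = "Příliš žluťoučký kůň"
encoded = to_latin2(sample)
```

The conversion only handles the character encoding; the converted text
still has to follow csts.dtd (or the simplified SGML described above)
before the tools will accept it.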

Supported platforms

The tagger and the included morphological module are compiled for Linux
(kernel 2.2.x and above, e.g. Red Hat 7.0 and later).

Installation

Unpack the HMMtgr.tgz archive in the directory where you want the tagger
to live, e.g.: