Documentation tools

Morphological analysis

The project uses a set of morphological compilers which
exist in two versions: the Xerox tools and the
hfst tools. The Xerox tools are the original
ones. They are robust, well documented and freely
available for research, but they are not open source. The hfst
tools are open source with no restrictions, but they are still
quite new (with version numbers like 0.6). Both compilers
compile the same source files, and at Giellatekno we use
both.

A third compiler, foma, is also able to compile source files
written for xfst and lexc.

The Xerox compilers

The Xerox tools can be found at fsmbook.com. They are
documented in the book referred to on that page (Beesley and Karttunen). We strongly recommend that anyone
working on morphological transducers, whether with the Xerox or the hfst tools, buy the book.

Note

There is a bug in the latest xfst which causes forms like oslolaččat
(derived from Oslo) to fail. If this is important to you, download
xfst 2.13, rename the binary to
xfst, and put it in e.g. $HOME/bin.
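
The renaming step in the note can be sketched in Python as follows. This is a minimal sketch, not an official install procedure: the function names install_as_xfst and is_on_path are hypothetical helpers, and the name of the downloaded file (xfst-2.13 in the usage comment) is an assumption that depends on how you obtained the binary.

```python
import os
import shutil
import stat
from pathlib import Path


def install_as_xfst(downloaded, bin_dir=None):
    """Copy a downloaded xfst binary into bin_dir under the name 'xfst'."""
    # Default target is $HOME/bin, the example directory given in the note.
    bin_dir = Path(bin_dir) if bin_dir is not None else Path.home() / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    target = bin_dir / "xfst"
    shutil.copy2(downloaded, target)
    # Make sure the copy is executable for the current user.
    target.chmod(target.stat().st_mode | stat.S_IXUSR)
    return target


def is_on_path(directory):
    """Return True if directory is listed in the PATH environment variable."""
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return str(Path(directory)) in [str(Path(e)) for e in entries if e]


# Hypothetical usage (the file name is an assumption):
#   install_as_xfst("xfst-2.13")
#   is_on_path(Path.home() / "bin")
```

If is_on_path returns False for the target directory, the shell will not find the renamed binary until that directory is added to PATH.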

twolc, for phonological and morphophonological rules (cf. a shorter and a longer documentation).

lexc, for representing the Saami stems and the affix lexica.

xfst, the finite-state transducer tool, for integrating the different parts
of the program, and for compiling the preprocessor.

tokenize, for tokenization and processing
(cf. documentation).
Note that we do not use tokenize for preprocessing at the moment, but Perl.

The programs are started by typing e.g. lexc and
then pressing the enter key. The tools are documented in Beesley and
Karttunen, Finite-State Morphology:
Xerox Tools and Techniques. The tools may also be
installed on your own machine, be it on Mac OSX, Linux or Windows. One
version of the software is found on the CD accompanying the book. For
the latest version, ask Trond for a reference.
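
Before typing the commands, it can help to check which of the tools are actually installed. The following is a minimal sketch using the Python standard library. The helper names find_tools and missing_tools are hypothetical, and the command names are simply those used in the text above, so they may differ on your installation.

```python
import shutil

# Command names as used in the text above; adjust to your installation.
XEROX_TOOLS = ("twolc", "lexc", "xfst", "tokenize")


def find_tools(names=XEROX_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=XEROX_TOOLS):
    """Return the tool names that cannot be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    # Print one line per tool, showing where it was found (if at all).
    for name, path in find_tools().items():
        print(f"{name}: {path or 'not found'}")
```

The same functions can be called with other command names, for instance those of the hfst tools or foma, by passing a different tuple of names.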

The foma compiler

Disambiguation tools

Analysis and testing

The easiest and most effective way to run analyses and tests (although a
little scary at first) is to use command-line tools. We have written a
short introduction in English and a
longer document in Norwegian on
this topic. The introduction
on how to use our parser is also an excellent introduction to
combining the individual tools.