wampeter

Projects

foma is a compiler, programming language, and C library for constructing finite-state automata and transducers. It has specific support for many natural language processing applications, such as building morphological analyzers. Although NLP is probably its main use, foma is general enough to serve many other purposes. It comes with an xfst-compatible interface and regular expression language. The library contains efficient implementations of the classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, and boolean operations. More advanced construction methods are also available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
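To make two of the classical algorithms named above concrete, here is a minimal Python sketch of epsilon-removal combined with determinization (the textbook subset construction). It is an illustration only and does not use or reflect foma's API or its internal implementation; the dictionary-based automaton encoding, the `EPS` marker, and the function names are all assumptions made for this example.

```python
from collections import deque

EPS = None  # hypothetical marker for epsilon arcs in this sketch


def epsilon_closure(states, delta):
    """Return every state reachable from `states` using epsilon arcs only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def determinize(start, finals, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = epsilon_closure({start}, delta)
    dfa_delta, dfa_finals = {}, set()
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        if subset & finals:  # final if it contains any NFA final state
            dfa_finals.add(subset)
        for symbol in alphabet:
            targets = set()
            for q in subset:
                targets |= set(delta.get((q, symbol), ()))
            if not targets:
                continue  # no arc: leave the DFA partial
            target_set = epsilon_closure(targets, delta)
            dfa_delta[(subset, symbol)] = target_set
            if target_set not in seen:
                seen.add(target_set)
                queue.append(target_set)
    return start_set, dfa_finals, dfa_delta


def accepts(dfa, word):
    """Run the (partial) DFA on `word`; a missing arc means rejection."""
    state, finals, delta = dfa
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals


# Example NFA for (a|b)*ab, with an epsilon arc from "s" into state 0.
nfa_delta = {
    ("s", EPS): {0},
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
dfa = determinize("s", {2}, nfa_delta, alphabet="ab")

for word in ["ab", "aab", "abb", ""]:
    print(repr(word), accepts(dfa, word))
```

The result is a partial DFA whose states are sets of the original states; minimization would be a separate pass over this output.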

Treba is a command-line tool for training, decoding, and calculating with weighted (probabilistic) finite-state automata (WFSA/PFSA). Training algorithms include Baum-Welch (EM), Viterbi training, and Baum-Welch augmented with deterministic annealing. Treba is optimized for speed and numerical stability, and the training algorithms can run multi-threaded on hardware with multiple cores/CPUs. Forward, backward, and Viterbi decoding are supported. Automata for training/decoding are read from a text file, or can be generated with random or uniform transition probabilities in one of several topologies: ergodic (fully connected), Bakis (left-to-right), or deterministic. Observations used for training or decoding are read from text files compatible with the AT&T finite-state tools and OpenFst.
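As an illustration of what forward decoding computes, and of why numerical stability matters, the following Python sketch evaluates the forward probability of an observation sequence under a small PFSA, working in log space so that long sequences do not underflow. This is not Treba's code, file format, or command-line interface; the in-memory representation (`arcs`, `final`), the example automaton, and the function names are assumptions chosen for this example.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) for a non-empty list of finite log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_logprob(obs, arcs, final, start=0):
    """Log-probability that the PFSA generates `obs` and then halts.

    arcs:  state -> list of (symbol, target_state, probability)
    final: state -> halting probability
    """
    alpha = {start: 0.0}  # log forward scores; log(1) = 0 at the start state
    for symbol in obs:
        incoming = {}
        for q, logp in alpha.items():
            for arc_symbol, target, prob in arcs.get(q, ()):
                if arc_symbol == symbol and prob > 0.0:
                    incoming.setdefault(target, []).append(logp + math.log(prob))
        if not incoming:
            return -math.inf  # no path can emit this symbol here
        alpha = {q: logsumexp(scores) for q, scores in incoming.items()}
    terms = [logp + math.log(final[q])
             for q, logp in alpha.items() if final.get(q, 0.0) > 0.0]
    return logsumexp(terms) if terms else -math.inf


# Hypothetical two-state PFSA; outgoing plus halting mass sums to 1 per state.
arcs = {
    0: [("a", 0, 0.3), ("a", 1, 0.3), ("b", 1, 0.2)],
    1: [("b", 1, 0.5)],
}
final = {0: 0.2, 1: 0.5}

print(math.exp(forward_logprob("ab", arcs, final)))
print(forward_logprob("b" * 2000, arcs, final))
```

For the sequence of 2000 `b` symbols, multiplying the raw probabilities would underflow to zero in double precision, while the log-space score stays finite. The backward pass and Viterbi decoding follow the same pattern, with Viterbi replacing the log-sum with a maximum.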