Apertium

The Apertium project develops a free/open-source platform for machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalised languages, but also work with more widely spoken languages.

The platform, which includes data for a large number of language pairs, a translation engine, and auxiliary tools, is being developed around the world, largely at universities and companies (e.g. Prompsit Language Engineering), although independent free-software developers also play a major role.

There are currently 30 published language pairs within the project, including a number of "firsts" such as Aragonese-Spanish, Turkish-Kyrgyz, Spanish-Occitan, Breton-French, and Basque-Spanish, and several more are in development.

Projects

Apertium id-ms: Indonesian-Malaysian machine translation
The Indonesian-Malaysian language pair in Apertium currently does not have active maintainers. The objective of this project is to develop a release-quality version of the Apertium id-ms language pair. The morphological analyzers for both Indonesian and Malaysian will be improved; the Indonesian and Malaysian dictionaries will be completed.

Apertium on your mobile
Provide users with Apertium's translation services on their mobile phones, with additional features such as:
• a different keypad for each language
• translation of SMS messages and other text content, such as contacts, addresses, and memos
• a basic form for translating text

Corpus-based lexicalised feature transfer
This project will deal with setting additional lexical features, taking context into account. The main idea is to extend the Apertium pipeline with a new module, placed after POS tagging and before transfer, that sets additional tags which the transfer module can later use. Examples of such tags include noun definiteness and verb aspect.
The goal of the project is both to improve the existing sh-mk (Serbo-Croatian-Macedonian) pair and to serve as a prototype for similar corpus-based modules.
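To make the idea of a module between tagging and transfer more concrete, here is a minimal sketch of such a stream filter. It reads disambiguated lexical units in Apertium's stream format (e.g. `^house<n><sg>$`) and appends an extra tag to a noun based on the lemma before it. The lookup table `FEATURE_TABLE`, the tags `def`/`ind`, and the function name `add_features` are all hypothetical, English lemmas are used only for readability even though the target pair is sh-mk, and the sketch assumes tagger output without surface forms or escaped characters. A real corpus-based module would learn its decisions from data rather than from a hand-written table.

```python
import re

# One disambiguated lexical unit as emitted by the tagger, e.g. ^house<n><sg>$.
# Assumption: no surface forms and no escaped characters inside units.
LU_RE = re.compile(r"\^([^<$^/]+)((?:<[^<>]+>)*)\$")

# Hypothetical corpus-derived table: (previous lemma, noun lemma) -> extra tag.
# A real module would learn these decisions from a corpus.
FEATURE_TABLE = {
    ("this", "house"): "def",
    ("a", "house"): "ind",
}


def add_features(stream, table):
    """Append an extra tag to each noun whose (previous lemma, lemma) pair is in `table`."""
    out = []
    last_end = 0
    prev_lemma = None
    for m in LU_RE.finditer(stream):
        lemma, tags = m.group(1), m.group(2)
        out.append(stream[last_end:m.start()])  # keep blanks and formatting untouched
        if tags.startswith("<n>"):
            extra = table.get((prev_lemma, lemma))
            if extra is not None:
                tags += "<" + extra + ">"
        out.append("^" + lemma + tags + "$")
        prev_lemma = lemma
        last_end = m.end()
    out.append(stream[last_end:])
    return "".join(out)
```

For example, `add_features("^this<det><dem><sg>$ ^house<n><sg>$", FEATURE_TABLE)` returns `"^this<det><dem><sg>$ ^house<n><sg><def>$"`, and a transfer rule could then match on `<def>`. Wrapped as a stdin-to-stdout filter, a function like this could be placed in the pipeline between the tagger and the transfer module without changing either of them.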

Make lttoolbox-java embeddable
Currently, lttoolbox-java can only be used from the command line, and it relies on external resources for the language pair being translated, which the user must download and compile separately. The aim of this task is to overcome this limitation so that a language pair can be packaged as a self-contained JAR file that is easy to integrate into larger Java projects.

Rule-based finite-state disambiguation
Design an XML formalism for writing disambiguation rules, along with a validator for it, the upgrades to lttoolbox needed to represent the rules as a finite-state transducer, a compiler, and a processor that applies the rules to an Apertium input stream.
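As a rough illustration of what such a formalism and processor might look like, the sketch below defines a hypothetical XML rule format (the `<rules>`, `<rule>`, `<remove>`, and `<context>` elements and their attributes are invented for this example and are not an existing Apertium format), validates it minimally, and applies the rules to an ambiguous Apertium stream in which each lexical unit lists a surface form followed by its readings, e.g. `^houses/house<n><pl>/house<vblex><pri><p3><sg>$`. Unlike the project itself, which calls for compiling the rules into a finite-state transducer with lttoolbox, this sketch interprets the rules directly in Python, and it ignores escaped characters in the stream.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule format: remove readings carrying `tag` when every context holds.
RULES_XML = """<rules>
  <rule>
    <!-- drop verb readings right after an unambiguous determiner -->
    <remove tag="vblex"/>
    <context offset="-1" tag="det"/>
  </rule>
</rules>"""

LU_RE = re.compile(r"\^([^$]*)\$")  # one lexical unit: ^surface/reading1/reading2$
TAG_RE = re.compile(r"<([^<>]+)>")  # one tag inside a reading, e.g. <vblex>


def parse_rules(xml_text):
    """Parse the hypothetical XML format, rejecting malformed rules (a minimal validator)."""
    root = ET.fromstring(xml_text)
    if root.tag != "rules":
        raise ValueError("root element must be <rules>")
    rules = []
    for rule in root.findall("rule"):
        remove = rule.find("remove")
        contexts = rule.findall("context")
        if remove is None or remove.get("tag") is None or not contexts:
            raise ValueError("each <rule> needs <remove tag=...> and at least one <context>")
        parsed_contexts = []
        for c in contexts:
            if c.get("offset") is None or c.get("tag") is None:
                raise ValueError("<context> needs 'offset' and 'tag' attributes")
            parsed_contexts.append((int(c.get("offset")), c.get("tag")))
        rules.append({"remove": remove.get("tag"), "contexts": parsed_contexts})
    return rules


def tags_of(reading):
    """Return the set of tags in a reading such as 'house<n><pl>'."""
    return set(TAG_RE.findall(reading))


def context_holds(cohorts, j, tag):
    """True if unit j exists and all of its readings carry `tag`."""
    if not 0 <= j < len(cohorts):
        return False
    readings = cohorts[j][1:]
    return bool(readings) and all(tag in tags_of(r) for r in readings)


def apply_rules(stream, rules):
    """Apply removal rules to every lexical unit, leaving text between units untouched."""
    matches = list(LU_RE.finditer(stream))
    cohorts = [m.group(1).split("/") for m in matches]  # [surface, reading1, reading2, ...]
    for rule in rules:
        for i, cohort in enumerate(cohorts):
            readings = cohort[1:]
            if len(readings) < 2:
                continue  # already unambiguous
            if not all(context_holds(cohorts, i + off, tag) for off, tag in rule["contexts"]):
                continue
            kept = [r for r in readings if rule["remove"] not in tags_of(r)]
            if kept:  # never delete the last remaining reading
                cohorts[i] = [cohort[0]] + kept
    out, last = [], 0
    for m, cohort in zip(matches, cohorts):
        out.append(stream[last:m.start()])
        out.append("^" + "/".join(cohort) + "$")
        last = m.end()
    out.append(stream[last:])
    return "".join(out)
```

With the sample rule, `apply_rules("^the/the<det><def><sp>$ ^houses/house<n><pl>/house<vblex><pri><p3><sg>$", parse_rules(RULES_XML))` returns `"^the/the<det><def><sp>$ ^houses/house<n><pl>$"`. Two design choices keep the sketch conservative: a context only matches when every reading of the neighbouring unit carries the tag, so an ambiguous neighbour cannot trigger a removal, and a rule never removes the last remaining reading of a unit.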