Name of the file that contains the parallel treebank. Default format is Stockholm Tree Aligner format (where the sentence alignment is implicitely given by tree node alignments). To use a different format use the option -A

Name of the files that contains the source language treebank. This is useful to sepcify a file that is different from the one that is specified in the 'parallel-treebank-file'. For example, sentence alignment files from OPUS usually refer to non-parsed XML files. With -s we can overwrite this and refer to the parsed corpus instead. However, be aware that the same sentences have to be covered in the same order and appropriate IDs of these sentences have to be found when reading through the treebank files.

Define features to be used in training. (For alignment, features are taken from the modelfile.feat file!!) 'features' is a string with feature types separated by ':'. There are various features that can be used and combined. For more details look at Lingua::Align::Trees::Features. The default is 'insideST2:insideTS2:outsideST2:outsideTS2'

Classifier to be used. Default is 'megam'. Another possiblity is 'clue' which refers to a noisy-or like classifier with independent precision-weighted features (requires probabilistic values for each feature and supports only positive features). Other classifiers may be supported in future releases of Lingua::Align.

Path to the probabilistic source-to-target lexicon created by Moses from the word aligned corpus. Of course, it could be any kind of bilingual dictionary as long as it provides a score for each entry and it uses the same format as the one created by Moses. Default is moses/model/lex.0-0.e2f.

Name of the file that contains pairs of IDs for all sentences that have been word aligned with GIZA++/Moses. This is useful to match sentences when reading word alignment files for feature extraction (sometimes not all sentences are included in both, the parsed collection and the word aligned data!). Note that word aligments and parallel treebanks have still to be stored in the same order but sentences may be skipped if they do not appear in one of them. The format is like follows:

Switch on the linked-parent feature (depending on the links between parent nodes of the current node pair). This flag has to be specified in both, train and align mode! This flag should NOT be used together with -U or -C!

Use <iter> number of iterations for adaptive SEARN style learning. This is only useful in connection with (any of) the link depedency features from above (-C -U -P). Instead of learning from the given true link depedency feature extracted from the training data, this option will run the training several times and adjust these features acoording to the predicted link likelihoods from the previously trained classifier. This is currently very slow because it re-runs the feature extraction procedure (which should not be necessary when re-running the classifier). This should be improved later but the effect of SEARN seems to be very little anyway ....

Align terminal nodes only (leaf nodes). It is possible to use this flag together with -N which then forces the aligner to align corresponding node types only (terminals with terminals and non-terminals with non-terminals)

Type of alignment strategy to be used. Default is 'inference' which refers to a two-step procedure with local classification in the first step and alignment inference in the second (see LinkSearch with argument -l). An alternative strategy is called 'bottom-up' in which the alignment is done in a greedy bottom-up fashion starting with leaf node pairs and going up to the root nodes. Nodes are linked immediately when the classification score (conditional link likelihood) exceeds the threshold (usually 0.5). Aligned nodes are removed from the search space. Therefore, only 1:1 links are returned. In a final step link likelihoods are used to align previously unlinked nodes with the selected alignment inference strategy in the same way as in the two-step procedure.

Link strategy used to extract the node aligments after classification. Default strategy is 'greedy'. Other possible strategies are 'wellformed' (greedy + wellformedness criteria) and threshold (allow all links above the threshold score). You can also add the option 'final' (by adding the string '_final') to the selected strategy. In that case the aligner will first do the basic link search and then add links between nodes that obey the well-formedness criteria if either source or target language node is not linked yet. In other words, this final step makes 1:many links in the data that do not violate wellformedness. Yet another option is 'and' (which can be added as the string '_and' to the selected strategy, also in combination with '_final'). Using this option unlinked nodes (source and target) will be aligned in a last step in a greedy way even if they violate well-formedness. For example: 'wellformed_final_and' will force the aligner to, first, look for 1:1 links that are well-formed (multiple links are not allowed), then add well-formed links between nodes where one of them is already linked to another one, and, finally, adds links between still unlinked nodes.

Switch to add-links mode (union). Existing links between nodes will be kept in the output file and new ones will be added. (In the default mode, existing links will be considered for evaluation only). This option is espcially useful if one wants to use a pipeline of alignments, for example, terminal node alignment first and non-terminal nodes in the next step.

Similar to -u: switches to 'add-link' mode but now forces the aligner to use existing links to compete with the new ones. This means that the scores of existing links will be used in the link search algorithm applied for aligning tree nodes. This may also cause some existing links to disappear, for example, because they are not conform to the wellformedness criteria anymore.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.