Introduction

multilingual: it can be trained from annotated corpora in multiple languages

customizable: the features used in training can be customized.

DeSR is part of the Tanl framework, which provides the tools required to fully analyze sentences starting from raw text.

Technique

DeSR is a shift-reduce dependency parser that uses a variant of the approach of Yamada and Matsumoto (2003).

The parser builds dependency structures greedily by scanning input sentences in a single left-to-right or right-to-left pass and choosing at each step whether to perform a shift or to create a dependency between two adjacent tokens. Which transition to perform is learned from annotated corpora, based on features of the current parser state.
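The shift/reduce loop above can be sketched with a toy oracle. This is a hypothetical simplification, not DeSR's actual rule set: it replaces the learned classifier with a gold-tree oracle and uses generic arc-standard style transitions (Shift, Left-Arc, Right-Arc) over the stack top and the next input token.

```python
# Toy sketch of a deterministic shift-reduce dependency parser.
# The gold head table stands in for the learned classifier; in a real
# parser the transition would be predicted from features of this state.

def oracle_parse(heads):
    """heads: dict token_id -> head_id for tokens 1..n (0 = root).
    Returns the set of (head, dependent) arcs built by the transitions."""
    n = len(heads)
    stack, buf = [], list(range(1, n + 1))
    arcs = set()

    def has_all_children(t):
        # t may only be attached once all its dependents are collected
        return all((t, d) in arcs for d in heads if heads[d] == t)

    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s2, s1 = stack[-2], stack[-1]
            # Right-Arc: attach s1 under s2 and pop s1
            if heads[s1] == s2 and has_all_children(s1):
                arcs.add((s2, s1))
                stack.pop()
                continue
            # Left-Arc: attach s2 under s1 and remove s2
            if heads[s2] == s1 and has_all_children(s2):
                arcs.add((s1, s2))
                stack.pop(-2)
                continue
        if buf:
            stack.append(buf.pop(0))  # Shift the next input token
        else:
            break  # no applicable transition (e.g. non-projective input)
    return arcs
```

For example, for "the cat sat" with heads the→cat, cat→sat, sat→root, the loop shifts, then performs two Left-Arc transitions, yielding the arcs (cat, the) and (sat, cat). Handling non-projective trees in a single pass requires the extra transitions mentioned below, which this sketch omits.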

DeSR, however, uses a different set of rules, including additional rules for handling non-projective dependencies, which allow parsing to be performed deterministically in a single pass.

The algorithm produces fully labeled dependency trees. A classifier is used to learn and predict the appropriate parsing action at each step.
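The role of the classifier can be illustrated with a minimal sketch. The feature names and the perceptron below are invented for illustration and are not DeSR's feature model or learner: the idea is only that features of the current parser state map to a transition.

```python
# Hypothetical sketch: a multiclass perceptron predicting the next
# parsing action from sparse binary features of the parser state.

def extract_features(stack, buffer, pos):
    """Tiny invented feature set: POS tags of the stack top and the
    first two buffer tokens. pos maps token_id -> POS tag."""
    feats = []
    if stack:
        feats.append("s0.pos=" + pos.get(stack[-1], "NONE"))
    if buffer:
        feats.append("b0.pos=" + pos.get(buffer[0], "NONE"))
    if len(buffer) > 1:
        feats.append("b1.pos=" + pos.get(buffer[1], "NONE"))
    return feats

class PerceptronClassifier:
    """Bare-bones multiclass perceptron over sparse binary features."""

    def __init__(self, actions):
        self.actions = actions
        self.weights = {a: {} for a in actions}  # action -> feature -> weight

    def score(self, action, feats):
        w = self.weights[action]
        return sum(w.get(f, 0.0) for f in feats)

    def predict(self, feats):
        return max(self.actions, key=lambda a: self.score(a, feats))

    def update(self, feats, gold, pred):
        # Standard perceptron update: reward gold action, penalize prediction
        if gold == pred:
            return
        for f in feats:
            self.weights[gold][f] = self.weights[gold].get(f, 0.0) + 1.0
            self.weights[pred][f] = self.weights[pred].get(f, 0.0) - 1.0
```

Training amounts to replaying the oracle transitions over an annotated corpus and updating the classifier whenever its prediction disagrees with the gold action; at parsing time the predicted action drives the transition loop directly.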

The parser is configurable: one can select among several learning algorithms (Multi-Layer Perceptron, Averaged Perceptron, Maximum Entropy, SVM), provide user-defined feature models, and choose input/output formats (including the CoNLL-X shared task format). The MLP classifier works best for languages with sufficiently large training corpora and is fast in both training and parsing.