Sonderforschungsbereich 732:

Project D4-E (2006-2014)

Modular Lexicalization of Probabilistic Context-Free Grammars

This project aims to develop and implement improved statistical disambiguation methods for syntactic analyses. It also develops a clustering model for verb-argument tuples which generalises selectional restrictions over WordNet concepts.

In the next phase, the project will implement a new parameter estimation technique for the BitPar parser which was developed in the first phase. The new method is based on ensembles of decision trees and is intended to improve the accuracy of parsing with fine-grained syntactic categories that contain information such as number, gender, and case. The project will also examine whether reranking strategies can further increase the accuracy of the parser. The reranker will use features derived from the clustering model as well as other features. The clustering model will be extended by (i) handling adjuncts in addition to arguments, (ii) automatically inducing noun hierarchies instead of using WordNet, and (iii) implementing a hybrid probability model. The clustering model will be applied to tasks such as word sense disambiguation.