Syntax-Based Extraction

Abstract

In this chapter, the core of the book, we present and evaluate our methodology for collocation extraction based on deep syntactic parsing. First, a closer look at previous work that used parsed text for collocation extraction reveals that the aim of fully-fledged syntax-based extraction was far from realized in these efforts, primarily because of the insufficient robustness, precision, or coverage of the parsers used, and also because of the small number of syntactic configurations taken into account. Our work addresses these deficiencies with a generic extraction procedure that relies on a large-scale multilingual parsing system. After describing the system and the extraction method, we focus on the contrastive evaluation of our method against the sliding window method, a standard syntax-free method based on the linear proximity of words. Cross-language evaluation shows that, despite the inherent errors and the challenges posed by the analysis of large amounts of unrestricted text, deep parsing contributes to a significant increase in performance. A detailed qualitative analysis of the results, including a case-study comparison, allows us to assess the relative strengths and weaknesses of the two methods. Finally, we briefly compare the current system with systems based on shallow parsing.
