A corpus-based approach to the acquisition of collocational prepositional phrases

Abstract

Collocational prepositional phrases in Dutch are patterns of the form P-NP-P that have non-compositional semantics and are syntactically rigid or idiosyncratic. We present a number of linguistic tests that set such items apart from regularly built prepositional phrases. To find candidate strings that should be included in a computational dictionary as multi-word prepositional phrases, we extract all instances of the relevant pattern from a corpus. Next, we introduce a number of statistical tests to identify those instances that behave like strong collocations. The strongest collocations according to the statistical tests are compared with lists of such items presented elsewhere and are evaluated by human judges.
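
The extract-then-rank procedure can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the corpus is a toy list of (token, tag) pairs with a hypothetical coarse tagset, the NP slot is simplified to a single bare noun, and pointwise mutual information (PMI) stands in for the statistical tests, which the abstract does not name.

```python
import math
from collections import Counter

# Toy tagged corpus (hypothetical data): (token, coarse POS tag) pairs.
# Tags: P = preposition, N = noun, D = determiner.
TAGGED = [
    ("in", "P"), ("verband", "N"), ("met", "P"), ("de", "D"), ("regen", "N"),
    ("op", "P"), ("grond", "N"), ("van", "P"), ("de", "D"), ("wet", "N"),
    ("in", "P"), ("verband", "N"), ("met", "P"), ("het", "D"), ("weer", "N"),
    ("in", "P"), ("huis", "N"), ("van", "P"), ("de", "D"), ("buren", "N"),
]


def extract_pnp(tagged):
    """Return every (P, N, P) triple in which a bare noun sits between two prepositions."""
    triples = []
    for i in range(len(tagged) - 2):
        (w1, t1), (w2, t2), (w3, t3) = tagged[i:i + 3]
        if (t1, t2, t3) == ("P", "N", "P"):
            triples.append((w1, w2, w3))
    return triples


def pmi_scores(tagged, triples):
    """Score each distinct triple by log2(observed / expected under independence).

    PMI is an illustrative stand-in, not the measure used in the paper.
    """
    n_tokens = len(tagged)
    n_windows = n_tokens - 2  # number of three-token windows in the corpus
    unigrams = Counter(word for word, _ in tagged)
    scores = {}
    for (p1, noun, p2), count in Counter(triples).items():
        observed = count / n_windows
        expected = (
            (unigrams[p1] / n_tokens)
            * (unigrams[noun] / n_tokens)
            * (unigrams[p2] / n_tokens)
        )
        scores[(p1, noun, p2)] = math.log2(observed / expected)
    return scores


candidates = extract_pnp(TAGGED)
scores = pmi_scores(TAGGED, candidates)
for triple, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(" ".join(triple), round(score, 2))
```

On a corpus this small the scores carry no weight. PMI is also known to favour low-frequency items, which is one reason several association measures are usually compared on real data.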