Abstract

This thesis investigates the notion of collocation and the effects of lexical association measures by triangulating corpus, lexicographic and experimental evidence. Focusing on institutional academic English, and in particular on the genre of degree course descriptions, the study constructs a special-purpose corpus semi-automatically from the web. Two widely used lexical association measures (Mutual Information and Log-likelihood), a relatively recent one (Lexical Gravity), and bare co-occurrence frequency are used to extract collocation candidates from this corpus using a stratified sampling technique. The 99 phrases thus selected are searched for in two dictionaries (a collocation dictionary and a general-purpose learner dictionary), presented to expert informants who evaluate their acceptability, and used in a lexical decision task. The results of these evaluation tasks suggest that a) none of the measures significantly outperforms the others in extracting salient word pairs, even though bare frequency seems to perform marginally better than the others in the lexical decision task, and Mutual Information in the acceptability judgement task; b) different measures target different types of phrases (both in terms of the distinction between free and restricted combinations, and in terms of their degree of specialization); c) some measures perform better in the top range (e.g. Lexical Gravity), while for other measures the best results are scattered across different frequency ranges (e.g. Mutual Information); d) native-speaker and non-native-speaker expert informants seem to evaluate collocativity in similar ways, even though non-native speakers are more conservative, giving less extreme scores; e) the acceptability judgement questionnaire and the lexical decision task, performed on different groups using different experimental methodologies, provide converging evidence: the expressions that experts find to be the most acceptable are also recognized faster and more accurately by subjects in the test.
In turn, f) this experimental evidence is correlated with the corpus evidence extracted by the association measures. The implications of these findings are manifold. On the theoretical side, they confirm that corpora and lexical association measures provide evidence that is coherent with that obtained from experimental methods targeting language competence. On the descriptive side, the study suggests that the phraseology of degree course descriptions is characterized by a mix of disciplinary terms (``cochlear implants'', ``linear algebra'') and core phraseology typical of the genre (``wide genre'', ``open days''), and it shows which lexical association measures are more appropriate for targeting the former (Mutual Information) or the latter (Frequency or Lexical Gravity). Lastly, on the applied side, these findings can be used to provide guidelines on the best lexical association measures to use depending on the type of phrases one wants to extract, the amount of manual filtering that can be applied to the task, the number of phrases to extract, and so forth.