The work presented here addresses the question of how to determine whether a grammar formalism is powerful enough to describe natural languages. The expressive power of a formalism can be characterized in terms of i) the string languages it generates (weak generative capacity, WGC) or ii) the tree languages it generates (strong generative capacity, SGC). WGC alone is not enough to determine whether a formalism is adequate for natural languages. We argue that even SGC is problematic, since the sets of trees that a grammar formalism for natural languages should be able to generate are difficult to determine: the concrete syntactic structures assumed for natural languages depend heavily on theoretical stipulations, and empirical evidence for syntactic structures is rather hard to obtain. Therefore, for lexicalized formalisms, we propose to take the ability to generate certain strings together with specific predicate-argument dependencies as a criterion for adequacy for natural languages.

Traditionally, parsers are evaluated against gold-standard test data. This can cause problems if there is a mismatch between the data structures and representations used by the parser and those used by the gold standard. A particular case in point is German, for which two treebanks with highly different annotation schemes (TiGer and TüBa-D/Z) are available for training, e.g., PCFG parsers. The differences between the TiGer and TüBa-D/Z annotation schemes make fair and unbiased parser evaluation difficult [7, 9, 12]. The resource presented in this paper (TEPACOC) takes a different approach to parser evaluation: instead of providing evaluation data in a single annotation scheme, TEPACOC uses comparable sentences and their annotations from both TiGer and TüBa-D/Z for 5 selected key grammatical phenomena, with 20 sentences per phenomenon. This yields a comparable testsuite of 2 × 100 sentences, which allows us to evaluate TiGer-trained parsers against the TiGer part of TEPACOC and TüBa-D/Z-trained parsers against the TüBa-D/Z part of TEPACOC for key phenomena, instead of comparing them against a single (and potentially biased) gold standard. To overcome the problem of inconsistency in human evaluation and to bridge the gap between the two annotation schemes, we provide an extensive error classification, which enables us to compare parser output across the two treebanks. In the remainder of the paper, we present the testsuite and describe the grammatical phenomena covered in the data. We discuss the different annotation strategies the two treebanks use to encode these phenomena and present our classification of potential parser errors.

The Inuit inhabit a vast area of land that is, from a European point of view, most inhospitable, stretching from the northeastern tip of Asia to the east coast of Greenland. Inuit peoples have never been numerous, and their settlements are scattered over enormous distances. Nevertheless, from an ethnological point of view, all Inuit peoples shared a distinct culture, featuring sea mammal and caribou hunting, sophisticated survival skills, and technical and social devices, including the sharing of essential goods and strategies for minimizing and controlling aggression.

Generative Grammar is the label of the most influential research program in linguistics and related fields in the second half of the 20th century. Initiated by a short book, Noam Chomsky's Syntactic Structures (1957), it became one of the driving forces among the disciplines jointly called the cognitive sciences. The term generative grammar refers to an explicit, formal characterization of the (largely implicit) knowledge that determines the formal aspect of all kinds of language behavior. The program had a strong mentalist orientation right from the beginning, documented, e.g., in Chomsky's (1959) fundamental critique of Skinner's Verbal Behavior (1957), which argued that behaviorist stimulus-response theories could in no way account for the complexities of ordinary language use. The "Generative Enterprise", as the program was called in 1982, went through a number of stages, each of which was accompanied by discussions of specific problems and consequences, both within the narrower domain of linguistics and in the wider range of related fields, such as ontogenetic development, the psychology of language use, and biological evolution. Four stages of the Generative Enterprise can be marked off for expository purposes.

Simplicity as a methodological orientation applies to linguistic theory just as to any other field of research: ‘Occam’s razor’ is the label for the basic heuristic maxim according to which an adequate analysis must ultimately be reduced to indispensable specifications. In this sense, conceptual economy has been a strict and stimulating guideline in the development of Generative Grammar from the very beginning. Halle’s (1959) argument for discarding the level of taxonomic phonemics in order to unify two otherwise separate phonological processes is an early characteristic example; a more general notion is that of an evaluation metric, introduced in Chomsky (1957, 1975), which systematically relates the relative simplicity of alternative linguistic descriptions to the quest for explanatory adequacy of the theory underlying the descriptions being evaluated. Further proposals along these lines include the theory of markedness developed in Chomsky and Halle (1968), Kean (1975, 1981), and others, the notion of underspecification proposed, e.g., in Archangeli (1984) and Farkas (1990), and the concept of default values and related notions. An important step promoting this general orientation was the Principles and Parameters approach developed in Chomsky (1981, 1986), which reduced language-particular rule systems to universal principles, subject merely to parametrization with restricted options, largely related to properties of particular lexical items. On this account, the notion of a simplicity metric can be dispensed with, as competing analyses of relevant data are now supposed to be essentially excluded by the restrictive system of principles.

This talk deals with the pragmatic notion of topic and its encoding in Buli and some related Ghanaian Gur languages, and shows that it is responsible for several intricate phenomena in the grammar of these languages.