
PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, LINGUISTICS (linguistics.oxfordre.com). (c) Oxford University Press USA, 2018. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 19 November 2018

Generative Grammar

Summary and Keywords

This article presents different types of generative grammar that can be used as models of natural languages, focusing on a small subset of all the systems that have been devised. The central idea behind generative grammar may be rendered in the words of Richard Montague: “I reject the contention that an important theoretical difference exists between formal and natural languages” (“Universal Grammar,” Theoria, 36 [1970], 373–398).

A generative grammar is a formal system which is built from a finite number of ingredients, but provides an explicit way of constructing (generating) a potentially infinite set of strings of atomic symbols and possibly associates each of these strings with a constituent structure. If we view a language L as a specific set of strings, a generative grammar G is a grammar for L just in case the set of strings generated by G coincides exactly with L. For example, the grammar G1 in (1) is a grammar for the infinite language L1 = {aⁿbⁿ : n ≥ 1}:

(1) a. S → aSb
    b. S → ab

The expressions in (1) can be read as the instructions “rewrite S as aSb” and “rewrite S as ab.” Notice that once S has been rewritten as ab, no rule of G1 can apply and the derivation terminates. Notice also that we require the derivation to continue until a string not containing S has been generated.1 (2) shows the derivation of the string aaabbb by G1:

(2) S ⇒ aSb ⇒ aaSbb ⇒ aaabbb

More generally, the generation of a string aⁿbⁿ involves applying rule (1)a n − 1 times before applying (1)b (once). Since G1 can only produce strings of the form aⁿbⁿ, we see that the set of strings generated by G1 is exactly L1 = {aⁿbⁿ : n ≥ 1}.
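The derivation procedure just described can be sketched in a few lines of Python. This is only an illustration: the rules are those described in the text, while the names `RULE_A`, `RULE_B` and `derive` are assumptions introduced here.

```python
# Sketch of G1. The two rewrite rules are those described in the text;
# all Python names are illustrative assumptions.
RULE_A = ("S", "aSb")  # (1)a: rewrite S as aSb
RULE_B = ("S", "ab")   # (1)b: rewrite S as ab


def derive(n):
    """Return the derivation steps by which G1 generates a^n b^n."""
    if n < 1:
        raise ValueError("G1 only generates strings with n >= 1")
    steps = ["S"]
    # Apply (1)a n-1 times, then (1)b once.
    for lhs, rhs in [RULE_A] * (n - 1) + [RULE_B]:
        steps.append(steps[-1].replace(lhs, rhs, 1))
    return steps


print(" => ".join(derive(3)))  # S => aSb => aaSbb => aaabbb
```

Since each intermediate string contains exactly one S, the derivation is fully determined by n, which is why a simple loop suffices here.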

We can also record the derivation in (2) as in the tree (3), where each atomic element of a derived string is connected by a line (branch) to the symbol that was rewritten as that string:2

(3)

Equivalently, this can be represented as a labeled bracketing where the brackets are labeled by the non-terminal symbol that gave rise to the string of terminal elements within the brackets:

(4) [S a [S a [S a b] b] b]

Representations like (3) and (4) can be taken to represent the constituent structure associated with the string by its derivation. Then, G1 assigns aaabbb a constituent structure where ab, aabb and aaabbb are all constituents of type S.
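As a rough illustration of how a derivation determines constituent structure, the following Python sketch builds the labeled bracketing for aⁿbⁿ and lists the substrings that end up as constituents of type S. The function names are assumptions introduced for this example.

```python
def bracket(n):
    """Labeled bracketing that G1's derivation assigns to a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return "[S a b]"  # innermost S, rewritten as ab
    return "[S a " + bracket(n - 1) + " b]"  # outer S, rewritten as aSb


def constituents(n):
    """Strings that the derivation of a^n b^n treats as constituents of type S."""
    return ["a" * k + "b" * k for k in range(1, n + 1)]


print(bracket(3))       # [S a [S a [S a b] b] b]
print(constituents(3))  # ['ab', 'aabb', 'aaabbb']
```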

The notion of generative grammar has been imported into the study of natural languages (human languages) from the study of artificial formal languages. The basic intuition behind this is that natural languages have a formal structure that lends itself to analysis in these terms. The pioneers in generative analysis of natural language are Harris (1951, 1957) and Chomsky (1953, 1955a, 1955b, 1956, 1957).

2 Classes of Grammars and Their Generative Capacity

Grammars can be partitioned into different types on the basis of their formal characteristics, in particular in terms of the restrictions imposed on the productions (rewrite rules) allowed in each class. The Chomsky-Schützenberger Hierarchy comprises four distinct classes (Chomsky, 1956; Chomsky & Schützenberger, 1963):

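(5) a. Type 0: unrestricted rewrite systems
    b. Type 1: context-sensitive grammars (CSGs)
    c. Type 2: context-free grammars (CFGs)
    d. Type 3: regular grammars (RGs), also called finite-state grammars (FSGs)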

The Type 0 grammars are the least restricted in that a string of more than one symbol can be rewritten as a single symbol, which is not allowed in the other types. Type 1 grammars (CSGs) allow only a single non-terminal symbol to be rewritten at each step of the derivation, but may allow it to be rewritten in a certain way only when it finds itself in a certain context. Formally, a Type 1 production must be of the form shown in (6), where A is a single non-terminal, and ϕ and ξ may be the null string, but ω may not:4

(6) ϕAξ → ϕωξ

If ϕ is the null string but ξ is not, A can be rewritten as ω just in case it is immediately followed by ξ; if ξ is the null string but ϕ is not, A is rewritten as ω only if it is immediately preceded by ϕ; and if neither ϕ nor ξ is the null string, both conditions apply. But if both are the null string, A can be rewritten as ω in all environments.5

In Type 2 grammars (CFGs), both ϕ and ξ must be null in (6), i.e. the productions are all of the form shown in (7) (with ω not null):

(7) A → ω

Thus, rewriting is never restricted to specific contexts. (The grammar G1 in (1) has this property, i.e. it is a CFG.)

Type 3 grammars (RGs) are CFGs with the added restriction that a non-terminal must be rewritten as a nonnull string of terminals x followed by at most a single non-terminal:

(8) A → x(B)

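Thus, the RGs are a proper subset of the CFGs. For example, G1 in (1) is a CFG but not an RG (finite-state grammar, FSG), because rule (1)a rewrites S as a string in which the non-terminal is followed by a terminal.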
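The restrictions on Type 2 and Type 3 productions can be checked mechanically. The sketch below assumes a toy encoding that is not part of the original exposition: non-terminals are single uppercase letters, terminals are lowercase letters, and a production is a pair of strings `(lhs, rhs)`.

```python
import re


def is_context_free(lhs, rhs):
    """Type 2: a single non-terminal is rewritten as a non-null string."""
    return len(lhs) == 1 and lhs.isupper() and len(rhs) > 0


def is_regular(lhs, rhs):
    """Type 3: a non-null string of terminals followed by at most one non-terminal."""
    return is_context_free(lhs, rhs) and re.fullmatch(r"[a-z]+[A-Z]?", rhs) is not None


G1 = [("S", "aSb"), ("S", "ab")]
for lhs, rhs in G1:
    print(lhs, "->", rhs, "| context-free:", is_context_free(lhs, rhs),
          "| regular:", is_regular(lhs, rhs))
```

Under this encoding, both rules of G1 pass the context-free test, but the rule rewriting S as aSb fails the regular test because a terminal follows the non-terminal.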

The importance of the Chomsky-Schützenberger hierarchy derives from the fact that the different types of grammar differ in their “weak generative capacity,” i.e. they generate different types of languages seen simply as sets of strings. For example, L1 = {aⁿbⁿ : n ≥ 1} can be generated by the CFG G1, but not by any RG. More generally, it has been shown that there is a hierarchy of types of languages matching the Chomsky-Schützenberger hierarchy of grammars: The languages generated by RGs are a proper subset of the languages generated by CFGs, which are a proper subset of the languages generated by CSGs, which in turn are a proper subset of the languages generated by unrestricted rewrite systems (Type 0).

The study of formal languages also involves the study of abstract automata, which are usually seen as devices for recognizing the members of a set of strings. For example, the members of the set L1 = {aⁿbⁿ : n ≥ 1} can be recognized by a pushdown automaton (see Wall, 1972; Hopcroft & Ullman, 1969). For each type of grammar in the Chomsky-Schützenberger hierarchy there is a class of automata that recognizes exactly the sets of strings that are generated by grammars of that type. This equivalence has been useful in determining the weak generative capacity of different types of grammar.
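To give a feel for what recognition by a pushdown automaton amounts to, here is a minimal Python sketch that recognizes L1 with a single stack and two control states. It is a simulation written for this illustration, not a formal definition taken from the cited textbooks; the state names are assumptions.

```python
def accepts_anbn(string):
    """Recognize {a^n b^n : n >= 1} using one stack, as a pushdown automaton would."""
    stack = []
    state = "reading_a"
    for symbol in string:
        if state == "reading_a" and symbol == "a":
            stack.append("A")      # push one marker per a
        elif symbol == "b" and stack:
            state = "reading_b"    # from now on only b's are allowed
            stack.pop()            # each b cancels one marker
        else:
            return False           # unknown symbol, a after b, or too many b's
    # Accept only if at least one b was read and every a was cancelled.
    return state == "reading_b" and not stack


for s in ["ab", "aaabbb", "aab", "abab", ""]:
    print(repr(s), accepts_anbn(s))
```

The stack is what a finite-state device lacks: it lets the recognizer keep count of an unbounded number of a's.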

We will now look at the relevance of such results to the analysis of natural languages.

3 The Relevance of Generative Capacity

A linguist studying natural languages will be engaged in accounting for the syntactic properties characteristic of all human languages. This involves accounting both for the patterns that exist and for the patterns that are systematically absent. For the latter task, one may reasonably adopt the strategy of basing one’s analysis on a type of grammar such that the non-existing patterns lie beyond its weak generative capacity. But the former task obviously demands that the weak generative capacity of the type of grammar selected extend to all the existing patterns.

This requirement has led to discarding RGs as models for the grammar of natural languages. It can be shown that no RG generates L2 = {xx⁻¹} (the set of all strings consisting of a string x followed by its mirror image x⁻¹).6 A string abba exhibits nested dependencies in the sense that the first a requires the presence of the second a (and vice versa), and the first b requires the presence of the second (and vice versa). But Chomsky (1956, 1957) argues that nested dependencies are found in natural languages, for example, in English. Hence, L2 is a sublanguage of English and other natural languages, which therefore cannot be generated by any RG.

On the other hand, L2 can be generated by a CFG (which is not also an FSG). Consider G2, which is similar to G1:

By this reasoning, grammars for natural languages might be found among the CFGs that are not also FSGs. But some natural languages, for example, Dutch and Swiss German, exhibit another type of dependency which is also beyond the reach of CFGs:7

(10)

In (10), the first object is selected by the first verb, the second NP by the second verb and the third NP by the third verb, as indicated by the subscripts.

The “cross-serial” dependencies in (10) correspond to what we find in the formal language L3 = {xx}, i.e. the set of all strings that can be divided into two identical substrings, for example, abcabc. But no CFG can generate L3.
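The contrast between the nested dependencies of L2 and the crossing dependencies of L3 can be made concrete with a small Python sketch. The membership tests below simply compare the two halves of a string; they assume that x is non-empty, which the text leaves open, and the function names are introduced here for illustration only.

```python
def in_L2(s):
    """Mirror-image language: s is x followed by the reversal of x (x non-empty)."""
    half, odd = divmod(len(s), 2)
    return bool(s) and not odd and s[:half] == s[half:][::-1]


def in_L3(s):
    """Copy language: s is x followed by x again (x non-empty)."""
    half, odd = divmod(len(s), 2)
    return bool(s) and not odd and s[:half] == s[half:]


def dependencies(s, crossing):
    """Pairs of 1-based positions that depend on each other."""
    half = len(s) // 2
    if crossing:
        return [(i + 1, half + i + 1) for i in range(half)]  # 1-4, 2-5, 3-6
    return [(i + 1, len(s) - i) for i in range(half)]        # 1-4, 2-3


print(in_L2("abba"), dependencies("abba", crossing=False))
print(in_L3("abcabc"), dependencies("abcabc", crossing=True))
```

In the nested case the pairs are properly contained in one another, whereas in the crossing case every pair overlaps with every other pair.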

Since cross-serial dependencies are in fact generated by CSGs, it might be concluded that grammars for natural languages must at least have the weak generative capacity of CSGs. However, investigations of the formal properties of grammars have identified a new class of “mildly context sensitive grammars” (MCSGs) that are intermediate between CSGs and CFGs in terms of weak generative capacity. It has also been shown that grammars in this class are capable of generating languages like L3 with crossing dependencies. If one pursues the goal of accounting for systematically absent patterns by selecting the most restrictive grammar type that can also generate the existing patterns, this suggests that the class of MCSGs might provide good candidates.

The notion of “mildly context sensitive grammars” (and languages) emerged from A. Joshi’s work on tree-adjoining grammars (TAGs), a class of grammars which falls within the class of MCSGs (see Joshi, 1969). However, it has been established by Vijay-Shanker and Weir (1994) and subsequent work that TAGs are equivalent to a number of other grammar formalisms in terms of weak generative capacity, for example, head-driven phrase structure grammars (HPSGs) as defined by Pollard and Sag (1987, 1994) and the combinatory categorial grammars (CCGs) characterized by Steedman (1996). As formalized by Michaelis (1998), transformational grammars conforming to Chomsky’s Minimalist Program are also mildly context sensitive in terms of their weak generative power.

A choice between grammar formalisms that have the same weak generative capacity might be based on a comparison between them with respect to strong generative capacity. From this point of view, two different grammars that both generate the strings of a language L are considered equivalent if and only if they parse each string in L into the same constituent structure. Being able to assign constituent structures that are consistent with constituency tests and support semantic interpretation8 constitutes an important criterion of adequacy for grammars. Chomsky (1965) calls this criterion “descriptive adequacy,” in contrast to “observational adequacy,” which only refers to a grammar’s ability to generate the unanalyzed strings of a language, thus taking only weak generative capacity into consideration.

Recent investigations of MCSGs have revealed that different grammar types in this class are not necessarily equivalent with respect to their strong generative capacity. For example, TAGs and CCGs are not equivalent with respect to strong generative capacity (see Koller and Kuhlmann, 2009). Consequently, one might find that some of the different grammar formalisms give rise to grammars for natural languages meeting Chomsky’s criterion of descriptive adequacy while others don’t.

4 Transformational Grammar

To overcome the limitations of CFGs, Chomsky started developing transformational generative grammar (TGG) in the 1950s (Chomsky 1955a, 1955b, 1956, 1957). In TGG, the syntactic structure of a sentence may be built up from a “kernel sentence” generable by a CFG by application of syntactic transformations which insert, delete, or permute elements in the structures they apply to. One of the most influential early works in TGG is Chomsky (1965), which characterized what came to be known as the “Standard Theory” of TGG (later superseded by other versions, some of which we return to below).

A transformational rule would consist of a structural description characterizing the set of syntactic structures the rule could apply to, and a structural change specifying the effects of applying the rule. To illustrate, we reproduce the basics of the transformational account of cross-serial dependencies in Dutch offered by Evers (1975).

The kernel sentences from which the structures with cross-serial dependencies are formed can be generated by a CFG with the productions in (11):9

(11)

These productions yield ungrammatical strings like (12) with the constituent structure indicated:

(12)

In this structure, each of the three initial NPs is locally connected with the right verb: Jan is identified as the subject of (a sentence containing) the verb zag “saw,” Piet is the subject of helpen “help” and de kinderen “the children” is the subject of zwemmen “swim,” just as in the English Jan saw Piet help the children swim. But the verbs come in the wrong order.

To create the right word order, Evers introduces a transformational rule called Verb Raising. This rule reorders the string by placing each verb to the immediate right of the verb following it, creating a complex verb. Formally, the rule has the structural description to the left of the arrow in (13), and the structural change is specified to the right of the arrow:

(13)

At the first step of the derivation leading to the grammatical dat Jan Piet de kinderen zag helpen zwemmen (a string with cross-serial dependencies), the string in (12) is parsed to fit the structural description of Verb Raising as in (14):

Application of Verb Raising now yields the correct linear order of the three verbs:

(17)

This example illustrates some general properties of syntactic transformations. They are based on a structural analysis of the input, they may permute constituents and they may add new nodes to the input tree (the V that dominates two verbs permuted by Verb Raising). They may also apply iteratively. Verb Raising (in Evers’ analysis) is an example of a “cyclic transformation,” i.e. a member of a set of transformational rules that apply (possibly in a fixed order) within subtrees labeled by “cyclic categories,” for example, S in the case just discussed.
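The cyclic, bottom-up character of such a rule can be illustrated with a toy Python sketch. It is not a rendering of Evers’ formal rule: it assumes that each clause of the kernel structure is verb-final and is encoded as a hypothetical triple `(subject, embedded_clause_or_None, verb)`, and it only tracks the resulting linear order of noun phrases and verbs.

```python
def base_order(clause):
    """Word order of the kernel structure: subject, embedded clause, verb."""
    subject, embedded, verb = clause
    inner = base_order(embedded) if embedded is not None else []
    return [subject] + inner + [verb]


def raise_verbs(clause):
    """Apply the raising step on every cycle, innermost clause first.

    Returns (noun_phrases, verb_cluster).
    """
    subject, embedded, verb = clause
    if embedded is None:
        return [subject], [verb]
    nps, lower_cluster = raise_verbs(embedded)  # inner cycle first
    # Place the lower verb (cluster) to the immediate right of the higher verb.
    return [subject] + nps, [verb] + lower_cluster


# Assumed encoding of the kernel structure discussed in the text.
kernel = ("Jan", ("Piet", ("de kinderen", None, "zwemmen"), "helpen"), "zag")

print(" ".join(base_order(kernel)))  # Jan Piet de kinderen zwemmen helpen zag
nps, verbs = raise_verbs(kernel)
print(" ".join(nps + verbs))         # Jan Piet de kinderen zag helpen zwemmen
```

Each recursive call corresponds to one cycle, so the complex verb grows by one member per clause on the way up.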

Chomsky’s notion of syntactic transformation derives from Harris (1951, 1957), who used transformations to characterize relations between classes of grammatical sentences. By contrast, Chomsky’s theory uses transformations to derive new structures from structures that would otherwise yield ungrammatical strings. Verb Raising, for example, must be defined to be an obligatory transformation (applying whenever its structural description is met) to derive grammatical sentences from strings that do not correspond to grammatical sentences. In this respect, Chomsky’s theory of transformations is generative in a way that Harris’ theory wasn’t.

Chomsky (1965) also adopted Katz and Postal’s (1964) idea that the semantic interpretation of a sentence is determined entirely by the structure assigned to it by the CFG base, after lexical insertion, but before any transformations have applied, i.e. at the level of analysis subsequently often referred to as the “deep structure” of the sentence.10

The empirical generalizations proposed by Ross (1967) inspired a development away from TGGs with large sets of transformational rules with highly specified structural descriptions and structural changes. Ross showed that a number of different transformations creating unbounded dependencies were subject to general conditions of applicability that need not be built into the structural descriptions associated with individual rules. For example, no transformation can move a single conjunct or a constituent of a single conjunct out of a conjoined structure (the Coordinate Structure Constraint) or extract a constituent from inside a relative clause (the Complex NP Constraint).11 On the one hand, this realization stimulated researchers to seek ways of unifying the various conditions uncovered by Ross. On the other hand, it led to a gradual simplification of the format for stating transformational rules.

A first step in this direction was taken by Chomsky (1973) who introduced the Subjacency Condition, which places an upper bound on the number of cyclic categories a constituent can move out of in one go. An additional general condition formulated by Fiengo (1977) requires that the position moved to (the “landing site”) not be properly contained in any constituent which doesn’t also contain the element that moves (Reinhart, 1976). The Government and Binding (GB) theory emanating from Chomsky (1981) also places general conditions on “extraction sites” (the position elements move from) based on the notion of grammatical “government.”

The ultimate outcome was the conclusion that all the specific transformational rules of the earlier Standard Theory could be eliminated in favor of a general license to move anything anywhere subject to general conditions of the sort just alluded to. It was also suggested that sentences whose derivation was previously thought to involve movement transformations could be produced directly by the CFG base as structures containing gaps (empty positions) subject to being bound by another constituent in accordance with the general conditions that were initially thought to hold for movement transformations. If so, TGG could be turned into a purely representational theory with transformations replaced with declarative constraints.

However, the emphasis on derivations has come back in the current mainstream versions of TGG inspired by Chomsky (1993). In analyses adhering to the Minimalist Program, the only syntactic operations are Merge and Label. Merge puts together two structures to form a larger structure, and Label determines the syntactic category of the resulting structure on the basis of the categories of its immediate constituents. Movement now corresponds to the case where a constituent A is merged with a constituent B which contains a previously merged copy of A. On this “copy and paste” account, the “cut and paste” character of classical movement transformations is recreated by subsequent copy-deletion at the interface between the syntactic component and the realizational phonological component subject to general conditions.12

Minimalist analyses also assume that syntactic derivations are subject to certain economy conditions. For example, Merge does not apply unless “triggered” by special features associated with one of the two constituents to be merged.

Classical TGGs (the Standard Theory) were shown by Peters and Ritchie (1973) to have the same weak generative power as unrestricted rewrite systems (Type 0 grammars) and Turing machines. Since this entails that there is no procedure for deciding whether a string is generated by some specific classical TGG, this is a negative result to the extent that native speakers are typically able to decide whether a sentence is part of their language or not. However, Michaelis (1998) demonstrates that a class of grammars conforming to the more restrictive Minimalist Program (Minimalist Grammars or MGs) belong to the class of MCSGs for which the decision problem can be solved efficiently.

5 Head-driven Phrase Structure Grammar

Turning now to non-transformational generative grammars, we begin with a brief description of Head-driven Phrase Structure Grammar (HPSG). HPSG can be seen as an extension of Generalized Phrase Structure Grammar as developed in Gazdar (1981), Gazdar, Klein, Pullum, and Sag (1985) and other studies. The foundational texts include Pollard and Sag (1987, 1994).

The notion of “syntactic head” is common to a number of syntactic theories. An “endocentric” phrase inherits syntactic properties associated with one of its immediate constituents, which is then said to be the head of the phrase. Minimally, these properties will include the syntactic category of the head, as can be gleaned from traditional phrase structure rules:13

(18)

In the X’-theory (pronounced X-bar theory) emanating from Chomsky (1970), this is subsumed under a general principle for the construction of phrasal constituents:

(19)

In HPSG, heads are represented as fairly rich information structures built from attribute-value matrices (ATVs). Some of the information is ultimately associated with the specific word appearing in the head position. For example, a transitive or ditransitive verb appearing as the head of a VP will bring with it the information that the verb must co-occur with complements of a specified sort within the VP as well as information about the semantic relation between the complements and the head. This corresponds in part to the subcategorization frames introduced for heads in Chomsky (1965) (see footnote 9). Like subcategorization frames, the information about the head’s complement is not in general inherited by the mother node.

In addition, the head of a VP may also be annotated with the requirement that there must be a subject (even in cases where the subject bears no semantic relation to the verb, as in the case of “expletive subjects,” e.g. there in English, or the subjects of “raising verbs” like seem). Unlike the information about the head’s complement selection, this information is passed on to the mother node, i.e. the VP, so that VP is ultimately constrained to combine with a subject in a way reminiscent of the “extended projection principle” (EPP) of Chomsky’s (1981) Government & Binding Theory. Like the EPP, the information that there must be a subject is not associated with individual lexical entries, but rather with the category verb as a whole.

Propagating information about the subject can be used to simulate the effects of local movement rules in TGGs to account for bounded dependencies. For example, the attribute-value matrix associated with a raising verb like seem will identify the value for its SUBJ attribute with the value of the SUBJ attribute of its complement, an infinitival VP which in turn inherits this attribute from the verb heading it along with the semantic role the subject is associated with (if any).

To handle unbounded dependencies, HPSG deploys a special feature, the “slash feature,” in a way reminiscent of a push-down automaton, an approach initiated by Gazdar (1981): When one of the complements of a head is missing, this is recorded by a slash feature in the head’s ATV and, unlike other information about complements, this feature and its value are passed on to the phrasal mother node, for example, from V to VP and ultimately to the S-node immediately dominating the VP.14 By iteration of this mechanism, the slash feature may then travel over an unbounded domain via the complement/head relation until it meets a suitable filler for the gap it encodes. At the same time, general conditions on feature propagation impose restrictions on the paths that the slash feature may follow. This can be exploited to capture the restrictions on unbounded movement discovered by Ross (1967).

From this brief and oversimplified exposition, it should be apparent that HPSG implementations pursue the goal of eliminating phonologically empty syntactic elements (traces) as well as movement operations by making use of an enriched inventory of types of information associated with syntactic heads.

6 Combinatory Categorial Grammar

In pure Categorial Grammar (CG) originating from work by K. Ajdukiewicz in the 1930s, the set of syntactic categories Cat is defined on the basis of a finite set of basic categories (corresponding to things like N(P), V(P), etc. in other models):

(20)

The slash in the categories of the form X/Y introduced by (20)b is used to encode a distributional property in the category label: A word or phrase of category X/Y concatenates with an element of category Y to its right to produce a string of category X. In other words, a syntactic element of category X/Y is a function from syntactic elements of category Y to syntactic elements of category X. Correspondingly, its semantics is a function from meanings of the denotation type associated with Y to meanings of the denotation type associated with X (see Montague 1970a, 1970b for an early implementation of this strategy). For example, a transitive verb can be assigned to the category VP/NP, restricting its occurrence to contexts where it is immediately followed by an NP (its object) with which it will combine to form a VP whose meaning is determined by combining the meaning of the verb with the meaning of the object NP.

Standardly, a “back-slash,” as in X\Y, is also used. The back-slash imposes the restriction that Y must precede X\Y. In an OV-language, the category for transitive verbs would then be VP\NP. Likewise, the necessity for a VP to combine with a subject NP to its left (in SVO and SOV languages alike) would be captured by replacing the category label VP with S\NP (and the category for transitive verbs would then be (S\NP)/NP or (S\NP)\NP). Notice that a CG doesn’t need a set of rewrite rules, since these rules are already encoded in the syntactic categories themselves.
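The way in which categories themselves do the work of rewrite rules can be sketched in Python. The encoding is an assumption made for this illustration: a basic category is a string, and a complex category X/Y or X\Y is a triple `(X, slash, Y)`, with the result category written first as in the text.

```python
def combine(left, right):
    """Combine two adjacent categories by application, or return None."""
    # Forward application: X/Y followed by Y gives X.
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application: Y followed by X\Y gives X.
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


NP = "NP"
VP = ("S", "\\", NP)  # S\NP: needs a subject NP to its left
TV = (VP, "/", NP)    # (S\NP)/NP: transitive verb in a VO language

verb_phrase = combine(TV, NP)        # verb + object
sentence = combine(NP, verb_phrase)  # subject + verb phrase
print(verb_phrase, sentence)
```

No separate rule like “S is rewritten as NP VP” is needed: the two application steps are licensed entirely by the shape of the categories involved.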

A Combinatory Categorial Grammar (CCG) (as described by Steedman, 1987, 1996, 2000; Steedman and Baldridge, 2011) is a CG enriched with combinatory schemata going beyond those allowed in pure CGs and with syntactic type-raising (Curry & Feys, 1958). In a pure CG, categories only combine as in (21):

(21)

In addition, a CCG allows for function composition in accordance with the schemata in (22) (from Steedman and Baldridge (2011)):15

(22)

This enrichment enables the grammar to treat completeness as the shared object of the two verbs inside the conjoined structure conjectured and might have proved in examples like (23) (= Steedman & Baldridge’s (16)):

(23)

Another combinatory rule not employed in pure CGs, but consistent with general principles of CCG plays a crucial role in accounting for cross-serial dependencies:

Here, the dative NP em Hans is licensed by the first verb (hälfed), while the accusative NP es huus is licensed by the second verb (aastriche). The challenge is to enable the second verb to combine with the accusative NP across the first verb before the first verb combines with the dative across the intervening accusative NP. The “crossing forward composition” allowed by (24) achieves this by packaging the third verb with the second into a single functor which will apply to the accusative before it applies to the dative:16

(26)

It is noteworthy that this analysis does not involve any permutations of the string.
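The effect of composing two verbs into a single functor can be sketched as follows. The composition schema used is the standard one, in which X/Y combines with Y/Z or Y\Z to give X/Z or X\Z; the lexical categories assigned to the two verbs, and labels such as `NP_dat` and `VPinf`, are hypothetical simplifications introduced here, not the categories of the cited analysis.

```python
def forward_compose(left, right):
    """X/Y followed by Y/Z (or Y\\Z) yields X/Z (or X\\Z); otherwise None."""
    if isinstance(left, tuple) and isinstance(right, tuple):
        x, slash, y = left
        y2, slash2, z = right
        if slash == "/" and y == y2:
            return (x, slash2, z)
    return None


def backward_apply(arg, functor):
    """Y followed by X\\Y yields X; otherwise None."""
    if isinstance(functor, tuple) and functor[1] == "\\" and functor[2] == arg:
        return functor[0]
    return None


# Hypothetical categories for a two-verb cluster (not taken from the source).
help_v = (("VP", "\\", "NP_dat"), "/", "VPinf")  # needs an infinitival VP, then a dative
paint_v = ("VPinf", "\\", "NP_acc")              # needs an accusative to its left

cluster = forward_compose(help_v, paint_v)  # crossing composition
step1 = backward_apply("NP_acc", cluster)   # cluster consumes the accusative first
step2 = backward_apply("NP_dat", step1)     # and then the dative
print(cluster, step1, step2)
```

The composed functor looks for the accusative before the dative, so both noun phrases can stay to the left of the verb cluster in their surface order.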

7 Lexical Functional Grammar

A Lexical Functional Grammar (LFG) as characterized in Kaplan and Bresnan (1982) and much subsequent work is a constraint-based system with multiple parallel levels of representation related to one another by a set of general rules and principles. The grammatical information relevant to a sentence’s well-formedness and interpretation originates from the lexical entries associated with the words the sentence contains. In this respect, LFGs are similar to HPSGs.

The levels of representation are called c-structure, a-structure, and f-structure. C-structures are trees generable by a CFG and represent the constituent structure of the sentence in a way familiar from other theories. But as in HPSGs, the labels are annotated with feature structures that provide the basis for the computation of the f-structure associated with the sentence.

Unlike a c-structure, an f-structure is not a tree but an attribute-value matrix formally similar to those employed in HPSG. The f-structure is the representation of the different grammatical relations that obtain between the constituents of a c-structure, i.e. “subject of,” etc. These are not to be directly identified with the different semantic roles, for example, “agent,” that may be associated with c-structure constituents, although f-structure serves as an intermediary for the assignment of semantic roles to constituents. Importantly, grammatical relations are also taken not to be directly determined from the c-structure itself, as in most versions of TGG, but are primitives of the theory.

As in HPSG, grammatical information propagates up the c-structure tree from daughters to mother nodes by unification of annotations. A grammatical sentence must be associated with a complete and coherent f-structure at the root of its c-structure tree. To illustrate, we may consider the analysis of Mary sees Sue in Nordlinger and Bresnan (2011), starting with the c-structure assumed, but introducing the annotations one by one as we go:

(27)

The lexical item Mary is annotated with PRED = “Mary” indicating its semantic value preceded by an “up-arrow” instructing the system to pass this information upwards and unify it with the f-structure of the NP-node that immediately dominates Mary. This NP-node is annotated with (↑ SUBJ) = ↓, indicating that the f-structure associated with the NP-node is passed on to the S-node and integrated into the functional structure of the sentence as a whole as the value of the sentence’s SUBJ attribute. Thus one piece of the f-structure associated with the sentence will be (28):

(28)

The verb sees has a more complex annotation:

(29)

As the up-arrows indicate, the information is passed on to the V-node, which has an annotation requiring that the information originating from sees propagate further to the VP-node, where it unifies with the information OBJ [PRED ‘Sue’], which has reached the VP-node in a similar manner. Thus, the information associated with the VP is as in (30):

(30)

This information then propagates to the S-node where it unifies with the information in (28) passed on from the subject NP yielding the complete f-structure in (31):

(31)

This (simplified) sample analysis also illustrates the key role that the lexical properties of the verb play in LFG accounts of sentence structure. The lexicon associates sees with the annotation in (29) which already provides the information that there must be a subject (as well as an object). That is, the lexical entries for verbs encode a subject requirement corresponding to the EPP of GB-type TGGs.
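The unification steps in this sample analysis can be approximated with nested Python dictionaries. This is a minimal sketch: the attribute names SUBJ, OBJ, PRED and TENSE follow the text, but the exact content of the verb’s annotation, in particular the value `see<SUBJ,OBJ>`, is an assumption standing in for (29), and the completeness check is deliberately simplistic.

```python
def unify(f, g):
    """Unify two f-structures given as nested dicts; raise ValueError on a clash."""
    result = dict(f)
    for attr, value in g.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            result[attr] = unify(result[attr], value)
        elif result[attr] != value:
            raise ValueError(f"clash on {attr}: {result[attr]!r} vs {value!r}")
    return result


def is_complete(fs, governed=("SUBJ", "OBJ")):
    """Check that every grammatical function required by the verb is present."""
    return all(function in fs for function in governed)


subject = {"SUBJ": {"PRED": "Mary"}}               # contribution of the subject NP
verb = {"PRED": "see<SUBJ,OBJ>", "TENSE": "PRES"}  # assumed stand-in for the verb's annotation
obj = {"OBJ": {"PRED": "Sue"}}                     # contribution of the object NP

vp = unify(verb, obj)           # information collected at the VP-node
sentence = unify(subject, vp)   # information collected at the S-node
print(sentence)
print(is_complete(vp), is_complete(sentence))
```

The VP-level structure is incomplete because its SUBJ is still missing; only the structure at the S-node satisfies the verb’s requirements.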

Another characteristic feature of LFG accounts can be detected in the way the tense of the sentence is determined. In a more fine-grained analysis of our example sentence, the value PRES(ent) of the verb’s attribute TENSE comes from the -s suffixed to see. In many (but not all) TGG analyses, for example, the classical account of the morpho-syntax of tense in Chomsky (1957), this -s would be introduced as an exponent of a syntactic head Aux, Infl(ection) or T(ense) outside the VP and affixed to the verb by a syntactic transformation. But practitioners of LFG typically subscribe to a strong “lexicalist” principle requiring that any morpheme sequence that behaves like a single word must be preassembled in the lexicon. This, however, does not preclude a compositional analysis of forms like sees taking the suffix -s to provide the value for the verb’s TENSE feature as long as this happens in the lexicon rather than in the syntax.

The strict separation between morphological composition (in the lexicon) and syntactic composition makes it possible to postulate general principles favoring encoding of grammatical information by morphological means over syntactic encoding, for example, Nordlinger and Bresnan’s (2011) principle “Economy of Expression.” The impact of such principles can be appreciated by considering again the ways languages can meet the requirement that a sentence must have a subject.

In a number of languages, this requirement appears to be suspended, since a sentence that would have a pronominal subject in English, as in She sees Sue, doesn’t necessarily have one in these languages, for example, in Italian: Vede Sue “She sees Sue.” Many (but not all) of the languages that can omit subject pronouns (called “pro-drop languages” or “null subject languages”) have rich verbal inflection such that the form of a verb encodes information about the person and number features of the subject. For example, an Italian verb in the present tense of the indicative mood has six distinct forms corresponding to the six combinations of three values for the person feature and two for the number feature, whereas an English verb in the same tense only encodes the contrast between third person singular (-s) and all the rest, and other languages, for example, the modern Mainland Scandinavian languages, have no person/number marking on the verb at all. A fairly natural assumption is that the piece of verbal inflection encoding information about the person and number of the subject in languages like Italian actually is the subject—a kind of little pronoun contained in the verb itself.17 LFG provides a precise way of implementing this intuition. Just like the -s of sees provides a value for the TENSE attribute of the verb it combines with (in the lexicon), the different endings of an Italian verb (exemplified in (32)) provide values for the verb’s SUBJ attribute as depicted in (33):

(32)

(33)

Principles favoring encoding by morphological means will now predict that in languages like Italian, what appears to be a subject NP, like io in Io vedo Sue “I see Sue,” is actually not integrated into the sentence’s f-structure as a subject (since the place is already taken by -o), but must be parsed as a topic or a focused phrase (in addition to being associated indirectly with a grammatical function via binding). Bresnan and Mchombo (1987) present evidence from Chichewa (a Bantu language) which they take to be consistent with this prediction.
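
The idea that a verbal ending itself supplies the value of SUBJ can be sketched informally in Python. The sketch below is only an illustration under stated assumptions: the function name verb_fstructure, the use of nested dictionaries for f-structures, and the value "pro" for the PRED of an incorporated pronoun are conventions chosen for this example, not part of the analysis presented above. The endings are the present indicative endings of Italian vedere “see.”

```python
# Present indicative endings of Italian "vedere" (stem "ved-").
# Each ending contributes person/number values for the verb's SUBJ attribute.
ENDINGS = {
    "o": {"PERS": 1, "NUM": "SG"},
    "i": {"PERS": 2, "NUM": "SG"},
    "e": {"PERS": 3, "NUM": "SG"},
    "iamo": {"PERS": 1, "NUM": "PL"},
    "ete": {"PERS": 2, "NUM": "PL"},
    "ono": {"PERS": 3, "NUM": "PL"},
}


def verb_fstructure(form, stem="ved"):
    """Assemble a toy f-structure for an inflected form 'in the lexicon'."""
    ending = form[len(stem):]
    if not form.startswith(stem) or ending not in ENDINGS:
        raise ValueError(f"not a present indicative form of the verb: {form}")
    # Assumption: the ending acts as a small pronoun, so SUBJ gets PRED "pro".
    subj = {"PRED": "pro", **ENDINGS[ending]}
    return {"PRED": "see", "TENSE": "PRES", "SUBJ": subj}


print(verb_fstructure("vedo")["SUBJ"])
```

On this view, the SUBJ value is already present once the verb form has been built, which is why an overt pronoun like io cannot fill the same slot.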

The dissociation between grammatical functions like SUBJ and OBJ and semantic roles like “agent” and “theme”/“patient” is consistent with the fact that languages like English require a subject even in sentences like It is raining where the subject pronoun appears to be non-referential and to lack a semantic role. (The account sketched for languages like Italian correctly predicts that there is no subject pronoun in Piove “It rains” for the same reason there is no subject pronoun in Vedo Sue “I see Sue.”) However, grammatical functions are indeed often associated with semantic roles in a systematic way. In LFG analyses, this comes about as the outcome of mapping between the grammatical functions listed in a verb’s f-structure and the semantic roles (if any) in the a-structure associated with the verb. This mapping is determined by the two hierarchies in (34) (grammatical functions) and (35) (semantic roles) and the general principle in (36):18

(34)

(35)

(36)

This allows for a straightforward non-transformational analysis of passives (in simple cases). Assuming that the annotation associated with a passive participle does not retain the verb’s OBJ attribute and that its a-structure does not contain the semantic role Agent (or prespecifies that it must be associated with an OBL), the principle in (36) will associate the Theme/Patient role (the highest role remaining in the a-structure) with the value of the grammatical function SUBJ, hence ultimately with the subject of the sentence, without movement of an object NP to the subject position.

In some constructions, the semantic role associated with the subject of the sentence must come from the complement of the verb rather than from the verb itself. This is the case in “raising sentences” like those in (37):

(37)

This pair of examples illustrates the general fact that the subject of a raising verb like seem or appear is assigned a semantic role just in case the infinitival verb following it would assign a semantic role to its own subject when it appears as a finite verb. In TGG analyses, this is accounted for by assuming that the subject of seem/appear is moved (“raised”) from the subject position of an infinitival clausal complement to the subject position of the main clause, carrying its semantic role with it. In an LFG analysis, the same pattern of association between grammatical functions and semantic roles is accounted for by taking the c-structure to provide no subject position in the infinitival clause, although the f-structure annotation on the infinitival verb still has a SUBJ attribute associated with the highest semantic role in its a-structure (in cases like (37)a, but not in (37)b). Verbs like seem and appear, on the other hand, have no semantic role in their a-structure, but are annotated with an instruction to identify the value of their own SUBJ attribute with the value of the infinitive’s SUBJ attribute.
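
The identification of the two SUBJ values can be pictured as structure sharing: both attributes point to one and the same f-structure. The following Python sketch is a rough illustration only. The function name raising_fstructure and the dictionary encoding are assumptions made for this example, the attribute name XCOMP for the infinitival complement follows common LFG practice but is not introduced in the text above, and the sample sentence Sue seems to sleep is hypothetical.

```python
def raising_fstructure(subject, infinitive_pred, raising_pred="seem"):
    """Build a toy f-structure for '<subject> seems to <infinitive>'."""
    # A single f-structure for the subject, created once.
    subj = {"PRED": subject}
    # The infinitive's SUBJ attribute is filled by that same object.
    complement = {"PRED": infinitive_pred, "SUBJ": subj}
    # The raising verb's SUBJ is identified with the infinitive's SUBJ:
    # both attributes refer to one object rather than to two copies.
    return {"PRED": raising_pred, "SUBJ": subj, "XCOMP": complement}


f = raising_fstructure("Sue", "sleep")
print(f["SUBJ"] is f["XCOMP"]["SUBJ"])
```

Because the two attributes share one value, whatever semantic role the infinitive associates with its SUBJ is thereby associated with the subject of the main clause, with no movement involved.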

In general, LFG shares HPSG’s aversion to movement transformations applying to underlying constituent structures and shifts the burden of accounting for discontinuous dependencies to the system assembling lexically determined pieces of information into f-structures. This strategy may be extended to unbounded dependencies (see, for example, Lødrup, 2011 and references therein), and Bresnan, Kaplan, Peters, and Zaenen (1982) show that it also extends to the cross-serial dependencies exhibited by languages like Dutch and Swiss German. They give evidence from constituency tests showing that the NP cluster and V cluster in Dutch sentences like (38) must correspond to two parallel right-branching subtrees as already assumed in the representation (16) in the discussion of TGGs:19

(38)

(16)

This property is preserved in their c-structure representation of (38), although Bresnan et al. (1982) assign different category labels, do not assume complex verbs and have a ternary-branching VP:

(39)

Bresnan et al. (1982) show that there is an LFG which will associate the c-structure in (39) with an f-structure that in essential respects corresponds to the underlying “deep structure” posited in the TGG analysis:

(11)

They also point out that c-structures like (39) are generated by a CFG with the productions in (40):

(40)

This CFG will also generate strings of the form NP^i V^j with i ≠ j, where the number of NPs doesn’t match the number of Vs, but such a string will not be associated with an f-structure meeting the general well-formedness conditions on f-structures. That is, the rules and principles controlling the association of annotated c-structures with f-structures act as a filter restricting the set of strings NP^i V^j in the language generated by the system as a whole to those where i = j.
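
The division of labor between an overgenerating CFG and a filter can be sketched in Python. This is a toy illustration and not the grammar of Bresnan et al. (1982): the function names are invented for the example, and the well-formedness conditions on f-structures are replaced by a crude stand-in that simply requires one NP per verb.

```python
from itertools import product


def overgenerate(max_count):
    """Enumerate NP^i V^j for 1 <= i, j <= max_count, with no link between i and j."""
    return [("NP",) * i + ("V",) * j
            for i, j in product(range(1, max_count + 1), repeat=2)]


def well_formed(string):
    """Stand-in for f-structure well-formedness: one NP for every V."""
    return string.count("NP") == string.count("V")


def language(max_count):
    """Strings that survive the filter: exactly those with i = j."""
    return [s for s in overgenerate(max_count) if well_formed(s)]


print(len(overgenerate(3)), len(language(3)))
```

With max_count set to 3, the unconstrained backbone yields nine strings, of which only the three with matching numbers of NPs and Vs pass the filter.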

8 Language Acquisition

The question of how a human being acquires his native language(s) has been a central issue in generative linguistics ever since Chomsky (1959). The basic fact to be explained is that children (up to a certain “critical age”) generally acquire any language they have sufficient exposure to, within a fairly short time span. Explanations offered within the general framework of generative grammar are to varying degrees based on Chomsky’s innateness hypothesis: The basic principles of grammar are somehow encoded in human brains as a characteristic property of the species. This, however, leaves open exactly what the basic principles of grammar are, a question to which different theories of generative grammar provide partially different answers.

An additional question that is closely connected with the problem of accounting for language acquisition is how to account for cross-linguistic variation. Since any human being can learn any human language with equal ease, it follows from the innateness hypothesis that the basic principles of grammar encoded in human brains must be valid for all human languages. Yet, human languages differ from one another in non-trivial respects. Thus, the challenge is to factor out, based on empirical investigations, the grammatical properties that are common to all human languages and impose a bound on variation and locate the points in the grammatical systems where individual languages are entitled to make different choices.20 These points of variation are often referred to as “parameters,” and the research strategy just described has led to different “Principles and Parameters” (P & P) theories, for example, the GB theory described by Chomsky (1981) as well as subsequent developments adhering to the Minimalist Program of Chomsky (1993).

For P & P theories, the acquisition problem corresponds to the question of how the correct parameter values for the target language are discovered by the learner.

Research relevant to this question proceeds along two dimensions. On the purely theoretical side, the properties of various formal models of learning systems have been investigated to determine to what extent any of these provide plausible accounts of language acquisition by humans. On the empirical side, researchers have studied the various stages of language acquisition by humans to isolate the factors that drive the acquisition process.

The volume of relevant work on acquisition makes it impossible to review current research in this area here. Instead, we will merely present the outlines of one prominent approach to formal modeling of language acquisition consistent with P & P theories.

In the context of this approach, “learning” is understood to mean “identification in the limit” in the sense of Gold (1967): Starting from an initial hypothesis, a learner identifies the grammar Gt of the target language L(Gt) in the limit by changing hypotheses to fit new data just in case he eventually hypothesizes Gt and does not subsequently move away from Gt. A grammar/language is learnable “in the limit” if and only if a learner is guaranteed to eventually identify it using the procedure just described.

When this concept of learning is applied to language acquisition, it is generally assumed that what counts as data is those sentences from the target language that the learner hears and tries to parse according to his current hypothesis (grammar).21 The learner will change his current hypothesis only if it comes up against a sentence to which it cannot assign a syntactic analysis. If grammars are viewed as lists of parameter values (plus invariant universal principles of grammar), changing a hypothesis amounts to changing the value of some parameter so that the resulting new grammar provides a syntactic analysis of the sentence under consideration. An added restriction would be that only one parameter value may be changed at each step along the learning path.

Another common assumption is that whenever two grammars Gi and Gj both generate the data available to the learner, but L(Gi) is a proper subset of L(Gj), the learner chooses Gi (“the Subset Principle” of Berwick [1985], Wexler and Manzini [1987]). This assumption is instrumental in making Gi learnable even when there is a Gj generating a superset of L(Gi).
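
The effect of the Subset Principle can be sketched in Python for finite toy languages. The names consistent and subset_principle_choice, and the representation of a language as a set of sentence labels, are assumptions made for this illustration; real languages are of course not finite sets that can be compared directly.

```python
def consistent(language, data):
    """True if every observed sentence belongs to the language."""
    return set(data) <= language


def subset_principle_choice(grammars, data):
    """Return the names of the grammars a Subset Principle learner may choose.

    grammars maps a grammar name to the (finite, toy) language it generates.
    A grammar is excluded if some other grammar consistent with the data
    generates a proper subset of its language.
    """
    candidates = {name: lang for name, lang in grammars.items()
                  if consistent(lang, data)}
    return sorted(name for name, lang in candidates.items()
                  if not any(other < lang for other in candidates.values()))


grammars = {"Gi": {"a", "b"}, "Gj": {"a", "b", "c"}}
print(subset_principle_choice(grammars, ["a"]))
print(subset_principle_choice(grammars, ["a", "c"]))
```

As long as the data contain only sentences that both grammars generate, the learner stays with Gi. Only a sentence outside L(Gi), here the label "c", moves the learner to Gj, so the learner never needs negative data to retreat from a language that is too large.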

The learner is generally assumed to proceed by trial and error when searching for a parameter value to reset in order to provide an analysis of a sentence inconsistent with his current hypothesis. But if the number of parameters is high, the size of the search space will be large and comprise different grammars that generate the input sentence, but are obtained from the current hypothesis by resetting different parameters. To alleviate this problem, it has been hypothesized that for each hypothesis distinct from the target grammar, the learner’s data always includes a sentence that can only be generated if one specific parameter is set to the value it has in the target grammar. Such sentences are often called “triggers” in the literature.

However, Gibson and Wexler (1994) show that although a simple trigger-driven learning algorithm (TLA) of the sort just described is guaranteed to converge on the target grammar as long as there is a trigger for every (hypothesis, target) pair, there are some (hypothesis, target) pairs for which there are no triggers, even with a very small number of parameters. They illustrate this general problem by looking at how a TLA will behave when trying to identify the parameter values that determine basic word order in the target language. In their simplified scenario, only three parameters are relevant: The first specifies whether the subject is initial or final in S (PSubj = 1 (subject initial)/0 (subject final)), and the value of the second determines whether a complement of the verb is initial or final in the VP (PComp = 1 (complements initial)/0 (complements final)). These two parameters determine the basic word order unaffected by movement operations that may reorder either the subject or one of the complements to the beginning of the sentence. The third parameter (PV2) is set to 1 in languages that have the additional constraint that the verb must be placed in the second position (from the left) in declarative main clauses, as in all Germanic languages except English. It turns out that if the learner’s current hypothesis at some point corresponds to the grammar <0,0,1> generating a VOS language with V2 and the target grammar is <1,0,0> generating an SVO language without V2 (like English), there is no sentence from the target language which will set the learner on a path to the target, as long as the TLA only allows the learner to change one parameter value at a time and doesn’t allow any change at all unless a (single) change of parameter value makes the current sentence analyzable. With these constraints imposed by the TLA, the learner will be stuck with the incorrect grammar <0,0,1>. In addition to the pair (<0,0,1>,<1,0,0>), there are five more (hypothesis, target) pairs for which no trigger exists in this scenario.22
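
How a learner can get stuck under these constraints can be sketched in Python. The sketch does not reproduce Gibson and Wexler’s three-parameter space or their sentence patterns. It uses a hypothetical two-parameter space (TOY) whose “sentences” are arbitrary labels, and the function names are invented for this example. It only models the two constraints mentioned above: one parameter value is changed at a time, and a change is adopted only if it makes the current sentence analyzable.

```python
def neighbours(grammar):
    """All grammars that differ from `grammar` in exactly one parameter value."""
    return [grammar[:i] + (1 - grammar[i],) + grammar[i + 1:]
            for i in range(len(grammar))]


def tla_moves(grammar, target, languages):
    """Grammars the learner can move to from `grammar` on some target sentence."""
    moves = set()
    # Error-driven: only sentences the current grammar cannot analyze matter.
    for sentence in languages[target] - languages[grammar]:
        # Only one parameter value may be changed at a time.
        for candidate in neighbours(grammar):
            # The change is kept only if it makes the sentence analyzable.
            if sentence in languages[candidate]:
                moves.add(candidate)
    return moves


def can_reach(start, target, languages):
    """True if some sequence of permitted moves leads from `start` to `target`."""
    seen, frontier = {start}, [start]
    while frontier:
        grammar = frontier.pop()
        if grammar == target:
            return True
        for nxt in tla_moves(grammar, target, languages) - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return False


# Hypothetical two-parameter space; each grammar generates a set of labels.
TOY = {
    (0, 0): {"s1"},
    (0, 1): {"s1", "s5"},
    (1, 0): {"s3"},
    (1, 1): {"s2", "s4"},
}

stuck = [g for g in sorted(TOY) if not can_reach(g, (1, 1), TOY)]
print(stuck)  # prints [(0, 0)]
```

In this toy space, the grammar (0, 0) differs from the target (1, 1) in two parameter values, and neither single change makes any target sentence analyzable, so a learner starting there never moves. This is the same kind of configuration as the pair (<0,0,1>,<1,0,0>) discussed above, although the toy languages themselves carry no linguistic content.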

This example illustrates how both empirical and theoretical approaches to language acquisition may inform both the theory of acquisition and the theory of grammar. Among the solutions that Gibson and Wexler (1994) consider for the learnability problem they uncover, one attributes a crucial role to the set of innate principles of grammar: If these principles fix no initial value for the parameters PSubj and PComp, but assign PV2 the initial value 0, the TLA will never lead the learner to a hypothesis H such that no trigger exists for (H, target).23

Another possible solution they consider is to revise the learning algorithm itself, for example, by allowing the learner to change more than a single parameter setting at a time.

9 Language Processing

A human being is in general capable of producing and understanding sentences in his native language rather effortlessly. Thus, if a generative grammar is intended to serve as a model of human linguistic knowledge, it must also be amenable to being integrated into a model of this aspect of human linguistic performance.

We can consider this issue in the light of what is known about the resources required for different types of abstract automata to parse sentences. For example, a PDA (push-down automaton) uses more resources (operations and memory space) than an FSA (finite state automaton) even to recognize a sentence as a member of a given language.24 If we measure the complexity of solving the recognition problem in terms of the resources involved (space and number of steps in the computation), recognizing and parsing a sentence is a more complex task with a PDA than with an FSA.

On the other hand, we have seen that human languages arguably require at least the weak generative capacity of PDAs, which correspond to CFGs in weak generative capacity. In fact, they require the power of so-called “embedded PDAs” which have the weak generative capacity of MCSGs, a family of grammars including HPSGs, CCGs, and MGs. MCSGs are intermediate between CSGs and CFGs in terms of weak generative capacity. The cost associated with a computation by an automaton parsing strings generated by a CSG is sufficiently high to suggest that even the recognition problem is virtually intractable. By contrast, membership can be decided efficiently by PDAs, and this property is believed to extend to the embedded PDAs needed to parse languages generated by MCSGs. (In view of the fact that ease of processing was invoked as one of the reasons for adopting the non-transformational architecture of LFG, it is interesting that the membership question for (an early version of) LFG was shown to be computationally intractable by Berwick (1985).)

The preceding discussion has focused entirely on the relation between the computational complexity of the membership issue in relation to the formal properties of different classes of grammars and the corresponding classes of automata. In particular, no assumptions have been made about properties of the human brain that might be relevant to the issue. Chesi and Moro (2014), however, provide an interesting discussion confronting predictions made by the purely formal approach with empirical data obtained by studying the human parser in action.

If complexity is calculated as a function of the time and space used during computation, answering the membership question involves more computational complexity with a PDA than with an FSA. Similarly, a computation carried out by an “embedded PDA” (EPDA), the type of automaton needed to parse languages generated by MCSGs, involves more resources than parsing with a PDA. Since cross-serial dependencies can be generated by MCSGs, but not by a CFG, while nested dependencies can be generated by a CFG, an EPDA is needed to parse strings with cross-serial dependencies, while a pure PDA can handle nested dependencies. We should therefore conclude that parsing cross-serial dependencies involves more complexity than parsing nested dependencies. In particular, parsing Dutch or Swiss German sentences with cross-serial dependencies must be more costly in terms of computational resources than parsing the corresponding sentences in Standard German, which only exhibit nested dependencies:

(41)

(42)

Yet, as Chesi and Moro point out, there is empirical evidence that sentences like (41) are processed by native speakers more easily than sentences like (42). Observations of this sort highlight the importance of understanding exactly how parsing algorithms are implemented in the human brain.
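
The difference between the two dependency patterns can be made concrete with a small Python sketch. It is not an implementation of a PDA or an EPDA, and the function name and the encoding of sentences as lists of indexed NP and V tokens are assumptions made for this illustration. It only shows the order in which stored NPs must be retrieved: last in, first out for nested dependencies, and first in, first out for cross-serial dependencies.

```python
from collections import deque


def dependencies_match(tokens, nested):
    """Check that every verb is paired with the NP carrying the same index.

    nested=True retrieves stored NPs in last-in-first-out order (a stack).
    nested=False retrieves them in first-in-first-out order (a queue),
    which is used here only as a simple stand-in, not as an EPDA.
    """
    store = deque()
    for kind, index in tokens:
        if kind == "NP":
            store.append(index)
        else:
            if not store:
                return False
            expected = store.pop() if nested else store.popleft()
            if expected != index:
                return False
    # Every stored NP must have been paired with a verb.
    return not store


nested_order = [("NP", 1), ("NP", 2), ("NP", 3), ("V", 3), ("V", 2), ("V", 1)]
crossed_order = [("NP", 1), ("NP", 2), ("NP", 3), ("V", 1), ("V", 2), ("V", 3)]

print(dependencies_match(nested_order, nested=True))
print(dependencies_match(crossed_order, nested=False))
```

In the crossed order, the first verb is paired with the first NP encountered, whereas in the nested order the first verb must wait for nothing but is paired with the most recently stored NP, leaving the first NP unresolved until the last verb. Whether this difference bears on the processing facts reported by Chesi and Moro is an empirical question that the sketch does not address.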

Joshi, Aravind K. (1969). Properties of formal grammars with mixed types of rules and their linguistic relevance. Proceedings of the third International symposium on computational linguistics.

Koller, Alexander, & Kuhlmann, Marco. (2009). Dependency trees and the strong generative capacity of CCG. Proceedings of the 12th conference of the European chapter of the association for computational linguistics, Athens.

Notes:

(1.)
The symbols that can be rewritten are usually called non-terminal symbols. The others are terminal symbols. In the following examples, non-terminal symbols are written in capital letters.

(4.)
Lowercase Greek letters are used to represent strings of terminal and non-terminal symbols.

(5.)
Context-sensitive rewrite rules are often written in the form A → ω / ϕ _ ξ where the dash indicates the position where A is rewritten as ω.

(6.)
Wall (1972) shows how this can be demonstrated using FSAs, which have the same weak generative capacity as RGs.

(7.)
Dutch sentences with this pattern were first discussed from this perspective by Huybregts (1976). Bresnan et al. (1982) argue that the issue really is about strong generative capacity rather than weak generative capacity.

(8.)
An important constituency test derives from the empirical generalization that two strings can only be conjoined, for example, with and, if they are constituents. As for the semantic interpretation, the idea that the meaning of a sentence reflects the way the words and phrases are composed in the syntax provides a way of assessing the plausibility of syntactic parses.

(9.)
In Chomsky (1965), context-sensitivity is introduced in the operation that inserts lexical items as the leaves of the syntactic tree. By general convention, a lexical item can only be inserted into an environment that matches its “subcategorization frame.” For example, an obligatorily transitive verb can only be inserted in a VP that directly dominates a noun phrase (NP).

(10.)
Current versions of TGG do not have any level of representation corresponding to “deep structure” and semantic interpretation is interleaved with the syntactic derivation of a sentence.

(11.)
Cross-linguistic empirical work has uncovered exceptions to some of Ross’s descriptive generalizations. The existence of such exceptions ultimately provides clues to the correct theoretical interpretation of the generalizations themselves.

(12.)
This could be thought of as “multidominance”: The “moved” constituent would then be represented as being directly dominated by two distinct nodes.

(13.)
The parentheses enclose optional components. The linear order built into (18) is irrelevant to the notion of “head”, for example, VP → (NP …) V would illustrate the same privileged relation between a phrase and its head.

(14.)
Gazdar (1981) shows that including the slash feature among the features associated with a phrase makes it possible to deduce Ross’s Coordinate Structure Constraint from the requirement that conjoined phrases must be of the same type.

(15.)
The subscripts on the slashes restrict the applicability of (22) to a proper subset of the elements of category X/Y, Y/Z, Y\Z or X\Y.

(16.)
(26) is a highly simplified version of Steedman and Baldridge’s (72).

(17.)
This analysis doesn’t extend to all languages with rich verbal inflection, for example, not to Icelandic which is not a null subject language, although Icelandic verbal inflections encode roughly the same amount of information as the Italian forms in (32).

(19.)
(16) actually differs from the structure proposed by Evers (1975), who assumes no branching inside the NP cluster as a result of S-nodes being “pruned” after the operation of Verb Raising.

(20.)
Greenberg (1963) is an influential contribution to empirical research of this sort. Cinque (2005) is a good example of how the limits to cross-linguistic variation can be characterized in generative grammar.

(21.)
That is, only positive data drive the learning process. Negative data, that is, the fact that a certain syntactic pattern has not yet shown up in the stream of incoming data, is usually considered irrelevant.

(22.)
An interesting discussion of these results can be found in Niyogi and Berwick (1996), who propose an interpretation of parameter spaces as Markov chains.

(23.)
In terms of the common understanding of “markedness” in grammar, innate principles of grammar might be said to fix the unmarked value for certain parameters, while the marked value can only be set upon exposure to relevant data.

(24.)
Considerations of this sort are also relevant to language acquisition, since a learner must have an efficient way of deciding whether an incoming sentence is generated by some grammar he hypothesizes.

Knut Tarald Taraldsen

Center for Advanced Study in Theoretical Linguistics, University of Tromsø