The Minimalist Program

20th Anniversary Edition

Noam Chomsky

The MIT Press

Cambridge, Massachusetts
London, England

© 2015 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please email special_sales@mitpress.mit.edu.

This book was set in Times LT Std by Toppan Best-set Premedia Limited. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Chomsky, Noam.
The Minimalist program : 20th Anniversary Edition / Noam Chomsky. 20th Anniversary Edition.
p. cm
Includes bibliographical references and index.
ISBN 978-0-262-52734-7 (pbk. : alk. paper)
1. Minimalist theory (Linguistics) I. Title.
P158.28.C48 2015
410.18 dc23
2014030244

10

Contents

Preface to the 20th Anniversary Edition

Introduction

Chapter 1
The Theory of Principles and Parameters
with Howard Lasnik

Chapter 2
Some Notes on Economy of Derivation and Representation

Chapter 3
A Minimalist Program for Linguistic Theory

Chapter 4
Categories and Transformations

References

Index

Preface to the 20th Anniversary Edition

As discussed in the introduction to the first (1995) edition, the essays included here draw from ongoing work from the late 1980s through the early 1990s.

It is important to recognize that the Minimalist Program (MP) under development in this work, and since, is a program, not a theory, a fact that has often been misunderstood. In central respects, MP is a seamless continuation of pursuits that trace back to the origins of generative grammar, even before the general biolinguistics program, as it is now often called, began to take shape in the 1950s.

In particular, a leading concern from the outset had been to clarify the concept "simplest grammar" and to determine how to choose the simplest grammar for each language.[1] The basic reasons are just normal science. Since Galileo, modern science has been guided by his maxim that nature is simple and that it is the scientist's task to show that this is the case. It has long been clear that the quest for simplicity is closely related to the quest for explanation, matters clarified by the important work of Nelson Goodman at midcentury. At about the same time, Einstein expressed the basic point in his characteristic way:

Time and again the passion for understanding has led to the illusion that man is able to comprehend the objective world rationally, by pure thought, without any empirical foundations; in short, by metaphysics. I believe that every true theorist is a kind of tamed metaphysicist, no matter how pure a positivist he may fancy himself. The metaphysicist believes that the logically simple is also the real. The tamed metaphysicist believes that not all that is logically simple is embodied in experienced reality, but that the totality of all sensory experience can be comprehended on the basis of a conceptual system built on premises of great simplicity. The skeptic will say that this is a miracle creed. Admittedly so, but it is a miracle creed which has been borne out to an amazing extent by the development of science. (Einstein 1950, 13)

As discussed in the 1995 introduction, two distinct notions of simplicity were pursued in early generative grammar: the general notion that Einstein refers to and that Goodman sought to sharpen, holding of rational inquiry generally; and a theory-internal evaluation procedure designed to select the optimal grammar for given data, within the format determined by Universal Grammar (UG), which is understood in the modern literature to be the theory of the biological endowment of the relevant components of the faculty of language (FL). In effect, this yields an abstract language acquisition device, but one that is unfeasible, as was recognized at once.

A more specific concern arose as the biolinguistic framework took shape starting in the 1950s. Any complication of UG poses barriers to some eventual account of the evolution of FL.[2] There is, then, an additional and compelling reason to seek the simplest formulation of UG, eliminating stipulations, redundancy, and other complications, insofar as possible. MP is the current version of this quest, within the general framework under consideration here.

MP was a natural development after the crystallization of the principles-and-parameters framework (P&P) in the early 1980s. P&P overcame fundamental quandaries of the earlier framework, eliminating the need for an evaluation procedure, as discussed in the 1995 introduction. That leaves us with only the general notion of simplicity and the specific concern for reducing UG to the minimum, now motivated in addition by concern about language origins that began to be discussed more seriously, but without much progress, in the 1970s.[3]

P&P has been pursued very productively, making available a vast array of new empirical materials in languages of great typological variety, studied in much greater depth than heretofore. It has also revitalized psychology of language, historical and comparative linguistics, and other related disciplines, and has led to innovative and highly insightful theoretical and empirical inquiry (see, e.g., Baker 2003, Longobardi 2003, Kayne 2013).

The 1995 introduction takes note of "a problem for the biological sciences that is already far from trivial: how can a system such as human language arise in the mind/brain ...?" The problem is no doubt a significant one. To address it seriously, one must satisfy two elementary conditions. The first is to determine as best one can the nature of the phenotype, that is, what has evolved, namely FL. One must begin with the most satisfactory version of UG. No biologist, for example, would present a proposal about the evolution of the eye without presenting a clear account (preferably, the best available one) of what an eye is. That is close to a truism, as is the second condition: pay attention to the empirical evidence about the origin of language.

The evidence is slim, but not zero. There are two empirical theses about the origin of language (and, it can be plausibly argued, little more than these[4]). One, established with considerable confidence, is that there has been little if any evolution of FL since our ancestors left Africa, some 50,000-80,000 years ago. The second, proposed with fair confidence, is that not long before this, there is no reason to believe that language existed at all (Tattersall 2012). If so, then FL emerged suddenly (in evolutionary time), and we would expect it to be quite simple, its basic properties largely determined by laws of nature and by extralinguistic contingencies. Since language is clearly a computational system, the relevant laws of nature should include (and perhaps be limited to) principles of efficient computation. These considerations lend some independent reason to suspect that the research program of MP is on the right track.

While a direct continuation of work from the earliest days, MP did formulate a new research program, sometimes called "approaching UG from below." Pursuing this program, we seek to formulate a perfect solution to the conditions that language must meet, and then ask to what extent the many complex and varied phenomena of actual languages can be accounted for in these terms. By "language" here is meant I-language, what was called "grammar" in earlier work, in one of the uses of this systematically ambiguous term.[5]

The basic principle of language (BP) is that each language yields an infinite array of hierarchically structured expressions, each interpreted at two interfaces, conceptual-intentional (C-I) and sensorimotor (SM): the former yielding a language of thought (LOT), perhaps the only such LOT; the latter in large part modality-independent, though there are preferences. The two interfaces provide external conditions that BP must satisfy, subject to crucial qualifications mentioned below. If FL is perfect, then UG should reduce to the simplest possible computational operation satisfying the external conditions, along with principles of minimal computation (MC) that are language-independent. The Strong Minimalist Thesis (SMT) proposes that FL is perfect in this sense.

SMT is not precisely formulated. MC can be interpreted in various ways, though some of its properties are uncontroversial, and reliance on these carries us a long way, as work stimulated by MP has shown. There is a plausible suggestion as to what the simplest computational operation is: Merge, as defined within MP.[6] SMT accords with the guiding principle of the natural sciences, and there is reason to expect something like this to be correct on evolutionary grounds. But of course, evaluation of the thesis is based on the empirical consequences of pursuing it.
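As a purely illustrative aside, the idea of a single, simplest combinatorial operation can be sketched in code. The sketch assumes one common formulation from the minimalist literature, in which Merge takes two syntactic objects and forms the unordered set containing them. The preface does not spell out the definition, so the function names `merge` and `depth`, the `frozenset` encoding, and the toy lexical items are all assumptions made for illustration.

```python
# Illustrative sketch only: Merge modeled as unordered binary set formation.
# The frozenset encoding and the toy lexical items are assumptions,
# not a definition taken from the text.

def merge(x, y):
    """Combine two syntactic objects into the unordered set {x, y}."""
    return frozenset([x, y])

def depth(obj):
    """Depth of hierarchical embedding; a lexical item (str) has depth 0."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(part) for part in obj)

# Repeated application builds hierarchy without encoding linear order.
vp = merge("read", merge("the", "book"))
clause = merge("John", vp)

print(merge("the", "book") == merge("book", "the"))  # True: no order encoded
print(depth(clause))  # 3
```

On this encoding the output of one application can serve as the input to the next, which is enough to yield unboundedly many hierarchically structured objects from a finite stock of items. Linear order is not represented at all.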

When the first edition of The Minimalist Program was published, the thesis seemed too extreme to be seriously proposed. In the years since, I think that skepticism has lessened considerably. Some results have emerged that seem to me to provide substantial evidence that this program is on the right track.

One result has to do with the strange property of displacement that is ubiquitous in natural language: phrases are understood both where they are heard and in a position that is not articulated. To take a very simple case, the sentence "Which book did John read?" is understood to mean roughly "For which book X, John read the book X"; the phrase "which book" is interpreted both where it appears and as the direct object of "read", where it is not articulated. The same holds for quite intricate expressions. Displacement had always seemed (to me in particular) a curious imperfection of language. Why should languages resort to this device in a very wide range of constructions? Pursuit of SMT reveals that displacement with this property of multiple interpretation (the "copy theory of movement") is the simplest case. Some stipulation would be required to block it, and correspondingly, any devices designed to yield the result that comes free under SMT have an even heavier empirical burden to bear. This is a significant discovery, I think, too long in coming, and insufficiently appreciated, as are its consequences.

One immediate consequence is that SMT yields structures that are appropriate for C-I interpretation, but obviously wrong for the SM interface, where all copies but the hierarchically most prominent one are deleted (with interesting qualifications, which in fact support the conclusion). That follows from another application of MC: in externalization, reduce computation and articulation to the minimum. The result is that the sentences that are heard have gaps, leading to serious problems for parsing and perception, so-called filler-gap problems. We therefore have strong evidence that the basic design of language determines a crucial asymmetry between the two interfaces: the C-I interface is privileged; externalization in one or another sensory modality (or none at all, as in thought) is an ancillary feature of language. If so, then specific uses of externalized language, such as communication, are peripheral to the core elements of language design and evolution of FL, contrary to widespread doctrine.

There is a great deal of additional evidence supporting this conclusion, and none that I know of that is inconsistent with it. One important case is another curious property of language: structure-dependence of rules, a universal property that has been a puzzle since the 1950s. As an illustration, consider such simple sentences as "Instinctively, eagles that fly swim" and "Can eagles that fly swim?" Here the initial adverb or auxiliary verb does not relate to the linearly
proximal verb "fly"; rather, it relates to the linearly remote but structurally proximate verb "swim". This observation holds for all relevant constructions in all languages, and it has been shown that children know the facts and make no errors as early as testing is possible (Crain and Nakayama 1987). It is next to inconceivable that these facts are learned.[7] The long-standing puzzle is that the procedure that is universally rejected, based on linear distance, is computationally far simpler than the one that is universally adopted, based on structural distance. The only known reason is that linear order is simply not available to acquisition of I-language, even though it is everywhere in the data. It appears that the internal system, biologically determined, observes SMT and therefore ignores linear order in favor of structural distance.

Linear order and other arrangements therefore appear to be reflexes of the SM modalities for externalization, having nothing particular to do with core elements of language design (though of course they have a variety of secondary effects). That conclusion fits with the very limited evidence about the origin of language. The SM systems long antedate the apparent emergence of language and do not seem to have been modified significantly afterward (not surprisingly, given the very brief time period prior to the departure of Homo sapiens from Africa).

It is a familiar fact that the complexity and variety of language appears to be localized overwhelmingly, and perhaps completely, in externalization (which includes Saussurean arbitrariness of the lexicon). In learning a language, the real problem is mastering externalization. Principles of semantic interpretation are virtually unlearnable, beyond the most superficial cases, and are probably simply determined by UG; and the same appears to be largely or completely true for the syntactic operations ("narrow syntax") that yield the structures at the C-I interface. A possible account of the origin of language is that some rewiring of the brain, presumably the result of some mutation, yielded the simplest computational operations for BP, including the link to some preexisting conceptual structures CS,[8] providing a LOT. Since this emergent system would have been subject to no selectional pressures, it would have assumed an optimal form in accord with natural law (specifically, MC), rather the way a snowflake forms. A subsequent task is to relate this system to some sensory modality for externalization, a nontrivial cognitive problem since input and output have no intrinsic relations (apart from possible effects of later adaptation). It is a task that can be solved in many ways, leading to the variety of languages, each easily subject to the effects of historical accident. There are doubtless constraints on how externalization takes place: the principles of morphology, phonology, prosody, and so on. But it may be that evolution played a slight role in establishing these constraints.
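The contrast between linear and structural proximity in the "eagles" example discussed earlier can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the nested-tuple tree, the category labels, and the helper function are hypothetical and are not an analysis proposed in the text. Depth of embedding is used as a crude stand-in for structural distance. The sketch shows only that the verb reached first in linear order and the verb that is least deeply embedded can come apart.

```python
# Illustrative sketch only: a hypothetical tree for "eagles that fly swim".
# Each node is (label, child, ...); a leaf is (label, word).
TREE = (
    "S",
    ("NP",
        ("N", "eagles"),
        ("RC", ("C", "that"), ("VP", ("V", "fly")))),
    ("VP", ("V", "swim")),
)

def verbs_with_depth(node, depth=0):
    """Yield (verb, depth) pairs in left-to-right (linear) order."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        if label == "V":
            yield children[0], depth
        return
    for child in children:
        yield from verbs_with_depth(child, depth + 1)

verbs = list(verbs_with_depth(TREE))

# Linear procedure: take the first verb encountered in the word string.
linear_choice = verbs[0][0]
# Structural procedure: take the least deeply embedded verb.
structural_choice = min(verbs, key=lambda pair: pair[1])[0]

print(linear_choice)      # fly
print(structural_choice)  # swim
```

The linear procedure needs only the word string, while the structural one needs the tree. That is the sense in which the rejected procedure is the computationally simpler of the two.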

The general picture accords pretty well with what we know about language. The crucial question, of course, is to what extent SMT can in fact account for the relevant phenomena of language. There has, I think, been substantial progress in moving toward this goal, with some significant results, such as those just mentioned.[9] Needless to say, there remain vast areas to explore to determine how far SMT can reach, but the prospects seem exciting and certainly challenging.

Notes

1. See Chomsky 1951 and subsequent publications by many authors from the 1950s.

2. Commonly misnamed as "evolution of language"; languages change, but do not evolve.

3. Piattelli-Palmarini (1974) introduced the term biolinguistics to refer to the approach that was being pursued in work in generative grammar.

4. On the dubious character of much current work, see Hauser et al. 2014.

5. See the 1995 introduction. The term I-language (internal language viewed intensionally) was suggested in Chomsky 1986 in an effort to resolve the confusions caused by the ambiguity of the term grammar, which had been used to refer both to the object under investigation (I-language) and to the theory of that object. I also introduced another term, E-language (external language), referring to any other conception of language, and observed that there may be no coherent notion of E-language. Since then the term has been used in a variety of ways, sometimes to refer to a (necessarily) finite corpus of data, sometimes to the set of expressions weakly generated by a generative grammar, analogous to the well-formed formulas of invented logical systems: a notion that may not even be definable for natural language, as discussed in Chomsky 1955, but at best is derivative from the more basic notion of strong generation of structures. My own feeling is that the term E-language should simply be ignored.

6. For discussion of this topic, see the papers collected in Graff and Van Urk 2012. For recent updates, see Chomsky 2013b, forthcoming. And see sources cited in these papers.

7. There have been heroic efforts to demonstrate the contrary (in the case of the auxiliary, not adverb construal). Every attempt that is clear enough to investigate fails, irremediably (see Berwick et al. 2011); but more interestingly, it would be of little interest even if some such effort were to succeed. The attempts fail to address the only significant question: Why? Why is it the case that this property is ubiquitous and exceptionless? I know of no answer other than the one repeated here.

8. For further discussion, see Chomsky 2010. It is important to recognize that CS for humans appears to be radically different from the elements of symbolic/communication systems in other animals (see Petitto 2005, Chomsky 2013a), a fact that poses very serious problems for studying the origin of human cognitive capacities.

9. For some recent ideas, see Chomsky 2013b, forthcoming.

Introduction

The chapters that follow are based in large part on regular lecture-seminars at MIT from 1986 through 1994. These have been continuing now for over 30 years, with broad participation by students, faculty, and others, from various institutions and disciplines. In these introductory remarks I will outline some of the background for the material that follows.

This work is motivated by two related questions: (1) what are the general conditions that the human language faculty should be expected to satisfy? and (2) to what extent is the language faculty determined by these conditions, without special structure that lies beyond them? The first question in turn has two aspects: what conditions are imposed on the language faculty by virtue of (A) its place within the array of cognitive systems of the mind/brain, and (B) general considerations of conceptual naturalness that have some independent plausibility, namely, simplicity, economy, symmetry, nonredundancy, and the like?

Question (B) is not precise, but not without content; attention to these matters can provide guidelines here, as in rational inquiry generally. Insofar as such considerations can be clarified and rendered plausible, we can ask whether a particular system satisfies them in one or another form. Question (A), in contrast, has an exact answer, though only parts of it can be surmised in the light of current understanding about language and related cognitive systems.

To the extent that the answer to question (2) is positive, language is something like a perfect system, meeting external constraints as well as can be done, in one of the reasonable ways. The Minimalist Program for linguistic theory seeks to explore these possibilities.

Any progress toward this goal will deepen a problem for the biological sciences that is already far from trivial: how can a system such as human language arise in the mind/brain, or for that matter, in the organic world, in which one seems not to find anything like the basic properties of human language? That problem has sometimes been posed as a crisis for the cognitive sciences. The concerns are appropriate, but their locus is misplaced; they are primarily a problem for biology and the brain sciences, which, as currently understood, do not provide any basis for what appear to be fairly well established conclusions about language.[1] Much of the broader interest of the detailed and technical study of language lies right here, in my opinion.

The Minimalist Program shares several underlying factual assumptions with its predecessors back to the early 1950s, though these have taken somewhat different forms as inquiry has proceeded. One is that there is a component of the human mind/brain dedicated to language (the language faculty) interacting with other systems. Though not obviously correct, this assumption seems reasonably well-established, and I will continue to take it for granted here, along with the further empirical thesis that the language faculty has at least two components: a cognitive system that stores information, and performance systems that access that information and use it in various ways. It is the cognitive system that primarily concerns us here.

Performance systems are presumably at least in part language-specific, hence components of the language faculty. But they are generally assumed not to be specific to particular languages: they do not vary in the manner of the cognitive system, as linguistic environments vary. This is the simplest assumption, and is not known to be false, though it may well be. Knowing of no better ideas, I will keep to it, assuming language variation to be restricted to the cognitive system.

I also borrow from earlier work the assumption that the cognitive system interacts with the performance systems by means of levels of linguistic representation, in the technical sense of this notion.[2] A more specific assumption is that the cognitive system interacts with just two such external systems: the articulatory-perceptual system A-P and the conceptual-intentional system C-I. Accordingly, there are two interface levels, Phonetic Form (PF) at the A-P interface and Logical Form (LF) at the C-I interface. This double interface property is one way to express the traditional description of language as sound with a meaning, traceable at least back to Aristotle.

Though commonly adopted, at least tacitly, these assumptions about the internal architecture of the language faculty and its place among other systems of the mind/brain are not at all obvious. Even within the general framework, the idea that articulation and perception involve the same interface representation is controversial, and arguably incorrect in some fundamental way.[3] Problems relating to the C-I interface are still more obscure and poorly understood. I will keep to these fairly conventional assumptions, only noting here that if
they turn out to be correct, even in part, that would be a surprising and hence interesting discovery.

The leading questions that guide the Minimalist Program came into focus as the principles-and-parameters (P&P) model took shape about fifteen years ago. A look at recent history may be helpful in placing these questions in context. Needless to say, these remarks are schematic and selective, and benefit from hindsight.

Early generative grammar faced two immediate problems: to find a way to account for the phenomena of particular languages (descriptive adequacy), and to explain how knowledge of these facts arises in the mind of the speaker-hearer (explanatory adequacy). Though it was scarcely recognized at the time, this research program revived the concerns of a rich tradition, of which perhaps the last major representative was Otto Jespersen.[4] Jespersen recognized that the structures of language come into existence in the mind of a speaker by abstraction from experience with utterances, yielding a notion of their structure that is definite enough to guide him in framing sentences of his own, crucially free expressions that are typically new to speaker and hearer.

We can take these properties of language to set the primary goals of linguistic theory: to spell out clearly this notion of structure and the procedure by which it yields free expressions, and to explain how it arises in the mind of the speaker (the problems of descriptive and explanatory adequacy, respectively). To attain descriptive adequacy for a particular language L, the theory of L (its grammar) must characterize the state attained by the language faculty, or at least some of its aspects. To attain explanatory adequacy, a theory of language must characterize the initial state of the language faculty and show how it maps experience to the state attained. Jespersen held further that it is only with regard to syntax that we expect that there must be something in common to all human speech; there can be a universal (or general) grammar, hence a perhaps far-reaching account of the initial state of the language faculty in this domain, though no one ever dreamed of a universal morphology. That idea too has a certain resonance in recent work.

In the modern period these traditional concerns were displaced, in part by behaviorist currents, in part by various structuralist approaches, which radically narrowed the domain of inquiry while greatly expanding the database for some future inquiry that might return to the traditional (and surely valid) concerns. To address them required a better understanding of the fact that language involves infinite use of finite means, in one classic formulation. Advances in the formal sciences provided that understanding, making it feasible to deal with the problems constructively. Generative grammar can be
regarded as a kind of confluence of long-forgotten concerns of the study of language and mind, and new understanding provided by the formal sciences.

The first efforts to approach these problems quickly revealed that traditional grammatical and lexical studies do not begin to describe, let alone explain, the most elementary facts about even the best-studied languages. Rather, they provide hints that can be used by the reader who already has tacit knowledge of language, and of particular languages; the central topic of inquiry was, in substantial measure, simply ignored. Since the requisite tacit knowledge is so easily accessed without reflection, traditional grammars and dictionaries appear to have very broad coverage of linguistic data. That is an illusion, however, as we quickly discover when we try to spell out what is taken for granted: the nature of the language faculty, and its state in particular cases.

This is hardly a situation unique to the study of language. Typically, when questions are more sharply formulated, it is learned that even elementary phenomena had escaped notice, and that intuitive accounts that seemed simple and persuasive are entirely inadequate. If we are satisfied that an apple falls to the ground because that is its natural place, there will be no serious science of mechanics. The same is true if one is satisfied with traditional rules for forming questions, or with the lexical entries in the most elaborate dictionaries, none of which come close to describing simple properties of these linguistic objects.

Recognition of the unsuspected richness and complexity of the phenomena of language created a tension between the goals of descriptive and explanatory adequacy. It was clear that to achieve explanatory adequacy, a theory of the initial state must allow only limited variation: particular languages must be largely known in advance of experience. The options permitted in Universal Grammar (UG) must be highly restricted. Experience must suffice to fix them one way or another, yielding a state of the language faculty that determines the varied and complex array of expressions, their sound and meaning; and even the most superficial look reveals the chasm that separates the knowledge of the language user from the data of experience. But the goal of explanatory adequacy receded still further into the distance as generative systems were enriched in pursuit of descriptive adequacy, in radically different ways for different languages. The problem was exacerbated by the huge range of phenomena discovered when attempts were made to formulate actual rule systems for various languages.

This tension defined the research program of early generative grammar (at least, the tendency within it that concerns me here). From the early 1960s, its central objective was to abstract general principles from the complex rule systems devised for particular languages, leaving rules that are simple,
constrained in their operation by these UG principles. Steps in this direction reduce the variety of language-specific properties, thus contributing to explanatory adequacy. They also tend to yield simpler and more natural theories, laying the groundwork for an eventual minimalist approach. There is no necessity that this be the case: it could turn out that an uglier, richer, and more complex version of UG reduces permissible variety, thus contributing to the primary empirical goal of explanatory adequacy. In practice, however, the two enterprises have proven to be mutually reinforcing and have proceeded side by side. One illustration concerns redundant principles, with overlapping empirical coverage. Repeatedly, it has been found that these are wrongly formulated and must be replaced by nonredundant ones. The discovery has been so regular that the need to eliminate redundancy has become a working principle in inquiry. Again, this is a surprising property of a biological system.

These efforts culminated in the P&P model (see Chomsky 1981a, for one formulation). This constituted a radical break from the rich tradition of thousands of years of linguistic inquiry, far more so than early generative grammar, which could be seen as a revival of traditional concerns and approaches to them (perhaps the reason why it was often more congenial to traditional grammarians than to modern structural linguists). In contrast, the P&P approach maintains that the basic ideas of the tradition, incorporated without great change in early generative grammar, are misguided in principle: in particular, the idea that a language consists of rules for forming grammatical constructions (relative clauses, passives, etc.). The P&P approach held that languages have no rules in anything like the familiar sense, and no theoretically significant grammatical constructions except as taxonomic artifacts. There are universal principles and a finite array of options as to how they apply (parameters), but no language-particular rules and no grammatical constructions of the traditional sort within or across languages.

For each particular language, the cognitive system, we assume, consists of a computational system CS and a lexicon. The lexicon specifies the elements that CS selects and integrates to form linguistic expressions: (PF, LF) pairings, we assume. The lexicon should provide just the information that is required for CS, without redundancy and in some optimal form, excluding whatever is predictable by principles of UG or properties of the language in question. Virtually all items of the lexicon belong to the substantive categories, which we will take to be noun, verb, adjective, and particle, putting aside many serious questions about their nature and interrelations. The other categories we will call functional (tense, complementizer, etc.), a term that need not be made more precise at the outset, and that we will refine as we proceed.
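The picture of universal principles plus a finite array of options can be illustrated with a toy model. The sketch below is hypothetical throughout: the parameter names, the single shared ordering procedure, and the example words are assumptions chosen for illustration, not parameters proposed in the text. It shows only that n binary parameters define a finite space of 2^n settings, and that one shared procedure can yield different outputs depending on how a parameter is set.

```python
from itertools import product

# Hypothetical binary parameters; the names are illustrative assumptions.
PARAMETERS = ("head_initial", "null_subject")

def all_settings(parameters=PARAMETERS):
    """Enumerate every assignment of True/False to the given parameters."""
    return [dict(zip(parameters, values))
            for values in product((True, False), repeat=len(parameters))]

def linearize(head, complement, settings):
    """One shared procedure; only the parameter value changes the output."""
    if settings["head_initial"]:
        return f"{head} {complement}"
    return f"{complement} {head}"

settings_space = all_settings()
print(len(settings_space))  # 4, that is, 2 ** len(PARAMETERS)
print(linearize("read", "books", {"head_initial": True, "null_subject": False}))   # read books
print(linearize("read", "books", {"head_initial": False, "null_subject": False}))  # books read
```

No rule in the sketch is specific to one "language"; a language in this toy sense is just one point in the finite space of settings.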

Within the P&P approach the problems of typology and language variationarise in somewhat different form than before. Language differences and typology should be reducible to choice of values of parameters. A major researchproblem is to determine just what these options are, and in what componentsof language they are to be found. One proposal is that parameters are restrictedto formal features with no interpretation at the interface.5 A still stronger oneis that they are restricted to formal features of functional categories (see Borer1984, Fukui 1986, 1988). Such theses could be regarded as a partial expressionof Jespersens intuition about the syntax-morphology divide. I will assume thatsomething of the sort is correct, but without trying to be very clear about thematter, since too little is understood to venture any strong hypotheses, as faras I can see.In this context, language acquisition is interpreted as the process of fixingthe parameters of the initial state in one of the permissible ways. A specificchoice of parameter settings determines a language in the technical sense thatconcerns us here: an I-language in the sense of Chomsky 1986b, where I isunderstood to suggest internal, individual, and intensional.This way of formulating the issues, within the P&P model, brings out clearlya crucial inadequacy in the characterization of language as a state of the language faculty. The latter can hardly be expected to be an instantiation of theinitial state with parameter values fixed. Rather, a state of the language facultyis some accidental product of varied experience, of no particular interest initself, no more so than other collections of phenomena in the natural world(which is why scientists do experiments instead of recording what happens innatural circumstances). 
My personal feeling is that much more substantial idealization is required if we hope to understand the properties of the language faculty,⁶ but misunderstandings and confusion engendered even by limited idealization are so pervasive that it may not be useful to pursue the matter today. "Idealization," it should be noted, is a misleading term for the only reasonable way to approach a grasp of reality.

The P&P model is in part a bold speculation rather than a specific hypothesis. Nevertheless, its basic assumptions seem reasonable in the light of what is currently at all well understood, and they do suggest a natural way to resolve the tension between descriptive and explanatory adequacy. In fact, this departure from the tradition offered the first hope of addressing the crucial problem of explanatory adequacy, which had been put aside as too difficult. Earlier work in generative grammar sought only an evaluation measure that would select among alternative theories of a language (grammars) that fit the format prescribed by UG and are consistent with the relevant data. Beyond that, nothing seemed conceivable apart from some notion of "feasibility," left imprecise (Chomsky 1965). But if something like the P&P concept of I-language proves to be accurate (capturing the essential nature of the concept of language that is presupposed in the study of performance, acquisition, social interaction, and so on), then the question of explanatory adequacy can be seriously raised. It becomes the question of determining how values are set by experience for finitely many universal parameters, not a trivial problem by any means, but at least one that can be constructively pursued.

If these ideas prove to be on the right track, there is a single computational system CHL for human language and only limited lexical variety. Variation of language is essentially morphological in character, including the critical question of which parts of a computation are overtly realized, a topic brought to the fore by Jean-Roger Vergnaud's theory of abstract Case and James Huang's work on typologically varied interrogative and related constructions.

This account of the P&P approach overstates the case. Further variation among languages would be expected insofar as data are readily available to determine particular choices. There are several such domains. One is peripheral parts of the phonology. Another is "Saussurean arbitrariness," that is, the sound-meaning pairing for the substantive part of the lexicon. I put these matters aside, along with many others that appear to be of limited relevance to the computational properties of language that are the focus here, that is, that do not seem to enter into CHL: among them, variability of semantic fields, selection from the lexical repertoire made available in UG, and nontrivial questions about the relation of lexical items to other cognitive systems.

Like the earliest proposals in generative grammar, formulation of the P&P model led to discovery and at least partial understanding of a vast range of new empirical materials, by now from a wide variety of typologically different languages.
The questions that could be clearly posed and the empirical facts with which they deal are novel in depth and variety, a promising and encouraging development in itself.

With the tension between descriptive and explanatory adequacy reduced and the latter problem at least on the agenda, the tasks at hand become far harder and more interesting. The primary one is to show that the apparent richness and diversity of linguistic phenomena is illusory and epiphenomenal, the result of interaction of fixed principles under slightly varying conditions. The shift of perspective provided by the P&P approach also gives a different cast to the question of how simplicity considerations enter into the theory of grammar. As discussed in the earliest work in generative grammar, these considerations have two distinct forms: an imprecise but not vacuous notion of simplicity that enters into rational inquiry generally must be clearly distinguished from a theory-internal measure of simplicity that selects among I-languages (see Chomsky 1975a, chapter 4). The former notion of simplicity has nothing special to do with the study of language, but the theory-internal notion is a component of UG, part of the procedure for determining the relation between experience and I-language; its status is something like that of a physical constant. In early work, the internal notion took the form of an evaluation procedure to select among proposed grammars (in present terms, I-languages) consistent with the permitted format for rule systems. The P&P approach suggests a way to move beyond that limited though nontrivial goal and to address the problem of explanatory adequacy. With no evaluation procedure, there is no internal notion of simplicity in the earlier sense.

Nevertheless, rather similar ideas have resurfaced, this time in the form of economy considerations that select among derivations, barring those that are not optimal in a theory-internal sense. The external notion of simplicity remains unchanged: operative as always, even if only imprecisely.

At this point still further questions arise, namely, those of the Minimalist Program. How "perfect" is language? One expects "imperfections" in morphological-formal features of the lexicon and aspects of language induced by conditions at the A-P interface, at least.
The essential question is whether, or to what extent, these components of the language faculty are the repository of departures from virtual conceptual necessity, so that the computational system CHL is otherwise not only unique but in some interesting sense optimal. Looking at the same problem from a different perspective, we seek to determine just how far the evidence really carries us toward attributing specific structure to the language faculty, requiring that every departure from "perfection" be closely analyzed and well motivated.

Progress toward this further goal places a huge descriptive burden on the answers to the questions (A) and (B): the effect of the interface conditions, and the specific formulation of general considerations of internal coherence, conceptual naturalness, and the like ("simplicity," in the external sense). The empirical burden, already substantial in any P&P theory, now becomes far more severe.

The problems that arise are therefore extremely interesting. It is, I think, of considerable importance that we can at least formulate such questions today, and even approach them in some areas with a degree of success. If recent thinking along these lines is anywhere near accurate, a rich and exciting future lies ahead for the study of language and related disciplines.

The chapters that follow are almost but not quite in chronological order. The first, written jointly with Howard Lasnik for a general Handbook on syntax (Chomsky and Lasnik 1993), is a general introduction to the P&P approach, as we understood it in 1991. It is included here for general background.

Chapter 2 (Chomsky 1991c), written in 1988, is largely based on lectures in Tokyo and Kyoto in 1987 and MIT lecture-seminars from fall 1986. Chapter 3 (Chomsky 1993), written in 1992, is based on the fall 1991 lecture-seminars. These chapters explore the possibility of a minimalist approach, sketch some of its natural contours, and pursue it in some central areas. Chomsky 1994b, based on the fall 1993 lecture-seminars, revises this picture and extends it to different aspects of language. It provides much of the basis for chapter 4, which, however, is a more far-reaching departure, taking much more seriously the conceptual framework of a minimalist approach and attempting to keep to its leading ideas in a more principled way; and in the course of so doing, revises substantially the approach developed in Chomsky 1994b and the first three chapters here.

The field is changing rapidly under the impact of new empirical materials and theoretical ideas. What looks reasonable today is likely to take a different form tomorrow. That process is reflected in the material that follows. Chapters 1 and 2 are written from much the same perspective. The approach is changed in chapter 3, considerably more so in chapter 4. Though the general framework remains, the modifications at each point are substantial. Concepts and principles regarded as fundamental in one chapter are challenged and eliminated in those that follow. These include the basic ideas of the Extended Standard Theory that were adopted in the P&P approaches: D-Structure; S-Structure; government; the Projection Principle and the θ-Criterion; other conditions held to apply at D- and S-Structure; the Empty Category Principle; X-bar theory generally; the operation Move α; the split-I hypothesis; and others. All are eliminated or substantially revised in successive chapters, particularly the last.

The end result is a picture of language that differs considerably from even its immediate precursors. Whether these steps are on the right track or not, of course, only time will tell.

Notes

1. For some discussion of this issue, see Chomsky 1994a,c, referring to Edelman 1992. Edelman takes the crisis to be serious if not lethal for cognitive science generally, whether computational, connectionist, or whatever.

2. Adapted, essentially, from Chomsky 1975a.

3. The term "articulatory" is too narrow in that it suggests that the language faculty is modality-specific, with a special relation to vocal organs. Work of the past years in sign language undermines this traditional assumption. I will continue to use the term, but without any implications about specificity of output system, while keeping to the case of spoken language.

4. For some discussion, see Chomsky 1977, chapter 1.

5. "Interpret" here is of course to be understood in a theory-internal sense. In a looser informal sense, interpretations are assigned by the language faculty (in a particular state) to all sorts of objects, including fragments, nonsense expressions, expressions of other languages, and possibly nonlinguistic noises as well.

6. Thus, what we call "English," "French," "Spanish," and so on, even under idealizations to idiolects in homogeneous speech communities, reflect the Norman Conquest, proximity to Germanic areas, a Basque substratum, and other factors that cannot seriously be regarded as properties of the language faculty. Pursuing the obvious reasoning, it is hard to imagine that the properties of the language faculty (a real object of the natural world) are instantiated in any observed system. Similar assumptions are taken for granted in the study of organisms generally.

The Theory of Principles and Parameters

with Howard Lasnik

1.1 Introduction

Principles-and-parameters (P&P) theory is not a precisely articulated theoretical system, but rather a particular approach to classical problems of the study of language, guided by certain leading ideas that had been taking shape since the origins of modern generative grammar some 40 years ago. These ideas crystallized into a distinctive approach to the topic by about 1980. In the years since, many specific variants have been developed and explored. The empirical base of these inquiries has also greatly expanded as they have extended to languages of widely varying types and have engaged a much broader range of evidence concerning language and its use, also penetrating to far greater depth. In this survey we will not attempt to delineate the variety of proposals that have been investigated or to assess their empirical successes and inadequacies. Rather, we will pursue a particular path through the array of ideas and principles that have been developed, sometimes noting other directions that have been pursued, but without any attempt to be comprehensive; similarly, bibliographic references are far from comprehensive, usually indicating only a few studies of particular questions. The choice of a particular path should be regarded only as an expository device, an effort to indicate the kinds of questions that are being addressed, some of the thinking that guides much research, and its empirical motivation. We do not mean to imply that these particular choices have been well established in contrast to others, only some of which we will be able even to mention.

This chapter, coauthored with Howard Lasnik, was originally published in Syntax: An International Handbook of Contemporary Research, edited by Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann (Berlin and New York: Walter de Gruyter, 1993). It appears here, with minor revisions, by permission of the publisher.

The study of generative grammar has been guided by several fundamental problems, each with a traditional flavor. The basic concern is to determine and characterize the linguistic capacities of particular individuals. We are concerned, then, with states of the language faculty, which we understand to be some array of cognitive traits and capacities, a particular component of the human mind/brain. The language faculty has an initial state, genetically determined; in the normal course of development it passes through a series of states in early childhood, reaching a relatively stable steady state that undergoes little subsequent change, apart from the lexicon. To a good first approximation, the initial state appears to be uniform for the species. Adapting traditional terms to a special usage, we call the theory of the state attained its grammar and the theory of the initial state Universal Grammar (UG).

There is also reason to believe that the initial state is in crucial respects a special characteristic of humans, with properties that appear to be unusual in the biological world. If true, that is a matter of broader interest, but one of no direct relevance to determining the properties and nature of this faculty of the mind/brain.

Two fundamental problems, then, are to determine, for each individual (say, Jones) the properties of the steady state that Jones's language faculty attains, and the properties of the initial state that is a common human endowment. We distinguish between Jones's competence (knowledge and understanding) and his performance (what he does with that knowledge and understanding). The steady state constitutes Jones's mature linguistic competence.

A salient property of the steady state is that it permits "infinite use of finite means," to borrow Wilhelm von Humboldt's aphorism. A particular choice of these finite means is a particular language, taking a language to be a way to speak and understand, in a traditional formulation.
Jones's competence is constituted by the particular system of finite means he has acquired.

The notion of "infinite use" requires further analysis. In the light of insights of the formal sciences in the 20th century, we distinguish two senses of this notion, the first relating to competence, the second to performance. In the first sense, a language specifies an infinite range of symbolic objects, which we call structural descriptions (SDs). We may think of the language, then, as a finitely specified generative procedure (function) that enumerates an infinite set of SDs. Each SD, in turn, specifies the full array of phonetic, semantic, and syntactic properties of a particular linguistic expression. This sense of "infinite use" relates to Jones's linguistic competence: the generative procedure with its infinite scope.
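The idea of a finitely specified procedure that enumerates an infinite set of SDs can be sketched with a generator. Everything concrete here is a simplifying assumption for illustration: SDs are reduced to labeled bracketings (nested tuples), and the single embedding pattern ("John thinks ...") stands in for the full generative procedure, which in the text also specifies phonetic and semantic properties.

```python
from itertools import count, islice


def generate_sds():
    """Enumerate toy structural descriptions without bound.

    The procedure is finite (a few lines), yet each pass through the loop
    yields a new, more deeply embedded structure.
    """
    for depth in count(0):
        sd = ("S", "Mary", "left")
        for _ in range(depth):
            # Embed the previous clause under 'John thinks ...'
            sd = ("S", "John", ("VP", "thinks", sd))
        yield sd


def terminal_string(sd):
    """Read off the word sequence of a toy SD, dropping category labels."""
    if isinstance(sd, str):
        return sd
    _label, *children = sd
    return " ".join(terminal_string(child) for child in children)


first_three = list(islice(generate_sds(), 3))
for sd in first_three:
    print(terminal_string(sd))
```

The generator never terminates on its own; `islice` takes a finite sample, which is roughly the relation between the procedure and any observed set of expressions.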

The second sense of "infinite use" has to do with Jones's performance as he makes use of his competence to express his thoughts, to refer, to produce signals, to interpret what he hears, and so on. The language faculty is embedded in performance systems, which access the generative procedure. It is in this broader context that questions of realization and use of SDs arise, questions of articulation, intentionality, interpretation, and the like: How does Jones say X? What is Jones talking about? What does Jones take Smith to be saying or intending to convey? And so on. We might think of the SD as providing "instructions" to the performance systems that enable Jones to carry out these actions.

When we say that Jones has the language L, we now mean that Jones's language faculty is in the state L, which we identify with a generative procedure embedded in performance systems. To distinguish this concept of language from others, let us refer to it as I-language, where I is to suggest "internal," "individual," and "intensional." The concept of language is internal, in that it deals with an inner state of Jones's mind/brain, independent of other elements in the world. It is individual in that it deals with Jones, and with language communities only derivatively, as groups of people with similar I-languages. It is intensional in the technical sense that the I-language is a function specified in intension, not extension: its extension is the set of SDs (what we might call the structure of the I-language). Two distinct I-languages might, in principle, have the same structure, though as a matter of empirical fact, human language may happen not to permit this option. That is, it might turn out that the range of I-languages permitted by UG is so narrow that the theoretical option is simply not realized, that there are no distinct I-languages generating the same set of SDs. This seems, in fact, not unlikely, but it is not a logical necessity.
When we use the term language below, we mean I-language.

In the earliest work in generative grammar, it was assumed that Jones's language generates an SD for each of the permissible phonetic forms for human language, a set to be specified by UG. Thus, Jones's language assigns a particular status to such expressions as (1), where t (trace) indicates the position in which the question word is construed.

(1) a. John is sleeping
    b. John seems sleeping
    c. what do you think that Mary fixed t (answer: the car)
    d. what do you wonder whether Mary fixed t (answer: the car)
    e. how do you wonder whether Mary fixed the car t (answer: with a wrench)
    f. expressions of Swahili, Hungarian, etc.

In fact, some of the most instructive recent work has been concerned with the differences illustrated by (1d–e), both in some sense deviant, but assigned a different status by Jones's language (sections 1.3.3, 1.4.1); and one might well learn about the languages of Jones and Wang by studying their reactions to utterances of Swahili.

Another notion that appears commonly in the literature is "formal language" in the technical sense: set of well-formed formulas; in a familiar variety of formal arithmetic, "(2 + 2) = 5" but not "2 + =2)5 (". Call such a set an E-language, where E is to suggest "external" and "extensional." In the theory of formal languages, the E-language is defined by stipulation, hence is unproblematic. But it is a question of empirical fact whether natural language has any counterpart to this notion, that is, whether Jones's I-language generates not only a set of SDs but also a distinguished E-language: some subset of the phonetic forms of UG, including some but not all of those of (1). Apart from expository passages, the concept of E-language scarcely appears in the tradition of generative grammar that we are considering here. As distinct from the notions discussed earlier, it has no known status in the study of language. One might define E-language in one or another way, but it does not seem to matter how this is done; there is no known gap in linguistic theory, no explanatory function, that would be filled were such a concept presented. Hence, it will play no role in our discussion.

In the study of formal languages, we may distinguish weak generation of E-language from strong generation of the structure of the language (the set of SDs). The weak generative capacity of a theory of I-languages is the set of E-languages weakly generated, and its strong generative capacity is the set of structures strongly generated.
In the study of natural language, the concepts of structure and strong generation are central; the concepts of E-language and weak generation at best marginal, and perhaps not empirically meaningful at all. Note that if E-languages do exist, they are at a considerably further remove from mechanisms and behavior than I-language. Thus, the child is presented with specimens of behavior in particular circumstances and acquires an I-language in some manner to be determined. The I-language is a state of the mind/brain. It has a certain structure (i.e., strongly generates a set of SDs). It may or may not also weakly generate an E-language, a highly abstract object remote from mechanisms and behavior.

In the terms just outlined, we can consider some of the classical problems of the study of language.

(2) a. What does Jones know when he has a particular language?
    b. How did Jones acquire this knowledge?
    c. How does Jones put this knowledge to use?
    d. How did these properties of the mind/brain evolve in the species?
    e. How are these properties realized in mechanisms of the brain?

Under (2a), we want to account for a wide variety of facts, for example, that Jones knows that

(3) a. Pin rhymes with bin.
    b. Each expression of (1) has its specific status.
    c. If Mary is too clever to expect anyone to catch, then we don't expect anyone to catch Mary (but nothing is said about whether Mary expects anyone to catch us).
    d. If Mary is too angry to run the meeting, then either Mary is so angry that she can't run the meeting, or she is so angry that we can't run the meeting (compare: the crowd is too angry to run the meeting); in contrast, "which meeting is Mary too angry to run" has only the former (nondeviant) interpretation.
    e. If Mary painted the house brown, then its exterior (not necessarily its interior) is brown.
    f. If Mary persuaded Bill to go to college, then Bill came to intend to go to college (while Mary may or may not have).

The proposed answer to problem (2a) would be that Jones has language L generating SDs that express such facts as (3). Note that Jones has this knowledge whether or not he is aware of these facts about himself; it may take some effort to elicit such awareness, and it might even be beyond Jones's capacities. This is a question that falls within the broader context of performance systems.

The answer to problem (2b) lies in substantial part in UG. The correct theory of the initial state will be rich enough to account for the attainment of a specific language on the basis of the evidence available to the child, but not so rich as to exclude attainable languages. We may proceed to ask as well how environmental factors and maturational processes interact with the initial state described by UG.

Problem (2c) calls for the development of performance theories, among them, theories of production and interpretation.
Put generally, the problems are beyond reach: it would be unreasonable to pose the problem of how Jones decides to say what he does, or how he interprets what he hears in particular circumstances. But highly idealized aspects of the problem are amenable to study. A standard empirical hypothesis is that one component of the mind/brain is a parser, which assigns a percept to a signal (abstracting from other circumstances relevant to interpretation). The parser presumably incorporates the language and much else, and the hypothesis is that interpretation involves such a system, embedded in others.
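The hypothesis that a parser "incorporates the language" and assigns a percept to a signal can be pictured with a deliberately small sketch. All specifics are assumptions for illustration: the language is reduced to a handful of hypothetical rewrite rules (`GRAMMAR`), the signal to a whitespace-separated string, and the percept to a labeled bracketing. A backtracking top-down search stands in for whatever the actual mechanism may be.

```python
# Hypothetical toy rules standing in for "the language"; not from the text.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Mary",), ("John",), ("the", "car")],
    "VP": [("sleeps",), ("fixed", "NP")],
}


def expand(symbol, tokens, i):
    """Yield (tree, next_index) for each way `symbol` matches tokens from i."""
    if symbol not in GRAMMAR:  # terminal: must match the next token exactly
        if i < len(tokens) and tokens[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in expand_sequence(rhs, tokens, i):
            yield (symbol, *children), j


def expand_sequence(symbols, tokens, i):
    """Yield (children, next_index) for a sequence of symbols, with backtracking."""
    if not symbols:
        yield (), i
        return
    for first, j in expand(symbols[0], tokens, i):
        for rest, k in expand_sequence(symbols[1:], tokens, j):
            yield (first, *rest), k


def assign_percept(signal):
    """Return the first complete analysis of the signal, or None if none is found."""
    tokens = signal.split()
    for tree, end in expand("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


print(assign_percept("Mary fixed the car"))
print(assign_percept("fixed Mary"))
```

The `None` case is only a property of this toy: it marks a signal to which the sketch assigns no percept at all, and it says nothing about the graded statuses that the text attributes to expressions such as those in (1).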

It has sometimes been argued that linguistic theory must meet the empirical condition that it account for the ease and rapidity of parsing. But parsing does not, in fact, have these properties. Parsing may be slow and difficult, or even impossible, and it may be "in error" in the sense that the percept assigned (if any) fails to match the SD associated with the signal; many familiar cases have been studied. In general, it is not the case that language is readily usable or "designed for use." The subparts that are used are usable, trivially; biological considerations lead us to expect no more than that. Similarly, returning to problem (2b), there is no a priori reason to expect that the languages permitted by UG be learnable, that is, attainable under normal circumstances. All that we can expect is that some of them may be; the others will not be found in human societies. If proposals within the P&P approach are close to the mark, then it will follow that languages are in fact learnable, but that is an empirical discovery, and a rather surprising one.

Problems (2d–e) appear to be beyond serious inquiry for the time being, along with many similar questions about cognition generally. Here again one must be wary of many pitfalls (Lewontin 1990). We will put these matters aside.

A grammar for Jones is true if (or to the extent that) the language it describes is the one Jones has. In that case the grammar will account for such facts as (3), by providing a language that generates appropriate SDs. A true grammar is said to meet the condition of descriptive adequacy. A theory of UG is true if (or to the extent that) it correctly describes the initial state of the language faculty. In that case it will provide a descriptively adequate grammar for each attainable language. A true theory of UG meets the condition of explanatory adequacy. The terminology is intended to suggest a certain plausible pattern of explanation.
Given an array of facts such as (3), we can give an account of them at one level by providing a grammar for Jones, and we can provide an explanation for them at a deeper level by answering problem (2b), that is, by showing how these facts derive from UG, given the "boundary conditions" set by experience. Note that this pattern of explanation, though standard, makes certain empirical assumptions about the actual process of acquisition that are by no means obviously true, for example, that the process is as if it were instantaneous. Such assumptions are indirectly supported to the extent that the explanations succeed.

Any serious approach to complex phenomena involves innumerable idealizations, and the one just sketched is no exception. We do not expect to find "pure instantiations" of the initial state of the language faculty (hence of UG). Rather, Jones will have some jumble of systems, based on the peculiar pattern of his experience. The explanatory model outlined deals specifically with language acquisition under the idealized conditions of a homogeneous speech community. We assume that the system described by UG is a real component of the mind/brain, put to use in the complex circumstances of ordinary life. The validity of this assumption is hardly in question. To reject it would be to assume either (1) that nonhomogeneous (conflicting) data are required for language acquisition, or (2) that the mind/brain does indeed have the system described by UG, but it is not used in language acquisition. Neither assumption is remotely plausible. Rejecting them, we accept the approach just outlined as a reasonable approach to the truth about humans, and a likely prerequisite to any serious inquiry into the complex and chaotic phenomenal world.

Furthermore, even if a homogeneous speech community existed, we would not expect its linguistic system to be a "pure case." Rather, all sorts of accidents of history would have contaminated the system, as in the properties of (roughly) Romance versus Germanic origin in the lexicon of English. The proper topic of inquiry, then, should be a theory of the initial state that abstracts from such accidents, no trivial matter. For working purposes (and nothing more than that), we may make a rough and tentative distinction between the core of a language and its periphery, where the core consists of what we tentatively assume to be pure instantiations of UG and the periphery consists of marked exceptions (irregular verbs, etc.). Note that the periphery will also exhibit properties of UG (e.g., ablaut phenomena), though less transparently. A reasonable approach would be to focus attention on the core system, putting aside phenomena that result from historical accident, dialect mixture, personal idiosyncrasies, and the like.
As in any other empirical inquiry, theory-internal considerations enter into the effort to pursue this course, and we expect further distinctions to be necessary (consider, for example, the phenomenon of do-insertion in English as in (1c–e), not on a par with irregular verbs, but not of the generality of fronting of question words).

The preceding remarks are largely conceptual, though not without empirical consequences. We now proceed along a particular path, in the manner indicated earlier, assuming further empirical risk at each point.

We assume that the language (the generative procedure, the I-language) has two components: a computational system and a lexicon. The first generates the form of SDs; the second characterizes the lexical items that appear in them. Many crucial questions arise as to how these systems interact. We will assume that one aspect of an SD is a system of representation, called D-Structure, at which lexical items are inserted. D-Structure expresses lexical properties in a form accessible to the computational system.

We assume further a distinction between inflectional and derivational processes of morphology, the latter internal to the lexicon, the former involving computational operations of a broader syntactic scope. These computational operations might involve word formation or checking. Consider for example the past tense form walked. The lexicon contains the root [walk], with its idiosyncratic properties of sound, meaning, and form specified; and the inflectional feature [tense], one value of which is [past]. One of the computational rules, call it R, associates the two by combining them (either adjoining [walk] to [tense], or conversely). We might interpret this descriptive comment in two ways. One possibility is that [walk] is drawn from the lexicon as such; then R combines it with [past]. A second possibility is that processes internal to the lexicon (redundancy rules) form the word walked with the properties [walk] and [past] already specified. The rule R then combines the amalgam with [past], checking and licensing its intrinsic feature [past]. In this case the lexicon is more structured. It contains the element [walk], as before, along with rules indicating that any verb may also intrinsically possess such properties as [past], [plural], and the like. Similar questions arise about complex words (causatives, noun incorporation structures, compound nouns, etc.). As these topics are pursued with more precision, within more closely articulated theories, important and often subtle empirical issues arise (Marantz 1984, Fabb 1984, Baker 1988, Di Sciullo and Williams 1988, Grimshaw 1990).

The SD provides information (to be interpreted by performance systems) about the properties of each linguistic expression, including its sound and its meaning. We assume that the design of language provides a variety of symbolic systems (levels of representation) fulfilling these tasks, including the level of Phonetic Form (PF) and the level of Logical Form (LF), specifying aspects of sound and meaning, respectively, insofar as they are linguistically determined. Another is the level of D-Structure, which relates the computational system and the lexicon.

The level PF must satisfy three basic conditions of adequacy.
It must beuniversal, in the sense that an expression of any actual or potential humanlanguage is representable within it. It must be an interface, in that its elementshave an interpretation in terms of the sensorimotor systems. And it must beuniform, in that this interpretation is uniform for all languages, so as to captureall and only the properties of the system of language as such.The same three conditions hold for LF. To capture what the language facultydetermines about the meaning of an expression, it must be universal, in thatany thought expressible in a human language is representable in it; an interface, in that these representations have an interpretation in terms of othersystems of the mind/brain involved in thought, referring, planning, and so on;and uniform, in just the sense that the phonetic system is. We will put asideimportant questions concerning the nature of the LF interface: does it involve


a conceptual system (Jackendoff 1983, 1990b), a use theory of meaning, a

causal theory of reference, etc.? The conditions are more obscure than in the case of the phonetic analogue, because the systems at the interface are much less well understood, but there is nonetheless a wealth of evidence firm enough to allow substantive inquiry.

According to this conception, then, each SD contains three interface levels: the external interface levels PF and LF, and the internal interface level of D-Structure. The elements at these levels are further analyzed into features: phonological, selectional, categorial, and so on. In general, each symbol of the representations is a feature set, in respects to be further specified.

A further assumption, developed in the Extended Standard Theory (EST), is that these levels are not related directly; rather, their relations are mediated by an intermediate level of S-Structure. Adopting this view, each SD is a sequence (π, λ, δ, σ), where π and λ are representations at the external interface levels PF and LF, δ is at the internal interface of computational system and lexicon, and σ is derivative. The first three levels meet empirical conditions imposed by the performance systems and the lexicon. The level of S-Structure must relate to these three levels in the manner specified in UG; we might think of it, informally, as the (presumably unique) solution to this set of conditions. In the subsequent discussion we restrict ourselves largely to the levels D-Structure, S-Structure, and LF, and the relations among them (syntax in a narrow sense). We are thus concerned primarily with the derivation from D-Structure to LF in (4).

(4)

              Lexicon
                 |
            D-Structure
                 |
            S-Structure
              /     \
            PF       LF

Subtle questions arise as to how the relations among these levels are to beconstrued: specifically, is there an inherent directionality, so that the relations should be construed as a mapping of one level to another, or is theresimply a nondirectional relation? To formulate this as a real empirical issue isnot a simple matter, and empirical evidence to distinguish such possibilities isnot easy to come by. But interesting (and conflicting) arguments have beenpresented. Discrimination among these alternatives becomes particularly difficult if we adopt (as we will) the standard EST assumption, from the early1970s, that representations may include empty categories (ECs): elements(feature sets) that are perfectly substantive from the point of view of the


computational system, but that do not happen to be assigned an interpretation

by the mapping from S-Structure to PF, though they may have indirect phonetic effects; thus, the contraction rules of English convert want to into thephonological word wanna when there is no trace intervening (who do youwanna see but not who do you wanna see John (Chomsky and Lasnik 1977)).We will tentatively proceed on the assumption that the relations are, in fact,directional: D-Structure is mapped to S-Structure, which is (independently)mapped to PF and LF.The earliest modern work in generative grammar borrowed standard ideasof traditional grammar, which recognized (I) that a sentence has a hierarchyof phrases (noun phrases, clauses, etc.) and that these (or their heads) enterinto certain grammatical relations; and (II) that sentences belong to variousgrammatical constructions with systematic relations among them, some morebasic than others (actives more basic than passives, declaratives more basicthan interrogatives, etc.). Correspondingly, the earliest versions of UG provided two kinds of rules: (I) phrase structure rules generating SDs that expressthe hierarchy of phrases; and (II) transformational rules that form grammaticalconstructions from abstract underlying forms, with more transformationsinvolved in formation of the less basic constructions (thus, only obligatorytransformations apply to form active declaratives (kernel sentences), but someoptional ones are involved in formation of passives, interrogatives, etc.). Thephrase structure rules provide a geometrical account of grammatical relations, understood relationally; that is, subject is not a syntactic category likenoun phrase or verb, but is understood as the relation subject-of holdingof the pair (John, left) in John left, and so on (Chomsky 1951, 1965, 1975a).These notions were defined in such a way that the phrase structure rules (I)generate D-Structures (deep structures), each a phrase marker that representshierarchy and relations. Transformations convert these objects into new phrasemarkers. 
In the later EST version, as noted, D-Structures are mapped to S-Structures by such derivations, and the latter are mapped independently to PF and LF.

The resort to phrase structure rules was also suggested by other considerations. The earliest work concentrated on what is now called generative phonology, and in this domain rewriting rules of the form X → Y, where X is an expression rewritten as Y in the course of derivation, seem an appropriate device. If these rules are restricted to the form XAY → XZY, A a single symbol and Z nonnull, then we have a system of rules that can form phrase structure representations in a natural way (context-free rules if X, Y are null). Further motivation derived from the theory of formal systems. Grammatical transformations as generative devices were suggested by work of Harris (1952), which


used formal relations among expressions as a device to normalize texts for

the analysis of discourse.As for UG, the earliest versions assumed that it provided a format for rulesystems and an evaluation metric that assigned a value to each generativeprocedure of the proper format. The crucial empirical condition on UG, then,is that the system provide only a few high-valued I-languages consistent withthe kinds of data available to the child, perhaps only one. If UG is feasible inthis sense, the fundamental problem (2b) can be addressed (Chomsky 1965).This approach recorded many achievements, but faced a fundamental andrecurrent problem: the tension between descriptive and explanatory adequacy.To achieve descriptive adequacy, it seemed necessary to enrich the format ofpermissible systems, but in doing so we lose the property of feasibility, so thatproblem (2b) is still unresolved. The conflict arises as soon as we move fromthe intuitive hints and examples of traditional grammar to explicit generativeprocedures. It was quickly recognized that the problem is inherent in the kindsof rule systems that were being considered. The most plausible approach to itis to try to factor out overarching principles that govern rule applicationgenerally, assigning them to UG; the actual rules of grammar can then be givenin the simplest form, with these principles ensuring that they will operate insuch a way as to yield the observed phenomena in their full complexity(Chomsky 1964, Ross 1967). The limit that might be reached is that rules areeliminated entirely, the apparent rules being deduced from general principlesof UG, in the sense that the interaction of the principles would yield the phenomena that the rules had been constructed to describe. 
To the extent that this result can be achieved, the rules postulated for particular languages will be shown to be epiphenomena.

Such ideas were pursued with a good deal of success from the early 1960s, leading to the P&P approach, which assumed that the limit can in fact be attained: the hypothesis is that all principles are assigned to UG and that language variation is restricted to certain options as to how the principles apply. If so, then rule systems are eliminable, at least for the core of the language. To illustrate, consider again (1c–e), repeated here.

(1) c. what do you think that Mary fixed t
    d. what do you wonder whether Mary fixed t
    e. how do you wonder whether Mary fixed the car t

The goal is to show that the question words move from the position of t by a general principle that allows movement quite freely, with the options, interpretations, and varying status determined by the interaction of this principle with others.


What is the status of the rules (I) (phrase structure) and (II) (transformational) under this conception? The transformational rules still exist, but only as principles of UG, freely applicable to arbitrary expressions. Such devices appear to be unavoidable in one or another form, whether taken to be operations forming derivations or relations established on representations. As for phrase structure rules, it appears that they may be completely superfluous. That would not be too surprising. With the advantage of hindsight we can see that, unlike transformational rules, they were a dubious device to begin with, recapitulating information that must be presented, ineliminably, in the lexicon. For example, the fact that persuade takes a noun phrase (NP) and clausal phrase (CP) as complements, as a lexical property, requires that there be phrase structure rules yielding V–NP–CP as an instantiation of the phrase XP headed by the verb persuade; and completely general properties require further that XP must be VP (verb phrase), not, say, NP. The apparent eliminability of phrase structure rules became clear by the late 1960s, with the separation of the lexicon from the computational system and the development of X-bar theory (section 1.3.2).

The issues can be sharpened by considering two properties that descriptive statements about language might have or lack. They may or may not be language-particular; they may or may not be construction-particular. The statements of traditional grammar are typically both language- and construction-particular, and the same is true of the rules of early generative grammar. Consider the rule analyzing VP as V–NP, or the rules fronting the question word in different ways in (1c–e). Spelled out in full detail, these phrase structure and transformational rules are specific to English and to these constructions. There are few exceptions to this pattern.

The P&P approach aims to reduce descriptive statements to two categories: language-invariant, and language-particular. 
The language-invariantstatements are principles (including the parameters, each on a par with a principle of UG); the language-particular ones are specifications of particularvalues of parameters. The notion of construction, in the traditional sense,effectively disappears; it is perhaps useful for descriptive taxonomy but hasno theoretical status. Thus, there are no such constructions as Verb Phrase,or interrogative and relative clause, or passive and raising constructions.Rather, there are just general principles that interact to form these descriptiveartifacts.The parametric options available appear to be quite restricted. An assumption that seems not unrealistic is that there is only one computational systemthat forms derivations from D-Structure to LF; at some point in the derivation (S-Structure), the process branches to form PF by an independent


phonological derivation (as in (4)). Options would then be restricted to

two cases: (1) properties of the lexicon, or (2) the point in the derivation(4) from D-Structure to LF at which structures are mapped to PF (S-Structure)(Stowell 1986).In the category (1), apart from Saussurean arbitrariness and some limitedvariety in the choice of substantive elements, we have options as to how nonsubstantive (functional) elements are realized (Borer 1984, Fukui 1986, Speas1986) and variations in global properties of heads (e.g., do verbs precede orfollow their complements?) (Travis 1984).In the category (2) we find, for example, languages with overt movementof question phrase (English, Italian, etc.) and languages without overt movement (Chinese, Japanese, etc.). In these in-situ languages, with the questionphrase in the position that would be occupied by a trace in languages withovert movement, there is good evidence that similar movement operationstake place, but only in the mapping from S-Structure to LF, with no indicationin the physical form itself; the branch point at which PF is formed fromS-Structure precedes these operations in the derivation (4) from D-Structureto LF (Huang 1982, Lasnik and Saito 1984, 1992). Similarly, we find languages with overt manifestation of grammatical case (Greek, German, Japanese, etc.) and others with virtually no such manifestation (English, Chinese,etc.). But again, there is good reason to believe that the case systems are basically similar cross-linguistically and that the differences lie primarily in theirphonetic realization (the mapping to PF).The general expectation, for all constructions, is that languages will be verysimilar at the D-Structure and LF levels, as in the examples just discussed. Itis unlikely that there are parameters that affect the form of LF representationor the computational process from S-Structure to LF; little evidence is available to the language learner bearing on these matters, and there would be noway for values to be determined with any reliability. 
Accordingly, any variations at the LF level must be reflexes of D-Structure parameter settings, or of variations in the mapping from D-Structure to S-Structure to the extent that its properties are determined from inspection of PF forms. D-Structure, in turn, reflects lexical properties; these too appear to be limited in variety insofar as they affect the computational system. At the PF level, properties of the language can be readily observed and variation is possible within the fixed repertoire of phonetic properties and the invariant principles of universal phonetics. S-Structures are not constrained by interface conditions and can vary within the range permitted by the variation of the interface levels, the branch point to the PF mapping, and any independent conditions that may hold of S-Structure.
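The expectation stated above, that languages differ at PF while converging at LF, can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than part of the theory's formal apparatus: the names Language, front_wh, and derive are hypothetical, movement is reduced to fronting a word in a list with no trace left behind, and English glosses stand in for the words of both language types. The single parameter records whether movement of the question phrase applies before or after the branch point to PF in (4).

```python
from dataclasses import dataclass

WH_WORDS = ("what", "who", "how")  # assumed toy inventory of question words


@dataclass(frozen=True)
class Language:
    name: str
    overt_wh_movement: bool  # hypothetical parameter: movement before the PF branch?


def front_wh(words):
    """Toy stand-in for movement: put the first question word at the front."""
    for i, word in enumerate(words):
        if word in WH_WORDS:
            return [word] + words[:i] + words[i + 1:]
    return list(words)


def derive(d_structure, language):
    """Map a D-Structure word list to S-Structure, then separately to PF and LF."""
    if language.overt_wh_movement:
        s_structure = front_wh(d_structure)   # movement precedes the branch point
    else:
        s_structure = list(d_structure)       # question phrase stays in situ
    pf = list(s_structure)                    # PF is read off S-Structure
    lf = front_wh(s_structure)                # movement has applied by LF either way
    return {"S-Structure": s_structure, "PF": pf, "LF": lf}


english_like = Language("overt movement", overt_wh_movement=True)
chinese_like = Language("in situ", overt_wh_movement=False)

d_structure = ["you", "saw", "what"]
for language in (english_like, chinese_like):
    print(language.name, derive(d_structure, language))
```

Under these assumptions the two settings yield different PF strings but the same LF, which is the pattern the text describes for languages with and without overt movement of the question phrase.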


The principles that have been investigated fall into two general categories: principles that are applied to construct derivations (transformational operations and conditions on the way they operate), and principles that apply to representations (licensing conditions). The transformational operations are movement (adjunction, substitution), deletion, and perhaps insertion; we may think of these as instances of the general operation Affect α, α arbitrary (Lasnik and Saito 1984). Conditions of locality and others constrain the application and functioning of these operations. Licensing conditions at the external interface levels PF and LF establish the relation of language to other faculties of the mind/brain. D-Structure conditions specify the manner in which lexical properties are expressed in grammatical structures. That there should be S-Structure conditions is less obvious, but it seems that they may exist (see section 1.3.3).

The principles have further structure. There are natural groupings into modules of language (binding theory, θ-theory, Case theory, etc.). Certain unifying concepts enter into many or all modules: conditions of locality, geometrical properties defined on phrase markers, and so on. There are also certain general ideas that appear to have wide applicability, among them, principles of economy stating that there can be no superfluous symbols in representations (the principle of Full Interpretation, FI) or superfluous steps in derivations (Chomsky 1986b, chapters 2–4 of this book). As these principles are given an explicit formulation, they become empirical hypotheses with specific import and range.

The principle FI is assumed as a matter of course in phonology; if a symbol in a representation has no sensorimotor interpretation, the representation does not qualify as a PF representation. This is what we called the interface condition. The same condition applied to LF also entails that every element of the representation have a (language-independent) interpretation. 
There can, forexample, be no true expletives, or vacuous quantifiers, at the LF level. Theprinciple of economy of derivation requires that computational operationsmust be driven by some condition on representations, as a last resort toovercome a failure to meet such a condition. Interacting with other principlesof UG, such economy principles have wide-ranging effects and may, whenmatters are properly understood, subsume much of what appears to be thespecific character of particular principles.The shifts in focus over the years alter the task of inquiry considerably andyield different conceptions of what constitutes a real result in the study oflanguage. Suppose we have some collection of phenomena in a particularlanguage. In the early stages of generative grammar, the task was to find a rulesystem of the permitted form from which these phenomena (and infinitely


many others) could be derived. That is a harder task than the ones posed inpregenerative grammar, but not an impossible one: there are many potentialrule systems, and it is often possible to devise one that will more or lessworkthough the problem of explanatory adequacy at once arises, as noted.But this achievement, however difficult, does not count as a real result ifwe adopt the P&P approach as a goal. Rather, it merely sets the problem. Thetask is now to show how the phenomena derived by the rule system can bededuced from the invariant principles of UG with parameters set in one of thepermissible ways. This is a far harder and more challenging task. It is animportant fact that the problem can now be posed realistically, and solved ininteresting ways in some range of cases, with failures that are also interestinginsofar as they point the way to better solutions. The departure from the longand rich tradition of linguistic inquiry is much sharper and more radical thanin early generative grammar, with problems that are quite new and prospectsthat appear promising.Other traditional problems also assume a different form under a P&Papproach. Questions of typology and language change will be expressed interms of parameter choice (Lightfoot 1991). The theory of language acquisition will be concerned with acquisition of lexical items, fixing of parameters,and perhaps maturation of principles (Hyams 1986, Roeper and Williams1987, Borer and Wexler 1987; Chien and Wexler 1991, Crain 1991, Pierce1992). It might turn out that parsers are basically uniform for all languages:the parsers for English and Japanese would differ only in that parameters areset differently (Fong 1991). 
Other issues would also require some rethinking,if this approach turns out to be correct.Much of the most fruitful inquiry into generative grammar in the past yearshas pursued the working hypothesis that UG is a simple and elegant theory,with fundamental principles that have an intuitive character and broad generality. By dissolving the notion of construction and moving toward rule-freesystems, the P&P approach carries this tendency considerably forward. Arelated assumption is that UG is nonredundant, in the sense that phenomenaare explained by interaction of principles in one particular way. Discovery thatphenomena are overdetermined has commonly been taken to indicate a theoretical deficiency that should be overcome by new or refined principles. Theseworking hypotheses have proven successful as a guide to inquiry, leading tothe discovery of a vast range of empirical phenomena in widely varied languages and to forms of explanation that much exceed what could be contemplated not many years ago. These are rather surprising facts. The guiding ideasresemble those often adopted in the study of inorganic phenomena, wheresuccess has often been spectacular since the 17th century. But language is a


biological system, and biological systems typically are messy, intricate, the result of evolutionary tinkering, and shaped by accidental circumstances and by physical conditions that hold of complex systems with varied functions and elements. Redundancy is not only a typical feature of such systems, but an expected one, in that it helps to compensate for injury and defect, and to accommodate to a diversity of ends and functions. Language use appears to have the expected properties; as noted, it is a familiar fact that large parts of language are unusable, and the usable parts appear to form a chaotic and unprincipled segment of the full language. Nevertheless, it has been a fruitful working hypothesis that in its basic structure, the language faculty has properties of simplicity and elegance that are not characteristic of complex organic systems, just as its infinite digital character seems biologically rather isolated. Possibly these conclusions are artifacts reflecting a particular pattern of inquiry; the range of completely unexplained and apparently chaotic phenomena of language lends credibility to such skepticism. Still, the progress that has been made by the contrary stance cannot be overlooked.

The P&P approach is sometimes termed Government-Binding (GB) Theory. The terminology is misleading. True, early efforts to synthesize current thinking in these terms happened to concentrate on the theories of government and of binding (Chomsky 1981a), but these modules of language stand alongside many others: Case theory, θ-theory, and so on. It may turn out that the concept of government has a kind of unifying role, but there is nothing inherent to the approach that requires this. Furthermore, insofar as the theories of government and binding deal with real phenomena, they will appear in some form in every approach to language; this approach has no special claim on them. 
Determination of the nature of these and other systems is a common project, not specific to this particular conception of the nature of language and its use.

1.2 The Lexicon

A person who has a language has access to detailed information about wordsof the language. Any theory of language must reflect this fact; thus, any theoryof language must include some sort of lexicon, the repository of all (idiosyncratic) properties of particular lexical items. These properties include a representation of the phonological form of each item, a specification of its syntacticcategory, and its semantic characteristics. Of particular interest in this discussion are the s(-emantic) selection and thematic properties of lexical heads:verbs, nouns, adjectives, and pre- or postpositions. These specify the argument structure of a head, indicating how many arguments the head licensesand what semantic role each receives. For example, the verb give must be


specified as assigning an agent role, a theme role, and a goal/recipient role. In (5) John, a book, and Mary have these respective thematic (θ-) roles.

(5) John gave a book to Mary

The association between assigned θ-roles and argument positions is to a large extent predictable. For example, agent is apparently never assigned to a complement. And to the extent that the association is predictable rather than idiosyncratic, it need not (hence, must not) be stated in particular lexical entries.

This conception of the lexicon is based on that developed in the 1960s (Chomsky 1965), but it departs from it in certain respects. There, subcategorization and selectional conditions played a central role. The former conditions state for a lexical head what phrasal categories it takes as complements, for example, that kick takes an NP complement. The latter conditions specify intrinsic semantic features of the complement(s) and subject. In this case the NP complement of kick is [+ concrete]. It was noted in section 1.1 that phrase structure rules are (largely) redundant with subcategorization, hence are (largely) eliminable. But now note that subcategorization follows almost entirely from θ-role specification. A verb with no θ-role to assign to a complement will not be able to take a complement. A verb with (obligatory) θ-roles to assign will have to occur in a configuration with enough arguments (possibly including complements) to receive those θ-roles. Further, at least in part, selectional restrictions will also be determined by thematic properties. To receive a particular θ-role, the inherent semantic features of an argument must be compatible with that θ-role.

These tentative conclusions about the organization of the lexicon raise important questions about the acquisition of lexical knowledge. Suppose that subcategorization (c-selection) is artifactual, its effects derived from semantic properties (s-selection). It is reasonable to ask whether this is a consequence of the acquisition procedure itself (Pesetsky 1982). 
Pesetsky (developing ideas of Grimshaw (1979)) suggests that this must be so. He compares the primitives of c-selection (syntactic categories such as NP, CP, etc.) with those of θ-theory (agent, patient, goal, etc.) and argues that the latter, but not the former, meet what we may call the condition of epistemological priority. That is, they can plausibly be applied by the learner to provide a preliminary, prelinguistic analysis of a reasonable sample of data and thus can provide the basis for development from the initial state to the steady state. This is an attractive line of reasoning, but, given our current understanding of these issues, it is not conclusive. While it does seem correct that the primitives of c-selection do not have epistemological priority, it is not at all clear that those


of s-selection do have such a status. Although the notion agent of an action

is possibly available to the child in advance of any syntactic knowledge, it is less clear that the θ-theoretic notion agent of a sentence is. That is, before the child knows anything about the syntax of his or her language (beyond what is given by UG), can the child determine what portion of a sentence constitutes the agent? Further, the evidence available to the learner likely consists of sentences rather than simply individual verbs in isolation. But such sentences explicitly display c-selection properties: they exhibit verbs along with their complements. Thus, the child is simultaneously presented with evidence bearing on both s-selection (given that sentences are presented in context, and assuming that the relevant contexts can be determined) and c-selection. It is reasonable to assume that both aspects of the evidence contribute to the development of the knowledge. Alongside the state of affairs outlined by Pesetsky, the converse situation with c-selection evidence in fact providing information about the meanings of verbs (Lasnik 1990, Gleitman 1990) might also obtain. For example, exposure to a sentence containing a clausal complement to an unfamiliar verb would lead the learner to hypothesize that the verb is one of propositional attitude.

This scenario is not necessarily in conflict with Pesetsky's initial point about the organization of lexical entries. The means by which knowledge is arrived at is not invariably reflected in the form that the knowledge ultimately takes. For example, Grimshaw (1981) argues that the acquisition of the syntactic category of a lexical item is based in part on the notion canonical structural realization (CSR). The CSR of a physical object is N, that of an action is V, and so on. In the absence of evidence, the child will assume that a word belongs to its CSR: that, say, a word referring to an action is a verb. 
As Grimshaw indicates, while such semantic bootstrapping might constitute part of the acquisition procedure, the resulting steady-state lexicon has no such requirement. Languages commonly have nouns, like destruction, referring to actions (as well as verbs, like be, that don't refer to actions).

Note that this consideration indicates that lexical entries contain at least some syntactic information, in addition to the phonological and semantic information that surely must be present. Grimshaw argues that further syntactic specification is needed as well, c-selection in addition to s-selection. To consider one example, Grimshaw observes that the semantic category question can be structurally realized as either a clause, as in (6), or an NP, as in (7).

(6) Mary asked [what time it was]
(7) Mary asked [the time]


The verb ask semantically selects a question. Grimshaw argues that it is also necessary to specify that it c-selects clause or NP in order to distinguish it from wonder, which only takes a clause (where * indicates deviance).

(8) Mary wondered [what time it was]
(9) *Mary wondered [the time]

Since, as suggested above, it is possible to reduce most of c-selection to s-selection, the question arises whether such reduction might somehow be available in this instance as well. Pesetsky argues that it is. As we will see in section 1.4.3, NPs must receive abstract Case from a Case assigner while clauses need not. (Henceforth, we will capitalize the word Case in its technical usage.) Given this, Pesetsky proposes that the difference between ask and wonder need not be stated in terms of c-selection, but rather follows from a Case difference: ask assigns objective Case but wonder does not. In this regard, wonder patterns with adjectives, which also do not assign objective Case.

(10) Mary is uncertain [what time it is]
(11) *Mary is uncertain [the time]

Pesetsky presents further evidence for this Case-assigning distinction between verbs like ask and those like wonder. In English, generally only objective Case-assigning verbs can occur in the passive. Given this, (6) and (8) contrast in precisely the predicted fashion.

(12) it was asked what time it was
(13) *it was wondered what time it was

As Pesetsky notes, a descriptive generalization pointed out by Grimshaw now follows: among the verbs that s-select questions, some c-select clause or NP while others c-select only clause; none c-select only NP. There are Case-assigning differences among verbs, and these are relevant to c-selection of NP (because of the Case requirement of NPs), but not of clauses.

This reduction seems quite successful for a wide range of cases, but it is important to note that formal syntactic specifications in lexical entries have not been entirely eliminated in favor of semantic ones. Whether or not a verb assigns objective Case is, as far as is known at present, a purely formal property not deducible from semantics. While much of c-selection follows from s-selection, there is a syntactic residue, statable, if Pesetsky is correct, in terms of lexically idiosyncratic Case properties.

We will introduce further properties of the lexicon as required by the exposition.
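The reduction attributed to Pesetsky above can be sketched as a small computation. This is only an illustration under stated assumptions: the dictionary layout, the field names, and the functions licensed_complements and passivizes are hypothetical, and the lexicon contains just the three heads discussed in (6) through (13). Each entry records what the head s-selects and whether it assigns objective Case; c-selection is not listed but computed.

```python
# Canonical structural realizations of a semantic category (after Grimshaw):
# a question may surface as a clause or as an NP.
CSR = {"question": ("clause", "NP")}

# Hypothetical toy lexicon. No entry lists its complement categories directly.
LEXICON = {
    "ask": {"category": "V", "s_selects": "question", "assigns_objective_case": True},
    "wonder": {"category": "V", "s_selects": "question", "assigns_objective_case": False},
    "uncertain": {"category": "A", "s_selects": "question", "assigns_objective_case": False},
}


def licensed_complements(head):
    """Complement categories the head can take, derived from s-selection plus Case."""
    entry = LEXICON[head]
    licensed = []
    for category in CSR[entry["s_selects"]]:
        # An NP complement needs Case; a clause does not.
        if category == "NP" and not entry["assigns_objective_case"]:
            continue
        licensed.append(category)
    return licensed


def passivizes(head):
    """Assumed generalization from the text: only objective Case-assigning verbs passivize."""
    entry = LEXICON[head]
    return entry["category"] == "V" and entry["assigns_objective_case"]


for head in LEXICON:
    print(head, licensed_complements(head), passivizes(head))
```

Because a clause is always licensed and an NP only conditionally, no head in such a lexicon can come out as taking NP alone, which mirrors the descriptive generalization noted in the text. The Case property itself remains a stipulated formal feature of each entry.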


1.3 Computational System

1.3.1 General Properties of Derivations and Representations

The generative procedure that constitutes the (I-)language consists of a lexicon and a computational system. In section 1.2 we outlined some properties of the lexicon. We now turn to the computational system. Under the general assumptions of section 1.1, we consider the four levels of representation of the EST system and the relations that hold among them, focusing attention on narrow syntax, that is, the derivation relating D-Structure, S-Structure, and LF.

D-Structure, LF, and PF are interface levels, which satisfy the general condition FI in a manner to be made precise. Each level is a symbolic system, consisting of atomic elements (primes) and objects constructed from them by concatenation and other operations. We take these objects to be phrase markers in the familiar sense (represented conventionally by trees or labeled bracketing). Each prime is a feature complex, though for orthographic convenience we will generally use conventional symbols. For concreteness, take categories to be as in (14), for nouns, verbs, adjectives, and pre- and postpositions, respectively.

(14) a. N = [+N, −V]
     b. V = [−N, +V]
     c. A = [+N, +V]
     d. P = [−N, −V]

The feature [+ N] is the traditional substantive; the feature [+ V], predicate.
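The feature decomposition in (14) can be written down as a small table, which makes it easy to ask which categories share a feature value. The sketch below is illustrative only: the names CATEGORIES and natural_class are hypothetical, and the notion of grouping categories by shared values is an assumption added for the example, not something (14) itself states.

```python
# Categories of (14) as feature complexes: True stands for +, False for −.
CATEGORIES = {
    "N": {"N": True, "V": False},
    "V": {"N": False, "V": True},
    "A": {"N": True, "V": True},
    "P": {"N": False, "V": False},
}


def natural_class(**features):
    """Return, in alphabetical order, the categories matching every given feature value."""
    return sorted(
        category
        for category, values in CATEGORIES.items()
        if all(values[name] == value for name, value in features.items())
    )


print(natural_class(N=True))            # the [+N] categories
print(natural_class(V=True))            # the [+V] categories
print(natural_class(N=False, V=False))  # fully specified: a single category
```

With two binary features there are exactly four fully specified complexes, one per category in (14).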

The primes constituting the terminal string of a phrase marker are drawn from the lexicon; others are projected from these heads by operations of the computational system. Elements that project no further are maximal projections. In informal notation, XP is the maximal projection from the terminal category X; thus, NP is the maximal projection of its head N, and so on. See section 1.3.2.

The two basic relations of a phrase marker are domination and linearity. In the phrase marker (15) we say that B dominates D and E, C dominates F and G, and A dominates all other categories (nodes). Furthermore, B precedes C, F, and G; D precedes E, C, F, and G; and so on.

(15) [A [B D E] [C F G]]
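The relations stated for (15) can be checked mechanically. The sketch below is an illustration under stated assumptions, not the authors' formalism: the phrase marker is encoded as a mapping from nodes to daughters, all function names are hypothetical, and precedence is taken to hold only between nodes neither of which dominates the other. It also includes the command relation that the text introduces shortly, using the definition that a node c-commands another if it does not dominate it and every node dominating the first also dominates the second; excluding a node from c-commanding itself is an added assumption.

```python
# Phrase marker (15), written as a mapping from each node to its daughters.
# The encoding and all function names are illustrative, not from the text.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
NODES = ["A", "B", "C", "D", "E", "F", "G"]


def dominates(x: str, y: str) -> bool:
    """True if x properly dominates y."""
    return any(d == y or dominates(d, y) for d in TREE.get(x, []))


def terminals(x: str) -> list[str]:
    """Terminal nodes under x, left to right (x itself if it is terminal)."""
    daughters = TREE.get(x, [])
    if not daughters:
        return [x]
    return [t for d in daughters for t in terminals(d)]


STRING = terminals("A")  # the terminal string, left to right


def precedes(x: str, y: str) -> bool:
    """True if all of x lies to the left of all of y."""
    if x == y or dominates(x, y) or dominates(y, x):
        return False
    return STRING.index(terminals(x)[-1]) < STRING.index(terminals(y)[0])


def c_commands(x: str, y: str) -> bool:
    """x c-commands y if x does not dominate y and every node that
    dominates x also dominates y (x == y is excluded by assumption)."""
    if x == y or dominates(x, y):
        return False
    return all(dominates(g, y) for g in NODES if dominates(g, x))


def related(relation, x: str) -> set[str]:
    """All nodes y such that relation(x, y) holds."""
    return {y for y in NODES if relation(x, y)}
```

For instance, related(precedes, "D") collects E, C, F, and G, and related(c_commands, "B") collects C, F, and G, in line with what the text states for (15).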


If X is a head, its sister is its complement; thus, if D and F are heads, then E and G are their complements in (15). We assume that ordering relations are determined by a few parameter settings. Thus, in English, a right-branching language, all heads precede their complements, while in Japanese, a left-branching language, all heads follow their complements; the order is determined by one setting of the head parameter. Examples below that abstract from particular languages are usually to be interpreted independently of the order given. Domination relations are determined by general principles (section 1.3.2).

One fundamental concept that applies throughout the modules of grammar is command (Klima 1964, Langacker 1969, Lasnik 1976, Reinhart 1976, Stowell 1981, Aoun and Sportiche 1981). We say that α c-commands β if α does not dominate β and every γ that dominates α dominates β. Thus, in (15) B c-commands C, F, G; C c-commands B, D, E; D c-commands E and conversely; F c-commands G and conversely. Where γ is restricted to maximal projections, we say that α m-commands β.

A second fundamental concept is government (Chomsky 1981a, 1986a, Rizzi 1990), a more local variety of command to which we return in section 1.4.1.

Given the language L, each SD is a sequence (π, λ, δ, σ), these being phrase markers drawn from the levels PF, LF, D-Structure, and S-Structure, respectively. The element δ reflects properties of the items selected from the lexicon as these are interpreted by the principles of UG, with the parameters fixed for L. The elements σ and λ are formed by successive application of operations of the computational system to δ; they will have the properties of δ, as modified by these operations. The PF representation π is a string of phonetic primes with syllabic and intonational structure indicated, derived by a computation from σ. We assume that the primes themselves are not modified in the course of the derivation from δ to λ.

A typical lexical entry consists of a phonological matrix and other features, among them the categorial features N, V, and so on; and in the case of Ns, Case and agreement features (person, number, gender), henceforth φ-features. In principle, any of these features may be lacking. In one case of particular interest, the entire phonological matrix is lacking. In this case the element is an EC (empty category). Among these ECs we have the elements e of (16), (17); we use * to indicate severe deviance, ? a weaker variety.

(16) a. John expected [e to hurt himself]
     b. it is common [e to hurt oneself]
(17) *e arrived yesterday (he arrived yesterday)


We refer to the EC of (16) as PRO, an element that can be controlled by its antecedent (John, in (16a)) or can be arbitrary in interpretation, as in (16b). Possibly the latter is also a case of control by an EC occupying the same position as for us in (18) (Epstein 1984).

(18) it is convenient for us [for others to do the hard work]

If so, PRO is always controlled. See section 1.4.2.

The EC of (17) is a pronominal element, henceforth pro. It is not permitted in this position in English; the counterpart would be grammatical in Italian, a null subject language. On factors relevant to fixing the parameters, see Rizzi 1982, 1986a, Huang 1984, Borer 1984, Jaeggli and Safir 1989. This EC acts much in the manner of an ordinary pronoun, having reference fixed by context or by some antecedent in an appropriate position. The structural relations of (antecedent, pro) pairs are, furthermore, generally like those of (antecedent, pronoun) and unlike those of control. For example, in a null subject language we find the equivalents of (19a-b), analogous to the pair (19c-d) (John taken to be the antecedent of pro, he).

(19) a. the people that pro taught admired John
     b. *pro admired John
     c. the people that he taught admired John
     d. *he admired John

The behavior of pro and he is similar, while PRO can never appear in these positions.

A third type of EC, not drawn from the lexicon but created in the course of a derivation, is illustrated in (20).

(20) a. I wonder [who John expected [e to hurt himself]]
     b. John was expected [e to hurt himself]

We refer to this EC as trace (t), a relational notion trace-of X, where X is the moved element serving as the antecedent binding the trace. Thus, John binds e in (20b) much as e binds the reflexive or as they binds the reciprocal in (21), in turn binding the reflexive.

(21) they expected [each other to hurt themselves]

In (20a) e is the trace of the NP who. The trace functions as a variable bound by who, understood as a restricted quantifier: for which e, e a person. Here, e in turn binds himself, just as each other binds themselves in (21) and Bill binds himself in (22), with Bill substituting for the variable of (20a).

(22) John expected [Bill to hurt himself]


In (20a) both e and himself function as variables bound by the restricted quantifier, so that the LF form would be interpreted I wonder [for which e, e a person, John expected e to hurt e]. Note that we are using the term bind here to cover the association of an antecedent with its trace quite generally, including the case of the (syntactic) binding of a variable by a quantifier-like element; and we also use the term, at LF, in the sense of quantifier-variable binding.

In (20b) the verb was is composed of the lexical element be and the inflectional elements [past, 3 person, singular]. Assume now that the process of composition adjoins the copula to the inflectional elements (raising). Recall that there are two interpretations of this process: (1) raising of be to the inflection position of the sentence to construct the combined form [be + inflections], or (2) raising of [be + inflections] (= was, drawn from the lexicon with its features already assigned) to the inflection position, where the features are checked. Either way, we have a second trace in (20b) = (23).

(23) John was e2 expected [e1 to hurt himself]

The EC e2 is the trace of be or was; e1 is the trace of John, binding himself. In each case the trace occupies the position from which its antecedent was moved. For concreteness, we adopt the checking theory (2), so that we have was raising in (23).

Raising of was to the inflection position is necessary to check inflectional properties. The same is true of the other inflected verbs, for example, wonder in (20a), which is [present, 1 person, singular]. Thus, a fuller (though still only partial) representation would be (24), where e1 is the trace of wonder.

(24) I wonder e1 [who John expected [e2 to hurt himself]]

There is reason to believe that in English (24) is an LF representation, while the counterpart in other similar languages (e.g., French) is an S-Structure representation; (23) and its counterparts are S-Structure representations in both kinds of language (Emonds 1978, Pollock 1989). Thus, English auxiliaries raise at S-Structure but main verbs raise only at LF, while the corresponding French elements all raise at S-Structure. English and French would then be identical in relevant respects at D-Structure and LF, while differing at S-Structure, with English (25a) (corresponding to the basically shared D-Structure) versus French (25b) (corresponding to the basically shared LF form).

(25) a. John often [kisses Mary]
     b. Jean embrasse souvent [t Marie]
        Jean kisses often Marie


Informally, the trace functions throughout as if the antecedent were in that position, receiving and assigning syntactic and semantic properties. Thus, e is in the normal position of the antecedent of a reflexive in both (20a) and (20b). And in (25b), the trace is the verbal head of VP, assigning a particular semantic role and grammatical Case to its nominal object.

Note that PRO and trace are quite different in their properties. Thus, an element that controls PRO is an independent argument in the sense of section 1.2, assigned an independent semantic role; but an element that binds a trace is not. Compare (16a) and (20b), repeated here:

(26) a. John expected [e to hurt himself]
     b. John was expected [e to hurt himself]

In (26a) John is the subject argument of expected, exactly as in (22); the EC controlled by John has its independent function as subject of hurt. In (26b), in contrast, John has no semantic role other than what it inherits from its trace, as subject of hurt. Since the subject of is expected is assigned no independent argument role, it can be a nonargument (an expletive), as in (27).

(27) there is expected [to be an eclipse tomorrow]

Other differences of interpretation follow. Compare, for example, (28a) and (28b).

(28) a. your friends hoped [e to finish the meeting happy]
     b. your friends seemed [e to finish the meeting happy]

In (28a) your friends and e are independent arguments, assigned their semantic roles as subjects of hope and finish, respectively; therefore, the EC must be PRO, controlled by your friends. But seem assigns no semantic role to its subject, which can again be an expletive, as in (29).

(29) a. it seems [your friends finished the meeting happy]
     b. there seems [e to be a mistake in your argument]

Accordingly, the EC in (28b) must be trace, with its antecedent your friends receiving its semantic role as an argument as if it were in that position. We know further that the adjective happy modifies the subject of its own clause, not that of a higher clause. Thus, in (30) happy modifies meeting, not your friends; the sentence means that your friends hoped that the atmosphere would be happy when the meeting ends.

(30) your friends hoped [the meeting would finish happy]

In (28), then, happy modifies PRO in (a) and trace in (b). Example (28a) thus means that your friends had a certain wish: that they would be happy as the meeting ends. But (28b) has roughly the meaning of (29a), with happy modifying your friends.

Other differences of meaning also appear, as in (31a) and (31b) (Burzio 1986).

(31) a. one translator each was expected t to be assigned t to the visiting diplomats
     b. one translator each hoped PRO to be assigned t to the visiting diplomats

In (31a) neither one translator each nor its trace t is in a position with independent argument status. Therefore, the argument phrase one translator each is interpreted as if it were in the position of the trace t, with the argument status of object of assigned; the meaning is that it was expected that one translator each would be assigned to the visiting diplomats (i.e., each diplomat would be assigned one translator). In (31b), in contrast, one translator each and PRO are independent arguments; it is PRO, not one translator each, that binds t and is interpreted as if it were in that position. The subject one translator each is thus left without an interpretation, very much as it is in the similar construction (32).

(32) one translator each hoped that he would be assigned to the visiting diplomats

Although the argument status of the antecedent of a trace is determined in the position of the trace, the antecedent may still have an independent semantic role in other respects. Compare the examples of (33).

(33) a. *it seems to each other [that your friends are happy]
     b. your friends seem to each other [t to be happy]
     c. it seems [that all your friends have not yet arrived]
     d. all your friends seem [to have not yet arrived]

In (33a) your friends cannot bind the reciprocal each other, but it can in (33b), thus functioning in its overt position, not that of its trace. In (33c) and (33d) the overt positions are relevant for determining scopal properties: thus, only (33c) can mean that it seems that not all your friends have arrived, with not taking scope over all. We see, then, that scopal properties and argument status are determined in different ways for antecedent-trace constructions. Such facts as these ought to fall out as consequences of the theory of ECs and semantic interpretation. See section 1.4.2.

PRO and trace also differ in their syntactic distribution. Thus, in (34) we see the properties of control, with the antecedent and PRO functioning as independent arguments; but the properties of trace, with only one argument, cannot be exhibited in the analogous structures, as (35) illustrates.

(34) a. John asked whether [PRO to leave]
     b. John expected that it would be fun [PRO to visit London]
(35) a. *John was asked whether [t to leave]
     b. *John was expected that it would be fun [t to visit London]

In fact, trace and PRO do not overlap in their distribution; the facts should, again, fall out of the theory of ECs.

We also allow a fourth type of EC, one that has only the categorial features [±N, ±V], projecting in the usual way. Such ECs serve only as targets for movement, to be filled in the course of derivation. Since these elements have no semantic role, they will not satisfy the condition FI at D-Structure (as we will sharpen this below), and we may tentatively assume that they and the structures projected from them are inserted in the course of derivation, in a manner permitted by the theory of phrase structure. See section 1.4.3 for further comment.

If these kinds of EC are indeed distinct, then we expect them to differ in feature composition (Chomsky 1982, Lasnik 1989). Optimally, the features should be just those that distinguish overt elements. As a first approximation, suppose that overt NPs fall into the categories anaphor (reflexives, reciprocals), pronoun, and r-expression (John, the rational square root of 2, and other expressions that are quasi-referential in the internalist sense of section 1.1). We might assume, then, that we have two two-valued features, [anaphor] and [pronominal], with potentially four categories.

(36) a. [+anaphor, −pronominal]
     b. [−anaphor, +pronominal]
     c. [−anaphor, −pronominal]
     d. [+anaphor, +pronominal]

An expression that is [+anaphor] functions referentially only in interaction with its antecedent; the reference of an expression that is [+pronominal] may be determined by an antecedent (but it does refer). Reflexives and reciprocals thus fall into category (36a) and pronouns into category (36b). The third category contains elements that refer but are not referentially dependent: r-expressions. The four ECs discussed above would have the typology of (37).

(37) a. Trace of NP is [+anaphor, −pronominal].
     b. Pro is [−anaphor, +pronominal].
     c. Trace of operator (variable) is [−anaphor, −pronominal].
     d. PRO is [+anaphor, +pronominal].
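The typology in (36) and (37) amounts to a two-by-two table indexed by the features [anaphor] and [pronominal]. The following sketch, offered only as an illustration, pairs each EC with the overt category that has the same feature values. The dictionary and function names are hypothetical, and the entry for (36d) is left empty on the assumption that no fully overt counterpart is posited for that cell.

```python
# Illustrative encoding of the typology in (36) and (37).
# Keys are (anaphor, pronominal) value pairs; True stands for "+".
OVERT = {
    (True, False): "anaphor",        # (36a): reflexives, reciprocals
    (False, True): "pronoun",        # (36b)
    (False, False): "r-expression",  # (36c)
    (True, True): None,              # (36d): assumed to have no full overt member
}

EMPTY = {
    (True, False): "trace of NP",    # (37a)
    (False, True): "pro",            # (37b)
    (False, False): "variable",      # (37c): trace of operator
    (True, True): "PRO",             # (37d)
}


def overt_counterpart(ec: str):
    """Return the overt category sharing the feature values of an EC, if any."""
    for features, name in EMPTY.items():
        if name == ec:
            return OVERT[features]
    raise KeyError(ec)


if __name__ == "__main__":
    for ec in EMPTY.values():
        print(ec, "->", overt_counterpart(ec))
```

The lookup makes the expected parallels explicit: trace of NP patterns with anaphors, pro with pronouns, and variables with r-expressions, while PRO occupies the cell with no plain overt member.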


Thus, trace of NP is nonreferential, pro has the properties of pronouns, and variables are referential in that they are placeholders for r-expressions. Controlled PRO falls into category (37d), hence all PRO if apparent uncontrolled PRO actually has a hidden controller (see (18)). We would expect, then, that trace of NP, pro, and variable would share relevant properties of overt anaphors, pronouns, and r-expressions, respectively. Such elements as English one, French on, German man might be partial overt counterparts to PRO, sharing the modal interpretation of arbitrary PRO and its restriction to subject position (Chomsky 1986b).

These expectations are largely satisfied, when we abstract away from other factors. Thus, the structural relation of a trace to its antecedent is basically that of an anaphor to its antecedent; in both cases the antecedent must c-command the trace, and other structural conditions must be met, as illustrated in (38), with the examples kept slightly different to avoid factors that bar the unwanted structures.

(38) a. i. John hurt himself
        ii. John was hurt t
     b. i. *himself thought [John seems to be intelligent]
        ii. *t thought [John seems that it is raining]
     c. i. *John decided [himself left early]
        ii. *John was decided [t to leave early]

These properties sharply restrict the options for movement of NPs: raising not lowering, object-to-subject but not conversely, and so on (Fiengo 1977).

Similar but not quite identical conditions hold of PRO. Thus, the C-Command Condition is illustrated by (39).

(39) a. John expects [PRO to hurt himself]
     b. *[John's mother] expects [PRO to hurt himself]
     c. *John expects [PRO to tell [Mary's brother] about herself]

In (39c) PRO is in a position to bind herself but the C-Command Condition requires that its antecedent be John, not Mary.

Similarly, variables share relevant properties of r-expressions, as expected.

(40) a. i. They think [John will leave tomorrow]
        ii. I wonder who they think [t will leave tomorrow]
     b. i. *it seems [John to be intelligent]
        ii. *I wonder who it seems [t to be intelligent]
     c. i. he thinks [John is intelligent]
        ii. I wonder who [he thinks [t is intelligent]]
        iii. John thinks [he is intelligent]
        iv. I wonder who [t thinks [he is intelligent]]

In (40a) the name and the variable appear as Case-marked subjects of finite clauses, and the expressions are well formed, satisfying the Case-marking condition on r-expressions, to which we return directly. In (40b) the name and the variable appear as subjects of infinitives lacking Case, and the expressions are severely deviant. In (40ci) he is not referentially bound by John (we cannot take he to refer to John, as we may in (40ciii)); and in the parallel structure (40cii) he and the variable t are unrelated referentially (we cannot take he to be a variable bound by the operator who, which binds t, as we may in (40civ)). Again, many conditions on movement fall out as special cases.

These ECs also have other features of overt expressions, specifically, φ-features. Thus, the trace in (20a) has the features [masculine, singular]; hence the choice of overt anaphor.

An EC lacking the typological features of (37) or φ-features is uninterpretable, hence impermissible at LF by the principle FI. Such an element, identified only by its categorial features (NP, V, etc.), may appear in the course of a derivation, but only as a position to be filled or otherwise eliminated.

It is an open question whether movement always leaves a trace, and whether, when it does, there are independent reasons for this. For the purposes of exposition, we tentatively assume that movement of an element always leaves a trace and, in the simplest case, forms a chain (α, t), where α, the head of the chain, is the moved element and t is its trace. The chain is an X-chain if its head has the property X; we return to relevant choices of X. The elements subject to interpretation at the interface level LF are chains (sometimes one-membered), each an abstract representation of the head of the chain.

The movement operation (henceforth Move α) is an invariant principle of computation, stating that a category can be moved to a target position. We take the moved category and the target to be primes (lexical items, EC targets for movement, or projections from these minimal elements), with two options: either the moved category α replaces the target β (substitution), or it adjoins to it (adjunction), as in (41) (order irrelevant, t the trace of α, β1 and β2 two occurrences of β).

(41) [Tree diagram: adjunction of α to β, with the nodes β1, β2, α, t, and X.]


Any further constraints on movement will be derivative from other principles, including conditions on representations.

There are two natural interpretations of the elements formed by adjunction: we might assume that each occurrence of β in (41) is a category in its own right (Lasnik and Saito 1992) or that together they form a single category [β1, β2] with the two occurrences of β as its segments (May 1985, Chomsky 1986a). Empirical differences follow, as usual, as further theoretical structure is articulated.

The segment-category distinction requires a sharpening of the concepts of dominance and those derived from it (command, etc.). Let us say that the category [β1, β2] in (41) includes X, excludes t, and contains α (and whatever is dominated by these elements). We restrict domination to inclusion. Thus, [β1, β2] dominates only X. We say that a segment or category α covers β if it contains β, includes β, or = β. Defining the command relations as before, α c-commands t in (41), since it is not dominated (only contained) by β; but Y included in α does not. We carry over the properties of head and command to the postadjunction structure. Thus, if γ was the head of the preadjunction category β and c-commanded δ, then in the postadjunction structure [β1, β2], γ remains the head and c-commands δ. Where no confusion arises, we will refer to the postadjunction category [β1, β2] simply as β.

Substitution is constrained by a UG principle of recoverability of deletion, which requires that no information be lost by the operation; thus, α may substitute for β only if there is no feature conflict between them. The target of substitution will therefore always be an EC with the same categorial features as the moved category (the structure-preserving hypothesis of Emonds 1976). A similar property holds for adjunction, it appears (see section 1.3.3).

Move α permits multiple (successive-cyclic) movement, as in (42), derived from the D-Structures (43), with the targets of movement inserted.

(42) a. John seems [t to have been expected [t to leave]]
     b. I wonder [who John thought [t Bill expected [t to leave]]]
(43) a. e seems [e to have been expected [John to leave]]
     b. I wonder [e John thought [e Bill expected [who to leave]]]

In (42a) we have the chain (John, t, t) with the links (John, t) and (t, t); in (42b) the chain (who, t, t), also with two links. The heads of the chains are John, who, respectively.

We have so far assumed that the operation Move α forms a single link of a chain. Alternatively, we might assume that the operation is not Move α but rather Form Chain, an operation that forms the full chains of (42) from the D-Structures (43) in a single step. Within a richer theoretical context, the distinction may be more than merely notational (see chapter 3). We tentatively assume the more conventional Move α interpretation. The operation Move α satisfies narrow locality conditions. Suppose that the position of the intermediate trace t in (42) is filled, as in (44), so that the chain must be formed with a single link, skipping the blocked position (occupied by it, whether, whether, respectively).

(44) a. *John seems that [it was expected [t to leave]]
     b. ?what did John remember [whether Bill fixed t]
     c. *how did John remember [whether Bill fixed the car t]

The chains (John, t), (what, t), (how, t) violate the locality conditions, and the expressions are deviant, though in strikingly different ways, facts that demand explanation in terms of properties of UG. Note that in case (44c) it is the PF form with this interpretation (that is, with how construed in the position of the trace) that is deviant; if how is construed with remember, there is no deviance. The single PF form has two distinct SDs, one sharply deviant, the other not.

Recall that each element must have a uniform, language-independent interpretation at the interface level LF (the principle FI). Some elements are arguments assigned specific semantic roles (θ-roles), such as agent and goal (see section 1.2); overt anaphors, PRO, and r-expressions (including variables) are all arguments. Expletives (e.g., English there, Italian ci) are assigned no θ-roles. Some elements (e.g., English it, French il, Italian pro) may ambiguously serve as arguments or expletives. By FI, expletives must be somehow removed at LF (section 1.3.3).

An argument must receive a θ-role from a head (θ-marking). An argument may also receive a semantic role (whether to be considered a θ-role or not is a theory-internal question that we put aside) by predication by an XP (see Williams 1980), possibly an open sentence (e.g., the relative clause of (45), with a variable position t).

(45) the job was offered to Mary, [who everyone agreed t had the best qualifications]

Other XPs (adjuncts, such as adverbial phrases) assign a semantic role to a predicate, a head, or another adjunct. As illustrated in (44b-c), movement of adjuncts and arguments has quite different properties (Huang 1982, Kayne 1984, Lasnik and Saito 1984, 1992, Aoun 1986, Rizzi 1990, Cinque 1990). A θ-position is a position to which a θ-role is assigned. The elements receiving interpretation at LF are chains. Hence, each argument chain (46) must contain at least one θ-position.

(46) (α1, ... , αn)

Furthermore, αn, the position occupied by α1 at D-Structure, must be a θ-position. The reason lies in the interpretation of D-Structure as a grammatical realization of lexical properties. Accordingly, θ-marking must take place at D-Structure: an element, moved or not, will have at LF exactly the θ-marking properties (assigning and receiving θ-roles) that it has at D-Structure. From the same consideration, it follows that nothing can move into a θ-position, gaining a θ-role that was not assigned to it at D-Structure. Thus, a chain can have no more than one θ-position, though any number of semantic roles may be assigned in this position. In (47), for example, the wall receives a semantic role from both paint and red.

(47) we painted the wall red

The theory of Case (section 1.4.3) requires that every argument have abstract Case (possibly realized overtly in one or another way, depending on specific morphological properties of the language). Hence, an argument chain (46) must have one and only one θ-position (namely, αn) and at least one position in which Case is assigned (a Case position). Following Joseph Aoun, we might think of the function of Case as to make an argument chain visible for θ-marking. The Last Resort condition on movement (see section 1.1) requires that movement is permitted only to satisfy some condition, in particular, to satisfy visibility (hence, FI). Once an element has moved to a Case position, it can move no further, all relevant conditions now being satisfied. It follows, then, that every argument chain must be headed by a Case position and must terminate in a θ-position (the Chain Condition).

Note that these conclusions hold only for arguments other than PRO, an anomaly to which we return in section 1.4.3.
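The Chain Condition just stated can be expressed as a simple check over a chain listed from head to tail. The sketch below is illustrative only. The Position class, its flags, and the function name are hypothetical, and the Case and θ annotations on the two sample chains are supplied by hand, following the text's description of finite subjects as Case-marked and of subjects of infinitives as lacking Case. PRO, which the text sets aside as an anomaly, is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """One position in a chain; the flags are annotations supplied by hand."""
    label: str
    theta: bool = False  # is a theta-role assigned in this position?
    case: bool = False   # is Case assigned in this position?


def satisfies_chain_condition(chain: list[Position]) -> bool:
    """Check an argument chain listed from its head to its tail.

    The chain must be headed by a Case position, must terminate in a
    theta-position, and must contain no other theta-position.
    """
    if not chain:
        return False
    head, tail = chain[0], chain[-1]
    theta_count = sum(p.theta for p in chain)
    return head.case and tail.theta and theta_count == 1


# (42a): John seems [t to have been expected [t to leave]]
raising = [
    Position("John", case=True),                   # subject of finite "seems"
    Position("t (intermediate)"),                  # neither Case nor theta-role
    Position("t (subject of leave)", theta=True),  # D-Structure position
]

# (40bi): *it seems [John to be intelligent]
caseless = [Position("John", theta=True)]  # subject of an infinitive lacking Case

if __name__ == "__main__":
    print(satisfies_chain_condition(raising))
    print(satisfies_chain_condition(caseless))
```

The first chain passes because its head is in a Case position and its only θ-position is its tail; the one-membered second chain fails because its sole position receives a θ-role but no Case.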
On the status of chains headed by expletives with regard to the Chain Condition, see section 1.3.3.

We have so far considered chains that originate from an NP argument position of D-Structure. These fall into the two types illustrated in (42), repeated here.

(48) a. John seems [t to have been expected [t to leave]]
     b. I wonder [who John thought [t Bill expected [t to leave]]]

In (48a) we have, among others, the argument chain (John, t, t) and in (48b) the operator-variable chain (who, t, t).

Chains may also originate from non-NP positions. One case, already mentioned, is the movement of a lexical category (head movement), as in (23), (24), repeated here, illustrating the raising of V to the inflectional positions.

(49) a. John was t expected to hurt himself
     b. I wonder t who John expected to hurt himself

Here we have the chains (was, t) and (wonder, t), the latter an LF chain for English.

Head movement is also involved in formation of compound words in many languages. Suppose we were to form a causative verb meaning cause-to-fall from the underlying D-Structure (50) by adjoining fall to cause.

(50) [VP [V cause] [CP books [V fall]]]

This operation yields the structure (51), t the trace of fall.

(51) [VP [V [V cause] fall] [CP books [V t]]]

See Baker 1988. Here cause is the head of a two-segment verbal category, if we assume a segment theory of adjunction.

A second kind of chain originating from a non-NP position arises from movement of nonarguments (adjuncts, predicates), as in (52).

(52) a. [to whose benefit] would that proposal be t
     b. [how carefully] does he expect to fix the car t
     c. [visit England], he never will t
     d. [as successful as Mary], I don't think that John will ever be t

In each case the bracketed nonargument is the antecedent of the trace; the chains, then, are ([to whose benefit], t), ([how carefully], t), ([visit England], t), ([as successful as Mary], t), respectively. The questioned element in (52a) is really who; the rest is carried along because who cannot be extracted from the D-Structure position (53) (pied-piping; Ross 1967).

(53) that proposal would be [to who + POSSESSIVE benefit]

The natural interpretation reflects the D-Structure form; the meaning is for which person x, that proposal would be to x's benefit. There is evidence that the LF form should indeed be construed in something like this manner (see section 1.3.3). Case (52b) might be interpreted similarly; thus, the interpretation would be for what degree x, he expects to fix the car [x carefully]. We might, then, argue that these are not really cases of movement of adjunct phrases as such, but rather of the question elements who, how, with the adjunct phrase carried along. We might conclude further that operator movement is the only kind of movement to which adjunct phrases are subject, unlike arguments, which can form argument chains. The conclusion is supported by the observation that although adjuncts can typically appear in many sentence positions, they are not interpreted as if they had moved from some more deeply embedded position (Saito 1985). Thus, (54a) is not given the interpretation of (54b), as it would be if carefully in (54a) had been moved from the D-Structure position of carefully in (54b).

(54) a. carefully, John told me to fix the car
     b. John told me to [fix the car carefully]

This suggests that (52b) might also be regarded as a kind of pied-piping, with the moved element how carrying along the larger phrase how carefully. See chapters 3 and 4.

Within the theory of empty categories and chains, we can return to the question of directionality of interlevel relations raised in section 1.1. As noted there, such questions are obscure at best, and become even more subtle under the assumptions of trace theory. Consider again the S-Structure representations (42) derived from the D-Structure representations (43) (repeated here).

(55) a. John seems [t to have been expected [t to leave]]
     b. I wonder [who John thought [t Bill expected [t to leave]]]
(56) a. e seems [e to have been expected [John to leave]]
     b. I wonder [e John thought [e Bill expected [who to leave]]]

We now ask whether (55a-b) are derived from (56a-b), respectively, by movement of John, who; or whether D-Structure is derived from S-Structure by algorithm (Sportiche 1983, Rizzi 1986b), so that D-Structure is, in effect, a derived property of S-Structure; or whether there is simply a nondirectional relation between the paired expressions. These are alternative expressions of the relation between S-Structure and the lexicon. All three approaches are transformational in the abstract sense that they consider a relation between a displaced element and the position in which such an element is standardly interpreted; and in the case of (55b), the position in which it would be overt at S-Structure in languages of the Chinese-Japanese variety (see section 1.1). Such displacement relations are a fundamental feature of human language, which must be captured somehow. Apparent differences among alternative formulations often dissolve, on inquiry, to notational questions about how this property is expressed; similar questions arise with regard to apparent differences between multilevel approaches and unilevel alternatives that code global properties of phrase markers in complex symbols (Chomsky 1951, Harman 1963, Gazdar 1981). In the present case the empirical distinguishability of the approaches turns on highly theory-internal considerations. We will continue to adopt the derivational approach of section 1.1. We assume that this is, at root, a question of truth and falsity, though a subtle one.

To see some of the problems that arise, consider the locality conditions on Move α. A general condition, illustrated in (44), is that the target of movement must be the closest possible position, with varying effects depending on the kind of movement involved. The condition is very strict for head movement, which cannot pass over the closest c-commanding head (the Head Movement Constraint (HMC), a special case of more general principles; see section 1.4.1). Thus, in (57) formation of (b) from the D-Structure (a), raising will to the clause-initial position, satisfies the HMC; but raising of read to this position, crossing the possible target position occupied by will, violates the HMC, yielding the sharply deviant interrogative expression (57c).

(57) a. John will read the book
     b. will John t read the book
     c. *read John will t the book

But the locality relations expressed in the step-by-step computation might not be directly expressed at the output levels. That is, a derivation may satisfy the HMC in each step, but the output may appear to indicate that the condition is violated. Consider again the formation of a causative verb meaning cause-to-fall by adjoining fall to cause, as in (51). Recall that a verb must also be raised to the inflection position. Hence, the newly formed category cause-fall must now raise to this position, forming the structure (58) (where TP is tense-headed phrase, tf is the trace of fall, and tc is the trace of cause-fall).

(58) [Tree diagram: John, the raised complex cause-fall in T, its trace tc in VP, and the embedded phrase containing books and tf]

Here we have two chains: (cause-fall, tc) and (fall, tf). Each step of chain formation satisfies the strict locality condition. But the resulting chain headed by fall does not. In the S-Structure, the chain (fall, tf) violates the HMC, because of the intervening head tc, a possible target of movement that is skipped by the chain. The form should thus be as deviant as (57c), but it is well formed. The locality conditions are satisfied stepwise in the derivation, but are not satisfied by the output chain. Modifications required under nonderivational approaches are not entirely straightforward.

1.3.2 D-Structure

The computational system forms SDs that express the basic structural facts (syntactic, phonological, and semantic) of the language in the form of phrase markers with terminal strings drawn from the lexicon. We are assuming that such properties of natural language as displaced elements are expressed by multiple representational levels, each simple in form and with simple operations such as Move α relating them. Each level captures certain systematic aspects of the full complexity. The relation of the computational system to the lexicon is expressed at the internal interface level of D-Structure. D-Structure is mapped to LF, the interface with conceptual and performance systems; at some point (S-Structure), perhaps varying somewhat from language to language, the derivation branches and an independent mapping (phonology) forms the PF representation that provides the interface with the sensorimotor systems. See (4).

The earliest attempts to develop generative grammar in the modern sense postulated a single level of syntactic representation, formed by rules of the form (59), where A is a single symbol and X, Y, Z are strings (X and Y possibly null), S is the designated initial symbol, and there is a set of
designated terminal symbols that are then mapped by other rules to phonetic forms.

(59) XAY → XZY

The symbols were assumed to be complex, consisting of two kinds of elements: categorial and structural. Categorial elements were NP, V, and so on. Structural elements were features that coded global properties of phrase markers; for example, NP-VP agreement in the men are here is coded by the [+plural] feature assigned to S and inherited by NP and VP through application of the rule [S, +plural] → [NP, +plural] [VP, +plural] (Chomsky 1951). Subsequent work factored the complexity into two components, restricting the symbols to just their categorial part (phrase structure rules forming phrase markers) and adding transformational rules to express global properties of expressions (Chomsky 1975a, Lees 1963, Matthews 1964, Klima 1964). A later step restricted the recursive part of the generative procedure to rules of the form (59) and separated the lexicon from the computational system (Chomsky 1965). This provided a two-level system: phrase structure rules and lexical insertion form D-Structure and transformations form the derived phrase markers of surface structure, then subjected to phonetic interpretation. The Standard Theory assumed further that only D-Structures are subjected to semantic interpretation, a position elaborated in Generative Semantics (Lakoff 1971). The Extended Standard Theory (EST) proposed that surface structure determines crucial elements of semantic interpretation (Jackendoff 1972, Chomsky 1972). Later work led to the four-level conception of EST outlined earlier, and the P&P approach, which dispenses entirely with rule systems for particular languages and particular constructions.

Separation of the lexicon from the computational system permits simplification of the rules (59) to context-free, with X, Y null. Thus, instead of (59), we have the context-free rules (60).

(60) a. A → Z
     b. B → l

Here A, B are nonterminal symbols, Z is a nonnull string of nonterminal symbols or grammatical formatives, and l is a position of lexical insertion. B is a nonbranching lexical category, and Z contains at most one lexical category. Z of (60a) is therefore as in either (61a) or (61b), where Ci is a nonlexical category, X and Y are strings of nonlexical categories, and L is a lexical category.

(61) a. A → C1 ... Cn
     b. A → XLY
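
As an informal illustration, the restriction on Z in (60a) can be stated as a small check. The sketch below is ours and rests on assumptions not made in the text: a hypothetical inventory of lexical categories (N, V, A, P), with every other symbol treated as nonlexical, and an invented function name. It only classifies the right-hand side of a rule as having the form (61a) or (61b).

```python
# Assumption: a toy inventory of lexical categories; any other symbol
# (NP, VP, ...) is treated as nonlexical for the purposes of this sketch.
LEXICAL = {"N", "V", "A", "P"}


def classify_rhs(rhs):
    """Classify the string Z of a rule A -> Z as form (61a) or (61b).

    (61a): Z consists only of nonlexical categories.
    (61b): Z contains exactly one lexical category L, flanked by
           (possibly empty) strings of nonlexical categories.
    """
    if not rhs:
        raise ValueError("Z must be a nonnull string")
    lexical_count = sum(1 for symbol in rhs if symbol in LEXICAL)
    if lexical_count == 0:
        return "(61a)"
    if lexical_count == 1:
        return "(61b)"
    raise ValueError("Z contains more than one lexical category")


print(classify_rhs(["NP", "VP"]))  # nonlexical categories only
print(classify_rhs(["V", "NP"]))   # one lexical category
```

A right-hand side with two lexical categories, such as V N, is rejected, reflecting the requirement that Z contain at most one lexical category.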

These moves exposed the crucial redundancy in phrase structure rules already discussed (sections 1.1, 1.2): the form of Z in (60a) depends on inherent properties of lexical items. Further redundancies are also immediately apparent. In (60b) the properties of the lexical category B are completely determined by the lexical element inserted in l. Considering the possible forms in (61), we observe further that in (61b) the properties of A are determined by L: thus, if L is N, A is NP; if L is V, A is VP; and so on. The rule is endocentric, with the head L of the construction projecting the dominating category A. Suppose we assume that the rules (61a) are also endocentric, taking A to be a projection of one of the Ci's (an expression of ideas developed in structural linguistics in terms of discovery procedures of constituent analysis (Harris 1951)). We now have rules of the form (62).

(62) a. Xⁿ → Z Xᵐ W
     b. X⁰ → l

Here n is typically m + 1 and Xⁱ is some set of categorial features (see (14)); and X⁰ is a lexical category. The element inserted in position l determines the features of Xⁱ and, to a substantial extent, the choices of Z and W. At this point phrase structure rules are largely eliminated from particular languages; they are expressed as general properties of UG, within the framework of X-bar theory.

A further proposal restricts the rules (62a) to the forms (63).

(63) a. Xⁿ → Z Xⁿ⁻¹
     b. Xᵐ → Xᵐ Y
     c. X¹ → X⁰ W

For n maximal, we use the conventional symbol XP for Xⁿ; n = 0 is often dropped, where no confusion arises. To form a full phrase marker, each X⁰ is replaced by a lexical element with the categorial features of X.

Suppose that n = 2 and m = 1 or 2 in (63), so that the possible rule forms are (64).

(64) a. X² → X² Y
     b. X² → Z X¹
     c. X¹ → X¹ Y
     d. X¹ → X⁰ W

[...] XP. Assume further that Z, Y are single symbols. We call Z the specifier (Spec) of X², the elements of W the complements of X⁰, and Y in (64a) an adjunct of X². The status of Y in (64c) is ambiguous, depending on further articulation of the theory; let us tentatively classify it as an adjunct. Note that the notions
specifier, complement, and adjunct are functional (relational), not categorial; thus, there is no categorial symbol Spec, but rather a relation specifier-of, and so on.

This is essentially the system of Chomsky 1981a, and the basis for further concepts defined there. We continue with these assumptions, turning later to modifications required under alternatives.

Muysken (1982) proposes that the bar levels are determined by the feature system [±projected, ±maximal]. Thus, X⁰ = [X, −projected, −maximal], X¹ = [X, +projected, −maximal]; X² = [X, +projected, +maximal]. Note that this approach permits a distinction between adjunction structures formed at D-Structure and by adjunction operations. See also Jackendoff 1977, Stowell 1981, Speas 1986, Fukui 1986, Baltin and Kroch 1989.

With the move to X-bar theory, the phrase structure system for a particular language is largely restricted to specification of the parameters that determine the ordering of head-complement, head-adjunct, and specifier-head. Choices above are typical for a head-initial language. The rules (62)–(64) themselves belong to UG (order aside), not to particular grammars. As discussed in sections 1.1 and 1.2, the elimination of phrase structure rules has always been a plausible goal for linguistic theory, because of their redundancy with ineliminable lexical properties. If X-bar theory can be sustained in its most general form, choice of items from the lexicon will determine the D-Structure phrase markers for a language with parameters fixed.

Items of the lexicon are of two general types: with or without substantive content. We restrict the term lexical to the former category; the latter are functional. Each item is a feature set. Lexical elements head NP, VP, AP, and PP, and their subcategories (adverbial phrases, etc.). At D-Structure and LF, each such XP must play its appropriate semantic role, satisfying FI, as discussed earlier. The heads of these categories have (1) categorial features; (2) grammatical features such as φ-features and others checked in the course of derivations, continuing to assume one of the interpretations of morphological structure discussed in section 1.1; (3) a phonological matrix, further articulated by the mapping to PF; (4) inherent semantic and syntactic features that determine s(emantic)-selection and c(ategorial)-selection, respectively. Thus, persuade has features determining that it has an NP and a propositional complement, with their specific θ-roles. As discussed in section 1.2, c-selection is at least in part determined by s-selection; if the determination is complete, we can restrict attention to s-selection. We may now assume that a complement appears at D-Structure only in a θ-position, θ-marked by its head. Since the computational rules can add no further complements, it follows that at every level, complements are θ-positions, in fact, θ-marked the same way
at each level (the Projection Principle). The Projection Principle and the related conditions on θ-marking provide a particular interpretation for the general condition FI at D-Structure and LF.

Functional items also have feature structure, but do not enter into θ-marking. Their presence or absence is determined by principles of UG, with some parameterization. Each functional element has certain selectional properties: it will take certain kinds of complements, and may or may not take a specifier. The specifiers typically (though perhaps not always) are targets for movement, in the sense discussed earlier. Hence, they have no independent semantic role at all. As suggested in section 1.3.1, we may assume them to be inserted in the course of derivation, unless some general condition on D-Structure requires their presence.

We assume that a full clause is headed by a complementizer C, hence is a CP, satisfying X-bar theory. C may have a specifier and must have a complement, a propositional phrase that we assume to be headed by another functional category I (inflection), which has the obligatory complement VP. Hence, a clause will typically have the form (65) (Bresnan 1972, Fassi Fehri 1980, Stowell 1981, Chomsky 1986a).

(65) [CP Spec [C′ C [IP Spec [I′ I VP]]]]

Specifiers are typically optional; we assume this is true of [Spec, CP]. The Extended Projection Principle (EPP) states that [Spec, IP] is obligatory, perhaps as a morphological property of I or by virtue of the predicational character of VP (Williams 1980, Rothstein 1983). The specifier of IP is the subject of IP; the nominal complement of VP is the object of VP. We take these to be functional rather than categorial notions; for different views, see Bresnan 1982, Perlmutter 1983. By the Projection Principle, the object is a θ-position. The subject may or may not be; it may be filled by an expletive or an argument at D-Structure. [Spec, IP] is therefore a potential θ-position. An actual or potential θ-position is an A-position; others are Ā-positions (A-bar positions). As matters stand at this point, complement and subject ([Spec, IP]) are A-positions, and [Spec, CP] and adjunct positions are Ā-positions. A chain headed by an element in an A-position is an A-chain; a chain headed by an element in an Ā-position is an Ā-chain. The distinction between A- and Ā-positions, and between A- and Ā-chains, plays a central role in the theory of movement and other modules of grammar. We return to some problems concerning these notions.

Recall the two interpretations of the syntactic rule R that associates lexical items with their inflectional features: word formation by adjunction, or checking (see section 1.1). If we adopt the former approach, it follows that the
operation R must apply in the D- to S-Structure derivation, because it feeds the rules of the phonological (PF) component. The checking alternative does not strictly imply that morphological properties must be determined by S-Structure, but we will assume that this is nevertheless true. It follows that the inflected head of VP must have its features assigned or checked by I at S-Structure, either through lowering of I to V or through raising of V to I (see sections 1.3.1, 1.3.3). In the lowering case the S-Structure chain is deficient. There must therefore be an LF operation that raises the adjunction structure [V–I] to replace the trace of the lowered I, voiding the potential violation and providing an LF similar to what we find in a language with raising at S-Structure (on some empirical consequences, see chapter 2). At LF, then, V will always be at least as high as I in (65).

The [V–I] complex may also raise further to C. In V-second languages such as Germanic generally, V raises to C and some other phrase raises to [Spec, CP] in the main clause (Den Besten 1989, Vikner 1990). The same phenomenon appears more marginally in English questions and some other constructions. We assume these to have the form illustrated in (66), who being in [Spec, CP], has raising to C and leaving the trace t, tw being the trace of who.

(66) [CP who has [IP John t [VP met tw]]]

By virtue of the general properties of X-bar theory, the only options in the pre-IP position, introducing a clause, are YP–X⁰ or X⁰; X⁰ may be null and commonly must be in embedded clauses if [Spec, CP] is nonnull (the Doubly Filled Comp Filter; see Keyser 1975). We assume that in general, overt movement of the question words is to the [Spec, CP] position, and the same is true of other constructions.

Structures of the form (65) may also appear in embedded position, as in the indirect question (67a) or the declarative clauses (67b).

(67) a. (I wonder) [CP who C [IP John has met tw]]
     b. i. (I believe) [CP that [IP John has met Bill]]
        ii. (I prefer) [CP for [IP John to meet Bill]]
        iii. (it was decided) [CP C [IP PRO to meet Bill]]

In (67a) and (67biii) the C head of CP is null; in (67bi) it is that; and in (67bii) it is for. The head of IP is [+tense] in (67a), (67bi); it is [−tense] in (67bii–iii). [Spec, CP] is unfilled in (67b), but it can be realized in other embedded constructions, for example, (67a), the relative clause (68a), or the complex adjectival clause (68b), where there is good reason to believe that Op
is an empty operator in [Spec, CP]. C is empty in both cases and t is the trace of Op.

(68) a. the man [CP Op C [IP John met t]]
     b. Mary is too clever [CP Op C [IP PRO to catch t]]

The embedded clauses of (68) are predicates, open sentences with a variable position. In (68a) Op could be who, also semantically vacuous in this case. As a matter of (nontrivial) empirical fact, FI at LF includes the property of strong binding: every variable must have its range fixed by a restricted quantifier, or have its value determined by an antecedent. Since the operators in (68) are vacuous, the value of the variable must be fixed by the antecedents man, Mary, the choice being determined by locality conditions on predication.

These properties suffice to explain such examples as (3c), repeated here as (69a), the if-clause having the form (69b).

(69) a. if Mary is too clever to expect anyone to catch, then we don't expect anyone to catch Mary
     b. Mary is too clever [CP Op C [IP PRO to expect [anyone to catch t]]]

The embedded CP is a typical case of long (successive-cyclic) movement, analogous to (70) with who in place of Op.

(70) (I wonder) [who he expected [them to catch t]]

The variable must not be bound by anyone or PRO in (69b), just as it must not be bound by the elements them or he in (70); we return to the operative principle of binding theory in sections 1.3.3, 1.4.2. By the strong binding condition, the variable must therefore have Mary as its antecedent. Furthermore, PRO must be arbitrary, for if it is bound by Mary (as in Mary is too clever [PRO to catch Bill]), then the variable will be bound by PRO, violating the principle just illustrated. We therefore have the interpretation (69a). Note that the account assumes crucially that binding is based upon an equivalence relation; see section 1.4.2.

On the same assumptions, we can reduce the problem of explaining the deviance of (71) to that of the deviance of overt operator movement, as in the analogous example of (72).

(71) a. *the man [you met people that caught t]
     b. *Mary is too clever [to meet [people that caught t]]

(72) *who did John meet people that caught t

In all cases the locality conditions on movement are violated. See section 1.4.1.

We have assumed so far that embedded infinitivals are CPs, as in (67bii–iii) or (73).

(73) I wonder who he decided [CP C [PRO to catch t]]

In such cases the embedded subject must be PRO if the C head is empty and must be an overt NP if it is the Case-assigning element for, with dialectal variation. But there are other propositional phrases in which neither PRO nor the Case-assigning complementizer for can appear, for instance, (74).

(74) a. John believes [Bill to be intelligent]
     b. John considers [Bill intelligent]
     c. that gift made [Bill my friend for life]

Thus, in (74a) we cannot have for Bill or PRO instead of Bill. Similarly, in such constructions as these, the embedded subject can be trace, unlike the infinitival CPs. Compare:

(75) a. Bill is believed [t to be intelligent]
     b. *Bill was decided [CP [t to be intelligent]]

In general, the embedded subject of (74) behaves very much as if it were an object of the verb of the main clause (the matrix verb), though it is not a θ-marked complement of the verb, but rather the subject of an embedded clause. Constructions of the form (74a) are rather idiosyncratic to English; in similar languages (e.g., German), the corresponding expressions have the properties of (67bii–iii), (73), and so on.

The embedded clause of (74a) contains I, hence IP; there is no evidence for any further structure. To account for the differences from the embedded CP infinitivals, we must assume either that the embedded clause is just IP, or that there is an EC complementizer that assigns Case, like for (Kayne 1984). On the former assumption, which we will pursue here, the embedded subject is governed by the matrix verb, a relation that suffices to assign Case, license trace, and bar PRO, as in verb-object constructions. Note that the question whether (75a) is a raising construction (like John seems [t to be intelligent]) or a passive construction (like his claims were believed t) does not arise, these concepts having been discarded as taxonomic artifacts (section 1.1). The construction is formed by Move α as a last resort, the Case-assigning property of the verb having been absorbed by the passive morphology. In the examples of (74b–c) there is no overt functional head. Assuming the phrase boundaries indicated, either there is an EC I, or the embedded phrases are projections of their predicates, so-called small clauses (Stowell 1978, 1981). Either way,
Bill is the subject of the embedded clause, behaving as in (74a) and unlike the subject of an embedded CP.

We have so far considered two functional categories: I and C. A natural extension is that just as propositions are projections of functional categories, so are the traditional noun phrases. The functional head in this case is D, a position filled by a determiner, a possessive agreement element, or a pronoun (Postal 1966a, Brame 1981, 1982, Abney 1987). The phrases that picture of Bill and John's picture of Bill would therefore have the forms (76).

(76) a. [DP that [NP picture of Bill]]
     b. [DP John Poss [NP picture of Bill]]

In (76a) [Spec, DP] is missing; in (76b) it is filled by the subject of the DP, John, to which the affix Poss is adjoined by a phonological operation. The D head is that in (76a) and Poss in (76b) (in some languages, for instance Turkish, manifesting visible agreement with the subject; see Kornfilt 1985). Noun phrases in the informal sense are thus similar in internal structure to clauses (possibly even containing a complementizer position; Szabolcsi 1987). We might expect, then, to find N-raising to D, analogous to V-raising to I; see Longobardi 1994. There are numerous other consequences, which we cannot pursue here. We will use the informal notation Noun Phrase for DP or NP, unless confusion would arise.

We might ask whether these considerations generalize to other major categories, so that AP and VP are also complements of a functional element, even in V–VP or Modal–VP constructions. If so, a natural choice would be an element involved in Case assignment and agreement (call it Agr, a collection of φ-features). Such possibilities suggest a reconsideration of the functional element I, which has the strange property of being double-headed in the version of X-bar theory we are considering, assuming that T(ense) and Agr are independent heads. Following Pollock (1989), let us assume that T and Agr head separate maximal projections. Assuming that VP (and AP) is a complement of Agr, we now have the structure [Spec–T–Agr–VP] for the phrase we have called IP (now a term of convenience only), with T having AgrP as its complement, and VP, AP being complements of the Agr head of AgrP. Pollock argues on different grounds for the same structure: [Spec–T–Agr–VP]. In this structure the specifier of IP is not commanded (c- or m-commanded) by Agr, hence not governed by it. Hence, if (as we assume throughout) the operative relations among elements are based on such local relations, there would be no natural expression of subject-verb agreement. There is other evidence to suggest that the order should be Agr–T (Belletti 1990), where Agr is involved in subject agreement and nominative Case
assignment. The proper reconciliation of these conflicting proposals may be that there are two Agr elements in IP, each a collection of φ-features, one involved in subject agreement and subject Case, the other in object agreement and object Case. Thus, the full structure will be (77), where AgrS and AgrO are informal notations to distinguish the two functional roles of Agr, Spec indicates a functional role as before, and IP = AgrP.

(77) [IP Spec [AgrS′ AgrS [TP T [AgrOP Spec [AgrO′ AgrO VP]]]]]

Here we omit a possible [Spec, TP]. Embedded in this structure there might also be a phrase headed by the functional element Negation, or perhaps more broadly, a category that includes an affirmation marker and others as well (Pollock 1989, Laka 1990). We might proceed to assume that Case and agreement generally are manifestations of the Spec-head relation (Koopman 1987, Mahajan 1990; also see section 1.4.3 and chapters 2, 3).

The status of [Spec, IP] is anomalous in several respects. One is that it may or may not be a θ-position, depending on lexical choices. Thus, in (78) the subject of hurt is a θ-position occupied by the trace of the argument John, taken to be the agent of hurt; but the subject of seems is a non-θ-position, which can also be occupied by the expletive it.

(78) a. John seems [t to have hurt himself]
     b. it seems [that John has hurt himself]

[Spec, IP] is also the only position in which θ-role is not assigned within the m-command domain of a lexical head.

Such idiosyncratic properties would be eliminated if we were to assume that a thematic subject originates from a position internal to VP, then raising to [Spec, IP]. Collapsing the inflectional nodes to I for convenience, the D-Structure underlying John met Bill would then be (79) (Kitagawa 1986, Kuroda 1988, Sportiche 1988, Koopman and Sportiche 1991).

(79) [IP [NP e] [I′ I [VP John [V′ met Bill]]]]
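
As an informal illustration, the structure (79) can be encoded as a nested tree in order to check which positions fall within the m-command domain of the verb. The sketch below is ours and makes several assumptions: the tuple encoding and function names are invented for the example, labels ending in P are taken to be the maximal projections, leaf labels are assumed to be unique, and m-command is given its standard definition (α m-commands β if α does not dominate β and every maximal projection dominating α also dominates β).

```python
# Structure (79): nodes are (label, child, ...) tuples, leaves are strings.
TREE_79 = ("IP",
           ("NP", "e"),
           ("I'",
            "I",
            ("VP",
             "John",
             ("V'", "met", "Bill"))))


def leaf_paths(node, path=()):
    """Map each leaf to its path of child indices from the root."""
    if isinstance(node, str):
        return {node: path}
    paths = {}
    for index, child in enumerate(node[1:]):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def node_at(tree, path):
    """Return the node reached by following child indices from the root."""
    node = tree
    for index in path:
        node = node[1 + index]
    return node


def maximal_ancestors(tree, path):
    """Paths of the dominating nodes whose label ends in 'P' (assumed XP)."""
    return [path[:k] for k in range(len(path))
            if node_at(tree, path[:k])[0].endswith("P")]


def m_commands(tree, a, b):
    """True if leaf a m-commands leaf b (leaves dominate nothing)."""
    paths = leaf_paths(tree)
    path_a, path_b = paths[a], paths[b]
    return a != b and all(path_b[:len(q)] == q
                          for q in maximal_ancestors(tree, path_a))


for target in ("John", "Bill", "e"):
    print("met m-commands", target, ":", m_commands(TREE_79, "met", target))
```

Under these assumptions both John in [Spec, VP] and the object Bill are m-commanded by met, while the empty [Spec, IP] position is not; it is m-commanded only by I.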

The subject and object are now θ-marked within the m-command domain of the verb met, within VP. On the present assumptions, John is [Spec, VP] and raises to [Spec, IP] to receive Case and produce a visible chain. By LF, met will have raised to I. If V raises to I at S-Structure and its subject raises to [Spec, IP] only at LF, we have a VSO language (at S-Structure). If the θ-role assigned to subject (the external θ-role, in the sense of Williams 1980) is in part compositionally determined (Marantz 1984), then these properties might be expressed internal to VP, as properties of the paired elements (subject, V′).

The assumptions sketched out here provide a certain version of a universal base hypothesis, a notion that has been explored from various points of view. If they are on the right track, typological variation should reduce to the ordering parameters and properties of functional elements. As discussed earlier, we expect that D-Structure and LF vary little in essential properties, D-Structure reflecting lexical properties through the mechanisms of X-bar theory and the parametric options for functional elements, and LF being the outcome of an invariant computational process that maps D-Structure to S-Structure and then to LF. A further proposal is that there is a uniform structural representation of θ-roles: thus, agent is typically associated with [Spec, VP], theme or patient with complement to V, and so on. This appears more plausible as evidence mounts questioning the existence of ergative languages at the level of θ-theory (Baker 1988, Johns 1987). See section 1.2.

We have so far kept to the assumption of Chomsky 1981a that all internal θ-roles (all θ-roles apart from the role of subject) are assigned to sisters of the head. This assumption has repeatedly been questioned and has largely been abandoned. To mention a few cases, Kayne (1984) proposes that all branching is binary (yielding unambiguous paths). If so, some internal θ-roles will be assigned to nonsisters. Kayne suggests, for example, that double-object verbs have the structure in (80), in which case give will θ-mark NPs properly contained within its complement.

(80) give [Mary books]
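
As an informal illustration, the consequence of binary branching for sisterhood can be made concrete by comparing a flat structure with the binary structure (80). The sketch below is ours; the tuple encoding, the function names, and in particular the label XP for the bracketed constituent [Mary books] are hypothetical, since (80) leaves that constituent unlabeled.

```python
# Nodes are (label, child, ...) tuples; leaves are plain strings.
FLAT = ("VP", "give", "Mary", "books")             # ternary branching
BINARY = ("VP", "give", ("XP", "Mary", "books"))   # as in (80); "XP" is a placeholder label


def is_binary(node):
    """True if no node in the tree has more than two daughters."""
    if isinstance(node, str):
        return True
    children = node[1:]
    return len(children) <= 2 and all(is_binary(child) for child in children)


def sisters(tree, target):
    """Return the sisters of the leaf `target`, or None if it is absent."""
    if isinstance(tree, str):
        return None
    children = tree[1:]
    if target in children:
        return [child for child in children if child != target]
    for child in children:
        found = sisters(child, target)
        if found is not None:
            return found
    return None


print(is_binary(FLAT), sisters(FLAT, "give"))
print(is_binary(BINARY), sisters(BINARY, "give"))
```

In the flat structure both NPs are sisters of give; in the binary structure the only sister of give is the constituent containing them, so the NPs it θ-marks are properly contained within its complement.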

Similar ideas have been pursued in other studies as well. Belletti and Rizzi (1988) argue that the underlying structure of psych-verb constructions such as the problem disturbed John is (81), where the sister of disturb is assigned the θ-role theme (as usual), then raising to [Spec, IP], while the sister of V′ receives the θ-role experiencer (see also Pesetsky 1995, Bouchard 1991).

(81) [VP [V′ disturb [NP the problem]] [NP John]]

Larson proposes that double-object verbs such as give enter into D-Structures of the form (82) (Larson 1988, 1990; for an opposing view, see Jackendoff 1990a).

(82) [VP John [V′ [V e] [VP [NP a book] [V′ gave to-Bill]]]]

V raises to the empty main verb position of the higher VP shell, yielding John gave a book to Bill. Alternatively, operations similar to those yielding the passive construction could absorb the Case of Bill, forcing it to raise to the subjectlike position of a book, which in turn becomes an adjunct, yielding John gave Bill a book. In (82) the direct object a book, though θ-marked as theme by the verb, is not its sister. Larson also indicates that adverbs are the innermost complements of V. Thus, the structure underlying John read the book carefully would be (83).

(83) [VP John [V′ [V e] [VP [NP the book] [V′ [V read] carefully]]]]

In this case the sister of the verb is an adverb that is not θ-marked at all, and the sole internal θ-role is assigned to a nonsister (the book).

With such modifications, the notion θ-position is still well defined, but A- and Ā-position are not. These notions are formally quite different in character. A particular occurrence of a category in a phrase marker is, or is not, a θ-position, depending on whether it is θ-marked in that phrase marker. The notion A-position, however, depends upon potential θ-marking, which is to say that it presupposes an equivalence relation among phrase markers: an A-position is one that is θ-marked in the equivalent position of some member of the equivalence class. This is not an entirely straightforward notion, and with modifications of the sort just outlined, it becomes unspecifiable in any way that will bear the considerable theoretical burden that has been laid on the A versus Ā distinction, which enters crucially into large areas of current work.

The intuitive content of the distinction to be captured is reasonably clear. θ-positions and specifiers of inflectional elements share a range of structural properties; other non-θ-marked positions ([Spec, CP], elements adjoined to XP, non-θ-marked positions governed by a head) share a different range of structural properties. These are the former A- and Ā-positions, respectively. There are various proposals as to how to capture this distinction in terms of natural classes, and how to extend and sharpen it (e.g., for [Spec, DP]).

One approach (see chapter 3) is based on the observation that certain functional elements are, in effect, features of a head, in that they must be adjoined to this head to check its inherent features (alternatively, to assign these inherent features to it). Tense and the Agr elements are features of V in this sense, but C is not. Given a lexical head L, we say that a position is L-related if it is the
specifier or complement of a feature of L. The L-related positions are the former A-positions, with the exception of non-θ-marked elements such as carefully in (83). But this exception will not be problematic if independent considerations block movement of such elements to any L-related position (raising). If economy considerations permit raising only when it is required (i.e., only Last Resort movement), then the issue will not arise; see sections 1.1, 1.3.1.

Along these lines, one might reconstruct something like the A versus Ā distinction. The account now relies on properties of occurrences of a category in a phrase marker, without reference to equivalence classes of phrase markers. Other uses of these notions, as in binding theory, appear to fall into place without too much difficulty. We leave the matter with these informal indications of a direction to explore, merely noting here that certain concepts that serve as foundations for much current work were originally defined on the basis of assumptions that have been widely abandoned and therefore must be reconstructed in some different way. With these qualifications, we will continue to use the notions with their intuitive content, as is standard in current technical work.

1.3.3 Derived Syntactic Representations

We have adopted the EST assumption that the derivations from D-Structure to PF and LF have a common part: D-Structure is mapped to S-Structure by Affect α, and the derivation then branches into two independent paths, one forming PF, the other forming LF (the PF component and the LF component, respectively). These are the two external interface levels. Since our concern here is syntax in the narrow sense, we restrict ourselves to the computation from D-Structure to LF.

The part of this derivation that maps S-Structure to LF is sometimes trivial, but whenever structural properties relevant to meaning are not already expressed at S-Structure, this mapping is substantive. Following Chomsky (1977), May (1977), we assume that scope of operators is structurally represented at LF in terms of c-command. For interrogative operators, as will be discussed below, movement to an appropriate scope position takes place sometimes between D-Structure and S-Structure and sometimes between S-Structure and LF. Movement of quantifiers (May's quantifier raising, QR) is generally an S-Structure to LF operation. The examples of inversely linked quantification discussed by May, as in (84), clearly indicate that S-Structure configuration does not suffice.

(84) everybody in some Italian city likes it.


Here some Italian city has wide scope, even though at S-Structure it is contained within the universally quantified NP. The correct interpretation is structurally represented in (85), with the entire subject NP having undergone QR, and the existential expression having raised still further.

(85) [IP [some Italian city]_i [IP [everybody in t_i]_j [IP t_j likes it]]]

See May 1977, 1985, for further motivation for QR.

Since it is an interface level, there are further requirements on LF. Given FI, every element of the LF representation of an expression must be subject to interpretation at the interface. As noted in section 1.1, this entails that there should be no true expletives in an LF representation. In such expressions as (86), then, the expletive element there must somehow be eliminated in the mapping from S-Structure to LF.

(86) there is a man in the room

One possibility that can be readily dismissed is that the expletive is simply deleted. The EPP demands that a clause have a subject at every syntactic level. Deletion of there would violate this requirement at LF. The expletive also appears to have φ-features that enter into agreement with the inflected verb. In (86) those features are [3 person, singular]; in (87) they are [3 person, plural].

(87) There are men in the room

A strong form of recoverability of deletion would presumably prevent the deletion of an element with φ-features. Given that there must be eliminated and cannot be deleted, the remaining possibility is that it is the target of a movement operation, with the associate of the expletive (a man in (86) and men in (87)) moving to the position of the expletive. Whether it is construed as substitution or adjunction, we may assume that this operation produces a new element combining the relevant features of the expletive and its associate: [there, a man] in (86), [there, men] in (87). Let us call this an amalgamated expletive, leaving open its exact form.

We now have an account for the apparently anomalous rightward agreement in these cases, that is, the fact that the inflected verb agrees with the NP that follows it: is and are cannot be interchanged in (86), (87). The LF movement analysis directly predicts this paradigm. There must be replaced, but the phrase amalgamating with it must be nondistinct from it in features. If the operation is substitution, this requirement will follow from the recoverability condition. If the operation is adjunction, it will follow from a feature-matching requirement. Alternatively, we might assume that there lacks


φ-features and that the overt agreement is an S-Structure reflex of agreement

at the LF level between the inflected verb and the amalgamated expletive, its agreement features provided by the associate. Note further that one of the central properties of these constructions (that there is an argument associated with the expletive) also follows, since FI demands that the expletive be replaced.

From an S-Structure corresponding to (86), then, we derive the LF representation (88), t the trace of a man.

(88) [there, a man] is t in the room

Since the expletive occupies an A-position at S-Structure ([Spec, IP]), the LF movement forming the amalgamated expletive is A-movement. It follows that the relation between the associate and its trace meets the narrow conditions on A-movement. We now have an account for the fact that in the overt expression, the expletive and its associate conform to the locality requirements of A-chains. This follows from the fact that at LF, they are amalgamated to form an A-chain. We therefore have expletive-associate relations of the kind illustrated, but not those of (89), analogous to (90).

(89) a. *there seems that a man is in the room
     b. *there seems that John saw a man
     c. *there was thought that [pictures of a man were on sale]

(90) a. *a man seems that t is in the room
     b. *a man seems that John saw t
     c. *a man was thought that [pictures of t were on sale]

Note that the locality condition on the expletive-associate pair is that of A-movement, not binding, which is permissible in the analogue to (90c).

(91) we thought that [pictures of each other were on sale]

We return in section 1.4.3 to some problematic features of this analysis.

In section 1.3.1 we alluded to an approach to Case in terms of visibility for θ-marking. Expletives appear to contradict the principle, since they are not θ-marked but appear only in positions to which Case is assignable; in fact, only in a subset of such positions (subjects), but this follows from the fact that D-Structure complements are present only if they have a semantic role (typically, a θ-role).
Thus, we find (92a) with nominative there and (92b) with accusative there, but (92c) is impossible.

(92) a. I believe [there is a man here]
     b. I believe [there to be a man here]
     c. *I tried [there to be a man here]
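The agreement pattern of (86)-(87) and the Case pattern of (92) can be restated as a small sketch. It is only an illustration under our own assumptions: the names NP, amalgamate, copula_for, CASE_AT, and expletive_allowed are hypothetical, and the position labels are informal descriptions of the bracketed subject positions in (92), not terms of the theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NP:
    form: str
    number: str  # "singular" or "plural"


def amalgamate(expletive: str, associate: NP) -> dict:
    """Build the LF element [expletive, associate]; number comes from the associate."""
    return {
        "form": f"[{expletive}, {associate.form}]",
        "person": 3,
        "number": associate.number,
    }


def copula_for(amalgam: dict) -> str:
    """Present-tense 'be' agreeing with the amalgamated expletive (hypothetical helper)."""
    return "is" if amalgam["number"] == "singular" else "are"


# Case available to the bracketed subject position in (92a-c); labels are ours.
CASE_AT = {
    "finite clause under believe": "nominative",  # (92a)
    "infinitive under believe": "accusative",     # (92b)
    "infinitive under try": None,                 # (92c)
}


def expletive_allowed(position: str) -> bool:
    """The associate occupies the expletive's position at LF, so that position
    must be Case-marked for the associate to be visible for theta-marking."""
    return CASE_AT[position] is not None


a_man = amalgamate("there", NP("a man", "singular"))
men = amalgamate("there", NP("men", "plural"))
print(a_man["form"], copula_for(a_man))
print(men["form"], copula_for(men))
for position in CASE_AT:
    print(position, expletive_allowed(position))
```

The sketch takes the second option mentioned in the text, on which there contributes no number feature of its own; the substitution and adjunction variants would differ only in how nondistinctness is enforced.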


But now these facts fall neatly under the visibility approach. At LF we will have (93), where t is the trace of a man and EA is the amalgamated expletive.

(93) a. I believe [[EA there, a man] is t here]
     b. I believe [[EA there, a man] to be t here]
     c. *I tried [[EA there, a man] to be t here]

When an expletive is in a Caseless position at S-Structure, its associated argument will necessarily be in that position at LF and will, as a consequence, be invisible for θ-marking.

The analysis just sketched suggests that Case is checked at LF even though manifest at S-Structure; that is, it suggests that conditions requiring checking or assignment of Case are LF conditions, not S-Structure conditions, despite appearances. The same conclusion is suggested by the general approach to Case in terms of visibility, which links Case assignment to θ-theory. As discussed earlier, there is a preference on general conceptual grounds for interface conditions rather than S-Structure conditions. The various considerations so far adduced point in the same direction, but serious problems arise in trying to pursue this course. We return to the topic in section 1.4.3.

Turning to the S-Structure representation, with parameters fixed this is determined (presumably uniquely) by the choice of D-Structure and LF representations. S-Structure is unlike the three basic levels (D-Structure, PF, LF) in that it satisfies no constraints external to the computational system. It would therefore be reasonable to expect that conditions involving the interface (in particular, conditions bearing on the semantic interpretation of SDs) should be restricted to the interface levels themselves, not applying at S-Structure. Nevertheless, there may be conditions of UG that must be satisfied at the S-Structure level.

There is some cross-linguistic variation in the character of S-Structure; in particular, functional elements vary in the ways they are articulated at S-Structure and hence are realized overtly.
Languages may also differ, as noted, with regard to the placement of S-Structure in the derivation of LF from D-Structure, that is, the point of branching to PF. One well-studied case concerns the application of Move α that determines the scope of a question phrase (commonly called the wh-phrase, by historical accident), moving it to the periphery of the proposition.

In English-type languages the effects of the movement operation are visible, yielding the S-Structure form (94), where t is the trace of what.

(94) a. what do you want [John to give t to Bill]
     b. what do you want [John to give t to whom]


In a multiple question such as (94b), only one of the question phrases moves by S-Structure.

In the counterpart to (94a) in a Chinese-type language, the analogue to what is in situ at S-Structure, occupying the position of the trace in (94). We assume, following Huang 1982 and much subsequent work, that the phrase is moved to clause-peripheral position at LF, yielding an LF form resembling (94). More generally, in both types of language all question phrases will have moved to scopal position under this operation in the course of the derivation, within the LF component if not before (Higginbotham and May 1981, Aoun, Hornstein, and Sportiche 1981).

The D-Structure forms are therefore alike in relevant respects in English and Chinese-type languages, as are the LF forms, the standard expectation (see section 1.1). But the S-Structure forms differ, depending on whether the operation that places the question phrase in the position that determines scope applies before or after the branching to the PF component at S-Structure. One type of language (English, French, etc.) employs overt movement of a question phrase in the course of derivation of S-Structure from D-Structure, feeding the phonological component; another type (Chinese, Japanese, etc.) leaves all question phrases in situ at S-Structure. Both types of language employ covert movement within the LF component for any in-situ question phrase. A third type of language (e.g., Polish) has overt movement of all question phrases. D-Structure and LF representations are again similar to the other two language types, but the S-Structures differ (Lasnik and Saito 1984).

Given a narrow theory of parametric variation of the sort discussed, these three language types should differ in properties of functional features. Cheng (1991) argues that mood (interrogative, declarative, etc.)
must be indicated at S-Structure in the pre-IP position, hence by choice of either C or [Spec, CP]; the head of CP and its specifier thus serve as force indicators in something like the Fregean sense. If the lexicon contains an element Q (marking yes-no questions), then this element will suffice to identify an expression as an interrogative whether or not it contains an in-situ question phrase. There is no need, then, for the question phrase to raise to [Spec, CP] at S-Structure. Lacking the element Q, a language must employ overt movement of a question phrase to [Spec, CP] to be identified as an interrogative at S-Structure.

Suppose further that economy principles favor operations that do not feed the PF component over others that do; hence, if operations need not be overt to satisfy some condition, they will be assigned to the LF component, applying as late in the derivation as possible, at the point where they are forced by LF conditions (in the case under discussion, conditions of scope). These assumptions lead us to expect two basic categories of language in the simplest


case: (1) languages with a Q element and the question phrase in situ (Chinese, Japanese); and (2) languages lacking a Q element and with a single question word in [Spec, CP] (English, German). At LF all question phrases will have moved, so that the quasi quantifier can be interpreted with its scope determined and a bound variable heading an argument chain. Other typological differences should then be reducible to internal morphology of the question phrase: for instance, languages of the Polish-Hungarian type with multiple fronting of question phrases at S-Structure (though perhaps not to [Spec, CP]; see Cheng 1991). On assumptions such as these, there are conditions that must be satisfied by S-Structure representations.

Overt and covert movement might have different properties. Huang (1982) proposed that the bounding conditions on overt movement are relaxed in the LF component so that we have such pairs as (95a) in English and (95b) in Chinese.

(95) a. *who do you like [books that criticize t]
     b. ni xihuan [piping shei de shu]
        you like [criticize who REL book]

Both expressions have the interpretation "for which person x, you like books that criticize x," but only (95b) is well formed. The English example (95a) violates a locality condition on movement (Subjacency); its Chinese counterpart is free from this constraint (for varying approaches, see, among others, Huang 1982, Lasnik and Saito 1984, Nishigauchi 1986, Fiengo et al. 1988, Watanabe 1991).

A similar phenomenon is found in multiple questions in English-type languages. Thus, English (96a) is well formed with the interpretation (96b) expressed in the LF form (96c).

(96) a. who [t likes books that criticize whom]
     b. for which persons y, x, [x likes books that criticize y]
     c. [whom_j, who_i] [t_i likes books that criticize t_j]

We have assumed that overt movement, as in (94) or (96a), places the question phrase in the position [Spec, CP].
Possibly covert movement, not required for mood specification, may adjoin the question phrase to IP, treating it like a quantifier phrase assigned scope by QR. Typically, such question phrases as who, whom share semantic and distributional properties of quantifier phrases, and might be composed of an indefinite quantifier, a wh-feature, and the restriction on the quantifier (Chomsky 1964, Kuroda 1965, Nishigauchi 1986, Kim 1990, Watanabe 1991). Accordingly, who would be composed of [some x, wh-, x a person]; and so on. It would not then be surprising if such question


phrases were to share properties of the indefinite quantifier, adjoining to IP in

the LF component by QR, though it remains to explain why they move so freely, unlike QR, which is typically clause-bound.

In English-type languages, relative clauses are formed in much the same manner as interrogatives: an operator phrase, which may be either an EC operator Op or morphologically identical to a question phrase, is moved to [Spec, CP], leaving a trace that functions as a variable, as in (97).

(97) a. the people [who John expected to meet t]
     b. the people [Op (that) John expected to meet t]

In either case, the relative clause is an open sentence functioning as a predicate (see (68)). In these constructions, movement is in the overt (pre-S-Structure) syntax, as shown in (97a), and satisfies the bounding conditions on overt movement, as illustrated in (98).

(98) a. *the man [who you like books that criticize t]
     b. *the man [Op (that) you like books that criticize t]

While Chinese and Japanese have question words in situ, relative clauses show the properties of overt movement (Huang 1982, Watanabe 1991, Ishii 1991). These observations suggest that relative clauses require overt movement. The reason might be that predication must be established at S-Structure (Williams 1980). If so, we have another example of an S-Structure condition. It would remain to extend the analysis to languages that form relatives with in-situ pronouns (resumptive pronouns) and full NP heads in the position of the variable above (Sells 1984, Demirdache 1991).

These considerations extend to other constructions with EC operators, such as the complex adjectivals discussed in section 1.3.2 ((68)-(69)), with the locality properties of overt movement (repeated here).

(99) a. Mary is too clever [CP Op C [IP PRO to expect [anyone to catch t]]]
     b. *Mary is too clever [CP Op C [IP PRO to meet [anyone who caught t]]]

Given the locality properties, the open sentences functioning as predicates must have been formed by overt movement, pre-S-Structure.

Some semantic properties of linguistic expressions appear to be determined by S-Structure configurations, independently of operations of the LF component. Let P be such a property. Then two accounts are possible.

(100) a. P holds at S-Structure.
      b. P holds at LF under reconstruction, that is, with the moved phrase treated as if it were in the position of its trace.


If the former is correct, then the property P involves a condition on S-Structure.

There are various ways of construing the notion of reconstruction.

A good deal of insight into these questions derives from the principle of binding theory (call it Command) stipulating that a pronoun cannot c-command its antecedent (see sections 1.3.2, 1.4.2). We can formulate this as a requirement that an r-expression α must be A-free, that is, not c-commanded by a pronoun in an A-position linked to α in the binding-theoretic sense. Thus, in (101a) and (101b) John is A-free; the pronoun (him, his) does not c-command John and can take John as its antecedent. But in (101c) he c-commands John and must be assigned reference in some other way.

(101) a. John thought Mary took a picture of him
      b. [his mother] thought Mary took a picture of John
      c. he thought Mary took a picture of John

The principle Command applies to r-expressions generally, hence to variables as well as John, as we see in (102), analogous to (101), with the trace of who in the position of John in (101).

(102) a. the man who [t thought Mary took a picture of him]
      b. the man who [[his mother] thought Mary took a picture of t]
      c. the man who [he thought Mary took a picture of t]

In (102a) and (102b) the pronoun does not c-command t. Even if the pronoun and variable are referentially linked, the variable is A-free, though Ā-bound by its operator. The variable and the pronoun can now be construed as variables bound (Ā-bound) by who. The interpretations are "the man x such that x thought Mary took a picture of x," "the man x such that x's mother thought Mary took a picture of x," respectively; the deviance of (102b), if any, is slight (Chomsky 1982, Higginbotham 1983, Lasnik and Stowell 1991).

But in (102c) he c-commands t and therefore cannot be linked to this variable or it will not be A-free; (102c) therefore cannot have the interpretation "the man x such that x thought Mary took a picture of x." There is nothing wrong with this interpretation; in fact, it is the interpretation of (102a).
But it cannot be assigned to (102c), by virtue of Command (the property of strong crossover; Postal 1971, Wasow 1972, Lasnik 1976).

The principle Command also enters into the explanation of the meaning of the complex adjectivals of (99), as discussed earlier (see (68)-(69)). We now ask at what level Command applies. Consider the examples (103).

(103) a. you said he liked [the pictures that John took]
      b. [how many pictures that John took] did you say he liked t
      c. who [t said he liked [how many pictures that John took]]
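The principle Command lends itself to a mechanical check. The following is a minimal sketch, assuming a bare phrase marker in which c-command is computed from the first branching node; the class Node and the functions c_commands, command_allows, and sentence are hypothetical names introduced for illustration, and the constituent structure assigned to (101) is deliberately simplified (a picture of is treated as a single leaf).

```python
class Node:
    """A node in a phrase marker; leaves have no children."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if self properly dominates other."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a, b):
    """a c-commands b if neither dominates the other and the first
    branching node dominating a also dominates b."""
    if a is b or a.dominates(b) or b.dominates(a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and node.dominates(b)


def command_allows(pronoun, r_expression):
    """Command: the pronoun may take the r-expression as antecedent
    only if it does not c-command it."""
    return not c_commands(pronoun, r_expression)


def sentence(subject, target):
    """[IP subject [VP thought [IP Mary [VP took [NP a picture of target]]]]]"""
    picture = Node("NP", [Node("a picture of"), target])
    lower = Node("IP", [Node("Mary"), Node("VP", [Node("took"), picture])])
    return Node("IP", [subject, Node("VP", [Node("thought"), lower])])


# (101a): pronoun 'him' inside the picture NP, 'John' as matrix subject
john_a, him = Node("John"), Node("him")
sentence(john_a, him)
allowed_101a = command_allows(him, john_a)

# (101b): pronoun 'his' inside the subject NP
john_b, his = Node("John"), Node("his")
sentence(Node("NP", [his, Node("mother")]), john_b)
allowed_101b = command_allows(his, john_b)

# (101c): pronoun 'he' as matrix subject
john_c, he = Node("John"), Node("he")
sentence(he, john_c)
allowed_101c = command_allows(he, john_c)

print(allowed_101a, allowed_101b, allowed_101c)
```

Since the check is stated over whatever phrase marker it is handed, the question raised by (103) amounts to asking which representation, the S-Structure or the LF form, should be passed to it.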


In (103a) he c-commands John and cannot take John as antecedent; in (103b)

there is no c-command relation and John can be the antecedent of he. In the multiple-question-phrase construction (103c) John in fact cannot be the antecedent of he. It must be, then, that he c-commands John at the level of representation at which Command applies; the binding properties of (103c) are those of (103a), not (103b).

Returning to the two options of (100), we seem to be led here to adopt the first: that Command applies at S-Structure, before the bracketed question phrase is moved to preclausal position at LF, at which point (103c) would be formally similar to (103b), not (103a). Alternatively, we could assume, in the face of examples such as these, that the second option, reconstruction, holds for LF raising but not overt movement. More simply, we could dispense with both options, rejecting the tacit assumption that LF movement formed (104) from (103c), t′ the trace of the LF-moved phrase.

(104) [[how many pictures that John took] who] [t said he liked t′]

Recalling that LF movement does not meet the strict locality conditions of S-Structure movement, we might reject the assumption that the entire NP is pied-piped when how many is raised to the scopal position, assuming rather that how many is extracted from the NP, yielding an LF form along the lines of (105), t′ the trace of how many.

(105) [[how many] who] [t said he liked [t′ pictures that John took]]

The answer, then, could be the pair (12, Bill), meaning that Bill said he liked 12 pictures that John took. But in the LF form (105), he c-commands John so that Command applies as in (103a). Pursuing such lines as these, we would not be led to adopt the assumption that Command applies at S-Structure, leaving us with the preferable option that conditions involving interpretation apply only at the interface levels.
A further consequence would be that (103b-c) have somewhat different forms at LF; the empirical effect is unclear (Hornstein and Weinberg 1990).

Other constructions illustrate the process of reconstruction and are thus consistent with the restriction of the conditions on interpretation to the LF level. Consider (106).

(106) a. they said he admires John's father
      b. who [t said he admires John's father]
      c. (guess) whose father [they said he admires t]

In (106a) and (106b) he c-commands John and cannot take John as its antecedent, given Command. In (106b) he does not c-command t, so both can be taken


as variables bound by who, yielding the interpretation "for which person x, x said x admires John's father." In (106c) he does not c-command who, but it cannot be taken as a variable bound by who, even though this interpretation would leave t A-free. The complement of guess is interpreted as (107) with he unbound, analogous to (106a).

(107) for which person x [they said he admires x's father]

Thus, we have reconstruction: treatment of [whose father] as if the phrase were in the position of its trace t in (106c) (Chomsky 1977, Freidin and Lasnik 1981).

Questions proliferate quickly with further inquiry. Consider, for example, such constructions as (108), formed by successive-cyclic movement of the question phrase from the position of t, to the position of t′, to [Spec, CP] of the matrix clause.

(108) a. [which picture of himself] did John say [t′ that Bill liked t best]
      b. [which pictures of each other] did they say [t′ that we liked t best]

Barss (1986) observes that the anaphor can take either of the italicized NPs as its antecedent. But an anaphor can only be bound by the closest c-commanding subject, as we see in the corresponding expressions (109), without wh-movement.

(109) a. John said [that Bill liked [that picture of himself] best]
      b. they said [that we liked [those pictures of each other] best]

Here the antecedents must be Bill, we. In (108) the same binding condition requires that each of the traces be visible, the question phrase being interpreted for binding as if it were in one or the other of these positions (chain binding).

Another problematic example is (110a), with the interpretation (110b) and, on our current assumptions, the LF representation (110c) (Higginbotham 1980, 1983).

(110) a. guess which picture of which boy [they said he admires t]
      b. for which boy x, which picture y of x, [they said he admires y]
      c. [[which boy]_i [which picture of t_i]]_j [they said he admires t_j]

Reconstruction in the manner of (106c) and (107) does not yield a structure barred by Command.
Nevertheless, he cannot be construed as an occurrence of the bound variable x.

The formal property entering into reconstruction here seems to be that the pair (r-expression α, pronoun β) are referentially disconnected at LF if there is a γ such that γ contains α and β c-commands γ or its trace. But that principle,

the pronoun in (103b). The discrepancy suggests that the problem with (110) lies elsewhere.

The problems are more general. Consider (111).

(111) a. the claim that John was asleep, he won't discuss t
      b. the claim that John made, he won't discuss t

Case (111a) is analogous to (110); case (111b) to (103b). On our current assumptions, the pronoun must not take John as antecedent in (111a) or (111b); the conclusion is correct for (111a) but not for (111b). Still further complications arise when we consider differences between these examples of Ā-movement and scrambling constructions in which the normal subject-object order is inverted.

We leave the topic in this unsettled state. For further discussion of these and related matters, from various points of view, see Lakoff 1968, Reinhart 1976, 1983, Van Riemsdijk and Williams 1981, Higginbotham 1980, 1983, Langendoen and Battistella 1982, Barss 1986, Freidin 1986, Lebeaux 1988, Saito 1989, and chapter 3.

Consideration of LF Ā-movement also suggests that there is an S-Structure condition licensing parasitic gap (PG) constructions such as (112a), interpreted as (112b).

(112) a. which book did you file t [without my reading e first]
      b. for which x, x a book, you filed x without my reading x first

Licensing of PGs by Ā-chains is quite general, but those formed by LF movement do not license PGs, as illustrated in (113), with the S-Structure (113a) and the LF form (113b).

(113) a. *who [t filed which book [without my reading e]]
      b. *[[which book]_j who_i] [t_i filed t_j [without my reading e]]

The interpretation cannot be "for which book x, who filed x without my reading x." PG constructions, then, provide some evidence for the existence of S-Structure conditions.

The condition that licenses PGs must also account for the fact that these constructions are licensed by Ā-chains but not A-chains.
Thus, the A-chain (the book, t) of (114) does not license the PG e, unlike the Ā-chain (which book, t) of (112a), with the same t-e relation.

(114) *the book was filed t [without my reading e first]

For further discussion, see Taraldsen 1981, Engdahl 1983, 1985, Chomsky 1982, 1986a, Kayne 1984, Longobardi 1985, Browning 1987, Cinque 1990.
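The descriptive generalization over (112)-(114) can be recorded compactly. The sketch below merely restates the observed pattern; it is not an account of why it holds, and the names Chain and licenses_parasitic_gap, along with the string labels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    head: str
    kind: str       # "A-bar" or "A"
    formed_at: str  # "overt" (by S-Structure) or "LF"


def licenses_parasitic_gap(chain: Chain) -> bool:
    """Observed pattern: only A-bar chains already present at S-Structure license a PG."""
    return chain.kind == "A-bar" and chain.formed_at == "overt"


ex_112a = Chain("which book", "A-bar", "overt")  # overt wh-movement
ex_113 = Chain("which book", "A-bar", "LF")      # wh-in-situ, raised at LF
ex_114 = Chain("the book", "A", "overt")         # passive

for name, chain in [("(112a)", ex_112a), ("(113)", ex_113), ("(114)", ex_114)]:
    print(name, licenses_parasitic_gap(chain))
```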


Note that even the acceptable PGs are somewhat awkward; as in earlier cases discussed, we are interested in the relative deviance of various constructions, which is quite clear and demands explanation. The general literature on PGs regularly uses for illustration such pairs as (115), where the first is completely grammatical and the second sharply deviant, but these cases do not suffice to show that Ā-chains license PGs while A-chains do not, because (115b) is ruled out for independent reasons of control theory, as illustrated in (116) (Lasnik and Uriagereka 1988).

(115) a. the book that you filed [without PRO reading e]
      b. *the book that was filed [without PRO reading e]

(116) a. the book that you filed [without PRO thinking]
      b. *the book that was filed [without PRO thinking]

The question of S-Structure conditions also arises in connection with elements lexically identified as affixes (e.g., pronominal clitics, verbal inflections, Case features). Since these properties are commonly overt at PF, they must be manifested at S-Structure (Lasnik 1981; we omit here the possibility that rules of the PF component might be rich enough to handle the phenomenon). As indicated earlier, the question becomes rather subtle if we assume the checking interpretation of inflectional features. Suppose again that English walked is inserted into D-Structure with the properties [walk], [past], the latter being checked and licensed by a syntactic rule R that joins [past] and walked. Suppose further that such functional elements as [tense] lack phonological matrices and are thus invisible at PF. We need not then assume that R is a lowering rule adjoining [past] to walked, to be reversed at LF; an alternative possibility is that the D- and S-Structures are alike, with R raising the verb to the inflectional position at LF, mirroring the process that is overt with auxiliaries and in French-type languages (for theory-internal arguments bearing on the matter, see chapters 2 and 3).
The same question arises with regard to Case marking. Even if it is overt, the conceptual possibility remains that elements enter the computational system with their Case features already indicated, these being checked only at the LF level. Any apparent S-Structure requirement for Case would have to be satisfied in some other way. See section 1.4.3 and chapter 3.

Other theory-internal considerations suggest that empty categories must be licensed at S-Structure, in particular, traces in argument chains (Lasnik and Saito 1984, 1992; see section 1.4.1). If the relation of predication holding between an XP and its (syntactic) subject must satisfy S-Structure conditions, as suggested earlier, it is also natural (though not necessary) to suppose that licensing of an EC subject of predication should also take place at this level.


Thus, according to Rizzi's theory, the null subject parameter reduces to properties of the system of the verbal inflection: in Italian, strong agreement (Agr) licenses pro subject; in French or English, the weaker Agr does not. We might expect, then, that this condition must be satisfied by the S-Structure configuration.

The plausibility of this assumption is enhanced by consideration of properties of expletive pro. Consider the D-Structures (117).

(117) a. e was stolen a book
      b. e seems [e′ to be a book missing]

In a null subject language, the expressions can surface in this form, with e being expletive pro and e′ its trace; here pro is licensed by strong Agr. But in a non-null subject language, e must be replaced by S-Structure, either by an overt expletive or by raising of a book to fill this position, as in (118).

(118) a. i. ?there was stolen a book
         ii. a book was stolen t
      b. i. there seems [t to be a book missing]
         ii. a book seems [t′ to be t missing]

Some S-Structure property, it appears, must ensure that the options of (118) have been taken by the S-Structure level, not in the LF component. The problem becomes more severe if we adopt the strong version of FI that requires that expletives be replaced at LF (sections 1.3.1, 1.3.3). Then the S-Structure forms of (117) will appear at LF essentially as the (ii) forms of (118). It would follow, then, that the relevant distinctions must be established at S-Structure: pro is licensed at S-Structure, permitting (117) in Italian but not English. For alternative analyses, see chapters 3, 4.

It has also been proposed that some of the conditions that have been assumed to apply at LF actually apply within derivations from S-Structure to PF (Jaeggli 1980, Aoun et al. 1987). It cannot be that the conditions apply at the level of PF representation itself, because at the interface level PF we have only phonetic features with no further relevant structure.
The assumption would be, then, that these conditions apply either at S-Structure or at some level intermediate between S-Structure and PF.

We have assumed so far that X-bar theory applies at D-Structure, its properties being carried over to S-Structure and LF by the computational processes. Suppose that X-bar theory applies at S-Structure as well. Van Riemsdijk (1989) argues that on this assumption, movement need not be restricted to minimal and maximal phrases (X⁰ and XP), as so far tacitly assumed. Movement of X′ (= X¹) could be allowed, to be followed by a process of regeneration that


forms a proper X-bar structure at the S-Structure level in a minimal way. On

this analysis, (119) would be derived by movement of the N′ category Lösung, followed by generation of eine to satisfy X-bar theory at S-Structure, eine being a spelling out of the φ-features of Lösung.

(119) [eine Lösung] hat er [eine bessere t] als ich
      a solution has he a better (one) than I

If X-bar theory applies at S-Structure, Emonds's structure-preserving hypothesis for substitution (section 1.3.1) follows in essentials, since conflict of categorial features will violate X-bar-theoretic principles. A similar conclusion will also hold for adjunction. Suppose, for example, that an X⁰ element is adjoined to the YP Z, forming (120).

(120) [YP X⁰ YP]]

This structure violates X-bar theory, which requires that X⁰ head an X′ structure. Adjunction of XP to YP, however, would yield a structure consistent with X-bar theory. Adjunction of X⁰ to Y⁰ yields a two-segment category [Y⁰, Y⁰], with an internal structure invisible to X-bar theory. Pursuing this line of thinking, it may be possible to derive a version of the structure-preserving hypothesis for adjunction: essentially, the condition that a category can be adjoined only to a category of the same bar level.

1.4 Modules of Language

1.4.1 Government Theory

We have referred several times to the notion of government, a more local

variety of command (section 1.3.1). We assume tentatively that the relevant notion of command is c-command. The concept of government has entered extensively into the study of the various modules of grammar. Hence, slight modifications in formulation have wide-ranging empirical consequences (see, among others, Aoun and Sportiche 1981, Chomsky 1981a, 1986a, Kayne 1984, Lasnik and Saito 1984, 1992, Rizzi 1990).

We say that α governs β if α c-commands β and there is no category γ that protects β from government by α. γ protects β in this sense if it is c-commanded by α and either (121a) or (121b) holds.

(121) a. γ is a barrier dominating β.
      b. γ intervenes between α and β.

Government is canonical if the linear order of (α, β) accords with the value of the head parameter (Kayne 1984). We speak of X-government when the

72

Chapter 1

governor has the property X. There are two main categories of governmentto be considered: antecedent government of by an antecedent of , andhead government of by a head. We refer to these categories as propergovernment.To make the concept of locality precise, we have to spell out the notionsbarrier and intervene in (121). Consider the two in turn.We take a barrier to be an XP that is not a complement, putting aside nowthe ambiguous status of noncomplements to V under the various ramificationsof Kaynes unambiguous path theory (section 1.3.2). Thus, in (122) the bracketed expressions are all XPs, but only those subscripted B are barriers for theelements they contain.(122) a.b.c.d.e.

I wonder which book [John told the students [that [they shouldread t]]]??I wonder which book [John met [someone [B who read t]]]*I wonder how [John met [someone [B who [fixed the car t]]]]??I wonder which book [John left New York [B before he read t]]*I wonder how [John left New York [B before he fixed the car t]]

In each case the trace indicates the position of extraction, under the intended interpretation: thus, (122e) asks how John fixed the car, not how he left New York. If we extract from within a barrier, the trace left behind will not be antecedent-governed; otherwise, it will be. When extraction crosses a barrier, the expression is deviant, indicating that antecedent government is a condition on properly formed chains. In (122a) no barriers are crossed and the sentence is fully grammatical. In the other cases a barrier is crossed and the sentences are deviant. The violations are more severe in cases (122c) and (122e), illustrating a characteristic difference between argument and adjunct extraction.

It appears that not only a complement but also its specifier is exempt from barrierhood. Belletti and Rizzi (1981) observe that the process of ne-cliticization in Italian extracts ne from the object of the verb but not from its subject. The object, the complement of the verb, is not a barrier to government; the clitic ne thus governs the trace left by ne-extraction from the object, as required. But the trace of ne-extraction from the subject will not be antecedent-governed: the subject is not a complement, hence is a barrier, whether government is based on c-command or m-command. Hence, we have (123a) but not (123b).

(123) a. pro ne-ho visto [molti t]
         I of.them-have seen many
         'I have seen many of them'
      b. *[molti t] ne-sono intelligenti
         many of.them-are intelligent

But now consider (124b), derived from the D-Structure (124a).

(124) a. pro ritengo [[molti ne] intelligenti]
         I believe many of.them intelligent
      b. ne-ritengo [[molti t] intelligenti]
         of.them-I.believe many intelligent
         'I believe many of them (to be) intelligent'

Here the complement of ritengo is a small clause. The phrase [molti ne] is the specifier of the small clause, hence is not a complement. But extraction is nevertheless permitted. We return to other illustrations of the same point.

We conclude, then, that XP is not a barrier if it is the complement of a head H or the specifier of the complement of H. The configuration of properties is not surprising, given that the head typically shares the features of its maximal projection and agrees with its specifier, so there is an indirect agreement relation between a maximal projection and its specifier. The same observation suggests that we generalize the property further: if α is the complement of H, then the daughters of α (its specifier and its head) are not barriers. When the head is an X0, the question of extraction from it does not arise, but it could arise in other configurations. Suppose that in a small clause (125), YP = XP, with XP being the head of YP and NP its specifier (the subject of the predicate XP).

(125) V [YP NP XP]

In (124a), then, α = YP = AP, and its head is the AP intelligenti. We have already seen that the specifier is not a barrier. Example (126) illustrates the fact that the same is true of the head.

(126) whom does he consider [AP Bill [AP angry at t]]

The status of (126) is no different from that of whom is he angry at. Thus, neither the complement nor the head of a complement is a barrier. Similarly, in (127) the main verb phrase of the embedded clause is not a barrier, and its VP head is also not a barrier, so that who extracts freely.

(127) I wonder [who [John [VP[VP met t] [last night]]]]

Note that in the case of the small clause (126) as well as (127), we might also appeal to the segment theory of adjunction (section 1.3.1), requiring that a barrier be a category, not a segment, and taking the heads to be segments, hence not possible barriers.
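The conclusions reached so far about case (a) of (121) can be made concrete in a small sketch. It is only an illustration under simplifying assumptions: the class Node, its role and maximal annotations, and the helper toy_clause are hypothetical names introduced here, the trees are heavily abbreviated, intervention (case (b)) is ignored, and so are the special properties of C discussed later in this section.

```python
class Node:
    """A tree node; `role` and `maximal` are illustrative annotations only."""

    def __init__(self, label, role="root", maximal=False):
        self.label = label
        self.role = role          # "head", "complement", "specifier", "adjunct", "root"
        self.maximal = maximal    # True for XPs
        self.children = []
        self.parent = None

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node


def dominates(a, b):
    return any(x is a for x in ancestors(b))


def c_commands(a, b):
    """Neither node dominates the other; the first branching node above a dominates b."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    for anc in ancestors(a):
        if len(anc.children) > 1:
            return dominates(anc, b)
    return False


def is_barrier(xp):
    """An XP is a barrier unless it is a complement, or the specifier or head of one."""
    if not xp.maximal or xp.role == "complement":
        return False
    exempt_daughter = (
        xp.role in ("specifier", "head")
        and xp.parent is not None
        and xp.parent.role == "complement"
    )
    return not exempt_daughter


def governs(a, b):
    """Case (121a) only: a c-commands b and no barrier above b is c-commanded by a."""
    return c_commands(a, b) and not any(
        is_barrier(g) and c_commands(a, g) for g in ancestors(b)
    )


def toy_clause(embedded_role):
    """A wh-phrase above an embedded clause that is a complement or an adjunct."""
    wh, t = Node("wh", "specifier", True), Node("t", "complement", True)
    vp = Node("VP", "complement", True).add(Node("V", "head"), t)
    embedded = Node("CP", embedded_role, True).add(Node("C", "head"), vp)
    ip = Node("IP", "complement", True).add(Node("V", "head"), embedded)
    Node("CP", "root", True).add(wh, ip)
    return wh, t


for role in ("complement", "adjunct"):
    wh, t = toy_clause(role)
    print(role, governs(wh, t))
```

With the embedded clause marked as a complement, as in (122a), the wh-phrase governs its trace; marked as an adjunct, as in (122d) and (122e), the embedded clause counts as a barrier and government fails.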

We have dealt in a preliminary way with case (a) of (121); consider now case (b), with the configuration (128), where γ intervenes between α and β.

(128) ... α ... γ ... β ...

Recall that α c-commands the intervening element γ, which we assume further to c-command β; thus, left-to-right order in (128) expresses the c-command relation. Two cases of intervention have been explored; following Rizzi (1990), let us call them rigid minimality and relativized minimality.

(129) a. Rigid: γ is a head H (α arbitrary).
      b. Relativized: γ is of the same type as α.

Rigid minimality can be restated in terms of barriers, taking the category immediately dominating γ to be a barrier. To spell out the concept of relativized minimality, we must characterize the relevant types. These are given in (130).

(130) a. If α is a head, γ is a head.
      b. If α is in an A-position, then γ is a specifier in an A-position.
      c. If α is in an A′-position, then γ is a specifier in an A′-position.

Recall that the concepts A- and A′-position are not properly defined in current theory; we suggested a way to approach the problem at the end of section 1.3.2 and continue to assume it here.

The three basic cases of relativized minimality are illustrated in (131) for heads, A-positions, and A′-positions, respectively, in capitals (see (44), (57)).

(131) a. *how fix [John WILL [t the car]]
      b. *John seems [that [IP IT is certain [t to fix the car]]]
      c. *guess [CP how [John wondered [WHY [we fixed the car t]]]]

In conventional terminology, case (131a) illustrates the Head Movement Constraint (HMC); case (131b) superraising; and case (131c) the Wh-Island Constraint. As the structure indicates, (131c) is to be understood as expressing John's puzzlement as to how we fixed the car, not as a query about how he wondered.

In (131a) will intervenes between fix and its trace, and both fix and will are heads. In (131b) it intervenes between John and its trace, both it and John are in A-positions, and it is the specifier of IP. In (131c), why intervenes between how and its trace, both why and how are in A′-positions, and why is the specifier of CP.
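The typing in (130) lends itself to a short sketch. The following is an illustration only, under the assumption that each position on the path from α to its trace can be tagged by hand as a head, an A-position, or an A′-position; the names Position, same_type, and interveners are hypothetical, and the paths for (131) are abbreviated to the elements that matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    word: str
    kind: str                 # "head", "A", or "A-bar"
    specifier: bool = False


def same_type(alpha, gamma):
    """The three cases of (130): is gamma of the same type as alpha?"""
    if alpha.kind == "head":
        return gamma.kind == "head"
    return gamma.kind == alpha.kind and gamma.specifier


def interveners(path):
    """path is in c-command order: alpha first, its trace last, as in (128)."""
    alpha, between = path[0], path[1:-1]
    return [gamma.word for gamma in between if same_type(alpha, gamma)]


# (131a) Head Movement Constraint
ex_a = [Position("fix", "head"), Position("John", "A", True),
        Position("WILL", "head"), Position("t", "head")]
# (131b) superraising
ex_b = [Position("John", "A", True), Position("that", "head"),
        Position("IT", "A", True), Position("t", "A", True)]
# (131c) Wh-Island Constraint
ex_c = [Position("how", "A-bar", True), Position("John", "A", True),
        Position("wondered", "head"), Position("WHY", "A-bar", True),
        Position("t", "A-bar")]
# licit raising: John seems [t to fix the car]
ok = [Position("John", "A", True), Position("seems", "head"),
      Position("t", "A", True)]

for path in (ex_a, ex_b, ex_c, ok):
    print(path[0].word, interveners(path))
```

Under rigid minimality (129a), by contrast, any head on the path would count as an intervener regardless of the type of α, so that seems would block even the licit raising case.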
In all three cases the expression is severely deviant.

We noted earlier that adjuncts and arguments behave somewhat differently with regard to extraction from barriers (see (122)). The same is true in case (130c) of intervention: compare (131c) (adjunct extraction) with (132) (argument extraction).

(132) ??guess [CP what [John wondered [why [we fixed t]]]]

While unacceptable, (132) is a much less serious violation than (131c).

These observations have a wide range of descriptive adequacy, but fall short of a satisfactory explanatory principle. We return to the question at the end of this section.

We have discussed some of the properties of the first case of proper government: antecedent government. Let us turn now to the second case: head government. Throughout the modules of grammar, we find relations (H, XP), where H is a head and XP a phrase with some property assigned (or checked) by H. These relations meet locality conditions that are typically narrower than either variety of command and have therefore often been considered to fall under the category of government. We noted earlier that government by a verb suffices to assign Case, bar PRO, and license trace (section 1.3.2). In all cases the relation is narrower than command.

In Case theory we find that a verb V can assign (or check) the Case of an XP only if the XP is in a local relation to V. The verb find assigns accusative Case to the book in (133) but not in (134).

(133) a. we found the book
      b. we found [AP the book incomprehensible]

(134) a. we found [CP that [IP the book was incomprehensible]]
      b. we found the answer [α when the book arrived]

In (133) no barrier protects the book from government by find. The same is true of (134a), but here the intervening head C0 ( = that) bars government of the book by find. In (134b) α is a barrier. In (134), then, the book must receive Case in some other way. If the construction in which it appears is infinitival, it will not receive Case at all, and the construction is ungrammatical, as in (135).

(135) a. *we tried [CP e [IP the book to win a prize]]
      b. *we found John [α when the book to arrive]

In (135a) the intermediate head C ( = e) bars government of the book, as in (134a). It is natural to suppose, then, that government enters crucially into Case theory.

The positions to which a verb can assign Case are also, typically, those in which a trace can appear, suggesting that government by a verb can license trace. Thus, alongside (133), (134), and (135), we have (136) and (137).

(136) a. the book was found t
      b. the book was found [AP t incomprehensible]
      c. the book was believed [t to be incomprehensible]
      d. the book seems [t to be incomprehensible]

(137) a. *the book was found [CP that [IP t was incomprehensible]]
      b. *the book was tried [CP e [IP t to win a prize]]

Turning to PRO, we find a similar configuration. PRO cannot appear in governed positions, those in which, with the proper form of the verb, Case can be assigned or trace licensed.

(138) a. *we found PRO
      b. *we found [AP PRO incomprehensible]

PRO is also excluded from positions that are governed but in which Case cannot be assigned, as in (139).

(139) a. *they expressed the belief [IP PRO to be intelligent]
      b. *we expected [there to be found PRO]
      c. *it was believed [PRO to be intelligent]
      d. *it seems [PRO to be intelligent]

As discussed in section 1.3.2, we assume that the verb believe in English takes an IP, not a CP, complement. Thus, PRO is governed by belief in (139a) and believed in (139c), though no Case marking is possible. The constructions are barred. Thus, (139a) does not mean that they expressed the belief that someone or other is intelligent, with arbitrary PRO, or that they expressed the belief that they are intelligent, with PRO bound by they. Similarly, (139c) does not mean that it was believed that someone or other is intelligent; the phonetic form can only be interpreted with it raised, leaving a trace in the position of PRO. And (139b) does not mean that we expected there to be found someone or other, with arbitrary PRO.

A locality relation between a head and an XP also is found in θ-theory. Thus, a verb θ-marks only XPs within the VP that it heads. On the assumptions of section 1.3.2, the verb θ-marks the specifier of the VP and sisters of V, relations that do not strictly fall under government theory, along with the complement, which does.

A closer look at head government shows that C ( = C0), whether overt or null, behaves rather differently from other heads we have considered. Thus, PRO is not barred from positions governed by C, as illustrated in (140).

(140) we decided [CP e [IP PRO to leave at noon]]

Similarly, C does not appear to license trace. Thus, we find that XPs move fairly freely, including VP and CP, but IP does not.

(141) a. [VP admit that he was wrong], John never will tVP
      b. [the claim tCP] was made [CP that John was wrong]
      c. *[IP Bill will visit tomorrow], I think [that tIP]

C also does not license trace of subject. Thus, although C governs the trace in (142), extraction is barred; as is well known, languages have various special devices to overcome the problem (see below).

(142) *who did you say [CP that [IP t left yesterday]]

Properties of C are further illustrated in (143).

(143) a. *John was decided [CP e [IP t to leave at noon]]
      b. *we decided [CP e [IP John to leave at noon]]
      c. we decided [CP e [IP PRO to leave at noon]]

If the head e of CP were to license the trace in (143a), raising of John to the main clause subject position would be permitted. Note that e does not intervene between John and its trace if we adopt the notions of relativized minimality (it does under the assumptions of rigid minimality). Examples (143b) and (143c) illustrate the fact that e does intervene between the matrix verb and the embedded subject, blocking a government relation between them. Thus, in (143b) John cannot receive Case from a matrix verb, and in (143c) PRO is allowed, neither the matrix verb nor C properly governing it. Thus, C functions as an intervening head, but not a proper governor, licensing trace.

Similarly, while other X0s typically raise, head-governing and thus licensing the trace left behind, that is not true of C. We find V-raising to V or I, N-raising to V (noun incorporation), I-raising to C (V-second), and so on, but we do not find C-raising to the matrix verb that governs it (e.g., incorporation into a higher verb of a verb that has been raised to V-second position). These facts too would follow from failure of C to properly govern.

C also differs from other heads with respect to barrierhood. Recall that a head typically frees a complement and its daughters (specifier and head) from barrierhood. But the situation is different in the case of C. Consider the following observations of Torrego (1985), who notes the contrast between (144) and (145) in Spanish.

(144) a. [α de que autora] [no sabes
      b. *esta es la autora [CP[α de la que] C [IP[β varias traducciones t] han ganado premios internacionales]]
         this is the author [[α by whom] several translations have won international awards]

In (144a) CP is the complement of sabes and is therefore not a barrier; its specifier is also not a barrier, and antecedent government is not blocked, so extraction is permitted. In (144b), however, extraction is blocked; even though β is the specifier of the complement of C, it is a barrier blocking antecedent government. A plausible conclusion is that C does not free its complement (or the daughters of the complement) from barrierhood, unlike other X0s that we have considered, though pursuit of this issue takes us into complexities that we will ignore here.

C is unlike other heads that we have considered in other respects as well. Unlike inflectional elements, it is not a feature of the verb; thus, its specifier is not L-related, and is therefore an A′-position, not an A-position as are other specifiers (section 1.3.2). C also lacks the semantic content of some other heads.

In general, a good first approximation is that the proper governors are restricted to the lexical features (lexical categories, inflectional features of the verb, and perhaps others) and that only proper governors free their complements from barrierhood.

We have seen that C does not suffice as the required head governor of a subject trace. In (143a) the null complementizer e failed to license the trace of A-movement. The same failure is observed with an overt C in the similar configuration (145).

(145) *John is important [CP (for) [IP t to leave at noon]]

The paradigm with A′-movement (as opposed to A-movement) of the subject is less straightforward. While (142) is unacceptable, it becomes perfectly well formed if the overt complementizer is absent.

(146) who did you say [CP[IP t left yesterday]]

In the approach outlined above, the question is how the subject trace is head-governed.
Suppose there is a null complementizer and the movement of who proceeded successive-cyclically via the Spec of the lower CP. Then the representation would be as in (147).

(147) who did you say [CP t′ e [IP t left yesterday]]

Spec-head agreement takes place between t′ and e in this configuration. We tentatively suggest that this agreement provides e with features allowing it to license the trace t. The ungrammaticality of (142) (commonly called the that-trace effect), on the other hand, indicates that such feature sharing is not possible with the overt complementizer that. Note too that there is no derivation similar to that in (147) available for (143) since, quite generally, movement to an A-position cannot proceed through [Spec, CP]. Such improper movement results in an illicit A-bound variable, as in constructions that fall under the principle Command discussed in section 1.3.3 (see also section 1.4.2).

One concern of some of the early literature on proper government (Huang 1982, Lasnik and Saito 1984) was the absence of that-trace effects with adjuncts. Thus, (148) is good with or without that.

(148) why do you think [(that) John left t]

Since adjuncts, like subjects, are not complements, the question arises how their traces are head-governed. When that is absent, the same mechanism is available as we posited for (147). But when that is present, no such mechanism exists, as demonstrated by (142) (see Rizzi 1990). The framework of Lasnik and Saito was slightly different so that the technical problem was actually apparent lack of antecedent government, but their solution can carry over under present assumptions. They suggest that as a consequence of the Projection Principle, argument traces must be licensed (γ-marked, in their terminology) at S-Structure, while adjunct traces are licensed only at LF. (142) will thus be ruled out at S-Structure while (148) will not be. Then in the LF component, that, being semantically empty, can be eliminated. The resulting configuration will allow government of the adjunct trace in just the same way that it allowed government of the subject trace in (147), if the head government requirement holds at LF.

In the examples we have been considering, an adjunct trace is possible in a situation in which a subject trace is not. We also find (nearly) the opposite state of affairs. (149), with movement of the adjunct how, is completely impossible, whereas (150), with movement of a subject, is much less severely deviant.

(149) *how do you wonder [whether John said [Mary solved the problem t]]

(150) ??who do you wonder [whether John said [t solved the problem]]

In both examples the initial trace is appropriately governed, in the manner just discussed. The difference between (149) and (150) must lie elsewhere.

Consider the structures of the examples in more detail. We assume that whether occupies the Spec of the CP in which it appears.

(151) *how do you wonder [CP whether [IP John said [CP t′ e [IP Mary solved the problem t]]]]

(152) ??who do you wonder [CP whether [IP John said [CP t′ e [IP t solved the problem]]]]

Lasnik and Saito argue that not just initial traces, but also intermediate traces, must be appropriately governed. But the intermediate trace t′ is not antecedent-governed in either (151) or (152). In the case of (152), Lasnik and Saito argue, the intermediate trace antecedent-governs the initial trace t and then is deleted in the LF component. Such a derivation is not possible for (151) if, as they suggest, all licensing of adjunct traces is at the level of LF. Thus, if t′ is present in the LF representation of (151), t will be properly governed but t′ will not be. And if t′ is not present at the LF level, then t will not be antecedent-governed. Either way, then, the representation contains a trace that is not properly governed.

We have just seen how (149) and (150) can be distinguished in terms of proper government. In (149) there will inevitably be an offending trace, but there need not be one in (150). However, although (150) is much better than (149), it is not perfect, and that fact remains to be explained. Evidently, wh-movement is not permitted to bypass an intermediate [Spec, CP], as it did in both (151) and (152). This is one consequence of the subjacency constraint on movement proposed by Chomsky (1977) as a partial unification of several earlier constraints on movement, including those of Chomsky (1964) and Ross (1967). Subjacency violations are characteristically less severe than proper government violations, all else equal. Another property of subjacency that distinguishes it from proper government was alluded to in section 1.3.3. Subjacency constrains overt movement, but apparently does not constrain covert movement between S-Structure and LF. This is seen in the following near-minimal pair, repeated from (95a), (96a):

(153) *who do you like [books that criticize t]

(154) who [t likes books that criticize whom]

The S-Structure position of whom in (154) is the LF position of the trace of whom after LF raising, which yields a structure that is, in relevant respects, identical to the S-Structure (and LF) representation of (153). Yet the two examples contrast sharply in grammaticality. Similarly, as discussed by Huang (1982), in languages with interrogative expressions in situ, such as Chinese, the LF movement of those expressions is not constrained by Subjacency. (155) (= (95b)) is the Chinese analogue of (153), but it is acceptable, much like (154).

(155) ni xihuan [piping shei de shu]
      you like [criticize who rel book]

While LF movement seems not to conform to Subjacency, it does respect the proper government requirement. The following Chinese example allows LF movement of sheme 'what' into the higher clause, but does not allow such movement for weisheme 'why'.

(156) ni xiang-zhidao [Lisi weisheme mai-le sheme]
      you wonder Lisi why bought what

(156) can mean (157) but not (158).

(157) what is the thing such that you wonder why Lisi bought that thing

(158) what is the reason such that you wonder what Lisi bought for that reason

The trace of the LF movement of weisheme to the higher clause will not be properly governed, under the operation that yields the barred interpretation (158).

Having reviewed some aspects of the theory of movement, let us return to the basic concept of government that enters crucially into this and apparently other modules of grammar. We noted that government is a local form of command, tentatively taking the operative notion to be c-command. Two elements of locality were introduced: government is blocked by certain barriers and by an intervening category (the Minimality Condition). The Minimality Condition has two variants: Rigid and Relativized Minimality. We kept to the latter, following Rizzi (1990). For the theory of movement, we took the relevant forms of government, proper government, to be antecedent government and head government by a lexical head or its features (the verbal inflections).

As discussed earlier, these ideas have considerable descriptive adequacy but lack the generality and clarity that we would hope to find in an explanatory theory of language (see section 1.1). In particular, the basic and appealing intuition that lies behind the principle of Relativized Minimality is not really captured by the mechanisms proposed, which list three arbitrary cases and add unexplained complexity (the role of specifier for two of the cases); see (130).

The basic intuition is that the operation Move α should always try to construct the shortest link. If some legitimate target of movement is already occupied, the cost is deviance (see Rizzi 1990, 22-24; also chapter 3). We may regard this as part of the general principle of economy of derivation. Conditions quite independent of Relativized Minimality require that only heads can move to head positions, and only elements in A-positions to A-positions. Furthermore, again for independent reasons, XPs can move only to specifier positions, and α can move only to a position that c-commands it. Hence, the special properties listed in (130) can be eliminated from the formulation of the condition, which reduces to (159).

(159) Minimize chain links.

If this approach is viable, we can eliminate the intervention condition of (121) in favor of a general condition on economy of derivations, restricting the definition of government to (160).

(160) α governs β if α c-commands β and there is no barrier for β c-commanded by α.

We want government to be constrained by the same locality condition that appears in binding theory and elsewhere. Thus, an antecedent α binds an anaphor β just in case it is the local binder; that is, there is no γ bound by α and binding β (see section 1.4.2). Similarly, α governs β only if there is no γ governed by α and governing β. This condition is now satisfied for antecedent government, by the economy condition (159). But an analogue still has to be stipulated for head government. That raises the question of whether the head government condition is, in fact, superfluous (Frampton 1992; also see chapter 3). We will proceed on the assumption that it is required, noting the problematic aspect of this assumption.

To make this intuitive account more precise and descriptively more accurate, we have to explain in what sense a cost accrues to failure to make the shortest move, and why violation of the economy condition is more severe for adjuncts than arguments, as noted throughout. Adapting mechanisms just discussed, we might suppose that when a chain link is formed by Move α, the trace created is assigned * if the economy condition (159) is violated as it is created (a version of the γ-marking operation of Lasnik and Saito 1984, 1992). Note further that only certain entities are legitimate LF objects, just as only certain entities are legitimate PF objects (e.g., a [+high, +low] vowel, or a stressed consonant, is not a legitimate PF object, and a derivation that yields such an output fails to form a proper SD). We therefore need some notion of legitimate LF object. Suppose that the chain C of (161) is a legitimate LF object only if C is uniform (see Browning 1987).

(161) C = (α1, ..., αn)

The only other legitimate LF objects are operator-variable constructions (α, β), where α is in an A′-position and β heads a legitimate (uniform) chain.

Uniformity is a relational notion: the chain C is uniform with respect to P (UN[P]) if each αi has property P or each αi has non-P. One obvious choice for the relevant property P is L-relatedness, which we have suggested to ground the distinction between A- and A′-positions; see section 1.3.2. A chain is UN[L] if it is uniform with respect to L-relatedness. Heads and adjuncts are non-L-related and move only to non-L-related positions; hence, the chains they form are UN[L]. An argument chain consists only of L-related positions, hence is UN[L]. The basic types (heads, arguments, adjuncts) are therefore uniform chains, legitimate objects at LF.

Taking this as a first approximation, we now regard the operation of deletion, like movement, as a last resort principle, a special case of the principle of economy of derivation (make derivations as short as possible, with links as short as possible): operations in general are permissible only to form a legitimate LF object. Deletion is impermissible in a uniform chain, since these are already legitimate. Deletion in the chain C of (161) is, however, permissible for αi in an A′-position, where n > i > 1 and αn is in an A-position, that is, the case of successive-cyclic movement of an argument. In this case a starred trace can be deleted at LF, voiding the violation; in other cases it cannot.

An expression (an SD) is a Subjacency violation if its derivation forms a starred trace. It is an Empty Category Principle (ECP) violation if, furthermore, this starred trace remains at LF; hence, ECP violations are more severe than Subjacency violations, which leave no residue at LF. Note that the concept ECP is now a descriptive cover term for various kinds of violations that are marked at LF, among them, violations of the economy principle (Relativized Minimality).

We continue to assume that traces must be properly governed: both antecedent- and head-governed by a lexical feature (i.e., not C). To unify the account, let us say that a trace is marked * if it fails either of these conditions.
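The interplay of uniformity, deletion, and starring can be traced in a brief sketch. It is a simplification under stated assumptions: each chain member is tagged by hand for L-relatedness and for whether it carries a star, and the names Link, is_uniform, delete_at_lf, and classify are hypothetical labels introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    l_related: bool        # True for A-positions; False for A-bar, head, adjunct positions
    starred: bool = False  # assigned * when the economy condition (159) is violated


def is_uniform(chain):
    """UN[L]: every member is L-related, or every member is non-L-related."""
    return len({link.l_related for link in chain}) <= 1


def delete_at_lf(chain):
    """Return the chain as it stands after any permissible LF deletion."""
    if is_uniform(chain):
        return list(chain)          # already legitimate: deletion is not permitted
    if not chain[-1].l_related:
        return list(chain)          # tail not in an A-position: no deletion
    # successive-cyclic argument movement: intermediate A-bar members delete
    kept = [link for link in chain[1:-1] if link.l_related]
    return [chain[0]] + kept + [chain[-1]]


def classify(chain):
    if not any(link.starred for link in chain):
        return "no violation"
    if any(link.starred for link in delete_at_lf(chain)):
        return "ECP violation"
    return "Subjacency violation"


# (152): argument extraction, starred intermediate trace in [Spec, CP]
argument = [Link("who", False), Link("t'", False, starred=True), Link("t", True)]
# (151): adjunct extraction, same starred intermediate trace
adjunct = [Link("how", False), Link("t'", False, starred=True), Link("t", False)]

print(classify(argument))
print(classify(adjunct))
```

The argument chain is not uniform, so its starred intermediate trace may delete and only a Subjacency violation results; the adjunct chain is already uniform, so the starred trace survives to LF.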
Thus, a trace will be marked ** if it fails both, or if it fails one along with the economy condition, and it will be marked *** if it fails all three, with multiple starring indicating increased deviance. We have failure of antecedent government in the case of movement over a barrier, or in the case of lowering in violation of the C-Command Condition; unless the offending trace deletes, the violation remains at LF. We speculated earlier that only proper governors free their complement from barrierhood. It will follow, then, that IP (the complement of C) will be free from barrierhood only if C has a lexical feature: that will happen if V-I raises to C.

Government now is the special case of local c-command when there is no barrier. Subjacency violations fail the economy condition that requires chain links to be minimal. There is generally further deviance if the violation leaves a residue in the LF representation. Traces must be properly governed (head- and antecedent-governed), requiring raising rather than lowering, with deviance if raising crosses a barrier. The special properties of C, manifest in many respects as we have seen, impose further constraints on extraction of subjects. Deletion, like movement, is driven by FI: the requirement that derivations must form legitimate LF objects. The guiding principle is economy of derivations and representations: derivations contain no superfluous steps, just as representations contain no superfluous symbols. See chapters 2 and 3 for further discussion.

1.4.2 Binding Theory

Among the imaginable anaphoric relations among NPs, some are possible, some are necessary, and still others are proscribed, depending on the nature of the NPs involved and the syntactic configurations in which they occur. For example, in (162) him can be referentially dependent upon John (can take John as its antecedent), while in (163) it cannot.

(162) John said Mary criticized him

(163) John criticized him

That is, (163) has no reading in which him refers to John, in the way that himself in (164) does.

(164) John criticized himself

Apparently, a pronoun cannot have an antecedent that is too close to it. Note that in (162), where antecedence is possible, a clause boundary intervenes between pronoun and antecedent. There is no such boundary between pronoun and antecedent in (163).

As we have seen in section 1.3.3, distance in this sense does not always suffice to make antecedence possible. Consider (165), where a clause boundary intervenes between he and John, yet an anaphoric connection is impossible.

(165) he said Mary criticized John

Importantly, it is not the linear relation between pronoun and name that inhibits anaphora. This is evident from consideration of (166), in which he once again precedes John, yet anaphora is possible.

(166) after he entered the room, John sat down

Similarly, in (167) his can take John as its antecedent.

(167) his boss criticized John

The Theory of Principles and Parameters

85

The generalization covering (165)–(167) is approximately as in (168).

(168) A pronoun cannot take an element of its (c-command) domain as its antecedent.

The c-command domain of an element is the minimal phrase containing it. Thus, in (165) the domain of the pronoun is the entire sentence. Since, trivially, the putative antecedent is included in that domain, the anaphoric interpretation is inconsistent with the generalization (168). In (166), on the other hand, the domain of the pronoun is the adverbial clause, which does not include the antecedent John. Similarly, in (167) the domain of the pronoun is the subject NP, his boss, which does not include John.

There are a number of ways that the generalization in (168), which relates aspects of the structure and meaning of an utterance, might be expressed in the theory. One way is in terms of a constraint (171) on binding, a structural relation defined in (169), and freedom defined in (170).

(169) α binds β if α c-commands β and α, β are coindexed.
(170) If β is not bound, then β is free.
(171) An r-expression (fully referential expression, not a pronoun or an anaphor) must be free.

The fundamental relation in this approach, coindexation, is a symmetric one. For an alternative in terms of an asymmetric relation, linking, see Higginbotham 1983, 1985. Consider how (171), often called Condition C of the binding theory, will treat the examples in (165)–(167). Representation (172), for sentence (165), will be excluded, while representations (173) and (174), for (166) and (167), respectively, will be allowed.

(172) *he_i said Mary criticized John_i
(173) after he_i entered the room, John_i sat down
(174) his_i boss criticized John_i

Note that according to (171), (175) is permitted if i ≠ j.

(175) he_i said Mary criticized John_j

Hence, if (171) is truly to play a role in capturing the generalization in (168), an interpretation must be provided for the indexing in (175) that explicitly precludes the impossible interpretation. (176) suffices in this case.

(176) If the index of α is distinct from the index of β, then neither α nor β is the antecedent of the other.
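The definitions in (168)–(171) lend themselves to a small computational sketch. The Python fragment below is only an illustration under stated assumptions: the `Node` class, the hand-built trees, and all function names are hypothetical and not part of the theory's formal apparatus, and c-command is implemented in the simplified form used above (the c-command domain of an element is the minimal phrase containing it).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """A word or phrase in a toy constituent tree (hypothetical representation)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    kind: Optional[str] = None   # "r-expression", "pronoun", "anaphor", or None
    index: Optional[str] = None  # referential index, e.g. "i"
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Link each child back to the phrase that immediately contains it.
        for child in self.children:
            child.parent = self

    def descendants(self) -> Iterator["Node"]:
        for child in self.children:
            yield child
            yield from child.descendants()


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b if b is in the minimal phrase containing a, outside a itself."""
    if a.parent is None or a is b:
        return False
    in_domain = any(n is b for n in a.parent.descendants())
    inside_a = any(n is b for n in a.descendants())
    return in_domain and not inside_a


def binds(a: Node, b: Node) -> bool:
    """(169): a binds b if a c-commands b and the two are coindexed."""
    return a.index is not None and a.index == b.index and c_commands(a, b)


def violates_condition_c(root: Node) -> bool:
    """(171): True if some r-expression in the tree is bound."""
    nodes = list(root.descendants())
    return any(
        b.kind == "r-expression" and any(binds(a, b) for a in nodes)
        for b in nodes
    )


def np(label: str, kind: str, index: str) -> Node:
    """Hypothetical helper for building an indexed NP leaf."""
    return Node(label, kind=kind, index=index)


# (172) *he_i said Mary criticized John_i
s172 = Node("S", [
    np("he", "pronoun", "i"),
    Node("VP", [Node("said"), Node("S", [
        np("Mary", "r-expression", "k"),
        Node("VP", [Node("criticized"), np("John", "r-expression", "i")]),
    ])]),
])

# (174) his_i boss criticized John_i
s174 = Node("S", [
    Node("NP", [np("his", "pronoun", "i"), Node("boss")]),
    Node("VP", [Node("criticized"), np("John", "r-expression", "i")]),
])

print(violates_condition_c(s172))  # True: he_i binds John_i
print(violates_condition_c(s174))  # False: the domain of his_i is the subject NP
```

The sketch checks only the syntactic condition (171); the interpretive principle (176) concerns what contraindexed representations such as (175) may mean and is not modeled here.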


Shortly we will see reason to strengthen this constraint on interpretation of contraindexation.

Returning now to the phenomenon in (163), given that there, too, we found a constraint on antecedence, it is reasonable to suppose that (176) should again play a role in the account. Evidently, all that is necessary is that the configuration (177) be allowed and (178) prohibited.

(177) John_i criticized him_j
(178) *John_i criticized him_i

(171) will not be effective in excluding (178), since that constraint is limited to circumstances where the bindee is an r-expression, while in (178) the bindee is a pronoun. Further, we do not want to generalize (171) to include pronouns as bindees, since that would incorrectly preclude antecedence in (162) by disallowing representation (179).

(179) John_i said Mary criticized him_i

As noted earlier, there is a locality effect involved in this paradigm. A pronoun is clearly able to be within the domain of its antecedent, hence, is allowed to have a binder, but must not be too close to it. (180) is a rough statement of the necessary constraint (Condition B of the binding theory).

(180) A pronoun must be free in a local domain.

The precise nature of the relevant local domain remains to be specified. The examples under consideration suggest that the local domain is approximately the minimal clause containing the pronoun. We will limit our attention here to purely structural approaches. See Williams 1989 for an account in terms of θ-roles, and Reinhart and Reuland 1993 for one based on predication.

Note that, as predicted, a pronoun can have an antecedent in its clause just as long as that antecedent does not c-command it. (181) is a permissible representation.

(181) John_i's boss criticized him_i

Anaphors, such as reciprocals and reflexives, require antecedents that bind them. In this, their behavior is quite different from that of pronouns, which may have binding antecedents, but need not. Additionally, at least in English and a number of other languages, the antecedent of an anaphor must be local to the anaphor. In particular, we have (182), Condition A of the binding theory.

(182) An anaphor must be bound in a local domain.

Under the null hypothesis that the local domain is the same for Condition A and Condition B, we predict complementarity between pronouns and anaphors. This prediction is confirmed to a substantial degree. The ill-formed (178) becomes grammatical, if its bound pronoun is replaced by an anaphor, as in (183).

(183) John_i criticized himself_i

Conversely, the well-formed (179) becomes bad, if its pronoun is replaced by an anaphor.

(184) *John_i said Mary criticized himself_i

All that remains for this rough approximation is to specify the interpretation for coindexation. That is, we must guarantee that (183) cannot mean that John criticized Harry. The necessary principle of interpretation is not entirely obvious. For the moment, let us assume (185), temporarily leaving open the precise import of the notion antecedent.

(185) If the index of α is identical to the index of β, then α is the antecedent of β or β is the antecedent of α.

We now have three syntactic constraints, repeated as (186A–C), and the two principles of interpretation (176) and (185).

(186) A. An anaphor must be bound in a local domain.
B. A pronoun must be free in a local domain.
C. An r-expression must be free.

Before considering further the precise nature of the local domain involved in Conditions A and B, we return briefly to the semantic import of indexing relations. Earlier, we hinted that (176) would need to be strengthened. Consider, in this regard, representation (187).

(187) after John_i walked in, John_j criticized him_i

This representation is fully consistent with the only relevant syntactic conditions, Conditions B and C. Neither occurrence of John is bound, and him is free in its clause. According to (176), John_j cannot be the antecedent of him_i, but John_i is an appropriate antecedent.
It is thus unclear why (187) does not have the interpretation (and status) of (188), where coreferential interpretation for the two occurrences of John contributes only a minor degree of deviance.

(188) after John_j walked in, John_j criticized himself_j

Given the sharp contrast between (188) and (187) on the relevant interpretation, the extreme deviance of (187) cannot be attributed to repetition of the name, but rather must stem from the relation between the second occurrence of the name and the pronoun. We must rule out (intended) coreference between these two NPs, even when the second does not take the first as its antecedent. We achieve this result by strengthening (176) to (189).

(189) If the index of α is distinct from the index of β, then α and β are noncoreferential.

(185) must now be modified, in corresponding fashion, to (190).

(190) If the index of α is identical to the index of β, then α and β are coreferential.

Consider the contrast between the mildly deviant (191) and the severely degraded (192), both on the relevant interpretation involving only one individual.

(191) ?after John walked in, John sat down
(192) *John criticized John

Condition C excludes representation (193) for (192), while permitting (194).

(193) *John_i criticized John_i
(194) John_i criticized John_j

(189) now correctly guarantees noncoreference for the two NPs in (194). But now consider (191). On the desired interpretation, the two occurrences of John cannot be contraindexed, since (189) would demand noncoreference for such a representation. Coindexation, too, would be problematic under (185), since (185) demands antecedence in one direction or the other, yet a name, being fully referential in its own right, presumably cannot have an antecedent. This problem does not arise once (190) is substituted for (185).

Thus far we have limited our attention to anaphoric relations among singular NPs. Certain complications arise when we extend the scope of the investigation to plurals. The configurations giving rise to noncoreference effects, by the mechanisms outlined above, seem to give rise to disjoint reference effects as well (Postal 1966a). Just as a coreferential interpretation of the two NPs is markedly degraded in (195), so is overlap degraded in (196).

(195) he likes him
(196) they like him

Correspondingly, (197), whose NPs lexically demand coreference, is bad, and (198), whose NPs lexically demand overlap in reference, is substantially degraded also.

(197) *I like me
(198) ?*we like me

This suggests that (189) should be further strengthened.

(199) If the index of α is distinct from the index of β, then α and β are disjoint in reference.

In (195)–(198) Condition B excludes coindexing. (199) then demands disjoint reference of the necessarily contraindexed NPs. But a problem arises for pronouns not in configurations subject to Condition B. Consider (200) and (201).

(200) they think he will be victorious
(201) we think I will be victorious

In contrast with (197) and (198), (200) and (201) allow an interpretation where the reference of the second NP is included in the reference of the first. The result is that (200) is ambiguous and (201) is grammatical. But given the two principles of interpretation (190) and (199), there is now no possible representation available for these examples. Neither (202) nor (203) will yield a consistent interpretation for (201).

(202) we_i think I_j will be victorious
(203) we_i think I_i will be victorious

By (199), in representation (202) we and I must be disjoint in reference, but this is inconsistent with the lexical meanings of the two pronouns. And by (190), in representation (203) the two pronouns must be coreferential, which is again inconsistent with their lexical meanings. Note further that it will not do to weaken (190) so that it only demands overlap in reference, rather than coreference. This is so since in (204), for example, coreference is clearly demanded between the subject pronoun and the object reflexive, but under the hypothesized weakening, overlap should suffice.

(204) they_i praised themselves_i

Evidently, we require a richer set of notational possibilities than we have seen so far. At least three circumstances (coreference, disjoint reference, and overlap in reference) must be accommodated. But the purely binary distinction provided by coindexing versus contraindexing straightforwardly allows for only two.
To overcome this limitation, one notational device sometimes used is an index that is not a simple integer, but rather is a set of integers (Sportiche 1985). (It might seem tempting to take cardinality of index to correspond to cardinality of the referent of the NP. But such a move has no formal basis and faces insurmountable difficulties. See Higginbotham 1985, Lasnik 1989.) In accord with this convention, free is redefined as follows:

(205) β is free with respect to α if either α does not c-command β or the intersection of the indices of α and β is null.

Correspondingly, we modify interpretive rule (199).

(206) If the intersection of the index of α and the index of β is null, then α and β are disjoint in reference.

The problematic contrast between (198) and (201) is now straightforwardly handled. By Condition B, me in (198) must be free, as in (207a) or (207b).

(207) a. we_{i} like me_{j}
b. we_{j,k} like me_{i}

(206) then demands of these representations that the subject and object be disjoint in reference. In (201), on the other hand, Condition B is irrelevant. The indices of subject and object are therefore permitted to overlap (though still not to be identical, given (190), which we maintain).

(208) we_{i,j} think I_{i} will be victorious

The phenomenon of split antecedence is similarly accommodated, as displayed in (209a–b).

(209) a. John_{i} told Mary_{j} that they_{i,j} should leave
b. John_{i} told Mary_{j} that they_{i,j,k} should leave

Several other possibilities might also be considered. Thus, in place of the resort to set indices, we might enrich the interpretation provided for simple indices of the sort considered earlier. Consider the following interpretive procedure:

(210) a. Suppose NP and α are coindexed. Then
i. if α is an anaphor, it is coreferential with NP;
ii. if α is a pronoun, it overlaps in reference with NP.
b. Suppose NP and α are contraindexed. Then they are disjoint.

The standard cases of coreference, distinct reference, and disjoint reference now fall into place. In (195)–(198) contraindexing is required by Condition B, and the pronouns are interpreted as disjoint.
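For comparison with (210), the set-index convention of (205)–(206), taken together with (190), can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (`idx`, `is_free`, `reference_relation`) are hypothetical, c-command is supplied by hand as a boolean rather than computed, and the label "overlapping" for indices that intersect without being identical is our reading of the discussion of (208), since (206) itself only states the disjoint case.

```python
from typing import FrozenSet

Index = FrozenSet[str]


def idx(*members: str) -> Index:
    """Build a set index such as {i, j} (hypothetical helper)."""
    return frozenset(members)


def is_free(beta: Index, alpha: Index, alpha_c_commands_beta: bool) -> bool:
    """(205): beta is free with respect to alpha."""
    return (not alpha_c_commands_beta) or not (alpha & beta)


def reference_relation(alpha: Index, beta: Index) -> str:
    """Classify two set indices under (206) and (190)."""
    if not (alpha & beta):
        return "disjoint"        # (206): null intersection
    if alpha == beta:
        return "coreferential"   # (190): identical indices
    return "overlapping"         # assumed label for the remaining case


# (207a) we_{i} like me_{j}
print(reference_relation(idx("i"), idx("j")))       # disjoint
# (208) we_{i,j} think I_{i} will be victorious
print(reference_relation(idx("i", "j"), idx("i")))  # overlapping
# (204) they_{i} praised themselves_{i}
print(reference_relation(idx("i"), idx("i")))       # coreferential
# (207a): me_{j} is free with respect to the c-commanding we_{i}
print(is_free(idx("j"), idx("i"), True))            # True
```

Split antecedence as in (209a) comes out the same way: the index {i, j} of they intersects both {i} and {j} without being identical to either.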
In (200)–(204) coindexing is permitted, and (210aii) yields the intended interpretation of overlap in reference. It remains, however, to deal with the phenomenon of split antecedence, and further questions arise in the case of more complex constructions that we have not considered.

Another possibility would be to unify the indexing and interpretive procedures along with the binding conditions themselves, dispensing with indexing and simplifying (210) to (211), where D is the relevant local domain.

(211) a. If α is an anaphor, interpret it as coreferential with a c-commanding phrase in D.
b. If α is a pronoun, interpret it as disjoint from every c-commanding phrase in D.

Following Lasnik (1976), we restate the former indexing requirement for r-expressions along the same lines.

(212) If α is an r-expression, interpret it as disjoint from every c-commanding phrase.

Nothing is said about interpretation in other cases. The standard examples are interpreted straightforwardly. Split antecedence is now understood to be a special case of free reference. Thus, in (209) any interpretation is permitted, including those indicated in (209), and also others, for example, an interpretation in which they is taken to refer to John and some third party, but not Mary.

What about more complex cases such as (213) (Wasow 1972)?

(213) the woman who loved him_i told him_j that John_i was intelligent

Here, we have to exclude the interpretation in which the two pronouns and John all corefer. The problem is that the binding conditions permit both John_i and him_j to corefer with him_i. It then follows, incorrectly, that John_i and him_j can be coreferential. In the theory outlined earlier, this was excluded by the fact that coindexing is an equivalence relation, so that coindexing of both John_i and him_j with him_i entails that John is coindexed with him_j, which is barred by Condition C. But we now have no coindexing, hence no equivalence relation.

However, the same result is achieved simply as a consequence of the interpretation itself (Lasnik 1976). By (212), John_i is disjoint from him_j. Free interpretation allows the two pronouns to corefer and allows John_i to corefer with him_i.
If we adopt these options, him_j and John_i corefer, and we have an inconsistent interpretation, with John_i both coreferential with and disjoint from him_j. Nothing further need be said. Many other complex cases follow in the same way.

The theory outlined earlier, which is the standard one, involved an indexing procedure that satisfies the binding conditions and (explicitly or implicitly) an interpretive procedure. The approach just sketched unifies all three into an interpretive procedure. Whichever approach is followed, it now remains to consider the local domain in which anaphors must be bound and pronouns free.

Thus far the local domain has been the minimal clause containing the anaphor or pronoun. But this characterization is inadequate for a wider range of phenomena. In (214) the anaphor is free in its minimal clause, yet the example is well formed.

(214) John_i believes [himself_i to be clever]

Similarly, (215) is deviant even though the pronoun is free in the complement clause.

(215) *John_i believes [him_i to be clever]

We take the relevant difference between these examples and the embedded clause cases considered earlier to be in terms of government. In (214) and (215) the main verb governs the subject of the infinitival complement, as is evident from the accusative Case that shows up on that subject. In (216), on the other hand, there is clearly no such government relation, and the grammaticality judgments are the reverse of those in (214), (215).

(216) a. *John_i believes [himself_i is clever]
b. John_i believes [he_i is clever]

The local domain, or governing category as it is frequently called, involves reference to government, roughly as in (217), as a first approximation.

(217) The governing category (GC) of α is the minimal clause containing α and a governor of α.

In (214) and (215) the GC for the anaphor or pronoun is the entire sentence, since the governor, believe, is in the higher clause. Since both the anaphor and pronoun are bound in that domain, the former example is good, in obedience to Condition A, and the latter bad, in violation of Condition B. In (216) the GC is the lower clause, since the subject is assigned nominative Case by a governor internal to that clause, finite I (assuming that government is defined in terms of m-command). Since within the lower clause, there is no binder for the subject of that clause, (216a) is in violation of Condition A, and (216b) is in conformity with Condition B. Note that (217) correctly predicts that the difference between finite and infinitival complements is limited to subject position. With respect to object position, finite and nonfinite clauses are parallel.
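The predictions of the first approximation (217) for subject position, and for the object-position cases illustrated in (218)–(219), can be checked mechanically. The sketch below is a deliberately crude model and rests on several assumptions of ours: clauses form a simple chain numbered by depth (0 for the matrix clause), the clause hosting each element's governor is stipulated by hand rather than derived from a theory of government, and the c-commanding NPs are likewise listed by hand. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NP:
    """An NP in a chain of nested clauses; depth 0 is the matrix clause."""
    form: str
    kind: str             # "anaphor", "pronoun", or "r-expression"
    index: str
    clause: int           # depth of the minimal clause containing the NP
    governor_clause: int  # depth of the clause hosting its governor (stipulated)


def governing_category(a: NP) -> int:
    """(217): the minimal clause containing a and a governor of a."""
    return min(a.clause, a.governor_clause)


def bound_in_gc(a: NP, c_commanders: List[NP]) -> bool:
    """True if some coindexed c-commander of a lies inside a's GC."""
    gc = governing_category(a)
    return any(b.index == a.index and b.clause >= gc for b in c_commanders)


def obeys_binding(a: NP, c_commanders: List[NP]) -> bool:
    if a.kind == "anaphor":
        return bound_in_gc(a, c_commanders)      # Condition A
    if a.kind == "pronoun":
        return not bound_in_gc(a, c_commanders)  # Condition B
    raise ValueError("only anaphors and pronouns are covered in this sketch")


john = NP("John", "r-expression", "i", clause=0, governor_clause=0)
mary = NP("Mary", "r-expression", "j", clause=1, governor_clause=1)

# Subject of an infinitival complement: governed by believe in the matrix clause.
print(obeys_binding(NP("himself", "anaphor", "i", 1, 0), [john]))  # True,  (214)
print(obeys_binding(NP("him", "pronoun", "i", 1, 0), [john]))      # False, (215)
# Subject of a finite complement: governed by finite I in the lower clause.
print(obeys_binding(NP("himself", "anaphor", "i", 1, 1), [john]))  # False, (216a)
print(obeys_binding(NP("he", "pronoun", "i", 1, 1), [john]))       # True,  (216b)
# Object of the embedded verb, finite or nonfinite alike.
print(obeys_binding(NP("himself", "anaphor", "i", 1, 1), [john, mary]))  # False
print(obeys_binding(NP("him", "pronoun", "i", 1, 1), [john, mary]))      # True
```

Because the object's governor is always the embedded verb, the finite/nonfinite distinction plays no role in the last two calls, which is the parallelism just noted.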


(218) a. *John_i believes [Mary likes himself_i]
b. John_i believes [Mary likes him_i]
(219) a. *John_i believes [Mary to like himself_i]
b. John_i believes [Mary to like him_i]

In all four examples the GC for the anaphor or pronoun is the embedded clause, since the verb of the embedded clause is a governor of its object.

The local domain for Conditions A and B can be NP as well as IP, as seen in (220).

(220) *John_i likes [NP Bill's stories about himself_i]

This suggests that (217) should be extended in the obvious way to include NP. The large NP would then be the GC for himself since about governs that anaphor. However, matters are slightly more complicated than that: unexpectedly, (221) is grammatical.

(221) John_i likes [stories about himself_i]

Under the suggested extension, (221) should also be bad.

Note that in (220), in contrast with (221), the large NP contains not just the anaphor, but also a potential binder, that is, another NP that c-commands the anaphor. Our final modification incorporates this observation, and also generalizes from NP and IP to complete functional complex (CFC), where a CFC is a projection containing all grammatical functions compatible with its head.

(222) The GC for α is the minimal CFC that contains α and a governor of α and in which α's binding condition could, in principle, be satisfied.

This correctly distinguishes (220) from (221). As noted above, there is a potential binder, Bill's, for the anaphor in the large NP in (220), but none in (221). In the latter example the GC for the anaphor is thus the entire sentence, and Condition A is satisfied. Under the hypothesis alluded to in section 1.3.2 that subjects are base-generated internal to VP, the VP will be the GC, with the trace of the subject (which has itself moved to the [Spec, IP]) serving as the binder.

Note that the presence or absence of a potential binder (as opposed to an actual one) should play no role for Condition B, since there is no requirement that a pronoun have a binder at all.
Hence, the minimal CFC containing α and a governor of α (where α is a pronoun) should always be the minimal such CFC in which α's binding condition could, in principle, be satisfied. This predicts that (223) and (224) should both be good, if in fact the NP object of likes in (224) qualifies as a CFC.

(223) John_i likes [Bill's stories about him_i]
(224) John_i likes [stories about him_i]

As expected, (223) is perfect. (224), while perhaps slightly worse, is still reasonably acceptable. This latter example thus provides one context where the usual distinctness in distribution between anaphors and pronouns seems to break down. (221), with himself in place of him, was also, of course, grammatical. Note that, as predicted, distinct distribution is maintained if there is an actual binder within the large NP, as in (225).

(225) a. I like [John_i's stories about himself_i]
b. *I like [John_i's stories about him_i]

The NP John's stories about ___ is the smallest potential CFC in which Condition A or B could be satisfied. While in (225a) Condition A is satisfied in that domain, in (225b) Condition B is not.

There is some evidence that the apparent overlap in distribution seen in (221), (224) is only illusory. In (224), where him is construed as John, the stories are not taken as John's. This becomes even clearer in (226), since in that example the meaning of the verb virtually forces the stories to be John's.

(226) ?*John_i told [stories about him_i]

This suggests that (224) actually can have a structure similar to (223), but with the subject of the NP phonetically null. In that case the NP object of likes would clearly constitute a CFC. In (226), on the other hand, even if the NP object of told has a null subject, him_i will still be illicitly bound in the minimal CFC, since that subject is understood as John.

However, there is one other situation where the usual disjoint distribution definitely breaks down. English has, to a limited extent, configurations permitting long-distance anaphors. (227) is a representative example.

(227) Mary_i thinks [[pictures of herself_i] are on display]

Though herself is free within both an NP and a finite clause here, it is bound in its GC, the entire clause. There is no potential binder for the anaphor anywhere in the lower clause, so Condition A could not be satisfied, even in principle, within the lower clause.
Thus, herself is permitted to seek its binder in the upper clause, where, in fact, it finds it. Now note that a pronoun is possible in place of the anaphor.

(228) Mary_i thinks [[pictures of her_i] are on display]

The NP pictures of her (if it has a phonetically null subject), or the embedded clause (otherwise), is the smallest CFC that contains her and a governor of her (of or pictures, depending on certain assumptions about assignment of genitive Case; see section 1.4.3) and in which her could, in principle, be free. And her is, in fact, free in that domain. The limited overlap in distribution that exists is thus correctly accounted for by the relativized notion of GC in (222).

There is one remaining problem to consider before we leave this topic. Recall example (216a), repeated here as (229).

(229) *John_i believes [himself_i is clever]

Under the earlier absolute notion of GC, this was correctly excluded by Condition A. But under the characterization in (222), it is not. Though himself has a governor (finite Infl) in the lower S, there is no potential binder. The GC should therefore be the entire sentence, and John should be available as a legal binder. Assuming the basic correctness of the formulation of binding theory we have been developing, something other than Condition A must be responsible for the ill-formedness of (229). We suggest that the relevant condition is one discussed in section 1.4.1, which excludes traces from configurations in which they are not properly governed. On the face of it, this condition might seem irrelevant, because there is no trace evident in (229). However, it is plausible to regard the relation between a reflexive and its antecedent as involving agreement. Since agreement is generally a strictly local phenomenon, the reflexive must move to a position sufficiently near its antecedent. This might happen in the syntax, as in the cliticization processes of the Romance languages. If not, then it must happen in the LF component. In (229) this movement will leave a trace that is not properly governed. This approach directly accounts for the familiar observation that binding relations and movement processes fall under abstractly very similar constraints.
Further, if it is, indeed, the requirement of agreement that is forcing the (LF) movement of the reflexive, (230), which otherwise could have been problematic, is ruled out.

(230) *himself left

Notice that there is no potential binder for the reflexive, so Condition A does not exclude the example, given the formulation of GC in (222). However, in the absence of an antecedent, the agreement requirement cannot be satisfied. These speculations suggest that for reflexives without agreement, there will be no locality requirement (Yang 1983, Pica 1987).

Given that the Condition A requirement on reflexives is thus partially subsumed under the proper government requirement on traces, the question arises of whether these two constraints fall together even more generally. Heim, Lasnik, and May (1991), expanding upon a proposal of Lebeaux (1983), suggest that the locality requirement between reciprocal expressions and their antecedents is attributable to conditions on movement. To the S-Structure of sentence (231), an LF operation of each-movement, adjoining the distributor each to its antecedent, will be applicable, giving (232).

(231) The men saw each other
(232) [IP [NP [NP the men]_i each_j] [VP saw [NP t_j other]]]

In (233) this LF movement can be long distance. One reading (the noncontradictory one) of this sentence is representable as (234).

(233) they said that they are taller than each other
(234) [IP [NP [NP they]_i each_j] [VP said [CP that they_j are taller than [t_j other]]]]

When the verb of the main clause is a nonbridge verb, however, movement is characteristically blocked. Compare (235) with (236).

(235) who did they say that they are taller than t
(236) ?*who did they mutter that they are taller than t

Correspondingly, the wide scope reading for each is unavailable with the nonbridge verb, leaving only the contradictory reading.

(237) they muttered that they are taller than each other

Thus, both major classes of lexical anaphors, reflexives and reciprocals, display constraints suggestive of movement.

We turn finally to the question of the level(s) of representation relevant to the binding conditions. (238), whose derivation involves raising of the antecedent to the appropriate position to bind the reflexive, suggests that D-Structure need not meet Condition A.

(238) John_i seems to himself_i [t_i to be clever]

The issue is not entirely clear-cut, given the considerations of the preceding discussion, but we will tentatively assume that this is correct. Now observe that (239), from a D-Structure like that of (240), indicates that Condition C likewise need not be satisfied at D-Structure.

(239) [who that John_i knows] does he_i admire
(240) he_i admires [who that John_i knows]

Compare sentence (241), a standard Condition C violation.

(241) *he_i admires everyone that John_i knows


Further, (242) indicates that LF satisfaction of Condition C would not suffice. The LF representation of (241), following QR, shown in (242), is structurally very similar to the S-Structure (and LF, presumably) of (239).

(242) [everyone that John_i knows]_j [IP he_i admires t_j]

The relevant difference between (239) and (242) seems to show up neither at LF nor at D-Structure, but rather, only at S-Structure. Alternatively, as discussed in section 1.3.3, reconstruction could be at issue here. Under the null hypothesis that the binding conditions apply in a block, the level of representation at which they apply is S-Structure, or, assuming reconstruction, LF.

With respect to Condition A, we have considered the distribution and interpretation of reflexives. The empty category PRO, which was briefly discussed in section 1.3.1, is very similar in its interpretation and in some aspects of its distribution. Controlled PRO generally has just the interpretation that a reflexive would have. This, in fact, was the motivation for the self-deletion analysis of these constructions offered in Chomsky and Lasnik 1977. Further, the principles relevant to the control of PRO appear, on first inspection, to be similar to those involved in the assignment of antecedents to anaphors. For example, as already discussed, an anaphor as subject of an infinitival clause can successfully be bound by the next subject up, as in (243), just as a PRO can be bound in the parallel configuration in (244).

(243) John_i believes [himself_i to be clever]
(244) John_i tries [PRO_i to be clever]

And as subject of a finite clause, neither is permitted.

(245) *John_i believes [himself_i is clever]
(246) *John_i promises [PRO_i will attend class]
cf. John_i promises [PRO_i to attend class]

Further, while both are allowed as the subject of a nonfinite clause, in most circumstances the antecedent must be the next subject up for both.

(247) *John_i expects [Mary to believe [himself_i to be clever]]
(248) *John_i expects [Mary to try [PRO_i to be clever]]

However, alongside these similarities, there are striking differences in the distributions of PRO and standard anaphors. For example, the paradigmatic position for an anaphor, direct object, is unavailable to PRO.

(249) John injured himself
(250) *John injured PRO


Further, even in the kinds of structural positions allowing both PRO and anaphors, as in (243) and (244), the precise distribution is, in general, complementary rather than identical, as seen in the contrast between (243), (244), on one hand, and (251), (252), on the other.

(251) *John believes [PRO to be clever]
(252) *John tries [himself to be clever]

Thus, there are clear, and well-known, obstacles standing in the way of analyzing PRO simply as an anaphor, and thus determining its distribution and interpretation via Condition A. There have been a number of interesting attempts to overcome these obstacles, some of them involving appeals to the theory of Case, which we will explore in section 1.4.3. Suppose, for example, that himself requires Case, since it is lexical, while PRO does not tolerate Case, because it is not. Then (250) is immediately accounted for: PRO is Case-marked. (252) is straightforwardly explained on the standard assumption that try cannot exceptionally Case-mark; that is, it can Case-mark a complement NP but cannot Case-mark the subject of a complement clause. And (251) is ruled out since believe does exceptionally Case-mark, as seen in (243). But there are aspects of the distribution of PRO that cannot be deduced in this way. Consider (253).

(253) *John believes sincerely [Mary to be clever]
cf. John believes sincerely that Mary is clever

In (253) Mary fails to receive Case, perhaps because of the adjacency requirement on Case assignment. But (254) is no better than (251).

(254) *John believes sincerely [PRO to be clever]

Thus, a filter proscribing Case for PRO is insufficient.

Further examples, of the sort that have been widely discussed, indicate additional deficiencies of a purely Case-theoretic account of the distribution of PRO.
Since PRO in (255) is not in a configuration of Case assignment (a lexical NP is impossible here), that example might be expected to be grammatical, presumably with an arbitrary interpretation for PRO, as in (256).

(255) *it is likely [PRO to solve the problem]
(256) it is important [PRO to solve the problem]

And (257) might be expected to be grammatical with an arbitrary interpretation, or possibly with PRO controlled by John, given the general lack (or at least amelioration) of Condition A effects in clauses with expletive subjects, as illustrated in (258).

(257) *John believes [it to be likely [PRO to solve the problem]]
(258) John_i believes [it to be likely [that pictures of himself_i will be on display]]

(259)–(260), discussed in section 1.4.1 (see (139)), display one further configuration in which Case marking is inapplicable, yet PRO is nonetheless impossible.

(259) *my belief [Harry to be intelligent]
cf. my belief that Harry is intelligent
(260) *my belief [PRO to be intelligent]

In Chomsky 1981a it is argued that the crucial factor determining the distribution of PRO is government. In particular, (261) is offered as a descriptive generalization (see also section 1.4.1).

(261) PRO must be ungoverned.

Under the standard assumption that Case marking requires government, this will entail that PRO will not be Case-marked. But the requirement is now broader, since there is government without Case marking. This is what we find in (254), (255), (257), and (260). The distribution of PRO is thus correctly described.

(261) can itself be deduced from more general properties, namely, Conditions A and B. If we take PRO to be simultaneously both an anaphor and a pronominal, as suggested in section 1.3.1, it will then follow that it will never have a GC, since if it did, contradictory requirements would be in force, given that free entails not bound. (261) now follows, since a governed element will always have a GC. The relevance of this to the present discussion is that control must now be independent of Condition A, the condition determining antecedence for (pure) anaphors, since to exist at all, PRO must trivially satisfy Condition A, by virtue of having no GC.

This is widely viewed as an unfortunate, or even intolerable, consequence, and a substantial amount of research has focused on redefining PRO and/or providing alternative characterizations of governing category. For particularly interesting discussions along these lines, see, for example, Bouchard 1984 and Manzini 1983. We suggest here that control is different enough from anaphor binding that a separate mechanism for antecedent assignment is, in fact, justified. Consider first the familiar observation that in addition to the instances of control by a subject illustrated above, a controller can regularly be an object, as in (262).

(262) John told Mary_i [PRO_i to leave]

Thus far there is no evidence for distinguishing control from binding, since binding too can be by an object.

(263) John told Mary_i about herself_i

But at least two differences emerge on closer inspection. First, control is generally by a specifically designated argument. (See Nishigauchi 1984.) (264), with control by a subject instead of an object, is ill formed.

(264) *John_j told Mary [PRO_j to leave]

Binding, on the other hand, has no such constraint in English, as seen in the grammaticality of (265).

(265) John_j told Mary about himself_j

Thus, there is an optionality concerning choice of binder that does not regularly exist for choice of controller, a significant difference between the two phenomena.

Now, it is well known that there are languages unlike English with respect to this property of binding. In particular, there are languages where, apparently, only subjects can be binders. Polish is one such language, as illustrated in the following paradigm, from Willim 1982:

(266) Jan_i opowiadał Marii_j o swoim_i ojcu
John telling Mary about self's father
'John was telling Mary about his father'

(267) *Jani opowiada Mariij o

swoimj ojcuJohn tellingMary about self s fatherJohn was telling Mary about her fatherThese languages display a second difference between binding and control. For,while anaphor binding by a nonsubject is impossible, control by a nonsubjectis possible (or even necessary), just as in English.(268) Jani kaza Mariij [PROj/*i napisac artyku]John told MarywritearticleJohn told Mary to write an articleThe precise nature of the parameter distinguishing English-type anaphorbinding (any c-commander as the binder) from the Polish type (only subjectas the binder) is far from clear. But what does seem clear is that this parametric

The Theory of Principles and Parameters

101

difference does not carry over to control. For this and other reasons, there isconsiderable evidence for the existence of a distinct control module in thetheory of grammar.1.4.3

Case Theory

In some languages (Sanskrit, Latin, Russian, ...), Case is morphologically manifested, while in others, it has little (English, French, ...) or no (Chinese, ...) overt realization. In line with our general approach, we assume that Case is always present abstractly. In nominative/accusative languages, the subject of a finite clause is assigned nominative Case; the object of a transitive verb is assigned accusative Case (with some parametric and lexical variation, as discussed by Freidin and Babby (1984), Neidle (1988), among others); and the object of a pre- or postposition is assigned oblique Case (again with substantial variation). The basic ideas of Case theory grew out of the investigation of the distribution of overt NPs, those with morphological content. Chomsky and Lasnik (1977) proposed a set of surface filters to capture this distribution, but Vergnaud (1982) observed that most of their effects could be unified if Case is assigned as indicated just above, and if Case is required for morphological realization, as stated in (269), the Case Filter.

(269) Every phonetically realized NP must be assigned (abstract) Case.

Chomsky and Lasnik's filters, and Vergnaud's replacement, were largely concerned with subject position of infinitival clauses. By and large, a lexical NP is prohibited in this position.

(270) *it seems [Susan to be here]
(271) *I am proud [Bill to be here]

Finite counterparts of these constructions are possible.

(272) it seems [that Susan is here]
(273) I am proud [that Bill is here]

This is as predicted, since in (272)-(273) the embedded subject NP (Susan, Bill) is assigned nominative Case, while no Case is available for the corresponding NPs in (270)-(271).

Certain empty categories are permitted in place of the lexical NPs in (270)-(271). In (274) we have the trace of raised Susan, instead of the NP Susan itself, and in (275) we find PRO in place of Bill.

(274) Susan seems [t to be here]
(275) I am proud [PRO to be here]
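The logic of the Case Filter (269), as it bears on (270)-(275), can be sketched informally in code. This is only an illustrative toy under stated assumptions, not part of the theory: the names `NP`, `CASE_BY_POSITION`, `assigned_case`, and `passes_case_filter` are hypothetical, and structural positions are encoded as plain labels rather than derived from a syntactic structure.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed toy mapping from positions to the Case assigned there,
# following the three core situations named in the text.
CASE_BY_POSITION = {
    "subject_of_finite_clause": "nominative",
    "object_of_transitive_verb": "accusative",
    "object_of_adposition": "oblique",
}


@dataclass
class NP:
    form: str       # e.g. "Susan", "PRO", "t"
    overt: bool     # phonetically realized?
    position: str   # a key of CASE_BY_POSITION, or any other label


def assigned_case(np: NP) -> Optional[str]:
    """Return the abstract Case available in the NP's position, if any."""
    return CASE_BY_POSITION.get(np.position)


def passes_case_filter(np: NP) -> bool:
    """(269): every phonetically realized NP must be assigned Case."""
    return (not np.overt) or assigned_case(np) is not None


# (270) vs. (272): overt subject of an infinitive vs. of a finite clause
print(passes_case_filter(NP("Susan", True, "subject_of_infinitive")))     # False
print(passes_case_filter(NP("Susan", True, "subject_of_finite_clause")))  # True
# (275): PRO is not phonetically realized, so (269) does not apply to it
print(passes_case_filter(NP("PRO", False, "subject_of_infinitive")))      # True
```

The sketch models (269) only as formulated at this point in the discussion; it says nothing about which empty categories may themselves require Case.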


Indeed, as discussed in section 1.3.1, it is the Case requirement that forces the movement producing (274) from an underlying structure like (270). (269) then need not be satisfied at D-Structure, but rather is a condition on a derived level of representation.

(276) displays another construction permitting PRO as subject of an infinitive while disallowing lexical NP.

(276) a. Bill tried [PRO to be here]
b. *Bill tried [Mary to be here]

In surprising contrast with the complement of try, we find just the reverse behavior with believe.

(277) a. *Bill believed [PRO to be here]
b. Bill believed [Mary to be here]

As seen in section 1.4.2, (276a) versus (277a) receives an account in terms of binding theory. The CP complement of try is a barrier to government of the subject of the complement, so PRO is allowed, having no GC in this configuration. Under the assumption that believe, as a lexical property, takes just an IP complement, PRO in (277a) is governed, hence has a GC. Either Condition A or Condition B is then necessarily violated. However, (277b) is not yet explained. In that example Mary is not the subject of a finite clause, the object of a transitive verb, or the object of a preposition, so (269) should be violated. The fact that the example is acceptable indicates that Mary does receive Case; (278) indicates that that Case is accusative (or oblique) rather than nominative.

(278) Bill believed [her (*she) to be here]

Further, there is evidence that the Case assigner is the matrix verb believe(d). Perhaps because of the meager overt Case system in English, Case assignment generally conforms to an adjacency requirement, as illustrated in (279).

(279) a. Bill sincerely believed Sam
b. *Bill believed sincerely Sam

The same requirement exhibits itself with respect to the subject of the infinitival complement of believe.

(280) a. Bill sincerely believed [Mary to be here]
b. *Bill believed sincerely [Mary to be here]

Evidently, believe can assign accusative Case not only to its object (the core situation) as in (279a), but also to the subject of its infinitival complement, a phenomenon often referred to as exceptional Case marking (ECM). Recalling that (277a) shows that there is a government relation in this configuration, we conclude that Case is assigned under government (and, parametrically, adjacency), a slightly weaker requirement than the head-complement relation at its core. We tentatively take nominative Case also to fall under government, in this instance government of the subject by the inflectional head of IP (assuming an m-command definition of government).

In English the lexical heads V and P appear to be Case assigners, while N and A do not. This is why NPs can occur as direct complements of the former, [-N], categories, but not of the latter, [+N], categories, despite the fact that X-bar theory would lead us to expect the same range of complements in both situations. Thus, while proud can take a clausal complement, as seen in (273) and (275), it cannot take a bare NP.

(281) *I am proud my students

Likewise, while the verb criticize takes an NP complement, its nominalization criticism does not.

(282) John criticized the theory
(283) *John's criticism the theory

In place of the NP complements in (281) and (283), we find an apparent prepositional phrase with a semantically null preposition of.

(284) I am proud of my students
(285) John's criticism of the theory

It seems that of is inserted to provide a Case assigner for a lexical NP that would otherwise be Caseless. Insertion of a pleonastic element to fulfill a morphosyntactic requirement is a rather common process. Do-support salvages an inflectional affix isolated from V by movement of I to C, as in (286).

(286) did John leave

But there is some reason to question such an account of (284)-(285). In particular, none of the other Case Filter violations enumerated above can be salvaged by the insertion of of.

(287) *it seems of Susan to be here (cf. (270))
(288) *I am proud of Bill to be here (cf. (271))
(289) *Bill tried of Mary to be here (cf. (276b))
(290) *Bill believed sincerely of Sam (cf. (279b))
(291) *Bill believed sincerely of Mary to be here (cf. (280b))


To the (271) versus (288) paradigm, with an adjectival head of the construction, could be added (292a) versus (292b), where the head is nominal.

(292) a. *my proof John to be here
b. *my proof of John to be here

That proof can take a clausal complement is evidenced by (293).

(293) my proof that John is here

Further, it would be expected to take an infinitival complement as an option since the verb to which it is related can.

(294) a. I proved that John is here
b. I proved John to be here

It is important to note that under other circumstances of-insertion is available with proof, as illustrated in (295).

(295) a. *my proof the theorem
b. my proof of the theorem

Two requirements emerge from the data examined so far. First, of-insertion takes place in the context of a [+N] head (N or A) and not otherwise. And second, of is available only for the complement of an appropriate head. It is not possible in exceptional circumstances. This suggests a different perspective on of-insertion. Instead of of being inserted, as a sort of last resort, before the complement of an A or N, suppose A and N are, in fact, (genitive) Case assigners, as is overtly visible in German (Van Riemsdijk 1981). Of can then be regarded as the realization of this genitive Case in this configuration in English. Following Chomsky (1986b), we then distinguish the structural Cases accusative and nominative, which are assigned solely in terms of S-Structure configuration, from inherent Cases, including genitive, which are associated with θ-marking. That is, inherent Case is assigned by α to NP only if α θ-marks NP. In (292), then, John cannot receive inherent Case from proof since it receives no θ-role from it. Structural Case has no such thematic requirement, but proof, being a noun, has no structural Case to assign. Thus, John receives no Case at all and violates the Case Filter. Note that under the inherent Case approach to of-insertion, the abstract Case needed for the satisfaction of the Case Filter can be either structural or inherent.

Passives are another construction in which Case is evidently not available, but where of-insertion, now viewed as inherent genitive assignment, still does not obtain. (296) illustrates this for exceptional Case.

(296) *it is believed (of) Mary to be here

Compare Mary is believed to be here and it is believed that Mary is here. These examples show that a passive verb, unlike a preposition or active verb, is not a structural Case assigner. The impossibility of of here is not surprising, given the thematic requirement we have seen. (297) is more problematic.

(297) *it is believed (of) Mary
cf. Mary is believed

Again, structural Case is unavailable, indicating that, as suggested in Chomsky and Lasnik 1977, passive verbs are not [-N]. But since Mary is the θ-marked complement of believed, inherent genitive Case might be expected. The fact that it is not possible indicates that a passive verb, while not a verb ([+V, -N]), is not an adjective ([+V, +N]) either. Rather, it is a neutralized [+V] category with no marking for the feature [N]. Alternatively, as in Baker, Johnson, and Roberts 1989, the passive morpheme is actually an argument receiving the subject θ-role of the verb and the accusative Case that the verb assigns. Accusative Case is then unavailable for the object of the verb, or for the subject of a clausal infinitival complement.

The Case Filter was originally proposed as a morphological requirement, and while such a requirement might well be at its core, there are relevant phenomena that do not seem amenable to an account in morphological terms. The trace of wh-movement generally must conform to the Case Filter; note that virtually all of the contexts examined thus far where a lexical NP is prohibited also disallow a wh-trace.

(298) *who does it seem [t to be here]
(299) *who are you proud [t to be here]
(300) *who did Bill try [t to be here]
(301) *who are you proud t
(302) *which theory did you understand the proof t
(303) *who is it believed t

Though traces have features, they have no morphological realization, so (298)-(303) are unexpected. It might be thought that it is actually the wh-phrase antecedent of the trace that must satisfy (269), with Case somehow being transmitted from the trace via the links of the movement chain in well-formed wh-questions such as (304).

(304) who did you see t


However, the paradigm is replicated in constructions where even the moved operator need not have overt morphological realization, as in the relative clauses in (305) or the complex adjectival constructions in (306).

(305) a. the man (who) I see
b. *the man (who) it seems to be here
c. *the man (who) you are proud to be here
d. *the man (who) Bill tried to be here
e. *the man (who) I am proud
f. *the theory (which) you understand the proof
g. *the man (who) it is believed

(306) a. Mary_i is too clever [Op_i [for us to catch t_i]]
b. *Mary_i is too reclusive [Op_i [for it to seem t_i to be here]]
c. *Bill_i is too unpopular [Op_i [for you to try t_i to be here]]

Evidently, both phonetically realized NPs and variables (traces of operator movement) must have abstract Case. Arguably, pro, the null pronominal subject in such languages as Italian and Spanish, must also since it typically occurs as the subject of a finite clause. In terms of phonetics and morphology, these three NP types constitute an unnatural class. It is for this reason that we instead might attribute Case Filter effects to θ-theory. As mentioned in section 1.3.1, we assume that an argument must be visible for θ-role assignment, and it is Case that renders it visible. This correctly distinguishes overt NPs, variables, and pro, on the one hand, from NP-trace on the other hand. Only the former are arguments.

We now assume, then, that the Case Filter is, in effect, part of the principle of θ-marking: a chain is visible for θ-marking only if it has a Case position. Economy conditions (Last Resort) block further movement if a Case position has been reached in chain formation. Given the interface condition on D-Structures, we derive the Chain Condition: in an argument chain (α_1, ..., α_n), α_1 is a Case position and α_n a θ-position.

In discussing the Chain Condition in section 1.3.1, we noted two major problems: concerning expletives and PRO. The former were discussed in section 1.3.3; it remains to deal with the fact that argument PRO appears in non-Case positions, a fact that apparently compels us to adopt a disjunctive version of the Visibility Condition that falls short of a true generalization.

(307) A chain is visible for θ-marking if it contains a Case position (necessarily, its head) or is headed by PRO.

The problems concerning PRO are in fact more serious. Thus, PRO is like other arguments in that it is forced to move from a non-Case position, and


cannot move from a Case-marked position, facts left unexplained even by the unsatisfactory disjunction (307).

The first problem is illustrated by such constructions as (308).

(308) we never expected [there to be found α]

If α is an indefinite NP, the counterpart to (308) is grammatical in many languages and marginally acceptable in English (more so, with heavy NPs such as a hitherto unknown play by Shakespeare); at LF, α raises to the position of the expletive, giving a chain that satisfies the Visibility Condition. But with α = PRO, the sentence is completely excluded, though all relevant conditions are satisfied: PRO occupies a θ-position as object of find, and choice of arbitrary PRO should satisfy the definiteness condition, giving the meaning 'we never expected that some arbitrary person would be found'. Overt raising of PRO to the position of there is possible, as in (309), but with an entirely different meaning involving control by we.

(309) we never expected [PRO to be found t]

As a descriptive observation, yet to be explained, we conclude that PRO must move from a non-Case position at S-Structure, while other arguments must move from such a position either at S-Structure or at LF.

To bar (308), we might appeal to the requirement that PRO be ungoverned (see section 1.4.2). We must, however, now assume that this condition applies at S-Structure; if the condition follows from Conditions A and B of the binding theory, then these too apply at S-Structure. To account for (309), we might modify Last Resort to permit movement of PRO from a governed position.

Both the assumption that binding theory applies at S-Structure and the extension of Last Resort are open to question. Furthermore, they are empirically inadequate, because of the second problem: like other arguments, PRO is not permitted to move from a Case-marked position, even to escape government. The problem is illustrated in such forms as (310).

(310) a. α to talk about β
b. α to strike β [that the problems are insoluble]
c. α to seem to β [that the problems are insoluble]

Suppose that (310a) is a D-Structure in the context it is unfair __, with α = e and β = John. Last Resort bars raising of β to position α, yielding (311a), because the chain (John) is already visible for θ-marking without movement. Suppose β = PRO. On the assumptions now under consideration, PRO must raise to the position α to satisfy the nongovernment requirement. But that movement is impermissible, even though α is a legitimate position for PRO in other constructions, as in (311c).


(311) a. *it is unfair [John to talk about t]
b. *it is unfair [PRO to talk about t]
c. it is unfair [PRO to talk about John]

One might argue in this case that there is a θ-theory violation, the subject being an obligatorily θ-marked position (a dubious move, as illustrated by nominalizations in which no external θ-role is assigned; see Chomsky 1981a). But that argument will not suffice for (310b-c) (Lasnik 1992). Here α is in a non-θ-position, so that the sentences are well formed with α = expletive it and β = John as in (312a-b).

(312) a. it is rare for it to strike John that the problems are insoluble
b. it is rare for it to seem to John that the problems are insoluble

Still, β = John cannot raise to the position α, leaving trace, as in (313).

(313) a. *We want John to strike t that the problems are insoluble
b. *We want John to seem to t that the problems are insoluble

In the case of β = John, Last Resort accounts for the phenomena, Case being assigned in the trace position and therefore barring further movement. But suppose that β = PRO in (310). The requirement of nongovernment forces movement, to yield (314).

(314) a. PRO to strike t [that the problems are insoluble]
b. PRO to seem to t [that the problems are insoluble]

PRO is now in an ungoverned position, heading a θ-marked chain. Hence, all conditions are satisfied. But the constructions are radically ungrammatical, whatever the context.

We conclude, then, that the proposal to impose the nongovernment requirement for PRO at S-Structure and to incorporate this condition in Last Resort did not solve the problem. Even with these questionable moves, the disjunctive formulation of the Visibility Condition remains empirically inadequate, as well as unsatisfactory. Some other principle requires that PRO behave like other arguments, moving from non-Case positions and barred from moving from Case positions.

Notice that these anomalies would be overcome if PRO, like other arguments, has Case, but a Case different from the familiar ones: nominative, accusative, and so on. From the point of view of interpretation, we might regard PRO as a minimal NP argument, lacking independent phonetic, referential, or other properties. Accordingly, let us say that it is the sole NP that can bear null Case (though it may have other Cases as well, in nonstandard conditions that we will not review here). It follows that Last Resort applies to


PRO exactly as it does to any argument: PRO is permitted to move from a non-Case position to a position where its Case can be assigned or checked, and is not permitted to move from a Case position. The Visibility Condition can now be simplified to (315).

(315) A chain is visible for θ-marking if it contains a Case position.

The Case position is necessarily the head of the chain, by Last Resort.

Observe further that in some languages, agreement plays the same role as Case in rendering chains visible (Baker 1988). Thus, abstract Case should include agreement along with standard Case phenomena. The realization of abstract Case will depend on parametric choices for functional categories. Case is a relation of XP to H, H an X^0 head that assigns or checks the Case of XP. Where the feature appears in both XP and H, we call the relation agreement; where it appears only on XP, we call it Case.

In English, Spanish, and other languages with minimal overt Case marking, agreement is often manifest with PRO as well as overt NPs, as in (316), where the predicate necessarily agrees with the subject of the lower clause.

(316) a. I want [them to be officers]
b.
c.
d.

Thus, PRO includes φ-features for agreement, elements of abstract Case if we construe this category in the manner just indicated. It is a small further step, then, to suppose that like other NPs, PRO contains standard Case as well as agreement features.

Where, then, is null Case assigned or checked (assume the latter, for concreteness)? Recall that nominative Case is standardly checked in [Spec, IP], where I involves the features of tense and agreement (T, Agr). It is thus a realization of a Spec-head relation, with the head = I, the head of IP. It is natural, then, to take null Case to be a realization of the same relation where I lacks tense and agreement features: the minimal I checks null Case, and the minimal NP alone can bear it. More generally, we may assume that the infinitival element (with null agreement) and the head Ing of gerundive nominals check null Case, so that PRO will appear in such constructions as (317).

(317) a. PRO to VP (to be sick)
b. PRO Ing VP (being sick)

One striking anomaly still remains in Case theory. We are taking abstract Case to be an expression of an (XP, head) relation. But we still have two


distinct relations of head to XP, leaving us still with an unsatisfactory disjunctive formulation: while nominative (and now null) Case is the realization of a Spec-head relation, accusative Case is assigned by V to an NP that it governs. In discussing the matter earlier, we extended government to m-command to incorporate nominative Case assignment; but apart from the Case relation, c-command appears to be the appropriate basis for government. It would be more natural to suppose that structural Case in general is the realization of a Spec-head relation, while inherent Case, which, as we have seen, is associated with θ-marking, is assigned by lexical heads. We have already touched upon this possibility in discussing the inflectional system in section 1.3.2, where we took it to have the form (318) (= (77)).

(318) [IP Spec [Agr_s' Agr_s [TP T [Agr_oP Spec [Agr_o' Agr_o VP]]]]]

As before, the notations Agr_s and Agr_o are mnemonics; there is only one element Agr, a collection of φ-features. We continue to omit a possible [Spec, T] and negation, and to assume that at D-Structure the subject occupies the [Spec, VP] position.

Recall further that the V head of VP amalgamates with the heads Agr_o, T, and Agr_s; and at least by LF, V with its affixes has raised to eliminate all traces not c-commanded by their antecedents. Verbs may or may not have the ability to assign Case, which we may assume to be indicated by a two-valued feature [±Case] for accusative and unaccusative verbs (Perlmutter 1978, Burzio 1986). If V has [+Case], then the amalgam [Agr_o, V] will also have this feature and will check accusative Case in the position [Spec, Agr_o]; if V has [-Case], an NP in [Spec, Agr_o] will not have its Case checked and must therefore move to [Spec, Agr_s]. The [Agr_s, T] amalgam checks either nominative or null Case in the position [Spec, Agr_s], depending on whether T has the value [+tense] or [-tense]. Structural Case in general is simply a manifestation of the [Spec, Agr] relation, with realizations as Case or agreement, depending on language-particular morphology.
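The feature-driven checking just described can be summarized in a small sketch. It is an informal illustration under simplifying assumptions, not a formalization proposed in the text: the function names `agr_s_case` and `object_case` are hypothetical, the two agreement positions are written as the strings "[Spec, Agr_s]" and "[Spec, Agr_o]", and the sketch ignores the restriction that null Case can be borne only by PRO.

```python
from typing import Tuple


def agr_s_case(tense: bool) -> str:
    """Case checked by the [Agr_s, T] amalgam in [Spec, Agr_s].

    T with [+tense] checks nominative Case; T with [-tense] checks null Case.
    """
    return "nominative" if tense else "null"


def object_case(v_has_case: bool, tense: bool) -> Tuple[str, str]:
    """Return (checking position, Case) for an NP raised to [Spec, Agr_o].

    If V is [+Case], the [Agr_o, V] amalgam checks accusative Case there.
    If V is [-Case], the NP is not checked in [Spec, Agr_o] and must move
    on to [Spec, Agr_s], where the [Agr_s, T] amalgam checks its Case.
    """
    if v_has_case:
        return ("[Spec, Agr_o]", "accusative")
    return ("[Spec, Agr_s]", agr_s_case(tense))


# Transitive verb, finite clause: accusative in [Spec, Agr_o]
print(object_case(v_has_case=True, tense=True))
# [-Case] verb, finite clause: the NP raises and is checked nominative
print(object_case(v_has_case=False, tense=True))
# [-Case] verb, [-tense] T: only null Case is available in [Spec, Agr_s]
print(object_case(v_has_case=False, tense=False))
```

The two boolean parameters stand in for the features [±Case] on V and [±tense] on T; a fuller model would have to represent the amalgamation of heads and the level (S-Structure or LF) at which each raising operation applies.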

As we have seen, one standard kind of parametric variation among languages has to do with the position of S-Structure in the derivation of LF from D-Structure. Thus, certain operations that are necessary for satisfying LF conditions may apply before or after the branch point to the PF component. The same is true of the operations that raise NP to the [Spec, Agr] positions for Case checking. Suppose that all the NP-raising operations are at LF and the language is left-headed, with V raising overtly to the inflectional position. Then as noted earlier (section 1.3.2), we have a VSO configuration at S-Structure, V and the inflectional elements having amalgamated and trace of V heading VP in (318), with subject and object remaining in their VP-internal positions. Subject will raise to [Spec, Agr_s] and object to [Spec, Agr_o] at LF. Suppose that subject raising is overt and object raising covert in the LF component. We then have an SVO configuration at S-Structure, with the VP headed by V or its trace depending on whether the language lowers inflections to V (like English) or raises V to inflection (like French, and English auxiliaries; see section 1.3.1). Suppose that the language is right-headed with overt object raising and covert subject raising; we then have OSV order at S-Structure (scrambling). If both subject and object raise overtly in a right-headed language, we will still have SOV order, but with traces in the original positions in VP. Other options are also possible.

The parameters involved are much like those that differentiate English-type languages that require overt raising of a question phrase from Chinese-type languages that leave all such phrases in situ. As discussed in section 1.3.3, we take the economy principles to prefer covert operations, which do not feed the PF component, to overt operations that do. Hence, unless a language requires that movement be overt, it will apply at LF, as in Chinese-type interrogatives or multiple wh-phrases in English-type languages. We might assume that what is involved is a condition on S-Structure Spec-head agreement, where the head is the C to which the wh-phrase raises, that is, a condition on Case, in the broad sense now under consideration. The conditions on Agr_s and Agr_o are similar. Only if S-Structure Spec-head agreement (Case, in the broad sense) is required is overt raising permissible: in English, for Agr_s but not Agr_o. For a formulation eliminating the S-Structure condition, see chapter 3.

This approach, which reduces Case agreement to a reflection of the Spec-head relation, requires that we modify the formulation of a number of the basic principles discussed earlier, while leaving their content essentially intact, for example, the Last Resort condition for movement and the associated Chain Condition. Consider the D-Structures (319).


(319) a. we believe [e to have [VP John won the election]]
b. we believe [e to have [VP been elected John]]

Assuming the VP-internal subject hypothesis, John is within VP in (319a) and must raise to the subject position e, as also in (319b), yielding the S-Structure forms (320).

(320) a. we believe [John to have [t won the election]]
b. we believe [John to have [been elected t]]

The standard account, reviewed earlier, explains this in terms of the Chain Condition, assuming an S-Structure requirement on Case assignment. Movement is a legitimate last resort operation.

We now cannot appeal to this argument for S-Structure movement. The problem is that the S-Structure forms (320) still do not satisfy the Chain Condition, because Case is checked only at the LF representations (321).

(321) a. we [John believe [t to have t won the election]]
b. we [John believe [t to have been elected t]]

This is one of a class of problems relating to the subject position [Spec, IP], a non-θ-position that can be occupied either by an argument (raised from a θ-position) or an expletive, which may in turn be overt (there, it) or vacuous, that is, nothing but a target for movement. The expletive can be pro, if the language permits null subjects. In such a case, analogues to (319) would be acceptable in principle at S-Structure with e being pro, assuming the satisfaction of other conditions (the indefiniteness condition, etc.). Then LF movement would replace pro by its associate in the normal fashion.

Note that these problems arose in a different way in the standard account. In part the problems were conceptual: the standard account was based on the dubious assumption that Case must be checked at S-Structure, though on conceptual grounds we would expect the Visibility Condition, hence the Chain Condition, to apply only at the LF interface. In part the problems were similar to the ones just raised. Thus, in the construction (322), for example, the phrase an error is raised at S-Structure even though the target position is not assigned structural Case; this is checked (or assigned) only at LF, after expletive replacement.

(322) there was an error made t in the calculation

The problem is similar to the one we now face in the case of (319)-(321).

The EPP (see section 1.3.2) requires, for English, that the [Spec, IP] position be present through the course of a derivation, hence occupied by an expletive at D-Structure. Other optional positions (e.g., [Spec, Agr_o]) may be assumed


to be inserted in the course of the derivation as part of the movement operation itself, inserting a target for movement in a manner conforming to X-bar theory. Where the expletive is inserted to satisfy the EPP, it must be either pro or a vacuous target for movement. English lacks the first option and must therefore accept the second: the vacuous expletive, which is only a target for movement.

A vacuous expletive, being only a target for movement, must be eliminated as soon as possible. Either it is eliminated by the very movement operation that inserted it as a target, or, if it was inserted at D-Structure to satisfy the EPP, it is eliminated at the first opportunity in the course of derivation, hence surely by S-Structure, in the course of cyclic application of rules from the most deeply embedded structure to the highest category. Indirectly, then, (320) is left as the only option for English. It is necessary to extend this reasoning to other constructions that exhibit a similar range of properties, a matter that requires a closer analysis of the notions of economy and the status of expletives. For discussion within a considerably simplified framework, see chapter 3.

Turning now to the new version of Case theory, we can account for the fact that raising takes place at S-Structure in such constructions as (319). And since English does not require S-Structure checking of accusative Case, overt operations cannot form (321). It remains to provide a new interpretation of the Chain Condition and Last Resort, to conform to the new assumptions.

These revisions are straightforward. The Visibility Condition took Case (now including agreement) to be a condition for θ-marking. We assumed before that this was a condition on chains (the Chain Condition). We now take it to be a condition on linked chains, where a linked chain is formed by linking two chains C_1 and C_2 of (323), where α_n = β_1.

(323) a. C_1 = (α_1, ..., α_n)
b. C_2 = (β_1, ..., β_m)

The new linked chain C_3, headed by α_1 and terminating in β_m, is the LF object that must satisfy the Chain Condition. In the examples (319)-(321) we have the linked chain (John, t, t) at LF, in each case. The account can be simplified further in ways that we will not explore here.

Turning now to Last Resort, its intuitive content was that operations should be permissible only if they form legitimate LF objects. We now relax that requirement, taking an operation to be permissible if it is a prerequisite to the formation of a legitimate LF object; had the operation not taken place, the derivation would not have been able to form such an object. S-Structure raising is now a permissible last resort operation because, were it not to apply, the

derivation would not yield legitimate LF objects in the case of (320), (322); the latter case indicates that this interpretation of Last Resort was already necessary in the standard account.

In presenting the standard account, we noted that the Case Filter is not satisfied at D-Structure, but rather is a condition on a derived level of representation. Apart from expletive constructions, that level was S-Structure, for English. We have now moved to the conceptually preferable assumption that the Case Filter is satisfied only at the interface level. S-Structure movement, where required, follows from the economy conditions, the EPP, and properties of expletives (including the null subject parameter).

It remains to settle many other questions (see chapter 3). But the basic structure of the system is reasonably clear, and it offers some prospects for unifying the properties of Case theory and integrating it into the general framework in a natural way.

1.5 Further Topics

The review above is sketchy and incomplete, and leaves many important topics virtually or completely unmentioned. A number of examples have been noted, among them the status of morphology, a question with broad implications, however the problems are settled. The discussion of the computational system is also crucially too narrow in that it excludes the PF component. This restriction of scope not only omits major topics (see Chomsky and Halle 1968, Goldsmith 1976, McCarthy 1979, Clements 1985, Dell and Elmedlaoui 1985, Halle and Vergnaud 1988, among many others), but also begs certain questions; as briefly noted earlier, there are open questions as to whether certain operations and properties we have assigned to the LF component do not in fact belong to the PF component (section 1.3.3).

Similar questions arise about the actual division of labor between the PF component and the overt syntax. Consider, for example, the parallelism requirement (call it PR) that holds of such expressions as (324).

(324) John said that he was looking for a cat, and so did Bill [say that he was looking for a cat]

The first conjunct is several-ways ambiguous. Suppose we resolve the ambiguities in one of the possible ways, say, by taking the pronoun to refer to Tom, and interpreting a cat nonspecifically, so that John said that Tom's quest would be satisfied by any cat. The constraint PR requires that the second conjunct be interpreted in the same way as the first: in this case, with he referring to Tom

and a cat understood nonspecifically. The same is true of the elliptical construction (325).

(325) John said that he was looking for a cat, and so did Bill

Here too, the interpretation satisfies PR (Lasnik 1972, Sag 1976, Ristad 1993).

On our assumptions so far, PR applies to the LF representation. If (325) is generated at S-Structure, we must assume that some LF process "regenerates" something like (324), which is then subject to PR. A simple alternative would be to deny that (325) is generated at S-Structure, taking it to be formed by a rule of the PF component that deletes the bracketed material in (324) to form (325), as in earlier versions of generative grammar. That alternative is strengthened by observation of a distinctive phonetic property of (324): the bracketed phrase has a distinguished low-flat intonation. That property, we assume, is determined within the PF component. The deletion rule, then, could say simply that material with this intonational property may optionally delete. Since such expressions as (324) have their particular status in the language, they must be generated quite independently of their elliptical counterparts. We are left, then, with a very simple treatment of ellipsis: it reduces to deletion of phonetically marked material by a general principle. The problems of parallelism, and so on, must still be dealt with for such examples as (324), but that is true independently of how we handle ellipsis.

If this approach is correct, then a wide class of elliptical constructions will be formed within the phonological component, not by operations of the overt syntax. Numerous problems remain, for example, the status of such expressions as (326), derived from the presumed underlying forms (327), which are, however, ill formed in this case.

(326) a. John said that he was looking for a cat, and Bill did too
      b. John likes poetry, but not Bill

(327) a. John said that he was looking for a cat, and Bill did [say he was looking for a cat] too
      b. John likes poetry, but not Bill [likes poetry]

The solution to the problem might well involve significant changes in how inflectional processes and negation are treated in the overt syntax. We leave the question here, merely noting that an approach to ellipsis that has considerable initial plausibility involves PF component properties in ways that may have large-scale effects when pursued. In this respect too, omission of the PF component leaves important questions unanswered.
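The PF-deletion treatment of (324) and (325) can be illustrated with a minimal Python sketch. Everything in it is an assumption introduced for illustration: the encoding of a phrase as word/flag pairs, the flag standing in for low-flat intonation, and the names `mark` and `pf_spell_out`. It models only the optional deletion step, not parallelism (PR) and not the problematic cases (326) and (327).

```python
from typing import List, Tuple

# Each token pairs a word with a flag saying whether the PF component has
# assigned it the distinguished low-flat intonation (a simplified, assumed encoding).
Token = Tuple[str, bool]


def mark(words: str, low_flat: bool) -> List[Token]:
    """Split a string into tokens that all share one intonation flag."""
    return [(word, low_flat) for word in words.split()]


def pf_spell_out(tokens: List[Token], apply_deletion: bool) -> str:
    """Spell out the tokens; if the optional rule applies, drop low-flat material."""
    kept = [word for word, low_flat in tokens
            if not (apply_deletion and low_flat)]
    return " ".join(kept)


# (324): the bracketed phrase is the part carrying low-flat intonation.
example_324 = (
    mark("John said that he was looking for a cat, and so did Bill", False)
    + mark("say that he was looking for a cat", True)
)

full_form = pf_spell_out(example_324, apply_deletion=False)       # corresponds to (324)
elliptical_form = pf_spell_out(example_324, apply_deletion=True)  # corresponds to (325)

print(full_form)
print(elliptical_form)
```

On this sketch the two outputs differ only in whether the optional rule applied, which is the sense in which ellipsis reduces to deletion of phonetically marked material.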

The discussion of modules of language is also seriously incomplete. We have, for example, said virtually nothing about θ-theory and argument structure (see, among many others, Gruber 1965, Jackendoff 1972, 1983, 1987, 1990b, Williams 1981, Bresnan 1982, Higginbotham 1985, 1988, Hale and Keyser 1986, 1991, Wilkins 1988, Grimshaw 1990, Pustejovsky 1992) and have barely mentioned the theory of control, topics that interact crucially with other aspects of syntax. Further inquiry into these topics raises the question whether the system of modules is, in fact, a real property of the architecture of language, or a descriptive convenience.

It is unnecessary to add that this sketch also omits many other major topics that have been the focus of highly productive inquiry and provides only a scattered sample of relevant sources on the topics that have been addressed. As explained at the outset, we have attempted no more than to indicate the kinds of work being pursued within the general P&P framework and to outline some of the thinking that underlies and guides it.

Some Notes on Economy of Derivation and Representation

This chapter originally appeared in Principles and Parameters in Comparative Syntax, edited by Robert Freidin (Cambridge, Mass.: MIT Press, 1991), and is published here with minor revisions.

The past few years have seen the development of an approach to the study of language that constitutes a fairly radical departure from the historical tradition, more so than contemporary generative grammar at its origins. I am referring to the principles-and-parameters (P&P) approach,1 which questions the assumption that a particular language is, in essence, a specific rule system. If this approach is correct, then within syntax (excluding phonology)2 there are no rules for particular languages and no construction-specific principles. A language3 is not, then, a system of rules, but a set of specifications for parameters in an invariant system of principles of Universal Grammar (UG); and traditional grammatical constructions are perhaps best regarded as taxonomic epiphenomena, collections of structures with properties resulting from the interaction of fixed principles with parameters set one or another way. There remains a derivative sense in which a language L is a "rule system" of a kind: namely, the rules of L are the principles of UG as parameterized for L.

In the course of this recent work, certain unifying concepts have emerged, unifying in the sense that they appear throughout the components of a highly modular system: c-command and government, for example. There also seem to be fairly general principles involving these concepts, with wide-ranging effects. The Empty Category Principle (ECP), belonging to the theory of government, is one such principle, which has been the subject of much fruitful work. Such concepts and principles play a pervasive role in a tightly integrated system; slight modifications in their formulation yield a diverse and often complex array of empirical consequences, which have also been fruitfully explored in a large number of languages. And we may be fairly confident that much remains to be learned about just how they should be expressed.

I think we can also perceive at least the outlines of certain still more general principles, which we might think of as "guidelines," in the sense that they are too vaguely formulated to merit the term "principles of UG." Some of these guidelines have a kind of "least effort" flavor to them, in the sense that they legislate against "superfluous elements" in representations and derivations. Thus, the notion of Full Interpretation (FI) requires that representations be minimal in a certain sense. Similarly, the Last Resort condition on movement, which yields a partial explanation for the requirement that A-chains be headed by a Case position and terminate in a θ-position (the Chain Condition), has the corresponding effect of eliminating superfluous steps in derivations, thus minimizing their length.4 What I would like to do here is to search for some areas where we might be able to tease out empirical effects of such guidelines, with a view toward elevating them to actual principles of language, if that is indeed what they are.

2.1 Preliminary Assumptions

Let us begin with a range of assumptions concerning language design, generally familiar though often controversial, which I will adopt without specific argument.

I will assume the familiar Extended Standard Theory (EST) framework, understood in the sense of the P&P approach. We distinguish the lexicon from the computational system of the language, the syntax in a broad sense (including phonology). Assume that the syntax provides three fundamental levels of representation, each constituting an interface of the grammatical system with some other system of the mind/brain: D-Structure, Phonetic Form (PF), and Logical Form (LF).

The lexicon is a set of lexical elements, each an articulated system of features. It must specify, for each such element, the phonetic, semantic, and syntactic properties that are idiosyncratic to it, but nothing more; if features of a lexical entry assign it to some category K (say, consonant-initial, verb, or action verb), then the entry should contain no specification of properties of K as such, or generalizations will be missed. The lexical entry of the verb hit must specify just enough of its properties to determine its sound, meaning, and syntactic roles through the operation of general principles, parameterized for the language in question. It should not contain redundant information, for example, about the quality of the vowel, properties of action verbs generally, or the fact that together with its complement, it forms a VP.5

It has been suggested that parameters of UG relate, not to the computational system, but only to the lexicon. We might take this to mean that each parameter refers to properties of specific elements of the lexicon or to categories of lexical items: canonical government, for example. If this proposal can be maintained in a natural form, there is only one human language, apart from the lexicon, and language acquisition is in essence a matter of determining lexical idiosyncrasies. Properties of the lexicon too are sharply constrained, by UG or other systems of the mind/brain. If substantive elements (verbs, nouns, etc.) are drawn from an invariant universal vocabulary, then only functional elements will be parameterized. The narrower assumption appears plausible; what follows is consistent with it.6

The level of D-Structure is directly associated with the lexicon. It is a "pure" representation of θ-structure, expressing θ-relations through the medium of the X-bar-theoretic conditions in accordance with the Projection Principle. It may meet some strong "uniformity condition"7 and in this sense be invariant across languages. I will assume here a two-level X-bar theory of the conventional sort, perhaps restricted to binary branching in accordance with Kayne's (1984) theory of unambiguous paths.8

The level of PF is the interface with sensorimotor systems, and the level of LF, the interface with systems of conceptual structure and language use.

Each of these levels is a system of representation of a certain type, its properties specified by principles of UG.9 For a particular language, the choice of D-Structure, PF, and LF must satisfy the "external" constraints of the interface relation. Furthermore, the three levels must be interrelated by mechanisms permitted by the language faculty. The structural description of an expression E in language L includes (perhaps is) the set {δ, π, λ}, representations at the levels of D-Structure, PF, and LF, respectively, each satisfying the "external" conditions.10 We may understand the structure of L to be the set of structural descriptions, for all expressions E. The language L itself consists of a lexicon, a specific choice of values for parameters of UG, and such rules as there may be, perhaps restricted to phonology. I understand "language" here in the sense of what I have called elsewhere I-language, where the terminology is intended to suggest "internalized" and "intensional." Intuitively, a language, so construed, is "a way of speaking and understanding," in a traditional sense; to have such a way of speaking and understanding (that is, to "have a language," or to "know a language") is to have the I-language as a component of the mind/brain. Note that although they are "external" to the computational system of language, the interface constraints are "internal" to the mind/brain. Other interactions (for example, those entering into the study of reference and truth) are a different matter.

In accordance with the general EST framework, I assume that the three levels are related to one another not directly, but only through the intermediary level of S-Structure, which is the sole point of interaction among the three fundamental levels. From this standpoint, S-Structure is a derivative concept. For a specific language L, its properties are determined by those of the fundamental levels, and by the condition that it be related to them by the appropriate principles. The level of S-Structure for L is the system that satisfies these conditions, something like the solution to a certain set of equations. Presumably, the principles of language design require that this "solution" be unique.

Exactly how these principles of interaction among levels should be understood is not entirely clear. I will adopt the general assumption that S-Structure is related to LF by iterated application of the principle Move α (substitution and adjunction), deletion, and insertion (that is, by the principle Affect α in the sense of Lasnik and Saito (1984)) and to PF by this principle and the rules of the phonological component.

The relation of S-Structure to the lexicon has been construed in various ways. I will assume that the relation is mediated by D-Structure, in the manner just outlined, and that D-Structure is related to S-Structure as S-Structure is related to LF and (in part) PF, that is, by iterated application of Affect α. Alternatively, it might be that D-Structure is determined by a chain formation algorithm applying to S-Structure (or perhaps LF), and in this sense is "projected" from S-Structure as a kind of property of S-Structure; this algorithm will then express the relation of S-Structure to the lexicon.

The choice between these two options has been open since the origins of trace theory, before the P&P approach crystallized. It has never been entirely clear that there is a real empirical issue here. There is, at best, a rather subtle difference between the idea that two levels are simply related, and the idea that the relation is a "directional mapping." Similarly, it is a subtle question whether the relation of S-Structure to the lexicon is mediated by a level of D-Structure with independent properties, serving as one of the fundamental "interface" levels. My own rather tentative feeling is that there is an issue, and that there is mounting, if rather subtle and inconclusive, evidence in support of the picture sketched earlier, with three fundamental interface levels and the D- to S-Structure relation interpreted as a directional mapping.11 I will adopt this interpretation for expository purposes; it is rather generally adopted in practice, with results then sometimes reconstructed in terms of the alternative conception, a suggestive and possibly meaningful fact. Much of what follows is neutral between the several interpretations of this system.

S-Structure may also have to satisfy independent conditions, for example,

2.2 Some Properties of Verbal Inflection

Of the many specific areas that might be investigated in an effort to clarify general guidelines of the kind mentioned earlier, I will concentrate on the topic of X⁰-movement, a matter of particular interest because of its implications for the study of word formation, though there are other cases, for example, V-movement in the sense of Koopman (1984) and others. With respect to word formation, there are two major categories where the question of X⁰-movement arises: complex predicates (causatives, noun incorporation, etc.), and inflectional morphology. There is an ongoing and illuminating debate about whether X⁰-movement applies in these cases, and if so, how. I will not consider the first category, but will limit attention to inflection, assuming that it involves syntactic rules such as V-raising to I, and I-lowering to V (affix hopping). I am thus assuming a sharp and principled distinction between inflectional morphology, part of syntax proper, and strictly derivational morphology, part of the lexicon, perhaps subject to such principles as right-headedness in the sense of Edwin Williams and others. I am, then, assuming something like the earliest version of the lexicalist hypothesis.

With respect to X⁰-movement, there is one salient descriptive fact, the Head Movement Constraint (HMC), and one central question about it: is it reducible, partially or completely, to independently motivated principles of syntactic movement? Assume for now that XP-movement (A- and Ā-movement) is given, with its principles, specifically the ECP. I will assume that the ECP reduces to the property of antecedent government, with the requirement that trace be properly governed relating to other conditions that have to do with "identification" of empty categories.13 We then ask whether these same principles yield the HMC as a special case. If so, we have a true reduction of the HMC, and therefore reduction of properties of word formation to independently established principles of syntax.14

Let us begin with some recent ideas of Jean-Yves Pollock, based on work by Joseph Emonds on verbal inflection in English-type and French-type languages.15 I will generally follow Pollock's proposals, adapting some of them in a different way and asking how they might bear on "least effort" guidelines and the status of the HMC.

Assume the X-bar-theoretic principle that S = I″ (IP), so that the basic structure of the clause is (1).16

(1) [IP NP [I′ I VP]]

We leave open the question whether the subject NP is base-generated in place or raised from VP, as proposed in several recent studies, and many other questions that are not directly relevant.

Emonds's basic idea is that in French-type languages, V raises to I, whereas in English-type languages, I lowers to V. There is a variety of empirical evidence supporting this conclusion. Assume it to be correct. It will then follow that VP-adverbs, which we take to be generated under VP adjoined to another VP, are preverbal in English and postverbal in French, as in (2).

(2) a. John often kisses Mary
    b.
    c.
    d.

But the English auxiliaries have and be behave approximately like ordinary verbs in French, as in (3).

(3) a. John has completely lost his mind
    b. books are often (completely) rewritten for children

Therefore, the distinction is not raising in French versus lowering in English, but some other difference that requires French verbs and English auxiliaries to raise while barring this possibility for other verbs in English.

On other grounds, it has been postulated that the Agr element is "stronger" in French than in English. Assume this to be true. Assume further that weak Agr is unable to "attract" true verbs such as kiss or lose, though it can attract auxiliaries, whereas strong Agr attracts all verbs.17

Why should weak and strong Agr behave in this fashion? One possibility, suggested by Howard Lasnik, is that it is simply a morphological property: only strong Agr can accept a "heavy" element such as a verb, though any Agr can accept a "light" element such as an auxiliary. Another possibility, developed by Pollock, is that the difference reduces to θ-theory: strong Agr allows an adjoined element to head a θ-chain, but weak Agr does not. If the auxiliaries

are not θ-markers, then they can raise to Agr without a violation of the θ-Criterion, but raising a true verb to weak Agr will lead to a violation of the θ-Criterion.

Looking at this option more closely, consider the effect of raising Y⁰ to adjoin to X⁰. This process yields the structure (4), where t is the trace of Y⁰.

(4) [X⁰ Y⁰ X⁰] ... t

The theory of government must permit Y⁰ to govern its trace t in this structure, so as to satisfy the ECP. If the theory of government precludes government of Y⁰ from outside of the complex element X⁰ formed by adjunction, then successive-cyclic movement of Y⁰ will be barred; thus, causative formation, for example, cannot escape the HMC (assuming it to reduce to the ECP) by successive-cyclic movement. I will assume this to be the case, putting a precise formulation aside.

The chain (Y⁰, t) will therefore be properly formed in (4) with regard to the ECP. Suppose that Y⁰ is a θ-marker. Then t must be able to θ-mark; the θ-marking property of Y⁰ must be "transmitted" through the chain. That will be possible if X⁰ is strong, but not if it is weak. We will therefore have a θ-Criterion violation if a θ-marker Y⁰ is adjoined to weak Agr.

Suppose that instead of raising Y⁰ to adjoin to X⁰ to yield (4), we lower X⁰ to adjoin to Y⁰. This process again forms the complex element [Y⁰-X⁰], but with a structure different from (4), namely (5), t being the trace of X⁰.

(5) t ... [Y⁰ Y⁰ X⁰]

Here the lower Y⁰ is the head of the construction, and we may assume that whatever the character of X⁰, Y⁰ will retain all relevant relations to other elements and will therefore retain the capacity to θ-mark a complement. The normal properties of adjunction, then, have the desired effect, as Pollock observes: lowering of weak Agr to the verb v does not bar θ-marking of the complement, but raising of v to weak Agr does bar θ-marking.
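The interaction just described between the direction of adjunction, the strength of Agr, and θ-marking can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the `Head` class, its two boolean fields, and the function `theta_marking_preserved` are hypothetical names introduced here, and the sketch encodes nothing beyond the three cases discussed in the text (raising to strong Agr, raising to weak Agr, and lowering of Agr to the verb).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    """A head (an X-zero category) with the two properties the argument relies on."""
    label: str
    theta_marker: bool = False  # does the head theta-mark a complement?
    strong: bool = False        # strength of an adjunction host such as Agr


def theta_marking_preserved(verb: Head, agr: Head, operation: str) -> bool:
    """Return True if no theta-Criterion violation arises from the operation.

    "raise": the verb adjoins to Agr, so its theta-marking property must be
             transmitted through the chain (verb, t); this works only if Agr
             is strong, and is vacuous if the verb is not a theta-marker.
    "lower": Agr adjoins to the verb, which remains the head of the complex
             element and keeps its capacity to theta-mark.
    """
    if operation == "raise":
        return agr.strong or not verb.theta_marker
    if operation == "lower":
        return True
    raise ValueError(f"unknown operation: {operation!r}")


# Illustrative heads; the strength values follow the assumptions in the text.
kiss = Head("kiss", theta_marker=True)   # a true verb
have = Head("have", theta_marker=False)  # an auxiliary, not a theta-marker
weak_agr = Head("Agr (English)", strong=False)
strong_agr = Head("Agr (French)", strong=True)

print(theta_marking_preserved(kiss, weak_agr, "raise"))    # True verb, weak Agr
print(theta_marking_preserved(have, weak_agr, "raise"))    # auxiliary, weak Agr
print(theta_marking_preserved(kiss, strong_agr, "raise"))  # true verb, strong Agr
print(theta_marking_preserved(kiss, weak_agr, "lower"))    # Agr lowers to the verb
```

Only the first call reports a violation, matching the claim that a true verb cannot raise to weak Agr while auxiliaries can, and that lowering leaves θ-marking intact.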

Pollock extends the domain of observation further to negation, proposing the more articulated structure (6) in a Kayne-style unambiguous path analysis.

(6) [IP NP [I′ I [(NegP) Neg [AgrP Agr [VP (Adv) [VP V ...]]]]]]

Here I may be [±finite] and Neg is English not or French pas.18 This representation, separating I and Agr, eliminates the odd dual-headedness of I in earlier treatments. The assumption is that infinitives have (generally vacuous) Agr.

Suppose that V raises to Agr. Then we have the S-Structure order Verb-Adverb-Object, as with English auxiliaries or French verbs generally. If Agr lowers to V, we have the order Adv-V-Obj, as with English nonauxiliary verbs. If V raises to Agr and the complex then raises further to I, we have such forms as (7).

(7) a. John has not seen Bill
    b. Jean (n') aime pas Marie
       Jean (ne) love neg Marie
       'Jean does not love Marie'

If V raises to Agr but not to I, we have (8) in French, where sembler 'seem' in (8a) contrasts with être 'be' in (8b).

(8) a. ne pas sembler heureux
       ne neg seem happy
       'not to seem happy'
    b. n'être pas heureux
       ne be neg happy
       'not to be happy'
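The way structure (6) yields the different surface orders can be sketched in a few lines of Python. This is a simplified illustration, not a formalization of Pollock's analysis: the function `surface_order`, its parameters, and the flat list of positions are assumptions made here, the subject and the French clitic ne are left out, and the sketch simply places the verb at the position where it ends up at S-Structure.

```python
from typing import Optional

# Linear order of the relevant positions in (6), left to right.
POSITIONS = ("I", "Neg", "Agr", "Adv", "V")


def surface_order(verb: str, complement: str, *, landing_site: str,
                  neg: Optional[str] = None,
                  adverb: Optional[str] = None) -> str:
    """Linearize the predicate given where the verb is pronounced.

    landing_site = "V":   Agr lowers to V, the verb stays in place
    landing_site = "Agr": V raises to Agr but no further
    landing_site = "I":   V raises to Agr and the complex raises on to I
    """
    if landing_site not in ("I", "Agr", "V"):
        raise ValueError(f"verb cannot surface in {landing_site!r}")
    slots = {position: None for position in POSITIONS}
    slots["Neg"] = neg
    slots["Adv"] = adverb
    slots[landing_site] = verb
    words = [slots[p] for p in POSITIONS if slots[p] is not None]
    return " ".join(words + [complement])


# English nonauxiliary verb: Agr lowers, giving Adv-V-Obj.
print(surface_order("kisses", "Mary", landing_site="V", adverb="often"))
# English auxiliary, as in (7a): raising to I crosses Neg.
print(surface_order("has", "seen Bill", landing_site="I", neg="not"))
# French infinitives, as in (8a) and (8b), with ne omitted.
print(surface_order("sembler", "heureux", landing_site="Agr", neg="pas"))
print(surface_order("être", "heureux", landing_site="I", neg="pas"))
```

The four calls differ only in the landing site and in which optional elements are present, which is the point of the articulated structure: one underlying order, with the observed variation following from how far the verb moves.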

The properties illustrated in (7) and (8) follow on the assumption that [+finite] is strong and [−finite] is weak. Being strong, [+finite] allows the verb aime to adjoin to it, crossing Neg (pas), in (7b). Being weak, [−finite] does not permit the verb sembler to adjoin to it, crossing Neg, in (8a), though the auxiliary être can raise to weak I just as auxiliaries can raise to weak Agr.

Though the V-raising rule in French is obligatory for tensed clauses, it is optional for infinitives. Thus, alongside (8b) we have the option (9a); and alongside the form V-Adv-NP (obligatory for finite verbs as in (2c)) we have (9b).

(9) a. ne pas être heureux
    b. souvent paraître triste
       often seem sad

(9a) results from failure of être to raise over Neg to [−finite] I, and (9b) from failure of paraître to raise over the adverb to Agr in the infinitive. We return in section 2.3.2 to the question of why raising should be optional just in the case of the infinitive, and in section 2.5 to further questions about the nature of Agr. Tentatively, let us assume the analysis just given, putting aside the optionality with infinitives.

At S-Structure the verb must typically be combined with its various affixes, to yield the proper forms at PF; the various affixes in (6) must form a single complex with a verb. Let us suppose that these affixes share some unique feature to guarantee proper association at S-Structure. Thus, any series of rule applications that separates them is barred by an appropriate S-Structure condition, and we need not be concerned if the rule system permits "wild" applications of rules that would leave affixes improperly scattered among the words of the sentence generated. Note that other improper rule applications are barred by the requirement that items lexically identified as affixes be properly "attached" at S-Structure.

Assuming Pollock's parameter, we have strong and weak inflectional affixes. The [+finite] choice for I (tensed) is strong and the [−finite] choice (infinitive) is weak. Agr is strong in French, weak in English. The basic facts follow, with some idealization of the data.

Pollock observes that earlier stages of English were very much like French, suggesting plausibly that a change in the Agr parameter led to the collection of phenomena that differentiate the languages in their current state. Some of the forms reflect D-Structure directly: for example, (9a–b) in French and their English equivalents. Other forms reflect the consequences of raising of V to Agr or to I, as illustrated. Pollock points out that unitary treatment of the comparative data, with the array of facts involving tense-infinitive, negation

and adverbs, verbs and auxiliaries, relies crucially on analysis of Tense and Agreement morphemes as separate syntactic entities at an abstract level of representation, namely, D-Structure. The analysis, he concludes, provides support for the rigid X-bar-theoretic condition of single-headedness and the consequent distinction between Agr and I, and for the distinction between D- and S-Structure representation.

2.3 A Least Effort Account

2.3.1 Minimizing Derivations

Let us now see how an analysis of this nature would bear on the guidelines we have been considering. I will put aside the relation of S-Structure to PF and D-Structure to lexicon. Thus, we are considering the relations among D-Structure, S-Structure, and LF. For expository convenience, I will refer to the relation of D- to S-Structure as "overt syntax" (since the consequences of the operations relating these levels are commonly reflected at PF).

The analysis of verbal inflection outlined in section 2.2 relies crucially on the principle that raising is necessary if possible. This would follow from the assumption that shorter derivations are always chosen over longer ones. The reason is that lowering of an inflectional element Inf, as in the case of English true verbs, yields an improper chain (t, ..., Inf), where Inf is adjoined to V at S-Structure to form [V V-Inf] and t is the trace of Inf, which c-commands it. Subsequent LF raising of [V V-Inf] to the position of t is therefore required to create a proper chain. The result is essentially the same as would have been achieved with the shorter derivation that involves only raising in the overt syntax. Therefore, by a "least effort" condition, only the latter is permissible.

A closer look shows that the "least effort" condition cannot reduce simply to the matter of counting steps in a derivation. Consider English interrogatives. Let us assume that an interrogative construction has the complementizer Q ([+wh]) to distinguish it at D-Structure from the corresponding declarative, triggering the appropriate intonational structure at PF and the proper interpretation at LF. If Q is furthermore an affix, then it must be "completed" in the overt syntax by X⁰-raising. The D-Structure representation (10) will yield, by lowering, an S-Structure representation with the verb [V-Agr-I]19 and traces in the positions of I and Agr.

(10) Q John I Agr write books

The resulting form is indistinguishable from the declarative at PF and is furthermore illegitimate (at S-Structure) if Q is a real element, as postulated. To

permit an output from the legitimate D-Structure representation (10), English makes use of the dummy element do to bear the affix, so that lowering does not take place; rather, Agr and I adjoin to do. Let us call this process do-support, a language-specific process contingent upon the weakness of Agr; for expository purposes, assume it to be a rule of the overt syntax inserting do in the Modal position, hence do-insertion, attracting the raised affixes and then raising to Q. Given this device, we can form did John write books from (10).20

The same device, however, permits the illegitimate form John did write books (do unstressed) alongside John wrote books, both deriving from the declarative form corresponding to (10) (lacking Q). Indeed, this option is not only available but arguably obligatory if shorter derivations are always preferred. The reason is that the illegitimate form requires only the rule of do-insertion and raising, whereas the correct form requires overt lowering and subsequent LF raising.

To yield the correct results, the "least effort" condition must be interpreted so that UG principles are applied wherever possible, with language-particular rules used only to "save" a D-Structure representation yielding no output: interrogative forms without modal or non-θ-marking verbs, in this case. UG principles are thus "less costly" than language-specific principles. We may think of them, intuitively, as "wired-in" and distinguished from the acquired elements of language, which bear a greater cost.21

Consider now a negative expression with the D-Structure representation (11).

(11) John I Neg Agr write books

The correct derivation involves do-insertion and raising of Agr to form the complex verb [do-I-Agr], with the S-Structure representation (12).

(12) John did (does) not write books

But again we face a problem: why doesn't I lower to Agr, then to V, yielding the complex verb [V-Agr-I] as in the nonnegated form, so that at S-Structure and PF we have John not wrote (writes) books? Then LF raising will apply, eliminating the improper chain, exactly as in the case of the nonnegative counterpart. This process involves only the UG principles of overt lowering and LF raising, avoiding the language-particular rule of do-insertion. It is therefore not only a permissible derivation, but is actually required by the "least effort" condition, as just revised.

A partial solution to this problem is provided by the HMC. The process of LF raising has to cross Neg, thus violating the HMC. There is therefore only

one legitimate derivation: the one involving do-insertion, which is therefore required in these cases.

We are thus assuming that, given a well-formed representation at D-Structure, we necessarily apply the least costly derivation that is legitimate to yield an S-Structure and, ultimately, a PF output.

But several further questions immediately arise. Consider the French counterpart to (11) or, equivalently, the English form (13).

(13) John I Neg Agr have written books

Here the correct derivation requires that the verb have raise to Agr, then to I, crossing Neg, to yield (14).

(14) John has not written books

The same will be true of a main verb in French, as in the counterpart to the D-Structure representation (11). If the HMC blocks the unwanted derivation with LF raising over Neg in the case of (11), then why does it not equivalently block the required derivation with overt raising over Neg in the case of (14) and the French equivalent to (11)?

Note that a similar question also arises in the case of (11). Thus, the required derivation involves raising of Agr over Neg to I to form the complex verb [do-I-Agr] after do-insertion. Why, then, does overt raising of Agr over Neg not violate the HMC?22

To deal with these questions, we have to consider more carefully the nature of deletion. Clearly, we cannot delete an element if it plays a role at LF: for example, the trace of a verb. But such considerations do not require that the trace of Agr remain at LF, since it plays no role at that level. We might, then, suppose that the trace of Agr is deletable (I will return to this conclusion in a more general setting in section 2.6.2). We must also determine exactly what we intend the process of deletion to be. There are various possible answers to this question, generally not addressed because they go beyond known empirical consequences. In the present context, however, there are empirical consequences, so a specific decision must be reached. One plausible answer is that deletion of an element leaves a category lacking features, which we can designate [e]. The deletion leaves a position but no features, in particular, no categorial features. Deletion of [Agr t], the trace of Agr, leaves [e], and by X-bar-theoretic principles, the dominating category AgrP is now eP, an XP with no features.23 That is a satisfactory conclusion, since AgrP plays no role at LF.

Making these assumptions, let us return to the problems we faced. Consider first the raising of Agr to I over Neg to form [do-I-Agr] in the correct

Some Notes on Economy of Derivation and Representation

129

derivation from the D-Structure representation (11). This process will, in fact,violate the HMC regarded as a condition on derivations, but there will be noECP violation at LF once the trace of Agr is deleted. Recall that we are takingthe ECP to be a condition on chains, along the lines discussed in Chomsky1986a, thus not applicable to the empty categories PRO, pro, and e, but onlyto trace. We therefore have no ECP violation, though we do have an HMCviolation. But if the HMC is reducible to the ECP, then we can dismiss theHMC as a descriptive artifact, valid only insofar as it does in fact reduce tothe ECP. The present case would be one in which the HMC does not reduceto the ECP and is therefore inoperative.Let us now turn to the more general question. Why does LF raising of[VAgr] to I over Neg violate the HMC, whereas overt raising of [VAgr] toI over Neg (as in the case of English auxiliaries and all French verbs) does notviolate the HMC? To answer this question, we must again consider moreclosely the structures formed by adjunction.Let us return to the D-Structure representations (11) and (13), repeated herein (15).(15) a. John I Neg Agr write booksb. John I Neg Agr have written booksLowering of I to Agr forms the element [Agr AgrI], leaving the trace t1. Furtherlowering of the complex element to V forms [V V [Agr AgrI]], a verb, leavingthe trace tAgr. But this trace deletes, leaving [e], a position lacking features.Applying these processes to (15a), then, we derive the S-Structure representation (16).(16) John t1 Neg [e] [VP[V write [Agr AgrI]] books]We now turn to LF raising. The complex V raises to the position [e], leavinga V-trace; we may assume this to be substitution, not adjunction, on a naturalinterpretation of recoverability of deletion. We now raise this element to theposition t1, again leaving a V-trace. The latter is of course undeletable, beingpart of a chain with substantive content at LF. 
This step violates the HMC;and its residue, (17), violates the ECP at LF.(17) John [V writeAgrI] Neg tV [VP tV books]Here antecedent government of tV is blocked by the intermediate element Neg,under the Minimality Condition. We therefore have a violation of the ECP atLF. In this case the HMC, reducing to the ECP, is a valid descriptive principle,violated by the derivation.
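The contrast between deletable and undeletable traces can be made concrete in a short sketch. The Python fragment below is only an illustration under simplifying assumptions, not a formalization given in the text: a derivation is reduced to the traces it leaves, each trace records which heads (if any) separate it from its antecedent, and only Agr-trace is treated as deletable. The names Trace, Derivation, violates_hmc, ecp_violations, and least_costly are hypothetical, and so are the numeric cost ranks, which merely encode the assumption that do-insertion is the more costly option.

```python
from dataclasses import dataclass

# Assumption: a trace deletes to [e] only if it plays no role at LF (here: Agr-trace).
DELETABLE = {"Agr"}


@dataclass(frozen=True)
class Trace:
    category: str            # category of the moved head: "V", "Agr", ...
    crossed: tuple = ()      # heads between the trace and its antecedent


@dataclass(frozen=True)
class Derivation:
    name: str
    cost: int                # hypothetical rank: lower means less costly
    traces: tuple


def violates_hmc(d):
    """The HMC read as a condition on derivations: no step may skip a head."""
    return any(t.crossed for t in d.traces)


def ecp_violations(d):
    """Traces that survive to LF and whose antecedent government is blocked."""
    return [t for t in d.traces if t.category not in DELETABLE and t.crossed]


def least_costly(derivations):
    """Names of the cheapest derivations among those with no ECP violation."""
    legitimate = [d for d in derivations if not ecp_violations(d)]
    if not legitimate:
        return []
    best = min(d.cost for d in legitimate)
    return [d.name for d in legitimate if d.cost == best]


# Lowering plus LF raising, as in (16)-(17): the second V-trace is cut off by Neg.
lf_raising = Derivation("lowering + LF raising", cost=1,
                        traces=(Trace("V"), Trace("V", ("Neg",))))

# do-insertion: Agr raises to I over Neg; its trace deletes, leaving [e].
do_insertion = Derivation("do-insertion", cost=2,
                          traces=(Trace("Agr", ("Neg",)),))

print(violates_hmc(lf_raising), violates_hmc(do_insertion))
print(least_costly([lf_raising, do_insertion]))
```

In this encoding both derivations violate the HMC, but only the first leaves an offending trace at LF, so the selection returns the do-insertion derivation even though it is ranked as more costly.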


Note that the situation contrasts with overt raising of V to Agr, then to I over Neg, as in the case of (15b) (and all French verbs). Here raising to Agr is permitted, therefore obligatory by the least effort condition. Following the derivation step by step, we first raise V to Agr, leaving V-trace and forming [Agr V-Agr]. We then raise this complex element to I over Neg, forming [I V-Agr-I] and leaving Agr-trace; this step violates the HMC. The Agr-trace now deletes, leaving [e]. We thus derive the form (18).

(18) John [I have-Agr-I] Neg [e] [VP tV ...]

This representation induces no ECP violation,24 though the derivation that formed it violates the HMC. Again, we see that the HMC is descriptively valid only insofar as it reduces to the ECP.

The problems that arise therefore receive straightforward solutions when we consider the nature of adjunction, as standardly defined. Note, however, the crucial assumption that unnecessary elements delete at LF; we return to the matter in section 2.6.2. Also crucial is the assumption that D-Structure relates to S-Structure by a directional mapping, a step-by-step derivational process. In the S-Structure (and LF) representation (18), have is too far from its trace tV for the ECP to be satisfied, but the locality requirement has been satisfied in the course of the derivation from D- to S-Structure.25

2.3.2 The Element I

Let us turn to some speculations on the status of IP and the optionality observed earlier in French infinitival constructions. If I is [+finite] (I = T = tense), then it presumably cannot be deleted, since a tensed phrase plays an LF role. Therefore, we have either overt raising to [+finite] or LF raising to the position of its trace.

There is, however, no strong reason to suppose that the same is true of [-finite] (infinitive). If [-finite] and its IP projection play no role at LF, then this element should be deletable, just as Agr (actually, tAgr) is. Suppose that this is the case.26

Before considering the consequences, we have to resolve a minor technical question about infinitival inflection: does [-finite] attach to the base form of the verb or does it not? Little is at stake in the present connection; for concreteness, let us adopt the former alternative.

Keeping now to French, consider verbs that can raise to weak inflection, for example, être 'be'. Suppose that we have the form (19), with être raised to Agr.

(19) ne I pas être heureux


In this construction, être may raise further to I in the normal way, yielding the form (20).

(20) n'être pas heureux

But there is also another option. The form être may remain in place, with I lowering to [être-Agr], leaving not trace but [e]. This is permissible on the assumption we are now considering: that [-finite] is deletable, playing no LF role. The resulting form is (21), identical to (19) but with [e] in place of I.

(21) ne pas être heureux

Each of these options involves one rule application. Therefore, the two are equally costly and we have genuine alternatives, in conformity with the least effort guideline. As observed earlier, these two cases are both permitted in French.

Consider now a true verb, such as paraître 'seem'. We know that it cannot raise to I, so I must lower to Agr, leaving [e]. Suppose now that paraître is in an adverbial construction, as in the D-Structure representation (22).

(22) souvent paraître triste

If paraître raises to Agr in the usual way, we derive the form (23).

(23) paraître souvent triste

Suppose, however, that [Agr-I] lowers to the V position, leaving [e] rather than trace. The resulting form is (22) itself, a legitimate form with no ECP violation. Again we have two options, (22) and (23), each involving a single rule, each legitimate. The reason is that Agr and its projection, exactly like [-finite] I and its projection, play no role at LF and are therefore deletable.

We conclude, then, that although there are no options in the finite forms, their infinitival counterparts allow the options illustrated. Along these lines, we might hope to incorporate Pollock's observations about the range of options for infinitives as distinct from tensed clauses.

We have not settled the precise character of LF raising to the trace of [+finite]. What is required is that the finite (tensed) phrase, functioning at LF, not be deleted. The requirement is met under LF raising, which might be either adjunction or substitution. If it is adjunction, the resulting form will be (24), which heads TP, where T = [+finite] (tense).

(24) [T [V V [Agr Agr-T] tT]]


We must then take this to be a legitimate form, with T c-commanding its trace tT. If the LF raising is substitution, we derive (25) in place of (24) in the I position, now heading VP.

(25) [V V [Agr Agr-T]]

The question of government of tT does not now arise, but we must ask just how the element (25) in the I position satisfies the requirement of tense interpretation at LF. The further implications are not clear, and I will leave the question open.

2.4 Summary: On Economy of Derivation

Summarizing, we have selected one particular option available for sharpening the notion of deletion, previously left undetermined; and we have made a distinction between deletable and nondeletable elements on the basis of their LF role. These moves are natural and seem generally unexceptionable. Apart from this, we have kept largely to familiar assumptions along with Pollock's basic analysis, modified in various ways. Attending to the meaning of the formalism for adjunction and other notions, the basic empirical observations follow.

Some more general conclusions are also suggested. First, the HMC is not a principle, though it is largely accurate as a descriptive generalization. The principle is valid only insofar as it reduces to the ECP, and it can be violated when other processes overcome a potential ECP violation by eliminating an offending trace. Second, we now have a somewhat more specific interpretation of the least effort guidelines. The condition requires that the least costly derivation be used, eliminating the S-Structure and PF consequences of more costly derivations. To a first approximation, cost is determined by length; the condition requires the shortest derivation, so that overt raising is required where it is possible. But cost has a more subtle meaning: UG principles are less costly than language-specific rules that are contingent upon parameter choices (see note 20); and do-insertion, in particular, functions only as a last resort, to save a valid D-Structure representation that otherwise underlies no legitimate derivation.

Other well-known facts suggest further refinement of the notion "least costly derivation." Consider, for example, a standard case of long-distance movement, as in (26).

(26) how do you think that John said [that Bill fixed the car t]?

The sentence is well formed by successive-cyclic movement. There is, of course, a shorter (namely, one-step) derivation, in which case, on the general principles so far assumed, the sentence should have a status no different from (27).

(27) how do you wonder why John asked [which car Bill fixed t]

The shorter derivation does not bar the longer successive-cyclic one in this case. In fact, the shorter derivation is barred; it is not the case that (26) is structurally ambiguous, with one interpretation given by the legitimate derivation and another deviant interpretation given by the illegitimate shorter one. Hence, it must be that the measure of cost prefers short movement to long movement and thus requires the former where possible.

In such ways as these, we may proceed to refine the least effort conditions on movement, raising them from the status of imprecise guidelines to actual principles of UG.

Notice that this approach tends to eliminate the possibility of optionality in derivation. Choice points will be allowable only if the resulting derivations are all minimal in cost, as in the case of French infinitival constructions discussed earlier. Any remaining examples of optional rule application would then have to be assigned to some other component of the language system, perhaps a stylistic component of the mapping of S-Structure to PF. This may well be too strong a conclusion, raising a problem for the entire approach.

2.5 The Agreement System: Some Speculations

A number of questions arise about the status of Agr in the system just outlined. Following Pollock, we have assumed that Agr is dominated by T(ense). But assuming these elements to be dissociated, one might rather expect Agr to dominate T, since it presumably stands in a government relation with the subject in tensed clauses, to yield the standard subject-verb agreement phenomena. There is morphological evidence, discussed by Belletti (1990), suggesting the same conclusion: in a number of languages where it is possible to obtain relevant evidence, the agreement element is outside the tense element in the verbal morphology, as would follow from successive adjunction if Agr dominates T. Nevertheless, facts of the kind just illustrated lead Pollock to postulate a position intermediate between T and VP, what he takes to be the Agr position.

These conflicts might be reconciled by noting that there are actually two kinds of Verb-NP agreement: with subject and with object. Hence, pursuing the basic lines of Pollock's analysis, we should expect to find two Agr elements: the subject agreement element AgrS and the object agreement element AgrO. On general assumptions, AgrO should be close to V, and AgrS close to the subject, therefore more remote from V.27 The element Agr in Pollock's structure (6), which we have adopted as the basis for discussion, would therefore be AgrO, providing an intermediate position for raising. It would then be unnecessary to suppose that infinitives necessarily carry (generally vacuous) subject agreement, though we would now be assuming that AgrO is present even for nontransitives. Pollock's structure (6) would now be more fully articulated as (28), where AgrS = I, the head of I′ and IP, and F is [±finite].

(28)

[IP NP [I′ AgrS [FP F [(NegP) Neg [AgrP AgrO [VP (Adv) [VP V]]]]]]]
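The hierarchy in (28) can also be written down as a small data structure, which makes the relative height of the heads easy to inspect. The sketch below is an illustration only. It assumes the bracketing [IP NP [I′ AgrS [FP F [(NegP) Neg [AgrP AgrO [VP (Adv) [VP V]]]]]]] as the reading of the tree, with the optional NegP and Adv both present, and the names TREE_28, HEADS, heads_top_down, and subtree are hypothetical.

```python
# (28) as nested tuples: (label, child, child, ...). Leaves are plain strings.
# Assumption: the optional NegP and Adv are both present.
TREE_28 = ("IP", "NP",
           ("I'", "AgrS",
            ("FP", "F",
             ("NegP", "Neg",
              ("AgrP", "AgrO",
               ("VP", "Adv",
                ("VP", "V")))))))

HEADS = {"AgrS", "F", "Neg", "AgrO", "V"}


def heads_top_down(node):
    """List the heads in the order they are met from the root downward."""
    if isinstance(node, str):
        return [node] if node in HEADS else []
    found = []
    for child in node[1:]:
        found.extend(heads_top_down(child))
    return found


def subtree(node, label):
    """Return the first constituent with the given label, or None."""
    if isinstance(node, str):
        return None
    if node[0] == label:
        return node
    for child in node[1:]:
        hit = subtree(child, label)
        if hit is not None:
            return hit
    return None


print(heads_top_down(TREE_28))
print(heads_top_down(subtree(TREE_28, "FP")))
```

Under these assumptions the first list places AgrS highest and AgrO immediately above V, and the second list, restricted to FP, contains only F, Neg, AgrO, and V, the portion of the structure that corresponds to Pollock's (6).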

In terms of this proposal, the preceding analysis considered only the structure dominated by FP, which is identical with Pollock's (6) (notations aside).28

These conclusions are consistent with Kayne's (1989) analysis of participle agreement in a variety of Romance languages. Kayne assumes an Agr element heading AgrP, with VP as its complement. This element is distinct from the Agr involved in subject agreement; we may take it to be AgrO. Thus, we have such D-Structure representations as (29), for a French participial construction, putting aside I and AgrS.

(29) NP Vaux [AgrP Agr [VP V-participle NP]]

If the NP object is a wh-phrase that undergoes raising, then the participle may or may not agree with it. Kayne assumes that these options correspond to two distinct structures, as in (30), where t, t′ are the traces of the wh-phrase 'how many tables'.


(30) a. combien de tables [Paul a [AgrP t′ [AgrP Agr [repeint- t]]]]
        how many (of) tables Paul has repainted
     b. combien de tables [Paul a [AgrP Agr [repeint- t]]]

The two forms are synonymous, meaning 'how many tables has Paul repainted'. In (30a) the participle surfaces as repeintes (plural), in (30b) as repeint (lacking agreement).

In the derivation of (30a), the wh-phrase raises to the position of the trace t′, adjoining to AgrP. In this position, it is in a government relation with Agr (in our terms, AgrO). The participle thus agrees with its wh-phrase object.29 The underlying assumption is that object agreement is contingent upon a government relation between Agr and an NP, exactly as in the case of subject agreement. In (30b) the wh-phrase has not passed through the adjoined position, so there can be no agreement.30

Since t′, adjoined to AgrP, is in an Ā-position, it follows, Kayne observes, that there will be no participial agreement with the wh-phrase in the case of an expletive subject (as is the case), on the assumption of expletive replacement, to which we return in section 2.6.3. The reason is that expletive replacement would require improper movement of the trace t′ of the wh-phrase from an Ā- to an A-position.

If an NP remains in the object position, there is no participial agreement, though we again find such agreement in clitic movement, as in (31).

(31) a. Paul a repeint (*repeintes) les tables
     b. Paul les a repeintes

The reason is that the object les tables in (31a) is not in the appropriate government relation with AgrO (the relation is barred by the Minimality Condition on government, since the participle intervenes), whereas in (31b) the clitic has raised to a position governed by Agr, perhaps [Spec, AgrP]. Kayne argues further that although the two agreement processes (with wh-movement and clitics) are not clearly dissociated in French, comparative evidence shows that they are in fact distinct processes and that the clitic does not adjoin to AgrP.

The question arises why the NP object cannot appear in the postulated position associated with Agr, say, its specifier position, as in (32).

(32) *Paul a [les tables repeint(es)]

Base generation is excluded if we take θ-marking to be to the right in French; or, as in recent work that assumes raising of the subject from VP to [Spec, IP], we might assume that θ-marking must be internal to the projection of the θ-marking head, thus impossible in (33).


(33) ... [AgrP NP Agr [VP V]]

Failure of the nonclitic object to raise to the position in (32) follows from the Chain Condition if the participle assigns Case directly to its object, to its right in the base form, as Kayne assumes.31

Without reviewing the further consequences that Kayne develops, note that the analysis supports the idea that an Agr position intervenes between T and the V, and that this element is distinct from the subject agreement element. Furthermore, we have evidence that object agreement, like subject agreement, is based upon a government relation between Agr (in this case, AgrO) and the NP.

Koopman (1987) has independently proposed that agreement is always the reflection of a Spec-head relation.32 We might revise this proposal to accord with Kayne's: agreement with an NP is always the reflection of a government relation between the head Agr and the NP, either the Spec-head relation or the relation of the head to an adjoined element, the Agr typically being associated with the verb at S-Structure by the processes we have been discussing. Koopman suggests further that this idea may relate to her earlier proposal that the order parameters of the X-bar system involve two independent factors: directionality of Case marking and θ-marking (Koopman 1984; see also Travis 1984). If Case marking is to the left and θ-marking to the right, then NP will be in prehead and other θ-marked complements in posthead positions.

We might carry the proposals a step further, supposing that structural Case generally is correlated with agreement and reflects a government relation between the NP and the appropriate Agr elements. Thus, subject-verb agreement is associated with nominative Case and is determined by the relation of the specifier to the AgrS head of AgrSP (= IP, in (28)), whereas verb-object agreement is associated with accusative Case and is determined by the relation of the NP to the AgrO head of AgrOP, either in specifier position or adjoined to AgrO.
The relations might be uniform at LF, parameterized at S-Structure, with Case checking and Case marking perhaps dissociated.

Note finally that if the proposal just outlined is tenable, with AgrO distinct from AgrS, then one of the problems discussed earlier in connection with example (11), repeated as (34), does not arise.

(34) John I Neg Agr write books

The problem was to ensure do-insertion and raising of Agr to form the complex verb [V do-Agr-I] with no violation of the HMC, while barring an alternative derivation with overt lowering. If we were to adopt the structure (28) rather than (6), distinguishing AgrS from AgrO, then Agr in (34) is actually AgrO, which would not raise over Neg, but would lower to V (with subsequent LF raising to the position of the trace of AgrO to form a proper chain). There is, then, no violation of the HMC, straightforwardly. The more general problems discussed earlier remain, however, still motivating the argument presented.

2.6 Economy of Representation

It has been suggested elsewhere that movement is available only as a last resort. The preceding discussion suggested that deletion might also be regarded as a last resort operation, applicable where necessary, but not otherwise, and that the same is true of whatever is involved in do-support: insertion, if that is the proper way to interpret the phenomenon. More generally, then, it may be that the principle Affect α applies only where necessary. This overarching principle, then, expresses a general property of transformational rules, or more properly, of the transformational rule, actually a principle of UG. The intuitive meaning is that derivations must be as economical as possible: there is no superfluous rule application. The intuitive content of this idea, however, is spelled out in terms of specific notions of cost that distinguish UG principles from language-particular properties, introduce locality considerations, and so on. We thus have a plausible least effort principle, but a principle that is apparently specific to the language faculty in its actual formulation. This is a familiar conclusion elsewhere as well, one that bears on the nature of the language faculty generally.

The analogous principle for representations would stipulate that, just as there can be no superfluous steps in derivations, so there can be no superfluous symbols in representations. This is the intuitive content of the notion of Full Interpretation (FI), which holds that an element can appear in a representation only if it is properly licensed. Let us proceed now to ask how this intuitive notion might be refined, in an effort to move it too from the status of a guideline toward that of a principle of UG.

It would be natural to expect that FI holds at each of the three fundamental levels that constitute an interface between the computational system of language and other systems: hence, at the levels of D-Structure, PF, and LF.
If so, then licensing under FI is expressed in terms of conditions relating the syntax, broadly construed, to other systems of the mind/brain.

At D-Structure FI holds by definition, this level simply being a projection of lexical structure in terms of the notions of X-bar theory.33 At PF it is universally taken for granted, without discussion, that the condition holds in a strong form. That is, a condition on phonetic representation is that each symbol be interpreted in terms of articulatory and perceptual mechanisms in a language-invariant manner; a representation that lacks this property is simply not considered a phonetic representation, but instead is considered a higher-level representation, still to be converted to PF. Like D-Structure, PF is understood to be defined by some version of FI. The corresponding notion at LF would be that every element that appears at LF must have a language-invariant interpretation in terms of interactions with the systems of conceptual structure and language use. Let us explore this idea further.

2.6.1 Operators and Variables

One consequence is that vacuous quantification should be forbidden. That is, language should differ from typical formal systems that permit vacuous quantification freely, with the well-formed expression (x) (2 + 2 = 4) receiving the same interpretation as 2 + 2 = 4. Formal systems are designed this way for ease of description and computation, but the design of human language is different. Thus, we cannot have such expressions as (35a) interpreted as 'John saw Bill', or (35b) interpreted as 'some person left'.

(35) a. who John saw Bill
        who did John see Bill
     b. every some person left

Similarly, if a language permits such structures as (36), the vacuous operator interpretation is excluded.

(36) a. who did Mary see him
     b. the man that Mary saw him

These expressions cannot be interpreted to mean 'Mary saw x', 'the man y such that Mary saw x', respectively. If some theory of grammar stipulates specific devices and rules to bar such constructions and interpretations, we conclude that it is the wrong theory: it is generating expressions and structures too accurately and is therefore incorrect. There is nothing paradoxical about this conclusion. The unwanted constructions are excluded on general grounds, in terms of the overarching condition FI; there is no reason to suppose that the mechanisms of language include superfluous devices and rules to achieve, redundantly, the same result in special cases. Similarly, the phonological component contains no rules to express special cases of general properties of universal phonetics or of phonetic representations.

A related question has to do with free variables. What is their status in natural language? Typically, formal systems permit well-formed expressions with free variables, interpreting them as universally quantified or with the free variable treated as an arbitrary name, as in the course of natural deduction and intuitive mathematics generally. One natural-language analogue to a free variable would be an empty category bound by an empty operator. There is quite strong evidence that such constructions exist, for example, in complex adjectival constructions such as (37).

(37) a. John is too clever to catch
     b. John is too clever to expect anyone to catch
     c. *John is too clever to meet anyone who caught
     d. Mary expected John to be too clever to catch

The general properties of these and many other constructions follow from the assumption that the underlying D-Structure representation is as in (38a) (for (37a)) and that empty-operator movement, meeting the usual conditions on Ā-movement, raises the empty category Op to the C position of the bracketed clause (to the specifier position of CP), leaving a trace t in the S-Structure representation (38b).

(38) a. John is too clever [CP PRO to catch Op]
     b. John is too clever [CP Op [PRO to catch t]]

But variables are subject to the property sometimes called strong binding: a variable must have a range determined by its restricted quantifier (language permitting no unrestricted quantification, as distinct from typical formal systems), or a value fixed by an antecedent that meets certain structural properties: thus John but not Mary in (37d). The latter condition applies when the operator is an empty category. (37a), for example, cannot mean that John is so clever that he cannot catch everything, or that he cannot catch something (someone) or other, analogous to John ate, meaning that John ate something or other. In short, language does not permit free variables: the strong binding property determines the curious semantic properties of these constructions. We might think of this condition as a specific application of the UG condition FI.

In these terms, we would interpret the empty operator binding an empty pronominal, in the sense of Huang's (1984) work on Chinese, as restricted, in that it is necessarily discourse-related. There are semifree variables such as PRO and one, which, however, always appear to have special properties, specifically, human or animate (e.g., it is easy to roll down a hill does not refer to a rock). Thus, a true free variable interpretation is disallowed.

2.6.2 Legitimate LF Elements

A further sharpening of the condition FI is suggested by consideration of what counts as a proper element at the LF level. The question here is analogous to the question of what counts as a phonetic element at the PF level. Each relevant element at the LF level is a chain (39), perhaps a one-membered chain.

(39) (α1, ..., αn)

It seems that the following elements are permitted at LF, each a chain (39):

1. Arguments: each element is in an A-position, α1 Case-marked and αn θ-marked, in accordance with the Chain Condition.34
2. Adjuncts: each element is in an Ā-position.
3. Lexical elements: each element is in an X0 position.
4. Predicates, possibly predicate chains if there is predicate raising, VP-movement in overt syntax,35 and other cases.
5. Operator-variable constructions, each a chain (α1, α2), where the operator α1 is in an Ā-position and the variable α2 is in an A-position.

These are the only elements that seem to have an interpretation at LF. Suppose, then, that these are the only elements permitted at LF, in accordance with FI. Then the rule Affect α may apply (and must apply) only to yield such an element, given an illegitimate object. We conclude that Agr-trace (and perhaps the trace of [-finite]) must be eliminated, and V-trace may not be eliminated, as required for the proper functioning of the ECP if the argument sketched earlier is correct.36

Consider successive-cyclic Ā-movement from an A-position. This will yield a chain that is not a legitimate object; it is a heterogeneous chain, consisting of an adjunct chain and an (Ā, A) pair (an operator-variable construction, where the Ā-position is occupied by a trace). This heterogeneous chain can become a legitimate object (namely, a genuine operator-variable construction) only by eliminating intermediate Ā-traces.
We conclude, then, that these must be deleted at the point where we reach LF representation.37 In contrast, intermediate Ā-traces formed by successive-cyclic movement from an Ā-position need not be deleted, since the chain formed is already a legitimate object (namely, an adjunct); since they need not be deleted, they may not be deleted, by the least effort principle for derivations already discussed. The same is true for A-chains (arguments) and X0-chains (lexical elements). On these natural (though of course not logically necessary) assumptions, we derive, in effect, the basic principle for trace deletion stipulated in Lasnik and Saito's theory of the ECP, now a consequence of the general condition FI, with "may delete" strengthened to "must delete." There are further consequences, and interesting questions arise with regard to the specifier of NPs, which shares some properties of A-positions and other properties of Ā-positions, but I will not pursue these matters here.
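The reasoning about which traces delete can be restated as a small procedure. The following Python sketch is offered only as an illustration under simplifying assumptions: a chain is reduced to the list of position types of its members, from head to tail, and the Case and θ requirements of the Chain Condition, as well as predicate chains, are ignored. The constants A, A_BAR, X0 and the functions is_legitimate and at_lf are hypothetical names, not notions defined in the text.

```python
# Position types; a chain is a list of these, ordered from head to tail.
A, A_BAR, X0 = "A", "A-bar", "X0"


def is_legitimate(chain):
    """True for a uniform chain or an (A-bar, A) operator-variable pair."""
    if len(set(chain)) == 1:
        return True                      # argument, adjunct, or X0 chain
    return list(chain) == [A_BAR, A]     # operator-variable construction


def at_lf(chain):
    """Delete intermediate traces only when legitimacy requires it."""
    chain = list(chain)
    if is_legitimate(chain):
        return chain                     # need not delete, so may not delete
    head, *middle, tail = chain
    if head == A_BAR and tail == A and all(p == A_BAR for p in middle):
        return [head, tail]              # intermediate A-bar traces deleted
    raise ValueError(f"no legitimate LF object from {chain}")


# Successive-cyclic A-bar movement from an A-position: intermediates must go.
print(at_lf([A_BAR, A_BAR, A_BAR, A]))

# Successive-cyclic movement from an A-bar position: already an adjunct chain.
print(at_lf([A_BAR, A_BAR, A_BAR]))
```

On this encoding the first chain is reduced to an operator-variable pair, while the second is returned unchanged, which mirrors the step from "may delete" to "must delete" under FI together with the least effort principle.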

2.6.3 FI and Expletives

Consider finally the status of expletive elements, such as English there or Italian ci, or their various counterparts, null or overt, in other languages. This element receives no interpretation and therefore is not licensed as a legitimate LF object. It must therefore somehow be removed. Elsewhere I have suggested that there is eliminated by LF substitution.38 But there has specific features, and we might suppose on these grounds that it is undeletable, by the condition on recoverability of deletion (yet to be precisely formulated). Then we must treat there as an LF affix; something must adjoin to it.

The expletive there has three salient properties. First, an NP must appear in a certain formal relation to there in the construction; let us call this element the associate of the expletive and take the expletive to be licensed by its presence. Second, number agreement is not with there but rather with the associate. Third, there is an alternate form with the associate actually in the subject position after overt raising. Thus, we have (40), with the associate in italics, but not (41).

(40) a. there is a man in the room
     b. there are men in the room
     c. a man is in the room

(41) a. *there was decided to travel by plane
     b. *there is unlikely that anyone will agree

These properties are rather naturally explained on the assumption, deriving from FI, that the expletive is an LF affix, with its associate adjoining to it. Since there lacks inherent φ-features (including number) and category, these features will percolate from its associate on usual assumptions. If agreement is checked at LF, then it will already have to have been established at S-Structure between AgrS and the associate of there, as in (40a-b), yielding the observed overt agreement.
This analysis fits readily into the framework already outlined, particularly if agreement and Case are treated in the manner suggested: both assigned by S-Structure since they may appear overtly, both checked at LF since they have LF consequences having to do with visibility (the Case Filter) and the Chain Condition.39 If we assume further that the specifier of IP (AgrSP, if the speculations of section 2.5 are correct) must be an NP with φ-features matching AgrS, then it will also follow that the associate must be an NP; and it is this NP that raises in overt syntax, as in (40c).

Burzio (1986) argues further that if the expletive is a clitic, it will have to satisfy additional conditions holding generally between a clitic and the position associated with it, specifically, a very restrictive locality condition that, he argues, holds at D-Structure; on this further assumption, he derives an interesting range of phenomena that differentiate English, Italian, French, and Piedmontese expletive constructions. On the general assumptions of the P&P approach, we expect to find that expletive constructions of this type have the same basic properties across languages, with differences explicable in terms of the lexical properties of the elements involved.

For such reasons, then, it is plausible to assume that there (and its counterparts) is indeed an LF affix, as required by FI.

In (40a) LF adjunction of the associate to the expletive yields the phrase (42) as subject, the complex constituting an NP by percolation.

(42) [NP there [NP a man]]

Other well-established principles conspire to guarantee that the only element that can adjoin to the expletive is the associate with the appropriate properties.

Given that there must have an NP associate, it follows that some other expletive (in English, it) is associated with clauses, as in (43), contrasting with (41).

(43) a. it was decided to travel by plane
     b. it is unlikely that anyone will agree

It should therefore not be necessary to stipulate distributional conditions on there and it expletives, or their counterparts in other languages, when their lexical properties are considered.40

It also follows that at S-Structure an expletive E and its associate A must satisfy all LF chain conditions, since there is a chain ([A-E], ..., tA) at LF. Given the Chain Condition holding at LF, it must be that at S-Structure the expletive E is in a Case-marked position and the associate A in a θ-position.41 Furthermore, if we assume that the binding theory holds at LF, then at S-Structure A and E must be in a relation that satisfies Condition A, since at LF an antecedent-trace relation holds of their S-Structure positions. Similarly, the ECP, a chain condition at LF, will have to hold of the expletive-associate pair at S-Structure.
These consequences are largely descriptively accurate, asillustrated in (44).42(44) a. *there seems that a man is in the room (ECP violation)b. *there seems that John saw a man (Condition A violation)Similarly, other conditions on movement must be satisfied. Compare theexamples in (45).


(45) a. *there was thought that [pictures of a man were on sale]
     b. we thought that [pictures of each other were on sale]
     c. *a man was thought that [pictures of t were on sale]

The italicized elements are properly related in (45b), but not in (45a) or (45c). The problem with (45a) is not the binding theory, as (45b) shows, but rather a condition on movement (the ECP), as we see from (45c).

Such properties of expletives now follow from FI, without further stipulation. Note that it also follows that the binding theory must apply at LF; whether or not it also applies elsewhere (including S-Structure) is a separate question.

Another consequence has to do with Condition C of the binding theory, which requires that an r-expression, such as the associate of an expletive, be unbound. A long-standing question has been why there is no Condition C violation in the case of an expletive and its related associate. But we now assume that the two simply have different indices.43 There is, therefore, no need to complicate the binding theory to exclude this case, as in a number of proposals over the past years.

Certain problems of scope of the kind discussed particularly by Edwin Williams also are overcome. Consider the sentences in (46).

(46) a. I haven't met many linguistics students
     b. there aren't many linguistics students here

(46a) has a scopal ambiguity, but in (46b) many unambiguously has narrow scope. The LF representation of (46b) is (47).

(47) [NP[there [A many linguistics students]] are not tA here]

If many linguistics students were literally to replace there, it would be expected to have scope over not, but in (47) no relation is established between the two, and the scope of many can be assumed to be narrow, as in pictures of many students aren't here.44

2.6.4 Further Questions concerning LF Raising

There is one major exception to the generalization that the expletive E and its associate A are in a binding theory (Condition A) relation at S-Structure, namely, raising constructions such as (48).

(48) *there seems [a man to be in the room]

Here the expletive-associate pair satisfies all chain conditions, but the expression is ungrammatical.


A natural explanation of these facts is provided by Belletti's (1988) theory of partitive Case assignment. Taking partitive Case to be oblique, therefore θ-related in accord with the uniformity condition on Case assignment (see Chomsky 1986b), partitive Case will not be assigned to the associate in (48) but will be properly assigned at S-Structure to the associate of the expletive after unaccusatives and, we must assume, copula, as in there arrived a man, there is a man in the room. Assume as before that Case must be assigned at S-Structure, given that it appears at PF and is relevant at LF. Then (48) is *, since an S-Structure condition is violated. Note that even with these assumptions, it still follows that there must be in a Case-marked position, by the Chain Condition, which requires that an LF chain be headed by a Case-marked position.45

If this line of argument is correct, there cannot be a process of Case transmission, for that process would allow (48) to satisfy the Case Filter. Rather, Case must be assigned at S-Structure directly by some Case marker or other device.46 Lasnik (1989) observes that similar conclusions follow from such examples as (49).

(49) a. I consider [there to be a solution]
     b. *I consider [there a solution] (analogous to I consider John intelligent)

In (49a) it must be that be assigns Case directly to a solution; there also receives Case (from consider), so that the Chain Condition is satisfied after LF raising. There is, it seems, no S-Structure process transmitting Case from the expletive there to its associate, the phrase a solution in these examples.

Safir (1985) notes the existence of pairs like (50a-b).47

(50) a. [wh how many men] did John say that [there were twh in the room]
     b. *[wh how many men] did John say that [twh were in the room]

(50b) is a standard ECP violation; the trace twh is in a position that is not γ-marked, in Lasnik and Saito's (1984) sense. The question then arises why this is not also true of (50a), if the trace twh, the associate of the expletive there, is raised by LF movement to the position of there.
Lasnik and Saito's theory provides an explanation, whether we assume LF substitution or, as above, LF adjunction. In either case the trace twh is γ-marked by the process of wh-movement in overt syntax and retains this property when it raises to the position of the expletive, so there is no ECP violation. Similar observations hold with regard to Rizzi's (1982) analysis of wh-extraction of subjects in Italian: the subject first extraposes, leaving expletive pro subject, and then undergoes normal wh-movement, leaving a trace t, γ-marked in overt syntax and then raising at LF to the position of the expletive.

The notion of LF adjunction eliminates much of the motivation for Case transmission theories of expletive-associate relations, and these approaches are still more dubious in the light of the observations just reviewed (see also Pollock 1981 and Kayne 1989). Nevertheless, there is evidence supporting Case transmission.

An indirect though plausible argument for Case transmission is developed by Koopman (1987) in a comparative study of the West African language Bambara and languages of the French-English type. Koopman postulates a parametric difference between languages that have Case chains ([+CC]) and those that do not ([−CC]). Bambara is [−CC] and English-French, [+CC]. Koopman considers three kinds of Case chains.

(51) a. (V, ..., t), where V is a Case assigner
     b. (Op, ..., t), where Op is an operator and t the variable it binds
     c. (E, ..., NP), where E is an expletive and NP its associate

Case (51a) results from V-raising. In a [+CC] language, the trace of V will assign the Case transmitted from V through the chain. In a [−CC] language, lacking Case chains, the trace will be unable to assign Case, and raising of transitive verbs will therefore be impossible.

Case (51b) is standard operator movement. Typically, the trace must be in a Case-marked position, and, Koopman assumes, the operator must inherit Case from it to satisfy the Case Filter. This will be possible in a [+CC] language, impossible in a [−CC] language, which will therefore lack overt operator movement.

Case (51c) is the expletive-associate relation. In a [+CC] language, Case can be transmitted from E to NP, as in standard Case transmission theories, and the Case Filter is therefore satisfied. In a [−CC] language, there can be no expletives, for Case transmission will be impossible, Case chains not being permitted.

Koopman observes that in all respects, English-French are of the [+CC] variety, whereas Bambara is of the [−CC] variety.
Omitting details, we find in Bambara the following properties. Consider Case chains of type (51a). A verb that does not assign Case raises to I, but a verb that assigns Case remains in place, with a dummy element inserted to bear the affix; the explanation is that the trace could not assign Case if the verb were to raise. In causative formation, an intransitive verb raises to form a complex V-causative construction in the familiar way, but this is impossible for a transitive verb, which allows causative only if the external argument is suppressed, as if prior passivization had taken place. These properties follow on the assumption that the trace of a transitive verb cannot assign Case; since the complex verb assigns its sole Case to the obligatory object, the subject cannot appear.

With regard to property (51b), Bambara has only wh-in-situ, as predicted. As for (51c), there are no overt expletives; rather, the associate raises overtly to subject position, again as predicted.

We thus have an indirect argument in favor of Case transmission, absent as a device just when Case chains generally are not permitted.

Can we reinterpret these data so as to resolve the conflict between the argument for Case transmission and the evidence against such a process? Suppose we reinterpret Koopman's parameter in the following way, in accord with the plausible and generally applicable principle that parameters are lexical, that is, statable in terms of X0 elements and X0 categories only. We then consider the property [C], which an X0 element may or may not have. A [+C] element can enter into Case relations, either assigning or receiving Case; a [−C] element cannot. Suppose further that X0 elements with lexical content are always [+C], but that languages can differ with respect to whether other X0 elements are [+C] or [−C]. The parameter is restricted to functional elements, in accordance with the plausible condition discussed earlier. French-English are [+C], meaning that all X0 elements may enter into Case relations; Bambara is [−C], meaning that only a lexical X0 enters into such relations.

Turning to the three properties, (51a) follows directly: in Bambara, the trace of V, being [−C], cannot assign Case. As for (51b), the trace of the operator cannot receive Case in Bambara, being [−C], so that we have a typical violation of the Case Filter (or the visibility requirement from which it derives), with a variable heading a (perhaps one-membered) chain that violates the Chain Condition, since it lacks Case.
Note that we need not assume that the operator requires Case, an otherwise unmotivated assumption, particularly unnatural for empty operators.

The property that concerns us directly is (51c). Since Bambara is [−C], an expletive cannot receive Case. If the language had expletives, then LF raising (which Koopman assumes) would form a chain headed by an element in a non-Case-marked position, violating the Chain Condition. Consequently, there can be no expletives, and overt raising is required.

There seems, then, to be no strong argument for Case transmission, if this line of argument is viable.48 We do, however, have evidence for a narrowly specified parametric difference involving Case theory, with a range of interesting consequences. I am not aware of other convincing evidence for Case transmission, so it may be that the property can be eliminated from UG, in favor of LF movement, driven by FI.
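
The reinterpreted parameter can be made concrete with a small sketch. It is an illustration only, not part of the analysis: the class name X0, the function names, and the encoding of traces and expletives as X0 elements without lexical content are assumptions introduced here. Under those assumptions, a single language-level setting for nonlexical elements yields the three contrasts of (51) together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class X0:
    """A head-level element (hypothetical encoding, for illustration only)."""
    label: str
    lexical: bool  # True if the element has lexical content


def is_plus_c(element: X0, nonlexical_plus_c: bool) -> bool:
    """Return True if the element is [+C], i.e. may enter into Case relations.

    Elements with lexical content are always [+C]; all other X0 elements
    take the value fixed by the language (the parameter).
    """
    return element.lexical or nonlexical_plus_c


def predictions(nonlexical_plus_c: bool) -> dict:
    """Derive the three properties of (51) from the single parameter value."""
    verb_trace = X0("trace of V", lexical=False)        # (51a)
    variable = X0("trace of operator", lexical=False)   # (51b)
    expletive = X0("expletive", lexical=False)          # (51c)
    return {
        # (51a): the trace of a raised verb can assign Case only if [+C]
        "transitive_verb_raising": is_plus_c(verb_trace, nonlexical_plus_c),
        # (51b): the variable left by overt operator movement must receive Case
        "overt_operator_movement": is_plus_c(variable, nonlexical_plus_c),
        # (51c): an expletive must receive Case to head the LF chain
        "expletives": is_plus_c(expletive, nonlexical_plus_c),
    }


FRENCH_ENGLISH = predictions(nonlexical_plus_c=True)   # [+C] setting
BAMBARA = predictions(nonlexical_plus_c=False)         # [-C] setting
```

With the [−C] setting all three entries come out False, matching the absence of transitive verb raising, overt operator movement, and expletives reported for Bambara, while a verb with lexical content remains [+C] under either setting.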

2.7 Some Conclusions on Language Design

Summarizing, we have found evidence to support the basic assumptions on language design sketched in section 2.1, the more specific assumptions concerning the separate syntactic status of Tense and Agreement elements, and those of subsequent discussion. There is varied evidence suggesting that both derivations and representations are subject to a certain form of "least effort" condition and are required to be minimal in a fairly well defined sense, with no superfluous steps in derivations and no superfluous symbols in representations. Proceeding in the way indicated, we may hope to raise these "least effort" guidelines to general principles of UG. Notice that although these principles have a kind of naturalness and generality lacking in the specific principles of UG such as the ECP, the binding theory, and so on, nevertheless their formulation is, in detail, specific to the language faculty.

As discussed elsewhere (see Chomsky 1991a), these properties of UG, if indeed they are real, are rather surprising in a number of respects. For one thing, they are the kinds of properties that yield computational difficulties, since structural descriptions have to meet global conditions. From the point of view of parsing, suppose that we have a process recovering an S-Structure representation σ from the PF representation π. Then to determine the status of σ, we have to carry out a number of operations. We have to determine whether σ is derived from a properly formed D-Structure representation δ licensed by the lexicon, and whether the derivation from δ to the LF representation λ is minimal in the required sense, less costly than any other derivation from δ. Furthermore, we have to determine whether λ satisfies the conditions of external licensing, FI, and other properties of LF. In general, these computations may be nontrivial. In these respects, language design appears to be problematic from a parsing-theoretic perspective, though elegant regarded in isolation from considerations of use.
The basic assumption that the fundamental levels are those that satisfy the external licensing conditions at the interface with other systems already illustrates these properties, and the "least effort" conditions, though natural and plausible in terms of empirical consequences, provide further illustration. The discrepancies between natural-language design and the structure of formal systems constructed for computational efficiency may also be relevant here, as well as other properties of natural language, such as the existence of empty categories, which might also be expected to yield parsing problems. Note that one cannot easily motivate the conditions on economy of representation in terms of processing considerations, since they hold at LF, and only derivatively at S-Structure. Nor does there appear to be any argument that the particular properties of language design are necessary for language-like systems. These are contingent properties of natural language.

There are computational tricks that permit easy determination of the grammatical properties of an S-Structure representation in a large class of cases, broad enough to allow for language to be usable in practice. But language design as such appears to be in many respects dysfunctional, yielding properties that are not well adapted to the functions language is called upon to perform. There is no real paradox here; there is no reason to suppose, a priori, that the general design of language is conducive to efficient use. Rather, what we seem to discover are some intriguing and unexpected features of language design, not unlike those that have been discovered throughout the inquiry into the nature of language, though unusual among biological systems of the natural world.

Notes

1. This is sometimes called Government-Binding (GB) Theory, a misleading term that should be abandoned, in my view; see Chomsky 1988, lecture 2. Generative grammar has engendered a good deal of controversy, sometimes for good reason, often not. There has been a fair amount of plain misunderstanding, beginning with the notion of generative grammar itself. I have always understood a generative grammar to be nothing more than an explicit grammar. Some apparently have a different concept in mind. For example, reviewing Chomsky 1986b, McCawley (1988) notes that I interpret the concept here as meaning nothing more than explicit, as I have always done (see, for instance, Chomsky 1965, 4), and concludes erroneously that this is a sharp change in my usage that gives the enterprise an entirely different cast from that of the 1960s, when the task, as he perceives it, was taken to be specifying the membership of a set of sentences that is identified with a language (pp. 355-356; McCawley takes the set of sentences to be what I have called the structure of the language, that is, the set of structural descriptions).
But the characterization he gives does not imply that generative means anything more than explicit; there is, furthermore, no change in usage or conception, at least for me, in this regard. The review contains a series of further misunderstandings, and there are others elsewhere, but I will not discuss these matters here.

2. On why phonology alone might be expected to have specific rule structure, see Bromberger and Halle 1989.

3. Or what is sometimes called a core language. The core-periphery distinction, in my view, should be regarded as an expository device, reflecting a level of understanding that should be superseded as clarification of the nature of linguistic inquiry advances. See Chomsky 1988.

4. On these notions, see Chomsky 1986b. General conditions of this sort were investigated in some detail in the earliest work in generative grammar, in the context of the study of evaluation procedures for grammars; see Chomsky 1951.


5. The lexical elements are sometimes called atomic from the point of view of the computational operations. Taking the metaphor literally, we would conclude that no feature of a lexical item can be modified or even addressed (say, for checking against another matching element) in a computational operation, and no features can be added to a lexical element. The condition as stated is too strong; just how it holds is a theory-internal question that I will put aside.

6. On restriction to functional elements, see Borer 1984 and Fukui 1986, 1988.

7. On this matter, see, among others, Baker 1988.

8. As a matter of notation for X-bar theory, I will use prime instead of bar, X0 for the lowest-level category, and XP for X″, for each X.

9. I have in mind the notion of level of representation discussed in Chomsky 1975a and subsequent work.

10. Some have proposed that certain conditions on syntax hold at PF; see, for example, Aoun et al. 1987. It cannot be, strictly speaking, the level of PF at which these conditions apply, since at this level there is no relevant structure, not even words, in general. Rather, this approach assumes an additional level S-P intermediate between S-Structure and PF, the purported conditions holding at S-P.

11. See Burzio 1986 and Chomsky 1987. Some have felt that there is a profound issue of principle distinguishing two-level theories that include a relation of D- to S-Structure from one-level approaches, which relate S-Structure to lexical properties in some different way; for some comment, see my (1981b) response to queries in Longuet-Higgins, Lyons, and Broadbent 1981, 63f. and Chomsky 1981a. There may be an issue, but as noted, it is at best a rather subtle one.

12. On X-bar-theoretic conditions at S-Structure, see Van Riemsdijk 1989. In lectures in Tokyo in January 1987, I suggested some further reasons why such conditions might hold at S-Structure.

13. I assume here the general framework of Chomsky 1986a, based essentially on Lasnik and Saito 1984, though further modifications are in order that I will not consider here.

14. Note that there also might be a partial reduction, for example, a formulation of the ECP that expresses a generalization holding of X0-movement and other cases; that would be the import of a proposal by Rizzi (1990). We should also look into the other possible case of movement: X′-movement. For evidence supporting this option, see Van Riemsdijk 1989. See also Namiki 1979.

15. See Pollock 1989. I will touch upon only a few of the questions that Pollock addresses. See Emonds 1978 and, for more recent development of his approach, Emonds 1985.

16. Order irrelevant, here and below, for abstract formulations.

17. Pollock's terms for strong and weak are transparent and opaque, respectively, for reasons that become clear directly.

18. Pollock treats ne in the ne-pas construction as the clitic head of NegP, raising to a higher position. We might think of it as a kind of scope marker.

19. More explicitly, the verb [v V [Agr Agr I]].


20. The mechanics of how modals and do relate to the inflectional affixes remain to be specified. If do-support can be shown to be a reflex of parameter fixing (choice of weak Agr, we are assuming), then it is not, strictly speaking, a language-specific rule, though I will continue to use this term for expository purposes. The device of employing dummy elements in this manner is found elsewhere, also plausibly considered to be contingent on parameter fixing; see section 2.6.4 for one example.

21. Note that there are empirical consequences to these assumptions. They entail that at the steady state attained in language acquisition, the UG principles remain distinct from language-particular properties. Suggestive work by Flynn (1987) on second-language acquisition supports this conclusion.

22. There would in fact be a straightforward solution to this particular problem in terms of an analysis to which we return in section 2.5, but I will put that aside here, since it will not bear on the other questions just raised.

23. Note that e is regarded here as an actual symbol of mental representation, but lacking ϕ-features and categorial features. e is not to be confused with the identity element of a syntactic level, regarded as an algebraic construction in the manner of Chomsky 1975a.

24. Recall that we are assuming, essentially, Lasnik and Saito's (1984) theory of the ECP, as modified in Chomsky 1986a. Under this theory, tv in (17) is γ-marked after raising of V to Agr, and subsequent deletion of Agr-trace in this position leaves no ECP violation.

25. On other cases of a similar sort, see Chomsky 1987.

26. Semantic properties of infinitives, then, would be understood as properties of the construction, not its head [−finite].

27. A cursory check suggests that the morphological consequences are as expected, in languages where the hierarchic position of object and subject agreement can be detected.

28. At various points, the reinterpretation would require slight modifications in the exposition and the resulting analysis. I will omit further comment on these matters, which do not seem to raise any serious problem.

29. More precisely, agreement holds between the wh-phrase and AgrO, to which the participle raises so that it agrees with the wh-phrase; the same is true of subject-verb agreement.

30. Note that we must assume the two derivations to be equally costly, each being minimal by successive-cyclic movement. This consideration would lead to a further refinement of the notion of cost.

31. The case of clitic movement depends upon theory-internal assumptions about cliticization, but no new problems appear to arise here. Kayne's argument is slightly different from the above.

32. Koopman is considering the possibility of object raising to [Spec, VP]; alternatively, we might suppose that the process in question is raising to [Spec, AgrP].

33. There are further refinements to be considered. For example, should expletives be present at D-Structure or inserted in the course of derivation? What is the status of functional elements? And so on.


34. If we adopt the approach to NP-raising discussed in Chomsky 1986a, then we will have to distinguish the chain (39) formed by movement from the intermediate derived chain that takes part in the process of γ-marking of αn.

35. An alternative possibility, suggested by certain facts about binding and trace interpretation, is that VP-movement is restricted to the PF component (as an optional stylistic rule) and possibly also to (obligatory) LF movement, along the lines of a reinterpretation of the barriers framework (Chomsky 1986a) discussed in my lectures at Tokyo in January 1987. This conclusion may indeed follow from the considerations discussed above concerning optionality, within the present framework.

36. Note that further precision is necessary to make explicit just when and how this condition applies.

37. They might be present at earlier stages, where licensing conditions do not yet apply, serving, as Norbert Hornstein observes, to permit the application of principles for the interpretation of anaphors in displaced phrases of the sort proposed by Barss (1986).

38. See Chomsky 1986b. For extensive discussion of expletives, which I will largely follow here, see Burzio 1986. See also Travis 1984 on the typology of expletives. The status of it (and its counterparts) in extraposition constructions is more convoluted for various reasons, including the question of whether it occupies a θ-position.

39. See Baker 1988 on the role of both Case and agreement in this connection.

40. Such properties had to be stipulated on the assumptions made in Chomsky and Lasnik 1977, but perhaps they are dispensable along the lines just sketched. For these reasons alone, it seems doubtful that what adjoins to the expletive is a small clause of which it is the subject; thus, I assume that what adjoins is a man, not the small clause [a man in the room], in (40a). There are other reasons for supposing this to be true. Kayne (1989) observes (see his note 6) that the assumption is required for his explanation of the lack of participle-object agreement with object raising in expletive constructions. Consider, furthermore, such expressions as *there seems to be several men sick, excluded by lack of agreement between several men and seems. But the phrase [several men sick] can be singular, as in [several men sick] is a sign that the water is polluted and a range of similar cases discussed by Safir (1987), though many questions remain unsettled. On the possibility of nonagreement between the verb and its associate, see Burzio 1986, 132-133. Note that nothing requires that the two kinds of expletives be morphologically distinct.

41. We assume that Case distributes from a category to its immediate constituents, a process that is often morphologically overt, thus from the category of the complex element [AE] to the adjoined element A, heading the chain (A, ..., tA). Recall that A adjoined to E does head such a chain, by earlier assumptions.

42. Note that these examples could be accounted for by stipulations on the distribution of expletives, as in Chomsky and Lasnik 1977, but we are now exploring the possibility, which seems plausible, that these are dispensable.

43. Or no linking, in Higginbotham's (1983) sense. Note that we cannot assume the expletive to be unindexed; thus, it might have raised, leaving an indexed trace.

44. To account for scopal properties appropriately, more elaborate assumptions are required, taking into account the position of both the head and the terminal position of the associate chain (A, ..., t). In a raising construction such as there appear (not) to have been many linguistics students here, we have to ensure that the scope of many falls within that of appear and not; no relation is determined by the proposed LF representation, but such a relation would be established in the correct way if the position of the trace is considered, given that the head of the chain has no relation to the other relevant elements. Just what is entailed by a wider range of considerations remains to be determined. See chapter 4.

45. Similar remarks hold of quirky Case, assigned at D-Structure under the uniformity condition, but realized in a Case-marked position at S-Structure.

46. See Pollock 1981 for arguments against Case transmission. For additional argument, see Kayne 1989.

47. For discussion of these and the preceding examples, see Shlonsky 1987.

48. Koopman considers other possible Case chains, but the evidence is less convincing.

A Minimalist Program for Linguistic Theory

3.1 Some General Considerations

This chapter originally appeared in The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, edited by Kenneth Hale and Samuel Jay Keyser (Cambridge, Mass.: MIT Press, 1993), and is published here with minor revisions.

Language and its use have been studied from varied points of view. One approach, assumed here, takes language to be part of the natural world. The human brain provides an array of capacities that enter into the use and understanding of language (the language faculty); these seem to be in good part specialized for that function and a common human endowment over a very wide range of circumstances and conditions. One component of the language faculty is a generative procedure (an I-language, henceforth language) that generates structural descriptions (SDs), each a complex of properties, including those commonly called semantic and phonetic. These SDs are the expressions of the language. The theory of a particular language is its grammar. The theory of languages and the expressions they generate is Universal Grammar (UG); UG is a theory of the initial state S0 of the relevant component of the language faculty. We can distinguish the language from a conceptual system and a system of pragmatic competence. Evidence has been accumulating that these interacting systems can be selectively impaired and developmentally dissociated (Curtiss 1981, Yamada 1990, Smith and Tsimpli 1991), and their properties are quite different.

A standard assumption is that UG specifies certain linguistic levels, each a symbolic system, often called a representational system. Each linguistic level provides the means for presenting certain systematic information about linguistic expressions. Each linguistic expression (SD) is a sequence of representations, one at each linguistic level. In variants of the Extended Standard Theory (EST), each SD is a sequence (δ, σ, π, λ), representations at the D-Structure, S-Structure, Phonetic Form (PF), and Logical Form (LF) levels, respectively.

Some basic properties of language are unusual among biological systems, notably the property of discrete infinity. A working hypothesis in generative grammar has been that languages are based on simple principles that interact to form often intricate structures, and that the language faculty is nonredundant, in that particular phenomena are not overdetermined by principles of language. These too are unexpected features of complex biological systems, more like what one expects to find (for unexplained reasons) in the study of the inorganic world. The approach has, nevertheless, proven to be a successful one, suggesting that the hypotheses are more than just an artifact reflecting a mode of inquiry.

Another recurrent theme has been the role of principles of economy in determining the computations and the SDs they generate. Such considerations have arisen in various forms and guises as theoretical perspectives have changed. There is, I think, good reason to believe that they are fundamental to the design of language, if properly understood.1

The language is embedded in performance systems that enable its expressions to be used for articulating, interpreting, referring, inquiring, reflecting, and other actions. We can think of the SD as a complex of instructions for these performance systems, providing information relevant to their functions. While there is no clear sense to the idea that language is designed for use or well adapted to its functions, we do expect to find connections between the properties of the language and the manner of its use.

The performance systems appear to fall into two general types: articulatory-perceptual and conceptual-intentional. If so, a linguistic expression contains instructions for each of these systems.
Two of the linguistic levels, then, are the interface levels A-P and C-I, providing the instructions for the articulatory-perceptual and conceptual-intentional systems, respectively. Each language determines a set of pairs drawn from the A-P and C-I levels. The level A-P has generally been taken to be PF; the status and character of C-I have been more controversial.

Another standard assumption is that a language consists of two components: a lexicon and a computational system. The lexicon specifies the items that enter into the computational system, with their idiosyncratic properties. The computational system uses these elements to generate derivations and SDs. The derivation of a particular linguistic expression, then, involves a choice of items from the lexicon and a computation that constructs the pair of interface representations.


So far we are within the domain of virtual conceptual necessity, at least if the general outlook is adopted.2 UG must determine the class of possible languages. It must specify the properties of SDs and of the symbolic representations that enter into them. In particular, it must specify the interface levels (A-P, C-I), the elements that constitute these levels, and the computations by which they are constructed. A particularly simple design for language would take the (conceptually necessary) interface levels to be the only levels. That assumption will be part of the minimalist program I would like to explore here.

In early work in generative grammar, it was assumed that the interface C-I is the level of T-markers, effectively a composite of all levels of syntactic representation. In descendants of EST approaches, C-I is generally taken to be LF. On this assumption, each language will determine a set of pairs (π, λ) (π drawn from PF and λ from LF) as its formal representations of sound and meaning, insofar as these are determined by the language itself. Parts of the computational system are relevant only to π, not λ: the PF component.3 Other parts are relevant only to λ, not π: the LF component. The parts of the computational system that are relevant to both are the overt syntax, a term that is a bit misleading, in that these parts may involve empty categories assigned no phonetic shape. The nature of these systems is an empirical matter; one should not be misled by unintended connotations of such terms as logical form and represent adopted from technical usage in different kinds of inquiry.

The standard idealized model of language acquisition takes the initial state S0 to be a function mapping experience (primary linguistic data, PLD) to a language. UG is concerned with the invariant principles of S0 and the range of permissible variation. Variation must be determined by what is visible to the child acquiring language, that is, by the PLD.
It is not surprising, then, to find a degree of variation in the PF component, and in aspects of the lexicon: Saussurean arbitrariness (association of concepts with phonological matrices), properties of grammatical formatives (inflection, etc.), and readily detectable properties that hold of lexical items generally (e.g., the head parameter). Variation in the overt syntax or LF component would be more problematic, since evidence could only be quite indirect. A narrow conjecture is that there is no such variation: beyond PF options and lexical arbitrariness (which I henceforth ignore), variation is limited to nonsubstantive parts of the lexicon and general properties of lexical items. If so, there is only one computational system and one lexicon, apart from this limited kind of variety. Let us tentatively adopt that assumption (extreme, perhaps, but it seems not implausible) as another element of the Minimalist Program.4


Early generative grammar approached these questions in a different way, along lines suggested by long tradition: various levels are identified, with their particular properties and interrelations; UG provides a format for permissible rule systems; any instantiation of this format constitutes a specific language. Each language is a rich and intricate system of rules that are, typically, construction-particular and language-particular: the rules forming verb phrases or passives or relative clauses in English, for example, are specific to these constructions in this language. Similarities across constructions and languages derive from properties of the format for rule systems.

The more recent principles-and-parameters (P&P) approach, assumed here, breaks radically with this tradition, taking steps toward the minimalist design just sketched. UG provides a fixed system of principles and a finite array of finitely valued parameters. The language-particular rules reduce to choice of values for these parameters. The notion of grammatical construction is eliminated, and with it, construction-particular rules. Constructions such as verb phrase, relative clause, and passive remain only as taxonomic artifacts, collections of phenomena explained through the interaction of the principles of UG, with the values of parameters fixed.

With regard to the computational system, then, we assume that S0 is constituted of invariant principles with options restricted to functional elements and general properties of the lexicon. A selection Σ among these options determines a language. A language, in turn, determines an infinite set of linguistic expressions (SDs), each a pair (π, λ) drawn from the interface levels (PF, LF), respectively.
Language acquisition involves fixing Σ; the grammar of the language states Σ, nothing more (lexical arbitrariness and PF component aside). If there is a parsing system that is invariant and unlearned (as often assumed), then it maps (Σ, PLD) into a structured percept, in some cases associated with an SD.5 Conditions on representations (those of binding theory, Case theory, θ-theory, and so on) hold only at the interface, and are motivated by properties of the interface, perhaps properly understood as modes of interpretation by performance systems. The linguistic expressions are the optimal realizations of the interface conditions, where "optimality" is determined by the economy conditions of UG. Let us take these assumptions too to be part of the Minimalist Program.

In early work, economy considerations entered as part of the evaluation metric, which, it was assumed, selected a particular instantiation of the permitted format for rule systems, given PLD. As inquiry has progressed, the presumed role of an evaluation metric has declined, and within the P&P approach, it is generally assumed to be completely dispensable: the principles are sufficiently restrictive so that PLD suffice in the normal case to set the parameter values that determine a language.6

Nevertheless, it seems that economy principles of the kind explored in early work play a significant role in accounting for properties of language. With a proper formulation of such principles, it may be possible to move toward the minimalist design: a theory of language that takes a linguistic expression to be nothing other than a formal object that satisfies the interface conditions in the optimal way. A still further step would be to show that the basic principles of language are formulated in terms of notions drawn from the domain of (virtual) conceptual necessity.

Invariant principles determine what counts as a possible derivation and a possible derived object (linguistic expression, SD). Given a language, these principles determine a specific set of derivations and generated SDs, each a pair (π, λ). Let us say that a derivation D converges if it yields a legitimate SD and crashes if it does not; D converges at PF if π is legitimate and crashes at PF if it is not; D converges at LF if λ is legitimate and crashes at LF if it is not. In an EST framework, with SD = (δ, σ, π, λ) (δ a D-Structure representation, σ an S-Structure representation), there are other possibilities: δ or σ, or relations among (δ, σ, π, λ), might be defective. Within the Minimalist Program, all possibilities are excluded apart from the status of π and λ. A still sharper version would exclude the possibility that π and λ are each legitimate but cannot be paired for UG reasons. Let us adopt this narrower condition as well. Thus, we assume that a derivation converges if it converges at PF and at LF; convergence is determined by independent inspection of the interface levels, not an empirically innocuous assumption.7

The principles outlined are simple and restrictive, so that the empirical burden is considerable; and fairly intricate argument may be necessary to support it, exactly the desired outcome, for whatever ultimately proves to be the right approach.

These topics have been studied and elaborated over the past several years, with results suggesting that the minimalist conception outlined may not be far from the mark. I had hoped to present an exposition in this paper, but that plan proved too ambitious. I will therefore keep to an informal sketch, only indicating some of the problems that must be dealt with.8
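
The convergence conditions stated earlier in this section can be put in a small illustrative sketch. It is only a toy model under stated assumptions: the class `Derivation`, the Boolean legitimacy flags, and in particular the integer `cost` (a hypothetical stand-in for whatever the economy conditions of UG turn out to measure) are not part of the text. The function `optimal` encodes one possible reading, namely that economy selects among convergent derivations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """A derivation D yielding an SD, here reduced to the pair (pi, lam)."""
    name: str
    pf_legitimate: bool  # is pi a legitimate PF representation?
    lf_legitimate: bool  # is lam a legitimate LF representation?
    cost: int            # hypothetical stand-in for an economy measure

    def converges_at_pf(self) -> bool:
        return self.pf_legitimate

    def converges_at_lf(self) -> bool:
        return self.lf_legitimate

    def converges(self) -> bool:
        # Each interface level is inspected independently; no further
        # pairing condition on (pi, lam) is imposed.
        return self.converges_at_pf() and self.converges_at_lf()


def optimal(derivations):
    """Assumption: economy picks the cheapest among convergent derivations."""
    convergent = [d for d in derivations if d.converges()]
    if not convergent:
        return []
    best = min(d.cost for d in convergent)
    return [d for d in convergent if d.cost == best]


candidates = [
    Derivation("d1", True, True, cost=3),
    Derivation("d2", True, False, cost=1),  # crashes at LF
    Derivation("d3", True, True, cost=2),
]
print([d.name for d in optimal(candidates)])
```

In this toy setting the cheapest candidate, d2, is never considered because it crashes at LF; only d1 and d3 compete.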

3.2 Fundamental Relations: X-Bar Theory

The computational system takes representations of a given form and modifies them. Accordingly, UG must provide means to present an array of items from the lexicon in a form accessible to the computational system. We may take this form to be some version of X-bar theory. The concepts of X-bar theory are therefore fundamental. In a minimalist theory, the crucial properties and relations will be stated in the simple and elementary terms of X-bar theory.

An X-bar structure is composed of projections of heads selected from the lexicon. Basic relations, then, will involve the head as one term. Furthermore, the basic relations are typically "local." In structures of the form (1), two local relations are present: the Spec(ifier)-head relation of ZP to X, and the head-complement relation of X to YP (order irrelevant; the usual conventions apply).

(1) [XP ZP [X′ X YP]]

The head-complement relation is not only "more local" but also more fundamental: typically, associated with thematic (θ-) relations. The Spec-head relation, I will suggest below, falls into an "elsewhere" category. Putting aside adjunction for the moment, the narrowest plausible hypothesis is that X-bar structures are restricted to the form in (1); only local relations are considered (hence no relation between X and a phrase included within YP or ZP); and head-complement is the core local relation. Another admissible local relation is head-head, for example, the relation of a verb to (the head of) its Noun Phrase complement (selection). Another is chain link, to which we will return.

The version of a minimalist program explored here requires that we keep to relations of these kinds, dispensing with such notions as government by a head (head government). But head government plays a critical role in all modules of grammar; hence, all of these must be reformulated, if this program is to be pursued.

Take Case theory. It is standardly assumed that the Spec-head relation enters into structural Case for the subject position, while the object position is assigned Case under government by V, including constructions in which the object Case-marked by a verb is not its complement (exceptional Case marking).9 The narrower approach we are considering requires that all these modes of structural Case assignment be recast in unified X-bar-theoretic terms, presumably under the Spec-head relation. As discussed in chapter 2, an elaboration of Pollock's (1989) theory of inflection provides a natural mechanism, where we take the basic structure of the clause to be (2).

(2) [CP Spec [C′ C [AgrSP Spec [AgrS′ AgrS [TP T [AgrOP Spec [AgrO′ AgrO VP]]]]]]]

Omitted here are a possible specifier of TP ([Spec, TP]) and a phrase headed by the functional element Neg(ation), or perhaps more broadly, a category that includes an affirmation marker and others as well (Pollock 1989, Laka 1990). AgrS and AgrO are informal mnemonics to distinguish the two functional roles of Agr. Agr is a collection of φ-features (gender, number, person); these are common to the systems of subject and object agreement, though AgrS and AgrO may of course be different selections, just as two verbs or NPs in (2) may differ.10

We now regard both agreement and structural Case as manifestations of the Spec-head relation (NP, Agr). But Case properties depend on characteristics of T and the V head of VP. We therefore assume that T raises to AgrS, forming (3a), and V raises to AgrO, forming (3b); the complex includes the φ-features of Agr and the Case feature provided by T, V.11

(3) a. [Agr T Agr]
b. [Agr V Agr]

The basic assumption is that there is a symmetry between the subject and the object inflectional systems. In both positions the relation of NP to V is mediated by Agr, a collection of φ-features; in both positions agreement is determined by the φ-features of the Agr head of the Agr complex, and Case by an element that adjoins to Agr (T or V). An NP in the Spec-head relation to this Agr complex bears the associated Case and agreement features. The Spec-head and head-head relations are therefore the core configurations for inflectional morphology.

Exceptional Case marking by V is now interpreted as raising of NP to the Spec of the AgrP dominating V. It is raising to [Spec, AgrO], the analogue of familiar raising to [Spec, AgrS]. If the VP-internal subject hypothesis is correct (as I henceforth assume), the question arises why the object (direct, or in the complement) raises to [Spec, AgrO] and the subject to [Spec, AgrS], yielding unexpected crossing rather than the usual nested paths. We will return to this phenomenon below, finding that it follows on plausible assumptions of some generality, and in this sense appears to be a fairly "deep" property of language. If parameters are morphologically restricted in the manner sketched earlier, there should be no language variation in this regard.

The same hypothesis extends naturally to predicate adjectives, with the underlying structure shown in (4) (AgrA again a mnemonic for a collection of φ-features, in this case associated with an adjective).

(4) [AgrP Spec [Agr′ AgrA [AP [NP John] [A′ intelligent]]]]

Raising of NP to Spec and A to AgrA creates the structure for NP-adjective agreement internal to the predicate phrase. The resulting structure is a plausible candidate for the small clause complement of consider, be, and so on. In the former construction (complement of consider), NP raises further to [Spec, AgrO] at LF to receive accusative Case; in the latter (complement of be), NP raises overtly to receive nominative Case and verb agreement, yielding the overt form John is intelligent with John entering into three relations: (1) a Case relation with [T AgrS] (hence ultimately the verbal complex [[T AgrS] V]), (2) an agreement relation with AgrS (hence the verbal complex), and (3) an agreement relation with Agr of structure (4) (hence the adjectival complex). In both constructions, the NP subject is outside of a full AP in the small clause construction, as required, and the structure is of a type that appears regularly.12

An NP, then, may enter into two kinds of structural relations with a predicate (verb, adjective): agreement, involving features shared by NP and predicate; or Case, manifested on the NP alone. Subject of verb or adjective, and object of verb, enter into these relations (but not object of adjective if that is an instance of inherent, not structural, Case). Both relations involve Agr: Agr alone, for agreement relations; the element T or V alone (raising to Agr), for Case relations.

The structure of CP in (2) is largely forced by other properties of UG, assuming the minimalist approach with Agr abstracted as a common property of adjectival agreement and the subject-object inflectional systems, a reasonable assumption, given that agreement appears without Case (as in NP-AP agreement) and Case appears without agreement (as in transitive expletives, with the expletive presumably in the [Spec, AgrS] position and the subject in [Spec, T], receiving Case; see note 11). Any appropriate version of the Case Filter will require two occurrences of Agr if two NPs in VP require structural Case; conditions on Move α require the arrangement given in (2) if structural Case is construed as outlined. Suppose that VP contains only one NP. Then one of the two Agr elements will be "active" (the other being inert or perhaps missing). Which one? Two options are possible: AgrS or AgrO. If the choice is AgrS, then the single NP will have the properties of the subject of a transitive clause; if the choice is AgrO, then it will have the properties of the object of a transitive clause (nominative-accusative and ergative-absolutive languages, respectively). These are the only two possibilities, mixtures apart.
The distinction between the two language types reduces to a trivial question of morphology, as we expect.

Note that from this point of view, the terms nominative, absolutive, and so on, have no substantive meaning apart from what is determined by the choice of "active" versus "inert" Agr; there is no real question as to how these terms correspond across language types.

The "active" element (AgrS in nominative-accusative languages and AgrO in ergative-absolutive languages) typically assigns a less-marked Case to its Spec, which is also higher on the extractability hierarchy, among other properties. It is natural to expect less-marked Case to be compensated (again, as a tendency) by more-marked agreement (richer overt agreement with nominative and absolutive than with accusative and ergative). The C-Command Condition on anaphora leads us to expect nominative and ergative binding in transitive constructions.13


Similar considerations apply to licensing of pro. Assuming Rizzi's theory (1982, 1986a), pro is licensed in a Spec-head relation to "strong" AgrS, or when governed by certain verbs V*. To recast these proposals in a unitary X-bar-theoretic form: pro is licensed only in the Spec-head relation to [Agr α Agr], where α is [+tense] or V, Agr strong or V = V*. Licensing of pro thus falls under Case theory in a broad sense. Similar considerations extend rather naturally to PRO.14

Suppose that other properties of head government also have a natural expression in terms of the more fundamental notions of X-bar theory. Suppose further that antecedent government is a property of chains, expressible in terms of c-command and barriers. Then the concept of government would be dispensable, with principles of language restricted to something closer to conceptual necessity: local X-bar-theoretic relations to the head of a projection and the chain link relation.

Let us look more closely at the local X-bar-theoretic notions, taking these to be fundamental. Assume binary branching only, thus structures limited to (1). Turning to adjunction, on the assumptions of Chomsky 1986a, there is no adjunction to complement, adjunction (at least, in overt syntax) has a kind of "structure-preserving" character, and a segment-category distinction holds.15 Thus, the structures to be considered are of the form shown in (5), where XP, ZP, and X each have a higher and lower segment, indicated by subscripting (H and X heads).

(5) [XP1 UP [XP2 [ZP1 WP ZP2] [X′ [X1 H X2] YP]]]

Let us now consider the notions that enter into a minimalist program. The basic elements of a representation are chains. We consider first the case of one-membered chains, construing notions abstractly with an eye to the general case. The structure (5) can only have arisen by raising of H to adjoin to X (we put aside questions about the possible origins of UP, WP). Therefore, H heads a chain CH = (H, ..., t), and only this chain, not H in isolation, enters into head-α relations. The categories that we establish are defined for H as well as X, but while they enter into head-α relations for X, they do not do so for H (only for the chain CH), an important matter.

Assume all notions to be irreflexive unless otherwise indicated. Assume the standard notion of domination for the pair (σ, β), σ a segment. We say that the category α dominates β if every segment of α dominates β. The category α contains β if some segment of α dominates β. Thus, the two-segment category XP dominates ZP, WP, X′ and whatever they dominate; XP contains UP and whatever UP and XP dominate; ZP contains WP but does not dominate it. The two-segment category X contains H but does not dominate it.

For a head α, take Max(α) to be the least full-category maximal projection dominating α. Thus, in (5) Max(H) = Max(X) = [XP1, XP2], the two-segment category XP.

Take the domain of a head α to be the set of nodes contained in Max(α) that are distinct from and do not contain α. Thus, the domain of X in (5) is {UP, ZP, WP, YP, H} and whatever these categories dominate; the domain of H is the same, minus H.

As noted, the fundamental X-bar-theoretic relation is head-complement, typically with an associated θ-relation determined by properties of the head. Define the complement domain of α as the subset of the domain reflexively dominated by the complement of the construction: YP in (5). The complement domain of X (and H) is therefore YP and whatever it dominates.

The remainder of the domain of α we will call the residue of α. Thus, in (5) the residue of X is its domain minus YP and what it dominates. The residue is a heterogeneous set, including the Spec and anything adjoined (adjunction being allowed to the maximal projection, its Spec, or its head; UP, WP, and H, respectively, in (5)).

The operative relations have a local character. We are therefore interested not in the sets just defined, but rather in minimal subsets of them that include just categories locally related to the heads.
For any set S of categories, let us take Min(S) (minimal S) to be the smallest subset K of S such that for any γ ∈ S, some β ∈ K reflexively dominates γ. In the cases that interest us, S is a function of a head α (e.g., S = domain of α). We keep to this case, that is, to Min(S(α)), for some head α. Thus, in (5) the minimal domain of X is {UP, ZP, WP, YP, H}; its minimal complement domain is YP; and its minimal residue is {UP, ZP, WP, H}. The minimal domain of H is {UP, ZP, WP, YP}; its minimal complement domain is YP; and its minimal residue is {UP, ZP, WP}.

Let us call the minimal complement domain of α its internal domain, and the minimal residue of α its checking domain. The terminology is intended to indicate that elements of the internal domain are typically internal arguments of α, while the checking domain is typically involved in checking inflectional features. Recall that the checking domain is heterogeneous: it is the "elsewhere" set. The minimal domain also has an important role, to which we will turn directly.

A technical point should be clarified. The internal and checking domains of α must be uniquely defined for α; specifically, if α (or one of its elements, if it is a nontrivial chain) is moved, we do not want the internal and checking domains to be "redefined" in the newly formed construction, or we will have an element with multiple subdomains, for example, ambiguous specification of internal arguments. We must therefore understand the notion Min(S(α)) derivationally, not representationally: it is defined for α as part of the process of introducing α into the derivation. If α is a trivial (one-membered) chain, then Min(S(α)) is defined when α is lexically inserted; if α is a nontrivial chain (β1, ..., βn), then Min(S(α)) is defined when α is formed by raising β1. In (5) the head H has no minimal, internal, or checking domain, because it is raised from some other position to form the chain CH = (H, ..., t) and has already been assigned these subdomains in the position now occupied by t; such subdomains are, however, defined for the newly formed chain CH, in a manner to which we will turn directly. Similarly, if the complex [H X] is later raised to form the chain CH′ = ([H X], t′), Min(S(α)) will be defined as part of the operation for α = CH′, but not for α = X, H, or CH.

Returning to (5), suppose X is a verb. Then YP, the sole element of the internal domain of X, is typically an internal argument of X. Suppose X is Agr and H a verb raised to Agr forming the chain CH = (H, t). Then the specifier ZP (and possibly the adjoined elements UP, WP) of the checking domain of X and CH will have agreement features by virtue of their local relation to X, and Case features by virtue of their local relation to CH.
H does not have a checking domain, but CH does.16

We have so far considered only one-membered chains. We must extend the notions defined to a nontrivial chain CH with n > 1 (α1 a zero-level category), as in (6).

(6) CH = (α1, ..., αn)

Let us keep to the case of n = 2, the normal case for lexical heads though not necessarily the only one.17

The issue arises, for example, if we adopt an analysis of multiargument verbs along the lines suggested by Larson (1988), for example, taking the underlying structure of (7) to be (8).

(7) John put the book on the shelf

(8) [VP1 [NP1 John] [V′1 [V1 e] [VP2 [NP2 the book] [V′2 [V2 put] [ZP on the shelf]]]]]

V2 raises to the empty position V1, forming the chain (put, t) (subsequently, NP1 raises (overtly) to [Spec, AgrS] and NP2 (covertly) to [Spec, AgrO]).

The result we want is that the minimal domain of the chain (put, t) is {NP1, NP2, ZP} (the three arguments), while the internal domain is {NP2, ZP} (the internal arguments). The intended sense is given by the natural generalization of the definitions already suggested. Let us define the domain of CH in (6) to be the set of nodes contained in Max(α1) and not containing any αi. The complement domain of CH is the subset of the domain of CH reflexively dominated by the complement of α1. Residue and Min(S(α)) are defined as before, now for α = CH. The concepts defined earlier are the special cases where CH is one-membered.

Suppose, for example, that CH = (put, t), after raising of put to V1 in (8), leaving t in the position V2. Then the domain of CH is the set of nodes contained in VP1 (= Max(V1)) and not containing either put or t (namely, the set {NP1, NP2, ZP} and whatever they dominate); the minimal domain is {NP1, NP2, ZP}. The internal domain of the chain CH is {NP2, ZP} (the two internal arguments), and the checking domain of CH is NP1, the typical position of the external argument in this version of the VP-internal subject hypothesis (basically Larson's).

Suppose that instead of replacing e, put had adjoined to some nonnull element X, yielding the complex category [X put X], as in adjunction of H to X in (5). The domain, internal domain, and checking domain of the chain would be exactly the same. There is no minimal domain, internal domain, or checking domain for put itself after raising; only for the chain CH = (put, t).
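
The definitions of domination, containment, Max, domain, and Min(S) given above can be checked mechanically against structures (5) and (8). The following is a minimal sketch under stated assumptions, not a definitive formalization: the classes `Seg` and `Structure` and their method names are hypothetical, the set of maximal projections and the complement are passed in by hand rather than derived, chain members are excluded from the domain explicitly, and ZP in (8) is given an illustrative internal structure (P, NP3) so that Min(S) has something to prune.

```python
class Seg:
    """One segment of a category; adjunction gives a category two segments."""
    def __init__(self, cat, *kids):
        self.cat, self.kids = cat, kids

    def below(self):
        """All segments properly dominated by this segment."""
        out = []
        for k in self.kids:
            out.append(k)
            out.extend(k.below())
        return out


class Structure:
    def __init__(self, root, maximal):
        self.segs = [root] + root.below()
        self.cats = {s.cat for s in self.segs}
        self.maximal = set(maximal)  # categories taken to be maximal projections

    def _seg_dom(self, seg, b):
        # a segment dominates category b if it dominates every segment of b
        below = seg.below()
        return all(s in below for s in self.segs if s.cat == b)

    def dominates(self, a, b):  # every segment of a dominates b (irreflexive)
        return a != b and all(self._seg_dom(s, b) for s in self.segs if s.cat == a)

    def contains(self, a, b):   # some segment of a dominates b (irreflexive)
        return a != b and any(self._seg_dom(s, b) for s in self.segs if s.cat == a)

    def max_of(self, a):
        """Least maximal projection dominating a."""
        cands = [c for c in self.maximal if self.dominates(c, a)]
        return next(c for c in cands
                    if not any(self.dominates(c, d) for d in cands))

    def domain(self, chain):
        top = self.max_of(chain[0])
        return {c for c in self.cats
                if self.contains(top, c) and c not in chain
                and not any(self.contains(c, m) for m in chain)}

    def minimal(self, s):
        """Min(S): members of S not dominated by another member of S."""
        return {c for c in s if not any(self.dominates(d, c) for d in s)}

    def subdomains(self, chain, complement):
        dom = self.domain(chain)
        comp = {c for c in dom if c == complement or self.dominates(complement, c)}
        return {"minimal": self.minimal(dom),
                "internal": self.minimal(comp),         # minimal complement domain
                "checking": self.minimal(dom - comp)}   # minimal residue


# Structure (5): XP, ZP, and X each have two segments.
five = Structure(
    Seg("XP", Seg("UP"),
        Seg("XP", Seg("ZP", Seg("WP"), Seg("ZP")),
            Seg("X'", Seg("X", Seg("H"), Seg("X")), Seg("YP")))),
    maximal={"XP", "UP", "ZP", "WP", "YP"})

# Structure (8) after put has raised to V1, leaving t in V2.
eight = Structure(
    Seg("VP1", Seg("NP1"),
        Seg("V'1", Seg("put"),
            Seg("VP2", Seg("NP2"),
                Seg("V'2", Seg("t"), Seg("ZP", Seg("P"), Seg("NP3")))))),
    maximal={"VP1", "VP2", "NP1", "NP2", "ZP", "NP3"})

print(five.subdomains(["X"], "YP"))
print(eight.subdomains(["put", "t"], "VP2"))
```

Under these assumptions the sketch reproduces the sets given in the text: for X in (5), the minimal domain {UP, ZP, WP, YP, H}, internal domain {YP}, and checking domain {UP, ZP, WP, H}; for the chain (put, t) in (8), the minimal domain {NP1, NP2, ZP}, internal domain {NP2, ZP}, and checking domain {NP1}. VP2 drops out of the domain of the chain because it contains t.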


It is in terms of these minimal sets that the local head-α relations are defined, the head now being the nontrivial chain CH.

In (8), then, the relevant domains are as intended after V-raising to V1. Note that VP2 is not in the internal domain of CH (= (put, t)) because it dominates t (= αn of (6)).

The same notions extend to an analysis of lexical structure along the lines proposed by Hale and Keyser (1993a). In this case an analogue of (8) would be the underlying structure for John shelved the book, with V2 being a "light verb" and ZP an abstract version of on the shelf (= [P shelf]). Here shelf raises to P, the amalgam raises to V2, and the element so formed raises to V1 in the manner of put in (7).18

So far we have made no use of the notion "minimal domain." But this too has a natural interpretation, when we turn to Empty Category Principle (ECP) phenomena. I will have to put aside a careful development here, but it is intuitively clear how certain basic aspects will enter. Take the phenomena of superiority (as in (9a)) and of relativized minimality in the sense of Rizzi (1990) (as in (9b)).

(9) a. i. whom1 did John persuade t1 [to visit whom2]
ii. *whom2 did John persuade whom1 [to visit t2]
b. Superraising, the Head Movement Constraint (HMC), [Spec, CP] islands (including wh-islands)

Looking at these phenomena in terms of economy considerations, it is clear that in all the "bad" cases, some element has failed to make "the shortest move." In (9aii) movement of whom2 to [Spec, CP] is longer in a natural sense (definable in terms of c-command) than movement of whom1 to this position. In all the cases of (9b) the moved element has "skipped" a position it could have reached by a shorter move, had that position not been filled. Spelling out these notions to account for the range of relevant cases is not a trivial matter. But it does seem possible in a way that accords reasonably well with the Minimalist Program.
Let us simply assume, for present purposes, that this task can be carried out, and that phenomena of the kind illustrated are accounted for in this way in terms of economy considerations.19

There appears to be a conflict between two natural notions of economy: shortest move versus fewest steps in a derivation. If a derivation keeps to shortest moves, it will have more steps; if it reduces the number of steps, it will have longer moves. The paradox is resolved if we take the basic transformational operation to be not Move α but Form Chain, an operation that applies, say, to the structure (10a) to form (10b) in a single step, yielding the chain CH of (10c).

(10) a. e seems [e′ to be likely [John to win]]
b. John seems [t′ to be likely [t to win]]
c. CH = (John, t′, t)

Similarly, in other cases of successive-cyclic movement. There is, then, no conflict between reducing derivations to the shortest number of steps and keeping links minimal ("Shortest Movement" Condition). There are independent reasons to suppose that this is the correct approach: note, for example, that successive-cyclic wh-movement of arguments does not treat the intermediate steps as adjunct movement, as it should if it were a succession of applications of Move α. Successive-cyclic movement raises a variety of interesting problems, but I will again put them aside, keeping to the simpler case.

A number of questions arise in the case of such constructions as (8), considered now in the more abstract form (11).

(11)

Here Subj is the VP-internal subject (or its trace), and Obj the object. The configuration and operations are exactly those of (8), except that in (12) V adjoins to Agr (as in the case of H of (5)), whereas in (8) it substituted for the empty position V1. On our assumptions, Obj must raise to Spec for Case checking, crossing Subj or its trace. (12) is therefore a violation of Relativized Minimality, in effect, a case of superraising, a violation of the "Shortest Movement" Condition.

Another instance of (11) is incorporation in the sense of Baker (1988). For example, V-incorporation to a causative verb has a structure like (12), but with an embedded clause S instead of the object Obj, as in (13).

(13) [AgrP Spec [Agr′ Agr [VP NP1 [V′ Vc [S [NP2] [VP V NP3]]]]]]

In an example of Baker's, modeled on Chicheŵa, we take NP1 = the baboons, Vc = make, NP2 = the lizards, V = hit, and NP3 = the children; the resulting sentence is the baboons made-hit the children [to the lizards], meaning "the baboons made the lizards hit the children." Incorporation of V to the causative Vc yields the chain (V, t), with V adjoined to Vc. The complex head [V Vc] then raises to Agr, forming the new chain ([V Vc], t′), with [V Vc] adjoining to Agr to yield α = [Agr [V Vc] Agr]. The resulting structure is (14).20

(14) [AgrP Spec [Agr′ α [VP NP1 [V′ t′ [S [NP2] [VP t NP3]]]]]]

Here NP3 is treated as the object of the verbal complex, assigned accusativeCase (with optional object agreement). In our terms, that means that NP3 raisesto [Spec, ], crossing NP1, the matrix subject or its trace (another option isthat the complex verb is passivized and NP3 is raised to [Spec, AgrS]).In the last example the minimal domain of the chain ([V Vc], t) is (Spec,NP1, S}. The example is therefore analogous to (8), in which V-raising formedan enlarged minimal domain for the chain. It is natural to suppose that (12)has the same property: V first raises to Agr, yielding the chain (V, t) with theminimal domain {Spec, Subj, Obj}. The cases just described are now formallyalike and should be susceptible to the same analysis. The last two cases appearto violate the Shortest Movement Condition.Let us sharpen the notion shortest movement as follows:(15) If , are in the same minimal domain, they are equidistant from .In particular, two targets of movement are equidistant if they are in the sameminimal domain.In the abstract case (11), if Y adjoins to X, forming the chain (Y, t) with theminimal domain {Spec1, Spec2, ZP}, then Spec1 and Spec2 are equidistant fromZP (or anything it contains), so that raising of (or from) ZP can cross Spec2to Spec1. Turning to the problematic instances of (11), in (12) Obj can raiseto Spec, crossing Subj or its trace without violating the economy condition;and in the incorporation example (14) NP3 can raise to Spec, crossing NP1.This analysis predicts that object raising as in (12) should be possible onlyif V has raised to Agr. In particular, overt object raising will be possible only

170

Chapter 3

with overt V-raising. That prediction is apparently confirmed for the Germaniclanguages (Vikner 1990). The issue does not arise in the LF analogue, sincewe assume that invariably, V raises to AgrO covertly, if not overtly, thereforefreeing the raising of object to [Spec, AgrO] for Case checking.Baker explains structures similar to (13)(14) in terms of his GovernmentTransparency Corollary (GTC), which extends the government domain of V1to that of V2 if V2 adjoins to V1.21 The analysis just sketched is an approximateanalogue, on the assumption that Case and agreement are assigned not by headgovernment but in the Spec-head relation. Note that the GTC is not strictlyspeaking a corollary; rather, it is an independent principle, though Baker givesa plausibility argument internal to a specific theory of government. A possibility that might be investigated is that the GTC falls generally under the independently motivated condition (15), on the minimalist assumptions beingexplored here.Recall that on these assumptions, we faced the problem of explaining whywe find crossing rather than nesting in the Case theory, with VP-internalsubject raising to [Spec, AgrS] and object raising to [Spec, AgrO], crossing thetrace of the VP-internal subject. The principle (15) entails that this is a permissible derivation, as in (12) with V-raising to AgrO. It remains to show that thedesired derivation is not only permissible but obligatory: it is the only possiblederivation. That is straightforward. Suppose that in (12) the VP-internal subjectin [Spec, VP] raises to [Spec, AgrO], either overtly or covertly, yielding (16),tSubj the trace of the raised subject Subj.(16)

[AgrOP Subj [AgrO′ AgrO [VP tSubj [V′ V Obj]]]]

Suppose further that V raises to AgrO, either overtly or covertly, forming the chain (V, tv) with the minimal domain {Subj, tSubj, Obj}. Now Subj and its trace are equidistant from Obj, so that Obj can raise to the [Spec, AgrO] position. But this position is occupied by Subj, blocking that option. Therefore, to receive Case, Obj must move directly to some higher position, crossing [Spec, AgrO]: either to [Spec, T] or to [Spec, AgrS]. But that is impossible, even after


the element [V, AgrO] raises to higher inflectional positions. Raising of [V, AgrO] will form a new chain with trace in the AgrO position of (16) and a new minimal domain M. But tSubj is not a member of M. Accordingly, Obj cannot cross tSubj to reach a position in M (apart from the position [Spec, AgrO] already filled by the subject). Hence, raising of the VP-internal subject to the [Spec, AgrO] position blocks any kind of Case assignment to the object; the object is frozen in place.22

It follows that crossing and not nesting is the only permissible option in any language. The paradox of Case theory is therefore resolved, on natural assumptions that generalize to a number of other cases.

3.3 Beyond the Interface Levels: D-Structure

Recall the (virtual) conceptual necessities within this general approach. UG determines possible symbolic representations and derivations. A language consists of a lexicon and a computational system. The computational system draws from the lexicon to form derivations, presenting items from the lexicon in the format of X-bar theory. Each derivation determines a linguistic expression, an SD, which contains a pair (π, λ) meeting the interface conditions. Ideally, that would be the end of the story: each linguistic expression is an optimal realization of interface conditions expressed in elementary terms (chain link, local X-bar-theoretic relations), a pair (π, λ) satisfying these conditions and generated in the most economical way. Any additional structure or assumptions require empirical justification.

The EST framework adds additional structure; for concreteness, take Lectures on Government and Binding (LGB; Chomsky 1981a). One crucial assumption has to do with the way in which the computational system presents lexical items for further computation. The assumption is that this is done by an operation, call it Satisfy, which selects an array of items from the lexicon and presents it in a format satisfying the conditions of X-bar theory. Satisfy is an all-at-once operation: all items that function at LF are drawn from the lexicon before computation proceeds23 and are presented in the X-bar format.

We thus postulate an additional level, D-Structure, beyond the two external interface levels PF and LF. D-Structure is the internal interface between the lexicon and the computational system, formed by Satisfy. Certain principles of UG are then held to apply to D-Structure, specifically, the Projection Principle and the θ-Criterion. The computational procedure maps D-Structure to another level, S-Structure, and then branches to PF and LF, independently. UG principles of the various modules of grammar (binding theory, Case


theory, the pro module, etc.) apply at the level of S-Structure (perhaps elsewhere as well, in some cases).

The empirical justification for this approach, with its departures from conceptual necessity, is substantial. Nevertheless, we may ask whether the evidence will bear the weight, or whether it is possible to move toward a minimalist program.

Note that the operation Satisfy and the assumptions that underlie it are not unproblematic. We have described Satisfy as an operation that selects an array, not a set; different arrangements of lexical items will yield different expressions. Exactly what an array is would have to be clarified. Furthermore, this picture requires conditions to ensure that D-Structure has basic properties of LF. At LF the conditions are trivial. If they are not met, the expression receives some deviant interpretation at the interface; there is nothing more to say. The Projection Principle and the θ-Criterion have no independent significance at LF.24 But at D-Structure the two principles are needed to make the picture coherent; if the picture is abandoned, they will lose their primary role. These principles are therefore dubious on conceptual grounds, though it remains to account for their empirical consequences, such as the constraint against substitution into a θ-position. If the empirical consequences can be explained in some other way and D-Structure eliminated, then the Projection Principle and the θ-Criterion can be dispensed with.

What is more, postulation of D-Structure raises empirical problems, as noticed at once when EST was reformulated in the more restrictive P&P framework. One problem, discussed in LGB, is posed by complex adjectival constructions such as (17a) with the S-Structure representation (17b) (t the trace of the empty operator Op).

(17) a. John is easy to please
     b. John is easy [CP Op [IP PRO to please t]]

The evidence for the S-Structure representation (17b) is compelling, but John occupies a non-θ-position and hence cannot appear at D-Structure.
Satisfy is therefore violated. In LGB it is proposed that Satisfy be weakened: in non-θ-positions a lexical item, such as John, can be inserted in the course of the derivation and assigned its θ-role only at LF (and irrelevantly, S-Structure). That is consistent with the principles, though not with their spirit, one might argue.

We need not tarry on that matter, however, because the technical device does not help. As noted by Howard Lasnik, the LGB solution fails, because an NP of arbitrary complexity may occur in place of John (e.g., an NP incorporating a structure such as (17a) internally). Within anything like the


LGB framework, then, we are driven to a version of generalized transformations, as in the very earliest work in generative grammar. The problem was recognized at once, but left as an unresolved paradox. More recent work has brought forth other cases of expressions interpretable at LF but not in their D-Structure positions (Reinhart 1991), along with other reasons to suspect that there are generalized transformations, or devices like them (Kroch and Joshi 1985, Kroch 1989, Lebeaux 1988, Epstein 1991). If so, the special assumptions underlying the postulation of D-Structure lose credibility. Since these assumptions lacked independent conceptual support, we are led to dispense with the level of D-Structure and the all-at-once property of Satisfy, relying in its place on a theory of generalized transformations for lexical access, though the empirical consequences of the D-Structure conditions remain to be faced.25

A theory of the preferred sort is readily constructed and turns out to have many desirable properties. Let us replace the EST assumptions of LGB and related work by an approach along the following lines. The computational system selects an item X from the lexicon and projects it to an X-bar structure of one of the forms in (18), where X = X0 = [X X].

(18) a. X
     b. [X′ X]
     c. [XP [X′ X]]

This will be the sole residue of the Projection Principle.

We now adopt (more or less) the assumptions of LSLT, with a single generalized transformation GT that takes a phrase marker K1 and inserts it in a designated empty position ∅ in a phrase marker K, forming the new phrase marker K*, which satisfies X-bar theory. Computation proceeds in parallel, selecting from the lexicon freely at any point. At each point in the derivation, then, we have a structure Σ, which we may think of as a set of phrase markers. At any point, we may apply the operation Spell-Out, which switches to the PF component.
If Σ is not a single phrase marker, the derivation crashes at PF, since PF rules cannot apply to a set of phrase markers and no legitimate PF representation is generated. If Σ is a single phrase marker, the PF rules apply to it, yielding π, which either is legitimate (so the derivation converges at PF) or not (the derivation again crashes at PF).

After Spell-Out, the computational process continues, with the sole constraint that it has no further access to the lexicon (we must ensure, for example, that John left does not mean they wondered whether John left before finishing his work). The PF and LF outputs must satisfy the (external) interface conditions. D-Structure disappears, along with the problems it raised.
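As a rough illustration only, the bookkeeping of this derivational picture can be mimicked in a few lines of Python. The sketch is not part of the theory. The names PhraseMarker, Derivation, DerivationCrash, select, gt, and spell_out are hypothetical; X-bar well-formedness is not checked; and GT is simplified so that it always attaches K1 at the left edge of K. It shows just the two constraints stated above: Spell-Out crashes unless the set of phrase markers built so far has been reduced to a single phrase marker, and the lexicon is closed once Spell-Out has applied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhraseMarker:
    """A toy phrase marker: a label plus an ordered tuple of daughters."""
    label: str
    daughters: tuple = ()


class DerivationCrash(Exception):
    """Signals a non-convergent derivation (hypothetical name)."""


@dataclass
class Derivation:
    lexicon: frozenset
    workspace: list = field(default_factory=list)  # phrase markers built so far
    spelled_out: bool = False

    def select(self, item: str) -> PhraseMarker:
        """Draw an item from the lexicon; barred once Spell-Out has applied."""
        if self.spelled_out:
            raise DerivationCrash("no access to the lexicon after Spell-Out")
        if item not in self.lexicon:
            raise KeyError(item)
        marker = PhraseMarker(item)
        self.workspace.append(marker)
        return marker

    def gt(self, k1: PhraseMarker, k: PhraseMarker, label: str) -> PhraseMarker:
        """Simplified binary GT: combine K1 and K into K*, which contains K."""
        self.workspace.remove(k1)
        self.workspace.remove(k)
        k_star = PhraseMarker(label, (k1, k))
        self.workspace.append(k_star)
        return k_star

    def spell_out(self) -> PhraseMarker:
        """Hand the structure to PF; crash unless it is one phrase marker."""
        if len(self.workspace) != 1:
            raise DerivationCrash("Spell-Out applied to a set of phrase markers")
        self.spelled_out = True
        return self.workspace[0]


if __name__ == "__main__":
    d = Derivation(lexicon=frozenset({"John", "left"}))
    john = d.select("John")
    left = d.select("left")
    # Spelling out at this point would crash: two phrase markers remain.
    clause = d.gt(john, left, label="IP")
    print(d.spell_out().label)
```

The design choice worth noting is that there is no separate D-Structure step: lexical selection and structure building are interleaved, and the only global check is the one made at Spell-Out.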


GT is a substitution operation. It targets K and substitutes K1 for ∅ in K. But ∅ is not drawn from the lexicon; therefore, it must have been inserted by GT itself. GT, then, targets K, adds ∅, and substitutes K1 for ∅, forming K*, which must satisfy X-bar theory. Note that this is a description of the inner workings of a single operation, GT. It is on a par with some particular algorithm for Move α, or for the operation of modus ponens in a proof. Thus, it is invisible to the eye that scans only the derivation itself, detecting only its successive steps. We never see ∅; it is subliminal, like the first half of the raising of an NP to subject position.

Alongside the binary substitution operation GT, which maps (K, K1) to K*, we also have the singulary substitution operation Move α, which maps K to K*. Suppose that this operation works just as GT does: it targets K, adds ∅, and substitutes α for ∅, where α in this case is a phrase marker within the targeted phrase marker K itself. We assume further that the operation leaves behind a trace t of α and forms the chain (α, t). Again, ∅ is invisible when we scan the derivation; it is part of the inner workings of an operation carrying the derivation forward one step.

Suppose we restrict substitution operations still further, requiring that ∅ be external to the targeted phrase marker K. Thus, GT and Move α extend K to K*, which includes K as a proper part.26 For example, we can target K = V′, add ∅ to form [β ∅ V′], and then either raise α from within V′ to replace ∅ or insert another phrase marker K1 for ∅. In either case the result must satisfy X-bar theory, which means that the element replacing ∅ must be a maximal projection YP, the specifier of the new phrase marker VP = β.

The requirement that substitution operations always extend their target has a number of consequences. First, it yields a version of the strict cycle, one that is motivated by the most elementary empirical considerations: without it, we would lose the effects of those cases of the ECP that fall under Relativized Minimality (see (9b)).
Thus, suppose that in the course of a derivation we have reached the stage (19).

(19) a. [I′ seems [I′ is certain [John to be here]]]
     b. [C′ C [VP fix the car]]
     c. [C′ C [John wondered [C′ C [IP Mary fixed what how]]]]

Violating no Shortest Movement Condition, we can raise John directly to the matrix Spec in (19a) in a single step, later inserting it from the lexicon to form John seems it is certain t to be here (superraising); we can raise fix to adjoin to C in (19b), later inserting can from the lexicon to form fix John can t the car (violating the HMC); and we can raise how to the matrix [Spec, CP] position in (19c), later raising what to the embedded [Spec, CP] position


to form how did John wonder what Mary fixed thow (violating the Wh-Island Constraint).27 The extension version of the strict cycle is therefore not only straightforward, but justified empirically without subtle empirical argument.

A second consequence of the extension condition is that given a structure of the form [X′ X YP], we cannot insert ZP into X′ (yielding, e.g., [X′ X YP ZP]), where ZP is drawn from within YP (raising) or inserted from outside by GT. Similarly, given [X′ X], we cannot insert ZP to form [X′ X ZP]. There can be no raising to a complement position. We therefore derive one major consequence of the Projection Principle and θ-Criterion at D-Structure, thus lending support to the belief that these notions are indeed superfluous. More generally, as noted by Akira Watanabe, the binarity of GT comes close to entailing that X-bar structures are restricted to binary branching (Kayne's "unambiguous paths"), though a bit more work is required.

The operations just discussed are substitution transformations, but we must consider adjunction as well. We thus continue to allow the X-bar structure (5) as well as (1), specifically (20).28

(20) a. [X Y X]
     b. [XP YP XP]

In (20a) a zero-level category Y is adjoined to the zero-level category X, and in (20b) a maximal projection YP is adjoined to the maximal projection XP. GT and Move α must form structures satisfying X-bar theory, now including (20). Note that the very strong empirical motivation for the strict cycle just given does not apply in these cases. Let us assume, then, that adjunction need not extend its target. For concreteness, let us assume that the extension requirement holds only for substitution in overt syntax, the only case required by the trivial argument for the cycle.29

3.4 Beyond the Interface Levels: S-Structure

Suppose that D-Structure is eliminable along these lines. What about S-Structure, another level that has only theory-internal motivation? The basic issue is whether there are S-Structure conditions. If not, we can dispense with the concept of S-Structure, allowing Spell-Out to apply freely in the manner indicated earlier. Plainly this would be the optimal conclusion.

There are two kinds of evidence for S-Structure conditions.

(21) a. Languages differ with respect to where Spell-Out applies in the course of the derivation to LF. (Are wh-phrases moved or in situ? Is


the language French-style with overt V-raising or English-style with LF V-raising?)
     b. In just about every module of grammar, there is extensive evidence that the conditions apply at S-Structure.

To show that S-Structure is nevertheless superfluous, we must show that the evidence of both kinds, though substantial, is not compelling.

In the case of evidence of type (21a), we must show that the position of Spell-Out in the derivation is determined by either PF or LF properties, these being the only levels, on minimalist assumptions. Furthermore, parametric differences must be reduced to morphological properties if the Minimalist Program is framed in the terms so far assumed. There are strong reasons to suspect that LF conditions are not relevant. We expect languages to be very similar at the LF level, differing only as a reflex of properties detectable at PF; the reasons basically reduce to considerations of learnability. Thus, we expect that at the LF level there will be no relevant difference between languages with phrases overtly raised or in situ (e.g., wh-phrases or verbs). Hence, we are led to seek morphological properties that are reflected at PF. Let us keep the conclusion in mind, returning to it later.

With regard to evidence of type (21b), an argument against S-Structure conditions could be of varying strength, as shown in (22).

(22) a. The condition in question can apply at LF alone.
     b. Furthermore, the condition sometimes must apply at LF.
     c. Furthermore, the condition must not apply at S-Structure.

Even (22a), the weakest of the three, suffices: LF has independent motivation, but S-Structure does not. Argument (22b) is stronger on the assumption that, optimally, conditions are unitary: they apply at a single level, hence at LF if possible. Argument (22c) would be decisive.

To sample the problems that arise, consider binding theory. There are familiar arguments showing that the binding theory conditions must apply at S-Structure, not LF. Thus, consider (23).

(23) a. you said he liked [the pictures that John took]
     b. [how many pictures that John took] did you say he liked t
     c. who [t said he liked [α how many pictures that John took]]

In (23a) he c-commands John and cannot take John as antecedent; in (23b) there is no c-command relation and John can be the antecedent of he. In (23c) John again cannot be the antecedent of he. Since the binding properties of (23c) are those of (23a), not (23b), we conclude that he c-commands John at


the level of representation at which Condition C applies. But if LF movement adjoins α to who in (23c), Condition C must apply at S-Structure.

The argument is not conclusive, however. Following the line of argument in section 1.3.3 (see (105)), we might reject the last assumption: that LF movement adjoins α of (23c) to who, forming (24), t′ the trace of the LF-moved phrase.

(24) [[how many pictures that John took] who] [t said he liked t′]

We might assume that the only permissible option is extraction of how many from the full NP α, yielding an LF form along the lines of (25), t′ the trace of how many.30

(25) [[how many] who] [t said he liked [[t′ pictures] that John took]]

The answer, then, could be the pair (Bill, 7), meaning that Bill said he liked 7 pictures that John took. But in (25) he c-commands John, so that Condition C applies as in (23a). We are therefore not compelled to assume that Condition C applies at S-Structure; we can keep to the preferable option that conditions involving interpretation apply only at the interface levels. This is an argument of the type (22a), weak but sufficient. We will return to the possibility of stronger arguments of the types (22b) and (22c).

The overt analogue of (25) requires pied-piping of the entire NP [how many pictures that John took], but it is not clear that the same is true in the LF component. We might, in fact, proceed further. The LF rule that associates the in-situ wh-phrase with the wh-phrase in [Spec, CP] need not be construed as an instance of Move α. We might think of it as the syntactic basis for absorption in the sense of Higginbotham and May (1981), an operation that associates two wh-phrases to form a generalized quantifier.31 If so, then the LF rule need satisfy none of the conditions on movement.

There has long been evidence that conditions on movement do not hold for multiple questions. Nevertheless, the approach just proposed appeared to be blocked by the properties of Chinese- and Japanese-type languages, with wh- in situ throughout but observing at least some of the conditions on movement (Huang 1982).
Watanabe (1991) has argued, however, that even in these languages there is overt wh-movement, in this case movement of an empty operator, yielding the effects of the movement constraints. If Watanabe is correct, we could assume that a wh-operator always raises overtly, that Move α is subject to the same conditions everywhere in the derivation to PF and LF, and that the LF operation that applies in multiple questions in English and direct questions in Japanese is free of these conditions. What remains is the


question why overt movement of the operator is always required, a question of the category (21a). We will return to that.

Let us recall again the minimalist assumptions that I am conjecturing can be upheld: all conditions are interface conditions; and a linguistic expression is the optimal realization of such interface conditions. Let us consider these notions more closely.

Consider a representation π at PF. PF is a representation in universal phonetics, with no indication of syntactic elements or relations among them (X-bar structure, binding, government, etc.). To be interpreted by the performance systems A-P, π must be constituted entirely of legitimate PF objects, that is, elements that have a uniform, language-independent interpretation at the interface. In that case we will say that π satisfies the condition of Full Interpretation (FI). If π fails FI, it does not provide appropriate instructions to the performance systems. We take FI to be the convergence condition: if π satisfies FI, the derivation D that formed it converges at PF; otherwise, it crashes at PF. For example, if π contains a stressed consonant or a [+high, +low] vowel, then D crashes; similarly, if π contains some morphological element that survives to PF, lacking any interpretation at the interface. If D converges at PF, its output receives an articulatory-perceptual interpretation, perhaps as gibberish.

All of this is straightforward; indeed, hardly more than an expression of what is tacitly assumed. We expect exactly the same to be true at LF.

To make ideas concrete, we must spell out explicitly what are the legitimate objects at PF and LF. At PF, this is the standard problem of universal phonetics. At LF, we assume each legitimate object to be a chain CH = (α1, ..., αn): at least (perhaps at most) with CH a head, an argument, a modifier, or an operator-variable construction. We now say that the representation λ satisfies FI at LF if it consists entirely of legitimate objects; a derivation forming λ converges at LF if λ satisfies FI, and otherwise crashes.
A convergent derivation may produce utter gibberish, exactly as at PF. Linguistic expressions may be deviant along all sorts of incommensurable dimensions, and we have no notion of "well-formed sentence" (see note 7). Expressions have the interpretations assigned to them by the performance systems in which the language is embedded: period.

To develop these ideas properly, we must proceed to characterize notions with the basic properties of A- and Ā-position. These notions were well defined in the LGB framework, but in terms of assumptions that are no longer held, in particular, the assumption that θ-marking is restricted to sisterhood, with multiple-branching constructions. With these assumptions abandoned, the notions are used only in an intuitive sense. To replace them, let us consider


more closely the morphological properties of lexical items, which play a major role in the minimalist program we are sketching. (See section 1.3.2.)

Consider the verbal system of (2). The main verb typically picks up the features of T and Agr (in fact, both AgrS and AgrO in the general case), adjoining to an inflectional element I to form [V I]. There are two ways to interpret the process, for a lexical element α. One is to take α to be a bare, uninflected form; PF rules are then designed to interpret the abstract complex [α I] as a single inflected phonological word. The other approach is to take α to have inflectional features in the lexicon as an intrinsic property (in the spirit of lexicalist phonology); these features are then checked against the inflectional element I in the complex [α I].32 If the features of α and I match, I disappears and α enters the PF component under Spell-Out; if they conflict, I remains and the derivation crashes at PF. The PF rules, then, are simple rewriting rules of the usual type, not more elaborate rules applying to complexes [α I].

I have been tacitly assuming the second option. Let us now make that choice explicit. Note that we need no longer adopt the Emonds-Pollock assumption that in English-type languages I lowers to V. V will have the inflectional features before Spell-Out in any event, and the checking procedure may take place anywhere, in particular, after LF movement. French-type and English-type languages now look alike at LF, whereas lowering of I in the latter would have produced adjunction structures quite unlike those of the raising languages.

There are various ways to make a checking theory precise, and to capture generalizations that hold across morphology and syntax. Suppose, for example, that Baker's Mirror Principle is strictly accurate.
Then we may take a lexical element (say, the verb V) to be a sequence V = (α, Infl1, ..., Infln), where α is the morphological complex [R-Infl1-...-Infln], R a root and Infli an inflectional feature.33 The PF rules only see α. When V is adjoined to a functional category F (say, AgrO), the feature Infl1 is removed from V if it matches F; and so on. If any Infli remains at LF, the derivation crashes at LF. The PF form α always satisfies the Mirror Principle in a derivation that converges at LF. Other technologies can readily be devised. In this case, however, it is not clear that such mechanisms are in order; the most persuasive evidence for the Mirror Principle lies outside the domain of inflectional morphology, which may be subject to different principles. Suppose, say, that richer morphology tends to be more visible, that is, closer to the word boundary; if so, and if the speculations of the paragraph ending with note 13 are on the right track, we would expect nominative or absolutive agreement (depending on language type) to be more peripheral in the verbal morphology.

The functional elements T and Agr therefore incorporate features of the verb. Let us call these features V-features: the function of the V-features of an


inflectional element I is to check the morphological properties of the verb selected from the lexicon. More generally, let us call such features of a lexical item L L-features. Keeping to the X-bar-theoretic notions, we say that a position is L-related if it is in a local relation to an L-feature, that is, in the internal domain or checking domain of a head with an L-feature. Furthermore, the checking domain can be subdivided into two categories: nonadjoined (Spec) and adjoined. Let us call these positions narrowly and broadly L-related, respectively. A structural position that is narrowly L-related has the basic properties of A-positions; one that is not L-related has the basic properties of Ā-positions, in particular, [Spec, C], not L-related if C does not contain a V-feature. The status of broadly L-related (adjoined) positions has been debated, particularly in the theory of scrambling.34 For our limited purposes, we may leave the matter open.

Note that we crucially assume, as is plausible, that V-raising to C is actually I-raising, with V incorporated within I, and is motivated by properties of the (C, I) system, not morphological checking of V. C has other properties that distinguish it from the V-features, as discussed in section 1.4.1.

The same considerations extend to nouns (assuming the D head of DP to have N-features) and adjectives. Putting this aside, we can continue to speak informally of A- and Ā-positions, understood in terms of L-relatedness as a first approximation only, with further refinement still necessary. We can proceed, then, to define the legitimate LF objects CH = (α1, ..., αn) in something like the familiar way: heads, with αi an X0; arguments, with αi in an A-position; adjuncts, with αi in an Ā-position; and operator-variable constructions, to which we will briefly return.35 This approach seems relatively unproblematic.
Let us assume so, and proceed.

The morphological features of T and Agr have two functions: they check properties of the verb that raises to them, and they check properties of the NP (DP) that raises to their Spec; thus, they ensure that DP and V are properly paired. Generalizing the checking theory, let us assume that, like verbs, nouns are drawn from the lexicon with all of their morphological features, including Case and φ-features, and that these too must be checked in the appropriate position:36 in this case, [Spec, Agr] (which may include T or V). This checking too can take place at any stage of a derivation to LF.

A standard argument for S-Structure conditions in the Case module is that Case features appear at PF but must be visible at LF; hence, Case must be present by the time the derivation reaches S-Structure. But that argument collapses under a checking theory. We may proceed, then, with the assumption that the Case Filter is an interface condition; in fact, the condition that all morphological features must be checked somewhere, for convergence. There


are many interesting and subtle problems to be addressed; reluctantly, I will put them aside here, merely asserting without argument that a proper understanding of economy of derivation goes a long way (maybe all the way) toward resolving them.37

Next consider subject-verb agreement, as in John hits Bill. The φ-features appear in three positions in the course of the derivation: internal to John, internal to hits, and in AgrS. The verb hits raises ultimately to AgrS and the NP John to [Spec, AgrS], each checking its morphological features. If the lexical items were properly chosen, the derivation converges. But at PF and LF the φ-features appear only twice, not three times: in the NP and verb that agree. Agr plays only a mediating role: when it has performed its function, it disappears. Since this function is dual, V-related and NP-related, Agr must in fact have two kinds of features: V-features that check V adjoined to Agr, and NP-features that check NP in [Spec, Agr]. The same is true of T, which checks the tense of the verb and the Case of the subject. The V-features of an inflectional element disappear when they check V, the NP-features when they check NP (or N, or DP; see note 36). All this is automatic, and within the Minimalist Program.

Let us now return to the first type of S-Structure condition (21a), the position of Spell-Out: after V-raising in French-type languages, before V-raising in English-type languages (we have now dispensed with lowering). As we have seen, the Minimalist Program permits only one solution to the problem: PF conditions reflecting morphological properties must force V-raising in French but not in English. What can these conditions be?

Recall the underlying intuition of Pollock's approach, which we are basically assuming: French-type languages have strong Agr, which forces overt raising, and English-type languages have weak Agr, which blocks it. Let us adopt that idea, rephrasing it in our terms: the V-features of Agr are strong in French, weak in English.
Recall that when the V-features have done their work, checking adjoined V, they disappear. If V does not raise to Agr overtly, the V-features survive to PF. Let us now make the natural assumption that strong features are visible at PF and weak features invisible at PF. These features are not legitimate objects at PF; they are not proper components of phonetic matrices. Therefore, if a strong feature remains after Spell-Out, the derivation crashes.38 In French overt raising is a prerequisite for convergence; in English it is not.

Two major questions remain: Why is overt raising barred in English? Why do the English auxiliaries have and be raise overtly, as do verbs in French?

The first question is answered by a natural economy condition: LF movement is cheaper than overt movement (call the principle Procrastinate).
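The interaction of feature strength with Procrastinate can be pictured with a small Python sketch. It is a toy model under simplifying assumptions, not a statement of the theory: the names Derivation, converges_at_pf, overt_steps, and optimal are hypothetical, only the V-features of Agr are modeled, and the special behavior of the auxiliaries have and be is left out. A candidate derivation either raises V before Spell-Out or postpones raising to LF; an unchecked strong feature at Spell-Out causes a crash at PF; among convergent candidates, the one with fewer overt operations is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """One candidate derivation in a toy model of V-raising to Agr."""
    v_feature_strong: bool  # are the V-features of Agr strong?
    overt_raising: bool     # does V raise to Agr before Spell-Out?

    def converges_at_pf(self) -> bool:
        # Overt raising checks the V-features before Spell-Out, so they
        # disappear. Without it they survive to PF, where a strong feature
        # is visible (crash) and a weak feature is invisible (no crash).
        survives_to_pf = not self.overt_raising
        return not (self.v_feature_strong and survives_to_pf)

    def overt_steps(self) -> int:
        # Cost measure for Procrastinate: count overt operations only.
        return 1 if self.overt_raising else 0


def optimal(v_feature_strong: bool) -> Derivation:
    """Return the convergent candidate with the fewest overt operations."""
    candidates = [Derivation(v_feature_strong, overt) for overt in (True, False)]
    convergent = [d for d in candidates if d.converges_at_pf()]
    return min(convergent, key=Derivation.overt_steps)


if __name__ == "__main__":
    print("strong V-features, overt raising:", optimal(True).overt_raising)
    print("weak V-features, overt raising:", optimal(False).overt_raising)
```

With strong V-features only the overt-raising candidate converges, so it is selected despite its cost; with weak V-features both candidates converge and the cheaper, covert option wins.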


(See section 1.3.3.) The intuitive idea is that LF operations are a kind of wired-in reflex, operating mechanically beyond any directly observable effects. They are less costly than overt operations. The system tries to reach PF as fast as possible, minimizing overt syntax. In English-type languages, overt raising is not forced for convergence; therefore, it is barred by economy principles.

To deal with the second question, consider again the intuition that underlies Pollock's account: raising of the auxiliaries reflects their semantic vacuity; they are placeholders for certain constructions, at most very light verbs. Adopting the intuition (but not the accompanying technology), let us assume that such elements, lacking semantically relevant features, are not visible to LF rules. If they have not raised overtly, they will not be able to raise by LF rules and the derivation will crash.39

Now consider the difference between SVO (or SOV) languages like English (Japanese) and VSO languages like Irish. On our assumptions, V has raised overtly to I (AgrS) in Irish, while S and O raise in the LF component to [Spec, AgrS] and [Spec, AgrO], respectively.40 We have only one way to express these differences: in terms of the strength of the inflectional features. One possibility is that the NP-feature of T is strong in English and weak in Irish. Hence, NP must raise to [Spec, [Agr T]] in English prior to Spell-Out or the derivation will not converge. The principle Procrastinate bars such raising in Irish. The Extended Projection Principle, which requires that [Spec, IP] be realized (perhaps by an empty category), reduces to a morphological property of T: strong or weak NP-features. Note that the NP-feature of Agr is weak in English; if it were strong, English would exhibit overt object shift. We are still keeping to the minimal assumption that AgrS and AgrO are collections of features, with no relevant subject-object distinction, hence no difference in strength of features.
Note also that a language might allow both weak and strong inflection, hence weak and strong NP-features: Arabic is a suggestive case, with SVO versus VSO correlating with the richness of visible verb inflection.

Along these lines, we can eliminate S-Structure conditions on raising and lowering in favor of morphological properties of lexical items, in accord with the Minimalist Program. Note that a certain typology of languages is predicted; whether correctly or not remains to be determined.

If Watanabe's (1991) theory of wh-movement is correct, there is no parametric variation with regard to wh- in situ: language differences (say, English-Japanese) reduce to morphology, in this case, the internal morphology of the wh-phrases. Still, the question arises why raising of the wh-operator is ever overt, contrary to Procrastinate. The basic economy-of-derivation assumption
is that operations are driven by necessity: they are "last resort," applied if they must be, not otherwise (Chomsky 1986b, and chapter 2). Our assumption is that operations are driven by morphological necessity: certain features must be checked in the checking domain of a head, or the derivation will crash. Therefore, raising of an operator to [Spec, CP] must be driven by such a requirement. The natural assumption is that C may have an operator feature (which we can take to be the Q- or wh-feature standardly assumed in C in such cases), and that this feature is a morphological property of such operators as wh-. For appropriate C, the operators raise for feature checking to the checking domain of C: [Spec, CP], or adjunction to Spec (absorption), thereby satisfying their scopal properties.41 Topicalization and focus could be treated the same way. If the operator feature of C is strong, the movement must be overt. Raising of I to C may automatically make the relevant feature of C strong (the V-second phenomenon). If Watanabe is correct, the wh-operator feature is universally strong.

3.5 Extensions of the Minimalist Program

Let us now look more closely at the economy principles. These apply to both representations and derivations. With regard to the former, we may take the economy principle to be nothing other than FI: every symbol must receive an "external" interpretation by language-independent rules. There is no need for the Projection Principle or θ-Criterion at LF. A convergent derivation might violate them, but in that case it would receive a defective interpretation.

The question of economy of derivations is more subtle. We have already noted two cases: Procrastinate, which is straightforward, and the Last Resort principle, which is more intricate. According to that principle, a step in a derivation is legitimate only if it is necessary for convergence: had the step not been taken, the derivation would not have converged. NP-raising, for example, is driven by the Case Filter (now assumed to apply only at LF): if the Case feature of NP has already been checked, NP may not raise. For example, (26a) is fully interpretable, but (26b) is not.

(26) a. there is [α a strange man] in the garden
     b. there seems to [α a strange man] [that it is raining outside]

In (26a) α is not in a proper position for Case checking; therefore, it must raise at LF, adjoining to the LF affix there and leaving the trace t. The phrase α is now in the checking domain of the matrix inflection. The matrix subject at LF is [α-there], an LF word with all features checked but interpretable only
in the position of the trace t of the chain (α, t), its head being "invisible" word-internally. In contrast, in (26b) α has its Case properties satisfied internal to the PP, so it is not permitted to raise, and we are left with freestanding there. This is a legitimate object, a one-membered A-chain with all its morphological properties checked. Hence, the derivation converges. But there is no coherent interpretation, because freestanding there receives no semantic interpretation (and in fact is unable to receive a θ-role even in a θ-position). The derivation thus converges, as semigibberish.

The notion of Last Resort operation is in part formulable in terms of economy: a shorter derivation is preferred to a longer one, and if the derivation D converges without application of some operation, then that application is disallowed. In (26b) adjunction of α to there would yield an intelligible interpretation (something like "there is a strange man to whom it seems that it is raining outside"). But adjunction is not permitted: the derivation converges with an unintelligible interpretation. Derivations are driven by the narrow mechanical requirement of feature checking only, not by a "search for intelligibility" or the like.

Note that raising of α in (26b) is blocked by the fact that its own requirements are satisfied without raising, even though such raising would arguably overcome inadequacies of the LF affix there. More generally, Move α applies to an element α only if morphological properties of α itself are not otherwise satisfied. The operation cannot apply to α to enable some different element β to satisfy its properties. Last Resort, then, is always "self-serving": benefiting other elements is not allowed.
Alongside Procrastinate, then, we have a principle of Greed: self-serving Last Resort.

Consider the expression (27), analogous to (26b) but without there-insertion from the lexicon.

(27) seems to [α a strange man] [that it is raining outside]

Here the matrix T has an NP-feature (Case feature) to discharge, but α cannot raise (overtly or covertly) to overcome that defect. The derivation cannot converge, unlike (26b), which converges but without a proper interpretation. The self-serving property of Last Resort cannot be overridden even to ensure convergence.

Considerations of economy of derivation tend to have a "global" character, inducing high-order computational complexity. Computational complexity may or may not be an empirical defect; it is a question of whether the cases are correctly characterized (e.g., with complexity properly relating to parsing difficulty, often considerable or extreme, as is well known). Nevertheless, it makes sense to expect language design to limit such problems. The
self-serving property of Last Resort has the effect of restricting the class of derivations that have to be considered in determining optimality, and might be shown on closer analysis to contribute to this end.42

Formulating economy conditions in terms of the principles of Procrastinate and Greed, we derive a fairly narrow and determinate notion of most economical convergent derivation that blocks all others. Precise formulation of these ideas is a rather delicate matter, with a broad range of empirical consequences.

We have also assumed a notion of "shortest link," expressible in terms of the operation Form Chain. We thus assume that, given two convergent derivations D1 and D2, both minimal and containing the same number of steps, D1 blocks D2 if its links are shorter. Pursuing this intuitive idea, which must be considerably sharpened, we can incorporate aspects of Subjacency and the ECP, as briefly indicated.

Recall that for a derivation to converge, its LF output must be constituted of legitimate objects: tentatively, heads, arguments, modifiers, and operator-variable constructions. A problem arises in the case of pied-piped constructions such as (28).

(28) (guess) [[wh in which house] John lived t]

The chain (wh, t) is not an operator-variable construction. The appropriate LF form for interpretation requires "reconstruction," as in (29) (see section 1.3.3).

(29) a. [which x, x a house] John lived [in x]
     b. [which x] John lived [in [x house]]

Assume that (29a) and (29b) are alternative options. There are various ways in which these options can be interpreted. For concreteness, let us select a particularly simple one.43

Suppose that in (29a) x is understood as a DP variable: regarded substitutionally, it can be replaced by a DP (the answer can be "the old one"); regarded objectually, it ranges over houses, as determined by the restricted operator.
In (29b) x is a D variable: regarded substitutionally, it can be replaced by a D (the answer can be "that (house)"); regarded objectually, it ranges over entities.

Reconstruction is a curious operation, particularly when it is held to follow LF movement, thus restoring what has been covertly moved, as often proposed (e.g., for (23c)). If possible, the process should be eliminated. An approach that has occasionally been suggested is the "copy theory" of movement: the trace left behind is a copy of the moved element, deleted by a principle of the PF component in the case of overt movement. But at LF the copy remains,
providing the materials for "reconstruction." Let us consider this possibility, surely to be preferred if it is tenable.

The PF deletion operation is, very likely, a subcase of a broader principle that applies in ellipsis and other constructions (see section 1.5). Consider such expressions as (30a-b).

(30) a. John said that he was looking for a cat, and so did Bill
     b. John said that he was looking for a cat, and so did Bill [E say that he was looking for a cat]

The first conjunct is several-ways ambiguous. Suppose we resolve the ambiguities in one of the possible ways, say, by taking the pronoun to refer to Tom and interpreting a cat nonspecifically, so that John said that Tom's quest would be satisfied by any cat. In the elliptical case (30a), a parallelism requirement of some kind (call it PR) requires that the second conjunct must be interpreted the same way: in this case, with he referring to Tom and a cat understood nonspecifically (Lakoff 1970, Lasnik 1972, Sag 1976, Ristad 1993). The same is true in the full sentence (30b), a nondeviant linguistic expression with a distinctive low-falling intonation for E; it too must be assigned its properties by the theory of grammar. PR surely applies at LF. Since it must apply to (30b), the simplest assumption would be that only (30b) reaches LF, (30a) being derived from (30b) by an operation of the PF component deleting copies. There would be no need, then, for special mechanisms to account for the parallelism properties of (30a). Interesting questions arise when this path is followed, but it seems promising. If so, the trace deletion operation may well be an obligatory variant of a more general operation applying in the PF component.

Assuming this approach, (28) is a notational abbreviation for (31).

(31) [wh in which house] John lived [wh in which house]

The LF component converts the phrase wh to either (32a) or (32b) by an operation akin to QR.

(32) a. [which house] [wh in t]
     b. [which] [wh in [t house]]

We may give these the intuitive interpretations of (33a-b).

(33) a. [which x, x a house] [in x]
     b. [which x] [in [x house]]

For convergence at LF, we must have an operator-variable structure. Accordingly, in the operator position [Spec, CP], everything but the operator phrase must delete; therefore, the phrase wh of (32) deletes. In the trace position, the
copy of what remains in the operator position deletes, leaving just the phrase wh (an LF analogue to the PF rule just described). In the present case (perhaps generally), these choices need not be specified; other options will crash. We thus derive LF forms interpreted as (29a) or (29b), depending on which option we have selected. The LF forms now consist of legitimate objects, and the derivations converge.

Along the same lines, we will interpret which book did John read either as [which x, x a book] [John read x] (answer: "War and Peace") or as [which x] [John read [x book]] (answer: "that (book)").

The assumptions are straightforward and minimalist in spirit. They carry us only partway toward an analysis of reconstruction and interpretation; there are complex and obscure phenomena, many scarcely understood. Insofar as these assumptions are tenable and properly generalizable, we can eliminate reconstruction as a separate process, keeping the term only as part of informal descriptive apparatus for a certain range of phenomena.

Extending observations of Van Riemsdijk and Williams (1981), Freidin (1986) points out that such constructions as (34a-b) behave quite differently under reconstruction.44

(34) a. which claim [that John was asleep] was he willing to discuss
     b. which claim [that John made] was he willing to discuss

In (34a) reconstruction takes place: the pronoun does not take John as antecedent. In contrast, in (34b) reconstruction is not obligatory and the anaphoric connection is an option. While there are many complications, to a first approximation the contrast seems to reduce to a difference between complement and adjunct, the bracketed clause of (34a) and (34b), respectively. Lebeaux (1988) proposed an analysis of this distinction in terms of generalized transformations.
In case (34a) the complement must appear at the level of D-Structure; in case (34b) the adjunct could be adjoined by a generalized transformation in the course of derivation, in fact, after whatever processes are responsible for the reconstruction effect.45

The approach is appealing, if problematic. For one thing, there is the question of the propriety of resorting to generalized transformations. For another, the same reasoning forces reconstruction in the case of A-movement. Thus, (35) is analogous to (34a); the complement is present before raising and should therefore force a Condition C violation.

(35) the claim that John was asleep seems to him [IP t to be correct]

Under the present interpretation, the trace t is spelled out as identical to the matrix subject. While it deletes at PF, it remains at LF, yielding the unwanted
reconstruction effect. Condition C of the binding theory requires that the pronoun him cannot take its antecedent within the embedded IP (compare *I seem to him [to like John], with him anaphoric to John). But him can take John as antecedent in (35), contrary to the prediction.

The proposal now under investigation overcomes these objections. We have moved to a full-blown theory of generalized transformations, so there is no problem here. The extension property for substitution entails that complements can only be introduced cyclically, hence before wh-extraction, while adjuncts can be introduced noncyclically, hence adjoined to the wh-phrase after raising to [Spec, CP]. Lebeaux's analysis of (34) therefore could be carried over. As for (35), if reconstruction is essentially a reflex of the formation of operator-variable constructions, it will hold only for Ā-chains, not for A-chains. That conclusion seems plausible over a considerable range, and yields the right results in this case.

Let us return now to the problem of binding-theoretic conditions at S-Structure. We found a weak but sufficient argument (of type (22a)) to reject the conclusion that Condition C applies at S-Structure. What about Condition A?

Consider constructions such as those in (36).46

(36) a. i. John wondered [which picture of himself] [Bill saw t]
        ii. the students asked [what attitudes about each other] [the teachers had noticed t]
     b. i. John wondered [who [t saw [which picture of himself]]]
        ii. the students asked [who [t had noticed [what attitudes about each other]]]

The sentences of (36a) are ambiguous, with the anaphor taking either the matrix or embedded subject as antecedent; but those of (36b) are unambiguous, with the trace of who as the only antecedent for himself, each other. If (36b) were formed by LF raising of the in-situ wh-phrase, we would have to conclude that Condition A applies at S-Structure, prior to this operation. But we have already seen that the assumption is unwarranted; we have, again, a weak but sufficient argument against allowing binding theory to apply at S-Structure.
A closer look shows that we can do still better.

Under the copying theory, the actual forms of (36a) are (37a-b).

(37) a. John wondered [wh which picture of himself] [Bill saw [wh which picture of himself]]
     b. the students asked [wh what attitudes about each other] [the teachers had noticed [wh what attitudes about each other]]


The LF principles map (37a) to either (38a) or (38b), depending on which option is selected for analysis of the phrase wh.

(38) a. John wondered [[which picture of himself] [wh t]] [Bill saw [[which picture of himself] [wh t]]]
     b. John wondered [which [wh t picture of himself]] [Bill saw [which [wh t picture of himself]]]

We then interpret (38a) as (39a) and (38b) as (39b), as before.

(39) a. John wondered [which x, x a picture of himself] [Bill saw x]
     b. John wondered [which x] [Bill saw [x picture of himself]]

Depending on which option we have selected, himself will be anaphoric to John or to Bill.47

The same analysis applies to (37b), yielding the two options of (40) corresponding to (39).

(40) a. the students asked [what x, x attitudes about each other] [the teachers had noticed x]
     b. the students asked [what x] [the teachers had noticed [x attitudes about each other]]

In (40a) the antecedent of each other is the students; in (40b) it is the teachers.

Suppose that we change the examples of (36a) to (41a-b), replacing saw by took and had noticed by had.

(41) a. John wondered [which picture of himself] [Bill took t]
     b. the students asked [what attitudes about each other] [the teachers had]

Consider (41a). As before, himself can take either John or Bill as antecedent. There is a further ambiguity: the phrase take ... picture can be interpreted either idiomatically (in the sense of "photograph") or literally ("pick up and walk away with"). But the interpretive options appear to correlate with the choice of antecedent for himself: if the antecedent is John, the idiomatic interpretation is barred; if the antecedent is Bill, it is permitted. If Bill is replaced by Mary, the idiomatic interpretation is excluded.

The pattern is similar for (41b), except that there is no literal-idiomatic ambiguity. The only interpretation is that the students asked what attitudes each of the teachers had about the other teacher(s). If the teachers is replaced by Jones, there is no interpretation.

Why should the interpretations distribute in this manner?


First consider (41a). The principles already discussed yield the two LF options in (42a-b).

(42) a. John wondered [which x, x a picture of himself] [Bill took x]
     b. John wondered [which x] [Bill took [x picture of himself]]

If we select the option (42a), then himself takes John as antecedent by Condition A at LF; if we select the option (42b), then himself takes Bill as antecedent by the same principle. If we replace Bill with Mary, then (42a) is forced. Having abandoned D-Structure, we must assume that idiom interpretation takes place at LF, as is natural in any event. But we have no operations of LF reconstruction. Thus, take ... picture can be interpreted as "photograph" only if the phrase is present as a unit at LF, that is, in (42b), not (42a). It follows that in (42a) we have only the nonidiomatic interpretation of take; in (42b) we have either. In short, only the option (42b) permits the idiomatic interpretation, also blocking John as antecedent of the reflexive and barring replacement of Bill by Mary.

The same analysis holds for (41b). The two LF options are (43a-b).

(43) a. the students asked [what x, x attitudes about each other] [the teachers had x]
     b. the students asked [what x] [the teachers had [x attitudes about each other]]

Only (43b) yields an interpretation, with have ... attitudes given its unitary sense.

The conclusions follow on the crucial assumption that Condition A not apply at S-Structure, prior to the LF rules that form (42).48 If Condition A were to apply at S-Structure, John could be taken as antecedent of himself in (41a) and the later LF processes would be free to choose either the idiomatic or the literal interpretation, however the reconstruction phenomena are handled; and the students could be taken as antecedent of each other in (41b), with reconstruction providing the interpretation of have ... attitudes.
Thus, we have the strongest kind of argument against an S-Structure condition (type (22c)): Condition A cannot apply at S-Structure.

Note also that we derive a strong argument for LF representation. The facts are straightforwardly explained in terms of a level of representation with two properties: (1) phrases with a unitary interpretation such as the idiom take ... picture or have ... attitudes appear as units; (2) binding theory applies. In standard EST approaches, LF is the only candidate. The argument is still clearer in this minimalist theory, lacking D-Structure and (we are now arguing) S-Structure.
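The correlation between choice of antecedent and availability of the idiomatic reading in (41a) can be traced mechanically. What follows is only an illustrative sketch under simplifying assumptions, not part of the analysis itself: the names LFOption, OPTIONS, and readings are hypothetical, Condition A is reduced to the stipulation that the anaphor takes the subject local to the position its restriction occupies at LF, and agreement is modeled crudely as a set of subjects compatible with himself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LFOption:
    """One of the two LF analyses of the wh-phrase in (41a)."""
    label: str
    # True: "picture of himself" is interpreted in the trace position, as in (42b).
    # False: it stays in the operator position, as in (42a).
    restriction_in_trace: bool


OPTIONS = (LFOption("(42a)", False), LFOption("(42b)", True))


def readings(matrix_subject, embedded_subject, agreeing_subjects):
    """List (option, antecedent, idiom available) for each interpretable option.

    Assumption: Condition A applies at LF only, so the anaphor takes the
    subject local to wherever the restriction ends up.
    """
    results = []
    for option in OPTIONS:
        antecedent = embedded_subject if option.restriction_in_trace else matrix_subject
        if antecedent not in agreeing_subjects:
            continue  # e.g. "himself" cannot take "Mary" as antecedent
        # The idiom needs "take ... picture" to be a unit at LF, as in (42b) only.
        idiom_available = option.restriction_in_trace
        results.append((option.label, antecedent, idiom_available))
    return results


for row in readings("John", "Bill", {"John", "Bill"}):
    print(row)
```

With John and Bill both compatible with himself, the sketch lists (42a) with John and no idiomatic reading, and (42b) with Bill and the idiomatic reading available. Replacing Bill by Mary (and dropping Mary from the compatible set) leaves only (42a), so the idiomatic reading disappears, as described in the text.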


Combining these observations with the Freidin-Lebeaux examples, we seem to face a problem, in fact a near-contradiction. In (44a) either option is allowed: himself may take either John or Bill as antecedent. In contrast, in (44b) reconstruction appears to be forced, barring Tom as antecedent of he (by Condition C) and Bill as antecedent of him (by Condition B).

(44) a. John wondered [which picture of himself] [Bill saw t]
     b. i. John wondered [which picture of Tom] [he liked t]
        ii. John wondered [which picture of him] [Bill took t]
        iii. John wondered [what attitude about him] [Bill had t]

The Freidin-Lebeaux theory requires reconstruction in all these cases, the of-phrase being a complement of picture. But the facts seem to point to a conception that distinguishes Condition A of the binding theory, which does not force reconstruction, from Conditions B and C, which do. Why should this be?

In our terms, the trace t in (44) is a copy of the wh-phrase at the point where the derivation branches to the PF and LF components. Suppose we now adopt an LF movement approach to anaphora (see section 1.4.2), assuming that the anaphor or part of it raises by an operation similar to cliticization; call it cliticization_LF. This approach at least has the property we want: it distinguishes Condition A from Conditions B and C. Note that cliticization_LF is a case of Move α; though applying in the LF component, it necessarily precedes the "reconstruction" operations that provide the interpretations for the LF output. Applying cliticization_LF to (44a), we derive either (45a) or (45b), depending on whether the rule applies to the operator phrase or its trace TR.49

(45) a. John self-wondered [which picture of t_self] [NP saw [TR which picture of himself]]
     b. John wondered [which picture of himself] [NP self-saw [TR which picture of t_self]]

We then turn to the LF rules interpreting the wh-phrase, which yield the two options (46a-b) (α = either t_self or himself).

(46) a. [[which picture of α] t]
     b. [which] [t picture of α]

Suppose that we have selected the option (45a).
Then we cannot select the interpretive option (46b) (with α = t_self); that option requires deletion of [t picture of t_self] in the operator position, which would break the chain (self, t_self), leaving the reflexive element without a θ-role at LF. We must therefore select the interpretive option (46a), yielding a convergent derivation without reconstruction:

(47) John self-wondered [which x, x a picture of t_self] NP saw x

In short, if we take the antecedent of the reflexive to be John, then only the nonreconstructing option converges.

If we had Tom or him in place of himself, as in (44b), then these issues would not arise and either interpretive option would converge. We thus have a relevant difference between the two categories of (44). To account for the judgments, it is only necessary to add a preference principle for reconstruction: Do it when you can (i.e., try to minimize the restriction in the operator position). In (44b) the preference principle yields reconstruction, hence a binding theory violation (Conditions C and B). In (44a) we begin with two options with respect to application of cliticization_LF: either to the operator or to the trace position. If we choose the first option, selecting the matrix subject as antecedent, then the preference principle is inapplicable because only the nonpreferred case converges, and we derive the nonreconstruction option. If we choose the second option, selecting the embedded subject as antecedent, the issue of preference again does not arise. Hence, we have genuine options in the case of (44a), but a preference for reconstruction (hence the judgment that binding theory conditions are violated) in the case of (44b).50

Other constructions reinforce these conclusions, for example, (48).51

(48) a. i. John wondered what stories about us we had heard
        ii′. *John wondered what stories about us we had told
        ii″. John wondered what stories about us we expected Mary to tell
     b. i′. John wondered what opinions about himself Mary had heard
        i″. *John wondered what opinions about himself Mary had
        ii′. they wondered what opinions about each other Mary had heard
        ii″. *they wondered what opinions about each other Mary had
     c. i. John wondered how many pictures of us we expected Mary to take
        ii. *John wondered how many pictures of us we expected to take (idiomatic sense)

Note that we have further strengthened the argument for an LF level at which all conditions apply: the LF rules, including now anaphor raising, provide a crucial distinction with consequences for reconstruction.

The reconstruction process outlined applies only to operator-variable constructions. What about A-chains, which we may assume to be of the form
CH = (α, t) at LF (α the phrase raised from its original position t, intermediate traces deleted or ignored)? Here t is a full copy of its antecedent, deleted in the PF component. The descriptive account must capture the fact that the head of the A-chain is assigned an interpretation in the position t. Thus, in John was killed t, John is assigned its θ-role in the position t, as complement of kill. The same should be true for such idioms as (49).

(49) several pictures were taken t

Here pictures is interpreted in the position of t, optionally as part of the idiom take ... pictures. Interesting questions arise in the case of such constructions as (50a-b).

(50) a. the students asked [which pictures of each other] [Mary took t]
     b. the students asked [which pictures of each other] [t′ were taken t by Mary]

In both cases the idiomatic interpretation requires that t be [x pictures of each other] after the operator-variable analysis ("reconstruction"). In (50a) that choice is blocked, while in (50b) it remains open. The examples reinforce the suggested analysis of Ā-reconstruction, but it is now necessary to interpret the chain (t′, t) in (50b) just as the chain (several pictures, t) is interpreted in (49). One possibility is that the trace t of the A-chain enters into the idiom interpretation (and, generally, into θ-marking), while the head of the chain functions in the usual way with regard to scope and other matters.

Suppose that instead of (44a) we have (51).

(51) the students wondered [wh how angry at each other (themselves)] [John was t]

As in the case of (44a), anaphor raising in (51) should give the interpretation roughly as "the students each wondered [how angry at the other John was]" (similarly with reflexive). But these interpretations are impossible in the case of (51), which requires the reconstruction option, yielding gibberish.
Huang (1990) observes that the result follows on the assumption that subjects are predicate-internal (VP-, AP-internal; see (4)), so that the trace of John remains in the subject position of the raised operator phrase wh-, blocking association of the anaphor with the matrix subject (anaphor raising, in the present account).

Though numerous problems remain unresolved, there seem to be good reasons to suppose that the binding theory conditions hold only at the LF interface. If so, we can move toward a very simple interpretive version of binding theory as in (52) that unites disjoint and distinct reference (D the relevant local domain), overcoming problems discussed particularly by Howard Lasnik.52


(52) A. If α is an anaphor, interpret it as coreferential with a c-commanding phrase in D.
     B. If α is a pronominal, interpret it as disjoint from every c-commanding phrase in D.
     C. If α is an r-expression, interpret it as disjoint from every c-commanding phrase.

Condition A may be dispensable if the approach based upon cliticization_LF is correct and the effects of Condition A follow from the theory of movement (which is not obvious); and further discussion is necessary at many points. All indexing could then be abandoned, another welcome result.53

Here too we have, in effect, returned to some earlier ideas about binding theory, in this case those of Chomsky 1980a, an approach superseded largely on grounds of complexity (now overcome), but with empirical advantages over what appeared to be simpler alternatives (see note 52).

I stress again that what precedes is only the sketch of a minimalist program, identifying some of the problems and a few possible solutions, and omitting a wide range of topics, some of which have been explored, many not. The program has been pursued with some success. Several related and desirable conclusions seem within reach.

(53) a. A linguistic expression (SD) is a pair (π, λ) generated by an optimal derivation satisfying interface conditions.
     b. The interface levels are the only levels of linguistic representation.
     c. All conditions express properties of the interface levels, reflecting interpretive requirements.
     d. UG provides a unique computational system, with derivations driven by morphological properties to which syntactic variation of languages is restricted.
     e. Economy can be given a fairly narrow interpretation in terms of FI, length of derivation, length of links, Procrastinate, and Greed.

Notes

I am indebted to Samuel Epstein, James Higginbotham, Howard Lasnik, and Alec Marantz for comments on an earlier draft of this paper, as well as to participants in courses, lectures, and discussions on these topics at MIT and elsewhere, too numerous to mention.

1. For early examination of these topics in the context of generative grammar, see Chomsky 1951, 1975a (henceforth LSLT). On a variety of consequences, see Collins 1994a.


2. Not literal necessity, of course; I will avoid obvious qualifications here and below.

3. On its nature, see Bromberger and Halle 1989.

4. Note that while the intuition underlying proposals to restrict variation to elements of morphology is clear enough, it would be no trivial matter to make it explicit, given general problems in selecting among equivalent constructional systems. An effort to address this problem in any general way would seem premature. It is a historical oddity that linguistics, and "soft sciences" generally, are often subjected to methodological demands of a kind never taken seriously in the far more developed natural sciences. Strictures concerning Quinean indeterminacy and formalization are a case in point. See Chomsky 1990, 1992b, Ludlow 1992. Among the many questions ignored here is the fixing of lexical concepts; see Jackendoff 1990b for valuable discussion. For my own views on some general aspects of the issues, see Chomsky 1992a,b, 1994b,c, 1995.

5. Contrary to common belief, assumptions concerning the reality and nature of I-language (competence) are much better grounded than those concerning parsing. For some comment, see references of preceding note.

6. Markedness of parameters, if real, could be seen as a last residue of the evaluation metric.

7. See Marantz 1984, Baker 1988, on what Baker calls "the Principle of PF Interpretation," which appears to be inconsistent with this assumption. One might be tempted to interpret the class of expressions of the language L for which there is a convergent derivation as "the well-formed (grammatical) expressions of L." But this seems pointless. The class so defined has no significance. The concepts "well-formed" and "grammatical" remain without characterization or known empirical justification; they played virtually no role in early work on generative grammar except in informal exposition, or since. See LSLT and Chomsky 1965; and on various misunderstandings, Chomsky 1980b, 1986b.

8. Much additional detail has been presented in class lectures at MIT, particularly in fall 1991. I hope to return to a fuller exposition elsewhere. As a starting point, I assume here a version of linguistic theory along the lines outlined in chapter 1.

9. In Chomsky 1981a and other work, structural Case is unified under government, understood as m-command to include the Spec-head relation (a move that was not without problems); in the framework considered here, m-command plays no role.

10. I will use NP informally to refer to either NP or DP, where the distinction is playing no role. IP and I will be used for the complement of C and its head where details are irrelevant.

11. I overlook here the possibility of NP-raising to [Spec, T] for Case assignment, then to [Spec, AgrS] for agreement. This may well be a real option. For development of this possibility, see Bures 1992, Bobaljik and Carnie 1992, Jonas 1992, and sections 4.9 and 4.10 of this book.

12. Raising of A to AgrA may be overt or in the LF component. If the latter, it may be the trace of the raised NP that is marked for agreement, with further raising driven by the morphological requirement of Case marking (the Case Filter); I put aside specifics of implementation. The same considerations extend to an analysis of participial agreement along the lines of Kayne 1989; see chapter 2 and Branigan 1992.

13. For development of an approach along such lines, see Bobaljik 1992a,b. For a different analysis sharing some assumptions about the Spec-head role, see Murasugi 1991, 1992. This approach to the two language types adapts the earliest proposal about these matters within generative grammar (De Rijk 1972) to a system with inflection separated from verb. See Levin and Massam 1985 for a similar conception.

14. See chapter 1.

15. I put aside throughout the possibility of moving X′ or adjoining to it, and the question of adjunction to elements other than complement that assign or receive interpretive roles at the interface.

16. This is only the simplest case. In the general case V will raise to AgrO, forming the chain CHV = (V, t). The complex [V, AgrO] raises ultimately to adjoin to AgrS. Neither V nor CHV has a new checking domain assigned in this position. But V is in the checking domain of AgrS and therefore shares relevant features with it, and the subject in [Spec, AgrS] is in the checking domain of AgrS, hence agrees indirectly with V.

17. To mention one possibility, V-raising to AgrO yields a two-membered chain, but subsequent raising of the [V, AgrO] complex might pass through the trace of T by successive-cyclic movement, finally adjoining to AgrS. The issues raised in note 11 are relevant at this point. I will put these matters aside.

18. Hale and Keyser make a distinction between (1) operations of lexical conceptual structure that form such lexical items as "shelve" and (2) syntactic operations that raise "put" to V1 in (8), attributing somewhat different properties to (1) and (2). These distinctions do not seem to me necessary for their purposes, for reasons that I will again put aside.

19. Note that the ECP will now reduce to descriptive taxonomy, of no theoretical significance. If so, there will be no meaningful questions about conjunctive or disjunctive ECP, the ECP as an LF or PF phenomenon (or both), and so on. Note that no aspect of the ECP can apply at the PF interface itself, since there we have only a phonetic matrix, with no relevant structure indicated. The proposal that the ECP breaks down into a PF and an LF property (as in Aoun et al. 1987) therefore must take the former to apply either at S-Structure or at a new level of shallow structure between S-Structure and PF.

20. Note that the two chains in (14) are ([V Vc], t) and (V, t). But in the latter, V is far removed from its trace because of the operation raising [V Vc]. Each step of the derivation satisfies the HMC, though the final output violates it (since the head t intervenes between V and its trace). Such considerations tend to favor a derivational approach to chain formation over a representational one. See chapters 1 and 2. Recall also that the crucial concept of minimal subdomain could only be interpreted in terms of a derivational approach.

21. For an example, see Baker 1988, 163.

22. Recall that even if Obj is replaced by an element that does not require structural Case, Subj must still raise to [Spec, AgrS] in a nominative-accusative language (with active AgrS).

23. This formulation allows later insertion of functional items that are vacuous for LF interpretation, for example, the "do" of do-support or the "of" of of-insertion.

24. This is not to say that θ-theory is dispensable at LF, for example, the principles of θ-discharge discussed in Higginbotham 1985. It is simply that the θ-Criterion and Projection Principle play no role.

25. I know of only one argument against generalized transformations, based on restrictiveness (Chomsky 1965): only a proper subclass of the I-languages (there called grammars) allowed by the LSLT theory appear to exist, and only these are permitted if we eliminate generalized transformations and T-markers in favor of a recursive base satisfying the cycle. Elimination of generalized transformations in favor of cyclic base generation is therefore justified in terms of explanatory adequacy. But the questions under discussion then do not arise in the far more restrictive current theories.

26. A modification is necessary for the case of successive-cyclic movement, interpreted in terms of the operation Form Chain. I put this aside here.

27. Depending on other assumptions, some violations might be blocked by various "conspiracies." Let us assume, nevertheless, that overt substitution operations satisfy the extension (strict cycle) condition generally, largely on grounds of conceptual simplicity.

28. In case (19b) we assumed that V adjoins to (possibly empty) C, the head of CP, but it was the substitution operation inserting "can" that violated the cycle to yield the HMC violation. It has often been argued that LF adjunction may violate the structure-preserving requirement of (20), for example, allowing XP-incorporation to X⁰ or quantifier adjunction to XP. Either conclusion is consistent with the present considerations. See also note 15.

29. On noncyclic adjunction, see Branigan 1992 and section 3.5 below.

30. See Hornstein and Weinberg 1990 for development of this proposal on somewhat different assumptions and grounds.

31. The technical implementation could be developed in many ways. For now, let us think of it as a rule of interpretation for the paired wh-phrases.

32. Technically, α raises to the lowest I to form [I α I]; then the complex raises to the next higher inflectional element; and so on. Recall that after multiple adjunction, α will still be in the checking domain of the highest I.

33. More fully, Infl_i is a collection of inflectional features checked by the relevant functional element.

34. The issue was raised by Webelhuth (1989) and has become a lively research topic. See Mahajan 1990 and much ongoing work. Note that if I adjoins to C, forming [C I C], [Spec, C] is in the checking domain of the chain (I, t). Hence, [Spec, C] is L-related (to I), and non-L-related (to C). A sharpening of notions is therefore required to determine the status of C after I-to-C raising. If C has L-features, [Spec, C] is L-related and would thus have the properties of an A-position, not an Ā-position. Questions arise here related to proposals of Rizzi (1990) on agreement features in C, and his more recent work extending these notions; these would take us too far afield here.

35. Heads are not narrowly L-related, hence not in A-positions, a fact that bears on ECP issues. See section 1.4.1.

36. I continue to put aside the question whether Case should be regarded as a property of N or D, and the DP-NP distinction generally.

37. See section 1.4.3 for some discussion.

38. Alternatively, weak features are deleted in the PF component so that PF rules can apply to the phonological matrix that remains; strong features are not deleted so that PF rules do not apply, causing the derivation to crash at PF.

39. Note that this is a reformulation of proposals by Emmon Bach and others in the framework of the Standard Theory and Generative Semantics: that these auxiliaries are inserted in the course of derivation, not appearing in the semantically relevant underlying structures. See Tremblay 1991 for an exploration of similar intuitions.

40. This leaves open the possibility that in VSO languages subject raises overtly to [Spec, TP] while T (including the adjoined verb) raises to AgrS; for evidence that that is correct, see the references of note 11.

41. Raising would take place only to [Spec, CP], if absorption does not involve adjunction to a wh-phrase in [Spec, CP]. See note 31. I assume here that CP is not an adjunction target.

42. See chapter 2 and Chomsky 1991b. The self-serving property may also bear on whether LF operations are costless, or simply less costly.

43. There are a number of descriptive inadequacies in this overly simplified version. Perhaps the most important is that some of the notions used here (e.g., objectual quantification) have no clear interpretation in the case of natural language, contrary to common practice. Furthermore, we have no real framework within which to evaluate theories of interpretation; in particular, considerations of explanatory adequacy and restrictiveness are hard to introduce, on the standard (and plausible) assumption that the LF component allows no options. The primary task, then, is to derive an adequate descriptive account, no simple matter; comparison of alternatives lacks any clear basis. Another problem is that linking to performance theory is far more obscure than in the case of the PF component. Much of what is taken for granted in the literature on these topics seems to me highly problematic, if tenable at all. See LGB and the references of note 4 for some comment.

44. The topicalization analogues are perhaps more natural: the claim that John is asleep (that John made), ... The point is the same, assuming an operator-variable analysis of topicalization.

45. In Lebeaux's theory, the effect is determined at D-Structure, prior to raising; I will abstract away from various modes of implementing the general ideas reviewed here. For discussion bearing on these issues, see Speas 1990, Epstein 1991. Freidin (1994) proposes that the difference has to do with the difference between LF representation of a predicate (the relative clause) and a complement; as he notes, that approach provides an argument for limiting binding theory to LF (see (22)).

46. In all but the simplest examples of anaphora, it is unclear whether distinctions are to be understood as tendencies (varying in strength for different speakers) or sharp distinctions obscured by performance factors. For exposition, I assume the latter here. Judgments are therefore idealized, as always; whether correctly or not, only further understanding will tell.

47. Recall that LF wh-raising has been eliminated in favor of the absorption operation, so that in (36b) the anaphor cannot take the matrix subject as antecedent after LF raising.

48. I ignore the possibility that Condition A applies irrelevantly at S-Structure, the result being acceptable only if there is no clash with the LF application.

49. I put aside here interesting questions that have been investigated by Pierre Pica and others about how the morphology and the raising interact.

50. Another relevant case is (i),

(i) (guess) which picture of which man he saw t

a Condition C violation if "he" is taken to be bound by "which man" (Higginbotham 1980). As Higginbotham notes, the conclusion is much sharper than in (44b). One possibility is that independently of the present considerations, absorption is blocked from within [Spec, CP], forcing reconstruction to (iia), hence (iib),

(ii) a. which x, he saw [x picture of which man]
     b. which x, y, he saw x picture of [NP y man]

a Condition C violation if "he" is taken to be anaphoric to NP (i.e., within the scope of "which man"). The same reasoning would imply a contrast between (iiia) and (iiib),

(iii) a. who would have guessed that proud of John, Bill never was
      b. *who would have guessed that proud of which man, Bill never was

(with absorption blocked, and no binding theory issue). That seems correct; other cases raise various questions.

51. Cases (48ai), (48aii) correspond to the familiar pairs "John (heard, told) stories about him," with antecedence possible only in the case of "heard," presumably reflecting the fact that one tells one's own stories but can hear the stories told by others; something similar holds of the cases in (48b).

52. See the essays collected in Lasnik 1989; also section 1.4.2.

53. A theoretical apparatus that takes indices seriously as entities, allowing them to figure in operations (percolation, matching, etc.), is questionable on more general grounds. Indices are basically the expression of a relationship, not entities in their own right. They should be replaceable without loss by a structural account of the relation they annotate.

Categories and Transformations

The chapters that precede have adopted, modified, and extended work in the principles-and-parameters (P&P) model. In this final chapter I will take the framework for Universal Grammar (UG) developed and presented there as a starting point, extending it to questions that had been kept at a distance, subjecting it to a critical analysis, and revising it step by step in an effort to approach as closely as possible the goals of the Minimalist Program outlined in the introduction. The end result is a substantially different conception of the mechanisms of language.

Before proceeding, let us review the guiding ideas of the Minimalist Program.

4.1 The Minimalist Program

A particular language L is an instantiation of the initial state of the cognitive system of the language faculty with options specified. We take L to be a generative procedure that constructs pairs (π, λ) that are interpreted at the articulatory-perceptual (A-P) and conceptual-intentional (C-I) interfaces, respectively, as "instructions" to the performance systems. π is a PF representation and λ an LF representation, each consisting of "legitimate objects" that can receive an interpretation (perhaps as gibberish). If a generated representation consists entirely of such objects, we say that it satisfies the condition of Full Interpretation (FI). A linguistic expression of L is at least a pair (π, λ) meeting this condition, and under minimalist assumptions, at most such a pair, meaning that there are no levels of linguistic structure apart from the two interface levels PF and LF; specifically, no levels of D-Structure or S-Structure.

The language L determines a set of derivations (computations). A derivation converges at one of the interface levels if it yields a representation satisfying FI at this level, and converges if it converges at both interface levels, PF and LF; otherwise, it crashes. We thus adopt the (nonobvious) hypothesis that there are no PF-LF interactions relevant to convergence, which is not to deny, of course, that a full theory of performance involves operations that apply to the (π, λ) pair. Similarly, we assume that there are no conditions relating lexical properties and interface levels, such as the Projection Principle. The question of what counts as an interpretable legitimate object raises nontrivial questions, some discussed in earlier chapters.

Notice that I am sweeping under the rug questions of considerable significance, notably, questions about what in the earlier Extended Standard Theory (EST) framework were called "surface effects" on interpretation. These are manifold, involving topic-focus and theme-rheme structures, figure-ground properties, effects of adjacency and linearity, and many others. Prima facie, they seem to involve some additional level or levels internal to the phonological component, postmorphology but prephonetic, accessed at the interface along with PF (Phonetic Form) and LF (Logical Form).1 If that turns out to be correct, then the abstraction I am now pursuing may require qualification. I will continue to pursue it nonetheless, merely noting here, once again, that tacit assumptions underlying much of the most productive recent work are far from innocent.

It seems that a linguistic expression of L cannot be defined just as a pair (π, λ) formed by a convergent derivation. Rather, its derivation must be optimal, satisfying certain natural economy conditions: locality of movement, no "superfluous steps" in derivations, and so on. Less economical computations are blocked even if they converge.

The language L thus generates three relevant sets of computations: the set D of derivations, a subset DC of convergent derivations of D, and a subset DA of admissible derivations of D. FI determines DC, and the economy conditions select DA. In chapters 1-3 it was assumed that economy considerations hold only among convergent derivations; if a derivation crashes, it does not block others. Thus, DA is a subset of DC. The assumption, which I continue to adopt here, is empirical; in the final analysis, its accuracy depends on factual considerations. But it has solid conceptual grounds, in that modifications of it entail departures from minimalist goals. On natural assumptions, a derivation in which an operation applies is less economical than one that differs only in that the operation does not apply. The most economical derivation, then, applies no operations at all to a collection of lexical choices and thus is sure to crash. If nonconvergent derivations can block others, this derivation will block all others and some elaboration will be needed, an unwelcome result. In the absence of convincing evidence to the contrary, then, I will continue to assume that economy considerations hold only of convergent derivations: DA is a subset of DC.
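The relation among D, DC, and DA can be made concrete with a minimal sketch. It is an illustration only, under stated assumptions: the names `Derivation` and `admissible`, the flag `crashes_can_block`, and the use of a simple step count as the economy measure are hypothetical devices introduced for exposition, not part of the theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    steps: int        # assumed economy measure: number of operations applied
    converges: bool   # True if the output satisfies FI at both PF and LF


def admissible(derivations, crashes_can_block=False):
    """Return DA, the most economical convergent derivations.

    By default only convergent derivations (DC) compete, as the text assumes.
    With crashes_can_block=True, nonconvergent derivations compete as well.
    """
    pool = [d for d in derivations if crashes_can_block or d.converges]
    if not pool:
        return []
    fewest = min(d.steps for d in pool)
    # Admissible: convergent, and nothing in the comparison pool is cheaper.
    return [d for d in pool if d.converges and d.steps == fewest]


D = [
    Derivation("apply no operations", 0, False),  # cheapest, but sure to crash
    Derivation("short", 3, True),
    Derivation("long", 5, True),
]
DC = [d for d in D if d.converges]
DA = admissible(D)
blocked = admissible(D, crashes_can_block=True)

print([d.name for d in DA])       # ['short']
print([d.name for d in blocked])  # []
```

When crashing derivations are allowed to compete, the derivation that applies no operations blocks every other one and nothing is admissible, which is the unwelcome result described above.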

Current formulation of such ideas still leaves substantial gaps and a range of plausible alternatives, which I will try to narrow as I proceed. It is, furthermore, far from obvious that language should have anything at all like the character postulated in the Minimalist Program, which is just that: a research program concerned with filling the gaps and determining the answers to the basic questions raised in the opening paragraph of the introduction, in particular, the question "How perfect is language?"

Suppose that this approach proves to be more or less correct. What could we then conclude about the specificity of the language faculty (modularity)? Not much. The language faculty might be unique among cognitive systems, or even in the organic world, in that it satisfies minimalist assumptions. Furthermore, the morphological parameters could be unique in character, and the computational system CHL biologically isolated.

Another source of possible specificity of language lies in the conditions imposed from the outside at the interface, what we may call bare output conditions. These conditions are imposed by the systems that make use of the information provided by CHL, but we have no idea in advance how specific to language their properties might be (quite specific, so current understanding suggests). There is one very obvious example, which has many effects: the information provided by L has to be accommodated to the human sensory and motor apparatus. Hence, UG must provide for a phonological component that converts the objects generated by the language L to a form that these external systems can use: PF, we assume. If humans could communicate by telepathy, there would be no need for a phonological component, at least for the purposes of communication; and the same extends to the use of language generally. These requirements might turn out to be critical factors in determining the inner nature of CHL in some deep sense, or they might turn out to be extraneous to it, inducing departures from perfection that are satisfied in an optimal way. The latter possibility is not to be discounted.

This property of language might turn out to be one source of a striking departure from minimalist assumptions in language design: the fact that objects appear in the sensory output in positions displaced from those in which they are interpreted, under the most principled assumptions about interpretation. This is an irreducible fact about human language, expressed somehow in every contemporary theory of language, however the facts about displacement may be formulated. It has also been a central part of traditional grammar, descriptive and theoretical, at least back to the Port-Royal Logic and Grammar. We want to determine why language has this property (see section 4.7) and how it is realized (our primary concern throughout). We want to find out how well the conditions that impose this crucial property on language are satisfied: as well as possible, we hope to find. Minimalist assumptions suggest that the property should be reduced to morphology-driven movement. What is known about the phenomena seems to me to support this expectation.

These displacement properties are one central syntactic respect in which natural languages differ from the symbolic systems devised for one or another purpose, sometimes called "languages" by metaphoric extension (formal languages, programming languages); there are other respects, including semantic differences.2 The displacement property reflects the disparity (in fact, complementarity) between morphology (checking of features) and θ-theory (assignment of semantic roles), an apparent fact about natural language that is increasingly highlighted as we progress toward minimalist objectives. See section 4.6.

The Minimalist Program bears on the question of specificity of language, but in a limited way. It suggests where the question should arise: in the nature of the computational procedure CHL and the locus of its variability (formal-morphological features of the lexicon, I am assuming); in the properties of the bare output conditions; and in the more obscure but quite interesting matter of conceptual naturalness of principles and concepts.

It is important to distinguish the topic of inquiry here from a different one: to what (if any) extent are the properties of CHL expressed in terms of output conditions, say, filters of the kind discussed in Chomsky and Lasnik 1977, or chain formation algorithms in the sense of Rizzi 1986b in syntax, or conditions of the kind recently investigated for phonology in terms of Optimality Theory (Prince and Smolensky 1993, McCarthy and Prince 1993)? The question is imprecise: we do not know enough about the external systems at the interface to draw firm conclusions about conditions they impose, so the distinction between bare output conditions and others remains speculative in part. The problems are nevertheless empirical, and we can hope to resolve them by learning more about the language faculty and the systems with which it interacts. We proceed in the only possible way: by making tentative assumptions about the external systems and proceeding from there.

The worst possible case is that devices of both types are required: both computational processes that map symbolic representations to others and output conditions. That would require substantial empirical argument. The facts might, of course, force us to the worst case, but we naturally hope to find that CHL makes use of processes of only a restricted type, and I will assume so unless the contrary is demonstrated.

A related question is whether CHL is derivational or representational: does it involve successive operations leading to (π, λ) (if it converges), or does
it operate in one of any number of other ways (say, selecting two such representations and then computing to determine whether they are properly paired, selecting one and deriving the other, and so on)?3 These questions are not only imprecise but also rather subtle; typically, it is possible to recode one approach in terms of others. But these questions too are ultimately empirical, turning basically on explanatory adequacy. Thus, filters were motivated by the fact that simple output conditions made it possible to limit considerably the variety and complexity of transformational rules, advancing the effort to reduce these to just Move α (or Affect α, in the sense of Lasnik and Saito 1984) and thus to move toward explanatory adequacy. Vergnaud's theory of abstract Case, which placed a central part of the theory of filters on more solid and plausible grounds, was a substantial further contribution. Similarly, Rizzi's proposals about chain formation were justified in terms of explaining facts about Romance reflexives and other matters.

My own judgment is that a derivational approach is nonetheless correct, and the particular version of a minimalist program I am considering assigns it even greater prominence, though a residue of filters persists in the concept of morphologically driven Last Resort movement, which has its roots in Vergnaud's Case theory. There are certain properties of language, which appear to be fundamental, that suggest this conclusion. Viewed derivationally, computation typically involves simple steps expressible in terms of natural relations and properties, with the context that makes them natural "wiped out" by later operations, hence not visible in the representations to which the derivation converges. Thus, in syntax, crucial relations are typically local, but a sequence of operations may yield a representation in which the locality is obscured. Head movement, for example, is narrowly local, but several such operations may leave a head separated from its trace by an intervening head. This happens, for example, when N incorporates to V, leaving the trace tN, and the [V VN] complex then raises to I, leaving the trace tV: the chain (N, tN) at the output level violates the locality property, and further operations (say, XP-fronting) may obscure it even more radically, but locality is observed by each individual step.

In segmental phonology, such phenomena are pervasive. Thus, the rules deriving the alternants decide-decisive-decision from an invariant lexical entry are straightforward and natural at each step, but the relevant contexts do not appear at all in the output; given only output conditions, it is hard to see why "decision" should not rhyme with "Poseidon" on the simplest assumptions about lexical representations, output conditions, and matching of input-output pairings. Similarly, intervocalic spirantization and vowel reduction are natural and simple processes that derive, say, Hebrew ganvu 'they stole' from underlying g-n-B, but the context for spirantization is gone after reduction applies; the underlying form might even all but disappear in the output, as in hitu 'they extended', in which only the /t/ remains from the underlying root /ntC/ (C a weak consonant).4,5

It is generally possible to formulate the desired result in terms of outputs. In the head movement case, for example, one can appeal to the (plausible) assumption that the trace is a copy, so the intermediate V-trace includes within it a record of the local N → V raising. But surely this is the wrong move. The relevant chains at LF are (N, tN) and (V, tV), and in these the locality relation satisfied by successive raising has been lost. Similar artifice could be used in the phonological examples, again improperly, it appears. These seem to be fundamental properties of language, which should be captured, not obscured by coding tricks, which are always available. A fully derivational approach both captures them straightforwardly and suggests that they should be pervasive, as seems to be the case.

I will continue to assume that the computational system CHL is strictly derivational and that the only output conditions are the bare output conditions determined from the outside, at the interface.

We hope to be able to show that for a particular (I-)language L, the phenomena of sound and meaning for L are determined by pairs (π, λ) formed by admissible (maximally economical) convergent derivations that satisfy output conditions (where "determined," of course, means "insofar as the cognitive system of the language faculty is responsible").6 The computation CHL that derives (π, λ) must, furthermore, keep to computational principles that are minimalist in spirit, both in their character and in the economy conditions that select derivations. Another natural condition is that outputs consist of nothing beyond properties of items of the lexicon (lexical features); in other words, that the interface levels consist of nothing more than arrangements of lexical features. To the extent that this is true, the language meets a condition of inclusiveness.7 We assume further that the principles of UG involve only elements that function at the interface levels; nothing else can be "seen" in the course of the computation, a general idea that will be sharpened as we proceed.

In pursuing a minimalist program, we want to make sure that we are not inadvertently "sneaking in" improper concepts, entities, relations, and conventions. The point of the occasional forays into formalism below is to clarify just how closely CHL keeps to minimalist conditions, with principles and conventions derived where valid. The more spare the assumptions, the more intricate the argument is likely to be.
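The head movement case discussed earlier in this section, where each step is local but the resulting chain is not, can be illustrated with a small sketch. Everything in it is an assumption made for exposition: head positions are modeled as integers on a toy spine (N = 0, V = 1, I = 2), "local" is taken to mean landing on the immediately higher head, and the names `is_local`, `position`, and `trace` are hypothetical.

```python
def is_local(source, target):
    """A step is local if the mover lands on the next head up, skipping none."""
    return target - source == 1


# Assumed toy spine of head positions, bottom to top: N = 0, V = 1, I = 2.
position = {"N": 0, "V": 1}
trace = {}          # original position of each moved head
step_is_local = []  # locality of each derivational step

# Each step: (heads carried along, landing site). The first head listed is the mover.
steps = [
    (("N",), 1),      # N incorporates into V, leaving tN at 0
    (("V", "N"), 2),  # the [V VN] complex raises to I, leaving tV at 1
]

for carried, landing in steps:
    mover = carried[0]
    source = position[mover]
    step_is_local.append(is_local(source, landing))
    trace.setdefault(mover, source)
    for head in carried:
        position[head] = landing

# Locality of each chain (head, trace) as read off the final representation alone.
chain_is_local = {head: is_local(trace[head], position[head]) for head in trace}

print(step_is_local)   # [True, True]
print(chain_is_local)  # {'N': False, 'V': True}
```

Inspecting only the output, the chain (N, tN) looks nonlocal because tV intervenes; the record of steps shows that no individual operation violated locality.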

4.2 The Cognitive System of the Language Faculty

4.2.1 The Computational Component

A linguistic expression (π, λ) of L satisfies output conditions at the PF and LF interfaces. Beyond that, π and λ must be compatible: it is not the case that any sound can mean anything. In particular, π and λ must be based on the same lexical choices. We can, then, think of CHL as mapping some array A of lexical choices to the pair (π, λ). What is A? At least, it must indicate what the lexical choices are and how many times each is selected by CHL in forming (π, λ). Let us take a numeration to be a set of pairs (LI, i), where LI is an item of the lexicon and i is its index, understood to be the number of times that LI is selected. Take A to be (at least) a numeration N; CHL maps N to (π, λ). The procedure CHL selects an item from N and reduces its index by 1, then performing permissible computations. A computation constructed by CHL does not count as a derivation at all, let alone a convergent one, unless all indices are reduced to zero.

Viewing the language L as a derivation-generating procedure, we may think of it as applying to a numeration N and forming a sequence S of symbolic elements (σ1, σ2, ..., σn), terminating only if σn is a pair (π, λ) and N is reduced to zero (the computation may go on). S formed in this way is a derivation, which converges if the elements of σn satisfy FI at PF and LF, respectively. Economy considerations select the admissible convergent derivations.

Given the numeration N, the operations of CHL recursively construct syntactic objects from items in N and syntactic objects already formed. We have to determine what these objects are and how they are constructed. Insofar as the condition of inclusiveness holds, the syntactic objects are rearrangements of properties of the lexical items of which they are ultimately constituted.
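The notion of a numeration defined above can be sketched as a simple data structure. This is a hedged illustration, not a claim about how CHL is implemented: the dictionary representation, the function names `make_numeration`, `select`, and `is_exhausted`, and the sample lexical items are all assumptions chosen for exposition.

```python
def make_numeration(pairs):
    """Build a numeration from (LI, i) pairs: item -> number of selections still due."""
    return dict(pairs)


def select(numeration, item):
    """Select an item from the numeration, reducing its index by 1."""
    if numeration.get(item, 0) <= 0:
        raise ValueError(f"{item!r} cannot be selected again")
    numeration[item] -= 1
    return item


def is_exhausted(numeration):
    """True once every index has been reduced to zero."""
    return all(index == 0 for index in numeration.values())


# Hypothetical lexical choices; "he" is to be selected twice.
N = make_numeration([("he", 2), ("thinks", 1), ("left", 1)])
for item in ["he", "thinks", "he"]:
    select(N, item)
midway = is_exhausted(N)   # "left" has not been selected yet
select(N, "left")
finished = is_exhausted(N)

print(midway, finished)  # False True
```

On this sketch, a computation that stops while `is_exhausted` is still false does not count as a derivation at all, matching the requirement that all indices be reduced to zero.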
We consider now just the computation N → λ, for reasons that will become clearer as we proceed and that tend to support the view that there is indeed something extraneous about the conditions imposed on language at the A-P (sensorimotor) interface.

Suppose that the derivation has reached the stage Σ, which we may take to be a set {SO1, ..., SOn} of syntactic objects. One of the operations of CHL is a procedure that selects a lexical item LI from the numeration, reducing its index by 1, and introduces it into the derivation as SOn+1. Call the operation Select. At the LF interface, Σ can be interpreted only if it consists of a single syntactic object. Clearly, then, CHL must include a second procedure that combines syntactic objects already formed. A derivation converges only if this operation has applied often enough to leave us with just a single object, also exhausting the initial numeration. The simplest such
operation takes a pair of syntactic objects (SOi, SOj) and replaces them by a new combined syntactic object SOij. Call this operation Merge. We will return to its properties, merely noting here that the operations Select and Merge, or some close counterparts, are necessary components of any theory of natural language.

Note that no question arises about the motivation for application of Select or Merge in the course of a derivation. If Select does not exhaust the numeration, no derivation is generated and no questions of convergence or economy arise. Insufficient application of Merge has the same property, since the derivation then fails to yield an LF representation at all; again, no derivation is generated, and questions of convergence and economy do not arise. The operations Select and Merge are costless; they do not fall within the domain of discussion of convergence and economy.8 Similarly, we do not have to ask about the effect of illegitimate operations, any more than proof theory is concerned with a sequence of lines that does not satisfy the formal conditions that define proof, or a chess-playing algorithm with evaluation of improper moves.

Within the framework just outlined, there is also no meaningful question as to why one numeration is formed rather than another, or rather than none, so that we have silence. That would be like asking that a theory of some formal operation on integers (say, addition) explain why some integers are added together rather than others, or none. Or that a theory of the mechanisms of vision or motor coordination explain why someone chooses to look at a sunset or reach for a banana. The problem of choice of action is real, and largely mysterious, but does not arise within the narrow study of mechanisms.9

Suppose the lexical item LI in the numeration N has index i. If a derivation is to be generated, Select must access LI i times, introducing it into the derivation.
But the syntactic objects formed by distinct applications of Select to LI must be distinguished; two occurrences of the pronoun he, for example, may have entirely different properties at LF. l and l′ are thus marked as distinct for CHL if they are formed by distinct applications of Select accessing the same lexical item of N. Note that this is a departure from the inclusiveness condition, but one that seems indispensable: it is rooted in the nature of language, and perhaps reducible to bare output conditions.

We want the initial array A, whether a numeration or something else, not only to express the compatibility relation between π and λ but also to fix the reference set for determining whether a derivation from A to (π, λ) is optimal, that is, not blocked by a more economical derivation. Determination of the reference set is a delicate problem, as are considerations of economy generally. As a first approximation, let us take the numeration to determine
the reference set: in evaluating derivations for economy, we consider only alternatives with the same numeration.

Selection of an optimal derivation in the reference set determined from the numeration N poses problems of computational complexity too vast to be realistic. We can reduce the problem with a more local interpretation of reference sets. At a particular stage Σ of a derivation, we consider only continuations of the derivation already constructed; in particular, only the remaining parts of the numeration N. Application of the operation OP to Σ is barred if this set contains a more optimal derivation in which OP does not apply to Σ. The number of derivations to be considered for determining whether OP may apply reduces radically as the derivation proceeds. At least this much structure seems to be required, presumably more. See section 4.9 for some empirical evidence supporting this construal of reference sets (which, in any event, is to be preferred), in fact an even more stringent condition.

An elementary empirical condition on the theory is that expressions usable by the performance systems be assigned interface representations in a manner that does not induce too much computational complexity. We want to formulate economy conditions that avoid exponential blowup in construction and evaluation of derivations. A local interpretation of reference sets is a step in this direction. Where global properties of derivations have to be considered, as in determining the applicability of the principle Procrastinate of earlier chapters, we expect to find some ready algorithm to reduce computational complexity. In the case of Procrastinate, it typically suffices to see if a strong feature is present, which is straightforward, and even easier under an interpretation of strength to which we return directly.
But we are still a long way from a comprehensive theory of economy, a topic that is now being explored for the first time within a context of inquiry that is able to place explanatory adequacy on the research agenda.

Given the numeration N, CHL computes until it forms a derivation that converges at PF and LF with the pair (π, λ), after reducing N to zero (if it does). A perfect language should meet the condition of inclusiveness: any structure formed by the computation (in particular, π and λ) is constituted of elements already present in the lexical items selected for N; no new objects are added in the course of computation apart from rearrangements of lexical properties (in particular, no indices, bar levels in the sense of X-bar theory, etc.; see note 7). Let us assume that this condition holds (virtually) of the computation from N to LF (N → λ); standard theories take it to be radically false for the computation to PF.10

As already noted, the inclusiveness condition is not fully met. Distinguishing selections of a single lexical item is a (rather narrow) departure. Another
involves the deletion operation (Delete α). Let us assume that this operation marks some object as invisible at the interface; we will sharpen it as we proceed, assuming for the moment that material deleted, though ignored at the interface, is still accessible within CHL.11 The question turns out to have interesting ramifications.

A core property of CHL is feature checking, the operation that drives movement under the Last Resort condition. A large part of our concern will be to examine these notions. We can begin by reducing feature checking to deletion: a checked feature is marked invisible at the interface.12 Even a cursory look shows that this cannot be the whole story, but let us take it as a starting point, returning to a more careful analysis in section 4.5.2.

Output conditions show that π and λ are differently constituted. Elements interpretable at the A-P interface are not interpretable at C-I, and conversely. At some point, then, the computation splits into two parts, one forming π and the other forming λ. The simplest assumptions are (1) that there is no further interaction between these computations and (2) that computational procedures are uniform throughout: any operation can apply at any point. We adopt (1), and assume (2) for the computation from N to λ, though not for the computation from N to π; the latter modifies structures (including the internal structure of lexical entries) by processes very different from those that take place in the N → λ computation. Investigation of output conditions should suffice to establish these asymmetries, which I will simply take for granted here.

We assume, then, that at some point in the (uniform) computation to LF, there is an operation Spell-Out that applies to the structure Σ already formed. Spell-Out strips away from Σ those elements relevant only to π, leaving the residue ΣL, which is mapped to λ by operations of the kind used to form Σ. Σ itself is then mapped to π by operations unlike those of the N → λ computation.
We call the subsystem of CHL that maps Σ to π the phonological component, and the subsystem that continues the computation from ΣL to LF the covert component. The pre-Spell-Out computation we call overt. Let us assume further that Spell-Out delivers Σ to the module Morphology, which constructs wordlike units that are then subjected to further phonological processes that map it finally to π, and which eliminates features no longer relevant to the computation. I will have little to say about the phonological component here, except for some comments about morphological structure and linear ordering.

The special properties of the phonological component relate to the need to produce instructions for sensorimotor systems, for production and perception. As noted, this requirement may be the source of other imperfections of CHL, and in this sense extraneous to language, possibilities we will explore.
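
The bookkeeping behind Select and Merge, as described earlier in this section, can be pictured with a small Python sketch. It is only an illustrative model under simplifying assumptions: the names Occurrence and Derivation are invented here, a merged object is modeled as a bare pair without any label, and nothing is said about features, movement, or economy. It shows just two points from the text: each application of Select lowers an index by 1 and yields a distinct occurrence, and a derivation is generated only if the numeration is exhausted and a single syntactic object remains.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Occurrence:
    """One application of Select to a lexical item (hypothetical name)."""
    item: str
    token: int  # keeps distinct selections of the same item apart


# Assumption: a merged object is modeled as a plain pair, with no label.
SyntacticObject = Union[Occurrence, tuple]


class Derivation:
    def __init__(self, numeration: Dict[str, int]) -> None:
        self.numeration = dict(numeration)       # item -> remaining index
        self.stage: List[SyntacticObject] = []   # current set of syntactic objects
        self._next_token = 0

    def select(self, item: str) -> Occurrence:
        """Take one item from the numeration, reducing its index by 1."""
        if self.numeration.get(item, 0) < 1:
            raise ValueError(f"index of {item!r} is already zero")
        self.numeration[item] -= 1
        self._next_token += 1
        occurrence = Occurrence(item, self._next_token)
        self.stage.append(occurrence)
        return occurrence

    def merge(self, a: SyntacticObject, b: SyntacticObject) -> SyntacticObject:
        """Replace two objects of the stage by one combined object."""
        if a == b:
            raise ValueError("Merge needs two distinct objects")
        self.stage.remove(a)
        self.stage.remove(b)
        combined = (a, b)
        self.stage.append(combined)
        return combined

    def yields_derivation(self) -> bool:
        """True once the numeration is exhausted and one object remains."""
        exhausted = all(index == 0 for index in self.numeration.values())
        return exhausted and len(self.stage) == 1


d = Derivation({"we": 1, "build": 1, "airplanes": 1})
we, build, airplanes = d.select("we"), d.select("build"), d.select("airplanes")
d.merge(we, d.merge(build, airplanes))
```

Stopping before the last Merge, or leaving any index above zero, makes yields_derivation() return False, which mirrors the remark that in such cases no derivation is generated at all.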

Given these fairly elementary assumptions about the structure of CHL, we distinguish two types of lexical feature: those that receive an interpretation only at the A-P interface (phonological) and those that receive an interpretation only at the C-I interface. I assume further that these sets are disjoint, given the very special properties of the phonological component and its PF output.

It is reasonable to suppose that overt operations do not delete phonological features; otherwise, there would be little reason for them to appear in a lexical item at all. Suppose this to be so. By the assumption of uniformity of CHL, it follows that covert operations cannot do so either; if any phonological features enter the covert component (after Spell-Out), the derivation will crash at LF, violating FI. We will make the still stronger assumption that overt operations cannot detect phonological features at all; such features cannot, for example, distinguish one overt operation from another.13 Thus, the phonological matrix of a lexical item is essentially atomic, as far as overt operations are concerned. It is the form in which the instructions for certain rules of the phonological component are coded in the lexical item. For the N → λ computation, nothing would change if the phonological properties of book were coded in the lexicon as 23, with a rule of the phonological component interpreting 23 as the phonological matrix for book.

Among the features that appear in lexical entries, we distinguish further between formal features that are accessible in the course of the computation and others that are not: thus, between the formal features [N] and [plural], and the semantic feature [artifact]. The basis for the distinction and its effects raise substantial questions,14 among the many that I will put aside here. Such features also function differently in the phonological component.
Since we take computation to LF to be uniform, we cannot stipulate that certain features are eliminable only after Spell-Out; but the mapping to PF has completely different properties and eliminates features in ways not permitted in the N → λ computation; in particular, it eliminates formal and semantic features.

The lexical entry for airplane, for example, contains three collections of features: phonological features such as [begins with vowel], semantic features such as [artifact], and formal features such as [nominal]. The phonological features are stripped away by Spell-Out and are thus available only to the phonological component; the others are left behind by Spell-Out, and the formal ones may continue to be accessed by the covert computation to LF. Within the phonological component, nonphonological features are eliminated in the course of the computation, though they may be relevant to its operation, at least its earlier parts, within the morphological subcomponent.
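
The three-way division of the entry for airplane, and what Spell-Out does with it, can be laid out as a small data structure. The Python sketch below is an informal illustration, not a claim about how the features are actually organized: LexicalItem, spell_out, and crashes_at_lf are names invented for the purpose, and each feature collection is reduced to a set of strings. Spell-Out is modeled as handing the full item to the phonological component while leaving a residue without phonological features for the covert computation.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class LexicalItem:
    name: str
    phonological: FrozenSet[str]
    semantic: FrozenSet[str]
    formal: FrozenSet[str]


def spell_out(item: LexicalItem) -> Tuple[LexicalItem, LexicalItem]:
    """Return (input to the phonological component, residue for covert computation)."""
    # The residue keeps semantic and formal features but no phonological ones.
    residue = replace(item, phonological=frozenset())
    return item, residue


def crashes_at_lf(item: LexicalItem) -> bool:
    """Phonological features that reach LF are uninterpretable there (FI)."""
    return bool(item.phonological)


airplane = LexicalItem(
    name="airplane",
    phonological=frozenset({"begins with vowel"}),
    semantic=frozenset({"artifact"}),
    formal=frozenset({"nominal"}),
)
to_phonology, residue = spell_out(airplane)
```

Here crashes_at_lf(airplane) is true while crashes_at_lf(residue) is false, which is the sense in which an item with phonological features cannot be carried into the covert component.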

The collection of formal features of the lexical item LI I will call FF(LI), a subcomplex of LI. Thus, FF(airplane) is the collection of features of airplane that function in the N → λ computation, excluding the phonological and (purely) semantic features. Some of the features of FF(LI) are intrinsic to it, either listed explicitly in the lexical entry or strictly determined by properties so listed. Others are optional, added as LI enters the numeration. We will return to this matter in section 4.2.2. Insofar as we are considering properties of the computation from numeration to LF, we restrict attention to formal features, though bare output conditions at the A-P interface sometimes force a departure from this desideratum; see section 4.4.4.

In the case of airplane, the intrinsic properties include the categorial feature [nominal], the person feature [3 person], and the gender feature [-human]. Its optional properties include the noncategorial features of number and Case. The intrinsic properties of build include the categorial feature [verbal] and the Case feature [assign accusative], but its φ-features and tense are optional (if internal to the item). Choices of lexical item LI with different optional features are distinct members of the numeration. If (airplane, i) is in the numeration, its first term must include the categorial feature [nominal] and the noncategorial features [3 person], [-human], as well as one or another choice among number and Case features, perhaps [plural] and [accusative], in which case it may appear in a convergent derivation for we build airplanes.
Further analysis reveals additional distinctions and complexity that do not seem to relate to the computational procedure CHL, at least those aspects of it to which I will limit attention, along with a host of further questions about boundaries, substructure, and interaction with semantic features, which I ignore here for the same reason, perhaps improperly, as further inquiry may reveal.

A guiding intuition of the Minimalist Program is that operations apply anywhere, without special stipulation, the derivation crashing if a wrong choice is made. Let us assume this to be true of Spell-Out, as of other operations. After Spell-Out, the phonological component cannot select from the numeration any item with semantic features, and the covert component cannot select any item with phonological features. That is a requirement for any theory on the weakest empirical assumptions; otherwise, sound-meaning relations would collapse.15

It is unnecessary to add stipulations to this effect. For the phonological component, the question does not arise. It has rules of a special nature, distinct from those of the N → λ computation, and these only modify forms already presented to them. Accordingly, Select is inoperative in the phonological component: no items can be selected from the numeration in the computation from Spell-Out to PF.
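
The earlier remark that choices of a lexical item with different optional features are distinct members of the numeration can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the toy table INTRINSIC lists only some of the intrinsic formal features mentioned for airplane and build, the names Choice and add_to_numeration are invented, and features are plain strings. The point is only that the intrinsic features come with the item, the optional ones are fixed as it enters the numeration, and the index counts selections of one particular choice.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

# Assumption: a toy lexicon listing only some intrinsic formal features.
INTRINSIC: Dict[str, FrozenSet[str]] = {
    "airplane": frozenset({"nominal", "3 person"}),
    "build": frozenset({"verbal", "assign accusative"}),
}


@dataclass(frozen=True)
class Choice:
    """A lexical item together with one choice of optional features."""
    name: str
    formal: FrozenSet[str]  # stands in for FF(LI): intrinsic plus optional


def add_to_numeration(numeration: Dict[Choice, int], name: str,
                      optional: Iterable[str]) -> Choice:
    """Enter one more selection of `name` with the given optional features."""
    choice = Choice(name, INTRINSIC[name] | frozenset(optional))
    numeration[choice] = numeration.get(choice, 0) + 1
    return choice


numeration: Dict[Choice, int] = {}
plural_acc = add_to_numeration(numeration, "airplane", ["plural", "accusative"])
singular_nom = add_to_numeration(numeration, "airplane", ["singular", "nominative"])
add_to_numeration(numeration, "airplane", ["accusative", "plural"])
```

After these three calls the numeration has two members for airplane: the [plural], [accusative] choice with index 2 and the [singular], [nominative] choice with index 1.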

The operation Select is available to the covert component, however, assuming the uniformity condition on the N → λ computation. But if an item with phonological features is selected, the derivation will crash at LF. Selection of LI must be overt, unless LI has no phonological features. In this case LI can be selected covertly and merged (at the root, like overt merger, for simple reasons to which we return). We will see that this conceptual possibility may well be realized.

One interesting case concerns strong features: can a (phonologically null) lexical item with a strong feature be selected covertly?

To clarify the issues, we have to settle the status of the strength property. Feature strength is one element of language variation: a formal feature may or may not be strong, forcing overt movement that violates Procrastinate. A look at cases suggests that the [±strong] dimension is narrowly restricted, perhaps to something like the set of options (1).

(1) If F is strong, then F is a feature of a nonsubstantive category and F is checked by a categorial feature.

If so, nouns and main verbs do not have strong features, and a strong feature always calls for a certain category in its checking domain (not, say, Case or φ-features). It follows that overt movement of β targeting α, forming [Spec, α] or [α β α], is possible only when α is nonsubstantive and a categorial feature of β is involved in the operation.16 Thus, the Extended Projection Principle (EPP) plausibly reduces to a strong D-feature of I, and overt wh-raising to a strong D-feature of C (assuming wh- to be a variant of D (Determiner)). Other cases would include overt N-raising to D (Longobardi 1994, and sources cited), and I-to-C raising, now understood as involving not Agr or T but a true modal or V adjoined to I, an idea that will make more sense as we proceed. Adjunction of nominals to transitive verbs will target a [vV] complex formed by raising of the main verb V to a light verb, and verb incorporation would also be to a weak verb.
Let us assume that something like this is the case.

For ease of exposition, I sometimes speak of a functional category as strong when I mean, more explicitly, that one of its features is strong. I am also glossing over a possibly significant distinction between D-features and N-features, that is, among three variants of the EPP: (1) requiring a DP as specifier, (2) requiring an NP, (3) requiring a nominal category, whether NP or DP. The differences may be significant; I will return to them in sections 4.9 and 4.10. Until then, references to the EPP will be expressed in terms of strong D-features, but the intention is to remain neutral among the choices (1), (2), (3).

A strong feature has two properties. First, it triggers an overt operation, before Spell-Out. Second, it induces cyclicity: a strong feature cannot be passed by α that would satisfy it, and later checked by β; that would permit Relativized Minimality violations (Wh-Island, superraising). In chapter 3 the pre-Spell-Out property was stated in terms of convergence at PF (a strong feature crashes at PF and therefore must be removed before Spell-Out), but that formulation was based on a stipulation that we have now dropped: that lexical access takes place before Spell-Out. The cyclic property was left only partially resolved in chapter 3 (and in Chomsky 1994a).

Apart from its problems and limitations, formulation of strength in terms of PF convergence is a restatement of the basic property, not a true explanation. In fact, there seems to be no way to improve upon the bare statement of the properties of strength. Suppose, then, that we put an end to evasion and simply define a strong feature as one that a derivation cannot tolerate: a derivation D → Σ is canceled if Σ contains a strong feature, in a sense we must make precise. A strong feature thus triggers a rule that eliminates it: [strength] is associated with a pair of operations, one that introduces it into the derivation (actually, a combination of Select and Merge), a second that (quickly) eliminates it.

Cyclicity follows at once.17 We also virtually derive the conclusion that a strong feature triggers an overt operation to eliminate it by checking. This conclusion follows with a single exception: covert merger (at the root) of a lexical item that has a strong feature but no phonological features, an option noted earlier, to which we return.

It is perhaps worth mentioning in this connection that the Minimalist Program, right or wrong, has a certain therapeutic value. It is all too easy to succumb to the temptation to offer a purported explanation for some phenomenon on the basis of assumptions that are of roughly the order of complexity of what is to be explained.
If the assumptions have broader scope, that may be a step forward in understanding. But sometimes they do not. Minimalist demands at least have the merit of highlighting such moves, thus sharpening the question of whether we have a genuine explanation or a restatement of a problem in other terms.

We have to determine in what precise sense a strong feature cannot be included within a legitimate derivation. The intuitive idea is that the strong feature merged at the root must be eliminated before it becomes part of a larger structure by further operations. The notion "part of" can be understood in various ways. There are four possibilities, based on the two ways of building new structures (substitution, adjunction)18 and the two options for projection
(either the category with the strong feature or the one joined with it can project).

To illustrate, take the case of T (Tense) with a strong V-feature and a strong D-feature (as in French), forcing overt V-raising to T (adjunction) and overt DP-raising to [Spec, T] (substitution). We want to know how T and some category K can be joined to form a larger category L, consistent with the strength of T.

Suppose T and K are joined and T projects. Suppose the operation is substitution, forming L = [TP T K] with head T and complement K, the strong feature of T remaining in the larger structure. Plainly that is admissible; in fact, it is the only way T can enter a convergent derivation. Projection of strong T, then, permits the derivation to continue when T and K are joined: the projection of T can tolerate embedded strong T.

Suppose that T and K are joined and K projects. Then T is either the specifier or complement of K (substitution), or an adjunct of K (adjunction). For reasons that will be clarified as we proceed, subsequent joining of L to T (by adjunction or substitution) is barred. In general, then, we will try to establish the principle (2).

(2) Nothing can join to a nonprojecting category.

That is, nothing can join to an adjunct, specifier, or complement. Hence, we need not consider the case of nonprojecting strong T, for if a strong feature does not project, the operation required to eliminate strength will never apply.

If this is the correct interpretation of the options, then the descriptive property of strength is (3). Suppose that the derivation D has formed Σ containing α with a strong feature F. Then

(3) D is canceled if α is in a category not headed by α.

The cases just reviewed follow, as do others. Note that this is not a principle governing strength but a descriptive observation about it, if (2) holds.

Suppose K adjoins to TP, forming the two-segment category M = [TP K TP]. By (3), the derivation tolerates the strong features of T, which can be satisfied by later operations.
This is the right result. Suppose K is an adverbial. Then TP can be extended to M = [TP K TP] by adjunction of the adverbial K, and M can be further extended to N = [TP DP M] by insertion of DP as [Spec, T] to satisfy the EPP, yielding such expressions as John probably has left already, there probably will be snow tomorrow. In fact, the NP-I break is a typical position for merger of adverbials.
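
The descriptive property (3) lends itself to a mechanical check, sketched below in Python. The sketch is a deliberate simplification offered for illustration: Head, Category, and canceled are invented names, a category is just a projecting head plus a list of immediate constituents, segments of an adjunction structure are treated as further projections of the same head, and a strong feature counts as unchecked as long as it is still listed on its head. Under those assumptions, a structure is canceled when some head bearing an unchecked strong feature sits inside a category that it does not head.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(eq=False)
class Head:
    name: str
    strong: List[str] = field(default_factory=list)  # unchecked strong features


@dataclass(eq=False)
class Category:
    head: Head                              # the head that projects
    parts: List[Union["Category", Head]]    # immediate constituents


def canceled(category: Category) -> bool:
    """True if a head with an unchecked strong feature is embedded in a
    category headed by something else, as in the descriptive property (3)."""
    for part in category.parts:
        part_head = part if isinstance(part, Head) else part.head
        if part_head is not category.head and part_head.strong:
            return True
        if isinstance(part, Category) and canceled(part):
            return True
    return False


T = Head("T", strong=["D"])   # strong D-feature (EPP), not yet checked
K, C, Adv = Head("K"), Head("C"), Head("Adv")

TP = Category(T, [T, K])      # T projects: tolerated
M = Category(T, [Adv, TP])    # adverbial adjoined to TP: still a projection of T
CP = Category(C, [C, TP])     # C merged while the strong feature of T is unchecked
```

With these objects, canceled(TP) and canceled(M) are false, while canceled(CP) is true; once the strong feature is removed from T (for instance by T.strong.clear(), standing in for checking), canceled(CP) becomes false as well.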

Suppose we have formed TP with head T and complement K, and the strong D-feature of T (EPP) has not yet been checked. Suppose we next merge TP with C, forming CP with head C and complement TP. That is excluded by (3); the derivation will crash. Again, this is the right result quite generally, required to avoid Relativized Minimality violations, as mentioned earlier.

These cases are typical. We therefore assume (3) to hold for strong features.

While Merge is costless for principled reasons, movement is not: the operation takes place only when forced (Last Resort); and it is overt, violating Procrastinate, only when that is required for convergence. If α has a strong feature F, it triggers an operation OP that checks F before the formation of a larger category that is not a projection of α. The operation OP may be Merge or Move.

4.2.2 The Lexicon

I will have little to say about the lexicon here, but what follows does rest on certain assumptions that should be made clear. I understand the lexicon in a rather traditional sense: as a list of exceptions, whatever does not follow from general principles. These principles fall into two categories: those of UG, and those of a specific language. The latter cover aspects of phonology and morphology, choice of parametric options, and whatever else may enter into language variation. Assume further that the lexicon provides an optimal coding of such idiosyncrasies.

Take, say, the word book in English. It has a collection of properties, some idiosyncratic, some of varying degrees of generality. The lexical entry for book specifies the idiosyncrasies, abstracting from the principles of UG and the special properties of English. It is the optimal coding of information that just suffices to yield the LF representation and that allows the phonological component to construct the PF representation; the asymmetry reflects the difference between the N → λ computation and the phonological component, the former (virtually) satisfying and the latter radically violating the principles of uniformity and inclusiveness.

One idiosyncratic property of book coded in the lexical entry is the sound-meaning relation. The lexical entry also either lists, or entails, that it has the categorial feature [N]; we overlook open questions of some interest here.19 But the lexical entry should not indicate that book has Case and φ-features; that follows from its being of category N (presumably, by principles of UG). It should also not specify phonetic or semantic properties that are universal or English-specific: predictable interactions between vowel and final consonant, or the fact that book can be used to refer to something that is simultaneously
abstract and concrete, as in the expression the book that I'm writing will weigh 5 pounds. That is a property of a broad range of nominal expressions, perhaps all, one of the many reasons why standard theories of reference are not applicable to natural language, in my opinion (see note 2).

For the word book, it seems the optimal coding should include a phonological matrix of the familiar kind expressing exactly what is not predictable, and a comparable representation of semantic properties, about which much less is known. And it should include the formal features of book insofar as they are unpredictable from other properties of the lexical entry: perhaps its categorial feature N, and no others. The fact that Case and φ-features have to be assigned to book follows from general principles, and nothing intrinsic to the lexical entry book tells us that a particular occurrence is singular or plural, nominative or accusative (though its person feature is presumably determined by intrinsic semantic properties). In some cases such features might be idiosyncratic (e.g., the plural feature of scissors, or grammatical gender). More important, in some languages the system works very differently: for example, Semitic, with root-vowel pattern structure. But lexical entries are determined on the same general grounds.

Suppose that book is chosen as part of the array from which a derivation proceeds to form PF and LF representations. I have described the choice of book as a two-step process: (1) form a numeration that includes (book, i), with index i, and (2) introduce book into the derivation by the operation Select, which adds book to the set of syntactic objects generated and reduces its index by 1. The optional features of a particular occurrence of book (say, [accusative], [plural]) are added by either step (1) or step (2), presumably by step (1), a decision that reduces reference sets and hence computability problems. Suppose so.
Then the numeration N will include [book, [accusative], [plural], 2], in this case (assuming index 2). The fact that these features are present is determined (we assume) by UG, but the choice among them is not.

Recall that we are concerned here only with mechanisms, not with choices and intentions of speakers. Over the class of derivations, then, the mapping from lexicon to numeration is random with regard to specification of book for Case and φ-features, and the index of that collection of properties, though UG requires that there is always some choice of Case, φ-features, and index.

This much seems fairly clear. It is hardly plausible that Case and φ-features of book are determined by its position in a clausal configuration. If the word is used in isolation, these features will be fixed one way or another, though there is no structure. One could say that there is a presupposed structure, some representation of the intentions of the speaker or (possibly) shared
assumptions in some interchange. But that is surely the wrong course to pursue. It is possible (and has been proposed) that nouns are automatically selected along with broader nominal configurations (involving Case, perhaps φ-features). That is a possibility, but would require positive evidence. I will assume here the null hypothesis: Case and φ-features are added arbitrarily as a noun is selected for the numeration.

The same conclusions are appropriate in other cases. Take such constructions as (4).

(4) as far as John is concerned, [CP I doubt that anyone will ever want to speak to the fool again]

Here some formal properties of John and the fool are related, but not by the mechanisms of CHL. The example falls together with use of CP in isolation on the background assumption that John is under discussion (perhaps provided by discourse context, perhaps not). The same might be true of the more interesting case of nominal expressions in languages that express arguments as adjuncts associated with pronominal elements within words (see Baker 1995 and section 4.10.3). The particular form taken by such adjuncts may depend on the association, but might not be expressible in terms of local relations of the kind admitted in CHL (typically, H-α relations, where H is a head and α is in its checking domain).

In the numeration, then, Case and φ-features of nouns are specified, whether by the lexical entry (intrinsic features) or by the operation that forms the numeration (optional features). Larger structures are relevant, but only for checking of features of the noun that are already present in the numeration.

Consider verbs, say, explain. Its lexical entry too represents in the optimal way the instructions for the phonological component and for interpretation of the LF representation: a phonological matrix, and some array of semantic properties. It must also contain whatever information is provided by the verb itself for the operations of CHL. The lexical entry must suffice to determine that explain has the categorial property V, perhaps by explicit listing.
What about its selectional features? Insofar as these are determined by semantic properties, whether by UG or by specific rules of English, they will not be listed in the lexicon. The fact that explain has tense and φ-features will not be indicated in the lexical entry, because that much is determined by its category V (presumably by UG). The particular specification of such features, however, is not part of the lexical entry. The verb also has a Case-assigning property, which is intrinsic: either determined by properties of the lexical entry (its semantic features) or listed as idiosyncratic. Features that are associated with the verb but not predictable from the lexical entry have two possible
sources: they might be chosen arbitrarily as the verb enters the numeration, or they might be the result of operations that form complex words by association with other elements (e.g., adjunction to T). These could be operations of the overt syntax or the phonological component (including morphology). If overt syntactic operations are involved, the categories involved will be marked (in the lexicon, or the transition to the numeration) as allowing or requiring affixation.

The decisions are less clear than for nouns, but the principles are the same. Whatever information feeds the phonological rules must be available for the computation as the item is introduced into the derivation, but the specific character of this information has to be discovered, as well as whether it is provided by optional choice of feature values as the item enters the numeration or by overt syntactic operations. Take inflected explain. Its tense and φ-features might be chosen optionally and assigned to the word as it enters the numeration, or they might result from overt V-raising to Agr and T. Or the word might reach the phonological component uninflected, the PF form resulting from interaction with functional elements within the phonological component. The answers could vary across or within languages.20 The questions have to be answered case by case.

For my specific purposes here, it does not matter much which choices turn out to be correct, until section 4.10, when the status of functional categories is reassessed. For concreteness (and with an eye in part to that later reassessment), I will assume that tense and φ-features of verbs (say, inflected explain) are chosen optionally as the item enters the numeration, then matched by other processes. But alternatives are compatible with much of what follows.

A separate question is the form in which the information should be coded in the lexical entry.
Thus, in the case of book, the optimal representation in the lexicon could include the standard phonological matrix PM, or some arbitrary coding (say, 23) interpreted within the phonological component as PM; presumably the former, unless there is strong independent reason for the latter. Similarly, in the case of [past tense], the fact that it is typically dental could be represented by a phonological matrix [dental], or by an arbitrary coding (say, 47) interpreted in the phonological component as [dental] (with whatever complications have to be coded in the entries of irregular verbs); again, presumably the former, unless there is strong independent reason to the contrary. I will put these matters aside, assuming that they have to be settled case by case, though in a manner that will not matter for what follows.

On the simplest assumptions, the lexical entry provides, once and for all, the information required for further computations, in particular for the operations of the phonological component (including morphology, we assume). There seems to be no compelling reason to depart from the optimal assumption.

I have kept to the easiest cases. Let us move to the opposite extreme. Suppose the PF form of a lexical entry is completely unpredictable: the English copula, for example. In this case the lexical coding will provide whatever information the phonological rules need to assign a form to the structure [copula, {F}], where {F} is some set of formal features (tense, person, etc.). It does not seem to matter (for our purposes here, at least) how this information is presented: as a list of alternants, each with its formal features, or by some coding that allows the phonological component to pick the alternant (late insertion).

This is the worst possible case. Plainly, it would be a methodological error to generalize the worst case to all cases, that is, to infer from the fact that the worst case exists that it holds for all lexical items.

There are many intermediate cases. Take the lexical element come-came. Some regularity can be extracted and presented as a phonological matrix, perhaps /kVm/. But the choice of vowel is not rule-governed and therefore must be coded in one or another way. Structural linguistics devoted much energy to whether the information should be coded in the form of morpheme alternants, item-and-process rules that change one form to another, and so on. As in the case of pure suppletion, it is not clear that there even is an empirical issue. There are similar problems, real or illusory, concerning semantic features (idiom structure, etc.).

With regard to functional categories, the same general considerations apply, though new problems arise. It is clear that the lexicon contains substantive elements (nouns, verbs, ...) with their idiosyncratic properties. And it is at least reasonably clear that it contains some functional categories: complementizer (C), for example.
But the situation is more obscure in the case of other possible functional categories, in particular, T, Agr, specific φ-features, a Case category K, and so on, which is why theories about these matters have varied so over the years. Postulation of a functional category has to be justified, either by output conditions (phonetic and semantic interpretation) or by theory-internal arguments. It bears a burden of proof, which is often not so easy to meet.

The functional categories that concern us particularly here are T, C, D, and Agr; their formal properties are the primary focus throughout. T, D, and C have semantic properties; Agr does not. Thus, T is [± finite], with further subdivisions and implications about event structure and perhaps other properties. D may be the locus of what is loosely called referentiality.21 C is basically an indicator of mood or force (in the Fregean sense): declarative, interrogative, and so on. The choice among the options of a given type is arbitrary, part of the process of forming a numeration from the lexicon, as (I am tentatively assuming) in the case of φ-features of verbs, and Case and (some) φ-features of nouns.

Functional categories may also have phonological properties. Thus, English T is dental and declarative C is that (with a null option) (we will return to interrogative C = Q); Japanese Cases are phonologically invariant; and so on. The lexical entry, again, provides an optimal coding for what is not predictable.

We expect different decisions in different cases, depending on details of the phonological component, the lexical inventory for the language, and perhaps more. Suppose, for example, that specific morphological properties of a language constrain the phonetic correlate of formal features: say, that verbs indicate person with prefixes and number with suffixes, or that exactly n slots are available for spelling out formal features. Then the lexical entries will abstract from these properties, presenting just the information that they do not determine. It appears to be the case for some languages that templatic conditions are required for morphological structure. If so, the option is universally available, as are adjunction operations, along with whatever else the empirical facts may require.

In all cases the principle is clear, though the answers to specific questions are not: the lexicon provides the optimal coding for exceptions. Though important typological differences doubtless exist, there seems little reason to expect much of any generality, among languages or often within them.
Perhaps Jespersen was correct in holding that no one ever dreamed of a universal morphology, to any far-reaching degree, morphology being a primary repository of exceptional aspects of particular languages.

I am keeping to the optimal assumption: that for each lexical item in a particular language, the idiosyncratic codings are given in a unified lexical entry. There are more complex theories that scatter the properties. One might propose, for example, that formal features, instructions for phonological rules, and instructions for LF interpretation appear in distinct sublexicons, which are accessible at different points in the computational process. Such elaborations might also involve new levels and relations among various parts of the derivation, to ensure proper matching of (PF, LF) pairs. The burden of proof is always on the proposal that the theory has to be made more complex, and it is a considerable one. I will keep to the optimal assumptions, which seem to be about as just indicated, stressing, however, that the serious questions of implementation and perhaps general theoretical issues are scarcely even touched in these remarks.


I have said nothing about other major components of the theory of word formation: compound forms, agglutinative structures, and much more. This is only the barest sketch, intended for no more than what follows.

4.3 Phrase Structure Theory in a Minimalist Framework

The development of X-bar theory in the 1960s was an early stage in the effort to resolve the tension between explanatory and descriptive adequacy. A first step was to separate the lexicon from the computations, thus removing a serious redundancy between lexical properties and phrase structure rules and allowing the latter to be reduced to the simplest (context-free) form. X-bar theory sought to eliminate such rules altogether, leaving only the general X-bar-theoretic format of UG. The primary problem in subsequent work was to determine that format, but it was assumed that phrase structure rules themselves should be eliminable, if we understood enough about the matter, which, needless to say, we never do, so (unsurprisingly) many open questions remain, including some that are quite central to language.22

In earlier papers on economy and minimalism (chapters 1-3), X-bar theory is presupposed, with specific stipulated properties. Let us now subject these assumptions to critical analysis, asking what the theory of phrase structure should look like on minimalist assumptions and what the consequences are for the theory of movement.

At the LF interface, it must be possible to access a lexical item LI and its nonphonological properties LF (LI): the semantic properties and the formal properties that are interpreted there. Accordingly, LI and LF (LI) should be available for CHL, on the natural minimalist assumption, discussed earlier, that bare output conditions determine the items that are visible for computations. In addition, CHL can access the formal features FF (LI), by definition. It is also apparent that some larger units constructed of lexical items are accessed, along with their types: noun phrases and verb phrases interpreted, but differently, in terms of their type, and so on. Of the larger units, it seems that only maximal projections are relevant to LF interpretation.
Assuming so,23 bare output conditions make the concepts minimal and maximal projection available to CHL. But CHL should be able to access no other projections.

Given the inclusiveness condition, minimal and maximal projections are not identified by any special marking, so they must be determined from the structure in which they appear; I follow Muysken (1982) in taking these to be relational properties of categories, not properties inherent to them. (See section 1.3.2, below (64).) There are no such entities as XP (Xmax) or Xmin in the structures formed by CHL, though I continue to use the informal notations for expository purposes, along with X′ (X-bar) for any other category. A category that does not project any further is a maximal projection XP, and one that is not a projection at all is a minimal projection Xmin; any other is an X′, invisible at the interface and for computation.24 As we proceed, I will qualify the conclusion somewhat for X0 categories, which have a very special role.

A further goal is to show that computation keeps to local relations of α to terminal head. All principles of UG should be formulated in these terms (which have to be made precise), and only such relations should be relevant at the interface for the modules that operate there.25

Given the numeration N, CHL may select an item from N (reducing its index) or perform some permitted operation on the syntactic objects already formed. As discussed earlier, one such operation is necessary on conceptual grounds alone: an operation that forms larger units out of those already constructed, the operation Merge. Applied to two objects α and β, Merge forms the new object K, eliminating α and β. What is K? K must be constituted somehow from the two items α and β; the only other possibilities are that K is fixed for all pairs (α, β) or that it is randomly selected, neither worth considering. The simplest object constructed from α and β is the set {α, β}, so we take K to involve at least this set, where α and β are the constituents of K. Does that suffice? Output conditions dictate otherwise; thus, verbal and nominal elements are interpreted differently at LF and behave differently in the phonological component. K must therefore at least (and we assume at most) be of the form {γ, {α, β}}, where γ identifies the type to which K belongs, indicating its relevant properties. Call γ the label of K.

For the moment, then, the syntactic objects we are considering are of the following types:

(5) a. lexical items
    b. K = {γ, {α, β}}, where α, β are objects and γ is the label of K

Objects of type (5a) are complexes of features, listed in the lexicon. The recursive step is (5b).
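As a rough illustration of (5), the following Python sketch models selection from the numeration and the formation of K = {γ, {α, β}}. It is only a sketch under stated assumptions: lexical items are represented as plain strings rather than complexes of features, the numeration is a dictionary from items to indices, and the names `select` and `merge` are introduced here for illustration rather than taken from the text.

```python
from typing import Dict, FrozenSet, Union

# Assumption: a lexical item is modeled as a string; in the text it is a
# complex of features listed in the lexicon.
SyntacticObject = Union[str, FrozenSet]


def select(numeration: Dict[str, int], item: str) -> str:
    """Select an item from the numeration, reducing its index by one."""
    if numeration.get(item, 0) < 1:
        raise ValueError(f"{item!r} is not available in the numeration")
    numeration[item] -= 1
    return item


def merge(alpha: SyntacticObject, beta: SyntacticObject,
          label: SyntacticObject) -> FrozenSet:
    """Form K = {label, {alpha, beta}}, the recursive step (5b)."""
    return frozenset({label, frozenset({alpha, beta})})


numeration = {"the": 1, "book": 1}  # hypothetical numeration
the = select(numeration, "the")
book = select(numeration, "book")

# In this sketch the label is simply supplied by the caller.
K = merge(the, book, label=the)
```

Because K is built from sets, `merge(the, book, label=the)` and `merge(book, the, label=the)` yield the same object; no linear order is encoded.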
Suppose a derivation has reached state Σ = {α, β, δi, ..., δn}. Then application of an operation that forms K as in (5b) converts Σ to Σ′ = {K, δi, ..., δn}, including K but not α, β. In a convergent derivation, iteration of operations of CHL maps the initial numeration N to a single syntactic object at LF.

We assume further that the label of K is determined derivationally (fixed once and for all as K is formed), rather than being derived representationally at some later stage of the derivation (say, LF). This is, of course, not a logical necessity; Martian could be different. Rather, it is an assumption about how human language works, one that fits well with the general thesis that the computational processes are strictly derivational, guided by output conditions only in that the properties available for computational purposes are those interpreted at the interface. The proper question in this case is whether the assumption (along with the more general perspective) is empirically correct, not whether it is logically necessary; of course it is not.

Suppose that the label for {α, β} happens to be determined uniquely for α, β in language L, meaning that only one choice yields an admissible convergent derivation. We would then want to deduce that fact from properties of α, β, L, or, if it is true for α, β in language generally, from properties of the language faculty. Similarly, if the label is uniquely determined for arbitrary α, β, L, or other cases. To the extent that such unique determination is possible, categories are representable in the more restricted form {α, β}, with the label uniquely determined. I will suggest below that labels are uniquely determined for categories formed by the operation Move α, leaving the question open for Merge, and indicating labels throughout for clarity of exposition, even if they are determined.

The label γ must be constructed from the two constituents α and β. Suppose these are lexical items, each a set of features.26 Then the simplest assumption would be that γ is either

(6) a. the intersection of α and β
    b. the union of α and β
    c. one or the other of α, β

The options (6a) and (6b) are immediately excluded: the intersection of α, β will generally be irrelevant to output conditions, often null; and the union will be not only irrelevant but contradictory if α, β differ in value for some feature, the normal case. We are left with (6c): the label γ is either α or β; one or the other projects and is the head of K.
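The reasoning that excludes (6a) and (6b) can be illustrated with a small Python sketch. It assumes, purely for illustration, that a feature is an (attribute, value) pair and that a lexical item is a set of such pairs; the particular feature names and the helper `is_contradictory` are hypothetical and not part of the text.

```python
from typing import FrozenSet, Tuple

# Assumption: a feature is an (attribute, value) pair and a lexical item is a
# frozenset of such pairs. The feature names below are hypothetical.
Feature = Tuple[str, str]
LexicalItem = FrozenSet[Feature]


def is_contradictory(features: FrozenSet[Feature]) -> bool:
    """True if some attribute is assigned two different values."""
    attributes = [attribute for attribute, _ in features]
    return len(attributes) != len(set(attributes))


alpha: LexicalItem = frozenset({("category", "V"), ("tense", "past")})
beta: LexicalItem = frozenset({("category", "N"), ("number", "singular")})

option_a = alpha & beta   # (6a): intersection, empty in this example
option_b = alpha | beta   # (6b): union, with two values for "category"
option_c = (alpha, beta)  # (6c): the label is one or the other
```

In this toy case the intersection carries no information and the union assigns conflicting values to the same attribute, which leaves only the choice of one constituent or the other.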
If α projects, then K = {α, {α, β}}.

For expository convenience, we can depict a constructed object of type (5b) as a more complex configuration involving additional elements such as nodes, bars, primes, XP, subscripts and other indices, and so on. Thus, we might represent K = {α, {α, β}} informally as (7) (assuming no order), where the diagram is constructed from nodes paired with labels and pairs of such labeled nodes, and labels are distinguished by subscripts.

(7) [α1 α2 β]

This, however, is informal notation only: empirical evidence would be required to postulate the additional elements that enter into (7) beyond lexical features, and the extra sets. (See note 7.) I know of no such evidence and will therefore keep to the minimalist assumption that phrase structure representation is bare, excluding anything beyond lexical features and objects constructed from them as in (5) and (6c), with some minor emendations as we move toward a still more principled account.

The terms complement and specifier can be defined in the usual way, in terms of the syntactic object K. The head-complement relation is the most local relation of an XP to a terminal head Y, all other relations within YP being head-specifier (apart from adjunction, to which we turn directly). In principle, there might be a series of specifiers, a possibility with many consequences to which we return. The principles of UG, we assume, crucially involve these local relations.

Further projections satisfy (6c), for the same reasons. Any such category we will refer to as a projection of the head from which it ultimately projects, restricting the term head to terminal elements drawn from the lexicon, and taking complement and specifier to be relations to a head.

To review notations, we understand a terminal element LI to be an item selected from the numeration, with no parts (other than features) relevant to CHL. A category Xmin is a terminal element, with no categorial parts. We restrict the term head to terminal elements. An X0 (zero-level) category is a head or a category formed by adjunction to the head X, which projects. The head of the projection K is H(K). If H = H(K) and K is maximal, then K = HP. We are also commonly interested in the maximal zero-level projection of the head H (say, the T head of TP with V and perhaps more adjoined). We refer to this object as H0max.

If constituents α, β of K have been formed in the course of computation, one of the two must project, say, α.
At the LF interface, maximal K is interpreted as a phrase of the type α (e.g., as a nominal phrase if H(K) is nominal); and it behaves in the same manner in the course of computation. It is natural, then, to take the label of K to be not α itself but rather H(K), a decision that also leads to technical simplification. Assuming so, we take K = {H(K), {α, β}}, where H(K) is the head of α and its label as well, in the cases so far discussed. We will keep to the assumption that the head determines the label, though not always through strict identity.

The operation Merge(α, β) is asymmetric, projecting either α or β, the head of the object that projects becoming the label of the complex formed. If α projects, we can refer to it as the target of the operation, borrowing the notion from the theory of movement in the obvious way. There is no such thing as a nonbranching projection. In particular, there is no way to project from a lexical item α a subelement H(α) consisting of the category of α and whatever else enters into further computation, H(α) being the actual head and α the lexical element itself; nor can such partial projections be constructed from larger elements. We thus dispense with such structures as (8a) with the usual interpretation: the, book taken to be terminal lexical items and D+, N+ standing for whatever properties of these items are relevant to further computation (perhaps the categorial information D, N; Case; etc.). In place of (8a) we have only (8b).

(8) a.


{P, Q}}. These alone can be functioning elements; call them the terms of XP. More explicitly, for any structure K,

(10) a. K is a term of K.
     b. If L is a term of K, then the members of the members of L are terms of K.

For the case of substitution, terms correspond to nodes of the informal representations, where each node is understood to stand for the subtree of which it is the root.27

In (9) x is the head of the construction, y its complement, and ZP its specifier. Thus, (9) could be, say, the structure VP with the head saw, the complement it, and the specifier the man with the label the, as in (11).

(11) [VP [DP the man] [V′ saw it]]

Here V′ = VP = saw, and DP = the.
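Definition (10) can be tried out on (11) with a short Python sketch. The assumptions are that terminals are plain strings, that every label is the head (a terminal), and that objects have the set form {head, {a, b}}; the function names `project` and `terms` are introduced here only for illustration.

```python
from typing import FrozenSet, Set, Union

# Assumptions: terminals are strings, and every label is the head (a terminal),
# as in the informal example (11).
SyntacticObject = Union[str, FrozenSet]


def project(head: str, a: SyntacticObject, b: SyntacticObject) -> FrozenSet:
    """Build {head, {a, b}}, an object labeled by its head."""
    return frozenset({head, frozenset({a, b})})


def terms(k: SyntacticObject) -> Set[SyntacticObject]:
    """Collect the terms of k following (10)."""
    found = {k}                               # (10a): k is a term of k
    if isinstance(k, frozenset):
        for member in k:
            if isinstance(member, frozenset):
                for inner in member:          # (10b): members of members
                    found |= terms(inner)
    return found


DP = project("the", "the", "man")      # label: the
V_bar = project("saw", "saw", "it")    # label: saw
VP = project("saw", DP, V_bar)         # label: saw

all_terms = terms(VP)
```

Under these assumptions the terms of VP are VP itself, DP, V′, and the four terminals, matching the nodes of the informal tree; the bare constituent set {DP, V′} is not itself a term.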

Note that this very spare system fails to distinguish unaccusatives from unergatives, a distinction that seems necessary. The simplest solution to the problem would be to adopt the proposal of Hale and Keyser (1993a) that unergatives are transitives; I will assume so.

We assumed earlier that Merge applies at the root only. In the bare system, it is easy to see why this is expected. Suppose that the derivation has reached stage Σ, with objects α and β. Then Merge may eliminate α and β from Σ in favor of the new object K = {γ, {α, β}}, with label γ. That is the simplest kind of merger. We might ask whether CHL also permits a more complex operation: given α and β, select K within β (or within α; it is immaterial) and construct the new object {γ, {α, K}}, which replaces K within β. That would be an application of Merge that embeds α within some construction β already formed. Any such complication (which could be quite serious) would require strong empirical motivation. I know of none, and therefore assume that there is no such operation. Merge always applies in the simplest possible form: at the root.

The situation is different for Move; we will return to this matter.

To complete the minimalist account of phrase structure, we have to answer several questions about adjunction. Let us keep to the simplest (presumably only) case: adjunction of α to β, forming a two-segment category.


That adjunction and substitution both exist is not uncontroversial; thus, Lasnik and Saito (1992) reject the former while Kayne (1993) largely rejects the latter, (virtually) assimilating specifiers and adjuncts (see section 4.8 and Chomsky 1994a). Nevertheless, I will assume here that the distinction is real: that specifiers are distinct in properties from adjuncts, and A- from Ā-positions (a related though not identical distinction).

Substitution forms L = {H(K), {α, K}}, where H(K) is the head (= the label) of the projected element K. But adjunction forms a different object. In this case L is a two-segment category, not a new category. Therefore, there must be an object constructed from K but with a label distinct from its head H(K). One minimal choice is the ordered pair ⟨H(K), H(K)⟩. We thus take L = {⟨H(K), H(K)⟩, {α, K}}. Note that ⟨H(K), H(K)⟩, the label of L, is not a term of the structure formed. It is not identical to the head of K, as before, though it is constructed from it in a trivial way. Adjunction differs from substitution, then, only in that it forms a two-segment category rather than a new category. Along these lines, the usual properties of segments versus categories, adjuncts versus specifiers, are readily formulated.

Suppose that α adjoins to K and the target K projects. Then the resulting structure is L = {⟨H(K), H(K)⟩, {α, K}}, which replaces K within the structure Σ containing K: Σ itself, if adjunction is at the root. Recall that it is the head that projects; the head either is the label or, under adjunction, determines it trivially.

We thus have the outlines of a bare phrase structure theory that derives fairly strictly from natural minimalist principles. The bare theory departs from conventional assumptions in several respects: in particular, categories are elementary constructions from properties of lexical items, satisfying the inclusiveness condition; there are no bar levels and no distinction between lexical items and heads projected from them (see (8)). A consequence is that an item can be both an X0 and an XP. Does this cause problems? Are there examples that illustrate this possibility? I see no particular problems, and one case comes to mind as a possible illustration: clitics. Under the DP hypothesis, clitics are Ds. Assume further that a clitic raises from its θ-position and attaches to an inflectional head. In its θ-position, the clitic is an XP; attachment to a head requires that it be an X0 (on fairly standard assumptions). Furthermore, the movement violates the Head Movement Constraint (HMC),28 indicating again that it is an XP, raising by XP-adjunction until the final step of X0-adjunction. Clitics appear to share XP and X0 properties, as we would expect on minimalist assumptions.

If the reasoning sketched so far is correct, phrase structure theory is essentially given on grounds of virtual conceptual necessity in the sense indicated earlier. The structures stipulated in earlier versions are either missing or reformulated in elementary terms satisfying minimalist conditions, with no objects beyond lexical features. Stipulated conventions are derived. Substitution and adjunction are straightforward. At least one goal of the Minimalist Program seems to be within reach: phrase structure theory can be eliminated entirely, it seems, on the basis of the most elementary assumptions. If so, at least this aspect of human language is perfect (but see note 22).

4.4 The Operation Move

4.4.1 Movement and Economy

The structure (11) will yield the sentence the man saw it when further inflectional elements are added by Merge and the specifier of the VP is raised (assuming this form of the predicate-internal subject hypothesis). The construction so formed involves the second operation that forms categories: Move (Move α). What is this operation? We have so far assumed that it works like this.

Suppose we have the category Σ with terms K and α. Then we may form Σ′ by raising α to target K. That operation replaces K in Σ by L = {γ, {α, K}}. In the optimal theory, nothing else will change in Σ, and γ will be predictable. We take human language to be optimal in the former sense: there are no additional mechanisms to accommodate further changes in Σ. As for predictability of γ, we hope to establish the standard convention that the target projects (within the class of convergent derivations), so that γ is H(K) or ⟨H(K), H(K)⟩, depending on whether the operation is substitution or adjunction. The question does not arise for Merge, but it does for Move; we will return to this matter. The only other operation for the moment is Delete (Delete α), which we have assumed to leave the structure unaffected apart from an indication that α is not visible at the interface.

The operation Move forms the chain CH = (α, t(α)), t(α) the trace of α.29 Assume further that CH meets several other conditions (C-Command, Last Resort, and others), to be spelled out more carefully as we proceed.

In forming the IP derived from (11), the subject is raised to the root of the category Σ, targeting the projection of I and becoming [Spec, I]. But raising of the object it targets an embedded inflectional category K that is a proper substructure of Σ. We have taken this to be covert raising of the object to [Spec, AgrO], for Case and agreement. Prior to this operation we have (in informal notation) the structure (12a) embedded in the larger structure (12b).


(12) a. [Agr2 Agr1 [VP V DP]]
     b. [T′ T Agr2]

Here T′ is {T, {T, K}}, where K (namely, (12a)) is {Agr, {Agr, VP}}, VP = {V, {V, DP}}.30 If we target K, merging DP and K and projecting Agr as intended, we form (13), with the raised DP the specifier of AgrP (Agrmax).

(13) [AgrP DP Agr2]

Here AgrP is {Agr, {DP, K}} = L, and the term T′ immediately dominating it is {T, {T, L}}, not {T, {T, K}} as it was before Move raised DP.

Under the copy theory of movement (section 3.5), a two-element chain is a pair ⟨α, β⟩, where α = β. Since we distinguish among distinct selections of a single item from the lexicon, we can be sure that such pairs arise only through movement. Suppose, for example, that we have constructed the object (14), β a head, and we derive (15) from it by raising α, targeting and projecting β.

K = {β, {β, α}} and L = {β, {α, K}}. We are now interested in two of the terms of L, call them τ1 and τ2, where τ1 is the term of L such that L = {β, {τ1, K}} and τ2 is the term of L such that K = {β, {β, τ2}}. Here, τ1 = τ2 = α.
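The identity τ1 = τ2 = α can be made concrete with a small Python sketch. It assumes that terminals are strings, that labels are terminals, and that the strings "alpha" and "beta" stand in for the items α and β; the helpers `project` and `sister` are hypothetical names used only for illustration. The sketch shows that the two occurrences are identical as objects, while their co-constituents differ.

```python
from typing import FrozenSet, Union

# Assumptions: terminals are strings, labels are terminals, and the names
# "alpha" and "beta" stand in for the items called α and β in the text.
SyntacticObject = Union[str, FrozenSet]


def project(label: str, a: SyntacticObject, b: SyntacticObject) -> FrozenSet:
    """Build {label, {a, b}}."""
    return frozenset({label, frozenset({a, b})})


def sister(term: SyntacticObject, parent: FrozenSet) -> SyntacticObject:
    """Return the co-constituent of term within parent = {label, {x, y}}."""
    (constituents,) = [m for m in parent if isinstance(m, frozenset)]
    (other,) = constituents - {term}
    return other


alpha, beta = "alpha", "beta"
K = project(beta, beta, alpha)   # K = {β, {β, α}}
L = project(beta, alpha, K)      # L = {β, {α, K}}, formed by raising α

tau_1 = alpha   # the occurrence of α whose co-constituent is K
tau_2 = alpha   # the occurrence of α inside K
```

Here `tau_1 == tau_2` holds, so the two terms cannot be told apart by inspecting the objects themselves, whereas `sister(alpha, L)` is K and `sister(alpha, K)` is β.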

We wish to construct the chain CH that will serve as the LF object formed from these two terms of L, which we call α and the trace of α, respectively. The operation that raises α introduces α a second time into the syntactic object that is formed by the operation, the only case in which two terms can be identical. But we want to distinguish the two elements of the chain CH formed by this operation. The natural way to do so is by inspection of the context in which the term appears. Given the way syntactic objects are formed, it suffices to consider the co-constituent (sister) of a term, always distinct for α and its trace. Suppose, then, that α raises to target M in Σ, so that the result of the operation is Σ′, formed by replacing M in Σ by {N, {α, M}}, N the label. The element α now appears twice in Σ′, in its initial position and in the raised position. We can identify the initial position of α as the pair ⟨α, β⟩ (β the co-constituent of α in Σ), and the raised position as the pair ⟨α, K⟩ (K the co-constituent of the raised term α in Σ′). Actually, β and K would suffice; the pair is simply more perspicuous. Though α and its trace are identical, the two positions are distinct. We can take the chain CH that is the object interpreted at LF to be the pair of positions. In (14) and (15) the position POS1 of τ1 is ⟨α, K⟩ and the position POS2 of τ2 is ⟨α, β⟩. POS1 and POS2 are distinct objects constituting the chain CH = ⟨POS1, POS2⟩ formed by the operation; the chain is actually (K, β), if we adopt the more austere version. We refer to CH informally as (α, t(α)). I omit a more precise account, the point being clear enough for our purposes.

The c-command relations are determined by the manner of construction of L. Chains are unambiguously determined in this way.

It may, however, be correct to allow a certain ambiguity. Recall the earlier discussion of ellipsis as a special case of copy intonation, the special intonation found in the bracketed phrase of (16a) (= (324) of chapter 1; see sections 1.5 and 3.5).

(16) a. John said that he was looking for a cat, and so did Bill [say that he was looking for a cat]
     b. John said that he was looking for a cat, and so did Bill

Here (16b) is derived from (16a) by deletion of the bracketed phrase in the phonological component. At some point in the derivation, the bracketed element must be marked as subject to parallelism interpretation. Assume that this takes place before Spell-Out.31 The marking could be removal of the distinctions indicated by numeration, in which case the bracketed element is in a certain sense nondistinct from the phrase it copies (the latter still marked by the numeration). Such a configuration might be interpreted at PF as assigning copy intonation to the bracketed expression, and at LF as imposing the parallelism interpretations (a complex and intriguing matter, which has only been very partially investigated). Suppose that numeration markings on the copy are changed to those of the first conjunct instead of being deleted. Then the antecedent and its copy are strictly identical and constitute a chain, if a chain is understood as (constructed from) a pair of terms (α1, α2) that are identical in constitution. It will follow, then, that the copy deletes, by whatever mechanism deletes traces in the phonological component. At LF the two kinds of constructions will be very similar, though not quite identical. It will then be necessary to demonstrate that legitimate LF objects, in the sense of earlier chapters, can be uniquely identified, with chains that constitute arguments (etc.) properly distinguished from those involved in parallelism structures.

Without pursuing intricacies here, there are strong reasons to suppose that the strict copy (with PF deletion) generally involves the same kind of interpretation at the interface as the nondistinct copy (with copy intonation) but somewhat strengthened, and that the latter falls under far more general conditions that hold for a wide range of other constructions as well and that go far beyond sentence grammar or discourse.

Similar ideas might accommodate the notion of linked chains (in the sense of section 1.4.3) and chains formed by successive-cyclic movement. We will return to these questions.

A chain CH = (α, t(α)) formed by Move meets several conditions, which we take to be part of the definition of the operation itself. One of these is the C-Command Condition: α must c-command its trace, so that there cannot be an operation that lowers α or moves it sideways; movement is raising, in the specific sense defined by c-command.
A second requirement, which seems natural, is that chains meet the uniformity condition (17),32 where the phrase structure status of an element is its (relational) property of being maximal, minimal, or neither.

(17) A chain is uniform with regard to phrase structure status.

A third requirement is that Move must meet the Last Resort condition on movement, which expresses the idea that Move is driven by feature checking, a morphological property. We will return to the proper interpretation of Last Resort and to empirical consequences of the conditions that we take to define Move. Note that if deletion forms chains, as suggested earlier, these may not meet any of the conditions that hold of the operation Move.

It is meaningless to ask whether the conditions that constitute the definition of Move can be overridden for convergence, or to ask how economy considerations apply to them. That is true whatever the proper conditions turn out to be. However formulated, these conditions are part of the definition of the algorithm CHL. Violating them would be on a par with making an illegitimate move in a game of chess or adding a line illegitimately to a proof. In such cases further questions about the object being constructed (convergence, economy, shortest game or proof, etc.) do not arise. If the proper conditions are C-Command, uniformity, and Last Resort, then there is no meaningful question about the effects of violating these conditions (or whatever others may be introduced).

The computational system CHL is based on two operations, Merge and Move. We have assumed further that Merge always applies in the simplest possible form: at the root. What about Move? The simplest case again is application at the root: if the derivation has reached the stage Σ, then Move selects α and targets Σ, forming {γ, {α, Σ}}. But covert movement typically embeds α and therefore takes a more complex form: given Σ, select K within Σ and raise α to target K, forming {γ, {α, K}}, which substitutes for K in Σ. The more complex operation is sometimes induced by economy considerations, namely, Procrastinate, which requires that some operations be covert, hence (typically) operations that embed. Furthermore, even overt X0-adjunction of α to β is within the category βmax headed by β, hence not strictly speaking at the root even if βmax is the root category.

For overt movement, targeting of an embedded category is obviously not forced by Procrastinate, or by feature strength (see (3)). Overt movement therefore always targets the root, with the minor qualification noted in the case of head adjunction, and is invariably cyclic.

It would be interesting to strengthen this conclusion: that is, to show that overt targeting of an embedded category (hence lowering and noncyclic raising) is not possible, hence a fortiori not necessary.
Arguments to this effecthave been proposed (Kawashima and Kitahara 1994, Erich Groat, personalcommunication), based on two assumptions: (1) that c-command plays acrucial role in determining linear order in accord with Kaynes theory ofordering, to which we will return in section 4.8; (2) that the only relationsthat exist for CHL are those established by the derivational process itself(Epstein 1994). Specifically, in Epsteins theory, c-command is just therelation that holds between and elements of when is attached to by Merge or Move. If so, then if is attached to an embedded category byan operation, will enter into no c-command relation with any higherelement , so no order is established between and and the derivation crashes at PF. It follows that overt operations are never of the morecomplex type that involves an embedded category, hence must be cyclic and


must be raising (not lowering) operations. But covert operations might not meet these conditions, the ordering requirement being irrelevant for LF convergence.33

There are a number of questions about checking theory that remain open, which we will have to resolve as we proceed. One has to do with checking in structure (18), where F is a functional category with α adjoined.

(18) [tree diagram: F with α adjoined; DP in the checking domain of both α and F]

Suppose F is Agr and α is T or V, and we are interested in Case checking. DP is in the checking domain of both α and F. Its Case could therefore be checked either by features of α or by features of F. It is reasonable to suppose that α has the Case-assigning feature as an intrinsic property: either listed in its lexical entry or determined by the entry. It would suffice, then, to consider only the checking domain of α to determine whether the Case of DP has been properly selected (accusative if α is V, nominative if α is T). A more complex alternative is that F and α necessarily share all features relevant to checking, and that DP is checked against F. That would mean that in the numeration, F has the Case-assigning property, which matches that of α. As usual, I will adopt the simpler assumption in the absence of evidence to the contrary. We will see later that the conclusion is also empirically well motivated (also as usual, an interesting fact).

Whether the same conclusion holds for φ-features is a matter of fact, not decision, depending on the answers to a question raised in section 4.2.2: how are φ-features associated with verbs and adjectives? I have tentatively assumed that the assignment is optional, in the transition from lexicon to numeration. If so, then Case and φ-features function alike; they are not properties of the functional category Agr (F, of (18)), and there is no matching relation between F and α. I will continue to assume, then, that the features of T and V that check Case or φ-features of DP appear only in these categories, not in Agr, which lacks such features. The decision is of little relevance until section 4.10.

4.4.2 Projection of Target

Consider again (11), embedded in higher inflectional categories, as in (19) (I = [AgrS, T]).

(19) [IP I [AgrOP AgrO [VP [DP the man] [V′ saw it]]]]

We assume that the DP the man then raises overtly, targeting IP, while it raises covertly, targeting AgrOP, each of the raised elements becoming the specifier of the targeted category. As noted, there is another option: the phrase that is raised might itself have projected. If it is internally complex, then it becomes an X′, with the AgrP its specifier; if it is an X⁰, then it becomes the head of the new projection, with the AgrP its complement. Thus, if the raised DP projects, the targeted IP becomes the specifier of the D head of the man, which is now a D′; and the targeted AgrOP becomes the complement of it, which heads a new DP. After raising and projecting, neither the man nor it is a DP, as they were before the operation took place. In pre-Minimalist Program work, these obviously unwanted options were excluded by conditions on transformations and stipulated properties of X-bar theory. But we no longer can, or wish to, make recourse to these, so we hope to show that the conventional assumptions are in fact derivable on principled grounds: that it is impossible for Move to raise α, targeting K, then projecting α rather than K.

Note that these questions arise only for Move, not Merge, for which the conclusion is true by definition. Let us begin with substitution, turning to adjunction later.

Recall the guiding assumption: movement of α targeting K is permitted only if the operation is morphologically driven, by the need to check some feature (Last Resort). This idea can be formulated in a number of ways. Consider three interpretations of Last Resort, adapted from the literature.34

(20) α can target K only if
a. a feature of α is checked by the operation
b. a feature of either α or K is checked by the operation
c. the operation is a necessary step toward some later operation in which a feature of α will be checked.35


There are various unclarities here, which will be resolved as we proceed. Recall that Last Resort, however it is finally interpreted, is to be understood as part of the definition of the operation Move, that is, as an attempt to capture precisely the intuitive idea that movement is driven by morphological checking requirements.

Suppose that α raises to target K, forming L = {H(α), {α, K}}, L a projection of α with label H(α) = head of α. Since the operation is substitution, we have two cases to consider.

(21) a. α is the head of L and K its complement.
b. K is the specifier of H(α).

Suppose case (21a). The operation is not permitted under versions (20a) or (20b) of Last Resort. No property P can be checked in the head-complement structure that has been formed. The operation might be allowed only under interpretation (20c): raising of α is permitted so as to allow it to reach some position from which it can then raise further to target K, where P will be satisfied. However, any position accessible from the newly formed position of head of [L α K] would also have been accessible from the position of its trace; the HMC cannot be overcome in this way, nor can any other condition, it seems. Thus, the case should not arise. Since I will later propose a version of Last Resort that excludes this option on principled grounds, eliminating (20c), I will not pursue the questions any further here.

Projection of it after it raises to [Spec, AgrO] is therefore barred in (19), along with many other cases; thus, it is impossible to raise V, targeting K, V projecting to form [VP V K] with head V and complement K.

The only possibility, then, is case (21b): after raising, K = [Spec, H(α)]. α is therefore a nontrivial maximal projection (otherwise, K would be its complement); in the chain CH = (α, t(α)) formed by the operation, t(α) is an Xᵐᵃˣ. But α projects after raising, so it is an X′ category. This new X′ category cannot itself be moved further, being invisible to the computational system C_HL; thus, interpretation (20c) is not directly relevant, and in any event, we may put the matter aside, anticipating elimination of this option. Keeping to (20a) and (20b), the question is whether the operation forming L = {H(α), {α, K}}, with K = [Spec, H(α)] and α heading the chain CH, is legitimate. We hope to show that it is not.

There are two lines of argument that could bar this operation. One approach is to question the legitimacy of the chain CH = (α, t(α)) that is formed by the operation. In fact, CH violates the uniformity condition (17), which requires that α and t(α) have the same phrase structure status, since t(α) is maximal and α is not.
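The uniformity argument can be made concrete with a minimal sketch. The names `Status` and `is_uniform` are illustrative assumptions, not notation from the text, and the three-way enumeration is a simplification: it ignores the possibility that an item is maximal and minimal at once.

```python
from enum import Enum


class Status(Enum):
    """Phrase structure status of a chain member (simplified, hypothetical encoding)."""
    MAXIMAL = "maximal"
    MINIMAL = "minimal"
    NEITHER = "neither"


def is_uniform(chain):
    """Condition (17): every member of the chain has the same phrase structure status.

    `chain` is a list of (name, Status) pairs: the raised element, then its trace(s).
    """
    return len({status for _, status in chain}) <= 1


# Case (21b): the raised element projects, so it is an X' (neither maximal nor
# minimal), while its trace is maximal.
raised_projects = [("alpha", Status.NEITHER), ("t(alpha)", Status.MAXIMAL)]

# The wanted outcome: the target projects, so the raised element stays maximal.
target_projects = [("alpha", Status.MAXIMAL), ("t(alpha)", Status.MAXIMAL)]
```

Under this encoding `is_uniform(raised_projects)` is false and `is_uniform(target_projects)` is true, which mirrors the conclusion that the chain formed when the raised element projects is illegitimate.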


Assuming this condition, then, we conclude that nonmaximal α cannot raise by substitution, targeting K, whichever element projects. If K projects, then α is maximal and its trace is not, violating the uniformity condition. If α projects, the uniformity condition is satisfied but K is the complement of α, and the operation is barred under case (21a). The unwanted interpretations of (19) are thus ruled out, by case (21a) for DP = D and by case (21b) for DP ≠ D. Similarly, the D head of nontrivial DP cannot raise, targeting XP (say, an AgrP), leaving the residue of the DP behind, whether D or the target projects; and the V head of nontrivial VP cannot raise, targeting K, whether V or K projects.

A different approach to case (21b) considers not the chain CH but the structure L that is formed by projection of the raised element α. In L, the target K = [Spec, H(α)] is in the checking domain of H(α), but α and its constituents are not in the checking domain of H(K) (the head of K). Returning to (19), if the man raises, targeting IP, and the raised DP projects, then the elements of IP are in the checking domain of the D head of the man, but the man is not in the checking domain of the head of IP. We might ask whether a proper checking relation is established in this case.

To answer this question, we have to resolve an ambiguity about checking that has not been sorted out carefully. The intuitive idea is that operations involving Case and agreement are asymmetric. The traditional intuition is that a verb assigns Case to its object, not conversely. That asymmetry is carried over only in part to some of the earlier approaches based on government: a transitive verb assigns Case to the DP it governs, and the head agrees with its specifier for checking of φ-features, but nominative Case is assigned in the Spec-head relation. With a uniform interpretation of Case as a Spec-head relation, falling together with agreement, the asymmetry intuition is again expressible, but not captured. The informal description is that the V or T head checks the Case of the DP in Spec, not that the DP checks the head; and the φ-features of the head are determined by those of the DP in Spec: the verb agrees with the subject, not the subject with the verb. Similarly, it has been standard to speak of the wh-phrase raised to [Spec, C] as being licensed by a Q-feature of the complementizer C, not of the latter being licensed by the raised wh-phrase. The intuitive basis for the distinctions is fairly clear. Case is an intrinsic property of a verb or I element, not of the DP that receives Case in a certain position; and φ-features are properties of the DP, not of the verb or adjective that assumes these features in a Spec-head configuration. The question we now face is whether these intuitions actually play a role in the computational process C_HL or whether, like others of ancient vintage (grammatical construction, etc.), they dissolve into taxonomic artifacts.


If the Spec-head relation really is asymmetric in the manner supposed in informal description, then the fact that K is in the checking domain of H(α) in the construction L formed by raising the moved element α would not establish a checking relation, since it is H(K) that must check the features of α, while H(α) cannot check the features of K. If the intuition is not relevant to C_HL, then a checking relation is established and the illegitimate construction L is not barred on these grounds.

These two approaches are not logically equivalent, but they overlap for standard cases, forcing the target to project. The redundancy suggests that at least one is incorrect. The approach in terms of uniformity has advantages: it extends to other cases and is conceptually much simpler, and we do not have to introduce a notion of asymmetry expressed ultimately in terms of intrinsic properties of heads, and not in any simple way, as will become even clearer below. There is, so far as I know, no reason to suppose that the property [intrinsic] plays any role in C_HL. For these reasons, I will assume the uniformity approach.

Summarizing, there are good reasons why the target, not the raised element, should project under substitution.

In special cases there are other arguments that lead to the same conclusion. Keeping to substitution, suppose that the target K is a constituent of the category N = {H(K), {K, M}} projected from K. Note that K is an X⁰ category; otherwise, it is an X′ category, not a visible target. Thus, K is either H(K) or an X⁰ projection of H(K) formed by adjoining elements to H(K), with M its complement.

Suppose that α raises, targeting K and projecting to form L = {H(α), {α, K}}. L now replaces the term K in N, forming N′ = {H(K), {L, M}}. The head of N′ is H(K), the head of L is H(α), and the head of M is H(M), all distinct; the head of N′ is distinct from the head of either of its constituents. N′ is not a legitimate syntactic object of the sort permitted by the recursive procedure (5). The derivation therefore is canceled.

The argument extends directly to adjunction of α to K, so that the label of L is ⟨H(α), H(α)⟩. It is still the case that replacement of the term K in N by L yields an illegitimate syntactic object. This case is of particular interest in the light of the role of X⁰-adjunction in computation, which becomes even more central as we proceed.

To summarize, we have the following answers to the question about projection after Move. Suppose that α, K are categories in Σ, and α raises to target K. If the operation is substitution, K must project for convergence, whether K is embedded in Σ (covert raising) or K = Σ (overt raising). Suppose the operation is adjunction to H⁰ within N = {H, {H⁰, M}} = [HP H⁰, M], with head H


and complement M. Again, the target X⁰ must project when α adjoins to H⁰. Furthermore, adjunction of nonmaximal α to XP (including root Σ) is barred by the uniformity condition.

The only cases still not covered are adjunction of α to K in an adjunction structure N = [K, M], where M is adjoined to nonminimal K or M projects. In such cases either α, K, and M are all nonminimal (XPs), or they are all nonmaximal (X⁰s); we will return to details. We will see that YP-adjunction to XP has a dubious status, and this marginal case even more so. We therefore ignore it, restricting attention to N = [K, M] with α, K, and M all X⁰s, and M projecting to form L = {⟨H(M), H(M)⟩, {K, M}}. This case violates plausible morphological conditions. K can adjoin to M only if M has a morphological feature [–K] requiring it to take K as an affix. If [–K] is strong, adjunction of K is overt; otherwise, it is covert. Suppose that α adjoins to K and projects, forming N = {⟨H(α), H(α)⟩, {K, α}}. N replaces K in L, forming L′. The category L′ is a legitimate syntactic object, in this case, but the feature [–K] of M is not satisfied, since M now has α rather than K as its affix, assuming, now, that K and α differ morphologically in a relevant way. The derivation therefore crashes, for morphological reasons. If so, this case does not exist.

There are, then, fairly solid grounds for assuming that the target of movement projects, whether the operation is substitution or adjunction, overt or covert.

4.4.3 Last Resort: Some Problems

So far we have found no special reason to adopt interpretation (20c) of Last Resort, the version that is most problematic. This was one component of the principle Greed assumed in the preceding chapters and Chomsky 1994a, holding in the form (22).

(22) Move raises α only if morphological properties of α itself would not otherwise be satisfied in the derivation.

On this assumption, we cannot, for example, derive (23b) from (23a) by raising, with the meaning (23c), violating Greed to satisfy the EPP (the strong D-feature of I); and (24a) cannot be interpreted as something like (24b), with covert raising.36

(23) a. seems [(that) John is intelligent]
b. *John seems [(that) t is intelligent]
c. it seems (that) John is intelligent

(24) a. *there seem [(that) [A a lot of people] are intelligent]
b. it seems (that) a lot of people are intelligent


Assuming Greed, the unwanted computations are barred; all relevant properties of John and a lot of people are satisfied without raising.

These computations are, however, legitimate under interpretation (20b) of Last Resort, since a feature of the target K is satisfied by the operation: in (23) the strong D-feature of matrix I (EPP) is checked by raising of the DP John; and in (24) covert raising of the associate A satisfies the Case or agreement features of matrix I (or both), or some property of there.

Raising of α targeting K is barred by (22) unless some property of α is satisfied by its moving to, or through, this position, and that property would not have been satisfied had this operation not applied. Consistent with Greed, such movement would be permitted if there were no other way for α to reach a position where its features would eventually be satisfied. One can think of circumstances under which this eventuality might arise, though computational problems are not inconsiderable. Instead of pursuing the matter, let us see if there is a simpler way to proceed.

4.4.4 Move F

The proper formulation of Greed poses complicated questions, which, one might suspect, do not arise if the theory of movement is properly formulated. Let us therefore consider a narrower conception that eliminates the whole range of options permitted by interpretation (20c) of Last Resort, thereby avoiding these questions entirely.

So far I have kept to the standard assumption that the operation Move selects α and raises it, targeting K, where α and K are categories constructed from one or more lexical items. But on general minimalist assumptions, that is an unnatural interpretation of the operation. The underlying intuitive idea is that the operation Move is driven by morphological considerations: the requirement that some feature F must be checked. The minimal operation, then, should raise just the feature F: we should restrict α in the operation Move α to lexical features. Let us investigate what happens if we replace the operation Move α by the more principled operation Move F, F a feature.

We now extend the class of syntactic objects available to the computational system. Along with those permitted by the procedure (5), we allow also (25).

(25) K = {γ, {α, β}}, where α, β are features of syntactic objects already formed.

The extension holds only for Move; it is vacuous for Merge. So far we have considered only one case of the form (25), namely, K = {γ, {F, β}}, where F is raised to target β. We will see that the extension is even narrower: if α raises


to target β, then β must be a full-fledged category and α may (and in a certain deeper sense must) be a feature.

One question arises at once: when F is raised to target K, why does F not raise alone to form {γ, {F, K}}? Suppose that the subject raises to [Spec, IP]. The simplest assumption would be that only the formal features of the head involved in feature checking raise to this position, leaving the rest of the DP unaffected. Why is this not the case? The answer should lie in a natural economy condition.

(26) F carries along just enough material for convergence.

The operation Move, we now assume, seeks to raise just F. Whatever extra baggage is required for convergence involves a kind of generalized pied-piping. In an optimal theory, nothing more should be said about the matter; bare output conditions should determine just what is carried along, if anything, when F is raised.

For the most part (perhaps completely) it is properties of the phonological component that require such pied-piping. Isolated features and other scattered parts of words may not be subject to its rules, in which case the derivation is canceled; or the derivation might proceed to PF with elements that are unpronounceable, violating FI. There may be a morphological requirement that features of a single lexical item must be within a single X⁰ (see McGinnis 1995). In any event, properties of the phonological component have a major (perhaps the total) effect on determining pied-piping.

To take a concrete example, suppose that the words who, what have three components: the wh-feature, an abstract element underlying indefinite pronouns, and the feature [±human].37 Suppose interrogative C (= Q) is strong, as in English. The wh-feature cannot overtly raise alone to check Q because the derivation will crash at PF. Therefore, at least the whole word who, what will be pied-piped in overt raising. Suppose that who appears in the phrase whose book, which we assume to have the structure (27), with D the possessive element and book its complement.

(27) [DP who [D′ [D 's] book]]

Suppose that Move F seeks to raise the wh-feature to check strong Q, pied-piping who and leaving the residue -'s book. That too crashes at PF (at least).


And whose cannot raise because it is not a syntactic object at all, hence not subject to movement. Therefore, the smallest category that can be raised by the operation Move [wh-] in this case is the phrase whose book, though as far as the computational procedure is concerned, it is only the feature [wh-] that is raising; the rest is automatically carried along by virtue of the economy condition (26).

PF convergence is determined in this case by a morphological property of the determiner D = Possessive. Suppose that these properties of D did not bar extraction of the wh-phrase. Then violation of the Left-Branch Condition should be permitted. Uriagereka (1988) found a correlation between left-branch extraction and richness of D, in a sense he characterizes: the Left-Branch Condition holds for languages with D rich. The correlation follows, he observes, if the reasoning outlined here is correct.

Just how broadly considerations of PF convergence might extend is unclear, pending better understanding of morphology and the internal structure of phrases. Note that such considerations could permit raising without pied-piping even overtly, depending on morphological structure, as in the theory of overt raising of empty operators in Japanese developed by Watanabe (1992).

Pied-piping might in principle depend as well on factors that constrain movement: barriers, Empty Category Principle (ECP) considerations, the Minimal Link Condition (MLC) that requires shortest moves, or whatever turns out to be the right story for this much-studied but still murky area. In the case of all such principles, one open question has been whether violation causes a derivation to crash or allows it to converge as deviant (say, a Subjacency violation vs. an ECP violation). The question could have an answer in the terms now being considered. Thus, if pied-piping is forced by the need to satisfy some principle P, we conclude that violation of P causes the derivation to crash so that it does not bar less economical derivations without pied-piping: for example, the principle P that sometimes bars preposition stranding.

Any further elaboration would be a departure from minimalist assumptions, hence to be adopted only insofar as that is forced on empirical grounds: never, in the best case. A host of problems arise that look difficult. The basic task is to determine how much of a departure (if any) is required from these optimal assumptions to account for generalized pied-piping; how PF and LF considerations enter into the picture; what these considerations imply about the structure of phrases and the status and nature of conditions on movement; and how language variation is determined.

As noted by Hisa Kitahara and Howard Lasnik, the proposed economy principle provides a further rationale for the principle Procrastinate: nothing


at all is the least that can be carried along for convergence, and that is possible only if raising is covert, not entering the phonological component.

Consider now the case of covert movement. Questions of PF convergence do not arise, so generalized pied-piping could only be required by conditions on movement. Earlier discussion of Move assumed that the principles that govern the operation hold only for categories, since only categories were assumed to move. If that happens to be true, then these principles hold only of overt movement, which has to carry along whole categories for PF convergence. The conclusion could well be true for other reasons even if the assumption is false. If the conclusion is true (for whatever reason), then covert raising is restricted to feature raising. The operation Move F carries along excess baggage only when it is heard in the phonetic output. I will assume that to be the case. The assumption accords well with the general minimalist perspective, and it has no obvious empirical flaw.

We tentatively assume, then, that only PF convergence forces anything beyond features to raise. If that turns out to be the case, or to the extent that it does, we have further reason to suspect that language imperfections arise from the external requirement that the computational principles must adapt to the sensorimotor apparatus, which is in a certain sense extraneous to the core systems of language as revealed in the N → λ computation.

When the feature F of the lexical item LI raises without pied-piping of LI or any larger category α, as always in covert raising, does it literally raise alone or does it automatically take other formal features along with it? There are strong empirical reasons for assuming that Move F automatically carries along FF(LI), the set of formal features of LI. We therefore understand the operation Move F in accord with (28), where FF[F] is FF(LI), F a feature of the lexical item LI.

(28) Move F carries along FF[F].

This much pied-piping is automatic, reflecting the fact that Move relates to checking of formal features. Broader pied-piping is as required for convergence (extraneous, insofar as PF convergence is the driving factor, which we tentatively assume to mean always).

Applied to the feature F, the operation Move thus creates at least one and perhaps two derivative chains alongside the chain CH_F = (F, t_F) constructed by the operation itself. One is CH_FF = (FF[F], t_FF[F]), consisting of the set of formal features FF[F] and its trace; the other is CH_CAT = (α, t_α), a category α carried along by generalized pied-piping and including at least the lexical item LI containing F. CH_FF is always constructed, CH_CAT only when required for convergence. The computational system C_HL is really looking at CH_F, but


out of the corner of its eye it can see the other two as well. Each enters into operations. Thus, CH_CAT determines the PF output, and CH_FF enters into checking operations in a manner to which we will return. As noted, CH_CAT should be completely dispensable, were it not for the need to accommodate to the sensorimotor apparatus.

The empirical questions that arise are varied and complex, and it is easy enough to come up with apparent counterevidence. I will put these problems aside for now, simply assuming the best outcome, namely, that UG settles the matter: hardly an innocuous step, needless to say. I assume, then, that the operation Move raises F and derivatively raises FF[F] as well, carrying along a phrase containing F only when the movement is overt, as required for convergence. The general approach is natural if not virtually obligatory on minimalist grounds, and it confers a number of advantages, as we will see.

Note that we continue to rely on the assumption that only convergent derivations are compared for economy: that the admissible derivations D_A are a subset of the convergent ones D_C. Thus, raising without pied-piping is more economical in some natural sense, but that is irrelevant if the derivation does not converge.

We have already considered a special case that resembles the economy principle (26): namely, such operations as wh-movement. As discussed in section 3.5, the entire wh-phrase need not raise covertly for feature checking and scope determination, and perhaps does not; thus, we found good reason to believe that nothing more than how many raises covertly from within the phrase how many pictures of John. A natural extension of that analysis is that only the wh-feature raises in the covert operation, the rest of the phrase remaining in situ.

The revision of Move α to Move F extends this reasoning to all cases. It also permits a way to capture the essence of Last Resort (to be revised) as a property of the operation Move F.

(29) F is unchecked and enters into a checking relation.

Thus, the variable F in Move F ranges over unchecked features, and the result of the operation is that it enters into a checking relation, either checking a feature of the target or being checked itself.38

We are now tentatively assuming that if all features of some category α have been checked, then α is inaccessible to movement, whether it is a head or some projection. But if some feature F is as yet unchecked, α is free to move. Economy conditions exclude extra moves and anything more than the minimal pied-piping required for convergence. In covert movement, features raise alone. Procrastinate expresses the preference for the covert option.
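The economy condition (26), minimal pied-piping subject to convergence, can be pictured as a search over candidates ordered from smallest to largest. The sketch below is only an illustration under stated assumptions: the function name `smallest_convergent` is hypothetical, and the convergence table is stipulated to match the whose book example discussed for (27) rather than derived from any theory of PF.

```python
def smallest_convergent(candidates, converges):
    """Return the first (smallest) candidate for which the derivation converges.

    `candidates` must be ordered from least to most material carried along;
    `converges` is a caller-supplied predicate (an assumption of this sketch).
    Returns None if no candidate converges.
    """
    for candidate in candidates:
        if converges(candidate):
            return candidate
    return None


# Candidates for raising the wh-feature out of "whose book", smallest first.
candidates = ["[wh-]", "who", "whose book"]

# Overt raising (stipulated from the text): the bare feature crashes at PF,
# "who" alone leaves the unpronounceable residue "-'s book", and only the
# full phrase converges.
overt_pf = {"[wh-]": False, "who": False, "whose book": True}
overt_choice = smallest_convergent(candidates, lambda c: overt_pf[c])

# Covert raising: PF convergence is not at issue, so the feature raises alone.
covert_choice = smallest_convergent(candidates, lambda c: True)
```

With these stipulations `overt_choice` is the phrase whose book and `covert_choice` is the bare feature, which is the asymmetry that Procrastinate is said to express a preference over.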


This simple and natural reinterpretation of Move α, already motivated for wh-movement, allows us to eliminate the complexities of interpretation (20c) of Last Resort entirely, a welcome outcome. We can dispense with (20b) as well: the raised feature F must enter into a checking relation, which is only possible if the target K has an as-yet-unchecked feature. Thus, we have a very narrow and restrictive interpretation of Greed, incorporated straightforwardly into the definition of Move. We will return to further improvements, resolving ambiguities and imprecision and bringing in crucial properties so far overlooked.

In the cases just discussed, the intended consequences follow directly. The first problem was to bar movement of α targeting K, with α rather than K projecting. The argument given before carries over without change. We therefore retain the conclusion that the target projects. Such cases as (23)-(24) also fall into place: though the target is legitimate, having an unchecked feature, the category to be raised is invisible to movement, having no unchecked features (an analysis to be revised below). Raising under ECM is permitted if some feature is checked: the strong feature of the embedded I that yields the EPP, in this case. The principle of Greed seems dispensable, except insofar as it is incorporated within (29).

Consider successive-cyclic wh-movement. It is allowed under this approach only where there is a morphological reflex. Sometimes this is visible at PF, as in Irish and Ewe (see Collins 1993); it remains an open question whether such visibility is only an accident of morphology, revealing the workings of a process that is more general, perhaps universal, even if morphological reflexes are not detected in the PF output.

Adjunction to nonminimal XP is now barred unless some feature is thereby checked (see Oka 1993, for development of this possibility); successive-cyclic adjunction is even more problematic. The condition could be restricted to A-movement, or perhaps modified to include satisfaction of the MLC alongside feature checking, though consequences are rather complex. See Collins 1994b for further discussion; we will return to questions about XP-adjunction.

Among the matters still to be clarified is the status of the MLC. The preferred conclusion would be that the MLC is part of the definition of Move: Move F must observe this condition, making the shortest move permissible. If that can be established, it will sharply reduce the computational complexity of determining whether a particular operation OP in the course of a derivation is legitimate. In contrast, if the MLC is an economy condition selecting among derivations, OP will be permissible only if no other convergent derivation has shorter links. It is hard to see even how to formulate such a condition, let alone


apply it in some computationally feasible way; for example, how do we

compare derivations with shorter links in different places? But the questiondoes not arise if violation of the MLC is not a legitimate move in the firstplace. Following the usual minimalist intuition, let us assume that violation ofthe MLC is an illegitimate move, exploring the issues as we proceed.Suppose that F raises, carrying along the rest of a category , targeting K.By version (29) of Last Resort, the operation is permitted only if it satisfies achecking relation. We therefore have to have an elementary way to determinethe features of and K that enter into this checking relation, no matter howdeeply embedded these are in and K.For the raised element , the question does not arise. It is the feature F itselfthat must enter into the checking relation, by (29); other features of FF[F] mayalso enter into checking relations as free riders, carried along in the derivative chain CHFF = (FF, tFF) that is automatically constructed, but that is easilydetectable, given F. If a checking relation is established by merger of in thechecking domain of , then the relevant features in the new checking domainare those of the head of , which are immediately determined by its label.39Questions arise, then, only with regard to the category K that is the target ofmovement, gaining a checking domain (either an adjunct or a specifier) byvirtue of the operation.Suppose that K = {, {L, M}} is the target of movement. Then a feature FKof K may enter into a checking relation if it is within the zero-level projectionH0max of the head H of K. H and H0max are constructed trivially from the label, which is immediately determined by inspection of K. FK will be a featureeither of H itself, or of some element adjoined to H, and so on; this will besimplified even further in section 4.10. 
Recall that we are keeping to the optimal assumption that not only H but also features adjoined to it can enter into a checking relation with α in the checking domain (see end of section 4.4.1).

For the target, then, determination of the relevant features is also trivial: these are the features associated with the label, which we may call sublabels.

(30) A sublabel of K is a feature of H(K)0max.

That is, it is a feature of the zero-level projection of the head H(K) of K. When Move F raises F to target K, some sublabel of K must legitimize the operation by entering into a checking relation with F, and features of FF[F] may also enter into checking relations with sublabels of K as free riders.

The features that legitimize the operation raising α to target K are therefore determined straightforwardly, however deeply embedded they may be in α and
K: for example, the wh-feature in pictures of whose mother did you think were on the mantelpieces. The computation "looks at" only F and a sublabel of K, though it "sees" more. The elementary procedure for determining the relevant features of the raised element α is another reflection of the strictly derivational approach to computation.

To take a concrete case, suppose that at LF the head of IP is T with the verbal element α adjoined to it, assuming all other functional categories to be irrelevant at this point (we will return to this assumption).

(31) [T α T]

The operation Move F forming (31) raises the categorial feature v of the verb V, carrying FF[v] along automatically in a derivative chain. If the operation is covert, as in English, then nothing else happens: α in (31) is v. If the operation is overt, as in French, then α is V itself, pied-piped to allow convergence at PF. In either case v must enter into a checking relation with the affixal feature [−v] ("takes verbal affix") of T, and any other feature of FF[v] can in principle enter into a checking relation with a feature of T (= T0max, before raising, by assumption). The sublabels of IP, so formed, are the features of T and of α.

Similarly, when the Case feature of LI is raised by Move F, so are the φ-features of LI, and any of these free riders may also enter into a (derivative) checking relation with a sublabel of the target. For example, raising of DP for Case checking carries along φ-features, which may happen to check agreement features of the target. We will return to various consequences.

Bringing these ideas together, we have the following theory of the operation Move. Move raises feature F to target K in Σ only if (32) holds, with (33a) and (33b) as automatic consequences, and (33c) a further consequence (assumed, but not fully established).

(32) a. F is an unchecked feature.
     b. F enters into a checking relation with a sublabel of K as a result of the operation.

(33) a. FF[F] raises along with F.
     b. A category α containing F moves along with F only as required for convergence.
     c. Covert operations are pure feature raising.

Other features of FF[F] may check a sublabel of K as free riders. (32a) and (32b) incorporate Last Resort.
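As an informal illustration only, the interplay of (30) and (32) can be sketched in Python. Everything here is an assumption made for the sketch rather than part of the theory: the class names `Feature` and `Head`, the helper `can_check`, and in particular the toy checking relation, which simply matches features of the same kind.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Feature:
    kind: str              # toy label such as "D", "Case", or "phi"
    checked: bool = False  # has this feature already been checked?


@dataclass
class Head:
    features: List[Feature]                                # features of H itself
    adjoined: List[Feature] = field(default_factory=list)  # features adjoined to H


def sublabels(head: Head) -> List[Feature]:
    """(30): the sublabels of K are the features of H(K)0max."""
    return head.features + head.adjoined


def can_check(f: Feature, sublabel: Feature) -> bool:
    """Toy checking relation (an assumption): features of the same kind match."""
    return f.kind == sublabel.kind


def move_permitted(f: Feature, target: Head) -> bool:
    """(32): F is unchecked and checks some sublabel of the target."""
    return (not f.checked) and any(can_check(f, s) for s in sublabels(target))


def free_riders(f: Feature, ff: List[Feature], target: Head) -> List[Feature]:
    """Other members of FF[F] that happen to check a sublabel of the target."""
    return [g for g in ff
            if g is not f and any(can_check(g, s) for s in sublabels(target))]
```

On this sketch, only F and the sublabels of the target are inspected when deciding whether the operation is permitted; the rest of FF[F] is consulted only to find free riders, mirroring the remark that the computation "looks at" only F and a sublabel of K.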


Let us turn now to several issues that come to the fore when we formulate movement theory as Move F.

The recursive step in the definition (5) of admissible objects permitted the construction of L = {γ, {α, β}}, where α, β are syntactic objects and γ is the label of L. In earlier discussion we kept to the case where α, β are lexical items or larger phrases constructed from them, but we have now been considering a more general case, with the variables allowed to range over features as well. Specifically, we allow an object L = {γ, {F, K}}, F a feature, formed by raising F to target K without pied-piping of a category α. Several questions arise, including the following:

(34) a. Can the operation be substitution?
     b. Must the target project?
     c. Can K be a feature rather than a category?

The answers depend on how we interpret such notions as "Xmax" and "head," which so far have been defined only for phrases constructed from lexical items, not for features. But these notions have no clear sense for features. Suppose, then, that the definitions given earlier carry over without change. If so, the questions of (34) are settled.

Suppose the feature F is raised, targeting K and forming {γ, {F, K}}. The answer to question (34a) is negative. F cannot become a complement for reasons already discussed. It must therefore be a specifier of K, hence an Xmax by definition. The statement is meaningless if the notions Xmax (etc.) are not defined for X a feature. If they were defined, then F would be a new kind of Xmax and the chain formed would violate the uniformity condition (17), under any natural interpretation. In either case, then, the operation must be adjunction of F to K. Move F can be substitution only in overt movement, with a category pied-piped for convergence.

As for question (34b), the target must project; γ cannot be (or be constructed from) the head of F if the notion "head of F" is not defined.
Question (34c) is also answered: K cannot be a feature; if it is, the object constructed will have no label.

On plausible assumptions, the class of permissible objects is extended only very slightly by extending (5) to (25), permitting the variables α, β to range over features in the recursive step of the characterization of syntactic objects K = {γ, {α, β}}. In fact, the only new objects allowed are those formed by covert adjunction of features to a head, which amounts to saying that a formal property of a lexical item can covertly enter the checking domain of a category, the question of PF convergence being irrelevant.40 Furthermore, we see that the target always projects in this case, as desired.


From a more fundamental point of view, the class of permissible objects is radically limited by these revisions. The only "real" syntactic objects are lexical items and L = {γ, {F, K}}, F a feature, K a projecting category, and γ constructed from H(K). This view captures rather closely the concept of movement (transformations) toward which work in generative grammar has been converging for many years: "last resort" operations driven by morphological requirements, which vary within a narrow range, yielding crucial typological differences. Other objects are formed only as required for convergence, perhaps only PF convergence, illustrating again the "extraneous" character of the link to sensorimotor systems.

Suppose that the target K is nonminimal. A reasonable conjecture is that the object formed, with a feature adjoined to a pure (nonminimal) maximal projection, would be uninterpretable at LF; independently, we will see that there are empirical reasons to suppose that an element adjoined to nonminimal K is not in the checking domain of its head H(K), so that the operation would be barred by Last Resort. Assuming this, we conclude that pure feature raising (hence all covert raising) is adjunction of a feature to a head, which projects. The only new objects L = {γ, {F, K}} allowed are those constructed by adjoining the feature F to the head K, which projects, so that γ gives the type of K.

We have already found that in the case of category movement, the target always projects. The conclusion is now general, covering all cases of movement, with questions remaining only for YP-adjunction to XP.

The picture is very simple and straightforward, and the arguments follow on assumptions that seem conceptually natural and in accord with the Minimalist Program. If it is close to accurate, then human language is surprisingly close to "perfect," in the sense described. Whether the conclusions are empirically correct is another question, hardly a trivial one.

4.4.5 Covert Raising

The shift of perspective just outlined has broader consequences. In the case of wh-movement, if the operator feature [wh-] is unchecked, it raises to an appropriate position, covertly if possible (by Procrastinate) and thus without pied-piping. If raising is overt, then pied-piping will be determined (we hope) by PF convergence and morphological properties of the language. Similarly, if the grammatical object Obj raises for checking of Case or some other formal feature, then the features FF[F] of its head raise derivatively, and the operation carries along a full category only if the movement is overt. If raising is overt, then Obj becomes [Spec, AgrO]. If it is covert, then the features FF[F] raise alone, adjoining to AgrO, which has V (or its relevant features) already adjoined to it.41


The same should hold for raising of subject Subj. Its unchecked features are eligible for raising. The operation is substitution with pied-piping if overt (say, to satisfy the EPP), and it is adjunction to the appropriate head without pied-piping if covert (perhaps in VSO languages).

Subj and Obj should function in much the same way at LF whether they have raised overtly as full categories or covertly as features. In either case the raised element contains at least FF(LI), LI the head of Subj or Obj. FF(LI) includes the categorial feature of the nominal phrase and should have argument (A-position) properties, including the ability to serve as a controller or binder. In their reanalysis and extension of observations of Postal (1974), Lasnik and Saito (1991) argue that this is true for object raising: Obj raised covertly to [Spec, AgrO] for Case checking has basically the same properties as an overt object in this regard, as illustrated for example in (35), with judgments somewhat idealized.

(35) a. the DA [proved [the defendants to be guilty] during each other's trials]
     b. *the DA [proved [that the defendants were guilty] during each other's trials]
     c. the DA [accused the defendants during each other's trials]

For the conclusions to carry over to the Move F theory, it must be that the features adjoined to AgrO also have A-position properties, c-commanding and binding in the standard way. There is every reason to assume this to be true.

Consider such expletive constructions as (36a-b).42

(36) a. there is a [book missing from the shelf]
     b. there seem [t to be some books on the table]

Agreement is with the associate of the expletive (namely, book-books), which in our terms requires that the φ-features of the associate raise to the checking domain of matrix I. But the operation is covert. Therefore, it is not the associate that raises but its unchecked features, leaving the rest in situ. The natural assumption, once again, is that these features adjoin to I, not to its specifier there.43

Interpretations of (37) would therefore be roughly as in the paired cases (38), though only roughly, because on this analysis, only formal features of the associate raise, leaving its semantic features behind.

(37) a. there is considerable interest (in his work)
     b. there aren't many pictures (on the wall)
     c. there are pictures of many presidents (on the wall)


(38) a. interest is considerable (in his work)
     b. pictures aren't many (on the wall)
     c. pictures are of many presidents (on the wall)

Similarly in other cases. The general conclusions about expletive constructions follow. Specifically, the associate must have unchecked features in order to be accessible for raising, so that we account for such standard examples as (24a) (see also note 36); we will return to some other locality effects. The HMC is largely inoperative, however it is understood to apply to feature movement.44

It also follows that the expletive there cannot have checked all the features of I; if it had, I would not be a legitimate target for the associate. Plainly, there checks the strong feature of I (EPP); otherwise, expletive constructions such as (37) would not exist in the first place. But there must lack Case or φ-features, or both; otherwise, all features of I will be checked and the associate will not raise. There will be no way to express agreement of matrix verb and associate; (39a) will have the same status as (39b).

(39) a. *there seem to be a man in the room
     b. there seems to be a man in the room

Covert raising to AgrS places the features of the associate in a structural position with the essential formal properties of [Spec, AgrS]. We therefore expect the associate to have the binding and control properties of the overt subject, analogously to the case of covert object raising to AgrO (see (35)). The issues take a somewhat sharper form in a null subject language. Here we expect that the counterparts to such expressions as (40) should be admissible, contrasting with (41).

(40) a. there arrived three men (last night) without identifying themselves
     b. there arrived with their own books three men from England

(41) a. *I met three men (last night) without identifying themselves
     b. *I found with their own books three men from England

That appears to be correct. Thus, we find the following contrasts between Italian (42a-b) and French (42c-d):45

(42) a. sono entrati tre uomini senza identificarsi
        are entered three men without identifying themselves
        "three men entered without identifying themselves"
     b. ne sono entrati tre t senza dire una parola
        of.them are entered three without saying anything
        "of-them three entered without saying anything"


     c. *il est entré trois hommes sans s'annoncer
        there is entered three men without identifying themselves
     d. *il en est entré trois t sans s'annoncer
        there of.them is entered three without identifying themselves

In Italian, with null subject expletive sharing the relevant properties of English there, LF raising to I of Subj (actually, its formal features) assigns A-position properties to Subj for binding and control, including the case of ne-extraction that makes it clear that Subj is overtly in the internal domain of the verb ("object position," basically). In French, with the full NP expletive il analogous to English it, the LF operation is barred, all features of the matrix I-phrase, the potential target, having already been checked by the expletive. Accordingly, there is no covert raising, hence no binding or control.

Consider the German analogue (43).

(43) es sind gestern viele Leute angekommen, ohne sich zu identifizieren
     there are yesterday many people arrived without themselves to identify
     "many people arrived yesterday without identifying themselves"

Here agreement is with the associate, not the expletive, and the binding and control properties are as in (42), as predicted.46

Agreement with the associate, then, appears to correlate with matrix-subject binding and control properties for the associate, as expected on the minimalist assumption that Case and agreement are local Spec-head relations and that features raise under Last Resort, covertly if possible. We will return to a closer look at the factors involved.

Note that the entire discussion relies on the assumption that Case and φ-features of a noun N are part of its internal constitution, either intrinsic to it or added optionally as N is selected from the lexicon for the numeration. Therefore, these features form part of FF[N] and function within the "package" of formal features that drive computation, raising as a unit.
We have seen that the conclusion is motivated on independent grounds; it is confirmed by the central role it plays within the computational system, which will be further confirmed as we proceed. Abandonment of the conclusion (say, by taking Case or φ-features of N to be separate lexical categories with their own positions in phrase markers) would cause no slight complication.

Though core predictions appear to be verified, many questions arise. One immediate problem is that the raised associate cannot be a binder in
such expressions as (44), where t is the trace of there (see Lasnik and Saito 1991).

(44) *there seem to each other [t to have been many linguists given good job offers]

We know that this is an expletive-associate construction with associate agreement, as shown by replacing each other with us. That leaves us with an apparent direct contradiction: the associate both can and cannot bind.

The solution to the paradox might lie within binding theory. Suppose that an LF movement approach of the kind mentioned in chapter 3, and developed in detail elsewhere in the literature, proves correct. Then the head of the matrix clause of (44), at LF, would have the structure (45a) or (45b), depending on how covert operations are ordered, where An is the anaphor and α is the X0 complex formed from I and the matrix V.

(45) a. [I An [FF(linguists) α]]
     b. [I FF(linguists) [An α]]

On reasonable assumptions, neither of these structures qualifies as a legitimate binding-theoretic configuration, with An taking FF(linguists) as its antecedent. No such problem would arise in the examples (40) and (42), or in such standard examples as (46).

(46) they seemed to each other [t to have been angry]

These phenomena provide further evidence that the features of the associate raise to I rather than adjoining to the expletive, over and above the fact that this operation is the normal one while adjunction from the associate position to the expletive would be without any analogue. If adjunction were in fact to the expletive, then there might be no relevant difference between (44) and (46). The phenomena also provide additional evidence for an LF movement analysis of anaphora.

Overt raising of Subj and Obj to Spec yields an A-chain. What about the covert analogue? Is the position of the adjoined features of Subj and Obj also an A-position?
It is not clear that it matters how (or if) the question is decided; though A- and Ā-positions differ in the usual properties, it is not clear that they have more than a taxonomic role in the Minimalist Program. But suppose that an answer is required. Then we conclude that covert adjunction of features of Subj and Obj establishes an A-chain: the concept "A-position" should cover the position occupied by the formal features of Subj and Obj both before and after the adjunction operation. We have taken A-positions to be those narrowly L-related to a head H. Adapting the terminology of section 1.3.2, let us add all sublabels of Hmax to the positions
narrowly L-related to H, including H itself, features of H, and any feature adjoined to H.47

These conclusions appear to accord with binding and control properties of covertly raised object and subject, on standard assumptions. It also follows that relativized minimality effects (in Rizzi's (1990) sense) should be those of A-chains, though that may well follow from independent considerations.

4.5 Interpretability and Its Consequences

We have now reached the point where distinctions among the various kinds of formal features of FF(LI) are becoming important. Let us take a closer look at these, continuing to assume that F automatically carries along FF[F] and, if overt, a full category α as required for convergence (perhaps just PF convergence).

4.5.1 Types of Features

Along with others, the following distinctions among features are worth noting:

(47) a. categorial features
     b. φ-features
     c. Case features
     d. strong F, where F is categorial

As discussed earlier, there are further distinctions that cross-cut those of (47): some features are intrinsic, either listed in the lexical item LI or determined by listed features; others are optional, added arbitrarily as LI enters the numeration.

Suppose we have a convergent derivation for (48).

(48) we build airplanes

Intrinsic features of the three lexical items include the categorial features, [1 person] in FF(we), [3 person] and [−human] in FF(airplanes), [assign accusative Case] in FF(build), and [assign nominative Case] in FF(T). Optional features include [plural] for the nouns and the φ-features of build.

As already discussed, these distinctions enter into informal descriptive usage. The distinctions also correlate more or less with other facts. Thus, the φ-features of a DP specifier commonly show up both on the DP and on the verbal head, but the Case feature of DP does not appear on the head. There is at least a tendency for φ-features to be overtly manifested when raising to the checking domain is overt rather than covert, as in verbal agreement with subject versus object in nominative-accusative languages with the EPP, or
visible participial agreement in French as a reflex of overt raising. In the Move F theory, the difference reduces to [Spec, H] versus [H F H] constructions, φ-features tending to be overt on H in the former but not the latter. Let us tentatively assume this to be the case, though a principled explanation is lacking, and the empirical facts plainly require much closer scrutiny over a far broader range.

The intrinsic-optional distinction plays virtually no role here, but there is a much more important distinction that has so far been overlooked. Evidently, certain features of FF(LI) enter into interpretation at LF while others are uninterpretable and must be eliminated for convergence. We therefore have a crucial distinction ±Interpretable. Among the Interpretable features are categorial features and the φ-features of nominals.48 The operations that interpret (48) at the LF interface will have to know that build is a V and airplanes an N with the φ-features [plural], [−human], [3 person]. On the other hand, these operations have no way to interpret the Case of airplane or the agreement features of build, which must therefore be eliminated for LF convergence.

Interpretability at LF relates only loosely to the intrinsic-optional distinction. Thus, the optional feature [plural] of nouns is Interpretable, hence not eliminated at LF. The Case features of V and T are intrinsic but −Interpretable, hence eliminated at LF (assuming that they are distinguished from the semantic properties that they closely reflect). It follows that these features of the head must be checked, or the derivation crashes. The Interpretable features, then, are categorial features generally and φ-features of nouns.49 Others are −Interpretable.

Interpretability does relate closely to the formal asymmetry of the checking relation, which holds between a feature F of the checking domain of the target K and a sublabel F′ of K. F′ is always −Interpretable: strength of a feature, affixal features, the Case-assigning feature of T and V, φ-features of verb and adjective.
The target has Interpretable features, such as its categorial features, but these never enter into checking relations. F in the checking domain, however, can be an Interpretable feature, including categorial and φ-features. These differences between checker (within the target) and checked (within the checking domain) play a certain role in computation. They give some meaning to the intuitive asymmetry, though with only weak correlation to informal usage, as the notion "agreement" shows.

These descriptive observations raise two obvious questions: (1) Why is a sublabel F′ of the target that enters a checking relation invariably −Interpretable? (2) Being −Interpretable, why is F′ present at all? Question (2) is part of a more fundamental one: why does language have the operation Move? If
it does, and if the operation is morphology-driven as we assume, then there must be feature checkers in the targeted category. The fact that these are always −Interpretable again highlights the special role of the property of displacement of categories that is characteristic of human language: the sole function of these feature checkers is to force movement, sometimes overtly. These questions begin to fall into place as we look more closely at the theory of movement.

Case differs from φ-features in that it is always −Interpretable, for both terms of the checking relation. Case is therefore the formal feature par excellence, and it is not surprising that this entire line of inquiry has its origins in Vergnaud's Case Filter.

4.5.2 Checking Theory

Interpretability at LF is determined by bare output conditions and is clearly an important property of features. Attending to it, we see at once that the notion of checking so far proposed is defective in fundamental ways, and the same is true of earlier versions. These were unclear about the status of a checked feature, but did not differentiate among the cases. We see, however, that there are crucial differences depending on Interpretability. In earlier sections here, we took checking to be deletion. A checked feature, then, is accessible to the computational system, but not visible for interpretation at LF. But that cannot be correct. Some features remain visible at LF even after they are checked: for example, φ-features of nouns, which are interpreted. And some plainly are not accessible to the computational system when checked: the Case feature of nouns, for example, which cannot be accessed after checking.

We therefore have to give a more nuanced analysis of the relation between visibility at LF and accessibility to the computational system. The two properties are related by the descriptive generalization (49).

(49) a. Features visible at LF are accessible to the computation CHL throughout, whether checked or not.
     b. Features invisible at LF are inaccessible to CHL once checked.

Case (49a) holds without exception; (49b) only in part, in an interesting way.

The valid part of the generalization follows at once from a slight modification of the theory of checking, actually an improvement that was needed anyway. The checking operation taken over from earlier work has a number of odd features. For one thing, it seems redundant: the relevant properties are determinable by algorithm from the LF representation itself. But we now see that the proposal is untenable. Is there a way, then, to dispense with the checking operation entirely?


Suppose that we do so, keeping just to the relation from which the operation is derived: the checking relation that holds between features of the checking domain and of the target (features that are readily detected, as we have seen). We have so far assumed that the operation Move F is defined in terms of the conditions in (32), repeated here.

(50) a. F is an unchecked feature.
     b. F enters into a checking relation with a sublabel of K as a result of the operation.

The point of (50a) was to prevent a nominal phrase that has already satisfied the Case Filter from raising further to do so again in a higher position. The conclusion is correct, but the formulation of the principle must be revised to yield the condition (49). We now have the means to do so quite straightforwardly.

The key to the problem is the hitherto neglected property ±Interpretable. This property is determined by bare output conditions, hence available "free of charge." We can therefore make use of it to restate (50), without cost. As throughout, we restrict attention to formal features in this inquiry into the computational system. To begin with, let us simplify (50) by eliminating (a) entirely, allowing the variable F in Move F to range over formal features freely. We then replace (50b) by (51), the final version here of Last Resort.

(51) Last Resort
     Move F raises F to target K only if F enters into a checking relation with a sublabel of K.

But we still have to capture the intended effects of (50a): crucially, that a [−Interpretable] feature is frozen in place when it is checked, Case being the prototype.

Continuing to understand "deleted" as "invisible at LF but accessible to the computation," we now reformulate the operations of checking and deletion as in (52).

(52) a. A checked feature is deleted when possible.
     b. Deleted α is erased when possible.

Erasure is a "stronger form" of deletion, eliminating the element entirely so that it is inaccessible to any operation, not just to interpretability at LF.50

"Possibility" in (52) is to be understood relative to other principles. Thus, deletion is "impossible" if it violates principles of UG. Specifically, a checked feature cannot be deleted if that operation would contradict the overriding principle of recoverability of deletion, which should hold in some fashion for
any reasonable system: Interpretable features cannot delete even if checked. The question of erasure, then, arises only for a −Interpretable feature F, which is erased by (52b) unless that operation is barred by some property P of F. P should be readily detected, to avoid excessive computational complexity. One such property is parametric variation: F could be marked as not erased when deleted, a possibility that will be explored below in connection with multiple-Spec constructions. Tentatively, let us assume that this is the only relevant property of F.

Erasure is also barred if it creates an illegitimate object, so that no derivation is generated. That too is trivially determined. The crucial case has to do with erasure of an entire term α of a syntactic object Σ. Let N = {γ, {α, β}}. Erasure of α replaces N by N′ = {γ, {β}}, which is not a legitimate syntactic object (see (24)). We conclude that

(53) A term of Σ cannot erase.

Erasure of a full category cancels the derivation. In the parallelism cases discussed earlier, for example, deletion is not followed by erasure in the N → λ computation, under (52); what happens in the phonological component, which has a wholly different character, is a separate matter. But illegitimate objects are not formed by erasure within some term (see note 12). Hence, such erasure is not barred for this reason.

We have now dispensed with the checking operation. The problems about interpretability skirted in earlier discussion dissolve, and the descriptive generalization (49) follows at once insofar as it is valid. Case (49a) is true without exception: Interpretable features cannot be deleted (a fortiori, erased) and therefore remain accessible to the computation and visible at LF. Case (49b) holds unless erasure of the −Interpretable checked feature erases a term or is barred by a parametrized property P of the feature. Though examples exist, they are few; thus, case (49b) holds quite generally.
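The way (52) yields the generalization (49) can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a formalization of the theory: the names `Status`, `Feature`, `check`, and the flag `keep_when_deleted` (standing in for the parametrized property P) are invented for the sketch, and the ban (53) on erasing a whole term is not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    INTACT = "intact"    # visible at LF, accessible to the computation
    DELETED = "deleted"  # invisible at LF, still accessible
    ERASED = "erased"    # invisible at LF and inaccessible


@dataclass
class Feature:
    name: str
    interpretable: bool
    keep_when_deleted: bool = False  # hypothetical stand-in for property P
    status: Status = Status.INTACT


def check(f: Feature) -> Feature:
    """Apply (52) to a feature that has entered a checking relation."""
    if f.interpretable:
        return f                  # recoverability: Interpretable features cannot delete
    f.status = Status.DELETED     # (52a): a checked feature is deleted when possible
    if not f.keep_when_deleted:
        f.status = Status.ERASED  # (52b): deleted material is erased when possible
    return f


def visible_at_lf(f: Feature) -> bool:
    return f.status is Status.INTACT


def accessible(f: Feature) -> bool:
    return f.status is not Status.ERASED
```

In this toy model a checked Interpretable feature stays visible and accessible, as in (49a); a checked −Interpretable feature becomes invisible and inaccessible, as in (49b); and a feature marked with the assumed flag is deleted but not erased, which is the partial exception to (49b) noted in the text.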
For expository purposes, I will speak of deletion as erasure except when the issue arises.

The revision of checking theory is without effect for −Interpretable features in the checking domain, such as Case of an argument. It is these features that must be inaccessible after checking; the examples discussed are typical in this regard. Erasure of such features never creates an illegitimate object, so checking is deletion and is followed by erasure without exception. Features of the target are always −Interpretable, for reasons yet to be explained. The revised checking theory deletes them without exception, and typically erases them.

One might ask what happens when all features of FF(LI) are −Interpretable and it raises to the checking domain of K: raising of the associate of an
expletive or covert object agreement, for example. Not all of FF(LI) could erase; if it did, an illegitimate object would be formed. But we need not solve the problem, because it does not arise. FF(LI) always contains Interpretable features: the categorial and φ-features of the argument. The only exception to the conclusions of the last paragraph is pure expletives, to which we will return.

To illustrate the consequences, let us return to sentence (48), we build airplanes. When the subject we is introduced into the derivation within the verb phrase, FF(we) includes D and specific choices of φ-features and Case. Since I has a strong D-feature (EPP), the categorial feature of we raises overtly to its checking domain, pied-piping the entire DP; therefore, the operation is substitution in [Spec, I]. There are two ways in which we could have raised in this case, depending on how F is selected for the Move F operation. If F = D, then a checking relation is established between the raised categorial feature of we and the strong D-feature of I. The Case feature of we is checked by T as a free rider, as are the φ-features, after covert raising of the verb establishes the required checking relation. F could also be Case, which would mean that the EPP is satisfied by the categorial feature as a free rider. But F could not be a φ-feature in this case, because the verb raises only covertly so that the checking relation between φ-features is only established later; Last Resort (51) would be violated if φ-features of we were accessed in overt raising. The Case feature of we is −Interpretable, therefore erased when checked. The φ-features, however, are Interpretable, hence accessible to further operations, as is the categorial feature D.

Note that the EPP is divorced from Case.
Thus, we assume that all values of T induce the EPP in English, including infinitives, though only control infinitives assign (null) Case; raising infinitives do not (see section 1.4.3).

We can now return to a question that was left open: why are the features of the target that enter into checking relations invariably −Interpretable? Suppose that a sublabel F′ of the target category K is Interpretable. Suppose the feature F that is accessed by the operation OP and raised to the checking domain of F′ is Interpretable, entering into a checking relation with F′. Both features are Interpretable, hence unchanged by the operation. The operation OP is locally superfluous, not required by the features that enter into the checking relation that drives it. But OP might nonetheless contribute to convergence. For example, a free rider of FF[F] might enter into a checking relation with another sublabel of the target, one or the other being affected (erased or deleted); or OP might be a necessary step toward a later operation that does delete and perhaps erase −Interpretable features, allowing convergence. Such possibilities abound, considerably extending the class of possible derivations and thus
making it harder to compute economy, perhaps also allowing derivations too freely (as might not be easy to determine). Preferably, OP should be excluded. It is, if F′ is necessarily −Interpretable, hence always affected by the operation. If F raises to target K, then, the sublabel that is checked by F deletes and typically erases.

This property of feature checkers eliminates the possibility of locally superfluous movement operations. It reinforces the minimalist character of the computational system, permitting its operations to be formulated in a very elementary way without proliferation of unwanted derivations. To put it differently, the imperfection of language induced by the displacement property is restricted by language design so as to avoid excessive computational complexity.

Consider successive-cyclic raising as in (54).

(54) we are likely [t3 to be asked [t2 to [t1 build airplanes]]]

Overt raising of we from t1 to t2 accesses D to satisfy the EPP in the most deeply embedded clause, the only possibility since the raising infinitival does not assign Case. D is Interpretable, therefore unaffected by checking. It is accessed again to raise we to t3, satisfying the EPP in the medial clause. Further raising from t3 to the matrix subject can access any of the features that enter into a checking relation there.

Consider a different case of successive-cyclic raising, the simple adjectival construction (55).

(55) John is [AgrP t2 Agr [AP t1 intelligent]]

John raises from the predicate-internal subject position t1 to [Spec, Agr] (t2) for agreement with the adjective, raised to Agr.51 By virtue of Last Resort (51), the operation must access the φ-features of John, which check agreement. They are Interpretable, therefore unaffected. John then raises to matrix subject position, satisfying the EPP, Case, and agreement. Here any of the relevant features may be accessed, since all enter into checking relations (one accessed, the others as free riders). John thus enters into double agreement: with each of the two Agr nodes, hence with the copula and the adjective.
What shows up at PF depends on morphological particulars.52

The example illustrates the fact that agreement can be assigned with or without Case, in the higher and lower [Spec, Agr] positions, respectively. Since the categorial and φ-features of DP remain accessible after checking while the Case feature does not, a single DP can enter into multiple satisfaction of the EPP and multiple agreement, but not multiple Case relations. The latter option is the core example that we want to exclude under Last Resort, and its
ancestors back to Vergnaud's Case Filter. But the other two definitely should be permitted, as they now are.

In (55) all features of the subject have been checked and the −Interpretable ones erased. Suppose that (55) is embedded, as in (56).

(56) I(nfl) seems [that John is intelligent]

Though the Case feature of John has been erased, its categorial and φ-features are unchanged. Therefore, John can raise to matrix subject ([Spec, I]), satisfying the EPP and matrix agreement and yielding (23b), repeated here.

(57) *John [I(nfl) seems [that t is intelligent]]

But John offers no Case feature to be checked, so the derivation crashes rather than converging with the interpretation 'it seems that John is intelligent'. In (56) John is effectively frozen in place, as in the examples that originally motivated Greed (see note 36), though not for the reasons given in earlier theories. These reasons were defective in a fundamental way, failing to take account of the property ±Interpretable and its relation to accessibility to the computational system.

We conclude that the intrinsic Case feature of T is dissociated not only from its (parametrically varying) EPP feature, but also from the (perhaps invariant) semantic properties it reflects. Being −Interpretable, the Case feature must be checked for the derivation to converge. Since it is not checked in (57), the derivation crashes.

Suppose that matrix I in (56) is [−tense]. If the structure is a control infinitive, the derivation crashes again, for Case reasons. If it is a raising infinitival, the construction is barred as it stands, presumably for selectional reasons: a nonembedded infinitival can be a control structure with arbitrary PRO (in some languages and constructions), but not a raising infinitival with no relevant properties other than the strong D-feature (EPP).
Further embedding in raising structures reintroduces the same problem, so that John remains frozen in place in (56) with infinitival I as well.53

Suppose that a language were to allow the construction (56), but with only agreement checked in the embedded clause, not Case. Then raising should be possible. The categorial and φ-features of John are Interpretable, hence accessible even when checked; and the Case feature is unchecked, hence still available for checking. Raising John to matrix subject, we derive (57), again with double agreement and double satisfaction of the EPP, but with only one Case relation: in the matrix clause. Such constructions have been reported in a number of languages, first (in this context) in modern Greek (Ingria 1981). Assuming the descriptions to be correct, they are (and have been regarded as)
prima facie violations of the Case Filter. They fall into place in the manner just described. We expect the matrix subject in (57) to have the Case required in this position, which could in principle be distinct from nominative: in an ECM construction, for example. If that is so, it confirms the conclusion that Case was not assigned in the lower clause, which would have caused Case conflict.54

Successive-cyclic movement raises further questions, to which we will return after the groundwork has been laid.

One consequence of this reanalysis of the theory of movement is that Interpretable features need not enter checking relations, since they survive to LF in any event. In particular, categorial and φ-features of NP need not be checked. The conclusion resolves an outstanding problem concerning inherent Case: it has never been clear how the φ-features of the nominal receiving inherent Case can be checked, in the absence of any plausible functional category; but the question does not arise if they need not be checked. The same consideration overcomes the problem of checking of φ-features in dislocation or coordination, as in John and his friends are here.55

Consider α incorporated into β (say, a noun incorporated into a verb), so that α has the morphological feature [affix] that allows the operation. If this feature is −Interpretable, excorporation of α will be impossible because the feature will have been erased and will thus be unavailable for further checking; α will be unable to adjoin to a second head, even though its other properties are intact.

The improved theory of movement has consequences for multiple-Spec constructions, which are permitted in principle on minimalist assumptions about phrase structure theory, as noted earlier. If this option is realized, we have the structure (58), with possible further proliferation of Spec.

(58) [XP Spec1 [X′ Spec2 [X′ H Complement]]]

Here we may tentatively assume Spec1 and Spec2 to be equidistant targets for movement, being within the same minimal domain.56

Suppose a language permits (58) for some construction. Suppose further that a −Interpretable feature F of H is not necessarily erased when checked
and deleted, a parameterized property. F can then check each Spec, optionally erasing at some point to ensure convergence. If F is a Case feature, it could assign the same Case repeatedly; such an account has been proposed for multiple Case checking in Japanese and other languages (see note 56). Watanabe's (1993a) layered Case theory as restated in note 49 could also be formulated in these terms. In section 4.10 we will see that similar ideas have interesting consequences in areas of more central concern here.

Spec1 also allows an escape hatch for Relativized Minimality violations and scrambling with A-position properties (binding, obviating weak crossover effects, etc.), unlike scrambling to an Ā-position, which, under earlier assumptions, involves full reconstruction; the idea was introduced by Reinhart (1981) to account for Wh-Island violations in Hebrew. Ura (1994) holds that superraising, A-scrambling, and multiple Case assignment correlate in many languages. If so, that would lend further empirical support to the conclusion that (58) is an option that a language may have.57

4.5.3 Expletives

Suppose that a derivation has reached the construction (56) and the numeration contains an expletive, so that we can derive, for example, (24a), repeated here.58

(59) *there seem [that [Subj a lot of people] are intelligent]

The expletive there checks the strong feature of I (EPP), but it fails to check some feature of H = [I, seem] that is −Interpretable and must be erased for convergence. The −Interpretable features of H are its Case and φ-features. Once again, we see that the expletive must lack Case or φ-features, or both (see discussion of (39)).

Suppose that there has Case, so that only the φ-features of H remain unchecked. But the φ-features of Subj are Interpretable, so Subj (actually, the formal features of its head) can raise covertly, checking the φ-features of H and allowing the derivation to converge, incorrectly, with an interpretation similar to 'it seems that a lot of people are intelligent'. It follows that there must lack Case.

Suppose that the expletive has φ-features. Suppose that these do not match the features of its associate, as in (39a), repeated here, with there plural and its associate a man singular, and the raising verb plural, matching there.

(60) *there seem to be [a man] in the room

The φ-features of seem are erased in the Spec-head relation with there. The φ-features of there, being −Interpretable for an expletive, are also erased under
this checking relation. The Case feature of seem is erased by raising of the associate a man. Since the φ-features of a man are Interpretable, they need not be checked. The derivation of (60) therefore converges, incorrectly.

We conclude, then, that the expletive has neither Case nor φ-features. FF(there) contains only D, which suffices to satisfy the EPP: the expletive has no formal features apart from its category.

Notice that agreement is overtly manifested on the verb that has there as subject. Earlier we considered the suggestion that overt manifestation of φ-features is a reflection of the [Spec, H] rather than [H F H] relation. The observation about agreement with expletives is consistent with this proposal, but it would conflict with the alternative idea that the distinction reflects overt rather than covert agreement. The two suggestions are empirically distinct in this case, perhaps only this case.

Suppose that there is a pure expletive lacking semantic features as well as formal features apart from its category D. We therefore expect it to be invisible at LF, to satisfy FI. We know that there cannot be literally erased when checked; that would violate the fundamental condition (53), forming an illegitimate syntactic object that would cancel the derivation. By the general principle of deletion-erasure, (52), it follows that the categorial feature of there is only deleted when checked, not erased, along with those of its traces (see note 12).

Since the expletive necessarily lacks Case, it must be the associate that provides the Case in ordinary expletive constructions such as (61a–c).

(61) a. there is a book on the shelf
     b. there arrived yesterday a visitor from England
     c. I expected [there to be a book on the shelf]

The associate must therefore have the Case that would be borne by DP in the constructions (62a–c), respectively.

(62) a. DP is ... (DP = nominative)
     b. DP arrived ... (DP = nominative)
     c. I expected [DP to be ...] (DP = accusative)

We therefore cannot accept the partitive Case theory of Belletti (1988), contrary to the assumption in chapter 2.

There is a distinction between expletives that have Case and φ-features and the pure expletives that lack these features: in English, it and there, respectively. The distinction is neither clear nor sharp, but it is adequate for our limited purposes.59 The former satisfy all properties of the I-V head they check, erasing the relevant features, and therefore bar associate raising. The latter do
not erase the −Interpretable features of the I-V head. Therefore, raising is permitted, targeting this element; and it is required for convergence.

Two consequences follow. The direct prediction is that expletive constructions will manifest verbal agreement with the associate just in case the expletive lacks Case and φ-features: English there, German es, and Italian pro, but not English it and French il, which have a full complement of features. Note that the distinction is only partially related to overt manifestation.60 A more interesting prediction, and one that will be more difficult to confirm if true, is that just in case the expletive lacks Case and φ-features, the associate will bind and control as if it were in the surface subject position. We have seen some reason to believe that this is true.

We might ask why languages should have overt expletives lacking Case and φ-features rather than pro. In part, the reason may reduce to the null subject parameter, but more seems to be involved. Thus, Icelandic and German both have null expletives, but Icelandic is a null subject language and German is not. In these languages the lexical entry for the expletive specifies two forms, null and overt. Their distribution seems to be complementary, determined by structural factors. The optimal result would be that the overt variant is used only when this is required for convergence: PF convergence, since the two forms are identical within the covert component. That could be true if the presence of the overt expletive reduces to the V-second property, which could belong to the phonological component if there is no ordering in the core N → λ computation, as we have assumed. That seems promising. In both languages it seems that the overt expletive is used only where the V-second property otherwise holds. If that turns out to be correct, then the expletive may well be null (nothing beyond the categorial feature [D]) throughout the N → λ computation.
The overt features are then added only in the course of the phonological operations, though coded in the lexicon.61

Though a serious development of the theory of expletives requires much more careful examination, including a far broader range of cases, some conclusions follow even from fairly weak considerations, as we have already seen and will continue to find as we proceed.

4.5.4 Clause Type

Let us turn to the formal features of the functional category C (complementizer) that determines clause type,62 for example, the feature Q for interrogative clauses in the construction (63).

(63) Q [IP John gave DP to Mary]

Q is plainly Interpretable; therefore, like the φ-features of a nominal, it need not be checked, unless it is strong, in which case it must be checked before Spell-Out if a derivation is to be constructed.63 As is well known, languages differ in strength of Q. The strong Q feature is satisfied by a feature FQ.

For English, Q is strong. Therefore, when Q is introduced into the derivation, its strong feature must be eliminated by insertion of FQ in its checking domain before Q is embedded in any distinct configuration (see (3)). FQ may enter the checking domain by Merge or Move, by substitution or adjunction.

Consider the Merge option. Since it is overt, a full category α must be inserted in the checking domain of Q. If the operation is substitution, α becomes [Spec, Q]; if adjunction, α is an X⁰ category. In English the two cases are illustrated by (64).

(64) a. (I wonder) [CP whether Q [he left yet]]
     b. (I wonder) [CP [Q if Q] [he left yet]]

FQ is often called the wh-feature, which we can take to be a variant of D.

Notice that a checking relation can be established by Merge, though the notions have so far been discussed only for Move. We will return to the question after improving the theory of movement.

Let us turn to the second and more intricate possibility: FQ enters the checking domain of Q by raising. Again, the options are substitution or adjunction. The substitution option is realized by raising of FQ to [Spec, Q] by overt wh-movement, which pied-pipes a full category for PF convergence. The adjunction option is realized by I → Q raising. If this is in fact raising of a verbal feature, as proposed earlier (see (1)), then FQ in this case is [V]. There are generalizations and language-specific properties,64 but any account that departs from minimalist assumptions can be considered explanatory only insofar as it has independent justification.

Under raising, (63) yields two legitimate outputs, (65a) and (65b), depending on whether the strong feature of Q is checked by adjunction or substitution (we abstract from the contrast between embedded and root forms).

(65) a. did [IP John give a book to Mary]
     b. (guess) which book [IP John gave to Mary]
     c. (guess) which x, x a book, John gave x to Mary

In (65a) DP of (63) is a book, did adjoins to Q, and the construction is interpreted as a yes-or-no question. In (65b) DP of (63) is which book, and the construction is interpreted as something like (65c), along the lines sketched in section 3.5.
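
The four ways of supplying FQ to a strong Q discussed above can be laid out as a small enumeration. The following Python fragment is only an illustrative sketch and not part of the theory itself: the names ENGLISH_EXAMPLES and checking_options are hypothetical, and the pairing of each option with an English example simply restates (64) and (65).

```python
from itertools import product

# Hypothetical lookup table: (operation, mode) -> English illustration.
# The pairings restate examples (64a-b) and (65a-b) of the text.
ENGLISH_EXAMPLES = {
    ("Merge", "substitution"): "(64a) whether merged in [Spec, Q]",
    ("Merge", "adjunction"): "(64b) if adjoined to Q",
    ("Move", "substitution"): "(65b) overt wh-movement to [Spec, Q]",
    ("Move", "adjunction"): "(65a) I-to-Q raising",
}


def checking_options(q_is_strong):
    """List the overt ways of checking Q before Spell-Out.

    A weak Q (assumption: modeled here simply as q_is_strong=False)
    triggers no overt operation, so the list is empty.
    """
    if not q_is_strong:
        return []
    operations = ("Merge", "Move")
    modes = ("substitution", "adjunction")
    return [
        (op, mode, ENGLISH_EXAMPLES[(op, mode)])
        for op, mode in product(operations, modes)
    ]


for op, mode, example in checking_options(q_is_strong=True):
    print(f"{op:5} + {mode:12} -> {example}")
```

The table makes explicit that the Merge/Move choice and the substitution/adjunction choice are independent of one another; nothing in the sketch models why a given language selects one option over another.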

FQ is Interpretable and hence need not be checked. It therefore raises to the checking domain of Q only if this option is selected to eliminate the strong feature of Q, in which case an entire wh-phrase or I-complex is pied-piped, substituted in [Spec, Q] or adjoined to Q, respectively.

Suppose that DP is which book and the strong feature of Q in (63) is satisfied by adjunction of I alone, as in (65a), so that what surfaces is (66).

(66) did John give which book to Mary

Covert raising of the wh-feature is unnecessary and hence impossible, by economy conditions (see note 64). The interpretation of (66) is not (65c), as it would be if the wh-feature raised covertly to adjoin to Q. (66) converges, with whatever interpretation it has, perhaps gibberish (I put aside interpretations with focus and echo questions, irrelevant here).

Suppose (63) is embedded and DP = which book. The I → Q option is now unavailable (alternatively, it is available, and yields an embedded yes-or-no question, interpreted as gibberish). The wh-phrase, however, may raise overtly to the embedded [Spec, Q], yielding (67), with (65b) embedded.

(67) they remember [which book Q [John gave t to Mary]]

Suppose that the matrix clause is also interrogative, with the complementizer Q. Again, there are two ways to check its strong feature, I-raising or wh-movement, yielding either (68a) or (68b) (again abstracting from the root-embedded distinction).

(68) a. do they remember which book John gave to Mary
     b. (guess) which book [they remember [t Q [John gave t to Mary]]]

The second option is available, because the wh-feature is Interpretable in (67), hence accessible to the computation.

(68a) is unproblematic: it is a yes-or-no question with an embedded indirect question.
(68b) converges with the interpretation (69).

(69) (guess) which x, x a book, they remember [Q John gave x to Mary]

Here the embedded clause is interpreted as a yes-or-no question, presumably gibberish, but in any event different from the interpretation of (70), which results if embedded Q is replaced by declarative C (perhaps mildly deviant because of the factive character of the embedded clause).

(70) (guess) which book they remember that John gave to Mary

Whether the operation that forms (70) is successive cyclic depends on the answer to a question raised earlier (see p. 245). Interpretations fall out as they do, depending on the nature of the complementizer.
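
The availability of (68b) turns on the wh-feature surviving its first checking relation, while a Case feature does not survive its own. A minimal sketch of that reasoning in Python follows, under simplifying assumptions: deletion and erasure are conflated (the expository convention adopted in this section), a feature is reduced to a name plus an Interpretable flag, and the names Feature, Item, and check are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    interpretable: bool
    erased: bool = False  # an erased feature is inaccessible to the computation


@dataclass
class Item:
    label: str
    features: dict = field(default_factory=dict)

    def add(self, name, interpretable):
        self.features[name] = Feature(name, interpretable)
        return self

    def accessible(self, name):
        feature = self.features.get(name)
        return feature is not None and not feature.erased


def check(item, name):
    """Enter feature `name` of `item` into a checking relation.

    Returns False if the feature is absent or already erased.
    A -Interpretable feature is erased once checked; an Interpretable
    feature is unaffected and remains accessible.
    """
    if not item.accessible(name):
        return False
    feature = item.features[name]
    if not feature.interpretable:
        feature.erased = True
    return True


which_book = Item("which book").add("wh", True).add("Case", False)

# (67): wh checked against embedded Q; (68b): checked again against matrix Q.
first_wh = check(which_book, "wh")
second_wh = check(which_book, "wh")

# Case is -Interpretable, so it can be checked only once (compare John in (57)).
first_case = check(which_book, "Case")
second_case = check(which_book, "Case")

print(first_wh, second_wh, first_case, second_case)
```

The sketch deliberately ignores strength, locality, and the distinction between the accessed feature and its free riders; it isolates only the asymmetry between Interpretable and −Interpretable features after checking.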

Suppose that a language has weak Q. In that case the structure (63) will reach PF without essential change. If DP = which book, it will remain in situ at PF (and also at LF, apart from covert raising for Case). The wh-feature does not adjoin to Q; both are Interpretable and need not be checked for convergence. If the language has only the interpretive options of English, it will have no intelligible wh-questions and presumably no evidence for a wh-feature at all. But languages commonly have wh- in situ with the interpretation of (65c). They must, then, employ an alternative interpretive strategy for the construction Q[... wh- ...], interpreting it, perhaps, as something like unselective binding.

On different grounds, Reinhart (1993) proposes a similar analysis. The same basic conclusions are reached by Tsai (1994), in a study of typologically diverse languages that carries further the effort to use morphological properties to account for some of the problems opened to investigation by Huang (1982). The essence of this theory seems to be a fairly direct consequence of minimalist assumptions, strictly pursued.65

In discussing the operation Merge in section 4.2.1, we came to the conclusion that it must be overt, with a single exception: covert insertion of an item α lacking phonological features, necessarily at the root. We can restrict attention to a complementizer C. The option left open is that phonologically null C may be inserted covertly at the root. C could in principle be strong, in which case it triggers an immediate operation to erase the strength feature. Since the triggered operation is covert, it cannot be substitution in [Spec, C], but must be feature adjunction to C. Do such cases exist?

Consider first declarative C, which is weak. Can its null variant be inserted covertly in a root clause? There is good reason to believe that it can. Declarative C is one of the force indicators and therefore must be present for interpretation at the C-I interface.
But it never appears overtly: at the root we have (71a), not (71b) (understood as a declarative assertion).

(71) a. John left
     b. *that John left

The natural conclusion is that C is indeed introduced, but covertly. Furthermore, covert insertion is necessary on grounds of economy, if we assume that Procrastinate holds of Merge as well as Move.66

Discourse properties confirm these conclusions. We do indeed find such root clauses as (71b) or (72) with overt complementizer, but not with declarative forces.

(72) that John leave

Thus, (71b) and (72) could be the answers to the questions (73a) and (73b), respectively, but not (73c), which calls for a declarative assertion.

(73) a. what did he tell you
     b. what would you prefer
     c. what happened yesterday

Consider interrogative C, say, English Q, which we still take to be strong. Suppose it is inserted covertly, at the root, to yield (74).

(74) Q[DPSubj will see DPObj]

We can rule out the possibility that this is the variant of Q satisfied by FQ = [V], yielding a yes-or-no question with I → Q raising in the overt case. That variant has phonological properties that determine rising intonation; if it is inserted covertly, the derivation will therefore crash at LF. The only possibility, then, is that Q requires interpretation as a wh-question, which has no phonological properties, leaving intonation unchanged.

We might ask why the variant of Q satisfied by adjunction of FQ = [V] does not have a null alternant, like declarative C. There could be structural reasons: perhaps some barrier against a null element with an affixal feature (Agr is an exception, but the problem will be overcome in section 4.10; if-adjunction to Q is a more serious counterexample). There are also functional motivations. Thus, if there were a null alternant for Q with strong V-feature, the sentence John will see Bill would be ambiguously interpreted as declarative or interrogative.

We restrict attention, then, to covert introduction of Q, still assuming it to be strong in English. Covert substitution is impossible, so the strong feature has to be satisfied by adjunction: the strong feature of Q must be checked by FQ = [wh-].

The structure must therefore contain a wh-phrase with a wh-feature that adjoins covertly to Q. The wh-phrase might be the subject, the object, or an adjunct, as in (75), an IP lacking C at the point of Spell-Out but interpreted as a wh-question at LF (declarative intonation throughout).

(75) a. Q [IP who will fix the car]
     b. Q [IP John will fix what]
     c. Q [IP John will fix the car how (why)]

For (75a), the conclusion accords reasonably well with the facts, which have always been puzzling: why should a construction that seems to have all the overt syntactic properties of IP be interpreted as a wh-question?

Case (75b) yields the interpretation 'what will John fix'. That is allowed in some languages (French), but is dubious at best in English. Case (75c) should
be interpreted as 'how (why) will John fix the car'. That is excluded generally. The main results follow if we assume that strong features cannot be inserted covertly, so that some variant of the in-situ strategy has to be employed (possible for (75a) and (75b), blocked for (75c), which allows no variable formation in the wh-phrase; see note 65).

Let us assume, then, that covert insertion of strong features is indeed barred. One might suspect that the possibility is excluded because of paucity of evidence. To put it differently, the interface representations (π, λ) are virtually identical whether the operation takes place or not. The PF representations are in fact identical, and the LF ones differ only trivially in form, and not at all in interpretation. Suppose there is an economy principle (76).

(76) α enters the numeration only if it has an effect on output.

With regard to the PF level, effect can be defined in terms of literal identity: two outputs are the same if they are identical in phonetic form, and α is selected only if it changes the phonetic form. At the LF level the condition is perhaps slightly weaker, allowing a narrow and readily computable form of logical equivalence to be interpreted as identity.67 Under (76), the reference set is still determined by the numeration, but output conditions enter into determination of the numeration itself; they affect the operation that constructs the numeration from the lexicon.

Covert insertion of complementizer has an LF effect and therefore is not barred by (76). The status of strength is somewhat different. Insofar as its presence is motivated only by PF manifestation, it cannot be inserted covertly, under (76), or it would not have been in the numeration at all. There is a good deal more to say about these questions.
We will return to some of their aspects in a broader framework in section 4.10.

With regard to covert insertion of lexical elements, we are fairly close to what seem to be the basic facts, on only minimalist assumptions and with some apparent language variation that seems rather peripheral, though a good deal remains to be explained.

4.5.5 The Minimal Link Condition

Suppose that whom replaces Mary in (67), yielding (77).

(77) they remember [which book Q [John gave t to whom]]

Suppose that (77) is interrogative, with the complementizer Q. If it is a root construction, the strong feature of Q can be eliminated by adjunction of I to Q or substitution of a wh-phrase in [Spec, Q]; if it is embedded, as in (78), only the latter option is available.

(78) guess [Q they remember [which book Q [John gave t to whom]]]

Embedded or not, there are two wh-phrases that are candidates for raising to [Spec, Q] to check the strong feature: which book and (to-)whom, yielding (79a) and (79b).

(79) a. (guess) [which book Q, [they remember [t Q [to give t to whom]]]]
     b. (guess) [[to whom]2 Q [they remember [[which book]1 Q [to give t1 t2]]]]

(79b) is a Wh-Island violation. It is barred straightforwardly by the natural condition that shorter moves are preferred to longer ones, in this case, by raising of which book to yield (79a). This operation is permissible, since the wh-feature of which book is Interpretable, hence accessible, and the raising operation places it in a checking relation with Q, erasing the strong feature of Q. The option of forming (79a) bars the longer move required to form (79b). But (79a), though convergent, is deviant, as in the case of (69).

Let us interpret the Minimal Link Condition (MLC) as requiring that at a given stage of a derivation, a longer link from α to K cannot be formed if there is a shorter legitimate link from β to K. In these terms, the Ā-movement cases of relativized minimality can be accommodated (to a first approximation; we will return to further comment). It is not that the island violation is deviant; rather, there is no such derivation, and the actual form derived by the MLC is deviant.

What about the A-movement cases (superraising)? Suppose we have constructed (80).

(80) seems [IP that it was told John [CP that IP]]

Raising of John to matrix subject position is a Relativized Minimality (ECP) violation, but it is barred by the shorter move option that raises it to this position.
Raising of it is a legitimate operation: though its Case feature has been erased in IP, its D-feature and φ-features, though checked, remain accessible.

There are differences between the Ā- and A-movement cases that have to be dealt with, but these aside, both kinds of Relativized Minimality violation fall together naturally under the MLC.68

Closer analysis of formal features thus allows us to resurrect an idea about island violations that has been in the air for some years: they involve a longer-than-necessary move and thus fall under an approach that has sometimes been suggested to account for superiority phenomena.69 The idea ran into two problems. Suppose a derivation had reached the intermediate stage Σ of (78) and
(80), with an intermediate category (which book, it) closer to the intended target than the one we hope to prevent from raising. The first problem is that the intermediate category has its features checked, so it should be frozen in place. The second problem has to do with the range of permissible operations at stage Σ: there are so many of these that it is hard to see why raising of the intermediate category is the shortest move. That problem was in fact more general: thus, it was far from clear why raising of John to [Spec, I] in (81) is the shortest move.70

(81) I(nfl) was told John (that IP)

Both problems are now overcome, the first by attention to interpretability of features, the second by a radical narrowing of the class of permissible operations under (51) (Last Resort).

Let us turn now to the differences between the Ā- and A-movement violations (Wh-Island, superraising). In the former case, the derivation satisfying the MLC converges; in the latter, it does not. Raising of embedded it to matrix subject satisfies the EPP and the φ-features of [I, seem], but not the Case feature. But matrix T has a −Interpretable Case feature, which, unless checked and erased, causes the derivation to crash.71 In the case of A-movement, unlike Ā-movement, the shortest move does not yield a convergent derivation.

For the account of the superraising violation to go through, we must take the MLC to be part of the definition of Move, hence inviolable, not an economy condition that chooses among convergent derivations: shortest moves are the only ones there are. As noted earlier, that has always been the preferred interpretation of the MLC for purely conceptual reasons, and perhaps the only coherent interpretation (see pp. 245–246).
We are now in a position to adopt it, having eliminated many possible operations that would appear to undermine the condition.

We therefore add to the definition of Move the condition (82), expressing the MLC, where close is (tentatively) defined in terms of c-command and equidistance, as discussed in chapter 3.

(82) α can raise to target K only if there is no legitimate operation Move β targeting K, where β is closer to K.

A legitimate operation is one satisfying (51).

Before proceeding, let us review the status of the superraising violation (80) in the light of economy considerations. Suppose that the derivation D with the initial numeration N has reached stage Σ. The reference set within which relative economy is evaluated is determined by (N, Σ): it is the set R(N, Σ) of convergent extensions of the derivation N → Σ, using what remains of N. At


Σ, the operation OP′ is blocked if OP yields a more economical derivation in

R(N, Σ).

Considerations of economy arise at stage Σ of the derivation only if there is a convergent extension. But in the case of (80), there is none. The problem is not with the initial numeration N: there is a convergent derivation that takes a different path from N, leading to (83), with it inserted in matrix subject position.

(83) it seems [that John was told t [that IP]]

Superraising from (80) is not barred by economy considerations that reject the outcome in favor of (83), because (80) is not a stage on the way to a convergent derivation at all. Unless the shortest-move requirement is part of the definition of Move, there will be a convergent derivation from (80), namely, the one that involves superraising. But things work out as desired if the MLC is part of the definition of Move, as preferred for other reasons.72

As is well known, the superraising violation is far more severe than the Wh-Island violation involving arguments, and there are many related problems that have been the topic of much investigation.73 The conclusions here shed no further light on them.

4.5.6 Attract/Move

The formulation of the MLC is more natural if we reinterpret the operation of movement as attraction: instead of thinking of α as raising to target K, let us think of K as attracting the closest appropriate α.74 We define Attract F in terms of the condition (84), incorporating the MLC and Last Resort (understood as (51)).

(84) K attracts F if F is the closest feature that can enter into a checking relation with a sublabel of K.

If K attracts F, then α merges with K and enters its checking domain, where α is the minimal element including FF[F] that allows convergence: FF[F] alone if the operation is covert. The operation forms the chain (α, t).

For expository purposes, I will sometimes use the familiar terminology of the theory of movement (target, raise, etc.), though assuming that the correct interpretation is in terms of attraction, referring to the operation generally as Attract/Move.

The notion equidistance defined in chapter 3 carries over to Attract F without essential change, though we can simplify it and generalize it to crucial cases not considered there. Let us consider the matter step by step, beginning with the earlier notion.


In chapter 3 we considered several instances of the structure (85) (= (11) of chapter 3, modified as required in the present framework), t the trace of Y, which is adjoined to X to form [Y-X].

(85) [XP Spec2 [X′ [Y-X] [YP Spec1 [Y′ t ZP]]]]

Spec1 and Spec2 are both in the minimal domain of the chain CH = (Y, t) and are therefore equidistant from α = ZP or within ZP. Move can therefore raise α to target either Spec1 or Spec2, which are equally close to α. Reformulating the notion of equidistance in terms of Attract, we say that Spec1, being in the same minimal domain as Spec2, does not prevent the category X′ (= {X, {X, YP}}) from attracting α to Spec2.

But note that (85) is only a special case: another possibility is that α attaches to the higher target X′, skipping Spec1, not by substitution as in (85) but by adjunction, either adjunction to X′ or head adjunction to [Y-X]. The case did not arise in chapter 3, but it does now, particularly with regard to feature raising (hence all covert movement). We want to extend the notion of closeness to include this possibility.

Let us review the basic notions of domain and minimal domain of α defined earlier (section 3.2), as a prelude to what will be a considerable simplification. The notions were defined for heads (either alone, or heading chains). We now extend them to features as well. Recall that we have modified them slightly; see note 47.

Suppose α is a feature or an X⁰ category, and CH is the chain (α, t) or (the trivial chain) α. Then

(86) a. Max(α) is the smallest maximal projection including α.
b. The domain δ(CH) of CH is the set of categories included in Max(α) that are distinct from and do not contain α or t.
c. The minimal domain Min(δ(CH)) of CH is the smallest subset K of δ(CH) such that for any γ ∈ δ(CH), some β ∈ K reflexively dominates γ.
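The definitions in (86) can be read as a small procedure over a tree. The following Python sketch is only an illustration under simplifying assumptions that are not part of the text: the class `Node`, its `maximal` flag, and the helper names are hypothetical, domination is plain tree domination, and the segment/category distinction for adjunction structures is ignored. Applied to structure (85), it places Spec1 and Spec2 in the same minimal domain; because segments are ignored, the inner X of [Y-X] also appears in the result.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A tree node; `maximal` marks maximal projections (hypothetical flag)."""
    label: str
    children: list = field(default_factory=list)
    maximal: bool = False
    parent: "Node | None" = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def descendants(node):
    """Yield every node properly dominated by `node`, top-down."""
    for child in node.children:
        yield child
        yield from descendants(child)


def dominates(a, b):
    """True if `a` properly dominates `b`."""
    return any(d is b for d in descendants(a))


def max_projection(alpha):
    """(86a): the smallest maximal projection including alpha."""
    node = alpha.parent
    while node is not None and not node.maximal:
        node = node.parent
    return node


def domain(chain):
    """(86b): categories inside Max(alpha) that are distinct from, and do
    not contain, any member of the chain (alpha or its trace)."""
    top = max_projection(chain[0])
    return [n for n in descendants(top)
            if not any(n is m or dominates(n, m) for m in chain)]


def minimal_domain(chain):
    """(86c): the members of the domain not dominated by another member."""
    dom = domain(chain)
    return [n for n in dom if not any(dominates(m, n) for m in dom)]


# Structure (85): Y has raised and adjoined to X, leaving the trace t.
zp, t = Node("ZP", maximal=True), Node("t")
spec1 = Node("Spec1", maximal=True)
yp = Node("YP", [spec1, Node("Y'", [t, zp])], maximal=True)
y, x = Node("Y"), Node("X")
spec2 = Node("Spec2", maximal=True)
xp = Node("XP", [spec2, Node("X'", [Node("[Y-X]", [y, x]), yp])], maximal=True)

labels = [n.label for n in minimal_domain([y, t])]
print(labels)
```

Taking the topmost members of the domain is one way to obtain the smallest subset K whose members reflexively dominate everything in δ(CH); a fuller model would also have to treat adjunction segments as in the text.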


Recall that domain and minimal domain are understood derivationally, not representationally. They are defined once and for all for each CH: at the point of lexical insertion for CH = α, and when CH is formed by movement for nontrivial CH.

The domain δ(α) and the minimal domain Min(δ(α)) of α are as defined for CH = α, t now being irrelevant.

Turning to closeness, we are concerned with the maximal projection HP headed by H with γ adjoined to form H⁰max (the zero-level projection of H), γ heading the chain CH = (γ, t).

(87) β is closer to HP than α if β c-commands α and is not in the minimal domain of CH. β may be an X⁰ category or a feature.

In effect, the minimal domain of CH determines a neighborhood of H that can be ignored when we ask whether a feature F is attracted by HP; β within the neighborhood of H is not closer to HP than α. Note that the neighborhood is determined only by γ that is an immediate constituent of H⁰max, not by a more deeply embedded sublabel; this is necessary, or virtually all categories will be equidistant with I at LF after V-raising. This issue will dissolve later on, so it need not be analyzed further.

The definition incorporates equidistance in the former sense and straightforwardly extends it to the case of adjunction. We will see in section 4.10 that the notions closeness and equidistance can be further simplified. Unclarities remain about a zero-level projection of H with more than one γ adjoined. I will leave the problems unsettled for the moment; they will be reduced considerably as we proceed.

In the light of this more principled approach to the theory of movement, let us return to the phenomenon of successive cyclicity, that is, raising of the head α of a chain CH = (α, t) to form a new chain CH′ = (α, t′).
A number of problems arise if this is a permissible process.

Suppose that α is an argument that raises successive-cyclically to form (54), repeated here.

(88) we are likely [t3 to be asked [t2 to [t1 build airplanes]]]

Here the traces are identical in constitution to we, but the four identical elements are distinct terms, positionally distinguished (see discussion of (14) and (15)). Some technical questions remain open. Thus, when we raise α (with co-constituent β) to target K, forming the chain CH = (α, t), and then raise α again to target L, forming the chain CH′ = (α, t′), do we take t′ to be the trace in the position of t or α of CH? In the more precise version, do we take CH′


to be (⟨α, L⟩, ⟨α, K⟩) or (⟨α, L⟩, ⟨α, β⟩)? Suppose the latter, which is natural, particularly if successive-cyclic raising is necessary in order to remove all –Interpretable features of α (so that the trace in the initial position will then have all such features deleted). We therefore assume that in (88), the element α in t1 raises to position t2 to form the chain CH1 of (89), then raises again to form CH2, then again to form CH3.

(89) a. CH1 = (t2, t1)
b. CH2 = (t3, t1)
c. CH3 = (we, t1)

In more precise terms, t1 of (89a) is ⟨we, [build airplanes]⟩ and t2 is ⟨we, [to [we build airplanes]]⟩ (or simply the co-constituents themselves); and so on.

But a problem arises: only CH3 is a legitimate LF object, with the –Interpretable Case feature eliminated from t1. The other two chains violate the Chain Condition, so the derivation should crash.

The problem has been recognized for years, along with others concerning successive-cyclic movement to A-positions, in which medial links might be expected to have properties of adjunct movement (see section 3.2). Various proposals have been put forth. It was suggested in chapter 1 that the chains formed by successive-cyclic movement become a single linked chain. In chapter 3 we assumed that a single Form Chain operation yields a multimembered chain; but that proposal does not fit easily into the current framework, and the motivation has largely disappeared with the revision of the theory of movement to incorporate the MLC.

In the present framework, the natural proposal is to eliminate the chains CH1 and CH2, leaving only the well-formed chain CH3. That result would be achieved if the traces formed by raising of the head of a chain are invisible at LF. Why might that be the case?

In the phonological component, traces delete.
We have found no reason to extend that convention to the N → λ computation, and indeed cannot; were we to do so, θ-positions would be invisible at LF and argument chains would violate the Chain Condition (analogous considerations hold for other chains). But we can extend the convention partially, stipulating that raising of α heading the chain CH = (α, t) deletes the trace formed by this operation (that is, marks it invisible at LF). Suppose we do so. At LF, then, all that is seen is the chain CH3, which satisfies the Chain Condition.

Can the deleted traces erase under the deletion-erasure principle (52)? We know that they cannot fully erase: they are terms, and terms cannot erase (see (53)). But the intermediate deleted traces do not enter into interpretation. Therefore, the economy condition (52b), which erases deleted formal features


where possible, allows erasure of formal features of the intermediate traces if something remains. The phonological features do not remain; they have been stripped away by Spell-Out. But in the case of an argument, semantic features remain. These are not subject to the operations of checking theory (including (52)), which are restricted to formal features. Therefore, a formal feature F of an intermediate trace of an argument may erase, and indeed must erase if possible. We therefore conclude that formal features of the intermediate trace of A-movement erase. We can now informally think of the set of chains so produced as a single linked chain, along the lines of chapter 1, with defective intermediate traces.

Filling in the details of this outline, we have a theory of successive-cyclic movement that fits into the broader framework. Intermediate traces are invisible at LF; the only chain subjected to interpretation is the pair (α, t), α in the highest position of raising and t in the position of lexical insertion (which for convenience I will continue to call the base position, borrowing the term from EST). In an argument chain, formal features of intermediate traces are erased.75 We derive the property (90).

(90) The intermediate trace t of an argument cannot be attracted; hence, t does not prevent attraction of an element that it c-commands.

The argument extends to traces of A-movement generally. Thus, the head α of such a chain can freely raise, but the properties of the trace t left by the operation will depend on the feature composition of α. If α has semantic features, all formal features of t erase; if α is a pure expletive, its sole formal feature [D] remains, though it is deleted (invisible at LF).

Language design must therefore be such that the trace of a raised expletive can never be attracted improperly or bar attraction required for convergence. That is indeed the case.
The only relevant construction would be (91), in which an expletive has raised, leaving a trace.

(91) there seem [t to be some books on the table]

The Case and φ-features of book must raise to matrix I, though t is closer to this position. The problem is overcome if t lacks relevant features, that is, features that can enter into a checking relation with matrix I. If so, matrix I attracts the features of the associate book, as required. But we already know that the trace of the expletive lacks such features. Its only formal feature is its category [D], which is irrelevant, the EPP having already been satisfied by the expletive itself. This is, furthermore, the only kind of construction in which the problem of attracting expletive trace could arise. We therefore conclude that the trace of an expletive does not enter into the operation Attract/Move;


it is immobile and cannot bar raising. Once again, strict observance of minimalist assumptions yields the correct array of facts, without redundancy or other imperfection.

We have restricted attention to intermediate traces of argument chains. But the notion intermediate trace is imprecise. Further questions arise when the head of an argument chain is raised to an Ā-position, as in (92).

(92) a. what did John see t
b. what [t was seen t]
c. (guess) what there is t in the room

In all cases the trace t (= what) heads an A-chain and is raised further by wh-movement. In (92a,c) features of t (which could head a nontrivial argument chain, as in (92b)) must still be accessible for Case checking and associate raising, respectively. These are not what we think of intuitively as intermediate traces of successive-cyclic movement, but the computational system does not make the intuitive distinctions (unless modified to do so). We therefore have to ask what happens to the features of the traces in these constructions: are their formal features deleted and erased, as in the case of successive-cyclic raising?

One possibility is to sharpen the notion intermediate trace to exclude these cases, but a more attractive one is to extend the discussion to them. If so, a host of questions arise. Thus, in (92a), or the more complex structure what did John expect t to be seen t, in which t heads a nontrivial chain, can the Case feature F of t raise covertly for Case checking, or must the wh-phrase have passed through [Spec, AgrO] overtly?
If the former, then F must not have been erased, or presumably even deleted, when wh-movement took place. A convention is then needed requiring erasure of F throughout the array of chains containing F, so that no –Interpretable feature remains in the operator position; questions of some potential interest also arise about the position of reconstruction.

This line of reasoning suggests a narrow modification of the preceding account of feature deletion under movement: formal features of trace are deleted (hence erased) if they are not necessary for the formation of legitimate LF objects that satisfy FI. There are two kinds of objects that concern us here: argument chains satisfying the Chain Condition and operator-variable constructions with the variable heading an argument chain. When A-chains are formed, no formal features in the trace position are necessary; the argument chain is well formed without them. But when wh-movement or some other form of operator raising takes place, the trace left behind heads an argument


chain and must have the full complement of features: Interpretable features required for interpretation of the argument at LF, and –Interpretable features that have not yet been checked (otherwise, the Case feature is never checked, remaining in the operator, and the derivation crashes). We conclude, then, that in A-movement the formal features of the trace are deleted and erased, but in wh-movement (and other operator movement), these features remain intact.

The earlier discussion is unaffected. As for (92a), the revised formulation permits the Case feature F in the argument chain headed by t to raise covertly for Case checking, which now deletes and erases it in both positions of the chain (F, tF) formed by the operation (and in the operator). (92c) falls out the same way. There are a variety of other cases to consider. I will leave the matter here, pending closer study. The general idea, then, should be that formal features of a trace are deleted and erased if they are unnecessary, and that some version of (90) holds for traces of A-movement generally.

It also seems natural to expect (90) to extend to (93).76

(93) Trace is immobile.

The operation Attract/Move can see only the head of a chain, not its second or later members. Though it is not forced, the natural more general principle is that traces also cannot be targets, so that we have (94), with the qualifications already noted.

(94) Only the head of a chain CH enters into the operation Attract/Move.

If (94) holds, we settle a question that was left unresolved in the case of V-raising: do the features of the object Obj adjoin to the head of the V-chain or to its trace? Suppose, say, that V-raising is overt, as in French-type languages. Do the features FF(Obj) adjoin to the trace of [V, AgrO], a copy of the raised V-Agr complex, or to the I complex of which V is a part? The latter must be the case if (94) is correct, as I will assume.
We will return to supporting evidence in section 4.10.

Summarizing, minimalist assumptions lead to (something like) the conditions (95) on Attract/Move. Where CH is a (possibly trivial) chain headed by α,

(95) a. α can raise, leaving the trace t, a copy of α.
b. Formal features of the trace of A-movement are deleted and erased.
c. The head of CH can attract or be attracted by K, but traces cannot attract and their features can be attracted only under narrow conditions reviewed (and left partially open).
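The bookkeeping behind (88)-(89) and (95a-b) can be traced step by step. The sketch below is a deliberately small illustration, assuming (as in the text) that each raising step pairs the new head position with the base position, and that the trace left when the head of an already nontrivial chain raises is marked invisible at LF. The function name `raise_successively` and the use of plain strings for positions are hypothetical conveniences, not notation from the text.

```python
def raise_successively(base, landing_sites):
    """Return (chains formed, traces invisible at LF, chain seen at LF)."""
    chains, invisible = [], []
    head = base
    for site in landing_sites:
        if chains:
            # The head already heads a nontrivial chain, so the trace this
            # raising step leaves behind is deleted (invisible at LF).
            invisible.append(head)
        chains.append((site, base))
        head = site
    return chains, invisible, (chains[-1] if chains else (base,))


# (88): we are likely [t3 to be asked [t2 to [t1 build airplanes]]]
chains, invisible, lf_chain = raise_successively("t1", ["t2", "t3", "we"])
print(chains)     # the chains CH1, CH2, CH3 of (89)
print(invisible)  # intermediate traces
print(lf_chain)   # the only chain interpreted at LF
```

The first raising step leaves a visible trace in the base position; only later steps produce deleted traces, which is why the chain interpreted at LF links the highest position directly to t1.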


A problem is posed, once again, by such constructions as (46), which appear in preraising form as (96).

(96) I(nfl) seem [PP to α] Cl

There is good evidence that α c-commands into the infinitival clause Cl. Suppose that α = him and Cl = they to like John, so that the preraising structure is (97), yielding they seem to him to like John after raising.

(97) I(nfl) seem [to him] [Cl they to like John]

Then a Condition C violation results if α (= him) takes John as antecedent. It follows that α must also c-command they.

Why, then, does I in (96) attract the subject they of Cl rather than α, which c-commands it, an apparent Relativized Minimality violation?

In (96) seem has two internal arguments: PP and Cl. On present assumptions, that requires an analysis as a Larsonian shell, with seem raising to the light verb v and subsequent operations yielding (98) (internal structure of I omitted).

(98) [I′ I [VP1 v [VP2 XP [V′2 seem YP]]]]

Since PP is the optional argument and Cl the obligatory one in (96), it is likely that Cl is the complement YP and PP the specifier XP, which yields the observed order directly.77

When seem raises to adjoin to v, it forms the chain CH = (seem, t). PP is in the minimal domain of CH, but this does not suffice to place PP within the neighborhood of I that can be ignored when we ask whether they in (97) is close enough to IP to be attracted by IP. It is not, because nothing has adjoined to I at the point when they raises.78 Therefore, α = him is closer to IP and has features that can enter into a checking relation with I (e.g., its D-feature). We expect, then, that they should not raise in (97), contrary to fact.

In some languages, the facts generally accord with these expectations. In French, for example, raising is barred in the counterpart to (97), unless PP is a clitic, which raises, presumably leaving a trace.79

(99) a. *Jean semble à Marie [tJ avoir du talent]
Jean seems to Marie to have talent
b. Jean lui semble tl [tJ avoir du talent]
Jean to-her seems to have talent
'Jean seems to her to have talent.'

The results are as predicted. Marie is closer to IP than the embedded subject Jean in the position tJ of (99a) and therefore bars raising. The Case of Jean is not checked and erased, so the derivation crashes. In (99b) the trace of the clitic cannot be attracted, by (95). Therefore, raising is permitted and the derivation converges.

If PP in such structures could be raised by Ā-movement (topicalization, wh-movement), it would leave the structure (100).

(100) V t Cl

According to the principle (95), the effects should be as in (99b). The evidence appears to be partial and somewhat obscure, however. The status of the English constructions still remains unexplained, along with many other related questions.

In (96) α receives its inherent Case and θ-role internally to the construction [seem to α], in terms of properties of seem. Could there be a verb SEEM, like seem except that it selects DP instead of PP and assigns a θ-role but no Case? We would then be able to derive the structure (101), which would yield the outputs (102a-c), among others.

(101) I [[SEEM-v] [VP DP [VP ts Cl]]]

(102) a. Bill SEEMs tB [that it is raining]
b. there SEEMs someone [that it is raining]
c. there SEEMs someone [John to be likely that t is intelligent]

Presumably, no such verb as SEEM can exist, perhaps because of some interaction between inherent Case and θ-role of internal argument for which we have no natural expression and which, to my knowledge, remains unexplained and largely unexplored. There are many similar questions.80

Consider the ECM constructions in (103).

(103) a. I expected [there to be a book on the shelf]
b. I expected [there to seem [t to be a book on the shelf]]

In earlier chapters we assumed that the expletive there raises to [Spec, AgrO] to have its accusative Case checked by the transitive verb expect and that the associate book then adjoins to the raised expletive. We have now rejected that view. The associate raises not to there but to the matrix verbal element Vb


(= [expect, AgrO]) itself (not to its Spec). Furthermore, the expletive is pure and therefore cannot raise to [Spec, AgrO] at all,81 though it can raise overtly to subject, as in (91), to satisfy the EPP. Matrix Vb attracts an appropriate feature F of the associate book (Case or φ-feature), this being the closest F that can enter into a checking relation with Vb. (103b) is accounted for along the same lines.

Attraction of the associate in these cases radically violates the HMC. The status of this condition remains unresolved. It always fit rather uneasily into the theories of locality of movement, requiring special assumptions. The empirical motivation for the HMC is also not entirely clear, and there seem to be fairly clear counterexamples to it. Can the HMC fall within the framework just outlined? That seems doubtful.

We can restrict attention to adjunction of an X⁰ element α to another X⁰ element β. Suppose we could establish that β attracts only the closest such α, meaning: a feature F of α is the closest feature that enters into a checking relation with a sublabel of β⁰max. But that is not enough to yield the HMC. Specifically, there is nothing to prevent α from skipping some head γ that offers no features to be checked. Consider, for example, the construction (104).

(104) [tree diagram rooted in VP]

allows incorporation. Does V1 attract α, violating the HMC?

If α = V, so that V1 requires a verbal affix, then we might argue that V1 attracts the closer verb V2, barring attraction of α. But that seems implausible if α = N and V1 requires noun incorporation. It is easy enough to add further conditions, in effect stipulating the HMC; but we need strong arguments for that undesirable conclusion. The problem might be overcome if noun incorporation involves a θ-relation, not a structural configuration (see Di Sciullo and Williams 1988). But it is still necessary to bar unwanted cases of long head raising in other cases. The situation remains unsatisfactory.

Given the central role of feature checking in the minimalist approach, we want to be clear about just what it means. Category K attracts F only if F enters


into a checking relation with a sublabel of K. But earlier discussion skirted an important problem: what happens in the case of feature mismatch?

Two related questions arise.

(105) a. In a configuration for feature checking, are features checked if they fail to match?
b. If a feature is not checked, does the derivation crash?

Suppose, for example, that DP has nonnominative Case (accusative, null, or something else) and has been raised to [Spec, T], T finite, where nominative Case is assigned. Then question (105a) asks which of (106a) or (106b) is correct.

(106) a. The feature is checked in this configuration so that Case can no longer be accessed by the computation, and the derivation crashes by virtue of feature incompatibility.
b. This is not a checking configuration at all, so that Case is still accessible.

As for (105b), suppose that a verb such as want takes a complement that may or may not have Case (want a book, want to leave). Then (105b) asks whether the verb must have two distinct lexical entries, or may have only one, the Case feature simply not being assigned if there is no object.

The tacit assumption of earlier work has been that the answer to (105a) is positive (i.e., (106a), not (106b)) and that the answer to (105b) is negative. In the illustrative example, if DP is in [Spec, T] but has accusative or null Case, then it cannot raise to check this still unmatched feature; and a verb like want can have a single lexical entry, with the Case feature assigned or not.

We now know that the assumption about (105b) was incorrect. If a feature that is –Interpretable is not checked, the derivation crashes. As discussed, all formal features of heads that create checking domains (that is, all their formal features apart from category) are –Interpretable (Case and φ-features of verbs and adjectives, strong features, etc.).
Therefore, all must be checked, and the answer to (105b) is definitely positive.

Assuming that, suppose we also replace (105a) by its negation (essentially following Ura 1994).

(107) Features cannot be checked under feature mismatch.

Then in the example of Case conflict, DP will be able to move further to receive Case, but the derivation will crash because T fails to assign its Case-assigning feature. And verbs with optional objects will have distinct lexical entries, one with and one without the Case feature.


Considerations of conceptual naturalness seem to favor (107) over the alternative, in part because of the answer to (105b). We will return to some further support for the same conclusion.

To show that all unwanted constructions are barred by the condition (107) combined with the positive answer to (105b), it is necessary to survey a broad range of possibilities: thus, the subject might have its (improper) accusative Case checked in the checking domain of V raised to T, while the (improper) nominative Case feature of the object raises to T for checking nominative Case. Both subject and T might raise to higher positions for Case checking. Options proliferate, and it is no simple matter to show that all are blocked for one or another reason. It is therefore reasonable to refine (107) to bar these possibilities across the board: feature mismatch cancels the derivation. The complexity of the computations required to determine whether a derivation will converge is therefore sharply reduced, a desideratum quite generally. Furthermore, as we will see in section 4.10, this step is necessary on empirical grounds as we refine the framework in accord with the Minimalist Program.

I will therefore strengthen (107) to (108).

(108) Mismatch of features cancels the derivation.

A configuration with mismatched features is not a legitimate syntactic object.82 We distinguish mismatch from nonmatch: thus, the Case feature [accusative] mismatches F = [assign nominative], but fails to match F = I of a raising infinitival, which assigns no Case. I have left the notion match somewhat imprecise pending a closer analysis. But its content is clear enough for present purposes: thus, the categorial feature D of DP matches the D-feature of I; φ-features match if they are identical; and so on.

Notice that cancellation of a derivation under mismatch should be distinguished from nonconvergence. The latter permits a different convergent derivation to be constructed, if possible.
But the point here is literally to bar alternatives. A canceled derivation therefore falls into the category of convergent derivations in that it blocks any less optimal derivation; mismatch cannot be evaded by violation of Procrastinate or other devices. If the optimal derivation creates a mismatch, we are not permitted to pursue a nonoptimal alternative.

Suppose, for example, that a series of applications of Merge has formed a verb phrase α with DP1 as specifier and DP2 as complement, bearing accusative and nominative Case, respectively. We will see that the optimal derivation from that point leads to mismatch. Since mismatch is equivalent to convergence from an economy-theoretic point of view, we cannot construct a less optimal derivation from α that might converge, with the thematic subject bearing


accusative Case and the thematic object nominative Case. The interpretation is motivated on purely conceptual grounds: it sharply reduces computational complexity. Again, conceptual and empirical considerations converge.

We now distinguish between a checking configuration and a checking relation. Suppose that K attracts F, which raises to form {H(K), {α, K}}; here α = F if the operation is covert, and includes whatever is required for convergence if the operation is overt. Each feature of FF[F] (including F) is in the checking domain of each sublabel f of K.83 We now say that

(109) Feature F′ of FF[F] is in a checking configuration with f; and F′ is in a checking relation with f if, furthermore, F′ and f match.

If F′ and f fail to match, no problem arises; if they mismatch (conflict), the derivation is canceled with an illegitimate object.

In the illustrative example, if DP has nonnominative Case and has been raised to [Spec, T], the Case feature [CF] of DP is in a checking configuration with the Case feature of T, but not in a checking relation with it. Hence, the target TP did not attract [CF], because no checking relation is established. It does, however, attract the categorial feature [D] of DP, to satisfy the EPP. But then [CF] is in a mismatching checking configuration with f, and the derivation is canceled.

Suppose that f is the Case-assigning feature of K, α and β have the unchecked Case features Fα and Fβ (respectively), and Fα but not Fβ matches f. Suppose that β is closer to K than α. Does β prevent K from attracting α? The Case feature Fβ of β does not do so; it is not attracted by K and is therefore no more relevant than some semantic feature of β. Suppose, however, that β has some other feature F′β that can enter into a checking relation with a sublabel of K. Then β is attracted by K, which cannot see the more remote element α. A mismatching relation is created, and the derivation is canceled: α cannot be attracted.

Consider again the superraising example (80) (= I(nfl) seems [that it was told John ]).
The intermediate DP it is able to check the D-feature but not the Case feature of matrix I; the more remote DP John can check all features of I. But I cannot see beyond it, so only that intervening element can raise, causing the derivation to crash. Had John been permitted to raise, the derivation would wrongly converge.

The definition (82) of the MLC therefore has the somewhat sharper form (110) as a consequence of the refinement of the concept checking relation.

(110) Minimal Link Condition
K attracts α only if there is no β, β closer to K than α, such that K attracts β.
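The interaction of (108), (109), and (110) can be made concrete with a toy model. The sketch below is an illustration only, under assumptions that go beyond the text: features are represented as a Python dictionary from a feature type to a value, candidates are supplied already ordered from closest to most remote, and the names `Rel`, `relate`, `attract`, and the value `"3sg"` are hypothetical. It distinguishes match, mismatch, and nonmatch, lets K see only the closest candidate offering a matching feature, and reports whether sublabels of K are left unchecked.

```python
from enum import Enum


class Rel(Enum):
    MATCH = "match"
    MISMATCH = "mismatch"   # conflicting values: cancels the derivation (108)
    NONMATCH = "nonmatch"   # target has no such sublabel: harmless


def relate(kind, value, sublabels):
    """Compare one feature of a candidate with the sublabels of K."""
    if kind not in sublabels:
        return Rel.NONMATCH
    return Rel.MATCH if sublabels[kind] == value else Rel.MISMATCH


def attract(sublabels, candidates):
    """Return (attracted, status, unchecked sublabels of K).

    `candidates` is ordered closest-first. K sees only the closest candidate
    offering at least one matching feature (110); anything further away is
    never considered.
    """
    for name, feats in candidates:
        rels = {k: relate(k, v, sublabels) for k, v in feats.items()}
        if Rel.MATCH not in rels.values():
            continue                      # nothing to check: not attracted
        if Rel.MISMATCH in rels.values():
            return name, "canceled", set(sublabels)
        checked = {k for k, r in rels.items() if r is Rel.MATCH}
        left = set(sublabels) - checked
        return name, ("unchecked features remain" if left else "checked"), left
    return None, "nothing attracted", set(sublabels)


# Superraising (80): `it` has lost its Case feature but keeps D and phi.
matrix_I = {"D": "D", "phi": "3sg", "Case": "nom"}
it = ("it", {"D": "D", "phi": "3sg"})
john = ("John", {"D": "D", "phi": "3sg", "Case": "nom"})
superraising = attract(matrix_I, [it, john])

# Case conflict: an accusative DP raised to [Spec, T], T finite.
conflict = attract({"D": "D", "Case": "nom"}, [("DP", {"D": "D", "Case": "acc"})])

# Nonmatch: a raising infinitival I assigns no Case, so no problem arises.
nonmatch = attract({"D": "D"}, [("DP", {"D": "D", "Case": "acc"})])
```

In the superraising case the model attracts it and leaves the Case sublabel of matrix I unchecked, which on the account in the text is what makes the derivation crash; John is never examined. A candidate whose only relevant feature mismatches is skipped rather than attracted, in line with the remark that such a Case feature is no more relevant than a semantic feature.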


The definition of the operation Attract/Move incorporates this property, yielding the MLC in what seems to be the required form. I leave this here as the final version of the MLC, apart from a refinement of the notion closeness in section 4.10.

Consequences ramify, and merit further thought.

The notions of checking theory have been defined only for Attract/Move, but we have seen that that may be too narrow. Thus, checking domains are established by Merge in (64), repeated here as (111), and in simple expletive constructions such as (112).

(111) a. (I wonder) [CP whether Q [he left yet]]
b. (I wonder) [CP [Q if Q] [he left yet]]

(112) there is a book on the table

In these cases whether, if, and there remain in their base positions, but satisfy the strong features of Q, I. The operations are closely analogous to raising of a wh-phrase, I, or DP to become [Spec, I], checking the strong features in the same way.

These cases of merger will fall under checking theory if we extend the notion close to distinct syntactic objects α and K, taking the categorial feature CF of the head H(α) to be close to K. K attracts α, then, if CF enters into a checking relation with a sublabel of K, in which case α becomes [Spec, H(K)] or adjoins to H(K) (or its zero-level projection, if elements have already adjoined to H(K)). The result is as intended for (111) and (112), but is too broad elsewhere. Thus, it would cover merger of the subject Subj with V within any version of the VP-internal subject hypothesis and would thus force the subject to have the accusative Case and object agreement features of the transitive main verb of the VP (perhaps raised to the light verb v of which Subj is the specifier).

The required distinction can be made in various ways. One is to extend Attract/Move only to merger of nonarguments, keeping strictly to the conception of Attract/Move as the formal expression of the feature-checking property of natural language. That has some plausibility.
Arguments (and operator phrases constructed from them) satisfy the Chain Condition nontrivially; an argument is a nontrivial chain CH = (α, t), where α has raised for feature checking and t is in a θ-position. In contrast, the elements whether, if, and there of (111) and (112) do not satisfy the Chain Condition at all.

A second approach is to allow the categorial feature CF of α to be attracted by a distinct object K only if it enters into a checking relation with a strong sublabel of K. The rationale is that Merge creates a checking domain only when overt insertion is forced, violating Procrastinate in the case of Move; hence a special case for both Merge and Move.

I will adopt the first option, though without strong reasons at this point. Some will be suggested in section 4.10, though they rely on specific choices about mechanisms that, while reasonable, are not well grounded. The choice is therefore highly tentative.

4.6 Movement and θ-Theory

Under any approach that takes Attract/Move to be driven by morphological features (whether Move F, Move α, and Greed, or some other variant), there should be no interaction between θ-theory and the theory of movement. θ-roles are not formal features in the relevant sense; typically they are assigned in the internal domain, not the checking domain, and they differ from the features that enter into the theory of movement in numerous other respects. The conclusion is immediate in Hale and Keyser's (1993a) configurational approach to θ-theory, implicit in some others (though rejected in theories that permit percolation, transmission, and other operations on θ-features). Let us assume it to be valid.

In fundamental respects, θ-theory is virtually complementary to the theory of checking, a fact expressed in part as a descriptive generalization in the Chain Condition: in the chain CH = (α1, ..., αn), αn receives a θ-role and α1 enters into a checking relation. Furthermore, only αn can assign a θ-role, so that only the base position is "θ-related," able to assign or receive a θ-role (see note 75). The properties of α1 follow from Last Resort movement. Consider the properties of αn, that is, the fact that movement takes place from a position that is θ-related to one that is not: for an argument, from a θ-position to a non-θ-position; for a head (or predicate), from a position in which a θ-role is assigned to one in which it is not.

With regard to assignment of θ-roles, the conclusion is natural in Hale and Keyser's theory. A θ-role is assigned in a certain structural configuration; β assigns that θ-role only in the sense that it is the head of that configuration (though the properties of β or its zero-level projection might matter). Suppose β raises, forming the chain CH = (β, ..., t). The trace t remains in the structural configuration that determines a θ-role and can therefore function as a θ-role assigner; but the chain CH is not in a configuration at all, so cannot assign a θ-role. In its raised position, β can function insofar as it has internal formal features: as a Case assigner or a binder. But in a configurational theory of θ-relations, it makes little sense to think of the head of a chain as assigning a θ-role.84


With regard to receipt of θ-roles, similar reasoning applies. If α raises to a θ-position Th, forming the chain CH = (α, t), the argument that must bear a θ-role is CH, not α. But CH is not in any configuration, and α is not an argument that can receive a θ-role. Other conditions too are violated under earlier assumptions or others like them, but I will not spell out the problems further.

We conclude, then, that a raised element cannot receive or assign a θ-role. θ-relatedness is a "base property," complementary to feature checking, which is a property of movement. More accurately, θ-relatedness is a property of the position of merger and its (very local) configuration. The same considerations bar raising-to-object, even if the object is a specifier in a Larsonian shell. We thus derive the P&P principle that there is no raising to a θ-position, actually in a somewhat stronger form, since θ-relatedness generally is a property of "base positions."

Thus, DP cannot raise to [Spec, VP] to assume an otherwise unassigned θ-role. There can be no words HIT or BELIEVE sharing the θ-structure of hit and believe but lacking Case features, with John raising as in (113) to pick up the θ-role, then moving on to [Spec, I] to check Case and agreement features.

(113) a. John [VP t [HIT t]]
b. John [VP t [BELIEVE [t to be intelligent]]]

Surely no strong feature of the target is checked by raising to the [Spec, HIT] position, so overt raising is barred; in fact, no checking relation is established.85 The only possibility is direct raising to [Spec, I]. The resulting sentences John HIT and John BELIEVES to be intelligent are therefore deviant, lacking the external argument required by the verb.

The deviance of (113a) sheds some light on the question left open about θ-role assignment in a configuration headed by β unraised to which α has adjoined, yielding the complex [β α β]; see note 84. Can α, which heads a chain, participate in θ-role assignment in the configuration headed by β? Suppose so. In the illegitimate John I HIT t, the argument chain CH = (John, t) satisfies the Chain Condition: John receives Case in [Spec, I] and t receives a θ-role within the VP. HIT then raises to I, so that John falls into its checking domain. If John can receive the subject θ-role in the configuration [Spec, I] headed by the complex [I HIT I] formed by V-raising to I, then all properties are satisfied and the expression should be well formed, with John bearing a double θ-role. Assuming that the expression is deviant, HIT cannot contribute to assigning a θ-role when it adjoins to I. The principle that θ-relatedness is a "base property," restricted to configurations of lexical insertion, has to be understood in an austere form.


What is the nature of the deviance in the derivation of (113)? More generally, what is the status of a violation of the θ-Criterion, whether it involves an unassigned θ-role or an argument lacking a θ-role? No relevant performance tests are available, but economy conditions might provide evidence. Thus, if the derivation converges, it could block others under economy conditions, so the deviance would have to be a case of nonconvergence if that result is empirically wrong.86

To illustrate these possibilities, suppose that the theory of economy entails that shorter derivations block longer ones. The assumption seems plausible.87 Kitahara (1994) observes that the condition can be invoked to deduce Procrastinate, if we assume that application of phonological rules counts in determining length of derivation; trace deletion is required only if an operation is overt, so covert operations are always preferred, if they yield convergence. The economy condition receives independent confirmation in section 4.10.

Now consider any simple sentence, say, (114a) with structure (114b), before Spell-Out.

(114) a. John likes Bill
b. [John I VP]

There is a derivation in which John is inserted directly in [Spec, I], not raising from VP. If the VP-internal subject hypothesis is correct, as so far assumed, that derivation must crash, or it will block the less economical derivation with raising of John to [Spec, I]; the two begin with the same numeration, but the desired one has an extra step. All formal features are checked.88 Insertion of John satisfies the EPP, and other features are checked as or by free riders. The only defects of the unwanted derivation lie in θ-theory: the argument John lacks a θ-role, and like does not assign its external θ-role. If either of these properties constitutes a violation of FI, the derivation crashes, and the problem disappears.

The shortest derivation condition, then, entails that a violation of the θ-Criterion causes the derivation to crash, by failure to satisfy FI.
Still open is the question whether the problem is the failure of an argument to receive a θ-role, or the failure of a θ-assigning head to assign its θ-role, or both. Independent reasons for the first, at least, are given in section 4.9: an argument without a θ-role violates FI, causing the derivation to crash. Note that these questions about violation of the θ-Criterion have no relation to the conclusion that θ-role is not a formal property that permits Last Resort movement, as Case and agreement features do.

The form of the VP-internal subject hypothesis has so far been left vague. We have, however, generally assumed a version of the Hale-Keyser approach to θ-theory that has certain consequences for the hypothesis.89 In particular, if a verb has several internal arguments, then we have to postulate a Larsonian shell, as in (98) or (115), where v is a light verb to which V overtly raises.

(115) [vmax v [VP ... V ...]]

The internal arguments occupy the positions of specifier and complement of V. Accordingly, the external argument cannot be lower than [Spec, v]. If it is [Spec, v], as I will assume, then the v-VP configuration can be taken to express the causative or agentive role of the external argument. It would be natural to extend the same reasoning to transitive verb constructions generally, assigning them a double-VP structure as in (115), the agent role being understood as the interpretation assigned to the v-VP configuration. A V-object construction is therefore maximal, not V'. The conclusion gains further support if such constructions may assign a partially idiosyncratic semantic role (see Marantz 1984). If intransitive (unergative) verbs are hidden transitives, as Hale and Keyser suggest, then only unaccusatives lacking agents would be simple VP structures. The analysis, which is natural in the present framework, also has welcome empirical consequences, as we will see. I will assume it in what follows.

In these terms, failure of a transitive verb to assign an external θ-role could be interpreted as simply meaningless. The external role is a property of the v-VP configuration, and a specifier bearing this role is therefore a necessary part of the configuration; a transitive verb assigns an external θ-role by definition.90 The question of the nature of the deviance in (113) would therefore not arise: there are no such objects. The only remaining question is whether failure of an argument to receive a θ-role is a case of nonconvergence or deviant convergence. Presumably it is the former, under the natural interpretation of FI, the conclusion we will reach in section 4.9.

The spirit of this analysis requires that there be no AgrP intervening between the light verb v and its VP complement in (115), contrary to what is assumed in the most extensive development of the complex-VP analysis of transitives (see Koizumi 1993, 1995). The issue takes a different form under refinements introduced in section 4.10. We will return to the whole issue in sections 4.9 and 4.10.
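The economy argument built on (114) earlier in this section can be restated as a short sketch. It is an illustration only, under assumptions that go beyond the text: the step counts are hypothetical (all that matters is that the raising derivation has one step more), and convergence is collapsed into a single flag for satisfaction of FI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Derivation:
    name: str
    steps: int           # hypothetical step count
    satisfies_fi: bool   # False if, e.g., an argument is left without a theta-role

def admissible(derivations):
    """Among convergent derivations from one numeration, shorter ones block longer ones."""
    convergent = [d for d in derivations if d.satisfies_fi]
    if not convergent:
        return []
    fewest = min(d.steps for d in convergent)
    return [d for d in convergent if d.steps == fewest]

def candidates(theta_violation_crashes):
    # Direct insertion of John in [Spec, I] leaves John without a theta-role.
    direct = Derivation("insert John in [Spec, I]", 3, not theta_violation_crashes)
    raising = Derivation("raise John from VP", 4, True)
    return [direct, raising]

for crashes in (False, True):
    print(crashes, [d.name for d in admissible(candidates(crashes))])
```

If the θ-Criterion violation counted as convergent, the shorter direct-insertion derivation would block the desired one; once the violation is treated as a failure of FI, only the raising derivation survives.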

4.7 Properties of the Transformational Component

4.7.1 Why Move?

We have so far considered two operations, Merge and Move, each with two cases, substitution and adjunction. The operation Merge is inescapable on the weakest interface conditions, but why should the computational system CHL in human language not be restricted to it? Plainly, it is not. The most casual inspection of output conditions reveals that items commonly appear "displaced" from the position in which the interpretation they receive is otherwise represented at the LF interface.91 There is no meaningful controversy about the basic facts. The only questions are, what are the mechanisms of displacement, and why do they exist? As for their nature, on minimalist assumptions we want nothing more than an indication at LF of the position in which the displaced item is interpreted; that is, chains are legitimate objects at LF. Since chains are not introduced by selection from the lexicon or by Merge, there must be another operation to form them: the operation Attract/Move.

The second question (why do natural languages have such devices?) arose in the early days of generative grammar. Speculations about it invoked considerations of language use: facilitation of parsing on certain assumptions, the separation of theme-rheme structures from base-determined semantic (θ) relations, and so on.92 Such speculations involve "extraneous" conditions of the kind discussed earlier, conditions imposed on CHL by the ways it interacts with external systems. That is where we would hope the source of "imperfections" would lie, on minimalist assumptions.

Our concern here is to determine how spare an account of the operation Attract/Move the facts of language allow. The best possible result is that bare output conditions are satisfied in an optimal way.

This question was a second focus of the effort to resolve the tension between descriptive and explanatory adequacy, alongside the steps that led to X-bar theory in the 1960s. The central concern was to show that the operation Move α is independent of α; another, to restrict the variety of structural conditions for transformational rules. These efforts were motivated by the usual dual concerns: the empirical demands posed by the problems of descriptive and explanatory adequacy, and the conceptual demands of simplicity and naturalness. Proposals motivated by these concerns inevitably raise the new leading problem that replaces the old: to show that restricting the resources of linguistic theory preserves (and we hope, even enhances) descriptive adequacy while explanation deepens. The efforts have met with a good deal of success,93 though minimalist assumptions would lead us to expect more.

4.7.2 Departures from the Best Case

The properties that motivate Attract/Move have to be captured somehow in the theory of human language, but we would like to show, if possible, that no further departure from minimalist assumptions is required. That is the problem that comes into sharper focus as explanatory adequacy begins to take its place on the research agenda.

Consider first the independence of Move α from choice of α. Although this currently seems a reasonable supposition, it has been necessary to distinguish various kinds of movement: XP-movement from X0-movement and (among XPs) A-movement from Ā-movement. Various kinds of "improper movement" have been ruled out in various ways (e.g., head raising to an Ā-position followed by raising to Spec). One goal is to eliminate any such distinctions, demonstrating on general grounds that the "wrong kinds" of movement crash; this is not an easy problem, though it is by now substantially reduced.

Some of the general constraints introduced to reduce the richness of descriptive apparatus also had problematic aspects. An example is Emonds's influential structure-preserving hypothesis (SPH) for substitution operations. As has been stressed particularly by Jan Koster, the SPH introduces an unwanted redundancy in that the target of movement is somehow "there" before the operation takes place; that observation provides one motive for nonderivational theories that construct chains by computation on LF (or S-Structure) representations. The minimalist approach overcomes the redundancy by eliminating the SPH: with D-Structure gone, it is unformulable, its consequences derived (we hope to show) by the general properties of Merge and Attract/Move.

It has also been proposed that something like the SPH holds of adjunction: bar levels are matched within adjunctions. This extended SPH introduces no redundancy and is not affected by the Minimalist Program, though we would like to deduce it from more elementary considerations.

The descriptive facts are not entirely clear, but they might be as just described: only YP can adjoin to XP and only Y0 can adjoin to X0, though covert operations may have the apparent effect of adjoining YP to nonmaximal X0 (e.g., VP-adjunction to causative Vc, which we now take to be adjunction of formal features of the main verb to Vc). We then have several problems to deal with.

(116) a. Why does the SPH hold at all?
b. Why is there a difference before and after Spell-Out, apparently violating the (optimal) uniformity assumption on CHL?
c. Why does the target K project after adjunction?


Under the feature movement theory we are now assuming, the questions are narrower still. All covert raising is adjunction of features, so question (116b) dissolves and (116c) arises only for overt adjunction. In that case it has already been answered for adjunction to X0 and to projecting XP. Furthermore, under the interpretation of Last Resort and checking domain that we are now assuming (see (51) and note 47), it is unclear that adjunction to nonminimal XP is possible at all; let us assume so nevertheless and see what problems arise.

What remains, then, is question (116a) and three cases of question (116c) for overt movement: adjunction of YP to XP that is

(117) a. the root
b. a specifier
c. itself an adjunct

Consider question (116a). In the case of overt movement, the answer may lie in part in properties of the morphological component. At Spell-Out, the structure already formed enters Morphology, a system that presumably deals only with wordlike elements, which we may take to be X0s (that is, either an item H selected from the lexicon or such an item with elements adjoined to it to form H0max). Suppose we assume the property (118).

(118) Morphology deals only with X0 categories and their features.

The morphological component gives no output (so the derivation crashes) if presented with an element that is not an X0 or a feature.

On this natural assumption, the largest phrases entering Morphology are X0s, and if some larger unit appears within an X0, the derivation crashes.

Question (116a) remains, then, only for adjunction of nonmaximal α (either a feature or a category) to nonminimal XP (XP with at least a head H(XP) and a complement). Both cases are barred under the condition (119) on checking domains (see note 47).

(119) α adjoined to nonminimal K is not in the checking domain of H(K).

If so, the operation we are trying to eliminate cannot take place. We will see directly that there is reason to believe that (119) holds. Then question (116a) is fully answered, and we are left with the three cases (117) of question (116c).

All cases of (117) are barred by Last Resort if checking domains are construed as in (119). A number of special cases are barred for independent reasons.
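The filtering role that (118) assigns to Morphology can be pictured with a minimal sketch. The representation is an assumption made only for illustration: each element carries a hypothetical `level` tag ("feature", "X0", or "XP") and a list of elements adjoined to it, and the function simply reports whether Morphology would give an output.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    level: str                                   # "feature", "X0", or "XP" (illustrative tags)
    adjoined: List["Node"] = field(default_factory=list)

def morphology_accepts(node: Node) -> bool:
    """Property (118): only X0 categories and features, all the way down."""
    if node.level == "feature":
        return True
    if node.level != "X0":
        return False                             # a larger unit: no output, derivation crashes
    return all(morphology_accepts(child) for child in node.adjoined)

verb = Node("V", "X0")
complex_head = Node("I", "X0", [verb, Node("F", "feature")])
bad_head = Node("v", "X0", [Node("VP", "XP")])

print(morphology_accepts(complex_head), morphology_accepts(bad_head))
```

A head with another head or a feature adjoined passes; an X0 containing a phrasal adjunct does not, which is the sense in which overt adjunction of YP to X0 yields a crash.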


Summarizing, the asymmetry of projection after movement has solid grounds: it is only the target that can project, whether movement is substitution or adjunction. The only obvious problem is that the constraints appear to be too strong, barring YP adjunction to XP entirely.

We have found a number of reasons to question the status of YP adjunction to XP. There are others. Suppose we suspend the barriers to the operation already found, thus allowing adjunction of α to nonminimal K, as in (120).

(120) [L α K]

Suppose that α projects so that L = α and the category formed is [α, α]. We have to determine what is the head of the chain formed by the adjunction operation: is it α, or the two-segment category [α, α]? The latter choice is ruled out by the uniformity condition (17). But the former leaves us with a category [α, α] that has no interpretation at LF, violating FI. It cannot be, then, that the raised element α projects, if it requires an interpretation. The same problem would have arisen, this time for α, had we taken the head of the chain to be [α, α]. Similar questions arise about the target, if it projects. Once again, adjunction to nonminimal XP leads to complications however we construe the structure formed, raising further questions about its status.

Still assuming such adjunction to be possible, consider the special case of self-attachment, as in (121).

(121) [tree diagram: V adjoined to VP, with the trace t of V inside the lower VP; the target projects]

Thus, suppose we have the VP read the book and we adjoin to it the head read, forming the two-segment category [read [t the book]]. Under the intended interpretation of (121), with the target projected, we have formed the object (122), where β is the target VP = {read, {read, the book}} (omitting further analysis).

(122) {<read, read>, {read, β}}

Suppose, however, that we had projected the adjunct V (read) in (121), yielding (123).

(123) [tree diagram: the structure of (121), with the adjunct V projecting instead of the target]

But this too is an informal representation of (122), just as (121) is, though the intended interpretations differ: in (121) we have projected the target, in (123) the adjunct. Furthermore, the latter interpretation should be barred.

The same question would arise if the operation were substitution, not adjunction. Suppose, for example, that we raise the head N of NP to [Spec, N], NP nonminimal (necessarily, or there is no operation to discuss). Then we construct the same formal object whether we think of NP or Spec as projecting.

We might conclude that this is exactly the right result, with such ambiguity interpreted as a crashed derivation. Then such operations of "self-attachment" (whether adjunction or substitution) are barred on grounds independent of those already discussed.

Let us turn now to the case of raising of V in a complex verb structure, as in (115), repeated here.

(124) [Xmax v [VP ... V ...]]

For several reasons already discussed, the operation cannot have targeted VP, either as adjunction or as substitution. It must be, then, that Xmax is not a projection of the raised verb V but rather a verb phrase distinct from VP, as we have so far assumed. Thus, V raises to an already filled position occupied by the light verb v that has been selected from the lexicon and heads its own projection, vmax. V adjoins to v, forming [v V v]; the v position is not "created" by the raising operation. (For independent evidence, see Collins and Thráinsson 1994.) The operation is permissible if the target v is a light verb requiring a verbal affix. Independently, these conclusions are required by the properties of θ-theory discussed earlier.94

We have so far sidestepped a problem that arises in the case of ordinary head adjunction. Take α, K to be X0s in (120), with α raising to target K, which projects, forming L = {<H(K), H(K)>, {α, K}}. Since K projects, α is maximal.


Thus, α is both maximal and minimal. If that is true of t as well (e.g., the case of clitic raising), then CH satisfies the uniformity condition (17). But suppose t is nonmaximal, as is common in the case of V-raising to I or to V. Then under a natural interpretation, (17) is violated; CH is not a legitimate object at LF, and the derivation crashes. That is obviously the wrong result. We therefore assume that at LF, wordlike elements are "immune" to the algorithm that determines phrase structure status, as stated in (125),

(125) At LF, X0 is submitted to independent word interpretation processes WI.

where WI ignores principles of CHL, within X0.95 WI is a covert analogue of Morphology, except that we expect it to operate compositionally, unlike Morphology, on the assumption that the N → LF mapping is uniform throughout.

Suppose that (120) = L = {γ, {α, K}} is formed by adjunction with K projecting, so that γ = <H(K), H(K)>. So far, there are two ways in which L could have been formed: by strict merger of α, K (without movement), or by raising of α, forming the chain CH, then merging α with K. In either case we form the structure L with the three terms α, K, L. Each of these is a category that is "visible" at the interface, where it must receive some interpretation, satisfying FI. The adjunct α poses no problem. If it heads CH, it receives the interpretation associated with the trace position; if it is added by strict merger, it is presumably a predicate of K (e.g., an adverbial adjunct to a verb). But there is only one role left at LF for K and L (a problem already mentioned). Note that the label γ = <H(K), H(K)> is not a term, hence receives no interpretation.

If L is nonmaximal, the problem is obviated by (125) under a natural interpretation of WI. This should suffice to account for, say, noun incorporation to verbs, or verb incorporation to causatives or light verbs. What is interpreted as covert incorporation of Xmax to a pure head is permitted straightforwardly by the Move F theory.

Suppose L is nonminimal; again, we suspend the barriers to this questionable case of adjunction to investigate the problems further. We now have two terms, L and K, but only one LF role. The structure would be permissible if K lacks a θ-role, as in covert adjunction to an expletive (a case that we now assume does not exist). The only other possibility is that the adjunct α is deleted at LF, leaving just K. When would this take place?

One case is when α is the trace of successive-cyclic movement of the type that permits intermediate trace deletion, say, along the lines sketched in section 1.4.1: for example, wh-movement to [Spec, CP] with intermediate adjunction, as in (126).96


(126) which pictures of John's brother did he expect that [t [you would buy t]]

Another case is full reconstruction at LF, eliminating the adjunct entirely, thus a structure of the type (127) interpreted only at the trace.

(127) [YP XP [YP ... t ...]]

An example would be "scrambling" interpreted by reconstruction, argued to be uniformly the case by Saito (1989, and subsequent work). Similarly, it would follow that such constructions as (128) must be Condition C violations (under the relevant interpretation), and we predict a difference in status between (129) and (126), the latter escaping the violation because the head of the chain is not an adjunct.

(128) a. meet John in England, he doesn't expect that I will
b. pictures of John, he doesn't expect that I will buy

(129) pictures of John's brother, he never expected that you would buy

The conclusions are plausible as a first approximation, though we enter here into a morass of difficult and partially unsolved questions (see Barss 1986, Freidin 1986, Lebeaux 1988, and earlier work; and chapter 3 for some discussion).

On strictly minimalist assumptions, these should be the only possibilities for adjunction:

(130) a. word formation
b. semantically vacuous target
c. deletion of adjunct (trace deletion, full reconstruction)

Apart from (130c), there should be no adjunction to a θ-related phrase (a θ-role assigner or an argument, a predicate or the XP of which it is predicated). Since (130c) is irrelevant to strict merger, the options for the current counterpart to "base adjunction" are even narrower. We will consider adjoined adverbials further in section 4.7.5.97

Adjunction therefore remains an option under natural minimalist assumptions, but a very limited one with special properties. The primary and perhaps only case is α-adjunction to X0, α a feature or (if the operation is overt) an X0. Adjunction of YP to XP does not fit easily into this general approach, and if allowed at all, has a very restricted range.

4.7.3 XP-Adjunction and the Architecture of Linguistic Theory

Adjunction of YP to XP has had a central place in transformational generative grammar from its origins. That is understandable: it provides the most obvious examples of "displacement" of phrases from the positions in which they are interpreted. But as theoretical understanding has evolved, two distinct paths can be discerned. One path focused on the operations formulated as Move NP and Move wh, later Move α and Affect α, now Attract F if what precedes is correct. For these, XP-adjunction was marginal, introduced primarily for theory-internal reasons related to the ECP and so on. Another path sought to understand such operations as extraposition, right-node raising, VP-adjunction, scrambling, and whatever "rearrangements" are involved in forming such expressions as (131),

(131) I took a lot of pictures out of the attic yesterday of my children and their friends

with two parts of the phrase took out ... separated by an element a lot of pictures that should not be a phrase at all, and two parts of the unitary phrase a lot of [pictures of DP] separated by an element of much wider scope. As the two paths increasingly diverge, it becomes more and more reasonable to suppose that the processes and structures they address do not belong together; furthermore, the latter category appears to be heterogeneous. As already observed, the former path does not readily incorporate even elementary structures of many kinds; see notes 22, 93.

The divide has been sharpened further by inquiry into languages of the sort that Baker (1995) describes in terms of his "polysynthesis parameter," with the syntax in large part word-internal and arguments attached as adjuncts associated with internal elements. One might conjecture that such properties of UG appear in some manner in languages for which the principle is not of a fundamental nature. Consider, say, scrambling in Japanese, which seems to share some of those properties, the scrambled element being a kind of adjunct, external to the major syntactic structure, associated with an internal position that determines the semantic interpretation (hence the obligatory "reconstruction"). Related issues are currently being investigated in widely differing languages (see Barbosa 1994).
We will return to the matter in section 4.10.

In early transformational grammar, a distinction was sometimes made between "stylistic" rules and others. Increasingly, the distinction seems to be quite real: the core computational properties we have been considering differ markedly in character from many other operations of the language faculty, and it may be a mistake to try to integrate them within the same framework of principles. The problems related to XP-adjunction are perhaps a case in point: they may not really belong to the system we are discussing here as we keep closely to the first of the two courses just outlined, the one that is concerned with Last Resort movement driven by feature checking within the N → λ computation. It is within this core component of the language that we find the striking properties highlighted by minimalist guidelines. It seems increasingly reasonable to distinguish this component of the language faculty.

These speculations again raise the narrower technical question whether checking domains include YP adjoined to XP, both nonminimal. The most direct empirical motivation for defining checking domains to allow this case was a version of Kayne's (1989) theory of participial agreement (see sections 2.3.2, 3.2). Its basic assumption is that in passive and unaccusative, the object passes through the [Spec, AgrO] position (A-movement), checking agreement with the participle, and then raises to subject, driven by Case; and in operator movement, the object adjoins to the AgrP (Ā-movement), again checking agreement in the checking domain of Agr, then raising ultimately to [Spec, CP], driven by the operator feature. In particular, Kayne found dialect differences associated with the two kinds of participial agreement.

Dominique Sportiche and Philip Branigan have observed that the operator movement case is problematic because of such long-distance movement constructions as French (132).98

(132) la lettre [qu'il a [AgrP t' [AgrP dit [que Pierre lui a [envoyé t]]]]]
the letter that he has said that Pierre to.him has sent
'the letter that he said that Pierre sent to him'

Raising of the operator from t to t' (perhaps with intermediate steps) and then to [Spec, CP] is legitimate successive-cyclic Ā-movement and should yield participial agreement with dit in the higher clause, incorrectly. This suggests that agreement (hence, presumably, Case as well) should be restricted to the specifier position, with "long movement" barred by the same principles that bar object raising to a remote position; there are various possible accounts, depending on other aspects of the theory of movement.
The dialect differences noted by Kayne remain unexplained, however.

If these conclusions are correct, we restrict the checking domain of α to positions included in (rather than contained in) Max(α), including the A-positions adjoined to α, we have assumed. In brief, we accept the principle (119), which has useful consequences independently, as already discussed.

4.7.4 Other Improprieties

Let us return to the problem of improper movement. We want to show that the wide variety of such cases are excluded on principled grounds. Some fall into place: for example, such standard cases as improper raising of John from t1 to t2 to matrix subject as in (133), even if adjunction of John to IP is permitted.

(133) *John is illegal [IP t2 [IP t1 leave]]

The complement of illegal requires PRO (it is illegal to leave), so that (null) Case is assigned to the subject of the infinitive. Since John cannot have null Case, the derivation is canceled by mismatch (see (108)).99

Consider cases of the type (134), with t2 adjoined to IP, again putting aside reservations about such processes.

(134) *John seems [that [t2 [it was told t1 [that ...]]]]

We do not want to permit the intermediate (offending) trace t2 to delete, unlike what happens in (126). The distinction suggests a different approach to intermediate trace deletion: perhaps it is a reflex of the process of reconstruction, understood in minimalist terms as in chapter 3. The basic assumption here is that there is no process of reconstruction; rather, the phenomenon is a consequence of the formation of operator-variable constructions driven by FI, a process that may (or sometimes must) leave part of the trace (a copy of the moved element) intact at LF, deleting only its operator part. The reconstruction process would then be restricted to the special case of Ā-movement that involves operators.

That reconstruction should be barred in A-chains is thus plausible on conceptual grounds. It has some empirical support as well. Under the relevant interpretation, (135) can only be understood as a Condition B violation, though under reconstruction the violation should be obviated, with him interpreted in the position of t, c-commanded by me; as we have seen, the latter c-commands α.

(135) John expected [him to seem to me [α t to be intelligent]]

That the raised subject does not fully reconstruct is shown as well by the quasi-agentive status commonly conferred in surface subject position (e.g., for PRO in (136)).

(136) [PRO to appear [t to be intelligent]] is harder than one might think

Other reasons to question whether there is reconstruction in A-chains arise from consideration of lowering effects of the kind first discussed by Robert May.

(137) a. (it seems that) everyone isn't there yet
      b. I expected [everyone not to be there yet]
      c. everyone seems [t not to be there yet]


Negation can have wide scope over the quantifier in (137a), and it seems in (137b) but not in (137c). If so, that indicates that there is no reconstruction to the trace position in (137c).100

The quantifier interactions could result from adjunction of the matrix quantifier to the lower IP (c-commanding the trace of raising and yielding a well-formed structure if the trace of quantifier lowering is deleted, along the lines of May's original proposal). But reconstruction in the A-chain does not take place, so it appears.

Some other cases of improper movement are eliminated within the framework outlined here, such as XP-movement passing through or adjoining to a pure Y0 position, the trace then deleting. The status of scrambling might be reconsidered, and the (apparent) distinction in status between such structures as (126) and (129) as well. The general topic merits a comprehensive review.

So far we have (virtually) kept to the minimalist assumption that the computational procedure CHL is uniform from N to LF; any distinction before and after Spell-Out is a reflex of other factors. I have said little so far about the extension condition of chapter 3, which guarantees cyclicity. The condition is empirically motivated for substitution before Spell-Out by relativized minimality effects and others, and it does not hold after Spell-Out if the Case agreement theory of the minimalist approach is correct. It also cannot hold strictly for adjunction, which commonly (and in the case of head adjunction, always) targets an element within a larger projection. It would be desirable to show that these consequences are deducible, not stipulated.101

With regard to Merge, there is nothing to say. It satisfies the extension condition for elementary reasons already discussed (see below (11)). Questions arise only in connection with Attract/Move. The operation targets K, raising α to adjoin to K or to become the specifier of K, K projecting in either case. K may be a substructure of some structure L already formed. That is a necessary option in the covert component but not allowed freely before Spell-Out, as a result of other conditions, we hope to show.

With regard to overt cyclicity, there are several cases to consider. One type is illustrated by such standard examples as (138).

(138) *who was [α a picture of twh] taken tα by Bill

This is a Condition on Extraction Domain (CED) violation in Huang's (1982) sense if passive precedes wh-movement, but it is derivable with no violation (incorrectly) if the operations apply in countercyclic order, with passive following wh-movement. In this case cyclicity is induced directly by strength.


Unless passive precedes wh-movement, the derivation is canceled by violation of strength of T (the EPP). Independently, economy conditions might make the relevant distinction between the competing derivations. Passive is the same in both; wh-movement is longer in the illicit one in an obvious sense, object being more remote from [Spec, CP] than subject in terms of number of XPs crossed. The distinction might be captured by a proper theory of economy of derivation, though the issue is nontrivial, in part because we are invoking here a global notion of economy of the sort we have sought to avoid. Such problems would be avoided in the approach proposed by Kawashima and Kitahara and by Groat, briefly sketched in section 4.4.1 (see also Collins 1994a,b, Kitahara 1994, 1995). We will return in section 4.10 to the possibility of excluding the countercyclic operation under a strict interpretation of the principle (95).

Another class of cases are the relativized minimality constructions, for which the standard account is Rizzi's (1990). These fall into three categories: (1) head movement (the HMC); (2) A-movement; (3) Ā-movement. As discussed, the status of the HMC is unclear. In each case we have two situations to rule out: (I) skipping an already filled position; (II) countercyclic operations (i.e., movement that skips a potential position that is later filled). Situation (I) falls under the MLC, which is incorporated into the definition of Move/Attract. As for (II), category (1) is not a problem; head insertion is effected necessarily by pure merger, which satisfies the extension condition. The problem also cannot arise for strong features. Questions still remain; they will be reformulated and simplified in section 4.10.

It may be, then, that there is no need to impose the extension condition of chapter 3 on overt operations. Furthermore, neither the phonological nor the covert component can access the lexicon, for reasons already discussed. The Morphology module indirectly allows variation before and after Spell-Out, as do strength of features and such properties of language as the PF conditions on movement that induce generalized pied-piping. All of these conditions reflect properties of the A-P interface. It seems possible to maintain the conclusion that the computational system CHL is uniform from N to LF, in that no pre- versus post-Spell-Out distinction is stipulated. And perhaps we can approach the optimal conclusion that bare output conditions are satisfied as well as possible.

4.7.5 Adjuncts and Shells

We have so far said little about such structures as (139), with an adverbial adjoined to XP to form the two-segment category [XP, XP], projected from X.

(139) [XP2 Adv [XP1 X YP]]

We may, perhaps, assume that the construction is barred if XP has a semantic role at LF (see (130) and note 97), say, if XP is a predicate (AP or VP), as in the vmax structure (140a) underlying (140b), if the analysis of transitive constructions in (115) is correct.

(140) a. [John [v [VP often [VP reads (books, to his children)]]]]
      b. *John reads often (books, to his children)

Such structures as (139) could have been derived either by Merge or by Move. The latter possibility can perhaps be ruled out in principle: adverbs seem to have no morphological properties that require XP-adjunction (even if it is possible, a dubious idea, as noted). The empirical evidence also suggests that adverbs do not form chains by XP-adjunction (see p. 43). Thus, an adverb in pre-IP position cannot be interpreted as if it had raised from some lower position.102

The only option, then, is Merge. The question is whether and how base adjunction (in the EST sense) operates above the level of word formation. We have speculated that it is barred if XP is semantically active, as in (140a). Irrespective of the status of the word sequences, the structure assigned in (140a) is not permitted.

As a first approximation, at least, adverbials cannot be adjoined by Merge to phrases that are θ-related (arguments or predicates). If so, they can be base-adjoined only to X′ or to phrases headed by v or functional categories. Adjunction to X′ by merger does not conflict with the conclusion that X′ is invisible to CHL; at the point of adjunction, the target is an XP, not X′.

Such constructions as (140a) have played a considerable role in linguistic theory since Emonds's (1978) studies of differences between V-raising and nonraising languages (French, English). The basic phenomena, alongside (140a), are illustrated by (141a-b) (both well formed in French).

(141) a. John reads often to his children
      b. *John reads often books

A proposal sometimes entertained is that V raises from the underlying structure (140a) to form (141a), but such raising is barred in (141b) for Case reasons; accusative Case is assigned to books by read under an adjacency condition of the kind proposed by Stowell (1981). French differs in the adjacency property or in some other way.

Apart from the fact that the source construction (140a) is barred if the discussion earlier is accurate,103 the general approach is problematic on minimalist grounds. This framework has no natural place for the condition of adjacency. Furthermore, if Case is assigned by raising to a [Spec, head] position, as we assume, adjacency should be irrelevant in any event. It is also unclear why the verb should raise at all in (141), or where it is raising to. It seems that either the proposed analysis is wrong, or there is a problem for the minimalist framework.

In fact, the empirical grounds for the analysis are dubious. Consider such adverbial phrases as every day or last night, which cannot appear in the position of often in (140a).

(142) *John every day reads to his children

Nevertheless, we still find the paradigm of (141).

(143) a. John reads every day to his children
      b. *John reads every day books

Furthermore, similar phenomena appear when raising is not an option at all, as in (144).

(144) a. John made a decision (last night, suddenly) to leave town
      b. John felt an obligation (last night, suddenly) to leave town

Here the adverbial may have matrix scope, so that it is not within the infinitival clause. It can appear between the N head and its complement, though the N cannot have raised in the manner under discussion. The examples indicate that at least in structures of the form (145), we cannot conclude from the matrix scope of Adv that α has raised out of the embedded phrase β.

(145) [α [β Adv [to-VP]]]

In general, it is doubtful that raising has anything to do with the relevant paradigms.

The phenomena suggest a Larsonian solution. Suppose that we exclude (140a) from the paradigm entirely, assuming that often appears in some higher position and thus does not exemplify (139) with XP = VP. The structure underlying (141) and (143) is (146); that is true even if α is absent, under assumptions about transitive verb structures adopted earlier (see discussion of (115)).

(146) [VP1 John [V1 v [VP2 α [V2 reads β]]]]

Here VP1 and V1 are projections of the light verb v, VP2 and V2 are projections of read, and reads raises to adjoin to v.

Suppose that α in (146) is the adverbial often. Then if β = to the children, there is no problem. But if β = books, the derivation will crash; books cannot raise to [Spec, Agr] to have its Case checked because there are two closer intervening elements: the subject John and often.104 The Relativized Minimality violations are not overcome by V-raising to v, then v-raising to AgrO; the combined operations leave α closer to AgrO than β = books. Recall that books cannot raise to [Spec, v], for reasons already discussed.

Under this analysis, the basic facts follow with no special assumptions. There is a Case solution, but it does not involve adjacency. The problem of optional raising is eliminated, along with those suggested by (143) and (144).

Questions remain about other matters, among them: What is the basis for the French-English distinction, and is it somehow reducible to overt V-raising? Why do the wh-variants of the adverbials in question behave like adjuncts, not arguments?105 What about CED effects in the case of such adjuncts as Adj in (147), which might be in a complement position if the analysis generalizes?

(147) they [read the book [Adj after we left]]

I leave such questions without any useful comment.

Another class of questions has to do with the scope of adverbials in ECM constructions. Consider the sentences in (148).

(148) a. I tell (urge, implore) my students every year (that they should get their papers in on time, to work hard)
      b. I would prefer for my students every year to (get their papers in on time, work hard)


      c. I believe my students every year to (work hard, have gotten their papers in on time)

Under the Larsonian analysis just outlined, every year should have matrix scope in (148a), and (148c) should have the marginal status of (148b), with embedded scope if interpretable at all. The differences seem to be in the expected direction, though they are perhaps not as sharp as one might like. We would incidentally expect the distinction to be obviated in a V-raising language such as Icelandic, as appears to be the case (Dianne Jonas, personal communication).

The same analysis predicts that matrix scope should be marginal at best in (149) and (150).

(149) a. I hear [him often talk to his friends]
      b. I've proved [him repeatedly a liar]

(150) I've proved [him repeatedly to be a liar]

The cases of (149) come out about as expected, but such examples as (150) have been cited since Postal's early raising-to-object work (1974) to illustrate matrix scope. We are left with some unclarity about the proper idealization of the data with extraneous factors removed.

A plausible conclusion seems to me that the scope of the embedded element is narrow, as in (148) and (149), and that (150) involves the kind of rearrangement that has been called extraposition in the past, but that may not belong at all within the framework of principles we are considering; see section 4.7.3. The wide scope interpretation may then fall together with such cases as (131) and (144)-(145), in which overt raising to a higher position is hardly likely.106 This is speculative, however.

Similar qualifications may well be in order with regard to multiple-adjunct structures such as (151a), particularly if they also involve multiple rearrangement, as in (151b) and (131).

(151) a. John watched a documentary with great interest yesterday twice in a Boston theater
      b. John watched a documentary yesterday that he really enjoyed about the French revolution by the author we met of the most interesting novels that any of us had ever read

Whatever may be involved in such cases, it is unlikely that proliferation of shells is relevant. Even if that analysis is assumed for multiple adjuncts, there is little reason to suppose that the verb raises repeatedly from deep in the structure; rather, if a shell structure is relevant at all, the additional phrases might be supported by empty heads below the main verb, which might be no more deeply embedded than in (146). Pending better understanding of this whole range of topics, it seems premature to speculate.

The same kind of analysis seems appropriate for a wide range of complex verbal structures, whether or not elements of the internal domain are NP arguments. Consider such constructions as they looked angry to him. Here angry and to him are within the internal domain of looked, though their relative position is unclear; surface order may be misleading, for reasons discussed. Suppose that on the analogy of seem to constructions (see discussion of (98)), angry is the complement. The structure would then be (152).

(152)

It is possible that a similar approach might cover part of the territory assigned to a process of reanalysis in earlier work. Compare (153a) and (153b).

(153) a. this road was recently driven on
      b. *this road was driven recently on

Preposition stranding of the kind illustrated in (153a) has varying degrees of acceptability, but examples degrade rapidly when an adverb intervenes as in (153b), a phenomenon that has been used to support the idea that VP constructions sometimes reanalyze as a new verb V. An analysis along the lines of (146) yields the same results, without, however, addressing other familiar questions: for example, why are such sentences as which road did John drive recently on also degraded, and why do we sometimes find idiomatic interpretations in the reanalyzed cases?

4.8 Order

Nothing has yet been said about ordering of elements. There is no clear evidence that order plays a role at LF or in the computation from N to LF. Let us assume that it does not. Then ordering is part of the phonological component, a proposal that has been put forth over the years in various forms. If so, then it might take quite a different form without affecting CHL if language use involved greater expressive dimensionality or no sensorimotor manifestation at all.

It seems natural to suppose that ordering applies to the output of Morphology, assigning a linear (temporal, left-to-right) order to the elements it forms, all of them X0s though not necessarily lexical items. If correct, these assumptions lend further reason to suppose that there is no linear order in the N → LF computation, assuming that it has no access to the output of Morphology.

The standard assumption has been that order is determined by the head parameter: languages are basically head-initial (English) or head-final (Japanese), with further refinements. Fukui has proposed that the head parameter provides an account of optional movement, which otherwise is excluded under economy conditions apart from the special case of equally economical alternative derivations. He argues that movement that maintains the ordering of the head parameter is free; other movement must be motivated by Greed (Last Resort). Thus, in head-final Japanese, leftward movement (scrambling, passive) is optional, while in head-initial English, such operations must be motivated by feature checking; and rightward extraposition is free in English, though barred in Japanese.107

Kayne (1993) has advanced a radical alternative to the standard assumption, proposing that order reflects structural hierarchy universally by means of the Linear Correspondence Axiom (LCA), which states that asymmetric c-command (ACC) imposes a linear ordering of terminal elements; any category that cannot be totally ordered by LCA is barred. From Kayne's specific formulation, it follows that there is a universal specifier-head-complement (SVO) order and that specifiers are in fact adjuncts. A head-complement structure, then, is necessarily an XP, which can be extended (exactly once, on Kayne's assumptions) to a two-segment XP.

The general idea is very much in the spirit of the Minimalist Program and consistent with the speculation that the essential character of CHL is independent of the sensorimotor interface. Let us consider how it might be incorporated into the bare phrase structure theory. That is not an entirely straightforward matter, because the bare theory lacks much of the structure of the standard X-bar theory that plays a crucial role in Kayne's analysis.108

Kayne offers two kinds of arguments for the LCA: conceptual and empirical, the latter extended in subsequent work (see particularly Zwart 1993 and Kayne 1994). The conceptual arguments show how certain stipulated properties of X-bar theory can be derived from the LCA. The empirical arguments can largely be carried over to a reformulation of the LCA within the bare theory, but the conceptual ones are problematic. First, the derivation of these properties relies crucially not just on the LCA, but on features of standard X-bar theory that are abandoned in the bare theory. Second, the conclusions are for the most part immediate in the bare theory without the LCA.109

Let us ask how a modified LCA might be added to the bare theory. There is no category-terminal distinction, hence no head-terminal distinction and no associated constraints on c-command. Suppose we have the structure (154), which is the bare-theory counterpart to several of the richer structures that Kayne considers.

(154) [K j [L m p]]

Here L is either m or p, K is either j or L. K may be either a separate category or a segment of either [K, j] or [K, L], depending on which projects. The heads are the terminal elements j, m, p. Assuming that L is not formed by adjunction, either m or p is its head and the other is both maximal and minimal; say m is the head, for concreteness, so L is mP.

Suppose that K is a separate category and L projects, so that j is a specifier in an A-position. ACC holds of (j, m) and (j, p), so j must precede m and p. But it would hold of (m, p) only if the single-terminal p (the complement of the head m) were replaced by a complex category. Hence, we have the order specifier-head-complement, though only for nontrivial complement.

Suppose that instead of terminal j we had branching J, with constituents α, β. L is an X′, neither maximal nor minimal, so it does not c-command.110 Therefore, the ACC relations are unchanged.

Suppose that K is a separate category and j projects, so that it is the head of K with complement L. ACC holds as before.

Suppose that K is a segment, either j or L. There is no particular problem, but adjunct-target order will depend on the precise definition of c-command.

In brief, the LCA can be adopted in the bare theory, but with somewhat different consequences. The segment-category distinction (and the related ones) can be maintained throughout. We draw Kayne's basic conclusion about SVO order directly, though only if the complement is more complex than a single terminal.
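
The ordering argument for (154) can be checked mechanically. The following is only an illustrative sketch, not part of the theory as stated: the Node class, the function names, and the can_c_command flag are hypothetical conveniences. It assumes that K is a separate category with L projecting, that c-command means that every node dominating X also dominates Y while neither of X, Y dominates the other, and that an X′ does not c-command. Terminals of an asymmetrically c-commanding node are taken to precede terminals of the node it c-commands.

```python
from itertools import combinations


class Node:
    """A tree node; leaves stand for terminal elements (hypothetical helper)."""

    def __init__(self, name, children=(), can_c_command=True):
        self.name = name
        self.children = list(children)
        # Assumption taken from the text: an X' (neither maximal nor
        # minimal) does not c-command, so it is flagged False.
        self.can_c_command = can_c_command


def descendants(node):
    """All nodes properly dominated by `node`."""
    found = []
    for child in node.children:
        found.append(child)
        found.extend(descendants(child))
    return found


def dominates(a, b):
    return any(d is b for d in descendants(a))


def c_commands(root, x, y):
    """Every node dominating x dominates y, and neither dominates the other."""
    if not x.can_c_command or x is y:
        return False
    if dominates(x, y) or dominates(y, x):
        return False
    nodes = [root] + descendants(root)
    return all(dominates(z, y) for z in nodes if dominates(z, x))


def terminals(node):
    if not node.children:
        return [node]
    return [d for d in descendants(node) if not d.children]


def precedence(root):
    """Pairs (a, b) of terminal names where ACC orders a before b."""
    nodes = [root] + descendants(root)
    pairs = set()
    for x in nodes:
        for y in nodes:
            if c_commands(root, x, y) and not c_commands(root, y, x):
                for a in terminals(x):
                    for b in terminals(y):
                        pairs.add((a.name, b.name))
    return pairs


def unordered(root):
    """Pairs of terminals that ACC leaves without any relative order."""
    order = precedence(root)
    names = [t.name for t in terminals(root)]
    return [(a, b) for a, b in combinations(names, 2)
            if (a, b) not in order and (b, a) not in order]


# (154) with a single-terminal complement p: [K j [L m p]], L an X'.
simple = Node("K", [Node("j"),
                    Node("L", [Node("m"), Node("p")], can_c_command=False)])

# The same structure with p replaced by a complex category P = [p1 p2].
complex_tree = Node("K", [Node("j"),
                          Node("L", [Node("m"),
                                     Node("P", [Node("p1"), Node("p2")])],
                               can_c_command=False)])

print(sorted(precedence(simple)))   # j before m and before p
print(unordered(simple))            # m and p are left unordered
print(unordered(complex_tree))      # only the innermost pair is unordered
```

Under these assumptions the sketch orders j before m and p but leaves (m, p) unordered when p is a single terminal. Replacing p with a branching category orders m before its constituents, and the unordered pair reappears at the bottom of the new complement.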


Let us return to the case of L = mP with the single-terminal complement p, both minimal and maximal. Since neither m nor p asymmetrically c-commands the other, no ordering is assigned to m, p; the assigned ordering is not total, and the structure violates the LCA. That leaves two possibilities. Either we weaken the LCA so that nontotal orderings (but not contradictory orderings) are admissible under certain conditions, or we conclude that the derivation crashes unless the structure N = [L m p] has changed by the time the LCA applies so that its internal structure is irrelevant; perhaps N is converted by Morphology to a phonological word not subject internally to the LCA, assuming that the LCA is an operation that applies after Morphology.

Consider the first possibility: is there a natural way to weaken the LCA? One obvious choice comes to mind: there is no reason for the LCA to order an element that will disappear at PF, for example, a trace. Suppose, then, that we exempt traces from the LCA, so that (154) is legitimate if p has overtly raised, leaving a trace that can be ignored by the LCA. The second possibility can be realized in essentially the same manner, by allowing the LCA to delete traces. Under this interpretation, the LCA may eliminate the offending trace in (154), if p has raised.

In short, if the complement is a single-terminal XP, then it must raise overtly. If XP = DP, then its head D is a clitic, either demonstrative or pronominal, which attaches at a higher point (determined either generally, or by specific morphological properties).111 If XP = NP, then N must incorporate to V (and we must show that other options are blocked). Clitics, then, are bare Ds without complements, and noun incorporation must be restricted to nonreferential NPs (as noted by Hagit Borer), assuming the quasi-referential, indexical character of a noun phrase to be a property of the D head of DP, NP being a kind of predicate. Within DP, the N head of NP must raise to D (as argued in a different manner by Longobardi (1994)).112

We therefore expect to find two kinds of pronominal (similarly, demonstrative) elements, simple ones that are morphologically marked as affixes and must cliticize, and complex ones with internal structure, which do not cliticize: in French, for example, the determiner D (le, la, etc.) and the complex element lui-même 'himself'. In Irish the simple element is again D, and the complex one may even be discontinuous, as in an teach sin 'that house', with determiner an-sin (Andrew Carnie, personal communication). A phenomenon that may be related is noted by Esther Torrego. In Spanish the Case marker de can be omitted in (155a), but not in (155b).

(155) a. cerca de la plaza 'near the plaza'
      b. cerca de ella 'near it'


When de is deleted in (155a), D = la can incorporate in cerca, satisfying the Case Filter; but that is impossible in (155b) if the complex pronominal ella is not D but a word with richer structure, from which the residue of D cannot be extracted.

Since the affixal property is lexical, simple pronominals cliticize even if they are not in final position (e.g., a pronominal object that is a specifier in a Larsonian shell). If focus adds more complex structure, then focused (stressed) simple pronominals could behave like complex pronominals. If English-type pronouns are simple, they too must cliticize, though locally, not raising to I as in Romance (perhaps as a reflex of lack of overt V-raising). The barrier to such structures as I picked up it might follow. English determiners such as this and that are presumably complex, with the initial consonant representing D (as in the, there, etc.) and the residue a kind of adjective, perhaps. Various consequences are worth exploring.

Although apparently not unreasonable, the conclusions are very strong: thus, every right-branching structure must end in a trace, on these assumptions.

What about ordering of adjuncts and targets? In Kayne's theory, adjuncts necessarily precede their targets. Within the bare theory, there is no really principled conclusion, as far as I can see. Ordering depends on exactly how the core relations of phrase structure theory, dominate and c-command, are generalized to two-segment categories.

Consider the simplest case, with α attached to K, which projects.

(156) [K2 α K1]

Suppose that K2 is a new category, α the specifier or complement, so that (156) = L = {H(K), {α, K}}. Take dominate to be an irreflexive relation with the usual interpretation. Then L dominates α and K; informally, K2 dominates α and K1.

Suppose, however, that the operation was adjunction, forming the two-segment category [K2, K1] = {<H(K), H(K)>, {α, K}}. Are α and K1 dominated by the category [K2, K1]? As for c-command, let us assume that α c-commands outside of this category; thus, if it heads a chain, it c-commands its trace, which need not be in K1 (as in head raising).113 But what about further c-command relations, including those within (156) itself?

The core intuition underlying c-command is that

(157) X c-commands Y if (a) every Z that dominates X dominates Y and (b) X and Y are disconnected.


For categories, we take X and Y to be disconnected if X ≠ Y and neither dominates the other. The notions dominate and disconnected (hence c-command) could be generalized in various ways for segments.

These relations are restricted to terms, in the sense defined earlier: in the case of (156), to α, K (= K1), and the two-segment category [K2, K1]. K2 has no independent status. These conclusions comport reasonably well with the general condition that elements enter into the computational system CHL if they are visible at the interface. Thus, K1 may assign or receive a semantic role, as may α (or the chain it heads, which must meet the Chain Condition). But there is no third role left over for K2; the two-segment category will be interpreted as a word by Morphology and WI (see (125)) if K is an X0, and otherwise falls under the narrow options discussed earlier.114

If that much is correct, we conclude that in (156), [K2, K1] dominates its lower segment K1, so that the latter does not c-command anything (including α, not dominated by [K2, K1] but only contained in it).

Turning next to c-command, how should we extend the notion disconnected of (157b) to adjuncts? Take adjunction to a nonmaximal head ((16) in Kayne 1993, reduced to its bare counterpart).

(158) [L [m2 q m1] [R r S]]

Here q is adjoined to the head m to form the two-segment category [m2, m1], a nonmaximal X0 projecting to and heading the category L, which has label m. R is the complement of m and r its head, and S (which may be complex) is the complement of r. What are the c-command relations for the adjunct structure?

The lowest Z that dominates q and m1 is L, which also dominates [m2, m1]. Therefore, q and [m2, m1] asymmetrically c-command r and S, however we interpret disconnected. What are the c-command relations within [m2, m1]? As noted, m1 does not c-command anything. The other relations depend on the interpretation of disconnected in (157b). Kayne interprets it as X excludes Y. Then q (asymmetrically) c-commands [m2, m1], which dominates m1 so that q precedes m1; and in general, an adjunct precedes the head to which it is adjoined. If X, Y are taken to be disconnected if no segment of one contains the other, then q c-commands m1 but not [m2, m1], and again q precedes m1.115 If disconnected requires still further dissociation of X, Y (say, that neither is a segment of a category that contains the other), then no ordering is determined for q, m1 by the LCA.

I do not see any principled way to choose among the various options.

If m1 is not a head but the complex category [m m P], so that q is an XP for reasons already discussed, then q c-commands the constituents of m1 under all interpretations of disconnected, and the adjunct precedes the target (whether q is internally complex or not).

Left open, then, is the case of adjunction of a head to another head, that is, ordering within words. Whether order should be fixed here depends on questions about inflectional morphology and word formation that seem rather obscure and may have no general answer.

Summarizing, it seems that Kayne's basic intuition can be accommodated in a straightforward way in the bare theory, including the major empirical conclusions, specifically, the universal order SVO and adjunct-target (at least for XP adjuncts). In the bare theory, the LCA gains no support from conceptual arguments and therefore rests on the empirical consequences. We take the LCA to be a principle of the phonological component that applies to the output of Morphology, optionally ignoring or deleting traces. The specifier-adjunct (A-Ā) distinction is maintained, along with the possibility of multiple specifiers or adjuncts, though the options for adjunction are very limited for other reasons. There are further consequences with regard to cliticization and other matters, whether correct or not, I do not know.

4.9 Expletives and Economy

The evidence reviewed so far has led us to postulate two functional categories within IP for simple constructions, Tense (T) and Agr. Occurrence of T is motivated by bare output conditions because of its semantic properties, and also for structural reasons: it checks the tense feature of V and the Case of the subject, and it provides a position for overt nominals, either raised or merged (EPP). Agr is motivated only structurally: it is involved in checking of features of subject and object, and it provides a position for overt object raising.116

Several gaps in the paradigm remain to be explained. There are three functional categories, but so far only two positions for overt specifiers: subject and object. Furthermore, the [Spec, I] position contains either an expletive or a raised nominal, but only the latter appears in [Spec, AgrO].

The first gap would be eliminated if there were structures with both of the predicted [Spec, I] positions occupied. More interesting still are constructions with all three of the possible specifier positions: IPs of the form (159), where Nom is a nominal phrase, DP or NP.

(159) [AgrP Nom AgrS [TP Nom T [AgrP Nom AgrO VP]]]

Recall that the subscripts on Agr are mnemonics with no theoretical status, indicating the position of the functional category.

The structure (159) is illustrated by transitive expletive constructions (TECs) as analyzed by Jonas and Bobaljik (1993), Jonas (1994, forthcoming). Jonas and Bobaljik concentrate on Icelandic, which has structures of the following type (English words), according to their analysis:

(160) [AgrP there painted [TP a student tT [AgrP[the house VP]]]]

The meaning is something like "a student painted the house," or the intelligible but unacceptable English counterpart (161).

(161) there painted the house a student (who traveled all the way from India to do it)

In (160) the expletive is in [Spec, AgrS]; painted is the verbal head of VP raised in stages to T, which then raises to AgrS, leaving the trace tT; the subject a student is raised to [Spec, tT] and the object the house to [Spec, AgrO]; and VP contains only traces. The pre-VP position of the object is motivated by placement of adverbials and negation in the overt forms. The usual properties of expletive constructions hold: [Spec, tT], the associate of the expletive in [Spec, AgrS], is nonspecific and determines the number of the verb in the AgrS position. All three predicted positions are occupied in the overt form.117

Case and agreement are checked for the object overtly in [Spec, AgrO], and for the subject after covert raising of its formal features to AgrS.

In a TEC, AgrS and T each have overt specifiers. Hence, each has a strong feature: in effect, a generalization of the EPP to Agr and T independently. In introducing the EPP (section 4.2.1, below (1)), I noted a certain ambiguity about the strong feature that expresses it: it could be (1) a D-feature, (2) an N-feature, or (3) a general nominal feature, either D or N. So far I have been using the terminology (1), but neutrally among the three choices.
At this point the choices begin to make a difference, so more care is needed.

The specifier of AgrS is the expletive, so the strong feature of AgrS at least allows and may require the categorial feature [D]. The specifier of T is nominal, but there are theory-internal reasons, to which we will turn in section 4.10, suggesting that it might be NP rather than DP. If correct, that could be a factor in accounting for the definiteness effect: the fact that the associate in an expletive construction, whether in [Spec, T] or lower in the clause, is nonspecific, NP rather than DP (D assumed to be the locus of specificity). Furthermore, since AgrS and AgrO are the same element appearing in two different
positions, if AgrS has a strong D-feature, then AgrO should as well. We thus expect overt object raising to favor definite-specific nominals, whereas nonspecific NPs remain in situ (by Procrastinate). That seems the general tendency.

The analysis of the I structures is the same whether the object is raised or not, or even in passive and raising expletives with subject in [Spec, T], as in (162) (translation of Icelandic examples from Jonas 1994).

(162) a. there have [TP some cakes [VP been baked t for the party]]
      b. there seems [TP [Subj someone] [IP t to be in the room]]

Let us refer to all of these as multiple-subject constructions (MSCs), whether transitive (TEC) or not.

MSCs fall within the range of options already available, using more fully the available strength features of functional categories. These constructions also provide additional support for the conclusion that the expletives in expletive-associate constructions lack Case or φ-features, by simple extension of reasoning already given. MSCs thus offer further evidence that the expletive is pure, as we would expect on general grounds.

In a close analysis of Icelandic, several Faroese dialects, and Mainland Scandinavian, Jonas (1994) found MSCs to be contingent on overt V-raising. We know from languages without overt V-raising, like English, that at least one functional category, T or Agr, can take an overt specifier if it stands alone, not supported by overt V-raising. Assuming this category to be T, Jonas's generalization states that Agr cannot have a specifier unless supported by V.118

MSCs such as (160) raise two basic questions.

(163) a. Why do languages differ with regard to MSCs, some allowing them, others not?
      b. How are such structures permitted by economy principles?

These questions presuppose some analysis of simple expletive constructions such as (164a-b).

(164) a. there arrived a man
      b. there is a book missing from the shelf

We have found considerable evidence that in such constructions the formal features of the associate raise covertly to matrix I, checking Case and φ-features and functioning as if they were in subject rather than object position with regard to binding and control.119 I will continue to assume that account to be correct in its essentials.

Jonas's generalization is relevant here: overt MSCs require overt V-raising. Could there be MSCs with only covert raising? Example (161) suggests that the possibility should not be immediately discounted. As noted, the sentence is unacceptable (as is (164a), to some degree) though intelligible, and with other lexical choices the construction ranges in acceptability, as Kayne has observed.

(165) a. there entered the room a man from England
      b. there hit the stands a new journal
      c. there visited us last night a large group of people who traveled all the way from India

Such constructions have been thought to result from an extraposition operation depending in part on considerations of heaviness, but in our restricted terms, there is no obvious source for such constructions apart from MSCs. A possibility that might be explored is that they are in fact MSCs, with the subject category appearing overtly at the right boundary, perhaps the result of a process in the phonological component that could be motivated by properties of theme-rheme structures, which typically involve surface forms in some manner. Prominence of the theme might require that it be at a boundary: to the right, since the leftmost position is occupied by the expletive subject. Icelandic might escape this condition as a reflex of its internal-V-second property, which requires a method for interpreting internal themes. The lexical restrictions in English may reflect the semilocative character of the expletive. If speculations along these lines prove tenable, question (163a) may take a somewhat more complex form; the MSC option may be more general, but with different manifestations depending on other properties of the language.120

Question (163b) leads into a thicket of complex and only partly explored issues.
The fact that MSCs alternate with nonexpletive constructions is unproblematic; the alternatives arise from different numerations, hence are not comparable in terms of economy.121 But there are questions about some of the instantiations of the postulated structures (166).

(166) Exp Agr [Subj [T XP]]

Suppose we have the overt form (167). We would expect this to be the manifestation of the two distinct MSCs (168a) and (168b).

(167) there seems someone to be in the room

(168) a. there seems [TP someone [IP t to be in the room]]
      b. there seems [IP t [TP someone to be in the room]]

In (168a) the matrix clause is an MSC with the subject someone occupying matrix [Spec, T]. In (168b) the embedded clause is an MSC with someone occupying embedded [Spec, T] and the trace of there occupying the higher Spec of the MSC. Both possibilities appear to be legitimate (Jonas 1994; (168a) is (162b)).122 But we have to explain why both (168a) and (168b) are legitimate outcomes of the same numeration, and why (169) is barred in English.

(169) * there seems (to me, often) [IP someone to be in the room]

We do not have a direct contradiction. Principles of UG might bar (169) generally while permitting (168). Tentatively assuming that possibility to point to the resolution of the problem, we then ask why the structures illustrated in (168) are permitted while the one in (169) is barred. We cannot appeal to the numeration in this case, because it is the same in all examples.

We also have to explain why (170) is permitted, with there raising from the position of t, where it satisfies the EPP in the embedded clause.

(170) there seems [t to be [someone in the room]]

The problem becomes harder still when we add ECM constructions. In these, the embedded subject does raise overtly to a position analogous to t in (170), where it is barred (see (169)); and it cannot remain in situ as it does in (170).

(171) a. I expected [someone to be [t in the room]] (... to have been killed t)
      b. *I expected [t to be [someone in the room]] (... to have been killed John)

We thus have an intricate network of properties that are near-contradictory. Within the minimalist framework, we expect the escape route to be provided by invariant UG principles of economy.

The questions have to do with overt movement; hence, one relevant principle should be Procrastinate, which favors covert movement. Procrastinate selects among convergent derivations: overt movement is permitted (and forced) to guarantee convergence. Beyond that, the questions bear directly on basic assumptions of the theories of movement and economy.

Let us begin by considering the ECM cases, which contrast with control cases as in (172), t the trace of the embedded subject.

(172) a. I expected [PRO to [t leave early]]
      b. I expected [someone to [t leave early]]

In present terms, these differ in properties of the head H of the embedded phrase. In both structures, the EPP holds, so the nominal feature of H is strong.

In the control structure (172a), H assigns null Case to the subject, which must therefore be PRO. In the ECM structure (172b), H assigns no Case, so someone raises to the checking domain of AgrO in the matrix clause; more precisely, its formal features raise covertly to this position, we assume.

There are three basic problems about (172b): (1) Why doesn't someone raise overtly all the way to the matrix position in which it receives Case? (2) Why is someone permitted to raise overtly to embedded subject position? (3) Why must someone raise overtly from the trace position (also PRO in (172a))?

Problem (1) is overcome by Procrastinate, which requires that the second step in raising to the Case-checking position be covert, assuming that AgrO is weak. Question (2) has already been answered: someone has accessible features, and one of them, its categorial feature, checks the strong nominal feature of the embedded I (EPP). Problem (3) now disappears: if someone does not raise overtly from the trace position, the derivation crashes.123

Let us now turn to the contrasting cases (169) and (171). In each case the reference set determined by the initial numeration includes a second derivation: in the case of (169), the one that yields (170); in the case of (171a), the analogous one that yields (171b) with the trace of raised I. Our goal is to show that in the case of (169)-(170), economy considerations compel raising of there from the embedded clause, while in the case of (171), the same considerations block raising of I from the embedded clause, requiring raising of someone to embedded subject position to satisfy the EPP. The properties of the constructions suggest that the answer lies in θ-theory.

Consider first (169) and (170), specifically, the structure that is common to the two derivations. In each, at some stage we construct γ = (173), with the small clause β.

(173) [γ to be [β someone in the room]]

The next step must fill the specifier position of γ to satisfy the EPP.
Given the initial numeration, there are two possibilities: we can raise someone to [Spec, γ] or we can insert there in this position.124 The former choice violates Procrastinate; the latter does not. We therefore choose the second option, forming (174).

(174) [γ there to be β]

At a later stage in the derivation we reach the structure (175).

(175) [δ seems [γ there to be β]]

Convergence requires that [Spec, δ] be filled. Only one legitimate option exists: to raise there, forming (170). We therefore select this option.

The argument is based on the assumption that at a particular stage Σ of a derivation from numeration N, we consider the reference set R(N, Σ) from a highly local point of view, selecting the best possible (most economical) move available in R(N, Σ) at stage Σ. This more restrictive approach is preferable on conceptual grounds for the usual reasons of reduction of computational complexity; and once again, conceptual naturalness and empirical demands coincide, as we would hope if we are on the right track.

Why, then, does the same argument not favor (171b) over (171a)? The common part of the derivations is again (173). We have two ways to fill [Spec, γ], raising of someone or insertion of I, the latter being preferred. Suppose we insert I, then raising it to form (171b), analogous formally to the legitimate outcome (170). But we already know that the argument chain (I, t) in (171b) lacks a θ-role under this analysis (see section 4.6). If this causes the derivation to crash, then the unwanted outcome is barred and only (171a) is permitted. If this is only a case of convergent gibberish, then it blocks the desired outcome (171a), incorrectly. We therefore conclude that an argument with no θ-role is not a legitimate object, violating FI and causing the derivation to crash, a conclusion that is natural though not previously forced.125

In section 4.6 we reached a somewhat weaker conclusion on different grounds. There we found reason to believe that a derivation crashes if θ-roles are not properly assigned, leaving open the question whether the problem is failure to assign a θ-role, or to receive one, or both. We now have a partial answer: failure of argument to receive a θ-role causes the derivation to crash. The status of failure to assign a θ-role remains open.
In earlier work it has been suggested that external θ-role need not be assigned in nominalizations and is in this sense optional; we have not excluded that possibility, or confirmed it, though we have found reason to suppose that in verb-headed constructions, the question of assignment of external θ-role arises only in a somewhat different form. If a configuration [v-VP] is formed with [Spec, v], that configuration just is an external θ-role, and the question is, what happens if a nonargument (an expletive) appears in this position, violating the θ-Criterion? We will return to this question, leaving it without definite answer.

Let us consider how the economy-theoretic account of raising in expletive constructions comports with earlier discussion of it-expletives, as in (83), rephrased here as (176).

(176) it seems [that someone was told t [that IP]]

At an earlier stage of the derivation of (176), we have (177), analogous to (173).

(177) [γ was told someone [that IP]]

Since the numeration includes expletive it, we have the same two options as in the cases just discussed: we can raise someone or insert it, the latter option being preferred. Suppose we insert it, then raising it to form (178), analogous to (170).

(178) it seems [that t was told someone [that IP]]

But the derivation crashes for several reasons (the Case-checking feature of matrix T is not erased, the Case of someone is not checked). Since economy considerations select among convergent derivations, the preferred option of inserting the expletive in (177) cannot be employed, and we derive (176), as required.126

One aspect of question (163b) still remains unanswered: why is the permitted MSC structure (166), repeated here as (179a), not blocked by the alternative (179b), in accord with the reasoning just reviewed?

(179) a. Exp Agr [Subj [T XP]]
      b. Exp Agr [t [T [... Subj ...]]]

In other words, why are (168a-b), repeated here, both legitimate?

(180) a. there seems [TP someone [IP t to be in the room]]
      b. there seems [IP t [XP someone to be in the room]]

By the reasoning just outlined, we would expect (180b) to block (180a), avoiding the violation of Procrastinate by overt raising of someone. Let us delay the question until the next section.127

It is worth highlighting the basic assumption about reference sets that underlies the preceding discussion: they are determined by the initial numeration, but in a fairly local fashion. At a particular stage Σ in the derivation, we consider only the continuations that are permitted from Σ to LF, using what remains of the initial numeration; the most economical of these blocks the others. But we ask even a narrower question: at Σ, which operation that yields a convergent derivation is most economical at this point? Thus, we select Merge over Attract/Move if that yields a convergent derivation, irrespective of consequences down the road as long as the derivation converges; but we select Attract/Move even violating Procrastinate if that is necessary for convergence. The problems of computational complexity are thus considerably reduced, though more remains to be done, no doubt. The assumptions throughout are straightforward, but rather delicate. It remains to investigate further cases and consequences.
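The local selection just summarized can be pictured schematically. The following Python fragment is only an illustrative sketch under stated assumptions: the names Option and select are hypothetical, and whether an option leads to a convergent derivation is stipulated by hand for each case (following the conclusions reached in the text) rather than computed, since the discussion provides no procedure for deciding convergence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One candidate operation at a given stage (hypothetical representation)."""
    label: str       # informal description of the operation
    kind: str        # "merge" or "move"
    converges: bool  # stipulated by hand: does this choice allow convergence?


def select(options):
    """Pick the most economical convergent operation at the current stage.

    Non-convergent options are discarded first; among the rest, Merge is
    preferred, and Attract/Move is chosen only when no Merge option survives.
    """
    convergent = [o for o in options if o.converges]
    if not convergent:
        return None  # every option crashes
    merges = [o for o in convergent if o.kind == "merge"]
    return merges[0] if merges else convergent[0]


# Raising case, stage (173): inserting `there` converges, so Merge wins.
raising = select([Option("insert there", "merge", True),
                  Option("raise someone", "move", True)])

# ECM case, stage (173): inserting `I` yields an argument with no theta-role
# (a crash), so overt raising is forced despite Procrastinate.
ecm = select([Option("insert I", "merge", False),
              Option("raise someone", "move", True)])

# it-expletive case, stage (177): inserting `it` crashes for Case reasons.
it_case = select([Option("insert it", "merge", False),
                  Option("raise someone", "move", True)])

print(raising.label, "|", ecm.label, "|", it_case.label)
# prints: insert there | raise someone | raise someone
```

The sketch inspects only the options available at a single stage, which is the point of the local formulation: nothing beyond the convergence of each candidate is consulted.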

4.10 Functional Categories and Formal Features

What precedes substantially revises the framework developed in chapters 1-3. But we have not yet subjected functional categories to the same minimalist critique. In the final section I would like to explore this question, a course that leads to another fairly substantial modification. Even more than before, I will speculate rather freely. The issues that arise are fundamental to the nature of CHL, having to do with the formal features that advance the computation (primarily strength, which drives overt operations that are reflected at the A-P interface) and the functional categories that consist primarily (sometimes entirely) of such features.

4.10.1 The Status of Agr

Functional categories have a central place in the conception of language we are investigating, primarily because of their presumed role in feature checking, which is what drives Attract/Move. We have considered four functional categories: T, C, D, and Agr. The first three have Interpretable features, providing instructions at either or both interface levels. Agr does not; it consists of -Interpretable formal features only. We therefore have fairly direct evidence from interface relations about T, C, and D, but not Agr. Unlike the other functional categories, Agr is present only for theory-internal reasons. We should therefore look more closely at two questions.

(181) a. Where does Agr appear?
      b. What is the feature constitution of Agr?

In section 4.2.2 we tentatively assumed that Agr lacks φ-features, just as it (fairly clearly) lacks an independent Case-assigning feature, that being provided by the V or T that adjoins to it. If Agr indeed lacks φ-features as well, we would expect that the φ-features of a predicate Pred (verb or adjective) are added to Pred (optionally) as it is selected from the lexicon for the numeration. We had little warrant for the assumption about φ-features, and so far it has had little effect on the analysis. But it becomes relevant as we attempt more careful answers to the questions of (181). I will continue to assume that the original assumption was correct, returning to the question at the end, after having narrowed significantly the range of considerations relevant to it.

We have evidence bearing on question (181a) when Agr is strong, so that the position is phonetically indicated by the overt categories that raise to it: V and T by adjunction, DP by substitution in [Spec, Agr]. The richest example is an MSC with object raising, as in the Icelandic TEC construction (160).

Here three pre-VP positions are required within IP for nominal expressions: expletive, subject, and object. One position is provided by T. We therefore have evidence for two noninterpretable functional categories, the ones we have been calling Agr (AgrS and AgrO). In MSCs, AgrS is strong, providing a specifier and a position for V-raising above the domain of strong T: in effect, a double EPP configuration. Another VP-external position is provided between T and VP by strong AgrO. That is the basic rationale behind the analyses just outlined. It accords with the general minimalist outlook, but the anomalous status of Agr raises questions.

The background issues have to do with the strong features of T and Agr, and what appears in the overt specifier positions they make available. In the I position, preceding all verb phrases (main or auxiliary), we have postulated two functional categories: T and AgrS. In MSCs the specifier position of each is nominal, DP or NP; hence, the strong feature must at least be [nominal-], meaning satisfied by the nominal categorial feature [D] or [N]. At most one nominal can have its Case and φ-features checked in this position, which suggests that one of the two nominals must be the pure expletive Exp, a DP. Let us assume this to be the case, though it is not as yet established. The observed order is Exp-nominal rather than nominal-Exp, a fact yet to be explained.

The best case is for AgrO to have the same constitution as AgrS. Since AgrS allows and perhaps requires a D-feature, the same should be true of AgrO. Hence, both Agrs attract DPs: nominals that are definite or specific.
As noted, it follows that expletive-associate constructions observe the definiteness effect and that object raising is restricted to definite (or specific) nominals. This is close enough to accurate to suggest that something of the sort may be happening.

Recall, however, that the definiteness effect for object raising is at best a strong tendency, and that for expletive constructions its status is unclear. It does not rule out any derivations, but rather describes how legitimate outputs are to be interpreted: either as expletive constructions with at most weakly existential implicatures, or as list constructions with strong existential interpretation (see notes 42, 44). We therefore have no strong reason to suppose that the associate cannot be a DP, only that if it is, a certain kind of interpretation is expected.

With strong features, Agr provides a position for T- or V-raising (adjunction) and DP-raising (substitution), so there is evidence that it appears in the numeration. If Agr has no strong feature, then PF considerations, at least, give no reason for it to be present at all, and LF considerations do not seem relevant. That suggests an answer to question (181a): Agr exists only when it has strong
features. Agr is nothing more than an indication of a position that must be occupied at once by overt operations.128 Substitution can be effected by Merge or Move. If by Merge, it is limited to expletives, for reasons already discussed.

Pursuit of question (181b) leads to a similar conclusion. The function of Agr is to provide a structural configuration in which features can be checked: Case and φ-features, and categorial features ([V-] and [T-] by adjunction, [D-] by substitution). The Case-assigning feature is intrinsic to the heads (V, T) that raise to Agr for checking of DP in [Spec, Agr], so there is no reason to assign it to Agr as well. With regard to φ-features, as already discussed, the matter is much less clear. If Agr has φ-features, they are -Interpretable, but there might be empirical effects anyway, as noted earlier. Continuing tentatively to assume that φ-features are (optionally) assigned to lexical items as they are drawn from the lexicon, we conclude that Agr consists only of the strong features that force raising.

Certain problems that arose in earlier versions now disappear. There is no need to deal with optionally strong Agr, or with the difference in strength of AgrS and AgrO. Since Agr is strong, the first problem is just a matter of optional selection of an element ([strength of F]) from the lexicon for the numeration, the irreducible minimum; and difference in strength is inexpressible. There still remains, however, a conflict between the θ-theoretic principle that transitive verbs have a v-VP structure and the assumption that overt object raising is internal to this construction (see note 81 and p. 316).

Let us turn to the properties that remain.

Since Agr consists solely of strong features, it cannot attract covert raising.129 We have so far assumed that Subj (subject) and Obj (object) raise to the checking domain of Agr, entering into a checking relation with features of T or V adjoined to Agr (technically, adjoined within Agr0max, the X0 projection headed by Agr).
But with weak Agr gone, covert raising must target T and V directly.130

There is now no reason to postulate AgrO unless it induces overt raising of DP to [Spec, AgrO]. What about AgrS? It appears in MSCs, but lacks independent motivation elsewhere, as matters now stand. For languages of the French-English type, then, Agr is not in the lexicon (unless MSCs appear marginally, with extraposition). Agr therefore occurs in highly restricted ways.

The next question is to inquire into the justification for Agr with strong features. Let us first look at AgrO, then turn to AgrS.

We restrict attention now to transitive verb constructions, which we continue to assume to be of the form (182), ignoring [Spec, V] (the case of a complex internal domain).

(182) [vmax Subj [v' v [VP V Obj]]]

V raises overtly to the light verb v, forming the complex Vb = [v V v]. Assuming unergatives to be concealed accusatives, the only other VP construction is that of unaccusatives lacking the v-shell, not relevant here.

Suppose that a derivation has formed (182) and Agr is merged with it. Agr is a collection of strong features, either [D-] or [V-] or both. As noted, we need not postulate AgrO except for object raising; it does not consist only of strong [V-]. Holmberg's generalization states, in effect, that it cannot be just strong [D-]. Let us tentatively assume, then, that AgrO is {strong [D-], strong [V-]}. The effect of adding AgrO is to compel overt raising of DP to [Spec, Agr] and of Vb to Agr.

Consider the first property. There is a simple way to force overt DP raising without the functional category Agr: namely, by adding to v itself a strong D-feature (or perhaps, the more neutral strong [nominal-] feature) that requires overt substitution in the outer Spec of a multiple-Spec configuration. If Obj raises to this position to form a chain (Obj, t), it will be in the checking domain of V and therefore able to check its Case and (object agreement) φ-features. Recall that Subj inserted by Merge in [Spec, v] is not in the checking domain of v, because it does not head a nontrivial chain.131

Object raising, then, takes place when the light verb v that heads the transitive verb construction (182) is assigned the strong feature as it is drawn from the lexicon and placed in the numeration; see section 4.2.2. The choice is arbitrary, forced, or unavailable as the language has optional, obligatory, or no overt object raising, respectively. Since Subj is not in the checking domain, as just noted, it does not check this strong feature, so an outer Spec must be constructed for that purpose. One way is raising of Obj; I hope to show that all others are excluded.

Suppose that an adverbial phrase Adv is adjoined to vmax and object raising crosses it, yielding the construction Obj-Adv-vmax. That provides no reason to postulate an Agr position outside of vmax: a strong feature need only be satisfied before a distinct higher category is created.132

[...] assuming the existence of AgrO. The other property of AgrO is that it forces overt V-raising, actually to T outside of VP, so the effects are never directly visible. The motivation was theory-internal, but it disappears within the more restricted framework, as we will see. The property was a crucial part of the expression of Holmberg's generalization that object raising is contingent on V-raising, but to introduce that consideration to justify postulation of AgrO is circular. For VP, at least, it seems that we should dispense with AgrO.

Consider adjectival constructions such as (55), repeated here.

(183) John is [AgrP t1 Agr [AP t2 intelligent]]

We assumed that John is merged in the position of t2 in AP as [Subj, Adj] (subject of intelligent, in this case), raising to [Spec, Agr] for DP-adjective agreement, then on to matrix [Spec, I] for DP-verb agreement.133 Do we need a strong functional category (Agr) here to head the small clause complement of the copula? Assuming that [Subj, Adj] is analogous to specifier or complement of V (and more generally, that the complementarity of θ-theory and checking theory holds in this case as well as others), then agreement will not be checked in this position of merger. We assumed that the (-Interpretable) φ-features of the adjective Adj are checked by overt raising of its subject Subj to [Spec, Agr] and of Adj to Agr, the latter problematic, as mentioned, because Agr is weak in English (see note 51). We can now avoid that problem by eliminating Agr and adopting the analysis just proposed for overt object raising: Adj is assigned the feature strong [nominal-] as it is drawn from the lexicon, and [Subj, Adj] raises to the outer Spec required by the strong feature, entering the checking domain of Adj. In this case the derivation will converge only if the strong feature is selected, so the choice is in effect obligatory.
Note that features of Subj cannot adjoin covertly to the Adj, as a review of the possible cases shows (on plausible assumptions).

We therefore eliminate AgrO in this case too, using simple mechanisms and overcoming an earlier problem about unexpected head raising. The structure of predicate adjectival constructions is not (183) but rather (184).

(184) John is [AP t1 [A' t2 intelligent]]

For small clauses, we have something like the original assumptions of Stowell (1978) on which much of the work on the topic has been based, but consistent with other assumptions that we are now adopting.

If all of this turns out to be correct, we can eliminate AgrO from the lexical inventory entirely, for any language. Turning to AgrS, we need to consider only
MSCs, which have the surface order [Exp-V-Subj]. Our assumption so far is that the subject Subj is in [Spec, T] and the expletive in [Spec, AgrS], and that V has raised to AgrS. Suppose, instead, we follow the line of reasoning suggested for AgrO, eliminating Agr and adding an optional strong feature that assigns an outer Spec to T. The situation differs from the case of AgrO. [Spec, v] in (182) is required for independent θ-related reasons, so only one new Spec is required for object raising. In contrast, T requires no Spec, so we have to accommodate two Specs that are induced only by feature strength. Independently, we have to account for the fact that the order is not the expected (185a) but rather (185b), along with other observed properties.

(185) a. Exp [Subj [T0max XP]]
      b. Exp T0max Subj XP

MSCs appear only when the EPP holds. The question of their nature arises, then, only when T already has a strong [nominal-] feature, which is deleted when checked by DP or NP in [Spec, T]. Suppose that the derivation has reached the stage TP with T strong, and the numeration contains an unused expletive Exp. Then Exp can be inserted by Merge to satisfy the EPP, and we have an ordinary expletive-associate construction. The strong feature of T deletes and furthermore erases, since the derivation converges. Hence, overt MSCs exist only if T has a parameterized property of the kind discussed earlier (see below (58)), which allows a -Interpretable feature (in this case, the strong [nominal-] feature) to escape erasure when checked. If the option is selected, then there must be a multiple-Spec construction, with n + 1 specifiers if the option is exercised n times. In a language with the EPP but no MSCs, the strong nominal feature of T is introduced into the derivation with n = 0, hence erased when checked. In Icelandic, the descriptive facts indicate that n = 0 or n = 1; in the latter case, T has two Specs.

Let us see where this course leads, eliminating Agr from UG entirely and, at least for our purposes here, keeping to functional categories with intrinsic properties that are manifested at the interface levels. The questions that arise are again rather delicate. Let us delay a direct investigation of them until some further groundwork is laid.

4.10.2 Core Concepts Reconsidered

To accommodate the change from an Agr-based to a multiple-Spec theory, we have to simplify the notions of equidistance and closeness that entered into the definition of Attract/Move. These were expressed in the principle (87), repeated here.

(186) β is closer to K than α if β c-commands α and is not in the minimal domain of CH, where CH is the chain headed by γ, γ adjoined within the zero-level projection H(K)0max.

But this no longer works: with the elimination of intervening heads, minimal domains collapse. We therefore have to exclude nontrivial chains from the account of equidistance, relying instead on the much more differentiated analysis of features now available and the immobility of traces; that is, on the fact that only the head of a chain can be "seen" by K seeking the closest α to attract.

In the earlier formulation, the basic case is (85), repeated here in the more general form (187) to accommodate adjunction of α to X as well as substitution of α in [Spec, X] (where X may already be the head of a complex zero-level projection).

(187)

[Xmax [Y-X] [YP Spec1 [Y' t ZP]]]

When α raises, targeting Xmax, it creates a new position τ(X), which may either be [Spec, X] or adjoined to [Y-X] (= X0max); call τ(X) the target in either case. The minimal domain of the chain CH = (Y, t) includes Spec1 and ZP along with τ(X) formed by raising of α, which is within ZP or is ZP. Crucially, Spec1 is within the "neighborhood of X" that is ignored in determining whether α is close enough to be attracted by X (technically, by its projection). That assumption was necessary in order to allow α to cross Spec1 to reach τ(X). In a transitive verb construction, for example, it was assumed that X = Agr, Spec1 = Subj, Y is the verbal element that adjoins to Agr, and Obj is within its ZP complement. Obj has to raise to the checking domain of Agr for feature checking either overtly or covertly, requiring that it be "as close" to the target as Spec1.

Most of this is now beside the point. We have eliminated Agr and its projection from the inventory of elements. For the case of overt object raising, the structure formed is no longer (187) with X = Agr and τ(X) = [Spec, Agr], but (188), with an extra Spec in YP.

(188) [YP Spec2 [Y' Spec1 [Y' Vb ZP]]]

Vb is the verbal element (or its trace, if the complex has raised further to adjoin to T); Y' and YP are projections of the light verb v to which V has adjoined to form Vb; ZP = [tV Obj], tV the trace of V; and Spec2 is the target τ(v) created by the raising operation. Spec1 is Subj, and it is only necessary that it be no closer to the target Spec2 than α in ZP. For this purpose, it suffices to simplify (186), keeping just to the trivial chain CH = H(K) (the head of K) and its minimal domain. We therefore restate (186) as (189).

(189) γ and β are equidistant from α if γ and β are in the same minimal domain.

Hence, γ = Spec2 and β = Spec1 are equidistant from α = Obj in the illustrative example just discussed.

We now define "close" for Attract/Move in the obvious way: if β c-commands α and τ is the target of raising, then

(190) β is closer to K than α unless β is in the same minimal domain as (a) τ or (b) α.

We thus have two cases to consider. We ask (case (190a)) whether β and τ are equidistant from α, and (case (190b)) whether β and α are equidistant from τ. If either is true, then β does not bar raising of α to τ. In case (190a), β and τ are in the minimal domain of H(K); and in case (190b), β and α are in the minimal domain of h, for some head h. In case (190a), β is in the "neighborhood" of H(K) that is ignored, in the sense of earlier exposition.

By case (190a), Obj within ZP in (188) is close enough to be attracted by Y' (= YP, at this point), since Spec1 is in the minimal domain of H(Y') and is therefore not closer to Y' than Obj; Spec1 and Spec2 (= τ) are equidistant from Obj. Therefore, either Subj in Spec1 or Obj (in ZP) can raise to the new outer Spec, Spec2, required by the strong feature of v. Both Obj and Subj must raise for Case checking, and something must raise to check the Case feature of T (or of some higher category if T is a raising infinitival, as already discussed). By case (190b), overt object raising to Spec2 does not prevent subject raising from Spec1, because Spec2 and Spec1 are equidistant from any higher target; both are in the minimal domain of v. How about direct raising of Obj from within ZP, targeting T, crossing Subj and Spec1? That is barred by the MLC, since Subj and Obj are not equidistant from T, given the v-VP analysis of transitives; they are in different minimal domains. We will return to a closer analysis, reviewing other options skirted here.

Consider the following counterargument. Suppose the language has the EPP and optional object raising: T requires [Spec, T] and v permits an outer Spec, Spec2, beyond Subj in Spec1 (both overt). Suppose that Obj raises to [Spec2, v], then raises again to [Spec, T], satisfying the EPP. That much is permitted. Subj and T have not had Case features checked, but that can be overcome by covert raising of Subj, targeting T, which is also permitted. So the derivation converges, incorrectly. But this derivation is blocked by economy conditions. It involves three raising operations, and two would suffice for convergence: object raising followed by subject raising to [Spec, T] (in both cases, with two violations of Procrastinate, the minimal number with two strong features). So the unwanted series of steps, though permitted, is barred by economy considerations: shorter derivations block longer ones.

The computation is local: after raising the object, we choose the operation that will lead to the shortest convergent derivation: raising of Subj to [Spec, T]. We also have empirical support for the tentative assumption made earlier that shorter derivations, locally determined in this sense, block longer ones (see discussion of (114)).

Note that we have lost Holmberg's generalization and other effects of V-raising on extension of chains; that is a consequence of excluding chains from the definition of "closeness." Such generalizations, if valid, would now have to be stated in terms of a property of Vb in (188): it can have a second outer Spec only if it is a trace.
There is no obvious reason why this should be so.

In any event, the earlier, more complex definition of equidistance and closeness is not necessary and in fact not possible. The notion of equidistance may still be needed, as in cases just reviewed and others, but it has narrower scope.

The conclusion that equidistance is still needed relies on a tacit assumption that could be challenged: that the strong feature of v must be satisfied by the outer Spec, Spec2 of (188), not the inner Spec, Spec1. All we know, however, is that some Spec of v is motivated by considerations of θ-theory (to host the external argument) and is therefore independent of the strength of v; the other Spec is present only to check the strength feature. But both Specs are within the minimal domain of v, so either is available for θ-marking of the external argument of a transitive verb. Suppose we allow this possibility, so that the outer Spec can host the external argument. In that case we can drop the notion of equidistance entirely, simplifying (190) to the statement that β is closer to the target K than α if β c-commands α. It follows, then, that Obj can only raise to the inner Spec, Spec1 of (188), to check the strength feature and undergo overt Case marking. If overt object raising takes place, then Subj will be merged in the outer Spec to receive the external θ-role provided by the configuration. With "closer than" restricted to c-command, only Subj in the outer Spec can be attracted by T (note that Subj always has features that will check sublabels of T). Therefore, Obj is frozen in place after overt object raising, and the conclusions reached above follow directly.134

On these assumptions, it follows that Subj always c-commands Obj within IP. In particular, this is true in expletive constructions, whether or not object raising has taken place; that appears to be generally the case, with some unexplained exceptions (see Jonas and Bobaljik 1993). We also have a somewhat more natural account of agreement, with the inner Spec uniformly entering into the relation (the Spec θ-position is not subject to it for reasons already discussed). It also should be the case that only the inner Spec in a multiple-Spec construction can be a binder (assuming that locality enters crucially into binding, in one of several possible ways) though control may be more free, as it often is (see section 1.4.2); that appears to be the case (Hiroyuki Ura, personal communication). Further questions arise when we turn to verbs with complex internal argument structure, which I am ignoring here.

Let us keep available these two options for the notion "closer than," noting however that the one just sketched is simpler, hence to be preferred if tenable (and certainly if empirically supported).
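
The two options for "closer than" can be compared in a small sketch. It is only an illustration under stated assumptions: the `Position` class, the numeric `depth` used as a stand-in for c-command on a single right-branching spine, and the `domain` labels are all hypothetical devices, not part of the theory. The positions are those of (188), plus [Spec, T] as a higher target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    name: str
    depth: int   # assumption: smaller depth = higher on a right-branching spine
    domain: str  # assumption: head whose minimal domain contains the position


def c_commands(beta: Position, alpha: Position) -> bool:
    # Simplification: on one right-branching spine, a higher position
    # c-commands every lower one.
    return alpha.depth > beta.depth


def closer_with_equidistance(beta: Position, alpha: Position, tau: Position) -> bool:
    """Option with equidistance, after (190): beta is closer than alpha
    unless beta shares a minimal domain with (a) tau or (b) alpha."""
    if not c_commands(beta, alpha):
        return False
    return beta.domain != tau.domain and beta.domain != alpha.domain


def closer_c_command_only(beta: Position, alpha: Position) -> bool:
    """Simpler option: closeness reduces to c-command."""
    return c_commands(beta, alpha)


spec_t = Position("[Spec, T]", 0, "T")
spec2 = Position("Spec2", 1, "v")
subj = Position("Subj in Spec1", 2, "v")
obj = Position("Obj in ZP", 3, "V")
obj_raised = Position("Obj in Spec2", 1, "v")

# Case (190a): Subj does not bar raising of Obj to Spec2.
print(closer_with_equidistance(subj, obj, spec2))
# Case (190b): raised Obj does not bar raising of Subj to [Spec, T].
print(closer_with_equidistance(obj_raised, subj, spec_t))
# Direct raising of Obj from ZP to T is barred: Subj is closer.
print(closer_with_equidistance(subj, obj, spec_t))
# With c-command alone, Subj always bars Obj from crossing it.
print(closer_c_command_only(subj, obj))
```

Under the c-command-only option the third argument disappears altogether, which is one way to see why that option counts as the simpler one.
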
I will present the examples below on the assumption that object raising is to the outer Spec to ensure that the required consequences follow even under this more complex alternative; it is easy to check that the arguments run through (more simply, in some cases) if closeness reduces to c-command so that object raising is to the inner Spec and only the outer Spec is attracted by T.

We also have to settle some questions about adjunction that have been left open but become more prominent in this much more restrictive framework, covert adjunction being the most interesting case. Empirical evidence for covert operations and the structures they yield is harder to obtain than for their overt counterparts, but it exists, and conceptual arguments also carry us some distance, at least.

One reasonable guiding idea is that interpretive operations at the interface should be as simple as possible. Barring empirical evidence to the contrary, we assume that the external systems are impoverished, a natural extension of minimalist intuitions to the language faculty more broadly, including the systems (possibly dedicated to language) at the "other side" of the interface. That means that the forms that reach the LF level must be as similar as typological variation permits (unique, if that is possible). These assumptions about the interface impose fairly restrictive conditions on application and ordering of operations, cutting down the variety of computation, always a welcome result for reasons already discussed. At the A-P interface, overt manifestation provides additional evidence. Such evidence is largely unavailable at the C-I interface, but the general conceptual considerations just reviewed carry some weight. We have been implicitly relying on them throughout: for example, to conclude that covert features adjoin to the head of a chain (specifically, a raised verb), not to the trace or optionally to either; see (94) and discussion.

The central problem about covert adjunction concerns the structure of T0max at LF. Consider first the richest case: a TEC with object raising (Icelandic). Putting aside the observed position of T0max, we assume its form at LF to be (185a), repeated here, with YP an instance of (188).

(191) Exp [Subj [T0max YP]]

Exp and Subj are specifiers of the T head of T0max, which is formed by adjunction to T of Vb = [v V v]. In this case T0max is (192) and the complement YP is (193).

(192)

[T Vb T]

(193) [vmax Obj [v' tSubj [v' Vb [VP tV tObj]]]]

Here Obj and tSubj are specifiers of the v head of Vb.

Suppose V raises overtly and Obj does not (French, or optionally Icelandic). The complement of T differs from (193) in that it lacks the outer Spec occupied by Obj, which remains in the position tObj. The formal features FF(Obj) raise to T0max for feature checking. Before this covert operation takes place, T0max is again (192). To maximize similarity to the LF output (192), FF(Obj) must adjoin to the complex form (192) itself, forming (194), not to the deeply embedded V within Vb, which actually contains the relevant features for checking.

(194)

[T FF(Obj) [T Vb T]]

The operation is permitted, since the features of V are sublabels of the target of adjunction; and it satisfies the conditions on "closest target" (tSubj being "invisible," as discussed) and formation of complex X0s. Assuming this to be the general pattern, we conclude that adjunction is always to the maximal zero-level projection X0max, never internally to one of its constituents (the simplest assumption in any event).

Consider an English-type language with overt raising of Subj but not V or Obj. To achieve maximum impoverishment at the interface, we want T0max at LF to be as similar as possible to (194); in fact, identical except that in place of Vb it has FF(Vb), since V-raising is covert. FF(Obj) therefore cannot raise to V or to the verbal complex Vb before Vb adjoins to T; if it did, the structure formed would be quite different from (194). After covert V-raising, FF(Obj) adjoins to T0max, again forming (194) at LF (with FF(Vb) in place of Vb). The ordering is forced by bare output conditions, if the conjecture about poverty of interface interpretation is correct.

Suppose the language lacks overt raising of either Subj or Obj, so that both FF(Subj) and FF(Obj) raise covertly to TP. The poverty-of-interpretation conjecture requires that Vb raise to T before either subject or object raising; we thus have (192) once again, as desired, whether V-raising is overt (as in a VSO language) or not (in which case Vb in (192) is replaced by FF(Vb)). TP now attracts Subj, which is closer than Obj, forming a structure identical to (194), except that the features FF of (188) happen to be FF(Subj) rather than FF(Obj). But it is also necessary for FF(Obj) to raise. That is now possible, since the trace of Subj (unlike Subj itself) is not closer to TP, being inaccessible to Attract/Move. T0max therefore ends up as (195).

(195) [T FF(Obj) [T FF(Subj) [T Vb T]]]
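
The way (192), (194), and (195) arise from successive adjunction to T0max can be sketched as follows. This is a minimal illustration, assuming a nested-tuple encoding of two-segment categories; the function names `adjoin`, `bracket`, and `build_t0max` are hypothetical and chosen only for this example.

```python
def adjoin(item, target):
    """Adjoin item to the maximal zero-level projection of T.

    Assumption: an adjunction structure is encoded as the tuple
    (label, adjoined item, old category), with the label always T.
    """
    return ("T", item, target)


def bracket(node):
    """Render a structure in labeled-bracket notation."""
    if isinstance(node, str):
        return node
    label, left, right = node
    return f"[{label} {bracket(left)} {bracket(right)}]"


def build_t0max(raised):
    """Adjoin each raised element, in derivational order, to T0max."""
    t0max = "T"
    for item in raised:
        t0max = adjoin(item, t0max)
    return t0max


# The head Vb raises first, then the features of elements of its domain.
print(bracket(build_t0max(["Vb"])))
print(bracket(build_t0max(["Vb", "FF(Obj)"])))
print(bracket(build_t0max(["Vb", "FF(Subj)", "FF(Obj)"])))
```

Because every adjunction targets the whole of T0max rather than one of its constituents, the derivational order (head first, then Subj, then Obj) is read directly off the nesting, and replacing Vb by FF(Vb) for covert V-raising leaves the shape unchanged.
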

Recall that in a normal expletive construction, the strong D-feature of T is satisfied by an expletive rather than the raised Subj: the case of Merge that was assimilated to Attract/Move (see end of section 4.5). In this case T0max is again (195), and [Spec, T] is occupied by the expletive. An expletive construction thus has a certain structural resemblance to a VSO configuration, as has been implicit throughout.

We conclude that adjunction is to the maximal X0 projection X0max and that heads raise before elements of their domains, conditions reminiscent of cyclicity. These are descriptive generalizations that we derive, not principles that we stipulate: they follow from the minimalist principle of poverty of interpretation at the interface. Simple and plausible assumptions suffice to guarantee virtually the same LF form for T0max, over the typological range we are considering. So far these conclusions are motivated only by the conceptual requirement of maximizing uniformity of LF outputs. They supplement the earlier conclusion that traces of A-movement never enter into Attract/Move, whether overt or covert.

Suppose that output conditions at the LF interface rule out (195) under the rather natural requirement that FF(Subj) must c-command FF(Obj) in T0max if both are present. Hence, Obj has to adjoin to T0max before Subj does. But that is impossible, since Subj is closer to T than Obj if both remain in situ. It follows, then, that at least one of Subj and Obj must raise overtly if the expression is to converge, a hypothesis that has been advanced several times. The requirement on T0max generalizes the conclusion that Subj must c-command Obj overtly, which follows from the simplification of the notion "closer than" to just c-command, as already discussed.

We also have to settle some questions about the positions in which expletives can appear. The problems are not specific to this analysis; they arose before (more broadly, in fact), but were ignored. The basic descriptive fact is (196).

(196) Exp can only be in [Spec, T].

We have to determine why this is the case, a question that has a number of facets.

Suppose that Exp is merged in a θ-position, one of the possible violations of (196). That leads to a violation of the θ-Criterion, hence a deviant expression, a conclusion that suffices to allow us to dismiss the option here. Still, a factual question that has arisen several times before remains: is this a case of convergent gibberish or of a crashed derivation?

Suppose, say, that Exp is merged as subject of a transitive verb construction, yielding the VP (197), an instance of (182).

(197)

[vmax Exp [v' v [VP saw someone]]]

We next merge strong T, which attracts Exp, yielding finally the overt form (198).

(198) there saw someone

Raising of Exp satisfies the strong feature of T (EPP), but not its Case feature. Covertly, FF(Obj) adjoins to T, providing a Case feature. But there are two heads that have to check a Case feature: T (nominative) and see (accusative). Hence, the derivation will crash, with one or the other not satisfied. Such a derivation could converge only if the language had a verb SEE, like see except that it assigns no Case. Since no issue of blocking convergent derivations seems to arise, we are left without a satisfactory answer to the question of the status of the analogue of (198) with SEE in place of see. It could be that the derivation crashes because of a violation of the θ-Criterion, so there can be no such verb as SEE; or it could be that the derivation converges and a language can have such a verb (perhaps English in fact does), though it appears only in deviant expressions violating the θ-Criterion. We have seen that an argument lacking a θ-role is illegitimate, violating FI and causing the derivation to crash; but the question of assignment of θ-role remains open.

Putting the option aside as either nonconvergent or gibberish, we consider only Exp merged in a non-θ-position. We thus keep to the earlier conclusion that the only position that Exp can enter either by Merge or Attract/Move is one induced by a strong [nominal-] feature, hence [Spec, T] or [Spec2, v] (the outer Spec) in (188). We also know that nothing can be raised to a θ-position. Hence, Exp never appears in a θ-position in the course of a derivation.

Can Exp be merged in the outer Spec of v, [Spec2, v] in (188)? There are two cases to consider: Exp remains in this position, or it raises to [Spec, T]. The latter possibility is excluded, because the effect at both PF and LF is the same as merging Exp in [Spec, T] in the first place. Therefore, the economy principle (76) prevents selection of the strong [nominal-] feature of v for the numeration. The only remaining case, then, is merger of Exp in [Spec, v], where it remains at LF.

At LF, Exp is simply the categorial feature [D]. Any phonetic features would have been stripped away at Spell-Out, and we have seen that Exp has no other formal features. Lacking semantic features, Exp has to be eliminated at LF if the derivation is to converge: its D-feature is -Interpretable and must be deleted by some operation. Therefore, [D] must enter into a checking relation with some appropriate feature F. As we have just seen, T does not offer a checking domain to which Exp can raise,135 so F must raise to the checking domain of Exp, which means that F must adjoin to it. What is F? Independently, there is good reason to believe that the categorial feature [N] adjoins to [D] regularly, namely, in the D-NP construction (Longobardi 1994). The optimal assumption, then, would be that it is adjunction of the feature [N] to Exp that eliminates its (sole remaining) feature. The feature [D] of Exp cannot be erased in this configuration: that would eliminate the category completely, leaving us with an illegitimate syntactic object. Therefore, checking of the categorial feature of Exp (its entire content) deletes it but does not go on to erase it, by the general principles already discussed.

The optimal assumption requires nothing new.
Exp has no complement, but in the relevant formal respects, the head-complement relation that allows N-raising to D is the same as the [Spec-α] relation holding between Exp in Spec and the X' projection of T. In a properly formed expletive construction, the formal features FF(A) of the associate A adjoin to matrix I (which we now take to be T0max), checking Case and agreement and allowing matrix-type binding and control. The categorial feature [N] of A comes along as a free rider and is therefore in the right position to adjoin to Exp, forming [D N Exp]. The configuration so formed places [D] in a checking configuration with raised [N], as in the D-complement structure.136 Like D that takes a complement, expletive D has a strong [nominal-] feature, which attracts [N]: a residue of the earlier adjunction-to-expletive analysis.
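
The contrast between deletion and erasure that decides the fate of [D] can be sketched in a few lines. This is a rough illustration only: the classes `Feature` and `Item`, the status strings, and the test "some other feature remains" as a proxy for "no illegitimate object is formed" are all assumptions made for the example, and the parametric option that lets a strong feature escape erasure in MSCs is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    interpretable: bool
    status: str = "active"  # "active", "deleted", or "erased"


@dataclass
class Item:
    label: str
    features: list


def check(item: Item, name: str) -> str:
    """Check one feature of item and return its resulting status."""
    feat = next(f for f in item.features if f.name == name)
    if feat.interpretable:
        return feat.status  # Interpretable features are unaffected by checking
    feat.status = "deleted"  # checked -Interpretable features always delete
    # Assumed proxy for legitimacy: erasure is allowed only if the item
    # would still have some content left afterward.
    remaining = [f for f in item.features
                 if f is not feat and f.status != "erased"]
    if remaining:
        feat.status = "erased"
    return feat.status


# Exp at LF consists of [D] alone, so [D] deletes but cannot erase.
exp = Item("Exp", [Feature("D", interpretable=False)])
print(check(exp, "D"))

# T has other content, so its checked strong feature goes on to erase.
t = Item("T", [Feature("strong-nominal", interpretable=False),
               Feature("Case", interpretable=False)])
print(check(t, "strong-nominal"))
```

The design choice here is to treat erasure as a second step that follows deletion only when it is harmless, which mirrors the order in which the text states the two operations.
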

Returning to (196), we recall that the sole remaining problem was to show that Exp does not appear at LF in Spec2 of (188), the object-raising position. We have to show, then, that if Exp is in that position, no [N] can raise to it.137

There is only one possibility for N-raising to Spec2: namely, the categorial feature [N] of Subj in Spec1.138 The operation is permitted, so we have to show that the derivation will crash for other reasons. We already have such reasons. Subj must raise to the checking domain of T, leaving a trace in Spec1. We concluded earlier (see (94)) that traces of A-movement are inaccessible to Attract/Move. Therefore, if raising of FF(Subj) has taken place (either as part of overt substitution in [Spec, T] or as part of covert adjunction), [N] cannot raise from the trace of FF(Subj) to adjoin to Spec2. The only remaining possibility is that Subj remains in situ in Spec1 with Exp in Spec2, and [N] raises from Subj, adjoining to [D] in Spec1 automatically carrying along FF[N]. But FF(Subj) must now raise to T0max, raising the trace of FF[N], an operation that should also be barred by (94). We see that there is good reason to interpret that constraint (which was purposely left a bit vague) quite strictly. If so, the restriction of Exp to [Spec, T] is explained (with the qualifications of the preceding notes).

These observations suggest that a still stricter interpretation of (94) might be warranted, strengthening the condition on argument chains, repeated as (199), to the provision (200).

(199) Only the head of CH can be attracted by K.
(200) α can be attracted by K only if it contains no trace.

The suggestion would have been unacceptable in earlier versions for a variety of reasons, but the objections largely dissipate if covert movement is only feature raising and some earlier suggestions have merit.139 One immediate consequence is that overt countercyclic operations of the kind that motivated the extension condition are ruled out (see discussion of (138)).
Nevertheless, (200) may be too strong even within the core computational system; we will return to this.

We see, then, that the descriptive observation (196) is well established on reasonable grounds.140 It is possible that something similar is true of nonpure expletives of the it-type, which are associated with CPs with complementizers: that, for, or Q (the phenomenon of extraposition, however it is analyzed). They do not appear with control or raising infinitivals.

(201) a. *it is certain [PRO to be intelligent]
b. *it is certain [John to seem t to be intelligent]

Possibly the overt complementizer head of the extraposed associate raises to the expletive, deleting it as in pure expletive constructions and thus satisfying FI. But see note 68.

The analysis of (196) raises in a sharper form an unsettled problem lingering from before. In discussing (168a-b), essentially (202a-b), we observed that it is not clear why both are allowed (assuming that they are, when MSCs are permitted).

(202) a. there seems [TP [someone] t [TP tSubj to be in the room]]
b. there seems [TP tExp [TP someone to be in the room]]

To explain the notations, TP in both cases is the complement of seem; t is the trace of seem; tSubj is the trace of someone; and tExp is the trace of there. Exp occupies the outer Spec of the matrix MSC in (202a), and its trace occupies the outer Spec of the embedded MSC in (202b). The Subj someone is in [Spec, T] in both cases, in the matrix clause in (202a) and in the embedded clause in (202b). Thus, the matrix clause is an MSC in (202a) and the embedded clause is an MSC in (202b).

The earlier discussion entailed that (202b) blocks (202a) because at the common stage of the derivation when only the most deeply embedded T projection has been formed, insertion of Exp is more economical than raising of Subj to its Spec. The only permissible alternative to (202b), then, should be (203), with Subj remaining in the unraised associate position.

(203) there seems [IP t to be [someone in the room]]

We now have a further problem. We have just seen that the construction (204), which is rather similar to (202b), is not permitted by virtue of the economy principle (76).

(204) Exp T ... [vP t Spec2 [Vb XP]]

Here raising of Exp from the Spec determined by the strong feature of v is barred: the strong feature cannot appear in the numeration because it has no effect on PF or LF output.
In (202b), however, raising of Exp from the extra Spec determined by the strong feature of embedded T is permitted, even though it seems to be barred by this condition and, in case (202a), by yet another economy condition: that the most economical step must be taken at each point in the derivation.

The problems unravel in the present framework. By the economy condition (76), a strong feature can enter the numeration if it has an effect on output (in this case, PF output, because only a pure expletive is involved). That suffices to bar (204): adding the strong feature to the v head of Vb has no PF effect. Turning to (202), we see that (202a) derives from adding an extra strong feature to matrix T, and (202b) from adding an extra strong feature to embedded T: two different elements.141 In each case there is an effect on PF output. Suppose matrix T enters the numeration without a strong feature that allows an extra Spec. Embedded T is a different element in the numeration: if it lacks the strong feature that allows an extra Spec, then we derive (203); if it has this strong feature, we derive the distinct PF form (202b). Therefore, (76) is inapplicable. The same is true of the economy principle that forced selection of expletive over raising in the earlier theory. That selection is forced only if the derivation converges; and if the embedded T has the strong feature requiring MSC, the derivation will not converge unless raising precedes insertion of Exp, giving (202b).142

We have considered various kinds of expletive constructions, including embedding constructions that bar superraising, MSCs of various kinds, and ECM constructions. There is a fairly complex array of options. Within this range of constructions, at least, the options are determined by elementary principles of economy: (1) add optional α to the numeration only if it has an effect at the interface; (2) at each stage of a derivation, apply the most economical operation that leads to convergence. So far, at least, the results look encouraging.

4.10.3 Empirical Expectations on Minimalist Assumptions

With these clarifications, let us turn to the questions delayed at the end of section 4.10.1. The more restricted framework imposed by strict adherence to minimalist assumptions eliminates mechanisms that previously barred unwanted derivations. We therefore face problems of two kinds: to show that (1) the right derivations are permitted, and (2) the doors have not been opened too wide. The specific line of argument is sometimes fairly intricate, but it's important to bear in mind that at root it is really very simple. The basic guiding idea is itself elementary: that the array of consequences is determined by strict application of minimalist principles (to be sure, construed in only one of the possible ways; see the introduction). To the extent that the conclusions are confirmed, we have evidence for a conception of the nature of language that is rather intriguing.

We may continue to limit attention to simple transitive verb constructions, taken to be of the form (182) before T is added to yield TP. It suffices to consider overt V-raising, which brings up harder problems; the covert-raising alternatives fall out at once if this case is handled.

The first problem that arises is that we are predicting the wrong order of elements for MSCs. As noted, the observed order is (205b) instead of the predicted (205a).

(205) a. Exp [Subj [T0max XP]]
b. Exp T0max Subj XP

The best answer would be that the order really is (205a) throughout the N → λ computation. If the expletive is null, we do not know its position, though (205a) is expected by analogy to the overt case. In section 4.9 we noted the possibility that the expletive in MSCs is overt in order to satisfy the V-second property, which may belong to the phonological component. If that is the case, the observed order is formed by phonological operations that are extraneous to the N → λ computation and may observe the usual constraints (V → C), but need not, as far as we know: T0max-adjunction to expletive or to TP, for example. Let us assume the best case and see where that leads. We thus take the order to be really (205a), irrespective of what is observed at the PF output.

T and V have intrinsic -Interpretable features that must be checked: for T, [(assign) Case] (nominative or null); and for V, its φ-features and [(assign) accusative Case]. In addition, the nonsubstantive categories T and v may (optionally) have a strong [nominal-] feature, which is also -Interpretable. All have to be erased in a checking relation established by Merge or Move, by substitution or adjunction. Optional features are chosen when needed for convergence, as little as possible, in accordance with the economy condition (76). Features are deleted when checked. They are furthermore erased when this is possible, apart from the parametric variation that permits MSCs; see discussions below (58) and (185). Erasure is possible when no illegitimate object is formed (detectable at once). Checking takes place in the order in which the relations are established. These are the optimal assumptions: we hope to show that they allow exactly the right array of empirical phenomena. We are concerned now only with strong features.

As the derivation proceeds, the first checking relation that can be established is by overt substitution in Spec of v.
We have two proposals under consideration: (1) "closer than" is defined in terms of c-command alone and overt object raising can only be to the inner Spec; (2) "closer than" is defined in terms of c-command and equidistance, and the object may (perhaps must) raise to the outer Spec. Again, let us restrict attention to the more complex variant (2); under (1), no problem arises, as can readily be checked.

Suppose, then, that overt substitution is in the outer Spec of v: Spec2 of (188). In this case v has a strong [nominal-] feature. It cannot be checked by merged Subj, as we have seen, so there must be an extra Spec as in (188), more explicitly (206).

(206) [vP Spec2 [v' Subj [v' Vb [VP tV Obj]]]]

Again, Vb is the complex form [v V v]. Subj is in Spec1 for θ-theoretic reasons unrelated to strength.

Spec2 must be filled overtly to remove the strong feature of v before a distinct larger category is formed: in this case, before merger with T. This can be done by Merge or Attract/Move. We have already excluded Merge. An argument inserted in this position does not establish a checking relation (so that the strong feature is not checked) and also lacks a θ-role, violating FI; the derivation crashes. Expletives cannot be merged in this position, as we have seen. The only option, then, is raising of either Obj or Subj. We want to allow only raising of Obj, which, as we have seen, then permits the derivation to converge by raising of Subj to the checking domain of T, and only that way.

We have briefly (and incompletely) considered why raising of Subj to Spec2 of (206) is barred. Let us look at the possibilities more closely, to clarify the issues.

Suppose that Subj is raised to Spec2 in (206). It is in the checking domain of V, and checking relations are established for Case and φ-features. If features mismatch, then the derivation is canceled. If they match, then Subj receives accusative Case and object agreement, and the Case and φ-features of V erase. The Case of Obj still has to be checked, and that will have to take place in the checking domain of T. But unraised Obj cannot reach that position, as we have seen, because Subj is closer to T. The trace of Subj in Spec1 is invisible to Attract/Move and therefore does not prevent raising of Obj, but Subj itself does.

The only possibility, then, is that Subj raises further, to the checking domain of T. Now its trace in Spec2 is invisible, and FF(Obj) can raise to T. The trace left in Spec2 deletes, and at LF the result is identical to the result of the derivation in which Spec2 was never constructed. The strong feature of v has no effect on the PF or LF output in this derivation and therefore cannot have been selected for the numeration, by the economy principle (76). This option is therefore excluded. If Spec2 exists at all, it must be formed by overt object raising.

As the derivation proceeds, the next checking relation that can be established is substitution in the first [Spec, T] that is formed (EPP), either by Merge or Attract/Move. For Merge, the only option is an expletive, which (if pure) establishes a checking relation only with the strong [nominal-] feature of T, requiring covert associate raising; raising of an expletive to this position works the same way. The only remaining case is raising of an argument to [Spec, T], necessarily Subj, as we have seen. Then its features enter into checking relations with the "most proximate" sublabels of the target, in the obvious sense: the φ-features of V (which are the only ones), and the Case feature of T.143 Vb cannot have a strong feature at this stage of the derivation, but a checking relation is established with the strong [nominal-] feature of T that forced the overt substitution. The sublabels in checking relations erase if matched by those of Subj; if there is a mismatch, the derivation is canceled.

Suppose either T lacks a strong [nominal-] feature or that feature has already been erased by substitution of Exp in Spec. Then FF(Subj) raises to adjoin to T0max, forming (194), modified slightly here.

(207)

TFF(Subj)

TVb
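The checking step described above for raising of Subj (matching sublabels erase, a mismatch cancels the derivation) can be pictured procedurally. What follows is only an illustrative Python sketch, not part of the theory's formalism. The names Feature, check, and DerivationCanceled are hypothetical, feature values are reduced to plain strings, and it is assumed that only features marked as not interpretable erase when matched.

```python
from dataclasses import dataclass


class DerivationCanceled(Exception):
    """Signals a feature mismatch in a checking relation (hypothetical name)."""


@dataclass
class Feature:
    kind: str            # e.g. "Case" or "phi"
    value: str           # e.g. "nom", "acc", "3sg" (simplified to a string)
    interpretable: bool  # assumption: only non-interpretable features erase
    erased: bool = False


def check(target_sublabels, raised_features):
    """Check raised features against the target's sublabels, in place."""
    for t in target_sublabels:
        if t.erased:
            continue  # an erased sublabel no longer enters checking
        for r in raised_features:
            if r.erased or r.kind != t.kind:
                continue
            if r.value != t.value:
                # Mismatch: the derivation is canceled.
                raise DerivationCanceled(f"{t.kind}: {t.value} vs {r.value}")
            # Match: erase whichever member of the pair is not interpretable.
            for f in (t, r):
                if not f.interpretable:
                    f.erased = True
            break


# Subj in the checking domain of T: the Case feature of T and the
# phi-features of V are the sublabels it checks against.
target = [Feature("Case", "nom", False), Feature("phi", "3sg", False)]
subj = [Feature("Case", "nom", False), Feature("phi", "3sg", True)]
check(target, subj)
```

If Subj carried accusative rather than nominative Case in this toy setup, check would raise DerivationCanceled, which corresponds to the canceled derivation in the text.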

Checking proceeds exactly as in the [Spec, T] case, canceling the derivation if there is a mismatch, erasing -Interpretable features if there is a match. If Obj has raised overtly to [Spec, Vb], its features are checked there and undergo no covert raising. If Obj has not raised overtly, then FF(Obj) raises to T0max, forming (195), repeated here.

(208) [T FF(Obj) [T FF(Subj) [T Vb T]]]


The Case feature of T has already been erased by Subj, so FF(Obj) checks the Case feature of V, canceling the derivation unless its own Case feature is accusative. Nominative Case and subject agreement necessarily coin