
Monday, May 27, 2013

A while ago I discussed the rising incentive for BS in academic life. Something that I failed to emphasize is the way that our leading "science" journals have morphed into PR firms, where scoops are the valued currency. The leading science journals (Science, Nature, PNAS) embargo dissemination of information from forthcoming publications until the release date. The released papers are often part of large PR rollouts intended to wow the public (a kind of PR shock and awe). It all has the markings of publicity for a new Dan Brown blockbuster or a hot new Hollywood summer movie. What it has little in common with are the virtues that scientists like to claim for themselves and their enterprise.

A recent post by Richard Sproat (here) provides an illustration (thx to David Pesetsky for bringing it to my attention). What Sproat describes is a typical case of protecting your brand. BS gets published and nothing that calls it what it is gets a hearing. Why not? It would tarnish the Science brand. What would the journal be worth if it became clear that a good chunk of what it published was of little intellectual value? This is a rhetorical question, btw.

There has been a lot of hand-wringing over fraud in science. I've talked about this some and evinced my skepticism about how serious this is, especially in cases where it appears that "fraudulent" results replicate (as in Hauser's case, to name an important recent "problem"). Read the Sproat piece and consider which is worse: the systematic suppression of criticism or scientific fraud? Which is systematized? Which more completely pollutes the data stream?

I would bet that most is not. However, enough just might be to undermine our confidence in the seriousness of our leading publications. BS is very hard to combat, especially when our journals see themselves as part of the entertainment industry. Are Science and Nature becoming Variety with formulae? At least when discussing language issues, it looks like it might be.

[1] This does not mean that all material published on these topics in these journals is BS (see, for example, the excellent paper by Idsardi and Heinz here).

Saturday, May 25, 2013

There's been a very vigorous discussion in the comment sections to this post which expatiates on the falsifiability of proposals within generative grammar of the Chomskyan variety. It interweaves with a second point: the value of formalization. The protagonists are Alex Clark and David Pesetsky. It should come as no surprise to readers that I agree with David here. However, the interchange is worth reading and I recommend it to your attention precisely because the argument is a "canonical" one, in the sense that the two represent views about the generative enterprise that we will surely hear again (though I hope that very soon David's views prevail, as they have with me).

Though David has said what I would have said (though much better) let me add three points.

First, nobody can be against formalization. There is nothing wrong with it (though there is nothing inherently right about it either). However, in my experience its value lies not in making otherwise vague theories testable. Indeed, as we generally evaluate a particular formalization in terms of whether it respects the theoretical and empirical generalizations of the account it is formalizing, it is hard to see how formalization per se can be the feature that makes an account empirically evaluable. This comes out very clearly, for example, in the recent formalization of minimalism by Collins and Stabler. At virtually every point they assure the reader that a central feature of the minimalist program is being coded in such and such a way. And, in doing this, they make formal decisions with serious empirical and theoretical consequences, decisions that could call into question the utility of the particular formalization. For example, the system does not tolerate sidewards movement, and whether this formalization is empirically and theoretically useful may rest on whether UG allows sidewards movement or not. But the theoretical and empirical adequacy of sidewards movement, and of formalizations that do or do not encode it, is not a question that any given proposed formalization addresses or can address (as Collins and Stabler know). So, whatever the utility of formalization, to date (with some exceptions, two of which I will return to) it does not primarily lie in making otherwise untestable theories testable.

So what is the utility? I think that when done well, formalization allows us to clarify the import of our basic concepts. It can lay bare the conceptual dependencies among our basic concepts. Tim Hunter's thesis is a good example of this, I think. His formalization of some basic minimalist concepts allows us to reconceptualize them and consequently extend them empirically (at least in principle). So too with Alex Drummond's formal work on sidewards movement and Merge over Move. I mention these two because this work was done here at UMD and I was exposed to the thinking as it developed. It is not my intention to suggest that there is not other equally good work out there.

Second, any theory comes to be tested only with the help of very many ancillary hypotheses. I confess to feeling that lots of critics of Generative Grammar would benefit from reading the work criticizing naive falsificationism (Lakatos, Cartwright, Hacking, and a favorite of mine, Laymon). As David emphasizes, and I could not agree more, it is not that hard to find problems with virtually every proposal. Given this, the trick is to evaluate proposals despite their evident shortcomings. The true/false dichotomy might be a useful idealization within formal theory, but it badly distorts actual scientific practice, where the aim is to find better theories. We start from the reasonable assumption that our best theories are nonetheless probably false. We all agree that the problems are hard and that we don't know as much as we would like. Active research consists in trying to find ways of evaluating these acknowledged false accounts so that we can develop better ones. And where the improving ideas will come from is often quite unclear. Let me give a couple of examples of how vague the most progressive ideas can be.

Consider the germ theory of disease. What is it? It entered as roughly the claim that some germs cause some diseases sometimes. Not one of those strongly refutable claims. Important? You bet. It started people thinking in entirely new ways and we are all the beneficiaries of this.

The Atomic Hypothesis is in the same ballpark. Big things are made up of smaller things. This was an incredibly important idea (Feynman, I think, thought this was the most important scientific idea ever). Progress comes from many sources, formalization being but one. And even pretty labile theories can be tested, as, e.g., the germ theory was.

Third: Alex Clark suggests that only formal theories can address learnability concerns. I disagree. One can provide decent evidence that something is not learnable without this (think of Crain's stuff or the conceptual arguments against the learnability of island conditions). This is not to dispute that formal accounts can and have helped illuminate important matters (I am thinking of Yang's stuff in particular, but a lot of stuff done by Berwick and his students is, IMO, terrific). However, I confess that I would be very suspicious of formal learnability results that "proved" that Binding Theory was learnable, or that movement locality (aka Subjacency), the ECP, or structure dependence was. The reasons for taking these phenomena as indications of deep grammatical structural principles are so convincing (to me) that they currently form boundary conditions on admissible formal results.

As I said, the discussion is worth reading. I suspect that minds will not be changed, but that does not make going through it (at least once anyhow) any less worthwhile.

Tuesday, May 21, 2013

Neuroscientists win Nobels. They get years of the brain, presidents asking for billions of dollars for connectomes and brain atlases, and billion dollar grants (see here) to build computer brains to find out how we think. Neuroscience is a prestige science. Sadly, linguistics is not.

The paper noted above is the latest indication of the cachet that neuroscience has. However, buried in the article discussing this latest funding coup (btw, I have nothing against this, though I am envious, for none of this money would ever have come to me and better this than another fancy jet or tank) is an indication of how little contemporary neuroscience can tell us about how brains affect behavior or mental capacity. And not because we don't have a full-fledged connectome or map of the brain. Consider the lowly roundworm: full wiring diagram and no idea why it does what it does. Don't take my word for this. Here's Christof Koch, one of the leaders in the field:

“There are too many things we don’t yet know,” says Caltech professor Christof Koch, chief scientific officer at one of neuroscience’s biggest data producers, the Allen Institute for Brain Science in Seattle. “The roundworm has exactly 302 neurons, and we still have no frigging idea how this animal works.”
So, next time a neuroscientist tells you that linguistic representations cannot be right because they are incompatible with what we know about brains, worry not. We don't seem to know much about brains, at least where it counts: coupling the structure of brains to what we (or even roundworms) do.

Monday, May 20, 2013

I confess that I did not read the Evans and Levinson article (EL) (here) when it first came out. Indeed, I didn't read it until last week. As you might guess, I was not particularly impressed. However, not necessarily for the reason you might think. What struck me most is the crudity of the arguments aimed at the Generative Program, something that the (reasonable) commentators (e.g. Baker, Freidin, Pinker and Jackendoff, Harbour, Nevins, Pesetsky, Rizzi, Smolensky and Dupoux a.o.) zeroed in on pretty quickly. The crudity is a reflection, I believe, of a deep-seated empiricism, one that is wedded to a rather superficial understanding of what constitutes a possible "universal." Let me elaborate.

EL adumbrates several conceptions of universal, all of which the paper intends to discredit. EL distinguishes substantive universals from structural universals and subdivides the latter into Chomsky vs. Greenberg formal universals. The paper's mode of argument is to provide evidence against a variety of claims to universality by citing data from a wide variety of languages, data that, EL appears to believe, demonstrates the obvious inadequacy of contemporary proposals. I have no expertise in typology, nor am I philologically adept. However, I am pretty sure that most of what EL discusses cannot, as it stands, broach many of the central claims made by Generative Grammarians of the Chomskyan stripe. To make this case, I will have to back up a bit and then talk for far too long. Sorry, but another long post. Forewarned, let's begin by asking a question.

What are Generative Universals (GUs) about? They are intended to be, in the first instance, descriptions of the properties of the Faculty of Language (FL). FL names whatever it is that humans have as biological endowment that allows for the obvious human facility with language. It is reasonable to assume that FL is both species and domain specific. The species specificity arises from the trivial observation that nothing does language like humans do (you know: fish swim, birds fly, humans speak!). The domain specificity is a natural conclusion from the fact that this facility arises in all humans in pretty much the same way independent of other cognitive attributes (i.e. both the musical and the tone deaf, both the hearing impaired and the sharp-eared, both the mathematically talented and the innumerate develop language in essentially the same way). A natural conclusion from this is that humans have some special features that other animals don't as regards language and that human brains have language-specific "circuits" on which this talent rests. Note, this is a weak claim: there is something different about human minds/brains on which linguistic capacity supervenes. This can be true even if lots and lots of our linguistic facility exploits the very same capacities that underlie other forms of cognition.

So there is something special about human minds/brains as regards language, and Universals are intended to be descriptions of the powers that underlie this facility: both the powers of FL that are part of general cognition and those unique to linguistic competence. Generativists have proposed elaborating the fine structure of this truism by investigating the features of various natural languages and, by considering their properties, adumbrating the structure of the proposed powers. How has this been done? Here again are several trivial observations with interesting consequences.

First, individual languages have systematic properties. It is never the case that, within a given language, anything goes. In other words, languages are rule governed. We call the rules that govern the patterns within a language a grammar. For generativists, these grammars, and their properties, are the windows into the structure of FL/UG. The hunch is that by studying the properties of individual grammars, we can learn about the faculty that manufactures grammars. Thus, for a generativist, the grammar is the relevant unit of linguistic analysis. This is important. For grammars are NOT surface patterns. The observables linguists have tended to truck in relate to patterns in the data. But these are but way stations to the data of interest: the grammars that generate these patterns. To talk about FL/UG one needs to study Gs. But Gs are themselves inferred from the linguistic patterns that Gs generate, which are themselves inferred from the natural or solicited bits of linguistic production that linguists bug their friends and collaborators to cough up. So, to investigate FL/UG you need Gs, and Gs should not be confused with their products/outputs, only some of which are actually perceived (or perceivable).

Second, as any child can learn any natural language, we are entitled to infer from the intricacies of any given language the powers of FL/UG capable of dealing with such intricacies. In other words, the fact that a given language does NOT express property P does not entail that FL/UG is not sensitive to P. Why? Because a description of FL/UG is not an account of any given language/G but an account of linguistic capacity in general. This is why one can learn about the FL/UG of an English speaker by investigating the grammar of a Japanese speaker, and the FL/UG of both by investigating the grammar of a Hungarian, or Swahili, or Slave speaker. Variation among different grammars is perfectly compatible with invariance in FL/UG, as was recognized from the earliest days of Generative Grammar. Indeed, this was the initial puzzle: find the invariance behind the superficial difference!

Third, given that some languages display the signature properties of recursive rule systems (systems that can take their outputs as inputs), it must be the case that FL/UG is capable of concocting grammars that have this property. Thus, whatever G an individual actually has, that individual's FL/UG is capable of producing a recursive G. Why? Because that individual could have acquired a recursive G even if that individual's actual G does not display the signature properties of recursion. What are these signature properties? The usual: unboundedly large and deep grammatical structures (i.e. sentences of unbounded size). If a given language appears to have no upper bound on the size of its sentences, then it's a sure bet that the G that generates the structures of that language is recursive in the sense of allowing structures of type A as parts of structures of type A. This, in general, will suffice to generate unboundedly big and deep structures. Examples of this type of recursion include conjunction, conditionals, embedding of clauses as complements of propositional attitude verbs, relative clauses, etc. The reason that linguists have studied these kinds of configurations is precisely because they are products of grammars with this interesting property, a property that seems unique to the products of FL/UG, and hence capable of potentially telling us a lot about the characteristics of FL/UG.

Before proceeding, it is worth noting that the absence of these noted signature properties in a given language L does not imply that a grammar of L is not basically recursive. Sadly, EL seems to leap to this conclusion (443). Imagine that for some reason a given G puts a bound of 2 levels of embedding on any structure in L. Say it does this by placing a filter (perhaps a morphological one) on more complex constructions. Question: what is the correct description of the grammar of L? Well, one answer is that it does not involve recursive rules for, after all, it does not allow unbounded embedding (by supposition). However, another perfectly possible answer is that it allows exactly the same kinds of embedding that English does modulo this language-specific filter. In that case the grammar will look largely like the ones that we find in languages like English that allow unbounded embedding, but with the additional filter. There is no reason, just from observing that unbounded embedding is forbidden, to conclude that this hypothetical language L (aka Kayardild or Piraha) has a grammar different in kind from the grammars we attribute to English, French, Hungarian, Japanese, etc. speakers. In fact, there is reason to think that the Gs that speakers of this hypothetical language have do in fact look just like English etc. The reason is that FL/UG is built to construct these kinds of grammars and so would find it natural to do so here as well. Of course L would seem to have an added (arbitrary) filter on the embedding structures, but otherwise the G would look the same as the G of more familiar languages.
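To make the point concrete, here is a toy sketch (my own illustration, with made-up strings; no claim about any actual language's grammar): a single recursive rule generates clause-within-clause embedding, and a "bounded" language differs only in adding a depth filter on top of that very same rule.

```python
def embed(depth):
    """One recursive rule, S -> 'Mary thinks that' S | 'it rains':
    a structure of type S occurs as a part of a structure of type S."""
    if depth == 0:
        return "it rains"
    return "Mary thinks that " + embed(depth - 1)

def grammatical(sentence, max_depth=None):
    """Same recursive G in both cases; a 'bounded' language merely adds
    a language-specific filter capping embedding depth."""
    depth = sentence.count("Mary thinks that")
    return max_depth is None or depth <= max_depth

# Unbounded G: sentences of any embedding depth are fine.
# Filtered G (max_depth=2): the generator is identical; only the cap differs.
```

On this picture, observing that a language never shows more than two levels of embedding cannot tell you whether its G lacks the recursive rule or merely filters the rule's output.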

An analogy might help. I've rented cars that have governors on the accelerators that cap speed at 65 mph. The same car without the governor can go far above 90 mph. Question: do the two cars have the same engine? You might answer "no" because of the significant difference in upper-limit speeds. Of course, in this case, we know that the answer is "yes": the two cars work in virtually identical ways, have the very same structures but for the governor that prevents the full velocity potential of the rented car from being expressed. So, the conclusion that the two cars have fundamentally different engines would be clearly incorrect. Ok: swap Gs for engines and my point is made. Let me repeat it: the point is not that the Gs/engines might be different in kind; the point is that simple observation of the differences does not license the conclusion that they are (viz. you are not licensed to conclude that they are just finite state devices because they don't display the signature features of unbounded recursion, as EL seems to). And, given what we know about Gs and engines, the burden of proof is on those who conclude from such surface differences to deep structural differences. The argument to the contrary can be made, but simple observations about surface properties just don't cut it.

Fourth, there are at least two ways to sneak up on the properties of UG: (i) collect a bunch of Gs and see what they have in common (what features do all the Gs display) and (ii) study one or two Gs in great detail and see if their properties could be acquired from input data. If any could not be, then these are excellent candidates for basic features of FL/UG. The latter, of course, is the province of the POS argument. Now, note that as a matter of logic the fact that some G fails to have some property P can in principle falsify a claim like (i) but not one like (ii). Why? Because (i) is the claim that every G has P, while (ii) is the claim that if G has P then P is a consequence of G being the product of FL/UG. Absence of P is a problem for claims like (i) but, as a matter of logic, not for claims like (ii) (recall, "if P then Q" is true if P is false). Unfortunately, EL seems drawn to the conclusion that P→Q is falsified if not-P is true. This is an inference that other papers (e.g. Everett's Piraha work) are also attracted to. However, it is a non sequitur.
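The logical point can be checked mechanically. Here is a minimal sketch (mine, just to display the truth table): a type (ii) universal has the form P→Q, and every row of the table where P is false leaves the conditional true, so the absence of P in some G cannot falsify it.

```python
from itertools import product

def implies(p, q):
    """Material implication: P -> Q is equivalent to (not P) or Q."""
    return (not p) or q

# A type (i) universal ("every G has P") is refuted by a single G lacking P.
# A type (ii) universal ("if a G has P, that reflects FL/UG") is not:
# whenever P is false, P -> Q comes out true regardless of Q.
for p, q in product([True, False], repeat=2):
    if not p:
        assert implies(p, q)  # not-P never falsifies P -> Q
```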

EL recognizes that arguing from the absence of some property P to the absence of P-ish features in UG does not hold. But the paper clearly wants to reach this conclusion nonetheless. Rather than denying the logic, EL asserts that "the argument from capacity is weak" (EL's emphasis). Why? Because EL really wants all universals to be of the (i) variety, at least if they are "core" features of FL/UG. As these type (i) universals must show up in every G if they are indeed universal, failure to appear in one grammar is sufficient to call their universality into question. EL is clearly miffed that Generativists in general and Chomsky in particular would hold a nuanced position like (ii). EL seems to think that this is cheating in some way. Why might they hold this? Here's what I think.

As I discussed extensively in another place (here), everyone who studies human linguistic facility appreciates that competent speakers of a language know more than they have been exposed to. Speakers are exposed to bits of language and from this acquire rules that generalize to novel exemplars of that language. No sane observer can dispute this. What's up for grabs is the nature of the process of generalization. What separates empiricist from rationalist conceptions of FL/UG is the nature of these inductive processes. Empiricists analyze the relevant induction as a species of pattern recognition. There are patterns in the data and these are generalized to all novel cases. Rationalists appreciate that this is an option, but insist that there are other kinds of generalizations, those based on the architectural properties (Smolensky and Dupoux's term) of the generative procedures that FL/UG allows. These procedures need not "resemble" the outputs they generate in any obvious way, and so conceiving of this as a species of pattern recognition is not useful (again, see here for more discussion). Type (ii) universals fit snugly into this second type, and so empiricists won't like them. My own hunch is that an empiricist affinity for generalizations based on patterns in the data lies behind EL's dissatisfaction with "capacity" arguments; they are not the sorts of properties that inspection of cases will make manifest. In other words, the dissatisfaction is generated by Empiricist sympathies and/or convictions which, from where I sit, have no defensible basis. As such, they can be and should be discounted. And in a rational world they would be. Alas…

Before ending, let me note that I have been far too generous to the EL paper in one respect. I said at the outset that its arguments are crude. How so? Well, I have framed the paper's main point as a question about the nature of Gs. However, most of the discussion is framed not in terms of the properties of the Gs they survey but in terms of the surface forms that Gs might generate. Their discussion of constituency provides a nice example (441). They note that some languages display free word order and conclude from this that they lack constituents. However, surface word order facts cannot possibly provide evidence for this kind of conclusion; they can only tell us about surface forms. It is consistent with this that elements that are no longer constituents on the surface were constituents earlier on and were then separated, or will become constituents later on, say in the mapping to logical form. Indeed, in one sense of the term constituent, EL insists that discontinuous expressions are such, for they form units of interpretation and agreement. The mere fact that elements are discontinuous on the surface tells us nothing about whether they form constituents at other levels. I would not mention this were it not the classical position within Generative Grammar for the last 60 years. Surface syntax is not the arbiter of constituency, at least if one has a theory of levels, as virtually every theory that sees grammars as rules that relate meaning with sounds assumes (EL assumes this too). There is nary a grammatical structure in EL, and this is what I meant by my being overgenerous. The discussion above is couched in terms of Gs and their features. In contrast, most of the examples in EL are not about Gs at all, but about word strings. However, as noted at the outset, the data relevant to FL/UG are Gs, and the absence of G-ish examples in EL makes most of EL's cited data irrelevant to Generative conceptions of FL/UG.

Again, I suspect that the swapping of string data for G data simply betrays a deep empiricism, one that sees grammars as regularities over strings (string patterns) and FL/UG as higher-order regularities over Gs. Patterns within patterns within patterns. Generativists have long given up on this myopic view of what can be in FL/UG. EL does not take the Generative Program on its own terms and show that it fails. It outlines a program that Generativists don't adopt and then shows that it fails by standards they have always rejected, using data that is nugatory.

I end here: there are many other criticisms worth making about the details, and many of the commentators on the EL piece are better placed than I am to make them. However, to my mind, the real difficulty with EL is not at the level of detail. EL's main point as regards FL/UG is not wrong; it is simply beside the point. A lot of sound and fury signifying nothing.

Wednesday, May 15, 2013

I recently read an intellectual biography of Feynman by Lawrence Krauss (here) and was struck by the following contrast between physics and linguistics. In linguistics, or at least in syntax, if a paper covers the very same ground as some previously published piece of work, it is considered a failure. More exactly, unless a paper can derive something novel (admittedly, the novelty can be piddling), preferably something that earlier alternatives cannot (or do not[1]) get, the paper will have a hard time getting published. Physicists, in contrast, greatly value research that derives/explains already established results/facts in novel ways. Indeed, one of Feynman's great contributions was to recast classical quantum mechanics (in terms of the Schrödinger equation) in terms of Lagrangians that calculate probability amplitudes. At any rate, this was considered an important and worthwhile project and it led, over time, to whole new ways of thinking about quantum effects (or so Krauss argues). If Feynman had been a syntactician he would have been told that simply re-deriving the Schrödinger equation is not in and of itself enough: you also have to show that the novel recasting can do things that the classical equation cannot. I can hear it now: "As you simply rederive the quantum effects covered by the Schrödinger equation, no PhD for you, Mr. Feynman!"

Now, I have always found this attitude within linguistics/syntax (ling-syn) rather puzzling. Why is deriving a settled effect in a different way considered so uninteresting? At least in ling-syn? Consider what happens among our "aspirational peers" in, say, math. There are about a hundred proofs of the Pythagorean theorem (see here) and, I would bet, if someone came up with another one tomorrow it could easily get published. Note, btw, we already know that the square of the hypotenuse of a right-angled triangle is equal to the sum of the squares of the other two sides (in fact, we've known this for a very long time), and nonetheless alternative proofs of this very well known and solid result are still noteworthy, at least to mathematicians. Why? Because what we want from a proof/explanation involves more than the bottom (factual) line. Good explanations/proofs show how fundamental concepts relate to one another. They expose the fine structure and the fault lines of the basic ideas/theories that we are exploring. Different routes to the same end not only strengthen our faith in the correctness of the derived fact(oid); they also, maybe more importantly, demonstrate the inner workings of our explanatory apparatus.

Interestingly, it is often the proof form rather than the truth of the theorem that really matters. I recall dimly that when the four color problem was finally given a brute-force computer solution by cases, NPR interviewed a leading topologist who commented that the nature of the proof indicated that the problem was not as interesting as had been supposed! So, that one can get to Rome is interesting. However, no less interesting is the fact that one can get there in multiple ways. So, even if the only thing a novel explanation explains is something that has been well explained by another extant story, the very fact that one can get there from varying starting points is interesting and important. It is also fun. As Feynman put it: "There is a pleasure in recognizing old things from a new viewpoint." But, for some reason, my impression is that the ling-syn community finds this unconvincing.

The typical ling-syn paper is agonistic. Two (or more) theories are trotted out to combat one another. The accounts are rhetorically made to face off and data is thrown at them until only one competitor is left standing, able to "cover the facts." In and of itself, trial by combat need not be a bad way to conduct business. Alternatives often mutually illuminate by being contrasted, and comparison can be used to probe the inner workings so that the bells and whistles that make each run can be better brought into focus.

However, there is also a downside to this way of proceeding. Ideas have an integrity of their own which supports different ways of packaging thoughts. These packages can have differing intellectual content and disparate psychological powers. Thus, two accounts that get all the same effects might nonetheless spur the imagination differently and, for example, more or less easily suggest different kinds of novel extensions. Having many ways of conceptualizing a problem, especially if they are built from (apparently) different building blocks (e.g. operations, first principles, etc.), may make them all worth preserving and developing even if one seems (often temporarily) empirically superior. The ling-syn community suffers from premature rejection: the compulsion to quickly declare a single winner. This has the side effect of entrenching previous winners and requiring novel challengers to best them in order to get a hearing.

Why is the ling-syn community so disposed? I'm not sure, but here is a speculation. Contrary to received opinion, ling-syns don't really value theory. In fact, until recently there hasn't been much theory to speak of. Part of the problem is that ling-syns confuse 'formal' with 'theoretical.' For example, there is little theoretical difference between many forms of GPSG, HPSG, LFG, RG, and GB, though you'd never know this from the endless discussions over "framework" choice. The differences one finds here are largely notational, IMO, so there is no room for serious theoretical disagreement.

When this problem is finessed, a second arises. There is
still in generative linguistics a heavy premium on correct description. Theory
is tolerated when it is useful for describing the hugely variable flora and
fauna that we find in language. In other words, theory in the service of
philology is generally acceptable.
Theory in the service of discovering new facts is also fine. But too
much of an obsession with the workings of the basic ideas (what my good and
great friend Elan Dresher calls “polishing the vessels”) is quite suspect, I
believe. As ‘getting there in different ways’ is mainly of value in understanding
how our theoretical concepts fit together (i.e. is mainly of
theoretical/conceptual value), this kind of work is devalued unless it can also be shown to have linguistic consequences.

Until recently, the baleful effects of this attitude have
been meager. Why? Because ling-syn has actually been theory poor. Interesting theory
generally arises when apparently diverse domains with their own apparently
diverse “laws” are unified (e.g. Newtonian theory unified terrestrial and
celestial mechanics, Maxwell’s theory unified electricity and magnetism). Until
recently there were not good candidate domains for unification (Islands being
the exception). As I’ve argued in other places, one feature of the minimalist
program is the ambition to unify the apparently disparate domains/modules of GB,
and for this we will need serious theory. And to do this we will need to begin
to more highly value attempts to put ideas together in novel ways, even if for
quite a long while they do no better (and maybe a tad worse) than our favorite
standard accounts.

[1]
The two are very different. Data is problematic when inconsistent with the
leading ideas of an account. These kinds of counter-examples are actually
pretty hard to concoct.

Thursday, May 9, 2013

NPR (here) reports on recent research purporting to show that there is no faculty of language. What's the new evidence? Well, our new fancy shmancy imaging techniques allow us to look directly into brains and so we went to look for FL and, guess what? We failed to find it! Ergo, no "special module for language." In what did the failure consist? Here's the choice passage:

"But in the 1990s, scientists began testing the language-module theory using "functional" MRI technology that let them watch the brain respond to words. And what they saw didn't look like a module, says Benjamin Bergen, a researcher at the University of California, San Diego, and author of the book Louder Than Words.

"They found something totally surprising," Bergen says. "It's not just certain specific little regions in the brain, regions dedicated to language, that were lighting up. It was kind of a whole-brain type of process." "

So, the whole brain lights up when you hear a sentence and so there is no language module. Well, when you drive to Montreal from DC the whole car moves, so it cannot have a fuel system, right? What's to be done? Thankfully, Greg Hickok (quickly) walks us through the morass (here). And yes, the confusion is breathtaking. Maybe NPR might wish to consult a few others when reporting this sort of stuff as exciting neuroscience. Yes, colored brains sell, even on radio. But isn't the aim of NPR to inform rather than simply titillate?

One of the nice things about conferences is that you get to
bump into people you haven’t seen for a while. This past weekend, we celebrated
our annual UMD Mayfest (it was on prediction in ling sensitive psycho tasks)
and, true to form, one of the highlights of the get together was that I was
able to talk to Masaya Yoshida (a syntax and psycho dual threat at
Northwestern) about islands, subjacency, phases and the argument-adjunct
movement asymmetry. At any rate, as we
talked, we started to compare Phase Theory with earlier approaches to strict
cyclicity (SC) and it again struck me how unsure I am that the newfangled
technology has added to our stock of knowledge.
And, rather than spending hours upon hours trying to figure this out
solo, I thought that I would exploit the power of crowds and ask what the average
syntactician in the street thinks phases have taught us above and beyond
standard GB wisdom. In other words,
let’s consider this a WWGBS (what would GB say) moment (here) and ask what
phase wise thinking has added to the discussion. To set the stage, let me outline how I understand
the central features of phase theory and also put some jaundiced cards on the
table, repeating comments already made by others. Here goes.

Phases are intended to model the fact that grammars are SC.
The most impressive empirical reflex of this is successive cyclic
A’-movement. The most interesting
theoretical consequence is that SC grammatical operations bound the domain of
computation thereby reducing computational complexity. Within GB these two factors are the province
of bounding theory, aka Subjacency Theory (ST). The classical ST comes in two
parts: (i) a principle that restricts grammatical commerce (at least movement)
to adjacent domains (viz. there can be at most one bounding node (BN) between
the launch site and target of movement) and (ii) a metric for “measuring”
domain size (viz. the unit of measure is the BN and these are DP, CP, (vP), and
maybe TP and PP).[1]
Fix the bounding nodes within a given G and one gets locality domains that
undergird SC. Empirically A’-movement applies strictly cyclically because it
must given the combination of assumptions (i) and (ii) above.
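Assumptions (i) and (ii) are concrete enough to put in toy computational form. The sketch below is a minimal formalization of my own devising (the list-of-labels representation of a movement path and all the function names are illustrative inventions, not part of any official statement of ST):

```python
# Toy sketch of classical Subjacency: a movement step is licit iff at most
# one bounding node (BN) intervenes between launch site and target.
# Labels and the path representation are illustrative assumptions.

BOUNDING_NODES = {"DP", "CP"}  # per assumption (ii); Gs may add TP, PP, (vP)

def subjacent(crossed_nodes):
    """One movement step: licit iff it crosses at most one bounding node."""
    return sum(1 for n in crossed_nodes if n in BOUNDING_NODES) <= 1

def licit_derivation(hops):
    """A derivation is a list of hops; each hop lists the nodes it crosses."""
    return all(subjacent(hop) for hop in hops)

# One long hop crossing two CPs violates subjacency...
print(licit_derivation([["CP", "TP", "CP"]]))  # False
# ...but two short hops, each crossing one CP, are fine: movement is
# forced to be successive cyclic.
print(licit_derivation([["CP"], ["CP"]]))      # True
```

The point of the sketch is just the logic of the text above: with the bounding-node count capped at one per step, a long-distance dependency cannot be formed in a single swoop and must decompose into successive-cyclic hops.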

Now, given this and a few other assumptions, it is also possible
to model island effects in a unified way.
The extra assumptions are: (iii) some BNs have “escape hatches” through
which a moving element can move from one cyclic domain to another (viz. CP but
crucially not DP); (iv) escape hatches can accommodate varying numbers of
commuters (i.e. the number of exits can vary; English is thought to have just one,
while multiple WH fronting languages have many). If we add a further assumption
- (v) DP and CP (and vP) are universally BNs but Gs can also select TP and PP
as BNs – the theory allows for some typological variation.[2]
(i)-(v) constitute the classical Subjacency theory. Btw, the reconstruction
above is historically misleading in one important way. SC was seen to be a consequence of the way in which island effects were unified. It’s
not that SC was modeled first and then assumptions added to get islands; rather
the reverse: the primary aim was to unify island effects, and a singular
consequence of this effort was SC. Indeed, it can be argued (in fact I would so
argue) that the most interesting empirical support for the classical theory was
the discovery of SC movement.

One of the hot debates when I was a grad student was whether
long distance movement dependencies were actually SC. Kayne and Pollock and
Torrego provided (at the time surprising) evidence that they were, based on SC
inversion operations in French and Spanish.
Chung supplied Comp agreement evidence from Chamorro to the same
effect. This, added to the unification
of islands, made ST the jewel in the GB crown, both theoretically and
empirically. Given my general rule of thumb that GB is largely empirically accurate,
I take it as relatively uncontroversial that any empirically adequate theory of
FL must explain why Gs are SC.

As noted in a previous post (here), ST developed and
expanded. But let’s leave history behind
and jump to the present. Phase Theory (PT) is the latest model for SC. How does
it compare with ST? From where I sit, PT
looks almost isomorphic to it, or at least a version that extends to cover
island effects does. A PT of this ilk
has CP, vP and DP as phases.[3]
It incorporates the Phase Impenetrability Condition (PIC) that requires that
interacting expressions be in (at most) adjacent phases.[4]
Distance is measured from one phase edge to the next (i.e. complements to phase
heads are grammatically opaque, edges are not). This differs from ST in that
the cyclic boundary is the phase/BN head rather than the MaxP of the Phase/BN
head, but this is a small difference technically. PT also assumes “escape
hatches” in the sense that movement to a phase edge moves an expression from
inside one phase into the next higher phase domain and, as in ST, different
phases have different available edges suitable for “escape.” If we assume that Cs have different numbers
of available phase edges, and that D has no such available edges at
all, then we get a theory effectively identical to ST. In effect, we traded phase edges for escape
hatches and the PIC for (i).[5]
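The trade just described (phase edges for escape hatches) can also be put in toy form. The slot counts below are illustrative assumptions of mine, not official doctrine: an English-like C with a single edge position, and D with none, which is what makes DP an island on this extension:

```python
# Toy sketch: phase edges as "escape hatches". An element escapes a phase
# only via a free edge position on the phase head; slot counts are
# illustrative (multiple-WH-fronting Gs would assign C a larger number).

EDGE_SLOTS = {"C": 1, "v": 1, "D": 0}

def can_escape(phase_head, edges_used=0):
    """Movement into the next phase requires a free edge on this head."""
    return edges_used < EDGE_SLOTS.get(phase_head, 0)

print(can_escape("C"))                # True: CP has an escape hatch
print(can_escape("C", edges_used=1))  # False: the single hatch is occupied
print(can_escape("D"))                # False: DP is an island
```

Substituting this edge check for the escape-hatch clause (iii) yields the effective identity with ST claimed above: the mechanics differ in labels, not in predictions.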

There are a few novelties in PT, but so far as I can tell
they are innovations compatible with ST. The two most distinctive innovations
regard the nature of derivations and multiple spell out (MSO). Let me briefly
discuss each, in reverse order.

MSO is a revival of ideas that go back to Ross, but with a
twist. Uriagereka was the first to
suggest that derivations progressively render parts of the derivation opaque by
spelling them out (viz. spell out (SO) entails grammatical inaccessibility, at
least to movement operations). This is not new.
ST had the same effect, as SC progressively makes earlier parts of the
derivation inaccessible to later parts.
PT, however, makes earlier parts of the derivation inaccessible by
disappearing the relevant structure.
It’s gone, sent to the interfaces and hence no longer part of the computation. This can be effected in various ways, but the
standard interpretations of MSO (due to Chomsky and quite a bit different from
Uriagereka’s) have coupled SO with linearization conditions in some way
(Uriagereka does this, as do Fox and Pesetsky, in a different way). This has the
empirical benefit of allowing deletion to obviate islands. How? Deletion
removes the burden of PF linearization and if what makes an island an island
are the burdens of linearization (Uriagereka) or frozen linearizations (Fox and
Pesetsky) then as deletion obviates the necessity of linearization, island
effects should disappear, as they appear to do (Ross was the first to note this
(surprise, surprise) and Merchant and Lasnik have elaborated his basic insight
for the last decade!). At any rate, interesting though this is (and it is very
interesting IMO), it is not incompatible with ST. Why? Because ST never said
what made an island an island, or more accurately, what made earlier cyclic
material unavailable to later parts of the computation (i.e. it had no real theory of inaccessibility, just a
picture), and it is compatible with ST that it is PF concerns that render
earlier structure opaque. So, though PT incorporates MSO, it is something that
could have been added to ST and so is not an intrinsic feature of PT accounts. In other words, MSO does not
follow from other parts of PT any more than it does from ST. It is an add-on; a
very interesting one, but an add-on nonetheless.[6]

Note, btw, that MSO accounts, just like ST, require a
specification of when SO occurs. It
occurs cyclically (i.e. either at the end of a relevant phase, or when the next
phase head is accessed) and this is how PT models SC.
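That timing assumption can be put in toy form too: completing a phase ships the phase head's complement to the interfaces, removing it from the active workspace. The flat-list representation of the workspace is purely illustrative, chosen only to show inaccessibility as literal removal:

```python
# Toy sketch of MSO: spell out (SO) at phase completion removes the phase
# head's complement from the computation, so later operations cannot
# touch it. Representation is an illustrative flat list of positions.

def spell_out(workspace, phase_complement):
    """Return the workspace minus the spelled-out (shipped) material."""
    return [pos for pos in workspace if pos not in phase_complement]

workspace = ["what", "C", "T", "v", "V"]
# Completing vP spells out VP (the complement of v). "What" remains
# accessible for later A'-movement only because it is not inside the
# shipped complement (i.e. it has reached the phase edge):
workspace = spell_out(workspace, ["V"])
print(workspace)  # ['what', 'C', 'T', 'v']
```

Note that this implements removal, the distinctive PT move; the ST analogue would merely mark the same material inaccessible while leaving it in the representation.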

The second innovation is that phases are taken to be the units of computation. In Derivation by Phase, for example,
operations are complex and non-Markovian within the phase. This is what I take Chomsky to mean when he
says that operations in a phase apply “all at once.” Many apply simultaneously
(hence not one “line” at a time) and they have no order of application. I
confess to not fully understanding what this means. It appears to require a “generate
and filter” view of derivations (e.g. intervention effects are filters rather
than conditions on rule application). It
is also the case that SO is a complex checking operation where features are
inspected and vetted before being sent for interpretation. At any rate, the phase is a very busy place:
multiple operations apply all at once; expressions are E- and I-merged, features
checked and shipped.

This is a novel conception of the derivation, but again, is
not inherent in the punctate nature of PT.[7]
Thus, PT has various independent parts, one of which is isomorphic to
traditional ST and other parts that are logically independent of one another
and the ST similar part. That which explains SC is the same as what we find in
ST and is independent of the other moving parts. Moreover, the parts of PT
isomorphic to ST seem no better motivated (and no worse) than the
analogous features in ST: e.g. the question of why the BNs are just these has no worse an answer
within ST than the question of why the phase heads are just those has within PT.

That’s how I see PT.
I have probably skipped some key features. But here are some crowd
directed questions: What are the parade cases empirically grounding PT? In
other words, what’s the PT analogue of affix hopping? What beautiful
results/insights would we lose if we just gave PT up? Without ST we lose an
account of island effects and SC. Without PT we lose…? Moreover, are these
advantages intrinsic to minimalism, or could they have already been achieved in
more or less the same form within GB? In other words, is PT an
empirical/theoretical advance or just a rebranding of earlier GB technology/concepts
(not that there is anything intrinsically wrong with this, btw)? So, fellow minimalists, enlighten me. Show me
the inner logic, the “virtual conceptual necessity” of the PT system as well as
its empirical virtues. Show me in what ways we have advanced beyond our earlier
GB bumblings and stumblings. Inquiring minimalist minds (or at least one) want
to know.

[1]
This “history” compacts about a decade of research and is somewhat
anachronistic. The actual history is
quite a bit more complicated (thanks Howard).

[2]
Actually, if one adds vP as a BN then Rizzi-like differences between Italian
and English cannot be accommodated. Why? Because once one moves into an escape
hatch, movement is thereafter escape hatch to escape hatch, as Rizzi noted for
Italian. The option of moving via CP is only available for the first move. Thereafter, if CP is a BN,
movement must be CP to CP. If vP is added as a BN then it is the first
available BN and, whether one moves through it or not, all CP positions must be
occupied. If this is too much “inside baseball” for you, don’t sweat it. Just
the nostalgic reminiscences of a senior citizen.

[3]
vP is an addition from Barriers
versions of ST, though how it is incorporated into PT is a bit different from
how vP acted in ST accounts.

[4]
There are two versions of the PIC, one that restricts grammatical commerce to
expressions in the same phase and a looser one that allows expressions in
adjacent phases to interact. The latter is what is currently assumed (for
pretty meager empirical reasons IMO – Nominative object agreement in quirky
subject transitive sentences in Icelandic, I think).

[5]
As is well known, Chomsky has been reluctant to extend phase status to D.
However, if this is not done then PT cannot account for island effects at all
and this removes one of the more interesting effects of cyclicity. There have
been some allusions to the possibility that islands are not cyclicity effects,
indeed not even grammatical effects.
However, I personally find the latter suggestion most implausible (see
the forthcoming collection on this edited by Jon Sprouse and yours truly: out
sometime in the fall). As for the former, well, if islands are grammatical
effects (and like I said, the evidence seems to me overwhelming) then if PT
does not extend to cover these then it is less empirically viable than ST. This does not mean that it is wrong to
divorce the two, but it does burden the revisionist with a pretty big
theoretical note payable.

[6]
MSO is effectively a theory of the PIC. Curiously, from what I gather, current
versions of PT have begun mitigating the view that SO removes structure by
sending it to the interfaces. The problem is that such early shipping makes linearization
problematic. It also necessitates
processes by which spelled out material is “reassembled” so that the interfaces
can work their interpretive magic (think of binding, which is computed at the interfaces over more than a single phase,
or clausal intonation, which is also defined over the entire sentence, not just
a phase).

[7]
Nor is the assumption that lexical access is SC (i.e. the numeration is
accessed in phase-sized chunks). This is roughly motivated on (IMO weak)
conceptual reasons concerning SC arrays reducing computational complexity and
empirical facts about Merge over Move (btw: does anyone except me still think
that Merge over Move regulates derivations?).