Comments

Sunday, May 31, 2015

I again mis-spelled someone's name. I "called" Winnie, "Wini." It is now corrected and I am sorry. However, the misspelling allows me to once again thank Winnie for all his great work in getting the Athens gig going. Thx and sorry.

******

I am currently sitting on the 6th floor of the
Athen’s Way Hotel, eating some fruit and yogurt and sipping a cup of tea. It’s
a beautiful day. The Road Ahead Conference (see here) ended yesterday and I
thought that I would jot down some quick comments. Before doing so, let me take this opportunity
to thank the organizers (Winnie Lechner, Marcel den Dikken, Terje Lohndal,
Artemis Alexiadou and Peter Svenonius) for a wonderful event. It’s been a blast
and my most secure conclusion from the past three days is that if you ever get
invited anywhere to do anything by any of these people GO! Thx, and I am sure
that here I am not speaking just for myself but for all participants. That
pleasant task out of the way, here are a few impressions. In this post I’ll
talk about very general matters. In some follow up posts I’ll remark on some of
the socio-political issues raised and what we might do to address them. And in
some yet later posts I’ll discuss some of the stimulating questions raised and
ideas mooted. So let’s start.

My overall impression, one that runs quite counter to some
of the pessimism I often hear from colleagues about the state of contemporary
syntax, is that intellectually speaking,
we are in a golden age of syntax (though politically and sociologically, not so
much (I return to this in later posts)). What do I mean? Well, the organizers invited a very talented group of
people (present company excluded, of course) doing a pretty wide cross section
of syntactic work. What is very clear is that there is a huge amount of
excellent work being done on a wide variety of languages on a wide variety of
topics. Typology and morpho-syntax are particularly hot areas of current
interest, but syntacticians (in the wide sense meaning those with a good
knowledge of syntactic theory and methods) are also heavily involved in
“conjunctive” areas such as syntax + language acquisition, syntax + language
impairment/disorders, syntax + processing, a.o. As a result, we now know more
about the details and overall architectures of more grammars and have better
models of how this grammatical knowledge arises and is put to use in more areas
than ever before. Don’t get me wrong: it
is clear to all that there is a tremendous amount that we still do not know,
even about fundamental issues, but to someone like me, who has been doing this
stuff (or at least listening to people who do this stuff) for the last 40+
years, it is clear how much we have learned, and it is very very impressive.

Furthermore, it is also clear that we live in a time where all kinds of syntactic work can be
fruitfully pursued. What do I mean by “all”? Well, there are roughly three kinds of syntactic investigations: (i)
there is work on the structure of particular Gs, (ii) there is work on the
structure of UGs based on work on particular Gs and (iii) there is work on the
structure of FL/UG based on the particulars of UG. (i) aims at careful
description of the Gs of a given language (e.g. how does agreement work in L,
how are RCs formed? How does binding work?). (ii) aims to find the features
that delimit the range of possible Gs in part by distilling out the common
features of particular Gs. (iii) aims to simplify UG in part by unifying the
principles discovered by (ii) and in part by relating/reducing/unifying these
features with more general features of cognition and computation. All
three kinds of work are important and valuable. And though I want to plead
specially for (iii) towards the end, I want to be very very very clear that
this is NOT because I disvalue (i) or
(ii) or that I think that (iii) like work is inherently better than the others.
I don’t and never did. My main take-away from Athens is that there has never
been a better time to do all three kinds of work. I will return to a discussion
of (iii) for I believe the possibility of doing it fruitfully is somewhat of a
novelty and the field has not entirely understood how to accommodate it. But I
return to this. First the other two.

Let’s start with (i). If your heart gravitates towards
descriptive endeavors, there are many many entirely unexplored languages out
there waiting for your skills, and there are now well developed methods and
paradigms ready for you to apply (and modify). Indeed, one of the obvious
advances GG has made has been to provide a remarkably subtle and powerful set
of descriptive techniques (based of course on plenty of earlier theory) for
uncovering linguistically (i.e. grammatically) significant phenomena.
Typologists who fail to avail themselves of this technology are simply going to
mis-describe the individual languages
they investigate,[1]
let alone have nothing to say about the more general issues relating to the
variety of structures that natural languages display.[2]

Similarly if your interests are typological you are also in
luck. Though there are many more language families to investigate, many have
been looked at in great detail and Generative linguists have made a very good
start at limning the structural generalizations that cut across them. We now
have mapped out more properties of the grammar of case and agreement (at both
the specific and general levels) of more languages than ever before. We have
even begun to articulate solid mid level generalizations that cut across wide
swaths of these languages (and even some language families) enabling a more
sophisticated exploration of the parametric options that UG makes available. To
me, someone interested in these concerns but not an active participant in the
process, the theoretical speculations seemed extremely exciting (especially
those linking parameter theory to language change and language acquisition).
And though the trading relation between micro vs macro variation has not yet
been entirely sorted out (and this impression may be an optimistic appraisal of
the discussions I (over)heard) it is pretty clear that there is a thoughtful
research agenda about how to proceed to attack these big theoretical issues.
So, Generative Syntax (including here morpho-syntax) in this domain is doing
very very well.

Concerning (ii): A very nice outcome of the Athens
get-together was the wide consensus regarding what Amy Rose Deal dubbed
“Mid-Level Generalizations” (MLG). I have frequently referred to these as the
findings of GB, but I think that MLGs is a better moniker for it recognizes
that these generalizations are not the exclusive property of any specific
research tradition. So though I have tried to indicate that I consider the
differences between GB and LFG and HPSG and TAG and… to have been more notational
than notional, I adopt ARD’s suggestion that it is better to adopt a more
neutral naming convention so as to be more inclusive (does this count as PC?).
So from now on, MLGs it is! At any rate,
there is a pretty broad consensus among syntacticians about what these MLGs are
(see here for a partial enumeration, in, ahem, GB terms). And this consensus
(based on the discovery of these MLGs) has made possible a third kind of
syntactic investigation, what I would dub pure
theoretical syntax (PTS).[3]
Now, I don’t want to raise hackles with this term, I just need a way of
distinguishing this kind of work from other things we call “theory.” I do not
mean to praise this kind of “theory” and demean the other. I really want to
just make room for the enterprise mentioned in (iii) by identifying it.

So what is PTS? It is directed at unifying the MLGs. The
great example of this in the GB tradition is Chomsky’s unification of Ross’s
island effects in “On Wh Movement” (OWM).[4]
Chomsky’s project was made feasible by Ross’s discoveries of the MLGs we (after
him) call “islands.” Chomsky showed how to treat these apparently disparate
configurations as instances of the same underlying system and in the process
removed the notion of construction from the fundamental inventory of UG. This
theoretical achievement in unification led to the discovery of successive
cyclicity effects, to the discovery that adjuncts were different from arguments
(both in how they move and how porous they are) and to the discovery of novel
locality effects (the ECP and CED in particular).

Someone at the conference (I cannot recall who) mentioned
that Chomsky’s work here was apparently un-motivated by empirical concerns.
There is a sense in which I believe this to be correct, and one in which it is
not. It is incorrect in that Ross’s islands, which were the target of Chomsky’s
efforts at unification, are MLGs and so based on a rich set of language
specific data (e.g. *Who did you meet someone who likes) which hold in a
variety of languages. However, in another sense it is correct. In particular,
Chomsky did not aim to address novel particular data beyond Ross’s islands. In
other words, the achievement in OWM was the unification itself. Chomsky did not
further argue that this unification got us novel data in, say, Hungarian.
Of course others went on to show that, whatever Chomsky wanted to do, the
unification had empirical legs. Indeed, the whole successive cyclicity industry
starting with Kayne and Pollock on stylistic inversion and proceeding through
McCloskey’s work on Irish, Chung’s on Chamorro, Torrego’s on Spanish and many
many others was based on this unification. However, Chomsky’s work was
theoretical in that its main concern was to provide a theory of Ross’s MLGs and
little (sic!) more.

Indeed, one can go a little further here. Chomsky’s
unification had an odd consequence in the context of Ross’s work. It proposed a
“new” island that Ross himself extensively argued against the existence of. I
am talking, of course, about Wh-islands, which Ross found to be highly
acceptable, especially if the clausal complement was non-finite. Chomsky’s
theory had to include these in his inventory despite Ross’s quite reasonable empirical
demurrals (voiced regularly ever since) because they followed from the unification.[5]
So, it is arguable (or was at the time), that Chomsky’s unification was less empirically adequate than Ross’s
description of islands. Nor was this the only empirical stumble that the
unification arguably suffered. It also predicted (at least in its basic form)
that who did you read a book about is
ungrammatical, which flies in the face of its evident acceptability. We now know that many of these stumbles led
to interesting further research (Rizzi on parameters for example), but it is
worth noting that the paper, now a deserved classic, did not emerge without
evident empirical problems.

Why is it worth noting this? Because it demonstrates a
typical feature of theoretical work; it makes novel connections, leading to new
kinds of research/data but often fails to initially (or even ever) cover all of the previously “relevant” data.
In other words, unification can incur an apparent empirical cost. Its virtue
lies in the conceptual ties it makes, putting things together that look very
different and this is a virtue even if some data might be initially (or even
permanently) lost. And this historical lesson has a moral: theoretical work,
work that in hindsight we consider invaluable, can start out its life
empirically hobbled. And we need to understand this if we are to allow it to
thrive. We need slightly different measures for evaluating such work. In
particular, I suggest that we look to this work more for its ‘ahaa’ effects
than for whether it covers the data points that we had presupposed, until its
emergence, to be relevant.

Let me make this point a slightly different way. Work like
(iii) aims to understand how something is possible not how it is actually. Now,
of course to be actual it is useful to be possible. However, many possible
things are not actual. That said, it is often the case that we don’t really see
how to unify things that look very different, and this is especially true when
MLGs are being considered (e.g. case theory really doesn’t look much like
binding theory and control has very different properties from raising).
And here is where theory comes in: it aims to develop ways of seeing how two
things that look very different might
be the same; how might they possibly
be connected. I want to further observe that this is not always easy to
do. However, by the nature of the
enterprise, the possible often only coarsely fits the actual, at least until
empirical tailoring has had a chance to apply. This is why some empirical
indulgence is condign.

So you may be wondering why I got off on this jag in a post
about the wonders of the Athens conference. The reason is that one of the
things that make this period of linguistics such a golden age is that the large
budget of MLGs the field seems to recognize makes it ripe for the theoretically
ambitious to play their unificational games. And for this work to survive (or even see the light of day) will
require some indulgence on the part of my more empirically conscious
colleagues. In particular, I believe that theoretical work will need to be evaluated
differently (at least in the short and medium run) from the two other kinds of
work that I alluded to above, where empirical coverage is reasonably seen as
the primary evaluative hurdle.

More specifically, we all want our language particular
descriptions and MLGs to be empirically tight (though even in fashioning MLGs some
indulgence (i.e. tolerance for “exceptions”) is often advisable), elegance be
damned. But we want our theories to be simple (i.e. elegant and conceptually
integrated and natural), and it is important to recognize that this is a virtue
even in the face of empirical leakage. Given that we have entered a period
where good theory is possible and desirable, we need to be mindful of this or
risk crushing theoretical initiative altogether.[6]

As you may have guessed, part of why I wrote the last
section was because the one thing that I felt was missing at Athens was the
realization that this kind of indulgence is now more urgent than ever. Yours
truly tried to argue the virtues of making Plato’s Problem (PP) and Darwin’s
Problem (DP) central to contemporary research. The reaction, as I saw it (and I
might be wrong here), was that such thinking did not really get one very far,
that it is possible to do all theoretical work without worrying about these
inconclusive problems. To the degree that PP was
acknowledged to be important, it struck me that the consensus was that we
should off load acquisition concerns to the professionals in psych (though, of
course, they should consult us closely in the process). The general tone seemed
to be that eyeballing a proposal for PP compatibility was just self-indulgence,
if not worse. The case of DP was even worse. It was taken as too unspecified to
even worry about and anyhow general methodological concerns for simplicity and
explanation should prove robust enough without having to indulge in pointless
evolutionary speculation of the cursory variety available to us.

I actually agree with some version of each of these points.
Linguists cannot explore the fine details of PP without indulging in some
psychology requiring methods not typically part of the syntactician’s technical
armamentarium.[7]
And, I agree that right now what we know about evolution is very unlikely to
play a substantive role in our theorizing. However, PP and DP serve to vividly
bring before the mind two important projects: (i) that the object of study in
GG is FL and its fine structure and (ii) that theoretical unification is a
virtue in pursuing (i). PP and DP serve to highlight these two features, and as
these are often, IMO, lost sight of, this is an excellent thing.

Moreover, at least with PP, it is not correct that we cannot
eyeball a proposal to get a sense of whether it will pass platonic muster. In
fact, in my experience many of the thoughtful professionals often get their
cues regarding hard/interesting problems by first going through the simple PoS
logic implicit in a standard syntactic analysis.[8]
This simple PP analysis lays the groundwork for more refined considerations.
IMO, one of the problems with current syntactic pedagogy is that we don’t teach
our students how to deploy simple PoS reasoning. Why? Well, I think it’s
because we don’t actually consider PP that important. Why? Well the most
generous answer is that it is assumed that all the work we do already tacitly endorses
PP’s basic ideals and so worrying PP to death will not get us very much bang
for the buck. Maybe, but being explicit really does make a difference.
Explicitly knowing what the big issues are really is useful. It
helps you to think of your work at one remove from the all important details
that consume you. And it can even serve, at times, to spur interesting kinds of
more specific syntactic speculation. Lastly, it’s the route towards engaging
with the larger intellectual community that the Athens conference indicated so
many feel detached from.

Personally, I think that the same holds true with DP. I
agree that we are not about to use the existing (non-existing) insights from
the evolution of cognition and language to ground type (iii) thinking. But, I
think that considering the Generative project in DP terms enlarges how we think
about the problem. In fact, much more specifically, it was only with the rise
of the Minimalist Program (MP) in which DP was highlighted and stressed as PP
had been before that the virtues of the unification of the MLGs rose to become a
central kind of research project. If you do not go “beyond” explanatory
adequacy there is no pressing reason for worrying about how our UGs fit in with
other domains of cognition or general features of computation (if indeed they
do). PP shoves the learnability problem in our faces and doing so has led us to
think constructively about Chomsky Universals and how they might be teased out
of our study of Gs. Hopefully, DP will do the same for cleaning up FL/UG: it will,
if thought about regularly, make it vivid to us that unifying the modules and
seeing how these unified systems might relate to other domains of cognition and
computation is something that we should try to tease out of our understanding
of our versions of UG.

I should add that doing this should encourage us to start
forcefully re-engaging with the neuro-cognitive-computational sciences
(something that syntacticians used to do regularly but not so much anymore).
And if we do not do this, IMO, linguistics in general and syntax in particular
will not have a bright future. As I said in Athens, if you want to know about
the half life of a philological style of linguistics, just consider that the
phrase “prospering classics department” is close to being an oxymoron. That way
lies extinction. So we need to reengage with these “folks” (ah my first
Obamaism) and both PP and DP can act as constant reminders of the links that
syntax ought to have with these efforts.

Ok, enough. To conclude: Intellectually, we are in a golden
age of linguistics (though we may need to manage this a bit so as to not
discourage PTS). However, it also appears that politically things are not so
hot. Many of the attendees felt that GG work is under threat of extinction in
many places. There was animated discussion about what could be done about this
and how to better advertise our accomplishments to both the lay and scientific
public. We discussed some of this here, and similar concerns were raised in
Athens. However, it is clear that in some places matters are really pretty
horrid. This is particularly unfortunate given the intellectual vigor of
generative syntax. I will try and say something about this feature in a
following post.

[1]
Jason Merchant made this point forcefully and amusingly in Athens.

[2]
I suspect that some of the hostility that traditional typology shows towards GG
lies in the inchoate appreciation that GG has significantly raised the empirical standards for typological
research and those not conversant with these methods are failing empirically.
In other words, traditional typologists rightly fear that they, not their subject matter, are threatened
with obsolescence. To my mind, this is all very positive, but, for obvious
reasons, it is politically terrible. Typologists of a certain stripe might be
literally fighting for their lives, and this naturally leads to a very hostile
attitude towards GG based work.

[3]
Yes, I also see the possibility of a PTSD syndrome (pure theoretical syntax
disorder).

[4]
Continuing a project begun in “Conditions on Transformations” and ending with Barriers.

[5]
Though see Sprouse’s work which provides evidence that wh-islands show the same
super additivity profile as other islands.

[6]
I will likely blog on this again in the near future if all of this sounds kind
of cryptic.

[7]
Though this is changing rapidly, at least at places like the University of
Maryland.

[8]
I’d like to thank Jeff Lidz for showing me this in spades. I sat in on his
terrific class, the kind of class that every syntactician (especially newly
minted syntacticians) should take.

Thursday, May 21, 2015

Real science data is not natural. It is artificial. It is
rarely encountered in the wild and (as Nancy Cartwright has emphasized (see
here for discussion)) it standardly takes a lot of careful work to create the
conditions in which the facts are observable. The idea that science proceeds by
looking carefully at the natural world is deeply misleading, unless, of course,
the world you inhabit happens to be CERN. I mention this because one of the
hallmarks of a progressive research program is that it supports the manufacture
of such novel artificial data and their bundling into large scale “effects,” artifacts
which then become the targets of theoretical speculation.[1]
Indeed, one measure of how far a science has gotten is the degree to which the
data it concerns itself with is factitious and the number of well-established
effects it has managed to manufacture. Actually, I am tempted to go further: as
a general rule only very immature scientific endeavors are based on naturally
available/occurring facts.[2]

Why do I mention this? Well, first, by this measure,
Generative Grammar (GG) has been a raging success. I have repeatedly pointed to
the large number of impressive effects that GG has collected over the last 60
years and the interesting theories that GGers have developed trying to explain
them (e.g. here).
Island and ECP effects, binding effects and WCO effects do not arise naturally
in language use. They need to be constructed, and in this they are like most facts
of scientific interest.

Second, one nice way to get a sense of what is happening in
a nearby domain is to zero in on the effects its practitioners are addressing.
Actually, more pointedly, one quick and dirty way of seeing whether some area
is worth spending time on is to canvass the variety and number of different
effects it has manufactured. In what
follows I would like to discuss one of these that has recently come to my
attention that has some interest for a GGer like me.

A recent paper (here)
by Jiwon Yun, Zhong Chen, Tim Hunter, John Whitman and John Hale (YCHWH)
discusses an interesting processing fact concerning relative clauses (RC) that
seems to hold robustly cross linguistically. The effect is called the “Subject
Advantage” (SA). What’s interesting about this effect is that it holds in
languages where the head both precedes and follows the relative clause (i.e.
for languages like English and those like Japanese). Why is this
interesting?

Well, first, this argues against the idea that the SA simply
reflects increasing memory load as a function of linear distance between gap
and filler (i.e. head). This cannot be the relevant variable for though it
could account for SA effects in languages like English where the head precedes
the RC (thus making the subject gap closer to the head than the object gap is)
in Japanese style RCs where heads follow the clause the object gap is linearly
closer to the head than the subject gap is, hence predicting an object
advantage, contrary to experimental fact.

Second, and
here let me quote John Hale (p.c.):

SA effects defy explanation in terms of
"surprisal". The surprisal idea is that low probability words are
harder, in context. But in relative clauses surprisal values from simple
phrase structure grammars either predict effort on the wrong word (Hale 2001) or get it completely backwards --- an object
advantage, rather than a subject advantage (Levy 2008, page 1164).

Thus, SA effects are interesting in that they appear to be
stable over languages as diverse as English on the one hand and Japanese on the
other and seem to be refractory to many of the usual processing explanations.

Furthermore, SA effects suggest that grammatical structure
is important, or to put this in more provocative terms, that SA effects are
structure dependent in some way. Note that this does not imply that SA effects are grammatical effects, only that G
structure is implicated in their explanation. In this, SA effects are a little like Island Effects as understood (here).[3]
Purely functional stories that ignore G structure (e.g. like linearly dependent
memory load or surprisal based on word-by-word processing difficulty) seem to
be insufficient to explain these effects (see YCHWH 117-118).[4]

So how to explain the SA? YCHWH proposes an interesting
idea: that what makes object relatives harder than subject relatives is that the two have
different amounts of “sentence medial ambiguity” (the former more than the
latter) and that resolving this ambiguity takes work that is reflected in
processing difficulty. Or put more flatfootedly, finding an object gap requires
getting rid of more grammatical
ambiguity than finding a subject gap and getting rid of this ambiguity requires
work, which is reflected in processing difficulty. That’s the basic idea. The
work is in the details that YCHWH provides. And there are a lot of them. Here are some.

YCHWH defines a notion of “Entropy Reduction” based on the
weighted possible continuations available at a given point in a parse. One
feature of this is that the model provides a way of specifying how much work
parsing is engaged in at a particular
point. This contrasts with, for example, a structural measure of memory
load. As note 4 observes, such a measure could explain a subject advantage but
as John Hale (p.c.) has pointed out to me concerning this kind of story:

This general account is thus adequate but not very
precise. It leaves open, for instance, the question of where exactly greater
difficulty should start to accrue during incremental processing.

That said, whether to go for the YCHWH account or the less
precise structural memory load account is ultimately an empirical matter.[5]
One thing that YCHWH suggests is that it should be possible to obviate the SA
effect given the right kind of corpus data. Here’s what I mean.

YCHWH defines entropy reduction by (i) specifying a G for a
language that defines the possible G continuations in that language and (ii)
assigning probabilistic weights to these continuations. Thus, YCHWH shows how
to combine Gs and probabilities of use of these. Parsing, not surprisingly,
relies on the details of a particular
G and the details of the corpus of usages of those G possibilities. Thus, what
options a particular G allows affects how much entropy reduction a given word
licenses, as do the details of the corpus used to probabilize the G. This means that it is possible that SA
might disappear given the right corpus details. Or it allows us to ask what if
any corpus details could wipe out SA effects. This, as Tim Hunter noted (p.c.)
raises two possibilities. In his words:

An interesting (I think) question that arises is:
what, if any, different patterns of corpus data would wipe out the subject
advantage? If the answer were 'none', then that would mean that the grammar
itself (i.e. the choice of rules) was the driving force. This is almost
certainly not the case. But, at the other extreme, if the answer were 'any
corpus data where SRCs are less frequent than ORCs', then one would be forgiven
for wondering whether the grammar was doing anything at all, i.e. wondering
whether this whole grammar-plus-entropy-reduction song and dance were just a
very roundabout way of saying "SRCs are easier because you hear them more
often".

One of the nice features of the YCHWH discussion is that it
makes it possible to analytically
approach this problem. It would be nice to know what the answer is both
analytically as well as empirically.
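
To make Tim's question a bit more concrete, here is a minimal sketch in Python. Everything in it is my own invention for illustration: the four "analyses," their weights, and the function names are hypothetical, and a finite list of weighted word strings is only a crude stand-in for the probabilized grammars that YCHWH actually uses. I am also assuming the usual formulation of entropy reduction, namely that the work done at a word is the drop (if any) in the entropy of the distribution over analyses compatible with the input so far.

```python
import math

# Hypothetical toy stand-in for a weighted grammar: each complete analysis is
# a word sequence. A real model would use a probabilistic grammar with
# unboundedly many derivations; these four strings are invented.
SRC = ("the", "dog", "that", "chased", "the", "cat")    # subject relative
ORC = ("the", "dog", "that", "the", "cat", "chased")    # object relative
ORC_2 = ("the", "dog", "that", "the", "cat", "saw")
ORC_3 = ("the", "dog", "that", "the", "bird", "saw")


def entropy(weights):
    """Shannon entropy in bits of the distribution given by normalized weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("no analysis is compatible with this prefix")
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def prefix_entropy(analyses, prefix):
    """Entropy over the complete analyses that begin with the given prefix."""
    compatible = [w for words, w in analyses.items()
                  if words[:len(prefix)] == prefix]
    return entropy(compatible)


def entropy_reductions(analyses, sentence):
    """Per-word entropy reduction: max(0, entropy before word - entropy after)."""
    reductions = []
    previous = prefix_entropy(analyses, ())
    for i in range(1, len(sentence) + 1):
        current = prefix_entropy(analyses, sentence[:i])
        reductions.append(max(0.0, previous - current))
        previous = current
    return reductions


# Two invented "corpora": same analyses, different usage weights.
src_heavy = {SRC: 10.0, ORC: 1.0, ORC_2: 1.0, ORC_3: 1.0}
orc_heavy = {SRC: 1.0, ORC: 10.0, ORC_2: 1.0, ORC_3: 1.0}

for name, weights in (("SRC-heavy", src_heavy), ("ORC-heavy", orc_heavy)):
    src_total = sum(entropy_reductions(weights, SRC))
    orc_total = sum(entropy_reductions(weights, ORC))
    print(name, "SRC total:", src_total, "ORC total:", orc_total)
```

In this toy, the object relative incurs more total entropy reduction than the subject relative under the first weighting, and the two totals come out the same under the second. That says nothing about what YCHWH's grammars would do, but it shows the shape of the question: hold the options fixed, vary the weights, and see whether the asymmetry survives.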

Another one of the nice features of YCHWH is that it
demonstrates how to probabilize MGs of the Stabler variety so that one can view
parsing as a general kind of information
processing problem. In such a context difficulties in language parsing are
the natural result of general information processing demands. Thus, this
conception of parsing locates it in a more general framework of information
processing, parsing being one specific application where the problem is to
determine the possible G compatible continuations of a sentence. Note that this
provides a general model of how G knowledge can get used to perform some task.

Interestingly, on this view, parsing does not require a
parser. Why? Because parsing just is information processing when the relevant
information is fixed. It’s not like we do language parsing differently than we
do, say, visual scene interpretation once
we fix the relevant structures being manipulated. In other words, parsing
on the YCHWH view is just information
processing in the domain of language (i.e. there is nothing special about language processing except the fact that
it is Gish structures that are being manipulated). Or, to say this another way,
though we have lots of parsing, there is no parser that does it.

YCHWH is a nice
example of a happy marriage of grammar and probabilities to explain an
interesting parsing effect, the SA. The latter is a discovery about the ease of
parsing RCs that suggests that G structure matters and that language
independent functional considerations just won’t cut it. It also shows how easy
it is to combine MGs with corpora to deliver probabilistic Gs that are
plausibly useful in language use. All in all, fun stuff, and very instructive.

[2]
This is one reason why I find admonitions to focus on natural speech as a
source of linguistic data to be bad advice in general. There may be exceptions,
but as a general rule such data should be treated very gingerly.

[3]
See, for example, the discussion in the paper by Sprouse, Wagers and Phillips.

[4]
A measure of distance based on structure could explain the SA. For example, there
are more nodes separating the object trace and the head than separating the
subject trace and the head. If memory load were a function of depth of
separation, that could account for the SA, at least at the whole sentence level.
However, until someone
defines an incremental version of the Whole-Sentence structural memory load
theory, it seems that only Entropy Reduction can account for the word-by-word
SA effect across both English-type and Japanese-type languages.

[5]
The following is based on some correspondence with Tim Hunter. Thus he is
entirely responsible for whatever falsehoods creep into the discussion here.

Monday, May 18, 2015

1. Alex Drummond sent me this link to a nice little paper on what appears to be an old topic that still stumps physicists. The chestnut is the question of whether hot water freezes more quickly than cold. The standard answer is "you gotta be kidding" and then lots of aspersions are cast on those that think that they have proven the contrary empirically. Read this, but what's interesting is that nobody ever thought that the right answer was anything but the obvious one. However, experiments convinced many over centuries that the unintuitive view (i.e. that hot water does freeze faster) was correct. The paper reviews the history of what is now called the "Mpemba Effect," named after a high school student who had the courage of his experiments and was ridiculed for this by teachers and fellow students until bigger shots concluded that his report was not nuts. Not that it was correct, however. It turns out that the question is very complex, takes a lot of careful reasoning to make clear and turns out to be incredibly hard to test. It's worthwhile reading for linguists for it gives a good taste of how complex interaction effects stymie even advanced sciences. So, following the adage that if it's tough for physics don't be surprised if it's tough for linguistics, it's good to wallow in the hardships and subtleties of a millennia-old problem.

2. Here's a recent piece on how hard it is to think cleanly in the sciences. None of it is surprising. The bottom line is that there is lots of wiggle room even in the best sciences for developing theories that would enhance one were they true. So, there is a strong temptation to find them true and there are lots of ways of fudging the process so that what we would like to be the case has evidence in its favor. I personally find none of this surprising or disheartening.

Two points did strike me as curious.

First the suggestion that a success rate of 15% is something to worry about. Maybe it is, but what should we a priori believe the success rate should be? Maybe 15% is great for all we know. There is this presupposition that the scientific method (such as it is) should insulate us from publishing bad papers. But why think this? IMO, the real issue is not how many bad papers get out there but how many good ones. Maybe an 85% miss rate is required to generate the small number of good papers that drive a field forward.

Second, there is the suggestion that this is in part due to the exigencies of getting ahead in the academic game. The idea is that pressures today are such that there is lots to gain in painting rosy research pictures of ever-expanding revolutionary insight. Maybe. But do we really know if things were better in more relaxed times when these sorts of pressures were less common? I don't know. It would be nice to have a diachronic investigation to see that things have gotten worse. Personal anecdote: I once read through the proceedings of the Royal Society from the 17th and 18th centuries. It was a riot. Lots of the stuff was terrible. Of course, what survives to the present day is the gold, not the dross. So, how do we know that things have gotten worse and that the reason for this is contemporary pressures?

That's it. Science is hard. Gaining traction is difficult. Lots of useless work gets done and gets published. Contrary to scientific propaganda, there is no "method" for preventing this. Of course, we might be able to do better and we should if we can. But I for one am getting a little tired of this sky-is-falling stuff. The idea seems to be that if only we were more careful all problems could be solved. Why would anyone believe this? As the first paper outlines, even apparently simple problems are remarkably difficult, and this in areas we know a lot about.

Friday, May 15, 2015

I’d intended this to be a response to Thomas’s
comments but it got too long, and veered off in various directions.

Computational and other levels

Thomas makes the point that there’s too much
work at the ‘implementational’ level, rather than at the proper Marrian
computational level, and gives examples to do with overt vs covert movement,
labelling etc. He makes an argument that all that stuff is known to be formally
equivalent, and we essentially shouldn’t be wasting our time doing it. So ditch
a lot of the work that goes on in syntax (sob!).

But I don’t think that’s right. Specification
at the computational level for syntax is not answered fully by specifying the
computational task as solving the problem of providing an infinite set of
sound-meaning pairings; it’s solving the issue of why these pairings,
and not some other thinkable set. So,
almost all of that ‘implementation’ level work about labels or whatever is
actually at the computational level. In fact, I don’t really think there is an
algorithmic level for syntax in the classical Marrian sense: the computational
level for syntax defines the set of pairings and sure that has a physical
realization in terms of brain matter, but there isn’t an algorithm per se. The information
in the syntax is accessed by other systems, and that probably is algorithmic in
the sense there’s a step by step process to transform information of one sort
into another (to phonology, or thinking, or various other mental subsystems),
but the syntax itself doesn’t undergo information transforming processes of
this sort, it’s a static specification of legitimate structures (or
derivations). I think that the fact that this isn’t appreciated sometimes
within our field (and almost never beyond it) is actually a pretty big problem,
perhaps connected with the hugely process oriented perspective of much
cognitive psychology.

Back to the worry about the actual
‘implementational’ issue to do with Agree vs Move etc. I think that Thomas is
right, and that some of it may be misguided, inasmuch as the different
approaches under debate may have zero empirical consequences (that is, they
don’t answer the question: why this pairing and not some other -
derivations/representations is perhaps a paradigm case of this). In such cases
the formal equivalence between grammars deploying these different devices is
otiose and I agree that it would be useful to accept this for particular cases.
But at least some of this ‘implementational’ work can be empirically sensitive:
think of David Pesetsky’s arguments for covert phrasal as well as covert
feature (=Agree) movement, or mine and Gillian’s work on using Agree vs overt
movement to explain why Gaelic wh-phrases don’t reconstruct like English ones
do but behave in a way that’s intermediate between bound pronouns and
traces. The point here is that this is work at Marr’s computational level
to try to get to what the correct computational characterization of the system
is.

Here’s a concrete example. In my old paper on
features in minimalism, I suggested that we should not allow feature recursion
in the specification of lexical items (unlike HPSG). I still think that’s
right, but not allowing it causes a bunch of empirical issues to arise: we
can’t deal with tough constructions by just saying that a tough-predicate
selects an XP/NP predicate, like you can in HPSG, so the structures that are
legitimized (or derivations if you prefer) by such an approach are quite
different from those legitimized by HPSG. On the other hand, there are a whole
set of non-local selectional analyses that are available in HPSG that just
aren’t in a minimalist view restricted in the way I suggested (a good thing).
So the specification at the computational level about the richness of feature
structure directly impacts on the possible analyses that are available. If you
look at that paper, it looks very implementational, in Thomas’s sense, as it’s
about whether embedding of feature structures should be specified inside
lexical items or outside them in the functional sequence, but the work it’s
actually doing is at the computational level and has direct empirical (or at
least analytical) consequences. I think the same is true for other apparently ‘implementational’
issues, and that’s why syntacticians spend time arguing about them.

Casting the Net

Another worry about current syntax that’s
raised, and this is a new worry to me so it’s very interesting, is that it’s
too ‘tight’: That is, that particular proposals are overly specific which is
risky, because they’re almost always wrong, and ultimately a waste of energy.
We syntacticians spend our time doing things that are just too falsifiable
(tell that to Vyv Evans!). Thomas calls this net-syntax, as you try to cast a
very particularly shaped net over the phenomena, and hence miss a bunch.
There’s something to this, and I agree that sometimes insight can be gained by
retracting a bit and proposing weaker generalizations (for example, the debate
between Reinhart-style c-command for bound variable anaphora, and the
alternative Higginbotham/Safir/Barker-style Scope Requirement looks settled,
for the moment, in the latter’s favour, and the latter is a much weaker claim).
But I think that the worry misses an important point about the to and fro
between descriptive/empirical work and theoretical work. You only get to have
the ‘that’s weird’ moment when you have a clear set of theoretical assumptions
that allow you to build on-the-fly analyses for particular empirical phenomena,
but you then need a lot of work on the empirical phenomenon in question before
you can figure out what the analysis of that phenomenon is such that you can
know whether your computational level principles can account for it. That
analytical work methodologically requires
you to go down the net-syntax type lines, as you need to come up with
restrictive hypotheses about particularities, in order to explore the
phenomenon in the first place. So specific encodings are required, at least
methodologically to make progress. I don’t disagree that you need to back off
from those specific encodings, and not get too enraptured by them, but
discovering high level generalisations about phenomena needs them, I
think. We can only say true things when we know what the empirical lay of
the land is, and the vocabulary we can say those true things in very much
depends on a historical to and fro between quite specific implementations until
we reach a point where the generalizations are stable. On top of this, during
that period, we might actually find that the phenomena don’t fall together in
the way we expected (so syntactic anaphor binding, unlike bound variable
anaphora, seems to require not scope but structural c-command, at least as far
as we can tell at the moment). The difference between syntax and maths, which was
the model that Thomas gave, is that we don’t know in syntax where the hell we
are going much of the time and what the problems are really going to be,
whereas we have a pretty good idea of what the problems are in maths.

Structure and Interpretation

I’ll (almost) end on a (semi-)note of
agreement. Thomas asks why we care about structure. I agree with him that structures
are not important for the theoretical aspects of syntax, except as what systems
generate, and I’m wholly on board with Thomas’s notion of derivational
specifications and their potential lexicalizations (in fact, that was sort of
the idea behind my 2010 thing on trying to encode variability in single
grammars by lexicalising subsequences of functional hierarchies, but doing it
via derivations as Thomas has been suggesting is even better). I agree
that if you have, for example, a feature system of any kind of complexity, you
probably can’t do the real work of testing grammars by hand as the possible
number of options just explodes. I see this as an important growth area for
syntax: what are the relevant features, what are their interpretations, how do
they interact, and my hunch is that we’ll need fairly powerful computational
techniques to explore different grammars within the domains defined by
different hypotheses about these questions, along the lines Thomas
indicates.

So why do we have syntax papers filled with
structures? I think the reason is that, as syntacticians, we are really
interested in how sign/sound relates to meaning (back to why these pairings), and unless you have a
completely directly compositional system like a lexicalized categorial grammar,
you need structures to effect this pairing, as interpretation needs structure
to create distinctions that it can hook onto. Even if you lexicalize it all,
you still have lexical structures that you need a theory of. So although
syntactic structures are a function of lexical items and their possible combinations,
the structure just has to go somewhere.

But we do need to get more explicit about
saying how these structures are interpreted semantically and phonologically.
Outside our field, the ‘recursion-only’ hypothesis (which was never, imo, a
hypothesis that was ever proposed or one that anyone in syntax took seriously),
has become a caricature that is used to beat our backs (apologies for the mixed
metaphor). We need to keep emphasizing the role of the principles of the
interpretation of structure by the systems of use. That means we need to talk
more to people who are interested in how language is used, which leads me to …

The future’s bright, the future’s pluralistic.

On the issue of whether the future is rosy or
not, I actually think it is, but it requires theoretical syntacticians to work
with people who don’t automatically share our assumptions and to respect what
assumptions those guys bring, and see where compatibilities or rapprochements
lie, and where there are real, empirically detectable, differences. Part of the
sociological problem Thomas and others have mentioned is insularity and
perceived arrogance. My own feeling is that younger syntacticians are not as
insular as those of my generation (how depressing – since when was my
generation a generation ;-( ), so I’m actually quite sanguine about the future
of our field; there’s a lot of stellar work in pure syntax but those same
people doing that work are engaging with neuroscientists, ALL people,
sociolinguists, computational people, etc. But it will require more work on our
(i.e. we theoretical syntacticians’) part: talking to non-syntacticians and non-linguists,
overcoming the legacy of past insularity, and engaging in topics that might
seem outside of our comfort zones. But there is a huge amount of potential
here, not just in the more computational areas that Thomas mentioned, but also
in areas that have not had as much input from generative syntax as they could
have had: multilingualism, language of ageing, language shift in immigrant
populations, etc. These are areas we can really contribute to, and there are
many more. I agree with Thomas that we shouldn’t shirk ‘applied’ research: we
should make it our own.