Comments

Thursday, October 31, 2013

There has been an intense discussion of the APL paper in the comments to my very negative review of this paper (here). I would like to highlight one interchange that I think points to ways that UG-like approaches have been combined with (distributional) learners to provide explicit analyses of child language data. In other words, there exist concrete proposals dealing with specific cases that fruitfully combine UG with explicit learning models. APL deals with none of this material, despite its obvious relevance to the paper's central claims and despite citing papers that deal with such work in its bibliography. It makes one (e.g. me) think that maybe the authors didn't read (or understand) the papers APL cites.

One more point: Jeff observes that "it is right to bring to the fore the question of how UG makes contact with data to drive learning," and that APL are right to raise this question. I would agree that the question is worth raising and worth investigating. What I would deny is that APL contributes to advancing this question in any way whatsoever. There is a tendency in academia to think that all work should be treated with respect and politesse. I disagree. All researchers should be so treated, not their work. Junk exists (APL is my existence proof, were one needed) and identifying junk as junk is an important part of the critical/evaluative process. Trying to find the grain of trivial truth in a morass of bad argument and incoherent thinking retards progress. Perhaps the only positive end that APL might serve is as a useful compendium of junk work on which neophytes can practice their critical skills. I plan to use it with my students for just this purpose in the future.

At any rate, here is a remark by Alex C that generated this informative reply (i.e. citations for relevant work) by Jeff Lidz.

Alex Clark:

So consider this quote: (not from the paper under discussion)

"It is standardly held that having a highly restricted hypothesis space makes it possible for such a learning mechanism to successfully acquire a grammar that is compatible with the learner’s experience and that without such restrictions, learning would be impossible (Chomsky 1975, Pinker 1984, Jackendoff 2002). In many respects, however, it has remained a promissory note to show how having a well-defined initial hypothesis space makes grammar induction possible in a way that not having an initial hypothesis space does not (see Wexler 1990 and Hyams 1994 for highly relevant discussion). The failure to cash in this promissory note has led, in my view, to broad skepticism outside of generative linguistics of the benefit of a constrained initial hypothesis space."

This seems a reasonable point to me, and more or less the same one that is made in this paper: namely, that the proposed UGs don't actually solve the learnability problem.

Jeff Lidz:

Alex C (Oct 25, 3am [i.e. above, NH]) gives a quote from a different paper to say that APL have identified a real problem and that UG doesn't solve learnability problems.

The odd thing, however, is that this quote comes from a paper that attempts to cash in on that promissory note, showing in specific cases what the benefit of UG would be. Here are some relevant examples.

Sneed's 2007 dissertation examines the acquisition of bare plurals in English. Bare plural subjects in English are ambiguous between a generic and an existential interpretation. However, in speech to children they are uniformly generic. Nonetheless, Sneed shows that by age 4, English learners can access both interpretations. She argues that if something like Diesing's analysis of how these interpretations arise is both true and innate, then the learner's task is simply to identify which DPs are Heim-style indefinites, and the rest will follow. She then provides a distributional analysis of speech to children that does just that. The critical thing is that the link between the distributional evidence that a DP is indefinite and the availability of existential interpretations in subject position can be established only if there is an innate link between these two facts. The data themselves simply do not provide that link. Hence, this work successfully combines a UG theory with distributional analysis to show how learners acquire properties of their language that are not evident in their environment.

Viau and Lidz (2011, which appeared in Language and, oddly enough, is cited by APL for something else) argue that UG provides two types of ditransitive construction, but that the surface evidence for which is which is highly variable cross-linguistically. Consequently, there is no simple surface trigger that can tell the learner which strings go with which structures. Moreover, they show that 4-year-olds have knowledge of complex binding facts which follow from this analysis, despite the relevant sentences never occurring in their input. However, they also show what kind of distributional analysis would allow learners to assign strings to the appropriate category, from which the binding facts would follow. Here again, there is a UG account of children's knowledge paired with an analysis of how UG makes the input informative.

Takahashi's 2008 UMd dissertation shows that 18-month-old infants can use surface distributional cues to phrase structure to acquire basic constituent structure in an artificial language. She also shows that, having learned this constituent structure, the infants know that constituents can move but nonconstituents cannot, even if there was no movement in the familiarization language. Hence, if one consequence of UG is that only constituents can move, these facts are explained. Distributional analysis by itself can't do this.

Misha Becker has a series of papers on the acquisition of raising/control, showing that a distributional analysis over the kinds of subjects that can occur with verbs taking infinitival complements could successfully partition the verbs into two classes. However, the full range of facts that distinguish raising/control do not follow from the existence of two classes. For this, you need UG to provide a distinction.

In all of these cases, UG makes the input informative by allowing the learner to know what evidence to look for in trying to identify abstract structure. In all of the cases mentioned here, the distributional evidence is informative only insofar as it is paired with a theory of what that evidence is informative about. Without that, the evidence could not license the complex knowledge that children have.

It is true that APL is a piece of shoddy scholarship and shoddy linguistics. But it is right to bring to the fore the question of how UG makes contact with data to drive learning. And you don't have to hate UG to think that this is a valuable question to ask.

Wednesday, October 30, 2013

This recent piece (by James Somers) on Douglas Hofstadter (DH) has a brief review of AI from its heady days (when the aim was to understand human intelligence) to its lucrative days (when the goal shifted to cashing out big time). I have not personally been a big fan of DH's early stuff, for I thought (and wrote here) that early AI, the one with cognitive ambitions, had problems identifying the right problems for analysis and that it massively oversold what it could do. However, in retrospect, I am sorry that it faded from the scene, for though there was a lot of hype, the ambitions were commendable and scientifically interesting. Indeed, lots of good work came out of this tradition. Marr and Ullman were members of the AI lab at MIT, as were Marcus and Berwick. At any rate, Somers gives a short history of the decline of this tradition.

The big drop in prestige occurred, Somers notes, in about the early 1980s. By then "AI…started to…mutate…into a subfield of software engineering, driven by applications…[the] mainstream had embraced a new imperative: to make machines perform in any way possible, with little regard for psychological plausibility (p. 3)." The turn from cognition was ensconced in the conviction that "AI started working when it ditched humans as a model, because it ditched them (p. 4)." Machine translation became the poster child for how AI should be conducted. Somers gives a fascinating thumbnail sketch of the early system (called 'Candide' and developed by IBM) whose claim to fame was that it found a way to "avoid grappling with the brain's complexity" when it came to translation. The secret sauce according to Somers? Machine learning! This process deliberately avoids worrying about anything like the structures that languages deploy or the competence that humans must have to deploy them. It builds on the discovery that something that "almost doesn't work: a machine…that randomly spits out French words for English words" can be tweaked "using millions of pairs of sentences…[to] gradually calibrate your machine, to the point where you'll be able to enter a sentence whose translation you don't know and get a reasonable result…[all without] ever need[ing] to know why the knobs should be twisted this way or that" (pp. 10-11).

For this all to work requires "data, data, data" (as Norvig is quoted as saying). Take "simple machine learning algorithms" plus 10 billion training examples and "it all starts to work. Data trumps everything," as Josh Estelle at Google is quoted as noting (p. 11).
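The "calibration" Somers describes can be sketched in miniature. What follows is my own toy illustration (not IBM's actual Candide system, and every name in it is made up): an expectation-maximization loop in the style of IBM Model 1 that starts from uniform guesses, i.e. from a machine that in effect spits out arbitrary French words, and twists the knobs a little on each pass over the sentence pairs until sensible word translations fall out.

```python
def train(pairs, iterations=10):
    """Estimate word-translation probabilities from aligned sentence pairs."""
    en_vocab = {w for e, f in pairs for w in e.split()}
    fr_vocab = {w for e, f in pairs for w in f.split()}
    # t[e][f] = estimated probability that English word e translates as
    # French word f; initially uniform (the "knows nothing" stage).
    t = {e: {f: 1.0 / len(fr_vocab) for f in fr_vocab} for e in en_vocab}
    for _ in range(iterations):
        counts = {e: {f: 0.0 for f in fr_vocab} for e in en_vocab}
        for e_sent, f_sent in pairs:
            es, fs = e_sent.split(), f_sent.split()
            for f in fs:
                # Spread the "credit" for f over the English words in the
                # pair, in proportion to the current translation estimates.
                total = sum(t[e][f] for e in es)
                for e in es:
                    counts[e][f] += t[e][f] / total
        # Renormalize: the knob-twisting step.
        for e in en_vocab:
            z = sum(counts[e].values())
            t[e] = {f: c / z for f, c in counts[e].items()}
    return t

# A four-sentence stand-in for the "millions of pairs of sentences".
pairs = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("a blue flower", "une fleur bleue"),
    ("the flower", "la fleur"),
]
t = train(pairs)
# For each English word, its most probable French translation.
best = {e: max(fs, key=fs.get) for e, fs in t.items()}
```

Note that nothing in the loop mentions syntax, meaning, or anything about how humans translate; it is pure co-occurrence bookkeeping, which is exactly the point Somers is making.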

According to Somers, these machine-learning techniques are valued precisely because they allow serviceable applications to be built by abstracting away from the hard problems of human cognition and neuro-computation. Moreover, the practitioners of the art know this. These are not taken to be theories of thinking or cognition. And, if this is so, there is little reason to criticize the approach. Engineering is a worthy endeavor and if we can make life easier for ourselves in this way, who could object? What is odd is that these same techniques are now often recommended for their potential insight into human cognition. In other words, a technique that was adopted precisely because it could abstract from cognitive details is now being heralded as a way of gaining insight into how minds and brains function. However, the techniques here described will seem insightful only if you take minds/brains to gain their structure largely via environmental contact. Thinking from this perspective is just "data, data, data" plus the simple systems that process it.

As you may have guessed, I very much doubt that this will get us anywhere. Empiricism is the problem, not the solution. Interestingly, if Somers is right, AI's pioneers, the people that moved away from its initial goals and deliberately moved it in a more lucrative engineering direction, knew this very well. It seems that it has taken a few generations to lose this insight.

Friday, October 25, 2013

When Chomsky turned 50, the MIT department threw him a party. The invite list was restricted to the MIT community. My good and great friend Elan Dresher and I were thus excluded, he being at UMass and I at Harvard in the philo dept. Fortunately for us, we discovered a loophole that allowed us to get in. Jay Keyser, the impresario, was organizing entertainment for the event, and Elan, Amy Weinberg and I decided to put together a skit in celebration of the occasion. Jay heard out our idea for the skit and, conditional on our delivering it, we were in. This skit is probably the greatest contribution I have personally made to modern generative grammar. The scenario? A debate between Chomsky and Coco the gorilla about whether apes could talk. My contribution? I played Coco. It was great. Let me tell you a bit about it.

Amy played Nichol Nicholson (our Penny Patterson, Coco's handler), Elan played Chomsky. I was Coco. We wrote the skit in about two weeks and practiced it for about a week. Tim Stowell videotaped the whole thing (sadly for me, the tape has since been lost). The debate was furious. Coco marshaled a series of fairly good arguments, I believe (laid out on large cue cards held up), each of which Chomsky knocked down (Noam came up to me afterwards insisting that Coco's arguments, though superficially convincing, were actually quite weak as they were based on presuppositions that missed the main point). The skit lasted 10 minutes. It was a huge hit! And everyone was there. I have never had the attention of as much of the linguistics community since. Plus, I never had them listen to anything I did so carefully, though I was afraid that some (e.g. Jakobson) might keel over, they were laughing so hard.

Dresher was a perfect Chomsky. He even got the hand gestures down (rolling up the sleeves, removing the watch) and the cadence and intonation (always upping the rhetorical ante). Amy was perfect as Penny. But, to be honest, I was the star. I had rented a gorilla suit for the occasion and had it for a whole weekend. I wore it for three days. I went to pick Elan up from the bus station wearing it and he almost refused to get off the bus. I put it on in the bathroom before our first rehearsal and Amy screamed when I got out (and yes, she screamed because I was wearing it). I wore it at Harvard one day, carrying my school bag and asking people directions to the primate lab in William James Hall. I also wore it in the philo department, with Bob Nozick being the only one to react intelligently: he calmly walked by and said "Good morning, Norbert." Contrast this with one office mate, who thought of jumping out the window to get away. It was great. If you ever get a chance to wear a gorilla suit for a day, don't pass it up.

You may be wondering why I am here reminiscing about these events that took place well over 30 years ago. It's because of this piece that just appeared in the Onion that Greg Sailor was kind enough to send my way.

Oh yes: after teaching a short course at IU several years ago and mentioning my exploits as a youth, some of the participants chipped in together and bought me my own personal gorilla suit. Every now and then I put it on, but it seems that gorillas are not as scary as they once were, or maybe the difference in my outward appearance with and without the suit is not as dramatic as it used to be.

Tuesday, October 22, 2013

Here's another installment (see here for link to earlier one) on the effectiveness of student teaching evaluations in discerning effective teaching. It seems that the relation between the two is tenuous. This does not imply that such evaluations are without merit. As the linked-to post notes: "Students are arguably in the best position to judge certain aspects of teaching that contribute to effectiveness, such as clarity, pace, legibility, audibility...". However, teaching evaluations may well track something else, e.g. grade expectations, enjoyment, physical attractiveness, age, ethnicity. Given the increasing centrality of student questionnaires in faculty evaluation and course restructuring, it would be nice to have indices that measure what we want them to. I confess to being skeptical that such measures are easy to devise. It seems pretty clear that what we have in place is wanting.

Monday, October 21, 2013

My mother always told me that you should be careful what you
wish for because you just might get it. In fact, I’ve discovered that her
advice was far too weak: you should be careful what you idly speculate about as
it may come to pass. As readers know, my last post (here)
questioned the value added of the review process based on recent research
noting the absence of evidence that reviewing serves its purported primary
function of promoting quality and filtering out the intellectually less
deserving. Well, no sooner did I write this than I received proof positive that
our beloved LSA has implemented a no-review policy for Language’s new online journal Perspectives. Before I review the
evidence for this claim, let me say that though I am delighted that my
ramblings have so much influence and can so quickly change settled policy, I am
somewhat surprised at the speed with which the editors at Language have adopted my inchoate maunderings. I would have hoped
that we might make haste slowly by first trying to make the review process progressively
less cumbersome before adopting more exciting policies. I did not anticipate
that the editors at Language would be
so impressed with my speculations that they would immediately throw all caution
aside and allow anything at all, no matter how slipshod and ignorant, to appear
under its imprimatur. It’s all a bit dizzying, really, and unnerving (What
power! It’s intoxicating!). But why do things by halves, right? Language has chosen to try out a bold
policy, one that will allow us to see whether the review process has any
utility at all.

Many of you will doubt that I am reporting the
aforementioned editorial policy correctly. After all, how likely is it that
anything I say could have such immediate impact? In fact, how likely is it that the LSA and
the editors of Language and its
online derivatives even read FoL? Not
likely, I am sorry to admit. However,
unbelievable as it may sound, IT IS TRUE, and my evidence for this is the
planned publication of the target article by Ambridge, Pine and Lieven (APL) (“Child
language: why universal grammar doesn’t help” here[1]).
This paper is without any redeeming intellectual value and I can think of only
two explanations for how it got accepted for publication: (i) the radical
change in review policy noted above and (ii) the desire to follow the Royal
Society down the path of parody (see here).
I have eliminated (ii) because unlike the Royal Society’s effort, APL is not even slightly funny haha (well maybe
as slapstick, I’ll let you decide). So that leaves (i).[2]

How bad is the APL paper? You can’t begin to imagine. However, to help you vividly taste its
shortcomings, let me review a few of its more salient “arguments” (yes, these
are scare quotes). A warning, however, before I start. This is a long post. I
couldn’t stop myself once I got started. The bottom line is that the APL paper
is intellectual junk. If you believe me, then you need not read the rest. But
it might interest you to know just how bad a paper can be. Finding zero on a
scale can be very instructive (might this be why it is being published? Hmm).

The paper goes after what APL identify as five central
claims concerning UG: identifying syntactic categories, acquiring basic
morphosyntax, structure dependence, islands and binding. They claim to
“identify three distinct problems faced by proposals that include a role for
innate knowledge – linking, inadequate
data coverage, and redundancy… (6).” ‘Linking’ relates to
“how the learner can link …innate knowledge to the input language (6).”
‘Data-coverage’ refers to the empirical inadequacy of the proposed universals,
and ‘redundancy’ arises when a proposed UG principle proves to be accurate but
unnecessary as the same ground is covered by “learning procedures that must be
assumed by all accounts” and thus obviate the need “for the innate principle or
constraint” (7). APL’s claim is that all proposed UG principles suffer from one
or another of these failings.

Now far be it from me to defend the perfection of extant UG
proposals (btw, the principles APL discusses are vintage LGB conceptions, so I
will stick to these).[3]
Even rabid defenders of the generative enterprise (e.g. me) can agree that the
project of defining the principles of UG is not yet complete. However, this is
not APL’s point: their claim is that the proposals are obviously defective and clearly
irreparable. Unfortunately, the paper contains not a single worthwhile
argument, though it does relentlessly deploy two argument forms: (i) The Argument
from copious citation (ACC), (ii) The Argument from unspecified alternatives
(AUA). It combines these two basic
tropes with one other: ignorance of the relevant GB literature. Let me
illustrate.

The first section is an attack on the assumption that we
need assume some innate specification of syntactic categories so as to explain
how children come to acquire them, e.g. N, V, A, P etc. APL’s point is that distributional analysis
suffices to ground categorization without this parametric assumption. Indeed,
the paper seems comfortable with the idea that the classical proposals critiqued
“seem to us to be largely along the right lines (16),” viz. that “[l]earners
will acquire whatever syntactic categories are present in a particular language
they are learning making use of both distributional …and semantic
similarities…between category members (16).” So what’s the problem? Well, it seems
that categories vary from language to language and that right now we don’t have
good stories on how to accommodate this range of variation. So, parametric theories
seeded by innate categories are incomplete and, given the conceded need for
distributional learning, not needed.
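To make APL's alternative concrete, here is a minimal sketch of what a purely distributional categorizer might look like (my own illustration; APL spell out no algorithm, and the corpus and function names are invented for the example): characterize each word by the words immediately adjacent to it, then compare profiles. Words of the same category end up with overlapping profiles, and nothing in the procedure ever calls the resulting groups "nouns" or "verbs".

```python
from collections import Counter

# A six-sentence stand-in for the child's input.
corpus = [
    "the dog sleeps", "the cat sleeps", "a dog runs",
    "a cat runs", "the bird sings", "a bird sleeps",
]

def context_profile(word, sentences):
    """Count the immediate left and right neighbors of `word`."""
    profile = Counter()
    for sent in sentences:
        ws = sent.split()
        for i, w in enumerate(ws):
            if w == word:
                if i > 0:
                    profile["L:" + ws[i - 1]] += 1  # left neighbor
                if i < len(ws) - 1:
                    profile["R:" + ws[i + 1]] += 1  # right neighbor
    return profile

def shared_contexts(p, q):
    """Crude similarity: how many context features two words share."""
    return len(set(p) & set(q))

dog, cat, sleeps = (context_profile(w, corpus) for w in ("dog", "cat", "sleeps"))
```

On this toy corpus, "dog" and "cat" share all their contexts (both follow "the"/"a" and precede "sleeps"/"runs") while "dog" and "sleeps" share none, so a clustering step over these profiles would group the nouns together without any innate category labels. Whether such profiles suffice for real input, and what the resulting clusters buy you without names like N and V, is precisely what is at issue below.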

Interestingly, APL does not discuss how distributional learning
is supposed to achieve categorization. APL is probably assuming non-parametric
models of categorization. However, to function, these latter require
specifications of the relevant features that are exploited for categorization. APL,
like everyone else, assumes (I suspect) that we humans follow principles like
“group words that denote objects together,” “group words that denote events
together,” “group words with similar “endings” together,” etc. APL’s point is
that these are not domain specific
and so not part of UG (see p.12). APL is fine with innate tendencies, just not
language particular ones like “tag words that denote objects as Nouns,” “tag words that denote events
as Verbs.” In short, APL’s point is that calling the groups acquired nouns,
verbs, etc. serves no apparent linguistic function. Or does it?

Answering this question requires asking why UG distinguishes
categories, e.g. nouns from verbs. What’s the purpose of distinguishing N or V in
UG? To ask this question another way: which GB module of UG cares about Ns, Vs,
etc? The only one that I can think of is the Case Module. This module identifies
(i) the expressions that require case (Nish things), (ii) those that assign it
(P and Vish things) and (iii) the configurations under which the assigners
assign case to the assignees (roughly government). I know of no other part of
UG that cares much about category labels.[4][5]

If this is correct, what must an argument aiming to show
that UG need not natively specify categorical classes show? It requires showing
that the distributional facts that Case Theory (CT) concerns itself with can be
derived without such a specification. In other words, even if categorization
could take place without naming the categories categorized, APL would need to
show that the facts of CT could also be derived without mention of Ns and Vs
etc. APL doesn’t do any of this. In fact, APL does not appear to know that the
facts about CT are central to UG’s adverting to categorical features.

Let me put this point another way: Absent CT, UG would
function smoothly if it assigned arbitrary tags to word categories, viz. ‘1’,
‘2’ etc. However, given CT and its role
in regulating the distribution of nominals (and forcing movement) UG needs category
names. CT uses these to explain data like: *It
was believed John to be intelligent, or *Mary
to leave would be unwise or *John
hopes Bill to leave or *who do you
wanna kiss Bill vs. who do you wanna
kiss. To argue against categories in UG requires deriving these kinds of
data without mention of N/V-like categories. In other words, it requires
deriving the principles of CT from non-domain specific procedures. I personally
doubt that this is easily done. But, maybe I am wrong. What I am not wrong
about is that absent this demonstration we can’t show that an innate
specification of categories is nugatory. As APL doesn't address these concerns
at all, its discussion is irrelevant to the question they purport to address.

There are other problems with APL’s argument: it has lots of
citations of “problems” pre-specifying the right categories (i.e. ACC), lots of
claims that all that is required is distributional analysis, but it contains no
specification of what the relevant features to be tracked are (i.e. AUA). Thus,
it is hard to know if they are right that the kinds of syntactic priors that
Pinker and Mintz (and Gleitman and Co., sadly absent from the APL discussion)
assume can be dispensed with.[6]
But, all of this is somewhat beside the point given the earlier point: APL
doesn’t correctly identify the role that categories play in UG, and so the presented
argument, even if correct, doesn’t
address the relevant issues.

The second section deals with learning basic morphosyntax.
APL frames the problem in terms of divining the extension of notions like
SUBJECT and OBJECT in a given language. It claims that nativists require that
these notions be innately specified parts of UG because they are “too abstract
to be learned” (18).

I confess to being mystified by the problem so construed. In
GB world (the one that APL seem to be addressing), notions like SUBJECT and
OBJECT are not primitives of the theory. They are purely descriptive notions,
and have been since Aspects. So, at least in this little world, whether
such notions can be easily mapped to external input is not an important problem. What the GB version of UG does need is a
mapping to underlying structure (D-S(tructure)). This is the province of theta
theory, most particularly UTAH in some version. Once we have DS, the rest of UG
(viz. case theory, binding theory, ECP) regulate where the DPs will surface in
S-S(tructure).

So though GB versions of UG don’t worry about notions like
SUBJECT/OBJECT, they do need notions that allow the LAD to break into the
grammatical system. This requires primitives with epistemological priority (EP) (Chomsky’s term) that allow the LAD
to map PLD onto grammatical structure. Agent
and patient seem suited to the task
(at least when suitably massaged as per Dowty and Baker). APL discusses Pinker’s version of this kind
of theory. Its problem with it? APL claims that there is no canonical mapping
of the kind that Pinker envisages that covers every language and every
construction within a language (20-21). APL cites work on split ergative
languages and notes that deep ergative languages like Dyirbal may be particularly
problematic. It further observes that many of these problems raised by these
languages might be mitigated by adding other factors (e.g. distributional
learning) to the basic learning mechanism. However, and this is the big point,
APL concludes that adding such learning obviates the need for anything like
UTAH.

APL’s whole discussion is very confused. As APL note, the
notions of UG are abstract. To engage it, we need a few notions that enjoy EP.
UTAH is necessary to map at least some
input smoothly to syntax (note: EP does not require that every input to the syntax be mapped via
UTAH to D-S). There need only be a core set of inputs that cleanly do so in
order to engage the syntactic system. Once primed other kinds of information
can be used to acquire a grammar. This is the kind of process that Pinker
describes. This obviates the need for a general
UTAH like mapping.

Interestingly, APL agrees with Pinker’s point, but it bizarrely
concludes that this obviates the need for EPish notions altogether, i.e. for finding
a way to get the whole process started. However, the fact that other factors
can be used once the system is
engaged does not mean that the system can be engaged without some way to get it
going. Given a starting point, we can move on. APL doesn’t explain how to get
the enterprise off the ground, which is too bad, as this is the main problem
that Pinker and UTAH address.[7]
So once again, APL’s discussion fails to engage UG’s main worry: how to
initially map linguistic input onto DS so that UG can work its magic.

APL has a second beef with UTAH-like assumptions. APL
asserts that there is just so much variation cross linguistically that there
really is NO possible canonical
mapping to DS to be had. What’s APL’s argument? Well, the ACC: argument by
citation. The paper cites research that claims there is unbounded variation in
the mapping principles from theta roles to syntax and concludes that this is
indeed the case. However, as any moderately literate linguist knows, this is
hotly contested territory. Thus, to make the point APL wants to make responsibly requires adjudicating these
disputes. It requires discussing e.g. Baker’s and Legate’s work and showing
that their positions are wrong. It does not
suffice to note that some have argued
that UTAH-like theories cannot work if others have argued that they can. Citation is not argumentation, though APL
appears to treat it as if it were. There has
been quite a bit of work on these topics within the standard tradition that APL
ignores (Why? Good question). The absence of any discussion renders APL’s conclusions
moot. The skepticism may be legitimate (i.e. it is not beside the point).
However, nothing APL says should lead any sane person to conclude that the
skepticism is warranted as the paper doesn’t exercise the due diligence
required to justify its conclusions. Assertions are a dime a dozen. Arguments
take work. APL seems to confuse the first for the second.

The first two sections of APL are weak. The last three
sections are embarrassing. In these, APL fully exploits AUAs and concludes that
principles of UG are unnecessary. Why? Because the observed effects of UG
principles can all be accounted for using pragmatic discourse principles that
boil down to the claim that “one cannot extract elements of an utterance that
are not asserted, but constitute background information” …and “hence that only
elements of a main clause can be extracted or questioned” (31-32). For the case
of structure dependence, APL supplements this pragmatic principle with the further
assertion that “to acquire a structure-dependent grammar, all a learner has to
do is to recognize that strings such as the
boy, the tall boy, war and happiness share both certain functional and – as a consequence –
distributional similarities” (34). Oh boy!! How bad is this? Let me count some of the ways.

First, there is no semantic or pragmatic reason why back-grounded
information cannot be questioned. In fact, the contention is false. Consider
the Y/N question in (1) and appropriate negative responses in (2):

(1) Is it the case that eagles that can fly can swim?

(2) a. No, eagles that can SING can swim

b. No, eagles that can fly can SING

Both (2a,b) are fine answers to the question in (1). Given
this, why can we form the question with answer (2b) as in (3b) but not the
question conforming to the answer in (2a) as in (3a)? Whatever is going on has nothing to do with whether it is possible
to question the content of relative clause subjects. Nor is it obvious how
“recogniz[ing] that strings such as the
boy, the tall boy, war and happiness share both certain functional …and distributional
similarities” might help matters.

(3) a. *Can eagles that fly can swim?

b. Can eagles that can fly swim?

This is not a new point and it is amazing how little APL has
to say about it. In fact, the section on structure dependence quotes and seems
to concede all the points made in the Berwick et al. 2011 paper (see here).
Nonetheless APL concludes that there is no problem in explaining the structure
dependence of T to C if one assumes that back-grounded info is frozen for
pragmatic reasons. However, as this is obviously false, as a moment’s thought
will show, APL’s alternative “explanation” goes nowhere.

Furthermore, APL doesn’t really offer an account of how
back-grounded information might be relevant as the paper nowhere specifies what
back-grounded information is or in which contexts it appears. Nor does APL explicitly offer any pragmatic
principle that prevents establishing syntactic dependencies with back-grounded
information. APL has no trouble specifying the GB principles it critiques, so I
take the absence of a specification of the pragmatic theory to be quite
telling.

The only hint APL provides as to what it might intend (again
copious citations, just no actual proposal) is that because questions ask for new
information and back-grounded structure is old information it is impossible to
ask a question regarding old information (cf. p. 42). However, this, if it is what APL has in mind (which, again, is unclear, as the paper never actually makes the argument explicitly), is both false and irrelevant.

It is false because we can focus within a relative clause island,
the canonical example of a context where we find back-grounded info (cf. (4a)).
Nonetheless, we cannot form the question (4b) for which (4a) would be an
appropriate answer. Why not? Note, it cannot be because we can’t focus within
islands, for we can as (4a) indicates.

(4) a. John likes the man wearing the RED scarf
    b. *Which scarf does John like the man who wears?

Things get worse quickly. We know that there are languages
that in fact have no trouble asking questions (i.e. asking for new info) using
question words inside islands. Indeed, a good chunk of the last thirty years of
work on questions has involved wh-in-situ
languages like Chinese or Japanese where these kinds of questions are all perfectly
acceptable. You might think that APL’s claims concerning the pragmatic
inappropriateness of questions from back-grounded sources would discuss these
kinds of well-known cases. You might, but you would be wrong. Not a peep. Not a
word. It’s as if the authors didn’t even know such things were possible (nod
nod wink wink).

But it gets worse still: ever since forever (i.e. from Ross) we have known that island effects per se are not restricted to questions. The same effects appear with structures having nothing to do with focus, e.g. relativization and topicalization, to name two relevant constructions. These exhibit the very same island effects that questions do, but in these constructions the manipulanda do not involve focused information at all. If the problem is asking for new info from a back-grounded source, then why can operations that target old, back-grounded information not form dependencies into the relative clause? The central fact about islands is that it really doesn’t matter what the moved element means: you cannot move it out (‘move’ here denotes a particular kind of grammatical operation). Thus, if you can’t form a question via movement, you can’t relativize or topicalize using movement either. APL does not seem acquainted with this well-established point.

One could go on: e.g. resumptive pronouns can obviate island effects while their non-resumptive analogues cannot, despite semantic and pragmatic informational equivalence; and languages like Swedish/Norwegian allow extraction from certain islands, contrary to what APL suggests. All of this is relevant to APL’s claims concerning islands. None of it is discussed, nor even hinted at. Without mention of these factors, APL once again fails to address the problems that UG based accounts have worried about and discussed for the last 30 years. As such, the critique advanced in this section on islands is, once again, largely irrelevant.

APL’s last section on binding theory (BT) is more of the same. The account of principle C effects in cases like (5) relies on another pragmatic principle, viz. that it is “pragmatically anomalous to use a full lexical NP in part of the sentence that exists only to provide background information” (48). It is extremely unclear what this might mean. However, on at least the most obvious reading, it is either incorrect or much too weak to account for principle C effects. Thus, one can easily get full NPs within back-grounded structure (e.g. relative clauses like (4a)). But within the relative clause (i.e. within the domain of back-grounded information)[8], we still find principle C effects (contrast (5a,b)).

(5) a. John met a woman who knows that Frank1 loves his1 mother
    b. *John met a woman who knows that he1 loves Frank’s1 mother

The discussion of principles A and B is no better. APL does not explain how pragmatic principles explain why reflexives must be “close” to their antecedents (*John said that Mary loves himself or *John believes him/himself is tall), why they cannot be anteceded by John in structures like John’s mother upset himself (where the antecedent is local but fails to c-command the reflexive), why they must be preceded by their antecedents (*Mary believes himself loves John), etc. In other words, APL does not discuss BT and the facts that have motivated it at all, and so the paper provides no evidence for the conclusion that BT is redundant and hence without explanatory heft.
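The c-command condition APL leaves unaddressed is easy to state precisely: a node c-commands its sister and everything the sister contains. A minimal sketch (my own nested-list tree encoding, purely illustrative):

```python
# A node c-commands its sister and everything the sister dominates.

def contains(t, x):
    """True if tree t is, or dominates, node x."""
    return t == x or (isinstance(t, list) and any(contains(c, x) for c in t))

def c_commands(tree, a, b):
    """True if node a c-commands node b in the nested-list tree."""
    if not isinstance(tree, list):
        return False
    for i, child in enumerate(tree):
        siblings = tree[:i] + tree[i + 1:]
        if child == a and any(contains(s, b) for s in siblings):
            return True
        if c_commands(child, a, b):
            return True
    return False

# "John's mother upset himself": [[NP [John 's] mother] [VP upset himself]]
tree = [[["John", "'s"], "mother"], ["upset", "himself"]]
print(c_commands(tree, "John", "himself"))                      # → False
print(c_commands(tree, [["John", "'s"], "mother"], "himself"))  # → True
```

On this encoding, John buried inside the possessor fails to c-command himself, while the whole subject NP does, which is exactly the asymmetry the binding facts above track and which a purely pragmatic account has to reconstruct somehow.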

This has been a long post. I am sorry. Let me end. APL is a
dreadful paper. There is nothing there. The question then is why did Perspectives accept it for publication? Why
would a linguistics venue accept such
a shoddy piece of work on linguistics
for publication? It’s a paper that displays no knowledge of the relevant
literature, and presents not a single argument (though assertions aplenty) for
its conclusions. Why would a journal sponsored by the LSA allow the linguistic
equivalent of flat-earthism to see the light of day under its imprimatur? I can
think of only one reasonable explanation for this: the editors of Language have decided to experiment with
a journal that entirely does away with the review process. And I fear I am to
blame. The moral: always listen to your mother.

[2]
There are a couple of other possibilities that I have dismissed out of hand:
(i) that the editors thought that this paper had some value and (ii) linguistic
self-loathing has become so strong that anything that craps on our discipline
is worthy of publication precisely because it dumps on us. As I said, I am
putting these terrifying possibilities aside.

[4]
Bounding theory cares too (NP is, but VP is not, a bounding node). APL discusses island effects and I discuss their points below. However, suffice it to say, if we need something like a specification of bounding nodes, then we need to know, among other things, which groups are Nish and which are not.

[5]
X’ theory will project the category of the head of a phrase to the whole
phrase. But what makes something an NP requiring case is that N heads it.

[6]
APL also seems to believe that unless the same categories obtain cross-linguistically they cannot be innate (cf. p. 11). This confuses Greenberg’s
conception of universals with Chomsky’s, and so is irrelevant. Say that the following principle “words that
denote events are grouped as V” is a prior that can be changed given enough
data. This does not imply that the acquisition of linguistic categories can
proceed in the absence of this prior. Such a prior would be part of UG on
Chomsky’s conception, even if not on Greenberg’s.

[7]
It’s a little like saying that you can get to New York using a good compass
without specifying any starting point. Compass readings are great, but not if
you don’t know where you are starting from.

[8]
Just to further back-ground the info, (5) embeds the relative clause within know, which treats the embedded information as presupposed.

Friday, October 18, 2013

I have just finished reviewing two papers for possible publication. As you all know, this is hard work. You gotta read the thing, think about it rationally, judge it along several dimensions concluding in an overall assessment, and then write this all up in comprehensible, and hopefully, helpful prose. I find any one of these activities exhausting, and I thus don't count reviewing as among my top 10 favorite pastimes. Given my distaste, I have begun to wonder how valuable all this expended effort is. After all, it's worth doing if the process has value even if it is a pain (sort of like dieting or regular exercise). So is it and does it?

Sadly, this has become less and less clear to me of late. In a recent post (here) I pointed you to some work aimed at evaluating the quality of journal publications. Yesterday, I ran across a paper published in PLoS Biology (here) that, if accurate, suggests that the return on investment for reviewing is remarkably slight. Why? Because, as the authors (Adam Eyre-Walker and Nina Stoletzki: E-WS) put it: "scientists are poor at estimating the merit of a scientific publication" (6). How poor? Very. Why? Because (i) there is very low intersubjective agreement after the fact on what counts as worthwhile, (ii) there is a strong belief that a paper published in a prestige journal is ipso facto meritorious (though as E-WS show, this is not a well-justified assumption), and (iii) post hoc citation indices are very noisy indicators of merit. So, it seems that there is little evidence for the common assumption that the cream rises to the top, or that the best papers get published in the best journals, or that the cumbersome and expensive weeding process (i.e. reviewing) really identifies the good stuff and discards the bad.

Now, linguists might respond to this by saying that this all concerns papers in biology, not hard sciences like syntax, semantics and psycholinguistics. And, of course, I sympathize. However, more often than not, what holds in one domain has analogues in others. Would it really surprise you to find out that the same holds true in our little domain of inquiry, even granted the obvious superior intellect and taste of generative grammarians?

Say this is true, what's to be done? I really don't know. Journals, one might think, play three different roles in academia. First, they disseminate research. Second, they help set the direction of research by ranking it into different piles of excellence. Third, they are used for promotion and tenure and for the distribution of scarce research resources, aka grants. In this internet age, journals no longer serve the first function. As for the second, the E-WS results suggest that the effort is not worth it, at least if the aim is to find the good stuff and discard the bad. The last then becomes the real point of the current system: it's a kind of arbitrary way of distributing scarce resources, though if the E-WS findings are correct, journals are the academic equivalent of dice, with luck more than merit determining the outcome.

After a series of technical posts, I really feel like kicking back and waxing all poetic about some lofty idea without the constant risk of saying something that is provably nonsense. Well, what could be safer and loftier and fluffier than to follow Norbert's example and speculate on the future of my field? Alas, how narrowly should "my field" be construed? Minimalist grammars? Computational linguistics? Linguistics? Tell you what, let's tackle this inside-out, starting with Minimalist grammars. Warning: Lots of magic 8-ball shaking after the jump.

Tuesday, October 15, 2013

I ran across this piece (here) discussing the accuracy of student evaluations in measuring teaching effectiveness that I thought you might find interesting. These have begun to play a very big role in promotion and tenure cases, as well as guiding the conscientious with regards to their teaching. As regards the latter, there is no question of their utility. Concerning the former, the points made seem more than a little relevant.

Monday, October 14, 2013

It’s not news that syntactic structure is perceptible in the absence of meaningfulness. After all, even though the slithy toves did gyre and gimble in the wabe, still and all colorless green ideas do sleep furiously. Not news maybe, but still able to instruct, as I found out in getting ready for this year’s Baggett Lectures (see here). The invited speaker
this year is Stanislas
Dehaene and to get ready I’ve read (under the guidance of my own personal
Virgil, Ellen Lau, guide to the several nether circles of neuroscience) some
recent papers by him on how the brain does syntax. One (here)
was both short (a highly commendable property) and absorbing. Like many
neuroscience efforts, this one is collaborative, brought to you by the French
team of Pallier, Devauchelle and Dehaene (PDD). I found the PDD results (and
methods to the degree that I could be made to appreciate them) thought
provoking. Here’s why.

The paper starts from the conventional grammatical assumption
that “sentences are not mere strings of words but possess a hierarchical
structure with constituents nested inside each other” (2522).[1]
PDD’s task is to find out where, if anywhere, the brain tracks/builds this
hierarchy. PDD construct a very clever model that allows them to use fMRI
techniques to zero in on those regions sensitive to hierarchical structure.

Before proceeding, it’s worth noting that this takes quite a
bit of work. Part of what makes the paper fun, is the little model that allows
PDD to index hierarchy to differential blood flows (the BOLD
(blood-oxygen-level-dependent) response, which is what an fMRI tracks). It
predicts a linear relationship between the BOLD response and phrasal size (roughly indexed by length in words, a “useful approximation”), and they use this relationship to probe ROIs (i.e. regions of interest (Damn I love these acronyms)) that respond to this predicted relationship using two different kinds of linguistic probes. The first are strings of words containing phrases ranging from 1 to 12 words long (actually 12/6/4/3/2/1; e.g. 12: I believe that you should accept the proposal of your new associate; 4: mayor of the city he hates this color they read their names). The second are jabberwocky strings with the same structure (e.g. I tosieve that you should begept the tropufal of your tew viroate). Here’s what they found:

1. They found brain regions that responded to these probes in the predicted linear manner: four in the STS region, and two in the left inferior frontal gyrus (IFGtri and IFGorb).

2. The regions responded differentially to the two kinds of linguistic probes. Thus, all of these regions responded to the first kind of probe (“normal prose”) while jabberwocky only elicited responses in IFGorb (with some response, at a lower statistical threshold, in left posterior STS and IFGtri).

In sum, different brain regions light up exclusively to
phrasal syntax independent of content words. Thus, the brain seems to
distinguish contentful morphemes from more functional ones and it does so by
processing the relevant information in different
regions.
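The linear-probe logic is simple enough to sketch numerically. A minimal sketch of fitting BOLD amplitude against constituent size (the sizes match the 12/6/4/3/2/1-word conditions described above, but the BOLD numbers and the closed-form regression are hypothetical illustrations, not PDD's data or methods):

```python
# Ordinary least squares fit of BOLD amplitude against constituent size,
# the linear relationship PDD use to identify responsive ROIs.

def ols_fit(xs, ys):
    """Closed-form simple linear regression: y ~ slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

sizes = [1, 2, 3, 4, 6, 12]                # words per constituent (PDD's conditions)
bold = [0.2, 0.35, 0.5, 0.65, 0.95, 1.85]  # hypothetical amplitudes for one ROI
slope, intercept = ols_fit(sizes, bold)
print(round(slope, 3), round(intercept, 3))  # → 0.15 0.05
```

A region counts as "tracking constituency" on this logic to the extent that its response fits such a line; a region indifferent to phrasal size would show a slope near zero.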

And this is interesting, I believe. Why?

Well, think of what PDD could have found. One possibility is
that all words are effectively the same; the only difference between content
words and functional vocabulary residing in their statistical frequency, the
closed class content words being far more common than the open class contentful
vocab. Were this correct, then we should expect no regional segregation of the
two kinds of vocab, just, say, a bigger or smaller response based on the
lexical richness of the input. Thus, we might have expected all regions to respond equally to all of
the inputs though the size of the response would have differed. But this is not what PDD found. What they found is
differential activity across various regions with one group of sites responding
exclusively to structural input even in the absence of meaningful content. This
sure smells a lot like the wetware analogue of the autonomy of syntax thesis
(yes, I was not surprised, and yes, I
was nonetheless chuffed). PDD (2526) make exactly this autonomy of syntax point
in noting that their results underline “the relative independence of syntax
from lexico-semantic features.”

Second, the results might
have implications for grammar lexicalization (GL) (an idea that Thomas has been
posting about recently). From what I can tell, GL sees grammatical dependencies
as the byproduct of restrictions coded as features on lexical terminals. Grammatical
dependencies on the GL view just are the sum total of lexical dependencies. If
this is correct, then a question arises: what does this kind of position lead
us to expect about grammatical structure in the absence of such lexical
information? I assume that Jabberwocky vocab is not part of our lexicon and so a grammar that exclusively builds
structure based on info coded in lexical terminals will not have access to (at
least some) grammatically relevant information in Jabberwocky input (e.g. how
to combine the and slithy toves in the absence of features
on the latter?). Does this mean that we should not be able to build syntactic
structure in the absence of the relevant terminals or that we will react
differently to input composed of “real” lexemes vs Jabberwocky vocab? Behaviorally,
we know that we can distinguish well- from ill-formed Jabberwocky. So we know
that the absence of a lot of lexical vocab does not impede the construction of
syntactic structure. PDD further shows that neurally there are parts of the
brain that respond to syntactic structure regardless of the presence of featurally
marked terminals (and recall, that this need not have been the case). A non-GL
view of grammar has no problems with this, as grammatical structure is not a by-product
of lexical features[2]
(at least not in general, only the
functional vocab is possibly
relevant).[3]
I cannot tell if this is a puzzle if one takes a fully GL view of things, but
it certainly indicates that parts of the brain seem tuned to structure without
much apparent lexical content. Indeed, it suggests (at least to me) that GL
might have things backwards: it’s not that grammatical structure arises from
lexical specifications but that lexical specifications are by-products of
enjoying certain syntactic relations. Formally,
this might be a distinction without a difference, but psychologically and
neurally, these two ways of viewing the ontogeny of structure look like they
might (note the weasel word here) have very different empirical consequences.
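The contrast at stake can be caricatured in a few lines. A minimal sketch (a toy of my own construction, not a claim about any actual GL formalism): a fully lexicalized grammar projects structure only from per-word feature listings, so unknown Jabberwocky terminals block it, while a category-level grammar can still posit an NP from the functional item alone.

```python
# Toy contrast between a fully lexicalized (GL) combiner and a
# category-level one. LEXICON and both functions are illustrative.

LEXICON = {"the": "Det", "slithy": "Adj", "toves": "N"}

def gl_category(word, lexicon):
    """GL view: no lexical entry, no category, hence nothing to project."""
    return lexicon.get(word)  # None for unlisted Jabberwocky vocab

def category_level_np(words, lexicon):
    """Non-GL view: a Det signals NP structure; unknown words fill N/Adj slots."""
    return len(words) >= 2 and lexicon.get(words[0]) == "Det"

jabber = ["the", "brillig", "wabes"]
print(gl_category("brillig", LEXICON))     # → None: GL has nothing to build from
print(category_level_np(jabber, LEXICON))  # → True: structure from 'the' alone
```

The real question the PDD result raises for GL is the neural analogue of this contrast: the brain keeps building structure when terminal features are missing, which fits the category-level caricature better than the fully lexicalized one.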

I am sure that I have over-interpreted the PDD results. The
structure they probe is very simple (just right branching phrase structure) and
the results rely on some rather radical simplifications. However, I am told that this is the current state of the fMRI art and even with these caveats, PDD
is interesting and thought provoking, and, who knows, maybe more than a little
relevant to how we should think about the etiology of grammar. It seems that brains,
or at least parts of brains, are sensitive to structure regardless of what that
structure contains. This is an old idea, one perfectly expected given a pretty
orthodox conception of the autonomy of syntax. It’s nice to see that this same
conception is leading to new and intriguing work investigating how brains go
about building structure.

[1]
This truism, sadly, is not always acknowledged. For example, it is still
possible for psychologists to find a publishing outlet of considerable repute
for papers that deny this (see here).
Fortunately, flat-earthism seems to be losing its allure in neuroscience.

[2]
Note that this does not entail that
grammatical information might not be lexicalized for tasks like parsing. It
only implies that grammatical dependencies are more primitive than lexical
ones. The former determine the latter, not vice versa.

[3]
I say ‘possibly’ as the functional vocab might just be the morphological
outward manifestations of grammatical dependencies rather than the elements
from which such dependencies are formed. There is no obvious reason for
thinking that structure is there because
of the functional vocab, though there is reason for assuming that functional
vocab and syntactic structure are correlated, sometimes strongly.