
Thursday, February 23, 2017

In a recent book (here), Chomsky wants to run an argument to explain why Merge, the Basic Operation, is so simple. Note the ‘explain’ here. And note how ambitious the aim is. It goes beyond explaining the “Basic Property” of language (i.e. that natural language Gs (NLGs) generate an unbounded number of hierarchically structured objects that are both articulable and meaningful) by postulating the existence of an operation like Merge. It goes beyond explaining why NLGs contain both structure building and displacement operations, why displacement is necessarily to c-commanding positions, why reconstruction is an option, and why rules are structure dependent. These latter properties are explained by postulating that NLGs must contain a Merge operation and arguing that the simplest possible Merge operation will necessarily have these properties. Thus, the best Merge operation will have a bunch of very nice properties.

This latter argument is interesting enough. But in the book
Chomsky goes further and aims to explain “[w]hy language should be optimally
designed…” (25). Or to put this in Merge terms, why should the simplest possible Merge operation be the one that we
find in NLGs? And the answer Chomsky is looking for is metaphysical, not epistemological.

What’s the difference? It’s roughly this: even granted that Chomsky’s version of Merge is the simplest, and granted that on methodological grounds simple explanations trump more complex ones, the question remains: given all of this, why should the conceptually simplest operation be the one that we in fact have? Why should methodological superiority imply truth in this case? That’s the question Chomsky is asking and, IMO, it is a real doozy and so worth considering in some detail.

Before starting, a word about the epistemological argument. We all agree that simpler accounts trump more complex ones. Thus if some account A involves fewer assumptions than some alternative account A’, and if both are equal in their empirical coverage (btw, none of these ‘if’s ever hold in practice, but were they to hold then…), we all agree that A is to be preferred to A’. Why? Well, because in an obvious sense there is more independent evidence in favor of A than there is for A’, and we all prefer theories whose premises have the best empirical support. To get a feel for why this is so let’s analogize hypotheses to stools. Say A is a three-legged stool and A’ a four-legged one. Say that evidence is weight that these stools support. Given a constant weight W, each leg on the A stool supports more weight than each leg of the A’ stool: W/3 rather than W/4, a difference of W/12, or about 8% of the total weight. So each of A’s assumptions is better empirically supported than each of those made by A’. Given that we prefer theories whose assumptions are better supported to those that are less well supported, A wins out.[1]

None of this is suspect. However, none of this implies that the simpler theory is the true one. The epistemological privilege carries metaphysical consequences only if buttressed by the assumption that empirically better supported accounts are more likely to be true and, so far as I know, there is actually no obvious story as to why this should be the case, short of asking Descartes’s God to guarantee that our clear and distinct ideas carry ontological and metaphysical weight. A good and just God would not deceive us, would she?

Chomsky knows all of this and indeed often argues in the conventional
scientific way from epistemological superiority to truth. So, he often argues
that Merge is the simplest operation that yields unbounded hierarchy with many
other nice properties and so Merge is the true Basic Operation. But this is not what Chomsky is attempting here. He
wants more! Hence the argument is interesting.[2]

Ok, Chomsky’s argument. It is brief and not well fleshed
out, but again it is interesting. Here it is, my emphasis throughout (25).

Why should language be optimally
designed, insofar as the SMT [Strong Minimalist Thesis, NH] holds? This
question leads us to consider the origins of language. The SMT hypothesis fits
well with the very limited evidence we have about the emergence of language,
apparently quite recently and suddenly
in the evolutionary time scale…A fair guess today…is that some slight rewiring
of the brain yielded Merge, naturally in
its simplest form, providing the basis for unbounded and creative thought,
the “great leap forward” revealed in the archeological record, and the
remarkable difference separating modern humans from their predecessors and the
rest of the animal kingdom. Insofar as the surmise is sustainable, we would
have an answer to questions about apparent optimal design of language: that is
what would be expected under the postulated circumstances, with no selectional or other pressures operating, so the emerging
system should just follow laws of nature,
in this case the principles of Minimal
Computation – rather the way a snowflake forms.

So, the argument is that the evolutionary scenario for the emergence of FL (in particular its recent vintage and sudden emergence) implies that whatever emerged had to be “simple,” and, to the degree we have the evo scenario right, we have an account for why Merge has the properties it has (i.e. recency and suddenness implicate a simple change).[3] Note again that this goes beyond any methodological arguments for Merge. It aims to derive Merge’s simple features from the nature of selection and the particulars of the evolution of language. Here Darwin’s Problem plays a very big role.

So how good is the argument? Let me unpack it a bit more
(and here I will be putting words into Chomsky’s mouth, always a fraught
endeavor (think lions and tamers)). The argument appears to make a four way
identification: conceptual simplicity = computational simplicity = physical
simplicity = biological simplicity. Let me elaborate.

The argument is that Merge in its “simplest form” is an operation that combines expressions into sets of those expressions. Thus, for any A, B: Merge (A, B) yields {A, B}. Why sets? Well, the argument is that sets are the simplest kinds of complex objects there are. They are simpler than ordered pairs in that the things combined are not ordered, just combined. Also, the operation of combining things into sets does not change the expressions so combined (no tampering). So the operation is arguably as simple a combination operation as one can imagine. The assumption is that the rewiring that occurred triggered the emergence of the conceptually simplest operation. Why?
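
To make the intended simplicity concrete, here is a minimal sketch (mine, not the book’s) of set-forming Merge, using Python frozensets so that merged objects can themselves be merged:

```python
# A toy rendering of set-forming Merge. frozenset is used so that
# the output of one Merge can be the input to another.

def merge(a, b):
    """Combine two syntactic objects into the unordered set {a, b}.
    No order is imposed and neither input is altered (no tampering)."""
    return frozenset([a, b])

# Build {saw, {the, dog}}: hierarchy with no linear order anywhere.
dp = merge("the", "dog")
vp = merge("saw", dp)
print(vp)  # frozenset({'saw', frozenset({'dog', 'the'})})
```

An ordered-pair version would have to encode something extra: which element comes first. That additional bit of structure is exactly what the set-based operation declines to carry.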

Step two: say that conceptually simple operations are also computationally simple. In particular, assume that it is computationally less costly to combine expressions into simple sets than to combine them as ordered elements (e.g. ordered pairs). If so, the conceptually simpler an operation, the less computational effort is required to execute it. So, simple concepts imply minimal computations, and physics favors the computationally minimal. Why?

Step three: identify computational with physical simplicity.
This puts some physical oomph into “least effort,” it’s what makes minimal
computation minimal. Now, as it happens, there are physical theories that tie
issues in information theory with physical operations (e.g. erasure of
information plays a central role in explaining why Maxwell’s demon cannot
compute its way to entropy reversal (see here on the
Landauer Limit)).[4]
The argument above seems to be assuming something similar here, something tying
computational simplicity with minimizing some physical magnitude. In other
words, say computationally efficient systems are also physically efficient so
that minimizing computation affords physical advantage (minimizes some physical
variable). The snowflake analogy plays a role here, I suspect, the idea being
that just as snowflakes arrange themselves in a physically “efficient” manner,
simple computations are also more physically efficient in some sense to be
determined.[5]
And physical simplicity has biological implications. Why?
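
Before taking the last step, for reference: the Landauer Limit mentioned above is the standard example of such a bridge between information and physics. Erasing one bit of information dissipates at least

$$E_{\min} = k_B T \ln 2$$

of energy, where $k_B$ is Boltzmann’s constant and $T$ is the temperature of the surrounding environment. Note, though, that this bound concerns erasure of information; whether anything analogous can be made to underwrite “minimal computation” in the syntactic sense is part of what remains open below.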

The last step: biological complexity is a function of
natural selection, thus if no selection, no complexity. So, one expects
biological simplicity in the absence of selection,
the simplicity being the direct reflection of simply “follow[ing] the laws of
nature,” which just are the laws of minimal computation, which just reflect
conceptual simplicity.

So, why is Merge simple? Because it had to be! It’s what
physics delivers in biological systems in the absence of selection,
informational simplicity tied to conceptual simplicity and physical efficiency.
And there could be no significant selection pressure because the whole damn
thing happened so recently and suddenly.

How good is this argument? Well, let’s just say that it is
somewhat incomplete, even given the motivating starting points (i.e. the great
leap forward).

Before some caveats, let me make a point about something I liked. The argument relies on a widely held assumption, namely that complexity is a product of selection and that this requires long stretches of time. This suggests that if a given property is relatively simple then it was not selected for but reflects some evolutionary forces other than selection. One aim of the Minimalist Program (MP), one that I think has been reasonably well established, is to show that many of the fundamental features of FL and the Gs it generates are in fact products of rather simple operations and principles. If this impression is correct (and given the slippery nature of the notion “simple” it is hard to make this impression precise) then we should not be looking to selection as the evolutionary source for these operations and principles.

Furthermore, this conclusion makes independent sense.
Recursion is not a multi-step process, as Dawkins among others has rightly
insisted (see here
for discussion) and so it is the kind of thing that plausibly arose (or could have arisen) from a single
mutation. This means that properties of FL that follow from the Basic Operation
will not themselves be explained as products of selection. This is an important
point for, if correct, it argues that much of what passes for contemporary work
on the evolution of language is misdirected. To the degree that the property is
“simple” Darwinian selection mechanisms are beside the point. Of course, what
features are simple is an empirical issue, one that lots of ink has been
dedicated to addressing. But the more mid-level features of FL a “simple” FL
explains the less reason there is for thinking that the fine structure of FL
evolved via natural selection. And this goes completely against current research
in the evo of language. So hooray.

Now for some caveats: First, it is not clear to me what links conceptual simplicity with computational simplicity. A question: versions of the propositional calculus based on negation and conjunction or on negation and disjunction are expressively equivalent. Indeed, one can get away with just one primitive Boolean operation, the Sheffer Stroke (see here). Is this last system more computationally efficient than one with two primitive operations, negation plus conjunction or disjunction? Is one with three (negation, disjunction and conjunction) worse? I have no idea. The more primitives we have the shorter proofs can be. Does this save computational power? How about sets versus ordered pairs? Is having both computationally profligate? Is there reason to think that a “small rewiring” can bring forth a NAND gate but not a negation gate and a conjunction gate? Is there reason to think that a small rewiring naturally begets a merge operation that forms sets but not one that would form, say, ordered pairs? I have no idea, but the step from conceptually simple to computationally more efficient does not seem to me to be straightforward.
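
The expressive-equivalence point itself is easy to see (a small illustration of mine, not Chomsky’s): every Boolean connective can be defined from the Sheffer Stroke (NAND) alone, though the definitions get longer, which is one crude sense in which fewer primitives can mean more work per expression:

```python
# Defining negation, conjunction and disjunction from NAND alone.

def nand(p, q):
    return not (p and q)

def neg(p):             # one stroke
    return nand(p, p)

def conj(p, q):         # three strokes
    return nand(nand(p, q), nand(p, q))

def disj(p, q):         # three strokes
    return nand(nand(p, p), nand(q, q))

# Check equivalence over all truth values.
for p in (True, False):
    for q in (True, False):
        assert neg(p) == (not p)
        assert conj(p, q) == (p and q)
        assert disj(p, q) == (p or q)
print("One primitive suffices; expressions just get longer.")
```

Whether that tradeoff counts as a computational saving or a computational cost is precisely what the text above says is unclear.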

Second, why think that the simplest biological change did not build on pre-existing wiring? So, it is not hard to imagine that non-linguistic animals have something akin to a concatenation operation. Say they do. Then one might imagine that it is just as “simple” to modify this operation to deliver unbounded hierarchy as it is to add an entirely different operation which does so. So even if a set forming operation were simpler than concatenation tout court (which I am not sure is so), it is not clear that it is biologically simpler to ignore this available operation and introduce an entirely new one (Merge) than it is to derive hierarchical recursion from a modified conception of concatenation, given that concatenation already obtains in the organism. If it isn’t (and how to tell, really?) then the emergence of Merge is surprising, given that there might be a simpler evolutionary route to the same functional end (unbounded hierarchical objects via descent with modification, in this case modification of concatenation).[6]

Third, the relation between complexity of computation and
physical simplicity is not crystal clear for the case at hand. What physical
magnitude is being minimized when computations are more efficient? There is a
branch of complexity theory where real physical magnitudes (time, space) are
considered, but this is not the kind
of consideration that Chomsky has generally thought relevant. Thus, there is a
gap that needs more than rhetorical filling: what links the computational
intuitions with physical magnitudes?

Fourth, how good are the motivating assumptions provided by the great leap forward? The argument is built by assuming that Merge is what gets the great leap forward leaping. In other words, the cultural artifacts are a proxy for the time of the “slight rewiring” that afforded Merge and so allowed for FL and NLGs. Thus the recent, sudden dating of the great leap forward is the main evidence for dating the slight change. But why assume that the proximate cause of the leap is a rewiring relevant to Merge, rather than, say, the rewiring that licenses externalization of the Mergish thoughts so that they can be communicated?

Let me put this another way. I have no problem believing
that the small rewiring can stand independent of externalization and be of
biological benefit. But even if one believes this, it may be that large scale
cultural artifacts are the product of not just the rewiring but the capacity to
culturally “evolve” and models of cultural evolution generally have
communicative language as the necessary medium for cultural evolution. So, the
great leap forward might be less a proxy for Merge than for whatever allowed for the externalization of FL-formed thoughts. If this is so, then it is not clear that the sudden emergence of cultural artifacts shows that Merge is relatively recent. It shows, rather, that whatever drove rapid cultural change is relatively recent, and this might not be Merge per se but the processes that allowed for the externalization of Merge-generated structures.

So how good is the whole argument? Well, let’s say that I am not that convinced. However, I admire it, for it tries to do something really interesting. It tries to explain why Merge is simple in a perfectly natural sense of the word. So let me end with this.

Chomsky has made a decent case that Merge is simple in that it involves no tampering and is a very simple “conjoining” operation resulting in hierarchical sets of unbounded size, one that has other nice properties as well (e.g. displacement, structure dependence). I think that Chomsky’s case for such a Merge operation is pretty nice (not perfect, but not at all bad). What I am far less sure of is that it is possible to take the next step fruitfully: explain why Merge has these properties and not others. This is the aim of Chomsky’s very ambitious argument here. Does it work? I don’t see it (yet). Is it interesting? Yup! Vintage Chomsky.

[1]
All of this can be given a Bayesian justification as well (which is what lies
behind derivations of the subset principle in Bayes accounts) but I like my
little analogy so I leave it to the sophisticates to court the stately
Reverend.

[2]
Before proceeding it is worth noting that Chomsky’s argument is not just a
matter of axiom counting as in the simple analogy above. It involves more
recondite conceptions of the “simplicity” of one’s assumptions. Thus even if
the number of assumptions is the same it can still be that some assumptions are
simpler than others (e.g. the assumption that a relation is linear is “simpler”
than that a relation is quadratic). Making these arguments precise is not
trivial. I will return to them below.

[3]
So does the fact that FL has been basically stable in the species ever since it
emerged (or at least since humans separated). Note, the fact that FL did not continue to evolve after the trek out of
Africa also suggests that the “simple” change delivered more or less all of what we think of as FL today. So,
it’s not like FLs differ wrt Binding Principles or Control theory but are
similar as regards displacement and movement locality. FL comes as a bundle and
this bundle is available to any kid
learning any language.

[5]
What do snowflakes optimize? The following (from here, my emphasis [NH]):

The growth of snowflakes (or of any substance changing
from a liquid to a solid state) is known as crystallization. During this
process, the molecules (in this case, water molecules) align themselves to maximize attractive forces and minimize
repulsive ones. As a result, the water molecules arrange themselves in predetermined
spaces and in a specific arrangement. This process is much like tiling a floor
in accordance with a specific pattern: once the pattern is chosen and the first
tiles are placed, then all the other tiles must go in predetermined spaces in
order to maintain the pattern of symmetry. Water molecules simply arrange
themselves to fit the spaces and maintain symmetry; in this way, the different
arms of the snowflake are formed.

[6]
Shameless plug: this is what I try to do here, though strictly speaking concatenation there is not among objects in a 2-space but a 3-space (hence it results in “concatenated” objects with no linear implications).

Sunday, February 12, 2017

I have argued repeatedly that the Minimalist Program (MP)
should be understood as subsuming
earlier theoretical results rather than replacing them. I still like this way
of understanding the place of MP in the history of GG, but there is something
misleading about it if taken too literally. Not wrong exactly, but misleading.
Let me explain.

IMO, MP is to GB (my favorite exemplar of an earlier theory) as Bounding Theory is to Ross’s Islands. Bounding Theory takes as given that Ross’s account of islands is more or less correct and then tries to derive these truths from more fundamental assumptions.[1] Thus, in one important sense, Bounding Theory does not substitute for Ross’s account but aims to explain it, and in so doing to conserve its results more or less.[2]

Just as accurately, however, Bounding Theory does substitute for Ross’s. How so? It conserves but does not recapitulate it. Rather it explains why the things on Ross’s list are there. Furthermore, if successful it will add other islands to Ross’s inventory (e.g. Subject Condition effects) and make predictions that Ross’s did not (e.g. successive cyclicity). So conceived, Ross’s islands are the explananda for which Bounding Theory is the explanans.

Note, and this is important: given this logic, Bounding Theory will inherit any (empirical) problems for Ross’s generalizations. Pari passu for GB and MP. I mention this not because it is the topic of today’s sermonette, but just to observe that many fail to appreciate this when criticizing MP. Here’s what I mean.

One way MP might fail is in adopting the assumption that
GBish generalizations are more or less accurate. If this assumption is
incorrect, then the MP story fails in its presuppositions. And as all good
semanticists know, this is different from failing in one’s assertions. Failing this way makes you not so much wrong as
uninteresting. And MP is interesting, just as Bounding Theory is interesting,
to the degree that what it presupposes is (at least) on the right track.[3]

All of this is by way of (leisurely) introduction to what I want to talk about below. Of the changes MP has suggested, I believe the most fundamental (or, to be mealy mouthed, one of the most fundamental) has been the proposal that we banish strings as fundamental units of grammar. This shift has been long in coming, but one way of thinking about Chomsky’s set theoretic conception of Merge is that it dislodges concatenation as the ontologically (and conceptually) fundamental grammatical relation. Let me flesh this out a bit.

The earliest conception of GG took strings as fundamental,
strings just being a series of concatenated elements. In Syntactic Structures (SS) (and LSLT for which SS was a public
relations brochure) kernel sentences were defined as concatenated objects
generated by PS rules. Structural Descriptions took strings as inputs and
delivered strings (i.e. Structural Changes) as outputs (that’s what the little
glide symbol (which I can’t find to insert) connecting expressions meant).
Thus, for example, a typical rule took as input things like (1) and delivered
changes like (2), the ‘^’ representing concatenation. PS rules are sets of such
strings and transformations are sets of sets of such strings. But the
architecture bottoms out in strings and their concatenative structures.[4]

(1) X^DP1^Y^V^DP2^Z

(2) X^DP2^Y^V+en^by^DP1
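
To see how string-centric this architecture is, here is a toy rendering (mine, not LSLT’s actual formalism) of such a transformation, operating directly on a concatenation of terms:

```python
# A toy passive-like transformation over strings. The structural
# description and the structural change are both just strings of
# terms joined by the concatenation symbol '^'.

def passivize(sd):
    """Map X^DP1^Y^V^DP2^Z to X^DP2^Y^V+en^by^DP1 (schematically)."""
    x, dp1, y, v, dp2, z = sd.split("^")
    return "^".join([x, dp2, y, v + "+en", "by", dp1])

print(passivize("X^the-cat^Y^chase^the-dog^Z"))
# X^the-dog^Y^chase+en^by^the-cat
```

Note that everything here, input, output, and the rule itself, bottoms out in linear order.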

This all goes away in merge based versions of MP.[5] Here phrase markers (PMs) are sets, not strings, and string properties arise via linearization operations like Kayne’s LCA, which maps a given set into a linearized string. The important point is that sets are what the basic syntactic operation generates, string properties being non-syntactic properties that only obtain when the syntax is done with its work.[6] Linear order is what you get as the true linguistic objects, the sets, get mapped to the articulators. This is a departure from earlier conceptions of grammatical ontology.
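
Here is a toy version of that division of labor (my sketch, not Kayne’s actual LCA): the syntax builds unordered sets, and a separate mapping imposes order only at externalization, here under an assumed head-before-complement convention:

```python
# Syntax: unordered sets. Externalization: a linearizer that imposes
# an order the sets themselves do not contain.

def merge(a, b):
    return frozenset([a, b])

def linearize(obj):
    """Flatten a set-based phrase marker into a word string,
    assuming heads (atoms) precede their complements (sets)."""
    if isinstance(obj, str):
        return [obj]
    heads = [x for x in obj if isinstance(x, str)]
    phrases = [x for x in obj if not isinstance(x, str)]
    out = list(heads)
    for p in phrases:
        out.extend(linearize(p))
    return out

# {saw, {the, {dog}}} externalizes as "saw the dog".
dp = merge("the", frozenset(["dog"]))
vp = merge("saw", dp)
print(" ".join(linearize(vp)))  # saw the dog
```

Notice that a fully symmetric set of two bare heads gives this linearizer nothing to go on, which is, roughly, Kayne’s point that linearization requires asymmetry.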

This said, it’s an idea with many precursors. Howard Lasnik has a terrific little paper on this in the Aspects 50 years later volume (Gallego and Ott, eds., a MITWPL product that you can download here). He reviews the history and notes that Chomsky was quite resistant in Aspects to treating PMs as just coding for hierarchical relationships, an idea that James McCawley, among others, had been toying with. Howard reviews Chomsky’s reasoning and highlights several important points that I would like to quickly touch on here (but read the paper, it’s short and very very sweet!).

He notes several things. First, one of the key arguments for the revised conception in Aspects revolved around eliminating some possible but non-attested derivations (see p. 170). Interestingly, as Howard notes, these options were eliminated in any theory that embodied cyclicity. This is important, for when minimalist Chomsky returns to Generalized Transformations as the source of recursion, he parries the problems he noted in Aspects by incorporating a cyclic principle (viz. the Extension Condition) into the definition of Merge.[7]

Second, X’ theory was an important way station in separating out hierarchical dependencies from linear ones in that it argued against PS rules in Gs. By dumping PS rules, the relation between such rules and the string features of Gs was conceptually weakened.

Despite this last point, Lasnik’s paper highlights the Aspects arguments against a set-based conception of phrase structure (i.e. in favor of retaining string properties in PS rules). This is section 3 of Howard’s paper. It is a curious read for a thoroughly modern minimalist, for in Aspects we have Chomsky arguing that it is a very bad idea to eliminate linear properties from the grammar, as was being proposed by, among others, James McCawley. Uncharacteristically (and I mean this as a compliment), Chomsky’s reasoning here is largely empirical. Aspects argues that, when one looks, the Gs of the period presupposed some conception of underlying order in order to get the empirical facts to fit, and that this presupposition fits very poorly with a set theoretic conception of PMs (see Aspects: 123-127). The whole discussion is interesting, especially the discussion of free word order languages and scrambling. The basic observation is the following (126):

In every known language the
restrictions on order [even in scrambling languages, NH] are quite severe, and
therefore rules of realization of abstract structures are necessary. Until some
account of such rules is suggested, the set-system simply cannot be considered
seriously as a theory of grammar.

Lasnik argues, plausibly, that Kayne’s LCA offered such an account and removed this empirical objection against eliminating string information from basic syntactic PMs.

This may be so. However, from my reading of things I suspect
that something else was at stake. Chomsky has not, on my reading, been a huge
fan of the LCA, at least not in its full Kaynian generality (see note 6). As
Howard observes, what he has been a very big fan of is the observation, going
back at least to Reinhart, that, as he says in the Black Book (334), “[t]here
is no clear evidence that order plays a role at LF or in the computation from N
[numeration, NH] to LF.”

Chomsky’s reasoning is Reinhart’s on steroids. What I mean is that Reinhart’s observations, if memory serves, are largely descriptive, noting that anaphora is largely insensitive to order and that c-command is all that matters in establishing anaphoric dependencies (an important observation to be sure, and one that took some subtle argumentation to establish).[8] Chomsky’s observations go beyond this in being about the implications of such lacunae for a theory of generative procedures. What’s important wrt linear properties and Gs is not whether linearized order plays a discernible role in languages, of course it does, but whether these properties tell us anything about generative procedures (i.e. whether linear properties are factors in how generative procedures operate). This is key. And Chomsky’s big claim is that G operations are exclusively structure dependent, that this fact about Gs needs to be explained, and that the best explanation is that Gs have no capacity to exploit string properties at all. This builds on Reinhart, but is really making a theoretical point about the kinds of rules/operations Gs contain rather than a high level observation about antecedence relations and what licenses them.

So, the key thing that needs explanation is the absence of linear sensitive operations in the “core” syntax, the mapping from lexical items to “LF” (CI actually, but I am talking informally here), rather than some way of handling the evident linear properties of language.

This is vintage Chomsky reasoning: look for the dogs that aren’t barking and give a principled
explanation for why they are not
barking. Why no barking strings? Well, if PMs are sets then we expect Gs to be
unable to reference linear properties and thus such information should be
unable to condition the generative procedures we find in Gs.

Note that this argument has been a cynosure of Chomsky’s most recent thoughts on structure dependence as well. He reiterates his long-standing observation that T to C movement is structure dependent and that no language has a linear dependent analogue (“move the highest Aux” exists but “move the left-most Aux” never does, and is in fact never considered an option by kids building English Gs). He then goes on to explain why no G exploits such linear sensitive rules. It’s because the rule writing format for Gs exploits sets, and sets contain no linear information. As such, rules that exploit linear information cannot exist, for the information required to write them is un-codeable in the set theoretic “machine language” available for representing structure. In other words, we want sets because the (core) rules of G systematically ignore string properties and this is easily explained if such properties are not part of the G apparatus.
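
The un-codeability point can be made concrete (a sketch of mine, not Chomsky’s formalism): over set-based phrase markers a rule can be stated in terms of depth of embedding, but “left-most” is not even definable, since a set has no left edge:

```python
# Over sets, "highest aux" is statable; "left-most aux" is not.

def merge(a, b):
    return frozenset([a, b])

AUXES = {"can", "will", "is"}

def highest_aux(pm):
    """Breadth-first search over a set-based phrase marker: return
    the aux merged least deeply. Only hierarchy is consulted."""
    frontier = [pm]
    while frontier:
        nxt = []
        for node in frontier:
            if isinstance(node, str):
                if node in AUXES:
                    return node
            else:
                nxt.extend(node)
        frontier = nxt
    return None

# {will, {{the, {dog}}, {chase, {the, {cat}}}}}
subj = merge("the", frozenset(["dog"]))
obj = merge("the", frozenset(["cat"]))
pm = merge("will", merge(subj, merge("chase", obj)))
print(highest_aux(pm))  # will
# A "left-most aux" rule cannot even be written here: frozensets
# have no left edge for it to refer to.
```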

Observe, btw, that it is a short step from this observation
to the idea that linguistic objects are pairings of meanings with sounds (the latter a decidedly
secondary feature) rather than a pairing of meanings and sounds (where both interfaces are equally critical). These, as
you all know, serve as the start of Chomsky’s argument against communication
based conceptions of grammar. So eschewing string properties leads to
computational rather than communicative conceptions of FL.

The idea that strings are fundamental to Gs has a long and illustrious history. There is no doubt that, empirically, word order matters for acceptability and that languages tolerate only a small number of the possible linear permutations. Thus, in some sense, epistemologically speaking, the linear properties of lexical objects are more readily available (i.e. epistemologically simpler) than their hierarchical ones. If one assumes that ontology should follow epistemology, or if one is particularly impressed with what one “sees,” then taking strings as basic is hard to resist (and as Lasnik noted, Chomsky did not resist it in his young foolish salad days). In fact, if one looks at Chomsky’s reasoning, strings are discounted not because string properties do not hold (they obviously do) but because the internal mechanics of Gs fails to exploit a class of logically possible operations. This is vintage Chomsky reasoning: look not at what exists, but at what doesn’t. Negative data tells us about the structure of particular Gs. Negative G-rules tell us about the nature of UG. Want a pithy methodological precept? Try this: forget the epistemology, or what is sitting there before your eyes, and look at what you never see.

Normally, I would now draw some anti Empiricist
methodological morals from all of this, but this time round I will leave it as
an exercise for the reader. Suffice it for now to note that it’s those non-barking
dogs that tell us the most about grammatical fundamentals.

[1]
Again, our friends in physics make an analogous distinction between effective theories (those that are more or less empirically accurate) and fundamental theories (those that are conceptually well grounded). Effective theory is what fundamental theory aims to explain. Using this terminology, Newton’s theory of gravitation is the effective theory that Einstein’s theory of General Relativity derives as a limiting case.

[2]
Note that conserving the results of earlier inquiry is what allows for the accumulation of knowledge. There is a bad meme out there that linguistics in general (and syntax in particular) “changes” every 5 years and that there are no stable results. This is hogwash. However, the misunderstanding is fed by the inability to appreciate that older theories can be subsumed as special cases by newer ones. IMO, this has been how syntactic theory has generally progressed, as any half decent Whig history would make clear. See one such starting here and continuing for 4 or 5 subsequent posts.

[3]
I am not sure that I would actually strongly endorse this claim as I believe
that even failures can be illuminating and that even theories with obvious
presuppositional failures can point in the right direction. That said, if one’s
aim is “the truth” then a presupposition failure will at best be judged
suggestive rather than correct.

[4]
For those that care, I proposed concatenation as a primitive here, but it was a very different sense of concatenation, a very misleading sense. I abstracted the operation from string properties. Given the close intended relation between concatenation and strings, this was not a wise move, and I hereby apologize.

[6]
One important difference between Kayne’s and Chomsky’s views of linearization
is that the LCA is internal to the
syntax for the former but is part of the mapping from the syntax proper to the
AP interface for the latter. For Kayne, LCA has an effect on LF and derives the
basic features of X’ syntax. Not so for Chomsky. Thus, in a sense, linear
properties are in the syntax for
Kayne but decidedly outside it for Chomsky.

[7]
The SS/LSLT version of the embedding transformation was decidedly not cyclic
(or at least not monotonic structurally). Note, that other conceptions of
cyclicity would serve as well, Extension being sufficient, but not necessary.

[8]
It’s also not obviously correct. Linear order plays some role in making
antecedence possible (think WCO effects) and this is surely true in discourse
anaphora. That said, it appears that in Binding Theory proper, c-command (more
or less), rather than precedence, is what counts.

Thursday, February 9, 2017

This note is not mine, but one that Dan Milway sent me (here). He blogged about instrumentalism as the guiding philosophy-of-science position in linguistics and argued that adopting it fervently is misguided. I agree. I would actually go farther and question whether instrumentalism is ever a reasonable position to hold. I tend to be realist in my scientific convictions, thinking that my theories aim to describe real natural objects and that the aim of data collection is to illuminate the structure of these real objects. I think that this is the default view in physics and, IMO, what's good enough for physicists is good enough for me (when I can aim that high), so it is my default view in ling.

Dan's view is more nuanced and I believe you will enjoy reacting to it (or not).

Saturday, February 4, 2017

There is still quite a bit of skepticism in the cog-neuro community about linguistic representations and their implications for linguistically dedicated grammar specific nativist components. This skepticism is largely fuelled, IMO, by associationist-connectionist (AC) prejudices steeped in a nihilistic Empiricist brew. Chomsky and Fodor and Gallistel have decisively debunked the relevance of AC models of cognition, but these ideas are very very very (very…) hard to dispel. It often seems as if Lila Gleitman was correct when she mooted the possibility that Empiricism is hard wired in and deeply encapsulated, thus impervious to empirical refutation. Even as we speak the default view in cog-neuro is ACish, and there is a general consensus in the cog-neuro community that the kind of representations that linguists claim to have discovered just cannot be right, for the simple reason that the brain simply cannot embody them.

Gallistel and Matzel (see here) have deftly explored this unholy alliance between associationist psych and connectionist neuro that anchors the conventional wisdom. Interestingly, this anti representationalist skepticism is not restricted to the cog-neuro of language. Indeed, the Empiricist AC view of minds and brains has over the years permeated work on perception and it has generated skepticism concerning mental (visual) maps and their cog-neuro legitimacy. This is currently quite funny, for over the last several years Nobel committees have been falling all over themselves in a rush to award prizes to scientists for the discovery of neural mental maps. These awards are well deserved, no doubt, but what is curious is how long it’s taken the cog-neuro community to admit mental maps as legit hypotheses worthy of recognition. For a long time, there was quite a bit of excellent behavioral evidence for their existence, but the combo of associationist dogma linked to Hebbian neuro made the cog-neuro community skeptical that anything like this could be so. Boy were they wrong and, in retrospect, boy was this dumb, big time dumb!

Here is a short popular paper (by Kate Jeffery) that goes over some of the relevant history. It traces the resistance to the very idea of mental maps stemming from AC preconceptions. Interestingly, there was resistance both to the behavioral evidence in favor of these maps (the author discusses Tolman’s work in the late 40s) and, later, to the neural evidence. Here’s a quote (5):

Tolman, however, discovered that
rats were able to do things in mazes that they shouldn’t be able to do
according to Behaviourism. They could figure out shortcuts and detours, for
example, even if they hadn’t learned about these. How could they possibly do
this? Tolman was convinced animals must have something like a map in their
brains, which he called a ‘cognitive map’, otherwise their ability to discover
shortcuts would make no sense. Behaviourists were skeptical. Some years later,
when O’Keefe and Nadel laid out in detail why they thought the hippocampus
might be Tolman’s cognitive map, scientists were still skeptical.

Why the resistance? Well, ACism prevented conceiving of the possibility. Here’s how Jeffery put it (5-6).

One of the difficulties was that
nobody could imagine what a map in the brain would be like. Representing
associations between simple things, such as bells and food, is one thing; but
how to represent places? This seemed to require the mystical unseen
internal ‘black box’ processes (thought and imagination) that Behaviourists had
worked so hard to eradicate from their theories. Opponents of the cognitive map
theory suggested that what place cells reveal about the brain is not a map, so
much as a remarkable capacity to associate together complex sensations such as
images, smells and textures, which all happen to come together at a
place but aren’t in themselves spatial.

Note that the problem was not the absence of evidence for the position. Tolman presented lots
of good evidence. And O’Keefe/Nadel presented more (in fact enough more to get
the Nobel prize for the work). Rather the problem was that none of this made
sense in an AC framework so the Tolman-O’Keefe/Nadel theory just could not be
right, evidence be damned.[1]

What’s the evidence that such maps exist? It involves finding mental circuits that represent spatial metrics, allowing for the calculation of metric inferences (where something is and how far it is from where you are). The two kinds of work that have been awarded Nobels involve place cells and grid cells: the former involve the coding of location, the latter the coding of distance. The article does a nice job of describing what this involves, so I won’t go into it here. Suffice it to say that it appears that Kant (a big deal Rationalist, in case you were wondering) was right on target, and we now have good evidence for the existence of neural circuits that would serve as brain mechanisms for embodying Kant’s idea that space is a hard wired part of our mental/neural life.

Ok, I cannot resist. Jeffery nicely outlines the challenge that these discoveries pose for ACism. Here’s another quote concerning grid cells (the most recent mental map Nobel here) and how badly they fit with AC dogma (8):[2]

The importance of grid cells lies
in the apparently minor detail that the patches of firing (called ‘firing
fields’) produced by the cells are evenly spaced. That this makes a pretty
pattern is nice, but not so important in itself – what is startling is
that the cell somehow ‘knows’ how far (say) 30 cm is – it must do, or it
wouldn’t be able to fire in correctly spaced places. This even spacing of
firing fields is something that couldn’t possibly have arisen from building up
a web of stimulus associations over the life of the animal, because 30 cm (or
whatever) isn’t an intrinsic property of most environments, and therefore can’t
come through the senses – it must come from inside the rat, through some
distance-measuring capability such as counting footsteps, or measuring the
speed with which the world flows past the senses. In other words, metric
information is inherent in the brain, wired into the grid cells as it were,
regardless of its prior experience. This was a surprising and dramatic
discovery. Studies of other animals, including humans, have revealed place,
head direction and grid cells in these species too, so this seems to be a
general (and thus important) phenomenon and not just a strange quirk of the lab
rat.

As readers of FL know, this is a point that Gallistel and
colleagues have been making for quite a while now and every day the evidence
for neural mechanisms that code for spatial information per se grows stronger. Here is another very recent addition to the
list, one that directly relates to the idea that dead-reckoning involves path
integration. A recent Science paper (here)
reports the discovery of neurons tuned to vector properties. Here’s how the
abstract reports the findings:

To
navigate, animals need to represent not only their own position and
orientation, but also the location of their goal. Neural representations of an
animal’s own position and orientation have been extensively studied. However,
it is unknown how navigational goals are encoded in the brain. We recorded from
hippocampal CA1 neurons of bats flying in complex trajectories toward a spatial
goal. We discovered a subpopulation of neurons with angular tuning to the goal
direction. Many of these neurons were tuned to an occluded goal, suggesting
that goal-direction representation is memory-based. We also found cells that
encoded the distance to the goal, often in conjunction with goal direction. The
goal- direction and goal-distance signals make up a vectorial representation of
spatial goals, suggesting a previously unrecognized neuronal mechanism for
goal-directed navigation.

So, like place
and distance, some brains have the wherewithal to subserve vector
representations (goal direction and distance). Moreover, this information is
coded by single neurons (not nets) and is available in memory representations,
not merely for coding sensory input. As the paper notes, this is just the kind
of circuitry relevant to “the vector-based navigation strategies described for
many species, from insects to humans (14–19)— suggesting a previously
unrecognized mechanism for goal-directed navigation across species” (5).

So, we have a whole series of neurons tuned to abstracta like place, distance, goal, angle of rotation, and magnitude, neurons that plausibly subserve the long-noted behavior that implicates just such neural circuits. Once again, the neuroscience is finally catching up with the cognitive science. As with parents, the more neuroscience matures the smarter classical cognitive science becomes.

Let me
emphasize this point, one that Gallistel has forcefully made but is worth
repeating at every opportunity until we can cleanly chop off the Empiricist
zombie’s head. Cognitive data gets too little respect in the cog-neuro world.
But in those areas where real progress has been made, we repeatedly find that
the cog theories remain intact even as the neural ones change dramatically. And
not only cog-neuro theories. The same holds for the relation of chemistry to
physics (as Chomsky noted) and genetics to biochemistry (as Gallistel has observed).
It seems that more often than not what needs changing is the substrate theory not the reduced theory. The same scenario is
being repeated again in the cog-neuro world. We actually know very little about
brain hardware circuitry and we should stop assuming that ACish ideas should be
given default status when we consider ways of unifying cognition with neuroscience.

Consider one more interesting paper that hits a Gallistel theme, but from a slightly different angle. I noted that the Science paper found single neurons coding for abstract spatial (vectorial) information. There is another recent bit of work (here) that ran across my desk[3] that also has a high Gallistel-Intriguing (GI) index.

It appears that slime molds can both acquire info about their environment and pass this info on to other slime molds. What’s interesting is that these slime molds are unicellular, thus the idea that learning in slime molds amounts to fine tuning a neural net cannot be correct. Thus whatever learning is in this case must be intra-, not inter-neural. And this supports the idea that one has intra-cellular cognitive computations. Furthermore, when slime molds “fuse” (which they apparently can do, and do do) the information that an informed slime mold has can transfer to its fused partner. This supports the idea that learning can be a function of the changed internal state of a uni-cellular organism.

This is clearly grist for the Gallistel-King conjecture (see here for some discussion) that (some) learning is neuron, not net, based. The arguments that Gallistel has given over the years for this view have been subtle, abstract, and quite arm-chair (and I mean this as a compliment). It seems that as time goes by, more and more data that fits this conception comes in. As Gallistel (and Fodor and Pylyshyn as well) noted, representational accounts prefer certain kinds of computer architectures over others (Turing-von Neumann architectures). These classical computer architectures, we have been told, cannot be what brains exploit. No, brains, we are told repeatedly, use nets, and computation is just the Hebb rule with information stored in the strength of the inter-neuronal connections. Moreover, this information is very ACish, with abstracta at best emergent, rather than endogenous features of our neural make-up. Well, this seems to be wrong. Dead wrong. And the lesson I draw from all of this is that it will prove wrong for language as well. The sooner we dispense with ACism, the sooner we will start making some serious progress. It’s nothing but a giant impediment, and has proven to be so again and again.

[1]
This is a good place to remind you of the difference between Empiricist and empirical. The latter is responsiveness to evidence. The former is a
theory (which, IMO, given its lack of empirical standing has become little more
than a dogma).

[2]
It strikes me as interesting that this sequence of events reprises what took
place in studies of the immune system. Early theories of antibody formation
were instructionist because how could the body natively code for so many
antibodies? As work progressed, Nobel prizes streamed to those that challenged
this view and proposed selectionist theories wherein the environment selected
from a pre-specified innately generated list of options (see here). It seems that
the less we know, the greater the appeal of environmental conceptions of the
origin of structure (Empiricism being the poster child for this kind of
thinking). As we come to know more, we come to understand how rich is the
contribution of the internal structure of the animal to the problem at hand.
Selectionism and Rationalism go hand in hand. And this appears to be true for
both investigations of the body and the mind.