Comments

Saturday, July 30, 2016

It is somewhat surprising that Harper’s felt the need to run a hit piece by Tom Wolfe on Chomsky
in its August issue (here).
True, such stuff sells well. But given that there are more than enough engaging
antics to focus on in Cleveland and Philadelphia one might have thought that
they would save the Chomsky bashing for a slow news period. It is a testimony
to Chomsky’s stature that there is a publisher of a mainstream magazine who
concludes that even two national conventions featuring two of the most
unpopular people ever to run for the presidency won’t attract more eyeballs
than yet another takedown of Noam Chomsky and Generative Grammar (GG).

Not surprisingly, content-wise there is nothing new here. It
is a version of the old litany. Its only distinction is the over-the-top nuttiness of the writing (which, to be honest, has a certain charm in its
deep dishonesty and nastiness) and its complete disregard for intellectual
integrity. And, a whiff of something truly disgusting that I will get to at the
very end. I have gone over the “serious” issues that the piece broaches before
in discussions of analogous hit jobs in the New
Yorker, the Chronicle of Higher Education,
and Aeon (see here
and here
for example). Indeed, this blog was started as a response to what this piece is
a perfect example of: the failure of people who criticize Chomsky and GG to
understand even the basics of the views they are purportedly criticizing.

Here’s the nub of my earlier observations: Critics like Everett
(among others, though he is the new paladin for the discontented and features
prominently in this Wolfe piece too) are not engaged in a real debate for the
simple reason that they are not addressing positions that anyone holds or has
ever held. This point has been made repeatedly (including by me), but clearly to
no avail. The present piece by Wolfe continues in this grand tradition. Here's what I've concluded: pointing out that neither Chomsky nor GG has ever held the
positions being “refuted” is considered impolite. The view seems to be that
Chomsky has been rude, sneaky even, for articulating views against which the deadly
criticisms are logically refractory. Indeed, the critics’ refusal to address
Chomsky’s actual views suggests that they think that discussing his stated
positions would only encourage him in his naughty ways. If Chomsky does not
hold the positions being criticized then he is clearly to blame for these are
the positions that his critics want him to hold so that they can pummel him for
holding them. Thus, it is plain sneaky of him not to hold them, and in failing to hold them Chomsky clearly shows
what a shifty, sneaky, albeit clever, SOB he really is because any moderately
polite person would hold the views that Chomsky’s critics can demonstrate to be
false! Given this, it is clearly best to ignore what Chomsky actually says for
this would simply encourage him in articulating the views he in fact holds, and
nobody would want that. For concreteness, let’s once again review what the
Chomsky/GG position actually is regarding recursion and Universal Grammar (UG).

The Wolfe piece in Harper’s
is based on Everett’s critique of Chomsky’s view that recursion is a central
feature of natural language. As you are all aware, Everett believes that he has
discovered a language (Piraha) whose G does not recurse (in particular, that
forbids clauses to be embedded within clauses). Everett takes the putative
absence of recursion within Piraha to rebut Chomsky’s view that recursion is a
central feature of human natural language precisely because he believes that it
is absent from Piraha Gs. Everett further takes this purported absence as
evidence against the GG conception of UG and the idea that humans come with a
native-born linguistic facility to acquire Gs. For Everett, human linguistic facility is due to culture, not biology
(though why he thinks that these are opposed to one another is quite unclear).
All of these Everett tropes are repeated in the Wolfe piece, and if repetition
were capable of improving the logical relevance of non-sequiturs, then the
Wolfe piece would have been a valuable addition to the discussion.

How does the Everett/Wolfe “critique” miss the mark? Well, the
Chomsky-GG view of recursion as a feature of UG does not imply that every human
G is recursive. And thinking that it does is to confuse Chomsky Universals (CU)
with Greenberg Universals (GU). I have discussed this before in many many posts
(type in ‘Chomsky Universals’ or ‘Greenberg Universals’ in the search box and
read the hits). The main point is that for Chomsky/GG a universal is a design feature
of the Faculty of Language (FL) while for Greenberg it is a feature of
particular Gs.[1]
To claim that recursion is a CU is to say that humans endowed with an FL
construct recursive Gs when presented with the appropriate PLD. It makes no
claim as to whether particular Gs of particular native speakers will allow
sentences to licitly embed within sentences. If this is so, then Everett’s
putative claim that Piraha Gs do not allow sentential recursion has no
immediate bearing on the Chomsky-GG claims about recursion being a design
feature of FL. That FL must be able to construct Gs with recursive rules does
not imply that every G embodies recursive rules. Assuming otherwise is to
reason fallaciously, not that such logical niceties have deterred Everett and friends.
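
To make the capacity-versus-instance logic concrete, here is a toy sketch in code (every name in it, from the rule labels to the crude "embedding" diagnostic, is an illustrative assumption, not anyone's actual linguistic proposal): a single acquisition procedure that can build recursive rule systems may, on particular input, output a grammar with no recursive rule at all.

```python
# Toy sketch of the CU point: the capacity (the function) is fixed,
# but whether a particular acquired G contains a recursive rule
# depends on the input. All rules and diagnostics are illustrative.

def acquire(data):
    """Map input sentences (a crude stand-in for PLD) to a rule set."""
    rules = {"S": ["NP VP"], "NP": ["D N"]}
    # Crude proxy for evidence of clausal embedding in the input.
    if any("that" in s.split() for s in data):
        rules["VP"] = ["V", "V that S"]  # recursive: S recurs under S
    else:
        rules["VP"] = ["V"]              # this G has no recursive rule
    return rules

g_embedding = acquire(["Mary said that it rained"])  # recursive G
g_flat = acquire(["it rained"])                      # non-recursive G
```

The same `acquire` (the same "faculty") yields both grammars, so exhibiting one non-recursive output grammar tells us nothing about the capacity itself.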

Btw: I use ‘putative claim’ and ‘purported absence’ to
highlight an important fact. Everett’s empirical claims are strongly contested.
Nevins, Pesetsky and Rodrigues (NPR) have provided a very detailed rebuttal of
Everett’s claims that Piraha Gs lack recursion.[2]
If I were a betting man, my money would be on NPR. But for the larger issue it
doesn’t matter if Everett is right and NPR are wrong. Thus, even were Everett right about the facts
(which, I would bet, he isn’t) it would be irrelevant to his conclusion regarding the implications of Piraha
for the Chomsky/GG claims concerning UG and recursion.

So what would be relevant evidence against the Chomsky/GG
claim about the universality of recursion? Recall that the UG claim concerns
the structure of FL, a cognitive faculty that humans come biologically endowed
with. So, if the absence of recursion in Piraha Gs resulted from the absence of
a recursive capacity in Piraha speakers’ FLs then this would argue that
recursion was not a UG property of human FLs. In other words, if Piraha speakers could not acquire recursive Gs then we would have direct evidence that
human FLs are not built to acquire recursive Gs. However, we know that this conditional
is FALSE.
Piraha kids have no trouble acquiring Brazilian Portuguese (BP), a language
that everyone agrees is the product of a recursive G (e.g. BP Gs allow
sentences to be repeatedly embedded within sentences).[3] Thus, Piraha speakers’ FLs are no less
recursively capable than BP speakers’ FLs or English speakers’ FLs or Swahili
speakers’ FLs or... We can thus conclude that Piraha FLs are just human FLs and
have as a universal feature the
capacity to acquire recursive Gs.

All of this is old hat and has been repeated endlessly over
the last several years in rebuttal to Everett’s ever more inflated claims. Note
that if this is right, then there is no (as in none, nada, zippo, bubkis,
gornisht) interesting “debate” between Everett and Chomsky concerning recursion.
And this is so for one very simple reason. Equivocation obviates the
possibility of debate. And if the above is right (and it is, it really is) then
Everett’s entire case rests on confusing CUs and GUs. Moreover, as Wolfe’s
piece is nothing more than warmed over Everett plus invective, its actual critical
power is zero as it rests on the very same confusion.[4]

But things are really much worse than this. Given how often
the CU/GU confusion has been pointed out, the only rational conclusion is that
Everett and his friends are deliberately running these two very different
notions together. In other words, the confusion is actually a strategy. Why do
they adopt it? There are two explanations that come to mind. First, Everett and
friends endorse a novel mode of reasoning. Let’s call it modus non sequitur, which has the abstract form “if P why not Q.” It is a very powerful
method of reasoning sure to get you where you want to go. Second possibility:
Everett and Wolfe are subject to Sinclair’s Law, viz. “It
is difficult to get a man to understand something when his salary depends upon
his not understanding it.” If we understand ‘salary’ broadly to include the
benefits of exposure in the high brow press, then … All of which brings us to Wolfe’s
Harper’s piece.

Happily for the Sinclair inclined, the absence of possible debate
does not preclude the possibility of considerable controversy. It simply
implies that the controversy will be intellectually barren. And this has
consequences for any coverage of the putative debate. Articles reprising the issues will focus on
personalities rather than substance, because, as noted, there is no substance
(though, thank goodness, there can be heroes engaging in the tireless (remunerative)
pursuit of truth). Further, if such coverage appears in a venue aspiring to
cater to the intellectual pretensions of its elite readers (e.g. The New Yorker, the Chronicle and, alas, now Harper’s)
then the coverage will require obscuring the pun at the heart of the matter.
Why? Because identifying the pun (aka equivocation) will expose the discussion
as, at best, titillating gossip for the highbrow, at middling, a form of
amusing silliness (e.g. perfect subject matter for Emily Litella) and, at
worst, a form of celebrity pornography in the service of character
assassination. Wolfe’s Harper’s piece
is the dictionary definition of the third option.

Why do I judge Wolfe’s article so harshly? Because he quotes
Chomsky’s observation that Everett’s claims even if correct are logically
irrelevant. Here’s the full quote (39-40):

“It”—Everett’s opinion; he does not refer to Everett by
name—“amounts to absolutely nothing, which is why linguists pay no attention to
it. He claims, probably incorrectly, it doesn’t matter whether the facts are
right or not. I mean, even accepting his claims about the language in question—Pirahã—tells
us nothing about these topics. The speakers of this language, Pirahã speakers,
easily learn Portuguese, which has all the properties of normal languages, and
they learn it just as easily as any other child does, which means they have the
same language capacity as anyone else does.”

A serious person might have been interested in finding out why Chomsky thought Everett’s claims
“tell us nothing about these topics.” Not Wolfe. Why try to understand issues that
might detract from a storyline? No, Wolfe quotes Chomsky without asking what he
might mean. Wolfe ignores Chomsky's identification of the equivocation as soon as he
notes it. Why? Because this is a hit piece and identifying the equivocation at
the heart of Everett’s criticism would immediately puncture Wolfe’s central
conceit (i.e. heroic little guy slaying the Chomsky monster).

Wolfe clearly hates Chomsky. My reading of his piece is that
he particularly hates Chomsky’s politics and the article aims to discredit the
political ideas by savaging the man. Doing this requires demonstrating that
Chomsky, who, as Wolfe notes, is one of the most influential intellectuals of
all time, is really a charlatan whose touted intellectual contributions have
been discredited. This is an instance of the well-known strategy of polluting
the source. If Chomsky’s (revolutionary) linguistics is bunk then so are his
politics. A well-known fallacy this, but not less effective for being so.
Dishonest and creepy? Yes. Ineffective? Sadly no.

So there we have it. Another piece of junk, but this time in
the style of the New
Journalism. Before ending however, I want to offer you some quotes that
highlight just how daft the whole piece is. There was a time that I thought
that Wolfe was engaging in Sokal
level provocation, but I concluded that he just had no idea what he was talking
about and thought that stringing technical words together would add authority
to his story. Take a look at this one, my favorite (p. 39):

After all, he [i.e. Chomsky, NH] was very firm in his insistence
that it [i.e. UG, NH] was a physical structure. Somewhere in the brain the language
organ was actually pumping the UG through the deep structure so that
the LAD, the language acquisition device, could make language, speech,
audible, visible, the absolutely real product of Homo sapiens’s central
nervous system. [Wolfe’s emphasis, NH].

Is this great, or what! FL pumping UG through the deep structure.
What the hell could this mean? Move
over “colorless green ideas sleep furiously” we have a new standard for
syntactically well-formed gibberish. Thank you Mr Wolfe for once again
confirming the autonomy of syntax.

Or this encomium to cargo cult science (37):

It [Everett’s book, NH] was dead serious in an academic sense.
He loaded it with scholarly linguistic and anthropological reports of his findings
in the Amazon. He left academics blinking . . . and nonacademics with eyes wide
open, staring.

Yup,
“loaded” with anthro and ling stuff that blinds professionals and leaves
neophytes agog. Talk of scholarship. Who could ask for more? Not me. Great
stuff.

Here’s
one more, where Wolfe contrasts Chomsky and Everett (31):

Look at him! Everett was everything Chomsky wasn’t: a rugged
outdoorsman, a hard rider with a thatchy reddish beard and a head of thick
thatchy reddish hair. He could have passed for a ranch hand or a West Virginia
gas driller.

Methodist
son of a cowboy rather than the son of Russian Ashkenazi Jews infatuated with
political “ideas long since dried up and irrelevant,” products “perhaps” of a
shtetl mentality (29). Chomsky is an indoor linguist “relieved not to go into
the not-so-great outdoors,” desk-bound “looking at learned journals with
cramped type” (27) and who “never left the computer, much less the building”
(31). Chomsky is someone “very high, in an armchair, in an air conditioned
office, spic and span” (36), one of those intellectuals with “radiation-bluish
computer screen pallors and faux-manly open shirts” (31) never deigning to
muddy himself with the “muck of life down below” (36). His linguistic “hegemony”
(37) is “so supreme” that other linguists are “reduced to filling in gaps and
supplying footnotes” (27).

Wowser.
It may not have escaped your notice that this colorful contrast has an unsavory
smell. I doubt that its dog-whistle overtones were inaudible to Wolfe. The scholarly, blue-pallored, desk-bound, bookish, high-and-mighty (Ashkenazi) Chomsky
versus the outdoorsy (Methodist) man of the people and the soil and the
wilderness Everett. The old world shtetl mentality brought down by a (lapsed)
evangelical Methodist (32). Trump’s influence seems to extend to Harper’s. Disgusting.

That’s
it for me. Harper’s should be ashamed
of itself. This is not just junk. It is garbage. The stuff I quoted is just a
sampling of the piece’s color. It is deeply ignorant and very nasty, with a nastiness
that borders on the obscene. Your friends will read this and ask you about it.
Be prepared.

[1]
Actually, Greenberg’s own Universals were properties of languages not Gs. More
exactly, they describe surface properties of strings within languages. As
recursion is in the first instance a property of systems of rules and only
secondarily a property of strings in a language, I am here extending the notion
of a Greenberg Universal to apply to properties that all Gs share rather than all
languages (i.e. surface products of Gs) share.

[2]
Incidentally, Wolfe does not address these counterarguments. Instead he
suggests that NPR are Chomsky’s pawns who blindly attack anyone who exposes
Chomsky’s fallacies (see p. 35). However,
reading Wolfe’s piece indicates that the real reason he does not deal with
NPR’s substantive criticisms is that he cannot. He doesn’t know anything, so he must ignore the substantive issues and
engage in ad hominem attacks. Wolfe has not written a piece of popular science
or even intellectual history for the simple reason that he does not appear to
have the competence required to do so.

[3]
It is worth pointing out that sentence recursion is just one example of
recursion. So, Gs that repeatedly embed DPs within DPs or VPs within VPs are
just as recursive as those that embed clauses within clauses.

[4]
See Wolfe’s discussion of the “law” of recursion on 30-31. It is worth noting that
Wolfe seems to think that “discovering” recursion was a big deal. But if it was,
Chomsky was not its discoverer, as his discussion of Cartesian precursors
demonstrates. Recursion follows trivially from the fact of linguistic
creativity. The implications of the fact that humans can and do acquire
recursive Gs are significant. The fact itself is a pretty trivial observation.

Wednesday, July 27, 2016

Norbert and regular readers of this prestigious blog may have seen me participate in some discussions about open access publishing, e.g. in the wake of the Lingua exodus or after Norbert's link to that article purportedly listing a number of arguments in favor of traditional publishers. One thing that I find frustrating about this debate is that pretty much everybody who participates in it frames the issue as how the current publishing model can be reconciled with open access. That is a very limiting perspective, in my opinion, just like every company that has approached free/libre and open source software (aka FLOSS) with the mindset of a proprietary business model has failed in that domain or is currently failing (look at what happened to OpenOffice and MySQL after Oracle took control of the projects). In that spirit, I'd like to conduct a thought experiment: what would academic publishing look like if it didn't have decades of institutional cruft to carry around? Basically, if academic publishing hadn't existed until a few years ago, what kind of system would a bunch of technically-minded academics be hacking away on?

Wednesday, July 20, 2016

L&M identifies two other important properties that were
central to the Cartesian view.

First, human linguistic usage is apparently free from
stimulus control “either external or internal.” Cartesians thought that animals
were not really free, animal behavior being tightly tied to either
environmental exigencies (predators, food location) or to internal states
(being hungry or horny). The law of effect is a version of this view (here). I am dubious that
this is actually true of animals. And, I recall a quip from an experimental
psych friend of mine that claimed that the first law of animal behavior is that
the animal does whatever it damn well pleases. But, regardless of whether this
is so for animals, it is clearly true of humans as manifest in their use of
language. And a good thing too, L&M notes. For this freedom from stimulus
control is what allows “language to serve as an instrument of thought and self-expression,”
as it regularly does in daily life.

L&M notes that Cartesians did not take unboundedness or
freedom from stimulus control to “exceed the bounds of mechanical explanation”
(12). This brings us to the third feature of linguistic behavior: the coherence
and aptness of everyday linguistic behavior. Thus, even though linguistic
behavior is not stimulus bound, and hence not tightly causally bound to
external or internal stimuli, linguistic behavior is not scattershot either.
Rather it displays “appropriateness to the situation.” As L&M notes, it is
not clear exactly how to characterize condign linguistic performance, though
“there is no doubt that these are meaningful concepts…[as] [w]e can distinguish
normal use of language from the ravings of a lunatic or the output of a
computer with a random element” (12). This third feature of linguistic
creativity, its aptness/fit to the situation without being caused by it was,
for Cartesians, the most dramatic expression of linguistic creativity.

Let’s consider these last two properties a little more
fully: (i) stimulus-freedom (SF) and (ii) apt fit (AF).

Note first that both kinds of creativity, though expressed in
language, are not restricted to linguistic performances. It’s just that normal
language use provides everyday manifestations of both features.

Second, the sources of both these aspects of creativity are,
so far as I can tell, still entirely mysterious. We have no idea how to “model”
either SF or AF in the general case.
We can, of course, identify when specific responses are apt and explain why
someone said what they did on specific occasions. However, we have no general
theory that illuminates the specific instances.[1]
More precisely, it’s not that we have poor theories, it’s that we really have
no theories at all. The relevant factors remain mysteries, rather than problems
in Chomsky’s parlance. L&M makes this point (12-13):

Honesty forces us to admit that we
are as far today as Descartes was three centuries ago from understanding just
what enables a human to speak in a way that is innovative, free from stimulus
control, and also appropriate and coherent.

The intractability of SF and AF serves to highlight the
importance of the competence/performance distinction. The study of competence
is largely insulated from these mysterious factors. How so? Well, it abstracts away from use and studies capacities, not their exercise.
SF and AF are not restricted to linguistic
performances and so are unlikely to be intrinsically
linked to the human capacity for language. Hence detaching the capacity should
not (one hopes) corrupt its study, even if how competence is used for the free
expression of thought remains obscure.

The astute reader will notice that Chomsky’s famous review
of Skinner’s Verbal Behavior (VB) leaned
heavily on the fact of SF. Or more accurately, the review argued that it was
impossible to specify the contours of linguistic behavior by tightly linking it
to environmental inputs/stimuli or internal states/rewards. Why? Cartesians
have an answer: the Skinnerian project is hopeless. Our behavior is both SF and
AF, our verbal behavior included. Hence any approach to language that focuses
on behavior and its immediate roots
in environmental stimuli and/or rewards is doomed to failure. Theories built on
supposing that SF or AF are false will either be vacuous or evidently false.
Chomsky’s critique showed how VB embodied the twin horns of this dilemma. Score
one for the Cartesians.

One last point and I quit. Chomsky’s expansive discussion of
the various dimensions of linguistic creativity may shed light on “Das Chomsky
Problem.” This is the puzzle of how, or whether, two of Chomsky’s interests,
politics and linguistics, hook up. Chomsky has repeatedly (and IMO, rightly)
noted that there is no logical relation between his technical linguistic work
and his anarchist political views. Thus, there is no sense in which accepting
the competence/performance distinction or thinking that TGG is required as part
of any solution to linguistic creativity or thinking that there must be a
language dedicated FL to allow for the facts of language acquisition in any way
imply that we should organize
societies on democratic bases in which all participants robustly participate,
or vice versa. The two issues are logically and conceptually separate.

This said, those parts of linguistic creativity that the
Cartesians noted and that remain as
mysterious to us today as when they were first observed can ground a
certain view of politics. And Chomsky talks about this (L&M:102ff). The
Cartesian conception of human nature as creative in the strong Cartesian sense
of SF and AF leads naturally to the conclusion that societies that respect
these creative impulses are well suited to our nature and that those that
repress them leave something to be desired. L&M notes that this creative
conception lies at the heart of many Enlightenment and, later, Romantic
conceptions of human well-being and the ethics and politics that would support
expression of these creative capacities. There is a line of intellectual
descent from Descartes through Rousseau to Kant that grounds respect for humans
in the capacity for this kind of “freedom.” And Chomsky is clearly attracted to
this idea. However, and let me repeat, however,
Chomsky has nothing of scientific substance to say about these kinds of
creativity, as he himself insists. He does not link his politics to the fact
that humans come with the capacity to develop TGGs. As noted, TGGs are at right
angles to SF and AF, and competence abstracts away from questions of
behavior/performance where SF and AF live. Luckily, there is a lot we can say
about capacities independent of considering how these capacities are put to
use. And that is one important point of L&M’s extended discussion of the
various aspects of linguistic creativity. That said, these three conceptions
connect up in Cartesian conceptions of human nature, despite their logical and
conceptual independence and so it is not surprising that Chomsky might find all
three ideas attractive even if they are relevant for different kinds of
projects. Chomsky’s political interests are conceptually separable from his
linguistic ones. Surprise, surprise: it seems that he can chew gum and walk at
the same time!

Ok, that’s it. Too long, again. Take a look at the discussion
yourself. It is pretty short and very interesting, not the least reason being
how abstracting away from deep issues of abiding interest is often a
pre-condition for opening up serious inquiry. Behavior may be what interests
us, but given SF and AF it has proven to be refractory to serious study.
Happily, studying the structure of the capacity independent of how it is used
has proven to be quite a fertile area of inquiry. It would be a more productive
world were these insights in L&M more widely internalized by the
cog-neuro-ling communities.

[1]
The one area where SFitude might be relevant regards the semantics of lexical
items. Chomsky has argued against the denotational theories of meaning in part
by noting that there is no good sense in which words denote things. He
contrasts this with “words” in animal communication systems. As Chomsky has
noted, how lexical items work “pose deep mysteries,” something that referential
theories do not appreciate. See here
for references and discussion.

Wednesday, July 13, 2016

Once again, this post got away from me, so I am dividing it
into two parts.

As I mentioned in a recent previous post, I have just
finished re-reading Language & Mind (L&M)
and have been struck, once again, about how relevant much of the discussion is
to current concerns. One topic, however, that does not get much play today, but
is quite well developed in L&M, is its discussion of Descartes’ very
expansive conceptions of linguistic creativity and how it relates to the
development of the generative program. The discussion is surprisingly complex
and I would like to review its main themes here. This will reiterate some
points made in earlier posts (here,
here)
but I hope it also deepens the discussion a bit.

Human linguistic creativity is front and center in L&M
as it constitutes the central fact animating Chomsky’s proposal for Transformational
Generative Grammar (TGG). The argument is that a TGG competence theory is a
necessary part of any account of the obvious fact that humans regularly use language in novel ways. Here’s
L&M (11-12):

…the normal use of language is
innovative, in the sense that much of what we say in the course of normal use is
entirely new, not a repetition of anything that we have heard before and not
even similar in pattern - in any useful sense of the terms “similar” and
“pattern” – to sentences or discourse that we have heard in the past. This is a
truism, but an important one, often overlooked and not infrequently denied in
the behaviorist period of linguistics…when it was almost universally claimed
that a person’s knowledge of language is representable as a stored set of
patterns, overlearned through constant repetition and detailed training, with
innovation being at most a matter of “analogy.” The fact surely is, however,
that the number of sentences in one’s native language that one will immediately
understand with no feeling of difficulty or strangeness is astronomical; and
that the number of patterns underlying our normal use of language and
corresponding to meaningful and easily comprehensible sentences in our language
is orders of magnitude greater than the number of seconds in a lifetime. It is
in this sense that the normal use of language is innovative.

There are several points worth highlighting in the above
quote. First, note that normal use is “not
even similar in pattern” to what we have heard before.[1]
In other words, linguistic competence is not
an instance of pattern matching or recognition in any interesting sense of
“pattern” or “matching.” Native speaker
use extends both to novel sentences and to novel sentence patterns effortlessly. Why is this important?

IMO, one of the pitfalls of much work critical of GG is the
assimilation of linguistic competence to a species of pattern matching.[2]
The idea is that a set of templates (i.e. in L&M terms: “a stored set of
patterns”) combined with a large vocabulary can easily generate a large set of
possible sentences in the sense of templates saturated by lexical items that
fit.[3]
Note, that such templates can be hierarchically organized and so display one of the properties of natural
language Gs (i.e. hierarchical structures).[4]
Moreover, if the patterns are
extractable from a subset of the relevant data then these patterns/templates
can be used to project novel
sentences. However, what the pattern matching conception of projection misses
is that the patterns we find in Gs are not finite and the reason for this is
that we can embed patterns within patterns within patterns within…you get the
point. We can call the outputs of recursive rules “patterns” but this is
misleading for once one sees that the patterns are endless, then Gs are not
well conceived of as collections of patterns but collections of rules that generate patterns. And once one sees this, then the linguistic problem
is (i) to describe these rules and
their interactions and (ii) to further explain how these rules are acquired (i.e. not how the patterns are acquired).
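
The rules-versus-patterns point can be made concrete with a minimal sketch (the one-rule grammar fragment and its vocabulary are assumptions for illustration only): a single recursive rule yields a structurally new "pattern" at every depth of embedding, so no finite list of templates exhausts its output.

```python
# One recursive rewrite rule, S -> "it rained" | "Mary said that" S,
# rendered as a function. Each depth is a NEW pattern, not a new
# filling of an old template. (Toy fragment; vocabulary illustrative.)

def sentence(depth):
    if depth == 0:
        return "it rained"
    return "Mary said that " + sentence(depth - 1)

for d in range(4):
    print(sentence(d))
```

Because depth is unbounded, a finite description of this G must state the rule itself; enumerating its outputs as stored templates is impossible in principle, which is why acquisition must target rules rather than patterns.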

The shift in perspective from patterns (and patternings in
the data (see note 5)) to generative procedures and the (often very abstract)
objects that they manipulate changes what the acquisition problem amounts to.
One important implication of this shift of perspective is that scouring strings
for patterns in the data (as many statistical learning systems like to do) is a
waste of time because these systems are looking for the wrong things (at least
in syntax).[5]
They are looking for patterns whereas they should be looking for rules. As the
output of the “learning” has to be systems of rules, not systems of patterns,
and as rules are, at best, implicit in patterns, not explicitly manifest by
them, theories that don’t focus on rules are going to be of little linguistic
interest.[6]

Let me make this point another way: unboundedness implies
novelty, but novelty can exist without unboundedness. The creativity issue
relates to the accommodation of novel
structures. This can occur even in small finite domains (loan words in
phonology might be an example). Creativity implies projection/induction, which must
specify a dimension of generalization along which inputs can be generalized so
as to apply to instances beyond the input. This, btw, is universally
acknowledged by anyone working on learning. Unboundedness makes projection a
no-brainer. However, it also has a second important implication. It requires
that the generalizations being made involve recursive rules. The unboundedness
we find in syntax cannot be satisfied via pattern matching. It requires a
specification of rules that can be repeatedly applied to create novel patterns.
Thus, it is important to keep the issue of unboundedness separate from that of
projection. What makes the unboundedness of syntax so important is that it
requires that we move beyond the pattern-template-categorization conception of
cognition.

Dare I add (more accurately, can I resist adding) that
pattern matching is the flavor of choice for the Empiricistically (E) inclined.
Why? Well, as noted, everyone agrees
that induction must allow generalization beyond the input data. Thus even Es
endorse this, for Es recognize that cognition involves projection beyond the
input (i.e. “learning”). The question is the nature of this induction. Es like
to think that learning is a function from input to patterns abstracted from the
input, the input patterns being perceptually
available in their patternings, albeit sometimes noisily.[7]
In other words, learning amounts to abstracting a finite set of patterns from
the perceptual input and then creating new instances of those patterns by
subbing novel atoms (e.g. lexical items) into the abstracted patterns. E
research programs amount to finding ways to induce/abstract patterns/templates
from the perceptual patternings in the data. The various statistical techniques
Es explore are in service of finding these patterns in the (standardly, very noisy)
input. Unboundedness implies that this kind of induction is, at best,
incomplete. Or, more accurately, the observation that the number of patterns is unbounded implies that
learning must involve more than pattern detection/abstraction. In domains where
the number of patterns is effectively infinite, learning[8]
is a function from inputs to rules that generate patterns, not to patterns
themselves. See link in note 6 for more discussion.

An aside: Most connectionist learners (and deep learners)
are pattern matchers and, in light of the above, are simply “learning” the wrong
things. No matter how many “patterns” the intermediate layers converge on from
the (mega) data they are exposed to they will not settle on enough given that
the number of patterns that human native speakers are competent in is
effectively unbounded. Unless the intermediate layers acquire rules that can be
recursively applied they have not acquired the right kinds of things and thus
all of this modeling is irrelevant no
matter how much of the data any given model covers.[9]

Another aside: this point was made explicitly in the quote
above but to no avail. As L&M notes critically (11): “it was almost
universally claimed that a person’s knowledge of language is representable as a
stored set of patterns, overlearned through constant repetition and detailed
training.” Add some statistical massaging and a few neural nets and things have
not changed much. The name of the inductive game in the E world is to look for
perceptual available patterns in the signal, abstract them and use them to
accommodate novelty. The unboundedness of linguistic patterns that L&M
highlights implies that this learning strategy won’t suffice in the language case,
and this is a very important observation.

Ok, back to L&M

Second, the quote above notes that there is no useful sense
of “analogy” that can get one from the specific patterns one might abstract
from the perceptual data to the unbounded number of patterns with which native
speakers display competence. In other words, “analogy” is not the secret sauce
that gets one from input to rules. So, when you hear someone talk about
analogical processes, reach for your favorite anti-BS device. If “analogy” is
offered as part of any explanation of an inferential capacity you can be absolutely
sure that no account is actually being offered. Simply put, unless the
dimensions of analogy are explicitly specified the story being proffered is
nothing but wind (in both the Ecclesiastes and the scatological sense of the
term).

Third, the kind of infinity human linguistic creativity
displays has a special character: it is a discrete
infinity. L&M observes that human language (unlike animal communication
systems) does not consist of a “fixed, finite number of linguistic dimensions,
each of which is associated with a particular nonlinguistic dimension in such a
way that selection of a point along the linguistic dimension determines and
signals selection of a point along the associated nonlinguistic dimension”
(69). So, for example, think of a higher pitch or chirp being associated with a greater
intention to aggressively defend territory, or the way that “readings of a
speedometer can be said, with an obvious idealization, to be infinite in
variety” (12).

L&M notes that these sorts of systems can be infinite,
in the sense of containing “an indefinitely large range of potential signals.”
However, in such cases the variation is “continuous” while human linguistic
expression exploits “discrete” structures that can be used to “express
indefinitely many new thoughts, intentions, feelings, and so on.” ‘New thoughts’ in the previous quote clearly
means new kinds of thoughts (e.g.
the signals are not all about how fast the car is moving). As L&M makes clear,
the difference between these two kinds of systems is “not one of “more” or
“less,” but rather of an entirely different principle of organization,” one
that does not work by “selecting a point along some linguistic dimension that
signals a corresponding point along an associated nonlinguistic dimension.”
(69-70).

In sum, human linguistic creativity implicates something
like a TGG that pairs discrete hierarchical structures relevant to meanings
with discrete hierarchical structures relevant to sounds and does so recursively.
Anything that doesn’t do at least
this is going to be linguistically irrelevant as it ignores the observable
truism that humans are, as a matter of course, capable of using an unbounded
number of linguistic expressions effortlessly.[10]
Theories that fail to address this obvious fact are not wrong. They are
irrelevant.

Is hierarchical recursion all that there is to linguistic
creativity? No!! Chomsky makes a point of this in the preface to the enlarged
edition of L&M. Linguistic creativity is NOT identical to the “recursive
property in generative grammars” as interesting as such Gs evidently are
(L&M: viii). To repeat, recursion is a necessary feature of any account
of linguistic creativity, BUT the Cartesian conception of
linguistic creativity consists of far more than what even the most
explanatorily adequate theory of grammar specifies. What more?

[1]
For an excellent discussion of this see Jackendoff’s very nice (though
unfortunately (mis)named) Patterns in the
mind (here). It is a first-rate debunking of the idea that
linguistic minds are pattern matchers.

[2]
This is not unique to linguistic cognition. Lots of work in cog sci seems
to identify higher cognition with categorization and pattern matching. One of
the most important contributions of modern linguistics to cog sci has been to
demonstrate that there is much more to cognition than this. In fact, the hard
problems have less to do with pattern recognition than with pattern generation
via rules of various sorts. See notes 5
and 6 for more offhanded remarks of deep interest.

[3]
I suspect that some partisans of Construction Grammar fall victim to the same
misapprehension.

[4]
Many cog-neuro types confuse hierarchy with recursion. A recent prominent
example is in Frankland and Greene’s work on theta roles. See here
for some discussion. Suffice it to say that one can have hierarchy without
recursion, and recursion without hierarchy in the derived objects that are
generated. What makes linguistic objects distinctive is that they are the
products of recursive processes that deliver hierarchically structured objects.

[5]
Note that unboundedness implies novelty, but novelty can exist without
unboundedness. The creativity issue relates to easy handling of novel structures. This can occur even in
small finite domains. Creativity implies projection, which must specify a
dimension of generalization along which inputs can be extended to apply to instances
beyond the input. Unboundedness makes projection a no-brainer. It further
implies that the generalization involves recursive rules. Unboundedness cannot
be pattern matching. It requires a specification of rules that can be
repeatedly applied to create novel patterns. Thus, it is important to keep the
issue of unboundedness separate from that of projection. What makes the
unboundedness of syntax so important is that it requires that we move beyond
the pattern-template-categorization conception of cognition.

[6]
It is arguable that some rules are more manifest in the data than others are and so are more accessible to inductive
procedures. Chomsky makes this distinction in L&M, contrasting surface
structure, which contains “formal properties that are explicit in the signal,”
with deep structure and transformations, for which there is very little to no such
information in the signal
(L&M:19). For another discussion of this distinction see (here).

[8]
We really should distinguish between ‘learning’ and ‘acquisition.’ We should
reserve the first term for the pattern recognition variety and adopt the second
for the induction to rules variety. Problems of the second type call for
different tools/approaches than those in the first and calling both ‘learning’
merely obscures this fact and confuses matters.

[9]
Although this is a sermon for another time, it is important to understand what
a good model does: it characterizes the underlying mechanism. Good models model
mechanism, not data. Data provides evidence for mechanism, and unless it does
so, it is of little scientific interest. Thus, if a model identifies the wrong
mechanism, then no matter how apparently successful it is in covering data, it is the
wrong model. Period. That’s one of the reasons connectionist models are of
little interest, at least when it comes to syntactic matters.

I should add that analogous creativity concerns drive Gallistel’s arguments
against connectionist brain models.
He notes that many animals display an effectively infinite variety of behaviors
in specific domains (caching behavior in birds or dead reckoning in ants) and
that these cannot be handled by connectionist devices that simply track the
patterns attested. If Gallistel is right (and you know that I think he is) then
the failure to appreciate the logic of infinity makes many current models of
mind and brain beside the point.

[10]
Note that unboundedness implies novelty, but novelty can exist without
unboundedness. The creativity issue relates to easy handling of novel structures. This can occur even in
small sets. Creativity implies projection which must specify a dimension of
generalization along which inputs can be extended to apply to instances beyond
the input. Unboundedness makes projection a no-brainer. It further implies that
the generalization is due to recursive rules that require more than
establishing a fixed number of patterns that can be repeatedly filled to create
novel instances of that pattern.

Tuesday, July 12, 2016

Talk about data problems! Here is one we should all be aware of. Beware native speakers with an agenda or a sense of humor. Thx to Paul Pietroski for bringing this severe data problem wrt speaker judgments to my attention.

Sunday, July 10, 2016

Here are three more short pieces (here, here, here) on the academic publishing landscape. All three relate to publishing in bio-med and so have only a glancing relation to what goes on in linguistics. We are shielded from many of the problems cited by the relative irrelevance of our work for useful products. There is clearly a lot of pressure on research to come to the right conclusion in some fields. So maybe we should consider our lack of funding from certain sources to be a partial blessing.

The last piece is a bit more interesting than the first two in that it tries to find ways of mitigating the pressures. One of the more interesting claims is that blind review did not do much to help to promote more objective reviewing. Another interesting idea is to have reviews signed so that reviewers are responsible for their comments. Of course, I can imagine that there are also downsides to this, especially if the reviewee is not someone that a reviewer would want to mess with for all sorts of personal or professional reasons. At any rate, interesting stuff.

Wednesday, July 6, 2016

Every summer I go back to Generative Grammar (GG) re-education
camp. I pick up an old classic (or two) and reread it/them to see what I failed
to understand when I read it/them last and what nuggets there remain to mine.
This year, prompted by a project that I will tell you about soon (in order to
pick your collective brains) I re-read Syntactic
Structures (SS), Topics (T) and Language and Mind (L&M) (well, I’m
in the middle of the last two and have read the first twice). At any rate,
several things struck me and they seemed like good blog fodder, so let me
share.

Before I got into linguistics (when I was still but a starry
eyed philo major (no cracks please, too easy)) I thought that deep structure
was, well, deep. After all, why call it deep
structure if it was just another level, without particular significance? Wasn’t
it, after all, the place where Gs met semantics (at least in both SS and the
standard theory) and wasn’t meaning deep?

Moreover, I was not the only one who thought this. The
popular press circulating around GG always seemed to zero in on “deep
structure,” surface structure being so, well, surfacy. Like any philosopher,
given a choice between plumbing the depths and skimming the surface I was all
for going down and deep.

As I grew more sophisticated I came to realize the error of
my ways and how terminology had misled me. I would sneer at terminologically
naive neophytes who failed to appreciate that “deep” did not mean
“fundamental.” I would knowingly intone that deep structure was just another
level and of no more intrinsic significance than any other level. I would also
glibly point out that meaning was not restricted to deep structure as the
Katz-Postal hypothesis was slowly giving way to interpretive theories of
semantic interpretation where surface structure fed some aspects of meaning
(Jackendoff 1972 being the seminal text).[1]

And I was wrong. Sophistication be damned, deep structure
really was/is deep, even if not in the way that I originally thought. That’s
what my summer rereading of the big three above showed me. So, why is deep
structure deep in the sense of terrifically significant to the GG enterprise?
Here’s why in one phrase: linguistic creativity (LC).

Chomsky noted that the fact of LC was underappreciated.
Humans are able to appropriately produce and easily understand a (practically)
infinite number of linguistic expressions. Or, as a matter of course, humans
produce or parse linguistic expressions they have never before encountered. The
capacity to do this requires that they
have internalized a system of rules. What kinds? Rules that tightly couple a
linguistic expression’s meaning with that linguistic expression’s articulation
(sound, gesture). Absent this kind of theory (aka a grammar that generates an
infinite number of sound meaning pairings) there is no possible account for
this easily observed fact that humans are linguistically creative.

Moreover, as Chomsky argues in SS and L&M and T, the
structure required to code for articulation is insufficient to represent core
aspects of meaning. Here’s Chomsky in Topics
(17):

It is clear…that deep structures
must be quite different from this surface structure. For one thing, the surface
representation in no way expresses the grammatical relations that are…crucial
for semantic interpretation. Secondly, in the case of ambiguous sentences such
as, for example, (5), only a single surface structure may be assigned but the
deep structures must obviously differ. Such examples …are sufficient to
indicate…that deep structures cannot be identified with surface structures. The
inability of surface structures to indicate semantically significant
grammatical relations (i.e., to serve as deep structures) is one fundamental
fact that motivated the development of transformational generative grammar…

Thus any account of LC which wants to account for the human
capacity to use an unbounded number of linguistic expressions (i.e. linguistic
expressions a given native speaker has never before encountered) must include a
system of rules that recursively generate sound meaning pairings based on
different kinds of representations that are G related. Given LC and the fact that meaning
structures are different from sound structures, there really is no other logical option than something like a
transformational GG.

Before proceeding, I want to make an unpaid political
announcement. GG has been regularly accused of dissing meaning. For example, the
autonomy of syntax is often misunderstood as the irrelevance of semantics. As
you all know, this is completely bogus. The autonomy of syntax thesis is a very
weak claim. It notes that syntactic properties are not reducible to
semantic (or phonetic) ones. It does not deny that meaning (and sound) facts
are G relevant.

Moreover, Chomsky emphasizes this point in all three works.
In both T and L&M he emphasizes that the BIG problem with earlier
Structuralism was its inability to accommodate the simplest facts about meaning
(in particular what we now call theta roles (who did what to whom)). Thus, how
language delivers meaning was at the center of Chomsky’s novel GG proposals and
was the central feature of his critique of Structuralism. And this is not
something that it takes sophisticated close textual analysis to discover. This
leads me to think that many (maybe most) critics of GG’s syntactocentrism
simply did not (and have not) read the work being criticized. Not only is this
deeply ignorant, but it is intellectually irresponsible as well. Sadly, this
kind of ignorant criticism has become a hallmark of the anti GG literature,
something that people like Evans (see here
and links provided) and Everett (see here)
among others have further personalized. However, what is clear on rereading
these classics is that these critiques are not based on even a cursory reading
of the relevant texts.

Ok, back to the main programming: So, LC properly described leads
quickly to the modern conception of grammar, one with distinctive levels for
the coding of articulatory and semantic information (surface structure (S-S)
and deep structure (D-S)) and
operations that unite these levels (aka, transformations (T)). So what made
deep structure deep was the realization that LC required it and once one had
D-S and understood it to be structurally distinct from S-S then one needed Ts
to relate them and the whole modern GG enterprise is up and running. Here’s
Chomsky in L&M (17):

…the speaker makes infinite use of
finite means. His grammar must, then, contain a finite system of rules that
generate infinitely many deep and surface structures, appropriately related. It
must also contain rules that relate these abstract structures to certain
representations of sound and meaning…

Well actually, most
(not all) of the modern GG enterprise is motivated by the fact of LC, in
particular the project of specifying the properties of particular human Gs and
the enterprise of specifying the properties humans must have for acquiring
these Gs. The minimalist program adds an extra dimension: the extra question
(already mooted in these early works btw) of separating out the linguistically
specific factors underlying these two capacities from the more cognitively and
computationally general ones that underlie these capacities but are not
specifically linguistically dedicated.

So, a recursive G is part of any theory aspiring to address
the fact of LC and given the difference between S-S and D-S this G will have at
least a D-S level, an S-S level and a T component to relate them. And this
brings us to why deep structure was a deep discovery. Critically, structuralism
was ready to recognize something like S-S. What structuralism missed was any
level analogous to D-S, the level relevant to semantic interpretation. Again
Chomsky in L&M (19):

…[M]odern structural and
descriptive linguistics … restricts itself to the analysis of what I have
called surface structure, to formal properties that are explicit in the signal
and to phrases and units that can be determined from the signal by techniques
of segmentation and classification…[S]uch taxonomic analysis leaves no place
for the deep structures…[which] cannot be derived…by segmentation and
classification of segmented units, nor can the transformational operations
relating deep and surface structure…

So, what brought down classical structuralism and the
Empiricist/Behaviorist psychology that it embraced? Well, the observation that
LC required something like D-S. That, in short, is one reason why D-S really is
deep.

I should add that the relevance of this line of thinking to
G issues has still not been entirely internalized. There is an industry trying
to show that phrase structure can be statistically induced from the signal,
thinking that were this so the GG enterprise would be fatally wounded (see
Elissa Newport’s work on this for example). There is nary a mention of the
problem of relating D-Sish facts and S-Sish facts. The idea seems to be that if
we could just get hierarchically structured S-Ss from the signal the whole GG
project as envisioned by Chomsky over 60 years ago would be discredited as
fundamentally empirically flawed. There is little recognition that the problems
for structuralism and its attendant empiricist psychology started from the
concession that S-S might be amenable
to standard analytic (associationist) techniques.[2]
The problem was that structuralism left out half the problem, the D-S part.
Things, sadly, are no better today in much of the anti-GG literature.

There is a second reason that D-S was considered deep: it
pointed to where language was likely to be invariant. Chomsky notes this in
L&M discussing the philosophical grammarians (e.g. Port Royal types). He
observes that modern conceptions of GG “make the assumption that languages will
differ very little despite considerable diversity in superficial realization”
(76). Where will languages be “similar”? “[O]nly at the deeper level, the
level at which grammatical relations are expressed and at which the processes
that provide for the creative aspect of language use are to be found” (77).
Thus, D-S and the attendant operations that deliver a corresponding S-S were
the natural locus of invariance given the
obvious surface diversity of natural languages. Thus, the other deep
property of D-S was that it and the principles mapping it to S-S were likely to
be invariant across Gs, these invariances being key features of UG.

So, deep structure had some important features that arguably
made it deep. But I can sense all you minimalists out there developing an
uncomfortable intellectual itch that can be characterized roughly as follows:
how deep could deep structure be given that contemporary theories have
dispensed with it. Good itch. Let me scratch.

First, we have retained much of the point of deep structure
in contemporary theory. So, for example, nobody now thinks that the syntactic
structure relevant to surface phonetic form is the same as required to code for
underlying grammatical function/thematic form. Indeed, given the predicate
internal subject hypothesis there is almost no sentence, no matter how simple,
in which the underlying semantic subject (the external argument) starts in
surface subject position (e.g. Spec T). The structure relevant to phon
interpretation is understood as different from the syntax relevant for sem
interpretation and it is taken for granted that any adequate G will have to
generate an infinite number of phon-sem pairs. In other words, the moral of D-S has been completely
internalized.

So too has the idea that D-S is G invariant. Contemporary
syntactic theory does not tolerate variation in the mapping of theta roles to
initial phrase marker positions. We are all UTAHers now! Thus, we do not expect
Gs just like English but where affected objects are underlying subjects and
agents are underlying objects. This is not a dimension of permissible
variation. Nor do we expect the mapping principles that deliver CI (and
possibly AP) interpretable objects to differ significantly. Operations are
constrained by universal principles like phase impenetrability (aka
subjacency), the ECP, minimality, etc. When we think of universals in GG, this
is the kind of thing we are assuming.
GG makes no claim about surface invariances. We expect the overt surface
properties of language to vary dramatically. We expect little CI/LF variation
and no variation in the principles of UG. Thus, invariance lives in the forms/derivations
that feed CI, not in the surface realizations of these derivations. Again, this
endorses much of the D-S conceptions outlined in SS, L&M and Topics.

So where’s the difference? Modern syntactic theory,
minimalism, has largely abandoned the technology of D-S, not its grammatical
point. Minimalism no longer assumes that there is a G level like deep structure
or D-structure, i.e. a level at which
GFs are determined by something like phrase structure rules. This was part of
every prior (Chomskyan) GG theory. The rejection of D-S (and its analogues) has
been more or less complete.

We have given up the idea that D-S is the product of PS
rules which all apply prior to displacement operations. In fact, thoroughly
modern Minimalists don’t recognize a formal distinction between E-merge and I-merge,
both just being instances of the same underlying Merge operation. Furthermore,
Bare Phrase Structure has eliminated the distinction between Structure building
and lexical insertion so critical to earlier D-S conceptions. In modern theory there
is strictly speaking nothing like a PS rule anymore and so not much left of the
idea that the Grammatical Functions relevant to semantic interpretation are
coded via PS rules.

There remains one last residue of the old technical D-S
idea. Some (e.g. Chomsky and PRO lovers everywhere) still hold onto the view
that the logical GF roles are products of E-merge exclusively. Others (e.g.
moi) do not restrict GF marking to E-merge, but allow marking via I-merge as well.
However, this is really the last place where the technical notion of D-S has
life. I, of course, believe that I am right and Chomsky is wrong here (though I
would not bet on myself, at least not a lot). However, this is really the last
residue of the older conception of D-S as a level. The technical conception seems largely gone, though the empirical and
conceptual points D-S served have been completely internalized.

Last point: here’s something else that struck me in
rereading this literature: why is it that S-S and D-S don’t perfectly match?
One can imagine a world in which these two had to coincide. In such a world
there would be two articulations of flying
planes can be dangerous (one corresponding to each interpretation) and
passives (where surface and underlying grammatical relations do not coincide)
would not exist. This is a perfectly conceivable universe, but it is not ours.
Why not? Why is D-S distinct from S-S? Why don’t they match one to one? Might
the mere fact that these two kinds of information are differentially encoded
support Chomsky’s recent suggestions that the mapping to articulation is a late
accretion and that the primary mapping is from something like D-S to SI? I
don’t know, but it is curious that our world is not more neatly arranged. And
that it is not, should be something we think about, and maybe one day address.
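The mismatch can be illustrated with a toy encoding (the labels and bracketings below are simplified assumptions of my own, not Chomsky’s actual analyses): two distinct underlying structures for flying planes can be dangerous collapse into one and the same surface string.

```python
# Two simplified underlying structures for the ambiguous sentence; the
# category labels and bracketings are rough illustrative assumptions.

# Reading 1: "flying" as a gerund taking "planes" as its object
# (roughly: the activity of flying planes can be dangerous).
reading1 = ("S", ("GerP", "flying", ("NP", "planes")),
                 ("VP", "can be dangerous"))

# Reading 2: "flying" as a participle modifying "planes"
# (roughly: planes that fly can be dangerous).
reading2 = ("S", ("NP", ("AP", "flying"), "planes"),
                 ("VP", "can be dangerous"))

def surface(tree):
    """Flatten a tree to its terminal yield, i.e. its surface string."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return " ".join(surface(c) for c in children)

# One surface string, two distinct underlying structures.
print(surface(reading1))        # flying planes can be dangerous
print(surface(reading2))        # flying planes can be dangerous
print(reading1 == reading2)     # False
```

In the imagined matching world, `surface` would be invertible and the two readings would have to sound different; in ours it is many-to-one, which is exactly the gap that transformations were introduced to bridge.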

That’s it for now. The discovery of D-S launched the modern
GG enterprise. The existence of D-Sish facts and what they mean for GG are now
part of the common wisdom. It is fair to say that D-S focused scientific
attention on LC and Plato’s problem. If that ain’t “deep” I don’t know what
could be.

[2]
Btw, this is almost certainly false once one starts thinking about how to
“abstract” out categories that allow for recursion. It is one thing to define
the VP in John saw the dog via these
simple techniques and another to define the VPs in John saw the dog that Bill thinks that Mary kissed using them. Once
we consider categories with recursive subparts the standard analytic techniques
quickly fail. Simple phrase structure might be statistically coaxed from
surface forms. Interesting ones with complex structure will not be.