Monday, 3 September 2012

What Chomsky doesn't get about child language

Noam Chomsky is widely regarded as an
intellectual giant, responsible for a revolution in how people think about
language. In a recent book by Chomsky
and James McGilvray, The
Science of Language, the foreword states: “It is particularly important to understand Chomsky’s views … not
only because he virtually created the modern science of language by himself ….
but because of what he and colleagues have discovered about language –
particularly in recent years…”

As someone who works on child
language disorders, I have tried many times to read Chomsky in order to appreciate
the insights that he is so often credited with. I regret to say that, over the years, I have come
to the conclusion that, far from enhancing our understanding of language
acquisition, his ideas have led to stagnation, as linguists have gone through
increasingly uncomfortable contortions to relate facts about children’s
language to his theories. The problem is
that the theories are derived from a consideration of adult language, and take
no account of the process of development. There is a fundamental problem with an
essential premise about what is
learned that has led to years of confusion and sterile theorising.

Let us start with Chomsky’s famous
sentence "Colourless green ideas sleep furiously". This was used to demonstrate independence of syntax and
semantics: we can judge that this sentence is syntactically well-formed even
though it makes no sense. From this, it was a small step to conclude that
language acquisition involves deriving abstract syntactic rules that determine
well-formedness, without any reliance on meaning. The mistake here was to
assume that an educated adult's ability to judge syntactic well-formedness in isolation has anything to do with how that
ability was acquired in childhood. Already in the 1980s, those who actually
studied language development found that children used a wide variety of cues, including syntactic,
semantic, and prosodic information, to learn language structure (Bates &
MacWhinney, 1989). Indeed,
Dabrowska (2010) subsequently showed that agreement on well-formedness of complex sentences was far from universal in adults.

Because he
assumed that children were learning abstract syntactic rules from the outset,
Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual
learning system: this could be shown by formal proof from mathematical learning theory. The
logical problem is that such learning is too unconstrained: any grammatical
string of elements is compatible with a wide range of underlying rule
systems. The learning becomes a bit
easier if children are given negative evidence (i.e., the learner is explicitly
told which rules are not correct), but (a) this doesn’t really happen and (b) even
if it did, arrival at the correct solution is not feasible without some prior
knowledge of the kinds of rules that are allowable. In an oft-quoted sentence, Chomsky (1965)
wrote: "A consideration of the
character of the grammar that is acquired, the degenerate quality and
narrowly limited extent of the available data, the striking uniformity of the
resulting grammars, and their independence of intelligence, motivation and
emotional state, over wide ranges of variation, leave little hope that much of
the structure of the language can be learned by an organism initially
uninformed as to its general character." (p. 58) (my italics).

So we were led to the inevitable, if surprising, conclusion that if grammatical
structure cannot be learned, it must be innate. But different languages have
different grammars. So whatever is innate has to be highly abstract – a
Universal Grammar. And the problem is
then to explain how children get from this abstract knowledge to the specific
language they are learning. The field became encumbered by creative but highly
implausible theories, most notably the parameter-setting account, which
conceptualised language acquisition as a process of "setting a switch" for a number of innately-determined
parameters (Hyams, 1986). Evidence, though, that children’s grammars actually changed
in discrete steps, as each parameter became set, was lacking. Reality was much
messier.

Viewed from a contemporary perspective, Chomsky’s concerns about the unlearnability of language
seem at best rather dated and at worst misguided. There are two key features
in current developmental psycholinguistics that were lacking from Chomsky’s account,
both concerning the question of what is learned. First, there is the question of
the units of acquisition: for Chomsky, grammar
is based on abstract linguistic units such as nouns and verbs, and it was
assumed that children operated with these categories. Over the past 15 years, direct
evidence has emerged to indicate that children don't start out with awareness
of underlying grammatical structure; early learning is word-based, and patterning in the input at the level of abstract elements is something children
become aware of as their knowledge increases (Tomasello, 2000).

Second, Chomsky viewed grammar as a rule-based system that determined allowable
sequences of elements. But people’s linguistic knowledge is probabilistic, not
deterministic. And there is now a large
body of research showing how such probabilistic knowledge can be learned from sequential
inputs, by a process of statistical learning. To take a very simple example, if
repeatedly presented with a sequence such as ABCABADDCABDAB, a learner will
start to be aware of dependencies in the input, i.e. B usually follows A, even
if there are some counter-examples. Other
types of sequence such as AcB can be learned, where c is an element that can
vary (see Hsu & Bishop, 2010, for a brief account). Regularly encountered sequences will then form higher-level units. At the time Chomsky was first writing,
learning theories were more concerned with forming of simple associations,
either between paired stimuli, or between instrumental acts and outcomes. These
theories were not able to account for learning of the complex structure of
natural language. However, once language researchers started to think in terms
of statistical learning, this led to a reconceptualisation of what was learned,
and many of the conceptual challenges noted by Chomsky simply fell away.
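The toy example above can be made concrete in a few lines of code. This is my own minimal sketch of transitional-probability learning (the function name and representation are illustrative, not drawn from any particular published model):

```python
from collections import defaultdict

def transitional_probabilities(sequence):
    """Count how often each element follows each other element,
    then normalise into conditional probabilities P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs

probs = transitional_probabilities("ABCABADDCABDAB")
print(probs["A"])  # B follows A on 4 of 5 occasions: {'B': 0.8, 'D': 0.2}
```

Even on this tiny input, the learner picks up the dependency (B usually follows A) despite the counter-example, which is exactly the kind of probabilistic, graded knowledge described above.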

Current statistical learning accounts allow us to move ahead and to
study the process of language
learning. Instead of assuming that
children start with knowledge of linguistic categories, categories are
abstracted from statistical regularities in the input (see Special Issue 03, Journal of Child Language 2010, vol 37).
The units of analysis thus change as the child develops expertise. And, consistent with the earlier writings of
Bates and MacWhinney (1989), children's language is facilitated by the presence
of correlated cues in the input, e.g., prosodic and phonological cues in
combination with semantic context. In
sharp contrast to the idea that syntax is learned by a separate modular system
divorced from other information, recent research emphasises that the young
language learner uses different sources of information together. Modularity
emerges as development proceeds.

A statistical learning account does not, however, entail treating the
child as a “blank slate”. Developmental psychology has for many years focused
on constraints on learning: biases that lead the child to attend to particular features
of the environment, or to process these in a particular way. Such constraints will affect how language input is processed, but they are a long way from the notion of a Universal Grammar. And such constraints are
not specific to language: they influence, for instance, our ability to perceive
human faces, or to group objects perceptually.

It would be
rash to assume that all the problems of language acquisition can be solved by
adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others – Why don’t
other species have syntax? How did language evolve? Is linguistic ability
distinct from general intelligence? But
we now have a theoretical perspective that makes sense in terms of what we know
about cognitive development and neuropsychology, that has general applicability
to many different aspects of language acquisition, which forges links between
language acquisition and other types of learning, and leads to testable
predictions. The beauty of this approach is that it is amenable both to
experimental test and to simulations of learning, so we can identify the kinds
of cues children rely on, and the categories that they learn to operate with.

So how does Chomsky respond to
this body of work? To find out, I
decided to take a look at The Science of Language, which is based on transcripts
of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that
the book is intended for a general audience and “Professor Chomsky’s
contributions to the interview can be understood by all”.

Well, as “one of the most influential
thinkers of our time”, Chomsky fell far short of expectations. Statistical learning and connectionism were not given serious consideration, but were rapidly dismissed as versions
of behaviourism that can’t possibly explain language acquisition. As noted by Pullum elsewhere,
Chomsky derides Bayesian learning approaches as useless – and at one point
claimed that statistical analysis of sequences of elements to find morpheme
boundaries “just can’t work” (cf. Romberg & Saffran, 2010). He seemed stuck with his critique of Skinnerian learning and ignorant of how things had changed.

I became interested in not just what Chomsky said, but how he said it. I’m afraid that despite the
reassurances in the foreword, I had enormous difficulty getting through this
book. When I read a difficult text, I usually take notes to summarise the main
points. When I tried that with the Science of Language, I got nowhere because
there seemed no coherent structure. Occasionally an interesting gobbet of information
bobbed up from the sea of verbiage, but it did not seem part of a consecutive argument. The style is so discursive that it’s
impossible to précis. His rhetorical approach seemed the antithesis of a scientific argument. He made sweeping
statements and relied heavily on anecdote.

A stylistic device commonly used
by Chomsky is to set up a dichotomy between his position and an alternative,
then represent the alternative in a way that makes it preposterous. For
instance, his rationalist perspective on language acquisition, which
presupposes innate grammar, is contrasted with an empiricist position in which “Language
tends to be seen as a human invention, an institution to which the young are
inducted by subjecting them to training procedures”. Since we all know that children learn language
without explicit instruction, this parody of the empiricist position has to be
wrong.

Overall, this book was a disappointment: one came away with a sense that a lot of
clever stuff had been talked about, and much had been confidently asserted, but
there was no engagement with any opposing point of view – just disparagement. And as Geoffrey Pullum concluded, in a review
in the Times
Higher Education, there was, alas, no science to be seen.

Correction: 4/9/2012. I had originally cited the wrong reference to
Dabrowska (Dabrowska, E. 1997. The LAD goes to school : a cautionary
tale for nativists. Linguistics, 35, 735-766). The 1997 paper is
concerned with variation in adults' ability to interpret syntactically
complex sentences. The 2010 paper cited above focuses on grammaticality
judgements.

One of the nice things about blogging is that it gives an
opportunity to get feedback on one’s point of view. I’d like to thank all those
who offered comments on what I’ve written here, particularly those who have
suggested readings to support the arguments they make. The sheer diversity of views has been
impressive, as is the generally polite and scholarly tone of the arguments.
I’ve tried to look seriously at the points people have made and I’ve had a
fascinating few weeks reading some of the broader literature recommended by
commentators.

I quickly realised that I could easily spend several months
responding to comments and reading around this area, so I have had to be
selective. I’ll steer clear of commenting on Chomsky’s
political arguments, which I see as quite a separate issue. Nor am I prepared
to engage with those who suggest Chomsky is above criticism, either because he
is so famous, or because he’s been around a long time. Finally, I won’t say more about the views of
those who have expressed agreement, or extensions of my arguments – other than
to say thanks: this is a weird subject area where all too often people seem
scared to speak out for fear of seeming foolish or ignorant. As Anon (4 Sept)
says, it can quickly get vitriolic, which is bad for everyone. But if we at least boldly say what we think,
those with different views can either correct us, or develop better
arguments.

I’ll focus in this reply on the main issues that emerged from the discussion: how far is statistical learning compatible with a
Chomskyan account, are there things that a non-Chomskyan account simply can’t
deal with, and finally, are there points of agreement that could lead to more
positive engagement in future between different disciplines?

How compatible is statistical learning with a Chomskyan account?

A central point made by Anon (3rd Sept/4th Sept) and Chloe
Marshall (11th Sept) is that probabilistic learning is compatible with Chomsky's views.

This seems to be an absolutely crucial point. If there
really is no mismatch between what Chomsky is saying and those who are
advocating accounts of language acquisition in terms of statistical learning,
then maybe the disagreement is just about terminology and we should try harder
to integrate the different approaches.

It’s clear we can differentiate between different levels of language processing. For instance, here are just three examples of
how statistical learning may be implicated in language learning:

The
original work by Saffran et al (1996) focused on demonstrating that
infants were sensitive to transitional probabilities in syllable strings. It
was suggested that this could be a mechanism that was involved in segmenting
words from speech input.

Redington et al (1998) proposed that information about lexical categories could be
extracted from language input by considering sequential co-occurrences of words.

Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns
of specific lexical items in their linguistic input, concluding that they first
acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe
heuristic methods for uncovering structure in input, using the example of the
ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional
regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive
rule-like patterns that support generalization.
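The distributional idea behind Redington et al’s proposal can be illustrated with a toy sketch. The corpus, the context-vector representation, and the similarity measure here are all invented for illustration; the original work used large naturalistic corpora and hierarchical clustering:

```python
from collections import Counter
from math import sqrt

corpus = [
    "the cat sleeps", "the dog sleeps", "a cat eats",
    "a dog eats", "the bird sings", "a bird sleeps",
]

# Represent each word by counts of its immediate left and right neighbours.
vectors = {}
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        v = vectors.setdefault(w, Counter())
        if i > 0:
            v[("left", words[i - 1])] += 1
        if i < len(words) - 1:
            v[("right", words[i + 1])] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = lambda v: sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b))

# Words that share distributional contexts end up similar:
print(cosine(vectors["cat"], vectors["dog"]))     # high: noun-like contexts
print(cosine(vectors["cat"], vectors["sleeps"]))  # zero: disjoint contexts
```

Here “cat” and “dog” emerge as members of the same distributional class without any innate category labels, which is the point at issue.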

So what does Chomsky make of all of this? I am grateful to
Chloe for pointing me to his 2005 paper “Three factors in language design”, which was
particularly helpful in tracing the changes in Chomsky’s views over time.

Here’s what he says on word boundaries:

“In Logical Structure of Linguistic Theory
(LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different
framework, for identifying morphemes in terms of transitional probabilities,
though morphemes do not have the required beads-on-a-string property. The basic
problem, as noted in LSLT, is to show that such statistical methods of chunking
can work with a realistic corpus. That hope turns out to be illusory, as has
recently been shown by Thomas Gambell and Charles Yang (2003), who go on to
point out that the methods do, however, give reasonable results if applied to
material that is preanalyzed in terms of the apparently language-specific
principle that each word has a single primary stress. If so, then the early
steps of compiling linguistic experience might be accounted for in terms of
general principles of data analysis applied to representations preanalyzed in terms
of principles specific to the language faculty....”

Gambell and Yang don’t seem to have published in the peer-reviewed
literature, but I was able to track down four papers by these authors (Gambell & Yang, 2003; Gambell & Yang, 2004; Gambell & Yang, 2005a; Gambell & Yang, 2005b), which all make
essentially the same point. They note that a simple rule that treats a
low-probability syllabic transition as a word boundary doesn’t work with a
naturalistic corpus where a high proportion of words are monosyllabic. However, adding prosodic information –
essentially treating each primary stress as belonging to a new word – achieves
a much better level of accuracy.

The work by Gambell and Yang is exactly the kind of research
I like: attempting to model a psychological process and evaluating results
against empirical data. The insights gained from the modelling take us forward.
The notion that prosody may provide key information in segmenting words seems
entirely plausible. If generative grammarians wish to refer to such a cognitive
bias as part of Universal Grammar, that’s fine with me. As noted in my original piece, I agree that
there must be some constraints on learning; if UG is confined to this kind of
biologically plausible bias, then I am happy with UG. My difficulties arise with
more abstract and complex innate knowledge, such as are involved in parameter
setting (of which, more below).

But, even at this level of word identification, there are
still important differences between my position and the Chomskyan one. First of
all, I’m not as ready as Chomsky to dismiss statistical learning on the basis
of Gambell and Yang’s work. Their model assumed a sequence of syllables was a
word unless it contained a low transitional probability. Its accuracy was so
bad that I suspect it gave a lower level of success than a simpler strategy:
“Assume each syllable is a word.” But
consider another potential strategy for word segmentation in English, which
would be “Assume each syllable is a complete word unless there’s a very high
transitional probability with the next syllable.” I’d like to see a model like
that tested before assuming transitional probability is a useless cue.
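Such a model is easy to prototype. A toy sketch of the boundary-at-low-transitional-probability rule follows; the syllable stream and the probabilities are invented for illustration, not estimated from any real corpus:

```python
# Toy transitional probabilities between adjacent syllables
# (invented numbers, for illustration only).
tp = {("ba", "by"): 0.9, ("by", "likes"): 0.1, ("likes", "her"): 0.2,
      ("her", "bot"): 0.1, ("bot", "tle"): 0.95}

syllables = ["ba", "by", "likes", "her", "bot", "tle"]

def segment_low_tp(sylls, tp, threshold=0.3):
    """Place a word boundary wherever the transitional probability
    between adjacent syllables dips below a threshold."""
    words, current = [], [sylls[0]]
    for prev, nxt in zip(sylls, sylls[1:]):
        if tp.get((prev, nxt), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(nxt)
    words.append("".join(current))
    return words

print(segment_low_tp(syllables, tp))  # ['baby', 'likes', 'her', 'bottle']
```

Note that the alternative strategy suggested above – treat each syllable as a complete word unless the transitional probability to the next syllable is very high – is the same rule with a high threshold (e.g. `threshold=0.8`), so both variants could be evaluated against a realistic corpus with essentially the same code.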

Second, Gambell and Yang stay within what I see as a
Chomskyan style of thinking which restricts the range of information available
to the language processor when solving a particular problem. This is parsimonious and makes modelling
tractable, but it’s questionable just how realistic it is. It contrasts sharply
with the view proposed by Seidenberg and MacDonald (1999), who argue that cues
that individually may be poor at solving a categorisation problem, may be much
more effective when used together. For instance, the young child doesn’t just
hear words such as ‘cat’, ‘dog’, ‘lion’, ‘tiger’, ‘elephant’ or ‘crocodile’: she
typically hears them in a meaningful context where relevant toys or pictures
are present. Of course, contextual information is not always available and not
always reliable. However, it seems odd to assume that this contextual
information is ignored when populating the lexicon. This is one of the core
difficulties I have with Chomsky: the sense that meaning is not integrated in
language learning.

Turning to lexical categories, the question is whether
Chomsky would accept that these might be discovered by the child through a
process of statistical learning, rather than being innate. My understanding is that he has rejected this idea, and
I have not found any statement by him to suggest otherwise, but others may be
able to point to one. Franck Ramus (4th Sept) argues that children do represent
some syntactic categories well before this is evident in their language and
this is not explained by statistical relationships between words. I’m not convinced by the evidence he cites, which is based on different brain responses
to grammatical and ungrammatical sentences in toddlers (Bernal et al, 2010).
First, the authors state: “Infants could therefore not detect the
ungrammaticality by noticing the co-occurrence of two words that normally never
occur together”. But they don’t present
any information on transitional probabilities in a naturalistic corpus for the
word sequences used in their sentences. All that is needed for statistical
learning is for the transitional probabilities to be lower in the ungrammatical
than in the grammatical sentences: they don't have to be zero. Second, the children in this study were two
years old, and would have been exposed to a great deal of language from which
syntactic categories could have been abstracted by mechanisms similar to those
simulated by Redington et al.

Regarding syntax, I was pleased to be introduced to the work
of Jeffrey Lidz, whose clarity of expression is a joy after struggling with
Chomsky. He reiterates a great deal of what I regard as the ‘standard’
Chomskyan view, including the following:

“Speaking broadly, this research generally finds that
children’s representations do not differ in kind from those of adults and that
in cases where children behave differently from adults, it is rarely because
they have the wrong representations. Instead, differences between children and
adults are often attributed to task demands (Crain & Thornton, 1998),
computational limitations (Bloom, 1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to
representational differences between children and adults (Radford, 1995; see
also Goodluck, this volume).” Lidz, 2008

The claim illustrated by the studies Lidz cites – that
children’s representations are the same as adults’, except for performance
limitations – has intrigued me for many years. As someone who has long been
interested in children’s ability to understand complex sentence structures, I long
ago came to realise that the last thing children usually attend to is syntax:
their performance is heavily influenced by context, pragmatics, particular
lexical items, and memory load. But my response to this observation is very
different from that of the generative linguists. Whereas they strive to devise
tasks that are free of these influences, I came to the conclusion that they play a key part in language acquisition. Again, I find myself in
agreement with Seidenberg and MacDonald (1999):

“The apparent complexity of language and its uniqueness vis
a vis other aspects of cognition, which are taken as major discoveries of the
standard approach, may derive in part from the fact that these ‘performance’
factors are not available to enter into explanations of linguistic structure.
Partitioning language into competence and performance and then treating the
latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the
competence theory is developed.” (p 572)

The main problem I have with Chomskyan theory, as I
explained in the original blogpost, is the implausibility of parameter setting
as a mechanism of child language acquisition. In The Science of Language,
Chomsky (2012) is explicit about parameter-setting as an
attractive way out of the impasse created by the failure to find general UG
principles that could account for all languages. Specifically, he says:

“If you’re trying to get Universal Grammar to be articulated
and restricted enough so that an evaluation will only have to look at a few
examples, given data, because that’s all that’s permitted, then it’s going to
be very specific to language, and there aren’t going to be general principles
at work. It really wasn’t until the principles and parameters conception came
along that you could really see a way in which this could be divorced. If
there’s anything that’s right about that, then the format for grammar is
completely divorced from acquisition; acquisition will only be a matter of
parameter setting. That leaves lots of questions open about what the parameters
are; but it means that whatever is left are the properties of language.”

I’m sure readers will point out if I’ve missed anything, but
what I take away from this statement is an admission that UG is now seen as
consisting of very general and abstract constraints on processing that are not
necessarily domain-specific. The
principal component of UG that interests Chomsky is

“an operation that enables you to take mental
objects [or concepts of some sort], already constructed, and make bigger mental
objects out of them. That’s Merge. As
soon as you have that, you have an infinite variety of hierarchically
structured expressions [and thoughts] available to you.”

I have no difficulty in agreeing
with the idea that recursion is a key component of language, and that humans have a capacity for this kind of processing. But Chomsky makes another claim that I find much harder to
swallow. He sees the separation of UG from parameter-setting as a solution to
the problem of acquisition; I see it as just moving the problem elsewhere. For a start, as he himself notes, there are
“a lot of questions open” about what the parameters are. Also, children don’t behave as if parameters
are set one way or another: their language output is more probabilistic. I was
interested to read that modifications of Chomskyan theory have been proposed to
handle this:

“Developing suggestions
of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters
valued, and that incoming experience shifts the probability distribution over
languages in accord with a learning function that could be quite general. At
every stage, all languages are in principle accessible, but only for a few are
probabilities high enough so that they can actually be used.” (Chomsky, 2005,
p. 9).

So not only can the theory be adapted to handle probabilistic
data; probability now assumes a key role, as it is the factor that decides
which grammar will be adopted at any given point in development. But while I am pleased to see the
probabilistic nature of children’s grammatical structures acknowledged, I still
have problems with this account:

First, it is left unclear why a child opts
for one version of the grammar at time 1 and another at time 2, then back to
the first version at time 3. If we want an account that is explanatory rather than merely descriptive, then non-deterministic behaviour needs explaining. It could reflect the behaviour of a system that
is rule-governed but is affected by noise, or it could be a case of different
options being selected according to other local constraints. What seems less
plausible – though not impossible – is a
system that flips from one state to another with a given probability. In a similar vein, if a grammar has an optional
setting on a parameter, just what does that mean? Is there a random generator somewhere in the
system that determines on a moment-by-moment basis what is produced, or are there local factors that constrain
which version is preferred?

Second, this account ignores the fact that early usage of
certain constructions is influenced by the lexical items involved (Tomasello, 2006), raising questions about just how abstract the syntax
is.

Third, I see a clear distinction between saying that a child
has the potential to learn any grammar, and saying that the child has available
all grammars from the outset, “with all parameters valued”. I’m happy to agree
with the former claim (which, indeed, has to be true, for any
typically-developing child), but the latter seems to fly in the face of
evidence that the infant brain is very different from the adult brain, in terms
of number of neurons, proportion of grey and white matter, and
connectivity. It’s hard to
imagine what the neural correlate of a “valued parameter” would be. If the
“full array of languages” is already available in the neonate, then how is it
that a young child can suffer damage to a large section of the left cerebral
hemisphere without necessarily disturbing the ultimate level of language
ability (Bishop, 1988)?
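For concreteness, the Yang proposal quoted above (Chomsky, 2005) amounts to something like a linear reward-penalty scheme over competing grammars. Here is a minimal sketch with just two grammars; the parameter values are invented, and this is my own illustration rather than Yang’s actual model:

```python
import random

def variational_learner(parses_a, parses_b, n_inputs=5000, gamma=0.01, seed=1):
    """Toy linear reward-penalty learner over two competing grammars.
    parses_a / parses_b: probability that each grammar can analyse a
    random input sentence. Returns the final weight on grammar A."""
    random.seed(seed)
    p_a = 0.5  # start with both grammars equally weighted
    for _ in range(n_inputs):
        if random.random() < p_a:          # child tries grammar A
            if random.random() < parses_a:
                p_a += gamma * (1 - p_a)   # success: reward A
            else:
                p_a -= gamma * p_a         # failure: penalise A
        else:                              # child tries grammar B
            if random.random() < parses_b:
                p_a -= gamma * p_a         # success for B lowers A's weight
            else:
                p_a += gamma * (1 - p_a)
    return p_a

# Grammar A fits the input more often, so its weight rises -- but at any
# intermediate stage the child's output remains a probabilistic mixture.
print(variational_learner(parses_a=0.9, parses_b=0.5))
```

This makes the point at issue explicit: the learner’s behaviour at any moment is a probabilistic mixture, so the questions raised above – why one version at time 1 and another at time 2, and what the neural correlate of a “valued parameter” would be – apply directly to the weights in such a model.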

Are there things that only a Chomskyan account can explain?

Progress, of course, is most likely when people do disagree,
and I suspect that some of the psychological work on language acquisition might
not have happened if people hadn’t taken issue with being told that
such-and-such a phenomenon proves that some aspect of language must be
innate. Let me take three such examples:

1. Optional
infinitives. I remember many years ago
hearing Ken Wexler say that children produce utterances such as “him go
there”, and argue that this cannot have been learned from the input and so
must be evidence of a grammar with an immature parameter setting. However, as Julian Pine pointed out at the
same meeting, children do hear sequences such as this in sentences such as “I
saw him go there”, and furthermore children’s optional infinitive errors tend
to occur most on verbs that occur relatively frequently as infinitives in
compound finite constructions (Freudenthal et al., 2010).

2. Fronted interrogative verb auxiliaries. This is a classic
case of an aspect of syntax that Chomsky (1971) used as evidence for Poverty of
the Stimulus – i.e., the inadequacy of language input to explain language
knowledge. Perfors et al (2010) take this example and demonstrate that it is
possible to model acquisition without assuming innate syntactic knowledge. I’m
sure many readers would take issue with certain assumptions of the modelling,
but the important point here is not the detail so much as the demonstration
that some assumptions about impossibility of learning are not as watertight as
often assumed: a great deal depends on how you conceptualise the learning
process.

3. Anaphoric ‘one’. Lidz et al (2003) argued that toddlers
aged around 18 months manage to work out the antecedent of the anaphoric
pronoun “one” (e.g. “Here’s a yellow bottle. Can you see another one?”), even
though there was insufficient evidence in their language input to disambiguate
this. The key issue is whether “another one” is taken to mean the whole noun
phrase, “yellow bottle”, or just its
head, “bottle”. Lidz et al note that in
the adult grammar the element “one” typically refers to the whole constituent
“yellow bottle”. To study knowledge of this aspect of syntax in infants, they
used preferential looking: infants were first introduced to a phrase such as
“Look! A yellow bottle”. They were then presented with two objects: one
described by the same adjective+noun combination (e.g. another yellow bottle),
and one with the same noun and a different adjective (e.g. a blue bottle). Crucially, Lidz et al claimed that
18-month-olds would look significantly more often to the yellow (rather than
blue) bottle when asked “Do you see another one?”, i.e., treating “one” as
referring to the whole noun phrase, just like adults. This was not due to any
general response bias, because they showed the opposite bias (preference for
the novel item) if asked a control question “What do you see now?” In
addition Lidz et al analysed data from the CHILDES database and concluded that,
although adults often used the phrase “another one” when talking to young
children, this was seldom in contexts that disambiguated its reference.

This study stimulated
a range of responses from researchers who suggested alternative explanations; I
won’t go into these here, as they are clearly described by Lidz and Waxman
(2004), who go carefully through each one presenting arguments against it. This
is another example of the kind of work I like – it’s how science should
proceed, with claim and counter-claim being tested until we arrive at a resolution. But is the answer clear?

My first reaction to the original study was simply that I’d
like to see it replicated: eleven children per group is a small sample size for
a preferential looking study, and does not seem a sufficiently firm foundation
on which to base the strong conclusion that children know things about syntax
that they could not have learned. But my second reaction is that, even if this
replicates, I would not find the evidence for innate knowledge of grammar
convincing. Again, things look different if you go beyond syntax. Suppose, for
instance, the child interprets “another one” to mean “more”. There is reason to
suspect this may occur, because in the same CHILDES corpora used by Lidz, there
are examples of the child saying things like “another one book”.

On this interpretation, the Lidz task would still pose a challenge, as the child has to decide whether to treat “another one” as referring to the specific object (“yellow bottle”) or the class of objects (“bottle”). If the former is correct, then they should prefer the yellow bottle. If the latter, then there’d be no preference. If uncertain, we’d expect a mixture of responses, somewhere between these options. So what was actually found? As noted above, for children given the control sentence “What do you see now?” there was a slight bias to pick the new item, so the old item (yellow bottle) was looked at for only 43% of the time on average (SD = 0.052). For children asked the key question, “Do you see another one?”, the old item (yellow bottle) was looked at 54% of the time on average (SD = 0.067). The difference between the two instruction types is large in statistical terms (Cohen’s d = 1.94), but the bias away from chance is fairly modest in both cases. If I’m right and syntax is not the most crucial factor determining responses, then we might find that the specific test items affect performance: e.g., a complex noun phrase that describes a stable entity (e.g. a yellow bottle) might be more likely to be selected for “another one” than one describing a transient state (e.g. a happy boy). [N.B. My thanks to Jeffrey Lidz, who kindly provided the raw data on which the results presented above are based.]
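For readers who want to check the effect-size arithmetic, here is a minimal sketch using the standard pooled-SD formula for Cohen’s d. Note this is only one common variant of the formula: it gives a value a little below the 1.94 quoted above, which was presumably computed with a slightly different pooling convention.

```python
import math

# Mean proportion of looking time to the old item (the yellow bottle),
# with standard deviations, as reported above (n = 11 per group).
control_mean, control_sd = 0.43, 0.052  # "What do you see now?"
test_mean, test_sd = 0.54, 0.067        # "Do you see another one?"

# Cohen's d with a pooled SD (equal group sizes).
pooled_sd = math.sqrt((control_sd ** 2 + test_sd ** 2) / 2)
d = (test_mean - control_mean) / pooled_sd
print(round(d, 2))  # roughly 1.8 with this variant of the formula
```

Either way, the substantive point stands: the between-condition difference is large relative to its variability, while both group means sit fairly close to 50%.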

Points of agreement – and disagreement – between generative
linguists and others

The comments I have received give me hope that there may be
more convergence of views between Chomskyans and those modelling language
acquisition than I had originally thought. The debate between connectionist ‘bottom up’ and Bayesian ‘top
down’ approaches to modelling language acquisition highlighted by Jeff Bowers
(4th Sept) and described by Perfors et al (2011) gets back to basic issues
about how far we need a priori abstract symbolic structures, and how far these
can be constructed from patterned input. I emphasise again that I would not
advocate treating the child as a blank slate. Of course, there need to be
constraints affecting what is attended to and what computations are conducted
on input. I don’t see it as an either/or choice between bottom-up and top-down. The key questions have to do with what
top-down constraints are and how domain-specific they need to be, and just how
far one can go with quite minimal prior specification of structure.

I see these as empirical questions whose answers need to
take into account (a) experimental studies of child language acquisition and
(b) formal modelling of language acquisition using naturalistic corpora as well
as (c) the phenomena described by generative linguists, including intuitive
judgements about grammaticality etc.

I appreciate the patience of David Adger (Sept 11th) in trying to argue for more of a dialogue between generative linguists and those adopting non-Chomskyan approaches to modelling child language. Anon (Sept 4th) has also shown a willingness to engage that gives me hope that links may be forged between those working in the classic generative tradition and others who attempt to model language development. I was pleased to be nudged by Anon (4th Sept) into reading Becker et al (2011), and agree it is an example of the kind of work that is needed: looking systematically at known factors that might account for observed biases, and pushing to see just how much these could explain. It illustrates clearly that there are generative linguists whose work is relevant for statistical learning. I still think, though, that we need to be cautious in concluding that there are innate biases, especially when the data come from adults, whose biases could be learned. There are always possible factors that weren’t controlled – e.g. in this case I wondered about age of acquisition effects (cf. data from a very different kind of task by Garlock et al, 2001). But overall, work like this offers reassurance that not all generative linguists live in a Chomskyan silo – and if I implied that they did, I apologise.

When Chomsky first wrote on this topic, we did not have
either the corpora or the computer technology to simulate naturalistic language
learning. It still remains a daunting task, but I am impressed at what has been
achieved so far. I remain of the view that the task of understanding language acquisition has been made unduly difficult by adopting a conceptualisation of what is learned that focuses on syntax as a formal system, learned in isolation from context and meaning. Like Edelman and Waterfall (2007), I also suspect that obstacles have been created by the need to develop a ‘beautiful’ theory, i.e. one that is simple and elegant in accounting for linguistic phenomena. My own prediction is that any explanatorily adequate account of language acquisition will be an ugly construction, cobbled together from bits and pieces of cognition, and
combining information from many different levels of processing. The test will ultimately be whether we can devise a model that can predict empirical data from
child language acquisition. I probably won’t live long enough, though, to see
it solved.

37 comments:

I am afraid I have seen a similar narrative about Chomsky in many places, especially regarding his unsophisticated (read, non-probabilistic) approach to language acquisition, yet I am less than convinced by it.

Here are my reasons:

1) Just a reading of his 1955 dissertation will allow the reader to infer that he in fact suggested statistical learning in language acquisition in it (especially of word learning), although he was somewhat circumspect about its possible success. (Section 1 of this paper has a very nice, but brief, summary of this - http://www.ling.upenn.edu/~ycharles/papers/crunch.pdf)

2) I have in other places, seen him give positive reviews of probabilistic work, especially of Charles Yang's work if I remember correctly.

3) This paper, while slightly old from current perspectives, is still very sophisticated. (Chomsky, N. Three models for the description of language. IRE Transactions on Information Theory 2 (1956), 113--124.)

4) He himself has done some information theoretic stuff, and if I remember correctly, was rather disillusioned by the results.

I can see someone disagreeing with his view of the problem and his suggested solutions, especially his separation of competence/performance, which many non-linguists are uncomfortable with. However, to call his views dated would not do justice to the facts.

I agree with you that his comments of bayesian approaches at the UCL lecture were somewhat surprising, but given his track-record on the issues, I am far more wont to take a charitable interpretation of his words - something to the effect of "statistical/probabilistic methods when uninformed by theoretical linguistic constructs have been a failure".

I agree mine is as much a narrative as yours, but I am afraid I can't see how one could maintain what you said given what he has said in print.

As far as him being stuck on Skinnerian learning and of how things had changed, I have to say I think the world is stuck on his critique of Skinner, not him. He rarely if ever refers to it in his work in print. Most of his discussions of the issue seem to be in response to a question in an interview or talk. In fact, I am not sure he has discussed Skinner at length in print past the late 1950's, perhaps early 1960's.

Anonymous (You really don’t need to be): I’ve come across similar ripostes to critiques of Chomsky before: i.e. your arguments don’t stand because you haven’t read everything he has ever written. This just isn’t good enough. For a start, here’s a direct quote from Science of Language – specifically on connectionism, which is one approach to probabilistic learning: “…connectionism seems to me about the level of corpuscularianism in physics. ….They’re abstracting radically from the physical reality, and who knows if the abstractions are going in the right direction? But, like any other proposal, you evaluate it in terms of its theoretical achievements and empirical consequences. It happens to be quite easy in this case, because they’re almost non-existent.” And later: “The learning thesis is a variation on behaviourism and old-style associationism.” (Sorry, can’t give page refs, as this is on Kindle.) There’s quite a lot more in this vein.

I’m not surprised that you can provide cases where he has expressed different views. Much of what he says is opaque and allusive and not always internally consistent. I’m really not interested in the odd sentence here or there. I’m interested in whether or not he accepts that his view of what children learn has been seriously challenged, and I’d like to know what defence he offers to the challenge. I see no evidence that he has engaged with the literature on statistical learning at any kind of serious level, despite the fact that this is a burgeoning area of research.

But there’s a much more serious point than what Chomsky has or has not said in print. My point is that if he were to accept that language learning starts by abstracting phonological regularities from speech input (aided by contextual cues), identifying probabilistic patterns that correspond to morphemes, and only later detecting regularities that correspond to conventional word classes, then the whole rationale for his approach would be undermined. The arguments that language is not learnable evaporate. With them goes the need to postulate Universal Grammar. The whole problem of language acquisition that was tortuously addressed by inventing parameter setting is no longer a problem. There is therefore a very good reason why Chomsky does not engage with this literature. If you take it seriously, decades of work in developmental linguistics in the Chomskyan tradition become an irrelevance.
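As a concrete illustration of the first step of this kind of account – extracting probabilistic patterns from the speech stream – here is a toy sketch of transitional-probability segmentation of the kind studied by Saffran and colleagues. The syllable stream and the “words” (badu, kigo) are invented purely for illustration:

```python
from collections import Counter

def transitional_probs(syllables):
    """TP(a -> b) = count(a followed by b) / count(a), over a syllable stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    unit_counts = Counter(syllables[:-1])
    return {pair: n / unit_counts[pair[0]] for pair, n in pair_counts.items()}

# A toy unsegmented stream made of the invented "words" badu and kigo.
stream = "ba du ki go ba du ba du ki go ki go ba du".split()
tps = transitional_probs(stream)

# Within-word transitions are consistently high; across-word transitions
# are lower, so dips in transitional probability mark candidate word
# boundaries - no prior knowledge of word classes is needed.
assert tps[("ba", "du")] == 1.0   # within "word"
assert tps[("go", "ba")] < 1.0    # across "words"
```

This is of course only the segmentation step, but it shows how regularities corresponding to word-like units can in principle be found before any syntactic categories are in play.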

"My point is that if he were to accept that language learning starts by abstracting phonological regularities from speech input (aided by contextual cues), identifying probabilistic patterns that correspond to morphemes, and only later detecting regularities that correspond to conventional word classes, then the whole rationale for his approach would be undermined. The arguments that language is not learnable evaporate. With them goes the need to postulate Universal Grammar."

I agree with almost everything in the first sentence, except the inference. Nowhere in the input is there anything that suggests what to keep track of, what abstractions/generalisations one makes. To be more specific, there are a lot of statistical facts/correlations in the linguistic input, yet people aren't generalising from all of them. The set of generalisations made is a subset of those that have statistical support. Note, this is very similar to Chomsky's poverty of stimulus argument. (refs: http://www.phonologist.org/projects/surfeit/)

On his opacity and general incoherence, this is clearly a subjective opinion. There are surely many amongst the probabilistically sophisticated who might disagree with some of his views, but would beg to differ from your general assessment.

I have to say, these criticisms of Chomsky, in less well put form, have regularly occurred to me, in the course of several goes at trying to understand his work on language. I gave up trying, thinking that I must be missing something or too dumb to understand it all.

My feeling is that the contrast between the two comments above nicely captures the problem with him and his body of work, written and elsewhere - he seems to regularly contradict himself, or at best remain frustratingly allusive (and elusive) when it comes to the major controversies. I can certainly say that the same pattern can be seen in his writing on political issues as well, and interestingly a brief trip to google tells me that plenty of people have even written about the presence of an inherent contradiction between his views in his two main areas of politics and linguistics.

I wonder what other areas (perhaps of psychology / neuroscience) are dominated by views of major figures who don't take into account the way in which something is thought to have developed?

For those interested in pursuing the debate further, I have just been pointed to this site, which has many relevant references to critics of Chomsky: http://timothyjpmason.com/WebPages/LangTeach/CounterChomsky.htm

The brilliant child psychiatrist, Stanley Greenspan, MD, ended the debate with publication of The First Idea: How Symbols, Language, and Intelligence Evolved from Our Primate Ancestors to Modern Humans. Greenspan writes in his classic book that the relationship between infant and adult is much more complex than previously thought and that the developmental process is the key to language and intelligence. Greenspan also rejects the Cartesian concept of reason, elucidating that thought and intelligence are emotion oriented. You must have emotion to have intelligence.

I believe Dr. Greenspan puts the argument to rest and shows that Chomsky is out of date.

Where I think Chomsky went wrong was not in the idea that children learn some (semi-)autonomous syntax, which they do after all seem to wind up in possession of eventually, but rather in having the wrong notion of 'fit', as explored in the apparently exploding field of Bayesian syntax learning (Mike Dowman, Anne Hsu, Nick Chater, Paul Vitanyi, Lisa Pearl and Amy Perfors being some of the more productive recent contributors).

Thanks to all for comments.

Anonymous: Have you read any of the papers in the Special Issue of Journal of Child Language that I referenced in my post? Alternatively, this paper tackles many of the issues about how far grammar can be learned from input: Edelman, S., & Waterfall, H. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253-277.

Avery: Thanks for the reference to Christina Behme. There's a more direct link to her paper here: http://tinyurl.com/dxmr77d

I would not like my exasperation with Chomsky’s rhetorical style to sidetrack us from a consideration of his arguments. The main point I want to stress is not just that Science of Language is unscholarly in its style (as argued also by Behme and Pullum) but that the fundamental premise of Chomsky’s theorising is wrong. In some ways I am more extreme than other critics: I am not just arguing for an alternative approach to child language acquisition, I’m also saying that we might have made much more progress 40 years ago if Chomsky had not blighted the field with his theorising. Children clearly do learn grammars which can be expressed in terms of sequences of abstract lexical categories: but (a) this knowledge is probabilistic and not deterministic and (b) knowledge of lexical categories is not something that they start out with.

I repeat that I agree with you where you are factual. On your two points, I have shown you, and can show you, many other instances that (a) is perfectly compatible with both Chomsky's views and modern generative theories. (b) is perhaps more debated in generative theorising, but if you are talking only about Chomsky, he himself has questioned the need for lexical categories in the traditional sense (even at the adult grammar level) in a whole chapter-length discussion (Ch. 4) in The Minimalist Program, written in 1995. It is not a big leap to assume that he would perhaps say the same about children.

Again, very little of the claims about UG you made earlier follow from the data you show. It is like arguing that people need to learn morphemes, so everything is learnt. Just because some things are learnt doesn't mean everything is. [Note: I am using "learn(ing)" with the meaning that you imply - knowledge of something that was not there before.]

All I can see is that you are making as big a leap of logic as the one you claim Chomsky (and perhaps other Chomskyans) makes.

I think there is a need to honestly acknowledge that both sides of the debate are bringing their own set of biases to the table. Unfortunately, neither side is honest about it; instead they are entangled in rhetorical posing, and even misdirection. Neither side makes a good-faith attempt to understand what the other is saying. In most places, there is more consensus than the rhetoric would let you believe, because people highlight their preferred assumptions and push the others under the carpet.

The debate itself has been good in my opinion in forcing people to do innovative work. However, the vitriolic rhetoric has been horrible.

Full disclosure: I haven't read all the papers of the issue you refer to. But I am guessing you hadn't read the Becker et al paper I mentioned, either, so I am not sure where that question was going. We are clearly coming to the discussion with different backgrounds.

Two comments, one concerning the link between Chomsky’s linguistics and politics, and one about his linguistics.

In one of the replies above, Charlie Wilson writes that there are similarities between Chomsky’s politics and his linguistics, both suffering from internal inconsistencies, and both hard to follow. But what is so striking about Chomsky’s politics is that they are almost completely untheoretical, and very easy to understand. He points out self-evident truisms (e.g., we should apply the same standards to ourselves as we do to others) and cites lots and lots of data (much of it that you have never heard of before) to reach very straightforward conclusions. One of the common claims is that he is inconsistent in that he criticizes the US more than other states. Here is his very straightforward reply.

"My own concern is primarily the terror and violence carried out by my own state, for two reasons. For one thing, because it happens to be the larger component of international violence. But also for a much more important reason than that: namely, I can do something about it. So even if the US was responsible for 2% of the violence in the world instead of the majority of it, it would be that 2% I would be primarily responsible for. And that is a simple ethical judgment. That is, the ethical value of one's actions depends on their anticipated and predictable consequences. It is very easy to denounce the atrocities of someone else. That has about as much ethical value as denouncing atrocities that took place in the 18th century."

So, however bad this book is, and however bad his linguistics, it provides no basis for questioning his politics.

What about his linguistics? Given Dorothy’s comments, I’ll not be buying the book. But I do think Chomsky may be right about one important point that was not emphasized enough. Chomsky’s main point against Skinner was that associationism is not enough to explain language, and to the extent that statistical learning is implemented by connectionist networks associated with the PDP camp, I think his critique still applies. This is also the view of people like Pinker, who rejects “eliminative connectionism” (networks that eliminate symbols) in favour of neural networks that include symbols. And why Hummel develops “symbolic networks”, as opposed to PDP networks. These authors are far from Chomskyans (Pinker seems to dislike a lot about Chomsky, including his politics), but have adopted one of his central tenets: that you need symbols and rules, not only associations.

It is also worth noting that Bayesian theorists are often very much in sympathy with the importance of abstract symbols, which puts them at odds with the PDP camp (see recent debate in TICS). Here is a quote from a paper I just downloaded from Tenenbaum’s website that highlights how some key Bayesian theorists are not just adopting another form of statistical learning – they are adopting a key part of the non-associationist position:

“Second, our approach offers a way to tease apart two fundamental dimensions of linguistic knowledge that are often conflated in the language acquisition literature. The question of whether human learners have (innate) language-specific knowledge is logically separable from the question of whether and to what extent human linguistic knowledge is based on structured symbolic representations like generative phrase-structure grammars. Few…cognitive scientists have explored the possibility that explicitly structured mental representations might be constructed or learned via domain-general learning mechanisms.”

That said, I would point out that I’m not Bayesian! http://www.ncbi.nlm.nih.gov/pubmed/22545686

hi Dorothy, For once I will disagree to a large extent (although not entirely). This would deserve a whole paper, but here are a few succinct points:

- the idea that children's language (and even worse, their linguistic awareness skills) can provide any meaningful measure of their linguistic knowledge is ridiculous, and as far as I know has been rejected long ago. See also my more recent discussion of child phonology: http://www.lscp.net/persons/ramus/docs/Labphon10Ramus.pdf

- there is in fact evidence that young children represent some syntactic categories well before their language can show any sign of that (let alone awareness), and without this being attributable to mere statistical relationships between words. See for instance: http://www.lscp.net/persons/anne/papiersPDF/Bernal-Dehaene-Lambertz-Millotte-Christophe-DevSci2010.pdf

- more generally, while statistical learning is certainly much more important than Chomsky will admit (including for learning and use of an abstract grammar), I haven't seen any meaningful theory of language acquisition solely based on statistical learning. Tomasello's has very low explanatory power and largely relies on unproven magic.

- what sense does it make to judge Chomsky's ideas on the basis of transcriptions of interviews?

- at the same time, I agree that he has become totally unreadable for anybody working outside the minimalist program, and I dislike his dismissive attitude to basically everything else. But this is quite independent from the broader framework that he has set, which may remain very fruitful and, despite falling short of many of its promises, does not currently have any credible competitor in my opinion.

- an example of work that I find more credible is that of Paul Smolensky, who does realistic statistics-based connectionist modelling, while keeping with the standard notion of an abstract grammar being acquired by the child, and trying to explain how this can happen for real. This is the sort of direction where I would place my bets (but not a cent on Tomasello!).

Thanks for writing this post. But you don't go far enough. Chomsky is not only wrong about language learnability. He is profoundly wrong about language. I wrote a post a few years back that I don't consider him to be a very good linguist: http://metaphorhacker.net/2010/08/why-chomsky-doesnt-count-as-a-gifted-linguist/

But I don't think I went far enough - particularly in light of this recent hagiography. The bit about discovering more about language in the last 20 years than in all of previous history appeared somewhere else earlier and it just made me laugh. All it means is they were able to refine GB with minimalism to work out the kinks in the theory. No new insights on language were generated during that process.

Normally, I would just ignore Chomsky (as most sensible linguists seem to do these days) but his effect has been particularly pernicious in the applied sciences. I constantly have to correct language teachers that nothing they ever come in contact with is in any way impacted by what Chomsky and his acolytes have to say about learnability. Not vocabulary, not pronunciation, not spelling! The problem is that the Chomskean argument for Universal Grammar is so technical and hard to understand for anyone who is not actually immersed in it, that a cursory look will just result in a complete distortion. My advice to language practitioners is simply to ignore Chomsky in their work.

I have no doubt that he is wrong about pretty much everything, and I am convinced that in 50 years, once his political star has faded, he will be seen as a blip on the linguistic scene and Universal Grammar will have the credibility phrenology has today. But even if I and all his other critics are wrong and Chomsky's been right all along, it won't change the fact that what he has to say is completely irrelevant to what language teachers do, speech therapists do or even language acquisition researchers do. It has nothing to say about acquiring knowledge of the world so essential to using language, register, bilingualism, diglossia, language change, language death, language politics, literature, poetry, figurative language, sociolinguistic variation, and so on.

We can have all the arguments about research evidence and combinatorics. I just normally ask the Universal Grammar advocates to name off the top of their head 20 principles and explain the process through which the right parameters are set for their instantiation in my knowledge of language required to produce this sentence. I have yet to receive an answer. They have 50 years to prove me wrong.

Sorry, I posted my screed before Franck Ramus commented, otherwise I would have incorporated a response into the first comment.

I think the syntactic categories problem assumes a stable set of categories that cannot simply be substantiated by analysis of actual language in actual use. Which is also why the statistical learning alternative is a dead end. It provides a useful model for language but ignores too many inputs along the way to actually be how language works.

Franck comments that the approach continues to be fruitful but I would ask how? To say it has underdelivered is a massive understatement. It is not even used in things like machine translation that much anymore which is where it once showed great promise. I cannot think of a single thing that minimalism has achieved that is relevant outside the generative paradigm. But maybe I'm missing something big

I always found Tomasello's and Dabrowska's work very compelling and frankly I don't see the magic required to make it work. Most importantly, it seems consistent with how we learn most of what we know about the world. However, the magic of the Language Acquisition Device has always boggled my mind. It requires a level of modularity of mind that is surely hard to sustain in the face of so much messiness coming from all the data points.

I feel out of my depth commenting on this article. Is there a guide to the statistical approach for a lay-person? Particularly on this idea of there being no (solid) syntactic categories, and the (not so) fully formed grammar arising from statistical learning.

I can believe UG is unhelpful for figuring out how a child goes about learning a language, and that statistical learning may be used for learning words and early phrases. I thought though (and this is where understanding arguments about syntactic categories would help) that the fully formed grammar is too complex. How does this approach explain the final state attained? Is there still a generative procedure?

I agree with your general tenor re. the thrall that Chomsky seems to hold over at least some parts of linguistics (although it’s probably not his fault that people take what he says quite so seriously). However, I wondered whether your comments re. the learnability issue were quite right – formal learnability really isn’t my area, so no doubt better-informed people can correct me.

My concern stems from when you say “Because he assumed that children were learning abstract syntactic rules from the outset, Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual learning system: this could be shown by formal proof from mathematical learning theory.”. I presume here you’re referring to Gold’s Theorem (Gold, 1967). My understanding is that Gold’s result applies to any set of languages where you have an infinitely nested set of languages stacked one inside the other, like Russian dolls – that might apply to languages which can be described by abstract syntactic rules, but abstract syntactic rules aren’t required for Gold’s argument to work. In other words, the learnability problem doesn’t follow solely from the idea that children learn abstract syntactic rules.

You then go on to talk about how a reconceptualisation of what is learned solves the learnability problem (again, presumably referring back to Gold), in terms of whether rules develop from specific to abstract, and whether grammatical representations are probabilistic or not. Again, I agree with your general point that Chomsky doesn’t really seem to be particularly interested in the relevant literature here, and is quite dismissive of some of it. But I don’t think either of these points addresses the specific learnability problem identified by Gold, again because his theorem doesn’t make any assumptions about the representations underpinning languages – it just requires the Russian doll configuration I mentioned above.

That’s not to say that Gold’s Theorem is particularly relevant. As explained in the excellent Johnson (2004), in constructing his proof, Gold defined some very strict conditions on learnability which don’t actually seem that relevant for real language learning. Gold assumes that learners may be required to learn any language from a class of languages (when in fact they may be unlikely to encounter some languages, e.g. as a result of cultural evolutionary processes disfavouring less learnable languages, a point which I think is made in Zuidema, 2003), and Gold assumes that learning has to be possible in the face of any possible data set generated from the target language (including pathologically unhelpful data sets: his proof says nothing about learnability from data that is more representative, or even data that is designed to be helpful, as might be the case for child-directed speech). Basically I think your point still stands – the learnability claims are probably over-emphasised in some sections of the literature. But I also think it’s important to point out that, at least as I understand it, (1) there is no mathematical result proving that language is unlearnable (in a sense that’s meaningful to developmentalists, rather than the sense Gold uses in his proof), and (2) recent advances e.g. in statistical learning research don’t in any way ‘disprove’ Gold’s original result.
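[For readers unfamiliar with Gold's result, the "Russian doll" configuration mentioned above is easy to state explicitly. The standard example from Gold (1967) is the class of all finite languages over a single symbol together with the infinite language containing them all – note that nothing about syntactic rules is involved:

```latex
% Gold's nested ("superfinite") class over a single symbol a:
\[
  L_k = \{a, a^2, \ldots, a^k\} \quad (k \ge 1), \qquad
  L_\infty = \{a^n : n \ge 1\},
\]
\[
  L_1 \subset L_2 \subset \cdots \subset L_\infty .
\]
% From positive examples alone, any finite sample consistent with
% L_infinity is also consistent with a large enough L_k, so a learner
% can never converge correctly on every language in the class.
```

This is why the unlearnability argument is indifferent to how the languages are represented – it depends only on the nesting.]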

This is rather too easy a target, no? You comment on someone's work from the perspective of someone who has seen 40 years of research stimulated by Chomsky's ideas. It's a little unfortunate to dismiss many of his theories as not borne out by subsequent experiment etc. Will you be happy to see your own work evaluated this way in 2040 – if it has generated the extraordinary variety of work, research, interest, and development that Chomsky's did?

Hi Dorothy, Like you I find Chomsky’s work very difficult to read, and unlike you I’ve not attempted to work my way through “The Science of Language”. But in his paper “Three factors in language design” (2005, p.6, available here: http://www.biolinguistics.uqam.ca/Chomsky_05.pdf), Chomsky writes:

“Assuming that the faculty of language has the general properties of other biological systems, we should, therefore, be seeking three factors that enter into the growth of language in the individual:
1. Genetic endowment, apparently nearly uniform for the species, which interprets part of the environment as linguistic experience, a nontrivial task that the infant carries out reflexively, and which determines the general course of the development of the language faculty.
2. Experience, which leads to variation, within a fairly narrow range, as in the case of other subsystems of the human capacity and the organism generally.
3. Principles not specific to the faculty of language.”

I am far from being well versed in Chomsky’s work, but I interpret this paragraph as not being incompatible with there being a role for statistical learning. And as I understand it, he did consider the role of statistical learning in one of his earliest works, Logical Structure of Linguistic Theory (1955), but he considered that it did not have full explanatory power. As Franck Ramus commented a few days ago on this blog, the explanatory power of statistical learning is still up for debate.

Jeff Lidz from the University of Maryland, who works within a Chomskyan framework but who writes much more lucidly and actually carries out experimental work with young children, argues that both domain-specific and domain-general mechanisms are needed to explain language acquisition. Lidz gave a very clear set of lectures at UCL in June this year, and the theme of his lectures was that statistical learning is licensed by Universal Grammar. I understood him to be making the point that Universal Grammar and statistics are not in opposition. Rather, they work together: learners are able to use statistics because they already know what statistics they need to compute. In other words, Universal Grammar limits the hypothesis space. Lidz’s webpage has several downloadable papers presenting this argument: http://ling.umd.edu/~jlidz/.

Just a quick message to all who have commented to say a big thank you. I am intrigued and fascinated by the diversity of views that have been expressed. I do plan to respond when I have had a chance to read and think a bit more, but this may take a week or two!

I'd just like to second Chloe's comment about Lidz's work on child language acquisition, which provides a range of empirical studies arguing for an approach to language learning that incorporates both statistical biases and representational (UG) constraints. The work on how child speakers of Kannada interpret ditransitives is especially interesting. I must admit I think that many theoretical syntacticians (and I am a theoretical syntactician, so mea culpa) have been very bad at making the results of our work accessible to people in the psychology of language learning, but I do think that many of those results are potentially really interesting to the broader psychological community: structural conditions on meaning construction (e.g. cases where you might expect to get a particular meaning but it just isn't there); structural conditions on resolving meaning dependencies (between referential antecedents like names and pronouns, between quantifiers like 'every boy' and pronouns, between question phrases like 'which boy' and the verb which assigns them their semantic role (as, say, agent or patient), ...); the relations between linear orders of types of phrases and their meanings across different languages; I could go on interminably. Almost all of this is fairly well agreed upon at least at the phenomenological level (if not in which theory of syntax can best account for it), and these are real properties of people (or of people's behaviour in experimental settings anyway - google Jon Sprouse's recent work on how reliable grammaticality judgments are). These descriptive regularities across linguistic behaviour are what make language so interesting and challenging to build theories of, and they are the things that 95% of generative syntacticians spend their time on.
There are real live and interesting debates in the linguistic literature about whether what are called 'island effects' (where certain syntactic structures disrupt the relation between, say, a question phrase and the verb which assigns it its meaning role) are to be explained by how the syntactic processing mechanism deals with certain grammatical structures, or whether these effects, which are very robust, are a result of the way that the grammar is set up. Again, I could go on with many more cases. The basic point is that we theoretical syntacticians shouldn't dismiss the possibility that there are important effects of frequency in learning languages and hence in the final grammars, or that there are domain-general biases (in fact, I think Chomsky himself has said that both are relevant), but equally, I think that one can't dismiss the challenges for psychological explanation raised by the enormous amount of solid theoretical and experimental results that have emerged from generative syntax over the years, and which are still appearing. I do acknowledge, however, that we theoretical syntacticians really need to do a better job of making our work more accessible.

Hi Dorothy,

Following on from David’s post and the sorts of grammatical constructions that generativists spend their time on, another researcher whose work is worth reading is Stephen Crain. In a paper with Takuya Goro and Rosalind Thornton, available here, https://unstable.nl/andreas/ai/psy/presentaties/Crain.pdf, he presents examples of children’s syntactic errors that really are troubling for statistical learning accounts. For example, English children sometimes insert extra wh-words in long-distance questions, saying ungrammatical things such as (1) *“What do you think what pigs eat?” and (2) *“Who did he say who is in the box?”, even though they are not exposed to medial-wh constructions in the input. Furthermore, Crain et al. argue that these questions could not be produced by merging the templates of two simple questions: such a strategy could work for (2), but not for (1), as *“what pigs eat” is not a well-formed question. In a neat analysis, they show that such constructions are grammatical in some dialects of German (where, for example, the equivalent of “Who do you think who goes home?” is perfectly OK). Even neater, they argue that there are no examples of English children using medial wh-words in, for example, sentences such as *“Who do you want who to win?”, where this would require extraction from an infinitival clause – and those same dialects of German don’t allow their speakers to do that either. Importantly, children do hear fragments like “who + to + verb” in, for example, embedded questions, e.g. “I know who to follow”. So they could potentially form those sorts of questions using the ‘cut-and-paste’ operations that Tomasello proposes. But they don’t. In other words, children sometimes deviate in certain ways from the grammar of the particular language that they’re exposed to – but they don’t deviate from what’s allowed in other natural human languages. How are their errors constrained? Crain et al.
argue for a Universal Grammar and temporarily mis-set parameters. I haven’t seen any other convincing explanation of their data.

I enjoyed reading your review. You may be interested in a review of the same work that I recently posted at http://ling.auf.net/lingBuzz/001592. It supports some of the points you make and gives some additional perspective on the sad state of affairs that Chomsky's linguistics has become. In case you have questions/comments, my contact info can be found at that link...

In the Arts and Humanities Citation Index (1980-1992), Noam Chomsky was the most cited living person, and the eighth most cited source overall. The others in the top ten included Marx, Lenin, Shakespeare, Aristotle, the Bible, Plato, Freud and Cicero. Presumably, everyone agrees that these others have made significant contributions to the history of ideas. But Dorothy Bishop questions whether Noam Chomsky deserves the place he has been accorded. Chomsky’s failing, according to Bishop, is to theorize about adult languages without regard to the process of language acquisition. Bishop’s view – for which she offers no supporting evidence or argument, only an assertion – is that one cannot understand adult language without understanding the process of language acquisition. Chomsky denies this. He observes that children raised in extremely different linguistic environments are nevertheless able to communicate effortlessly; children born in New Zealand, in the UK, in the Australian bush, or on the streets of New York, all acquire equivalent grammars (aside from differences in vocabulary and pronunciation). Chomsky’s conclusion is that the process of language acquisition leaves no stamp on adult language. This contradicts Bishop’s assertion that investigations into the process of language acquisition are critical to the development of theories of adult languages. To support her claim that adult language is influenced by the process of language acquisition, Bishop cites a study by Dabrowska (2010) which found that linguists and nonlinguists differed in their judgments about the acceptability of complex sentences; linguists tended to be less influenced by the lexical content of the test sentences than nonlinguists were. Although the two groups were given different instructions, Dabrowska reports that “linguists and nonlinguists alike gave higher ratings to sentences that linguists would describe as ‘grammatical’ ”.
On the basis of this study, Bishop reaches the opposite conclusion, namely that “agreement on well-formedness of complex sentences [is] far from universal in adults.”

The finding that linguists and nonlinguists invoke different criteria in judging whether or not sentences are acceptable has in any case little or no bearing on the relationship between language acquisition and adult languages, since acceptability judgements do not directly tap an adult’s intrinsic grammatical knowledge. But let us suppose, for the sake of argument, that Chomsky is wrong to assert that the process of language acquisition fails to leave a stamp on adult languages. Does this suffice to diminish Chomsky’s contributions to our intellectual history? Only if we are influenced by people like Bishop who, by their own admission, fail to understand the contributions Chomsky has made, but which so many others have recognized.

Worth noting that citations are time-limited and that frequency of citations doesn't tell us anything about the validity or reliability of the ideas involved. All ten of the sources listed have been widely critiqued - in many cases with some justification.

My curiosity having been piqued by the opening paragraph of Stephen Crain & Max Coltheart's comment I was prompted to read the paper by Crain et al cited by Chloe Marshall. I wasn't convinced that the evidence presented could be used, as the authors suggest,

'to adjudicate between a UG-based approach to child language and an experience-based approach.'

Three reasons:

1. The authors use 'linguistic error' in a very narrow sense - along the lines of a consistent grammatical error within a single sentence - rather than, say, the single clauses, unfinished sentences, grammatically incorrect utterances, or sentences that start off with one structure and switch to another that are familiar to anyone who has tried to do a verbatim transcript of unscripted speech.

2. The authors appear to assume that an experience-based model of language acquisition limits children's exploration of the 'space of human languages' to speech patterns they hear around them (e.g. English doesn't have a medial 'wh-', so where does it come from in children's speech?). Of course children do mimic speech patterns, but because of the way human memory works, they don't do it entirely accurately. They approximate, fill in, omit and innovate - in exactly the same way as adults handle incomplete information.

3. The authors appear to assume that linguistic structures are clear-cut. It's true that the prototypical features of languages are usually amenable to classification, but the way people use a language isn't prototypical, it's an approximation to prototypicality.

Language is primarily a vehicle for communicating information. As long as an utterance carries sufficient information to accurately convey its intended meaning, there is no compelling reason for the speaker to make it conform closely to the prototypical features of the language being spoken. If however the meaning isn't clear to listeners, speakers generally get frustrated and if possible try something else.

Toddlers constructing a phrase in a way that a parent doesn't understand tend to get very cross at the uncomprehending response, but if the penny drops and the parent corrects the error, the child will often pay close attention to the correction and abandon the incorrect form. This means that if incorrect grammatical forms do the job, they are likely to persist for a while in children's speech, but forms that are ambiguous or don't convey sufficient information will be abandoned. One would expect to see the same phenomenon in languages. It's quite likely that persistent grammatical errors in one language will map on to accepted forms in another, because both are emergent properties of similar constraints and affordances.

The Universal Grammar hypothesis appears to be based on two implicit assumptions. Firstly that language is a special cognitive domain not subject to the errors and biases that occur in other cognitive processes; and secondly that recurring patterns must be the result of some underlying design (in this case biological) and that they couldn't have arisen as emergent properties of interacting systems.

Chomsky's late wife Carol wrote a book published in 1969, entitled "The Acquisition of Syntax in Children From 5 to 10". Might be worth a look if you're interested in his views on acquisition, as I imagine he had an influence on it.

I realise those of you interested in this topic must have given up hope that I will reply to comments as promised. Just to say I will! It is just taking longer than anticipated to find time to read all I want to read. Definitely something by end of October and I hope sooner.

Alex: I know about that book you mentioned - read it as a grad student! My recollection is that it focused primarily on a few constructions such as the distinction between "John is easy to please" vs "John is eager to please", but it's worth taking another look to refresh my memory.

NC is a brilliant scholar, a towering figure in linguistics as well as an extraordinary polemicist and an inspiring and charismatic leader and teacher. But one fact about NC that seems to me indisputable is that he is constitutionally incapable of playing any role other than king of the mountain, even as he likes to pose as a sort of impotent outsider. According to him, he is always right and always has been right; and so far as I know, he has never been able to find it within himself to admit to error. I see no point in debating against him, because it appears that, from his perspective, any argument you wish to make by positioning yourself in unfriendly disagreement with him must be the result of a pernicious misunderstanding on your part – i.e., your position is either a notational variant of his or is just plain wrong, and your failure to see or to acquiesce to this is evidence of your obtuseness or venality.

One way that NC succeeds better than the rest of us at this not uncommon strategy is by making beautifully sweeping claims, supported by extremely impressive and detailed argument, that are in the end never very easy to pin down. For example, he never supplied his theories of the autonomy of syntax or the poverty of the stimulus with hypotheses that were actually empirically testable. Though his arguments are replete with linguistic examples, they prove (if anything) not the sweeping claims but lemmas that may or may not have anything to do with those claims. The lemmas may be interesting and challenging, but they are not conclusive about much, even including internal mechanisms of the theory (which is why NC can adjust those mechanisms so easily from year to year). The exceptions to this general rule are for the most part to be found in his work from the ‘50s - his MA and PhD theses, his 1953 Language paper, and his contributions to mathematical linguistics.
Subsequently, a great deal of the empirically testable work that has informed his theories has been done by students and followers, and that work can be and has been accepted or disowned by NC from time to time as he sees fit.

Admittedly, NC is the object of constant virulent attack just because he is king of the mountain. But he did not get to where he is by being a nice guy (which has no bearing on theory or empirics, but does have a bearing on how arguments are deployed and received and argued against). NC has usually set the tone of the conversation, and it has never been a genial one.

My point is that there’s a theoretical side, a history-/philosophy-/sociology-of-science side, and an empirical side to NC’s work, and the trick, if you want to dispute with him, is to separate these and then find (if you can) the specifically empirical claims. The theoretical component (because it is, for NC, thoroughly invested with ideology and at the same time unattached to any falsifiable hypotheses) is not subject to refutation.

Although I have focused on NC here because he was the subject of the original post, the problem is obviously much more general. What to do about sweeping claims (such as whether correction or some correlate thereof plays a role in first language acquisition, or whether adults without pathology can usefully be said to acquire “equivalent grammars,” independent of or unaffected by individual variation or levels of attainment, that we linguists can use our intuition to see into) generally believed by most linguists, whether Chomskyan or not? I don’t know – we just try to do our bit for Kuhnian normal science except when intertheoretic rumbles are afoot and the larger problems are highlighted. I readily confess that I am no less guilty here of making sweeping claims unsupported by argument (though supporting arguments by me and others can be found elsewhere).

The core activity of science is to produce models of reality and validate them by observation. It's been 50-some years: where is the TG model of language that can understand, produce, or learn realistic human language? I should say, where can I download it, because not to embody such a model in a computer program would be pathetically anachronistic. 50 YEARS, people, and there's really nothing. Chomsky is a mathematician, not a scientist. If his math provides useful models, great, put it to work. If it doesn't, it's just pretty math. Personally I think Chomsky's linguistic work will end up like the Ptolemaic system - beautiful, rational, based on fundamentally flawed assumptions. Except that epicycles actually produced quite accurate models...

Crackpot here. This must have been discussed elsewhere at length, many times, but I have trouble understanding just what it is about "colourless green ideas sleep furiously" that would make anyone think semantics had been disengaged from syntax.

We have here green ideas; what sort of green, why a *colourless* green, a green that evokes the ephemera of green, all the things about green which are not colour: the force, perhaps, that through a green fuse drives a flower; the phrase 'colourless green' posits the existence of this intersection, and it might be rejected, but that is hardly the same thing as proving it meaningless. In fact it is clear that the juxtaposition has meaning; there is a pattern here, for what do these colourless green ideas do, why, they are sleeping, they are sleeping *furiously*, boiling in dreams: it is actually a highly evocative image.

So I do not understand his assumptions. Syntax may indeed be divorceable from semantics, but the assertion that this is a sentence devoid of semantic content confuses me, coming from a background in the humanities. Is this because he is discussing 'semantics' within a technical framework wherein the meaning of the word is understood to be highly specified?

You have only the dimmest notion of what Chomsky's theories mean. For example, your nonsensical notion that he believes children learn "abstract syntactic rules" as they first learn a specific language could not be more in error. It does seem that you do not have the capability to understand his theories.

Chomsky's views are interesting for two different reasons: first, he has provided an impeccable way of conceptualizing the acquisition problem in the domain of language. In fact, he has done more than this: he has broken the language problem into three interacting yet separate parts: describing a native speaker's linguistic facility within a particular language (this work focuses on the kinds of RULES native speakers embody), describing a human's linguistic facility in acquiring ANY language (this focuses on the kinds of mental structures required to derive particular grammars given the limited input the child uses), and describing how humans came to be the species with the second aforementioned linguistic facility (this focuses on how much of the faculty of language is cognitively generic and how much linguistically proprietary). These are three different questions that often get run together (sadly). At any rate, showing the fecundity of this way of putting things is impossible to do in a comment here, so let me recommend that interested parties check out the endless discussions on my blog: facultyoflanguage.blogspot.com.

Chomsky has not only described the problems to be solved but has also made detailed suggestions concerning how to solve/approach them. This work has been much more controversial. Many "Chomskyans" have disagreed with many details over the last 60 years. And Chomsky has repeatedly changed his mind over the details. The problem with many critics, including the blogger here, is that they fail to distinguish what kind of point Chomsky is trying to make, and fail to understand how much controversy over the details there is. Again, I discuss many such issues in my blog for those interested.

Chomsky is not infallible, not even to his most devoted acolytes like me (yup, count me a true believer). However, he has correctly identified the right problems and correctly described the KINDS of solutions these will require. IMO, he has nailed this almost completely. In short, he has the big picture absolutely right. The details? Well, let's just say that I am respectful but not convinced about many. But the reason that Chomsky is such a big deal is that getting the big questions right (what Marr would call the Level 1 computational problem) is what we remember the scientific greats for. Galileo got a lot wrong, as did Newton etc. But we remember them because they got the problems worth solving right and pointed in directions for how to solve them. Ditto Chomsky. What is sad is that many still do not understand this, and it results in lots of wasted effort and confusion mongering. The post that prompted this reply is a good example of this.

Typical Chomsky fan. 'Sigh, you guys just don't understand the great master's work, what a shame'. Sorry Norbert, this won't wash for much longer. Chomsky is a fraud, but a fraud who has hidden it well for decades. His star is fading and his ideas will be consigned to the scrap heap of bad academia along with cold fusion and Lamarckism.
