Innateness and Contemporary Theories of Cognition

Nativism and Empiricism are rival approaches to questions about the
origins of knowledge. Roughly speaking, Nativists hold that important
elements of our understanding of the world are innate, that they are
part of our initial condition, and thus do not have to be learned from
experience. Empiricists deny this, claiming that all knowledge is
based in experience. Different Nativist and Empiricist views spell
out the details in different ways, depending on which elements of our
knowledge are at issue, what counts as understanding, what is meant by
the initial condition, how learning is to be understood, what it is
for knowledge to be based in experience, and so on. There continues
to be lively philosophical debate about whether there is any
satisfactory general account of what it is for something to be innate
(for a review of some recent work see Gross & Rey 2012). The
Nativist views discussed here differ in many respects, but all share
the broad commitments of the approach. It should be noted that the
commonplace opposition of Empiricism to Rationalism harks back to
17th- and 18th-century philosophical debates in which Nativism was a
central plank in the Rationalist position. The contemporary Nativist
views we consider here are independent of such broader Rationalist
commitments (see the entry on
rationalism vs. empiricism).
Although it is misleading, it is not uncommon for the
terms ‘Nativism’ and ‘Rationalism’ to be used
interchangeably (see the entry on the
historical controversies surrounding innateness).

Up until the 1950s, there were no active research programs that were
looking for the innate factors in knowledge and cognition that had
been hypothesized and argued for by Nativist thinkers since Plato. It
was widely agreed that the centuries-old battles between Empiricists
and Nativists were over, and that the Empiricists had decisively won.
The Nativist situation was actually worse than that: innateness claims
were seen not only as wrong, but as fundamentally unscientific
approaches to the mind, and perhaps incoherent as well. The prevailing research
agenda for scientists and philosophers interested in how the mind
works was to show how our knowledge and abilities could be fully
accounted for on the basis of our sensory experiences and the general
learning mechanisms that operate on them.

But a number of developments have led to a resurgence of Nativism,
beginning with Chomsky's revolutionary work in linguistics in the
1950s and 1960s. This entry places this resurgence in its scientific
and philosophical context, and discusses a few important areas of
research to give a taste of the kinds of experimental approaches,
hypotheses, and theories that have been advanced. A word about the
focus of this entry. Most philosophical discussions about innateness
begin with careful analyses of the variety of meanings innateness
claims can have, consider the sorts of entities that might be at issue
in such claims (beliefs, ideas, concepts, knowledge, etc.), discuss
the epistemological standing of these innate elements, and so
on. These questions are no doubt interesting—and sometimes the
answers are interesting too—and such work has its place. But the
real action for philosophers is more in the details of the current
empirical research, and less in the philosophical bookkeeping.
Cognitive scientists are beginning to reveal some of the basic, or one
might say ‘primal’, patterns of human cognition. They are
using experimental evidence to paint a detailed picture of how we
human beings understand the world—both the physical world around
us, and ourselves and other selves that are parts of that
world. Developmental scientists are trying to figure out to what
extent and in what ways we are built by nature to arrive at these
understandings. Those we identify as Nativists accord a significant
role to our natures, and lean towards the view that we are not built
to be initially neutral about the world we encounter, in the way that
classical Empiricism would lead us to expect. This growing body of
scientific thinking is of general interest, as evidenced by the
attention it receives in science magazines and newspapers like the New York Times. But
the character of our primal understandings and their innate bases are
intimately connected to the central concepts and questions that
philosophers have always been most interested in. Getting clear on how
we naturally think and how we come to think that way is, arguably, a
critical element in our understanding of human beings.

The entry has three main sections. In the first section,
current Nativist developments are put in recent historical context,
especially the connections between Chomsky's linguistic
innovations and current cognitive science research. The second
and longest section takes up three areas of current research on
children's early concepts and understanding—of physical
objects, number, and mind/agency—to give a sense of the type of
empirical work being conducted and to highlight some of the promising
results that are
emerging.[1]
The third section reviews some recent work in
the study of development that is close to the Empiricist side of the
traditional divide.

1.1.1 Behaviorism and Nativism

The reigning experimental paradigms in mid-20th century
American psychology were for the most part variants of
Behaviorism. B.F. Skinner's behaviorist account of language
acquisition and use (Skinner 1957) in many ways marks the end of this
dominance—or at least the beginning of the end—because it
was the target of a very influential attack by Chomsky (1959). This
attack convinced many of the inherent limits of behaviorist theorizing
(see Cowie 2010 for details).

The defining feature of Behaviorism is its
anti-mentalism—the methodological claim that one can
(must) provide a psychological account of human beings without
referencing internal mental states. Chomsky's attack on
Skinner zeroes in on this
anti-mentalism.[2]
The connection between Behaviorism and
Nativism, on the other hand, is typically given less
prominence. Although Behaviorism is closely tied to Empiricist
associationism and is therefore ‘officially’ anti-Nativist,
theories like Skinner's do incorporate significant Nativist
elements. Specifically, Skinner took it for granted that every
animal has a range of naturally emitted behaviors. Some of these
behaviors are responses to stimuli (Skinner's
respondents—e.g., the baby's suckling response),
while others are just emitted (Skinner's
operants—e.g., the baby's babbling). These
behaviors are the raw materials that can be shaped by
experience—Skinner's conditioning and the law
of effect. So an innate behavioral repertoire, and innately
specified links between environmental stimuli and
elements of that repertoire, are very much part of the Behaviorist
picture. This innate repertoire was, as any good Darwinian would
expect, highly information rich, because it was shaped by the history
of problem solving by the animal's forebears. All parties
take it for granted that babies babble, and suckle in the presence of
the right stimuli, because such behaviors are part of their biological
heritage. There might be disagreements about the underlying mechanisms
and epistemological standing of that heritage, but it is hard to deny
that humans are in some sense pre-informed that they need to
suck to get milk from the breast. This is part of babies' ‘factory
settings’. So if we set aside the controversy
over the subject matter of psychology (behavior or the
internal mind?), and the controversy over the right explanatory
constructs (schedules of reinforcement or cognitive
processes?), we find that Behaviorism is actually
committed to innateness claims, and doctrinally
opposed to any kind of ‘blank-slate-ism’.
But this isn't how things actually played out. Behaviorism
was for the most part truer to its affiliations with philosophical
Empiricism and Associationism, and its Nativist commitments were
obscured. One important lesson is that in the Nativism-Empiricism
debate we are often dealing with ideology, not theory (Pinker
2002).

The impact of Chomsky on linguistics and cognitive science has been
much discussed. Here we briefly review some of the elements
critical to the resurgence of
Nativism.[3]
Chomsky focused attention on two facts about
human languages: (1) that they are very complex, and (2) that children
come to master them without much systematic training. The second
fact is fairly obvious, but the first is not. A very important
step, as far as Nativism is concerned, was Chomsky's notion of a
generative grammar as a framework for articulating the
complexity of a language. A generative grammar of a particular
language is a system of rules that generates all (and only) the
sentences of that language, along with a characterization of how each
sentence sounds and what it means. Chomskyan linguistics is the
project of discovering the elements and structure of such rule
systems.
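To make the notion of a generative grammar concrete, here is a minimal sketch in Python. The grammar, its category labels, and the function name `expand` are hypothetical illustrations, not drawn from Chomsky's work. The toy rule system is deliberately finite and non-recursive, whereas grammars for natural languages are recursive and generate unboundedly many sentences; it also generates only word strings, leaving out the characterization of sound and meaning that a full generative grammar supplies. What it does show is the core idea: a small set of rules determines exactly which strings count as sentences of the (toy) language.

```python
from itertools import product

# Hypothetical toy grammar: each category maps to its alternative expansions.
# Anything that is not a key (e.g. "the", "sleeps") is treated as a word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["child"], ["bird"]],
    "V": [["sleeps"], ["sees"]],
}


def expand(symbol):
    """Return every word sequence that `symbol` generates under GRAMMAR.

    Terminates only because this toy grammar has no recursive rules.
    """
    if symbol not in GRAMMAR:  # a word: it generates just itself
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every way of expanding each symbol in this alternative.
        for parts in product(*(expand(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results


# The "language" generated by the grammar: all (and only) these strings.
sentences = {" ".join(words) for words in expand("S")}
```

This handful of rules yields 40 word strings: "the child sees a bird" is among them, while "child the sees" is not, because no sequence of rule applications produces it.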

The link between linguistics and innateness comes in a second
important move: the psychologization of grammars. Chomsky argued
that every speaker of a language has a mental representation of its
grammar. This sets up a natural question—how did the
grammar get into the speaker's head?—and two
traditional answers immediately present themselves. The
Empiricist would aim to show that the grammar (if it indeed is
in the head) could be learned from experience in much the way one
learns other facts about the world. The Nativist, in contrast, is ready to
consider that learning a language—now reconceived as a matter of
grammar acquisition—depends in some way on a language-specific
innate endowment. This brings us to the third important
step. Chomsky argued that a comparison of (i) the grammar that
has to be acquired, and (ii) the idiosyncrasies of the acquisition
process and the data presented to the language-learner, favors the
Nativist approach.

So Chomsky did more than simply point to language learning as an
area in which the Nativist case might be
built.[4]
His framework for
specifying the grammatical rules that the child has to master
sharpened the debate between Empiricism and Nativism in something like
the way that the mathematization of physics in the 17th
century revolutionized the empirical sciences.

Part of this sharpening is the result of Chomsky's important
methodological distinction between competence and
performance. Chomsky argued that a scientific approach
to language needed to focus on the specific mental representations that
underlie linguistic behavior (‘linguistic
competence’), and not on the behavior itself
(‘linguistic performance’). Linguistic
performance, he argued, is scientifically intractable, because it is
the result of too many idiosyncratic interacting factors. We
would do better to take on the much more circumscribed question: what
is the system of rules (the grammar) that generates all the allowable
sentences? It soon became clear that even if we set aside the
performance systems involved in real linguistic behavior, the rules of
the grammar were themselves very complicated, often unintuitive, and
abstract, in that they involved categories and constructs that
were at a significant remove from the data. The idea that children
could simply ‘pick up’ these rules by attending to what is
associated with what in their language environment was just not
plausible. Yet every normal child does in fact learn a language,
and so does somehow master these rules. So either the general
learning system that the child wields is somehow more powerful than the
Associationist-Empiricist had assumed, or the Nativist is right and
there is some innate language-specific information that ‘greases
the wheels’ of language acquisition. To resist the Nativist
conclusion, the Empiricist has to return to the drawing board to
develop a more powerful general learning theory. Chomsky
championed the Nativist position and termed the innate information
‘universal grammar’ or ‘linguistic
theory’. This is the essence of Chomsky's famous
Poverty of the Stimulus argument, which in an important way
provided a measure of the challenge that Empiricism faces.
The Empiricist-Nativist debate was no longer a
‘you-say-experience-I-say-innate’ affair; it looked to many
to be a matter of ‘put up or shut up’, and the burden was
on the Empiricist to do the putting up.

There was significant controversy about all the elements of this
paradigm shift: philosophical tangles about the notion of
representation (in what sense is the grammar ‘in the
head’?), technical linguistic debates about the structure and
character of grammars for specific languages and about the nature of
universal grammar, controversies in psychology about the relevance of
Chomskyan formalisms to experimental studies of child learners and
adult speakers of a language, and on and on. But the shift
held. Linguistics went from a backwater to a central player (as a
model and as an integrator) in the development of cognitive science as
a multi-disciplinary approach to aspects of cognition and mind.
Developmental psycholinguistics, a field more or less born out of these
upheavals, set out to investigate experimentally whether the details
about language acquisition actually supported the Chomskyan Nativist
hypotheses, and in time, many developmental psychologists broke from
the reigning Empiricist paradigm and began to deploy Poverty of the
Stimulus arguments in other areas of cognitive development.

1.1.3 Nativism as natural science

Before Chomsky, Nativism suffered from two disabilities. The
older charge, which we mentioned briefly at the start, was that the
doctrine was in some way incompatible with a naturalistic or scientific
approach to the world. It is true that the Nativist view, as
defended by many early modern Rationalists including Descartes
(1996/1641 and 1911/1647) and Leibniz (1981/1764), did contain (what we
now regard as) a supernaturalist element: what was innate was
presumed to have been placed in us by God. But besides this taint
of anti-naturalism, there seemed to be another problem, highlighted by
Locke: simplicity. Locke (1979/1690) argued that, all things being
equal, we ought to prefer the simpler Empiricist doctrine, which posits
only sense experience and general associationist learning, to the
Nativist view, which adds inborn materials. It is this
presumption in favor of Empiricism that was inherited by modern
versions of Associationist psychology; it was taken for granted that if
there were equally good Empiricist and Nativist accounts, the
Empiricist account would be methodologically preferable on the grounds
of simplicity.

In light of all this, it is important to recognize that
Chomsky's advances undercut both these supposed shortcomings of
Nativism. On the first point, Chomsky repeatedly stressed that
claims about internalized grammars and universal grammar were
unexceptional empirical hypotheses about the internal causes
of the observational evidence. The question of what is built in
and what needs to be learned is a straightforward scientific
question. It goes without saying that there is no hint of the
supernatural in Chomsky's linguistics: we have the innate
structures we do because we are evolved biological organisms.

This Nativist connection to evolution raises a natural question: why
did the resurgence of Nativism have to wait for Chomskyan linguistics;
why didn't the theory of evolution, developed more than a
half-century earlier, undermine Empiricism and resurrect
Nativism? The Empiricist paradigm, after all, has always promoted
itself in terms of its very austere view of human knowers: we perceive
the world, and learn all we know on the basis of our perceptual
experience of it. But as we noted earlier, the Darwinian
Revolution made it plain that as a general rule, evolutionary
forces shape organisms to fit into their niche. Such
shaping, at least in the animal kingdom, was obviously a matter of
pre-organizing the animal's behavior-producing
machinery—the processing that goes on in its brain—so that,
for example, birds know that they should eat worms and build nests out
of twigs and not vice versa. No one tries to explain the
bird's competences (and birds' natural competences extend
far beyond this trivial example) purely in terms of the bird's
perceptual experience. Birds are not blank slates at birth.
But we humans grow from the same evolutionary branches as the animals
around us. This line of thought leaves us with a few
possibilities. One is that all the innate preparedness
painstakingly established in our evolutionary ancestors was somehow
discarded, and we humans were redesigned—from scratch, as it
were—as blank slates with a uniquely powerful learning capability
to make up for our meager initial holdings. This is, arguably,
the traditional Empiricist approach. Another is that we inherited
a good deal of what evolution had established in the cognitive systems
of the organisms from which we evolved, but that our further advance
was, to a first approximation, based not on innate factors but on
learning. A third view—the Nativist position—is that
more was added in the course of our own evolution, and that we too are
in some way pre-informed about at least some matters most critical for
our survival. These possibilities are too vague to be taken as
hypotheses, but the Nativist view seems at least as initially plausible
as the Empiricist approach. The important point is that it
should have been that plausible a century ago. Somehow the
Nativist implications of evolutionary theorizing were
also
obscured.[5]
Empiricists
might argue that these implications are not relevant to the Nativist
tradition that they oppose, but the point is that the issue was hardly
raised. One suspects that a deep cultural and intellectual bias
was at work.[6]

The upshot of this last point is that the presumed advantage of
simplicity that Empiricism claimed for itself was illusory. Once
we include in our measurement of simplicity how well a hypothesis fits
with other established theories, the simpler hypothesis is that human
beings are part of the natural biological order, and that like all
other organisms they are to some degree pre-shaped by evolution to fit
into their distinctive ecological niche. The naturalistic view of
human beings ushered in by Darwin should have, all by itself,
revived
Nativism.[7]
We might go a step further and ask whether Empiricism itself missed a
golden opportunity to deploy evolutionary theory as a vindication
of Empiricism. A more enterprising Empiricism might
have noted that evolutionary theory commits us to the idea that
whatever is innate in us was, at least in one sense, shaped by
experience. Experience here would be ancestral
experience, not the experience of the individual subject, but such a
view would still ground knowledge in experience. In other words,
the range of ‘learning from experience’, the
Empiricist's core commitment, would simply be extended to cover
not only individual learning but species-based learning as well.
But this opportunity was for the most part missed.

1.2.1 The problem of linguistic exceptionalism

Although Chomskyan linguistics set the stage for a general Nativist
revival, it took a while for this train to leave the station, and it
will help to understand why. Part of the problem was that the
original case for linguistic Nativism had been made, at least in part,
by focusing on what looked to be unique features of language.
Language has long been seen as exceptional, indeed as the
distinguishing feature of human cognition. Chomsky championed
this view, and argued that language is central to a special kind of
human creativity (Chomsky 1966).

We have already noted one facet of this exceptionalism: the fact
that grammars are very complex. But there are also unexpected
singularities in how children learn, that is, in the learning process
itself. Each child is exposed to an idiosyncratic sample of the
language (their primary linguistic data, or pld). Each sample is
compatible with any number of non-equivalent grammars that all generate
the pld sample so far, but give different verdicts about new
cases not in the pld. We might therefore expect (i) that
the grammar a child acquires reflects the idiosyncrasies of the
pld the child was exposed to, (ii) that, as a consequence,
children will disagree about what is and what is not grammatical, and
(iii) that adults will therefore have to correct them to smooth out
errors that reflect those idiosyncrasies. But this, Chomsky
argued, is not what we find (Chomsky 1965). Children learning a
language somehow converge on the same grammar, as evidenced by
their agreement about well-formedness, and by the distinctive types of
errors they make and don't make in the course of
learning. If this is right, it suggests that the child must have
prior information that somehow constrains or orders the hypothesis
space that steers the child to the right grammar, and it is hard to see
how this information can be acquired through experience.
Furthermore, the pld contain ungrammatical and incomplete
sentences, but children somehow filter out this noise, and do so
without explicit instructions or feedback. There are a
number of other striking features about language learning that Chomsky
drew attention to: (1) it is acquired rapidly, (2) the speed of
acquisition does not correlate with intelligence, (3) it does not
require reinforcement or extensive explicit training, and (4) it is
acquired in a critical period—a relatively fixed window
in the maturation process—during which other less complex systems
(counting, for instance; see section 2.3 below) cannot be
mastered. Each of these claims has prompted a long trail of
experimentation and theory construction, and all remain controversial
(see, for example, the discussion in Menn et al. 2003). But
their overall effect was to single out language learning as
exceptional, and perhaps unique. Chomsky himself marked this
difference by speaking of language acquisition and contrasting
it with learning, a term he reserved for induction-based
processes.
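The underdetermination point made above, that a child's finite sample of primary linguistic data is compatible with many non-equivalent grammars, can be illustrated with a deliberately simple sketch. Two hypothetical "grammars" over the letters a and b stand in for rival hypotheses a learner might entertain; the formal languages, the sample, and the function names are illustrative assumptions, not a model of any natural language or of Chomsky's own examples. Both grammars accept every string in the sample, yet they disagree about strings the learner has never encountered, so the sample alone cannot decide between them.

```python
import re


# Hypothetical grammar A: membership test for a^n b^n with n >= 1.
def grammar_a(s: str) -> bool:
    """Accept n a's followed by exactly n b's (n >= 1)."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


# Hypothetical grammar B: membership test for a^n b^m with n, m >= 1.
def grammar_b(s: str) -> bool:
    """Accept one or more a's followed by one or more b's."""
    return re.fullmatch(r"a+b+", s) is not None


# A hypothetical finite sample of primary linguistic data (pld).
pld = ["ab", "aabb", "aaabbb"]

# Both grammars generate every string in the sample...
consistent = all(grammar_a(s) and grammar_b(s) for s in pld)

# ...but they give different verdicts on some cases outside the sample.
novel = ["aab", "abb", "aaaabbbb", "ba"]
disagreements = [s for s in novel if grammar_a(s) != grammar_b(s)]
```

Nothing in the three sample strings favors one grammar over the other; the disagreement surfaces only on "aab" and "abb". On the Nativist reading, something beyond the data must steer the child toward one hypothesis rather than the other.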

So on Chomsky's view, language is doubly exceptional. It
is the distinctive human cognitive trait, and is essentially
different from all known animal communication systems. The fact
that we have it makes us exceptional as a
species.[8]
It is also exceptional in that
the pattern of its acquisition suggests that it stands apart from all
that we learn about the world; it simply grows in us. Taken
together, these considerations supported a Nativist account of language
learning, but at the same time tended to discourage the idea of
exporting the Nativist revolution beyond language. After
all, how much of the rest of the child's untutored knowledge of
the world is as complex as grammars reveal human languages to be?
And how much of that knowledge comes to the child as effortlessly and
without explicit instruction?

1.2.2 The expanding prospect(s)

In time, the arguments for linguistic exceptionalism gave way to a
broader view of the Nativist project. Chomsky (1975) set out a
fully general schema for Poverty of the Stimulus arguments that did
not depend on the distinctive features of grammars and
language acquisition, which had been featured in making the original
Nativist case. Chomsky began to speak of language as one of
possibly many mental organs that grow in the
individual. This naturalistic biological model embeds Nativism
about mental organs into a wider and uncontroversial biological
Nativism. It is uncontroversial that kidneys do not develop as a
response to the environment, and they certainly do not copy the
environment. The human body is organized in such a way that in
normal (fetal) environments, kidneys will form. This point could
now be deployed against the Empiricist. To presume that the basic
features of our physical-biological nature are
internally pre-determined, but that our
mental-psychological nature is not, but is wholly
externally determined, is to introduce a dualism that requires
a special defense. But Empiricism seems to make just this
presumption, and offers no credible defense. So the tables are
turned. The Nativist has been freed from the earlier
supernaturalism charge, the simplicity card of Empiricist models turns
out to be spurious, and now the Empiricist seems to be the one carrying
an unmotivated dualism as excess
baggage.[9]

The mental organs approach has proven to be extremely influential in
both philosophy and the cognitive sciences. In its most general
form, it has displaced the idea of information in the mind as (for the
most part) a single uniform set of sentences or data points, and put in
its place an alternative architecture of systems and subsystems of
knowledge and information, each, possibly, having its own design,
pattern of representations, specialized function, pattern of
activation, level of integration with other systems, (sometimes)
specific locus in the brain, and so on. We mention here a number
of developments significant to the Nativist side that have grown
out of this central theme.

The modularity of mind hypothesis. Fodor (1983) proposed a
view of our overall cognitive architecture that rested on a rough
distinction between input systems, or relatively rigid computational
“modules” that are designed to pick up specific types of
information, and more flexible central processors that integrate that
information in various ways. Each of these modules has a specific
task-orientation, and does its work independently of much of what is
going on in the rest of the system. So, for instance, we more or less
automatically hear sound patterns as sentences of our native language,
perceive patterns of light and shadow as configurations of objects in
space, and so on. In these terms, the language organ is just one
of a set of freestanding mental modules. Fodor suggested a
checklist of properties that such modules could be expected to have,
and among them is that they are innately
determined.[10]
The architectural claim about modular
organization does not in itself imply an innate basis, but the
hypothesis that the sorts of response patterns to linguistic and visual
input (like those just mentioned) have a strong innate basis is
plausible and has been experimentally pursued. Fodor's
version of the view is now termed a moderate modularity
thesis, because he holds that much of the business of cognition
involves ‘central’ processing that is decidedly
non-modular. Modules do the work of ‘presenting the
world’ to highly integrated non-modular global psychological
processes. But others, like Carruthers (2006), have argued that
with some adjustment to Fodor's original characterization of
modules, we can argue for massive modularity.

Evolutionary Psychology. One of the controversial
arguments used to defend massive modularity claims is that evolution
favors this sort of architecture. This brings us to the central
doctrine of Evolutionary Psychology—i.e., that cognition is best
understood as a ‘Swiss army knife’ of special purpose
psychological-computational mechanisms that evolved to enhance the
survival of our
ancestors.[11]
One much-discussed example of such a
mechanism is a ‘cheater detection’ module. Our
ancestors needed to distinguish fair-traders from freeloaders.
Those who could be consistently taken advantage of in exchanges were at
a significant disadvantage in terms of survival. At some point, a
mechanism evolved—a computational program in the brain, a mental
organ (or mini-organ?)—that made such vigilance and
record-keeping second nature, and we now all have this module as part
of our innate endowment. It's been argued—but the
claim continues to be controversial—that the operation of this
module explains the (purported) fact that although we fall prey to a
class of reasoning mistakes, we do not make as many of these errors
when our reasoning is related to cheater-detection. For
Evolutionary Psychologists, the mind is a collection of evolved
sub-systems adapted to the environments of our Pleistocene ancestors,
not to our own
environment.[12]
Evolutionary Psychology is arguably the most
radical Nativist-inspired paradigm, because it looks to make the range
of the Empiricist's general-purpose learning mechanism smaller
and smaller.

To keep the players straight, we must note that Chomsky himself has
had a very complicated relationship with evolutionary explanations of
mind and
cognition.[13]
He
is certainly not a friend of Evolutionary Psychology, and has joined
with its critics in questioning its adaptationist
perspective.[14]

Cognitive Ethology. The modularist position, and the
Nativism that fits it so well, have been supported by recent work on
animal cognition, especially the discovery of very sophisticated
information-rich sub-systems in the animal brain (see Andrews 2010 for
a philosophy-oriented review). Early discoveries about complex
animal behavior—like Von Frisch's work on the dance of the
bees (Frisch 1971)—remained in the shadows during the heyday of
Behaviorism, but more and more such systems have come to light since
then. Just to take navigation as an example, desert ants have an
innate dead-reckoning module for navigation, and various bird species
have intricate innately-based systems based on the fixed stars,
magnetic fields, the azimuth angle of the Sun, and so
on.[15]
All these cognitive
modules/mechanisms are innately specified subsystems, and add
plausibility to the Nativist theme that nature has built human beings
in the same way.

We have explained the ways in which Chomsky's work in
linguistics inspired subsequent Nativist thinking in the cognitive
sciences. But there is an irony here in that, except for the very
general Poverty of the Stimulus schema (which can be traced back to
Plato), linguistics and language acquisition have not served as
easy-to-use templates or paradigms for developing Nativist hypotheses
in other domains. We so far have no reason to think that there is
any domain outside language that requires anything as complex as a
grammar of a natural language to represent it. So linguistic
competence remains an exceptional element in our
cognitive
make-up.[16]
And even
though some of the distinctive features of language acquisition have
counterparts in other domains—sensitive and critical periods in
the development of visual perception, for example—there does seem
to be something exceptional about the way virtually every normal child
comes to master a language. We might say that for Nativists,
language has been more an inspiration than a working model. But
at the same time, as Nativists move beyond language, they may avoid
many of the methodological challenges to the Chomskyan approach
(including: is a grammar a theory of competence, in what sense are
grammars ‘mentally represented’, is the pld all
that's relevant to acquisition, etc.).

A full account—even a comprehensive survey—of Nativism
in the cognitive sciences would fill many volumes and is therefore
beyond the scope of this entry. But there are a number of
conceptual domains that have been especially well investigated by
cognitive scientists in recent decades, and this section will
highlight a few areas that are the subject of lively and theoretically
interesting work, and that are connected to traditional and
contemporary philosophical concerns.

The research we will discuss in this section is inspired by the
Chomskyan paradigm, but there is an important difference between the
language case and this developmental work. Chomsky's
Linguistic Nativism used Skinner's Behaviorism as a foil, but the
Behaviorist paradigm was not the reigning scientific paradigm
in the area of child development. In this field, the Swiss
psychologist Jean Piaget was the dominant figure, and his research has
served as the backdrop for most developmental work over the last 40 or
50
years.[17]
A
brief discussion of Piaget's paradigm and the general Nativist
response to it will therefore be useful.

Piaget generally ignored Behaviorism, and conducted experimental
studies on the child's evolving conception of the world.
His extensive research agenda included the child's understanding
of space, time, God, objects, causality, morality, dreams, number,
being alive, and more. Piaget's specific questions and
experimental results—which were reciprocally (mostly) ignored by
Behaviorists—have served as a jumping off point for many
Nativist-oriented theorists. But Piaget was not a Nativist.
The heart of the Piagetian paradigm is his stage
theory. On this view, children start with a very different
conception of the world than adults have—in fact, Piaget thought
that they start without a conception of an external world at
all—and they go through a series of identifiable stages that
culminate in adult understanding. The powerful unifying idea here
is that there is something about the general character of these stages
that is the same across all domains of understanding, and that the
dynamics of stage transition is also uniform. To a first
approximation, for Piagetians there are no significant distinctions
between the developmental patterns in different domains of
understanding. For any given domain, the stage theory
imposes a uniform grid of steps on the development of knowledge in
that domain. The dynamic picture, again very roughly, is that a
child at a stage proceeds until she faces an insurmountable obstacle;
her present grasp of things makes it impossible for her to deal with a
recalcitrant problem. This disequilibrium propels her to the next
(pre-plotted) stage, in which new internal resources become
available—an enriched conception of the world or a new
flexibility in physical interaction—and the earlier problem can
be resolved. The child recovers equilibrium until coping with
problems again causes a crisis that leads to the availability of more
new resources, and so on. The articulation of the Piagetian
paradigm involved understanding the general nature of these
stage-transitions better, exploring how the stage theory operates in
specific domains, and understanding the new cognitive and behavioral
resources that make these transitions possible.

Philosophers will recognize the theory as in some ways analogous to
the theories of scientific development championed by Thomas Kuhn
(1962/1996) and others. Two important differences are worth
mentioning, because they highlight what is distinctive about
Piaget's approach. First, although science develops
organically, there is, for Kuhn, no one specific resource that
applies across all fields. What explains the shift from
one dominant paradigm to another in economics will typically not
explain the shift from the Ptolemaic to the Copernican paradigm in
astronomy. But Piaget held that what makes it possible for the
child to advance in her understanding of space is in one sense the
same thing as what facilitates the stage transitions in the
child's developing understanding of God or morality.
Second, science depends on the contingent, uniquely fruitful
innovations that overthrow older understandings and set the stage for
new ones. But in children, Piaget's developmental stages
are posited as mandatory; we might say they are innately prescribed
steps in normal development. The child's forward motion is
regularized as the world presents its predictable problems, and the new
resources become available to solve them and advance the child's
understanding. The upshot is that although Piagetians produced
probing and highly detailed studies of various domains of the
child's understanding, they shared the Empiricist preference for
an across-the-board domain-general mechanism that could explain the
developmental facts in every domain. Although there are
interesting ideas inherent in the Piagetian paradigm about the innate
endowment that makes adult cognition possible, it is not easy to place
Piagetian Constructivism on the Nativist-Empiricist
spectrum.[18]

Piaget's theories provided the scientific received view
against which developmentalists inspired by Chomsky's linguistics
reacted. These researchers set aside Piaget's assumption
that development is uniform across domains, and instead—in part
inspired by Chomsky's organology and modularity
claims—considered each domain independently. The overall
strategy was to discover the cognitive capacities of the youngest
children, and to develop and test hypotheses about (i) the initial
state, and (ii) the transitions that move the child from the initial
state to the normal adult repertoire.

The ‘Core Cognition’ hypothesis. Many
developmentalists in this camp share a commitment to the ‘Core
Cognition’ (sometimes called ‘Core Knowledge’)
hypothesis. According to this hypothesis (Carey 2009; Carey
& Spelke 1996; Spelke et al. 1992; Spelke 1998, 2000, 2003),
evolution has equipped our species (and other species too) with an
innate repertoire of conceptual representations, that is,
representations that cannot be reduced to the perceptual primitives
favored by the Empiricists or the sensory-motor primitives favored by
Piagetians. Rather, evolution has shaped our perceptual input
analyzers to detect certain types of entities in the world, and to
think about them in a certain way. These different types of
entities are few in number. To date, proponents of this
hypothesis have claimed that the innately specified core domains
include physical objects, number, and minds. Proponents of the
Core Cognition view restrict their Nativism to a few domains, and
therefore leave work for learning mechanisms, which (together with
maturation) take the infant from a limited ‘core’
conceptual repertoire to the broad and highly elaborated knowledge of
the world that adults have. In some cases, adult knowledge
extends the core; in others it ‘over-writes’ it. In
the sections that follow, we review select findings on these three
domains of core
cognition.[19]
We concentrate on very early
development. While it is often difficult to say what
exactly the research reveals about the young child's
knowledge (for methodological as well as philosophical
reasons), the earlier some distinctive elements of a competence are
present, the less likely it is that the competence was learned solely on the
basis of experience.

Methodological innovation: the
‘violation-of-expectancy’ looking time. The work
we discuss depended on solving a knotty methodological problem: how to
discover what is going on in the minds of preverbal infants and very
young children? Though infants cannot report on what they are
perceiving or thinking, one can make inferences from their reactions to
objects and events. Long before they utter their first words,
they suck, grasp, creep, crawl, and—most importantly—they
look. Since infants, like adults and other animals, look longer
at an unexpected stimulus, where they look and for how long they look
can reveal a good deal about their expectations about the
world. While measures of grasping, crawling, and sucking have all
been successfully used to reveal some of what is going on in the baby's
mind, the measure that has been used most extensively is the
violation-of-expectancy looking time (sometimes called
preferential looking time). Experiments using this measure
tend to have a similar structure: during an initial phase, the child
is presented with display X, over and over, until the child's
interest wanes and looking time drops down to some criterion
(the habituation phase). In the test phase, the
child is presented with two displays: Y and Z. If
the child reliably looks longer at Y than at Z, this
provides evidence that Z is as expected, but that Y
is unexpected.
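The habituation-then-test logic can be made concrete with a short Python sketch. It is illustrative only: the 3-trial window, the 50% habituation criterion, and the function names `habituated` and `interpret_test` are assumptions for this example, and the looking times are invented, not data from any study. Real experiments compare looking times across many infants with inferential statistics rather than a single difference of means.

```python
from statistics import mean


def habituated(looking_times, window=3, criterion=0.5):
    """Return True once looking has dropped to the habituation criterion.

    Illustrative assumption: habituation is reached when the mean of the last
    `window` trials falls below `criterion` times the mean of the first
    `window` trials.
    """
    if len(looking_times) < 2 * window:
        return False  # too few trials to compare a baseline with recent looking
    baseline = mean(looking_times[:window])
    recent = mean(looking_times[-window:])
    return recent < criterion * baseline


def interpret_test(looks_y, looks_z, min_difference=0.0):
    """Read a test-phase contrast between displays Y and Z.

    Longer looking is taken as a sign that a display violated an expectation.
    `min_difference` (seconds) is a hypothetical stand-in for a significance test.
    """
    mean_y, mean_z = mean(looks_y), mean(looks_z)
    if mean_y - mean_z > min_difference:
        return "Y unexpected"
    if mean_z - mean_y > min_difference:
        return "Z unexpected"
    return "no evidence of differential expectation"


# Looking times in seconds, invented for illustration.
print(habituated([20, 18, 17, 12, 8, 6]))        # True
print(interpret_test([12, 14, 11], [6, 7, 5]))   # Y unexpected
```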

As adults, we recognize physical objects as bounded entities that
persist through space and time; they ‘hold together’ as
units, and their paths, when they move, are continuous; they do not
have gaps. In addition, objects causally interact upon contact
with each other. Do we learn these properties of objects by
experience, as the Empiricists believed and, if so, by what sort of
experience? Empiricist thinkers have argued that these properties
are learned, and have proposed several different types of experience as
requisite input to such learning. Helmholtz (1867/1962) suggested
that moving around objects and manipulating them were necessary for
building a concept of an object. Quine (1960) looked to language
acquisition as the relevant source of information, and Piaget (1954)
proposed that sensorimotor coordinations led to construction of the
concept of a physical object. Indeed, Piaget famously argued that
infants altogether lack object permanence (1977), the
understanding that objects persist in time and space, until the latter
half of the second year of life.

2.2.1 Innate folk physics

Object permanence. In the last 25 years,
the baby's representation of objects has been re-explored with
striking results. A landmark study (Baillargeon et al. 1985) used
the violation-of-expectancy paradigm to test the Piagetian claim that
infants lack object permanence. Five-month-old infants were shown
a screen that rotated 180 degrees up from the surface of a table and
back again to its initial position. In the habituation phase, the babies
got used to the screen motion and their looking time decreased,
evidence that they no longer found the screen's movement to be
novel. In the test phase, an object was placed in the path of the
screen as the screen moved downward to the table's surface.
In one outcome, the screen rotated down until it touched the object and
then rotated back up to its initial position, an event that adults
recognize as possible. In the other outcome, the screen continued
its downward trajectory to the table, at first hiding the object and
then apparently moving right through the space occupied by the object,
an event that adults recognize as physically impossible. The
logic here is straightforward: babies will see the second outcome as
surprising only if (i) they represent the object as continuing to exist
even when it can no longer be seen behind the screen, and (ii) they
assume that two objects cannot occupy the same space at the same
time. Only then should they look longer at what adults recognize
as an impossible event. If, however, young infants lack object
permanence or have no constraints about two physical objects occupying
the same space, the impossible event will not constitute a
violation of any expectation. The results were clear: babies
looked longer at the impossible event, indicating that it violated
their expectation of objects. The same finding was later
demonstrated with 4-month-olds (Baillargeon 1987). These findings
offer evidence that very young infants represent objects as persisting
even when they are no longer in view, an understanding of object
permanence thoroughly at odds with the claims of Piaget and
Quine. One may still ask what exactly the child
knows or represents (Burge 2010 is especially
pertinent here), but the point is that there is something in
the child's cognitive apparatus that is sufficient to generate
this expectation, and the burden of explanation is on the view that
this is learned from experience. These infants also
expect that two objects will not occupy the same space at the same
time.

Spatiotemporal continuity of objects. As adults, we know that
objects are spatiotemporally continuous; an object that appears at
point A and then at point B must have traversed a
continuous path between these points. Here too the violation of
expectancy looking time paradigm has been used to test the Empiricist
claim that such knowledge requires an extended learning period. In one
study (Spelke et al. 1995), 4.5-month-olds were shown a stage with 2 screens on it, with a
visible gap between the screens. In the discontinuous motion
condition, each screen has an object hidden behind
it. First, the object behind the left screen is moved further
left so that the baby sees it, then it is moved back behind that same
screen. The object behind the right screen is shown in the same
way, so that during these displays only one object has been visible at
a time and no object has ever been shown to cross the gap between the
two screens. Adults seeing this display infer that there are 2
objects involved. To find out if babies make the same inference,
the screens are removed and the infant is shown either one object or
two objects. The result is that infants look longer at the
one-object display, presumably expecting, like adults, that there had
to be two objects; otherwise, the object would have been visible
crossing the gap. In a follow-up study, a continuous motion
condition was used. This condition is identical to the previous
condition except that, between the alternating trials, an object
is seen crossing the gap. In this condition babies
looked longer at the display of two objects. Like
adults, they presumably assumed there was a single object moving back
and forth (Aguiar & Baillargeon 1999 report similar findings
with 2-month-olds).

2.2.2 Animals and the representation of objects

If the representation of spatiotemporally continuous objects is part
of our evolutionary endowment, we might expect to find such
representation in the newborn of other species, and indeed we
do. Newborn chicks, for example, display a striking ability to
represent spatiotemporally continuous objects (see Spelke 1998 for a
review). In one study, newborn chicks spent their first day of
life in a homogeneous environment containing only one inanimate object.
On their second day, the object was moved fully out of view
behind one of two screens. Though they had never before seen an
object hidden behind another, they reliably searched behind the correct
screen where the object was hidden. Indeed they even did so when
they had to turn away from the object in order to reach it (Regolin et
al. 1995; Regolin et al. 2000). Chicks, it seems, have object
permanence from birth. Although this does not show that
object permanence is innate in humans, it does show that in at least
one animal, evolution has succeeded in building it in.
So Nativists can claim an existence proof of an innately endowed
representation of objects as permanent.

2.2.3 Babies' representations of objects support addition and subtraction

Wynn (1992) showed that young babies represent objects as not only
persisting in time and space, but also as subject to addition and
subtraction. In that study, babies were habituated to a display
of a single object on a stage. Then a screen came up and hid the
object completely. Now a hand was seen to bring in another
(identical) object and move behind the screen, from which the hand then
withdrew empty. The question was: do the babies now represent 2
objects behind the screen? Test displays consisted of 1 object, 2
objects, and 3 objects. Again, in line with Nativist claims,
babies showed longer looking times to all displays except the
2-object display. In this respect, babies showed the same
expectations that adults do. Further testing showed that babies
are not only capable of ‘adding up’ the number of hidden
objects (at least to 3), but are also capable of
‘subtracting’ from a small number of hidden objects.
This finding has been replicated in 4- and 5-month-olds
(Simon et al. 1995; Koechlin et al. 1998).

It may be tempting to see the infant's ability to add and
subtract the number of objects in a display as evidence that infants
already have something close to the adult concept of number, but a
series of studies suggests that this is not the case. Most
telling is the extremely limited set size (about 3 objects) over which
the baby can add or subtract. To illustrate this set size limit,
which has emerged in a variety of experiments, consider the following
study that used crawling, rather than violation-of-expectancy, as an
indicator of the baby's representation (Feigenson & Carey
2005). In this study, babies watched as graham crackers were
placed, one at a time, into 2 separate boxes. The babies were then
allowed to crawl to the box of their choice and retrieve the
crackers. When one box had 1 cracker and the other box had 2,
babies crawled to the box with 2. Similarly, when one box had 3
crackers and the other had 2 or had 1, they crawled to the box with
3. The surprising finding, however, is that babies failed with 4
versus 3, 4 versus 2, and even 4 versus 1. Apparently, the
ability to represent and keep track of exactly 4 objects is beyond the
baby's capability. Given the set size limit of 3 objects,
it is arguable that the baby's competence should be understood as
an ability to track up to 3 different objects in working memory. One
might argue that the baby can succeed in adding and subtracting very
small numbers of objects without having a general concept of number
or any general numerical competence. We return to this issue
in section 2.3.

2.2.4 Intermodal representation

As mentioned above, Piaget (1954) proposed that sensorimotor
coordinations gradually lead to construction of the object
concept. Establishing these coordinations between different modes
of perceptual experience—vision and touch, for
example—would take time, and Piaget proposes that it is not until
the child is 18–24 months old that these coordinations have been
constructed. Meltzoff & Moore (1977) provide counter-evidence to
this claim: their study shows that newborn infants
can imitate the facial movements of an experimenter, clearly revealing
the coordination of their own movements (along with the attendant
feelings of their muscles) and their visual perception of the
experimenter's facial movements. Video recordings accompanying
Ferrari et al. (2006), videos S1 and S2, provide some evidence
that newborn rhesus macaques have this ability as well. Insofar
as these coordinations of different modes of perceptual experience are
present at birth in humans and in monkeys, they simply could not be the
products of learning.

2.3.1 The analog number system

There is currently a great deal of empirical research—and
philosophically sophisticated
debate[20]—on
the underpinnings of numerical knowledge
in adults and children. There is strong evidence for the view
that in addition to an exact number system that underlies
formal mathematical thinking, adults also have an analog magnitude
system for representing approximate number (see Dehaene
1997). For example, if we are very briefly shown 2 bowls of rice,
one with 20 grains and one with 50, we can tell immediately which has
more, even though we couldn't say exactly how many grains of rice
were in either. Similarly, shown a book with 70 pages and a book
with 100 pages, we can see instantly which has more pages—though
again without knowing the exact number of pages in either book.
Controlled tests show that such judgments are independent of variables
that correlate with magnitude, such as the extent of space occupied or
the size of the individual stimuli. The ‘signature’
of this analog magnitude system is its ratio dependence: that
is, the difficulty in comparing two analog magnitudes decreases as the
ratio difference between them grows. Recent studies indicate that
the smallest ratio at which adults reliably discriminate 2
analog magnitudes is 8:7. If the ratio is any
closer to 1:1, error rates in comparing magnitudes spike up.

Analog magnitude representations in
infants. Recent studies have shown
that 6-month-olds use this same analog system to discriminate
numerical arrays (McCrink & Wynn 2004b; Xu
& Spelke 2000). In one study (Xu & Spelke 2000),
infants were habituated to displays of either 8 dots or 16 dots.
When shown novel dot displays, babies who had originally seen 16 dots
dishabituated to displays of 8, but remained habituated to the new
displays of 16. Similarly, babies who had been habituated to 8
dots dishabituated to novel displays of 16, but not novel displays of 8
dots. (Again, researchers controlled for the cumulative amount of
space occupied by the dots, the density of the dots, and the size of
the dots.) A series of studies has now shown that 6-month-old
babies can use the analog magnitude system successfully so long as the
magnitudes differ by a 2:1 ratio. When presented with dot
displays that have a 3:2 ratio, such as 24 to 16, babies this age do
not show discrimination. The ratio needed for discrimination moves
closer to 1:1 with age, so that 9-month-olds succeed when
the magnitudes differ by a 3:2 ratio. One might have thought that this
ability to discriminate approximate quantities is somehow implemented
in the visual system, but the analog number system has been shown to
operate at a more abstract level (or perhaps to be implemented in a
number of perceptual modalities). At any given age, the same
ratio applies whether the stimulus is a number of dots in a
spatial array or a number of tones in an auditory sequence (Lipton
& Spelke 2003)—or even a number of events (jumps) in a
visual display (Wood & Spelke 2005).
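The ratio dependence described above can be written as a simple threshold rule. The Python sketch below uses the ratios given in this section (8:7 for adults, 3:2 for 9-month-olds, 2:1 for 6-month-olds) and treats discrimination as all-or-none, which is a simplifying assumption: real performance degrades gradually as the ratio approaches 1:1. The names `MIN_RATIO` and `discriminates` are hypothetical. Because the rule looks only at the ratio, it applies unchanged whether the items are dots, tones, or jumps.

```python
from fractions import Fraction

# Smallest larger:smaller ratio given in the text for reliable discrimination.
# Treating these as sharp cut-offs is a simplification for illustration.
MIN_RATIO = {
    "6-month-old": Fraction(2, 1),
    "9-month-old": Fraction(3, 2),
    "adult": Fraction(8, 7),
}


def discriminates(n1: int, n2: int, group: str) -> bool:
    """Predict whether `group` can tell numerosities n1 and n2 apart.

    Only the ratio matters, not the absolute numbers or the modality
    (dots, tones, jumps) in which the items are presented.
    """
    larger, smaller = max(n1, n2), min(n1, n2)
    if smaller <= 0:
        raise ValueError("numerosities must be positive")
    return Fraction(larger, smaller) >= MIN_RATIO[group]


print(discriminates(16, 8, "6-month-old"))   # True  (2:1)
print(discriminates(24, 16, "6-month-old"))  # False (3:2)
print(discriminates(24, 16, "9-month-old"))  # True  (3:2)
print(discriminates(9, 8, "adult"))          # False (9:8 is closer to 1:1 than 8:7)
```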

Not only do these representations support comparisons of magnitude,
they have also been shown to support approximate addition and
subtraction in babies as young as 9 months old. In one study,
babies were presented with a set of 5 objects that moved behind a
screen so they were no longer visible. Then another set of 5
objects was presented and they too moved behind the screen. When
the screen was removed, babies looked longer if there were only 5 objects
than if there were 10. In a parallel subtraction condition, in which
babies first saw 10 objects move behind the screen and then saw
5 objects taken away, they looked longer when the screen was removed to
reveal 10 objects than when it revealed 5 (McCrink & Wynn 2004a).

The approximative analog system we have been discussing is different
from the object-tracking system mentioned earlier (for instance, in the graham cracker
study). The infant's object-tracking system has a severely
limited set size, and this is true of the adult's tracking system
as well. The analog magnitude system does not. It has also
been found that infants' success shows the ratio-dependence
profile of the analog magnitude system. The fact that 6-month-old
babies appear to use the same system of analog representation that
adults do—although, again, their discriminations are less
fine—strongly suggests that humans come equipped with an innate
system that makes it possible for them to make relative size
distinctions across modalities. Very recently this hypothesis has
been given strong confirmation by a study showing that the analog
magnitude system operates in newborn babies (Izard et al. 2009).
In this study, newborns were familiarized with auditory sequences
containing a fixed number of syllables and were then tested with
visual-spatial images of the same or a different number of
objects. Infants spontaneously associated stationary,
visual-spatial arrays of 4–18 objects with auditory sequences
(spoken syllables) on the basis of approximate number, providing
evidence for abstract numerical representations at the very
beginning of postnatal
experience.[21]

Is the analog system species universal? If this
analog numerical system is innate, it should be found in all human
societies, no matter how urban or rural, educated or unschooled,
whether in technologically advanced societies or remote and isolated
tribal villages in other parts of the
world.[22]
If it is the same system evident in the
youngest infants, it should not require exposure to any symbolic
representations of number, such as Arabic numerals or a number
lexicon. To test this hypothesis, investigators explored the
analog number system in the Amazonian Munduruku people, an isolated
tribe whose language has no words for numbers greater than 5. As
predicted, the Munduruku compared and added large approximate numbers
far beyond their naming range. Moreover, their performance declined as
the ratio between the numbers approached 1:1, just as it did in a group
of French control subjects (Dehaene et al. 2008).

Animal representation of approximate quantities. If this
numerical system (what Dehaene has called our number sense) is
part of our innate endowment, might it be evident in other primate
species? Hauser and his colleagues (Hauser et al. 2003)
presented cotton-top tamarins with auditory sequences of syllables of
different numerosities. Like humans, monkeys orient their
attention to unexpected stimuli. When they hear a sequence of
syllables of an unexpected numerosity, they turn their heads toward the
audio speaker from which the sounds are emanating, providing a reliable
indicator of their discrimination of the novel number. The
results are similar to those of the infant studies: cotton-top tamarins
discriminated between sequences of syllables based on approximate
numerosity alone. Moreover, discriminability depended on the
ratio of the numbers, just as it does in humans. Indeed, adult
tamarins showed discrimination abilities comparable to those of
nine-month-old human babies.

There is now a sizeable literature showing the presence of analog
magnitude representations in many different kinds of animals, including
rats, crows, pigeons, a parrot, rhesus macaques, apes, and dolphins
(see Carey 2009 for review). In short, there appears to be
excellent evidence from studies of human adults, human babies, and
animals, all suggesting the presence of an ancient evolutionary system
of approximate number representation.

2.3.2 The analog system, the tracking system, and the concept of number

If one steps back from the theoretical heat of the
Empiricist-Nativist debates, it should not be surprising that we have
an innate system for discriminating sets by their approximate size, and
that this system is found in other animals too. Animals typically
need to take some measure, for example, of the relative size of food
sources, of the relative number of predators on their left and right
flanks, and so on. In some animals, these abilities may be part
of an encapsulated system devoted to a specific task. The
bee's awareness of the relative size of a discovered food
source—information communicated in the scout's
dance—is a popular example of this sort of ability (Frisch
1953). In other animals, the system operates more broadly, and
different sorts of inputs can be measured in this way (heard sounds,
perceived jumps, and so on). It is as if the brain has an
‘accumulator’: a bar-graph system of some kind that maps
input arrays into some neutral format and appends the elements together
into a stack, and a scanner that judges relative stack size.
Gelman and Gallistel and others have explored such systems extensively
(starting with Gelman & Gallistel 1978).
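The accumulator idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Gelman and Gallistel's own formulation: each item, whatever its modality, adds one unit to a stack, and the read-out is multiplied by a noisy gain with standard deviation `weber`, so that the noise grows in proportion to stack size (an assumption usually called scalar variability). The names `accumulate`, `judge_more`, and `accuracy` and the value 0.15 are hypothetical. Under these assumptions the ratio dependence of the analog system falls out of the noise rather than being built in as a threshold.

```python
import random


def accumulate(items, rng, weber=0.15):
    """Map any sequence of items (dots, tones, jumps) onto a noisy magnitude."""
    height = 0.0
    for _ in items:          # the items' identity is ignored: a format-neutral tally
        height += 1.0
    # Assumed scalar variability: one noisy gain per read-out, so the spread
    # of the read-out is proportional to the height of the stack.
    gain = rng.gauss(1.0, weber)
    return height * gain


def judge_more(items_a, items_b, rng, weber=0.15):
    """The 'scanner': compare two noisy stacks and report whether A looks larger."""
    return accumulate(items_a, rng, weber) > accumulate(items_b, rng, weber)


def accuracy(n_large, n_small, trials=2000, weber=0.15, seed=0):
    """Proportion of trials on which the larger set is judged larger."""
    rng = random.Random(seed)
    hits = sum(
        judge_more(range(n_large), range(n_small), rng, weber)
        for _ in range(trials)
    )
    return hits / trials


for large, small in [(20, 10), (40, 20), (12, 10), (9, 8)]:
    print(f"{large} vs {small}: {accuracy(large, small):.2f}")
```

With a fixed seed, 20 vs 10 and 40 vs 20 should come out the same, since only the ratio enters the comparison, while 9 vs 8 is judged correctly noticeably less often.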

Equally unremarkable is the fact that crawling infants distinguish 1
cracker from 2 and 2 from 3. This discrimination is beyond the
abilities of the posited analog approximative systems. But it
suggests that there is another system in the child that is in a limited
way sensitive to number. This system—which seems more tied
to attention—is the subject of current research (see, for
instance, Pylyshyn 2007). Animals need to keep track of changing
elements in their immediate environment. One idea under investigation is that there
is a psychological subsystem that ‘tags’ elements in
a perceptual array and keeps track of them by assigning properties to
the tag. Without such a system, we would lack the ability to
re-identify changing elements from one moment to the next. Such a
system therefore seems to be a prerequisite for any perception of a
world or scene as containing things that are moving and changing. It
is difficult to see how an animal that does not track in this way could
learn to do so (although the ability might grow).

If current thinking about these systems is on the right track, we
have two innate systems, each of which deals with number in some
sense. The analog system takes a range of perceptual
presentations and assigns an ordering by relative magnitude. The
second system identifies and tracks (a limited number of) discrete
elements in an environment. A current research question that is
of particular interest to philosophers is this: what is the relation of
these innate systems to the adult concept of number?
Notice that the analog system does not get us to the concept of an
exact number. Only ranges are detected—the system can judge
that two sets are in the same range (below the ratio-threshold for
discrimination), but two arrays that are “the same” in this
way need not have the same number of elements. The
second system is not approximative. If the subject has
tracked 2 objects and 1 is added, as in Wynn's studies, the
difference is noted and the subject's expectations change
accordingly. So this object-tracking system is sensitive to
the number of units in play, and in this respect is closer to
the adult notion of number. But it has an extremely constricted
range, and is useless when it comes to problems that extend
beyond its range. The crawling infant in the study cited earlier
doesn't represent an added 4th cracker as one more
than the 3 previously tracked. The infant doesn't even
track the 4th element: the system seems to
(eccentrically) shut down completely when its range is exceeded.
So the concept of number—the successor function and all that it
brings in its wake—is not implemented in this system.
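The contrast with the analog system can be illustrated with a toy object-file tracker in Python. The capacity of 3 and the all-or-nothing collapse when a 4th item arrives are modeled on the findings described in this section; the class `ObjectFileTracker` and the helper `choose_box` are hypothetical names, and nothing here is offered as a claim about how the mechanism is actually implemented. The tracker counts exactly within its range, which is why it supports Wynn-style 1 + 1 and 2 - 1 expectations, but it has no successor function: it cannot represent "one more" beyond 3.

```python
class ObjectFileTracker:
    """Toy parallel-tracking system with a fixed number of slots (assumed 3)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.files = []          # one 'tag' per tracked object
        self.overloaded = False  # set once the range has been exceeded

    def add(self, label):
        if self.overloaded:
            return
        if len(self.files) >= self.capacity:
            # Modeled on the crawling data: exceeding the range does not leave
            # 3 tracked items plus one untracked; tracking fails altogether.
            self.files.clear()
            self.overloaded = True
            return
        self.files.append(label)

    def remove(self):
        if self.files and not self.overloaded:
            self.files.pop()

    def count(self):
        """Exact number of tracked objects, or None if tracking has collapsed."""
        return None if self.overloaded else len(self.files)


def choose_box(crackers_a, crackers_b, capacity=3):
    """Predict which box a crawling infant picks; None means no reliable choice."""
    box_a, box_b = ObjectFileTracker(capacity), ObjectFileTracker(capacity)
    for i in range(crackers_a):
        box_a.add(f"cracker-a{i}")
    for i in range(crackers_b):
        box_b.add(f"cracker-b{i}")
    n_a, n_b = box_a.count(), box_b.count()
    if n_a is None or n_b is None or n_a == n_b:
        return None
    return "A" if n_a > n_b else "B"


print(choose_box(2, 1))  # A
print(choose_box(3, 2))  # A
print(choose_box(4, 1))  # None: 4 exceeds the tracker's range
```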

For these reasons, some have argued that aside from these
well-evidenced systems, there must be a third element in the human
mind—viz., an innate concept of number, which must involve the
grasp of a fully-general successor function—that grounds adult
mathematical competence (see Leslie, Gallistel & Gelman 2007,
for example). Others, like Carey (2009), have argued that the
concept of exact number is not innate, but is constructed by
the kind of language-based bootstrapping sketched out by Quine (1960).
The debate here is especially interesting because although both sides
are Nativist—in that both accept innate ‘numerical’
systems—there are still learning elements in play in the search
for an adequate psychological account of our distinctive arithmetic
competence.

The simple question “Is number innate?” turns out to be
too simple. However the current debates play out, we can expect
that the achievement of adult number competence is quite complex and
involves significant innate and learned elements. We should be
prepared to find that things are no less complicated on other
Empiricist-Nativist battlegrounds.

2.4.1 The Theory of Mind experimental paradigm

In a seminal paper of 1978, Premack and Woodruff posed the question
of whether chimpanzees have a ‘theory of mind’; that is, do
they attribute mental states to others, and do they, like adult humans,
predict and explain action on the basis of hypotheses about these
states. It was a mark of Piaget's influence that no one had
as yet asked this question in regard to human infants; Piaget thought that they did not yet have a robust notion of an external world at all, let alone of a world containing minds. The chimp
studies led to an explosion of research into the development of a
theory of mind in human beings. In response to Premack and
Woodruff's paper, Dennett and others commented that the
successful prediction of another's action does not yet constitute
evidence for a theory of mind. Consider the following: a child
participant in a study is told a story about a boy named Max who has a
piece of candy. Max puts it into the red cabinet and goes out to
play in the yard. The child participant is asked, “When Max
returns and wants to get his candy, where will he look for
it?” The child might answer correctly because he
or she understands that Max will think that the candy is where
he left it or last saw it (i.e., the red cabinet). This involves
attributing mental states to Max. But the child might also answer
correctly because that's where the candy actually is. That
is, the child with no theory of mind might still answer correctly
simply by reasoning that people go to get things where they
are. The way to resolve this uncertainty, Dennett proposed,
was the false belief task. In this task, which quickly
became the litmus test of a theory of mind, the story includes a second
character who enters the scene while Max is still outside in the
yard. This second character finds the candy in the red cabinet
and puts it into the yellow cabinet. Once again, the child
participant is asked where Max will look for his candy when he returns
from the yard. Only if the child is successful now, responding
that Max will look in the red cabinet—even though the
child knows that the candy is really in the yellow cabinet—can we
legitimately attribute to the child a theory of mind. There is
now a very large literature involving the false belief task and the
bottom line appears to be that most young 3-year-olds incorrectly
predict that Max will look in the yellow cabinet (or, in some studies,
say that Max thinks it's in the yellow cabinet)
because that's where it is, while somewhere between the
ages of 3.5 and 4, children begin to succeed on the
task.[23]
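The inferential point of the false belief task can be made explicit with a toy sketch. Two hypothetical predictors stand in for the two kinds of child described above: one predicts that Max will search wherever the candy actually is, the other that he will search wherever he last saw it. The story encoding and the function names are illustrative assumptions, not part of any published model. The sketch shows only why the standard story cannot tell the two predictors apart, while the false-belief story can.

```python
def predict_from_reality(story):
    """A child with no theory of mind: people look for things where they are."""
    return story["actual_location"]


def predict_from_belief(story):
    """A child who attributes beliefs: Max looks where he last saw the candy."""
    return story["max_last_saw"]


# Standard story: nobody moves the candy while Max is in the yard.
standard = {"max_last_saw": "red cabinet", "actual_location": "red cabinet"}
# False-belief story: a second character moves the candy to the yellow cabinet.
false_belief = {"max_last_saw": "red cabinet", "actual_location": "yellow cabinet"}

for name, story in [("standard", standard), ("false belief", false_belief)]:
    by_reality = predict_from_reality(story)
    by_belief = predict_from_belief(story)
    # The task is diagnostic only if the two predictors disagree.
    print(f"{name}: reality -> {by_reality}, belief -> {by_belief}, "
          f"diagnostic: {by_reality != by_belief}")
```

In the standard story both predictors answer "red cabinet", so a correct answer is uninformative. Only in the false-belief story do their answers come apart.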

For two decades, success on the false belief task was considered the
only really hard evidence for a claim that one had a theory of mind.
Whatever social competence children showed before passing the
false belief test was widely considered a precursor to having
a theory of mind. More recently, however, cognitive
developmentalists have argued that success on the false belief task is
neither necessary nor sufficient for the attribution of a theory of
mind, and that focusing nearly exclusively on it has led to an overly
narrow view of the conceptual domain (Bloom & German
2000). The last several years have seen a plethora of studies
investigating the attribution of mental states, and social cognition
more broadly, in infants. The next section focuses on a group of
key concepts involved in understanding social cognition including
goals, agency, and rationality.

2.4.2 Goals

Woodward's 1998 study on goal understanding in 6-month-olds is
a good example of the pattern of recent work in this area.
Infants watched a hand move across a stage and repeatedly grasp one of
the two objects on opposite sides of the stage. The hand always
moved along the same path to the same side of the stage and then always
grasped the same object. After the infants habituated to this
display, Woodward switched the location of the two objects. Now
one of two events occurred: either the hand took a different
path to grasp the same object it had always grasped (because
that object was now on the other side of the stage) or it took the
same path as before, but now grasped the other
object. Looking time showed that infants were more surprised when
the hand followed the same path and grasped the other object than when
it followed a new path and grasped the originally grasped object.
This would be expected if the infants understood in some sense that the
previously grasped object was the hand's preferred
goal. To see if this was really the basis of the
babies' looking responses, control conditions were included to
rule out a variety of other possibilities.

In a control condition, the hand was replaced with a rod that had a
multi-fingered sponge at the end. When the rod/sponge followed
its old path and touched the new object, babies did not dishabituate;
they dishabituated only when the rod/sponge followed a new path to the
old object. The suggestion is that the babies did not see the
action of the rod/sponge (whose shape was similar to the shape of the
hand) as a goal-directed action. What is it about the
presence of the human arm that signals a goal? Would any movement
involving repeated contact between a human hand and one of the toys
trigger goal attribution? Woodward (1999) shows that this is not the
case. In this study, a human arm was used again, but this time
the arm merely dropped onto the display, and contact was between the
back of the hand and the toy. In this case, there was contact,
but not grasping. In this condition, adults would be less likely
to interpret the action as purposeful, and the same was true of the
babies. When the hand/arm followed its earlier path (touching the
new object), babies did not dishabituate; they did however dishabituate
when it followed the new path, even though it made contact with the
same object as before. This suggests that 5-month-old babies,
like adults, attribute goal-directedness (again: ‘in some
sense’) to human arms and hands that reach and grasp,
but not to arms that only drop and make passive contact with the
object.

What clues do babies use to determine if a perceived motion is
goal-directed? It's been suggested that babies might first
restrict their attributions of goals to humans only and then, with
experience, extend the range to include non-humans as well (Woodward
2005; Meltzoff 2005). A recent study, however, suggests that this
may not be true. In this study (Luo & Baillargeon 2005),
babies reliably attributed goals to a moving box that they had
previously been shown could move on its own. The key difference
between the rod/sponge in Woodward's study and the moving box in
this study appears to be information about autonomous
motion. The rod/sponge never showed such capacity; the
moving box did. Autonomous motion, the authors argue, signals an
object's status as an agent, and agents, for the baby,
have goals. These results have recently been extended to
3-month-olds (Luo 2011).

A very recent study has shown that infants are sensitive not only to
clues indicating an agent's capacity for autonomous motion, but
to the perceptual information available to the agent as well.
Remarkably, this is true even when this information differs from
their own. In Luo and Johnson (2009), 6-month-old babies saw another
person look at two different objects and repeatedly reach for the same
one. As indicated by their looking times, babies in this condition
attributed to the other person a preference for the chosen object. In
contrast, in a condition where the baby saw two objects but also saw
that the other person could see only one of them, no preference was
attributed. In this case, it seems, the baby
appreciates that the other person cannot see the second object and that
therefore the repeated grasping of the first object does not indicate a
preference. This suggests that babies at this age can already
attribute different perceptual information to different perceivers
(what I see vs. what she sees). Nativists expect to find
similar sorts of perceptual preparedness for other systems of knowledge
and action (for instance, a system of face recognition as preparedness
for social and family life).

The cognitive resources we bring to bear on the problem of
responding to and carrying out goal-directed behavior are complicated;
these studies provide evidence that some of these resources are in
place very early in life. They do not show that the
infant's goal-directedness abilities are innate; they might
somehow be learned on the basis of early experience. But again,
such findings shift the burden. The earlier that resources
involving notions like intention, goal,
preference, and so on appear, the greater the challenge to
Empiricist claims that the categories are learned solely on the basis
of prior experience.

2.4.3 Agency, cooperation, and competition

Another set of studies (Kuhlmeier et al. 2003; Hamlin et al. 2007)
provides evidence that infants are not only sensitive to displays of
agency, but also have a sense of (something like)
cooperative behavior: they readily distinguish between
helpers and hinderers. In the 2007 study,
babies were shown animated displays that adults interpret as a red
circle trying to climb a hill but having trouble making it all the way
up (a video of the Hamlin et al. 2007 display is available online).
In half the trials, the babies saw a yellow triangle gently
‘helping’ the circle up the hill; in the other half, they
saw a blue square gently pushing the circle down to the bottom.
Adults plainly see the yellow triangle as a helper, an agent
whose goal is to assist the circle in getting up the hill; they see the
blue square as a hinderer, an agent whose goal is to stop the
circle from getting up the hill. Babies make such a distinction
as well. Six-month-olds showed surprise in test trials, following
the helping and hindering scenarios, in which the red circle was
seen approaching its hinderer rather than its helper.
Furthermore, in a live action version of the task, the babies
themselves chose to touch the helper more than the hinderer when they
were given both to
choose.[24]

2.4.4 Rationality

Our understanding of goal-directed behavior is characterized by a
principle of rationality: all things being equal, agents
take the easiest, most direct, and most efficient means available to
achieve their goal. In a series of studies, Csibra, Gergely and
their colleagues provide evidence that infants use this principle
(Csibra et al. 1999 and 2003). In Csibra et al. 2003,
12-month-old babies were habituated to a ball rolling along a path,
apparently jumping while its path was hidden by a screen, and then
continuing to roll along its path once it had emerged from behind the
screen. In the test trials, the screen was removed and babies
were shown one of two displays: one with an obstacle on the path, one
with no obstacle. Longer looking times at the display with no
obstacle indicate that jumping for no apparent reason is unexpected for
the infant. In contrast, when there is an obstacle on the path,
jumping over it is a direct and efficient means to achieving
one's goal and is therefore not a violation of expectation.
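The principle of rationality at work in these studies can be sketched as a simple rule: expect the least costly action that still reaches the goal, and treat any other action as a violation of expectation. The action names and cost values below are hypothetical placeholders chosen for illustration. Nothing in the studies assigns numerical costs, and this is only one way of rendering the principle.

```python
# Hypothetical costs; only their ordering matters (jumping is more effortful).
ACTION_COST = {"roll straight": 1.0, "jump": 3.0}


def reaches_goal(action, obstacle_present):
    """Rolling straight only reaches the goal when nothing blocks the path."""
    return action == "jump" or not obstacle_present


def expected_action(obstacle_present):
    """Principle of rationality: the least costly action that achieves the goal."""
    feasible = [a for a in ACTION_COST if reaches_goal(a, obstacle_present)]
    return min(feasible, key=ACTION_COST.get)


def violates_expectation(observed_action, obstacle_present):
    return observed_action != expected_action(obstacle_present)


# Test trials after habituation to the jumping ball:
print("jump, obstacle present:", violates_expectation("jump", True))   # not surprising
print("jump, no obstacle:     ", violates_expectation("jump", False))  # surprising
```

The same observed action, a jump, counts as rational or anomalous depending only on whether an obstacle makes the cheaper action infeasible. That matches the contrast in the infants' looking times.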

Another study by Gergely and his colleagues (2002) followed up on a
finding of Meltzoff (1988) showing that 14-month-olds imitate the means
an agent employs to attain a goal, even if those means are not the most
direct or efficient. Meltzoff showed infants that tapping a panel
light with his head made it light up. When babies returned to the
lab the following week, they too used their heads to turn on the light,
rather than simply pressing it with their hands. Gergely
argued that this seeming violation of rationality was not in fact
irrational. He proposed that the baby might reason that
if the light could be turned on with one's hand, the
adult they were imitating would have used his hand. The fact that
the adult used his head to turn on the light suggests to the child that
this must be a necessary means to achieve the goal. To test this
hypothesis, the researchers added a condition in which the adult actor
could not use his hands because they were otherwise engaged: the actor
pretended to be very cold and used his hands to hold a blanket wrapped
around him. With hands thus busy, the adult actor used his head
to tap the panel light. They then compared the babies'
responses to the panel light in this hands-busy condition with the
responses in the original Meltzoff condition where the actor's
hands were simply resting on the table. In the original Meltzoff
condition, babies used their heads to turn on the light, but in the
actor's-hands-busy condition, the babies did not imitate the
actor but instead used their hands. This supports the view that
these babies are already acting on the basis of some principle
connecting efficiency and goal-directedness, and that this principle is
stronger than their tendency to imitate.

2.4.5 Belief and theory of mind

Let us return to the False Belief Task. It was noted earlier
that children younger than 3.5 years do not succeed in the classic
paradigm. But in a recent study, Onishi and Baillargeon (2005)
showed that 13- to 15-month-old infants could succeed on a
false belief task. In this study, babies were familiarized to a
display of an adult placing a toy (a plastic watermelon slice) into one
of two boxes and then reaching into the box as if to grasp it.
The point of these familiarization trials was to indicate to the baby
that reaching the toy was the adult's goal. The toy was
then moved from the box in which the adult had placed it to the other
box. Although the baby always saw the toy move, and thus
understood its new location, the adult did not always see the toy move;
half the time, the adult's view was blocked. The question is this: on the
trials where the adult did not see the toy move to the new
box—that is, when the adult had a false belief about the
toy's location—where do babies expect the adult to look for
the toy? Looking time measures indicated that babies were
surprised when the adult looked in the new box, even though babies knew
it was the correct location. In contrast, on the trials in which
adults saw the toy move to the new box, babies were surprised if adults
did not look in the new location. At present there is no
satisfactory account of why 3-year-olds fail the standard false belief
task, given that 15-month-old babies seem to be able to attribute false beliefs to
others. What else does the 3-year-old need, beyond what the
15-month-old already has, to succeed on the classic task? There
are many candidate answers, but the Onishi and Baillargeon results have
considerably changed the debate.

2.4.6 Animals and theory of mind

As noted above, questions about the development of a theory of mind
were first posed with respect to chimpanzees, and it is to chimpanzees
(and other nonhuman primates) that we now return. Until recently,
most researchers agreed that there was little evidence to support the
claim that nonhuman primates represented agency, goals, attention or
the like (Povinelli 2000; Tomasello & Call 1997).
However, chimpanzees, macaques, and other primates do follow eye
gaze. Researchers have probed whether they appreciate the
relationship between the direction of gaze and attention, or
between seeing something and acquiring
information. A number of recent studies have shown that
chimps prefer to steal food from a person (or, in some conditions, a
more dominant chimp) who cannot see them as opposed to a person (or
more dominant chimp) who can (see Flombaum & Santos 2005; Hare
et al. 2000; and Carey 2009 for review). If dedicated
mechanisms to identify agents and to support our reasoning about them
are part of our evolutionary heritage, as seems increasingly plausible,
it should not surprise us to find them in some of our distant
relatives—and in the very young.

Once again, the studies of newborn chicks are particularly
illuminating. Regolin and colleagues (2000) habituated newborn
chicks to a video display involving two balls, one red and one
blue. At first the balls are presented as static. The red
ball then moves, bumps into the blue ball, and then the blue ball
moves. After habituation, the chicks were presented with a fuzzy
oval-shaped red ball and a fuzzy oval-shaped blue ball. The
chicks imprinted on the red ball, not the blue one. It seems that
they are sensitive to agency—that they see the red ball
as an agent, while the blue ball may be a passive object. To make
sure it was the red ball's autonomous movement that was critical,
experimenters partly occluded the red ball as it began its movement so
that it wasn't clear whether the movement was autonomous or set
in motion by someone or something else. In this condition, the
imprinting preference for the red ball disappeared. These chicks
were newly hatched, so an explanation for these data that appeals to
learning from sensory experience is unavailable. Once again, the
chick studies provide an existence proof of an innately specified
detection mechanism closely related to agency. Note that the
question of what precisely the chick is detecting or representing is
still open: is it autonomous motion, agency, or some other
property?

The studies summarized in section 2 are representative of the
Nativist resurgence. Not surprisingly, cognitive scientists with
Empiricist sympathies continue to push back: to search for
countervailing evidence, to question the methodologies involved in
these studies, to develop alternative interpretations of the data, and
so on. Moreover, as we mentioned at the outset, it is not only
Nativism that has experienced a resurgence; there are important
research directions in the cognitive sciences that seem inherently more
friendly to the Empiricist position. In this section we briefly
describe and contextualize some of these developments.

One important trend has been the development of Connectionism as an
alternative to the ‘Classical’ conception of the mind
(Newell & Simon 1976; see Garson 2010 for an overview).
On the Classical view, the cognitive mind is best understood on
the model of a digital computer that (i) uses symbolic representations
that have a combinatorial syntax and semantics, and (ii) manipulates
these representations following structure-sensitive processing rules.
Connectionists replace the Classical view with a model of
psychological processes as involving networks of simple units with
weighted connections among the units that control the spread of
activation through the network, and ‘learning’ algorithms
for resetting the weights of the connections on the basis of earlier
behavior of the network in response to some task. There is
continuing debate about whether the Classical and Connectionist models
are really incompatible, and some have argued that Connectionist
systems are best viewed as implementations of Classical symbol-based
systems (see Pinker & Prince 1988 for discussion). But
the research on psychological processing within the Connectionist
framework is very different from what one finds in the Classical
tradition.

Connectionism is relevant to the Nativism-Empiricism debate in two related
ways. In the first place, Connectionism provides a natural format
for the Empiricist idea that perception provides the basic elements of
the mental system (ideas/network-nodes) and that experienced regularities
among ideas strengthen their connections (associations/weightings) and
in this way account for learning. But a more important point is
that if Connectionism could be established as a real
alternative to the Classical symbol manipulation approach (and
not simply as providing implementations of Classical systems), it could
help undercut a key argument of Chomsky-style Nativism. Here is a
simplified version of the target
argument.[25]
Chomskyans, as we noted (section 1.1.2),
argued that grammars—and by extension, the rules governing other
domains of knowledge—are ‘psychologically
real’. If they are, and the Classical view is correct, then
it would seem that such rules are present in the mind as symbolic
constructions. But these rules, as linguistic grammars make
plain, involve abstract concepts that are not perceptually available in
the data. So if the rules are symbolically represented, then
these abstract concepts, which are the constituent elements of the
rules, are also internally represented. But if the relevant
concepts are not perceptually available, how could they be learned by
Empiricist-style mechanisms that only track regularities in the stream
of experience? This sort of Nativist argument was developed in
Fodor 1981. Connectionism rejects the view of mental
representation on which this argument depends. For the
Connectionist, information is not in the mind as the semantics of
mental symbols, that is, as the meanings of terms in the language of
thought. For the Connectionist, information is distributed as a
pattern of weightings in a network in which none of the nodes
represents anything. So: if this sort of anti-Classical
Connectionist approach is successful, the methodologically-grounded
Poverty of the Stimulus argument for Nativism is blocked.

There is continuing controversy about whether Connectionism has
in-principle limitations that disqualify it as a general model of
cognitive processing (see Fodor & Pylyshyn 1988 and the
literature this critique spawned, which is reviewed in Garson
2010). But there is a practical problem that is less
controversial, and to understand it, we need to consider more closely
how Connectionist nets learn. Imagine that one wants the net to
learn the difference between (photos of) male and female faces. A
set of input nodes will code the photo, activation will pass through a
set of intermediate nodes, and an answer will appear on the output
nodes. If the output on a particular input is incorrect (a male
is misidentified as a female for example), the algorithm that governs
the dynamics of the network automatically adjusts the weights of the
connections between the various nodes ‘in the right
direction’ and more inputs are cycled through the system.
When the output is correct over some range of inputs, the net has been
‘trained up’; it has successfully learned to tell male from
female faces in photos. The art in Connectionist modeling is to
discover the best network structure and the right algorithm for
adjusting the weightings. The problem is that such networks learn
very slowly; they often need hundreds of thousands of cycles of inputs,
outputs, and weight adjustments. But humans and animals learn
many things very quickly, sometimes even from one instance and often
from a small set of instances (Garcia et al. 1955; Markman 1989).
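The error-driven training loop described above can be sketched with the simplest possible network: a single-layer perceptron with two input units and one output unit, rather than a net with intermediate nodes. The two-dimensional "features" are a hypothetical stand-in for coded face photos. The data generator, learning rate, and epoch limit are all assumptions of the sketch. It starts from random weights, as typical Connectionist models do, and shows only the shape of the procedure: present inputs, compare each output with the correct answer, and nudge the weights in the right direction until the outputs come out right.

```python
import random


def make_data(n, rng):
    """Hypothetical stand-in for coded photos: 2-D feature vectors labelled
    1 or 0 according to which side of the line x0 + x1 = 0 they fall on."""
    data = []
    while len(data) < n:
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(x0 + x1) > 0.1:  # keep a margin so the classes are separable
            data.append(((x0, x1), 1 if x0 + x1 > 0 else 0))
    return data


def predict(w, b, x):
    """Output unit fires (1) if the weighted input sum exceeds zero."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train(data, w, b, lr=0.1, max_epochs=10_000):
    """Perceptron learning rule: after each wrong answer, move the weights
    'in the right direction'. Stops after the first error-free pass."""
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, y in data:
            err = y - predict(w, b, x)  # +1, 0, or -1
            if err:
                errors += 1
                w[0] += lr * err * x[0]
                w[1] += lr * err * x[1]
                b += lr * err
        if errors == 0:
            return w, b, epoch
    return w, b, max_epochs


rng = random.Random(0)
data = make_data(200, rng)
w0 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # random initial weights: no prior information
w, b, epochs = train(data, w0, 0.0)
accuracy = sum(predict(w, b, x) == y for x, y in data) / len(data)
print(f"passes through the data: {epochs}, training accuracy: {accuracy:.2f}")
```

A toy problem like this one trains quickly. The slowness the text describes arises with realistic inputs and many hidden units, which this sketch deliberately omits.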

One way to approach this discrepancy is to see it as due to the fact
that in the typical Connectionist set up, the weights between nodes are
initially set to random values, and are (very) slowly reset on the
basis of small adjustments. But the fact that the initial
weightings provide no prior information is arguably an artifact of the
modeler's Empiricist commitment to have all the learning
‘come from experience’. There is nothing in the
general structure of Connectionist models that would prevent the
modeler from starting with a highly constrained set of
weightings—in this case one that already holistically contains
information about the general features of human faces, and perhaps
information about differences between male and female faces. The
upshot, then, is that although most actual Connectionist models are
Empiricist-friendly in their format and in their representational
commitments, they can also be implemented in a way that is congenial to
Nativist ideas. The prior information that the Nativist claims is
part of the initial state of the organism can be realized by setting
the initial patterns of weightings between the nodes in the network in
such a way that learning will happen much more quickly. So while
Connectionism may avoid the very general commitment to Nativism that
some have argued is built into the Classical conception, it is neutral
on the question of whether learning in a particular domain is wholly
based on experience or uses innate information (suitably distributed
across networks).
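The point about initial weightings can be illustrated by training the same kind of perceptron from two starting points. One start uses random weights. The other uses hypothetical "innate" weights that already encode the relevant boundary. The task, the numbers, and the choice of a single-layer perceptron are assumptions of this sketch, and a real prior-informed net would encode its information far less tidily. The sketch shows only that nothing in the learning rule prevents the modeler from building information into the initial state, and that doing so reduces the number of corrective updates.

```python
import random


def make_data(n, rng):
    """Hypothetical task: label 2-D points by the sign of x0 + x1 (with a margin)."""
    data = []
    while len(data) < n:
        x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(x0 + x1) > 0.1:
            data.append(((x0, x1), 1 if x0 + x1 > 0 else 0))
    return data


def count_updates(data, w, b, lr=0.1, max_epochs=10_000):
    """Train with the perceptron rule until an error-free pass; return how many
    weight adjustments were needed along the way."""
    w = list(w)
    updates = 0
    for _ in range(max_epochs):
        errors = 0
        for (x0, x1), y in data:
            err = y - (1 if w[0] * x0 + w[1] * x1 + b > 0 else 0)
            if err:
                errors += 1
                updates += 1
                w[0] += lr * err * x0
                w[1] += lr * err * x1
                b += lr * err
        if errors == 0:
            break
    return updates


rng = random.Random(1)
data = make_data(200, rng)
random_start = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # Empiricist-style: no prior information
prior_start = [1.0, 1.0]  # hypothetical 'innate' weights already aligned with the task

print("updates from random weights:        ", count_updates(data, random_start, 0.0))
print("updates from prior-informed weights:", count_updates(data, prior_start, 0.0))
```

Here the prior-informed start needs no adjustments at all, because its weights already separate the two classes. The random start usually needs some.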

This last point applies to Dynamic Systems Theory approaches to
cognition as well (Thelen & Smith 1994; Port & van
Gelder 1995). Dynamicists hold that human behavior should be
explained in terms of sets of differential equations that represent a
subject's trajectory in real time through a space of possible
total cognitive-behavioral states. Because they, like
anti-Classical Connectionists, reject the Classical paradigm's
commitment to symbol manipulation and computation, they also avoid the
Nativist consequences of that view. But neither Connectionists
nor Dynamicists are in principle anti-Nativist. However
we model an organism's cognitive processes (as executing a
Classical von Neumann-style program, as reassigning connection weights in
line with a Connectionist back-propagation algorithm, or as moving
through a Dynamicist state space as described by a set of differential
equations), the question remains: what are the built-in initial
biases of the system, and what role do they play in determining the
steady state? One can construct a Connectionist system that is
antecedently tuned to converge on a specific steady-state, and as such
will have a significant Nativist element (Hummel & Biederman
1992 presents such a system for shape recognition). The same
seems true of Dynamic Systems models. The oft-used Dynamicist
example of a pendulum is ‘innately specified’ to reach a
specific steady state (its point attractor) despite wide
variability in its inputs. If very young children do indeed
distinguish helpers from hinderers, for example, then this capacity
will need to figure in the Dynamicist model. It will be
appropriate to then ask about the role that the child's initial
structure or configuration played in its coming to have this
capacity.
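The pendulum example can be made concrete with a short simulation. The sketch integrates a damped pendulum (angle theta, angular velocity omega) from several quite different starting states. Every run settles at the same point attractor, theta = 0 and omega = 0, so the steady state is fixed by the system's own dynamics rather than by its inputs. The parameter values, the damping term, and the use of semi-implicit Euler integration are assumptions chosen for the illustration.

```python
import math


def simulate_pendulum(theta, omega, g_over_l=9.81, damping=0.5, dt=0.01, t_end=60.0):
    """Integrate theta'' = -(g/L) sin(theta) - damping * theta' with
    semi-implicit Euler steps and return the final (theta, omega)."""
    for _ in range(round(t_end / dt)):
        omega += dt * (-g_over_l * math.sin(theta) - damping * omega)
        theta += dt * omega
    return theta, omega


# Widely varying initial conditions (angle in radians, angular velocity in rad/s).
starts = [(-2.5, 0.0), (-1.0, 0.0), (0.3, 0.0), (1.5, 2.0), (2.5, 0.0)]
for theta0, omega0 in starts:
    theta, omega = simulate_pendulum(theta0, omega0)
    print(f"start ({theta0:+.1f}, {omega0:+.1f}) -> end ({theta:+.6f}, {omega:+.6f})")
```

In Dynamicist terms, the "built-in bias" lives in the equation of motion (gravity plus friction), not in any stored representation. The Nativist question is then which such biases the model of the child must contain.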

Even at the height of Chomsky's influence, it was clear that
the strength of the Nativist position rested, to a great extent, on the
weakness of the Empiricist alternative. The central argument from
the Poverty of the Stimulus was that Empiricism had failed to make
its case, and that the Nativist hypothesis was therefore more
plausible. But it was implicit in this dialectic that if
a more powerful Empiricist learning theory were developed, it could
change the terms of the debate. Furthermore, Empiricists argued
that there had to be a stronger general learning theory
because learning theory as developed up until that time did not have
the resources to account for much learning that was plainly
based on experience (Harman 1967; Putnam 1967). These Empiricist
hopes for a more powerful learning theory have been realized. Learning theory has advanced
significantly, especially in the last decade, and Empiricism can now
draw upon new resources; specifically, learning algorithms based on
Bayes' Theorem. The power of Bayesianism raises the
possibility that the earlier Poverty of the Stimulus arguments
underestimated what could be learned from experience by general
learning mechanisms.

‘Bayesianism’ is a general term for a range of
sophisticated statistical methods and tools that draw upon Bayes'
Theorem, which tells us how to revise our beliefs given new
information; that is, how to choose the best of a set of alternative
hypotheses given new data. The calculation requires (i) the prior
probability of the data, (ii) the probability of the data given the
hypothesis, and (iii) the prior probability of the
hypothesis.[26]
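A minimal sketch of the calculation follows. It assumes a finite set of mutually exclusive and exhaustive hypotheses, so that the prior probability of the data (i) can be obtained by summing (ii) times (iii) over the hypotheses. The coin example and its numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' Theorem over a finite, exhaustive set of rival hypotheses.

    priors[h]      -- P(h), the prior probability of hypothesis h
    likelihoods[h] -- P(D | h), the probability of the data given h
    Returns a dict mapping each h to P(h | D).
    """
    # P(D), the prior probability of the data, by summing over the hypotheses.
    p_data = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / p_data for h in priors}


# Hypothetical example: is a coin fair or two-headed, after three heads in a row?
priors = {"fair": 0.99, "two-headed": 0.01}
likelihoods = {"fair": 0.5 ** 3, "two-headed": 1.0}
print(posterior(priors, likelihoods))
```

The data are eight times more probable on the two-headed hypothesis. Even so, its low prior keeps the fair-coin hypothesis well ahead (roughly 0.93 to 0.07). This is the sense in which priors impose an antecedent ordering on the hypotheses.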

The relevance of Bayes' Theorem to Cognitive
Science. Bayesianism is in its origins a normative theory of
what one ought to believe under specific epistemic
circumstances, and as such it has been applied extensively in
understanding theory confirmation in the sciences. It first came
to the fore in the cognitive sciences as an ideal against which one
could measure human irrationality. Kahneman and Tversky
(1972) famously showed that ordinary reasoners typically fall short of
Bayesian standards when they are asked to decide the bearing of
evidence on hypotheses, in part because they misjudge the relevance of
the prior probability of the hypotheses. But in recent years,
Bayesian ideas have been successfully applied to the processing
underlying perception—especially the visual system (Knill
& Richards 1996; Rao et al. 2002). In visual perception,
a pattern of light hits the eye (the proximal stimulus), and the visual
system needs to determine the nature of the visual scene in the
environment (the distal stimulus) that caused that pattern. The
proximal stimulus is compatible with a number of different distal
stimuli. So the system faces something like the
under-determination problem that a scientist faces. Both must
select one view about what the world is like on the basis of
information that still leaves other possibilities open. It turns
out that Bayesian methods have been very successful at modeling how the
visual system resolves these uncertainties.

The visual system gets an image on the retina (D), and must
determine what the real-world scene is like (H). The
image is compatible with many different possible scenes, but the visual
system is very good at overcoming this uncertainty and reliably settles
on the most likely scene. In Bayesian terms, the visual system
must do this calculation:

$$P(\text{Scene} \mid \text{Image}) = \frac{P(\text{Image} \mid \text{Scene})\, P(\text{Scene})}{P(\text{Image})}$$

Consider this (again simplified) example, drawn from Scholl
2005. In Figure 1, the circles are ambiguous; they can be either
convex bumps or concave depressions. Viewers normally see (a) as
convex and (b) as concave (but if the display is turned upside down, the
percepts are reversed).

Figure 1

The fact that we see these as we do can be explained in Bayesian
terms. To figure out the most likely scene/source of (a), the visual
system must assign a probability to the hypotheses H1 (that the circle
in (a) is convex) and to H2 (that it is concave). One key assumption
the visual system makes is that the scene in both (a) and (b) is
illuminated by a single light source coming from overhead. So if
the bottom of the circle is in shadow, we tend to see it as convex; if
the top, we tend to see it as concave. When we look at
(a), this assumption about the light source translates into the
prior probability of H1 being higher than the prior probability of
H2. So the priors in this case give us an antecedent ordering of
the hypothesis space (here we ignore other hypotheses that could
account for the image), and the visual system settles on (a) as
convex.
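The shading example can be sketched as a small Bayesian computation. Here the light-from-above assumption is encoded as a prior over light direction, and the prior over shape is left flat. Combining them with a simple model of which shading each shape/light pair produces yields the effective preference for H1 over H2 described above. The 0.9 figure and the all-or-nothing rendering model are illustrative assumptions, not values from Scholl 2005.

```python
from itertools import product

P_LIGHT = {"above": 0.9, "below": 0.1}     # assumed overhead-light bias (illustrative)
P_SHAPE = {"convex": 0.5, "concave": 0.5}  # no bias about shape itself


def shadowed_side(shape, light):
    """A bump lit from above is dark at the bottom; a dent lit from above is dark at the top."""
    if shape == "convex":
        return "bottom" if light == "above" else "top"
    return "top" if light == "above" else "bottom"


def posterior_shape(observed_shadow):
    """P(shape | image), summing over the unobserved light direction."""
    joint = {}
    for shape, light in product(P_SHAPE, P_LIGHT):
        likelihood = 1.0 if shadowed_side(shape, light) == observed_shadow else 0.0
        joint[shape] = joint.get(shape, 0.0) + P_SHAPE[shape] * P_LIGHT[light] * likelihood
    total = sum(joint.values())
    return {shape: p / total for shape, p in joint.items()}


print(posterior_shape("bottom"))  # circle (a): convex favoured
print(posterior_shape("top"))     # circle (b), or (a) upside down: concave favoured
```

Turning the display upside down moves the shadow from bottom to top. With the light prior unchanged, the verdict flips, which matches the reversal viewers report.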

Bayesian approaches are appealing because they provide a natural way
to solve the problem that troubles theories, like Connectionism, that
are built on associationist lines. Associationist learning is
bottom-up. It depends on keeping track of correlations in the
stream of experience and slowly modulating expectations on the basis of
these correlations. But as we noted earlier, humans and animals
learn about the world very quickly, and on the basis of a very small
number of exposures and interventions. A child hears the word
‘horse’ applied to a few instances (and probably
hears stray utterances of the word too) and reliably learns the
extension of the term (Markman 1989). A rat made sick by a food
one time will not eat food with that smell again (Garcia et
al. 1955). These ‘fast-mappings’ are a problem
for Associationist models. But they are more easily accommodated
in Bayesian models, which essentially quantify the role of background
knowledge—the top-down contribution—in the fixation of
belief. If the rat already knows, as part of its background
knowledge—its ‘factory settings’, so to
speak—that when it comes to foods, smell is an indicator of
edibility, then single-case learning is less mysterious. The
prior probability of hypotheses linking edibility to smell may be
antecedently set as very high, and hypotheses linking edibility to
orientation may be set as very low. So one association between
sick-making food f and smell s will be enough for the rat to
‘adopt the hypothesis’ that f and s are regularly
linked. In contrast, if sick-making food f is always in a
particular orientation o, the rat may have a hard time making the
connection, even though it may be sensitive to orientations in other
contexts. Similarly, if the child comes to the word-learning task
with the assumption that new words most likely pick out unfamiliar
extensions—again, with this assumption implemented in the
priors—then her job is made easier. Bayes' Theorem
gives us a way to factor in this top-down background
knowledge.[27]
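The role of "factory settings" in one-trial learning can be sketched with repeated Bayesian updating on a single yes/no hypothesis per cue: "this cue is linked to sickness". The prior values for smell and orientation, the two sickness probabilities, and the 0.9 adoption threshold are all hypothetical numbers chosen for illustration. The sketch shows only that, with the same update rule and the same evidence, a high prior makes one pairing sufficient while a low prior demands several.

```python
def update(prior, p_sick_if_linked=0.9, p_sick_otherwise=0.05):
    """One pairing of cue and sickness: Bayes' Theorem for the hypothesis
    'this cue is linked to sickness' against its negation."""
    linked = prior * p_sick_if_linked
    not_linked = (1 - prior) * p_sick_otherwise
    return linked / (linked + not_linked)


def pairings_needed(prior, threshold=0.9):
    """How many cue-sickness pairings before the posterior passes the threshold."""
    n, belief = 0, prior
    while belief <= threshold:
        belief = update(belief)
        n += 1
    return n


# Hypothetical 'factory settings': smell is antecedently a likely cue, orientation is not.
print("smell:      ", pairings_needed(0.5))    # a single pairing suffices
print("orientation:", pairings_needed(0.001))  # several pairings are needed
```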

The key issue in considering the bearing of Bayesianism on the
Nativist-Empiricist controversy is the
priors.[28]
Where do they come from? If we are talking about simple, repeatable events like coin flips, the priors are a
matter of well-defined relative frequencies given by probability
theory. But the prior in the concave-convex case (which was
chosen to highlight this point) seems to involve domain-specific facts
about light and shadow, and their relation to the shape of objects.
Scholl 2005 argues that the priors here are innate, and many
scientists studying visual perception would agree. We don't
learn from experience that the objects in our perceptual world
will typically have overhead illumination. Rather, this is one of
the ‘factory settings’ of the visual system. As
Kersten (2004) puts it (speaking more generally): ‘the priors are
in the genes’. Ullman (1979) argues that the same may well
hold for the general constraints relating the rigidity of objects to
facts about motion. The view that the illumination constraint is
innate is also supported by the fact that chickens reared in an
abnormal environment illuminated from below still react as we do to
stimuli (a) and (b) (Hershberger 1970). So we have evidence that this
prior can be innate.

Let us assume that there are significant innate priors that operate
in perceptual processing. Does this score points for the Nativist
position in general? In one way it does, because it is in line
with the basic Nativist theme that humans are tailored for their
natural environment. But in another sense, the Empiricist might
downplay the importance of this kind of Perceptual Nativism for the
larger debate. Empiricists have always taken it for granted that
we perceive as we do, in large part, because of our
biological-psychological nature. The traditional Empiricist focus
has usually been on that part of our understanding that goes beyond
what we actually perceive. Its main claim is that anything that
goes beyond what we perceive is constructed out of what
we've perceived by domain-general
principles. So even if (some of) the priors involved in Bayesian
models of perceptual processing are innate, the more critical arena for
the Nativist is domain-specific cognitive processing,
to which we now turn. Nativists would expect that the best
Bayesian models of cognitive processing would have to incorporate
innate priors that reflect domain-specific knowledge.
Empiricists would expect that domain-specific priors are themselves
learnable by Bayesian methods from experience plus domain-general
constraints on learning.

We do not yet know enough to settle these questions, but they are
now beginning to be addressed. Most recently, a number of
theorists have used Bayesian techniques to model not just low-level
perceptual processing but also aspects of higher-order cognitive
processes. Areas of current research include concept learning
(Tenenbaum 1999), word learning (Xu 2007), and causal reasoning
(Griffiths & Tenenbaum 2005; Griffiths et al. 2011); the
list is growing.[29]
Contemporary research on the application of Bayesian techniques
to higher-level cognition has generally ignored the battle lines of the
Nativist-Empiricist debate. The real interest is in the
possibility of developing statistical techniques that, as Tenenbaum et
al. (2006) put it, “integrate bottom-up and top-down
influences.” We already have sophisticated statistical
analyses of the bottom-up part: the perceptual phenomena. The
challenge is to develop quantitative representations and analyses of
the levels of top-down background knowledge that operate in particular
domains. In section 2, for instance, we considered the child's theory
of mind as part of her background information.
It was on the basis of this theory that the child could develop a
structural analysis of a situation in terms of agents, beliefs, goals,
help/hindrance, and so on. But the information contained in such
a theory, and the structural analyses of particular situations that this
theory makes available, cannot yet be integrated into Bayesian
statistical analyses. The challenge for Bayesians is to develop
ways to recast the top-down elements, and the analyses they make
available, in quantitative terms. Only then will we be able to address
whether and to what extent top-down information is learned or
innate.

We can use the case of language understanding, a well-studied area
and arguably a Nativist stronghold, to illustrate how these Bayesian
goals might be achieved. The phenomenon is familiar: you hear a
sentence S1 as having a specific meaning. The theoretical
approach mirrors the vision case: S1 (as auditorily processed) is
compatible with a number of structural representations, but your parser
somehow chooses the best one sr(S1)′. The Bayesian says
that the parser is able to do this because it can do a statistical analysis
that integrates bottom-up and top-down information. In this case,
the bottom-up element is S1. But the range of possible
structured representations the parser can select from is top-down
information, as is the algorithm that chooses sr(S1)′
over other candidates. The problem: how to assign a prior
probability to a complex structured representation like sr(S1)′
(say a tree)—a probability that depends on the probability
assigned to the sub-elements. We know how to assign the prior
probability of a series of heads for a fair coin. But the question comes up again: how do we
assign a prior probability to a linguistic representation, or to a
complex visual scene, or to a complicated representation of the goals,
roles, and perceptual beliefs of a player in one of the Theory of Mind
scenarios? The events are more complex, the
representations of the events are therefore more complex, and the
hypothesis space is more complex (Chater et al. 2006).
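One standard way to give a tree a prior probability that depends on the probabilities assigned to its sub-elements is a probabilistic context-free grammar: each rewrite rule carries a probability, and the prior of a tree is the product of the probabilities of the rules used to build it. The sketch below assumes a toy grammar with made-up rule probabilities and treats the word-level categories (Det, N, V, P) as leaves, ignoring lexical choice. It scores the two readings of the attachment ambiguity in 'the girl saw the man with the telescope'. For contrast, the fair coin needs no such machinery: the prior of n heads in a row is simply 0.5 to the power n.

```python
# Toy probabilistic context-free grammar (all probabilities are hypothetical).
# The rules for each left-hand category sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("NP", "PP")): 0.3,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("VP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
}

def label(node):
    """A node is either a category string (leaf) or a tuple (category, *children)."""
    return node if isinstance(node, str) else node[0]

def tree_prior(node):
    """Prior of a tree = product of the probabilities of the rules it uses."""
    if isinstance(node, str):
        return 1.0  # word-level category: lexical choice is ignored in this sketch
    category, *children = node
    p = RULES.get((category, tuple(label(c) for c in children)), 0.0)
    for child in children:
        p *= tree_prior(child)
    return p

# Reading 1: the PP modifies the verb phrase (she used the telescope).
vp_attach = ("S", ("NP", "Det", "N"),
             ("VP", ("VP", "V", ("NP", "Det", "N")),
                    ("PP", "P", ("NP", "Det", "N"))))
# Reading 2: the PP modifies the object noun phrase (the man has the telescope).
np_attach = ("S", ("NP", "Det", "N"),
             ("VP", "V", ("NP", ("NP", "Det", "N"),
                                ("PP", "P", ("NP", "Det", "N")))))

print(tree_prior(vp_attach), tree_prior(np_attach))
```

Under these assumed numbers the verb-phrase reading gets a prior of about 0.082 and the noun-phrase reading about 0.062. Note that the sketch presupposes the hard part: the inventory of categories and rules in RULES, and their probabilities, are simply given.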

In the language case, the Bayesian can hope to draw on a good deal
of what contemporary linguists have already achieved in understanding
the structures underlying sentence comprehension, and some
computational linguists are beginning to merge such analyses with
probability theory (for instance, Chater & Manning 2006).
But even here, the problem of finding the best structure to assign to
an input is daunting. As Chater et al. (2006) put it:

“More challenging is inferring representational structures over
which parameters are optimized. One problem is that the space of
possible structures is often large and discontinuous; a second is that
a direct application of probabilistic methods would involve assessing
each structure by integrating a prior over its parameters, which seems
computationally prohibitive; a third is that structures appear to be
constrained in potentially highly abstract ways.”

In the case of theory of mind, on the other hand, we don't yet have
developed theories about the relevant structures (but see the related
work on causality collected in Gopnik & Schulz 2007). So only if
Bayesians can get a handle on these representational and
statistical problems will they be able to attack our
question: how is the space of such structures generated in the first
place? Is there innate domain-specific information at work, or is there
a Bayesian hierarchy, a two-level-up Bayesian account that explains
how this one-level-up information is acquired (that is, a Bayesian
learning-theoretic account that explains why the child represents
linguistic input, for example, using tree structures, but integers in
terms of a very different linear structure; for further discussion see
Tenenbaum et al. 2011)?

So, for example, children might know that animals are arranged in a
taxonomy of a specific sort, and this prior background knowledge helps
them learn about animals. But how do they get this prior? It might be that they have a prior higher-order principle P that
provides a probabilistic ordering on different graph structures, and
that the taxonomy they use has a higher prior probability than other
ways to structure the animal world (say, a ring structure). But how do
they get P? Do they learn it or is it simply there innately? To
tackle these questions, all sorts of objects—structured
representations one finds in a grammar, graph structures one might find
in a taxonomic representation of causal or kin relations, schemas
applied to scene or event analysis, etc.—will need to be
formalized and assigned probabilities. So there is much to be
done. There is no a priori answer about how far up the Bayesian
can go, and we do well to keep an open mind about the nature of the
unlearned priors. But we should also not overlook findings like
the chick's stubborn presumption of illumination from above,
which suggest that nature can build in unlearned priors, and that
they can be domain specific. It would be, at the very least,
extremely surprising if nothing like this operates in human
psychology.
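The regress from the taxonomy to the principle P can be sketched as a two-level hierarchical model. The version below is a deliberately simplified stand-in, not the structural-form models of Tenenbaum et al. 2011. It assumes only two candidate forms (tree and ring), represents P as a Beta-Bernoulli estimate of how often domains turn out to be tree-structured, and uses hypothetical data and likelihoods throughout. Learning P from previously encountered domains fixes the prior for a new domain such as animals, but the starting pseudo-counts of the hyperprior are not themselves learned; in this sketch they stand in for whatever is unlearned at the top of the hierarchy.

```python
from fractions import Fraction

# Level 2: the higher-order principle P, represented (by assumption) as
# Beta pseudo-counts for the forms 'tree' and 'ring'.
def update_form_counts(alpha, beta, forms_seen):
    """Beta-Bernoulli update from the forms found in earlier domains."""
    trees = sum(1 for form in forms_seen if form == "tree")
    rings = len(forms_seen) - trees
    return alpha + trees, beta + rings

def prior_tree(alpha, beta):
    """Posterior predictive probability that a new domain is tree-structured."""
    return Fraction(alpha, alpha + beta)

# Level 1: posterior that the new domain (animals) is a tree,
# given how well each form fits the observed animal data.
def posterior_tree(p_tree, lik_tree, lik_ring):
    numerator = p_tree * lik_tree
    return numerator / (numerator + (1 - p_tree) * lik_ring)

# Unlearned starting point: a flat hyperprior Beta(1, 1).
# Hypothetical history: forms that fit five previously learned domains.
alpha, beta = update_form_counts(1, 1, ["tree", "tree", "ring", "tree", "tree"])
p = prior_tree(alpha, beta)  # learned prior for the animal domain
post = posterior_tree(p, Fraction(2, 100), Fraction(1, 100))
print(p, post)
```

With these numbers P is learned to favor trees (5/7), and animal data that fit a tree twice as well as a ring yield a posterior of 5/6. Changing the initial pseudo-counts changes both results, which is one way of seeing why the question of what sits at the top of the hierarchy matters.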

Bayesianism, then, focuses the Nativist-Empiricist question on the
priors. First, we need to find out where the background
knowledge brought to bear in any particular task comes from. Is
some part of it innate, or can its presence be accounted for in terms
of higher-order Bayesian learning? At some point, the Bayesian
will come up against what is not learned by Bayesian methods (at the
very least, the Bayesian machinery
itself[30]),
and we will want to understand its specific
character. Will it be information implemented in our perceptual
systems or domain-general information that applies no matter what is
being learned, supporting the Empiricist view, or will some of it be
tailored to specific ranges and domains of knowledge, vindicating the
Nativist? We are still at the beginning of the road to the
answers to these questions.

In summary, then, Bayesianism appeals to Empiricists for at least
two important reasons. First, it reinstates
learning from experience as a central process in
cognitive development and change. This focus on learning
contrasts sharply with the first wave of Nativist cognitive research,
which, inspired by Chomsky's work in linguistics, tended to
assign a diminished role to learning from experience. Experience
was thought to act as a trigger/releaser of innate information, or, as
in some linguistic theorizing, as setting the values of parameters that were left
unspecified by our innate endowment. The lead role, again
following Chomsky, was assigned to growth, understood simply as
biological maturation. The second reason is that the current
Bayesian mindset tends in some ways towards Empiricism.
This is primarily because Bayesian learning can, at least in principle, be extended
hierarchically, in the ways we've discussed. But
Bayesianism also has some appeal to Nativists, because it focuses
attention on the role of background knowledge in learning, and this is
a theme that Nativists have pressed against bottom-up Associationist
forms of Empiricism from the outset. Nativists can welcome a
renewed focus on learning, and join in the development of Bayesian
theories of cognitive development. So in the end,
Bayesianism—as an approach to cognitive development—is,
like Connectionism, compatible with Nativism.

The studies that we surveyed in section 2
provide compelling evidence that we have been underestimating how much
infants and young children understand about the world. At the same
time, it is clear that adult competence goes far beyond the child's in
virtually every domain. The Bayesian framework we discussed
in section 3 has the potential to address both
issues at once. It provides a systematic and quantifiable approach to
development, and is at the same time open to incorporating innate
elements. Whether it will succeed in unifying a learning-theoretic
approach to cognitive development with the built-in representations
favored by Nativists remains to be seen.