We see ... that language ... consists of a peculiar symbolic relation --
physiologically an arbitrary one -- between all possible elements of
consciousness on the one hand and certain selected elements localized in the
auditory, motor, and other cerebral and nervous tracts on the other ....
Hence, we have no recourse but to accept language as a fully formed functional
system within man's psychic or "spiritual"
constitution.

--Edward Sapir (1921)

The call for papers for this symposium invited
participants to focus on new research findings "that throw a special light on
the links between language, culture and thought" and mentioned some
interdisciplinary fields from which applicable new findings might be emerging.
It is possible that new findings and contributions from external disciplines
may not only shed light on questions raised in the past but also suggest
reformulations of those questions. New knowledge, along with what we have
known all along but have been failing to use imaginatively, may permit us to
raise different questions about Whorf's assertions than those which have so
often defined the issues in the past.

Among the several interrelated theories and hypotheses
that Whorf proposed, the one that especially appeals to me and invites further
inquiry is summed up in this oft-quoted statement:

We dissect nature
along lines laid down by our native languages. The categories and types that
we isolate from the world of phenomena we do not find there because they stare
every observer in the face; on the contrary, the world is presented in a
kaleidoscopic flux of impressions which has to be organized by our minds --
and this means largely by the linguistic systems in our minds. We cut nature
up, organize it into concepts, and ascribe significances as we do,
largely because we are parties to an agreement to organize it in this way --
an agreement that holds throughout our speech community and is codified by the
patterns of our language. The agreement is, of course, an implicit and
unstated one, but its terms are absolutely obligatory; we cannot talk at
all except by subscribing to the organization and classification of data which
the agreement decrees.

I must confess that to me
this statement is so self-evidently true — in all but one respect — that I
find it hard to understand how anyone could disagree. Yet some people do
disagree. For example, Steven Pinker finds it almost outrageously
mistaken and even calls it "this radical position" (1994).

Now, I just said "in all but one respect". What is that
one respect? It is Whorf's provision that the "agreement ... holds throughout
our speech community". But I think we can show that people have different
thought systems even within the same language-culture system.
This position has been convincingly demonstrated, for example, by Deborah
Tannen in her You Just Don't Understand (1990): Why do husbands and
wives so often fail to understand each other? Because they are operating with
different systems of concepts and with different interconnections of concepts
with lexemes. They are thus able to use the same expressions, in the same
language, to arrive at quite different thoughts.

Even more: We can find divergence of conceptualization
within a single person’s cognitive system — a conventional metaphor for a
concept which conflicts, for example, with a visual or motor image connected
to the same concept; or even different and conflicting metaphors
attached to the same concept.

This paper aims to explicate this statement of Whorf,
to show not only that it is valid but why it has to be valid, since what
Whorf proposes in this passage can be shown to follow as an inevitable
consequence of the structure and operation of the human neurocognitive system.
Before introducing the neurocognitive perspective, I shall attempt an informal
sketch of some properties of our mental systems and then consider just what
question(s) we are asking in connection with the interrelationships of language and thought.

Five Basic Properties of Mental Models

We can start with the observation that our
thinking works along with our senses and the reports that we get through our
linguistic systems to provide us with ‘pictures’ of the world. We humans are
all model builders, building models of the world and of ourselves within
that world. This modeling process, largely unconscious, begins in infancy,
perhaps even before birth, and continues into adulthood, to some extent even
to old age, subject to the limitations of senility. We think we have
abundant knowledge of the world, and perhaps we do, but to the extent that it
is not accurate what we have is illusions rather than knowledge.

It seems that the mental system automatically engages
in certain basic strategies that are indispensable to its operation yet which
necessarily involve simplification, hence imperfect representation. These
basic modeling strategies result in the (usually unconscious) formation
of four kinds of assumptions about the world. That is, the mental system, by
its nature, assumes

(1) the existence of boundaries,

(2) the existence of enduring objects,

(3) a basic difference between objects and processes,
and

(4) the existence of categories of objects and of
processes and relationships.

Without such assumptions it can't operate at all. They are
consequences of built-in properties of our perceptual and conceptual systems.
They are thus involved in all our efforts to understand anything.

To these we can add a fifth property, the tendency to
build semantic mirages. This tendency makes use of the twin processes
of reification and what might be called the one-lexeme-one-thing
fallacy, a process of conflating different concepts that are connected
to the same lexeme, simply because of that shared lexical connection. This is
the source of some of the intra-person variation in thought pattern mentioned
above. Some pertinent semantic mirages of English are ‘thought’, ‘language’,
and ‘consciousness’. For example, by reification of the term ‘language’ we are
led to believe that there is such a thing as language, and by the
one-lexeme-one-thing fallacy we are led to suppose that this term stands for
just one thing, even though when we look closely we can see that it is used
for a number of quite distinct collections of phenomena selected from the
kaleidoscopic flux, including especially these three: (1) language as a set of
sentences (e.g. Chomsky) or utterances (Bloomfield); (2) language as the system
that lies behind such productions; (3) language as linguistic processes, as in
the title of Winograd's book Language as a Cognitive Process (1983).
Let us call these language1, language2,
and language3. Our cognitive systems are evidently tempted
to conflate these three since the same term is being used interchangeably for
all. Still other phenomena are labeled by this lexeme from time to time,
providing the opportunity for further conflation. For example, we find Steven
Pinker using the term for certain cognitive phenomena associated with
language2, namely the propensity and ability of children to learn
languages (1994). Why this set of properties should be called ‘language’ is
something you would have to ask Pinker about; perhaps he believes that this
propensity and ability is explained as the operation of an innate cognitive
foundation on which language2 can be built, and since there is
no readily available term for this notion he adapts the term ‘language’ by a
kind of metonymy based on language2. Whatever the explanation,
having indulged in this semantic exercise he goes on to conflate this new
sense of the term -- we can call it 'language4' --
with language2. By thus stretching the term ‘language’ along with
the term ‘instinct’, which he uses to draw attention to the fact that
language4 is evidently innate, he gets the title of his book
(The Language Instinct) and finds himself justified in making, and
evidently believing, such statements as,

...some cognitive
scientists have described language as a psychological faculty, a mental organ,
a neural system, and a computational module. But I prefer the admittedly
quaint term "instinct." It conveys the idea that people know how to
talk in more or less the sense that spiders know how to spin webs. (p. 18)

Notice that the first part of
this quotation and the passage "people know how to talk" work for
language2, but the term 'instinct' can only be justified, and that
by a stretch, for language4. It can't apply at all to
language2, since a French child raised in a Mandarin speaking
environment will speak Mandarin but not French. I don't think any realistic
appraisal of the phenomena can find any reason for considering language2
and language4 to be one and the same. This is not even to
mention that a spider raised in isolation will nevertheless spin webs. It is
not just by coincidence that this is the same Steven Pinker who, a little
later in the same book, strenuously objects to Whorf's idea that language
can influence thought. Those who doubt that language can influence thinking
are unlikely to be vigilant for the effects of language on their own thinking.

Another semantic mirage related to the
one-lexeme-one-thing fallacy is the unity fallacy, the illusion that a
concept represents a unified object, which must be either present or absent as
a whole in a given situation, rather than a (sometimes haphazard) collection
of phenomena of the kaleidoscopic flux. In the case of language, we see this
fallacy in questions like "How many languages do you speak?" It leaves no
provision for the case in which a person knows a little bit (say a few dozen
lexemes) of, let us say, Swedish. The same fallacy leads to questions about
the evolution of language. If earlier people either had language as a whole
unit or didn't, there are serious problems in understanding the evolution of
language.
How do you get from no language to language fully formed in one generation?
What did the first language mutant talk about, given that nobody else could
understand him? These are questions that can arise only from a semantic mirage.

In general, whether it is within space or time or in
more abstract conceptual dimensions, our mental systems impose boundaries on a
world which does not itself have boundaries. Why? If they did not do so, it
would not be possible to talk to one another about the world or to think about
the things of the world. Although everything is connected in various ways to
other things, hence ultimately to everything else, we can't talk or think
about the whole world at once. Thus we have to cut up the kaleidoscopic flux,
to segment it by imposing boundaries; and since those boundaries are imposed
by our minds and are not really there, they can be regarded as illusory.

A pertinent example is words themselves. In ordinary
speech they do not occur in isolation; rather, we get phonological phrases
with no gaps corresponding to word boundaries. Yet our perceptual systems,
seemingly without effort, extract words as units -- if we are hearing speech
in a language we know, but not otherwise -- and treat them as separate units
in the process of comprehension, just as if there were boundaries
there. The boundaries are supplied by our mental systems.

Categorization goes hand-in-hand with segmentation. The
world, infinitely complex with no natural boundaries and no two things
completely alike, is modeled by our minds by means of these two tools: (1)
segmentation, achieved by mentally imposing boundaries; and (2) the
classification of the segments into categories on the basis of shared
properties. But those shared properties do not include all the properties of
the items categorized, only some of them. It would be impossible to use all of
them since everything in the world is indefinitely complex, and so recognizing
all or even too many of them would render categorization impossible.

It follows that all imposition of structure in our
mental models is made at the cost of ignoring some properties of the phenomena
modeled.

Of course, these two fundamental sources of (often
useful) illusion do not just operate in that order, first segmentation, then
categorization. For the segmentation is done partly on the basis of properties
of the segments which result; it is thus influenced by considerations of
categorization. We do not ask which comes first; it is like the chicken and
the egg. Similar considerations apply to the case of Pinker’s conflation of
language2 and language4: of the two cognitive operations
we can distinguish — the metonymic creation of a new sense for the term
language and the application of the one-lexeme-one-thing fallacy — we don't
want to suppose that they took place separately and in sequence; things like
this seem to happen all at once.

Now, is there any reason to expect that all people,
regardless of their different cultures and languages, share the same system of
illusions? Would that not be a preposterous supposition? If we reject that
unlikely possibility, we are accepting the proposal of Benjamin Lee Whorf. To
understand how it is that, as Whorf pointed out, different peoples of
differing linguistic and cultural backgrounds have different mental models of
the world, we have only to appreciate the fact that any mental model is
necessarily a simplified model — hence a distorted model
— of what it is attempting to represent, and the rest follows. It is then easy
to appreciate that the systems of different cultures are different, simply
because they are imperfect in different ways. To verify this conclusion, we
can find abundant evidence, and of course many examples were provided by Whorf
and many more by others, including Chafe (this volume) and other participants
in this symposium.

It is not just that our minds are mistaken about the
world when they impose these structures, for they couldn't operate at all
without doing so. They enhance our ability to cope with the world by building
on our experience, including the indirect experience provided by linguistic
inputs — that is, by hearsay. But the only way they can do so is to simplify —
and that means to oversimplify — since without segmentation and categorization
— processes of oversimplification — they couldn't organize our worlds at all.
Thus our representations of reality are
inevitably filled with illusion. Although we can get convincing evidence of
this fact by observing how different cultures, and even different people in our
own culture, structure their projected worlds, we don't have to depend just on
such evidence to reach the conclusion that our projected worlds are full of
illusions, since we can deduce that fact just from consideration of the
structural properties of the system we use for our knowing.

What Are We Asking?

In this exploration I am placing more emphasis
on categories expressed by lexemes like nouns and verbs than upon grammatical
categories, even though most of the literature on the ‘Whorf hypothesis’
dwells on grammatical categories, as if he wrote about only those. But some of
his best examples, such as his well-known passage about how to express the
notion of cleaning a gun-barrel in English and Shawnee (Figure 1), concern
lexical more than grammatical phenomena. And I think that the passage quoted
above about the kaleidoscopic flux makes more sense and is more powerful if
interpreted in the context of concepts associated with nominal and verbal
lexemes than grammatical categories.

FIGURE 1 ABOUT HERE

The distinction between lexical and grammatical is one
of several dimensions of contrast to be found among the various proposals in
the "complex of interweaving theoretical strands" of what Penny Lee calls the
Whorf Theory Complex (1996:xiv). Besides this contrast, we have several other
choices available when considering what questions to investigate. It might be
a good idea to be clear about just what question or questions we would like to
ask. Are we asking about

Or should we be asking, along with John Lucy (1997:291), in
terms of a two-step process: "how languages interpret experiences and how
those interpretations influence thought"?

Another approach would have it that none of
these formulations is quite right. It is easy to think in some such terms —
one or more of these possibilities — taking these concepts, like language,
thought, perception, behavior, as actual objects or entities of some kind,
as if they had existence apart from human beings; to be more exact, as if they
had some life of their own, apart from the human mind. But I'd like to suggest
that thinking in such terms is in itself an example of just the kind of
phenomenon Whorf was talking about, an example of language influencing thought
-- in this case, through the process of reification, in which we are reifying
‘language’, ‘thought’, and so forth, and treating them as independent objects.

Penny Lee proposes one way of getting beyond this mode
of thinking (1996: xiv):

In the realm of linguistic thinking there is little point in arguing about whether language influences thought or thought influences language for the two are functionally entwined to such a degree in
the course of individual development that they form a highly complex, but nevertheless systematically coherent, mode of cognitive activity which is not usefully described in conventionally dichotomizing terms as either ‘thought’ or ‘language’.

This way of looking at the relationship seems fair enough as far as it goes; yet it isn't quite robust enough to satisfy some people, and I find myself among them. For we do seem to find in Whorf’s assertions a
suggestion that in some way our linguistic systems are playing some kind of causal role. I am not ready to give up on this intriguing possibility.

I would like to propose an alternative way of looking at the situation. Instead of starting with elusive disembodied abstractions like ‘thought’ and ‘language’, we could start by talking about something relatively real, the human brain, and about language in relation to the brain. I will try to show how that perspective might reframe the questions for us.

The Cortical Information System

Each of us has an information system which we use to interact with the world, our personal information system. That world is of course not just external to the body, since it also includes information about the body itself: feelings of hunger and other sorts of feelings, knowing where our hands and feet are and what condition they are in, and so on. The system also includes information about the past, both external and internal events, both experienced and reported happenings, both true memories and false memories, both physical and mental events. To a limited extent for most people, the system also includes some information about itself.

This information system, implemented mainly in the cerebral cortex and associated white matter, which provides cortico-cortical connections, includes the linguistic system together with conceptual, perceptual, and other systems. Because of its extensive interconnections with these various other systems, the linguistic system enables us humans to report and think about experiences and imaginings of many different kinds, represented by activations in different modalities all over our brains. Figure 2, while it is highly simplified in relation to the actual information system of the human brain, provides some idea of the kind of structure involved.

FIGURE 2 ABOUT HERE

It appears from numerous theoretical and empirical studies (cf. Makkai and Lockwood 1973, Lamb 1999c) that most or all of these mental modalities are organized in the form of networks with multiple layers of structure, and this hypothesis is supported by neuroanatomical evidence (Kandel et al. 1991, Lamb 1999c:307-369). Of course, as they are all interconnected, these several systems are all portions of one large network.

Figure 2 summarizes in highly oversimplified form numerous hypotheses concerning our neurocognitive systems, some of which are easily taken for granted but none of which should properly be accepted without evidence. It identifies certain specific functional subsystems, and (without making specific locational proposals) suggests that each of them might occupy a relatively contiguous area. It also identifies connections between subsystems and shows many of them as bidirectional, by means of lines with arrowheads at both ends (cf. Lamb 1999a). It also includes some hypotheses about the relative locations of the different subsystems -- for example, the position of Phonological Recognition close to Auditory Perception. Whether these are actual properties of our cortical information systems or just matters of diagramming convenience is a question we will consider briefly. These and other questions relating to these hypotheses and the evidence supporting them are treated more extensively elsewhere (Lamb 1999c). For purposes of the present exploration it will be pertinent to consider the locations of just a few of the subsystems relative to one another, along with a few of the cases of bidirectional connectivity. We shall consider the questions of localization after looking at the learning process.

The hypothesis of bidirectional connectivity in our systems is perhaps most readily supported by experience with our own perceptual systems. Most people can visualize objects and scenes — cats, dogs, waterfalls, our bedrooms. And most of us can ‘hear’ the voices of friends or relatives speaking, or we can listen mentally to the opening lines of Beethoven's 5th symphony; and most of us engage in inner speech, during which we hear our own voices — to be sure, not with the clarity that is present when actual sounds are being received through the ears. (It is reported that a significant percentage of people have little or no visualizing ability; they find statements like the one just given about visualizing hard to believe.) Now, what is going on here? To me, the most likely explanation (in fact, the only likely one) for such ‘inner seeing’ or ‘inner hearing’ is that we are activating some of those same connections in our perceptual systems that get activated when we are getting actual sensory input. If I ask you to visualize a cat and you do so, you are activating those connections in your visual system as a consequence of linguistic rather than sensory input. If my suggestion to do so had been spoken rather than written, then, in terms of Figure 2, the pathway of activation would go from Auditory Perception to Phonological Recognition to Lexis (the grammatical recognition and production subsystems have been omitted from Figure 2 just to keep it from being too cluttered) to a location in Object Categories to Hi-Level Vision to Mid-Level Vision to Lo-Level Vision. Yes, all the way to low-level vision, for it is here that you have the actual visual features (which you can conjure up to the extent you care to work at it, unless you are one of those who lack the necessary connections) which are needed to make up the pointy ears, the whiskers, the yellow eyes, etc. (The diagram arbitrarily distinguishes just three layers for visual perception and in doing so presents a highly oversimplified picture; actually there are many more layers than three.)

And so what we seem to have are perceptual pathways going in the reverse direction from that of ordinary perception. The kind of network structure needed to support this ability has to consist of both feed-forward and feed-backward connections — from a given layer of structure to both upper and lower layers, and both to and from other subsystems. That is, these feed-forward and feed-backward connections can exist not only between immediately neighboring layers of the same subsystem but also between different subsystems, for example between the systems for Vision and Object Categories. This subject has been treated in greater detail elsewhere (Lamb 1997, 1999a, 1999c:132-136; cf. Damasio 1989a,b,c, Kosslyn 1983, Kosslyn and Koenig 1995).
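The kind of two-way linkage just described can be put in schematic form. The following is only an illustrative toy, not Lamb's relational-network notation: the node labels are borrowed loosely from Figure 2, the layer structure is drastically reduced, and the single point being demonstrated is that connections stored in both directions allow activation to run feed-backward, from a concept node all the way down to low-level perceptual features, as in mental imagery.

```python
# Toy sketch of bidirectional (feed-forward / feed-backward) connectivity.
# Node names are illustrative, taken loosely from Figure 2.

class Node:
    def __init__(self, name):
        self.name = name
        self.forward = []    # feed-forward links (toward higher layers)
        self.backward = []   # feed-backward links (toward lower layers)

def link(lower, upper):
    """Record one connection in both directions."""
    lower.forward.append(upper)
    upper.backward.append(lower)

def activate_downward(node, visited=None):
    """Spread activation feed-backward, e.g. from a concept node
    down to low-level perceptual features ('inner seeing')."""
    if visited is None:
        visited = []
    visited.append(node.name)
    for lower in node.backward:
        activate_downward(lower, visited)
    return visited

lo = Node("Lo-Level Vision")
mid = Node("Mid-Level Vision")
hi = Node("Hi-Level Vision")
cat = Node("Object Category: cat")
link(lo, mid); link(mid, hi); link(hi, cat)

# Activation started at the concept reaches low-level visual features.
print(activate_downward(cat))
```

The same `forward` links would carry ordinary bottom-up perception; the demonstration above runs only the reverse pathway, which is the one at issue in mental imagery.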

Concepts are centrally important to this inquiry. A node for a conceptual category seems to have connections to/from a large number of nodes representing its properties, both to/from other conceptual nodes and to/from other subsystems. For example, concepts for categories of visible objects need connections to nodes in the visual area, those for audible objects to/from auditory nodes, and so forth. Taking the concept Ccat, for example, we have visual connections comprising what a cat looks like, auditory connections for the 'meow' and other sounds made by a cat, tactile connections for what a cat feels like to the touch, as well as connections to other concepts representing information about cats in the information system of the person in whose system these connections have been formed (Figure 3). And so a person's knowledge of cats is represented in the information system by a little network, actually comprising hundreds or thousands of nodes, including a visual subnetwork for the visual features, an auditory subnetwork for the 'meow', and so forth, all ‘held together’ by a central coordinating node, to which we can give the label ‘Ccat’.

FIGURE 3 ABOUT HERE
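The little network centered on Ccat can be caricatured as follows. This is a hypothetical sketch, not a claim about cortical wiring: the property names are invented for illustration, and the real network would comprise hundreds or thousands of nodes. The one idea it captures is that a single coordinating node ‘holds together’ property connections distributed across several subsystems.

```python
# Toy caricature of a concept node binding distributed property nodes.
# Subsystem and property names are invented for illustration.

class ConceptNode:
    def __init__(self, label):
        self.label = label
        self.properties = {}   # subsystem name -> list of property nodes

    def connect(self, subsystem, prop):
        self.properties.setdefault(subsystem, []).append(prop)

    def activate(self):
        """Feed-backward activation: the coordinating node keeps its
        distributed property nodes active in concert."""
        return {s: list(props) for s, props in self.properties.items()}

c_cat = ConceptNode("Ccat")
c_cat.connect("visual", "pointy ears")
c_cat.connect("visual", "whiskers")
c_cat.connect("auditory", "meow")
c_cat.connect("tactile", "soft fur")

print(c_cat.activate()["auditory"])   # ['meow']
```

Activating the coordinating node brings up all the distributed properties together, which is the coordination function attributed to higher-level local nodes in the text.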

The current impression that we have in our conscious awareness of a scene or a situation or a person results from a widely distributed representation involving many nodes, usually of multiple subsystems; and it is the lower-level nodes whose activation gives us our conscious experience, while the function of the higher-level ones is to provide coordination of those lower-level nodes, so that they are kept active in concert. This is important evidence of the need for distributed representations to be supported by higher-level local representations: it is those higher-level local nodes that provide, by means of their feed-backward connections, the coordinated activation of the nodes comprising the low-level distributed representations. They also make possible the coordinated spread of activation from one subsystem to another. The function of this central coordinating node, and the need to posit its presence in the system, are addressed in detail elsewhere (Lamb 1999b, 1999c:329-343, 366-369; cf. Damasio 1989a,b,c).

To get a handle on the question of the integrity and relative locations of the various neurocognitive subsystems, it is necessary to consider learning. This we shall do next.

To summarize the argument so far, the first point is: Let's be more realistic about concepts like thought and language and stop treating them as independent disembodied entities with lives of their own. The second point is: Consider the brain. Next, we consider the third point: learning. If the cortical information system is a network, its information is in the connectivity of the system rather than in the form of symbols or any such objects that would have to be stored somewhere. Therefore, learning has to consist of building connections.

Learning Looms Large

Relational networks as portrayed in most of the literature (e.g. Copeland and Davis 1980; Lamb 1966, 1970, 1984, 1994; Lockwood 1972; Makkai and Lockwood 1973; Schreyer 1976) describe, however imperfectly, parts of a typical
cognitive system as it might exist at the end of a long series of learning steps. It is natural to ask how that network structure gets formed. How does the system get those seemingly ‘hard-wired’ connections that are seen in linguistic network diagrams?
The preliminary answer, considered in more recent literature (Lamb 1997, 1999a,c) comes in two parts: first, there must be some genetically built-in structure that provides the potential for all of the connections that will eventually get formed; second,
there must be many steps of building and adjusting connections to get from that initial state to the functioning state that represents an adult's capabilities. The abundant connections of that initial state need to be both local and long-distance: local for building connections within a subsystem, like higher-level phonological nodes for integrating lower-level phonological elements, and long-distance to allow for connections between different subsystems, such as between lexical and conceptual, between conceptual and visual, etc.

We need not suppose that all of the connections of a system actually get built as part of the learning process. And in fact such a supposition would create needless problems for the learning theory, for in that case the hypothesized
learning mechanism would have to be endowed with some way of ‘knowing’ where to build the new connections needed for each particular aspect of a skill, and a means of ‘knowing’ would demand far more complexity than we actually need. There is a simpler alternative: to suppose that the genetically provided state of the network includes abundant connections proliferated by a built-in program, most of which will never become operative — just as hundreds of eggs are laid by a turtle or insect, only a few of which will produce surviving organisms. We can suppose that those abundant latent connections, from each node to many nodes of other levels, start out very weak, in effect with near-zero strength. We can hypothesize, in harmony with Hebb
(1949), that the fundamental learning process might consist of strengthening a connection when it is active while the node to which it is connected has its threshold satisfied by virtue of also receiving activation from other connections.
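The Hebb-style rule just stated can be sketched schematically. The threshold and increment values below are arbitrary assumptions, and a real node would have vastly more incoming connections; the sketch shows only the core idea: when a node's threshold is satisfied, every connection that was active at that moment is strengthened, so even a near-zero latent connection can begin to be recruited.

```python
# Sketch of a Hebb-style strengthening rule. Parameter values are assumed.

THRESHOLD = 1.0            # activation needed for the node to fire
LEARNING_INCREMENT = 0.2   # strength added to each active connection

def update_connections(incoming, active_inputs):
    """incoming: dict mapping input name -> connection strength.
    active_inputs: set of input names currently active.
    Strengthen active connections if the node's threshold is satisfied."""
    total = sum(w for name, w in incoming.items() if name in active_inputs)
    if total >= THRESHOLD:
        for name in active_inputs:
            if name in incoming:
                incoming[name] += LEARNING_INCREMENT
    return incoming

# A node with two established connections and one latent, near-zero one.
incoming = {"A": 0.6, "B": 0.6, "C": 0.01}
update_connections(incoming, {"A", "B", "C"})
print(round(incoming["C"], 2))   # 0.21 -- the latent connection strengthened
```

Note that the latent connection C could never have fired the node by itself; it gets strengthened only because it happened to be active together with established connections that satisfied the threshold, which is the selectional character of the process.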

This simple learning hypothesis eliminates the need for the system to ‘know’ how to build the precise connections that it must build for linguistic performance. It doesn't need to know at all; it just proliferates possibilities in advance, and the learning process is one of selection. This is a Darwinian process like that which leads to the origin of species and to complex biological structures like eyes and brains and the elephant’s trunk (compare Edelman 1987). Nature didn’t
have to know in advance how to construct an eye or a brain. At each of many steps in the process it proliferated possibilities, and those which succeeded passed their genetic material to the next generation.

The Darwinian features of this learning mechanism are in harmony with a bottom-up direction of learning — in perception, for example, from the level of sensory input to successively higher levels of integration, leading up to conceptual structures. For language, bottom-up learning implies that a child learns to speak in single words before producing multi-word utterances, etc. This bottom-up hypothesis is supported by neurological evidence in that the progress of myelination of cortical nerve fibers begins with the primary cortical levels and moves successively higher. The development of species is also bottom-up, as is the development of complex biological systems like eyes and the mammalian brain. In the process of network structure building, latent connections get selected for specific functions first at lower levels, and it is only after nodes of a lower level have been recruited for specific functions that they can serve as ‘parent’ nodes for the next generation of nodes which will build upon them. That is, higher-level nodes cannot get recruited until a few of their incoming connections are able to be activated; and they cannot become consistently activated until the nodes from which these connections are coming have been recruited.

And so it makes sense to call the process Darwinian in that learning is not so much a building process as a process of selection. At every stage of learning we make selections from the abundant potential that has been provided in th
e form of latent connections. These abundant latent connections, proliferated and thus available throughout the system, also provide the enormous flexibility which our mental systems enjoy, their ability to learn about new things later in life which could
never have been foreseen during childhood, their adaptability to novel conditions, their ability in many cases to compensate for damage to brain tissues, etc.
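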

Conceptual nodes occupy upper levels of the cognitive system. The process of learning a concept is a matter of recruiting a node which can integrate information from perceptual as well as other conceptual locations. In the initial s
tages of learning a concept there may be only a few such connections, representing the properties present in awareness at the time of first learning. The activation of the properties that become connected to the concept node, either initially or later on
(see below), can come either from direct experience, i.e. via the sense organs and perceptual cortices, or, very commonly, as a result of linguistic activation. In the latter case we are talking about activation of conceptual properties coming from
phonological representations via lexical nodes.

The same process of strengthening connections applies both to the initial recruitment of a node and to its later refinement to adjust to new information coming in after the initial recruitment. Such fine-tuning operations are of two
kinds: (1) adding 'new' connections, for properties of new exemplars that were not present at the time of initial learning of the concept; and (2) strengthening already established connections, for properties repeatedly associated with the concept. In keeping with the Darwinian features of the process as described, adding a 'new' connection is not literally the creation of a connection but the strengthening of a latent one, just as in the initial recruitment process. The second process simply adds further strength to connections already established. We have to recognize that connections can vary in strength not just between the two values of latent and established but along a continuous scale from very weak to very strong. After a sufficient amount of experience (direct and
through hearsay), those properties that are most frequently associated with a concept will have acquired great strength, while those only occasionally present will have acquired relatively weak strength.
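
These two fine-tuning operations can be sketched minimally (the baseline, step size, and property names are my own illustrative assumptions): each encounter with an exemplar nudges the relevant connection strengths upward along a continuous scale, so a latent connection and an established one differ only in degree.

```python
LATENT = 0.01   # assumed baseline strength of a latent connection

def observe(weights, properties, step=0.1, cap=1.0):
    """Strengthen the connection for each property present in one exemplar;
    a previously latent connection is 'added' simply by strengthening it."""
    for p in properties:
        weights[p] = min(cap, weights.get(p, LATENT) + step)

weights = {}
for _ in range(8):
    observe(weights, ["furry"])         # present in every encounter
observe(weights, ["furry", "collar"])   # 'collar' seen only once
# the frequent property ends up with a much stronger connection
```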

For all this to work we must also hypothesize that each such node has a threshold function such that a greater amount of incoming activation leads to a greater amount of threshold satisfaction, causing the node to send varying degre
es of activation out to other nodes: strong activation if the threshold is strongly satisfied, weak if only slightly satisfied, none if the incoming activation doesn't reach the threshold at all. It follows that a part of the learning process has to consi
st of adjustments in the threshold so that the node will be neither too easily satisfied nor too stringent in its demands.
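
A sketch of such a graded threshold function, together with a crude tuning rule of the kind the last sentence calls for (the linear form, ceiling, and step sizes are my assumptions):

```python
def node_output(incoming, threshold, gain=1.0, ceiling=1.0):
    """Zero below threshold; above it, output grows with the degree of
    threshold satisfaction, up to a ceiling."""
    total = sum(incoming)
    if total < threshold:
        return 0.0
    return min(ceiling, gain * (total - threshold))

def adjust_threshold(threshold, fired, should_have_fired, step=0.05):
    """Tuning: raise the threshold after a spurious firing (too easily
    satisfied), lower it after a miss (too stringent)."""
    if fired and not should_have_fired:
        return threshold + step
    if should_have_fired and not fired:
        return threshold - step
    return threshold
```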

Although the first step of learning a concept may result from a single exemplar, so that the node for that moment responds to a single object, the strengthened connections, representing the perceived properties of that object, would
rarely be specific to that one exemplar, and so would immediately allow for recognition of multiple similar objects comprising, with that initial exemplar, a category rather than just that one object. And as the process of fine tuning progresses, as a re
sult of further experience, the node and its connections will progressively refine, in effect, the definition of the membership of the category based on properties experienced as associated with it, giving greater weight to those experienced as more impor
tant. The node's threshold will then be satisfied by any member of the category defined by its connections. It will have learned to be satisfied by a sufficient amount of activation from among all of the nodes representing its properties, an
d it will automatically exhibit prototype effects, since it will respond more strongly to prototypical exemplars than to peripheral ones. Why? Because the prototypical ones are those with the strongest and the most connections from the properties associat
ed with the category.
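
These prototype effects fall out directly if the node simply sums the strengths of whichever of its property connections are currently active; the weights below are illustrative only.

```python
# connection strengths acquired by a hypothetical C-dog node
weights = {"four_legs": 0.9, "fur": 0.9, "barks": 0.8, "three_legs": 0.1}

def category_activation(weights, present):
    """Sum the strengths of the active property connections."""
    return sum(w for p, w in weights.items() if p in present)

prototypical = category_activation(weights, {"four_legs", "fur", "barks"})
peripheral   = category_activation(weights, {"three_legs", "fur"})
# the prototypical exemplar activates the node far more strongly
```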

Another consequence of the learning process according to this hypothesis is that each concept ends up as highly selective in relation to the potential range that was available to it before learning occurred. We can see this selectiv
ity and the range of the potentials in two ways. First, the possibilities which the world presents are indefinitely varied -- it is, after all, a kaleidoscopic flux. The system of categories that a person ends up with is the result of many individual proc
esses of selection of certain features of that kaleidoscopic flux for representation in the system among the indefinitely many other possibilities which remain more or less ignored. Second, the means by which all this is accomplished is also a matter of s
election: it is the selection of certain connections for strengthening while others remain latent, and of the further strengthening of selected connections among those strengthened earlier.

Moreover, this highly selective structuring imposed upon the kaleidoscopic flux is not a consequence of limitations in our sense organs, in our ability to receive inputs from the world — even though such limitations do of course exi
st. The child who is building an information system has no problem with being able to discriminate or to learn to discriminate myriad visual and other perceptual properties. The possibilities available for the child's sensory appreciation are abundant bey
ond measure. But the process of constructing the information system is compelled by inner necessity to be selective. And what guides the selection? It is other members of the community in which the child is growing up. The child learns to associate
certain selected perceptual properties with every concept being learned (except for the abstract concepts, which are even more heavily dependent on language), and ends up with a system of conceptual categories very much like that of the rest of the commu
nity. And how does the child learn which perceptual properties to emphasize and which ones to ignore? Through language. Not because someone instructs the child by saying that property p is important for concept C, but just by naming e
xemplars of categories, either directly or indirectly. If an older sibling says, "here, doggie!" to a newly encountered creature, that is enough information to allow the younger one to reinforce the connections from the perceptual features of this creature to the node for the developing conceptual category C-dog, activated from the linguistic system. The system continues its fine-tuning operations in order to become like those of others in the community, in order to be able to communicate wi
th them: "We cannot talk at all except by subscribing to the organization and classification of data which the agreement decrees".
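
The 'doggie!' scenario can be sketched as a selection among latent connections driven by linguistic activation: the heard name, arriving via the phonological and lexical systems, activates the concept node, and whatever perceptual features are active at that moment have their connections to it reinforced (names and step size are illustrative assumptions).

```python
concepts = {"C-dog": {}}   # concept node -> strengths of property connections

def hear_name(concepts, name, active_features, step=0.2):
    """The name activates the concept node via the linguistic system;
    connections from currently active perceptual features are strengthened
    (previously latent ones included)."""
    connections = concepts[name]
    for f in active_features:
        connections[f] = connections.get(f, 0.0) + step

hear_name(concepts, "C-dog", ["fur", "wagging_tail", "four_legs"])
hear_name(concepts, "C-dog", ["fur", "barks"])
# 'fur', present at both namings, now has the strongest connection
```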

To sum up, what the child does is to learn, by means of language, to make the distinctions that others have been making.

The Proximity Principle

Although the learning hypothesis assumes the availability of abundant latent connections, it seems altogether unlikely that the cortex has connections from every location to every other one, and in fact that possibility real
ly has to be ruled out, even locally. This statement is supported by clear neuroanatomical evidence (e.g. Abeles 1991). But it is perhaps reasonable to assume that the latent connections are abundantly proliferated locally and that, as a result of
a long process of 'evolutionary learning', sufficient long-distance connections are available to non-local areas. But for the latter, the long-distance connections, it is reasonable to suppose that they are relatively limited in comparison to the local on
es. They could be of two kinds: to relatively nearby areas and to distant areas. The latter would be provided for if the brain’s genetic endowment includes long-distance ‘cables’ from certain areas to certain other areas. And we know from neuroanatomy tha
t such cables do exist, the most important for language being the arcuate fasciculus, which connects the Phonological Recognition area to that for Phonological Production.

In any case, it would be a reasonable prediction from this learning hypothesis that if the system needs to connect nodes of two subsystems which are distant from each other, the most likely location for a node that would have latent
connections available from both would be in an area intermediate between them. Why? Because a system with this property makes smaller demands on the supply of latent connections that needs to be provided by the genetic endowment. This is the general situati
on for learning of the type which integrates information from more than one subsystem. This situation includes the learning of concepts, which must integrate perceptual information from more than one perceptual modality along with lexical information, and
it includes also the nodes for lexemes, each of which has to provide a bridge from a phonological location to a conceptual or other location. The other situation is that in which a node being recruited for a new function is only integrating features from
one subsystem, as when a complex phonological expression is learned as a composite of two simpler ones. In this situation it is perhaps even more reasonable to hypothesize that the newly recruited node is likely to be close to the nodes for the propertie
s being integrated, and for the same general reason: such a scenario requires far less extensive latent connections in the system than one which would allow such an integrating node to be farther away.

As a result, it will generally turn out that, other things being equal, integrating nodes will tend to be maximally close to the nodes for the features which they integrate. This consequence may be called the proximity hypothesis
. This hypothesis relates function to location. It comes in two varieties:

(1) A node being recruited to integrate a combination of properties whose nodes are close to each other will tend to be maximally close to the nodes for those properties.

(2) A node being recruited to integrate a combination of properties whose nodes are not close to each other will tend to be in an intermediate location between the nodes for those properties.

An incidental consequence of this hypothesis is that close competitors — that is, nodes for similar functions — will tend to be physically close to one another. It follows that nodes which are physically close to one another will tend t
o have similar functions.
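
Both varieties of the hypothesis amount to the same economy: among candidate sites that have latent connections to all the needed property nodes, recruitment favors the site with the least total 'wiring' distance. A toy sketch with made-up coordinates:

```python
def best_site(candidates, property_sites):
    """Pick the candidate site whose total (Manhattan) distance to the
    property nodes is smallest."""
    def wiring_cost(site):
        return sum(abs(site[0] - p[0]) + abs(site[1] - p[1])
                   for p in property_sites)
    return min(candidates, key=wiring_cost)

# variety (2): properties in two distant subsystems at (0, 0) and (10, 0);
# the intermediate candidate wins over candidates near either subsystem
site = best_site([(0, 1), (5, 0), (10, 1)], [(0, 0), (10, 0)])
```

If the property sites were instead clustered together, the same rule would select a candidate close to the cluster, which is variety (1).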

The Language Cortex

Based on the proximity hypothesis we can now interpret Figure 2 (above) not only as functionally descriptive with respect to the various subsystems and their identified interconnections; we can also support two principles suggested by the figure that up to now may have seemed intuitively acceptable but for which we really had no supporting argument: (1) to a large extent each of the subsystems may be subserved by a geographically coherent area of the cortex; (2) areas which
are connected to two or more other areas should, other things being equal, be roughly intermediate in location between the areas they are connected to. So, for example, the hypothesis predicts that lexical nodes ought to be in intermediate locations betwe
en conceptual nodes and phonological nodes; that conceptual nodes for objects which are both visible and audible should be in an area intermediate between the visual and auditory areas of the cortex. The figure was drawn following these two principles in
the first place because to do otherwise would have resulted in a far more complex diagram. But now we have a theory to justify the policy followed and to support an interpretation of the figure that is more than just an abstract functional one.

The proximity hypothesis also permits us to formulate hypotheses of likely locations in the cortex of the different neurocognitive subsystems, starting from the primary areas, whose locations have been well-known for decades. It all
ows us to predict that the Phonological Recognition area ought to be relatively close to the primary auditory area, and intermediate between that area and the lexical area, and so forth. And since conceptual nodes for objects which are both visible and au
dible should be in an area intermediate between the visual and auditory areas of the cortex, we can propose that they are likely to be in the posterior temporal lobe. In short, the proximity hypothesis and its corollaries allow us to make various predicti
ons about likely locations of subsystems in the cortex, including nodes like those of Figure 3. We can test and refine such predictions against what is known about localizations in the cerebral cortex using results from aphasiology (cf. Goodglass 1993, Be
nson and Ardila 1996) and other areas of neuroscience, including brain imaging (cf. H. Damasio 1991). Such checking provides encouraging confirmation as well as adjustments to preliminary guesses (Lamb 1999c:349-365). In fact we are able with some degree
of assurance to propose hypothetical localizations like those shown in Figures 4 and 5.

FIGURE 4 ABOUT HERE

FIGURE 5 ABOUT HERE

Top-Down Effects in Perception

To sum up what we have so far, our information about a concept is widely distributed, and the distributed representation is held together by localized integrative or ‘convergence’ nodes at higher levels, which provide potent
ially multiregional retroactivation of lower-level nodes by virtue of bidirectional connections. Feed-backward activation from a category node to the nodes for its relevant properties provides heightened activation to that subset of nodes currently receiv
ing activation from the senses, resulting in increased attention to the properties relevant to that category; and it also triggers inferences, in the form of activation of properties normally associated with the category but not currently receiving sensory input — fo
r example, a portion of a cat's body which is obscured from sight by an intervening object. When we see a cat's head emerging from behind a sofa, we don't say, "Oh, look, there's a cat’s head!" No, we assume that a whole cat is there as our perception sys
tem fills in predicted features of the rest of the body by means of top-down activation. Some such inferences may be unwarranted in the particular instance; this is the source of errors in thinking associated with ‘thinking in categories’.
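
The cat-behind-the-sofa example can be sketched as follows (threshold and weights are illustrative): once partial sensory evidence satisfies the category node, feed-backward connections activate the category's remaining properties even though they receive no sensory input.

```python
# property connections of a hypothetical C-cat node
cat_props = {"head": 0.9, "tail": 0.8, "four_legs": 0.8}

def perceive(props, sensed, threshold=0.5):
    """Bottom-up activation from the sensed features; if it satisfies the
    category node, top-down retroactivation fills in the rest."""
    bottom_up = sum(w for p, w in props.items() if p in sensed)
    if bottom_up < threshold:
        return set(sensed)              # category not recognized
    return set(sensed) | set(props)     # predicted features filled in

filled = perceive(cat_props, {"head"})  # the whole cat is assumed present
```

Note that the filled-in features are predictions, not observations, which is exactly why such inferences can occasionally be unwarranted.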

Together, these properties provide top-down effects in perception: a heavy influence of the system, representing information and beliefs already present in it as a result of accumulated previous experience, upon the interpret
ation of new sensory input. The model would thus appear to account for how it is that, to a large extent, we see what we are looking for and what we expect to find, as much as or even more than what is actually there. Moreover, that previous experi
ence which has built our cognitive systems includes not only the results of our direct experience (as mediated by earlier stages of our perceptual-conceptual systems) but also the results of information received from others via the linguistic system, which has influenced the construction of our conceptual systems.

The Basic Puzzle and a Solution

We are now ready to return to the questions raised at the outset of this paper. First, we have the question of just what we are asking, and then we have the problem of coming up with an answer to the question(s) we choose to
ask. Our basic questions we can now consider in the context of the structure of the neurocognitive system, and I would like to propose that there are two of them: (1) How can language influence thought? (2) How can language influence perception?

First, the influence of language on thought. Here we need to distinguish two subtypes. The first comprises the cases involving semantic mirage, the ones which rely on reification and the one-lexeme-one-thing fallacy. For these it is qu
ite easy to see an influence of language on thinking (as in the example given above), and we don't need to be detained further by them. Second, we have the type of thinking which is driven primarily by the concepts involved rather than by their lexical co
nnections. This being the case, it is not so obvious how language could be influencing the thinking. As this question is similar to but less complex and less intriguing than that of how language could influence perception, let us turn to the latter. The an
swer to it will apply also here.

Second, then, is the question of how language can influence perception. This is the one I find the most interesting. And such influence I take to be implied by Whorf's statement: "The categories and types that we isolate from the wo
rld of phenomena we do not find there because they stare every observer in the face; on the contrary, the world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds — and this means largely by the linguistic systems i
n our minds."

If we look at perception in connection with Figures 2, 4, and 5, it is not at all apparent how language could influence perception. A perceptual process, say seeing, starts from the eyes, goes through the several layers of visual st
ructure and from there to conceptual structure and only from there to the linguistic subsystems if the subject is motivated to engage in linguistic activity as a result of what has been perceived -- perhaps to say "Henry, do you know that your cat is claw
ing your oriental rug?". The activation of linguistic subsystems would appear to come only after that of the perceptual areas. So how can language influence perception? It may seem that some mysterious — even mystical — process is involved, or mayb
e just an imaginary process.

In thinking about the possibility that language may influence perception or thought, it is easy to suppose, if we are letting our thinking about this question be influenced by the words we are using, not only that these abstract obj
ects -- language and thought and perception — have a life of their own, apart from the minds of human beings, but that any such operation of language upon thinking or perception must be taking place at the same time as the thinking (or perception) being a
ffected. But it needn't be so. And in fact the only way to take the mystery out of the process — to solve the puzzle — is to recognize that it isn’t so.

We need to recognize two different time periods. In the later one, at the actual time of the thinking and perceiving we are interested in, two important factors are operating:

(1) the mutual activation of the conceptual categories and perceptual distinctions which are present in the system at this time of operation;

(2) top-down effects in perception, from conceptual structure to high-level perceptual layers and from higher-level to lower-level perceptual layers.

The other time period is an earlier one, actually several earlier periods in the usual case, often going back to the childhood of the individual involved. At these earlier periods, the conceptual and perceptual structures are being
built and fine-tuned, largely through the operation of linguistic inputs to the system. Here is where the most important role of language comes in, during the construction and refinement of the conceptual and perceptual systems — during the learnin
g processes.

Thus there is a long time delay between the time of linguistic influence and the time of the thinking and perception being influenced.

And so I’d like to propose that the process works roughly as follows: Our thinking is largely the operation of our conceptual systems, and therefore it depends upon the structure of those systems. Also, our perception is depende
nt upon the structure of the perceptual networks and is affected by our conceptual system through the operation of top-down effects in perception. And, our conceptual systems were built and our perceptual systems shaped, mostly in childhood, under the hea
vy influence of language. Therefore, it is not the case that, in some mysterious way, language is influencing thought and perception at the time the thinking and perceiving are occurring; rather it is the influence of languaging during childhood that i
s affecting thinking and perceiving throughout later life.

When we were children we accepted the illusions of our parents and older siblings and friends and teachers, knowing no better than to trust them. And by what means did we do this? Of course, it was largely through language. They tol
d us, in effect, what to believe about the world. Here, then, we have a clear causal relationship: It is largely through language that each generation learns the system of boundaries and categories and semantic mirages projected onto the world by its cult
ure.

Glossary

The Language Cortex. Those portions of the cerebral cortex which are devoted largely to processing linguistic information.

Latent Connections. Very weak network connections which are abundantly available throughout the cortical information system and which can become strengthened as part of the learning process.

The One-Lexeme-One-Thing Fallacy. The assumption that a lexeme stands for just one thing, ruling out the possibility that it might have different senses in different contexts.

The Proximity Hypothesis. The hypothesis, derived from considerations of the learning process, that network structures with similar functions will tend to be in physical proximity to one another in the cortex, and that those which have
connections to relatively distant cortical areas will tend to be intermediate between the areas connected, thus as close as possible to them.

Reification. The assumption that a nominal lexeme must represent a thing, leading to the unconscious ascription of substantial reality to abstractions.

Semantic Mirage. Any semantic relationship between lexical and conceptual units in a cognitive system which leads to assumptions about or projections onto the world of properties which are not actually there. Subtypes include the one-le
xeme-one-thing fallacy, reification, and the unity fallacy.

The Unity Fallacy. The assumption that a concept represents an object that is an integral whole, even if closer examination would show it to be a relatively haphazard collection of diverse phenomena.