
Wednesday, November 23, 2016

Here
are several pieces by our own estimable Jeff Lidz that fight the good fight
against the forces of darkness and ignorance. We need much more of this.
We need to get stuff into popular venues defending the work that we have done.[1]

The
most important is this
piece in Scientific American
rebutting the profoundly ignorant and pernicious piece by Ibbotson and
Tomasello (I&T) (see here, here and here
for longer discussion). Jeff does an excellent job of pointing out the issues
and debunking the “arguments” that I&T advance. It is amazing, IMO, that
T’s views on these issues still garner any attention. This no doubt stems from
the fact that he has done good work on non-linguistic topics. However, his
criticisms of GG are both long-standing and of very low quality, and have been
so for as long as they have been standing. So, it is good to see that Sci Am
has finally opened up its pages to those willing to call junk junk. Read it and
pass it around widely in your intellectual community.

Here
are two other pieces (here
and here). The
latter is a response to this. This all
appears in PNAS. The articles are
co-authored with Chung-hye Han and Julien Musolino. The discussion is an
accessible entry into the big issues GG broaches for the scientifically literate
non-GGer. As such, it is excellent for publicity purposes.

So, read and disseminate widely. It is important to call out
the idiocy out there. It is even fun.

[1]
I have a piece in Current Affairs
with Nathan Robinson that I will link to when it is available on the web.

Monday, November 21, 2016

The first (here) is a review by Steven Mithen (SM) of a new book on human brain size. The received wisdom has been that human brains are large compared to our body size. The SM review argues that this is false. The book by Suzana Herculano-Houzel, a neuroscientist from Brazil, makes two important points (and I quote):

(i) What is perhaps more astounding than that number itself, one that is actually less than the often assumed 100 billion neurons, is that 86 billion makes us an entirely typical primate for our size, with nothing special about our brain at all, so far as overall numbers are concerned. When one draws a correlation between body mass and brain mass for living primates and extinct species of Homo, it is not humans—whose brains are three times larger than those of chimpanzees, their closest primate relative—that are an outlier. Instead, it is the great apes—gorillas and the orangutan—with brains far smaller than would be expected in relation to their body mass. We are the new normal in evolution while the great apes are the evolutionary oddity that requires explanation.

(ii) But we remain special in another way. Our 86 billion neurons need so much energy that if we shared a way of life with other primates we couldn’t possibly survive: there would be insufficient hours in the day to feed our hungry brain. It needs 500 calories a day to function, which is 25 percent of what our entire body requires. That sounds like a lot, but a single cupful of glucose can fuel the brain for an entire day, with just over a teaspoon being required per hour. Nevertheless, the brains of almost all other vertebrates are responsible for a mere 10 percent of their overall metabolic needs. We evolved and learned a clever trick in our evolutionary past in order to find the time to feed our neuron-packed brains: we began to cook our food. By so doing, more energy could be extracted from the same quantity of plant stuffs or meat than from eating them raw.

What solved the energy problem? Cooking. So, the human brain-size-to-body-mass ratio is normal, but the energy the brain uses is off the charts. Cooking, then, becomes part of the great leap forward.
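For what it's worth, the numbers in (ii) hang together. Here is a back-of-the-envelope check. The two conversion factors (about 4 kcal per gram of glucose and about 4 grams per teaspoon) are my own rough assumptions, not figures from the review.

```python
# Figures quoted in the review
BRAIN_KCAL_PER_DAY = 500
BRAIN_SHARE_OF_BODY = 0.25

# Assumptions (not from the review): rounded textbook-style conversion factors
KCAL_PER_GRAM_GLUCOSE = 4.0
GRAMS_PER_TEASPOON = 4.0

# Whole-body daily requirement implied by the 25 percent figure
body_kcal_per_day = BRAIN_KCAL_PER_DAY / BRAIN_SHARE_OF_BODY

# Glucose needed to cover the brain's daily and hourly budget
glucose_grams_per_day = BRAIN_KCAL_PER_DAY / KCAL_PER_GRAM_GLUCOSE
glucose_grams_per_hour = glucose_grams_per_day / 24
teaspoons_per_hour = glucose_grams_per_hour / GRAMS_PER_TEASPOON

print(body_kcal_per_day, glucose_grams_per_day, round(teaspoons_per_hour, 2))
```

On these assumptions the body needs about 2,000 kcal a day and the brain about 125 grams of glucose, a bit over a teaspoon an hour, which is what the review says.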

The review (and the book) sound interesting. For the minimalistically inclined the last paragraph is particularly useful. It seems that the idea that language emerged very recently is part of the common physical anthro world view. Here's SM's prose:

If a new neuronal scaling rule gave us the primate advantage at 65 million years ago, and learning to cook provided the human advantage at 1.5 million years ago, what, one might ask, gave us the “Homo sapiens advantage” sometime around 70,000 years ago? That was when our ancestors dispersed from Africa, to ultimately replace all other humans and reach the farthest corners and most extreme environments of the earth. It wasn’t brain size, because the Neanderthals’ matched Homo sapiens. My guess is that it may have been another invention: perhaps symbolic art that could extend the power of those 86 billion neurons or maybe new forms of connectivity that provided the capacity for language.

So around 70kya something happened that gave humans, with their new big energy-consuming brains, another leg up. This adventitious change was momentous. What was it? Who knows. The aim of the Minimalist Program is to abstractly characterize what this could have been. It had to be small given the short time span. This line of reasoning seems to be less and less controversial. Of course, what the right characterization of the change is at any level of abstraction is still unclear. But it's nice to know the problem is well posed.

Here's a "humorous" piece by Rolf Zwaan by way of Andrew Gelman. It's a sure fire recipe for getting things into the top journals. It focuses on results in "social priming" but I bet clever types can make the required adaptations for their particular areas of interest. My only amendment would be regarding the garnish in point 2. I believe that Greek Philosophers really are best.

Have a nice Thanksgiving (if you are in the USA). I will be off for at least a week until the turkey festivities end.

Sunday, November 20, 2016

I recently received two papers that explore Gallistel’s
conjecture (see here
for one discussion) concerning the locus of neuronal computation. The first (here)
is a short paper that summarizes Randy’s arguments and suggests a novel view of
synaptic plasticity. The second (here: http://www.nature.com/nature/journal/v538/n7626/full/nature20101.html)[1]
accepts Randy’s primary criticism of neural nets and couples a neural net
architecture with a pretty standard external memory system. Let me say a word
about each.

The first paper is by Patrick Trettenbrein (PT) and it
appears in Frontiers in Systems Neuroscience.
It does three things.

First, it reviews the evidence against the idea that brains
store information in their “connectivity profiles” (2). This is the classical
assumption that inter-neural connection strengths are the locus of information
storage. The neurophysiological mechanisms for this are long term potentiation
(LTP) and long term depression (LTD). LTP/D are the technical terms for
whatever strengthens or weakens interneuron connections/linkages. I’ve
discussed Gallistel and Matzel’s (G&M) critique of the LTP/D mechanisms
before (see here).
PT reviews these again and emphasizes G&M’s point that there is an intimate
connection between this Hebbian “fire together wire together” LTP/D based
conception of memory and associationist psychology. As PT puts it: “Crucially,
it is only against this background of association learning that LTP and LTD
seem to provide a neurobiologically as well as psychologically plausible
mechanism for learning and memory” (88). This is why, if you reject
associationism and endorse “classical cognitive science” and its “information
processing approach to the study of the mind/brain,” you will be inclined to find
contemporary connectionist conceptions of the brain wanting (3).

Second, there is recent evidence that connection strength
cannot be the whole story. PT reviews the main evidence. It revolves around
retaining memory traces despite very significant alterations in connectivity
profiles. So, for example, “memories appear to persist in cell bodies and can
be restored after synapses have been eliminated” (3), which would be odd if
memories lived in the synaptic connections. Similarly, it has recently been shown
that “changes in synaptic strength are not directly related to storage of new
information in memory” (3). Finally, and I like this one the best (PT describes
it as “the most challenging to the idea that the synapse is the locus of memory
in the brain”), PT quotes a 2015 paper by Bizzi and Ajemian which makes the
following point:

If we believe that memories are
made of patterns of synaptic connections sculpted by experience, and if we
know, behaviorally, that motor memories last a lifetime, then how can we
explain the fact that individual synaptic spines are constantly turning over
and that aggregate synaptic strengths are constantly fluctuating?

Third, PT offers a reconceptualization of the role of these
neural connections. Here’s an extended quote (5):

…it occurs to me that we
should seriously consider the possibility that the observable changes in
synaptic weights and connectivity might not so much constitute the very basis
of learning as they are the result of learning.

This is
to say that once we accept the conjecture of Gallistel and collaborators that
the study of learning can and should be separated from the study of memory to a
certain extent, we can reinterpret synaptic plasticity as the brain's way of
ensuring a connectivity and activity pattern that is efficient and appropriate
to environmental and internal requirements within physical and developmental
constraints. Consequently, synaptic plasticity might be understood as a means
of regulating behavior (i.e., activity and connectivity patterns) only after
learning has already occurred. In other words, synaptic weights and connections
are altered after relevant information has already been extracted from the
environment and stored in memory.

This leaves a place for
connectivity, not as the mechanism of memory but as what allows memories to
be efficiently exploited.[2] Memories live within the
cell but putting these to good use requires connections to other parts of the
brain where other cells store other memories. That’s the basic idea. Or as PT puts
it (6):

The
role of synaptic plasticity thus changes from providing the fundamental memory
mechanism to providing the brain’s way of ensuring that its wiring diagram
enables it to operate efficiently…

As PT notes, the Gallistel
conjecture and his tentative proposal are speculative, as theories of the relevant
cell-internal mechanisms don’t currently exist. That said, neurophysiological
(and computational, see below) evidence against
the classical Hebbian view is mounting, and the serious problems for storing
memories in usable form in connection strengths (the bases of Gallistel’s
critique) are becoming increasingly well recognized.

This brings us to the second paper noted above, the one in Nature. It endorses
the Gallistel critique of neural nets and recognizes that neural net
architectures are poor ways of encoding memories. It adds a conventional RAM to
a neural net and this combination allows the machine to “represent and
manipulate complex data structures.”

Artificial
neural networks are remarkably adept at sensory processing, sequence learning
and reinforcement learning, but are limited in their ability to represent
variables and data structures and to store data over long timescales, owing to
the lack of an external memory. Here we introduce a machine learning model
called a differentiable neural computer (DNC), which consists of a neural
network that can read from and write to an external memory matrix, analogous to
the random-access memory in a conventional computer. Like a conventional
computer, it can use its memory to represent and manipulate complex data
structures, but, like a neural network, it can learn to do so from data.

Note that the system is still “associationist” in that
learning is largely data driven (and as such will necessarily run into PoS
problems when applied to any interesting cognitive domain like language) but it
at least recognizes that neural nets
are not good for storing information. This latter is Randy’s point. The paper
is significant for it comes from Google’s DeepMind and this means that
Randy’s general observations are making intellectual inroads with important
groups. Good.
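To make the "read from and write to an external memory matrix" idea a bit more concrete, here is a toy sketch of content-based addressing: a key is compared with every memory row, the similarities become soft weights, and reads and writes are blends governed by those weights. This is my own simplification and not the paper's implementation; the function names and the `sharpness` parameter are made up for illustration, and the actual DNC has further addressing machinery that is not modelled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, key, sharpness=10.0):
    """Soft read: blend the memory rows, weighted by similarity to the key."""
    weights = softmax([sharpness * cosine(row, key) for row in memory])
    width = len(memory[0])
    value = [sum(w * row[i] for w, row in zip(weights, memory))
             for i in range(width)]
    return value, weights

def write(memory, weights, vector):
    """Soft write: move each row toward `vector` in proportion to its weight."""
    return [[(1 - w) * x + w * v for x, v in zip(row, vector)]
            for w, row in zip(weights, memory)]

# Hypothetical three-slot memory; the key matches the second slot.
memory = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
value, weights = read(memory, key=[0.0, 1.0, 0.0])
updated = write(memory, [0.0, 1.0, 0.0], [5.0, 5.0, 5.0])
```

Every step here is a smooth arithmetic operation, which is what lets a net learn how to use the memory from data. The stored items, though, sit in addressable slots rather than being smeared across connection weights, and that is the concession to Randy's point.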

However, this said, these models are not cognitively
realistic for they still don’t make room for the domain specific knowledge that
we know characterizes (and structures)
different domains. The main problem remains the associationism that the Google
model puts at the center of the system. As we know that associationism is wrong
and that real brains characterize knowledge independently of the “input,” we
can be sure that this hybrid model will need serious revision if intended as a
good cog-neuro model.

Let me put this another way. Classical cog sci rests on the
assumption that representations are central to understanding cognition. Fodor
and Pylyshyn and Marcus long ago argued convincingly that connectionism did not
successfully accommodate representations (and, recall, connectionists agreed that their theories dumped
representations) and that this was a serious problem for connectionist/neural
net architectures. Gallistel further argued that neural nets were poor models
of the brain (i.e. not only of the mind) because they embody a wrong
conception of memory, one that makes it hard to read/write/retrieve
complex information (data structures) in usable form. This, Gallistel noted,
starkly contrasts with more classical architectures. The combined Fodor-Pylyshyn-Marcus-Gallistel
critique then is that connectionist/neural net theories were a wrong turn
because they effectively eschewed representations
and that this is a problem both from the cognitive and the neuro perspective.
The Google Nature paper effectively
concedes this point, recognizes that representations (i.e. “complex data
structures”) are critical and resolves
the problem by adding a classical RAM to a connectionist front end.

However, there is a second feature of most connectionist approaches that is also wrong. Most such
architectures are associationist. They embody the idea that brains are entirely
structured by the properties of the inputs to the system. As PT puts it (2):

Associationism has come in
different flavors since the days of Skinner, but they all share the fundamental
aversion toward internally adding structure to contingencies in the world
(Gallistel and Matzel 2013).

Yes! Connectionists are weirdly attracted to associationism
as well as to rejecting representations. This is probably not that surprising.
Once one thinks of representations then it quickly becomes clear that many of
their properties are not reducible to statistical properties of the inputs.
Representations have formal properties above and beyond what one finds in the
input, which, once you look, are found to be causally efficacious. However,
strictly speaking associationism and anti-representationalism are independent
dimensions. What makes Behaviorists distinctive among Empiricists is their
rejection of representations. What unifies all Empiricists is their endorsement
of associationism. Seen from this perspective, Gallistel and Fodor and Pylyshyn
and Marcus have been arguing that representations are critical. The Google paper
agrees. This still leaves associationism, however, a position the Googlers embrace.[3]

So is this a step forward? Yes. It would be a big step
forward if the information processing/representational model of the mind/brain
became the accepted view of things, especially in the brain sciences. We could
then concentrate (yet again) all of our fire on the pernicious Empiricism that so many
Cog-neuro types embrace.[4]
But, little steps my friends, little steps. This is a victory of sorts. Better
to be arguing against Locke and Hume than Skinner![5]

That’s it. Take a look.

[1]
Thx to Chris Dyer for bringing the paper to my attention. I put the URL in
rather than linking to the paper directly as the linking did not seem to work.
Sorry.

[2]
Redolent of a competence/performance distinction, isn’t it? The physiological bases of memory should not
be confused with the physical bases for the deployment of memory.

[3]
I should add that it is not clear that the Googlers care much about the
cog-neuro issues. Their concerns are largely technological, it seems to me. They
live in a Big Data world, not one where PoS problems (are thought to) abound.
IMO, even in a uuuuuuge data environment, PoS issues will arise, though finding
them will take more cleverness. At any rate, my remarks apply to the Google
model as if intended as a cog-neuro
one.

[4]
And remember, as Gallistel notes (and PT emphasizes) much of the connectionism
one sees in the brain sciences rests on thinking that the physiology has a
natural associationist interpretation psychologically. So, if we knock out one
strut, the other may be easier to dislodge as well (I know that this is wishful
thinking btw).

[5]
As usual, my thinking on these issues was provoked by some comments by Bob
Berwick. Thx.

Saturday, November 12, 2016

Here’s an
interesting piece by Nick Evans on the indigenous languages of Australia. It is
imbued with a sensibility concerning the study of language quite different than
my own (which is partly why I found it interesting) but it also raises some
questions that someone who approaches linguistic questions from my direction
should find intriguing. In what follows I will discuss both points of con- and
di-vergence. But before starting, let me reiterate that I found the piece
intriguing and I could imagine spending quite a bit of pleasant time over
several cold beers talking to Nick about his work, which is a long-winded way
of saying that you should take a look at the piece for yourself.[1]

Some comments:

(1) Nick worries about a question whose utility from where I
sit is not at all evident: How to distinguish a language from a dialect (see 4).
This is in service of trying to establish the integrity of the Australian
language family, which is in turn in service of trying to estimate how fast
languages change and how old language families are. The idea that Nick moots is
that the Australian language family is 60,000 years old and that this raises
the possibility that the emergence of the Faculty
of Language is much older still. In other words, Nick takes the dating of
the language family question as bearing on the emergence of the FL question.
Clearly, the second one is of interest to devotees of the Minimalist Program.

However, I am not sure that I would take the question as
nearly as well posed as Nick does. I do not see that there is a principled way
of distinguishing languages from dialects. The one that he proposes is the
following: “a language is something that is distinct enough to needs its own
distinctive descriptive grammar” (5). But what does ‘distinct enough’ mean?
Darn if I know. For me a G is a mental construct. It is almost certain that no
two Gs are the same (i.e. no two people have exactly the same Gs). So the question is one of more or less. But
so far as I know this becomes a question of G overlap, and the degree of overlap
will not be precise. We need some measure of this to see how different two
Gs are so as to get a measure of G difference and, hence, change. Maybe such measures exist, but I know of
none (recall, I am no expert on these matters), and unless one specifies some dimensions of similarity
the rate of change issue
becomes hard to specify.
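To see what "specifying some dimensions of similarity" would even involve, here is a deliberately crude toy. It assumes, for illustration only, that a G can be written down as a set of binary parameter settings, and it counts the fraction of shared parameters on which two Gs differ. The parameter names are hypothetical placeholders and nothing here is an established measure.

```python
def parameter_distance(g1, g2):
    """Fraction of shared parameters on which two toy grammars differ."""
    shared = g1.keys() & g2.keys()
    if not shared:
        raise ValueError("the two grammars share no parameters")
    return sum(g1[p] != g2[p] for p in shared) / len(shared)

# Hypothetical parameter settings, purely for illustration
g_a = {"head_initial": True, "null_subject": False, "wh_movement": True}
g_b = {"head_initial": True, "null_subject": True, "wh_movement": True}

distance = parameter_distance(g_a, g_b)  # differ on one of three parameters
```

Even this toy makes the problem plain: the number you get depends entirely on which dimensions you choose to list and on treating them all as equally weighty, and that is precisely the part nobody has specified.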

This said, if we could
establish a rate of G change then this might be useful in establishing how old
FL is, and given that the only evidence we have for when it emerged is indirect
(the emergence of complex cultural artifacts (i.e. the big bang)) this would be
useful. That said, I doubt that it would significantly alter the backdrop for
Darwin’s Problem as it applies to language. The big fact is that FL appeared
more or less in one piece and it has not evolved since. There is no indication from what Nick writes
that these older Gs are qualitatively different from contemporary ones. This
means that the FL required to acquire them is effectively the same as the one
that we still possess. And if that is the case then the logic of Darwin’s
Problem as it applies to MP remains unchanged. So far as someone with my
interests is concerned, that is enough.

Let me add a question before moving on: is there a measure
of G change (or, more ambitiously, of the rate
of G change) out there? Note that this
would be a measure of how Gs of the same
language change. This seems to require reifying languages so that two Gs can be Gs of the same language even if
different in detail. So far as I know, modern GG has only an inchoate
qualitative purchase on the notion of a language, and it has not been important
to make it more precise. In fact, it is part of a dispensable idealization
concerning ideal speaker-hearers. Nick’s project requires theoretically grounding
the informal notion sufficient for most GG inquiries. I am skeptical, but wish
him luck.

(2) Nick raises a second question: why are there so many
languages anyhow (8ff)? He asks this in
order to focus efforts on identifying “the social processes that drive
differentiation.” I also find this question interesting, but in a slightly
different way. From my perspective, Gs
are products of three factors: (i) the structure of FL/UG, (ii) the nature of
the PLD (the input data that the LAD uses to construct its G given the options
FL/UG allows) and (iii) the learning theory that LADs use to organize the PLD
and to construct a particular G given (i) and (ii).[2]
The question I find interesting is why FL/UG makes so many Gs available. Why
not simply hardwire in one G and be done with it? Why is FL/UG so open textured
and environmentally sensitive (i.e. open to the effects of PLD)? Note, that
FL/UG could have specified one G in
the species (say all Gs have more or less the syntax of “English”). This is
roughly what happens in some songbirds: all birds of a species sing the same
song. Why isn’t this what happened for language? In P&P terms this would
mean an FL/UG with no parameters. Why don’t we have this? And does the fact that we don’t have this
tell us anything interesting about FL/UG?

There are
several possibilities. Mark Baker has offered a kind of evolutionary rationale.
He thinks that Gs are codes that enable speakers of the same language to conceal information from outsiders (here:8):

Suppose
that the language faculty has a concealing function as well as a revealing
function. Our language faculty could have the purpose of communicating complex
propositional information to collaborators while concealing it from rivals that might be
listening in.

I say evolutionary, for I am assuming that it is because
concealment can confer selective advantages that we have such a code. Though an
ingenious idea, I am skeptical for the obvious reasons. This parameterized
coding scheme is now species-wide and anyone can acquire any of the coding schemes
(aka Gs) if placed in the right linguistic environment. If the goal was opacity useful for segregating
in-groups from out-groups, then one can imagine that schemes making it impossible (or at least very difficult)
for outlanders to acquire the code would have been a superior option. But so
far as we can tell, all humans are equally adept at learning any G (i.e. set of
parameter values). Perhaps what Mark has in mind is that it is hard to learn a
non-native G later in life and this suffices for whatever advantages concealment
promotes. Maybe.

I have remarked before that parametrization is a very
curious fact (if it is a fact) (here),
one that suggests that, contrary to standard assumptions, typological
differences tell us very little about the structure of FL. However, putting this
to one side, it is interesting that Gs can be so different and Nick’s question
of why there is so much variation is a good one.

What’s his answer? There are social processes that drive
differentiation and we need to identify these. He suggests two steps (8-9):

The first step is to see how new linguistic elements
are born: new sounds, new grammatical structures, new words, new meanings. What
makes the range of these more or less diverse in different groups? For example,
does being multilingual add options to the pool? ...

The second step is to find how the society promotes
one variant over another. It is clear that some groups have linguistic
ideologies that place a high premium on harnessing linguistic means to say “Our
clan is different”, “our moiety is different” and so on…

This might be right so far as it goes, but it presupposes
that FL/UG allows all of these
options to begin with. In other words, given
that FL allows diverse Gs, what drives the specific diversity we see? Baker
and I are interested in another question: why does FL allow the diversity to
begin with? What’s wrong with an FL that, as it were, had no parameters at all?

Here’s my thought: an FL with fixed parameters is more
biologically expensive than an open textured one. The idea is that if evolution
can rely on there always being enough PLD to allow a child to acquire the local
G then there is no reason for evolution to code information in the genome that
the PLD makes readily available. If fixing info in the genome is costly then it
will not be put there unless it must be. So, an open textured system is what we
should expect. That’s the idea.

I think that this fits pretty well with MP thinking as
well. Suppose that what allows FL to emerge is a small
addition, say an operation like Merge (an addition that remains very stable
and unchanging over time). Given that Merge is consistent with various
surface differences, then so long as the non-linguistically-proprietary parts of FL
suffice, together with Merge, to generate Gs, we should not expect more linguistically proprietary info to be
biologically coded. If Merge is enough, then it’s all that we will get. Note
that this suggests that MP-like systems will not likely have an FL/UG
specification of a particular parameter space (see here
and here
for some discussion). If this can be fleshed out, then the reason we have G
diversity is that fixed parameters are costly and MP takes FL to be what we get
when we add only a smidgen of linguistically proprietary structure to an otherwise
language-ready cognitive system. In other words, typological diversity (PLD
sensitive G generation) is just what MP ordered.

(3) Nick provides sort
of an antidote for my tolerance for inferring UG principles from the
properties of a single G. As he puts it (12-3):

We are just coming out of half a century where generative
linguistics, as inspired by the great linguist Noam Chomsky, placed great
emphasis on ‘Universal Grammar’, very much seeing all languages as alike with
only minor variations. Part of this emphasis meant claiming there are all sorts
of imaginable design options that are simply not found in language. For
example, Steven Pinker and Paul Bloom wrote, in the early 90s, that ‘‘no
language uses noun affixes to express tense’’. Now clearly this is simply wrong
for Kayardild. It is an example of what can go wrong, scientifically, when one
extrapolates prematurely from too limited a range of cases. Now there’s nothing
wrong with the scientific strategy of making strong statements to invite
falsification. But what Kayardild shows us – and many other languages I could
have used to illustrate the structural originality of Australian languages, in
different ways – is that we really need to get out there and describe
languages, as they are, to realize the full richness and diversity of how
humans have colonized the design space of language through the languages they
have built through use.

I say “sort of” because Nick’s observations are not
couched in terms of Gs but in terms of languages and the problems he cites have
less to do with the properties of Gs than with their surface manifestations.
Chomsky did not (and does not) see “all languages alike.” What he saw/sees was/is
that all I-languages are pretty much alike. Missing the ‘I’ prefix
threatens confusing Chomsky for Greenberg. I can understand that if one’s interests
are mainly typological and diversity is what gets you excited then
dropping the ‘I’ will seem like the best way to import Chomsky’s insights into
your work. But this is a mistake (as you knew I would say). It is not the
diversity of languages that we need to investigate if our goal is GGish, but
the diversity of Gs, and these will only be indirectly related to the surface
patterns we observe. The Pinker-Bloom example is very much a Greenberg
conception of a universal, at least as Nick takes it to be refuted by Kayardild (it
appears to deal with features of overt affixes). If we are to learn about FL/UG
by exploring the rich “design space of language” then we need to keep in mind
that it is I-language space we should
be exploring. Moreover, when it comes to I-language
space I am less sure than Nick is that

[t]he world of languages holds more possibilities
than any linguist has imagined, and Australian languages have taken the ‘design
space’ in lots of rare and unusual directions, so that we’re still finding new
phenomena that people hadn’t imagined before (14).

In fact, from where I sit, we have actually found
relatively few new universals since
the mid 1980s. If this is correct, oddly, exploring the ‘design space’ has
enriched our understanding of language diversity but has left our understanding
of I-language variation pretty much
where it was when only a small number of languages served as linguistic model
organisms.[3]

That’s it. I think that Nick has asked some interesting
questions, the most interesting being why FL/UG allows G variation. We are
interested in different things, but the paper was fun to read and Kayardild sounds
like it can take you on a wild ride. Like I said, I’d love to have a beer with
him.

[3]
See
here for a partial list. The observant reader will note that most of these
are very old. It would be nice to have some candidate universals that are of
more recent vintage, say discovered in the last 20 years. If my hunch is right
that contributions to the list have been sparse of late, this is
interesting and worth trying to understand.