
Monday, June 12, 2017

Face it, research is tough

Research is tough. Hence, people look for strategies. One close
to my heart is not that far off from the one Tom Lehrer identified here. Mine is not quite
the same, but close. It involves leading a raiding party into nearby successful
environs, stripping them naked of any and all things worth stealing, and then
repackaging the loot in one’s favorite theoretical colors. Think of it as the
intellectual version of a chop shop. Many have engaged in this research strategy.
Think of the raids into Relational Grammar wrt unaccusativity, psych verbs, and
incorporation. Or the fruitful “borrowing” of “unification” in feature checking
to allow for crash-proof Gs. Some call this fruitful interaction, but it is largely
thievery, albeit noble theft that leaves the victim unharmed and the
perpetrator better off. So, I am all for it.

This kind of activity is particularly rife within the cog-neuro
of my acquaintance. One of David Poeppel’s favorite strategies is to appropriate
any good idea that the vision people come up with and retool it so that it can
apply to sound and language. The trick to making this work is to find the right
ideas to steal. Why risk it if you are not going to strike gold? This means
that it is important to keep one’s nose in the air so as to smell out the nifty
new ideas. For people like me, better it be someone else’s nose. In my case,
Bill Idsardi’s. He just pointed me to a
very interesting paper that you might like to take a look at as well. It’s on
face recognition, written by Chang and Tsao (C&T) and appeared in Cell (here) and
was reprised in the NYT (here).

What does it argue? It makes several interesting points.

First, it argues that face recognition is not based on exemplars. Exemplar theory
goes as follows according to the infallible Wikipedia (here):

Exemplar theory is a proposal concerning the
way humans categorize objects and ideas in psychology. It argues that individuals
make category judgments by comparing new stimuli with
instances already stored in memory. The instance stored in memory is the “exemplar.” The new stimulus is assigned to a
category based on the greatest number of similarities it holds with exemplars in
that category. For example, the model proposes that people create the
"bird" category by maintaining in their memory a collection of all
the birds they have experienced: sparrows, robins, ostriches, penguins, etc. If
a new stimulus is similar enough to some of these stored bird examples, the
person categorizes the stimulus in the "bird" category. Various versions of the exemplar
theory have led to a simplification of thought concerning concept learning,
because they suggest that people use already-encountered memories to determine
categorization, rather than creating an additional abstract summary of
representations.

It is a very popular (in fact way too popular) theory in
psych and cog-neuro nowadays. In case you cannot tell, it is redolent of a
radical kind of Empiricism and, not surprisingly perhaps, given bedfellows and
all that, a favorite of the connectionistically inclined. At any rate, it works
by more or less “averaging” over things you’ve encountered experientially and
categorizing new things by how close they come to these representative
examples. In the domain of face recognition, which is what C&T talks about,
the key concept is the “eigenface” (here) and you can see some
of the “averaged” examples in the Wikipedia piece I linked to.
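The core of the exemplar story fits in a few lines of code: categorize a new stimulus by its similarity to stored instances, with no abstract summary anywhere. Here is a toy sketch; the `exemplar_categorize` function, the Euclidean similarity measure, and the bird/fish vectors are my own illustrative inventions, not anything from C&T or the Wikipedia entry:

```python
import numpy as np

def exemplar_categorize(stimulus, exemplars):
    """Assign `stimulus` to the category of its most similar stored exemplar.

    `exemplars` maps category name -> list of remembered instances
    (feature vectors). Similarity here is just negative Euclidean
    distance, a toy stand-in for whatever measure the theory assumes.
    """
    best_cat, best_sim = None, -np.inf
    for category, instances in exemplars.items():
        for inst in instances:
            sim = -np.linalg.norm(np.asarray(stimulus) - np.asarray(inst))
            if sim > best_sim:
                best_cat, best_sim = category, sim
    return best_cat

# Toy memory store: each remembered instance is a (wingspan-ish, fin-ness) pair.
memory = {
    "bird": [[1.0, 0.1], [0.9, 0.2]],
    "fish": [[0.1, 1.0], [0.2, 0.8]],
}
print(exemplar_categorize([0.8, 0.3], memory))  # a sparrow-ish stimulus -> "bird"
```

Note what is absent: no feature axes, no generative model of the category, just proximity to things previously encountered. That absence is exactly what C&T goes after.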

C&T argues that this way of approaching face
categorization is completely wrong.

In its place C&T proposes an axis theory, one in which
abstract features based on specific facial landmarks serve as the representational
basis of face categorization. The paper identifies the key move as “first
aligning landmarks and then performing principal component analysis separately
on landmark positions and aligned images” rather than “applying principal
component analysis on the faces directly, without landmark alignment” (1026).
First come the basic abstract features, and then face analysis proceeds wrt them, rather than
analysis operating on face perceivables directly (with the intent, no doubt, of
distilling out features). C&T argues that the abstracta come first, with the
relevant faces generated from them, rather than the faces coming first and
being used to generate the relevant features.[1]
Need I dwell on E vs R issues? Need I mention how familiar this kind of
argument should sound to you? Need I mention that once again the Eish fear of
non-perceptually grounded features seems to have led in exactly the wrong
direction wrt a significant cognitive capacity? Well, I won’t mention any of
this. I’ll let you figure it out for yourself!
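The contrast C&T identifies can be put schematically: run PCA on landmark positions and on landmark-aligned images separately, versus running eigenface-style PCA on the faces directly. Below is a minimal sketch with random stand-in data; the array sizes, the random matrices, and the `pca_axes` helper are illustrative assumptions on my part, not the paper’s actual pipeline:

```python
import numpy as np

def pca_axes(data, k):
    """Return the top-k principal axes of `data` (rows = samples)."""
    centered = data - data.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

rng = np.random.default_rng(0)
n_faces = 40
landmarks = rng.normal(size=(n_faces, 10))        # stand-in: 5 (x, y) landmark positions
aligned_pixels = rng.normal(size=(n_faces, 100))  # stand-in: appearance after alignment

# C&T-style: separate "shape" axes (from landmarks) and "appearance"
# axes (from the landmark-aligned images)...
shape_axes = pca_axes(landmarks, k=3)
appearance_axes = pca_axes(aligned_pixels, k=3)

# ...versus the eigenface-style move: PCA on the raw faces directly,
# which mixes shape and appearance into one undifferentiated set of components.
raw = np.hstack([landmarks, aligned_pixels])
eigenface_axes = pca_axes(raw, k=6)
print(shape_axes.shape, appearance_axes.shape, eigenface_axes.shape)
```

The point of the ordering is visible in the code: the abstract axes are computed over aligned, landmark-based representations first, and faces are then analyzed wrt those axes, rather than hoping the right features fall out of the raw perceivables.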

Second, the paper demonstrates that with the right features
in place it is possible to code for faces with a very small number of neurons;
roughly 200 cells suffice. As C&T observes, the right code allows for a
very efficient (i.e. a small number of units suffice), flexible (it allows for
discrimination along a variety of different dimensions) and robust (i.e. axis
models perform better in noisy conditions) neuro system for faces. As C&T
puts it:

In sum, axis coding is more
flexible, efficient, and robust to noise for representation of objects in a
high-dimensional space compared to exemplar coding. (1024)

This should all sound quite familiar as it resonates with
the point that Gallistel has been making for a while concerning the intimate
relation between neural implementation and finding the correct “code” (see here).
C&T fits nicely with Gallistel’s observations that the coding problem
should be at the center of all current cog-neuro. It adds the following useful
codicil to Gallistel’s arguments: even absent
a proposal as to how neurons implement the relevant code, we can find
compelling evidence that they do so and that getting the right code has
immediate empirical payoffs. Again C&T:

This suggests the correct choice of
face space axes is critical for achieving a simple explanation of face cells’
responses. (1022).

C&T also relates to another of Gallistel’s points. The relevant
axis code lives in individual neurons. C&T is based on single neuron
recordings that get “added up” pretty simply. A face representation ends up
being a linear combination of feature values along 50 dimensions (1016). Each
combination of values delivers a viable face. The linear combo part is
interesting and important for it demystifies the process of face recognition,
something that neural net models typically do not do. Let me say a bit more here.
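A back-of-the-envelope sketch of what such a linear code buys you: if each cell’s response is a linear projection of the face’s coordinates in the feature space, the whole face can be read back out of the population with plain linear algebra. The numbers of cells and dimensions below echo the paper, but the random “preferred axes” and the decoding setup are illustrative assumptions of mine, not C&T’s measured data:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_dims = 200, 50

# Hypothetical axis model: each cell fires in proportion to the face's
# projection onto that cell's preferred axis in the 50-d face space.
preferred_axes = rng.normal(size=(n_cells, n_dims))
face = rng.normal(size=n_dims)           # a face = a point in feature space
responses = preferred_axes @ face        # linear, per-cell coding

# Because the code is linear, the face can be recovered from the
# population by ordinary least squares -- no black box required.
decoded, *_ = np.linalg.lstsq(preferred_axes, responses, rcond=None)
print(np.allclose(decoded, face))  # True
```

That transparency is the interesting part: a linear combination over a known set of axes is exactly the kind of code you can analyze, unlike the opaque internals of a trained net.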

McClelland and Rumelhart launched the connectionist (PDP)
program when I was a tyke. The program was sold as strongly
anti-representational and anti-reductionist. Fodor & Pylyshyn and Marcus
(among others) took on the first point. Few took on the second, except to note
that the concomitant holism seemed to render hopeless any attempt to analytically understand
the processes the net modeled. There was more than a bit of the West Coast
holistic vibe in all of this. The mantra was that only the whole system
computed and that trying to understand what is happening by resolving it into
the interaction of various parts doing various things (e.g. computations) was
not only hopeless, but even wrongheaded. The injection of mystical awe was part
of the program (and a major selling point).

Now, one might think that a theory that celebrated the
opacity of the process and denigrated the possibility of understanding would,
for that reason alone, be considered a non-starter. But you would have been
wrong. PDP/Connectionism shifted the aim of inquiry from understanding to
simulation. The goal was no longer to comprehend the principles behind what was
going on, but to mimic the cognitive capacity (more specifically, the I/O
behavior) with a neural net. Again, it
is not hard to see the baleful hand of Eish sympathies here. At any rate, C&T pushes back against this
conception hard. Here is Tsao being quoted in the NYT:

Dr. Tsao has been working on face cells for 15 years and views
her new report, with Dr. Chang, as “the capstone of all these efforts.” She
said she hoped her new finding will restore a sense of optimism to
neuroscience.

Advances in machine learning have been made by training a
computerized mimic of a neural network on a given task. Though the networks are
successful, they are also a black box because it is hard to reconstruct how
they achieve their result.

“This has given neuroscience a sense of pessimism that the brain
is similarly a black box,” she said. “Our paper provides a counterexample.
We’re recording from neurons at the highest stage of the visual system and can
see that there’s no black box. My bet is that that will be true throughout the
brain.”

No more black box and the mystical holism of PDP. No more
substituting simulation for explanation. Black box connectionist models don’t
explain and don’t do so for principled reasons. They are what one resorts to in
lieu of understanding. It is obscurantism raised to the level of principle.
Let’s hear it for C&T!!

Let me end with a couple of remarks relating to extending
C&T to language. There are lots of ling domains to which one might think of applying
the idea that a fixed set of feature parameters would cover the domain of
interest. In fact, good chunks of phonology can be understood as doing for ling
sounds what C&T does for faces, and so extending their methods would seem
apposite. But, and this is not a small but, the methods used by C&T might
be difficult to copy in the domain of human language. The method used, single-neuron
recording, is, ahem, invasive. What is good for animals (i.e. that we
can torture them in the name of science) is difficult when applied to humans
(thx IRB). Moreover, if C&T is right, then the number of relevant neurons
is very small. 200 is not a very big neural number, and a number this size cannot
be detected using other methods (fMRI, MEG, EEG) for they are far too gross.
They can locate regions with 10s of thousands of signaling neurons, but they,
as yet, cannot zero in on a couple of hundred. This means that the standard
methods/techniques for investigating language areas will not be useful if something like what C&T found
regarding faces extends to domains like language as well. Our best hope is that
other animals have the same “phonology” that we do (I don’t know much about
phonology, but I doubt that this will be the case) and that we can stick
needles into their neurons to find out something about our own. At any rate, despite the conceptual fit, some
clever thinking will be required to apply C&T methods to linguistic issues,
even in natural fits like phonology.

Second, as Ellen Lau remarked to me, it is surprising that so
few neurons suffice to cover the cognitive terrain. Why? Because the six brain
patches containing these kinds of cells have 10s of thousands of neurons each.
If we only need 200 to get the job done, then why do we have two orders of
magnitude more than required? What are all those neurons doing? It makes sense
to have some redundancy built in. Say
five times the necessary capacity. But why 50 times (or more)? And is redundancy really a biological
imperative? If it were, why only one heart, liver, pancreas? Why not three or
five? At any rate, the fact that 200 neurons suffice raises interesting
questions. And the question generalizes: if C&T is right that faces are
models of brain neuronal design in general, then why do we have so many of the damn
things?

That’s it from me. Take a look. The paper and NYT piece are
accessible and provocative and timely. I think we may be witnessing a turn in
neuroscience. We may be entering a period in which the fundamental questions in
cognition (i.e. what’s the right computational code and what does it do?) are
forcing themselves to center stage. In other words, the dark forces of Eism are
being pushed back and an enlightened age is upon us. Here’s hoping.

[1]
We can put this another way. Exemplar theorists start with particulars and
“generalize” while C&T start with general features and “particularize.” For
exemplar theorists at the root of the general capacity are faces of particular
individuals and brains first code for
these specific individuals (so-called Jennifer Aniston cells) and then use
these specific exemplars to represent other faces via a distance measure to
these exemplars (1020). The axis model denies that there are “detectors for
identities of specific individuals in the face patch system” (1024). Rather
cells respond to abstract features with specific individual faces represented
as a linear combination of these features. Individuals on this view live in a
feature space. For the Exemplar theorist the feature space lives on
representations for individual faces. The two approaches identify inverse
ontological dependencies, with the general features either being a function of (relevant)
particulars or particulars being instances of (relevant) combinations of
general features. These dueling conceptions of what is ontologically primary,
the singular or the general, have been a feature of E/R debates since Plato and
Aristotle wrote the books to which all later thinking is a footnote.

1 comment:

A couple of weeks ago on a previous blog post I wrote this comment, which seems just as relevant here:

When neuroscientists try and figure out the neural code for spatial and conceptual navigation, they go way beyond correlational analysis of oscillatory entrainment to external stimuli (as in Ding et al and much other work). They also look at what's going on inside the rest of the brain, examining cross-frequency couplings (like phase-amplitude coupling), spike time coordination, cellular analysis, etc.

Take this recent study by Constantinescu et al (http://science.sciencemag.org/content/352/6292/1464.long). They show that the neural code which has long been implicated in spatial navigation may also be implicated in navigating more abstract representations, such as conceptual space (recent work also points to the same code being implicated in navigating auditory space, too).

This work should be of exceptional interest to linguists. If this is how the brain interprets basic relations between conceptual representations, then we should probably put aside the Jabberwocky EEG studies and eye-tracking experiments for a little while (important though they may be) and engage in these sorts of emerging frameworks.

Instead of claiming that some region of interest in the brain (or some oscillatory band) is responsible for some complex process (e.g. "semantic composition is implemented via gamma increases", "syntax is represented in anterior Broca's area", and "my Top 10 Tom Cruise movies are stored in the angular gyrus"), exploring the neural code is of much greater importance and urgency. This is something Gallistel actually stressed at CNS recently.

Final implication for Merge, the "Basic Property", and other rhetorical and computational constructs: The Constantinescu study actually reflects a more general trend in neurobiology these days. Things that were once deemed highly domain-specific are now being understood to implement much more generic computations, and the only domain-specific things left are the *representations* these computations operate over. In other words, good luck trying to find the "neural correlates of Merge" if you only have your Narrow Syntax glasses on.

My bets on "the right computational code": https://www.ucl.ac.uk/pals/research/linguistics/research/uclwpl/wpl/16papers/UCLWPL_2016_Murphy.pdf