Friday, August 30, 2013

Bob Berwick is planning to write something erudite about
Bayes in a forthcoming post. I cannot do this, for obvious reasons. But I can
throw oil on the fires. The following is my reaction to a paper whose critiques Ewan suggested that I read. I doubt that his advice had the
intended consequence. But, as Yogi Berra observed, it’s always hard to accurately
predict how things will turn out, especially in the future. So here goes.

In an effort to calm my disquiet about Bayes and his
contemporary acolytes, Ewan was kind enough to suggest that I read the comments
to this
B&BS (is it just me, or does the acronym suggest something about the
contents?) target article. Of course,
being a supremely complaisant personality, I immediately did as bid, trawling
through the commentaries and even reading the main piece and the response of
the authors to the critics’ remarks so that I could appreciate the subtleties
of the various parries and thrusts.
Before moving forward, I would like to thank Ewan for this tip. The paper is a lark, the comments are
terrific and, all in all, it’s just the kind of heated debate that warms my cold
cynical heart. Let me provide some of my
personal highlights. But, please, read this for yourself. It’s a page-turner,
and though I cannot endorse their views due to my incompetence, I would be lying if I denied being comforted by their misgivings. It’s always
nice to know you are not alone in the intellectual universe (see here).

The main point that the authors Jones and Love (J&L)
make is that, as practiced, there’s not much to current Bayesian analyses,
though they are hopeful that this is repairable (in contrast to many of the
commentators who believe them to be overly sanguine, e.g. see Glymour’s comments. BTW, no
fool he!). Indeed, so far as I can tell, they suggest that this is no surprise, for Bayes as such is little more than a pretty simple weighted voting scheme for determining which among a set of given alternatives best fits the data (see J&L’s section 3). There is some
brouhaha over this characterization by the law firm of Chater, Goodman, Griffiths, Kemp, Oaksford and Tenenbaum (they charge well over $1000 per probable hour, I hear), but J&L stick to their guns and characterization (see p. 219), claiming that the sophisticated machinery that Chater et al. advert to “introduces little added complexity” once the mathematical fog is cleared (219).

So, their view is that Bayesianism per se is pretty weak stuff. Let me explain what I take them to
mean. J&L note (section 3 again) that there are two parts to any Bayesian
model, the voting/counting scheme and the structure of the hypothesis space.
The latter provides the alternatives voted on and a weighting of the votes
(some alternatives are given head starts). Now, Bayes’ Rule (BR) is a
specification of how votes should be allocated as data comes in. The hypothesis space is where the real heavy
lifting is done. In effect, in J&L’s view (and they are by no means the
most extreme voices here, as the comment sections show) BR, and modern souped-up
versions thereof, add very little of explanatory significance to the mix. If so,
J&L observe, then most of the psychological interest of Bayesian models
resides in the structure of the assumed hypotheses spaces, i.e. whatever
interesting results emerge from a Bayesian model, stem not from the counting
scheme but the structure of the hypothesis space. That’s where the empirical meat lies:

All a Bayesian model does is
determine which of the patterns or classes of patterns it is endowed with is
most consistent with the data it is given. Thus, there is no explanation of
where those patterns (i.e. hypotheses) come from. (220)

This is what I meant by saying that, in J&L’s view,
Bayes, in and of itself, amounts to
little more than the view that “people use past experience to decide what to do
or expect in the future” (217). In and of
itself Bayes does not specify or bound the class of possible or plausible
hypothesis spaces and so in and of itself
it fails to make much of a contribution to our understanding of mental life.
Rather, in and of itself, Bayesian
precepts are anodyne: who doesn’t think that experience matters to our mental
life?
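
To make this concrete, here is a toy sketch in Python of Bayes’ Rule as a weighted voting scheme. To be clear: this is my illustration, not J&L’s, and the coin hypotheses, priors and data are all invented. Watch where the work gets done.

def update(posterior, likelihoods, datum):
    # One application of Bayes' Rule: reweight each hypothesis's "votes"
    # by how well it predicted the incoming datum, then renormalize.
    new = {h: p * likelihoods[h][datum] for h, p in posterior.items()}
    total = sum(new.values())
    return {h: p / total for h, p in new.items()}

# The hypothesis space: each hypothesis assigns a probability to each possible
# datum. These three coin hypotheses are made up purely for illustration.
likelihoods = {
    "fair":       {"H": 0.5, "T": 0.5},
    "heads-bias": {"H": 0.8, "T": 0.2},
    "tails-bias": {"H": 0.2, "T": 0.8},
}

# The priors: head starts for some alternatives.
posterior = {"fair": 0.6, "heads-bias": 0.2, "tails-bias": 0.2}

for datum in "HHTHHHHT":  # the data, arriving over time
    posterior = update(posterior, likelihoods, datum)

print(posterior)  # most of the mass has migrated to "heads-bias"

The update function is a few lines of arithmetic; which patterns can be entertained at all is fixed entirely by the hand-built likelihoods table and priors. That, as I read them, is exactly J&L’s complaint.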

This view is, needless to say, heartily contested. Or so it
appears on the surface. So Chater et al. assert that:

By adopting appropriate representations of a problem
in terms of random variables and probabilistic dependencies between them,
probability theory and its decision theoretic extensions offer a unifying framework
for understanding all aspects of cognition that can be properly understood as
inference under uncertainty: perception, learning, reasoning, language
comprehension and production, social cognition, action planning, and motor
control, as well as innumerable real world tasks that require the integration
of these capacities. (194)

Wow! Seems really opposed to J&L, right? Well, maybe not. Note the first seven words of the quote (“By adopting appropriate representations of a problem”). Take the right representation of the problem, add a dash of BR, and out pops all of human psychology. Hmm. Is this antithetical to J&L’s
claims? Not until we factor out how much
of the explanation in these domains comes from the “appropriate
representations” and how much from the probability add-on. Nobody (or at least nobody I know) has any
problem adding probabilities to mentalist theories, at least not in principle
(one always wants to see the payoff). However, if we were to ask where the hard
work comes in, J&L argue that it’s in choosing the right hypothesis space
and not in probabilizing up a given such space.
Or this is the way it looks to J&L and many, many, many, if not most,
of the other commentators.

Let me note one more thing before ending. J&L also pick
up on something that bothered me in my earlier post. They observe more than a
passing resemblance between modern Bayesians and earlier Behaviorism (see their
section 4). They assert that in many
cases, the hypotheses that populate Bayesian spaces “are not psychological
constructs… but instead reflect characteristics of the environment. The set of
hypotheses, together with their prior probabilities, constitute a description
of the environment by specifying the likelihood of all possible patterns of
empirical observations (e.g. sense data)” (175). J&L go further and claim
that in many cases, modern Bayesians are mainly interested in just covering the
observed behavior, no matter how it is done. Glymour dubs this “Osiander’s
Psychology,” the aim being to “provide a calculus consistent with the
observations” and nothing more. At any
rate, it appears that there is a general perception out there that in practice Bayesians have looked to
“environmental regularities” rather than “accounts of how information is
represented and manipulated in the head” as the correct bases of optimal
inference.

Chater et al. object to this characterization and allow for “mental states over which such [Bayesian] computations exist…” (195). This need not invalidate J&L’s main point, however. The problem with Behaviorism
was not merely that it eschewed mental states, but that it endorsed a radical
form of associationism. Behaviorism is the natural end point of radical
associationism, for the view postulates that mental structures largely reflect
the properties of environmental regularities. If this is correct, then it is
not clear what adding mental representations buys you. Why not go directly from
regularities in the environment to regularities in behavior and skip the
isomorphic middle man?

It is worth noting that Chater et al. seem to endorse a
rough vision of this environmentalist project, at least in the domains of vision
and language. As they note, “Bayesian approaches to vision essentially involve careful analysis of the structure of the visual environment” and “in the context of language acquisition” Bayesians have focused on “how learning depends on the details of the ‘linguistic environment,’ which determines the linguistic structures to be acquired” (195).

Not much talk here of structured hypothesis spaces for
vision or language, no mention of Ullman-like Rigidity Principles or principles
of UG. Just a nod towards structured environments and how they drive mental
processing. Nothing prevents Bayesians from including these, but there seems to be a predisposition to focus on environmental influences. Why?
Well, if you believe that the overarching “framework” question is how
data (i.e. environmental input) moves you around a hypothesis space, then maybe
you’ll be more inclined to downplay the role of the structure of that space and
highlight how input (environmental input) moves you around it. Indeed, a
strong environmentalism will be attractive to you if you believe this. Why? Well, given this assumption, mental structures are just reflections of environmental regularities and, if so, the name of the psychological game will be explaining how data are processed to identify these regularities.
No need to worry about the structure of hypothesis spaces, for they are
simple reflections of environmental regularities, i.e. regularities in the
data.

Of course, this is not a logically
necessary move. Nothing in Bayes requires that one downplay the importance of hypothesis
spaces, but one can see, without too much effort, why these views will live
comfortably together. And it seems that Chater et al., the leading Young Bayesians, have no trouble seeing the utility of structured environments to the Bayesian project. Need I add that this is the source of the unease expressed in my previous post on Bayes (here)?

Let me reiterate one more point and then stop. There is no reason to think that the practice
that J&L describe, even if it is accurate, is endemic to Bayesian modeling.
It is not. Clearly, it is possible to choose hypothesis spaces that are more
psychologically grounded and then investigate the properties of Bayesian models
that incorporate these. However, if the
more critical of the commentators are correct (see Glymour, Rehder, Anderson
a.o.) then the real problem lies with the fact that Bayesians have hyped their
contributions by confusing a useful tool with a theory, and a pretty simple
tool at that. Here are two quotes
expressing this:

Rehder [i.e. in his comment, NH] goes as far as to suggest viewing
the Bayesian framework as a programming language, in which Bayes’ rule is
universal but fairly trivial, and all of the explanatory power lies in the
assumed goals and hypotheses. (218)

…that all viable approaches
ultimately reduce to Bayesian methods does not imply that Bayesian inference
encompasses their explanatory contribution. Such an argument is akin to
concluding that, because the dynamics of all macroscopic physical systems can
be modeled using Newton’s calculus, or because all cognitive models can be
programmed in Python, calculus or Python constitutes a complete and correct
theory of cognition. (217)

So, in conclusion: go read the paper, the commentaries and
the replies. It’s loads of fun. At the
very least it comforts me to know that there is a large swath of people out
there (some of them prodigiously smart) who have problems not dissimilar to
mine with the old Reverend’s modern day followers. I suspect that were the revolutionary swagger
toned down and replaced with the observation that Bayes provides one possibly useful
way for exploring how to incorporate probabilities into the mental sciences,
nobody would bat an eye. I’m pretty sure
that I wouldn’t. All that we would ask is what one should always ask: what does
doing this buy us?

Monday, August 26, 2013

It seems that I hit some button that changed the view of the blog. I have no idea what I did or how to fix it. As this doesn't bother me much, I will either find a way to revert to "spare" easily or will leave things as they are. Sorry for the new (accidental) look.

In the last chapter of Dehaene’s Reading in the Brain he speculates about one of the really big human questions: whence culture? The book’s big thesis, concentrating on reading and writing as vehicles for cultural transmission, is the Neuronal Recycling Thesis (NRT). The idea is simple:
culture supervenes on neuronal mechanisms that arose to serve other ends. Think
exaptation as applied to culture. Thus,
reading and writing are underpinned by proto-letters, which themselves live on
ecologically natural patterns useful for object recognition. So too, the hope goes, for the rest of what
we think of as culture. However, as Dehaene quickly notes, if this is the
source, and “we share most, if not all of these processors [i.e. recycled
structures NH] with other primates, why are we the only species to have
generated immense and well-developed cultures” (loc 4999). Dehaene has little
patience for those who fail to see a qualitative difference between human
cultural achievements and those of our ape cousins.

…the scarcity of animal cultures
and the paucity of their contents stand in sharp contrast to the immense list
of cultural traditions that even the smallest human groups develop
spontaneously. (loc 4999)

Dehaene specifically points to the absence of “graphic
invention” in primates as “not due to any trivial visual or motor limitation”
or to a lack of interest in drawing, apparently (loc 5020). He puts the problem
nicely:

If cultural invention stems from
the recycling of brain mechanisms that humans share with other primates, the
immense discrepancy between the cultural skills of human beings and chimpanzees
needs to be explained. (loc 5020)

He also surveys several putative answers, and finds them
wanting. His remarks on Tomasello (loc 5046-5067) seem to me quite correct, noting that though Tomasello’s mind-reading account might explain how culture spreads and how its achievements are retained cross-generationally:[1]

…it says little…about the initial
spark that triggers cultural invention. No doubt the human species is
particularly gifted at spreading culture – but it is also the only species to create culture in the first place. (loc
5067, his emphasis)

So what’s Dehaene’s proposal?

My own view is that another
singular change was needed - the capacity to arrive at new combinations of
ideas and the elaboration of a conscious mental synthesis (loc 5067).

This is quite a mouthful, and so far as I can see, what
Dehaene means by this is that our frontal lobe got bigger and that this
provided a “”neuronal workspace” whose main function is to assemble, confront,
recombine, and synthesize knowledge” (loc 5089).

I don’t find this particularly enlightening. It’s
neuro-speak for something happened, relevant somethings always involving the
brain (wouldn’t it be refreshing if every once in a while the kidney, liver or
heart were implicated!). In other words, the brain got bigger and we got
culture. Hmm. This might be a bit unfair. Dehaene does say more.

He notes that the primate cortex, in contrast to ours, is
largely modular, with “its own specific inputs, internal structure, and
outputs.” Our prefrontal areas, in contrast, “emit and receive much more diverse cortical signals” and so “tend to be less specialized.” In addition, our brains are less “modular” and have greater “bandwidth.” This works to prevent “the division of data and allows our behavior to be guided by any combination of information from past or present experience” (loc 5089).

Broken down to its essentials, Dehaene is here identifying
the demodularization of thought as the key ingredient to the emergence of
culture. As he notes (loc 5168), in this he agrees with Liz Spelke (and others)
who has argued that the general ability to integrate information across modules
is what spices up our thinking beyond what we find in other primates. Interestingly for my purposes here, Spelke
ties this capacity for cross module integration to the development of linguistic
facility (see here).

This assumption, that language is a necessary condition for the emergence of the kind of culture we see in humans, is consistent with the hypothesis Minimalists have been assuming (following people like Tattersall (here)) that the anthropological “big bang,” which occurred in the last 25,000-50,000 years, piggybacked on the emergence of FL in the last 50,000-100,000 years. Moreover,
it’s language as module buster that gets the whole amazing culture show on the
road.

But what features of language make it a module buster? What allows grammar to “assemble and
recombine” otherwise modular information? What’s the secret linguistic sauce?

Sadly, neither Dehaene nor Spelke says. Which is too bad, as my lunch buddies (thx Paul, Bill) and I have discussed this question off and on for several years now,
without a lot to show for it. However, let me try to suggest a key
characteristic that we (aka I) believe is implicated. The key is syntax!

The idea is that FL provides a general-purpose syntax for
combining information trapped within modules.
Syntax is key here, for I am assuming (almost certainly wrongly, so feel free to jump in at any point) that what makes information modular is some feature of the module-internal representations that makes it difficult for them to “combine” with extra-modular information. I say syntax because, once information trapped within a module can combine with information in another module, it appears that, more often than not, the combination can be interpreted. Thus, it’s not that the combination of modularly segregated concepts is semantically indigestible; rather, the problem seems to be getting the concepts to talk to one another in the first place, and, I take this to mean, to syntactically combine. So module busting will amount to
figuring out how to treat otherwise distinct expressions in the same way. We
need some kind of abstract feature that, when attached to an arbitrary
expression, allows it to combine with any other expression from any other
module. What we need, in effect, is what Chomsky called an “edge feature” (EF), a thingamajig that allows expressions to freely combine.

Now, if you are like me, you will not find this proposal a big step forward, for it seems more to name a solution than to provide one. After
all, what can EFs be such that they possess such powers? I am not sure, but I am pretty confident that
whatever this power is it’s purely syntactic. It is an intrinsic property of
lexical atoms and it is an inherited property of congeries of such (i.e.
outputs of Merge). I have suggested (here)
that EFs are, in fact, labels, which function to close Merge in the domain of
the lexical items (LIs). In the same place I proposed that labeling is the
distinctively linguistic operation, which in concert with other cognitively
recycled operations, allowed for the emergence of FL.

How might labels do this?
Good question. An answer will require addressing a more basic question: what
are labels? We know what they must do:
they must license the combination both of lexical atoms and complexes of
such. Atomic LIs are labels. Complexes of LIs are labeled in virtue of
containing atomic ones. The $64,000 question (doesn’t sound like much of a
prize anymore, does it?) is how to characterize this. Stay tuned.
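
For what it’s worth, here is one toy way to picture what closing Merge in the domain of the lexical items might come to. This is my own sketch, not Chomsky’s formalism or anyone’s official proposal: the point is just that if every object, atomic or complex, carries a label, then Merge’s outputs are always legitimate inputs, and the operation can apply recursively to material from any module.

from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class LI:
    # A lexical atom; atomic LIs are their own labels.
    name: str

@dataclass(frozen=True)
class Phrase:
    # A complex of LIs, labeled in virtue of containing atomic ones.
    label: LI
    parts: Tuple["SynObj", ...]

SynObj = Union[LI, Phrase]

def label_of(x: SynObj) -> LI:
    return x if isinstance(x, LI) else x.label

def merge(a: SynObj, b: SynObj, head: SynObj) -> Phrase:
    # Freely combine a and b, inheriting the label from one of them.
    # Because the output is itself labeled, it can re-enter merge:
    # the operation is closed over labeled objects.
    return Phrase(label=label_of(head), parts=(a, b))

the, dog, barks = LI("the"), LI("dog"), LI("barks")
dp = merge(the, dog, head=the)     # a complex labeled by 'the'
tp = merge(dp, barks, head=barks)  # the complex re-enters Merge
print(label_of(tp).name)           # barks

On this picture the EF is just the guarantee that everything, atom or complex, carries a label and so remains visible to Merge; what labels actually are remains the $64,000 question just posed.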

So, culture supervenes on language and language is the
recycling of more primitive cognitive operations spiced with a bit of labeling.
Need I say that this is a very “personal” (read “extremely idiosyncratic and
not currently fashionable”) view?
Current MP accounts are very label-phobic. However, the question Dehaene raises is a
good one, especially for theories like MP that presuppose lots of cognitive
recycling.[2] It’s not one whose detailed answer is
anywhere on the horizon. But like all good questions, I suspect that it will
have lots of staying power and will provide lots of opportunities for fun
conversations.

[1]
It’s good to see that Tomasello is capable of begging the interesting question
regardless of where he puts his efforts.

[2]
See the discussion in the comments I had with Jan Koster about this in my previous post (here).

Friday, August 23, 2013

There's been a lot of discussion concerning Mirror Neurons (MN) in both the professional literature and the popular press. They are represented as just what the neuroscientist ordered to explain learning, empathy, other minds, whatever. Greg Hickok is in the process of finishing a book on this timely topic, and, unless the final product ends up saying the opposite of what I read (very unlikely), MN enthusiasts will be in for a rough time. At the very least, the cognitive benefits MNs are supposed to endow have been massively oversold, if he is correct. I encourage you to get the book when it finally comes out. The title, Greg tells me, is "The Myth of Mirror Neurons: The Real Neuroscience of Communication and Cognition." To whet your appetite for the full-course meal, Greg has allowed me to cross-list some of his better posts from Talking Brains. So here's an hors d'oeuvre: here, here, here, here, and here. Enjoy.

Tuesday, August 20, 2013

Darryl McAdams sends me this link
from the American Statistical Association. It discusses a rather attractive possible alternative to
the traditional academic publishing regime. In particular, it explores how the suppleness of
the blog format might be used to enhance dissemination of novel results and
improve the quality of discussion by encouraging useful critical commentary. The contrast between Jane 2.0 and her
unfortunate 1.0 avatar is striking. The
latter is disadvantaged in myriad ways: she is cut off from interesting peer
commentary, has to wait an excessively long time for reviews, is limited to
submission to one journal at a time and only gets her work widely read if
published. Jane 2.0 is far better off in virtually every respect. A further benefit of 2.0 is
that it re-empowers the community that does most of the work: the research community
that reviews the papers and which cares about the work (as opposed to the
publishing houses that are mainly interested in turning a profit). I think that
open source journals are where academic publishing is (and should be) heading. Jane
2.0 seems like an attractive alternative to what we have now. I say this
knowing full well that the utopian vision 2.0 describes will surely engender
problems of its own.

Monday, August 19, 2013

In November, Stan Dehaene is
coming to Maryland to give the annual Baggett Lectures on language and
cognition. To “prepare” myself, I have just finished reading his most recent book, Reading in the Brain, which I highly recommend. It appears that our friends
in cog-neuro have begun to understand the underlying mechanisms behind our
ability to read, tracing it to a confluence of capacities lodged, not
surprisingly, in the visual system and FL. The reading trick, again not a surprise, is to figure out how to link graphemes to phonemes (I’m talking about alphabetic reading systems here), and this problem turns out to piggyback on some rather deep facts about the mechanisms that the visual system uses to interpret the physical environment and how different alphabets express the relevant phonemes in a language. It
seems that letters like ‘T,’ ‘F,’ ‘K,’ ‘Y,’ and ‘L’ are “proto-letters” that exploit capacities central to parsing a visual scene:

The shape T, for example, is
extremely frequent in natural scenes. Whenever one object masks another, their contours
always form a T-junction. Thus neurons that act as “T-detectors” could help
determine which object is in front of which.

Other characteristic
configurations, like the shapes of a Y and an F, are found at places where several parts of an object meet… All of these fragments of shapes belong to
what is known as “non-accidental properties” of visual scenes because they are
unlikely to occur accidentally in the absence of any object… (loc 2138, e-book version).

These “natural” shapes find their way into many alphabetic
systems thereby allowing the capacities of the visual system to be recycled to
undergird the capacity to read.[1]

The second leg of the reading capacity lies in tying graphemes to phonemes. This turns out to be rather difficult. I was surprised to find out (remember, I come
from a philosophy department so I know virtually no phonology and, come to
think of it, very little else) that the emerging consensus opinion concerning
dyslexia is that it stems from “an anomaly in the phonological processing of
speech sounds” (loc 3779). It seems that the majority of dyslexic kids have
trouble processing phonemes in general (i.e. independent of reading) and that’s
why they have trouble matching graphemes (letters) to phonemes in reading. In other words, it seems that dyslexia is
largely a speech processing problem
(loc 3801). Dehaene calls this a “revolutionary idea,” one that seems “barely credible,” but he argues that the evidence points to dyslexics having a problem with “phonemic awareness” and hence having trouble with the necessary phoneme-grapheme mapping, mastery of which is required for fluent reading (loc 3801).

Interesting to me was the information that dyslexia appears
to be far less apparent in some cultures than in others. For example, it seems
that “dyslexia is hardly ever diagnosed in Italy” (loc 3876), whereas it is a
pretty common syndrome in French and English reading cultures. Could dyslexia
be nothing more than a cultural “disease”?
Seems unlikely. And indeed, it is
not so.

Rather, the biological propensity is rather stable across
readers of different languages but the practical reading problem becomes acute
only in cases where “writing systems [are] so opaque that they put a major
stress on the brain linking vision to language” (loc 3898).

How this was demonstrated was rather neat. A research group in Milan (headed by Eraldo
Paulesu) scoured Italy for reading-impaired individuals who superficially did
not seem particularly impaired. However, careful testing showed they were; in particular,
“when compared to normal Italian readers, their scores were as deviant as those
of groups of French and English dyslexics as compared to control subjects in
their respective countries” (loc 3898). In other words, the absolute impairment Italian dyslexics
suffer from is less than that afflicting English or French dyslexics, though the relative impairment is the same. Conclusion: there is no underlying difference
between these populations despite their very different behaviors. I love these
kinds of discoveries, ones that penetrate beneath the surface glare to unpack
common features of the underlying mechanisms.[2]
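
The logic of the Milan result can be put in a few lines. The numbers below are invented purely for illustration (Paulesu’s actual measures are in the paper): deviance is computed relative to each country’s own controls, so two groups can differ substantially in absolute scores yet be equally impaired.

def z_score(score, control_mean, control_sd):
    # How deviant a score is relative to its own control group.
    return (score - control_mean) / control_sd

# Hypothetical reading-accuracy scores. Italian orthography is transparent,
# so both Italian groups score higher in absolute terms.
italian_dyslexic = z_score(score=80, control_mean=95, control_sd=7.5)
english_dyslexic = z_score(score=55, control_mean=85, control_sd=15.0)

print(italian_dyslexic, english_dyslexic)  # both -2.0: same relative impairment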

Let me end this post noting one more thing that caught my
syntactician’s eye. Chapter 7 is a long discussion of symmetry effects in reading.
Dehaene reports on “mirror reading” (where (young) readers/writers “spontaneously
confuse left and right”). He attributes this to a basic structural feature of the brain, viz. that it encodes a symmetry principle “deeply buried in the structure of our cortex” wherein “[o]ur visual brain assumes that nature is not concerned with left and right…” (loc 4228).

It should be obvious why I found this interesting. The
Minimalist Program (MP) has taken the position that grammars care exclusively
about hierarchical dependencies, treating left/right linear order as a late
addition that arises when hierarchical grammatical structures are sent to the
S&M system for articulation. It is
curious to find out that the disregard for left/right order is a design feature
of certain parts of the nervous
system. Specifically, Dehaene recounts the following accepted wisdom: the
visual system has two main networks, a ventral what system, which functions to “recognize and label objects,” and
a dorsal how system that does things
(executes actions) with the objects so identified. Distinguishing left from
right, Dehaene notes, likely arises from the dorsal how system and symmetry is a core feature of the ventral what system.

This dorsal/ventral cut has also made an appearance in the
cog-neuro of language. Hickok and Poeppel have relatively recently
distinguished a ventral and a dorsal pathway for language, the former mapping
sound onto meaning and the latter mapping sound onto articulators (see here). My impressionistic self would love to
speculate that FL’s disregard for left/right information is related to its
living in a part of the brain that is blind to this kind of information (i.e.
maybe the part of FL that maps syntax to meaning (to CI) lives in the ventral
stream!). This comports with the basic
MP conceit that FL exploits (in part) structures from extant brainware used for
other (non-linguistic) cognitive tasks. So, if the FL mapping to “meaning”
lives in the symmetrical (ventral) part of the brain (where high level “object
recognition” also resides) then the fact that this mapping ignores left/right
information (see here)
is what we might expect (is this tenuous enough for you?). We might also expect linear (left/right) info to be prominent in the dorsal stream, the part of the brain that maps representations onto articulatory representations.

Now, all of this is VERY stream of consciousness and as you
all know I am far from being competent to do anything more than ramble here
(but hey, what’s a blog for!). However,
it is neat to have discovered that some parts of the brain, as a matter of
fundamental organization (one view: symmetry is “inherent in the geometry of
our interhemispheric connections” (loc 4444)), ignore left/right info and that
some parts of the language system, the ones mapping to meaning, appear to live
in this general neighborhood.

There’s lots more in the book, and I cannot recommend it
highly enough. I will blog one more post in the near future on another topic
that Dehaene takes up in the penultimate chapter. But for now, if you have a
couple of days of pleasure reading you are looking to fill, reading about
reading is a good way to idle away the hours.

[1]
Recycling is the star idea in this book. The term is self-explanatory:
cognitive circuits that typically serve one function can be repurposed to serve
other ends, an idea congenial to modern day minimalists.

[2]
Dave Kush’s analysis of island violations in Swedish has a similar structure.
He noted that the relative unacceptability of island violations was similar in
Swedish and English (i.e. the same sentence enjoyed the same relative standing
in the two languages), despite the fact that what is deemed ok or ? by speakers
of Swedish is considered * by speakers of English. Like Paulesu, Kush has
argued that the same mechanisms are at work wrt islands in both grammars
despite these absolute differences in acceptability ratings. Of course, why this latter difference exists is
well worth exploring (and Kush does), but the important common point is that
these easily noticeable differences often mask deeper important commonalities.