
Thursday, February 26, 2015

Elliot Murphy, a grad student at UCL, sent me this link to a piece that he did on quantum biology and how it relates to issues about dualism and naturalist methodology that we've discussed on FoL. I enjoyed the read (and the links to excellent looking papers) and hope you might too.

Wednesday, February 25, 2015

Deep learning: Here are two papers that investigate the “psychological reality” of some popular deep learning models. They are
particularly important for those wishing to borrow insights from this
literature for cognitive ends. What the papers show is that it is possible to
construct stimuli that the systems systematically
classify (i.e. classify with very high confidence) as objects that no human
would mistake them for. Thus, deep neural networks
are easily fooled: see http://arxiv.org/abs/1412.1897

These papers do something that linguists commonly do. The papers are about negative data. Negative data describe
what humans do not do (e.g. native English speakers do not accept sentences
like “*who did you meet a man who saw”). If deep learning models are to be
understood as psychological theories, they need to agree both on the good and
the bad data (i.e. on what we accept and reject). So far, much of the
discussion has been on the positive capacities of such systems. They can be
trained to spot a dog in a picture. However, these papers observe that current
systems spot dogs that are not there, or, more precisely, categorize some
picture as a dog photo that no human would so categorize. Or as the first paper
puts it:

A recent study revealed that changing an image
(e.g. a lion) in a way imperceptible to humans can cause a DNN [deep neural
network, NH] to label the image as something else entirely (e.g. mislabeling a
lion a library). Here we show a related result: it is easy to produce images
that are completely unrecognizable to humans, but that state-of-the-art DNNs
believe to be recognizable objects with 99.99% confidence (e.g. labeling with
certainty that white noise static is a lion).

Why do DNNs do this? Right now, it seems that nobody knows. Need I say that
these are important results for the psychological “reality” of DNNs? As every
linguist knows, explaining negative data is critical in evaluating any proposal
aimed at describing our mental powers.

A note on evolution: http://www.newyorker.com/news/daily-comment/evolution-catechism. This is an interesting discussion of the
obvious political import that theories of evolution have in the US. The points
are mainly obvious and congenial (to me). However, there is one distinction I
would have made that Gopnik does not: the difference between the fact of evolution and the centrality
of the mechanism of natural selection
(NS) as the prime causal force behind evolution. The fact is completely
uncontroversial. Indeed, it was considered commonplace before Darwin, though Darwin did a lot to cement the truth of this
fact. What is somewhat controversial today is how large a part NS plays in
explaining this fact. All agree that it plays some role; the question is how big.

In many ways the recent
Evo-Devo discoveries replay discussions similar to those in the early cog
revolution. The Evo-Devo stuff suggests that the range of options that NS has to pick from is quite a bit narrower than
earlier believed (viz. there are very few ways of doing anything (e.g. building
an eye) and these tend to be strongly conserved over evolutionary time scales).
Of course, the fewer the options available, the less one looks to NS to explain
the outcome. Why? Because NS relies on the idea that where you are is heavily
dependent on the path you took to get there. But if the number of paths to
get anywhere is very small, then why you got to where you are is less
dependent on a long series of linked choices than on the one or two you made at
the very start. So, it is not that NS plays no role. Rather the importance of NS’s
role depends on how wide the range of possibilities is. The question is then not
either/or but how much. And these are
theoretical/empirical questions.

So, Gopnik is quite
right that denying the fact of evolution is a nutty thing to do, sort of like
denying that the earth is round or the earth orbits the sun. However,
questioning the size of NS’s effects on evolutionary trajectories is not. NS is a
theory. Evolution is the fact. As Gopnik notes, theories evolve and change. One
of the changes being currently contemplated is that NS is a less potent factor
than heretofore believed. Even a Republican can believe this in scientific good
faith.

This is an interesting evolutionary finding because it uncovered a new
mechanism by which evolution can work very quickly in a very complex setting. We don’t know much about how complex organ
systems evolve – the “major transitions in evolution” – like our brains. There
are so many genes involved – how is it all put together without it getting all
tangled up? But now somebody’s got their
foot in the door about one of the biggest transitions of all – how placental
mammals evolved pregnancy, and went from laying eggs externally to growing them
inside their bodies. Turns out that again (surprise!) this involved a set of
regulatory genes, plus – the real surprise – what are called ‘transposons’ –
bits of genes that can leap whole genes at a single bound and insert themselves
even across chromosomes.[1]
(Lynch et al., Ancient Transposable
Elements Transformed the Uterine Regulatory Landscape and Transcriptome during
the Evolution of Mammalian Pregnancy, Cell
Reports (2015)).[2]
Apparently, the transposons donated regulatory elements to the genes that were
recruited to alter the immune system so that the mother wouldn’t reject fetal
embryos as foreign (remember a fetus has all those unknown genes from daddy).
Lynch et al. demonstrated that this involved thousands of genes in a carefully
coordinated orchestration led in part by the transposons, enabling
exceptionally rapid evolution. As Lynch
notes, nobody expected to find that evolution could work this way to evolve
large complex organ systems. Seems we still have a lot to learn about the basic
evolutionary machinery, more than 150 years after Darwin.

More on Minds and Bodies: Those who enjoyed
Chomsky’s discussion of the mind-body problem (here)
(or should I say the non-existence of the problem given Newton’s excision of
body from the equation) and were (rightly) dissatisfied with my discussion (here)
might enjoy a real philosophical exposition of the state of the art by John
Collins (here). It engages with lots of the philo literature on these matters and
is eminently readable, even for linguists. He discusses and defends a position
that he calls “methodological naturalism,” which, if understood and adopted
as the standard in the cog-neuro sciences (including linguistics) will remove
most of the metaphysical and epistemological underbrush that hinders fruitful
collaboration between linguistics and neuro-types. So, take a look and pass it
on to your friends (and enemies) in the neurosciences who ignore most of what
you have to say.

[1] First
discovered by Barbara McClintock in corn, in the 1940s. Nobody believed her at first, because it
violated everything people thought they knew about Mendelism and genes, but she
hammered away at it. Forty years later
she got a Nobel Prize.

Sunday, February 22, 2015

There is a Bayes buzz
in the psycho/cog world. Several have been arguing that Bayes provides the proper
“framework” for understanding psychological/cognitive phenomena. There are
several ways of understanding this claim. The more modest one focuses on useful
tools leading to specific analyses that enjoy some degree of local
justification (viz. this praises the virtues of individual analyses based on
Bayes assumptions in the usual way (i.e. good data coverage, nice insights)). There
is also a less modest view. Here Bayes enjoys a kind of global privilege (call
this ‘Global Bayes’ (G-Bayes)). On this view, Bayesian models are epistemically
privileged in that they provide the best starting point for any
psychological/cognitive model. This assumption is often tied together with
Marrian conceptions of how one ought to break a psycho/cog problem up into
several partially related (yet independent) levels. It’s this second vision of
Bayes that is the topic of this post. A version is articulated by Griffiths et
al. here. Many contest this vision. The dissidents’ arguments are the focus of what
follows. Again, let me apologize for the length of the post. A lot of the
following is thinking out loud and, sadly, I tend to ramble when I do this. So
if this sort of thing is not to your liking, feel free to dip in and out or
just ignore. Let’s begin.

Cosma Shalizi (here) blogs about a paper by Eberhardt and Danks (E&D) (here) that
relates to G-Bayes. Shalizi’s blog post discusses E&D’s paper, which
focuses on the problem that probability matching poses for Bayesian psycho
theories. The problem E&D identifies
is that most of the empirical literature (which E&D surveys) ends with the
subject pool “probability matching the posterior.”[1] Here’s the abstract:

Bayesian models of human learning are becoming increasingly popular in cognitive
science. We argue that their purported confirmation largely relies on a methodology
that depends on premises that are inconsistent with the claim that people are
Bayesian about learning and inference. Bayesian models in cognitive science derive
their appeal from their normative claim that the modeled inference is in some
sense rational. Standard accounts of the rationality of Bayesian inference imply
predictions that an agent selects the option that maximizes the posterior expected
utility. Experimental confirmation of the models, however, has been claimed
because of groups of agents that “probability match” the posterior.
Probability matching only constitutes support for the Bayesian claim if
additional unobvious and
untested (but testable) assumptions are invoked. The alternative strategy of
weakening the underlying notion of rationality no longer distinguishes the Bayesian
model uniquely. A new account of rationality—either for inference or for
decision-making—is required to successfully confirm Bayesian models in
cognitive science.

There are replies to this (here) by Tenenbaum & friends
(T&F) and a reply to the reply (here) by Icard, who I think is (or was) a
student of Danks’ at CMU. The whole
discussion is very interesting, and, I believe, important.

Here’s how I’ve been thinking about
this translated into linguistiky terms. Before getting started, let me say up
front that I may have misunderstood the relevant issues. However, I hope that
others will clarify the issues (viz. dispel the confusion) in the comments. So,
once again, caveat lector!

Griffiths et al. describe Bayesian
models as follows. They are intended as “explanations of human behavior.” The
explanations are “teleological” in that “the [Bayesian-NH] solutions are
optimal” and this optimality “licenses an explanation of cognition in terms of
function.” The logic is the following: “the match between the solution and human
behavior may be why people act the way they do.” In other words, a Bayesian
model intends to provide an optimal solution to “problems posed by the
environment” with the assumption that if humans fit the model (i.e. act
optimally) then the reason they do so is because this is the optimal solution
to the problem. Optimality, in other words, is its own explanation. And it is
the link between optimality and explanation that lends Bayes models a global
kind of epistemic edge, at least for Marr level-1 descriptions of the
computational problems that psycho theories aim to explain.

Curiously, Bayesians do not seem to
believe that people actually do act/cognize optimally.[2] As T&F explains, Bayesian accounts do “not imply a
belief that people are actually computing these optimal solutions” (p. 415).[3] Why not? Because it is recognized that such computations
are intractable. Therefore, assuming that the models are models of what people
do is “not a viable hypothesis.” How then to get to the actual psychological
data points? To get from the optimal Bayesian model to the witnessed behavior
requires the addition of “approximate algorithms that can find decent solutions
to these problems in reasonable time” (p. 415-6). This seems to suggest that
subjects’ behavior is approximately
optimal.

In other words, optimal Bayes
considerations set the problems to be solved by more tractable algorithms,
which in fact do a pretty good job of coming close-ish to the best solution given resource constraints. So, what
faces the tribunal of actual psychological evidence is the combination of (i) a Bayesian optimal solution and (ii) a good
enough algorithm that approximates this optimal solution given other computational
constraints. So, more specifically, in cases where we find individuals
probability matching rather than all converging on the same solution (as the
Bayes model would predict) the algorithm does the heavy lifting. Why? Because a
Bayes account all by itself implies that all participants, if acting in a
rational/optimal Bayes manner, should make the same choice: the one with the highest posterior. So Bayes accounts
need help in order to make contact with the data in such cases, for Bayes by
itself is inconsistent with probability matching results.[4] Here’s Shalizi making this point:

Here's
the problem: in these experiments (at least the published ones...), there is a
decent match between the distribution of choices made by the population, and
the posterior distribution implied by plugging the experimenters' choices of prior
distribution, likelihood, and data into Bayes's rule. This is however not what Bayesian decision theory
predicts. After all, the optimal action should be a function of the posterior
distribution (what a subject believes about the world) and the utility function
(the subjects' preferences over various sorts of error or correctness). Having
carefully ensured that the posterior distributions will be the same across the
population, and having also (as Eberhardt and Danks say) made the utility
function homogeneous across the population, Bayesian decision theory quite
straightforwardly predicts that everyone should make the same choice, because the
action with the highest (posterior) expected utility will be the same for
everyone. Picking actions at frequencies proportional to the posterior probability
is simply irrational by Bayesian lights ("incoherent"). It is all
very well and good to say that each subject contains multitudes, but the
experimenters have contrived it that each subject should contain the same multitude, and so should acclaim the
same choice. Taking the distribution of choices across individuals to confirm the Bayesian
model of a distribution within individuals then amounts to a fallacy
of composition. It's as though the
poet saw two of his three blackbirds fly east and one west,
and concluded that each of them "was of three
minds", two of said minds agreeing that it was best to go east.
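
The contrast between maximizing and probability matching can be made concrete with a small simulation. What follows is only an illustrative sketch, not anything taken from E&D or Shalizi: the two-option posterior (0.7, 0.3), the 0-1 payoff (a choice scores 1 if it picks the true hypothesis and 0 otherwise), the number of subjects, and all the function names are assumptions chosen for the example. Under that payoff the expected utility of an option is just its posterior probability, so the Bayes-optimal subject always picks the option with the highest posterior.

```python
import random
from collections import Counter


def maximize(posterior):
    """Bayes-optimal choice under 0-1 payoff: the option with the highest posterior."""
    return max(range(len(posterior)), key=lambda i: posterior[i])


def probability_match(posterior, rng):
    """Pick option i with probability equal to its posterior probability."""
    return rng.choices(range(len(posterior)), weights=posterior, k=1)[0]


def choice_frequencies(choices, n_options):
    """Fraction of the population that picked each option."""
    counts = Counter(choices)
    return [counts[i] / len(choices) for i in range(n_options)]


def expected_accuracy_maximizing(posterior):
    """Chance of being right if you always pick the most probable option."""
    return max(posterior)


def expected_accuracy_matching(posterior):
    """Chance of being right if you pick option i with probability p_i: sum of p_i squared."""
    return sum(p * p for p in posterior)


# Hypothetical setup: every subject has been given the same posterior.
posterior = [0.7, 0.3]
n_subjects = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

max_choices = [maximize(posterior) for _ in range(n_subjects)]
match_choices = [probability_match(posterior, rng) for _ in range(n_subjects)]

max_freq = choice_frequencies(max_choices, len(posterior))
match_freq = choice_frequencies(match_choices, len(posterior))

print("maximizers, population choice frequencies:", max_freq)
print("matchers, population choice frequencies:  ", match_freq)
print("expected accuracy, maximizing:", expected_accuracy_maximizing(posterior))
print("expected accuracy, matching:  ", round(expected_accuracy_matching(posterior), 4))
```

With these made-up numbers, a population of maximizers puts all of its choices on the first option, whereas a population of matchers reproduces the posterior (roughly 70/30) in its choice frequencies. The matchers also pay for it: their expected accuracy is 0.7² + 0.3² = 0.58, against 0.7 for the maximizers. That gap is the sense in which matching the posterior is not what the optimality story predicts.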

Now for a linguistics analogue:
Acceptability data is what we largely use to adjudicate our competence
theories.[5] Now, we recognize that acceptability and grammaticality
do not perfectly overlap, there being some cases of grammatical sentences that
are unacceptable and some cases of acceptable sentences being ungrammatical.
However, this mismatch is marginal (at least in a pretty big domain).[6] What E&D argues (and Shalizi emphasizes) is that no
such close fit holds in the typical case of Bayesian psychological accounts. Typically the
Bayes solution does not provide the
right answer, for in many cases we find that the tested population probability
matches the posterior distribution of the Bayes model. And this is strongly inconsistent with what Bayes would
predict. Consequently, what allows for the fit with the experimental data is
not the Bayes vision of the computational problem but the added algorithm. In
effect, it would be as if our competence theories mostly failed to fit our acceptability profiles and we explained this
mismatch by adding parsing theories that took up the slack. So, for example, it
would be as if our Gs allow free violations of binding principle A but our parsers
say that such violations, though perfectly grammatical, are hard to parse and
this is why they sound bad.

Let me be clear here: syntacticians do in fact make such arguments. Think of
what we say about the unacceptability of self-embedded sentences (e.g. ‘That
that that Bill left annoyed Harry impressed Mary’). However, imagine if this
were the case in general. I think
that syntacticians would begin to wonder what work the competence theory was
doing. In fact, I would bet that such a competence theory would quickly find
its way to the nearest waste basket. This is effectively the point that Icard
makes in his reply to T&F. Here’s the money quote:

“It is commonly assumed that a computational level analysis
constrains the algorithmic level analysis. This is not always
reasonable, however. Sometimes, once computational costs
are properly taken into account, the optimal algorithm looks
nothing like the ideal model or any straightforward approximation
thereto” (p. 3) [my emphasis, NH].

This raises a
question, which Shalizi (and many others) presses home, about the whole rationale behind Bayesian modeling in
psychology. Recall, the explanatory fulcrum is that Bayes models provide optimal
solutions, as it is this optimality that licenses teleological explanations of
observed behavior. Shalizi argues that the E&D results challenge this
optimality claim.

By hypothesis, then, the mind is
going to great lengths to maintain and update a posterior distribution, but
then doesn't use it in any sensible way. This hardly
seems sensible, let alone rational or adaptive. Something has to give. One
possibility, of course, is that this sort of cognition is not
"Bayesian" in any strong or interesting sense, and this is certainly
the view I'm most sympathetic to…

In other words,
Shalizi and Icard are asking in what sense the Bayes level-1 theory of the
computation is worth doing given that
its description of the problem seems not to constrain the algorithmic level-2
account (i.e. Marr’s envisioned fertile link between levels seems to be
systematically broken here). If this is correct, then it raises Icard’s
question in a sharp form: in what sense is providing a Bayesian analysis even a step in the direction of explaining
the psychological data? Or to put this another way: what good is the Bayesian
computational level theory if E&D and Shalizi and Icard are right? Things
would not be bad if most of the time subjects approximated the Bayes solution.
What’s bad is that this appears to be the exception rather than the rule in the
experimental literature that is used to argue for Bayes. Or so E&D’s literature review suggests is the case.

To fix ideas, consider
the following abstract state of affairs: Subjects consistently miss
target A and hit target B. One theory is that they are really aiming for A, but
are missing it and consistently hitting B because trying to hit A is too hard
for them. Moreover, it is too hard in precisely a way that leads to
consistently hitting B. This account might
be correct. However, it does not take a lot of imagination to come up with an
alternative: the subjects are not in fact aiming for A at all. This is the logic displayed by the literature
that E&D discusses. It is not hard to see, IMO, why some might consider
this less than powerful evidence in favor
of Bayes accounts.

Let me further
embroider this point as it is the crux of the criticism. The Bayes discussions
are often cloaked in Marrish pieties. The claim is that Bayes analyses are
intended to specify the “problem that people are solving” rather than provide a
“characterization of the mechanisms by which they might be solving it” (Griffiths
et al., p. 2). That is, Bayes proposals are level-1, not level-2 theories.
However, what makes Marrish pieties compelling is precisely the intimation that
specifications of the problems to be solved will provide a hint about the actual
computations and representations that the brain/mind uses to solve these problems. Thus, it is fruitful to indulge in level-1
theorizing because it sheds light on
level-2 processes. In fact, one might say that Marr’s dubbing level-1 theories
as ‘computational’ invites just this supposition, as does his practice in his
book. Problems don’t compute, minds/brains do. However, a computational
specification of a problem is useful to the degree that it suggests the
magnitudes the brain is computing and the representations and operations that
it uses to compute them. The critique above amounts to saying that if Bayes
does not commit itself to the position that by
and large cognition is (at least approximately) optimal, then it breaks
this Marrian link between level-1 and level-2 theory and thereby loses its main
conceptual Marrian motivation. Why do a Bayes level-1 analysis if it in no way
suggests the relevant brain mechanisms or the variables mental/brain
computations juggle or the representations used to encode them? Why not simply
study the representations and algorithms directly without first detouring via a
specification of a problem that will not shed much light on either?

Here’s one
more try at the same point: Griffiths et al. try to defend the Bayes
“framework” by arguing that it is a fecund source of interesting hypotheses.
That’s what makes Bayes stories good places to start. But this just seems to
beg the question critics are asking, namely: why
should we believe that the framework does indeed provide good places to start?
This is the point that Bowers and Davis (B&D) seem to be making in their
reply to Griffiths et al. here
and the one that Shalizi, E&D and Icard are making as well.

Let me provide an analogy with contemporary minimalism. In
proposing a program it behooves boosters to provide reasons for why the program
is promising (i.e. why adopting the program’s perspective is a good idea). One,
of course, hopes that the program will be fecund and generate interesting
questions and analyses (i.e. models), but although the proof of a program’s
pudding is ultimately in the eating, a
program’s proponents are required to provide (non-dispositive) reasons/motivations
for adopting the program’s take on things. In the context of MP this is the
role of Darwin’s Problem. It provides a rationale for why MP generated
questions are worth pursuing that is independent of the fruits the program
generates. Darwin’s Problem motivates the claim that the questions MP asks are
interesting and worth pursuing because if answered they promise to shed light
on the basic architecture of FL. What is the Bayes analogue of Darwin’s Problem?
It seems to be the belief that considering the properties of optimal solutions
to “problems posed by the environment” (Griffiths et al., p. 1) will explain
why the cognitive mechanisms we have are the way they are (i.e. a Bayes
perspective will provide answers to a host of why questions concerning the representations and algorithms used by
the mind/brain when it cognizes). But why
believe this if we don’t also assume that a specification of the computational
level-1 problems will serve to specify level-2 theories of the mechanisms? All
the critiques in various ways try to expose the following tension: if Bayes
optimality is not intended to imply anything about the mind/brain mechanisms,
then why bother with it? And if it is so intended, then the evidence suggests that
it is not a fruitful way to proceed, for more often than not it empirically
points in the wrong direction. That’s the critique.

Please take the above
with a large grain of salt. I really am no expert in these matters so this is an
attempted reconstruction of the arguments as a non-expert understands them.
But, I believe that I got the main points right and if things are as Shalizi,
E&D, Icard and B&D describe them to be then it stands as a critique of
the assumption that Bayes based accounts are prima facie reasonable places to start if one wants to account for
some cognitive phenomenon (i.e. it seems like a strong critique of G-Bayes). Or
to put this in more Marrian terms: it is not clear that we should default to
the position that Bayes theories are useful
or fecund Level-1 computational
analyses as they do not appear to substantially constrain the Level-2 theories
that do all (or much) of the empirical heavy lifting. This is the critique. It
suggests, as Shalizi puts it, that acting like a Bayesian is irrational. And if this
is correct, it seems important, for it challenges the main normative point that
Bayes types argue is the primary virtue of their way of proceeding.

Let me end with two
pleas. First, the above, even if entirely
correct, does not argue against the adequacy of specific Bayesian proposals (i.e. against local Bayes). It merely
argues that there is nothing privileged
about Bayesian analyses (i.e. G-Bayes is wrong) and there is nothing
particularly compelling about the Bayes framework as such. It is often
suggested (and sometimes explicitly stated) that Rational Analyses of cognitive
function (and Bayes is a species of these) enjoy some kind of epistemologically
privileged status due to their normative underpinnings (i.e. the teleological
inferences provided by optimality). This is what these critical papers undercut
if successful. None of this argues against any specific Bayes story. On its own terms any such particular account
may be the best story of some specific phenomenon/capacity etc. However, if the
above is correct, a model gains no epistemic privilege in virtue of being cast
in Bayesian terms. That’s the point that Shalizi, E&D, Icard and B&D make. Specific cases need to be argued on their
merits, and their merits alone.

Second, I welcome any
clarifications of these points by those of you out there with a better
understanding of the current literature and technology. Given the current
fashion of Bayes story telling (it is clearly the flavor of the month) in
various parts of cognition (including linguistics) it is worth getting these matters
sorted out. It would be nice to know whether Bayes is just technology (and if so, justified one application at a time) or
whether it is a framework that comes with some independent conceptual motivation. I
for one would love to know.

[1] Griffiths et al. agree that there is a
lot of this in the literature and agree it is a serious problem. They offer a
solution based on Tenenbaum & friends that is discussed below.

[2] Optimality of cognitive faculties is a
pretty fancy assumption. I know because Minimalists sometimes invoke similar
sentiments and it has never been clear to me how FL or cognition more generally
could attain this perfection. As Bob Berwick never tires of telling me, this is
a very fancy assumption in an evolutionary context. Thus, if we are really
perfect, this may be more of a problem requiring further explanation than an
explanation itself.

Griffiths
et al. seem to agree with this. They tend to agree that humans are not
optimal cognizers nor even approximately optimal cognizers. Nonetheless, they
defend the assumption that we should look for optimal Bayes solutions to
problems as a good way to generate hypotheses. Why exactly is unclear to me.
The idea seems to be the tie in between optimal solutions and the teleological
explanations they can offer to why
questions. Nice as this may sound, however, I am very skeptical for the simple
reason that teleological explanations are to explanations what Christian (and
political?) Science is to science. Indeed, these were precisely the kinds of
explanations that 16th century thinkers tried so hard to excise. It
is hard to see how this functional fit, even were it to exist, is supposed to
explain why a certain system has the
properties it has absent non-teleological mechanisms that get one to these
optimal endpoints.

[3] Note that the claim that people do not compute optimal solutions is consistent
with the claim that people act
optimally. I mention this, for the teleological account explains just in case the subjects are acting according to the
model. Only then can “the match between the solution and
human behavior” teleologically explain “why people act the way they do.” If,
however, subjects don’t act this way,
then it is hard to see how the teleological explanation, such as it is, can
gain a footing. I confess that I don’t follow the Griffiths et al. reasoning
here. The only way out I can see is if they assume that subjects do in fact act (approximately) optimally even if
they don’t compute using Bayes-like algorithms. But even this seems incorrect,
as we shall see below.

[4] It is important to note that it is the posterior probability (a number that is
the product of prior Bayesian calculation) that is being matched, as Griffiths
et al. observe.

[5] Actually, acceptability under an
interpretation. Acceptability simpliciter is just the limiting case
where some bit of linguistic data is not acceptable under any interpretation.

[6] Why the caveat? If Gs generate an unbounded number of
Ss then most will be beyond the capacity for judging acceptability. What we
assume is that where an acceptability judgment can be made grammaticality and
acceptability will largely overlap.

Thursday, February 19, 2015

Here are some recent things that I found interesting that
may interest you as well.

On MOOCish matters:
http://www.voxeu.org/article/disruptive-potential-online-learning. The big finding is that employers don’t like MOOCs that much and treat them
as inferior degrees. This would change if, for example, places like Harvard and
MIT and Stanford replaced the four-year college experience they currently offer
to elites with a MOOCish experience. When the well-to-do vote with their kids’
feet and buy into MOOC-based degrees, then everyone will. Till then, it will
largely be a way of bending some cost
curves (and you know whose) and not others.

This little interview
is filled with exciting tidbits. Here are three:

(i) Sapir’s
hypothesis concerning the interaction of language with thought is far more
modest than many have assumed. On DE’s interpretation, the Sapir-Whorf
hypothesis is not the rather exciting (but clearly wrong) view that “the language we
speak determines the way we can think” but the rather modest claim that “the
language we speak affects in some way some of the ways we think when we
need to think quickly.” Note the Kahnemanian tinge here. IMO, this is hardly an exciting thesis, and
it is little wonder that the strong version of the thesis is what aroused
interest. The weak version seems to me close to a truism.

(ii) But a truism that Everett is impressed with. He claims that Sapir
discovered that “culture can influence language” and that though language
“clearly has some computational aspects that cannot be reduced to culture…there
are a number of broad characteristics that reflect the culture they emerge
from…” (3). I confess that this strikes me as obvious and is the first thing a neophyte learning a second
language focuses on. So, though Sapir is deserving of honors, it is not because
of this “insight.” Curiously, Everett seems not to have noted that Sapir’s
first observation (i.e. that language is a kind of computational system) does
not impress him. Maybe that’s why Everett has problems understanding claims
that people make about such systems. In particular,

(iii) DE still
confuses Chomsky Universals with Greenberg Universals. It comes across in DE’s
discussion of recursion where he once again asserts that the existence of a
finite language would undermine the Chomsky claim that language is recursive
(see answer to question 2). This is not
the claim. The claim is that UG produces Gs that are recursive. So the fact that FL endows humans with the capacity to acquire Gs that are
recursive does not imply either that every language has a recursive grammar or
that every speaker uses this capacity to produce endlessly large sentences. So,
even were Piraha a “finite language” as DE claims (and which, truth be told, I
still do not believe) it implies nothing
whatsoever for Chomsky’s claim that it is a fact about FL/UG that language is
recursive. This is simply a non-sequitur based on DE’s misunderstanding of
what GGers take a universal to be (note his claim would be valid were he
understanding ‘universal’ in Greenbergian terms). However, do not expect DE to
ever lose this misunderstanding. As Upton Sinclair once noted: “It is
difficult to get a man to understand something when his salary depends on his
not understanding it.” What do you think the odds are that DE would be getting
interviewed here or featured in the New
Yorker or the Chronicle of
Higher Education were he not peddling the claim that his work on Piraha
showed that Chomsky’s work in linguistics was incorrect? Do I hear 0?

(iv) DE does
not appear to understand that Gs can be recursive even if utterances have an
upper bound. I am not saying that
this is what is the case for Piraha. I am saying that recursion is a property
of Gs not of utterances. A mark of
recursion (i.e. evidence for recursive mechanisms) can be gleaned by looking to
see if the products of this mechanism are unbounded in length and depth. But the converse does not obtain: Gs might be
recursive even if utterances (their products) are bounded in size. DE seems to
think that during language acquisition, kids scale the Chomsky hierarchy, first
treating languages as finite lists and then as generated by regular grammars and
then by context free and then…all the way to mildly context sensitive. Where he
got this conception I cannot fathom. But there is no reason to think that this
is so. And if it is not, then given that Piraha speakers can learn what even DE considers recursive languages (a bad
locution, by the way, given that ‘recursive’ is properly speaking a predicate
of grammars, and only secondarily of their products) like Portuguese, it is clear
that they have the same UG we all
do. And if this is right, then it is quite unlikely that they would not acquire
a recursive G even for Piraha. But this is a discussion for another time. Right
now it suffices for you to know that DE, it appears, cannot be taught and that
there is still a large and lucrative market for “Chomsky is wrong” material.
Big surprise.
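
For those who like to see the G/utterance distinction spelled out, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy grammar, the function names `is_recursive` and `generate`, and the depth bound that stands in for whatever might limit actual utterances. It makes no claim about Piraha or about how any real G is acquired. It only shows that a rule system can be recursive while the set of strings produced under a bound is finite.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]

# Hypothetical toy grammar. S can rewrite to a string containing S,
# so the grammar (not any particular utterance) is what is recursive.
GRAMMAR: Grammar = {
    "S": [["NP", "V"], ["NP", "V", "that", "S"]],
    "NP": [["kids"], ["linguists"]],
    "V": [["think"], ["know"]],
}


def is_recursive(grammar: Grammar, symbol: str) -> bool:
    """Return True if `symbol` can reach itself by following the rules."""
    seen = set()
    stack = [symbol]
    while stack:
        current = stack.pop()
        for alternative in grammar.get(current, []):
            for item in alternative:
                if item == symbol:
                    return True
                if item in grammar and item not in seen:
                    seen.add(item)
                    stack.append(item)
    return False


def generate(grammar: Grammar, symbol: str, max_depth: int) -> List[List[str]]:
    """List every word string derivable from `symbol` when no chain of
    nested rule applications is longer than `max_depth`."""
    if symbol not in grammar:
        return [[symbol]]  # terminals (words) stand for themselves
    if max_depth == 0:
        return []  # bound reached: this derivation is abandoned
    results: List[List[str]] = []
    for alternative in grammar[symbol]:
        partials: List[List[str]] = [[]]
        for item in alternative:
            expansions = generate(grammar, item, max_depth - 1)
            partials = [p + e for p in partials for e in expansions]
        results.extend(partials)
    return results


if __name__ == "__main__":
    print(is_recursive(GRAMMAR, "S"))
    for bound in (2, 3, 4):
        sentences = generate(GRAMMAR, "S", bound)
        print(bound, len(sentences), max(len(s) for s in sentences))
```

For any fixed bound the output is a finite list with a longest sentence, yet the rules that produced it are the same recursive rules throughout. Raising the bound yields more and longer sentences without touching the grammar, which is the sense in which boundedness of the products tells you nothing about the G.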

Genes and languages: The Atlantic
has a little piece showing that some “languages and genes do in fact share
similar geographical fault lines.” Apparently, whether this was so was a
question of interest to linguists. As the paper puts it: “Using new dataset and
statistical techniques, the researchers were able to scratch an itch linguists
and demographers have struggled to reach.” I confess to never having had this
itch so I am not sure why this observation is of particular interest to
linguists.

It is quite clear that
whatever genetic change occurred did not affect the basic structure of FL. How
do I know this? Because, so far as we can tell, any kid can still learn any
language in roughly the same way any other kid can. And, from what we can tell,
all Gs obey effectively the same kinds of general structure dependent
constraints. So, whatever the genetic changes, they did not affect those genes
undergirding FL/UG. Nor, so far as I can
tell, is there any reason to think that the phoneme properties and genetic
features that are tracked are in a causal relation (i.e. neither is the root
cause of the other’s change). It just seems that they swing together. But is
this really surprising? Don’t people who have similar phonemes tend to live
near each other? And as these kinds of genetic changes are subject to
environmental influence is this really a surprise?

Maybe this is
interesting for some other reason. If so, please post a comment and let me know
what that interest is. I would love to know. Really. Here’s the link:

Some philo/history of science: I enjoyed this little piece
mainly for the discussion of the relationship between realism and mathematics
in the physical sciences historically. It suggests one way of understanding
Newton’s famous line about not feigning hypotheses. His theory gave a precise
mathematical understanding of gravity. He thought that this was enough and that
metaphysical speculations concerning its “reality” were not required from a
scientific theory. At any rate, there has been lots of
intellectual pulling and pushing about how to understand one’s theoretical
claims (e.g. realistically, instrumentally) and it is interesting to see a
little history.

Much of the controversy has centered on the types of statistical analyses
used in most scientific studies, and hardly anyone disputes that the math is a
major tripping point…

There is a case to be
made that though statistics is in
principle useful, applying it correctly is very, very hard. It’s one of
those things that are better in theory than they are in practice. And maybe any
paper dressed up in statistical garb should ipso
facto be treated cautiously. Right now we do the opposite: stats lend
credence to results. Might it be that they should be treated with suspicion until proven innocent? (For some useful
discussion of how even the best intentioned can go statistically astray see this recent piece by Gelman and Loken.)

One great scientist
who was very suspicious of statistical results, it seems, was Ernest
Rutherford. He was working at a time when physical theory was far more advanced
than anything we see in our part of the sciences. Here’s what he said: “If your
experiment needs statistics, you ought to have done a better experiment.” The
problems with replication seem to lend his one-liner some weight, as does the
apparent difficulty inherent in doing one’s stats correctly.