
Monday, November 30, 2015

Patrick Trettenbrein posted a comment on the recent post on Gallistel's conjecture (here). He included a short paper of his that reviews some interesting new work on Aplysia, providing further evidence for the Gallistel conjecture. Here are the concluding paragraphs.

All in all, it seems that there indeed are two different processes at work in learning and memory, as Chen et al. (2014) also point out. While the exact details about both remain obscure, there appears to be a dissociation between the way in which learning occurs and how memory works. We do not know how the brain implements a read/write memory, but there is good evidence that it does. Similarly, there is ample and convincing evidence, also in Chen et al. (2014), that synaptic conductivity and connectivity play a role in regulating behavior. Consequently, it appears that synaptic plasticity might not so much be a precondition for learning as it is a consequence of it, so that the observed rewiring of synaptic connections might constitute the brain's way of ensuring an “efficient,” or possibly even close to “optimal” (Cherniak et al., 2004; Sporns, 2012), connectivity and therefrom resulting activity pattern that is appropriate to environmental (and presumably also “internal”) conditions. Synaptic plasticity thus might be reinterpreted as a way of regulating behavior (i.e., activity and connectivity patterns) only after learning has already occurred (i.e., after relevant information has been extracted from the environment and stored in memory).

Extrapolating Chen et al.'s (2014) findings stemming from work on Aplysia to claims about much more complex nervous systems is, of course, speculative in nature, to say the least. However, it seems to be no more speculative than the almost universally accepted idea of the synapse being the locus of memory. Similarly to Johansson et al. (2014), the work of Chen et al. (2014) shows that (1) there is plenty of “room” for the implementation of symbols other than synapses, and (2) substantiates the understanding that the network approach of connectionism might indeed best be seen as an implementational theory (Fodor and Pylyshyn, 1988) that still requires representation, computation, and a Turing architecture (i.e., a read/write memory). Gallistel and Balsam (2014) proclaimed that it was about time to rethink the neural mechanisms of learning and memory; Chen et al.'s experimental results add to the urgency of this claim.

I particularly like the speculation that the wiring is there not to code the relevant information but for efficient use, and that it is a consequence of learning rather than a pre-condition for it. I also like the observation, which Gallistel and Matzel also emphasize (see here), that there is, at best, paltry evidence for the standard assumption that the "synapse [is] the locus of memory." The Gallistel conjecture is generally assumed to be some daring, edge-of-thought speculation for which there is little evidence, in contrast to the well-established "fact" that memory lives in inter-neural connections. Vast academic enterprises are based on this assumption. This may be more truthy than true, however.

At any rate, take a look at Patrick's note and the Chen paper he links to. It seems that the Gallistel conjecture is daily becoming less "exciting." About time.

Are the critics right? Is there scant evidence for universals? The right answer is that it all depends on what you mean by ‘universal.’ If by this you intend a Greenberg Universal (GU), then it might be right (in fact, as you will see below, I think it should be right). If by this you mean a Chomsky Universal (CU), then you are not likely right. There is a big difference between these two, and the empirical success of GG rests on keeping them firmly distinguished. Why? Because a priori there is little reason to think that there are many GUs out there. There may be a few, but the standard GG universals, if understood as candidate GUs, are not likely among them. When critics of GG argue that it’s hard to find universals expressed in the world’s many languages, they understand ‘universals’ as GUs. And here they may well be right! However, as GG commits itself to CUs and not GUs, it requires quite some ancillary argument to conclude from the absence of GUs to the non-existence of CUs.

Indeed, from a perfectly normal “scientific” point of view,
we should not expect to find many GUs. What do I mean? Well, think of the
analogues of GUs in a real science, like Newtonian mechanics. We observe that
bodies fall. We ask what makes bodies fall. We propose that bodies “fall” because
of gravitational attraction. In particular, there is a gravitational force that causes bodies (i.e. masses) to attract. The earth and a much smaller body, say a ball, pull on one another with equal force, but since acceleration varies inversely with mass, the ball accelerates toward the earth far more than the earth accelerates toward the ball. This makes it appear that the ball “falls” to earth when it is released (rather than the earth “falling” to the ball). That’s the story, and a good one it is, though we now know that it needs amending, especially if the ball is travelling near the speed of light. At any rate, what’s this have to do with GUs?
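A worked instance of the law (standard textbook values, my own arithmetic) shows where the apparent asymmetry comes from: the mutual force is the same on both bodies, but the resulting accelerations differ by the ratio of the masses.

```latex
% Newton's law of gravitation: the mutual force between earth and ball
F = G\,\frac{m_{\oplus} m_{b}}{r^{2}},
\qquad G \approx 6.67\times 10^{-11}\ \mathrm{N\,m^{2}/kg^{2}}

% The same F, divided by each mass (Newton's second law):
a_{b} = \frac{F}{m_{b}} = \frac{G\, m_{\oplus}}{r^{2}}
      \approx \frac{(6.67\times 10^{-11})(5.97\times 10^{24})}{(6.37\times 10^{6})^{2}}
      \approx 9.8\ \mathrm{m/s^{2}}

% For a 1 kg ball, the earth's acceleration toward the ball:
a_{\oplus} = \frac{F}{m_{\oplus}} = \frac{m_{b}}{m_{\oplus}}\, a_{b}
           \approx 1.6\times 10^{-24}\ \mathrm{m/s^{2}}
```

Both bodies feel the same force; the ball's acceleration is some twenty-four orders of magnitude larger, which is why the ball visibly "falls" and the earth, to all appearances, does not.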

Well, is it indeed phenomenologically accurate that when we observe falling bodies in the wild we observe them acting in accord with Newton’s law of gravitation? Nope. Not even close. A leaf drops from a tree. Does it appear to fall in accordance with the law of falling bodies? Not on your life. Drop a ball into a lake and see how long it takes to hit bottom (or if it hits bottom at all). Does it appear to drop in accordance with the law of falling bodies? Nahh! Or take a body that is electrically charged and drop it in an electrical field and see if Newton’s law suffices to describe its trajectory. It doesn’t. What’s the right conclusion: that gravity is not a cause of falling bodies, that things don’t universally attract (i.e. fall)? Not on your life. Why?

Here’s the conventional wisdom. We understand that the law
of falling bodies is not intended as a description of what we see outside our
window. It describes one relevant
force in causing what we see. And this force in complex interaction with many other factors, causes observed
physical behavior. Thus, we know that shape matters (not just mass) if the object is not dropped in a vacuum (and vacuums are pretty rare out there in the real world). We know that the consistency of the medium into which an object drops also matters (there is less frictional resistance in air than in water, and less in water than in mercury). We know that electrical charges exert forces on electrically charged objects, and so this, as well as mass, can affect a falling object’s trajectory. If the object is small enough, then other factors may intercede as well. Drop an object shaped a certain way into water and it will float rather than fall. So many factors stand between dropping and falling, and nonetheless gravity does explain why bodies fall.
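The point is easy to make concrete. In the sketch below (my own toy numbers, not anything from the post's sources), gravity appears in the update rule of both runs; adding a single "other factor," linear drag, changes the observed trajectory completely without gravity ceasing to operate.

```python
def fall_velocity(t_end, mass=1.0, drag=0.0, g=9.81, dt=1e-4):
    """Euler-integrate the downward velocity of a dropped object.

    Gravity (g) always acts; `drag` is one of the interfering factors
    (linear air/water resistance) standing between the law and the
    observed motion.
    """
    v = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        a = g - (drag / mass) * v  # net acceleration: gravity minus drag
        v += a * dt
    return v

# In a vacuum the law of falling bodies is visible: v = g * t.
vacuum = fall_velocity(1.0)               # ~9.81 m/s after one second
# With drag the same object settles at terminal velocity m*g/drag instead.
in_fluid = fall_velocity(10.0, drag=2.0)  # ~4.905 m/s, however long it falls
```

The second run looks nothing like the vacuum law, yet gravity is doing exactly the same causal work in both; only the "nomological machine" around it differs.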

What’s this mean? It means that whatever the law of falling
bodies is, it is not a description of what
we see when we look at the world outside our window. In other words, it is not GUish.[1]
It is not a description of the immediately phenomenal world, but a proposal
about one of the fundamental forces that act on bodies and this force is
(often) an important causal factor in determining how bodies that we see fall
actually do fall. Pure expressions of the law require careful experimental setup. Indeed, we must control for many other factors before we can “see” gravity’s effects.[2]

There is an excellent discussion of just how complex this is
in Cartwright (here,
chapter 4). I have discussed her main points in previous posts (see here). For
current purposes, Cartwright makes two very important observations. First, that
it takes a lot of work to hook a law up to observation. This is what a good
experiment does. It establishes a way of linking the effects of abstract, non-observable features to visible effects. It creates a “nomological machine,” a way of hooking up the underlying capacities to surface regularities. And,
this is the important part:

There is no fact of the matter what a system can do just in virtue of having a given capacity. What it does depends on its setting, and the kinds of settings necessary for it to produce systematic and predictable results are very exceptional (73).

So, gravity can be seen in action, but only if we arrange things very carefully! And if this is true of gravity, why should it be less true of a principle of UG? Of course, it might be different in the mental sciences, but it might not be, and assuming that linguistic “laws” (aka principles of UG) must be apparent to inspection in the wild is little more than methodological dualism (a real no-no).

Returning to the main topic, GUs are typological
generalizations. They describe (and are intended to describe) generalizations
thought to be observable across languages, surface generalizations. Why are we
surprised that not many can be found? Why are we surprised that the UG
principles proposed are not “surface true”? Why should we expect the visible surface properties of language to express the underlying grammatical forces at work any more than we expect the phenomenological observables of real-world events to distinctly manifest their underlying causes (e.g. the law of gravity in bodies observed falling around us)? We don’t in the latter case, and shouldn’t in the former. Which brings us to CUs.

Chomsky understood universals to be properties of FL, FL
being the specifically linguistic contribution that minds exploit to build
language particular Gs. From the get-go, these were understood to be quite
abstract, and to not be inducible from the simple inspection of the surface
properties of sentences. Thus, CUs were not intended to be surface true, any more than gravity is. Thus, the absence of GUs does not imply the non-existence of CUs, any more than the phenomenological inadequacy of the laws of gravity for describing what happens when any object falls, any time, anywhere, invalidates Newton’s theory of gravity and its explanation of the law of falling bodies.

IMO, none of this is or should be controversial. I mention it because it seems easily forgotten. Linguists (or many of them) are currently quite skeptical that we have discovered any universals. But this is because many forget the distinction between GUs and CUs. Doing so leads to skepticism precisely because there is every reason to believe that universals understood as Greenbergian objects are not (and should not be) thick on the linguistic ground. Thus, when critics point out that such GUs are not pervasive, we should agree and say that nobody thought (or should have thought) they would be. And then loudly repeat that GUs are not CUs, and CUs are what we are looking for.

Why the warning? Because, it seems to me that typological
work invites the inference that
linguists are on the hunt for GUs and that GGers agree with critics of the
Chomsky program that universals ought to be understood as GUs. But this is a
mistake, one that misunderstands what GG is about. To repeat a venerable theme:
GG takes the object of study to be the structure of FL/UG, not the properties
of languages. These latter are interesting to the degree that they illuminate
the former. And there is no reason to think that linguistic principles, any
more than any other scientific principles, will be visible in the data used to investigate
them.

Let me make this point another way. IMO, there is no way
that something like FL/UG does not exist (see here
and here
for a defense). That FL/UG exists is
a virtual truism. What’s in FL/UG is
not. Thus, what’s up for grabs is the fine structure of FL/UG, not whether it
exists. Here’s another triviality: language exhibits the properties of FL/UG
only in interaction with many other adventitious linguistic factors, many
non-linguistic cognitive factors and probably much else (like the weather, time
of day, and who knows what else). This means that we expect the fine structure
of FL/UG to be hard to discern and we do not expect it to sit out there waiting
to be spotted by (even careful) observation.

In fact, I would go further (as you knew I would). I suspect
that the only really good way to argue for a CU is via something like a POS
argument. Looking at lots of languages
and Gs might be helpful (see here), but if you want to zero in on potential
candidate universals, there is nothing like a POS argument. Why? Because, POSs
limn the borders of the grammatically possible. That’s what’s so nice about
them. Inductive surveys of many Gs cannot do this. POSs are the linguistic
analogue of Cartwright’s nomological machines. They afford the most direct
access to CUs, and for those interested in FL/UG, CUs are the principle objects
of interest.

So be careful out there. Languages and their fabulous intricacies can be confusing. It’s not that hard to mistake Greenberg Universals for Chomsky Universals, and it’s a slippery slope from there to the dreaded vice of Empiricism (and its concomitant horrors, e.g. connectionism). So watch your step when you go into the field.

[1]
As I’ve noted before, there is a tendency to understand universals as patterns in the data waiting to be revealed.
Finding universals is then roughly a problem in signal processing in which the
judicious use of statistical techniques will find the signal in the often very
noisy noise. This conception understands universals as GUs. It is not the right model of a CU. For
discussion see here.
Incidentally, mistaking GUs for CUs will eventually lay low Deep Learning/Big Data approaches to language. The latter count on the assumption that all universals will be GUs. If this is false, and it is, then such approaches cannot succeed, and so they won’t. Of course, it will take time for this to become evident, and by then another fad will sweep the Empiricist world.

[2]
There is an excellent discussion of just how complex this is in Cartwright (here,
chapter 4). I have discussed her main points in previous posts (see here).

Tuesday, November 24, 2015

I once heard of a class taught in the great days of literary theory entitled something like "The influence of Philip Roth on Charles Dickens." My memory tingles with the suggestion that I have the names wrong here, but I am pretty sure that I got the gist right. A linguistic version of this might be "The influence of Chomsky on von Humboldt." The idea is that we see the past more clearly when we see the present concepts more clearly. The inimitable intellectual archivist Bob Berwick sent me this great quote from Marvin Minsky:

“Unfortunately, there is still very little definite knowledge about, and not even any generally accepted theory of, how information is stored in nervous systems, i.e., how they learn. … One form of theory would propose that short-term memory is ‘dynamic’—stored in the form of pulses reverberating around closed chains of neurons. … Recently, there have been a number of publications proposing that memory is stored, like genetic information, in the form of nucleic-acid chains, but I have not seen any of these theories worked out to include plausible read-in and read-out mechanisms.” (Minsky 1967, 66; Computation: Finite and Infinite Machines)

So, it seems that Randy's conjecture has a distinguished pedigree, and cog-neuro has investigated the theory of genetic information storage largely by ignoring it. Let's hope that this time around this alternative hypothesis, one which really would challenge long held views in cog-neuro, is carefully vetted. Conceptually, the Gallistel view seems to me very strong. This does not mean that it is right, but it does mean that a perfectly reasonable alternative view has not even been pursued.

Monday, November 23, 2015

Jeff Lidz sent me this
great little piece by Randy Gallistel on his favorite theme: how most
neuroscientists have misunderstood how brains compute. I’ve discussed Randy’s
stuff in various FoL posts (here,
here,
and here).
Here in just four lucid pages, Randy makes his main point again. If he is right
(and the form of his argument seems impeccable to me), then much of what goes
on in neuroscience is just plain wrong. Indeed, if Randy is right, then current
neo-connectionist/neural net assumptions about the brain are about as accurate
as 1950s-60s behaviorist conceptions were about the mind. In other words, at
best of tertiary interest and, more likely, deserving to be completely
forgotten.[1]
At any rate, Randy here makes four main points.

First, that there is recent evidence (discussed here)
strongly pointing to the conclusion that information can be stored inside a
single neuron (rather than in connections of many neurons).

Second, that there are scads of behavioral evidence showing that brains store number values and that there is no way of storing such numbers in connection weights, thus implying that any theory of the brain that limits itself to this kind of hardware must be at best incomplete and at worst wrong.

Third, that there is a close connection between neural net
“plasticity” conceptions of the brain and traditional empiricist conceptions of
the mind (especially learning). In fact, Randy argues that these are largely
flip sides of the same coin.

Fourth, that brains already contain all the hardware that is
required to function like classical computers, the latter being the perfect
complements for the computational cognitive theories that replaced behaviorism.

And all in four pages.

There is one argument that Randy hints at but doesn’t stress
that I would like to add to his four. It is a conceptual argument. Here it is.

Whatever one thinks of cognition, it is clear that animals
use large molecules like DNA and RNA for information processing. Indeed, this
is now standard biological dogma. As Gallistel and King (here) illustrate, this system has all the capacities of a classical computer (addresses, read/write memory, variables, binding, etc.). So here’s the
conceptual argument: imagine that you had an animal with the wherewithal to
classically compute hereditary information but instead of repurposing
(exapting) this system for cognitive ends it developed an entirely different additional system for this purpose. In other
words, it had all it needed sitting there but ignored these resources and
embodied cognition in a completely different way. Does this seem plausible? Is
this the way evolution typically works? Isn’t opportunism the main mover in the
evolution game? And if it is, doesn’t this suggest that Randy’s conjecture must be right? In fact, wouldn’t it be
weird if large chunks of cognition did not exploit that computational machinery
already sitting there in DNA/RNA and other large molecules? In fact, wouldn’t
the contrary assumption bear a huge burden of proof? Well, you know what I
think!

Why is this not the common perception? Why is Randy’s
position considered exotic? Here’s the one word answer: Empiricism! In the
cog-neuro world this is the default view. There is little to empirically
support this conception (see here
for a review of the pas de deux between unsupported empiricism in psychology
and tendentious reasoning in neural net neuroscience). Indeed, it largely
flourishes when we know next to nothing about some domain of inquiry. However,
it is the default conception of the mind. What Randy is pointing out (and has
repeatedly pointed out and is right to point out) is that it is fatally flawed,
not only as a theory of mind but also as a theory of the brain. And its flaws
are conceptual as well as empirical. I can’t wait for the day that this becomes
the conventional wisdom, though given the methodological dualism characteristic
of the cog-neuro-sciences, I suspect that this day is not just around the
corner. Too bad.

[1]
Note that I say “deserving” of amnesia. This concedes the sad fact that
neo-behaviorism is making a vigorous comeback within cognition. Yet another
indication of the collapse of civilization.

Tuesday, November 17, 2015

Never thought I would say this, but I found that I resonated
positively to a recent small comment
by Chris Manning on Deep Learning (DL) that Aaron White sent my way (here).
It seems that the DL has computational linguistics (CL) of the Manning variety
in its sights. Some DLers apparently believe that CL is just nano-moments
away from extinction. Here’s a great quote from one of the DL doyens:

NLP is kind of like a rabbit in the
headlights of the Deep Learning machine, waiting to be flattened.

DL wise men like Geoff Hinton have already announced that
they expect that machines will soon be able to watch videos and “tell a story
about what happened” and be downsized onto an in-your-ear chip that can
translate into English on the fly. Great things are clearly expected.
Personally, I am skeptical as I’ve heard such hyperbole before. We have been
five years away from this sort of stuff for a very long time.

Moreover, I am not alone. If I read Manning correctly, he is
skeptical (though very politely so) as well.[1]
But, like me, he sees an opportunity here, one I noted before (here
and here).
Of course we likely disagree about what kind of linguistics will be most useful
for advancing these technological ends,[2]
but when it comes to engineering projects I am very catholic in my tastes.

What does the opportunity consist in? It relies on a bet: that
generic machine learning (even of the DL variety) will not be able to solve the
“domain problem.” The latter is the belief that how a domain of knowledge is
structured matters a lot even if one’s
aim is to solve an engineering problem.

An aside: shouldn’t those that think that the domain problem
is a serious engineering hurdle also think that modularity is a good biological
design feature? And shouldn’t these people therefore think that the domain
specificity of FoL is a no-brainer? In other words, shouldn’t the idea that
humans have domain-specific knowledge that allows them to “solve” language problems (and supports facile human acquisition and use) be the default position? Chris? What think you? Dump general learning approaches and embrace domain specificity?

Back to the main point: The bet. So, if you think that using
word contexts can only get you so far (and not interestingly far either), then
you are ready to bet that knowing something about language will be useful in
solving these engineering problems. And that provides linguists with an
opportunity to ply their trade. In fact, Manning points to a couple of projects
aimed at developing “a common syntactic dependency representation and POS
(‘part of speech,’ NH) and feature label sets which can be used with reasonable
linguistic fidelity and human usability across all human languages” (3).[3]
He also advocates developing analogous representations for “Abstract Meaning.”
This looks like the kind of thing that GGers could usefully contribute to. In
other words, what we do directly fits into the Manning project.

Another aside: do not confuse this with investigating the structure of FL. What matters for this project is a reasonable set of Greenberg “Universals.” Indeed, being too abstract might not be that
useful practically, and being truly universal is not that important (what is
important is finding those categories that best fit the particular languages of
interest). This is not a bad thing. Engineering is not to be disparaged. It’s just
not the same project as the one that GG has scientifically set for itself. Of
course, should the Chomsky version of GG succeed, it is possible that it will
contribute to the engineering problem. But then again, it might not. As I
understand it, General Relativity has yet to make a big impact on land
surveying. It really all depends (to fix ideas, think birds and planes or fish and submarines; last time I looked, plane wings don’t flap and sub bodies don’t undulate).

Manning makes lots of useful comments about DL, many of
which I didn’t understand. He makes some, however, that I did. For example, the observation that DL has mainly proved useful in signal-processing contexts (2) (i.e. where the problem is to extract the generalization that is in the data, the pattern from (noisy)
patternings). The language problem, as I’ve argued, is different from this (see
here)
so the limits of brute force DL will, I predict, become evident when the new wise
men turn their attention to these. In fact, I make a more refined prediction:
to “solve” this problem DLers will either (i) ignore it, (ii) restrict the
domain of interest to finesse it or (iii) promise repeatedly that the solution
is but 5 years away. This has happened before and will happen again unless the
intricate structural constraints that characterize language are recognized and
incorporated.

Manning also makes several points that I would take issue
with. For example, IMO he (like many others) confuses squishy data for squishy
underlying categories. See, in particular, Manning’s discussion of gerunds on
p. 4. That the data do not exhibit sharp boundaries does not imply that the underlying structures are not sharp. In fact, at some level they must be, for under every probabilistic theory there is a categorical algebra. I leave it to you out there to come up with an alternative analysis of Manning’s observed data set. I give you a 30-second time limit to make it challenging.
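The claim that a categorical algebra sits under every probabilistic theory can be illustrated with a toy PCFG (the grammar and weights below are invented purely for illustration): the probabilities are gradient and tunable, but the rule set is categorical, so any string with no derivation gets probability zero no matter how the weights are set.

```python
# Toy probabilistic context-free grammar. The weights are gradient;
# the rules themselves are categorical.
PCFG = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["the", "dog"], 0.7), (["the", "cat"], 0.3)],
    "VP": [(["barks"], 0.6), (["sleeps"], 0.4)],
}

def derives(symbol, words):
    """Total probability that `symbol` derives exactly `words` (brute force)."""
    if symbol not in PCFG:  # terminal symbol
        return 1.0 if words == [symbol] else 0.0
    return sum(p * split(rhs, words) for rhs, p in PCFG[symbol])

def split(rhs, words):
    """Probability of deriving `words` from the sequence of symbols `rhs`."""
    if not rhs:
        return 1.0 if not words else 0.0
    head, rest = rhs[0], rhs[1:]
    # Try every way of dividing the words between head and the rest.
    return sum(
        derives(head, words[:i]) * split(rest, words[i:])
        for i in range(len(words) + 1)
    )

# Gradient probabilities over a categorical support:
p_grammatical = derives("S", "the dog barks".split())    # 0.7 * 0.6 = 0.42
p_ungrammatical = derives("S", "barks the dog".split())  # 0.0, whatever the weights
```

Squishy observed frequencies fall out of the weights; the sharp zero/nonzero boundary falls out of the categorical rules underneath them.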

At any rate, you will not be surprised to find out that I disagree
with many of Manning’s comments. What might surprise you is that I think he is
right in his reaction to DL hubris and he is right that there is an opportunity
for what GGers know to be of practical value. There is no reason for DL (or
Bayes or stats) to be inimical to GG. It’s just technology. What makes its practice
often anathema is the hard-core empiricism gratuitously adopted by its
practitioners. But this is not inherent to the technology. It is only a bias of
the technologists. And there are some like Jordan and Manning and Reisinger who
seem to get this. It looks like an opportunity for GGers to make a contribution, one, incidentally, that can have positive repercussions for the standing of GG.
Scientific success does not require technological application. But having
technological relevance does not hurt either.

[1]
I confess to a touch of schadenfreude given that this is the kind of thing that
Manning and Co. like to say about my kind of linguistics wrt their CL approaches.

[2]
Though I am not confident about this. I am pretty confident about what kind of
linguistics one needs to advance the cognitive project. I am far less sure
about what one needs to advance the engineering one. In fact, I suspect that a
more “surfacy” syntax will fit the latter’s design requirements better than a
more abstract one given its NLPish practical aims. See below for a little more
discussion.

[3]
I have it from a reliable source that this project is being funded by Google to
the tune of millions. I have no idea how many millions, but given that billions
are rounding errors to these guys, I suspect that there is real gold in them
thar hills.

Monday, November 16, 2015

I have been thinking lately about the following question: What does comparative/typological (C/T) study contribute to our understanding of FL/UG?
Observe that I am taking it as obvious that GG takes the structure of FL/UG to
be the proper object of study and, as a result, that any linguistic research
project must ultimately be justified by the light it can shed on the fine
structure of this mental organ. So, the question: what does studying C/T bring
to the FL/UG table?

Interestingly, the question will sound silly to many. After all, the general consensus is that one
cannot reasonably study Universal Grammar
without studying the specific Gs of lots of different languages, the more the
better. Many vocal critics of GG complain that GG fails precisely because it
has investigated too narrow a range of languages and has, thereby, been taken
in by many false universals.

Most GGers agree with the spirit of this criticism. How so?
Well, the critics accuse GG of being English or Euro centric and GGers tend to reflexively
drop into a defensive crouch by disputing the accuracy of the accusation. The
GG response is that GG has as a matter of fact studied a very wide variety of
languages from different families and eras. In other words, the counterargument
is that critics are wrong because GG is already doing what they demand.

The GG reply is absolutely accurate. However, it obscures a
debatable assumption, one that indicates agreement with the spirit of the
criticism: that only or primarily the study of a wide variety of
typologically diverse languages can ground GG conclusions that aspire to universal relevance. In other words,
both GG and its critics take the intensive study of typology and variation to
be a conceptually necessary part of an empirically successful UG project.

I want to pick at this assumption in what follows. I have nothing against C/T inquiry.[1] Some good friends engage in it. I enjoy reading it. However, I want to put my narrow prejudices aside here in order to try and understand exactly what C/T work teaches us about FL/UG. Is the tacit (apparently widely accepted) assumption that C/T work is essential for (or at least, practically indispensable for, or very conducive to) uncovering the structure of FL/UG correct?

Let me not be coy. I actually don’t think it is necessary, though I am ready to believe
that C/T inquiry has been a practical and
useful way of proceeding to investigate FL/UG. To grease the skids of this
argument, let me remind you that most of biology is built on the study of a rather small number of organisms (E. coli, C. elegans, fruit flies, mice). I have rarely heard the argument made that one can’t make general claims about the basic mechanisms of biology because only a very few organisms have been intensively studied. If this is so for biology, why should the study of FL/UG be any different? Why should bears be barely (sorry, I couldn’t help it) relevant for biologists but Belarusian be indispensable for linguistics? Is there more to this than just Greenbergian sentiments (which, we can all agree, should generally be resisted)?

So is C/T work necessary?
I don’t think it is. In fact, I personally believe that POS investigations (and
acquisition studies more generally (though these are often very hard to do
right)) are more directly revealing of FL/UG structure. A POS argument, if correctly deployed (i.e. well grounded empirically), tells us more about what structure FL/UG must have than surveys (even wide ones) of different Gs do. Logically, this seems obvious.
Why? Because POS arguments are impossibility arguments (see here)
whereas surveys, even ones that cast a wide linguistic net, are empirically
contingent on the samples surveyed. The problem with POS reasoning is not the
potential payoff or the logic but the difficulty of doing it well. In
particular, it is harder than I would like to always specify the nature of the
relevant PLD (e.g. is only child directed speech relevant? Is PLD degree 0+?). However,
when carefully done (i.e. when we can fix the relevant PLD sufficiently well),
the conclusions of a POS are close to definitive. Not so for cross-linguistic
surveys.[2]

Assume I am right (I know you don’t, but humor me). Nothing
I’ve said gainsays the possibility that C/T inquiry is a very effective way of
studying FL/UG, even if it is not necessary. So, assuming it is an effective
way of studying FL/UG, what exactly does C/T inquiry bring to the FL/UG table?

I can think of three ways that C/T work could illuminate the
structure of FL/UG.

First, C/T inquiry can suggest candidate universals. Second,
C/T investigations can help sharpen our understanding of the extant universals.
Third, it can adumbrate the range of Gish variation, which will constrain the
reach of possible universal principles. Let me discuss each point in turn.

First, C/T work as a source of candidate universals. Though
this is logically possible, as a matter of fact, it’s my impression that this
has not been where plausible candidates have come from. From where I sit (but I
concede that this might be a skewed perspective) most (virtually all?) of the
candidates have come from the intensive study of a pretty small number of
languages. If the list I provided here
is roughly comprehensive, then many, if not most, of these were “discovered”
using a pretty small range of the possible Gs out there. This is indeed often
mooted as a problem for these purported universals. However, as I’ve mentioned
tiresomely before, this critique often rests on a confusion of Chomsky
universals with their Greenbergian eponymous doubles.

Relevantly, many of these candidate universals predate the
age of intensive C/T study (say dating from the late 70s and early 80s). Not
all of them, but quite a few. Indeed, let me (as usual) go a little further:
there have been relatively few new
candidate universals proposed over the last 20 years, despite the continually
increasing investigation of more and more different Gs. That suggests to me
that despite the possibility that many of our universals could have been
inductively discovered by rummaging through myriad different Gs, in fact this
is not what actually took place.[3]
Rather, as in biology, we learned a lot by intensively studying a small number
of Gs and via (sometimes inchoate) POS reasoning, plausibly concluded that what
we found in English is effectively a universal feature of FL/UG. This brings us
to the second way that C/T inquiry is useful. Let’s turn to this now.

The second way that C/T inquiry has contributed to the
understanding of FL/UG is that it has allowed us (i) to further empirically
ground the universals discovered on the basis of a narrow range of studied
languages and, (ii) much more importantly, to refine these universals. So, for example, Ross discovers island
phenomena in languages like English and proposes them as due to the inherent
structure of FL/UG. Chomsky comes along and develops a theory of islands that
proposes that FL/UG computations are bounded (i.e. must take place in bounded
domains) and that apparent long distance dependencies are in fact the products
of smaller successive cyclic dependencies that respect these bounds. C/T work then
comes along and refines this basic idea further. So (i) Rizzi notes that wh-islands
are variable (and multiple-wh languages like Romanian show that there is more
than one way to apparently violate wh-islands), (ii) Huang suggests that
islands need to include adjuncts and subjects, (iii) work on East Asian
languages suggests that we need to distinguish island effects from ECP effects
despite their structural similarity, (iv) studies of in-situ wh languages allow us to investigate the bounding
requirements on overt and covert movement, and (v) C/T data from Irish,
Chamorro, French, and Spanish provide direct evidence for successive cyclic
movement even absent islands.

There are many other examples of C/T thinking purifying
candidate universals. Another favorite example of mine is how the anaphor
agreement effect (investigated by Rizzi and Woolford) shows that Principle A
cannot be the last word on anaphor binding (see Omer’s discussion here).
This effect strongly argues that anaphor licensing is not just a matter of binding domain size, as the classical GB binding
theory proposes.[4]
So, finding that nominative anaphors cannot be bound in Icelandic changes the
way we should think about the basic form
of the binding theory. In other words, considering how binding operates in a
language with case and agreement profiles different from English's has proven to
be very informative about our basic understanding of the binding principles.[5]

However, though I think this work has been great (and a great
resource at parties to impress friends and family), it is worth noting that the
range of relevant languages needed for the refinements has been relatively
small (what would we do without Icelandic!). This said, C/T work has made apparent the wide range of
seemingly different surface phenomena that fall into the same general underlying
patterns (this is especially true of the rich investigations on case/agreement
phenomena). It has also helped refine our understanding by investigating the
properties of languages whose Gs make morpho-syntactically explicit what is
less surface evident in other languages. So for example, the properties of
inverse agreement (and hence defective intervention effects) are easier to
study in languages like Icelandic, where one finds overt postverbal nominatives,
than in English, where there is relatively little useful morphology to
track.[6]
The analogue of this work in (other) areas of biology is the use of big fat and
easily manipulated squid axons (rather than dainty, small, and smooshy mouse
axons) to study neuronal conduction.

Another instance of the same thing comes from the great
benefits of C/T work in identifying languages where UG principles of interest
leave deeper overt footprints than in others (sometimes very very deep (e.g.
inverse control, IMO)). There is no question that the effects of some
principles are hard to find in some languages (e.g. island effects in languages
which don’t tend to move things around much, or binding effects in Malay-2 (see
here)).
And there is no doubt that sometimes languages give us extremely good evidence
of what is largely theoretical inference in others. Thus, as mentioned, the
morphological effects of successive cyclic movement in Irish or Chamorro or
verb inversion in French and Spanish make evident at the surface the successive
cyclic movement that FL/UG infers from, among other things, island effects. So,
there is no question that C/T research has helped ground many FL/UG universals,
and has even provided striking evidence for their truth. However (and maybe this
is the theorist in me talking), it is surprising how much of this refinement
and evidence builds on proposals with a still very narrow C/T basis. What made
the C-agreement data interesting, for example, is that it provided remarkably
clear evidence for something that we already had pretty good indirect evidence
for (e.g. Islands are already pretty good evidence for successive cyclic movement
in a subjacency account). However, I don’t want to downplay the contributions
of C/T work here. It has been instrumental in grounding lots of conclusions
motivated on pretty indirect theoretical grounds, and direct evidence is always
a plus. What I want to emphasize is that more often than not, this additional
evidence has buttressed conclusions reached on theoretical (rather than
inductive) grounds, rather than challenging them.

This leaves the third way that C/T work can be useful: it
may not propose but it can dispose. It can help identify the limits of universalist ambitions. I
actually think that this is much harder to do than is often assumed. I have
recently discussed an (IMO unsuccessful) attempt to do this for Binding Theory
(here
and here),
and I have elsewhere discussed the C/T work on islands and their implications
for a UG theory of bounding (here).
Here too I have argued that standard attempts to discredit universal claims regarding
islands have fallen short and that the (more “suspect”) POS reasoning has
proven far more reliable. So, I don't believe that C/T work has, by and large,
been successful at clearly debunking most of the standard universals.

However, it has been important in identifying the considerable
distance that can lie between a universal underlying principle and its surface
expressions. Individual Gs must map underlying principles to surface forms, and
these mappings can vary. Consequently, finding relevant
examples thereof sets up interesting acquisition problems (both real time and
logical) to be solved. Or, to say this another way, one potential value of C/T
work is in identifying something to explain given
FL/UG. C/T work can provide the empirical groundwork for studying how FL/UG is
used to build Gs, and this can have the effect of forcing us to revise our
theories of FL/UG.[7] Let me explain.

The working GG conceit is that the LAD uses FL and its UG
principles to acquire Gs on the basis of PLD. To be empirically adequate an
FL/UG must allow for the derivation of different Gs (ones that respect the
observed surface properties). So, one way to study FL/UG is to investigate
differing languages and ask how their Gs (i.e. ones with different surface properties)
could be fixed on the basis of available PLD. On this view, the variation C/T
discovers is not interesting in itself
but is interesting because it empirically identifies an acquisition problem: how
is this variation acquired? And this problem has direct bearing on the
structure of FL/UG. Of course, this does not mean that any variation implies a
difference in FL/UG. There is more to actual acquisition than FL/UG. However,
the problem of understanding how variation arises given FL/UG clearly bears on
what we take to be in FL/UG.[8]

And this is not merely a possibility. Lots of work on
historical change from the mid 1980s onwards can be, and was, seen in this
light (e.g. Lightfoot, Roberts, Berwick and Niyogi). Looking for concomitant
changes in Gs was used to shed light on the structure of FL/UG parameter space.
The variation, in other words, was understood to tell us something about the
internal structure of FL/UG. It is unclear to me how many GGers still believe
in this view of parameters (see here
and here).
However, the logic of using G change to probe the structure of FL/UG is
impeccable. And there is no reason to limit the logic to historical variation.
It can apply just as well to C/T work on synchronically different Gs, closely
related but different dialects, and more.

This said, it is my impression that this is not what most
C/T work actually aspires to anymore, and this is because most C/T research is
not understood in the larger context of Plato’s Problem or of how Gs are acquired
by LADs in real time. In other words, C/T work is not understood as a first step towards the study of FL/UG. This is unfortunate, for this is
an obvious way of using C/T results to study the structure of FL/UG. Why then
is this not being done? In fact, why does it not even seem to be on the C/T
research radar?

I have a hunch that will likely displease you. I believe
that many C/T researchers either don’t actually care to study FL/UG and/or they
understand universals in Greenbergian terms. Both are products of the same
conception: the idea that linguistics studies languages, not FL. Given this view, C/T work is what linguists
should do for the simple reason that C/T work investigates languages and that’s
what linguistics studies. We should recognize that this is contrary to the
founding conception of modern linguistics. Chomsky’s big idea was to shift the
focus of study from languages to the
underlying capacity for language (i.e. FL/UG). Languages on this conception are not
the objects of inquiry. FL is. Nor are Greenberg universals what we are
looking for. We are looking for Chomsky universals (i.e. the basic structural
properties of FL). Of course, C/T work might
advance this investigation. But the supposition that it obviously does so needs argumentation. So let’s have some, and to
start the ball rolling let me ask you: how does C/T work illuminate the
structure of FL/UG? What are its greatest successes? Should we expect further
illumination? Given the prevalence of the activity, it should be easy to find
convincing answers to these questions.

[1]
I will treat the study of variation and typological study as effectively the
same things. I also think that historical change falls into the same group. Why
study any of these?

[2]
Aside from the fact that induction over small Ns can be hazardous (and right
now the actual number of Gs surveyed is pretty small given the class of
possible Gs), most languages differ from English in only having a small number
of investigators. Curiously, this was also a problem in early modern biology.
Max Delbrück decreed that everyone would work on E. coli in order to make sure
that the biology research talent did not spread itself too thin. This is also a
problem within a small field like linguistics. It would be nice if as many
people worked on any other language as work on English. But this is impossible.
This is one reason why English appears to be so grammatically exotic; the more
people work on a language the more idiosyncratic it appears to be. This is not
to disparage C/T research, but only to observe the obvious, viz. that
person-power matters.

[3]
Why has the discovery of new universals slowed down (if it has, recall this is
my impression)? One hopeful possibility is that we’ve found more or less all of
them. This has important implications for theoretical work if it is true,
something that I hope to discuss at some future point.

[4]
Though, as everyone knows, the GB binding theory as revised in Knowledge of Language treats the
unacceptability of *John thinks
himself/herself is tall as not a binding effect but an ECP effect. The
anaphor-agreement effect suggests that this too is incorrect, as does the
acceptability of quirky anaphoric subjects in Icelandic.

[5]
I proposed one possible reinterpretation of binding theory based in part on
such data here. I cannot claim that the proposal has met with
wide acceptance and so I only mention it for the delectation of the morbidly
curious.

[6]
One great feature of overt morphology is that it often allows for crisp speaker
acceptability judgments. As this has been syntax’s basic empirical fodder, crisp
judgments rock.

[7]
My colleague Jeff Lidz is a master of this. Take a look at some of his papers.
Omer Preminger’s recent NELS invited address does something similar from a more
analytical perspective. I have other favorite practitioners of this art including
Bob Berwick, Charles Yang, Ken Wexler, Elan Dresher, Janet Fodor, Stephen
Crain, Steve Pinker, and this does not exhaust the list. Though it does exhaust
my powers of immediate short term recall.

[8]
Things are, of course, more complex. FL/UG cannot explain acquisition all by
its lonesome; we also need (at least) a learning theory. Charles Yang and Jeff
Lidz provide good paradigms of how to combine FL/UG and learning theory to
investigate each. I urge you to take a look.