Sunday, March 29, 2015

This continues this post and this post. I have mentioned before that GG has developed an impressive body of doctrine. Part of this body consists of the generalizations that have been discovered over the course of the last 60 years. Finding these, and then trying to explain them, have been central features of GG research. Here I list a bunch. The remaining two sections (to be posted) discuss some proffered explanations.

***

2. A list of some of the effects GG has discovered

The LK story is one of many that GGers have told over the last 60 years. Here is a partial list of some of the effects that are still widely investigated (both theoretically and empirically) within GG research. Some of these effects can be considered analogous to “laws of grammatical structure” which serve as probes into the inner workings of FL. As in the case of LK’s binding proposal, the effects comprise both negative and positive data, and they have served as explanatory targets (and benchmarks) for theories of FL.

These effects also illustrate another distinguishing mark of an emerging science. In the successful sciences, most of the data is carefully constructed, not casually observed. In this sense, it is not “natural” at all, but factitious. The effects enumerated here are similar. They are not thick on the conversational ground. Many of these effects concentrate on what cannot exist (i.e. negative data). Many are only visible in comparatively complex linguistic structures and so are only rarely attested in natural speech or PLD (if at all). Violations of the binding conditions such as John believes himself is intelligent are never attested outside of technical papers in GG syntax. Thus, in GG (as in much of physics, chemistry, biology, etc.) much of the core data used to probe FL is constructed, rather than natural.[1] To repeat, this is a hallmark of modes of investigation that have made the leap from naturalistic observation to scientific explanation. The kinds of data that drive GG work are of this constructed kind.[2]

Here, then, is a partial list of some of the more important effects that GG has discovered.

As in the case of the LK binding proposal outlined above, just describing these effects involves ascribing rules and structures to NL expressions. Thus, each effect comes together with sets of positive and negative examples and with rules/restrictions that describe these data. As in any scientific domain, simply describing the effects already requires quite a bit of theoretical apparatus (e.g. what’s an island, what’s a deletion rule, what’s the difference between A and A’ movement, what’s case, what’s a clause, etc.). And, as is true elsewhere, the discovery of these effects sets the stage for the next phase of inquiry: explaining them and seeing what these explanations can tell us about the structure of FL.

[1]
Cartwright’s discussion of this? How only in artificial settings do we get generalizations….

[2]
Constructed data are generally more robust than naturalistic data, as Cartwright observes. Furthermore, constructed data allow investigations to be more systematic by letting researchers put their own questions to nature and make her answer them, rather than simply waiting until nature voluntarily gives up her secrets.

Wednesday, March 25, 2015

I just read an interesting paper by Mark Fedyk on evolutionary psychology (EP) (here). The paper does a pair of things: (i) it provides an accessible discussion of the logic behind EP and (ii) it criticizes the massive modularity hypothesis (i.e. the proposal that minds/brains consist of many domain-specific modules shaped by the environmental exigencies of the Pleistocene Epoch).[1]

Fedyk elucidates the logic of EP by discussing the relation between “ultimate” and “proximate” explanations. The former “refer to the historical conditions responsible for causally stabilizing a particular phenotype (or range of phenotypes) in a population” (3). Such explanations try to isolate “why a pattern of behavior was adaptive in an environment”; they do not, however, specify the mechanisms involved in fitting the phenotype to the environment. These mechanisms are the province of “proximate” explanations, which refer to the “psychological, physiological, neurophysiological, biochemical, biophysical, etc. processes which occur at some point within the course of an organism’s development and which are responsible for determining some aspect of an organism’s phenotype” (3).

One of Fedyk’s main points is that there is a many-many relation between ultimate and proximate explanations and, importantly, that “knowing the correct ultimate explanation” need “provide no insight whatsoever” into which particular proximate explanation is correct. And this is a problem for the EP research program, which aims to “offer a method for discovering human psychological traits” (Fedyk quoting Machery) (5). Here’s how Fedyk summarizes the heuristic (6):

…the evolutionary psychologist begins by finding a pattern of human behavior that in the EEA [environment of evolutionary adaptedness, NH] should have been favored by selection. This is sufficient to show that the patterns of behavior could have been an adaptation… Next, the evolutionary psychologist infers that there is a psychological mechanism which is largely innate and non-malleable, and which has the unique computational function of producing the relevant pattern of behavior. Finally a test for this mechanism is performed.

The first part of the paper argues that this logic is likely
to fail given the many-many relationship between ultimate and proximate
accounts.

In the second part of the paper, Fedyk considers a way of salvaging this flawed logic and concludes that it won’t work. I leave the details to you. What I found interesting is the discussion of how modern evolutionary theory differs from the “more traditional neo-Darwinian picture.” The difference seems to revolve around how to conceive of development. The traditional story seemed to find little room for development save as the mechanism for (more or less) directly expressing a trait. The modern view understands development to be very environment-sensitive, capable of expressing many traits, only some of which are realized in a particular environmental setting (i.e. many of which are not so realized and may never be). Thus, an important difference between the traditional and the modern view concerns the relation between a trait and the capacities that express that trait. Traits and capacities are very closely tied on the traditional view, but can be quite remote on the modern conception.

Fedyk discusses all of this using language that I found misleading. For example, he runs together ‘innate,’ ‘hardwired’ and ‘malleable.’ What he seems to need, IMO, is the distinction between traits and the capacities that they live on. His important observation is that traits are expressions of more general capacities, and so seeing how the former change may not tell you much about how (or even whether) the latter do. It is only if you assume that traits are pretty direct reflections of the underlying capacities (i.e., as Fedyk puts it (15): if you assume that “each of these modules is largely hardwired with specific programs which must cause specific behavioral patterns in response to specific environmental conditions…”) that you get a lever from traits to psychological mechanisms.

None of this should strike a linguist as controversial. For
example, we regularly distinguish a person’s grammatical capacities from the
actual linguistic utterances produced. Gs are not corpora or (statistical)
summaries of corpora. Similarly, UG is not a (statistical) summary of properties
of Gs. Gs are capacities that are expressed in utterances and other forms of
behavior. UG is a capacity whose properties delimit the class of Gs. In both
cases there is a considerable distance between what you “see” and the capacity
that underlies what you “see.” The modern conception of evolution that Fedyk
outlines is quite congenial to this picture. Both understand that the name of
the game is to find and describe these capacities, and that traits/behavior are
clues, but ones that must be treated gingerly.

[1]
One of the best reads on this topic to my mind is Fodor’s review here.

Saturday, March 21, 2015

Work in the first period involved detailed investigations of the kinds of rules that particular Gs have and how they interact. Many different rules were investigated: movement rules, deletion rules, phrase structure rules and binding rules, to name four. And their complex modes of interaction were limned. Consider some details.

Recall that one of the central facts about NLs is that they contain a practically infinite number of hierarchically organized objects. They also contain dependencies defined over the structures of these objects. In early GG, phrase structure (PS) rules recursively specified the infinite class of well-formed structures in a given G. Lexical insertion (LI) rules specified the class of admissible local dependencies in a given G, and transformational (T) rules specified the class of non-local dependencies in a given G.[1] Let’s consider each in turn.

PS rules are recursive, and their successive application creates bigger and bigger hierarchically organized structures on top of which LI and T rules operate to generate other dependencies. (6) provides some candidate PS rules:

(6) a. S → NP aux VP

b. VP → V (NP) (PP)

c. NP → (det) N (PP) (S)

d. PP → P NP

These four rules suffice to generate an unbounded number of hierarchical structures.[2] Thus a sentence like John kissed Mary has the structure in (7), generated using rules (6a,b,c).

(7) [S [NP N] aux [VP V [NP N]]]

LI-rules like those in (8) insert terminals into these
structures yielding the structured phrase marker (PM) in (9):

(8) a. N → John, Mary…

b. V → kiss,…

c. aux → past

(9) [S [NP [N John]] [aux past] [VP [V kiss] [NP [N Mary]]]]

PMs like (9) code for local inter-lexical dependencies as
well. Note that replacing kiss with arrive yields an unacceptable sentence:
*John arrived Mary. The PS rules can
generate the relevant structure (i.e. (7)), but the LI rules cannot insert arrive in the V position of (7) because arrive
is not lexically marked as transitive. In other words, NP^kiss^NP is a fine local dependency, but NP^arrive^NP is not.
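The division of labor between PS rules and LI rules can be made concrete in a few lines of code. The toy Python sketch below is my own illustration, not any actual GG formalism: the rule table collapses the optional daughters of (6), and the one-verb transitivity set stands in for genuine lexical subcategorization.

```python
# A toy sketch of PS rules (6) and LI rules (8); the names and the
# simplified rule table are invented for illustration only.

PS_RULES = {                      # cf. (6), optional daughters omitted
    "S":  ("NP", "aux", "VP"),
    "VP": ("V", "NP"),            # the transitive expansion of (6b)
    "NP": ("N",),
}

TRANSITIVE = {"kiss"}             # 'arrive' lacks this lexical marking

def generate(symbol):
    """Recursively expand a nonterminal into a labeled bracketing, as in (7)."""
    if symbol not in PS_RULES:
        return symbol             # preterminal: awaits lexical insertion
    return [symbol, *(generate(d) for d in PS_RULES[symbol])]

def insert(subj, verb, obj):
    """LI step: build a PM like (9), enforcing the local NP^V^NP dependency."""
    if obj is not None and verb not in TRANSITIVE:
        return None               # *John arrived Mary: no licit local dependency
    return ["S", ["NP", ["N", subj]], ["aux", "past"],
            ["VP", ["V", verb], ["NP", ["N", obj]]]]

print(generate("S"))                       # the skeleton in (7)
print(insert("John", "kiss", "Mary"))      # a PM like (9)
print(insert("John", "arrive", "Mary"))    # None: blocked by LI
```

The point of the sketch is the architecture, not the coverage: `generate` can build unboundedly many skeletons (add the recursive NP → … S option of (6c) to see this), while `insert` filters which lexical items may occupy them.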

Consider the Passive rule in (10). ‘X’/’Y’ in (10) are variables. The rule says that if you can factor a PM into the parts on the left (viz. the structural description), you can change the structure to the one on the right (the structural change). Applied to (9), this yields the derived phrase marker (11).

Note that the rule codes the fact that what was once the object of kiss is now a derived subject. Despite this change in position, Mary is still the kissee. Similarly, John, the former subject of (9) and the kisser, is now the object of the preposition by, and still the kisser. Thus, the passive rule in (10) codes the fact that Mary was kissed by John and John kissed Mary have a common thematic structure, as both have an underlying derivation which starts from the PM in (9). In effect, it codes for non-local dependencies, e.g. the one between Mary and kiss.
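A Passive-style T-rule can be sketched as an operation over a factorization. The fragment below is a hedged reconstruction for illustration, not the actual statement of (10) in LK-era work (which differed in detail); the four-part factorization and the `+` morpheme notation are my simplifications.

```python
# Illustrative sketch only: a Passive-like T-rule stated over a factorization.
# SD (structural description):  NP1 - aux - V - NP2
# SC (structural change):       NP2 - aux+be+en - V - by+NP1

def passive(factors):
    """Apply the structural change if the factorization matches the SD."""
    if len(factors) != 4:
        return None               # SD not met: rule inapplicable
    np1, aux, v, np2 = factors
    return (np2, aux + "+be+en", v, "by+" + np1)

print(passive(("John", "past", "kiss", "Mary")))
# ('Mary', 'past+be+en', 'kiss', 'by+John'): a derived PM underlying
# 'Mary was kissed by John', with Mary still the kissee
```

The SD/SC division mirrors the text: the left side describes what the PM must look like for the rule to apply, the right side says how it is rearranged, with the new formatives be+en and by introduced in the change.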

The research focus in this first epoch was on carefully describing the detailed features of a variety of different constructions, rather than on factoring out their common features.[4] Observe that (10) introduces new expressions into the PM (e.g. be+en, by), in addition to rearranging the nominal expressions. T-rules did quite a bit of this, as we shall see below. What’s important to note for current purposes is the division of labor between PS-, LI- and T-rules. The first generates unboundedly many hierarchical structures, the second “chooses” the right ones for the lexical elements involved, and the third rearranges them to produce novel surface forms that retain relations to other, non-local (i.e. non-adjacent) expressions.

T-rules, despite their individual idiosyncrasies, fell into a few identifiable formal families. For example, Control constructions are generated by a T-rule (Equi-NP deletion) that deletes part of the input structure. Sluicing constructions also delete material but, in contrast to Equi-NP deletion, Sluicing does not require a PM-internal grammatical trigger (aka antecedent) to do so. Movement rules (like Passive in (11) or Raising) rearrange elements in a PM. And T-rules that generate Reflexive and Bound Pronoun constructions neither move nor delete elements but replace the lower of two identical lexical NPs with morphologically appropriate formatives (as we illustrate below).

In sum, the first epoch provided a budget of actual examples
of the kinds of rules that Gs contain (i.e. PS, LI and T) and the kinds of
properties these rules had to have to be capable of describing recursion and
the kinds of dependencies characteristically found within NLs. In short, early
GG developed a compendium of actual G rules in a variety of languages.

Nor was this all. Early GG also investigated how these different rules interacted. Recall that one of the key features of NLs is that they include effectively unbounded hierarchically organized objects. This means that the rules talk to one another and apply to one another’s outputs to produce an endless series of complex structures and dependencies. Early GG started exploring how G rules could interact, and it was quickly discovered how complex and subtle the interactions could be. For example, in the Standard Theory, rules apply cyclically and in a certain fixed order (e.g. PS rules applying before T rules). Sometimes the order is intrinsic (follows from the nature of the rules involved) and sometimes not. Sometimes the application of a rule creates the structural conditions for the application of another (feeding); sometimes it destroys the structures required (bleeding). These rule systems can be very complex, and these initial investigations gave a first serious taste of what a sophisticated capacity natural language competence was.

It is worth going through an example to see what we have in
mind. For illustration, consider some binding data and the rules of Reflexivization
and Pronominalization, and their interactions with PS rules and T rules like
Raising.

Lees-Klima (LK) (1963) offered the following two rules to account for an interesting array of binding data in English.[5] The proposal consists of two rules, which must apply when they can and which are (extrinsically) ordered so that (12) applies before (13).[6]

(12) Reflexivization:

X-NP1-Y-NP2-Z → X-NP1-Y-pronoun+self-Z

(where NP1 = NP2, pronoun has the phi-features of NP2, and NP1/NP2 are in the same simplex sentence)

(13) Pronominalization:

X-NP1-Y-NP2-Z → X-NP1-Y-pronoun-Z

(where NP1 = NP2 and pronoun has the phi-features of NP2)

As is evident, the two rules have very similar forms. Both apply to identical NPs and morphologically convert one to a reflexive or pronoun. (12), however, only applies to nominals in the same simplex clause, while (13) is not similarly restricted. As (12) obligatorily applies before (13), Reflexivization will bleed the environment for the application of Pronominalization by changing NP2 to a reflexive (thereby rendering the two NPs no longer “identical”). A consequence of this ordering is that reflexives and (bound) pronouns (in English) must be in complementary distribution.[7]

An illustration should make things clear. Consider the derivation of (14a). It has the underlying form (14b). We can factor (14b) as in (14c), as per the Reflexivization rule (12). This results in converting (14c) to (14d), with the surface output (14e) carrying a reflexive interpretation. Note that Reflexivization codes the fact that John is both washer and washee, or that John non-locally relates to himself.

(14) a. John1 washed himself/*him

b. John washed John

c. X-John-Y-John-Z

d. X-John-Y-him+self-Z

e. John washed himself

What blocks John washed him with a similar reflexive reading, i.e. where John is co-referential with him? To get this structure, Pronominalization must apply to (14c). However, it cannot, as (12) is ordered before (13) and both rules must apply when they can apply. But once (12) applies we get (14d), which no longer has a structural description amenable to (13). Thus, the application of (12) bleeds that of (13), and John washed him with a bound reading cannot be derived, i.e. there is no licit grammatical relation between John and him.

This changes in (15). Reflexivization cannot apply to (15c)
as the two Johns are in different
clauses. As (12) cannot apply, (13) can (indeed, must) as it is not similarly
restricted to apply to clause-mates. In sum, the inability to apply (12) allows
the application of (13). Thus does the LK theory derive the complementary
distribution of reflexives and bound pronouns.

(15) a. John believes that Mary washed *himself/him

b. John believes that Mary washed John

c. X-John-Y-John

d. X-John-Y-him

e. John believes that Mary washed him
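The ordering-and-bleeding logic of (12) and (13) can be captured in a few lines of code. The sketch below is my own toy model: the boolean clause-mate flag and the masculine-only morphology are crude stand-ins for LK's identity and phi-feature conditions.

```python
# A toy model of LK's obligatory, ordered rules (12) and (13); the helper
# name and the simplified morphology are invented for illustration.

def lk_bind(np1, np2, clause_mates):
    """Return NP2's surface form given a potential antecedent NP1."""
    if np1 != np2:
        return np2                # neither rule's SD is met: NP2 surfaces as is
    if clause_mates:
        return "himself"          # (12) applies first, bleeding (13)
    return "him"                  # (12) cannot apply, so (13) must

print(lk_bind("John", "John", clause_mates=True))   # (14e): John washed himself
print(lk_bind("John", "John", clause_mates=False))  # (15e): ...Mary washed him
print(lk_bind("John", "Mary", clause_mates=True))   # Mary: no identity, no rule
```

Because the two branches are mutually exclusive, the complementary distribution of reflexives and bound pronouns falls out of the control flow itself, which is exactly the point of the extrinsic ordering plus obligatoriness in the LK system.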

There is one other feature of note: the binding rules in
(12) and (13) also effectively derive a class of (what are now commonly called)
principle C effects given the background assumption that reflexives and
pronouns morphologically obscure an underlying copy of the antecedent. Thus,
the two rules prevent the derivation of structures like (16) in which the bound
reflexive/pronoun c-commands its antecedent.

(16) a. Himself1 kissed Bill1

b. He1 thinks that John1 is tall

The derivation of these principle C effects is not particularly deep. The rules derive the effect by stipulating that the higher of two identical NPs is retained while the lower one is morphologically reshaped into a reflexive/pronoun.[8]

The LK theory can also explain the data in (17) in the
context of a G with rules like Raising to Object in (18).

If (18) precedes (12) and (13) then it cannot apply to raise
the finite subject in (19) to the matrix clause. This prevents (12) from
applying to derive (17a) as (12) is restricted to NPs that are clause-mates. But,
as failure to apply (12) requires the application of (13), the mini-grammar
depicted here leads to the derivation of (17b).

(19) John1 believes C John1 is intelligent

Analogously, (12), (13) and (18) also explain the facts in
(20), at least if (18) must apply when it can.[10]

(20) a. John1 believes himself1 to be intelligent

b. *John1 believes him1 to be intelligent
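The interaction with Raising to Object can be sketched in the same toy style. Since the precise statement of (18) is not reproduced here, the assumption coded below, that raising applies only to non-finite complements and thereby feeds the clause-mate condition of (12), is my reconstruction of the surrounding discussion; the function name is invented.

```python
# Sketch (a reconstruction, not LK's actual rule (18)): Raising to Object
# applies only to non-finite complements, feeding Reflexivization's
# clause-mate requirement; otherwise only Pronominalization can apply.

def raise_then_bind(finite_complement):
    """Derive the embedded subject's surface shape under matrix 'believe'."""
    clause_mates = not finite_complement   # assumed: only non-finite subjects raise
    if clause_mates:
        return "pronoun+self"   # (20a): John believes himself to be intelligent
    return "pronoun"            # via (13): John believes he is intelligent

print(raise_then_bind(finite_complement=False))  # non-finite: reflexive
print(raise_then_bind(finite_complement=True))   # finite: plain pronoun
```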

The LK analysis can be expanded further to handle yet more data when combined with other rules of G. And this is exactly the point: to investigate the kinds of rules Gs contain by seeing how their interactions derive non-trivial linguistic data sets. This allows us to explore what kinds of rules exist (by proposing some and seeing how they work) and what kinds of interactions rules can have (they can feed and bleed one another, they are ordered, etc.).

The LK analysis illustrates two important features of these early analyses. First, it (in combination with other rules) compactly summarizes a set of binding “effects,” patterns of data concerning the relation of anaphoric expressions to their antecedents in a range of phrasal configurations. It doesn't outline all the data that we now take to be relevant to binding theory (e.g. it does not address the contrast in John1’s mother likes him/*himself1), but many of the data points discussed by LK have become part of the canonical data that any theory of Binding is responsible for. Thus, the complementary distribution of reflexives and (bound) pronouns in these sentential configurations is now a canonical fact that every subsequent theory of Binding has aimed to explain. So too the locality required between antecedent and anaphor for successful reflexivization and the fact that an anaphor cannot c-command the antecedent that it is related to.[11]

The kind of data LK identifies is also noteworthy. From very early on, GG understood that both positive and negative data are relevant for understanding how FL is structured. Positive data is another name for the “good” cases (examples like (14e) and (15e)), where an anaphoric dependency is licensed. Negative data are the * cases (examples like (17a) and (20b)), where the relevant dependency is illicit. Grammars, in short, not only specify what can be done, they also specify what cannot be. GG has discovered that negative data often reveal more about the structure of FL than positive data do.[12]

Second, LK provides a theory of these effects in the two rules (12) and (13). As we shall see, this theory was not retained in later versions of GG.[13] The LK account relies on machinery (obligatory rule application, bleeding and feeding relations among rules, rule ordering, Raising to Object, etc.) that is replaced in later theory by different kinds of rules with different kinds of properties. The rules themselves are also very complex (e.g. they are extrinsically ordered). Later approaches to binding attempt to isolate the relevant factors and generalize them to other kinds of rules. We return to this anon.

The distinction between “effects” and “theory” is an important one in what follows. As GG changed over the years, discovered effects have been largely retained, but the detailed theory intended to explain these effects has often changed.[14] This is similar to what we observe in the mature sciences (think of the Ideal Gas Laws wrt Thermodynamics and later Statistical Mechanics). What is clearly cumulative in the GG tradition is the conservation of discovered effects. Theory changes, and deepens. Some theoretical approaches are discarded, some refashioned and some resuscitated after having been abandoned. Effects, however, are conserved, and a condition of theoretical admissibility is that the effects explained by earlier theory remain explicable given newer theoretical assumptions.

We should also add that, for large stretches of theoretical time, basic theory has also been conserved. However, the cumulative nature of GG research is most evident in the generation and preservation of the various discovered effects. With this in mind, let’s list some of the many effects discovered to date.

[1]
Earliest GGs did not have PS rules but two kinds of transformations. Remember
this is a Whig history, not the real
thing.

[3]
In the earliest theories of GG, recursion was also the province of the transformational component, with PS rules playing a far more modest role. However, from Aspects onward, the recursive engine of the grammar was the PS rules. Transformations did not generally create “bigger” objects. Rather, they specified licit grammatical dependencies within PS-created objects.

[4]
This is not quite right, of course. One of the glories of GG is Ross’s discovery of islands, and many different constructions obeyed them.

[5]
LSLT was a very elaborate investigation of G(English). The rules for pronouns and reflexives discussed here have antecedents in LSLT. However, the rules discussed as illustrations here were first developed in this form by Lees and Klima.

[6]
The left side of → is the “structural description.” It describes the required factorization of the linguistic object so that the rule can apply. The right side describes how the object is changed by the rule. It is called the “structural change.”

[7]
Cross-linguistic work on binding has shown this fact to be robust across NLs
and so deriving the complementarity has become an empirical boundary condition
on binding theories.

[8]
Depending on how “identical” is understood, the LK theory prevents the derivation of sentences like John kissed John where the two Johns are understood as referring to the same individual. How exactly to understand the identity requirement was a vexed issue that was partly responsible for replacing the LK theory. One particularly acute problem was how to derive sentences like Everyone kissed himself. It clearly does not mean anything like ‘everyone kissed everyone.’ What then is its underlying form, so that (12) could apply to it? This was never satisfactorily cleared up and led to revised approaches to binding, as we shall see.

[9]
This is not how the original raising to object rule was stated, but it’s close
enough. Note too, that saying that C is finite means that it selects for a
finite T. In English, for example, that
is finite and for is non-finite.

[12]
The focus on negative data has also been part of the logic of the POS. Data that is absent is hard to track without some specification of what absences to look for (i.e. some specification of where to look). More important still to the logic of the POS is the impoverished nature of the PLD available to the child. We return to this below.