
Friday, May 15, 2015

David Adger talks to Thomas Graf about some very big issues

This David Adger post speaks for itself.

***

I’d intended this to be a response to Thomas’s
comments but it got too long, and veered off in various directions.

Computational and other levels

Thomas makes the point that there’s too much
work at the ‘implementational’ level, rather than at the proper Marrian
computational level, and gives examples to do with overt vs covert movement,
labelling etc. He makes an argument that all that stuff is known to be formally
equivalent, and we essentially shouldn’t be wasting our time doing it. So ditch
a lot of the work that goes on in syntax (sob!).

But I don’t think that’s right. Specification
at the computational level for syntax is not answered fully by specifying the
computational task as solving the problem of providing an infinite set of
sound-meaning pairings; it’s solving the issue of why these pairings,
and not some other thinkable set. So,
almost all of that `implementation’ level work about labels or whatever is
actually at the computational level. In fact, I don’t really think there is an
algorithmic level for syntax in the classical Marrian sense: the computational
level for syntax defines the set of pairings and sure that has a physical
realization in terms of brain matter, but there isn’t an algorithm per se. The information
in the syntax is accessed by other systems, and that probably is algorithmic in
the sense that there’s a step-by-step process to transform information of one sort
into another (to phonology, or thinking, or various other mental subsystems),
but the syntax itself doesn’t undergo information transforming processes of
this sort, it’s a static specification of legitimate structures (or
derivations). I think that the fact that this isn’t appreciated sometimes
within our field (and almost never beyond it) is actually a pretty big problem,
perhaps connected with the hugely process oriented perspective of much
cognitive psychology.

Back to the worry about the actual
`implementational’ issue to do with Agree vs Move etc. I think that Thomas is
right, and that some of it may be misguided, inasmuch as the different
approaches under debate may have zero empirical consequences (that is, they
don’t answer the question: why this pairing and not some other -
the derivations vs representations debate is perhaps a paradigm case of this). In such cases
the formal equivalence between grammars deploying these different devices is
otiose and I agree that it would be useful to accept this for particular cases.
But at least some of this ‘implementational’ work can be empirically sensitive:
think of David Pesetsky’s arguments for covert phrasal as well as covert
feature (=Agree) movement, or Gillian’s and my work on using Agree vs overt
movement to explain why Gaelic wh-phrases don’t reconstruct like English ones
do but behave in a way that’s intermediate between bound pronouns and
traces. The point here is that this is work at Marr’s computational level
to try to get to what the correct computational characterization of the system
is.

Here’s a concrete example. In my old paper on
features in minimalism, I suggested that we should not allow feature recursion
in the specification of lexical items (unlike HPSG). I still think that’s
right, but not allowing it causes a bunch of empirical issues to arise: we
can’t deal with tough constructions by just saying that a tough-predicate
selects an XP/NP predicate, like you can in HPSG, so the structures that are
legitimized (or derivations if you prefer) by such an approach are quite
different from those legitimized by HPSG. On the other hand, there are a whole
set of non-local selectional analyses that are available in HPSG but just
aren’t available in a minimalist view restricted in the way I suggested (a good thing).
So the specification at the computational level about the richness of feature
structure directly impacts on the possible analyses that are available. If you
look at that paper, it looks very implementational, in Thomas’s sense, as it’s
about whether embedding of feature structures should be specified inside
lexical items or outside them in the functional sequence, but the work it’s
actually doing is at the computational level and has direct empirical (or at
least analytical) consequences. I think the same is true for other apparently ‘implementational’
issues, and that’s why syntacticians spend time arguing about them.

Casting the Net

Another worry about current syntax that’s
raised, and this is a new worry to me so it’s very interesting, is that it’s
too ‘tight’: that is, that particular proposals are overly specific, which is
risky, because they’re almost always wrong, and ultimately a waste of energy.
We syntacticians spend our time doing things that are just too falsifiable
(tell that to Vyv Evans!). Thomas calls this net-syntax, as you try to cast a
very particularly shaped net over the phenomena, and hence miss a bunch.
There’s something to this, and I agree that sometimes insight can be gained by
retracting a bit and proposing weaker generalizations (for example, the debate
between Reinhart-style c-command for bound variable anaphora and the
alternative Higginbotham/Safir/Barker-style Scope Requirement looks settled,
for the moment, in the latter’s favour, and the latter is a much weaker claim).
But I think that the worry misses an important point about the to and fro
between descriptive/empirical work and theoretical work. You only get to have
the ‘that’s weird’ moment when you have a clear set of theoretical assumptions
that allow you to build on-the-fly analyses for particular empirical phenomena,
but you then need a lot of work on the empirical phenomenon in question before
you can figure out what the analysis of that phenomenon is such that you can
know whether your computational level principles can account for it. That
analytical work methodologically requires
you to go down the net-syntax type lines, as you need to come up with
restrictive hypotheses about particularities, in order to explore the
phenomenon in the first place. So specific encodings are required, at least
methodologically to make progress. I don’t disagree that you need to back off
from those specific encodings, and not get too enraptured by them, but
discovering high level generalisations about phenomena needs them, I
think. We can only say true things when we know what the empirical lay of
the land is, and the vocabulary we can say those true things in very much
depends on a historical to and fro between quite specific implementations until
we reach a point where the generalizations are stable. On top of this, during
that period, we might actually find that the phenomena don’t fall together in
the way we expected (so syntactic anaphor binding, unlike bound variable
anaphora, seems to require not scope but structural c-command, at least as far
as we can tell at the moment). The difference between syntax and maths, which was
the model that Thomas gave, is that we don’t know in syntax where the hell we
are going much of the time and what the problems are really going to be,
whereas we have a pretty good idea of what the problems are in maths.

Structure and Interpretation

I’ll (almost) end on a (semi-)note of
agreement. Thomas asks why we care about structure. I agree with him that structures
are not important for the theoretical aspects of syntax, except as what systems
generate, and I’m wholly on board with Thomas’s notion of derivational
specifications and their potential lexicalizations (in fact, that was sort of
the idea behind my 2010 thing on trying to encode variability in single
grammars by lexicalising subsequences of functional hierarchies, but doing it
via derivations as Thomas has been suggesting is even better). I agree
that if you have, for example, a feature system of any kind of complexity, you
probably can’t do the real work of testing grammars by hand as the possible
number of options just explodes. I see this as an important growth area for
syntax: what are the relevant features, what are their interpretations, how do
they interact, and my hunch is that we’ll need fairly powerful computational
techniques to explore different grammars within the domains defined by
different hypotheses about these questions, along the lines Thomas
indicates.

So why do we have syntax papers filled with
structures? I think the reason is that, as syntacticians, we are really
interested in how sign/sound relates to meaning (back to why these pairings), and unless you have a
completely directly compositional system like a lexicalized categorial grammar,
you need structures to effect this pairing, as interpretation needs structure
to create distinctions that it can hook onto. Even if you lexicalize it all,
you still have lexical structures that you need a theory of. So although
syntactic structures are a function of lexical items and their possible combinations,
the structure just has to go somewhere.

But we do need to get more explicit about
saying how these structures are interpreted semantically and phonologically.
Outside our field, the `recursion-only’ hypothesis (which, imo, was never
actually proposed, nor taken seriously by anyone in syntax),
has become a caricature that is used to beat our backs (apologies for the mixed
metaphor). We need to keep emphasizing the role of the principles of the
interpretation of structure by the systems of use. That means we need to talk
more to people who are interested in how language is used, which leads me to …

The future’s bright, the future’s pluralistic.

On the issue of whether the future is rosy or
not, I actually think it is, but it requires theoretical syntacticians to work
with people who don’t automatically share our assumptions and to respect what
assumptions those guys bring, and see where compatibilities or rapprochements
lie, and where there are real, empirically detectable, differences. Part of the
sociological problem Thomas and others have mentioned is insularity and
perceived arrogance. My own feeling is that younger syntacticians are not as
insular as those of my generation (how depressing – since when was my
generation a generation ;-( ), so I’m actually quite sanguine about the future
of our field (there’s a lot of stellar work in pure syntax, but those same
people doing that work are engaging with neuroscientists, ALL people,
sociolinguists, computational people, etc.). But it will require more work on our
(i.e. we theoretical syntacticians’) part: talking to non-syntacticians and nonlinguists,
overcoming the legacy of past insularity, and engaging in topics that might
seem outside of our comfort zones. But there is a huge amount of potential
here, not just in the more computational areas that Thomas mentioned, but also
in areas that have not had as much input from generative syntax as they could
have had: multilingualism, language of ageing, language shift in immigrant
populations, etc. There are areas we can really contribute to, and there are
many more. I agree with Thomas that we shouldn’t shirk `applied’ research: we
should make it our own.

13 comments:

I agree that a major question is why we find the sound-meaning pairings we find, and not some other conceivable system --- that's why the most important data in linguistics is typological gaps. But I don't understand why this necessitates detailed technical implementations. It can be done this way (if one is careful not to get sidetracked by matters of implementation), but there are alternatives. The burden of proof is of course mine at this point, and all I can offer is a handful of papers on very special topics versus thousands of "algorithmicy" syntax papers on an incredibly diverse range of topics. But I think those few papers make for a nice proof of concept, so my complaint breaks down into two parts: 1. It looks like a fair share of ink is spent on matters of implementation, at least some of which just distracts from the real issues, and 2. there is an alternative way of doing things that is exceedingly rare, for reasons I do not understand (and isn't really discussed anywhere in the literature afaik).

Regarding the interplay of theoretical work and empirical data gathering, I also agree that you want to have those "that's weird" moments. My issue is, once again, that I don't see why we need net syntax for that. For example, my current working hypothesis is that syntax is adequately described by some fragment of first-order logic over suitably abstract tree structure (right now derivation trees, but probably even more general). That doesn't tell you much about specific structures, features, strong vs weak phases, or anything like that. But I have plenty of "that's weird" moments: why are there only 4 PCCs instead of the 64 that are first-order definable? Why is there no mirror image of binding where anaphors must c-command their antecedents even though first-order definable constraints are symmetry-closed? In addition, the restriction to first-order logic also makes clear predictions about what kinds of constructions and constraints should be impossible and hence unattested. And it points out parallels to phonology, so all of a sudden phonological phenomena have an impact on what I expect to see in syntax.

You also argue that the net approach is more fruitful and productive for getting a feel of the lay of the land. I don't know if that's true, but let's assume so. Now many problems we worry about have been around for over 50 years and the empirical facts are fairly well understood. So wouldn't that suggest that the net approach has served its purpose and it's time to back off a bit, at least in those areas?

Finally, on the structure side I once again agree that some structure is needed, but the question is how much. Does it matter for semantics whether you have a single CP or a highly elaborate left periphery? Does it matter for semantics whether N is an argument of D or the other way round? Does it matter for semantics how you label adjuncts? Similar questions can be asked about the PF side --- Norvin Richards' contiguity theory, for example, is stated over very elaborate tree structures but if you look carefully at the examples he discusses the crucial part is the selectional configurations of arguments, which is a very basic (and fairly well understood) property.

Just to clarify: my worry is not that the current syntactic methodology is fundamentally broken and rotten to the core. But there are problems, problems of degree: how much importance is attributed to specific things, how much abstraction do we use. I'm fine with a syntactic field that has accounts that are very nitty-gritty and accounts that are closer to my ideal of abstraction and minimal assumptions. But that's not the mixture we have right now. I think the field needs a different mix. I might easily be wrong; what I find puzzling, though, is that there's no indication that this issue is even on syntacticians' radar.

One minor point I'd like to add: the Only say true things approach can actually deliver stronger explanations when used the right way. It deliberately tries to keep the set of assumptions small, and the individual assumptions simple and easy to verify. So if that small set of assumptions suffices to explain certain typological gaps, that is a much stronger result because it holds for every formalism that is compatible with those assumptions. You have less to say about what language really looks like, but you can produce very strong results about what it doesn't look like.

I recently got to read a very nice example of this where the authors show that the assumption that phonology defines regular string languages (which is a very safe bet) immediately explains why stress patterns cannot interact with morphology in certain ways. Here's the PDF.
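As an aside on what "regular" buys you here, a toy illustration of my own (not from the paper the commenter links): a pattern like "stress every odd-numbered syllable" can be checked with only finitely many states, which is the sense in which such stress systems define regular string languages. The `'S'`/`'u'` alphabet and the function name are invented for the sketch.

```python
def alternating_stress(syllables):
    """Two-state acceptor for the pattern: stress on every odd-numbered
    syllable and nowhere else ('S' = stressed, 'u' = unstressed).
    Finitely many states = the pattern is regular."""
    expect_stress = True
    for syl in syllables:
        if syl != ('S' if expect_stress else 'u'):
            return False
        expect_stress = not expect_stress
    return True

assert alternating_stress('SuSuS')
assert not alternating_stress('uSuS')  # stress starts one syllable too late
assert not alternating_stress('SSu')   # stress clash on syllables 1-2
```

A constraint requiring, say, stress on the exact midpoint of a word would need unboundedly many states, so the regularity assumption alone already rules out whole classes of conceivable interactions.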

And one more thing (the last one, pinky swear): The question of which pieces of technical machinery are equivalent, to which degree, and how we can measure that, is precisely what my conference poster is about. The basic idea is that we can distinguish at least three levels of equivalence that span the range from E-language to I-language:

- weak equivalence: the two formalisms generate the same string languages
- strong equivalence: the two formalisms generate the same phrase structure languages
- derivational equivalence: the two formalisms have identical derivation tree languages

Derivational equivalence is the strongest notion and a good formal approximation of I-language identity: the two grammars have equivalent operations, and they apply them in exactly the same fashion, even though the mechanisms regulating the operations might be specified differently. Weak equivalence is very close to the notion of E-language, and strong equivalence is somewhere between the two.
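The gap between weak and strong equivalence can be made concrete with a small sketch of my own (the grammars are invented for illustration): two toy grammars that generate exactly the same strings, a^n b^n, but assign them different trees, so they are weakly but not strongly equivalent.

```python
# Two toy grammars for the string language {a^n b^n : n >= 1}.
# G1: S -> a S b | a b           (center-embedding)
# G2: S -> a T ; T -> S b | b    (same strings, different bracketing)
# Trees are nested tuples (label, child, ...); leaves are strings.

def g1_tree(n):
    """Derivation tree for a^n b^n under G1."""
    return ('S', 'a', 'b') if n == 1 else ('S', 'a', g1_tree(n - 1), 'b')

def g2_tree(n):
    """Derivation tree for a^n b^n under G2."""
    if n == 1:
        return ('S', 'a', ('T', 'b'))
    return ('S', 'a', ('T', g2_tree(n - 1), 'b'))

def string_yield(tree):
    """Concatenate the leaves of a tree, left to right."""
    if isinstance(tree, str):
        return tree
    return ''.join(string_yield(child) for child in tree[1:])

for n in range(1, 6):
    # Weakly equivalent: identical string yields ...
    assert string_yield(g1_tree(n)) == string_yield(g2_tree(n)) == 'a' * n + 'b' * n
    # ... but not strongly equivalent: the assigned trees differ.
    assert g1_tree(n) != g2_tree(n)
```

No amount of grammaticality judgments distinguishes G1 from G2; only structure-sensitive evidence (semantic composition, say) could.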

A lot of progress has been made in the last 10 years regarding how different proposals line up with respect to this equivalence hierarchy, and that allows us to gauge which issues are substantial and which just notational. There's two big take-home messages:

- virtually all variants are weakly equivalent, so they cannot be distinguished based on grammaticality judgments; you need structural facts like semantics or succinctness criteria (one exception is recursive AVMs a la HPSG)
- except for different types of movement (raising vs lowering vs sideward), virtually all are strongly equivalent, and most are derivationally equivalent

This hierarchy is far from perfect, of course, because it does not take common linguistic criteria like succinctness into account. But it's a start, and it paints a picture of a vast shared common core with most of the perceived disagreements arising purely from how things are encoded in the grammar.

@Thomas: This is a bit beside the point and doesn't have bearing on much of anything. It's just a terminological/historical note. I don't see how the sorts of equivalences you're talking about line up with the E-language/I-language distinction as Chomsky originally used it. Chomsky's original use of these terms was as labels for externalist positions about the nature of language and internalist positions about the nature of language, respectively. Unless I'm completely missing something, there seems nothing that is a priori inconsistent with an externalist position on language that also takes language to be a generative mechanism that generates derivational trees.

@Adam: The E-language = weak generative capacity equation is actually an implication "weak generative capacity --> E-language" since we all agree that string sets are not a description of the speaker's knowledge of language, i.e. I-language.

Regarding I-language = derivational capacity: that's the closest formal approximation of I-language we have because derivation trees describe how the grammar operates, that is their canonical interpretation. If even that isn't good enough to make principled claims about I-language, I frankly don't know what is. One can of course take a step back and downgrade derivation trees to just another tree structure (fine by me), but that doesn't change the fact that derivation trees are best suited to an I-language interpretation among all the structures we currently know.

@Thomas: I see. Thanks for the clarification. The point about the implicational relation between a theory that is just a weakly generative theory and that theory necessarily being an E-language theory makes sense (though only given the assumption that string sets are not a description of the speaker's knowledge of language, an assumption that I happily share, but one that nonetheless seems to me something that needn't have been true of the world).

As for your second point of clarification, I think you either misunderstood what I was saying or I'm still missing something that you're saying. In case of the former, what I was trying to suggest is just that I see no reason for a theory of language that posits a derivational capacity to necessarily be an I-language theory. It seems to me that one could be—or at least that there is no a priori reason why one couldn't be—an externalist about language while maintaining that language consists of a derivational capacity, at least to the extent that externalist theories of language make any sense to begin with. (And Chomsky's original use of the term 'E-language' was just as a way to identify theories of language that were externalist.) So I didn't mean to suggest that a theory of language consisting of a derivational capacity as (one of) its core component(s) is not good enough to make principled claims about I-languages. I agree that a derivational theory seems to be a good internalist theory of language; I'm just not sure that it couldn't also be an externalist one.

In case of the latter (me misunderstanding you), it's probably really not that big of a deal. I don't think much of anything rests on this. I've just seen others using the terms I-language/E-language differently from how Chomsky originally used them, so I was wondering if you were doing the same. Not that there's anything necessarily wrong with that, of course. Anybody is free to define their own technical terms. But if people define the same technical terms in different ways, then it can become confusing, which is why I decided to ask. But it seems that I might just be misunderstanding you, so it's not a big deal.

@Adam: you're right that derivations do not necessarily imply I-language, but that's a general property of any theory of language: you can always downgrade it to a theory of E-language. Even if you look at grammars, you can deny them any kind of cognitive reality and simply treat them as a succinct description of specific E-languages. In fact, you can take that kind of agnostic stance with pretty much any cognitive theory.

The E-language perspective is always available, which is why I don't find it particularly troubling that it is available for derivation trees. The important thing is that derivation trees provide a natural I-language interpretation.

just a brief response to Thomas. I took you as saying that too much current work in syntax is too `implementationy' which distracts from bigger issues and we should have more work at a more abstract level. My response was that the examples you gave weren't implementationy, they were directly attacking core computational level concerns (e.g. David's work on covert feature vs XP movement). I think your response to that is that there's an alternative way of doing things, and that alternative way (yours ;-)) forces us not to get so hooked up in meaningless discussions about different implementations of the same idea. Maybe, but I'm not so sure.

Let's take, for example, privativity vs binarity of feature systems, which in your poster you say are all s-equivalent. Well, are they? It's not clear to me. For example Harbour's treatment of inverse agreement in Kiowa requires a three-way distinction of features (+, -, absent), suggesting binarity, at least, is needed and privativity won't give you enough power for that analysis. So if we want to have a handle on the actual empirical data, if Harbour is right in his analysis, we need to distinguish between systems that you say are s-equivalent. Now, I don't know what makes you say that these two different systems are equivalent, but whatever it is, if it misses how to deal with Kiowa inverse, it's missing something important.

More generally, the intermediate level that focusses on analysis, rather than broad data patterns plus assumed theoretical principles, is still, I think, incredibly important in linguistics. Patterns of strings, in syntax, are pretty uninformative. Patterns of form and meaning are massively informative for theory: just finding out what these are and specifying them in sufficient detail might look implementationy, but I think it's both crucial and really at the computational level.

None of this is to say that I don't agree with you that there's a problem in making our results accessible to those outside the field, but I think it's really one of presentation rather than of content.

I think this is, as so often, a case of one person's modus ponens being another person's modus tollens.

The s-equivalence is implied by various facts. The least interesting one is that privative, binary and finite-valued feature systems have the same power (priv < bin < val, and in the other direction every val system can be binarized and every binary system can be viewed as a privative system). In addition, features aren't needed at all to generate the intended output trees. You would probably say that those are technical tricks that miss the spirit of Harbour's result, and to that I say: yes, that's the point, because feature systems aren't what his result is about.
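The priv < bin < val inclusions rest on standard coding tricks, which can be sketched in a few lines (my own illustration; the helper names and the `b0`, `b1` feature labels are invented): a finite-valued feature is binarized bitwise, and a binary specification is recoded privatively by treating each signed value as a privative feature of its own, so even a Harbour-style three-way distinction (+, -, absent) survives the translation.

```python
import math

def binarize(values):
    """Encode each value of a finite-valued feature as a bundle of
    binary features b0, b1, ... (one bit per binary feature)."""
    values = sorted(values)
    width = max(1, math.ceil(math.log2(len(values))))
    return {v: {f'b{i}': bool((idx >> i) & 1) for i in range(width)}
            for idx, v in enumerate(values)}

def privativize(binary_spec):
    """Recode a binary bundle privatively: each [+F] / [-F] becomes its
    own privative feature, so [-F] stays distinct from F being absent."""
    return frozenset(f'{f}+' if val else f'{f}-'
                     for f, val in binary_spec.items())

# A 3-valued person feature maps to three distinct binary bundles:
person = binarize({'1', '2', '3'})
assert len({frozenset(spec.items()) for spec in person.values()}) == 3

# The three-way agreement distinction [+F] / [-F] / no F at all is
# preserved under the privative recoding:
plus, minus, absent = {'F': True}, {'F': False}, {}
assert len({privativize(plus), privativize(minus), privativize(absent)}) == 3
```

This is exactly the sense in which the encodings are "technical tricks": they preserve the distinctions an analysis needs while saying nothing about which encoding is the cognitively right one.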

Feature systems are a way of coding up algebras, agreement algebras in the case at hand. Harbour's argument is that there is a certain structural richness to this algebra at least in some languages. That richness is completely independent of how you encode it. The important task is to characterize this richness, and to see whether we find algebras of similar complexity anywhere else in language. And that's where the feature-based perspective is inherently limiting because not all aspects of language are easily modeled with the same kind of feature mechanism (syntactic Agree vs morphological paradigms in DM). Even where you find a common core, the more fine-grained assumptions diverge significantly (Harbour's agreement story vs Nevins's account of the PCC). So the level of description is an impediment to unification.

I'm not sure why this is even considered a (radical?) departure from standard generative methodology that has to be carefully vetted before it can be considered a viable research strategy. The early days of generative grammar had their fair share of work along these lines; the problem was that it mostly took the form of weak generative capacity results, which are too coarse for most linguistic issues (though they are a lot more important than linguists usually give them credit for). Some successful work in OT also fits this mold, e.g. Bruce Tesar's research on output-driven maps.

It's also not the case that one has to have a strong formal background for this kind of work. The island paper I referenced in my reply to Omer uses a very simple idea that anyone could have come up with, it just requires thinking at the level of whole languages rather than specific structures.

We agree that there are important underlying generalisations at a very abstract level that are cashed out in superficially different ways in particular implementations. I think what you're saying is that you have a good grasp of what the right level to state these generalisations is at (in terms of algebras or whatever) and you think the generalisations are good enough to do so. I'm not so sanguine about that, at least as a statement about syntax in general, as I don't think we understand large swathes of the system well enough to decide what the right mathematical models are: i.e what are the primitive units and relations such that we can just model them as algebras. For some of the system, perhaps we do, which is where your approach probably has serious traction.

Take crossover, for example. We have a very good understanding of how weak and strong crossover work in terms of a collection of empirical generalisations, but I don't think we're even at the `only say true things' stage at any reasonable level of generality, and the generalisations we do have interact in ways that suggest they are missing something: is it about precedence in underlying structure, is it about bijection, parallelism, economy of scope? We just don't know as yet, and so we need people to keep on plugging at the analytical level to try to figure out what the right generalisations are before we decide on the mathematical model for dealing with these. And crossover crosses over (see what I did there) into questions of scope, quantification, binding, distributivity, command, order, etc, so the right theory for crossover will have implications for the right theory for all of these.

So I think there's an enormous amount that we don't understand as yet, and we should approach that with whatever methodological resources we have at our disposal, whether we use minimalism, CCG, LFG, or whatever, irrespective of whether these different systems are of equivalent weak or strong power. Different systems spark off different ways of thinking about research questions, at least in my experience, and given that we're at a fairly primitive stage in our field, I think we probably need them, and analyses couched within them, for that reason.

That is certainly a position I can live with, but I'm not sure how representative it is of the field. Elena Anagnostopoulou voices concerns about mandatory theoretical fashions in her statement, and Gillian Ramchand's comment is in the same spirit. And TAG, CCG, LFG or HPSG research is not commonly seen in mainstream journals, not even in the bibliography. Now there's of course many reasons for that (limited time, the Matthew effect), but the bottom line is that most Minimalist research thinks in Minimalist terms and only Minimalist terms.

It all creates a very rigid mode of scientific discourse that strikes me as unhealthy (and probably you too). On the most generous reading, it is akin to physicists who can only think of quantum mechanics in terms of collapsing probability waves and don't know any of the alternative interpretations like multiple worlds or pilot waves. Some of those hypothetical physicists may even reject perfectly fine analyses with excellent coverage based on how much sense they make under their probability wave interpretation. Such a restricted view is inherently stifling and limits creativity.

I also find your crossover example very interesting. It is true that we don't know too much yet about crossover, but for me that raises two follow-up questions:

1) Why don't we know more yet? Generative syntax has been around for over 50 years. This isn't much shorter than modern chemistry (which begins with the success of spectroscopic methods in the 40s and 50s) or computer science (the beginnings of which are usually equated with the work of Turing and von Neumann in the 30s and 40s). I think it is fair to say that both fields have been a lot more successful than is accounted for by the 10-20 year gap. This might be due to scientific peculiarities of their domain, a broader base of previous results to build on, better institutional support and more resources, but it might also be something about how research is done.

For example, you don't see computer scientists write an algorithm in a specific programming language and then claim that it works for the handful of problems they've looked at. You have to give a complete, abstract description of the problem, the input, and a proof that the algorithm works. Nor do computer scientists spend their days debating whether regular languages are better formalized via grammars or automata. And, perhaps most importantly, computer scientists try to answer easy questions before hard ones. That's why standard complexity theory makes some abstractions that remove its results from many practical concerns of programmers: the latter is such a hard research problem that it isn't even clear what the questions should look like.

2) Continuing the last point, maybe we don't know a lot about crossover because we aren't asking the right questions. The first question should always be "Is it surprising that this happens in some language?", where surprising means "does not arise if you assume free variation within the computational bounds already set by the rest of your formalism". The existence of crossover effects is not particularly surprising on that level, so the next question is "do we have free variation"? And tellingly enough, I don't know the answer to that because I can't think of a single paper on the typology of crossover effects. We know that weak crossover is not universal, so if the whole typological range were [+/- weak crossover], there would be absolutely nothing to say about crossover. But there is also strong crossover, and I'm not sure if that holds across all known languages. If it doesn't, then the problem of crossover is interesting only to the extent that not all logically possible variants of crossover are attested (e.g. symmetric variants of weak and strong crossover). Which is still an interesting question, but it's also one that current accounts have very little to say about.

Yep, I'm with you about the narrowness of some of the discourse, and Gillian and I have talked about this a bit: the sociological issue that there's a privileged vocabulary and if you don't put things in that vocabulary, people don't listen. I'm generally more comfortable with the standard minimalist vocabulary than she is, so maybe that's why I don't notice the problem so much. Also, I spent my early years doing categorial grammar and various kinds of unification based grammar before I started on minimalism, and I see them all as pretty broadly the same, so I don't care that much about what vocabulary I use to express the generalisations I'm interested in. So maybe I'm not the right person to judge just how problematic this issue is. Certainly much of the work that's influenced me is not done in 'standard' minimalist vocabulary. Think Ramchand herself, Borer, Williams, Brody etc.

The crossover example is really interesting I think. Am just finishing a paper (for that Frontiers volume on how linguistic the language faculty is) on bound variable interpretations, and much of that falls into the domain of it being easy to just say true things about it, which is how I'm couching the paper. But crossover doesn't fit neatly into the other 'just say true things' generalisations about bva, although it looks like it should. Strong crossover looks to be universal while weak seems to be more variable, but some of that variability correlates directly with whether the language uses a (possibly null) resumption strategy (which requires detailed analytical work to determine!), and for some languages (Palauan) linear order seems crucial, but that seems to correlate with freedom of order between subject and object. For this reason I don't think that the distribution of weak crossover is just a free variation question. Something interesting is going on, and, I guess, it just needs more work! It made me think, just as you do, that a perfect research project would be to look at this systematically typologically.