Different approaches to causality in linguistics

Damian Blasi and I are organising a workshop on Causality in the language sciences (call for presentations now open!). As we were talking about the themes, we realised that there are multiple ways that a causal mechanism may manifest itself in the real world, and that very different statistical approaches may be applicable to each.

Below is the bare bones of a paper that should be coming out in an edited volume on Dependencies in language. We discuss three types of causal mechanism, with examples from linguistics.

The first kind of causal mechanism is the kind often discussed on this blog, which we might call a bi-directional implication. Greenberg's structural dependencies are an example (e.g. verb-object languages tend to have prepositions, while object-verb languages tend to have postpositions): two variables are predicted to co-vary. The causal strength would be maximised by something like the following table of observations:

                    | Cause present | Cause absent
Effect observed     | 75            | 0
Effect not observed | 0             | 25

The causal strength of this kind of mechanism can be assessed by Eells' (1991) measure of causal strength, where E is the effect and C the cause:

Causal strength = P(E|C) - P(E|~C)

Regression techniques may be suitable for statistical analysis of this kind of effect.
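As a rough sketch, Eells' measure, P(E|C) - P(E|~C), can be computed directly from the four cells of a table like the one above. The function name and the order of its arguments are illustrative choices, not part of the original analysis.

```python
def eells_strength(effect_with_cause, no_effect_with_cause,
                   effect_without_cause, no_effect_without_cause):
    """Eells' measure: P(E|C) - P(E|~C), from the four cells of a 2x2 table."""
    # P(E|C): share of cause-present cases in which the effect is observed.
    p_e_given_c = effect_with_cause / (effect_with_cause + no_effect_with_cause)
    # P(E|~C): share of cause-absent cases in which the effect is observed.
    p_e_given_not_c = effect_without_cause / (
        effect_without_cause + no_effect_without_cause)
    return p_e_given_c - p_e_given_not_c


# The idealised table above: all observations on the diagonal.
print(eells_strength(75, 0, 0, 25))  # 1.0
# A noisier, hypothetical table gives a weaker dependency.
print(eells_strength(60, 15, 10, 15))
```

The measure ranges from -1 to 1; a value of 0 means the effect is no more likely when the cause is present than when it is absent.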

The second kind of causal dependency is what we could call coexistential: the effect and the cause tend to be present together, and the cause is also a necessary condition for the effect to make sense. Linguistic universals fall into this category, for example the claim that all languages have stop contrasts (Hyman, 2008). The assumed cause is a universal constraint on physicality or communication that is always present; it may be absent in, for example, animal communication. In this case, the strength of the claim depends on the number of counterexamples.

                    | Cause present | Cause absent
Effect observed     | 100           | (0)
Effect not observed | 0             | (0)

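and causal strength could be measured from the number of counter-examples (cases where the cause is present but the effect is not), with fewer counter-examples meaning a stronger claim:

Causal strength = 1 - P(Effect not observed | Cause present)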
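A minimal sketch of this measure, computed from the cause-present column only, since the cause-absent cells are (0) and cannot be estimated from human languages. It returns both the counter-example rate, P(Effect not observed | Cause present), and its complement. The function name is hypothetical.

```python
def coexistential_strength(effect_with_cause, no_effect_with_cause):
    """Return (counter-example rate, strength) for a coexistential claim.

    The counter-example rate is P(effect not observed | cause present);
    strength is its complement, so fewer counter-examples give a stronger claim.
    """
    total = effect_with_cause + no_effect_with_cause
    counterexample_rate = no_effect_with_cause / total
    return counterexample_rate, 1 - counterexample_rate


print(coexistential_strength(100, 0))  # (0.0, 1.0)
# Hypothetical: 2 counter-examples in a sample of 100 languages.
print(coexistential_strength(98, 2))
```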

Positive results of this kind are difficult to demonstrate statistically. For example, Piantadosi & Gibson (2013) suggest that there are not enough independent languages to provide statistical support for such universals at conventional significance thresholds.
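A back-of-the-envelope calculation shows why the number of independent languages matters. This is a simple binomial argument, not Piantadosi & Gibson's own method, and it assumes every sampled language is independent, which related languages are not. If exceptions to a universal actually occurred at rate p, the chance of finding none in n independent languages would be (1 - p)^n; the sketch finds the smallest n at which that chance drops below a significance level alpha. The function name is hypothetical.

```python
def languages_needed(exception_rate, alpha=0.05):
    """Smallest n such that (1 - exception_rate) ** n < alpha.

    If exceptions really occurred at exception_rate, seeing zero exceptions
    in n independent languages would have probability (1 - exception_rate) ** n.
    Only once this falls below alpha can a sample with no counter-examples
    reject that exception rate.
    """
    if not 0 < exception_rate < 1:
        raise ValueError("exception_rate must be strictly between 0 and 1")
    n = 0
    while (1 - exception_rate) ** n >= alpha:
        n += 1
    return n


for rate in (0.05, 0.01):
    print(rate, languages_needed(rate))
```

With alpha = 0.05 this gives 59 languages to rule out a 5% exception rate and 299 to rule out a 1% rate, and each of those languages would have to be genuinely independent of the others.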

The third kind of dependency is a uni-directional implication. For example, Everett (2013) suggested that languages spoken at higher altitudes and drier climates should be more likely to have ejective sounds. The claim makes no explicit assertion about the overall probability of ejective sounds, nor does it require that languages spoken at low altitudes lack ejectives. That is, while the Greenberg case expects 'pairings', this case expects a 'gap' (no languages without ejectives at higher altitudes).

The causal strength of this kind of mechanism can be measured as:

Causal strength = (P(E|C) - P(E|~C)) / (1 - P(E|~C))

This is similar to the causal strength measure for bi-directional implication, but normalised by the probability of not observing the effect when the cause is absent, 1 - P(E|~C). This removes the floor effect: a high probability of the effect when the cause is absent does not necessarily diminish the causal strength.
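A minimal sketch comparing the two measures, assuming the normalised measure is (P(E|C) - P(E|~C)) / (1 - P(E|~C)). The counts are hypothetical: the effect is common even when the cause is absent, so the plain difference is small while the normalised measure is not. All function names are illustrative.

```python
def conditional_probs(effect_with_cause, no_effect_with_cause,
                      effect_without_cause, no_effect_without_cause):
    """Return P(E|C) and P(E|~C) from the four cells of a 2x2 table."""
    p_e_c = effect_with_cause / (effect_with_cause + no_effect_with_cause)
    p_e_not_c = effect_without_cause / (
        effect_without_cause + no_effect_without_cause)
    return p_e_c, p_e_not_c


def difference_strength(p_e_c, p_e_not_c):
    """Unnormalised measure used for bi-directional implications."""
    return p_e_c - p_e_not_c


def normalised_strength(p_e_c, p_e_not_c):
    """Difference divided by 1 - P(E|~C), as for uni-directional implications."""
    if p_e_not_c == 1:
        raise ValueError("undefined when the effect always occurs without the cause")
    return (p_e_c - p_e_not_c) / (1 - p_e_not_c)


# Hypothetical counts: the effect appears in all 50 cause-present cases
# and in 40 of the 50 cause-absent cases.
p_e_c, p_e_not_c = conditional_probs(50, 0, 40, 10)
print(difference_strength(p_e_c, p_e_not_c))  # about 0.2
print(normalised_strength(p_e_c, p_e_not_c))  # 1.0
```

When no cause-present case lacks the effect (the gap is empty), the normalised measure reaches 1 however common the effect is elsewhere.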

For this type of mechanism, typical regression methods may not be suitable, because the prediction only makes a claim about a certain portion of the data, not about an overall trend across the whole dataset. In this case, permutation tests or Monte Carlo tests may be more suitable.
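A minimal permutation-test sketch for a 'gap' claim. The test statistic (the number of cases with the cause present but the effect absent) and the data are illustrative assumptions, not Everett's analysis. Shuffling the effect labels keeps the overall frequencies of cause and effect fixed but breaks any link between them; a small p-value means the gap is emptier than chance would produce. The sketch treats languages as independent, so a real analysis would also need to account for relatedness and contact.

```python
import random


def gap_count(cause, effect):
    """Number of cases where the cause is present but the effect is absent."""
    return sum(1 for c, e in zip(cause, effect) if c and not e)


def gap_permutation_test(cause, effect, n_perm=5000, seed=0):
    """One-sided permutation test: is the gap emptier than expected by chance?"""
    rng = random.Random(seed)
    observed = gap_count(cause, effect)
    shuffled = list(effect)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the cause-effect link, keep frequencies
        if gap_count(cause, shuffled) <= observed:
            as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: 20 high-altitude languages, all with ejectives,
# and 80 low-altitude languages, 20 of which have ejectives.
high_altitude = [True] * 20 + [False] * 80
has_ejectives = [True] * 40 + [False] * 60
observed, p = gap_permutation_test(high_altitude, has_ejectives)
print(observed, p)
```

Because the statistic looks only at the gap cell, languages at low altitudes can have ejectives or not without counting against the claim, which matches the one-sided prediction.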

When assessing a claim about causality in a linguistic system, it’s important to keep these different mechanisms in mind. Different mechanisms should be approached in different ways, and a single statistical framework may not be the best fit for all hypotheses.