Friday, 19 November 2010

The mutual exclusivity bias has been posited as a fundamental learning bias (Markman & Wachtel, 1988; Merriman & Bowman, 1989). However, there is mounting evidence that bilinguals do not exhibit mutual exclusivity (Merriman & Kutlesic, 1993; Byers-Heinlein & Werker, 2009; Healey & Skarabela, 2009; Houston-Price, Caloghiris & Raviglione, 2010). One hypothesis is that the variation in bilinguals' input either provides too little evidence for a mutual exclusivity bias to emerge, or renders the bias ineffective. Previously, I showed that a Bayesian model of cross-situational learning (Frank et al., 2009) could not account for differences found between the mutual exclusivity behaviour of monolingual and bilingual children. A reasonable step towards capturing this behaviour would be to add a higher level of abstraction to the model. This would allow the model to alter the strength of its own mutual exclusivity bias in accordance with the amount of variance it encountered.

The research described here implements a hierarchical Bayesian model of causal structure (Lucas & Griffiths, 2010; Lucas, 2010). Although it does not address mutual exclusivity directly, it is very relevant. Adults' and children's inferences about causal relationships were affected by their previous experience. A hierarchical Bayesian model that could adjust its assumptions about causal structure was shown to match the performance of adults.

When we learn about causal relationships, we learn to associate a cause with an effect. When there are multiple possible causes, we also need to consider the functional form of the relationships. For example, imagine that two switches are connected to a light. There are a number of ways the switches could cause the light to turn on. Perhaps only one switch needs to be activated (OR), perhaps both are needed (AND). The switches may always work the same way (deterministic) or fail some proportion of the time (stochastic or `noisy').
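The switch example can be written down as a small function. This is only a sketch: the name `p_light_on` and the single `reliability` parameter are my own assumptions for illustration, not the parameterisation used by Lucas & Griffiths.

```python
def p_light_on(switches_on, form, reliability=1.0):
    """Probability that the light turns on.

    switches_on: how many of the two switches are flipped (0, 1 or 2).
    form:        "OR" (one switch is enough) or "AND" (both are needed).
    reliability: 1.0 gives a deterministic relationship; lower values
                 give a noisy one (assumed parameter, for illustration).
    """
    if form == "OR":
        # Each flipped switch independently succeeds with prob. `reliability`.
        return 1 - (1 - reliability) ** switches_on
    if form == "AND":
        # Both switches are required; the pair succeeds with prob. `reliability`.
        return reliability if switches_on == 2 else 0.0
    raise ValueError(f"unknown functional form: {form}")


for form in ("OR", "AND"):
    for reliability in (1.0, 0.9):
        row = [p_light_on(n, form, reliability) for n in (0, 1, 2)]
        print(form, reliability, row)
```

The same observation (one switch flipped, light stays off) is impossible under a deterministic OR, unremarkable under AND, and merely unlikely under a noisy OR, which is why the functional form matters for inference.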

Lucas & Griffiths (2010) show that a hierarchical Bayesian model can match the behaviour of adults in a learning task where they had to make inferences about causal relationships. Previous models of causal structure learning either assume particular functional forms of causal relationships, making them inflexible, or make no assumptions, rendering them incapable of capturing effects of context (see Griffiths & Tenenbaum, 2005; Lucas & Griffiths, 2010). The hierarchical model infers the functional form of the causal relationship as well as the exact relationship between variables.

Although not couched in terms of cross-situational learning, the task is compatible with one. Participants were shown wooden blocks, identical except for a one-letter label (A, B and C in the figure above). The task was to learn which were `blickets'. To help them, there was a `blicket meter' - a device that activated in the presence of a blicket. Participants watched several training rounds in which one or two blocks were placed on the blicket meter, and observed whether the meter activated. After training, they saw another set of blocks (D, E and F) go through a series of blicket tests. They were asked to indicate how confident they were that each block was a blicket.

The experiments had different training conditions. In one, the training data was consistent with the blicket meter's response having a disjunctive relationship (OR) with its causes. That is, it activated when any of the blocks placed on the meter were blickets. In another condition, the blicket meter responded consistently with a conjunctive relationship (AND). That is, two blickets were required to activate the blicket meter. The test block was set up to be consistent with either training condition.
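To make the idea of training data being `consistent with' a functional form concrete, here is a rough sketch that scores OR against AND by summing over every possible set of blickets. It is not the model from the paper: the training rounds below are hypothetical examples I made up for illustration, and `p_activate`, `strength`, `leak` and the 0.5 priors are all assumed.

```python
from itertools import product

BLOCKS = ("A", "B", "C")

# Hypothetical training rounds (illustrative only, not taken from the paper):
# (blocks placed on the meter, did the meter activate?)
TRAINING = {
    "disjunctive": [("A", True), ("B", False), ("C", False),
                    ("AB", True), ("AC", True), ("BC", False)],
    "conjunctive": [("A", False), ("B", False), ("C", False),
                    ("AB", False), ("AC", True), ("BC", False)],
}


def p_activate(n_blickets, form, strength=0.8, leak=0.05):
    """Assumed noisy response of the meter to n blickets placed on it."""
    if form == "OR":
        return 1 - (1 - leak) * (1 - strength) ** n_blickets
    if form == "AND":
        return strength if n_blickets >= 2 else leak
    raise ValueError(f"unknown functional form: {form}")


def marginal_likelihood(events, form, prior=0.5):
    """P(events | form), summed over all assignments of blicket status."""
    total = 0.0
    for assignment in product([False, True], repeat=len(BLOCKS)):
        is_blicket = dict(zip(BLOCKS, assignment))
        weight = 1.0
        for block in BLOCKS:
            weight *= prior if is_blicket[block] else 1 - prior
        for on_meter, activated in events:
            n = sum(is_blicket[b] for b in on_meter)
            p = p_activate(n, form)
            weight *= p if activated else 1 - p
        total += weight
    return total


def posterior_over_forms(events, form_prior=None):
    """Posterior over functional forms, the higher level of the hierarchy."""
    form_prior = form_prior or {"OR": 0.5, "AND": 0.5}
    scores = {f: form_prior[f] * marginal_likelihood(events, f)
              for f in form_prior}
    z = sum(scores.values())
    return {f: s / z for f, s in scores.items()}


for condition, events in TRAINING.items():
    print(condition, posterior_over_forms(events))
```

Under these assumptions, each set of training rounds shifts belief towards the matching form, and that shifted belief is what a learner would carry into the test block.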

Participants' responses in the test block were affected by their experience during training. They saw block D alone fail to activate the meter 3 times, block E fail once, and blocks D and F together activate the meter twice. If you assume a disjunctive relationship, then D failing 3 times should be evidence against D being a blicket, while E failing once is weaker evidence. Indeed, participants in this condition rated D as being less likely to be a blicket than E. Assuming a conjunctive relationship, however, D failing to activate the meter is not informative, whereas seeing D and F activate the meter together is evidence for D being a blicket. Participants given conjunctive training rated D as being more likely to be a blicket. The model matched the participants' responses closely.
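The reasoning about D and E can be checked with a small enumeration over which of D, E and F are blickets. This is a simplified stand-in rather than the model from the paper: the test events come from the description above, but `p_activate`, `strength`, `leak` and the 0.5 prior are assumptions of mine.

```python
from itertools import product

BLOCKS = ("D", "E", "F")

# Test events as described in the text:
# D alone fails 3 times, E alone fails once, D and F together activate twice.
EVENTS = [("D", False)] * 3 + [("E", False)] + [("DF", True)] * 2


def p_activate(n_blickets, form, strength=0.8, leak=0.05):
    """Assumed noisy response of the meter to n blickets placed on it."""
    if form == "OR":
        return 1 - (1 - leak) * (1 - strength) ** n_blickets
    if form == "AND":
        return strength if n_blickets >= 2 else leak
    raise ValueError(f"unknown functional form: {form}")


def posterior_blicket(form, prior=0.5):
    """P(block is a blicket | test events), assuming one functional form."""
    weights = {}
    for assignment in product([False, True], repeat=len(BLOCKS)):
        is_blicket = dict(zip(BLOCKS, assignment))
        weight = 1.0
        for block in BLOCKS:
            weight *= prior if is_blicket[block] else 1 - prior
        for on_meter, activated in EVENTS:
            n = sum(is_blicket[b] for b in on_meter)
            p = p_activate(n, form)
            weight *= p if activated else 1 - p
        weights[assignment] = weight
    total = sum(weights.values())
    return {block: sum(w for a, w in weights.items() if a[i]) / total
            for i, block in enumerate(BLOCKS)}


for form in ("OR", "AND"):
    print(form, posterior_blicket(form))
```

With these assumed parameters, D comes out less probable than E under OR and more probable than E under AND, the same qualitative reversal described above. Under AND the single failure of E carries no information, so E stays at its prior.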

Lucas & Griffiths argue that this shows that people can make inferences appropriate to causal relationships with more than one kind of functional form (e.g. conjunctive, disjunctive) and that their inferences can be affected by evidence `transferred' from a previous experience. In other words, participants' assumptions about causal relationships can be modified by experience, and this can lead to qualitatively different behaviour.

Lucas (2010) shows that the model also accounts for children's behaviour. However, children's responses were more affected by the likelihood than adults' were, while adults tended to assume an OR function. This suggests that children are more flexible learners.

However, I'm unsure whether hierarchical Bayesian modelling can be applied to language. Causal forms and causal relationships have a definite hierarchy. But what about F1 and F0? English uses formants to make distinctions at the lexical level and pitch to make distinctions at the pragmatic or phrasal level. Tonal languages, however, use pitch at the lexical level, and some have morphological markers of phrasal boundaries (see Black, 2000).

I suggest that Bayesian models will always have built-in assumptions about the structure of the phenomenon. In studying language evolution, we should be focussing on how this structure emerges in the first place. I propose a different kind of hierarchical model that does not specify a structure in advance. Rather, the role of each level of the hierarchy should be determined by the data. This should be based on the most salient cues that divide the variance in the data in the most functional way.

One possible solution could be general hierarchical dynamic expectation maximisation models. But more on this in the future.