Word meanings across languages support efficient communication

University of California at Berkeley

Languages vary widely in their semantic categories—the ideas they encode in single words or morphemes. Figure 1 illustrates cross-language variation in the semantic domains of color, kinship, and space.

Figure 1. The semantic domains of color (left), kinship (middle), and space (right). For each domain, the top panel illustrates (part of) the domain, and the bottom panel shows the category systems of two languages partitioning that domain. Categories are shown as color-coded regions, sets, or outlines. The research reviewed here seeks to explain why languages vary as they do in their category systems.

Such cross-language variation is found in other semantic domains as well (e.g. Malt & Majid, 2013). Yet at the same time, many logically possible category systems are not found in any known language, and similar category systems appear in unrelated languages. What explains this pattern of wide but constrained variation, and what does this pattern reveal about the relation of language and cognition?

A recent line of research has suggested an answer: this pattern of cross-language variation may reflect the functional principle that language supports efficient communication (e.g. Piantadosi et al., 2011; Fedzechkina et al., 2012; Kemp & Regier, 2012; Kirby et al., 2015), and should therefore show the imprint of selective pressure for that function. On this view, the different category systems that we see across languages represent different language-specific solutions to this shared communicative challenge. At the same time, on this view, certain category systems recur across unrelated languages because they represent good solutions to this functional problem, and are for that reason arrived at independently. In what follows, I first flesh out these ideas and illustrate them in the semantic domain of spatial relations, with pointers to analogous work in other domains. I then sketch a recent challenge to this line of research, and a response to that challenge.

Efficient communication and cross-language variation: The case of space

The idea of efficient communication involves a tradeoff between two competing forces: informativeness and simplicity. A communicative system is informative to the extent that it supports precise communication; this can be formalized as the extent to which a system minimizes expected information loss in communication between a speaker and a listener (Kemp & Regier, 2012; Regier et al., 2015). A communicative system is simple if its cognitive representation is compact, for example if it has a small number of semantic categories. Finally, a system is efficient to the extent that it provides high informativeness (low expected information loss) for its level of complexity.
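These definitions can be made concrete with a minimal sketch in Python. The sketch rests on simplifying assumptions that are not taken from the studies cited: every object is equally likely to be talked about unless a need distribution is supplied, the speaker knows the intended object with certainty, and the listener guesses uniformly among the members of the named category. Under those assumptions the information lost on an object is the surprisal of the listener's guess, log2 of the category size. The function names are illustrative only.

```python
import math
from collections import Counter


def expected_information_loss(system, need=None):
    """Expected information loss (in bits) of a category system.

    `system` is a list of category labels, one per object.
    `need` is an optional list of probabilities that each object is the
    one the speaker wants to convey (assumed uniform if omitted).
    """
    n = len(system)
    if need is None:
        need = [1.0 / n] * n
    sizes = Counter(system)
    # Assumption: the listener guesses uniformly within the named category,
    # so the loss for an object is -log2(1 / size) = log2(size).
    return sum(p * math.log2(sizes[label]) for p, label in zip(need, system))


def complexity(system):
    """Assumed complexity measure: the number of distinct categories."""
    return len(set(system))


if __name__ == "__main__":
    balanced = ["a", "a", "b", "b"]
    lopsided = ["a", "a", "a", "b"]
    for name, system in [("balanced", balanced), ("lopsided", lopsided)]:
        print(name, complexity(system), expected_information_loss(system))
```

Both toy systems have the same complexity (two categories), but the balanced partition loses 1 bit on average and the lopsided one about 1.19 bits, so the balanced system is the more efficient of the two under these assumptions.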

These ideas were originally shown to account for patterns of cross-language semantic variation in the domains of color (Regier et al., 2007; 2015) and kinship (Kemp & Regier, 2012), and more recently number (Xu & Regier, 2014) and household artifacts (Xu et al., 2015; in press). Khetarpal et al. (2013) wished to determine whether the same ideas would account for systems of spatial categories across languages, which are known to vary widely. To that end, they analyzed the spatial systems of 11 diverse languages: Basque, Dutch, English, Ewe, Lao, Lavukaleve, Maijɨki, Tiriyó, Trumai, Yélî-Dnye, and Yukatek. The cross-language data for this study were drawn primarily from Levinson et al. (2003), supplemented by other sources. The spatial systems of these languages cross-cut extensively, as illustrated in Figure 2.

For each of these languages, Khetarpal et al. computationally assessed the informativeness of its spatial system, and compared it to the informativeness of a wide range of hypothetical systems of the same complexity. In all cases, the informativeness of a system—actual or hypothetical—was based on the similarity of the items it categorized together, as determined empirically through pile-sorting by speakers of English and Dutch. Thus, informativeness in communication was taken to be the extent to which a spoken word elicits, in the listener’s mind, notions that are similar to the one that the speaker intended.
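The similarity-based notion of informativeness described above can be illustrated with a short Python sketch. It is not the measure used by Khetarpal et al.; it assumes, for illustration, that the listener picks uniformly among the items in the named category, and scores a system by the average similarity between the intended item and the item the listener picks. The similarity matrix below is invented toy data, standing in for empirically derived pile-sort similarities.

```python
def informativeness(system, sim):
    """Average similarity between the speaker's intended item and the
    item a listener recovers from the category label.

    `system` is a list of category labels, one per item.
    `sim[i][j]` is the similarity between items i and j (toy values here).
    Assumption: the listener picks uniformly among category members.
    """
    n = len(system)
    total = 0.0
    for i in range(n):
        members = [j for j in range(n) if system[j] == system[i]]
        total += sum(sim[i][j] for j in members) / len(members)
    return total / n


# Invented similarities: items 0 and 1 are alike, items 2 and 3 are alike.
TOY_SIM = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

if __name__ == "__main__":
    print(informativeness([0, 0, 1, 1], TOY_SIM))  # groups similar items: about 0.925
    print(informativeness([0, 1, 0, 1], TOY_SIM))  # groups dissimilar items: about 0.55
```

Both toy systems have two categories, so the comparison holds complexity fixed; the system that groups similar items together scores higher.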

Figure 3 shows results for one language, Basque. The red vertical line shows the informativeness of the Basque spatial adpositional system, and the blue histogram shows the informativeness of a wide range of hypothetical systems of complexity comparable to Basque. The Basque system is more informative than 99.95% of the hypothetical systems.

Figure 3. Informativeness of communication supported by the Basque spatial adpositional system (red line), compared with that of comparable hypothetical systems (blue histogram).
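The comparison of an attested system against hypothetical systems can be sketched as a simple percentile computation. In the Python sketch below, hypothetical systems are generated by randomly shuffling the attested labels, which preserves the number and sizes of categories; this is an assumption made for illustration and is not necessarily how the hypothetical systems in the study were constructed. The `compactness` score and the eight-item line of positions are likewise toy stand-ins for a real informativeness measure and a real spatial domain.

```python
import random


def percentile_rank(attested, score, n_samples=2000, seed=0):
    """Percentage of sampled hypothetical systems that score strictly
    lower than the attested system.

    Hypothetical systems are random shuffles of the attested labels
    (an illustrative assumption, not the study's procedure).
    """
    rng = random.Random(seed)
    attested_score = score(attested)
    labels = list(attested)
    beaten = 0
    for _ in range(n_samples):
        rng.shuffle(labels)
        if score(labels) < attested_score:
            beaten += 1
    return 100.0 * beaten / n_samples


def compactness(system):
    """Toy score: items sit at positions 0..n-1 on a line, and a system
    scores higher when items sharing a category are close together."""
    n = len(system)
    return -sum(
        abs(i - j)
        for i in range(n)
        for j in range(i + 1, n)
        if system[i] == system[j]
    )


if __name__ == "__main__":
    contiguous = [0, 0, 0, 0, 1, 1, 1, 1]
    print(percentile_rank(contiguous, compactness))
```

In this toy setting only the contiguous partition and its mirror image achieve the top score, so the printed percentile should come out near 97.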

For each of the 11 languages analyzed, the attested system was similarly found to lie in the upper tail of the distribution of hypothetical systems (more informative than over 95% of them). These findings suggest that the spatial systems of these genetically and semantically diverse languages are each nearly as informative as possible for their level of complexity, consistent with the suggestion that these systems may reflect a functional drive to support efficient, informative communication. Although not detailed here, the same findings also suggest that this account of semantic diversity provides a better explanation of these data than does a major competing theoretical alternative, namely semantic maps (Croft, 2003:134; Haspelmath, 2003; Regier et al., 2013). Finally, as we have seen, the same notion of efficient communication also explains cross-language variation in other qualitatively different domains (color, kinship, number, household artifacts), suggesting that it may provide a domain-general foundation for semantic categories across languages.

A challenge: The origins of semantic diversity

In a commentary on Kemp and Regier’s (2012) kinship study, Levinson (2012) pointed out, correctly, that although that research explains cross-language semantic variation in communicative terms, it does not tell us “where our categories come from” (p. 989); that is, it does not explore the historical or cultural processes that have produced these diverse yet informative category systems. Levinson suggested that this work might be usefully supplemented by experimental studies of cultural transmission in the lab.

Carstensen et al. (2015) pursued that idea. They explored iterated learning (e.g. Kirby et al., 2008) of spatial semantic category systems through a chain of human learners. The first learner in the chain was taught a random category system, in which spatial scenes were arbitrarily assigned to categories, and then attempted to reproduce that assignment of scenes to categories. The first learner’s output was then provided as input to the second learner in the chain, and so on. This process of iterated learning produced systems of categories that gradually converged toward greater informativeness, in a variety of ways. These findings, taken together with those above, suggest that larger-scale cultural transmission over historical time could have produced the diverse yet informative category systems found in the world’s languages.
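The structure of an iterated learning chain can be sketched in a few lines of Python. This is a toy simulation, not a model of the experiment with human learners: it assumes that scenes are points on a line, that each simulated learner sees the labels of only a random subset of scenes, and that the learner labels every other scene by copying its nearest seen neighbor. All names and parameter values are hypothetical.

```python
import random


def learn(prev_system, n_seen, rng):
    """One simulated learner: observe `n_seen` labeled scenes, then label
    every scene by copying the nearest observed scene (assumed learner)."""
    n = len(prev_system)
    seen = rng.sample(range(n), n_seen)
    new_system = []
    for i in range(n):
        # Scenes are positions on a line, so distance is |i - j|;
        # ties are broken in favor of the lower position.
        nearest = min(seen, key=lambda j: (abs(i - j), j))
        new_system.append(prev_system[nearest])
    return new_system


def iterate(n_items=12, n_categories=3, n_seen=6, generations=10, seed=0):
    """Run a chain: a random initial system, then each learner's output
    becomes the next learner's input. Returns all systems in order."""
    rng = random.Random(seed)
    system = [rng.randrange(n_categories) for _ in range(n_items)]
    chain = [system]
    for _ in range(generations):
        system = learn(system, n_seen, rng)
        chain.append(system)
    return chain


if __name__ == "__main__":
    for generation, system in enumerate(iterate()):
        print(generation, system)
```

In runs of this toy chain, scattered labels typically give way to contiguous blocks of scenes that share a category, and categories can be lost along the way. The sketch is meant only to show how transmission through a learning bottleneck can reshape an arbitrary system.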