
A new paper by Anita Slonimska and me attempts to link global tendencies in the lexicon to constraints from turn taking in conversation.

Question words in English sound similar (who, why, where, what …), so much so that this class of words is often referred to as wh-words. This regularity exists in many languages, though the phonetic similarity differs, for example:

| English | Latvian | Yaqui | Telugu |
| --- | --- | --- | --- |
| haw ‘how’ | ka: | jachinia | elaa |
| haw mɛni ‘how many’ | tsik | jaikim | enni |
| haw mətʃ ‘how much’ | tsik | jaiki | enta |
| wət ‘what’ | kas | jita | eem; eemi[Ti] |
| wɛn ‘when’ | kad | jakko | eppuDu |
| wɛr ‘where’ | kuɾ | jaksa | eTa; eedi; ekkaDa |
| wɪtʃ ‘which’ | kuɾʃ | jita | eevi |
| hu ‘who’ | kas | jabesa | ewaru |
| waj ‘why’ | ˈkaːpeːts | jaisakai | en[du]ceeta; enduku |

In her Master’s thesis, Anita suggested that these similarities help conversation flow smoothly. Turn taking in conversation is surprisingly swift: the usual gap between turns is only around 200ms. This is even more surprising when one considers that retrieving, planning and beginning to pronounce even a single word takes around 600ms. Speakers must therefore begin planning their turn at least 400ms before the current speaker has finished speaking (as demonstrated by many recent studies, e.g. Barthel et al., 2017). Starting your turn late can be interpreted as uncooperative, or mean missing a chance to speak.

Perhaps the harshest environment for turn-taking is answering a content question. Responders must understand the question, retrieve the answer, plan their utterance and begin speaking. It makes sense to expect that cues would evolve to help responders recognise that a question is coming. Indeed there are many paralinguistic cues, such as rising intonation (even at the beginning of sentences) and eye gaze. Another obvious cue is question words, especially when they appear at the beginning of question sentences. Slonimska hypothesised that wh-words sound similar in order to provide an extra cue that a question is about to be asked, so that the speaker can begin preparing their turn early.

We tried to test this hypothesis, firstly by simply asking whether wh-words really do have a tendency to sound similar within languages. We combined several lexical databases to produce a word list for 1000 concepts in 226 languages, including question words. We found that question words are:

More similar within languages than between languages

More similar than other sets of words (e.g. pronouns)

Often composed of salient phonemes
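Our actual analysis used a custom measure over a 226-language database, but the core within- vs between-language comparison can be sketched with a toy measure. Here, normalized Levenshtein similarity stands in for the real phonetic measure, applied to the English and Latvian forms listed above:

```python
# Toy sketch of the within- vs between-language comparison, assuming
# normalized Levenshtein similarity as a stand-in for the paper's
# phonetic measure (the real analysis differs).

def levenshtein(a, b):
    """Standard edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    return 1 - levenshtein(a, b) / max(len(a), len(b))

english = ["haw", "wət", "wɛn", "wɛr", "wɪtʃ", "hu", "waj"]
latvian = ["ka:", "kas", "kad", "kuɾ", "kuɾʃ", "kas", "ˈkaːpeːts"]

def mean_within(words):
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

within = (mean_within(english) + mean_within(latvian)) / 2
between = sum(similarity(e, l) for e in english for l in latvian) \
          / (len(english) * len(latvian))
print(within > between)  # wh-words are more alike within a language
```

Even on this toy pair of languages, the within-language similarity comfortably exceeds the between-language similarity, which is the shape of the first finding above.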

Of course, there are several possible confounds, such as languages being historically related, and many wh-words being derived from other wh-words within the same language. We attempted to control for these by using stratified permutation, excluding analysable forms, and comparing wh-words to many other sets of words (such as pronouns) that are subject to the same processes. Not all languages have similar-sounding wh-words, but across the whole database the tendency was robust.
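The logic of the stratified permutation control can be sketched as follows. The family strata, word lists and the crude ‘same first segment’ statistic are all toy stand-ins for the real data and similarity measure; only the shuffle-within-strata structure mirrors the actual analysis:

```python
# Toy stratified permutation test: the statistic is recomputed after
# shuffling forms only WITHIN each language family, so genealogical
# relatedness cannot inflate it. All data below are invented.
import random

def initial_match(words):
    """Toy statistic: proportion of word pairs sharing a first segment."""
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(a[0] == b[0] for a, b in pairs) / len(pairs)

# One wh-word list per language, grouped into (invented) family strata.
families = {
    "FamilyA": {"lang1": ["wat", "wen", "wer"],
                "lang2": ["vas", "van", "vo"]},
    "FamilyB": {"lang3": ["ka", "kur", "kad"],
                "lang4": ["ma", "mik", "mur"]},
}

def mean_statistic(families):
    lists = [ws for fam in families.values() for ws in fam.values()]
    return sum(initial_match(ws) for ws in lists) / len(lists)

def permuted_statistic(families, rng):
    """Shuffle forms across languages, but only within each family."""
    total, n = 0.0, 0
    for fam in families.values():
        pool = [w for ws in fam.values() for w in ws]
        rng.shuffle(pool)
        for ws in fam.values():
            total += initial_match(pool[:len(ws)])
            pool = pool[len(ws):]
            n += 1
    return total / n

rng = random.Random(1)
observed = mean_statistic(families)
null = [permuted_statistic(families, rng) for _ in range(999)]
p = sum(s >= observed for s in null) / len(null)
print(observed, p)  # observed statistic and its permutation p-value
```

Because the shuffle respects family boundaries, a low p-value here means the within-language clustering is stronger than relatedness alone would predict.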

Another prediction is that the wh-word cues should be more useful if they appear at the beginning of question sentences. We tested this using typological data on whether wh-words appear in initial position. While the trend was in the right direction, the result was not significant when controlling for historical and areal relationships.

Despite this, we hope that our study shows that it is possible to connect constraints from turn taking to macro-level patterns across languages, and then test the link using large corpora and custom methods.

Anita will be presenting an experimental approach to this question at this year’s CogSci conference. We show that /w,h/ is a good predictor of questions in real English conversations, and that people actually use /w,h/ to help predict that a question is coming up.

A new paper by Monica Tamariz, myself, Isidro Martínez and Julio Santiago uses an iterated learning paradigm to investigate the emergence of iconicity in the lexicon. The languages were mappings between written forms and a set of shapes that varied in colour, outline and, importantly, how spiky or round they were.

We found that languages which begin with no iconic mapping develop a bouba-kiki relationship when the languages are used for communication between two participants, but not when they are just learned and reproduced. The measure of the iconicity of the words came from naive raters.

Here’s one of the languages at the end of a communication chain, and you can see that the labels for spiky shapes ‘sound’ more spiky:

An example language from the final generation of our experiment: meanings, labels and spikiness ratings.

These experiments were actually run way back in 2013, but as is often the case, the project lost momentum. Monica and I met last year to look at it again, and we did some new analyses. We worked out whether each new innovation that participants created increased or decreased iconicity. We found that new innovations are equally likely to result in higher or lower iconicity: mutation is random. However, in the communication condition, participants re-used more iconic forms: selection is biased. That fits with a number of other studies on iconicity, including Verhoef et al. (2015, CogSci proceedings) and Blasi et al. (2017).

Matthew Jones, Gabriella Vigliocco and colleagues have been working on similar experiments, though their results are slightly different. Jones presented this work at the recent symposium on iconicity in language and literature (you can read the abstract here), and will also present it at this year’s CogSci conference, which I’m looking forward to reading.

Our paper is quite short, so I won’t spend any more time on it here, apart from one other cool thing. For the final set of labels in each generation we measured iconicity using scores from naive raters, but for the analysis of innovations we had hundreds of extra forms. We used a random forest to predict iconicity ratings for the extra labels from unigrams and bigrams of the rated labels. It accounted for 89% of the variance in participant ratings on unseen data. This is a good improvement over older techniques such as using the average iconicity of the individual letters in the label, since random forests allow the weighting of particular letters to be estimated from the data, and also allow for non-linear effects when two letters are combined.
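The featurisation step can be sketched in a few lines: each label becomes counts of its character unigrams and bigrams, and those counts serve as predictors for the random forest (the label and function names here are illustrative, not from our actual code):

```python
# Sketch: turning a label into character unigram + bigram counts,
# the representation fed to the random forest.
from collections import Counter

def ngram_features(label, n_values=(1, 2)):
    """Count character n-grams of the given sizes in a label."""
    feats = Counter()
    for n in n_values:
        for i in range(len(label) - n + 1):
            feats[label[i:i + n]] += 1
    return feats

print(sorted(ngram_features("kiki").items()))
# [('i', 2), ('ik', 1), ('k', 2), ('ki', 2)]
```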

However, it turns out that most of the predictive work is done by a simple decision tree with just 3 unigram variables: labels were rated as more spiky if they contained a ‘k’, ‘j’ or ‘z’ (our experiment was run in Spanish):

So the method was a bit overkill in this case, but might be useful for future studies.
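To make that concrete, the kind of tree involved can be sketched as nested presence checks. The split structure follows the description above, but the leaf values are invented for illustration rather than taken from the fitted model:

```python
# Illustrative depth-3 decision tree over unigram presence of
# 'k', 'j', 'z'; leaf values are made up, not fitted.
def predicted_spikiness(label):
    if "k" in label:
        return 0.9 if "z" in label else 0.7
    if "j" in label:
        return 0.6
    return 0.2

print(predicted_spikiness("kazin"))   # 0.9 (spiky-sounding)
print(predicted_spikiness("bomolu"))  # 0.2 (round-sounding)
```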

All data and code for doing the analyses and random forest prediction is available in the supporting information of the paper, or in this github repository.

One of the fundamental principles of linguistics is that speakers who are separated in time or space will start to sound different, while speakers who interact with each other will start to sound similar. Historical linguists have traced the diversification of languages using objective linguistic measurements, but so far there has never been a large-scale test of whether languages that are further apart on a family tree, or more physically distant from each other, actually sound more different to human listeners.

An opportunity arose to test this in the form of The Great Language Game: a web-based game where players listen to a clip of someone talking and have to guess which language is being spoken. It was played by nearly one million people from 80 countries, and so is, as far as we know, the biggest linguistic experiment ever. Actually, this is probably my favourite table I’ve ever published (note the last row):

| Continent of IP address | Number of guesses |
| --- | --- |
| Europe | 7,963,630 |
| North America | 5,980,767 |
| Asia | 841,609 |
| Oceania | 364,390 |
| South America | 356,390 |
| Africa | 74,032 |
| Antarctica | 11 |

We calculated the probability of confusing any of the 78 languages in the Great Language Game for any of the others (excluding guesses about a language if it was an official language of the country the player was in). Players were good at this game – on average getting 70% of guesses correct.
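The confusion statistic itself is just a conditional probability estimated from guess counts. Here is a minimal sketch with invented guess records (the real analysis also applied the official-language exclusion described above):

```python
# Sketch: estimating pairwise confusion probabilities from
# (language spoken, language guessed) records. Records are invented.
from collections import defaultdict

guesses = [
    ("Swedish", "Norwegian"), ("Swedish", "Swedish"),
    ("Swedish", "Swedish"), ("Italian", "Italian"),
    ("Italian", "Italian"), ("Italian", "Japanese"),
]

counts = defaultdict(lambda: defaultdict(int))
for truth, guess in guesses:
    counts[truth][guess] += 1

def p_confusion(truth, guess):
    """P(player answers `guess` | clip was actually in `truth`)."""
    total = sum(counts[truth].values())
    return counts[truth][guess] / total

print(p_confusion("Swedish", "Norwegian"))  # 1/3 on this toy data
```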

Using partial Mantel tests, we found that languages are more likely to be confused if they are:

We also used Random Forests analyses to show that a language is more likely to be guessed correctly if it is often mentioned in literature, is the main language of an economically powerful country, is spoken by many people or is spoken in many countries.

We visualised the perceptual similarity of languages by using the inverse probability of confusion to create a neighbour net:

This diagram shows a kind of subway map for the way languages sound. The shortest route between two languages indicates how often they are confused for one another – so Swedish and Norwegian sound similar, but Italian and Japanese sound very different. The further you have to travel, the more different two languages sound. So French and German are far away from many languages, since these were the best-guessed in the corpus.

The labels we’ve given to some of the clusters are descriptive, rather than being official terms that linguists use. The first striking pattern is that some languages are more closely connected than others, for example the Slavic languages are all grouped together, indicating that people have a hard time distinguishing between them. Some of the other groups are more based on geographic area, such as the ‘Dravidian’ or ‘African’ cluster. The ‘North Sea’ cluster is interesting: it includes Welsh, Scottish Gaelic, Dutch, Danish, Swedish, Norwegian and Icelandic. These diverged from each other a long time ago in the Indo-European family tree, but have had more recent contact due to trade and invasion across the North Sea.

The whole graph splits between ‘Western’ and ‘Eastern’ languages (we refer to the political/cultural divide rather than any linguistic classification). This probably reflects the fact that most players were Western, or at least could probably read the English website. That would certainly explain the linguistically confused “East Asian” cluster. There are also a lot of interconnected lines, which indicates that some languages are confused for multiple groups, for example Turkish is placed halfway between “West” and “East” languages.

It was also possible to create neighbour nets for responses from specific parts of the world. While the general pattern is similar, there are also some interesting differences. For example, respondents from North America were quite likely to confuse Yiddish and Hebrew. The two languages come from different language families, but both are spoken by a mainly Jewish population, and this may form part of players’ cultural knowledge of these languages.

In contrast, players from Africa placed Hebrew with the other Afro-Asiatic languages.

Results like this suggest that perception may be shaped by our linguistic history and cultural knowledge.

We also did some preliminary analyses on the phoneme inventories of languages, using binary decision trees to explore which sounds make a language distinctive; these identified some rare and salient features as critical cues to distinctiveness.

The future

The analyses were complicated because we knew little about the individual players beyond the country of their IP address. However, Hedvig and I, together with a team from the Language in Interaction consortium (Mark Dingemanse, Pashiera Barkhuysen and Peter Withers), created a version of the game called LingQuest that does collect information about players’ linguistic backgrounds. It also asks participants to compare sound files directly, rather than use written labels.

The conference is part of the “X in the Language Sciences” (XLanS) series which aims to bring a wide range of researchers together to focus on a particular topic in language that interests them. The goal is to identify the crucial issues and connect them with cutting-edge techniques in order to develop better explanations of linguistic phenomena (see details of the first conference “Causality in the language sciences” here).

This year’s topic is ‘triggers of change’: What causes a sound system or lexicon or grammatical system to change? How can we explain rapid changes followed by periods of stability? Can we predict the direction and rate of change according to external influences?

It’s International Women’s Day! Language evolution is a largely male-dominated discipline: women account for only 8 of the top 100 most cited authors, and only 14 of 82 invited speakers at the Evolution of Language conference (see here). To promote the contribution of women to our field, we’ve compiled a list of 100 female researchers in language evolution.

The list is by no means exhaustive, and is largely based on attendance at the most recent EvoLang conference. Topics cover both language origins and evolutionary approaches to linguistics more generally. A recent paper by each author is also included, though it may not be the best representation of their work. All mistakes with regards to links and citations are my own.

Micklos (2016) Interaction for facilitating conventionalization: negotiating the silent gesture communication of noun-verb pairs. The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11)

Kriengwatana (2016) A general auditory bias for handling speaker variability in speech? Evidence in humans and songbirds. The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11)

(2016) Rule learning in birds: zebra finches generalize by positional similarities, budgerigars by the structural rules. The Evolution of Language: Proceedings of the 11th International Conference (EVOLANG11)

In 2016 and 2017, ten post-doc researchers will join the MPI for Psycholinguistics in Nijmegen to form the Language Evolution and Interaction Scholars of Nijmegen group (LEvInSON).

The group will explore the biological and cultural origins of language, and how they are linked through social interaction. The group, led by Stephen Levinson, Seán Roberts and Mark Dingemanse, will be hosted by the Language and Cognition department.

In a new paper in the Journal of Language Evolution, Tessa Verhoef and I analyse reviewer ratings for papers submitted to the EvoLang conference between 2012 and 2016. At the most recent conference, we trialled double-blind review for the first time, and we wanted to see if hiding the identity of authors revealed any biases in reviewers’ ratings.

We found that:

Proportionately few papers are submitted from female first authors.

In single-blind review, there was no big difference in average ratings for papers by male or female first authors …

… but female first-authored papers were rated significantly higher than male first-authored papers in the double-blind condition.

There are many possible explanations of these findings, but they are indicative of a bias against female authors. This fits with a wider literature of gender biases in science. We suggest that double-blind review is one tool that can help reduce the effects of gender biases, but does not tackle the underlying problem directly. We were pleased to see better representation of women on the most recent EvoLang talks and plenary speaker list, and look forward to making our field more inclusive.

I’ll be appearing at Nineworlds convention as part of Stephanie Rennick’s panel on “Lessons for Academia from Computer Games”. The idea is to talk about ways in which games have informed our research, and here are some of the things I’ll mention:

Minecraft shows us how language evolved

How were the very first languages created? How do you agree on words for things if you don’t have a language yet? The accepted theory is that people point at things they need and invent a word for them at the same time. After many rounds of negotiation, people come to a consensus about how to describe things. We tried to simulate this in Minecraft by getting people to build a little house together, while only being able to communicate by knocking on the table. But we found that, if you gave people the ability to point at things, they could do the task perfectly well without inventing a communication system at all. This was quite surprising, and suggests that language did not originate as a simple way of requesting things, but maybe as a way of referring to things that you can’t easily point to, like the future or emotions. More here

A chimp playing a computer game shows us we have flexible brains

Ayumu is a chimpanzee who plays computer games, and he’s REALLY GOOD at them. In a game where you have to memorise the location of numbers on a screen, he left human participants in the dust (there’s a fun video of this). The original researchers concluded that there was a genetic difference between us and chimpanzees: chimps had evolved better visual memory for hunting, and we had evolved better auditory memory for speaking. However, we wondered if Ayumu could beat experienced gamers. We set up a ‘Chimp Challenge’ online where people could play the game, and found over 60 people who were as good as Ayumu. This suggests that the difference is also due to experience: humans have very flexible brains that can get good at a lot of different things. More here.

Computer games can help us learn about linguistic diversity

Linguists are great at spotting differences between languages, but we don’t actually know very much about which differences matter most to people. We explored “the great language game” – an online game where you have to name the language being spoken in a recording. Looking at 15 million results, we found that the more different two languages were, the more easily people could tell them apart. But we also found that people confused some languages that linguists would consider extremely different, and that confusions differed depending on the languages you know. We suggest that how you experience a foreign language is linked to your cultural knowledge and beliefs. We took this one step further by creating an updated version of the game with some very rare languages, which we hope to analyse in the future. More here.

The proceedings of the 11th Evolution of Language conference are now available to buy as a physical book.

The book is available through print-on-demand publisher Lulu for £23.72. This is the lowest price allowed by the site, and will provide EvoLang with £2.81 for each sale. The book now also has an ISBN: 978-1-326-61450-8.

This book is being made available due to popular demand, but all the papers and abstracts are freely available from the proceedings website, which is the canonical source. Unfortunately, the costs were too great to publish in colour, so the inside of the book is black and white.