Cambridge University Linguists Track Changes in Welsh Language Using Twitter

May 29, 2013

A team of researchers from the University of Cambridge announced today that they have been using a database of Welsh tweets to identify and track the characteristics of the evolving Welsh language.

Welsh is spoken by only around a sixth of the population of Wales, some 560,000 people. However, enough Welsh speakers are tweeting in Welsh to allow the researchers to compile a database of source material, which they intend to use as the basis for future research.

Dr David Willis, of Cambridge’s Department of Theoretical and Applied Linguistics, explained the crucial role Twitter has played in the research and outlined some of the advantages of working with conversational tweets.

He said: “When your intention is to capture everyday usage, one of the greatest challenges is to develop questions that don’t lead the respondent towards a particular answer but give you answers that provide the material you need. If I want to find out whether a particular construction is emerging, and where the people who use it come from, I would normally have to conduct a time-consuming pilot study, but with Twitter I can get a rough-and-ready answer in 30 minutes, as people tweet much as they speak.

“My focus is on the syntax of language – the structure or grammar of sentences – and my long-term aim is to produce a syntactic atlas of Welsh dialects that will add to our understanding of current usage of the language and the multi-stranded influences on it. This relies on gathering spoken material from different sectors of the Welsh-speaking population to make comparisons across time and space.”

The Twitter analysis is being used to support a year-long study in which Willis and assistant researchers have been interviewing around 160 people across Wales, beginning with North Wales, where the language is thriving and a significant number of children use Welsh as their home language. The study included both Welsh speakers who had acquired the language at home and those who had acquired it at school.

The spoken questionnaire presented interviewees with sentences in deliberately ‘odd’ Welsh that mixed different dialects and asked them to repeat each one in their own words, rephrasing it so that it sounded more ‘natural’. An example in English might be ‘we’ve not to be there yet, don’t we?’, which a British speaker might be expected to rephrase as ‘we haven’t got to be there yet, have we?’

It is hoped that the data from these interviews can shine a light on how and why the structure of language shifts over time – and provide the researchers with a valuable database for use in future research.

Changes identified so far include the use of indefinite pronouns and of multiple negatives. An analysis of usage of the Welsh words for ‘anyone’, ‘someone’ and ‘no-one’ reveals differences between those who learnt Welsh in the home (who are more likely to say the equivalent of ‘did someone come to the meeting?’ and ‘I didn’t see no-one’) and those who learnt it at school (who are more likely to say ‘did anyone come to the meeting?’ and ‘I didn’t see anyone’).

One example involving multiple negatives reveals a shift in the meaning of ‘cau’, the Welsh word for ‘refuse’. “We knew that people in the north used the word ‘cau’ to mean ‘won’t’, saying the equivalent of ‘the door refuses to open’ for ‘the door won’t open’. Negative concord – such as saying ‘I haven’t not seen no-one’ for ‘I haven’t seen anyone’ – is a strong feature of Welsh. We’ve now identified two groups in the north: one that still says ‘the door refuses to open’ and another that has begun to say ‘the door doesn’t refuse to open’. The next step is to work out when and how this change occurred.”

To track shifts in the language, the researchers used GIS (geographic information system) mapping to plot where interviewees were raised. This enabled them to look at the geographical spread of particular aspects of syntax, comparing age groups and genders across the different areas where interviewees were brought up.

The research has revealed that, while Welsh does not appear to vary much by social class, there are interesting differences between the variety of Welsh spoken by those who learn it as their first language and that spoken by those who are first exposed to it at nursery or primary school.

“Those who acquire Welsh once they reach school are more likely to use English-style sentence constructions, which are perfectly good Welsh but differ significantly from the constructions used by those who acquired Welsh at home. For example, they tend to prefer standard focus particles – words that correspond to a strong stress in English sentences like ‘I know YOU’ll be on time’ – over the ones from their local dialect,” said Willis.

With around 22% of the Welsh population educated in Welsh at school, and all other children learning it as a second language, data on this aspect of language acquisition may prove valuable in developing Welsh teaching policy, for example in determining which forms to teach second-language learners or in promoting both dialect and standard written Welsh in schools.