Global Language Network

How does Wikipedia help linguists in their research? How many languages keep the world's communication interconnected? Cesar Hidalgo, Assistant Professor of Media Arts and Sciences at the MIT Media Lab, explains how to quantify the importance of a language.

Of the many languages that have ever been spoken, only a few have achieved global prominence and become important enough to count as global languages. Many people have tried to identify which languages were global in the past, and they have used different measures: whether a language is spoken by a large number of people, whether its speakers have a large military, or whether its speakers have a high level of income. These measures give us some idea of which languages are more important than others, but they leave out an important dimension that helps us understand the truly global importance of a language.

The thing is, although this observation is rather obvious, it has so far been impossible to measure the network of global languages, and therefore to use that information to help determine which languages are more important. What my group has done is go to the Web and to different data repositories and connect information that lets us map which languages are spoken together with which others. We used three different data sources. The first was a Twitter dataset of 1 billion tweets. We can ask, “Do you express yourself in Russian on Twitter? And do you express yourself in English on Twitter?”, because we can detect the language of your tweets. If we know that you express yourself in both languages, we know that you contribute a little bit to the link between Russian speakers and English speakers.
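The linking step described above can be illustrated with a minimal sketch. It assumes that the languages of each user's tweets have already been detected, and it simply counts, for every pair of languages, how many users write in both. The function name `build_language_links`, the user IDs, and the toy data are hypothetical, and the group's actual pipeline may weight or filter links differently.

```python
from collections import Counter
from itertools import combinations


def build_language_links(user_languages):
    """Count, for each pair of languages, how many users write in both.

    `user_languages` maps a user ID to the list of languages detected
    in that user's tweets (hypothetical input format).
    """
    links = Counter()
    for languages in user_languages.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(languages)), 2):
            links[(a, b)] += 1
    return links


# Toy data, invented for illustration only.
user_languages = {
    "user_1": ["ru", "en"],        # tweets in Russian and English
    "user_2": ["en"],              # monolingual: contributes no link
    "user_3": ["es", "en"],
    "user_4": ["ru", "en", "ru"],  # repeated detections count once
    "user_5": ["es", "pt"],
}

links = build_language_links(user_languages)
for pair, weight in sorted(links.items()):
    print(pair, weight)
```

In this toy example, two users write in both Russian and English, so the Russian-English link gets a weight of 2, while a user who tweets in only one language contributes nothing to the network.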

Basically, being born into a highly connected language is a better predictor of whether a person is going to be important than being born into a language that is very populous or that is spoken by very wealthy people. As an alternative, we used another dataset, published by Charles Murray in a book a few years ago, in which he compiled a list of 4,000 people in the arts and sciences, and we found basically the same results. The centrality of a language in the global language network is a strong and significant predictor of whether that language produces a large number of successful people, after controlling for the income of its speakers and the population of the language.
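To make the idea of centrality concrete, here is a minimal sketch that ranks languages by weighted degree, that is, the total weight of the links touching each language. This is only one simple notion of centrality, chosen here as an assumption; the text does not say which centrality measure the group used. The link weights and the function name `weighted_degree` are hypothetical.

```python
from collections import defaultdict


def weighted_degree(links):
    """Sum the weights of all links touching each language.

    `links` maps a pair of languages to a link weight, for example
    the number of users who write in both (hypothetical input format).
    """
    strength = defaultdict(int)
    for (a, b), weight in links.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


# Toy link weights, invented for illustration only.
links = {("en", "ru"): 2, ("en", "es"): 1, ("es", "pt"): 1}

centrality = weighted_degree(links)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(centrality)
print(ranking)
```

In this toy network English comes out as the most central language because it is linked to the most co-expression overall, regardless of how many speakers each language has.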