
Language frequency of Ebola tweets

‘Ebola’ is an unusually distinctive word for an infectious disease. Compare Bird Flu or Swine Flu, for example, where developing search queries may be difficult. In the case of Ebola, I have found that using the keyword on its own is sufficient to gather an enormous number of tweets. Among the languages supported on Twitter, the word ‘Ebola’ is used in 15 languages, and 7 languages have their own translation, as shown in the table below:

I found that my sample of tweets contains languages that have their own translation of ‘Ebola’, because Twitter users may opt to use ‘Ebola’ rather than the translation in their own language. For example, Russian tweeters may use ‘Ebola’ rather than ‘Эбола’.
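One way to check this in a sample is to compare each tweet's language identifier with the spelling of the keyword it contains. The sketch below is only an illustration: it assumes each tweet is a dict with hypothetical `lang` and `text` fields, the sample tweets are made up, and the mapping of native spellings holds just the Russian example from this post.

```python
# Native spellings keyed by language code. Only the Russian entry is taken
# from the post; extend the mapping with spellings you have verified.
NATIVE_SPELLINGS = {"ru": "Эбола"}


def keyword_form(tweet):
    """Classify which spelling of the keyword a tweet uses.

    `tweet` is assumed to be a dict with 'lang' and 'text' keys
    (hypothetical field names for this sketch).
    """
    text = tweet["text"].casefold()
    native = NATIVE_SPELLINGS.get(tweet["lang"])
    has_latin = "ebola" in text
    has_native = native is not None and native.casefold() in text
    if has_latin and has_native:
        return "both"
    if has_latin:
        return "latin"
    if has_native:
        return "native"
    return "neither"


# Made-up example tweets, not taken from the real data set.
sample = [
    {"lang": "ru", "text": "Новости про Ebola"},
    {"lang": "ru", "text": "Вирус Эбола"},
    {"lang": "en", "text": "Ebola update"},
]
forms = [keyword_form(t) for t in sample]
print(forms)
```

Counting the "latin" results per language would show how often users of a language with its own translation still choose ‘Ebola’.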

To examine the percentage of English tweets relative to those in other languages, I gathered over a million tweets using Mozdeh, which uses Twitter’s Search API. The tweets were gathered over an 11-day period from 27 November to 7 December 2014.

I used the language metadata attached to each tweet to work out the frequency of each language in SPSS. The table below shows the breakdown:

Language Breakdown

| Language | Frequency (%) |
|---|---|
| English | 632112 (62.3) |
| Spanish | 220566 (21.8) |
| Portuguese | 59774 (5.9) |
| French | 42242 (4.2) |
| Italian | 20645 (2.0) |
| Dutch | 12698 (1.3) |
| Turkish | 5099 (0.5) |
| German | 4899 (0.5) |
| Russian* | 2267 (0.2) |
| Hungarian | 1854 (0.2) |
| Swedish | 1779 (0.2) |
| Japanese* | 1649 (0.2) |
| Polish | 1362 (0.1) |
| Arabic* | 1303 (0.1) |
| Danish | 586 (0.1) |
| Norwegian | 465 (0.0) |
| Finnish | 405 (0.0) |
| Korean | 366 (0.0) |
| Hindi | 187 (0.0) |
| Thai* | 170 (0.0) |
| Urdu* | 116 (0.0) |
| Farsi* | 36 (0.0) |
| Total (with language identifier) | 1010580 |
| Missing** | 37995 |
| Total (all tweets) | 1048575 |

*These languages have their own translation of ‘Ebola’, but users have still chosen to use ‘Ebola’.
**Not all tweets have language identifiers
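
The same kind of tally can be sketched outside SPSS. The example below is a minimal illustration rather than the SPSS procedure itself: it assumes the language identifiers have been exported as a plain list of codes, with `None` or an empty string standing in for a missing identifier, and it runs on a made-up toy list. Percentages are taken relative to tweets that have an identifier.

```python
from collections import Counter


def language_breakdown(lang_codes):
    """Return (rows, missing) for a list of language codes.

    rows is a list of (code, count, percent) tuples sorted by count, where
    percent is relative to tweets that have a language identifier.
    None or an empty string is treated as a missing identifier.
    """
    counts = Counter(code for code in lang_codes if code)
    missing = sum(1 for code in lang_codes if not code)
    valid_total = sum(counts.values())
    rows = [
        (code, n, round(100 * n / valid_total, 1))
        for code, n in counts.most_common()
    ]
    return rows, missing


# Made-up toy input, not the real data set.
toy = ["en", "en", "es", None, "en", "pt", "es", ""]
rows, missing = language_breakdown(toy)
for code, n, pct in rows:
    print(f"{code}\t{n}\t({pct})")
print(f"missing\t{missing}")
```

Keeping the missing identifiers out of the percentage base mirrors the layout of the table, where the missing tweets are reported on a separate row.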

The keyword ‘Ebola’ was picked up in 22 of the 29 languages that Twitter supports. It is interesting to note that 62.3% of Ebola tweets are in English, followed by Spanish (21.8%) and Portuguese (5.9%). For my PhD research I am focusing on English-language tweets, and this type of analysis tells me that there is a sufficient number of English-language tweets related to the Ebola epidemic.

A limitation of this, however, is that I was only able to draw up frequencies for languages that are ‘supported’ by Twitter, for which there is metadata. Languages that do not have language identifiers, such as Sub-Saharan African languages, could not be counted.
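
To put a number on the tweets that fall outside the breakdown, the totals reported in the table can be combined as in the short sketch below. This is simple arithmetic on the published counts (the variable names are illustrative), and it says nothing about which languages the unidentified tweets are actually written in.

```python
# Totals taken from the table in this post.
with_identifier = 1010580
missing_identifier = 37995
all_tweets = with_identifier + missing_identifier

# Share of the whole sample that has no language identifier.
missing_share = 100 * missing_identifier / all_tweets
print(f"{all_tweets} tweets, {missing_share:.1f}% without a language identifier")
```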

In the next post I will look at the number of tweets on Ebola that have geolocation data and cross-tabulate these with language identifiers. These results form a part of a larger project which has ethics approval.