Paris, France — November 24, 2011 — In October 2011, more than 2 million public messages were posted every day on Twitter in Arabic, up from about 30,000 in July 2010, a study of 5.6 billion tweets reveals.

The analysis, carried out by Semiocast, updates the study of language shares on Twitter published in February 2010. In October 2011, the top five languages used on Twitter were English, Japanese, Portuguese, Spanish and Malay. The survey covered 5.6 billion public messages gathered between July 1, 2010 and October 31, 2011, in order to track the evolution of the most used languages on Twitter.

The messages were processed with Semiocast’s analytics tools, which can identify the language used in short messages among 61 languages in all major writing systems (including Arabic, Greek, Hebrew, Chinese, Korean, Tamil, Cyrillic, Devanagari).
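
As a rough illustration of why the writing system matters for short messages, the sketch below guesses the dominant script of a text from Unicode character names. It is not Semiocast's method, which this release does not describe; the function name `dominant_script` is hypothetical. A script only narrows the candidates: English, Portuguese, Spanish, Malay and Dutch all use the Latin script, so telling them apart needs further statistical analysis.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script name among the letters of text, or None.

    Hypothetical helper for illustration only (not Semiocast's algorithm).
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and combining marks
        name = unicodedata.name(ch, "")
        if name:
            # The first word of a Unicode character name is usually the script,
            # e.g. "ARABIC LETTER MEEM" gives "ARABIC".
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["hello world", "Привет мир", "مرحبا", "שלום"]:
        print(sample, "->", dominant_script(sample))
```

Counting only letters keeps hashtags, numbers and punctuation, which are common in tweets, from skewing the guess.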

The share of English messages stabilized, the Japanese share decreased

Unsurprisingly, English is still the most used language on Twitter, with 39% of messages. This amounts to more than 70 million public tweets per day. Between October 2010 and October 2011, the volume of English messages increased by 182% (x2.82), slightly faster than Twitter's global growth over the same period (+150%, or x2.5). The share of English tweets has stabilized over the last 12 months at between 35% and 40%, down from two thirds in 2009 and 50% in February 2010.
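
The percentage and multiplier figures above are two ways of writing the same growth, and together they imply what the English share was a year earlier. The following is a minimal sketch of that arithmetic, using only the rounded figures quoted in this release; the variable and function names are illustrative, and the derived share is an estimate, not a number published by Semiocast.

```python
def pct_to_multiplier(pct_increase):
    """Convert a percentage increase (e.g. 182 for +182%) into a growth multiplier."""
    return 1 + pct_increase / 100


english_growth = pct_to_multiplier(182)  # English volume, Oct 2010 to Oct 2011
overall_growth = pct_to_multiplier(150)  # all of Twitter, same period

share_oct_2011 = 39.0  # English share in percent, as reported

# A share is language volume divided by total volume, so going back one year
# divides the numerator by english_growth and the denominator by overall_growth.
implied_share_oct_2010 = share_oct_2011 * overall_growth / english_growth

print(f"English multiplier: x{english_growth:.2f}")
print(f"Overall multiplier: x{overall_growth:.2f}")
print(f"Implied English share in October 2010: {implied_share_oct_2010:.1f}%")
```

The implied share of roughly 35% a year earlier sits at the bottom of the 35% to 40% band mentioned above, which is what a language growing slightly faster than the platform as a whole would produce.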

While Japanese remains the second most used language, the share of Japanese tweets has slowly decreased from more than 19% in mid-2010 to 14.2% in October 2011 (about 26M per day). In one year, the volume of Japanese tweets grew by only 85% in absolute terms, the second slowest growth among the top languages after Korean (+72%).

Portuguese still third, Spanish now fourth most used language

The third most used language on Twitter is Portuguese, with 12.4% of all tweets. While this is a significant increase since February 2010, Portuguese has grown more slowly than Twitter globally: its volume only doubled in the last twelve months (+113%). Likewise, the volume of tweets in Malay languages (including Bahasa Indonesia and Bahasa Melayu) only doubled (+107%): they represent 6.4% of all messages, mostly coming from Indonesia.

With +250% growth in one year, Spanish has overtaken Malay and has been the fourth most used language on Twitter since August 2011. 8.3% of all public messages on Twitter are now in Spanish, or about 15 million tweets per day.

Likewise, Dutch (+230% in one year) has overtaken Korean (+72%). They are now the 6th and 7th most used languages on Twitter, representing 2.7% and 1.6% of all tweets respectively in October 2011.

Arabic grew fastest

Yet the growth of Spanish and Dutch looks feeble compared to that of Arabic. The volume of Arabic messages has multiplied by more than 22 (+2,146%) in the last 12 months. Arabic is now the 8th most used language on Twitter, and Arabic messages represent 1.2% of all public tweets (2.2M per day). With recent events, Twitter has grown exceptionally fast in the Middle East. Although they are not among the top 10 most used languages, Farsi (+350% in one year, but only 50K messages per day) and Turkish (+290%, 0.8% of all tweets) have also grown fast over the period.
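
The share and per-day figures quoted in this release can be cross-checked against each other, since each pair implies a total daily volume for Twitter. The sketch below does that for the four languages where both numbers are given, and also backs out the Arabic volume a year earlier from the reported growth. All inputs are the rounded figures from this release; the results are approximations, and the names used are illustrative.

```python
# (share of all tweets in percent, millions of tweets per day), October 2011
REPORTED = {
    "English": (39.0, 70.0),
    "Japanese": (14.2, 26.0),
    "Spanish": (8.3, 15.0),
    "Arabic": (1.2, 2.2),
}


def implied_total(share_pct, volume_millions):
    """Total daily tweets (in millions) implied by one language's share and volume."""
    return volume_millions / (share_pct / 100)


totals = {lang: implied_total(share, vol) for lang, (share, vol) in REPORTED.items()}
for lang, total in totals.items():
    print(f"{lang}: implies about {total:.0f}M public tweets per day overall")

# +2,146% means the volume was multiplied by 1 + 21.46 = 22.46.
arabic_multiplier = 1 + 2146 / 100
arabic_oct_2010 = 2.2e6 / arabic_multiplier
print(f"Implied Arabic volume in October 2010: about {arabic_oct_2010:,.0f} per day")
```

The four pairs agree on a total of roughly 180 million public tweets per day, and the Arabic figure works out to just under 100,000 messages per day in October 2010.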

Thai, the 9th most used language on Twitter, also increased significantly (+470% in one year).

Notably, Twitter's website, translated into 17 languages, is not yet available in Thai or Arabic.

Twitter still banned in China

Less than 0.5% of all tweets are in Chinese (520K per day). Indeed, Twitter is still banned in mainland China, and the transition to a new authentication scheme eventually prevented Chinese users from using proxies with regular Twitter clients. Chinese users have turned to Twitter's Chinese competitors, most prominently Sina's Weibo. Semiocast's recent client surveys have shown that Weibo now accounts for up to 20% of all micro-messages for queries on global luxury, retail, tourism and transport brands.

Multilingual social media monitoring more necessary than ever

Language detection, combined with the geolocation of Twitter profiles, is a powerful tool to assess the presence of global brands on social media. While half of Twitter conversations were in English in 2010, brands now need tools that can cope with multilingual conversations. To meet this growing need, Semiocast has extended its language detection algorithms to identify 61 languages in tweets, comments and blog posts, and has developed premium sentiment analysis, among other tools, that can be applied to any language.

About Semiocast

Semiocast is a company based in Paris, France, providing data intelligence and research on social media. Semiocast helps its clients measure and evaluate consumer reactions to a campaign or product launch, and understand buzz and what consumers are saying about their products, services and brands in social media conversations. In 2011, Semiocast launched Semioboard, a multilingual social media monitoring dashboard for brands and agencies.