Wikipedia coverage by langauge

Update (November 2014): I’ve recently published a related paper examining how many users edit multiple language editions of Wikipedia and how these multilingual users connect the editions together. Please see Multilinguals and Wikipedia Editing for further information and a free, open-access copy of the article.

I’ve put my new mapping skills to work on the latest Wikipedia dumps from 30 September 2011 to uncover some patterns in geotagged articles. My methods are not perfect and not all language editions of the encyclopedia have the same level of geo-tagging; nevertheless, I think the patterns revealed are quite telling:

The map above shows which language edition out of German, Portuguese, and Spanish has the most geotagged articles in each country. There are a few ties, but for the most part a clear pattern emerges: countries in the Spanish-speaking world have more Spanish articles, German-speaking regions more German articles, etc. I will parse more dumps and add these (in particular I’d like to add Arabic, French, and English), but I think this pattern will hold across these and other languages.

I’ve received some challenge on my language-related research about what specific benefits multilingual contributors might bring, and I think one answer lies in breadth of content. Better coverage for a particular language edition of Wikipedia might not lie in energizing those in the home regions of a language, but rather in mobilizing the diaspora and language learners. As Brian Hecht’s 2010 article shows, there is little overlap in content and articles between different additions of Wikipedia and thus the possibility of greater coverage exists for all language editions.

Note: This post was originally published on Scott Hale's blog on 17 October 2011 11:27 am. It might have been updated since then in its original location. The post gives the views of the author(s), and not necessarily the position of the Oxford Internet Institute.