In the spring of 2016, the WMF partnered with Votomobile and conducted a phone survey to learn more about technology and Wikipedia use in Nigeria.

The 19 questions in the survey covered:

Internet use

Mobile phone use (smartphones & basic voice/SMS phones)

Awareness and use of Wikipedia

General demographics

This was a large-scale interactive voice response (IVR) phone survey, gathering over 2700 completed responses from randomly generated numbers across Nigeria. A voice (IVR) survey was chosen so that respondents who may not have internet access could be included. This approach allowed us to measure internet and smartphone penetration, along with answering other Wikipedia-related questions. The scale and methodology of the survey also kept the margin of error low (<2%) for questions asked of all respondents.

To get the most representative data possible, we worked with Votomobile to conduct a phone IVR survey. A phone survey can reach nearly the full spectrum of age, gender, geography, income and education levels. For Nigeria, the survey called randomly generated phone numbers that were assigned to mobile phones.

Our sample of 2700 completed responses is large enough that results for questions asked of all respondents are accurate to within a 2% margin of error at a 95% confidence level.
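As a rough check of the figures above, the sketch below computes the margin of error for a proportion using the standard normal approximation. It assumes simple random sampling, the worst-case proportion p = 0.5, and z = 1.96 for 95% confidence. These are our assumptions for illustration; the exact calculation used for the survey is not stated here.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest (worst-case) interval.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 2700 completed responses, as reported for questions asked of all respondents.
moe = margin_of_error(2700)
print(f"{moe:.4f}")  # about 0.019, i.e. under 2%
```

Questions asked of only a subset of respondents have a smaller n, so their margin of error is wider than this.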

The survey was recorded in 4 languages: Hausa, English, Pidgin and Yoruba.

One issue with phone surveys is the tendency for some respondents to favor the first response to a question. To address this problem, most of the survey questions presented the responses in a random order for each call. This distributes any bias evenly among the responses instead of accumulating it all on one response. Note that questions that have a 'none of these' or 'other' response always kept this option as the last one presented.
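The randomization described above could be sketched as follows. This is a minimal illustration, not the survey platform's actual implementation: the function name, the option labels, and the exact wording of the pinned responses are all hypothetical.

```python
import random

# Assumed labels for the responses that always stay last (hypothetical wording).
PINNED_LAST = {"none of these", "other"}

def randomized_order(responses, rng=None):
    """Shuffle the responses for one call, keeping pinned options at the end."""
    rng = rng or random
    movable = [r for r in responses if r.lower() not in PINNED_LAST]
    pinned = [r for r in responses if r.lower() in PINNED_LAST]
    rng.shuffle(movable)  # a fresh random order for each call
    return movable + pinned

# Placeholder options, not actual survey wording.
options = ["Option A", "Option B", "Option C", "None of these"]
order = randomized_order(options)
```

Because each call gets its own shuffle, any first-response bias is spread across the shuffled options rather than always landing on the same one.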

A couple of survey questions, however, have responses with a natural order and would be confusing if presented in a completely random order. For instance, when we ask respondents how often they use Wikipedia, a non-sequential order would not make sense (e.g. an order of “once a week”, “once a month”, “once a day”). For these questions, we randomly presented the responses in one of two orders: either from lowest to highest, or highest to lowest.
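For these ordered questions, the two-direction approach might look like the sketch below. The function name is hypothetical, and the equal 50/50 split between directions is an assumption; the text only says one of the two orders was chosen at random.

```python
import random

def ordered_presentation(scale, rng=None):
    """Return an ordered scale either as given (lowest to highest) or reversed.

    Assumption: each direction is chosen with equal probability.
    """
    rng = rng or random
    ascending = list(scale)
    return ascending if rng.random() < 0.5 else ascending[::-1]

# Frequency options from the example above, listed from lowest to highest.
frequency = ["once a month", "once a week", "once a day"]
presented = ordered_presentation(frequency)
```

Either way, the respondent hears the options in sequence, and first-response bias is split between the two ends of the scale.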

The questions asking whether the respondent uses Facebook or WhatsApp were asked only of respondents who had previously said that they do not use the internet. This is by design: we wanted to use these questions to gauge how many people did not understand that Facebook was part of the internet. The responses to these two questions were not intended to measure the full use of Facebook or WhatsApp.

It’s important to note that this survey is not linear. Depending on how a question is answered, the flow of the rest of the survey may change. For example, if a respondent says they do not have a smartphone, we skip the smartphone-related questions. You can review the flow diagram to see how the survey progresses.
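The branching can be pictured with a small sketch. It encodes only the two rules mentioned in this description (the smartphone skip and the Facebook/WhatsApp follow-up). The answer keys and group names are hypothetical, and the real survey flow has more branches than shown here.

```python
def follow_up_groups(answers):
    """Return which follow-up question groups to ask, given earlier answers.

    `answers` maps hypothetical keys to booleans; only two branches are modeled.
    """
    groups = []
    # Facebook/WhatsApp questions go only to those who say they do not use the internet.
    if answers.get("uses_internet") is False:
        groups.append("facebook_whatsapp")
    # Smartphone questions are skipped for respondents without a smartphone.
    if answers.get("has_smartphone"):
        groups.append("smartphone")
    return groups

example = follow_up_groups({"uses_internet": False, "has_smartphone": False})
```

When analyzing the data, this is why some columns are legitimately empty for a given respondent: the question was never part of that respondent's path.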

Within the CSV file, each row represents one survey taken, with each column containing the response to the associated question. In certain cases, some questions that should have been asked were not, and these entries are marked as 'Missing Gx'. The number after the G indicates the group of questions that were not asked for that particular respondent.
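When loading the file, it can help to separate the 'Missing Gx' markers from real answers. The sketch below assumes the marker appears literally as "Missing G" followed by a number. The column names and the two sample rows are made up for illustration and do not come from the actual dataset.

```python
import csv
import io
import re

# Assumed marker format: "Missing G" followed by the group number.
MISSING_GROUP = re.compile(r"^Missing G(\d+)$")

def missing_groups(row):
    """Return the set of question-group numbers marked as not asked in one row."""
    groups = set()
    for value in row.values():
        if isinstance(value, str):
            match = MISSING_GROUP.match(value.strip())
            if match:
                groups.add(int(match.group(1)))
    return groups

# Made-up sample standing in for the real CSV file (hypothetical columns and values).
sample = io.StringIO(
    "respondent,q1,q2,q3\n"
    "1,Yes,Missing G2,Missing G2\n"
    "2,No,Daily,Weekly\n"
)
rows = list(csv.DictReader(sample))
flags = [missing_groups(row) for row in rows]
print(flags)  # [{2}, set()]
```

Rows with a non-empty set can then be excluded from, or handled separately in, any analysis of the affected question group.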

The original run of the survey had one problem: the logic for the Facebook, WhatsApp, and "what people used the internet for" questions was inverted. Therefore, this branch of responses is not valid data and is marked as 'Missing'.

A second, supplementary survey was later run to gather correct responses to the previously skipped group of questions. These results are in the second CSV file, whose name indicates that it contains 'original and additional data'.