Revolutionizing the collection of household data

Can you recall how much money you spent on food last year? Or how many days of work you missed last summer?

No doubt these are difficult questions to answer, especially if you are trying to answer off the top of your head or without referring to well-kept records. Yet many household surveys expect accurate recall over such long periods from survey participants. One obvious strategy for reducing the length of time over which participants must recall important pieces of information is to simply increase the frequency of data collection. But household surveys are costly enough that more than a few visits can rarely be justified during the lifetime of a typical research project. Not only does this result in longer recall periods, but data collection efforts may inadvertently ignore important sources of exogenous variation that occur in the intervening time between survey visits. Furthermore, during these visits, investigators engage participants in cognitively taxing exercises for extended periods during which they may grow fatigued, or be distracted or called away by other obligations, potentially resulting in less reliable responses.

This confluence of data quality and reliability concerns may lead some to question the extent to which these data provide accurate signals or characterizations to serve as the basis for development policies and interventions.

The diffusion of inexpensive smartphones into many rural communities presents the development and policy research community with a unique opportunity to directly and continually engage with rural peoples, and researchers from various disciplines have recently been exploring the use of smartphones to gather information.

A recent article published in PLoS ONE presents some early highlights from one such pilot study in Bangladesh. Starting in December 2015, researchers from IFPRI and New York University distributed nearly 500 Android smartphones to farmers in Rangpur district in northwestern Bangladesh. During the course of the study, participants responded to a series of survey “microtasks” (5 to 10 tasks per week, each requiring 3 to 5 minutes) in return for talk-time, SMS, and mobile data “micropayments.” Survey tasks were structured and valued such that continued engagement with the program could potentially pay for all of the participants’ mobile talk, text, and data needs, while also earning them enough credits to keep their smartphone handset following the completion of the study.

The early highlights demonstrate some of the obvious value of this approach for collecting data. There were systematic differences in the responses that participants gave depending upon the length of the recall period. For example, when reporting on income-generating activities, those responding once at the end of a growing season (125 days) listed activities totaling only 9 person-hours per day for the entire household. Meanwhile, those reporting on a weekly basis, and thus recalling only the previous seven days, listed activities totaling more than 50 person-hours per day per household, more than five times the end-of-season figure. The average earnings reported also differed by an order of magnitude across these different recall periods.

Even when the recall period was held constant, asking questions more frequently produced different patterns of responses. For example, in one survey task, participants were asked to report on their access to drinking water over the previous 7-day period, and to rate the quality of that drinking water on a scale from 1 (worst) to 5 (best). On average, the longitudinal variance among individuals differed significantly depending upon the frequency with which they were asked. Those asked to report on their access to quality drinking water on a weekly basis tended to have significantly more variable responses than those responding less frequently, capturing higher frequency variation in water quality than would emerge in less frequent data collection efforts.
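A small sketch can illustrate why reporting frequency matters even when the recall window is fixed at seven days. The ratings below are invented purely for illustration and are not data from the study; the function name and the every-fourth-week schedule are likewise assumptions. A participant asked weekly reports on every week, while one asked less often reports only on the week before each response, so a short-lived dip in water quality can go unrecorded.

```python
from statistics import pvariance

# Hypothetical 1-5 water-quality ratings for eight consecutive weeks.
# Invented for illustration; not data from the study.
true_weekly_ratings = [4, 4, 4, 1, 4, 4, 4, 4]


def reported_ratings(ratings, every_n_weeks):
    """Return the ratings a participant would report if asked every n weeks.

    Each response covers only the week in which it is asked, mirroring a
    fixed 7-day recall period; the intervening weeks are never observed.
    """
    if every_n_weeks < 1:
        raise ValueError("every_n_weeks must be at least 1")
    return ratings[::every_n_weeks]


weekly_reports = reported_ratings(true_weekly_ratings, 1)  # all eight weeks
sparse_reports = reported_ratings(true_weekly_ratings, 4)  # weeks 1 and 5 only

# Variance over time under each reporting schedule, for the same participant.
weekly_variance = pvariance(weekly_reports)
sparse_variance = pvariance(sparse_reports)

print(weekly_variance, sparse_variance)
```

In this toy case the weekly schedule records the week-four dip and shows nonzero variance over time, while the sparser schedule misses the dip and shows none. Real response patterns would depend on when shocks occur relative to the survey dates.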

This study provides an early snapshot of the potential of this data collection method. One of the most pressing concerns in the short run is that smartphone adoption is likely to be concentrated among relatively young, relatively wealthy, and relatively well-educated members of society. Such participants will almost certainly be systematically distinct from the underlying population of interest, introducing an important source of bias. This can at least partially be ameliorated in the near term by crowdsourcing data (i.e., tasking participants to gather information from friends, family, and other members of their networks). This approach leverages the power of aggregation while, at most, only sacrificing the ability to explain the underlying observations with higher resolution data.

Indeed, the article demonstrates the potential for crowdsourcing to address this selection bias. Overall, as expected, participants in the study were younger (33 years of age compared to the Rangpur division average of 44 years) and more educated (10 years of schooling compared to the division average of 3.5 years among household heads). Their crowdsourced samples, however, demonstrated a measurable step toward the division average (at least in terms of education, at 8.5 years), marginally increasing the representativeness of the responses. Crowdsourcing also improved the gender balance of respondents (27 percent of participants in the crowdsourced sample were female, compared with only 11 percent in the main sample). In the future, as smartphone use becomes more widespread in the rural landscape, it seems likely that this data collection approach will produce progressively more representative participant pools.

Taken to scale, such a program presents several opportunities. Smartphone-based research platforms could provide near-real-time characterizations of on-the-ground realities, including tracking the diffusion of new practices or technologies, monitoring labor or price dynamics both within and across agricultural seasons, and diagnosing and following the spread of both human and crop diseases. Such programs could help researchers across the world to test and deploy a wide range of survey and experimental tasks for fees that translate directly into mobile data and talk time for the rural poor.

The increased access to the vast amounts of information on the internet that comes with mobile data also raises the possibility of massive positive informational externalities.

The cost of this approach, however, is a loss of accountability. In addition to concerns over the representativeness of the sample, there is also a shift from using trained teams of enumeration professionals to self-reporting participants whose abilities and goals will vary widely. This study clearly demonstrates that micropayments can be successful in encouraging continued engagement in the project, but outside of direct ground-truthing, there is little that can be done to test the veracity of submitted responses. As the use of this methodology expands, new burdens will be placed on researchers to ensure data quality by making survey tasks intuitive, ensuring the rewards are appropriately aligned with activities, and finding innovative strategies for overcoming sample biases.

Patrick Ward is a Research Fellow in IFPRI's Environment and Production Technology Division.