Datasets

For most researchers, datasets come into the world in Stata format. For those privileged few with the opportunity to collect primary data, conception happens in a concept note or grant proposal, which grows into a survey design, a sample calculation, then a questionnaire. At this point, though, most of us are content to let the stork intervene, carrying a freshly powdered dataset to our office, ready to be nursed into a strong and gifted paper.[1] We gloss over the messy birth process, in which flesh-and-blood interviewers and respo

Since its inception, the World Bank’s Open Data initiative has generated considerable excitement and discussion about its potential to democratize both development economics and the way development itself is conducted around the world. Robert Zoellick, in a speech given last year at Georgetown University, expounded on the many benefits that result directly from open data. Offering the example of a health care worker in a village, he spoke of her newfound ability to “see which schools have feeding programs . . . access 20 years of data on infant mortality for her country . . . and mobilize the community to demand better or more targeted health programs.” Beyond this, Zoellick argued that open data means open research, resulting in “more hands and minds to confront theory with evidence on major policy issues.”

The New York Times featured the Bank’s Open Data initiative in an article published earlier this month, in which it referred to the released data as “highly valuable,” saying that “whatever its accuracy or biases, this data essentially defines the economic reality of billions of people and is used in making policies and decisions that have an enormous impact on their lives.” The far-reaching policy consequences of the data are undeniable, but the New York Times touches upon a crucial question that has been overshadowed by the current push for transparency: what about quality?