
A couple of weeks ago, I wrote the first part of the three-part series on Open Data in Economics. Drawing upon examples from top research that focused on how providing information and data can help increase the quality of public service provision, the article explored economic research on open data. In this second part, I would like to explore the impact of openness on economic research.

We live in a data-driven age

There used to be a time when data was costly: There was not much data around. Comparable GDP data, for example, has only been collected since the early to mid 20th century. Computing power was expensive, too: Data and commands were stored on punch cards, and researchers only had limited hours to run their statistical analyses on the few computers available.

Today, however, statistics and econometric analysis have arrived in every office: Open Data initiatives at the World Bank and governments have made it possible to download cross-country GDP and related data with a few mouse clicks. The availability of open source statistical packages such as R allows virtually everyone to run quantitative analyses on their own laptops and computers. Consequently, the number of empirical papers has increased substantially. The left figure (taken from Espinosa et al. 2012) plots the number of econometric (statistical) outputs per article in a given year: Quantitative research has really taken off since the 1960s. Where researchers once used datasets with a few dozen observations, modern applied econometricians now often draw upon datasets boasting millions of detailed micro-level observations.

Why we need open data and access

The main economic argument in favour of open data is gains from trade. These gains come in several dimensions: First, open data helps avoid redundancy. As a researcher, you may know that the same basic procedures (such as cleaning and merging datasets) have often been done thousands of times, by hundreds of different researchers. You may also have experienced the time wasted compiling a dataset that someone else had already put together but was unwilling to share: Open data in these cases can save a lot of time, allowing you to build upon the work of others. By feeding your additions back into the ecosystem, you in turn ensure that others can build on your data work. Just as there is no need to re-invent the wheel several times, the sharing of data allows researchers to build on existing data work and devote valuable time to genuinely new research.

Second, open data ensures the most efficient allocation of scarce resources – in this case datasets. Again, as a researcher, you may know that academics often treat their datasets as private gold mines. Indeed, entire research careers are often built on possessing a unique dataset. This hoarding often results in valuable data lying around on a forgotten hard disk, not fully used and ultimately wasted. What’s worse, the researcher – even though owning a unique dataset – may not be the person best equipped to make full use of it, while someone else may possess the necessary skills but not the data. Only recently, I had the opportunity to talk to a group of renowned economists who – over the past decades – have compiled an incredibly rich dataset. During the conversation, it was mentioned that they themselves may have only exploited 10% of the data – and were urgently looking for fresh PhDs and talented researchers to unlock the full potential of their data. But when data is open, there is no need to search, and data can be allocated to the most skilled researcher.

Finally, and perhaps most importantly, open data – by increasing transparency – also fosters scientific rigour: When datasets and statistical procedures are made available to everyone, a curious undergraduate student may be able to replicate and possibly refute the results of a senior researcher. Indeed, journals are increasingly asking researchers to publish their datasets along with the paper. But while this is a great step forward, most journals still keep the actual publication closed, asking for horrendous subscription fees. For example, readers of my first post may have noticed that many of the research articles linked could not be downloaded without a subscription or university affiliation. Since dissemination, replication and falsification are key features of science, both open data and open access are essential to knowledge generation.

But there are of course challenges ahead: For example, while wider access to data and statistical tools is a good thing, the ease of running regressions with a few mouse clicks also results in a lot of mindless data mining and nonsensical econometric outputs. Quality control, hence, is and remains important. There are, and in some cases also should be, some barriers to data sharing. In some cases, researchers have invested a substantial part of their lives in constructing their datasets, in which case it is understandable that some are uncomfortable sharing their “baby” with just anyone. In addition, releasing (even anonymized) micro-level data often raises concerns about privacy protection. These issues – and existing solutions – will be discussed in the next post.

Looking beyond the Open Knowledge community, however, the situation is very different: In Economics, for example, not many know what “open data”, “open access” or “Open Economics” exactly mean. Indeed, not many even care. A common reaction is: “Yes, it sounds interesting and important, but does it really matter? And why should I care about it?”

In this post, I would like to give some hard evidence on the positive role that opening up information has had in economics, and sketch ideas for how to involve economists – professional or in training – in mainstreaming ideas of openness. The blog post is divided into three parts: The first part looks at economic research on open data. The second part looks at the impact of open data on economic research. The third part discusses challenges and ways forward.

The real world impacts of open information

Making information accessible to the public can improve public service delivery. In countries where corruption is pervasive, services and funds often do not reach the frontline provider. And even if services do reach the people, the quality of services provided is often shockingly poor: Survey evidence from Bangladesh, Ecuador, India, Peru and Uganda found absence rates as high as 20% and 35% for school teachers and health workers, respectively. In many cases, the staff are poorly trained.

Releasing data on service delivery in this case can help reduce corruption and improve public services. In Uganda, researchers provided information to parents by publishing funding data for a random subset of schools in local newspapers. As a result, corruption decreased significantly, while schooling outcomes improved substantially. Similar evidence from health delivery and redistributive policies suggests that providing information can help the public to discipline public service providers, improving the quality of services.

Information can also expose corrupt politicians: The Federal Government of Brazil, for example, began to select and audit municipalities at random, releasing audit reports to the media. Researchers found that the audit outcomes had a significant impact on the reelection probability of politicians: Those exposed for corruption were punished at the ballot box, and the impact was most pronounced in areas where local radio helped disseminate the information.

A story about fishermen in South India provides another example of how information can improve market efficiency: Studying the adoption of mobile phones in Kerala, researchers have found convincing evidence that access to information through mobile phones helped fishermen sell their catch at the market where the price was highest (and fish most in demand). Instead of sailing to a port and simply hoping for a good price, fishermen were empowered by technology to make informed decisions on where to trade.

Finally, the benefits of transparency are not restricted to reducing corruption and lowering the cost of information: A comparative study finds that transparency – measured by the accuracy and frequency of macroeconomic information released to the public – leads to lower borrowing costs in sovereign bond markets. Open data pays off in many ways – in many different contexts.

These are just a few selected examples of how cutting-edge economic research has identified the benefits of openness in a diverse range of situations. The cases I presented are not based on correlations, but on carefully established causal relationships, leaving – at least within the context studied – little doubt that information matters – big time. Perhaps most importantly, these cases have also shown that open data must be understood in a broad sense: These interventions do not take advantage of linked data, and do not use CSVs that are shared through Facebook or Twitter – often, they are simple solutions that ultimately help improve the everyday lives of the people.