
The open data movement has gathered considerable momentum across the world. Taking a cue from the United States, which launched its data.gov open data sharing site in May 2009, many national governments have created their own open data sites. These include developed countries such as Australia, Canada, France, Germany, Italy, the Netherlands and the UK, as well as emerging nations such as Brazil, India, Indonesia and Russia. Most of these sites were launched between 2010 and 2012.

India launched its open data site, data.gov.in, in September 2012. After a slow start, it has picked up momentum and now offers more than 2,500 datasets. A dataset is a table of data on a particular area. It can be as large as the country's entire crop production, broken down by crop and district, for the last 30 years, or as narrow as the exports of a single item to different global regions in one year.

The US opened up government data as part of President Obama's open governance promise, while Vivek Kundra, the first Federal CIO and the person behind implementing the initiative, called on individuals, groups and commercial companies to use open data to build innovative apps that would solve citizens' problems. Kundra consistently championed app building and even prophesied that in the coming years there would be an "explosion of apps" based on open data.

Since then, these two attributes (transparency and citizen apps) have become the de facto objectives of government open data initiatives across the world. While the developed world has embraced both objectives, emerging countries have focused more on citizen apps, for obvious reasons: in these countries, transparency is a lofty goal to achieve merely by releasing some datasets when other governance frameworks are not yet in place.

While both are worthy expectations of government open data initiatives, it is a little worrying that they have come to define open data priorities and policies in many countries.

Take the apps expectation, for example. Globally, the role of app creation from open data has been so overemphasized that many governments try to measure the effectiveness of their open data programs by the number of apps developed on the data they release. That is a completely misplaced expectation, for two reasons. First, data can improve citizens' lives in many ways beyond apps. Second, it is difficult for governments to track all the apps created. Take the US data.gov site: although it hosts more than 75,000 datasets, only around 350 citizen-developed apps are shared on it.

Apart from misplaced expectations (and the disappointment of not meeting them), the focus on apps has also led to misplaced priorities and policies governing open data.

Here are some of the skewed policies governments have followed because of the overemphasis on apps.

Not measuring the efficiency gains to the economy. Open data initiatives put important government information in the public domain, easily accessible to all. Very often, similar information is collected separately by others (academic researchers, commercial organizations, other government bodies and agencies) for their own requirements, duplicating the effort. That is an inefficient use of time and resources.

By eliminating, or at least minimizing, the need to duplicate that effort, open data makes the whole economy far more efficient. This is difficult to measure in the short run but can be measured over time. Yet I have never heard any open data evangelist talk about it.

Further, if governments realized this, they could cooperate with other stakeholders so that data collection and processing are optimized to meet the requirements of more of them. In the future, the cost could even be shared. This could lead to far more efficient collection and processing of basic information and even improve data quality.

Limited outreach. The overemphasis on apps has also misdirected outreach. In most countries, government outreach programs are aimed at the tech and app-builder community, with some tech-savvy NGOs and advocacy groups joining in. The entire open data discussion is restricted to three communities: government, developers, and NGOs/advocacy groups. Many major stakeholders, such as the media, market researchers and academic researchers, who could play an important role in revealing the latent value of open data, are left out today. Even when they do show interest, they are often scared away by the technical jargon that dominates these discussions. That is a loss for the cause of open data.

I raised this point in an online conversation on Open Data for Poverty Alleviation hosted by the World Bank. Tim Davies of Practical Participation agreed and had this to say:

I think there is often a failure in open data capacity building to think about the consultants, analysts, researchers and so-on who might be engaged as users of data, and who will provide bespoke value added services on top of it (hopefully realizing social as well as economic value).

Restrictive data formats. Many government agencies implementing open data focus all their attention on obtaining or creating datasets in machine-readable formats, a direct result of working backwards from apps. A lot of time and energy goes into conversion and cleaning, while many good, structured datasets that are not in machine-readable formats never make it onto the list of published datasets. That is a big loss.

True, machine-readable formats make life easier for everyone, but ignoring human-readable formats goes to the other extreme. Open data is not defined by any format. Perhaps the implementers of data portals should take a middle path that encourages machine-readable formats without completely leaving out human-readable formats such as PDF.

Too much emphasis on datasets of consumer interest. The overemphasis on citizen apps puts undue pressure on data portal managers to obtain more and more datasets that directly interest end consumers and are therefore good material for building apps. So while a hospital list or a crime dataset is cheered, crop production or export data is often dismissed as "useless information dumped by government." It is true that data of consumer interest can be used to create apps instantly, but agricultural and meteorological data, when analyzed by experts using the right tools, can have a far broader and longer-term impact on the lives of millions of citizens. Such analyses could help maximize agricultural production, avert major disasters, impart the right skills to unemployed youth and so on, even if they never become sleek apps.

Slowly but surely, the limitations of tying open data too closely to apps and pre-designed visualizations are being recognized. Mike Gurstein, a leading voice on open data, made this argument on his blog:

But why shouldn’t we think of “open data” as a “service” where the open data rather than being characterized by its “thingness” or its unchangeable quality as a “product”, can be understood as an on-going interactive and iterative process of co-creation between the data supplier and the end-user; where the outcome is as much determined by the needs and interests of the user as by the resources and pre-existing expectations of the data provider?

Though Gurstein's explicit question is whether outcomes should be determined by the pre-existing expectations of the data provider, the same logic extends to asking why they should be determined by the pre-existing expectations of app providers. In most cases, app providers have little extra insight into end users' needs.

In the end, open data is about making information work for the betterment of society: making citizens' lives more convenient, providing a basis for macroeconomic decisions, making the economy and business ecosystem more efficient and, yes, minimizing risk. It is not about technology, though technology is a very handy tool.