Dec 13 Don’t Just Make Data Open, Make Open Data Useful!

Introduction

Earlier this week in USAID’s Evolving Open Data Culture I applauded that U.S. government agency’s efforts to make its open data useful. In this post I dive a little deeper into the topic of “open data usefulness.”

Background

I came to my interest in open data by way of a career that has mostly involved technology-related projects and consulting. Major goals have been to make or support products or services that are useful to somebody.

For private sector clients this usually involved impacting cost or revenue targets. For government agencies or nonprofits, the work has focused on objectives that have both quantitative and qualitative aspects.

In either case, “usefulness” has meant that people view the actions they take as a result of using the product or service in a beneficial or positive light, because the product or service helps them accomplish their objectives.

As a result of this perspective I’ve thought it would be shortsighted or incomplete not to give thought to whether the data provided by an open data program will actually benefit the data’s users. Thinking about making data useful should, therefore, inform the open data provider’s development of services that help users locate, obtain, and use data in ways they find beneficial.

Key Planning Questions

So, what kind of planning is needed to increase the likelihood that open data will be useful? We need to look at the following:

1. Who will use the data?

While there are many ways to “slice and dice” user communities for open data, one of the first distinctions that programs should consider is inside vs. outside users. “Inside” users are employees associated with the agency or program that generates the data. “Outside” users are people not directly employed by or affiliated with the generating organization, e.g., the media, the public, associations, and other interest groups. Open data technology and easy-to-use web- and app-based delivery mechanisms have advanced to the point where many have found that an agency’s internal users may actually prefer the new data tools over older tools based on legacy systems. If that’s the case, knowing about inside users in advance will speed both planning and acceptance processes.

You also need to look at the potential role of intermediaries who will facilitate data access and usage by performing both pass-along and interpretation functions. There are many types of intermediaries and they can operate both formally and informally. (Actually, this “formal vs. informal intermediary” distinction may not be as clearly defined as it once was, since organizational boundaries are so easily crossed now that data can be shared readily via the web.)

It is important to understand, from each user group’s perspective, who the trusted and reliable intermediary “gatekeepers” are for the types of data being provided. If such intermediaries do not exist, the open data program developer might then need to consider whether to support creation of such intermediaries as part of some sort of capacity building process.

2. How will users find out the data exist?

Users will find out about the data on their own (discoverability) or they will have to be told (outreach). Your consideration of the different channels that can be used and the role of intermediaries will be important. More specialized data may have a relatively small group of potential users and channels, so existing social and professional networks should be considered as potential partners.

You also need to take into account one of the key features of open data: a lack of restriction on data reuse. You have to consider the possibility that uses (and users) of your data will be found that you cannot anticipate, predict, or control. For example, will users be able to use open data to create new products and services that generate revenue and jobs?

That’s an important justification for some open data programs and is crucial to programs such as NOAA’s evolving big data partnership. In such cases, it may be useful to involve potential users and developers right from the start as part of the planning process, as is the case with the NOAA program.

Other open data development priorities may arise where a variety of social utility goals are important. One example is the use of open data for research purposes. In these days of “hackathons” and open data, there are many ways research sponsors can encourage open data analysis. One example is the Millennium Challenge Corporation’s Open Data Challenge to students in economics, public policy, and international development to analyze MCC-financed primary data. Other examples of guided data exploration and development are “hackathon” events sponsored by organizations such as Code for America and the World Bank (see Learning from the World Bank’s “Big Data” Exploration Weekend). In such cases the sponsoring organization provides the organizational (and often financial) support for app development or analysis in order to address specific problems or opportunities that can be approached by analyzing open data sets.

3. What kinds of skills or knowledge will users need to use the data?

In Management Needs Data Literacy to Run Open Data Programs I suggested that those responsible for planning and managing an open data program need a basic understanding of how data are captured, prepared, organized, and used. They don’t have to be data scientists but they might need to understand basics like the difference between a spreadsheet and a database and the importance of metadata and standards.

When talking about public access to open data, things can get complicated because users bring such a wide range of skills and experience to unrestricted open data, from no data skills at all to familiarity with modeling or statistical analysis.

This is one of the reasons why the development-related approach of “API First” is so important. Provision of a documented API (application programming interface) can significantly enhance the potential for both human and machine usability and reusability of open data.

As suggested by my colleague Jason Hare of BaleFire Global in Open Data Portals should be API [First], a variety of usability issues have emerged with dataset-oriented open data web portals that an API-first design process might help avoid. APIs are by design intended to make it easier for programmers to interact with structured data sets and to develop mobile apps and other mechanisms for interacting with open data. One example of an open data API is provided in CityGram NYC: A Model for Open Data Efforts. There, programmers were able to quickly adapt existing software originally designed to access open data from Lexington and Charlotte to access similarly structured NYC open data for distribution via smartphone.

The rationale for focusing first on API development for open data access is a strong one, but it does assume that the capacity exists within target user communities to take advantage of the API to create applications that reflect the needs of the community. In the CityGram NYC example given above the capacity did exist. Motivated volunteer developers were able to rapidly adapt existing software to deliver services around an existing documented dataset.
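To give a rough sense of why a documented API lowers the barrier for developers, here is a minimal Python sketch. It is only an illustration under stated assumptions: the endpoint URL, the field names, and the `results` wrapper are all hypothetical and are not taken from CityGram or any real open data portal. The response is canned so the example runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only; not a real service.
BASE_URL = "https://data.example.gov/api/v1/inspections"


def build_query_url(base_url, filters, limit=100):
    """Build a request URL asking the API for records matching the filters."""
    params = dict(sorted(filters.items()))  # stable parameter order
    params["limit"] = limit
    return f"{base_url}?{urlencode(params)}"


def parse_records(payload):
    """Turn a JSON response body into a list of Python dicts.

    Assumes the (hypothetical) API wraps its rows in a "results" key.
    """
    return json.loads(payload)["results"]


def count_by(records, field):
    """Count how many records fall under each value of the given field."""
    counts = {}
    for record in records:
        key = record.get(field, "unknown")
        counts[key] = counts.get(key, 0) + 1
    return counts


# Canned response standing in for what the hypothetical API might return.
SAMPLE_RESPONSE = """
{"results": [
  {"id": 1, "borough": "Brooklyn", "status": "open"},
  {"id": 2, "borough": "Queens", "status": "closed"},
  {"id": 3, "borough": "Brooklyn", "status": "closed"}
]}
"""

if __name__ == "__main__":
    print(build_query_url(BASE_URL, {"status": "open"}))
    print(count_by(parse_records(SAMPLE_RESPONSE), "borough"))
```

The point of the sketch is that when query parameters and response fields are documented, the same few functions can be pointed at another city's similarly structured data by changing little more than the base URL. Without that documentation, each developer has to reverse-engineer the file layout first.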

Conclusions

Returning to our initial focus, making data useful, the following key points have been made:

Understand who is going to use open data both inside and outside the sponsoring organization.

Examine all relevant channels for making potential open data users aware that useful data exist.

Assess the skills and other resources needed to interact with the data, including the possible need to start by developing and documenting an appropriate API toolset.

Understand the potential roles of intermediaries and social networking in making open data available and useful.

Just making a huge catalog of data files available for downloading should not be thought of as the end goal for an open data program. Some people will be “power users” and able to locate, crunch, and analyze data on their own. Others may need more direct — and costly — support. All are legitimate uses of open data and are deserving of support.