Blog IODC 2016
Madrid. October 6-7, 2016

Beyond raw data: creating city standards for priority datasets

A guest post from CTIC exploring work towards common open data standards. CTIC has been involved in many of the most important open data projects in Spain in recent years, from the implementation of the national catalogue and multiple local initiatives to working as a partner in the Share-PSI 2.0 project, which brings together 40 partners from 25 countries with the aim of harmonizing open data in Europe and identifying good practices.

In the early days of the open data movement, the call was for “raw data now”, but more recently we’ve been learning to be more strategic. To paraphrase the campaign led by Tim Berners-Lee to create the ‘Web We Want’, it’s time to talk about the “Open Data We Want” and how it should be published using common standards.

The growth of open data

Since the “Big Bang” that occurred with the launch of Data.gov in 2009, we have seen an expanding Open Data universe, with an almost exponential growth in terms of Open Data initiatives and portals throughout the world. In Europe, we have seen a strong push from the European Commission through the development of the legal frameworks for open data and the launch of projects for the development and dissemination of Open Data, such as:

In Spain, this process began in 2010 with the publication of catalogues by the Basque Country and the city of Zaragoza, followed in 2011 by the national catalogue, datos.gob.es. Since then the process has grown to include more than 110 open data initiatives at all levels of government: national, regional, and local (although some initiatives have fallen by the wayside).

Besides the momentum that the Spanish government has provided for open data, a key driver of the national catalogue's success has been catalogue federation. Through federation, data from more than 75 catalogues across Spain are brought together, accounting for about 60% of the entire national catalogue. This is made possible by DCAT (Data Catalog Vocabulary), the W3C standard for describing data catalogues.
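To make the idea of federation more concrete, here is a minimal sketch, assuming each catalogue is exposed as a JSON-LD-style dictionary using the DCAT terms `dcat:Catalog`, `dcat:dataset` and `dcat:Dataset` (plus `dct:title` from Dublin Core). The catalogue URIs under example.org, the functions `federate` and `share_by_source`, and the `harvestedFrom` key are all hypothetical illustrations, not part of DCAT or of the actual datos.gob.es harvester. Because every catalogue describes its datasets with the same vocabulary, the aggregator can merge them without per-source parsing logic.

```python
from collections import Counter

# Prefixes for the DCAT and Dublin Core terms used in the dicts below.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def federate(catalogues):
    """Merge several dcat:Catalog records into one list of dcat:Dataset records.

    Datasets are keyed by "@id" (assumed to be a stable URI), so a dataset
    harvested from two catalogues is kept once, tagged with the first
    catalogue it was seen in. "harvestedFrom" is a hypothetical key, not DCAT.
    """
    merged = {}
    for catalogue in catalogues:
        for dataset in catalogue.get("dcat:dataset", []):
            if dataset["@id"] not in merged:
                merged[dataset["@id"]] = {**dataset, "harvestedFrom": catalogue["@id"]}
    return list(merged.values())


def share_by_source(datasets):
    """Fraction of the merged catalogue contributed by each source catalogue."""
    counts = Counter(d["harvestedFrom"] for d in datasets)
    total = len(datasets)
    return {source: n / total for source, n in counts.items()}


# Hypothetical catalogues; all URIs are placeholders under example.org.
national = {
    "@context": CONTEXT,
    "@id": "https://example.org/national",
    "@type": "dcat:Catalog",
    "dcat:dataset": [
        {"@id": "https://example.org/ds/budget", "@type": "dcat:Dataset",
         "dct:title": "Budget"},
    ],
}
city = {
    "@context": CONTEXT,
    "@id": "https://example.org/city",
    "@type": "dcat:Catalog",
    "dcat:dataset": [
        {"@id": "https://example.org/ds/bike-stations", "@type": "dcat:Dataset",
         "dct:title": "Bike stations"},
        {"@id": "https://example.org/ds/air-quality", "@type": "dcat:Dataset",
         "dct:title": "Air quality"},
        # Same URI as in the national catalogue, so it is not counted twice.
        {"@id": "https://example.org/ds/budget", "@type": "dcat:Dataset",
         "dct:title": "Budget"},
    ],
}

merged = federate([national, city])
print(len(merged))  # 3
print(share_by_source(merged))
```

Keying on the dataset URI is one simple deduplication choice; a real harvester would also have to reconcile metadata that differs between copies.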

Considering all of this, it would be easy to call open data a resounding success. However, measured against the expectations raised at the outset, it is clear that although we have made progress, we have not yet realized the full value that open data promised to generate.

Standardizing key datasets

One of the main obstacles to reusing data is the lack of standardization. The same datasets provided by the city of Madrid should be just as easy to find in Barcelona, Paris, Amsterdam or Ottawa. With standardized data, an application created by an entrepreneur in Spain could potentially be used anywhere in the world.
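A small sketch can show why this matters for reusers. Assume two cities publish the same kind of dataset (here, bike stations) with different column names; every name, value and mapping below is made up for illustration and does not come from any real portal. Today an app needs a per-city adapter (`CITY_MAPPINGS`); the application logic itself (`total_capacity`) is written once against a hypothetical shared schema (`BikeStation`). If every city published directly in that shared schema, the adapters would disappear and the same app would run unchanged in any city.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BikeStation:
    """Hypothetical shared schema that every city would publish."""
    city: str
    name: str
    lat: float
    lon: float
    capacity: int


# Hypothetical per-city column names: the adapters needed without a standard.
CITY_MAPPINGS = {
    "madrid": {"name": "nombre", "lat": "latitud", "lon": "longitud", "capacity": "plazas"},
    "paris": {"name": "nom_station", "lat": "lat", "lon": "lon", "capacity": "capacite"},
}


def to_standard(city, row):
    """Convert one city-specific record (all values as strings) into the shared schema."""
    m = CITY_MAPPINGS[city]
    return BikeStation(
        city=city,
        name=row[m["name"]],
        lat=float(row[m["lat"]]),
        lon=float(row[m["lon"]]),
        capacity=int(row[m["capacity"]]),
    )


def total_capacity(stations):
    """Application logic written once against the shared schema."""
    return sum(s.capacity for s in stations)


# Illustrative records, as each city might publish them today.
madrid_rows = [{"nombre": "Station A", "latitud": "40.4", "longitud": "-3.7", "plazas": "24"}]
paris_rows = [{"nom_station": "Station B", "lat": "48.9", "lon": "2.3", "capacite": "30"}]

stations = [to_standard("madrid", r) for r in madrid_rows] + [
    to_standard("paris", r) for r in paris_rows
]
print(total_capacity(stations))  # 54
```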

Standardization requires an ongoing dialogue between those who produce data and those who use it. In practice, it should focus first on the most popular datasets, in order to deliver the greatest benefit.

In Spain, this process began in early 2015 with the publication of the Spanish standard UNE 178301:2015, created by the Smart Cities standardization group. It defines a set of indicators divided into 5 axes (political, organizational, technical, legal and economic), together with a measurement metric that can assess the maturity of open data initiatives in cities. Even more interestingly, it defines 10 datasets, along with their corresponding vocabularies, that governments should publish, and in doing so it paves the way towards standardization, at least for cities.
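The general shape of an axis-based assessment can be sketched as follows. This does not reproduce the indicators or the metric actually defined in UNE 178301:2015; it only assumes, for illustration, that each indicator is a yes/no check, that an axis score is the fraction of its indicators met, and that the overall score is the unweighted mean of the five axes. The functions `axis_scores` and `overall_score` and the sample results are hypothetical.

```python
# The five axes named in the standard; the scoring below is a hypothetical sketch.
AXES = ("political", "organizational", "technical", "legal", "economic")


def axis_scores(results):
    """Fraction of indicators met on each axis.

    `results` maps each axis name to a non-empty list of booleans, one per indicator.
    """
    return {axis: sum(results[axis]) / len(results[axis]) for axis in AXES}


def overall_score(results):
    """Unweighted mean of the five axis scores (an assumed aggregation rule)."""
    scores = axis_scores(results)
    return sum(scores.values()) / len(AXES)


# Made-up assessment of one city's open data initiative.
city_results = {
    "political": [True, True],
    "organizational": [True, False],
    "technical": [True, True, False, False],
    "legal": [True],
    "economic": [False, False],
}

print(axis_scores(city_results))
print(overall_score(city_results))  # 0.6
```

A real metric might weight axes differently or grade indicators on a scale rather than pass/fail; the point here is only that per-axis scores make initiatives in different cities comparable.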

Over time, this process must expand to cover more datasets and recommended vocabularies, and must extend to other levels of government to improve coordination and standardization across public catalogues. That would bring a real improvement in reuse rates and in the economic value produced.

How can we encourage more global collaboration around this standard setting? Or is standardization always a national task?

3 comments

Junar

May 19, 2015 at 23:28

Great blog post. It would be interesting to have a conversation in Ottawa about the most popular datasets and about standards around them to ensure cross-utilization of data between agencies and municipalities.

I recently published a post on top datasets and how to source them that may add to the conversation. But I have to admit that this blog post by CTIC adds a component I did not mention in mine: the importance of defining standards around top datasets. Here is my blog post: https://www.linkedin.com/pulse/top-10-datasets-en-open-data-diego-may.

2 quick comments:
#1 – Ways of prioritizing datasets in different agencies: (1) FOIA requests, (2) city or agency priorities, and (3) an internal team that knows the datasets and the systems that can offer good data for public consumption.