From access to re-use: a user’s perspective on public sector information availability

Frederika Welle Donker

Abstract

If data are the building blocks to generate information needed to acquire knowledge and understanding, then geodata, i.e. data with a geographic component, are the building blocks for information vital for decision-making at all levels of government, for companies and for citizens. Governments collect geodata and create, develop and use geo-information - also referred to as spatial information - to carry out public tasks, as almost all decision-making involves a geographic component, such as a location or demographic information. Geo-information is often considered “special” for technical, economic and legal reasons. Geo-information is considered special for technical reasons because it is multi-dimensional, voluminous and often dynamic, and can be represented at multiple scales. Because of this complexity, geodata require specialised hardware, software, analysis tools and skills to collect, to process into information and to use in analyses.

Geo-information is considered special for economic reasons because its economic characteristics set it apart from other products. The fixed production costs to create geo-information are high, especially for large-scale geo-information, such as topographic data, whereas the variable costs of reproduction are low and do not increase with the number of copies produced. In addition, there are substantial sunk costs, which cannot be recovered from the market. As such, geo-information shows characteristics of a public good, i.e. a good that is non-rivalrous and non-excludable. However, to protect the high investment costs, re-use of geo-information may be limited by legal and/or technological means such as intellectual property rights and digital rights management. Thus, by making geo-information excludable, it becomes a club good, i.e. a non-rivalrous but excludable good. By claiming intellectual property rights, such as copyright and/or database rights, and restricting (re-)use through licences and licence fees, geo-information can be commercially exploited and used to recover some of the investment costs.

Geo-information is considered special for a number of legal reasons. First, as geo-information has a geographic component, e.g. a reference to a location, it may contain personal data, sensitive company data, environmentally sensitive data, or data that may pose a threat to national security. Therefore, the dataset may have to be adapted, aggregated or anonymised before it can be made public. Secondly, geo-information may be subject to intellectual property rights. There may be a copyright on cartographic images or database rights on digital information. Such intellectual property rights may be claimed by third parties involved in the information chain, e.g. a private company supplying aerial photography to the National Mapping Authority. The data holder may also claim intellectual property rights to commercially exploit the dataset and recoup some of the vast investment costs made to produce the dataset. Lastly, there may be other (international) legislation or agreements that either impede or promote publishing public sector information, and in some cases these policies may contradict each other.

It has been recognised that to deal with national, regional and global challenges, it is essential that geo-information collected by one level of government or government organisation be shared between all levels of government via a so-called Spatial Data Infrastructure (SDI). The main principles governing SDIs are that data are collected once and (re-)used many times; that data should be easy to discover, access and use; and that data are harmonised so that it is possible to combine spatial data from different sources seamlessly. In line with the SDI governing principles, this dissertation considers accessibility of information to include all these aspects. Accessibility concerns not only access to data, i.e. being able to view the data without being able to alter the contents, but also re-use of data, i.e. being able to download and/or invoke the data, and sharing of data, including being able to provide feedback and/or to provide input for co-generated information.

Accessibility to public sector geo-information is not only essential for effective and efficient government policy-making but is also associated with realising other ambitions. Examples of these ambitions are a more transparent and accountable government, more citizens’ participation in democratic processes, (co-)generation of solutions to societal problems, and increased economic value due to companies creating innovative products and services with public sector information as a resource. Especially the latter ambition has been the subject of many international publications stressing the enormous potential economic value of re-use of public sector (geo-)information by companies. Previous research indicated that re-users of public sector information in Europe encountered barriers related to technical, organisational, legal and financial aspects, which was deemed to be the main reason why in Europe the number of value added products and services based on public sector information was lagging compared to the United States. Especially the latter two barriers (restrictive licence conditions and high licence fees) were often cited to be the main barriers for re-users in Europe. However, in spite of considerable resources invested by governments to establish spatial data infrastructures, to facilitate data portals and to release public sector information as open data, i.e. without legal and financial restrictions, the expected surge of value added products based on public sector information has not quite eventuated to date and the benefits still appear to lag expectations.

When this research started a decade ago, the debate around accessibility of public sector information focussed on access policies. Access policies ranged from open access (data available with a minimum of legal restrictions and for no more than marginal dissemination costs) to full cost recovery, whereby all costs incurred in collection, creation, processing, maintenance and dissemination are recovered from the re-users. Most of the public sector bodies in the European Union adhered to a cost recovery policy for allowing re-use of public sector information. In 2003, the European Union adopted two directives to ensure better accessibility of public sector information. Directive 2003/4/EC of the European Parliament and of the Council of 28 January 2003 on public access to environmental information and repealing Council Directive 90/313/EEC, the so-called Access Directive, gave citizens the right of access to environmental information. Citizens should be able to access documents related to the environment via a register, preferably in an electronic form, and if a copy of a document was requested, the charges must not exceed marginal dissemination costs. Directive 2003/98/EC of the European Parliament and of the Council of 17 November 2003 on the re-use of public sector information, the so-called PSI Directive, intended to create conditions for a level playing field for all re-users of public sector information. However, the PSI Directive of 2003 left room for public sector organisations to maintain a cost recovery regime with restrictive licence conditions. In spite of these directives, access policies for geographic data were slow to change in most European nations.

At the end of the last decade, accessibility of public sector information received two major impulses. The first major impulse was the implementation of Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE), the so-called INSPIRE Directive, which established a framework of standardisation rules for the data and for publishing via web services, and which significantly contributed to the accessibility of public sector geo-information. The second major impulse was the development of open data policies following the USA Open Government Directive of 2009 and the Digital Agenda for Europe of 2010. These two impulses were the main drivers in Europe to start a careful move from cost recovery policies to open access or open data policies and for more public sector information to be made available as open data. Thus, of the four barriers to re-use of public sector information cited in Chapter 1 (legal, financial, technical and organisational barriers), two barriers should have been lifted to a large degree due to open data. This shift to open data provided an excellent opportunity to test the hypothesis that the main barriers for re-users of public sector information were indeed restrictive licences and high fees, as suggested by earlier research.

Chapter 2 showed that by 2008, most European Union Member States had transposed and implemented the 2003/98/EC PSI Directive, albeit in various ways and with considerable delay. By 2008, the effects of the PSI Directive were only slowly starting to emerge. A number of Member States reviewed their access policies and more public sector information became available for re-use. Some Member States made the information available free-of-charge or reduced their fees significantly. In many cases where re-use fees were reduced, the number of regular re-users increased significantly and total revenue even increased in spite of lower fees. Although the 2007/2/EC INSPIRE Directive paved the way for technical interoperability by providing guidelines for web services and catalogues, neither the INSPIRE Directive nor the PSI Directive had tackled the issue of legal interoperability. Chapter 2 also demonstrated that a major barrier to creating a level playing field for the private sector was the fact that some public sector bodies acted as value added resellers by developing and selling products and services based on their own data. Thus, the level playing field envisioned by the European Commission had not been realised.

Chapter 3 researched the aspect of harmonised licences as a first step towards legal interoperability. Earlier research had indicated that one of the biggest barriers for re-users was complex, non-transparent and inconsistent licence conditions, especially for re-users wanting to combine data from multiple sources. A survey of licences used by public sector data providers in the Netherlands demonstrated that although there were differences in length and language, there were also many similarities. The conclusion was that the introduction of a licence suite inspired by the Creative Commons concept would be a step towards increased transparency and consistency of geo-information licence agreements. This chapter introduced a conceptual model for such a geo-information licence suite, the so-called Geo Shared licences. Both Creative Commons and Geo Shared licence suites enable harmonisation of licence conditions and promote transparency and legal interoperability, especially when re-users combine data from different sources. The Geo Shared licence suite became a serious option for inclusion into the draft version of the INSPIRE Directive as an annex. Unfortunately, the concept of one licence suite for the entire European Union came too early in 2006. The Geo Shared licences were further developed and implemented into the Dutch National Geo Register.

In 2009, the European Commission recognised that PSI was the single largest source of information in Europe and the potential for re-use of PSI needed to be highlighted in the digital age. As part of a review of the 2003/98/EC PSI Directive, the European Commission carried out a round of consultations with stakeholders in 2010 to seek their views on specific issues to be addressed in the future. In addition, the Commission commissioned a number of studies. These studies included a review of studies on public sector information re-use and related market studies, an assessment of the different models of supply and charging for public sector information and a study on public sector information re-use in the cultural sector. The first study, carried out by Graham Vickery in 2011, showed that the overall economic gain from opening up public sector information as a resource for new products and services could be in the order of €40 billion per annum in the European Union. Both the Vickery Report and the second study, the so-called POPSIS Study, showed that for most public sector data providers their revenues from licence fees were relatively low in comparison to their total budget. After the evaluation, Directive 2013/37/EU of the European Parliament and of the Council of 26 June 2013 amending Directive 2003/98/EC on the re-use of public sector information was adopted and came into force on 17 July 2013. Chapter 4 described the main changes of the 2013/37/EU Amended PSI Directive, including the recommendation to employ open data licences. This chapter continued with a review of the various open data licences in use in Europe and analysed their interoperability. Although adoption of open data licences for public sector information should have addressed legal interoperability barriers for re-users, in practice, the different types of open data licences might not be so interoperable after all. Effectively, only a public domain declaration, such as a Creative Commons Zero (CC0) declaration, is suitable for open data re-users working with cross-border datasets, and such a public domain declaration should be published in a prominent place to remove uncertainty for re-users. Without a public domain declaration, re-use of open data is still impeded as re-users are loath to invest time into the development of value added products or services when it is uncertain if and which restrictions may be applicable and what the impact may be on their product or service.

This dissertation also researched the financial and economic aspects of public sector information accessibility. Chapters 1 and 2 indicated that a cost recovery regime for dissemination of public sector information posed a financial barrier for private sector re-users because the fees charged were perceived to be too high. However, in 2008, there were still many advocates for maintaining a cost recovery regime. Especially public sector bodies that are not funded by the national Treasury, the so-called self-funding agencies, needed revenue from data sales to cover a substantial part of their operational costs. A sustainable source of revenue was viewed as essential to maintain the data at an adequate level, and to ensure that the data remain up to date and continuously available. Chapter 5 explored the potential business models and pricing mechanisms for public sector INSPIRE web services. Although, depending on the type of web service and type of re-user, there might have been an argument for employing a subscription model as a pricing mechanism, business models based on generating revenue from public sector information would not be viable in the long run and were not in the spirit of the INSPIRE Directive. This research concluded that public sector information web services employing different pricing regimes were counterproductive to achieving financial interoperability.

In Chapter 6, business models for public sector data providers were revisited, this time from an open data perspective. Government agencies, including self-funding government agencies, are under increasing pressure to implement open data policies. This chapter analysed the business models of self-funding agencies either already providing open data or under pressure to provide (some) open data in the near future. The analysis showed which adaptations might be necessary to ensure the long-term availability of high quality open data and the long-term financial sustainability of self-funding agencies. The case studies confirmed that providing (raw) open data does not necessarily lead to losses in revenue in the long term as long as the organisation has enough flexibility to adapt its role in the information value chain, especially when revenue from licence fees represents only a relatively small part of its total budget. The case studies indicated that switching to open data has resulted in internal efficiency gains. In practice, it is difficult to isolate and quantify the internal efficiency gains that are solely attributable to open data as the researched organisations continuously implement efficiency measures. However, the reported decreases in internal and external transaction costs due to open data are in line with the case study carried out in Chapter 7.

Open data also provided an excellent opportunity to assess the effects of open data ex ante as baseline measurements could be carried out. To develop both quantitative and qualitative indicators to assess the success of a policy change is a challenge for open data initiatives. In Chapter 7, a model to assess the effects on the organisation of an open data provider was developed. Liander, a private energy network administrator mandated with a public task, planned to publish some of its datasets as open data in the autumn of 2013. This offered an excellent opportunity to apply the developed assessment model to provide an insight into internal, external, and relational effects on Liander. A benchmark was carried out prior to release of open data and a follow-up measurement one year later. The benchmark provided an insight into the work processes at that time and into the preparations required to implement open data. The follow-up monitor indicated that Liander open data are used by a wide range of users and have had a positive effect on the development of apps to aid energy savings. However, it remains a challenge to quantify the societal effects of such apps. The follow-up monitor also indicated that regular re-users of Liander data used the open data to improve existing applications and work processes rather than to create new products. The case study demonstrated that private energy companies could successfully release open data. The case study also showed that Liander served as a best-practice case for open data and had a flywheel effect on companies within the same sector. By 2015, nearly all energy network administrators had published similar open data. The monitoring model developed in this project was assessed to be suitable to monitor the open data effects on the organisation of the data provider.

The assessment model developed and tested in Chapter 7 proved to be suitable to monitor the effects of open data at the organisational level. However, to provide a more complete picture of the effects of open data and to assess if there are other barriers for re-users, a more holistic approach was required to assess the maturity of open data. Therefore, a holistic open data assessment framework addressing the supplier side, the governance side, and the user side of open data was developed and applied to the Dutch open data infrastructure in Chapter 8. This Holistic Open Data Maturity Assessment Framework was used to evaluate the State of the Open Data Nation in the Netherlands and to provide valuable information on (potential) bottlenecks. The framework showed that geographic data scored significantly better than other types of government data. The standardisation and implementation rules laid down by the INSPIRE Directive framework appear to have been a catalyst for moving geographic data to a higher level of maturity. The maturity assessment framework provided Dutch policy makers with useful inputs for further development of the open data ecosystem and development of well-founded strategies that will ensure the full potential of open data will be reached. Since the publication of the State of the Open Data Nation in 2014, a number of the recommendations have already been implemented.

This dissertation demonstrated that many aspects that should facilitate accessibility, such as standardised metadata, have already been addressed for geodata. This research also showed that for other types of data, there is still a long way to go. There is a growing demand for other types of data, such as financial data and healthcare data. Public sector organisations holding such types of data need hands-on guidelines to enable publication of their datasets, preferably as open data. However, once data are published as open data, they remain public indefinitely and cannot be recalled. Therefore, the decision to publish public sector data as open data is complex: datasets are often of a heterogeneous nature and may contain microdata (data that quantify observations or facts, such as data collected during surveys). Although microdata may not necessarily contain personal data, the datasets will probably have to be processed before publication to address confidentiality and data quality issues. In addition, there is a tension between open data and protection of personal data. The big question remains to which level the data need to be aggregated and/or anonymised to ensure protection of personal data now and in the future, while at the same time remaining sufficiently meaningful to be re-usable. Another issue that needs further research is data-ownership of sensor data and co-created data. Increasingly, sensor data generated by e.g. smartphones, smart energy meters and traffic sensors are collected by the public sector and the private sector and become part of a big data ecosystem. In addition, public sector organisations cooperate with other public sector organisations and the private sector to create information from their data, so-called co-created information. Citizens also collect data or complement information on a voluntary basis, e.g. bird count data. Co-created information will become more commonplace in the coming decades, as will the contribution of sensor data to a big data ecosystem. However, the aspect of who owns the data in which part of the information value chain has not been researched. Uncertainty related to third party rights will pose a barrier to publishing open data. Therefore, the aspect of data-ownership for sensor data and for co-created data should be further researched.