Open data is a philosophy and practice requiring that certain data be freely available to everyone, without restrictions from copyright, patents or other mechanisms of control. It has a similar ethos to a number of other "Open" movements and communities such as open source and open access. However, these are not logically linked and many combinations of practice are found. The practice and ideology are well established (for example in the Mertonian tradition of science), but the term "open data" is recent. Much of the emphasis in this entry is on data from scientific research and from the data-driven web. In some cases open data may be more properly considered Open Metadata, and there is not yet a consistent formalisation. This article uses recent publications and activities to define the scope of the concept and term.

The concept of open data is not new, but although the term is currently in frequent use, there are no commonly agreed definitions (unlike, for example, Open Access, where several formal declarations have been made and signed).

Open data is often focussed on non-textual material such as maps, genomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.

A typical depiction of the need for open data:

Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.

Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use. For example, many scientists do not regard the published data arising from their work to be theirs to control, and treat the act of publication in a journal as an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an Open spirit. Because of this uncertainty, it is also possible for public or private organizations such as IEEE to aggregate said data, protect it with copyright and then resell it.
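The difficulty described above can be illustrated with a minimal sketch in Python. It assumes a hypothetical metadata record (DatasetRecord) and a hypothetical allow-list of license identifiers (OPEN_LICENSES); neither comes from any particular standard or repository, and the choice of which licenses count as open is an assumption made only for illustration. The point is that a data set with no stated license cannot be classified at all.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list for this sketch: identifiers treated here as "open".
# Which licenses actually qualify is a policy decision, not settled by this code.
OPEN_LICENSES = {"CC0-1.0", "PDDL-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata for a published data set."""
    title: str
    license_id: Optional[str] = None  # None means the creator stated no license


def reuse_status(record: DatasetRecord) -> str:
    """Classify a record as 'open', 'restricted' or 'unknown'."""
    if record.license_id is None:
        # No license stated: the status cannot be determined from the metadata.
        return "unknown"
    if record.license_id in OPEN_LICENSES:
        return "open"
    return "restricted"


if __name__ == "__main__":
    for rec in (
        DatasetRecord("Crystal structures", "CC0-1.0"),
        DatasetRecord("Supplementary tables"),
        DatasetRecord("Map tiles", "Proprietary"),
    ):
        print(rec.title, "->", reuse_status(rec))
```

Under these assumptions, only the record that carries an explicit license can be confidently re-used; the unlabeled record stays in the "unknown" category even if its creator intended it to be open.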

Under "Toward Open Data" Connolly (2005, v.i.) gives two quotations:

I want my data back. (Jon Bosak circa 1997)

I've long believed that customers of any application own the data they enter into it.[2] (This quote refers to Veen's own heart-rate data.)

The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957-1958.[3] The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.[4]

In 1995 GCDIS (US) put its position clearly in On the Full and Open Exchange of Scientific Data (A publication of the Committee on Geophysical and Environmental Data - National Research Council):

"The Earth's atmosphere, oceans, and biosphere form an integrated system that transcends national boundaries. To understand the elements of the system, the way they interact, and how they have changed with time, it is necessary to collect and analyze environmental data from all parts of the world. Studies of the global environment require international collaboration for many reasons:

to address global issues, it is essential to have global data sets and products derived from these data sets;

it is more efficient and cost-effective for each nation to share its data and information than to collect everything it needs independently; and

the implementation of effective policies addressing issues of the global environment requires the involvement from the outset of nearly all nations of the world.

International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)."

The last phrase highlights the traditional cost of disseminating information by print and post. It is the removal of this cost through the Internet which has made data vastly easier to disseminate technically. It is correspondingly cheaper to create, sell and control many data resources and this has led to the current concerns over non-open data.

In 2004, the Science Ministers of all nations of the OECD (Organisation for Economic Co-operation and Development), which includes most developed countries of the world, signed a declaration which essentially states that all publicly-funded archive data should be made publicly available.[19] Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.[20]

In 2006 Science Commons[21] ran a 2-day conference in Washington where the primary topic could be described as Open Data. It was reported that the amount of micro-protection of data (e.g. by license) in areas such as biotechnology was creating a tragedy of the anticommons, in which the costs of obtaining licenses from a large number of owners made it uneconomic to do research in the area.

In 2007 SPARC and Science Commons announced a consolidation and enhancement of their author addenda.[22]

In 2010 the Panton Principles launched,[23] advocating Open Data in science and setting out four principles with which providers must comply for their data to be Open.

Arguments made on behalf of Open Data include:

Public money was used to fund the work and so it should be universally available.

It was created by or at a government institution (this is common in US National Laboratories and government agencies).

Facts cannot legally be copyrighted.

Sponsors of research do not get full value unless the resulting data are freely available.

Restrictions on data re-use create an anticommons.

Data are required for the smooth process of running communal human activities (map data, public institutions).

In scientific research, the rate of discovery is accelerated by better access to data.[24]

It is generally held that factual data cannot be copyrighted.[25] However, publishers frequently add their copyright statements (often forbidding re-use) to scientific data accompanying (supporting, supplementing) a publication. It is also usually unclear whether the factual data embedded in full text are covered by the copyright.

While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.

As the term Open Data is relatively new it is difficult to collect arguments against it. Unlike Open Access where groups of publishers have stated their concerns, Open Data is normally challenged by individual institutions. Their arguments may include:

this is a non-profit organisation and the revenue is necessary to support other activities (e.g. learned society publishing supports the society)

the government gives specific legitimacy for certain organisations to recover costs (NIST in US, Ordnance Survey in UK)

government funding may not be used to duplicate or challenge the activities of the private sector (e.g. PubChem)

The Budapest Open Access Initiative declaration defines open access as follows:

By "open access" to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

The logic of the declaration permits re-use of the data, although the term "literature" has connotations of human-readable text and can imply a scholarly publication process. In Open Access discourse the term "full-text" is often used, which does not emphasize the data contained within or accompanying the publication.

Some Open Access publishers do not require the authors to assign copyright and the data associated with these publications can normally be regarded as Open Data. Some publishers have Open Access strategies where the publisher requires assignment of the copyright and where it is unclear that the data in publications can be truly regarded as Open Data.

The ALPSP and STM publishers have issued a statement about the desirability of making data freely available [26]:

Publishers recognise that in many disciplines data itself, in various forms, is now a key output of research. Data searching and mining tools permit increasingly sophisticated use of raw data. Of course, journal articles provide one ‘view’ of the significance and interpretation of that data – and conference presentations and informal exchanges may provide other ‘views’ – but data itself is an increasingly important community resource. Science is best advanced by allowing as many scientists as possible to have access to as much prior data as possible; this avoids costly repetition of work, and allows creative new integration and reworking of existing data.

and

We believe that, as a general principle, data sets, the raw data outputs of research, and sets or sub-sets of that data which are submitted with a paper to a journal, should wherever possible be made freely accessible to other scholars. We believe that the best practice for scholarly journal publishers is to separate supporting data from the article itself, and not to require any transfer of or ownership in such data or data sets as a condition of publication of the article in question.

However, this statement has had no effect on the open availability of primary data related to publications in journals of ALPSP and STM members: data tables provided by the authors as a supplement to a paper are still available to subscribers only.

There are a number of other "Open" philosophies which are similar to, but not synonymous with, Open Data; they may overlap with it, or be supersets or subsets of it. Here they are briefly listed and compared.

Open Source (Software) is concerned with the licenses under which computer programs can be distributed and is not normally concerned primarily with data.

Open Content has similarities to Open Data and may be seen as a superset but differs in that it emphasizes creative works while Open Data is more oriented towards factual data and the output of the scientific research process.

Open Notebook Science refers to the application of the Open Data concept to as much of the scientific process as possible, including failed experiments and raw experimental data.[27]

Open Knowledge. The Open Knowledge Foundation argues for Openness in a range of issues including, but not limited to, those of Open Data. It covers (a) data, whether scientific, historical, geographic or otherwise, (b) content such as music, films and books, and (c) government and other administrative information. Open Data is included within the scope of the Open Knowledge Definition, which is alluded to in Science Commons' Protocol for Implementing Open Access Data.[28]

OpenPSI (the OpenPSI project) is a community effort to create a UK government linked data service that supports research. It is a collaboration between the University of Southampton and the UK government, led by OPSI at the National Archives, and is supported by JISC funding.