The Linked Open Data Cloud is a global network of semantically interconnected data and databases. The basic principle is that everyone can publish their data and databases in the Linked Open Data Cloud under an “open” license, that is, a license that allows free use, distribution and reuse of the data and databases. There are two incentives for libraries to publish their data in the Linked Open Data Cloud:

A published dataset in the Linked Open Data Cloud increases the visibility of an organisation.

Libraries can develop and offer new services by connecting their own data with data of other libraries.

The license most often applied is Creative Commons CC0 (“No Rights Reserved”). CC0 is not really a license but rather a waiver of all rights. CC0 is applicable to data and databases, and it is widely used in the international library community to publish bibliographic data in the Linked Open Data Cloud.

The three surprises related to CC0

We faced three surprises when we started our internal discussion about the license under which our data should be published:

Surprise 1: Few deep and comprehensive discussions take place about which license should be used for library data in the Linked Open Data Cloud. So far in 2012, not a single one of the large library conferences in Germany has devoted a workshop or special track to license models for library data, but many have devoted one to publishing data in the Linked Open Data Cloud!

Surprise 2: The library community often has only very simple answers to the question “Why are you licensing your data under CC0?” Typical answers are “because other big players are doing it”, “because you cannot protect catalog data”, “because it is a requirement if you want to link up with Europeana”, “because it was paid for with public money”.

Surprise 3: Some argue that the only way to prevent our data from being exploited commercially is to waive all the rights we have, i.e. to use CC0. It is too early to assess this view. We still have to wait for the first project results, e.g. of EC-funded projects aiming at new commercial services that rely entirely on data licensed under CC0 (cf. the SME initiative on Digital Content and Languages of the European Commission).

CC0 – No attribution to the library

Of course, the advantage of CC0 is that there is no need to monitor whether third parties comply with the conditions of a license agreement. For third parties, the benefit lies in the unrestricted use of the data and databases, i.e. commercial exploitation is also allowed. At the same time, data and databases licensed under CC0 are compatible with other data or databases published under an open license. This facilitates the development of new products and services, which in turn can increase the worldwide use of the data.

Many organisations expect that, if worldwide use increases, the worldwide visibility of the originator will increase as well. But this is not necessarily the case. CC0 does not require any attribution of the organisation which originally provided the data in the Linked Open Data Cloud. The most significant disadvantage therefore is that data provenance becomes impossible if attribution is not required.

But why is this a disadvantage? Answering this requires a deep understanding of the full logic of Linked Open Data. The Linked Open Data Cloud is a network that nobody owns, that nobody controls and that has no quality assurance mechanism. One important indicator of quality is the reputation of the organisation providing data to the Linked Open Data Cloud. If data licensed under CC0 is used without attribution (e.g. for a new library service), it is no longer possible to assess the quality of the data. Are libraries aware of this? But if not CC0, what else? To answer this question, we suggest considering the Open Database License and other Open Data Commons licenses:

Alternatives worth thinking about

The Open Database License (ODbL) allows free reuse and distribution of a database as well as its modification and the creation of new products from it. ODbL is applicable to databases. Besides attribution of the database's creator, ODbL requires that derivative databases created from a database under ODbL be released under ODbL or a compatible license (ShareAlike). This ensures that one can track where the specific database is used and which new products are generated with it. If needed, one has the opportunity to negotiate with commercially interested providers. Attribution is a mandatory requirement of ODbL, which always ensures the visibility of the creator of the database. Outside the library community, ODbL received much attention when OpenStreetMap changed its license from Creative Commons Attribution-ShareAlike (CC BY-SA) to ODbL.

If one does not want to apply ODbL to one's data, other models exist, two of which are described here:

ODC-by: As a compromise between CC0 and ODbL, one can choose the Open Data Commons Attribution License (ODC-by). This license has no ShareAlike clause but requires attribution of the database creator. This ensures visibility but does not prohibit commercial use of the database.

Core metadata set: Another way to combine CC0 and ODbL is the creation of a reduced metadata set (core metadata set) based upon the full data set. The idea of a core metadata set is pursued by the German National Library (PDF). While the full database version is released under ODbL, the database with the reduced metadata set is published under CC0.

Perishing will follow the publishing hype

We are convinced that more options exist if more effort is devoted to the question of which open license should be used for which type of library data and for which purpose. Still, to trigger this thinking, a more visible and lively discussion is necessary. We hope it will kick off soon; one opportunity will be the next Conference on Semantic Web in Libraries (SWIB). If it does not, perishing will follow the current Linked Open Data publishing hype.

Comments

Thanks for the write-up! Two comments: in fact, fewer than 20% of the LOD cloud datasets provide explicit license information [1]. On a related note, we’ve provided guidance on how to improve that situation [2].

“The most significant disadvantage therefore is that data provenance becomes impossible if attribution is not required.”

This is not true. Attribution licenses aren’t a necessary condition for others to provide provenance information. I, too, think that data provenance information is an important feature, and one should of course provide provenance information, even for data one has produced entirely oneself.

I’d argue the other way round: the importance of provenance information can lead to the conclusion that a legal requirement like an attribution license isn’t necessary. Thus, I think that any responsible data publisher who wants to build a good reputation will indicate the provenance of her data, whether the underlying third-party data is licensed under CC0, ODC-BY or anything else. And good provenance information will be better than attribution, as it goes beyond saying “This dataset contains data from the ZBW library catalog.”

Thus, I would be happy to see more discussion about the actual provision of provenance information than about which open license to choose (ODC-BY and ODbL, like CC0, both comply with the Open Definition).

Thanks for this blog post that tries to raise some questions about licensing library data.

But it is not even clear that bibliographic data can be licensed at all.

Data sets are not subject to copyright law per se. Only when they are integrated into databases and certain criteria are met does the law apply, at least in Europe.

So it is at least unclear whether data distribution can be governed by licenses at all. Licenses only permit things that would otherwise be forbidden.

So the question is: is it really forbidden to use library catalogs to access library material?

In the beginning, there was the question of which steps to take to ensure that library catalog data becomes “open” (assuming it had been “closed” before). The insight was that, after releasing library catalogs as open data, many libraries might get confused and be confronted with the same questions over and over again, such as “May I use your data if …?” It is well known that library catalogs are a valuable asset for Big Data companies, industry, and academic institutions. Bibliographic data is valuable because it is “real world data”, not data created artificially.

The consequence of not freeing the data completely is that each library would have to run a legal department to answer legal questions, track permissions, build databases of data customers, collect license fees, take cases to court, and so on. The problem with this kind of business is that libraries and library catalogs were not invented for such a purpose; instead, catalog data should lower the barrier for the public to access library material to zero, for maximum public distribution.

Selecting CC0 is an economical and very pragmatic decision. It lowers the cost of open data for the library because no legal questions arise and maximum public distribution is guaranteed.

The “surprises” are of a theoretical nature and raise no practical issues:

Regarding “Surprise 1”: Discussing how to “protect library data” is far from practical. Librarians want the opposite: they want free access to library data because it is public data, and they see libraries as part of the public, just as science and education are.

Regarding “Surprise 2”: The answers of the library community are not “simple”. As far as I know, choosing the Europeana license took a long time, involved many experts, and weighed many pros and cons. And yes, if librarians find that the taxpayer has a right to use the data he has funded, they are right. This can’t be a real surprise.

Regarding “Surprise 3”: Nobody argued that CC0 is the only license for commercial purposes. The challenge was to determine which kind of license (if a license is the way to go at all) best satisfies the requirements of all kinds of users, private or institutional, commercial or non-commercial, and of the libraries themselves.

It is good that you highlighted again your point that individual library records (e.g. a catalog item) cannot be protected. This view, however, does not apply to every type of catalog data. For example, if a catalog item contains an abstract of the catalogued paper, this particular catalog item can be licensed, because writing the abstract requires intellectual effort.

And just to make my view very clear again: I fully support an open access policy for any data, be it library data or research data. But I think the community always has the right to know who the original creator of a data set was, and CC0 cannot ensure this.

For me, it is still too simple to argue “the taxpayer pays, so we have to make it public without any restriction”. Yes, the taxpayer paid, and the public has the right to access and reuse the data; there is no doubt about that. But what about commercialisation of the data? What have we learned over the last decades about the publication of scientific papers? The taxpayer pays twice: first, the scientific community writes and reviews the papers for the publishers; second, the scientific community buys the journals in which these papers are published. What makes you so confident that we will not enter the same loop with data, regardless of whether it is research data or library data? And then the European Commission would have to step in, similar to its recent proposal to publish only open access in EC-funded projects.

Maybe you will change your view on my third surprise when the first commercial vendor knocks at the door of your library and tries to sell you a new service based entirely on the data you originally created, or, even worse, sells you your own data in a different wrapping. Or maybe I will change my view if this scenario does not become reality. In that case, I think we will both be satisfied.