The library, archives and museums (LAM) community is increasingly interested in the potential of Linked Open Data to enable new ways of leveraging and improving our digital collections, as recently illustrated by the first international Linked Open Data in Libraries Museums and Archives Summit (LOD-LAM) in San Francisco. The Linked Open Data approach combines knowledge and information in new ways by linking data about cultural heritage and other materials coming from different museums, archives and libraries. This not only allows for the enrichment of metadata describing individual cultural objects, but also makes our collections more accessible to users by supporting new forms of online discovery and data-driven research.

But as cultural institutions start to embrace Linked Open Data practices, the intellectual property rights associated with their digital collections become a more pressing concern. Cultural institutions often struggle with rights issues related to the content in their collections, primarily because they often do not hold the (copy)rights to the works in their collections. Instead, copyrights often rest with the authors or creators of the works, or with intermediaries who have obtained these rights from the authors, so that cultural institutions must get permission before they can make their digital collections available online.

However, the situation with regard to the metadata that describes these cultural collections (individual metadata records and collections of records) is generally less complex. Factual data are not protected by copyright, and where descriptive metadata records or record collections are covered by rights (either because they are not strictly factual, or because they are vested with other rights such as the European Union’s sui generis database right) it is generally the cultural institutions themselves who are the rights holders. This means that in most cases cultural institutions can independently decide how to publish their descriptive metadata records, individually and collectively, allowing them to embrace the Linked Open Data approach if they so choose.

As the word “open” implies, the Linked Open Data approach requires that data be published under a license or other legal tool that allows everyone to freely use and reuse the data. This requirement is one of the most basic elements of the LOD architecture. And, according to Tim Berners-Lee’s 5 star scheme, the most basic way of making data available online is to make it ‘available on the web (whatever format), but with an open licence’. However, there is still considerable confusion in the field as to what exactly qualifies as “open” and “open licenses”.

While there are a number of definitions available such as the Open Knowledge Definition and the Definition of Free Cultural Works, these don’t easily translate into a licensing recommendation for cultural institutions that want to make their descriptive metadata available as Linked Open Data. To address this, participants of the LOD-LAM summit drafted ‘a 4-star classification-scheme for linked open cultural metadata’. The proposed scheme (obviously inspired by Tim Berners-Lee’s Linked Open Data star scheme) ranks the different options for metadata publishing — legal waivers and licenses — by their usefulness in the LOD context.

In line with the Open Knowledge Definition and the Definition of Free Cultural Works, licenses that impose restrictions on the ways the metadata may be used (such as ‘non-commercial only’ or ‘no derivatives’) are not considered truly “open” licenses in this context. This means that metadata made available under a more restrictive license than those proposed in the 4-star system above should not be considered Linked Open Data.
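For institutions that track license terms in a catalogue or publishing pipeline, the rule in the preceding paragraph can be expressed as a simple check. The following is only a minimal sketch in Python: the restriction labels and the function name `is_open_for_lod` are hypothetical, chosen for illustration, and the check covers only the two restrictions named in the text rather than the full definitions it refers to.

```python
# Restrictions that the text says rule a license out as "open".
# The label strings are illustrative assumptions, not a standard vocabulary.
NON_OPEN_RESTRICTIONS = {"non-commercial", "no-derivatives"}


def is_open_for_lod(restrictions):
    """Return True if none of the given restrictions disqualifies the license.

    `restrictions` is any iterable of restriction labels attached to a
    license, e.g. {"attribution"} or {"attribution", "non-commercial"}.
    """
    return NON_OPEN_RESTRICTIONS.isdisjoint(restrictions)


if __name__ == "__main__":
    # A license that only asks for attribution passes the check.
    print(is_open_for_lod({"attribution"}))
    # Adding a non-commercial clause makes it fail.
    print(is_open_for_lod({"attribution", "non-commercial"}))
```

A check like this says nothing about where a license would sit in the star ranking; it only separates licenses that carry one of the named restrictions from those that do not.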

According to the classification there are 4 publishing options suitable for descriptive metadata as Linked Open Data, and libraries, archives and museums trying to maximize the benefits and interoperability of their metadata collections should aim for the approach with the highest number of stars that they’re comfortable with. Ideally the LAM community will come to agreement about the best approach to sharing metadata so that we all do it in a consistent way that makes our ambitions for new research and discovery services achievable.

Finally, it should be noted that the ranking system only addresses metadata licensing (individual records and collections of records) and does not specify how that metadata is made available, e.g., via APIs or downloadable files.

3 Responses to “4 Stars for Metadata: an Open Ranking System for Library, Archive, and Museum Collection Metadata”

I sometimes feel like I’m the only person in the world who defines ‘open access’ as non-commercial access. I honestly don’t get how people think charging for a resource somehow makes it ‘more free’, and I’m pretty sure there’s a sizeable foundation-type lobby making sure that the ‘open-as-commercial’ perspective holds sway. Well, I haven’t drunk the commercialism Kool-Aid, and consequently, I reject the proposal coming forth from the so-called ‘LOD-LAM Summit’ to create a 4-star definition of openness. If you can block access to something and demand payment for it, it’s not open. It is certainly not ‘more open’ than the non-commercial form of openness that most people actually want to use.

MacKenzie, this is an interesting model for illustrating ongoing discussions about sharing and licensing – thank you for presenting it so clearly.

Stephen, I have read your posts on this topic for years, and appreciate your interest in sharing knowledge and education. While I don’t sympathize with your fondness for ‘non-commercial’ licenses, I can follow your reasoning. But this conspiracy theory version of your view goes too far. Please assume good faith of those who disagree with you, and be moderate in assumptions about what “most people” want. No lobby is required to make people view public domain as the ‘most free’ license; it is, by many rules of thumb.

I cannot speak for most people, but in the communities I frequent – where highly distributed or multi-contributor reuse and derivative use are common – the problems with ‘NC’ restrictions, or any restriction beyond simple attribution, are well known. Metadata is the sort of knowledge that lends itself naturally to repeated revision, clarification, and expansion over time. I don’t think such data should be placed under copyright at all, and hope that placing it into the public domain becomes standard practice.

Stephen Downes: Restricting data to non-commercial use doesn’t make it more free. If the data is public domain, who cares if someone is including it in a published book that costs money? You can always get the data for free on the internet. By restricting use to non-commercial, however, you are effectively limiting the data to only be used on the internet since every other use costs money which needs to be recouped.