Sunday, 13 November 2011

On releasing museum data and the importance of licenses

I've been preparing for the workshop on 'Hacking and mash-ups for beginners' I'm running at the Museum Computer Network conference (MCN2011) this year, which as always means poking around the GLAM APIs, linked and open data services page for some nice datasets to use in exercises. Meanwhile, people have been using NMSI data at Culture Hack North this weekend, and a question from that event made me realise I never blogged here about the collections data released by NMSI (i.e. the UK Science Museum, National Media Museum and National Railway Museum) back in March 2011.

We’ve released the files [218,822 object records, 40,596 media records and 173 event records] as a lightweight experiment – we’d like to understand whether, and if so, how, people would use our data. We’d also like to explore the benefits for the museum and for programmers using our data – your feedback will inform decisions about future investment in more structured data as well as helping shape our understanding of the requirements of those users. The files are in CSV format – because it’s a really simple format, viewable in a text editor, we hope that it will be usable by most people.

And since someone asked for some background on how I dealt with the organisational issues, the short answer is: I was pragmatic, figured any reasonable data was better than none, and kept it simple. Or, as I wrote at the time in Update on collections data and geocoded NRM data:

A few people have commented on the licence (Creative Commons
Attribution-NonCommercial-ShareAlike, CC BY-NC-SA) and on the format
(CSV). As tomorrow is my last day, I can’t really speak for the museum
but the intention is to learn from how people use the data – the things
they make, the barriers they face, etc – and iterate (as resources
allow) until we get to an optimal solution (or solutions). So please get in touch
if you’ve got requests or think you can help clear up some of the
issues these kinds of projects face, because there’s a good chance
you’ll help make a difference.

The licence is a pragmatic solution – it’s clarification of existing
terms rather than a change to our terms, because this avoided a need for
legal advice, policy review, etc, that would have added several months
to the process.

And yes, I know CSV is quick and dirty, but it’s effective. The
museum sector is still working out how to match the resources available
with the needs of mash-up type developers who work best with JSON and
those who are aiming for linked open data; my hope is that your feedback
on this will help museums figure out how to support people using open
data in various forms. A simple solution like this also means it’s easy
for the museum to re-run the export to update the data as time goes on,
and that anyone, geek or not, can open the files without being startled
by angle brackets and acronyms. Also, did I mention it was quick?
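
For anyone who does prefer JSON, the gap between the two formats is small. The following is a minimal sketch, assuming a CSV file with a header row; the function name csv_to_json and the sample column names (id, title, maker) are made up for illustration and are not taken from the actual NMSI export.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    # DictReader uses the first row as keys for every following row.
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2, ensure_ascii=False)


# Hypothetical sample rows; the real export's columns may differ.
sample = "id,title,maker\n1,Example locomotive,Unknown\n2,Example camera,Unknown\n"
print(csv_to_json(sample))
```

Every value comes out as a string, so anything numeric or date-like would still need cleaning after conversion.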

In some ways, 2011 has been the year I really understood how much of a barrier a 'non-commercial' license is to re-use ('Wired releases images via Creative Commons, but reopens a debate on what “noncommercial” means' is quite a useful article for understanding the confusion, though the LOD-LAM Summit was really where it came together for me). Even I've struggled with questions like 'does a non-commercial license mean I can or can't upload the data to Google Fusion Tables to clean it?', let alone 'can a widget made with non-commercial data be displayed on an ad-supported blog site?'.

Most people who want to play with heritage data want to do the right thing, so an ambiguous 'non-commercial' license effectively prevents them using it (people who want to do bad things with it would probably just scrape the data anyway). I get the sense that museums (and other GLAM orgs) are strongly loss averse, so a full 'commercial use ok' statement might be a bit much, but maybe we can do more to define exactly what's reasonable 'commercial' use and what's not? The Wired article provides some useful starting questions, as does Europeana's discussion of their Data Exchange Agreement. Maybe 2012 will be the year we start to provide answers...

Update, January 2013: I've been writing a piece on open cultural data in museums so have been coming across more material on confusion about 'non-commercial'. The Danger of Using Creative Commons Flickr Photos in Presentations discusses one case where the owner of a photograph was confused about whether it was being used commercially or not. While that may turn out to be a case of mistaken identity, one commenter, Michael, says:

'Commercial and non-commercial are very difficult to determine. As
such, I make a point of never using photos that have a non-commercial
license. Too much hassle. (I also now do not use photos with a
share-alike provision. Same reason, too much hassle.)'

I thought the default was that the publisher had copyright over their content, unless they specifically said otherwise? Either way, at the time there were no resources to use anything other than an existing license to at least allow for some re-use.