A Scientist and the Web

Open Access and Open Data: licences, policies and other constraints

I am trying to follow up the value and deficiencies of attaching a formal Open Data licence to a chunk of information. I'd written a good deal when I also saw Peter Suber's blog: Whether or not to allow derivative works: (Thomas Lemberger, Open Access: Derivs or No Derivs? It's your call! The Seven Stones, November 1, 2007. Excerpt follows) [this is therefore a long post as I list the arguments - which are instructive - and make additional comments. This post is followed by a second one on the relationship to Open Source]

TL: I am pleased to announce that Molecular Systems Biology has changed its license to publish for all articles accepted after October 1st, 2007 (see updated instruction to authors). The new license allows our authors to choose between two Creative Commons licenses: one that allows the work to be adapted by users (by-nc-sa), the other that does not allow the work to be modified (by-nc-nd)....

Our content is therefore not only freely available to all but our authors can now also decide to make their research fully open for reuse and adaptations.

The current explosive development of data and text mining, semantic-web and information aggregation technologies is profoundly changing the publishing landscape (eg Tim O'Reilly visits Nature). When we were contacted a few months ago by the OpenWetWare community who envisaged the "wikification" of one of our Reviews (see post), we decided that Molecular Systems Biology should strongly support such initiatives by providing our content in an as open form as possible. Our Senior Editors fully supported this transition to a more open license but also encouraged us to allow authors to have some influence on the decision.

Providing authors the possibility to choose their license has some decisive advantages: first, by enforcing a conscious choice by authors it will inevitably raise awareness on the implications of the various publication licenses; second we would like to see the question of "what should be open access" being addressed in a more democratic way by the community itself rather than through incantations of what the ideal solution should be. My guess – and my personal hope – is that most of the authors will indeed choose the most open version of the license, but I think that it is important to respect the opinions of those who think differently and who would feel uncomfortable with the idea that their article can be remixed or adapted without them being aware of it.

Our attitude is motivated by the fact that, at Molecular Systems Biology, we see the role of a scientific journal more as a catalyst facilitating and accelerating scientific discovery rather than a policy-making instrument. What is Systems Biology? Rather than providing a rigid definition of a rapidly evolving field, we prefer to let the community define the scope of this field and we adapt to it. What is open access? Rather than relying on a dogmatic position in a still fluid situation, we prefer to let scientists define their priorities.

MP: We are grateful to Thomas Lemberger for his response to the recent PLoS Biology editorial concerning the confusion about open versus free access. We thank him as well for pointing out that authors at Molecular Systems Biology are now given a choice between two Creative Commons licenses when they publish their work. The announcement of a new option for their authors of an alternative “Share Alike” licence was not available as the PLoS Biology editorial went to press. We certainly agree with him that open access offers a tremendous potential for researchers and scientific publishing. However, in our view, no matter how well-intentioned this new policy might be, it will only lead to further confusion.

As noted in our editorial, all the research articles published in Molecular Systems Biology still end with the statement that the article is published under a Creative Commons Attribution License – see for example this article. This remains misleading, because only one Creative Commons Attribution License allows any kind of derivative reuse subject only to appropriate attribution of the authors. If you follow the license link at the bottom of the article cited above you find that the license is quoted as an “attribution, non-commercial, no derivative works” license – one of the most restrictive of the Creative Commons licenses ([see this]...summary of the licenses). The Creative Commons web site explains the meaning of “no derivative works” as follows: “You may not alter, transform, or build upon this work”. This is not open access.

The new “share alike” choice now offered to Molecular Systems Biology authors is closer to the accepted definition of open access, but includes the “non-commercial” and “share alike” restrictions, which means that any derivatives that are created have to be distributed under the same license terms. While we agree with the sentiments underlying this licence (in that it potentially promotes open access) – it is still restrictive, which is why open access publishers such as PLoS, BioMedCentral and Hindawi have chosen to use the Creative Commons Attribution License.

In effect, Molecular Systems Biology offers authors the choice between free access to their work and open access (with some restrictions). This means that the content of the journal is not all available open access. It is therefore not correct to say the “Molecular Systems Biology is an open access journal” as it does at the bottom of the research articles.

It is unfortunate that the PDFs of the articles published in Molecular Systems Biology lead to further confusion. The PDF of the article available at this link http://www.nature.com/msb/journal/v3/n1/pdf/msb4100156.pdf has a copyright line at the top indicating that the copyright belongs to EMBO and the Nature Publishing Group and that all rights are reserved....

It seems to us that we share many of the same goals as the editors of Molecular Systems Biology, and so we urge them to work with their publisher to rationalize and simplify the license policies of their fine journal.

PeterS: Comment. A few background thoughts:

It's healthy and useful to debate which licenses best promote research. Moreover, it might even be productive. The best way to debate subtly different shades of openness is to debate explicit, well-crafted licenses, and all the CC licenses are explicit and well-crafted.

However, the public definitions of OA (from Budapest, Bethesda, and Berlin) do not have the same sharp edges that explicit licenses do. Moreover, they differ on some fine points, including the one under discussion here. For example, the Bethesda and Berlin definitions allow derivative works, but the Budapest definition allows authors to disallow derivative works that would interfere with "the integrity of their work".

Because the BBB definitions don't settle the question, I think it's more productive to debate policy than labels --what promotes research rather than what deserves the name of "open access". If everything that satisfies at least one of the BBB definitions is OA, then both sides are talking about OA here.

I'm not saying that clear labels aren't useful, or that the label "OA" isn't usefully clear. I'm saying that when the label covers both policies under discussion, then we don't gain by debating the label and should focus instead on specific advantages and disadvantages of the two policies.

PeterS: One response to this situation might be to revisit and revise the public definitions. There might be some gain in that. But even if we did, I'd want any newly revised definition to include some latitude for variation and flexibility --within limits, of course.

My own preference is for the straight, unadorned attribution license (same as PLoS Biology), essentially permitting every use except plagiarism. I wish all OA journals would use it. But several other decisions, including the decision to disallow derivative works, fall within the boundaries of OA.

PMR: I think this is a very timely analysis, with Peter's normal clarity. For ca. 2 years I have regarded BBB as almost scriptural - an algorithm for OA. I am coming round to the view that an "OA sticker" on a document or chunk of data is a useful indication that the author has thought about the issue and that there is some intention to make something freely available to the community. A CC-BY licence is clear and is a great step forward - like Peter and Mark I urge everyone to use it, as all the other CC-* licences cause problems.

However CC-* is a coarse-grained instrument. No doubt it will be tested in law for academic publications or scientific data ( ... tell me it already has) but things shouldn't get that far normally. There should - in addition - be a clear statement of the policy of the authors or the publishers or the funders or all of them. This policy might not have legal standing but should command moral respect in the community.

Since my colleagues and I publish Open Data, we have encountered some of the problems (but by no means all). Here are some examples of issues not easily covered by a licence (it could be argued that CC-ND covers these but it is too blunt IMO):

what components of a data set (including metadata) must be retained so as not to break its integrity?

should reference data and formal specifications be regarded as sacrosanct? I don't really want people editing the CMLSchema and republishing it - all the software will break.

what happens if a re-user edits mistakes into a data set which still retains the address and authorship of the primary author?

can an author require that the institutional branding (e.g. logos) is retained on recirculated works? Can other institutional logos be added?

can these works be used for marketing third-party products, with implied endorsements?