April 17, 2012

News! Elsevier has agreed that researchers at the University of British Columbia can text-mine Elsevier content for a wide variety of purposes, including:

direct analysis for research

selection of excerpts for citizen science, and

calculating statistics on the usage of research objects for open dissemination in research tools.

I believe this is an epic win. Let me tell you why.

First, this agreement is out in the open. Publishers have traditionally required that their contracts with libraries, both prices and terms, be kept secret. When terms are open, other libraries can determine if they are getting a fair deal, researchers can know how publishers are facilitating or inhibiting reuse of their content, and we can all assess if a publisher’s behaviour matches its rhetoric.

Second, these terms are head and shoulders ahead of what standard contracts have allowed. Want to know what standard contracts allow? NO TEXT MINING AT ALL. (excerpts collected in the face of secret agreements). In my n=1 sample of negotiating for text-mining rights, the standard text-mining-is-allowed clause suggested by publishers does not allow text mining result data to be disseminated outside the university. In contrast, the terms Elsevier is permitting in this agreement allow the sort of broad uses that are the future of research: combining text-mining with citizen science, using text-mining to power tools for researchers, open dissemination of aggregate results, and the like.

As such, the terms of this agreement should serve as a minimum template for what publishers offer (and subscribers insist upon) within standard subscription agreements going forward. Libraries, you don’t know when your researchers are going to need this. Get it for them now so they have it when they need it — negotiating when they need it is a serious delay to research.

Third, Elsevier is not charging UBC any more money for these terms.

Fourth, Elsevier has agreed that the text mining software can reside on computers of UBC researchers, rather than those within the university library IT system, when text mining is done in ways that do not create a large corpus of full text (for example, text mining via API and on-demand processing). This is empowering for researchers and avoids an unfunded burden for libraries.

Finally: are you convinced yet that blogging and tweeting about your research is totally worth it? :)

I hasten to add that the agreement between UBC and Elsevier has not yet been signed off officially… Elsevier and I have agreed on the phone and via a brief email that they will extend these rights, but the language of the original letter needs to be amended to reflect these terms and agreed upon by UBC proper. I’ll post when that is complete… I’ve been warned by UBC this sort of thing often takes at least 6 weeks.

The agreement here isn’t perfect: there are lots of things researchers might want to do that aren’t covered. And frankly, although I’m happy to have these new terms and consider it progress, I don’t think this approach is the best one for research. There are other approaches for establishing text mining access that move the power rather than just extracting better terms. Peter Murray-Rust is asserting his rights to mine subscription content directly, giving publishers notice, and then just doing it. Moves are afoot in the UK to reform copyright to explicitly allow text-mining. An increasing number of researchers are choosing to publish in gold CC-BY open access journals: a solution that enables reuse by anyone for any purpose. Policies that require libre open access for all publicly-funded research (after embargo) continue to gain momentum. All of these solutions remove the need to ask publishers for permission. And really, doesn’t the idea of *asking publishers for permission to use research* grate? It should. I’m still boycotting with my research papers and reviewing hours.

That said, we are where we are, and we need to be moving the ball ahead in all ways.

That means all of us. Copy the link to this post and email it to your university librarian. Right now. Ask him or her if your institution has text mining rights in its contracts with all its publishers…. and tell them that you want what UBC has :)

April 13, 2012: I sent email to my Elsevier contacts alerting them to my blogpost and summarizing my disappointment that they chose to limit reuse dissemination to “scholarly communication.” David Tempest wrote back immediately and said “I don’t know why you feel this is preventing you from doing your research – we developed this to allow you to continue.” I responded in email:

This is the part of the letter that suggested to me you were restricting this agreement to the first of my three use cases:

UBC may not… “Make all or any portion of the Subscribed Products available to anyone other than an Authorized User and other than as publishing the text mining results via scholarly communication.”

I interpreted “via scholarly communication” to mean traditional publishing of my research results in blog posts and conferences and journals with reasonable-but-not-excessive amounts of supplementary data.

My second use case involves making some of the research articles (or, instead if you request, excerpts of the research articles) identified through text mining available in a limited way to citizen scientists so they can help with semantic markup. These citizen scientists wouldn’t be UBC authorized users, and I wouldn’t have expected this use case to be included in “via scholarly communication.”

My third use case involves disseminating text mining results within a research tool for use by researchers. I wouldn’t have expected this to be considered “via scholarly communication” either.

Is your intent to facilitate all of these uses? If so, fantastic! Do you think the terms in the letter do in fact cover them right now?

We planned phone calls to continue the conversation.

April 16, 2012: Quick phone call with David Tempest. He asked for more detail on my third use case. I explained the way we hope to use text-mining results within total-impact, including plans to disseminate the counts openly with snippets of context, with hyperlinks from the aggregate counts to the articles themselves on Elsevier’s own website. He said he needed to check with lawyers and team about my third use case but was optimistic and would get back to me ASAP.
I also asked if it was necessary that the text mining system be housed in the university library IT system as the letter implied. David replied that was only necessary when the use involved establishing a large corpus of articles, not if I intended to use the API or process articles on the fly.

30 minutes later: David wrote back and said “I have already spoken to my colleagues and we are happy for you to proceed as we discussed with the third element of your email. No more issues to discuss, so please proceed!”

At this point I believe the written language of the agreement needs to be updated and clarified to reflect our verbal understanding. I’m meeting with UBC librarians in person this week.

I found some researchers at MPOW who would use such access and then contacted the person at the main campus who would negotiate this. Interestingly, she said we’ve always had this right; we just have to let them know first!
