Europe Struggles With Licensing Text and Data Mining
by Marydee Ojala
Posted On June 10, 2013

You could characterize Working Group 4 on text and data mining (WG4) of Licences for Europe, the European Commission's stakeholder dialogue, either as a major meeting of the minds or as wishful thinking that scientists, librarians, publishers, and the entertainment industry could all just get along. Apparently, the wishful-thinking characterization won the day, and the wishes went ungranted: representatives of the library, research, and technology communities formally withdrew from the process just before the fourth scheduled meeting on May 29, 2013.

The organizations signing the withdrawal letter are:

The Association of European Research Libraries

The Coalition for a Digital Economy

The European Bureau of Library, Information and Documentation Associations

The Open Knowledge Foundation

The COMMUNIA Thematic Network

Ubiquity Press Ltd.

The Trans Atlantic Consumer Dialogue

The National Centre for Text Mining, University of Manchester

European Network for Copyright in Support of Education and Science

Jisc

Licences for Europe, including WG4, was established in December 2012 as a joint initiative of the Commission's Internal Market and Services, Digital Agenda, and Education, Culture, Multilingualism, and Youth portfolios. The intent was to look not only at text and data mining (TDM) but also at cross-border access and the portability of services (WG1), user-generated content and licensing (WG2), and the audiovisual sector and cultural heritage (WG3). All of these areas relate to the broader copyright reforms being considered by the European Union.

Disillusionment with WG4 set in early for library and research organizations. At the first meeting, on Feb. 4, 2013, they noticed that membership was skewed toward the publishing and entertainment sectors. They were also worried that the agenda concentrated too much on adding licensing requirements for TDM rather than on implementing exceptions for researchers.

The position of the publishing and entertainment sectors was that TDM constitutes a threat to their revenue bases and puts an enormous strain on their servers. Their solution, which many thought was a “done deal” before WG4 even held its first meeting, was to require a second license for TDM, one that would be added onto existing licenses for access to publications.

An open letter to commissioners Michel Barnier, Máire Geoghegan-Quinn, Neelie Kroes, and Androulla Vassiliou, dated Feb. 26, 2013, states, “It appears the research and technology communities have been presented not with a stakeholder dialogue, but a process with an already predetermined outcome—namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM of content to which they already have lawful access.” The signatories of this letter represent more than 60 organizations.

The situation worsened. The commissioners did not respond to the February letter until after the second meeting, held on March 8, 2013. Their reply, dated April 15, 2013, was a masterpiece of bureaucratese, saying, “[Y]ou are of course free to point to all the issues and any limitations of current licensing models and indicate your preferred options.” It ends by reminding the recipients that the conclusions of WG4 are “only of an information nature and do not bind any of the participants.”

Far from being mollified, members of the library and research community saw the commissioners’ response as failing to address their concerns. TDM, in their opinion, is crucial to the next steps in research. When researchers use computer programs to extract pertinent data from large numbers of scientific papers, they can surface knowledge that was previously hidden. In the current era of Big Data, the opportunities to uncover groundbreaking information with profound implications for medicine, science, and technology are enormous. However, the licensing agreements their institutions have signed with publishers make TDM legally prohibited, financially untenable, or both.

The withdrawal letter, dated May 22, 2013, acknowledges the commissioners’ April 15 communication, but it strongly emphasizes WG4’s failure to consider any path forward other than licensing. It also notes the “urgent need” to lower legal barriers so that Europe can compete with the U.S., Japan, and South Korea, and it stresses the growing importance of open access (OA) content in the scientific community. The letter explicitly states the worry that “our participation in a discussion that focuses primarily on proprietary licenses could be used to imply that our sectors accept the notion of double licensing as a solution. It is not.”

The withdrawal letter further states: “We maintain that a vibrant internet and a healthy scholarly publishing community need not be at odds with a modern copyright framework that also allows for the barrier-free extraction of facts and data.” The library and research community’s frustration at not being heard in WG4 is obvious; the lack of transparency is another source of frustration. In the withdrawal letter, the group also requests that the list of organizations participating in Licences for Europe be made public and that any final documents emanating from WG4 make clear that they are not endorsed by the organizations that have withdrawn.

Peter Murray-Rust, a chemist at the Unilever Centre for Molecular Informatics, Cambridge University (U.K.), a staunch advocate of OA and open science, and a member of WG4, coined the phrase “The right to read is the right to mine.” WG4’s critics adopted the phrase, much to Murray-Rust’s delight. In a blog post, he decries publishers’ demands for extra payment for TDM, an activity he does not consider an added service: “The only thing the publishers are doing is holding us to ransom.”

It’s a shame that stakeholders in the European copyright and TDM debate could not reach even a preliminary agreement. Certainly, the hope going into the first meeting was that common ground could be found. However, once the library and research communities concluded that their views were unwelcome and would not be taken into account, because the entire discussion centered on additional licensing requirements, they saw no alternative to withdrawing from WG4.

No one wants copyright to stifle creativity and innovation. However, reconciling the positions of such divergent stakeholders on how to achieve this proved extraordinarily difficult. Moreover, TDM technology advances far faster than the torpid pace endemic to governmental deliberations. As more data moves to the open web, where licensing does not present the same barriers to text and data mining, the discussion is likely to change.

Additionally, the withdrawing organizations have found common ground among themselves, and there is hope that consensus on legal, licensing, payment, and technology issues can still be reached. CrossRef is developing a possible solution to the TDM licensing dilemma, which it is tentatively calling CrossRef Prospect. It would consist of a publisher API for researchers who want to access full text for TDM purposes. Because the project is still at a relatively early stage of development, it’s not yet clear whether it will satisfy both publishers and researchers. If it succeeds, we will indeed have a meeting of the minds rather than dysfunctional wishful thinking.

Marydee Ojala is the editor-in-chief of Online Searcher magazine, chairs WebSearch University, and is Program Development Director for Enterprise Search & Discovery.