The Fate of Text and Data Mining in the European Copyright Overhaul

The current European Digital Single Market copyright negotiations involve more than just the terrible upload filter and link tax proposals that have caused so much concern, and not all of the other provisions under negotiation are harmful. We haven't said much about the text and data mining provisions that form part of this ambitious legislative agenda. With the deal now fast approaching finalization, however, these provisions are taking shape. The next few weeks will give Europeans their last opportunity to steer the text and data mining provisions toward supporting coders' rights, open access, and innovation.

Text and data mining, which is the automated processing and analysis of large amounts of published data to create useful new outputs, necessarily involves copying at least some of the original data. Often, that data isn't subject to copyright in the first place, but even when it is, copies made in the course of processing generally fall within the scope of the fair use right in the United States.

But European countries have no such fair use right in their copyright law. Instead, they have a patchwork of narrower user rights, which vary from one country to another. Although some countries have introduced rights to conduct text and data mining, there is little consistency between them. As a result, the legality of text and data mining conducted in Europe is questionable, even though the process doesn't create anything that resembles the original input data set. Worse still, Europe also has a separate copyright-like regime of protection for databases, which has no equivalent in the United States. Text and data mining activities could also run afoul of these database rights.

Recognizing the usefulness of text and data mining to scientific research, the European Commission proposed to clarify its legality by adding a new optional text and data mining right to European copyright law. Provided that those exercising the right had lawful access to the input data in the first place, they would not have to acquire any additional license to perform text and data mining on such data, for either commercial or non-commercial purposes—and, importantly, the copyright owner would not be able to prohibit them from doing so by contract.

However, the Commission's proposal also contained a number of limitations that made it less useful than it ought to have been. Its three biggest limitations were that:

It only allowed research organizations to conduct text and data mining activities, excluding independent researchers, small businesses, libraries and archives, and others who might otherwise wish to make use of the exception.

Text and data mining could only be conducted for the purpose of scientific research, excluding other purposes such as education, archiving, or literary criticism.

It would do nothing to prevent copyright holders from using DRM (digital locks with legal reinforcement) to make the exercise of the right practically impossible.

Proposals to Strengthen or Weaken the Commission's Proposal

In February 2018, an in-depth analysis [PDF] of the provisions was published for the Legal Affairs (JURI) Committee, which leads the Digital Single Market dossier within the European Parliament. This analysis identifies the limitations mentioned above and provides recommendations for addressing some of them. Perhaps the most notable is "clearly spelling out that both Technological Protection Measures (TPMs) and network security and integrity measures should not undermine the effective application of the exception."

Following up on this, in late March 2018 a coalition of 28 groups, including EIFL (Electronic Information for Libraries), the European University Association (EUA), and Science Europe, sent a letter to the Legal Affairs (JURI) Committee. The letter made four concrete recommendations that would strengthen the Commission's proposal by:

Broadening it to include any person (natural or legal) that has lawful access to content, provided that reproduction or extraction is used for the sole purpose of text and data mining.

Affirming that contractual terms restricting the use of the right should be unenforceable.

Clarifying that DRM cannot be used to unreasonably restrict the exercise of the right.

Allowing datasets created for the purpose of text and data mining to be stored on secured servers for future verification.

But countering these recommendations, some member states would like to weaken the text and data mining right rather than strengthen it. Last week, the Bulgarian Presidency of the Council of the European Union asked member states [PDF]: “Should the scope of the optional exception for text and data mining provided for in Article 3a be limited and to what extent, for example to temporary copies of works and other subject matter which have been made freely available to the public online?” Their answer, expected at Friday's meeting of the Committee of the Permanent Representatives of the Governments of the Member States to the European Union (COREPER), may determine the version of the proposal that goes to a vote.

We are encouraging all our European members to contact their representatives about an upcoming vote on the European copyright proposals in the JURI Committee. Along with the most serious problems with the proposal—the link tax in favor of news publishers (Article 11) and the upload filtering mandate on Internet platforms (Article 13)—the Article 3a text and data mining right is also included in the upcoming vote. When you contact your representatives about the sweeping and dangerous copyright proposals, tell them your thoughts about the importance of protecting text and data mining too. Although the details are complex, you can keep to one simple message—that Articles 11 and 13 should be eliminated, and that Article 3a should be kept and strengthened.
