EU Bill On Data Mining Lacks Ambition

European researchers have become frustrated in recent years by the restrictions European copyright laws put on their freedom to use text and data mining—two automated techniques for analyzing data—on resources they can legally access and analyze with nonautomated means. As part of its recent proposals to reform copyright laws, the European Commission has recommended lifting these restrictions, but only for academics. This is a good first step, but the EU should also allow everyone to take advantage of these more efficient and effective data-driven research methods.

The Commission’s proposal is good for European researchers in a wide range of disciplines, from bioinformatics to digital humanities. For scholars and scientists, access to the rigorously scrutinized work of their peers, such as academic journals and databases, has always been a vital resource. Researchers who subscribe to these sources can explore them using traditional keyword searches and meta-tags predefined by publishers, but these methods have serious limitations. Manually reviewing all of these sources is a slow and tedious process, the results of which are often inaccurate and incomplete.

Text and data mining is a powerful tool that allows researchers to plough into texts and datasets and interpret minute details. Data mining gives researchers the ability not only to find a needle in a haystack, but also to quickly find and categorize all manner of small objects hidden in many hundreds or thousands of haystacks. For example, medical researchers can use technologies like natural language processing to quickly analyze the outcomes of thousands of clinical trials. This type of analysis supports efforts to develop data-driven precision medicine initiatives that use the latest evidence to deliver personalized treatments. Data mining cannot provide all of the insights gained from human experts closely studying texts, but it does allow researchers to use rapidly developing tools to draw on a much larger pool of literature and data to support their work.

The use of data mining on copyrighted material often falls foul of existing intellectual property laws because the technical process involves extracting data from its original source and copying it into another database for analysis. The proposed exemption is reasonable because it creates a special dispensation for data mining and does not alter other laws that prohibit the unauthorized extraction or reproduction of copyrighted works. After all, there is nothing illegal about “mining” databases manually; this technology only automates the process. A researcher could legally sift through many thousands of published works, note their findings with pen and paper, and then analyze the assembled notes. This is why an exemption for academics is not enough: This method should be legal for anyone. Copyright law should allow publishers to set the subscription fees for access to their content, prohibit unauthorized reproductions of their content, and receive appropriate compensation. But it should not require people with lawful access to content, such as paid subscribers, to seek approval from publishers for using automated research methods.

Some member states, such as the United Kingdom, have already implemented similar (and similarly inadequate) exceptions. But national legislation is insufficient; the issue should be tackled at the EU level, because research is often cross-border: Researchers and sources are spread across different countries. Unless the same rule applies throughout Europe, such cross-border work is very difficult. For example, Europeana.eu is an online repository of books, films, art, and other materials that have been digitized in various member states. A researcher could legally mine this archive from the UK while a colleague elsewhere could not, or the former could inadvertently commit a crime by mining a resource in the latter’s country.

The Commission’s proposal does not go far enough, but it does address this problem for the academic community. If the Commission is serious about building a Digital Single Market, then it should introduce a rule change that applies to everybody, not just academics. If everyone had this freedom, Europe would enjoy far greater opportunities for data-driven innovation in several sectors. Nevertheless, this exemption could be the first step towards that broader freedom, so the Council and the Parliament should support it.

Nick Wallace is a senior policy analyst at the Center for Data Innovation. He previously worked as a government technology policy analyst at Ovum, a global consultancy based in London. He has a master’s degree in public policy jointly awarded from the Central European University in Budapest and the Institut Barcelona d’Estudis Internacionals and a bachelor’s degree in politics from Liverpool John Moores University. Wallace speaks English, Spanish, and German.