'Hands off our books, content miners! Those aren't cheap'

Automated research slurping is bad, claim publishers

Allowing people to use computers to 'mine' vast banks of copyrighted material would damage the economy, the UK Publishers Association has said. It wants to block an exception to copyright law for the practice.

A government-commissioned review of intellectual property (IP) laws earlier this year recommended that researchers should be able to use 'data mining' techniques without fear of falling foul of copyright law. The Government has backed the finding.

The Publishers Association is a UK trade body that represents book, journal, audio and electronic publishers. Its chief executive, Richard Mollet, said that introducing a content mining exception would overload internet networks, create commercial risks for publishers and place the UK at a competitive disadvantage with other EU countries. He said the focus should instead be on resolving the "complexities" involved in licensing content mining.

Mollet said publishers must be able to control access to their documents. Content mining involves the use of technology to go through vast amounts of content in an automated way to extract information from it.

"If publishers lost the ability to manage access to allow content mining ... the platforms would collapse under the technological weight of crawler-bots," Mollet said in a blog post.

"Some technical specialists liken the effects to a denial-of-service attack; others say it would be analogous to a broadband connection being diminished by competing use. Those who are already working in partnership on data mining routinely ask searchers to 'throttle back' at certain times to prevent such overloads from occurring. Such requests would be impossible to make if no-one had to ask permission in the first place," he said.

"Then there is the commercial risk. It is all very well allowing a researcher to access and copy content to mine if they are, indeed, a researcher. But what if they are not? What if their intention is to copy the work for a directly competing-use; what if they have the intention of copying the work and then infringing the copyright in it? Sure they will still be breaking the law, but how do you chase after someone if you don’t know who, or where, they are? The current system of managed access allows the bona fides of miners to be checked out. An exception would make such checks impossible," Mollet said.

This data's not for any Tom, Dick and Harry

"Which leads to the third risk. Britain would be placing itself at a competitive disadvantage in the European and global marketplace if it were the only country to provide such an exception (oh, except the Japanese and some Nordic countries)," he said. "Why run the risk of publishing in the UK, which opens its data up to any Tom, Dick & Harry, not to mention the attendant technical and commercial risks, if there are other countries which take a more responsible attitude?"

In a review of the UK's IP laws earlier this year, Professor Ian Hargreaves recommended that researchers should be able "to use modern text and data mining techniques" without worrying about infringing copyright. Hargreaves also said that the UK should "press at EU level" for an exception to be introduced into copyright laws for "text mining and data analytics for commercial use".

Hargreaves' recommendations were supported by the Government in its official response to his review in August. The Government said it was "not persuaded" that publishers needed to be able to license text mining, stating that it was not "in the UK's overall economic interest".

Mollet admitted that licensing content mining can be "complex" but said that this was no reason to "eradicate" copyright around the practice.

"The main problem with content mining would appear to be complexity with the different technologies involved and the fact that not all content can be mined from the same place," Mollet said.

"An exception would do next to nothing to solve these questions, and whilst it would clearly sort out the problem of complex licensing, it seems a bit extreme to cut the Gordian knot in this fashion," he said. "Where the problem is one of complexity we should look to simplify and clarify, not eradicate. Content mining is a massively exciting technology and will be a true game-changer in the field of academic research. It will only realise its full potential if it is developed with sensitivity to the surrounding environment."

Mollet said that some academic institutions have been able to "work through" complexities in content mining licensing and that publishers were trying to improve the licensing system. He rejected claims that publishers do not open up access to their material for content mining.

"A recent pan-European study found that 90% of publisher respondents always grant permission or licences. These are refused in only 12% of cases," Mollet said.

"Publishers are not acting like aggressive bouncers, denying access to works and data. Rather, they are like good maitre d’s - helping people to get the most out of the available service, and ensuring that the engagement is managed and not chaotic," he said.