20 June 2011

There's considerable excitement about an announcement from the British Library and Google detailing a wonderful gift to the world:

The British Library and Google today announced a partnership to digitise 250,000 out-of-copyright books from the Library’s collections. Opening up access to one of the greatest collections of books in the world, this demonstrates the Library’s commitment, as stated in its 2020 Vision, to increase access to anyone who wants to do research.

Selected by the British Library and digitised by Google, both organisations will work in partnership over the coming years to deliver this content free through Google Books (http://books.google.co.uk) and the British Library’s website (www.bl.uk). Google will cover all digitisation costs.

Isn't that just swell? Vast quantities of fascinating books in the public domain are being made "available to all", as the press release trumpets:

This project will digitise a huge range of printed books, pamphlets and periodicals dated 1700 to 1870, the period that saw the French and Industrial Revolutions, The Battle of Trafalgar and the Crimean War, the invention of rail travel and of the telegraph, the beginning of UK income tax, and the end of slavery. It will include material in a variety of major European languages, and will focus on books that are not yet freely available in digital form online.

Freely available, too... But, er, exactly *how* freely available?

Once digitised, these unique items will be available for full text search, download and reading through Google Books, as well as being searchable through the Library’s website and stored in perpetuity within the Library’s digital archive.

Fab, and....?

Researchers, students and other users of the Library will be able to view historical items from anywhere in the world as well as copy, share and manipulate text for non-commercial purposes.

But hang on: these are materials that are in the public domain; public domain means that anyone can do anything with them - including commercial applications. So this condition of "non-commercial purposes" means one thing, and one thing only: although the texts themselves are public domain, the digitised texts are not (otherwise it would be impossible to impose the non-commercial clause).

In other words, far from helping to make knowledge freely accessible to all and sundry, the British Library is actually enclosing the knowledge commons that rightfully belongs to humankind as a whole, by claiming a new copyright term for the digitised versions. Call me ungrateful, but that's a gift I can do without.

43 comments:

Can I challenge your phraseology, "Encloses the Public Domain"? There is nothing in this that changes the status quo: if you have access to a copy of one of these books and want to put the effort in to digitise it, you still can. You can then do what you like with that. What they seem to be saying is you can't make a profit from their efforts (and I bet if you rock up with a commercial licensing proposition they will listen). While there would be obvious benefits if they chose a more open licence to release these under, they aren't doing anything to trample your rights, as you are heavily implying.

At the Europeana plenary in 2010 I repeatedly asked James Crawford, Engineering Director of Google Books, whether scanned public domain books would be available under a public domain licence. He finally admitted that they would be copyright Google, but freely available.

Not the same thing at all. You're right about Google enclosing the public domain. I'm so reminded of http://en.wikipedia.org/wiki/Enclosure

@tony: yes, you're right in legal terms; but practically, that's what they're doing.

For example, for books it owns where there are very few other copies in existence - and given the riches of the BL's collection I'm sure there are many - it will be hard if not impossible to find other institutions that will allow digitisation.

So the BL effectively has a monopoly on those books. And I doubt whether it would be keen to allow anyone else to digitise them and then give away the files to anyone.

Really interesting post and thread. I see and applaud the idea that the public domain should be a free and untrammeled space, and even that the public domain should imply 'commercial' as well as 'non-commercial' uses.

My problem is that very few people advocating this view have a practical answer to the question of how the very considerable costs of Digitisation and long-term preservation *should* be covered if these kinds of restriction are not permitted.

Given that no Government is ever simply going to fund the digitisation and unfettered re-use of the bulk of our cultural heritage, what model do you propose by which these costs should be covered?

Setting aside the usual suspicion that surrounds the motivations of large organisations, is there not an argument to say that the access which the BL and Google are providing is considerably better than the nothing which would take its place if such arrangements weren't made?

@Nick: you're of course right that a central issue is who pays for the digitisation?

There is an interesting parallel with Ordnance Survey data. Many people (myself included) want it all available free, for any use. The same question comes up: who will pay for its collection?

Well, let's look at the US. There, geographic information is paid for by the US government, and then given away. Why? Because it has generated a multi-billion dollar industry based around geographic information (the Americans wouldn't give away this stuff if there weren't a profit - for the US - in doing so.) More details here:

http://www.guardian.co.uk/technology/2006/mar/09/education.epublic

Similarly, imagine the businesses that could be built around free access to the BL's holdings.

The current approach specifying "non-commercial" only is a failure of imagination that is really penny-wise but pound-foolish....

mrsean2k: no, there's nothing in the press release, nor did I say there was; I was pointing out a possible, plausible danger.

It's not "freely accessible to all and sundry" - freely accessible means being able to do what you wish with it; this is looking but not touching - not my idea of free (which I use in the "free software" sense - see the rest of this blog....)

But what worries me is that accepting this accepts the principle that a major institution can take public domain material and take it out of the public domain in this way as a "quid pro quo": this is about principles.

I was at the wikipedia / museums conference hosted by the British Museum last year which had a very interesting collection of attendees.

One of the presentations was by the National Portrait Gallery about their dispute with Wikimedia Commons who had posted high res versions of NPG pictures taken from the NPG website.

Each section of this presentation was named for a Barbra Streisand song. NPG have heard of the Streisand effect.

They explained that Wikimedia commons has now added a disclaimer to each of these pictures noting that Wikimedia thinks they are public domain but NPG claims copyright. This disclaimer includes a link to the NPG website.

NPG has decided to live with this and not pursue the matter further. This disclaimer creates just enough uncertainty to drive serious commercial users to get a contract with NPG.

Lady Bridgeman was in the audience that day and in the discussion afterwards she discussed the Corel vs Bridgeman case, voicing the opinion that the Bridgeman gallery were badly advised and were unlucky to lose that court case.

Wikimedia's response - the day after a U.S. court decides photos of old masters are copyrightable these will disappear from Wikimedia Commons. Until then they stay.

Having this content on the web with an unfree license is a first step but it isn't the end of this road. There are lots of people thinking about these issues and your post here is a useful addition to that discussion.

I'm struggling to see how the BL or Google can claim copyright over the resulting text (this is similar to, but clearer legally than, the hoo-hah over digitisation of images in the National Portrait Gallery a little while ago).

If Google is digitising the text, and if the text is faithful to the original (and if it isn't, then why isn't it?), then the google version of the text cannot attract copyright as it is not an original work.

We really need Google and/or the BL to be asked the question "will the text files of the scanned works be identical to the text of the originals", to which they will only be able to answer "yes" to retain any credibility, at which point they are essentially admitting that the text files are in the public domain.

If you want to put the effort in there is absolutely nothing to stop you from making your own digital copies if you want to use them for commercial purpose.

Of course it would be a better situation if Google's digital copies were completely open, but that does nothing to change the fact that this is a quite fantastic project that will be of enormous use to many. Trying to spin this in a negative way is quite astonishing to me.

@Andrew: that's a hugely interesting comment. Are you essentially saying *if* the text is identical, it has to be in the public domain, even digital versions, and that conditions can't be attached to it? Because that's clearly a very Big Thing if you are...

You've been right on the money about a lot of things recently, Glyn! I followed you via RSS for a while, but you warranted an official Twitter follow due to your outstanding writing on a variety of freedom-centric tech topics.

@mrsean2k: because it's incidental to the main point: that the BL is claiming new rights in digitised public domain material. That's a terrible precedent for the future in terms of getting *all* public domain analogue texts online as public domain.

Yes - I am saying that if the textfile made available by Google/BL (typos and all) is identical to the text of the original book, then that text file will exhibit zero originality, and as such will be incapable of attracting copyright.

Regarding the question of claiming copyright on digitisations of public domain books: recently the German lawyer Till Kreutzer, in a legal guide for German libraries on digitising public domain material, made clear that in most cases you most probably cannot claim copyright or related rights on digitisations of public domain works. See the guide "Digitalisierung gemeinfreier Werke durch Bibliotheken" (PDF), chapter 5.

@Adrian: thanks very much for that interesting link. As you say, it's German law, but the logic seems pretty universal - especially the section about automatic processes (like OCR) not giving rise to new copyright.

Having worked in AV archiving in the UK, my understanding is that converting an item into digital form (or migrating it to a new standard/platform) legally generates a new work. This work - even though it may be practically identical to an existing one - attracts copyright for the producer/publisher. Unless, of course, the producer waives that right.

This seems partly (if not wholly) nonsensical to me, but it's not the fault of the BL or Google that this is legally the case.

A deeper issue is the BL's position as a copyright library. If, for instance, an author thinks a work of theirs has been plagiarised, they have legal recourse to the BL's catalogue to prove that they originated the work/idea/patent. This is one of the reasons that the BL have generally avoided digitising in-copyright works. However, having BL acting as a digital publisher even of out of copyright material puts them in an ambiguous position: digital copies generate (or infringe existing) copyright willy-nilly. It's just the way the law works.

@Niall: thanks for those. Certainly, I'm not saying this is wholly BL's fault - the legal system is clearly largely to blame. I just don't see it being very helpful in trying to (a) deal with these problems (b) change them.

I'll give you a breakdown of what the British Library is currently doing. They take a public domain book and scan it in. If I want to use a photo from that book for commercial purposes from their library, I will have to pay them a fee. I called for a quote and they want 350 pounds, or 550 dollars, for the use of an image they scanned in.

It is one thing to pay a current copyright holder a fee for a photo, but I think it is an outright hustle for them to charge such high fees for images in the public domain.

The other aspect is that if I were to find the same image in another book at a public library and scan it in myself, the British Library could potentially turn around and sue me for "their" image. The burden would be on me to provide documentation that I went to another library and made the scan myself - my own "faithful reproduction of a document in the public domain."

I think it's a racket, and a complete violation of the spirit of public domain.

BTW there are holdings at other online libraries that do not have the same restrictions. More importantly, they cite where the book resides so that you can source the information.

About Me

I have been a technology journalist and consultant for 30 years, covering
the Internet since March 1994, and the free software world since 1995.

One early feature I wrote was for Wired in 1997:
The Greatest OS that (N)ever Was.
My most recent books are Rebel Code: Linux and the Open Source Revolution, and Digital Code of Life: How Bioinformatics is Revolutionizing Science, Medicine and Business.