It is this criticism of the project that prompted me to accept your invitation to speak — and explain why we believe this is a legal, ethical, and noble endeavor that will transform our society.

Legal because we believe copyright law allows us the fair use of millions of books that are being digitized. Ethical because the preservation and protection of knowledge is critically important to the betterment of humankind. And noble because this enterprise is right for the time, right for the future, right for the world of publishing, right for all of us. ….

…We were digitizing books long before Google knocked on our door, and we will continue our preservation efforts long after our contract with Google ends. As one of our librarians says, “We believed in this forever.”

Google Book Search complements our work. It amplifies our efforts, and reduces our costs. It does not replace books, but instead expands their presence in the marketplace.

We are allowing Google to scan all of our books – those in the public domain and those still in copyright – and they provide our library with a digital copy. We insisted on this for one very important reason: Our library must be able to do what great research libraries do – make it possible to discover knowledge. …

…Let me assure you, we have a deep respect for intellectual property – it is our number one product. That respect extends to the dark archive and protecting your copyrights.

We know there are limits on access to works covered by copyright. If, and when, we pursue those uses, we will be conservative and we will follow the law. And we will protect all copyrighted materials, your work, in that archive.

….

“We have to remember,” President Angell said, “that the library is the great central power in the instruction given in the University, and that the books are here not to be locked up and kept away from readers, but to be placed at their disposal with the utmost freedom…”

Be placed at their disposal with the utmost freedom. That’s what the technology of Google Book Search does with our books. …

…I was particularly struck by one Ford official’s assessment of the absolute need for transformation: “Change or die,” he said. Change or die. …

…At its essence, the digitization project is about the public good.

It transcends debates about snippets, and copyright, and who owns what when, and rises to the very ideal of a university – particularly a great public university like Michigan.

This project is about the social good of promoting and sharing knowledge. As a university, we have no other choice but to do this project.

Noble: yes, of course. Making all books anywhere available to whoever might connect to the ’net is certainly a noble endeavor.

The issue isn’t about making money. It’s about making as much knowledge available as possible to as many people as possible. If one happens to make a lot of money doing it, then consider it one of those rare instances where lucre honors nobility.

Legal — of course! The copying and snippet-providing won’t and can’t replace the original works. Google isn’t providing the entire book, and isn’t selling a copy of the entire work. Your suggestion that they are is just a specious rhetorical ploy used in a cowardly manner.

Ethical?

The sharing and expansion of knowledge through philosophical and scientific pursuit is one of the most ethically unchallengeable positions in human society. There never has been a need for anyone to ask such permission, nor should there ever be.

Once again — they’re not selling the books, nor reselling them; Google’s providing a web-based catalog service that will allow anyone to *find* those books.

The AAP is merely bitter at Google for so openly besting them in the execution of what they claim as their responsibility.

“Once again — they’re not selling the books, nor reselling them; Google’s providing a web-based catalog service that will allow anyone to *find* those books.”

Are you kidding me? They will certainly use your activity as input for profiling and, ultimately, for direct marketing, and without the books to offer they can’t make as much money. So the ethical, noble, and legal thing would be to give a cut to those who generate the content!

The point I am about to make has been made before, by others around the web, but it seems like most people don’t quite get it. So I’ll repeat it here.

Once you give Google the right to make a full copy of the book, for its own index, then you pretty much give everyone else in the country that same exact right.

Back in the early days of information retrieval (1960s and 70s), one of the popular techniques for search was manual indexing/annotation of documents. In other words, what we today would call “tagging”.

So, let’s suppose I wanted to provide a web service in which people could search an index of all the books that I had read, much in the same way other services let you share your music playlists. Well, in order to create this service, three things need to happen: (1) I need a copy of the book, (2) I need to analyze the contents of the book, and (3) I need to manually assign a couple of index terms to the book.

So, since Google’s “book copying in full” program is purportedly now legal, (1) I simply go out to my local library and scan/copy the whole book, in its entirety. Then (2) I analyze the contents of the book. I.e. I read it. Remember, this is manual indexing, a search/IR technique that dates back 40 years or more. This is not some capricious new approach. Finally, (3) I write 2-3 sentences about the book, and post those on my web page.

Now, what I’ve effectively done is created a legal situation where I pretty much can go out and copy any book from any library or person, and read it in its entirety, without having to buy or otherwise pay for it. So long as I don’t distribute those copies, it’s good.

Can’t anyone see that this is the Pandora’s box that Google is opening up?

Are these “e-books” going to be easily downloadable (in one fell swoop)? I would doubt it. It’s possible to allow internet access to an electronic journal one page at a time, which makes copying it more expensive (in time and effort) than simply buying the REAL book. I don’t know the details of how they plan to offer the material, but I think it’s possible to make it prohibitively expensive for abusers to copy the copyrighted material. Ever used (or heard about) Safari? This is how they do it.

JG: “Now, what I’ve effectively done is created a legal situation where I pretty much can go out and copy any book from any library or person, and read it in its entirety, without having to buy or otherwise pay for it. So long as I don’t distribute those copies, it’s good.”

You can already, today, go to a library and read a whole book without paying for it (or a bookstore for that matter if you’re a speed reader). I use the library quite often and haven’t yet been asked to pay for any of the books I’ve “stolen” in this fashion. Am I missing something?

“Are these ‘e-books’ going to be easily downloadable (in one fell swoop)?”

Nope. For books in copyright, it’s just gonna be snippets unless the publishers give permission to show more (in which case, you might get to see a couple chapters, but probably no more than that). For books in the public domain, you will actually get to page through the full text on Google Book Search, but I think it’s yet to be determined whether there’ll be easy full text downloads.

Wait. She said “copyright law allows us the fair use of millions of books that are being digitized.”

Uh, but it’s Google that is claiming fair use here, not the University of Michigan. It’s Google doing the copying and presenting useless “snippets” through its mystical and incomprehensible ranking system. It’s Google capturing the text behind DRM.

Why is she talking about her (and Michigan’s) fair use rights?

Besides, what the Michigan library was doing before was not as much a Sec. 107 (fair use) thing as a Sec. 108 (library privilege) thing.

The ethics and nobility thing would ring true if the University of Michigan were actually assuming responsibility for this project. It is not. This is a disingenuous speech.

There you go: O’Reilly pays the authors and publishers! Now that is a model that protects the IP!

“Nope. For books in copyright, it’s just gonna be snippets unless the publishers give permission to show more (in which case, you might get to see a couple chapters, but probably no more than that).”

Ever heard of BitTorrent and a nice hash function? If a book is 300 pages and every person can download 5 pages, it takes only 60 people sharing their 5 pages, plus a pretty minimal Python/Ruby script, to join the whole book back together… There is no way around that argument.

(1) With your library example, you check out that book, and then you return it. Under the new GPL (Google Purloiner Licence), I would be able to check the book out, copy it, and retain that copy indefinitely. I would be able to read.. eh.. ahem.. I mean.. “manually index”.. that book over and over again, any time I wanted, without having to go to the library again to check it out.

(2) With your library example, there can only be one user of the book at a time. If I’ve already checked the book out, you cannot check it out, too, until I return it. This limits the number of simultaneous readers to 1. Under this new Google “Fair Use” scheme, I borrow the book from the library and make my copy. Then you borrow it and make your copy. Then Battelle borrows it and makes his copy. So a week from Thursday, we can all be reading (ahem, “manually indexing”) it at the same time. That is a very different way of using the book than the one that existed when the publishers sold it to the library in the first place.

(3) If you allow multiple simultaneous copies of an object to exist (i.e. if you allow both Google and Yahoo to go to Michigan and scan the same book from the same library), then it might not matter whether Google and/or Yahoo distribute the copies. All you need to do is make sure the original gets wide enough circulation, and everyone can basically copy (for the purposes of “manual indexing”, of course) anything they want. All it takes is one person to own the original, and all of that person’s friends can now legally copy the original directly from him.

And because you can trade and resell physical objects, I envision collectives popping up whereby massive numbers of books are rapidly traded, until everyone gets copies of everything they want.

Heck, why stop with books? Why not music? This Google scanning rationale applies just as much to CDs as it does to books, no? Don’t we want to preserve our musical cultural heritage, too? So let’s all start swapping CDs, ripping our own copies (for the purposes of “manual indexing”, of course), and then passing the CD along to the next person. Legalized Napster, because (1) we’re only swapping physical objects, not intellectual property, and (2) each person who comes in contact with an original physical object has the right to “scan” (rip/copy) it for himself, in order to “index” it.

Yes, to a certain extent, people already do copy each other’s CDs. But it’s quite gray and very small scale. What we’re talking about now with this new Google “Making a Self Copy if I’m Going to Index It” Licence is legalizing this on a massive scale. I see whole new companies popping up to facilitate the massive trade in original physical media objects.

I could go on and on about this, but my posts tend to be too verbose as it is. Apologies. Do you see how it is different, though?

“Every heard of bittorrent and a nice hash function! -> So if a book is 300 pages, and every person can download 5 pages, it requires only 60 people to share their 5 pages, and with a pretty minimal python/ruby script, and the whole book can be joined together… There is no way around that arguement..”

Well, sort of. Obviously just about anything is hackable. But there are a few problems with your formulation, as far as I can see. For one thing, it’s not really the case that “every person can download 5 pages” — every person can view a number of pages designated by the agreement with the publisher (for a book I just tried, Siva’s Copyrights and Copywrongs, it let me see three pages). But it doesn’t let you select and copy the text or save the page image. So, though I am certainly no hacker (though I am quite aware of the capabilities of BitTorrent and hash functions), I don’t see any really easy way of getting the pages off of Google and onto my hard drive for later hosting. I mean, sure, you could do screen grabs, but how ugly would that be?

Furthermore, none of this even applies to the copyrighted books for which the publishers haven’t given any permissions beyond copyright; for those, you can’t even see a whole page. Personally, I would think that digitally reassembling a book from hundreds or thousands of one- or two-line snippets would be a seriously tedious and annoying task, especially when you (and anyone else who might like to download it) could probably just check the whole darn thing out from a local library.

I guess my point is essentially this: obviously there are going to be people who try to hack in and reassemble the books indexed by Google Book Search, but frankly, it seems like doing that would probably be a hell of a lot more expensive, difficult, and tedious than just checking the book out of the library and going at it with a scanner (though obviously that option would be no more legal than the hack job). So my question on this issue would be: why would anyone bother?

I’ve made a plugin (or mashup) that takes the Google Book Search results and links them to holdings in your local library’s catalog, so you can use the search tool as an index and click through to get the book. I did it for the Ann Arbor District Library (that’s my public library), and it’s already been adapted to a library in Mexico. See http://www.superpatron.com for details.

“So if a book is 300 pages, and every person can download 5 pages, it requires only 60 people to share their 5 pages”

For the ‘in copyright’ books that I’ve accessed via Google Book Search most of those 5 pages would be the same 5 pages for everyone. Often publishers only allow a portion of the book to be made available, sometimes just the covers and the contents and index pages, not the whole book.
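The disagreement between the 60-reader estimate and the observation that most readers see the same pages comes down to how much the readers’ page sets overlap. Here is a minimal Python sketch, assuming (hypothetically) that pages are identified by number and that each reader sees exactly 5 pages of a 300-page book; the function and variable names are illustrative only. It compares the optimistic case, where every reader is shown a different window, with a fixed preview where the publisher exposes the same pages to everyone.

```python
def pages_covered(page_sets):
    """Return how many distinct pages a group of readers could see in total."""
    covered = set()
    for pages in page_sets:
        covered |= set(pages)  # union of everything seen so far
    return len(covered)

BOOK_LENGTH = 300      # pages, the figure used in the comment above
PAGES_PER_READER = 5
READERS = 60

# Optimistic assumption: reader i is shown a distinct window of 5 pages.
disjoint = [range(i * PAGES_PER_READER, (i + 1) * PAGES_PER_READER)
            for i in range(READERS)]

# Fixed preview: the publisher exposes the same 5 pages to every reader.
fixed = [range(0, PAGES_PER_READER) for _ in range(READERS)]

print(pages_covered(disjoint), "of", BOOK_LENGTH)  # 300 of 300
print(pages_covered(fixed), "of", BOOK_LENGTH)     # 5 of 300
```

Under the fixed-preview assumption, adding more readers never raises coverage above the preview size, so the number of cooperating readers stops mattering.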

I think that publishers and authors would do a lot better leveraging Google Book Search as a marketing opportunity rather than trying to stop it. Something we’ve learned in the last decade or so is that if you try to ban something on the internet that’s potentially profitable then all you will probably do is force it to somewhere in the physical world that’s out of your control.

I just returned from the AAP/PSP conference and the most interesting thing that I heard was not Dr. Coleman’s speech, but an answer to a question she was asked.

Q: Some publishers have asked Google if they would give the copyright holders a digital copy of their content in exchange for a license to make those copies. Google has yet to respond to those requests. Would University of Michigan consider giving the copyright holders a copy of their own content?

A: No. Our contract with Google prevents us from doing that.

I think what she’s talking about is section 4.4.1 of the contract which prevents the University from giving a third party a copy of the files. There is nothing else in the contract preventing the University from giving copies to the copyright holders. It’s interesting though that the owners of the intellectual property are being interpreted as third parties. When I first read that section I thought it was a protection for publishers, preventing the University from distributing copies. Now I’m concerned it’s Google’s way to protect its monetizing of that content.

Ask yourself why Google wouldn’t want the copyright owners to have a copy of their own content? I estimate that of the in copyright content that Google will be digitizing, at least 60% of that content will be getting digitized for the first time by this project, probably more. Most publishers didn’t start creating content digitally until the 90s and most have only begun to digitize their older content. Could it be that Google wants to prevent the copyright holders from competing with them in the eventual selling of that content online?

After reading the story linked above and other public relations material from Google I’m convinced that’s what’s happening. Google wants a monopoly on the distribution of that content online, and unfortunately The University of Michigan is aiding them in that pursuit.

Google will have a digitized version of the books it has scanned, as well as the OCR’d text of the books. If another company is interested in using this data, can Google legally sell it to them? It certainly would make sense to me; there’s no sense in re-scanning and re-OCRing every book, right?

Let’s look 10 years down the line, when we have multi-terabyte storage in our pockets. Can Google sell a copy of all the book pages it has scanned to consumers, as long as you promise to only make “fair use” of it? Or, if there’s some disgruntled engineer there who makes a copy of the entire data set of all books and puts it out there on bittorrent-of-the-future, what then?

Through the age of mechanical reproduction, the arts survived, but how will they fare in an age of massive digital reproduction?

If Google’s doing this for the benefit of mankind, shouldn’t they share in the scanning effort of the Open Content Alliance? Should they allow other companies to use the same scanned data?

I’m part of a team of students at the University of Westminster in London who have produced a ‘dummy’ magazine about social networking. In this, we used the picture of a book included in your blog post above.

As we’re submitting the magazine for a student award, we need to get copyright approval for all pictures. Could you let me know if you are happy for us to use your image – the magazine will not be sold or used for commercial purposes.