Google’s university book scanning can move ahead without authors’ OK

Court gives Google significant fair use protections.

A federal appeals court on Tuesday upheld the right of universities, in conjunction with Google, to scan millions of library books without the authors' permission.

The 2nd US Circuit Court of Appeals, ruling in a case brought by the Authors Guild and other writers' groups, argued that the universities were not breaching federal copyright law, because the institutions were protected by the so-called "fair use" doctrine. More than 73 percent of the volumes were copyrighted.

The guild accused 13 universities in all of copyright infringement for reproducing more than 10 million works without permission and including them in what is called the HathiTrust Digital Library (HDL) available at 80 universities. The institutions named in the case include the University of California, Cornell University, Indiana University, and the University of Michigan.

Those with "certified print disabilities" like the blind may access the complete scanned works, which the New York-based appeals court also found are preserved indefinitely because of their digital reproduction. Those without disabilities may only search keywords in the books unless an author grants greater permission.

"We have no reason to think that these copies are excessive or unreasonable in relation to the purposes identified by the Libraries and permitted by the law of copyright. In sum, even viewing the evidence in the light most favorable to the Authors, the record demonstrates that these copies are reasonably necessary to facilitate the services HDL provides to the public and to mitigate the risk of disaster or data loss," the court wrote (PDF).

The fair use doctrine is a defense to copyright infringement and generally may be asserted for reasons such as scholarship and research, teaching, news reporting, commentary, parody, and criticism.

According to the appeals court:

It is not disputed that, in order to perform a full‐text search of books, the Libraries must first create digital copies of the entire books. Importantly, as we have seen, the HDL does not allow users to view any portion of the books they are searching. Consequently, in providing this service, the HDL does not add into circulation any new, human‐readable copies of any books. Instead, the HDL simply permits users to 'word search'—that is, to locate where specific words or phrases appear in the digitized books. Applying the relevant factors, we conclude that this use is a fair use.

The court added that making volumes available in their entirety to the disabled "is an example of fair use."

Daniel Goldstein, who argued the case on behalf of the disabled, said the decision "changed for the better the lives of print-disabled Americans, that is, those who cannot readily access printed text, whether because of blindness, arthritis, dyslexia, cerebral palsy, upper spinal cord injury, or a host of other conditions."

The guild did not immediately comment on the decision, which largely affirms a 2012 lower court ruling.

Gah, I have a lot of sympathy with authors in the modern world, but stuff like this doesn't help their cause any. Remember authors, and authors groups: every time you try to over-reach on copyright protection "just because we can," you lose. Let's look at it this way: do you want to end up as hated as the RIAA? Just keep it up.

++ With my own first book coming out later this year (yay!), I just didn't get what the Authors Guild thought they were doing. Even if you're right, it's kind of important not to show your ass. Here, ass was shown, right was not had.

So many of these books are now out of print, and the last of their kind. They are disappearing very quickly. I know a few local books in my own county library system of this nature I wish could be preserved like this before they disappear. Copyright be damned. If you're not actively printing and/or selling it within 10 years of original publication, it should be able to be reproduced by someone else.

You know, if at the beginning of this they'd said "Well, OK, but since there's increased circulation vis-a-vis disabled people's access, we want a licensing fee of $5/book/year", they'd've probably gotten it. Or a per-year fee for each user of $100 or whatever.

But instead they decided to be asshats about it, and came out of it with some legal fees.

This case will be appealed and reversed. The legal analysis regarding the third factor in fair use - whether the secondary user employs more than necessary and whether the copying is excessive - is simply wrong, shockingly so. I'm surprised three very respected appeals court judges signed off on it. Their opinion says that, under their past precedent, copying the entire copyrighted work may in some instances be allowed under fair use, and they cite the Bill Graham Archives case from 2006 in support of that proposition.

Quote:

The crux of the inquiry is whether “no more was taken than necessary.” Id. at 589. For some purposes, it may be necessary to copy the entire copyrighted work, in which case Factor Three does not weigh against a finding of fair use. See Bill Graham Archives, 448 F.3d at 613 (entire image copied); Arriba Soft, 336 F.3d at 821 (“If Arriba only copied part of the image, it would be more difficult to identify it, thereby reducing the usefulness of the visual search engine.”).

But that interpretation of Bill Graham completely turns the conclusion of that decision on its head. The Bill Graham decision said exactly the OPPOSITE of what this appeals court says it did. If you read the case, you see it involved the use of 7 images from a 480-page book of illustrations. The court in Bill Graham went out of its way to emphasize how little of the book was used:

Quote:

Third, BGA's images constitute an inconsequential portion of Illustrated Trip. The extent to which unlicensed material is used in the challenged work can be a factor in determining whether a biographer's use of original materials has been sufficiently transformative to constitute fair use. See Craft v. Kobler, 667 F.Supp. 120, 129 (S.D.N.Y.1987) (Leval, J.) (finding biography of Stravinsky to be unfair in part because the takings were numerous and were the "liveliest and most entertaining part of the biography"). Although our circuit has counseled against considering the percentage the allegedly infringing work comprises of the copyrighted work in conducting third-factor fair use analysis, see NXIVM Corp. v. Ross Inst., 364 F.3d 471, 480 (2d Cir.2004), several courts have done so, see, e.g., Harper, 471 U.S. at 565-66, 105 S.Ct. 2218 (finding the fact that quotes from President Ford's unpublished memoirs played a central role in the allegedly infringing magazine article, constituting 13% of that article, weighed against a finding of fair use); Salinger, 811 F.2d at 98-99 (finding the fact that letters are quoted or paraphrased on approximately 40% of the book's 192 pages weighs against fair use). We find this inquiry more relevant in the context of first-factor fair use analysis.

In the instant case, the book is 480 pages long, while the BGA images appear on only seven pages. Although the original posters range in size from 13" x 19" to more than 19" x 27," the largest reproduction of a BGA image in Illustrated Trip is less than 3" x 4½," less than 1/20 the size of the original. And no BGA image takes up more than one-eighth of a page in a book or is given more prominence than any other image on the page. In total, the images account for less than one-fifth of one percent of the book. This stands in stark contrast to the wholesale takings in cases such as those described above, and we are aware of no case where such an insignificant taking was found to be an unfair use of original materials.

In fact, in that last sentence, the Bill Graham decision explicitly distinguishes the allowable fair use of 7 pages versus the "wholesale takings" in other copyright cases - yet the Google panel cites this case to allow the copying of entire books! Simply wrong legal analysis. This panel was fishing for the result it wanted and it twisted its own case precedent to do so. It will be appealed and reversed.

Indeed, we should never lose sight of the fact that the one explicit purpose of copyright is to encourage the production and dissemination of art, with the means being to give copyright holders temporary government-sponsored monopolies. Thus the means, i.e. the temporary government-sponsored monopolies, should not get in the way of achieving the purpose, i.e. getting more art consumed by more people. Thus there are no ethical issues with reproducing a piece of work if the copyright holder of the work cannot be found or does not exist anymore or simply refuses to sell. There SHOULDN'T be legal issues either, but, well, that's clearly not the case.

I was thinking the same thing and the reference is too good to not call to everyone's attention. Technology marching forward at the cost of the past (not that that's happening here)... from the book Rainbows End by the brilliant Vernor Vinge.

Tiny flecks of white floated and swirled in the column of light. Snowflakes? But one landed on his hand: a fleck of paper.

And now the ripping buzz of the saw was still louder, and there was also the sound of a giant vacuum cleaner...

Brrrap! A tree shredder!

Ahead of him, everything was empty bookcases, skeletons. Robert went to the end of the aisle and walked toward the noise. The air was a fog of floating paper dust. In the fourth aisle, the space between the bookcases was filled with a pulsing fabric tube. The monster worm was brightly lit from within. At the other end, almost twenty feet away, was the worm's maw - the source of the noise... The raging maw was a "Navicloud custom debinder." The fabric tunnel that stretched out behind it was a "camera tunnel..." The shredded fragments of books and magazines flew down the tunnel like leaves in a tornado, twisting and tumbling. The inside of the fabric was stitched with thousands of tiny cameras. The shreds were being photographed again and again, from every angle and orientation, till finally the torn leaves dropped into a bin just in front of Robert.

(edit: noted that I'm not against them scanning the books - although it would be nice if the original authors are compensated in a beneficial way)

I heard an estimate (not verified) that we may be losing upwards of ten thousand out of print books per year.

There is actually an ethical issue of sorts, in that the aim of copyright is to provide incentives (ie financial incentives) for the production and dissemination of art. If you allow people to copy and distribute the art for free - as these libraries are doing - then there is less incentive to produce works of art and therefore fewer will be produced in future.

I'd have no ethical issue here if this were purely a preservation project (ie the scan could only be consulted on-site, and only one person could consult it at a time), rather than a means for Google to make millions of dollars in advertising revenue on copyrighted books and not pay a single penny of that to the authors of the books.

Those incentives break down in the cases I mentioned: copyright holder is dead, or copyright holder can't be found, or copyright holder doesn't care to make money by selling the work of art. Freely distributing art in those cases CAN'T lower the incentive to produce, because there is already no incentive.

How many of you RTFA? Yes, these universities scanned entire books. And yes, some of the books they scanned are under copyright. BUT if the book is copyrighted and you're not blind, you can't read it!

But you can search it, though.

And you know what happens then? Some of the people who discover that a certain copyrighted book talks about Subject X, will then go and buy the book. And then that author will make money without having to do extra work.

In any case, copyright is not a property right. The rights of the reading public are supposed to be balanced with the rights of the author. And in this case, they are.

Unless someone wants to complain that now blind people are "stealing" copyrighted books. If you think that, I'm not going to bother arguing with you.

A tiny aside: it mentions dyslexia as one of the vision impairments helped by this. I am curious how text can be, and is, manipulated to make it easier for dyslexics.

Except it makes complete sense. For people who can use the actual books, the digital version tells them no more than which book contains the information they're looking for and where in it can be found. For people who cannot use the physical book the digital version gives them the ability (one that is commonly not being provided by the copyright holder) to use the book. The latter has explicitly been upheld as fair use.

Further, it becomes another copy of the book, one maintained by institutions that have a sense of duty to maintain it and that can potentially be relied upon to exist when the copyright for the book expires. A copy that would not itself be encumbered by "new" copyright (as happens with, for example, Shakespeare's works when they're republished, because the layout of the works is considered copyrightable).

Even if these institutions decided to try and assert a copyright on the copy of the book it would be dateable back to this decade (or so) and as such would expire much quicker than if the copy were made 50-100 years from now.

In short: the market for the books is not being impacted (negatively or otherwise). Accessibility that many publishers don't feel is a large enough market to cater to is being provided. The information within the books is much more discoverable than it would be otherwise. Which, since they're not providing text of the books, helps drive demand for the books that the information being sought can be found in up. Finally, the books are being preserved now in a format, and by an institution with leaning toward preservation as a duty, for all time.

How is that not good and fair use? The authors will get paid when their book is found by someone searching for a term or concept found in their book and that person either accesses it via the library or purchases it. If anything doing this across all works can only help to drive the aggregate value of works up.

Audio poses no problem for dyslexics.

Conversion to alternate formats is the means of making books accessible to those unable to read normal print books. Alternate formats include audio, braille, large print and ebooks for specialized ereaders. There are additional alternate formats, but these cover most needs.

There has been a free library for the blind operating for decades now that offers books on 16rpm vinyl records (nowadays they use better-quality 4-track tape). Dyslexics are also able to use the service.

When told that he would be reading the introductions that were written for his books...

Quote:

You can imagine my delight when I received an email saying my books had been approved for recording. I was doubly delighted when I was asked to record an introduction for each book....It's true that I am the author of the material, but a ghostwriter wrote the books, and I read on a fifth-grade level! I then found myself having a slight out-of-body experience. I heard myself replying, "Oh! I can't do that. I can't read."

Maybe if they were actually curating it you could make that case, but they are not. They allow huge mistakes in basic catalogue information, which tends to help them and hurt the copyright holder.

While I have no doubt it will be appealed, the ruling looks like it was unanimous and I don't think there is a circuit split on the issue. So what is the likelihood of the 2nd Circuit rehearing en banc or the Supreme Court picking it up?

If authors did what they could do by using the potential of the internet, they would make money by publishing their books digitally and not going through rip-off middleman publishers first. I can imagine digital platforms that let you read any book online, paid for by on-line advertising (or without advertising if the author wishes). More popular books might have more advertising, but if it's too much then people would stop reading---or a smaller number of advertisers in that space would pay more money to get access to that larger audience. Supply and demand at work.

As it stands now, a lot of authors get literally nothing for their works which the publishing houses sell for large profit, and authors can lose control completely of any copyright. I suspect this "Authors Guild" is really a front for the "Publisher's Guild."

Seriously. I assume people saw "Google’s university book scanning can move ahead" and stopped reading there? All this allows is text search. It has absolutely nothing to do with distributing ebooks or selling them. If you want the book, you still have to find someone who sells it.

Incidentally, that's why this still doesn't save orphaned works. The good news is that they have a scan of it for when US copyrights start expiring, if they ever do expire, but in the meantime, no one can access them. At least now you can search them, just good luck finding anyone who will be able to sell you a copy.

Indeed, we should never lose sight of the fact that the one explicit purpose of copyright is to encourage the production and dissemination of art, with the means being to give copyright holders temporary government-sponsored monopolies.

This *was* the justification for copyright. It is no longer the justification.

Today copyright exists to create jobs and strengthen the economy.

Copyright terms are now so long they might as well be indefinite. They are no longer temporary.

As a copyright holder, my works are my property until 70 years after my death. In other words, forever from my perspective.

Fair point. I should have said it will get appealed and SHOULD get reversed, at least the portion that relates to the third factor of the fair use test. What might persuade an en banc panel to take a second look at this decision is that the opinion cites a second decision, a 9th Circuit ruling called Arriba that involved an image search engine making wholesale copies of photographs. That decision is actually much more on point and it does support the Authors Guild panel's reasoning. The problem is that Arriba is a decision from another circuit, and internal circuit rules say decisions from the same circuit take precedence over decisions from sister circuits. I think the plaintiffs can make a good argument that the holding in the Bill Graham case, as well as other 2nd Circuit decisions related to fair use and wholesale copying of copyrighted works, means that the 2nd Circuit doesn't follow the test formulated in Arriba and that the panel erred in following Arriba. Of course, defendants will argue that Arriba doesn't conflict with 2nd Circuit decisions and it was proper for the Authors Guild opinion to rely on it. It'll be up to the full circuit to decide.

The number of down-votes here is rather disappointing. I believe we have a learned and capable commentator being 'dissed' simply because his/her considered opinion is unpopular and does not ride the tide of popular jubilation. That I personally welcome such an outcome is irrelevant. My S. O. who is a retired attorney was equally nonplussed and I had to sit through another tutorial. S. O. comes down on the side of will be, BTW.

David Kravets / The senior editor for Ars Technica. Founder of TYDN fake news site. Technologist. Political scientist. Humorist. Dad of two boys. Been doing journalism for so long I remember manual typewriters with real paper.