When the Invisible is Made Visible: Oxford University Press on the Google Book Settlement

On July 29th, OUP USA President Tim Barton published a detailed overview and position statement regarding the AAP/Authors Guild Settlement with Google which can be found here (behind a paywall). The full text of his article appears below.

In September 2009 Tim Barton and OUP USA General Counsel Barbara Cohen contributed to a Q & A on the Stanford Libraries’ Copyright and Fair Use website. You can visit the interview here or read it at the very bottom of this page.

At a focus group in Oxford University Press’s offices in New York last month, we heard that in a recent essay assignment for a Columbia University classics class, 70 percent of the undergraduates had cited a book published in 1900, even though it had not been on any reading list and had long been overlooked in the world of classics scholarship. Why so many of the students had suddenly discovered a 109-year-old work and dragged it out of obscurity in preference to the excellent modern works on their reading lists is simple: The full text of the 1900 work is online, available on Google Book Search; the modern works are not.

In describing books, the Scottish-American classicist Gilbert Arthur Highet once wrote, “These are not lumps of lifeless paper, but minds alive on the shelves.” In a world in which students consult not shelves but keyboards, too many of those lively minds remain out of sight, exiled to those same shelves, where, every year, there is a virtual fire not unlike the one at the ancient library at Alexandria, as last copies of precious books crumble slowly to dust, or are damaged, stolen, or lost.

What once seemed at least debatable has now become irrefutable: If it’s not online, it’s invisible. While increasing numbers of long-out-of-date public domain books are now fully and freely available to anyone with a browser, the vast majority of the scholarship published in book form over the last 80 years is today largely overlooked by students, who limit their research to what can be discovered online.

For most books published in the last ten years or so, the picture is more heartening: University libraries provide students and scholars with access to a fair number of those works via services purchased directly from publishers and aggregators. Excerpts can often be viewed online for free (but only as much as is allowed by publishers, with an eye on generating sales). And many are available as e-books. Nonetheless, the vast majority of the scholarship published since 1923 (the date before which titles are in the public domain in the United States) is now effectively out of reach to the modern student.

As one of the world’s most prolific scholarly publishers, Oxford views the reactivation of publications long sidelined by the restrictions of a print-only existence as a core expression of its mission, and as the responsibility of all publishers sharing that mission. Five years ago, we published a complete archive of our journals, enabling access to four million pages spanning a century and a half of scholarship, and we recently began a project to extend that archive to include tens of thousands of our out-of-print books.

In doing so, we immediately found ourselves confronted with myriad issues of ever-mounting complexity and difficulty. Should we engage in destructive scanning, which destroys the original but yields better results less expensively, or nondestructive scanning, which is more expensive and less effective but spares the book? How should we best clear and clarify the rights, since older contracts understandably do not mention electronic rights? What should we do about the copyrighted materials from other sources that many of the books contain (a single edited volume can include the intellectual property of dozens of chapter contributors, the volume editor, series editors, and third parties whose work is featured in the form of photographs, tables, graphs, poetry, etc.)? What level and type of functionality and metadata (behind the scenes information about the content) is appropriate for such a product?

As publishers were grappling with those sorts of questions, so too was Google. Since 2004 Google has been scanning the works found in some of the world’s best and largest scholarly libraries. Google’s stated plan was to allow “snippet” views of in-copyright works, which it claimed constituted “fair use.” In the eyes of many authors, agents, and publishers, however, Google was doing so illegally. They complained vociferously, eventually launching two lawsuits, one a class action. Four years on, the parties to those lawsuits (the Association of American Publishers, the Authors Guild, and Google) have proposed a settlement. Its fate, and the fate of the ten to twenty million titles that Google is rumored to be scanning, will be decided by publishers, authors, and other “rightsholders,” who have until September 4, 2009 to decide whether or not to be part of the settlement, and by U.S. District Court Judge Denny Chin, following a hearing on October 7. The Justice Department has indicated that it is looking into whether the deal violates antitrust laws, and no one knows what this bodes.

It has taken many months for the import of the settlement to become clear. It is exceedingly complex, and its design—the result of more than two years of negotiations including not just the parties but libraries as well—is, not surprisingly, imperfect and can and should be improved. But after long months of grappling with it, what has become clear is that it is a remarkable and remarkably ambitious achievement.

It provides a means whereby those lost books of the last century can be brought back to life, and made searchable, discoverable, and citable. That aim aligns seamlessly with the aims of a university press. It is good for readers, authors, and publishers—and, yes, for Google. If it succeeds, readers will gain access to an unprecedented amount of previously lost material, publishers will get to disseminate their work—and earn a return from their past investments—and authors will find new readers (and royalties). If it fails, the majority of lost books will be unlikely ever to see the light of day, which would constitute an enormous set-back for scholarly communication and education.

The settlement is a step forward in solving the problem of “orphan works,” titles that are in copyright but whose copyright holders are elusive, meaning that no rights holder can be found to grant permission for a title’s use. For such books, a professor cannot include a chapter in a course pack for students; a publisher cannot include an excerpt in an anthology; and no one can offer a print or an electronic copy for sale. Making those books available again is a clear public good. Google having exclusive rights to use them, as enshrined in the current settlement, however, is not. If the parties to the settlement cannot themselves solve this major problem, then at a minimum Congress should pass orphan-works legislation that gives others the same rights as Google—an essential step if Google is not to gain an unfair advantage. Despite significant advocacy, Congress has failed to legislate on this issue for 20 years; we hope the specter of Google having exclusive rights to use orphan works will spur heightened public debate and Congress to immediate action.

The majority of the lost and invisible titles of post-1923 scholarship, however, are in that state not because their copyright holders are unknown. They remain in limbo because of the enormous practical obstacles involved in bringing them back to life.

Given our own digitization project, Oxford may know more about the difficulties involved in rescuing such lost publications than most publishers. And Oxford is also better suited to undertake such a project than most: We have a full-time archivist who oversees an out-of-print library that has been in place for the last century, as well as a lot of experience in online academic publishing, with products ranging from the Oxford English Dictionary, online since 2000, to Oxford Scholarship Online, a service that allows Oxford to publish its front list in 16 disciplines more or less simultaneously in print and online. Even so, the task of tackling our long-out-of-print list has proved both formidable and daunting.

It is therefore not at all surprising that most publishers with smaller backlists have found it more fruitful to invest in new publishing, rather than in attempts to revive their older, inactive titles. At Oxford, our efforts flowed naturally from our mission: publishing works that further Oxford University’s objective of excellence in research, scholarship, and education. The returns to be made were highly uncertain, and we were unsure whether the revenue would repay the effort and expense.

This is the “Good Book Settlement,” rather than the “Google Settlement.” It is not solely Google’s settlement, but is equally or more so the authors’, readers’, and publishers’ settlement. It promises revenue streams for Google, for sure, although it’s possible that one can exaggerate the money to be made from older backlist titles, and the split of the revenues between the parties is helpfully enshrined in the document. The pricing mechanisms and principles should ensure a reasonable approach, with the establishment of an independent Book Rights Registry, via which author and publisher representatives will set prices for the database of older titles, as well as decide about future business models.

Google’s core business is not e-book and database retailing, and it may be a reluctant entrant into this arena, having frequently stressed that it is not in the business of creating content. So why is Google willing to make a rumored $200 million investment in scanning and to tackle the practical issues involved in restoring to life so many books, when most publishers have eschewed that opportunity? Perhaps it is that Google is playing for advertising trillions rather than publishing billions. Investments that those seeking a return from publishing could not make are more understandable when potential global-advertising revenue streams are at stake. We should note too that, in extending its business model in this way, Google provides authors, publishers, and readers with another important route to market.

For those not inclined to pay to access copyright material, Google will, per its original plan, serve up “snippets” from titles in the settlement, or more, if rightsholders allow. The settlement will also permit anyone in a public or university library to have free and full access to the titles (albeit only at one computer terminal per library).

Some publishers will be unhappy about copyrighted material being made available for free: publishers have a good argument about the need to protect copyright to secure revenues to support future publishing. But the settlement is a compromise for everyone, and publishers need to bear in mind that it exhibits a decent respect for copyright. In the same way that Apple’s iTunes created an alternative to the copyright theft of peer-to-peer software, the agreement establishes a framework in which intellectual-property rights will be acknowledged and respected, rather than ignored.

The settlement also allows for a great deal of flexibility about the participation of copyright holders. They lose only the right to sue Google, should they participate in the settlement; they can choose not to take part in any of its programs. Indeed, many publishers who decide to be part of the settlement may choose other means of electronically publishing their front list and/or backlist. The institutional database product will likely end up resembling a Swiss cheese, with plenty of holes reflecting rightsholders’ decisions to republish their work in other ways.

First and foremost, the settlement is about discovery: a basic restoration of books to our literary landscape that enables readers to find what they once would have missed. The database is unlikely to offer the functionality to which modern researchers have become accustomed: The scanning quality may be poor, and the metadata and therefore searching rather basic. Here at Oxford, for example, we are looking at our backlist archive project, and trying to work out what the settlement means for us. Many publishers will have neither the mission nor the means to overcome the formidable obstacles involved in giving their print backlists an online life. But whether the lost scholarship is made available again through the settlement or through the activities of publishers, the means may differ but the end is the same. The settlement gets authors, readers, and publishers farther and faster than if we had been left solely to our own devices.

To be clear, as noted above, the settlement is certainly not perfect and the solution to dealing with orphan works is particularly problematic: Google should not have the exclusive ability to exploit these works, and further refinement is needed to ensure that the Book Rights Registry can license those titles to others besides Google. Yet it also seems more likely that orphan-works legislation will be forthcoming if the settlement goes ahead. And it is important that all of the participants to the settlement, and especially Google, should now publicly commit themselves to supporting the needed new legislation in meaningful ways. We may also find the orphan-works issue diminishing in scale over time, as rightsholders come forward, should the program be successful.

In any event, antitrust authorities must be both vigilant and responsive to any anticompetitive effects of the settlement and anticompetitive practices that may flow from it.

With the exception of orphan and any other unclaimed works, the Book Rights Registry will be able to license the scanned material to others, including Google’s competitors. Google is being unnecessarily cautious in restricting the registry from giving licensees better terms than the registry gives to Google. Google may consider the “most favored nations” clauses in the settlement not unreasonable, given the many millions it has spent on digitizing, while others who lost trust in Google when it announced its opt-out, ‘anti-copyright’ scanning project may see this as rewarding bad behavior. Such positioning aside, as the clear search engine of choice for the years that these clauses remain in force, Google does not need those provisions in order to protect its position and its investment. And it should trust the registry to do the right thing.

A lot depends on the Book Rights Registry, and there are justifiable concerns about the trust that the settlement places in that new institution. The choice of its first director, Michael Healy, inspires confidence, but will the publishers and authors on the registry be sufficiently knowledgeable to represent the range of publishing for which they are responsible? Also, some formal means of securing the advice of the library community in the continuing operations of the registry is important and would be welcome.

One can make a case for the settlement by imagining how bad things will be if it fails. The lawsuit could be abandoned. Or it will proceed, at great expense, and except for those on the extremes of the arguments, neither victory nor defeat is palatable. In both cases, the opportunity to bring back to life those rumored ten to twenty million titles is lost. Victory for publishers and authors would halt Google’s scanning and use of in-copyright material, but neither would readily want to sue libraries who now possess the scanned files. Victory for Google would leave millions of scanned files at large, and authors and publishers more uncertain about investing time and money in new publishing.

We cannot now predict all of the places where the settlement will take us, which should make us understandably cautious. But even as we debate the important issues surrounding it, we must not shirk our responsibility to take forward-thinking, tangible steps now—today—by conjuring perilous futures and retreating to the safety of inaction and paralysis.

The settlement raises other interesting challenges: The scholarly world is drowning in information already, so we will need better paths through all this newly rediscovered older material. But what an enviable problem to face.

So we at Oxford University Press support the settlement, even as we recognize its imperfections and want it made better. As Voltaire said, “Le mieux est l’ennemi du bien,” the perfect is the enemy of the good. Let us not waste an opportunity to create so much good. Let us work together to solve the imperfections of the settlement. Let us work together to give students, scholars, and readers access to the written wisdom of previous generations. Let us keep those minds alive.

Tim Barton is president of Oxford University Press, Inc. (OUP-USA).

Google Book Search Settlement: A Publisher’s Viewpoint with Tim Barton and Barbara Cohen

Minow: In that Oxford University Press is a publisher with a mission to expand access to academic knowledge, it has taken a particularly nuanced approach to the Google Books settlement proposal. Could you describe the issues that tipped you to supporting the settlement?

Barton: I would break this into two parts–substantively what inclined us to support the settlement; and what prompted us to voice our support publicly, at the point that we did so.

The settlement was not an easy document to understand, and it took a group of us, working for some months, to form a considered view of it. When we did understand it, what made it in the end straightforward for us to support the settlement was the almost unimaginable access that it will enable to millions of works that were lost to readers and scholars and which, without the settlement, were likely to remain so. We had been working on a project at OUP to bring our own out-of-print books back to life, and we were aware of the very considerable difficulties and costs involved in doing so. From these efforts at digitizing our backlist, we saw that only an entity such as Google would take on the risks and make the investments needed to bring these millions of books back to life. This is because Google wants to make its search engine as useful as possible, in order to secure advertising revenues, and so it can justify the major costs: publishers cannot make anything like the same level of return on selling their out-of-print backlist as Google can in securing revenues as a result of returning the best quality searches. At Oxford it was our mission–one of supporting and disseminating scholarship and education, rather than securing a commercial return–which was the primary driver in our plan to develop a backlist archive. Many other publishers do not have that same mission, and among those who do, few could afford the substantial investment of time and money required.

So the settlement’s promise of enabling students, scholars and readers access to these millions of works is the primary issue that tipped OUP into supporting the settlement.

Of course there are elements of the settlement about which we are less positive, and I think it would have been (and would still be) helpful if those who negotiated the settlement were to come out in advance of the court date with some changes to address sensible reservations which have been expressed.

We decided that we should publicly voice our support for a number of reasons, including what I view as poor branding of this settlement as “the Google settlement.” It is not surprising that the public has been especially cautious–skeptical even–in considering something that sounds as if it is just for the benefit of a company as powerful as Google. But this isn’t just Google’s settlement; Google is a party to the settlement, for sure, but it is equally a settlement which is in the interests of publishers, authors, libraries, and, I believe, the general public. We also felt that while the groups that had negotiated the settlement had done a remarkable job in negotiating it, they were falling short in explaining and promoting it. Those who had negotiated the deal didn’t seem to be coming forward to correct misunderstandings and support it. I can appreciate that, after having slogged through two and a half years of negotiation, they must have relished the prospect of putting it to the side even for a short while. But the vacuum created was filled by outspoken critics, some of whom seemed to have vested interests in scuttling the settlement. Underlying a growing chorus of criticism, we heard repeated misunderstandings about the settlement, as well as a visceral fear of something that seemed to be for Google. But, as I mention above, the settlement was negotiated by authors, publishers and libraries too, and it promises tangible and significant benefits for these groups as well.

I’m not one who eagerly sticks his head above the parapet, but I was quite concerned that, if people did not step forward to voice support for the settlement, it might fail. And that would serve no one except Google’s competitors. Hence, my article in The Chronicle of Higher Education.

Minow: You wrote that making orphan works “available again is a clear public good. Google’s having exclusive rights to use them, as enshrined in the current settlement, however, is not.” Is the Justice Dept right to be investigating antitrust issues?

Barton: I don’t profess to have special antitrust expertise, so I can’t say whether in fact there is a genuine antitrust issue here. But it certainly makes sense that the antitrust authorities should, as they are doing, at least explore whether there is any basis to the antitrust concerns that have been voiced.

That said, I don’t share the concerns that I’ve heard about how the institutional database’s subscription prices will be set. For one thing, the settlement contains many checks and balances, including:

-the overarching dual pricing objectives stated in the settlement (the realization of both market-rate revenue and broad public access to the books in the database)

-pricing which is based on “full time equivalents” (FTEs)

-and an arbitration provision if the parties can’t agree on pricing.

Also, price-gouging is unlikely because of the significant amount of free access enshrined in the settlement: the full database will be available for free to anyone who walks into a public library in the U.S. or who is associated with a U.S. higher education institution, and a significant portion of each work also will be available for free through so-called “Preview Use” to anyone in the U.S. with internet access. With so much free access to this database, what leverage might Google and/or publishers have to charge exorbitant prices? I also view the settlement as offering pro-competitive effects, since it provides authors, publishers and readers with another important route to market. Finally, the significant public debate and the continuing role of the court in overseeing the settlement are both antidotes to any misbehavior.

It is also noteworthy that Google will not be able to use its position to bully publishers into improving the split of revenues from selling works in the settlement: these are helpfully laid out in the document (although I am still uncertain about what the ongoing costs of running the Book Rights Registry are likely to be–an important question for a publisher, given that authors and publishers will pay for these ongoing costs out of their 63% share of the revenue split). So even if one concludes that the settlement gives Google monopoly power in relation to titles in the settlement, the settlement also ensures that Google isn’t able to exploit that position unfairly, whether in relation to readers, authors, librarians, or publishers.

I have also heard antitrust concerns about Google’s exclusive right to use the orphan works they’ve scanned. I do have concerns about orphan works and the settlement, though my concerns aren’t antitrust-related. (In fact, it may even be that, by going out first, Google has cleared a path for others who might be interested in digitizing and offering these works. The risks are certainly now more clear.) But an imperfection I see relating to orphan works is that, at least immediately following the settlement, Google alone has the ability to exploit orphan works, when even the original publishers of these works will share no such right. As I understand it, this is a byproduct of the fact that this is the settlement of a class action and can be addressed only by Congress passing orphan works legislation. I do think that Google should publicly commit itself to supporting this legislation. And I also think that Google should make its database of information about public domain and orphan works publicly available, so other would-be users of this material will have a better sense of the risks they might face.

I also think that Google should drop the “most favored nations” clause in the agreement.

Barbara Cohen: I agree, if only because the MFN’s meaning seems almost uniformly to be misunderstood. I keep reading concerns expressed by people who mistakenly read the MFN as broadly prohibiting the Book Rights Registry from giving any firm other than Google a better deal in any respect than Google has with respect to exploiting any of the books in the database. But in fact the MFN is an extremely narrow clause and is being misread. Only if, during the next 10 years, there is another class action and settlement involving a “significant amount” of the orphan works in the Google database could this clause be invoked. But, narrow though the MFN is, I agree with Tim that Google should eliminate it, if only to ease public concerns. The mere presence of this clause has been read by many as showing Google’s monopolistic desires and this has cast a long shadow. It would be a shame if fears based on a misunderstood clause came to overshadow the settlement’s remarkable potential to do good. If there are steps that Google and the other parties can take now to eliminate these concerns and ensure the settlement’s approval, I hope that they do so.

Minow: Critics say that the non-representativeness of the class is one ground on which it is possible to object to the proposed Book Search settlement. What do you think?

Cohen: My understanding is that the judge will accept or reject the settlement based on whether or not it’s deemed fair to the class, and it’s unclear to me whether the question of the representativeness of the class will weigh into this decision. (If the lawsuit had been litigated, then the representativeness of the class certainly would have been a key issue.)

Barton: As I think about how much choice rightsholders will have to decide whether and how their works may be used, I have trouble seeing what the risks might be in any case. And as I weigh the particular concerns of various rightsholders, my sense is that overall those at the negotiating table did a good job of addressing those concerns that can be addressed by the settlement.

Certainly as far as small publishers and university presses go, the settlement will be a boon (though I acknowledge that there are some university presses who hold a different view). Many small presses do not have the resources that would be needed for them to digitize their own backlists. But now they will have a means to accomplish that goal, via the settlement. Some presses will decide to develop their own digitization projects – there are quality issues relating to including titles in the settlement (scanning quality; limited metadata; issues about inserts and other material not included in the settlement or excluded from the settlement database; and so on). And those of us who wish to pursue our own backlist projects are free to do so–in addition to, or instead of, including our books in the Google database. Also, the settlement’s promise of disseminating research and scholarship fits squarely within a university press’s mission. University press representation on the Book Rights Registry would ensure that, over time, this group’s interests continue to be protected–a genuine concern, given that scholarly works will make up such a significant portion of the settlement database.

The interests of libraries, too, seem to have been well represented–no surprise, as they were involved in the Google Library Project from the start and were at the negotiating table. As the ALA and other library associations have recognized, the settlement’s potential to provide unprecedented public access to millions of books also advances the libraries’ core mission of providing patrons with access to information.

Authors’ interests generally are so well protected by the Author-Publisher procedures in the settlement that some worry that authors’ interests have actually been expanded beyond their publishing contracts. But, with the total choice offered by the settlement, this doesn’t seem to me a real concern.

I am aware that, while voicing support for the settlement, some academic authors have expressed concerns that privacy and academic freedom principles and open access issues were inadequately addressed–concerns that I trust will be addressed by Google and the Book Rights Registry to the extent that these issues are products of the settlement.

Some of the concerns that I’ve heard academic authors and others raise don’t seem to be products of the settlement itself, and wouldn’t be solved if the settlement is scuttled. For example, with or without a settlement, we should all be concerned about and seek to address privacy issues, as author James Gleick noted at a recent forum on the settlement.

That said, privacy is one of a number of issues which I think Google ought to have addressed in the run up to the Court’s “opt out” date. I have been disappointed not to have had a meaningful statement from Google which would put to rest the privacy concerns relating to the settlement. Earlier I mentioned that strong support for orphan works should also be forthcoming; I’d like to hear Google’s plans and commitments in both of these areas. There has been a great deal of misunderstanding and misinformation about the settlement; but there has also been some very informed criticism. Most of the latter focuses on issues relating to Google, rather than to the other parties, and I think Google should have engaged more with these sensible reservations before the court date, revising aspects of the settlement, as appropriate. Some aspects of the settlement are obviously part of a careful balancing act, to make sure all parties feel able to support it. But some are not, and Google would increase confidence in its likely approach to issues that will come up about the settlement over the coming years if it were now to address some of the current concerns.

Cohen: What’s most striking to me isn’t the different interests that these academic authors have highlighted, but, rather, our shared interests. In a letter to the court written by a group of academic authors, who were requesting an extension of the deadline for “opting out” of the settlement, they highlighted that the court’s approval of the settlement will unquestionably bring about a significant expansion of access to knowledge–and share knowledge for the sake of the general public. In other words, as with university presses and libraries, academic authors also applaud the alignment of the settlement’s broad aim of public access to knowledge with their own scholarly mission.

Barton: On this point–which I view as the most significant aspect of the settlement–we all seem to be in perfect alignment.

Finally, it is also worth considering what happens if the settlement fails. The settlement offers us a vision of a world where all Americans have access–for free–via c. 20,000 public libraries and higher education institutions–to millions of works which are not now available. They would also have substantial free access to those same titles from every (online) computer in the country. Consumers could also purchase these titles (for what I believe will be a reasonable price), and institutions can subscribe to them (again for what I believe will be a reasonable price). The alternative is access to snippets, at most.

The availability of a book used to be determined either by whether a publisher could justify a print run, or by access to the specialized collections of a relatively small number of libraries. Printing technology and cost structures meant that books were put out of print long before their useful lives were over. We now live in a time when technology and the different commercial dynamics around internet search have combined to give us an unprecedented opportunity to make available again the ideas and work of millions of such books written by generations of scholars and writers. Why wouldn’t we grasp that opportunity?