The Fight over the Google of All Libraries: An (Updated) Wired.com FAQ

Google’s plan to digitize the world’s books into a combination research library and bookstore began in 2002, when it started scanning books without permission from their authors. The Google Books project has since grown into an epic legal battle pitting Google, along with the coalition of authors and publishers that originally sued it, against a small army of academics, open-source advocates, Google competitors and a medley of authors.

The Justice Department’s antitrust division has twice weighed in against the settlement, dimming Google’s chances of convincing a federal judge to let it slice through stifling copyright law to create a vibrant online library. Critics argue, in effect, that the Google Books settlement gives Google an unfair advantage in search and is so complex that Congress, not a court, should iron it out.

Google begs to differ, saying the settlement isn’t a bridge too far and calling it a “remarkably creative settlement, beneficial to the parties and absent class members and the public,” that also settles the legal dispute.

Federal judge Denny Chin will have the difficult job of sorting that out Thursday, as he gives the second version of the Google settlement a “fairness hearing.”

The story of Google Books is a complicated one, combining copyright law, antitrust issues, plain old capitalist competition and the odd problem of orphan books. It’s also the story of one company’s audacious attempt to create the largest and most comprehensive library in the history of the world.

And despite the fancy and upbeat name for Thursday’s hearing, there’s not likely to be any way to make a happy ending out of this plot.

Here’s Wired.com’s FAQ to help make sense of the complicated issue.

Google is a search engine, right? What do words printed on dead trees have to do with it?

Google claims its mission is to “organize the world’s information and make it universally accessible and useful.” If that’s your goal, then a library full of books makes you salivate for the knowledge held inside. So in partnership with major university libraries, Google began scanning and digitizing millions of books in 2002. The books it digitized range from works no longer under copyright, like Chaucer’s Canterbury Tales, to the Harry Potter series to books whose authors and publishers cannot be located. The idea is simple and audacious: make the library of all libraries by converting every book ever published into an e-book that can be indexed, searched, read and, yes, sold online.

That’s cool! Where can I find this?

Go to Google Book Search, for one. You might also see book snippets in Google’s Web search results.

How many books are in there already?

Google had scanned more than 7 million books as of April 2009.

Can I download or buy old books through Google right now?

Yes and no. Google lets you download any book it has scanned that is no longer in copyright in the U.S., meaning books that have fallen into the public domain. You can also turn those books back into hard copies, on demand, at selected bookstores around the world that have an Espresso Book Machine. For other books, the online display shows up to 20 percent of the text and, if the book is in copyright, usually includes links to places to buy it online.

What about new books? Are they included?

Many are, but through Google’s Partner Program, which lets publishers and authors decide how much or how little of their books goes into Google’s index and gives them a portion of the money from ads shown next to their book pages. New books aren’t part of the settlement.

How did Google get away with scanning 7 million library books?

Well, there’s no problem with scanning millions of public domain books so long as you have the cash, cool technology and cachet to convince some of the world’s best libraries to work with you. As for in-copyright books, Google says it has the right to scan and index them, and show snippets online, under the Fair Use doctrine, which carves out exceptions to copyright holders’ rights. Google was prepared to defend itself on these grounds before getting a better deal. Being a massive company, mostly loved by users, also helps.

So could I go into the library and legally rip every music CD and video they have, and put snippets of them online, under the Fair Use doctrine?

That’s an interesting question. How good is your lawyer and how big is your bank balance?

Then why did the Authors Guild and the Association of American Publishers sue Google in 2005?

Well, once they saw Google using snippets of the books in search results and making money off them, they decided they deserved a cut. After all, they wrote the books. At least some of them, anyway.

Why did Google settle in 2008 if it has the right to do this, especially since it has to pay $125 million in legal fees and royalties?

Well, Google could have fought a court battle to definitively answer that question, setting ground rules that all search sites would have to follow. But that was risky: the company could have lost, setting a bad precedent for fair use and leaving itself liable for billions in damages.

By contrast, the settlement gives Google the legal cover to digitize all books that are in copyright. That’s exclusive cover. For books that are copyrighted but out of print, Google gets to show 20 percent of the book online and sell digital copies of it, keeping 37 percent of the revenue. For books that are in print and copyrighted, Google gets the right to scan them, use snippets in search results and use them for some research purposes.

What about anthologies or photos licensed for use in a book? How does that work?

Who manages authors and publishers’ rights if Google is going to be advertising next to book pages and selling books?

The newly created Book Rights Registry is in charge of finding rights holders, collecting and disbursing payouts, setting prices and negotiating other deals. It’s not unlike the ASCAP system that collects royalties for songwriters, composers and music publishers.

What about libraries?

University and public libraries around the country will get one free subscription for a single computer that will let users read and print all pages from the full text of all the books in Google’s catalog, excluding books still in print. Beyond that, libraries and institutions can order additional subscriptions. The demand is likely to be high. Very high.

What is an author’s role in all this?

Rights holders can go to the Book Rights Registry’s database and choose whether to let Google include their works, sell them online, and show snippets and ads. They can also opt out and reserve the right to negotiate their own terms or sue Google later.

How can Google get a monopoly? Can’t the Book Registry negotiate with other entities that want to do the same thing?

No other project has come close to scanning as many books as Google has, in no small part thanks to the size of Google’s bank balance. The Open Book Alliance calls on the government to create a digital Library of Congress that could license a similarly sized database to any qualified company.

Of course, if Microsoft wanted to catch up, it could try to make an agreement with the Book Registry, but only for the authors the registry can speak for: in other words, the known authors of copyrighted books.

Is the opposition to the settlement all about the so-called orphans?

Not solely, but it’s a huge problem. There are more orphans (books whose copyright holders can’t be located) than in a Dickens novel. Google won’t say how many there are. But UC Berkeley Professor Pamela Samuelson estimates that 70 percent of books that are still in copyright have rights holders that can’t be found.

What’s the problem with orphans?

Copyright infringement can be expensive: up to $150,000 per violation. So if you scan an old book and start selling copies of it, or displaying chunks of it on the web, and the orphan’s father shows up one day waving a paternity test in your direction, you could face a mean copyright infringement suit. Unless you are Google. Since all U.S. book copyright holders are now members of the class in the lawsuit, Google gets liability protection from authors who abandoned their books by not registering them with the Book Rights Registry. If they show up later, all they can do is collect a little cash, change their book’s price or ask Google to stop selling it.

Could Google end up with the most comprehensive online library in the world? Won’t libraries place thousands of subscriptions due to overwhelming demand? And since there’s only one vendor (Google) and the Book Registry will set the price, won’t the price be incredibly high? Or at least climb that way over time?

Bingo.

Why can’t Amazon or Yahoo or Microsoft go to the Book Registry and get an orphans waiver like Google is getting?

The registry can set rates and negotiate contracts for all authors, unless they opt out. But signing away unknown people’s rights to sue? Only a judge in a class-action lawsuit or Congress can do that.

So what changes did Google and the Authors Guild make in November when they submitted the amended version? And why do the feds still object?

The settlement was changed in some odd, but important, ways. It went from being worldwide to covering only works registered in the U.S. or published in the U.K., Australia or Canada. If Google makes money selling a book whose author can’t be found, the money is held for 10 years and then either turned over to states or given to literacy projects. Previously, the unclaimed money was to be distributed among the known authors. Other retailers can sell out-of-print books, getting most of the 37 cents per dollar that would otherwise go to Google for that sale. Authors also have more control over how their works are displayed and shared, including choosing Creative Commons licenses that allow for re-use.

If another company wants to digitize, display and use orphan works without the Sword of Damocles hanging over its head, it has to start digitizing without permission, get sued by a reasonable plaintiff and then go through this settlement process again?

Exactly. They could negotiate with the Book Rights Registry to get the same deal as Google, or a better one, for known works, but that still leaves the orphan works gap.

That’s ridiculous. Isn’t there a better solution to the orphan works problem?

Yes. For one, Congress could step up and pass a law about orphan works. But the last time Congress passed a substantial copyright law, it extended copyright terms by 20 years to keep Mickey Mouse from entering the public domain. That has kept more and more books from reaching the public domain, too. Don’t expect much help here.

Is a lot of money at stake?

If you think all the value in digitizing the world’s knowledge will come from selling out-of-print books as e-books for an iPhone, you’re not thinking like Google. Think of all the subscriptions that universities and colleges and high schools and corporations will need to buy. Think of how search could be improved if you can test your algorithms on a huge digitized swath of the world’s knowledge. And current thinking about search engines is that most people choose as a default the search engine that gives them the best answer on the “rare” or hard queries. Google Books is one route to winning those queries.

Think of the data that could be mined from an index of tens of millions of books, or how a question-and-answer service resembling artificial intelligence could be created. Google “the Singularity.” Or, better yet, Google Book Search “The Singularity.”

Why is the Justice Department involved and disapproving?

The Justice Department has made it clear over the last year and a half that it’s worried about Google becoming an abusive monopoly, and it considers the idea of Google getting sole power over orphan works a real problem.

That’s why it opposed the original settlement, saying, “A global disposition of the rights to millions of copyrighted works is typically the kind of policy change implemented through legislation, not through a private judicial settlement.” And while it commended the changes made in the Amended Agreement, the feds stuck to their argument that the issue at hand was too big to be settled by a court in a lawsuit.

When does all this end and I get to start browsing the library of the future and buying out-of-print books?

The federal court’s final hearing on the fairness of the settlement is on February 18. Then the judge has to rule, which could take months. In the best-case scenario for Google, it will have something resembling the library of the future online sometime in 2010, but given the number of lawyers eyeing this deal and the amount of money potentially at stake, one can be pretty sure the legal battle will drag on well into the 2010s, no matter how the court rules.

Note: This FAQ was originally published on April 30, 2009. It has been substantially updated and edited to reflect the events of the last nine months.