Yesterday I clicked on a link to a Forbes.com post and was greeted by an interstitial page that said only:

Kindness is a language which the deaf and the blind can read.

This raised a few questions:

What was going through Forbes’ head when it decided to show us this pap? Does Forbes think that maybe we’re on the verge of kindness and just need this nudge?

Did Twain ever actually say this?

Why is there any question about what the deaf can read?

So I turned to Google. Herewith my findings:

1. There are a number of variations, including the more logical

“Kindness is the language which the deaf can hear and the blind can see.”

2. At Google Books, there are 1,800 results for “mark twain” kindness deaf. The ones I poked at do not provide a source for the quote, although The Gratitude Attitude footnotes it…but Google Books doesn’t show the page with the footnote.

3. If you search Google Books by author for the words “kindness,” “blind,” and “deaf”, you get nine results. None of the four that have the quote cite a source for it.

Thanks to the persistence of Javier Ruiz of the British Open Rights Group, you can now read [pdf] the contract between the British Library and Google Books. Google has shrouded its book digitization contracts in non-disclosures wrapped in lead sheathing that is then buried in collapsed portions of the Wieliczka salt mines. It took a Freedom of Information Act request by Javier to get access, and Google restricts further re-distribution.

Javier points out that the contract is non-exclusive, although the cost of re-digitizing is a barrier. Also, while the contract allows non-commercial research into the scanned corpus, Google gets to decide which research to allow. “There is also a welcome clause explicitly allowing for metadata to be included in the Europeana database,” Javier reports.

But it’s disturbing that the cartoon purposefully makes the Fair Use “explanation” unintelligible. Presumably that’s because Fair Use is so complex and so difficult to defend that Google doesn’t even want to raise it as a possibility. Nevertheless, it seems like a missed opportunity to do some education. Worse, it’s a sign that we’ve pretty much given up on Fair Use.

Having written in opposition to the Google Books Settlement (1, 2, 3), I was pleased with Judge Chin’s decision overall. The GBS (which, a couple of generations ago, would have unambiguously referred to George Bernard Shaw) was worked out by Google, the publishers, and the Authors Guild without schools, libraries, or readers at the table. The problems with it were legion, although over time it had gotten somewhat less obnoxious.

Yet, I find myself slightly disappointed. We so desperately need what Google was building, even though it shouldn’t have been Google (or any single private company) building it. In particular, the GBS offered a way forward on the “orphaned works” problem: works that are still in copyright but whose copyright owners can’t be found and are often probably long dead. So, you come across some obscure 1932 piece of music that hasn’t been recorded since 1933. You can’t find the person who wrote it because, let’s face it, his bone sack has been mouldering since Milton Berle got his own TV show, and the publishers of the score went out of business before FDR started the Lend-Lease program. You want to include 10 seconds of it in your YouTube ode to the silk worm. You can’t because some dead guy and his defunct company can’t be exhumed to nod permission. Multiply this by millions, and you’ve got an orphaned works problem that has locked up millions of books and songs in a way that only a teensy dose of common sense could undo. The GBS applied that common sense — royalties would be escrowed for some period in case the rights owner staggered forth from the grave to claim them. Of course, the GBS then divvied up the unclaimed profits in non-common-sensical ways. But at least it broke the logjam.

Now it seems it’ll be up to Congress to address the orphaned works problem. But given Congress’ maniacal death-grip on copyright, it seems unlikely that common sense will have any effect and our culture will continue to be locked up for seventy years beyond the grave in order to protect the 0.0001 percent of publishers’ catalogs that continue to sell after fourteen years. (All numbers entirely made up for your reading pleasure.)

Jon Orwant is an Engineering Manager at Google, with Google Books under him. He used to be CTO at O’Reilly, and was educated at the MIT Media Lab. He’s giving a talk to Harvard’s librarians about his perspective on how libraries might change, a topic he says puts him out on a limb. The title of his talk: “Deriving the library from first principles.” If we were to design libraries from scratch, would they look like today’s? He says no.

He says it’s not controversial that patrons are accessing more info online. Foot traffic to libraries is going down. Library budgets are being squeezed. “Public libraries are definitely feeling the pinch” exactly when people have less discretionary money and thus are spending more time at libraries.

At MIT, Nicholas Negroponte contended in the early 1990s that telephones would switch from wired to wireless, and televisions from wireless to wired. “It seems obvious in retrospect.” At that time, Jon was doing his work using a Connection Machine, which consisted of 64K little computers. The wet-bar-sized device he shows provided a whopping 5 GB of storage. The Media Lab lost its advantage of being able to provide high-end computers once computing power became widespread. So, the Media Lab had to reinvent itself, to provide value as a physical location.

Is there an analogy to the Negroponte switch of telephone and TV, Jon asks? We used to use the library to search for books and talk about them at home. In the future, we’ll use our computer to search for books, and talk about them at our libraries.

What is the mission of libraries, he asks: to select and preserve info, or to disseminate it? Might libraries redefine themselves? But this depends on the type of library.

1. University libraries. U of Michigan moved its academic press into the library system, even though the press is the money-making arm.

2. Research libraries. Harvard’s Countway Medical Library incorporates a lab into it, the Center for Bioinformatics. This puts domain expertise and search experts together. And they put in the Warren Anatomical Museum (AKA Harvard’s Freak Museum). Maybe libraries should replicate this, adopting information-driven departments. The ideal learning environment might be a great professor’s office. That 1:1 instruction isn’t generally tenable, but why is it that the higher the level of education, the fewer books are in the learning environment? I.e., kindergarten classes are filled with books, but grad student classrooms have few.

3. Public libraries. They tend to be big open rooms, which is why you have to be quiet in them. What if the architecture were a series of smaller, specialized rooms? Henry Jenkins said about newspapers, Jon says, that it’s strange that hundreds of reporters cover the Superbowl, all writing basically the same story; newspapers should differentiate by geography. Might this notion of specialization apply to libraries, reflecting community interests at a more granular level? Too often, public libraries focus on the lowest common denominator, but suppose unusual book collections could rotate like exhibits in museums, with local research experts giving advice and talks. [Turn public libraries into public non-degree-based universities?]

Part 2: Software architecture

Google Books wants to scan all books. It has done 12M out of the ~120M works (which have ~174M manifestations — different versions and editions, etc.). About 4B pages, 40+ libraries, 400 languages (“Three in Klingon”). Google Books is in the first stage: scanning. Second: scaling. Third: What do we do with all this? 20% are public domain.

He talks a bit about the scanning tech, which tries to correct for the inner curve of spines, keeps marginalia while removing dirt, does OCR, etc. At O’Reilly, the job was to synthesize the elements; at Google, the job is to analyze them. They’re trying to recognize frontispieces, index pages, etc. As a sample of the problem of recognizing italics, he gives: “Copyright is way too long to strike the balance between benefits to the author and the public. The entire raison d’être of copyright is to strike a balance between benefits to the author and the public. Thus, the optimal copyright term is c(x) = 14(n + 1).” In each of these sentences, the italics indicate a different semantic point. Google is trying to algorithmically catch the author’s intent.

Physical proximity is good for low-latency apps, local caching, high-bandwidth communication, and immersive environments. So, maybe we’ll see books as applications (e.g., good for a physics text that lets you play with problems, maybe not so useful for Plato), real-time video connections to others reading the same book, snazzy visualizations, and presentation of lots of data in parallel (reviews, related books, commentary, and annotations).

“We’ll be paying a lot more attention to annotations” as a culture. He shows a scan of a Chinese book that includes a fold-out piece that contains an annotation; that page is not a single rectangle. “What could we do with persistent annotations?” What could we do with annotations that have not gone through the peer review process? What if undergrads were able to annotate books in ways that their comments persisted for decades? Not everyone would choose to do this, he notes.

We can do new types of research now. Suppose you want to know what the past tense of “sneak” is: 50 years ago people would have said “snuck,” but in 50 years it’ll be “sneaked.” There is a trend toward regularization of verbs (i.e., away from irregular forms) over time, which you can see by examining the corpus of books Google makes available to researchers. Or, you can look at triplets of words and ask what the distinctive trigrams of an era are. E.g., it was: “oxide of lead,” “vexation of spirit,” “a striking proof.” Now: “lesbian and gay,” “the power elite,” “the poor countries.” Steven Pinker is going to use the corpus to test the “Great Man” theory. E.g., when Newton and Leibniz both invented the calculus, was the calculus in the air? Do a calculus word cloud in multiple languages and test against the word configurations of the time. The usage curves of the phrases “World War I” and “The Great War” cross around 1938, but there were some people calling it “WWI” as early as 1932, which is a good way to discover a new book (wouldn’t you want to read the person who foresaw WWII?). This sort of research is one of the benefits of the Google Books settlement, he says. (He also says that he was both a plaintiff and a defendant in the case because, as an author, his book was scanned without authorization.)
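A minimal sketch of that regularization measurement in Python, using invented per-year counts (real research would pull per-year frequencies from the n-gram data Google released to researchers):

```python
# Track the share of the regular past tense "sneaked" vs. the irregular
# "snuck" over time. The counts below are invented for illustration; real
# work would use per-year match counts from a scanned-book corpus.

def regular_share(counts_by_year):
    """For each year, return sneaked / (snuck + sneaked)."""
    return {
        year: sneaked / (snuck + sneaked)
        for year, (snuck, sneaked) in counts_by_year.items()
    }

counts = {
    1960: (480, 120),  # (snuck, sneaked) occurrences -- made up
    1990: (420, 300),
    2020: (330, 550),
}

for year, share in sorted(regular_share(counts).items()):
    print(f"{year}: 'sneaked' is {share:.0%} of uses")
```

A rising share across years would be evidence of regularization for this one verb; the same loop could be run over every irregular verb in the corpus.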

The images of all the world’s books come to about 100 petabytes. If you put terminals in libraries, anyone can access out-of-print books, and you can let patrons print on demand. “Does that have an impact on collections” and budgets? Once that makes economic sense, every library will “have” every single book.
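A back-of-envelope check on those figures (my arithmetic, not Jon's, and assuming the talk's rough numbers of ~120 million works, ~12 million scanned books, and ~4 billion scanned pages):

```python
# Rough implications of the talk's figures: 100 PB of page images for all
# the world's books, ~120M works, ~4B pages scanned across ~12M books.
# All inputs are the talk's approximate numbers, not exact data.
PB = 10**15

total_image_bytes = 100 * PB
works = 120_000_000
pages_scanned = 4_000_000_000
books_scanned = 12_000_000

bytes_per_work = total_image_bytes / works          # storage per work
pages_per_book = pages_scanned / books_scanned      # average book length
bytes_per_page = bytes_per_work / pages_per_book    # implied page-image size

print(f"~{bytes_per_work / 10**6:.0f} MB per work")
print(f"~{pages_per_book:.0f} pages per book")
print(f"~{bytes_per_page / 10**6:.1f} MB per page image")
```

The implied ~2.5 MB per scanned page is plausible for a mid-resolution page image, which is a sanity check that the 100-petabyte figure hangs together.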

How can we design a library for serendipity? The fact that books look different is appealing, Jon says. Maybe a library should buy lots and lots of different e-readers, in different form factors. The library could display info-rich electronic spines (graphics of spines). [Jon doesn’t know that this is an idea the Harvard Law Library, with whom I’m working, is working on.] We could each have our own virtual rooms and bookshelves, with books that come through various analytics, including books that people I trust are reading. We could also generalize this by having the bookshelves change if more than one person is in the room; maybe the topics get broader to find shared interests. We could have bookshelves for a community in general. Analytics of multifactor classification (subject, tone, bias, scholarliness, etc.) can increase “deep” serendipity.
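One way to read “analytics of multifactor classification” concretely: score each book along several axes and rank a shelf by similarity to a reader profile. Everything below (the factor names, the scores, and the titles) is invented for illustration:

```python
# Hypothetical "deep serendipity" sketch: each book gets scores on several
# axes (subject-match, tone, bias, scholarliness), and a shelf is ranked by
# cosine similarity to a reader profile. All names and numbers are invented.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# (subject-match, tone, bias, scholarliness), each on a 0-1 scale.
shelf = {
    "Field Guide to Mosses":        (0.9, 0.2, 0.1, 0.7),
    "Moss Gardening for Beginners": (0.9, 0.8, 0.1, 0.2),
    "A History of Botany":          (0.6, 0.4, 0.2, 0.9),
}

reader = (0.9, 0.3, 0.1, 0.6)   # a profile inferred from past reading

ranked = sorted(shelf, key=lambda t: cosine(shelf[t], reader), reverse=True)
print(ranked)
```

A virtual shelf for a shared room could average several reader profiles before ranking, which is one way to make the topics “get broader” when more than one person is present.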

Q&A

Q: One of the concerns in the research and univ libraries is the ability to return to the evidence you’ve cited. Having many manifestations (= editions, etc.) lets scholars return. We need permanent ways of getting back to evidence at a particular time. E.g., Census Dept. makes corrections, which means people who ran analyses of the data get different answers afterward.
A: The glib answer: You just need better citation mechanisms. The more sophisticated answer: Anglo-Saxon scholars will hold up a palimpsest. I don’t have an answer, except for a pointer to George Mason conf where they’re trying to come up with a protocol for expressing uncertainty [I think I missed this point — dw]. What are all the ways to point into a work? You want to think of the work as a container, with all the annotations that come up with it. The ideal container has the text itself, info extracted from it, the programs needed to do the extraction, and the annotations. This raises the issue of the persistence of digital media in general. “We need to get into the mindset of bundling it all together”: PDFs and TIFFs + the programs for reading them. [But don’t the programs depend upon operating systems? – dw]

Q: Centralized vs. distributed repository models?
A: It gets into questions of rights. I’d love to see it as distributed to as many places and in as many formats as possible. It shouldn’t just be Google digitizing books. You can get 100 petabytes in a single room, and of course much smaller in the future. There are advantages to keeping things local. But for the in-copyright works, it’ll come down to how comfortable the holders feel that it’s “too annoying” for people to copy what they shouldn’t.

Charlie Leadbeater has a terrific post on the threats posed by the fact that The Cloud (as in “cloud computing”) too often actually is a recentralizing of the Net by profit-seeking companies.

The easiest example cited by Charlie is Google Books, which provides a tremendous service but at the social cost of giving a single company control over America’s digital library. The problem here isn’t capitalism but monopolization; an open market in which other organizations could (the pragmatic “could,” not the legal or science fiction “could”) also offer access to scanned libraries would create a cloud of books not solely controlled by any single company. (The Google Books settlement threatens to rule out competition because without an equivalent agreement with publishers and authors, any other organization that scans and provides access to books runs the strong risk of being sued for copyright infringement, especially when it comes to books whose copyright holders are hard to find. The revision of the Settlement is less egregiously monopolistic.)

Here is a letter Lewis Hyde sent to Judge Denny Chin, who is considering the proposed Google Books settlement. I’ve also appended a supporting letter written by Eric Saltzman. The issue is that the newly proposed trustee overseeing the handling of “orphaned works” (i.e., works that are still in copyright but whose copyright holders cannot be found) still does not have the power to adequately represent the interests of the rights holders, especially when it comes to allowing companies other than Google to license the works. Granting Google a monopoly on these works seems like too much of a reward for Google’s scanning of them (which I’ve heard costs about $30/book), and does not seem to serve the interests of the rights holders or — more important, from my point of view — the overall social good of increasing access to these works. (Note: I am not a lawyer.)

So, here are the letters, minus some addresses, etc.:

27 January 2010

Dear Judge Chin:

I write to amend the letter of objection that I wrote last August in regard to The Authors Guild, Inc., et al. v. Google Inc. (Case No. 1:05-cv-08136-DC). My August letter is on file with your office as Document 480.

I shall here limit my remarks to provisions of the amended settlement that are changed from the original settlement, specifically to the role of the newly proposed trustee for orphan works.

I object to the fact that, despite the amended settlement’s creation of an Unclaimed Works Fiduciary (UWF), the monopoly powers that Google and the Books Rights Registry will acquire, should the Court approve the orphan works elements of the settlement, still stand. The settling parties have limited the role of the UWF such that he may discharge some duties of the registry in some circumstances, but little else. He cannot act fully on behalf of the rightsholders of unclaimed books; he cannot, for example, license their work to third parties.

To put this another way, it is still the case that an approved settlement will in essence grant the settling parties unique compulsory licenses for the exploitation of orphan works. But why make such licenses unique? If the Court and the settling parties believe that they can authorize compulsory licenses of any sort, why not go the extra step and grant such licenses broadly so that competing providers can enter this market?

To address the problem of monopoly in the market for digital books the UWF should be empowered to act as a true trustee. As such, he should make every effort to locate lost owners, communicate to them their rights under the approved settlement, and pay them their due. Absent their instructions to the contrary, he should deliver the works of lost owners to the public through the efficiencies of a fully competitive market.

As Chief Justice Rehnquist has written in regard to the larger purposes of our copyright laws: “We have often recognized the monopoly privileges that Congress has authorized … are limited in nature and must ultimately serve the public good…” (Fogerty v. Fantasy, Inc., 510 U.S. 517 (1994)). In regard to both content owners and the public, then, the fiduciary needs to operate in an open economy of knowledge and, for that, he will need the freedom to license work to other actors.

(Note: I have asked my attorney, Eric Saltzman, to separately address the question of the UWF’s authority to license orphaned works to others; please see the attached addendum to this letter.)

My client, Lewis Hyde, tells the Court in his letter of January 27th that the new proposed settlement cannot be fair to the owners of the copyrights in the orphan works and to the public unless it allows the Unclaimed Works Fiduciary to make licenses to other providers to allow competition with the monopoly plan that Google and the Plaintiffs now propose to the Court.

I would like to offer the Court additional support for Professor Hyde’s objection and suggestion.

If the named plaintiffs or others who “opt in” to the settlement wish to sign on to it with their own copyrights (and if it survives any antitrust process), then that shall be their prerogative. However, the combination in this class action lawsuit of inadequate representation and significant actual conflicts among the so-called class should make the Court skeptical of granting a monopolistic license of the absent members’ copyrights.

If the Court does decide to approve a settlement of the case, it should not approve one where Plaintiff’s counsel have consented to deliver the licenses for the orphan works to just one licensee.

It would be a complete fiction to say that Plaintiffs’ attorneys have adequately represented the orphan works authors and their successors in interest in this case. The original settlement proposal clearly demonstrated counsel’s willingness and ability to compromise or, at least, to ignore the orphan works owners’ interests in favor of the named plaintiffs who engaged them and whose assent they needed to cut the deal.

The problem of plaintiff counsel shaping a settlement attractive to the clients before them at the expense of absent class members is a well-discussed problem in class action jurisprudence. This Court may take notice of an incentive in that direction, the more than fifty million dollars of fees that Google has agreed to pay to Plaintiffs’ counsel if the settlement goes through.

Allow me to point out two methods whereby the proposed settlements seriously shortchanged the orphan works owners to enrich other class members at their expense.

The proposed settlement provides that “Google will make a Cash Payment of at least $60 per Principal Work, $15 per Entire Insert and $5 per Partial Insert for which at least one Rightsholder has registered a valid claim by the opt-out deadline” (Emphasis supplied). According to the settlement, total payments will amount to $45 million.

By definition, no orphan work Rightsholders could meet this registration condition. Thus was the settlement engineered so that the rightsholders of orphan works and their successors-in-interest would not and could not get any share of the up-front payments total.

Evidently, in dividing up the scores of millions of dollars that defendant Google was ultimately willing to pay up-front (i.e., unrelated to yet unproven forthcoming revenues) to settle the lawsuit, counsel felt no obligation to share any of it with the orphan works owners, even if the rightsholder should later appear and wish to register and claim that payment. This very large slice of the pie would go only to the known rightsholders, their de facto clients.

This economic discrimination against the orphan works rightsholders went beyond just up-front payments. It also took unclaimed (after five years) revenues from exploitation of the orphan works and assigned them to the known rightsholders of other books, thus promising still further enrichment of the client sub-class with actual control over the settlement.

That particular feature drew such unpleasant attention to the bias in representation in favor of the known rightsholders (and disfavoring the orphan works rightsholders) that it was written out of the settlement proposal now before the Court. Nevertheless, the Plaintiffs’ counsel who now urge the court to approve this revised settlement agreement are the same counsel who, in the first settlement go-around, assured the Court then (as they do now) that they had adequately represented the entire class, including the orphan works rightsholders.

Commonality and adequacy of representation are two touchstones for class certification. “The adequacy inquiry under Rule 23 (a) (4) serves to uncover conflicts of interest between named parties and the class they seek to represent.” Amchem Prods. v. Windsor, 521 U.S. 591 at 625 (1997).

In Amchem, the Supreme Court upheld the Third Circuit Court’s decertification of the class because it found that “…the settling parties achieved a global compromise with no structural assurance of fair and adequate representation for the diverse groups and individuals affected. The Third Circuit found no assurance here that the named parties operated under a proper understanding of their representational responsibilities. That assessment is on the mark.” Id. at 595.

As demonstrated above, much less than promising the “structural assurance of fair and adequate representation for the diverse groups and individuals affected”, the settlements that were and are proposed to this Court suggest that advantaging the named class members at the expense of the unrepresented orphan works rightsholders was a goal successfully achieved during the settlement negotiation.

Accordingly, if the Court will entertain a settlement, it should itself take on the burden of making sure that the orphan works rightsholders’ interests are well protected. At this point, the best way to do so is to free the orphan works from the monopoly straitjacket that the proposed settlement forces on them.

Let the parties live with the deal they made for the parties who were, in fact, adequately and aggressively represented. For the inadequately represented sub-class, the orphan works rightsholders, the Court should empower the UWF (or similar fiduciary) to license their works into the open market. With this authority going forward, the UWF will, as well, be able to adjust licensing of digital rights in these works to the market conditions in an area that is still very new and sure to develop in ways that are, today, impossible to predict.

Professor Hyde’s objection addresses the two enormous flaws in the proposed settlement: 1. the actual conflicts within the class together with the failure of adequate representation of the orphan works rightsholders, and 2. the anti-competitive effect of the full copyright term license it would grant to Google only. The first undermines both the process by which the settlement was achieved and, correspondingly, the public confidence in the courts. The second hurts both the orphan works rightsholders and the strong public interest in access to the knowledge and creativity these books offer.

Short of initiating a new attempt at settlement — with new counsel for the orphan works rightsholders — the changes Professor Hyde proposes would achieve a result that would be fair for all the parties and for the public.

Here’s a summary of the summary Google provides [pdf], although IANAL and I encourage you to read the summary, which is written in non-legal language and is only 2 pages long:

1. The agreement now has been narrowed to books registered for copyright in the US, or published in the UK, Australia or Canada.

2. There have been changes to the terms of how “orphaned works” (books under copyright whose rightsholders can’t be found) are handled. The revenue generated by selling orphaned works no longer will get divvied up among the authors, publishers and Google, none of whom actually have any right to that money. Instead it will go to fund active searching for the rightsholders. (At the press call covered by Danny Sullivan [see below], the Authors Guild rep said that with money, about 90% of missing rightsholders can be found.) After holding those revenues in escrow (maybe I’m using the wrong legal term) for ten years (up from five in the first settlement), the Book Rights Registry established by the settlement can ask the court to disburse the funds to “nonprofits benefiting rightsholders and the reading public”; I believe in the original, the Registry decided who got the money. So, in ten years there may be a windfall for public libraries, literacy programs, and maybe even competing digital libraries. (The Registry may also (determined by what?) give the money to states under abandoned property laws. (No, I don’t understand that either.))

The new settlement creates a new entity: a “Court-approved fiduciary” who represents the rightsholders who can’t be found. (James Grimmelmann [below] speculates interestingly on what that might mean.)

3. The settlement now explicitly states that any book retailer can sell online access to the out-of-print books Google has scanned, including orphaned works. The revenue split will be the same (63% to the rightsholder, “the majority of” 37% to the retailer).

4. The settlement clarifies that the Registry can decide to let public libraries have more than a pitiful single terminal for public access to the scanned books. The new agreement also explicitly acknowledges that rightsholders can maintain their Creative Commons licenses for books in the collection, so you could buy digital access and be given the right to re-use much or all of the book. Rightsholders also get more control over how much Google can display of their books without requiring a license.

5. The initial version said Google would establish “market prices” for out-of-print books, which seemed vague because what counts as the market for out-of-print books? The new agreement clarifies the algorithm, aiming to price them as if in a competitive market. And, quite importantly, the new agreement removes the egregious “most favored nation” clause that prevented more competitive deals from being made with other potential book digitizers.

From my non-legal point of view, this addresses many of the issues. But not all of them.

I’m particularly happy about the elements that increase competition and access. It’s big that Amazon and others will be able to sell access to the out-of-print books Google has scanned, and sell access on the same terms as Google. As I understand it, there won’t be price competition, because prices will be set by the Registry. Further, I’m not sure if retailers will be allowed to cut their margins and compete on price: If the Registry prices an out-of-print book at $10, which means that $6.30 goes to the escrow account, will Amazon be allowed to sell it to customers for, say $8, reducing its profit margin? If so, then how long before some public-spirited entity decides to sell these books to the public at their cost, eschewing entirely the $3.70 (or the majority of that split, which is what they’re entitled to)? I don’t know.
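The split arithmetic the settlement describes, sketched in Python. The 63% rightsholder share and the $10 example come from the text; the retailer's exact cut of the remaining 37% is unspecified ("the majority of" it), so it is an assumed parameter here:

```python
# Revenue split per the settlement summary: 63% of the price goes to the
# rightsholder (escrowed if unclaimed); the retailer keeps "the majority of"
# the remaining 37%. The exact retailer/Google sub-split is not specified
# in the settlement summary, so retailer_cut is an assumed parameter.

def split_sale(price, rightsholder_share=0.63, retailer_cut=0.6):
    to_rightsholder = round(price * rightsholder_share, 2)
    remainder = round(price - to_rightsholder, 2)
    to_retailer = round(remainder * retailer_cut, 2)
    to_google = round(remainder - to_retailer, 2)
    return to_rightsholder, to_retailer, to_google

rh, retailer, google = split_sale(10.00)
print(f"rightsholder ${rh}, retailer ${retailer}, Google ${google}")
```

On the $10 example from the text, the rightsholder's $6.30 is fixed; whether a retailer may shrink its own slice to undercut on price is exactly the open question raised above.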

I also like the inclusion of Creative Commons licensing. That’s a big deal since it will let authors both sell their books and loosen up the rights of reuse.

As far as getting rid of the most favored nation clause: Once the Dept. of Justice spoke up, it’s hard to imagine it could have survived more than a single meeting at Google HQ.

The Open Book Alliance (basically an everyone-but-Google consortium) is not even a little amused, because the new agreement doesn’t do enough to keep Google from establishing a de facto monopoly over digital books. The Electronic Frontier Foundation is not satisfied because no reader privacy protections were added. Says the ACLU: “No Settlement should be approved that allows reading records to be disclosed without a properly-issued warrant from law enforcement and court orders from third parties.”

Danny Sullivan live-blogged the press call where Google and the other parties to the settlement discussed the changes. It includes a response to Open Book Alliance’s charges.

Harry Lewis has a terrific post about a $300 do-it-yourself book scanner he saw at the D is for Digitize conference on the Google Book settlement. The plans are available at DIYBookScanner.org, from Daniel Reetz, the inventor.

There are lots of personal uses for home-digitized books, so (I am definitely not a lawyer) I assume it’s legal to scan in your own books. But doesn’t that just seem silly if your friend or classmate has gone to the trouble of scanning in a book that you already own? Shouldn’t there be a site where we can note which books we’ve scanned in? Then, if we can prove that we’ve bought a book, why shouldn’t we be able to scarf up a copy another legitimate book owner has scanned in, instead of wasting all the time and pixels scanning in our own copy?

Isn’t Amazon among the places that: (a) knows for sure that we’ve bought a book, (b) has the facility to let users upload material such as scans, and (c) could let users get an as-is scan from a DIY-er if there is one available for the books they just bought?