A Writer’s Plea: Figure Out How to Preserve Google Books

The dispute over Google Books continues to rage in the courts and op-ed pages of the country. There are legitimate questions about Google, profit sharing and privacy. But let’s not let the litigation obscure that Google Books provides an unprecedented and irreproducible service to its users.

I’m a science writer at Wired.com, but I’m also working on a book about the history of (what we now call) green technology. My book puts a topic front and center that has been hidden in the footnotes of the American energy story. And without Google Books, I’m not sure it would have been possible to write it. At the very least, my contribution to the book world would have been smaller and shallower.

The searchability, accessibility and breadth of the Google Books collection do not just portend some future best-ever digital library. The collection is already the best research resource that exists.

I’m not a traditional library- or book-hater. I’m a visiting scholar at Berkeley’s Office for the History of Science and Technology and have dozens of books checked out from the UC system. I smell the insides of old books for pleasure.

We take search relevance for granted (that what’s at the top is better than what’s on page 10), but libraries are still organized around keywords and subject headings. If your chosen subject, say, wave motors in California during the 1890s, is just a footnote in a book, you won’t find it. You’ve got to go look at any book that might have useful information and skim the chapters, index and text. When you’re used to the internet, but you’re stuck digging through machine-unreadable documents instead of thinking about and reading them, you want to scream, “This is a job for my robots!”

Even when you do get a topical hit in a library catalog, the information-consumption process is slow end-to-end. There are exactly four hits for “wave motors” in Berkeley’s library system. (Multiple queries to librarians led to some other engineering texts but not much else.) All four hits were located at the rare book library, Bancroft.

You have to register at the library’s front door and receive a 20-minute lecture on dealing with the sensitive materials. No pens, bags or cameras are allowed inside the hallowed reading room. You can’t browse; each book must be delivered to you at your designated seat. When you leave, it’s like going through airport security, except the laptop check is more rigorous. (To be honest, I love going through the ablution ritual, but it just takes sooo long.)

Granted, my “wave motor” example, because it falls within the time frame when many works are in the public domain, highlights the power and beauty of full text access to millions of books. Google Books isn’t quite as good for modern books, but it’s still better than any library.

I’m not sure I believe that “information wants to be free,” but many libraries are managing their works as if information wanted to be kept cloistered and expensive in both time and money. In Chris Anderson’s terms, they are managing for scarcity when the paradigm should be abundance. You can understand why: Libraries have been important for millennia because they could control access to valuable information. Now, that’s a strategy that leads straight to irrelevance.

A lot of smart librarians recognize the imperative of digitization but their institutions rarely give them money for such “low-priority” tasks. Take the library at the National Renewable Energy Laboratory. They tried to get money to put their stuff online, but to no avail.

Given the billions of dollars flowing into renewable energy, you’d think that we’d want to have easy and free access to the research that government dollars paid for in the past. Yet thousands of documents at NREL remain in paperspace only. The only way to read them is to go out to the suburbs of Denver and sit in a room with fluorescent lighting and floral-patterned chairs that still smells of the millions of cigarettes the lab’s engineers smoked during the energy crises of the ’70s.

Actually, now, there is a new way to read some of these documents. I photographed a few dozen documents and cobbled them together into PDFs, which I posted to Scribd. But the efforts of one obsessed person can’t provide the level of resources that people need from serious data sources. My work on the documents from SERI (the Solar Energy Research Institute, NREL’s predecessor) is nothing compared to what Google could do with them because the company is built precisely for the task of scanning the world’s documents and making them useful to researchers like me.

So, as we sort out the various privacy, competitiveness and profit issues, let’s not just assume the status quo was the best of all possible information-distribution worlds. It wasn’t — and we know that because Google Books showed us how the system could be better.