For librarians, it must seem that the web has turned information gathering on its head. The Internet is a heady young fellow: self-obsessed, self-referential, and unflinchingly modern in its focus. Libraries house history, centuries of wisdom buried deep in the stacks, and even deeper in the un-searchable text of yellowing book pages. So how can libraries remain relevant?

Let’s first examine the mission of public libraries versus that of major search engines. The following excerpts are from the mission statements of several major libraries and a certain web giant - see if you can distinguish between them:

Clearly the missions of Google (and other search engines) are converging with those of the leading libraries. Google recognized this as an opportunity and launched the Google Library Project in late 2004. The project started with five major library partners, but has since expanded to 11 libraries in three countries. Google is digitizing the contents of prestigious libraries such as Harvard, Stanford and Oxford, increasing access to tens of millions of unique books that were once accessible only to a small, elite group.

Google spends between $10 and $30 for every book it scans. The entire project, which will span at least a decade, will only cost Google the equivalent of its 4th quarter profits in 2006. Not a bad investment for the web giant.

Google has already made these books available through Google Book Search, a fascinating portal that for the first time in human history opens up rare and not-so-rare books to anyone in the world. Not only are these books fully text searchable, but Google has also recently announced an integration with Google Maps, making librarians, technologists and Google-philes giddy. No more leafing through musty books to find a quote or location.

Google’s Library Project is distinctive from several others such as the Open Content Alliance sponsored by Yahoo and Microsoft because Google is barreling ahead and scanning copyrighted texts. This has not only provoked lawsuits, but more importantly, it has also provided a necessary impetus to publishers and libraries to address the issue of how to manage copyrighted books in the digital era.

For copyrighted works, Google Book Search allows full text search by simply telling you that the terms you searched for are in the book. Google then provides a tantalizing “snippet view” of the text, as if it were torn right from the page. If you want to read the whole book online, however, you’ll have to wait. Rather than selling the e-book, Google paradoxically directs you to amazon.com, which will happily mail you a hardback in 5 to 7 days for $21.95.

This is about to change dramatically. On January 21, 2007, Google told the Times of London that it would launch an e-book service. Details are murky, but it seems likely that users will be able to purchase all or part of copyrighted books. I can only reiterate that pricing matters. With e-books, publishers can increase the exposure of previously obscure books and eliminate publishing costs. Ideally, this will increase profit margins and create significant savings for consumers. Because digitized books are easily divided, e-books could lead to a new model of micropayments enabling consumers to purchase only what they need, be it a chapter, a paragraph, or even just a quote.


Public Library Reactions

Google’s mission is not without its critics. Jean-Noël Jeanneney, president of France’s Bibliothèque Nationale, wrote a plaintive book called Google and the Myth of Universal Knowledge, warning of Anglo-Saxon cultural imperialism and the risk of market-driven libraries:

“As anyone who uses Google knows, what is intrinsic to all the information it provides is hierarchization. Even if there are many pages of results, the searcher rarely goes beyond the first few. …The profit motive will necessarily promote one product over another.”

As long as there have been publishers with a marketing budget, there have been attempts to woo readers. And while the psychological effect of publishers’ advertising cannot be stopped at the door of the library, our French friend would like to see it diminished.

There is some merit to this view, but not much. Libraries and library science will continue to weigh market forces against intellectual ones, but this new digital medium should not be made the culprit. If a library were to license, buy, or rent the contents of Google’s digital library, couldn’t it simply reorganize them in a neutral, intellectual way that would make even Mr. Dewey Decimal proud? Or should the public libraries simply create their own digital library system from scratch?


Public Library Initiatives

Most libraries do see the upside of digitizing their collections, or they wouldn’t be working with Google. In fact, Google recently gave a $3 million grant to the Library of Congress for its World Digital Library Project in conjunction with UNESCO. The project is focused on improving web access to rare materials that “…are physically stored in geographically dispersed locations, and which, when brought together with other collections through cross-national and cross-cultural multilingual search and browse capabilities, will yield new knowledge and insights.”

The World Digital Library may sound ambitious, but its scope is much more limited than that of Google Books. It will focus primarily on the long tail: rare cultural treasures that most of us never use, rather than the popular literature that most of us check out from our local libraries.

So we have the ivory tower approach and the commercial approach. Caught in the middle are the libraries that most Americans use.

Is Google the only answer? To be sure, they have a massive head start (see statastic below). In a decade they may have more books in their digital collection than any library system on earth. But if there is true intellectual concern about the earth’s largest library being in the hands of a profit-driven company, why not launch a public initiative? Can the U.S. government even afford it?

It’s all about priorities. If the U.S. government decided to scan and digitize every one of the 65 million books on earth, it would only cost about $2 billion. That’s less than we are spending for one week in Iraq, and it’s less than kids (I presume it’s kids) are paying for cell phone ring tones each year. We can afford it; so far, we just haven’t chosen to.
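The arithmetic behind that $2 billion figure checks out against the per-book scanning costs quoted earlier; here is the back-of-the-envelope version (the 65 million book count and the $10 to $30 range are the post’s own figures):

```python
# Back-of-the-envelope check of the digitization cost discussed above.
# Figures from the post: 65 million books, $10 to $30 per scanned book.
BOOKS = 65_000_000
COST_LOW, COST_HIGH = 10, 30  # dollars per book, Google's reported range

low = BOOKS * COST_LOW
high = BOOKS * COST_HIGH

print(f"Low estimate:  ${low / 1e9:.2f} billion")   # $0.65 billion
print(f"High estimate: ${high / 1e9:.2f} billion")  # $1.95 billion
```

The high end lands at $1.95 billion, which is where the “about $2 billion” comes from.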

Even if the public sector did spend resources on digitization, libraries would still face copyright issues. Tomorrow I will look at that, as well as e-book initiatives at local libraries. And later this week, a case study that imagines the DC Public Library in 2017.


Sources and assumptions: Google has not disclosed the number of books it is digitizing or its timeline for completing the project. Early in the project Google claimed that it would scan 3,000 books per day. This figure was used for the low estimate. The high estimate was based on expanded date searches (1500 to 2007) on Google Books that returned about 4.5 million books. This was extrapolated back to the beginning of the project to find the scan rate, which was then used to project the high estimate. Statastic believes that the high estimate is probably more accurate because the 3,000 books per day figure referred to a contract with only the University of California library system. The fact that Google continues to add libraries to the project indicates that Google is likely to accelerate the scanning rate.
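The extrapolation described above can be sketched as follows. The exact project start date and the date of the 4.5 million-book count are assumptions (Google has disclosed neither precisely), so the numbers are illustrative:

```python
# Sketch of the low/high scan-rate estimates described above.
# Assumed dates: project launch in mid-December 2004, and the 4.5M-book
# count observed on Google Books in late January 2007.
from datetime import date

start = date(2004, 12, 14)       # assumed launch of the Library Project
snapshot = date(2007, 1, 21)     # assumed date of the 4.5M-book count
days_elapsed = (snapshot - start).days

low_rate = 3_000                       # books/day, Google's early claim
high_rate = 4_500_000 / days_elapsed   # books/day, back-extrapolated

print(f"Elapsed days: {days_elapsed}")
print(f"Low-estimate rate:  {low_rate} books/day")
print(f"High-estimate rate: {high_rate:.0f} books/day")
```

Under these assumed dates the high estimate works out to nearly twice the 3,000-books-per-day figure, which is why the two projections diverge so sharply over a decade.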

Wednesday, August 9th, 2006

Sure, English has the most entries in Wikipedia - 1.3 million at last count - but we also have half a billion native speakers around the world. In fact, it takes 250 Internet users to produce a single English entry on Wikipedia.

So which country is the most prolific? A mere 2 million Slovenians have cranked out 32,000 entries, and they only needed an average of 30 Internet users to write each entry.

And 8.8 million Swedes have produced a whopping 177,000 entries. So if anyone ever asks you how many Swedes it takes to screw in a Wikipedia entry on lightbulbs, the answer is 50. The punchline is that it takes 389 English speakers and more than 11,000 Chinese speakers to do the same. This is hardly surprising given that Sweden is one of the most industrious and entrepreneurial countries in the world, home of Ikea, Volvo and the Swedish Chef.

But look out for the Lusophones. The 210 million Brazilians and Portuguese have produced 169,000 Wikipedia entries so far, and while they’re not the most efficient, that’s still an impressive 28% increase in entries since April 2006.

Here are the top 25 languages used to write Wikipedia entries. The first statastic is a snapshot of which native speakers produce the most entries per capita. The second measure shows how productive the language groups are given their access to the Internet.
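For anyone who wants to reproduce the two measures, here is a minimal sketch. The speaker and entry counts are the rounded figures quoted in this post (so the ratios come out close to, but not exactly, the numbers above), and the English Internet-user count is an assumed figure back-solved from the 250-users-per-entry ratio:

```python
# The two "statastics" described above: native speakers per entry and
# Internet users per entry. Counts are rounded figures from the post;
# the English Internet-user count is an assumption implied by the
# 250-users-per-entry ratio, not a sourced number.
data = {
    # language: (native speakers, Internet users or None, Wikipedia entries)
    "English":   (500_000_000, 325_000_000, 1_300_000),
    "Swedish":   (8_800_000,   None,        177_000),
    "Slovenian": (2_000_000,   None,        32_000),
}

for lang, (speakers, users, entries) in data.items():
    line = f"{lang}: {speakers / entries:.0f} speakers per entry"
    if users is not None:
        line += f", {users / entries:.0f} Internet users per entry"
    print(line)
```

The Swedish ratio lands right at 50 speakers per entry, matching the lightbulb joke above.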

Sources and assumptions: Wikipedia entry counts are as of August 9, 2006; Internet usage data is from 2004. In cases where a language spans several countries, a weighted average was used to estimate Internet usage for each language group.