Google Challenges for Academic Libraries

Submitted by editor on 8 February 2006 - 12:00am

John MacColl analyses the reactions many academic libraries may be having to the range of tools Google is currently rolling out and outlines a strategy for institutions in the face of such potentially radical developments.

Introduction: A 'Googly' for Libraries?

A googly, or a 'wrong'un', is a delivery which looks like a normal leg spinner but actually turns towards the batsmen, like an off break, rather than away from the bat. (BBC Sport Academy Web site [1]. Search result found in Google).

How should we understand Google? Libraries still feel like the batsman at whom something has been bowled which looks familiar, but then turns out to be a nasty threat. At Edinburgh University Library, as I suspect in many academic libraries, the suite of tools introduced in recent years by Google is a topic of daily conversation. What is sometimes interesting is the reaction to this fact. At a presentation to academic staff of the Library Committee in December 2005, we found that some academics were puzzled - even annoyed - at the idea that the Library should perceive any sort of a challenge from Google. This attitude is also echoed in a recent posting by RLG's Walt Crawford - writing on the subject of 'Library 2.0' - who is irritated by the view of some library and information world pundits that Amazon and Google together have bowled a googly at libraries.

Crawford is of the school which opposes the idea that libraries are at the heart of the universe, and it is indeed surely arrogance to say that libraries must reclaim the entire search territory which Google has taken, as though people ever did consult a library in response to every question mark in their heads. We use Google frequently for the sort of information we would previously have sought in places other than the library. Who, for example, in pre-Google days, would have asked their public library for a recommendation about the best quad bike on the market? Libraries, we must never forget, are selections, defined by what they exclude - though we rarely state what that is. Google, on the other hand, is truly universal in a way even our most universal libraries have never been. Libraries aim to be deep selections - and the use of that depth is our challenge. Google also has depth, in the penetration of its indexing, but its depth is not judicious, and it lacks the objectively described scope which is required by authoritative bibliographic tools. Libraries struggle to know how to present it, given that we have no influence over it, not having the familiar vendor- or licensor-client relationship.

Libraries and the Google Portfolio

Librarians talk about Google with their colleagues, of course, but also in blogs, several of which are now devoted to the company and its products. Some of us profess to love it; others profess to hate it. Some of the world's major libraries have become very active collaborators with it. Some librarians think that Google wants to work with libraries to create a better information environment for the world. Google claims that itself, and has recently launched a newsletter for libraries. Others think that Google wants to take libraries' business away from them, believing that, in a digital world, it can do it better. Some think that Google is still just a flash in the pan, and will disappear, or be replaced by other products in time, through the force of competition. Others see it as the ultimate 'killer application' which libraries have sought since the arrival of digital information - an affordable, comprehensive discovery tool which is gradually acquiring all of the content needed by the communities they serve, and which is differentiating its services so that it can reliably serve up respectable, scholarly information on a grand scale, at no cost to the end-user, thus sweeping away the jumble of subject-based discovery tools which libraries currently provide in a painful, ragged way, and replacing them with a final solution.

Which view is correct? The library partners in the Google Print Library Program, now rechristened 'Google Book Search', believe that Google is an opportunity. This programme has set about digitising millions of books from five of the world's top libraries - Stanford, Michigan, Harvard, New York Public Library, and Oxford. For books out of copyright, it seems clear that this programme should do us all a major service, pushing rapidly forward a book digitisation initiative which has been underway via a number of other smaller initiatives (such as Project Gutenberg) for several years now, and filling in the gap between the last 70 years or so, and the Early English Books Online and Eighteenth Century Collections Online publisher initiatives. But what sort of job will Google do in its first venture into full-content services?

The main problem comes with the in-copyright works (which OCLC has calculated represent more than 80% of the total for the 'Google Five' [3]). By digitising those, claims the Association of American University Presses, Google is infringing copyright even before it creates any sort of a service from the digitised texts. What Google wants to do, of course, is to create massive word-by-word indexes, so that users can find copies of books which meet their search arguments. Where the books are out of copyright, they can be viewed online. Where they are in copyright, users should see only a few 'snippets', and then be given some easy options for pursuing the book they have identified as meeting their search need - by ordering it from Amazon, or by discovering a holding in a local library (via a tie-up with OCLC's OpenWorldCat initiative). Google argues that this should in fact increase book sales and increase library usage, but several publishers are unhappy.

Google Book Search: The Business Model

Not all Library Directors share the publishers' objections. Some indeed believe that the lawsuit which the publishers have launched against Google is entirely based on greed: they have suddenly decided that they want some of Google's immense store of cash in return for allowing their books to be digitised. This is of course not a claim which can be made about all publishers, since several are already participating in the Google Program for Publishers, giving Google their digital full-text for its index, with a direct aim of generating online book orders. So the programme has stalled while the lawsuit is pending, and the 'Google Five' libraries are all avoiding giving Google any in-copyright materials for the time being. Meanwhile, there are indications that Google is planning to rent out online books: a one-week online loan for one-tenth of the book purchase price has been suggested [4]. This is the sort of service which might literally buy off publisher objections eventually, assuming that they receive a proportion of the income from the service, but the library community may perceive a threat from it.

The business model is rental - the same model which NetLibrary uses: an online replication of physical book borrowing. Many librarians at first greeted with horror the idea that the usage of digital material could be circumscribed by print-age practices in this way. Surely the point about digitisation was that it made objects into a free good, capable of being consumed 'non-rivalrously', to use the economic jargon? Why on earth would we perpetuate a print library practice in this brave new world of promiscuous digital objects, available for free to everyone, happy to be used multiply and simultaneously across the globe?

But of course, we see better now that this was a naïve reaction. Without business models, the digital age will never acquire the support of the commercial interests which produce and distribute the goods, and so the digital library which we may be able to parade as utopian will be filled primarily with content of little real value, online ephemera and grey literature. There is a difference here in the way we view legitimately commercial content, such as textbooks, and what we might regard as illegitimately commercial content, such as research outputs - articles published in academic journals. Stevan Harnad has long made the distinction between the 'for trade' and 'for free' literature [5]. Textbooks are usually written by academic or other authors with an intention to reap some financial benefit; research outputs are rarely written with that intention. The publishing industry, however, has treated both types as though they were trade literature, except that, in the case of research outputs, it has kept all of the profits from sales, rather than split them with authors on a royalty basis as happens with books. As the digital age develops, we need to have a sophisticated approach to the commercialisation of content, and to be able to accept different business models for different types, provided that they are appropriate to each.

But - as with much of what Google offers - we find ourselves in a condition of febrile speculation about exactly how it will develop, and therefore what it will mean for us in the longer term. The business model is unclear, and we cannot gauge its impact upon us. This makes librarians uneasy about Google in general. Leaving aside rental, what exactly is Google Book Search's business model? It can certainly be argued that it has the potential to drive traffic both to bookshops - physical and online - and to libraries - physical and online. The American university publishers are crying foul over its plans to digitise full-text of copyrighted content in order to generate indexes from it. But the indexing of in-copyright material is not new. What else is Current Contents - or indeed the indexing and abstracting of journal articles in the services we have used for years in our academic libraries, in both printed and online forms? Admittedly, that is content which has been indexed manually - or semi-manually in the case of the ISI offerings - but that makes no difference. An index is a means - literally - of advertising content, and Google understands advertising better than anybody. Publishers do not worry about Current Contents because they see it as driving traffic to their journals, not replacing it.

The big difference with the Google Book Search initiative is that it creates digital versions of whole texts as a derivative of the indexing process. To the publishers, this is the plutonium, the dangerous by-product, which results from the good and healthy activity of generating indexes. Who knows what Google might do with all of that digitised content when the backs of libraries and publishers are turned, or when it decides to mount a legal challenge to the established order? It might divide and rule the publishing community by offering rentals at 10% of the list price, as has now been suggested. And if it gets its foot into that door, then might our patrons decide not to come to our libraries, once all books are available either to buy or to rent, online?

At the very least, Google has produced a very rich post-coordinate index to huge quantities of content, allowing users to 'meta-read' texts much more easily than they could do conventionally - by having to visit a bookshop or library, skimming through a book with the help of a back-of-the-book index, if one is available. Having ubiquitous anytime access to Google's gigantic index is likely to mean that where users do decide to purchase, or rent, a book, they do so because they know that 'meta-reading' it won't be sufficient for their purpose. This should therefore make book purchase or rental a more judicious activity, but at the same time perhaps a more frequent one as well.

The Book Search programme, by getting inside vast quantities of printed books and generating indexes from their content, is splitting the information atom for our community. This is radical and new. Google splits open resources of all kinds - now even books - and indexes the atomic contents word-by-word. The power of Google's word-by-word indexes is much greater than that of indexes we are used to because of the extent of the content it reaches. Its creation is mundane, and robotic - compared to the efforts of human indexers - but it goes much further, much deeper, than any previous index could, and unearths material which was largely unfindable. If it were ever able to index the entire book contents of a universal library then it would undoubtedly have created a genuinely significant library product. Since this is very far from being true or even likely at the present time, Google Book Search is at least likely to be successful in its appeal to advertisers in that it gives hits a more thought-processed feel. Hits from published books will have more authority than hits from amateur Web sites, or indeed almost any Web sites, because they surface content which is deep both in being essentially off-Web, and in being the product of intellectual effort and research, which others have deemed worthy of publication.
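To make the idea of a word-by-word, post-coordinate index concrete, here is a minimal sketch in Python. It is only an illustration and makes no claim about how Google builds its indexes: the function names, the sample texts and the simple 'every term must be present' matching rule are all assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Assumption: a "word" is any run of lower-case letters, digits or apostrophes.
WORD = re.compile(r"[a-z0-9']+")


def build_index(texts):
    """Map each word to the set of book ids whose text contains it."""
    index = defaultdict(set)
    for book_id, text in texts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(book_id)
    return index


def search(index, query):
    """Post-coordinate search: terms are combined at query time (boolean AND)."""
    terms = WORD.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample texts standing in for digitised books.
books = {
    "book-a": "The googly is a delivery bowled by a leg spinner.",
    "book-b": "A spinner may bowl an off break or a leg break.",
    "book-c": "Libraries select; search engines index everything.",
}

index = build_index(books)
print(sorted(search(index, "leg spinner")))  # ['book-a', 'book-b']
print(sorted(search(index, "googly")))       # ['book-a']
```

No human indexer decides in advance which concepts deserve an entry: every word becomes an access point, and the searcher combines them at the moment of searching. That is the sense in which such an index is 'mundane and robotic' yet reaches deeper than a back-of-the-book index.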

Google Scholar: Scholarly Credibility?

On the face of it, Google Scholar is a tool with a huge amount of potential. It capitalises on Google's already large embeddedness in the search practices of researchers. Its coverage, however, is of academic material - journal articles, reports, conference proceedings, and e-theses and dissertations. It occupies a large portion of that selected territory which is the purview of libraries, and so must therefore be seen either as a parasitic threat or an assistive tool. It ranks results by relevance, as with the general Google engine, but its algorithm in this case includes citedness, and so it is engineered for the academic quality and reward system in which academics and researchers work. And it can plug into the holdings of individual libraries to allow resolution to full-text on the basis of subscribed content. It therefore contains all of the elements of the sort of search service which we in our libraries are trying to provide by purchasing federated search tools. Even better, it is not based on federated search technology, which some have argued is a 'broken technology' [6] - at least in many of its vendor manifestations to date. In his weblog earlier this year, Lorcan Dempsey wrote:

How quickly things can change! Last year there were discussions about the Google-busting potential of metasearch. How naive. This year there are discussions about the metasearch-busting potential of Google Scholar. Let us wait and see. [7]

Here is our dream - an interface to all of our subscribed content which is as fast and responsive and clean and as solidly branded as Google's (because it is Google's). Again, Dempsey sums this up well:

Libraries struggle because they manage a resource which is fragmented and 'off-web'. It is fragmented by user interface, by title, by subject division, by vocabulary. It is a resource very much organised by publisher interest, rather than by user need, and the user may struggle to know which databases are of potential value. By off-web, I mean that a resource hides its content behind its user interface and is not available to open web approaches. Increasingly, to be on-web is to be available in Google or other open web approaches. These factors mean that library resources exercise a weak gravitational pull. They impose high transaction costs on a potential user. They also make it difficult to build services out on top of an integrated resource, to make it more interesting to users than a collection of databases. [7]

But there are aspects of Google Scholar which fail to satisfy us, as librarians. Most unacceptably, Google does not tell us exactly what its coverage is. Even in its list of FAQs, it does not include the question which would surely be the most commonly asked question by librarians, 'Which journals do you index?' It does not permit subject searching either, as we understand it. You cannot, of course, search keyword descriptors, as you would be able to do with any of the abstracting and indexing services we take on subscription. Subject searching in Google Scholar therefore relies on judicious use of keyword or phrase searching - the same natural language limitations as apply to vanilla Google, and not really as precise as we would wish to provide for academic searching. But for known item searching - for that paper by this author on this topic for instance - it is often as good as any of the abstracting and indexing services we take, and better in that it is Google - easy and free and used by everyone.

Ultimately, what disconcerts us is its opacity. We cannot see under its bonnet, and so we cannot really trust it. It is fine for our users to use if they choose, but we find it difficult to give it our imprimatur. Google Scholar therefore gives us a public relations headache. Where we do have the full-text content which fulfils results it provides, and we have hooked our environment up to Google Scholar (via IP range definition and our OpenURL resolver), we want our libraries to be identified with Google Scholar's success in bringing users to it ('Look', we say, 'You only got to the full text of that reference because we have purchased it'). Where we do not, we want users to know that we do not endorse such a fickle service. In short, Google has put us in a new dilemma which is difficult for us culturally as a controlling profession: should we collaborate with a service whose limitations we cannot justify, and which we have not evaluated or selected?

Or are we being too librarian-like about this? Too 'Library 1.0', to use the recent definition of Michael Casey [8] and the Talis White Paper [9]? After all, Google Scholar is free, and it gives us much of what we profess to want from a multi-database search tool. In addition, we know that our users are already using it. Maybe we should just go with it, explaining to our users its potential shortcomings, and offering a disclaimer on our part: that we cannot give it our unqualified blessing as information professionals, but nonetheless it does a reasonably good job - though we have other offerings which have passed our quality assessment tests and which we therefore recommend more strongly. If, on the other hand, we believe that the vendors are finally beginning to deliver federated search services which work (and the fact that WebFeat was awarded a patent last year for its federated search engine might indicate that they are), then we have to face the difficult task of convincing our users that a new tool is better because its depths are known and understood, and we have some control over the shape of them. Libraries cannot expect their patrons to stop using Google for academic and scholarly purposes. They can and must face the challenge of providing something better.

Google and the Nature of Our Business

Google's new technologies for indexing, ranking and digitising, operating on the vast scale of the Internet, have brought us into a nuclear age of information. That means that we librarians need to rethink our own organisations, but it does not mean that they are now not needed, or have been replaced. We can be impressed, but we should not be overawed. New technologies do not change principles.

The challenge for us is that we have not been here before. Google is an inevitable consequence of the Web. Valued now at US$80 billion, it meets a clear need. It is unbeatable for natural language searching, but it is not the answer to all searching. In researching this article, for instance, I did the majority of research via weblogs I use regularly, because I wanted access to minds I respect and their processed ideas.

Google is new and big, and it intrudes on our landscape. It comes to us offering peace, but on its own terms, which we cannot change, having no purchaser power. The challenge we face is one of understanding both the extant power and the potential of such a major force. We must be cautious, try to see ahead, and predict where Google is going next. As Google moves into for-fee services, such as book rental, we must try to reach deals before they are forced upon us. And we must work out our own position so that, if we are to take a stand and resist Google in any form (be it Google Scholar, 'Google Book Rental' or any other new service), we do so together, in a united way. We must work out the business models which suit our business. This is the age of nuclear information power, and our services, to use Lorcan Dempsey's memorable phrase, need to strengthen their 'gravitational pull'. In our traditional business relationships with suppliers - of federated search engines, or e-books - we must use the Google factor to our advantage by demanding better services, which incorporate Google's strengths of speed and cleanness, and avoid its weaknesses - opacity and blurred scope.

As librarians, running pleasant study environments, containing expert staff, providing havens on our campus which are well respected, and building and running high-quality Web-based services, we will decide which of Google's offerings we wish to promote, and which we are prepared to pay for. And we will stand up - no matter how wealthy we assume our students and academic users to be - for the principle of free and equal access to content, and for the principle of high-quality index provision, whether free or at a cost, because without those principles we are no longer running libraries.