Monday, October 17, 2005

Time to take the red pill

Listening to presentations, and talking to delegates, at Internet Librarian International 2005 (ILI) last week, I was reminded of the film The Matrix. In the movie, the main character is offered an opportunity and a choice: he can take the red pill and see the truth; or he can take the blue pill and return, comfortably unaware, to the illusion that is the world of the Matrix, and life will simply carry on as before.

With the Internet continuing to challenge their traditional skills and roles, information professionals face a not dissimilar choice: embrace the reality of the new world they inhabit, or seek to deny it, clinging to a now outdated illusion of reality.

Disconcerting

For while information professionals initially welcomed the arrival of the Internet, many have become increasingly concerned that it poses a significant threat to their settled world.

This concern was all too evident at ILI, with both delegates and presenters clearly of the view that many traditional notions of information science are under attack from the Web. Long-standing classification systems, for instance, are threatened by newer notions of categorisation; hierarchical indexing is having to give way to the flat indexing of the Web; and taxonomies face growing pressure from new-fangled concepts like folksonomies.

For information professionals — who pride themselves on the many skills and techniques that they have developed over the years — this is both disorientating and distressful. If that were not enough, the Web challenges the very notion that information intermediaries have a role to play any more in a networked world.

None of these anxieties are new, of course, but the depth and intensity of the pain information professionals are experiencing were all too palpable at the London event. Certainly there was a desperate need to appear relevant. As one librarian plaintively put it, "We need to find ways to put ourselves back between the information and the user."

That said, some information professionals — generally the younger ones — are embracing the new world. Michael Stephens, a special projects librarian at St. Joseph County Public Library in Indiana, for instance, gave a presentation in which he talked with great enthusiasm about how libraries can exploit wikis, instant messaging, and podcasts to enhance the services they provide for patrons.

Stephens also bravely volunteered to defend folksonomies from the caustic tongue of UKOLN's Brian Kelly who, amongst other things, publicly critiqued Stephens' "inadequate" use of tags when labelling photographs of his dog Jake on the social networking site Flickr. Kelly's aim was to demonstrate that folksonomies are a pale shadow of traditional classification, even in the hands of a trained librarian.

Grumpy old men

All in all, it felt at times as if ILI was awash with grumpy old men muttering bad-temperedly about the good old days, and the shocking ignorance of the young.

This attitude was best exemplified in the keynote given by information industry personality Stephen Arnold. In a paper entitled Relevance and the Future of Search, Arnold complained that the traditional view of relevance in online searching was under siege on the Web.

Specifically, information science's notion of precision and recall (where precision measures how well retrieved documents meet the needs of the user, and recall measures how many of the relevant documents were actually retrieved) was being destroyed by the practices of web search engines, particularly Google.
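For readers unfamiliar with the two measures, a minimal sketch may help. It uses the standard set-based definitions from information retrieval (precision as the share of retrieved documents that are relevant, recall as the share of relevant documents that were retrieved); the document identifiers are invented for illustration and do not come from Arnold's paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents come back, five are actually relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7", "doc8", "doc9"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this invented case two of the four retrieved documents are relevant (precision 0.5), while only two of the five relevant documents were found (recall 0.4). A results page padded with paid or optimised links lowers the first figure without improving the second.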

This state of affairs, he argued, is being driven by the desire to monetise the Web, not least through Google's pioneering of advertising-driven search models. When a user does a search on Google, for instance, the resulting pages of "organic results" (i.e. the product of Google's search algorithm) are placed alongside links paid for by advertisers. Unfortunately, said Arnold, over 90% of users do not differentiate between the paid listings and organic results.

Entirely alien

The situation is aggravated, he added, because people don’t generally click through many pages of search results. This encourages owners of web sites to exploit Google's search algorithms in order to push links to their sites higher up Google's search page. Indeed, said Arnold, a large and powerful Search Engine Optimisation (SEO) industry has been created precisely in order to sell services aimed at "fixing" search results on Google and the other main search engines. As a consequence, he complained, relevance on the Web is now a concept entirely alien to anything understood by information professionals.

As the market leader, and primary innovator, it was Google that attracted the full force of Arnold's ire. "Indexing is not what you learned in library school," he said. "It’s what Google wants. Effectively, SEO is the new indexing model."

In other words, the notions of comprehensiveness and objectivity long promulgated by information professionals as central to online searching have given way to a process whose raison d’être is to falsify search outcomes to satisfy commercial interests. "The SEO market has grown up to take advantage of this new idea of relevance," said Arnold.

To underline the extent to which traditional notions of relevance have been undermined, Arnold cited research done by the UK-based Internet magazine .net, which found only a 3% overlap in search results listed on Google, Yahoo and AskJeeves when the same search term was input. "When is a hit relevant?" Arnold asked rhetorically. "Where is the boundary between SEO and 'real indexing'?"

Worse, added Arnold, Google's dominance is growing all the time. Whereas in the previous quarter it had had a 51% share of weblog referrals in the US, for instance, this figure is now 62%. (Blog referral logs collect information on who visits a website and how they arrived there.)

Intellectual dishonesty

After his presentation I asked Arnold why he objected to these developments. "It's intellectually dishonest," he replied. "These shortcuts trivialise indexing." Moreover, he added, it is dangerous. "If a medical term is misused, it could affect a person's life if the appropriate article is not found. Likewise, if a company doesn’t find the right patent document it could cost that company a lot of money. So I really disapprove."

But is it really likely that a corporate lawyer or a doctor would rely on Google for an exhaustive patent or medical search? And are information consumers really as naïve or stupid as Arnold implies?

As Arnold himself acknowledged, most users probably don’t care if their search results are paid-for ad links, or the product of Google's algorithm. If someone is looking for a restaurant, for instance, what they want to find is a good-enough restaurant, not a long list of every possible eating house available, categorised by thirty different criteria, and listed by the number of available tables! After all, most of the sponsored links turn up on pages where users are looking for products or services. In this case Google is simply acting like a yellow pages directory.

Moreover, even if it is true that web users don’t always understand the way search engines work, they are learning all the time. In fact, as a general rule, users know as much as they need to know, and this is usually more than information professionals give them credit for knowing!

All in all, it was hard not to conclude that Arnold reflects the grumpy old man school of information science. As he himself admitted: "I'm old. I'm dying out."

For all that, while deprecating SEO techniques, Arnold was happy enough to offer the audience five "cheats" they could use in order to ensure their web sites received higher rankings on Google.

Essentially, Arnold's view seemed to be that much is awry on the Web, but there is little to be done but accept it.

They're watching us!

But Arnold had a second point to make. While many still view Google as a search company, he argued, it was now far more than that. Currently offering 56 different services, he explained, Google is in the process of creating a completely new operating system, one moreover up to 40 times faster than anything that IBM or HP could offer, and based on between 155,000 and 165,000 servers.

This too Arnold clearly deprecated, explaining that this "Googleplex" (a term he has appropriated from the name of Google's Mountain View headquarters) now encircles the world like the carapace of a tortoise — making Google the new AT&T; an AT&T, moreover, not subject to any regulation. Clearly in likening the Googleplex to a new operating system Arnold was also portraying Google as the new Microsoft.

At this stage Arnold's presentation began to sound more like a conspiracy theory than factual exposition. Confiding to the audience that Google founders Larry Page and Sergey Brin had refused to speak to him once they realised his was a critical rather than adulatory voice, and referring to a series of patent thickets that Google has built around its technology (patents which his lawyer had, for some inexplicable reason, advised Arnold not to put up on the Web), he went on to complain that he had never provided his address to Google, yet the company nevertheless knew it. "Google knows where I live," he said dramatically. "I didn’t tell them. They are watching me!"

And for those librarians still harbouring any illusion that by scanning books and making them available on the Web Google represents a force for good, Arnold depicted Google Print as a smokescreen. "The scanning of books is a red herring," he said, adding that Google was like a magician into whose hand a quarter suddenly appears as if from nowhere. "Everyone looks at the quarter, not the magician."

Fortunately, Arnold's presentational mode appeared to owe more to his predilection for drama — and a canny sense of how to market a new book — than to paranoia. It also had moments of humour. Fifteen minutes into his presentation, we were all evacuated after the hotel fire alarm was set off, giving Arnold the opportunity to yell: "You see — I'm so hot! This is what I use in bars to get women."

Later, when we were allowed to re-enter the hotel to hear the rest of Arnold's presentation, the conference organiser announced that the alarm had been triggered by an old man smoking a cigar in his bed. "And that old man," promptly quipped Arnold, "is none other than Gregorovich Brin, Sergey's uncle."

Not only is Google watching Arnold, it seems, but its founders have deployed their extended family to silence him!

Real or perceived threat?

But how seriously should we take Arnold's prognostications? He is, after all, not the only commentator to depict Google as the new Microsoft, or AT&T, and thus a significant monopoly threat.

Interestingly, most now view Microsoft as somewhat grey at the temples. This more relaxed view, moreover, is a consequence not of the antitrust case against the company (after all, Judge Jackson's order to break up Microsoft was subsequently overturned by a federal court) but of the growth of new competitors like Google, and the rise of the open source software movement.

That said, Arnold is right to deprecate the growing commercialisation of the Internet, and now that Google is a public company we can surely expect its "don't be evil" ethos to come under increasing pressure from shareholders keen to see the return on their investment maximised.

But leaving aside Arnold's dire predictions of an all-seeing, all-powerful Googleplex encircling the world and pulling everyone into its monopolistic grasp, it is certainly worth asking how much of a monopoly threat Google represents to web searching. The answer seems to be: "Not as much of a threat as Arnold implies." Many, for instance, believe that large generic search engines are set to see their dominance diminish rather than increase.

Commenting in an EcommerceTimes article earlier this year, the associate editor of SearchEngineWatch.com, Chris Sherman, argued that the bigger the Web grows, the less useful generic search engines become. As a consequence, he said, "We're seeing a real rise in vertical search engines, which are subject-specific or task-specific: shopping, travel and so on." He added: "We're going to see more of that going forward as people become more sophisticated and as these specialised search engines become better at what they do."

Neither is Sherman a lone voice. Commenting in the same article Gartner Group's Rita Knox said: "People still need information on the Internet, but a more generic search capability like Google is going to be less useful."

Self-fulfilling prophecy

Time will tell. But the fundamental problem with Arnold's dark view of the future is that conspiracy theories tend to have a debilitating effect on our ability to act. We become less inclined to ward off the object of our fear if we believe it to be inevitable, creating a kind of self-fulfilling prophecy.

Arnold is not the only one to be disenchanted with the growing commercialisation of the Web. Nor is he the only one to deplore Google's role in this. In a recent paper called The Commercial Search Engine Industry and Alternatives to the Oligopoly, for instance, Bettina Fabos, from the Media Research Center at Budapest University of Technology and Economics, makes very similar points. Her conclusion, however, is very different.

Rather than portraying the situation as inevitable, and advising us to get over it, she concludes: "[T]o realize the web’s educational and non-commercial potential, educators and librarians need to move away from promoting individual skills (advanced searching techniques, web page evaluation skills) as a way to cope with excessive commercialism" and instead "address the increasing difficulties to locate content that is not commercial, and the misleading motives of the commercial, publicly-traded internet navigation tools, and the constant efforts among for-profit enterprise to bend the internet toward their ends."

In other words, rather than rushing around like Private Frazer in the BBC sitcom Dad's Army shouting "We're all doomed", information professionals should adopt a more positive approach. Why not take the initiative and turn the technology in a more desirable direction? Why not fill the web with non-commercial content, and then build non-commercial tools to help users locate that content?

Indeed, says Fabos, some are already at work doing just this. She commends, for instance, the activities of initiatives like the Internet Scout Project, which enables organisations to share knowledge and resources via the Web by putting their collections online; she commends Merlot, the free and open resource providing links to online learning materials; and she commends tools like iVia, and Data Fountains, designed to allow web users to discover and describe Internet resources about a particular topic.

Open Access

As it turns out, one of the more organised and advanced initiatives with the potential to help create a non-commercial web is the open access (OA) movement — a movement, in fact, in which librarians have always played a very active role.

For while the movement's original impetus was solely to liberate scholarly peer-reviewed articles from behind the subscription firewalls imposed by commercial publishers, there are grounds for suggesting it could develop into something grander, in both scope and scale. How come?

As scholarly publishers have consistently and obdurately refused to cooperate with the OA movement in its attempts to make scientific papers freely available on the Web, the emphasis of the movement has over time shifted from trying to persuade publishers to remove the toll barriers, to encouraging researchers to do it themselves by self-archiving their published papers, either in institutional repositories (IRs), or in subject-specific archives like the arXiv preprints repository and PubMed Central, the US National Institutes of Health free digital archive of biomedical and life sciences papers.

And to assist researchers in doing this, the OA movement has created an impressive collection of self-archiving tools, including archival software like Southampton University's Eprints, and MIT's DSpace; a standardised protocol to enable repositories to interoperate (the Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH); and OAI-compliant search engines like Michigan University's OAIster, which harvest records from multiple OAI-compliant archives to create a single virtual archive. In this way hundreds of different repositories can be cross-searched using a single search interface, much as Google searches the Web. Essentially a vertical search engine, OAIster currently aggregates records from over 500 institutions.
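To give a flavour of what harvesting involves, here is a rough sketch of the two halves of the job: asking a repository for its records, and pulling the metadata out of the reply. The ListRecords verb, the oai_dc metadata prefix and the XML namespaces are part of OAI-PMH; the repository address, the sample record and the function names are invented for illustration, and nothing here is meant to describe how OAIster itself is built. No network request is made, and a real harvester would also need to handle resumption tokens and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and by Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_records(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = record.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records


# Invented response standing in for what a repository might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical preprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
print(extract_records(SAMPLE_RESPONSE))
```

A cross-repository search service would, in outline, repeat this for each archive it knows about and index the combined results, which is why a shared protocol and a shared metadata format matter so much.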

But while the initial purpose of the Open Archives Initiative (OAI) was limited to scholarly papers, it has become apparent that its aims and its technology could have wider potential. As the OAI FAQ puts it, OA advocates came to realise that "the concepts in the OAI interoperability framework — exposing multiple forms of metadata through a harvesting protocol — had applications beyond the E-Print community." For this reason, the FAQ adds "the OAI has adopted a mission statement with broader application: opening up access to a range of digital materials."

How might this work? Two years ago Clifford Lynch published a paper in which he argued that there is no reason why an institutional repository could not contain "the intellectual works of faculty and students — both research and teaching materials — along with documentation of the activities of the institution". It could also contain, he said: "experimental and observational data captured by members of the institution that support their scholarly activities."

Indeed, Lynch added, repositories in higher educational establishments could also link with other organisations in order to extend and broaden what they offer. "[U]niversity institutional repositories have some very interesting and unexplored extensions to what we might think of as community or public repositories; this may in fact be another case of a concept developed within higher education moving more broadly into our society. Public libraries might join forces with local government, local historical societies, local museums and archives, and members of their local communities to establish community repositories. Public broadcasting might also have a role here."

Need not end there

And it need not end there. Why not use the OAI technology as the framework for an alternative non-commercial web; one encompassing as much as is deemed sufficiently valuable that it could benefit from being accessible outside the confines, constraints and biases of the commercial web? If users wanted to find a restaurant they could go to Google; but if they wanted to do a medical search then the non-commercial web would be a better choice. Data searchable within this alternative web would no doubt need to meet certain standards, in terms, for instance, of provenance, and of the depth and range of metadata.

Self-archiving purists discourage such talk, fearful that it may distract the movement from the priority of "freeing the refereed literature". But the reality is that as research funders like the Wellcome Trust and Research Councils UK begin to mandate researchers to self-archive their research papers, so the number of institutional repositories is growing. And once a university or research organisation has an institutional repository there is an inescapable logic for that repository to develop in the kind of directions proposed by Lynch.

It may be, of course, that in the end OAI technology is not appropriate for this job. It may also be wise not to distract the OA movement from its primary aim. But it is perhaps now only a matter of time before some such phenomenon develops. Initiatives like Google Print and Google Scholar have served to highlight growing concerns at the way commercial organisations are now calling all the shots in the development of the Web. And it is these concerns that are encouraging more and more people to think in terms of non-commercial alternatives.

What we are beginning to see, says Fabos, is a "small but growing countervailing force to the commercialisation of 'the universe of knowledge.'" What will drive these efforts, she adds "is the understanding that, in our commercial system, educators, librarians and citizens interested in nurturing a public sphere must work together to control the destiny of the internet — or somebody else will."

Clearly there is a valuable potential role here for information professionals, should they choose to seize the opportunity. After all, what better way for disenchanted librarians to make themselves indispensable in a new and relevant way — not by playing their traditional role as gateways to information (putting themselves between the information and the user), but as facilitators able to help researchers and other data creators collaborate and share information. If this means abandoning some of their traditional skills for new ones then so be it. Now there's a topic for discussion at Internet Librarian International 2006!

The fact is, it's time for information professionals to stop bemoaning the loss of some perceived golden age, and take control of the Web. In short, it's time to reach for the red pill!