Sunday, March 06, 2005

The director of the French national library, Jean-Noël Jeanneney, is not, it seems, happy about Google's project to put the collections of major libraries online in full text. Harvard, Michigan, Stanford, the New York Public Library, and the Bodleian have signed up. It's certainly an inspiring aim - to make everything in the great libraries available, free of charge, to anyone with internet access. But M. Jeanneney is not going to play.

His beef, apparently, is that it will mean "Anglo-Saxon domination". But what, in terms of search engine design, would that actually mean? Presumably he does not really believe that, if you were to search for a quote from La Rochefoucauld, you'd be confidently directed to Philip Roth. That would be absurd and make a mockery of the entire project (as well as rendering all other results from it unreliable). Of course, it's not put in such terms. He relativises at length about the "miroir américain" and the "sensibilité européenne".

Blah. What does it mean?

The proposal is that the texts would be available to all, searchable by keyword - it would be very difficult to conceive of a usable search system that selected results by nationality without the user's knowledge in this context. A search engine, at bottom, matches the input text (which, being digital, is language-neutral) with the content of its index and measures the relevance of the result according to some rule. Although the exact algorithms are trade secrets, it is well known that Google's PageRank system works by treating links from other sites as votes for the quality of a page. The more people refer to you, the logic runs, the more reliable you must be. This concept should not be utterly foreign to M. Jeanneney, as it has a long tradition in academia around the world (they are called citations). Provided that all the books are catalogued with metadata in the same format, there should be no problem. Cross-linguistic search is difficult, but the metadata could be given in multiple languages. Cataloguing citations of scientific papers (a serious worry for Jeanneney) would help to bridge the language gap, and anyway, one can always put search terms in another language.
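To make the "matching input text against an index" point concrete, here is a minimal sketch of a keyword index over catalogue metadata given in two languages. It is not how Google Print works (I have no idea of the internals); the catalogue records, keywords, and function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical catalogue records: the subject keywords are invented for
# illustration and are supplied in both French and English.
CATALOGUE = {
    "maximes": {
        "title": "Maximes",
        "author": "La Rochefoucauld",
        "keywords": ["maximes", "maxims", "morale", "ethics"],
    },
    "portnoy": {
        "title": "Portnoy's Complaint",
        "author": "Philip Roth",
        "keywords": ["roman", "novel", "satire"],
    },
}


def build_index(catalogue):
    """Map each lower-cased metadata word to the set of book ids using it."""
    index = defaultdict(set)
    for book_id, record in catalogue.items():
        words = (
            record["keywords"]
            + record["title"].split()
            + record["author"].split()
        )
        for word in words:
            index[word.lower()].add(book_id)
    return index


def search(index, query):
    """Return the ids of books whose metadata contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # The lookup compares strings only; nothing here knows or cares
    # which country a book or a searcher comes from.
    return set.intersection(*(index.get(term, set()) for term in terms))


INDEX = build_index(CATALOGUE)
french_query = search(INDEX, "morale")
english_query = search(INDEX, "ethics")
```

Both `french_query` and `english_query` find the same book, because the index holds whatever words the cataloguers put in, in whatever language. A search for La Rochefoucauld cannot land on Philip Roth unless somebody catalogues it that way.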

As a thought experiment, then, imagine the result of uploading a mass of French texts into such a system. It should be obvious that, if Jeanneney is right and there is some mystical Weltanschauung that would prevent French and American scholars from reading each other's work, the links between the Francophone writings would buoy them up the ranks. With the various texts together on one page, it is to be expected that the distinction would melt away over time. If he is wrong, of course, there is no problem anyway.
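The thought experiment can be run in miniature. What follows is a sketch of the textbook links-as-votes calculation, assuming the commonly quoted damping factor of 0.85; Google's production ranking is a trade secret and this is certainly not it. The seven "texts" and their citation links are hypothetical: two closed circles of scholars who cite only their own side, plus one lone French text that nobody cites.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote equally among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page citing nothing spreads its vote over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical citation graph (all names invented for illustration).
CITATIONS = {
    "fr_a": ["fr_b", "fr_c"],
    "fr_b": ["fr_a", "fr_c"],
    "fr_c": ["fr_a", "fr_b"],
    "en_a": ["en_b", "en_c"],
    "en_b": ["en_a", "en_c"],
    "en_c": ["en_a", "en_b"],
    "fr_lone": ["en_a"],  # in the system, but cited by nobody
}

RANKS = pagerank(CITATIONS)
```

In `RANKS`, the three French texts that cite one another score well above `fr_lone`, which gets only the baseline share. That is the whole point: French writings rise by being in the system together, and the ones that languish are the strays that arrive without their neighbours.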

What would achieve the goal of keeping French writings from the eyes of users? Exactly M. Jeanneney's policy. If he does not participate, then quite obviously the only French texts in the system will be whichever ones belong to the Oxford or Harvard holdings; and surely they will have been filtered through his "regard anglo-saxon"? What on earth is wrong with the man? One has to ask if he has ever used the internet (PS: it's like Minitel, but good). Perhaps the strangest part of the whole rant arrives when he gives the conditions on which he would agree to participate. (You can read the interview here.) Apparently, he would be happy to participate in the event that a European search engine of comparable power existed. Let us run through that one again; he will only open the archives when and if a European Google appears. Is there any sign of such a development? I haven't seen it. Surely he does not think it would be a good idea to have the world's libraries divided by trading bloc? Or would he then set out to re-digitise all the books available on Google Print - neatly doubling at least the total cost?

M. Jeanneney goes on at more length about the European social model (does he think that Google ranks pages by Republican donation? What does it mean, in concrete practical terms?), but why does he want to refuse the BNF's treasures to researchers in the Third World, for whom the cost of travelling to Paris might not be as trivial as it no doubt is for him? If he is concerned about the welfare of the Francophonie, surely this should be uppermost in his mind? But no; it seems his cross-hairs are poised over his toes with laser precision.

One point on which I would take issue with Google's project is its relations with other universal library schemes. Project Gutenberg, if anyone but geeks like me remembers, have been hammering the world's great books into computers since before the Web was born, without hope of gain. But our man would probably reject them on nationalistic grounds, as they are located in San Francisco. The new Public Library of Science, which aims to publish current scientific research for the free access of any interested person, would seem to be a much greater worry for him where scientific publishing is concerned. But M. le Directeur does not seem to have heard of it. How will Google's scheme connect up with these? It would seem a terrible waste to catalogue reams of journals twice, and locking them into many small and mutually incompatible sites would render the whole thing pointless. There's a need to sort out interoperability, and also to avoid trampling the volunteer-led schemes like Gutenberg.

Finally, we come to the increasingly hackneyed point of search governance. Clearly it's stupid to lock books up in national ghettos. Even more stupid would be building several mutually interfering internet libraries. Perhaps the best solution would be for the texts to be published in an agreed format by the libraries themselves on their websites: that way, full-text search would be open to all comers in the search engine business. In order not to prejudice existing contracts, Google could keep the copies on their own servers - perhaps they would get a slightly quicker result - but the principle of not permitting offprinting or republishing (in order not to piss off the publishers) could be maintained. The goal would be to make the libraries and the PLOS available to anyone with an internet connection.

Why doesn't the Bibliothèque Nationale propose something along those lines and do something constructive?