About

The foxglove (digitale in French) is a flower which grows in difficult situations, such as the crevices of old walls or rocks. It is grown commercially for life-saving medicine, and yet is also a celebrated poison. In this blog, I will try unsuccessfully to avoid too many bad puns, but this convergence of digital(is) and digitalisation is fascinating, and a useful trope to launch my glorying in the linguistic and organisational consequences of the collision between the digital and the humanities in France, seen from a British perspective.

Encoding documents and collections at Caen

And so, once more, and maybe for the last time, to Caen for Encodage de documents et de collections, the two-day culmination of the seminar series of Caen’s Pôle Document numérique, ‘organisé dans le cadre de la chaire d’excellence de Matthew James Driscoll’. We are met this time in the magnificent Belvedere room, affording splendid views over the surrounding countryside, which is bathed in unwonted spring sunshine. Matthew kicked off with an overview of his handrit project, focussing this time on the TEI’s manuscript description module, its evolution and how it fits the needs of his project (or was adjusted to do so). This was nicely complemented by Örn Hrafnkelsson of the National Library in Reykjavík, who described the manuscript holdings (crossing the frontier between Library and Archive) and the digitization workflow used by the Icelandic partners in the project.

The virtual reconstitution of the great libraries of the middle ages is one of the projects which mass digitization has been promising us for many years. The Bibliothèque virtuelle du Mont Saint-Michel is a classic example. Catherine Jacquemard, from CRAHAM at the Université de Caen Basse-Normandie, Jean-Luc Leservoisier, from the Scriptorial d’Avranches (where many but by no means all of the surviving manuscripts from the Abbey of Saint-Michel are now holed up) and Marie Bisson, the technician responsible for finding ways of pooling and harmonising the scattered records describing that library, gave a good report from the coal face where those actually trying to deliver on that promise have been labouring, stubbing their toes occasionally on the mutually inconsistent cataloguing of manuscripts in various institutions.

We then broke for lunch, noticing en passant that the campus seemed to have acquired a number of students disguised as angels, smurfs, gangsters, and other figures of popular iconography.

After lunch, Marie-Luce Demonet, of the CESR, Université de Tours, gave a whirlwind overview of the activities of the Bibliothèques Virtuelles Humanistes. I noted in particular the way it needs to treat uniformly both manuscript and printed sources, the ingenious use of Iconclass as a unifying vocabulary to provide image search facilities across both miniatures and ornamented letters, and the availability of an online lexicon of printers’ marks, but there was much more meat besides.

The SCRIPTA project at Caen uses a traditional MySQL database to catalogue charters, but is now evolving into something more like an XML database by means of the addition of a front end written in XMLmind. This was presented by Pierre Bauduin and Tamiko Fujimoto from the CRAHAM unit at Caen, with technical support from Anne Goloubkoff of the Pôle Document numérique. About this point in the day, the growing number of angels, smurfs, gangsters etc. outside the building reached a critical mass and started its rather noisy procession around the building and indeed the town, which made it difficult to follow all of Tamiko’s walk-through of the software. I did note however that the TEI markup deployed was using some rather politically incorrect values for its @type attributes, derived apparently from recommended practice in the archival community.

I rounded off the day with another appearance of my talk on the History of the TEI, which I still haven’t quite got to fit into the confines of a 45-minute presentation, despite two previous attempts. Ah well. If, on the other hand, you’re more interested in angels, smurfs, gangsters, etc. then you may prefer to look at my photos.

Next morning bright and early, we listened to Georg Vogeler from Graz (now located in something called the Center for information-modelling in the humanities, I learn: probably one word in German) describing Monasterium.net, which is a kind of collaborative digital library and hence maybe a collaborative research environment. It holds information about thousands of charters and legal documents, either aggregated or syndicated from 99 other archives in a dozen countries worldwide. As such, it is itself arguably (I say arguably because we argued about the meaning of the term) a kind of finding aid. It uses, of course, its own schema, drawing on both EAD and TEI P4, the former for the archive-level description, the latter for the encoding of individual documents. The resulting CEI schema is arguably neither fish nor fowl, but does have quite an impressive implementation, using eXist via XForms and a bunch of Ajax controls to deliver a cool integrated browse and search interface for finding aid and document alike (though only 10% of the documents are transcribed). Clearly any kind of cross-archive search ’n’ browse facility is a Good Thing, though whether this constitutes any kind of « digital edition », collaborative or otherwise, is more debatable, and debate it we duly did.

Lists of documents, and the collections which gave rise to them, were the theme of this second day. Lucien Reynhout from the Bibliothèque Royale de Belgique described a Belgian project to create « Sanderus Electronicus », a digital edition of an important 17th-century list of lists of books made by one Antonius Sanderus. This too was a collaborative project: Sanderus published as a single work about sixty different lists, derived from the reports of several correspondents whom he had asked to describe the holdings of several significant libraries. As such, it exhibits all the problems of inconsistency of description and detail we’re accustomed to in the digital domain, deriving perhaps also from the same ontological anxieties. What are the individual components of such lists? Which object in the FRBR model corresponds with their constituents? What, for example, does « duo Iuvenales » actually mean? Sanderus Electronicus will take the common-sense view that it is composed of list items (or so I believe) rather than anything more bibliographic, though it will also use a database called BIBALES to hold entries for the people, places, works etc. referenced.

After coffee, the man from the ministry, an amiable person called Florent Palluault, explained just why every archive in France, if it creates a catalogue at all, will do so using EAD, and how it came about that the digital version of the venerable Catalogue général des manuscrits des bibliothèques, all 116 volumes of it, is being updated and produced to the same standard. He described the workflow, which reminded me of some other large-scale retrodigitization projects: the OCR output from the original print volumes had been automatically split up into separate Word documents for editing; each notice, managed within a database, had then been exported as a Word document, with some degree of automatic conversion to EAD on the basis of the typography. Jérôme Sirdey (Bibliothèque nationale de France) then described PALME, a new project aiming to convert an existing MARC-based catalogue of 20th-century French literary mss. into EAD. Palluault then concluded with some speculation about future directions, notably a planned Catalogue collectif de France (CCFR): an ambitious union catalogue of mss. holdings across CALAME (funded by CNRS institutions), the BNF, and the new digital CGM, all still based on EAD, which clearly still has a great future in France.

EAD and TEI, and whether there was any hope for a happy marriage between them, was a theme to which Florence Clavaud (Ecole des Chartes) returned after lunch. Florence is a member of the expert group which is currently proposing revisions to the EAD standard and to the accompanying (French) Guide to best practice for its application, as well as being expert in both EAD and TEI, and so she had lots to say on both, unfortunately rather more than she really had time for on this occasion. Anne-Marie Turcan and Hanno Wijsman from IRHT concluded the session by presenting work building on the Biblifram project, notably a database under development at IRHT (and allegedly only accessible there, for IPR reasons) to support research in the history of the book: Libraria et Bibale.

The two days were rounded off in a very satisfactory way by Torsten Schassen, from the Herzog August Bibliothek (HAB) in Wolfenbüttel, recounting his experience as a participant in the EU-funded digitisation project Europeana Regia, which aims to catalogue and provide access to all the mss. from three specific royal collections now dispersed across a number of European libraries, and hence catalogued in a number of different formats (MARC, EAD, TEI, MAB, MXML… ) and seven different languages. Possibly an unusual aspect of the project, or one that Torsten chose to emphasize at any rate, was a requirement that the resulting system be both usable and interesting for the general public. As the party responsible for metadata, HAB had the thankless task of trying to define a kind of Dublin Core minimal set for manuscript description, which is reassuringly a clean subset derived from TEI P5, even if the European Library cannot currently handle TEI format data directly. The minimum data set was also internationalised, even though Europeana itself cannot currently handle multilingual data. There is even a button on the website which sends you the TEI <msDesc> for each manuscript that has one.
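For the curious, here is roughly what pulling a minimal record out of a TEI <msDesc> might look like. This is only a sketch of my own, not the project’s actual data set or code: the sample record is invented, the field names (identifier, title, date and so on) and the `MINIMAL_SET` mapping are my assumptions, and only the TEI P5 element names and namespace are real.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the prefix "tei" is just a local convenience.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample description, NOT taken from Europeana Regia.
SAMPLE = """<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <msIdentifier>
    <settlement>Wolfenbüttel</settlement>
    <repository>Herzog August Bibliothek</repository>
    <idno>Cod. Example 1</idno>
  </msIdentifier>
  <msContents>
    <msItem>
      <title>Psalterium</title>
    </msItem>
  </msContents>
  <history>
    <origin>
      <origDate notBefore="0800" notAfter="0850">first half of the 9th century</origDate>
    </origin>
  </history>
</msDesc>"""

# Hypothetical minimal set: output field name -> path relative to <msDesc>.
MINIMAL_SET = {
    "identifier": "tei:msIdentifier/tei:idno",
    "repository": "tei:msIdentifier/tei:repository",
    "place": "tei:msIdentifier/tei:settlement",
    "title": "tei:msContents/tei:msItem/tei:title",
    "date": "tei:history/tei:origin/tei:origDate",
}


def text_of(root, path):
    """Return the whitespace-normalised text of the first match, or None."""
    node = root.find(path, TEI_NS)
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def minimal_record(xml_text):
    """Reduce one <msDesc> to a flat dictionary of the fields listed above."""
    root = ET.fromstring(xml_text)
    return {field: text_of(root, path) for field, path in MINIMAL_SET.items()}


if __name__ == "__main__":
    for field, value in minimal_record(SAMPLE).items():
        print(f"{field}: {value}")
```

A field that is absent from a given description simply comes back as None, which seems the honest thing to do with catalogues as mutually inconsistent as the ones described over these two days.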

The take-away message from this presentation, as from the seminar as a whole, was encouraging: TEI is proving its usefulness in a variety of complex document management situations. I also think some serious investigation of the feasibility of integrating EAD within it is warranted: not much is needed and much would be gained.