Even abbreviated to IIIF (triple-I-F), International Image Interoperability Framework doesn’t quite trip off the tongue, but this collaborative project does exactly what it says on the tin. The IIIF is an international community of libraries, museums and digital project groups who have for the past few years been working on a set of tools to make digital image collections much, much more accessible, interoperable and user-friendly. The project’s stated goals are threefold:

To give scholars an unprecedented level of uniform and rich access to image-based resources hosted around the world.

To define a set of common application programming interfaces that support interoperability between image repositories.

To develop, cultivate and document shared technologies, such as image servers and web clients, that provide a world-class user experience in viewing, comparing, manipulating and annotating images.

Representatives from the IIIF came to the Weston Library on Monday 15 June to talk about the project in general and the Bodleian’s IIIF efforts in particular.

Intro to the IIIF

The first speaker was Tom Cramer, Chief Technology Strategist at Stanford University Libraries, who spoke about the project’s principles and what the IIIF hopes to achieve.

Image collections, ranging from digitised medieval manuscripts to modern video, have been available online for many years now. These image libraries are a great resource for researchers, but they’re far from perfect. Held in individual silos and accessed with a wide range of viewer software, some of it years old, there can be huge inconsistencies in usability and quality between different collections. On top of that, there is no easy way to compare images held by different institutions, and often no easy way to cite, share or download images. The IIIF recognised that modern scholars need a better set of tools, tools with the potential to take digital scholarship in entirely new directions.

The Biblissima Project is a hub for digital humanities projects in France that is focused on the written cultural heritage of the Middle Ages and the Renaissance. They have an implementation of Mirador that currently shows content hosted at multiple institutions and will, increasingly, provide access to French content.

The tools

To make this work, the community has developed two shared APIs [an explanation of APIs]. The image API retrieves the images from wherever in the world they’re held and allows the user to select an area of the image and to resize, rotate, and change the colour quality. It also generates a static URL which allows scholars to cite the image, even an image at a particular level of zoom, in a stable, reliable way [an explanation of static URLs]. The presentation API packages each image with its metadata so that users know the originating institution, the title of the image, what page of what book the image was taken from, and so on.
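The Image API’s options map directly onto the URL itself: the path carries the region, size, rotation, quality and format of the requested image, which is what makes the URLs citable. Here is a small sketch of assembling such a URL; the hostname and image identifier below are invented for illustration, not real Bodleian endpoints.

```python
# Sketch of building a IIIF Image API request URL.
# The URI pattern is {server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format};
# the server and identifier used below are made up for this example.

def iiif_image_url(server, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble a IIIF Image API URL from its five path parameters."""
    return f"{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A request for one region of the image, scaled to 400px wide and rotated 90 degrees:
url = iiif_image_url("https://iiif.example.org/images", "ms-bodl-264",
                     region="100,100,800,600", size="400,", rotation="90")
print(url)
# https://iiif.example.org/images/ms-bodl-264/100,100,800,600/400,/90/default.jpg
```

Because every combination of parameters is just another URL, a scholar can cite a particular detail of a page at a particular zoom level simply by sharing the link.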

Any interested institution will be able to apply these APIs to their data stores in order to make their own image collections fully interoperable and accessible. Meanwhile, external software developers have stepped in to build IIIF-compliant clients to display the images, including the Wellcome Library’s Wellcome Player and the Internet Archive BookServer.

Just one benefit of this new image interoperability is that, for the first time, it is possible to virtually reunite manuscripts and books which over time (and through carelessness or greed) have been broken up and dispersed across the world. But it also allows libraries, museums, archives and the like to offer rich image delivery and to publish an image just once in a way that allows for frequent re-use. Scholars will now be able to remix content from multiple sources, cite and share that content, and even annotate and transcribe more easily.

IIIF at the Bodleian

The Bodleian has been part of the project from the beginning. Matthew McGrattan, from the Bodleian’s Digital Library Systems and Services (BDLSS), and Judith Siefring, project manager for the Digital Manuscripts Toolkit, spoke about what the Library is doing with IIIF.

The Digital Manuscripts Toolkit (DMT) is a Bodleian effort to build tools to make it easier for other institutions, including smaller or less well-resourced institutions, to deploy their own IIIF-compliant image services. Part of the DMT project is to look at user needs and develop case studies to show some of the research possibilities opened up by the IIIF, which will also test the functionality of the Toolkit. To this end, four projects related to medieval manuscripts have just been selected for funding. The Apocalypse in Oxford is a digital comparison and editing project. Rolling History in 15th Century England will look at the practicalities of digitising scrolls (some 17ft long!). Rota Dominice Orationis will explore how to digitise wheel diagrams with revolving rings, or fold outs, or other unusual formats, and the Armenian Codicology & Palaeography project will create the very first palaeography teaching manual for the Armenian language.

The Bodleian also has its own IIIF-compliant image store: Digital.Bodleian. An upgrade of our current digital image collection, Digital.Bodleian will re-launch in July 2015 with a full suite of IIIF features, including the ability to open the Bodleian’s images in any other IIIF-compliant image viewer.

Comments and ideas welcome

Monday’s IIIF event was a thought-provoking and inspiring introduction to an exciting project, which opens up new opportunities for cataloguing, teaching, editing and long-term preservation, among many other possibilities. Your comments and ideas are welcome! Judith Siefring encouraged us to let her know our thoughts. You can get in touch at judith.siefring@bodleian.ox.ac.uk.

I spent last week in the lovely city of Copenhagen immersed in all things Drupal. It was a great experience, not just because of the city (so many happy cyclists!), but because I’d not seen a large scale Open Source project up close before and it is a very different and very interesting world!

Drupal Does RDF
OK, so I knew that already, but I didn’t know that from Drupal 7 (release pending) RDF support will be part of the Drupal core, showing a fairly significant commitment in this area. Even better, there is an active Semantic Web Drupal group working on this stuff. While “linked data” remains something of an aside for us (99.9% of our materials will not make their way to the Web any time soon), the “x has relationship y with z” structure of RDF is still useful when building the BEAM interfaces – for example Item 10 is part of shelfmark MS Digital 01, etc. There is also no harm in trying to be future-proof (assuming the Semantic Web is indeed the future of the Web! ;-)) for when the resources are released into the wild.
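That “x has relationship y with z” shape is easy to sketch without any RDF tooling at all. The following is a minimal, library-free illustration using plain tuples; the identifiers extend the Item 10 / MS Digital 01 example above and are not real BEAM data.

```python
# Minimal triple-store sketch: each fact is a (subject, predicate, object)
# tuple, mirroring RDF's "x has relationship y with z" structure.
# The identifiers are illustrative, not real BEAM data.

triples = [
    ("item:10", "dcterms:isPartOf", "shelfmark:MS-Digital-01"),
    ("item:11", "dcterms:isPartOf", "shelfmark:MS-Digital-01"),
    ("shelfmark:MS-Digital-01", "rdfs:label", "MS Digital 01"),
]

def objects(subject, predicate):
    """All objects z such that (subject, predicate, z) is asserted."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects x such that (x, predicate, obj) is asserted."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# Which items belong to the shelfmark?
print(subjects("dcterms:isPartOf", "shelfmark:MS-Digital-01"))
# ['item:10', 'item:11']
```

The appeal for an interface is that every navigation question (“what is this part of?”, “what does this contain?”) becomes a simple pattern match over the same uniform structure.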

Drupal Does Publishing
During his keynote, Dries Buytaert (the creator of Drupal) mentioned “distributions”. Much like Linux distributions, these are custom builds of Drupal for a particular market or function. (It is testament to the software’s flexibility that this is possible!) Such distributions already exist and I attended a session on OpenPublish because I wondered what the interface would look like and also thought it might be handy if you wanted to build, for instance, an Open Access Journal over institutional repositories. Mix in the RDF mentioned above and you’ve a very attractive publishing platform indeed!

Another distro that might be of interest is OpenAtrium which bills itself as an Intranet in a Box.

Drupal Does Community
One of my motivations in attending the conference was to find out about Open Source development and communities. One of the talks was entitled “Come for the Software, Stay for the Community” and I think part of Drupal’s success is its drive to create and maintain a sharing culture – the code is GPL’d for example. It was a curious thing to arrive into this community, an outsider, and feel completely on the edge of it all. That said, I met some wonderful people, spent a productive day finding my way around the code at the “sprint” and think that a little effort to contribute will go a long way. This is a good opportunity to engage with a real life Open Source community. All I need to do is work out what I have to offer!

Drupal Needs to Get Old School
There were three keynotes in total, and the middle one was by Rasmus Lerdorf of PHP fame, scaring the Web designers in the audience with a technical performance analysis of the core Drupal code. I scribbled down the names of various debugging tools, but what struck me the most was the almost bewildered look on Rasmus’ face when considering that PHP had been used to build a full-scale Web platform. He even suggested at one point that parts of the Drupal core should be migrated to C rather than remain as “a PHP script”. There is something very cool about C. I should dig my old books out! 🙂

HTML5 is Here!
Jeremy Keith gave a wonderful keynote on HTML5, why it is like it is and what happened to XHTML 2.0. Parts were reminiscent of the RSS wars, but mostly I was impressed by the HTML5 Design Principles, which favour a working Web rather than a theoretically pure (XML-based) one. The talk is well worth a watch if you’re interested in such things and I felt reassured and inspired by the practical and pragmatic approach outlined. I can’t decide if I should start to implement HTML5 in our interface or not, but given that HTML5 is broadly compatible with the hotchpotch of HTMLs we all code in now, I suspect this migration will be gentle and as required rather than a brutal revolution.

Responsive Design
I often feel I’m a little slow at finding things out, but I don’t think I was the only person in the audience to have never heard about responsive Web design, though when you know what it is, it seems the most obvious thing in the world! The problem with the Web has long been the variation in technology used to render the HTML. Different browsers react differently and things can look very different on different hardware – from large desktop monitors, through smaller screens to phones. Adherence to standards like HTML5 and CSS3 will go a long way to solving the browser problem, but what of screen size? One way would be to create a site for each screen size. Another way would be to make a single design that scales well, so things like images disappear on narrower screens, multiple columns become one, etc.

Though not without its problems, this is the essence of responsive design, and CSS3 makes it all possible. Still not sure what I’m on about? dconstruct was given as a good example. Using a standards-compliant browser (i.e. not IE! (yet)), shrink the browser window so it is quite narrow. See what happens? This kind of design, along with the underlying technology and frameworks, will be very useful to our interface, so I probably need to look into it more. Currently we’re working with a screen size in mind (that of the reading room laptop) but being more flexible can only be a good thing!

There were so many more interesting things but I hope this has given you a flavour of what was a grand conference.

Thanks to Neil Jefferies for a link to this article in The Register, which tells us that MS has begun two open source projects that will make it possible for developers to create tools to ‘browse, read and extract emails, calendar, contacts and events information’ which live in MS Outlook’s .pst file format. These tools are the PST Data Structure View Tool and the PST File Format SDK, and both are to be Apache-licensed.

It has been an odd couple of days. You know how it is. A problem that needs solving. A seemingly bewildering array of possible solutions and lots of opinions and no clear place to start. In an attempt to bring some shape to the mist, I’m going to start at the start, with the basics.

The Raw Materials

A collection of things.

A set of born digital items – mostly documents in antique formats.

EAD for the collection – hierarchical according to local custom and ISAD(G).

A spreadsheet – providing additional information about the digital items, including digests.

The Desired Result

A browser-based reader interface to the digital items: one that maintains the connections to the analogue components, remains faithful to the structure of the finding aid, and presents that structure in such a way as not to confuse the reader. Ideally the interface should also support aspects of a collaborative Web, where people can annotate and comment, as well as offer “basket”-like functionality (“basket” is the wrong term), maybe requests for copies and maybe even the ability to arrange the collection how they’d like to use it.

(I imagine you’ve all got similar issues! :-))

We put together a sketch for the interface to the collection for the Project Advisory Board and got some very useful feedback from that. Our Graduate Trainee Victoria has also done some great research on interfaces to existing archives and some commercial sites which provides some marvellous input on what we should and could build.

But this is where things get misty…

We have some raw materials, we have a vision of the thing we want to build (though that vision is in parts hazy and in parts aiming high! (why not eh?)), so where do we go from here?

(To put it another way, there are the foundations of a “model”, a vision of a “view”; now we need to define the “controller” – the thing that brings the first two together).

We could build a database and put all the metadata into it and run the site off that

We could build a set of resources (the items, the sub[0,*]series, the collection, the people), link all that data together and run the site off that.

We could build a bunch of flat pages which, while generated dynamically once, don’t change once the collection is up.

There is a strong contender for how it’ll be done (the middle one!) and in the next exciting episode I’ll hopefully be able to tell you more about the first tentative steps. For now, though, I’m open to suggestions – either for alternatives or for technologies that’ll help – and if you have already built what we’re after then please get in touch… 😉
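For what it’s worth, the middle option might look something like this in miniature: a set of addressable resources (collection, series, items), each carrying links to its parents and children, with the site generated by walking those links. This is purely a sketch; the identifiers and fields are invented, not our actual data model.

```python
# A miniature "set of linked resources" sketch: every collection, series
# and item is an addressable resource whose links drive the interface.
# Identifiers and fields are invented for illustration.

resources = {
    "/collection/abc": {"title": "ABC Papers", "contains": ["/series/abc-1"]},
    "/series/abc-1":   {"title": "Correspondence", "partOf": "/collection/abc",
                        "contains": ["/item/abc-1-10"]},
    "/item/abc-1-10":  {"title": "Letter, 1987", "partOf": "/series/abc-1"},
}

def ancestors(uri):
    """Walk partOf links upwards, e.g. to build a breadcrumb trail."""
    trail = []
    while "partOf" in resources[uri]:
        uri = resources[uri]["partOf"]
        trail.append(resources[uri]["title"])
    return trail

print(ancestors("/item/abc-1-10"))
# ['Correspondence', 'ABC Papers']
```

The attraction over a conventional database-backed site is that each resource has its own stable address, so the hierarchy of the finding aid falls out of the links rather than being baked into page templates.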

Just stumbled on a tool called reframe it. It’s available as a Firefox extension, and allows you to add marginalia to any website, whether it has a comments feature or not. Looks like you can share your comments with specific groups, or more widely. Reviews suggest it may need a little fine-tuning, but it could be a useful tool for researchers.

The digital lives conference provided a space to digest some of the findings of the AHRC-funded digital lives project, and also to bring together other perspectives on the topic of personal digital archives. At the proposal stage, the conference was scheduled to last just a day; in the event one day came to be three, which demonstrates how much there is to say on the subject.

Day one was titled ‘Digital Lifelines: Practicalities, Professionalities and Potentialities’. This day was intended mostly for institutions that might archive digital lives for research purposes. Cathy Marshall of Microsoft Research gave the opening talk, which explored some personal digital archiving myths on the basis of her experiences interviewing real-life users about their management of personal digital information.

Next came a series of four short talks on ‘aspects of digital curation’.

Cal Lee, of UNC Chapel Hill, emphasised the need for combining professional skills in order to undertake digital curation successfully. Archives and libraries need to have the right combination of skills to be trusted to do this work.

Naomi Nelson of MARBL, Emory University, told a tale of two donors: the first donor is the entity that gives or sells an archive to a library, and the second is the academic researcher. Libraries need to have a dialogue with donors of the first type about what a digital archive might contain; this goes beyond the ‘files’ that they readily conceive as components of the archive, and includes several kinds of ‘hidden’ data that may be unknown to them. The second donor, ‘the researcher’, becomes a donor by virtue of the information that the research library can collect about their use of an archive. Naomi raised interesting questions about how we might be able to collect this kind of data and make it available to other researchers, perhaps at a time of the original researcher’s choosing.

Michael Olson of Stanford University Libraries spoke of their digital collections and programmes of work. He mentioned work on the fundamentals – the digital library architecture (equivalent to our developing Digital Asset Management System – DAMS – which will provide us with resilient storage, object management, and tools and services that can be shared with other library applications). Their digital collections include a software collection of some 5,000 titles, containing games and other software. I think that sparked some interest from many in the audience!

Ludmilla Pollock, Cold Spring Harbor Laboratory, told us about an extensive oral history programme giving rise to much digital data requiring preservation. The collection contains videos of the scientists talking about their memories and has a dedicated interface.

After, we heard from a panel of dealers in archival materials: Gabriel Heaton of Sotheby’s, Julian Rota of Bertram Rota and Joan Winterkorn of Bernard Quaritch. I was curious to hear if the dealers had needed to appraise archives containing obsolete digital media. Digital material is still only a tiny proportion of collections being appraised by dealers, and it seems that what little digital material they do encounter may not be appraised as such (disk labels are viewed rather than their contents). While paper archives are plentiful, perhaps there’s not much incentive to develop what’s needed to cater for the digital (many archivists may well feel this way too!). What’s certain is that the dealer has to be quite sure that any investment in facilitating the appraisal of digital materials pays dividends come sale time.

Inevitably, questions of value were a feature of the session. The dealers suggest that archives and libraries are not willing to pay for born-digital archives yet; perhaps this stems from concerns about uniqueness and authenticity, and the lack of facilities to preserve, curate and provide access. It’s not like there’s actually much on the market at the moment, so perhaps it’s a matter of supply as much as demand? Comparisons with ‘traditional’ materials were also made using Larkin’s magic/meaningful values:

“All literary manuscripts have two kinds of value: what might be called the magical value and the meaningful value. The magical value is the older and more universal: this is the paper [the writer] wrote on, these are the words as he wrote them, emerging for the first time in this particular magical combination. We may feel inclined to be patronising about this Shelley-plain, Thomas-coloured factor, but it is a potent element in all collecting, and I doubt if any librarian can be a successful manuscript collector unless he responds to it to some extent. The meaningful value is of much more recent origin, and is the degree to which a manuscript helps to enlarge our knowledge and understanding of a writer’s life and work. A manuscript can show the cancellations, the substitutions, the shifting towards the ultimate form and the final meaning. A notebook, simply by being a fixed sequence of pages, can supply evidence of chronology. Unpublished work, unfinished work, even notes towards unwritten work all contribute to our knowledge of a writer’s intentions; his letters and diaries add to what we know of his life and the circumstances in which he wrote.”

The ‘meaningful’ aspects of digital archives are apparent enough, but what of the ‘magical’? Most, if not all, contributors to the discussion saw ‘artifactual’ value in digital media that had an obvious personal connection, whether Barack Obama’s Blackberry or J.K. Rowling’s laptop. What wasn’t discussed so much was the potential magical value of seeing a digital manuscript being rendered in its original environment. I find that quite magical, myself. I think more people will come to see it this way in time.

Delegates were then able to visit the digital scriptorium and audiovisual studio at the British Library.

After lunch, we resumed with a view of the ‘Digital Economy and Philosophy’ from Annamaria Carusi of the Oxford e-Research Centre. Some interesting thoughts about trust and technology, referring back to Plato’s Phaedrus and the misgivings that an oral culture had about writing. New technologies can be disruptive and it takes time for them to be generally accepted and trusted.

Next, four talks under the theme of digital preservation.

First an overview of the history of personal films from Luke McKernan, a curator at the British Library. This included changes in use and physical format, up to the current rise of online video populating YouTube, and its even more prolific Chinese equivalents. Luke also talked about ‘lifecasting’, pointing to JenniCam (now a thing of the past, apparently), and also to folk who go so far as to install movement sensors and videos throughout their homes. Yikes!

We also heard from the British Library’s digital preservation team about their work on risk assessment for the Library’s digital collections (if memory serves, about 3% of the CDs they sampled in a recent survey had problems). Their current focus is getting material off vulnerable media and into the Library’s preservation system; this is also a key aim in our first phase of futureArch. There was also mention of the Planets and LIFE projects. Between project and permanent posts, the BL have some 14 people working on digital preservation. If you count those working in web archiving, audiovisual collections, digitisation, born-digital manuscripts, digital legal deposit and other such areas, who also have knowledge of this field, it’s probably rather more.

William Prentice offered an enjoyable presentation on audio archiving, which had some similar features to Luke’s talk on film. It always strikes me that audiovisual archiving is very similar to digital archiving in many respects, especially when there’s a need to do digital archaeology that involves older hardware and software that itself requires management.

Juan-José Boté of the University of Barcelona spoke to us about a number of projects he had been working on. These were very definitely hybrid archives and interesting for that reason.

Next, I chaired a panel on ‘Practical Experiences’. Being naturally oriented toward the practical, I found plenty for me here.

John Blythe, University of North Carolina, spoke about the Southern Historical Collection at the Wilson Library, including the processes they are using for digital collections. Interestingly, they have use of a digital accessioning tool created by their neighbours at Duke University.

Erika Farr, Emory University, talked about the digital element of Salman Rushdie’s papers. Interesting to note that there was overlap of data between PCs, where the creator has migrated material from one device to another; this is something we’ve found in digital materials we’ve processed too. I also found Rushdie’s filenaming and foldering conventions curious. When working with personal archives, you come to know the ways people have of doing things. This applies equally to the digital domain – you come to learn the creator’s style of working with the technology.

Gabby Redwine of the Harry Ransom Center, University of Texas at Austin gave a good talk about the HRC’s experiences so far. HRC have made some of their collections accessible in the reading room and in exhibition spaces, and are doing some creative things to learn what they can from the process. Like us, they are opting for the locked down laptop approach as an interim means of researcher access to born-digital material.

William Snow of Stanford University Libraries spoke to us about SALT, or the Self Archiving Legacy Toolkit. This does some very cool things using semantic technologies, though we would need to look at technologies that can be implemented locally (much of SALT’s functionality is currently achieved using third-party web services). Stanford are looking to harness creators’ knowledge of their own lives, relationships and stuff to add value to their personal archives using SALT. I think we might use it slightly differently, with curators (perhaps mediating creator use, or just processing?) and researchers being the most likely users. I really like the richness in the faceted browser (they are currently using Flamenco) – some possibilities for interfaces here. Their use of Freebase for authority control was also interesting; at the Bod, we use The National Register of Archives (NRA) for this and would be reluctant to change all our legacy finding aids and place our trust in such a new service! If the NRA could add some Freebase-like functionality, that would be nice. Some other clever stuff too, like term extraction and relationship graphs.

The day concluded with a little discussion, mainly about where digital forensics and legal discovery tools fit into digital archiving. My feeling is that they are useful for capture and exploration. Less so for the work needed around long-term preservation and access.

Academic Earth presents ‘thousands of video lectures from the world’s top scholars’. So far, contributors are from top U.S. universities: Berkeley, Harvard, MIT, Princeton, Stanford and Yale. There is scope for expansion and the Academic Earth team are inviting new partners to contribute.

This is a great idea, but my main reason for linking to Academic Earth is that I rather like the interface. It feels very clean and it’s easy to navigate.

There was some discussion yesterday on the EAD list about evaluating the effectiveness of online finding aids. Wendy Duff drew attention to a project she is involved in, called ‘Archival Metrics’, which has put a toolkit together for undertaking user evaluation of online finding aids. There are also other archival metrics toolkits available for download. Am wondering whether this will be useful to us as interface developments become a larger part of our work.

Not the traditional form of indexing an archive, I know, but it seems to me that automagically extracted metadata formed into tag clouds would be a marvellous way of navigating through some digital archives.

We could present clouds at different levels of granularity – at the collection level, in series and lower levels all the way down to the item. We could even present clouds across multiple aggregations, be they of series, collections or items. This could be fun.

For some digital archives, I think tag clouds are probably a ‘must’. Poorly structured and overly large email archives are good candidates.

One of the downsides of the ‘hybrid archive’ is that we can’t necessarily generate tag clouds that draw on all the contents of the archive. All ‘physical’ material and non-textual digital formats are excluded unless these things are already tagged by creators. They can, of course, be tagged later by cataloguers and/or users. I guess that we need to recognise that imbalance in our user interface, to help our users get to grips with the nature of research in a hybrid archive.

I know that automatic metadata extraction may have shortcomings, but I’d really like to see a fusing of standardised subject headings with tag clouds. We can have the best of both worlds, surely?
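One way that fusion might work: weight automatically extracted terms by frequency, then boost any term that also appears among the controlled subject headings, so the curated vocabulary rises in the cloud without drowning out the automatic extraction. The sketch below is entirely illustrative; the headings and sample text are made up.

```python
# Sketch: fuse automatic term extraction with curated subject headings.
# Terms are weighted by raw frequency; any term that matches a controlled
# heading gets a boost so it rises in the cloud. All data here is invented.

from collections import Counter

text = ("letters letters drafts notebook drafts letters "
        "poetry notebook poetry poetry correspondence")
subject_headings = {"correspondence", "poetry"}  # hypothetical controlled terms

counts = Counter(text.split())

def cloud_weight(term, count, boost=2.0):
    """Frequency-based weight, boosted for controlled subject headings."""
    return count * (boost if term in subject_headings else 1.0)

weights = {term: cloud_weight(term, n) for term, n in counts.items()}

# Largest weights render largest in the cloud.
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(term, w)
```

Here ‘poetry’ (frequent and a heading) outweighs ‘letters’ (frequent only), while ‘correspondence’ (a heading that appears just once) still earns a place alongside the mid-frequency terms.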

There have been lots of examples of tag clouds about recently, including TagCrowd and Wordle.