“No, we can’t just script it” and other refrains from (tired) archival data migrators

Below are the slides and script I used for the talk I gave with Danielle Robichaud at Access 2017 in Saskatoon. Note that the script has been edited from my messy notes to be readable, so it’s not 100% verbatim.

So today we’re going to be talking about migrating archival data, and since this is (generally speaking) a non-archival crowd I’m going to start with a primer on archival data and then a quick history of archival description technology. And then I’m going to turn it over to Danielle who’s going to present a case study of migrating archival data at the University of Waterloo.

I should start by saying that, well, we do script a lot of data transformations. I do a lot of data migration work, and I’m certainly not manually transforming data to match a specific standard. I rely on the developers that I work with to do a ton of scripting. But today we wanted to focus on the factors that make that work difficult and look at a situation where saying “Why can’t you just script it?” isn’t helpful.

All of our descriptions are original work. Every time we describe archival material, it’s the first time it’s ever been described. So we spend a lot of time researching our collections, learning about them, writing about them, and getting deeply, emotionally involved with them. This is why archivists, like all cataloguers, are protective of our data.

Slide 5 (Sara)

Each archival record, along with being original all the time, is part of, or represents, an organic, inter-related, hierarchical conglomeration of material. One record could describe one photograph, or it could describe a folder full of photographs, or it could describe a collection of hundreds of folders containing thousands of photographs. There isn’t a one-to-one relationship between the object and the record.

Slide 6 (Sara)

Generally speaking, we don’t describe the single photograph, because we don’t have the mandate to do so. We describe some level of the hierarchy above that, relying on researchers to drill down and find the individual items. For many archivists, describing individual objects only happens when we receive special funding from the donor, or we decide that item is important enough, or we have a specific need to write those descriptions, as Dr Christen emphasized this morning.
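For the developers in the room, here is a minimal sketch of what that hierarchy can look like as a data structure. It is not taken from any particular archival system: the `Record` class, the level names, and the `extent` field are all assumptions made up for illustration. The point it shows is that the number of records and the number of physical objects are two different counts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One archival description at some level of the hierarchy (hypothetical model)."""
    level: str   # illustrative labels such as "collection", "file", "item"
    title: str
    # Objects covered by this record that have no description of their own.
    extent: int = 0
    children: List["Record"] = field(default_factory=list)


def count_records(record: Record) -> int:
    """Count this record plus every record below it."""
    return 1 + sum(count_records(child) for child in record.children)


def objects_covered(record: Record) -> int:
    """Count the physical objects represented by this record and its descendants."""
    return record.extent + sum(objects_covered(child) for child in record.children)


# Hypothetical example: one collection, two folders, one individually described photograph.
collection = Record(
    level="collection",
    title="Example photograph collection",
    children=[
        # A folder of 40 photographs described only at the folder level.
        Record(level="file", title="Folder A", extent=40),
        # A folder where one photograph got its own record and 11 did not.
        Record(
            level="file",
            title="Folder B",
            extent=11,
            children=[Record(level="item", title="Single photograph", extent=1)],
        ),
    ],
)

print(count_records(collection))    # 4 records
print(objects_covered(collection))  # 52 photographs
```

Four records stand in for 52 photographs here, and only one photograph can be found by its own description. Any migration script has to decide what to do about the other 51.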

Slide 7 (Sara)

So we’ve got complex data; I would also think of it as fragile data – not easy to replicate, maybe only stored in a database somewhere. Surely, archivists should be concerned with standardization. And this is true – but adoption of national or international standards has been slow and uneven, especially when those standards conflict with systems that are already in place. In libraries, cataloguing standards have been shared by institutions for many decades, but archives never had that, so we developed systems internal to our institutions, sometimes even internal to a single collection, to describe our material. Over the past, say, 40 years, progress has been made, but the adoption of technological solutions to support standardized, machine readable data has been even slower.

Slide 8 (Sara)

So this is a very brief history of archival description and archival description technology. This is in no way comprehensive, and it’s from a very Canadian perspective.

1990: In 1990, RAD, the Canadian Rules for Archival Description, was first published; it’s become the de facto standard for archival description in Canada today.

Mid-90s: By the mid-90s, some institutions had started to adopt database solutions to take the place of paper records; this was in no way universal, though, and many institutions still used catalogue cards and paper finding aids (and some still do today).

1995: In 1995, a hyperlinked version of RAD was released, which helped archivists navigate the 200+ page standard.

Late 90s-00s: By the late 90s and early 2000s, specialized archival management software started to become available.

Slide 9 (Sara)

1998: In 1998, Encoded Archival Description brought XML to the archives, giving us the ability to harvest and share our data.

2001: In 2001, the International Council on Archives published a report recommending a standardized, open source tool for encoding archival finding aids, building on the availability of EAD.

2008 (July): After 7 long years, and as an indirect result of that OSARIS report, ICA-AtoM 1.0-beta was released at the ICA Congress in Kuala Lumpur.

2008 (November): And by November of that year it had implemented support for RAD.

So it took 18 years for our Canadian standard to have a system where it could be represented online. And in the years since 2008, AtoM has become the de facto system for online archival description in Canada.

Slide 10 (Sara)

Finally, this quote is perhaps a bit unfair, because certainly there are tech-inclined, progressive, forward-thinking archivists among us. But I like it, because I think that there are parallels to what we’re talking about today. In 1970, Jay Atherton wrote that “Just to mention the words ‘computer’ or ‘automation’ in some circles is to invite cold suspicious stares of hostility, making one feel as though he had said something dirty.” Our complicated, messy, homegrown data, and the snail’s pace of progress in developing technological solutions to make archivists’ work easier, have perhaps made us especially wary of adopting a “tech will fix it” mentality.

The collection consists of approximately 2 million negatives and is one of our most heavily used collections, which made it an obvious choice for our Islandora (Waterloo Digital Library) pilot, but the decision presented a series of challenges tied to how the negatives were described.

The descriptive information is limited to what was provided by KWR staff: title, shoot date, and whether the photos are colour or black and white. The information is useful to have, but since it was added by the photographer for use by office staff, it doesn’t align well with the types of questions researchers want answered.

Slide 13 (Danielle)

Here’s an exercise to illustrate the disconnect between envelope titles and expectations.

What type of images would you expect to see in an envelope from 1953 titled “St. Agatha Orphanage”?

Copying file-level records for re-use at the item level doesn’t work, because they often don’t reflect the contents of the images. That falls short of our migration goals, which are improving accessibility and discoverability.
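To show why the "just script it" version of this falls short, here is a rough sketch of what copying an envelope record down to its negatives would produce. The class names, the identifier scheme, the count of 12 negatives, and the black and white value are all hypothetical; only the title and year come from the exercise above. It is not how our records are actually stored.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class FileRecord:
    """Envelope-level description with the three fields supplied by office staff."""
    title: str
    shoot_date: str
    colour: str


@dataclass(frozen=True)
class ItemRecord:
    """Item-level description for a single negative (hypothetical shape)."""
    identifier: str
    title: str
    shoot_date: str
    colour: str


def copy_to_items(file_record: FileRecord, file_id: str, item_count: int) -> List[ItemRecord]:
    """Naively create one item record per negative by copying the envelope's fields."""
    return [
        ItemRecord(
            identifier=f"{file_id}-{n:03d}",  # made-up identifier scheme
            title=file_record.title,
            shoot_date=file_record.shoot_date,
            colour=file_record.colour,
        )
        for n in range(1, item_count + 1)
    ]


def distinct_titles(items: List[ItemRecord]) -> Set[str]:
    """Collect the different titles a keyword search could match against."""
    return {item.title for item in items}


# Title and year from the exercise above; the other values are assumptions.
envelope = FileRecord(title="St. Agatha Orphanage", shoot_date="1953", colour="b&w")
items = copy_to_items(envelope, file_id="envelope-0001", item_count=12)

print(len(items))                   # 12 item records
print(len(distinct_titles(items)))  # 1 title shared by all of them
```

The script runs without complaint and creates twelve records, but every one of them says the same thing. None of them tells a researcher who or what is actually in each image.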

How can we improve the descriptions to facilitate keyword searching and the identification of people or events?