Archivist Skills & Digital Preservation

Any discussion that includes “digital preservation” and “traditional archivist skills” in the same sentence always interests me. This reflects my own personal background (I trained as an archivist) but also my conviction that the skills of an archivist can have relevance to digital preservation work.

I recently asked a question along these lines after I heard Catherine Taylor, the archivist for Waddesdon Manor, give an excellent presentation at the PASIG17 Conference this month. She had started life as the paper archivist and has evidently grown into the role of digital archivist with great success. Her talk was called “We can just keep it all can’t we?: managing user expectations around digital preservation and access”.

As Taylor told it, she was a victim of her own success; staff always depended on her to find (paper) documents which nobody else could find. The same staff apparently saw no reason why they couldn’t depend on her to find that vital email, spreadsheet, or Word document. To put it another way, they expected the “magic” of the well-organised archive to pass directly into a digital environment. My guess is that they expected that “magic” to take effect without anyone lifting a finger or expending any effort on good naming, filing, or metadata assignment. But all of that is hard work.

What’s so great about archivists?

My question to Catherine concerned the place of archival skills in digital preservation, and my sense that they are sometimes neglected or overlooked in digital preservation projects. One possible scenario is that the “solution” we purchase is an IT system, so its implementation is in the hands of IT project managers. Archivists might be consulted as project suppliers; more often, I fear, they are ignored, or don’t speak up.

Catherine’s reply affirmed the value of such skills as selection and appraisal, which she believes have a role to play in assessing the overload of digital content and reducing duplication.

After the conference, I wondered to myself what other archival skills or weapons in the toolbox might help with digital preservation. A partial tag cloud might look like this:

[Image: word cloud of archival skills, created using http://worditout.com]

We’ve got an app for that

What tools and methods do technically-able people reach for to address issues associated with the “help us to find stuff” problem? Perhaps…

Automated indexing of metadata, where the process is operated by machines on machine-readable text.

Using default metadata fields – by which I mean properties embedded in MS Word documents. These can be exposed, made sortable and searchable; SharePoint has made a whole “career” out of doing that.

Network drives managed by IT sysadmins alone – which can include everything from naming to deletion (but also backing up, of course).

De-duplication tools that can automatically find and remove duplicate files. Very often, they’re deployed as network management tools and applied to resolve what is perceived as a network storage problem. The way they work is based on recognition of checksum matches or similar rules.

Search engines – which may be powerful, but not very effective if there’s nothing to search on.

Artificial Intelligence (AI) tools which can be “trained” to recognise words and phrases, and thus assist (or even perform) selection and appraisal of content on a grand scale.
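To make the checksum matching behind de-duplication tools a little more concrete, here is a minimal sketch in Python. It is my own illustration, not the workings of any particular product: the function names sha256_of and find_duplicates are hypothetical, and I have assumed SHA-256 as the checksum. It walks a folder, computes a checksum for every file, and groups together the files whose checksums match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents share the same checksum.

    Hypothetical helper for illustration only: it reports candidate
    duplicates and never deletes anything.
    """
    by_checksum = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_checksum[sha256_of(path)].append(path)
    # Keep only the checksums that more than one file shares.
    return [paths for paths in by_checksum.values() if len(paths) > 1]
```

Calling find_duplicates(Path("some/shared/folder")) returns a list of groups, each group being the paths of files with identical content. Note that the sketch only reports matches. Deciding which copy is the record copy, and whether the others can safely go, is an appraisal decision that the checksum cannot make for us.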

Internal user behaviours

There are some behaviours of our beloved internal staff / users which arguably contribute to the digital preservation problem in the long-term. They could all be characterised as “neglect”. They include:

Keeping everything – if not instructed to do otherwise, and there’s enough space to do so.

Free-spirited file naming and metadata assignment.

Failure to mark secure emails as secure – which is leading to a retrospective correction problem for large government archives now.

I would contend that a shared network run on an IT-only basis, where the only management and ownership policies come from sysadmins, is likely to foster such neglect. Sysadmins might not wish to get involved in discussions of meaning, context, or use of content.

How to restore the “magic”?

I suppose we’d all love to get closer to a live network, EDRMS, or digital archive where we can all find and retrieve our content. A few suggestions occur to me…

Collaboration. No archivist can solve this alone, and the trend of many of the talks at PASIG was to affirm that collaboration between IT, storage, developers, archivists, librarians and repository managers is not only desirable – it is in fact the only way we’ll succeed now. This might be an indicator of how big the task is ahead of us. The 4C Project said as much.

Archivists must change and grow. Let’s not “junk” our skillsets; for some reason, I fear that we are encouraged not to tread on IT ground, to assume that machines can do everything we can do, and to believe that our training is worthless. Rather, we must engage with what IT systems, tools and applications can do for us, and how they can help us realise the results in that word cloud.

Influence and educate staff and other users. And if we could do it in a painless way, that would be one miracle cure that we’re all looking for. On the other hand, Catherine’s plan to integrate SharePoint with Preservica (with the help of the latter) is one move in the right direction: for one thing, she’s finding that the actual location of digital objects doesn’t really matter to users, so long as the links work. For reasons I can’t articulate right now, this strikes me as a significant improvement on a single shared drive sitting in a building.

Conclusion

I think archivists can afford to assert their professionalism and make their voice a little louder (in line with Steph Taylor’s talk), where possible stepping in at all stages of the digital preservation narrative. At the same time, we mustn’t cling to the “old ways”; rather, we should start to discover ways in which we can update them. John Sheridan of The National Archives has already outlined an agenda of his own to do just this. I would like to see this theme taken up by other archivists, and propose a strand along these lines for discussion at the ARA Conference.