Archives Unleashed: A Series of Datathons for Cultural Heritage

Web archives have been around for over twenty years – while the collections are great, access and usage remain a considerable problem. Historians and others need to get involved to explore various models for research access. But how do we even get things started?

It will require community. It isn’t just going to be historians working on their lonesome that are going to be able to bootstrap a revolution in the use of web archives for historical research… but neither is it going to be computer scientists on their own, or librarians, or communications scholars, or any one disciplinary perspective. We will all need to work together.

The crowd at the Toronto Datathon in March 2016 (Archives Unleashed 1.0)

These datathons aim to get people working together toward four main goals:

Build a community that can cohere around web archives, and find scholarly voices;

Articulate a common vision of web archiving and tool development, to give the field some additional shape;

Avoid the black boxes of search engines we don’t understand – if you run a search on billions of documents, you need to know why the search engine has given you the first ten or twenty results, or else it will really be writing your histories for you;

And most importantly, equip us as a collective – here I really mean society more generally – to work with born-digital cultural resources.

Forming teams at the Library of Congress (Archives Unleashed 2.0)

This means that we need to bring different perspectives together. We need hackers, or those who can work with data and code, to make our vision a reality and to generate new, accessible open-source code that lets us rise to this challenge. But we also need yackers, or humanists, who have the technical chops to have meaningful conversations and who can bring the professional wisdom they have developed over years of theoretical and historical engagement.

Accordingly, these datathons bring together the authors of various web archive analytics platforms, collecting institutions, and researchers to collaborate on how best to develop the tools, frameworks, and access platforms needed to work with these materials. They have the three-fold goal of facilitating web archives access, community building, and graduate skills training. Each event has seen us bring together fifty individuals, break them into interdisciplinary teams, provide the teams with web archival data and computing infrastructure, and set them to work. The results have been significant: analyses of political party websites and social media streams, and the development of dynamic tools. For example, one team at the Toronto event generated the in-browser D3.js visualizer that is now part of Warcbase. Another team at the Library of Congress datathon worked with the 2004 American federal election crawl and created a visualization to show the geographic locations mentioned by parties and candidates in the final days of the election.

A team works late at the Internet Archive during Archives Unleashed 3.0

We can’t wait for our next event at the British Library! Stay tuned for some news out of that, as the teams form, the projects are put together, and conversations about the future of web archiving research and community continue.