Report from the field: network analysis

In July 2017 the Folger Institute welcomed participants and faculty to the third of its Early Modern Digital Agendas (EMDA) gatherings—an NEH-funded Institute for Advanced Topics in the Digital Humanities. The EMDA institutes train early modern scholars in digital methods, digital tools, and theoretical frameworks, exposing them to the latest methods and thinking in the field, with faculty drawn from academia and beyond.

Participant John Ladd produced a network visualization of the three EMDA institutes.

Each iteration of the institute has been more focused than its predecessor as the organizers—the director Jonathan Hope and the Folger Institute dream team of Owen Williams and Elyse Martin—have designed programs that responded to current scholarly debates and developments within the fields of early modern literature and the digital humanities. When they came to plan the 2017 program on the theme of “network analysis,” I was delighted they called on me to join their happy band to help design and pitch it to the NEH and to co-direct the institute with Jonathan.

This was clearly a dinner that other people wanted to eat at too: an impressive volume of applications had to be whittled down to fifteen participants. We captured a rich breadth of participants from six countries, various career stages (from graduate students to heads of departments), and varieties of employment (including technical specialists and librarians, as well as scholars working within more traditional departments). More importantly, they worked on diverse and fascinating network-heavy material including recipes, European newspapers, postal itineraries, printers and publishers, letters, translations, and textual dedications.

It is hard to give a meaningful overview of two weeks of rich presentations and teaching, but the sessions that our faculty served up were as varied as the people in the room. The institute itself began with a two-part day providing different perspectives on the scholarly landscape of networks: the first half was a Latourian account of Actor Network Theory (led by the Folger’s own director Mike Witmore, Mattie Burkert, and Ellen MacKay); a second session introduced the field of network science and the ways that it has begun to shape scholarship in the arts and humanities (led by Max Schich, Sebastian Ahnert, and Scott Weingart).

Max Schich gave a whistle-stop tour through scores of exciting projects that use methods from network science. Photo by Owen Williams.

This taster of what might be possible with some computational power whetted an appetite that would be sated in the following days.

The institute was designed to give participants hands-on experience of the kinds of scholarship that are possible using both visualisation and quantitative network analysis. So presentations of the kinds of work coming out of this field alternated with a structured set of “build days” to help the participants see the stages in creating a project to realise their own plans.

These “build days” started with the basics but rapidly moved on to some pretty high-powered computational approaches. In sessions spread over the two weeks, participants thought about how to select their source material and learned methods for extracting data for network analysis, building and structuring a database, “cleaning” their data, and visualizing and analyzing it with off-the-shelf tools (specifically Palladio and Gephi), as well as for writing their own code for tailored network analysis. Weingart did some heavy lifting with the teaching in week one, showing participants how to make “node” and “edge” lists: the former is a list of the entities in one’s network, and the latter is a list of all of the connections or ties between those nodes and the basic starting point for all visualizations and computation executed on the network. To get everyone practicing their new skill, Weingart got the participants to read three days’ worth of entries from Pepys’s diary, and then to make edge lists for all the interactions they observed. While most people created edges between people represented in some kind of social exchange, there were also some more creative responses, including one team who made a network of the co-occurrence of words in sentences.
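For readers who have not made one before, here is a minimal sketch of what a node list and an edge list can look like in Python. It is not the exercise material used at the institute: the names in `observed` are invented placeholders, the helper functions are hypothetical, and the `Source`/`Target`/`Weight` header is only a common convention for edge tables, so check what your own tool expects.

```python
import csv
import io
from collections import Counter

# Hypothetical interactions noted while reading a diary entry.
# The names are illustrative placeholders, not transcribed from any source.
observed = [
    ("Diarist", "Wife"),
    ("Diarist", "Clerk"),
    ("Clerk", "Diarist"),   # the same tie, recorded in the other direction
    ("Diarist", "Patron"),
    ("Wife", "Maid"),
]


def build_edge_list(pairs):
    """Count undirected ties, so (a, b) and (b, a) become one weighted edge."""
    weights = Counter(tuple(sorted(pair)) for pair in pairs)
    return [(a, b, w) for (a, b), w in sorted(weights.items())]


def build_node_list(edges):
    """Collect every name that appears at either end of an edge."""
    return sorted({name for a, b, _ in edges for name in (a, b)})


def edges_to_csv(edges):
    """Serialise the edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = build_edge_list(observed)
nodes = build_node_list(edges)
print(nodes)
print(edges_to_csv(edges))
```

Treating ties as undirected is a modelling choice made here for simplicity. A project about who wrote to whom would probably keep the direction and skip the `sorted(pair)` step.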

Week one also came with a health warning: data can be very dirty. Sometimes we create our data sets from scratch (perhaps by sitting with a specific archive and making all our entries by hand). This kind of data needs some light “dusting” to get rid of the kinds of inconsistencies and mistakes we make when typing things in by hand. On the other hand, sometimes data sets are pulled from online archives, and then some hardcore deep cleaning is required. I gave the salutary example of cleaning the 37,101 name fields in the State Papers archive I’m working on with Sebastian, and the nine months (full time, plus nine further months part time) we dedicated to that process. The message was hammered home by a live cleaning activity: adding to and editing the Six Degrees of Francis Bacon database. Blaine Greteman extended the debate with some discussion about the toleration we can have for dirt, accompanied by some great memes.
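To give a flavour of the “dusting” end of that spectrum, the sketch below tidies a handful of name strings and maps known variants onto one canonical form. It is an illustration under stated assumptions, not the process used on the State Papers data: the `VARIANTS` lookup, the sample spellings, and both function names are hypothetical, and real deep cleaning needs expert judgement about which records refer to the same person.

```python
import re
import unicodedata


def normalise(name):
    """Reduce a name to a comparison key: unify Unicode form, replace stray
    punctuation with spaces, collapse whitespace, and ignore case."""
    text = unicodedata.normalize("NFKC", name)
    text = re.sub(r"[.,;]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    return text.casefold()


# Hypothetical, hand-curated lookup from comparison key to canonical name.
# In practice a table like this is built up record by record.
VARIANTS = {
    "wm cecil": "William Cecil",
    "william cecill": "William Cecil",
    "william cecil": "William Cecil",
}


def clean_names(raw_names):
    """Return (cleaned, unresolved): canonical names where a variant is known,
    and a list of names that still need a human decision."""
    cleaned, unresolved = [], []
    for raw in raw_names:
        canonical = VARIANTS.get(normalise(raw))
        if canonical is None:
            tidy = " ".join(raw.split())  # only fix spacing; do not guess
            cleaned.append(tidy)
            unresolved.append(tidy)
        else:
            cleaned.append(canonical)
    return cleaned, unresolved


# Illustrative input only; these strings are not drawn from any archive.
raw = ["  Wm.  Cecil ", "William  Cecill,", "William Cecil", "Mary   Sidney"]
cleaned, unresolved = clean_names(raw)
print(cleaned)
print(unresolved)
```

The sketch deliberately leaves unknown names alone and reports them, rather than guessing at a match, since a wrong merge quietly fuses two people into a single node.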

The hard messages were balanced with some light relief: with drinks receptions and some alcohol-themed (if literally sober) sessions. At the end of week one Meirelles and Coleman ran a session on visualization that involved an analogue questionnaire and paper and pencil design to display the results. The questionnaire contained items about our fields of study, the Shakespeare play we’d read most recently, and favourite cocktails, amongst others. My attempt at showing the correlation between academic discipline and alcohol choice is shown here:

This analogue chart shows that EMDA17 participants who study English literature, Digital Humanities, and book history prefer Old Fashioneds, while historians are more varied in their tastes. Photo by Ruth Ahnert.

The morning session prepared the room for the session in the afternoon, which explored what digital tools like Palladio (designed by Coleman and her team) can help us “see” in our data.

The set of skills that the participants had developed by the end of the first week meant they were equipped to run their own spontaneous study session (or “groupthink” as it was labeled) over the weekend. But week two took things up a notch: moving from Palladio, which is quick and intuitive, to the non-intuitive beast that is Gephi was challenging for many, but expertly handled by faculty member Silke Vanbeselaere. We also moved on to quantitative methods and the use of algorithms. Sebastian and I shared some of the statistics we were able to extract from the 130,000 letters in our data set and the predictive power we can harness to understand the network profiles of, for example, spies and conspirators. Sebastian also lifted the hood and showed what was underneath: a pit of snakes, or, to be more precise, the Python code (the NetworkX library). Participants were introduced to some very basic coding which nevertheless enabled them to run some powerful network analysis algorithms on their data.
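As a hedged illustration of how little code such an analysis can need, the sketch below uses NetworkX to compute two standard measures on a toy network. It is not the code shown at the institute or used on the letters data set: the correspondents are invented single-letter placeholders, and the choice of degree and betweenness centrality is an assumption made for the example.

```python
import networkx as nx

# Hypothetical correspondence ties: two tightly knit groups (A, B, C and
# D, E, F) joined only through a go-between, G.
letters = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("D", "E"), ("D", "F"), ("E", "F"),
    ("C", "G"), ("G", "D"),
]

graph = nx.Graph()
graph.add_edges_from(letters)

# Degree: how many distinct correspondents each person has.
degree = dict(graph.degree())

# Betweenness centrality: the share of shortest paths between other pairs
# of people that pass through each person.
betweenness = nx.betweenness_centrality(graph)

# Rank people from highest to lowest betweenness.
ranking = sorted(betweenness, key=betweenness.get, reverse=True)
for person in ranking:
    print(person, degree[person], round(betweenness[person], 3))
```

In this toy network G has only two correspondents yet scores highest on betweenness, because every shortest path between the two groups runs through G. That gap between measures is the kind of pattern a single count of letters would miss.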

The varied diet of the teaching sessions, combined with lots of social lubricants and libations, led to two very important outputs from the institute. The first was notable progress on the individual projects that the participants brought with them, which were showcased in 20-30 minute presentations on the last two days of the institute. The other, which came out strongly during the presentations, was the ethos of collaboration and exchange that had developed during the two weeks. The participants repeatedly gave shout-outs to the technical assistants Pierce Williams and Caitlin Rizzo and to other participants who had helped them with technical issues. The clearest demonstration of collaboration, however, came with the decision of two participants, Yann Ryan and Rachel Midura, to give a joint presentation using their distinct data sets on news dissemination and postal itineraries, respectively: they sought to see if the postal itineraries could predict how news would spread, and if deviations from those predictions could yield important revelations. We look forward to seeing this work develop when we come back together in a year’s time for the EMDA17 reunion workshop.

Others too found kindred spirits, such as the cluster working on printer networks (Blaine Greteman, Rebecca Emmett, Tara Wood, Marie-Alice Belle), or those using diplomatic and other correspondence (Matt Symonds, Thea Lindquist, Catherine Medici-Thiemann, Ingeborg van Vugt). The connections, however, were not just thematic. Yann Ryan commented on Twitter about Melissa Schultheis’s recipe data:

Many more lateral connections between the projects emerged during these presentations, such as the effects of suppression, from the religious suppression of Genelle Gertz’s Tudor mystics, to the threats of censorship to Helkiah Crooke’s Mikrokosmographia discussed by Jillian Linster. Michael Gavin explored semantic similarity and geographical proximity via networks of the eighteenth-century Highlands.

The institute’s concluding discussion was necessarily future-focused, covering both short-term, realistic aims and long-term dreams. Regarding the former, the participants talked about how they wanted their projects to grow and develop and how they hoped to fold these methods into their teaching. Others had bigger dreams: John Ladd conjured a future in which OCR (Optical Character Recognition) for manuscripts could make all texts available to us, and in which there was an easily shareable network of images from all over the world: the promise of Linked Open Data (LOD).

The institute ultimately concluded with a powerful debate about the broader ramifications of so-called technological progress. Burkert suggested that LOD was our version of the eighteenth-century encyclopedic dream, but David Baker pointed out that we needed to understand how this dream looks from the outside, how it looks to our colleagues who might be opponents of DH, and those who are worried about the corporatization of the university. Freely accessible data and predictive power are things they fear, and perhaps rightly so. All the technology that is developed to help us with our scholarly quests can potentially be used against us, as the recent case of Cambridge Analytica’s micro-targeting of propaganda on Facebook has shown. But this is no reason to stop, argued Schich: “the enlightenment is not going to be safe by stopping the enlightenment.” If we are to take anything away from this institute, I think it should be this.

For those who are interested, the archive of #EMDA17 tweets can be accessed here, a searchable version can be accessed here, and a visualization is available here.

Dr. Ruth Ahnert is Senior Lecturer in English at Queen Mary University of London. She researches broadly in the area of Tudor culture and writing, often using digital methods from the field of Complex Networks to study Tudor letters. Work in this area has been funded by the Folger Shakespeare Library, Stanford Humanities Center, the Arts and Humanities Research Council (UK), and a QMUL Innovation Grant. Her first book, The Rise of Prison Literature in the Sixteenth Century (Cambridge University Press, 2013), explored the kinds of writing undertaken by Tudor prisoners.

Guest Author is the byline used for guests at The Collation, including Folger fellows, program participants, and readers who wish to share their research on collection items. Please see the end of each post for information about that author.