Sherwood Helps Shape Supercomputing Tool for Literary Analysis

Cofounder of the IUP Center for Digital Humanities and Culture, Kenneth Sherwood spent a week at UT Austin in May 2013 as an invited participant in “High Performance Sound Technology for Access and Scholarship.” This National Endowment for the Humanities-sponsored Institute for Advanced Topics in Digital Humanities centers on an innovative use of supercomputers for the analysis of spoken word audio.

Sherwood joined the project at the beginning of its initial phase, which involves working with developers to optimize a sound analysis tool called ARLO for research applications in spoken word audio. Sherwood is among those bringing a research background in orality and poetry performance, joining scholars with expertise in poetry, Native American culture, and oral history from institutions that include the Library of Congress and StoryCorps.

“The process is exciting and challenging,” Sherwood said. “As scholars, we have the opportunity to shape the tool we will use for doing our research rather than being given a tool off the shelf.” The scholars spent the week learning how to use ARLO and creating provisional queries of large data sets in order to develop a feature list that developers can use to modify the tool to better suit humanities research.

One of the larger challenges facing digitizers is that the vast quantity of material can become difficult to search or navigate.

Reflecting one of the emerging trends in Digital Humanities—research into “big data”—this project will help researchers develop algorithms that will “listen to” massive amounts of digitized audio data now housed in many research collections. Traditionally, scholars in literary studies have worked closely with individual texts. Some see digitization and the application of algorithmic analysis as heralding a paradigm shift in the humanities, changing the very nature of the questions which can be asked.

Over the course of the year, Sherwood and other scholars will be using ARLO to query audio databases, searching for patterns that would be too laborious to locate manually, with the aim of presenting their findings through professional conferences and publications.

“The tool does not think for you, but it extends the kinds and scope of questions you can ask,” Sherwood said. For instance, “if I am interested in joke telling or audience response such as laughter in poetry readings, or whether poetry readings became more dramatic during the 1960s, it might take hundreds of hours of listening to identify a cross-range of instances or the frequency of jokes within a decade of audio.” According to Sherwood, feeding ARLO a proper query should produce interesting results within a matter of minutes. Student scholars affiliated with IUP’s Center for Digital Humanities and Culture will have the opportunity to learn how to use the tool and contribute to this ongoing research.

The initial phase of the project concludes in May 2014. Participants and organizers are hopeful that further funding will allow the project to advance to the next level. The HIPSTAS project is hosted by the University of Texas, and the project is further supported through the Illinois Informatics Institute of the University of Illinois. In order to crunch gigabytes of audio data, ARLO runs on the NSF-funded XSEDE network of supercomputers.