Unearthing America’s rural history

Imagine a time when information was far less accessible than it is today. Without a smartphone to pull out for a fact check in a matter of seconds, you had to trust that the information presented to you was accurate.

Throughout the 1930s and 1940s, Roy Stryker served as the head of the Information Division for the Farm Security Administration. He sent out photographers to capture images of rural life for a photo-documentary project. The full corpus consisted of over 250,000 photos, although only 175,000 of these were preserved; they now reside in the Library of Congress. Yet many of these images never reached the public eye. Stryker would punch holes in the negatives of images he didn't want the public to see, a practice known as "killing" an image.

Stryker used photography to change public perception, and even Congressional perception, of what was needed to alleviate the dire situation of farmers after the Dust Bowl and the Great Depression. Whether or not you consider this propaganda, it was certainly a media strategy that had a significant impact on our history. Now, with the help of XSEDE, the full stories can be told.

Co-PIs Elizabeth Wuerffel and Jeffrey Will of Valparaiso University, along with the rest of their team (Alan Craig and Sandeep Puthanveetil Satheesan of XSEDE, Marcus Slavenas of the National Center for Supercomputing Applications, and Paul Rodriguez of the San Diego Supercomputer Center [SDSC]), have spent the last year analyzing the images from Stryker's photo-documentary project. This research, funded solely by XSEDE and conducted on both SDSC's Comet and Oasis, entailed using pre-existing algorithms, as well as creating new ones, to study the large corpus of images belonging to the Farm Security Administration - Office of War Information from 1935-1944, held by the Library of Congress. These images provide a glimpse into rural life and small-town America during this decade and through important events in America's history such as the Dust Bowl and the Great Depression.

Their work serves many disciplines: the arts, humanities, American history, science, and more. By creating and expanding upon different algorithms, they're empowering researchers to ask big questions, and relying on the power of a supercomputer to help answer them. Using a supercomputer to analyze this massive number of images allows researchers to ask questions like "How did the images in this database depict rural life? What visual narratives appear across the corpus? Are there significant differences in approach from photographer to photographer? Which images did Stryker kill and what can we learn from those images?"

Answering these types of questions, however, came with its own set of challenges.

"It's a far cry from that very high-level distant question to the nitty-gritty, on-the-ground work of figuring out how to take an image in, run image processing on it, and then determining what to output in order to answer those high-level questions," Will said. "Then, given the metrics that we are able to generate from the images, we have to do the data mining on that to say, well, how do we take such a mass of information and use it to answer the questions that we're looking at?"

"Without the ECSS [XSEDE's Extended Collaborative Support Service], we wouldn't have been able to do this project. They have been great as far as providing support implementing the algorithms. Their familiarity with the hardware and what can be done has really enabled us to do this project," Will stated.

To the team, it feels like the work is only getting started. They received a funding renewal for a second year to expand on what they've begun. Now that much of the infrastructure is in place, they're excited to start asking more questions.

"In some ways, it feels like we're just getting going," Wuerffel stated. "Now that we have the data set and some of these image analysis tools, we can start to sift through that using SQL and figure out what kind of questions we can already ask of the data set, and, with answers to those questions, another set of questions will emerge."

The team also hopes to incorporate visualizations into their research in the future, both to better understand the results of the image analyses and to help communicate these findings to the public.
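
Before any visualization, the SQL sifting Wuerffel describes might look something like the sketch below. It is only an illustration: the table name `image_metrics`, its columns, the placeholder photographer labels, and every value in it are invented for this example and are not the team's schema or results.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical per-image metrics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE image_metrics (
            image_id        TEXT PRIMARY KEY,
            photographer    TEXT,     -- placeholder labels, not real names
            killed          INTEGER,  -- 1 if the negative was hole-punched
            mean_brightness REAL      -- made-up metric on a 0..1 scale
        )
        """
    )
    # Invented rows, used only to make the query runnable.
    rows = [
        ("img001", "photographer_a", 0, 0.50),
        ("img002", "photographer_a", 1, 0.25),
        ("img003", "photographer_b", 0, 0.75),
        ("img004", "photographer_b", 0, 0.25),
        ("img005", "photographer_b", 1, 0.50),
    ]
    conn.executemany("INSERT INTO image_metrics VALUES (?, ?, ?, ?)", rows)
    return conn


def summary_by_photographer(conn):
    """Return (photographer, total images, killed images, average brightness)."""
    query = """
        SELECT photographer,
               COUNT(*)             AS total,
               SUM(killed)          AS killed_count,
               AVG(mean_brightness) AS avg_brightness
        FROM image_metrics
        GROUP BY photographer
        ORDER BY photographer
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in summary_by_photographer(build_demo_db()):
        print(row)
```

A grouped query like this is one way a question such as "do photographers differ?" could be turned into rows that can then be charted, with each answer suggesting the next query.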

Another discovery the team is excited to pursue in the coming year is the difference that image quality makes when running an algorithm or image analysis. Most of the photographs are available in three or four different sizes, and what they've found so far is that running the same analysis on a lower-resolution image and on the highest-resolution version does not produce the same results.
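
To see why resolution can change an analysis, consider the toy sketch below. It is not one of the team's algorithms: the `edge_density` metric, the threshold, and the synthetic checkerboard "image" are all assumptions chosen to make the effect obvious. Fine detail that registers as edges at full resolution averages away entirely once the image is halved.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def edge_density(image, threshold=50):
    """Fraction of horizontally adjacent pixel pairs differing by more than threshold."""
    pairs = 0
    edges = 0
    for row in image:
        for left, right in zip(row, row[1:]):
            pairs += 1
            if abs(left - right) > threshold:
                edges += 1
    return edges / pairs if pairs else 0.0


def checkerboard(size):
    """Synthetic grayscale image alternating between 255 and 0 at every pixel."""
    return [
        [255 if (r + c) % 2 == 0 else 0 for c in range(size)]
        for r in range(size)
    ]


if __name__ == "__main__":
    full = checkerboard(8)
    half = downsample_2x(full)
    print("full resolution:", edge_density(full))
    print("half resolution:", edge_density(half))
```

Real photographs are far less extreme than a checkerboard, but the same mechanism (detail smaller than the new pixel size being blended together) is one plausible reason a metric could shift between image sizes.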

In the coming year, they're interested in determining the trade-offs of using a medium-quality image versus a high-quality one. What they learn will benefit not just work on this corpus of images, but work on other image corpora, as well as photographers and data miners.