Our unit in Intro DH right now is on mapping. In class we’ll be working on creating maps with Palladio. We also had a preliminary introduction to data, tables, and maps by experimenting with Google Fusion Tables. In preparation for class, I imported a data set consisting of a list of images from the Cushman Archive into a few different tools to experiment.

Here is the map of the data in a Google Fusion map:

This is Miriam Posner’s version of the data. She downloaded the data from the Cushman archives site, restricted the dates slightly, and cleaned it up. This data went straight into Google’s Fusion Tables as is. The map shows the locations of the objects photographed. One dot for every photograph. Locations are longitude-latitude geocoordinates.

Then I tried CartoDB. I’ve never used it before, but it’s fairly user-friendly for anyone willing to spend some time just playing around and seeing what works and what doesn’t. The first thing I discovered was that CartoDB (unlike Fusion Tables) does not like geocoordinates in one field. In the Cushman dataset, the longitude and latitude were together in one field, but in CartoDB they must be disaggregated. So to create the following map in CartoDB I first followed the instructions in their FAQ to create separate columns for longitude and latitude. Then I had fun playing with their map options.
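For anyone who would rather do the split before uploading, here is a minimal Python sketch of the same idea. It is not the procedure from the CartoDB FAQ. The field name "geocoordinates", the comma separator, the latitude-first order, and the sample rows are all assumptions made up for illustration, so check them against the actual file first.

```python
def split_coordinates(value, sep=","):
    """Split a combined 'first<sep>second' coordinate string into two floats."""
    first, second = (part.strip() for part in value.split(sep))
    return float(first), float(second)


# Made-up sample rows; the real dataset's field names and values may differ.
rows = [
    {"title": "Sample photo A", "geocoordinates": "41.88, -87.62"},
    {"title": "Sample photo B", "geocoordinates": "33.75, -84.39"},
]

for row in rows:
    # Assumption: latitude comes first. Swap the targets if the file stores
    # longitude first.
    row["latitude"], row["longitude"] = split_coordinates(row["geocoordinates"])

for row in rows:
    print(row["title"], row["latitude"], row["longitude"])
```

If the order is wrong, the dots will land in the wrong hemisphere, which is easy to spot on the first map.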

This is just a plain map, but with the locations color coded by the primary genre of each photograph (direct link to CartoDB map):

This one shows the photographs over time (go to the direct link to CartoDB map, because on the embedded map below, the legend blocks the slider):

Then I decided I wanted to see if I could map based on states or cities (for example, summing the number of photographs in a certain state, and color-coding or sizing the dots on the map based on the number of photographs from that city or state). So I used the same process to disaggregate cities and states as I used to disaggregate longitude/latitude; I just changed the field names. I noted, though, that for some reason, trying to geocode by city led to some incorrect locations. If you zoom out in the map below, you’ll see that some of the photographs of objects in Atlanta, Georgia, have been placed in the Caucasus, in the countries of Georgia and Armenia. This map represents many efforts to clean the data through automation, simply telling CartoDB again and again to geocode the cities or states. It didn’t work well.
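One thing that might reduce this kind of ambiguity is to give the geocoder a fuller place string, with city, state, and country joined together, instead of a bare "Georgia". The sketch below only builds that string. The helper name, the default country, and the idea that CartoDB's geocoder would resolve the result more reliably are all assumptions, not something tested here.

```python
def qualified_place(city, state, country="United States"):
    """Join the non-empty parts of a place name into one comma-separated string."""
    parts = [p.strip() for p in (city, state, country) if p and p.strip()]
    return ", ".join(parts)


# Hypothetical examples: a full city/state pair, and a record with no city.
print(qualified_place("Atlanta", "Georgia"))
print(qualified_place("", "Georgia"))
```

Records from outside the United States (the Mexican ones, for instance) would need their own country value rather than the default.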

I also couldn’t figure out a good way to visualize density (the number of photographs from each state, for example). So I downloaded my new dataset from CartoDB as a CSV file and then imported it into Tableau (Desktop 9.0). By dragging and dropping the “state” field onto the workspace, I quickly created a map showing all the states where photographs in the collection had been taken:
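The per-state count itself is easy to compute outside any mapping tool. Here is a minimal sketch using Python's standard library; the "state" field name and the sample records are made up for illustration.

```python
from collections import Counter

# Made-up sample records standing in for rows of the exported CSV.
records = [
    {"state": "Illinois"},
    {"state": "Georgia"},
    {"state": "Illinois"},
    {"state": ""},  # a record with no state is skipped
]

# Count photographs per state, ignoring rows where the state is blank.
counts = Counter(r["state"] for r in records if r["state"])

for state, n in counts.most_common():
    print(state, n)
```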

Then I dragged and dropped Topical Subject Heading 1 (under the Dimensions list on the left in Tableau) onto my map, followed by the “Number of Records” Measure (under the Measures list on the left in Tableau). The result was a series of maps, one for each of the subjects listed in the TSH1 field:

Note that Tableau kindly tells you how many entries it was unable to map (the “## unknown” note in the lower right)!

Below I’ve summed the number of records (no genre, topical subject, etc.) for each state. For this, it’s better to use the graded color option than the stepped color option. If you have just five steps or stages of color, it looks like most of the states have the same number of images, when the counts are actually more varied. The graded color (used below) shows the variations better.
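A small sketch of why a few color steps can flatten the picture: if the steps divide the range into equal-width bands and one state has far more photographs than the rest, nearly every other state falls into the lowest band. The counts are invented, and equal-width steps are an assumption about how a stepped palette behaves, not a description of Tableau's internals.

```python
def stepped_bin(value, lo, hi, steps=5):
    """Index of the equal-width band (0 .. steps-1) that a value falls into."""
    if hi == lo:
        return 0
    index = int((value - lo) / (hi - lo) * steps)
    return min(index, steps - 1)  # the maximum value belongs to the top band


def graded_shade(value, lo, hi):
    """Position of a value on a continuous 0.0 to 1.0 color ramp."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


# Invented per-state photograph counts with one very large outlier.
counts = [3, 8, 15, 22, 40, 900]
lo, hi = min(counts), max(counts)

bins = [stepped_bin(c, lo, hi) for c in counts]
shades = [graded_shade(c, lo, hi) for c in counts]

print(bins)    # five of the six states share the lowest band
print(shades)  # every state gets its own position on the ramp
```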

This map also shows that the location information for photographs from Mexico was not interpreted properly by Tableau. Sonora (for which there is data) is not highlighted.

Then I decided hey, why not a bubble map of locations, so here we go. Same data as above map, but I selected a different kind of visualization (called “Packed Bubbles” in Tableau).

When I hovered on some of the bubbles, I could easily see the messy data in Tableau. Ciudad Juarez is one of the cities/states that got mangled during import, probably due to the accent:
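One common way an accented name gets mangled is an encoding mismatch: text saved as UTF-8 but read as Latin-1 (or the reverse). Whether that is what happened in this import is not confirmed. The sketch below only shows what that particular failure looks like and how it can sometimes be reversed.

```python
original = "Ciudad Juárez"

# Simulate the mismatch: UTF-8 bytes misread as Latin-1.
mangled = original.encode("utf-8").decode("latin-1")
print(mangled)

# When that was the only thing that went wrong, the round trip can be undone.
repaired = mangled.encode("latin-1").decode("utf-8")
print(repaired)
```

If the mangled values in the file do not match this pattern, the cause is something else, and the repair step above will not help.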

Finally, a simple map with circles corresponding to the number of photographs from that location. (Again clearly showing that the info from Mexico is not visible. In fact, 348 items seem not to be mapped.)

Obviously the next step would be to clean the data, probably using OpenRefine (formerly Google Refine), and then reload.

Digital pedagogy and student knowledge production (October 20, 2015)

The past two weeks in my Introduction to Digital Humanities course, students have been using the open-source content management system Omeka to create online exhibits related to the early Christian text, the Martyrdom of Perpetua and Felicitas.

I was astounded by their accomplishments. The students raised thoughtful questions about the text, found items online related to Perpetua and Felicitas to use/curate/re-mix, and then created thoughtful exhibits on different topics in groups.

None of them knew much, if anything, about early Christianity (I think one student had taken a class with me before). None of them had used Omeka before. Few of them would have considered themselves proficient in digital technology before taking the class.

Each student then went home and found three items online related to Perpetua and Felicitas or any of the themes and questions we brainstormed. (They watched out for the licensing of items to be sure they could reuse and republish them.)

In class each person added one item to the Omeka site, and we talked about metadata, licensing, and classification.

We revised, revised, revised; in groups, each student added two more items.

We grouped the Items into Collections (which required discussion about *how* to group Items)

Then in small groups, students created Exhibits based on key themes we had been discussing. Each group created an Exhibit; each student a page within the exhibit.

What made it work?

Before even starting with Omeka, we read about cultural heritage issues and digitization, licensing, metadata, and classification — all issues they had to apply when doing their work

Lots and lots of in class time for students to work

Collaboration! Students all contributed items to Omeka, and then each of them could use any other student’s items to create their exhibits; we had a much more diverse pool of resources by collaborating in this way

Peer evaluating: students reviewed each other’s work

The great attitude and generosity of the students: they completely immersed themselves in it.

The Omeka CMS forced students to think about licensing, sourcing, classification, etc., as they were adding and creating content.

The writing and documentation in these exhibits exceeded my expectations, and also exceeded what I usually see in student papers and projects. Some of this is due to the fact that I have quite a few English majors, who are really good at writing, interpreting, documenting. I also was pleasantly surprised by the level of insight from students who were not formally trained in early Christian history. They connected items about suicide and noble death, as well as myths about the sacrifice of virgins; they found WWII photos of Carthage.

Are there some claims in these exhibits that I would hope someone more steeped in early Christian history would modify, nuance, frame differently? Sure. And not all items are as well sourced or documented as others. We also did not as a class do a good job of conforming all of our metadata to set standards (date standards, consistent subjects according to Dublin Core or Library of Congress subject categories, etc.). We tried, but it was a lot of data wrangling for an introductory class. And honestly, I was satisfied that they wrestled with these issues and were as consistent as we were.

So in sum, for undergraduate work, I was pleased with the results, and am happy to share them with you.

In my Introduction to Digital Humanities course, my students are conducting very basic text analysis using Voyant and AntConc. One of the datasets we are using is a set of martyr texts taken from the now public domain Ante-Nicene Fathers series (available at newadvent.org).

I’m a little bit of a skeptic regarding wordclouds; I generally regard them as useful insofar as they are aesthetically pleasing and in that they may spark a deeper interest in a text or set of texts.

Thus, I was pleasantly surprised to see the results of the wordcloud in Voyant. A martyr is a witness, quite literally in Greek. And lo and behold: the most prominent word (after accounting for a standard English stop word list) is “said.” Speaking. Witnessing?
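What sits behind a wordcloud is a word count with the stop words removed. Here is a minimal sketch of that step; the tiny stop list and the sample sentence are made up for illustration (this is not Voyant's stop list, and the sentence is not quoted from the martyr texts).

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real stop lists are much longer.
STOP_WORDS = {"the", "and", "to", "of", "a", "he", "she", "i"}


def top_words(text, n=3):
    """Return the n most frequent words in text, ignoring stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common(n)


# Invented sample text in the style of a trial narrative.
sample = (
    "The governor said to her, sacrifice. She said, I will not. "
    "And he said to the guards, take her away."
)

print(top_words(sample, 1))
```

Even in this toy example, a text built out of reported speech pushes “said” to the top.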

We also put the martyr texts through AntConc, and we tested the Martyrdom of Perpetua and Felicitas against the rest of the dataset to check for key words: just which words were distinctive to Perpetua and Felicitas? Once again I was pleasantly surprised.

AntConc: Keywords in Perpetua and Felicitas measured against other martyr texts in English translation

Note the prominence of “I” and “my” and “me.” The “keyness” of the first-person pronouns reflects the presence of a section of the martyr text often called Perpetua’s “prison diary”; according to tradition, the diary was written by Perpetua herself. The keyness of “she” and “her” of course reflects the text’s women protagonists.
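For readers curious about what a “keyness” score measures, here is a minimal sketch of one widely used statistic, log-likelihood, which compares how often a word appears in a target text with how often it appears in a reference corpus. Which statistic and settings were used in class is not stated here, and the counts in the example are invented.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of a word, target text versus reference corpus."""
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    # Expected counts if the word were equally common in both corpora.
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size

    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


# Invented counts: a word seen 50 times in a 1,000-word target text and
# 100 times in a 10,000-word reference corpus.
overused = log_likelihood(50, 1000, 100, 10000)

# Same relative frequency in both corpora gives a score of zero.
balanced = log_likelihood(10, 1000, 100, 10000)

print(overused > balanced)
```

The score alone does not say whether a word is overused or underused in the target text; comparing the observed count with the expected count gives the direction.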