As I mentioned in an earlier post, my Digital Scholarly Editing class is working to make a digital scholarly edition of the diary of World War I soldier Albert Woodman. That previous post went into some detail about our TEI encoding schema, the decisions that we made regarding how to render the text using a machine-readable format. For this post, however, I would like to take a step back and look at the larger digital scholarly edition that we are creating, and how we strove to create something that is faithful to the original source material—not only in content, but also in the “feel” of the site that we built. Those decisions all have a place in the broader context of the digital scholarly edition as an academic medium, and in how such editions fulfil the role of representing individual objects in the digital realm.

Patrick Sahle, who maintains the catalogue Digital Scholarly Editions, v 3.0, defines a scholarly edition as “a critical representation of historical documents” (Sahle, “About”). He goes on to say that a digital scholarly edition is more than just a digitized print edition: it contains elements that cannot be rendered in a printed medium without a loss of data. In other words, a digital scholarly edition presents a document or closely related group of documents, and accompanying scholarly material such as annotations and critical articles, in a way that a printed edition cannot. That difference can include audio or video material that cannot be included in books, or functionality such as rendering the text in different ways or cross-referencing links in a manner that printed editions are unable to match.

A digital scholarly edition does not have to be a massive, large-scale project. Our edition of the Woodman diary falls under the category of what Anne Burdick and others call “‘lowercase’ Digital Humanities projects”, which “are typically carried out by individuals or small teams in consultation with experienced staff” (Burdick et al. 124). The fact that digital scholarly editions can be created by small teams such as this is a boon to objects such as the Woodman diary, which, as small snapshots of an individual’s life, may not be seen as having a sufficiently large potential audience to justify the expenses associated with traditional print publication. The diary is an important object, however, and Woodman’s story has as much right to be told as anyone’s. The fact that a digital scholarly edition of the diary can be produced without much expense, while simultaneously providing more flexible options for its presentation than can printed media, makes electronic publication an ideal venue for the Albert Woodman diary.

However, as Sahle’s definition suggests, creating a digital scholarly edition of the diary involves more than just taking snapshots of the diary and throwing them up online. Creating such a digital facsimile, while perfectly functional in terms of preservation and accessibility, does not a scholarly edition make. In creating the digital scholarly edition of the diary, therefore, we made an effort to take advantage of the opportunities that our medium afforded us, including adding videos of interviews, contextual articles, maps, and annotations.

The added flexibility that the digital platform affords, however, is also a danger. In our meetings discussing what we would do with the edition, it was very easy to get caught up in ideas about what we could do—creating geo-referenced versions of the maps in the diary that highlight areas that Albert Woodman visited, for instance, or writing the history of the locales mentioned in the diary and pinning them on maps, or including videos of historical re-enactments. Now, these are all ideas that have merit, and many of the ideas that we came up with are reflected in the final product in some form, but the key is moderation. The physical artefact that we are building our digital object to showcase is Woodman’s diary, and we kept coming back to the same point: nothing should pull focus from the diary.

Our precept that the diary should always have centre stage went on to inform how the site itself was built. I’ve already written about how we handled the text itself, but it bears restating that one of our key decisions about displaying the content of the diary was to include the ability to view the encoded text alongside a facsimile image of the diary, in which the layout of the typed text mirrored that of the written text. That layout decision is perhaps the most obvious example of our making choices that respected the original object, but the physical diary is reflected in other elements of the digital scholarly edition as well, including in the overall design.

While each member of the digital scholarly editing class has a hand in all elements of the object that we are building, the overall design of the site was primarily my responsibility. In creating the site, I chose a flat design over skeuomorphism (replicating the look and feel of an object in one medium in order to improve familiarity and ease of use in a different medium) for reasons similar to those driving my decisions in the redesign of Susan Schreibman’s Versioning Machine, which I have already discussed in depth in my blog post on the topic. While the Versioning Machine is a born-digital object, however, our digital scholarly edition is very closely linked to a physical object, and so while our object is not laid out or navigated like a physical book, it does take its visual cues from the original diary, most notably in its use of colour. The buttons, logo, background, and text of the site are all rendered using various browns. These colour choices were far from arbitrary: rather, I pulled the colours directly from the images of the original object. The heading text, the background, the colour of the button for the current window—all owe their hues to the cover, the ink, and even the mottled, ageing paper that Woodman originally used to write his words. Pulling colours from the diary not only makes the site more representative of the original object, but it also makes the original facsimiles, which we include throughout our edition, more cohesive with the site as a whole. If the site were bright red, for example, the browns and yellows of the journal would look quite jarring, but because their colours are subtly reflected in the larger page, the images of the original diary feel consistent with the larger digital object—as well they should, since they are the heart of the edition itself.

Then there is the site’s main identity, the logo itself. A logo is, in essence, an encapsulation of an identity, something that should reflect the object it represents while being flexible and recognisable. A logo or header can be useful for introducing a digital scholarly edition and giving it a distinct character. In the digital scholarly edition of the Diary of Mary Martin, a woman living in Monkstown in Ireland in 1916, the site’s identity is established by a detailed header that includes photographs of Mary Martin and her family superimposed on images of her diary itself (“A Family at War”). By way of contrast, another edition, Electronic Beowulf, uses a simple, modern logo in hues of blue-green, using colours mirroring those of a small illustration from the text that is featured at the top of each of the edition’s main pages (“Studying Beowulf”). In this case, while the logo is entirely modern, it maintains its tie to the original work through a visual connection to the original material.

In making a logo for our digital scholarly edition, I wanted to stay true to our philosophy that the diary is the centrepiece around which the entire digital object was constructed. Therefore, I searched through the diary in order to render our title using Albert Woodman’s own handwriting, piecing together the words from different pages, rendering them as vector images, normalising their colour, and then placing them back on a vector-rendered image of Woodman’s original diary paper (I used vector images not only so that the logo could be rendered in any size without sacrificing fidelity, but also so that the logo text could be used without a background if necessary, affording us more flexibility). The end result is, in essence, reflective of the overall digital scholarly edition itself—it is created from an analogue object, deconstructed, and rebuilt in a way that takes advantage of the capabilities of modern technology, all while staying true to the original source. Because the source is, after all, what the digital scholarly edition is all about.

In a previous post, I wrote at length about my participation in a practicum involving Susan Schreibman’s Versioning Machine, and about the design work that I was doing. However, my role in the practicum includes more than just design; I am to help with outreach as well. Part of outreach includes letting an audience know that something exists, but another important part of outreach lies in collecting feedback from users. While one of the easiest methods of collecting information is through the use of surveys, it is potentially more illuminating—especially when collecting feedback on how easy or difficult something is to use—to conduct a focus group.

Conducting a focus group means bringing a number of people together in one place and then having them use a product while discussing their experiences or answering specific questions. Such a group generally requires that a moderator be present, both to keep participants on task and to collect the data being generated. Focus groups can be resource-intensive, since a moderator must be present for the duration of the group, a location (physical or virtual) for conducting the group needs to be secured, and participants must be willing to donate a greater amount of time than is generally required by a simple survey. As such, focus groups tend to generate smaller data sets than surveys do, but the trade-off is quite significant: first, focus groups can collect users’ immediate feedback from the very point at which they use a product, rather than requiring them to remember an experience that may have happened in the past; second, behaviour can be observed that users may not even think to report, such as how long it takes a user to find a particular function or the various things that a user tries in first learning to navigate an object; and third, issues may be discovered or information can be gathered that the researchers may not even think to ask about in a survey. In short, the difference between a focus group and a survey can be seen as a choice between quantity and quality. Now, generally when the quality vs. quantity question is presented, the “correct” choice is expected to be quality, but with data collection, that is not a given. Qualitative data can be very useful, but from a statistical standpoint, quantitative data, such as the 1-5 scales seen so often on surveys, is much more easily analysed, and can be very powerful in large numbers. Instead, the choice between large amounts of quantitative data and small amounts of qualitative data comes down to the purpose for which the data is being collected.
In the case of the Versioning Machine, we wanted a small-scale idea of how our new features and interface were received (and how intuitive they were) in order to help guide our further development. In a case like this, a focus group is ideal.

This blog post will not go into all the particulars of setting up and conducting a focus group—a simple search on the web for “how to conduct a focus group” yields a wealth of information on that front. Rather, I am going to focus on one particular area that I feel is of utmost importance when collecting information: how to ask questions. While I am now a digital humanist and have worked in a number of different fields, the piece of paper I received upon completing my undergraduate studies labels me as a psychologist. While working as a psychologist, one of the most critical points impressed upon me was how careful one must be in asking questions. After all, many of us are conditioned early on in school to believe that every “question” has a correct “answer”, and we as humans have become very good at picking up subtle cues as to what someone asking a question wants to hear in response. Dr. Robert Cialdini has written a fantastically illuminating book on the ways that subtle cues can affect another person’s behaviour, appropriately titled Influence: The Psychology of Persuasion. The book was required reading during my freshman year, and I have kept its lessons in mind ever since. In the case of research, the factors outlined in Cialdini’s Influence become a list of things to avoid when asking questions. Some of the triggers that can affect responses are surprising: in his introduction, for example, Cialdini writes about the research of Ellen Langer, who found that the simple presence of a reason when asking for a favour has a dramatic impact on the likelihood of someone granting that favour—regardless of how meaningful the reason is. Specifically, Cialdini writes that, when people waiting in line to use a copy machine were approached and asked “Excuse me, I have five pages. May I use the Xerox machine?”, 60% of the people agreed to let the approaching person cut in front of them, while if the waiting people were asked “Excuse me, I have five pages. May I use the Xerox machine, because I have to make some copies?”, the request was met with 93% compliance (Cialdini 4). The simple presence of a clause beginning with “because”, regardless of whether or not the reason that followed provided any additional information, was sufficient to dramatically alter the trend in user responses.

So how, then, does one craft a question to get feedback that accurately represents how someone feels? When I am asking questions to get at what someone thinks or feels, I remember a book I read in high school, Dr. Virginia M. Axline’s Dibs in Search of Self. In the book, which is a fascinating case study of a troubled young child, therapist Dr. Axline is very careful in her interactions with the child Dibs to ask questions in such a way as to not in any way lead him to a response. Sometimes questions can seem very innocuous, but even asking for value judgements using words such as “like” or “enjoy” can prod a user in one direction or another.

It may seem strange to think that a case study on child therapy can have implications for data collection, so perhaps an example is in order. Consider this button that we have on the new Versioning Machine:

As buttons go, this question mark is extremely straightforward. When I designed the new Versioning Machine’s layout, I intended for the question mark to open up tooltips showing how the site works. Much to my surprise, however, I found that, in the implementation, the question mark button was used instead to bring up bibliographic information about the document being examined in the Versioning Machine. Both approaches can be argued to be perfectly reasonable applications of the question mark: one produces information about the document being used, while the other produces information about the Versioning Machine itself. When I saw the way that the question mark was being applied, I disagreed with its use: I felt that the question mark should point to help about using the Versioning Machine itself. I didn’t want my focus group participants to know how I felt about the button’s application before they told me their own impressions—it would be unprofessional, for one, but their feedback would also be unreliable if I primed them to feel a particular way with my questions. Here are some of the questions I could have asked, and reasons why they would not have been ideal.

1. “Don’t you think that the question mark button should point to a help file instead of to information about the document itself?”

This question is terrible for reasons that are, I hope, immediately evident. Any statement that begins with “don’t you think” is essentially asking for agreement, rather than for an objective opinion. In this case, it immediately tells the listener that the “correct” response is to say that yes, the question mark should point to a help file. It also forces the user to make a choice between two alternatives, rather than giving an option for a neutral response. Granted, someone may still take a contrary position if he or she feels very strongly that the question mark button’s use was correct, but a participant with no strong opinion about the button’s function is likely to simply agree with the moderator. Asking a question like this may seem like an obvious mistake, but it does happen. Many years ago I worked for a company that held a focus group to find out what users thought of a product in development, and I sat and cringed as my boss asked a string of leading questions like this one and then left satisfied that the participants completely agreed with his opinions. When the product was finally released, my boss could not understand why the opinions of the wider consumer base toward the product were not wildly positive like those of the focus group had been.

2. “Should the question mark button bring up help about the Versioning Machine itself?”

This question takes a neutral tone, and as such is better than question #1, but it still falls short by introducing an idea that may not have otherwise occurred to the focus group participants. I call these “seed ideas”, because they are quiet little thoughts planted in questions that can quickly sprout in the participants’ minds. A question like this is often met with “Oh, that’s a good idea!” Wonderful to know, but it doesn’t answer the question of whether or not the participants found the button’s original function perfectly satisfactory to begin with. And once that new idea has taken root, it’s impossible to back-pedal and get at that initial impression: there’s too much new idea foliage in the way. There is occasionally a place for asking about a possible feature change such as this in a focus group, but those occasions are few and far between. Questions like this are generally best avoided.

3. “Do you think the question mark button is poorly implemented?”

Here is another leading question. “Poorly implemented” essentially introduces a value judgement, which participants are then either expected to accept or reject. Furthermore, the question’s negative tone is a subtle cue to participants that they should be finding fault with the question mark button, even if they initially thought it was fine. This can cause participants to second-guess their own initial reactions, leading to disingenuous responses. As with question #1, a participant who feels very strongly that the question mark button is functioning fine as-is will probably still say so, but participants who have no strong feelings one way or the other will suddenly feel that they are expected to take sides—and they are likely to side with the moderator in that case.

4. “Do you like the question mark button?”

Here is a very common form for a question to take, but it is nonetheless not ideal. “Do you like…?” is one of those tricky questions that sounds neutral while really introducing a value judgement. It is, essentially, the positive alternative to the negative stance presented in question #3. People are so used to being asked questions beginning with “Do you like…” that the subtle prod the phrase gives is easy to overlook, but showing someone a question mark button and asking, “Do you like the question mark button?” is basically like giving someone a plate of strawberries and asking, “Do you like strawberries?” If the participant answers in the negative, it is a subtle kind of rejection, and social niceties dictate that we generally shouldn’t reject something presented to us unless we have a good reason. This is the kind of question that is likely to be met with “Yeah, I think it’s fine.” from people who normally wouldn’t care one way or the other.

5. “What do you think of the question mark button?”

This, finally, is an ideal question. The question is specific in that it asks for feedback about a particular element of the design, but it does so while avoiding any words that may prime participants to feel a particular way or to have a particular idea. Also, unlike every other question on this list, it is not a question that can be answered with a “yes” or a “no”—it does not, in and of itself, expect that participants make value judgements, while at the same time providing room for them to do so if they wish. Any time a particular element is being asked about, a question beginning with “What do you think of…” is generally a safe approach. Questions #1 to #4 may come into play as follow-up questions if a participant indicates that he or she has a particular opinion, but question #5 is the one that should start that conversation, as it gives no indication of expecting a value judgement at all.

In the case of the Versioning Machine, of the focus groups held so far, opinions have been mixed. In the first two sessions, the opinion was very strong that the question mark button was confusing and that any button labelled “?” should bring up help about the interface itself rather than about the document being examined, while in the third session, no fault was found with the button at all. Three data points do not make for conclusive results, but I am at least aware that the question mark button is potentially an area of concern. While I cannot yet decide whether the button’s implementation is ideal, however, I can be confident that the opinions expressed by the participants are genuine, because of the effort I put into crafting neutral questions.

And getting genuine responses is, after all, what focus groups are all about.

Cialdini, Robert B. Influence: The Psychology of Persuasion. Rev. Ed. New York: Quill-William Morrow & Company, Inc., 1993. Print.
Note: a 2006 revision of this book has been published by Harper Business.

My Digital Scholarly Editing class has been hard at work recently preparing a website featuring the diaries of World War I soldier Albert Woodman. In doing so, we have needed to encode the handwritten diary in a machine-readable format. We chose a TEI encoding schema (TEI stands for, appropriately enough, the Text Encoding Initiative). During some of our very animated debates over how our encoding structure should look, I’ve been reminded of discussions that have happened in meetings for my practicum, in which I have been working on a new design for Susan Schreibman’s Versioning Machine, which I have written about here.

In both encoding the diary and in working on the practicum, we have looked at the same question from two different sides: how much encoding is enough?

Taking a handwritten document and presenting it on-screen is a complex task. At its most basic, an electronic edition of the diary can be accomplished by simply scanning every single page and putting the digital facsimile up online. Indeed, in our digital scholarly edition of the Woodman Diary that we are creating as a part of our Digital Scholarly Editing class, we are including digital images of the diary pages. But that alone would make for a gross underutilisation of the digital medium. Being able to cross-reference entries and add annotations are major features that the digital diary could provide, not to mention simply including a more readable version of the text for those who may have trouble deciphering Albert Woodman’s handwriting. All of these things require the text to be encoded, rather than simply scanned.

Encoding our text is by no means a straightforward matter, however. TEI schemas include a plethora of options for capturing not only the words on the page, but also scribbles, removed words, additions, and even elements such as gaps in the text or horizontal lines drawn across the page. All of these possibilities had to, at one point or another, be addressed in our discussions of how to render the diary. Ultimately, the encoding choices came down to the tension between rendering the text as closely as possible to how it appeared on the page and the practicality of creating a readable digital object in a reasonable amount of time.

In order to make those decisions, we first had to think about how we wanted the digital scholarly edition to look. Would the text reflect the layout of the page, becoming basically a typeset version of Albert Woodman’s written diary, or would it distance itself from the physical object, taking form in a layout that provided smoother readability in a browser window?

Ultimately, we decided to go with both approaches. When a user is simply browsing the contents of the diary, the text appears in isolation, and takes a shape that is as easily readable as possible, with a single day’s text appearing in its entirety, regardless of how many physical diary pages the entry required or whether other days appeared on the same original pages. When a user chooses to view a facsimile of a diary page, however, a transcription appears alongside the original image that preserves the original’s line breaks, and any text on the page not related to the selected day appears in greyed-out form. If the selected day does not end on the physical page, then that text is also omitted from the accompanying transcription.

In order to create both layouts, we needed to have an encoding structure that recognizes both individual dates and individual diary pages. We included both, therefore, giving dates a greater hierarchical importance because they were our primary method for organizing the diary. In order to create a display that presented text side-by-side with the facsimile, however, we also needed to encode line breaks for each individual line on the paper, regardless of whether a break in the line would make syntactical sense in the text. One of the nice things about XML is that, if a certain encoded structure is not amenable to a certain display style, it can simply be ignored, and thus our standard reading view will simply ignore those line breaks.

Ignoring line breaks would, however, leave us with giant masses of text for each individual day, and so we also decided to encode paragraphs, preserving Albert Woodman’s conceptual organization in addition to that imposed upon him by the width of his diary pages. The text in the reading view will then be organized by those paragraphs, breaking up the text to make it more manageable.
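To make this structure concrete, here is a minimal sketch of how a single day's entry might be encoded. The diary text, the date, and the attribute values are invented purely for illustration, and our actual schema may differ in its details:

```xml
<!-- One day's entry: the date, not the physical page, is the primary unit -->
<div type="entry">
  <dateline><date when="1918-01-15">Tuesday, 15th January 1918</date></dateline>
  <p>
    <lb/>Up early this morning and off to the
    <lb/>signal office before breakfast. Very
    <pb n="23"/>
    <lb/>cold again today.
  </p>
  <p>
    <lb/>A letter from home at last.
  </p>
</div>
```

The empty <pb/> element records where the physical page turns and the <lb/> elements preserve the diary's original line breaks for the facsimile view, while the reading view can ignore both and simply flow the text within its <p> divisions.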

We began looking at other ways to respect the text of the diary as it was written on the page, but we soon realized that trying to capture every element of the diary would quickly become overwhelming. We decided to encode deletions by the author, such as words he has scribbled out, using the <del> tag, to capture underlined words with the <hi> tag, and to mark insertions made mid-sentence with the <add> tag, all with the appropriate attributes when necessary. But capturing more than this—horizontal lines drawn to break up entries on a page or words that are curved slightly to be crammed into the space at the end of a line, for example—would be overkill, especially since we were not planning to render any of those elements in our display of the text. Our design, in other words, came to inform our encoding, just as much as our encoding plans needed to be reflected in our design.

While decisions about the digital representation of the diary were important, however, a digital scholarly edition includes more than just the words that were on the paper. We also had to decide how to annotate our text.

A century-old wartime diary would naturally include a number of terms and references that would warrant some degree of explanation, even if it were not a personal account. The fact that the diary includes a number of references that are specific to Albert Woodman’s personal life and experiences only complicated our decisions further. Theoretically, we could have decided to use tags to encode nearly every word for a potential annotation, but doing so would bloat our encoding to such a degree that it would be next to impossible to complete our digital object on any kind of a reasonable schedule. And so, we again had to decide what to encode and what to leave be.

People and places were obvious choices for inclusion—both provide valuable context for the diary, and are rich subjects for expansion in notes after they are captured with the <persName> and <placeName> tags and assigned appropriate attributes.

Other terms were more troublesome, however. The diary is rife with military terms, bits of dialectical and period slang, sprinklings of foreign words, and simple abbreviations for brevity’s sake. Initially, we attempted to capture all of these using the <distinct> tag and then to distinguish between them all with “type” and “time” attributes such as “slang”, “mil”, and “WWI”. Keeping everything straight quickly became a logistical nightmare, however, and we realized that we were also creating distinctions that we didn’t need for our rendering of the edition. As far as the annotations were concerned, a term is a term is a term, whether it be military jargon or simple period slang. Those distinctions could be made in the annotations themselves, rather than in the overall encoding. Furthermore, creating a linked annotation for so many words would turn our diary into an unreadable stew of hyperlinks, and attempting to read the diary while clicking on every individual annotation would likely exhaust users before they reached the end of a single day. So, we simplified, giving any relevant term a <term> tag and a simple “ref” attribute marker, which is more than sufficient for our organizational needs. Foreign words and basic abbreviations are encoded as such internally but are displayed as-is on the website, free of any extraneous annotations. That gave us a firm framework to work in without making the diary’s encoding too complex to write (or to read).

And so, the answer to the question “How much encoding is enough?” lies in deciding what the encoding is for. If our presentation is too oppressive, we are defeating our own purpose—as we also would be doing if we made our encoding so complicated that we were unable to finish it in time for our release. After all, what good is a digital scholarly edition if nobody ever gets to read it?

As a part of my studies in Digital Humanities, I am taking part in a Practicum involving the Versioning Machine, a text-comparison platform developed by Susan Schreibman. The interface allows for multiple versions of a TEI-encoded text to be displayed side by side, facilitating analysis and editing. While samples of the versioning machine at work can be viewed on the web, it is designed to be downloadable and functional in an offline environment, with its modular nature making its backend amenable to modification and customization. The frontend, however, is starting to show its age, and thus my role in the practicum: I am to help with design and outreach, updating the look and functionality of the UI and then letting our users know about the new and improved Versioning Machine 5.0.

When redesigning the look of the Versioning Machine, I decided to make use of what is known as “flat” design. The concept of flat design is, in essence, exactly what the name suggests: it involves minimizing the artificially three-dimensional elements presented to the user, thereby “flattening” the display (which is, after all, only two-dimensional to begin with).

There is more to flat design than simple aesthetics, however—although, as with any kind of design, aesthetic considerations do certainly come into play to some degree. Flat design is a reaction to skeuomorphism, a design philosophy centred on replicating, in the medium being designed, the look and/or feel of objects from a different medium. The “book metaphor” that many digital humanists struggle so hard to escape from is an example of this: when the Amazon Kindle originally launched and tried so hard to capture the look of printed paper (“Introducing Amazon Kindle”), it was a skeuomorphic design choice. An analogue clock on a computer screen is another very common example, as Luke Clum illustrates in his article “A Look at Flat Design and Why It’s Significant” by using Apple’s dashboard as an exemplar. Apple is not the only company making these choices, either: Microsoft, whose Windows 8 line of operating systems is a very clear example of flat design, displays an analogue clock when a user clicks on the time and date on the lower right corner of the desktop.

What made skeuomorphism so popular in computing? I believe it was because these new products and operating systems were bringing technology to a new, wider audience. When something is new, people require a frame of reference: we experience new things in relation to our previous experiences, after all. Look at the conversations that happened in 2011 and 2012, when the Amazon Kindle was gaining widespread traction in the consumer market. The concept of an e-reader met with some resistance because people wanted to read the way they were used to: on paper, in printed books and magazines. Articles like Sara Barbour’s in the Los Angeles Times and Andrew Blackman’s on his blog argue that e-readers just aren’t the same as good old-fashioned books. The conversation across Amazon’s own customer forums, sparked by a user with the appropriate username of “lovetoread”, focused on customers weighing the new technology against the traditional books they knew and loved. The key point that the printed book lovers made in all of these discussions was that they missed the traditional “feel” of paper books. The new technology, with all its conveniences, just wasn’t as comfortable and familiar as what readers were used to. Amazon’s decision to replicate printed material as closely as possible with its first generation of e-readers wasn’t laziness, therefore: it was good marketing—a way of providing people with as familiar an experience as possible in a new technology. And that’s what skeuomorphism is all about: using familiar, established objects and concepts as metaphors for functions and features in a new medium or technology.

But now all of that is falling out of favour, and flat design is in. Why? I think it has a lot to do with the penetration of these ‘new’ technologies into our everyday lives. Phil Goldstein reports on FierceWireless that various research firms put smartphone penetration in the United States as high as 61% in 2013, with global penetration increasing rapidly. New generations of children are growing up with these technologies; digital downloads are becoming the norm while CDs, DVDs, and even printed books and magazines see less and less circulation. Regardless of whether this is perceived as a great leap forward in how we conduct our everyday lives or a lamentable dying out of traditional media, one thing remains certain—our metaphors are changing.

Perhaps, in fact, the metaphors are simplifying: the learning curve for touch-based interfaces is remarkably low, especially for individuals who have not been weaned on other media. Infants who could never hope to navigate a printed magazine, or even a mouse-based operating system, are nonetheless able to navigate touch-enabled tablets and open apps before they can even speak (Chang, Rakowsky, and Clark).

Similarly, researchers Harini Sampath, Bipin Indurkhya, and Jayanthi Sivaswamy at the International Institute of Information Technology in Hyderabad, India have developed AutVisComm, a system through which children with autism spectrum disorder, even those who struggle to communicate verbally, can benefit greatly from working with smartphone-style devices, sometimes bridging communication gaps that were previously insurmountable. These devices touch upon a kind of interaction that is much simpler and more fundamental than what humans have been used to over the past century, built upon “the concept that a visual system might be structured like a language” (Drucker 25).

And now that our interfaces have become so intuitive and natural, those skeuomorphic metaphors have become more complicated than the technologies they are supposed to facilitate. What once provided critical experiential analogies linking the familiar to the foreign is now an obstacle to usability, an extraneous set of facilitators that complicates rather than simplifies. The end result is more than just what Luke Clum describes as inconsistent and functionless; it actively interferes with usability.

Researchers Paul Chandler and John Sweller found that, in printed materials, images and text can work at cross-purposes, even if they convey the same thing. Put simply, the more information that needs to be processed in order to understand a single concept, the harder it is for learning to occur. I have written more about this phenomenon (called “cognitive load”) in my essay “Digital Overload: Cognitive Load Theory and Digital Scholarly Editions”, which I wrote during my first semester at Maynooth University and which is available upon request.

But how does this flight from skeuomorphism play into our design choices for the Versioning Machine? Simply put, it means removing unnecessary mental processing from the interface. Look at these two “close” buttons, for example: one old, and one new:

Versioning Machine Close Buttons

What was wrong with the old “close” button? Nothing really, except that the faux three-dimensional aspect is unnecessary. A few years ago, the visual representation of a red, pressable button was an important cue to users that the “×” could be clicked; now, however, people are accustomed to an “×” in the corner of the screen meaning “close,” so it is no longer necessary to make it stand out as a big red button.

Similarly, look at these two buttons, one old and one new, used to bring up an image of the text being examined:

Versioning Machine Picture Buttons

The old button, whatever it was supposed to represent skeuomorphically, is almost indecipherable at its small pixel size. A simpler, more abstract button can now fill its function satisfactorily. The new button, while very abstract (and still somewhat skeuomorphic in that it uses simple shapes to suggest a landscape, and thus an image), is nonetheless sufficient, in the context of the page, to convey that it will bring up a picture.

The design choices go further than that, however. By reducing the number of colours on the page, the design is not only unified but also presents a smaller variety of stimuli, making the page much simpler to process. The removal of black borders not only pulls the design together in line with modern aesthetics; it also ensures that nothing on the page has higher saturation or contrast than the black text of the versions themselves in their white windows, thus emphasizing the text on the page, which is, after all, what the Versioning Machine is all about.

Thus, the interface has been flattened as a part of its new round of modernization. And while aesthetic considerations have, of course, influenced the design choices—it is important for the product to look good, after all—the main reason behind the change in the user interface was to improve the user experience and make the platform as intuitive as possible, resulting in a polished, well-oiled Versioning Machine.

In a previous post, I wrote about a trip to the National Archives in Ireland and about some of the data I’d photographed there. And I have certainly had a lot to say about digital preservation. One thing that the trip to the Archives left me thinking about, however, is what the Digital Humanities means for physical archives. Now, that may seem a bit backward: isn’t the field of Digital Humanities concerned with websites, scans, multimedia, and computers? Yes, of course it is. But the fact remains that there are plenty of physical archives in existence, and Digital Humanities has significant implications for those analogue objects as well.

There’s something about the concept of Digital Humanities that I’m sure a lot of people find scary. Putting aside the issues of Big Data, new analysis techniques, and the application of scientific and computational practices to the humanities, there is another aspect of what we do that simply changes how we look at preservation. I’ve already written a fair amount about digital data and the challenges that come with keeping it relevant and accessible. Therein, perhaps, lies some of the paranoia: the humanities are starting to move into a realm where text and images are captured in computer files rather than in print—in other words, into a format where some sort of hardware or interface needs to serve as an intermediary between us and the ones and zeroes of data in order for us to consume them. If a whole library goes up into the cloud, these people may ask, how will it ever come down?

Now, of course that is a tongue-in-cheek exaggeration. The effect may not be as dramatic as what I went through in having my personal library digitised before moving to Ireland (the process used by the scanning service I employed necessitated destroying the originals in order to create their digital equivalents), but the fact remains that the process of creating a digital archive does in fact affect its analogue equivalent.

One way the digital affects the analogue is that the process of digitisation (other than the one taken by the service I employed) does not replace the original artefact, but rather duplicates it, albeit in a different form. This duplication suddenly means that the information in the analogue artefact can be accessed easily by a wide variety of users. For some institutions, however, it can cause a problem when the copyright holder of the original item isn’t happy to have it replicated, as was the case in 2010 when the British Library came into conflict with James Murdoch over the library’s plans to digitise its newspaper collection (Wray, n.p.).

Copyright is a big issue, but the effects that digitisation has on physical archives can be much more far-reaching. In fact, digitisation can play an important role in preserving analogue objects. That last sentence may require some explanation: I myself hadn’t thought much about the digital saving the analogue until it was pointed out to me during my first trip to Ireland’s National Archives, when one of the archivists mentioned during her presentation on preservation that digitisation protects the archive. The rationale is a simple one: if people can easily access a digital approximation of an artefact, they will be less likely to need access to the original. The Archives’ own website even explicitly states this as a reason for digitising, noting that the institution is undertaking “long-term projects intended to improve the preservation of the original documents by protecting them from handling by creating surrogates in hardcopy, digital or microfilm formats” (National Archives of Ireland, n.p.).

The recognition of the digital’s potential to protect the analogue it represents is not limited to the personnel at the National Archives of Ireland. Krystyna Matusiak and Tamara Johnston presented a paper to UNESCO on just this point. While the paper itself is not dated, Matusiak’s homepage states that the presentation happened in 2012 (Matusiak, Krystyna Matusiak n.p.). In their paper, Matusiak and Johnston assert that “[h]andling of rare items is less of an issue if there are multiple digital copies available for access” (Matusiak and Johnston, 4). They go on to discuss how digitisation has been helpful in the preservation of film-based photographs, pointing out that “[d]igitization contributes to making ‘visible’ a large body of historical visual evidence, as many of the images become available for public viewing for the first time” (5). Their discussion goes beyond simple access, however, since nitrate-based film is actually quite hazardous to preserve: the film is an unstable compound that slowly and steadily decays, and at advanced stages of decay it has a tendency to spontaneously combust—worse yet, “Cellulose nitrate if ignited, cannot be extinguished. The film burns in the absence of oxygen producing its own supply” (6). For the medium that Matusiak and Johnston are concerned with, digitisation is not simply a matter of preservation; it is a matter of safety. Indeed, the preservationist value of digitisation for nitrate-based film cannot be overstated: the original material steadily decays even in highly controlled environments, meaning that all archived nitrate-based film will eventually decay to the point of being unusable, and digitised versions will be the only remaining record of those materials.

So, where does that leave us? Analogue objects are subject to decay (with some media being more at risk than others). Additionally, more and more artefacts are ‘born digital’, meaning that they came into existence in the form of computer data, and will potentially always exist as such. Printing out such data was once a common form of preservation, but now that technology has become so ubiquitous, that form of preservation—going from the digital to the analogue—seems almost archaic.

Will analogue archives, then, eventually decay into obscurity like nitrate-based film, leaving only their digital brethren behind to carry on their legacies?

Hardly.

In his seminal 1936 work “The Work of Art in the Age of Mechanical Reproduction”, Walter Benjamin writes that original objects all have “aura”, which is not transferred intact when an object is duplicated. A number of interpretations can be made of Benjamin’s somewhat vaguely defined concept of aura, but the interpretation I am making for the purposes of this blog is simply that a duplicated item in a different medium evokes a subjective experience different from that evoked by the original.

A sculpture of a human being can never perfectly or accurately capture that person. It may be a remarkable likeness, but it is not a living, breathing person. It may be a beautiful work of art; it may even convey a sense of wonder or majesty that someone would not experience in seeing the actual person whom the sculpture represents, but the fact remains that the experience of viewing the sculpture is different from that of meeting the original person. Similarly, a photograph of the sculpture cannot perfectly represent the sculpture itself. The photograph may be breathtaking; it may be taken in the perfect lighting and from just the right angle, highlighting the sculpture’s best features while hiding its flaws, but it is nonetheless a two-dimensional print of a three-dimensional, textured object. Touching the print is not touching marble, nor can a B4-sized print capture the height of a three-metre-tall statue. The experience of seeing the photograph may be impressive, but it is different from that of seeing the sculpture—which is in turn different from that of meeting the person.

This point was brought home by a guest lecturer in my other class, Theory and Practice. The lecturer, Orla Murphy, who was visiting from University College Cork, spoke about her archaeology research and showed us some of the challenges and decisions faced in digitally representing a three-dimensional object. Many of the tools at her disposal allowed her to analyse the objects she was studying on levels that she would not have been able to with the physical object alone, but representing that physical object in the digital space nonetheless required compromises, since, by its very nature, a two-dimensional display cannot replicate the size, texture, weight, and depth of the actual three-dimensional stone objects she was studying (Murphy).

My classmates in Digital Scholarly Editing and I will have similar decisions to make in the spring term, when we tackle digitally representing a man’s World War I diaries, which come complete with various newspaper clippings, photos, and sketches. How will we choose to render the physical object in digital form? The ramifications of our choices will be slightly different from those facing an archivist, however, because we are curating only the digital form of the diaries; we will not be responsible for preserving (nor providing access to) the original physical object along with our digital creation.

A digital scholarly edition of a collection of analogue artefacts serves important preservationist purposes by providing scholars and the public with a durable, readily available, and often flexible (especially for the purposes of analysis) representation of the original—and, if made well, the digital version can provide vast amounts of information that cannot be found in the analogue version alone. But it is just that: a representation. So, while the National Archives may pursue digitisation in order to reduce demand for viewing the analogue original, with certain specialized audiences it may have the opposite effect: much as tourism boards show people images of beautiful beaches in the hope of enticing them to come and experience the real thing for themselves, digital versions of artefacts may actually encourage some audiences to come and experience the originals. Rather than devaluing the analogue artefacts they represent, then, digitally preserved versions may accentuate the value of their physical originals, while simultaneously serving an important role in preserving them for future generations.

After I finished writing an earlier post about Jordan Mechner’s experience with his Prince of Persia source code, I started thinking about methods of keeping data safe and accessible in the humanities. The issue Mechner faced was that his data existed on floppy disks in an outdated format; today, floppy disks themselves are a thing of the past. Even forms of portable data storage that were common only a few years ago—USB flash drives and SD cards—are quickly being replaced by something new: cloud storage. But is simply throwing our data up on servers enough to keep it preserved? Now, to be fair, losing large amounts of data on multiple occasions has made me somewhat paranoid when it comes to keeping my files safe. But corporations, archivists, and even hobbyists often raise similar questions, and many scholars and professionals in a variety of fields have written about these issues and their possible solutions.

After doing research into what others have had to say about preservation, my current feeling is that the ideal solution would be a leveraging of the interested public in a kind of decentralized group storage, as a combination of crowdsourcing and cloud storage—”Crowdstorage,” if you will—where the population involved in working on a project also participates in its preservation and curation. This, however, raises a number of issues about copyright law and intellectual property, and is probably in many cases an impossibility.

So, at this juncture, rather than chasing rainbows with the idea of throwing the data to the crowd, I thought I’d let the sources I read speak for themselves—with some annotations of my own, of course.

Digital Preservation Coalition (DPC). Preservation Management of Digital Materials: The Handbook. November 2008. Web. Accessed 25 November 2014. <http://www.dpconline.org/component/docman/doc_download/299-digital-preservation-handbook>
Originally a print book, this handbook now exists as a PDF file online. The 160-page document details the Digital Preservation Coalition’s best practices for handling digital materials. Of particular interest is Chapter 4.3, “Storage and Preservation” (p. 103), which is a central topic of the book in spite of appearing near the end. One important point noted in the chapter is the concept of having both “preservation” and “access” copies in the digital form, much as may be done in an analogue archive: a copy exists to preserve the original data, while another exists for people to access and use (further discussed in Chapter 4.5, “Access,” on p. 122). The document makes the assertion that “not all resources can or need to be preserved forever” and urges making the decision of what to preserve (and for how long) as early as possible (p. 103).
Notably, this large document makes little mention of storage techniques beyond physical media, and the use of online, decentralized servers is barely mentioned. Third-party services are discussed in Chapter 3.3 (p. 51), but the discussion is focused more on legality and logistics and less on what services third parties may actually provide.
In summary, while the handbook does not address off-site networked storage directly, it does an excellent job of framing existing preservation practices in the Digital Humanities, as well as common issues and considerations for preservationists to be aware of.

Gantz, John F. et al. The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010. Framingham, MA: IDC Information and Data, 2007. Web. Accessed 27 November 2014. <https://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf>
Though eight years old, the forecast by Gantz et al. does a fantastic job of illustrating the particular challenges faced in Digital Curation. The report calculates that, in 2006, the amount of data produced was 161 billion gigabytes, about three million times the amount of information in every book ever written (p. 1). The paper predicted that that value would increase by a compound rate of 57% every year to 2010, meaning that by the beginning of the current decade, the forecast expected the digital world to be 988 billion gigabytes in size (p. 3). Personally, given the proliferation of streaming entertainment in the last ten years and the increased size of digital downloads, not to mention the explosive growth of social media, I believe that the estimation put forth by Gantz et al. is accurate. The analyses on data production in this document are a fascinating read for anyone interested in digital information, but two topics that are particularly compelling for this post’s purposes have to do with storage. First, Gantz et al. predict that, in 2010, companies will only be producing about 30% of the digital universe (with the other 70% created by users), but will be responsible for safeguarding over 85% of it (p. 9). With the advent of cloud computing, this certainly appears to have come to pass. Second, there is the issue of long-term storage, with digital media degrading far more quickly than its analogue counterpart, so much so that keeping digital data valid requires transferring it to new media every few years (p. 11). All in all, this analysis illustrates that the issue of archival and storage is not limited to the Digital Humanities or even to archivists; it is faced by anyone dealing with any kind of data—contemporary or otherwise—that needs to be safeguarded and accessed digitally.
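The compound-growth arithmetic behind that forecast is easy to check. A quick sketch (the figures are taken from the report; the code itself is only my illustration):

```python
# Rough check of the IDC forecast's compound growth:
# 161 billion GB produced in 2006, growing at ~57% per year through 2010.
size = 161.0  # billion gigabytes, 2006 baseline (p. 1)
for year in range(2007, 2011):
    size *= 1.57  # 57% compound annual growth (p. 3)
print(f"{size:.0f} billion GB by 2010")
```

Four years of 57% growth multiplies the baseline roughly sixfold, landing in the same neighbourhood as the report’s 988-billion-gigabyte projection.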

Inter-University Consortium for Political and Social Research. Guide to Social Science Data Preservation and Archiving, 5th Edition. Ann Arbor, MI: Institute for Social Research University of Michigan, 2012. Web. Accessed 27 November 2014. <http://www.icpsr.umich.edu/files/deposit/dataprep.pdf>
This document outlines the methods and guidelines for data preservation and archiving by the Inter-University Consortium for Political and Social Research (ICPSR). The paper outlines a number of areas that need to be considered before data is deposited in a public archive; notable among these are issues of intellectual property and copyright (p. 11). The guidelines remind users that the owners of data may not be the same as the individuals who collect it (as can be the case when researchers work on behalf of institutions), and also may not be the same as the entities responsible for archiving that data (p. 11). Both of these issues need to be considered when data is archived and distributed. While institutions are likely to expect their data to be shared to some degree, they often place particular limitations on how that sharing may occur, as well as on the audience with whom it may take place. Furthermore, there is more than one method of sharing, and five common approaches are outlined on page 12. While an archivist’s primary concern may be the preservation of data, that practice goes hand-in-hand with making the data accessible to appropriate audiences, and so issues of sharing and accessibility need to be considered from the very beginning of the preservation process. After all, if no one is going to be given access to the data, why is it being preserved in the first place? Other sections of this document, particularly “Importance of Good Data Management” (p. 19) and “Master Datasets and Work Files” (p. 33), contain valuable information for those interested in protecting data, but I would recommend the document in its entirety as required reading for anyone undertaking a Digital Curation project for the first time.

Kraus, Kari and Rachel Donahue. “’Do You Want to Save Your Progress?’: The Role of Professional and Player Communities in Preserving Virtual Worlds.” Digital Humanities Quarterly 6.2 (2012). DHQ. Web. Accessed 25 November 2014. <http://digitalhumanities.org:8080/dhq/vol/6/2/000129/000129.html>
Imagine my surprise when I stumbled across an article discussing exactly the concept that got me thinking about using the crowd for preservation in the first place: the actions of the video game playing community with regard to old and outdated software titles. Kraus and Donahue discuss in detail the issues that make video games particularly susceptible to decay and loss, even relative to other born-digital artefacts. A few examples that are particularly challenging for video games: software and hardware dependence, particularly in the case of titles for consoles or arcade cabinets; a lack of documentation from development, or of annotations for the tools used; questions of authenticity as to whether a program has been free from later adjustment or tampering; and intellectual property rights, often closely guarded by large companies even when they have no plans to license or reuse an outdated title.
Kraus and Donahue go on to examine the preservation measures taken by the different groups involved in video games—the companies making the games, and the players themselves. The article goes into great detail about the various types of preservation and gives a number of examples, but it essentially boils down to this: the player community is what is preserving these artefacts, not professional archivists, nor even the creators of the games themselves.

McClean, Tom. “Not with a Bang but a Whimper: The Politics of Accountability and Open Data in the UK.” 15 August 2011. Web. Accessed 27 November 2014. <http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1899790>
This is an important paper as a counterpoint to my idealized vision of using the crowd for curation. McClean outlines the British government’s 2010 decision to “begin the online publication of historical data from the Combined Online Information System, known as COINS” (p. 1). The goal was transparency and accountability in terms of budget and expense in the public sector. The essay argues that the release of data has had little impact so far, and gives some reasons for this conclusion. The opening of the essay is heavy on political background, since McClean asserts that politics were the main driving force behind the release of the data (p. 2). Similarly, McClean cites several political and bureaucratic reasons for the release’s failing to meet expectations (p. 5). However, one argument in McClean’s analysis of the data release itself is very enlightening for the purposes of this blog: that the raw data is far too complex to be useful, and that no tools are provided to assist the public in interpreting it (p. 8). For example, McClean writes that “[o]n the most basic level, the dataset is simply too large to be processed with commonly-available software packages such as Microsoft Excel; doing so requires slightly more specialized software, and this alone probably puts it out of reach of a good many ordinary citizens” (p. 8). Furthermore, McClean notes that the data needed to be reformatted and otherwise modified in order to even be viable for interpretation (p. 9). Although McClean neglects to cite any sources supporting his claim that “after a brief burst of publicity immediately after its release, COINS has not featured prominently in media reports” (p. 9), one can still assume that the difficulties outlined in his writing are real barriers to the free use of the released data.
McClean’s essay highlights the dangers of simply releasing data to the public and assuming that they will have the resources, knowledge, and inclination to use it. Without providing cleanly formatted data and the tools necessary for interpretation, the decision to release the contents of COINS seems far more political than practical.

National Records of Scotland. The National Records of Scotland and born digital records – a strategy for today and tomorrow. September 2014. Web. Accessed 27 November 2014. <http://www.nrscotland.gov.uk/files/record-keeping/nrs-digital-preservation-strategy.pdf>
This extremely recent white paper, barely two months old, details a 5-year plan put forth by the National Records of Scotland for the purpose of preserving the digital artefacts in its archive, with the goal of being “efficient, flexible, high quality, and trustworthy” (p. 3). The report states that “the Scottish Government’s corporate electronic document and records management system holds 10.5 terabytes of records” (p. 3), and that is only part of the material for which the National Records of Scotland is potentially responsible. The challenge is compounded further by the fact that the National Records apparently does not have the freedom to decide what to preserve and what to discard, but is rather obliged to keep all the records that other government entities pass to it for preservation (p. 4). This, coupled with the fact that each institution has its own idiosyncrasies in terms of the formats in which it produces data, creates an especially difficult situation. The National Records recognizes that it cannot “do everything now and then walk away from the issue” (p. 4), and has thus adopted a forward-thinking approach to preservation. This aggressive approach is particularly important in light of the fact that the National Records of Scotland currently holds only about 256GB of data (p. 8), far less than it may soon be asked to curate. For the purposes of this blog post, this article illustrates the immediate contemporary issues faced by a large-scale repository. While the white paper does not propose a firm solution to the National Records of Scotland’s situation, I am interested to see whether the institution partners with other groups in the governmental structure to alleviate issues of storage and access.

O’Carroll, Aileen et al. Caring for Digital Content: Mapping International Approaches. Maynooth: NUI Maynooth (2013). Web. Accessed 27 November 2014. <http://dri.ie/caring-for-digital-content-2013.pdf>
The Digital Repository of Ireland is one of the country’s leaders in digital preservation, and this publication is geared specifically toward examining different approaches toward keeping digital content. Of particular note for the purposes of this blog post is the section “Multi-Site Digital Repositories,” which details a number of repositories that “host data within a federated structure that allows sharing of metadata and data across institutions” (p. 15). The five repositories included in this section (along with links to each) describe institutions that have decentralized their data by creating a structure wherein the data is stored in multiple interconnected locations. This not only allows multiple institutions to pool their resources and work more effectively by sharing information, but it also safeguards against data loss by not keeping all of their data stored in one place. One very interesting thing to note is that, while the five institutions—ARTstor, the Digital Repository of Ireland, the IQSS Dataverse Network, Openaire, and the Texas Digital Library—all operate under the same principle of decentralizing their data, each goes about that process very differently. ARTstor, for example, provides institutions with tools to store and publish data privately or publicly (p. 13), while IQSS “delegates the access controls to its users” while keeping the systems themselves centralized (p. 14). Caring for Digital Content includes a large amount of information on other types of collections and services in addition to those mentioned in “Multi-Site Digital Repositories,” constituting a large overview of data curation strategies in a number of different countries. While the primary goal of the publication was to help the Digital Repository of Ireland “future-proof” its own digital collection (p. 27), the institution has done the broader field of Digital Humanities a great service in making its research findings available to the public at large.

Online Computer Library Center, Inc. and The Center for Research Libraries. Trustworthy Repositories Audit & Certification: Criteria and Checklist. Dublin, Ohio: Online Computer Library Center, Inc., 2007. Web. Accessed 27 November 2014. <http://www.crl.edu/sites/default/files/attachments/pages/trac_0.pdf>
This document, created through the cooperation of the Research Libraries Group and National Archives and Records Administration of the United States, outlines the criteria for certification by those bodies of a “Trustworthy Repository”. The certification process and requirements are valuable for the purposes of this blog post as they outline the expectations for an official digital repository, which would be a very different entity from the idealized “Crowdstored” collection suggested in the introduction to this blog post. Among these requirements is that of specifying legal permissions for the repository’s contents (p. 13), again underscoring the importance of considering permission and copyright law when maintaining a repository, and highlighting the special situations regarding intellectual copyright that would have to exist if a collection were to be stored and curated by the crowd. The requirement that would probably pose the greatest obstacle to a crowd-curated collection would be this: “Repository commits to defining, collecting, tracking, and providing, on demand, its information integrity measurements” (p. 15). With a collection where data is stored freely by users, tracking changes to the data becomes difficult, and keeping every instance of the data free from corruption is nearly impossible. Finding ways to counter this problem would be an important next step in developing a collection of this type.
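
One common countermeasure for exactly this problem is fixity checking: record a cryptographic digest when an object enters the collection, then periodically hash every stored copy and flag any whose digest has drifted. The sketch below is a minimal illustration only; the file contents and mirror names are invented for the example, not drawn from any real repository.

```python
import hashlib

def fixity(data: bytes) -> str:
    """Return a SHA-256 digest serving as the object's fixity value."""
    return hashlib.sha256(data).hexdigest()

# Record the digest when the object first enters the collection.
original = b"Letter from Frongoch, 14 August 1916 ..."
recorded_digest = fixity(original)

# Later, audit every stored copy against the recorded value.
copies = {
    "mirror-a": original,
    "mirror-b": b"Letter from Frongoch, 14 August 1916 ...",  # intact copy
    "mirror-c": b"Letter from Frongoch, 14 Aug 1916 ...",     # silently altered
}
corrupt = [name for name, data in copies.items()
           if fixity(data) != recorded_digest]
print(corrupt)  # ['mirror-c']
```

In a crowd-stored collection the hard part is not the hashing but getting every volunteer host to run such audits and report the results, which is precisely the “information integrity measurements” requirement the checklist describes.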

Poole, Alex H. “Now is the Future Now? The Urgency of Digital Curation in the Digital Humanities.” Digital Humanities Quarterly 7.2 (2013). DHQ. Web. <http://digitalhumanities.org:8080/dhq/vol/7/2/000163/000163.html>
This article opens by referencing the 2006 report by the American Council of Learned Societies, in which the group stated that accessibility by the public was a critical next step in data curation. The essay states that, as a result of this report, many entities “since have embraced the importance of promoting the digital humanities through democratized access to and an expanded audience for cultural heritage materials and through posing new research questions — indeed, they have done so at an accelerating rate” (n.p.). Section II of the article goes into detail about the different incentives and disincentives to sharing curated data, noting that shared data allows for a much wider degree of computation, analysis, and utilization, but on the other hand also opens up the data to misuse and corruption (n.p.). Furthermore, the essay states that sharing data has another potential downside, that of giving an unscrupulous researcher the opportunity to steal—and get credit for—another’s work (n.p.). The paper goes on to discuss developments in digital curation in the seven years following the report by the American Council of Learned Societies, and in Section IV details a few case studies in the United Kingdom. Most notable for the purposes of this blog post is that “researchers shared their methods and tools more freely than their experimental data,” being much more careful with the data that was often tied to their livelihood (n.p.). Furthermore, “[p]ersonal relationships loomed large in researchers’ willingness to share their data externally; conversely, they felt apprehensive about cyber-sharing” (n.p.), suggesting that researchers were often very nervous about losing control over their data. These findings suggest that researchers often treat digital data in the same ways as analogue data in terms of sharing and control, and that the concept of “Crowdstorage” will run into similar obstacles where academic data is concerned.

Wharton, Robin. “Digital Humanities, Copyright Law, and the Literary.” Digital Humanities Quarterly 7.1 (2013). DHQ. Web. <http://digitalhumanities.org:8080/dhq/vol/7/1/000147/000147.html>
As many of the other sources on this list suggest, the availability of digital data has many ramifications when it comes to intellectual property and copyright law. This article looks at copyright law from primarily a United States perspective, particularly with regard to Section 101, Title 17 of U.S. Copyright Code, which concerns itself with literary objects. The section “U.S. Copyright Law and the Literary” emphasizes that, in the United States, “[o]nly copying of expression gives rise to copyright infringement; use of the ideas does not” (n.p.), which draws a kind of legal box around what is “literary” and what is merely an idea. All of that, however, is somewhat tangential to the issues faced by archivists in Digital Humanities with regard to copyright law, since archiving is more concerned with preserving digital objects in their actual expressed form. Therefore, in the United States, the bigger issue is one of “fair use” (as outlined in the section Digital Humanities and Copyright Law), wherein the use of literature for literary scholarship is allowed because of the supposition that “literary scholarship will not resemble, except in the most superficial way, the literary objects with which it engages” (n.p.). The tensions this definition creates in the digital world are legion, and the article goes on to address some of them, but the biggest issue for the purposes of this blog is that once curated objects are made available to users in a way that allows users to duplicate the data, the distribution can hardly be justified under “fair use”. The idealized “Crowdstorage” concept, then, would need to seriously consider the ramifications that copyright law would have on any such collection’s legal existence.

This past Monday, I went with Shane to the National Archives in Dublin, where we spent some time photographing documents for the Letters of 1916 project. The project, now in its second year, aims to create a digital archive of letters sent to and from Ireland from November 1915 to October 1916—the months before and after the Easter Rising, one of the major events leading to the country’s establishment as an independent nation. The Archives have diligently preserved a large number of documents, which researchers from the Letters of 1916 project then looked over and selected for inclusion. The archivists go through a painstaking process of restoring the documents to a state where they can be photographed at a decent level of quality, and the photographing is what I was helping with that day.

One of the things that may not be obvious before looking through all the letters collected on the site is just how broad the concept of a “letter” can be. Today, nearly 100 years after the Easter Rising, we have all manner of communications—telephone calls, television broadcasts, e-mails, text messages, Twitter feeds—that were unheard of in the past. In 1916, the only commonly-available means of communicating with people over a distance was through letters. Even the more technologically cutting-edge communications—telegrams—were still transcribed and delivered like letters. As a result, even simple personal or official communications left written or typed physical records. A cursory glance at the Letters of 1916 site will give an idea of the variety of topics covered by the letters that have been collected. Some are personal love letters, some are religious in nature, and some discuss business; still others are directly concerned with the question of Ireland’s independence.

With such a great variety of letters to look at, I did not know what to expect when I went to the National Archives to help with digitising letters. I thought that maybe I would come across some basic correspondence, perhaps some letters from churches or perhaps, if I was lucky, someone discussing the question of whether Ireland should pursue independence.

Instead, I found a packet of telegrams and documents from a prison, featuring page after page—after page—of names.

What was I looking at? These were the names and places of origin of hundreds of Irish citizens who were being released from prison over the summer and autumn of 1916, reported to officials in Dublin Castle. The documents all bore stamps stating that they originated in Wales.

The original documents are of course safely ensconced in the National Archives, and the digital images of them are currently awaiting post-production, so I do not have access to them right now. I did, however, remember enough details to do some research upon coming home. My best guess is that these documents originated from the Frongoch Prison Camp in Wales, where BBC writer Phil Carradice reports that the British government detained 1,863 rebels in “the worst type of knee-jerk reaction.” While Carradice and the Frongoch Prison Camp Wikipedia entry report that the camp was emptied in December of 1916, the records I photographed show that several hundred prisoners were released before that date.

What’s so great about a list of names, you may wonder? To be honest, even back through most of my undergraduate years, I would have found page after page of names to be painfully boring. Most letters tell a story; they may be a statement of love, a set of instructions, a personal account of someone’s experience, or a thank-you for a particular favour. But, apart from giving one an idea of just how many people were incarcerated in Wales, what information can be gleaned from a list of names alone?

Not much, really—if you were to look at the names alone. But part of the beauty of Digital Humanities lies in the tools that technology puts at our disposal. I’d be interested to see, for example, the geographical distribution of people who were arrested and then released. Did the people tend to all originate in the same location? Sure, one may expect that the lion’s share of those arrested would be from the area near the General Post Office where the uprising was centered, but what about people from far away? I even saw names from as far away as Cork: were those people simply in the wrong place at the wrong time, or did they have more significant reasons for participation?
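
A first pass at that geographical question needs nothing fancier than a tally of places of origin once the lists are transcribed. Here is a toy sketch; the names and counties below are invented for illustration, not taken from the actual records.

```python
from collections import Counter

# Hypothetical transcribed entries: (name, county of origin)
released = [
    ("P. Byrne", "Dublin"),
    ("M. O'Brien", "Cork"),
    ("S. Kelly", "Dublin"),
    ("J. Walsh", "Galway"),
    ("T. Murphy", "Dublin"),
]

# Tally releases by county, most frequent first.
by_county = Counter(county for _, county in released)
for county, n in by_county.most_common():
    print(county, n)  # Dublin 3, then Cork 1, Galway 1
```

Plotted on a map, counts like these would show at a glance whether arrests clustered around the General Post Office or reached as far as Cork.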

We can do far more than just run analyses on the list itself, however, and this is where I am particularly excited. Hundreds of people were arrested after the Easter Rising and were released from prison in Wales over the course of the following seasons. What did these people have to say about their experiences? Did any of them write letters before or after their imprisonment, and if so, what did they have to say? Once we have the records and letters transcribed into computer-readable data, these questions and more can be explored.

When we reach that stage, however, we need to be careful about simple name-matching: lumping together every letter that is written from someone with the same name would be a critical mistake. This is a pitfall that John Bradley knew enough to avoid even back in 2005, in his analysis of the thousand-year-old Durham Liber Vitae. Bradley reports that the list he was examining is a church record containing names added from the 800s all the way up to the 1500s, making for an extremely large record. It is hardly surprising that, in such a large list, there would be cases of two or more distinct individuals sharing the same name. Complicating matters further, a single individual might be listed in one place under one spelling of a name, while in another place, the name might be written differently, just as some people call me “Joshua” and others refer to me as just “Josh”.

People are sometimes under the misconception that, when Digital Humanists talk about applying the power of computers to humanities research, we mean to take people out of the equation entirely and let computers do all the work for us. Nothing could be further from the truth: thinking that we could let computers do every bit of critical analysis for us would be the height of stupidity. In fact, computational analysis is only as good as the data we input—and the output, similarly, means nothing unless a human is there to examine and draw conclusions from it. Bradley realized that the computer wouldn’t be able to tell the difference between two people with the same name (or one individual with her name written two different ways) unless it was specifically instructed to treat two entries as separate or unique. Therefore, in addition to encoding the names from the Durham Liber Vitae into the computer, he assigned each a unique identifying number, and the team of researchers was then able to decide on their relative degrees of certainty regarding whether a particular mention of an individual referred, say, to person #397 or to person #4569. With such a system in place, a computer can run analyses on instances of a person with a certain numerical identifier, rather than with a certain name, producing results much more in line with what the researchers want to see.
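
The idea of analysing identifiers rather than raw name strings can be sketched in a few lines. Everything here—the IDs, the spellings, the certainty field—is a hypothetical reconstruction for illustration, not Bradley’s actual data model.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Mention:
    """One occurrence of a name, linked to a researcher-assigned person ID."""
    text: str        # the name exactly as written in the document
    person_id: int   # unique identifier for the individual
    certainty: str   # the researchers' confidence in the identification

mentions = [
    Mention("Joshua", person_id=397, certainty="high"),
    Mention("Josh", person_id=397, certainty="high"),      # same person, variant spelling
    Mention("Joshua", person_id=4569, certainty="medium"), # different person, same name
]

# Counting raw strings conflates two distinct people and splits one person
# across spellings; counting person IDs does neither.
by_name = Counter(m.text for m in mentions)
by_id = Counter(m.person_id for m in mentions)
print(by_name["Joshua"])  # 2 (misleading: two different people)
print(by_id[397])         # 2 (correct: both spellings of one individual)
```

Any downstream analysis—movements, correspondence networks, release dates—would then key on `person_id`, leaving the messy question of who is who where it belongs: with the human researchers.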

Mercifully, the Letters of 1916 project is focused on a period spanning several months rather than several centuries, so the problem of names is much smaller in scope. Even so, if we hope to run analyses tracking the movements, behaviour, or correspondence of specific people over the course of 1916, we would do well to approach individuals in a manner similar to Bradley’s, using the information available to us to decide whether different mentions of a name do in fact refer to the same person. And with the opportunities that the lists of names we’ve found in the National Archives provide us, it would be a shame not to do so.

One of the topics that came up in the very first session of our core class, the appropriately titled “Digital Humanities: Theory and Practice”, was the issue of preserving digital data. It is an easy assumption that once a book is scanned and backed up onto a drive, it is safe for all eternity—but that is far from the case.

Here’s a simple example: have you ever backed something up to a writable CD, only to come back a few years later and have your computer report that the disc is completely unreadable? A compact disc, like any medium, is subject to decay, and unfortunately those early CD-Rs decay more rapidly than most.

But old CDs aren’t the only way data can be lost: it is also very easy to inadvertently destroy data. Consider the poor laptop I used throughout my four years as an undergraduate: when I departed for Japan, I took a new computer with me. While I was gone, one of my family members found my old computer and decided to put it to use, backing up its original contents to a portable drive and reformatting the laptop itself. When, a few years later, I was visiting home and needed to pull some files from my old computer, I found the drive reinitialized. Worse still, the hardware that had been used to back up my drive had already become out of date, and we didn’t have any way of reading it. When we finally dug up a machine that could pull the data, we found that the backup had become corrupted—probably demagnetized. In the years following, I managed to piece together much of my undergraduate career from various backups I found lying around, but it has been a slow and painful process. In the end, most of my third year of study is lost, as are all the source files of the music I wrote while a student.

As one might expect, the experience led me to become somewhat preoccupied with the preservation of digital data. When I finally left Japan and moved back home for a spell, the first thing I did was to round up everything I could get my hands on—like old 3.5″ floppies from my childhood, Iomega Zip Disk backups I made in college, the decrepit old Compaq computer I used in high school—and move everything I possibly could to a latest-generation external hard drive, where it is safe, at least for the time being.

As it turns out, this very personal concern of mine of preserving my own digital history is one that is shared throughout the realm of technology.

One example I found particularly fascinating is Jordan Mechner’s discovery of the original Apple II source code for his groundbreaking game Prince of Persia. To the uninitiated, this is a mere curio, but to a programmer or historian interested in electronic entertainment, it is a major event. Mechner compares the source code of a game to sheet music—it can be studied, broken down, and analyzed from a perspective that the finished product or performance cannot—and the original Prince of Persia is an important artifact in electronic entertainment history. Not only did the game break new ground in cinematic platforming and rotoscoped animation, it was popular enough to spawn a major game franchise and a movie tie-in; the original title was, by Wikipedia’s count, ported to over 20 other platforms, and remains a seminal title in 90s computer gaming.

What makes the find so compelling in my eyes, however, is the medium it was on: 3.5″ Apple II floppies. Think about it: these disks turned up only two years ago, in 2012. If you are one of the proud owners of an Apple II computer, do you still have it? Does it still work? Perhaps it does; perhaps you loved the old thing and cared for it well, and were lucky enough not to have it decay. But do you have any way of moving the data from an Apple II to one of your current computers and still have it be readable? Therein lies the rub: how do you bridge the gap between old formats and new? Jordan Mechner was fortunate enough to have found something that people with the necessary skills thought was worth saving, and he found it at the right time. As he puts it on his blog: “The 1980s and the Apple II are long enough ago to be of historical interest, yet recent enough that the people who put the data on the disks are still with us, and young enough to kind of remember how we did it” (Mechner).

Ultimately, fortune smiled upon Prince of Persia, and the source code was still intact on the old disks, and has now been extracted and posted online. But the process wasn’t an easy one, and the code’s author asserts that much of the difficulty came from the fact that the preservation involved digital media:

Pretty much anything on paper or film, if you pop it in a cardboard box and forget about for a few decades, the people of the future will still be able to figure out what it is, or was. Not so with digital media. Operating systems and data formats change every few years, along with the size and shape of the thingy and the thing you need to plug it into. Skip a few updates in a row, and you’re quickly in the territory where special equipment and expertise are needed to recover your data. Add to that the fact that magnetic media degrade with time, a single hard knock or scratch can render a hard drive or floppy disk unreadable, and suddenly the analog media of the past start to look remarkably durable (Mechner).

So, if digital media are so fragile, what is the point of Digital Humanities? Digital Humanist Abby Smith writes that the perceived impermanence of data is a significant obstacle to digitisation in academia, and then goes on to describe the very limitations that Jordan Mechner experienced above. Why are we so concerned with cramming books into little magnetic drives if getting the information off of them ten years down the line is going to be akin to digital brain surgery?

In a 2002 report for the Council on Library and Information Resources, Daniel Greenstein and Abby Smith outline four proposed ways to combat the fragility of digital data. In short: reformatting information as technology changes; preserving data along with the platform and hardware it depends upon; developing emulation of older environments; and persistent object preservation, which involves recording all the context and properties necessary to make data persistent. All of these approaches, however, are focused on saving one instance of a digital object, which fails to take advantage of one of the main strengths of our current digital age.

A different answer can be found in what Jordan Mechner did just after he successfully extracted that very source code that inspired his reflection: he put it online. It is far easier to make an exact copy of a megabyte of data than it is to make a perfect duplicate of a book. And furthermore, that data can be much more easily shared than a physical book can. Now that the Prince of Persia source code has been put online and downloaded untold times, if, ten years from now, its creator again finds himself unable to locate the original code, it is incredibly likely that someone will simply be able to e-mail it to him. This kind of widespread duplication may terrify copyright holders, but it is a preservationist’s dream.

As digital humanists, then, our very act of making artifacts accessible digitally contributes to their preservation. Should something happen to the archive of data we have created, chances are that the items will still exist on a drive somewhere on the Internet, thanks to their having been downloaded by an amateur historian somewhere. Furthermore, burgeoning initiatives such as the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH) aim to facilitate data sharing among scholarly institutions, further helping to preserve data by allowing multiple copies to be hosted in different locations.
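
The protocol behind such harvesting is pleasantly simple: a harvester issues an HTTP request whose query string names a verb (such as ListRecords) and a metadata format, and the repository replies with XML that another institution can store. As a rough sketch, here is how such a request URL is assembled; the repository address below is a placeholder, not a real endpoint.

```python
from urllib.parse import urlencode

def oai_request(base_url: str, verb: str, **params: str) -> str:
    """Build an OAI-PMH request URL for the given verb and parameters."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"

# Ask a (hypothetical) repository for all its records in simple Dublin Core.
url = oai_request("https://example.org/oai", "ListRecords",
                  metadataPrefix="oai_dc")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Because the interface is just URLs and XML, any institution can harvest a partner’s metadata on a schedule, which is how multiple hosted copies come to exist in different locations.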

Even with data copied in multiple locations, however, we should not become complacent in thinking it safe. Equally important is making sure that what we DO have is stored on the most current hardware possible and in contemporary formats, lest it become lost in the hieroglyphs of an archaic format, much like the Prince of Persia source code nearly was.

This is where it all begins—but every beginning has a story, some kind of activation energy that got things moving in the first place. Let me tell you a bit of mine, and lay down a bit of context for this little corner of the blogosphere I’ve newly carved out for myself.

I am Joshua D. Savage, newly started in the Masters programme at Maynooth University in Ireland, working on earning my MA in Digital Humanities. When I tell most people this, the very next question (accompanied by a blank stare) is usually, “What in the world is Digital Humanities?” The short answer is that the field seeks to put art, literature, and history on computers, but the reality is much more complicated than that. In this blog, I will explore various aspects and challenges of the field, test a few boundaries, and perhaps raise a few new questions.

But I digress—we were talking about backstory.

Mine, like most people’s, is complicated: a tapestry of unlikely interwoven threads, pulled from all manner of places. The first thread is the work I did while earning my undergraduate degree in Psychology and East Asian Studies from Harvard University. While there are a number of stories (both good and appalling) to tell about those years, the important one for our purposes is the appreciation I gained for how involved research is, and how difficult it can be to find helpful materials. One experience that left a significant impression was wandering through the labyrinthine depths of the university’s largest library to find a journal article for a professor, during a period when the library was under construction, so that most of the lights were not working and several of the aisles were cordoned off (I went into the library at 3PM, and by the time I left, it was dark outside).

Another is a love of history, cultivated not only through my courses in East Asian Studies but even more so by a core course (a sort of pseudo-elective) entitled “Inventing New England”, taught by a brilliant historian named Laurel Thatcher Ulrich. It was my first experience of seeing history as a living, breathing concept, ever subject to interpretation and rediscovery by each subsequent person and age, rather than as a dead collection of names and dates. I had spent most of my middle school years heavily involved with the history of maritime Salem, thanks in no small part to the (successful) efforts of my middle school, the Phoenix School, to spearhead a movement to get a replica of the East Indiaman Friendship built in Salem Harbor, and the new perspective cast by “Inventing New England” caused my entire experience of that segment of history to explode in vibrant possibility.

Yet another thread is the seven years I spent living and teaching in Japan, where just about every historical site of note is older than the country where I grew up. I saw the culture firsthand, and saw national treasures (both buildings and people) that I had read about in books standing before me in living color. Shortly before I left, for example, one of my favorite cities in the country, Nara, celebrated its 1300th anniversary.

Still another thread is the work I have done as an educational consultant, writer, and designer since coming home, helping a startup client in the educational software industry establish a digital footprint, and then plant that digital foot in the door at a number of schools throughout the United States. Working there let me see a piece of digital academia from all sides, from the outermost user interface to the innermost guts of its code, and gave me the opportunity to dirty my hands working on every level.

And through it all, I have a lifelong love of computers, and in particular a love of video games. This might not, on the surface, look like it has a great deal of overlap with history, but in fact many of the issues faced by people who view video games as Art, rather than simply entertainment, are similar to the ones faced by Digital Humanists, particularly when it comes to preservation. Because video games are a born-digital medium—meaning one that exists only in digital form from the point of its creation—keeping old games in accessible form is a daunting task, in part because they so often exist in proprietary formats. This topic deserves an entire post on its own, which is exactly what it will get.

And finally, there is my own interest in digital preservation. What with moving home from Japan, moving back out on my own, dating, getting married, and coming to Ireland, I have moved quite a bit in the last three years; in 2013-2014 alone, I moved essentially four times. For a bookworm and a collector, that means having an awful lot of stuff to lug around, especially the many out-of-print books that I brought home with me from Japan. Thus, most of the first half of 2014 found me heavily invested in digitizing my own library, making use of both a book-scanning service in the States and a document camera that I invested in on my own. The process was slow and painful (and somewhat expensive), and a great deal was left behind, but I now have a significant library on a little digital drive. I was surprised to learn after coming here how many of the challenges I faced in digitizing my own collection are shared by the Digital Humanities department here.

So, those are the threads. If you want to see how they all weave together, find me here at Humanities.exe, creating my own born-digital tapestry, a record of one person’s life as a fledgling Digital Humanist at Maynooth University.