Abstract

A Day in the Life of the Digital Humanities (Day of DH) is a community documentation project that brings together digital humanists from around the world to document what they do on one day, typically March 18. The goal of the project, which has been run three times since 2009, is to bring together participants to reflect on the question, "Just what do computing humanists really do?" To do this, participants document their day through photographs and commentary using one of the Day of DH blogs set up for them. The collection of these journals (with links, tags, and comments) is, after editing, made available online. This paper discusses the design of this social project, from the ethical issues raised to the final web of journals, and shares some of the lessons we have learned. One of the major challenges of social media is getting participation. We made participating easy by personally inviting a seed group, choosing an accessible technology, maintaining a light but constant level of communication prior to the event, and asking only for a single day of commitment. In addition, we tried to make participation rewarding, at least in formal academic terms, by structuring the Day of DH as a collaborative publication. In terms of improvements, we have over the iterations changed the handling of ethics clearances for images and connected the project to other social media like Twitter.

1. Introduction

On March 18th, 2009, 85 digital humanists around the world participated in a social research project designed to document "Just what do computing humanists really do?". We did this through photographs and blog entries reflecting on the day's work.[1] The "Day in the Life of the Digital Humanities" project was conceived as a social publication project that began with reflection on what we do as we do it. By the end of the day, we had 668 blog entries, many including photographs. Since then we have run the project twice more (in 2010 and 2011) with growing numbers each time, though this paper focuses mostly on the first iteration.[2] The entries document, in varying ways, the community as it reflects on itself and inevitably the project itself. As a community project, the analysis of the entries is open to everyone and not the privilege of the project curators. The collection of entries has been open to the community at every stage of its creation and curation so that participants could reflect on the entries while they were being written. We have come to view this capacity for immediate recursive reflection as one of the most interesting features of this project, but more on that later.

In this paper, our focus is on the design of the first iteration in 2009, rather than on the analysis of the content. We discuss the background of the Day of DH project and how it was conceived to answer questions about our community. Our goal is to explain our approach to the design of social media in the humanities and to share the lessons we have learned about the design of an international social media event given the promise of this form of research configuration. Community research, or academic crowdsourcing, offers significant advantages over traditional ways of organizing research projects, but it depends on community participation to work and on careful design to avoid ethical problems. Community research projects like this depend on a different type of preparation that creates a welcoming context for participation, including ethics review. Further, the infrastructure for such projects has to be designed to be easy to use by many with minimal training. We also discuss participation – who participated and from where – and then talk about the technology, the types of entries, and tagging approaches. Finally, we conclude by reflecting on the challenges and opportunities for such communal or social research projects.

We begin by examining what we believed we, as the organizers, were doing, which is leveraging the potential of social media to enable our community to build and maintain social capital, both within the DH community and beyond. Bourdieu’s (1983) work on social capital emphasizes both the actual and potential resources available to the individual through participation in a network. Coleman (1994) focuses on the potential benefits to the individual. Putnam (2000) highlights the value of social capital to the community by equating community participation with civic virtue.

Individuals involved in the Day of DH 2009 have had an opportunity to increase, extend, or consolidate existing social capital through self-revelation within the framework of the day. The DH community in the larger sense has had a moment of opportunity for critical self-reflection. It should be noted, however, as it was by more than one participant, that the Day of DH is not a unique opportunity for building and maintaining social capital through social media. Many of the participants in the Day of DH 2009 also make frequent use of ongoing resources such as Twitter, LinkedIn, Facebook, and their own blogs and wikis.

2. About the Project

2.1 Conception

The idea for A Day in the Life of Digital Humanities 2009 came from a lecture by Edward L. Ayers titled "What Does a Professor Do All Day, Anyway?" Ayers – formerly of the University of Virginia and currently president of the University of Richmond – was an early computing historian whose "The Valley of the Shadow" project was a founding Institute for Advanced Technology in the Humanities (IATH) project. In that lecture he reflected on how many people, including his own son, know little of what a professor does. As he puts it,

In the eyes of most folks, a professor either portentously and pompously lectures people from his narrow shaft of specialized knowledge, or is a bookworm – nose stuck in a dusty volume, oblivious to the world [Ayers 1993].

The situation is even worse in the digital humanities where people not only do not know what we do as academics, but also seldom know what "humanities computing" or the "digital humanities" are. Contributing to this problem is the usual divergence of views within the field as to how to define it and how to name it. A Day in the Life of Digital Humanities addresses the question of definition by posing the question "what do we do" much as Ayers did. Rather than summarizing what he does, Ayers broke down and shared each step of his own work week. The Day of DH 2009 follows his example and expands upon it, organizing a participatory community to reflect as a community.

The "Day in a Life of" organizing principle is not a new one. It has been a recurring motif in literature, seen in such work as Solzhenitsyn's One Day in the Life of Ivan Denisovich, Joyce's Ulysses, and Woolf's Mrs. Dalloway. It has also been used in the popular The Day in the Life series of photography books that document a country or continent in photographs taken over a 24-hour period by a team of around one hundred photographers. The motif suggests the documentation of a subject's "real" life, emphasizing the ordinary aspects of their environment over the extraordinary. A day becomes any day. While it is inevitable that these projects lead to unusual reflection, where the act of observation affects the way that the participant acts, such reflection is, we believe, a virtue in the humanities, especially when reflection is in aid of self-definition. In other words, reflection is important to the humanities, and we have the experience in the humanities to do it well. While a participant's actions may be staged or deliberate, this does not necessarily mean that the events are unrepresentative.

In the era of social computing, "A Day in a Life of" can also be reinterpreted online. There are Facebook groups like "A Day in the Life of ..." that use the theme for journalistic purposes. Blogs such as A Day in the Life of an Ambulance Driver follow the principle of "any given day" without specifying a particular one. Flickr features a day-in-the-life group in the vein of the book series, with nearly 5500 members uploading digital photos on a predetermined day. One of these events took place just two days after the Day of DH 2009.

Social Web 2.0 tools like wikis and blogging platforms make it relatively easy to enable a group of people to collaborate online and we are not the first to experiment with the possibilities of social Web 2.0 projects in humanities computing. The Suda On Line project, started in 1998, has demonstrated how such an approach can leverage a broader community toward research ends. In the Suda On Line project some 150 editors and translators are translating the Suda, "a massive 10th century Byzantine Greek historical encyclopedia of the ancient Mediterranean world" [Suda On Line, About SOL]. Likewise, Geoffrey Rockwell's Dictionary of Words in the Wild is a social photography project where users can post photographs of public textuality and tag them by the words that appear. That project has gathered over 6,500 images and close to 5,500 words tagged on the strength of an interested community.

The Day of DH 2009 utilizes a number of these organizing principles. It uses the "day-in-the-life" motif both symbolically – as a look into a particular way of life – and literally – in being bracketed within a single day (more or less; given the flexibility of time zones, this "day" existed within a period of 48 hours). It also takes advantage of the rapid publishing of the blogging format to make the day of documentation into a near real-time event, a quality that lends a kinetic interest to those who were following the project.

2.2 Execution of the Project

The project was developed over the Fall of 2008 and Winter of 2009 (for details, see the wiki we used: Day in the Life of Digital Humanities). If there is one lesson from this project, it is that even a fairly simple idea takes a lot of time to think through and organize, especially if a number of people are depending on you on a particular day. Here is an outline of the major tasks that have emerged over three iterations.

We first had to develop the project sufficiently in order to be able to propose it to the appropriate Research Ethics Board (REB) of the University of Alberta. The project was presented as a collaborative publication, similar to an edited collection of essays. Participants, from the very start, were not presented or treated as "subjects", so as to avoid ethical concerns. However, one issue that had to be negotiated with the REB was photography permissions. For this we developed a protocol and a sample permissions slip for participants to use. In later iterations we were able to convince the REB that we shouldn't have to collect ethics forms.

Once we had ethics approval, we started the quiet recruitment phase, where we personally invited people we knew to participate in the project. We drew up a list of some thirty seed participants that we knew and then kept adding to that list as we thought about people we wanted for the project. Only once about forty participants were invited did we go public and issue an invitation through Humanist.[3] It is hard to know whether the quiet recruitment phase made a difference in participation. We might have had as many participants without it, but it did give us a way of doing something about participation other than announcing and waiting. We also know that some of the people who joined after the announcement did so because invited people talked up the project. We should note, however, that for the subsequent iterations we didn't need to personally invite folk as we had a list of previous participants as a seed community.

In the invitations and announcements we asked people to fill out a short form. Though labeled as an application form, it served primarily as a screening form, to make sure that applicants were serious and understood what the project was going to be. We ended up accepting all but a couple of applicants, rejecting only those who had clearly misunderstood the project. As part of the form, we asked, "How do you define Humanities Computing/Digital Humanities?" – and received a surprisingly interesting set of answers.[4] This list of definitions has proven to be a welcome side-effect of the project, and it is now seen as a useful starting point for defining the field.[5]

As people registered to participate we needed ways to communicate with them and to support them. We settled on using an email list for communication and a wiki for support with information for participants including a list of the participants. As March 18th, the Day of DH 2009, approached we used the email list more and more. In the last month we sent emails at least once a week on various topics. We tried to remind participants of the project without irritating them with too many inconsequential messages.

After consulting with the community on blogging technology, we settled on WordPress MU (Multi-User) for the participants to enter their activities and photographs. We set it up so each participant had a blog and a biography page. We encouraged people to enter an "About" page about themselves before March 18th as a way of familiarizing themselves with the environment and as a way to describe projects they are involved in that they may not mention on the day. We also added help materials to the wiki. You can see how different people used their blog instance at the List of Day of DH Participants.[6]

For the stream of entries on the day we set up an aggregated RSS feed so people could see the entries from all participants as they were posted. This feature turned out to be of interest to participants experimenting with visual exploration tools, like Stéfan Sinclair and Alejandro Giacometti. Sinclair has developed a wide range of text analysis tools and environments, the most recent of which is Voyeur [Voyeur 2010]. Giacometti similarly works in the area of visualization tools for the humanities, in this case focusing on a rich-prospect browsing environment called TextTiles [TextTiles]. Both these researchers were able to rapidly experiment, using their tools on the live data stream.
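The idea of an aggregated stream can be illustrated with a minimal Python sketch. It is not the site-wide feed the project installed in WordPress MU; it only shows, under simplifying assumptions, how the items of several RSS 2.0 feeds can be merged into one list ordered by publication time. The function name `merge_feeds` and the two sample feeds are invented for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_documents):
    """Collect every <item> from several RSS 2.0 documents, newest first."""
    entries = []
    for document in feed_documents:
        channel = ET.fromstring(document).find("channel")
        blog_title = channel.findtext("title", default="")
        for item in channel.findall("item"):
            entries.append({
                "blog": blog_title,
                "title": item.findtext("title", default=""),
                # RSS 2.0 dates use the RFC 822 format, which this parser reads.
                "published": parsedate_to_datetime(item.findtext("pubDate")),
            })
    # Sorting timezone-aware datetimes orders posts by absolute time,
    # which matters when participants post from different time zones.
    return sorted(entries, key=lambda entry: entry["published"], reverse=True)


# Two hypothetical participant feeds, invented for illustration.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Morning email</title><pubDate>Wed, 18 Mar 2009 08:30:00 -0600</pubDate></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Project meeting</title><pubDate>Wed, 18 Mar 2009 15:00:00 +0000</pubDate></item>
</channel></rss>"""

for entry in merge_feeds([FEED_A, FEED_B]):
    print(entry["published"].isoformat(), entry["blog"], entry["title"])
```

Because the dates carry their UTC offsets, a post written at 08:30 in one time zone is correctly placed before a post written at 15:00 UTC, which is the kind of ordering a live stream from international participants needs.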

After the day itself, we provided a few weeks for people to edit and finish writing any entries. This proved useful for participants who took pictures but didn’t have the time to upload them on March 18th or for those who wanted to edit longer reflections.

Like many projects, once the excitement of the Day of Digital Humanities was over, there was tedious work left for us to do. That included double-checking photographs for ethics, tagging all the entries with a common set of keywords, lightly editing all the blogs, cleaning the data and exporting it in a logical structure. The dataset is separately available as an appendix with documentation.[7]

2.3 Technical Design, Export, and Transformation

A concern early in the project was the choice of the platform that would be used for recording the day. How exactly should a hundred people narrate their day online? Basically the project had four needs: textual entries, media (image) uploads, customizable time zone information, and comprehensive export functionality. Additionally, we were looking for something that was affordable and easy to use – something that could be installed on our servers and offered visual and technical flexibility.

Our solution was found in WordPress (http://www.wordpress.org), a free open-source blogging platform. Due to its popularity, WordPress is well-documented and phenomenally well-supported. Furthermore, the WordPress community has generated a large corpus of third-party plugins, though the Day of DH 2009 ultimately did not use any of these. WordPress also has a malleable visual template system and an export format that, in an informal review of blogging software, we found to be one of the most comprehensive. Unfortunately, the maturity of the software also meant that the interface was more complicated than that of other blogging platforms. Though perhaps offset by familiarity for some participants, the clutter and number of features – e.g. pages versus posts, categories versus tags – created difficulties. We are looking at how to simplify the interface if the platform is used again. A final reassurance offered by WordPress was the existence of WordPress.com, a free WordPress hosting service that offers a nearly identical experience. In case of any catastrophic server errors on the Day of DH 2009, WordPress.com could have functioned as a backup.

After it was decided to create individual spaces for each user – rather than a single space for all users – we were presented with a logistical problem. Installing a blogging platform takes a fair amount of setup time and installing the number required by the project would be a large undertaking. To avoid this, a WordPress fork called WordPress MU was used, which allows for the creation of many separate blogs within a single install. WordPress MU blogs have the same functionality as regular WordPress blogs, but since they share common files and a single database, the installation is more manageable and efficient. Indeed, while many blogging platforms would have met the project's major needs, on the problem of scale WordPress MU stood above all other affordable or free solutions.

WordPress MU was installed on a Red Hat Enterprise Linux server with a MySQL database. The database stored user accounts and textual data, but the images uploaded by users were hierarchically stored in the file system. In addition to the standard installation of WordPress MU, a site-wide RSS feed was installed to aggregate posts and comments of all blogs. A customized template was also developed to provide a user interface that is more in line with the project goals.

While the day proceeded, data was periodically backed up into TAR archives of images and database dumps. Although these types of backups are very useful in disaster recovery situations, they are hardly flexible enough for encapsulating data for further analysis. For the purpose of a dataset for further study, all the data was exported in XML files, combined into a single file, processed using XSLT and scripts, and finally proofed manually. The dataset that is available in addition to the preserved blogs was designed to provide material for those who want to study the reflections of the community. Therefore, we considered it important to preserve as much as possible of the metadata automatically generated by individual participants when entering their blogs. Our TEI "A Day in the life of the Digital Humanities" customization strives to reconcile this aim with the benefits that are associated with the use of the TEI standard when encoding, preserving, and interchanging such data. See "About the Dataset" for more information.
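The step of combining per-blog exports into a single file can be illustrated with a minimal Python sketch. It is not the project's actual pipeline, which relied on XSLT and scripts followed by manual proofing. The wrapper elements `collection` and `blog`, the function `combine_exports`, and the sample data are assumptions made for illustration; the only thing taken from the export format is that posts appear as `item` elements, as they do in RSS-style exports.

```python
import xml.etree.ElementTree as ET


def combine_exports(exports):
    """Wrap the posts of several per-blog XML exports in one document."""
    collection = ET.Element("collection")  # hypothetical wrapper element
    for blog_name, xml_text in exports.items():
        source = ET.fromstring(xml_text)
        blog = ET.SubElement(collection, "blog", {"name": blog_name})
        # Copy each <item> unchanged so no exported metadata is dropped.
        for item in source.iter("item"):
            blog.append(item)
    return collection


# Hypothetical, heavily simplified exports for two blogs.
EXPORTS = {
    "blog-a": "<rss><channel><item><title>First post</title></item></channel></rss>",
    "blog-b": (
        "<rss><channel>"
        "<item><title>Second post</title></item>"
        "<item><title>Third post</title></item>"
        "</channel></rss>"
    ),
}

combined = combine_exports(EXPORTS)
print(ET.tostring(combined, encoding="unicode"))
```

Keeping each post under an element that names its blog preserves the link between an entry and its author, which a later transformation (for example into TEI) would need.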

3. Preliminary Analysis

The collegial spirit of the project prohibits us from analyzing the dataset of the 2009 iteration in a definitive way. We do not believe that the project curators have a special relationship to the dataset that resulted as it was conceived of and organized to be an open shared project, and it could also be argued that we are too close to the project. We were encouraged that participants like Stéfan Sinclair applied analytical tools to the RSS feed on the day in question, a sign that the community felt they had the right to analyze the results as they were happening. Nonetheless as organizers we have analyzed the project in order to understand what worked and didn't. We share this analysis in that spirit and not as a definitive analysis of the community.

3.1 Answering the Question

The first question we should be asking is whether the project has advanced our knowledge of what digital humanists do and what the nature of the field is. The answer is not simple because the project has become part of the field it describes. Yes, the majority of participants posted entries about what they did on March 18th, but there has also been a level of critical reflection on the project that makes clear that the entries can't be trusted as an "objective" record of participant activity.[8] As a result, our view of what it is that we are doing has shifted. We went from a simple idea for a project that would be a form of community auto-ethnography (and an experiment in social media in the humanities), to wondering if we haven't stumbled upon an alternative form of (un)conference where people gather on their own time for a day to discuss things without paying to attend timed events. While most participants still do document their day, there is a degree of self-reflection and disciplinary reflection that makes the project more than just a community documenting everyday practices.

What is clear is that the How do you define Humanities Computing / Digital Humanities? feature of the registration process is probably a better answer to the question of what the digital humanities is than the full dataset. This feature, initially designed to help us filter applicants, has proven a useful collection of short definitions for all to use.

We have also had to be honest with ourselves about our motivation, especially as we annually consider whether we want to go through with another iteration when we have no specific funding for the project. In the Fall of 2008 when we conceived of it, we thought of it not only as a way to answer a question, but also as a way to experiment with crowdsourcing in the humanities inexpensively – without having to get a grant, which is what we usually do in our field. As such, it has successfully answered a different question, i.e. whether we could design a crowd-sourced discussion about the digital humanities.[9] Each year there have been more participants and a modicum of attention. In its success, the project may be taking on a life of its own.

3.2 Participation

A second type of analysis is to look at participation statistics. Participants for the Day of DH 2009 were primarily recruited in two ways: through direct invitation and through an application process advertised on the Humanist email list. Word of mouth filled in the rest of the group. The most common occupations amongst participants were teaching roles, such as professors and instructors, and research roles. Also common were administrative heads, programmers, and librarians. However, within the group there was a noticeable lack of students. Despite our efforts to encourage a diverse crowd of participants, students did not feel confident enough in the value of their experiences to want to participate in the project.

Reflecting the linguistic and geographic bias of the organizers, Canada was the most represented country, which is not surprising given that the project was organized out of Canada. It was followed by the USA, Great Britain, Ireland, and Germany.[10]

We can’t escape concluding that, despite the open call and our efforts, Anglophones were much more likely to participate. This could be due to an Anglo-centric bias in the digital humanities. It could be that people doing this sort of work elsewhere don’t associate with the term digital humanities, preferring something like informatics. It could be that social networking projects tap into existing networks despite the potential for broader distribution. Whatever the cause, it is one of the limitations of the project and one which we are addressing (see below).

The participants described their 2009 "Day in the Life of Digital Humanities" with a total of 668 posts, 434 images, and 181 comments. The distribution of these items among different blogs is illustrated in the above graph, where the x-axis represents the count of items in a blog and the y-axis represents how many blogs had each count. This basic analysis reveals that the largest group of blogs (twenty-six) contained about six posts each. Many blogs had ten or fewer posts, but there were a few blogs with as many as eighteen posts.

The graph also shows that several blogs had a fairly large number of images. However, as many as twenty-two blogs included no images. We don't know whether exclusively textual entries were intended or if any technical problems discouraged participants from sharing images. The number of comments per blog follows a similar pattern. Perhaps in a future event, something can be done to encourage more images and comments.
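The kind of distribution described here, with item counts on one axis and the number of blogs having each count on the other, can be computed in a few lines. The following Python sketch is illustrative only: the function `count_distribution` is our own naming, and the post counts are hypothetical values rather than the project's figures.

```python
from collections import Counter


def count_distribution(items_per_blog):
    """Map each item count to the number of blogs that have that count."""
    return dict(sorted(Counter(items_per_blog.values()).items()))


# Hypothetical post counts for five blogs, not the project's figures.
POSTS_PER_BLOG = {"blog-a": 6, "blog-b": 6, "blog-c": 10, "blog-d": 18, "blog-e": 6}

distribution = count_distribution(POSTS_PER_BLOG)
for post_count, blog_count in distribution.items():
    print(f"{post_count} posts: {blog_count} blog(s)")
```

The same function can be applied to image or comment counts per blog, where a blog without images simply contributes to the count for zero.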

3.3 The Tagging

While keeping the tags applied by participants, we decided after the event to add a common set of classification terms to the data. Before this could be done a consistent system had to be developed, beginning with the decision of whether to pursue a controlled vocabulary or free text. WordPress uses its own terms for both: 'categories' for controlled vocabularies and 'tags' for free text. Free text tagging was decided against, as the dataset and range of subjects covered within the Day of DH 2009 was too small to benefit from it. Instead, a controlled vocabulary was created.[11]

The first step in establishing a controlled vocabulary is determining a level of specificity [Taylor 2005]. Should categories be very specific or more general? While the former offers a finer description of its text, the latter is less time-consuming and less prone to error. With the Day of DH 2009 tag set, we pursued a more general set of umbrella terms. Going through a partial set of results, overarching concepts were identified until the set was at saturation. In the first draft, concepts were hierarchical, with relatively abstract top-level headings such as "actions" and "events". As we began to work with this approach, we realized that it was more specific and complex than the project required, and by the final version the hierarchies were mostly flattened and considerably more direct. Finally, it was decided that terms should define explicit concepts and avoid those which are implicit. For example, location and time were categorized only when they were part of the content. In these and other examples, for the indexer to extrapolate data not in the text would be to risk introducing inconsistencies. This choice for broad specificity appears to have been an appropriate one, as we were able to keep the classification process in the purview of just one coder, which benefited the consistency of the task.
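The practical difference between a controlled vocabulary and free-text tagging can be shown with a minimal Python sketch: labels are accepted only if they belong to a fixed set of terms, and anything else is set aside for the coder to map or discard. The vocabulary below is a small illustrative subset built from category names mentioned in this paper, not the project's full list, and the function `split_by_vocabulary` is hypothetical.

```python
def split_by_vocabulary(labels, vocabulary):
    """Separate labels into those inside and outside a controlled vocabulary."""
    accepted = [label for label in labels if label in vocabulary]
    rejected = [label for label in labels if label not in vocabulary]
    return accepted, rejected


# Illustrative subset of category names; the real vocabulary was larger.
VOCABULARY = frozenset({"Project Work", "Research", "Teaching", "Reflecting"})

accepted, rejected = split_by_vocabulary(["Research", "learning", "Teaching"], VOCABULARY)
print("accepted:", accepted)
print("needs review:", rejected)
```

A flat set like this mirrors the final, mostly flattened version of the vocabulary; a hierarchical draft would need a nested structure and more decisions per post.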

Once the category vocabulary had been established, a single member of the team tagged each post in the interests of consistency. Several additional categories of tags were created as the range of activities explored by the bloggers emerged. Initially, the process of categorizing the individual posts seemed straightforward within the context of the controlled vocabulary. However, it soon became clear that a controlled vocabulary, while needed in order to structure our dataset for eventual export and quantitative analysis, did not fully encompass the complexities inherent in the average Day of a Digital Humanist. The main issue with using the controlled vocabulary for tagging the blog entries was interpretive (cf. [McCarty 1991]; [Hockey 2000]). Though a category tag gives the impression of being an objective label of any particular content, the tag chosen to represent any particular post is colored by the tagger’s personal biases, unconscious or otherwise. In some instances, the labels applied by the Day of DH 2009 research team were different from those applied by the researchers themselves.

This brings up the question of whether one interpretation is more valid than another. In the case of a single blog, an activity labeled "learning" by a participant but labeled "research" by the Day of DH 2009 organizing group is not a large issue. The tag applied externally to the original author could easily be changed to accommodate the author’s intended interpretation of his or her own work. However, since hundreds of posts were tagged, a consistent interpretation of activities was necessary to control the quality of the exported data, therefore superseding, in some cases, the interpretation of the author.

In addition to the interpretive gap between the original author and the coder, there is the possibility that a tag applied to a post is going to bias others’ interpretation of that post. If the Day of DH 2009 tagger tags a post "Research" then what influence does that category tag have on the next reader? Would that person have interpreted that post as representative of another activity entirely? Ultimately, it was decided that the interpretive issues with tagging were all appropriate for the project. In the time since, we have found that our subsequent work with the data has not been limited by the choices made in classification. The Humanities as a whole encompass disciplines in which multiple interpretations are usual and welcome. We acknowledge that the system of tagging in place for the Day of DH 2009 posts is only one possible interpretive framework given the range of activities engaged by the digital humanities community, and we have been refining it with each iteration.

It shouldn’t surprise us that "day", "digital", and "humanities" are among the most frequently used words. Likewise, we would expect entries describing what people are doing to use "I’m" often (as in "I’m doing X or Y"). Similarly, the frequency of "time" can be ascribed to the role of time management in describing a day of work, especially if you have to take time to blog about what you are doing in addition to doing it.
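Word frequencies of this kind are straightforward to reproduce on the dataset. The following Python sketch shows one simple way to count words while keeping contractions such as "I’m" intact; it is an illustration under our own assumptions, not the tool used to produce the frequencies discussed here, and the sample entries are invented.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lower-cased words, keeping contractions as single words."""
    counts = Counter()
    for text in texts:
        # Normalize the typographic apostrophe before tokenizing.
        normalized = text.lower().replace("\u2019", "'")
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", normalized))
    return counts


# Invented sample entries, not taken from the dataset.
ENTRIES = [
    "I'm working on the project today.",
    "Another day, another project meeting; I\u2019m short of time.",
]

frequencies = word_frequencies(ENTRIES)
for word, count in frequencies.most_common(3):
    print(word, count)
```

A fuller analysis would usually also remove common function words, which this sketch deliberately leaves in so that the raw counts stay visible.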

From a disciplinary perspective what is interesting is the importance of the "project". The frequency of "project" suggests we conceive our work as being around projects. This is supported by the tagging. "Project Work" (DDH-ProjectWork) was the most popular tag, being applied to 209 entries. Project work and discourse around projects is one of the distinguishing features of digital humanities work; we doubt philosophers would use the word as frequently to describe their work.

This correlates with the high incidence of the Home tag. Note also the importance of the Coffee House which surpasses the lab. We would expect the computer lab to be more important; perhaps coffee is the most important technology in the digital humanities.

As for what we do with our time, project work is important, as mentioned, but our time is also taken up by other administrative tasks including email and service tasks. This is not to devalue administration and service, but to acknowledge their importance.

We think it also important to note how social digital humanists are. Contrary to the image of the solitary humanist, digital humanists spend a lot of time with other people in meetings, in conferences (or planning them), in class, and in labs. That is not to say that we don’t also work alone (see Office and Home above), but we spend a significant amount of time with others. This could be connected to project and administrative work as digital projects typically involve multiple people with different skills who need to communicate and meet. Here’s a typical description, from one participant: "this a regular and generally argumentative internal review meeting at which people from across the department report on projects underway." Attitudes towards meetings are, however, mixed. As another puts it, after reading the posts of others, "People go to way, way too many meetings."

If we look at tags for types of academic activity we see that digital humanists do what we would expect humanists to do including Reflecting, Teaching, and Research. The high count for Reflecting is, as we discuss later in this paper, due in part to the nature of the Day of DH project, but it is also a paradigmatically humanist response. Humanists will be relieved to know that Reading, Writing, and Editing are still done in the digital humanities, but Programming, Blogging, Data Collection, and Gaming are new activities for the humanities. It is surprising that programming would show up more often than writing, even given the role of computing.

Lastly we share some anecdotal thoughts coming from the Day of DH and the correspondence around the project. Coming from a university that has a number of DH projects, faculty, and graduate students (the University of Alberta has an MA in the field) we forget how lonely it can be to do digital humanities elsewhere. Many who do computing in the humanities are alone in their university and feel isolated both in their department and from the field. We were struck by how many people told us in correspondence how they welcomed the Day of DH because it let them be part of a larger research community for one day. It also gave many a feeling of visibility in and belonging to a field that can increasingly be seen as exclusive. While there was an application to participate, we didn’t turn anyone down who understood what they were getting into. This meant that many people who felt outside the discipline now felt part of it for a day and part of building the discipline’s self-understanding. This is good as the digital humanities is a field that has always thought it was inclusive, but can fail to live up to its self-image.[12]

4. What Worked and What Didn't Work

Social research projects hold significant promise for the humanities, where we often deal with content of interest to the larger community but don't always have the funding for the time-consuming and human work of gathering and editing content. Where the arts and humanities can restructure research projects for community participation they can engage a broader community, whether it is an academic or extra-academic community. Humanities computing can help with the technical design of projects so that they can involve community participation, but the difficult issue is how to engage a community so that its members will participate. There are many social media projects, from massive ones like Flickr to small group blogs, and, like most blogs, most falter and disappear without participation. Here are some preliminary reflections on how to organize a project to encourage participation, based on our post mortem discussions and the feedback we received from others.

4.1 What worked

First of all, we designed the project so that it kept low the commitment required of its participants; it was announced to prospective participants that if they participated the work would be limited to one day of posting a few photos and entries. What people wrote about could be constrained to the day's activities – a goal that is relatively easy to accomplish, at least before you try it.

Secondly, we designed the project so that participants could get credit for what they contributed (as a co-author of the whole) and they could write about what they know (i.e. what they are doing). Thus, contribution was driven by personal motivation and knowledge and very little push of obligation. We weren’t asking people to contribute to our fame or to have to do a lot of research. This paper, the site with its blogs, and the dataset all acknowledge and document the participatory authorship of the project. This had the additional virtue of making it easier to explain for ethics purposes.

Thirdly, we did not assume people would volunteer to participate. Instead, we invited people personally, creating a seed group before issuing an open call for participation. In later iterations we tried to use other social media like Twitter to encourage people to participate. With such projects you can’t assume one message on Humanist will be enough. You need to gently remind people of the project through different venues, with word of mouth being the most effective.

Once we had the participants, we decided to maintain a light but steady feed of information. We sent about an e-mail a week to keep in touch as the day approached. Human contact and communication are essential – participants are, after all, volunteering their work to make the project work so we had to be responsive. For that reason we had a number of people assigned to answer different types of questions and spent some time developing online materials to help explain the project and connect people.

As for the day itself, an important factor was that the technology was reasonably familiar and worked. We chose to use WordPress because many would be familiar with it and because rolling our own would be too dangerous. It also allowed us to postpone decisions about the final publication technology. WordPress has good exporting capabilities so we can work with a potential publisher later to bring it into a publication structure.

4.2 What Didn't Work

There were, however, things we could have done better. For example, we needed to explain the photo consent forms earlier and better. The details of research ethics in Canada are intricate, and participants unfamiliar with them needed time to understand and accept them. In subsequent iterations we worked with the Research Ethics Board to develop a less intrusive consent system.

Secondly, the technology should have been explained more thoroughly; for example, the ability to upload photographs was notably unintuitive. We also didn't make it clear to participants how they could create an introductory "About" page with their biography. The idea of the introductory biography was to have an entry that participants could create before the day and which they could use to point to all the projects they are involved in that they might not work on the actual day. We wanted to give people a way to provide context and additional information. As it is, we set up the WordPress accounts with a separate page as a bio, but many just created a blog entry for that.

We could have promoted the project better, alerting people that they could watch it unfold on the RSS feed page. We also should have anticipated that people would use Twitter to tweet about the project, creating a secondary discussion context. We should have supported that better by suggesting a subject hashtag, something we did in subsequent iterations.

WordPress, for all its virtues, turned out to export data that was difficult to clean up. The content of the entries exported from WordPress includes whatever the user pasted in or edited. Where users wrote in MS Word and pasted in their entries, we also found layers of Microsoft XML code that were hard to process automatically. Ultimately, we had to settle for fairly crude, though consistent, content markup.
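To illustrate the kind of residue involved, the following Python sketch strips some of the markup that MS Word typically leaves behind when text is pasted into a web editor. It is a minimal illustration under stated assumptions, not the scripts used for the project: the patterns and the function name strip_word_markup are hypothetical, and the regular expressions cover only a few common cases (conditional comments, Office-namespaced tags, and "Mso" class or style attributes).

```python
import re

# Illustrative patterns for residue commonly left by pasting from MS Word.
# They are assumptions for this sketch and are not exhaustive.
CONDITIONAL_COMMENT = re.compile(
    r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE
)
OFFICE_TAG = re.compile(r"</?(?:o|w|m|v|st1):[^>]*>", re.IGNORECASE)
MSO_ATTR = re.compile(
    r"""\s+(?:class|style)=(["'])[^"']*mso[^"']*\1""", re.IGNORECASE
)


def strip_word_markup(html: str) -> str:
    """Remove Word conditional comments, Office-namespaced tags,
    and class/style attributes that mention 'mso'."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = OFFICE_TAG.sub("", html)
    html = MSO_ATTR.sub("", html)
    return html


if __name__ == "__main__":
    pasted = (
        '<p class="MsoNormal">Hello<o:p></o:p></p>'
        "<!--[if gte mso 9]><xml><w:WordDocument></w:WordDocument></xml><![endif]-->"
    )
    print(strip_word_markup(pasted))
```

Regular expressions of this kind are brittle (for instance, a style attribute containing nested quotation marks is left untouched), which is one reason such cleanup tends to end in a compromise rather than a perfect result.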

We would like to encourage greater international participation, especially in non-English speaking countries, but have failed in attempts to get more participation. We have tried personally inviting people we know in other countries. We have tried asking international participants to recommend people they think would like to participate. None of this has lessened the Anglo-centric geographic distribution. For 2012 we hope to enlist partner institutions to present the project in their community as a way of widening participation.

5. Theorizing the Project

A Day in the Life of Digital Humanities 2009 is an example of a collaborative crowd-sourced project. With always-on access to the Internet, dispersed communities can create such documentation together with less logistical complexity than used to be required.[13] In our case, the instant publishing and sharing of participant content resulted in a fairly rapid feedback loop within the project. Participants actively followed each other's blogs and commented on them. Others thought about the project itself and offered reflections on it. This active commentary overflowed beyond the project as discussion unfolded in other forums, notably on Twitter.

In humanities research, there is often an inverse relationship between depth and breadth. An individual scholar can do one or the other. At their most meticulous, humanists may spend years reading and interpreting a short text in great depth. To handle a broad corpus necessitates a colder, more mechanical approach to the data or more people. Clement et al. respond to Crane's question of "what do you do with a million books?" by claiming that one certainly does not need to read such a corpus [Clement 2008] [Crane 2006]. In fact, we often don’t – we can now mine it. However, large-scale online collaboration suggests a way to handle breadth while keeping the human depth of reading and writing. By involving many people and dividing the task among them one can closely look at a broad phenomenon without resorting to automated mining and analysis. The Day of DH 2009 was not a text reading project, but it and other such projects show how a number of people can take a deeper look at a phenomenon together in a short period of time. We hypothesize that there are two ways the humanities can respond to the challenge of scale; we can automate research practices so that computers can mine large corpora or we can mobilize large crowds to distribute human research practices over large corpora. Both strategies have merit and should be pursued.

One aim of the Day of DH was to explore the usefulness of auto-ethnography as a methodology for studying the digital humanities. Nicholas Holt defines auto-ethnography as a "writing practice [involving] highly personalized accounts where authors draw on their own experiences to extend understanding of a particular discipline or culture" [Holt 2003]. Auto-ethnography differs significantly from traditional ethnography where the researcher takes a distant and observational perspective on the culture being studied.

The single question posed by this project, "What do you do?" could have just as readily been addressed through standard ethnographic techniques such as questionnaires or interviews with participants. However, by inviting participants to record their day in a blog, the participants then became the primary researchers of themselves, incorporating their own experiences and culture into the narrative of their day. By turning the collection of data on the digital humanities into an auto-ethnographic study of over 90 digital humanists, each participant became the researcher of their own role in the digital humanities. This reflexive study of the participant-researcher’s own role in a greater culture thus has created a dataset far richer and more complex than would have otherwise been available if digital humanists had been given a set of parameters, such as a questionnaire, in which to define themselves. The rich dataset derived from these multiple auto-ethnographies also allows for an additional layer of more traditional ethnographic analysis by all. We can all use the rich dataset to make generalizations about the activities of digital humanists as a group, yet fully appreciate the complexities of the individual narratives.

5.1 Reflection in the Digital Humanities?

Community projects don't simply document an existing community – to some extent they create it. This is an age-old pattern where a community negotiates its becoming by presenting to itself images of what it would be if already mature. Consider how thinkers like Plato and Cicero used dialogues to imagine a mature culture of discourse. In the digital humanities, we have a tradition of welcoming interdisciplinary researchers, which means that our boundaries are permeable and the discipline perpetually thinks of itself as being youthful and in formation. Community building is therefore an ongoing activity, to which projects like the Day of DH 2009 can contribute. One could say that the Day of DH provides one site where we can renegotiate the community. One participant told us afterwards that they were thinking of running something similar at their university as a community building exercise. The exercise could thus become recursive with communities within the community negotiating themselves in perpetuity. The data will never be an objective representation of what we typically do (if there is such a thing); it will constitute a move representative of what some of us think we do or should be doing.

Recursive reflection is not a vicious thing. The humanities have a different relationship with observation than the sciences. The humanities are not concerned with changes that the act of observing a phenomenon (like the human) will effect in that phenomenon. Rather, it is believed that observation of the human is an observation of ourselves; therefore, self-reflection, if effective, will change the observer and the observed. For that reason, reflection has always been important to the humanities and is alive and well in the digital humanities. As mentioned above, we were surprised by how many of the participants used the event not just to document their everyday activities, but also to reflect on the field, the place of computing, the methods, the technologies, and even the project itself. This shouldn’t have surprised us as it is a paradigmatic move for humanists to ask about the asking. That this self-reflection could take place internationally using an online social media technology like WordPress MU is perhaps a reflection of the comfort with technology that characterizes the digital humanities.

The project is typical of humanities computing in another way. We learned through collaborative making. Every participant was making something through technology on the day, even if the technology was fairly simple. While we tell granting agencies that we have fully worked out the theory of what we are doing before proposing projects, in many cases the fabrication is the research. To paraphrase the title of a workshop organized by William Turkel (also one of the participants), in this project we collectively explored "Crowdsourcing as a Way of Knowing." Such fabricated research is new to the humanities, but not to the arts and design. In the humanities we are used to publishing as a way of sharing knowledge not collaborative making of technological things.

Willard McCarty proposes that we think of our practice as one of modeling, where we are modeling as a process of exploration while at the same time creating models that are representations [McCarty 2005]. This project can be thought of as a collaborative modeling of the field where for one day we used some of our own tools and new methods to think about our research in community. The process and the result was an interpretation.

Lastly, we return to the deliberately modest use of technology in this project. The digital humanities can often be about really big grant-funded projects like the TEI, TAPoR, Monk, Zotero, HASTAC, and so many other worthy projects. With the economic downturn after 2008 came a defunding of universities and infrastructure so we now need models for how to do research that does not depend on startup or ongoing funding. We need to imagine how we can reuse existing technologies, use free web services, and curate projects over a long term without being dependent on a grant. The Day of DH project was deliberately developed without grant funding, instead using the resources at hand from wikis that were already set up to WordPress MU. For us it was not only an experiment in crowdsourcing, it was also an experiment in reuse. The most important resource was the time of the organizers and participants, not the software, and the volunteered time was just that, voluntary. Granted, we were lucky to have a very supportive Arts Resource Centre which has servers and staff to help with such projects, but the point is that one can design projects to use existing resources and to involve colleagues rather than to depend on a grant to pay all involved. Further, we believe that the digital humanities has developed a dependence on funding that has curtailed our imagination as to what can be done. There is nothing wrong with grant programs that support research and the digital humanities needed an initial injection of funding, but grants shouldn’t be an end in themselves or the sole metric for value. Fortunately, there are now sufficient free services online, from Google Code to Flickr, that make it possible to run projects without having to develop specialized tools or infrastructure. What matters most is the people, their expertise, and their time.[14]

Appendix: A Day in the Life of the Digital Humanities: About the Collective Dataset

Introduction

The Day in the Life of the Digital Humanities (Day of DH) project was conceived as a community documentation project. This document describes “the dataset” by which we mean the edited and tagged version of the combined blogs, comments and pictures. This dataset was curated by the University of Alberta team in the following fashion:

The individual participants wrote their entries in a WordPress MU blog set up for the project.

We allowed the participants time, after the day, to update and edit their entries.

In some cases we removed the empty blogs of potential participants who were not able to blog their day.

We exported the blogs from WordPress in the XML format that it provides. Unfortunately, WordPress wraps the content of each entry in a CDATA section, because the tagging within tends to vary depending on the user. Below is an example from the original export of Geoffrey Rockwell’s blog. Note how WordPress adds its own tags, delimited by square brackets ([ and ]), for the caption of the image:

<post timestamp="March 18th, 2009 at 9:18 am MDT">
<title>Email and News</title>
<tags>DDH Morning,DDH-AboutDDH,DDH-AdminService,DDH-Blogging,DDH-Email,DDH-Home</tags>
<categories>Various</categories>
<![CDATA[
Needless to say I woke up over an over lat night worrying about the Day of DH project. I don't know how many times I started my first post in twilight dreams. Anyway, I don't have a meeting until 10:00am so I'm starting the day in my corner by the window checking e-mail and doing various online tasks from my to do list like posting a news item on the <a href="http://portal.tapor.ca">TAPoR portal</a> and blogging the project on <a href="http://www.philosophi.ca/theoreti/?p=2415">theoreti.ca</a> (my usual blog). There seems to be something circular about blogging about blogging. Perhaps I should go twitter too.
[caption id="attachment_30" align="aligncenter" width="300" caption="Laptop and Books"]<img class="size-medium wp-image-30" src="http://ra.tapor.ualberta.ca/~dayofdh/GeoffreyRockwell/files/2009/03/img_2579-300x225.jpg" alt="Laptop and Books" width="300" height="225" />[/caption]
A quick glance at e-mail shows that nothing blew up with our blog system and the discussion is friendly. It will take me an hour to get through all the e-mail and little tasks. Why has e-mail become such a chore?
]]>
</post>

We used XSLT and scripts to clean up the XML, to unescape the content, and to try to standardize the tagging of the content as much as possible. We tried to make the XML well-formed and able to display in a browser as the author saw it.
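As a rough illustration of this kind of cleanup, the Python sketch below reads the content of a CDATA section, converts the square-bracket caption tags into ordinary markup, and checks whether the result is well-formed. It is a sketch under stated assumptions rather than the XSLT and scripts actually used: the sample post is simplified, the function names are hypothetical, and the div/p output chosen for captions is only one plausible target.

```python
import re
import xml.etree.ElementTree as ET

# Matches [caption ... caption="TEXT"]BODY[/caption] as seen in the export sample.
CAPTION = re.compile(
    r'\[caption\b[^\]]*?\bcaption="(?P<text>[^"]*)"[^\]]*\](?P<body>.*?)\[/caption\]',
    re.DOTALL,
)


def convert_captions(content: str) -> str:
    """Rewrite WordPress caption shortcodes as plain markup.
    The div/p target structure is an assumption made for this sketch."""
    def repl(match: re.Match) -> str:
        body = match.group("body").strip()
        return f'<div class="caption">{body}<p>{match.group("text")}</p></div>'
    return CAPTION.sub(repl, content)


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a root element."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Simplified, hypothetical post: only the CDATA content is kept.
    export = (
        "<post><![CDATA[Checking e-mail. "
        '[caption id="attachment_30" align="aligncenter" width="300" '
        'caption="Laptop and Books"]'
        '<img src="img_2579-300x225.jpg" alt="Laptop and Books" />'
        "[/caption]]]></post>"
    )
    raw = ET.fromstring(export).text  # the parser returns CDATA content unescaped
    cleaned = convert_captions(raw)
    print(cleaned)
    print(is_well_formed(cleaned))
```

An XML parser already returns CDATA content as plain text, so the harder part is deciding what to do with content that fails the well-formedness check, for example entries containing unclosed tags or undefined HTML entities.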

We tagged all the entries with our own “DDH-” prefixed categories. Examples include:

We lightly edited the content for obvious typos and redundant tags. We did this partly to make sure that we had read all the entries. We were also checking for spam comments (which did creep in and which were deleted) and inappropriate content.

In collaboration with the Kompetenzzentrum für elektronische Erschließungs- und Publikationsverfahren in den Geisteswissenschaften, University of Trier, Germany, the data was then transformed into a TEI "A day in the life of the Digital Humanities" customization.

Structure of blog entries

The blog entries have the following structure (description of the content of elements is in italics):

<item>
<title>The title the author gave the item.</title>
<dc:creator>The author’s name collapsed into one word
like “GeoffreyRockwell”</dc:creator>
<wp:post_date>2009-03-18 16:08:40</wp:post_date>
<wp:post_date_gmt>2009-03-18 22:08:40</wp:post_date_gmt>
<description/>
<edited>Initials of the person who did the editing
and a short description of what they did.</edited>
<tags>
<category domain="tag" nicename="add-new-tag">Add new tag</category>
More tags
</tags>
<content:encoded>The content of the entry in some mix of HTML.</content:encoded>
<comments>
<wp:comment>
<wp:comment_id>2</wp:comment_id>
<wp:comment_author>The author of the comment.</wp:comment_author>
<wp:comment_author_url/>
<wp:comment_date>2009-03-18 17:05:01</wp:comment_date>
<wp:comment_date_gmt>2009-03-18 23:05:01</wp:comment_date_gmt>
<wp:comment_content>The text of the comment.</wp:comment_content>
<wp:comment_type/>
<wp:comment_parent>0</wp:comment_parent>
<wp:comment_user_id>0</wp:comment_user_id>
</wp:comment>
More comments
</comments>
</item>
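A dataset with this structure can be read with standard XML tools. The Python sketch below parses items of the shape shown above and counts the "DDH-" prefixed tags. It is only an illustration: the sample values and nicename attributes are hypothetical, the enclosing channel element is an assumption, and the namespace URIs are those commonly used by WordPress exports of the period and should be checked against the declarations in the actual file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs are assumptions; verify them against the dataset's own declarations.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Hypothetical sample following the item structure described above.
SAMPLE = """<channel xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:wp="http://wordpress.org/export/1.0/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
<item>
<title>Email and News</title>
<dc:creator>GeoffreyRockwell</dc:creator>
<wp:post_date>2009-03-18 09:18:00</wp:post_date>
<tags>
<category domain="tag" nicename="ddh-email">DDH-Email</category>
<category domain="tag" nicename="ddh-blogging">DDH-Blogging</category>
</tags>
<content:encoded>Checking e-mail.</content:encoded>
<comments>
<wp:comment><wp:comment_author>A reader</wp:comment_author></wp:comment>
</comments>
</item>
</channel>"""


def summarize(xml_text: str):
    """Return a list of entry summaries and a count of DDH- prefixed tags."""
    root = ET.fromstring(xml_text)
    entries = []
    tag_counts = Counter()
    for item in root.iter("item"):
        tags = [c.text for c in item.findall("tags/category") if c.text]
        tag_counts.update(t for t in tags if t.startswith("DDH-"))
        entries.append({
            "title": item.findtext("title"),
            "author": item.findtext("dc:creator", namespaces=NS),
            "date": item.findtext("wp:post_date", namespaces=NS),
            "tags": tags,
            "comments": len(item.findall("comments/wp:comment", NS)),
        })
    return entries, tag_counts


if __name__ == "__main__":
    entries, tag_counts = summarize(SAMPLE)
    for entry in entries:
        print(entry["author"], entry["date"], entry["title"])
    print(tag_counts.most_common())
```

Counting tags in this way is one simple route to the kind of activity tallies discussed in the paper, although any such count depends on how consistently the tags were applied.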

TEI Customisation

Given the ethnographic character of the project, our rationale was to preserve as much of the WordPress export as possible. While we wanted the content of entries to be unescaped, we decided not to try to edit it extensively. Nevertheless, we recognized the importance of finding a balance between implementing our project rationale and affording the project data the usual benefits that are associated with the use of the TEI standard. We therefore treated the dataset as a TEI Corpus, in terms of its outermost container elements and headers, while the contents of blog entries (such as that illustrated in 2.0 above) were largely preserved.

Notes

[1]We should distinguish two senses of "we" used in this paper. In most cases, as in this introductory sentence, "we" is all the participants in the Day of Digital Humanities. In other cases, "we" refers to the authors of the paper who were the organizers of the Day and curators of the data. We hope the context makes clear which "we" is meant.

[5]See, for example, Dan Cohen's discussion at a Columbia University panel on "Research Without Borders: Defining the Digital Humanities" (April 6, 2011), available on YouTube at http://www.youtube.com/watch?v=Xu6Z1SoEZcc.

[8]Since the first iteration of the project we have had to address questions about whether the Day of DH was just a promotional exercise. For example, in response to private emails Rockwell posted the following to the email list:

I have been asked if this project isn't just about our self-promotion and whether people will fabricate entries. I tend to answer like this:
- Of course this is about us, it is a form of auto-ethnography where a community reports on what it does with all the attendant misrepresentations. Others reading it will know how it was written and be able to draw their own inferences. Wouldn't it be interesting if someone found a different way to track what we do and compare it to what we say we do...

[9]For some sense of the social success of the project see the "Day of DH" Twitter account we created in 2011 at http://twitter.com/#!/DayofDH or just Google "Day of DH".

[10]For an independent analysis of participants see Adam Crymble’s blog entry "Where are the Digital Humanists?" (March 17, 2011). Note that he is analyzing the 2011 list of participants, not the list from 2009.

[13]One could argue that the Oxford English Dictionary project was a print and mail crowdsourcing project. All the Internet and the Web have done is make it easier to gather a community and faster for the community to communicate.

Ayers 2003 Ayers, E. J. "The Valley of the Shadow: Two Communities in the American Civil War" [Online]. University of Virginia (2003). Available at: http://valley.vcdh.virginia.edu/ [Accessed 19 January 2010].