Getting Out of the Book and Into the Digital

Month: November 2014

Visualisations have played a very important role in our understanding of large sets of data. Contrary to what you might think, they aren’t a relatively recent phenomenon; they’ve existed for hundreds of years. After all, the very first maps were a type of data visualisation – a way to visualise space (otherwise known as a spatial visualisation). Today, however, I want to talk a bit about temporal visualisations – visual representations of data as it relates to time. Using an annotated bibliography, I will present a number of sources for further reading on the subject, in the hope of building a broader understanding.

This article from Flowingdata.com discusses some of the standard types of visualisations used when dealing with time-related data. The article describes each visualisation and its standard usage, and provides an example (via a link) of an implementation of said visualisation.

While relatively short compared with some of the other articles I’ve posted here, I think this article does a great job of summing up the main types of time-based visualisations. I love the use of examples to illustrate each implementation, as well as the explanations of when it is appropriate to use a particular type of visualisation.

Aris et al. discuss time series data and its use in visualisation. Specifically, they focus on unevenly spaced time data and propose four different visualisation techniques: sampled events, aggregated sampled events, event index, and interleaved event index. Each is discussed in depth, and an example is provided showing its implementation.
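To make the distinction between these approaches concrete, here is a minimal Python sketch of two of them – aggregated sampled events (counting events in fixed-width bins) and the event index (ordering events by ordinal rank so uneven gaps collapse to uniform spacing). The timestamps are made up for illustration; this is not Aris et al.’s data or code.

```python
from datetime import datetime, timedelta

# Hypothetical unevenly spaced event timestamps (illustrative only).
events = [
    datetime(2014, 11, 1, 9, 0),
    datetime(2014, 11, 1, 9, 5),
    datetime(2014, 11, 1, 11, 30),
    datetime(2014, 11, 2, 14, 0),
    datetime(2014, 11, 5, 8, 45),
]

def aggregate_sampled(events, bin_width=timedelta(days=1)):
    """Aggregated sampled events: count events per fixed-width bin."""
    start = min(events)
    counts = {}
    for e in events:
        bin_index = (e - start) // bin_width  # timedelta floor division
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return counts

def event_index(events):
    """Event index: the plot position is the event's ordinal rank,
    so uneven gaps in time are compressed to uniform spacing."""
    return [(i, e) for i, e in enumerate(sorted(events))]

print(aggregate_sampled(events))  # → {0: 3, 1: 1, 3: 1}
print(event_index(events)[0])     # first event at index 0
```

Either series can then be handed to a plotting library; the interesting part is how differently the same five events read when the x-axis is real time versus rank.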

The methods here are certainly presented in a more coherent manner than some of the other entries I’ve listed. The visualisations offered as examples are easy to follow and interpret, if somewhat lacking in imagination. Shiroi, Misue, and Tanaka (see entry) based much of their work on the work presented here, and I can see the relationship between the two (which is why I called this out as an additional resource). The paper provides a really solid understanding of time series data but leaves room for growth in terms of creativity in the actual implementation of a visualisation method.

This white paper opens by discussing some of the methods used for visualising time-related data, specifically data in a time series. In addition, the paper discusses query techniques that can be used for searching time series data. The paper then examines TimeSearcher2, the next iteration of the TimeSearcher software, which allows a user to load a large dataset and then visualise it using a number of built-in analysis tools. The paper mainly focuses on a few of the features new to TimeSearcher2, such as a view that allows the user to examine multiple variables when visualising the data, improvements to the interactivity of the search, and improvements to the search algorithms. The paper closes with a discussion of shortfalls within the software and improvements that could be made in future versions.

The visualisations used in the software are somewhat primitive, but given the age of the paper (nearly a decade old), this is not wholly surprising. Buono et al. are quite candid in their evaluation, specifically in the conclusion, where they discuss the shortfalls of the tool. They are also quite open about their methods, particularly in their discussion of improvements to the search algorithm. The paper serves as an interesting insight into the history of time-based visualisations over the last 10 years.

Capozzi looks at 19th-century British literature that deals specifically with India as its primary subject. In her presentation, she attempts to provide data to support her hypothesis that not only did Britain have an impact on Indian culture but, more importantly, India had an impact on British culture (via literature) as a result of British colonialism. She takes a random sampling of literature and uses topic modelling, via a program known as “MALLET”, to plot various topics over time. Using line graphs (simple time visualisations), Capozzi presents this data as support for her hypothesis.
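For readers curious what sits behind a topic-over-time line graph, it usually reduces to averaging per-document topic proportions by date. A minimal sketch, using entirely hypothetical years and proportions (not Capozzi’s data, and a simplified stand-in for MALLET’s doc-topics output):

```python
# Hypothetical (year, [topic share, ...]) pairs per document,
# standing in for a reduced MALLET doc-topics table.
docs = [
    (1857, [0.6, 0.4]),
    (1857, [0.2, 0.8]),
    (1914, [0.7, 0.3]),
    (1914, [0.5, 0.5]),
]

def topic_trend(docs, topic):
    """Average a topic's share per year -- the series behind a
    topic-over-time line graph."""
    by_year = {}
    for year, shares in docs:
        by_year.setdefault(year, []).append(shares[topic])
    return {y: round(sum(v) / len(v), 3) for y, v in sorted(by_year.items())}

print(topic_trend(docs, 0))  # → {1857: 0.4, 1914: 0.6}
```

The point of the sketch is how little is in the line itself: everything interesting (corpus selection, number of topics, preprocessing) happens upstream, which is exactly why an undisclosed methodology undermines the resulting graph.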

While Capozzi’s presentation is not a temporal visualisation in the sense that I am using the term throughout this post, I include it here as a cautionary tale of what not to do with a visualisation. Capozzi presents some very simple line graphs which seem to support her hypothesis. However, upon closer inspection, it is clear she relies on correlations between upticks in topic clusters at certain times and events in Indian history (such as the rise of the Raj or political unrest during World War I). She provides no empirical evidence for these links, instead merely asserting a cause-and-effect relationship from the correlation. Furthermore, Capozzi offers no methodology behind her topic model (and, by extension, her visualisations). Without a thorough understanding of how her data was derived, we cannot form informed opinions about the data she is attempting to visualise. When working with visualisations, it is imperative not only to use a visualisation that will be intuitive to your audience (as I point out in later entries) but also to remain transparent about the methodologies used to derive the data.

Day presents a number of interesting ideas around spatial and temporal visualisations. In his presentation, he discusses how we typically use time and space data, as well as common methods of plotting this data in a graphical form. He then continues by discussing both time and space data, separately and together, in a more in-depth format. He also discusses some great tools for visualising this type of data.

What I love most about this presentation are slides 14, 15, and 16. Here, Day discusses how we are used to seeing time plotted in a linear format. But he delves deeper by plotting out actual time data (using the movie Back to the Future as an example) to illustrate other, non-linear visualisations of time.

De Keyser, V. “Temporal Decision Making in Complex Environments.” Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 327.1241 (1990): 569–576. Web. 20 November 2014. http://www.jstor.org/stable/55328.

De Keyser’s essay delves into the importance of time and the role it plays when making decisions. De Keyser begins by discussing how technology has changed our perception of time with regard to planning, due in large part to the increased availability of data and the ability to control the minutiae of outputs. He then discusses strategies behind temporal decision making, such as anticipation, assessment, and adjustment. He concludes with a discussion of errors that can (and most likely will) arise from temporal decision making and their effects on a given process.

The article deals largely with the effects of time on technology projects and how the private sector constantly evaluates the success of such projects based on metrics involving time. These metrics often stem from datasets and statistics compiled into visualisations in order to express success as a function of time and resources. While more esoteric than many of the other entries listed here, this article does provide a theoretical understanding of time and its role in decision making — a factor that plays heavily into the importance of temporal visualisations.

Friendly discusses the history of data visualisation, noting that the first temporal visualisation was a 10th-century graph of the movements of stars and planets over a period of time (p. 3). The article goes on to trace the history of visualisations and their development throughout the 17th, 18th, and 19th centuries, noting the dramatic shift in approach during the latter part of the 19th century as a result of the increased use of statistics among government agencies, as well as innovations in the fields of graphics and design. Friendly then discusses visualisations in the 20th century, noting the dramatic changes between the earlier and latter parts of the century thanks to innovations such as interactive visualisations, dynamic data, and high-dimensional visualisations. He concludes with a look at “history as data” (p. 26) and his evaluation of the “Milestones Project” — a project on which he based much of his review (p. 25).

Overall, Friendly provides an interesting and thorough analysis of the history of data visualisations. His essay provides the reader with the background necessary to understand the context behind visualisations and how the methods have evolved over the course of the last few centuries. This is an excellent starting point for anyone wanting to dive deeper into the theoretical realm of the subject matter.

While visual representations of space enjoy a fairly uniform acceptance of standard visualisations even across disciplines, Mayr argues that time-based visualisations are much less standardised and tend to be rather specific to the individual discipline in which they are used. To address this phenomenon, Mayr describes several exercises performed with students in an effort to visualise time-based data according to guidelines he lays out in the article.

While the article itself is quite interesting, I don’t think Mayr actually manages to create any kind of coherence around the visualisation and notation of time, nor do I agree that consistent visualisation practices do not exist. He opens by discussing how notations vary from discipline to discipline but then proceeds to focus on techniques that rely rather heavily on the field of music to inform his guidelines (Mayr mentions in the article that he has a background in music). That said, the classroom exercises he describes, and their results, lead to some novel takes on time visualisations that I think most readers will find worthwhile.

While Friendly’s article is an excellent take on the history of the field, Moore and Dwyer discuss the importance of visualisations and their relation to learning and cognitive development. While their entire book contains a plethora of interesting and important information, of particular note are sections 5 and 6, which discuss the role of visualisations in schools and business, as well as the cultural and socio-political impact of the field of semiotics and its intersection with technology. Semiotics, the study of signs and symbols, plays a major role in the understanding of visualisations and the data they convey.

Moore and Dwyer’s work is an excellent companion to Friendly’s article, providing a strong basis for understanding the overall realm of data visualisations. Both are a necessary first step towards a deeper understanding of why visualisations are important and how they are utilised.

Shiroi et al. discuss a visualisation technique they have developed known as “ChronoView”. They begin by outlining a problem with temporal visualisations: the treatment of each time interval as discrete, and the resulting inability to cluster a single event around multiple time entries. To combat this problem, they developed a circular view of the data.
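The intuition behind a circular view of recurring events can be sketched simply: map each occurrence time to a point on a 24-hour clock face and place the event at the centroid of those points. This is only an illustration of the general idea, not the ChronoView algorithm itself.

```python
import math

def clock_position(hours):
    """Place a recurring event inside a 24-hour clock face: each
    occurrence becomes a unit vector at its hour angle, and the event
    sits at the mean of those vectors. Events with tightly clustered
    times land near the rim; evenly scattered ones drift to the centre."""
    angles = [2 * math.pi * h / 24 for h in hours]
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return x, y

# An event that always happens at 06:00 sits on the rim...
print(clock_position([6, 6, 6]))
# ...while one spread across 00:00, 08:00, and 16:00 collapses to the centre.
print(clock_position([0, 8, 16]))
```

A single position can thus summarise an event that recurs at many times – which is exactly the “single event, multiple time entries” problem the discrete-interval views struggle with.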

While I’m not entirely sold on the visualisation used here, it is an interesting approach to visualising time-related data. The paper itself is well thought out, and the methods used for plotting the data are clearly and concisely disclosed — something I feel is incredibly important in visualisation work. This, however, is the type of visualisation that I feel doesn’t lend itself well to the average reader. I would posit that to understand data presented in this format, one would need not only a solid understanding of the particular field or data being discussed but also a strong background in statistics or visualisation theory. As a whole, though, I think it’s a solid take on time visualisations.

Turker and Balcisoy discuss the visualisation of temporal data drawn from large social network datasets. As a result of their research, they have created a new visualisation technique dubbed the “Hyperbolic Temporal Layout Method” (HTLM), which uses geometry and spatial placement to visualise actors and relationships in a spiral layout. The paper describes how HTLM was developed, the algorithms used, and examples of the actual visualisation.
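As an illustration of the general idea of a spiral layout — not Turker and Balcisoy’s actual algorithm — time-ordered items can be placed along an Archimedean spiral, so earlier items sit near the centre and later ones wind outward:

```python
import math

def spiral_layout(n_points, step=0.5, growth=0.2):
    """Place time-ordered points along an Archimedean spiral
    (r = growth * theta): earlier items sit near the centre,
    later ones wind outward. step is the angular gap between
    consecutive items, in radians."""
    points = []
    for i in range(n_points):
        theta = i * step
        r = growth * theta
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

pts = spiral_layout(40)

# Radius grows monotonically with time order, so chronology
# is readable as distance from the centre.
radii = [math.hypot(x, y) for x, y in pts]
assert radii == sorted(radii)
```

Even this toy version hints at the readability trade-off discussed below: the geometry encodes time elegantly, but a viewer needs to know the convention before the picture means anything.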

Turker and Balcisoy have done an excellent job of researching and proposing a new visualisation technique. They have taken great care to remain transparent in their approach, fully disclosing the algorithms used and discussing the background that led them to the creation of HTLM. That said, I feel that the visualisation itself falls somewhat flat. While an interesting take on a temporal visualisation, I feel that without a significant understanding of the data and the field, most users would be unable to parse the data being presented — the visualisation remains almost unreadable to the casual observer. Perhaps Turker and Balcisoy are positioning HTLM towards a specific audience, but there is no indication within the paper itself that this is the case. Thus, while the visualisation offers a new and creative technique for visualising data, its difficult readability makes it a less than ideal visualisation.

The Letters of 1916 is a project involving the transcription and compilation of letters written by or to Irish residents between November 1915 and October 1916. Originally begun at Trinity College Dublin, the Letters of 1916 project has recently been transferred to Maynooth University. My Digital Scholarly Editing class has had the privilege to assist with the upload and transcription of some of the letters on the website. But one of the things that has fascinated me most about this project is the use of crowdsourcing to transcribe the letters and the methods of promotion that have been utilised to garner public attention.

Crowdsourcing the Transcriptions

At first glance, one wouldn’t think there would be many letters to transcribe for such a short period of time. But once you step out of a modern mindset and realise that, during this period, letter writing was really the only way to communicate, you begin to understand the sheer breadth of what this project is attempting to undertake. Factor in that Ireland was at the time firmly enmeshed in World War I, and that the Easter Rising, a prominent event in Ireland’s fight for independence, occurred in April 1916, and there was quite a bit of activity for the citizenry to discuss. So how does a small research team transcribe all of these letters?

Enter the concept of crowdsourcing. Wikipedia defines crowdsourcing as “the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.” The idea is to leverage your audience, the group of people most invested in your product, to assist with the collection of your content. While some have maligned the concept of crowdsourcing as nothing but free labour (see Crowdsourcing: Sabotaging our Value), crowdsourcing has become an important tool in the content collection space, especially among non-profit endeavours.

So how does crowdsourcing work with the Letters of 1916? It’s fairly simple. Anyone can upload a letter (although many of the letters are contributed by agencies such as the National Archives of Ireland, the Military Archives of Ireland, the University College Dublin Archives, etc.), and once a letter is uploaded, any user can transcribe it using a standard set of transcription tools provided by the website (Letters of 1916 utilises Omeka and Scripto to support the transcription effort).

By the Numbers

The crowdsourced transcription effort has been a great success. As of 31 October 2014, more than 1,000 letters had been transcribed or were in the process of being transcribed (approximately 71% of all currently uploaded letters), and October saw the addition of more than 30 new members to the transcription effort. For more information on the numbers, please refer to the October 2014 Progress Report.
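A quick back-of-envelope check on those figures: if the 1,000-odd letters represent roughly 71% of uploads, the implied total is about 1,400 letters currently on the site.

```python
# Back-of-envelope from the figures quoted above.
transcribed = 1000   # letters transcribed or in progress (a "more than" figure)
share = 0.71         # stated fraction of all uploaded letters

implied_total = transcribed / share
print(round(implied_total))  # → 1408, i.e. roughly 1,400 uploads
```

That leaves several hundred letters still untouched, which puts the engagement numbers below in useful context.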

These numbers show a positive trend in the use of crowdsourcing to leverage the Letters of 1916 audience in the creation of content for the website. Unfortunately, Omeka tracks only character counts and doesn’t really provide a solid look into the demographics of the transcribers or the extent of their contributions beyond the number of characters transcribed. It is therefore difficult to see the work of transcribers who proof other people’s contributions but add little to the overall character count, as they may only be changing a word here or there. This is one area where Omeka really falls short when it comes to understanding the scope of your user base’s contributions. But the overall sense is that the crowdsourcing effort is highly successful.

Modes of Engagement

So what makes the crowdsourcing effort as successful as it is? It is difficult to tease out any one particular item, but I would posit that the team’s use of social media has led to strong engagement. The team utilises Twitter heavily to promote not only the site but also related items, such as news articles and other events occurring within the Digital Humanities space. In addition, the team holds a monthly Twitter chat using the hashtag #AskLetters1916 (check out Storify for the latest #AskLetters1916 chat). The team also leverages Facebook, in addition to a blog, to advertise the site to those interested in the history of the period or with a general interest in Irish history or the Digital Humanities.

Room for Improvement

While the site has been relatively successful in its efforts to leverage crowdsourcing, that doesn’t imply there isn’t room for improvement. While analysing the site for this article, I came across a couple of items that I thought could be improved from an interaction and usability standpoint.

First, there is A LOT of information on the site — so much, in fact, that it is very easy to get lost in the weeds and forget why you came, and that’s before you even reach the transcription area. There are seven top-level menu options, many of which have multiple sub-menus. There are a number of really interesting and helpful resources related to education and current news, as well as the obligatory “About Us” and sponsorship pages. These are all well and good, but if the main purpose of the site is to contribute or transcribe a letter, I wonder why there isn’t a persistent top-level menu item dedicated to exactly that. Yes, there is a “Contribute” top-level menu, but to do the actual transcription or contribution, one has to navigate to a sub-menu and follow links to log in or sign up. In addition, the menu item is easily lost among the other options.

As an alternative, I would suggest adding a persistent Contribute option in the form of a button coloured to stand out, placed in the upper right-hand corner where the login/signup metaphor typically lives. This would draw the eye to the primary purpose of the site and facilitate a faster workflow for returning users who simply wish to log in to submit a new transcription or upload a new letter.

My second suggestion concerns the transcription workflow itself. When attempting to transcribe a letter, the only view available is by category. As a user, I have no way of knowing which items are already transcribed and awaiting review, which are completed, and which have not been started; there are no options to sort or even filter. I’m simply presented with a long, scrollable list, by category, of the items loaded into the system. Once an item is selected, I can see its status (Not Started, Needs Review, Completed, etc.), but it requires an additional click to begin the actual transcription (a click I deem largely unnecessary). Finally, the transcription tool itself is a little clunky. A toolbar is provided to assist the user with standard TEI encodings, but as the average user may have no knowledge of TEI, and the transcription page provides no explanation of how encoding should be handled, a number of transcriptions require a lot of clean-up to conform to standards. Many of these complaints are, however, limitations of the Omeka and Scripto software, so they are criticisms aimed more at those particular implementations than at the Letters of 1916 project itself.

Conclusion

Criticisms aside, the Letters of 1916 project has done a great job of garnering attention and drawing in its audience in order to facilitate the creation of content. The next step in the process is to migrate from the transcription desk to a site that is searchable and discoverable. With the implementation of a strong search mechanism and a few visualisations of the data to add a little spice, I think the Letters of 1916 will set itself up to be a rousing success.