Visualization Across Disciplines

In recent years, visualization has become an all-purpose technique for communicating and exploring data within the humanities. A wide range of tools is now available, offering different points of entry, from IBM’s Many Eyes to Gephi to Tapor 2.0. Projects like the Visual Thesaurus, Mapping the Republic of Letters, and Hypercities, among countless others, all engage with visualization as an integral part of their scholarship. Yet they do so in very different ways and from a wide variety of disciplinary perspectives, leaving us to question: what is visualization in the humanities?

Why do we use it? How do we use it? And to what end?

This forum seeks to explore some of the key ideas and problems at stake in beginning to articulate an answer. Taking an expansive view of both visualization and the humanities, this forum will interrogate not only the ways in which tools are used, but also the different priorities and intersections of varying disciplines. Using four broad categories (case studies, tools, theory, and pedagogy) as a loose structure, we hope to encourage an open conversation that will speak to the growing use of visualization for research and teaching across the humanities.

Questions and Starting Points:

Case studies: Why and how do you use visualization? What are some of the existing exemplary visualizations in humanities research? What do they offer and/or do that makes them exemplary? What kind of work do they do that is different from other scholarship in the humanities?

Tools: What visualization techniques and/or tools have you found helpful in creating visualizations? Do certain visualization techniques and/or tools have inherent limitations (practically or conceptually)?

Theory: How might we combine what Franco Moretti has called “distant reading” with existing practices of close reading? How do we understand visualization theoretically and critically, in relation to other forms of past and present media?

Pedagogy: What does visualization expect and/or offer pedagogically? In what ways is it a tool for understanding, communicating, and creating knowledge in the classroom?

Hosted by HASTAC Scholars:

Tassie Gniady, department of Library and Information Science, Indiana University

53 comments

Due in large part to its often powerful and aesthetically pleasing visual impact, relatively quick learning curve (thanks to “one-click”-style visualization platforms), and overall “cool,” the practice of visualizing textual data has been widely adopted by the digital humanities. This prevalence is evidenced by, for instance, the high frequency of the term “information visualization” in the 2011-2013 Digital Humanities conference abstracts. Scott Weingart created a bar chart depicting the frequency of various paper and panel submissions for the Digital Humanities 2013 conference. “Visualization” ranked sixth (51 papers) out of 94 total categories, surpassed only by “literary studies,” “data mining/text mining,” “collaboration,” “archives; repositories,” and “text analysis.” If the first wave of large-scale database projects in the digital humanities is exemplified by the practices of digitizing texts, constructing archives, and determining best practices for digital preservation, then the practice of information visualization is emblematic of the second wave of projects devoted to mining this new data. The NEH-funded “Digging into Data” granting program, a yearly challenge that asks how the notion of scale affects humanities research, has specifically supported this practice of leveraging and visualizing large databases and archives.

I am interested in the deployment of information visualization as a method of textual analysis in the digital humanities, with a particular interest in how a technology with a long-running history in the social and STEM sciences might affect the practice of literary scholarship, particularly in terms of reading and interpretation. In particular, I wonder whether the quantitative shift in scale afforded by visualization tools working with large databases might in fact usher in a qualitative shift in scholarship. I am interested in both historicizing and theorizing a technical process that has, in some ways, become detached from its own original historical and conceptual trajectory. I want to locate contemporary visualization within a broader constellation of aesthetic practice and visual representation, specifically within the traditions of statistics, computer science, and graphic design. I am really interested in hearing people’s thoughts on the motivations behind their incorporation of visualization methods into their research.

Dana, interesting point about a first wave and second wave in DH projects.

We might also define a few more waves -- for example, digitization initiatives, online databases / archives, search interfaces to those archives, and data mining / visualization projects might be split into three or four overlapping eras -- although, notably, the later ones often presuppose that the earlier ones are still ongoing. For example, I will be doing a data viz project fairly soon that requires a mass digitization effort in order for visual exploration to be possible -- although I probably wouldn't apply for funding using a "digitize this" narrative, as my personal sense is that this era has indeed largely passed in terms of such work being supported as such.

One other complication -- I agree that the rise of the term "visualization" in the DH literature marks a hot topic, but it may also partly mask the fact that earlier 'waves' were often also invested in the visual, but did not necessarily use terms such as "visualization." The "result set" / "record" design pattern for search that was so common in many early DH projects is also a visual interface, and many "list then retrieve" systems of various kinds were described at length in the literature as an experience for a researcher, often with screenshots... these user interfaces are visual interfaces, although not information visualization in a strict sense. For example, I wonder how often the term "visualization" occurs in McGann's Radiant Textuality (2004) when compared with the incidence of discussions of visual interfaces for doing image-based or UI-based digital humanities work?

That's an excellent point, Jeremy. I'm definitely interested in parsing the waves themselves for more nuanced phases/shifts/etc. I think it's also worth thinking about visualization itself as a set of associated practices. In other words, "visualization" often involves digitization, database construction, etc., and understanding each phase, so to speak, betters our understanding of the final visualization. With respect to your second point, I think you're right about the masking of visualization work prior to the current wave. I would also add to that trajectory portions of the "visual" turn in the pre-digital humanities as being relevant to/influential of current vis. work (W.J.T. Mitchell, Drucker, and others).

My scholarly projects have been dedicated to investigating the rise of popular visual culture in the 18th century and the anxiety, and more often resistance, with which it was met by the Romantic literary elite. Gillen D’Arcy Wood refers to the profound democratization and popularization of visual culture from 1760-1860 as the programmatic desire for the “shock” of the real. Visual spectacles such as the panorama and diorama, advances in theatrical stage technology and acting techniques, the pre-cinema visual experience of the phantasmagoria, and the advent of the modern museum and gallery culture all contributed to how the “real” was defined and consumed.

Thus, as a HASTAC scholar, I am interested in the possibility of developing new analytics to explore the 18th and 19th century visual spectacles of London as ground zero for the “poetics of augmented reality” (Manovich).

My primary work with visualizations has been with network visualizations. I’ve used network visualizations to investigate officer relationships in the U.S. Navy from 1798-1812, and also to track networks of print culture (created through analyzing reprinted texts in American newspapers 1836-1860).

Network visualizations are more specific than some other kinds of visualization because there’s a well-developed theoretical base to networks. Humanists, though, sometimes use the term “network” when what they mean is simply “connection.” There’s not necessarily anything wrong with that, but it’s important to be aware of the theory when using such visualizations.

Network visualizations have been really helpful for me on two different fronts. First, network visualizations help to show connections that seem unimportant when viewed just as a list or archival material. Gephi and other network programs such as NodeXL allow the user to weight the connection line based on the strength of the connection (number of reprints, number of pieces of correspondence, etc.). Sometimes those weighted results can be quite surprising. For instance, we’ve been surprised to note that a paper in Missouri, the Boon’s Lick Times, has a very strong connection to the Vermont Phoenix, in Brattleboro, VT. These connections were discovered from an algorithmically derived data set of over 2 million lines--not something you can analyze by hand or even in Excel.

Of course, that surprising result from the visualization is not the end of the investigation. Though network analysis can provide us with ideas about how things are connected, it can’t tell us why. And that’s where traditional archival historical work comes back into play. Using this network visualization as a jumping-off point, we can then go into the archives to discover what made these two papers so strongly connected. Distant reading has to give way to close reading at some point to figure out causation.

On the other hand, and the second way networks have been helpful to me, close reading can be bolstered (or destroyed) by distant reading as well. That’s the case with my officer network. The close reading of one well-known historical text posits a strong connection between a certain group of naval officers; but I believe that once I’ve got a robust visualization of the actual connections, it will show that the strong connection is illusory.

Both of these examples illustrate benefits and detriments of network visualization. Networks are hard pressed to visualize very complex situations (such as the connections between terrorist organizations, funding sources, and governments in Bosnia). But they can help us strip away some of the noise of the connections to be able to more clearly and easily analyze what we have.
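For anyone curious what the weighting and noise-stripping described above might look like outside a tool like Gephi, here is a minimal sketch in Python. It assumes the data has already been reduced to a list of (paper, paper) pairs, one per detected reprint. The function names, the placeholder paper names, and the counts are all invented for illustration; this is not how Gephi or NodeXL work internally, just the general idea of counting repeated connections and then dropping the weak ones.

```python
from collections import Counter


def weighted_edges(pairs):
    """Collapse repeated (a, b) pairs into undirected edges with a count as weight."""
    weights = Counter()
    for a, b in pairs:
        if a == b:
            continue  # ignore self-connections
        # Sort the endpoints so (a, b) and (b, a) count as the same edge.
        edge = tuple(sorted((a, b)))
        weights[edge] += 1
    return weights


def strongest(weights, min_weight):
    """Keep only edges at or above min_weight, strongest first."""
    kept = [(edge, w) for edge, w in weights.items() if w >= min_weight]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented toy data: each pair stands for one shared reprint.
reprints = [
    ("Paper A", "Paper B"),
    ("Paper B", "Paper A"),
    ("Paper A", "Paper B"),
    ("Paper A", "Paper C"),
    ("Paper C", "Paper C"),
]

edges = weighted_edges(reprints)
top = strongest(edges, min_weight=2)
for (a, b), w in top:
    print(f"{a} <-> {b}: {w} shared reprints")
```

The threshold is itself an interpretive choice: set it too high and a genuinely interesting weak tie disappears, so it is worth reporting alongside any resulting visualization.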

Welcome! I am honored to be co-hosting this forum with Tassie, Brian, Abby, and Dana. Also, many thanks to Fiona and our special guests! We are all quite excited to be a part of this discussion and hope that you will add your thoughts and questions.

I am a student at Indiana University getting a Master's in Information Science as icing for my PhD in early modern literature from UC-Santa Barbara. I've been involved with DH for a while now, and I am currently working as the graduate assistant for the Wells Library's IQ-Wall. Because the Wall is a large display driven by a Windows 7 machine, it offers a low barrier to entry for those wanting to engage with the possibilities that 12 million pixels have to offer.

We have hosted a 3-D art course, students working on visualizations in R, presentations about open access, and even an arcade game night. However, we would like to expand our programming, and this is where you come in. If you had a Wall such as this at your disposal, what would you do with it?

I've played with Google Earth a bunch on the Wall, but I think I could also entice lit, history, and geography classes to come play with the Cassini map. I'm also excited about the workshop What Can We Do with 500 Billion Words starting tomorrow. There are some specific visualization talks planned at the Wall Saturday morning, so I will report back.

Really interesting question. Having had access to one of the largest high resolution display walls in the world (UC San Diego - Calit2) for several years, I've had some time in the last year to ponder what it was best and worst at now that one is no longer in my back yard. As my new home is UC Santa Barbara, a meta-question is whether to immediately reach out to large display groups on campus such as the AlloSphere, or if I should stick with a projector quad tile of a few displays bolted together in some office or lab. Right now, I'm leaning towards the second, and then migrating to collaborating with a large display group once I have findings and things to demo.

I suppose the big question I come back to is, is the application for research, or is it for presentation? My experience is that experimental one-of-a-kind display platforms are really hard to use for active research unless you have a lot of access and support. In the end, software with a zoom function on your laptop may not be better in terms of immersion or resolution, but in terms of 24-hour access, configurability, a shared environment for your other software... there is just no comparison with a personal computer as a tool to assist thought. That said, there are a few things that experimental displays do really, really well. One of them is to intuitively present data sets where simultaneous high resolution at distant locations is important -- a special case, but sometimes an important one. Another is to give killer presentations or present lightning-fast overviews -- the executive summary. So, another way of reframing the question is -- if your display wall is a "situation room" -- who is it for?

My first exposure to data visualization came in using spatial analysis, where dataviz is known as "maps" and has a long history of standards and practices known as "cartography". Maps and GIS and spatial analysis prove interesting in the context of data visualization due to their long-established nature, and I wonder how scholars thinking of "doing dataviz" think of maps and mapping technologies in the context of information or data visualization. Lately, I've been offering up a reformulation of mapping as "geospatial information visualization" so as to draw attention to the lessons learned in the evolution of the map as a form of communication, as well as to recognize that data visualization seems to increasingly focus on presenting multiple types of data (network, text, maps, charts) at the same time. Neatline, for instance, gives us maps and text and timeline by default and out-of-the-box. In these cases, maps no longer are the focus, but rather supplement other equally important methods of data visualization.

I also deal a bit with other forms of data visualization besides maps, such as network representation in some form or another (sometimes geospatial network visualization, as seen in ORBIS). Networks are more intimately tied to the modern conceptualization of information visualization and Abby alludes to a point that I think explains why data visualization is growing so popular: Dataviz isn't just a product, but oftentimes it's the exposed computational process. I've found that when people want me to teach a workshop on Gephi, they really want a workshop on network analysis and representation, and likewise when I'm leveraging some data visualization to present the attributes of a dataset, it's being used to explore and understand it more often than it is to publish it. Visualization doesn't just expose data, it exposes functionality in code and models, and this is often more comprehensible both to the scholars and their audience. This feeds into the old Hacking/Yacking argument in DH, because building dataviz allows you to better understand these processes (and even more if you're coding dataviz and not just using a tool).

My greatest concern with data visualization is that it is still highly invested in a particular production model, one that expects an expert creating summary information for a group of non-experts with very limited time. Tufte's Challenger example, for instance, or Playfair's charts or the dataviz in the New York Times, are emblematic of this. But complex and sophisticated dataviz for peer scholars shouldn't be using the same methods and practices that are expected for infographics aimed at the lay public. In the design and dataviz community, if an audience doesn't "get" your dataviz, then that's your fault, because accessibility is the priority, but we exert a greater expectation on readers of scholarly works, and I don't think that this expectation is maintained for "readers" of scholarly dataviz.

Elijah, I really appreciate this last distinction you make between dataViz (expectations) in the design community vs. the academic context. In my own work, I have been interested in tracing the history of dataViz through the history of design (and other disciplines), and one thing I consistently find myself observing is how the goals and priorities of practitioners of dataViz change depending on audience. If accessibility, clarity, efficiency, etc., are the hallmarks of effective dataViz in the design context, then what can we call successful dataViz in the academic context? Exploratory value? Generation of insight? Effective mediation/clarification of complex phenomena?

Dana, I wish I had a substantive response ready for this--I'll give it a try, though.

I think that the exploratory use of dataviz is well-established, and getting better all the time. The dataviz elements of Voyant, for instance, do much to better situate folks in understanding their text and simultaneously understanding text analysis as a method. I see this happen frequently, where just exposing the data in a setting where someone is helping to drive the tool (whether literally, by sitting next to their audience or figuratively by designing the interface) in a piece designed for exploration seems to work quite well.

But it's in publication where dataviz becomes supplementary and, in the worst cases, fluff or purposefully obscuring the (lack of) scholarship. While much of this has to do with the data literacy point made below by Tara, I think it's in the expected interactivity that comes with approaching dataviz in an exploratory manner that publication can learn how to better present more sophisticated information. My own focus has been more and more on interactive dataviz, which forces you to not only create a comprehensible visual representation of the data, but also deal with how the user is transforming, filtering, and exploring the data and the functions that express that data. I think it's very hard to escape the traditional design paradigm when dealing with static dataviz, and have found that with dynamic (animated) and interactive methods you're forced to engage with the data and functions in a more detailed and scholarly manner. Of course, this means that along with principles of data visualization, you have to integrate UI/HCI/UX, which comes out of our already strapped budgets.

To get back to the distinction, as long as dataviz is considered to be supplementary--either to a domain expert presenting to an audience, or as a figure accompanying a linear narrative explanation of the scholarship, or as a stand-alone infographic--I think it's nearly impossible to hold it to higher standards than those found in general purpose design principles. The one exception I can think of is in data visualization using well-established methods, such as surfaces for functions in mathematics or schematics like genograms. While I could imagine the digital humanities community settling on something like this, it necessitates very small audiences familiar with a specific visual jargon.

I was recently reviewing a batch of papers that all articulated the difference between scientific visualization and information visualization as methods and fields. One of the most common ways of defining the distinction is to say that, in scientific visualization, the dimensionality of the data is "given" while for information visualization, it is "chosen" -- that is to say, there is some obvious or natural way for data about a brain to be arranged in relation to 2D or 3D images / sections of a brain as it is perceived in reality if we place it in a jar, but we have no intuitive physical convention for what is a 'natural' way to place all of the words of Shakespeare in a jar. Thus, the information visualization must invent and design its conventions.

I'm suspicious of this distinction for a long list of reasons, including but not limited to the idea that brains in jars are either a commonplace or a natural event, or the implication that books are immaterial. I could probably saw a codex in half to visually expose a cross section of it much more easily than I could a skull, so I'm doubtful that viz explorations of some objects are automatically more or less 'given' than others.

Still, I bring it up because you mention "well established methods" (which come with literacy / recognizable genre included), and this seems to be a crux. If we accept that in infoviz the form of representation is always chosen, never given, then an information visualization will always, like Finnegans Wake, be a thing that we must expect at the outset is unique in itself and that we should learn to read. If, on the other hand, both sciviz and infoviz work best when they are legible -- whether to expert communities or to almost everyone -- then the dimensionality and idiom of any work is not given, but it need not be unfamiliar. Many subway and rail maps work in the same way, few use "natural" dimensionality, and few people understand how to use them intuitively -- but once they learn to read one due to pressing need, they will find that they can read most. Sometimes, literacy is just a word for a recognizable genre -- or genre is just a word for a community with a shared literacy.

Your concern is very timely to me. I am just finishing up a group project in Katy Börner's Information Visualization class, and our data set is intended for scholars--specifically the very scholars who gave us the data so that they can examine trends and outliers in an aggregated manner so as to direct future research. There has been a fair amount of push and pull as we debate accessibility for the class and nuance for the client.

Great points above, and Tassie, I'd love to hear more about the 'push and pull' - how much of it was about process, and how much about the product? And how did that intersect with discussion of the tools and theories you used?

Unlike some of the other projects available (for the in-person class and the MOOC), we chose one in which process replication for our clients was as important as final visualization products. So, you are right on the money in identifying both as sources of tension, as we wanted to streamline the process and produce something informative, but finally weren't necessarily highly invested in the end result being quickly digestible by a lay audience. Tableau Public allowed us to create a wonderful mash-up of several different kinds of visualizations, and the dashboard allows for the amount of data being displayed to become nuanced quite easily. We began with the premise that we might need to have 7-12 different visualizations, but the interactive ability of Tableau really saved us.

However, as far as process goes, there is a learning curve to Tableau, and we'll be sitting down with our clients to see how invested they are in having a grad student learn the ins and outs so that as their data set grows, they can continue to use the framework we've set up.

What's interesting with interactive scholarly works like ORBIS is that they will be accessed by different audiences. Walter Scheidel (the PI of ORBIS) jokingly referred to the "37 people" that ORBIS was built for, but besides those peer scholars, it's also a public history piece in a sense, as well as course material. Fortunately for us, ORBIS leveraged a well-known way of interacting with something that seemed as exotic as a geospatial transportation network model: Google Maps. Because the interface resembled something that a broad audience was literate in (data literacy, again) it dramatically increased the accessibility and therefore use of it. When we're using dataviz and there aren't well-established patterns (like more traditional network visualization) there's a greater cost to making the work accessible. One of the generic dataviz representations that I think is widely applicable is the parallel coordinates layout, but while I think it's very powerful and learnable, there's a dramatic barrier to accessibility to folks who have never seen something like that. When you look at the parallel coordinates component of City Nature, you'll see that we tried very hard to signal its functionality using numerous mechanisms (including a video tutorial) and it's still intimidating to a general audience.

Hello! Thanks for the invitation to participate, I'm looking forward to the conversation.

By way of introduction, I was initially interested in visualisations as explorations of datasets about museum collections. I wanted to help people get a sense of the scope of a particular collection, the range of objects within it, when and where they were created and collected, particularly the 95% of objects not on display. More generally I'm interested in finding ways of moving easily between different scales, from overview to detail, from distant to close reading and back again.

Because I tend to work with historical datasets (whether datasets based on record keeping begun centuries or decades ago or new datasets of necessarily incomplete historical material), I'm interested in pushing tools or script libraries made for neat, born-digital datasets to better represent fuzzy, incomplete, messy humanities data.

I've noticed there's a certain convincing 'truthiness' in visualisations that means they might be accepted uncritically - I'd like to see pedagogical and theoretical practices develop around critiquing visualisations and contextualising and justifying the choices made during their design and the cleaning, smoothing or aggregation of records. Humanists have sophisticated language for discussing the credibility of sources or interpretations and I'm hoping that can be translated for visualisations. I think working through the processes of creating your own visualisations is one way to help develop that critical faculty, but as Abby said, 'Humanists, though, sometimes use the term “network” when what they mean is simply “connection.”' - vagueness doesn't help us talk across disciplines.

PS I'm Australian/British so yes, I do spell 'visualisation' differently!

Hello Mia! We're very glad to have you participate! I couldn't agree more with your second paragraph. I have been really interested in critically deconstructing data visualization--both as a set of objects and a set of practices. I do think that there is always a danger that visualizations might distort/obscure/misrepresent the original data, whether intentionally or unintentionally. Adam Crymble had a great blog post a few months ago that termed the reliance on powerful graphics (rather than persuasive arguments) "shock and awe visualization." I definitely agree that producing your own visualizations is an excellent crash course in the subjective decision-making that undergirds the practice. I think this is why it is particularly useful to examine the history of visualization and to ask what the embedded goals of the technology are (efficient representation/mediation of data) and how these goals might impact or be impacted by "traditional" humanities practice.

Elijah and Mia, thanks for your insightful posts! You both point to a shared concern about what we expect (or should expect) from visualization as part of interdisciplinary scholarly research. And in response, I cannot help but point to what I see as another common tie -- data literacy (or rather the need for it).

Data literacy, or the ability to interpret and manipulate data as part of evidence-based thinking, is something that we don't often think about when we talk about visualization. Perhaps this is because visualization gives the impression that it is innate. Yet, as much of the discussion in the forum thus far points out, it is not. It is learned.

Disciplines that have grown up around quantitative research methods (e.g. the sciences and social sciences) seem to realize this. Data literacy, along with the use of digital tools, is something that is often taught and even interwoven into the process of scholarly inquiry. This is too often not the case in disciplines that have traditionally grown up around qualitative research methods (e.g. the arts and humanities).

Perhaps this situation sounds familiar. It is for me. You are sifting through a new and uncleaned cultural data set, and realize 1) you need more data; 2) you have no idea how the data is structured; or worse, 3) you have no idea what you need to do to the data to visualize what you are interested in looking at. Looking at examples of pre-canned data sets doesn't offer much help either.

As data has become an integral part of all forms of scholarship and as visualization has followed as a way to understand this data, this untaught literacy becomes increasingly problematic. We expect to be able to use and interpret visualizations for scholarly research, but we really have no basis for doing this. If we really want to talk about using visualization (and data) across disciplines, using it critically, and using it effectively, data literacy is where we need to start. What is it? How do you teach it? Is it the same across all forms of scholarship?

This is a great point Tara. One way I have tried to talk about and teach critical data literacy is through the production of actual data sets. Participating in the construction of a data set provides a lot of insight into how subjective such processes actually are, how many decisions go into data structuring/manipulation/management, etc., and how much of this subjectivity is occluded by the end-user interface. I think things definitely get more complicated when you talk about data literacy across all forms of scholarship, but it seems that the basic idea of acknowledging the fact that data sets are not magically (objectively) constructed, is a good start for interdisciplinary scholars.

Tara, interesting question -- particularly whether data literacy is the same across all forms of scholarship. Intuitively I would say "no," as all literacies are complex etc., but even if it isn't exactly the same, I wonder what the broad shared base would be -- not just across various forms of scholarship, but also art practice, journalism, etc.

Is it a checklist of skepticisms about (primarily) provenance and notability, as in "Become Data Literate in 3 Simple Steps" from the Data Journalism Handbook? Is it a question of special training, as it is framed by the Science Data Literacy Project, and if so, is that training primarily domain-specific? Or is it just a special way of framing reasoning of the type that we typically introduce in elementary-level education -- is it a special form of "numeracy", for example, or of "critical thinking" skills? My guess is that there are a few basic heuristics that are common to most areas, whether you are a data producer / handler or an analyst / reporter / consumer, and that many of these focus on provenance, causality, and the constant need to remind ourselves that representations (such as data) never have a privileged relationship to the real. That said, there probably isn't any way to do this except a slew of domain-specific primers -- but I'm torn between whether I'd rather give my college undergrads "Data Literacy for Literature and Linguistics" or just "IDG's Data Literacy for Dummies."

Could you elaborate on what you mean by "evidence-based thinking" as it relates to data literacy? Is it a prerequisite to becoming data literate?

In my understanding of the term, for example, this would share similarities with the scientific process, or at least with iterative design; it's a feedback loop in that preliminary findings drive further inquiry.

"I've noticed there's a certain convincing 'truthiness' in visualisations that means they might be accepted uncritically - I'd like to see pedagogical and theoretical practices develop around critiquing visualisations and contextualising and justifying the choices made during their design and the cleaning, smoothing or aggregation of records. "

I think Mia is spot on. My use of visualizations is quite rudimentary thus far, working mostly with Many Eyes and generally for "quick" illustrative purposes only. I like very much the idea that every visualization should come with explicit details of the manipulations it entailed. I just created a visualization of 400+ newspaper articles about yarnbombing in order to quickly introduce a conference audience to the dominant interpretation of yarnbombing. While I still think a graphic is better than me reading a list of terms followed by numbers, I did extensive manipulation of the terms used in the visualization in order to highlight what I'm talking about. Without explanation, the visualization would be confusing at best and disingenuous at worst.

First of all, I would love to see that visualization of yarn bombing! But I think you hit on exactly the right points about visualization: transparency and contextualization. For what it's worth, I think transparency and contextualization are vital for any scholarly work, but especially so in visualizations, which can be easily misinterpreted.

It's a pleasure to participate in the conversation! My own work has been around implementation, rather than academic research. In particular, I've led the design and production of data visualization tools that help teachers in the US meet learning standards set forth by the new Common Core (e.g., see visualccl.org). We're still early in our work, but are asking and exploring questions about what people might need to learn, to learn to analyze and visualize.

So, I'd like to pick up this element of the thread. What do you believe that someone might need to learn to analyze and visualize data effectively, in the complex ways that we intend?

Some provocative questions:

- Assuming that we take an essentialist approach to setting learning standards, what should all people understand about data visualization? What, beyond understanding basic data manipulation (e.g., in Excel) and graphing (again, Excel) is important to address or teach?

- What, then, constitutes a more rarified skillset - necessary to move learners beyond, say, a novice level of understanding to a more expert capability?

I’d love to learn your thoughts, and suggest that the answers are indeed challenging simply because data visualization requires so much prior knowledge: subject understanding, perhaps programming, data manipulation, and so on.

What do you think?

--Dave

p.s. If there are a sufficient number of replies, I’ll synthesize the info into a collaborative / Creative Commons doc for our future reference.

Thanks David! I'm glad to see the pedagogical interest in visualization. As somebody who teaches writing and visualization, I think about these questions a lot.

As you can see from my "Data Literacy" response above, I think that the fundamental issues of data literacy are central to learning to visualize and understand visualizations. So, in response to what beyond Excel is important to emphasize, I would say Excel! I have recently had many conversations with others about teaching visualization at the university level and the common complaint is that you have to spend over half the time teaching students how to use Excel. Unfortunately, this is at the expense of talking about interpretation or any theoretical issues.

Maybe the argument for going back to the basics is... well, too basic, but if we're going to use visualization as a medium to understand data (and this is definitely where we seem to be headed), then we need to learn the ABCs.

Thanks for your reply! I'm going to push back a bit, because I do think there's an important question here.

First, I do want to recognize your experience. I've spoken with a number of educators and professors, and there's agreement that many undergrads do need to learn how to use Excel. It's my understanding that schools, including K-12s and universities, are trying what they can to alleviate students' challenges.

But let's assume that we wanted to develop a hypothetical progression whereby we could enable learners to envision what they might have to learn to gain a foothold in working with big data and data visualizations. Surely there are some approaches that might help them understand various possible directions they could take. Here are a few examples of what I mean:

(1) Managing data is different. In Excel, the data are local (or, in Google Docs, 'cloud-based') and relatively straightforward. There are a limited number of datatypes. Big Data, however, is all about working with massive datasets that might not be easily observed in raw format; they might consist of a large number of tables, or even require mashing up databases with no predefined table->table key.

(2) Visualization techniques begin with Excel-like structures, but engage an enhanced sense of "design for communication" in the Tuftian sense. I haven't heard Tufte's name mentioned in an Excel class :o) (would love to!)

(3) To visualize, one must often bring together (or even build) complex tools. Big data analysis and visualization taps skill and ability in programming languages and mathematical thinking and application in a way that creating a basic Excel chart certainly does not.

So, these are three examples of areas that are important to address were one to consider 'what might be taught and learned' in teaching big data/data vis/analysis.

Perhaps there are other areas to address as well? Would love to hear collective thoughts!

It sounds like Dave is interested in the skills needed to move beyond Excel into more advanced visualization, whereas Tara is asking about users who want to visualize but don't even have background knowledge of things like spreadsheets. As someone who works full time to support visualization work at a university campus, I certainly see both types of users. The learning needs for people at different experience levels are naturally going to be different, but there are also differences depending on the type of work/analysis that accompanies the visualizations.

In my work, novice visualization developers are often trained in the humanities, and they have a rich understanding of the complexity of their data. Understanding how to represent and organize those data in a way that a visualization program will understand is like learning a new language. Spreadsheet logic is used in many programs (Excel, Tableau, statistical software, even GIS programs), and since most of the introductory/novice visualization applications work from spreadsheets, I agree with Tara that this skill set is not to be dismissed - or trivialized.

It isn't enough to learn that Excel can calculate an average of a column of data, or that you can do some rudimentary text processing with Excel. First you have to learn that numbers must be precise (e.g., not "about 20,000"), data can have different levels of granularity (e.g., nation vs. city), format needs to be consistent across records (e.g., day before month in a date), etc. Beyond even just capturing complex data in numbers and cells, I've also seen students struggle trying to figure out exactly what sorts of columns and rows are needed by Excel to create the line chart they have in their heads, and it doesn't help that this is a different table structure from the one Tableau and other statistical programs might expect.
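To make those three checks concrete, here is a minimal Python sketch of the kind of pass one might run over a small table before charting it. The function name `check_records`, the field names, the sample rows, and the day-before-month format string are all hypothetical choices for illustration, not taken from any particular tool.

```python
from datetime import datetime

def check_records(records, number_field, date_field, date_format="%d/%m/%Y"):
    """Return (row_index, field, value) for every cell a charting tool would likely choke on."""
    problems = []
    for i, row in enumerate(records):
        value = row[number_field]
        try:
            # Allow thousands separators, but reject hedged values like "about 20,000".
            float(str(value).replace(",", ""))
        except ValueError:
            problems.append((i, number_field, value))
        try:
            # Every record must follow the same date convention (here: day before month).
            datetime.strptime(row[date_field], date_format)
        except ValueError:
            problems.append((i, date_field, row[date_field]))
    return problems

# Hypothetical rows: one imprecise number, one month-before-day date.
sample = [
    {"population": "20,000", "founded": "14/03/1850"},
    {"population": "about 20,000", "founded": "14/03/1850"},
    {"population": "18500", "founded": "03/14/1850"},
]
problems = check_records(sample, "population", "founded")
```

A check like this only flags the cells; deciding what "about 20,000" should become is still an interpretive choice that ought to be documented.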

And yes, if someone wants to develop more customized visualizations than can be achieved in Excel and other stats programs, it may be necessary to learn about larger data structures, programming languages, and mathematical thinking. I do think it's a mistake to conflate big data and data visualization, though, even when talking about advanced visualization work. Plenty of custom visualizations built on coding platforms like d3 or Processing are still using relatively small data - in fact, they often have to, for performance reasons. Data management and processing for big data are really a very separate issue for me. It's the quantitative thinking skills (that data can have types, that well-structured data points are in some sense modular and can be operated on automatically, that the process of cleaning data should be done in a conscientious way and should be well documented) that connect novice and advanced users.

The design of visualizations is certainly a learning opportunity for people at all skill levels, but I'm an empiricist at heart. I think aesthetic suggestions from pioneers like Tufte should really be tested on the intended audience of a visualization. I think it's more important for people to learn how to validate their designs with user tests than to unquestioningly accept existing precepts. Plenty of recent work suggests that Tufte's simplified design style is not necessarily clearer than other approaches, and it may suffer in terms of memorability or other user experiences that are important for various applications - e.g., uncertainty visualization.

In sum, for people who want to be more serious about developing visualizations, even at a novice level, I guess I would recommend: spreadsheet/data type logic, basic social science data collection methodologies (visualizations are only as good as the data they use!), general perception-based design guidelines (e.g., color usage, axis truncation, 3d effects - shameless plug for my own list), and basic usability testing.

I recently had a difficult time trying to add a D3 (Data Driven Documents) assignment to a course. While the variety of things possible with D3 almost "off-the-shelf" in its gallery is really impressive, recent changes in the relationship of web browsers to client-side JavaScript have made things like D3 almost unusable for hobbyist scripters -- you need either a full remote website per student or else a full local webserver on each laptop. Sigh.

Anyhow, the situation made me reflect on the fact that technical implementation time was eating into conceptual time -- my students struggled so much to get their visualizations to function that they didn't have as much time as I would have liked to play with different forms of representation or just interact with them at length. This is sending me back to paper prototyping in a big way. Sometimes, the thing to do is to get out of the menus and source code, and think through how it will work on a napkin.
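For what it's worth, the local webserver route can be fairly small if Python 3 is already on the laptops. The sketch below uses only the standard library's `http.server` module; the helper name `make_local_server` and the folder name in the usage comment are hypothetical, and this is meant as one possible workaround under those assumptions rather than a recommended classroom setup.

```python
import functools
import http.server

def make_local_server(directory, port=8000):
    """Build a server that hands out the files in `directory` on this machine only."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Binding to 127.0.0.1 keeps the server invisible to the rest of the network.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)

# Usage (run from a script, then open http://127.0.0.1:8000/ in a browser):
#   server = make_local_server("my_d3_project")  # hypothetical folder name
#   server.serve_forever()                       # stop with Ctrl+C
```

It still costs setup time per student, which is the larger point, but it does let a page load its data files over HTTP instead of from the file system.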

I was familiar with bl.ocks.org, but not Plunker etc. -- thanks for the recommendation, I look forward to checking them out, especially given that livecoding.io has d3 examples built in, and Plunker looks to be d3 recommended.

When problems arise, there is almost always a workaround, or a better tool / environment / workflow (at least, temporarily -- web services come and go). Still, I do think it is an important general point that technical time can unexpectedly eat conceptual time in viz, whether for a student or for a researcher / practitioner. In my example, we were doing a short viz assignment for non-technical literature and media students on the quarter system, so any bump in the road is costly by the time it is discovered, and deploying any solution doubles down on even more technique time leaving even less time for conceptual exploration. It perhaps isn't the best example, but it speaks to a general dynamic.

I suspect most of us have at one time or another spent hours trying to get a detail to resolve rather than stepping back and thinking about what it is we want to do -- debugging instead of designing. This is true of software and design work in general, but web design, graphic design, and information visualization design seem particularly susceptible to this kind of studio-time trap where we spend too much time massaging our medium. What are the things students need to understand in Processing, prefuse, or D3 that they can't understand by working with a familiar office chart-wizard? What are the things they need to understand with the wizard that they couldn't with a napkin sketch? Sometimes, leading with paper prototyping really brings the answers to such questions into focus, reserving interactive graphics time for what it is good at.

One of the problems with visualization is the focus on data. We're all drowning to death in data, and I think that what we should focus on is instead the functions or processes we're using to transform and filter that data. Whether it's a focus on robust, formal models or simply exposing common functions like queries and filters and arrays, I think that's where the uncertain and complicated problems lie. To echo Mia's comments, I've worked with a lot of historic data where certain aspects had to be inferred, estimated, or interpolated, and while it's one step to show those objects with fuzzy borders or transparencies, I think the real challenge is in visually representing the scholar's reasoning and methods that went into deciding what to include and what to leave out. Some of the visual model building tools (like Model Builder in ArcGIS) do this to a certain degree, but I think we need more than simple flowcharts to impart this to an audience. While I'm a big proponent of code literacy, I think it's not enough to release the code along with the data, since it creates barriers to domain experts who may be able to speak quite definitively on the choices being made without being able to understand the programmatic representation of it.

I wonder if anyone has any examples of good visual representations of process and functions and other ways that data has been dealt with.

One project that comes to mind that visualizes part of a decision-making process is an outwardly simple visualization system that sits on top of a translation algorithm (Lattice Uncertainty Visualization, Christopher Collins). What I love about this visualization is that it shows both certainty of the results and alternative choices that were rejected, especially useful for translations of works that involve highly nuanced texts.

I am a big proponent of using multiple, coordinated views to deal with complex data visualization issues, so I would love to start seeing not just uncertainty indicators in "final" visualizations (e.g., maps or networks of data that had to be processed/normalized/quantified) but also visualizations of those data wrangling processes, as Elijah seems to be hinting at. It could be something as simple as showing how the number of unique entities reduced after each stage of filtering and normalizing, or taking an approach like Scott Weingart to show how small variations in the data sample (or, perhaps, how small choices made by the researcher) might result in different patterns. As far as I know, most of the uncertainty visualization work seems to be focusing on how to change existing visualizations to appear less certain, and while I think that can be compelling, I think our visualization literacy may also be well served by generating companion visualizations to make the frequently hidden data wrangling processes more accessible.
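As a rough illustration of the "number of unique entities after each stage" idea, here is a small Python sketch that records that count as a list of cleaning steps is applied in order. The function name `stage_counts`, the sample names, and the three cleaning steps are hypothetical; a real project would substitute its own stages.

```python
def stage_counts(values, stages):
    """Apply each (name, function) stage in order and record how many unique entities remain."""
    counts = [("raw", len(set(values)))]
    current = list(values)
    for name, step in stages:
        current = step(current)
        counts.append((name, len(set(current))))
    return counts

# Hypothetical correspondent names with the usual transcription noise.
names = ["Voltaire", "voltaire ", "VOLTAIRE", "Rousseau", "", "Rousseau, J.-J."]

# Hypothetical cleaning steps; each takes a list and returns a new list.
stages = [
    ("drop empty", lambda xs: [x for x in xs if x.strip()]),
    ("trim and lowercase", lambda xs: [x.strip().lower() for x in xs]),
    ("strip initials", lambda xs: [x.split(",")[0] for x in xs]),
]

counts = stage_counts(names, stages)
# counts == [("raw", 6), ("drop empty", 5), ("trim and lowercase", 3), ("strip initials", 2)]
```

Plotted as a simple bar or step chart next to the "final" network or map, a list like `counts` would show at a glance which decision collapsed the most entities.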

Angela, thanks for your post. The Lattice Uncertainty Visualization example certainly uses opacity as a way to indicate uncertainty in the data much more artfully and effectively than many attempts I've come across.

One of the things it does really well in the context of the rest of your post is point out that there are actually multiple levels of uncertainty going on. There is: 1) uncertainty in the data, 2) uncertainty as a result of visualization/translation, and 3) uncertainty through viewer interpretation. One of the reasons it does this really well is because it forces viewers to move through multiple data "scenarios" almost like a database narrative and thus work through the creation process in some way.

Narrative is a topic that's been prevalent in visualization research circles lately (Segel & Heer, 2010; Kosara & Mackinlay, 2013). Unfortunately, it seems to be used in a way that ultimately favors linearity as opposed to anything more complex and processual. But I wonder to what extent a more nuanced narrative model -- one that entails motivation, intentionality and uncertainty -- might be a useful way of approaching the type of visualizations you and Elijah seem to be advocating.

Check out this easy-to-use viz tool put together at the CCL lab here: http://www.indiana.edu/~semantic/. It's part of their "Playground" and available under the visualizations drop down (which will eventually take you over to a Google Code site that has the code and a wiki with great tutorials and info). I mocked up some network visualizations really quickly, and while it's still in development, feel free to email Brent Kievit-Kylar if there's something you'd like to do but can't quite manage.

This is such a rich conversation, with several threads being pulled. I'm particularly interested in the iterative / in-progress possibilities of mobilizing data visualization and layering networks with content. Not to mention sorting through the increasing number and functions of the tools available (some particularly good insights about these above).

What of time-based material? How does live or recorded media play into data visualization?

One generalization is that, in addition to layering information on top of time-based media, information visualization often transforms time-based patterns into a spatial representation, or animates a sweep through different spaces (often depth) over time. When it comes to familiar time-based media such as film, these transformations become invisible as temporal visualizations pretty quickly. For example, the Internet Archive automatically generates thumbnails as a time-sampled index of each of its videos, presenting them in a grid:

Strange, though. The film is already a time-sequence which was captured into a spatial arrangement of images (a film reel), which we then transform into temporal juxtaposition (with a projector, projection later encoded in digital video), which we then sample back into spatial representation (a jpeg series). We could call this last step a form of media visualization. But really, it is space = time = space = time = space all the way down.
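That last time-to-space step is simple enough to write down. Here is a minimal Python sketch that maps evenly spaced timestamps onto grid cells, reading left to right and top to bottom. The function name `thumbnail_grid`, the sampling interval, and the column count are assumptions for illustration; I don't know what values the Internet Archive actually uses.

```python
def thumbnail_grid(duration_s, interval_s, columns):
    """Return (timestamp, row, column) for each sampled frame, in reading order."""
    cells = []
    t = 0
    index = 0
    while t < duration_s:
        # Integer division gives the row, the remainder gives the column.
        cells.append((t, index // columns, index % columns))
        t += interval_s
        index += 1
    return cells

# A hypothetical five-minute clip, one frame per minute, three thumbnails per row.
grid = thumbnail_grid(duration_s=300, interval_s=60, columns=3)
# grid == [(0, 0, 0), (60, 0, 1), (120, 0, 2), (180, 1, 0), (240, 1, 1)]
```

Seen this way, the only "visualization" decisions are the interval and the row width, which is perhaps why the transformation disappears from view so quickly.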

Hello HASTAC 2013! +City: ‘Public/Private – Playing in the Digital Sphere,’ will be running live & interactive through the Conference.

In +City’s latest data visualization series, ‘Public/Private – Playing in the Digital Sphere,’ +City’s research and practice investigates the troubled & unstable grey zone of how Twitter content in the digital public realm changes from public to private, depending on the context of use, the question, and, often, the point of access.

Our data visualization modes will stream tagged tweets generated by the HASTAC community & conference attendees. In all of the data visualization modes, tweets and retweets are represented as (variously) individual bubbles, squares, and tweet-flowers, although each data visualization supports interaction in different ways.

As a series of ongoing, interrelated projects, our research now asks: What experiences are we designing when data visualizations function as interfaces? What does it mean to make ‘art’ with content pulled from the digital public realm, especially when Twitter users often list personal details (location, occupation, etc.) on their pages and profile pics are just as likely to be head shots as custom avatars? What is/should be the borderline between the public & private digital spheres? What are the implications of data mining & the commercialization of digital content in the era of big data? What does it mean to resurrect archived content in a public interactive context? And to be able to search with Twitter hashtag streams in real time?

Co-founded by Faisal Anwar and Siobhan O’Flynn, PlusCity Design creates projects that provide insights into social good, social change, art and innovation, with the goal to change how we perceive the interplay of physical and digital lives. Siobhan is an artist, researcher and consultant engaged in the design of interactive narrative experiences across media and disciplines. Faisal is an artist and interactive producer of hybrid cross-platform projects that use real-time live data, multilayer video projections, and social interactivity.

In my recent article "What is Visualization," I tried to describe what I see as the most fundamental principles which run through the whole history of visual representation of quantitative (and more recently, categorical) data, from the 18th century to today. This is the summary of my arguments:

"The first principle is reduction. Infovis uses graphical primitives such as points, straight lines, curves, and simple geometric shapes to stand in for objects and relations between them - regardless of whether these are people, their social relations, stock prices, income of nations, unemployment statistics, or anything else. By employing graphical primitives (or, to use the language of contemporary digital media, vector graphics), infovis is able to reveal patterns and structures in the data objects that these primitives represent. However, the price being paid for this power is extreme schematization. We throw away 99% of what is specific about each object to represent only 1% - in the hope of revealing patterns across this 1% of objects’ characteristics."

"The second principle: data visualization privileges spatial dimensions over other visual dimensions. All visualization techniques use spatial variables (position, size, shape, and more recently curvature of lines and movement) to represent key differences in the data and reveal most important patterns and relations. This is the second (after reduction) core principle of infovis practice as it was practiced for 300 years - from the very first line graphs (1711), bar charts (1786) and pie charts (1801) to their ubiquity today in all graphing software such as Excel, Numbers, Google Docs, OpenOffice, etc.

In other words, we map the properties of our data that we are most interested in into topology and geometry. Other less important properties of the objects are represented through different visual dimensions - tones, shading patterns, colors, or transparency of the graphical elements."

What do you think? Agree or disagree? If you agree, how do these principles affect the use of visualization in humanities research and publications?

I agree with you about the privileging of the spatial, and I've noticed it particularly because traditional network visualizations that utilize force-directed or non-anchored hierarchical layouts have problematic spatial results. I noted this "spatial problem" in my Interactive Introduction to Network Analysis and Representation ( http://dhs.stanford.edu/dh/networks/ ). I think the subversion of the spatial variable in network visualization might explain why there are so many critics of it (such as Ben Fry).

Also, when you refer to "geovisualization" in your paper, are you also including traditional cartographic principles? Maps as information visualization provide a longer and more formalized history of the practice.

Normally data visualization shows data as points, lines and other geometric elements. In our lab (softwarestudies.com), we have been working on methods and software to explore large image and video collections without such reduction. The idea is to sort a collection (interactively if hardware allows) in many ways, using both metadata (e.g., dates) and visual properties of images extracted by software.

Jumping in this thread, I would distinguish our current & past +City data visualizations in that, in using real-time data, we are also interested in working with playful, dynamic forms that foreground our relationship to data/Twitter content in the near moment of transmission and as this data (Tweets/Retweets) becomes archived. In designing interactive data visualizations, we have deliberately built in the opportunity to reply in DV/app in the 3D DV, and replies, of course, will also be captured and archived. As such, the temporal status of data is important as Tweets/Retweets recede in time and are dropped out of the Data Viz, though they remain searchable in the archive. Networked data, connectivity, flows, and the status of data in the digital public realm are all core concerns. I would put our investigations closer on a spectrum to the work being done by Trevor Paglen, Christian Marc Schmidt, and Jonathan Harris & Sep Kamvar.

I am using information visualization in a variety of different ways in my research and practice, and the prompt for this forum (cases / tools / theory / pedagogy) is a good opportunity to reflect on those ways and how they work.

Tools: In an ongoing grant collaboration with Lev Manovich, we are developing information visualization software tools for Mellon that are workflow-based -- batch processing analyzes large collections of images and then converts them into media visualizations (e.g. montages or image plots). The later stage of the project involves enabling researchers with specific case studies to use these tools with their projects, so this takes the form of a build-tools-then-do-cases project, rather than a choose-a-case-then-try-out-all-the-tools-at-hand approach.

A Case: By contrast, I'm currently finishing up a book project with two collaborators, Jessica Pressman and Mark Marino, and this coauthored manuscript uses infoviz in a very different way -- as a tool for close reading a single work of electronic literature, Poundstone's Project for Tachistoscope: Bottomless Pit. There, the emphasis is on using multiple visualization techniques (we tried over a dozen, but use four in the book) to gain insight into the visual logic and high-speed patterns in an approximately 9-minute-long cycling software-generated story/poem/film/animation. Think of how video analysis has been used on the Zapruder film of the Kennedy assassination -- that is, from a forensic perspective -- and you'll have some sense of how this project uses a few of the same formal techniques as the Mellon initiative, but to different purposes. Not genre studies or distant, large-scale recognition of cultural patterns, but close interpretation of millisecond-level details.

Pedagogy: In the classroom, I am using visualization in a few ways. Last quarter, my collaborator Zach Horton and I had a group of ~24 students explore, annotate, and analyze 24 years of The Simpsons, currently the longest running animated series on television (and longest running sitcom, and longest running scripted primetime series). In order to survey over 500 episodes in 10 weeks, the class divided up the show into seasons and also used a variety of collective annotation techniques, including the automatic generation of montages that created two poster images for each episode -- one with images sampled every ~3 seconds, and one with one image per shot. One of the interesting things about these montages was that, unlike the work for Mellon, they were not intended to be used to detect large-scale patterns and, unlike my book project, they weren't really intended for close reading. Instead, they acted as a visual index (like a book index or table of contents) to allow students to quickly look up scenes while reading each other's annotations and see a rich context for what is described in each annotation. The visual information was metadata to metadata.

(Big) Theory: One final thought, on theory and research. My next upcoming visualization project has to do with mapping narrative structures in a collection of thousands of interactive print gamebooks (think "Choose Your Own Adventure") from the Demian Katz Gamebook Collection, recently donated to the UCSB Library. These branching-path books cover over six decades, a dozen languages and many more countries, and include many forms of genre fiction as well as non-fiction and experimental writing. So, how might we visualize the evolution of branching narrative structures in print over the past 60+ years? We have a long tradition of projects done in this area. Both authors and readers have been mapping such works since they first emerged, and there are some beautiful artists' projects -- one of the best known is a project by Christian Swinehart (CYOA). Still, admiring a profusion of complex snowflakes of data in infoviz multiples only gets us so far. Can visualization get us from 4000+ individual graphs to big statements about the trends, typical works and outliers that would be of interest to literature and game scholars? I think so....

Just saw this video making the rounds on Twitter and want to share it here. Some of it is over my head, but it's a really useful introduction to making data visible. I liked how he started with lower-tech visualizations on paper and then showed how the visualization and understanding are transformed through different kinds of data presentation.

Throughout the history of science, diagrams and graphs have been essential thinking tools. In the past, such visualizations were drawn with pen on paper, and could embrace the directness, freedom, and expressiveness of hand drawing. Most modern visualizations are programmed instead, where a single description can dynamically generate a unique picture for any dataset.

Today's tools offer the benefits of one or the other -- either directness or dynamics -- but not both. Photoshop and Illustrator allow direct-manipulation drawing of static pictures. D3, R, and Processing allow indirect-manipulation coding of dynamic pictures.

This talk presents a tool for drawing dynamic pictures -- creating data-driven visualizations, like D3, but via direct manipulation of the picture itself, like Illustrator.

Hey there. I started a group just for visualization topics because of this very reason...can you post this in that group too? I am trying to create an area where all of the neat and innovative stuff people post about visualization can also be found in one place.

Please join as well!

The group is called: Visualizing Data: Visual Technologies, Projects, News, and Practice

Thank you Tara for the link. Looks very promising. I can't help but get a bit frustrated that HASTAC users post things all over the place and there doesn’t seem to be a great way to curate them all into communities because it depends on 1. Participation and 2. Sorting out the groups. I know there are tags and categories to help but they seem to not be used to their full metadata potential...maybe that is a limitation of the server/software used.