**This text was part of an extinct chapter of Visual Complexity: Mapping Patterns of Information, which never saw the light of day. Instead of being forgotten in a dusty folder, I decided to make it available to the general public and invite any constructive criticism by our growing community. Hope you will find it useful.**

-

Data and information visualization are fundamentally about showing quantitative and qualitative information so that a viewer can see patterns, trends, or anomalies, constancy or variation, in ways that other forms – text and tables – do not allow.

Michael Friendly

-

The concept of visualization is certainly not new. Humans have been involved in the visual representation of information for more than 30,000 years. During this time, a variety of subjects have been portrayed, many of them pertaining to natural phenomena, but the common underlying purpose of communicating a message has always been present. Whether we talk about cave paintings, cuneiform tablets, maps, or charts, we are always alluding to information as a message from a sender to one or more receivers. “The progress of civilization can be read in the invention of visual artifacts, from writing to mathematics, to maps, to printing, to diagrams, to visual computing,”[1] say Card, Mackinlay and Shneiderman. Historian Alfred W. Crosby attests to the importance of visual aids throughout the ages, claiming that visualization and measurement were the two factors most responsible for the rapid development of all of modern science[2].

Even though visual artifacts have always been a central element in the history of humankind, over the last 25 years the term “visualization” has become immensely popular, fragmenting into a profusion of subfields with a diversity of specialized labels such as Information Visualization, Data Visualization, Scientific Visualization, Software Visualization, Geographic Visualization, Knowledge Visualization, Flow Visualization, and even Music Visualization. Many of these areas emerged in the midst of existing parallel areas like Information Design, Information Graphics, and Visual Communication. The distinction between them is occasionally thin, and in some cases almost nonexistent. This plethora of labels is certainly indicative of the outburst of a new practice, but one that is still struggling to define itself. While some consider this to be the birth of a new medium, or even a new science, there is no obvious consensus on a definitive descriptive label.

According to Michael Friendly, the renowned professor of Psychology at York University in Canada, information visualization is the broadest term that could be taken to include all the developments in visualization, since “almost anything, if sufficiently organized, is information of a sort: tables, graphs, maps and even text, whether static or dynamic, provide some means to see what lies within, determine the answer to a question, find relations, and perhaps apprehend things which could not be seen so readily in other forms.”[3] But even though it can accommodate the broadest of scopes, information visualization has also been the definitive title of a multidisciplinary field that emerged out of the computer science community in the late 1980s.

Originally coined by Jock Mackinlay and his User Interface Research Group at Xerox PARC in 1986, information visualization relates to the “use of computer-supported, interactive, visual representations of abstract data to amplify cognition”[4]. It’s in essence a computer-driven transformation of abstract data (distinct from physical data – the earth, molecules, cells, human body, etc) into an interactive visual depiction aiming at insight – which in turn translates into “discovery, decision-making, and explanation”[5]. Congregating a vast body of research from computer science, human-computer interaction, communication design, cognitive psychology, semiotics, statistical graphics, cartography, and art, modern information visualization surfaced from advances in computer graphics and was further consolidated in 1987, when the NSF Panel on Graphics, Image Processing, and Workstations published its landmark report Visualization in Scientific Computing. Since then, information visualization has grown considerably as an independent discipline, fostered by many conferences and workshops dedicated to the topic, particularly the prominent IEEE Computer Society symposium on Information Visualization, known as the InfoVis conference, first held in 1995.

At roughly two decades old, information visualization has already been the target of some criticism and dismissal. Most of it stems from the field’s failure to adapt swiftly to recent changes, brought about by widespread adoption among eager art and design communities and an escalating curiosity from media, advertising, and publishing. As a close-knit group, naturally inclined towards the computer science community as a result of its own heritage, information visualization must take a stance: either adjust to these changes and fully accept its growing popularity, or remain a niche, inward-looking academic practice. Some signs of an embrace between traditional circles and the new wave of enthusiasts are already starting to surface, and this initial hesitation might simply go down in history as the normal shyness of a first date. Nonetheless, it is not surprising that under the present uncertainty, some voices have come forward suggesting new terms and definitions. Ben Fry, in his PhD thesis, defended a new label called “Computational Information Design”, able to properly integrate information visualization, data mining and graphic design, while Robert Kosara is a promoter of “Visual Analytics”, with a stronger emphasis on analytical reasoning. While many of the arguments for new labels reinforcing specific scientific or design concerns are certainly valid, there is a real risk of excessively breaking up a field that’s still defining itself.

Instead of trying to devise new titles for alternative branches highlighting a particular area of focus, the effort should be in creating a bridge between the existing body of research and the abundance of novel demands, in an attempt to revise and renovate the field, steering information visualization into a mature, integrated, and in-demand hotspot. If willing to adapt, the field is broad enough to fully encompass most requirements, from a stronger prominence of design to a reinforced attention to analytics. This doesn’t mean the discipline can incorporate any attempt at visualizing data. But in essence, all interactive visual representations able to make the depicted subject more intelligible and transparent, or to find a new explicit insight within it, can and should be embraced by information visualization.

Unified Framework

Information visualization is well known for its multidisciplinary nature, assembling people from a vast assortment of backgrounds, but notwithstanding the contribution of numerous disciplines, we can still highlight three main spheres of activity that best characterize its key attributes and capabilities. Readers familiar with research publications in the field will find this conception slightly different from previous frameworks developed by Stuart Card, Jock Mackinlay, Ben Shneiderman, and Ed Chi. The deliberate intent of this reframing is to emphasize the leading role of design, in both visual and interactive choices, and the fundamental function of statistics and data mining. This is ultimately an integrating, yet diverse framework, keeping alive the heterogeneous nature of the discipline. Here we describe the three central layers of information visualization: Data Transformation, Visual Mapping and Interactive Framing. Even though there’s a natural progression between the three stages, that doesn’t mean they occur in a fixed order. There’s a lot of refinement taking place in a continuous iterative process that forces each step to be occasionally revisited.

Data Transformation

This is the very first stage in the development of any information visualization project. Without data no visualization would even be possible, hence everything starts by gaining access to a particular dataset relevant to the project’s pursuit. After getting hold of the data, what follows is a long process of data analysis, which includes inspecting, cleaning, filtering, and parsing the data, while organizing the relevant parts and removing the irrelevant. The subsequent process of data mining is crucial for a better understanding of the natural affordances of the dataset. It encompasses a series of queries and algorithms that extract particular patterns in the data for some quick modeling and visualization tests, which will be of great importance in the buildup to the second stage. Data transformation is the essential foundation of a successful execution, and covers areas like programming, statistics, data analysis, data mining, analytics, and machine learning.
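As a rough illustration of the parsing, cleaning, and filtering steps described above, here is a minimal Python sketch. The dataset, the column names (`name`, `category`, `value`), and the helper functions are all hypothetical, invented purely for this example; a real project would adapt each step to its own data.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data with typical problems: stray whitespace,
# inconsistent capitalization, a missing name, and a non-numeric value.
RAW = """name,category,value
Alpha,network,12
beta , Network ,7
,tree,3
Gamma,tree,not-a-number
Delta,network,5
"""

def parse_rows(text):
    """Parsing: turn CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_row(row):
    """Cleaning: trim whitespace, normalize case, convert the value.
    Returns None when the row cannot be repaired."""
    name = row["name"].strip()
    category = row["category"].strip().lower()
    try:
        value = float(row["value"])
    except ValueError:
        return None
    if not name:
        return None
    return {"name": name, "category": category, "value": value}

def transform(text, min_value=0.0):
    """Parse, clean, then filter out rows below a threshold."""
    cleaned = [c for c in map(clean_row, parse_rows(text)) if c is not None]
    return [r for r in cleaned if r["value"] >= min_value]

def summarize(rows):
    """A first quick query: total value per category."""
    totals = Counter()
    for r in rows:
        totals[r["category"]] += r["value"]
    return dict(totals)

if __name__ == "__main__":
    rows = transform(RAW)
    print([r["name"] for r in rows])
    print(summarize(rows))
```

Quick summaries like the one in `summarize` are the sort of early test that helps reveal what the dataset can support before any visual form is chosen.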

Visual Mapping

Visual mapping is a critical step in information visualization, where data finally comes to life through a deliberate visual form. It takes into consideration key factors like top-to-bottom hierarchy, color, legibility, typeface, contrast, spacing, position, size, shape, orientation, layout, and depth. This central task considers not only individual views or modules, but also the composition of the entire contiguous environment. The choice of a particular method (or methods) is tied to the specific goal of the piece – its intrinsic purpose – and might be defined a priori or during project development, as the natural affordances of the data become apparent. It’s also highly dependent on end users, their immediate context and expressed needs – when, where, and how the final execution will be used. Visual mapping is tied to various areas of visual design, including graphic design, information design, interface design, visual perception, cognitive psychology, aesthetics, and typography. Furthermore, it’s essentially made of two components: graphical objects and textual objects.

Interactive Framing

Information visualization is ultimately a discovery tool, and interactivity provides the final coalescing layer for exploration. “Visual representations and interaction techniques take advantage of the human eye’s broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once”[6], elucidate James Thomas and Kristin Cook, and they further explain, “Visual representations alone cannot satisfy analytical needs. Interaction techniques are required to support the dialogue between the analyst and the data. While basic interactions such as search techniques are common in software today, more sophisticated interactions are also needed to support the analytical reasoning process.”[7]

Some don’t see a clear-cut need for interaction in information visualization, so it’s important to clarify this assertion. Under a broader definition of visualization, it’s widely agreed that information can be successfully conveyed in either static or interactive executions. However, we have to question what really sets information visualization apart from other parallel fields such as information design or information graphics. It is in fact its computer-supported interactive nature that truly makes it distinct, and this unique offering becomes imperative as the degree of complexity of the portrayed system increases. The representation of complex networks is just one instance where interactivity is vital. Coupled with a relevant time-variant dataset, interactivity can also be a critical driver in a shift from short-term casual engagement to long-term active engagement, substantiating information visualization as a significant tool for exploration.

But interactive framing is not limited to the constraints of a computer screen. It covers any responsive visualization where a two-way communication between user and layout is established, from reactive surfaces to highly immersive visualization environments. This ultimate unifying layer is critical for explorative analysis, enabling users to inquire, filter, manipulate, reshape, and examine the visual outcome in order to identify properties, relationships, regularities, or patterns. Finally, it’s important to elucidate that even though interactivity is a central component of information visualization, the field doesn’t aim at replacing static depictions of information, since they can successfully complement each other. It simply provides an alternative, yet extremely powerful medium.

Structural Foundation

Even though there is a widespread consensus on its qualifications, information visualization, as a recently emerged field, still lacks a structural foundation able to uphold and expand its projection well into the future. We cannot in good conscience claim to be a new medium or a new science when innumerable questions are still unresolved. It is critical for such an introspective effort to happen without delay, since there’s too much work to be done, and once we all agree on what we do as a community, it will be easier for external parties to recognize the goals and boundaries of our discipline. It’s obvious that we are still pulling together the different parts that make up this practice and trying to understand when best to use them, but in order for information visualization to take the next step and grow into a cohesive field of study, it requires the consolidation of three critical components:

Theory

Assemble a clear underlying theory able to combine many of the learnings, knowledge and insights from the variety of disciplines that make up information visualization. If recent years have been marked by a significant profusion of new projects, this sturdy practice needs to be sustained by a reliable system of ideas and principles. The purpose is to ultimately provide a broad consensual framework able to evaluate past, present and future endeavors. The current unguided exploration is by no means detrimental, since it’s the perfect setup for innovation to sprout; however, if the discipline wishes to mature as a reliable knowledge domain, it needs a supporting body of theory capable of accommodating all recent advances. Cognitive psychology might be one of the most reliable instruments in the construction of such a system, able to translate cognitive behaviors into objective design principles. A theory of information visualization will have to embrace diversity, and consequently several theories might need to coexist, as opposed to one universal all-encompassing framework.

Taxonomy

Define the spectrum of representational methods and techniques of information visualization. The central aim should be to consolidate and further exemplify, by recognizing the different data types and structures that underlie a common typology of patterns. Chaomei Chen, an important figure in the field, asserts on this current call to arms: “a taxonomy of information visualization is needed so that designers can select appropriate techniques to meet given requirements”[8]. This is not meant to be a fixed and definite taxonomy, but an evolving, ever-growing, ever-expanding endeavor. This effort doesn’t contemplate a mere collection of techniques either; it should foremost supply a set of foundational principles able to guide present and future practitioners. Some initial steps in the description of common information visualization patterns have started to arise, but we still have a long way to go.

Evaluation

Provide easy evaluation methodologies for existing tools and approaches. Information visualization requires a common rule system that can distinguish the good from the bad, the appropriate from the inappropriate, the usable from the unusable, the effective from the ineffective. Case studies and success stories are a great first step in this direction. If information visualization is a vehicle for evidence and clarity, it should embrace the same ideology in the definition of its own practice, by creating a systematic body of analysis able to properly evaluate the success of any project. Quantitative and qualitative evaluation methods should be welcomed, including observational studies, participatory assessment, usability testing, contextual interviews, and user feedback. This effort should, most importantly, go hand in hand with the development of an adequate language of criticism.

In the Preface of Visual Complexity: Mapping Patterns of Information, I expressed my astonishment at the number of dead links and error messages encountered while reviewing projects to feature in the book. It’s therefore not surprising that preserving many of these projects for posterity became a central drive for the book’s completion. This took an even more serious tone when I started digging deeper into an unsettling prospect commonly referred to as the Digital Dark Age. This expression essentially describes a future scenario where it will be difficult or impossible to read historical documents or artifacts, because they have been stored in an obsolete digital format. Even though this is a widespread dilemma of modern technology, affecting a variety of knowledge domains, when it comes to information visualization, the possibility of many present-day digital projects vanishing within a few decades is a considerably worrying prospect.

As I researched many of the projects to showcase in the book, I was surprised to find that it was easier to retrieve an illustration from Joachim of Fiore, produced 800 years ago, than to attain an image of a visualization of web routers, developed in 2001. As I expressed in the Preface:

The reasons for the disappearance are never the same. In most instances, pieces are simply neglected over time, with authors not bothering to update the code, rendering it obsolete. In other cases, the plug-in version might become incapable of reading older formats or the API from an early dataset source might change, making it extremely difficult to reuse the code that generated the original visualization. Lastly, projects are occasionally moved into different folders or domains or just taken down from the servers, simply because they highlight an outdated model that does not fit the current ambitions of their respective author or company.

Just yesterday, while I was researching meaningful tree visualizations, the project Ecotonoha came to mind, as it has more times than I can count. Sponsored by NEC and developed by Yugo Nakamura in 2003, Ecotonoha was a project to nurture a virtual tree collaboratively, and at the same time contribute to the actual environment to cope with global warming. More importantly, Ecotonoha has been a major inspiration for several artists and designers over the last decade and has influenced numerous new media projects. If you try to access its website today, there’s a simple message that reads:

The Ecotonoha campaign launched in 2003 has come to an end. We thank you very much for participating in this initiative. During the campaign, 7,423 trees were planted based on messages generated. We believe that your messages in shape of these trees will contribute greatly to sustaining the earth. We hope to have your continued support in our activities and to the conservation of the earth.

But Ecotonoha is a success story in the current landscape. Most online visualization projects have a much shorter lifespan, and very few will reach Ecotonoha’s milestone of 8 years. Overall, this digital laissez-faire contributes to the ephemeral nature of most online artifacts, and consequently the whole field suffers from memory loss.

New York Times - Timelapse

A few months back I saw a canny post by Philip Vieira, which made me think again about the dangers of our current digital laissez-faire. Due to an errant cron task that ran twice an hour from September 2010 to July 2011, Philip Vieira, a developer based in Toronto, Canada, accidentally collected 12,000 screenshots of the front page of nytimes.com. With this rich content at hand, Philip created a time-lapse video showing the dynamic, ever-changing nature of the New York Times online front page over several months. The result was utterly fascinating and absorbing, but it also led Philip to consider how much is being lost, every minute of the day, across countless digital artifacts.

As Philip Vieira expresses in his post:

Having worked with and developed on a number of content management systems I can tell you that as a rule of thumb no one is storing their frontpage layout data. It’s all gone, and once newspapers shutter their physical distribution operations I get this feeling that we’re no longer going to have a comprehensive archive of how our news-sources of note looked on a daily basis.

His concern is valid and entirely in line with mine:

This, in my humble opinion, is a tragedy because in many ways our frontpages are summaries of our perspectives and our preconceptions. They store what we thought was important, in a way that is easy and quick to parse and extremely valuable for any future generations wishing to study our time period.

Digital Archeology

Of course many others are also concerned with the prospect of a Digital Dark Age. In October 2010 the exhibit Digital Archeology opened in London as part of Internet Week Europe, with the primary purpose of harvesting and uncovering dozens of websites created in the last 20 years. As the organizers state on their website:

Over this short time, technological and communications developments have been so fast that the groundbreaking work of the early creative pioneers, produced on now defunct hardware and software, have disappeared almost as soon as they appeared, like Mayflies in spring doomed to die as the daylight fades.

Concerned that “the evidence of this explosion of creativity may be consigned to digital oblivion”, this exhibit is timely and extremely relevant:

Soon we will know less about these HTML blossomings than we do about the relief carvings in Mohenjo-Daro or the Yucatán. While they helped define our new culture, almost none of the websites of less than two decades ago can be seen at all. Today, when almost a quarter of the earth’s population is online, this most recent artistic, commercial and social history is being wiped from the face of earth and a hundred million hard drives lie festering in recycling yards or rusting in landfills.

The Deleted City

Another recent, and even more evocative project on this topic is The Deleted City. The installation is an interactive visualization of a 650 gigabyte backup of Geocities made by the Archive Team on October 27, 2009. It depicts the file system as a city map, spatially arranging the different neighborhoods and individual lots based on the number of files they contain. As the authors explain:

Around the turn of the century, Geocities had tens of millions of “homesteaders” as the digital tennants were called and was bought by Yahoo! for three and a half billion dollars. Ten years later in 2009, as other metaphors of the internet (such as the social network) had taken over, and the homesteaders had left their properties vacant after migrating to Facebook, Geocities was shutdown and deleted.

In an heroic effort to preserve 10 years of collaborative work by 35 million people, the Archive Team made a backup of the site just before it shut down. The resulting 650 Gigabyte bittorrent file is the digital Pompeii that is the subject of an interactive excavation that allows you to wander through an episode of recent online history.

The need to preserve

Today there are numerous cuneiform records - one of the earliest known forms of written expression, some 6,000 years old - which communicate a great number of insights about Sumerian, Assyrian, and Babylonian cultures and societies. Can we safely guarantee that some, if any, modern-day digital artifacts will last this long?

I’m not surprised by the news of Amazon’s e-book sales surpassing printed ones, or by any recent story on the conversion of atoms into bits. As Benny Landa once said in respect to this inevitable progress: “Everything that can be digital, will be”. I’m not concerned with mass digitization, I’m simply fearful we are not making enough effort to preserve it. After all, what good is all this information if we cannot safely guard it for future generations?

Many readers of Visual Complexity: Mapping Patterns of Information have been wondering about the cover design and its underlying meaning. Since there’s no information about the piece in the book, primarily due to an oversight that will be fixed in a following edition, here’s a bit of an explanation.

I always wanted to feature a visualization or bespoke illustration on the cover, and it would have to be something related to the book and its content. Talking to a friend of mine a while back, the idea of creating a piece based on the book’s body of text came up. I thought that was definitely the way to go and started looking closely at some of my favorite textual visualizations out there.

It took me a while to decide on the appropriate style and method I wanted to feature, but then I took a closer look at the amazing work of Boris Müller. For those who are not familiar with his work, Boris has created many great projects in the past, including Connected Communities and Knowledge Maps. But I have always been particularly impressed by his remarkable Poetry on the Road series. Since 2002, Boris has been commissioned to design a visual theme for the Poetry on the Road international literature festival, which is held every year in Bremen, Germany. For each edition of the festival, Boris has created a poster with rich visual graphics generated by a computer program that turns selected poems of the participants into striking compositions. Every year has a different theme, and some are truly outstanding (e.g. 2003, 2006).

When I approached Boris to do something similar for the cover of Visual Complexity, he was immediately on board. He ended up providing a simple Java app that could be used with any text to generate a visual output similar to the one he created for Poetry 2008. I started playing around with it and this was the very first set of experiments:

There was still no information on the cover, it was pure visual exploration at this stage. I started by depicting individual chapters, simply because it was more manageable and easier to grasp the type of outcome provided by the app. Below is a second group of tryouts using different colors and including the title and author’s name.

In order to reveal a bit more diversity, we also explored different colors in the same composition and one unique visualization featuring all seven chapters of the book.

After a long discussion between myself and the design and marketing departments at Princeton Architectural Press, we finally agreed on the last version of the cover, this one including all seven chapters, or roughly 35,558 words. The final printed outcome has exceeded my expectations, and it is easy to forget how much time, sweat, love, and dedication goes into a book cover.

In case you are still wondering how everything works, here’s an extended description to be featured in a later edition of the book:

“Visualization featuring all 35,558 words displayed in the entire book, spread across its seven chapters. It was built by sorting all words based on their frequency in the text and representing them as lines. Lines are grouped in seven horizontal bands, representative of all chapters, from top to bottom, chapter 1 to chapter 7. Thicker lines depict most frequent words, which are placed on the left hand side of the diagram. As words are repeated across different chapters their lines flow vertically from one band to the other.”
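To make the description above more concrete, here is a minimal Python sketch of the layout logic it outlines. This is not the Java app Boris provided; it is only an approximation based on the wording of the description. The `tokenize` and `layout` functions, the `max_thickness` parameter, and the idea of a shared horizontal "slot" per word are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def layout(chapters, max_thickness=8.0):
    """Return one list of line descriptions per chapter (band).

    Each word gets a horizontal slot from its overall frequency rank,
    so the most frequent words sit on the left. A word keeps the same
    slot in every band, which lets its line flow vertically between
    the chapters in which it appears.
    """
    chapter_counts = [Counter(tokenize(c)) for c in chapters]
    total = Counter()
    for counts in chapter_counts:
        total.update(counts)

    # Rank words by descending overall frequency; ties are alphabetical.
    ordered = sorted(total, key=lambda w: (-total[w], w))
    rank = {w: i for i, w in enumerate(ordered)}
    top = total[ordered[0]] if ordered else 1

    bands = []
    for band_index, counts in enumerate(chapter_counts):
        lines = [
            {
                "word": w,
                "band": band_index,  # 0 = top band (chapter 1)
                "slot": rank[w],     # 0 = leftmost position
                "thickness": max_thickness * total[w] / top,
            }
            for w in sorted(counts, key=rank.get)
        ]
        bands.append(lines)
    return bands

if __name__ == "__main__":
    sample = ["the map of the network", "the tree and the map"]
    for band in layout(sample):
        print([(line["word"], line["slot"], line["thickness"]) for line in band])
```

A drawing step would then render each entry as a line of the given thickness inside its band, connecting entries that share a slot across adjacent bands.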

Here is a close-up of the final product:

And the book being displayed in two bookstores in NYC, respectively Strand (left) and St. Mark’s (right):