Bulletin, June/July 2009

Information Visualization Services in a Library?
A Public Health Study

by Barrie Hayes, Hong Yi and Andrés Villaveces

Barrie Hayes, bioinformatics librarian and Collaboration Center manager in the Biomedical Informatics Research Support and Training Program of the Health Sciences Library at the University of North Carolina at Chapel Hill, can be reached at barrie_hayes<at>unc.edu.

Hong Yi, senior research software developer in the Renaissance Computing Institute at the University of North Carolina at Chapel Hill, can be reached at Hyi<at>email.unc.edu.

Andrés Villaveces, research assistant professor in the Department of Epidemiology and the Injury Prevention Research Center at the University of North Carolina at Chapel Hill, can be reached at avillav<at>unc.edu.

The role of information visualization in enhancing communication, understanding and discovery of ideas is well documented [1]. While information visualization capabilities have existed in some form for centuries [2], providing these capabilities in an academic library is new. At their best, visualization applications are powerful explanation, decision-making and discovery tools for researchers. The library is a logical hub of information expertise through which researchers can access and use these resources. In 2004 the Health Sciences Library (HSL) at the University of North Carolina at Chapel Hill (UNC) [3] partnered with the Renaissance Computing Institute (RENCI) [4] to build and provide visualization infrastructure and expertise as one of the keystone services of its new Collaboration Center.

The center opened in July 2005, the first of a network of (now) seven RENCI “engagement sites” across North Carolina. At HSL and the other engagement sites, researchers can consult and collaborate with visualization experts to develop custom applications to render data in ways that communicate information, answer questions, support analysis, reveal patterns and facilitate new questions or discoveries. Surrounded by the five UNC health professional schools and UNC hospitals and with biology, biochemistry and computer science buildings in close proximity, the HSL offers a collaborative, neutral visualization venue central to UNC’s extensive biomedical campus.

The center’s visualization resources include a display wall (Figure 1) with a 10 ft. x 8 ft. rear-projection screen and a native resolution of 4,096 x 3,072 pixels (more than 12.5 million pixels). The RENCI-HSL wall runs both Linux and Windows applications, making it accessible to researchers who work on either platform. The ability of researchers to interact with very high-resolution imagery from inches away on the display wall opens opportunities for new collaborative applications.
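For a sense of scale, the stated specifications work out as follows. This is straightforward arithmetic on the figures above, assuming the image fills the full 10 ft. x 8 ft. screen and using 12 inches per foot:

```latex
\begin{aligned}
4096 \times 3072 &= 12{,}582{,}912\ \text{pixels} \\
\frac{4096\ \text{px}}{10\ \text{ft} \times 12\ \text{in/ft}} &= \frac{4096}{120} \approx 34.1\ \text{px/in (horizontal)} \\
\frac{3072\ \text{px}}{8\ \text{ft} \times 12\ \text{in/ft}} &= \frac{3072}{96} = 32\ \text{px/in (vertical)}
\end{aligned}
```

That is, roughly 32 to 34 pixels per inch spread across a wall-sized surface.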

Researchers’ data visualization needs vary in complexity. In some cases, researchers may wish to use existing open source or proprietary visualization tools installed on the display wall server to view and analyze their data on the large-scale, high-resolution screen for greater visibility and detail. Often, however, researchers have specialized data and visualization needs that require collaboration with Collaboration Center staff, particularly the HSL-based RENCI visualization researcher, to develop a custom application to render their data. The center’s recent project with Andrés Villaveces of the UNC Department of Epidemiology and the Injury Prevention Research Center exemplifies this second, and most productive, use of HSL-RENCI visualization resources.

A Data-Visualization Need
The visualization application development project with Dr. Villaveces arose from his need to show longitudinal injury-prevention data in alternative ways. Specifically, he wanted to be able to see a more dynamic view of the changes in injury occurrence over time given a specific intervention or a set of interventions.

The original idea of visualizing his injury data in a different way occurred to Dr. Villaveces when he learned about software created in Sweden by Hans Rosling, a professor of global health at the Karolinska Institutet. This software, originally known as the “World Health Chart,” used an interactive view to demonstrate changes in selected health variables over time, by country and by World Health Organization region. For example, one could plot infant mortality rates against several socioeconomic indicators, such as gross domestic product (GDP), and see how each of these variables changed dynamically in a graph as time (in years) advanced. One could compare changes in a health variable like infant mortality over time within a single country, among countries within a region of the world or in selected countries. The World Health Chart software incorporated a pre-loaded set of variables for users to view; a later version allowed users to import limited information. Subsequent developments of this software led to the creation of the Trendalyzer® software owned by Gapminder.org [5]. This package was free software when Google purchased it in March 2007. A simple version, included as a Google gadget [6], currently exists and is called the “Google Motion Chart®.”

As a researcher interested in injury prevention and working for the World Health Organization in 2002, Dr. Villaveces found that the Trendalyzer software would be useful for looking at trends in injuries over time and by country, even though the original Trendalyzer data set contained no injury data. Unfortunately, gathering this injury information was slow, and uploading the available injury data into Trendalyzer’s existing structure proved difficult to arrange with Gapminder.org. In addition, for the purposes of his research, Dr. Villaveces needed more flexibility than the application provided at the time. For example, earlier Trendalyzer versions displayed data by country only and could not display data within countries or by other categories of comparison. Time units were also limited to years.

Dr. Villaveces approached the HSL Collaboration Center and RENCI about his data visualization interests in March 2008. In April, he began working with Hong Yi, the RENCI visualization researcher at the center, on a project to create software that emulated the superb visual characteristics of Trendalyzer while providing the greater flexibility in variable selection that his specific research data required. Specifically, Dr. Villaveces needed to be able to select any of his injury-related data variables for plotting and visual display over various units of time and to compare these data not only across countries or territories of the world as provided by Gapminder’s Trendalyzer software, but also across institutions, days of the week or a number of other meaningful categorical units. The DataVIZ3D application was born out of this data visualization need and the resulting collaboration.

Development of the DataVIZ3D Visualization Tool
Given Dr. Villaveces’ data visualization needs and his experience with early versions of Trendalyzer, he and Dr. Yi determined in their initial meeting that extending this software would be a good starting point for the collaborative project. The Trendalyzer software was originally open source, but after its acquisition by Google in 2007 the source code was no longer available, even though the software itself remains freely available via Gapminder.org for viewing various statistical world health charts and economic indicator trends. Google also published the API (application programming interface) of this visualization software so that web developers could incorporate the visualization gadget into their own web applications. Fortunately, Dr. Villaveces had downloaded an old 2003 version of Trendalyzer before its acquisition by Google. However, this older version, written in Flash ActionScript, included only partial source code. While it could serve as a visual model for tool development, the limited source code was far from sufficient to extend its functionality. Consequently, Dr. Yi developed a data visualization tool from scratch that not only included the visualization functionality of Trendalyzer but also gave users much greater flexibility and freedom to customize any visualization variable according to their specific needs. From the outset, a fundamental design principle for this tool was to make it as generally applicable as possible to any statistical data, without limitation to any particular type of data.

Development of the DataVIZ3D visualization tool has progressed iteratively. After obtaining initial injury datasets from Dr. Villaveces, Dr. Yi designed and developed an initial prototype. This prototype strictly adhered to the design principle of general data applicability described above. Based on the initial datasets to be visualized and the general visualization needs outlined by Dr. Villaveces, the initial prototype referenced Trendalyzer’s time-varied dynamic bubble visualization format (Figure 2) while providing the user much greater flexibility to customize how variables were presented. For example, in the dynamic bubble visualization display of data, users can customize which data variables are represented by any of the visual presentation motifs including bubble names, bubble colors, bubble grouping variable, correlation variables and animation axis variable (not being restricted to a time variable). In addition, every color used in the visualizations can be customized and modified by users to accommodate their aesthetic preferences.
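The customization described above amounts to keeping the data model generic and letting the user bind each visual motif to any column. The following Java sketch (Java 16 or later, for records) illustrates that idea under our own assumptions; the names `Motif`, `Mapping` and `frames` are hypothetical and are not taken from the DataVIZ3D source. Because the animation axis is just another bound column, it can be a day of the week as easily as a year.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: schema-free rows plus a user-editable motif-to-column mapping. */
public class VisualMappingSketch {

    /** Visual channels a user may bind to any column (illustrative names). */
    enum Motif { BUBBLE_NAME, BUBBLE_COLOR, BUBBLE_SIZE, GROUP, X_AXIS, Y_AXIS, ANIMATION }

    /** A generic record: column name -> value. Nothing here assumes injury data. */
    record Row(Map<String, Object> values) {
        Object get(String column) { return values.get(column); }
    }

    /** Binding of motifs to column names, editable by the user. */
    static final class Mapping {
        private final EnumMap<Motif, String> bindings = new EnumMap<>(Motif.class);

        Mapping bind(Motif motif, String column) {
            bindings.put(motif, column);
            return this;
        }

        String column(Motif motif) {
            String c = bindings.get(motif);
            if (c == null) throw new IllegalStateException("No column bound to " + motif);
            return c;
        }
    }

    /** Splits rows into animation frames keyed by the column bound to ANIMATION.
     *  Frame keys keep the order in which they first appear in the data. */
    static Map<Object, List<Row>> frames(List<Row> rows, Mapping mapping) {
        String animCol = mapping.column(Motif.ANIMATION);
        Map<Object, List<Row>> out = new LinkedHashMap<>();
        for (Row r : rows) {
            out.computeIfAbsent(r.get(animCol), k -> new ArrayList<>()).add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows: the animation axis here is a weekday, not a year.
        List<Row> rows = List.of(
            new Row(Map.of("institution", "A", "weekday", "Mon", "rate", 3.2)),
            new Row(Map.of("institution", "B", "weekday", "Mon", "rate", 1.7)),
            new Row(Map.of("institution", "A", "weekday", "Tue", "rate", 2.9)));
        Mapping m = new Mapping()
            .bind(Motif.BUBBLE_NAME, "institution")
            .bind(Motif.Y_AXIS, "rate")
            .bind(Motif.ANIMATION, "weekday");
        Map<Object, List<Row>> f = frames(rows, m);
        System.out.println(f.keySet());          // [Mon, Tue]
        System.out.println(f.get("Mon").size()); // 2
    }
}
```

Rebinding `ANIMATION` to a year column, or `BUBBLE_COLOR` to a region column, would change the display without touching the data model, which is the kind of generality the design principle calls for.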

Figure 2. Dynamic bubble visualization of data: “All Injury Rates with GDP per Capita for Countries in Different WHO Regions.” Presents how these variables change over time (years).

How this visualization represents data: Each bubble represents a country. Bubble size indicates population size, and bubble color indicates the WHO (World Health Organization) region to which the country belongs. The current year is shown in the background. The vertical axis is the injury mortality rate per 100,000 population; the horizontal axis is GDP per capita. The visualization can be “played” forward or backward in time (by years): bubbles (countries) change position on the graph to indicate changes in injury rate (y position) and GDP per capita (x position), and change in size as their populations change over time. Users can select single countries or groups of countries and trace their trajectories over time.
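The encoding just described reduces to a few small mappings from data values to screen geometry. The Java sketch below shows one plausible way to compute them; it is an illustration under stated assumptions, not DataVIZ3D code. In particular, the article says only that bubble size indicates population, so making bubble area (rather than radius) proportional to population is our assumption, as is the linear interpolation between data years for smooth playback. The axis ranges and country values are made up.

```java
/** Hypothetical sketch of the geometry behind one animated bubble. */
public class BubbleGeometry {

    /** Linear map of a data value from [lo, hi] onto the pixel range [p0, p1]. */
    static double scale(double v, double lo, double hi, double p0, double p1) {
        return p0 + (v - lo) / (hi - lo) * (p1 - p0);
    }

    /** Radius chosen so that bubble AREA, not radius, is proportional to population
     *  (assumed convention). */
    static double radius(double population, double maxPopulation, double maxRadius) {
        return maxRadius * Math.sqrt(population / maxPopulation);
    }

    /** Linear interpolation between two yearly values, with t in [0, 1]. */
    static double lerp(double a, double b, double t) {
        return a + (b - a) * t;
    }

    public static void main(String[] args) {
        // Hypothetical country: GDP per capita 2,000 -> 4,000; injury rate 80 -> 60 per 100,000.
        double t = 0.5;                       // halfway between two data years
        double gdp = lerp(2000, 4000, t);     // 3000
        double rate = lerp(80, 60, t);        // 70
        // Horizontal axis: GDP 0..10,000 mapped to pixels 0..1000.
        double x = scale(gdp, 0, 10_000, 0, 1000);
        // Screen y grows downward, so rate 0..200 maps to pixels 800..0.
        double y = scale(rate, 0, 200, 800, 0);
        double r = radius(25e6, 100e6, 40);   // a quarter of the max population -> half the max radius
        System.out.printf("x=%.1f y=%.1f r=%.1f%n", x, y, r); // prints x=300.0 y=520.0 r=20.0
    }
}
```

Using the square root keeps a country with four times the population at four times the area, which avoids visually exaggerating large populations.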

In addition to the original Trendalyzer bubble visualization format, DataVIZ3D also includes a new, more static visualization format based on time-series curves and lines. Rather than using dynamically moving images or bubbles to compare variables over time, this format lets the user compare variables in different categories over time by manually moving the cursor back and forth through the time series. This time-series format can depict risk estimates and confidence intervals while simultaneously incorporating measures such as interventions or additional variables of interest, so that viewers can see changes in risk estimates linked to a group of interventions. A typical use could be to dynamically evaluate the progress of an epidemic over time, given a set of interventions (Figure 3).

With these two data display options in place, the initial prototype was demonstrated for Dr. Villaveces on the HSL Collaboration Center display wall. Dr. Villaveces was very pleased and excited to see his data visualized for the first time, and he provided a great deal of constructive feedback to further improve the visualization tool’s ability to effectively present his data. Dr. Yi incorporated all of Dr. Villaveces’ feedback into the next version of the DataVIZ3D tool prototype and demonstrated the improvements in a subsequent review meeting.

Figure 3. Longitudinal time series visualization of data: “Incidence rate of fatal firearm injury events per 100,000 person-years for the general population at different points in a 24 hour period and at various intervention points”

How this visualization represents data: Time intervals on the horizontal axis; incidence rate on the vertical. Visualization includes display of data confidence intervals and intervention times, and covers a period of approximately 90 days in this view. Pink boxes represent interventions; blue, holidays; and orange, paydays.
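Each point on such a curve is an incidence rate with an error band. The article does not state how DataVIZ3D's confidence intervals are computed, so the Java sketch below (Java 16 or later) swaps in a common textbook method as an assumption: the rate is events divided by person-years, scaled to 100,000, and an approximate 95% interval comes from treating the event count as Poisson and using a normal approximation on the log scale, where the standard error of the log rate is about 1 divided by the square root of the event count. The numbers in `main` are made up.

```java
/** Hypothetical sketch: one point of a time-series curve with its error band. */
public class IncidenceRate {

    /** Rate and interval bounds, all per 100,000 person-years. */
    record Estimate(double rate, double lower, double upper) {}

    static final double PER = 100_000.0;
    static final double Z95 = 1.959964; // standard normal quantile for a two-sided 95% interval

    /**
     * Point estimate events / personYears * 100,000, with an approximate 95% CI from the
     * log-scale normal approximation for a Poisson count (assumed method). Requires events > 0.
     */
    static Estimate estimate(long events, double personYears) {
        if (events <= 0 || personYears <= 0) {
            throw new IllegalArgumentException("need events > 0 and personYears > 0");
        }
        double rate = events / personYears * PER;
        double factor = Math.exp(Z95 / Math.sqrt(events)); // multiplicative half-width
        return new Estimate(rate, rate / factor, rate * factor);
    }

    public static void main(String[] args) {
        // Made-up inputs: 25 fatal events over 500,000 person-years in one time interval.
        Estimate e = estimate(25, 500_000);
        // The rate is 5.00 per 100,000 person-years; the interval is asymmetric around it.
        System.out.printf("rate=%.2f 95%% CI=(%.2f, %.2f)%n", e.rate(), e.lower(), e.upper());
    }
}
```

Because the interval is symmetric on the log scale, the band widens in intervals with few events, which is what a viewer would see around sparse hours of the day.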

This iterative development process continued for several rounds until Dr. Villaveces determined that the DataVIZ3D tool prototype satisfied his requirements. During the later stages of this process, a visualization technique called “parallel coordinates,” in both its standard two-dimensional (2D) form and an extended three-dimensional (3D) form, was also incorporated into the DataVIZ3D tool to let users look at overall statistical correlations among multiple variables. In its standard 2D form, parallel coordinates is a technique for analyzing multivariate data in which each data variable is mapped to one of a set of parallel axes, so that each multivariate data item is displayed as a series of connected line segments intersecting all axes. This simpler visualization format allows viewers to compare correlations of rates that have changed over time. A typical application would be to see, in a general way, how rates change before and after a specific intervention or policy is implemented in an institution, city, state, country or any other meaningful category of comparison (Figure 4).

How this visualization represents data: The left and right axes show whether a state in the United States has implemented an administrative license revocation law (alrlaw = 0 or 1 on the left axis) or a blood alcohol concentration law (baclaw = 0 or 1 on the right axis). The central axis (acount) shows the count of alcohol-related fatal motor vehicle injuries, which is generally lower when states have implemented both laws and generally higher in years when these states did not have these laws in place.
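The 2D layout described above can be computed in a few lines. The Java sketch below (Java 16 or later) is an illustration under our own assumptions, not DataVIZ3D code: axes are equally spaced vertical lines, each variable is min-max normalized on its own axis, and larger values are drawn higher. The column order mirrors the example axes (alrlaw, acount, baclaw), but the values are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the 2D parallel-coordinates layout. */
public class ParallelCoordinates2D {

    /** A point in screen coordinates (y grows downward). */
    record Point(double x, double y) {}

    /**
     * Returns one polyline per data item. Column j is drawn on a vertical axis at
     * x = j * axisSpacing; each value is normalized to [0, 1] using that column's own
     * min and max, then flipped so that larger values sit higher on screen.
     */
    static List<List<Point>> layout(double[][] data, double axisSpacing, double height) {
        if (data.length == 0) return List.of();
        int nVars = data[0].length;
        double[] min = new double[nVars];
        double[] max = new double[nVars];
        for (int j = 0; j < nVars; j++) {
            min[j] = Double.POSITIVE_INFINITY;
            max[j] = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        List<List<Point>> lines = new ArrayList<>();
        for (double[] row : data) {
            List<Point> line = new ArrayList<>();
            for (int j = 0; j < nVars; j++) {
                double range = max[j] - min[j];
                double t = range == 0 ? 0.5 : (row[j] - min[j]) / range; // constant column -> middle
                line.add(new Point(j * axisSpacing, height * (1 - t)));
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Columns: alrlaw (0/1), acount, baclaw (0/1). Illustrative state-year values only.
        double[][] stateYears = {
            {0, 120, 0},
            {1,  80, 1},
            {1,  95, 0},
        };
        for (List<Point> line : layout(stateYears, 200, 400)) {
            System.out.println(line);
        }
    }
}
```

Per-axis normalization lets a 0/1 law indicator and a fatality count share one display; the trade-off is that absolute magnitudes are no longer comparable across axes.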

The parallel coordinates technique can also be extended to 3D to allow simultaneous one-to-one analysis of the relationship between the data variable mapped to a central cylindrical axis and each of the other data variables, which are mapped to peripheral cylindrical axes arranged along a circle at equal distances from the central axis. For example, the 3D parallel coordinates visualization in the DataVIZ3D tool allows Dr. Villaveces to look simultaneously at the statistical impact of implementing each of eight different laws on alcohol-related fatalities (Figure 5).

Figure 5. Three-dimensional parallel coordinate visualization of data: “Simultaneous view of the statistical impact of implementing each of eight specific alcohol-related laws (peripheral axes) on injury events (central axis) in the United States”

How this visualization represents data: The visualization indicates that alcohol-related fatalities are lower with each of the eight laws in place. In this graphic the “0 = no law” case is at the top of each column and the “1 = law implemented” case is at the bottom. The white lines therefore represent the count of alcohol-related fatal motor vehicle injuries with a particular law in place, and the green lines the count without it.
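The 3D arrangement can be expressed as simple geometry: the central axis stands at the origin, and the peripheral axes sit on a circle of fixed radius around it. The source specifies only that peripheral axes are equidistant from the central axis; spacing them at equal angles, as the Java sketch below does, is our assumption, and the names and heights are hypothetical. Each data item becomes one line segment per peripheral axis, running from its height on the central axis to its height on that peripheral axis (Java 16 or later).

```java
/** Hypothetical sketch of the 3D parallel-coordinates geometry. */
public class ParallelCoordinates3D {

    record Vec3(double x, double y, double z) {}

    /** Base positions of n peripheral axes spaced at equal angles on a circle of the given
     *  radius around the central axis, which stands on the origin along +y. */
    static Vec3[] peripheralAxisBases(int n, double radius) {
        Vec3[] bases = new Vec3[n];
        for (int k = 0; k < n; k++) {
            double theta = 2 * Math.PI * k / n;
            bases[k] = new Vec3(radius * Math.cos(theta), 0, radius * Math.sin(theta));
        }
        return bases;
    }

    /**
     * For one data item, returns one segment {start, end} per peripheral axis: from the
     * item's height on the central axis to its height on peripheral axis k.
     * Heights are assumed to be already normalized to the axis length.
     */
    static Vec3[][] segments(double centralHeight, double[] peripheralHeights, double radius) {
        Vec3[] bases = peripheralAxisBases(peripheralHeights.length, radius);
        Vec3 start = new Vec3(0, centralHeight, 0);
        Vec3[][] segs = new Vec3[bases.length][];
        for (int k = 0; k < bases.length; k++) {
            Vec3 end = new Vec3(bases[k].x(), peripheralHeights[k], bases[k].z());
            segs[k] = new Vec3[] {start, end};
        }
        return segs;
    }

    public static void main(String[] args) {
        // Eight peripheral axes, one per law as in the example; heights are illustrative.
        for (Vec3 b : peripheralAxisBases(8, 1.0)) {
            // Every base lies exactly one radius from the central axis.
            System.out.printf("%.3f %.3f dist=%.3f%n", b.x(), b.z(), Math.hypot(b.x(), b.z()));
        }
        Vec3[][] segs = segments(0.4, new double[] {1, 0, 1, 1, 0, 1, 0, 1}, 1.0);
        System.out.println(segs.length + " segments"); // 8 segments
    }
}
```

Because every peripheral axis connects directly to the central axis, each pairwise relationship (one law versus fatalities) is read without the adjacency ordering that limits the 2D form.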

From initial design and prototype construction through sequential demonstration-feedback-improvement cycles, our iterative and collaborative process has been a successful approach for this DataVIZ3D tool development project. We have found that researchers usually can better explicate their specific requirements and needs after seeing their data visualized and animated in an initial prototype. In addition, discussions during a prototype demonstration are much more effective and typically generate more specific feedback and subsequently more fruitful outcomes than solely abstract idea-based discussions. Another key success of this DataVIZ3D tool development process has been to establish the design principle of generalizability prior to development to provide the necessary foundation for the later tool development and extension. This fundamental design principle enables the DataVIZ3D tool to be extended to work with diverse data sets with different visualization needs and gives users flexibility to fully customize all visualization variables to fit their specific requirements and preferences.

Next Steps
Initial next steps for the DataVIZ3D software include expanding the same iterative process used for development beyond the development team through presentations of the tool to large audiences of researchers and other potential users. Our goal is to gather more feedback about useful and desirable improvements to the software’s functionality and visual display quality, as well as about potential data applications for the tool. One example of additional functionality that we expect may significantly strengthen the tool is the inclusion of statistical analysis capabilities alongside DataVIZ3D’s didactic data presentation functions. We will explore the desirability and usefulness of building such capabilities into the tool through demonstration-based discussions with Dr. Villaveces’ public health colleagues and other researchers. Another extension to the tool’s functionality we would like to discuss with potential DataVIZ3D users is expanding the tool’s compatibility with other data formats besides Microsoft Access, such as Excel or delimited text (Table 1). These sorts of functional and compatibility enhancements, as well as aesthetic improvements to the data presentation and visual display, provide an endless source of development possibilities.

Programming language: Java

Input formats: Currently Microsoft Access database (mdb); can be extended to a variety of other formats, such as Excel and other database formats.

Platform: Platform-independent; should run on Windows, Mac and Linux. However, because the Microsoft Access database format can be accessed only on Windows, this input-format constraint prevents the tool from running on other platforms. Supporting other data input formats that work on multiple platforms would overcome this restriction.

Table 1. DataVIZ3D Current Technical Specifications
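One way to remove the Access dependency is to put input formats behind a small interface and add a reader for plain delimited text, which the Java standard library can parse on any platform. The sketch below is hypothetical: `TableSource` and `delimited` are illustrative names, not part of DataVIZ3D, and the parser deliberately ignores quoted fields and embedded delimiters, which a production reader would need to handle. The sample rows are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of a pluggable, platform-neutral input format. */
public class DelimitedSource {

    /** What the visualization layer would consume, whatever the underlying file format. */
    interface TableSource {
        List<Map<String, String>> readRows() throws IOException;
    }

    /** Reads delimited text whose first line holds column names. Simplified: no quoting
     *  support. The underlying reader is consumed and closed on the first call. */
    static TableSource delimited(Reader in, String delimiter) {
        Pattern sep = Pattern.compile(Pattern.quote(delimiter));
        return () -> {
            List<Map<String, String>> rows = new ArrayList<>();
            try (BufferedReader br = new BufferedReader(in)) {
                String header = br.readLine();
                if (header == null) return rows;
                String[] cols = sep.split(header, -1);
                String line;
                while ((line = br.readLine()) != null) {
                    if (line.isBlank()) continue;               // skip empty lines
                    String[] cells = sep.split(line, -1);       // -1 keeps trailing empty cells
                    Map<String, String> row = new LinkedHashMap<>();
                    for (int i = 0; i < cols.length; i++) {
                        row.put(cols[i].trim(), i < cells.length ? cells[i].trim() : "");
                    }
                    rows.add(row);
                }
            }
            return rows;
        };
    }

    public static void main(String[] args) throws IOException {
        String csv = "state,year,acount\nNC,1990,120\nNC,1991,95\n"; // made-up values
        List<Map<String, String>> rows = delimited(new StringReader(csv), ",").readRows();
        System.out.println(rows); // [{state=NC, year=1990, acount=120}, {state=NC, year=1991, acount=95}]
    }
}
```

An Excel or database reader could implement the same `TableSource` interface, so the visualization code would not need to change when a new format is added.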

To realize many of these enhancement possibilities, we will pursue grant funding to support extended post-prototype development of the DataVIZ3D tool. In the near future, we plan to submit grant proposals for internal funding opportunities offered through both the Injury Prevention Research Center and the Gillings School of Global Public Health, where Dr. Villaveces is a faculty member. Beyond internal funding, we will also investigate external funding opportunities with federal agencies such as the Centers for Disease Control and Prevention and the National Institutes of Health.

Ultimately, the DataVIZ3D software was developed to help researchers and other users understand and use data through visually creative formats that can clarify public health and other information at a local, regional, national or global level. Like the original Gapminder Trendalyzer tool it draws on and improves upon, the DataVIZ3D software is intended to be entirely free and open source. With some additional enhancements, we expect to make it available for use and improvement by others. The DataVIZ3D software has the potential to become a highly useful pedagogical tool, and we hope it will fulfill that potential by enhancing understanding of research data and other scientific findings through effective visualization. Finally, we hope the tool demonstrates the power of visualization to aid information communication and discovery, and encourages UNC researchers to take advantage of visualization services and resources at the HSL and elsewhere on campus.