A Tour Through the Visualization Zoo


Practice
Article development led by queue.acm.org
doi:./

A survey of powerful visualization techniques, from the obvious to the obscure.

By Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky

Communications of the ACM, June

Thanks to advances in sensing, networking, and data management, our society is producing digital information at an astonishing rate. According to one estimate, in a single year alone we will generate exabytes of data, millions of times the content of the Library of Congress. Within this deluge of data lies a wealth of valuable information on how we conduct our businesses, governments, and personal lives. To put the information to good use, we must find ways to explore, relate, and communicate the data meaningfully.

The goal of visualization is to aid our understanding of data by leveraging the human visual system's highly tuned ability to see patterns, spot trends, and identify outliers. Well-designed visual representations can replace cognitive calculations with simple perceptual inferences and improve comprehension, memory, and decision making. By making data more accessible and appealing, visual representations may also help engage more diverse audiences in exploration and analysis. The challenge is to create effective and engaging visualizations that are appropriate to the data.

Creating a visualization requires a number of nuanced judgments. One must determine which questions to ask, identify the appropriate data, and select effective visual encodings to map data values to graphical features such as position, size, shape, and color. The challenge is that for any given data set the number of visual encodings, and thus the space of possible visualization designs, is extremely large.

Time-Series Data: Figure a. Index chart of selected technology stocks (AAPL, AMZN, GOOG, IBM, MSFT, S&P). Vertical axis: gain / loss factor. Source: Yahoo! Finance.
Time-Series Data: Figure b. Stacked graph of unemployed U.S. workers by industry (Agriculture; Business services; Construction; Education and Health; Finance; Government; Information; Leisure and hospitality; Manufacturing; Mining and Extraction; Other; Self-employed; Transportation and Utilities; Wholesale and Retail Trade). Source: U.S. Bureau of Labor Statistics.
Time-Series Data: Figure c. Small multiples of unemployed U.S. workers, normalized by industry. Source: U.S. Bureau of Labor Statistics.
Time-Series Data: Figure d. Horizon graphs of U.S. unemployment rate. Source: U.S. Bureau of Labor Statistics.

To guide this process, computer scientists, psychologists, and statisticians have studied how well different encodings facilitate the comprehension of data types such as numbers, categories, and networks. For example, graphical perception experiments find that spatial position (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally preferable to visual variables such as angle, one-dimensional length, two-dimensional area, three-dimensional volume, and color saturation. Thus, it should be no surprise that the most common data graphics, including bar charts, line charts, and scatter plots, use position encodings. Our understanding of graphical perception remains incomplete, however, and must appropriately be balanced with interaction design and aesthetics.

This article provides a brief tour through the visualization zoo, showcasing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable.
Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don't go to the zoo to see chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Sumatran tiger. Analogously, we cover some of the more exotic (but practically useful) forms of visual data representation, starting with one of the most common, time-series data; continuing on to statistical data and maps; and then completing the tour with hierarchies and networks. Along the way, bear in mind that all visualizations share a common "DNA": a set of mappings between data properties and visual attributes such as position, size, shape, and color. Customized species of visualization can always be constructed by varying these encodings.

Each visualization shown here is accompanied by an online interactive example that can be viewed at the URL displayed beneath it. The live examples were created using Protovis, an open source language for Web-based data visualization. To learn more about how a visualization was made (or to copy and paste it for your own use), see the online version of this article available on the ACM Queue site at

acm.org/detail.cfm?id=/. All example source code is released into the public domain and has no restrictions on reuse or modification. Note, however, that these examples will work only on a modern, standards-compliant browser supporting scalable vector graphics (SVG). Supported browsers include recent versions of Firefox, Safari, Chrome, and Opera. Unfortunately, older versions of Internet Explorer do not support SVG and so cannot be used to view the interactive examples.

Time-Series Data

Sets of values changing over time, or time-series data, are one of the most common forms of recorded data. Time-varying phenomena are central to many domains such as finance (stock prices, exchange rates), science (temperatures, pollution levels, electric potentials), and public policy (crime rates). One often needs to compare a large number of time series simultaneously and can choose from a number of visualizations to do so.

Index Charts. With some forms of time-series data, raw values are less important than relative changes. Consider investors who are more interested in a stock's growth rate than its specific price. Multiple stocks may have dramatically different baseline prices but may be meaningfully compared when normalized. An index chart is an interactive line chart that shows percentage changes for a collection of time-series data based on a selected index point. For example, the image in Figure a shows the percentage change of selected stock prices if purchased in January: one can see the rocky rise enjoyed by those who invested in Amazon, Apple, or Google at that time.

Stacked Graphs. Other forms of time-series data may be better seen in aggregate. By stacking area charts on top of each other, we arrive at a visual summation of time-series values: a stacked graph. This type of graph (sometimes called a stream graph) depicts aggregate patterns and often supports drill-down into a subset of individual series. The chart in Figure b shows the number of unemployed workers in the U.S.
over the past decade, subdivided by industry. While such charts have proven popular in recent years, they do have some notable limitations. A stacked graph does not support negative numbers and is meaningless for data that should not be summed (temperatures, for example). Moreover, stacking may make it difficult to accurately interpret trends that lie atop other curves. Interactive search and filtering is often used to compensate for this problem.

Small Multiples. In lieu of stacking, multiple time series can be plotted within the same axes, as in the index chart. Placing multiple series in the same space may produce overlapping curves that reduce legibility, however. An alternative approach is to use small multiples: showing each series in its own chart. In Figure c we again see the number of unemployed workers, but normalized within each industry category. We can now more accurately see both overall trends and seasonal patterns in each sector. While we are considering time-series data, note that small multiples can be constructed for just about any type of visualization: bar charts, pie charts, maps, among others. This often produces a more effective visualization than trying to coerce all the data into a single plot.

Horizon Graphs. What happens when you want to compare even more time series at once? The horizon graph is a technique for increasing the data density of a time-series view while preserving resolution. Consider the five graphs shown in Figure d. The first one is a standard area chart, with positive values colored blue and negative values colored red. The second graph mirrors negative values into the same region as positive values, doubling the data density of the area chart. The third chart, a horizon graph, doubles the data density yet again by dividing the graph into bands and layering them to create a nested form. The result is a chart that preserves data resolution but uses only a quarter of the space.
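
The two transformations behind a horizon graph (mirroring negative values, then slicing the range into bands that are layered on a shared baseline) can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the figure: the function name `horizon_bands`, the choice of two bands, and the sample values are assumptions made for the example.

```python
def horizon_bands(values, num_bands=2):
    """Split a series into layered bands for a horizon graph.

    Returns (layers, signs, band_height). layers[b][i] is the height, between
    0 and band_height, that band b contributes at index i. signs[i] is +1 or
    -1 so a renderer can color mirrored negative values differently.
    """
    if num_bands < 1:
        raise ValueError("num_bands must be at least 1")
    peak = max((abs(v) for v in values), default=0.0)
    band_height = peak / num_bands if peak else 1.0
    signs = [1 if v >= 0 else -1 for v in values]
    layers = []
    for b in range(num_bands):
        floor = b * band_height
        # Mirror negatives with abs(), then clip the magnitude to this band.
        layers.append(
            [min(max(abs(v) - floor, 0.0), band_height) for v in values]
        )
    return layers, signs, band_height


# Invented sample series, for illustration only.
series = [1.0, 3.0, -2.0, 4.0, -4.0, 0.5]
layers, signs, band_height = horizon_bands(series, num_bands=2)
```

Drawing every layer from the same baseline, with a darker shade for each higher band, gives the nested form. With two bands the chart needs half the height of the mirrored chart, which in turn needs half the height of the original area chart.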
Although the horizon graph takes some time to learn, it has been found to be more effective than the standard plot when the chart sizes get quite small.

Statistical Distributions

Other visualizations have been designed to reveal how a set of numbers is distributed and thus help an analyst better understand the statistical properties of the data. Analysts often want to fit their data to statistical models, either to test hypotheses or predict future values, but an improper choice of model can lead to faulty predictions. Thus, one important use of visualizations is exploratory data analysis: gaining insight into how data is distributed to inform data transformation and modeling decisions. Common techniques include the histogram, which shows the prevalence of values grouped into bins, and the box-and-whisker plot, which can convey statistical features such as the mean, median, quartile boundaries, or extreme outliers. In addition, a number of other techniques exist for assessing a distribution and examining interactions between multiple dimensions.

Stem-and-Leaf Plots. For assessing a collection of numbers, one alternative to the histogram is the stem-and-leaf plot. It typically bins numbers according to the first significant digit, and then stacks the values within each bin by the second significant digit. This minimalistic representation uses the data itself to paint a frequency distribution, replacing the "information-empty" bars of a traditional histogram bar chart and allowing one to assess both the overall distribution and the contents of each bin. In Figure a, the stem-and-leaf plot shows the distribution of completion rates of workers completing crowdsourced tasks on Amazon's Mechanical Turk. Note the multiple clusters: one group clusters around high levels of completion; at the other extreme is a cluster of Turkers who complete only a few tasks in a group.

Q-Q Plots.
Though the histogram and the stem-and-leaf plot are common tools for assessing a frequency distribution, the Q-Q (quantile-quantile) plot is a more powerful tool. The Q-Q plot compares two probability distributions by graphing their quantiles against each other. If the two are similar, the plotted values will lie roughly along the central diagonal. If the two are linearly related, values will again lie along a line, though with varying slope and intercept. Figure b shows the same Mechanical Turk participation data compared with three statistical distributions. Note how the data forms three distinct components when compared with uniform and normal (Gaussian) distributions: this suggests that a statistical model with three components might

be more appropriate, and indeed we see in the final plot that a fitted mixture of three normal distributions provides a better fit. Though powerful, the Q-Q plot has one obvious limitation in that its effective use requires that viewers possess some statistical knowledge.

Statistical Distributions: Figure a. Stem-and-leaf plot of Mechanical Turk participation rates. Axis: Turker task group completion %. Source: Stanford Visualization Group.
Statistical Distributions: Figure b. Q-Q plots of Mechanical Turk participation rates. Panels: Uniform Distribution; Gaussian Distribution; Fitted Mixture of Gaussians. Source: Stanford Visualization Group.
Statistical Distributions: Figure c. Scatter plot matrix of automobile data. Variables: horsepower, weight, acceleration, displacement. Legend: European Union, United States, Japan. Source: GGobi.
Statistical Distributions: Figure d. Parallel coordinates of automobile data. Axes: cylinders, displacement (cubic inch), weight (lbs), horsepower (hp), acceleration, mpg (miles/gallon), year. Source: GGobi.

SPLOM (Scatter Plot Matrix). Other visualization techniques attempt to represent the relationships among multiple variables. Multivariate data occurs frequently and is notoriously hard to represent, in part because of the difficulty of mentally picturing data in more than three dimensions. One technique to overcome this problem is to use small multiples of scatter plots showing a set of pairwise relations among variables, thus creating the SPLOM (scatter plot matrix). A SPLOM enables visual inspection of correlations between any pair of variables. In Figure c a scatter plot matrix is used to visualize the attributes of a database of automobiles, showing the relationships among horsepower, weight, acceleration, and displacement. Additionally, interaction techniques such as brushing-and-linking, in which a selection of points on one graph highlights the same points on all the other graphs, can be used to explore patterns within the data.
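
To make the SPLOM and brushing-and-linking ideas concrete, the sketch below enumerates the panels of a scatter plot matrix and computes which rows a rectangular brush in one panel selects; those row indices are what every other panel would highlight. The records, field values, and function names are hypothetical and chosen only for illustration; they are not taken from the automobile data set in the figure.

```python
from itertools import permutations

# Hypothetical records; the values are invented for illustration only.
cars = [
    {"horsepower": 130, "weight": 3500, "acceleration": 12.0},
    {"horsepower": 95, "weight": 2400, "acceleration": 15.5},
    {"horsepower": 70, "weight": 2000, "acceleration": 18.0},
    {"horsepower": 150, "weight": 3800, "acceleration": 11.0},
]
variables = ["horsepower", "weight", "acceleration"]


def splom_panels(names):
    """One off-diagonal panel per ordered pair (x variable, y variable)."""
    return list(permutations(names, 2))


def brush(rows, x_name, x_range, y_name, y_range):
    """Return the indices of rows whose point lies inside the brushed box."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        i
        for i, row in enumerate(rows)
        if x_lo <= row[x_name] <= x_hi and y_lo <= row[y_name] <= y_hi
    }


panels = splom_panels(variables)
# Brush the horsepower-versus-weight panel. Because all panels draw the same
# rows, the returned indices identify the points to highlight everywhere.
selected = brush(cars, "horsepower", (90, 135), "weight", (2000, 3600))
```

Keeping the selection as a set of row indices, rather than as screen coordinates, is what lets one brush drive the highlighting in every panel.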
Parallel Coordinates. As shown in Figure d, parallel coordinates (||-coord) take a different approach to visualizing multivariate data. Instead of graphing every pair of variables in two dimensions, we repeatedly plot the data on parallel axes and then connect the corresponding points with lines. Each poly-line represents a single row in the database, and line crossings between dimensions often indicate inverse correlation. Reordering dimensions can aid pattern-finding, as can interactive querying to filter along one or more dimensions. Another advantage of parallel coordinates is that they are relatively compact, so many variables can be shown simultaneously.

Maps

Although a map may seem a natural way to visualize geographical data, it has a long and rich history of design. Many maps are based upon a cartographic projection: a mathematical function that maps the 3D geometry of the Earth to a 2D image. Other maps

knowingly distort or abstract geographic features to tell a richer story or highlight specific data.

Flow Maps. By placing stroked lines on top of a geographic map, a flow map can depict the movement of a quantity in space and (implicitly) in time. Flow lines typically encode a large amount of multivariate information: path points, direction, line thickness, and color can all be used to present dimensions of information to the viewer. Figure a is a modern interpretation of Charles Minard's depiction of Napoleon's ill-fated march on Moscow. Many of the greatest flow maps also involve subtle uses of distortion, as geography is bent to accommodate or highlight flows.

Choropleth Maps. Data is often collected and aggregated by geographical areas such as states. A standard approach to communicating this data is to use a color encoding of the geographic area, resulting in a choropleth map. Figure b uses a color encoding to communicate the prevalence of obesity in each state in the U.S. Though this is a widely used visualization technique, it requires some care. One common error is to encode raw data values (such as population) rather than using normalized values to produce a density map. Another issue is that one's perception of the shaded value can also be affected by the underlying area of the geographic region.

Graduated Symbol Maps. An alternative to the choropleth map, the graduated symbol map places symbols over an underlying map. This approach avoids confounding geographic area with data values and allows for more dimensions to be visualized (for example, symbol size, shape, and color). In addition to simple shapes such as circles, graduated symbol maps may use more complicated glyphs such as pie charts. In Figure c, total circle size represents a state's population, and each slice indicates the proportion of people with a specific BMI rating.

Cartograms. A cartogram distorts the shape of geographic regions so that the area directly encodes a data variable.
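
When area encodes a value, as with graduated circles or cartogram regions, the linear dimension has to grow with the square root of the value; scaling a circle's radius directly would exaggerate large values. The following sketch shows that calculation for circles. The function name `radius_for_value`, the maximum radius, and the sample figures are assumptions for illustration.

```python
import math


def radius_for_value(value, max_value, max_radius=40.0):
    """Radius of a circle whose area is proportional to value.

    Area grows with the radius squared, so the radius scales with the
    square root of the value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical regions and populations (in millions), invented for the example.
populations = {"A": 36.0, "B": 9.0, "C": 1.0}
largest = max(populations.values())
radii = {name: radius_for_value(p, largest) for name, p in populations.items()}
```

Region B has a quarter of A's value and therefore half of A's radius, so its circle covers a quarter of the area.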
A common example is to redraw every country in the world sizing it proportionally to population or gross domestic product. Many types of cartograms have been created; in Figure d we use the Dorling cartogram, which represents

Maps: Figure a. Flow map of Napoleon's March on Moscow, based on the work of Charles Minard.
Maps: Figure b. Choropleth map of obesity in the U.S. Source: National Center for Chronic Disease Prevention and Health Promotion.
Maps: Figure c. Graduated symbol map of obesity in the U.S. Legend: Normal, Overweight, Obese. Source: National Center for Chronic Disease Prevention and Health Promotion.
Maps: Figure d. Dorling cartogram of obesity in the U.S. Source: National Center for Chronic Disease Prevention and Health Promotion.
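
The choropleth caveat about raw versus normalized values can be illustrated with a small calculation: divide each region's count by its population before assigning a color class. The region names, counts, and class breaks below are invented for the example, and `color_class` is a hypothetical helper rather than part of any mapping library.

```python
from bisect import bisect_right

# Hypothetical regions: (count of cases, total population).
regions = {
    "North": (50_000, 200_000),
    "South": (90_000, 900_000),
    "East": (30_000, 100_000),
}


def rate(count, population):
    """Normalize a raw count to a share of the population."""
    return count / population


def color_class(value, breaks):
    """Index of the color class for value, given ascending class breaks."""
    return bisect_right(breaks, value)


breaks = [0.15, 0.25]  # three classes: below 15%, 15% up to 25%, 25% and above
rates = {name: rate(c, p) for name, (c, p) in regions.items()}
classes = {name: color_class(r, breaks) for name, r in rates.items()}
# By raw count South ranks highest, yet its rate is the lowest of the three.
```

Shading by `classes` rather than by the raw counts keeps a populous region from looking extreme simply because more people live there.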

cliques and bridges. Further, as with the indented-tree layout, multivariate data can easily be displayed alongside nodes. The problem of sorting the nodes in a manner that reveals underlying structure is formally called seriation and has diverse applications in visualization, statistics, and even archaeology.

Matrix Views. Mathematicians and computer scientists often think of a graph in terms of its adjacency matrix: each value in row i and column j in the matrix corresponds to the link from node i to node j. Given this representation, an obvious visualization then is: just show the matrix! Using color or saturation instead of text allows values associated with the links to be perceived more rapidly. The seriation problem applies just as much to the matrix view, shown in Figure c, as to the arc diagram, so the order of rows and columns is important: here we use the groupings generated by a community-detection algorithm to order the display. While path-following is more difficult in a matrix view than in a node-link diagram, matrices have a number of compensating advantages. As networks get large and highly connected, node-link diagrams often devolve into giant hairballs of line crossings. In matrix views, however, line crossings are impossible, and with an effective sorting one can quickly spot clusters and bridges. Allowing interactive grouping and reordering of the matrix facilitates even deeper exploration of network structure.

Conclusion

We have arrived at the end of our tour and hope the reader has found the examples both intriguing and practical. Though we have visited a number of visual encoding and interaction techniques, many more species of visualization exist in the wild, and others await discovery. Emerging domains such as bioinformatics and text visualization are driving researchers and designers to continually formulate new and creative representations or find more powerful ways to apply the classics.
In either case, the DNA underlying all visualizations remains the same: the principled mapping of data variables to visual features such as position, size, shape, and color. As you leave the zoo and head back into the wild, try deconstructing the various visualizations crossing your path. Perhaps you can design a more effective visualization?

Additional Resources

Few, S. Now I See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.
Tufte, E. The Visual Display of Quantitative Information. Graphics Press.
Tufte, E. Envisioning Information. Graphics Press.
Ware, C. Visual Thinking for Design. Morgan Kaufmann.
Wilkinson, L. The Grammar of Graphics. Springer.

Development Tools

Prefuse: Java API for information visualization.
Prefuse Flare: ActionScript library for data visualization in the Adobe Flash Player.
Processing: Popular language and IDE for graphics and interaction.
Protovis: JavaScript tool for Web-based visualization.
The Visualization Toolkit: Library for 3D and scientific visualization.

Related articles on queue.acm.org

A Conversation with Jeff Heer, Martin Wattenberg, and Fernanda Viégas

Unifying Biological Image Formats with HDF
Matthew T. Dougherty, Michael J. Folk, Erez Zadok, Herbert J. Bernstein, Frances C. Bernstein, Kevin W. Eliceiri, Werner Benger, Christoph Best

Jeffrey Heer is an assistant professor of computer science at Stanford University, where he works on human-computer interaction, visualization, and social computing. He led the design of the Prefuse, Flare, and Protovis visualization toolkits.

Michael Bostock is currently a Ph.D. student in the Department of Computer Science at Stanford University. Before attending Stanford, he was a staff engineer at Google, where he developed search quality evaluation methodologies.

Vadim Ogievetsky is a master's student at Stanford University specializing in human-computer interaction. He is a core contributor to Protovis, an open-source Web-based visualization toolkit.
