Data Visualization

What it is and why it is important

Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly.

As more and more data is collected and analyzed, decision makers at all levels welcome data visualization software that enables them to see analytical results presented visually, find relevance among the millions of variables, communicate concepts and hypotheses to others, and even predict the future.

Because of the way the human brain processes information, people grasp the meaning of many data points faster when the data is displayed in charts and graphs than when they must pore over piles of spreadsheets or read pages and pages of reports.

Interactive visualization

Interactive data visualization goes a step further. It moves beyond the display of static graphics and spreadsheets, using computers and mobile devices to drill down into charts and graphs for more detail and to change, interactively and immediately, what data you see and how it is processed.

Why is data visualization important?

Visualizations help people see things that were not obvious to them before. Even when data volumes are very large, patterns can be spotted quickly and easily. Visualizations convey information in a universal manner and make it simple to share ideas with others. They let people ask others, “Do you see what I see?” And they can even answer questions like “What would happen if we made an adjustment to that area?”

Consider the manufacturing director of product reliability for an international company that produces small vibrating motors for cell phones. One of the director’s principal responsibilities is to determine how reliable the motors will be with each year of age. If the product’s reliability falls short of the standards set by the cell phone manufacturers that use the motors, the company could lose major contracts.

Spreadsheets are hard to visualize

Because so much data is collected on the age and reliability of the cell phone motors, a traditional electronic spreadsheet cannot represent the information visually in a useful way. Printed out, the spreadsheets would form a huge pile of paper on the director’s desk. In both cases, the director would spend hours searching among thousands of rows and columns of data and still have no concrete answer to the original question about the relationship between a motor’s age and its reliability.

Data visualization makes interpretation easier

Data visualization presents the data in a way that the director can easily interpret, saving time and energy. For example, the graph above shows the number of units that correspond to each age (represented by the color gradient) as well as the reliability as the age of a unit increases. In a matter of seconds, the director can see that units approaching 10 years of age are approximately 40 percent reliable. This visual simplifies the data, instantly clarifying the factors affecting the reliability of the cell phone motors.

Interactive charts and graphs like the ones shown above make it easier for decision makers across all organizations to:

Data visualization made easy

Common techniques

There are a few basic concepts that can help you generate the best visuals for displaying your data:

Understand the data you are trying to visualize, including its size and cardinality (the uniqueness of data values in a column).

Determine what you are trying to visualize and what kind of information you want to communicate.

Know your audience and understand how it processes visual information.

Use a visual that conveys the information in the best and simplest form for your audience.

Data visualization is an art and a science unto itself, and there are many graphical techniques that can be used to help people understand the story their data is telling.

Visualizing big data

Big data brings new challenges to visualization because large volumes, different varieties and varying velocities must be taken into account. In many cases today, data is generated faster than it can be digested. There are several factors to consider.

The cardinality of the columns you are trying to visualize is a factor. Cardinality is the uniqueness of data values contained in a column. High cardinality means a large percentage of the values are unique (e.g., bank account numbers, because each one should be unique). Low cardinality means a column contains a large percentage of repeated values (as might be seen in a “gender” column).
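One simple way to put a number on cardinality is the share of distinct values in a column. The sketch below is illustrative only: the function name `cardinality_ratio` and the sample columns are made up for this example, and using distinct values divided by total values as the measure is an assumption, not a definition taken from any particular product.

```python
def cardinality_ratio(values):
    """Return distinct values / total values for a column (0.0 if the column is empty)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample columns, invented for illustration.
account_numbers = ["AC-1001", "AC-1002", "AC-1003", "AC-1004"]
gender = ["F", "M", "F", "F"]

high = cardinality_ratio(account_numbers)  # every value is distinct
low = cardinality_ratio(gender)            # only two distinct values in four rows

print(f"account_numbers: {high:.2f}")
print(f"gender: {low:.2f}")
```

A ratio near 1 suggests a high-cardinality column, which rarely works as a grouping axis in a chart. A ratio near 0 on a large column suggests a low-cardinality column that groups well.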

Data visualization software from SAS

Building upon basic graphing and visualization techniques, SAS Visual Analytics has taken an innovative approach to addressing the challenges associated with the visualization of data. Using in-memory capabilities combined with SAS Analytics and data discovery, SAS provides new techniques based on core fundamentals of data analysis and the presentation of results. With SAS Visual Analytics, you can:

How do you decide which visual is best?

One of the biggest challenges for nontechnical and business users in producing data visualizations is deciding which visual should be used to represent the data accurately. SAS Visual Analytics uses intelligent autocharting to create the best possible visual based on the data that is selected. It is important to note that autocharting may not always create the exact visualization you had in mind. In that case, you can select a specific visual to create.

Autocharting in SAS Visual Analytics produces a bar chart to show the distribution of a single measure.

The addition of a second measure results in an autocharted scatter plot.

If SAS Visual Analytics determines that the data is geographic, a map frequency chart is used.
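The three outcomes above can be read as a small set of rules. The sketch below is a toy rule-based chooser that mirrors only those three cases. It is not SAS Visual Analytics code or its actual algorithm; the `Column` class, the `kind` labels and the `suggest_chart` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    kind: str  # hypothetical labels: "measure", "category" or "geography"


def suggest_chart(columns):
    """Suggest a chart type from the selected columns (toy rules, assumed for illustration)."""
    kinds = [column.kind for column in columns]
    if "geography" in kinds:
        return "map frequency chart"  # geographic data
    measure_count = kinds.count("measure")
    if measure_count == 1:
        return "bar chart"  # a single measure
    if measure_count == 2:
        return "scatter plot"  # two measures
    return "no suggestion"  # anything else: choose a visual manually


print(suggest_chart([Column("sales", "measure")]))
print(suggest_chart([Column("sales", "measure"), Column("profit", "measure")]))
print(suggest_chart([Column("state", "geography"), Column("sales", "measure")]))
```

The final fallback reflects the point made earlier: an automatic choice will not always match the visual you had in mind, so selecting a specific visual by hand remains an option.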

When you are first exploring a new data set, autocharts are especially useful because they provide a quick view of large amounts of data. This data exploration capability helps even experienced statisticians speed up the analytic life cycle, because it eliminates the need for repeated sampling to determine which data is appropriate for each model.