Telling stories with data

by Wei Yin, Research Support & Data Services Librarian, Columbia University Libraries

Why is storytelling with data important?

A Forbes article argues that data storytelling is an essential skill for everyone in the big data era. A best-selling book on Amazon, “Storytelling with Data: A Data Visualization Guide for Business Professionals”, stresses the importance of choosing the most effective form of visualization (e.g., tables, graphs, bars, or others) to engage an audience both intellectually and emotionally. In other words, data visualization helps tell a meaningful story. Storytelling with data is not only useful in business but is also embedded in people’s daily lives. Strategic storytelling with government-released open data is becoming increasingly popular in city governance, where it helps build campaign platforms and promote administrative-level legislation and policies. For better storytelling with data, a research paper by two Stanford scholars suggests striking a balance between author-driven and reader-driven visualization. Author-driven visualization is good for delivering a message, while reader-driven visualization lets readers explore and think interactively about the data on display.

Four data visualization tools for storytelling, most of them open-source, are popular in academia today.

Tableau brands itself as a powerful business intelligence tool for visual analytics. Rather than only displaying static graphs and tables to an audience, it provides interactive views that help both presenter and audience understand the data better. Tableau products include Tableau Desktop (for personal use), Tableau Server (for enterprise use), and Tableau Online (a cloud service). Though these products are not free, Tableau offers a free one-year Desktop license to students at K-12 and postsecondary levels around the world. Tableau Public is a separate free edition that publishes visualizations openly on the web.

D3, short for “Data-Driven Documents”, is a JavaScript library for manipulating documents based on data. D3 uses HTML, CSS, and SVG to build nearly any type of data visualization you can imagine. The library is free and open-source.
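To make the idea of a “data-driven document” concrete, here is a minimal sketch in Python rather than JavaScript. It does not use D3 or imitate its API. It only illustrates the underlying principle that each data point becomes one SVG element whose attributes are computed from the data. The function name `bar_chart_svg`, its parameters, and the sample figures are all hypothetical and chosen for illustration.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data, width=320, bar_height=20, gap=6, label_width=80):
    """Return an SVG string with one horizontal bar per (label, value) pair.

    Illustrative only: this mimics the "one element per datum" idea,
    not the D3 API itself.
    """
    if not data:
        raise ValueError("data must contain at least one (label, value) pair")
    max_value = max(value for _, value in data)
    if max_value <= 0:
        raise ValueError("at least one value must be positive")

    plot_width = width - label_width          # space left for the bars
    height = len(data) * (bar_height + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data):
        y = gap + i * (bar_height + gap)
        # Bar length is proportional to the value: the data drive the document.
        bar_width = round(plot_width * value / max_value, 1)
        parts.append(
            f'  <text x="0" y="{y + bar_height - 5}" font-size="12">'
            f'{escape(label)}</text>'
        )
        parts.append(
            f'  <rect x="{label_width}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical sample data, invented for this sketch.
checkouts = [("Science", 120), ("History", 80), ("Art", 40)]
svg = bar_chart_svg(checkouts)
print(svg)
```

Saving the printed output to a file with an `.svg` extension and opening it in a browser should show three bars of different lengths. D3 performs this kind of data-to-element mapping directly in the browser and adds scales, transitions, and interaction on top.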

R, widely used in academia, is well known for its data visualization capabilities. Jeff Chen and Star Yang from Commerce Data Service wrote a detailed introduction to both 2D and 3D visualization using R. The 2D visual tools include the packages “ggplot2”, “datatables”, and “dygraphs”, and the 3D visual tools include the libraries “threejs” and “leaflet” (for mapping).

Python is another good programming tool for data scientists because it has extensive built-in functions and libraries. A comparison article reviews five essential data visualization libraries (Pandas, Seaborn, Bokeh, Pygal, and Plotly) to help you choose the right data visualization tool.