New MIT Media Lab Tool Lets Anyone Visualize Unwieldy Government Data

DataViva, a project developed in part by Media Lab professor César Hidalgo, aims to make a wide swath of government economic data usable with a series of visualization apps.

In the four years since the U.S. government created data.gov, the first national repository for open data, more than 400,000 datasets have become available online from 175 agencies like the USDA, the Department of Energy, and the EPA. Governments all over the world have taken steps to make their data more transparent and available to the public. But in practice, much of that data, accessible as spreadsheets through sites like data.gov, is incomprehensible to the average person, who might not know how to wrangle huge datasets. Never-ending tables mean next to nothing to me, even if I know that they might be hiding some interesting relationship within their numbers, like how income stacks up with happiness.

To bridge what César Hidalgo, director of the Macro Connections group at the MIT Media Lab, calls "the last 10 inches" separating people from their government's incoherent tables and spreadsheets, he turned to visualization. DataViva, a website Hidalgo and a few collaborators helped develop with the Brazilian state government of Minas Gerais, offers a wide array of web apps that turn those spreadsheets into something more comprehensible for the average user, whether that's a policy maker, someone working for the World Bank, an entrepreneur, or a student. The site, which officially launched last week, can be a bit overwhelming to navigate, but it has lofty goals: to visualize data encompassing the entire Brazilian economy over the last decade, with more than 100 million interactive visualizations that can be created at the touch of a button in a series of apps. The future of open government isn't just dumping raw datasets onto a server: It's also about making those datasets digestible for a less data-savvy public.

"Opening up data in a visual form is essential for transparency, and to empower other people to use this data," Hidalgo says over email. "Most of the people interested in economic development data, like the data we are making available on DataViva, are people that do not have the technical skills needed to work with machine readable formats." They might be entrepreneurs or investors who want to demonstrate a gap in the market, or strategic planners looking to fund certain projects. "They are interested in what's in the data and on using this visualization as building blocks for the presentations they need to construct," he says. The future of open government isn't just dumping raw datasets onto a server; it's about making those datasets digestible.

DataViva started in 2011 with the office of strategic priorities in Minas Gerais, Brazil's third largest state by GDP. The office wanted an internal tool to help sift through some of the huge amounts of economic data available from the federal government, and turned to the Observatory of Economic Complexity, a project from the MIT Media Lab that visualizes trade between countries. "We thought that there was so much value" to the data, the office's director André Barrence tells Co.Design, that it would be a shame to limit it to internal uses. "We thought it would be much more interesting if we could open up the data for the entire country rather than just for ourselves."

DataViva contains apps that allow users to see the same economic statistics in different ways, through geography, treemaps, stacked graphs, and more. Some of the apps allow users to compare the locations, wages, and exports of different industries, while others visualize links between different occupations or what jobs are available in certain industries or states. A table comparing average wages across the country looks almost akin to the falling streams of green data in The Matrix: with hundreds of rows of numbers spanning a decade, it's difficult to decipher broader trends, and whether some years are complete outliers or part of a pattern. But in the graph app, it becomes neatly sorted into colored stacks organized by occupation, where you can see, for instance, maintenance workers' wages have surpassed the average monthly wages of administrative workers in the last 10 years—no fiddling with Excel required.

The average person could use it to figure out how his income compares to the average wages a few states over, or to see how often biologists change careers to become doctors, for example. Businesses looking to expand could use it to glean how many workers trained in their field live in a certain place or how much cheese is exported from different Brazilian cities.

Most of the data will be updated annually, hopefully living up to the site's name, which roughly translates to "living data." Since the visualizations aren't static, but built through apps, they can update from the latest data. Though it's strictly economic information right now, according to Barrence, the site will hopefully expand to include datasets on education and health in the future.

"There’s a lot of effort to make data public, but there’s very little effort that I know of in governance to make the data visible," he says. For instance, the U.S.'s data.gov largely leaves the visualization up to private developers. The city of New York does offer some visualization capabilities for its open data, but the feature isn't easy to use, nor does it offer as many different types of visualization. "There’s not a lot of value for data without the right visualization," Barrence says.

Infographics like those created by DataViva's apps could help average voters, as well as the people running the government, understand economic issues more simply. People wouldn't have to wait for a news organization or a designer to take a look at how incomes differ across geography; they could take a look themselves. And this seems like the type of project that could expand to any country (the platform is open source on GitHub). The type of data available might differ, but the basic ways to visualize it, through charts, webs, scatterplots, and maps, don't change.

6 Comments

Hi Cesar, I'm really interested in how this type of visualization can be used with non-public government data. Is there an email address where I could send you more information? Many of our state government clients would greatly benefit from these types of visualizations.

I know personally that Germany has this kind of data, but in order to access it you need to get certified and can only do so at the location where the data is stored. I am not an expert on the regulation per se, but my impression is that countries like Germany are at a disadvantage in developing this type of technology because before doing so they would need to modify the regulations to make the data publicly accessible. As of now, the data is considered private. This is a good example of a technology that is highly constrained by the local regulatory environment, and in particular, the privacy laws of each country. This puts Brazil at an advantage since in Brazil all government data is by default public.