Visualizing the New York Subway System's 'Data Exhaust'

In 2011, MetroCards were swiped through the turnstiles of the New York City subway system 1.6 billion times. Each swipe was itself a data point, and it came connected to myriad others: the day of the week, the subway stop, the type of rider. Did the commuter have a student MetroCard, or a senior citizen one? What about a seven-day pass, or a 30-day one?

As a byproduct of moving so many people around the city, New York's Metropolitan Transportation Authority (MTA) constantly churns out information like this. And, thanks to the rapidly expanding open data movement, that information is now available to the public, if we can just begin to figure out what to do with it.

"This is such a big sprawling thing," says John Geraci, who heads the New York office of faberNovel, a Paris-based company that consults with cities, non-profits and private companies on how to act more like startups. "This data was not created really with this in mind, with the idea of being seen by people. It’s like data exhaust."

Geraci's firm has just launched a data visualization site that plays with all of this information, if only to begin tempting our imaginations about what we could learn from it.

"Now we’re capturing that exhaust, and piping it back into another part of the system," Geraci says. "It seemed like this data wasn’t getting as much use as it could, even though it was out there. We decided to dig into it, to take a look and see what kind of insights we could glean from it."

Unlike transit systems such as Washington, D.C.'s, which charge riders by distance and require them to swipe their cards both when they enter and when they exit, the MTA collects this data only at the point of entry. (As a side note to mobility researchers and data geeks: you could do a lot of neat stuff with the data produced by systems like Washington's.) Still, the MTA data tells us a lot about where people are entering the system.

Here’s a snapshot from faberNovel’s site of the annual ridership on the MTA broken down by borough.

That's compared to the total distribution of stations:

Here's a map of the top 10 stations visited most often by students:

The site also includes navigable maps of the network's most popular destinations (senior citizens, as it turns out, are not going to the same places as students), as well as graphics of the most visited stations in each borough, which you can spend more time exploring on faberNovel's site.

These developers are working with a first-generation data set, and their visualizations only scratch the surface of what we might illustrate and learn about how people move around the city once we can take live snapshots of transit systems (the 2011 data was captured by the MTA at six-hour intervals). All of this points to a bigger question: Now that cities are opening up their data, what can we do with it? What can we create beyond the obvious trip planners?

"That’s always a much harder question to answer," Geraci says. "At the same time, that question has to be answered satisfactorily in a pretty short period of time, otherwise [transit agency] budgets for these kinds of things are going to disappear."

About the Author

Emily Badger is a former staff writer at CityLab. Her work has previously appeared in Pacific Standard, GOOD, The Christian Science Monitor, and The New York Times. She lives in the Washington, D.C. area.