New York Times Looks for Answers in Data

High up on the 28th floor of the New York Times, a pair of researchers have been poring over the newspaper’s data, looking to understand the way influence plays out online. What Mark Hansen, a UCLA statistics professor on sabbatical, and Jer Thorp, a data artist in residence at the Times, have found is that stories take on a life of their own, which can be mapped and visualized in some startlingly beautiful ways. The work, still “crazy” preliminary, shows how organizations are looking to mine their data to find ways to improve their operations. It also shows the challenges that lie ahead in trying to turn the data into clear actions.

Hansen and Thorp, who talked at a TimesOnline event last night, took two weeks of August data from the paper, looking at how stories were shared through the Times’ site, Bit.ly and Twitter. The pair built a tool that allowed them to see the life of a story, from its start as a URL tweeted by the Times through each round of retweeting and sharing that followed. The tool can render a simple timeline, a wheel with spokes or a radar view showing spikes of tweets. But it can also go 3-D, creating a funnel that expands over time as stories keep getting shared.

By visualizing the data, Hansen and Thorp were able to isolate “cascades,” chains of events that extend the life of a story, and to identify who has the influence online to keep it going. For example, a column by Paul Krugman inspired modest sharing but took off when Tim O’Reilly, founder of O’Reilly Media, retweeted it. In other cases, like the story of the flight attendant who escaped down the plane’s slide, the cascades are more dynamic and complicated.
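
To make the idea of a cascade concrete, here is a minimal sketch in Python. It is not the tool Hansen and Thorp built; the account names, timings and the `events` format are all invented for illustration, and it assumes each account shares the story only once. It links each share to the account it was reshared from, then counts how many later shares trace back to each account, a rough stand-in for influence.

```python
from collections import defaultdict

# Hypothetical share events for one story URL: (sharer, reshared_from, minute).
# reshared_from is None for the original tweet. All names and times are invented.
events = [
    ("nytimes", None, 0),
    ("alice", "nytimes", 5),
    ("bob", "nytimes", 9),
    ("carol", "bob", 40),
    ("dave", "carol", 42),
    ("erin", "carol", 47),
]

def build_cascade(events):
    """Map each sharer to the list of accounts that reshared from them."""
    children = defaultdict(list)
    for sharer, parent, _minute in events:
        if parent is not None:
            children[parent].append(sharer)
    return children

def downstream_count(children, account):
    """Count every share that descends, directly or indirectly, from account."""
    return sum(
        1 + downstream_count(children, child)
        for child in children.get(account, [])
    )

children = build_cascade(events)
# Assumes each account appears once as a sharer in this toy data.
reach = {sharer: downstream_count(children, sharer) for sharer, _p, _m in events}
```

In this toy data, "bob" has only one direct reshare, but that reshare sets off a second wave, which is the kind of pattern the O’Reilly example describes.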

While it’s still quite early, Hansen said the next steps will be to make the project handle both real-time and archived information. The hope is that the Times can suss out which factors affect a story’s life, whether it’s the section it’s in or the time it’s released. But this is where the tough part begins. It’s not enough to get the data; now the paper has to ask the right questions of it. As Michael Driscoll, founder of Dataspora and co-founder of Metamarkets (see disclosure below), said in a previous story, analytics is the key to tapping the potential of big data. The ingesting and visualization of data are critical elements, but analysis is where companies make their money.

Think of using data as a three-step process. One has to have the data, then one has to ask the data the right questions, and then one has to act upon the information. But with more data available to people, the number of questions that can be asked expands. It’s kind of like suddenly going from photographs to moving pictures. There’s more information for our brains to process, which makes the experience richer. Now, with more data and cheaper, more powerful computing, our metrics can move from a still snapshot in time to a moving picture of business health and activity. But where in that moving picture should businesses look? People will have to rethink the metrics they used in the snapshot era and find new focal points for the moving picture era of data. That act of finding out what new questions to ask, not mere analytics, will help separate the winners from the losers.

That’s the challenge for the New York Times, which, like many traditional media companies, is trying to revive revenues as its core audience shifts to digital. It’s great for the Times to have data to look at, especially for identifying and perhaps eventually targeting key influencers who can make the paper relevant in the Twitterverse. But it has to quickly take the next step and turn that data and all those beautiful charts into business decisions that can affect the bottom line.

Disclosure: Metamarkets is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, founder of Giga Omni Media, is also a venture partner at True.