Derek Willis, a New York Times developer involved with Times digital data venture The Upshot, spoke about “interviewing data” to a packed room this morning at the Philip Merrill College of Journalism on the second day of the Journalism Interactive 2014 conference.

AJR is the publishing partner for the Journalism Interactive 2014 conference, held April 4 and 5 in College Park, Md.

Journalists don’t need to be Einstein to understand data sets, Derek Willis says. They need to be MacGyver.

“The bulk of the skills involved in interviewing people and interviewing data are actually pretty similar,” Willis said. “We want to get to know it a little bit. We want to figure out what’s here. Who are you? What are you about?”

Derek Willis of The New York Times discusses data-driven reporting at Journalism Interactive 2014. By Joanna Nurmis/AJR.

Here is his advice for succeeding with data-driven journalism:

“You need to adopt a posture of deep, deep, abiding skepticism,” he said. Problems with data sets are often fundamental, not just cosmetic. “Act on the assumption from the minute you look at data that there’s something wrong.”

Sort the data as soon as possible (with the header row intact). Figure out what you’re dealing with. For police arrest records, that might involve ordering by age, charge or address, after you’ve figured out which of those categories needs to be cleaned up because of dud fields like “, ,” instead of a real address. Willis said Excel can sometimes make inferences that lead to confusion, such as substituting the date 5/1/01 for a police code of 5-1-1.
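The sorting and cleanup advice above can be tried outside Excel as well. The following is a minimal sketch in Python, not something Willis showed. The sample records, the column names and the helper names `load_rows` and `is_dud` are all invented for illustration. Python's `csv` module keeps every field as text, so a code like 5-1-1 is not reinterpreted as a date.

```python
import csv
import io

# Hypothetical arrest records; names, columns and values are invented for illustration.
SAMPLE = """name,age,charge_code,address
A. Smith,34,5-1-1,12 Main St
B. Jones,19,2-4-0,", ,"
C. Lee,52,5-1-1,9 Oak Ave
"""

def load_rows(text):
    """Read CSV text into a list of dicts, one per row, keyed by the header row.

    The csv module leaves every field as a string, so "5-1-1" stays "5-1-1".
    """
    return list(csv.DictReader(io.StringIO(text)))

def is_dud(value):
    """Return True when a field is empty or holds only commas and whitespace."""
    return value.replace(",", "").strip() == ""

rows = load_rows(SAMPLE)

# Find the records whose address needs cleanup before any sorting by address.
dud_addresses = [r["name"] for r in rows if is_dud(r["address"])]

# Sort by age; convert to int here so "19" sorts before "100" numerically.
by_age = sorted(rows, key=lambda r: int(r["age"]))

print("Needs cleanup:", dud_addresses)
print("Youngest first:", [r["name"] for r in by_age])
```

Checking for dud fields before sorting follows the order Willis describes: figure out which columns are unreliable first, then order the data.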

Think of data as another source. But unlike human sources, data can’t tell you that you asked a stupid question. Write out the questions you want to ask, or even say them out loud, before filtering data.

Use data filter tools instead of Ctrl-F or Command-F, which he called less efficient. The filters are limited in Excel, but powerful in SQL-based programs. Start broadly, then narrow your questions down. “Because the law is oddly specific, we have to be oddly specific,” he said.
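As a rough illustration of starting broadly and then narrowing, here is a sketch using SQLite through Python's built-in `sqlite3` module. The `arrests` table, its columns, the sample rows and the `count` helper are assumptions made up for this example. They do not come from Willis's talk.

```python
import sqlite3

# In-memory database with a hypothetical arrests table (all rows invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrests (name TEXT, age INTEGER, charge TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO arrests VALUES (?, ?, ?, ?)",
    [
        ("A. Smith", 34, "theft", "College Park"),
        ("B. Jones", 19, "theft", "Hyattsville"),
        ("C. Lee", 52, "assault", "College Park"),
        ("D. Cruz", 17, "theft", "College Park"),
    ],
)

def count(where, params=()):
    """Count rows matching a WHERE clause; values are passed as bound parameters."""
    sql = f"SELECT COUNT(*) FROM arrests WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]

# Start with the broad question, then add one condition at a time.
broad = count("charge = ?", ("theft",))
narrower = count("charge = ? AND city = ?", ("theft", "College Park"))
narrowest = count(
    "charge = ? AND city = ? AND age < ?", ("theft", "College Park", 18)
)

print(broad, narrower, narrowest)
```

Each added condition corresponds to one written-out question, which makes it easier to see at which step the numbers stop making sense.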

Be careful about translating data within the original spreadsheet and jumping to conclusions. And save a copy of the original before modifying it in any way. “The more you type, the more you type wrong,” he said.
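One way to follow the advice about keeping an untouched original is to copy the file before opening it for edits. This is a small sketch with a hypothetical helper, `backup_original`, and an assumed `.orig` naming convention. Neither is something Willis prescribed.

```python
import shutil
from pathlib import Path

def backup_original(path):
    """Copy a data file to '<name>.orig' next to it and return the copy's path.

    If the backup already exists it is left alone, so a later call cannot
    overwrite the untouched original with an edited version.
    """
    src = Path(path)
    dest = src.with_name(src.name + ".orig")
    if not dest.exists():
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Calling `backup_original("arrests.csv")` before any cleanup would leave `arrests.csv.orig` as the reference copy. The file name here is only an example.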

Government agencies put data online so they don’t have to talk to journalists, Willis said. But there is often no translation key, so journalists have to do some reporting.

The great thing about data, Willis said: “It encourages you to think of stories as questions, not as statements.”

Willis admits he might be an outlier. He always liked making lists as a kid, and when he was a congressional reporter, he half-jokingly would tell people he preferred interviewing data to interviewing politicos.

“Some journalists will be just fine without ever learning it,” Willis said.

Still, he said, we’re living in a world with more data than ever, from Facebook to Fitbit. And it’s really not that hard to learn Excel, as long as you have the willingness to get your hands dirty and play around with it.

Without learning to work with data, journalists might find some stories are simply out of their reach, Willis said.

Like MacGyver, Willis said, students and professionals often surprise themselves with what is possible with the basics.