Posts about data

I’ve posted two really short chapters from Geeks Bearing Gifts today on Medium: one on curation, one on data. Then I’ll take a break for the holiday and come back with a bigger chapter on rethinking what mobile really means for news.

A snippet from the chapter on curation (relevant to current discussions about Google and news in Europe):

And here’s a snippet from the chapter on data:

Data is a critical new opportunity for news organizations. What journalists have to ask — as with the flow of news — is how they add value to data by helping to gather it (with effort, clout, tools, and the ability to convene a community), analyze it (by calling upon or hiring experts who bring context and questions or by writing algorithms), and present it (contributing, most importantly, context and explanation). . . .

Data needs to become a mindset and a skill set in news organizations. Journalists should receive training to become literate in the opportunities and requirements of using data. Journalists also have to work with specialists who can analyze, interpret, and present data, and who can create tools allowing both reporters and the public to work with it. From a business perspective, data should be seen as an asset worth investing in, one that can yield news and new engagement often at a low cost. Data is/are a step past the article.

Read the rest of each chapter here and here. If you can’t wait for the rest, then you can buy the book here. The perfect gift for the journowonk on your list.

Tom Loosemore, blogging MP Tom Watson — and others, including the Guardian — have been fighting to get more public data made public in the UK. Now Watson and Loosemore have launched a $40k prize to mash up this data and come out with lots of lemonade. Here’s Paul Bradshaw on the movement. Here are some — as a Brit tweet said — stonking good ideas already.

LATER: This tweet by Charles Arthur of the Guardian — “wtf? No downloadable school league tables?” — made me realize that newspapers are also foolish not to make their data mashuppable. If we put out all our sports data as tables that could be downloaded and mashed up, people would build no end of great stuff on top of us. That’s thinking like a platform. WWGD?
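To make "mashuppable" concrete, here is a minimal sketch of what downloadable sports data could look like in practice: a league table published as plain CSV rather than locked inside a page layout. The team names, columns, and the three-points-for-a-win scoring rule are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical league table; team names and figures are invented.
FIELDS = ["team", "played", "won", "drawn", "lost"]
STANDINGS = [
    {"team": "Team A", "played": 10, "won": 7, "drawn": 2, "lost": 1},
    {"team": "Team B", "played": 10, "won": 5, "drawn": 3, "lost": 2},
]


def to_csv(rows, fields):
    """Publish a table as CSV text that anyone can download and reuse."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()   # first line names the columns
    writer.writerows(rows)
    return buffer.getvalue()


def points(row):
    """A reuser's own calculation; assumes 3 points for a win, 1 for a draw."""
    return 3 * int(row["won"]) + int(row["drawn"])


if __name__ == "__main__":
    published = to_csv(STANDINGS, FIELDS)
    # Downstream, a developer parses the file and builds on top of it.
    for row in csv.DictReader(io.StringIO(published)):
        print(row["team"], points(row))
```

The interesting part is `points`: once the table is downloadable, a reader can compute something the paper never printed without asking anyone's permission.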

The lesson of the Thomson-Reuters merger is the value of change. Thomson was a newspaper company and in the ’90s started shifting, getting rid of papers and getting into data and finding great success and growth there. Reuters was a newspaper service company and it made the shift into not only data but also, thanks to the wisdom of its current chief Tom Glocer, into direct-to-consumer news. Both specialized heavily; in their case, in financial data. Compare and contrast them with Knight Ridder, which doubled down on broad, generalized print products, and Tribune Company, which diversified from print, though not on a specialized track but in more generalized electronic media (TV and radio). Recognizing the value of specialized data as news and acting early — with strong head starts, of course — was a successful strategy. Can existing newspaper companies start to think of themselves as data providers and enablers, but in different spheres (e.g., hyperlocal, listings)? Is there time?

The issues in the fight over telephone companies releasing data to the NSA aren’t as simple as they are being reported and spun under the dark cloud of privacy violation.

From what we know, data was released to the NSA so it could be analyzed to find patterns and thus anomalies that might point to suspicious communications and, in turn, suspects. In other words, you can’t tell what’s abnormal until you define normal, and we, collectively, define normal.
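The "define normal, then look for the abnormal" logic can be illustrated with a simple z-score test. This is a generic sketch under stated assumptions, not a description of what the NSA actually runs: the per-number call counts are invented, and the three-standard-deviation threshold is an arbitrary choice.

```python
import statistics

# Hypothetical aggregate data: how many calls each phone number placed to a
# given foreign country in a month. All numbers and counts are invented.
CROWD = {f"number_{i:02d}": i % 3 for i in range(20)}
CROWD["number_99"] = 15  # one number behaves very differently from the crowd


def find_anomalies(counts, threshold=3.0):
    """Return keys whose value lies more than `threshold` population standard
    deviations above the crowd's mean (a simple z-score test)."""
    values = list(counts.values())
    mean = statistics.mean(values)      # "normal" is defined by the aggregate
    stdev = statistics.pstdev(values)   # spread of the whole population
    if stdev == 0:
        return []  # everyone behaves identically, so nothing stands out
    return sorted(
        key for key, value in counts.items()
        if (value - mean) / stdev > threshold
    )


if __name__ == "__main__":
    print(find_anomalies(CROWD))  # → ['number_99']
```

The baseline comes entirely from the crowd: each number is judged only against the aggregate. A real system would likely prefer a more robust baseline (median and median absolute deviation, say), because a single extreme value inflates the very standard deviation it is measured against.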

If, in fact, it is aggregate data they are using to discover those exceptions, then we need to ask a new question that isn’t really being addressed in the networked world: Who owns the wisdom of the crowd? If the people own it, then one could argue that the government, acting as the people, may seek and use that data unless we, the people, forbid it through law. There is, of course, a proper debate about whether the law does allow it. There is also a proper debate over whether this is a necessary and prudent weapon in finding terrorists (and whether that is being done effectively). Indeed, a Washington Post poll says that 63 percent of Americans consider this an “acceptable way for the federal government to investigate terrorism.” And didn’t we protest that our government did not do a good enough job analyzing data and intelligence to prevent 9/11? If someone had been analyzing patterns of enrollment in flight schools — hmm, why are an abnormally high number of Saudis suddenly learning how to fly passenger jets? — then could we have stopped them? A further question is whether we have a right to know that all this is going on or whether that public knowledge cripples this investigation and our safety. Finally, it is not clear that releasing aggregate data necessarily violates individuals’ privacy. My point is that this isn’t as simple as raising the tattered-from-overuse privacy flag. Neither is this as simple as raising the also tattered war-on-terrorism flag.

This is about a new asset that is created in the networked world — the aggregate knowledge generated by our aggregate behavior — and who has a right to that.

This is certainly not new, only more efficient. Insurance companies have long used our health and mortality data in aggregate to set rates. Marketers use our aggregate data to adjust products and ad campaigns. Google uses our aggregate data to improve its search engine. So Google owns, analyzes, and exploits the data we create through our actions. In the case of the kiddie porn investigation, Google tried to refuse to hand over random aggregate data about our searches to the government; other search engines complied. The same thing occurred in the NSA case; some phone companies complied and Qwest did not.

The bottom line is that there isn’t yet a bottom line: The law and ethics around aggregate data are not clear.

Well, here we go again with the horrified screams from the crowd that’s inclined to believe the big bad government is peeping through every keyhole and recording every street-corner chat about whether or not it looks like rain.

Revelations that the National Security Agency has been collecting a database of every telephone call in America – numbers dialed, that is, not conversations parsed – happen to come as British probers report that July’s London transit bombings might have been prevented if only security forces had been aware that one of the bombers regularly called Pakistan in the days before the blasts.

No, it’s no crime to call Pakistan. But when the call is part of a pattern that suggests a security risk, this is worth red-flagging and perhaps eavesdropping on – with a warrant and court supervision, as all, right up to the commander in chief, agree would be necessary.

Anyway, the idea that phone companies have been turning over raw logs to the NSA somehow doesn’t strike us as all that revelatory. Of course they have been, and they have been doing it legally. If the purpose is synthesizing data, then certainly the NSA would be keeping a database from which to synthesize. And where did you think the NSA was going to go to collect log data? …

“I wish I could say I was bothered by it but I’m not,” said Jacques Domenge, a 28-year-old Potomac man who visited a Cingular Wireless store in Rockville yesterday to replace a stolen phone.

“If it’s only done to protect people and find patterns that help the government find terrorists — I don’t think it will work, by the way, but let’s say it will — then I am all for it,” he said, adding that he had no problems with Cingular — or any other phone company — turning over records.

According to a Washington Post-ABC News poll released yesterday, 63 percent of Americans said they found the NSA program to be an acceptable way to investigate terrorism, including 44 percent who strongly endorsed the effort. Another 35 percent said the program was unacceptable, including 24 percent who strongly objected to it.

“The value of fighting terrorism, in a lot of our research, seems to be more important to the public than what they perceive as violations of their privacy — so far,” said Frank Newport, editor in chief of the Gallup Poll and vice president of the Gallup Organization in Princeton, N.J.

Newport said views of the NSA program — which was disclosed on Thursday by USA Today — should be viewed in the broader context of Americans grappling with more and more of their personal data being collected and analyzed by businesses. “When we ask what’s the most important problem facing the country, we don’t see any signs that privacy is beginning to percolate up,” he said.