Is Big Data Overhyped?

For some in Silicon Valley, the rise of new data and communication networks creates unprecedented opportunities to solve problems like obesity, traffic, and flu pandemics. For example, an app like Fitbit or LoseIt can keep track of calories and buzz dieters once they go over their daily limit. Futuristic early warning systems can steer drivers away from bottlenecks and detect emerging influenza outbreaks.

Evgeny Morozov’s illuminating book To Save Everything, Click Here challenges both “internet centrism” and “solutionism.” The internet may, for instance, make traffic worse. Moreover, solutionism tends to “reach for the answer before the questions have been fully asked.” Is the problem really traffic, or something deeper in the way cities and opportunities are arranged? Solutionism tends to prioritize issues that widely accessible tech can address: small, algorithmically decomposable bits of wicked problems.

While a solutionist might see gamified calorie counting as a wonderful new way to fight obesity, a more sober analysis of the problem leads us to doubt that the smartphone will make us svelte. Similarly, calorie counts may be a useful disclosure tactic, but disclosure is only the first step on the road to changing behavior. And our food problem, like our traffic problem, may require us to reconsider privilege, taste, and inequality, which are far deeper problems than individual struggles for self-control.

EHR [electronic health record] vendors are making slow progress towards achieving interoperability, the ability of two or more systems to exchange information and to operate in a coordinated fashion. In 2010 only 19% of hospitals exchanged patient data with providers outside their own system. Vendors may have little incentive to produce interoperable systems because interoperability might make it harder to market products as distinctive and easier for clinicians to switch to different EHR products if they are dissatisfied with the ones they purchased. . . .

Even if the EHR data themselves are flawless, analysts seeking to answer causal questions, such as whether particular public health interventions have had a positive impact, will face significant challenges relating to causal inference. These include selection bias, confounding bias, and measurement bias.

[A]s medical research follows the lead of Google Flu Trends and begins to slip outside these traditional institutions and their concomitant safeguards, we should be concerned about the relative lack of controls. Particularly as more medical research is conducted by profit-driven companies, whether large corporations or small startups, we should worry about forcing the public to accept new risks to privacy with little countervailing benefit and none of the controls. The worst of all worlds would occur if medical researchers at non-profit institutions began to clamor for relaxed human subjects review in a race to the bottom to compete with their for-profit counterparts.

Ohm’s point about maintaining a baseline of standards is prescient: I have heard at least one behavioral scientist argue that research will migrate out of universities and into private companies unless universities relax their institutional review board (IRB) standards. Ohm also questions whether something as celebrated as Google Flu Trends has led to actionable data:

Who has created an app, therapy, or epidemiological study based on the colors on [Google’s flu maps]? Has a traveler ever avoided boarding a plane to a city on a distant coast because of the relative difference in the shading of the oranges between home and destination? The answer, I suspect, is that none of these positive results has occurred. Instead, the project’s primary mission is to market Google: we are reminded by a colorful map that Google is not evil.