Why Data Will Never Replace Thinking

Big data, it has been said, is making science obsolete. No longer do we need theories of genetics or linguistics or sociology, Wired editor Chris Anderson wrote in a manifesto four years ago: “With enough data, the numbers speak for themselves.”

Vivek Ranadivé, founder and CEO of the software company TIBCO, struck a similar note: “I believe that math is trumping science. What I mean by that is you don’t really have to know why, you just have to know that if a and b happen, c will happen.”

Anderson and Ranadivé are reacting to something real. If the scientific method is to observe, hypothesize, test, and analyze, the explosion of available data and computing power has made observation, testing, and analysis so cheap and easy in many fields that one can test far more hypotheses than was previously possible. Quick-and-dirty online “A/B tests,” in which companies like Google and Amazon show different offers or page layouts to different people and simply go with the approach that gets the best response, are becoming an established way of doing business.
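Even the simplest “let the data decide” A/B test smuggles in hypotheses: a chosen metric, a chosen confidence threshold, an assumption that today’s visitors resemble tomorrow’s. A minimal sketch of such a decision rule, with hypothetical traffic numbers and a standard two-proportion z-test, makes those choices visible:

```python
import math

def ab_winner(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare two page variants by conversion rate.

    Returns 'A', 'B', or 'inconclusive' using a two-proportion
    z-test at roughly the 95% confidence level. The threshold
    z_crit is itself a hypothesis about how much evidence is enough.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    if z > z_crit:
        return "B"
    if z < -z_crit:
        return "A"
    return "inconclusive"

# Hypothetical traffic: layout A converts 200 of 10,000 visitors,
# layout B converts 260 of 10,000.
print(ab_winner(200, 10_000, 260, 10_000))  # prints "B"
```

Note that “go with whatever gets the best response” is only sound if the implicit hypotheses hold — most importantly, that the behavior measured during the test will persist after it.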

But does that really mean there are no hypotheses involved? At Techonomy, Ranadivé made his math-is-trumping-science comments after recommending that the Federal Open Market Committee, which sets monetary policy in the U.S., be replaced with a computer program. Said he:

The fact is, you can look at information in real time, and you can make minute adjustments, and you can build a closed-loop system, where you continuously change and adjust, and you make no mistakes, because you’re picking up signals all the time, and you can adjust.

As best I can tell, there are three hypotheses inherent in this replace-the-Fed-with-algorithms plan. The first is that you can build U.S. monetary policy into a closed-loop system, the second is that past correlations in economic and financial data can usually be counted on to hold up in the future, and the third is that when they don’t you’ll always be able to make adjustments as new information becomes available.

These feel like pretty dubious hypotheses to me, similar to the naive assumptions of financial modelers at ratings agencies and elsewhere that helped bring on the financial crisis of 2007 and 2008. (To be fair, Ranadivé is a bit more nuanced about this stuff in print.) But the bigger point is that they are hypotheses. And since they’d probably prove awfully expensive to test, they’ll presumably stay hypotheses for a while.

There are echoes here of a centuries-old debate, unleashed in the 1600s by protoscientist Sir Francis Bacon, over whether deduction from first principles or induction from observed reality is the best way to get at truth. In the 1930s, philosopher Karl Popper proposed a synthesis, in which the only scientific approach was to formulate hypotheses (using deduction, induction, or both) that were falsifiable. That is, they generated predictions that — if they failed to pan out — disproved the hypothesis.

Actual scientific practice is more complicated than that. But the element of hypothesis/prediction remains important, not just to science but to the pursuit of knowledge in general. We humans are quite capable of coming up with stories to explain just about anything after the fact. It’s only by trying to come up with our stories beforehand, then testing them, that we can reliably learn the lessons of our experiences — and our data. In the big-data era, those hypotheses can often be bare-bones and fleeting, but they’re still always there, whether we acknowledge them or not.

Data-driven predictions can succeed — and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.

One key role we play in the process is choosing which data to look at. That this choice is often made for us by what happens to be easiest to measure doesn’t make it any less consequential, as Samuel Arbesman wrote in Sunday’s Boston Globe (warning: paywall):

Throughout history, in one field after another, science has made huge progress in precisely the areas where we can measure things — and lagged where we can’t.

In his book The Signal and the Noise, Nate Silver spends a lot of time on another crucial element: how we go about revising our views as new data comes in. Silver is a big believer in the Bayesian approach to probability, in which we all have our own subjective ideas about how things are going to pan out, but follow the same straightforward rules in revising those assessments as we get new information. It’s a process that uses data to refine our thinking. But it doesn’t work without some thinking first.
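The “straightforward rules” here are just Bayes’ rule. A minimal sketch, with entirely hypothetical numbers, shows why the thinking has to come first: the update cannot run without a prior belief and a judgment about how likely the evidence would be under each possibility.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Revise a subjective probability after seeing new evidence,
    per Bayes' rule. All three inputs are judgments the analyst
    must supply before the data can say anything."""
    numer = prior * p_evidence_if_true
    denom = numer + (1 - prior) * p_evidence_if_false
    return numer / denom

# Hypothetical: you give a trading strategy a 20% chance of working.
# A favorable backtest is 75% likely if it works, 30% likely if not.
belief = bayes_update(0.20, 0.75, 0.30)
# belief is now about 0.385: the data moved you, but the prior
# and the likelihood judgments shaped how far.
```

Change the prior or the likelihoods and the same evidence yields a different conclusion — which is exactly Silver’s point that data refines thinking rather than replacing it.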
