Cloud didn't make the scientific method irrelevant in '08—AI won't do it in '17, either.

Back when I was doing research, one of my advisors once joked that, if you wait long enough, you can produce an old result using new methods, manage to get it published, and everyone will be impressed. I think his time limit was 15 years. Apparently, when it comes to big ideas about science (rather than scientific results), the schedule's a bit accelerated.

Just shy of 10 years ago, Chris Anderson, then Editor-in-Chief at Wired, published a piece in which he claimed that cloud computing was making the scientific method irrelevant. All those models and theories didn't matter, so long as an algorithm could identify patterns in your data. The piece was wrong then, as I explained at the time (see below). It hasn't gotten any more right in the meantime.

Yet a quote from Chris Anderson's article led off a new column last month that essentially says Anderson was right, he just had the wrong reason. It's not cloud computing that's going to make theory irrelevant—it's AI, the piece argues. Once trained, AI can recognize patterns using rules that we don't comprehend. Set it loose on scientific data, and it can pull things out without needing anything like a model or a theory.

The column is sweeping in its scope, and it includes some good and accurate information. But its fundamental premise is based on a misunderstanding of how AI is being used in science. I'll illustrate that using one of its own examples: the use of neural networks in particle physics.

Identifying a new particle, like the Higgs boson, depends on being able to recognize not the particle itself, but the spray of particles that it decays into. Neural networks have gotten pretty good at detecting patterns, and so they've come to be used in particle physics, searching through the debris of countless collisions, looking for a pattern that represents something interesting. This, the column suggests, hints that AI is replacing the need for theory in particle physics.

That's wrong on multiple levels. To begin with, particles like the Higgs boson can't decay to any old random spray of lighter particles. They obey rules that dictate which particles are possible, and in which combinations. Those rules are what allowed us to identify the Higgs in the first place, and they were set by theory—the Standard Model of physics, to be specific.

Yes, you don't necessarily have to teach a neural network those rules to get it to recognize particles. All you need to do is feed it enough examples of collisions with and without a Higgs in them. Of course, the only way that's possible is to have a bunch of examples of the Higgs in the first place... which we needed the Standard Model to know we produced. And the resulting neural network wouldn't have its own alternative version of the Standard Model percolating through its nodes; it would just be doing standard pattern recognition.
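
To make "standard pattern recognition" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: each "event" is reduced to a single made-up feature, the numbers are toy values rather than physics, and a single logistic neuron stands in for a real neural network. The names `make_events`, `train_classifier`, and `predict` are hypothetical. What it does show is where the labels come from: the training set exists only because something else, here the generator standing in for a theory-driven simulation, already knows which events are signal.

```python
import math
import random

def make_events(n, seed=0):
    """Toy labeled events: one made-up feature per event, label 1 = signal.

    The labels are known only because we know which generator produced each
    event. That is the stand-in for a simulation built on an existing theory.
    """
    rng = random.Random(seed)
    background = [(rng.gauss(0.0, 1.0), 0) for _ in range(n)]
    signal = [(rng.gauss(4.0, 1.0), 1) for _ in range(n)]
    return background + signal

def train_classifier(events, steps=2000, lr=0.2):
    """Fit a single logistic neuron by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in events:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(signal)
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / len(events)
        b -= lr * grad_b / len(events)
    return w, b

def predict(model, x):
    """Return 1 (signal) or 0 (background) for one feature value."""
    w, b = model
    return 1 if w * x + b > 0 else 0

events = make_events(100)
model = train_classifier(events)
accuracy = sum(predict(model, x) == y for x, y in events) / len(events)
print(f"training accuracy: {accuracy:.2f}")
```

The trained model is just two numbers, a weight and a bias. It separates the two toy populations well, but nothing in it resembles the rules that generated the labels.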

Could a neural network be trained to find something completely new, unpredicted by theory? Maybe. But you'd have to train it on lots of things we know aren't interesting and then get it to look for exceptions. And, of course, we know those things aren't interesting because we have a solid theoretical understanding of particle physics. Would any of the results it pulled out have testable consequences? Nope. We'd have to go back to crafting a theory to get those.
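
A rough sketch of that "look for exceptions" approach, again in Python and again under loudly stated assumptions: the events are toy numbers, the feature is made up, and a simple mean-and-spread model stands in for whatever a real analysis would train. The function names `fit_background` and `find_exceptions` and the threshold of 4 are hypothetical choices.

```python
import random
import statistics

def fit_background(values):
    """Summarize the 'uninteresting' events by their mean and spread."""
    return statistics.mean(values), statistics.stdev(values)

def find_exceptions(model, values, threshold=4.0):
    """Return the values lying more than `threshold` spreads from the mean."""
    mean, std = model
    return [v for v in values if abs(v - mean) / std > threshold]

# Assumption: these toy events are ones we already know to be uninteresting,
# which in practice is a judgment that comes from existing theory.
rng = random.Random(1)
background = [rng.gauss(0.0, 1.0) for _ in range(1000)]
model = fit_background(background)

new_events = [0.3, -1.2, 9.0, 0.8]
exceptions = find_exceptions(model, new_events)
print(exceptions)
```

The output is a list of events that don't look like the background. It says nothing about why they differ or what to expect next, which is the part a theory would have to supply.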

I'm not going to claim that neural networks or some other form of AI will never be able to find something truly novel on their own, or that we're never going to have to revise our current understanding of the value of theories and testable models. But I'm comfortable saying that we're not there yet, and it's not clear that we've gotten any closer.

My original response to that old Chris Anderson column, first published in June 2008, appears unchanged below.

Nope, the cloud won't obscure it either

Every so often, someone (generally not a practicing scientist) suggests that it's time to replace science with something better. The desire often seems to be a product of either an exaggerated sense of the potential of new approaches, or a lack of understanding of what's actually going on in the world of science. This week's version, which comes courtesy of Chris Anderson, the Editor-in-Chief of Wired, manages to combine both of these features in suggesting that the advent of a cloud of scientific data may free us from the need to use the standard scientific method.

It's easy to see what has Anderson enthused. Modern scientific data sets are increasingly large, comprehensive, and electronic. Things like genome sequences tell us all there is to know about the DNA present in an organism's cells, while DNA chip experiments can determine every gene that's expressed by that cell. That data's also publicly available—out in the cloud, in the current parlance—and it's being mined successfully. That mining extends beyond traditional biological data, too, as projects like WikiProteins are also drawing on text-mining of the electronic scientific literature to suggest connections among biological activities.

There is a lot to like about these trends, and little reason not to be enthused about them. They hold the potential to suggest new avenues of research that scientists wouldn't have identified based on their own analysis of the data. But Anderson appears to take the position that the new research part of the equation has become superfluous; simply having a good algorithm that recognizes the correlation is enough.

The source of this flight of fancy was apparently a quote by Google's research director, who repurposed a cliché that most scientists are aware of: "All models are wrong, and increasingly you can succeed without them." And Google clearly has. It doesn't need to develop a theory as to why a given pattern of links can serve as an indication of valuable information; all it needs to know is that an algorithm that recognizes specific link patterns satisfies its users. Anderson's argument distills down to the suggestion that science can operate on the same level—mechanisms, models, and theories are all dispensable as long as something can pick the correlations out of masses of data.

I can't possibly imagine how he comes to that conclusion. Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation—high-temperature superconductivity springs to mind—to see how badly the lack of a theory can impact progress. Put in more practical terms, would Anderson be willing to help test a drug that was based on a poorly understood correlation pulled out of a datamine? These days, we like our drugs to have known targets and mechanisms of action and, to get there, we need standard science.

Anderson does provide two examples that he feels support his position, but they actually appear to undercut it. He notes that we know quantum mechanics is wrong on some level, but have been unable to craft a replacement theory after decades of work. But he neglects to mention two key things: without the testable predictions made by the theory, we'll never be able to tell how precisely it is wrong and, in those decades where we've failed to find a replacement, the predictions of quantum mechanics have been used to create the modern electronics industry, with the data cloud being a consequence of that.

If anything, his second example is worse. We can now perform large-scale genetic surveys of the life present in remote environments, such as the far reaches of the Pacific. Doing so has informed us that there's a lot of unexplored biodiversity on the bacterial level; fragments of sequence hint at organisms we've never encountered under a microscope. But as Anderson himself notes, the only thing we can do is make a few guesses as to the properties of the organisms based on who their relatives are, an activity that actually requires a working scientific theory, namely evolution. To do more than that, we need to deploy models of metabolism and ecology against the bacteria themselves.

Overall, the foundation of the argument for a replacement for science is correct: the data cloud is changing science, and leaving us in many cases with a Google-level understanding of the connections between things. Where Anderson stumbles is in his conclusions about what this means for science. The fact is that we couldn't have even reached this Google-level understanding without the models and mechanisms that he suggests are doomed to irrelevance. But, more importantly, nobody, including Anderson himself if he had thought about it, should be happy with stopping at this level of understanding of the natural world.