2011-02-18

Watson's Jeopardy win, and a reality check on the future of AI

So Watson beat the two best Jeopardy champions at their own game. What now?

Call me cynical, but as someone who has undertaken machine learning research for 14 years, I don't find the Jeopardy result all that surprising -- you would win too if you had the equivalent of the entire Wikipedia knowledge base at your fingertips for instant recall, and a huge buzzer advantage from being able to process individual pieces of information in parallel, much faster than the human brain!

But there is a much deeper problem with some of the media pontification about the future of AI and machines taking over the world: try asking Watson how "he" feels about winning.

Watson's learning model is currently really, really good at one thing (and perhaps only that thing): figuring out which question was being asked, given the answer to a general-knowledge question. I'm sure there are lots of reusable pieces in the Watson system (some natural language processing (NLP) code, etc.), but what the mainstream media doesn't seem to understand is that it would be an enormous stretch to say the system could simply and easily be applied to other domains.

The promise of machine learning is that algorithms should in theory be reusable in many situations. The Weka machine learning toolkit, for example, provides a generic ML framework that is used for all sorts of things. But extracting the right features from your data, and deciding how to represent them, is a huge problem on its own, and can be tackled completely separately from the learning issues. (This is all further muddied once you throw in NLP.)

Today, most feature selection for any given learning task is done by hand, by engineers. An AGI (Artificial General Intelligence) would have to do that itself, and we don't yet have much of a clue how to teach an AGI to pick its own reasonable and useful feature sets in a totally generic or smart way. Yet it's quite easy to show that, for most complex datasets, your feature selection strategy is almost as important as, or more important than, the exact machine learning algorithm you apply.
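To make that concrete, here is a minimal sketch (toy code, nothing to do with Watson or Weka -- the dataset and the classifier are both made up for illustration): the same trivial nearest-centroid "algorithm" is run twice on synthetic data, once using a feature that actually carries the label's signal and once using pure noise. The feature choice, not the algorithm, decides the outcome.

```python
import random

random.seed(0)

def make_data(n):
    # Synthetic dataset: the label is encoded (noisily) in feature A;
    # feature B is pure noise with no relationship to the label.
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        feat_a = label + random.gauss(0, 0.3)   # informative feature
        feat_b = random.gauss(0, 1.0)           # noise feature
        data.append((feat_a, feat_b, label))
    return data

def centroid_accuracy(train, test, feat):
    # The "algorithm": one centroid per class on the chosen feature,
    # then classify each test point by its nearest centroid.
    cents = {}
    for lbl in (0, 1):
        vals = [row[feat] for row in train if row[2] == lbl]
        cents[lbl] = sum(vals) / len(vals)
    correct = 0
    for row in test:
        pred = min(cents, key=lambda l: abs(row[feat] - cents[l]))
        correct += pred == row[2]
    return correct / len(test)

train, test = make_data(200), make_data(200)
acc_good = centroid_accuracy(train, test, feat=0)  # informative feature
acc_bad = centroid_accuracy(train, test, feat=1)   # noise feature
print("informative feature:", acc_good, "noise feature:", acc_bad)
```

With the informative feature the classifier is right roughly 95% of the time; with the noise feature it does no better than coin flipping -- and no cleverer algorithm can rescue it, because the information simply isn't in the feature.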

What very few people appreciate is that machine learning has so far amounted to little more than learning arbitrary function approximators: you learn a mapping from a domain to a range, from input to output. Training refines that function approximation so as to minimize error on as-yet unseen data (the test dataset, i.e. data that was not used to fit the current iteration of the approximation). Because all machine learning algorithms, as currently framed, are basically just trying to learn a function, they are all in some deep sense equivalent. (In practice, not all algorithms even work with the same data types, which is why this is mostly only true at the deepest level -- but quite a bit of work has shown that, at the end of the day, most of today's machine learning algorithms are doing the same thing, with different strengths and weaknesses.)
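As a deliberately trivial illustration of that framing -- again a made-up sketch, not anything from Watson -- here is the entire ML workflow in miniature: fit a function y ≈ w*x + b to training data by least squares, then judge it only by its error on held-out test data.

```python
import random

random.seed(1)

def make_split(n):
    # The true mapping we want to approximate: y = 2x + 0.5, plus noise.
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [2 * x + 0.5 + random.gauss(0, 0.1) for x in xs]
    return xs, ys

def fit(xs, ys):
    # Closed-form least-squares fit of a line: the "learning" step is
    # nothing but choosing the function that minimizes training error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return w, my - w * mx

def mse(w, b, xs, ys):
    # Mean squared error of the learned function on a dataset.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_split(100)
test_x, test_y = make_split(50)   # unseen data, never used in fitting
w, b = fit(train_x, train_y)
print("learned w, b:", round(w, 2), round(b, 2),
      "test MSE:", mse(w, b, test_x, test_y))
```

Every classifier and regressor, however sophisticated, is doing a richer version of exactly this: pick a function from some family that maps inputs to outputs, and hope it generalizes to data it never saw.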

Incidentally, the fact that the whole field of machine learning is about learning arbitrary function approximators is pretty much the whole reason that a lot of people in CS learning theory don't really talk about AI anymore, only ML. There's nothing much intelligent about machine learning as it stands today. I've heard it said that CSAIL (the Computer Science and Artificial Intelligence Laboratory) here at MIT, where I work, is only still called CSAIL in deference to Marvin Minsky and the glory days of AI, and that a lot of people dislike the name and want to change it when Marvin finally retires. (That probably won't happen, but the statement alone was illustrative...) We need a complete revolution in learning theory before we can truly claim we're creating AI, even if the behavior of ML algorithms feels "smart" to us: it only feels smart because the algorithms correctly predict outputs given inputs -- and you could write a function on paper that does that.

I'm not claiming we can't do it -- "It won't happen overnight, but it will happen" -- I'm just stating that ML and AI are quite different, and we're very good at ML and not at all good at AI.

Efforts to simulate the brain are moving along, and Ray Kurzweil predicts that in just a decade or two we should be able to build a computer as powerful as the brain. While that may be true in terms of total computational throughput of the hardware, there is no way to know if we will be able to create the right software to run on this hardware by that time. The software is everything.

One of the problems is that we don't know exactly how neurons work. People (even many neuroscientists) will tell you, "of course we know how a neuron works: it's a switching unit that receives and accumulates signals until a certain potential is reached, then sends a signal on to the other neurons it is connected to." I suspect that in several years' time we will realize just how naive that assumption is. Already there are fascinating discoveries showing that things are just not that simple, e.g. (hot off the press yesterday): http://www.eurekalert.org/pub_releases/2011-02/nu-rtt021711.php

From that article:
> "It's not always stimulus in, immediate action potential out."
> "It's very unusual to think that a neuron could fire continually without stimuli"
> "The researchers think that others have seen this persistent firing behavior in neurons but dismissed it as something wrong with the signal recording."
> "...the biggest surprise of all. The researchers found that one axon can talk to another."

This is exactly the sort of thing that makes me think it's going to take a lot longer than Ray predicts to simulate the brain: we don't even know what a neuron is doing. A cell is an immense, extraordinarily complex machine on the molecular scale, and simplifying it to a transistor or a thresholded gate will not necessarily produce the correct emergent behavior when you connect a lot of them together. I'm glad people like the researchers behind the study above are doing more fundamental work on what a neuron actually is and how it really functions. I suspect that years down the line we'll discover much more complicated information-processing capabilities in individual cells -- e.g. the ability of a nerve cell to store information in custom RNA strands, based on incoming electrical impulses, in order to encode memories internally [you read it here first], or something funky like that.

Of course even a simplified model is still valuable: "Essentially, all models are wrong, but some are useful" (--George E. P. Box). However, we have to get the brain model right if we want to recreate intelligence the biologically-inspired way. Simply stated, we can't predict what it will take to build intelligence, or how long it will take, until we understand what it actually is we're trying to build. Just saying "it's an emergent property" is not a sufficient explanation. And the emergent properties might only emerge if some very specific parts of our simplified models behave correctly -- but we have no way of knowing which salient features must be modeled faithfully and which can be simplified away.

But a much bigger problem will hold up the arrival of AGI: not only do we not know how single neurons really work, we have NO CLUE what intelligence really is -- and even less of a clue what consciousness really is. The problem with Ray's predictions is that while we can forecast the progress of a specific quantifiable parameter of a known technology, even when the underlying technology embodying that parameter changes form (e.g. Moore's Law continued to hold across at least 50 years, even across the switch from vacuum tubes to transistors to silicon wafers), we can't forecast the time of creation of a new technology that is, for all intents and purposes, "magic" right now, because we still don't know how it would work. In fact we can predict the arrival of a specific magic technology about as well as we can predict the time of discovery of a specific mathematical proof or scientific principle. Nature sometimes chooses simply not to reveal herself to us. Can we even approximately predict when we will prove or disprove P=NP or the Goldbach Conjecture? How much harder is it to define intelligence (or, even more so, consciousness) than to prove or disprove a mathematical statement?

Finally, and most importantly, somebody needs to get Watson to compete in Jeopardy against Deep Thought to guess the correct question to the answer 42...