Dr. Aditya Vailaya, chief scientist at consumer-electronics recommendation site Retrevo, doesn’t discount the Watson victory, but he does draw a clear line between machine learning (the broad umbrella under which Watson’s question-answering algorithms reside) and artificial intelligence. That a system such as Watson can understand natural language is a huge step forward in machine learning, he explained, but it’s still only as good as the data it has to work with and the algorithms, or rules, developed to process that data. Put more simply, machine learning is largely a case of “garbage in, garbage out.”

Vailaya’s comparison of Watson with IBM’s formerly most-famous human destroyer, Deep Blue, illustrates this point. At their cores, he said, both systems just do math: They use computer processors to analyze every possible outcome or answer, whereas humans use instinct to eliminate some possibilities and narrow their focus. Deep Blue analyzed a chess board, its possible moves and the outcomes of each move, and it went many, many levels deep. Watson essentially does the same thing, he explained, only in a less-constrained environment where nothing is as finite as chess moves. In processing natural language, Watson needs to decipher key words and their context, then analyze that information against its enormous data set.
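The exhaustive search Vailaya describes is classic game-tree analysis. As a purely illustrative sketch (a toy game, nothing like IBM's actual chess engine), here is the brute-force idea in a few lines of Python: score every possible move, then every possible reply, all the way down.

```python
def best_score(stones):
    """Exhaustive game-tree search over a toy game: players alternate
    removing 1 or 2 stones, and whoever takes the last stone wins.
    Returns +1 if the player to move can force a win, -1 if not, by
    recursively evaluating every possible continuation (the negamax
    formulation of minimax)."""
    if stones == 0:
        return -1  # the previous player took the last stone, so we lost
    # Our best score assumes the opponent then plays perfectly, so each
    # move is worth the negation of the opponent's score afterward.
    return max(-best_score(stones - take) for take in (1, 2) if take <= stones)

def best_move(stones):
    """Pick the move that leaves the opponent in the worst position."""
    return max((take for take in (1, 2) if take <= stones),
               key=lambda take: -best_score(stones - take))
```

In this game, positions that are multiples of 3 are lost for the player to move, so `best_move(7)` returns 1, leaving the opponent with 6 stones. Deep Blue did the same thing against a vastly larger tree, cutting off the recursion at a fixed depth and scoring the resulting boards heuristically.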

What makes natural-language processing so difficult, though, is making sense of poorly written language, where it can be hard even for humans to decipher subjects, verbs and other parts of speech. Watson’s ability to do this rests wholly on its human “trainers’” understanding of language and on the algorithms they are able to program into Watson based on that understanding. The same will hold true when, as IBM has announced, it begins working to implement Watson in the health care field. Vailaya explained that not only will Watson need to be loaded with a vast amount of medical data, but it will need very knowledgeable medical experts and computer scientists to develop algorithms that let Watson analyze symptoms against that data. And if Watson were crawling the web to bolster its data store, humans would have to develop rules to help it decide which sources provide correct, usable data, something Vailaya already does at Retrevo.
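To make the “key words and their context” step concrete, here is a deliberately crude sketch, nothing like Watson's actual pipeline, of matching a question against a small data set by keyword overlap. The stopword list, corpus format and scoring are all invented for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; real systems use far richer language models.
STOPWORDS = {"the", "a", "an", "of", "is", "what", "which", "in", "to", "at", "and"}

def keywords(text):
    """Lowercase, tokenize, and drop stopwords: crude 'key word' extraction."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def best_answer(question, corpus):
    """Score each candidate passage by keyword overlap with the question
    and return the answer attached to the best-scoring passage."""
    q = Counter(keywords(question))
    def score(passage):
        return sum(min(q[w], c) for w, c in Counter(keywords(passage)).items())
    return max(corpus, key=lambda item: score(item["passage"]))["answer"]

corpus = [
    {"passage": "Deep Blue defeated Garry Kasparov at chess in 1997",
     "answer": "Deep Blue"},
    {"passage": "Watson won Jeopardy against Ken Jennings and Brad Rutter",
     "answer": "Watson"},
]
```

Here `best_answer("Which IBM system won at Jeopardy?", corpus)` returns `"Watson"`, but only because the question happens to share the words “won” and “jeopardy” with the right passage. This is exactly the “garbage in, garbage out” fragility Vailaya describes: the result is only as good as the corpus and the hand-built rules.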

This is in contrast to artificial intelligence, where machines are designed to mimic human or animal brains and determine their own rules about what to do in any given situation. Vailaya describes one process of creating artificial intelligence as giving machines “emotions” and implementing some sort of incentive system. Such a machine would then remain in its environment and create its own rules for making decisions based on past experiences. In essence, Vailaya explained, the machine would be self-motivated to act in its own best interests. Although there are some small-scale artificial-intelligence implementations in areas like gaming and robotics, he noted it’s a very complex field, and even then, scientists are limited to building only what they can understand. If artificial intelligence were as far along — or as easy (relatively speaking) — as machine learning, then humans might have something to worry about.
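The incentive system Vailaya describes is, in spirit, reinforcement learning. A minimal tabular Q-learning sketch, invented here purely to illustrate the idea: an agent in a tiny corridor gets a reward only at the goal, and from that signal alone it writes its own “rules” (a table of action values) without any human-programmed strategy.

```python
import random

def train(episodes=500, n=5, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a corridor of states 0..n-1. Actions are
    0 (left) and 1 (right); the only reward is +1 for reaching state n-1.
    The agent builds its own decision rules from experience and incentive."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n)]  # q[state][action]: learned action values
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            if s == n - 1:
                break
            if rng.random() < eps or q[s][0] == q[s][1]:
                a = rng.randrange(2)                # explore (or break a tie)
            else:
                a = 0 if q[s][0] > q[s][1] else 1   # exploit what it has learned
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n - 1 else 0.0         # the incentive: reward at the goal
            # Standard Q-learning update: nudge the value toward
            # reward plus discounted value of the best next action.
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

After training, the greedy policy at every non-terminal state is “right,” a rule no one programmed in; the agent inferred it from the reward alone. Scaling that self-directed learning from a five-state corridor to anything brain-like is the gap Vailaya is pointing at.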

Don’t get Vailaya wrong, though. He acknowledges it’s a big deal that IBM and its research partners were able to develop a computer that can answer natural-language questions better than humans can. Because of computers’ inherent ability to explore problem spaces more deeply than humans can, systems like Watson have the potential to be very useful “in any area where [a professional] need[s] an expert consultant” to answer difficult questions in a hurry, or, ultimately, at the consumer level. From Vailaya’s perspective, though, we’re a long way off from having to “welcome our new computer overlords.”

For deeper discussions of data-analysis algorithms, attend our upcoming Structure Big Data conference March 23 in New York City, where data scientists will discuss the algorithms in place for analyzing everything from Twitter streams to market data.