Elementary, my dear Watson

If you missed Jeopardy! the past couple of days, do not miss it tonight! Watson, the IBM supercomputer (90 IBM Power 750 servers, 16 Terabytes of memory, 2,880 processor cores) is playing against two of the best Jeopardy! players, for an exciting match-up between humans and machines.

There are undertones of 2001: A Space Odyssey here, but if you look below the surface, you realize that we are still not very close to HAL 9000. Watson is capable of understanding natural language, and is particularly adept at deciphering the difficult language constructs used in Jeopardy! (double meanings, expressions, puns, etc.). It (or should I say “he”?) cannot see or hear; the questions are provided to Watson in text format, so the seemingly natural interaction between Watson and the host is unfortunately not real. Nevertheless, so far Watson has been able to outplay both human opponents.

My intuition tells me that this is not because Watson is intelligent, but only because it is fast. First, humans face the electromechanical limitation of a buzzer (the thumb has to physically move to press the button). Second, the majority of questions in Jeopardy! are fairly simple to answer using traditional search mechanisms (à la Google). A combination of important (high-IDF) keywords often leads directly to an answer. On the other hand, Watson seems incapable of answering questions correctly when the answer is not simply a keyword, but a phrase construct. Watson’s creators claim that they have used very sophisticated natural language processing algorithms. Personally, I doubt that anything very sophisticated is necessary for finding documents related to Jeopardy! questions. Of course, in order to understand the question and provide the right answer, a basic understanding of parts of language is needed, but again, no new technology is necessary here. Nevertheless, the achievement is still significant. Being able to identify the actual answer fast is not trivial.
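To make the high-IDF argument concrete, here is a toy sketch of the kind of keyword-weighted retrieval I have in mind. This is not Watson’s actual pipeline; the corpus and the scoring are hypothetical, and a real system would use a proper inverted index, but the principle is the same: rare query terms carry most of the weight, so the document sharing them wins.

```python
import math

# A toy stand-in for a stored document collection (hypothetical examples
# loosely based on the first day's questions).
docs = [
    "george eyser won a gold medal on the parallel bars in 1904",
    "a terminus from the latin for end is where trains originate",
    "a summit or apex is the highest point one can reach",
]

def idf(term, docs):
    """Inverse document frequency: terms appearing in fewer documents
    get a higher weight; ubiquitous terms get a weight near zero."""
    df = sum(1 for d in docs if term in d.split())
    return math.log(len(docs) / (1 + df))

def score(query, doc, docs):
    """Score a document by summing the IDF weights of the query terms
    it actually contains."""
    doc_terms = set(doc.split())
    return sum(idf(t, docs) for t in query.split() if t in doc_terms)

query = "gymnast gold medal parallel bars 1904"
best = max(docs, key=lambda d: score(query, d, docs))
```

With this query, the rare terms “parallel”, “bars”, and “1904” all point at the first document, so plain keyword weighting suffices; no deep language understanding is needed to retrieve the relevant passage.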

In the first match-up, Watson did fairly well, tying for 1st place with Brad Rutter. In the second match-up Watson annihilated the competition by answering almost all of the questions on the board. Watson did make mistakes in both games, all of them silly and easy to fix. For example, in the first game Watson repeated the incorrect answer of another player. If Watson could hear the other contestants’ answers, or receive them as text, this could be avoided. In the second game Watson messed up the Final Jeopardy! question: it gave “Toronto” as an answer to a question about U.S. cities. This seems like a bug. On the other hand, Watson knew that the answer was probably wrong, so it bet a very small amount of money.

You will notice that the majority of questions are not very difficult to answer, provided that one has the information stored somewhere. A simple Google or Bing search would return a highly relevant document for most questions. Watson is not connected to the Internet, of course, but it stores a huge database of information about virtually everything. Here are some questions that Watson got wrong on the first day:

Olympic Oddities: It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904. Watson’s answer: What is leg. Correct answer: he is missing a leg.

Final Frontiers: From the Latin for “end”, this is where trains can also originate. Watson’s answer: What is finis. Correct answer: terminus

Final Frontiers: It’s a 4-letter term for a summit; the first 3 letters mean a type of simian. Watson did not answer. Correct answer: apex

Alternate Meanings: Stylish elegance, or students who all graduated in the same year. Watson’s answer: What is chic. Correct answer: class

To push one of these paper products is to stretch established limits. Watson did not answer. Correct answer: envelope

The last question is instructive. It requires the creative type of reasoning that only humans are currently capable of. Watson could have identified the answer if and only if it “knew” that a common expression for stretching established limits is “to push the envelope”. A simple Google search for “to push the envelope” reveals a plethora of definitions, none of which contain the words “established” or “limits”. I would expect Watson to be able to find the answer here by doing some reasoning over synonyms, but apparently it doesn’t (or it was not fast enough).
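The synonym reasoning I have in mind could be as simple as query expansion. A minimal sketch, assuming a hand-made synonym table (a real system might consult a thesaurus resource such as WordNet):

```python
# Hypothetical synonym table; the entries here are illustrative only.
synonyms = {
    "stretch": {"push", "extend"},
    "limits": {"envelope", "boundaries", "bounds"},
    "established": {"accepted", "set"},
}

def expand(query_terms, synonyms):
    """Add each term's synonyms to the query before searching,
    so idiomatic phrasings become reachable by keyword match."""
    expanded = set(query_terms)
    for t in query_terms:
        expanded |= synonyms.get(t, set())
    return expanded

terms = expand(["stretch", "established", "limits"], synonyms)
```

After expansion, “push” and “envelope” both appear among the search terms, so the idiom “push the envelope” could be found by an ordinary keyword lookup, without any creative reasoning at all.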

In the category “Name the decade” Watson failed to answer any of the questions. This is odd, given that it should be easy to identify the decade that a particular event mentioned in a question took place in. Humans were faster here.

Overall, Watson had 15 correct and 4 incorrect answers on the first day. On the second day things were different. The questions, in my view, were much easier to answer: each contained a lot of important keywords, and the language constructs were fairly standard and easy to parse. Watson answered 23 questions correctly and made only one mistake:

The Art of the Steal: In May 2010 5 paintings worth $125 million by Braque, Matisse & 3 others left Paris’ Museum of this art period. Watson’s answer: Picasso. Correct answer: modern art

Watson here failed to correctly analyze the question and understand what it was really asking for. It does not seem like a difficult question to me. Watson should have been able to give a more meaningful (if not correct) answer.

Overall, I believe that the effort so far has been admirable, but I wouldn’t go so far as to say that Watson possesses any sort of intelligence. Standard technologies, cleverly combined, should be able to perform as well as Watson. Notice also that Watson does not learn from its mistakes, nor does its understanding improve the more it plays the game. It would be interesting to see a match-up between Watson and an open source system!

Here is a link to a paper by Watson’s principal investigators, describing their approach.