Jeopardy: IBM’s Watson almost sneaks wrong answer by Trebek

Watson, a powerful computer designed by IBM to take on the task of processing …

Watson, the IBM computer designed to take on humans in the quiz game Jeopardy, made its television debut last night. Positioned between two past Jeopardy champs, Ken Jennings and Brad Rutter, Watson's swirling globe avatar was able to hold its own, finishing the first round tied with Rutter at $5,000.

Dr. Chris Welty, a member of Watson's algorithms team, was on hand to provide commentary during Rensselaer Polytechnic Institute's showing of the Jeopardy episode at the school's Experimental Media and Performing Arts Center. Ars was there to hear his take.

No context

First, a quick rundown of how Watson plays Jeopardy. The computer is fed the answer in text form at the same time the answer panel appears to the two human players. Watson then queries its database for an appropriate question response, a process that doesn't involve using the Internet at all. Welty noted that game shows are federally regulated and there were two auditors present while the episode was filmed to make sure the computer wasn't querying the Internet for answers.

Watson then must push a physical buzzer to answer questions, just like its human competitors. While this would seem to be a task at which computers would have an overwhelming advantage, Welty noted that Rutter was so well-known for his lightning-fast buzzing that the producers weren't even mildly concerned.

When the match began, the computer got off to a strong start: it took control of the board away from Rutter on the second turn, immediately nailed a Daily Double square, bet $1,000, and got the question right. But later, on a Name That Decade question, Jennings answered incorrectly with "what is the 1920s?" Watson, which can't see or hear and so can't pick up on the follies of its competitors, followed Jennings' answer with its own: "What is the 1920s?"

During a commercial after Watson's decade gaffe, Welty noted that the team thought the ability to process other players' wrong answers would be unnecessary. "We just didn't think it would ever happen," Welty said, laughing.

Watson also tripped up on an "Olympic Oddities" question, but so imperceptibly that Alex Trebek didn't notice at first, raising an important point of clarification. After Jennings answered incorrectly that Olympian gymnast George Eyser was "missing a hand," Watson answered, "What is a leg?"

Welty said Trebek initially accepted Watson's answer, but the taping had to be stopped and the sequence reshot because Trebek had forgotten that Watson wasn't aware of the context created by Jennings' answer.

If a person had answered the Oddities question the way Watson did, they could have been presumed to be following the context of Jennings' answer, with the "missing"-ness of the leg implied. But since Watson couldn't have heard Jennings, its answer of "What is a leg?" rather than "What is missing a leg?" was actually deemed incorrect. In the aired version of the episode, Trebek declares Watson's answer wrong.

Last night's airing was the first of three and it covered only the first round of the game. Watson, Jennings, Rutter, and Trebek will continue tonight beginning with the double and final Jeopardy rounds of the first game, with a second full game to be played on the third night.

I dunno, I think Watson should have gotten it. Sure, the programmers at IBM should count it as incorrect and work to improve Watson, but Jeopardy should have awarded the money to Watson. It's a clear case of discrimination by the meatbags!

"Watson also tripped up on an "Olympic Oddities" question, but so imperceptibly that Alex Trebek didn't notice at first, raising an important point of clarification. After Jennings answered incorrectly that Olympian gymnast George Eyser was "missing a hand," Watson answered, "What is a leg?"

Welty said Trebek initially accepted Watson's answer, but the taping had to be stopped and the sequence reshot because Trebek had forgotten that Watson wasn't aware of the context created by Jennings' answer.

If a person had answered the Oddities question the way Watson did, they could have been presumed to be following the context of Jennings' answer, with the "missing"-ness of the leg implied. But since Watson couldn't have heard Jennings, its answer of "What is a leg?" rather than "What is missing a leg?" was actually deemed incorrect. In the aired version of the episode, Trebek declares Watson's answer wrong."

I am seriously not afraid to say I do not understand (I do not watch Jeopardy often tbh... Only when browsing channels)

I dunno, I think Watson should have gotten it. Sure, the programmers at IBM should count it as incorrect and work to improve Watson, but Jeopardy should have awarded the money to Watson. It's a clear case of discrimination by the meatbags!

No, I think it should be wrong because it demonstrated a critical miscalculation by the engineers. If they had programmed it to incorporate other competitors' answers, both this and the other gaffe could have been avoided.

Another oddity in the show was a question about the Beatles referencing "his silver hammer was blah, blah, blah" and Watson said "what is Maxwell's silver hammer?" Seems like the correct answer would've been just Maxwell, but I guess they chose not to be quite that nitpicky.

This was awesome to watch. As Watson was killing the other two contestants, I actually found myself wondering if Watson was happy with how well it was doing. Then I caught myself and realized just how bizarre the whole thing was. Just awesome.

How well Watson does really depends on the questions. Anything which requires nuance would probably be much more difficult... it depends on how much metadata exists about a given subject.

If you had a question about explorers like "in the 13th century, this man was considered to be the biggest liar in all of Europe" (that's an old Jeopardy question) would Watson have determined the answer to be Marco Polo?

If the producers of the show really wanted to make Watson look bad, they could probably pick a lot of questions a computer couldn't answer very well (answers involving opinions, puns, particular expressions and turns of phrase, inferred intentions... avoid the who, what, where, when types of questions). They probably should... since they wouldn't want to marginalize their human players... I should think that they'd want their best player to seem to be "smarter" than a powerful supercomputer.

You vastly underestimate the complexity of Watson. Its whole point is that it gets all those tough questions involving puns, expressions and turns of phrase right. It's an advancement (or state-of-the-art implementation) of "natural language processing". If it were just a fact machine this would be pretty ho-hum.

If the producers of the show really wanted to make Watson look bad, they could probably pick a lot of questions a computer couldn't answer very well

A computer cannot answer any question on its own; it has to be programmed to do so. I think it was the goal of IBM to figure out how to understand opinions, puns, idioms, etc. About the only thing they could not program it to understand would be inflection, given the textual context.

I am seriously not afraid to say I do not understand (I do not watch Jeopardy often tbh... Only when browsing channels)

If the clue was something along the lines of "he suffered from this impairment", the response of 'what is a leg' doesn't make sense, therefore it wasn't judged in Watson's favor.

Since Watson is a computer that is unaware of its competitors' answers, its answers need to stand on their own to be correct. With input, context can be assumed. In addition, Alex will occasionally prompt contestants for clarification before their time runs out, which he can't do with Watson.

If the clue was something along the lines of "he suffered from this impairment", the response of 'what is a leg' doesn't make sense, therefore it wasn't judged in Watson's favor.

Have you watched football lately? You do not suffer a leg injury, you "have a leg". If Madden were the judge instead of Trebek, it would have been valid. I do think it should have been valid anyway; if a human player had said that it would have counted. They should be judging the answers, not the IBM engineer's work.

My favorite is when a guy "has a groin." If he did not have a groin, I do not think he would be playing!

If the producers of the show really wanted to make Watson look bad, they could probably pick a lot of questions a computer couldn't answer very well (answers involving opinions, puns, particular expressions and turns of phrase, inferred intentions... avoid the who, what, where, when types of questions). They probably should... since they wouldn't want to marginalize their human players... I should think that they'd want their best player to seem to be "smarter" than a powerful supercomputer.

It did certainly seem like Watson had a harder time on the questions that required more understanding of syntax and/or idioms instead of just keywords. By the end of the show one of the players had caught up to Watson because it got off to a fast start but then later struggled a bit as it seemed they got into some questions it couldn't parse as easily.

Actually the reason why Watson was created was to see if software could understand context and not give a strict librarian answer or search and repeat an answer that some meatbag posted somewhere on the internet. The impact of understanding context is huge... it should give computers the ability to understand the "intent" of your request and not just what you "said". And yes, IBM will sell this technology to boost their information insight offerings (how to catalog unstructured data or badly worded structured data and gain some insight into it).

The thing with the answer about the leg answers one thing for me - why Brad didn't ring in after Watson's answer and rephrase it to say "What is 'he was missing a leg'?" That's certainly what I would have done - but since it was initially ruled correct Brad wouldn't have had the opportunity during normal play, and once they went back to refilm Alex's mistake he couldn't very well do it then. Honestly, I don't think Alex should have ruled it right anyway, even if it was a human. Usually the contestants will make it clearer when they're correcting a previous answer that was oh-so-close. "What is 'he only had one hand'?" "No. Player B?" "What is, 'he only had one LEG.'" Just saying "Leg" in that instance of what the infirmity/oddity was could have implied the leg was attached backwards, or he had an extra one, or something else.

If the clue was something along the lines of "he suffered from this impairment", the response of 'what is a leg' doesn't make sense, therefore it wasn't judged in Watson's favor.

Just to clarify a little more...

The human contestants are presumed to be aware of their opponents' mistakes. If 'Human A' replied with "What is a missing hand?" first, it would have been deemed incorrect, and 'Human B' followed that with "...a leg?" the show's host and judges would have deemed it correct, because it follows directly in the context of "What is a missing..." from 'Human A'.

Since Watson has no context of its human opponents' incorrect answers, it would be unfair to judge its answers in the context of those things it has no awareness of.

During a commercial after Watson's decade gaffe, Welty noted that the team thought the ability to process other players' wrong answers would be unnecessary. "We just didn't think it would ever happen," Welty said, laughing.

For normal games of Jeopardy, the other guy's wrong answers can be an important hint. Maybe they should hire a stenographer for the next version.

Check out the special Nova did; they go into much more detail as to how Watson "learns". It is impossible to program every nuance into Watson, so the bulk of the system uses machine learning to come up with answers to the questions. They had a breakthrough, touched on in this article, when they fed the responses to the other contestants after the question, which gave Watson context for the category. Still an impressive showing.

My only issue was Watson got the question the second it was flashed on the screen, which means it could read and query before the humans would be able to read and interpret the question. I still think it's got an advantage in pressing the buzzer too.

If the clue was something along the lines of "he suffered from this impairment", the response of 'what is a leg' doesn't make sense, therefore it wasn't judged in Watson's favor.

Have you watched football lately? You do not suffer a leg injury, you "have a leg". If Madden were the judge instead of Trebek, it would have been valid. I do think it should have been valid anyway; if a human player had said that it would have counted. They should be judging the answers, not the IBM engineer's work.

My favorite is when a guy "has a groin." If he did not have a groin, I do not think he would be playing!

Was the answer that he had a leg injury, or was missing a leg? Watson didn't specify.

In addition, consider that Watson would have answered the same even if it had hit the buzzer first. Had a human player said that, Alex would have been able to say "be more specific", which happens occasionally.

Wouldn't connecting to the internet be pretty useless for Watson, anyway? The game is so fast paced that there's just no time to query any online data, especially with players like Rutter who are fast on the buzzer as opponents.

The human contestants are presumed to be aware of their opponents' mistakes. If 'Human A' replied with "What is a missing hand?" first, it would have been deemed incorrect, and 'Human B' followed that with "...a leg?" the show's host and judges would have deemed it correct, because it follows directly in the context of "What is a missing..." from 'Human A'.

Since Watson has no context of its human opponents' incorrect answers, it would be unfair to judge its answers in the context of those things it has no awareness of.

It would be fascinating to see a future contest where there is a real-time feedback loop where Watson (or its successor) is informed of the opponents' wrong responses and the correct responses if no one gets it right. This is a very distinct advantage the humans have over the machine in the current setup.

The audio processing on the wrong responses would be a big bottleneck, but the correct answers could be fed in by pushing a button when Trebek or the human opponent gives them.