Bug lets humans grab Daily Double as Watson triumphs on Jeopardy

Though Jennings got the final Jeopardy question right, he knew he'd been defeated when Watson scored the last Daily Double. His answer shows his concession.

Note: In this article, Jeopardy's "answers" are referred to as "questions" and vice versa.

The humans tried to hold on in the second game of Jeopardy against the IBM computer, but ultimately were no match. Watson finished with a two-game total of $77,147 to Ken Jennings' $24,000 and Brad Rutter's $21,400. Jennings and Rutter made a larger dent in Watson's progress in the second game, but the computer took both Daily Doubles away from the human contestants, leaving them too little opportunity to make up Watson's $25,000 lead from the first game. Still, a few aspects of the game gave the humans an opening, including a bug that let Ken Jennings score the first Daily Double.

During a panel at Rensselaer Polytechnic Institute's Experimental Media and Performing Arts Center, Dr. Chris Welty, a member of Watson's algorithms team, noted that the start-and-stop nature of filming the episode got Watson mixed up and allowed a bug to surface. Watson begins every round looking for Daily Double clues, because they are crucial to progress in the game. After one filming pause in the first round, when Watson had been made to stop and then pick up again, Welty said Watson resumed play believing the Daily Double had already been found. So it stopped looking for the clue, allowing Jennings to find it first.

"They were having a lot of problems in that particular round and they kept stopping," Welty said. "There was still a Daily Double left in that round, and the front end that keeps track of the game state had thought the Daily Double was already revealed." Because Watson thought the Daily Double was gone, it switched to its secondary strategy of selecting the lowest-value clues, which lets it learn about a category. This left Jennings free to sort through the remaining higher-value clues where the Daily Double was, allowing him to pick it up while Watson was cherry-picking the top rows.
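The effect of that stale game state can be illustrated with a minimal Python sketch. This is not Watson's code: the `Clue` type, the `choose_clue` function, and the rule of hunting among the highest-value clues first are all assumptions made for illustration. The only behavior taken from Welty's account is that Watson hunts for Daily Doubles while it believes one remains, and otherwise falls back to the lowest-value clues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    category: str
    value: int  # dollar value; top rows of the board hold the lowest values


def choose_clue(board, daily_doubles_left):
    """Pick the next clue based on what the game-state tracker believes.

    Hypothetical heuristic: while a Daily Double is believed to remain,
    hunt among the highest-value clues; otherwise take the cheapest clue
    to learn about a category.
    """
    if not board:
        raise ValueError("no clues left on the board")
    if daily_doubles_left > 0:
        return max(board, key=lambda clue: clue.value)
    return min(board, key=lambda clue: clue.value)


# Invented board: categories and values are placeholders.
board = [
    Clue("Category A", 200),
    Clue("Category A", 1000),
    Clue("Category B", 600),
]

# Correct state: one Daily Double is still hidden, so the hunt continues.
correct_pick = choose_clue(board, daily_doubles_left=1)

# Stale state after a filming pause: the tracker wrongly reports zero,
# so the selector drops to the top row and leaves the big clues to others.
stale_pick = choose_clue(board, daily_doubles_left=0)
```

In this sketch the selection logic itself is sound; the wrong pick comes entirely from the incorrect count passed in, which matches Welty's description of the problem living in the front end that tracks game state.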

Another of Watson's biggest weaknesses was laid bare by a category from the first round, "Actors Who Direct." The questions in the topic were shorter than standard clues, usually only the names of two movies pointing to one man, and didn't give enough time for Watson to process and hit the buzzer first. "The answers were not ready in time because the questions were so quick," said Chris Welty. "One of the things that Watson actually doesn't know is that it's losing the buzzer because its answers aren't ready."

Not only was this bad from a score standpoint, but it formed a vicious circle for Watson's clue selection. Welty pointed out that Watson selects clues from categories where it is getting responses correct, which it was in the case of Actors Who Direct, but Watson doesn't get any information on whether its right answers are actually allowing it to buzz in first and get the points. "It's going to keep going back because it's getting all the right answers," Welty said.
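A rough Python sketch shows how such a feedback gap might play out. The `pick_category` function, its `count_buzzer` flag, and the sample history are hypothetical and are not drawn from IBM's system; they only contrast scoring a category by correct answers alone with scoring it by correct answers that also won the buzzer.

```python
def pick_category(history, count_buzzer=False):
    """Return the category with the most credited answers.

    history: list of (category, was_correct, won_buzzer) tuples.
    count_buzzer: if False, any correct answer earns credit (the blind
    spot Welty describes); if True, only correct answers that also won
    the buzzer earn credit. Both modes are illustrative assumptions.
    """
    scores = {}
    for category, was_correct, won_buzzer in history:
        credit = was_correct and (won_buzzer or not count_buzzer)
        scores[category] = scores.get(category, 0) + int(credit)
    return max(scores, key=scores.get)


# Invented history: right answers in one category never arrive in time.
history = [
    ("Actors Who Direct", True, False),
    ("Actors Who Direct", True, False),
    ("Actors Who Direct", True, False),
    ("Other Category", True, True),
    ("Other Category", False, False),
]

blind_choice = pick_category(history)                     # ignores the buzzer
aware_choice = pick_category(history, count_buzzer=True)  # counts only buzzer wins
```

With this made-up history, the buzzer-blind score keeps returning to "Actors Who Direct," while the buzzer-aware score moves on to the other category.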

Aside from issues of timing, Watson's algorithms worked well in the sense that it was very rarely certain of a wrong answer. On answers it was certain of, it nearly always beat Jennings and Rutter to the buzzer; if the answer didn't turn up a high-confidence response, as was often the case with subtly worded questions, Watson would remain silent.
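The buzz-or-stay-silent behavior amounts to a confidence threshold, which can be sketched in a few lines of Python. The `should_buzz` function, the 0.5 threshold, and the candidate answers below are assumptions for illustration only; the panel did not describe Watson's actual cutoff or how it is computed.

```python
def should_buzz(candidates, threshold=0.5):
    """Return the top-ranked answer if it clears the threshold, else None.

    candidates: list of (answer, confidence) pairs, confidence in [0, 1].
    threshold: hypothetical cutoff; Watson's real value is not given here.
    """
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None


# Invented candidate lists for two clues.
confident = should_buzz([("answer A", 0.91), ("answer B", 0.05)])
unsure = should_buzz([("answer C", 0.32), ("answer D", 0.28)])
```

Here `confident` holds "answer A" and `unsure` is `None`, meaning the sketch would buzz on the first clue and stay silent on the second.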

That's not to say there weren't outliers—Watson was occasionally unsure of answers that were correct. For example, in a Daily Double question on art from the first game, Watson came up with the correct answer, Baghdad, but with only 32 percent confidence. And as happened with the infamous Final Jeopardy question from the first game, Watson seems to struggle with the relationship that categories can have to a correct response. In the topic "On the Keyboard" during the second game, the clue "A loose-fitting dress hanging straight from the shoulders to below the waist," prompted Watson to ask "What is a chemise?" The correct response was the dress shape and keyboard key "shift."

But in regular Jeopardy rounds, Watson was able to learn during the game based on previous answers in the category what type of answer was required. For example, in the first Jeopardy game, Watson eventually figured out—albeit a bit late—that the "Name that Decade" category did, in fact, want a decade as the answer. Even Watson's handlers were impressed: "It actually kind of figured out on its own that decades were important," Dr. Adam Lally, a senior software engineer from IBM, said.

Towards the end of the panel, Welty and Lally were prompted to discuss the choice of gender for Watson's voice, which is currently of the smooth, genial male variety. "We did experiment a lot with female voice as well," Welty said. "But the speech software we had, the way you could change the settings of the voice, and I mean this in the best possible way, it just was not possible to get a female voice that wasn't a little bit grating." This drew sounds of ire from the crowd, but Welty added that having the voice operate in lower ranges made it easier to soften, and that both men and women on the development team preferred the male voice.

Watson's machine learning may come in handy in the future its creators envision for it, which includes medical diagnoses and tech support. Of course, phone or voice input is out of the question for now, as parsing sounds isn't something Watson can currently do. But with text input, Watson could do great things from an information standpoint, especially given that it is able to find high-level connections between tiny details.

As a result of Watson's two-game win, 100 percent of its prize money, $1 million, will be donated to charity. Jennings and Rutter walk away with $300,000 and $200,000, respectively, and each is donating half of his prize to a charity of his choice.