Can a Computer Win on 'Jeopardy'?

Defeating a chess champion is a piece of cake compared to parsing puns and analyzing language.

Developed over four years at an estimated cost of
more than $30 million, IBM's "Jeopardy"-playing computer, Watson, will
face the quiz show's grand masters, Ken Jennings and Brad Rutter, in two
games to be aired Feb. 14, 15 and 16. As Stephen Baker relates in the following excerpt from
his new book, "Final Jeopardy: Man vs. Machine and the Quest to Know
Everything," doubts remain about how well Watson can process the endless
subtleties of human language.

Watson paused. The closest thing it had to a face, a glowing orb on a
flat-panel screen, turned from forest green to a dark shade of blue.
Filaments of yellow and red streamed steadily across it, like the paths
of jets circumnavigating the globe. This pattern represented a state of
quiet anticipation as the supercomputer awaited the next clue.

It was a
September morning in 2010 at IBM Research, in the hills north of New
York City, and the computer, known as Watson, was annihilating two
humans, both champion-caliber players, in practice rounds of the
knowledge game of "Jeopardy." Within months, it would be playing the
game on national TV in a million-dollar man vs. machine match-up against
two of the show's all-time greats.

As Todd Crain, an actor and the host of these test games, started to
read the next clue, the filaments on Watson's display began to jag and
tremble. Watson was thinking—or coming as close to it as a computer
could. The $1,600 clue, in a category called "The eyes have it," read:
"This facial wear made Israel's Moshe Dayan instantly recognizable
world-wide."

The three players—two human and one
electronic—could read the words as soon as they appeared on the big
"Jeopardy" board. But they had to wait for Mr. Crain to read the entire
clue before buzzing. That was the rule. At the moment the host
pronounced the last word, a light would signal that contestants could
buzz. The first to hit the button could win $1,600 with the right
answer—or lose the same amount with a wrong one. (In these test matches,
they were playing with funny money.)

This pause for reading gave Watson three or four seconds to hunt down
the answer. The first step was to figure out what the clue meant. One
of its programs promptly picked apart the grammar of the sentence,
identifying the verbs, objects and key words. In another section of its
cluster of computers, research focused on Moshe Dayan. Was this a
person? A place in Israel? Perhaps a holy site?

During these seconds, Watson's cognitive apparatus—2,208 computer
processors working in concert—mounted a massive research operation
around Moshe Dayan and his signature facial wear. They plowed through
thousands of documents stored in the machine. After a second or so,
different programs in the computer, or algorithms, began to suggest
hundreds of possible answers. To humans, many of them would look like
wild guesses. Some were phrases that Mr. Dayan uttered, others were
references to his military campaigns and facts about Israel. Still
others proposed various articles of his clothing. At this point, the
computer launched its second stage of analysis, figuring out which
response, if any, merited its confidence. It proceeded to check and
recheck facts, making sure that Moshe Dayan was indeed a person, an
Israeli, and that the answer referred to something he wore on his face.
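
The two-stage process the passage describes, generating many candidate answers and then scoring each against evidence checks, can be sketched in a few lines. Everything below (the generators, the toy facts, the threshold) is invented for illustration; it is not IBM's code.

```python
# Sketch of Watson's two stages: (1) propose many candidates,
# (2) score each against evidence checks and keep only confident ones.

def generate_candidates(clue):
    # Hypothetical generators; each returns guesses of varying quality.
    return ["eye patch", "Six-Day War", "Israel", "field cap"]

def evidence_score(candidate, checks):
    # Fraction of evidence checks the candidate passes.
    return sum(check(candidate) for check in checks) / len(checks)

def best_answer(clue, checks, threshold=0.5):
    scored = [(evidence_score(c, checks), c) for c in generate_candidates(clue)]
    score, answer = max(scored)
    return answer if score >= threshold else None

# Toy evidence checks standing in for "is facial wear" / "is clothing".
facts = {"eye patch": {"facial", "clothing"}, "field cap": {"clothing"}}
checks = [
    lambda c: "facial" in facts.get(c, set()),
    lambda c: "clothing" in facts.get(c, set()),
]
print(best_answer("Moshe Dayan's facial wear", checks))  # eye patch
```

The threshold matters: a candidate that passes only some checks, like "field cap" here, survives the first stage but never earns enough confidence to bet on.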

A human looking at Watson's frantic and repetitive labors might
conclude that the player was unsure of itself, laughably short on common
sense, and scandalously wasteful of computing resources. This was all
true. Watson barked up every tree, and from every conceivable angle. The
pattern on its screen during this process, circles exploding into
little stars, provided only a hint of the industrial-scale computing at
work.

In a room behind the podium, visible through a horizontal window,
Watson's complex of computers churned, and the fans cooling them roared.
This time, its three seconds of exertion paid off. Watson had come up
with a response. The computer sent a signal to a mechanical device on
the podium. It was the size of a large aspirin bottle with a clear
plastic covering. Inside was a buzzer. About one one-hundredth of a
second later, a metal finger inside this contraption shot downward,
pressing the button.

Justin Bernbach, a 38-year-old from Brooklyn, stood to Watson's left.
The airline lobbyist had pocketed $155,000 while winning seven straight
"Jeopardy" matches in 2009. Unlike Watson, Mr. Bernbach understood the
sentence. He knew precisely who Moshe Dayan was as soon as he saw the
clue, and he carried an image of the Israeli leader in his mind. He had
the answer. He gripped the buzzer in his fist and frantically pressed it
four or five times as the light came on.

But Watson had arrived first.

"Watson?" said Mr. Crain.

The computer's amiable male voice arranged the answer, as "Jeopardy"
demands, in the form of a question. "What is eye patch?" it said.

Mr. Bernbach slumped at his podium. This match with the machine wasn't going well.

***

It was going magnificently
for David Ferrucci. As the chief scientist of the team developing the
"Jeopardy"-playing computer, Mr. Ferrucci was feeling vindicated. Only
three years earlier, the suggestion that a computer might match wits and
word skills with human champions in "Jeopardy" sparked opposition
bordering on ridicule in the halls of IBM Research. And the final goal
of the venture, a nationally televised match against two "Jeopardy"
legends, Ken Jennings and Brad Rutter, seemed risky to some, a bit
déclassé to others. "Jeopardy," a TV quiz show, appeared to lack the
timeless cachet of chess, which IBM computers had mastered a decade
earlier.

Nonetheless, Mr. Ferrucci and his team
went ahead and built their machine. Months earlier, it had fared well
in a set of test matches. But the games revealed flaws in the machine's
logic and game strategy. It was a good player, but to beat Messrs.
Jennings and Rutter, who would be jousting for a million-dollar top
prize, it would have to be great. So they had worked long hours over the
summer to revamp Watson. This September event was the coming-out party
for Watson 2.0. It was the first of 50 test matches against a higher
level of competitor: humans, like Justin Bernbach, who had won enough
matches to compete in the show's Tournament of Champions.

Watson, in these early matches, was
having its way with them. Mr. Ferrucci, monitoring the matches from a
crowded observation booth, was all smiles. Keen to promote its
"Jeopardy"-playing phenom, IBM's advertising agency, Ogilvy &
Mather, had hired a film crew to follow Mr. Ferrucci's team and capture
the drama of this opening round of championship matches. The observation
room was packed with cameras. Microphones on long booms recorded the
back and forth of engineers as they discussed algorithms and Watson's
response time, known as latency.

It was almost as if Watson, like a
human giddy with hubris, was primed for a fall. The computer certainly
had its weaknesses. Even when functioning smoothly, it would commit its
share of wacky mistakes. Right before the lunch break, one clue read,
"The inspiration for this title object in a novel and a 1957 movie
actually spanned the Mae Khlung." Now, it would be reasonable for a
computer to miss "The Bridge Over the River Kwai," especially since the
actual river has a different name. Perhaps Watson had trouble
understanding the sentence, which was convoluted, even for humans. But
how did the computer land upon its outlandish response, "What is Kafka?"
Mr. Ferrucci didn't know. Those things happened, but Watson still won
the two morning matches.

It was after lunch that things
deteriorated. Mr. Bernbach, so frustrated in the morning, started to
beat Watson to the buzzer. Meanwhile, the computer was making risky bets
and flubbing entire categories of clues. Defeat, which seemed so remote
in the morning, was now just one lost bet away. This came in the fourth
match. Watson was winning by $4,000 when it stumbled on this final
clue: "On Feb. 8, 2010, the headline in a major newspaper in this city
read: 'Amen! After 43 years, our prayers are answered.'" Watson missed
the reference to the previous day's Super Bowl, won by the New Orleans
Saints. It bet $23,000 on Chicago. Mr. Bernbach also botched the clue,
guessing New York. But he bet less than Watson, which made him the first
human to defeat the revamped machine. He pumped his fist.

In the sixth and last match of the
day, Watson trailed Mr. Bernbach, $16,200 to $21,000. The computer
landed on a Daily Double, which meant it could bet everything it had on
nailing the clue. It was under the category "Colleges and Universities."
A $5,000 bet would have brought Watson into a tie with Mr. Bernbach. A
larger bet, while risky, could have catapulted the computer toward
victory. "I'll take five," Watson said.

Five. Not $5,000, not $500. Five
measly dollars of funny money. The engineers in the observation booth
were stunned. But they kept quieter than usual, since cameras were
rolling.

Then Watson crashed. It occurred at
some point between placing that lowly bet and attempting to answer a
clue about the first Catholic college in Washington. Watson's "front
end," its voice and avatar, were waiting for its thousands of
processors, or "back end," to deliver an answer. It received nothing.
Anticipating these situations, the engineers had prepared Watson with
set phrases. "Sorry," Watson said, reciting one of them, "I'm stumped."
Its avatar displayed a dark blue circle with a single filament orbiting
mournfully in the antarctic latitudes.

What to do? Everyone had ideas. Maybe
they should finish the game with an older version of Watson. Or perhaps
they could hook up Watson to another up-to-date version of the program
at the company's Hawthorne labs, six miles down the road. But some
worried that a remote connection would slow Watson's response time,
causing it to lose more often on the buzz. In the end, as often happens
with computers, a reboot brought the hulking "Jeopardy" machine back to
life. But Mr. Ferrucci and his team got an all-too-vivid reminder that
their "Jeopardy" player, even as it prepared for a national TV debut,
could go haywire at any moment. When Watson was lifted to the podium,
facing banks of TV lights, it was anybody's guess how the computer would
perform.

***

Only four years earlier,
in 2006, Watson was a prohibitive long shot, not just to win at
"Jeopardy," but even to be built. For more than a year, the head of IBM
Research, a physicist named Paul Horn, had been pressing different teams
at the company to pursue a "Jeopardy"-playing machine. The way Mr. Horn
saw it, IBM had triumphed in 1997 with its chess challenge. The
company's machine, Deep Blue, had defeated the reigning world champion,
Garry Kasparov. This burnished IBM's reputation among the global
computing elite while demonstrating to the world that computers could
rival humans in certain domains associated with intelligence.

That triumph had left IBM's top
executives hungry for an encore. Mr. Horn felt the pressure. But what
could the researchers get a computer to do? Deep Blue had rifled through
millions of scenarios per second, calculated probabilities, and made
winning moves. But it had skipped the far more complex domain of words.
This, Mr. Horn thought, was where the next challenge would be. The next
computer should charge into the vast expanse of human language and
knowledge. For the test, Mr. Horn settled on "Jeopardy." The quiz show,
which debuted in 1964, attracted some nine million viewers every
weeknight. It was the closest thing in the United States to a knowledge
franchise. "People associated it with intelligence," Mr. Horn later
said.

There was one small problem. For
months, he couldn't get any takers. "Jeopardy," with its puns and
strangely phrased clues, seemed too hard for a computer. IBM already had
teams building machines to answer questions, and their performance, in
speed and precision, came nowhere close to even a moderately informed
human. How could the next machine grow so much smarter?

Mr. Horn eventually enticed David
Ferrucci and his team to pursue his vision. An expert in Artificial
Intelligence, Mr. Ferrucci had a wide and ranging intellect. He was
comfortable conversing about everything from the details of
computational linguistics to the evolution of life on Earth and the
nature of human thought. This made him an ideal ambassador for a
"Jeopardy"-playing machine. After all, his project would raise all
sorts of issues, and fears, about the role of brainy machines in
society. Would they compete for jobs? Could they establish their own
agendas, like the infamous computer, HAL, in "2001: A Space Odyssey,"
and take control? What was the future of knowledge and intelligence, and
how would brains and machines divvy up the cognitive work?

For humans, knowledge is an entire
universe, a welter of sensations and memories, desires, facts, skills,
songs and images, words, hopes, fears and regrets, not to mention love.
But for those hoping to build intelligent machines, it has to be
simpler. Broadly speaking, it falls into three categories: sensory
input, ideas and symbols.

Consider the color blue. It's
something that computers and people alike can perceive, each in their
own fashion. Sensory perception is the raw material of knowledge. Now
think of the three-letter word "sky." Those letters are a symbol for the
biggest piece of blue in our world. Computers can handle such symbols.
But how about this snippet from Lord Byron? "Friendship is love without
his wings." That sentence represents the third realm of knowledge:
ideas. How can a machine make sense of these? In these early years of
the 21st century, ideas remain the dominion of humans—and the frontier
for thinking machines.

Over the next four years, Mr. Ferrucci
set about creating a world in which people and their machines often
appeared to switch roles. He didn't know, he later said, whether humans
would ever be able to "create a sentient being." But when he looked at
fellow humans through the eyes of a computer scientist, he saw patterns
of behaviors that often appeared to be pre-programmed: the zombie-like
commutes, the near-identical routines, from tooth-brushing to feeding
the animals, the retreat to the same chair, the hand reaching for the TV
remote. "It's more interesting," he said, "when humans delve inside
themselves and say, 'Why am I doing this? And why is it relevant and
important to be human?' "

His machine, if successful, would nudge people toward that line of
inquiry. Even with an avatar for a face and a robotic voice, the
"Jeopardy" machine would invite comparisons to the other two contestants
on the stage. This was inevitable. And whether it won or lost on a
winter evening in 2011, the computer might lead millions of spectators
to rethink the nature, and probe the potential, of their own humanity.

From McKinsey Quarterly, adapted from the fourth chapter of Final Jeopardy.

The programmer’s dilemma: Building a Jeopardy! champion

IBM computer scientist David Ferrucci and his team set out to build a machine that could beat the quiz show’s greatest players. The result revealed both the potential—and the limitations—of computer intelligence.

FEBRUARY 2011 • Stephen Baker

In 2007, IBM computer scientist David Ferrucci and his team embarked on the challenge of building a computer that could take on—and beat—the two best players of the popular US TV quiz show Jeopardy!, a trivia game in which contestants are given clues in categories ranging from academic subjects to pop culture and must ring in with responses that are in the form of questions. The show, a ratings stalwart, was created in 1964 and has aired for more than 25 years. But this would be the first time the program would pit man against machine.

In some sense, the project was a follow-up to Deep Blue, the IBM computer that defeated chess champion Garry Kasparov in 1997. Although a TV quiz show may seem to lack the gravitas of the classic game of chess, the task was in many ways much harder. It wasn’t just that the computer had to master straightforward language, it had to master humor, nuance, puns, allusions, and slang—a verbal complexity well beyond the reach of most computer processors. Meeting that challenge was about much more than just a Jeopardy! championship. The work of Ferrucci and his team illuminates both the great potential and the severe limitations of current computer intelligence—as well as the capacities of the human mind. Although the machine they created was ultimately dubbed “Watson” (in honor of IBM’s founder, Thomas J. Watson), to the team that painstakingly constructed it, the game-playing computer was known as Blue J.

The following article is adapted from Final Jeopardy: Man vs. Machine and the Quest to Know Everything (Houghton Mifflin Harcourt, February 2011), by Stephen Baker, an account of Blue J’s creation.

It was possible, Ferrucci thought, that someday a machine would replicate the complexity and nuance of the human mind. In fact, in IBM’s Almaden Research Center, on a hilltop high above Silicon Valley, a scientist named Dharmendra Modha was building a simulated brain equipped with 700 million electronic neurons. Within years, he hoped to map the brain of a cat, and then a monkey, and, eventually, a human. But mapping the human brain, with its 100 billion neurons and trillions or quadrillions of connections among them, was a long-term project. With time, it might result in a bold new architecture for computing, one that could lead to a new level of computer intelligence. Perhaps then, machines would come up with their own ideas, wrestle with concepts, appreciate irony, and think more like humans.

But such machines, if they ever came, would not be ready on Ferrucci’s schedule. As he saw it, his team had to produce a functional Jeopardy!-playing machine in just two years. If Jeopardy!’s executive producer, Harry Friedman, didn’t see a viable machine by 2009, he would never green-light the man–machine match for late 2010 or early 2011.

This deadline compelled Ferrucci and his team to build their machine with existing technology—the familiar semiconductors etched in silicon, servers whirring through billions of calculations and following instructions from many software programs that already existed. In its guts, Blue J would not be so different from the battered ThinkPad Ferrucci lugged from one meeting to the next. No, if Blue J was going to compete with the speed and versatility of the human mind, the magic would have to come from its massive scale, inspired design, and carefully tuned algorithms. In other words, if Blue J became a great Jeopardy! player, it would be less a triumph of science than of engineering.

Blue J’s literal-mindedness posed the greatest challenge. Finding suitable data for this gullible machine was only the first job. Once Blue J was equipped with its source material—from James Joyce to the Boing Boing blog—the IBM team would have to teach the machine to make sense of those texts: to place names and facts into context, and to come to grips with how they were related to each other. Hamlet, just to pick one example, was related not only to his mother, Gertrude, but also to Shakespeare, Denmark, Elizabethan literature, a famous soliloquy, and themes ranging from mortality to self-doubt, just for starters. Preparing Blue J to navigate all of these connections for virtually every entity on earth, factual or fictional, would be the machine’s true education. The process would involve creating, testing, and fine-tuning thousands of algorithms. The final challenge would be to prepare the machine to play the game itself. Eventually, Blue J would have to come up with answers it could bet on within three to five seconds. For this, the Jeopardy! team would need to configure the hardware of a champion.
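
The Hamlet example above, an entity embedded in a web of relations, can be pictured as a small graph of triples. The structure and entries below are illustrative only; real knowledge bases hold millions of such links.

```python
# Toy sketch of the entity-relation "education" described above: a tiny
# graph linking Hamlet to related entities, navigable in both directions.

TRIPLES = [
    ("Hamlet", "mother", "Gertrude"),
    ("Hamlet", "author", "Shakespeare"),
    ("Hamlet", "setting", "Denmark"),
    ("Hamlet", "theme", "mortality"),
    ("Hamlet", "theme", "self-doubt"),
]

def related(entity):
    # Everything connected to the entity, regardless of direction.
    outgoing = {(rel, obj) for subj, rel, obj in TRIPLES if subj == entity}
    incoming = {(rel, subj) for subj, rel, obj in TRIPLES if obj == entity}
    return outgoing | incoming

print(related("Hamlet"))
print(related("Gertrude"))  # {('mother', 'Hamlet')}
```

Navigating such connections "for virtually every entity on earth" is what made the machine's education a matter of scale rather than cleverness.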

Every computing technology Ferrucci had ever touched had a clueless side to it. The machines he knew could follow orders and carry out surprisingly sophisticated jobs. But they were nowhere close to humans. The same was true of expert systems and neural networks. Smart in one area, clueless elsewhere. Such was the case with the Jeopardy! algorithms that his team was piecing together in IBM’s Hawthorne, New York, labs. These sets of finely honed computer commands each had a specialty, whether it was hunting down synonyms, parsing the syntax of a Jeopardy! clue, or counting the most common words in a document. Outside of these meticulously programmed tasks, though, each was fairly dumb.

So how would Blue J concoct broader intelligence—or at least enough of it to win at Jeopardy!? Ferrucci considered the human brain. “If I ask you what 36 plus 43 is, a part of you goes, ‘Oh, I’ll send that question over to the part of my brain that deals with math,’” he said. “And if I ask you a question about literature, you don’t stay in the math part of your brain. You work on that stuff somewhere else.” Ferrucci didn’t delve into how things work in a real brain; for his purposes, it didn’t matter. He just knew that the brain has different specialties, that people know instinctively how to skip from one to another, and that Blue J would have to do the same thing.

The machine would, however, follow a different model. Unlike a human, Blue J wouldn’t know where to start answering a question. So with its vast computing resources, it would start everywhere. Instead of reading a clue and assigning the sleuthing work to specialist algorithms, Blue J would unleash scores of them on a hunt, and then see which one came up with the best answer. The algorithms inside of Blue J, each following a different set of marching orders, would bring in competing results. This process, a lot less efficient than the human brain, would require an enormous complex of computers. More than 2,000 processors would each handle a different piece of the job. But the team would concern itself later with these electronic issues—Blue J’s body—after they got its thinking straight.
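
This start-everywhere strategy can be sketched as a pool of independent "expert" functions whose outputs are merged rather than routed to one specialist. The experts and their outputs below are invented stand-ins, not IBM's actual algorithms.

```python
# Sketch of Blue J's run-everything strategy: every specialist proposes
# answers in parallel, and the pool is merged; none is trusted up front.
from concurrent.futures import ThreadPoolExecutor

def keyword_expert(clue):
    return ["North Korea", "Cuba"]

def anagram_expert(clue):
    return ["Ranitou"]          # gibberish, as most experts produce

def rhyme_expert(clue):
    return ["forth"]

EXPERTS = [keyword_expert, anagram_expert, rhyme_expert]

def candidate_pool(clue):
    # Launch all experts at once and flatten their competing results.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda expert: expert(clue), EXPERTS)
    return [answer for answers in results for answer in answers]

print(candidate_pool("the one that's farthest north"))
```

The inefficiency is the point: most experts return junk on any given clue, but across thousands of clues each earns its keep on the questions it happens to fit.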

To see how these algorithms carried out their hunt, consider one of the thousands of clues the fledgling system grappled with. Under the category Diplomatic Relations, one clue read: “Of the four countries the United States does not have diplomatic relations with, the one that’s farthest north.”

In the first wave of algorithms to handle the clue was a group that specialized in grammar. They diagrammed the sentence, much the way a grade-school teacher would, identifying the nouns, verbs, direct objects, and prepositional phrases. This analysis helped to clear up doubts about specific words. In this clue, “the United States” referred to the country, not the Army, the economy, or the Olympic basketball team. Then the algorithms pieced together interpretations of the clue. Complicated clues, like this one, might lead to different readings—one more complex, the other simpler, perhaps based solely on words in the text. This duplication was wasteful, but waste was at the heart of Blue J’s strategy. Duplicating or quadrupling its effort, or multiplying it by 100, was one way the computer could compensate for its cognitive shortcomings, and also play to its advantage: speed. Unlike humans, who can instantly understand a question and pursue a single answer, the computer might hedge, launching searches for a handful of different possibilities at the same time. In this way and many others, Blue J would battle the efficient human mind with spectacular, flamboyant inefficiency. “Massive redundancy” was how Ferrucci described it. Transistors were cheap and plentiful. Blue J would put them to use.

While the machine’s grammar-savvy algorithms were dissecting the clue, one of them searched for its focus, or answer type. In this clue about diplomacy, “the one” evidently referred to a country. If this was the case, the universe of Blue J’s possible answers was reduced to a mere 194, the number of countries in the world. (This, of course, was assuming that “country” didn’t refer to “Marlboro Country” or “wine country” or “country music.” Blue J had to remain flexible, because these types of exceptions often surfaced.)

Once the clue was parsed into a question the machine could understand, the hunt commenced. Each expert algorithm went burrowing through Blue J’s trove of data in search of the answer. One algorithm, following instructions developed for decoding the genome, looked to match strings of words in the clue with similar strings elsewhere, maybe in some stored Wikipedia entry or in articles about diplomacy, the United States, or northern climes. One linguistically minded algorithm focused on rhymes with key words in the clue. Another algorithm used a Google-like approach and focused on documents that matched the greatest number of keywords in the clue, paying special attention to the ones that popped up most often.

While the algorithms worked, software within Blue J would be comparing the clue to thousands of others it had encountered. What kind was it—a puzzle? A limerick? A historical factoid? Blue J was learning to recognize more than 50 types of questions, and it was constructing the statistical record of each algorithm for each type of question. This would guide it in evaluating the results when they came back. If the clue turned out to be an anagram, for example, the algorithm that rearranged the letters of words or phrases would be the most trusted source. But that same algorithm would produce gibberish for most other clues.

What kind of clue was this one on diplomatic relations? It appeared to require two independent analyses. First, the computer had to come up with the four countries with which the United States had no diplomatic ties. Then it had to figure out which of those four was the farthest north. A group of Blue J’s programmers had recently developed an algorithm that focused on these so-called nested clues, in which one answer lay inside another. This may sound obscure, but humans ask these types of questions all the time. If someone wonders about “cheap pizza joints close to campus,” the person answering has to carry out two mental searches, one for cheap pizza joints and another for those nearby. Blue J’s “nested decomposition” led the computer through a similar process. It broke the clues into two questions, pursued two hunts for answers, and then pieced them together. The new algorithm was proving useful in Jeopardy!. One or two of these combination questions came up in nearly every game. They are especially common in the all-important Final Jeopardy, which usually features more complex clues.
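
Nested decomposition on the diplomatic-relations clue reduces to: answer the inner question (which four countries?), then apply the outer constraint (farthest north). The latitudes below are approximate, and the inner answer is hard-coded where the real system would run a second search.

```python
# Sketch of "nested decomposition": inner answer set, then outer filter.

NO_RELATIONS = ["Bhutan", "Cuba", "Iran", "North Korea"]  # inner question's answer
LATITUDE = {"Bhutan": 27.5, "Cuba": 21.5, "Iran": 32.4, "North Korea": 40.3}

def farthest_north(countries):
    # Outer constraint: maximize latitude over the inner result set.
    return max(countries, key=LATITUDE.get)

print(f"What is {farthest_north(NO_RELATIONS)}?")  # What is North Korea?
```

The "cheap pizza joints close to campus" analogy works the same way: one search produces a candidate set, and a second question ranks or filters it.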

It would take Blue J’s algorithms almost an hour to churn through the data and return with their candidate answers. Most were garbage. There were failed anagrams of country names and laughable attempts to rhyme “north” with “diplomatic.” Some suggested the names of documents or titles of articles that had strings of the same words. But the nested algorithm followed the right approach. It found the four countries on the outs with the United States (Bhutan, Cuba, Iran, and North Korea), checked their geographical coordinates, and came up with the answer: “What is North Korea?”

At this point, Blue J had the right answer. But the machine did not yet know that North Korea was correct, or that it even merited enough confidence for a bet. For this, it needed loads of additional analysis. Since the candidate answer came from an algorithm with a strong record on nested clues, it started out with higher-than-average confidence in that answer. The machine would proceed to check how many of the answers matched the question type: “country.” After ascertaining from various lists that North Korea appeared to be a country, confidence in “What is North Korea?” rose further up the list. For an additional test, it would place the words “North Korea” into a simple sentence generated from the clue: “North Korea has no diplomatic relations with the United States.” Then it would see if similar sentences showed up in its data trove. If so, confidence climbed higher.

In the end, it chose North Korea as the answer to bet on. In a real game, Blue J would have hit the buzzer. But being a machine, it simply moved on to the next clue.

About the Author
Stephen Baker is the author of The Numerati (Houghton Mifflin Harcourt, 2008) and was previously a writer at BusinessWeek.

Final Jeopardy

Well before Ken Jennings and Brad Rutter, IBM's design team grappled with a
different challenge - getting beaten to the punch by someone else
inventing a trivia-savvy artificial mind. Final Jeopardy discusses Watson's early development and how this Q&A juggernaut overcame the "Basement Baseline."

In the early days of 2007, before he agreed to head up a Jeopardy project, IBM's David Ferrucci
harbored two conflicting fears. The first of his nightmare scenarios
was perfectly natural: A Jeopardy computer would fail, embarrassing the
company and his team.

But his second concern, failure's diabolical twin, was perhaps even
more terrifying. What if IBM spent tens of millions of dollars and
devoted centuries of researcher years to this project, played it up in
the press, and then saw someone beat them to it? Ferrucci pictured a
solitary hacker in a garage, cobbling together free software from the
Web and maybe hitching it to Wikipedia and other online databases. What
if the Jeopardy challenge turned out to be not too hard but too easy?

That would be worse, far worse, than failure. IBM would become the
laughingstock of the tech world, an old-line company completely out of
touch with the technology revolution - precisely what its corporate
customers paid it billions of dollars to track. Ferrucci's first order
of business was to make sure that this could never happen. "It was due
diligence," he later said.

He had a new researcher on his team, James Fan,
a young Chinese American with a fresh doctorate from the University of
Texas. As a newcomer, Fan was free of institutional preconceptions
about how Q-A systems should work. He had no history with the annual
government-sponsored competitions, in which IBM's technology routinely
botched two questions for every one it got right. Trim and soft-spoken,
his new IBM badge hanging around his neck, Fan was an outsider. And he
now faced a singular assignment: to build a Jeopardy computer all by
himself. He was given 500 Jeopardy clues to train his machine and one
month to make it smart. His system would be known as Basement Baseline.

So on a February day in 2007, James Fan set out to program a Q-A
machine from scratch. He started by drawing up an inventory of the
software tools and reference documents he thought he'd need. First would
be a so-called type system. This would help the computer figure out if
it was looking for a person, place, animal, or thing. After all, if it
didn't know what it was looking for, finding an answer was little more
than a crapshoot; generating enough "confidence" to bet on that answer
would be impossible. For humans, distinguishing President George
Washington from the bridge named after him isn't much of a challenge.
Context makes it clear. Bridges don't deliver inaugural addresses;
presidents are rarely jammed at rush hour, with half-hour delays from
Jersey. What's more, when placed in sentences, people usually behave
differently than roads or bridges.

But what's simple for us involved hard work for Fan's Q-A computer.
It had to comb through the structure of the question, picking out the
subjects, objects, and prepositions. Then it had to consult exhaustive
reference lists that had been built up in the industry over decades,
laying out hundreds of thousands of places, things, and actions and the
web of relationships among them. These were known as "ontologies." Think
of them as cheat sheets for computers. If a
finger was a subject, for example, it fell into human anatomy and was
related to the hand and the thumb and to verbs such as "to point" and
"to pluck." (Conversely, when "the finger" turned up as the object of
the verb "to give," a sophisticated ontology might steer the computer
toward the neighborhood of insults, gestures, and obscenities.)
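The "cheat sheet" idea can be sketched in a few lines of Python. Everything below (the `ONTOLOGY` table, its entries, the `semantic_type` helper) is a hypothetical illustration of the lookup described above, not IBM's actual data structures:

```python
# A toy "ontology": a cheat sheet mapping a term, in a grammatical role,
# to a semantic type and its related concepts. Real ontologies catalog
# hundreds of thousands of entries; this sketch has two.
ONTOLOGY = {
    ("finger", "subject"): {
        "type": "human anatomy",
        "related": ["hand", "thumb", "to point", "to pluck"],
    },
    ("finger", "object-of-give"): {
        "type": "gesture/insult",
        "related": ["insults", "gestures", "obscenities"],
    },
}

def semantic_type(term: str, role: str) -> str:
    """Return the semantic type for a term in a given grammatical role."""
    entry = ONTOLOGY.get((term, role))
    return entry["type"] if entry else "unknown"

print(semantic_type("finger", "subject"))          # human anatomy
print(semantic_type("finger", "object-of-give"))   # gesture/insult
```

The point of the role key is the one the text makes: the same word maps to very different meanings depending on where it sits in the sentence.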

In any case, Fan needed both a type system and a knowledge base to
understand questions and hunt for answers. He didn't have either, so he
took a hacker's shortcut and used Google and Wikipedia. (While the true
Jeopardy computer would have to store its knowledge in its "head,"
prototypes like Fan's were free to search the Web.) From time to time,
Fan found, if he typed a clue into Google, it led him to a Wikipedia
page - and the subject of the page turned out to be the answer. The
following clue, for example, would confound even the most linguistically
adept computer. In the category "The Author Twitters," it read: "Czech
out my short story ‘A Hunger Artist’! Tweet done. Max Brod, pls burn my
laptop."

A good human Jeopardy player would see past the crazy syntax, quickly
recognizing the short story as one written by Franz Kafka, along with a
reference to Kafka's Czech nationality and his longtime associate Max
Brod. In the same way, a search engine would zero in on those helpful
key words and pay scant attention to the sentence surrounding them. When
Fan typed the clue into Google, the first Wikipedia page that popped up
was "Franz Kafka," the correct answer. This was a primitive method, and
Fan knew that a computer relying on it would botch the great majority
of Jeopardy clues. It would crash and burn in a game against
even ignorant humans, let alone Ken Jennings. But one or two times out
of ten, it worked. For Fan, it was a start.

The month passed. Fan added more features to Basement Baseline. But
at the end, the system was still missing vital components. Most
important, it had no mechanism for gauging its level of confidence in
its answers. "I didn't have time to build one," Fan said. This meant the
computer didn't know what it knew. In a game, it wouldn't have any idea
when to buzz. In the end, Fan blew off game strategy entirely and
focused simply on building a machine that could answer Jeopardy clues.
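The missing piece Fan describes, a confidence mechanism, is essentially a gate on the buzzer: answer only when the top candidate's score clears a threshold. A minimal sketch (the threshold and the scores are invented for illustration; real systems calibrate both):

```python
BUZZ_THRESHOLD = 0.5  # hypothetical; a real system would tune this

def should_buzz(candidates: dict[str, float]) -> tuple[bool, str]:
    """Given candidate answers with confidence scores in [0, 1],
    decide whether to buzz, and with which answer."""
    best, score = max(candidates.items(), key=lambda kv: kv[1])
    return score >= BUZZ_THRESHOLD, best

# Without calibrated scores, the machine "doesn't know what it knows":
print(should_buzz({"Franz Kafka": 0.82, "Max Brod": 0.11}))
print(should_buzz({"George Washington": 0.28, "GW Bridge": 0.25}))
```

A system without this gate, like Basement Baseline, has no principled way to decide when its top answer is worth risking money on.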

It was on a March morning at IBM labs in Hawthorne, NY, that James
Fan's Basement Baseline faced off against Big Blue's in-house
question-answering system, known as Piquant. The results, from
Ferrucci's perspective, were ideal. The Piquant system succeeded on only
30 percent of the clues, far below the level needed for Jeopardy. It
had high confidence on
only 5 percent of them, and of those it got only 47 percent right. Fan's
Basement Baseline fared almost as well by a number of measures but was
still woefully short of what was needed. Fan proved that a hacker's
concoction was far from Jeopardy standards - which was a relief. But by
nearly matching the company's state-of-the-art in Q-A technology, he
highlighted its inadequacies.

The Jeopardy challenge, it was clear, would require another program,
another technology platform, and a far bolder approach. The job,
Ferrucci said, called for "the most sophisticated intelligence
architecture the world has ever seen." He proceeded to tell his bosses
that he would lead a team to assemble a Jeopardy machine—provided that
they gave him the resources to build a big one.

Since many people have read the e-book minus the last chapter, I'm posting the beginning of the chapter leading up to the point where Watson faces off against Ken Jennings and Brad Rutter on the Jeopardy stage.

Chapter Eleven: The Match

David Ferrucci had driven the same stretch hundreds of times. It was the
route from his suburban home to IBM’s Yorktown labs, or a bit farther to
Hawthorne. For fifteen or twenty minutes along the Taconic Parkway each morning
and evening, he went over his seemingly endless to-do list. How could his team
boost Watson’s fact-checking in Final Jeopardy? Could any fix ensure that the
machine's bizarre speech defect would never return? Was the pun-detection
algorithm performing up to par? There were always more details to focus on,
plenty to fuel both perfectionism and paranoia--and Ferrucci had a healthy
measure of both.

But this January morning was different. As he drove past frozen fields and
forests, the pine trees heavy with fresh snow, all of the to-do lists were
history. After four years, his team’s work was over. Within hours, Watson alone
would be facing Ken Jennings and Brad Rutter, with Ferrucci and the machine’s
other human trainers reduced to spectators. Ferrucci felt his eyes well up. “My
whole team would be judged by this one game,” he said later. “That’s what
killed me.”

The day before, at a jam-packed press conference, IBM had unveiled Watson
to the world. The event took place on a glittering new Jeopardy set mounted
over the previous two weeks by an army of nearly 100 workers. It resembled the
set in Culver City: the same jumbo game board to the left, the contestants'
lecterns to the right, with Alex Trebek's podium in the middle. In front was a
long table for Jeopardy officials, where Harry Friedman would sit, Rocky
Schmidt to his side, followed by a line of writers and judges, all of them
equipped with monitors, phones, and a pile of old-fashioned reference books.
All of the pieces were in place. But this East Coast version was plastered with
IBM branding. The shimmering blue wall bore the company’s historic slogan, Think,
in a number of languages. Stretched across the shiny black floor was a logo
that looked at first like Batman’s emblem. But closer study revealed the planet
earth, with each of the continents bulging, as if painted by Fernando Botero.
This was Chubby Planet, the symbol of IBM’s Smarter Planet campaign, and the
model for Watson’s avatar. In the negotiations with Jeopardy over the past two
years, IBM had lost out time and again on promotional guarantees. It had seemed
that Harry Friedman and his team held all the cards. But now that the match was
assured, and on Big Blue's home turf, not a single branding opportunity would
be squandered.

The highlight of the press event came when Jennings and Rutter strode
across the stage for a five-minute, 15-clue demonstration. In this test run,
Watson had held its own. In fact, it had ended the session ahead of Jennings,
$4,400 to $3,400. Rutter trailed with $1,200. Within hours, online headlines
proclaimed that Watson had vanquished the humans. It was as if the game had
already been won.

If only this were true. The demo match featured just a handful of clues and
included no Final Jeopardy--Watson’s Achilles heel. What’s more, after the
press emptied the auditorium that afternoon, Watson and the human champs went
on to finish that game and play another round--"loosening their
thumbs," in the language of Jeopardy. In these games Ferrucci saw a
potential problem: Ken Jennings. It was clear, he said, that Jennings had
prepped heavily for the match. He had a sense of Watson's vulnerabilities and
an aggressive betting strategy specially honed for the machine. Brad
Rutter was another matter altogether. Starting out, Ferrucci’s team had been
more concerned about Rutter than Jennings. His speed on the buzzer was the stuff
of legend. Yet he appeared relaxed, almost too relaxed, as if he could barely
be bothered to buzz. Was he saving his best stuff for the match?

In the first of the two practice games, Jennings landed on all three daily
doubles. Each time he bet nearly everything he had. This was the same strategy
Greg Lindsay had followed to great effect in three sparring games 10 months
earlier. The rationale was simple. Even with its mechanical finger slowing it
down by a few milliseconds, Watson was lightning fast on the buzzer. The
machine was likely to win more than its share of the regular Jeopardy clues. So
the best chance for humans was to pump up their winnings on the four clues that
hinged on betting, not buzzing. Those were the three Daily Doubles hiding
behind certain clues, and the Final Jeopardy. Thanks to his aggressive betting,
Jennings ended the first full practice game with some $50,000, a length ahead
of Watson, which scored $39,000. Jennings was fired up. When he clinched the
match, he pointed to the computer and exclaimed, “Game over!” Rutter finished a
distant third, with about $10,000. In the second game, Jennings and Watson were
neck and neck to the end, when Watson edged ahead in Final Jeopardy. Again,
Rutter coasted to third place. Ferrucci said that he and his team left the
practice rounds thinking, “Ken’s really good--but what’s going on with Brad?”

When Ferrucci pulled in to the Yorktown labs the morning of the match, the
site had been transformed for the event. The visitors’ parking lot was cordoned
off for VIPs. Security guards posted at the doors checked every person entering
the building, matching their names against a list. And in the vast lobby,
usually manned by one lonely guard, IBM’s luminaries and privileged guests
circled around tables piled with brunch fare. Ferrucci made his way to Watson’s
old practice studio, now refashioned as an exhibition room. There he gave a
half-hour talk about the supercomputer to a gathering of IBM clients, including
J.P. Morgan, American Express, and the pharmaceutical giant Merck and Co.
Ferrucci recalled the distant days when a far stupider Watson responded to a
clue about a famous French bacteriologist by saying: “What is ‘How Tasty Was My
Little Frenchman’?” (That was the title of a 1971 Brazilian comedy about
cannibals in the Amazon.)

His next stop, the make-up room, revealed his true state of mind. The
make-up artist was a woman originally from Italy, like much of Ferrucci's
family. As she began to work on his face she showered him with warmth and
concern--acting "motherly." This rekindled his powerful feelings
about his team and the end of their journey, and before he knew it, tears were
streaming down his face. The more the woman comforted him, the worse it got.
Ferrucci finally staunched the flow and got the pancake on his face. But he
knew he was a mess. He hunted down Scott Brooks, the light-hearted press
officer. Maybe some jokes, he thought, “would take the lump out of my throat.”
Brooks laughed and kidded his colleague.

This irritated the testy Ferrucci and, to his relief, knocked him out of
his fragile mood. He joined his team for one last lunch, all of them seated at
a long table in the cafeteria. As they were finishing, just a few minutes
before 1 p.m., a roaring engine interrupted conversations in the cafeteria. It
was IBM’s Chairman Sam Palmisano landing in his helicopter. The hour had come.
Ferrucci walked down the sunlit corridor to the auditorium.

****

Ken Jennings woke up that Friday morning in the Crowne Plaza in White Plains.
He’d slept well, much better than he usually did before big Jeopardy matches.
Jennings had reason to feel confident. He had destroyed Watson in one of the
practice rounds. Afterwards, he said, Watson’s developers told him that the
game had featured a couple of “train wrecks”--categories where Watson appeared
disoriented. Children’s literature was one. For Jennings, train wrecks signaled
the machine’s vulnerability. With a few of them in the big match, he could
stand up tall for humans, and perhaps extend his legend from Jeopardy to the broader
realm of knowledge. “Given the right board,” he said, “Watson is beatable.” The
stakes were considerable. While IBM would give all of Watson’s winnings to
charity, a human winner would earn a half-million-dollar prize, with another
half million to give to the charity of his choice. Finishing in second or third
place was worth $150,000 and $100,000, with equal amounts for the players’
charities.

A little after 11, a car service stopped by the hotel, picked up Jennings and
his wife, Mindy, and drove them 13 miles north to IBM’s Yorktown laboratory.
Jennings carried three changes of clothes, so that he could dress differently
for each session, simulating three different days. As soon as he stepped out of
the car, Jeopardy officials whisked him past the crush of people in the lobby
and toward the staircase. Jeopardy had cleared out a couple of offices in IBM’s
Human Resources department, and Jennings was given one as a dressing room.

On short
visits to the East Coast, Brad Rutter liked to sleep late, so that he stayed in
sync with West Coast time. But the morning of the match, he found himself awake
at 7, which meant he faced four and a half hours before the car came by. Rutter
was at the Ritz-Carlton in
White Plains, about a half mile from Jennings. He breakfasted, showered, and
then killed time until 11:30. Unlike Jennings, Rutter had grounds for serious
concern. In the practice rounds, he had been uncharacteristically slow. The
computer had an exquisite sense of timing, and Jennings seemed to hold his own.
Rutter, who had never lost a Jeopardy game in his life, was facing a flame-out
unless he could get to the buzzer fast.

Shortly after Rutter arrived at IBM, he and Jennings played one last
practice round with Watson. To Rutter’s delight, his buzzer thumb started to
regain the old magic. He beat both Jennings and the machine. Now, in the three
practice matches, each of the three players had registered a win. But Jennings
and Rutter noticed something strange about Watson. Its game strategy, Jennings
said, “seemed naive.” Just like beginning Jeopardy players, Watson started with
the easy low-dollar clues and moved straight down the board. Why wasn’t it
hunting for Daily Doubles? In the Blu-ray discs given to them in November,
Jennings and Rutter had seen that Watson skipped around the high-dollar clues,
hunting for the single Daily Double on the first Jeopardy board, and the two in
Double Jeopardy. Landing Daily Doubles was vital. It gave a player the means to
build a big lead. Equally important, once Daily Doubles were off the board, the
leader was hard to catch. But in the practice rounds, Watson didn’t appear to
have this strategy in mind.

The two players were led to a tiny entry hall behind the auditorium. As the
event commenced, shortly after 1 p.m., they waited. They listened as IBM
introduced Watson to its customers. “You know how they call time outs before a
guy kicks a field goal?” Jennings said. “We were joking that they were doing
the same thing to us. Icing us.” Through the door they heard speeches by John
Kelly, the chief of IBM Research, and Sam Palmisano. Harry Friedman, who
decades earlier had earned $5 a joke as a writer for Hollywood Squares,
delivered one of his own. “I’ve lived in Hollywood for a long time,” he told
the crowd. “So I know something about Artificial Intelligence.” When Ferrucci
was called on to the stage, the crowd rose for a standing ovation. “I already
cried in make-up,” he said. “Let’s not repeat that.”

Finally, it was time for Jeopardy. Jennings and
Rutter were summoned to the stage. They walked down the narrow aisle of the
auditorium, Jennings leading in a business suit and yellow tie, the taller
loose-gaited Rutter following him, his collar unbuttoned. They settled at their
lecterns, Jennings on the far side, Rutter closer to the crowd. Between them,
its circular black screen dancing with jagged colorful lines, sat Watson.

The show began with its familiar music. A fill-in for the legendary announcer
Johnny Gilbert (who hadn’t made the trip from Culver City) introduced the
contestants and Alex Trebek. But even then, Jennings and Rutter had to wait
while an IBM video told the story of the Watson project. In a second video,
Trebek talked to Ferrucci about the machinery behind the bionic player--now up
to 2,880 processing cores. Then Trebek gave viewers a tutorial on Watson’s
answer panel. This would reveal the statistical confidence that the computer
had in each of its top responses. It was a window into Watson’s thinking.

Trebek, in fact, had been a late convert to the answer panel. Like the rest
of the Jeopardy team, he was loath to stray from the show’s time-honored
formulas. People knew what to expect from the game: the precise movements of
the cameras, the familiar music, voices and categories. Wouldn’t the intrusion
of an electronic answer panel distract them, and ultimately make the game less
enjoyable to watch? He raised that concern on a visit to IBM in November. But
the prospect of playing the game without Watson’s answer panel horrified
Ferrucci. Millions of viewers, he believed, would simply conclude that the
machine had been fed all the answers. They wouldn’t appreciate what Watson had
gone through to arrive at the correct response. So while Trebek was eating
lunch that day, Ferrucci carried out an experiment. He had his technicians take
down the answer panel. When the afternoon sessions began, it only took one game
for Trebek to ask for the answer panel back. Later, he said, watching Watson
without its analysis was “boring as hell.”

A hush settled over the auditorium. Finally, it
was time to play. Ferrucci, sitting between David Gondek and Eric Brown, laced
his hands tightly and made a steeple with his index fingers. He watched as
Trebek, with a wave of his arm, revealed the six categories for the first round
of Jeopardy....