System Administration of the IBM Watson Supercomputer

System administrators at the USENIX LISA 2011 conference in Boston in
December (LISA is a great system administration conference, by the way)
got to hear Michael Perrone's presentation "What Is Watson?"

Michael Perrone is the Manager of Multicore Computing at
the IBM T.J. Watson Research Center. The entire presentation
(slides, video and MP3) is available on the USENIX Web site. If you
really want to understand how Watson works under the hood, take
an hour to listen to Michael's talk (and the sysadmin Q&A
at the end).

I approached Michael after his talk and asked if there was a sysadmin
on his team who would be willing to answer some questions about
handling Watson's system administration. Our conversation follows
a brief introduction to Watson.

What Is Watson?

In a nutshell, Watson is an impressive demonstration of the
current state of the art in artificial intelligence: a computer's
ability to answer questions posed in natural language
(text or speech) correctly.

Watson came out of the IBM DeepQA Project and is an application
of DeepQA tuned specifically to Jeopardy (a US TV trivia game show).
The "QA" in DeepQA stands for Question Answering, which means the
computer can answer your questions, spoken in a human language
(starting with English).
The "Deep" in DeepQA means the computer is able to analyze deeply
enough to handle natural language text and speech successfully.
Because natural language is unstructured, deep analysis is required
to interpret it correctly.

Watson demonstrates (in a popular format) a computer's capability to
interface with us using natural language, to "understand" and answer
questions correctly by quickly searching a vast sea of data and
picking out the vital facts that answer the question.

Watson is thousands of algorithms running on thousands of cores using
terabytes of memory, driving teraflops of CPU operations to deliver
an answer to a natural language question in less than five seconds. It is
an exciting feat of technology, and it's just a taste of what's to come.

IBM's goal for the DeepQA Project is to drive automatic Question
Answering technology to a point where it clearly and consistently
rivals the best human performance.

Watson needs ten compute racks, 80kW of power and 20 tons of cooling.
For comparison, a human has one brain, which fits in a shoebox, can run
on a tuna-fish sandwich and can be cooled with a handheld paper fan.

How Does Watson Work?

First, Watson develops a semantic net.
Watson takes a large volume of text (the corpus) and parses it with
natural language processing to create "syntactic frames"
(subject→verb→object).
It then uses syntactic frames to create "semantic frames", each of
which carries a degree of probability.
Here's an example of semantic frames:

Inventors patent inventions (.8).

Fluid is a liquid (.6).

Liquid is a fluid (.5).

Why isn't the probability 1 in any of these examples? Because of
phrases like "I speak English fluently". They tend to skew the
numbers.
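The step from syntactic frames to weighted semantic frames can be illustrated with a small Python sketch. This is not Watson's actual method, which the talk does not spell out at this level. It simply assumes that a frame's probability is its relative frequency among all frames sharing the same subject and verb. The frame data and the semantic_frames function are invented for illustration.

```python
from collections import Counter

# Hypothetical syntactic frames (subject, verb, object), as a parser might
# emit them from a corpus. The data is invented for illustration.
syntactic_frames = [
    ("inventors", "patent", "inventions"),
    ("inventors", "patent", "inventions"),
    ("inventors", "patent", "inventions"),
    ("inventors", "patent", "inventions"),
    ("inventors", "patent", "designs"),
    ("fluid", "is", "liquid"),
    ("fluid", "is", "liquid"),
    ("fluid", "is", "liquid"),
    ("fluid", "is", "gas"),
    ("fluid", "is", "gas"),
]


def semantic_frames(frames):
    """Score each frame by its relative frequency among the frames that
    share its subject and verb (an assumed stand-in for a real confidence)."""
    frame_counts = Counter(frames)
    prefix_counts = Counter((subj, verb) for subj, verb, _ in frames)
    return {
        frame: count / prefix_counts[frame[:2]]
        for frame, count in frame_counts.items()
    }


scores = semantic_frames(syntactic_frames)
for (subj, verb, obj), prob in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{subj} {verb} {obj} ({prob:.1f})")
```

With this toy corpus, competing objects for the same subject and verb split the probability between them, which is one simple way a frame can end up with a score below 1.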

To answer questions, Watson uses a Massively Parallel Probabilistic
Evidence-Based Architecture. It uses the evidence from its
semantic net to analyze the hypotheses it builds up to answer
the question. You should watch the video of Michael's presentation
and look at the slides, as there is really too much under the hood
to present in a short article, but in a nutshell, Watson
develops huge numbers of hypotheses (potential answers) and uses
evidence from its semantic net to assign probabilities to the answers
and pick the most likely one.
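As a rough illustration of that idea, here is a minimal Python sketch that scores a handful of candidate answers against an evidence table in parallel and ranks them. Everything in it is an assumption made for the example: the EVIDENCE table, the score and rank functions, the rule of multiplying confidences together and the 0.05 fallback for missing evidence. Watson's real scoring combines many more algorithms than this.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical evidence store: (subject, verb, object) -> confidence.
# The entries and their values are invented for illustration.
EVIDENCE = {
    ("edison", "patented", "phonograph"): 0.9,
    ("edison", "patented", "light bulb"): 0.7,
    ("bell", "patented", "telephone"): 0.9,
    ("bell", "patented", "phonograph"): 0.1,
}


def score(candidate, clue):
    """Multiply the evidence linking one candidate to every clue term."""
    verb, objects = clue
    confidence = 1.0
    for obj in objects:
        # Missing evidence counts as very weak support rather than zero.
        confidence *= EVIDENCE.get((candidate, verb, obj), 0.05)
    return candidate, confidence


def rank(candidates, clue):
    """Score all hypotheses concurrently, normalize and sort best first."""
    with ThreadPoolExecutor() as pool:
        scored = list(pool.map(lambda cand: score(cand, clue), candidates))
    total = sum(conf for _, conf in scored)
    ranked = [(cand, conf / total) for cand, conf in scored]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


ranking = rank(["edison", "bell"], ("patented", ["phonograph", "light bulb"]))
for candidate, probability in ranking:
    print(f"{candidate}: {probability:.3f}")
```

The thread pool here only hints at the parallelism. Each hypothesis is scored independently of the others, which is the property that lets this kind of work spread across many cores.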

There are many algorithms at play in Watson. Watson can even
learn from its mistakes and change its Jeopardy strategy.

Watson Is Built on Open Source

Watson is built on the Apache UIMA framework, uses Apache Hadoop,
runs on Linux, and uses xCAT and Ganglia for configuration management
and monitoring—all open-source tools.

Interview with Eddie Epstein on System Administration of the Watson
Supercomputer

Eddie Epstein is the IBM researcher responsible for scaling out
Watson's computation over thousands of compute cores in order
to achieve the speed needed to be competitive in a live
Jeopardy game. For the past seven years, Eddie has managed the IBM
team doing ongoing development of Apache UIMA.
Eddie was kind enough to answer my questions about system administration
of the Watson cluster.

AT: Why did you decide to use Linux?

EE: The project started with x86-based blades, and the researchers
responsible for admin were very familiar with Linux.

AT: What configuration management tools did you use? How did you
handle updating the Watson software on thousands of Linux servers?

EE: We had only hundreds of servers. The servers ranged from
4- to 32-core machines. We started with CSM to manage OS installs,
then switched to xCAT.
