answer the larger scenario-based question. We show
that WatsonPaths not only outperforms a baseline
system that uses simple information retrieval, but
also outperforms its own subcomponent, Watson, in
answering a set of scenario-based questions from the
medical domain.

WatsonPaths Medical Use Case
Although WatsonPaths is intended as a domain-general technology for scenario-based question answering, we decided to start by focusing our attention on
the medical domain. We focused on the problem of
patient scenario analysis, where the goal is typically
a diagnosis or a treatment recommendation.

To explore this kind of problem solving, we obtained a set of medical test preparation questions. These are multiple-choice medical questions based on an unstructured or semistructured natural language description of a patient. Although WatsonPaths is not restricted to multiple-choice questions, we saw multiple-choice questions as a good starting point for development. Many of these questions involve diagnosis, either as the entire question, as in the previous medical example, or as an intermediate step, as in the following example:

A 63-year old patient is sent to the neurologist with a clinical picture of resting tremor that began 2 years ago. At first it was only on the left hand, but now it compromises the whole arm. At physical exam, the patient has an unexpressive face and difficulty in walking, and a continuous movement of the tip of the first digit over the tip of the second digit of the left hand is seen at rest. What part of his nervous system is most likely affected?

For this question, it is useful to diagnose that the
patient has Parkinson’s disease before determining
which part of his nervous system is most likely affected. These multistep inferences are a natural fit for the
graphs that WatsonPaths constructs. In this example,
the diagnosis is the missing link on the way to the
final answer.
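To make the "missing link" idea concrete, the sketch below runs a breadth-first search over a toy graph in which no finding connects directly to an anatomical answer, so any path must pass through the intermediate diagnosis. The edges are hand-written for illustration and are not WatsonPaths output; the function name `find_path` is hypothetical.

```python
from collections import deque

# Toy, hand-written edges for illustration only (not produced by WatsonPaths).
# Findings point to a diagnosis, and the diagnosis points to the affected structure.
EDGES = {
    "resting tremor": ["Parkinson's disease"],
    "unexpressive face": ["Parkinson's disease"],
    "Parkinson's disease": ["substantia nigra"],
}


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest node chain from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path(EDGES, "resting tremor", "substantia nigra"))
```

The printed path has the diagnosis in the middle: the finding reaches the answer only through the intermediate node.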

Scenario-Based Question Answering
In scenario-based question answering, the system
receives a scenario description that ends with a
punch line question. For instance, the punch line
question in the Parkinson’s example is “What part of
his nervous system is most likely affected?” Instead of
treating the entire scenario as one monolithic question as would Watson, WatsonPaths explores multiple facts in the scenario in parallel and reasons with
the results of its exploration as a whole to arrive at
the most likely conclusion regarding the punch line
question. The architecture of WatsonPaths is shown
in figure 2. In this section, we briefly outline each
step, while the bulk of the rest of the article goes into
more detail on important steps.

Scenario Analysis
The first step in the pipeline is scenario analysis,
where we identify factors in the input scenario that
may be of importance. In the medical domain, the
factors may include demographics (“32-year old
woman”), preexisting conditions (“type 1 diabetes
mellitus”), signs and symptoms (“progressive renal
failure”), and test results (“hemoglobin concentration is 9 g/dL,” “normochromic cells,” “normocytic
cells”). The extracted factors become nodes in a
graph structure called the assertion graph, on which
the remaining steps of the process will operate.
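The article does not describe how factors are recognized, so the following is only a stand-in: a minimal sketch that assumes a hypothetical lexicon mapping surface phrases to factor types and adds each phrase found in the scenario as a node of an `AssertionGraph` (also a hypothetical name). The scenario sentence is made up from the example factors above.

```python
from dataclasses import dataclass, field

# Hypothetical mini-lexicon (an assumption, not the real scenario analyzer):
# surface phrase -> factor type.
LEXICON = {
    "32-year old woman": "demographic",
    "type 1 diabetes mellitus": "preexisting condition",
    "progressive renal failure": "sign/symptom",
    "hemoglobin concentration is 9 g/dL": "test result",
    "normochromic cells": "test result",
    "normocytic cells": "test result",
}


@dataclass
class AssertionGraph:
    nodes: dict = field(default_factory=dict)  # node text -> factor type
    edges: list = field(default_factory=list)  # (source, predicate, target, strength)


def analyze_scenario(text, lexicon=LEXICON):
    """Add every lexicon phrase found (case-insensitively) in the text as a graph node."""
    graph = AssertionGraph()
    lowered = text.lower()
    for phrase, kind in lexicon.items():
        if phrase.lower() in lowered:
            graph.nodes[phrase] = kind
    return graph


scenario = ("A 32-year old woman with type 1 diabetes mellitus has progressive renal failure. "
            "Her hemoglobin concentration is 9 g/dL, with normochromic cells and normocytic cells.")
graph = analyze_scenario(scenario)
print(graph.nodes)
```

A real analyzer would need to handle paraphrase and negation; exact phrase matching is used here only to show where extracted factors end up.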

Node Prioritization
The next step is node prioritization, where we decide
which nodes in the graph are most important for
solving the problem. In a small scenario like this
example, we may be able to explore everything, but
in general this will not be the case. Factors that affect
the priority of a node may include the system’s confidence in the node assertion or the system’s estimation of how fruitful it would be to expand a node. For
example, normal test results and demographic information are generally less useful for starting a diagnosis than symptoms and abnormal test results.
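One simple way to combine the two factors named above is to multiply the system's confidence in a node by a weight estimating how fruitful that kind of node is to expand, and then keep the top few under a budget. The weights, the multiplicative scoring, and the name `prioritize` are assumptions for illustration, not the WatsonPaths formula.

```python
# Hypothetical "fruitfulness" weights per factor type: symptoms and abnormal
# results are favored over demographics and normal results, as the text suggests.
TYPE_WEIGHT = {
    "sign/symptom": 1.0,
    "abnormal test result": 1.0,
    "preexisting condition": 0.7,
    "normal test result": 0.2,
    "demographic": 0.2,
}


def prioritize(nodes, budget):
    """nodes: list of (name, factor_type, confidence).
    Return the names of the `budget` nodes with the highest confidence * weight."""
    scored = [(conf * TYPE_WEIGHT.get(kind, 0.5), name) for name, kind, conf in nodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [name for _, name in scored[:budget]]


factors = [
    ("32-year old woman", "demographic", 0.95),
    ("type 1 diabetes mellitus", "preexisting condition", 0.9),
    ("progressive renal failure", "sign/symptom", 0.9),
    ("hemoglobin concentration is 9 g/dL", "abnormal test result", 0.8),
]
print(prioritize(factors, budget=2))
```

Here the demographic factor scores lowest despite its high confidence, because its type weight is small.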

Relation Generation
The relation-generation step builds the assertion
graph. We do this primarily by asking Watson questions about the factors. In medicine we want to know
the causes of the findings and abnormal test results
that are consistent with the patient’s demographic
information and normal test results. Given the scenario in the Introduction, we could ask, “What does
type 1 diabetes mellitus cause?” We use a medical
ontology to guide the process of formulating subquestions to ask Watson. Relevant factors may also be
combined to form a single, more targeted question.
Because in this step we want to emphasize recall, we
take several of Watson’s highly ranked answers. The
exact number of answers taken, or alternatively the confidence
threshold, is a parameter that must be tuned. Given
a set of answers, we add them to the graph as nodes,
with edges from nodes that were used in questions to
nodes that were answers. The edge is labeled with the
predicate used to formulate the question (like causes
or indicates), and the strength of the edge is initially
set to Watson’s confidence in the answer.
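The loop below sketches this step under stated assumptions: `ask` stands in for the question-answering call and returns `(answer, confidence)` pairs, the question templates are hypothetical, and `top_k` and `min_conf` are the tunable parameters mentioned above. Each retained answer becomes a node, with an edge from the factor labeled by the predicate and weighted by the answer's confidence.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str, str, float]  # (source, predicate, target, strength)

# Hypothetical question templates keyed by the predicate that labels the edge.
TEMPLATES = {
    "causes": "What does {} cause?",
    "indicates": "What does {} indicate?",
}


def generate_relations(factor: str, predicate: str,
                       ask: Callable[[str], List[Tuple[str, float]]],
                       nodes: Set[str], edges: List[Edge],
                       top_k: int = 3, min_conf: float = 0.1) -> None:
    """Ask one subquestion about `factor` and post its highly ranked answers to the graph."""
    question = TEMPLATES[predicate].format(factor)
    ranked = sorted(ask(question), key=lambda pair: pair[1], reverse=True)
    for answer, conf in ranked[:top_k]:
        if conf < min_conf:
            break  # ranked is sorted, so every remaining answer is below threshold too
        nodes.add(answer)
        edges.append((factor, predicate, answer, conf))  # initial strength = answer confidence


def fake_watson(question: str) -> List[Tuple[str, float]]:
    # Stand-in for the QA system; answers and confidences are invented for illustration.
    return [("diabetic ketoacidosis", 0.05), ("diabetic nephropathy", 0.62),
            ("retinopathy", 0.31), ("neuropathy", 0.2)]


nodes = {"type 1 diabetes mellitus"}
edges: List[Edge] = []
generate_relations("type 1 diabetes mellitus", "causes", fake_watson, nodes, edges)
for edge in edges:
    print(edge)
```

Taking several answers rather than only the top one favors recall, as the text notes; precision is left to the later belief computation.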

Although Watson is the primary way we add edges
to the graph, WatsonPaths allows for any number of
relation generator components to post edges to the
graph. For instance, we apply term matchers to pairs
of nodes, and post a relation between nodes that
match.
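A pluggable design like this can be sketched as a common interface that every relation generator implements, with the graph collecting whatever edges each one posts. The `RelationGenerator` protocol, the `"matches"` predicate, and the token-overlap (Jaccard) matcher below are hypothetical stand-ins; the article does not say how its term matchers score a match.

```python
from itertools import combinations
from typing import Iterable, List, Protocol, Tuple

Edge = Tuple[str, str, str, float]  # (source, predicate, target, strength)


class RelationGenerator(Protocol):
    def generate(self, nodes: List[str]) -> Iterable[Edge]: ...


class TokenOverlapMatcher:
    """Hypothetical term matcher: links two nodes whose word sets overlap enough."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def generate(self, nodes: List[str]) -> Iterable[Edge]:
        for a, b in combinations(nodes, 2):
            ta, tb = set(a.lower().split()), set(b.lower().split())
            union = ta | tb
            if not union:
                continue  # skip empty node texts
            score = len(ta & tb) / len(union)  # Jaccard similarity of the word sets
            if score >= self.threshold:
                yield (a, "matches", b, score)


def run_generators(generators: List[RelationGenerator], nodes: List[str]) -> List[Edge]:
    """Let every generator post its edges; the graph simply accumulates them."""
    edges: List[Edge] = []
    for gen in generators:
        edges.extend(gen.generate(nodes))
    return edges


nodes = ["renal failure", "progressive renal failure", "anemia"]
print(run_generators([TokenOverlapMatcher()], nodes))
```

Because generators share only the edge format, a QA-based generator and a term matcher can be added or removed independently.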

Belief Computation
Once the assertion graph has been expanded in this
way, we recompute the confidences of nodes in the
graph based on new information. We do this using