Introduction

Research communities need a large corpus of representative,
relevant and interesting problems with which to evaluate their proposed
solutions meaningfully: systematically, repeatably and with statistically
significant results. Unfortunately, the knowledge representation and
reasoning community currently lacks such a corpus.

Clio Knows is an attempt to construct just such a corpus of knowledge
representation and reasoning problems, drawing its contents from readily
available accounts of real-world historical events and their
interpretations.

Construction Principles

Types of Information in the Corpus

The corpus contains four types of information:

Questions about historical events, e.g.

"How did Wellington react to the report of Napoleon's
death?"

One or more answers for each of the questions,
e.g.

"Wellington cried."

One or more explanations (or justifications)
for each of the answers given, e.g.

"Wellington cried
because he admired Napoleon as a general." or
"Wellington cried because he felt an era was coming to an
end."

Foreground and background knowledge needed to understand
the question and to arrive at an answer with
justifications, e.g.

"Wellington fought Napoleon at the
battle of Waterloo in 1815." "Great generals admire their opponent
generals." "Some people weep when people they admire
die."

(The line between foreground and background
knowledge is blurry, but at least intuitively the second and
third items of this knowledge appear more background-like than
the first, perhaps because they apply in more
situations.)
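The four types of information above nest naturally: a question has answers, each answer has justifications, and each justification draws on knowledge items. The following Python sketch is only an illustration of that nesting. The class and field names are hypothetical, not an existing Clio Knows schema, and the single `background` flag is a simplification of the blurry foreground/background distinction. It encodes the Wellington example, tagging the second and third knowledge items as background in line with the intuition above.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical record types for one corpus problem; the names are
# illustrative assumptions, not a published Clio Knows schema.


@dataclass
class KnowledgeItem:
    statement: str
    background: bool  # True = background knowledge, False = foreground (a simplification)


@dataclass
class Justification:
    explanation: str
    knowledge: List[KnowledgeItem] = field(default_factory=list)


@dataclass
class Answer:
    text: str
    justifications: List[Justification] = field(default_factory=list)


@dataclass
class Problem:
    question: str
    answers: List[Answer] = field(default_factory=list)

    def background_knowledge(self) -> List[str]:
        """Collect the distinct background statements used by any justification, in order."""
        seen: List[str] = []
        for answer in self.answers:
            for just in answer.justifications:
                for item in just.knowledge:
                    if item.background and item.statement not in seen:
                        seen.append(item.statement)
        return seen


# The Wellington example, encoded with the hypothetical schema above.
waterloo = KnowledgeItem("Wellington fought Napoleon at the battle of Waterloo in 1815.", background=False)
admire = KnowledgeItem("Great generals admire their opponent generals.", background=True)
weep = KnowledgeItem("Some people weep when people they admire die.", background=True)

problem = Problem(
    question="How did Wellington react to the report of Napoleon's death?",
    answers=[
        Answer(
            "Wellington cried.",
            [
                Justification("Wellington cried because he admired Napoleon as a general.",
                              [waterloo, admire, weep]),
                # No supporting knowledge is listed for this justification in the example.
                Justification("Wellington cried because he felt an era was coming to an end."),
            ],
        )
    ],
)

print(len(problem.answers[0].justifications))  # 2
print(problem.background_knowledge())
```

Keeping knowledge items attached to individual justifications, rather than to the problem as a whole, makes it possible to see which knowledge each alternative explanation actually depends on.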

Types of Corpus Contents Representation

While the corpus contains the types of information specified
above (questions, answers, justifications and required
knowledge), that information is stored in multiple forms, or
representations. Some of these representations are formal, as is usual
in knowledge representation and reasoning; others use natural
language.

A colloquial Natural Language representation of the
information; the Wellington-Napoleon example introduced above is of
this form.

Multiple formal language representations of the
information, in languages with well-specified semantics. Many of these
formal languages will be related to first-order predicate logic or
interesting fragments thereof. Notable examples include OWL, CycL,
Concept Graphs, the Situation Calculus and Prolog.

Multiple ontological grounding representations of the
information, each using one of the formal languages but drawing its
vocabulary from a different ontology. Notable examples include
ResearchCyc, SUMO, and any of a variety of OWL-compatible
ontologies.

An explicit Natural-Language representation of the
information, that is, a rendering in short natural-language sentences
that attempts to be as unambiguous as possible. For the justifications
of the answers, this representation also gives a fairly detailed proof
sketch. This representation, sometimes informally
called English Zero, is intended to bridge the gap between the
ambiguity of colloquial natural language and the rigor of formal
representations.
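To make the idea of an English Zero proof sketch more concrete, here is a minimal forward-chaining sketch in Python that derives the first Wellington justification and records each step as a short, explicit sentence. It assumes a toy encoding: ground facts are tuples, each background rule is a Python function, and predicate names such as `great_general` and `may_weep_for` are invented for illustration. The facts that both men were great generals and that Napoleon died are assumptions made explicit here. Because "Some people weep when people they admire die" is defeasible, the rule concludes only that Wellington *may* weep, not that he did.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, ...]
Step = Tuple[Fact, str]  # a derived fact plus its English-Zero-style explanation
Rule = Callable[[Set[Fact]], List[Step]]


def great_generals_admire_opponents(kb: Set[Fact]) -> List[Step]:
    """'Great generals admire their opponent generals.' Applied in both directions."""
    steps: List[Step] = []
    for fact in sorted(kb):  # sorted for a deterministic proof order
        if fact[0] != "fought":
            continue
        a, b = fact[1], fact[2]
        for x, y in ((a, b), (b, a)):
            if ("great_general", x) in kb and ("great_general", y) in kb:
                steps.append((("admires", x, y),
                              f"{x} and {y} are great generals and opponents, so {x} admires {y}."))
    return steps


def weep_for_admired_dead(kb: Set[Fact]) -> List[Step]:
    """'Some people weep when people they admire die.' Defeasible, so only 'may weep'."""
    steps: List[Step] = []
    for fact in sorted(kb):
        if fact[0] == "admires" and ("died", fact[2]) in kb:
            x, y = fact[1], fact[2]
            steps.append((("may_weep_for", x, y),
                          f"{x} admires {y} and {y} died, so {x} may weep for {y}."))
    return steps


def forward_chain(facts: Set[Fact], rules: List[Rule]) -> Tuple[Set[Fact], List[str]]:
    """Apply rules until no new facts appear; return all facts and the proof steps in order."""
    kb = set(facts)
    proof: List[str] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for new_fact, explanation in rule(kb):
                if new_fact not in kb:
                    kb.add(new_fact)
                    proof.append(explanation)
                    changed = True
    return kb, proof


# Assumed ground facts (hypothetical encoding of the Wellington example).
facts: Set[Fact] = {
    ("fought", "Wellington", "Napoleon"),
    ("great_general", "Wellington"),
    ("great_general", "Napoleon"),
    ("died", "Napoleon"),
}

kb, proof = forward_chain(facts, [great_generals_admire_opponents, weep_for_admired_dead])
for line in proof:
    print(line)
```

The second justification ("an era was coming to an end") would need different knowledge and rules; the point of the sketch is only that each derived fact carries an unambiguous sentence explaining how it was obtained.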

This multiplicity of representations is neither redundancy nor
accident; it is one of the research contributions the corpus
makes. One of the goals of this corpus is to help investigate which
formalisms and ontologies are best suited to historical research,
and along which dimensions. Providing multiple representations of what
aspires to be the same content supports research into
the relative strengths and weaknesses of different formal languages
and/or approaches to ontologizing concepts.

Likewise, supporting multiple natural languages (at least by
design) assists natural language processing research
that works with parallel corpora.

Organizational Considerations

TBD

How to Contribute

From people with basic literacy to researchers with advanced
degrees in history or knowledge representation, almost anyone can
contribute to Clio Knows.
Here are just some of the ways, in ascending order of required skill:

Submit a Problem: Merely submitting a historical
question is a big help. If the answer is readily available,
please include it; if not, submit the question anyway. For extra
credit, provide some or all of the sources used by historians to
support their answer to the question.

Convert a Problem to "English-Zero": Some people are
good at hunting down questions and citations, others are good at copy
editing and simplifying. Converting a problem to "English-Zero" is one
way the latter skill set can be put to good use. This includes
suggesting alternate forms of stating the problem; we do not
assume a 1:1 correspondence between colloquial and English-Zero
formulations.

Translate a Problem into another Natural Language:
Akin to the conversion to "English-Zero", this contribution includes
suggesting alternate forms of stating a problem in another natural
language. Again, we make no assumptions about a 1:1 correspondence
between the original natural language of a problem and its
translations into another natural language.

Suggest an alternate Proof: Often, there are multiple
ways to get the right answer, and we want to cover as many of them as
possible. Usually, an alternate proof either draws on different
sources or applies different rules and heuristics.

Formalize a Problem: Eventually, this corpus should be
available in all of the relevant formal representation languages and
ontologies. This means that each new colloquial problem will
eventually need n formalizations, one per relevant formal representation.

Support a new Formalism: Extending the corpus to
include a new formal representation language is a lot of
work. Ideally, much of it could be done by automatic conversion
from an existing formal representation, but that still requires
conversion tools.
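The goal of n formalizations per problem implies a simple bookkeeping task: for each problem, which formalizations are still missing? A minimal Python sketch follows. It assumes each formalization is identified by a hypothetical (formal language, ontology) pair and that the project chooses the list of target pairs; the pairs and problem identifiers below are placeholders, not a committed set.

```python
from typing import Dict, List, Set, Tuple

# A formalization target: (formal language, ontology). Hypothetical encoding.
Target = Tuple[str, str]

# Placeholder targets; the actual set of languages and ontologies is up to the project.
TARGETS: List[Target] = [
    ("CycL", "ResearchCyc"),
    ("OWL", "SUMO"),
    ("Prolog", "SUMO"),
]


def missing_formalizations(
    done: Dict[str, Set[Target]], targets: List[Target]
) -> Dict[str, List[Target]]:
    """For each problem id, list the targets it has not been formalized in yet."""
    return {pid: [t for t in targets if t not in have] for pid, have in done.items()}


def worklist(gaps: Dict[str, List[Target]]) -> List[str]:
    """Order problem ids so that the least-formalized problems come first."""
    return sorted(gaps, key=lambda pid: len(gaps[pid]), reverse=True)


# Hypothetical problem ids and formalization progress.
done: Dict[str, Set[Target]] = {
    "wellington-weeps": {("CycL", "ResearchCyc")},
    "new-problem": set(),
}

gaps = missing_formalizations(done, TARGETS)
print(len(TARGETS))  # 3, i.e. n for this placeholder target list
print(gaps["wellington-weeps"])
print(worklist(gaps))
```

Here n is simply the length of the target list. Sorting problems by their number of gaps is one possible way to point contributors at the least-formalized problems first.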