Abstract. This paper presents a system for “metagnostic question answering” from biomedical texts. Such a system can answer questions from texts and give explanations for its answers that exhibit “self-awareness”. This is accomplished by representing and processing its state of reasoning. The system is implemented in Prolog and uses a question grammar combined with a text grammar. It answers the input questions using these two grammars, reasoning rules, lexicons, ontologies and the history of its own state. The semantics of the question grammar are expressed in terms of combinations of predicates that analyze chains of text chunks recognized by the text grammar. Inference of implicit facts is performed directly from the text without using any formal representation of it. An evaluation with real sentences downloaded from PubMed gave satisfactory results.

Keywords: question answering, biomedical texts, explanation

1 Introduction

In this paper we propose the term “metagnostic” in place of the term “metacognitive” used in [1] for question answering systems that exhibit self-awareness. We do so to avoid any confusion arising from the psychological connotations of the adjective “metacognitive”, which may be associated with human metacognition [2]. We do not aim to model human metacognition in any way; instead, we explore how computer-based question answering systems can be implemented so that their answers exhibit machine metacognition. The nature of such systems is manifested by the fact that the explanations they generate refer explicitly to the history of their reasoning. This kind of explanation is particularly important in biomedicine, because a user in the biomedical field will not readily accept an answer unless it is accompanied by sufficiently convincing explanations of its correctness. State-of-the-art question answering systems, and biomedical question answering systems in particular, still lack the capacity to generate explanations that exhibit self-awareness [3], [4].

To the best of our knowledge, the only question answering systems in existence and under current development that exhibit self-awareness are the CASSIE system [5], the EPILOG system [6] and the system presented in [1]. The first two systems draw the information for their answers solely from formal knowledge bases and have no relation to biomedical applications, while the system presented in [1] is restricted to answering questions from Euclidean geometry proof texts. In contrast, the metagnostic question answering system presented in this paper accepts natural language questions and generates answers with information extracted from unstructured natural language biomedical texts. Another question answering system with some similarity of function is COGEX, presented in [7]. That system, however, is based on the translation of texts into a logical formalism and generates answer justifications of a somewhat formal nature; ours does neither.

The answers generated by our system contain explanations that are generated either directly or after applying deductive inference, using meta-information concerning the state history of the system. The internal record of the state sequence of the system contains information concerning the strategy followed and the progress of the reasoning performed. This information is used to generate automatically explanations of the answers given to the input questions, with the aim of persuading the user of their correctness.

Deductive inference from texts by computer is traditionally performed in two stages. In the first stage the text is translated, by computer or by hand, into some formal representation. In the second stage reasoning is performed on this formal representation. In contrast, we avoid the translation step and perform the deduction necessary for answering the questions directly from the texts, as was first proposed for deductive question answering from biomedical texts in [8] and further elaborated in [9]. The main advantage of this method as applied to scientific text is that there is no need for retranslation whenever a change is made in the ontology or in the meaning of the technical terms of the question answering domain.

2 An Illustrative Example

An example will be used to illustrate the operation of our system using two abstracts found in PubMed that concern the very important protein p53. This operation includes the question answering process of the system and the manifestation of “self-awareness” in the explanation generated. The first abstract consists of six sentences from which the following two sentence fragments were selected automatically using the entities p53 and/or mdm2 as keywords:

“The p53 protein regulates the mdm2 gene”
“regulates both the activity of the p53 protein”

The second abstract consists of seven sentences, two of which were selected; from these, the following two fragments were selected automatically, again using p53 and/or mdm2 as keywords:

The example question “What influences p53” is input to the system and it is answered using the above sentence fragments.

Our system discovers a causal feedback loop by appropriately chaining the relevant sentence fragments, and it reports and explains its finding as “p53 influences p53 and therefore there is a feedback loop for p53”.
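The chaining idea described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the Prolog implementation of the system: the relation triples, the function names `find_influence_chain` and `has_feedback_loop`, and in particular the subject `mdm2` of the second triple are hypothetical, since the quoted fragment does not show its subject. The sketch also works on pre-extracted triples, whereas the system itself reasons directly over text chunks.

```python
from collections import deque

# Hypothetical (subject, verb, object) triples standing in for text chunks.
# The subject "mdm2" of the second triple is an assumption made for this sketch.
RELATIONS = [
    ("p53", "regulates", "mdm2"),
    ("mdm2", "regulates", "p53"),
]


def find_influence_chain(relations, source, target):
    """Return the list of triples leading from source to target, or None.

    Breadth-first search in which every causal verb counts as an influence.
    """
    queue = deque([(source, [])])
    visited = set()
    while queue:
        entity, path = queue.popleft()
        for triple in relations:
            subject, _verb, obj = triple
            if subject != entity:
                continue
            new_path = path + [triple]
            if obj == target:
                return new_path
            if obj not in visited:
                visited.add(obj)
                queue.append((obj, new_path))
    return None


def has_feedback_loop(relations, entity):
    """An entity is in a feedback loop if some chain leads from it back to itself."""
    return find_influence_chain(relations, entity, entity) is not None


chain = find_influence_chain(RELATIONS, "p53", "p53")
```

With these illustrative triples, `chain` holds the two relations whose composition supports a statement of the form “p53 influences p53”.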

3 The System Implementation and Operation

The question answering system presented here was implemented in Prolog and consists of three modules implemented as separate programs totaling about 40 pages of code. These modules communicate through temporary files that store intermediate results. They are: a question processing module, a text chunking module and a question answering module.
The form of the questions accepted by the question processing module is “What is <verb> by <entity>?”, where <verb> stands for the passive form of a member of the group of verbs known to our system as biomedically significant verbs and <entity> stands for a noun denoting a biomedical entity such as a protein or a gene. Examples of such verbs are regulate, enhance, influence and inhibit. Examples of entities are p53 and mdm2. The analysis performed by the question processing module extracts the components of the question, such as the verb denoting a relation (“influenced” in the example question) and the entity (“p53” in the example question). These components are stored in the internal database. The progress of the deductive inference in the question answering module can be monitored by following and recording the use of the sentences of the text and of the prerequisite knowledge that supports the generation of the explanations provided as deduced answers to the user's questions. This monitoring, together with the awareness of the strategy being followed, constitutes the main element of the self-aware nature of our question answering system.
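The question analysis step described above can be approximated as follows. This is a hedged Python sketch rather than the Prolog question grammar of the system: the regular expression, the function name `parse_question` and the two small lexicons are assumptions for illustration, populated only with the example verbs and entities mentioned in the text.

```python
import re

# Illustrative lexicons (assumed); they contain only the examples named in the text.
PASSIVE_VERBS = {
    "regulated": "regulate",
    "enhanced": "enhance",
    "influenced": "influence",
    "inhibited": "inhibit",
}
ENTITIES = {"p53", "mdm2"}

# Matches questions of the form "What is <passive verb> by <entity> ?"
QUESTION_PATTERN = re.compile(
    r"^\s*What\s+is\s+(\w+)\s+by\s+(\w+)\s*\?\s*$", re.IGNORECASE
)


def parse_question(question):
    """Extract the relation verb and the entity, or return None if unrecognized."""
    match = QUESTION_PATTERN.match(question)
    if match is None:
        return None
    verb_form = match.group(1).lower()
    entity = match.group(2).lower()
    if verb_form not in PASSIVE_VERBS or entity not in ENTITIES:
        return None
    return {"verb": PASSIVE_VERBS[verb_form], "entity": entity}


components = parse_question("What is influenced by p53 ?")
```

The returned dictionary plays the role of the components stored in the internal database; a question whose verb or entity is missing from the lexicons is rejected.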

4 The Illustrative Example Explanation Generated

The explanation generated when answering the question of the illustrative example described above is presented here. The system is able to identify the two sentences of the input text that define the p53 feedback loop. These sentences, with their identification numbers 1411 and 1421, are:

<1411>:
<1421>:

The example explanation, reproduced verbatim as generated by our system, runs as follows:

Your question is :

I succeeded answering this question using strategy :

With its parameters taking the values :

, , , , ,

which means that I am testing for causality using

no synonyms, no inversion and no prerequisite knowledge

I searched the input for the entity .

in the chunk of the sentence <1411>.

Since its first token is not an entity, I tested the rest of the tokens.

Repeating the search

I found that the entity is one of the tokens of the chunk

which is the chunk to the left of the verb of the sentence <1411>.

I found that the chunk to the right of the verb of the sentence <1411>

is the chunk and

Since its first token is not an entity, I tested the rest of the tokens.

Repeating the search the entity is one of the tokens of the chunk

which is the chunk to the right of the verb of the sentence <1411>.

I found that the entity is included in the sentence<1411>:

I searched then for occurrence of the entity as follows:

I searched the input for the entity .

in the chunk of the sentence <1411>.

Since its first token is not an entity, I tested the rest of the tokens.

Repeating the search

I found that the entity is not a token of the chunk

of the sentence and for this reason I will test the following sentence

in the chunk

of the sentence <1421>.

Since its first token is not an entity, I tested the rest of the tokens.

Repeating the search

I found that the entity is one of the tokens of the chunk

which is the chunk to the left of the verb of the sentence <1421>.

I found that the chunk to the right of the verb of the sentence <1421>

is the chunk and

is the first token of the chunk

the entity is one of the tokens of the chunk

which is the chunk to the right of the verb of the sentence <1421>.

Therefore answering the question

I found that from the following sentences:

<1411>:

<1421>:

It follows that is influenced by

Therefore there is a process loop of p53.
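An explanation of this kind can be produced by recording each reasoning step as it happens and rendering the record afterwards. The following Python sketch illustrates that idea only; the event names, the message templates, the token list and the functions `search_chunk` and `explain` are assumptions, and the actual Prolog state representation of the system is not shown in the paper.

```python
# Message templates keyed by event kind (assumed wording, modeled on the trace above).
TEMPLATES = {
    "search": "I searched the input for the entity {0} in the sentence <{1}>.",
    "first_token": "{0} is the first token of the chunk to the {2} of the verb of the sentence <{1}>.",
    "first_token_not_entity": "Since its first token is not an entity, I tested the rest of the tokens.",
    "found": "I found that the entity {0} is one of the tokens of the chunk to the {2} of the verb of the sentence <{1}>.",
    "not_found": "I found that the entity {0} is not a token of the chunk of the sentence <{1}>.",
}


def search_chunk(entity, chunk_tokens, sentence_id, side, history):
    """Look for entity among the tokens of one chunk, appending each step to history."""
    history.append(("search", entity, sentence_id))
    if chunk_tokens and chunk_tokens[0] == entity:
        history.append(("first_token", entity, sentence_id, side))
        return True
    history.append(("first_token_not_entity",))
    if entity in chunk_tokens[1:]:
        history.append(("found", entity, sentence_id, side))
        return True
    history.append(("not_found", entity, sentence_id))
    return False


def explain(history):
    """Render the recorded state history as explanation sentences."""
    return [TEMPLATES[event[0]].format(*event[1:]) for event in history]


history = []
# Hypothetical tokens for the chunk to the left of the verb of sentence 1411.
found = search_chunk("p53", ["the", "p53", "protein"], 1411, "left", history)
explanation = explain(history)
```

Keeping the history separate from the rendering means the same record could also support other presentations, such as a shorter summary of the strategy followed.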

5 Evaluation

The performance of the system was evaluated using a set of 129 sentences obtained from the PubMed database, selected from the titles of papers. The selection criteria were that each sentence contain the name of the protein p53 and the influence verb “enhance”. These 129 sentences are all those returned by PubMed on October 7, 2008. This set constituted the input text to the system, and the answer and the explanation given to the question “What is influenced by p53?” were examined for correctness. The results, quantified in terms of precision and recall computed in the usual way, are: Precision = 97.5% and Recall = 88.9%.
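For reference, the standard definitions of the two measures are precision = TP / (TP + FP) and recall = TP / (TP + FN), where TP, FP and FN are the numbers of true positives, false positives and false negatives. The Python sketch below computes them; the counts in the usage line are toy values chosen for illustration and are not the counts of the evaluation reported here, which the paper does not list.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts, using 0.0 for empty denominators."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Toy counts only (assumed), unrelated to the reported results.
precision, recall = precision_recall(9, 1, 3)
```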

6 Conclusion

This paper presented a system for metagnostic question answering from biomedical texts. The system answers questions from biomedical texts and provides explanations that exhibit self-awareness. This is accomplished by representing and processing its own state of automatic question and text analysis. The explanations are offered in order to convince the user of the correctness of the answers. The system was implemented in Prolog and uses a question grammar combined with a text grammar. These two grammars use rules, lexicons, ontologies and the history of the state of the system to answer the input questions. The questions are first analyzed into components using the question grammar. Our system can be applied either to biomedical metacognitive tutoring or to the answering of questions from biomedical text for use by research and/or practicing biomedical personnel.