Paper Details

Most answers to natural language questions can now be found in the first few results returned by web search engines. We describe WAMBy, a question answering system that attempts to extract answers from the top 10 Google results. We propose a new ranking method for factoid answers that takes into account, among other features, the semantic similarity between the context of each answer candidate and the question, as well as the named entities found in the titles and snippets of the search results. The proposed method gave promising results on a variety of questions. We also describe the methods used for non-factoid answer extraction and ranking. An important part of the system is a new text similarity measure that extends TF-IDF by utilizing word vectors. The new measure addresses the problem of synonyms and improves on the performance of TF-IDF in the paraphrase identification task. The source code of WAMBy is publicly available, along with two datasets that were created for evaluating question classification and factoid question answering.
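
To make the idea of extending TF-IDF with word vectors concrete, the following is a minimal sketch of one well-known way to do it, the soft cosine measure, in which two different words still contribute to the similarity score in proportion to the cosine similarity of their vectors. This is an illustrative stand-in and not necessarily the measure used in WAMBy, which the abstract does not specify. The two-dimensional vectors in TOY_VECTORS, the default IDF weight of 1.0, and all function names are assumptions made up for this example.

```python
import math
from collections import Counter

# Toy 2-d word vectors, invented for illustration only (not real embeddings).
TOY_VECTORS = {
    "car": (1.0, 0.1),
    "automobile": (0.9, 0.2),
    "fast": (0.1, 1.0),
    "quick": (0.2, 0.9),
    "banana": (-1.0, 0.3),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_sim(w1, w2, vectors):
    """1.0 for identical words, clipped vector cosine otherwise, 0.0 if unknown."""
    if w1 == w2:
        return 1.0
    if w1 in vectors and w2 in vectors:
        return max(0.0, cosine(vectors[w1], vectors[w2]))
    return 0.0


def tfidf(tokens, idf):
    """Raw term frequency times IDF; words missing from `idf` get weight 1.0."""
    counts = Counter(tokens)
    return {w: c * idf.get(w, 1.0) for w, c in counts.items()}


def soft_dot(x, y, vectors):
    """Dot product in which every pair of words is weighted by word_sim."""
    return sum(
        wx * wy * word_sim(a, b, vectors)
        for a, wx in x.items()
        for b, wy in y.items()
    )


def soft_tfidf_similarity(tokens_a, tokens_b, idf, vectors):
    """Soft cosine between the TF-IDF representations of two token lists."""
    x, y = tfidf(tokens_a, idf), tfidf(tokens_b, idf)
    denom = math.sqrt(soft_dot(x, x, vectors)) * math.sqrt(soft_dot(y, y, vectors))
    return soft_dot(x, y, vectors) / denom if denom else 0.0


def plain_tfidf_similarity(tokens_a, tokens_b, idf):
    """With no word vectors, only exact matches count: ordinary TF-IDF cosine."""
    return soft_tfidf_similarity(tokens_a, tokens_b, idf, {})


if __name__ == "__main__":
    a, b = ["fast", "car"], ["quick", "automobile"]
    print(plain_tfidf_similarity(a, b, {}))             # no shared words: 0.0
    print(soft_tfidf_similarity(a, b, {}, TOY_VECTORS))  # synonyms now count
```

In this sketch the two token lists share no words, so ordinary TF-IDF cosine scores them as unrelated, while the soft variant gives them a high score because "fast"/"quick" and "car"/"automobile" have nearby toy vectors. That is the synonym problem the abstract refers to; a real system would use pretrained word vectors and IDF weights estimated from a corpus.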