
Revision as of 21:57, 15 November 2010

Knowledge resources have shown their relevance for applied semantic inference, and are extensively used by applied inference systems, such as those developed within the Textual Entailment framework.

This page presents a list of the knowledge resources used by systems that have participated in the last RTE challenges. The first table lists publicly available resources; the second lists unpublished resources. Both tables are sortable by resource name, type, author, and number of users.

RTE participants are encouraged to add information about all kinds of knowledge resources used, from standard existing resources (e.g. WordNet) to knowledge collections created for specific purposes, which can be made available to the community.


Call for Resources

To advance research in the field, all participants are invited to contribute by sharing their own resources with the RTE community.
Making a resource available for use by other systems has several advantages: on the one hand, it helps improve TE technology; on the other hand, it offers an opportunity to further test and evaluate the resource.

Ablation Tests

An ablation test consists of removing one module at a time from a system and rerunning the system on the test set with the remaining modules.
Ablation tests are meant to help better understand the relevance of the knowledge resources used by RTE systems, and to evaluate the contribution of each resource to system performance. Comparing the results achieved in the ablation tests to those obtained by the full system allows assessing the contribution of each single resource.
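The comparison described above reduces to a simple difference: a resource's contribution is the drop in accuracy observed when it is removed. A minimal sketch, with hypothetical accuracy scores:

```python
def ablation_contribution(full_accuracy, ablated_accuracy):
    """Contribution of the ablated resource: the drop in accuracy
    when that resource is removed from the system."""
    return full_accuracy - ablated_accuracy

# Hypothetical scores on an RTE test set (illustrative only)
full = 0.632              # accuracy of the complete system
without_wordnet = 0.601   # same system rerun with WordNet removed

# A positive difference suggests the resource helps the system
print(ablation_contribution(full, without_wordnet))
```

A negative value would indicate that the system actually performs better without the resource, which ablation studies in past RTE challenges have occasionally reported.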

Human Language Technology Research Institute, University of Texas at Dallas

An extension of WordNet that exploits the information contained in WordNet's definitional glosses: the glosses are syntactically parsed, transformed into logic forms, and their content words are semantically disambiguated. Extended WordNet is an ongoing project.

The resource is the result of applying a learning algorithm that induces semantic taxonomies from parsed text. The algorithm automatically acquires items of world knowledge and uses them to produce significantly enhanced versions of WordNet (adding up to 40,000 synsets).

Peter Mark Roget (Electronic version distributed by University of Chicago)

Roget's Thesaurus is a widely used English thesaurus, begun by Dr. Peter Mark Roget in 1805 and first published in 1852. The original edition had 15,000 words, and each new edition has been larger. The electronic edition (version 1.02) is made available by the University of Chicago.

Light-weight and extensible ontology. It contains more than 2 million entities and 20 million facts about these entities. The facts have been automatically extracted from Wikipedia and unified with WordNet.

DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. The DBpedia knowledge base currently describes more than 2.9 million things in 91 different languages and consists of 479 million pieces of information.

DIRT (Discovery of Inference Rules from Text) is both an algorithm and a resulting knowledge collection. The DIRT knowledge collection is the output of the DIRT algorithm over a 1GB set of newspaper text.
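DIRT rules pair a dependency path with near-equivalent paths (e.g. "X wrote Y" with "X is the author of Y"). A simplified sketch of applying such rules, with path templates represented as plain strings rather than actual dependency paths, and with a hypothetical rule:

```python
# Hypothetical DIRT-style paraphrase rules: each template maps to a
# list of near-equivalent templates (real DIRT rules are dependency
# paths, simplified here to surface strings)
rules = {
    "X wrote Y": ["X is the author of Y", "Y was written by X"],
}

def expand(template, x, y):
    """Instantiate a path template and all its known paraphrases
    with concrete arguments for the X and Y slots."""
    variants = [template] + rules.get(template, [])
    return [v.replace("X", x).replace("Y", y) for v in variants]

print(expand("X wrote Y", "Orwell", "1984"))
```

An entailment system can use such expansions to match a text against a hypothesis that expresses the same relation with different wording.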

Co-occurrence statistics for word pairs in RTE3 and RTE4, computed with the Normalized Google Distance (Cilibrasi and Vitanyi, 2004). The word pairs are all possible combinations of content words in T and H. In practice, Yahoo! was used as the search engine.
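The Normalized Google Distance is computed from search-engine hit counts: terms that frequently co-occur get a small distance. A minimal sketch of the standard formula, with hypothetical hit counts:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page-hit counts:
    fx, fy -- hits for each term alone
    fxy    -- hits for a query containing both terms
    n      -- total number of pages indexed by the engine"""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical hit counts for two closely related words:
# high co-occurrence relative to the individual counts
# yields a distance near 0
print(ngd(9_000_000, 8_000_000, 6_000_000, 10_000_000_000))
```

Values near 0 indicate strongly associated terms; larger values indicate terms that rarely appear together.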


Text file containing 5,800 pairs of sentences extracted from news sources on the web, along with human annotations indicating whether each pair captures a paraphrase/semantic equivalence relationship.
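A corpus of this kind is typically distributed as tab-separated lines with a binary label and the two sentences. A minimal reader sketch, assuming a layout with a label column, two ID columns, and the two sentence strings (the sample line below is invented for illustration):

```python
import csv
import io

# Hypothetical two-line sample in a tab-separated layout:
# a binary label, two sentence IDs, and the two sentences
sample = (
    "Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n"
    "1\t101\t102\tThe company bought the firm.\t"
    "The firm was acquired by the company.\n"
)

def read_pairs(handle):
    """Yield (label, sentence1, sentence2) tuples;
    label 1 = paraphrase, 0 = not a paraphrase."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)                      # skip the header row
    for label, _id1, _id2, s1, s2 in reader:
        yield int(label), s1, s2

for label, s1, s2 in read_pairs(io.StringIO(sample)):
    print(label, s1, "<->", s2)
```

Pairs labeled as paraphrases can then serve as positive entailment evidence in both directions.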

Extraction of about 8 million lexical reference rules from the text body (first sentence) and from metadata (links, redirects, parentheses) of Wikipedia. Provides better performance than other automatically constructed resources, comparable performance to WordNet, and knowledge complementary to WordNet.

This is a resource of directional distributional term-similarity rules (mostly lexical entailment rules) automatically extracted using the inclusion relation, as described in (Kotlerman et al., ACL-09).
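The intuition behind inclusion-based directional similarity is that a narrower term's distributional features should be largely covered by those of a broader term, but not vice versa. A simplified sketch of one such measure (the weighted fraction of the narrower term's feature mass covered by the broader term; the exact measure of Kotlerman et al. is the more elaborate balAPinc), with invented feature weights:

```python
def directional_similarity(narrow, broad):
    """Inclusion-based directional score: the fraction of the
    narrower term's total feature weight that is covered by
    features it shares with the broader term."""
    shared = set(narrow) & set(broad)
    covered = sum(narrow[f] for f in shared)
    total = sum(narrow.values())
    return covered / total if total else 0.0

# Hypothetical distributional features (context word -> weight)
dog = {"bark": 0.9, "leash": 0.7, "pet": 0.5}
animal = {"pet": 0.6, "zoo": 0.8, "bark": 0.3}

print(directional_similarity(dog, animal))   # dog -> animal
print(directional_similarity(animal, dog))   # animal -> dog, lower
```

The asymmetry is the point: a high score in one direction and a low score in the other suggests a lexical entailment rule from the narrower term to the broader one.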

This links to a wiki for the Explanation-Based Analysis annotation described in our ACL 2010 paper [[http://l2r.cs.uiuc.edu/~danr/Papers/SammonsVyRo10.pdf]], with the annotations, the annotation instructions, and an invitation to participate in a community-based annotation effort.

Not Publicly Available Resources

The following table lists the unpublished resources used by RTE participants. Some of them have been developed by the participants themselves specifically for RTE. Interested readers may contact the authors to obtain further information.

Ontology containing geographic terms and two kinds of relations: the directional part-of relation, and the equal relation for synonyms and abbreviations of the same geographic area (e.g. the United Kingdom, the UK, Great Britain, etc.).