Method for Constructing Corpus-Based Thesaurus

Publishing Venue

IBM

Related People

Uramoto, N: AUTHOR

Abstract

Disclosed is a method for locating unknown words in an existing thesaurus using statistical data from large-scale corpora. The input to the method is a word that does not appear in the thesaurus. The output is the part of the thesaurus (a sub-tree) in which the input word should be located.

Country

United States

Language

English (United States)

This text was extracted from an ASCII text file.

This is the abbreviated version, containing approximately 82% of the total text.

Disclosed is a method for locating unknown words in an existing thesaurus using statistical data from large-scale corpora. The input to the method is a word that does not appear in the thesaurus. The output is the part of the thesaurus (a sub-tree) in which the input word should be located.

This method consists of the following four parts:

1. The co-occurrence data for the unknown word are extracted from the corpora. Example A shows 3-gram data for the input word "bus". Each datum is generalized by deleting the unknown word from it; the results are called the co-occurrence patterns for the unknown word (Example B).

2. The co-occurrence data for the words in the thesaurus are extracted using the same method described in (1), and the co-occurrence patterns for the words in the thesaurus are created.

3. The similarity value between the co-occurrence patterns for the unknown word and the co-occurrence patterns for each word in the thesaurus is calculated. A word in the thesaurus is marked if the value exceeds a certain threshold.

4. Sub-trees are constructed from the marked words. The sub-tree that contains the largest number of marked words is selected as the place where the input word should be located.
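
The four parts above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation. The text does not name a similarity measure, so Jaccard overlap is assumed here. The thesaurus is reduced to a hypothetical word-to-parent mapping (`parent_of`), so that a "sub-tree" is simply the set of marked words sharing a parent. The function names, the `_` placeholder, the threshold value, and the toy corpus are all invented for the example.

```python
from collections import defaultdict

UNKNOWN = "_"  # placeholder that replaces the target word in a pattern (assumption)


def cooccurrence_patterns(tokens, word, n=3):
    """Parts 1-2: collect the n-grams containing `word`, with `word` blanked out."""
    patterns = set()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if word in gram:
            patterns.add(tuple(UNKNOWN if t == word else t for t in gram))
    return patterns


def jaccard(a, b):
    """Assumed similarity measure; the source does not specify one."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def locate(tokens, unknown, parent_of, threshold=0.2):
    """Parts 3-4: mark similar thesaurus words, then pick the fullest sub-tree."""
    target = cooccurrence_patterns(tokens, unknown)
    marked_by_parent = defaultdict(list)
    for word, parent in parent_of.items():
        similarity = jaccard(target, cooccurrence_patterns(tokens, word))
        if similarity > threshold:  # mark the word
            marked_by_parent[parent].append(word)
    if not marked_by_parent:
        return None, []
    best = max(marked_by_parent, key=lambda p: len(marked_by_parent[p]))
    return best, sorted(marked_by_parent[best])


# Toy corpus and a two-branch toy thesaurus (both hypothetical).
tokens = (
    "take the bus to work take the car to work take the train to work "
    "eat the apple for lunch eat the pear for lunch"
).split()
parent_of = {"car": "vehicle", "train": "vehicle", "apple": "fruit", "pear": "fruit"}

result = locate(tokens, "bus", parent_of)
print(result)
```

On this toy data, "bus" shares all of its 3-gram patterns with "car" and "train" and none with "apple" or "pear", so the sketch places it under the "vehicle" branch. A real thesaurus would have deeper nesting, and the sub-tree construction in part 4 would need to account for that.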