wninput(5WN)

Name

Description

WordNet's source files are written by lexicographers. They are the product of a detailed relational analysis of lexical semantics: a variety of lexical and semantic relations are used to represent the organization of lexical knowledge. Two kinds of building blocks are distinguished in the source files: word forms and word meanings. Word forms are represented in their familiar orthography; word meanings are represented by synonym sets (synset s) - lists of synonymous word forms that are interchangeable in some context. Two kinds of relations are recognized: lexical and semantic. Lexical relations hold between word forms; semantic relations hold between word meanings.

Lexicographer files correspond to the syntactic categories implemented in WordNet - noun, verb, adjective and adverb. All of the synsets in a lexicographer file are in the same syntactic category. Each synset consists of a list of synonymous words or collocations (eg. "fountain pen" , "take in" ), and pointers that describe the relations between this synset and other synsets. These relations include (but are not limited to) hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy. A word or collocation may appear in more than one synset, and in more than one part of speech. Each use of a word in a synset represents a sense of that word in the part of speech corresponding to the synset.

Adjectives may be organized into clusters containing head synsets and satellite synsets. Adverbs generally point to the adjectives from which they are derived.

See wngloss(7WN) for a glossary of WordNet terminology and a discussion of the database's content and logical organization.

Lexicographer File Names

The names of the lexicographer files are of the form:

pos.suffix

where pos is either noun , verb , adj or adv . suffix may be used to organize groups of synsets into different files, for example noun.animal and noun.plant . See lexnames(5WN) for a list of lexicographer file names that are used in building WordNet.

Pointers

Pointers are used to represent the relations between the words in one synset and another. Semantic pointers represent relations between word meanings, and therefore pertain to all of the words in the source and target synsets. Lexical pointers represent relations between word forms, and pertain only to specific words in the source and target synsets. The following pointer types are usually used to indicate lexical relations: Antonym, Pertainym, Participle, Also See, Derivationally Related. The remaining pointer types are generally used to represent semantic relations.

A relation from a source to a target synset is formed by specifying a word from the target synset in the source synset, followed by the pointer_symbol indicating the pointer type. The location of a pointer within a synset defines it as either lexical or semantic. The Lexicographer File Format section describes the syntax for entering a semantic pointer, and Word Syntax describes the syntax for entering a lexical pointer.

Although there are many pointer types, only certain types of relations are permitted between synsets of each syntactic category.

Many pointer types are reflexive, meaning that if a synset contains a pointer to another synset, the other synset should contain a corresponding reflexive pointer. grind(1WN) automatically inserts missing reflexive pointers for the following pointer types:

Pointer

Reflect

Antonym

Antonym

Hyponym

Hypernym

Hypernym

Hyponym

Instance Hyponym

Instance Hypernym

Instance Hypernym

Instance Hyponym

Holonym

Meronym

Meronym

Holonym

Similar to

Similar to

Attribute

Attribute

Verb Group

Verb Group

Derivationally Related

Derivationally Related

Domain of synset

Member of Doman

Verb Frames

Each verb synset contains a list of generic sentence frames illustrating the types of simple sentences in which the verbs in the synset can be used. For some verb senses, example sentences illustrating actual uses of the verb are provided. (See Verb Example Sentences in wndb(5WN) .) Whenever there is no example sentence, the generic sentence frames specified by the lexicographer are used. The generic sentence frames are entered in a synset as a comma-separated list of integer frame numbers. The following list is the text of the generic frames, preceded by their frame numbers:

Lexicographer File Format

Synsets are entered one per line, and each line is terminated with a newline character. A line containing a synset may be as long as necessary, but no newlines can be entered within a synset. Within a synset, spaces or tabs may be used to separate entities. Items enclosed in italicized square brackets may not be present.

The general synset syntax is:

{ words pointers ( gloss ) }

Synsets of this form are valid for all syntactic categories except verb, and are referred to as basic synsets. At least one word and a gloss are required to form a valid synset. Pointers entered following all the words in a synset represent semantic relations between all the words in the source and target synsets.

For verbs, the basic synset syntax is defined as follows:

{ words pointers frames ( gloss ) }

Adjective may be organized into clusters containing one or more head synsets and optional satellite synsets. Adjective clusters are of the form:

Each adjective cluster is enclosed in square brackets, and may have one or more parts. Each part consists of a head synset and optional satellite synsets that are conceptually similar to the head synset's meaning. Parts of a cluster are separated by one or more hyphens (- ) on a line by themselves, with the terminating square bracket following the last synset. Head and satellite synsets follow the syntax of basic synsets, however a "Similar to" pointer must be specified in a head synset for each of its satellite synsets. Most adjective clusters contain two antonymous parts. See wngloss(7WN) for a discussion of adjective clusters, and Special Adjective Syntax for more information on adjective cluster syntax.

Synsets for relational adjectives (pertainyms) and participial adjectives do not adhere to the cluster structure. They use the basic synset syntax.

Comments can be entered in a lexicographer file by enclosing the text of the comment in parentheses. Note that comments cannot appear within a synset, as parentheses within a synset have an entirely different meaning (see Gloss Syntax ). However, entire synsets (or adjective clusters) can be "commented out" by enclosing them in parentheses. This is often used by the lexicographers to verify the syntax of files under development or to leave a note to oneself while working on entries.

Word Syntax

A synset must have at least one word, and the words of a synset must appear after the opening brace and before any other synset constructs. A word may be entered in either the simple word or word/pointer syntax.

A simple word is of the form:

word[ ( marker ) ][lex_id] ,

word may be entered in any combination of upper and lower case unless it is in an adjective cluster. A collocation is entered by joining the individual words with an underscore character (_ ). Numbers (integer or real) may be entered, either by themselves or as part of a word string, by following the number with a double quote (" ).

See Special Adjective Syntax for a description of adjective clusters and markers.

word may be followed by an integer lex_id from 1 to 15 . The lex_id is used to distinguish different senses of the same word within a lexicographer file. The lexicographer assigns lex_id values, usually in ascending order, although there is no requirement that the numbers be consecutive. The default is 0 , and does not have to be specified. A lex_id must be used on pointers if the desired sense has a non-zero lex_id in its synset specification.

Word/pointer syntax is of the form:

[ word[ ( marker ) ][lex_id] , pointers ]

This syntax is used when one or more pointers correspond only to the specific word in the word/pointer set, rather than all the words in the synset, and represents a lexical relation. Note that a word/pointer set appears within a synset, therefore the square brackets used to enclose it are treated differently from those used to define an adjective cluster. Only one word can be specified in each word/pointer set, and any number of pointers may be included. A synset can have any number of word/pointer sets. Each is treated by grind(1WN) essentially as a word , so they all must appear before any synset pointers representing semantic relations.

For verbs, the word/pointer syntax is extended in the following manner to allow the user to specify generic sentence frames that, like pointers, correspond only to a specific word, rather than all the words in the synset. In this case, pointers are optional.

Pointers are optional in synsets. If a pointer is specified outside of a word/pointer set, the relation is applied to all of the words in the synset, including any words specified using the word/pointer syntax. This indicates a semantic relation between the meanings of the words in the synsets. If specified within a word/pointer set, the relation corresponds only to the word in the set and represents a lexical relation.

A pointer is of the form:

[lex_filename : ]word[lex_id] , pointer_symbol

or:

[lex_filename : ]word[lex_id] ^ word[lex_id] , pointer_symbol

For pointers, word indicates a word in another synset. When the second form of a pointer is used, the first word indicates a word in a head synset, and the second is a word in a satellite of that cluster.word may be followed by a lex_id that is used to match the pointer to the correct target synset. The synset containing word may reside in another lexicographer file. In this case, word is preceded by lex_filename as shown.

See Pointers for a list of pointer_symbol s and their meanings.

Verb Frame List Syntax

Frame numbers corresponding to generic sentence frames must be entered in each verb synset. If a frame list is specified outside of a word/pointer set, the verb frames in the list apply to all of the words in the synset, including any words specified using the word/pointer syntax. If specified within a word/pointer set, the verb frames in the list correspond only to the word in the set.

A frame number list is entered as follows:

frames: f_num [, f_num...]

Where f_num specifies a generic frame number. See Verb Frames for a list of generic sentences and their corresponding frame numbers.

Gloss Syntax

A gloss is included in all synsets. The lexicographer may enter a text string of any length desired. A gloss is simply a string enclosed in parentheses with no embedded carriage returns. It provides a definition of what the synset represents and/or example sentences.

Special Adjective Syntax

The first word of a head synset must be entered in upper case, and can be thought of as the head word of the head synset. The word part of a pointer from one head synset to another head synset within the same cluster (usually an antonym) must also be entered in upper case. Usually antonymous adjectives are entered using the word/pointer syntax described in Word Syntax to indicate a lexical relation. There is no restriction on the number of parts that a cluster may have, and some clusters have three parts, representing antonymous triplets, such as solid , liquid , and gas .

A cross-cluster pointer may be specified, allowing a head or satellite synset to point to a head synset in a different cluster. A cross-cluster pointer is indicated by entering the word part of the pointer in upper case.

An adjective may be annotated with a syntactic marker indicating a limitation on the syntactic position the adjective may have in relation to noun that it modifies. If so marked, the marker appears between the word and its following comma. If a lex_id is specified, the marker immediately follows it. The syntactic markers are: