A mechanism for specifying relationships between classified elements and other classified elements or specific kinds of data (numbers, dates, etc.)

A simple way to specify ontologies containing rules that define valid classifications, relationships, and inferred rules.

The intent of this specification is to make it possible for user-agents, robots, etc., to gather truly meaningful information about web pages and documents, enabling significantly better search mechanisms and knowledge-gathering.

The general way one goes about this is as follows:

First, define an ontology describing valid classifications of web objects, and valid relationships between web objects and other web objects or data. This ontology may borrow from other ontologies.

Second, annotate HTML pages to describe themselves, other pages, or subsections of themselves, as having attributes as described in one or more ontologies.

We're playing a bit fast-and-loose with the term ontology here. In this specification, "ontology" simply means an ISA hierarchy of classes/categories, plus a set of atomic relations between these categories, and a set of inferential rules in the form of simplified Horn clauses. Categories inherit relations defined for parent categories.

User agents following this specification should be aware that assertions made by HTML pages are not facts, but claims. I.e., if element x claims that element y is related with relation r to element z, then the user-agent should not be entering r(y,z) into its database (i.e., "Now I know that y is related to z with the relationship r!!"). Instead, it should be entering something along the lines of r(x,y,z) into its database (i.e., "x is claiming that y is related to z with relationship r."). This is an important distinction: it's perfectly fine for HTML pages out there to be making completely false claims; one shouldn't simply accept them as truth. For similar reasons, HTML pages can only make assertions, not retractions.
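The distinction between claims and facts can be sketched in Python as follows. This is only an illustration, not part of the specification: the `Claim` and `ClaimStore` names and the example.org keys are hypothetical. The store records who made each claim and deliberately offers no way to retract one.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    claimant: str   # key of the instance making the claim (x)
    relation: str   # relation name (r)
    args: tuple     # argument values (y, z, ...)


class ClaimStore:
    """Hypothetical agent database that stores r(x, y, z), never bare r(y, z)."""

    def __init__(self):
        self._claims = set()

    def assert_claim(self, claimant, relation, *args):
        # Pages can only assert; there is intentionally no retract method.
        self._claims.add(Claim(claimant, relation, tuple(args)))

    def claimants_of(self, relation, *args):
        # Return the keys of every instance that has made this exact claim.
        return {c.claimant for c in self._claims
                if c.relation == relation and c.args == tuple(args)}


X = "http://example.org/x"
Y = "http://example.org/y"
Z = "http://example.org/z"

store = ClaimStore()
store.assert_claim(X, "r", Y, Z)
```

An agent can then decide for itself how much to trust each claimant before treating any claim as true.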

Category

An element under which HTML page instances or subinstances can be classified. Category names are element names, and may be prefixed. Categories may have parent categories. Categories define inheritance: if an instance is classified under a category, keys classified with this category may fill argument positions in relations defined for that category or any of its parent (or ancestor) categories. Multiple inheritance is valid.
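A minimal Python sketch of category inheritance with multiple parents follows. The `parents` mapping, the function names, and the example categories (GradStudent, Student, Employee, Person) are assumptions made for illustration; the specification does not prescribe any data structure.

```python
def ancestors(category, parents):
    """Return every parent or ancestor of `category`.

    `parents` maps a category name to a list of its direct parent categories.
    """
    seen = set()
    stack = list(parents.get(category, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def may_fill(instance_category, argument_category, parents):
    """True if an instance of `instance_category` may fill an argument
    position declared for `argument_category` (itself or an ancestor)."""
    return (instance_category == argument_category
            or argument_category in ancestors(instance_category, parents))


# Hypothetical hierarchy with multiple inheritance.
PARENTS = {
    "GradStudent": ["Student", "Employee"],
    "Student": ["Person"],
    "Employee": ["Person"],
}
```

With this hierarchy, an instance classified as GradStudent may fill an argument position declared for Person, but not the other way around.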

Data

Data which can be placed in an argument of a relationship but is not an instance. Data must be of the following types:

Strings (STRING)

HTML String Literals, as defined in the HTML 2.0 specification.

Numbers (NUMBER)

Floating-point numerical constants. Knowledge-agents should be able to read common floating-point numbers like 2, 2.0, -1.432e+4, etc. Numbers may be of the form:
0 |
([+|-|]
['.'digit*|0'.'digit*|non-zero-digit digit*['.'digit*|]]
[([e|E][+|-]non-zero-digit digit*)])

Dates (DATE)

Date/Timestamps following RFC 1123, as shown in section 3.3.1 of the HTTP/1.0 specification.

Booleans (TRUTH)

HTML String Literals of the form YES or NO, case-insensitive.

Categories (CATEGORY)

Category names.

Relationships (RELATION)

Relation names.
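The NUMBER, DATE, and TRUTH types above could be read by an agent roughly as sketched below. This is one possible reading, not a normative parser: the function names are hypothetical, the regular expression is an interpretation of the number grammar (it is slightly stricter in that a bare leading "." must be followed by at least one digit), and date parsing is delegated to Python's standard `email.utils.parsedate_to_datetime`, which accepts RFC 1123 timestamps.

```python
import re
from email.utils import parsedate_to_datetime

# One reading of the number grammar: the literal 0, or an optional sign,
# a mantissa, and an optional exponent with a mandatory sign.
NUMBER_RE = re.compile(r"""
    0 |
    [+-]?
    (?: \.[0-9]+ | 0\.[0-9]* | [1-9][0-9]*(?:\.[0-9]*)? )
    (?: [eE][+-][1-9][0-9]* )?
""", re.VERBOSE)


def parse_number(text):
    """Return the float value of a NUMBER, or raise ValueError."""
    if NUMBER_RE.fullmatch(text) is None:
        raise ValueError(f"not a NUMBER under this reading: {text!r}")
    return float(text)


def parse_date(text):
    """Return a datetime for an RFC 1123 timestamp."""
    return parsedate_to_datetime(text)


def parse_truth(text):
    """Return True for YES and False for NO, ignoring case."""
    upper = text.upper()
    if upper == "YES":
        return True
    if upper == "NO":
        return False
    raise ValueError(f"not a TRUTH: {text!r}")
```

Under this reading, a value such as 1e5 is rejected because the exponent carries no sign; an agent that wants to be more lenient could relax the expression.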

Element

A category or relationship name, or one of the following reserved keywords (all caps): STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. Element names are case-sensitive, and may contain only letters, digits, or hyphens.
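The naming constraint above can be checked with a short Python sketch. The helper names are hypothetical, and "letters" is assumed here to mean ASCII letters, which the specification does not state explicitly.

```python
import re

# Reserved keywords are all caps and, like element names, case-sensitive.
RESERVED_KEYWORDS = frozenset(
    {"STRING", "NUMBER", "DATE", "TRUTH", "CATEGORY", "RELATION"}
)

# Assumption: "letters" means ASCII letters.
ELEMENT_NAME_RE = re.compile(r"[A-Za-z0-9-]+")


def is_element_name(name):
    """True if `name` uses only letters, digits, or hyphens."""
    return ELEMENT_NAME_RE.fullmatch(name) is not None


def is_reserved_keyword(name):
    """True if `name` is exactly one of the reserved keywords."""
    return name in RESERVED_KEYWORDS
```

Note that a prefixed name such as cs.junk is not a bare element name under this check, since the period belongs to the prefix syntax rather than to the name.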

Instance

An element which may be classified under zero or more categories, and included as an argument to relationships (along with other forms of data). Some instances, page instances, are associated with World-Wide Web documents. All page instances are automatically of the category PageInstance. Other instances, subinstances, are associated with subsections of HTML page instance documents. Subinstances are automatically of the category Subinstance. Unless they are special subinstances for delegating declarative power (see section 4.7), subinstances have a parentInstance relationship linking them to their enclosing instance. Instances form the most common data entities in databases built up from this specification.

Key

A string which uniquely defines a page instance or a subinstance. It is up to you to decide on the keys for your documents. For page instances of SHOE-conformant documents, the proper method is to use a single absolute URL for the document. For example, http://www.cs.umd.edu is a valid key for the document located at that URL.

To create keys for subinstances, add to the page instance's unique key a pound-suffix such as #MyDog. For example, http://www.cs.umd.edu#MyDog is a valid key for a subinstance located at http://www.cs.umd.edu. It's good style for this unique key to correspond with an actual anchor in the document.

The unique key of a non-SHOE-conformant document is defined to be one particular absolute URL of the document, chosen for the document by a SHOE-conformant document which references it.

The SHOE spec reserves the key "me" and any capitalized form of it. "me" (under any capitalized form) may be used as an argument of a claim to refer to the enclosing data entity actually making the claim.
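The key conventions above can be sketched in Python as follows. The function names are hypothetical; the sketch simply appends a pound-suffix to a page key and resolves the reserved key "me" (in any capitalization) to the key of the enclosing entity.

```python
def subinstance_key(page_key, name):
    """Build a subinstance key from a page instance key and a suffix name."""
    return f"{page_key}#{name}"


def resolve_key(argument, enclosing_key):
    """Replace the reserved key "me" with the key of the entity making the claim."""
    if argument.lower() == "me":
        return enclosing_key
    return argument


dog_key = subinstance_key("http://www.cs.umd.edu", "MyDog")
```

Here `dog_key` is http://www.cs.umd.edu#MyDog, and a claim made inside that subinstance with the argument "Me" refers to that same key.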

Ontology

As defined in this specification, a description of valid classifications for HTML page instances and subinstances, and valid relationships between instances and elements.

Prefix

A small string attached with a period at the beginning of an instance, category, or relation name. For example, cs is a prefix in cs.junk. Prefixes may also be attached to already-prefixed elements, forming a prefix chain. For example, foo.bar.cs is a prefix chain for foo.bar.cs.junk. A prefix indicates the ontology from which the element (or prefixed element) following it is defined.
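Because element names may contain only letters, digits, or hyphens, a period can safely be treated as the prefix separator. A small Python sketch (the function name is hypothetical) that separates a prefix chain from the element it qualifies:

```python
def split_prefixed(name):
    """Split a prefixed name into (prefix chain, element name).

    The chain is returned outermost prefix first.
    """
    *chain, element = name.split(".")
    return chain, element
```

For foo.bar.cs.junk this yields the chain foo, bar, cs and the element junk; an unprefixed name yields an empty chain.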

Relation (Relationship)

An element which defines a relationship between elements. Relation names are element names, and may be prefixed. Elements fill one or more arguments to a relation. Arguments are explicitly ordered, so each has a numbered position (the first is argument 1). Many relations are binary (have exactly two arguments). A binary relation's domain is argument 1 of the relation. A relation's range (the element the relation is "to") is argument 2 of the relation.

Rule

A formal rule in an ontology defining valid classifications (categories) or valid relationships that can be asserted.

Unique Name

A string which uniquely defines an ontology. Unique names are different from keys in that they do not uniquely define instances but rather the ontologies which the instances may use. Different versions of an ontology may have the same unique name so long as they have different version numbers.

Version (Version Number)

A string which describes the version of an ontology. Versions are case-sensitive, and may contain only letters, digits, or hyphens.

An HTML document may contain any number of ontology definitions. Each ontology definition should use a unique name. Ontology definitions are accompanied with a version number. If an ontology completely subsumes previous versions of the same ontology (it contains all the rules defined in those versions), it may declare itself to be backward-compatible with those versions. To begin an ontology definition, use:

An ontology may be declared to extend one or more existing ontologies. This means that it will use elements in those ontologies in its own rules. To distinguish between those elements and its own elements, an ontology must provide a unique prefix for each ontology it extends. This will be prefixed to elements borrowed from each particular ontology whenever they are referred to. To declare that an ontology is extending another ontology, use:

The prefix you are assigning the extended ontology. All categories and relations from the extended ontology which are used in your ontology must be prefixed with this prefix. Within an HTML document, a prefix must be different from all prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.

Inside an ontology definition, an ontology may declare various new categories which instances can belong to. Categories should descend from one or more parent categories. To declare a new category, or to add new parent categories for a category, use:

The newly declared category, or the one being given more parent categories. Newly declared categories should be distinct from all other categories and relationships declared in the ontology.

ISA

A whitespace-delimited list of categories to define as parent categories of this category.

DESCRIPTION

A short, human-readable description of the category's semantics.

SHORT

A phrase which an agent may use to display the category to a user in a more understandable fashion than the category's name. In English ontologies, SHORT should be a noun, lower-case unless it is a proper noun. For example, the category "LaunchVehicleNASA" might have SHORT="rocket".

A particular category should not be defined more than once within an ontology's declaration.

The newly declared relationship name. This should be distinct from all other categories and relationships declared in the ontology.

ARGS (mandatory)

The arguments of the relation. This should be a whitespace-delimited list of (commonly two) elements, representing the arguments to the relation. Elements can be either declared categories, or the following keywords (all caps): STRING, NUMBER, DATE, TRUTH, CATEGORY, RELATION.

CATEGORY establishes a relationship not with category instances but with categories themselves. RELATION establishes a relationship not with instances but with other relationships. These last two elements are rare and should only be used in special circumstances.

DESCRIPTION

A short, human-readable description of the relationship's semantics.

ARGS-DESCRIPTION

A whitespace-delimited list of one-word descriptions of each argument in the relation. This may be useful for an agent to display to the user in a more understandable fashion than "argument 1", "argument 2", and so on. For example, the relation "nameOf" relating a person with a name might have ARGS-DESCRIPTION="person name".

SHORT

A phrase which an agent may use to display the relation to a user in a more understandable fashion than the relation's name. In English ontologies, SHORT should be a verb, lower-case, and in a form that makes sense when appearing after the first argument but before the remaining arguments. For example, the relation "nameOf" might have SHORT="is named".

A particular named relationship should not be defined more than once within an ontology's declaration.

To reduce the number of prefixes, an ontology may rename a category or relation (plus its prefix chain) to a simpler name, so long as this name is not used in any other category or relation in the ontology. For example, an ontology could rename the category cs.junk.foo.person to simply person, so long as person is not defined elsewhere in the ontology.

Ontologies are not permitted to rename (or rename elements to) the following keywords: STRING, NUMBER, DATE, TRUTH, CATEGORY, or RELATION. To rename a category or relation, use:

An ontology may declare one or more inferences which may be automatically made based on the relationship and category declarations found in marked-up SHOE text.

A SHOE inference clause is similar to a Horn clause in that it consists of a body of one or more subclauses describing claims entities might make, and a head consisting of one or more subclauses describing a claim that may be inferred if all the claims in the body do turn out to be made.

Subclause Rules. Both the head and the body of an inference clause must contain at least one subclause; inference clauses with an empty head or body are incorrect and may be ignored. A subclause in the head may either be a relation declaration or a category declaration. A subclause in the body may be either a relation declaration, a category declaration, or a "special" declaration as described below. Special declarations may not be made in the head. Special declarations permit inferences to include special equals/not-equals/greater-than/less-than relationships between elements in subclauses.

Constants and Variables. The data in a head or body subclause may be either constants or variables. Constant data must be matched exactly as it is. Variables may be matched (bound) to any constant data interned by a SHOE agent, so long as variables of the same name in subclauses within the same inference clause always bind to the same value, and fill argument positions of the same type. Variables are case-insensitive: the variable V matches with the variable v.

Variable Rules. There are two variable rules, intended to eliminate ambiguities and excess computational complexity in the inference clauses:

The head of the inference clause may not contain variables that do not appear somewhere in the body of the clause.

If there is more than one variable in the body, then each variable must appear at least once in a relationship-type (<ONTIF RELATION=...>) subclause within the body along with another variable in such a way that one could trace a path from any variable to any other variable through a series of relationships (in other words, the relationship graph for variables in the body is connected). Special declarations (other than "equal") and category declarations do not count as part of this relationship graph.

Inference clauses that do not meet these constraints are incorrect and may be ignored.
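The two variable rules can be checked mechanically. The Python sketch below assumes a hypothetical in-memory representation in which each subclause is a dictionary with a "kind" ("relation", "category", or "special"), a "name", and a list of variable names in "vars"; none of this representation comes from the specification. Variables are compared case-insensitively, and only relation subclauses and "equal" special declarations contribute edges to the variable graph.

```python
from itertools import combinations


def _variables(subclause):
    # Variables are case-insensitive, so normalize to lower case.
    return {v.lower() for v in subclause["vars"]}


def obeys_variable_rules(head, body):
    """Return True if an inference clause satisfies both variable rules."""
    body_vars = set().union(*(_variables(s) for s in body))
    head_vars = set().union(*(_variables(s) for s in head))

    # Rule 1: every head variable must also appear in the body.
    if not head_vars <= body_vars:
        return False
    if len(body_vars) <= 1:
        return True

    # Rule 2: the body variables must form a connected graph, where edges
    # come from relation subclauses and "equal" special declarations only.
    neighbours = {v: set() for v in body_vars}
    for s in body:
        links = s["kind"] == "relation" or (
            s["kind"] == "special" and s.get("name") == "equal")
        if links:
            for a, b in combinations(_variables(s), 2):
                neighbours[a].add(b)
                neighbours[b].add(a)

    start = next(iter(body_vars))
    reached, frontier = {start}, [start]
    while frontier:
        for n in neighbours[frontier.pop()]:
            if n not in reached:
                reached.add(n)
                frontier.append(n)
    return reached == body_vars


# Hypothetical clause: memberOf(X, Y) and partOf(Y, Z) imply memberOf(X, Z).
BODY = [
    {"kind": "relation", "name": "memberOf", "vars": ["X", "Y"]},
    {"kind": "relation", "name": "partOf", "vars": ["y", "Z"]},
]
HEAD = [{"kind": "relation", "name": "memberOf", "vars": ["x", "z"]}]
```

In the example clause, Y links X to Z, so the graph is connected and both rules hold. Adding a category subclause that introduces an otherwise unused variable would violate the second rule.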

An inference may have zero or more body subclauses, defined with the ONTIF tag. Each body subclause is either a category declaration, or a relationship declaration, but not both. Body subclauses indicate claims that must be actually made by SHOE documents for the inferences in the head subclause to take place. If an inference has no body subclauses, this indicates that the inferences in the head should always be true. However, this method of making assertions should be used sparingly; such claims are better expressed in SHOE documents themselves, not in an ontology.

To define that a body subclause is a category declaration, use:

<ONTIF CATEGORY="prefixed.category"
FOR="variable">

"prefixed.category" (mandatory)

A category with full prefix chains showing a path through extended ontologies back to the ontology in which it was defined.

FOR (co-mandatory)

Contains a variable to be bound to an instance which has been declared to belong to this category.

CFOR (co-mandatory)

Contains the key of an actual instance which has been declared to belong to this category.

A relationship with full prefix chains showing a path through used and extended ontologies back to the ontology in which it was defined.

C1, C2, C3, C4, C5, C6, C7, ...

Declares the element as a constant in argument position indicated by the tag. For example, C7="George" declares that the constant "George" is argument 7 in the relation. This element must be of the type declared for that particular argument position.

1, 2, 3, 4, 5, 6, 7, ...

Declares the element as a variable in argument position indicated by the tag. For example, 7=VAR1 declares that the variable VAR1 is argument 7 in the relation.

It is incorrect for an ontology to declare relation subclauses with missing arguments; if this happens, the inference clause is incorrect and may be ignored.

One of the special declaration key words: equal, notEqual, greaterThan, greaterThanOrEqual, lessThanOrEqual, or lessThan. These are all binary special declarations, and indicate that argument 1 is equal to, not equal to, greater than, greater than or equal to, less than or equal to, or less than argument 2. For strings, instance keys, category names, and relation names, these declarations assume case-sensitivity, and greaterThan/lessThan have no meaning. For dates, assume that earlier dates are "less than" later dates. For truths, assume that NO is "less than" YES.

C1, C2, C3, C4, C5, C6, C7, ...

Declares the element as a constant in argument position indicated by the tag. For example, C7="George" declares that the constant "George" is argument 7 in the special declaration predicate. This element must be of the same type as all other argument positions. Since the current key words are all binary, values C3 and beyond currently have no meaning.

1, 2, 3, 4, 5, 6, 7, ...

Declares the element as a variable in argument position indicated by the tag. For example, 7=VAR1 declares that the variable VAR1 is argument 7 in the special declaration predicate. There must be at least one variable as an argument in a special declaration.

It is incorrect for an ontology to declare special declaration subclauses with missing arguments; if this happens, the inference clause is incorrect and may be ignored.
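The comparison semantics of the special declarations might be evaluated as in the Python sketch below. It rests on several assumptions that are not in the specification: numbers are held as `float`, dates as `datetime`, truths as `bool` (NO is `False`, YES is `True`), and everything string-like (strings, keys, category and relation names) as `str`. The function name is hypothetical, and all four ordering keywords are treated as meaningless for string-like values, returning `None`.

```python
import operator
from datetime import datetime

_ORDERINGS = {
    "greaterThan": operator.gt,
    "greaterThanOrEqual": operator.ge,
    "lessThanOrEqual": operator.le,
    "lessThan": operator.lt,
}


def evaluate_special(keyword, left, right):
    """Evaluate a binary special declaration on two bound values."""
    if type(left) is not type(right):
        raise TypeError("both arguments must be of the same type")
    if keyword == "equal":
        return left == right          # case-sensitive for strings
    if keyword == "notEqual":
        return left != right
    if keyword not in _ORDERINGS:
        raise ValueError(f"unknown special declaration: {keyword!r}")
    if isinstance(left, str):
        return None                   # ordering has no meaning for strings
    return _ORDERINGS[keyword](left, right)


earlier = datetime(1994, 11, 6)
later = datetime(1995, 1, 1)
```

Because Python orders `False` before `True` and earlier `datetime` values before later ones, the built-in comparison operators already match the ordering described for truths and dates.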

An inference must have at least one head subclause. Each head subclause is either a category declaration, or a relationship declaration. A head subclause indicates a claim that may be inferred if all the claims defined in the body subclauses are met.
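To make the head/body relationship concrete, here is a minimal Python sketch of applying one inference clause to a set of known relationships. It is deliberately simplified and is not the specification's algorithm: claimants are ignored, only relation subclauses are handled, and the `Var`, `match`, `solve`, and `infer` names as well as the memberOf/partOf example are hypothetical. Each relationship is a tuple of a relation name followed by its arguments.

```python
class Var:
    """A variable in a subclause; names are case-insensitive."""

    def __init__(self, name):
        self.name = name.lower()


def match(pattern, fact, bindings):
    """Try to match one subclause against one fact, extending `bindings`."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    extended = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if isinstance(p, Var):
            # A variable must bind to the same value everywhere in the clause.
            if extended.setdefault(p.name, f) != f:
                return None
        elif p != f:
            # Constants must match exactly.
            return None
    return extended


def solve(body, facts, bindings=None):
    """Yield every set of bindings that satisfies all body subclauses."""
    bindings = {} if bindings is None else bindings
    if not body:
        yield bindings
        return
    for fact in facts:
        extended = match(body[0], fact, bindings)
        if extended is not None:
            yield from solve(body[1:], facts, extended)


def infer(head, body, facts):
    """Return the new facts implied by the head for every body match."""
    derived = set()
    for b in solve(body, facts):
        for h in head:
            derived.add(tuple(b[t.name] if isinstance(t, Var) else t for t in h))
    return derived - set(facts)


FACTS = {("memberOf", "alice", "lab"), ("partOf", "lab", "dept")}
BODY = [("memberOf", Var("X"), Var("Y")), ("partOf", Var("y"), Var("Z"))]
HEAD = [("memberOf", Var("x"), Var("z"))]
```

Applying the clause to `FACTS` derives the single new relationship memberOf(alice, dept). A real agent would also need to track which claimants support each inferred claim.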

SHOE-conformant HTML documents must declare themselves page instances and provide a unique key for themselves. To declare an HTML document to be a page instance, add the following text to the HEAD section of the document:

Before you can classify documents or establish relationships between them, you'll need to define exactly which ontologies these classifications and relations are derived from, and associate with each of these ontologies some prefix unique to that ontology. An HTML document may declare that it is using as many ontologies as it likes, as long as each ontology has a unique prefix in the document. To declare that a page instance and all its subinstances use a particular ontology, use:

The prefix you are assigning the ontology. All categories and relations from this ontology which are used in this document must be prefixed with this prefix. Within this document, the prefix must be different from all prefixes declared with either <USE-ONTOLOGY ...> or <ONTOLOGY-EXTENDS ...> tags.

Instances may be classified, that is, they may be declared to belong to one or more categories in an ontology, using the CATEGORY tag:

<CATEGORY "prefixed.category"
[FOR="key"]>

"prefixed.category" (mandatory)

A category with full prefix chains showing a path through used and extended ontologies back to the ontology in which it was defined.

FOR

Contains the key of the instance which is being declared to belong to the category. If FOR is not declared, then the key is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance. If FOR is declared, then it provides the key.
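The fallback order for FOR can be written as a small Python sketch. The function and parameter names are hypothetical; `None` stands for "not declared" or "no enclosing subinstance".

```python
def category_target(page_instance, enclosing_subinstance=None, for_key=None):
    """Return the key of the instance being classified."""
    if for_key is not None:
        return for_key                  # FOR is declared: it provides the key
    if enclosing_subinstance is not None:
        return enclosing_subinstance    # otherwise the enclosing subinstance
    return page_instance                # otherwise the page instance itself


PAGE = "http://www.cs.umd.edu"
DOG = "http://www.cs.umd.edu#MyDog"
```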

A relationship with full prefix chains showing a path through used and extended ontologies back to the ontology in which it was defined.

1, 2, 3, 4, 5, 6, 7, ...

Declares the element in argument position indicated by the tag. For example, 7="George" declares that "George" is argument 7 in the relation. This element must be of the type declared for that particular argument position.

FROM

Synonymous with 1. A relation declared with both FROM and any 1, 2, 3, 4 ... tags is invalid.

TO

Synonymous with 2. A relation declared with both TO and any 1, 2, 3, 4 ... tags is invalid.

Explicit declarations take two forms, the "binary" form and the "general" form.

The "binary" form is used only for binary relationships. In this form, the tags FROM and TO are used. A tag (FROM or TO) may be omitted if the element type for that tag is INSTANCE. In this case, the key for the element is assumed to be that of the enclosing subinstance, or (if there is no enclosing subinstance) the page instance. If the tag is declared, and the type of the tag's argument is an instance, then it provides the key.

The "general" form may be used for relationships of any number of arguments (including binary relationships). In this form, the 1, 2, 3, 4 ... tags may be used. If a numbered tag is not declared, and an argument of that number position exists in the relationship, then the information should be considered "unknown". Note that this is different from the assumptions in the "binary" form.

The reason for two different forms is that while the "general" form is necessary to describe all possible relationships, the large majority of relationships are binary relationships between the claimant and some other instance or data. Hence, the "binary" format allows the claimant to declare binary relationships in a more natural format ("to" and "from") and not have to include itself in the claim when it's clear from the context. The "binary" form makes relations feel more like slots in the claimant "object" than ordinary predicate relationships.
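The different defaults of the two forms can be illustrated with a Python sketch. It assumes a hypothetical representation in which the declared tags arrive as a dictionary (`attrs`) and the relation's argument types as a list of names, where any type that is not one of the reserved data keywords is taken to be a category, i.e. an instance-typed argument. It further assumes that a declaration with no numbered tags is in binary form, and that omitting FROM or TO for a data-typed argument is an error; the specification does not spell out either case.

```python
DATA_KEYWORDS = frozenset(
    {"STRING", "NUMBER", "DATE", "TRUTH", "CATEGORY", "RELATION"}
)
UNKNOWN = None  # placeholder for arguments left undeclared in the general form


def resolve_arguments(arg_types, attrs, enclosing_key):
    """Return the argument list for a relation declaration."""
    numbered = {k for k in attrs if k.isdigit()}
    named = {k for k in attrs if k in ("FROM", "TO")}
    if numbered and named:
        raise ValueError("FROM/TO may not be combined with numbered tags")

    if numbered:
        # General form: undeclared positions are simply unknown.
        return [attrs.get(str(i), UNKNOWN)
                for i in range(1, len(arg_types) + 1)]

    # Binary form: an omitted instance-typed argument is the claimant itself.
    if len(arg_types) != 2:
        raise ValueError("the binary form is only for binary relationships")
    resolved = []
    for tag, arg_type in zip(("FROM", "TO"), arg_types):
        if tag in attrs:
            resolved.append(attrs[tag])
        elif arg_type not in DATA_KEYWORDS:
            resolved.append(enclosing_key)
        else:
            raise ValueError(f"{tag} may only be omitted for instance arguments")
    return resolved


DOG = "http://www.cs.umd.edu#MyDog"
```

For example, a binary declaration that gives only TO="George" for a relation typed (Person, STRING) resolves argument 1 to the enclosing key, while a general-form declaration that skips position 2 leaves that argument unknown.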

A page instance may indicate that it permits another page instance to make declarations on its behalf. The other page instance may then declare a subinstance of the same key name as the permitting page instance's key. Agents should consider all claims made within that subinstance as if they were made on the permitting page instance itself. This might be done to consolidate claims for a web site into a single document, perhaps, or to keep a large number of claims from slowing down the download of a document.

If the other page instance does not in fact declare this special subinstance, then delegating declarative power is simply a pointer to an agent to look elsewhere for relevant SHOE knowledge. To delegate declarative power to another web page, use (in the HEAD section of the HTML document):

<META HTTP-EQUIV="Instance-Delegate"
CONTENT="key">

"key" The key of the page instance to whom we are delegating declarative power.

Delegated subinstances should be considered to be proxies for the delegating page instance, and any subinstances contained within a delegated subinstance should be considered to be proxies for subinstances within the delegating page instance. An agent can guess that a subinstance is a delegated subinstance instead of an ordinary subinstance by examining its key. If the key is not based on the enclosing page instance URL, it's more than likely a delegated subinstance for some other URL. However, an agent should not take this at face value, but should check the delegating page instance first to make certain that the delegation is valid.
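The check described above might look like the following Python sketch. It assumes the agent has already fetched the delegating pages and collected their delegations into a hypothetical mapping, `delegates_of`, from a page key to the set of page keys it has delegated declarative power to. The function name, the three return labels, and the example.org keys are also illustrative only.

```python
def classify_subinstance(subinstance_key, enclosing_page_key, delegates_of):
    """Classify a subinstance found inside the page `enclosing_page_key`."""
    # The part of the key before any pound-suffix names the owning page.
    owner = subinstance_key.split("#", 1)[0]
    if owner == enclosing_page_key:
        return "ordinary"
    # The key points elsewhere: accept it only if the owner really delegated.
    if enclosing_page_key in delegates_of.get(owner, set()):
        return "delegated"
    return "unverified"


SITE = "http://example.org/site"
DELEGATES_OF = {"http://example.org/a": {SITE}}
```

Claims inside an "unverified" subinstance should not be treated as if the named page had made them.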