RDF Content Labels: Schema Description

4 July 2005

NOTE: this document is published here as a
work-in-progress within the EU-funded Quatro project. It uses W3C
technology and explores the application of W3C's RDF/XML format to
Content Labelling in the PICS tradition. It is not currently a work item
of any W3C Working Group. I am exploring the possibilities for
bringing this work into a chartered W3C Group, e.g. as an Interest
Group note through the Semantic Web Interest Group.
--Dan Brickley

Introduction

This document describes a method through which any number of resources may be associated with a common description.
A resource may link to a description directly or to a set of rules which can be processed to identify the correct
description.

The term Content Label (or just label) is used to refer to such descriptions.

The schema further allows groups of resources to be linked to a common classification. There is a difference between
a Content Label and a classification. A Content Label is a set of descriptors for a resource. In the examples below, a
fabric has a colour and a transparency value. In RDF terms, a Content Label is a Class and the properties of that class
describe the resource.

A classification is a description in itself - for example, a fabric might be natural or synthetic - and is
independent of a Content Label. In RDF terms, classifications are usually expressed as a Class.

An important feature of the RDF Content Label schema is support for a group of resources, such as all resources on a
given website, to have a default Content Label and a default classification. These can be overridden if the resource is
associated with a particular label and/or classification, whether directly or through processing a rule set.

The system has been designed to meet the needs of a wide range of potential uses, including, but not limited to,
trust mark schemes, child advocacy groups, educational institutes etc. It is offered as a successor to the
PICS system.

The namespace for the schema defined below is http://www.w3.org/2004/12/q/contentlabel#

Elements of a labelling scheme

A typical labelling scheme consists of one or more categories which group together related content descriptors and
zero or more modifiers which provide further context for a label.

To create a trivial example, a labelling organisation might define "Appearance" as a
category within which there were descriptors for:

Colour (c) with values of 0 for Black, 1 for Red and 2 for Green

Percentage transparency (t) with values between 0 and 30.

The labelling organisation further defines "Matt" (m) and "Shiny" (s) as an optional modifier.

In order to create RDF Content Labels, we define a small set of classes and properties that are the basis for
defining labelling schemes. A particular labelling scheme is created by defining instances of these classes and using
the properties to define the relationships between those instances.

Assuming relevant namespace declarations have been made, an example of a Content Label written in RDF/XML might then
be:

This simply means that, according to the example labelling scheme, the labelled resource is green, 20% transparent
and shiny. Note that the context modifier has no associated value - it is either present or absent.
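
As a rough illustration of how an agent might turn the values of the example scheme into the description given above, the following Python sketch decodes a label. The lookup tables, the dictionary layout and the function name describe are assumptions made for this illustration only; they are not defined by the schema.

```python
# Hypothetical decoding tables for the example "Appearance" scheme.
COLOURS = {0: "black", 1: "red", 2: "green"}
MODIFIERS = {"m": "matt", "s": "shiny"}


def describe(descriptors, modifiers=()):
    """Return a readable description of an example label.

    descriptors: dict with keys "c" (colour code) and "t" (percentage transparency).
    modifiers: iterable of modifier codes; a modifier carries no value,
               it is either present or absent.
    """
    colour = COLOURS[descriptors["c"]]
    transparency = descriptors["t"]
    # The example scheme only allows transparency values between 0 and 30.
    if not 0 <= transparency <= 30:
        raise ValueError("transparency must be between 0 and 30")
    parts = [colour, f"{transparency}% transparent"]
    parts.extend(MODIFIERS[code] for code in modifiers)
    return ", ".join(parts)


print(describe({"c": 2, "t": 20}, ["s"]))
```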

To extend the example, a complete RDF/XML instance might be created as shown in Example 2 and made available at
http://www.resources.com/labels.rdf.

Resources can now be created that link to specific Content Labels. Any number of resources that appear shiny and green
with 20% transparency can include the link tag below (or its HTTP Response header equivalent):

Similar tags can be included to identify resources to be red, 10% transparent and matt, or, black and 0%
transparent.

Restricting a Content Label

Any resource, anywhere on the web, can link to a Content Label using the mechanism described above. There are circumstances
where this is a useful facility. Equally, however, a content provider or labelling organisation may wish to restrict
the scope of their labels. This is achieved by using the label:hostRestriction property within
the label:Hosts class as shown in example 3.

The Hosts class (a sub class of label:Ruleset) is provided so that host restrictions can
easily be held in a separate RDF instance if required. Whether this is done or not, the same
SPARQL query will give the list of host restrictions.

Example 3 declares the same Content Labels as example 2; however, an agent SHOULD only consider the labels
applicable to URIs that match either of the label:hostRestrictions declared within the
label:Hosts class. That is, URIs on the resources.com or resources.co.uk hosts. If a
resource from another host links to the Content Label, an agent SHOULD disregard the description.

Content Labels may be applied to subdomains of the listed host restrictions. In our examples with declared host
restrictions of resources.com and resources.co.uk, Content Labels would be applicable to www.resources.com,
support.resources.co.uk etc.
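
The host check described above might be sketched in Python as follows. This is only one possible reading: the function name host_in_scope is hypothetical, and the sketch assumes that "subdomain" means the host name ends with a dot followed by the restricted host.

```python
from urllib.parse import urlsplit


def host_in_scope(uri, host_restrictions):
    """Return True if the URI's host equals, or is a subdomain of, a restricted host."""
    host = (urlsplit(uri).hostname or "").lower()
    for restriction in host_restrictions:
        restriction = restriction.lower()
        # Exact match, or a subdomain such as www.resources.com.
        if host == restriction or host.endswith("." + restriction):
            return True
    return False


hosts = ["resources.com", "resources.co.uk"]
print(host_in_scope("http://support.resources.co.uk/faq.html", hosts))
print(host_in_scope("http://www.example.net/page.html", hosts))
```

Matching on a leading dot avoids treating an unrelated host such as notresources.com as being in scope.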

If it is necessary to restrict the scope of the labels further, this can be achieved by providing one or more
label:hasURI properties for the label:Ruleset class. Labels SHOULD only
be applied to URIs that are within at least one host restriction and that match at least one Perl5 regular expression
given in a label:hasURI property. This is particularly useful where several users may have
space on a common host, as is often the case with personal websites.
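
A minimal Python sketch of the combined test (host restriction plus at least one matching hasURI expression) is given below. It uses Python's re module, whose syntax is close to, but not identical with, Perl5 regular expressions, so it should be read as an approximation. The function name label_applies and the "/~alice/" pattern are hypothetical.

```python
import re
from urllib.parse import urlsplit


def label_applies(uri, host_restrictions, uri_patterns=()):
    """Return True if labels may be applied to the URI.

    The URI must fall within at least one host restriction and, if any
    hasURI patterns are given, must match at least one of them.
    """
    host = (urlsplit(uri).hostname or "").lower()
    host_ok = any(
        host == h.lower() or host.endswith("." + h.lower())
        for h in host_restrictions
    )
    if not host_ok:
        return False
    if not uri_patterns:
        # No further restriction was declared.
        return True
    return any(re.search(pattern, uri) for pattern in uri_patterns)


hosts = ["resources.com", "resources.co.uk"]
# Hypothetical pattern limiting the labels to one user's space on a shared host.
print(label_applies("http://www.resources.com/~alice/index.html", hosts, [r"/~alice/"]))
```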

Identifying default labels

It is possible to define a default Content Label for the group of resources identified in the
label:Ruleset class. Furthermore, a default set of management data and a default classification
may be defined in a similar way. These are achieved by using the hasDefaultLabel,
hasDefaultManagementInfo and hasDefaultClassification properties
respectively. Each may link to, or directly include, descriptions, management and classification data that can be
applied to the resources unless otherwise specified.

In this example, everything on the resources.com and resources.co.uk hosts is described by label 1, is published
by Example Fabrics and offered under the given Creative Commons licence. These details can be overridden by linking
the resource directly, or via rules (discussed below), to specific labels.

It is expected that, for the sake of optimisation, agents will locate the label:Ruleset class
first to quickly ascertain whether the RDF instance carries information that can be applied to the resource in question (by
checking for host restrictions) and, if so, find any default data supplied.

Rules for identifying a label

In examples 2 - 4, a resource was linked to a specific Content Label. A content provider will need to include a
specific link to the correct label in each resource. This is shown diagrammatically in figure 1, where, for example,
Resource A will include the link tag: <link rel="meta" href="labels.rdf#label3" ...

Figure 1. Each resource includes a link to a specific label within the RDF instance at labels.rdf.

This will be a convenient approach for some providers. However, a simple set of application rules allows all
resources to be linked to a single RDF instance containing multiple labels. An agent can process those rules to select
the correct label for a given resource, based on its URI. This is shown diagrammatically in figure 2.

Figure 2. A simple rule set allows all content to link to the same RDF instance and the correct label to be identified.

The advantage of this system is that a content management system or suite of servers can be configured to include
the same link tag with all resources, for example:

Example 5. A similar listing to previous examples with added application rules.

In the example, anything on resources.com or resources.co.uk that contains the string "afternoon" in the
URL will be associated with label 2 which describes it as being red, 10% transparent and matt. Anything on those hosts
with either "night" or "evening" in the URL will be described as black and opaque. All other
resources will be green 20% transparent and shiny.

All resources, irrespective of their description, will be identified as being published by Example Fabrics and
available under the given Creative Commons licence.
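
The rule processing just described can be sketched in Python as follows. The rule table mirrors the prose above rather than any RDF/XML listing; the identifiers label_1, label_2 and label_3 and the function name select_label are assumptions for illustration, and the host restriction check is deliberately left out to keep the sketch short.

```python
import re

# Ordered rules: (list of URI patterns, label to apply on a match).
# Rules are tried in sequence and processing stops at the first match.
RULES = [
    (["afternoon"], "label_2"),          # red, 10% transparent, matt
    (["night", "evening"], "label_3"),   # black and opaque
]
DEFAULT_LABEL = "label_1"                # green, 20% transparent, shiny


def select_label(uri, rules=RULES, default=DEFAULT_LABEL):
    """Return the label for a URI, falling back to the default label."""
    for patterns, label in rules:
        if any(re.search(pattern, uri) for pattern in patterns):
            return label
    return default


for uri in (
    "http://www.resources.com/afternoon/tea.html",
    "http://www.resources.co.uk/evening.html",
    "http://www.resources.com/index.html",
):
    print(uri, "->", select_label(uri))
```

Because the sequence is abandoned at the first match, a URL containing both "afternoon" and "night" would receive label 2 in this sketch.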

Labelling movies, games etc.

Movies, games and other forms of moving content often require different labels at different times. The notion of a
movie containing no sex or violence for most of its duration but having "occasional scenes of peril" or
"a single scene with nudity" is used by film classifiers in many parts of the world. The schema for RDF
Content Labels supports these ideas.

As described above, a label is associated with a resource via the label:hasLabel property. In
the context of labelling movies and games, this label should be taken to apply throughout the running time of
the resource. In addition, a resource may also have labels that describe content that occurs for limited periods
during the running time of the resource. These labels are associated with the resource by properties that indicate the
frequency with which they occur.

Figure 3: Movie labelling example

In figure 3, the movie's content is described by "label A." That is, the label applies throughout. However,
the movie also contains frequent scenes described by label B and a single scene described by label C.

The full list of frequency identifiers included in the schema is:

hasFrequentScenes: the content described by this label appears frequently throughout the running time of the resource

hasSeveralScenes: the content described by this label appears several times throughout the running time of the resource

hasOccasionalScenes: the content described by this label appears occasionally throughout the running time of the resource

hasSingleScene: there is a single, relatively short period when content described by this label occurs.

Labelling scheme operators are, of course, free to provide more precise definitions and synonyms for these terms if
so desired.
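
To make the relationship between hasLabel and the frequency properties concrete, the Python sketch below represents the movie of figure 3 as a plain dictionary and lists what each label says about it. The dictionary layout, the wording of the phrases and the function name summarise are assumptions for illustration.

```python
# Frequency properties from the schema, with hypothetical display phrases.
FREQUENCY_PROPERTIES = {
    "hasFrequentScenes": "frequent scenes",
    "hasSeveralScenes": "several scenes",
    "hasOccasionalScenes": "occasional scenes",
    "hasSingleScene": "a single scene",
}


def summarise(resource):
    """List which labels apply throughout and which apply only to some scenes."""
    lines = [f"throughout: {label}" for label in resource.get("hasLabel", [])]
    for prop, phrase in FREQUENCY_PROPERTIES.items():
        for label in resource.get(prop, []):
            lines.append(f"{phrase}: {label}")
    return lines


# The movie from figure 3: label A throughout, frequent scenes described
# by label B and a single scene described by label C.
movie = {
    "hasLabel": ["label A"],
    "hasFrequentScenes": ["label B"],
    "hasSingleScene": ["label C"],
}
print("\n".join(summarise(movie)))
```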

Provenance of a label

The nature and expected use of Content Labels is such that the question of who generated and, perhaps, who has since checked the label is of particular importance. An end user may wish to know not only that the labelled resource is green, 20% transparent and shiny, but who says so and to what extent the description can be trusted.

This cannot be done by looking at the data alone. However, a variety of methods may be available, either from the
labelling organisation itself or third parties. For this reason, an RDF instance containing Content Labels may give
details of who created it. This is done using the Dublin Core [DC] creator property. It is expected that the creator will be expressed as a URL at which information is made available about how a user may increase their trust in the assertions made in the label. The homepage of the labelling organisation is likely to be a common value for this.

Example 6: The creator of the RDF instance is declared, along with the namespace about which
information is available from that authority.

As a single RDF instance may contain labels from any number of labelling organisations, there is a facility to
declare about which descriptions (i.e. which namespaces) a given label creator provides information. In example 6, it
is expected that information will be available at http://www.example.org about how a user or an agent might be able to
gain trust in assertions made about things like the colour and transparency of a resource. If labels from a different
namespace are provided, no information will be available at example.org about their trustworthiness.
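
An agent might apply this scoping with a simple prefix test, as in the Python sketch below. The namespace URIs shown are hypothetical, as is the function name creator_vouches_for; the sketch only assumes that a term belongs to a namespace when its URI begins with the namespace string.

```python
def creator_vouches_for(term_uri, authority_namespaces):
    """Return True if the term falls in a namespace the creator is an authority for."""
    return any(term_uri.startswith(namespace) for namespace in authority_namespaces)


# Hypothetical namespace declared via the authorityFor property.
authority_for = ["http://www.example.org/vocab#"]

print(creator_vouches_for("http://www.example.org/vocab#colour", authority_for))
print(creator_vouches_for("http://www.example.net/other#size", authority_for))
```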

It is expected that RDF/XML instances will be subject to processes such as digital signing etc. for the purposes of
user-trust.

ContentLabel (Class)

An instance of this class is a single descriptive label for content which may be applied to one or more resources.

Properties. The following properties may be specified for a Content Label instance:

hasModifier specifies the modifiers for the Content Label.

Any subproperty of the descriptor property.

Category (Class)

A category is a grouping of related content descriptors. These groupings may be thematic but this is not a constraint
on category instances in general.

Properties

hasDescriptor specifies the descriptors which make up this category.

descriptor (Property)

A descriptor defines a single form of content which may or may not be present in a resource. When labelling web
resources, a descriptor is used as a property of the Content Label that it applies to. This means that a descriptor
has a range of allowed values. The range of allowed values is not constrained.

hasDescriptor (Property)

This property connects a category to the descriptors that make up that category. It can be used by applications to
quickly list what all the possible descriptors for a category are.

isMoreSevereThan (Property)

The isMoreSevereThan property allows the scheme compiler to express a relationship between vocabulary terms
such that if A isMoreSevereThan B then declaring A makes a separate declaration of B redundant. The typical example would be an
age-based rating scheme in which material suitable for 12 year olds is automatically suitable for 15 year olds as
well.

Modifier (Class)

A modifier provides context for a Content Label as a whole. Each content labelling scheme may define its own set of
modifiers.

hasModifier (Property)

This property connects an instance of the Modifier class to the Content Label that it modifies.

hasLabel (Property)

This is a property that links a resource to the Content Label that labels that resource.

hasDefaultLabel (Property)

This property links the Ruleset to a Content Label that provides a default label for all resources in the group. It
may be overridden if a hasLabel property points to a different Content Label.

hasManagementInfo (Property)

This is a property that links a resource to a Content Label that contains management information such as the
Dublin Core [DC] metadata set, licensing information etc.

hasDefaultManagementInfo (Property)

This property links the Ruleset to a Content Label that provides default management information for the
group of resources. It may be overridden if a hasManagementInfo property points to a different Content Label.

hasClassification (Property)

This property links to a Class that provides a closed description of a resource, such as a subject classification, a
star rating or an age-classification.

hasDefaultClassification (Property)

This property of the Ruleset provides the default classification for the resources in the group.

hasFrequentScenes (Property)

This is a property that links a resource to a Content Label that labels that resource. It indicates that the resource,
typically a movie or game, has frequent scenes of the type described by the Content Label; however, it is not a
complete description.

hasSeveralScenes (Property)

This is a property that links a resource to a Content Label that labels that resource. It indicates that the resource,
typically a movie or game, has several scenes of the type described by the Content Label; however, it is not a
complete description.

hasOccasionalScenes (Property)

This is a property that links a resource to a Content Label that labels that resource. It indicates that the resource,
typically a movie or game, has occasional scenes of the type described by the Content Label; however, it is clearly not a
complete description.

hasSingleScene (Property)

This is a property that links a resource to a Content Label that labels that resource. It indicates that the resource,
typically a movie or game, has a single scene of the type described by the Content Label; however, it is clearly not a
complete description.

Ruleset (Class)

The Ruleset Class defines the group of URIs for which descriptions are available and how the correct description may
be identified for a given URI.

Properties

hasHostRestrictions links to the Hosts class that, in turn, specifies the host(s) for which data is applicable

hasURI specifies an additional Perl5 regular expression that must match the URI to which
labels are being applied. If multiple hasURI properties are given, the target URI SHOULD match at least one of them.

hasDefault...

hasDefaultLabel

hasDefaultManagementInfo

hasDefaultClassification

These properties link to descriptions (labels or classifications) that should be applied to the grouped resources
unless a resource links to a specific description either directly or through processing the rules.

rules links to the list of rules to be processed in order to identify descriptions to apply
instead of the defaults.

hasHostRestrictions (Property)

This property links a Ruleset to a Hosts class.

Hosts (Class)

The Hosts class (defined as a sub class of Ruleset) is a placeholder for a list of one or more hosts for which labels
are available.

Property

hostRestriction This property defines a host for which labels are available.

hostRestriction (Property)

As described above, this property defines a host for which labels are available. Subdomains of the given host
are in scope.

rules (Property)

This is a specialisation of rdf:Seq that contains classes that define which description(s) should be used for which URIs.
The rules SHOULD be processed in sequence, breaking out of the sequence on the first match. Individual rules are created as
simple rdf:Description, UnionOf or IntersectionOf Classes with hasURI properties.

IntersectionOf (Class)

This is a specialisation of rdf:Bag. If all of the hasURI properties of the containing Class
match the resource, the Content Label(s) and/or classification associated with it should be applied.

UnionOf (Class)

This is a specialisation of rdf:Bag. If any of the hasURI properties of the containing Class
match the resource, the Content Label(s) and/or classification associated with it should be applied.
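
The difference between the two rule types can be sketched in Python as below. As elsewhere, Python's re module stands in for Perl5 regular expressions, and the function name rule_matches and the sample patterns are assumptions for illustration.

```python
import re


def rule_matches(kind, patterns, uri):
    """Evaluate a rule's hasURI patterns against a URI.

    kind: "IntersectionOf" (all patterns must match) or
          "UnionOf" (any one pattern must match).
    """
    hits = [re.search(pattern, uri) is not None for pattern in patterns]
    if kind == "IntersectionOf":
        return all(hits)
    if kind == "UnionOf":
        return any(hits)
    raise ValueError(f"unknown rule type: {kind}")


uri = "http://www.resources.com/fabrics/evening.html"
print(rule_matches("UnionOf", ["night", "evening"], uri))
print(rule_matches("IntersectionOf", ["fabrics", "night"], uri))
```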

hasURI (Property)

This property declares a string that should be processed as a Perl5 regular expression and matched against the
resource in question.

authorityFor (Property)

This property stands apart from the others. It has a domain of rdf:resource and a range of xsd:string. The value
of the property is a labelling scheme's namespace. It is used as a property of a description
of the RDF instance itself that SHOULD include a URI, given as the object of a dc:creator predicate (or similar), from
which information can be gained about how a user or agent may "test" the Content Labels. The authorityFor
property allows organisations to make statements about the veracity of an RDF instance with respect to labels using the
given namespace without making any comment on statements made using other namespaces. Although the value will be a URI, it is a
string, not an rdf resource since it does not form part of the descriptive graph.

Creating a Binary Labelling Scheme

This section describes one method of creating a binary labelling scheme, that is, one that declares whether a
particular property is or is not present in the labelled resource without implying any relationship between those
properties.

Identifying, Naming and Describing Scheme Components

The schema makes use of basic RDF functionality for identifying, naming and describing the components that make up
a labelling scheme.

Each component of the scheme is assigned an ID. This ID, when combined with the base URL of the RDF resource that
describes the scheme, gives a unique URI identifier for the component.
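
For instance, the combination of ID and base URL can be reproduced with standard URL resolution, as in this Python sketch. The scheme URL and the ID "colour" are hypothetical values chosen for illustration.

```python
from urllib.parse import urljoin


def component_uri(base_url, component_id):
    """Combine the base URL of the scheme document with a component ID."""
    # An ID is resolved as a fragment identifier relative to the base URL.
    return urljoin(base_url, "#" + component_id)


print(component_uri("http://www.example.org/scheme.rdf", "colour"))
```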

Each component should always be assigned a short name. This should be a name suitable for display in a user
interface and should be consumer-oriented in nature. A good short name would be "Appearance" or
"Colour"; a bad short name would be "ax" or "ca." RDF provides a mechanism for these short
names by using the rdfs:label property. A component can have any number of rdfs:label property
values, although it is STRONGLY recommended that they should be distinguished from each other using an xml:lang
attribute and that there should be only one label per language.

A component may also be assigned a longer description that might be displayed to a user as pop-up help text. For
this description, use the RDF-defined rdfs:comment property. Again, multiple rdfs:comment
labels may be provided, but should be distinguished by language using the xml:lang attribute.

Finally, a component may also contain a link to another web resource that provides a much more detailed description.
For this link, use the RDF-defined rdfs:seeAlso property. The value of this property MUST
be an RDF resource URI.

Define Categories

Each category in a labelling scheme has the identifier, name and descriptions described above, and a list of the
descriptors that are part of that category. The descriptors are linked to the category using the
label:hasDescriptor property. As there is a list of descriptors, and we want the list to be
closed (i.e. no more can be added to the list without modifying our vocabulary file), we specify the
hasDescriptor property value as a collection.

Each descriptor must be defined as being a subPropertyOf the descriptor property.

Define Modifiers

Each modifier is simply defined as an instance of the label:Modifier class. Modifiers
should be defined with names and descriptions as described above, but there is no need to define any other properties
for a modifier.

Defining a Hierarchical Classification System

This section describes a method of creating a hierarchical classification scheme, that is, one in which the
declaration of one element supersedes another. The simplest example is of an age-based system - the higher the age
classification, the older the audience should be.

In Example 12, a series of ages are defined. This is in line with, for example, movie ratings. The
label:isMoreSevereThan property means that if we declare that a resource has a
classification of a16, then information about the classifications lower down can be inferred. In other words, if a
user has access to resources with the a16 classification, they automatically have access to those with classifications
of a12, a6 and All.
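
The inference described above amounts to following the isMoreSevereThan links transitively. The Python sketch below does this for the four classifications named in the text; it assumes each term has at most one directly less severe term, and the dictionary representation and the function name implied_classifications are illustrative only.

```python
# Each key isMoreSevereThan its value (assumed chain: a16 > a12 > a6 > All).
IS_MORE_SEVERE_THAN = {"a16": "a12", "a12": "a6", "a6": "All"}


def implied_classifications(classification, relation=IS_MORE_SEVERE_THAN):
    """Return the less severe classifications implied by the given one."""
    implied = []
    current = classification
    while current in relation:
        current = relation[current]
        # Guard against a badly formed scheme that loops back on itself.
        if current == classification or current in implied:
            raise ValueError("cycle in isMoreSevereThan relation")
        implied.append(current)
    return implied


print(implied_classifications("a16"))
```

A user permitted to see a16 resources would therefore also be permitted to see anything classified with one of the returned terms.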

Linking to the classification.

If the URI of the classification schema is http://www.classification.org/# then any resource can be classified using
the following (X)HTML Link tag:

Example 14. Example showing the same classification as Example 13 applied within a fragment of an RDF instance

Finally, the classification can be applied (and overridden) using the
label:hasDefaultClassification and label:hasClassification methods
exemplified in the use cases document [USE], in particular, use case 4.

Hierarchical Classifications within Content Labels

The examples above show that the label:hasDefaultClassification and label:hasClassification
properties can be used to connect resources to a simple descriptive Class. It is equally valid to use hierarchical
classifications within Content Labels.

The Example Geological Society publishes the scale of hardness and the Wentworth scale of size as hierarchical
schemas [GEO]. Using these scales, a
Content Label can be created such as that shown in Example 15.

Change log

Revision dated 2005-07-04 modifies the implementation of hostRestriction by introducing the Hosts class and
the linking property hasHostRestrictions. Previous versions of the document simply had label:hostRestriction as a
property of the label:Ruleset class. The amended version allows use of consistent SPARQL queries
whether the host restrictions are given in the same RDF instance as the rest of the data or held separately.