Dear all,
I have revised the description of the Vocabulary Management
Task Force (below). It ended up turning into something of
an outline:
1. First we define our terms.
2. We articulate our assumptions regarding the scope of
"vocabulary use in a Semantic Web context".
3. We formulate principles of good practice for identifying
and declaring terms and term sets.
4. We identify and summarize related problems about which
good practice is still evolving.
After reviewing recent discussion on the list as well as
materials from Tim Berners-Lee, SKOS, OASIS Published Subjects,
the Proposed TAG Finding on Versioning XML Languages, etc.,
and of course DCMI, I feel some hope that agreement on parts
of the "good practice" section (Section 3) might actually
be achievable...
I picture the deliverable as roughly fifteen pages long,
which means no more than one page each even for
the hairiest of the bullet points in Section 4. I'm thinking
we could perhaps divide up responsibility for drafting these
points among the Task Force members.
Ideally I would sit on this draft for a few days, but I want
to get this out before the telecon this afternoon. The next
telecon falls on my first day of vacation (July 8), and I
return in August, which is a hopeless month for group work.
In today's call I would like to establish whether the document
is good enough as a task-force description to turn into a
First Draft and move ahead with in September.
Tom
P.S. Note that I have listed as Members everyone who
indicated even a tentative interest in the TF.
P.P.S. If Ralph can give me CVS Put access, I'd be happy
to move the draft to the CVS space.
-----
SWBPD "Vocabulary Management" Task Force Description
Draft, 2004-06-24
NAME
Vocabulary Management
STATUS
Considered
COORDINATORS
Tom Baker and ?
MEMBERS
Libby Miller
Natasha Noy
Dan Brickley
Alistair Miles
Alan Rector
James Hendler
Aldo Gangemi
Bernard Vatant
Ralph Swick
OBJECTIVES
1. To establish the terminology for our discussion of the
declaration, identification, use, and management of
vocabulary terms in a Semantic Web environment -- something
roughly along the lines of:
-- Term
-- Vocabulary (a set of Terms)
-- Namespace (hmm...)
-- Namespace URI (identifies a Namespace)
-- Namespace Owner (controls a Namespace)
-- Language (uses and mixes Vocabularies)
-- Versioning (identification of changes to a Language)
-- Term Concept (notional)
-- Term URI (identifies a Term Concept)
-- Term Annotation (a representation of or gloss on a Term Concept)
-- Term Version (an identifiable state of a cluster of Term Annotations)
-- Term Version URI (identifies a Term Version)
-- Term Declaration (represents a term in a machine-processable schema
language)
-- Namespace Document (definitive material about a Namespace)
-- Namespace Schema (definitive material about a Namespace in a
machine-processable schema language).
2. To articulate assumptions regarding the use of terms in
a Semantic Web environment, including:
-- Open, loosely-coupled, mixed-language environments
("the Web").
-- Organizations or even individuals defining and publishing
vocabulary terms in an open, bottom-up, and distributed
process (as both desirable and de facto).
-- The need to support processes of referencing,
repurposing, recombining, and merging data from a diversity
of sources.
-- The need to support the inevitable evolution of languages
("evolvability").
-- The Must Ignore Principle: "If you find a language element
you don't understand, ignore it" (e.g., IETF practice,
Tim Berners-Lee, TAG Finding on Versioning).
-- The Principle of Free Extension: "Allow extensibility:
language designers should create extensible languages"
(TAG Finding on Versioning). Languages are extensible
if they can mix Vocabularies.
-- An emerging infrastructure (keyword "registries") for
holding or harvesting Vocabularies for display, search,
tool configuration, inferencing, or other such services.
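As a rough illustration of the Must Ignore Principle, here is a minimal sketch in Python. It assumes a description is simply a mapping from Term URIs to values; the function name, the example.org term, and the choice of Dublin Core element URIs as the "understood" set are illustrative assumptions, not part of any specification.

```python
# Hypothetical set of Term URIs that this consumer understands.
KNOWN_TERMS = {
    "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/elements/1.1/creator",
}


def must_ignore(description, known_terms=KNOWN_TERMS):
    """Keep only the statements whose Term URI is understood.

    Unknown terms are skipped silently rather than treated as errors,
    so data that mixes Vocabularies still yields a usable result.
    """
    return {term: value for term, value in description.items() if term in known_terms}


# A record that mixes a known vocabulary with an unknown extension term.
record = {
    "http://purl.org/dc/elements/1.1/title": "Vocabulary Management",
    "http://example.org/terms/shelfmark": "VM-2004-06",  # hypothetical, not understood here
}
understood = must_ignore(record)
```

The point of the sketch is that the consumer never rejects the record: the extension term is simply dropped, which is what lets Namespace Owners extend a language without breaking existing software.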
3. To articulate guidelines of good practice for Namespace
Owners to identify and declare Terms and Term Sets (Vocabularies)
for use in a Semantic Web environment. Something like:
-- Identify Terms using URIs.
-- Term URIs should remain stable within the limits of
"semantically compatible" change and evolution of the
Terms identified (where "semantically compatible"
is defined with respect to backward and forward
compatibility, as in the TAG Finding on Versioning).
-- Associate URI-identified Terms with human-interpretable
Term Annotations -- usually, at a minimum, with text
defining the Term.
-- Consider associating the URI-identified Terms with
machine-processable Term Declarations in Namespace
Schemas.
-- Optionally, identify Term Versions using URIs.
Follow (by analogy) the W3C method of distinguishing
the timeless "Latest Version" from the date-stamped
"This Version" and "Previous Version" (is this method
formally described anywhere?).
-- The Namespace Owner should publish a description of
the terms identified by URIs and of the policies
governing their maintenance, e.g., expectations
about persistence, institutional commitment, and
semantic stability.
-- Only a Namespace Owner should change the meaning of a Term
in a namespace (though non-owners may constrain meanings in
semantically compatible ways for use in specific contexts).
-- When making assertions about terms belonging to another
Namespace Owner, consider seeking their endorsement of
those assertions ("assertion etiquette" or "good neighbor"
policies).
-- Version Namespace Documents and Namespace Schemas the way
W3C versions documents and schemas.
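The distinction between a stable Term URI and optional, date-stamped Term Version URIs might be sketched as follows. This is only an illustration: the class, the example.org namespace, the sample definitions, and above all the convention of appending an ISO date to the Term URI are assumptions made for the example, not an established method.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Term:
    uri: str  # stable Term URI, playing the role of "Latest Version"
    versions: list = field(default_factory=list)  # (issued, version URI, definition)

    def add_version(self, issued, definition):
        # Hypothetical convention: the version URI is the Term URI plus an ISO date.
        version_uri = f"{self.uri}/{issued.isoformat()}"
        self.versions.append((issued, version_uri, definition))
        self.versions.sort(key=lambda v: v[0])  # keep versions in date order
        return version_uri

    def this_version(self):
        """URI of the most recent date-stamped Term Version."""
        return self.versions[-1][1]

    def previous_version(self):
        """URI of the Term Version before the most recent one, if any."""
        return self.versions[-2][1] if len(self.versions) > 1 else None


term = Term("http://example.org/terms/title")
term.add_version(date(2003, 1, 15), "A name given to the resource.")
term.add_version(date(2004, 6, 24), "A name by which the resource is formally known.")
```

Data would cite `term.uri`, which never changes, while `term.this_version()` and `term.previous_version()` let a reader pin down exactly which state of the annotations was in force at a given time.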
4. To point to and briefly summarize the evolving
diversity of practices and approaches to declaring and
managing vocabularies. The following problems should each
be discussed in one page or less:
-- The problem of resolving (dereferencing) Term URIs.
URI-identified Terms should be associated with or
resolve to what sort of human-interpretable Term
Annotations or machine-processable Term Declarations?
The VM note should summarize the state of discussion
about whether a URI resolves to anything at all, and if
so, whether to a Web page, a machine-processable schema
(of whatever flavor), or a resource directory, pointing
to examples in practice. If Terms are documented in
multiple ways, should a Namespace Owner distinguish
between "canonical" and "derived" sources?
-- The problem of work-flow and tools for documenting
Terms. The VM note should point to tools and methods
for maintaining multiple documentation forms, such as
schemas and Web pages.
-- The problem of finding versus becoming a Namespace
Owner. People want to know: "If I want to declare
a term but lack the institutional context to support
a persistent namespace policy, how can I do it?
Should I use an existing term, get a Namespace Owner
(such as DCMI) to declare one, or declare my own?
If I were to coin my own URI, where could I put it?"
-- The problem of describing Terms. What are the properties
of a Term Annotation or Term Declaration? Besides
a Definition, what are some of the properties
more commonly in use? How important is it for
interoperability to use existing properties in Term
Annotations or Term Declarations?
-- The schema language of a Term Declaration: The
VM note should not take a stand on the use of
a particular flavor of OWL/RDF+S for declaring a
vocabulary but should simply point to documents
which focus on this issue.
-- The formation of URIs. The issues here include
"hash or slash", the implied semantics of language
strings and of implied directory hierarchies in URIs,
and the use of version numbers in URI strings.
-- Application profiles. Most vocabulary initiatives
end up having some notion of "profile" to designate
either a constrained subset of a vocabulary or
a language which mixes multiple vocabularies for
a particular purpose or application. The VM note
should characterize the nature of these constructs,
possibly referring to notions such as Term Usage (a
cluster of Term Annotations about a Term of which one
is not the Namespace Owner).
-- The problem of "semantic context". Terms may be
embedded in clusters of relations from which they
may be seen in part to derive their meaning. It may
therefore not always be sensible to use those terms out
of context. Examples include the terms of thesauri
or ontologies, as well as XML elements, which may
be defined with respect to parent elements and may
therefore not always be reusable as properties in an
RDF sense without violating their semantic intent.
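To make the "hash or slash" question under the formation of URIs more concrete, the following sketch splits a Term URI into a Namespace URI and a local name under each convention. The function name and the example.org URI are hypothetical; the FOAF URI simply builds on the namespace listed under Dependencies.

```python
def split_term_uri(term_uri):
    """Split a Term URI into (Namespace URI, local name).

    Hash style: everything up to and including '#' is the namespace.
    Slash style: everything up to and including the last '/' is the namespace.
    """
    if "#" in term_uri:
        namespace, _, local = term_uri.partition("#")
        return namespace + "#", local
    namespace, _, local = term_uri.rpartition("/")
    return namespace + "/", local


hash_style = split_term_uri("http://example.org/vocab#Concept")  # hypothetical URI
slash_style = split_term_uri("http://xmlns.com/foaf/0.1/name")
```

The choice matters for dereferencing: a client does not send the fragment after "#" to the server, so every hash-style Term URI retrieves the same namespace document, whereas each slash-style Term URI can be requested on its own.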
APPROACH
The issues above have been discussed and documented in
various vocabulary maintenance communities. The Task
Force deliverable will provide an overview of the issues
and principles involved in declaring and maintaining
a vocabulary, pointing to available examples of good
practice. In order to do this, it must first define
a common terminology for describing the diversity of
practices in a comparable manner.
SCOPE
Guidelines and principles for the identification,
declaration, and management of Terms in Vocabularies
(Metadata Element Sets, Thesauri, Ontologies, Published
Subjects, and the like).
DELIVERABLE
A relatively concise (fifteen-page?) technical note
summarizing principles of good practice, with pointers to
examples, about the identification of terms and term sets
with URIs, related policies and etiquette, and expectations
regarding documentation.
TARGET AUDIENCE
-- Maintainers of terms and term sets (vocabularies)
for use in a Semantic Web environment.
-- Anyone else wishing to declare terms reusably.
DEPENDENCIES (in the broadest sense)
-- THES - SWBP Thesaurus Task Force
http://www.w3.org/2004/03/thes-tf/mission
-- FOAF
http://xmlns.com/foaf/0.1/
http://www.w3.org/2001/sw/Europe/events/foaf-galway/
-- Dublin Core - DCMI, for example:
http://dublincore.org/documents/dcmi-namespace/
http://dublincore.org/documents/dcmi-terms/
-- Dublin Core - CEN MMI-DC Working Group
http://www.bi.fhg.de/People/Thomas.Baker/Versioning-20040611.txt
http://www.cenorm.be/isss/cwa14855/
-- Proposed TAG Finding on Versioning XML Languages
http://www.w3.org/2001/tag/doc/versioning/
-- SKOS - SWAD Europe
http://www.w3.org/2001/sw/Europe/reports/thes/1.0/guide/
http://www.w3.org/2004/skos/core.rdf
http://www.w3c.rl.ac.uk/2003/11/21-skos-mapping
-- W3C TAG on "What should a 'namespace document' look like?"
http://www.w3.org/2001/tag/issues.html#namespaceDocument-8
-- SWAD-E Thesaurus (wants "standard" thesaurus change management guidelines)
http://lists.w3.org/Archives/Public/public-esw-thes/2004Apr/
-- Image Annotation meeting in Madrid
http://rdfig.xmlhack.com/2004/06/07/2004-06-07.html#1086615887.400193
-- Tim Berners-Lee on Evolvability
http://www.w3.org/DesignIssues/Evolution.html
-- OASIS Published Subjects Technical Committee
http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=tm-pubsubj
http://www.oasis-open.org/committees/tm-pubsubj/docs/recommendations/issues.htm
-- OASIS (ISO/TS 15000) ebXMLRegistry Semantic Content (Carl Mattocks)
-- Libby and Dan's work on RDF query
http://www.ilrt.bris.ac.uk/discovery/2001/06/process/
-- Sandro's work on a vocabulary directory (reference needed)
-- Alan: experience in medical contexts with large vocabularies
-- Alistair: recommendations for change management
-- CORES Resolution on Metadata Element Identifiers
http://www.dlib.org/dlib/july03/baker/07baker.html
--
Dr. Thomas Baker Thomas.Baker@izb.fraunhofer.de
Institutszentrum Schloss Birlinghoven mobile +49-160-9664-2129
Fraunhofer-Gesellschaft work +49-30-8109-9027
53754 Sankt Augustin, Germany fax +49-2241-144-2352
Personal email: thbaker79@alumni.amherst.edu