Background

One of the major stumbling blocks in deploying RDF has been the
difficulty data providers have in determining which vocabularies
to use. For example, a publisher of scientific papers who wants
to embed document metadata in the web pages about each paper has
to make an extensive search to find the possible vocabularies and
gather the data to decide which among them are appropriate for
this use. Many vocabularies may already exist, but they are
difficult to find; there may be more than one covering the same
subject area, with no clear indication of which ones have a
reasonable level of stability and community acceptance; or there
may be none, in which case one has to be developed, and it is
unclear how to make the community aware that such a vocabulary
exists.

There have been several attempts to create vocabulary catalogs,
indexes, etc., but none of them has gained general acceptance, and
few have remained available for very long. The latest notable
attempt is LOV (Linked Open Vocabularies), created and maintained
by Bernard Vatant (Mondeca) and Pierre-Yves Vandenbussche (DERI) as
part of the DataLift project. Other application areas have more
specific, application-dependent catalogs; e.g., the HCLS community
has established application-specific "ontology portals" (vocabulary
hosting and/or directory services) such as NCBO and OBO. (Note that
for the purposes of this document, the terms "ontology" and
"vocabulary" are synonyms.) Unfortunately, many past cataloging
projects depended on a single project or a few individuals, and
more often than not they became obsolete after a while.

Initially (1999-2003) W3C stayed out of this process, waiting
to see if the community would sort out this issue by itself. We
hoped to see the emergence of an open market for vocabularies,
including development tools, reviews, catalogs, consultants, etc.
When that did not emerge, we decided to begin offering ontology
hosting (on www.w3.org) and we began the Ontaria project (with
DARPA funding) to provide an ontology directory service.
Implementation of these services was not completed, however, and
project funding ended in 2005. After that, W3C took no active
role until the emergence of schema.org and the eventual creation
of the Web Schemas Task Force (WSTF) of the Semantic Web Interest
Group. WSTF was created both to provide an open process for
schema.org and to serve as a general forum for people interested
in developing vocabularies. At this point, we are contemplating
taking a more active role in supporting the vocabulary ecosystem.

Business model/member benefits

The W3C vocabulary management proposals set out here have
emerged from our extensive discussions around the strategic
direction that W3C should take in the Semantic Web and eGov
Activities. They answer a clear community need and in that sense
are simply something that W3C should do. Arguably, this is work
W3C should have been doing for a long time, but that's water under
the bridge. This work is needed to underpin the linked and open
data visions. In that respect, it is something our members,
actual and potential, can reasonably expect of us.

The proposals below are simple and easy to carry out. Each has the
potential to generate tremendous value for the community. We
suggest waiting until the value is clearly present before putting
much effort into monetization. At a few points below, we note
some possible revenue sources. In addition, once this work has
demonstrated its value, it may be easier to obtain grant funding
to improve it.

Vocabulary Management Activities

1. Vocabulary Providers Group

Goal: provide a forum for experts to talk to each other and
newcomers to talk to the experts. This group can also help point
out and coordinate areas of overlap among vocabularies, and help
gather people into groups for new vocabularies.

Proposal: redirect the Web Schemas Task Force to take on this
role. Rename it, avoiding the word 'schema' to make clear that it
is not specifically about schema.org, but keep the mailing list
name (public-vocabs@w3.org) to avoid disruption. Add another
chair. Possible names include "Vocabulary Advice Task Force" and
"Vocabulary Coordination Group".

If possible, the group should host regular discussions on
general vocabulary development issues and answer questions. It
might also hold regular presentations in which one group presents
its vocabulary to the wider audience to get feedback (like W3C
staff project reviews).

2. Domain-Specific Vocabulary Groups

Goal: provide a forum for the people involved in each
vocabulary to communicate and share material. Provide a
trusted archive of public comments.

Proposal: in general, use Community Groups (CGs), with their
normal tools (mailing lists, wikis, etc.). Use Working Groups
in situations where enough W3C members want a more formal
process, possibly more restricted participation, and the stamp
of "W3C Recommendation" on the vocabulary. (possible revenue source)

This is already done sometimes, as with the Open Annotation
CG. We can encourage more people to do this by linking it
with other services, such as vocabulary hosting (item 3,
below).

Note that vocabularies have somewhat different stability
and interoperability characteristics than most W3C Recommended
technologies, so the full Recommendation Track is often not
warranted. A well-constructed and properly published
vocabulary seems to be taken at least as seriously by much of
the market as a W3C-Recommended one.

3. Vocabulary Hosting

Goal: Make it practical, even easy, for people to publish their
vocabulary namespace document in accordance with best practice,
especially with regard to long-term stability.

Proposal: offer URLs starting with
http://www.w3.org/ns/ to any W3C group, including
Community Groups, as long as that group has an open/consensus
decision process. Provide a simple Web interface for people
to allocate a namespace and then update the contents of the
namespace document as needed. This would be subject to
reasonable terms of service, including the understanding that
individuals act as editors, on behalf of the group, and that
W3C has ultimate authority over the content.

The W3C
Namespace Policy already allows W3C groups to claim URLs
starting with http://www.w3.org/ns/, but that
policy was written before Community Groups existed and its
applicability to them is unclear. At the moment, requests to
allocate names and requests to update the contents of a
namespace document have to be handled by W3C staff.

In a second iteration, the W3C vocabulary hosting service could provide
various tools which support or even enforce good practice in
vocabulary development, such as not removing terms that people
might be using. Existing Web-based tools such as WebProtege
(from Stanford), Neologism (from DERI), and Knoodl (from
Revelytix) should be considered.
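As an illustration of the kind of check such tools might perform, the following Python sketch flags an update that drops terms from a namespace document. It is hypothetical: it assumes the hosting service can already extract the set of term names defined in the old and new versions of a document, and the function names `removed_terms` and `check_update`, as well as the sample terms, are invented for this example.

```python
def removed_terms(old_terms: set[str], new_terms: set[str]) -> set[str]:
    """Terms defined in the previous version but absent from the new one."""
    return old_terms - new_terms


def check_update(old_terms: set[str], new_terms: set[str]) -> list[str]:
    """Hypothetical good-practice check run before accepting an update.

    Removing a term can break data that already uses it, so each removal is
    reported; adding new terms is always allowed.
    """
    return [
        f"term '{term}' was removed; consider deprecating it instead"
        for term in sorted(removed_terms(old_terms, new_terms))
    ]


# Example: 'citedBy' disappears in the new version, 'cites' is added.
problems = check_update({"Paper", "author", "citedBy"}, {"Paper", "author", "cites"})
print(problems)  # ["term 'citedBy' was removed; consider deprecating it instead"]
```

Reporting problems rather than rejecting the update outright leaves room for a "support" mode; an "enforce" mode would simply refuse any update for which the list is non-empty.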

A key advantage of W3C hosting over most other available options
is that we can handle changes in personnel, business models,
governments, corporate mergers, etc., through our normal group
consensus processes.

Q: What if someone wants to host a vocabulary at W3C, but
does not want to turn over control to the group? A:
For now, tell them to come talk to us and we'll consider it on
a case-by-case basis. (possible revenue source)

Q: What if a name is allocated and never used? A: Abandoned
vocabulary names may be reclaimed, depending on evidence of
their use.

Q: What about name conflicts? What if someone wants to
claim "html" or "google" as a namespace name? A: The
terms of service will require that groups declare they have
made reasonable effort to find other uses of the term,
considered them, and concluded there is no significant
likelihood of user confusion. If such confusion is reported,
especially in the early days of the term being used, we may
reallocate the name.

Q: Can we use https://www.w3.org/ns/ (TLS secure)?
A: That's included automatically; all of www.w3.org is simultaneously served with TLS.

Q: What about http://id.w3.org/?
A: We are considering this, as a way to better manage the load.

Q: What about http://www.w3.org/yyyy/mm/?
A: It's a possibility, if someone wants that. Does anyone?

Q: What about http://www.w3.org/ns/foo/bar (subdirectories)?
A: Yes, subdirectories will be supported. (Perhaps this simply amounts to reserving a prefix.)

Q: What about http://foo.org/?
A: Possibly in the future, to
accommodate vocabularies that started outside W3C or that may
someday need to move away from W3C. (possible revenue
source)
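A minimal sketch of the allocation rules discussed above, written in Python with an in-memory table standing in for whatever the real service would use. Everything here is an assumption for illustration: the `NamespaceRegistry` class, the naming pattern, the group name "Papers CG", and the rule that a subdirectory such as `foo/bar` can only be claimed by the group already holding `foo` (one possible reading of "reserve-a-prefix").

```python
import re

BASE_HTTP = "http://www.w3.org/ns/"
BASE_HTTPS = "https://www.w3.org/ns/"

# Hypothetical naming rule: lowercase segments of letters, digits and hyphens,
# optionally nested with "/" for subdirectories such as "foo/bar".
NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]*(/[a-z0-9][a-z0-9-]*)*")


class NamespaceRegistry:
    """Toy in-memory record of which group holds which name under /ns/."""

    def __init__(self) -> None:
        self.owners: dict[str, str] = {}  # name -> owning group

    def allocate(self, name: str, group: str) -> str:
        """Claim `name` for `group` and return its http namespace URL."""
        if not NAME_PATTERN.fullmatch(name):
            raise ValueError(f"invalid namespace name: {name!r}")
        if name in self.owners:
            raise ValueError(f"{name!r} is already allocated to {self.owners[name]}")
        # Assumption: "foo/bar" may only be claimed by the group holding "foo".
        top = name.split("/")[0]
        if top != name and self.owners.get(top) != group:
            raise ValueError(f"prefix {top!r} is not held by {group}")
        self.owners[name] = group
        return BASE_HTTP + name

    @staticmethod
    def https_variant(url: str) -> str:
        """The same namespace URL as served over TLS."""
        if not url.startswith(BASE_HTTP):
            raise ValueError(f"not a www.w3.org/ns/ URL: {url}")
        return BASE_HTTPS + url[len(BASE_HTTP):]


registry = NamespaceRegistry()
print(registry.allocate("papers", "Papers CG"))            # http://www.w3.org/ns/papers
print(registry.allocate("papers/citations", "Papers CG"))  # http://www.w3.org/ns/papers/citations
print(NamespaceRegistry.https_variant("http://www.w3.org/ns/papers"))  # https://www.w3.org/ns/papers
```

Under these assumptions, a second group trying to claim `papers`, or any `papers/...` subdirectory, gets a `ValueError`; human review of name conflicts (such as "html" or "google") would still happen outside this check.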

4. Vocabulary Selection Metadata

Goal: Make sure that people selecting vocabularies have
the data they need to make a good choice. Some of this data
will be provided by the vocabulary providers (first-party
metadata, self-reported) while some will be provided by others
(third-party metadata).

Proposal: ask a group (maybe the experts group, item 1,
above, or maybe a new CG) to come up with a vocabulary
for this metadata, promote it, and also use it in the W3C
vocabulary directory (item 5, below).

Some of this is currently in-scope for the Government
Linked Data (GLD) Working Group, under Best Practices for
Vocabulary Selection ("... issues of stability, security, and
long-term maintenance commitment...").

Some possible items:

Who/what is publishing data, and who/what is consuming data, using the vocabulary? (Might be broken down by each term in the vocabulary)

What public comments have been made about this vocabulary, and how have they been addressed? Maybe separate different kinds of comments, e.g., bug reports, editorial suggestions, and in-depth professional reviews.

Who has been involved in the development/maintenance of this vocabulary? How were these people selected/recruited, and what decision process was used among them? (See the UK Government's 2012 consultation on open standards for a definition of the term.)

Is the vocabulary encumbered by a restrictive license?

Is the vocabulary encumbered by patents?

Does the vocabulary complement existing vocabularies or does
it duplicate/compete with them?

Does the vocabulary have proper basic metadata, on
authorship, provenance, etc.?

Is the vocabulary actively maintained?

What is the vocabulary's versioning policy? In particular, under what circumstances might a term's definition be changed or removed?

Have appropriate domain experts been involved in the
development process, or at least reviewed it?

Is there a credible fallback should the organization
currently responsible no longer be able or willing to maintain
it in the public interest?

5. Vocabulary Directory

Goal: Provide vocabulary consumers (people publishing and
in some cases consuming RDF data) with a convenient way to find
the vocabularies that might work for them, along with metadata
to help them choose among the options.

Proposal: in the first iteration, make a simple web page
showing all the vocabularies we host, along with all the others
that have been reported to us (via a web form which asks for
basic metadata). For each vocabulary, provide some of the
metadata from item 4, above. Depending on available resources
and on feedback, grow this into a more complete "shopping site"
where people can search, sort, and filter on various criteria,
as well as enter their own ratings, reviews, and other
metadata.

Q: Do we include all known vocabularies, or only ones that
seem pretty good? A: We include any vocabulary for which
someone is willing to fill out the form, or which already
embeds the basic required metadata.

Q: Do we include en masse the vocabularies known to
NCBO, LOV, prefix.cc, etc.? A: Probably not in the first
iteration, because without good search and filter tools, those
will dwarf the others. At the start, focus on vocabularies
hosted at W3C or which people specifically request to be
included, via the submission form.
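The first-iteration inclusion policy from the answers above could be expressed roughly as follows. This Python sketch is hypothetical: the `DirectoryEntry` fields, the `source` values (`"w3c"`, `"form"`, and `"bulk"` for en-masse imports from catalogs such as NCBO or LOV), the optional subject filter, and the sample entries are illustrative assumptions, not a design the directory has committed to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    name: str
    namespace: str
    source: str               # "w3c" (hosted here), "form" (submitted), "bulk" (imported en masse)
    has_basic_metadata: bool  # basic required metadata embedded in the vocabulary itself
    subject: str


def is_listed(entry: DirectoryEntry) -> bool:
    # Inclusion rule: someone filled out the form, or the metadata is embedded.
    eligible = entry.source == "form" or entry.has_basic_metadata
    # First iteration: skip en-masse imports so they do not dwarf the rest.
    return eligible and entry.source in ("w3c", "form")


def listing(entries: list[DirectoryEntry], subject: Optional[str] = None) -> list[DirectoryEntry]:
    """Entries to show, optionally filtered by subject, sorted by name."""
    shown = [e for e in entries if is_listed(e) and (subject is None or e.subject == subject)]
    return sorted(shown, key=lambda e: e.name.lower())


entries = [
    DirectoryEntry("Papers", "http://www.w3.org/ns/papers#", "w3c", True, "publishing"),
    DirectoryEntry("annotations", "http://example.org/anno#", "form", False, "publishing"),
    DirectoryEntry("Genes", "http://example.org/genes#", "bulk", True, "biology"),
    DirectoryEntry("Drafts", "http://www.w3.org/ns/drafts#", "w3c", False, "publishing"),
]
print([e.name for e in listing(entries)])  # ['annotations', 'Papers']
```

Under these assumptions, a W3C-hosted vocabulary that lacks embedded metadata (`Drafts`) stays out until someone fills in the form, and bulk imports can be admitted later simply by relaxing the `source` check once better search and filter tools exist.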