Posted: July 12, 2007

Announcing UMBEL: A Lightweight Subject Structure for the Web

A Reference for Data Set Interoperability, Look Up and Retrieval

UMBEL is a lightweight way to describe the subject(s) of Web content, akin to the relationship “isAbout”. Its subject reference structure is meant to be simple, universally applicable, and agnostic to the form or schema of source data. UMBEL does not replace formal domain or upper ontologies and has little or no inferential power. It is merely a pool of consensus ‘proxies’ to initially describe what subjects data sets are about.

UMBEL’s design includes binding mechanisms that work with HTML, tagging or other standard practices, including various RDF schemas and more formal ontologies. Its reference subject ‘backbone’ is derived from the intersection of common subjects found on popularly used Web sites and other accepted subject references. Access and easy adoption are given preference over inferential or logical elegance.

In addition to its core reference subjects, the UMBEL project will provide look up, query, registration, pinging, and related services. The project is completely open and supported by a community process. All project products are made available without charge under Creative Commons licenses. UMBEL’s development is being backed by a number of leading open data efforts and entities; see the last section for how to get involved.

The UMBEL project stands for the Upper-level Mapping and Binding Exchange Layer. UMBEL is pronounced like “humble” — in keeping with its nature — except without the “h”. The name has the same Latin root as umbrella (umbra for shade, or umbella for parasol), meant to convey the umbrella-like nature of UMBEL’s subject bindings.

What’s the Problem?

With dozens of protocols and hundreds of thousands of potentially useful data sets, there are many challenges to getting Web data to interoperate. Two of these problems are foundational.

First, there are dozens of formalisms, schema, models and serializations for characterizing and communicating data and data content on the Web, ranging from the simplest Web page to the most formal OWL ontologies. A universal mechanism is lacking for how these variations can describe or publish to one another what they are about. This mechanism must be simple, neutral, broadly applicable and widely accepted.

Second, even if this publication mechanism existed, there is no accepted set of subjects for referencing what this diverse content is about. No attempt to date to provide a reference subject structure has been widely accepted.

Combined, these twin problems mean there are few road signs and poor road maps for how to find relevant data sets on the Web. UMBEL provides simple — but necessary — first steps to address these basic problems.

Simple, with Low Expectations

Advocates and users of various models and formalisms on the Web have their real-world reasons for embracing each form. Domain experts and various communities have their own world views, represented by their own vocabularies and structure. Only by understanding and respecting those differences can means to bridge them become widely accepted.

There is, of course, no such thing as complete objectivity or neutrality. But, from the standpoint of UMBEL and its purpose, keeping its approach simple with a minimum of structure poses the least challenge to the world views of existing publishers and data sets on the Web — and therefore the best likelihood of wide acceptance. Where choices are necessary, such as the selection of the reference subjects themselves, building from accepted Web practices and norms helps minimize bias and arbitrariness.

Thus, by necessity, UMBEL must be simple with limited ambitions. Its reference structure is merely a ‘bag of subjects’, with each subject reference only acting as a ‘proxy’ to a set of concepts that specific users may describe and refer to in their own ways. UMBEL’s core structure is completely flat, with no implied hierarchy or structure amongst its reference subjects. UMBEL’s reference subjects are simply that, proxy references and no more.

UMBEL thus has little or no inference power (though some disambiguation is possible). Inferencing, usefulness and authoritativeness are the responsibility of others. UMBEL is meant only to be a map to possible subjects; it makes no judgment as to whether those destinations are worthwhile or, indeed, even correct.
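As a rough illustration of the flat ‘bag of subjects’ idea, the sketch below maps whatever term a publisher happens to use onto a proxy. The proxy names, their alternative labels, and the helper functions are all hypothetical, invented for this example; UMBEL’s actual subjects and look up services are not yet defined.

```python
from __future__ import annotations

# Hypothetical proxies and labels, invented for illustration only.
# The structure is deliberately flat: no proxy is the parent of another.
PROXIES: dict[str, set[str]] = {
    "automobile": {"automobile", "car", "motorcar", "auto"},
    "astronomy": {"astronomy", "stargazing"},
}


def build_index(proxies: dict[str, set[str]]) -> dict[str, str]:
    """Invert the proxy table so each known label points at its proxy."""
    index: dict[str, str] = {}
    for proxy, labels in proxies.items():
        for label in labels:
            index[label.casefold()] = proxy
    return index


def resolve(term: str, index: dict[str, str]) -> str | None:
    """Return the proxy for a publisher's own term, or None if unknown."""
    return index.get(term.strip().casefold())


index = build_index(PROXIES)
print(resolve("Car", index))           # a publisher's own wording
print(resolve(" Stargazing ", index))  # surrounding whitespace is ignored
print(resolve("unicorns", index))      # no proxy exists for this term
```

Note that the lookup only says which proxy a term points to. It says nothing about whether the data behind that term is useful or correct, which matches the limited ambitions described above.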

Consensus and Use Determine the Subject Pool

The selection of the actual subject proxies within the UMBEL core is to be based on consensus use. The subjects of existing and popular Web subject portals such as Wikipedia and the Open Directory Project (among others) will be intersected with other widely accepted subject reference systems such as WordNet and library classification systems (among others) in order to derive the candidate pool of UMBEL subject proxies. The actual methodology and sources of this process are still being determined (see the project specification for details).

The objective, in any case, is to provide a simple and transparent method for subject selection that reflects current use and consensus to the maximum extent possible. The anticipation is that the first subject candidate pool will number in the many hundreds to the low thousands of proxies.
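Since the methodology is still being determined, the following is only one possible reading of "intersecting" subject sources: normalize the labels, then keep subjects that appear in some minimum number of sources. The sample labels, the `min_sources` threshold, and the function names are assumptions made for this sketch, not part of the project specification.

```python
from __future__ import annotations


def normalize(label: str) -> str:
    """Fold case and collapse whitespace so labels from different sources compare equal."""
    return " ".join(label.lower().split())


def candidate_pool(sources: dict[str, list[str]], min_sources: int) -> list[str]:
    """Return normalized subjects that occur in at least `min_sources` sources."""
    counts: dict[str, int] = {}
    for labels in sources.values():
        # Using a set means a subject repeated within one source counts once.
        for subject in {normalize(label) for label in labels}:
            counts[subject] = counts.get(subject, 0) + 1
    return sorted(s for s, n in counts.items() if n >= min_sources)


# Hypothetical sample data, not the real contents of these sources.
sources = {
    "wikipedia": ["Astronomy", "Music", "Organic  Chemistry", "Pokemon"],
    "open_directory": ["astronomy", "music", "Recreation"],
    "wordnet": ["Astronomy", "Organic chemistry", "music"],
}

# A strict intersection requires every source to agree.
strict_pool = candidate_pool(sources, min_sources=len(sources))
# A relaxed threshold admits subjects shared by any two sources.
relaxed_pool = candidate_pool(sources, min_sources=2)

print(strict_pool)   # ['astronomy', 'music']
print(relaxed_pool)  # ['astronomy', 'music', 'organic chemistry']
```

A threshold makes the trade-off explicit: a strict intersection favors consensus but shrinks the pool, while a relaxed one grows the pool at the cost of admitting subjects that some sources do not recognize.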

UMBEL as a general subject ‘backbone’ is meant to be useful as a reference by more specific domains or ontologies, but not fully descriptive for any of them. The core, internal UMBEL ontology is to be based on RDF and written in the RDF Schema vocabulary of SKOS (Simple Knowledge Organization System).
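To give a feel for what a SKOS-based subject proxy might look like, here is a minimal sketch that writes two concepts out as Turtle. The SKOS namespace and the `skos:Concept`, `skos:prefLabel` and `skos:altLabel` terms are standard; the `umbel:` namespace URI, the slugs, and the labels are placeholders assumed for this example, since the real ontology has not been published.

```python
from __future__ import annotations

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# Hypothetical namespace; the real UMBEL subject URIs are not yet defined.
UMBEL_NS = "http://example.org/umbel/subject/"

PREFIXES = f"@prefix skos: <{SKOS_NS}> .\n@prefix umbel: <{UMBEL_NS}> .\n"


def escape(text: str) -> str:
    """Escape characters that cannot appear raw in a quoted Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def concept_to_turtle(slug: str, pref_label: str, alt_labels: tuple[str, ...] = ()) -> str:
    """Serialize one subject proxy as a skos:Concept.

    `slug` is assumed to be a simple ASCII identifier. No skos:broader or
    skos:narrower links are written, mirroring the flat core structure.
    """
    pairs = [("a", "skos:Concept"), ("skos:prefLabel", f'"{escape(pref_label)}"@en')]
    pairs += [("skos:altLabel", f'"{escape(alt)}"@en') for alt in alt_labels]
    body = " ;\n".join(f"    {predicate} {obj}" for predicate, obj in pairs)
    return f"umbel:{slug}\n{body} ."


# Hypothetical subjects, invented for illustration.
document = PREFIXES + "\n" + "\n\n".join([
    concept_to_turtle("automobile", "Automobile", ("Car", "Motorcar")),
    concept_to_turtle("astronomy", "Astronomy"),
])
print(document)
```

Alternative labels play the role of a publisher's own wording for the same subject, while the preferred label is the proxy name itself.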

Universal Applicability

Very simple binding mechanisms will be developed and extended to the most widely employed approaches on the Web. UMBEL will, at minimum, support Atom, microformats, OPML, OWL, RDF, RDFa, RDF Schema, RSS, tags (via Tag Commons), and topic maps in its first release. The simplicity of the ontology and approach will enable other formats to be easily added.

Ping, update and registration protocols will also be provided for these formats. Existing project sponsors already possess a variety of ping, update, conversion and translation utilities for such purposes.

Additional UMBEL Initiatives

Besides the core structure, the UMBEL project will also develop a second ‘unofficial’ structure of hierarchical and interlinked subject relationships. This ‘unofficial’ structure will be used solely for look up and browsing functions, and will reside external to the core UMBEL subject and binding structure. Indeed, we anticipate that many such look-up structures from other parties may evolve over time for specific purposes and viewpoints.

Finally, besides developing the UMBEL ontology, the project will also provide a data set registration service, an information and collaboration Web site, a tools clearinghouse, and support for language translations and some tools development.

How to Help and Get Involved

The initial project site is at http://www.umbel.org, including this project introduction, the draft project specification (http://www.umbel.org/proposal.xhtml), and other helpful background information. A more interactive Web site is currently under development and will be announced shortly.

Author: Mike Bergman

3 thoughts on “Announcing UMBEL: A Lightweight Subject Structure for the Web”

Your “isAbout” relationship sounds very much like dc:subject from Dublin Core. In fact, they are probably the same relationship. It might be useful for UMBEL to make the relationship between these two relationships clear.

I’m not 100% clear on what UMBEL is. Is it intended to, for example, allow me to register my Topic Maps PSI set (could be an ontology (that is, just a set of entity and property types), or a taxonomy, or some other kind of data structure) so others can find it?

If so, why RDF, and why not Topic Maps, given that inferencing was not a key requirement?

You are correct that UMBEL is similar to dc:subject or foaf:interest (there are others). However, unlike those approaches, there will be a reference set of subject ‘proxy’ names to pick from and synonym-like relationships similar to WordNet synsets. As intended, a single proxy name can therefore relate to the embracing “concept” of the subject without getting overly hung up on the precise name or description. Section 3.1 of the draft UMBEL specification, in fact, touches a bit on the use of UMBEL in relation to such other systems: “In addition to their use as a binding layer, this standard listing of subjects can also be referenced by resources described by other ontologies (e.g., dc:subject or foaf:interest).”

You are also correct on UMBEL’s intent as a means to describe the subject(s) of various data sets, applicable to Topic Maps and other data formalisms, so that others can find them.

Finally, RDF was chosen as the internal representation of UMBEL over Topic Maps for a number of reasons: 1) RDF appears to be a natural “middle ground” in the spectrum of data formalisms (see my earlier post on this); 2) SKOS, the actual intended language for UMBEL, is an RDF Schema vocabulary designed for the kind of information and classification purposes UMBEL has in mind; and 3) there are RDF/SKOS-Topic Map interoperability prospects that you have described directly and in support of the W3C. Most importantly, the choice of RDF (SKOS, actually) is really meant to be an internal representation and not prejudicial to the use of UMBEL in relation to any other data formalism.

The whole point is to be neutral (as much as possible! 🙂 ) and applicable to a number of important frameworks. As someone so involved with Topic Maps over the years, I hope you can keep an open mind at minimum and preferably a helping hand in this effort.

I encourage you to share your thoughts and concerns on the UMBEL Google forum.