We have also recently gotten an updated estimate of the size of the semantic Web and a new release of the linking open data (LOD) cloud diagram.

A New Instance of the LOD Cloud Diagram

Since DBpedia’s release, it has become the central hub of linked open data as shown by this now-famous (and recently updated!) LOD diagram [1]:

Each version of the diagram adds new bubbles (datasets) and new connections. The use of linked data, which is based on the RDF data model and uses Web protocols to name and access data, is proving to be a powerful framework for interconnecting disparate and heterogeneous information. As the diagram above shows, all types of information from a variety of public sources now make up the LOD cloud [2].

A Beginning Basis for Estimating the Size of the Semantic Web

The most recent analysis of this LOD cloud is by Michael Hausenblas and colleagues, presented at I-Semantics08 in September [3]. At the time of their analysis, the cloud contained about 50 major datasets comprising roughly two billion triples and three million interlinks. They partitioned their analysis into two distinct types: 1) single-point-of-access datasets (akin to conventional databases), such as DBpedia or Geonames, and 2) distributed records characterized by RDF ontologies such as FOAF or SIOC. Their paper [3] should be reviewed for its own conclusions. In general, though, most links appear to be of low value (though a minority are quite useful).

Simple measures such as triples or links have little meaning in themselves. Moreover, and this is most telling, all of the LOD relationships in the diagram above and the general nature of linked data to date have based their connections on instance-level data. Often this takes the form that a specific person, place or thing in one dataset is related to that very same thing in another dataset using the owl:sameAs property; sometimes it is that one person knows another person; or, it may be in other examples that one entry has an associated photo. Entities are related to other entities and their attributes, but little is provided about the conceptual or structural relationships amongst those entities.

Instance-level mappings are highly useful for aggregating various attributes or facts about given entities or things. But they only scratch the surface of the structure that can be made available through linked data and the conceptual relationships between and amongst all of those things. For those relationships to be drawn or inferred, a different level of linkage needs to be made: the class, collection, or schema view of the data.
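To make the instance-level case concrete, here is a minimal Python sketch of how owl:sameAs links let attributes about the same thing be aggregated across datasets. The dataset prefixes, identifiers, property names, and values are all hypothetical placeholders, not actual LOD data, and the merge logic is only one simple way to follow such links.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"

# Hypothetical triples from two datasets; every identifier and value here is
# an illustrative placeholder, not real DBpedia or Geonames content.
triples = [
    ("datasetA:Berlin", "ex:population", "placeholder-population"),
    ("datasetB:place42", "ex:latitude", "placeholder-latitude"),
    ("datasetB:place42", "ex:longitude", "placeholder-longitude"),
    ("datasetA:Berlin", OWL_SAME_AS, "datasetB:place42"),
]


def merge_same_as(triples):
    """Group the attributes of identifiers that are connected by owl:sameAs."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # First pass: join identifiers declared to be the same thing.
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(obj)] = find(subject)

    # Second pass: collect every attribute under one representative identifier.
    merged = defaultdict(dict)
    for subject, predicate, obj in triples:
        if predicate != OWL_SAME_AS:
            merged[find(subject)][predicate] = obj
    return dict(merged)


merged = merge_same_as(triples)
print(merged)
```

The result is a single record per real-world thing, which is exactly the kind of fact aggregation that instance-level links support. Nothing in it says what kind of thing the record describes.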

The UMBEL Subject Concept ‘Backbone’

UMBEL, or similar conceptual frameworks, can provide this structural backbone.

UMBEL (Upper Mapping and Binding Exchange Layer; see http://www.umbel.org) is a lightweight reference ontology of about 20,000 subject concepts and their logical and semantic relationships. The UMBEL ontology is a direct derivation of the proven Cyc knowledge base from Cycorp, Inc. (see http://www.cyc.com).

UMBEL’s subject concepts provide mapping points for the many (indeed, millions of) named entities that are their notable instances. Examples might include the names of specific physicists, cities in a country, or a listing of financial stock exchanges. UMBEL mappings enable us to link a given named entity to the various subject classes of which it is a member.

And, because of relationships amongst subject concepts in the backbone, we can also relate that entity to other related entities and concepts. The UMBEL backbone traces the major pathways through the content graph of the Web.
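The idea can be sketched in a few lines of Python. The concept names, the hierarchy, and the entity names below are hypothetical stand-ins chosen for illustration; they are not taken from the actual UMBEL ontology, and real inferencing would be done by a reasoner or triple store rather than by dictionaries.

```python
# Hypothetical subject-concept hierarchy (child concept -> parent concept).
# These names are illustrative assumptions, not actual UMBEL identifiers.
subclass_of = {
    "SaabCar": "Automobile",
    "Automobile": "RoadVehicle",
    "Truck": "RoadVehicle",
    "RoadVehicle": "Vehicle",
}

# Hypothetical named entities mapped to their most specific subject concept.
instance_of = {
    "Saab_900": "SaabCar",
    "Saab_99": "SaabCar",
    "Example_Truck": "Truck",
}


def ancestors(concept):
    """Return all broader concepts, nearest first, by walking up the hierarchy."""
    chain = []
    while concept in subclass_of:
        concept = subclass_of[concept]
        chain.append(concept)
    return chain


def classes_of(entity):
    """Return every subject concept the entity belongs to, directly or by inference."""
    direct = instance_of[entity]
    return [direct] + ancestors(direct)


def instances_under(concept):
    """Return all entities that fall under a concept anywhere in the hierarchy."""
    return sorted(e for e in instance_of if concept in classes_of(e))


print(classes_of("Saab_900"))
print(instances_under("RoadVehicle"))
```

Because class membership is inherited up the hierarchy, an entity mapped only to a narrow concept becomes reachable from any broader concept, and entities in sibling concepts become related through their shared parent.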

The UMBEL backbone provides structure and relationships at large or small scale. For example, in its full extent, UMBEL's complete structure resembles:

But, we can dive into that structure with respect to automobiles or related concepts . . .

. . . all the way down to seeing the relationships to Saab cars:

It is this ability to provide context through structure and relations that can help organize and navigate large datasets of instances such as DBpedia. Until the application of UMBEL — or any subject or class structure like it — most of the true value within DBpedia has remained hidden.

But no longer.

Some Example Queries

UMBEL had already mapped most DBpedia instances to its own internal classes. Through a simple mapping of files and then inferencing against the UMBEL classes, this structure has now been brought to DBpedia itself. Any SPARQL query applied against DBpedia can now take advantage of these relationships.

Below are some sample queries Kingsley used to announce these UMBEL capabilities to the LOD mailing list [4]. You can test these queries yourself or try alternative ones by using a standard SPARQL query.

For example, go to one of DBpedia’s query endpoints such as http://dbpedia.org/sparql and cut-and-paste one of these highlighted code snippets into the ‘Query text’ box:
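As a rough illustration of the kind of class-based query this enables, the Python sketch below builds a small SPARQL query and the corresponding request URL for the endpoint named above. It is not one of the queries from the announcement. The class URI is an assumed placeholder and may not match an actual UMBEL identifier, and the sketch only constructs the request without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the text

# Assumption: a placeholder subject-class URI used only for illustration;
# substitute a class URI that actually exists in the data being queried.
SUBJECT_CLASS = "http://umbel.org/umbel/sc/Automobile"


def build_query(class_uri, limit=10):
    """Build a SPARQL query listing entities typed with the given class."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "SELECT ?entity\n"
        "WHERE { ?entity rdf:type <" + class_uri + "> }\n"
        "LIMIT " + str(int(limit))
    )


def build_request_url(query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return ENDPOINT + "?" + urlencode({"query": query})


def extract_query(url):
    """Recover the query text from a request URL (useful for checking the encoding)."""
    return parse_qs(urlsplit(url).query)["query"][0]


query = build_query(SUBJECT_CLASS)
url = build_request_url(query)
print(query)
print(url)
```

The printed query text can be pasted into the ‘Query text’ box directly, while the URL form is what a script would request instead.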

5. Create UMBEL Inference Rules

Conclusion

A new era of interacting with DBpedia is at hand. In just over a year, the infrastructure and data have become available to show the advantages of the semantic Web based on a linked Web of data. DBpedia has played a major role in showing these benefits; it is now positioned to continue doing so.

[1] This new LOD diagram is still being updated based on review. The version shown above is based on the one posted at the W3C’s SWEO wiki, with my own updates of the two-way UMBEL links and the blue highlighting of DBpedia and UMBEL. There is also a clickable version of the diagram that will take you to the home references for the constituent data sources in this diagram; see http://www4.wiwiss.fu-berlin.de/bizer/pub/lod-datasets_2008-09-18.html.

[2] The objective of the Linking Open Data community is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. All of the sources on the LOD diagram are such open data. However, the best practices of linked data can also be applied to proprietary or intranet information; see this FAQ.

[3] See, Michael Hausenblas, Wolfgang Halb, Yves Raimond and Tom Heath, 2008. What is the Size of the Semantic Web?, paper presented at the International Conference on Semantic Systems (I-Semantics08) at TRIPLE-I, Sept. 2008. See http://sw-app.org/pub/isemantics08-sotsw.pdf.

Schema.org Markup

headline: DBpedia Gains a Subject Class Structure; LOD Cloud Diagram Updated

author: Mike Bergman

description: The Linkage of UMBEL’s 20,000 Subject Concepts and Inferencing Brings New Capabilities. Thanks to Kingsley Idehen and OpenLink Software, DBpedia has been much enrichened with its mapping to UMBEL’s 20,000 class-based subject concepts. DBpedia is the structured data version of Wikipedia that I (among many) wrote about in depth in April of last year shortly […]