Posted: May 21, 2012

New UMBEL Release Gains schema.org, GeoNames Capabilities

Modularization Also Leads to Big Graph Visualization

We are pleased to announce the release of version 1.05 of UMBEL, which now has linkages to schema.org[6] and GeoNames[1]. UMBEL has also been split into ‘core’ and ‘geo’ modules. The resulting smaller size of UMBEL ‘core’ — now some 26,000 reference concepts — has also enabled us to create a full visualization of UMBEL’s content graph.

The first notable change in UMBEL v. 1.05 is its mapping to schema.org. schema.org is a collection of schemas (usable as HTML tags) that webmasters can use to mark up their pages in ways recognized by major search providers. schema.org was first developed and organized by the major search engines Bing, Google and Yahoo!; Yandex later joined as a sponsor. Many groups now support schema.org and contribute vocabularies and schemas.

I was one of the first to hail schema.org hours after its announcement [7]. It seemed only fair that we put our money where our mouth is and map UMBEL to it as well.

The UMBEL-schema.org mapping was done manually. We first searched and inspected the current UMBEL concept base for appropriate matches. If that failed to find a reasonably direct correspondence between an existing UMBEL concept and a schema.org type, we inspected the source concept reference, OpenCyc, in the same manner. Failing a match from either of these two sources, we added a new concept to the ‘core’ UMBEL and placed it appropriately in the UMBEL reference concept subject structure.
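The fallback order described above can be sketched in code. This is only an illustration of the decision sequence: the actual mapping was done by hand, and the function name `map_type`, the lookup tables, and every sample entry below are hypothetical placeholders.

```python
from typing import Dict, Tuple


def map_type(schema_type: str,
             umbel_concepts: Dict[str, str],
             opencyc_concepts: Dict[str, str]) -> Tuple[str, str]:
    """Return (source, concept) for a schema.org type, trying each source in order."""
    # Step 1: look for a direct match among existing UMBEL concepts.
    if schema_type in umbel_concepts:
        return ("umbel", umbel_concepts[schema_type])
    # Step 2: otherwise fall back to the OpenCyc source concepts.
    if schema_type in opencyc_concepts:
        return ("opencyc", opencyc_concepts[schema_type])
    # Step 3: no match in either source, so a new concept is needed.
    return ("new", schema_type)


# Placeholder data (not real UMBEL, OpenCyc, or schema.org entries).
umbel = {"TypeA": "ConceptA"}
opencyc = {"TypeB": "CycConceptB"}

results = [map_type(t, umbel, opencyc) for t in ("TypeA", "TypeB", "TypeC")]
```

In this toy run, the first type resolves in UMBEL, the second in OpenCyc, and the third falls through to a new concept.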

The net result of this process was to add 298 mapped schema.org types to UMBEL. This mapping required a further three concepts from OpenCyc, and a further 78 new reference concepts, to be added to UMBEL. Along with the new updates to UMBEL and its mappings, the section of Key Files below provides further explanatory links. We are reserving the addition of schema.org properties for a later time, when we plan to re-organize the Attributes SuperType within UMBEL.

Modularization of the UMBEL Vocabulary

Even in the early development of UMBEL there was a tension about the scope and level of geographic information to include in its concept base. The initial decision was to support concepts for countries, for the provinces and states of leading countries, and for some leading cities. This decision was in the spirit of a general reference structure, but still felt arbitrary.

GeoNames is devoted to geographical information and concepts — both natural and human artifacts — and has become the go-to resource for geo-locational information. The decision was thus made to split out the initial geo-locational information in UMBEL and replace it with mappings to GeoNames. This decision also had the advantage of beginning a process of modularization of UMBEL.

Two sets of reference concepts were identified as useful for splitting out from the ‘core’ UMBEL in a geo-locational aspect:

Geopolitical places and places of human activities and facilities

Natural geographical places and features.

These removed concepts were then placed into a separate ‘geo’ module of UMBEL, including all existing annotations and relations, resulting in a module of 1,854 concepts. That left 26,046 concepts in UMBEL ‘core’. Because of some shared parent concepts, there is some minor overlap between the two modules. These are now the modular splits in UMBEL version 1.05.

Mapping to GeoNames

GeoNames has a different structure to UMBEL. It has few classes and distinguishes its geographic information on the basis of some 671 feature codes. These codes span from geopolitical divisions — such as countries, states or provinces, cities, or other administrative districts — to splits and aggregations by natural and human features. Types of physical terrain — above ground and underwater — are denoted, as well as regions and landscape features governed by human activities (such as vineyards or lighthouses) [1]. We wanted to retain this richness in our mappings.

We needed a bridge between feature codes and classes, a sort of umbrella property generally equivalent to owl:sameAs in nature, but with some possible inexactitude or degree of approximation. The appropriate choice here is umbel:correspondsTo, which was designed specifically for this purpose [2]. This predicate is thus the basis for the mappings.

The 671 GeoNames feature codes were manually mapped to corresponding classes in the UMBEL concepts, in a manner identical to what was described for schema.org above. The result was to add a further three OpenCyc concepts and 88 new UMBEL reference concepts to accommodate the full set of GeoNames feature codes. We thus now have a complete representation of the full structure and scope of GeoNames in UMBEL.

There are three modes in which one can now work with UMBEL:

With UMBEL ‘core’ alone, recommended when your concept space is not concerned with geographical information

In the latter case, you may use SPARQL queries with the umbel:correspondsTo predicate to achieve the desired retrievals. If more logic is required, you will likely need to look to a rules-based addition such as SWRL [3] or RIF [4] to capture the umbel:correspondsTo semantics.
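A minimal sketch of such a retrieval follows. It holds a SPARQL query as a string and, to stay self-contained, mimics the same pattern match over a few in-memory triples in plain Python rather than calling a real SPARQL engine. The namespace URIs, the concept name `Lighthouse`, the feature-code identifier `S.LTHSE`, the direction of the mapping (concept to feature code), and the helper `corresponding_codes` are all assumptions for illustration; check the released UMBEL and GeoNames files for the actual identifiers.

```python
# Assumed namespaces (illustrative only).
UMBEL = "http://umbel.org/umbel#"
RC = "http://umbel.org/umbel/rc/"
GN = "http://www.geonames.org/ontology#"

CORRESPONDS_TO = UMBEL + "correspondsTo"

# The kind of SPARQL query one might send to a triple store or endpoint.
QUERY = f"""
PREFIX umbel: <{UMBEL}>
SELECT ?code WHERE {{
  <{RC}Lighthouse> umbel:correspondsTo ?code .
}}
"""

# Hypothetical sample triples standing in for the mapping file.
TRIPLES = [
    (RC + "Lighthouse", CORRESPONDS_TO, GN + "S.LTHSE"),
    (RC + "Lighthouse", "http://www.w3.org/2000/01/rdf-schema#label", "lighthouse"),
]


def corresponding_codes(concept, triples):
    """Return the objects linked from `concept` by umbel:correspondsTo."""
    return [o for s, p, o in triples if s == concept and p == CORRESPONDS_TO]


codes = corresponding_codes(RC + "Lighthouse", TRIPLES)
```

The Python filter does by hand what the single triple pattern in `QUERY` expresses: fix the subject and predicate, and return whatever objects match.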

New Big Graph Visualization

Because of the UMBEL modularization, it has now become tractable to graph the main ontology in its entirety. The core UMBEL ontology contains about 26,000 reference concepts organized according to 33 super types. There are more than 60,000 relationships amongst these concepts, resulting in a graph structure of very large size.

It is difficult to grasp this graph in the abstract. Thus, using methods earlier described in our use of the Gephi visualization software [5], we present below a dynamic, navigable rendering of this graph of UMBEL core:

Note: at standard resolution, if this graph were rendered at actual size, it would be larger than 34 feet by 34 feet at full zoom! Hint: that is about 1,200 square feet, or half the size of a typical American house!


This UMBEL graph displays:

All 26,000 concepts (“nodes”) with labels, and with connections shown (though you must zoom to see them)

The color-coded relation of these nodes to the 33 or so major SuperTypes in UMBEL, as well as the relative position of these clusters with respect to one another, and

When zooming (use scroll wheel or + icon) or panning (via mouse down moves), wait a couple of seconds to get the clearest image refresh:

The solution we are currently exploring is to define a new property to assert that two RDF instances are co-referential when they are believed to describe the same object in the world. The two RDF descriptions might be incompatible because they are true at different times, or the sources disagree about some of the facts, or any number of reasons, so merging them with owl:sameAs may lead to contradictions. However, virtually merging the descriptions in a co-reference engine is fine: both provide information that is useful in disambiguating future references as well as for many other purposes. Our property (:coref) is a transitive, symmetric property that is a super-property of owl:sameAs and is paired with another, :notCoref that is symmetric and generalizes owl:differentFrom.

When we look at the analog properties noted above, we see that they tend to share reflexivity, symmetry and transitivity. We specifically designed the umbel:correspondsTo predicate to capture these close, nearly equivalent relationships of uncertain degree.
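To make symmetry and transitivity concrete, here is a small sketch that groups identifiers joined by a symmetric, transitive link into clusters, using a union-find structure. Treating a correspondence-style property this way is an assumption for illustration (the note says only that the analog properties tend to share these characteristics), and the identifiers and the function `clusters` are hypothetical.

```python
from collections import defaultdict


def clusters(pairs):
    """Group items linked by a symmetric, transitive relation into sorted clusters."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Each asserted pair merges the two clusters its members belong to.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return sorted(sorted(g) for g in groups.values())


# Placeholder identifiers: a~b and b~c imply a~c; d~e forms a separate cluster.
links = [("a", "b"), ("b", "c"), ("d", "e")]
result = clusters(links)
```

Because the relation is read as symmetric and transitive, the order and direction of the asserted pairs do not matter; only connectivity does.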

[3] SWRL (Semantic Web Rule Language) combines sublanguages of the OWL Web Ontology Language (OWL DL and Lite) with those of the Rule Markup Language (Unary/Binary Datalog). SWRL has the full power of OWL DL, but at the price of decidability and practical implementations. See further http://www.w3.org/Submission/SWRL/.

[4] The Rule Interchange Format (RIF) is a W3C Recommendation. RIF is based on the observation that there are many “rules languages” in existence, and what is needed is to exchange rules between them. RIF includes three dialects, a Core dialect which is extended into a Basic Logic Dialect (BLD) and Production Rule Dialect (PRD). See further http://www.w3.org/2005/rules/wiki/RIF_FAQ.
