
August 31, 2006

The Ontology Integration Problem

The OWL language, and tools such as Protege and TopBraid Composer make it easy to design ontologies. But what about the problem of integrating disparate ontologies? I haven't really found a good solution for this yet.

In my own experience designing a number of OWL ontologies (averaging 500 to 3,000 classes each), it has often been easier to create my own custom ontology branches to cover various concepts than to try to integrate other ontologies of those concepts into my own.

One of the reasons for this is that each ontology has its own naming conventions, philosophical orientation, domain nuances, design biases and tradeoffs, often guided by the particular people and needs that drove its creation. Integrating across these different worldviews and underlying constraints is often hard. Simply stating that various classes or properties are equivalent is not necessarily a solution, because their inheritance may not in fact be equivalent, and thus they may actually be semantically quite different in function, regardless of expressions of equivalence. OWL probably needs to be a lot more expressive in defining mappings between ontologies to truly resolve such subtle problems.
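This inheritance pitfall can be sketched without any OWL tooling. The toy illustration below is plain Python, not a real reasoner, and every class name and hierarchy in it is invented: declaring two classes equivalent effectively gives each one the other's superclass chain, which may be an entailment neither author intended.

```python
# Toy illustration: merging two class hierarchies via a declared
# equivalence unions their superclass chains. All names are invented.

def ancestors(cls, subclass_of):
    """Return all transitive superclasses of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Ontology A says a Person is a LegalEntity; ontology B says a Person
# is an Organism.
onto_a = {"a:Person": ["a:LegalEntity"], "a:LegalEntity": ["a:Thing"]}
onto_b = {"b:Person": ["b:Organism"], "b:Organism": ["b:Thing"]}

# Simulate "a:Person owl:equivalentClass b:Person" by giving a:Person
# both superclass lists.
merged = {**onto_a, **onto_b}
merged["a:Person"] = onto_a["a:Person"] + onto_b["b:Person"]

print(sorted(ancestors("a:Person", merged)))
# → ['a:LegalEntity', 'a:Thing', 'b:Organism', 'b:Thing']
# Every a:Person is now also a b:Organism -- which neither author
# may have intended.
```

A real OWL reasoner would surface this the same way: the equivalence axiom makes both superclass chains apply to every instance.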

The alternative to mapping -- importing external ontologies into your own -- is also not great, because it usually results in redundancies, as well as inconsistent naming conventions and points of view. As you keep adding colors to your palette, it starts to become kind of brown. If the goal is to make ontologies that are elegant and easy to maintain, extend, understand and apply, importing ontologies into other ontologies doesn't seem to be the way to accomplish that. Different ontologies usually don't fit together well, or even at all in some cases.

Because of the above problems, it is often easier to simply reinvent
the wheel: instead of trying to map between ontologies, or import
other ontologies into one's own, it is usually easier to just write
everything oneself. Of course this is not as efficient as we might
like -- it would certainly be great to be able to easily reuse other
people's ontologies -- but in practice that is still very hard.

And this is a problem, isn't it? Because if the semantic web is
really going to take off we have to either find easy and effective ways
to connect ontologies together, or we have to get everyone to use the
same ontology. Nobody has yet solved the problems of mapping and
importing ontologies well enough. And likewise so far nobody has
succeeded in making the uber-ontology and convincing everyone else to
use it. In fact, I think it's probably safe to say that the more
comprehensive and powerful an ontology is, the fewer the number of
people who will agree on it, let alone understand it well enough to use
it.

The dream of the Semantic Web vision is that someday there will be
thousands or millions of ontologies around the web, and millions of
instances of them. And these will all somehow be integrated
automagically, or at least if they aren't integrated on the semantic
level, then there will be magic software that embodies that
integration. In any case, the hope is that someday intelligent agents
will be able to freely and seamlessly roam around harvesting this data,
squishing it together into knowledgebases, and reasoning across them.
But neither harvesting, nor squishing, nor reasoning can really take
place without some level of semantic integration of the underlying
ontologies. Yet, how will all these disparate ontologies be connected?
Unless mappings are created between them, instead of a Semantic Web,
we'll just have millions of little semantic silos. Maybe some company
will succeed in making the biggest silo and that will be "the" semantic
web to most people. That might be the best solution in fact, but I'm
not sure that is really what Tim Berners-Lee had in mind! If that is
not the solution that the semantic web community wants, then the
integration issue needs to be solved sooner rather than later. The
longer we wait to solve this, the harder it will get to solve it later
on, because the number of ontologies is increasing with time.

So in conclusion, I think that the most critical missing piece of
the semantic web puzzle is a good tool -- and a good methodology -- for
mapping between ontologies. I just haven't found one yet (but if you
have, feel free to suggest it to me!). The reason I think a mapping
tool is a critical need is that I think while in theory it's a nice
idea to imagine ontologists reusing ontologies from one another, in
practice many ontologists (especially those working on large complex
ontologies) would rather write their own internally consistent
ontologies and map them to other ontologies than import other
ontologies into what they are making and then have to deal with all
the inconsistencies and confusion that arise from doing that. In
practice, ontologists are usually people who value elegance and
consistency: A solution that runs contrary to those values won't really
be adopted by that community (of which I am a member).

The OWL language provides a means to express mappings between
equivalent classes and equivalent properties, for example, and that
might be good enough. But I haven't seen good support for actually
building, and managing, such mappings within the ontology development
tools I've looked at. And until this process of mapping between
ontologies is made far more productive and powerful, we will see
increasing fragmentation instead of integration across the semantic
web. Similarly in tools like Protege, you can import other ontologies,
but once you do so, very little support is provided for working with
and modifying the new combined ontology.
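As a purely hypothetical sketch of what such tool support might manage, imagine the mappings kept in a standalone document, separate from both ontologies, together with the simplest coverage check a tool could run. All class names below are invented:

```python
# Hypothetical standalone mapping document: each entry relates a class
# in ontology A to a class in ontology B. All names are invented.
mappings = [
    ("a:Person",  "owl:equivalentClass", "b:Person"),
    ("a:Article", "owl:equivalentClass", "b:Publication"),
]

onto_a = {"a:Person", "a:Article", "a:Journal"}
onto_b = {"b:Person", "b:Publication", "b:Periodical"}

# The simplest "what did I miss?" check: classes on either side that
# no mapping mentions yet.
mapped_a = {src for src, _, _ in mappings}
mapped_b = {dst for _, _, dst in mappings}

print("Unmapped in A:", sorted(onto_a - mapped_a))  # → ['a:Journal']
print("Unmapped in B:", sorted(onto_b - mapped_b))  # → ['b:Periodical']
```

Keeping the mapping document separate is the point: neither ontology gets polluted, and the mappings themselves become an artifact a tool can check, version and maintain.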

The requirements for a good semantic integration tool are numerous.
But chief among them is that such tools need to move beyond merely
helping with integration between two ontologies -- they need to help an
ontologist map their ontology to perhaps tens of other ontologies.
There will also need to be specialized error checking capabilities and
consistency checkers -- to look for logical problems and inheritance
incompatibilities that may arise in complex mappings, and to identify
classes and properties that should be mapped but were missed. Perhaps
by analysing instance data from different ontologies (such as different
ontologies' representations of the same unique entities or concepts)
these tools could even learn or suggest mappings in order to assist or
automate the mapping process to some degree. I have seen papers on
automatic ontology mapping, but these capabilities haven't made it into
the ontology design tools. This needs to happen.
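In its very simplest form, mapping suggestion could start from lexical similarity between class names. The sketch below uses only the Python standard library; the class lists and threshold are invented, and real matchers also exploit structure and instance data:

```python
# Minimal mapping-suggestion sketch: propose pairs of classes from two
# ontologies whose names look alike. Class lists are invented.
from difflib import SequenceMatcher

def suggest_mappings(classes_a, classes_b, threshold=0.8):
    """Return (a, b, score) tuples for lexically similar name pairs."""
    suggestions = []
    for a in classes_a:
        for b in classes_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                suggestions.append((a, b, round(score, 2)))
    return sorted(suggestions, key=lambda t: -t[2])

onto_a = ["Person", "Organisation", "Publication"]
onto_b = ["Person", "Organization", "Article"]
for a, b, score in suggest_mappings(onto_a, onto_b):
    print(f"{a} <-> {b}  ({score})")
# Person <-> Person and Organisation <-> Organization are suggested;
# a human would still review and confirm each proposed mapping.
```

Even this crude approach catches spelling variants; the point is that such suggestions belong inside the ontology editor, not in separate research prototypes.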

Until the process of integrating ontologies is less work than simply
reinventing the wheel, we are not going to see much semantic
integration on the semantic web. In short, the vision of the semantic
web as a decentralized fabric in which multiple ontologies
interoperate hangs on a good solution to this issue.

I believe the semantic web is emerging and will
continue to evolve even if semantic integration is not made easy -- but
in such a case, I think ultimately it will be dominated by a few large
ontologies and service providers that everyone integrates with, rather
than the original vision of a more decentralized system.

Comments

As long as work concentrates mostly on the semantic representation of knowledge using ontologies, the vision of the semantic web will not be fully achieved. The real problem lies in interoperability between ontologies. If mapping has failed to provide a viable solution (since none exists), the answer may lie in adding one extra layer to the semantic web: an interoperability layer. This layer would provide the principles, theories and interactive agents of interoperability. It would interact with the ontology layer of the semantic web above it, and also with the interoperability layers of other documents beneath it (although each is a different document).

TermExtractor is a software package for automatic extraction of terminology consensually referred to in a specific application domain. The package takes as input a corpus of domain documents, parses the documents, and extracts a list of "syntactically plausible" terms (e.g. compounds, adjective-noun pairs, etc.). Document parsing assigns greater importance to terms with distinctive text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used: Domain Consensus selects only the terms that are consensually referred to throughout the corpus documents, while Domain Relevance selects only the terms relevant to the domain of interest; Domain Relevance is computed with reference to a set of contrastive terminologies from different domains. Finally, extracted terms are further filtered using Lexical Cohesion, which measures the degree of association of all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd, and also zip archives.

Great summary of the problem facing semantic web development. My personal experience leans toward putting the emphasis first on building a good ontology for your application rather than reusing existing ones. This is because, at this early stage, it's rare to find classes or properties in existing ontologies that can meet all the specific requirements of the new ontology being developed.

For example, I tried to reuse as much as I could for the scientific publishing ontology being developed under the W3C task force I'm coordinating. The current version has properties taken from DC and FOAF, but I always feel it's not right. Oftentimes the terms look right, but the ontological definition is off. I may have to throw most of them out in the next revision.

Some useful applications do not necessarily rely on integration of data represented in many different ontologies. Semantic publishing is an example, which I’m experimenting with now. I think compromising the integrity of the ontology itself for the sake of ontology reuse or future integration does not serve the purpose well.

I have to agree with bblfish; in fact he took the words right off my tongue. Encouraging authors to create their own ontologies from scratch will be a disaster for the semantic web, despite the existence of "merging" tools and interfaces. And any effort to merge synonymous ontologies would be as wasteful as an effort to merge Java classes in different libraries which serve the same purpose. It may even be a bit reckless to do so. Remember, we adopted the semantic web to finally get away from ambiguity. What you're proposing will only encourage it. Ontology authors should instead be encouraged (through the availability of good search tools) to find, reuse, extend and re-publish! I understand your argument about the idiosyncrasies of different authors' needs. Perhaps by establishing good conventions and best practices for developing "extension-friendly" ontologies, authors can be encouraged to develop their ontologies with the idea in mind that they are to be used by others. This may mean refraining from using organizational methodologies which might hinder others' efforts.

I think semantic silos are created when ontologies are authored in vacuums. The answer could be a wikitology. This allows the greatest common denominator of methodologies to win out by democracy. As for those ontologists who can't shoe-horn their needs into what's available in the wiki, here again, I believe they have no choice, because on the semantic web, if you're not talking the same language as everyone else, then you simply won't be heard (by SW agents, indexing tools, crawlers, etc.). With that aside, though, I think such a wikitology could provide the "source of truth" that is lacking, and still accommodate the need for autonomy which you speak of, since everyone has a chance to design and influence the features of the ontology. There are public indices of RDF ontologies (SchemaWeb, PingTheSemanticWeb), and there are even semantic wikis (OntoWorld), but to my knowledge no one has yet created a wiki that allows us to collaboratively develop ontologies. If anyone has, please post, as I'd like to know how I can help.

This may be a common problem for all 4th generation language programming. When I write SQL or XSLT, I have less incentive to reuse, partially because of the power of these languages, partially because of the effort involved in adapting the abstraction layer (e.g. table structure, XML schema).
For ontologies, maybe we should start questioning whether mixing OO (Java) and functional rules is appropriate, even though most people say it's a happy marriage.
The question is: with "enough" abstracted information, aspects, and ontologies, is it possible to blur the line between reality and virtuality?

This is a thoughtful introduction to the nature of the beast. Let me say a few words about the path I am taking in this regard, a simple path: I am not in pursuit of any *integration* methodology. Rather, I am evolving methodologies for *federation*. Patrick Durusau and I gave a telecon lecture on the early version of federation [1] and I am now building the platform to do subject-centric federation. At SRI, we grafted a "delicious" workalike we call Tagomizer onto my subject map provider TopicSpaces. We did that to explore more learning opportunities for our project CALO.

I realize it's a kind of change of subject from "integration" to "federation." There are, I think, two primary use cases for ontologies that direct how they are crafted and how they are used. One is the purely authoritative stance, where the questions to be answered must be judged, by some authority, to be correct. The other is not at all authoritative, and can be thought of as closer to the general "understanding some universe of discourse" needs of humanity. One would likely never want to integrate authoritative ontologies, except to the extent that some information will be lost when one "authority" contradicts another and the merging process is required to make a choice. But it's more than a good idea to federate disparate world views in order to more thoroughly present some universe of discourse. No information is lost. That's the role of subject-centric federation.

As a final comment, it's bound to happen that some ontology classes imported into a subject map will find no "mates" with which to merge -- nothing else in the map talking about the same subject. Those new "subjects" will not become islands in the map; they will always be linked to the subject that is their source, as will be each merged class within the subject proxy that is its new container in the map.

I agree about the problem. But does one not have the same problem in Java? In Java everyone can go and create their own classes, and that's what most people in fact do. Then, when they find that there is really a large distributed need for the same functionality, pressure builds toward integrating those classes into standardised and well established libraries. These then get to be widely used, and the cycle starts again.

Integration on the Semantic Web should be a lot easier than with Java in some ways. But I can see the same thing happening. People open up their databases and create their own ontologies. Then they find that a number of people share the same terms, so they might as well standardise on those, for legal and for business reasons (otherwise it's difficult to maintain, there's less trust, and you lose the network effect). Hence the pressure will build towards standardised ontologies.

This is not to say that good integration tools would not be useful. In fact a good one would be a very powerful tool that would make things a lot easier -- a little bit like refactoring IDEs in Java.
