Business Maps: Topic Maps Go B2B

Interoperability between ontologies is a big issue in B2B data exchange,
perhaps the single biggest. For the foreseeable future there will not be a
single, widely accepted B2B vocabulary, so we will need mappings between
different ontologies. Since these mappings are inherently situational, and
the context is very complex, we cannot expect computers to create more
than a small part of them. We need tools that leverage the intelligence of
human business experts. We need portable, reusable, and standardized
mappings. Topic Maps are an excellent vehicle for providing those
"Business Maps". (This article presumes a basic understanding of Topic
Maps; readers may wish to read A Gentle Introduction to Topic Maps in
conjunction with this article.)

We have lots of data and descriptions of data. Take, for instance, the
abundance of vocabularies for B2B exchange: xCBL, FinXML, FpML, etc. Those
vocabularies can be seen as ontologies. Older EDI technologies such as
X.12 and EDIFACT are ontologies too. There are as yet no general standards
for B2B vocabularies in XML; the ebXML initiative did not have actual
business documents among its deliverables, and right now work is being
done on the Universal Business Language (UBL) to fill this gap. Besides
those "industry-strength" solutions, there are lots of tailor-made data
exchanges between companies, often using nothing more than simple ASCII
comma-separated files. Together with their documentation, those ASCII
files also constitute ontologies. And even within larger companies, many
different ontologies exist in the legacy databases of different
departments. These different data sources present huge interoperability
problems.

One of those interoperability problems is finding out which data items
from different sources are the same. To do that, we need to compare the
meanings of those data items, which means looking up the data definitions
for the different sources and comparing them. Comparing human-made
definitions is a tough job. Different
organizations may come up with very different definitions for things that
really are the same, and with very similar definitions for things that are
very different in reality.

First of all, hard as we try, mistakes and obscurities occur in our
data definitions. Second, in making data definitions we may find that a
lot of data aren't that well defined to start with. In other words, when
we make data definitions for a data source, it's sometimes the first
attempt to define the data at all, and when there already is a definition,
it is often not precise enough. Third, when we make a definition like "an
employee is a person working at a company", we introduce many new words
("person", "work", "company") from natural language. When meanings in
natural language aren't precise, those definitions aren't going to be
precise either. We should stop thinking we can fix meanings once and for
all in any but the most limited contexts.

Some Solutions and Why They Don't Work

There are some solutions to these problems. The first, which I shall
call "the naive approach", is to make a new vocabulary which covers
everything and then let everybody use it. It's an easy
solution to think of, but it does not work in practice. Multiple
vocabularies are a fact of life. Think only of the huge number of existing
applications using legacy formats, which won't simply go away. And even
for new applications, there are so many different business needs in
different contexts that there's a huge drive toward specific, directly
applicable vocabularies and away from generic standards which take a long
time to evolve. So the main problem should be how to make the different
vocabularies interoperate, not how to replace them with a single unifying
standard. Developing unifying vocabularies is a good thing; the more
success they have, the better. But one should think of them as central
pieces in the plethora of vocabularies, not the only ones. The success of
new, unifying vocabularies will depend not so much on their inherent
capabilities as on their ability to interoperate with existing
vocabularies. Interoperability is the shortest route to acceptance.

Another approach to interoperability is the use of Published Subject
Indicators (PSIs), as found in Topic Maps. The basic idea is to build
public libraries of unique IDs for things. We incorporate PSIs into our
vocabularies, and can then compare terms across vocabularies. An
informal example:
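
Here is a minimal sketch in XTM, with an English topic "invoice" and a
Dutch topic "factuur"; the topic names and the PSI URL are invented for
illustration:

  <topic id="invoice">
    <subjectIdentity>
      <subjectIndicatorRef
          xlink:href="http://psi.example.org/business/invoice"/>
    </subjectIdentity>
    <baseName><baseNameString>invoice</baseNameString></baseName>
  </topic>

  <topic id="factuur">
    <subjectIdentity>
      <subjectIndicatorRef
          xlink:href="http://psi.example.org/business/invoice"/>
    </subjectIdentity>
    <baseName><baseNameString>factuur</baseNameString></baseName>
  </topic>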

The PSIs in the English and Dutch topics allow us to conclude that both
topics represent the same subject. Note that this really just shifts the
problem from vocabularies to public libraries. In general this approach is
successful where the problem space consists of clearly delimited entities
and there is a widely accepted canonical public library; ISO currency and
country codes are good examples of areas where it will work. OASIS is
currently working on standards for PSIs in its Topic Maps Published
Subjects TC. Once this work is done, the situation might improve as more
PSIs are published.

In actual mappings between ontologies, we often do not really establish
semantic equivalence in the true sense that PSIs require. Consider an
example. When GigaSellers decides to let Print & Send handle its
invoices, invoice information flows from GigaSellers to Print &
Send. Once we have found that we can use GigaSellers' "CustomerAddress"
as the "invoice_address" in Print & Send's invoicing application, we
stop. We do not need to find out whether the two are truly equivalent in
all circumstances. There is no direct business need to find this out, and
therefore the boss doesn't pay for it.

Solutions like PSIs do not work here because PSIs require true semantic
equivalence. The interesting observation is that most real-world mappings
are unidirectional: we translate from a source ontology to a destination
ontology for a specific business process. For instance, an order goes from
buyer to supplier. It does not go back (though a different document such
as an invoice or order confirmation might go back). So for an order only a
translation from the buyer's ontology to the supplier's ontology is
needed. This unidirectional nature of business exchange means that usually
we do not establish equivalence relationships, but subset relationships
between ontologies. In the above example, "CustomerAddress" is a subset of
"invoice_address": every instance of "CustomerAddress" is a valid
instance of "invoice_address". We do not know whether the reverse is
true. It could very well be the case that GigaSellers requires its
"CustomerAddress" to be a physical address where goods can actually be
delivered, and Print & Send allows postal boxes as
"invoice_address". Further, there is often no true equivalence because the
data items are in different formats. GigaSellers may store all its dates
in CCYY-MM-DD format, and Print & Send in MM/DD/CCYY. In this case,
too, there is no true equivalence, since a transformation is needed.
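
Such a format transformation is mechanical, but it still has to be
specified somewhere. As a rough sketch of what it might look like in
XSLT 1.0, assuming a hypothetical source element "date" and destination
element "invoice_date":

  <xsl:stylesheet version="1.0"
                  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Rewrite a CCYY-MM-DD date as MM/DD/CCYY -->
    <xsl:template match="date">
      <invoice_date>
        <xsl:value-of select="concat(substring(., 6, 2), '/',
                                     substring(., 9, 2), '/',
                                     substring(., 1, 4))"/>
      </invoice_date>
    </xsl:template>
  </xsl:stylesheet>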

It might be tempting to conclude that we simply have to make a mapping
between every two ontologies we use. That, however, is going too far. Even
when we do not always establish true semantic equivalence relationships,
the mappings we make can be reusable. What we need to do is capture
knowledge about the mapping process itself. We need to store the fact that
we can use "CustomerAddress" as "invoice_address" in this particular
context. Then, when someone else needs to find out whether
"CustomerAddress" can be used as "mailing_address" in a different context,
they can use this information. If we store this kind of information, we
can facilitate the mapping of ontologies with semi-automated tools which
show existing mappings for the items in our ontology that we need to map
onto another ontology. The human expert
making the mapping can still make all the relevant choices and provide new
mappings where existing ones can't be reused. Such semi-automated tools
could then generate a new mapping, which also can be stored to provide
information for the next one. It would also become much easier to exchange
information about mappings without having to provide full one-to-one
equivalence relationships.
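
What might such stored mapping knowledge look like as a topic map? Here
is a minimal sketch in XTM; the association type "usable-as", the role
types, and the scoping topic are all invented for illustration:

  <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
            xmlns:xlink="http://www.w3.org/1999/xlink">
    <topic id="customer-address">
      <baseName><baseNameString>CustomerAddress</baseNameString></baseName>
    </topic>
    <topic id="invoice-address">
      <baseName><baseNameString>invoice_address</baseNameString></baseName>
    </topic>
    <topic id="usable-as"/>
    <topic id="source"/>
    <topic id="destination"/>
    <topic id="gigasellers-invoicing"/>
    <!-- "CustomerAddress" can be used as "invoice_address",
         valid only within the scope of this particular exchange -->
    <association>
      <instanceOf><topicRef xlink:href="#usable-as"/></instanceOf>
      <scope><topicRef xlink:href="#gigasellers-invoicing"/></scope>
      <member>
        <roleSpec><topicRef xlink:href="#source"/></roleSpec>
        <topicRef xlink:href="#customer-address"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#destination"/></roleSpec>
        <topicRef xlink:href="#invoice-address"/>
      </member>
    </association>
  </topicMap>

A tool could search such a map for every association in which
"CustomerAddress" plays the source role and present the results to the
human expert as candidate mappings.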