The editors of the Topic Maps Reference Model
thank Steve Pepper and Lars Marius Garshol for their careful
analysis of N490 ("Topic Maps
-- Reference Model Use Cases") that appears in N0497 ("Analysis
of RM Use Cases") (sic). Their analysis illustrates the
need for the Topic Maps Reference Model more compellingly than our
own efforts. The Pepper/Garshol reading of the TMRM (Topic Maps
Reference Model) use cases as procedural questions, rather than as
examples of situations requiring declarative disclosures, highlights
the insufficiency of the TMDM (Topic Maps Data Model) and its
related syntax/procedure based components in addressing the central
unanswered question in Topic Maps. That central unanswered question
is: How does a topic map declare its approach to the problem of
reflecting the "territory" of which it is a "map"? More
specifically: How does a topic map declare exactly how its topics
identify their subjects?

No topic maps standard has yet been adopted that
provides any generalized means for specifying arbitrary bases for
the recognition of subject identity. This omission has been masked
by the Topic Maps community's invention, on an ad hoc basis,
of whatever approaches were needed for subject identity recognition, and
its use of such ad hoc approaches to inform the designs of any
subject-oriented processes that are implemented in their software.
(Such an ad hoc process can, for example, treat the contents
of certain <resourceData> elements as somehow
contributing to the recognition of the identity of a topic's
subject, while ignoring the contents of other, equally
eligible-appearing <resourceData> elements. This
particular approach is one of those suggested by Pepper/Garshol;
we discuss it in 1 below.) In the
absence of clear guidance from the adopted standards, the ad
hoc approach that the community took was appropriate,
reasonable, necessary, wise, and helpful to the cause of Topic
Maps.

But these historical facts do not make the
omission of a standard way to disclose the basis for subject
identity from the revision of ISO 13250 a virtue, nor do they make a
virtue of allowing XTM, TMDM, TMCL, and TMQL to fail to provide for
disclosing subject identity. It is in the best interests of the
topic maps community and our users that these omissions be corrected
in any future edition of 13250.

We treat only one of the Pepper/Garshol use case
responses below. The responses to the other use cases missed the
point of the Topic Maps Reference Model in the same way: the
responses describe procedures whose undeniable effectiveness masks a
lack of explicit underlying semantic declarations, a lack which
leaves the ways in which the topic map reflects the mapped territory
unstated and ambiguous.

In Use Case 2, the US Geological Survey wishes
to base subject identity on geographic coordinates, including the
ability to say that locations within a range are the same
subject. The question raised by this use case was how to convey that
basis for subject identity, in the absence of the TMRM.

The Pepper/Garshol suggestion, which models latitude and
longitude as internal occurrences of the topic, does not answer the
question that was being posed by the use case. Latitude and
longitude can certainly be specified as internal occurrences of a
topic (we withhold comment on the consistency of such a strategy
with our understanding of the semantics of occurrences), but no mere
specification of latitude and longitude can, by itself, establish a
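doctrine for understanding the subject of any topic. As an
illustration, consider a very similar topic map snippet, one in which
a topic for Japan carries its population and life expectancy as
strongly-typed internal occurrences.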

Both the above example and the Pepper/Garshol
example have strongly-typed occurrences that are raw alphanumeric
data. What determines the identity of the subject of the topic
in either of these examples? While a plausible argument could be
made that the information contained in the
<resourceData> elements of the Pepper/Garshol example
specifies subject identity, the same argument is much less
plausible for the second example. So how
is a recipient of either topic map supposed to know when the
information contained in <resourceData> elements
specifies subject identity, and when it doesn't?
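
The point can be illustrated with a small Python sketch. The two XTM 1.0 fragments in it are hypothetical reconstructions assembled from the values tabulated later in this response; they are not the original Pepper/Garshol markup, and the helper names `occurrence_shape` and `occurrence_data` are ours. Under those assumptions, the sketch shows that the two topics differ only in their data values, not in their structure.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical fragments in an assumed XTM 1.0 layout (not the original markup).
TOKYO = f"""
<topic xmlns="{XTM}" xmlns:xlink="{XLINK}" id="tokyo">
  <baseName><baseNameString>Tokyo</baseNameString></baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#latitude"/></instanceOf>
    <resourceData>35 40 N</resourceData>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#longitude"/></instanceOf>
    <resourceData>139 45 E</resourceData>
  </occurrence>
</topic>
"""

JAPAN = f"""
<topic xmlns="{XTM}" xmlns:xlink="{XLINK}" id="japan">
  <baseName><baseNameString>Japan</baseNameString></baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#population"/></instanceOf>
    <resourceData>127214499</resourceData>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#life-expectancy"/></instanceOf>
    <resourceData>80.93</resourceData>
  </occurrence>
</topic>
"""


def occurrence_shape(xtm_text):
    """Return the nested element names of each occurrence, ignoring all data."""
    topic = ET.fromstring(xtm_text.strip())
    shapes = []
    for occ in topic.findall(f"{{{XTM}}}occurrence"):
        # Keep only local tag names, in document order.
        shapes.append([el.tag.split("}")[1] for el in occ.iter()])
    return shapes


def occurrence_data(xtm_text):
    """Map each occurrence's type reference to its <resourceData> text."""
    topic = ET.fromstring(xtm_text.strip())
    data = {}
    for occ in topic.findall(f"{{{XTM}}}occurrence"):
        ref = occ.find(f"{{{XTM}}}instanceOf/{{{XTM}}}topicRef")
        data[ref.get(f"{{{XLINK}}}href")] = occ.find(f"{{{XTM}}}resourceData").text
    return data


# The syntax gives a processor nothing with which to tell the two cases apart.
print(occurrence_shape(TOKYO) == occurrence_shape(JAPAN))
print(occurrence_data(TOKYO))
print(occurrence_data(JAPAN))
```

Because the two shapes compare equal, any claim that the coordinates identify Tokyo while the statistics merely describe Japan has to come from somewhere outside the markup.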

The answer does not lie in the syntax of either
example, nor is the answer found in the TMDM, nor in any querying or
constraints. The answer is prior to, and must inform the design of,
any process in which subject identity is important.
Subject-oriented querying, merging, validation, or transformation
processes must start either with undisclosed assumptions about what
constitutes subject identity, or with disclosed bases for subject
identity. A process that starts with unstated assumptions may give
the correct result, but only as long as the unstated assumptions
hold true. Moreover, no new process can be created in the
absence of knowledge of the applicable bases of subject
identification. To say to oneself, "I always model longitude and
latitude as internal occurrences of topics that are geographic
locations" is insufficient as a basis for information interchange.
Neither XTM syntax nor the proposed TMDM/TMQL/TMCL provides a means for
disclosing the information required in order to interchange topic
maps that will behave predictably when undergoing subject-oriented
processing.
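
As a rough sketch of why the starting assumptions matter, the Python below merges the same two topics under two different bases for subject identity. The dictionary layout, the `source` occurrence, the second topic's name, and the function names are all hypothetical; nothing here is TMRM, TMDM, or XTM notation.

```python
def identity_key(topic, identifying_types):
    """Build the key under which a topic's subject is recognized."""
    occurrences = topic["occurrences"]
    return tuple((kind, occurrences.get(kind)) for kind in sorted(identifying_types))


def merge(topics, identifying_types):
    """Merge topics whose identity keys are equal under the given basis."""
    merged = {}
    for topic in topics:
        key = identity_key(topic, identifying_types)
        target = merged.setdefault(key, {"names": set(), "occurrences": {}})
        target["names"].add(topic["name"])
        target["occurrences"].update(topic["occurrences"])
    return list(merged.values())


# Two hypothetical topics received from different sources.
topics = [
    {"name": "Tokyo",
     "occurrences": {"latitude": "35 40 N", "longitude": "139 45 E"}},
    {"name": "Tokyo Metropolis",
     "occurrences": {"latitude": "35 40 N", "longitude": "139 45 E",
                     "source": "survey B"}},
]

# Disclosed basis: only the coordinates establish identity.
disclosed = merge(topics, {"latitude", "longitude"})

# Unstated assumption in some other process: every occurrence establishes identity.
assumed = merge(topics, {"latitude", "longitude", "source"})

print(len(disclosed), len(assumed))
```

The two runs disagree about how many subjects are present. A recipient cannot tell which result the sender intended unless the basis is disclosed along with the topic map.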

Above, we considered a semantic variation on the
Pepper/Garshol solution. Now let's consider a syntactic
variation, one that conveys the same geographic coordinates for
Tokyo in a different syntactic form.

Both the above example and the original
Pepper/Garshol example can be used to interchange information about
Tokyo, and both can be understood in such a way as to make the
geographic coordinates that they both specify the basis on which
their common subject is identified. However, in the absence of
declarations of their respective bases of subject identity, both
examples are ambiguous. Neither of them can be reliably and
predictably understood as representing exactly the same subject as
that of the other topic, or, for that matter, of any other topic.
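
One way to picture what such declarations might supply is sketched below in Python. Both data layouts, the `locate_a` and `locate_b` rules, and the 0.01-degree tolerance are assumptions made purely for illustration. The tolerance stands in for the use case's requirement that locations within a range be treated as the same subject.

```python
def parse_dm(text):
    """Convert 'DD MM H' (degrees, minutes, hemisphere) to signed decimal degrees."""
    degrees, minutes, hemisphere = text.split()
    value = int(degrees) + int(minutes) / 60
    return -value if hemisphere in ("S", "W") else value


# Two hypothetical syntactic layouts carrying the same coordinates.
layout_a = {"latitude": "35 40 N", "longitude": "139 45 E"}
layout_b = {"position": "35.6667,139.75"}


def locate_a(topic):
    """Declared rule for layout A: identity comes from two separate occurrences."""
    return (parse_dm(topic["latitude"]), parse_dm(topic["longitude"]))


def locate_b(topic):
    """Declared rule for layout B: identity comes from one combined occurrence."""
    lat, lon = topic["position"].split(",")
    return (float(lat), float(lon))


def same_subject(p, q, tolerance=0.01):
    """Assumed range rule: points within `tolerance` degrees are one subject."""
    return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance


print(same_subject(locate_a(layout_a), locate_b(layout_b)))
```

Without `locate_a`, `locate_b`, and the tolerance being stated somewhere, a processor has no principled way to decide that the two layouts describe the same place.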

Finally, let's consider the information conveyed
by the examples in tabular form:

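| element ID | baseName | latitude | longitude | population | life expectancy |
|------------|----------|----------|-----------|------------|-----------------|
| tokyo      | Tokyo    | 35 40 N  | 139 45 E  |            |                 |
| japan      | Japan    |          |           | 127214499  | 80.93           |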

The above table is intended to represent a
relational database containing the same information as the XTM
snippets already shown. We presume that few, if any, would argue
that it is not useful or appropriate to be able to regard such a
database as a topic map. In order to view such a database as a
topic map, all that is really necessary is to explicitly disclose,
somewhere, somehow, exactly how it can be seen as a topic map. The
questions that such a disclosure must answer include:

What are the subjects that should be
regarded as being the subjects of topics?

How should the resulting topics specify
their subjects?
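
To make these two questions concrete, here is a minimal Python sketch of one possible disclosure for the table above. The `DISCLOSURE` structure and its key names are invented for this illustration and are not TMRM nomenclature. The sketch only shows that an explicit, interchangeable statement of this kind is enough to let a generic process view the rows as topics.

```python
# The table above, as rows of a hypothetical relational database.
ROWS = [
    {"element ID": "tokyo", "baseName": "Tokyo",
     "latitude": "35 40 N", "longitude": "139 45 E",
     "population": None, "life expectancy": None},
    {"element ID": "japan", "baseName": "Japan",
     "latitude": None, "longitude": None,
     "population": "127214499", "life expectancy": "80.93"},
]

# Hypothetical disclosure answering the two questions:
# each row's subject is a place, identified by the listed column(s).
DISCLOSURE = {
    "subject_of_row": "the place described by the row",
    "identified_by": ["element ID"],
    "name_column": "baseName",
    "characteristic_columns": ["latitude", "longitude",
                               "population", "life expectancy"],
}


def view_as_topics(rows, disclosure):
    """Apply a disclosure to rows, yielding one topic-like record per row."""
    topics = []
    for row in rows:
        topics.append({
            "identity": tuple(row[c] for c in disclosure["identified_by"]),
            "name": row[disclosure["name_column"]],
            "characteristics": {
                c: row[c]
                for c in disclosure["characteristic_columns"]
                if row[c] is not None
            },
        })
    return topics


topics = view_as_topics(ROWS, DISCLOSURE)

# A different disclosure over the same rows: identity by coordinates.
by_coordinates = dict(DISCLOSURE, identified_by=["latitude", "longitude"])
alternative = view_as_topics(ROWS, by_coordinates)

print(topics[0]["identity"], alternative[0]["identity"])
```

The rows never change; only the disclosure does. Under the second disclosure the row for Japan has no usable identity at all, which is exactly the sort of consequence that an explicit disclosure brings to the surface.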

The TMRM provides essential guidance in how to
answer the above questions, so that ways of regarding an information
resource as a topic map can be disclosed and known. Rather than
enumerating and exemplifying RDF, KIF, LTM, OSL, various XML
vocabularies, and other information representations that may
usefully be viewed as if they were topic maps, we instead invite the
reader to contemplate:

how useful it will be to interchange
disclosures of arbitrary ways of viewing any information
resource as a topic map,

how important it is to make an
international standard that provides a nomenclature, such as the
TMRM's proposed nomenclature, for making such disclosures,
and

whether it makes sense to allow any
other part of the ISO Topic Maps standard to fail to be
explained using the nomenclature provided by that same
standard.

Skillful use of a tool, such as the Topic Maps
Data Model, does not necessarily imply an understanding of
why a tool works. It is not possible to choose an
appropriate tool, or to decide how to use it (save by chance or
mimicry) unless one understands why the tool works as it
does. For example, a rake is a tool that does a good job of
gathering leaves, but it is less ideally suited for gathering
school children for a trip to the zoo -- even though a rake can be
used for such a purpose, if it is handled skillfully enough. If
the children's parents object to the use of a rake, however, their
concerns are unlikely to be mitigated by assurances that "it
works", even if they trust that the rake will be handled by a
person known to be very skillful with rakes.

In general, the characteristics of a tool that
make it usable in one context (the reasons why, for example, a
rake is useful for gathering leaves) are not necessarily the same
characteristics that make it useful in another context (such as
the reasons why a rake may be useful for gathering children).
Again, it is important to know why a tool may be useful for
a purpose. Even if we have a procedure for using a rake to gather
children for a trip to the zoo, and the procedure demonstrably
works, that doesn't mean that we know very much about rakes, or
about children, either. We can't say whether the same procedure
could gather children using a shovel, or what advantages a shovel
would offer when gathering leaves.

If we know some effective procedures for
querying a topic map, it's quite true that we may, by analyzing
the query, be able to deduce some of the ways in which the topic
map reflects the mapped territory. But even if we can make such
deductions by analyzing procedures that are demonstrably
effective, it is far preferable to know the underlying design of
the topic map explicitly. It's better not only because the
maintainers of the topic map can be held to their own standard,
but also because such declarative explicitness leaves open the
possibility of using the same topic map for new, unforeseen
purposes: it provides a principled basis for the invention of new
procedures. Such explicitness is valuable and necessary even in
cases where the same person is both the maintainer of a topic map
and its only user; it provides a principled basis on which both
kinds of tasks -- both maintenance and use -- can evolve and
improve, in explicit and testable harmony with each other.

As the TMRM demonstrates, the question of
subject identity is not a simple one, nor is the question
applicable only to a particular syntax, or to a particular data
model representation of a syntax. Different users have different
notions of subject identity; this was recognized in the original
text of ISO 13250, in the definition of subjects as: "...any
things whatsoever, regardless of whether they exist or have any
other specific characteristics, about which anything whatsoever
may be asserted by any means whatsoever." The variety of bases for
subject identity is potentially boundless. We attempted to
dramatize that boundlessness in our use case document.

The Topic Maps paradigm must embody a
practical, general approach to the problem of making the bases for
specifying and understanding the identities of subjects known. In
order for the Topic Maps standard to reflect the paradigm, the
standard must establish, separate and apart from any syntax or
data model, a means by which users can declare the bases on which
subject identity is determined in their topic map syntaxes, their
data models, and their topic maps.

Most users will not need to master the
complexities of the assertion model that forms part of the TMRM.
Consider the complexities of regular expressions as a parallel
case: every designer of a language that has regular expression
capabilities has to understand the complexities of regular
expression theory. Every developer who writes software based on
such a language has to understand the capabilities and limits of
the regular expression language that it provides. Users, however,
need not be exposed to any of these complexities; they may need
to know only that they can use "Ctrl+F" to perform string searches
on their documents. Even though the user does not have
to understand what happens when "Ctrl+F" is pressed, the user
depends upon a stack of logical layers, at the bottom of which is
an underlying paradigm or abstract model, and at the top of which is
the software implementation that is actually being used to serve
the needs of users. Those who participate in the construction of
implementations, and of the logical layers that undergird them,
must understand how the lower layers inform the designs of the
upper ones. (And, of course, the lower layers must exist, and
they must actually inform all the layers above them.)

It is not necessary for users to understand
exactly how the Topic Maps standard provides topic map designers
and software developers with the ability to unambiguously
understand each other's approaches to the problem of mapping
subject territories. But if we want the Topic Maps standard to be
adopted widely, it is essential for the standard to provide that
ability. Users who are interested in protecting their investments
in topic maps expect their adherence to an international standard
to afford them some protection. Users understand the dangers of
making investments in information whose underlying model is not
completely explicit, or that is only explicit in non-standard
terms.

The TMRM allows authors of topic maps to use
occurrences, resourceData, associations and other
syntax/information items to mean many different things, just as
diverse users of Topic Maps are already doing. The TMRM seeks to
provide the basis for developing a means to allow such authors to
say (in a standard way) what their various usages of the syntactic
constructs and information items mean, i.e., how to recognize the
identities of the subjects of their topics. The TMRM only
facilitates disclosure. It does not constrain the models and
mapping approaches that can be disclosed.

The TMRM is a valuable tool for creating
consensus around a deep common understanding of a syntax or data
model. The work of systematically disclosing, in conformance with
the TMRM, how instances of a syntax or data model are expected to
be viewed as topic maps, can highlight ambiguities and
misunderstandings. Such a systematic effort requires
consideration of the consequences of each possible approach to the
mapping problem.

All geographic maps have legends that disclose how each
symbol that appears in them represents something in
the mapped territory. Every topic map needs a legend, too: an
indication of how its symbols correspond to the subjects in the
mapped subject-territory. The TMRM provides a way of disclosing
the legends of topic maps, but it imposes no constraints on the
symbols that the legend can define, or their definitions.

The notion that the TMRM "competes" with the
TMDM is illusory. The TMDM and TMRM are as different as apples and
deoxyribonucleic acid (DNA). The proposed TMDM is a data model
based on a particular syntax. By contrast, the TMRM is a proposed
nomenclature by which any syntax, data model or topic map author
can disclose, in a "standard way" (SC34 is a standards body), how
subject identification is done when using a particular syntax or
data model, or even within a particular topic map.

The question that led to WG3's development of
the TMRM arose when 13250 incorporated the XTM syntax, in addition
to the original HyTM syntax. At that moment, Topic Maps began to
have multiple standard syntaxes, and it became necessary to have
an answer to the question: What is it that makes instances of
HyTM and XTM -- and, by extension, LTM and other notations --
"Topic Maps"? The most salient characteristic that all topic
map syntaxes and models share is that they are all designed to
facilitate achievement of the "one location per subject" goal. This is not a
coincidence. The impetus for the invention of the paradigm was
the problem of merging diverse independently-maintained indexes,
and the invention of the paradigm was simultaneous with the
coining of its "Topic Maps" moniker.

The achievement of the outward sign of being a
"topic map" -- having one location per subject -- implicitly
depends on the answer to a more difficult question: On what
basis are subjects distinguished from, or found to be the same as,
other subjects? It is true that a particular syntax or data
model may gather pieces of information about each subject in a
single syntactic construct or in-memory object that is, in some
sense, dedicated to a single subject. The syntax may be contrived
in such a way as to guide its human users to express their topic
maps with a degree of predictability and consistency that, at
least in some circumstances and for some purposes, is "good
enough". But in the absence of explicit knowledge of the bases on
which the identities of subjects will be consistently determined,
there can be no certainty in the interchange and processing of
topic maps, even if they all use the same interchange syntax and
are processed in systems that use the same data model.

The means whereby the identities of subjects
are discriminated must be declarable separately from any syntax or
data model. To insist that a topic map's doctrines of subject
identification are inseparable from the representation of that
topic map in some specific notation or data model is to create an
obstacle for topic map owners who wish to exploit their assets
outside the systems that have been designed around that syntax or
data model. Such
insistence does not serve the interests of topic map owners or
users. If we want the Topic Maps standard to be widely adopted,
it must encourage investments in topic maps. Adopters of the
standard must be rewarded by enhanced exploitability of their
investments in their information assets. We can reasonably expect
the rate at which the Topic Maps standard is adopted to be a
function of the degree to which adoption adds value to investments
in information assets.

Many interchange syntaxes, data models, and
database schemas have already been designed without any knowledge
of Topic Maps, but, at least implicitly, with specific and
consistently-applied ideas about subject identification in mind.
Many of these syntaxes, models, and schemas have been adopted by
significant shares of significant markets. So, the question,
"How can ideas about subject identification be expressed
independently of any syntax or data model?" is far from being
merely academic; it goes to the heart of today's largest business
cases for Topic Maps.

The TMRM is essential to the Topic Maps
standard because it provides answers to the most essential
questions about the Topic Maps paradigm. If the Topic Maps
standard cannot provide convincing answers to these questions, the
standard cannot be truthfully defended against the charge that
Topic Maps are "full of sound and fury, signifying nothing."