First draft of RM use cases document - PD

Edited by SRN.

Entirely new draft by PD, including material contributed by AW.

Added new first paragraph, changed association
to connection (all cases), added new use cases for non-Unicode
texts and soundex merging, incorporated comments from Jan,
general tightening of language

Added material by Duane Degler.

Edited by SRN.

Edited some of the language in the
introduction, Subject Identity and Legacy Data becomes Subject
Identity and Data (with internal edits as well), disclosure of
merging rules tightened up. PLD

Introduction

The following use cases have been developed to
guide the discussion of requirements for the Topic Maps Reference
Model (TMRM). There has been extensive work on, and discussion of,
both topic maps and the TMRM, and this document is written against
that background. The casual reader is therefore cautioned that
terms of art and usage occur without warning or explanation.

No Abstract Model of Topic Maps

Currently ISO 13250 provides no abstract model
of topic maps. The situation is analogous to the path not taken
in the early development of airplanes. Without the underlying
model that guided the design of the Wrights' airplane, others
could copy their work, making airplanes that, like the Wrights'
flyer, would really fly -- but only for a few hundred meters. The
development of airplanes for diverse practical purposes required a
general model of the dynamics of powered flight -- one that could
form a basis on which many problems could have many creative
solutions. Similarly, the first interchange syntaxes and
processing models for topic maps have guided the construction of
topic maps that really work. However, by themselves, these
syntaxes and processing models provide an inadequate basis for
creating and using diverse solutions to the evolving problems
confronted by those who create, manage and use human
knowledge.

The existing interchange syntaxes and
processing models for topic maps reflect particular approaches to
the identification of subjects -- specific techniques for
determining when two or more topics represent the same subject.
Both ISO 13250 and the proposed revisions of it concede that their
interchange syntaxes and processing models can be extended, but
they provide no guidance for modeling or disclosing
those extensions in a way that makes meaningful construction and
interchange of such topic maps possible. In the
absence of an abstract model for topic maps, it is not possible
for vendors and users to extend current syntaxes and processing
models in a reliable and interchangeable way.

The TMRM exposes principles on the basis of
which diverse designs for topic maps can be expressed, compared,
evaluated, and made to work together. This document describes
some use cases in which the TMRM is expected to enable solutions
to problems that, in the absence of the TMRM, would be more
difficult to solve.

Subject Identity Based on Connections

Overview

In topic maps, the connections between
topics represent connections between subjects. George may be
connected to Laura (his wife), to the US Government (his
employer) and to Osama bin Laden (his nemesis). The question
raised by this use case is: "How, in the absence of an abstract
model for saying so, can we know whether to merge two topics on
the basis of their connections to other topics?"

In this particular use case, the Social
Security Administration, an agency of the US government
responsible for distributing funds to elderly and disabled US
citizens, is interested in investigating fraud in claims for
payment. In some cases, fraud is committed by persons who claim
multiple payments by pretending that they are multiple persons,
each with a different "social security number" (a different
presumably-unique identifier assigned by the US government).
In order to detect this kind of fraud, the Social Security
Administration wishes to enable merging of topics that represent
individuals on the basis of their connections to other
individuals, geographic locations, treating physicians and the
types of claims being made.

Preconditions

Topics that represent individual
claimants. Each such topic has a property whose value is
the claimant's social security number(s).

Topics that represent types of
claims.

Topics that represent treating
physicians.

Topics that represent geographic
locations.

Connections between the above topics
that reflect the information known to the Social Security
Administration.
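
As a rough illustration of the merging described in the Overview, the
following Python sketch groups claimant topics by the things they are
connected to rather than by social security number. It is only a
sketch under stated assumptions: the class name, field names and
sample values are hypothetical, the physician's own location is
omitted for brevity, and nothing here is prescribed by the TMRM or
the TMDM.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claimant:
    """A claimant topic; every field name here is hypothetical."""
    ssn: str
    location: str
    physician: str
    claim_type: str


def merge_by_connections(claimants):
    """Group claimant topics that share all three connections."""
    groups = defaultdict(set)
    for c in claimants:
        # The merge key is the set of connections, not the SSN.
        groups[(c.location, c.physician, c.claim_type)].add(c.ssn)
    return dict(groups)


def possible_fraud(groups):
    """Return merged groups that carry more than one SSN."""
    return {key: ssns for key, ssns in groups.items() if len(ssns) > 1}


# Invented sample data, for illustration only.
claimants = [
    Claimant("000-00-0001", "Springfield", "Dr. A", "disability"),
    Claimant("000-00-0002", "Springfield", "Dr. A", "disability"),
    Claimant("000-00-0003", "Shelbyville", "Dr. B", "retirement"),
]
flagged = possible_fraud(merge_by_connections(claimants))
```

A merged group with more than one social security number is only a
candidate for investigation; the sketch makes no claim about how an
investigator would weigh such a result.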

Scenario

The investigator needs to enable merging
of topics that represent individuals, despite the lack of
equal locator items (as specified in the proposed Topic Maps -
Data Model Section 5.4.6 "Properties"), on the basis of
connections to type of claim, treating physician and both the
person's and physician's geographic location topics.

Postconditions

Investigator obtains merger of topics
representing individuals who share a connection to a geographic
location as specified, together with a connection to a particular
physician and a connection to a type of claim. When such mergers
occur, there is some possibility of fraud if the resulting
merged topic has more than one social security number.

Business Case

The approach allows efficient detection of
cases in which there is a possibility of collusion of patients
and physicians in fraud. The approach depends on having
flexibility in how subject identity is determined. (If the
Social Security Administration needed to determine subject
identity solely on the basis of social security numbers, the
TMDM's approach to subject identification would be adequate.)
This use case illustrates that there is utility in applying
merging rules other than those provided by the TMDM.

Specifying Properties of Topics

Overview

The interchange syntax-based explication of
topic maps in ISO 13250 enunciates certain properties for
topics, including topic name, occurrence, and
association. Some current proposals, such as the TMDM,
recognize that 13250's interchange syntax is not intended to
constrain the properties of the non-interchangeable topic
objects found in implementations. These proposals provide
additional properties, but they nevertheless would provide all
topics with only a single, specific fixed set of properties,
exclusively reserving unto themselves the privilege of defining
the properties of topics.

There is nothing particularly sacred about
the properties of topics reflected in any interchange syntax or
proposed data model. As has been the case for many years,
different industries and users employ different notions of
subject identity, even when they are processing exactly the same
information. Creators of topic maps should have the ability to
declare the properties in terms of which they intend their
topics to be understood, and their ability to declare such
properties should not be constrained by the topic maps standard.
Users of topic maps should be free to use the advice provided by
their creators, or to ignore it; users should be able to decide
for themselves the properties in terms of which they wish to
understand topic maps.

In this particular use case, the US
Geological Survey (USGS), another agency of the US government,
wishes to construct a topic map in which topics have, in
addition to names, location properties whose values are
expressed in terms of longitude and latitude. As it happens,
the USGS does not wish to create topics to represent individual
quanta of geographic space; instead, it prefers to understand
latitude and longitude values as points in their respective
continua. This attitude has implications for subject identity,
and therefore for merging, and the USGS needs to understand and
explain those implications to itself and to the users of its
topic map products.

Preconditions

The USGS wishes to build a topic map that
contains topics whose subjects are geographic locations. For
each such topic, the following information will be conveyed:

the name of the location

the variant names of the location

the longitude of the location

the latitude of the location
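
One way to picture a topic carrying these four items, together with
a comparison that treats coordinates as points on a continuum rather
than as discrete values, is the Python sketch below. It is a minimal
illustration under stated assumptions: the class name, the function
name, the default tolerance and the sample coordinates are all
hypothetical and are not taken from any USGS data model.

```python
from dataclasses import dataclass, field


@dataclass
class LocationTopic:
    """A geographic-location topic; all names here are hypothetical."""
    name: str
    longitude: float
    latitude: float
    variants: set = field(default_factory=set)

    def all_names(self):
        """The name plus every variant name."""
        return {self.name} | self.variants


def same_subject(a, b, tolerance=0.001):
    """True if both coordinates agree within `tolerance` degrees and
    the two topics share at least one name or variant name.

    Assumption: plain absolute differences are used, so longitude
    wrap-around at +/-180 degrees is not handled.
    """
    close = (abs(a.longitude - b.longitude) <= tolerance
             and abs(a.latitude - b.latitude) <= tolerance)
    return close and bool(a.all_names() & b.all_names())


# Invented sample topics, for illustration only.
a = LocationTopic("Mount Example", -105.0000, 40.0000)
b = LocationTopic("Example Peak", -105.0004, 40.0003, {"Mount Example"})
c = LocationTopic("Mount Example", -104.0000, 40.0000)
```

Note that a tolerance-based comparison is not transitive: a topic can
match two neighbours that do not match each other. That is one of the
implications for merging that a disclosure of the rule would need to
address.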

The USGS intends the topic map to be
understood in such a way that, when any two topics have the same
longitude (within some tolerance), the same latitude (within
some tolerance), and any name or variant name in common, the two
topics will be regarded as having the same subject (i.e., they
will be merged).

Scenario

The ability to integrate information about a
given specific set of geographic coordinates is just one of the
USGS requirements. Another requirement is to be able to respond
to queries about identified locations with respect to any set of
coordinates. In the data model that the USGS needs to use for
its topic maps, longitude and latitude are as much
characteristics of its topics as topic names or information
locators may be in some other data model.

Postcondition

The TMRM shows how the USGS can enjoy the
benefits of Topic Maps without having to surrender the freedom
to construct subject identity properties that accurately reflect
its own understandings and attitudes with respect to the
subjects within its domain (in this case, geographic locations).
Because the TMRM establishes the minimum requirements for
usefully disclosing such understandings and attitudes, the USGS
can provide the users of its topic maps with the option of
understanding them exactly as USGS intends them to be
understood.

Business Case

The TMRM allows the benefits of creating and
using Topic Maps to be realized by diverse user communities,
even when their notions about how subjects should be identified
are highly specialized, or are themselves subject to change. It
allows the topic maps paradigm to be adapted to the attitudes of
its users with respect to their knowledge domains, rather than
requiring the users to adapt their thinking about the
representation of their knowledge domains to the constraints of
topic maps. This can significantly reduce the learning curve
for new users who already have a data model with which
they are familiar. It also maximizes the freedom of
topic map creators/maintainers to adapt to changes that occur
within their knowledge domains.

Topic Maps and Diverse Information Resources

One of the listed purposes of ISO 13250 was to
provide integration of diverse information resources (both
structured and unstructured) through the use of a topic map. The
practical requirements for integrating truly diverse resources
may, but very likely will not always, be fully satisfied by the
properties of topics (and the merging rules based on those
properties) that have been proposed as revisions to ISO
13250.

This section, Providing an Integrated View
of European and UK Parliamentary Information, written by Ann
Wrightson, is one example of such a use case. This material,
Copyright 2003 Ann Wrightson, appears here with her
permission.

Providing an Integrated View of European and UK Parliamentary Information

Overview

At the time of writing,
the European Parliament is evaluating Topic Maps as a medium
for recording the existence and organization of a range of
information assets, and the UK Parliament has decided to adopt
RDF for indexing some of its information assets. This use case
takes this situation forward into a plausible future scenario
where both these illustrious organizations have followed
through on these early directions, and have furthermore made
substantial collections of their respective information assets
available by remote access. These access interfaces include
the following capabilities:

Performing a query on the
collection's subject index and other metadata, using an
RDF or Topic Map query respectively. These queries may
return "flat" values, or may return a more or less
substantial helping of RDF or Topic Map
respectively.

Retrieving individual resources
using parameters such as a document ID.

This use case is a high-level description
of a user interface that gives an integrated view across these
two collections, including search and retrieval functions that
do not require the user to interact separately with the two
collections of information. The researcher is an independent
third party.

Preconditions

Researcher wishes to investigate a
matter with relevant sources in both UK and European
Parliament collections

Researcher has a tool driven by
the TMRM - called below the RM-Nav

Each collection includes metadata,
for example Dublin Core.

Access to both collections is
available, through an interface supporting querying of
subject index & metadata, and retrieval of
documents.

A "researcher's friend" ontology
is available - say in a third technology, X - that
provides cross-references between subject headings used in
the two domains. (This is called the X-ref ontology
below.)

Scenario

Action: Researcher starts RM-Nav

Software reaction: Following user authentication,
RM-Nav presents a browsing interface, including a
navigable network of subject headings.

TMRM Contribution: Queries to both collections
retrieve their current sets of subject headings, including
structure such as hierarchy. These are combined with the
X-ref ontology to yield a single structured collection of
subject headings. This is called the combined subject
index below.

Action: Researcher selects a major
subject heading

Software reaction: RM-Nav "zooms in" to the part
of the network of subject headings that pertains to this
major subject. Navigation interface, followed
by list of documents pertaining to the subject term
selected. (Assumes that there are a smallish number of
documents.)

TMRM Contribution: Supports the formulation of
queries (ending up as RDF Queries and TMQL queries that
use suitable local subject terms) to retrieve document
IDs, and metadata to populate the list of identified
documents.

Action: Researcher selects a document
to read

Software reaction: The document (information
resource) is retrieved, and rendered according to the data
format recorded in its metadata. Links are
provided to a selection of closely related documents.

TMRM Contribution: A selection of closely related
documents is identified by combining and filtering
information gathered by queries (to both collections)
using suitable local terms derived from the selected
document's metadata & the combined subject index.

Action: Researcher requests
information on the creator of the selected document

Software reaction: Presents summary information
from the document metadata, plus a suitable interface to
relevant documents (across both collections), e.g. a list if
few, a navigation interface if many.

TMRM Contribution: A selection of relevant
documents is identified by combining and filtering
information gathered by queries (to both collections)
using the Creator term from the document's metadata.
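
The combined subject index described in the scenario could be
assembled along the following lines. This Python sketch is purely
illustrative: it assumes that headings from each collection arrive as
(source, label) pairs and that the X-ref ontology can be reduced to
pairs of equivalent headings; it ignores hierarchy; and the function
name and sample headings are invented. It is not a description of
RM-Nav or of any existing interface.

```python
def combine_headings(uk_headings, ep_headings, xref_pairs):
    """Group headings that the cross-references declare equivalent.

    Uses a small union-find structure. Every heading named in
    `xref_pairs` is assumed to appear in one of the two lists.
    """
    parent = {h: h for h in list(uk_headings) + list(ep_headings)}

    def find(h):
        # Follow parent links to the group representative.
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    for left, right in xref_pairs:
        parent[find(left)] = find(right)

    groups = {}
    for h in parent:
        groups.setdefault(find(h), set()).add(h)
    # Sort only to make the result order deterministic.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Invented sample headings, for illustration only.
uk = [("uk", "Fisheries"), ("uk", "Transport")]
ep = [("ep", "Common Fisheries Policy"), ("ep", "Energy")]
xref = [(("uk", "Fisheries"), ("ep", "Common Fisheries Policy"))]
combined = combine_headings(uk, ep, xref)
```

Each resulting group stands for one entry in the combined index;
headings with no cross-reference remain in groups of their own.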

Postconditions

Researcher obtained suitable source
document, with citation and background on creator.

Business case

Effective use of copious published
information.

Subject Identity and Data

Overview

The Widget Corporation wishes to use topic
maps to access its current sales activities and to plan its
marketing strategies. In order to determine subject identity, it
wishes to use values returned from its current database. Due to
its long presence in international markets, some of the data in
question is stored in a variety of encodings, including Unicode,
Shift_JIS, EUC-JP, KS C 5601-1992, and others. Numerical data is
stored in a uniform encoding, but the encoding of names of
personnel, sales territories and other data used primarily by
local offices varies by locale.

Preconditions

Subject identity is determined on
the basis of values returned from database. (Note: subject
identity is not determined on the basis of
pointers to those values.)

Data is stored in a variety of encodings.

Scenario

Widget Corporation wishes to specify custom
rules for determining subject identity based upon actual data
and not based upon pointers to that information. Those rules
must allow for the matching of data held in various character
encodings.

Postconditions

Subject identity based upon data held by the
Widget Corporation allows it to capitalize on its existing data,
enhanced by the use of topic maps.
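
A rule that matches actual values across character encodings might
look like the Python sketch below. It is an assumption-laden
illustration, not Widget Corporation's rule: the function names are
hypothetical, the choice of Unicode NFC normalization as the common
form is an assumption, and Python's `euc_kr` codec is used here as a
stand-in for KS C 5601-1992.

```python
import unicodedata


def identity_key(raw: bytes, encoding: str) -> str:
    """Decode a stored value and bring it to one comparable form.

    Assumption: NFC-normalized Unicode text is an acceptable common
    form for comparing values held in different encodings.
    """
    return unicodedata.normalize("NFC", raw.decode(encoding))


def same_value(a, b) -> bool:
    """Compare two (bytes, encoding) pairs by decoded value."""
    return identity_key(*a) == identity_key(*b)


# "\u6771\u4eac" is the Japanese name of Tokyo and "\uc11c\uc6b8" the
# Korean name of Seoul; escapes keep this source file pure ASCII.
tokyo_sjis = ("\u6771\u4eac".encode("shift_jis"), "shift_jis")
tokyo_euc = ("\u6771\u4eac".encode("euc_jp"), "euc_jp")
# Assumption: euc_kr stands in for KS C 5601-1992.
seoul_ksc = ("\uc11c\uc6b8".encode("euc_kr"), "euc_kr")
```

The two Tokyo values differ byte for byte yet compare as equal once
decoded, which is the kind of match a pointer-based identity rule
could not express.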
Business Case

Without being restricted to pointers to
data, Widget Corporation can make effective use of its existing
data to determine subject identity and, by implication, the
merging rules that apply to subjects that are of interest to
it. Reuse of data is an important consideration for Widget
Corporation due to its long-term investment in both the
development and maintenance of that data.

Subject Identity and Soundex Matching

Overview

While used by telephone companies to assist
operators for years, soundex algorithms have taken on a new
importance in the current war on terrorism. The cancellation of
flights to the US based upon faulty soundex matching that
resulted in a five year old girl being suspected of being a
terrorist is well known.

Even if soundex matching yields unreliable
results, the technique is used because it offers at least some
advantages in dealing with the generally intractable problem of
public security. It appears here as a use case because it is an
example of a technique other than string-matching that is used
to establish subject identity.

A major provider of security for an
unspecified airport wishes to use topic maps to assist in
screening passengers who are embarking on both domestic and
international flights. Some of the details that are of interest
in establishing subject identity are listed as
preconditions.

Preconditions

Names of passengers in various
non-Unicode encodings

Soundex results of passenger
names

Soundex results of suspected
terrorist names

Other information deemed relevant to
screening passengers
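
For readers unfamiliar with the technique, the soundex results
mentioned above can be produced by something like the following
Python sketch of the common American Soundex rules. It is offered
only as an illustration: the security provider's actual algorithm is
not disclosed in this use case, and this version simply discards any
character outside A-Z, which is one reason such matching can be
unreliable for names from non-Latin scripts.

```python
# Letter-to-digit table of the common American Soundex variant.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character soundex code, or "" if no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so repeats count again
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


def names_match(a: str, b: str) -> bool:
    """Treat two names as the same subject if their codes agree."""
    code = soundex(a)
    return code != "" and code == soundex(b)
```

Under this sketch "Robert" and "Rupert" both yield R163 and would be
treated as matching, which shows how readily distinct names can
collide.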

Scenario

While screening passengers for eventual
boarding, the security provider wishes to use both soundex
matching of names and other, undisclosed criteria that are
supported by actual substantive data, and not by
pointers to such data, in order to establish the identities of
passengers scheduled to board a particular flight.

Postconditions

The security provider can utilize data
comparison algorithms, like soundex, as part of the process of
determining subject identity for airline security.

Business Case

The need to utilize a variety of means to
evaluate subject identity, in the very real sense of who is
going to board a commercial aircraft, or to enter a secure
location, cannot be doubted. Those charged with providing that
security should have the means to adapt subject identity, in the
topic maps sense, to that task as they see fit.

Disclosure of Merging Rules

Overview

The prior use case on merging of Social
Security claim records is certainly allowable under both ISO
13250 and current proposals for revising ISO 13250.
However, neither the current standard nor any proposal (other
than the TMRM) provides for the disclosure of such variant
merging rules.

In this particular use case, the Social
Security Administration, an agency of the US government, wishes
to evaluate new topic map software for use in its fraud
detection unit. It has no knowledge of any merging rules that
were customized as part of its current topic maps
application.

Preconditions

The Social Security Administration informs
the new vendor that it has the following topics stored in its
topic map:

Topics that represent
individual claimants.

Topics that represent types of
claims.

Topics that represent treating
physicians.

Topics that represent geographic
locations.

Associations among the various
topics.

Further, the new vendor is allowed to
observe the operation of the current system, including the
inputs and outputs of proposed merger operations.

Scenario

The Social Security Administration, before
choosing a new vendor, wishes to have both assurances and a
confirming independent evaluation that the new software will
precisely duplicate the current system's functionality with
respect to the merging of topics.

Postcondition

The vendor is able to provide a formal
specification that claims rigorously (and legally actionably)
the ability of the new software to duplicate the behavior of the
present system. The proposed new software's conformance to the
specification can be independently verified.

Business Case

Without disclosure of merging rules and the
objects in topic maps affected by such merging rules, vendors
will be unable to provide meaningful assurances to clients that
their software will duplicate or exceed current capabilities.
Customers will be unable to undertake meaningful assessments of
the risks involved in changing from one topic maps software
vendor to another.

Access to Information from Multiple Sources, Preserving Context

The following is a use case description
drafted by Duane Degler on March 6, 2004. Copyright 2004 Duane
Degler, email: ddegler@ipgems.com. This material may be copied
and redistributed provided that the copyright notice and author's
e-mail address are included on all distributed copies.

Access to Information from Multiple Sources, Requiring Context

Overview

There are many cases where information
relevant to a user's task is the responsibility of more than
one organizational entity - whether that is two or more
departments within an organization, or two or more separate
organizations. This is particularly true in government,
because activities are governed by agreements between
government agencies/departments in situations where
jurisdictional boundaries are crossed in the completion of a
task or activity (see the scenarios at the end of this
document for two examples of this). Data exchange agreements
will exist for managing the transactional data, but the
process of a user entering that data also requires access to
content that may not fall under the agreement, as it may not
be considered "structured data" in application terms.

In a situation where accessing information
is incidental to the user's main task, the software
application being used may take on the responsibility for
accessing required information (e.g. policy, instructions, or
data contributing to task completion) as a background
activity.

Actors

Data provider, Data entry application,
Local information source, Remote information source

Note: for
the purposes of convenience, "Organization A" will be used to
denote the one responsible for the data entry application and
the local information source, and "Organization B" will be
used for a remote information source.

Preconditions

Organizations A and B have
information sharing agreements spanning the scope of the
task supported by the data entry application.

Organizations A and B have a
model for describing and mapping the information
resources, and expose these maps to the data entry
application.

The user (data provider) initiates
a request for information at a particular point while
working in the data entry application.

Description

User Action

Role of TM

While using the data entry
application, user requests assistance and clarification
about the data being entered at a particular point in the
data entry process.

Protocol for packaging the
request in a standard form for exchange between
applications, exposing whatever knowledge the data entry
application has about what data the user is working on,
the nature of the information being sought, and how it
processes its own information associations.

Representing the information
sources' (local and remote) maps. Merging or associating
with topics disclosed by the data entry
application. Disclosing what processing was undertaken to
manage the maps involved in the request.

Merge or derive association
inferences from among the various responses. Identify and
provide or assist in provision of occurrence
references. Support categorization and presentation of
information based on topics known to have participated in
the request.

Postconditions

User is presented with relevant documents
and data based on the activity being performed, with little or
no need to perform further search or query activity to refine
the information received (i.e. access to the remote
information source is more specific than just a "home page" or
table of contents).

Variations/Exceptions

None at this time.

Related Scenarios

Scenario: Policy query when providing financial data to a government agency

Data about an individual's earnings is
collected by one government agency. The individual's
employer provides this data (playing the Actor role of data
provider). The data is transmitted by the agency that
captures the information to two other agencies that store
and use the data to support their service missions. Policy
and guidelines defining the nature and quality of the data
provided are established by all three agencies, based on
their jurisdictional responsibilities. Each of them is
individually responsible for publishing and maintaining
policy and guidelines information.

While the user (data provider) is
entering information, a question arises about how particular
data should be itemized. In order for the data entry
application to access the relevant supporting information,
it must locate the information that is appropriate to the
application being used, the task being performed within that
application, the particular activity being carried out (at
either function or field level), and the conditions
particular to the data being entered (classification of
employer organization, classification of employee, financial
threshold/range, exception conditions).

Goal: Immediate access to only the
relevant policy and guideline information directly from
within the data entry application.

Expected outcome: Presentation of a list
of reference paragraphs/documents to the user, with
supporting categorization information to help the user make
an informed selection.

Scenario: Asking for information about an organization for security review

A user (playing the Actor role of data
provider) wants to get data and supporting information about
security considerations relating to a potential supplier of
services, in advance of a meeting that the user will be
having with that organization. The security criteria for
that organization, and supporting guidelines for the user,
are held by more than one government agency/department. In
order to frame the request, the user needs to enter some
data about him/herself and about the organization in
question.

The data entry application submits a
request to a network of information providers, requesting
both specific data and supporting policy information.

Goal: Access to the data and relevant
policy information.

Expected outcome: Presentation of data
to the user, along with a list of reference
paragraphs/documents, with supporting categorization
information to help the user make an informed
selection. Information is presented in sets associated with
the details of the security data provided. Clear reasons and
policies are presented to the user about information that
could not be accessed based on security criteria that were
not met by that user's particular profile.

Conclusion: A Procrustean Bed of Subject Identity?

One of the common features of all the use
cases described above can be stated as follows: Subject
identity, and the properties considered by users to define it, do
not always fit into predefined categories or even the notion of
discrete property values. Subject identity is often, if not
more often than not, a matter of values that lie within a range
that a user considers to represent the same subject.

Consider the use case of the USGS, which
wishes to regard longitude and latitude data as the
subject-defining properties of topics that represent geographic
locations. This demonstrates that not all topic characteristics
for determining subject identity consist of discrete values. For
some subjects, those values may lie anywhere along a user-defined
continuum. The handling, if any, of such values in merging
operations must also be user-definable.

The characteristics of a subject that define its
identity, and the rules for merging topics on the basis of those
characteristics, must be declarable by standard means in the
Topic Maps standard. To do otherwise belies the claim that:

In the most generic sense, a
'subject' is any thing whatsoever, regardless of whether it
exists or has any other specific characteristics, about which
anything whatsoever may be asserted by any means whatsoever.
(ISO 13250 (2002), 3.18 subject)

If we limit the characteristics of subjects to a specific list, we
will have to change ISO 13250's definition of subject to:

A subject is anything
whatsoever whose identity can be described by...

Such a change would gravely impoverish the topic maps paradigm; it
would limit its scope both unnaturally and unnecessarily.

The second common feature of these use cases
is the need for a means for users to disclose the choices they
made in determining the characteristics that govern subject
identity, i.e., the rules they established for merging of topics.
Since no predefined set of characteristics is sufficient for
determining the identity of every subject in every circumstance,
it stands to reason that, at least in the general case, topic map
information cannot be meaningfully exchanged without disclosing
the characteristics and rules that govern identity in a particular
topic map instance. These use cases illustrate the need for a
model of topic maps that allows diverse users to define subject
identity and merging rules for subjects in their diverse domains
and contexts. The model must be flexible enough to allow subject
identity to be understood in terms of the inherent properties of
the topic, or in terms of the topic's relationships to other
topics. The model must provide for disclosure of such design
choices that is sufficiently rigorous that, whenever the same
topic map information is understood in terms of the same
disclosure, it is understood to mean the same thing, and it is
interpreted in the same way.

Any single way of interpreting a single
interchange syntax, or any single set of topic properties, or any
single set of merging rules, can, within its limitations, enable
topic map interchange. However, every such thing is necessarily
also by itself a procrustean bed for subject identity, and, by
itself, is insufficient to serve the stated scope of ISO
13250.