Topic maps are an ISO standard for the representation and interchange of structured information models.
The first version of the standard, ISO 13250:2000, was released just two years ago and has since been adopted for use
on the Web by the TopicMaps.Org consortium. The XML Topic Maps (XTM) syntax created by the TopicMaps.Org
consortium is now also included as a normative appendix to the Second Edition of the ISO standard (ISO 13250:2002).

The original motivation for the development of the topic map paradigm was to assist in the process of merging
traditional back-of-the-book indexes. It is no surprise, therefore, that the paradigm easily accommodates the
representation of finding-aids associated with books and other forms of paper documentation such as indexes, tables of
contents, thesauri and so on. However, the generalised nature of the topic map paradigm has enabled its application to
other problems and in its current incarnations, we find topic maps being used to represent ontologies; to provide a
framework for the development of web applications; to integrate diverse electronic information sources under a single
portal; as well as for improving access to published information of all sorts.

In this article I will introduce the basic concepts of topic maps and show how these concepts can be represented in XTM
syntax. Due to space constraints, this article will not cover the practical applications of topic maps, but the
interested reader can find case studies published on the Techquila web-site.

A topic map is a collection of structured mark-up external to any resources it may describe. It is common to think of
the topic map as a layer 'floating' above resources, which may be retrievable electronic documents such as web pages or
files, or other forms of information (perhaps information which is retrieved manually or retrieved as the result of
performing a database query). There is a clean division between the topic map and the resources referenced from the
topic map that is only crossed by the topic map occurrence construct (described later). This division between the
topics and associations in 'topic space' and the resources documented by the topic map is shown in Figure 1.

A topic map consists of a collection of topics. A topic is a proxy for anything that the topic map author wishes to document in his or her topic map. As already indicated, the thing that a topic represents can be:

An electronic resource that can be retrieved and processed by a computer.

An electronic resource that is not accessible to the computer.

A "real-world thing" such as a person or an object which cannot be retrieved or processed by the computer (although the computer may have information about how an actor in the real world could retrieve the thing).

A concept with no physical manifestation such as an emotion or a technical concept. A business entity such as a company or a department falls into this category too.

A construct in a topic map - we will see later how this is useful for being able to add successive levels of detail to a topic map.

Figure 2, below, shows the distinction between a topic and the object or concept that the topic represents in the real world. An author has in his or her mind the subject to be documented and creates a topic. The topic is a machine representation of the subject which can now be stored, queried and manipulated by the computer.

Characteristics is a collective term used to describe the three properties which a topic may have which together form the collection of assertions that the topic map author has made about the subject that the topic represents.

Names are labels for the topic that serve to identify it to the user of the topic map in some way. These labels can also be used by an application for sorting and display of topics. Names are string labels but each string label can have any number of alternate representations, including non-string forms such as graphics. For example a company may have a full name and a ticker symbol on the market on which it is listed. In addition, some names may be differentiated by their purpose such as sorting, iconic display and so on.

Occurrences are identifiable resources that are in some way related to the topic. Note that an occurrence resource needs only to be identifiable to a topic map processor. It does not need to be retrievable, although typically a non-retrievable resource is not of much use to a topic map application. For a topic representing a company, a retrievable occurrence might be its home page or a stock quote for the company. A non-retrievable occurrence might be the registration papers for the company.

As well as resources external to the topic map, occurrences can also be used to specify additional information about the topic which is kept in the topic map itself.

In Figure 1, occurrences are shown as the dashed lines connecting the topics to resources in the resource pool.

Roles played in associations form the final class of characteristic of a topic. Associations are used to relate two or more topics to each other. In Figure 1 associations are shown as solid lines connecting topics together. We will look at associations in a moment; all that is important to know right now is that in any given association a topic will play some identifiable "role". For example an individual might own some shares in a company. In this case there is an association between individual and company with the individual playing the role of "shareholder" and the company playing the role of "share-holding". Note that the roles would be the same in the case of an organisation such as a financial institution holding shares in the company. In XTM syntax, the mark-up that declares what roles a topic plays in an association is part of the mark-up of the association, not of the topic, but conceptually the roles that a topic plays are considered to be a characteristic of the topic.

The following example shows the representation of a simple topic map consisting of a single topic. Note that the <topicMap> element declares the XTM 1.0 namespace as the default namespace and also declares the namespace for XLink. In later examples I will omit the XML declaration and the namespace declarations to save space. For the same reason, I have not included the full text of the XTM DTD in this article, but it is available online on the TopicMaps.Org site at http://www.topicmaps.org/xtm/1.0/
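
A minimal sketch of such a topic map might look as follows. The id value, the occurrence URL and the in-line text are illustrative assumptions and do not describe any real data set.

```xml
<?xml version="1.0"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="xyzzy">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
    <!-- First occurrence: a reference to an external resource (assumed URL) -->
    <occurrence>
      <resourceRef xlink:href="http://www.redmondcomputers.com/"/>
    </occurrence>
    <!-- Second occurrence: data held in-line in the topic map (assumed text) -->
    <occurrence>
      <resourceData>A hypothetical short description of the company.</resourceData>
    </occurrence>
  </topic>
</topicMap>
```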

The <topic> element has an id attribute. I have deliberately used a value bearing no relation to the topic
itself. The id attribute is simply a syntactical construct that will allow us to make references to the topic
later; beyond this it has absolutely no meaning to a topic map processor. All elements defined by the XTM 1.0
DTD have an id attribute, enabling any part of the topic map document to be referenced if need be, but the id
attribute value is only required for the <topic> element.

The name label is contained within a <baseNameString> element, which is in turn contained within a <baseName>
element. As you might expect, this indicates that there are further properties of a base name that we have not yet
discussed.

Similarly for the <occurrence> the reference to the occurrence resource itself is contained in a <resourceRef>
element. The <resourceRef> element is declared in the XTM DTD as an XLink simple link. It is this use of XLink
(and its use in all the other reference constructs in the XTM DTD) which requires the declaration of the XLink
namespace. The second of the occurrences shows the use of the <resourceData> element to provide the occurrence
data "in-line" with the topic map. This makes it somewhat easier to add meta-data to topics.

If you look at the simple topic map shown in the sample code above, you will see that the topic does not really convey
any meaning. There is nothing to indicate that the topic represents a company, nor is there anything to indicate how
the occurrences are related to the topic. What we need is some way to indicate the type of 'thing' represented by the
topic and the type of relationship between the topic and the resources identified or specified by the occurrences.

In topic maps types are defined, like almost everything else, by topics. We create a topic for the topic type or for
the type of relationship between the topic and the occurrence resource and then use the <instanceOf> element with
a nested <topicRef> element to reference the type-specifying topic. So to do this with our simple sample, we need
to create some new topics as shown in the sample topic map below.
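
As a sketch of this typing mechanism, the simple topic map could be extended as shown here. The type topics "company", "home-page" and "description", and the occurrence values, are names assumed for illustration.

```xml
<topicMap>
  <!-- Topics that define types (ids assumed for illustration) -->
  <topic id="company">
    <baseName><baseNameString>Company</baseNameString></baseName>
  </topic>
  <topic id="home-page">
    <baseName><baseNameString>Home page</baseNameString></baseName>
  </topic>
  <topic id="description">
    <baseName><baseNameString>Description</baseNameString></baseName>
  </topic>

  <!-- The company topic, now with a topic type and typed occurrences -->
  <topic id="xyzzy">
    <instanceOf><topicRef xlink:href="#company"/></instanceOf>
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf><topicRef xlink:href="#home-page"/></instanceOf>
      <resourceRef xlink:href="http://www.redmondcomputers.com/"/>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="#description"/></instanceOf>
      <resourceData>A hypothetical short description of the company.</resourceData>
    </occurrence>
  </topic>
</topicMap>
```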

You can also see the principle of typing conceptually in Figure 3, below, where the different topic types are rendered
as different shapes for the topics. In the diagram, the topics defining the types are shown as hollow shapes; the
instances of those types are shown as solid shapes and the class-instance relationship is shown as a dashed line.
However there are no restrictions placed on a topic used to define a type. A topic that is used to define a type can
still be used like any other topic in the topic map and it is perfectly legal for a topic to be used as a type and to
have one or more types of its own. This feature makes it possible to document both the ontology used by the topic map
and the instances of that ontology, all with the same basic topic map mechanics.

Figure 3 - A topic map with topic types

Each occurrence can only be of a single type; however, a topic may have multiple types, which are represented using
multiple <instanceOf> elements. The <instanceOf> element asserts a class-instance relationship between the
object containing the <instanceOf> element (be it a topic, occurrence or association) and the topic which is
referenced from the <instanceOf> element.

It is important to understand that an instance-of relationship is not the same as a superclass-subclass relationship.
It is a common mistake to think that mark-up like the fragment shown in Sample 3 represents a class-hierarchy
"'mycorp' is a 'company', a 'company' is an 'organisation', therefore 'mycorp' is an 'organisation'". Unfortunately,
that statement is not what is represented in the fragment. In fact the fragment asserts that "'mycorp' is a 'company'
and 'company' is an 'organisation'." Not only is that English statement grammatically incorrect, the topic map is also
semantically incorrect! What we really need to do is to create a subclass-superclass relationship between organisation
and company. Such an association would make the assertion that "any company is an organisation".
We will see how to do this later.
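
A fragment of the kind discussed here might look like the following sketch. The ids are taken from the quoted statement; the layout is an assumption. It asserts two separate class-instance relationships and nothing more, so a processor has no grounds to conclude that 'mycorp' is an 'organisation'.

```xml
<topicMap>
  <topic id="organisation"/>
  <!-- Asserts that the topic 'company' is an instance of 'organisation' -->
  <topic id="company">
    <instanceOf><topicRef xlink:href="#organisation"/></instanceOf>
  </topic>
  <!-- Asserts that 'mycorp' is an instance of 'company' -->
  <topic id="mycorp">
    <instanceOf><topicRef xlink:href="#company"/></instanceOf>
  </topic>
</topicMap>
```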

The <baseNameString> element contains only PCDATA, but you will recall that non-string resources such as
graphics can be used to label topics. This facility is provided by the <variantName> construct. A single base name
may have any number of variant names associated with it. The structure of a <variantName> element is very similar
to that of an <occurrence> - it may contain either an inline string resource or a reference to an external resource.

Unlike occurrences, variant names are not typed; however, they do have a mechanism for indicating the kind of variant
that they are. A variant may have any number of parameters which indicate the nature of the variant. As with types,
each parameter is defined by a topic, and the topic map paradigm allows an author to use any topic he or she likes as
a parameter. The example below shows our company topic with a variant name which is the normalised sort string for the
topic. This sort of variant name might be used by an application to order a list of companies
(although the base name string itself would be the one used for display purposes).
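
A sketch of such a variant name is given here, assuming a parameter topic with the id "sort-string" and a lower-case, punctuation-free form as the normalised sort string.

```xml
<topicMap>
  <!-- Parameter topic indicating "suitable for sorting" (minimal mark-up) -->
  <topic id="sort-string"/>

  <topic id="xyzzy">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
      <variant>
        <parameters><topicRef xlink:href="#sort-string"/></parameters>
        <!-- In-line variant: an assumed normalised form of the name -->
        <variantName>
          <resourceData>redmond computers inc</resourceData>
        </variantName>
      </variant>
    </baseName>
  </topic>
</topicMap>
```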

Variants can also be nested within each other. A variant nested inside another variant inherits all of the parameters
of its parent. So if we have, for example, icons of different sizes, we can use nesting to group them all together.
This is shown in the example below - the two variants with <resourceRef> elements pointing to icons are identified
with a single parameter indicating their size. They inherit the parameter indicating that they are icons from the
containing <variant> element. It should be noted that the topics with id attribute values of
"icon", "_16x16" and "_32x32" use the minimal mark-up possible for a topic. This is not a recommended approach to
topic creation, but is used here to save space.
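
The nesting described here can be sketched as follows. The icon file locations are hypothetical; only the three parameter topic ids come from the text.

```xml
<topicMap>
  <!-- Parameter topics, using the minimal mark-up possible -->
  <topic id="icon"/>
  <topic id="_16x16"/>
  <topic id="_32x32"/>

  <topic id="xyzzy">
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
      <!-- Outer variant: carries the "icon" parameter only -->
      <variant>
        <parameters><topicRef xlink:href="#icon"/></parameters>
        <!-- Nested variants inherit "icon" and add a size parameter -->
        <variant>
          <parameters><topicRef xlink:href="#_16x16"/></parameters>
          <variantName>
            <resourceRef xlink:href="http://www.redmondcomputers.com/icons/logo16.png"/>
          </variantName>
        </variant>
        <variant>
          <parameters><topicRef xlink:href="#_32x32"/></parameters>
          <variantName>
            <resourceRef xlink:href="http://www.redmondcomputers.com/icons/logo32.png"/>
          </variantName>
        </variant>
      </variant>
    </baseName>
  </topic>
</topicMap>
```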

A scope indicates the context within which a characteristic of a topic (a name, an occurrence or a role played in an
association) may be considered to be true. One common use of scope is to provide localised names for topics. For
example "company" in English could be "société" in French or "Firma" in German. They all describe the same concept and
so should be represented by the same topic. Rather than creating separate topics one should instead create a single
topic to represent the concept of a company and add to it three separate base names, using scope to indicate that a
particular base name should be used for a specific language environment. The example below shows a simple use of scope.

    <topicMap>
      <!-- Some topics for the languages -->
      <topic id="en"/>
      <topic id="fr"/>
      <topic id="de"/>
      <topic id="company">
        <baseName>
          <scope><topicRef xlink:href="#en"/></scope>
          <baseNameString>company</baseNameString>
        </baseName>
        <baseName>
          <scope><topicRef xlink:href="#fr"/></scope>
          <baseNameString>société</baseNameString>
        </baseName>
        <baseName>
          <scope><topicRef xlink:href="#de"/></scope>
          <baseNameString>Firma</baseNameString>
        </baseName>
      </topic>
    </topicMap>

If a characteristic has no scope defined for it, then it is said to be in the unconstrained scope. Any characteristic
defined in the unconstrained scope is always considered to be valid. Beyond this, the XTM specification leaves the
precise effect of scope on processing up to individual applications. In practice, the scope of a characteristic is
usually compared to a collection of topics representing the current context of the user or the application. The results
of this comparison are then used to determine whether or not a given characteristic is valid. When comparing sets of
topics in this way, standard mathematical set operations can be applied. For example a characteristic could be
considered "valid" if the topics in its scope form a subset of the topics that define the user context. Alternatively,
a characteristic could be regarded as "valid" if the intersection of the topics in the scope of the characteristic and the
topics defining the user context is not the empty set. These two possible approaches are shown in Figure 4, below.

So far the topics that we have created have little in the way of machine-processable identity. For example, the topic with id "sort-string" is used to indicate that a variant name is a normalised sort string. But if we were to load this topic map into a topic map application, how would the application know that it is this particular topic that indicates that the variant name is suitable for sorting? What is needed is some commonly agreed identity for the topic that represents the concept of "suitable for sorting".

This problem is not limited to the "structural" topics, which might be used to express processing options to a topic map application, but applies also to all other topics we create. For example, if we have created a topic for the concept of a 'company', ideally we would like to apply some commonly agreed identity for the concept to that topic so that the topic map could be more easily interchanged.

In addition to application-specific use, a topic map processor also takes note of the identity assigned to topics. When the processor determines that two topics are 'about' the same thing, then those topics will be merged. How a processor determines that two topics are 'about' the same thing may be application-specific; however, XTM does define some basic principles, based on the different forms of identity described below.

The subject that a topic represents can be identified either by reference to the resource that the topic represents; or else by reference to a resource that in some way describes the subject in a way that is meaningful to a human being. These resources are known as subject-constituting and subject-indicating resources respectively. In addition to these formal indicators of identity, the topic map paradigm also includes a mapping of the names of topics to an identity. This name-to-identity mapping is defined by a rule called the topic naming constraint.

If we want to make some assertions about a Web page, or some other retrievable resource with a unique address, we can
use the address of the resource as the identifier for the topic we create to represent it. In this case, the identifier
is said to reference a subject-constituting resource. A topic may only reference a single subject-constituting resource
- this makes sense because a topic can only ever be about a single thing. It is considered an error if an attempt is
made to merge two topics with different subject-constituting resources. When two topics have the same
subject-constituting resource, a topic map processor will regard them as being about the same thing and will merge them.

The mark-up for a subject-constituting resource is the <resourceRef> child element inside the <subjectIdentity>
element. The resource pointed to by the XLink href attribute of the <resourceRef> element is the
subject-constituting resource for the topic.
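
A minimal sketch of this mark-up is shown here. The topic id and the name string are assumptions made for illustration.

```xml
<topicMap>
  <!-- This topic represents the web-site itself, not the company -->
  <topic id="rci-web-site">
    <subjectIdentity>
      <!-- The addressed resource IS the subject of the topic -->
      <resourceRef xlink:href="http://www.redmondcomputers.com/"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Redmond Computers Inc. web-site</baseNameString>
    </baseName>
  </topic>
</topicMap>
```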

In the example above, we use the address of the company's web-site in order to make some assertions about the site
itself. If we want to make some assertions about a company itself, we could use the address of the company homepage as
an identifier for the topic. In using the address of the company web-site in this way, we are assuming that a reader
who reads the home-page of the web-site will understand that it is the company we are describing in our topic map. In
this case, the identifier is said to reference a subject-indicating resource. A topic may have any number of
subject-indicating resources because each of these resources describes the thing that the topic represents and is not
the represented thing itself. For the same reason, a topic may have both a subject-constituting resource and one or
more subject-indicating resources. When two topics have one or more subject-indicating resources in common, a topic map
processor will consider them to be about the same subject and will merge them. In addition, if one of the
subject-indicating resources of a topic is the address of another topic, the topic map processor will consider those
topics to be about the same subject and will merge them.

At first, this last constraint may seem a little strange and it is worth describing in a little more detail. Consider
two topics, A and B. Topic A represents the concept of a company. Topic B has the address of topic A as a
subject-indicating resource. This means that topic B is about the subject described by topic A. This situation is shown
in Figure 5, below. As we already know that topic A represents the concept of a company, it must therefore be obvious
that the one subject that A describes best is that concept. Therefore B must also represent the concept of a company
because A is the descriptor for that subject. This means that A and B are 'about' the same subject and should be merged.

Figure 5 - One way in which two topics can represent the same subject

The example below shows the syntax for treating a URI as a subject-indicating resource. The <subjectIndicatorRef>
element is a child of the <subjectIdentity> element, which uses an XLink simple link to point to the resource that
describes the subject of the topic.
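
A sketch of this syntax, assuming the company home page is used as the subject indicator:

```xml
<topicMap>
  <!-- This topic represents the company, which the home page describes -->
  <topic id="xyzzy">
    <subjectIdentity>
      <!-- The addressed resource INDICATES the subject; it is not the subject -->
      <subjectIndicatorRef xlink:href="http://www.redmondcomputers.com/"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Redmond Computers Inc.</baseNameString>
    </baseName>
  </topic>
</topicMap>
```

The only difference from a subject-constituting reference is the child element used inside <subjectIdentity>; the address itself may be identical.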

The topic naming constraint states that, in the words of the XTM specification, "any topics having the same base name in
the same scope implicitly refer to the same subject". This rule essentially makes the label assigned to a topic into a
form of identity for the topic. It is important when creating the labels for topics that the author be aware of this
rule. When creating a new base name, an author should be sure to qualify the name either within the label string itself,
or else to scope it appropriately.

The example below shows how this rule can lead to some unexpected results. The topics with id "rci-sales" and
"abc-sales" are intended to represent the sales departments of Redmond Computers Inc. and ABC software respectively,
but because each has the name "sales" in the unconstrained scope, a topic map processor will assume that those topics
refer to the same subject and will merge them. Obviously, in this case such a merge would be incorrect. In order to
prevent the merge from happening, the author of this topic map must apply more qualified names to these topics.
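
The problem can be sketched with two minimal topics. Only the ids and the name "sales" come from the description; everything else has been left out for brevity.

```xml
<topicMap>
  <!-- Intended to be the sales department of Redmond Computers Inc. -->
  <topic id="rci-sales">
    <baseName><baseNameString>sales</baseNameString></baseName>
  </topic>
  <!-- Intended to be the sales department of ABC Software -->
  <topic id="abc-sales">
    <baseName><baseNameString>sales</baseNameString></baseName>
  </topic>
  <!-- Same base name, same (unconstrained) scope: the two topics will be merged -->
</topicMap>
```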

There are two different ways in which more qualified names can be created. One approach is to keep the same name string
and add a differentiating scope; the other way is to modify the name string to include some differentiating information.
In the example below, we use the company itself as a differentiator. The assumption made here is that any given
company will have only one sales department (for a multinational company, of course, both company and geographic region
might be required for complete differentiation).

To differentiate using scope, we simply add a <scope> element to the <baseName>, containing a <topicRef>
pointing to the topic which represents the appropriate company. To differentiate using a modified name string, we
include the company name in the department name string. If it is intended that the topics representing the departments
should be accessible outside the context of the company, a combination of these two approaches is most appropriate as
the more qualified name string will be useful for display in cases where the two departments might occur in the same
set of search results.
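
A sketch combining both approaches is given here, assuming the company topics have the ids "xyzzy" and "abc". Each department keeps the short name "sales", scoped by its company, and also receives a fully qualified name in the unconstrained scope.

```xml
<topicMap>
  <topic id="xyzzy">
    <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
  </topic>
  <topic id="abc">
    <baseName><baseNameString>ABC Software</baseNameString></baseName>
  </topic>

  <topic id="rci-sales">
    <!-- Short name, valid only in the context of Redmond Computers Inc. -->
    <baseName>
      <scope><topicRef xlink:href="#xyzzy"/></scope>
      <baseNameString>sales</baseNameString>
    </baseName>
    <!-- Qualified name string for display outside that context -->
    <baseName>
      <baseNameString>Redmond Computers Inc. sales</baseNameString>
    </baseName>
  </topic>

  <topic id="abc-sales">
    <baseName>
      <scope><topicRef xlink:href="#abc"/></scope>
      <baseNameString>sales</baseNameString>
    </baseName>
    <baseName>
      <baseNameString>ABC Software sales</baseNameString>
    </baseName>
  </topic>
</topicMap>
```

Because no two names now share both a string and a scope, the topic naming constraint no longer forces a merge.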

When two topics are merged, the result is a single topic representing the aggregate of the information of the two merged topics. In practice this means that the types of the new topic are the union of the types of the two source topics; likewise the names and occurrences of the new topic are the union of the names and occurrences of the two source topics; and finally wherever either of the two topics plays a role in an association, or provides the type for another topic map construct, they will be replaced by the new topic.

Although simple, these merging rules give a great deal of power and flexibility to topic maps, enabling the development of modular systems of topic maps each providing a different view of the same basic concepts; or the development of topic maps by automated processes which can then be further developed manually without the need to edit the automatically generated topic map directly.

Associations represent the relationship between two or more topics. An association consists of two parts. Firstly there is the association itself: this defines the nature of the relationship between all of the associated topics. As with occurrences, associations can be typed by a single type-specifying topic. It is this type that defines the nature of the relationship indicated by the association. Secondly, the association consists of a number of players, each of which is a topic and which plays a role in the association that is in turn described by another topic.

Let us build up an example association showing a relationship between Redmond Computers Inc. and an employee, John Smith. We can start by creating a simple association between the topic representing the company and the topic representing John Smith. This is shown in the following example:
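
A minimal sketch of such an association, assuming the topic ids "xyzzy" and "john-smith":

```xml
<topicMap>
  <topic id="xyzzy">
    <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
  </topic>
  <topic id="john-smith">
    <baseName><baseNameString>John Smith</baseNameString></baseName>
  </topic>

  <!-- An untyped association with two members and no role information -->
  <association>
    <member><topicRef xlink:href="#xyzzy"/></member>
    <member><topicRef xlink:href="#john-smith"/></member>
  </association>
</topicMap>
```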

As with our first topic example, this sample barely conveys any information at all. It states that there is some
relationship between something called "Redmond Computers Inc." and something called "John Smith" but does not say
anything about the nature of the relationship nor about what roles each partner plays in the relationship. Once again,
this information is conveyed by topics. It is the type of the association that defines the nature of the relationship.

The association type is defined using an <instanceOf> element. Each member of the relationship can be given a
specific role using a <roleSpec> element. This is shown in the example below:
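
A sketch of the typed association follows. The ids of the type and role topics ("employs", "employer", "employee") are assumed for illustration.

```xml
<topicMap>
  <topic id="xyzzy">
    <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
  </topic>
  <topic id="john-smith">
    <baseName><baseNameString>John Smith</baseNameString></baseName>
  </topic>

  <!-- Topics defining the association type and the two roles -->
  <topic id="employs">
    <baseName><baseNameString>Employs</baseNameString></baseName>
  </topic>
  <topic id="employer">
    <baseName><baseNameString>Employer</baseNameString></baseName>
  </topic>
  <topic id="employee">
    <baseName><baseNameString>Employee</baseNameString></baseName>
  </topic>

  <association>
    <instanceOf><topicRef xlink:href="#employs"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#employer"/></roleSpec>
      <topicRef xlink:href="#xyzzy"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#employee"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
    </member>
  </association>
</topicMap>
```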

Now we can see far more information. The type of the relationship is labelled as "Employs", and the
contributions made by the topics of "Redmond Computers Inc." and "John Smith" are characterised as "employer" and
"employee" respectively. In fact, the naming of these topics can be misleading - it is easy to assume from this
association that the relationship is a one-way relationship from "Redmond Computers Inc." to "John Smith"; whereas, in
fact, the association simply groups together the topics which play roles in it, without implying any ordered
relationship between them.

This aggregation property of the association is part of what gives the topic map paradigm its extraordinary power. From
the topic of "Redmond Computers Inc." it might be possible to list all of the associations of the type "Employs" in
which it plays the role of "Employer", and so get a company directory listing. Equally by following associations of
type "Employs" from "John Smith" in which that topic plays the role of "Employee", we might get a complete employment
history for this person.

If I wanted to list a number of employees in the same association construct, this is allowed in XTM syntax. I can
either add another <member> element with its child <roleSpec> and <topicRef> elements or I can simply add
another <topicRef> to the existing <member> element for the role of "employee". This latter option is a
syntactic short cut for allowing multiple players of the same role in the same association to be specified.
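
The short cut can be sketched as follows. The second employee, "jane-doe", is a hypothetical topic added only to illustrate the syntax, and the type and role topics are given in minimal form.

```xml
<topicMap>
  <topic id="employs"/>
  <topic id="employer"/>
  <topic id="employee"/>
  <topic id="xyzzy"/>
  <topic id="john-smith"/>
  <topic id="jane-doe"/>

  <association>
    <instanceOf><topicRef xlink:href="#employs"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#employer"/></roleSpec>
      <topicRef xlink:href="#xyzzy"/>
    </member>
    <!-- Two players of the same role listed in a single member element -->
    <member>
      <roleSpec><topicRef xlink:href="#employee"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
      <topicRef xlink:href="#jane-doe"/>
    </member>
  </association>
</topicMap>
```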

One common issue when creating topic maps is how to label associations. Unlike topics, associations do not have any
mark-up for specifying a label for each instance. Instead, many topic map applications will use the label of the topic
that defines the type of the association. Very often the labels for associations in a topic map will be verbs, for
example "Redmond Computers Inc. employs John Smith", but these verbs imply a direction to the association. Many topic
map practitioners give the topic that types the association a label for each role that the association supports and
then scope those labels by the topic that defines the role (the topic referred to from the <roleSpec> element). The
logical conclusion of this approach with a simple binary association such as the one in our sample is to assign three
separate names to the topic which defines the association type. In the unconstrained scope, the association type should
be named with a noun such as "Employment". The use of a noun frees the name from the context of one or other of the
roles in the association. The two other names should be verbs using the roles as the context for the name and the role
types to define the scope of each name.

This is shown in the sample below. In this sample, the label "Employment" is created in the unconstrained scope to be
treated as the default name for the topic; the label "Employs" is to be used in the context of the employer and so is
scoped by the topic "Employer"; similarly the label "Employed By" is scoped by the topic "Employee". An application may
then use the role played by the topic currently in focus in the application as part of the user context when determining
which is the best name to be applied, so in the context of "John Smith" playing the role of "Employee", the application
would select the label "Employed By".
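
A sketch of this naming scheme for the association type topic, assuming the ids "employs", "employer" and "employee":

```xml
<topicMap>
  <topic id="employer">
    <baseName><baseNameString>Employer</baseNameString></baseName>
  </topic>
  <topic id="employee">
    <baseName><baseNameString>Employee</baseNameString></baseName>
  </topic>

  <topic id="employs">
    <!-- Default name: a noun in the unconstrained scope -->
    <baseName>
      <baseNameString>Employment</baseNameString>
    </baseName>
    <!-- Name to use when viewing the association from the employer -->
    <baseName>
      <scope><topicRef xlink:href="#employer"/></scope>
      <baseNameString>Employs</baseNameString>
    </baseName>
    <!-- Name to use when viewing the association from the employee -->
    <baseName>
      <scope><topicRef xlink:href="#employee"/></scope>
      <baseNameString>Employed By</baseNameString>
    </baseName>
  </topic>
</topicMap>
```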

Just as we use scope to express the context within which a name or occurrence of a topic is valid, so we can also use
scope to express the context within which an association is valid.

The <scope> mark-up itself is exactly the same as that used for <baseName> and <occurrence> elements, and the
mark-up appears as an optional child of the <association> element.

As an example, let us suppose that a rumour of merger talks between two companies is to be represented in the topic map.
One way to do this would be to create a distinct association type, but we would then need to create a distinct type for
every rumoured association. An alternative method would be to scope the association by a topic that indicates that the
context for the association is that of "rumour". This is shown below:
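
A sketch of a scoped association is given here. The association type "merger-talks", the role "party" and the choice of the two companies are assumptions made purely for illustration.

```xml
<topicMap>
  <topic id="xyzzy">
    <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
  </topic>
  <topic id="abc">
    <baseName><baseNameString>ABC Software</baseNameString></baseName>
  </topic>
  <topic id="merger-talks"/>
  <topic id="party"/>
  <!-- The topic representing the context "rumour" -->
  <topic id="rumour"/>

  <association>
    <instanceOf><topicRef xlink:href="#merger-talks"/></instanceOf>
    <!-- The association is asserted only in the context of "rumour" -->
    <scope><topicRef xlink:href="#rumour"/></scope>
    <member>
      <roleSpec><topicRef xlink:href="#party"/></roleSpec>
      <topicRef xlink:href="#xyzzy"/>
      <topicRef xlink:href="#abc"/>
    </member>
  </association>
</topicMap>
```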

Of all the constructs in a topic map, only the topic is allowed to have names and occurrences and to play roles in
associations. In other words, one can only make assertions about a subject which is represented by a topic. Those
assertions themselves are not topics and so we cannot make assertions about assertions. Reification is the process by
which a topic may be constructed to represent the assertion made by some other construct in the topic map. This process
enables a name to be given to a particular occurrence of a topic, or documentation of an association to be "attached"
to the association itself.

The mechanics of reification are quite simple. To create a topic that reifies another construct in the topic map,
simply create a topic with a subject-indicating resource locator which points to the construct in question. For example,
consider a partnership between two companies. Such a relationship may be publicly announced in the form of a press
release, or may come to light through analysts' reports in the trade press. By reifying the association, the information
resources that gave rise to the creation of the association can be documented, allowing users of the topic map to get more
information about the partnership. The mark-up for this reification is shown in the example below and a conceptual overview of the
reification is shown in Figure 6.

    <topicMap>
      <topic id="xyzzy">
        <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
      </topic>
      <topic id="abc">
        <baseName><baseNameString>ABC Software</baseNameString></baseName>
      </topic>
      <topic id="partnership"/>
      <topic id="partner"/>
      <association id="rci-abc-partners">
        <instanceOf><topicRef xlink:href="#partnership"/></instanceOf>
        <member>
          <roleSpec><topicRef xlink:href="#partner"/></roleSpec>
          <topicRef xlink:href="#xyzzy"/>
          <topicRef xlink:href="#abc"/>
        </member>
      </association>
      <!-- This topic "reifies" the partnership association -->
      <!-- This enables us to attach the press release announcing the partnership to the association itself -->
      <topic id="foo">
        <subjectIdentity>
          <subjectIndicatorRef xlink:href="#rci-abc-partners"/>
        </subjectIdentity>
        <occurrence>
          <instanceOf><topicRef xlink:href="#press-release"/></instanceOf>
          <resourceRef xlink:href="http://www.redmondcomputers.com/pressrel/01042002_001.html"/>
        </occurrence>
      </topic>
    </topicMap>

Merging is a cornerstone of the topic map paradigm. The merge process enables distributed and modular creation of topic
map "knowledge-bases". There are three ways in which two topic maps can be merged.

Under explicit application control. A topic map application may provide the user with the ability to selectively merge topic maps. There is no restriction in the XTM specification to prevent an application from merging topic maps at runtime as needed or as directed by the application user.

Processing a <mergeMap> element. The <mergeMap> element allows an author to explicitly request the merge of
another topic map with the map that he or she creates. The mark-up for the <mergeMap> element also allows the
author to define a set of topics to be added to the scope of every characteristic in the external map. This
additional, externally defined, scope can be useful for preventing unwanted topic merges from occurring, or else to
indicate the source of a characteristic in the final merged map.

Processing a <topicRef> element. A <topicRef> element is not limited to referring only to topics contained
within the same topic map document. A reference can be made to a topic in another topic map document. If a reference
is made to a topic in an external topic map, then a topic map processor is required to retrieve the entire topic map
containing that topic and to merge it with the topic map containing the reference.
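
The two mark-up driven forms of merging can be sketched as follows. The example.org addresses and the topic id "external-source" are hypothetical and stand in for real topic map documents.

```xml
<topicMap>
  <!-- Topic added to the scope of every characteristic in the merged map -->
  <topic id="external-source"/>

  <!-- Explicit request to merge another topic map document -->
  <mergeMap xlink:href="http://www.example.org/maps/companies.xtm">
    <topicRef xlink:href="#external-source"/>
  </mergeMap>

  <!-- A reference to a topic in an external topic map: the processor
       must retrieve that map and merge it with this one -->
  <topic id="xyzzy">
    <instanceOf>
      <topicRef xlink:href="http://www.example.org/maps/ontology.xtm#company"/>
    </instanceOf>
    <baseName><baseNameString>Redmond Computers Inc.</baseNameString></baseName>
  </topic>
</topicMap>
```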

This article has described the basic principles of topic maps and introduced the XTM 1.0 interchange syntax for topic map information. We have seen how topic maps are constructed from the basic elements of topics and associations and how more advanced features such as scope, identity and reification can be applied to make detailed, context-sensitive information available from the topic map.