Proceedings of Extreme Markup Languages®

Browser bookmark management with Topic Maps

Abstract

Making effective use of large collections of browser bookmarks is
difficult. The user faces major challenges in finding specific entries,
in finding specific or general kinds of entries, and in finding related
references. In addition, the ability to add annotations would be very
valuable.

This paper discusses a practical model for a bookmark collection
that has been organized into nested folders. It is shown convincingly
that the folder structure in no way implies a hierarchical taxonomy, nor
does it reflect a faceted classification scheme. The model is presented
as a topic map.

A number of simple enhancements to the basic information are
described, including a very modest amount of semantic analysis on the
bookmark titles. An approach for preserving user-entered annotations
across bookmark updates is delineated. Some issues of user interface are
discussed. In toto, the model, the computed enrichment, and the user
interface work together to provide effective collocation
and navigation capabilities.

A bookmark application that embodies this model has been
implemented entirely within a standard browser. The topic map engine is
written entirely in JavaScript. The utility of this application, which
the author uses daily, is remarkable considering the simplicity of the
underlying model. It is planned to give a live demonstration during the
presentation.

Thomas B. Passin

Thomas Passin has been working with XML-related technologies
since 1998. He helped to create the XML version of the message set in
SAE J2354 Advanced Traveler Information Systems, currently in
balloting, and has created a number of demonstration applications that
use XML, XSLT, and Python technologies together. He also consults at
work about XML and XSLT matters, and is active on a number of related
discussion lists.

His interest in Topic Maps developed naturally from past
experience with data modeling. He is currently finishing a manuscript
about the Semantic Web.

Mr. Passin studied physics at the Massachusetts Institute of
Technology and the University of Chicago.

Extreme Markup Languages 2003® (Montréal, Québec)

Introduction

Browser bookmark collections pose both an opportunity and a
challenge to knowledge management technology. Browser bookmarks often
play the role of a major database of reference information. Everyone has
them, and there is a large amount of semantic content in the arrangement
and names of folders, and the titles of the actual bookmarked resources.

The tools available for navigating and viewing bookmark
collections are not able to make much use of this information, and even
the best that the author has tested have serious weaknesses. Thus, users
turn to Google or another web search site. These sites often do the job,
but the user has to repeat the process of winnowing out the undesired
hits.

After reviewing problems and issues with current bookmark
managers, this paper presents an analysis of the nature of bookmark
collections — assumed to contain nested sets of folders into which the
bookmarks are organized — and describes a Topic Map-based approach that
significantly increases the usefulness of a bookmark collection. Some
user interface issues are discussed. Then an example browser-based
implementation is briefly described.

Why bookmarks are hard

In this paper we are primarily considering relatively large
collections of bookmarks. The author’s own collection currently
has 1855 bookmarked pages organized into a structure that has 723
folders. In fact, these bookmarks are collected by three different
browsers. As we will see, this fact makes the collection even harder
to keep organized.

Forgetting

It is impossible to remember where every interesting bookmark
is located, and impossible to recall everything that has been
stored. It is also hard to recall why sets of folders were organized
the way they were. This makes it difficult to consistently file new
bookmarks and to know where to look for old ones.

Organizing

Often it is plain hard to know how to classify a particular
web page. With the rationale for the folder structure half-forgotten,
and in the rush of the moment, a bookmark may get filed in strange
and (later) unexpected places. New subtrees of folders may begin to
grow. Naming conventions drift. For example, the author has noticed
that he has been creating more plural folder names recently.

When multiple browsers are involved, it is even harder to keep
a consistent organization of the folders.

Finding

In some browsers, and most bookmark managers, it is possible
to search the bookmark collection by title. This is quite useful if
one has some memory of the title, but this is often not the case.

Beyond finding a specific title, one often wants to find
related bookmarks. If similar bookmarks are scattered among distinct
folder subtrees, a simple search will not find them.

Serendipity is valuable and exciting. It should be possible to
find references that are related but unexpected.

Merging bookmark sets

It should be possible to combine bookmark sets from several
browsers. The difficulty in merging is that the various browsers
will probably not have the same organization.

Annotating

The ability to annotate a bookmark in various ways would be
useful. Some bookmark managers do allow a description to be attached
to a bookmark, and some will let the user create one or more
searchable keywords. But often one wants to make a number of
different kinds of notes, a capability apparently not supported. A
good knowledge management system should support this desire.

In addition, the annotations should not get deleted when the
collection is refreshed or changed. Some bookmark managers are able
to store a global set of bookmarks separate from the browser, which
would reduce the update problem, but it would probably be better to
be able to work with the bookmarks maintained by each browser.

Desirable features of a bookmark manager

The key goals for a bookmark manager are the same as for any
other library-like collection of information — collocation
of related information, and effective navigation
of the collection (see [Svenonius 2000] for a modern
review of these objectives). Merging and annotation capabilities go
beyond the classical objectives of library science but are very
desirable.

The problem for the design of a bookmark manager lies in
providing these capabilities given the extremely uncontrolled and
variable nature of the organization of bookmark collections.

Bookmark issues

In this section we look at the main issues with bookmark
collections that make them hard to work with.

What do the folders mean?

In a typical display, bookmarks are laid out in a tree-like
display with folders inside of other folders, much like a display of a
file system. Some bookmark managers dispense with folders altogether,
relying instead on title searches and category labels or keywords.
This works for relatively small collections, but as the collections
increase in size, management of the growing collection of categories
generally becomes a major problem. If they exist in a flat space,
there are too many of them and they are not structured effectively. If
they are allowed to be structured hierarchically, they amount to the
same thing as a set of nested folders.

Do the folders form a hierarchy?

Obviously the folder names have some relationship to the
user’s notions of classification. It is tempting to imagine that
the folders form a hierarchy of concepts, like a taxonomy. This is
not the case, though. They are not a hierarchy, and sub folders do
not depict progressively more specialized versions of their
“parents”.

To show this, consider a fragment of an actual collection,
taken from the author’s collection. Under one top-level folder,
we have this structure:

Now, obviously, Bakeries is not a kind of Food, nor is
Sourdough a kind of Baking. On the other hand, Bread could be
considered a subclass of Food, and likewise Knives could be a
subclass of Tools.

Clearly, this fragment is in no way a taxonomy, structured as
class-subclass-sub-subclass... Next we will see that it is not even a
proper hierarchy.

Leaving aside the obvious point that the same bookmark could
appear under different folders, it is more significant that the
order is often arbitrary and could easily have been reversed. For
example, we could easily have had this listing instead:

Food
Bakeries
Bread
Sourdough
Baking

If the order could be altered and still make sense, it is
impossible that the folders represent a true hierarchy. As we have
seen, though, a particular sub folder could in fact be a subclass of
its “parent”. But this is not true in general, and not even
necessarily within any one branch.

The KWIC (KeyWord In Context) technique sometimes permutes the
order of keywords in compound terms. It would seem that bookmark
folders can tolerate some degree of permutation, suggesting that they
are more akin to compound indexing terms.

Do the subfolders represent facets of their parent folders?

Faceted classification schemes present a variety of
subproperties that may be applied to an entity. A complete
classification includes all facets that apply to the thing of
interest, but their order is not generally significant.

Since we have seen that folder order is not always
significant, could the subfolders represent facets? Unfortunately,
this is not true in general either. Facets are supposed to represent
orthogonal and exhaustive collections of applicable subproperties [Tzitzikas 2003]. Obviously, the subfolders shown above
do not represent orthogonal concepts, let alone exhaustive sets.

Indeed, in most cases the subfolders do not represent
properties or subproperties at all. Many times they represent
different perspectives on the “parent” folder. Thus, Baking
is more a perspective on Food than a facet of it. At the same time,
some subfolders may well actually be facets.

To sum up, typical collections of bookmark folders are not
true hierarchies, are not taxonomies, and are not faceted
classification schemes. They are instead a somewhat incoherent
combination of all of these together with other, probably unnamed
schemes. This is just what one would expect from a person, untrained
in library classification and without a controlled vocabulary.

In Section 3, we will see how to model real folder
collections so as to get the most mileage from them.

Semantic content of bookmark collections

There is a great deal of semantic information in the titles of
bookmarked resources. Although its quality is variable and its terms
are uncontrolled, most titles had some meaning to the page author and
also have some meaning to the user who created the bookmark. Because
of its uncontrolled and variable nature, and because they are short,
it is hard to do much useful computerized analysis of individual
titles, especially with relatively simple software.

Users, though, are skilled at extracting useful information from
titles. Title searches and browsing techniques make use of this human
strength.

In a similar way, the titles of folders made sense to the person
who created them, in the context of her thoughts and goals at the
time. Thus, searches and browsing of folder titles are likely to be
useful as well. Whether much useful semantic analysis can be done on
the titles is open to question, again because they are short and
inconsistent.

The structure of the folders somehow reflects the user’s
notions of classification. In some way, a subfolder must have seemed
at one time to have some meaningful relation to its containing folder.
The structures can be analyzed, if suitable principles can be found
for doing so.

Change and stability in Folderland

As noted earlier, both the organization and the naming styles of
bookmark folders are prone to drift over time. This argues against
trying to derive a fixed ontology ahead of time to model the
collection. Instead, the design must be created anew from time to
time, so that it can adapt to the changing structure of the
collection.

Over time, it is likely that more and more duplicates of certain
bookmarks will be filed in different folders. This can happen because
the user forgets that a particular page was already captured, because
it is being viewed in a different browser, or because the user just
wishes to classify the same page differently because the focus of her
interests has changed.

If the same bookmark is filed in different locations, it is
likely that they have something in common. For example, a page might
eventually get filed under “RDF”,
and also under “Ontology”, and perhaps under “Knowledge
Management” as well. What do these three have in common? Well,
obviously, many things, but it might be hard to articulate the
commonality that caused the specific page to be filed.

In a library setting, the librarian would spend the time
necessary to arrive at a convincing classification, using a controlled
vocabulary. But in the browsing setting, the user will file the page
in a matter of seconds.

Because the different filing locations most likely have
something in common, to satisfy the goal of effective collocation, it
is desirable to retrieve all of them when a particular bookmark
is found.
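
As a minimal sketch of this kind of retrieval, assume that each filed bookmark is available as a record with a url and a path field. These field names, the function name indexByUrl, and the sample data are illustrative assumptions, not details of the implementation described later. An index from each URL to all of its filing locations can then be built in one pass:

```javascript
// Hypothetical sample records: one entry per filed bookmark.
const filings = [
  { url: "http://example.org/page", path: "RDF" },
  { url: "http://example.org/page", path: "Ontology" },
  { url: "http://example.org/page", path: "Knowledge Management" },
  { url: "http://example.org/other", path: "RDF" },
];

// Build a Map from URL to the list of folder paths where it is filed.
function indexByUrl(records) {
  const index = new Map();
  for (const { url, path } of records) {
    if (!index.has(url)) index.set(url, []);
    const paths = index.get(url);
    if (!paths.includes(path)) paths.push(path); // ignore exact repeats
  }
  return index;
}

const filingIndex = indexByUrl(filings);
console.log(filingIndex.get("http://example.org/page"));
```

Finding the page in any one of its folders then gives immediate access to the others.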

Of course, some bookmarks get deleted from the collection as
well. Dead bookmarks may or may not get purged.

Updates and annotations

The principal problems relating to personal annotations in a
bookmark collection are first, to be able to preserve them when the
bookmark collection changes or gets restructured, and second, to be
able to save them if the user decides to change her software.

For the first, it seems best to attach annotations to each
bookmark as defined by its URL.
That way, even if the structure of the collection changes, the
annotations will stay with the URL, which is generally what is
intended.
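
The following sketch illustrates the idea under stated assumptions: annotations are held in a store of their own, keyed by URL, and are re-attached whenever the browser's bookmarks are imported again. The names annotate and refresh, the record fields, and the sample data are hypothetical.

```javascript
// Annotations live in their own store, keyed by URL (an assumed design).
const annotations = new Map();

function annotate(url, note) {
  if (!annotations.has(url)) annotations.set(url, []);
  annotations.get(url).push(note);
}

// Re-import the browser's bookmarks. The annotation store is untouched,
// so notes are re-attached to whatever URLs are still present.
function refresh(importedBookmarks) {
  return importedBookmarks.map((b) => ({
    ...b,
    notes: annotations.get(b.url) || [],
  }));
}

annotate("http://example.org/page", "Good overview of the topic.");

// The bookmark has moved to a different folder since the note was made.
const refreshed = refresh([
  { url: "http://example.org/page", path: "Ontology/Articles" },
  { url: "http://example.org/new", path: "RDF" },
]);
console.log(refreshed[0].notes);
```

Because the key is the URL rather than the folder path, restructuring the folders does not disturb the notes.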

For the second, it should be possible to save the bookmark
collection with its annotations in some relatively standard format,
either an XML format, or
following some standard such as Topic Maps or RDF.

The model

In this section, we arrive at a suitable model for representing
the folder structures. This model underlies the implementation covered
in Section 6.

Folder structure as a subject language

We have seen how a set of bookmark folders is likely to be
neither a taxonomy nor a set of classification facets, and how it
tends to be inconsistent. In fact, uncontrolled bookmark structures
are normally very informal. Any given subfolders may represent a
subclass, a facet, a perspective, a “See also” relationship,
or some other relation to the parent folder that may not even be easy
to articulate. For example, in the author’s own collection, there
are many folders whose names begin with “And” — “And
Java”, “And Web Services”, “And Python”, and the
like. These “And” folders are an attempt to indicate a
relationship of equal status between the two concepts.

A further complication is that several folders may have the same
name. For example, “Articles” may appear in folders in several
unrelated branches. So the folder name by itself may not be enough to
identify the right concept or filing place; we need somehow to
represent the context in which it appears.

In the study of the organization of information as applied in
the library sciences, there is a concept called “Subject
Language” [Svenonius 2000]. A subject language
is a vocabulary for describing the subject of a work in such a way
that it can be recognized and thereby found — that is, to provide
navigation capability for a collection. The Dewey Decimal System is a
familiar subject language, but there are a vast number of others.

The insight here is that the folder collection represents a kind
of subject language. It is informal to be sure, but a subject language
nonetheless. Now, there are many types of subject languages, some of
which are hierarchical and some not, some ordered and some not. The
terms of the language may be atomic or compound. In fact, a faceted
classification scheme can be seen as a kind of subject language.

Thus, we seek a subject language design in which there are
compound terms, and in which order is not of great importance, but not
completely ignorable either. The next section shows a practical way to
accomplish this.

The approach to a subject language

In XML, nested elements are usually represented as some kind of
a tree, much like nested folders in a file system or in a bookmark
collection. But there is a common alternative way to express the same
structure. That is with path expressions, as for instance XPath
expressions. A set of three nested folders can be represented by the
path expression A/B/C, instead of the tree-like view:

A
  B
    C

Now a compound subject language term might be written like this:

Food::Baking::Bread

This form matches a path expression exactly, except for the
choice of separator, which is arbitrary anyway. With this insight, we
can directly interpret the folder structure, written as path
expressions, as a set of compound subject language terms. As we will
see below, we can decompose the compound terms into simpler ones, and
into atomic terms as well.
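
The conversion from nested folders to path expressions can be sketched as follows. The folder objects (with name and children fields), the function name toPaths, and the sample nesting are assumptions made for illustration only.

```javascript
// Hypothetical nested-folder structure: each folder has a name and subfolders.
const root = {
  name: "Food",
  children: [
    { name: "Baking", children: [{ name: "Bread", children: [] }] },
    { name: "Bakeries", children: [] },
  ],
};

// Walk the tree depth-first and emit one path expression per folder.
function toPaths(folder, prefix = "") {
  const path = prefix ? prefix + "/" + folder.name : folder.name;
  const paths = [path];
  for (const child of folder.children) {
    paths.push(...toPaths(child, path));
  }
  return paths;
}

console.log(toPaths(root));
```

Every folder, not just every leaf, yields a path expression, since any folder may hold bookmarks of its own.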

By using path expressions, the terms automatically carry their
context with them. Atomic terms that are obtained from the
decomposition of the compound terms no longer have their identifying
context, but if some other atomic term is found to have the same name,
there is a reasonable chance that the two have something in common.
This will help us to find related resources in places where we have
not thought to look.

Consider this fragment — Software/Language/Java.
We see again that it is not a taxonomy, for Language is not really a
kind of software. Language/Software might have been better, since we
speak of “Software Languages” (although “Programming
Languages” is heard more often). “Language”, as used here,
is more of a perspective — or perhaps a facet — of Software. Java, on
the other hand, is a kind of software language.

It is hopeless to expect to deduce such relationships by
analyzing the particular bookmark collection. Very likely the person
who created it was not very clear or consistent in her notion of why
she created the folders this way in the first place. But we can
automatically create and decompose the subject language terms without
regard to their semantics. Then we can look to see if there is any
semantic enhancement that would be practical and useful. This is the
path taken here.

Terms

The example fragment, Software/Language/Java,
is thus to be considered a compound indexing term in an unknown
subject language. It can be decomposed into one compound and one
atomic term, that is to say, into Software/Language
and Java. We call the first
term the head term, and the second
the tail. Figure 1 depicts this
decomposition, which is the key to the analysis of the bookmark
collection.

Figure 1: Model of bookmark indexing terms

Basic model of a bookmark collection, showing compound
indexing terms decomposed into “head” and “tail”
terms. Note that a bookmark is indexed by the compound term, rather than by the
“tail” term, even though the folder tree display appears
to show the bookmark under the “tail” term.

In the tree-like view, actual bookmarks seem to be filed under
the folder “Java”. However, as we saw above, the context is
critically important, and the context is captured by the full path
expression. Therefore, as shown in Figure 1, the actual bookmarks
(that is, the URL resources) are associated with the compound term, which in this case is
Software/Language/Java. For labeling purposes, we may want to display
the label of the tail term, which we can get through the “compounded from” association.

Figure 1 is drawn in the style of a topic map [ISO Topic Maps 2002]. In the topic map pattern, relationships, called
associations, are non-directional.
Each arc is described by the role
it plays in the association. Topics
are computer structures that represent concepts, and the type of a
role is a topic because (of course) it is a concept in its own right.
Figure 1 depicts a second kind of association as well, the association
labeled “describes”, which connects
the actual bookmarked pages to their (compound) indexing term. The
figure illustrates several resources associated to the indexing term
via the single association, but alternatively there could be several
“describes” associations, each with
just one associated resource. Of course, each bookmarked resource also
gets its own topic.

The head term, which in this example is still a compound term,
also gets decomposed into its own head and tail terms. We continue
to decompose the compound terms until they have all been fully
decomposed. Of course, each intermediate compound term may have its
own bookmarked resources associated with it.

At each step of this process, each compound indexing term gives
rise to one association and three topics — one for the original term
and one each for the head and tail terms. We note in passing that
there might be duplication of atomic folder names, although those tail
terms are not equivalent because they arise in different contexts.

This analysis into paths, compound indexing terms, and then into
atomic terms, is the main step in analyzing the bookmark collection.
It is technically easy to do, and ought to scale approximately as O(n)
(except for any issues of indexing the new topics after they have been
created).
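
The decomposition can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's actual code: the names splitTerm and decompose are hypothetical, topics are represented simply by their label strings, and the association is reduced to a plain record naming the compound, head, and tail terms.

```javascript
// Split a compound term into its head (everything before the last
// separator) and its tail (the last step). Atomic terms return null.
function splitTerm(term) {
  const i = term.lastIndexOf("/");
  if (i < 0) return null;
  return { head: term.slice(0, i), tail: term.slice(i + 1) };
}

// Decompose a compound term fully, recording topics and associations.
function decompose(term, topics = new Set(), associations = []) {
  topics.add(term);
  const parts = splitTerm(term);
  if (parts) {
    // Simplification: tail topics are keyed by name only, so same-named
    // tails from different contexts share one entry in this sketch.
    topics.add(parts.tail);
    associations.push({ compound: term, head: parts.head, tail: parts.tail });
    decompose(parts.head, topics, associations);
  }
  return { topics, associations };
}

const result = decompose("Software/Language/Java");
console.log([...result.topics]);
console.log(result.associations);
```

Each path of depth d produces d - 1 associations, so the work grows linearly with the total length of the paths, provided that lookups in the topic set take roughly constant time.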

This construction procedure is entirely mechanical, yet it does
capture some of the semantics of the collection. That is because the
user created the structure based on her personal concepts and
connotations. To capture the structure is to capture some part of its
semantics. It is as if the structure and connections communicate to us
in ghostly ways. If any analysis of the actual semantics of the labels
should be possible, it would only enhance a structure that is already
surprisingly rich. We will see in Section 4 that there is a simple
semantic enhancement that can sometimes be made.

When a topic is created for a bookmarked resource, a check is
made to see if a topic already exists with that URI. If there is one,
it is used and a new one is not created. It is intended that no
resource topic be duplicated. Since each bookmarked resource becomes a
topic, naturally it is easy to attach any kind of annotation or meta
data to it.

Enrichment

In principle it should be possible to perform some analysis of
folder labels, but with a small sample and little supporting text it
would be hard to accomplish much automatically. Although it would be
possible to arrange for the user to add semantic information (since the
creator of the collection presumably would understand it better than a
computer), in this work we take a different approach.

Those “And” terms

Earlier it was mentioned that the author’s collection
includes a number of folders whose name starts with “And”,
such as Python/And Java. Although
this may be idiosyncratic to the author, it turns out that a very
useful bit of semantic analysis can be done on these folders. The
analysis is very simple; in fact it might be called
“simple-minded”, but it has turned out to be quite effective.

The analysis consists in nothing more than splitting the name
and creating a corresponding association. Figure 2 depicts this
process.

Figure 2: Modeling folders whose names begin with
“And”.

A compound term whose “tail” term starts with the
word “And” is treated specially. The tail term is
decomposed by splitting the string, and a new term is created, if
one does not exist already. The type of relationship is called a
“Co-mention” association, because the terms
“Co-mention” each other.

The tail term, which is the term with “And” in it, is
regarded as compound, and the label is split to extract the part
following the “And”. A topic is created with this label, if
one does not already exist, otherwise the existing one is used. A
co-mention association is created
to relate these terms back to the parent compound term.

For this scheme to be useful, the new terms have to match
existing atomic terms. This is often the case, at least in the
author’s collection. When this happy situation exists, the
co-mention relationships allow for finding related subjects in
unexpected places. This will be illustrated in Section 6 where a
working implementation is discussed.
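
A sketch of the splitting step follows. Exactly which topics play which roles in the association of Figure 2 cannot be recovered from the text alone, so the record returned here, which links the full compound term, its head, and the extracted term, is an assumption, as is the function name coMention.

```javascript
// If the tail of a compound term starts with the word "And", return a
// co-mention record for it; otherwise return null.
function coMention(term) {
  const i = term.lastIndexOf("/");
  if (i < 0) return null; // not a compound term
  const head = term.slice(0, i);
  const tail = term.slice(i + 1);
  // Require whitespace after "And" so that names such as "Android"
  // are not split by mistake.
  const match = /^And\s+(.+)$/.exec(tail);
  if (!match) return null;
  return { type: "co-mention", compound: term, head: head, mentioned: match[1] };
}

console.log(coMention("Python/And Java"));
console.log(coMention("Python/Tools"));
```

The extracted label (here, Java) is then looked up among the existing atomic terms, and a new topic is created only if no match is found.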

Equivalent terms

Search capabilities have proved to be useful in the
implementation of this model, as discussed in Section 5 below. One
could add some ability to search for synonyms and other equivalent
kinds of terms, just as for any other search. One would presumably
then want to bring in stemming, and the task starts growing beyond
simple programming techniques.

In this connection it is interesting to ask how to present
search results that came from matches on equivalent terms, since
little semantic analysis has been done on the bookmark collection. For
example, since the words and word senses are not well known for the
collection, should all the results be mixed together, or should they
be segregated somehow? The author has done a little experimenting
about this in the implementation (see Section 6).

Other possibilities

Obviously any number of other enhancements can be devised. The
unanswered question is to what extent could they be useful given the
uncontrolled and variable nature of bookmark collections?

One possibility is to try to control the collection by getting
the user to contribute her knowledge, the knowledge and ways of
viewing the world that led to the structure of the collection in the
first place. How to accomplish this, how useful the results would be,
and whether users could be bothered to cooperate with the software in
this way over time must remain for future work to answer.

Another possibility is to try to map the collection to some
controlled vocabulary, and over time migrate the structure and
navigation to make progressively more use of the controlled
vocabulary. In addition to the question of user cooperation, it is
unclear how the user could be induced to start using the controlled
classifications in a typical browsing environment. When browsing, the
user wants to make decisions and file bookmarks in a matter of
seconds, and any questions or suggestions by the computer might seem
intolerable. At the least, this represents an extremely difficult
problem in usability and user interface design. But perhaps with
sufficient cleverness a satisfactory approach could be devised.

User interfaces

A topic map constructed according to the model described in
Section 3 is able to supply a rich serving of data to the user. How
can it be presented? What kinds of interactions will be useful? Devising
a good user interface for the map can be challenging. The author’s
preliminary experiments with several general purpose topic map viewers
showed clearly that a custom application would be necessary. The general
purpose viewers were simply not able to present and navigate the linked
information effectively enough. What characteristics are likely to be
important?

Navigation

There are two classic ways to navigate bookmark collections. One
is to browse, and the other is to search titles or, if available,
keywords. The existence of our topic map will not change their
usefulness (or lack of it, as the case may be). What the topic map of
the collection does offer is links that are not present in the
original, non-topic-mapped, collection. There are links between terms
and bookmarked pages, just as for any collection, there are links
between compound terms and atomic terms, and there are links between
compound terms of differing degrees of decomposition. There are links
between the “co-mention” terms, and there are indirect links
between folders that are known to share the same bookmarked resource,
simply by virtue of the fact that they share a resource. In this way,
a search of folder titles usually returns a rich set of entries into
the collection.

Tree views

To make good use of the topic map, then, we need to be able to
capitalize on the links. As we learned in Section 3, it is the
compound path that carries the context for a particular act of
filing. We need to show the context, and this suggests that we show
the path in some fashion. This could be done with a pseudo-tree
view. A tree view can show the local context very well, but it can
take up a lot of vertical space in the display. Also, a tree view is
not as good for displaying remote but related contexts.

Path views

Alternatively, the path expressions can be listed. This
provides a compact display in which it is easier to pick up
separated but related contexts. But a list of paths is harder to
make easy to read. Here is an example of a group of path statements:

This list (which has been truncated to save space) is
presented by the sample implementation in response to clicking on
the atomic term “Tools”. If this set of data were presented
in a tree format, it would be fairly incoherent and hard to absorb.
Tree views work well when many leaves are at the end of a few
branches, but here we have many branches, and it is their leaves
that are related. A tree view would not be very effective, but with
the path view, it is possible to scan and notice that, for example,
if I am interested in tools for ontologies, I might want to look at
tools stored under Conceptual Graphs as well.

In the example above, all the paths ended with the term of
interest — “Tools” in this case. This is by no means always
the case. To illustrate, here is another fragment:

The user would probably have forgotten that an Ontology
directory exists under Agents. This listing brings back not only the
fact but also something about the rationale. These fragments merely
hint at the power available using this system, which will emerge
more clearly through the rest of this section and in Section
6.
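
A path listing of this kind can be produced by a simple filter. The sketch below uses hypothetical path expressions (the author's actual listings are not reproduced here) and a hypothetical function name, pathsMentioning. It returns every path in which the chosen atomic term occurs as a complete step, whether at the end of the path or in the middle.

```javascript
// Hypothetical path expressions, for illustration only.
const allPaths = [
  "Ontology/Tools",
  "Conceptual Graphs/Tools",
  "Agents/Ontology",
  "Agents/Ontology/Articles",
  "Food/Baking",
];

// Return every path in which the atomic term appears as a complete step.
function pathsMentioning(term, paths) {
  return paths.filter((p) => p.split("/").includes(term)).sort();
}

console.log(pathsMentioning("Tools", allPaths));
console.log(pathsMentioning("Ontology", allPaths));
```

Matching whole steps rather than substrings keeps a term such as Tools from matching an unrelated folder name that merely contains those letters.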

One might think, since it is argued above that path
expressions are usually more useful than tree views, that a list of
all path expressions in the collection should replace the usual full
tree view. This can be done with the sample implementation, but in
the author’s opinion, it does not work well because it is too
dense and too rich for casual browsing. It seems that a path listing
style view is better suited for limited sets of information about
closely related subjects. Perhaps this conclusion could be changed
by a sufficiently well designed display format.

Graphical views

The importance of the links suggests that some kind of
graphical representation would be useful. In graphical views there
are always three challenges to be met. One is achieving a
satisfactory layout automatically, another is to avoid an overly
cluttered graph, and the third is to have intelligent grouping of
the nodes. Programming such graphical displays is harder than
programming textual displays.

Clearly there is potential for devising a really good
graphical display for the collection’s data. The sample
implementation so far uses only textual displays.

Hyperlinking the compound paths

Obviously we expect to be able to click on a displayed path
and receive some useful information in return. But with a path
display such as the ones illustrated above, how should the
hyperlinks be arranged?

In many applications that show navigation links in the form of
compound paths, each step of the path is a separate hyperlink that
allows a user to return to previous pages and to skip intervening
links. However, this is not a good plan for our case. Here, each
compound path represents a context that may contain filed URLs or
other paths (i.e., folders). It is forward-looking and not backwards
looking in the sense that the links are not there to help the user
return anywhere, but instead to help her proceed.

Thus, it has proved better to let the entire compound path be a
single link that returns all related information about it, including
bookmarked pages and related folders. But this design has one
drawback, because it is sometimes desirable to also allow clicking
on the various steps in a compound path after all. The sample
implementation deals with this conundrum by also listing the next
higher part of each compound term but in a separate section of the
listing. This makes for a longer listing but still seems to be
effective.

Collocation

The examples above also illustrate a degree of collocation, that
is, finding related information in one place or at least nearby. We
saw, for example, potentially related folders. Not shown here but
depicted in Section 6, the stored bookmarks for any expression are
also shown on the same page as related path expressions.

If every related resource cannot be shown in one place at the
same time, the next best thing is that there are easy routes to get
from one set of related information to another. In an on-screen
application, this usually translates to few mouse clicks and minimal
cognitive effort. With other features that will be discussed further
in Section 6, it is usually possible to move to potentially related
information in three or fewer mouse clicks. Of course, one still has to
decide if the bookmarked resources found in this manner are relevant.
This would be done by using the titles of the resources, together with
any annotations the user has made.

Searching

As mentioned at the start of Section 5, searching is likely to
remain important, topic map or no. Can this topic map approach bring
any new aspects to searching, compared with ordinary bookmark
managers? The answer is both “yes” and “no”.

The answer is “no” in the sense that we can mainly
search resource titles and folder titles, just as before. Some ordinary
bookmark managers also offer searching of the URL strings themselves,
and of keywords. The answer is “yes” in the sense that, with the topic
map, we could search all of these and also the compound paths. In
addition, there are the new terms generated from decomposing the
“And” terms (see Section 4-1), and of course, any other enrichment of
the topic map that might be devised.

In the author’s experience, searching the path expressions
per se is not very helpful because it is difficult to remember them
and to spell them right. What does work very well is text search with
partial matching among the titles of the atomic terms (the leaves of
the folder branches, in other words), together with any synthetically
generated atomic terms. The full set of compound paths that are
related to the atomic “hits” can be just one mouse click away.
This design has proven to be highly effective.
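
The search just described can be sketched in a few lines of
javascript. This is only an illustration, not the code of the sample
implementation: it assumes that compound paths are held as
"/"-separated strings and that every step of a path counts as an
atomic term. The names samplePaths, atomicTerms, searchTerms, and
pathsForTerm are hypothetical.

```javascript
// Illustrative data: compound paths as "/"-separated strings.
const samplePaths = [
  "Web/Browsers/Bookmark Managers",
  "Web/Browsers/Bookmarklets",
  "Books/Programming",
  "Software/Programming",
];

// Collect the distinct atomic terms (the individual steps) of all paths.
function atomicTerms(allPaths) {
  const terms = new Set();
  for (const path of allPaths) {
    for (const step of path.split("/")) {
      terms.add(step);
    }
  }
  return [...terms];
}

// Case-insensitive partial match against the atomic term titles.
function searchTerms(allPaths, query) {
  const needle = query.toLowerCase();
  return atomicTerms(allPaths).filter((term) =>
    term.toLowerCase().includes(needle)
  );
}

// The compound paths in which a given atomic term appears as a step.
// These are what would be "one mouse click away" from a hit.
function pathsForTerm(allPaths, term) {
  return allPaths.filter((path) => path.split("/").includes(term));
}
```

Matching only the atomic terms keeps the query short and forgiving,
while pathsForTerm recovers the full contexts for whichever hit the
user selects.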

Of course, searches among the resource titles themselves
continue to be extremely valuable. The titles contain a great deal of
semantic information, and searching is a prime way to access it. So
far the author has not felt the need to search the URL values
themselves. Once in a while it would be useful, but rarely enough that
he has not gone to the effort to write the code.

Browsing

The main method most people use to navigate and search their
bookmarks is the classic browse through the folder tree. The sample
implementation provides an especially convenient way to expand and
contract the folders to make this easier. Nevertheless, the author has
found browsing the folder tree to be the feature he uses the least. It
is good to have, but rarely essential. That is because it is so easy
to find a starting point and to see related paths through the
collection. After getting used to the system, the author feels rather
crippled when he is reduced to mere browsing of a tree view.

With the implementation, browsing is only the first step.
Selecting any one folder in the browse immediately brings up the
entire array of linked information, and one normally does not need to
go back to browsing the tree view.

For example, as an experiment, the author began to browse the
tree view for “Bookmark”, an appropriate term here. But there
is no top level folder with that name. Where to look? Instead of
browsing the folder tree any longer, a search of topic titles
immediately returned “Bookmark Managers” and
“Bookmarklets”. The first of these led through one
intermediate mouse click to “Web/Browsers/Bookmark Managers”,
which had URLs for five bookmark manager products, and a related link
to Web/Browsers as well.

“Co-mentions” and pseudo-facets

Recall that so-called “co-mention” associations are
created when terms starting with “And” are encountered. There
may of course be other idioms that could be easily analyzed. When the
user selects a term, the displayed results include any co-mentions,
which are, of course, hyperlinks to their atomic terms. This provides
another path to related information, one that would not be found by
plain browsing. Here is an example. The selected term is
“Python” (the programming language, as this collection has no
entries as yet for the snake variety). The co-mentions are:

Python/And Web Services
XML/And Python

Notice that the target term, “Python”, may appear in
either the first or the last position. The idea behind co-mentions is
that the two terms are peers, and the use of “And” is a hint
to that effect. As was mentioned earlier, this is the only bit of
actual semantic analysis in the model to date, and it is barely worthy
of the name “semantic”. But this simple relationship turns out
to open up the collection to a surprising degree.
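
A minimal sketch of how such co-mentions might be computed follows.
It assumes that paths are "/"-separated strings and that the peer of
an “And X” step is the step immediately before it; both are
assumptions made for illustration, and the function names coMentions
and coMentionsFor are hypothetical rather than taken from the sample
implementation.

```javascript
// Illustrative paths, including the two from the example above.
const mentionPaths = [
  "Python/And Web Services",
  "XML/And Python",
  "XML/Articles",
];

// For every step named "And X", pair X with the preceding step.
function coMentions(allPaths) {
  const pairs = [];
  for (const path of allPaths) {
    const steps = path.split("/");
    for (let i = 1; i < steps.length; i++) {
      if (steps[i].startsWith("And ")) {
        const peer = steps[i].slice("And ".length);
        pairs.push({ path, terms: [steps[i - 1], peer] });
      }
    }
  }
  return pairs;
}

// The co-mention paths in which a given term takes part, in either
// position.
function coMentionsFor(allPaths, term) {
  return coMentions(allPaths)
    .filter((pair) => pair.terms.includes(term))
    .map((pair) => pair.path);
}
```

Because the two terms are treated as peers, the lookup does not care
which of them was the folder and which followed the “And”.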

During the discussion of the model in Section 3, we inquired
whether child folders represent “facets”. It became clear that
in general they do not, but sometimes they do. Because little semantic
analysis of the collection is possible, there is no way to tell which
child folders are facets and which are not. The same is true
for perspectives and subclasses. There are also common patterns. For
example, many folders in the author’s collection have a subfolder
called “Articles”. Certainly an Article is some kind of
information about the subject of the parent folder, even if it is not
a facet or perspective.

It turns out to be useful to break out the immediate child
folders as if they were facets or
perspectives. In other words, we finesse the fact that we do not have
enough semantic information about these “pseudo-facets” by
ignoring it. For example, under “Python” there is a folder
called “Articles” (path Python/Articles). So we include
“Articles” in a list called, in the sample implementation,
“Perspectives and Facets”.

This is useful because selecting “Articles” leads to all
the other subjects for which we also have articles.
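
One way to picture this is as an index from each tail term to all of
the heads under which it occurs. The sketch below is an illustration
under assumptions: paths are "/"-separated strings, the tail is taken
to be the last step, and the head is everything before it. The name
headsByTail is hypothetical.

```javascript
// Illustrative paths: two subjects share an "Articles" subfolder.
const facetPaths = [
  "Python/Articles",
  "XML/Articles",
  "Python/And Web Services",
  "Python",
];

// Map each tail term (last step) to the heads it appears under.
function headsByTail(allPaths) {
  const index = new Map();
  for (const path of allPaths) {
    const cut = path.lastIndexOf("/");
    if (cut === -1) {
      continue; // a top-level folder has no head
    }
    const head = path.slice(0, cut);
    const tail = path.slice(cut + 1);
    if (!index.has(tail)) {
      index.set(tail, []);
    }
    index.get(tail).push(head);
  }
  return index;
}
```

Selecting “Articles” then amounts to a single lookup in this index,
with no need to know whether “Articles” is really a facet.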

Screen real estate

Showing all this information on a single computer screen is of
course difficult. Choices include limiting the amount in any one view,
making it smaller, and opening other windows. None of these options
is desirable, but so far one or more of them must be invoked. In the
author’s opinion, there is a great need for creativity in this
area.

In the sample implementation, information is presented in three
side-by-side panels, and the actual bookmarked resources are opened in
a separate window. The panels are side-by-side to minimize the amount
of vertical scrolling needed when the returned listing is long. It
succeeds because the entries are typically relatively short, so that
horizontal scrolling or excessive wrapping is rarely needed. Long path
expressions fare less well with this design. The approach works fairly
well, at least for the author, but more creative and sophisticated
designs are called for. Of course, this applies to any rich
information-presenting system, not just the topic map bookmark
manager.

Implementation

Up to now, a “sample implementation” has been mentioned
many times. This application was developed to test and refine the model,
to prototype user interface features, and to eventually provide a usable
application for the author’s personal use. To make development and
changes quick, and to avoid having to create user interface machinery
for the application, the whole system is implemented in javascript as a
stand-alone program in a web browser.

Standards support and cross-platform capabilities

The code makes use of HTML, CSS2, Javascript (ECMAScript), and
DOM. Any standard browser that supports these standards sufficiently
well should work.
In practice, this means Internet Explorer 6 and Mozilla-based
browsers. Since the code runs in Mozilla, the application should work
cross-platform, although this has not been tested yet.

Architecture

The system is highly modularized and consists of a set of
“core” javascript modules that contain the topic map engine
and related utility code. This core has been used in several other
topic map applications without modification. The engine, and a topic
map editing application that uses it, are now, thanks to Alexander
Johannesen, an open source project hosted on Sourceforge under the
name “TM4JScript”.

The engine is designed as a set of classes that attempt to
implement the structures described in the [XTM]
specification as closely as possible. XTM [XML Topic Maps] is an
interchange format, part of the ISO standard for topic maps
[ISO Topic Maps 2002], not a program description, but it is quite
feasible to treat the XML elements as data structures and to
turn them into objects.

This approach does not necessarily lead to an
“efficient” program, but it is very efficient for its intended
purpose. It is not necessary to learn specialized, high-performance
programming structures and to mentally translate them to topic map
constructs. This maximizes the power of the system to support
prototyping and experimentation. Where performance becomes too slow,
the first line of defense is to create indexes rather than
sophisticated data structures.
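
To make the idea concrete, here is a small sketch of objects modeled
loosely on the XTM topic, association, and member elements, with one
index added for speed. The class names, method names, and the
"head-tail" association type are illustrative assumptions; they are
not the TM4JScript API.

```javascript
// Objects loosely modeled on XTM <topic>, <association>, and <member>.
class Topic {
  constructor(id, baseName) {
    this.id = id;
    this.baseNames = [baseName];
    this.occurrences = []; // for example, bookmarked URLs
  }
}

class Association {
  constructor(type, members) {
    this.type = type;
    this.members = members; // array of { role, topicId }
  }
}

class TopicMap {
  constructor() {
    this.topics = new Map(); // topic id -> Topic
    this.associations = [];
    this.byMember = new Map(); // index: topic id -> its associations
  }

  addTopic(topic) {
    this.topics.set(topic.id, topic);
    return topic;
  }

  addAssociation(assoc) {
    this.associations.push(assoc);
    // Keep the index current so lookups avoid scanning every association.
    for (const member of assoc.members) {
      if (!this.byMember.has(member.topicId)) {
        this.byMember.set(member.topicId, []);
      }
      this.byMember.get(member.topicId).push(assoc);
    }
    return assoc;
  }

  associationsOf(topicId) {
    return this.byMember.get(topicId) || [];
  }
}

// Usage: relate the head and tail of the path Books/Programming.
const demoMap = new TopicMap();
demoMap.addTopic(new Topic("books", "Books"));
demoMap.addTopic(new Topic("programming", "Programming"));
demoMap.addAssociation(
  new Association("head-tail", [
    { role: "head", topicId: "books" },
    { role: "tail", topicId: "programming" },
  ])
);
```

The data structures stay plain and close to the specification; only
the byMember index is there for performance.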

Applications write pages dynamically, since there is no server
to do it for them. Typical applications have several frames in an HTML
frameset. Common code resides in the frameset where all the frames can
refer to it. Frames communicate only by requesting the reload of
another panel, passing any required information in query parameters.
In this way, the application could be converted to a server-based one
with very little effort beyond the actual porting of the engine. An
earlier version of the engine was ported to Python with ease, which is
not surprising because of the similarities between Javascript and
Python, and because the code was written as if it were for Python so
far as was possible.
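
The frame-to-frame convention can be sketched as follows. The sketch
uses the standard URLSearchParams interface, which is newer than the
browsers named above, so it is an illustration of the idea rather
than the original code. The page name details.html, the parameter
name term, and the function names panelUrl and readParams are
hypothetical.

```javascript
// Build the URL that one panel would load into another panel.
function panelUrl(page, params) {
  const query = new URLSearchParams(params).toString();
  return query ? `${page}?${query}` : page;
}

// Read the parameters back on the receiving side. In a browser this
// would be called with window.location.search.
function readParams(search) {
  return Object.fromEntries(new URLSearchParams(search));
}

// Usage: ask the details panel to show one indexing term.
const detailsUrl = panelUrl("details.html", { term: "Books/Programming" });
const received = readParams(detailsUrl.split("?")[1]);
```

Since all state travels in the URL, the same request could just as
well be answered by a server, which is the point made above about
porting.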

Persisting and interchanging Topic Maps

The native format of a topic map file is a generated set of
javascript instructions that cause the map to be constructed. There is
utility code to write the native format and also to write standard XTM
format topic map files. Because standard browsers cannot write to the
file system in a portable way, it is necessary to view the source of
the generated output and then save it from there.
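
The idea of a file of generated instructions can be illustrated as
follows. This is a sketch under assumptions, not the actual native
format: the engine object is called tm, it is given a single
addTopic(id, name) method, and a small recorder stands in for the
real engine.

```javascript
// Emit one javascript statement per topic. Loading the resulting
// text as a script would rebuild the map.
function toNativeFormat(topics) {
  return topics
    .map(
      (t) => `tm.addTopic(${JSON.stringify(t.id)}, ${JSON.stringify(t.name)});`
    )
    .join("\n");
}

// A stand-in for the engine that simply records the calls it receives.
function makeRecorder() {
  const added = [];
  return {
    added,
    addTopic(id, name) {
      added.push({ id, name });
    },
  };
}

// Execute the generated source against an engine object. In a browser
// the file would instead be loaded with a script element.
function loadNativeFormat(source, tm) {
  new Function("tm", source)(tm);
  return tm;
}

// Usage: write two topics out and read them back.
const nativeTopics = [
  { id: "t1", name: "Bookmark Managers" },
  { id: "t2", name: 'A "quoted" title' },
];
const nativeSource = toNativeFormat(nativeTopics);
const rebuilt = loadNativeFormat(nativeSource, makeRecorder());
```

Using JSON.stringify for the string literals takes care of quotes and
other special characters in the titles.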

The bookmark application does not need to import XTM files to
date, but a set of XSLT
stylesheets is able to convert an XTM file into the native javascript
format. This capability is used by other applications that use the
core engine.

Creating a bookmark Topic Map

To create the topic map from a set of bookmarks, the XML format
for browser bookmarks called XBEL is used. Python scripts turn the
browser bookmarks into XBEL files. The author uses three different web
browsers (Internet Explorer, Mozilla, and Firebird, which is based on
Mozilla). The XBEL files for these three browsers get merged by an
XSLT stylesheet. Duplicate URLs in the same directory get purged (to
avoid duplications between the browsers). Another XSLT stylesheet
generates the path expressions for each folder and creates the basic
topic map in an intermediate XML format (not XTM). A final XSLT
stylesheet converts the intermediate XML to the native javascript
format.
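
The author's pipeline does this work with XSLT. Purely as an
illustration of two of the steps (purging duplicate URLs within a
folder and generating a path expression for each folder), the
following javascript sketch operates on a simplified stand-in for an
XBEL folder element. The object layout and the name folderPaths are
assumptions made for the example.

```javascript
// A folder is { title, bookmarks: [{ title, url }], folders: [...] },
// a simplified stand-in for an XBEL <folder> element.
function folderPaths(folder, prefix = "") {
  const path = prefix ? `${prefix}/${folder.title}` : folder.title;

  // Drop duplicate URLs within the same folder, keeping the first.
  const seen = new Set();
  const bookmarks = (folder.bookmarks || []).filter((bookmark) => {
    if (seen.has(bookmark.url)) {
      return false;
    }
    seen.add(bookmark.url);
    return true;
  });

  // One entry for this folder, then the entries for its descendants.
  const result = [{ path, bookmarks }];
  for (const child of folder.folders || []) {
    result.push(...folderPaths(child, path));
  }
  return result;
}

// Usage: the same URL was filed twice in Web/Browsers.
const sampleTree = {
  title: "Web",
  bookmarks: [],
  folders: [
    {
      title: "Browsers",
      bookmarks: [
        { title: "Example", url: "http://example.com/" },
        { title: "Example again", url: "http://example.com/" },
      ],
      folders: [],
    },
  ],
};
const sampleEntries = folderPaths(sampleTree);
```

Duplicates are purged only within one folder, so the same URL filed
under two different paths is kept in both contexts.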

The whole process is driven by a batch file, which puts the
javascript topic map file into a standard directory where the browser
can find it. When the frameset for the application loads, the topic
map gets imported and processed. This processing builds the basic map
and then enriches it as discussed in Section 5.

It is interesting to note that the basic topic map can be
constructed with XSLT. That is, topics are created for the compound
terms and for the bookmarks themselves, and they are related by
associations. The javascript application decomposes the compound
terms, creates topics for the head and tail terms along with the
associations that relate them, and enriches the topic map.

Screen captures

In this section we use screen captures to illustrate some of the
workings of the interface. Figure 3 shows a classic tree view. The
display on the right is the result of clicking “Programming”
under “Books” in the tree view. Notice that the tree view
shows only folders and not actual bookmarks. This design feature is
intended to reduce the amount of information in the view. The
bookmarks themselves get listed when a folder is clicked. In this case
there are three bookmarks in the chosen folder.

Figure 3: Screen shot of the implementation

The image shows a tree view on the left and on the right, the
results of selecting the “Programming” folder under
“Books”.

The figure with its annotations is fairly self-explanatory.
Under Related Terms, the link to Books is of course a link to the
“head” term of the path expression. The link to Programming under the
heading Perspectives and Facets is more subtle. It lists all tail
terms in which the indexing term for the page (i.e.,
Books/Programming) plays the head
role. This amounts to a listing of the last step of the path of the
given indexing term. However, the collection is not a tree and there
is not necessarily a one to one relation between “parent” and
“child”. The same indexing term may play a role in many other
paths. Thus, listing these terms, which were called
“pseudo-facets” earlier, provides navigational shortcuts that
span the collection.

Figure 4 illustrates these pseudo-facets. It is the result of
clicking on the “Programming” link.

Figure 4: Screen shot of the implementation

The image shows three indexing terms listed under “By
Context”. Each one is related to the topic that is the focus of
the page.

Three indexing terms (that is, path expressions) are listed
under By Context. They all have Programming, the focal topic of
the display, as part of their paths. This means that there is a
“Programming” folder in the tree-like view under each of
the three topics Books, Logic, and Software.
The user might not have remembered that the other folders also contain
a “Programming” folder, but now she has instant access to
them.

In this example, the focal term always appears on the right hand
side of the paths, but this is not always true, as we see in Figure
5.

Figure 5: Screen shot of the implementation

The image illustrates that the “Related Context”
indexing terms may contain the focal topic at either end.

Notice how the display for Parsers
shows a reversal of order: in two of the paths the term comes last,
while in the other one, it comes first. Once again we see how easily the
application brings potentially related links together, in a manner
impossible for conventional bookmark managers.

The last screen capture, Figure 6, illustrates several more
features of the implementation. First, the left hand panel contains
two “co-mention” links. This kind of link was discussed in
Section 4. It comes from removing the “And” from the name
of a term that starts with it, and then linking to the topic that has
that name — in this case, XML.

Figure 6: Screen shot of the implementation

The image illustrates two “Co-mention” indexing terms,
and on the right hand panel, shows a note added to a bookmarked
page. The controls for editing annotations and adding new ones are
also visible.

Once again, the design promotes additional modes of navigation
through the collection.

The right panel shows details about a particular resource.
Normally this view is displayed by clicking on the details link in
one of the other views. In this case, after the data was displayed
about the bookmarked resource, additional navigation was done in the
left-hand panel. Therefore, the subjects of the two panels no longer
match. In this illustration, a
note has been added to the data for a specific bookmark. The figure
also shows controls to add new annotations, edit existing ones, and to
save the annotations. This display also shows all associations in
which the resource of interest participates. As the topic map is
constructed, the only relationship is the one that links it to its
indexing term. In the future, should any other associations be added
to the resource, they would be listed here.

These static screen shots cannot capture the real power of the
application. A live demonstration is planned for the conference
presentation. The application is responsive in use, although the time
to load a large collection of bookmarks is longer than desirable. For
reference, the javascript-format topic map for the author’s
bookmarks is over 2.5 MB long. It contains nearly 2000 bookmarks and
over 700 folders.

Conclusions

The model delineated in Section 3 seems to work very well. It
is simple and easy to understand once the mental picture is changed from
hierarchies to compound indexing terms. The application is very
effective for the author, who never uses the bookmark capabilities of
his web browsers any more, except to capture new bookmarks. It cannot
replace a good web search engine like Google, of course, but now it is
feasible to search the bookmark collection before resorting to Google.
Previously, it was often better to simply go to the Web even though the
information would be in the bookmark collection.

Even better, the author constantly finds references of interest
that he had lost or forgotten about. It is now rewarding to explore the
collection, whereas in the past that had become impractical because of
its size. So the experiment is a success, even though more development
of the user interface is needed. The original goal of improving
navigation and collocation has been achieved.

Potential improvements

The system certainly can be improved. The user interface is very
busy because there is so much crosslinked information, and clever
ideas in this area would be very welcome. It would probably be useful
to devise a keyword scheme to give fast access to web pages that are
often used. The usual problem with keyword schemes lies in their
management, once there get to be many of them. This, of course, is
another user interface problem, not a technical one. Naturally, the
keywords would become topics in the topic map.

No doubt there are many ways to enrich the information in the
topic map. It is remarkable how useful the system is given how little
analysis is done. However, the variability and even instability over
time that occurs in practical bookmark collections makes it hard to
arrive at good approaches. In one experiment, the author devised a
subschema to designate equivalent terms. Just one pair of such terms
is built in at present, the pair Article
and Paper. A search for
“Article” will also match “Paper”, and vice-versa.
This is useful, but it is unclear how far it would be practical to
extend the set of word pairs.
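
A sketch of how such equivalent terms might be applied to a search
is given below. Only the Article/Paper pair comes from the text; the
table layout and the names equivalents, expandQuery, and searchTitles
are assumptions made for illustration.

```javascript
// Groups of equivalent terms. Only one pair is built in, as described.
const equivalents = [["Article", "Paper"]];

// Expand a search word to its whole group, or to itself if it has none.
function expandQuery(word) {
  const lower = word.toLowerCase();
  for (const group of equivalents) {
    if (group.some((w) => w.toLowerCase() === lower)) {
      return group;
    }
  }
  return [word];
}

// Case-insensitive partial match of any equivalent word against titles.
function searchTitles(titles, word) {
  const needles = expandQuery(word).map((w) => w.toLowerCase());
  return titles.filter((title) =>
    needles.some((needle) => title.toLowerCase().includes(needle))
  );
}
```

Extending the scheme means adding further groups to the table, which
is where the practical limit mentioned above would show itself.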

RDF and other technologies

The basic model is very simple, and could easily be implemented
in RDF, a relational database, or by other means as well. The topic
map pattern was very helpful to the author in the beginning, and
sticking to the pattern has made it relatively easy to see how to
extend the application with new capabilities. The clear distinction in
topic maps between identifiers and retrievable resources is also
helpful. There are capabilities specific to topic maps, such as the
scoping mechanism, that remain to be applied to the model. For
example, scopes could be helpful in delineating larger contexts than
the simple path expressions used to represent the folder structure.
Such possibilities remain for future exploration.

Summary

This paper analyzes the structure of a bookmark collection and
presents a model that represents it effectively as a topic map. User
interface issues are explored. Finally, an example implementation is
presented that uses a modular javascript topic map engine as the core
of a standalone, browser-based, bookmark manager application.

Acknowledgments

The author would like to thank Nikita Ogievetsky, Sam Hunting, and
Steve Newcomb for discussions and suggestions that were extremely
helpful for this work. Thanks also go to Alexander Johannesen for
creating the Sourceforge project TM4JScript to host the javascript topic
map engine.