Search in Topic Maps portals

One of the features that sets Topic Maps-based portals apart is
their support for search, which is generally better than in ordinary
portals. However, implementing search in any given portal generally
requires lots of discussion with the customer and interaction
designers, and it's not always clear what the best approach is.

As part of my work on semantic searching (about which more later) I
looked at how various Topic Maps-based portals have approached search
so far. This informal survey doesn't actually fit anywhere, so I
thought it might be just as well to make a blog entry out of it,
rather than throw it away. I've also added some recommendations based
on the experiences I've had so far.

The survey

I looked at the following aspects of the search implementations in
a number of Topic Maps-based portals:

Pre-filtering

Whether users are given the chance to apply some kind of
structured filter before they search, using a drop-down
list of categories or by other means.

Post-filtering

Whether users are able to filter search results after
they have been displayed.

Grouping

Whether or not search results are displayed grouped by type.
Here I don't mean whether the type is displayed, but whether the
entire layout of the results is organized so that topics of the same
(or similar) types are displayed together as a group.

Categories

Whether or not categories can be found in search.

Topic types

Whether or not topic types can be found in search.

Description

Whether or not statements about the topics (beyond just the
name and type) are used to provide more information about the topics
found.

I've put "Y-" in some cases where sites have a feature, but only in a
rather limited way.

Some observations

Pre-filtering is quite rare, and in general it does not
appear to work very well. The main problem is that people are
reluctant to filter before they search, because until they have
searched they have little idea what the filters mean. In any case,
they don't know whether they need to filter until afterwards, and
it's just as easy to do it then.

Post-filtering is widely supported, very powerful, and
definitely one of the aspects of Topic Maps-based portals that has
worked very well. However, it's easy to make the filtering interface
too crowded and complex for users, so the main challenge here seems to
be making this rather complex feature intuitive for non-technical
people. In other words, at least with Topic Maps, this is more of a
design challenge than a technical one.
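As a rough illustration of what post-filtering involves on the technical side, here is a minimal Python sketch. It assumes each search hit carries its topic type; the Hit class, the example hits and the function names are hypothetical and not taken from any of the portals surveyed. The filter options are simply counts of the types present in the result list, and applying a filter keeps the relevance order intact.

```python
from collections import Counter
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A search hit (hypothetical structure, for illustration only)."""
    name: str
    topic_type: str
    score: float


def filter_options(hits: List[Hit]) -> Counter:
    """Count hits per topic type, to offer as post-filters next to the results."""
    return Counter(hit.topic_type for hit in hits)


def post_filter(hits: List[Hit], topic_type: str) -> List[Hit]:
    """Keep only hits of the chosen type; the relevance order is preserved."""
    return [hit for hit in hits if hit.topic_type == topic_type]


# Hypothetical results for a query, already sorted by relevance.
hits = [
    Hit("Henrik Ibsen", "person", 0.95),
    Hit("Ibsen museum", "museum", 0.80),
    Hit("Article about Ibsen", "article", 0.70),
    Hit("Another article about Ibsen", "article", 0.60),
]

# Show the filter options with their hit counts, most common first.
for topic_type, count in filter_options(hits).most_common():
    print(f"{topic_type} ({count})")

print([hit.name for hit in post_filter(hits, "article")])
```

Because the options are derived from the results themselves, every filter offered is guaranteed to match at least one hit, which is part of what makes filtering after the search easier to understand than filtering before it.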

Grouping is not very common, and my experience with it has
been very negative. It can seem attractive at first, but users don't
expect it, it makes it more complicated for them to "parse" the
results page, and it makes it harder to scan the list of results. The
worst thing, however, is that it breaks the relevance ranking, since
the ordering of results is determined by the order of the groups.

In the City of Bergen portal there are four groups, which means
that, if the best hit is equally likely to fall into any of the
groups, in 3 out of 4 cases it will not be listed at the top. This is
not because of some limitation in the ranking of results, but because
the order of the groups is fixed. In other words: vertical grouping
defeats ranking of results. Fuzzzy.com also has grouping, but only
into two groups, and since it lays them out horizontally it works
better. (I'm still not sure this is a good idea, though.)
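The effect of vertical grouping on ranking can be shown with a small Python sketch. The four group names, the hits and their scores are hypothetical; the point is only that once groups are listed in a fixed order, relevance decides the order within each group but not which hit comes first overall. The last function spells out the arithmetic behind the 3 out of 4 figure, assuming the best hit is equally likely to fall into any group and that every group has at least one hit.

```python
from fractions import Fraction
from typing import List, NamedTuple, Sequence


class Hit(NamedTuple):
    name: str
    group: str
    score: float


# Hypothetical fixed order of four result groups.
GROUP_ORDER = ("service", "article", "person", "place")


def by_relevance(hits: List[Hit]) -> List[Hit]:
    """Plain relevance ranking: highest score first."""
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


def by_fixed_groups(hits: List[Hit], group_order: Sequence[str] = GROUP_ORDER) -> List[Hit]:
    """Vertical grouping: groups in a fixed order, relevance only within each group."""
    position = {group: i for i, group in enumerate(group_order)}
    return sorted(hits, key=lambda hit: (position[hit.group], -hit.score))


def chance_best_hit_not_first(n_groups: int) -> Fraction:
    """Probability that the best hit is not at the top, assuming it is equally
    likely to be in any of n non-empty groups: it is first only if it is in group 1."""
    return 1 - Fraction(1, n_groups)


hits = [
    Hit("Hypothetical service", "service", 0.40),
    Hit("Hypothetical article", "article", 0.35),
    Hit("Hypothetical person", "person", 0.90),  # the best hit
    Hit("Hypothetical place", "place", 0.30),
]

print([hit.name for hit in by_relevance(hits)])
print([hit.name for hit in by_fixed_groups(hits)])
print(chance_best_hit_not_first(len(GROUP_ORDER)))
```

Under the same assumption, two groups give 1 in 2, so fewer groups hurt less; a horizontal layout presumably helps further because the top hit of each group is visible at once.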

Nearly all sites allow you to find categories when
searching, but in several of the projects I've been involved with,
getting the customer to agree to this has been a real uphill
struggle. I don't know why, but to customers it often seems wrong that
categories should be findable through search. The winning argument for
making them findable has been that the customer typically spends
considerable effort on collecting the most relevant content possible
under each category. If the user then types the name of a category,
why not offer what is in effect a hand-made page of search results
among the other search results? Nobody has yet given me a good reason
not to.

Topic types cannot be found via search in any portal that
I've seen, probably because the portals tend not to have any pages for
the topic types. This makes sense given that a list of all persons or
articles in a portal is rarely very useful. Still, this is a search
that people perform, and it's not obvious that such a page would be
useless.

The descriptions of search hits are very limited in most
portals, but some Topic Maps portals go much further in this regard.
In the Kulturnett portal, for example, search hits for "Ibsen" are
described as "book by Atle Næss", "museum in Oslo", "author", etc.,
and these descriptions are generated from the structure of the topic
map. This is a very useful feature for users, since it tells them much
more about what they've found without taking up much visual real
estate. I think many of the portals that left this out did so because
their ontologies are so weak that they cannot really describe the
topics much.
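A minimal Python sketch of how such short descriptions might be assembled, assuming each hit is stored with its type and, optionally, one salient relationship to another topic. The Topic structure, the describe function and the placeholder titles are hypothetical; a real portal would choose the relationship per topic type from its ontology, which is also why a weak ontology leaves little to describe.

```python
from typing import NamedTuple, Optional, Tuple


class Topic(NamedTuple):
    name: str
    topic_type: str
    # Hypothetical: one salient relationship as (connecting word, other topic's name).
    relation: Optional[Tuple[str, str]] = None


def describe(topic: Topic) -> str:
    """Build a short description from the topic's type and, if present, one relationship."""
    if topic.relation is None:
        return topic.topic_type
    connector, other = topic.relation
    return f"{topic.topic_type} {connector} {other}"


hits = [
    Topic("Hypothetical book title", "book", ("by", "Atle Næss")),
    Topic("Hypothetical museum", "museum", ("in", "Oslo")),
    Topic("Henrik Ibsen", "author"),
]

for hit in hits:
    print(f"{hit.name}: {describe(hit)}")
```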

How do you measure "the quality of the relevance ranking", and what do you consider to be good relevance? In the topic map portals I've been involved in, we have attached weights to both topic types and occurrences for ranking the search results. For example, a 'person-typed' topic would be ranked higher in the search results than an 'article-typed' topic, as we would normally consider a 'person-typed' topic to be more important. So far this has proved to be an OK solution.
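The weighting described in this comment could look roughly like the following Python sketch. It is not the commenter's implementation: the weight values, field names and the weighted_score and rank functions are all hypothetical. A raw text-match score is multiplied by a weight for where the match occurred (the name or an occurrence type) and by a weight for the topic's type, so that a person matching on name outranks an article that only mentions the name in its body.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; real values would have to be tuned for each portal.
TYPE_WEIGHTS: Dict[str, float] = {"person": 2.0, "article": 1.0}
FIELD_WEIGHTS: Dict[str, float] = {"name": 3.0, "description": 1.0, "body": 0.5}


def weighted_score(matches: Dict[str, float], topic_type: str) -> float:
    """Combine raw per-field match scores into one score.

    `matches` maps the field where the query matched (a name or an occurrence
    type) to the raw text-match score for that field. Unknown fields and types
    get a neutral weight of 1.0.
    """
    base = sum(FIELD_WEIGHTS.get(field, 1.0) * raw for field, raw in matches.items())
    return base * TYPE_WEIGHTS.get(topic_type, 1.0)


def rank(hits: List[Tuple[str, str, Dict[str, float]]]) -> List[str]:
    """Sort (name, topic type, matches) hits by weighted score, best first."""
    scored = sorted(hits, key=lambda hit: weighted_score(hit[2], hit[1]), reverse=True)
    return [name for name, _, _ in scored]


hits = [
    ("Article mentioning the name", "article", {"body": 1.0}),
    ("Person with that name", "person", {"name": 0.5}),
]
print(rank(hits))
```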

Lars Marius - 2007-08-21 15:00:21

The "measurement" of ranking quality was done very unscientifically, by trying out a couple of searches for things I knew were in the topic maps and seeing how the presumed best hits were ranked.

Typically, if I search for part of a person's name, and an article with no obvious relevance to the name is ranked before people with that name, I consider that poor ranking. And so on.

None of the portals I've worked on do the kind of score weighting you mention yet, but I agree that it's a good idea. I've been pushing for it for a while in various contexts, and it looks like Bergen is implementing it now. I want to go further and offer it out of the box as part of the OKS product, but that will take a while yet.

This is indeed an interesting topic. I think the reason Topic Maps-based portals generally have better search is obvious: the search leans on an underlying semantic structure. Or at least it should do.

Most CMSs have poor search facilities because they have weak or no support for semantic structures. It is also very common to find that the search application knows nothing about the site's structure; it behaves as if it existed in a vacuum. I also think many information architects underestimate the importance of building the search function on the site's semantic structure (I am going to give a talk on this at EuroIA 2007 in Barcelona in mid-September together with my colleague Nils Arne).

The problem with some Topic Maps-based search facilities is the temptation to "show them everything we know". The key to good searching is to restrict the features to the most useful ones and not try to show all the information in the topic map. The most useful feature, I think, often boils down to categorisation by topic type.

Your survey is interesting, but the table leads one to think that the more Ys the better. I don't think this is the case. For one thing, the quality of the semantic structure, the ontology, is extremely important for getting a good result. When Bergen divides information into services and articles, that is a very bad categorisation: they are obviously not talking about the same things, and the search will suffer from this unclear division.

Lars Marius - 2007-08-25 05:21:54

I definitely agree with your comments on CMSs. Many of them don't even support searching PDFs and other files attached to articles, and in general they could do much better than they do. So could the Topic Maps systems, admittedly.

I also agree that more Ys is not necessarily better. I'm skeptical about showing topic types, and I think the "grouping" and "pre-filter" columns are downright negative. In fact, I collected this data precisely because I think these features are negative; I wanted to be able to show future customers this and say: look, these features are not very popular, and that's for a reason.

Looking forward to seeing your EuroIA slides, and, I hope, a blog posting about the talk.