Journal of the American Society for Information Science (JASIS) -- Table of Contents

We begin this issue with four diverse papers on clustering as a retrieval method and end with three even more diverse papers on user studies.

Research

Order-Theoretical Ranking Claudio Carpineto and Giovanni Romano

First we have Carpineto and Romano, who make use of a clustered document file based upon set inclusion relations among terms, merge queries into the clustered document space, and take the shortest path between a query and a document as the basis of a retrieval status value. Typical hierarchical clustering methods do not produce all likely clusters, owing to arbitrary tie breaking, and fail to discriminate between documents with significantly different degrees of similarity to a query. In their concept lattice ranking (CLR), a lattice is built on the basis of term co-occurrence in documents and is supplemented, rather than totally re-computed, with the addition of each new document or query.

Using the CACM and CISI collections and queries, weighted term vectors were computed for use in best match retrieval and in hierarchical single link clustering with cosine ranking, for comparison with CLR. Lattice construction took 15 minutes for CACM and 2 hours for CISI. Both best match and CLR return better precision and recall measures than hierarchical clustering, but little difference appears between the two. A comparison of CLR and hierarchical clustering on unmatched documents was then carried out using expected search length as a measure. CLR outperforms hierarchical clustering and may be useful in discovering non-matching relevant documents.
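
To make the lattice-distance idea concrete, here is a minimal Python sketch (an illustration of the general technique, not the authors' implementation): it closes term sets over a toy collection to enumerate concepts, links concepts related by direct set inclusion, and ranks documents by shortest lattice path from the query's concept. The documents, terms, and query are hypothetical.

    from itertools import combinations
    from collections import deque

    # Toy documents and query (hypothetical; not the CACM/CISI data).
    docs = {
        "d1": {"retrieval", "ranking"},
        "d2": {"retrieval", "lattice"},
        "d3": {"lattice", "order"},
    }
    query = {"retrieval"}

    objects = dict(docs)
    objects["q"] = query                       # merge the query into the space
    all_terms = set().union(*objects.values())

    def closure(terms):
        """Smallest closed term set containing `terms`: the terms shared
        by every object whose term set includes `terms`."""
        extent = [o for o, t in objects.items() if terms <= t]
        if not extent:
            return set(all_terms)              # bottom of the lattice
        return set.intersection(*(objects[o] for o in extent))

    # Enumerate all concepts (closed term sets) by brute force.
    concepts = set()
    for r in range(len(all_terms) + 1):
        for combo in combinations(sorted(all_terms), r):
            concepts.add(frozenset(closure(set(combo))))

    def covers(a, b):
        """True when b directly covers a (no concept strictly between)."""
        return a < b and not any(a < c < b for c in concepts)

    # Undirected Hasse-diagram adjacency for shortest-path search.
    graph = {c: [d for d in concepts if covers(c, d) or covers(d, c)]
             for c in concepts}

    def lattice_distance(src, dst):
        """Length of the shortest path between two concepts (BFS)."""
        seen, queue = {src}, deque([(src, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == dst:
                return dist
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return float("inf")

    q_concept = frozenset(closure(query))
    ranking = sorted(docs, key=lambda d: lattice_distance(
        q_concept, frozenset(closure(docs[d]))))
    print(ranking)  # documents ordered by lattice distance from the query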

A Linear Algebra Measure of Cluster Quality Laura A. Mather

Mather proposes a new measure of cluster effectiveness that is independent of retrieval measures computed for queries on the clustered file, based on the theory that the clustering quality of a term-document matrix is determined by the disjointness of the terms across clusters. The ideal clustering is one where terms that occur in one cluster occur only in that cluster, that is, are mutually exclusive across clusters. Such clusters occur if and only if the matrix is ``block diagonal,'' that is, its rows and columns can be permuted to produce a matrix with some set of blocks on the diagonal containing the nonzero elements, while the remainder contain zeros. When terms are disjoint, the singular values of the blocks taken together are the same as the singular values of the full block diagonal matrix; as the structure diverges from block diagonal, the two sets of singular values diverge as more term intersection occurs. A measure of the distance between the singular values of the term-document matrix and those of the cluster matrices therefore indicates cluster quality, but is difficult to interpret on its own. By taking random permutations of the matrix and creating clusters, one can approximate the mean and standard deviation of this distance for random clusterings; subtracting the mean from the observed clustering's distance and dividing by the standard deviation of the samples yields the number of standard deviations the observation lies from a random clustering. These values can be compared to indicate the best clustering. The computation of the singular values of many large matrices is required and would be expensive. Experimentally the metric correlates significantly with Shaw's F and with the precision measure, increasing as these measures increase.
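
A rough Python sketch of the permutation test as described (one plausible reading, not Mather's exact formulation): the singular values of the full term-document matrix are compared with the pooled singular values of the per-cluster column submatrices, and the observed distance is expressed as a z-score against random clusterings of the same sizes.

    import numpy as np

    rng = np.random.default_rng(0)

    def spectrum_distance(A, clusters):
        """Distance between the singular values of the full term-document
        matrix A (terms x documents) and the pooled singular values of
        the per-cluster column submatrices."""
        full = np.linalg.svd(A, compute_uv=False)
        pooled = np.concatenate(
            [np.linalg.svd(A[:, idx], compute_uv=False) for idx in clusters])
        n = max(len(full), len(pooled))            # pad shorter spectrum
        full = np.pad(full, (0, n - len(full)))
        pooled = np.pad(pooled, (0, n - len(pooled)))
        return np.linalg.norm(np.sort(full)[::-1] - np.sort(pooled)[::-1])

    def cluster_quality(A, clusters, samples=100):
        """Observed distance expressed as standard deviations from the
        mean distance of random clusterings of the same sizes."""
        observed = spectrum_distance(A, clusters)
        sizes = [len(c) for c in clusters]
        dists = []
        for _ in range(samples):
            perm = rng.permutation(A.shape[1])
            bounds = np.cumsum([0] + sizes)
            rand = [perm[bounds[i]:bounds[i + 1]] for i in range(len(sizes))]
            dists.append(spectrum_distance(A, rand))
        return (observed - np.mean(dists)) / np.std(dists)

Here `clusters` is a list of document (column) index lists; larger z-scores indicate clusterings farther from random, so competing clusterings of the same file can be compared directly.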

Dominich reviews the basic retrieval models, concentrating upon the vector space and probabilistic representations. He shows that these retrieval models define systems of vicinities of documents around queries, which can both be represented by a similarity space and thus be given a unified mathematical definition.
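
In rough symbols (a paraphrase for orientation, not Dominich's own notation), each model induces, for a query q and a threshold \theta, a vicinity of documents

    V_\theta(q) = \{ d : \sigma(q, d) \ge \theta \}

where \sigma is the model's similarity function; it is this shared vicinity structure, expressed over a similarity space, that admits a single mathematical definition.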

Zhu and Chen compare the performance of their Geographical Knowledge Representation System with image retrieval by human subjects. Gabor filters are used to extract low level features from 128 x 128 pixel tiles cut from aerial photograph images. A vector of 60 features describes each tile, and a Euclidean distance similarity measure is used to sort the tile images by least distance. Adjacent similar tiles are grouped to create regions, which in turn are represented with derived vectors. A Kohonen Self Organizing Map (SOM) is created showing tiles representing the textures to be found in the data; clicking on one of these displays the tiles in the same category.
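
The retrieval step reduces to a nearest-neighbor sort in feature space. A minimal sketch follows; the Gabor filtering itself is omitted and the 60-dimensional tile vectors are assumed given (random stand-ins below).

    import numpy as np

    def rank_tiles(query_vec, tile_vecs):
        """Order tile indices by Euclidean distance to the query feature
        vector, most similar (smallest distance) first."""
        dists = np.linalg.norm(tile_vecs - query_vec, axis=1)
        return np.argsort(dists)

    # Random stand-ins for 192 tiles x 60 Gabor-derived features.
    tiles = np.random.rand(192, 60)
    print(rank_tiles(tiles[0], tiles)[:5])  # tile 0 ranks itself first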

Thirty human subjects were each assigned an image and six randomly selected reference tiles to score for similarity against each of the 192 tiles in the image. A second group of ten subjects was asked to draw lines around areas they found similar to the reference tiles. A third group of ten subjects was given the SOM-selected reference tiles and asked to categorize each tile in the whole image into categories represented by these reference tiles. The system exhibited no significant difference in precision from the human subjects but performed less well on recall. Humans selected more tiles viewed as similar, and the top 5 system and subject tiles were consistently different. Both had difficulty with tiles where texture alone did not distinguish one from another. In grouping tiles into regions, humans outperformed the system on both measures, but in image categorization no significant difference existed. Adding features other than texture may help performance, which is already close to inexpert human performance.

How Can We Investigate Citation Behavior? A Study of Reasons for Citing Literature in Communication Donald O. Case and Georgeann M. Higgins

Case and Higgins review the previous studies providing lists of reasons for authors' citing behavior, and studies using these categories in which investigators classify citation behavior on the basis of content analysis. They also reexamine the smaller set of studies involving surveys of authors as to the reasons for their behavior. The two most highly cited authors appearing in both of two recent studies of the Communication literature were chosen, and all citations to their work in the years 1995 and 1996 were collected. 133 unique citers were identified and sent 32-item questionnaires with the questions from a recent study in the Psychology literature. Returns from 56 were received, 31 for author A and 25 for author B, and responses for the two authors were not significantly different. No new reasons for citation were identified. The top reasons were a review of past work, acting as a representative of a genre of studies, and serving as a source of a method. Negative citation is quite rare. Twenty-five non-redundant items with some indication of importance were subjected to a factor analysis. Seven factors explain 69% of the variance: classic citation, social reasons, negative citation, creative citation, contrasting citation, similarity citation, and cite of a review. Factors predicting citation are: perception of novelty and representation of a genre, perception that citation will promote cognitive authority of the citing work, and perception that the cited item deserves criticism.

In the Bilal study twenty-two middle school students were assigned a question to search in Yahooligans! as part of their Science class. The teacher provided ratings of the children's topic knowledge, general science knowledge, and reading ability. A quiz administered to the students indicated knowledge of the Internet and of Yahooligans! in particular. Lotus ScreenCam was used to record 18 of the student-system interactions. Students' transcribed moves were classified and counted, with a score of one (relevant) for selection of a link that appears appropriate and leads to the desired information; .05 for the selection of a link that appears appropriate but is not successful; and 0 for the selection of links that give no indication of information leading to success. Weighted effectiveness and efficiency scores are then computed.
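
As a sketch of how such a weighted score might be computed (the weights follow the summary above; Bilal's exact formulas may differ), with hypothetical move labels:

    # Hypothetical move labels; weights follow the summary above.
    SCORES = {"relevant": 1.0, "plausible_unsuccessful": 0.05, "unrelated": 0.0}

    def weighted_effectiveness(moves):
        """Mean weighted score over a student's classified link selections."""
        return sum(SCORES[m] for m in moves) / len(moves)

    moves = ["relevant", "plausible_unsuccessful", "unrelated", "relevant"]
    print(weighted_effectiveness(moves))  # (1 + .05 + 0 + 1) / 4 = 0.5125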

Thirty-six percent initially browsed subject categories while the rest entered single or multi-word concepts. Keywords, and in some cases natural language, were used in subsequent moves despite the fact that Yahooligans! does not support natural language search. Subsequent activity mixed browsing with term search. Looping and backtracking were very common, but the Go button for the search history links went unused. Most children scrolled, but seldom through the complete page. Half were successful, but all were inefficient.

Crabtree et al. object to traditional ethnographic analysis as applied to information problems on the basis that the application of pre-defined rules and procedures yields an organization of the activity observed from the point of view of the analyst rather than that of the participants. Such a ``constructive analysis'' approach does not describe the actual activities, but in the name of objectivity imposes a structure which obscures the real world practices through which subjects make sense of their surroundings and produce information.

Ethnomethodology (EM) emphasizes rigorous thick description of local practices by assembling concrete cases of performed activity as the direct units of analysis. EM analysis attempts to generate a description, in great detail, of how the described activity could be reproduced in and through the same practices. Such description provides systems designers with a sense of the real world aspects of a socially organized setting, and thus surfaces the exceptions, contradictions, and contingencies of the activities that otherwise might not be evident. Practitioners of ethnography and computer system design have quite different cultures, but communication can lead to far better design practices.

We begin this issue with Bakar et al.'s evaluation of string matching methods on Malay texts. Much current post-1960 Malay text is in the Rumi alphabet, a romanized system based on English phonemes, so English conflation algorithms can be used effectively. Because of prefixes and infixes, stemming alone is not effective, and the addition of n-gram matching is required. Using a data set with 5085 unique Malay words and 84 query words, eight phonetic code lists were created using four coding methods applied to stemmed and unstemmed dictionaries. One hundred words surrounding a matched key are chosen, equally above and below unless too close to the top or bottom of the list. Stemming proves to be very helpful, as does phonetic coding. It seems that smaller key sizes perform better. Digram, an existing string matching algorithm, gave the best relevant-and-retrieved results.
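
Digram matching compares the sets of adjacent character pairs in two strings. A minimal sketch using the Dice coefficient, one common formulation and not necessarily the exact variant evaluated:

    def digrams(word):
        """Set of adjacent character pairs in a word."""
        return {word[i:i + 2] for i in range(len(word) - 1)}

    def dice(a, b):
        """Dice similarity over character digrams."""
        da, db = digrams(a), digrams(b)
        if not da or not db:
            return 0.0
        return 2 * len(da & db) / (len(da) + len(db))

    print(dice("kerajaan", "kerajaannya"))  # Malay word pair, illustrative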

Within application domains, users with common objectives create heterogeneous databases to store and manage similar data types, and usage patterns indicate the knowledge of those users. The notion, for Srinivasan et al., is to create a ``middle layer'' of concepts, extracted from similar patterns in existing systems and from the use of these systems, which can wrap the existing databases and provide a common access mechanism. Entities, defined in existing systems as sets of variables, are extracted and classed using similarity measures based on commonality in structure and use patterns. Those classed together represent a common application-specific generic concept.

For each class and user group pair a ``group data object'' is created. A tree of group data objects that represents user types at different levels of specificity is generated from user-supplied terms and from terms extracted from each user type's queries. A user is mapped into a user type, the appropriate group data objects are generated, and their labels are displayed to the user for selection. Selection generates the extractors from each database for that user type in that group data object. Clustering three medical databases yielded eight concept classes, and multiple user objects were created. Tests showed varied query production in the same concept classes for the various groups.

After reviewing the literature of evaluative web search tool research, Nicholson replicates the 1996 Ding and Marchionini search service study ten times during the Summer of 1998. Previous work finds that replication yields significantly different results over time. The first twenty pages returned by Infoseek, Lycos, Alta Vista, and Excite for the five queries were examined and rated between 0 and 5 for relevance. Differing engine rankings for each replication are the rule. Using two queries, one designed to have a stable answer and the other a dynamic answer over time, the four systems were tried again in five successive weeks. New pages appearing in the first 20 pages in each successive week were counted, as were pages that changed ranked position. Both queries showed considerable change week to week. The results were aggregated, and the frequency with which each engine returned the highest number of relevant documents was found to show a replicable pattern over all weeks, the odd weeks, and the even weeks. This pattern provides a clear ranking of the four engines, which was not determinable from the individual replications.

The Personal Construction of Information Space Cliff McKnight

An information space, according to McKnight, is simply the set of objects, real or virtual, that an individual uses to acquire information. A repertory grid is a means of externalizing a person's view of the world: a triad of elements is presented and the subject is asked to say in what way two are alike and the third different. The quality that makes this discrimination possible is given a rating scale whose extremes form its two poles, and is called a construct. Multiple constructs with element ratings provide an individual's view of a domain. Eleven information sources were elicited from a University lecturer and presented as triads. Ten constructs were elicited and the elements rated on the constructs. A cluster analysis reorders the grid so that similarly rated elements and similarly used constructs are adjacent. Both construct and element clusters seem to make sense and likely reflect the subject's views of his information space. It remains to be seen whether parts can be shared with other subjects.
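
The reordering step can be approximated with ordinary hierarchical clustering. The sketch below (an analogue, not the grid-analysis software McKnight used) reorders rows (constructs) and columns (elements) so that similar ones sit adjacent:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, leaves_list

    def focus_order(grid):
        """Reorder a constructs-by-elements ratings grid so that similarly
        rated elements and similarly used constructs become adjacent."""
        row_order = leaves_list(linkage(grid, method="average"))
        col_order = leaves_list(linkage(grid.T, method="average"))
        return grid[np.ix_(row_order, col_order)]

    # Toy 4-construct by 5-element grid of 1-5 ratings (hypothetical).
    grid = np.array([[1, 5, 1, 4, 5],
                     [2, 5, 1, 5, 4],
                     [5, 1, 5, 2, 1],
                     [1, 4, 2, 5, 5]])
    print(focus_order(grid))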

Schamber uses her weather information data, collected by time-line interview techniques and content analysis, to address the effectiveness of these techniques. By soliciting a sequence of events in which weather information was needed and sought, and identifying the one event in the sequence where information was most actively sought, the key event and those before and after it could be studied in some detail. The time-line technique provides an unobtrusive means of collecting data on perceptions and yields rich data; it is, however, a labor intensive method. The content analysis was also unobtrusive and effective, but likewise very labor intensive. In this framework criteria are best defined from users' perceptions, which are indicated with validity by self-reports.

Abstracts Produced Using Computer Assistance Timothy C. Craven

Craven evaluates abstracts produced with the assistance of TEXNET, an experimental system which provides the abstractor with text words and phrases extracted by frequency after a stop-list pass. Three texts of approximately 2000 words each were chosen, and for each text a set of 20 different subjects, drawn by advertisement within a University community, created abstracts using TEXNET. Half got a display of keywords occurring eight or more times, and half got a display of phrases of the same occurrence. All subjects were surveyed as to background and reaction to the software, provided with a demonstration of the software, and told their abstract should not exceed 250 words. Nine of these, including the author abstract, were read by three raters, again recruited by advertisement. Analysis shows no correlation between keywords or phrases and quality ratings or usefulness judgements by subjects. Experience did not lead to conciseness, originality, or approximation of the author abstract. Female gender correlated positively with length and with use of words from the text. Subjects wanted to view the text and the emerging abstract simultaneously, and asked for easy scrolling, standard black-on-white screens, a dynamic word count, and a spell checker.
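
The keyword display can be approximated in a few lines; a sketch assuming a simple stop-list and the eight-occurrence threshold mentioned above (not TEXNET's actual code):

    from collections import Counter
    import re

    STOPLIST = {"the", "of", "and", "a", "an", "to", "in", "is", "that", "for"}

    def frequent_keywords(text, min_count=8):
        """Words occurring at least `min_count` times after a stop-list
        pass, mirroring the keyword display described above."""
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(w for w in words if w not in STOPLIST)
        return [w for w, c in counts.most_common() if c >= min_count]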

Encounters with the OPAC: On-Line Searching in Public Libraries Deborah J. Slone

Slone looks at the behavior of OPAC users conducting known item, area (broad search with most refinement off line), and unknown item searches in a public library. Thirty-six participants, who approached the terminals and agreed, answered a pre-search questionnaire on OPAC experience, reason for coming, and length of time spent planning their search. They were then observed, and their search terms, comments, reactions, age, gender, time on line, and outcome logged. Feelings were inferred from observation and noted, except that confidence level was solicited in the questionnaire. Twenty-eight began confident, but only 14 displayed confidence during their search. Successful unknown item searchers began broadly and focused with terms selected from initial results. Area searchers searched broadly for a general area and focused at the shelves, using minimal computer resources. Known item searches were quickly effective at the terminal. Frustration, anxiety, and disappointment abounded during unknown item searches.

When disparate bibliographic databases are integrated, different authority conventions prevent physical combination and require a mapping that hides the heterogeneity from users. French, Powell, and Schulman advance automated techniques to assist those maintaining authority for author affiliations in the Astrophysics Data System. Strings were extracted, clustered, reviewed by a domain expert, and iterated to a final form. Concentration was on an ideal set of 38 institutions represented by 1,745 variant strings, with a goal of properly clustering these while excluding the other 12,139 identified strings from the ideal clusters. First a lexical cleanup was run, normalizing case, removing country designations at the end of a string, ZIP codes, and state abbreviations, and expanding a list of abbreviations. Then string and frequency-of-occurrence pairs are sorted; beginning with the most common string, its distance to all other strings is computed, and those within a threshold are clustered with the most common item and removed from consideration. The process is iterated until the file is exhausted. The distance measures tested are: edit distance, i.e., the minimum number of simple operations of four kinds required to transform one string into another; edit distance computed over words rather than characters; and the Jaccard coefficient. Allowing the threshold to be some fraction of the length of the shorter string improves results over a fixed threshold, but the higher thresholds required to cluster all variants still result in significant errors. Required human effort rises with the number of misplaced strings, but such effort is reduced roughly by half by the clustering procedure.
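
A compact sketch of the greedy procedure as described, using a character-level edit distance with three operation types (the paper's measure reportedly allows four) and a threshold set to a fraction of the shorter string's length; the institution strings and counts are hypothetical:

    def edit_distance(a, b):
        """Levenshtein distance (insert, delete, substitute); the paper's
        variant reportedly also counts transposition as an operation."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    def cluster_by_frequency(string_counts, frac=0.5):
        """Greedy clustering: seed with the most frequent remaining string
        and absorb strings whose distance falls within a fraction of the
        shorter string's length; repeat until the file is exhausted."""
        remaining = sorted(string_counts, key=string_counts.get, reverse=True)
        clusters = []
        while remaining:
            seed, rest = remaining[0], remaining[1:]
            members, leftovers = [seed], []
            for s in rest:
                if edit_distance(seed, s) <= frac * min(len(seed), len(s)):
                    members.append(s)
                else:
                    leftovers.append(s)
            clusters.append(members)
            remaining = leftovers
        return clusters

    variants = {"university of virginia": 120, "univ of virginia": 40,
                "u. of virginia": 7, "harvard university": 90}
    print(cluster_by_frequency(variants))
    # Heavily abbreviated variants merge only at higher `frac` values,
    # echoing the threshold/error trade-off noted above.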