D-Lib Magazine
November 1997

ISSN 1082-9873

From the Editor

Castles in the Air: Systems, Networks, and Libraries

In its inaugural issue, PreText Magazine featured digital libraries. With Nancy
Davenport, Paul Duguid, Andrew Odlyzko, and Robert
Wilensky, I agreed to participate in the magazine's on-line
forum. It was an interesting experience, partly because of
the way we built on comments that had already been made,
partly because of the relative formality of those comments
(sentences were balanced and paragraphs clearly crafted),
and partly because of the interaction with the moderator,
Dominic Gates, who carefully copied us all via e-mail as the
comments were appended and amended, and who periodically
tossed a firecracker into the circle. The following essay is
the substance of my first contribution, which followed a
discussion launched by Andrew Odlyzko on the tensions
among conventional libraries of books and bricks, the
problems posed by trying to archive the web, and the
future of librarians and librarianship.

He concluded that "the combination of human intuition,
skills, and knowledge will likely provide the most powerful
information systems," just as the best chess will be played
by a "combination of human tactical skills and computer
tactical power." That is to say, the wetware in front of the
monitor counts at least as much as the software and hardware
behind it, and perhaps more, since neither came about
without human intervention, and both embody human and
cultural assumptions about engineering and the way the
world works as surely as the messy technologies of
literature and art. For example, at a
recent conference, a very fine computer scientist told me,
half laughingly, half plaintively, that all the hard work that
went into designing a system seemed wasted, because users
weren't using the system "the way they're supposed to."
The writer's equivalent lament goes something like, "but
can't they understand what I meant?"

The point here is not how to improve system design through
better user and usability studies, worthwhile as that
discussion is. Rather, I would like to pursue a somewhat
larger issue: the assumptions about the world that are
embodied in the engineering, particularly those related to
digital libraries and networked information. There are many,
and one of the most fundamental is that we should treat all
of the information accessible via the web, or the Internet,
as an enormous library equipped with the kinds of tools and
services that we associate with an idealized view of a
"library." An obvious illustration of this conception of the
web is Dominic Gates' own story on the Universal Library,
which is, in Robert Wilensky's lovely image, "everywhere and
nowhere."

If we take the library metaphor seriously for a moment, the
first thing we realize is that there are many kinds of
libraries, many kinds of users, and many ways of looking for
information. For example, when I'm working like a
journalist, I frequently telephone rather than send e-mail
because I am working against a tight deadline or want to
follow up on an answer in real time. Some of us, myself
included, also enjoy the process of search as a way to
refine and understand a topic; this is particularly useful
when I am acting like a writer working on a new book. On the
other hand, when I am behaving like an editor and want to
double-check the spelling of a name (a classic known-item
search), all I want is reliable information, quickly, and
most existing search engines do that very well for the
authors I work with. So the issue is not simply whether
existing search engines will scale to the web of the future,
although that is a laudable, if difficult, goal. Rather, the
fundamental issue concerns the interplay of human behavior
and technology.

Note, though, the importance of reliability in these
examples. Traditional libraries, archives, and finding aids
possess that attribute by definition; it inheres in their
collections policies, their organizational structure, and
their selection of journals to abstract and index. For
example, I once found a previously unidentified document by
Robert E. Lee, who has come to personify both the anguish of
choosing among loyalties during the American Civil War
(1861-1865) and dignity in defeat. The document was a
routine maintenance report he had written while posted to
Fort Hamilton, New York, in the 1840s, notable at first only
because the handwriting was remarkably legible and the
sentences were grammatical, at least until I reached the
signature line. I found it in a box of similar reports,
which I was dutifully reading for research on the fort's
batteries. It is not surprising that the document was filed
there, nor that it had gone unidentified as a Lee autograph.

Now, I think it is fair to argue that this is precisely the
situation that the digital world promises to remedy: details
and facts become easier to find. But I worry about losing
the value of context. In this example, where I was
interested in the fortifications and not in Lee, the fact
that Lee wrote the report mattered less than the information
it contained. Yet for a moment, the document's location
instantly conveyed a vision of a talented young officer
grinding away at routine tasks, like many other talented
young officers of his generation. Moreover, that location,
in a box in the National Archives among similar documents,
provided undisputed provenance and authentication.

I also think it is fair to argue from this example that here
is a situation in which archiving virtually everything
worked. But while I applaud Brewster Kahle's pioneering
efforts to archive the web and to devise an innovative way
of searching the extraordinary collection he is creating, it
is worth pointing out what professional archivists have long
known: not everything needs to be saved, even if the
intellectual property issues were resolved. Heaven help us
if I saved every "version" of a story where the difference
between one "version" and the next was a spelling
correction. Particularly with technologies in which
repetition can be conflated with relevance, and in fields
where the accuracy of information is important and subject
to change in light of new research, human intervention seems
to me to become more, rather than less, important, and so
does discarding misleading information.

A good example of an attempt to evaluate sites is described
in the most recent issue of the Tufts University Diet &
Nutrition Letter (the site itself is at http://navigator.tufts.edu).
The authors point out the dangers of misleading or
inaccurate information and explain a ranking system they
devised that considers four dimensions: depth, accuracy,
performance, and display. People do the evaluation, but the
outcome is computational. Of course, this approach is
domain-specific, rather like traditional pathfinders,
subject bibliographies, or review articles in the print
world. The virtue of the digital world is that keeping
information current is much easier in fields like health
and medicine, where currency matters.

Researchers in human-computer interaction (HCI) would
probably be among the first to agree that many user
communities must be served and that HCI, search, and
retrieval should be considered facets of the same problem
rather than bounded disciplines (see, for example, last
January's story by Bruce Croft, Ben Shneiderman, and Don
Byrd). But I remain unconvinced that technology can solve
every issue, although it can solve many, and the process of
thinking about it tells us much about ourselves and how we
seek, use, and share information. Should we try to make it
easy to go from one kind of data to the next, managing
heterogeneity and making disparate systems interoperable?
Should it be easy, for example, to find images of Leonardo's
paintings, analyses of the deterioration of the paints and
of successive restorations, copies of his notebooks, and
authoritative discussions of his life, even though these may
be physically distributed across several countries and
written in several languages? Of course. But the point is
that boundaries and categories possess meaning, and the act
of traversing boundaries among heterogeneous data is itself
useful. Moreover, we also learn by trial and error and by
making mistakes, and we will need tools to support the quest
as well as places for storing the answer. So to strive
toward one great library, some wonderful castle in the air
where all knowledge resides and one query will always yield
The Answer, may be to follow the siren's song rather than to
pursue the holy grail.