ISSN 1082-9873

When Xerox PARC
(Palo Alto Research Center)
was founded in 1970, its charter was to develop
technology to support the ``architecture of information''
[Pake].
Many important contributions resulted from this call,
including the
first personal computer with a user-friendly interface and
bitmapped
display, the first WYSIWYG text editor, laser printing, and
the
Ethernet network that could flexibly connect workstations,
file
servers, and printers in order to provide communication
among many
workers. Perhaps because of the social nature of
information
creation and use, much of the technical research at
PARC has
emphasized human-computer interaction [Card et al.] and the social aspects of
computing.

Today, there is a growing understanding throughout the
technical,
business, and legal worlds of the importance of the social
aspects of
technology, and social factors are recognized as being
especially
important in digital library research [Levy
and
Marshall]. There is a very wide range of
problems in digital libraries that link the social to the
technical.
Issues surrounding the coordination of naming and
cataloging of
documents and other information artifacts, compensating
authors and
publishers while at the same time promoting access to
written works,
and the role of human and automated intermediaries in
information
seeking tasks are being extensively studied.

The digital library-related research at PARC spans the
spectrum from
the examination of the social role of documents to state-of-
the-art
technology for finding documents about social roles. This
two-part
article describes some of the ongoing digital
library-related research at PARC.

This first part introduces very briefly some recent
investigations
into the social roles of documents and how they are
changing as
digital representations of published works become more
readily
accessible. The second part, to appear next month, will
present
overviews of relevant technologies that support the
creation, capture,
use, search, synthesis, and presentation of documents and
information.

For many years, researchers at PARC have studied what is
known as
work practices, that is, looking closely at the use of
technologies in
specific organizational settings, uncovering the implicit and
perhaps
unrecognized assumptions that workers bring to their
tasks, and
understanding what technical and social channels workers
use to
cooperate and teach one another. There is a very large
literature
associated with this work (see
[Suchman] for references). Most
recently, several work practice researchers at PARC have been
helping design
and evaluate the ideas and technologies developed for the
NSF-sponsored digital library project at UC Berkeley [Van House].
This work is ongoing and so is not described in detail here.

PARC has a long history of research on innovative,
interactive
document representations, including significant work in
hypertext (see,
for example, [Halasz et al.]). The
growing influence of the World Wide Web
and the Internet is also contributing to the mutation of
our
understanding of what it means to publish. As is commonly
noted
today, the notion of what constitutes a document is
becoming
increasingly complicated and amorphous. As one example,
documents
containing non-textual media and even dynamic elements
may soon no
longer be considered an aberrant form.

The social role of documents and how this will change as the
influence
of on-line, digital forms continues to grow is quite relevant
to
digital library research. Several PARC researchers have
written on
the role of books and documents in society, past, present,
and future,
and the remainder of this article attempts to acquaint the
reader with
a few of their ideas.

John Seely Brown and Paul Duguid have written about the
social role of
documents [Brown and Duguid].
They argue that documents, both
pre-dating and within the digital age, are as much a means
for creating and
maintaining social structure as they are a means for
constructing and
conveying information. For example, fan magazines and
other cheaply
produced newsletters are often put together at home by
one or two
people and are ``mid-cast,'' that is, sent out to small groups,
and
thus unite geographically scattered people who have never
met,
giving them a common sense of community. This
phenomenon also occurs,
and to a larger and growing extent, in their newer on-line
variants
(often referred to as ``zines'').

One important aspect of this kind of publishing is its
volatility and
how this volatility is reflected in the corresponding social
groups.
Brown and Duguid write: ``[T]he growth of zine titles, both
on and off
the Internet, may also indicate how much more volatile new
documents
make social worlds. The key to forming a new group is
starting a new
publication to help hold it together. Consequently, as
publication
costs come down, formation becomes much easier. ...
Equally, however,
disintegration is also easier. ... [O]nce formed, social worlds
continually face disintegration (as dissenting members split
off into
`sub-worlds'). In the past, the cost of starting a new sub-
group
undoubtedly put limits on dissent. As the costs descend,
forming a
splinter group becomes easier. ... Old paper forms may, then,
have
been a resource for stability.'' However, Brown and Duguid
are
careful to note that they do not claim that documents
themselves
determine social processes. Rather, technology is an
enabler with the
potential to support various scenarios.

Brown and Duguid also discuss the use of documents as a
means for
negotiation. They note that conventional forms of
publishing sever
the link between the original document and the
commentary made on it,
moving comments from the margins to the bottom of pages
to the back of
the book. The rise of hypertext reintroduces the usefulness
of the
document as a means for supporting dialogue and
commentary. Because
writing often promotes more writing, documents can be
used either to
extend debate or as a common basis for agreement. As
another example,
they consider the case of faxes, non-digital documents that
can be easily annotated. Annotated faxes show the trail of
an
argument as well as the participants in the discussion, as
comments
are written on comments, and addressees' names are
appended to
addressees' names. They suggest that the popularity of fax
machines,
a non-digital technology, is not surprising precisely because
of the
close analog link between the text and the commentary.

David Levy of PARC writes about a related idea: the
perceived contrast
between the fixity of paper documents and
the fluidity of
digital documents [Levy]. He writes
that traditionally one of the
most salient characteristics of documents has been their
fixity, that is,
the fact that their contents remain stable and unchanged
across time
and space, allowing people through the ages to have access
to the same
meanings or communicative intent. Today,
however, with the increasing use of digital technologies, it is
often
asserted that we are moving from the fixed world of paper
documents to
the fluid world of digital documents. Levy argues that all
documents,
regardless of medium, are both fixed and fluid. He
notes that
paper documents are subject to change, as in the fax
example given
above, and that digital documents have fixed properties.
For example,
before someone can edit a digital document, a fixed version
must be
loaded into the word processing program. Only those parts
that are
explicitly edited are changed; the rest of the document
remains
unaltered.

Brown and Duguid comment on the social consequences of
the
fixity/fluidity contrast:

``[T]he fixed, immutable `document' is best
understood not as an
inferior and outdated alternative to conversation or other
types of
unmediated and immediate communication, but, in
appropriate places, as
an object that plays valuable social roles because it
mediates
and temporizes, records traces and fixes spaces, and
demands
institutions as well as technologies of distribution. Attempts
to
introduce time stamps, hash marks, and other forms of
electronic
version identification stress how important to social and
particularly
legal institutions the idea of a fixed state of a document is. ...
Already, many documents retain a constant text while their
links are
continually changed. As the social roles of continuity and
change, of
areas of status and areas open to dynamic revision, are
better
understood, social institutions may develop around this
joint capacity
[of fixity and fluidity] in intriguing ways, much as libraries
developed their usefulness out of the juxtaposition of fixed
individual texts combined to an ever expanding collection
and a
continually revised set of interlinked catalogues. This
interplay
between fixity and fluidity, formerly possible only on the
scale of
collections may now become a central feature of individual
documents.''

Another important aspect of the changing social role of
documents is
the effect on what it means to publish. Geoff Nunberg, in an
essay
entitled ``The Places of Books in the Age of Electronic
Reproduction,'' writes about the interaction between
publication and
the social creation of a body of knowledge [Nunberg]:

``[T]he shift to electronic publication wouldn't
be possible in the
absence of a social organization that enables scientific
communities
to compensate for features of print discourse that are lost in
the
transition. For example, electronic publication by itself can't
canonize an article in the way that publication in a
prestigious print
journal or review can, partly because of the reduction of
editorial
authority, and partly because the form of publication
provides no
guarantee that other members of the community will have
seen the
article. In scientific communities, however, formal
publication isn't
the only or even the most important way of bringing
research to the
attention of the relevant audience. A large part of scientific
discourse is transacted through seminars, conference
papers, exchanges
of photocopies, and most important, in informal discussions
among
practitioners (a type of discourse that electronic
communication
extends and enhances in very useful ways).''

The discussion above centered around how freely accessible
documents
help to shape the social space. This
section addresses social aspects of document use and
distribution in
the commercial world.

A great deal of attention in the discourse surrounding
electronic
publishing and digital libraries centers on the question of
how
documents can be copied and distributed while at the same
time fairly
and efficiently compensating the authors of the works.
Mark Stefik of
PARC has proposed a set of ideas called Digital Property
Rights that
include a technological component that has the potential to
enable new
forms of exchange and distribution of digital documents and
other
intellectual commodities [Stefik].

Digital property rights take into account the practices and
uses of
documents and their newly mutable forms, and attempt to
satisfy the
needs of publishers and users of published works. The
technological
base rests on the idea of trusted systems, that is, ``a
computer
system that can be relied upon to respect the rules
governing the use
of a digital work.'' A trusted system can keep track of which
rights are
associated with which works and who has access to those
rights.
However, the aspect of the work that is of interest to this
discussion
is the rights language and what it entails about how
published works
are used, and this can be understood independently of the
underlying
technology.

Developing the specifications for a digital rights language
makes it possible to address, or at least to clarify to some
extent, perplexing philosophical questions such as what it
means to make a copy, and complicated social issues such as
how to provide fair use of digital documents.

Stefik observes that there is confusion about what it means
to make a
copy. With a photocopier, making a copy means putting
marks on paper
that can be used in the same way as the original. This
analogy also
applies well to the copying of videotapes. However, it does
not
extend well to making copies of documents on computers.
Simply
copying the bits from a network to an input buffer to some
part of
main memory could be considered making three copies of
the document.
But this kind of bit replication does not constitute the
creation of
three usable copies, and this is the critical point. The
usability
of the copy is what is of interest; publishers and authors
should be
able to expect to be compensated for usable copies.

Further extending this idea, Stefik suggests making a
distinction
between a Copy right and a Transfer right. A
Copy right
makes a new usable digital copy without destroying the old
one. A
Transfer right makes a new usable copy and destroys, or
makes
inaccessible, the old one. The Transfer operation is similar in
behavior to a bank transaction in which a customer
transfers money
from one account to another; once the transfer has
occurred the money
no longer exists in the original account. Similarly, when a
person
loans a book to a friend, the lender no longer has a copy of
the book.
Stefik discusses the possibility of a Loan right, which is
similar to
a Transfer right, but has time limits associated with it. After
the
loan period is over the rights to the use of the document
revert
automatically to the lender and the book is no longer
accessible to
the borrower. The transaction could be set up to offer an
extension to
the loan period, potentially for a fee, as well as offering an
option
to buy the work. Furthermore, a loaning library could offer
a
combination of for-free and for-fee usage rights. For
example, a
library could have five copies of a popular book available for
free
and ten copies available for a small fee. Those patrons who
did not
wish to wait for a free copy to become available could pay a
fee for
faster Loan access (but still pay less than would be required
for
purchasing a copy outright), and this fee could be used to
subsidize
more for-free copies.
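
The difference between the three rights can be made concrete with a small
sketch. The Python below is only an illustration of the behavior described
above, not Stefik's rights language or any part of a real trusted system: the
names Repository, copy_right, transfer_right, loan_right, and expire are
hypothetical, and time is modeled as an abstract integer tick.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy stand-in for one trusted system holding usable copies of works."""
    name: str
    copies: dict = field(default_factory=dict)  # work title -> usable copies

    def count(self, work):
        return self.copies.get(work, 0)

    def _add(self, work):
        self.copies[work] = self.count(work) + 1

    def _remove(self, work):
        if self.count(work) < 1:
            raise ValueError(f"{self.name} holds no usable copy of {work!r}")
        self.copies[work] -= 1


def copy_right(src, dst, work):
    """Copy: the destination gains a usable copy; the source keeps its own."""
    if src.count(work) < 1:
        raise ValueError(f"{src.name} holds no usable copy of {work!r}")
    dst._add(work)


def transfer_right(src, dst, work):
    """Transfer: the usable copy moves, so the total number is unchanged."""
    src._remove(work)
    dst._add(work)


@dataclass
class Loan:
    work: str
    lender: Repository
    borrower: Repository
    due: int  # tick at which the rights revert to the lender
    returned: bool = False


def loan_right(lender, borrower, work, now, period):
    """Loan: a Transfer with a time limit attached."""
    transfer_right(lender, borrower, work)
    return Loan(work, lender, borrower, due=now + period)


def expire(loan, now):
    """Revert the rights to the lender once the loan period is over."""
    if not loan.returned and now >= loan.due:
        transfer_right(loan.borrower, loan.lender, loan.work)
        loan.returned = True
    return loan.returned
```

In this sketch only copy_right increases the number of usable copies in
existence. A loan made at tick 0 with a period of 14 leaves the lender without
its copy until expire is called at tick 14 or later, at which point the copy
moves back without any action by the borrower.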

Stefik argues that if digital libraries made use of a
mechanism like
the Loan right, publishers and authors would not need to be
concerned
about loaning libraries undermining the value of their
digital works
because the number of copies available would be kept
constant. At the
same time, loaned copies will never be lost or turned in late,
because
as soon as the time period has expired the library will
recover the
rights to access its copy of the document.
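
A lending pool that mixes for-free and for-fee copies while keeping the number
of copies constant might be sketched as follows. This is a toy model under
stated assumptions, not a description of any deployed system: the LendingPool
class, its method names, the flat fee, and the integer clock are all
hypothetical.

```python
class LendingPool:
    """Toy library pool for one title with a fixed number of lendable copies."""

    def __init__(self, free_copies, fee_copies, fee):
        self.available = {"free": free_copies, "fee": fee_copies}
        self.fee = fee
        self.revenue = 0.0
        self.loans = []  # outstanding loans as (tier, due_tick) pairs

    def reclaim(self, now):
        """Expired loans revert automatically; no copy is lost or late."""
        still_out = []
        for tier, due in self.loans:
            if now >= due:
                self.available[tier] += 1
            else:
                still_out.append((tier, due))
        self.loans = still_out

    def borrow(self, now, period, pay_if_needed=False):
        """Return the tier granted, or None if the patron has to wait."""
        self.reclaim(now)
        if self.available["free"] > 0:
            tier = "free"
        elif pay_if_needed and self.available["fee"] > 0:
            tier = "fee"
            self.revenue += self.fee  # could subsidize more free copies
        else:
            return None
        self.available[tier] -= 1
        self.loans.append((tier, now + period))
        return tier

    def copies_in_existence(self):
        """Copies on the shelf plus copies on loan; constant by construction."""
        return sum(self.available.values()) + len(self.loans)
```

Whatever sequence of borrow calls is made, copies_in_existence returns the
same number, which is the property publishers would care about in this model.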

Stefik points out [private communication] that it is perhaps
paradoxical that, given the availability of digital documents,
library
patrons might have to wait in order to read a book. On the
other
hand, publishers are trying to find ways to recoup costs and
some fair
way to amortize costs across users. The ``conservation of
copies''
idea is one way that we already understand to amortize
these costs,
but it is not the only way, and alternative models should be
considered.

Once digital property rights are established, mechanisms
are needed
for fast and efficient distribution of the services those rights
support. Bernardo Huberman, Tad Hogg, and other PARC
researchers
have been studying the social and computational aspects of
large
distributed systems. (See
[Huberman et
al.].) In the context of global distributed markets,
Huberman has
developed algorithms for what he calls Market Based
Document
Services. He notes that current technology for
document services
is predominantly manual, requiring the user to be aware of
what
services are available in advance. Having the user specify
the
service can also lead to inefficiencies, since the user
probably does
not know about the most appropriate and cost-effective
resources. For
example, a high-quality printer may be preferable for some
tasks, but
the user might not know of the existence of such a printer,
or how to
send documents to it. To improve this situation, Huberman
has developed a novel way of providing document services
which relies
on computer-mediated auctions. These auctions
automatically
pair the needs of the user with the best matched providers,
using the
Internet as the communication medium.
Based on previous experience with auction-based
algorithms for
resource allocation in distributed computer systems, he
conjectures that
this mechanism will be fast and efficient
enough to lead to true market fair prices, large savings for
customers,
and good matches between the needs of the user and the
available resources.
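
To make the idea of a computer-mediated auction concrete, here is a minimal
sketch of one possible matching step. The article does not specify the auction
format Huberman uses, so this assumes the simplest case: each provider submits
one sealed asking price, the cheapest provider that meets the user's quality
requirement wins, and it is paid its own asking price. The Bid fields, the
quality_dpi attribute, and the run_auction function are hypothetical names
introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    provider: str
    price: float      # asking price for the job
    quality_dpi: int  # hypothetical quality attribute of the provider


def run_auction(bids, min_quality_dpi):
    """Pick the cheapest provider that satisfies the request.

    Returns (provider, price) or None when no provider qualifies.
    Ties on price are broken by provider name so the result is deterministic.
    """
    qualified = [b for b in bids if b.quality_dpi >= min_quality_dpi]
    if not qualified:
        return None
    winner = min(qualified, key=lambda b: (b.price, b.provider))
    return winner.provider, winner.price


bids = [
    Bid("draft-printer", 0.02, 300),
    Bid("press", 0.10, 1200),
    Bid("laser", 0.05, 600),
]
```

With these bids, a request that needs at least 600 dpi is matched to the
"laser" provider rather than the cheaper draft printer, and the user never has
to know in advance which printers exist or how to reach them.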

This article has given a brief account of how some PARC
researchers
expect the nature and use of documents to change as digital
libraries
and electronic publishing continue to expand in importance.
Part II
will describe some of the technology created at PARC in
support of
these emerging phenomena. The focus will be on three
main areas:

Capture, analysis, and presentation of document
images, including document image decoding, image
search
and retrieval, and creation of new paper presentations that
combine
information from multiple sources.

Information access and visualization, including
search, browsing, and visualization of large text collections,
summarization, category assignment, and automatic
detection of
thematic structure.

Middleware for the support of document services,
including
a system architecture to support connectivity of distributed
document
services and a uniform programming interface to document
management
systems.

Some of this work is being used in the NSF-sponsored digital
libraries
projects.

[Van House]
Nancy Van House, ``User Needs Assessment and Evaluation
for the UC Berkeley Electronic Environmental Library
Project,'' in the
Proceedings of Digital Libraries '95: The Second
International
Conference on the Theory and Practice of Digital Libraries,
June
11-13, 1995, Austin, Texas.
http://www.csdl.tamu.edu/DL95/contents.html