ISSN 1082-9873

1 Introduction

The Internet and World Wide Web (Web) afford libraries new possibilities
to disseminate information. For instance, many libraries are already
offering on-line public access catalogs, public ftp sites, or
repositories of Internet resources. In the near future, digital
documents will belong to the document collection of a library
as well. Since these trends are changing the definition of what
a conventional library is, the terms "virtual library"
and "digital library" have come into use.

In this paper, we only use the term "digital library".
In accordance with Gladney, Fox et al. 1994,
we view a digital library as extending the holdings of a conventional
library into digital documents and Internet resources. An Internet
resource is a link to other digital documents which are stored
elsewhere on the Internet. Thus, only the link is under control
of the library, not the document to which the link points. Additionally,
a digital library provides digital catalogs, containing meta-data
about the holdings (i.e., digital documents, Internet resources,
and physical documents like books, journals, etc.). Finally, a
digital library must accomplish, as far as possible, all necessary
services of conventional libraries and must also exploit the advantages
of the technology used. According to Nuernberg, Furuta et al. 1995,
a "digital library system" consists of several components.
To set up and to use a digital library requires a client and server
computing system and tools supporting interaction among people
and between people and client or server software.

Normally, digital libraries are based on Web server technology.
This enables patrons to access the library using a Web client
like Netscape Navigator or Microsoft's Internet Explorer. These
days, many different Web servers are available (e.g., httpd from NCSA
or the various servers provided by Netscape).
In several projects these
servers have proven useful for building digital libraries. Beyond
this, a second generation Internet information system exists,
namely Hyper-G or HyperWave (we
will be using the term "Hyper-G" throughout this paper).
Since Hyper-G has already become widely adopted, our research
group at Dortmund University decided to use Hyper-G to set up
a digital library system called DogitaLS1. DogitaLS1 is an acronym
for "The Dortmund Digital Library System of LS1" (LS1
is the name of our lab at the Computer Science Department of
Dortmund University). LIBERATION
is another European project in which Hyper-G is being used to
set up a digital library. The main idea of this project is to
distribute already existing electronic information to libraries
(e.g., via CD, local or wide area networks). Since this project
has just begun, first results are not yet available.

In this paper, we report on our experiences with using Hyper-G
as the underlying server technology for our digital library system.
The remainder of the paper is structured as follows:

Section 2 gives a brief overview of Hyper-G. Then, section 3 describes
the structure of DogitaLS1. It also shows how we exploit the concepts
of Hyper-G for organizing the holdings of DogitaLS1. In section
4, we report on our experiences with using Hyper-G. This includes
a list of requirements for future Internet information systems
that aim at being used for digital library systems. Finally, we
give a brief conclusion in section 5.

Remark on the references used: References to documents available on the
Internet are made explicit by links pointing to them. All links
were verified in September 1996. Once the paper
is published, the authors will not update the URLs used in this paper
if they change. Thus, some links may become
invalid in the future.

2 Hyper-G at a glance

Hyper-G is a second generation Internet information system that
is being developed under the leadership of Hermann Maurer and
Frank Kappe at the Institute for Information Processing and Computer
Supported New Media (IICM)
at Graz University (Austria). Hyper-G comprises server and client
software. The server
software is available for several different platforms, including
SUN Sparc and IBM AIX. The client software runs
under Microsoft Windows (Amadeus) and under Unix (Harmony). Hyper-G
is compatible with the WWW: widely accepted Web clients, like Netscape
Navigator, can be used to browse a Hyper-G server, and Hyper-G clients
can be used to browse a WWW server like NCSA's httpd.

Among other features, a Hyper-G server provides:

an object-oriented database to store documents and
meta-information about them (i.e., documents are not stored in
the file system of the machine on which the server is installed).

an integrated search engine for meta-information such
as titles, keywords, and authors as well as for full text. The
search engine even allows iterative search;
that is, the result of a search query can be used as
the search space for further queries.

a link database in which all links are stored. Links
are separated from documents and they are also bi-directional.
This is exploited to guarantee link consistency in Hyper-G.

a pre-defined set of attributes which can be set
for each Hyper-G object (e.g., the keyword attribute can be used
to store meta-information about collections or documents).
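
The iterative search mentioned in the list above can be illustrated with a minimal sketch. It is not Hyper-G's search engine or API; the `Document` class, the `search` function, and the sample holdings are hypothetical and only show how the result of one query can serve as the search space of the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a stored document and its meta-information."""
    title: str
    keywords: tuple = ()
    text: str = ""


def search(space, term, attribute="text"):
    """Return the documents in `space` whose attribute contains `term`."""
    needle = term.lower()
    hits = []
    for doc in space:
        value = getattr(doc, attribute)
        # Keywords are stored as a tuple; join them so one substring test suffices.
        haystack = " ".join(value) if isinstance(value, tuple) else value
        if needle in haystack.lower():
            hits.append(doc)
    return hits


# Invented sample holdings, for illustration only.
holdings = [
    Document("Compilers", ("parsing",), "Lexical analysis and parsing techniques."),
    Document("Digital Libraries", ("hypermedia",), "Hypermedia systems for library catalogs."),
    Document("Hypermedia Design", ("hypermedia",), "Link structures in hypermedia systems."),
]

# First query: keyword search over all holdings.
first = search(holdings, "hypermedia", attribute="keywords")
# Second query: full-text search restricted to the result of the first query.
second = search(first, "library")
```

In this sketch, each further query can only narrow the previous result set, which is the property the iterative search relies on.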

3 DogitaLS1

The aim of our research is to examine Hyper-G for defining organizational
structures for digital libraries. To do this, we set up a digital
research library for our lab providing a heterogeneous document
collection, i.e., a collection containing:

meta-data for physical documents,

digital documents including meta-data, and

Internet resources.

In addition, we place special emphasis on services for different
types of users (e.g., librarians and patrons) as well as communication
services. A more detailed description of the communication services
is omitted here. We recommend our technical report "A first Step Toward
Communication in Virtual Libraries"
for those who are interested in this area.

3.1 Organizational Structure of DogitaLS1

We installed six collections at the top level of DogitaLS1. The
collections "Catalogs", "Digital Documents",
and "Internet Resources" serve for storing meta-data
about physical documents, digital documents including their meta-data,
and Internet resources, respectively. The collections "Services"
and "Workspaces" are needed for general services and
communication services. Finally, on-line help is provided in
the collection "On-line Help" (cf. figure 3.1). In
the next sections, we describe how we took advantage of Hyper-G
concepts for the integration of documents belonging to the different
collections.

Remark: All screen shots
are taken from Harmony.

Figure 3.1: Top level of the collection hierarchy in DogitaLS1

The collection "Catalogs"

The collection "Catalogs" includes an alphabetic catalog
in which meta-data about all books, journals, etc.
of our research library is stored. One question was how to model
the corresponding Hyper-G documents so that we can take advantage
of the integrated search engine of Hyper-G. One possible solution
could have been to use attributes that can contain meta-information
about a Hyper-G document. However, except for the keyword attribute,
all the pre-defined attributes relate to a Hyper-G document and
not to books or journals about which a Hyper-G document contains
information. For example, assume the user tochterm creates
a Hyper-G document providing meta-information about a book written
by Aho and Ullman. Then, the attribute Author of the
Hyper-G document contains the name tochterm and, thus,
it cannot be used to store the names of the two authors Aho and
Ullman. We therefore decided to put important information (authors,
title and publication year) about books, journals, etc. into the
title of the corresponding Hyper-G document (cf. figure 3.2).
Some other useful information, like umbrella words, is stored
in the keyword attribute. (Note that the Hyper-G document itself contains
much more meta-information about the cataloged book or journal.)
As a result, a title search can be used to search for books by
author's name, title or publication year. In addition, a keyword
search can be used for searching by means of umbrella words.
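
The title scheme can be sketched as follows. The exact layout of the titles in DogitaLS1 is not reproduced here; the format "authors: title (year)", the function names, and the sample entries are assumptions made for illustration only.

```python
def catalog_title(authors, title, year):
    """Pack authors, title, and year into one title string (hypothetical layout)."""
    return f"{', '.join(authors)}: {title} ({year})"


def title_search(titles, term):
    """Case-insensitive substring search over catalog titles."""
    needle = term.lower()
    return [t for t in titles if needle in t.lower()]


# Invented sample entries; the book titles and years are placeholders.
titles = [
    catalog_title(["Aho", "Ullman"], "An Example Book", 1990),
    catalog_title(["Smith"], "Another Example", 1985),
]

by_author = title_search(titles, "ullman")   # search by author's name
by_year = title_search(titles, "1985")       # search by publication year
```

Because all three pieces of information end up in one string, a single title search covers author, title, and year, which is the effect described above.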

Figure 3.2: Sample title for documents in the collection
"Catalogs"

The collection "Digital Documents"

Digital documents (e.g., research reports and other publications
written by members of our lab) are stored in this collection.
To ease browsing through the document collection, we set up a
small systematic catalog covering the different areas in which
members of our lab do research. We had three requirements to meet
when we started thinking about modeling digital documents in Hyper-G:

each document should be available in different formats (e.g.,
HTML and Postscript);

the title scheme for documents should be the same as the one
used in the collection "Catalogs"; and

meta-data should be separated from documents.

To meet these requirements, the use of digital objects as
introduced by Kahn and Wilensky seemed appropriate to us. In their
notion, a digital object can be regarded as a content-independent
package. The principal components are a unique identifier for
the digital object (its handle), and data. The data in the digital
object package is, itself, a container for streams of bits that
may take multiple forms (e.g., Postscript or HTML).

In DogitaLS1, we modeled digital objects by means of collections.
Note that we now use collections in two different ways: (1)
to define the overall structure of DogitaLS1, that
is, as containers for documents; and (2) to model digital
objects. This might be confusing from a conceptual perspective.
However, there was no other way in Hyper-G to represent
digital objects. Figure 3.3
sketches three digital objects belonging to "Information
Systems" in the classification scheme for Digital Documents.

When such a collection is created, Hyper-G assigns a server-wide
unique handle to it. Each digital document is embedded in a collection
that also contains meta-data about
the document and different formats of the document (normally
HTML and Postscript). The meta-data is stored as a collection
head. In Hyper-G, a collection head is a special document which
is always displayed when a user (e.g., librarian or patron) accesses
a collection. In this way, whenever a digital object is accessed,
the user gets information about the data stored in this digital
object. The following figure shows a digital object represented
as a collection in Hyper-G. The document "Abstract: Kommunikation
in virtuellen Bibliotheken" contains an abstract and meta-data
about the document. The actual document is available in two different
formats, HTML and Postscript.

Figure 3.4: Example for a digital object in DogitaLS1
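
The structure of such a digital object can be summarized in a small sketch. This is only a model of the idea, not Hyper-G code: the `DigitalObject` class, the counter that stands in for the server-assigned handle, and the placeholder contents are assumptions.

```python
from dataclasses import dataclass, field
from itertools import count

# Stand-in for the server, which assigns a server-wide unique handle.
_next_handle = count(1)


@dataclass
class DigitalObject:
    """A collection holding a head (abstract and meta-data) and several formats."""
    head: str
    formats: dict = field(default_factory=dict)  # format name -> content
    handle: int = field(default_factory=lambda: next(_next_handle))

    def access(self):
        """Accessing the object yields the head plus the available formats."""
        return self.head, sorted(self.formats)


# Placeholder contents; only the head title is taken from figure 3.4.
report = DigitalObject(
    head="Abstract: Kommunikation in virtuellen Bibliotheken",
    formats={"HTML": b"<html>...</html>", "Postscript": b"%!PS ..."},
)

head, available = report.access()
```

The head is returned on every access, mirroring the way a collection head is always displayed when a user opens the collection.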

The collection "Internet Resources"

In this collection, links to interesting resources available on
the Internet are collected. Each link is represented as a Hyper-G
document. As a consequence, meta-information about each link can
be stored in the keyword attribute of Hyper-G documents. Due to
this, a keyword search for links is possible in DogitaLS1.

Links to resources in other Hyper-G servers are displayed in the
same manner as if the resource were stored in DogitaLS1. This
is a big advantage for setting up distributed digital libraries
based on Hyper-G. The reason is that in distributed digital libraries,
users need not know where documents are stored - whether locally
or on remote servers. Among other things, the following figure displays
a link to the "Journal of Universal Computer Science"
which is stored on a server in Graz
(Austria). However, the user interface displays this link in the
same way as a collection stored on the server of DogitaLS1.

Figure 3.5: Links to other resources on the Internet

The collections "Services" and "Workspaces"

The current version of DogitaLS1 provides several services (written
in PERL) adapted to the needs of librarians and patrons. All these
services are stored in the collection "Services". The
Common Gateway Interface mechanism is used to provide interaction
among users and between users and DogitaLS1.

In the collection "Workspaces", librarians and patrons
have their own workspace where they can store private documents,
private links, etc. Besides private
documents, logical copies of all the services needed by a given
librarian or patron are stored there. To do this, we exploit
the copy document function of Hyper-G. The advantage
of this approach is that all the services are available at a central
place in DogitaLS1. This simplifies adding new librarians or
patrons to the system and removing them from it. To
add a new user, we only have to create a new workspace and
add copies of the services needed by the new user. To remove
a user, we can just delete the user's workspace without worrying about
the documents and services stored therein.
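
The workspace approach can be sketched as follows. The sketch models a logical copy as a second reference to the same object; the `Library` and `Service` classes, the user name, and the service names are hypothetical and do not reflect Hyper-G's copy document function in detail.

```python
class Service:
    """Hypothetical service entry, e.g. a script invoked through CGI."""

    def __init__(self, name, script):
        self.name = name
        self.script = script


class Library:
    def __init__(self):
        self.services = {}    # central collection "Services"
        self.workspaces = {}  # collection "Workspaces": one entry per user

    def add_service(self, name, script):
        self.services[name] = Service(name, script)

    def add_user(self, user, service_names):
        """Create a workspace and place logical copies of the needed services in it."""
        self.workspaces[user] = {
            "private": [],
            # A logical copy is a reference to the central object, not a duplicate.
            "services": {n: self.services[n] for n in service_names},
        }

    def remove_user(self, user):
        """Deleting the workspace leaves the central services untouched."""
        del self.workspaces[user]


library = Library()
library.add_service("search", "search.pl")   # invented service names
library.add_service("order", "order.pl")
library.add_user("patron1", ["search"])
```

Since the workspace only refers to the centrally stored services, removing a user never removes a service that other users still need.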

4 Experiences in the project

First, this section describes our experiences with Hyper-G. We
also give a list of items that are not covered by Hyper-G but
would have been useful for our digital library system.

4.1 Evaluation of concepts currently available in Hyper-G

In general, both the server and client software run without severe
problems. We are also grateful to the development team in Graz,
which normally helped us immediately when we had technical problems.
Finally, a newsgroup (comp.infosystems.hyperg) may be used for
asking other Hyper-G users questions. Thus, a lot of technical
support exists for Hyper-G. Beyond these general comments,
we evaluate the concepts provided by Hyper-G with special regard
to digital libraries.

Collections

Collections in Hyper-G are very useful for defining the structure
of the holdings of a digital library. Since collections in collections
are allowed, it is also possible to set up more complex structures
for organizing the document collection.

Link consistency

Normally, more than one attempt is required to define the organizational
structure of a digital library. Unfortunately, new requirements
usually arise after documents have already been stored in the
library. In addition, at that time, links between documents have
already been established. Since re-organizing the document collection
in a digital library means moving documents from one collection
to another, links are most probably affected. In contrast to "traditional"
servers, where a link denotes a path to a file on the server machine,
links in Hyper-G are objects separated from documents. The advantage
of this is that in Hyper-G documents can be moved between collections
without worrying about dangling links. This was a useful feature
for prototyping several different collection hierarchies in DogitaLS1.
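
Why separate link objects survive a re-organization can be shown with a minimal sketch. It is not Hyper-G's link database; the `Server` class and the identifiers are hypothetical. The assumption is that links refer to object identifiers rather than to locations, so moving a document changes only its collection membership.

```python
class Server:
    """Toy model: documents, their collections, and a separate link store."""

    def __init__(self):
        self.location = {}  # object id -> name of the containing collection
        self.links = set()  # (source id, target id), stored apart from documents

    def insert(self, doc_id, collection):
        self.location[doc_id] = collection

    def link(self, source, target):
        if source not in self.location or target not in self.location:
            raise KeyError("both ends of a link must exist")
        self.links.add((source, target))

    def move(self, doc_id, new_collection):
        # Only the membership changes; no link has to be rewritten.
        self.location[doc_id] = new_collection

    def outgoing(self, doc_id):
        return sorted(t for s, t in self.links if s == doc_id)

    def incoming(self, doc_id):
        # Bi-directional links: the sources pointing at a document can be found.
        return sorted(s for s, t in self.links if t == doc_id)

    def delete(self, doc_id):
        # Knowing both ends allows removing every link that would dangle.
        del self.location[doc_id]
        self.links = {(s, t) for s, t in self.links if doc_id not in (s, t)}


server = Server()
server.insert("report", "Digital Documents")
server.insert("survey", "Digital Documents")
server.link("report", "survey")
server.move("survey", "Information Systems")
```

After the move, the link from "report" still resolves to "survey", whereas a path-based link would have pointed at the old location.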

Access rights

Different parts of a collection can be made accessible to different
types of users using access rights. An advantage of the concept
of user accounts in Hyper-G is that they do not depend on user
accounts on the server or on any other machine. Therefore,
patrons need not have an account on another
machine. A further advantage is that even links can be protected
by access rights. This means that link filtering is possible.

External tools

Hyper-G provides many external tools (e.g., hginstext,
hginscoll, and hggetdata for inserting text documents into
the server, for inserting collections into the server, and for retrieving
documents from the server, respectively). We were able to use these
tools in different CGI programs (e.g., for inserting already existing
on-line data of our research library into DogitaLS1).

Aspects of distributed digital libraries

In the near future, the holdings of digital libraries will not
only be based on one server but will be distributed among several
servers. Hyper-G already addresses distributed
digital libraries in three ways:

In distributed digital libraries, a document is normally stored
only on one server. However, it will probably be helpful to have
logical copies of the documents on other servers of the digital
library as well. With the copy document function, such
a concept is already available in Hyper-G (cf. the section on the collections
"Services" and "Workspaces").

Clients display all documents and collections in the same
way regardless of whether they are logical copies, links to documents
on other Hyper-G servers, or the physical
document on the server to which the client is connected. Thus,
users are not aware of where the documents are stored. As a result,
a distributed digital library appears to the user in the same
way as a digital library that is not distributed.

Link consistency is an important issue for distributed digital
libraries since documents on the different servers will probably
be interconnected densely. Fortunately, Hyper-G also guarantees
link consistency for links that connect documents on different
Hyper-G servers to one another.

4.2 Requirements for future systems

This section lists features that we miss in the current version
of Hyper-G. To us, these features seem helpful for future Internet
information systems that are intended to be used for digital library
systems.

The search engine of Hyper-G does not provide the proximity
operator NEAR to express queries like A NEAR B. However,
to work around this drawback, users can exploit the iterative search
in Hyper-G (cf. section 2).

Hyper-G presently lacks a Z39.50 interface.
Z39.50 is a protocol for communication between computers for information
retrieval. In particular, it is designed to support the retrieval
of bibliographic records independent of the type of system on
which they are stored. For example, we are currently preparing
another project together with the library of Dortmund University.
On the basis of Z39.50, the meta-data collected in this project
should also be made available and searchable in conjunction with
other meta-data databases. Since Hyper-G does not support this
protocol, we will have to store all the meta-data in an extra
database.

Normally, different users have different access rights to
the holdings of a digital library. This means that a kind of super-user
has to define the access rights for each new user account. Unfortunately,
Hyper-G has no notion of what we call a "sub-super-user".
A super-user in Hyper-G has all access rights for the server software.
This is more than is required to install user accounts. This could
be regarded as a security problem since it might easily happen
that too many rights are granted to a user (e.g., a patron gets
access to collections of librarians). In contrast to a super-user,
a "sub-super-user" has only restricted access rights.
These access rights could also be adapted to the types of users
for whom a "sub-super-user" is allowed to create accounts.
To guarantee security, a "sub-super-user" can grant
only access rights that he holds himself (all of them or a subset).
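
The rule proposed for a "sub-super-user" can be stated as a short sketch. Hyper-G offers no such concept, so everything here is hypothetical: the `Account` class, the flag `may_create_accounts`, and the right names are invented to illustrate that delegated rights must be a subset of the delegator's own rights.

```python
class Account:
    """Hypothetical user account with a set of access rights."""

    def __init__(self, name, rights, may_create_accounts=False):
        self.name = name
        self.rights = frozenset(rights)
        self.may_create_accounts = may_create_accounts

    def create_account(self, name, rights):
        """Create a new account whose rights are a subset of the creator's rights."""
        requested = frozenset(rights)
        if not self.may_create_accounts:
            raise PermissionError(f"{self.name} may not create accounts")
        if not requested <= self.rights:
            extra = ", ".join(sorted(requested - self.rights))
            raise PermissionError(f"{self.name} cannot grant: {extra}")
        return Account(name, requested)


# A sub-super-user responsible for patron accounts only (invented right names).
patron_admin = Account(
    "patron-admin",
    {"read:catalogs", "read:digital-documents"},
    may_create_accounts=True,
)

patron = patron_admin.create_account("patron1", {"read:catalogs"})
```

A request for a right that the sub-super-user does not hold, such as access to the collections of librarians, raises `PermissionError` in this sketch, so a patron cannot receive too many rights by mistake.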

5 Conclusion

On the basis of Hyper-G, we set up a digital library system. This
system consists of the server and client software as well as different
tools that we developed (however, we omitted the description of
tools in this paper). The document collection of DogitaLS1 mainly
consists of documents that are needed in the day-to-day work of
members of our lab. Even though this document collection is rather
small, Hyper-G has proven stable for huge amounts of documents
as well. For example, Graz University runs a Hyper-G server
to store meta-data about its documents. In October 1995, this
server contained more than 300,000 documents.

As we see it, Hyper-G offers several features that are not presently
available in other Internet information systems. In this paper,
we showed how most of these features can be exploited to serve
needs of digital libraries. Since we are convinced of the capabilities
of Hyper-G, we will be using it in a joint project with the library
of Dortmund University. The aim of this project is to build
an electronic archive for master-level and doctoral theses.

Acknowledgment

The first author is grateful to the Max Kade Foundation, New York,
USA, for funding his post-doc at the Center for the Study of Digital
Libraries at Texas A&M University.