GILS

What is it? Where's it going?

ISSN 1082-9873

Abstract -- GILS starts from a basic premise that the Global Information Infrastructure should
enhance the free flow of information. The initiative primarily addresses the ways in which characteristics of
information resources are exposed to searching, a fundamental aspect of information infrastructure that has
policy as well as technology implications. With GILS, all manner of social, political, economic, and
thematic organizations of information exist independently, yet their diverse information locators are
interoperable for searching. GILS adopts existing open standards that achieve this interoperability by
allowing reference to common semantics for characterizing information resources. Sensitive to the world's
many languages, as well as legal and financial issues, GILS adopts the ANSI Z39.50 standard to specify
how electronic network searches can be expressed and how results are returned. GILS-compliance is a
particular way in which servers support searching for the characteristics of any kind of information, at any
level of aggregation. Such "locator records" may be hand-crafted as in traditional cataloging, or may be
generated dynamically. GILS-compliant servers must recognize the numbers for certain registered elements
and search attributes, but any specific set of locator records may use any or none of these, or may use
locally-defined elements. By applying a long-standing international standard, GILS takes advantage of
existing networks and software already providing access to a vast array of very valuable resources.
Although GILS will certainly evolve radically over the coming decades, some basic principles should
remain constant, including: adoption of open standards; support for international use; support for diverse
points of view; and preservation of access to accumulated knowledge.

In common with many other initiatives worldwide, the Government Information Locator Service (GILS) has
a simple goal: make it easier for people to find information.

Obviously, this is not merely a technological challenge. The Global Information Society now taking form
will have deep implications worldwide for many decades to come. William Y. Arms, with the Corporation
for National Research Initiatives, has written: "Early networked information systems were developed
by technical and professional communities, concentrating on their own needs....The digital library of the
future will exist within a much larger economic, social and legal framework." [ 1 ]

I think we are at a rare moment in history where specific public policy and technology choices today can
have profound and widespread influence. And, no choices are more crucial than the mechanisms by which
people find information.

The ideas behind the Government Information Locator Service have many roots. One of these is ongoing
work concerning data management for Global Change research, which focuses on topics such as Global
Warming and Loss of Biological Diversity. These are aptly named "Grand Challenges", especially as it has
now become obvious that Earth systems are intrinsically entwined with many other kinds of systems--social,
economic, political, etc.

One of the fundamental realities in this arena is that researchers and policy-makers today and in the future
will need access to long baseline data. There is a fundamental need for continuity not only among world
data centers, but with the vast treasure-houses of literature and other information maintained by libraries,
museums, and archives worldwide. Current and future records must also be managed appropriately, but it is
fundamentally not possible to fully anticipate what data and information will be needed. In some areas of
basic research today, it may be 40 years until societies know the right questions to ask and what crucial data
and information should have been maintained.

Given this context, a reasonable question to ask is this:

What should we do today to influence the global information infrastructure so as to maximize the
accessibility of relevant data and information worldwide for decades to come?

Among science organizations and throughout democratic societies, there is a basic premise that information
infrastructures should enhance the free flow of data, information, and ideas. Yet, it is also recognized that the
global information infrastructure will be built primarily to serve interests other than long-term Earth systems
monitoring and natural resources management. If the free flow of ideas is to be enhanced in a way that is
sustainable over the long term, it will be through finding common solutions with commercial and
entertainment interests rather than through government fiat.

A great strength of the emerging Global Information Society is the wide diversity evident in the many
separate but overlapping perspectives and authority regimes. At this moment in history, virtually all nations
appreciate that they should not dictate to other nations how to manage information, and that highly
centralized approaches are often dangerous even from a strictly operational perspective. It follows that any
information standard intended for global use should not presume central authorities or a master index of
information. Rather, global information standards need to embrace interoperable but very decentralized
approaches.

Although more complex to design, such interoperable and decentralized approaches are entirely feasible.
Common standards employed in bibliographic cataloging provide a measure of interoperability across
independently maintained libraries, while allowing wide latitude in how collections are developed and
organized. Anyone can use the catalog standards to any extent desired and for any purpose whatsoever.
Open standards from the library and information services communities have now evolved to take advantage
of public networks such as the Internet. Happily these international standards are sensitive to the world's
many languages, and to legal and financial issues such as copyright, security, privacy, and payments.

In accepting this challenge to influence the Global Information Infrastructure in a fundamental way, one
must accept the bare fact of the currently primitive facilities for dealing with human knowledge in its myriad
contexts. People have barely begun to understand how to handle complex information at the personal and
corporate level. It is even less clear how to facilitate the exchange of knowledge or to model the
interactions of information organizations among whole societies.

Continuous evolution and drastic revolutions must be expected. Yet, there are immediate strategies that
seem worthwhile to pursue. While access mechanisms such as user interfaces, data base technologies, and
network protocols change at a frequency of every few months, certain fundamental aspects have persisted
for many decades and continue to carry long-term implications. For example, people searching for
information resources typically need clues such as: What is it about? Who created it? When was it
published? Where is it available?

As technology, GILS primarily addresses one fundamental aspect of information access--the ways in which
the characteristics of information resources are exposed to searching. It happens that this technological
approach is also useful in the context of policy initiatives intended to promote diversity of information
access without sacrificing coherence.

One definition of GILS is "A decentralized collection of locators and associated information services
used by the public either directly or through intermediaries to find information." [ 2 ] This definition
highlights the role of intermediaries and expresses the decentralized nature of GILS, but in some ways it
calls for further elaboration.

Service

The word "service", for example, carries different meanings depending on
whether GILS is discussed in the context of policy, technology, or standards. For policy people, GILS can
be viewed as a "service" in the sense of a set of human, organizational, and technological facilities that help
people find information. Extending this policy sense to technology, implementors sometimes refer to GILS
not as a service but as a specific system built of software and utilized in support of the policy goals. To my
mind, such systems are examples of compliance with GILS, just as a particular building may be an example
of compliance with a building code.

For the Digital Libraries audience, I would like to focus on the precise meaning of GILS in the standards
context. In the general architectural model of networks, GILS is a service in the sense of an application
profile that is part of a service definition. This service performs specific functions useful for locating
information. It is available for use by higher level applications and makes use of lower level components
such as bitways.

Locator

A locator is defined as an information resource that identifies other information
resources, describes the information available in those resources, and provides assistance in obtaining the
information. It is typically modeled as a database of locator records, each of which is a set of related data
elements descriptive of various characteristics of an information resource.

Depending on how they are created and the nature of the information resource, locator records can be
known by many other names: metadata, meta-information, directories, catalogs, abstracts, etc.

Outside of a specific policy context, there is no prescription for what is an appropriate level of aggregation.
GILS locator records already exist for everything from individual pamphlets to multi-national programs. North Carolina is using GILS locator records to describe
individual fields within databases throughout the state. The Government Printing Office has created GILS
locator records at the level of entire Federal agencies.

Servers acting as GILS locators are also information resources and can themselves be described by a GILS
locator record in other GILS locators. Using the Linkage element, the separate GILS locators can be
exploited as a network for distributed search, and traversal of that network can be informed by any of the
resource characteristics described (e.g., Subject Terms, Originator, Distributor Name, Access Constraints).
This recursive feature should make GILS useful in attempts to solve the "query routing" problem of Internet
searching.
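
The recursive arrangement described above can be pictured as a small graph traversal. What follows is only a minimal sketch, assuming an in-memory dictionary stands in for networked locators; the locator names, the `subject_terms` and `linkage` field names, and the `route_query` function are all hypothetical and are not taken from the GILS profile.

```python
from collections import deque

# Hypothetical locators. Each holds locator records; a record whose
# "linkage" is not None points at another locator that is itself searchable.
LOCATORS = {
    "root": [
        {"title": "Water data locator", "subject_terms": {"water", "hydrology"}, "linkage": "water"},
        {"title": "Museum locator", "subject_terms": {"art", "artifacts"}, "linkage": "museum"},
    ],
    "water": [
        {"title": "Stream gauge archive", "subject_terms": {"water", "streamflow"}, "linkage": None},
        {"title": "Regional water locator", "subject_terms": {"water"}, "linkage": "regional"},
    ],
    "museum": [
        {"title": "Pottery collection", "subject_terms": {"art", "ceramics"}, "linkage": None},
    ],
    "regional": [
        {"title": "Well logs", "subject_terms": {"water", "groundwater"}, "linkage": None},
    ],
}


def route_query(start, term):
    """Breadth-first traversal that follows a link only when its record matches."""
    hits = []
    visited = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        for record in LOCATORS[name]:
            if term not in record["subject_terms"]:
                continue  # the description does not match, so skip this branch
            target = record["linkage"]
            if target is None:
                hits.append(record["title"])  # a leaf resource, not another locator
            elif target not in visited:
                visited.add(target)
                queue.append(target)
    return hits, visited


hits, visited = route_query("root", "water")
```

In this toy network the museum locator is never contacted for a query about water, which is the sense in which descriptive elements can inform routing.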

Information

It is a common misconception that GILS locator records are merely another
variety of "META" tagging for Internet information resources. While GILS locator records are quite useful
for networked information resources, GILS locator records are designed to act as pointers to ALL kinds of
information including people, organizations, events, artifacts, etc. These and countless other sources of
information worldwide are not now, and in some cases will never be, either electronic or network
accessible. GILS locator records must have a representation on electronic networks; but the referenced
information resources need not.

The "G" word

When told that the "Government" Information Locator Service was also
intended to serve as a "Global" Information Locator Service, the late Paul Evan Peters remarked that the
same "G" would continue to be useful when GILS becomes the "Galactic" Information Locator Service.
Although GILS is being implemented in multiple countries and so might be considered "international" if not
"global", it is also intended to be "global" in the sense of extensible to various other kinds of information.
(Sebastian Hammer once suggested that GILS could be taken to stand for "Generic" Information Locator
Service.)

A great diversity of application contexts is an intentional design feature of GILS. All of these disparate
social, political, economic, and/or thematic organizations and other entities are free to independently
organize relevant parts of information space for their own particular needs. They do not need to subscribe to
a single view of information, nor do they need to subordinate their locator to some "mother of all GILS".
Although their disparate goals lead to various usage guidelines, the one thing they have in common with all
other GILS is that their locators are interoperable for searching.

Interoperability among the many different GILS-compliant servers is defined through the GILS Application Profile. A profile is "The
statement of a function and the environment within which it is used, in terms of a set of one or more
standards, and where applicable, identification of chosen classes, subsets, options, and parameters of those
standards. A set of implementor agreements providing guidance in applying a standard interoperably in a
specific limited context." [ 3 ]

The GILS Application Profile is an Implementors Agreement developed and coordinated internationally
through the Open Systems Environment Implementors Workshop (OIW), a "de jure" standards process.
(Following its acceptance as an international profile, the GILS Application Profile was also adopted as U.S.
Federal Information Processing Standard (FIPS) Publication 192.)

The specific group that manages the GILS Application Profile is the OIW Special Interest Group on GILS
(OIW/SIG-GILS). The OIW/SIG-GILS meets approximately once a month. Meetings are open to anyone
and meeting summaries are posted under "GILS Discussions" at http://www.usgs.gov/gils/gilscopy.html.

The GILS Application Profile is defined only in the context of peer networks, such as OSI and TCP/IP. The
profile assumes a client-server architecture. The GILS Application Profile does not constrain clients at all,
but is designed to anticipate clients acting as automated agents (i.e., the client's human is not monitoring the
search session as it occurs). End-user clients, gateways, and agents have free rein to interact with GILS
servers in either a stateless or stateful manner.

The GILS Application Profile only specifies the behaviors of server software in conversation with client
software. The profile specifies server behavior in terms of an abstraction layer--the service definition is
independent of how the content is actually managed at the server.

A GILS-compliant server appears to a client as though holding a searchable set of information locator
records. Each locator record can characterize other information of any kind, at any level of aggregation.
These may be hand-crafted catalog records or on-the-fly products of an automated abstracting or
classification process. For example, a locator record that describes another server might include a listing of
the words most characteristic of that server's contents and so act as an intermediary resource for information
discovery.
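
The idea that the service definition is independent of how content is managed can be sketched as one search entry point over interchangeable backends. This is an illustrative sketch only: the `LocatorBackend` protocol, both backend classes, and `handle_search` are hypothetical names, and a real server would speak Z39.50 rather than call Python methods.

```python
from typing import Protocol


class LocatorBackend(Protocol):
    """Anything that can answer a search with a list of record titles."""

    def find(self, term: str) -> list[str]: ...


class InMemoryBackend:
    """Stands in for hand-crafted catalog records."""

    def __init__(self, titles):
        self._titles = list(titles)

    def find(self, term):
        return [t for t in self._titles if term.lower() in t.lower()]


class GeneratedBackend:
    """Stands in for records produced on the fly from raw documents."""

    def __init__(self, documents):
        self._documents = dict(documents)

    def find(self, term):
        return [
            f"Auto-record for {name}"
            for name, body in self._documents.items()
            if term.lower() in body.lower()
        ]


def handle_search(backend: LocatorBackend, term: str) -> list[str]:
    # The caller sees the same behavior whichever backend is plugged in.
    return sorted(backend.find(term))


catalog = InMemoryBackend(["Stream Gauge Archive", "Museum Holdings"])
generated = GeneratedBackend({"report.txt": "Stream sediment survey", "memo.txt": "Budget notes"})
```

Calling `handle_search(catalog, "stream")` and `handle_search(generated, "stream")` goes through the same entry point even though the records come from different places.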

Searches can be content-based using full-text searching or some other manner of feature extraction. Or, the
search may take advantage of registered attributes such as structured elements (e.g., Title, Author, Subject)
and relations (e.g., Equal, Greater Than).
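
As a rough illustration of attribute-based searching, the sketch below builds a query string in Prefix Query Format (PQF), a text notation used by Index Data's YAZ toolkit. PQF is not mentioned in this article and is chosen here only as a convenient way to write such a query down. The numbers are Bib-1 values as commonly documented (Use 4 = Title, 1003 = Author, 21 = Subject heading; Relation 3 = Equal, 5 = Greater than) and should be checked against the registry; the function names are hypothetical.

```python
# Bib-1 attribute values as commonly documented; verify against the registry.
USE = {"title": 4, "author": 1003, "subject": 21}
RELATION = {"equal": 3, "greater_than": 5}


def pqf_term(field, value, relation="equal"):
    """Return one attributed search term, e.g. @attr 1=4 @attr 2=3 "maps"."""
    if '"' in value:
        # Quoting rules are left out of this sketch, so refuse such input.
        raise ValueError("double quotes are not handled in this sketch")
    return f'@attr 1={USE[field]} @attr 2={RELATION[relation]} "{value}"'


def pqf_and(left, right):
    """Combine two PQF expressions with the prefix AND operator."""
    return f"@and {left} {right}"


query = pqf_and(pqf_term("title", "global change"), pqf_term("author", "Arms"))
```

Attribute type 1 carries the structured element being searched and type 2 carries the relation, so the same term can be reinterpreted simply by changing the numbers.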

GILS defines about 70 registered attributes to control searching, with explicit semantics for each
element as it is used to characterize information resources. Another 100 registered attributes are
available because they are inherited from the Z39.50 Bib-1 Attribute Set. GILS locator
records can incorporate locally-defined elements as well. GILS-compliant servers must be capable of
searching with the required attributes and handling all of the registered elements. However, specific sets of
locator records available from a GILS-compliant server may have any number of these or other elements,
and may also have none at all. (Records without any element structure are searched full-text through a
special attribute called "Anywhere".)
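
The behavior described above, where records may carry many elements, few, or none, can be sketched as follows. This is a simplified illustration, not the profile's matching rules: the record layout and the `search` function are hypothetical, and the numeric values (4 for Title, 1003 for Author, 1035 for Anywhere) are assumed Bib-1 use values that should be checked against the registry.

```python
# Assumed Bib-1 use values; verify against the registry before relying on them.
TITLE, AUTHOR, ANYWHERE = 4, 1003, 1035

RECORDS = [
    {"elements": {TITLE: "Stream Gauge Archive", AUTHOR: "Water Resources Division"}, "text": ""},
    {"elements": {TITLE: "Annual Report"}, "text": "Summary of stream monitoring programs."},
    {"elements": {}, "text": "Unstructured note about stream sediment sampling."},
]


def search(records, attribute, term):
    """Return indexes of records matching term under the given attribute."""
    term = term.lower()
    matches = []
    for index, record in enumerate(records):
        if attribute == ANYWHERE:
            # Full-text: look across every element value plus the free text.
            haystack = " ".join([*record["elements"].values(), record["text"]])
        else:
            # Structured: a record lacking the element simply cannot match.
            haystack = record["elements"].get(attribute, "")
        if term in haystack.lower():
            matches.append(index)
    return matches
```

The server accepts every attribute for every record set; a record with no element structure is reachable only through the full-text path.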

Because GILS provides interoperability through reference to the registered semantics of elements, it is not
necessary to specify a canonical format for structured metadata. Natively or through gateways, the service
can support interoperable search of many different metadata structures--HTML, SGML, X.500, SQL
databases, PURL's, Handles, Dublin Core, SOIF's, IAFA, Internet mail, DIF's, Whois++ templates, spatial
metadata, etc. Whenever appropriate, servers simply map local semantics to the registered elements. (In
Z39.50 version 3, there is an "Explain" facility by which clients can dynamically discover what attributes
are available.) Multi-lingual searching is also facilitated because the elements are referenced by number--
user interfaces simply translate the number to whatever is appropriate for the particular language chosen.
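
Because elements are referenced by number, a user interface can attach whatever labels suit its audience. A minimal sketch, assuming illustrative element numbers and a hypothetical `render` function; the label tables are examples and not an official translation of any registry.

```python
# Element numbers are illustrative; labels are example translations only.
LABELS = {
    "en": {4: "Title", 1003: "Author", 21: "Subject"},
    "fr": {4: "Titre", 1003: "Auteur", 21: "Sujet"},
    "de": {4: "Titel", 1003: "Autor", 21: "Thema"},
}


def render(record, language):
    """Format a record (element number -> value) with language-specific labels."""
    labels = LABELS[language]
    lines = []
    for number, value in sorted(record.items()):
        # Fall back to the bare number when no label is known.
        label = labels.get(number, f"Element {number}")
        lines.append(f"{label}: {value}")
    return lines


record = {1003: "Smith", 4: "Atlas", 9999: "local note"}
```

The stored record never changes; only the presentation layer differs between `render(record, "fr")` and `render(record, "de")`.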

The GILS Application Profile has a mixture of components from the Internet, such as URI's and MIME
types, as well as OSI components such as ISO 10163 (ANSI Z39.50). There is a special RFC that specifies
how the OSI functions inherent in Z39.50 get mapped onto the functions available in TCP/IP.

The Z39.50 standard allows different information databases to be interoperably searchable. (In 1994, 16.5%
of public libraries serving populations over 100,000 were using Z39.50; in 1995 this increased to 23.8%.)
Z39.50 specifies how electronic network searches can be expressed and how results are returned. As stated
by Sebastian Hammer of Index Data in Denmark and John Favaro of Intecs Sistemi in Italy: "...the
essential power of Z39.50 is that it allows diverse information resources to look and act the same to the
individual user. At the same time, it allows each information system to assume a different interface for every
user, perfectly suited to his or her particular needs." [ 4 ] (For an
overview discussion of how Z39.50 complements the World Wide Web, see http://www.usgs.gov/gils/webz3950.html.)

The GEO profile for geospatial data locators is currently under development and is essentially a
superset of GILS. The new Catalog Interoperability Profile being tested by the Committee on Earth
Observation Satellites is designed to be compatible with GILS. Also, the profile being developed for Digital
Collections (previously known as "Digital Libraries") is compatible with GILS semantically, and the same
should be true of the forthcoming profile for the interchange of information among museums worldwide.
(For further information on Z39.50 Profiles, see http://lcweb.loc.gov/z3950/agency/profiles.html.)

The semantics of GILS metadata elements are also a major component of the recent Internet search protocol proposal
developed for Web "metasearcher" companies under the Stanford Digital Library project.

It is common experience that many separate communities use bibliographic techniques to characterize data
and information resources. Unfortunately, as each community chooses different tags for the bibliographic or
metadata elements, any commonality that may actually exist gets completely obscured. The usual result is
that there is no functional interoperability between the catalog services, unless some organization is able to
force the communities to accept imposition of a common format. Such strong-arm tactics may be useful at
times, but they are clearly inappropriate on a long-term and global scale.

At its center, the thrust of GILS from a standards perspective is to encourage interoperability at the semantic
level. All information resources should not be characterized in the same way--the appropriate searching
characteristics should be a function of the information itself but also of the cataloger's purpose and the
searcher's needs. Again Bill Arms provides a useful conceptual reference: "Digital objects are the
basic building blocks of the digital library, but users of the library usually want to refer to items at a higher
level of abstraction...Which digital objects should be grouped together can not be specified in a few
dogmatic rules. The decision depends upon the context, the specific objects, their type of content and
sometimes the actual content." [ 5 ]

As one example, for some combinations of resource, cataloger, and searcher, the semantics of "system
name" could be equivalent to "title", "principal investigator" could be equivalent to "author", and
"keyword" could be equivalent to "subject". In that case, there is clearly a ripe opportunity for
interoperability. With such a semantic mapping, the European science community's Catalogue of Data
Sources could be interoperable with the library community's OPAC (Online Public Access Catalog) Network
in Europe.
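
The equivalences named in this example can be sketched as two tag maps that point at the same shared element numbers. Everything here is hypothetical: the records, the map names, and the `search_all` function are made up for illustration, and the numbers are placeholders for whatever registered elements the two communities agree to reference.

```python
TITLE, AUTHOR, SUBJECT = 4, 1003, 21  # shared element numbers (illustrative)

# Each community keeps its own tags and maps them to the shared numbers.
SCIENCE_TAGS = {"system name": TITLE, "principal investigator": AUTHOR, "keyword": SUBJECT}
LIBRARY_TAGS = {"title": TITLE, "author": AUTHOR, "subject": SUBJECT}

SCIENCE_RECORDS = [
    {"system name": "Alpine Glacier Survey", "principal investigator": "Garcia", "keyword": "glaciers"},
]
LIBRARY_RECORDS = [
    {"title": "Glaciers of the Alps", "author": "Garcia", "subject": "glaciers"},
    {"title": "River Atlas", "author": "Lindqvist", "subject": "rivers"},
]


def to_shared(record, tag_map):
    """Re-key a local record by shared element number, dropping unmapped tags."""
    return {tag_map[tag]: value for tag, value in record.items() if tag in tag_map}


def search_all(element, term):
    """Search both catalogs through the shared semantics of one element."""
    sources = [
        ("science", SCIENCE_TAGS, SCIENCE_RECORDS),
        ("library", LIBRARY_TAGS, LIBRARY_RECORDS),
    ]
    hits = []
    for name, tag_map, records in sources:
        for record in records:
            shared = to_shared(record, tag_map)
            if term.lower() in shared.get(element, "").lower():
                hits.append((name, shared.get(TITLE, "")))
    return hits
```

Neither catalog changes its own tags; a single author search reaches both because "principal investigator" and "author" resolve to the same element.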

The library and information services community has always been concerned with such interoperability
problems. Over many decades, they developed a variety of schemes such as the Dewey Decimal System, the
Library of Congress Subject Headings, Machine-Readable Cataloging (MARC), the Anglo-American
Cataloging Rules (AACR), and the Z39 family of information standards.

The theory and practice of information science is quite distinct from computer science and database
technologies, and unfortunately the approaches to interoperability have been quite different. In relational
data bases, for example, one foray into semantic interoperability was the Information Resource Dictionary
System. A common semantic registry is also a feature of the Electronic Data Interchange standard and of the
X.500 family of standards.

Now that so many people are cross-training in both the information science and computer science fields
(witness the Digital Libraries initiative, for example), perhaps the time is ripe for enhanced semantic
interoperability approaches. I am aware of relevant work at the Distributed Systems Technology Centre in
Australia on a CORBA Type Management Service, and a Catalog Services Request for Information in the
context of the OpenGIS Abstract Specification. Although there may not yet be sufficient momentum to deploy
such a comprehensive architecture, it may be timely to explore semantic interoperability between the
Z39.50 standard and mechanisms such as the Common Indexing Protocol, Resource Description Messages, and
the Platform for Internet Content Selection.

By applying an existing standard in wide use for many years, GILS takes advantage of existing networks
and software to access a vast array of valuable resources, including
hundreds of libraries, as well as museums and archives worldwide. Together with spatial data catalogs,
these professionally maintained resources provide free access to information resources collectively valued
in the tens of billions of dollars, with even more information available on a fee basis from commercial
information services.

WAISserver commercial software was recently acquired by Fulcrum
Technologies, which will distribute a GILS-compliant release and will also integrate GILS-compliance
into Fulcrum Surfboard. The Online Computer Library Center SiteSearch server is being made
GILS-compliant in support of the Solinet Public Information Project. GILS-compliance is also being
developed by Verity, SIRSI, and Sovereign-Hill. (The Sovereign-Hill search engine is a commercialization
of a product of the Center for Intelligent Information Retrieval, called Inquery.) Elsevier is looking at
GILS for shared search access to science publications through their new Science Direct service.

Because the GILS Application Profile has no user interface, access to GILS-compliant servers must be
accomplished through gateways, clients, or agents. Gateway freeware and toolkits are available for World
Wide Web and for X.500. An interesting technology proposal is the building of interoperability between
GILS and the X.500 directory that underlies the U.S. Government-wide Electronic Directory. This
government-wide directory is intended to include White Pages (employees), Yellow Pages (organizations),
Blue Pages (services), and Green Pages (documents). Once there are two-way gateways for
Z39.50/GILS-compliant servers and these X.500 directory databases, locator information in both schemes
would be readily accessible from the access mechanisms of either.

Any client software capable of access to a server compliant with Z39.50 version 2 or 3 can access
GILS-compliant servers. Extra capability is provided by GILS-aware clients such as the BookWhere 2000
product from Sea Change Corporation, and products under development such as SIRSI Vizion and the new Java
client being developed by Blue Angel Technologies.

The GILS Application Profile specifies, though it does not mandate, spatial searching by a "bounding box"
of latitude, longitude pairs. This extension of search beyond text is an important feature of Z39.50. It has
allowed the protocol to be applied to searching for chemicals by bond angles and for the searching of gene
sequences. Other kinds of pattern-matching can be supported as well.
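
A bounding-box search reduces to a rectangle overlap test. The sketch below is a minimal illustration under stated assumptions: coordinates are decimal degrees, boxes do not cross the 180-degree meridian, and the `BoundingBox` class and its field names are hypothetical rather than drawn from the profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A latitude/longitude rectangle in decimal degrees (no antimeridian crossing)."""

    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other):
        # Two rectangles overlap when their ranges intersect on both axes.
        return (
            self.west <= other.east
            and other.west <= self.east
            and self.south <= other.north
            and other.south <= self.north
        )


def spatial_search(records, query_box):
    """Return titles of records whose footprint overlaps the query box."""
    return [title for title, box in records if box.overlaps(query_box)]


RECORDS = [
    ("Chesapeake Bay survey", BoundingBox(-77.5, 36.5, -75.5, 39.7)),
    ("Lake Tahoe bathymetry", BoundingBox(-120.3, 38.9, -119.9, 39.3)),
]
query = BoundingBox(-78.0, 38.0, -76.0, 40.0)
```

Here `spatial_search(RECORDS, query)` selects only the record whose footprint intersects the query rectangle, with no text matching involved.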

To achieve a tighter integration of GILS with World Wide Web servers, the USGS crafted a search module
for the freeware Apache Web server. This module embeds into the Web server the Index Data freeware
indexer, gateway, and client
components. Without requiring dual administration and security, the GILS-compliant server is accessible in
a presentation mode through HTTP and in a search protocol mode through Z39.50.

The same approach can be applied to the new freeware AOLserver, as well as the Netscape Commerce
Server, the Microsoft Internet Information Server, and others. The AOLserver looks especially interesting
since it includes the Informix Illustra database management system. Illustra is the commercialized form of
the Postgres object-oriented database management system. The USGS has already demonstrated porting of the
Index Data GILS-compliant protocol and server freeware to Postgres, so it should be straightforward to
provide a GILS-compliant Illustra DataBlade. It is a major positive development to have an object-oriented
relational database management system integrated with a freeware Web server. And, it is clearly a great
advantage that Illustra already supports spatial search facilities.

As a long-term infrastructure initiative, it is clear that GILS will be evolving in many ways. Yet, I think
there are some basic principles from which GILS should not diverge over the decades:

GILS must continue to adopt open standards, and be fully coordinated through the international
voluntary standards processes.

GILS must continue to support international use. It must be sensitive to the world's many languages
and technical standards. It must also accommodate policy, legal, and financial issues, including copyright,
security, confidentiality, and coordination of payments.

Policy and technology choices must support the diversity of sources, and points of view, in our Global
Information Society.

GILS should be implemented on networks, but designed to locate information in all media and forms.

The crucial role of intermediaries should be clearly recognized. It is neither feasible nor desirable to
establish master repositories to serve all the world's information needs.

GILS should enable content owners and intermediaries to draw from other locators. It should also
make it easy for them to make their value-added products known through the same mechanism.

GILS should contain no inherent preference toward any particular hierarchy or other way of organizing
information. Rather, it should allow many organizing structures to co-exist.

GILS must be designed to handle the meaning of information in different contexts. It must be
extensible into the many ways people extract information from data.

GILS must be built for the future, fully mindful that the Global Information Infrastructure now taking
form will be with us for the long term. But, GILS must also preserve access to accumulated knowledge
stored in the cultural treasure houses we know as libraries, museums, and archives worldwide.

The adopted standards must be evolutionary. They must accommodate the variable pace at which
different parts of the world become full participants.

We cannot today predict what will be the appropriate technical basis for GILS over the long term.
Evolutionary and sometimes revolutionary changes must be expected and accommodated. In the current
revolutionary turmoil, the technology strategy must be flexible but careful not to compromise any of the
basic principles.