NOTICE:

This Legacy journal article was published in Volume 7, June 1998, and has not been
updated since publication. Please use the search facility above to find regularly-updated information about
this topic elsewhere on the HEASARC site.

The Générateur de Liens Uniformes - Uniform Link Generator (GLU) system developed by the Centre de Données astronomiques de Strasbourg (CDS) to manage heterogeneous, distributed databases is presented, as well as the AstroGLU prototype of a discovery tool for querying heterogeneous databases.

Introduction

The CDS, a French laboratory of the Institut National des Sciences de l'Univers (INSU) located at the Strasbourg Astronomical Observatory, is a data center devoted to the collection and worldwide distribution of astronomical data and related information (Genova et al., 1996). It has developed widely used on-line services, such as the SIMBAD database, which contains names, bibliography, basic data, and some measurements for more than 1,600,000 astronomical objects outside the solar system (and more than 4,500,000 object names). The CDS also provides a catalog service, with ftp access to more than 2,000 catalogs and published tables, the VizieR catalog browser, with more than 1,600 catalogs and published tables, including the HIPPARCOS catalogs, and the ALADIN image database and interfaces. It also manages bibliographic information (in particular in collaboration with Astronomy & Astrophysics) and information retrieval tools, the Dictionary of Nomenclature of astronomical objects, and Yellow Page services. All these are different heterogeneous databases, to which mirror copies of bibliographic information have been added in recent years: the European mirror copies of the NASA Astrophysics Data System (ADS) (Eichhorn et al., 1996), and several on-line journals: the Astrophysical Journal, Letters and Supplement Series, the Astronomical Journal and the Publications of the Astronomical Society of the Pacific. The CDS home page at http://cdsweb.u-strasbg.fr/CDS.html gives access to the services maintained and hosted by CDS.

When the World Wide Web became available, it was clear that it was a very powerful tool for gaining easy access to on-line services. In addition, it significantly increased the impact of the services, by allowing navigation between them, in a way quite transparent for the user. This is evident for the CDS services (Wenger et al., 1996), as one could for instance navigate from one object in SIMBAD, to the 'Dictionary of Nomenclature' information about the name(s) of this object, then from the Dictionary entry to the list of origin for this object name, taken from the catalog service, etc. Moreover, external services could be included in this chain, in particular the bibliographic services hosted by the CDS, the ADS and the journals. For instance, the user will be able to navigate from one object in SIMBAD, to the references citing the object (this is included in SIMBAD), then to the ADS services for each reference and to the full paper from the journal editors when available. As a second step, links were also needed to remote services. For instance, the first link between SIMBAD and observatory archives has been implemented with HEASARC: for all SIMBAD objects which have at least one name from a 'high energy catalog', a query can be sent directly from the SIMBAD result page to the HEASARC archive. A prototype link between VizieR and the VLA FIRST survey archive, and another one between SIMBAD and the IUE archive, are also operational.

In this context, the CDS decided to develop a tool for ensuring efficient interoperability between its services: the Uniform Link Generator or GLU. In fact, this tool has proved useful in a much wider context: service providers generally need to access remote and heterogeneous databases, either to build links between services, as in the case of the CDS, or to build general discovery tools in a scientific domain or a set of domains, as illustrated e.g. by the AstroBrowse initiative (see the article by McGlynn et al. elsewhere in this issue).

The development of the World Wide Web is a very important milestone in unifying access to remote databases, by providing a single user interface, a single network protocol, and a hypertext link mechanism via Uniform Resource Locators (URLs). But while creating URLs is easy, maintaining them is the real challenge for database managers. One important goal is certainly to avoid the all too frequent 'Error 404 - Not found', which means providing the data manager with a tool to define and easily maintain a unique access key for a given information location.

For this purpose, the concept of a URN (Uniform Resource Name) has been discussed for several years, in particular in an Internet Engineering Task Force (IETF) working group (Sollins et al., 1994), but there is presently no implementation of such a system. Therefore, everyone is using URLs, and everyone knows that in practice any component of a given URL (hostname, directory, resource name, parameter syntax) can be modified at any time, leading to frequent failures in the links.

The GLU tool

The GLU allows the database manager to use symbolic names instead of hard-coded URLs in data. In practice, two complementary tools are implemented:

- the GLU Dictionary, which links the symbolic names with their corresponding URLs and other relevant information;

- the GLU resolver, which replaces on the fly symbolic names by the corresponding URLs.

In a Web document, it is then possible to use the symbolic names (GLU tags) instead of URLs. Every time the Web page is requested, the GLU resolver replaces the GLU tags with the corresponding URLs. If a tag carries parameters, they are inserted at the proper location in the anchor.
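
As a rough illustration of this resolution step, the following Python sketch replaces symbolic tags in a page with URLs taken from a small dictionary. The tag syntax `<&Name parameter>`, the entry names, the `$1` placeholder and the example.org URLs are all assumptions made for this example; they are not taken from the actual GLU Dictionary or resolver.

```python
import re
from urllib.parse import quote_plus

# Hypothetical local view of the dictionary: symbolic name -> URL template.
# "$1" marks where the first parameter goes. Names and URLs are invented.
DICTIONARY = {
    "Example.ObjectByName": "http://db.example.org/query?name=$1",
    "Example.Home": "http://www.example.org/index.html",
}

# Assumed tag syntax for this sketch: <&SymbolicName optional parameter>
TAG = re.compile(r"<&([\w.]+)(?:\s+([^>]*))?>")


def resolve(text, dictionary=DICTIONARY):
    """Replace every known tag in text by its URL; leave unknown tags untouched."""

    def substitute(match):
        name, param = match.group(1), match.group(2)
        template = dictionary.get(name)
        if template is None:
            return match.group(0)  # unknown symbolic name: keep the tag as-is
        # URL-encode the parameter and put it where the template expects it
        return template.replace("$1", quote_plus((param or "").strip()))

    return TAG.sub(substitute, text)


page = 'See <a href="<&Example.ObjectByName M 31>">M 31</a>.'
print(resolve(page))
```

In this sketch, a change of host or query syntax would only require editing the template in the dictionary; the pages that carry the tag would stay unchanged.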

One Dictionary item can describe a Data Type or an action. For each item, the GLU Dictionary contains information about the action to be performed, with a description, the URL, the possible parameters, a test sequence, the data type of the action result, a URL for user documentation (help file), etc. For instance, the GLU Dictionary entry describing the query of SIMBAD by object name contains the following information:

The GLU allows each of the participating services to resolve its URLs locally using its own view of the GLU Dictionary. This is important to increase the resolution speed and the system security. At system level, one simply has to specify that the output data stream has to be filtered by the GLU resolver.

A fundamental piece of the GLU system is the mechanism which maintains each view of the GLU Dictionary. This task is performed by a daemon which sends and receives the GLU records (the entries in the Dictionary).
The GLU protocol used by these daemons has the following characteristics:
- It is based on distribution domains: managers choose the GLU domains to which they distribute their own GLU records and the ones from which they want to receive GLU records.
- It uses a hierarchical name space for the GLU record identifiers, to ensure their uniqueness throughout the whole system.
- It is fault tolerant and fairly secure: independent views of the GLU Dictionary, authentication of the update sender, etc. A connection failure at a site simply means that the local view of the GLU Dictionary will not be updated during the failure, but the GLU resolution of anchors will go on with the current version.
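
The characteristics listed above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Record` and `LocalView` classes, the dotted identifier form, the `version` field and the example.org URLs are hypothetical, and sender authentication is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    identifier: str   # hierarchical, e.g. "provider.service.action" (assumed form)
    domain: str       # distribution domain the record is sent to
    url: str
    version: int = 1  # assumed way of telling newer records from older ones


@dataclass
class LocalView:
    """One site's private copy of the dictionary."""
    subscribed: set
    records: dict = field(default_factory=dict)

    def receive(self, batch):
        """Merge an update batch; return how many records were accepted."""
        accepted = 0
        for rec in batch:
            if rec.domain not in self.subscribed:
                continue  # record from a domain this site did not ask for
            current = self.records.get(rec.identifier)
            if current is None or rec.version > current.version:
                self.records[rec.identifier] = rec
                accepted += 1
        return accepted

    def lookup(self, identifier):
        """Resolve from the local copy only; no network access is needed."""
        rec = self.records.get(identifier)
        return rec.url if rec else None


view = LocalView(subscribed={"astro"})
view.receive([
    Record("siteA.catalog.query", "astro", "http://a.example.org/q"),
    Record("siteB.plasma.query", "plasma", "http://b.example.org/q"),
])
print(view.lookup("siteA.catalog.query"))
```

Because `lookup` reads only the local copy, a site that stops receiving updates keeps resolving identifiers with the last records it accepted, which is the behaviour described for connection failures.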

More advanced functionalities offered by the GLU include the management of clones, the use of the GLU system as a general macro system (it is used for instance to maintain homogeneous HTML pages for the CDS services), the construction of an automatic test sequence which checks all the URLs of the GLU Dictionary, and the implementation of data conversion (conversion between astronomical coordinate systems has for instance been implemented). In addition, the capability of the GLU to give access to the participating services by data type is used to build a 'Service Browser' called AstroGLU (Egret et al., 1997), as described below.

AstroGLU: the prototype of a discovery tool for querying heterogeneous services

How to help users find their way through the jungle of on-line information services is a central question raised during the past years (e.g., Egret, 1994, or the AstroBrowse paper in this issue), and the question becomes more and more acute with the very rapid development of the World Wide Web. There seems to be a general agreement that centralized systems are not the right answer, and that any solution must put as few constraints as possible on the data providers in order to be accepted and implemented by them. The World Wide Web is obviously an excellent system as a departure point, since it allows easy development of user interfaces with a common but flexible language. The Web also gives the possibility to navigate between distributed services, but maintenance of links is a difficult problem.

In this context, the GLU can be a very efficient tool to remove at least part of the difficulties:

- the management of addresses through the GLU Dictionary, which is maintained by the data providers, each of whom enters and updates the information concerning his/her own service(s). No change is needed in the service itself. If some essential information from services which do not participate in the GLU is needed, a GLU Dictionary entry can be created, and the GLU test tool can be used to verify that the information remains valid. All participating services have access to up-to-date information through their local view of the GLU Dictionary.

- the GLU Dictionary contains information which allows a query to be generated automatically, and which also allows more complex requests to be answered, such as 'which of the participating services are able to give information about astronomical object names' or 'which of the participating services are able to give information about bibliographic references'. This relies on the fact that the GLU Dictionary can be used as a Reference Directory which manages 'generic data types', and indicates the input data type of each action described in it.

Based on this property, it has been possible to create a simple user interface which takes advantage of the knowledge about distributed services contained in the GLU Dictionary, to help users to find the information they need, exploring new domains for their research. This service is called AstroGLU, and can be found at:

http://simbad.u-strasbg.fr/demo/cgi-bin/astroglu.pl

The question addressed by this service is the following: the user is interested in getting information about a given piece of data (e.g. a position in the sky, an object name, a bibliographic reference, the name of an astronomer, etc.), and does not know which on-line service to contact, or which types of information can be requested from that input data. The list of all services which accept the given data type as input is then displayed (with access to help files for each of them), and the user can choose those to which he or she will send the query. For instance, a query by object name can be sent to the relevant CDS services (SIMBAD, the Dictionary of Nomenclature, the VizieR catalog browser, an image preview from ALADIN), and also to external services such as NED, the NCSA Astronomy Digital Library, or to several observatory or data archives such as HEASARC, the STScI, etc. In addition, the user can build his/her own filter for the list of services, by selecting among the possible data providers.

The whole user interface is built automatically from the contents of the GLU Dictionary, and automatically updated.
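
The selection by data type described above can be illustrated with a short Python sketch. The entries, field names, data type labels and example.org URLs below are hypothetical and stand in for real Dictionary records; the sketch only shows how a list of candidate services, and the corresponding query URLs, might be derived from declared input types.

```python
from urllib.parse import quote_plus

# Hypothetical dictionary entries: each action declares the data type it accepts.
ACTIONS = [
    {"name": "ServiceA.ByName", "input": "object_name",
     "url": "http://a.example.org/id?name=$1", "help": "http://a.example.org/help"},
    {"name": "ServiceB.ByName", "input": "object_name",
     "url": "http://b.example.org/search?obj=$1", "help": "http://b.example.org/help"},
    {"name": "ServiceC.ByBibcode", "input": "bibliographic_reference",
     "url": "http://c.example.org/ref?code=$1", "help": "http://c.example.org/help"},
]


def services_for(data_type, actions=ACTIONS, providers=None):
    """Return the actions accepting data_type, optionally limited to some providers."""
    return [
        a for a in actions
        if a["input"] == data_type
        and (providers is None or a["name"].split(".")[0] in providers)
    ]


def build_queries(data_type, value, **kwargs):
    """Map each matching action name to the URL that would carry the query."""
    return {
        a["name"]: a["url"].replace("$1", quote_plus(value))
        for a in services_for(data_type, **kwargs)
    }


for name, url in build_queries("object_name", "NGC 4151").items():
    print(name, url)
```

The optional `providers` argument mimics the user-defined filter on data providers: `build_queries("object_name", "NGC 4151", providers={"ServiceB"})` would keep only the second entry.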

Another possible usage of the GLU to build discovery tools is in the AstroBrowse context, which concentrates on building services able to query remote databases by position (or by object name, getting the object position from SIMBAD or NED).

The 'GLU community'

The GLU is presently used in the context of Astronomy, and also by the French Space Agency CNES for its new data center in space plasma physics (CDPP - Centre de Donnees de Physique des Plasmas, in development). It is of course used by the CDS to manage its on-line services, and the links between the CDS services and external services such as the ADS or on-line journals, and with observatory archives, and in the AstroGLU tool. As an example, the tag used to provide links to the ADS has been used 3.5 million times in the last 9 months, and the tag to the CDS catalog service 2.5 million times during the same period.

The GLU is also proposed as a possible general tool for the AstroBrowse initiative, and is implemented in the participating institutes. Since its general principles are not 'discipline oriented', it could also be used in the more general context of the SSDS. The capability to define Discipline Domains is already implemented, and it is quite possible to implement tools similar to the conversion of coordinate systems used for astronomy purposes, for the needs of other disciplines.

At the present time, the GLU system can manage approximately fifty dictionary providers per GLU domain, and local views of the Dictionary of a few Mb. Optimization and speed improvements are under way, to make it possible to manage a much larger number of distributed services.