Data Archiving Guidelines

Introduction

The information below was written for projects intending to
archive mission data sets within the Mikulski Archive for
Space Telescopes (MAST).
Recommendations are included regarding:

data set formats for archival data (i.e., using FITS),

the creation of database tables to allow online catalog searches,

the preservation of project-specific software and documentation.

The usefulness of mission data sets to future users will depend on how
well this information is preserved. MAST
staff members can work with
projects to assist them in developing data products that will be useful
to the general astronomical community for the long term.
For example, we can review FITS headers for compliance with the
standard and confirm that the information most frequently used for searching
is present.

1. Data Set Formats

The astronomical community has adopted the
Flexible Image Transport System
(FITS) format as the default standard
for the exchange of data between institutions. The FITS file format is
platform independent, supported by many institutions, and endorsed by
both NASA and the IAU. For these reasons FITS is the recommended file
format for archiving data at STScI. A description of the
FITS data format recommendations can be found in the
MAST Data Format Guidelines
document. An online version of the FITS Standard Document and the
FITS User's Guide
are also available.

It is recognized, however, that some archival data may be stored
in other formats, particularly for projects that preceded
the recent developments in FITS. One example is the earlier-processed IUE data,
which are stored in a VICAR-based IUE "GO" format. In other cases,
projects have distributed data as ASCII text files or created
auxiliary data sets as ASCII text or PostScript files.
In these cases, no attempt will be made to reformat the data sets
before they are archived within MAST.

2. Catalogs

MAST, like other data retrieval systems, uses online database tables
to search for requested data sets.
In many cases, projects store the same information
in both the catalog or database
table and the FITS keywords. To simplify adding new data
sets into MAST, it is helpful if either

the project provides a target list or catalog of observations, or

the FITS headers are constructed such that MAST staff members can
create a catalog from the FITS keywords.

Obviously, project-delivered catalogs would greatly simplify the archiving
of mission data sets.
In the absence of catalogs or target lists, we would appreciate guidance from the
project staff concerning which of the FITS header keywords would be most
useful for searching. We do not want to ingest all header keywords wholesale.
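
As a rough illustration of building a catalog entry from FITS keywords, the
Python sketch below splits a primary header into 80-character cards and keeps
only a short list of search keywords. It is a minimal sketch, not MAST's
ingest procedure: the keyword names (TARGNAME, RA_TARG, DEC_TARG, DATE-OBS,
EXPTIME), the function names, and the demonstration header are assumptions
chosen for illustration, and the parser ignores complex values and continued
strings. A production pipeline would normally rely on an established FITS
library instead.

```python
# Hypothetical selection of search keywords; a project would supply its own list.
CATALOG_KEYWORDS = ("TARGNAME", "RA_TARG", "DEC_TARG", "DATE-OBS", "EXPTIME")


def parse_card(card):
    """Split one 80-character FITS header card into (keyword, value)."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None  # COMMENT, HISTORY, END, or blank card
    body = card[10:].strip()
    if body.startswith("'"):
        # String value: ends at the first single quote that is not doubled.
        chars, i = [], 1
        while i < len(body):
            if body[i] == "'":
                if body[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        return keyword, "".join(chars).rstrip()
    text = body.split("/", 1)[0].strip()  # drop the trailing comment
    if text in ("T", "F"):
        return keyword, text == "T"
    try:
        return keyword, int(text)
    except ValueError:
        try:
            return keyword, float(text.replace("D", "E"))
        except ValueError:
            return keyword, text or None


def catalog_row(header_text, keywords=CATALOG_KEYWORDS):
    """Return {keyword: value} for the requested keywords (None if absent)."""
    values = {}
    for start in range(0, len(header_text), 80):
        key, value = parse_card(header_text[start:start + 80])
        if key == "END":
            break
        if key and value is not None:
            values.setdefault(key, value)  # keep the first occurrence
    return {key: values.get(key) for key in keywords}


# Made-up header used only to exercise the functions above.
demo_header = "".join(card.ljust(80) for card in [
    "SIMPLE  =                    T / conforms to FITS standard",
    "TARGNAME= 'NGC 1068'           / target name",
    "RA_TARG =           40.6696292 / [deg] right ascension",
    "DEC_TARG=           -0.0132806 / [deg] declination",
    "DATE-OBS= '1995-03-02'         / observation start date",
    "EXPTIME =               1200.0 / [s] exposure time",
    "END",
])
row = catalog_row(demo_header)
```

Keywords that are absent from a header come back as None, which makes it easy
to spot files that lack the fields a catalog search would need.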

Catalogs should contain those fields which
would most help users locate the desired observation(s). Coordinates, observation
date, exposure time, and target name are generally essential
(depending on the method of observation), while
parameters needed for analyzing or interpreting archived data would be
highly desirable. A target classification
entry has been very useful for users interested in particular
types of objects.

Although MAST uses the Sybase database
management system, tables can be exchanged between most database systems
by copying them to ASCII table files. (Be sure to include enough
significant figures to represent floating-point values.)
A web page containing an
observation list may be an adequate replacement for a database table.
In either case, a description of the individual fields within the table or list
should be provided as well. The description should also define the
source of the entries.
For example, it should state whether the coordinates were
supplied by the observer, or obtained from an existing catalog.
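
The Python sketch below shows one way to write such an ASCII table: field
descriptions, including the source of each entry, go first as comment lines,
and floating-point values are written with 17 significant digits, which is
enough for a double-precision number to be read back unchanged. The field
names, the pipe delimiter, the '#' comment convention, and the sample row are
all assumptions made for illustration, not a format required by MAST.

```python
import csv
import io

# Hypothetical field descriptions: (name, description, source of the entries).
FIELDS = [
    ("target", "Target name", "supplied by the observer"),
    ("ra_deg", "Right ascension in degrees", "obtained from an existing catalog"),
    ("dec_deg", "Declination in degrees", "obtained from an existing catalog"),
    ("exptime_s", "Exposure time in seconds", "recorded by the instrument"),
]


def format_value(value):
    """Render floats with 17 significant digits so they round-trip exactly."""
    if isinstance(value, float):
        return format(value, ".17g")
    return str(value)


def write_ascii_table(rows, out):
    """Write field descriptions as '#' lines, then a pipe-delimited table."""
    for name, description, source in FIELDS:
        out.write(f"# {name}: {description}; source: {source}\n")
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow([name for name, _, _ in FIELDS])
    for row in rows:
        writer.writerow([format_value(row[name]) for name, _, _ in FIELDS])


def read_ascii_table(text):
    """Read the table back, skipping the '#' description lines."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="|"))


# Made-up sample row used only to exercise the functions above.
rows = [{
    "target": "NGC 1068",
    "ra_deg": 40.669629166666667,
    "dec_deg": -0.013280555555555556,
    "exptime_s": 1200.0,
}]
buffer = io.StringIO()
write_ascii_table(rows, buffer)
recovered = read_ascii_table(buffer.getvalue())
```

Reading the file back and converting the coordinate columns with float()
reproduces the original values exactly, which would not be guaranteed if the
table had been written with only a few decimal places.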

3. Documentation

Project-supplied documentation in the following categories
should be made available for archive users:

Project Description - General descriptions of the mission and
instrumentation,

Data Format - A general description of the contents of
the archived mission data sets including, for example,
documentation on the FITS keyword entries. (Note that the
FITS keyword comment field alone is generally insufficient to define many
keywords properly. Without additional documentation, many of these keywords will
be of little or no use to future users.)

Since MAST documentation will be accessed primarily from the web,
documentation is most useful if it exists online. Most text processing
formats, such as LaTeX or Microsoft Word (or standard ASCII files), can be
converted to HTML fairly easily by staff members.
Large documents such as user manuals or data analysis guides
should be made available to users in several formats, such as HTML for online
access and PostScript and/or PDF for downloading.

4. Software

Some projects have written software to analyze and interpret
raw and/or processed data. These programs should be
archived for future users. MAST will make project-supplied software
available to requesters, although support for the software itself
cannot be provided. MAST currently maintains, for example, the IUEDAC
IDL software libraries, the UIT BDR software written in C and Fortran-77,
and the EUVE EUV1.8 IRAF software.

A list of
FITS software packages is available from HEASARC.
The list contains links to sites supporting general
FITS readers and writers written in a variety of programming languages.