2001 Annual Report of the National Space Science Data Center

PREFACE

The National Space Science Data Center serves as the permanent archive for most NASA/Space
Science mission data to ensure future data accessibility and usability. NSSDC
also provides current data access, complementary to the efforts of other NASA/OSS
"active archives," in support of the NASA and international astrophysics
and space physics research enterprises. Finally, NSSDC is a conduit for the
general public to acquire NASA space science data (primarily imagery) of interest
to them.

NSSDC is pleased to issue this 2001 Annual Report describing (1) the 2001 growth
and evolution of NSSDC's data archives, access pathways, and other tools and
services, and (2) the 2001 access to those data and services by NSSDC's customer
communities. This report has been made WWW-accessible in the hope that readers
will avail themselves of the opportunity to link to the services reported herein.

I welcome suggestions for user-benefiting improvements to this Annual Report
and to NSSDC services.

Joseph H. King
Head, National Space Science Data Center

1. INTRODUCTION

This report characterizes NSSDC's data holdings, metadata holdings, access
pathways, and value-added data products, tools, and services at the end of
2001, with a focus on the 2001 activities leading to that end-of-year state.
In addition, this report characterizes the nature and amount of 2001 access
to NSSDC's data and services by its many users from various communities.
It is assumed the reader will have a general familiarity with NSSDC and its
mission. The top NSSDC web page is at http://nssdc.gsfc.nasa.gov/ .

2. HIGHLIGHTS

The most important result of NSSDC’s 2001 activities is the continuing preservation
of growing space science data volumes, ensuring their continuing and future
accessibility to the space science, education, and general public communities.
The statistics to follow reveal that NSSDC’s archive has now grown to 19.2
TB of space science data and an additional 3.3 TB of Earth science data.
During 2001, 3.3 TB of data were added to the NSSDC archive that now holds
data from 1,301 experiments flown on 373 spacecraft.

Next, NSSDC continues to distribute large amounts of data by network to the
space science community and general public, and by offline mailings to the
general public.

Again, the statistics that follow detail the data volumes disseminated via various
pathways to various communities. We note here that during 2001, NSSDC’s customers
downloaded via network over 3.5 million data files (a 13% increase over 2000)
and received about 1.3 TB of data on mailed media.

NSSDC’s data dissemination is leading to the publication of significant new
science. The Appendix of this Annual Report
lists 96 science papers acknowledging NSSDC data or services as contributing
to their analyses. These are papers that have come to the attention of our
staff. Most science journals in which NSSDC data or services may have been
used are not routinely reviewed by our staff, and some papers that use NSSDC
data or services do not cite such use, so the list represents a lower limit on
papers enabled or benefited by NSSDC.

The CDAWeb system that provides access to multi-source data needed in analyses
of magnetospheric processes and of solar wind-magnetosphere coupling continues
to grow in popularity and usage. Especially noteworthy in 2001 was the creation
of a number of ingest pipelines enabling data from many non-core-ISTP sources
to flow directly into CDAWeb rather than flowing from those sources to CDAWeb
through the ISTP Central Data Handling Facility which was significantly descoped
in late 2001.

The OMNIWeb system had its functionality significantly enhanced. An option
was provided enabling users to generate scatter plots for user-selected science
parameters for user-selected time spans, with the ability to filter (include
or exclude points) based on user-specified ranges of any OMNI science parameters.
The new option also computes linear regression fit parameters. A separate
new option allows users to link to many of the individual data sets contributing
to the multi-source OMNI solar wind data set.
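
The filter-then-fit behavior described above can be sketched as follows. This
is only an illustrative sketch, not OMNIWeb's implementation: the function
names, the record layout, and the parameter names ("speed", "temp", "b_mag")
are assumptions made for the example, and the sample values are made up.

```python
from statistics import mean

def filter_points(records, x_key, y_key, ranges):
    """Return (x, y) pairs from records whose parameters lie inside every range.

    `ranges` maps a parameter name to an inclusive (low, high) tuple.
    """
    kept = []
    for rec in records:
        if all(lo <= rec[name] <= hi for name, (lo, hi) in ranges.items()):
            kept.append((rec[x_key], rec[y_key]))
    return kept

def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up records; the keys are hypothetical, not OMNI variable names.
records = [
    {"speed": 300.0, "temp": 601.0, "b_mag": 4.0},
    {"speed": 400.0, "temp": 801.0, "b_mag": 6.0},
    {"speed": 500.0, "temp": 1001.0, "b_mag": 5.0},
    {"speed": 600.0, "temp": 1201.0, "b_mag": 7.0},
    {"speed": 700.0, "temp": 50.0, "b_mag": 30.0},  # excluded by the filter
]

points = filter_points(records, "speed", "temp", {"b_mag": (0.0, 10.0)})
slope, intercept = linear_fit(points)
print(f"{len(points)} points kept, slope={slope:.3f}, intercept={intercept:.3f}")
```

Filtering on a third parameter before fitting, as here, is what lets a user
examine a relationship only under selected conditions.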

The year 2001 marked the retirement of the NSSDC Data Archive and Dissemination
System (NDADS). NDADS was a VMS-based optical disk jukebox pair which, over
its ~10-year life, served over 2.4 million astrophysics (IUE, etc.) and space
physics (IMP 8, etc.) data files in response to over 100,000 distinct user
requests.

Although the effort started in the second half of 2000, NSSDC truly went into
production with its reengineered data management in 2001. Archive Information
Packages (AIPs; bundles of data files and companion attribute files as
prescribed by the ISO/CCSDS Archive Reference model) were defined, created and
written to a DLT jukebox. During 2001, 696,000 such AIPs, containing 660 GB,
were created from newly arriving data and from data formerly on NDADS. At the
same time the AIPs’ constituent data and attribute files were written to a
unix-based RAID magnetic disk environment for external user access. During 2002
we will begin the creation of AIPs from the offline digital archive and their
ingestion to the nearline DLT jukebox.
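
As a rough illustration of the idea of bundling data files with companion
attribute files, the sketch below models a package in memory. It is not
NSSDC's AIP layout: the class name, the attribute fields (size and SHA-256
checksum), and the package identifier are all assumptions chosen for the
example.

```python
from dataclasses import dataclass, field
import hashlib

@dataclass
class ArchivePackage:
    """Hypothetical stand-in for an AIP: data files plus attribute records."""
    package_id: str
    data_files: dict = field(default_factory=dict)   # file name -> bytes
    attributes: dict = field(default_factory=dict)   # file name -> attribute dict

    def add_file(self, name: str, content: bytes, **extra) -> None:
        """Store a data file and record its companion attributes."""
        self.data_files[name] = content
        self.attributes[name] = {
            "size_bytes": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
            **extra,
        }

    def verify(self) -> bool:
        """Return True if every data file still matches its recorded attributes."""
        return all(
            len(data) == self.attributes[name]["size_bytes"]
            and hashlib.sha256(data).hexdigest() == self.attributes[name]["sha256"]
            for name, data in self.data_files.items()
        )

pkg = ArchivePackage("demo-0001")  # identifier is illustrative only
pkg.add_file("example.dat", b"sample bytes", source="illustration")
print(pkg.verify())
```

Keeping the attributes alongside the data is what allows a later migration to
new media to be checked file by file.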

During 2001, the first inflow of data from a spacecraft project that used
NSSDC-provided software to prepare Archive Information Packages for NSSDC
submission was ingested to the DLT jukebox. The project was IMAGE.
Project-side packaging of this kind facilitates NSSDC data ingest and
management, and NSSDC hopes the approach will be replicated by other missions
and individuals preparing data for NSSDC submission.

A new version of the International Reference Ionosphere (IRI) model was released
by NSSDC early in 2001 with many improvements, including ionospheric storm
effects and a significantly improved equatorial, bottomside electron density
profile. There are several additional output parameters, including vertical
ion drift at the magnetic equator. The IRI, evolved by a worldwide community
led by an NSSDC staff member, remains one of NSSDC's most often requested
software packages and is at http://nssdc.gsfc.nasa.gov/space/model/ionos/iri.html.

To aid users in the data preview and selection process, NSSDC developed in
2000 a new family of graphical-display-and-subset interfaces for selected
ftp-accessible ASCII and gzipped-ASCII data sets. This family, called Ftphelper,
is at http://nssdc.gsfc.nasa.gov/ftphelper/. The number of space physics
spacecraft with at least some Ftphelper-browsable data grew from six to 14
during 2001.

A new software tool was developed which extracts photometry information (flux
densities) from the time-ordered data of COBE's Diffuse Infrared Background
Experiment (DIRBE). The interface allows users to specify the direction to the
source of interest and the time span (up to the full 10-month cryogen life)
over which source crossing data are to be analyzed. To date, this tool has been
accessed 2,007 times. One resultant publication involved the extraction of Mira
variable star IR light curves of unprecedented quality and of unprecedented
time and wavelength coverage.

To support possible (but unlikely) future use of old data in a multiplicity
of vendor-specific binary formats, NSSDC created a series of web pages documenting
the binary representations of words (bit patterns) for 18 vendor-specific
formats. This set of pages is at http://nssdc.gsfc.nasa.gov/nssdc/formats/.
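
To show what decoding one such vendor-specific bit pattern involves, here is a
minimal sketch for IBM System/360 single-precision hexadecimal floating point
(1 sign bit, a 7-bit excess-64 base-16 exponent, and a 24-bit fraction). This
report does not list the 18 documented formats, so the choice of this format is
an assumption made purely for illustration, and the function name is
hypothetical.

```python
import struct

def ibm32_to_float(word: bytes) -> float:
    """Decode a 4-byte big-endian IBM System/360 single-precision value."""
    if len(word) != 4:
        raise ValueError("expected exactly 4 bytes")
    (bits,) = struct.unpack(">I", word)
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 24) & 0x7F   # excess-64 exponent, base 16
    fraction = bits & 0x00FFFFFF     # 24-bit fraction, no hidden bit
    return sign * (fraction / float(1 << 24)) * 16.0 ** (exponent - 64)

print(ibm32_to_float(bytes.fromhex("41100000")))
print(ibm32_to_float(bytes.fromhex("C276A000")))
```

Unlike IEEE 754, this representation has no implicit leading bit and scales by
powers of 16, which is why such words cannot simply be reinterpreted on modern
hardware.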

NSSDC's NASA/Science Office of Standards and Technology (NOST) organized
and hosted a 5-day XML workshop under the sponsorship of the Consultative
Committee for Space Data Systems (CCSDS). The 24 expert participants made
good progress in the technical area of packaging data using XML and in identifying
the objectives of, and need for, a continuing working group. See
http://www.ccsds.org/meetings/xml2001summer/papers/ReportOfXMLWG.doc

In 2001, the NASA Sun-Earth Connection Education Forum (SECEF) team, with major
NSSDC participation, orchestrated Sun-Earth Day, held in April 2001. Ten
thousand packets of information were sent to teachers, scientists, etc. for
Sun-Earth Day programs, reaching hundreds of thousands of people. For programs
like this, SECEF received a NASA group achievement award in August 2001.

Readers are encouraged to exercise the multiple options on the hierarchical
array of WWW pages starting with NSSDC's home page. There are several more
functionalities beyond those called out in the preceding paragraphs.

4. DATA MANAGED AT NSSDC, AND 2001 INFLOW AND OUTFLOW

There are several ways to characterize the multi-disciplinary NSSDC archive.
Byte counts are a common metric for modern archives, and will be reported
herein. Numbers of distinct data sets and numbers and diversity of media
volumes managed are also very important. (In NSSDC's terminology, a data
set is typically all the data from a given source at a given processing level
in a given format.) The diversity of data sets and of media types relate
to the intellectual heterogeneity and technical heterogeneity of the archive,
respectively, and we shall report on these also.

At the end of 2001, NSSDC had 4,359 distinct data sets and accompanying documentation
packages being managed. Table 1 indicates the
disciplines from which these data sets come and whether the data sets are digital
or non-digital (film, etc.). The table shows that these data sets come from
1,301 experiments that have flown on 373 mostly-NASA spacecraft. By data set
count, space physics is the dominant discipline, accounting for nearly half
of NSSDC's data sets. This reflects the fact that in its early years, NASA
launched a preponderance of space physics missions and also that space physics
spacecraft typically carry more independent experiments than do astrophysics
missions.

[Astute readers will notice that these counts of data sets and of source
spacecraft and experiments are typically smaller than for last year. This
is not because NSSDC has released significant data. Rather, in this period
of heightened sensitivity to good accounting practice, NSSDC developed this
year's numbers by a "zero-base" analysis of its information files
rather than merely incrementing the prior year's numbers with current-year
inflows. Also contributing to the decrement in data set counts is a partially
completed new approach to defining multi-source data sets. For example, the
ISEE 1 "data pool" microfilm, to which 10 experiments contribute, is newly
counted as one data set rather than 10.]

Note from the table that NSSDC manages almost
as many non-digital (mostly film) data sets as digital data sets, although
it should be noted that NSSDC has been acquiring almost no non-digital data
in recent years and has been gradually converting parts of its film archive
to a digital form.

Table 2 is a different characterization of the
NSSDC archive, by byte counts and media volume counts. The table shows 22.5
TB of total data, a 1.15 TB subset that is network-accessible, and 66,692 digital
media at NSSDC. The byte counts are estimates, involving for some data sets
assumptions about the mean numbers of bytes on various media types. The number
of media has decreased by 13% since last year as data are moved from low-capacity
old media to newer high-capacity media.

Data are also being moved from NSSDC's traditional offline archive to a nearline
archive based on a DLT jukebox attached to a unix server. As described at http://nssdc.gsfc.nasa.gov/nssdc_news/dec00/dec00_toc.html,
data are newly archived in "archive information packages" which hold
data files and companion attribute files as per the specification of the ISO/CCSDS
Open Archival Information System reference model. Table
3 shows the volumes of data ingested to this new archive, by mission, in
2000 and 2001 (177 GB and 660 GB, respectively). Much of the data were formerly
network-accessible from the retired NDADS system and other data are currently
inflowing to NSSDC. Most of the data were made ftp-accessible in addition.

From the research community's perspective, only astrophysics data and space
physics data are network-accessible from NSSDC. That planetary data are not
network-accessible from NSSDC is the result of the Planetary Data System’s making
most or all of its planetary data accessible via the network or via CD-ROM
creation and dissemination. NSSDC's photo gallery and image catalog, which are
WWW-accessible from http://nssdc.gsfc.nasa.gov/planetary/, contain much
planetary image data, but these are largely oriented towards the general
public.

Tables 4 and 5
better characterize NSSDC's network-accessible astrophysics and space physics
data, by project. In space physics, NSSDC holds a large volume of CDF-formatted
data underlying CDAWeb and a comparably sized separate holding of data in other
formats, mostly plain ASCII, and we report these separately in Table
5. There is very little overlap. All the data are ftp-accessible. All
the CDF-formatted data are CDAWeb-accessible. Some of the ASCII data are accessible
via Ftphelper, ATMOWeb, or, for the long-wavelength astrophysics data, through
mission-specific web pages.

The volume of data network-accessible from NSSDC is seen in Tables 4
and 5 to be 1.15 TB. This is down from the 2.39
TB reported a year ago. This drop is associated with the retirement of the
nearline NDADS system. The current magdisk-accessible 1.15 TB may be compared
with 0.44 TB of data that were magdisk-accessible a year ago. Most data that
were NDADS-resident a year ago that are not part of the current 1.15 TB are
now network-accessible from NASA/astrophysics "active archives," specifically
HEASARC at Goddard and MAST at STScI.

Table 6 characterizes the digital media types
managed at NSSDC, not including backup copies. This table is an expansion
of Table 2 in which total numbers of unique digital media volumes were given.
It should be noted that most volumes are replicable and have one backup volume.
However, for "CD-ROM (Titles)" which are not locally replicable, NSSDC
typically holds between 20 and 200 copies of each title. For these, NSSDC must
replenish stock through a commercial vendor as request activity drives NSSDC
stock down. DLT and DVD are expected to become increasingly important at NSSDC.

Table 7 characterizes NSSDC's non-digital archive,
by discipline and form factor. This is unchanged from a year ago. Note that
NSSDC has large volumes of non-digital data for each of the discipline areas it
supports. It should be noted, however, that very little new data have been arriving
at NSSDC in non-digital form in recent years. NSSDC has recently begun an effort
to systematically convert this film archive to computer-readable form. During
2001, NSSDC scanned 6,159 film frames from such spacecraft as the Mariner and
Gemini series and Magellan, thereby producing about 65 GB of new digital data.

4.1 Data Inflow

Tables 8 and 9
characterize the inflow of digital data to NSSDC during 2001. In particular,
Table 8 shows that NSSDC received approximately
3.3 TB of new data in 2001, via a combination of networks and hard media. Table
8 shows data volumes by project, with the astrophysics and space physics subsets
of ISTP/Wind data attributed to their respective disciplines. Dominating the
counts are media-based HEASARC-provided data, Level-0 data from the FAST, ISTP
and IMAGE missions plus data from the FUSE, Mars Global Surveyor and EUVE missions.
Table 9 characterizes the inflowing media types
by discipline. As in recent years, CD-WO media continue as the dominant input
media type overall.

During 2001, NSSDC received approximately 211 GB of data electronically, in
addition to the data arriving on the media reported in Table
9. This 211 GB is included in the Table 8 counts.
The electronic inflow was dominated by ISTP Key Parameter data (109 GB), IMAGE
data (49 GB) and data from the ISIS ionogram digitization effort (48 GB), with
lesser amounts from a number of spaceflight projects.

By data set count, which as noted earlier marks the intellectual heterogeneity
of NSSDC, entireties or parts of 132 data sets arrived at NSSDC during 2001.
Of these, 46 were new data sets, a further subset of which were the first
data from 17 experiments.

4.2 Data Outflow

NSSDC provides user access to its data holdings through multiple electronic
interfaces and, in addition, through a user support infrastructure for the
mailing of offline digital and non-digital data volumes. Most electronic
interfaces are accessible through NSSDC's WWW home page and include (1) special
WWW-based interfaces to specific data sets or groups thereof and (2) ftp pathways
to a range of data files maintained permanently on NSSDC magnetic disk. The
CDF-formatted data underlying CDAWeb are at ftp://cdaweb.gsfc.nasa.gov/
while all other data are at ftp://nssdcftp.gsfc.nasa.gov/ .

The dominant special WWW-based data access interfaces that NSSDC offers to
the research community relate to: ISTP key parameter and a growing range of
other space physics data (CDAWeb); the OMNI and uniformized-COHO solar wind
datasets (through OMNIWeb and COHOWeb, respectively); various atmospheric and
ionospheric data (ATMOWeb); IRAS, COBE and SWAS long-wavelength astrophysics
data; and the Astronomical Data Center astronomical source catalogs and journal
tables. In addition, Ftphelper provides a browse/preview functionality for
selected ASCII data sets otherwise only ftp-accessible.

The OMNI data set is an NSSDC-created, 38+ year compilation of cross-normalized,
multi-spacecraft near-Earth solar wind magnetic field and plasma data and
energetic particle data, while the COHOWeb database is a uniformized set of
files of NSSDC-merged magnetic field, plasma, and position data for each of
many deep space spacecraft. Table 10 shows annual statistics for the CDAWeb,
OMNIWeb, COHOWeb and ATMOWeb systems. Note the remarkable growth in usage of
these systems. In 2001, they were used by NSSDC’s customers to produce over 700
plots, listings and data files every working day.

Table 11 reports statistics on the usage of
NSSDC’s executable geophysical
models services and its services for magnetospheric and heliospheric orbits.
The models service lets users specify a model, a spatial point of interest,
and any other parameters on which the model depends, and have the model parameters
computed at the point or along a profile through the point. Table
11 shows that there were about 99,000 such computations done by NSSDC customers
in 2001, with geomagnetic, ionospheric and atmospheric models dominating. This
almost doubles the 52,000 model computations reported for 2000. Ftp access
to models’ software (95,000 file downloads in 2001) is included in ftp access
statistics in Table 12, not in Table 11.

Table 11 also reveals 47,000 orbit computations, a 68% increase over 2000. Of
these, about 84% use the primarily magnetospheric SSCWeb service and the
balance use the Heliocentric Ephemerides page.
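
The quoted growth figures can be cross-checked with a little arithmetic. The
sketch below derives the year-2000 orbit count implied by a 68% increase, the
approximate split between the two orbit services, and the ratio behind "almost
doubles" for the model computations. The input figures are the rounded values
quoted in the text, so the derived numbers are approximate; the function and
variable names are illustrative.

```python
def implied_prior(current: float, pct_increase: float) -> float:
    """Prior-year value implied by a current value and a percent increase."""
    return current / (1.0 + pct_increase / 100.0)

orbits_2001 = 47_000                               # orbit computations in 2001
orbits_2000 = implied_prior(orbits_2001, 68)       # implied by "a 68% increase"
sscweb_2001 = 0.84 * orbits_2001                   # "about 84%" via SSCWeb
heliocentric_2001 = orbits_2001 - sscweb_2001      # the balance
model_growth = 99_000 / 52_000                     # model computations, 2001 vs 2000

print(f"implied 2000 orbit computations: {orbits_2000:,.0f}")
print(f"SSCWeb share in 2001: {sscweb_2001:,.0f}")
print(f"Heliocentric share in 2001: {heliocentric_2001:,.0f}")
print(f"model computation growth factor: {model_growth:.2f}")
```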

A great many NSSDC data sets and other information services are held permanently
on magnetic disk for ftp access. The reader is invited to review all these
services from the ftp link on NSSDC's
home page. Table 12 gives the annual counts
of files downloaded, both overall (over 3.5 million files in 2001, up by 43%
from 2000) and for selected directories with high activity. Note that the Photo
Gallery, of high public interest, dominates the statistics with 87% of the
total downloads from nssdcftp. Researchers' ftp downloads of 201,000
CDF-formatted files from CDAWeb and 155,000 data files from the spacecraft_data
subdirectory (more than double the year-2000 number!) show the high interest
in and great value of these services.

WWW access statistics are frequently misleading, insofar as they usually
individually count the many files (buttons, etc.) that make up a page. Nevertheless,
growth in WWW accesses is indicative of continuing and growing use of the
WWW-provided services. In 2001, there was an average of 12.9 million hits
monthly to NSSDC’s web pages, up by 16% over 2000!
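
The gap between "hits" and pages viewed can be illustrated with a short sketch
that counts both from a list of requested paths. The classification rule
(treating paths with no extension, .html, or .htm as pages) and the sample
paths are assumptions for illustration, not a description of how NSSDC
compiled its statistics.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed rule: these suffixes mark a page; everything else is an embedded file.
PAGE_SUFFIXES = {"", ".html", ".htm"}

def summarize(paths):
    """Count total hits and the subset of hits that look like page requests."""
    counts = Counter()
    for path in paths:
        counts["hits"] += 1
        suffix = PurePosixPath(path.split("?", 1)[0]).suffix.lower()
        if suffix in PAGE_SUFFIXES:
            counts["pages"] += 1
    return counts

# Made-up request paths for one visit to two pages.
requests = [
    "/",
    "/images/logo.gif",
    "/images/button.gif",
    "/style.css",
    "/planetary/index.html",
    "/images/moon.jpg",
]
totals = summarize(requests)
print(totals["hits"], "hits for", totals["pages"], "pages")
```

In this made-up visit the hit count is three times the page count, which is
why year-over-year growth is a more meaningful signal than the absolute number
of hits.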

While the dominant mode of dissemination of data to the astrophysics and space
physics research communities is via the internet, NSSDC continues to provide
a high level of offline data dissemination. Table 13 shows that NSSDC responded
to over 900 distinct requests for “traditional” products and that NSSDC
provided over 8,700 Milky Way and COBE posters to requesters. This poster count
is up by a factor of three relative to 2000!
Table 13 also characterizes the user community
of NSSDC’s offline services. To a very large extent it is the U.S. and international
general public, the education enterprise, publishers, etc. and their desire
for NASA imagery on CD-ROM and as film products that account for most of NSSDC’s
offline request activity.

Table 14 gives the counts of requests for offline
data sets from various disciplines in 2001, and as integrated over NSSDC's history.
(A small fraction of requests that are multi-disciplinary are double counted
in this table.) Note particularly the dominance of planetary data over both
time scales. This is largely associated with lunar and planetary image data
that are widely requested by the general public. The high level of astrophysical
offline activity to a large extent reflects requests by the amateur and professional
astronomical communities for ADC catalogs on CD-ROM. Most offline space physics
request activity was for copies of the IMAGE-based “Solar Storms” video tape.

Table 15 shows the most recent 5-year history
of NSSDC's offline satellite data request activity by media type. Several points
are noteworthy. The dominant mode of offline digital data dissemination continues
to be by CD-ROM. It is of interest to note that every working day of 2001,
NSSDC mailed about 9 CD-ROMs to 2 requesters. These numbers are down somewhat
from 2000 as more members of the general public are able to access NSSDC’s data
electronically. Also significant in Table 15
is the fact that while requests to NSSDC for film data declined somewhat over
the past year, the number of film products mailed was steady (excluding effects
of one anomalously large request satisfied in 2000). Finally, for this report
we drop the reporting of magnetic tape dissemination statistics and initiate
the reporting of dissemination of videotapes. The videotapes are those created
within the GSFC Space Science Data Operations Office.

5. ADDITIONAL NSSDC SERVICES

In addition to its archive of scientific data and the variety of data interfaces
characterized in the preceding part of this Annual Report, NSSDC offers a number
of additional services, which are described in this Section.

5.1 NSSDC Information Management System

The NSSDC Information Management System (NIMS) database now encompasses many
of the separate databases that NSSDC has used to track data and information
through the years. The combining of these databases, on an Oracle-dedicated
host computer, has yielded improved performance and reliability for NSSDC users
and staff. To aid readers through a transition in terminology, we use a mix
of old and new terminology below.

The Automated Internal Management (AIM) database identified virtually all launched
spacecraft, the experiments carried by many of those spacecraft, and data sets
from those spacecraft primarily as archived at NSSDC. This database served
as the source of information for many of NSSDC's WWW information pages. The
NSSDC Master Catalog (NMC)
and a number of discipline and project pages retrieved information from AIM
and built WWW pages "on the fly" so that the latest information was
presented to the user.

The NSSDC Supplementary Data File (NSDF) was similar to AIM, but tracked non-spacecraft
data, multi-source spacecraft or other data, models and programs, and other
NSSDC-held data sets that did not fit the AIM spacecraft/experiment/data set
hierarchy.

The AIM and NSDF databases have recently been merged into a single JEDS (Java
Experiments, Data sets, and Spacecraft) "database" within NIMS that
continues as the database underlying the unchanged NMC web page. JEDS content
statistics are reported in Table 16. Note that JEDS holds records for over
5,800 spacecraft, 5,000 experiments and 5,000 data sets. Of these data sets,
only a small minority are former NSDF data sets; the remainder are from AIM.
Not all NSDF data sets had been moved to JEDS as of 12/31/01.

During 2001, there were 2.56 million accesses to JEDS through the NMC web pages,
a 56% increase over 2000. Of these, 70% and 12% were to spacecraft- and experiment-descriptive
material, respectively. Most of these were likely from the general public.

The Technical Reference File (TRF) tracks individual published papers associated
with space flight experiments. The NSSDC ID for the experiment is attached to
the reference information so lists of papers relevant to a particular experiment
can be reported, and/or provided to persons accessing data from a given experiment
from NSSDC. Table 17 shows that 971 papers
were newly identified in TRF during 2001 mainly as the result of NSSDC staffers
reviewing the Journal of Geophysical Research and Geophysical Research Letters.
The TRF was used to generate the Appendix listing 96 NSSDC-acknowledged papers
published in 2001.

The Java-based Request and Name Directory (JRAND; formerly IRAND) tracks people
who have interacted with NSSDC over the years. It includes full names, one or
more addresses, telephone and email information, and what NSSDC distribution
lists they are on. The database contains approximately 57,000 entries. This
information is also accessed and made available through the PIMS
interface on the NSSDC WWW Home Page. Further JRAND statistics are available
as Table 18. JRAND also tracks individual staff-involved
requests for satellite and non-satellite data, now more than 82,500 over the
years.

The Interactive Data Archive (IDA) is another database of interest. IDA tracks
the inventory of NSSDC's digital data volumes (tapes, disks, etc.). IDA had
167,807 records at the end of 2001, with 3,383 records having been added during
2001.

5.2 NASA/Science Office of Standards and Technology (NOST) at NSSDC

NOST's mission is to facilitate the recognition and use of standards to reduce
cost/benefit ratios in the exchange and management of scientific data among NASA
entities and the scientific communities they serve. NOST's Web Home Page is at
http://ssdoo.gsfc.nasa.gov/nost/. The NOST strategy is to play a coordinating
role in helping the science disciplines
identify new standards requirements. NOST participates in partnerships with them,
other agencies, and industry on facilitating the adoption of leading-edge technologies
with national or international visibility that can be tailored to meet NASA science
information management and exchange requirements, and it assists in the process
of moving these technologies toward standards with commercial support.

5.2.1 Consultative Committee for Space Data Systems (CCSDS)

NOST operates NASA's highest level Control Authority office in accordance
with the applicable Consultative Committee for
Space Data Systems (CCSDS) and ISO standards to formally archive data
descriptions for interchange and long term preservation. New registrations
were started, and identifiers assigned, for some 26 ISIS data sets being migrated
into archival packages on new media. As of 12/31/01, there were 438 registered
identifiers, up from 411 a year ago.

NOST participated in the development of draft CCSDS/ISO standards applicable
to multi-discipline and sub-discipline information interchange. The primary
standards and their usage categories were:

Data Entity Dictionary Specification Language (DEDSL): This standard addresses
the problem of providing a standard way to document and exchange the various
attributes needed to fully define data elements. It has been harmonized with
the conceptual data element standard from ISO known as ISO 11179 and the ANSI
X3.L8 standard known as X3.285. The DEDSL is split into two components - one
addressing the conceptual model and one addressing interchange forms. The
conceptual model and an interchange form using the ISO Parameter Value Language
(PVL) were completed as CCSDS standards in 2000. An interchange form using
XML with DTDs was completed during 2001 and is in the process of being published
by CCSDS. All three of these standards have also been submitted to ISO for
their approval. These standards support the publication and exchange of data
elements, and groups of data elements, and should lead to more automated access
and understanding of data across science disciplines and among organizations.

Reference Model for an Open Archival Information System (OAIS): This standard provides
a conceptual model of a digital archive, including a functional view and an
information view, and it provides a framework for discussing migration issues
and interactions among archives. The model establishes initial criteria for
recognition of a true archival function and should lead to improved archival
implementations, provide a basis for further standardization, and provide
more cost-effective vendor support. It has been adopted as a starting point,
in addressing digital preservation issues, by an ever growing variety of organizations
around the world. The reference model draft underwent formal agency review
as both a CCSDS standard and an ISO standard, and was updated in accordance
with the comments. It has undergone a second CCSDS review and has not yet
emerged from the second ISO review. As very few additional comments are anticipated
it is expected to become a full ISO standard in the Spring of 2002. It can
be found at http://www.ccsds.org/documents/pdf/CCSDS-650.0-R-2.pdf.

5.2.2 XML Workshop

During 2001, NOST organized and hosted a 5-day XML workshop under the sponsorship
of the CCSDS. This workshop brought together some 24 experts from the US
and international space agencies, academia, and industry to address the application
and harmonization of XML efforts in the space domain leading to viable standards.
Good progress was made in the technical area of packaging data using XML and
in identifying the objectives of, and need for, a continuing working group. It
was recommended that a formal group be established under the CCSDS to further
the harmonization effort. This action has gone to a vote by the members of
the CCSDS Management Council and it appears it will be accepted. The workshop
report is available from: http://www.ccsds.org/meetings/xml2001summer/papers/ReportOfXMLWG.doc

5.3 Astronomical Data Center

In the Astronomical Data Center (ADC) (http://adc.gsfc.nasa.gov/), over 3,700
astronomical source catalogs and journal tables are maintained
online for easy access. Some 700 new data sets were acquired during the past
year. Entire catalogs and tables can be retrieved via FTP. Web-based visualization
tools (http://tarantella.gsfc.nasa.gov/adf/visualization/design.html) are
available for browsing, plotting, and subsetting the contents of the catalogs
and tables before download. Users can query interactively for information on
individual plotted data points and search for observations made by NASA missions.

ADC staff members conduct research using the eXtensible Markup Language (XML),
and ADC users are benefiting from this effort. During 2001 the ADC established
a public XML-based repository of 500 ADC data sets and unveiled the first data
access services for this repository. An example of the new services is a form-based
metadata search capability that takes advantage of various browse indices (author,
keyword, etc.). The XML repository and services were shown to attendees of the
American Astronomical Society Meeting in Washington, D.C. during the week of
January 7, 2002. Access to the ADC's XML Public Archive is available on the
Web at http://xml.gsfc.nasa.gov/archive/.

The ADC’s research activities and data holdings
are highly relevant to the development of a National Virtual Observatory. During
2001 ADC staff participated in the Aspen NVO workshop and the NSF-funded NVO
Framework project, and gave talks at four universities.

ADC staff members are developing, as Project
AstroData, a pilot series of on-line science education tutorials and exercises
for K-12 students. These products are intended to demonstrate the connections
between new HST observations and existing astronomical data at the ADC that
students can easily access. The AstroData web site is at
http://adc.gsfc.nasa.gov/adc/education/astrodata/.

5.4 Common Data Format

The NSSDC Common Data Format (CDF) is a self-describing data format for the
storage and manipulation of multidimensional data in a discipline-independent
fashion. CDF comprises three parts: the CDF data files, which contain both the
actual data values and metadata; the CDF software library, which is used to
create, access, manage and manipulate CDF files; and a well-defined
Applications Programming Interface (known as the CDF Interface) that provides
transparent access to underlying software and data. The NASA ISTP and IMAGE
missions and the ESA Cluster mission use CDF extensively. We also note that CDF
underlies NSSDC’s OMNIWeb, COHOWeb, CDAWeb and SSCWeb services.

During 2001, NSSDC's CDF office released CDF 2.7.1 and the CDF Perl Applications
Programming Interfaces (APIs). CDF 2.7.1 contains a more robust CDF library;
built-in installation support for Solaris on PC, Mac OS X on Macintosh, and
Linux on DEC Alpha; and a complete suite of the 7 CDF text-based and Graphical
User Interface (GUI)-based tools. In addition to the existing C, Fortran, and
Java APIs, the new CDF Perl APIs enable a CDF application to be written as a
Perl script and run without modification on any Perl-supported platform.

To facilitate and promote data sharing with other data formats, the CDF office
developed an HDF5-to-CDF translator and adopted Extensible Markup Language
(XML) as a basis for establishing interoperability with other scientific data
formats. The adoption of XML resulted in the creation of the CDF Markup Language
(CDFML), which employs some of the basic building blocks/objects defined in
eXtensible Data Format (XDF) within CDF tags to describe CDF data and metadata.
XDF is an XML-based scientific data format, and it is considered by many to be
the most mature Web-based scientific data format available today.

A web page at http://nssdc.gsfc.nasa.gov/cdf/ provides a description of CDF,
access to the software distribution, documentation, papers, and a list of
Frequently Asked Questions, and it facilitates interaction with the CDF
support group at the NSSDC.

Approximately 14,000 files were FTP-downloaded from the CDF directory of NSSDC’s
anonymous account during 2001. These were mostly files describing CDF, software
tools from the CDF library, etc. In addition, a great many users browse the
CDF web pages identified above.