Abstract

The overarching goal of Unidata's Thematic Real-time
Environmental Distributed Data Services (THREDDS) is to provide
students, educators and researchers with coherent access to a
large collection of real-time and archived datasets from a
variety of environmental data sources at a number of
distributed server sites. The datasets will be conveniently
accessible from a collection of THREDDS-enabled data analysis
and display tools. THREDDS will provide real-time data delivery
via reliable, event-driven "push" technology as well as
transparent access to datasets using "pull" systems that make
it possible to access data on remote servers as if they were on
the user's own computer. The system will be built on a set of
software components and data servers that are already in
operation or under development. The heart of THREDDS is
metadata contained in publishable inventories and catalogs
(PICats). The creation, publication and distribution of PICats
will be facilitated by the discovery system and services
provided by DLESE. For example, sites receiving real-time
environmental data can create PICats describing data products
automatically as they arrive using decoders and crawlers. On
the other hand, since PICats do not have to reside on the
server with the data, researchers will be able to create PICats
for online publications that point to datasets residing on
several data servers. Similarly, educators will incorporate
PICats of illustrative datasets into modules that also include
tools for data analysis and visualization, and students will be
able to use PICats to point to datasets related to their
research projects, just as they now use URLs to point to
relevant documents. This paper presents an overview of THREDDS
and an update on the current status.

1 Overview

In "The Absorbent Mind" Maria Montessori described education
simply and elegantly: "It is not acquired by listening to
words, but in virtue of experiences in which the child acts
on his environment". On a different level with more detail,
the National Science Education Standards describe a learning
process based on inquiry: "Inquiry is a multifaceted activity
that involves ... using tools to gather, analyze, and
interpret data; proposing answers, explanations, and
predictions; and communicating the results". These quotes
capture the essence of the interactive data environment that
Thematic Real-time Environmental Distributed Data Services
(THREDDS) will foster.

Each second of each day, observing systems around the globe
are gathering data that provide snapshots of almost every
measurable aspect of our environment: satellites monitor
cloud movements, atmospheric constituents and the temperature
of the land and ocean surfaces. Lightning strikes are
recorded as they occur throughout the country. Global
positioning system and seismic sensors monitor tiny movements
as well as major shifts of the planet's tectonic plates.
Modeling programs are being developed that use the current
data to forecast future evolution on scales ranging from
short-term weather forecasts to very long-term climatic
changes.

The goal of this work is to expand the means by which
learners -- including students, educators, scientists and the
general public -- can use these vast resources to perform
their own inquiries, i.e. to "act on their environment".
Figure 1, a screen dump from a prototype of one of the
THREDDS interactive data analysis and display applications,
illustrates a few of the ways in which users can interact
with environmental datasets that are accessed from remote
servers as if they were on local disks. In this particular
instance, the display is a 3D rendering of the jet stream as
predicted by a supercomputer model dataset on a server at the
National Center for Atmospheric Research (NCAR).

Figure 1. Interactive data analysis and display
application (the screen image above was created by
software engineer Stuart Wier of the Unidata Program Center
MetApps project)

Data collections are a cornerstone of the scientific research
and education environment. While the amount and variety of
earth system data are increasing daily, the systems for
making these data readily available and useful to the
academic community have not kept pace. We envision a
framework -- a scientific data web -- that will allow faculty
and students to search (in the vocabulary of their particular
discipline) for available data and to find them, regardless
of where the data reside. Just having the data is not enough,
however. Even the many spectacular pictures generated from
datasets available on the Web present an essentially passive
view of what is happening. To interact with the environmental
phenomena represented by the data, users need specialized
visualization and analysis tools that enable them to
manipulate and examine the datasets themselves. They need to
create their own visual images, and they must be able to
manipulate those images in 3D space and perhaps even "fly"
through and around them. It should be possible to move a
probe around in the image to see how the temperature or
pressure changes with depth in the ocean or height in the
atmosphere at different points on the globe. Moreover, it is
important to overlay images of data from different sources.
For example, at the time of a severe thunderstorm, one might
ask how the information about rainfall from a nearby radar
site correlates with measurements of stream flows in the
local river basin. If those measurements indicate a problem
is arising, it would be valuable to overlay predictions from
forecast (meteorological and hydrological) models. Ultimately
it may be important to include demographic information about
populations in threatened areas.

As a two-year project with limited resources, THREDDS clearly
will not do all of this. However, our goal is to build key
components that will make such a system possible and to
incorporate them into a working prototype that includes a
large number of data providers, a group of interactive tool
builders, metadata experts, and representatives of the
digital library community. The broad access to data and
analysis tools envisioned in the prototype scientific data
web will enable educators to work with data in classrooms,
scientists to examine and incorporate data from other
disciplines, and students to explore and test their ideas
using the yardstick of data. Indeed, in the end, anyone with
Internet access will be able to incorporate scientific data
into their everyday lives more easily.

2 Strategy: a Variety of Tools and Data Sources Bound by Metadata Catalogs

2.1 Interactive Data Analysis and Display Tools

The strategic goal of THREDDS is to provide students,
educators and researchers with coherent access to a large
collection of real-time and archived datasets from a variety
of environmental data sources at a number of distributed
server sites. The datasets will be conveniently accessible
from a collection of THREDDS-enabled data analysis and
display tools. The arsenal of tools includes Web-based "thin"
clients that allow the learner to browse and manipulate data
using the processing power on the servers; interactive data
analysis applets that can be embedded directly into HTML
educational documents; and full "thick" client applications
that harness the computing power and flexibility of the user's
own workstation while accessing data from a collection of
remote servers.

2.1.1 "Thin" Client Browser-based Analysis and Display Systems

On a superficial level, the browser-accessible data analysis
and display tools look similar to the more traditional Web
sites that offer a display of images generated from data.
There is one important difference: namely, these thin clients
enable the user to interact directly with the data by using a
set of analysis tools that run on the server. An example of
this powerful server-based approach resides at the Climate Data
Library of the International Research Institute (IRI) for
Climate Prediction at Lamont-Doherty Earth Observatory
(LDEO). The Climate Data Library enables interactive analysis
of datasets on the server via the INGRID system developed by
Benno Blumenthal. A second example is the Live Access
Server (LAS), which was developed at the Pacific Marine
Environmental Laboratory (PMEL) under the direction of Steve
Hankin.

The screen shot in Figure 2 is part of a Web page from the
collection of interactive WeatherWise (WXWise) applets
developed by a team led by Tom Whittaker and Steve Ackerman
for use in courses at the University of Wisconsin-Madison.
This particular applet accesses a current
infrared satellite image and allows the learner to see how a
portion of the image would change if the temperature were
higher or lower than it actually is. The learner is then
asked to respond to questions at the bottom of the page. It
is an illustration of an embedded Java applet that allows for
direct interaction with real-time environmental data stored
on THREDDS servers. You can activate the WeatherWise applet
in a Java-enabled browser by clicking on the image.

2.1.3 Fully Interactive "Thick" Client Applications

The animated loop in Figure 3 is a series of screen dumps
from a prototype application of the Unidata MetApps project.
The loop shows how the user can interact with data on a
remote server. The panels on the left show the parameters
available in the dataset under investigation -- along with a
set of options for viewing the data. The specific data that
have been selected for the 3D rendering are views of the jet
stream predicted by a supercomputer forecast model run at the
National Centers for Environmental Prediction and delivered
to a THREDDS server at NCAR via Unidata's Internet Data
Distribution (IDD) system. Using the Distributed Oceanographic
Data System (DODS) client-server protocol, the application was
able to bring across only the subset of the data needed for
the visualization. The loop illustrates several aspects of
the image that were generated by the user manipulating the 3D
image with her mouse.
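
To make the subsetting step more concrete, the following is a
minimal sketch of how a client might build a DODS request for
only part of one variable. It assumes the DODS convention of
appending a constraint expression of the form
variable[start:stride:stop] (one bracketed range per dimension)
to the dataset URL. The server address, dataset path, variable
name and index ranges are hypothetical and are not taken from
the MetApps application.

```python
def dods_subset_url(dataset_url, variable, ranges, suffix=".dods"):
    """Build a DODS request URL asking for a hyperslab of one variable.

    ranges holds one (start, stride, stop) index triple per dimension
    of the variable; stop is inclusive.
    """
    hyperslab = "".join(
        "[%d:%d:%d]" % (start, stride, stop) for start, stride, stop in ranges
    )
    return "%s%s?%s%s" % (dataset_url, suffix, variable, hyperslab)


# Hypothetical server, dataset and variable names, for illustration only.
url = dods_subset_url(
    "http://dods.example.edu/dods/model/forecast.nc",
    "wind_speed",
    [(0, 1, 0), (10, 1, 20), (0, 2, 100), (0, 2, 200)],
)
print(url)
```

Because the index ranges travel in the request itself, the
server can extract the requested slab and return only those
values rather than the whole file.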

In the long term, the intention is to develop THREDDS
capabilities to the point where one can embed pointers to
datasets and tools into online publications such as this one.
In the meantime, it is still necessary to install some
client-side software components on your own computer. If
you are interested, this can be done for the current beta test
version of at least one of the client applications. There are
two approaches: one is to get the full Java application
running on your own computer; the other is to use a Java
application startup facility called WebStart. Both approaches
are described by Stuart Wier at
http://www.unidata.ucar.edu/staff/wier/index.html.

2.2 Distributed Data Sources

The schematic in Figure 4 shows how a user running a THREDDS
client on a local workstation can access data from a number
of distributed servers, each of which has its own emphasis or
"theme". Many of the servers are in turn populated with
environmental data in real time via the IDD system that has
been delivering data to nearly 100 universities for the last
seven years. A few of these servers already exist, others are
being built, and a couple (the streamflow and demographic
data servers) are still in the formative idea stage.

Figure 4. Client data access from distributed data
servers

Figure 5 shows how data from a set of servers can be plotted
together in an interactive application. Only the required
portions of the datasets are transmitted over the network and
the application can allow for the wide variety of spatial and
temporal resolutions for each data element. This particular
screen image is one frame from an animation showing the
evolution of the data over time.

Figure 5. Interactive analysis and visualization of
data from distributed servers (the screen image
above was created by Don Murray, lead software engineer on
the Unidata Program Center MetApps project. The prototype
application that generated the image was developed by Unidata
in collaboration with the Atmospheric Technology Division at
the National Center for Atmospheric Research)

2.3 Metadata Catalogs

At the heart of THREDDS is metadata contained in publishable
inventories and catalogs. Based on XML, these inventories and
catalogs can be created in many different ways. Data
providers receiving real-time environmental data are
instrumenting decoders to create entries describing data
products as they arrive and become part of the data server
inventory. Crawlers are being implemented to create
inventories by traversing existing retrospective data
collections. Since catalogs do not have to reside on the data
servers, researchers will be able to create specialized or
personal catalogs for research publications that point to
datasets residing on several data servers. Educators will
incorporate catalogs of illustrative datasets into
educational modules that also include tools for data analysis
and visualization. Just as they now use URLs to
point to relevant documents, students will eventually be able
to reference datasets and analysis tools related to their
research projects. Since the inventories and catalogs are
text-based, they can be "harvested" and indexed into Digital
Library for Earth System Education (DLESE) and other digital
libraries.
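
As a rough illustration of why text-based XML catalogs are easy
to harvest, the sketch below parses a small catalog that points
to datasets on several servers and lists its entries. The
element and attribute names (catalog, collection, dataset, name,
url) and all server addresses are assumptions made for this
example; they are not the actual THREDDS catalog schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical personal catalog; element names and URLs are illustrative.
CATALOG_XML = """
<catalog name="Severe storm case study">
  <collection name="Radar">
    <dataset name="Rainfall estimate"
             url="http://radar.example.edu/dods/rain.nc"/>
  </collection>
  <collection name="Hydrology">
    <dataset name="Stream flow"
             url="http://hydro.example.org/dods/flow.nc"/>
    <dataset name="Basin forecast"
             url="http://model.example.gov/dods/basin.nc"/>
  </collection>
</catalog>
"""


def list_datasets(catalog_text):
    """Return (collection, dataset name, url) for every catalog entry."""
    root = ET.fromstring(catalog_text.strip())
    entries = []
    for collection in root.findall("collection"):
        for dataset in collection.findall("dataset"):
            entries.append(
                (collection.get("name"), dataset.get("name"), dataset.get("url"))
            )
    return entries


entries = list_datasets(CATALOG_XML)
for collection_name, dataset_name, url in entries:
    print(collection_name, "|", dataset_name, "|", url)
```

The catalog itself holds only names and pointers, so it can
live on any machine, and an indexing service needs nothing more
than an XML parser to read it.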

The screen shot in Figure 6 is also from a prototype client
data analysis application, part of the Unidata MetApps
development project. The screen illustrates key aspects of
THREDDS data catalog access from within a client application.
First, the pop-up "Choose DODS Dataset" window enables access
to several catalog servers on different machines on the
Internet. The lower part of the pop-up window shows a menu of
data items available on one of the servers. This particular
catalog has dataset entries arranged three different ways: by
variable, by model, and by experiment. The details of the
individual catalog entries are not important, but one should
note that the words associated with each dataset or
collection of datasets can be chosen by the creator of the
catalog and that the catalog itself can refer to datasets and
collections of datasets on a variety of data servers.

Figure 7 is a screen shot from another MetApps client. It
depicts a catalog that is automatically generated as
real-time weather forecast model data arrive at the
motherlode server at NCAR. In this case, the main menu items
are the names of the various models and one of the model
collections, SST-A, has been opened to show the individual
datasets available on the server. In essence, the
hierarchical list in this case comprises an inventory of the
model output datasets available on the server at the time.
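
One way such an inventory could be assembled as products arrive
is sketched below: each arriving dataset is filed under the
collection for its model, and the result is a hierarchical XML
listing. This is only an illustration under stated assumptions.
The element names, the dataset names and the model name MODEL-B
are hypothetical (only SST-A appears in the catalog described
above), and this is not the code that runs on the server.

```python
import xml.etree.ElementTree as ET


def build_inventory(arrivals):
    """Group arriving datasets by model into a hierarchical XML inventory.

    arrivals is an iterable of (model_name, dataset_name) pairs in the
    order the products reached the server.
    """
    root = ET.Element("inventory")
    collections = {}
    for model_name, dataset_name in arrivals:
        if model_name not in collections:
            # First product seen for this model: open a new collection.
            collections[model_name] = ET.SubElement(
                root, "collection", name=model_name
            )
        ET.SubElement(collections[model_name], "dataset", name=dataset_name)
    return root


# Hypothetical arrival sequence, for illustration only.
arrivals = [
    ("SST-A", "run_0000Z"),
    ("MODEL-B", "run_0000Z"),
    ("SST-A", "run_1200Z"),
]
inventory = build_inventory(arrivals)
print(ET.tostring(inventory, encoding="unicode"))
```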

Figure 7. Data server inventory listing as seen in
analysis and display tool (click on the image to see
the current version of the catalog - needs an up-to-date
version of Internet Explorer)

Figure 8 is a different view of the catalog shown in
Figure 7. Whereas Figure 7 shows the catalog from within an
application that accesses it, the view below shows the actual
XML code for the catalog as displayed by the Internet Explorer
browser. If
you are viewing this page with a recent version of
Internet Explorer, you should be able to look at the current
version of the catalog by clicking on either Figure 7 or
Figure 8.

Figure 8. Data server catalog in native XML
form (click on the image to see the current version
of the catalog - needs an up-to-date version of
Internet Explorer)

3 Teams

THREDDS is a highly collaborative project, and this section
lists the partners working on the three main areas of
THREDDS development: a set of data provider sites; a group of
software developers working on systems for data analysis and
display; and a set of experts on metadata for Earth
system data collections.

3.1 Data Providers

The following institutions have agreed to be data-server
partners:

The National Climatic Data Center (NCDC), including the NOAA
Operational Model Archive and Distribution System (NOMADS)

Note that NCAR and SSEC will serve as testbed sites for
server-side software. As the project progresses and the
common underpinnings are tested at the initial sites,
additional sites will be added. Sites under consideration
are:

University of Florence Interoperability System for
supporting the Italian Scientific Community working in the
Earth Observation from the Space (SINOTS) for European
satellite data.

It is not possible in this article to provide a detailed
description of the content of each of these sites. Some are
large national data centers. To give a sense of the magnitude
and breadth of a typical THREDDS server, the prototype
systems at NCAR are initially targeted to handle about 1
terabyte of data online. This will hold several months of
data arriving at the site at a rate of about 10 gigabytes
each day. During busy hours, more than 1 gigabyte of data
arrives at the server, with several products each second. The
products range from satellite images and the output of
numerical weather prediction models that are hundreds of
megabytes to 80-character reports from individual weather
reporting stations from around the world. In between, the
product list includes lightning strike data; images and
four-dimensional volume scans from NEXRAD radar sites;
atmospheric data recorded by commercial aircraft in flight;
and vertical profiles taken by weather balloons. By the end
of the project, we hope to find resources to be able to store
a full year of data on the prototype server. The reader is
encouraged to visit the sites to get a more detailed
understanding of the holdings.
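
The storage figures above can be checked with a few lines of
arithmetic. The sketch below assumes a sustained ingest of about
10 gigabytes per day, which is the rate at which 1 terabyte of
disk lasts several months, and decimal units (1 terabyte = 1000
gigabytes). Both function names are introduced here for
illustration.

```python
def retention_days(capacity_gb, ingest_gb_per_day):
    """Days of data that fit online before the oldest must be removed."""
    return capacity_gb / ingest_gb_per_day


def capacity_needed_gb(days, ingest_gb_per_day):
    """Disk space needed to keep the given number of days online."""
    return days * ingest_gb_per_day


ONE_TERABYTE_GB = 1000   # decimal units, an assumption
DAILY_INGEST_GB = 10     # assumed sustained ingest rate

days_online = retention_days(ONE_TERABYTE_GB, DAILY_INGEST_GB)
full_year_gb = capacity_needed_gb(365, DAILY_INGEST_GB)
print(days_online)    # 100.0 days, a little over three months
print(full_year_gb)   # 3650 gigabytes, about 3.65 terabytes
```

Under these assumptions, keeping a full year of data online
would take between three and four times the initial 1 terabyte.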

3.2 Client Analysis and Display Tools

The THREDDS prototype will provide examples of a wide variety
of working applications that use our metadata framework to
find, analyze and display data from server sites. This will
demonstrate an end-to-end system for data access and
visualization. The following developers will incorporate our
client-side data-access components (class libraries and
metadata access) into their own data manipulation tools:

Live Access Server (LAS, PMEL, Steve Hankin). LAS illustrates
the use of a Web-based (thin) client with the bulk of the
analysis and display generation done on the server side.

Ingrid (IRI/LDEO, Benno Blumenthal). This is another example
of a system enabling analysis and display of data via a Web
browser.

WXWise applets (the University of Wisconsin-Madison, Tom
Whittaker). These applets illustrate the use of Java to embed
data-analysis and display tools directly into educational
modules on a Web site.

Virtual Geophysical Exploration Environment (VGEE, formerly
The Virtual Exploratorium, the University of Illinois, West
Chester State, DLESE, and NCAR, Don Middleton). This
application incorporates the educational functions directly
into the data analysis and display tool itself.

Data Discovery Toolkit and Foundry based on EDMI (Earth
Data Multimedia Instrument, New Media Studio, Bruce
Caron). This is a set of data-analysis and display
tools based on IDL and Macromedia Director. They can be
used to generate elaborate educational modules.

Meteorological Applications (MetApps) (Unidata Program
Center, Don Murray). A set of pure Java, platform-independent,
two- and three-dimensional data-analysis and display tools
based on the VisAD infrastructure.

3.3 Metadata Expertise

As noted earlier, the technological core of this initiative,
the crucial component now under development, is a system for
adding the semantic description of scientific datasets
necessary for data manipulation and discovery. It must
interoperate with data providers, data servers, data clients,
catalog servers, discovery systems and other middleware
components. Investigators will select key scientific datasets
and develop semantic descriptions for an end-to-end
demonstration of the utility of this approach. Unidata staff
will work closely with DLESE to ensure that the resulting
metadata system will interoperate effectively with the
National STEM (Science Technology Engineering Math) Digital
Library (NSDL).

Partners with whom we will consult on matters of metadata and
interoperability are:

The University of Florence (Italy).
Prof. Stefano Nativi is acting as a liaison with the
international metadata standards community.

4 Conclusions

In perhaps a different way than Maria Montessori originally
envisioned, THREDDS will provide a way in which we can learn
by "acting on our environment". Much work remains to be done
to achieve the long-range THREDDS mission of developing an
environmental data web that allows learners of all ages to
find and interact with datasets that illustrate the current
state of the global environment, but we have designed the
system and have begun construction. This article provides a
glimpse of the interactivity that will be possible, a sense
of the range of data types and partners involved in the
effort, and a basic understanding of the architecture of the
system and the approach being taken to make it a reality.

Acknowledgements

The authors wish to thank the National Science Foundation
Division of Undergraduate Education for making this work
possible as part of the NSDL initiative under the direction
of Lee Zia. THREDDS is a highly collaborative project, so
thanks are in order to all the individuals and organizations
who are working with us as collaborative partners. These
partners have been cited individually in the article.