Presented at the AMS 12th International Conference on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology
Atlanta, Georgia
January 28-February 2, 1996

The National Weather Service (NWS) Office of Hydrology (OH) has begun development of the NOAA Hydrologic Data System (NHDS). The essential characteristic that distinguishes the NHDS from previous data management approaches is its scope and level of integration: its cross-cutting nature. Current data management systems have been designed for a specific, isolated purpose or, in some cases, for a few purposes. The NHDS will serve a diverse range of clients and needs and will provide access to the broad range of information required by new hydrologic data assimilation and analysis techniques. The information managed by the NHDS spans the time domain from real time to historical, and spans the quality domain from as-is to highly quality-controlled, that is, from raw to highly processed data. Information within the NHDS will be transitioned across the time and quality domains, from as-is real-time data to high-quality historical data. This paper discusses the NHDS from the point of view of the problems it is to address, as well as the interplay between the NHDS and the context in which its development takes place.

The question of why data is needed and how it is used may seem trite, but it is useful to reexamine it in order to establish a basis for operationally useful data systems.
Data is used in both the operational and development environments. The basic assumption we make when forecasting is that if we can simulate the past, then we can successfully predict the future given knowledge of the appropriate inputs. In an operational environment, this means that we need observations to know whether we have been successful in simulating the processes that lead to current conditions. If the simulation has not been successful, we use the information contained in the observations to bring the simulation to a reasonable representation of reality. In the development environment, we try to produce models whose physics are appropriate and whose parameters enable the physics to represent the hydrology of particular basins. Again, we use the information contained in observations as a representation of the reality we are trying to simulate, and then we vary the model physics and the model parameters until we are satisfied that we have a useful model for the particular basin.

In addition to these two situations, we can use simulations to infer information that is difficult to measure. In these cases, the inference is a function of the model physics, the model parameters, and the inputs used. For example, soil moisture accounting models can be used to infer moisture fluxes between the atmosphere and the soil surface, a current area of investigation because these fluxes are not easily characterized by measurement.
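
The development-environment loop described above (vary the parameters until the simulation reproduces the observations) can be sketched as a brute-force search over a single parameter. The one-parameter linear reservoir, the candidate grid, and the RMSE criterion are assumptions chosen purely for illustration; they are not the NWSRFS models or its calibration procedure.

```python
import math


def simulate_linear_reservoir(inflow, k, storage0=0.0):
    """Toy model (assumption): each step releases the fraction k of current storage."""
    storage = storage0
    outflow = []
    for q_in in inflow:
        storage += q_in
        q_out = k * storage
        storage -= q_out
        outflow.append(q_out)
    return outflow


def rmse(simulated, observed):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))


def calibrate(inflow, observed, candidates):
    """Return the candidate k whose simulation best matches the observations."""
    return min(candidates, key=lambda k: rmse(simulate_linear_reservoir(inflow, k), observed))


if __name__ == "__main__":
    inflow = [10.0, 0.0, 5.0, 0.0, 0.0, 20.0, 0.0, 0.0]
    observed = simulate_linear_reservoir(inflow, k=0.3)  # synthetic "observations"
    print(calibrate(inflow, observed, [i / 10 for i in range(1, 10)]))  # 0.3
```

Real calibration varies many parameters against noisy observations, so a search like this would normally be replaced by an optimizer and judged by more than one statistic.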

Observed data is used to infer the reality we are trying to represent by simulation. Generally, the physics we are trying to represent is distributed in space and time, whereas the observations we make are merely samples of that distribution. The task, then, is to interpret the samples well enough to make them useful to the physics of the models. We can use statistical tools, guided by an understanding of the physical process, to infer the actual time/space distribution that the observations have sampled. In contrast to "data analysis", which is the process of inferring reality from the samples, "quality analysis" gives us information about both potential errors in the sample and potential errors in the inference. Errors in the sample can arise not only from variation in the parameter being sampled, but also from errors in communications, transcription, equipment, etc. For example, there are cases where several devices for measuring river stage are used at a single location but with different vertical references. This can lead to simultaneous observations that appear quite different if knowledge of the vertical frame of reference is not associated with each observation. If these apparently different observations are used for data analysis without being corrected during quality analysis, they will imply a false picture of the variation in the parameter itself.
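
A minimal sketch of the quality-analysis step for the river stage example, assuming each reading carries the elevation of its own gage datum above a common reference. The record fields, device names, and the tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageReading:
    device_id: str
    stage_ft: float       # stage relative to this device's own gage datum
    datum_elev_ft: float  # elevation of that gage datum above a common reference (assumed known)


def to_common_datum(reading: StageReading) -> float:
    """Express a stage reading as water-surface elevation above the common reference."""
    return reading.datum_elev_ft + reading.stage_ft


def flag_inconsistent(readings, tolerance_ft=0.1):
    """True if simultaneous readings still disagree by more than tolerance after correction."""
    elevations = [to_common_datum(r) for r in readings]
    return max(elevations) - min(elevations) > tolerance_ft


if __name__ == "__main__":
    a = StageReading("float-gage", 12.4, 100.0)
    b = StageReading("pressure-gage", 2.4, 110.0)
    # Raw stages differ by 10 ft, but both describe the same water surface.
    print(flag_inconsistent([a, b]))  # False
```

Without the datum attribute, the 10 ft difference would pass into data analysis as if it were real variation in stage.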

Data exists in, and spans, a variety of domains. This point can be understood most clearly by looking at the time and space domains. The time at which the observation was made, i.e., the time at which the datum is valid, is the point in the time domain at which the datum exists. The time attribute may be complicated by the fact that the datum may apply to a non-instantaneous period; e.g., it may be an average or an accumulation over a period. Similarly, the datum exists in the space domain in that it has a location. The location attribute may be a point, a line, an area, etc., and the datum may be an average over the location or some typical or extreme value over the location (spatial representation). While there are many domains in which data may exist, there are a few that are particularly relevant to the NHDS.
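
One way to carry these time and space attributes with each datum is a record type like the one below. The enumerations, field names, and the location identifier are illustrative assumptions, not an NHDS schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class TimeAggregation(Enum):
    INSTANTANEOUS = "instantaneous"
    AVERAGE = "average"            # mean over the period
    ACCUMULATION = "accumulation"  # total over the period


class SpatialExtent(Enum):
    POINT = "point"
    LINE = "line"
    AREA = "area"


class SpatialRepresentation(Enum):
    AVERAGE = "average"
    TYPICAL = "typical"
    EXTREME = "extreme"


@dataclass(frozen=True)
class Observation:
    value: float
    valid_time: datetime  # the instant, or the end of the period, at which the datum is valid
    duration: timedelta   # zero for instantaneous data
    time_aggregation: TimeAggregation
    location_id: str      # hypothetical identifier for a point, line, or area
    extent: SpatialExtent
    representation: SpatialRepresentation

    def __post_init__(self):
        # Instantaneous data must have zero duration; period data must have a nonzero one.
        is_instant = self.time_aggregation is TimeAggregation.INSTANTANEOUS
        if is_instant != (self.duration == timedelta(0)):
            raise ValueError("duration is inconsistent with time_aggregation")

    def period_start(self) -> datetime:
        return self.valid_time - self.duration


if __name__ == "__main__":
    # A 6-hour basin-average precipitation accumulation.
    obs = Observation(0.45, datetime(1996, 1, 28, 12), timedelta(hours=6),
                      TimeAggregation.ACCUMULATION, "BASIN1",
                      SpatialExtent.AREA, SpatialRepresentation.AVERAGE)
    print(obs.period_start())  # 1996-01-28 06:00:00
```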

Clearly, the time and location domains are important. The time domain can also be classified into "operational" and "historical", where the term operational applies to near-real-time observations and the term historical applies to older data. Using this classification, it becomes apparent that operational data becomes historical as time moves forward.
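
Because the classification depends on the current time, it can be expressed as a function of a datum's valid time and "now". The 30-day window is purely an assumption for illustration; the paper does not define a cutoff.

```python
from datetime import datetime, timedelta

OPERATIONAL_WINDOW = timedelta(days=30)  # hypothetical cutoff


def time_class(valid_time: datetime, now: datetime,
               window: timedelta = OPERATIONAL_WINDOW) -> str:
    """Classify a datum as 'operational' or 'historical' relative to now."""
    return "operational" if now - valid_time <= window else "historical"


if __name__ == "__main__":
    t = datetime(1996, 1, 28)
    print(time_class(t, datetime(1996, 2, 2)))  # operational
    print(time_class(t, datetime(1996, 4, 1)))  # historical: same datum, later "now"
```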

The position of the data in the data type domain (what is the data type?) can be used to determine whether the data should be continuous over space and time, or discrete. This knowledge helps in determining appropriate data analysis procedures. The quality domain provides information about how strongly the information content of the data implies the true value of the parameter being described. Another way of looking at this domain is to treat it as synonymous with the level of processing, although this is not always the case. For example, the NEXRAD Stage III precipitation product has a much higher level of processing and a much higher information content than an individual rain gage measurement.

Knowing that one is combining information that exists at different positions in the various data domains helps in choosing appropriate data analysis techniques.

The classification of the time domain into operational and historical also serves to highlight the different approaches that exist today within the NWS hydrology program. Operational data is obtained from sources with a wide range of quality. The requirement to produce forecasts in near real time places a constraint on the level of processing that can be applied to the data. Data analysis applied in near real time is minimal; however, the NWS is attempting to improve the level of analysis in order to obtain a better estimate of reality. An example of this higher level of processing is the NEXRAD Precipitation Processing System.

Operational data has generally been lost! NWS offices have not been charged with archiving data. Historical data to be used in model development and calibration is generally obtained from agencies that are charged with archiving, such as the National Climatic Data Center. However, significant amounts of operational data do not enter these archives, and the sources of the archived data often differ from the real-time sources. This results in statistical differences between the operational and historical data sets. Running a model on operational data when the model was developed and calibrated using historical data therefore raises questions about the reliability of the model output.
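
One simple way to expose such statistical differences is to compare the operational and historical records for the same station over the period in which both exist. The function name, the {timestamp: value} layout, and the choice of statistics (mean difference and ratio of means) are assumptions for illustration only.

```python
import math
from statistics import mean


def compare_overlap(operational: dict, historical: dict):
    """Compare two {timestamp: value} series over their common timestamps.

    Returns (n_common, mean_difference, ratio_of_means); the difference is
    operational minus historical. Raises ValueError if the series do not overlap.
    """
    common = sorted(set(operational) & set(historical))
    if not common:
        raise ValueError("no overlapping timestamps")
    op = [operational[t] for t in common]
    hist = [historical[t] for t in common]
    diff = mean(o - h for o, h in zip(op, hist))
    hist_mean = mean(hist)
    ratio = mean(op) / hist_mean if hist_mean != 0 else math.nan
    return len(common), diff, ratio


if __name__ == "__main__":
    operational = {"1996-01-01": 1.1, "1996-01-02": 2.2, "1996-01-03": 3.3, "1996-01-04": 9.9}
    historical = {"1996-01-01": 1.0, "1996-01-02": 2.0, "1996-01-03": 3.0}
    print(compare_overlap(operational, historical))  # about (3, 0.2, 1.1)
```

A systematic ratio like this would only flag a problem; deciding which record is closer to reality is a data analysis question.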

Historical data often undergoes a broad range of quality analysis and/or data analysis by the archiving agencies. In some cases, information about the operations performed on the data is published; in other cases it is not, which adds to the complexity of the data analysis problem when the data is combined in subsequent analyses.

Another issue for the NHDS to deal with is that, while in the past we have focused on individual data points and their associated quality, newer modeling techniques are driving us toward using estimates of the time/space distribution of the sampled parameters. This requires a new approach to data analysis, one that considers the data as multiple samples of a time/space distribution rather than as independent, discrete events.
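
As one example of treating observations as samples of a spatial field, inverse-distance weighting (IDW) estimates the field at an ungaged point from the surrounding gages. IDW is used here only as a simple, well-known stand-in; the paper does not prescribe a method, and the gage coordinates and values are made up.

```python
import math


def idw_estimate(gages, x, y, power=2.0):
    """Inverse-distance-weighted estimate of a field at (x, y).

    gages: iterable of (gx, gy, value). If (x, y) coincides with a gage,
    that gage's value is returned directly.
    """
    num = 0.0
    den = 0.0
    for gx, gy, value in gages:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no gages supplied")
    return num / den


if __name__ == "__main__":
    gages = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
    print(idw_estimate(gages, 1.0, 0.0))  # 15.0, midway between two gages
```

Unlike a physically guided analysis, IDW ignores terrain and storm structure, which is why it serves only as an illustration of estimating a distribution from samples.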

Data systems in the NWS have generally grown as adjuncts to either communications systems or science applications such as modeling software. As a result, a variety of data systems have evolved, each tailored to the specific application it is associated with. The result of this evolution is that there are a number of data systems in hydrology, all of which do some of the job of managing hydrologic data and none of which do all of the job. This situation of "disaggregated" or "anarchic" data systems creates problems when a data user tries to get all appropriate data regardless of domain. The task becomes one of aggregating data from a variety of systems, in a variety of formats, into a single usable data set (which then becomes yet another system).

The NWS hydrology program relies to a large extent on the NWS River Forecast System (NWSRFS), which incorporates a custom data management system. While the NWSRFS data management system meets the majority of the needs of NWSRFS operations, it still displays the characteristics of having been developed as an adjunct to specific applications. This evolution contrasts with an approach that considers a data system initially in the context of the data to be managed. Such an approach better reflects the natural architecture of the data and, while considering the access patterns of particular applications, does not become overly constrained by them. The result is a data system that accommodates evolution in the use of the data more readily than systems that evolve as adjuncts to particular applications.

While it is important that a data system be designed from the
point of view of the data to be managed, it
would be unreasonable to imply that the system is independent of
its context. The following paragraphs
discuss a number of the contexts that are pertinent to the NHDS.

The problem of statistical differences between operational and historical data referred to earlier can be mitigated by retaining operational data for later use, merging it with the more traditional historical data sources. This implies that one requirement for the NHDS is the ability to transition data from the operational to the historical time domain. Another aspect of the disaggregation problem is that the results of data analysis done in the context of one application are not easily transferred to the context of other applications. This reduces the effectiveness of model calibration and development efforts, and militates against the results of intuitive analysis done in the midst of an event being available later in the historical domain. Applications would be more readily able to share information if it were stored in a shared data management system.
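
A minimal sketch of the operational-to-historical transition, assuming two in-memory stores keyed by (station, valid time). Aged operational records are moved into the historical store with their source retained rather than discarded. The store layout, the `source` and `operational_value` fields, the 30-day window, and the rule that an existing archive value is not overwritten are all hypothetical choices made for this sketch.

```python
from datetime import datetime, timedelta


def transition_to_historical(operational, historical, now, window=timedelta(days=30)):
    """Move records older than `window` from the operational store to the historical store.

    Both stores map (station_id, valid_time) -> dict with at least 'value' and 'source'.
    In this sketch an existing historical entry is kept, and the operational value is
    attached to it as 'operational_value' so the two can be compared later.
    Returns the number of records moved.
    """
    aged = [key for key in operational if now - key[1] > window]
    for key in aged:
        record = operational.pop(key)
        if key in historical:
            historical[key].setdefault("operational_value", record["value"])
        else:
            historical[key] = dict(record)
    return len(aged)


if __name__ == "__main__":
    jan1 = ("STN1", datetime(1996, 1, 1))
    operational = {
        jan1: {"value": 3.2, "source": "realtime"},
        ("STN2", datetime(1996, 1, 15)): {"value": 1.0, "source": "realtime"},
        ("STN1", datetime(1996, 3, 1)): {"value": 2.7, "source": "realtime"},
    }
    historical = {jan1: {"value": 3.0, "source": "archive"}}
    moved = transition_to_historical(operational, historical, now=datetime(1996, 3, 5))
    print(moved, historical[jan1])
```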

In the future, the NWS will be making greater use of ESP (Extended Streamflow Prediction) technology. ESP uses the information contained in historical data to establish distributions of possible future outcomes. Because the technology uses high-quality historical data sets in real-time computation, historical data must also be available within an operational, real-time data management system.
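
The core idea of ESP can be sketched in a few lines: start every trace from the same current model state, drive the model with the forcing observed in each historical year, and treat the resulting traces as a sample of possible outcomes. The toy linear-reservoir model, the initial storage, and the forcing values are assumptions for illustration; an operational ESP system is considerably more involved.

```python
def run_model(storage, forcing, k=0.3):
    """Toy linear-reservoir model (assumption): returns the outflow for each step of one trace."""
    flows = []
    for rain in forcing:
        storage += rain
        q = k * storage
        storage -= q
        flows.append(q)
    return flows


def esp_peaks(initial_storage, historical_forcing):
    """Run one trace per historical year from the same initial state; return sorted peak flows."""
    return sorted(max(run_model(initial_storage, f)) for f in historical_forcing.values())


def exceedance_probability(peaks, threshold):
    """Fraction of traces whose peak flow exceeds the threshold."""
    return sum(p > threshold for p in peaks) / len(peaks)


if __name__ == "__main__":
    historical_forcing = {  # made-up forcing for four historical years
        1990: [0.0, 10.0, 0.0],
        1991: [0.0, 0.0, 0.0],
        1992: [20.0, 0.0, 0.0],
        1993: [5.0, 5.0, 5.0],
    }
    peaks = esp_peaks(initial_storage=10.0, historical_forcing=historical_forcing)
    print(exceedance_probability(peaks, 5.0))  # 0.5
```

The current state carries the real-time information, while the spread of the traces comes entirely from the historical record, which is why ESP needs historical data at forecast time.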

AWIPS (Advanced Weather Interactive Processing System)
is the future processing environment for NWS
operations. Any complacency resulting from the long gestation
period of AWIPS must be dispelled. AWIPS
is almost here and we must be ready for it. One of the
fundamental assumptions in the design of AWIPS is
that it will be an integrated system, providing all of the
functionality necessary for operations. Key necessary
conditions for an integrated system that are relevant to NHDS
are:

- functions cooperate through shared data
- data can be shared if there is a common definition of the data
- data integrity is maintained through controlled access
- there is a common look and feel (or user interface)
- the system has an effective user and systems operations concept

For AWIPS, these concepts will be promoted and maintained through a software architecture that provides common system functions to applications through defined Application Programming Interfaces (APIs). Data management integration is achieved through the Data Management API, common look and feel through the Human Interface API, and system operations concepts through the Communications, System Support, and Monitoring and Control APIs. If it is to be more than just another data management system, the NHDS must exist in the AWIPS context, making use of the AWIPS system architecture, including the APIs.
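
Three of the conditions listed earlier (shared data, a common data definition, and controlled access) can be illustrated with a small interface through which every application reads and writes. This is a hypothetical Python sketch; it is not the AWIPS Data Management API, whose actual interface is not described in this paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class HydroRecord:
    """Common data definition shared by every application (hypothetical)."""
    station_id: str
    parameter: str   # e.g. "stage" or "precip"
    valid_time: str  # ISO 8601 timestamp, so string order is time order
    value: float


class SharedDataStore:
    """One store shared by all applications; only authorized applications may write."""

    def __init__(self, writers: Set[str]):
        self._writers = set(writers)
        self._records: Dict[Tuple[str, str, str], HydroRecord] = {}

    def put(self, app: str, record: HydroRecord) -> None:
        if app not in self._writers:
            raise PermissionError(f"{app} is not authorized to write")
        self._records[(record.station_id, record.parameter, record.valid_time)] = record

    def get(self, station_id: str, parameter: str) -> List[HydroRecord]:
        return sorted(
            (r for r in self._records.values()
             if r.station_id == station_id and r.parameter == parameter),
            key=lambda r: r.valid_time,
        )


if __name__ == "__main__":
    store = SharedDataStore(writers={"ingest"})
    store.put("ingest", HydroRecord("STN1", "stage", "1996-01-28T12:00Z", 4.2))
    print(store.get("STN1", "stage"))  # any application may read
```

Routing all writes through one gate is the sketch's stand-in for "data integrity through controlled access"; a real system would also need concurrency control and persistence.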

As AWIPS is a system that will evolve through continuous improvement (Pre-Planned Product Improvement, or P3I, in AWIPS parlance), the NHDS must also be able to accommodate planned improvements in NWS hydrologic applications, such as those foreshadowed by the Advanced Hydrologic Prediction System.

New data analysis procedures and science are required to generate the high-quality historical input to calibration, ESP, and model development. The NHDS will provide tools to generate and use high-quality historical data sets in a productive environment that provides integrated access to all relevant data. The environment will provide unified logical access for all hydrology applications, whether developed by OH or in the field, and the results of data analysis will remain available for future reuse. The NHDS will exist within the context of AWIPS, where many elements of NHDS functionality will be provided by AWIPS.