ESRL Global Systems Division

The Object Data System

Paul Hamer
Colorado State University - Cooperative Institute for Research in the Atmosphere
NOAA/ESRL/GSD
325 Broadway R/GSD
Boulder, CO 80305

Abstract

This paper describes the evolution of the Forecast Systems Laboratory (FSL) [1] Central Facility data ingest, routing, and product generation systems from the legacy configuration to a system that is highly configurable and supports the processing of datasets in both real-time and non-real-time (case-study) situations. In addition, we highlight the significant cost savings made through the use of Object Oriented Analysis and Design (OOAD) coupled with the use of Open Source [2] software for both the development and deployment of the resulting systems. In particular, we discuss the reduced software development costs in generating the required products from both existing and new datasets obtained by the new system. Finally, we give examples of how the system supports the FSL mission of technology transfer to entities outside the laboratory, and we discuss possible future developments within the Central Facility.

1. Introduction

At FSL the Central Facility plays an important role in supporting both scientists and engineers in the development of the forecast systems to be used, now and in the future, by both the National Weather Service and private industry. Within the Central Facility, the Data Systems Group (DSG) is responsible for obtaining, storing, transforming, and distributing any dataset required by the research groups within FSL.

Since dataset volumes and types are ever increasing, a refactoring of the existing ingest and processing systems was considered necessary in order to reduce both software costs and the duration of the software development lifecycle. Initial efforts to reduce costs involved the introduction of Open Source [2] software for both development and deployment (e.g., the GNU Compiler Collection (GCC) and other GNU tools). A switch to Linux running on less costly hardware was also undertaken, while software portability to other platforms was maintained. Because one of the largest cost reductions comes from shortening software development time, refactoring the legacy systems using OOAD was considered important.

First, consider the legacy system, developed in the early nineties and called the Posix Data System (PDS), which contained at its core the Networked Information and Management client-Based User Service (NIMBUS) [3]. PDS/NIMBUS marked the DSG switch from proprietary operating systems to an open architecture under UNIX. The basic concept was for processes to pass data and notifications via a routing process (called the "cloud") to other processes that had registered to receive those data (see Figure 1).

Figure 1 - PDS/NIMBUS Architecture

PDS/NIMBUS worked well initially, but the effort required to introduce new datasets became increasingly costly, both in software development and in the configuration required for the software to operate. The development of the Object Data System (ODS) was initiated with the aim of reducing the time taken to introduce new datasets and generate user-requested products while, at the same time, increasing the flexibility of the configuration to allow rapid introduction of the new software. As an additional benefit, the increased flexibility and the open source development tools used make it easier to port and run the software outside the confines of the Central Facility, supporting the FSL mission of technology transfer.

2. The Object System Design

From the start, ODS was intended to leverage existing software packages, for example the Local Data Manager (LDM) [4] from Unidata, and to decouple the metadata from the datasets themselves. Further, the design of ODS was driven primarily by looking for known "patterns" of software design as solutions to the problems faced by DSG in carrying out its function of supplying required data.

The first design decision was to remove the "cloud" server from the distribution scheme. Consequently, data ingest and routing are now handled through LDM. The advantages of this approach are that LDM is widely used, several existing software packages are available for handling common meteorological datasets, and, being open source, LDM can be extended by developers to better support ODS development. Extensions to, and bug fixes in, LDM are discussed with Unidata regularly and, when implemented, are introduced into new versions of LDM that are then released to the user community. Since a full description of ODS is beyond the scope of this document, the following sections highlight some of the concepts used in the development of the system and give an example of how the Central Facility contributes to FSL's mission.

2.1 Design Patterns [5]

Most well-designed object-oriented architectures contain patterns, where a pattern is nothing more than a simple and elegant solution to a specific problem in object-oriented software.

Figure 2 - The Observer Pattern

Within ODS we have attempted to identify those patterns and have implemented solutions accordingly. For example, access to data from LDM is achieved through the Observer pattern, which allows clients to access LDM data in real time as they arrive in the LDM product queue (see Figure 2). From this pattern we were able to develop processes that handle all datasets in a common way, simplifying both development and deployment. Other patterns commonly used are the Singleton/Multiton, Proxy, and Factory patterns, all of which provided well-crafted solutions to problems faced in the development of ODS.
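As a minimal sketch of how the Observer pattern applies here (the class and method names are hypothetical illustrations, not actual ODS identifiers), a subject notifies registered observers as products arrive:

```python
# Observer-pattern sketch: a product queue (subject) notifies registered
# observers as new data products arrive. In ODS the products would arrive
# via LDM; here insertion is simulated.

class ProductQueue:
    """Subject: holds observers and notifies them on each new product."""
    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def insert(self, product_key, data):
        for observer in self._observers:
            observer.update(product_key, data)

class NetcdfGenerator:
    """Observer: reacts in real time to products matching its key prefix."""
    def __init__(self, key_prefix):
        self.key_prefix = key_prefix
        self.handled = []

    def update(self, product_key, data):
        if product_key.startswith(self.key_prefix):
            self.handled.append(product_key)

queue = ProductQueue()
generator = NetcdfGenerator("GRIB/")
queue.attach(generator)
queue.insert("GRIB/ruc/20050101", b"...")
queue.insert("NEXRAD/kftg/20050101", b"...")
print(generator.handled)  # only the GRIB product is handled
```

Because every dataset handler implements the same observer interface, new datasets plug into the same notification machinery, which is the source of the common handling described above.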

2.2 ODS Architecture

Using LDM as the data routing mechanism gives ODS a facility for triggering jobs on events via notifications. This is handled through a product queue action in which specified product keys cause jobs to be spawned to handle a given event (see Figure 3).

Figure 3 - ODS Architecture
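In LDM, such event-driven spawning is typically configured through pqact(1) pattern-action entries. A hypothetical entry (the feed type, pattern, and paths are illustrative only) might pipe matching products into a decoder:

```
# Hypothetical pqact.conf entry: pipe any product whose key ends in a
# GRIB suffix into a decoder that writes netCDF output.
ANY	^.*\.(grib|grb)$
	PIPE	/usr/local/ods/bin/Grib2NetCDF -o /data/netcdf
```

Changing which jobs fire for which products is then a configuration edit rather than a software change.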

Processes responsible for product generation, typically spawned on arrival of the original data, arbitrate between objects that are proxies for real data types, GRIdded Binary (GRIB) for example, and objects responsible for particular products, such as netCDF. This much-simplified architecture supports rapid introduction of new datasets and a more manageable distribution model for the processing of all data.
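The proxy/factory arbitration can be sketched as follows; this is a hedged illustration in which the class names and the lazy-decode detail are assumptions, not ODS code:

```python
# Sketch of proxy/factory arbitration: a factory maps a data-type tag to a
# proxy object standing in for the real decoder, and product generation
# requests fields through that proxy.

class GribProxy:
    """Proxy for a real GRIB decoder; decodes only when first accessed."""
    def __init__(self, raw):
        self._raw = raw
        self._decoded = None

    def field(self, name):
        if self._decoded is None:
            # Placeholder for an expensive decode of the raw GRIB message.
            self._decoded = {"TMP": 273.15}
        return self._decoded[name]

class DecoderFactory:
    """Factory: choose a proxy class from the incoming data type."""
    _registry = {"GRIB": GribProxy}

    @classmethod
    def create(cls, data_type, raw):
        return cls._registry[data_type](raw)

proxy = DecoderFactory.create("GRIB", b"raw-grib-bytes")
print(proxy.field("TMP"))  # decoding deferred until the product needs it
```

Registering a new data type then means adding one entry to the factory's registry rather than touching the product-generation processes.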

As an example of the significant savings available under the new system, consider the handling of new model data from a generating center previously unknown to FSL that uses a center-defined parameter table. In terms of developer effort, ODS needs less than a day to take a requirement to make this new GRIB model available as netCDF, compared with 15 or more days of effort under the legacy systems to support the same requirement. This is possible because no software has to be written or modified, tested, and integrated; the ODS solution needs only the metadata, in the form of the parameter table from the originating center and a netCDF description file for the required product. Similar savings have been realized for other data types using the ODS model.
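A rough sketch of why no code changes are needed (the table contents, field names, and function are hypothetical): the conversion is driven entirely by the center's parameter table, supplied as metadata:

```python
# Metadata-driven product generation: adding a new center's model requires
# only a new parameter table, not new code. The table below stands in for
# a file supplied by the originating center.

CENTER_TABLE = {
    11: {"name": "temperature", "units": "K"},
    33: {"name": "u_wind", "units": "m/s"},
}

def grib_record_to_netcdf_var(param_id, values, table):
    """Translate one GRIB record into a netCDF-style variable dict using
    only the center's parameter table as metadata."""
    meta = table[param_id]
    return {"name": meta["name"], "units": meta["units"], "data": values}

var = grib_record_to_netcdf_var(11, [271.3, 272.8], CENTER_TABLE)
print(var["name"], var["units"])  # temperature K
```

Swapping in a different center means supplying a different table; the conversion function itself never changes.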

Another significant advantage of ODS is its archive and recall capability. Under ODS, data inserted into LDM are tagged with a product key that includes the original receipt time. This has enabled the development of the FSL Data Repository (FDR) for data archiving, using the same class libraries that ODS uses for real-time processing of LDM data. FDR archives can then be used directly in any ODS-configured system to generate case-study data products by replaying data from the archive through an LDM.
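The replay idea can be sketched as follows; the key layout shown is purely illustrative, not the actual FDR key format:

```python
# Sketch of the archive/replay idea: each product key embeds the original
# receipt time, so archived products can be replayed through LDM in
# receipt order for case-study work.
from datetime import datetime, timezone

def make_product_key(feed, ident, receipt):
    """Build a product key that carries the original receipt time."""
    return f"{feed}/{ident}/{receipt.strftime('%Y%m%d%H%M%S')}"

def receipt_time(key):
    """Recover the receipt time from a key, e.g. for ordered replay."""
    stamp = key.rsplit("/", 1)[1]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

keys = [
    make_product_key("GRIB", "ruc", datetime(2005, 1, 1, 12, 5, tzinfo=timezone.utc)),
    make_product_key("GRIB", "ruc", datetime(2005, 1, 1, 12, 0, tzinfo=timezone.utc)),
]
replay_order = sorted(keys, key=receipt_time)
print(replay_order[0])  # the earlier product replays first
```

Because the receipt time travels with the key, the same downstream processes run unchanged whether the data arrive in real time or from the archive.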

3. Technology Transfer

A core element of FSL's mission is to transfer technology developed within the research environment to entities outside FSL. In support of that mission, an effort has been made to package elements of ODS using GNU's Autoconf [8] tool. Packages currently available from DSG include the following ODS elements:

LdmNexrad2NetCDF - NEXRAD Level II data to netCDF;

Grib2NetCDF - GRIB editions 1 and 2 to netCDF;

GOES GVAR ingest and netCDF production.

All of these packages are in use outside of FSL. Interested parties should contact Peter Mandics, FSL's Chief Information Officer (Peter.A.Mandics@noaa.gov).

To more fully illustrate this aspect of FSL's mission, consider the distribution of the GVAR ingest and associated netCDF generation software. Internally, GVAR processing within the Central Facility is handled for all operational GOES satellites in order to support, among other things, model initialization and analysis for the Local Analysis and Prediction System (LAPS). The Central Weather Bureau (CWB) of Taiwan uses LAPS for local forecasting. Following the failure of Japan's GMS weather satellite and the subsequent retasking of the GOES-9 satellite to the region while a Japanese replacement was brought into operation, CWB needed to deploy a low-cost solution for ingesting GVAR data from GOES-9. DSG's solution was chosen because of its low cost, its compatibility with LAPS, and the ability to deploy it quickly into an operational environment.

Members of DSG were able to contribute to the specification of the hardware and to deploy, install, and configure the ODS software within the timeframe required to support the operational start date for GOES-9, in large part due to the ODS architecture and development. We also used the transfer opportunity to develop new tools that help FSL and the Central Facility better manage GVAR data internally, as these tools were additional requests made by CWB to address operational concerns. Currently the ODS GVAR system is in operation, helping CWB provide weather forecasts and information for its citizens [9].