There has been an effort in recent years to establish long-term ocean observing systems in several countries, such as the Integrated Ocean Observing System (IOOS®) in the U.S., MyOcean in Europe and the Integrated Marine Observing System (IMOS) in Australia. These large observing systems have moved from being question- or issue-specific to becoming more generalized infrastructure that supports a wide range of scientific and societal uses.

Part of this change has been the idea of integrated observations, the 'I' in IMOS and IOOS. This reflects a number of issues. One is the realization that new understanding will come more and more from multidimensional datasets: It is the combination of data from a range of sources that is essential to understanding the processes and outcomes of a changing world. Another is that the data will be used for purposes very different from those for which they were collected. The reuse and reanalysis of data will become a critical way of extracting understanding in a world where it is often impossible to go back and resample.

While the idea of integration has become important, many projects struggle to understand what integration actually means, especially in terms of the design and deployment of observing systems and how data integration can actually be delivered. This article looks at data integration efforts for an observing system deployed on the Great Barrier Reef.

Observing the Great Barrier Reef
The Great Barrier Reef Ocean Observing System (GBROOS) is a regional node of Australia's IMOS, focusing on the Great Barrier Reef of Australia. In the southern area, within the Capricorn and Bunker groups of islands and reefs, a range of observing equipment has been deployed.

The instrumentation includes a coastal high-frequency (HF) radar that provides real-time estimates of wave height and direction and surface currents; a remote sensing receiving station that provides satellite imagery from the Advanced Very High Resolution Radiometer and from the Moderate-Resolution Imaging Spectroradiometer on the Terra and Aqua satellites; and an array of deepwater oceanographic moorings that provide information on waves, current profiles from acoustic Doppler current profilers (ADCPs) and water-quality parameters. These are complemented by real-time sensor networks that include a range of in-water and above-water (meteorological and light conditions) sensors located in the lagoons of two reefs in the region.

Fitting an ADCP to a buoy in preparation for deployment off the Great Barrier Reef.

Integration
Achieving real integration across the various components of GBROOS is not trivial; often integration is not fully understood when systems are being designed and deployed. The GBROOS project is working to facilitate data integration from the sensor upward through to data structures, metadata, data descriptions and data-access mechanisms.

At the lowest level, the project has standardized a number of sensors and has looked to use well-known 'traditional' oceanographic instruments that are robust and produce quality data. For the sensor network, smart controllers are added to the oceanographic sensors so that real-time data can be collected and centralized control is possible.

At the next level, common data structures to hold all of the data have been built. Much of the data can be presented as simple X (longitude)/Y (latitude)/Z (depth)/T (time) sets of numbers; the ADCP and the HF radar data are more complex, but even they can be reduced (with processing) to a simple set of values. Underlying this is the need to set rigorous standards for these values. An example is depth: Often equipment is deployed on floating buoys, where the depth is relative to the surface, or on anchored platforms, where the depth is relative to the bottom. To integrate data from a floating and a benthic platform, both depths must be expressed against a common reference so that each can be related to the other.
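The depth example can be sketched in a few lines of Python. This is a minimal illustration, not the GBROOS data structure: the `Observation` record, the `depth_below_surface` helper and the choice of "metres below the surface" as the common reference are all assumptions made for the example, and the water depth at the time of the reading is assumed to be known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One X/Y/Z/T value (hypothetical record layout)."""
    longitude: float   # X, decimal degrees east
    latitude: float    # Y, decimal degrees north
    depth_m: float     # Z, metres below the sea surface, positive down
    time_utc: str      # T, ISO 8601 timestamp in UTC
    value: float       # the measured parameter


def depth_below_surface(offset_m: float, reference: str, water_depth_m: float) -> float:
    """Express a sensor position as depth below the surface.

    reference == "surface": offset_m is already depth below the surface
                            (e.g., a sensor hung beneath a floating buoy).
    reference == "seabed":  offset_m is height above the bottom
                            (e.g., a sensor on an anchored frame).
    """
    if reference == "surface":
        depth = offset_m
    elif reference == "seabed":
        depth = water_depth_m - offset_m
    else:
        raise ValueError(f"unknown reference: {reference!r}")
    if not 0.0 <= depth <= water_depth_m:
        raise ValueError("sensor would lie outside the water column")
    return depth


# A buoy sensor 2 m under the surface and a frame sensor 1.5 m above
# the bottom, both in 30 m of water, now share one depth reference.
buoy_depth = depth_below_surface(2.0, "surface", 30.0)
frame_depth = depth_below_surface(1.5, "seabed", 30.0)
```

Because tide and waves change the water depth, a seabed-referenced sensor has a depth below the surface that varies over time; a real system would need the water depth for each reading rather than a single constant.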

Part of data integration is a common set of processing and quality control steps. While there is no uniform representation for quality control, there are some examples of community-based systems. GBROOS uses the Intergovernmental Oceanographic Commission (IOC)/International Oceanographic Data and Information Exchange (IODE) quality-control flags with a rules-based software system to confirm the accuracy of the logged and real-time data. Remote sensing and HF radar data are treated differently, with the quality control coming from the processing software—in the case of the radar, the quality control follows the IOC/IODE system.
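A rules-based check of the kind described above might look like the following sketch. The flag numbers, the two rules (a valid-range test and a step test) and the function name `qc_flags` are assumptions chosen for illustration; they are not the GBROOS rule set, and the actual flag values should be taken from the IOC/IODE documentation.

```python
import math

# Illustrative flag values only (assumed for this sketch); take the real
# table from the IOC/IODE quality-flag documentation.
GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9


def qc_flags(values, valid_min, valid_max, max_step):
    """Flag each reading using two simple rules.

    Rule 1: a reading outside [valid_min, valid_max] is BAD.
    Rule 2: a reading that differs from the last GOOD reading by more
            than max_step is SUSPECT.
    Missing readings (None or NaN) are flagged MISSING.
    """
    flags = []
    last_good = None
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif not valid_min <= v <= valid_max:
            flags.append(BAD)
        elif last_good is not None and abs(v - last_good) > max_step:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
            last_good = v  # only good readings become the comparison baseline
    return flags


# Water temperature in degrees Celsius with a spike, a gap and an
# impossible value.
temperature_flags = qc_flags([25.1, 25.2, 31.0, 25.3, None, 99.0],
                             valid_min=0.0, valid_max=40.0, max_step=2.0)
```

Comparing against the last good reading keeps a single spike from also flagging the reading that follows it, at the cost of flagging every reading after a genuine step change until a person reviews the data.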

Deploying a deepwater in-line mooring from the AIMS research vessel RV Solander.

The next level involves the description and representation of the data and the observing systems. The GBROOS project uses the International Organization for Standardization ISO 19115 metadata standard to describe the equipment and data, with metadata records down to the sensor level. The metadata is hierarchical and so can be discovered using systems such as the IMOS Web portal.

The final level of integration is at the data-access level. By using Web services, it is possible to make the data available to clients in a standardized manner. For the instrument data, this can be a sensor observation service feed; for spectral and gridded data (such as the remote sensing and radar data), Web services such as the Thematic Real-Time Environmental Distributed Data Service can be used. Clients can then use these services to fuse the data for further presentation and analysis.

Designing Integration
For most projects, integration is something that is bolted on after the project is designed and often after the equipment is deployed, and to a degree, this was the pattern with GBROOS. True integration, however, requires the setting of standards and commonality at all stages, from design to deployments to the management and dissemination of the data.

At the design level, it is important that the equipment for the various components is, at worst, equivalent, and at best, identical. There needs to be confidence that the parameters being measured by each component have a similar level of accuracy and repeatability and that all equipment is equivalently maintained.

At the deployment level, it is important that the readings taken by each component are in themselves equivalent. This means deploying sensors at similar depths with equivalent sampling times/frequencies and ensuring that the processing done by the instruments is equivalent. Getting deployment-level integration is often difficult, as the environment will dictate how the equipment is deployed: The differing environments of the open ocean and reef lagoons will dictate mounting, mooring and housing of the instruments. This is an area where planning is paramount in order to develop a set of common deployment designs for each of the components and environments.

At the data level, it is important that sampling rates are matched (i.e., that each sampling interval is a whole multiple of the shortest interval in use), that instrument clocks are synchronized and checked for drift, that the same time datum is used (e.g., Coordinated Universal Time) and that units and processing levels are matched.
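The time-related checks can be sketched as follows. The function names are hypothetical, "matched" sampling is interpreted here as every sampling interval being a whole multiple of the shortest one, and clock drift is assumed to accumulate linearly between the moment a clock was synchronized and the moment it was next checked. None of this is taken from the GBROOS software.

```python
from datetime import datetime, timedelta, timezone


def intervals_compatible(intervals_s):
    """True if every sampling interval (whole seconds) is a whole
    multiple of the shortest one."""
    base = min(intervals_s)
    return all(interval % base == 0 for interval in intervals_s)


def to_utc(stamp):
    """Convert a time-zone-aware timestamp to UTC."""
    if stamp.tzinfo is None:
        raise ValueError("timestamp has no time zone; cannot convert safely")
    return stamp.astimezone(timezone.utc)


def correct_drift(stamp, synced_at, checked_at, drift_s):
    """Remove clock drift from an instrument timestamp.

    drift_s is the instrument clock minus the reference clock at
    checked_at (positive means the instrument ran fast). The drift is
    assumed to have grown linearly since synced_at.
    """
    fraction = (stamp - synced_at) / (checked_at - synced_at)
    return stamp - timedelta(seconds=drift_s * fraction)


# Instruments sampling every 1, 10 and 30 minutes line up; 60 s and 90 s do not.
matched = intervals_compatible([60, 600, 1800])
unmatched = intervals_compatible([60, 90])

# A logger set to UTC+10 local time, converted to the common time datum.
local = datetime(2010, 1, 1, 10, 0, tzinfo=timezone(timedelta(hours=10)))
as_utc = to_utc(local)

# A clock found to be 10 s fast after 10 days: a reading halfway through
# the deployment is moved back by 5 s.
start = datetime(2010, 1, 1, tzinfo=timezone.utc)
end = datetime(2010, 1, 11, tzinfo=timezone.utc)
corrected = correct_drift(datetime(2010, 1, 6, tzinfo=timezone.utc), start, end, 10.0)
```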

The higher-level data integration is more straightforward and relies on the data being stored in one system or in a set of systems that can be queried as if they were one system. This may require consistent use of identifiers and common higher-level data structures (such as the site, equipment and identifier tables).

The final level of integration is at the user level: presenting the user with a set of data that 'looks' the same. The data have the same quality-control identifiers and processing and are in equivalent units in terms of both time and the value being measured; parameters such as location, time and depth are in the same datum and units; and the data, when coplotted, overlay neatly with no additional processing.

Achieving Integration
The first step to achieving integration is to define what is meant by integration, given the components of the observing system and the questions or issues being addressed. For most systems, it will mean that data from one component of the observing system can easily and logically be overlaid with other data to form a seamless new data product that is both valid and has scientific meaning.

The second step is to set standards for everything and never deviate from these unless the consequences are clearly understood. Standards should reflect those used by the community with the caveat that the standards used must make sense to the end user. One example is measurement units, where often the users will be more familiar with older units (such as knots for wind speed) than the SI units.
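One way to reconcile a single storage standard with units that users recognize is to store values in SI units and convert only when presenting them. The sketch below assumes that design choice, which the article does not prescribe, and uses hypothetical function names. The conversion factor follows from the definition of the knot as one nautical mile (1,852 meters) per hour.

```python
METRES_PER_NAUTICAL_MILE = 1852.0
SECONDS_PER_HOUR = 3600.0
KNOT_IN_M_PER_S = METRES_PER_NAUTICAL_MILE / SECONDS_PER_HOUR  # about 0.5144


def knots_to_m_per_s(speed_knots: float) -> float:
    """Convert a speed in knots to metres per second (for storage)."""
    return speed_knots * KNOT_IN_M_PER_S


def m_per_s_to_knots(speed_m_per_s: float) -> float:
    """Convert a speed in metres per second to knots (for display)."""
    return speed_m_per_s / KNOT_IN_M_PER_S


# A 10-knot wind reading stored in SI units and shown again in knots.
stored = knots_to_m_per_s(10.0)
displayed = m_per_s_to_knots(stored)
```

Whichever unit is chosen for storage, recording it in the metadata lets a user confirm that two series are in the same units before merging them.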

The next step is to build systems to ingest all of the data being collected and build linkages between these so that they appear as a single system, even if they are made up of separate components. For example, it is now possible to build data systems that include relational databases, flat files (e.g., netCDF) and XML files and run queries over the collection of components as if it were a single system. The data systems should include standardized processing, quality-control checking, metadata and other data descriptions.
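The idea of querying separate stores as if they were one system can be sketched with the Python standard library. Everything here is an assumption for illustration: the table and column names, the XML layout, the site identifiers and the readings are invented, and a CSV string stands in for a flat file such as netCDF, which would need a third-party library to read.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET


def from_database(conn):
    """Yield readings from a relational table as common records."""
    for site, time_utc, value in conn.execute(
            "SELECT site, time_utc, value FROM readings"):
        yield {"site": site, "time_utc": time_utc,
               "value": float(value), "source": "database"}


def from_flat_file(text):
    """Yield readings from delimited text as common records."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"site": row["site"], "time_utc": row["time_utc"],
               "value": float(row["value"]), "source": "flat file"}


def from_xml(text):
    """Yield readings from an XML document as common records."""
    for node in ET.fromstring(text).iter("reading"):
        yield {"site": node.get("site"), "time_utc": node.get("time_utc"),
               "value": float(node.text), "source": "xml"}


def query(sources, site):
    """Return all readings for one site, in time order, from every source."""
    rows = [r for source in sources for r in source if r["site"] == site]
    return sorted(rows, key=lambda r: r["time_utc"])


# Invented sample data in three different stores.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, time_utc TEXT, value REAL)")
conn.execute("INSERT INTO readings VALUES ('reef-a', '2010-01-01T00:00:00Z', 27.1)")
flat = ("site,time_utc,value\n"
        "reef-a,2010-01-01T00:10:00Z,27.3\n"
        "reef-b,2010-01-01T00:10:00Z,26.0\n")
xml_doc = ('<readings><reading site="reef-a" '
           'time_utc="2010-01-01T00:20:00Z">27.2</reading></readings>')

merged = query([from_database(conn), from_flat_file(flat), from_xml(xml_doc)],
               "reef-a")
```

The single query only works because each store uses the same site identifier and the same UTC timestamp format; the adapters handle the file formats, and the shared standards do the rest.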

The final step is to document everything. Only by recording everything, preferably in the metadata, can a user decide if the data can logically and validly be merged.

Models as Agents of Integration
Models present a unique data integration platform because they allow preprocessing and treatment of the various data streams and because they can be built to deal with levels of error and uncertainty.
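As a small illustration of dealing with error levels when combining data streams, the sketch below merges two estimates of the same quantity by inverse-variance weighting. This is a generic textbook technique chosen for the example, not a method described for GBROOS. It assumes the errors are independent and unbiased and that each standard deviation is known; the function name and the readings are invented.

```python
def fuse(estimates):
    """Combine (value, standard_deviation) pairs into one estimate.

    Each value is weighted by 1 / variance, so more certain
    measurements count for more. Returns the fused value and its
    standard deviation.
    """
    weights = [1.0 / (sd * sd) for _, sd in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, (1.0 / total) ** 0.5


# Two equally uncertain temperature estimates of the same water mass.
fused_value, fused_sd = fuse([(27.0, 0.1), (27.4, 0.1)])

# A precise mooring reading dominates a less certain satellite estimate.
weighted_value, weighted_sd = fuse([(27.0, 0.1), (28.0, 0.5)])
```

The fused standard deviation is always smaller than the smallest input, which holds only if the stated assumptions are met; correlated or biased errors would need a fuller treatment within the model.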

A sophisticated modeling platform is therefore a good way to bring data together and deliver integrated end products. The modeling work also acts to highlight integration issues and so is useful for testing how the various observational data actually can be melded together.

If this is done at the design phase using test data, it becomes possible to optimize designs to better deliver integrated data.

Conclusions
While many systems use the term 'integrated,' there is often very little actual design work to ensure that true data integration is possible. Standards in equipment, deployment practices, data structures, processing and quality control, and data access are all required to enable data to be truly integrated in a meaningful way. One approach is to work backward, looking at models and modeling needs to inform what observations are useful and how these need to be delivered (e.g., temporal and spatial density, levels of accuracy and validation from other data). Modeling therefore provides a key component in the design of the observing system and is a method of delivering integrated data products.

Acknowledgments
The GBROOS project is part of the Australian IMOS project funded by the Queensland state government and the Australian federal government through the National Collaborative Research Infrastructure Strategy and Super Science Initiative. The authors acknowledge the help of the teams at the Australian Institute of Marine Science (AIMS) and James Cook University, as well as the staff on the research vessels and stations.

Scott Bainbridge is the project manager for the Great Barrier Reef Ocean Observing System component of the Australian Integrated Marine Observing System. His background is in coral reef systems, informatics, sensor networks and real-time data systems.

Craig Steinberg is a physical oceanographer undertaking research in interdisciplinary studies at the Australian Institute of Marine Science through the Responding to Climate Change Research Team. He is subfacility manager of the Integrated Marine Observing System moorings being deployed across tropical northern Australia.

Mal Heron is the director of the Australian Coastal Ocean Radar facility of the Integrated Marine Observing System. He is an adjunct professor of physics at James Cook University in Townsville, Australia. He is also a senior member of IEEE and a fellow of the Institution of Engineers, Australia.
