Arcot Rajasekar (University of North Carolina at Chapel Hill, USA), Mike Wan (University of California at San Diego, USA), Reagan Moore (University of North Carolina at Chapel Hill, USA) and Wayne Schroeder (University of California at San Diego, USA)

Abstract

Service-oriented architectures (SOA) enable orchestration of loosely-coupled and interoperable functional software units to develop and execute complex but agile applications. Data management on a distributed data grid can be viewed as a set of operations that are performed across all stages in the life-cycle of a data object. The set of such operations depends on the type of objects, based on their physical and discipline-centric characteristics. In this chapter, the authors define server-side functions, called micro-services, which are orchestrated into conditional workflows for achieving large-scale data management specific to collections of data. Micro-services communicate with each other using parameter exchange, in-memory data structures, a database-based persistent information store, and a network messaging system that uses a serialization protocol for communicating with remote micro-services. The workflow is orchestrated by a distributed rule engine that chains and executes the workflows and maintains transactional properties through recovery micro-services. The authors discuss the micro-service oriented architecture, compare the micro-service approach with traditional SOA, and describe the use of micro-services for implementing policy-based data management systems.

Introduction

Traditional data management requires the application of administrative functions to enforce management policies such as backup, retention, and disposition, and to validate assessment criteria such as authenticity, integrity, and chain of custody. The administrative functions require the management of state information about each file, including the location, owner, and access controls. Service-oriented architectures provide mechanisms to tune environments to implement specific data management policies by chaining procedures together. We explore whether a policy-based data management environment can be created that provides the extensibility of SOA while managing state information normally associated with digital libraries. We demonstrate that data analysis environments can be tightly integrated with data management environments. Indeed, for petabyte-scale collections, it is not feasible to move the entire collection to a compute server. Data analysis procedures will need to be applied at the storage location to extract the data sets of interest. In practice, it is more effective to execute low-complexity operations (those that perform few operations relative to the size of the data in bytes) at the remote storage location. A simple example is the extraction of a subset of a file. It is faster to extract the data subset at the storage location through partial I/O commands than it is to move the entire file to a remote compute engine. Data analysis can be significantly accelerated through the execution of services at remote storage locations.
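The subset-extraction example can be illustrated with a minimal sketch. It uses an ordinary local file to stand in for a remote storage resource, and the function names (`extract_subset`, `extract_subset_after_transfer`, `demo`) are hypothetical; none of them are part of iRODS. The point is only the difference in bytes touched: a partial read seeks to the offset and reads the requested range, while the baseline reads the whole file before slicing.

```python
import os
import tempfile


def extract_subset(path, offset, length):
    """Partial I/O: seek to `offset` and read only `length` bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def extract_subset_after_transfer(path, offset, length):
    """Baseline: read the entire file (as if it had been moved to a
    compute server), then slice out the subset.
    Returns the subset and the number of bytes that had to be read."""
    with open(path, "rb") as f:
        data = f.read()
    return data[offset:offset + length], len(data)


def demo():
    """Build a 1 MiB scratch file and extract the same 16 bytes both ways."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)  # 256 * 4096 = 1,048,576 bytes
        path = tmp.name
    try:
        subset = extract_subset(path, 1000, 16)
        baseline, bytes_moved = extract_subset_after_transfer(path, 1000, 16)
        return subset, baseline, bytes_moved
    finally:
        os.remove(path)
```

Both calls return the same 16 bytes, but the baseline reads the full 1 MiB to obtain them. Over a wide-area network the same ratio applies to bytes transferred, which is the motivation for running the extraction where the data is stored.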

These are the driving motivations behind the integration of data processing functions into the data management infrastructure, and the execution of the functions under the control of a service-oriented orchestration. We have integrated the SOA paradigm with collection management functions within the integrated Rule-Oriented Data System (iRODS), and applied the technology in support of data sharing environments, data processing pipelines, data publication systems, data preservation systems, and data federation environments for long-term sustainability.
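The abstract describes a rule engine that chains micro-services and preserves transactional properties through recovery micro-services. The following is a rough sketch of that idea, not the iRODS rule engine or its API. It assumes that each step pairs an action with a recovery function and that, on failure, the failed step's recovery runs first, followed by the recoveries of the completed steps in reverse order. All names (`run_workflow`, `make_replica`, `verify_checksum`, and so on) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
MicroService = Callable[[Context], None]
Step = Tuple[MicroService, MicroService]  # (action, recovery)


def run_workflow(steps: List[Step], ctx: Context) -> bool:
    """Run actions in order; on failure, run recoveries to roll back.

    Assumed semantics: recover the failed step, then undo the
    completed steps in reverse order. Returns True on success.
    """
    completed: List[MicroService] = []
    for action, recovery in steps:
        try:
            action(ctx)
        except Exception:
            recovery(ctx)
            for undo in reversed(completed):
                undo(ctx)
            return False
        completed.append(recovery)
    return True


# Hypothetical micro-services that share state through the context dict.
def make_replica(ctx: Context) -> None:
    ctx["replicas"].append("archive")
    ctx["log"].append("replicate")


def remove_replica(ctx: Context) -> None:
    if "archive" in ctx["replicas"]:
        ctx["replicas"].remove("archive")
    ctx["log"].append("undo replicate")


def verify_checksum(ctx: Context) -> None:
    if ctx["corrupt"]:
        raise ValueError("checksum mismatch")
    ctx["log"].append("verify")


def skip_verification(ctx: Context) -> None:
    # Verification changes no state, so recovery only records the event.
    ctx["log"].append("undo verify")


STEPS: List[Step] = [
    (make_replica, remove_replica),
    (verify_checksum, skip_verification),
]
```

Calling `run_workflow(STEPS, {"replicas": [], "log": [], "corrupt": True})` returns `False` and leaves `replicas` empty, because the replica created in the first step is removed when verification fails. With `"corrupt": False` the workflow returns `True` and the replica is kept.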