
SRB upgrade helps scientists better manage large data collections

Published 06/20/2003

The San Diego Supercomputer Center (SDSC) at UCSD has released version 2.1 of the popular SDSC Storage Resource Broker (SRB) middleware package, which enables scientists to create, manage, and collaborate with flexible, unified "virtual data collections" that may be stored on heterogeneous data resources distributed across a network.

"In addition to a number of bug fixes, we've made version 2.1 more 'grid-friendly' by including Web Services Description Language (WSDL) features, a pure Java programming interface, and encrypted data transfers," said Arcot Rajasekar, director of the Data Grids Technologies group in the Data and Knowledge Systems (DAKS) program at SDSC.

SRB version 2.1, along with the user manual and release notes, is available online at http://www.npaci.edu/DICE/SRB/. SDSC SRB Version 2.1 is supported on the following platforms: UNIX, including Red Hat Linux 7.3, Solaris, AIX, SGI, and Mac OS X, as well as Microsoft Windows 2000.

"There is growing interest in the research community in the SRB software because of the need to integrate, manage, and access explosively growing data collections," said Reagan Moore, codirector of SDSC's DAKS program.

Developed by Moore, Rajasekar, Michael Wan, and the SRB team in SDSC's Data and Knowledge Systems (DAKS) program, the SDSC SRB is being used in projects as diverse as helping astronomers integrate multi-terabyte image collections in the NSF's National Virtual Observatory, enabling NIH-funded neuroscientists to share brain data across the country in the Biomedical Informatics Research Network, and developing persistent archives for the National Archives and Records Administration.

Other SRB users include NASA, which is using the SRB to manage massive collections of satellite data; the Science Environment for Ecological Knowledge, a large NSF Information Technology Research project, which will use the SRB to integrate ecological data collections; and the NSF ROADNet project, which is employing the SRB in conjunction with object ring buffers to bring together diverse types of sensor data in real time.

New features in SRB Version 2.1 include better support for Grid Security Infrastructure (GSI); optional data encryption and compression; and SDSC Matrix, a Web service-oriented interface. Matrix uses W3C standards to provide services including data movement, replication, access control, data set ingestion, retrieval, and container support. Other new features include JARGON, a pure Java Application Program Interface (API) for developing portable programs with a grid interface; the ability to bulk load data without requiring the use of a container; the ability to list host-specific resources; configurable parameters for determining the number of threads for parallel transfer; and an SRB Python binding.

The SRB Data Management Middleware

The SDSC SRB is client-server middleware that solves many problems associated with traditional file systems. What appears as a single collection to the user is in fact a virtual collection consisting of digital entities scattered across distributed, heterogeneous storage resources, including file systems, archives, and databases. The SRB makes all these differences transparent to users, negotiating all protocols, access permissions, etc. across the multiple sites, so that users can access data based on familiar, user-defined global file names. Users are freed from having to keep track of such complexities as local file names, physical locations, protocols, and security arrangements.
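The idea of a user-defined global name that hides physical location can be illustrated with a small sketch. This is not the SRB API: the class names, resource names, and paths below are hypothetical, and the code only models how a logical name might map to one or more physical replicas on different kinds of storage.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a digital entity (all fields are illustrative)."""
    resource: str        # hypothetical storage resource name
    kind: str            # e.g. "file-system", "archive", "database"
    physical_path: str   # location meaningful only to that resource


@dataclass
class VirtualCollection:
    """Maps user-facing logical names to physical replicas."""
    _catalog: dict = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        # A logical name may have several replicas on different resources.
        self._catalog.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> Replica:
        # Return the first registered replica; a real broker would also
        # weigh availability, permissions, and protocol negotiation.
        replicas = self._catalog.get(logical_name)
        if not replicas:
            raise KeyError(f"no such logical name: {logical_name}")
        return replicas[0]

    def list_names(self, prefix: str = "") -> list:
        # Users browse by logical name only, never by physical path.
        return sorted(n for n in self._catalog if n.startswith(prefix))
```

In this sketch a user would ask for something like `/home/demo/survey/img001.fits` and never see which resource or path actually holds the bytes.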

As scientific disciplines become more integrated, data sharing becomes more important. The SRB is very helpful in collaborative science because it can finely tune data sharing and access according to the needs of individual researchers and groups in complex collaborations. Users can also quickly and flexibly "repurpose" or restructure collections through customized "views" that they shape by searching with rich descriptive metadata that is expressed in familiar, user-defined terms.

The SRB organizes metadata about the data and files in the MCAT metadata catalog to help researchers assemble, search, access, and manage collections of data. The MCAT provides a global name space that spans all the separate resources. The power of the MCAT comes from its relational database technology, which allows it to be extended with capabilities beyond those of traditional file systems, including more complex access control systems, proxy operations for such things as delivering subsets of a collection, and knowledge discovery based on system- and application-level metadata.
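To make the role of a relational metadata catalog more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and helper functions are assumptions made for illustration; they do not reflect the actual MCAT schema or any SRB interface. The sketch shows only the general pattern of attaching attribute-value metadata to logical names and discovering data by querying it.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_object (
            id           INTEGER PRIMARY KEY,
            logical_name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE object_metadata (
            object_id INTEGER NOT NULL REFERENCES data_object(id),
            attribute TEXT NOT NULL,
            value     TEXT NOT NULL
        );
        """
    )
    return conn


def add_object(conn: sqlite3.Connection, logical_name: str, metadata: dict) -> None:
    """Register a logical name together with its descriptive metadata."""
    cur = conn.execute(
        "INSERT INTO data_object (logical_name) VALUES (?)", (logical_name,)
    )
    conn.executemany(
        "INSERT INTO object_metadata (object_id, attribute, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, attr, val) for attr, val in metadata.items()],
    )


def find_by_attribute(conn: sqlite3.Connection, attribute: str, value: str) -> list:
    """Return logical names whose metadata matches attribute = value."""
    rows = conn.execute(
        """
        SELECT d.logical_name
        FROM data_object AS d
        JOIN object_metadata AS m ON m.object_id = d.id
        WHERE m.attribute = ? AND m.value = ?
        ORDER BY d.logical_name
        """,
        (attribute, value),
    )
    return [row[0] for row in rows]
```

Because the metadata lives in ordinary tables, a search such as "all objects whose instrument is a given telescope" becomes a single join, which is the kind of customized view a plain file system cannot offer directly.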

SRB collections are highly scalable, both in size and in distribution across remote sites. SRB collections at SDSC support nearly 9 million files and 51 terabytes of data. Once a collection is created, it can be transparently replicated, managed, and controlled across geographically distributed locations through any of several interactive interfaces: a command-line interface and new graphical user interfaces, including a Windows Explorer-like interface called inQ (short for inQuisitor) and a Web interface, MySRB.

The SRB is proven, production software, with more than 200 registered users at more than 50 sites. DAKS researchers on the SDSC SRB project, led by Arcot Rajasekar, include Sheau-Yen Chen, Charles Cowart, Lucas Gilbert, Arun Jagatheesan, George Kremenek, Roman Olschanowsky, Vicky Rowley, Wayne Schroeder, Michael Wan, and Bing Zhu.

- Paul Tooby