Abstract

OBJECTIVE:

To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research.

DESIGN:

An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the cancer Biomedical Informatics Grid (caBIG) program, which is sponsored by the National Cancer Institute (NCI). It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study.

MEASUREMENTS:

caGrid is built as a Grid software infrastructure that leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community-provided services, and application programming interfaces for building client applications.

RESULTS:

caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components, and the caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid.

CONCLUSIONS:

While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large-scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.

The caGrid 1.0 infrastructure and environment. The core caGrid services include the security services (Dorian, Grid Trust Service (GTS), and Grid Grouper), metadata services (Index Service, Global Model Exchange (GME), Enterprise Vocabulary Services (EVS), and cancer Data Standards Repository (caDSR)), and high-level services such as the Federated Query Processing (FQP) service and the Workflow services. In addition to the core services, data services and analytical services (e.g., the caArray and caBIO services in the figure), which are provided by research groups, institutions, and individual researchers, can be discovered and securely accessed using the caGrid core services and protocols. Common invocation patterns are indicated by directional arrows.

The GAARDS security infrastructure and its deployment in a multi-institutional environment. It consists of Dorian for federation and management of grid credentials, the Grid Trust Service (GTS) for management of a trust fabric in the environment, and Grid Grouper for facilitating authorization and access control. When a Grid client wants to access a secure caGrid data or analytical service, the client can use a local authentication mechanism and Dorian to obtain a temporary grid credential (based on the client’s long-term certificate and private key, which are managed by Dorian). The client can then interact with the secure caGrid service and invoke its methods using that grid credential. The caGrid service may contact the GTS infrastructure to check whether the client’s credentials are valid (i.e., they have not been revoked; Certificate Revocation Lists (CRLs) are used to keep track of revoked certificates) and whether they were issued by a trusted identity provider. If the client is authenticated successfully, the caGrid service may contact the Grid Grouper service to obtain the client’s group information, upon which access control policies may be defined. Depending on the authorization information, the caGrid service may deny the client access to some or all of the service methods or to the data served by the service.
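
The decision sequence described in this caption can be illustrated with a minimal, self-contained Python sketch. It is not the caGrid or GAARDS API: every class, function, and field name below (GridCredential, TrustFabric, GroupDirectory, authorize, the policy dictionary, and the example issuer and group names) is a hypothetical stand-in, and real grid credentials are X.509 certificates rather than plain records. The sketch only mirrors the order of the checks: credential validity against the trust fabric first, then group lookup, then a per-method access decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class GridCredential:
    """Hypothetical stand-in for a temporary grid credential issued via Dorian."""
    subject: str          # identity of the client
    issuer: str           # identity provider that issued the credential
    serial: int           # serial number, used for revocation checks
    expires_at: datetime  # temporary credentials have a limited lifetime


@dataclass
class TrustFabric:
    """Hypothetical stand-in for the Grid Trust Service: trusted issuers plus a CRL."""
    trusted_issuers: set = field(default_factory=set)
    revoked_serials: set = field(default_factory=set)

    def is_valid(self, cred: GridCredential, now: datetime) -> bool:
        # Valid means: issued by a trusted provider, not revoked, not expired.
        return (
            cred.issuer in self.trusted_issuers
            and cred.serial not in self.revoked_serials
            and now < cred.expires_at
        )


@dataclass
class GroupDirectory:
    """Hypothetical stand-in for Grid Grouper: maps group names to member subjects."""
    members: dict = field(default_factory=dict)

    def groups_of(self, subject: str) -> set:
        return {group for group, subjects in self.members.items() if subject in subjects}


def authorize(cred, method, trust, groups, policy, now) -> bool:
    """Return True if the client may invoke `method` on the service.

    `policy` maps a method name to the set of groups allowed to call it;
    methods missing from the policy are denied (an assumption of this sketch).
    """
    if not trust.is_valid(cred, now):
        return False  # authentication failed, so no group lookup is attempted
    allowed_groups = policy.get(method, set())
    return bool(groups.groups_of(cred.subject) & allowed_groups)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cred = GridCredential("alice", "dorian.example", 42, now + timedelta(hours=12))
    trust = TrustFabric(trusted_issuers={"dorian.example"})
    groups = GroupDirectory(members={"analysts": {"alice"}})
    policy = {"query": {"analysts"}}
    print(authorize(cred, "query", trust, groups, policy, now))
```

In this sketch a service denies by default: a method with no policy entry, a revoked or expired credential, or an untrusted issuer all result in refusal, which corresponds to the caption's statement that the service may deny access to some or all of its methods.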