Is caBIG Ready to Bloom?

April 14, 2006 | What is the National Cancer Institute’s Center for Bioinformatics initiative called the Cancer Biomedical Informatics Grid (caBIG), and why should researchers care?

Essentially, the caBIG project is a voluntary network of researchers and organizations whose goal is to “create the World Wide Web of cancer research.” To that end, the effort has developed criteria for applications and analytical processes so they can more easily work together and share data (see Going for the Gold). Additionally, caBIG is undertaking projects in distinct areas, such as developing clinical trials management systems, ontology mining tools, and systems for in vivo imaging.

“The goal is to create a network that connects teams of cancer investigators, their data, and their tools to accelerate discovery,” said Kenneth Buetow, director of the NCI Center for Bioinformatics and the person heading up the caBIG effort.

When caBIG was proposed in 2003, the idea was to address the lack of a “unifying infrastructure in cancer research,” according to a 2004 caBIG white paper that discussed a data-sharing prototype grid. That prototype grid was based on the Globus Toolkit and the Open Grid Services Architecture Data Access and Integration (OGSA-DAI) technologies. Other technologies singled out were Web services, peer-to-peer, grid computing, metadata, object-oriented data representation, and the Semantic Web.

The initial scope and activities of caBIG were determined in discussions among various NCI Cancer Centers. Five areas ripe for collaboration were identified: clinical trial management systems, integrative cancer research, tissue banks and pathology tools, architecture, and vocabularies and common data elements. Over time, other subjects will be tackled.

“From its inception, the caBIG pilot has planned to expand to include additional key stakeholders,” said Buetow. For example, newer work includes imaging and proteomics projects.

Since caBIG launched in 2004, more than 70 caBIG products (as they are called) have been developed and delivered. These products include white papers, vocabularies, data specifications, and software tools such as a Web-based application for managing clinical trial data across multiple trials, a microarray data repository, a gene ontology miner tool, and many others.

Broader Reach

While the caBIG effort is NCI-centric, its benefits could extend to the general life science community. But that’s only if the community knows about caBIG.

Unfortunately, the majority of researchers barely know what caBIG is. An online poll of Bio-IT World readers found about 35 percent of respondents didn’t have a clue about the caBIG effort. An additional 24 percent said they were vaguely aware of it. Only about 13 percent reported they were actively involved in caBIG-related development or deployment activities.

That lack of awareness in the general life science community is expected to fade quickly because vendor tools now used by NCI must meet caBIG data-sharing and integration criteria. While the majority of vendors are not actively branding their products caBIG-compatible (some likely will), the work of adapting such applications to fit into NCI can already benefit many researchers.

To get the project done, CGF and InforSense had to take caBIG compatibility into account. At that time, InforSense CEO Yike Guo noted that InforSense KDE met the compatibility requirements for integration not only by handling the range of data types and associated metadata involved, but also by providing a way to rapidly plug in Web services and ontologies.

Those capabilities helped speed the effort. “We could move more quickly [with the commercial software] than the other approaches,” said Meredith Yeager, scientific director of the NCI’s CGF.

For this reason, industry experts expect the benefits derived from caBIG to spread throughout the industry.