Data Warehousing Project for the NIH

By Allison Proffitt

Oct. 10, 2007 | Three years into its bioinformatics practice within its life sciences division, Northrop Grumman is working on two data warehousing projects valued at over $47 million for the National Institute of Allergy and Infectious Diseases (NIAID).

“There are similarities between the two engagements,” says Kevin Biersack, bioinformatics program manager for Northrop Grumman. Both data warehousing projects offer one-stop shopping to users and both make use of public data. But the projects’ user communities are different.

BioHealthBase (BHB) is open to both researchers and the public. Working with a science partner at the University of Texas Southwestern Medical Center and two subcontractors, Northrop Grumman has developed BHB to include organisms with public health and biodefense implications, including tuberculosis and influenza. Biersack says that the warehouse is a public resource useful for scientific research in support of vaccine development and drug discovery.

A major goal of the project is to support researchers developing rapid, inexpensive, and broad-based diagnostic approaches using genomics and proteomics. From the BHB website (www.biohealthbase.org), searchers can run queries, analyze their findings, and display them visually without even entering an email address.

Open Source Architecture

BHB data are culled from several public sources including National Center for Biotechnology Information databases, GenBank, UniProtKB, and internal sources. “We have firewalls, of course,” Biersack says, to protect the data sources. Los Alamos National Laboratory, for instance, is currently collaborating with the BHB team to integrate their data and move their public influenza site to BHB.

The ImmPort system, on the other hand, is different because access is limited to researchers funded by NIAID, Biersack says. “In the future, the public data will be moved ‘out front,’” he says, but for now, ImmPort is a semi-public warehouse.

ImmPort serves as an archive for research results for allergy, immunology and transplantation projects supported by NIAID. Researchers have access to private storage, as well as the ability to compare their data, if they wish, with other public research data based on the NIH data-sharing policy. “It’s results-oriented storage,” Biersack says, and ImmPort currently boasts terabytes of total storage space.

The data warehouses are web-enabled and browser-based, with quarterly software updates, and use “mostly open source” software, Biersack says. ImmPort uses Oracle, Linux, Java 2 Enterprise Edition, and Hibernate. Most of the visualization and analytical tools are also open source and have been leveraged from previous NIH-funded grants.

Northrop Grumman’s contracts for the BHB and ImmPort projects expire in 2009 and 2010, respectively, and Biersack says he “anticipates competition” for these renewals. But for now, he’s focused on providing new software functions to support the needs of the user communities, updating the data, adding storage capability, and “enhancing the scientific discovery process through data integration.”