NCI Cancer Data Underscores Need for Shared Downstream Analytics

July 24th 2013

Last week’s announcement that the National Cancer Institute had released the world’s largest cancer data set and made the information publicly available via Ingenuity and other databases was newsworthy enough to get attention from mainstream media, including NPR, Bloomberg, and Forbes.

The study offers tremendous promise in the development and prescription of cancer drugs, thanks to a vast array of correlations between the exomes of the 60 cancer cell lines in the NCI-60 panel and the predicted sensitivity of their variants to existing and investigational anti-cancer drugs. Truly, this is a critical step toward the cancer treatment holy grail sought by so many scientists and clinicians. We commend the NCI researchers for their tremendous effort, and we are honored that they chose to share the data through our Publish tool. (Indeed, many authors use Variant Analysis to publish their analyses as online supplements concurrent with their peer-reviewed journal articles. See the Publications link in your account to check them out.)

The data set generated and shared by NCI is incredibly rich, and we encourage users to explore it. Here are some things you can do:

Confirm or validate variant findings from your own cancer studies by searching by variant, gene, or pathway among similar cancer cell lines in the NCI-60 data set.

Expand variant findings to other cancer types by searching by variant, gene, or pathway within the NCI-60 data set.

Search the NCI-60 data set by pathway or gene (biological context) and then by drug (pharmacogenetics filter) to find drugs that might affect a given pathway or gene product, so you can validate your conclusions in a follow-on experiment.

Export variant lists from the NCI-60 data set to use as filters for your own data sets within Ingenuity Variant Analysis (try “create list,” then the “user defined variants” filter). You could use this to subtract variants that are found across many cancers, letting you focus on variants unique to your study or cancer.

Apply the filter cascade from the NCI-60 analysis to your own samples within Ingenuity Variant Analysis; this pre-defined, already-published filtering approach can rapidly surface variants of interest in your study.
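
For readers who want to see the logic behind the variant-subtraction tip, it amounts to a simple counting-and-filtering step. The Python sketch below is only an illustration of that idea, not how Ingenuity Variant Analysis implements its filters. The `Variant` tuple, the `subtract_common_variants` function, the `min_lines` threshold, and all of the sample data are hypothetical and do not come from the NCI-60 data set.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant key: position plus alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_common_variants(
    study: List[Variant],
    reference_lists: Dict[str, List[Variant]],
    min_lines: int = 2,
) -> List[Variant]:
    """Keep study variants seen in fewer than `min_lines` reference lists."""
    counts: Dict[Variant, int] = {}
    for variants in reference_lists.values():
        # set() so a variant listed twice for one cell line counts once
        for variant in set(variants):
            counts[variant] = counts.get(variant, 0) + 1
    return [v for v in study if counts.get(v, 0) < min_lines]


# Made-up example data (placeholder cell line names and coordinates).
v1 = Variant("chr1", 100, "A", "G")
v2 = Variant("chr2", 200, "C", "T")
v3 = Variant("chr3", 300, "G", "A")

study = [v1, v2, v3]
reference_lists = {
    "line_A": [v1, v2],
    "line_B": [v1],
    "line_C": [v1, v3],
}

# v1 appears in three reference lists, so it is subtracted from the study.
unique = subtract_common_variants(study, reference_lists, min_lines=2)
```

Raising `min_lines` makes the filter more permissive, since a variant must then recur in more cell lines before it is treated as common and removed.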

Looking at the NCI-60 data more broadly, we believe that this extraordinary work underscores the need for data repositories that allow people to query, analyze, and mine data produced in a project like this. The old standard of publishing a paper and submitting heat map figures or text gene sequences to a view-only database is simply no longer tenable. Take the NCI data, for example, which anyone can work with at no charge through Ingenuity: the raw data is available, as are the scientists’ analyses of all that information. In addition, anyone coming to look at the data can interrogate it and perform their own analyses on it; the data is reset to its published state at the end of a user’s session, so its integrity is preserved no matter how many people poke and prod it.

For a project of this complexity and scope, this type of data release and accessibility offers a new standard that will enable the scientific community to speed the pace of progress, improve reproducibility of experiments, and ensure that future work builds on the foundation established by the original project. It is the best way to maximize the investment in such a major research effort, amplifying the potential return on the scientists’ time and precious funding that went into it.

As other sizable genomic projects get underway, we hope that they will follow the model set out by NCI and open up the full value of their data to the research community and to the general public.