Scientists from the University of Helsinki in Finland¹ have developed a computer program called Anduril that combines, organizes and analyzes the massive amounts of data generated by The Cancer Genome Atlas (TCGA). To demonstrate the system’s features, they used TCGA’s genomic data on glioblastoma multiforme (GBM), the most common malignant brain tumor in adults.

The data from TCGA come from several different technology platforms, each of which produces different types of information about the GBM genome. The data categories provide information about single base pair changes, extra copies or deletions of genes, increased or decreased gene expression and changes in chemical marks on DNA, a process called methylation.

Anduril combines and analyzes each data type and then determines the relationships between data types. The system’s aim is to give a more complete picture of how gene changes are connected to clinical outcomes in cancer patients. The analyzed data are displayed on the Anduril website, which is designed to let scientists without bioinformatics backgrounds access and view graphs and charts describing the data. The authors suggest that the Anduril program is unique in its ability to analyze all the dimensions of TCGA data at once, rather than one type at a time.

Using the New Tool to Answer Research Questions

To illustrate Anduril’s utility, the scientists performed several analyses using TCGA data from 338 GBM patients. The first analysis integrated two different data types: gene copy number and gene expression. In general, an increased number of copies of a gene, or amplification, is thought to result in higher expression levels of that gene. Likewise, gene deletion is thought to result in lower expression levels of the gene. Anduril’s analysis showed that, for several genes involved in GBM, copy number and gene expression do not always correlate with each other. Instead, the methylation status of a gene, which can determine whether a gene turns on or off, accounted for many of the gene expression changes seen in the study. These findings showed how Anduril can combine different data types to answer scientists’ questions about them.
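To make the copy number and expression comparison concrete, here is a minimal sketch in Python. It is not Anduril's code, and the article does not say which statistic the authors used; a Pearson correlation is assumed here purely for illustration. The gene names (GENE_A, GENE_B), the patient values and the 0.5 cutoff are all invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: one value per patient for each hypothetical gene.
copy_number = {
    "GENE_A": [2, 2, 3, 4, 5, 6],
    "GENE_B": [2, 3, 4, 4, 5, 6],
}
expression = {
    "GENE_A": [1.0, 1.1, 1.6, 2.1, 2.4, 3.0],  # rises with copy number
    "GENE_B": [2.0, 0.5, 1.8, 0.4, 1.9, 0.6],  # does not track copy number
}

# Correlate the two data types gene by gene.
correlations = {
    gene: pearson(copy_number[gene], expression[gene]) for gene in copy_number
}

# Flag genes whose expression is poorly explained by copy number
# (the 0.5 cutoff is an arbitrary assumption). These would be the
# candidates to check against another data type, such as methylation.
flagged = [gene for gene, r in correlations.items() if r < 0.5]

for gene, r in correlations.items():
    print(f"{gene}: r = {r:.2f}")
print("Discordant genes:", flagged)
```

In this toy data set, only the gene whose expression ignores copy number is flagged. That is the kind of gene for which the study looked to methylation for an explanation.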

In the next analysis, the researchers integrated patient survival data and several categories of genomic data to determine which gene changes are clinically important. They found that decreased gene expression, or repression, is most significantly related to poor survival. Scientists had previously assumed that increased gene expression was most closely linked to poor survival.
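The idea of linking expression to survival can be sketched as follows. The article does not describe which survival statistics Anduril applies, so a Kaplan-Meier estimate, a common choice for this kind of comparison, is swapped in here as an assumption. The follow-up times, the two patient groups and the helper names are invented for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up time per patient (for example, in months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t are no longer at risk.
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (1.0 before the first death)."""
    prob = 1.0
    for time, survival in curve:
        if time <= t:
            prob = survival
    return prob


# Invented toy data: patients split by expression of one hypothetical gene.
low_expression = kaplan_meier(times=[4, 6, 8, 10], events=[1, 1, 1, 0])
high_expression = kaplan_meier(times=[12, 15, 20, 24], events=[1, 0, 1, 0])

print("Low expression, survival at 10 months:", survival_at(low_expression, 10))
print("High expression, survival at 10 months:", survival_at(high_expression, 10))
```

In this invented example, the low-expression group has a much lower estimated survival at 10 months, mirroring the direction of the study's finding. A real analysis would also need a significance test and far more patients.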

From the Computer to the Lab Bench

Finally, to demonstrate that the Anduril analyses can generate testable hypotheses, the scientists performed experiments in GBM cell lines. They chose 11 genes that are overexpressed in GBM and also related to poor survival. They targeted each with a chemical that turns off genes and then measured GBM cell line growth. Cell growth slowed only when a gene called MSN was turned off. The results support earlier suggestions that MSN, which encodes a protein involved in cell structure and movement, could be a target for future studies or therapeutic tests.

In the future, say the researchers, the Anduril system will allow for similar analyses and experiments by integrating large amounts of data into understandable formats. The massive amounts of data being generated by large-scale projects like TCGA will require organizational tools like Anduril. The study points to the feasibility of developing such systems so that scientists can continue to examine many different data types at once.