A team of computer scientists and a biologist has created a new type of cancer gene map that brings together a diverse collection of genomic information and identifies sets of genes that are active in a variety of tumors. In the process, they stumbled upon a possible explanation for how some breast cancer tumors spread into bone.

Because they are not experts in breast cancer, the researchers were only vaguely aware that breast tumors prefer to spread, or metastasize, into bone. But when they saw a connection on the map they read the literature and developed a hypothesis about what’s going on.

“We think that a breast tumor hijacks some genes associated with bone growth, and when this happens you basically have bone cells that keep growing and subdividing,” says Daphne Koller of Stanford University in California, who led one of three laboratories that collaborated on the map.

This hypothesis will need to be tested, but it is precisely the sort of thing the researchers hope will result from their map. They are confident that the collective value of data from many cancer studies will ultimately yield insights into the biology of tumors that may not be apparent in the original studies.

Indeed, there’s nothing new about the data, which came from 26 cancer studies published in prominent journals and include a total of 14,000 genes and 22 types of tumors.

What’s new is the analysis.

“These data had previously been exhaustively analyzed, and yet we were still able to develop new hypotheses by taking a different analytical approach,” says Aviv Regev, the biologist on the team from Harvard University’s Bauer Center for Genomic Research in Cambridge, Massachusetts.

Their approach is an example of what is increasingly being called “systems biology.” The researchers view tumor cells as systems, and when something goes wrong it is usually not because a single gene is misbehaving but because an entire set of genes is misbehaving.

The map, therefore, focuses on sets of genes that carry out basic operations in a cell. By cross-referencing genes with tumors, the map shows operations that are carried out in tumors too often—or not often enough.
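The idea of cross-referencing gene sets with tumors can be sketched in a few lines of Python. This is only an illustration, not the researchers’ method: the gene names, sample labels, scores, and the cutoff of 1.0 are all hypothetical, and the sketch assumes the expression values have already been put on a common scale.

```python
from statistics import mean

# Hypothetical, already-normalized expression scores: gene -> sample -> score.
expression = {
    "GENE_A": {"s1": 2.0, "s2": 1.5, "s3": -0.2, "s4": 0.1},
    "GENE_B": {"s1": 1.0, "s2": 2.5, "s3": 0.0, "s4": -0.3},
    "GENE_C": {"s1": -1.5, "s2": -2.0, "s3": 0.4, "s4": 0.2},
}

# Hypothetical tumor type of each sample.
sample_type = {"s1": "breast", "s2": "breast", "s3": "lung", "s4": "lung"}

# Hypothetical gene sets ("modules"), each standing for one cell operation.
modules = {"bone_growth": ["GENE_A", "GENE_B"], "other": ["GENE_C"]}


def module_activity(expression, sample_type, modules):
    """Average the scores of each module's genes within each tumor type."""
    table = {}
    for name, genes in modules.items():
        scores_by_type = {}
        for gene in genes:
            for sample, score in expression[gene].items():
                scores_by_type.setdefault(sample_type[sample], []).append(score)
        table[name] = {t: mean(s) for t, s in scores_by_type.items()}
    return table


def flag_modules(table, cutoff=1.0):
    """Label each module/tumor pair as over-active, under-active, or typical."""
    flags = {}
    for name, by_type in table.items():
        flags[name] = {}
        for tumor, score in by_type.items():
            if score > cutoff:
                flags[name][tumor] = "over"
            elif score < -cutoff:
                flags[name][tumor] = "under"
            else:
                flags[name][tumor] = "typical"
    return flags


activity = module_activity(expression, sample_type, modules)
flags = flag_modules(activity)
for name, by_type in flags.items():
    print(name, by_type)
```

In this toy table the made-up "bone_growth" module is flagged as over-active in the breast samples and typical in the lung samples, which is the kind of pointer the map is meant to give.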

“You’re never going to replace the detailed gene-by-gene analysis of tumors, but by looking at this map you can get pointers about the interesting places to look,” says Koller.

What has made the map possible now is the large number of studies done over the past few years in which researchers have made lists of genes that are active in diverse tumors. “We have enough of these large-scale building blocks to talk about human biology in more global terms,” says Koller.

The gene lists have primarily been used to develop tools for identifying tumors and predicting the course of disease, and this work is making a difference to human health, says Regev. But the same information, she argues, can be used to investigate the biology of tumors.

“I hope we can get a better explanation of the mechanisms involved in the underlying biology of the disease that will take us beyond diagnostics and prognostics,” Regev says. She and Koller collaborated on the map with Nir Friedman of Hebrew University in Israel, and much of the work was done by Eran Segal in Koller’s lab at Stanford.

The most time-consuming part of the study was to “normalize” the data, which were generated by microarray technology that tracks the activity of genes. The results of a microarray experiment depend on its design, and there are as yet no universal standards for conducting experiments. The same experiment done by two people in the same laboratory can yield different results.

The researchers normalized the data using an algorithm, and they acknowledge that the design of the algorithm influences the results. Each decision about the design will illuminate certain insights and obscure others.
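A small sketch shows why the choice of normalization matters. It is not the algorithm used in the study, which the article does not describe; it simply compares two common, generic options on hypothetical readings of one gene from two imaginary studies whose instruments report on different scales.

```python
from statistics import mean, median, pstdev


def zscore(values):
    """Rescale values to mean 0 and standard deviation 1 within one study."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def median_center(values):
    """Subtract the study's median but keep the original spread."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical raw readings of the same gene in two studies.
study_a = [10.0, 12.0, 14.0]
study_b = [100.0, 120.0, 140.0]

# Z-scoring makes the two studies look alike.
print(zscore(study_a), zscore(study_b))

# Median-centering keeps the tenfold difference in spread.
print(median_center(study_a), median_center(study_b))
```

Under the first choice the two studies tell the same story; under the second, study B appears to vary ten times as much. Neither is wrong, but each would lead a reader of the combined data toward different conclusions.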

The tools used to create the maps will be available from the researchers so that anyone can make a map. The study appeared this week in Nature Genetics, and among the few people who have already read the study and tested the map is Naftali Kaminski, who directs the Simmons Center for Interstitial Lung Disease at the University of Pittsburgh Medical Center.

Kaminski initially thought the project was too ambitious and wouldn’t produce anything meaningful because the data sets were too diverse.

“My perspective is that of a convert, and now I’m quite excited,” he says. “This study provides a framework for looking at multiple data sets created using different platforms at different institutions to generate biological hypotheses.”

Although it seems far in the future, he suggests that in five years it may be possible to use correlated gene sets and clinical information to determine whether a person is “likely to respond to chemotherapy” or “not likely to respond” without knowing anything about the person’s medical history.

“In a best-case scenario you wouldn’t have to know the patient’s diagnosis and you could put their information into the gene set and see if they are likely to benefit from chemotherapy or some other treatment,” says Kaminski.

The researchers emphasize that their contribution is to generate hypotheses that someone will have to test. “But the point of genomics is that there are millions of hypotheses,” says Koller. “So the question becomes: How do you generate reasonable ones that you can reliably test?”

She and her colleagues think their map is a place to start.

In future studies the researchers will refine the gene sets associated with operations in the cell. The fact is there’s not much known about most of the 14,000 genes, and consequently, the gene sets, which the researchers call “modules,” are admittedly crude.

But the tools themselves have applications beyond cancer and are already being used in research on plants, according to Regev. “The tools are simple enough for people to master quickly,” she says, and Kaminski agrees.

“Developing the algorithm was not easy, but once you have the algorithm the tools are easy to use,” he says. “And this is not always the case with computational tools.”