The benefits of looking across many cancer genomes

For much of the 20th century, cancer had been thought of as not
a single entity, but rather as more than 100 complex and distinct diseases, with most individual cancer types
demanding unique treatment strategies. The need for a more modern, comprehensive understanding led, in part, to the
launch in 2006 of The Cancer Genome Atlas (TCGA), a joint venture supported by the National Cancer Institute (NCI)
and the National Human Genome Research Institute (NHGRI), both part of NIH.

From the outset, TCGA has been generating and analyzing data according to the organ
in the body from which a tumor first arose. By 2014 the TCGA Research Network had published nearly a dozen papers
examining genomic changes in tumor types, including two manuscripts in the summer of 2014 on gastric and lung
adenocarcinomas, each a comprehensive characterization of a type of cancer. The organ-specific findings have been
revealing, providing new information on cancer development and behavior, as well as new insights into molecular
pathways and genetic alterations.

Just as importantly, in many cases, researchers have uncovered shared molecular
patterns among cancers, including similar genomic changes occurring across tumor types. For example, TCGA’s 2012
breast cancer analysis found evidence that a subtype of breast cancer shows marked similarity to a form of ovarian
cancer. The subtype, basal-like breast cancer, and high-grade serous ovarian cancer shared similar mutation
characteristics as well as other genomic features, suggesting that the two cancers are of a similar molecular
origin and may respond to the same treatments. In fact, basal-like breast cancer has more similarities, genomically
speaking, to high-grade serous ovarian cancer than to other subtypes of breast cancer.

Figure: Long-tail diagram of individual mutations, ordered by number of occurrences on the x-axis, with the number of mutations (frequency) on the y-axis.

In addition, data from TCGA’s analyses show that most cancer types possess a great
number of mutations that occur at a low frequency. This collection of low-frequency mutations has been termed the “long tail”
because, in a graph of the frequency of specific changes, it forms a lengthy but low section of the chart.
Scientists have found that some of these mutations are shared by sets of tumor types. The long tail and the shared
inter-tumor molecular patterns are the first suggestions that a cross-tumor analysis may yield clinically
meaningful new findings.
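
The long-tail idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample names, the gene names and the 0.25 cutoff are invented placeholders, not TCGA data or the method the researchers used. It simply counts how many samples carry each mutation, ranks the mutations by frequency, and separates the few common ones from the many rare ones.

```python
from collections import Counter

# Toy, made-up data: each tumor sample maps to the set of mutated genes seen in it.
# Sample and gene names are placeholders, not TCGA results.
samples = {
    "tumor_01": {"GENE_A", "GENE_B"},
    "tumor_02": {"GENE_A", "GENE_C"},
    "tumor_03": {"GENE_A", "GENE_D"},
    "tumor_04": {"GENE_A", "GENE_B", "GENE_E"},
    "tumor_05": {"GENE_B", "GENE_F"},
    "tumor_06": {"GENE_A", "GENE_G"},
}


def mutation_frequencies(samples):
    """Return (gene, fraction of samples mutated) pairs, most frequent first."""
    counts = Counter(gene for genes in samples.values() for gene in genes)
    total = len(samples)
    # Sort by descending count, then by name so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, count / total) for gene, count in ranked]


def split_long_tail(ranked, threshold=0.25):
    """Split ranked genes into a frequent 'head' and a low-frequency 'tail'.

    The 0.25 cutoff is an arbitrary, illustrative choice.
    """
    head = [(gene, freq) for gene, freq in ranked if freq >= threshold]
    tail = [(gene, freq) for gene, freq in ranked if freq < threshold]
    return head, tail


ranked = mutation_frequencies(samples)
head, tail = split_long_tail(ranked)

for gene, freq in ranked:
    label = "head" if freq >= 0.25 else "tail"
    print(f"{gene}: {freq:.2f} ({label})")
```

In this toy set, two mutations are common and five appear in only one sample each. That is the shape the term describes: a short, tall head followed by a lengthy, low tail.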

With such similarities increasingly apparent, TCGA researchers developed a formal
project for a cross-tumor analysis, called the Pan-Cancer project. Its goal was to assemble TCGA’s wealth of data
across tumor types, analyze and interpret those data, and finally, make both the analyses and the data available.
The group, led by Joshua M. Stuart, Ph.D., of the University of California, Santa Cruz, analyzed 12 cancer types,
selected on the basis of the number of samples available and the comprehensiveness of the data as of 2012. The 12 types
were glioblastoma multiforme, acute myeloid leukemia, lung squamous cell carcinoma, lung adenocarcinoma, colon and
rectal adenocarcinomas, head and neck squamous cell carcinoma, and ovarian, breast, clear cell kidney, endometrial
and bladder cancers. Because of the breadth and depth of TCGA data, the Pan-Cancer group believed the analysis
would have statistical power to detect genomic changes across the cancers, find changes specific to each
organ-of-origin, and identify molecular commonalities across tumor types.

The culmination of this effort has been a series of manuscripts tied together
through “threads” featured on the website of Nature (see http://www.nature.com/ng/journal/v45/n10/full/ng.2780.html), much as was done for papers from another expansive project, the
Encyclopedia of DNA Elements (ENCODE). Each thread is centered on a specific theme and comprises relevant
information across papers and journals. “Each cross-cutting piece offers perspectives on a topic discussed in
several papers. This is a new exciting way to organize information to help bring out themes that unite the
work,” Dr. Stuart said. To that end, Dr. Stuart described the genesis, makeup, goals and promise of the
Pan-Cancer project in a commentary published online Sept. 26, 2013, in Nature Genetics. As TCGA collects and analyzes more
samples, researchers are increasingly able to detect rare mutations shared by numerous tumor
types.

This new perspective of analyzing cancers according to their genomic profiles
signals a shift away from organizing cancer by organ of origin. Many clinicians are beginning to imagine a future
where cancers are described by their mutations, such as an ERBB2-amplified tumor or a PI3K-pathway mutant
carcinoma. As Dr. Stuart wrote in the Nature Genetics perspective article, which accompanied two of the group of
Pan-Cancer research papers, “Only time will tell whether the integration of molecular characteristics with data on
histology, organ site and metastatic location will contribute to an improvement in patient outcomes. But the
balance is shifting in this direction.”

Two research papers published in Nature Genetics as part of the first round of
Pan-Cancer papers point to the benefits from this type of cross-cancer analysis. In one paper, researchers at
Memorial Sloan-Kettering Cancer Center, New York, analyzed data on more than 3,000 tumor samples from 12 cancer
types from TCGA, and determined that a limited number of genetic alterations are responsible for most cancer
subtypes. These alterations, no matter what tissue they originated in, fall into two general categories of
“oncogenic” signatures: genetic mutations and copy number changes (changes in the number of copies of genes in a
cell), with many smaller subclasses. The scientists hope that these results will eventually help to tailor
treatment strategies to subsets of patients, resulting in clinical trials based on matching individual patients
whose tumors have been profiled – and oncogenic signatures identified and classified – with a corresponding drug or
combination of therapies.

The second study examined patterns of changes in the number of gene copies in
cells, one of the most common types of mutations that lead to cancer. Investigators at The Broad Institute,
Cambridge, Mass., and Dana-Farber Cancer Institute, Boston, and elsewhere, compiled a database of gene copy number
changes across genes in 5,000 tumors and 12 cancer types. They used this database to identify 140 regions where
these mutations tend to occur most often, pointing to genes in these regions that are likely to contribute to
cancer formation. The scientists showed that the patterns of mutations can provide clues as to how these genes
contribute to cancer. These findings clarify how cancers develop and identify genes that are particularly important in
cancer initiation and that may serve as effective therapeutic targets.

The Pan-Cancer project also suggests directions for the future of cross-tumor
analysis, several of which are highlighted in Dr. Stuart’s commentary article. These may include integrating data
sources to increase the power of genomic analyses, using molecular profiles to categorize cancers for making
treatment decisions, figuring out if “predictive signatures” derived from genes transcend tissue types and
determining whether or not comprehensive protein analyses using tools such as mass spectrometry can extend the
power of genomic analyses from TCGA.

In August 2014, two additional studies were published that provide even greater
insight into Pan-Cancer efforts and TCGA data analyses collectively. In one of them, Dr. Stuart, Charles Perou,
Ph.D. (University of North Carolina, Chapel Hill), and colleagues used six different types of analyses to examine
the molecular characteristics of more than 3,500 tumors across 12 different tumor types to see how they compared to
each other. The researchers showed that the cancers were more likely to be genetically and molecularly similar
based on the type of cell in which the cancer originated as opposed to the tissue type in which it
originated.

In the study, investigators identified 11 integrated cancer subtypes. While many of
the subtypes had molecular profiles linked to their tissue of origin, there were some subtypes that had to be split
further, as tumors showed several different tissues of origin. For example, the study showed at least three
different subtypes of bladder cancer, with one subtype closely resembling lung adenocarcinoma and another similar
to head and neck and lung squamous cell cancers. The findings also confirmed differences in breast cancer subtypes
and similarities to other cancers seen in earlier research. Basal-like breast cancers, the study found, looked more
like ovarian cancer and cancers of squamous cell origin than other types of breast cancer. In revealing a new
approach to classifying cancers, the authors suggest that at least one in 10 cancer patients could be classified
differently by this new system.

In a second study, Roel Verhaak, Ph.D., M.D. Anderson Cancer Center, Houston, and
colleagues reported results in the journal Oncogene suggesting that cancers from all origins can be classified
according to a limited set of gene expression “footprints.” The researchers compared subtypes of gene expression
across 12 tumor types. They identified eight gene expression “superclusters” characterized by the types of disease
pathways that are commonly turned on in cancer, and similarities in the kinds of genes expressed. The investigators
found, for example, that one of the largest superclusters involved an increase in the number of mutations in TP53,
a commonly altered tumor suppressor gene that normally restrains cell growth and proliferation. The supercluster was also marked by an absence
of the gene CDKN2A, which helps control cell proliferation, increased numbers of DNA double-strand breaks – a
hallmark of many types of cancer – and higher-than-usual expression of cyclin B1 protein, which plays a role in
cell division. These kinds of changes affect the cell’s DNA damage control response, and can turn on certain cell
pathways that contribute to cancer development.

The researchers also saw a second pattern in nine of 11 solid tumor types they
examined. They found a gene expression subtype related to tumor-associated stroma, which can provide a matrix on
which tumors can grow, and can affect cancer growth. Taken together, these results suggest that tumors can be
grouped according to their gene expression similarities as seen across tumor types, pointing to a limited number of
molecular themes in cancer.

While follow-up studies are needed to confirm these
findings, and additional tumor samples and tumor types will expand the work, this past year’s worth of initial
PanCancer-12 analyses lays the groundwork for a more detailed classification of tumors that includes molecularly
defined subtypes, unlike all prior cancer classification systems.