Cancer Studies

The Cancer Genome Atlas (TCGA) targets more than 30 different cancer types, collecting hundreds of samples
for each type. Each disease is studied individually by multiple groups across TCGA. Our Center is
analyzing data collected for many of these diseases in order to understand each cancer more deeply.
Our GDAC is also exploring associations between the various diseases to identify commonalities.

Adrenocortical Carcinoma

Our Center participated in the TCGA Adrenocortical Carcinoma (ACC) Analysis Working Group, contributing to working group discussions,
analyses, presentation of results, and the TCGA ACC marker paper. Our Center integrated analysis across different data types for the 91 ACC samples
in this study and captured statistical associations in Regulome Explorer.
The AWG identified novel ACC driver genes and pathways, refined the ACC subtypes, and established whole-genome doubling (WGD) as a milestone in disease progression.

Integrative analysis of 91 ACC cases.
Comprehensive genomic characterization and integrative analysis revealed (left to right) whole-genome doubling as a hallmark of ACC progression,
identified three ACC subtypes with distinct clinical outcomes, demonstrated many statistical associations between the adrenal differentiation score
and genomic features of the ACC cases that can be queried and visualized using Regulome Explorer, and enabled comparison of ACC's mutation signature
with other cancer types in TCGA.

Publications

Resources

Breast Cancer

Our Center participated in the TCGA Breast Cancer Analysis Working Group, contributing to working group
discussions, analyses, presentation of results, and preparation of the TCGA breast cancer marker paper. Our GDAC performed numerous analyses, focusing in particular on the relationship
between individual molecular features and various subtypes discovered through supervised and unsupervised
methods. As a companion feature to the manuscript, our GDAC has provided a comprehensive feature matrix,
including statistical pairwise analysis, that can be explored interactively via
Regulome Explorer.

Associations between molecular features.
Statistically significant associations between features with genomic coordinates are indicated
by arcs connecting pairs of dots which represent the features. Two examples are shown:
significant associations between microRNA and mRNA expression levels (Left), and between copy-number
and mRNA expression (Right).

Publications

Resources

Colorectal Cancer

Our Center participated in the TCGA Colorectal Analysis Working Group, contributing to working group discussions,
analyses, presentation of results, and the TCGA colorectal marker paper. Our GDAC performed numerous
analyses, centered for example on microRNAs,
DNA structural variation, signatures associated with anatomical position, and signature associations with specific
subgroupings of microsatellite instability categories.
For the colorectal manuscript, we focused on six clinical variables associated with tumor aggressiveness, and generated
a score for the association of molecular features with those six variables. The aggressiveness score is a
composite of the association scores for the six clinical variables: the p-values of the
individual comparisons are combined using the weighted Fisher’s method to derive an overall p-value.
The aggressiveness score is the negative of the base-10 logarithm of this overall p-value, given a
positive or negative sign depending on whether the signature is higher or lower, respectively, in the more aggressive tumors.
In the visual display, this score is color-coded on a blue-to-red scale from low to high
score. To limit the range of the display, the score is saturated at -10 and +10.
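The score construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text names the weighted Fisher's method but does not give the weights, so the sketch uses the classic equal-weight Fisher's method instead. The function names, the p-value floor of 1e-300, and the example p-values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine p-values with Fisher's method, assuming EQUAL weights.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    k = len(p_values)
    # Floor each p-value to avoid log(0); the floor is an arbitrary assumption.
    half_x = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return min(1.0, math.exp(-half_x) * total)


def aggressiveness_score(p_values, higher_in_aggressive, cap=10.0):
    """Signed -log10 of the combined p-value, saturated at +/- cap."""
    overall_p = max(fisher_combined_p(p_values), 1e-300)
    magnitude = -math.log10(overall_p)
    score = magnitude if higher_in_aggressive else -magnitude
    return max(-cap, min(cap, score))


# Hypothetical p-values for one molecular feature against six clinical variables.
example = [0.01, 0.03, 0.2, 0.5, 0.04, 0.6]
print(aggressiveness_score(example, higher_in_aggressive=True))
```

A weighted variant would replace the equal-weight statistic with a weighted sum of the log p-values, which no longer follows a simple chi-square distribution and needs a different null approximation.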

A "hotspot" of CRC aggressiveness in region 20q13.12.
Certain chromosomal regions are enriched in clinically
associated molecular features. Region 20q13.12 includes a local amplification (orange) and 11 genes (blue), all of which are expressed
more highly in aggressive tumors. A number of methylation probes (green) are also statistically associated with tumor aggressiveness,
nearly all (8/10) with decreased levels in aggressive tumors.

Publications

Resources

Endometrial Cancer

TCGA interrogated a large set of endometrial carcinomas, including both serous (n=53) and
mixed/endometrioid (n=280) types, to generate multi-dimensional data. Our Center analyzed
the molecular classification of these tumors and its association with clinical and pathological variables. Using the
RNASeq gene expression profiles, we identified three gene expression subtypes in TCGA endometrial
carcinoma, which we termed ‘mitotic’, ‘hormonal’, and ‘immunoresponsive’ on the basis
of pathway analysis and gene content. These clusters are significantly correlated with tumor histology, grade,
stage, patient overall survival and progression-free survival. Similar to serous ovarian carcinoma, the FOXM1
transcription factor network is significantly altered in the mitotic subtype. Associative analysis indicates that the
mitotic subtype is enriched with TP53 mutation/deletion and PIK3CA amplification, while PTEN/CTNNB1
mutation mainly occurs in the hormonal and immunoresponsive subtypes. The results were presented at the
TCGA Endometrial Cancer workshop (April 9-10, 2012, WashU in St. Louis) and the TCGA Semi-Annual Steering
Meeting (April 25-27, 2012, Houston). In addition, our Center contributed to the TCGA endometrial marker paper.

Figure 4: Gene Expression Profiling Identifies Three Gene Expression
Subtypes.
(A) Tumors from TCGA separated into three clusters on the basis
of gene expression, namely mitotic, hormonal and immunoresponsive. (B) The
three clusters are significantly correlated with patient overall survival and
progression-free survival. (C) Association of the three clusters with
clinical/pathological features, mutation, copy-number variation and cluster
assignments from different data types. (D) Molecular and clinical features
associated with tumor histology; the FOXM1 transcription factor network is
significantly activated in the mitotic subtype.

Publications

Gastric Cancer

Our Center played a key role in TCGA’s analysis of gastric cancer in a cohort of 295 patients.
This study identified four distinct molecular subtypes of gastric cancer,
along with possible targeted treatments for some of the subtypes. An essential component of this was an analysis performed by the Center that integrated molecular patterns across the six molecular
platforms of the study (which included DNA sequencing, RNA sequencing, and protein arrays) to identify sets of patients that shared molecular profiles. These molecular profiles were found to have
strong associations with a limited set of key variables, which were subsequently used to classify gastric cancer into subtypes. The Center also identified distinct pathway-level differences among the subtypes.

Integrated Molecular Analysis Identifies Distinct Gastric Cancer Subtypes
Subsets of gastric cancer patients share molecular signatures reflected in multiple types of measurements. The central part of this figure (bordered by a red line) indicates how each patient
tumor sample (corresponding to a column) falls into one of several possible patterns specific to each molecular platform, as indicated by a blue tile. For example, in a single sample, copy number (SCNA)
can be either High (blue in row 1) or Low (blue in row 16). Analysis by our Center played a role in revealing that four overall patterns are seen in the data, as indicated by the vertical red
separation lines. Furthermore, these overall patterns were characterized by several key variables, as seen in the annotations below the box, the covariate tracks above the box, and the icons
at the top of the figure representing DNA mismatch repair, diffuse cell type, Epstein-Barr virus, and aneuploidy, respectively. The key variables formed the basis of the classification of gastric
molecular subtypes in the study.

Publications

Resources

Glioblastoma Multiforme

Our Center contributed to the TCGA Glioblastoma Multiforme (GBM) Analysis Working Group. For this analysis, we
inferred associations in the data using pairwise statistical analysis as well as the RF-ACE algorithm. These
inference methods were applied to the entire GBM data set, as well as to subsets of the data
partitioned by the four GBM subtypes, in order to identify subtype-specific associations (e.g., the impact of TP53
mutations in the classical, mesenchymal, neural, and proneural subtypes). The resulting associations have all been made
available through Regulome Explorer and shared collaboratively with other members of the Analysis Working
Group. Key findings that have emerged from these analyses and data exploration tools include mutual
exclusivity and co-occurrence of genomic events, associations between these genomic events
and molecular features (e.g., mutations that impact gene and miRNA expression), and subtle relationships
between molecular features and clinical data or sample characteristics (e.g., IDH1 mutation and
hypermethylation phenotype).
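As an illustration of what a pairwise test for mutual exclusivity or co-occurrence of two genomic events can look like, here is a minimal sketch using a one-sided Fisher's exact test on a 2x2 table of mutation calls. The text does not say which pairwise statistics were actually used, so the choice of test, the function names, and the toy data are all assumptions.

```python
from math import comb


def cooccurrence_test(mut_a, mut_b):
    """One-sided Fisher's exact tests for two binary events.

    mut_a, mut_b: equal-length lists of booleans, one entry per sample.
    Returns (n_both, p_exclusive, p_cooccur), where p_exclusive is the
    probability of seeing this few or fewer co-mutated samples by chance,
    and p_cooccur the probability of seeing this many or more.
    """
    if len(mut_a) != len(mut_b):
        raise ValueError("both events need one call per sample")
    n = len(mut_a)
    n_a, n_b = sum(mut_a), sum(mut_b)
    n_both = sum(1 for a, b in zip(mut_a, mut_b) if a and b)

    def pmf(k):
        # Hypergeometric probability of exactly k co-mutated samples.
        return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)

    lowest = max(0, n_a + n_b - n)
    highest = min(n_a, n_b)
    p_exclusive = sum(pmf(k) for k in range(lowest, n_both + 1))
    p_cooccur = sum(pmf(k) for k in range(n_both, highest + 1))
    return n_both, min(1.0, p_exclusive), min(1.0, p_cooccur)


# Toy data: two events that never appear in the same sample.
event_a = [True] * 5 + [False] * 5
event_b = [False] * 5 + [True] * 5
print(cooccurrence_test(event_a, event_b))
```

To look for subtype-specific associations in the same spirit, the sample lists could be filtered to one subtype at a time before calling the function.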

Publications

Resources

Ovarian Cancer

Our Center reviewed TCGA data for 316 patients with high-grade serous ovarian cancer, the most common form of the
disease. Data for each patient included a genetic survey of the surgically resected primary tumor and
comprehensive clinical data. Most patients in the study had stage III or IV disease and G2 or G3 tumors.
BRCA2 mutations were found in 29 ovarian cancers and BRCA1 mutations in 37. All patients had undergone
surgery followed by platinum-based chemotherapy. Patients with BRCA2 mutations in their tumors had a significantly
higher 5-year overall survival rate (61%) than did patients without BRCA mutations in their tumors
(25%). The 3-year progression-free survival rate also was significantly higher for patients whose tumors had
BRCA2 mutations (44%) than for those whose tumors did not have BRCA mutations (16%). BRCA1 mutations
in tumors were not significantly associated with survival. All patients whose ovarian cancers had BRCA2
mutations responded to primary platinum-based chemotherapy, compared with 82% of patients whose tumors
did not have any BRCA mutation and 80% of patients whose tumors had BRCA1 mutations. The median
platinum-free duration was 18.0 months for patients whose tumors had BRCA2 mutations, 11.7 months for
patients whose tumors did not have any BRCA mutations, and 12.5 months for those whose tumors had
BRCA1 mutations. We also found that tumors with BRCA2 mutations had a median 84 mutations per tumor
sample compared with 52 mutations per tumor sample for tumors without BRCA mutations. This last finding,
called the hypermutation/mutator phenotype of BRCA2-mutated ovarian cancers, might be a factor in the
development and growth of a tumor and a sign of its vulnerability to DNA-damaging drugs.

Prostate Cancer

Our Center participated in the TCGA Prostate Analysis Working Group, contributing to working group discussions, analyses, presentation of results, and the TCGA prostate marker paper. Our Center worked on the analysis of the clinical data and integrated analysis of the molecular data for 333 primary prostate cancers. The study identified seven subtypes of prostate cancer reflecting the heterogeneity of this cancer.

Publications

Resources

Thyroid Cancer

Our Center participated in the TCGA Thyroid Analysis Working Group, contributing to working group discussions, analyses, presentation of results, and the TCGA thyroid marker paper. Our Center helped to analyze the largest cohort of papillary thyroid cancer (PTC) samples studied to date (496 patients) by performing integrative analysis of DNA sequence, gene expression, microRNA expression, protein expression, and DNA methylation profiles of PTCs. We worked with clinicians to generate a risk-of-recurrence feature, and we collaborated with other AWG members to analyze thyroid differentiation in PTCs. Our Center also played a key role in identifying specific microRNAs (miR-21, miR-146b, and miR-204) in less differentiated subgroups of PTC. The identification of these microRNAs may lead to more precise surgical and medical therapy.

Figure 7. Unsupervised Clusters for miRNA-seq Data
Heatmap showing discriminatory miRs (5p or 3p mature strands) with the largest 6% of metagene
matrix score, as well as miR-204-5p, 221-3p, and 222-3p, which were highlighted in correlations to BRS and TDS scores.
The scale bar shows log2-normalized (reads per million, RPM), median-centered miR abundance. miR names in red are
discussed in the text. Gray vertical lines in the clinical information tracks mark samples without clinical data, and in the
mutation tracks gray lines identify samples without sequence data.
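The scale described in this caption (log2 of reads per million, then median centering) can be sketched as follows. The pseudocount of 1 added before the logarithm, the choice to center each miR on its median across samples, the function names, and the toy counts are assumptions for illustration, not details taken from the study.

```python
import math
from statistics import median


def log2_rpm(counts_by_sample):
    """Convert raw read counts to log2(RPM + 1) per sample.

    counts_by_sample: {sample_id: {mir_name: raw_read_count}}
    """
    result = {}
    for sample, counts in counts_by_sample.items():
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"sample {sample} has no reads")
        result[sample] = {
            mir: math.log2(count / total * 1e6 + 1.0)  # +1 pseudocount (assumed)
            for mir, count in counts.items()
        }
    return result


def median_center(values_by_sample):
    """Subtract each miR's median across samples from its values."""
    samples = list(values_by_sample)
    if not samples:
        return {}
    centered = {sample: {} for sample in samples}
    for mir in values_by_sample[samples[0]]:
        mir_median = median(values_by_sample[s][mir] for s in samples)
        for s in samples:
            centered[s][mir] = values_by_sample[s][mir] - mir_median
    return centered


# Toy read counts for three samples and two hypothetical miRs.
toy_counts = {
    "s1": {"miR-x": 100, "miR-y": 900},
    "s2": {"miR-x": 200, "miR-y": 800},
    "s3": {"miR-x": 400, "miR-y": 600},
}
print(median_center(log2_rpm(toy_counts)))
```

After centering, a value of 1 means a miR is roughly twice as abundant in that sample as in the median sample, which is what makes a single color scale comparable across miRs.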

Publications

Resources

Pan-Cancer Analysis

The Center participated in the pan-cancer working group, with Dr. Shmulevich serving as one of its
co-chairs. TCGA presents unprecedented opportunities to study molecular differences and
similarities across multiple cancers and their subtypes. In particular, it makes it possible to complement the
traditional "tissue of origin" classification of cancers with multidimensional molecular characterization.
Analytical tools such as random forest regression will be applied to multiple cancers to characterize subtypes,
which may span multiple histological categories, at the level of molecular associations among genetic
aberrations (mutations, translocations), expression, epigenetic and other measurements. The Center is also
developing pathway level exploration within Regulome Explorer. This capability, already in prototype, allows
the scientist to view a particular pathway at the level of associations and to identify enrichment of other
pathways associated with the pathway of choice. These associations can be limited to specific datatypes or
combinations thereof. Such capabilities will be important for pan-cancer analysis at the pathway level.
Furthermore, PARADIGM (UCSC) integrated pathway levels are ingested as features into our feature matrices
and analyzed jointly with all other features, providing an additional pathway-level view.