Somatic Mutations in Four Human Cancers

In a letter to Nature this week, a group from Genentech presents an elegant analysis of 2,576 somatic mutations across 441 tumors comprising breast, lung, ovarian, and prostate cancer types and subtypes. Using something called “mismatch repair detection” (MRD) technology, the authors surveyed 1,507 candidate genes spanning some 4 megabases of sequence, consisting largely of known cancer genes and “druggable” genes. MRD apparently uses E. coli to isolate amplicons that contain mutations relative to a reference sequence, which are then assessed for variations by a resequencing tiling array. Matched normal samples were also screened (in pools of five) to eliminate germline events.

I admit to knowing little about MRD or its capabilities, but I’m very familiar with the validation platform (Sequenom), which has proven its value in the HapMap, Cancer Genome Atlas (TCGA), and 1000 Genomes projects.

Significantly Mutated Genes

Any doubts I had concerning a study from the private sector were quickly swept away, not just by the quality of the journal, but by the analysis that the authors presented. Simply put, I was enchanted. Figure 1, for example, illustrates the significantly mutated gene (SMG) analysis with a grid of eight bubble plots, one per cancer subtype. Significant genes stand out not just by their position on the Y-axis (Mutation q-score), but also by the size of the bubble, which corresponds to the number of mutations.
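For readers unfamiliar with q-values, here is a minimal sketch of how a per-gene "q-score" could be derived from raw p-values. It rests on assumptions of mine, not on anything stated above: that the q-value comes from a Benjamini-Hochberg false discovery rate correction, and that the q-score plotted on the Y-axis is -log10 of that q-value. The gene labels and p-values are made up for illustration.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def q_score(qvalue):
    """Assumed definition: -log10 of the q-value, so higher means more significant."""
    return -math.log10(qvalue)


# Hypothetical per-gene p-values (placeholders, not data from the paper).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pvalues = [0.01, 0.04, 0.03, 0.005]

qvalues = benjamini_hochberg(pvalues)
for gene, q in zip(genes, qvalues):
    print(f"{gene}: q-value={q:.3f}, q-score={q_score(q):.2f}")
```

On a bubble plot like Figure 1, a score of this kind would set the vertical position, while the mutation count would set the bubble size.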

The set of SMGs varied across type and subtype, but some patterns immediately jump out. PIK3CA and TP53 were the most significant across three breast cancer subtypes. TP53, in fact, was significant across all eight subtypes, most strikingly in lung and ovarian cancer. KRAS stood out in pancreatic cancer and lung adenocarcinoma, but not squamous lung carcinoma.

On average across all tumors studied here, the authors found 1.8 protein-altering mutations per megabase, with the highest rates seen in lung adenocarcinomas (3.5/Mb) and squamous carcinomas (3.9/Mb). The lowest mutation rate (0.33/Mb) was in prostate tumors, 75% of which harbored the TMPRSS2-ERG gene fusion. These patterns are consistent with Figure 1, where prostate shows a sparse handful of significant genes, while lung cancers have large and diverse sets of them.
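The per-megabase rates are simple normalizations: mutation count divided by the amount of sequence screened. A minimal sketch follows, with a hypothetical function name and made-up counts. The paper's own figures depend on exactly which mutations and bases were counted, so this does not attempt to reproduce them.

```python
def mutations_per_megabase(n_mutations, n_tumors, bases_per_tumor):
    """Mutation rate per megabase across a set of tumors.

    n_mutations:     total mutations observed across all tumors
    n_tumors:        number of tumors screened
    bases_per_tumor: bases successfully screened in each tumor
    """
    total_megabases = n_tumors * bases_per_tumor / 1_000_000
    return n_mutations / total_megabases


# Hypothetical example: 70 mutations found in 10 tumors, 4 Mb screened per tumor.
rate = mutations_per_megabase(n_mutations=70, n_tumors=10, bases_per_tumor=4_000_000)
print(f"{rate:.2f} mutations/Mb")
```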

Integrated Copy Number and Mutation Analysis

Next, the authors integrated their mutations with Agilent 244K array CGH copy number data to identify genes that were significantly altered, either by mutation, copy number, or both. In Figure 2a, the authors plotted significantly altered genes by their copy number gain or loss, which nicely separated oncogenes and tumor-suppressor genes. The integrated analysis identified 35 additional cancer genes including STK11, EPHB1, and notably GNAS (the G-protein alpha subunit). GNAS proved an important finding, as it was mutated and amplified across several human cancers.

Pathway-based and Recurrence Analyses

The integrated dataset identified two pathways, RTK signaling and RAS/MAPK, as the most significantly altered across all tumor types. Furthermore, when the authors compared their dataset with the COSMIC database and the findings of recent cancer sequencing studies, they pinpointed novel recurrent mutations in several genes including HER2, NOTCH4, and PIK3R1.

The authors conclude that their study “represents a substantial expansion of the knowledge base of cancer somatic mutations,” and I tend to agree. They not only generated a rich dataset, but also analyzed and presented it in a comprehensive fashion. Furthermore, they (perhaps unsurprisingly) identify numerous cancer genes that are druggable targets, thereby translating these findings into actionable information.