Mutation and Selection in a Lung Cancer Genome

A letter to Nature this week presents the whole-genome sequencing of a non-small-cell lung cancer tumor. Over 500 validated mutations (530 SNVs and 43 structural variants) offer an unprecedented view of genetic variation and selection in solid tumors.

Using arrays of self-assembling DNA nanoballs (DNBs, i.e., the Complete Genomics platform), Lee et al. sequenced a primary lung tumor (to 60x) and matched normal tissue (to 46x). They also performed SNP genotyping (Affy 6.0) and array-CGH (Agilent 244A) to assess genome-wide DNA copy number, allelic imbalance, and loss of heterozygosity (LOH). The tumor sample “bears many of the hallmark copy number alterations commonly found in smoking-associated lung cancer,” including copy number loss of TP53, amplification of CDK4 and KRAS, and copy-number-neutral LOH of chromosome 13 across the RB1 locus.

Somatic Mutations and the Cost of Smoking

A comparison of tumor and normal sequences yielded some 83,000 predicted somatic SNVs. After validating 70% of predicted coding region SNVs, the authors re-tuned their prediction algorithms (to 90% specificity and 82% sensitivity) and called 50,675 high-confidence somatic mutations genome-wide. The patient was a 51-year-old man who’d reported smoking 25 cigarettes a day for 15 years prior to surgery. What was the cost of his unhealthy habit? At $3.50 a pack, it works out to around $24,000. At 50,000 mutations, it works out to one mutation for every 2.7 cigarettes. Consider that, smokers, the next time you decide to light up.
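The back-of-the-envelope arithmetic is easy to check in a few lines of Python. The smoking history, pack price, and mutation count come from the paragraph above; the 20-cigarette pack and the 365-day year are my assumptions, not figures from the paper.

```python
# Figures taken from the text
CIGARETTES_PER_DAY = 25
YEARS_SMOKED = 15
PRICE_PER_PACK = 3.50          # US dollars
SOMATIC_MUTATIONS = 50_675     # high-confidence somatic mutations genome-wide

# Assumptions (not stated in the text)
DAYS_PER_YEAR = 365
CIGARETTES_PER_PACK = 20

total_cigarettes = CIGARETTES_PER_DAY * DAYS_PER_YEAR * YEARS_SMOKED
total_cost = total_cigarettes / CIGARETTES_PER_PACK * PRICE_PER_PACK
cigarettes_per_mutation = total_cigarettes / SOMATIC_MUTATIONS

print(f"{total_cigarettes:,} cigarettes smoked")
print(f"${total_cost:,.0f} spent")
print(f"{cigarettes_per_mutation:.1f} cigarettes per mutation")
```

Under those assumptions the totals come to 136,875 cigarettes and just under $24,000, which matches the rounded figures quoted above.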

Mutation Rate and Spectrum

The pattern of somatic mutations was strikingly different from the observed germline variation, favoring changes at G-C base pairs (78%), the majority of which were G/C->T/A transversions (46%). Similar patterns were observed in the lung cancer cell line recently sequenced by Pleasance et al., and they underscore the strong influence of smoking-induced DNA damage.
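For readers less familiar with mutation spectra, here is a minimal sketch of how a single-base substitution can be collapsed into one of six strand-independent classes and labeled as a transition or a transversion. The function name and the pyrimidine-based class labels (where G/C->T/A is written "C>A") are illustrative choices of mine, not the paper's method.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}

def classify_snv(ref, alt):
    """Return (strand-collapsed class, 'transition' or 'transversion')."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_type = (ref in PURINES) == (alt in PURINES)
    kind = "transition" if same_type else "transversion"
    # Report the change from the pyrimidine (C or T) strand so that
    # G>T and C>A, for example, land in the same class.
    if ref in PURINES:
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}", kind

# Toy input for illustration only (not data from the paper).
toy_calls = [("G", "T"), ("C", "A"), ("G", "A"), ("T", "C")]
spectrum = Counter(classify_snv(ref, alt)[0] for ref, alt in toy_calls)
print(spectrum)
```

Both G>T and C>A fall into the same "C>A" class here, which is the G/C->T/A transversion category discussed above.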

The authors estimated an overall mutation rate of 17.7 mutations per megabase. Some 17 mutations occurred in the set of 623 genes sequenced by our group (TSP) in 188 lung adenocarcinomas. In that study, non-smokers had fewer than five mutations in the gene set, while smokers had as many as 49. Thus, the authors’ observed mutation rate fits well within the expected range for lung cancers set forth by TSP.
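A per-megabase rate is just the mutation count divided by the amount of sequence surveyed. The denominator isn't quoted here, so the sketch below back-calculates it on the assumption that the 17.7 per Mb figure is based on the 50,675 high-confidence calls; that pairing is my inference, and the function name is hypothetical.

```python
def mutations_per_mb(n_mutations, bases_surveyed):
    """Mutation rate in mutations per megabase of surveyed sequence."""
    return n_mutations / (bases_surveyed / 1e6)

HIGH_CONFIDENCE_MUTATIONS = 50_675   # from the text
REPORTED_RATE_PER_MB = 17.7          # from the text

# Assumption: the reported rate was derived from the high-confidence calls.
implied_mb = HIGH_CONFIDENCE_MUTATIONS / REPORTED_RATE_PER_MB
print(f"Implied sequence surveyed: about {implied_mb:,.0f} Mb")

# Round trip: feeding the implied denominator back in recovers the rate.
rate = mutations_per_mb(HIGH_CONFIDENCE_MUTATIONS, implied_mb * 1e6)
print(f"{rate:.1f} mutations per Mb")
```

The implied denominator comes out near 2,860 Mb, which is about what you'd expect for the callable portion of a human genome.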

Evidence of Selection in Expressed Genes and Upstream Promoters

The greatest strength of this study was the authors’ analysis of somatic mutation patterns relative to gene structures. They found, for example, that the mutation rate was lower for expressed genes (8.3 per Mb) than for non-expressed genes (17.5 per Mb), suggesting selective pressure against mutations in active coding regions. Further, mutations were less prevalent on the transcribed strand than on the non-transcribed strand, likely due to transcription-coupled DNA repair mechanisms. Intriguingly, the mutation rate in the 2 kb immediately upstream of transcription start sites, i.e., the 2-kb promoters, was 10.5 per Mb, or 40% lower than the genome-wide average. Such an observation suggests that upstream promoters, like coding sequences, are under purifying selection – and supports the notion that these regions harbor key regulatory elements that are disrupted by mutation.
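The relative differences are simple to reproduce from the rates quoted above. This is only a sketch: the helper name is mine, and I'm taking the 17.7 per Mb overall rate as the genome-wide average for the promoter comparison.

```python
def percent_lower(rate, baseline):
    """How much lower `rate` is than `baseline`, as a percentage."""
    return (1 - rate / baseline) * 100

GENOME_WIDE = 17.7     # mutations per Mb, overall rate
PROMOTER_2KB = 10.5    # 2 kb upstream of transcription start sites
EXPRESSED = 8.3        # expressed genes
NON_EXPRESSED = 17.5   # non-expressed genes

promoter_drop = percent_lower(PROMOTER_2KB, GENOME_WIDE)
expressed_drop = percent_lower(EXPRESSED, NON_EXPRESSED)

print(f"Promoters vs genome-wide: {promoter_drop:.1f}% lower")
print(f"Expressed vs non-expressed genes: {expressed_drop:.1f}% lower")
```

The promoter figure works out to roughly 41%, in line with the 40% quoted, and expressed genes carry a little over half as many mutations per Mb as non-expressed ones.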

Genetic Complexity and Redundancy

The authors also validated 43 somatic structural variations. However, only 27 of the 43 had breakpoints in genic regions, suggesting that the majority of somatic structural events are passenger mutations. Notably, most somatic SVs map near regions of DNA copy number changes, suggesting that structural events and copy number are interrelated.

Taken together, the results of this study suggest that lung cancer tumors can harbor a surprisingly large number of mutations ranging from single-nucleotide events to megabase-scale structural variation. At least eight genes in the EGFR pathway were mutated or amplified in this tumor sample, indicating a multiplicity of partially redundant mutations. The authors conclude that the tumor tissue might therefore represent a heterogeneous mixture of sub-clonal populations, many of them with distinct mutational landscapes. Unfortunately, the authors did not include deep read count data for the validated mutations, which would have yielded precise mutation frequencies and perhaps given additional support to such a conclusion. If true, however, the genetic complexity and redundancy of lung cancer tumors might help explain why they are so difficult to treat.

We need more studies like this one – more patients, more tumor types, more validation – before we can truly get a picture of the full spectrum of mutations that underlie tumor development and progression.