A computational biologist's personal views on new technologies & publications on genomics & proteomics and their impact on drug discovery

Wednesday, October 07, 2009

The genomic history of a breast cancer revealed

Today's Nature contains a great paper which is one more step forward for cancer genomics. Using Illumina sequencing, a group in British Columbia sequenced both the genome and transcriptome of a metastatic lobular (estrogen receptor positive) breast cancer. Furthermore, they searched a sample of the original tumor for the mutations found in the genome+transcriptome screen in order to distinguish those that may have been present early from those which were acquired later.

From the combined genome sequence and RNA-Seq data they found 1456 non-synonymous changes, which were then trimmed to 1178 after removing pseudogenes and HLA sequences. 1120 of these could be re-assayed by Sanger sequencing of PCR amplicons from both normal DNA and the metastatic samples -- 437 of these were confirmed. Most of these (405) were also found in the normal sample, meaning they are germline variants rather than somatic mutations. Of the 32 remaining, 2 were found only in the RNA-Seq data, a point to be addressed below. Strikingly, none of the mutated genes were found in the previous whole-exome sequencing (by PCR+Sanger) of breast cancer, though those samples were of a different subtype (estrogen receptor negative).
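For anyone who wants to keep the filtering funnel straight, here is a minimal sketch that simply tallies the counts quoted above. The stage labels and the `retention` helper are my own shorthand, not anything from the paper.

```python
# Counts are the ones quoted in the post; the stage labels are informal shorthand.
funnel = [
    ("non-synonymous candidates", 1456),
    ("after removing pseudogenes and HLA", 1178),
    ("re-assayable by Sanger", 1120),
    ("confirmed by Sanger", 437),
]

germline = 405                        # confirmed variants also seen in normal DNA
somatic = funnel[-1][1] - germline    # confirmed variants absent from normal DNA
rna_only = 2                          # seen only in the RNA-Seq data
dna_somatic = somatic - rna_only      # somatic changes supported at the DNA level


def retention(steps):
    """Return (label, count, fraction kept relative to the previous stage)."""
    rows = []
    previous = None
    for label, count in steps:
        fraction = None if previous is None else count / previous
        rows.append((label, count, fraction))
        previous = count
    return rows


if __name__ == "__main__":
    for label, count, fraction in retention(funnel):
        kept = "" if fraction is None else f" ({fraction:.1%} of previous stage)"
        print(f"{count:5d}  {label}{kept}")
    print(f"{somatic:5d}  somatic candidates, of which {rna_only} are RNA-only")
```

The point of laying it out this way is how steep the last steps are: well under half of the re-assayed candidates survive Sanger validation, and only a small remainder is absent from the normal sample.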

There are a bunch of cool tidbits in the paper, which I'm sure I won't do full justice to here, but I'll do my best. For example, several other papers using RNA-Seq on solid cancers have identified fusion proteins, but in this paper none of the fusion genes suggested by the original sequencing came through their validation process. Most of the coding regions with non-synonymous mutations have not been seen to be mutated before in breast cancer, though ERBB2 (HER2, the target of Herceptin) is in the list along with PALB2, a gene which when mutated predisposes individuals to several cancers (and is also associated with BRCA2). The algorithm (SNVMix) used for SNP identification & frequency estimation is a good example of an easter egg, a supplementary item that could easily be its own paper.

One great little story is HAUS3. This was found to have a truncating stop codon mutation, and the data suggest that the mutation is homozygous (but at normal copy number) in the tumor. A further screen of 192 additional breast cancers (112 lobular and 80 ductal) for several of the mutations found no copies of the same hits seen in this sample, but two more truncating mutations in HAUS3 were found (along with 3 more variations in ERBB2 within the kinase domain, a hotspot for cancer mutations). HAUS3 is particularly interesting because until about a year ago it was just C4orf15, an anonymous ORF on chromosome 4. Several papers have recently described a complex ("augmin") which plays a role in genome stability, and HAUS3 is a component of this complex. This starts smelling like a tumor suppressor (truncating mutations seen repeatedly; truncating mutation homozygous in tumor; protein involved in a function often crippled in cancer), and I'll bet HAUS3 will be showing up in some functional studies in the not too distant future.

Resequencing of the primary tumor was performed using amplicons targeting the mutations found in the metastatic tumor. These amplicons were small enough to be spanned directly by paired-end Illumina reads, obviating the need for library construction (a trick which has shown up in some other papers). By using Illumina sequencing for this step, the frequency of each mutation in the sample could be estimated. It is also worth noting that the primary tumor sample was a Formalin Fixed Paraffin Embedded slide, a way to preserve histology which is notoriously harsh on biomolecules and prone to sequencing artifacts. Appropriate precautions were taken, such as sequencing two different PCR amplifications from two different DNA extractions. The sequencing of the primary tumor suggests that only 10 of the mutations were present there, with only 4 of these showing a frequency consistent with being present in the primary clone and the others probably being minor components. This is another important filter to suggest which genes are candidates for being involved in early tumorigenesis and which are more likely late players (or simply passengers).
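To make the frequency estimate concrete, here is a rough sketch of how one might turn read counts at a single site into an allele frequency with an uncertainty range. This is not the paper's method (and not SNVMix); it is a plain proportion with a Wilson score interval, and the function name and the read counts in the usage lines are hypothetical.

```python
import math


def allele_frequency(alt_reads, total_reads, z=1.96):
    """Estimate variant allele frequency with a Wilson score interval.

    alt_reads   -- reads supporting the variant allele (hypothetical input)
    total_reads -- all reads covering the site
    z           -- normal quantile; 1.96 gives roughly a 95% interval
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")

    p = alt_reads / total_reads
    denom = 1 + z * z / total_reads
    centre = (p + z * z / (2 * total_reads)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total_reads + z * z / (4 * total_reads ** 2))
    ) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Made-up counts: the same 5% variant at amplicon-level depth vs. 40X depth.
    for alt, total in [(500, 10000), (2, 40)]:
        p, low, high = allele_frequency(alt, total)
        print(f"{alt}/{total}: {p:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

The two usage lines illustrate why deep amplicon sequencing helps here: at very high depth a minor component gives a tight interval, while the same fraction at genome-scale coverage rests on a couple of reads and is hard to tell apart from error.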

One more cool bit I parked above: the 2 variants seen only in the RNA-Seq library. This suggested RNA editing, and consistent with this, an RNA editase (ADAR) was found to be highly represented in the RNA-Seq data. Two genes (COG3 and SRP9) showed high frequency editing. RNA editing is beginning to be recognized as a widespread phenomenon in mammals (e.g. the nice work by Jin Billy Li in the Church lab); the possibility that cancers can hijack this for nefarious purposes should be an interesting avenue to explore. COG3 is a Golgi protein & links of the Golgi to cancer are starting to be teased out. SRP9 is part of the signal recognition particle involved in protein translocation into the ER -- which of course feeds the Golgi. Quite possibly this is coincidental, but it certainly rates investigating.

One final thought: the next year will probably be filled with a lot of similar papers. Cancer genomics is gearing up in a huge way, with Wash U alone planning 150 genomes well before a year from now. It seems unlikely that those 150 genomes will end up as 150 distinct papers; moreover, it will be a challenge to do the level of follow-up in this paper on such a grand scale. A real challenge to the experimental community -- and the funding establishment -- is converting the tantalizing observations which will come pouring out of these studies into validated biological findings. With a little luck, biotech & pharma companies (such as my employer) will be able to convert those findings into new clinical options for doctors and patients.

7 comments:

Thanks for this excellent analysis of the breast cancer genome paper. Your discussion of the mutated genes lends a great deal to the paper since theirs was somewhat lacking. Great job!

Incidentally, my Google alert first picked up the text of your blog on the "Oregon Personal Injury Law Blog". It looks like they stole your text and are passing it off as their own - content pirates, if you will. Let's get these guys.

Hey, thanks for the comments (and your commentary picked up some key stuff I missed!).

The Oregon blog is weird -- looking further I find this isn't the only piece of mine that they have lifted without attribution; they also took last night's post. It's almost like it is an automated plagiarism-by-RSS.

Hi Keith. I'd also like to thank you for your excellent and prompt summary of our paper. I am a long-time reader and was excited to find my own research featured in your blog. We are also glad you consider SNVmix to be a publishable easter egg (thus far, no editor has shared that opinion unfortunately). I'd also like to make a quick comment in our defense with regards to Dan's comment regarding our discussion of the implications of individual mutations (or rather, lack thereof) in the paper. You probably noticed that the supplement includes a more detailed discussion of the significance of the HAUS3 mutations and (to a lesser extent) PALB2, ABCB11 and SLC24A4. This was originally a section in the main text but, as is too often the case, had to be relegated to the supplement to conform to the constraints of the letter format.

It is always a thrill to have one of the authors of a paper I've covered drop by for a comment!

I did have one question for you. Before I saw your comment to moderate, I was going to take a stab at Dan's comment: "First of all, only the metastatic sample was whole-genome sequenced - the primary tumor and matched normal were not. Instead, the authors identified nonsynonymous coding variants in met WGS data, and validated them by PCR/3730 sequencing in the met, tumor, and normal samples. This seems laborious to me, since there were 1,120 nonsynonymous SNVs..."

I'm guessing your group picked the PCR-based deep sequencing, rather than whole genome shotgun, because the sample from the primary tumor was available only as Formalin Fixed Paraffin Embedded (FFPE) and you either had insufficient DNA for WGS or were concerned about FFPE artifacts and the need to sequence from multiple, separate DNA isolations.

Hi Keith. I can certainly confirm that the validation of our candidate variants by Sanger sequencing was extremely laborious. This choice made for many headaches and late nights visually inspecting traces and I feel for the two individuals who accomplished this (Tesa and Trevor). Had we generated a whole genome sequence from the germline, we could have likely reduced this workload by an order of magnitude or more. We did, in fact, manage to produce a good library from the primary tumour sample (FFPE) but we decided not to pursue it. Considering that so few of the changes we identified in the Met sample were somatic, a genome sequence of the primary tumor would not be of much use to us. Instead, we only had to look at a few sites by PCR (and deep resequencing) to determine whether evolution had occurred. This gave us the added benefit of detecting the subclonal mutations, which we likely would have shrugged off as sequencing errors (had we seen them at all) in a 30-40X genome. I suppose you could say that we picked our battle and stuck with it. Even if in retrospect it would certainly have been better to have all 3 genomes fully sequenced, our method sufficed to tackle our rather focused questions.

Funny to read this article now; it's a great analysis indeed. Why did I call it funny, you ask? Well, all the women in my family faced cancer and apparently it was not genetic, but since hearing that it's not genetic I have really started to read more and more articles like this one.

About Me

Dr. Robison spent 10 years at Millennium Pharmaceuticals working with various genomics & proteomics technologies & working on multiple teams attempting to apply these throughout the drug discovery process. He spent 2 years at Codon Devices working on a variety of protein & metabolic engineering projects as well as monitoring a high-throughput gene synthesis facility. After a brief bit of consulting, he rejoined the cancer drug discovery field at Infinity Pharmaceuticals in May 2009. In September 2011 he joined Warp Drive Bio, a startup applying genomics to natural product drug discovery. Other recurring characters in this blog are his loyal Shih Tzu Amanda and his teenaged son alias TNG (The Next Generation).
Dr. Robison can be reached via his Gmail account, keith.e.robison@gmail.com
You can also follow him on Twitter as @OmicsOmicsBlog.