Tag: history of science

I saw this post by Craig Pikaard on Facebook and it brought back some memories:

New paper from my lab in which we identified the RNAs made by RNA Polymerase IV, an enzyme we discovered ~15 years ago. Took us more than ten years to find the little buggers, but we finally got ’em. The paper is “open access”, meaning that anyone can read it without paying a download fee or subscription. So have at it if you need a nap.

And the post included a link to a new paper in eLife. This brought back memories because I had a small part in the discovery (or more accurately, some post-discovery analysis). So, let’s step into a time machine here provided by, well, me keeping all my email forever I guess.

It was September 2000. I was working as a faculty member at TIGR (The Institute for Genomic Research) and I was doing some evolutionary analysis of the Arabidopsis thaliana genome, for what would become my most highly cited paper: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. And then on Sept 6 I got an email from someone I had gotten to know a little bit who was also analyzing the genome:

———————————-9/6

Dear Jonathan,

In helping Mike Bevan search for the general transcription machinery, I’ve stumbled across something odd that might also interest you given its evolutionary implications.

There should be three related genes in the Arabidopsis genome (or more if any of the genes are duplicated) encoding ~135 kd (2nd largest) DNA-dependent RNA polymerase subunits – one each for pol I, II and III. These subunits are similar and are clearly related to one another (also to the B subunit of the single bacterial RNA polymerase) yet they have distinct motifs that allow them to be placed in each class (pol I, II, or III) based on clustal analysis with orthologs from other species. Anyway, there ARE three distinct ~135 kd subunit genes in the thaliana genome and based on multiple alignments vs. mouse, yeast, drosophila etc genes, and clustal analysis to draw phylogenetic trees, one is clearly for pol II, and one is clearly for pol III. The third paralog is strange - it does not group with other pol I 135 kd subunits (from yeast, Drosophila, Euplotes, mouse, C. elegans), nor with pol II or III subunits. In fact, it appears as an outgroup even when archaeal subunits (e.g. Sulfolobus) are included in the analysis: archaeal subunits are more closely related to the pol II second largest subunit than the mystery subunit is to other pol I, II, or III subunits. By BLAST searching Genbank, the mystery subunit does not match anything better than eukaryotic 135 kd subunits and it doesn’t look like a chloroplast or mitochondrial subunit. I’m wondering if a plant Pol I can really be that weird.

Is this something you would be interested in looking at if I send you the protein sequences for clustal analysis?

Now this certainly seemed interesting and as I was doing a variety of analyses of RNA polymerase homologs for some studies of the evolution of microbes, it was something I actually knew a little bit about. So I wrote back immediately:

Craig

This sounds quite interesting. I have found that for many of the DNA repair genes I have been looking at, the A. thaliana genes do show quite long branches, so long branches might be a possibility. A good phylogenetic analysis should be able to determine if that is the case. If you send me the sequences and/or an alignment, I would be happy to put them through a more detailed phylogenetic analysis.

Jonathan

Then, a few minutes later I got another email:

Hi Jonathan,

I’m pasting below the sequences I used for the multiple alignments (using DNAStar), starting with the mystery gene and then known second subunits of pol I, II, III, and archaea.

Thanks for having a look at this.

Craig

-----------

largest subunitMEYNEYEPEPQYVEDDDDEEITQEDAWAVISAYFEEKGLVRQQLDSFDEFIQNTMQEIVDESADIEIRPESQHNPGHQSDFAETIYKISFGQIYLSKPMMTESDGETATLFPKAARLRNLTYSAPLYVDVTKRVIKKGHDGEEVTETQDFTKVFIGKVPIMLRSSYCTLFQNSEKDLTELGECPYDQGGYFIINGSEKVLIAQEKMSTNHVYVFKKRQPNKYAYVGEVRSMAENQNRPPSTMFVRMLARASAKGGSSGQYIRCTLPYIRTEIPIIIVFRALGFVADKDILEHICYDFADTQMMELLRPSLEEAFVIQNQLVALDYIGKRGATVGVTKEKRIKYARDILQKEMLPHVGIGEHCETKKAYYFGYIIHRLLLCALGRRPEDDRDHYGNKRLDLAGPLLGGLFRMLFRKLTRDVRSYVQKCVDNGKEVNLQFAIKAKTITSGLKYSLATGNWGQANAAGTRAGVSQVLNRLTYASTLSHLRRLNSPIGREGKLAKPRQLHNSQWGMMCPAETPEGQACGLVKNLALMVYITVGSAAYPILEFLEEWGTENFEEISPSVIPQATKIFVNGMWVGVHRDPDMLVKTLRRLRRRVDVNTEVGVVRDIRLKELRIYTDYGRCSRPLFIVDNQKLLIKKRDIYALQQRESAEEDGWHHLVAKGFIEYIDTEEEETTMISMTISDLVQARLRPEEAYTENYTHCEIHPSLILGVCASIIPFPDHNQSPRNTYQSAMGKQAMGIYVTNYQFRMDTLAYVLYYPQKPLVTTRAMEHLHFRQLPAGINAIVAISCYSGYNQEDSVIMNQSSIDRGFFRSLFFRSYRDEEKKMGTLVKEDFGRPDRGSTMGMRHGSYDKLDDDGLAPPGTRVSGEDVIIGKTTPISQDEAQGQSSRYTRRDHSISLRHSETGMVDQVLLTTNADGLRFVKVRVRSVRIPQIGDKFSSRHGQKGTVGMTYTQEDMPWTIEGVTPDIIVNPHAIPSRMTIGQLIECIMGKVAAHMGKEGDATPFTDVTVDNISKALHKCGYQMRGFERMYNGHTGRPLTAMIFLGPTYYQRLKHMVDDKIHSRGRGPVQILTRQPAEGRSRDGGLRFGEMERDCMIAHGAAHFLKERLFDQSDAYRVHVCEVCGLIAIANLKKNSFECRGCKNKTDIVQVYIPYACKLLFQELMSMAIAPRMLTKHLKSAKGRQ—————–

Well, this was helpful. Sequences and useful notes about them. So I played around with the sequences, searched for some other homologs, built a few alignments, built some masks to filter out poorly aligned regions, and then fed the data into PAUP and built a tree. (I note: I know about this because, amazingly, I still have all the files.)
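For readers curious what "masking" an alignment means in practice, here is a minimal Python sketch of the general idea. It is not the procedure used in 2000 (those masks are not shown in this post). The function names, the toy alignment, and both thresholds are made up for illustration. The sketch drops columns that are mostly gaps (a stand-in for "poorly aligned") and columns where no single residue is common (a stand-in for "hypervariable").

```python
from collections import Counter


def column_mask(alignment, max_gap_fraction=0.5, min_top_residue_fraction=0.3):
    """Return one boolean per alignment column; True means keep the column.

    Both thresholds are illustrative assumptions, not values from the post.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(alignment)
    mask = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        residues = [c for c in column if c != "-"]
        gap_fraction = 1 - len(residues) / n_seqs
        # Drop columns that are mostly gaps.
        if not residues or gap_fraction > max_gap_fraction:
            mask.append(False)
            continue
        # Drop columns where even the most common residue is rare.
        top_count = Counter(residues).most_common(1)[0][1]
        mask.append(top_count / len(residues) >= min_top_residue_fraction)
    return mask


def apply_mask(alignment, mask):
    """Keep only the columns flagged True in the mask."""
    return ["".join(c for c, keep in zip(seq, mask) if keep) for seq in alignment]


# Hypothetical toy alignment: column 4 is mostly gaps, column 6 is hypervariable.
toy_alignment = [
    "MKV-LA",
    "MKI-LG",
    "MRV-LC",
    "MKVQLD",
]
mask = column_mask(toy_alignment)
masked = apply_mask(toy_alignment, mask)
print(masked)
```

The masked alignment, rather than the raw one, is what would then be handed to a tree-building program. Real masks are usually built with more care (and often some manual inspection) than two simple cutoffs.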

And I wrote back to Mike Bevan and Craig on Sept 8:

Mike and Craig

Attached is a phylogenetic tree of RNA polymerase subunits (Craig suggested I look at these because of an unusual protein in the A. thaliana genome). A. thaliana has representatives in five different subfamilies – Pol-I, Pol-II, Pol-III and RpoB (for the chloroplast) as would be expected and then this novel Pol which I have called Pol-IV.

I do not know much about RNA polymerase, but it seems like this is a pretty big deal and I think should be emphasized in the paper. What do you think? I could try to make a pretty tree figure to show the different families.

Jonathan

I got an email back:

Dear Jonathan (and Mike),

Many thanks for the detailed phylogenetic tree of the mystery pol subunit. I think a figure is the only way to show clearly that this protein defines a new clade. Is there room for such a figure, Mike?

In the lab we have also been calling it a putative pol IV subunit just for the shock value of saying the words (a radical idea in the transcription field), though in the absence of knowing what other subunits associate with it, I’m not sure what to call it in the annotation or figure. Maybe “oddpol” or “atypical polymerase 2nd subunit”. It takes more than a dozen subunits to make a eukaryotic polymerase, so it is not clear that one unusual subunit is enough to confer new properties - i.e. a true pol IV. Obviously, that will require quite a bit of work.

Cheers,

Craig

Me to Craig on 9/11/00:

Yes

I agree that it is too early to call it a true pol IV, and I was doing it for the shock value too.

Jonathan

PS. Do you mind if I present this at the TIGR GSAC meeting later this week?

Jonathan

Craig to me 9/11:

Hi Jonathan,

Feel free to show the data. In thinking more about this, it is worth also making a phylogenetic tree for the largest pol subunit (the equivalent of eubacterial B’) just to see if there might be a fourth class out there for the largest subunit, too. If there is, pol IV may not be such a wild idea.

In case you are interested in giving this a try, I’m including some sequences below. In the meantime, is there a good web site for performing the types of extensive phylogenetic trees you’ve done for the mystery subunit? I should do this for many of the general transcription factors just to be sure they really group with the correct homologs, as you suggested.

Anyway, here are some largest subunit sequences for pol I, II and III.

Vive la difference!

Me:

sorry .. no useful sites out there for doing phylogenetic analysis … I am working on such a type of thing right now. It is tricky because to do it correctly you need to filter out parts of a multiple sequence alignment to remove badly aligned regions as well as hypervariable regions.

9/12 Craig to Me

Dear Mike,

Yes, I can do this for the atypical RNA polymerase 2nd subunit. I have already done multiple alignments with it against pol I, II, III subunits and it is clear that the atypical subunit has amino acid differences that set it apart, rather than large indels that skew the data. So I think Jonathan is safe to go ahead and make a figure while I examine the gene sequences and gene models more carefully.

Any comments on the tone/amount of detail in the section I wrote on the general transcription machinery? Either way, I will add some references and send you an updated version as soon as I can.

cheers

Craig

> Speaking on behalf of the editorial committee whom I have not consulted, I would be delighted to have this in our section. But we need to check out the gene structure in detail (dodgy gene prediction, missing exons etc.). Craig, could you do this as you know most about these enzymes?
>
> All the best
>
> Mike

Me to Craig

Craig

I am still working on a slightly better figure … but I have attached the latest version … I think it is sufficient for submission

I have attached it in a few different formats.

I will be out of town for a few days but checking email.

Jonathan

Craig to Me:

Hi Jonathan,

The phylogenetic tree figure for the atypical pol subunit looks good though the font size may need to be reduced to fit “Fungal Plasmids” between the dividing lines for the adjacent categories. Have you sent a copy to Mike?

Craig

Craig again

Hi Jonathan,

I forwarded a copy to Mike. Did you ever have a chance to do a tree for the largest subunit to further test the hypothesis of a pol IV?

Hope you are having fun in LA

Craig

> I am not sure if I sent a copy to mike
>
> I am in LA right now and it would be easier if you could send mike a copy to make sure he has one. I will try and edit the figure and send one with a smaller font.
>
> J

10/3 Me to Craig:

Craig

Attached is a new version of the rna pol tree with fonts corrected. I am going to add a few more sequences and rerun it and make a new tree tomorrow.

Jonathan

PS Also … here is a potential figure legend

Figure. Phylogenetic tree of RNA polymerase homologs. Homologs of RNA polymerase were identified by searching sequence databases with representatives of the major known RNA polymerase subfamilies. These proteins, as well as six DNA polymerase homologs from A. thaliana, were aligned using clustalx using default settings. Phylogenetic trees were generated from the alignment (with ambiguously aligned regions and hypervariable regions excluded) using the PAUP* program. The tree shown was generated using the neighbor-joining algorithm with pairwise distances between sequences calculated with a PAM-like matrix. Numbers on the branches are bootstrap values indicating the percentage of 100 trees in which the proteins to the right of the node grouped together to the exclusion of all other proteins.

Craig 10/3

Hi Jonathan,

I will look forward to seeing the final tree, as will Mike, I’m sure. For the legend, the fact that this is an alignment of second-largest subunits should be made clear. Here is a stab at a minor revision:

Figure ---. Phylogenetic tree for the second-largest subunit of DNA-dependent RNA polymerases. Homologs of RNA polymerase second-largest subunits were identified by searching sequence databases with representatives of the major known subfamilies (e.g. pol I, II, III and eubacterial beta subunits). Identified proteins, including six homologs from A. thaliana, were aligned using clustalx using default settings. Phylogenetic trees were generated from the alignment (with ambiguously aligned regions and hypervariable regions excluded) using the PAUP* program. The tree was generated using the neighbor-joining algorithm with pairwise distances between sequences calculated with a PAM-like matrix. Numbers on the branches are bootstrap values indicating the percentage of 100 trees in which the proteins to the right of the node group together to the exclusion of all other proteins.

Thanks,

Craig

Me:

much better figure legend

j

Anyway – and so it went. Alas, for a variety of reasons not much made it into the final paper. What was there was this:

Unexpectedly, Arabidopsis has two genes encoding a fourth class of largest subunit and second-largest subunit (Supplementary Information Fig. 5). It will be interesting to determine whether the atypical subunits comprise a polymerase that has a plant-specific function.

And of course, this Supplemental Information is not exactly easy to find and does not actually work correctly anymore:

Downloading the Zip file and opening first page.htm gets one to this

And then clicking on the Figure 5 you get a broken page w/o the Figure.

But there, hidden in the folder with the Supplemental Information is the figure

So that is the beginning of the story on RNA Pol IV in Arabidopsis.

Go read the eLife paper and some of what it cites for the last 15 years of the story.

——–
This is from the “Tree of Life Blog”
of Jonathan Eisen, an evolutionary biologist and Open Access advocate
at the University of California, Davis. For short updates, follow me on Twitter.

Every year for the last few years I have given a talk on the “Evolution of DNA Sequencing” at the “Workshop in Applied Phylogenetics” at Bodega Bay Marine Lab. I just did the talk and thought I would post the slides here. I note – I also added an evolutionary tree of sequencing methods which I include here as a separate animated gif too.

I note I posted a request to Twitter the day before the talk pointing to last year’s slides and I got lots of helpful suggestions from people about what to add or change. I included links to Tweets in the talk and thanked those people on the slides. But I would like to thank everyone here too.

Published originally on March 10, 2015. Updated 10/20/15 with information below and republished.

Finally posted the video of the talk (recorded using Camtasia) to YouTube. It is imperfect (there are a few things I said that came out wrong .. it was late at night). But since it may be helpful to people I am posting it.

——–

Just found this in an old folder on a different topic. It is a press release from the Wellcome Trust that was handed out at the Cold Spring Harbor Genomes Meeting in 1998 in response to the announcement from Venter et al. that they were starting a company to sequence the human genome.

——–

In it Simon, who I consider both a friend and colleague and who has been an inspiration to me for much of my work, discusses the history of the concept and the field of ecology. He repeats a key phrase he has used elsewhere:

Ecology, the unifying science in integrating knowledge of life on our planet, has become the essential science in learning how to preserve it.

I like this phrase and plan to use it a bit here and there, with attribution of course.

Levin also discusses how Darwin’s Voyage of the Beagle helped launch the field of ecology because it

defined a new and synthetic way of looking at nature—in which the patterns characteristic of particular regions found explanation in a unifying, dynamic framework

It was only after Voyage of the Beagle and Wallace’s work and others that the term “oekologie” came into being.

I particularly like the end where he connects ecology to study of other complex adaptive systems like economic ones and medical ones.

The article is really really really worth a read.

——–