It's been sooooo long since I blogged, I'm actually ashamed to come back and post anything now! There is probably some proper term for that like "bloggers' guilt." Aha, yes, a quick google suggests this is a real term!

Anyway, with one thing and another, life has been busy. And while I'd love to say I'm here to write about some exciting topic that's caught my interest (and there are many!), I'm actually back on Blogger to advertise a job opportunity to work with Vasili and me. This one:

Project description: The focus of the project is biochemical analysis of Bacillus subtilis ribosome-associated stress factors. The project addresses questions generated through bioinformatic analysis of ribosomal factors, and will comprise a biochemical research programme backed up by a complementary set of microbiological and structural investigations. The position is co-financed by funds from the Ragnar Söderberg and Carl Tryggers foundations.

Our laboratory uses a combination of experimental (in vitro biochemistry in a reconstituted translational system, NGS-based techniques for in vivo biochemistry, as well as microbiological techniques) and in silico (molecular evolution) approaches. We are currently supported by the Swedish Research Council, Ragnar Söderberg foundation, Kempe foundation, Carl Tryggers foundation, as well as Umeå University funds.

Essential:
- Documented proficiency in biochemical assays
- Strong skills in experimental design and interpretation
- A high level of proficiency in written and spoken English, as well as scientific writing
- A strong publication record in international, peer-reviewed journals
- An ability to work effectively independently and as part of a group
- A willingness to learn new techniques

Applications should be written in English and comprise a cover letter, a CV with a publication list, and contact information for at least two referees. Documents should be in MS Word or PDF format and submitted electronically to Dr. Vasili Hauryliuk, vasili.hauryliuk@umu.se, or Dr. Gemma Atkinson, gemma.atkinson@umu.se. The position is available starting spring 2015 (negotiable).

I'll give a little bit of background to the paper. I actually alluded to it last year when we were in the early stages of writing up.

It started with these papers by Gaur et al and Yassin et al, which claim that an insertion in mitochondrial initiation factor mIF2 evolved to compensate for the total lack of IF1 in mitochondria (IF1 is universal in bacteria). This idea is really cute, and in fact they found that bovine mIF2 compensates for a lack of IF2+IF1 in E. coli. In addition, cryo-EM structures suggest the insertion occupies a site on the ribosome that overlaps with the IF1 binding site. However, I made an alignment of mIF2 from a broad distribution of eukaryotes and found there's a significant flaw with this hypothesis: mIF1 is absent in all eukaryotes, but the conserved insertion is only present in vertebrates. So somehow most eukaryotes through most of their evolutionary history have managed just fine without IF1 or the mIF2 insertion. This is a really nice example of how you really should do the comparative genomics if you want to make such sweeping claims as X evolved to replace Y. Here's the pic I made for my older blog post:

Just replace "protein 1" with IF2 and "protein 2" with IF1. Simple! The point is, if you only look at humans and bacteria, you may jump to a conclusion that isn't supported when you sample broadly across taxa.
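To make the logic concrete, here is a minimal Python sketch of the check. The taxa and the presence/absence values are a toy table that only mirrors the pattern described above (it is not my actual alignment data), and the function name is made up for illustration. If X really evolved to replace Y, there should be no lineages that lack both.

```python
def taxa_contradicting_replacement(table, lost, replacement):
    """Return the taxa that lack both the 'lost' factor and its proposed 'replacement'."""
    return sorted(
        taxon
        for taxon, features in table.items()
        if not features[lost] and not features[replacement]
    )


# Toy presence/absence table (illustrative only, following the pattern in the text)
toy_table = {
    "bacterium":  {"IF1": True,  "IF2_insertion": False},
    "vertebrate": {"IF1": False, "IF2_insertion": True},
    "fungus":     {"IF1": False, "IF2_insertion": False},
    "plant":      {"IF1": False, "IF2_insertion": False},
}

# Any taxon listed here gets by without IF1 and without the insertion
print(taxa_contradicting_replacement(toy_table, "IF1", "IF2_insertion"))
```

With only the bacterium and the vertebrate in the table, the list comes back empty and the replacement story looks fine. It is the broader sampling that exposes the problem.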

My partner in life and crime, Vasili, and I were thinking of writing up a teeny tiny paper about this, but then I started checking out more mitochondrial initiation factors...

Most eukaryotes have an orthologue of mitochondrial mIF3, but a homologue had never been found in Saccharomyces cerevisiae. Considering how important IF3 is for bacterial translation, and also how human mIF3 is associated with Parkinson's disease, it was a real bummer that this wasn't present in the yeast mitochondrial system. Well, actually, people just weren't looking hard enough. mIF3 is a small protein with a very biased amino acid content, so if you just search with BlastP, you only pick up a few of the most closely related sequences. I tried the more sensitive PSI-Blast and found an S. cerevisiae homologue called Aim23p. Aim23p was known to be a mitochondrial protein, but its function had not been predicted. Phylogenetic analysis confirmed Aim23p is the orthologue of mIF3. Cool! So our teeny tiny paper suddenly got more substantial.

The two domains of mIF3. The yellow sites are those that are strongly conserved between mIF3 and Aim23p. They are mainly internal, probably important for stabilising the structure. The outer faces are more variable, possibly reflecting lineage-specific evolution of inter-molecular interactions.

But the really cool stuff happened when we got our friends and collaborators from the lab of Piotr Kamenski in Moscow on board. Piotr's group showed that a knock-out of Aim23p can be complemented by Schizosaccharomyces pombe mIF3, strongly suggesting that Aim23p is the functional equivalent of mIF3 as well as the evolutionary orthologue.

I'm really happy about the results of this paper. It's a great feeling to predict something in silico that gets verified in vivo. Our next collaborative project with Piotr's group is a longer shot, but also bloody exciting, so fingers crossed!

Call me a geek, but I'm a big fan of translational GTPases (trGTPases). They're essential factors for translation by the ribosome, highly conserved in all life and they're ancient. Really ancient. There were at least three (EF-Tu, IF2 and EF-G) in the ancestor of all life on earth. These have subsequently diversified through gene duplication and subfunctionalisation into the multiple trGTPase subfamilies that are known today.

Most exciting for me is their claim that all known universal trGTPases contain an active PPIase center, and therefore that the cis-trans isomerization of PS22 is a universal event required for efficient turnover of trGTPases throughout the translation process. This is pretty cool. Hang on though... structures show that the potential PPIase motif is located in the cleft between domains G and V of EF-G, but the other trGTPases they mention in the paper (IF2, EF-Tu, EF-G, EF4 (LepA) and RF3) don't carry a domain homologous to EF-G's C-terminal domain V, so how does that work? Their structural figures of RF3, EF-Tu and LepA show their predictions of where the PPIase centre might be in these proteins. In each case, the centre is formed by residues of the GTPase (G) domain that correspond to the EF-G PPIase sites, plus nearby residues in the non-homologous C-terminal domains of these various trGTPases. So we would predict that if this is a universal mechanism, at least those residues of the G domain that are involved should be universally conserved. Unfortunately, an alignment of this region isn't presented in the paper, but fortunately I have one up my sleeve:

The region of the G domain containing sites proposed to be acting as a PPIase (yellow highlighting). These are just upstream of the G4 nucleotide binding motif (highlighted in turquoise). These are consensus sequences: the residue shown for each trGTPase in each position is the most common amino acid found there across all sequences of the subfamily. In essence, each sequence is an average of hundreds of sequences.
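For anyone who wants to see what "most common amino acid per position" means in practice, here is a minimal Python sketch of building a consensus from an alignment. The function name and the toy sequences are made up for illustration; this is not the script or the data behind the figure, and ignoring gaps when counting is my assumption here.

```python
from collections import Counter


def consensus(aligned_seqs, gap="-"):
    """Return the most common non-gap residue at each column of an alignment."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned_seqs):
        counts = Counter(res for res in column if res != gap)
        # A column made up entirely of gaps stays a gap in the consensus
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


# Toy alignment of one subfamily (made-up sequences, not real trGTPase data)
toy_subfamily = ["NKCDL", "NKADL", "NKCDM", "TKCD-"]
print(consensus(toy_subfamily))
```

Running this per subfamily and stacking the resulting consensus lines gives the kind of comparison shown above. Note that a consensus hides how variable each column is within a subfamily, so it is a summary rather than a full conservation measure.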

The G domain is generally very well conserved across its length, with patches of universal conservation. However, the potential PPIase sites (yellow) are not universally conserved. The variability of these sites contrasts sharply with the nearby conserved G4 motif (one of the five motifs of the trGTPases, and in fact all GTPases, that coordinate the GTP/GDP nucleotides). It's disappointing, but this pattern of conservation doesn't really bear the hallmark of a universal mechanism. Indeed, Wang et al find that mutating the PPIase motif of EF-G has only a modest effect: a single turn of GTP hydrolysis was unaffected, but the multiple-turnover rates were inhibited by 15–40%. Similarly, mutation of PS22 did not completely abolish translation.

In fact, L11 is actually one of the few ribosomal proteins that is not essential for life, which makes L11 knockout strains useful molecular biological tools (for example this paper). Therefore, the mechanistic details of this protein's role in translation, although very interesting of course, do not translate to an understanding of the core principles of how trGTPases work on the ribosome.

Are we really looking at something as concrete as a conserved "switch-and-latch" mechanism, or is PS22 just a trGTPase binding site with integral flexibility? Anyway, it's all interesting additional details of how trGTPases interact with the ribosome, and another plus: it got me blogging again after a long hiatus!

**** EDIT The position has now been filled and I'm no longer taking applications ****

Having got my Estonian Science Foundation grant funded recently, I have an open PhD position available! See below.

-----------------

We are seeking a highly motivated PhD candidate to be supervised by Dr Gemma Atkinson within the group of Prof Tanel Tenson in the Institute of Technology, University of Tartu, Estonia.

Dr Atkinson’s research addresses protein functional evolution, using bioinformatic approaches and primarily focusing on the ancient families of proteins involved in translation of mRNA to protein. Members of these families are often essential for life and predate the last common ancestor of all life on earth. Thus by studying these proteins we can gain understanding of the fundamental processes of life, and how these processes have evolved over billions of years.

The PhD project will take advantage of the thousands of whole genome sequences now available for the study of evolution of protein families from the origin of life to the present day. Work will involve sensitive sequence searching to identify the presence and absence of particular proteins across genomes, phylogenetic analyses to reconstruct their emergence and evolution, and sequence analyses to link domain- and site-specific patterns of amino acid substitution with molecular function. Specifically, the proposed PhD project will target the ABC superfamily of ATP-binding enzymes found in all domains of life. This superfamily comprises enigmatic proteins of diverse, and often unknown functions. Several ABC enzymes have recently been found to have important roles in regulation of translation such as ribosome recycling protein Rli1/ABCE1, yeast-specific elongation factor eEF3 and starvation response enzymes Gcn1 and Gcn20.

From the results of the PhD, it is expected that enzymes with novel roles in protein synthesis will come to light as interesting targets for subsequent experimental study. Dr Atkinson collaborates with the lab of Dr Vasili Hauryliuk, also in Prof Tenson’s group, for biochemical and genetic validation of in silico results. If the candidate so wishes, there is an opportunity to gain practical lab experience in Dr Hauryliuk’s lab.

The candidate should have:

a Masters degree in a biological or computational discipline

a strong interest in, and enthusiasm for molecular evolution

familiarity with basic sequence and phylogenetic analyses

experience in using a programming language such as Python, Perl, Java etc

fluency in spoken and written English

Estonia has a rich culture and beautiful natural environment, with unspoiled forests, meadows and coastlines. Enjoying warm summers and cold winters, the historical city of Tartu is the intellectual capital of Estonia, and its university is the leading research and development institution in the country. The Institute of Technology is a lively, modern centre for biological and technological research.

The PhD will be funded by a monthly stipend, with additional monies available for regular attendance at international conferences and workshops, and for visiting labs abroad. Information on funding is available by request.

Applications should contain:

a full CV with detailed description of previous relevant experience

a statement of academic interests

an electronic version of the Masters thesis

the names and contact details of at least 2 referees

The candidate is expected to start at the latest September 2012. Please send applications and informal enquiries to gemma.atkinson@ut.ee

Last week I got a thick padded envelope from the Wellcome Trust. My colleagues were a bit surprised... I told them it was a grant, and well it kind of was, only not wads of cash, but lumps of modelling clay!

As part of their Festive Tree of Life project, the Wellcome Trust sent out free packs of colourful modelling clay in the run-up to the festive season. The idea is that you make science-inspired decorations and either hang them on their physical Christmas tree if you're somewhere in the vicinity, or post them on the Festive Tree of Life Flickr page.

I had a fun afternoon the other day making my decorations. Here they are:

Mitochondrion

Chloroplast

Ribosome, with three tRNAs and EF-Tu. The pink thing is supposed to be mRNA, while the string is the nascent polypeptide chain coming out the exit tunnel... scale is overrated anyway.

The clay started to dry up by the time I got to the ribosome, and got less sticky and more tricky to deal with. After a few hours, the tRNAs fell off, and the subunits have now almost dissociated. All of this in the absence of termination and ribosome recycling factors too!

There have been a couple of interesting papers recently on those eukaryotic genes that are more closely related to bacterial than to archaeal homologues. Such proteins are often organellar (although they may be encoded in the nucleus), having entered eukaryotes with the bacterial endosymbiosis event that gave rise to the mitochondrion (or the event that gave rise to the chloroplast in the case of plants).

This paper tests whether human genes of different ancestries (bacterial versus archaeal) have different effects on phenotype, essentiality of the gene (as judged by lethality in mice), function, selective constraint, expression and position in protein-protein interaction network (PIN). Proteins were classified as bacteria- or archaea-like based on best hit scores in Blast searches.
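As a rough idea of what a best-hit classification looks like, here is a minimal Python sketch. The input format, the gene names and the tie handling are all my assumptions for illustration; the paper's actual cutoffs and criteria are not reproduced here.

```python
def classify_by_best_hit(hits):
    """Label each query gene by which domain gives its highest-scoring hit.

    hits: iterable of (gene, domain, bit_score), with domain either
    'bacteria' or 'archaea'. Genes with equal best scores are 'ambiguous'.
    """
    best = {}
    for gene, domain, score in hits:
        per_gene = best.setdefault(gene, {})
        per_gene[domain] = max(score, per_gene.get(domain, 0.0))

    labels = {}
    for gene, scores in best.items():
        bac = scores.get("bacteria", 0.0)
        arc = scores.get("archaea", 0.0)
        if bac > arc:
            labels[gene] = "bacteria-like"
        elif arc > bac:
            labels[gene] = "archaea-like"
        else:
            labels[gene] = "ambiguous"
    return labels


# Toy hit table (hypothetical gene names and scores)
toy_hits = [
    ("geneA", "bacteria", 310.0),
    ("geneA", "archaea", 120.0),
    ("geneB", "archaea", 450.0),
    ("geneB", "bacteria", 400.0),
    ("geneC", "bacteria", 95.0),
]
print(classify_by_best_hit(toy_hits))
```

Notice that geneB gets called archaea-like even though it has a perfectly good bacterial hit too. A best-hit label alone doesn't tell you whether a gene is universal or genuinely specific to one domain.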

They found that human genes of archaeal ancestry, although fewer in number, tend to have higher and broader expression levels, are more likely to be essential, are involved in core information processes, are under greater selection, and tend to be central in the PIN, as compared with bacteria-like genes.

I don't think they mention whether the archaea-like genes they identified have (more distant) homologues in bacteria too... if they do, then we're likely looking at the characteristics of universal, usually essential, core information processing genes. Whether archaea-like genes that have been lost in bacteria are just as central in eukaryotes as universal genes isn't clear.

It's also not clear just how many of the bacteria-like genes are endosymbiotic in origin. 7,884 human genes were found to be bacteria-like, but the human mitochondrion is predicted to contain only 1000-1500 proteins. Of the remainder, some are likely to be endosymbiotic in origin but have acquired non-mitochondrial functions, while an unknown proportion may actually be of archaeal ancestry but have been lost in archaea, and so have nothing to do with mitochondria. As these proteins are not universally essential, it follows that they would have a less central role in the cell... maybe the two gene populations that are considered in this paper are more like essential for life versus non-essential for life.

Anyway, it's a very interesting paper, particularly the finding that archaeal-like genes are less likely to be involved in inherited diseases. It's also surprising just how many genes did not have an identifiable homologue in either bacteria or archaea (58%).

The second paper, published in MBE, addresses the evolutionary history of mitochondrial genes from a broad distribution of eukaryotes:

The idea here is that the endosymbiosis event happened more recently than the divergence of eukaryotes from archaea, and this can be exploited for rooting the eukaryotic tree of life with a less divergent outgroup. Usually eukaryotic phylogenies are made using archaea-like information processing genes, rooted with archaea. However, there is a problem of long branch attraction to the very distant outgroup. This is the phenomenon in molecular phylogenetics where fast evolving, and therefore long branched sequences that should be nested within the tree are pulled down to the base of the tree because of spurious similarities to the outgroup. Using mitochondrial genes to make trees rooted with bacteria theoretically reduces the distance to the outgroup and, therefore, the problem of LBA.

The idea is very neat and I like it in principle. There are a couple of issues, though, that I think mean it might not help the LBA problem, and might in fact exacerbate it.

1. We don't know just how much more recently the mitochondrion was acquired after the divergence of eukaryotes from archaea. Some people might argue that this event was involved in the separation of the two lineages.
2. Mitochondrial genes have a faster rate of evolution than their cytoplasmic counterparts.

Still, it's interesting to see the results of rooting the eukaryotic tree in this way. The paper doesn't use best hits as in the above paper, but specifically targets known mitochondrial and mitochondrially targeted genes, such as cytochromes and two of the three universal mitochondrial translational GTPases, mIF2 and mEF-Tu. The third, mEF-G was likely excluded because it does not group with alpha-proteobacteria. Although... come to think of it, I don't see mEF-Tu or mIF2 grouping clearly with alphas in my trees... maybe EF-G was excluded because of its duplication early in eukaryotic evolution... though, mEF-Tu has also been duplicated in its history, and actually mEF-G1 is quite a conservative marker... anyway, this paper isn't about trGTPases specifically so I shouldn't drift off topic.

So, the root. They find the root between monophyletic unikonts (opisthokonts and amoebozoa) and bikonts (other eukaryotes), supporting one of the most popular hypotheses. There seems to be good statistical support for this topology using the Bayesian inference method. However, maximum likelihood support is only achieved with much filtering of the dataset. It's an interesting new take on rooting the eukaryotic tree, but not one that will convince everyone.

As is so often the conclusion, we're just going to need more eukaryotic protist genomes!

Coevolution (two or more biological objects evolving together) is a common feature of the evolutionary process on all levels from the molecular to the organismal. One of the most beautiful examples is that of hummingbirds and ornithophilous flowers. Hummingbirds feed on the nectar from the flowers, pollinating them in the process. In this mutually beneficial relationship, the plants have evolved flowers that attract the birds with colours that are conspicuous to the bird, and are shaped to perfectly accommodate the bird's beak. This coevolution has happened in a number of hummingbird/plant pairs.

Pic from Wikipedia article on hummingbirds

For more information on hummingbird/plant coevolution, I direct you to the publications of Ethan Temeles. As usual though, this post will be about proteins, and not whole organisms... and it will include my own crude drawings as usual...

At the molecular level, an example of coevolution is in the establishment of receptor-ligand interactions (Fig 1). The receptor protein binding site has evolved in concert with the binding site of the ligand. In Fig 1, variation of the yellow residues in the receptor is correlated with that of the green residues in the ligand. The yellow sites are close together in the structure, but not necessarily neighboring in the sequence. For example, the amino acid sequence backbone of these imaginary proteins might be arranged like this:

Fig 2. Black lines show the amino acid sequence of the protein, within its structural density.

Thus, if the structure of the binding interface is known, it's possible to predict candidate coevolving sites. However, from the sequence alone, it's not so straightforward.

As discussed in a recent paper of Gloor et al in MBE (and references within), there are two explanations for how covarying positions come to be (and these are actually the extremes of the distribution of possible mutational effects):

1. Suppressor mutations. These arise when a mutation with a deleterious phenotype is suppressed by another mutation at a different position.
2. Covarions. These are cases when both the original residue and the mutated residue are functionally compatible, but mutation alters the spectrum of amino acids possible at another location.

Covarying sites may occur in the same protein or in different proteins (Figs 3-4).

In between-protein coevolution, green sites coevolve with yellow sites in our example. But there is also within-protein coevolution among yellow site residues and among green site residues. Imagine for instance a change of a green residue that multiple yellow residues interact with at different times (Fig 4). Or perhaps the middle yellow starred residue in Fig 4 mutating and causing different constraints on what residues the neighboring yellow sites can mutate to. Either way, the three yellow sites will covary. Remember that those sites are far away from each other in the sequence. So by showing that these sites covary, we can predict that they are functionally related, even if we don't have a structure.

Fig 4. Correlated mutations can also occur within one protein

Prediction of co-evolving sites can be useful for understanding cases when binding site residues are unconserved in a multiple sequence alignment. It can also be useful for predicting intermolecular interaction sites, and allosteric sites (for example Chen et al., 2006). An allosteric site can remotely affect the evolutionary pressures on a distant site by affecting the structural orientation of the protein (Fig 5).

Fig 5. Correlated mutations among binding site residues and an allosteric site.

Prediction of covarying sites is challenging, not only because they may not always be clustered together in sequence and structure, but because covariation is a combined result of structural and functional constraints and background noise from shared phylogenetic ancestry and random processes.

There are two classes of methods for predicting covarying sites: tree-aware and tree-unaware. Tree-aware methods search for sites whose covariation cannot be explained by phylogenetic relationships, while tree-unaware methods ignore phylogenetic relationships, instead searching for covarying sites with the strongest signal. The two classes of methods are discussed in Caporaso et al (2008), in which it is concluded that tree-unaware methods perform as well as tree-aware ones.
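To give a flavour of the tree-unaware idea, here is a minimal Python sketch that scores every pair of alignment columns by plain mutual information and ranks them. Mutual information is just one simple covariation statistic that I'm using for illustration; I'm not claiming it is the exact method of any of the papers mentioned here, and the toy alignment is made up.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def rank_column_pairs(aligned_seqs):
    """Score every pair of columns, strongest covariation signal first."""
    columns = list(zip(*aligned_seqs))
    scores = [
        ((i, j), mutual_information(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy alignment: columns 0 and 1 change together, column 2 mostly does its own thing
toy_alignment = ["AKL", "AKL", "GEL", "GEV"]
for pair, score in rank_column_pairs(toy_alignment):
    print(pair, round(score, 3))
```

Nothing here knows about the tree, so two columns that only differ between two deep clades will score just as highly as a genuinely coupled pair. That is exactly the background noise the tree-aware methods try to account for.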

Using a tree-unaware method, Gloor et al. (2010) examine covariation in phosphoglycerate kinase evolution. They identify nonconserved sites that covary, and through mutagenesis show that the sites are important for function and epistatic to each other (mutation in one affects the function of the other). They find that covarying positions are just as diverse within and between clades as are noncovarying positions, and suggest that most covarying positions arise from processes more like the covarion model than the suppressor mutation model.

The importance of covariation in sequence evolution is of interest to people like myself who use patterns of sequence variation to predict protein function. In studying molecular evolution of function, we largely rely on the assumption that the most functionally important positions are those that are conserved over time. Although this is generally the case, it seems that some important sites that are able to covary may slip through the net.

Recently, I've been experimenting with the tree-unaware code of Dunn et al. (2008) to find covarying sites. Preliminary results, based on the RelA family, are... confusing. Residues that would be predicted to be interacting from the structure are not flagged up as covarying, while there are many pairs of predicted covarying sites that are physically distant and don't seem likely to be allosteric sites from the structure. It seems that, as with many real-life case studies, real biology is a little bit more complicated than naive sketches like mine would have you believe! Oh well, time to delve a little deeper into the data set...

About Me

I'm a Brit working as a postdoc researcher at the University of Tartu in Estonia. I moved here in September 2010, after spending two years in Uppsala, Sweden. I'm a computational evolutionary biologist, specifically interested in protein evolution.