An evo-devo geek's scientific meanderings

molecular evolution

I’ve made no secret of my fandom of the RNA world hypothesis, according to which early life forms used RNA both as genetic material and as enzymes, before DNA took over the former role and proteins (mostly) took over the latter. RNA is truly an amazing molecule, capable of doing all kinds of stuff that we traditionally imagined as the job of proteins. However, coaxing it into carrying out the most important function of a primordial RNA genome – copying itself – has proven pretty difficult.

To my knowledge, the previous record holder in the field of RNA-copying ribozymes (Wochner et al., 2011) ran out of steam after making RNA strands only half its own length. (Which is still really impressive compared to its predecessors!) In a recent study, the same team turned to an alternative RNA world hypothesis for inspiration. According to the “icy RNA world” scenario, pockets of cold liquid in ice could have helped stabilise the otherwise pretty easily degraded RNA, as well as concentrate and isolate it in a weird inorganic precursor to cells.

Using experimental evolution in an icy setting, they found a variant of the aforementioned ribozyme that was much quicker and generally much better at copying RNA than its ancestors. Engineering a few previously known performance-enhancing mutations into this molecule finally gave a ribozyme that could copy an RNA molecule longer than itself! It still wouldn’t be able to self-replicate, since this particular guy can only copy sequences with certain properties it doesn’t have itself, but we’ve got the necessary endurance now. Only two words can properly describe how amazing that is. Holy. Shit. :-O

This one’s for those people who say there is nothing special about evolution during the Cambrian – and also for those who say it was too special. (Creationists, I’m looking at you.) It is also very much for me, because Cambrian! (How did I not spot this paper before? Theoretically, it came out before I stopped checking RSS…)

Lee et al. (2013) used phylogenetic trees of living arthropods to estimate how fast they evolved at different points in their history. They looked at both morphology and genomes, because the two can behave very differently. It’s basically a molecular clock study, and I’m still not sure I trust molecular clocks, but let’s just see what it says and leave lengthy ruminations about its validity to my dark and lonely hours 🙂

They used living arthropods because, obviously, you can’t look at genome evolution in fossils, but the timing of branching events in the tree was calibrated with fossils. With several different methods, they inferred evolutionary trees telling them how much change probably happened during different periods in arthropod history. They tweaked things like the estimated time of origin of arthropods, or details of the phylogeny, but always got similar results.

On average, arthropod genomes, development and anatomy evolved several times faster during the Cambrian than at any later point in time. Including the aftermath of the biggest mass extinctions. Mind you, not faster than modern animals can evolve under strong selection – they just kept up those rates for longer, and everyone did it.

(I’m jumping up and down a little, and at the same time I feel like there must be something wrong with this study, the damned thing is too good to be true. And I’d still prefer to see evolutionary rates measured on actual fossils, but there’s no way on earth the fossil record of any animal group is going to be good enough for that sort of thing. Conflicted much?)

Aplacophorans are probably not what you think of when someone mentions molluscs. They are worm-like and shell-less, although they do have tiny mineralised scales or spines. While they look like one might imagine an ancestral mollusc before the invention of shells, transitional fossils and molecular phylogenies have linked them to chitons, which have a more conventional “sluggy” body plan with a wide foot suitable for crawling and an armoured back with eight shell plates.

Scherholz et al. (2013) compared the musculature of a living aplacophoran to that of a chiton and found it to support the idea that aplacophorans are simplified from a chiton-like ancestor rather than simple from the start. As adults, aplacophorans and chitons are very different – chitons have a much more complex set of muscles that includes muscles associated with their shell plates. However, the missing muscles appear to be present in baby aplacophorans, who only lose them when they metamorphose. (As a caveat, this study only focused on one group of aplacophorans, and it’s not entirely certain whether the two main groups of these creatures should even be together.)

Mammals are pretty rigid when it comes to the differentiation of the vertebral column. We nearly all have seven neck vertebrae, for example. This kind of conservatism is surprising when you look at other vertebrates – which include not only fairly moderate groups like birds with their variable necks, but also extremists like snakes with their lack of legs and practically body-long ribcages. Mammalian necks are evolutionarily constrained, and have been that way for a long time.

Emily Buchholtz proposes an interesting explanation with links to previous hypotheses. Mammals not only differ from other vertebrates in the less variable numbers of vertebrae in various body regions; these regions are also more differentiated. For example, mammals are the only vertebrates that lack ribs in the lower back. In Buchholtz’s view, this kind of increased differentiation contributes to adaptation but costs flexibility.

Her favourite example is the muscular diaphragm unique to mammals. This helps mammals breathe while they move, and also makes breathing more powerful, which is nice for active, warm-blooded creatures that use a lot of oxygen. However, it also puts constraints on further changes. Importantly, Buchholtz argues that these constraints don’t all have to work in the same way.

For example, the constraint on the neck may arise because muscle cells in the diaphragm come from the same place as muscle cells associated with specific neck vertebrae. Moving the forelimbs relative to the spine, i.e. changing the number of neck vertebrae, would mess up their migration to the right place, and we’d end up with equally messed up diaphragms.

A second possible constraint has less to do with developmental mishaps and more to do with plain old functionality. If you moved the pelvis forward, you may not screw with the development of other bits, but you’d squeeze the space behind the diaphragm, which you kind of need for your guts, especially when you’re breathing in using your lovely diaphragm.

*

Buchholtz EA (2013) Crossing the frontier: a hypothesis for the origins of meristic constraint in mammalian axial patterning. Zoology, in press; available online 28/10/2013. doi: 10.1016/j.zool.2013.09.001

If you want to use molecular sequences to uncover the relationships between organisms, you have two choices* of molecule. You can use either DNA or the proteins it encodes. I always thought, why the hell would anyone use DNA when they can also use protein?

The DNA alphabet has four different letters, versus the 20 amino acids proteins are made of. With so few letters, there is much more danger of chance similarity, and a much greater chance of multiple mutations at the same spot returning to the ancestral state and completely erasing important phylogenetic information. You could, I suppose, use codon-based models instead of single-nucleotide models, but what’s the point when you can just translate the sequence and analyse the protein instead?
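To put rough numbers on the chance-similarity worry, here’s a quick sketch comparing how often two unrelated sequence positions match by pure luck in a four-letter versus a twenty-letter alphabet (assuming uniform letter frequencies, which real sequences certainly don’t have):

```python
import random

def expected_identity(alphabet_size: int) -> float:
    """Chance that two random characters from a uniform alphabet match."""
    return 1 / alphabet_size

def simulated_identity(alphabet: str, length: int = 100_000, seed: int = 1) -> float:
    """Fraction of matching positions between two random sequences."""
    rng = random.Random(seed)
    a = rng.choices(alphabet, k=length)
    b = rng.choices(alphabet, k=length)
    return sum(x == y for x, y in zip(a, b)) / length

print(expected_identity(4))   # DNA: 0.25
print(expected_identity(20))  # protein: 0.05
print(simulated_identity("ACGT"))  # simulation lands close to 0.25
```

So a quarter of positions in two totally unrelated DNA sequences will agree by chance, versus one in twenty for protein, which is the intuition behind preferring amino acids.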

Well, it seems there is a point. The crucial thing is that while DNA translates unambiguously to protein, the reverse is not true. Take a look at the genetic code table below (modified from here):

The letters in black represent RNA bases (the DNA would have T instead of U), and the coloured ones are the three-letter abbreviations of amino acids, except for Stop, which, as you might have guessed, means “end of protein, stop translating”.

The first thing to note about the table is that most amino acids are encoded by more than one DNA/RNA codon. There’s already more information here than if you simply took the protein sequence. The second point is that some amino acids have two sets of codons that aren’t easily interchangeable.

With something like glycine (bottom right box), all codons are almost the same, only differing in the third letter, which might even be irrelevant anyway due to third base wobble. Mutating between glycine’s codons is easy and unlikely to screw the organism.

In contrast, serine (red and yellow boxes) has two sets of codons that differ in both their first and second positions. It’s much easier to move within either of those sets by mutation than to jump from one set to the other. Changing either of the first two letters in any of these six codons results in a different amino acid, which has a lot more potential to wreak havoc than a mutation that leaves the protein alone.
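The asymmetry between glycine-style and serine-style degeneracy is easy to verify with a few lines of code: within the TCN family, every pair of serine codons is a single wobble-position change apart, but crossing over to the AGY family takes at least two point mutations, and the intermediate codon on any shortest path encodes a different amino acid (threonine or cysteine), not serine:

```python
from itertools import product

TCN = {"TC" + b for b in "ACGT"}  # serine family 1: TCA, TCC, TCG, TCT
AGY = {"AGT", "AGC"}              # serine family 2

def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

# Minimum number of point mutations needed to jump between the two families:
min_between = min(hamming(a, b) for a, b in product(TCN, AGY))
# Within the TCN family, codons differ only at the third (wobble) position:
max_within = max(hamming(a, b) for a, b in product(TCN, TCN))

print(min_between)  # 2 -- every shortest path passes through a non-serine codon
print(max_within)   # 1
```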

And apparently, that can, from a phylogenetic point of view, practically turn serine into two different amino acids. In a fairly recent Nature paper, Regier et al. (2010) investigated arthropod relationships and found that while protein-based methods gave very similar results to DNA-based methods, they often couldn’t offer as much support for these results as DNA did. That paper hints at the serine problem and that a couple of the authors are working on it, and now the “working on” bit is out in PLoS ONE. (That’s how I came across this issue, in fact.)

The new analysis (Zwick et al., 2012) finds that tweaking protein-based evolutionary models so that the two kinds of serine count as different letters increases confidence in the resulting tree dramatically. The serines aren’t changing any major conclusions – if you take them out, you still get the same tree, just with lousy statistical support. But clearly, protein sequences alone were missing important evidence. In another situation, they might make the difference between a wrong answer and a right one.
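Here’s a minimal sketch of the recoding idea — my own toy illustration, not Zwick et al.’s actual code, and the stub codon table below only covers the codons used in the example. The point is that two coding sequences can be identical at the protein level yet carry extra phylogenetic signal once the two serine families get distinct letters:

```python
# Toy recoding: TCN-serine stays 'S', AGY-serine becomes 'Z' (arbitrary symbol).
# Only the codons needed for this example are included in the table.
CODON_TABLE = {
    "TCA": "S", "TCC": "S", "TCG": "S", "TCT": "S",  # serine, TCN family
    "AGT": "Z", "AGC": "Z",                           # serine, AGY family
    "ATG": "M", "GGT": "G", "TAA": "*",
}

def translate_recoded(dna: str) -> str:
    """Translate a coding sequence, keeping the two serine families distinct."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

# Both sequences encode Met-Ser-Gly, so as plain protein they are identical --
# but the recoding reveals they use different serine families.
print(translate_recoded("ATGTCTGGT"))  # MSG
print(translate_recoded("ATGAGTGGT"))  # MZG
</n```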

(Now I wonder if anyone’s done codon-based Hox gene phylogenies. Hox genes/proteins can be really difficult to classify because only a short region can be compared among all of them, and this short region evolves pretty slowly, yielding very few informative differences. But what if there’s more information hiding in the codons? There’s not an awful lot of serine in homeodomains, though, and while another sixfold degenerate amino acid, arginine, is pretty common in them, that one doesn’t have nearly the mutational chasm that separates the two codon clusters for serine. Meh. Maybe codons wouldn’t help at all with Hoxes.)

***

*OK, technically, you sort of have three choices, but RNA and DNA sequences contain the exact same information, so they don’t really count as different.

One of the defining characteristics of life is responding to stuff that the environment throws at it. At the level of cells, such responses are often accomplished by what we call signalling pathways. These are chains of interacting proteins that detect a stimulus (chemicals, voltage differences, pressure, light, etc.) on one end, and affect gene regulation or modify the activity of cellular components on the other end. One of the most common ways of passing a message from one protein to another is phosphorylation – an enzyme called a kinase attaches phosphate groups to another protein, changing its behaviour. Kinases that phosphorylate proteins are unsurprisingly called protein kinases. (Their families are named after their favourite amino acid to phosphorylate, so we have tyrosine kinases, histidine kinases, etc.)

There are shitloads of protein kinases. Legend has it that the acronym JAK, which officially refers to the “two-faced” Janus kinases, originally stood for “Just Another Kinase”. (I guess “Just Another Damned Kinase” didn’t abbreviate so well.) Every cell encounters many different stimuli, each of which may require a different response, and a diversity of signalling pathways can provide a more sophisticated ability to handle all conceivable circumstances. And sometimes, it’s best if such pathways keep to themselves.

Capra et al. (2012) investigate a curious property of a simple signalling pathway in bacteria. This pathway reacts to a shortage of phosphate, and consists only of the histidine kinase PhoR, and the regulatory protein it phosphorylates (PhoB). (Presumably there is still enough phosphate for the enzyme to work when the reaction kicks in…) The PhoR-PhoB pathway is found in all sorts of bacteria. In each major group, the handful of amino acids that determine the specificity of the interaction are strongly conserved. However, these “specificity residues” sometimes differ markedly between groups. Their conservation within groups suggests that changing them has dire consequences. So how and, most importantly, why were they changed anyway?

The study focused on three groups of bacteria: the alpha, beta and gamma classes of proteobacteria, which include familiar bugs like E. coli. In fact, E. coli (a gamma-proteobacterium) was one of the two main experimental species, the other one being the alpha-proteobacterium Caulobacter crescentus. The alpha bugs have an odd set of PhoR specificity residues compared to other proteobacteria, and the researchers hypothesised that this isn’t accidental. Instead, they thought, it might prevent PhoR from meddling with another signalling pathway that gamma-proteobacteria like E. coli lack.

The differences certainly aren’t without consequence. PhoR from E. coli can barely phosphorylate PhoB from alpha-proteobacteria, while it works quite happily on the same protein from other gamma-proteobacteria. It also does reasonably well on PhoB from the beta class, in accordance with the greater similarity of their specificity residues. C. crescentus PhoR only really works on PhoB from its own class.

How about that hypothesised other pathway? Well, when E. coli and C. crescentus PhoR are tested on the regulatory proteins from all similar pathways in C. crescentus, one particular molecule stands out. NtrX is a member of a pathway that has been duplicated in alpha- but not gamma-proteobacteria – and E. coli PhoR phosphorylates it! Is this duplication the reason why PhoR took a strange direction in this class of bacteria?

Multiple lines of evidence indicate that the researchers’ hunch was right. Replacing just one of the three altered specificity residues in C. crescentus PhoR to match the sequence in the other classes causes it to start interacting with NtrX at the expense of its normal function. C. crescentus with such “gamma-like” PhoR grows just as lousily in a phosphate-poor environment as C. crescentus with no PhoR at all, but only if NtrX is also present – delete the ntrX gene from the genome, and the disadvantage almost completely disappears. (NtrX isn’t disposable, though – under normal circumstances, it’s NtrX-deficient bacteria who perform badly.) A gamma-like PhoR can still interact normally with its correct target*, but it’ll simply ignore poor PhoB when NtrX is also around.
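A deliberately cartoonish way to picture the logic: treat each kinase–target pair as interacting only if enough of their specificity residues line up. The residue strings below are invented placeholders, not real PhoR/PhoB sequences, and real specificity is far subtler than a match count – but the toy reproduces the pattern in the experiments:

```python
def interacts(kinase: str, target: str, min_match: int = 2) -> bool:
    """Toy rule: a kinase 'recognises' a target if at least min_match of its
    three specificity residues match (purely illustrative threshold)."""
    return sum(k == t for k, t in zip(kinase, target)) >= min_match

# Invented residue strings standing in for the real specificity residues:
phoB_alpha = "TLV"   # PhoR's correct target in alpha-proteobacteria
ntrX_alpha = "ALD"   # the duplicated pathway's regulator

phoR_wildtype  = "TLV"  # alpha-style PhoR: matches PhoB, avoids NtrX
phoR_gammalike = "ALV"  # one residue swapped toward the gamma version

print(interacts(phoR_wildtype, phoB_alpha))   # True  -- normal signalling
print(interacts(phoR_wildtype, ntrX_alpha))   # False -- no crosstalk
print(interacts(phoR_gammalike, ntrX_alpha))  # True  -- crosstalk appears
print(interacts(phoR_gammalike, phoB_alpha))  # True  -- PhoB still recognised
```

Note that the gamma-like kinase still recognises its correct target in this sketch, echoing the observation that the mutant PhoR’s problem is distraction by NtrX, not a lost ability to work on PhoB.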

[*Which suggests to me that more than those few amino acids are involved in the PhoR-PhoB interaction, since a C. crescentus PhoR with specificity residues completely identical to those of E. coli still phosphorylates C. crescentus PhoB much better than PhoR from E. coli. However, those three do seem to be the main culprits in the NtrX mix-up.]

(In an interesting twist, it turns out that beta-proteobacteria also possess the NtrX pathway, but they tweaked NtrX instead of PhoR. The result is the same – each protein minds its own business, peace and prosperity and mad procreation ensue.)

The authors hypothesise that the Ntr pathways must have duplicated and diverged at a time when phosphate limitation didn’t come up often, given how much of a nuisance NtrX becomes to old-fashioned PhoR when phosphate is scarce. When phosphate did become a problem, the bugs were stuck with an already established NtrX pathway that they couldn’t just boot out of their genomes. Under those circumstances, any mutation getting NtrX out of PhoR’s way would have been the definition of beneficial.

Avoiding crosstalk seems to be a general feature of this kind of pathway: when you compare the specificity residues of all the signalling kinases and kinase targets from the same kind of bacterium, it’s as though they’re all doing their darnedest to be as different as possible. Capra et al. note that signalling pathways relying on a small set of amino acids to ensure specificity are very common in all life forms. They also often proliferate by gene duplication, which would make the crosstalk-avoidance issue a huge force in protein evolution. Good thing that so few mutations are needed, then – where would the complexity of the living world be if duplicated pathways all died, stuck between being redundant and screwing the organism?

Richard Lenski’s team is one of my favourite research groups in the whole world. If the long-term evolution experiment with E. coli was the only thing they ever did, they would already have earned my everlasting admiration. But they do other fascinating evolution stuff as well. In their brand new study in Science (Meyer et al., 2012), they explore the evolution of a novelty – in real time, at single nucleotide resolution.

For their experiments, they used a pair of old enemies: the common gut bacterium and standard lab microbe E. coli, and one of its viruses, the lambda phage. Phages (or bacteriophages, literally “bacterium eaters”) are viruses that infect bacteria. They are also some of mother nature’s funkiest-looking children. Below is an example, because if you haven’t seen one of them, you really should. I borrowed this electron micrograph of phage T4 from GiantMicrobes, where you can get a cute plushie version 😛

Phages work by latching onto specific proteins in the cell membrane of the bacterium, and literally injecting their DNA into the cell, where it can start wreaking havoc and making more viruses. Meyer et al.‘s phage strain was specialised to use an E. coli protein called LamB for attachment.

The team took E. coli which (mostly) couldn’t produce LamB because one of the lamB gene’s regulators had been knocked out. Their virus normally couldn’t infect these bacteria, but a few of the bacteria managed to switch lamB on anyway, so the viruses could vegetate along in their cultures at low numbers. Perfect setup for adaptation!

Meyer and colleagues performed a lot of experiments, and I don’t want to go into too much detail about them (hey, is that me trying not to be verbose???). Here are some of their intriguing results:

First, the phages adapted to their LamB-deficient hosts. They did so very “quickly” in terms of what we usually think of as evolutionary time scales (naturally, “evolutionary time scales” mean something different for organisms with life cycles measurable in minutes). Mutations in the gene coding for their J protein (the one they use to attach to LamB) enabled them to use another bacterial protein instead. Not all experimental populations evolved this ability, but those that did succeeded in less than 2 weeks on average.

The new protein target, OmpF, is quite similar to LamB, which might explain how the viruses evolved the ability to use it so quickly. But more interesting than the speed is the how of their innovation. Amazingly, all OmpF-compatible viruses shared two specific mutations. Another mutation always occurred in the same codon, that is, it affected the same amino acid in the J protein. A fourth mutation invariably occurred in a short region near the other three. Altogether, these four mutations allowed the virus to use OmpF. Plainly, we are dealing with more than mere convergent evolution here. Often, many different mutations can achieve the same thing (see e.g. Eizirik et al., 2003), but in this case, a very specific set of them appeared necessary. I’ll briefly revisit this point later, but first we have another fascinating result to discuss!

By comparing dozens of viruses that did and didn’t evolve OmpF compatibility, the researchers determined that all four mutations were necessary for the new ability. Three were not enough; there were many viral strains with three of the four mutations that couldn’t do anything with LamB-deficient bacteria. On the surface, this sounds almost like something Michael Behe would say (see Behe and Snokes, 2004), except the requirement for more than one mutation clearly didn’t prevent innovation here. Given the distribution of J mutations, it’s also likely that they were shaped by natural selection, even in virus populations that didn’t evolve OmpF-compatibility. So what did the first three mutations do? What use was, as it were, half a new J protein?
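The “all four or nothing” result can be stated as a toy rule. The mutation labels below are placeholders – the paper reports specific nucleotide positions in the J gene, which I’m not reproducing here – but the sketch captures why three mutations out of four get a virus nowhere on LamB-deficient hosts:

```python
from itertools import combinations

# Placeholder labels for the four required J-gene mutations.
REQUIRED = frozenset({"m1", "m2", "m3", "m4"})

def ompf_compatible(mutations: frozenset) -> bool:
    """Toy rule: OmpF use needs the complete set of four mutations."""
    return REQUIRED <= mutations

full_set_works = ompf_compatible(REQUIRED)
any_trio_works = any(ompf_compatible(frozenset(t))
                     for t in combinations(REQUIRED, 3))

print(full_set_works)  # True  -- all four together confer the new ability
print(any_trio_works)  # False -- no three-mutation combination is enough
```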

The answer would delight the late Stephen Jay Gould: the new function was a blatant example of exaptation. Exaptations are traits that originally had one function, but were later co-opted for another. While three mutations predisposed the J gene to OmpF-compatibility, they also improved its ability to bind its original target. Thus, there was a selective advantage right from the first mutation. And, in essence, this is what we see over and over again when we look at novelties. Fish walk underwater, non-flying dinosaurs cover their eggs with feathered arms, and none of them have the first clue that their properties would become success stories for completely different reasons.

In the paper, there is a bit of discussion on co-evolution and how certain mutations in the bacteria influenced the viruses’ ability to adapt to OmpF, but I’d like to go back to the convergence/necessity point instead. I have a few half-formed thoughts here, so don’t expect me to be coherent 😉

We’ve seen cases where the same outcome stems from different causes, like in the cat colour paper cited above. Then there is this new function in a virus that seems to always come from (almost) the same set of mutations. Why? I’m thinking it has to do with (1) the complexity of the system, and (2) the type of outcome needed.

Proteins interact with other proteins through very specific interfaces. Sometimes, these interactions can depend on as little as a single amino acid in one of the partners. If you want to change something like that, there is simply little choice in what you can do without screwing everything up. On the other hand, something like coat colour in mammals is controlled by a whole battery of genes, each of which may be amenable to many useful modifications. And when it comes to even more complex traits like flying (qv. aside discussing convergence and vertebrate flight/gliding in the mutations post), the possibilities are almost limitless.

So there’s that, and there is also what you “want” with a trait. There may be more ways to break a gene (e.g. to lose pigmentation) than to increase its activity. When the selectively advantageous outcome is something as specific as a particular protein-protein interaction, the options may be more restricted again. (To top that, the virus has to stick to the bacterium with a very specific part of its structure, or the whole “inject DNA” bit goes the wrong way.) Now that I read what I wrote, it sounds like there will be very few “universal laws” of evolutionary novelty (exaptation being one of them?). Hmm…