An evo-devo geek's scientific meanderings

natural selection

I take all my hats off to Richard Lenski and his team. If you’ve never heard of them, they are the group that has been running an evolution experiment with E. coli bacteria non-stop for the last 25 years. That’s over 50 000 generations of the little creatures; in human generations, that translates to ~1.5 million years. This experiment has to be one of the most amazing things that ever happened in evolutionary biology.

(Below: photograph of flasks containing the twelve experimental populations on 25 June 2008. The flask labelled A-3 is cloudier than the others: this is a very special population. Photo by Brian Baer and Neerja Hajela, via Wikimedia Commons.)

It doesn’t necessarily take many generations to see some mind-blowing things in evolution. An irreducibly complex new protein interaction (Meyer et al., 2012), the beginnings of new species and a simple form of multicellularity (Boraas et al., 1998) are only a few examples. However, a few generations only show tiny snapshots of the evolutionary process. Letting a population evolve for thousands of generations allows you to directly witness processes that you’d normally have to glean from the fossil record or from studies of their end products.

Fifty thousand generations, for example, can tell you that they aren’t nearly enough time to reach the limit of adaptation. The newest fruit of the Long-Term Evolution Experiment is a short paper examining the improvement in fitness the bacteria experienced over the experiment’s 25 years (Wiser et al., 2013). “Fitness” is measured here as growth rate relative to the ancestral strain; the faster the bacteria are able to grow in the environment of the LTEE (which has a limited amount of glucose, E. coli’s favourite food), the fitter they are. The LTEE follows twelve populations, all founded from the same ancestor, evolving in parallel, so it can also determine whether something that happens to one population is a chance occurrence or a general feature of evolution.

You can draw up a plot of fitness over time for one or more populations, and then fit mathematical models to this plot. Earlier in the experiment, the group found that a simple model in which adaptation slows down over time and eventually grinds to a halt fits the data well. However, that isn’t the only promising model. Another one predicts that adaptation only slows, never stops. Now, the experiment has been running long enough to distinguish between the two, and the second one wins hands down. Thus far, even though they’ve had plenty of time to adapt to their unchanging environment, the Lenski group’s E. coli just keep getting better at living there.
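Distinguishing the two models is, at heart, a curve-fitting exercise, and it’s easy to sketch in miniature. Below is a toy Python version (my own illustration: the parameters, the synthetic data and the crude grid-search fit are all made up, and none of it is Wiser et al.’s actual analysis). It generates a fitness trajectory from a power law that only ever slows, then fits both a saturating (hyperbolic) model and a power-law model and compares the errors.

```python
# Synthetic fitness trajectory generated from a power law; parameter values
# are illustrative, NOT Wiser et al.'s fitted ones.
A_TRUE, B_TRUE = 0.1, 0.005
times = [i * 2500 for i in range(21)]                  # generations 0..50,000
fitness = [(1 + B_TRUE * t) ** A_TRUE for t in times]

def hyperbolic(t, a, b):
    # saturating model: fitness approaches 1 + a as t grows
    return 1 + a * t / (t + b)

def power_law(t, a, b):
    # unbounded model: fitness keeps rising, ever more slowly
    return (1 + b * t) ** a

def sse(model, a, b):
    # sum of squared errors of a model against the "observed" trajectory
    return sum((model(t, a, b) - w) ** 2 for t, w in zip(times, fitness))

def frange(lo, hi, n):
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

def grid_fit(model, a_grid, b_grid):
    # crude least-squares fit by exhaustive grid search
    return min(sse(model, a, b) for a in a_grid for b in b_grid)

hyp_sse = grid_fit(hyperbolic, frange(0.1, 2.0, 40), frange(100, 20000, 40))
pow_sse = grid_fit(power_law, frange(0.01, 0.5, 50), frange(0.0005, 0.02, 50))
print(f"best hyperbolic SSE: {hyp_sse:.4f}")
print(f"best power-law  SSE: {pow_sse:.6f}")
```

On data that really do follow a power law, the power-law fit wins by a wide margin; making that comparison on the real trajectories is essentially what settled the question.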

Although the simple mathematical function that describes the behaviour of these populations doesn’t really explain what’s happening behind the scenes, the team was also able to reproduce the same behaviour by building a model from known evolutionary phenomena. For example, they incorporated the idea that two bacteria carrying different beneficial mutations in the same flask are going to compete and slow down overall adaptation. (This is a problem for asexual organisms. If the creatures were, say, animals, they might have sex and spread both mutations at the same time.) So the winning model doesn’t just describe the data well, it also follows from sensible theory. The theory-based model even accounted for the observation that populations which evolved higher mutation rates adapted faster.

Now, one of the first things you learn about interpreting models is that extrapolating beyond your data is dangerous. Trends can’t go on forever. In this case, you’d eventually end up with bacteria that reproduced infinitely fast, which is clearly ridiculous. However, Wiser et al. suggest that the point where their trend gets ridiculous is very, very far in the future. “The 50,000 generations studied here occurred in one scientist’s laboratory in ~21 years,” they remind us, then continue: “Now imagine that the experiment continues for 50,000 generations of scientists, each overseeing 50,000 bacterial generations, for 2.5 billion generations total.”

If the current trend continues unchanged, they estimate that the bugs at that faraway time point will be able to divide roughly every 23 minutes, compared to 55 minutes for the ancestral strain. That is still a totally realistic growth rate for a happy bacterium!
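The arithmetic behind such an estimate is simple once you have a fitted curve: relative fitness here is a growth-rate ratio, so the implied doubling time is just the ancestral doubling time divided by fitness. A back-of-envelope sketch (the power-law parameters below are made up for illustration, not the paper’s fitted values):

```python
# Made-up power-law parameters, NOT the paper's fitted values; the point is
# the conversion: relative fitness is a growth-rate ratio, so doubling time
# scales inversely with it.
ANCESTRAL_DOUBLING_MIN = 55.0
A, B = 0.06, 0.005   # illustrative exponent and rate constant

def fitness(generations):
    return (1 + B * generations) ** A

def doubling_time(generations):
    return ANCESTRAL_DOUBLING_MIN / fitness(generations)

for g in (5e4, 2.5e9):
    print(f"after {g:.1e} generations: one division every ~{doubling_time(g):.0f} min")
```

Even at 2.5 billion generations, a curve this flat implies a division time that a real bacterium could plausibly achieve, which is the paper’s qualitative point.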

I know none of us will live to see it, but I really want to know what would happen to these little guys in 2.5 billion generations…

Technically, the first paper isn’t about new new genes: Assis and Bachtrog (2013) examined recently duplicated genes in fruit flies. But screw technicalities, what they’re saying makes my eyes pop.

When a gene is accidentally copied, a variety of possible fates can await it. Most of the time, the extra copy just dies. Some mechanisms of gene duplication just take the gene without the regulatory elements it needs to function properly. Even if the new copy works, it’s still redundant, so there’s nothing stopping mutations from destroying it over time. However, sometimes redundancy is removed before the new gene breaks irrevocably, and both copies are kept. This can, in theory, happen in a number of ways. Because I’m feeling lazy, let me just quote them from the paper (square brackets are mine, because I hate repeatedly typing out long ugly words :)):

Four processes can result in the evolutionary preservation of duplicate genes: conservation, neofunctionalization, subfunctionalization, and specialization. Under conservation, ancestral functions are maintained in both copies, likely because increased gene dosage is beneficial (1). Under neofunctionalization [NF], one copy retains its ancestral functions, and the other acquires a novel function (1). Under subfunctionalization [SF], mutations damage different functions of each copy, such that both copies are required to preserve all ancestral gene functions (9, 10). Finally, under specialization, subfunctionalization and neofunctionalization act in concert, producing two copies that are functionally distinct from each other and from the ancestral gene (11).

We might add a variation on NF, too: Proulx and Phillips (2006) theorised that differences in function that arise in different alleles (variants) of a single gene can turn duplication itself into an advantage, flipping the conventional duplication-first, new-function-next scenario on its head.

Either way, genomes contain lots of duplicated genes; there’s no question about that. What isn’t nearly as well understood is the relative importance of the various mechanisms in producing all these duplicates. It’s much easier to theorise about mechanisms than to test the theories. Since evolution doesn’t stop once a new gene has earned its place in the genome, it can be hard to disentangle the mechanism(s) responsible for its preservation from the stuff that happened to it later. Also, to really assess the relative role of different mechanisms, you’ve got to look at whole genomes.

(Assis and Bachtrog say that this hasn’t been done before, and then go right on to cite He and Zhang [2005], which is a genome-wide study of SF and NF. I guess it doesn’t look at all the mechanisms…)

Assis and Bachtrog used the amazing resource that is the 12 Drosophila genomes project, focusing on D. melanogaster and D. pseudoobscura to find slightly under 300 pairs of genes that duplicated after the divergence of those two species. Since Drosophila genomes are very well-studied, they were able to identify the “parent” and “child” in each pair based on where they sit on their chromosomes. They then also extracted thousands of unduplicated genes from the melanogaster and pseudoobscura genomes, to use as a measure of background divergence between the two species.

To measure changes in gene function, they compared the expression of parent and child genes to each other and to the “ancestral” copy (i.e. the unduplicated gene in the other species) in different parts of the body (if a gene is suddenly turned on somewhere it wasn’t before, it’s probably doing something new!).
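The logic of that comparison can be sketched in a few lines. Here is a toy version, emphatically not Assis and Bachtrog’s actual pipeline: the cutoff, the distance measure and the expression numbers are all invented, and a real subfunctionalization call would additionally require showing that two diverged copies lost complementary ancestral functions.

```python
import math

# Toy classifier of duplicate-gene fates. Each profile lists expression
# across, say, five body parts; a copy counts as "diverged" when its distance
# from the unduplicated ancestral gene exceeds a cutoff that, in real work,
# would be set from the background divergence of unduplicated genes.
CUTOFF = 1.0  # invented threshold

def classify(parent, child, ancestor):
    p_div = math.dist(parent, ancestor) > CUTOFF
    c_div = math.dist(child, ancestor) > CUTOFF
    if not p_div and not c_div:
        return "conservation"
    if p_div and c_div:
        # telling specialization from SF would need function-level checks
        return "both diverged"
    return "neofunctionalization of " + ("child" if c_div else "parent")

ancestor = [5.0, 0.1, 3.0, 2.0, 0.0]   # made-up expression levels
parent   = [4.8, 0.2, 3.1, 1.9, 0.1]   # stays close to the ancestral state
child    = [0.5, 6.0, 0.2, 0.1, 4.0]   # e.g. switched on somewhere new
print(classify(parent, child, ancestor))  # -> neofunctionalization of child
```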

Long story short, it turned out that in the majority of cases (167/281), the child copy behaved far more differently from the “ancestor” than expected, while the parent copy stayed pretty close. These child copies also showed faster sequence evolution than their parents. This means that NF – and specifically that of the new copy – is the most common fate of newly duplicated genes in these animals. There’s also a fair number of gene pairs where both copies gained new functions or both stuck with the old ones, but only three where both copies lost functions. Pure SF, which very influential studies like Force et al. (1999) championed as the dominant mode of duplicate gene survival, appears to be an incredibly rare occurrence in fruit flies!

A few paragraphs ago I mentioned the caveat that duplicated genes don’t stop evolving just because they’ve managed to survive. Well, the advantage of having all these Drosophila genomes is that you can further break down “young” duplicates into narrower age groups, using the species that fall between melanogaster and pseudoobscura on the tree. However, looking at this breakdown doesn’t change the general pattern – NF of the child copy is the most common and SF is rare or nonexistent in even the youngest age groups, along both the melanogaster and the pseudoobscura lineages.

So what exactly is going on here?

Part of the difference in expression patterns between parent/ancestral and child copies is because these new genes are turned on in the testicles, which might give us a big clue. Testicles, you see, are a bit anarchical. Things that are normally kept silent in the genome, like various kinds of parasitic DNA, wake up and run wild during the making of sperm. If you remember my throwaway reference to duplication mechanisms that cut the gene off from its old regulatory elements – well, the balls are a place where even such lost and lonely genes get a second chance.

The genomic anarchy of testes is also one of the reasons these duplications happen in the first place; the aforementioned mechanism involves those bits of parasitic DNA that copy and paste themselves via an RNA intermediate. The enzymes they use to reverse transcribe this RNA into DNA and insert it back into the genome aren’t particularly discerning, and they’ll happily do their thing on a piece of RNA that isn’t the parasite. Indeed, slightly more NFed child genes than you’d expect originated via RNA, although it’s worth noting that more than half of them still didn’t. So while the testes look like a good place for new gene copies to find a use, they aren’t totally responsible for their origins.

Why is there so little SF among these genes?

This is the Obvious Question; my jaw nearly landed on my desk when I saw the numbers. The authors have two hypotheses, both of which may be true at the same time.

First, SF assumes that the two copies have the same functions to begin with. This is not necessarily true when just a small segment of DNA is duplicated – even when it’s not just a bare gene you’re copying, the new copy might lose part of its old regulatory elements and/or land next to new ones, not to mention Proulx and Phillips’s idea of new functions appearing before duplication. So maybe SF is more common after wholesale duplications of entire genomes, and Drosophila species didn’t have any of those recently.

Secondly, SF happens by genetic drift, which is a random process that works much better in small populations. Fruit flies aren’t known for their small populations, and therefore the dominant evolutionary force acting on their genomes will be selection.

This makes sense to me, but the degree to which NF dominates the picture is still pretty amazing. I wonder what you’d get if you applied the same methods to different species. Would species with smaller populations, or those that recently duplicated their whole genomes, show more evidence for SF as you’d expect if the above reasoning is correct? Or would the data slaughter all those seemingly reasonable explanations? What would you see in parthenogenetic species that have no males (and testicles)?

In a way, the limitations of evolution are more interesting to me than its possibilities. It’s cool to figure out how exquisite adaptations and fantastically complex molecular machines might have evolved, but I like my evolution the way Brandon Sanderson likes his magic. If it can do anything, then where’s the fun? Deep underlying rules and constraints are what make it really interesting.

Convergent evolution can hint at such rules. Some of them are just physics and seem pretty straightforward. If you’re a creature swimming in the sea, being streamlined is good for you, and there aren’t that many ways of being streamlined. So dolphins, squid and sharks have the same basic shape despite coming from very different ancestors. Other cases involve more subtle and probably more interesting constraints. The baggage of your ancestry, the interactions in your genome, the pool of available mutations, can all restrict the ways in which you can adapt to a particular challenge. A study I found in the huge backlog of random pdfs on my desktop probes tentatively into the importance of such intrinsic limitations.

Conte et al. (2012) asked a seemingly simple question that has apparently never been systematically investigated before: how often does convergent or parallel evolution of the same trait result from modification of the same genes?

Convergent and parallel evolution are sort of two ends of a continuum. We use parallel evolution to refer to traits that evolved in similar directions from the same starting point. For example, three-spine sticklebacks repeatedly lost their bony armour when they moved from the sea to rivers and lakes in various places around the world. The ancestor is the same heavily armoured marine fish in each case, and most freshwater populations underwent very similar changes (including their genetic basis) from this common beginning. At the other end of the scale you find clear instances of convergence, such as “milk” in mammals and birds. Their common ancestors not only didn’t ooze custom-made immune-boosting baby food, they likely didn’t even care for their young.

Back to the paper. Conte et al. conducted what we call a meta-analysis: collecting and analysing data from all published studies that fit their pre-determined set of criteria. Altogether, they looked at a carefully selected set of 25 studies about the genetic basis of convergent traits. Not a huge sample, the authors acknowledge, but it’s a start.

The studies were divided into two sets, because the two main methods of looking at the genetic basis of a trait can’t easily be analysed together. The first set contained genetic mapping studies (“which parts of the genome cause X?”), and the second candidate gene studies (“does this gene cause X?”). The convergent traits in these studies were quite diverse. There was pale skin from cave fish to humans, African and European peoples’ ability to digest lactose as adults, resistance to tetrodotoxin in snakes, wing patterns in butterflies, electric organs in fish…

The comparisons span quite a long time scale. On one end, there are populations within a single species, like lactose-tolerant Europeans and Arabs, that diverged mere tens of thousands of years ago. On the other, pale-skinned cave fish and Swedes are separated by something on the order of 400 million years. This is part of what makes this an exciting study, because you can indirectly observe what happens to genetic constraints over time.

The most exciting result, though, is the sheer amount of gene re-use the researchers saw. For mapping studies, they found a 32% chance that the same trait will be associated with the same gene(s) in different species. Candidate genes give an even higher estimate (55%), but that might just be the nature of the beast. When a candidate gene is not behaving as expected, it’s probably less interesting and publishable, Conte et al. argue, whereas mapping studies will usually throw up something to write about.*

Within a species, the probability of the same gene being used in the same adaptation gets as high as 80% for both methods. This is despite the fact that often the traits in question are controlled by several genes, any of which could be mutated to the same effect. Where you come from clearly has a huge impact on where (and how) you can go. The impact lessens as you look at increasingly distant species; at a hundred million years of divergence, mapping data show only 10% similarity between convergent traits, and even candidate genes drop to around 40%. (Methinks 10% is still a big number considering how many genes we have, but of course we’re talking about relatively simple traits here, so the number of relevant genes isn’t nearly as high.)

There are some logical possible explanations behind both the high level of genetic convergence in close relatives and the big drop with increasing divergence. For example, it could be that populations within a species have very similar pools of genetic variation. If the same genes vary, then natural selection will “naturally” hit on the same genes when adaptation becomes handy. It’s also likely that the rest of the genome plays a part – closely related populations/species have more similar genetic backgrounds, their genes likely interact with one another in more similar ways, ergo the restrictions on what mutations can become beneficial are also similar. As lineages diverge, so do such interactions and restrictions, lowering the probability that two species evolve the same trait in the same way.
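The standing-variation idea is easy to turn into a toy model (my own illustration, not from the paper): suppose any of K genes could supply the trait, and with probability s the two lineages still segregate the same ancestral variant. The chance of reuse is then roughly s + (1 − s)/K, which falls as shared variation decays with divergence time.

```python
import random

# Toy model of the shared-standing-variation explanation for gene reuse:
# with probability s the two lineages share the same ancestral variant (and
# so reuse the same gene); otherwise each picks one of K possible genes
# independently. All numbers are invented for illustration.
random.seed(1)
K = 10  # number of genes that could produce the trait

def reuse_probability(s, trials=100_000):
    hits = 0
    for _ in range(trials):
        if random.random() < s:
            hits += 1                                  # shared variant reused
        else:
            hits += random.randrange(K) == random.randrange(K)
    return hits / trials

for s in (0.8, 0.1):   # close relatives vs. distant ones
    print(f"shared-variation probability {s}: gene reuse ~{reuse_probability(s):.2f}")
```

Note that reuse stays above chance (1/K) even when almost no variation is shared – loosely echoing how the observed reuse drops with divergence but doesn’t vanish.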

Of course, it’s at this point impossible to say which of the potential reasons actually cause the trends observed in this study, but that wasn’t the point. The authors’ stated goals were pretty modest:

“[O]ur aim here has been to stimulate thinking about these issues and to move towards a quantitative understanding of repeated genetic evolution” (p5044)

In that, I hope, they have succeeded. It’d be lovely to see more of this “big picture” discussion of convergent evolution. Big pictures make Mammals happy.

***

*I’m not sure about that, myself. I think if you’ve got a gene that’s been shown to do X in species after species, a negative finding is a lot more newsworthy than yet another confirmation of the same old shit. I suppose it’s gut feeling versus intuition until someone does a study of that, though 🙂

***

Reference:

Conte GL et al. (2012) The probability of genetic parallelism and convergence in natural populations. Proceedings of the Royal Society B 279:5039-5047

Richard Lenski’s team is one of my favourite research groups in the whole world. If the long-term evolution experiment with E. coli was the only thing they ever did, they would already have earned my everlasting admiration. But they do other fascinating evolution stuff as well. In their brand new study in Science (Meyer et al., 2012), they explore the evolution of a novelty – in real time, at single nucleotide resolution.

For their experiments, they used a pair of old enemies: the common gut bacterium and standard lab microbe E. coli, and one of its viruses, the lambda phage. Phages (or bacteriophages, literally “bacterium eaters”) are viruses that infect bacteria. They are also some of mother nature’s funkiest-looking children. Below is an example, because if you haven’t seen one of them, you really should. I borrowed this electron micrograph of phage T4 from GiantMicrobes, where you can get a cute plushie version 😛

Phages work by latching onto specific proteins in the cell membrane of the bacterium, and literally injecting their DNA into the cell, where it can start wreaking havoc and making more viruses. Meyer et al.’s phage strain was specialised to use an E. coli protein called LamB for attachment.

The team took E. coli which (mostly) couldn’t produce LamB because one of the lamB gene’s regulators had been knocked out. Their virus normally couldn’t infect these bacteria, but a few of the bacteria managed to switch lamB on anyway, so the viruses could vegetate along in their cultures at low numbers. Perfect setup for adaptation!

Meyer and colleagues performed a lot of experiments, and I don’t want to go into too much detail about them (hey, is that me trying not to be verbose???). Here are some of their intriguing results:

First, the phages adapted to their LamB-deficient hosts. They did so very “quickly” in terms of what we usually think of as evolutionary time scales (naturally, “evolutionary time scales” mean something different for organisms with life cycles measurable in minutes). Mutations in the gene coding for their J protein (the one they use to attach to LamB) enabled them to use another bacterial protein instead. Not all experimental populations evolved this ability, but those that did succeeded in less than 2 weeks on average.

The new protein target, OmpF, is quite similar to LamB, which might explain how the viruses evolved the ability to use it so quickly. But more interesting than the speed is the how of their innovation. Amazingly, all OmpF-compatible viruses shared two specific mutations. Another mutation always occurred in the same codon, that is, it affected the same amino acid in the J protein. A fourth mutation invariably occurred in a short region near the other three. Altogether, these four mutations allowed the virus to use OmpF. Plainly, we are dealing with more than mere convergent evolution here. Often, many different mutations can achieve the same thing (see e.g. Eizirik et al., 2003), but in this case, a very specific set of them appeared necessary. I’ll briefly revisit this point later, but first we have another fascinating result to discuss!

By comparing dozens of viruses that did and didn’t evolve OmpF compatibility, the researchers determined that all four mutations were necessary for the new ability. Three were not enough; there were many viral strains with three of the four mutations that couldn’t do anything with LamB-deficient bacteria. On the surface, this sounds almost like something Michael Behe would say (see Behe and Snokes, 2004), except the requirement for more than one mutation clearly didn’t prevent innovation here. Given the distribution of J mutations, it’s also likely that they were shaped by natural selection, even in virus populations that didn’t evolve OmpF-compatibility. So what did the first three mutations do? What use was, as it were, half a new J protein?

The answer would delight the late Stephen Jay Gould: the new function was a blatant example of exaptation. Exaptations are traits that originally had one function, but were later co-opted for another. While three mutations predisposed the J gene to OmpF-compatibility, they also improved its ability to bind its original target. Thus, there was a selective advantage right from the first mutation. And, in essence, this is what we see over and over again when we look at novelties. Fish walk underwater, non-flying dinosaurs cover their eggs with feathered arms, and none of them has the first clue that these traits will become success stories for completely different reasons.

In the paper, there is a bit of discussion on co-evolution and how certain mutations in the bacteria influenced the viruses’ ability to adapt to OmpF, but I’d like to go back to the convergence/necessity point instead. I have a few half-formed thoughts here, so don’t expect me to be coherent 😉

We’ve seen cases where the same outcome stems from different causes, like in the cat colour paper cited above. Then there is this new function in a virus that seems to always come from (almost) the same set of mutations. Why? I’m thinking it has to do with (1) the complexity of the system, (2) the type of outcome needed.

Proteins interact with other proteins through very specific interfaces. Sometimes, these interactions can depend on as little as a single amino acid in one of the partners. If you want to change something like that, there is simply little choice in what you can do without screwing everything up. On the other hand, something like coat colour in mammals is controlled by a whole battery of genes, each of which may be amenable to many useful modifications. And when it comes to even more complex traits like flying (qv. aside discussing convergence and vertebrate flight/gliding in the mutations post), the possibilities are almost limitless.

So there’s that, and there is also what you “want” with a trait. There may be more ways to break a gene (e.g. to lose pigmentation) than to increase its activity. When the selectively advantageous outcome is something as specific as a particular protein-protein interaction, the options may be more restricted again. (To top that, the virus has to stick to the bacterium with a very specific part of its structure, or the whole “inject DNA” bit goes the wrong way.) Now that I read what I wrote, it sounds like there will be very few “universal laws” of evolutionary novelty (exaptation being one of them?). Hmm…

Evolution depends on variation, and variation depends on mutations. The evolution of new features, in particular, wouldn’t be possible without new mutations. Thus, mutation is of great interest to evolutionary biologists. More specifically, how mutations affect an organism’s fitness has been discussed and debated ever since the concept of mutations entered evolutionary theory. Relatively speaking, how many mutations are harmful, beneficial, or neither? What kinds of mutations are likely to be each in which parts of the genome? It’s hard to get a confident picture on such questions, partly because there are so many possible mutations in any given gene, let alone genome, and partly because fitness isn’t always easy to measure (see Eyre-Walker and Keightley [2007] for a review).

Hietpas et al. (2011) did something really cool that hasn’t been done before: they took a small piece of an important gene, and examined the fitness consequences of every possible mutation in that sequence. This approach is limited in its own way, of course. Due to the sheer number of possibilities, it’s only feasible for short sequences, which might make it hard to generalise any results. But the unique window it opens on the relationship of a gene’s sequence and its owner’s success is invaluable.

What did they do?

Let’s examine the method in a bit more detail, mainly to understand what “every possible mutation” means in this context; because it’s a little more complicated than it sounds.

The bit of DNA they chose codes for a 9-amino acid region of heat shock protein 90 (Hsp90) in brewer’s yeast. So it really is small, only 27 base pairs altogether (recall that in the genetic code, 3 base pairs [1 codon] translate to 1 amino acid). Hsp90 is a very important protein found all over the tree of life. It’s a so-called chaperone, a protein that helps other proteins fold correctly, and in eukaryotes it’s absolutely required for survival.

The team generated mutant versions of the Hsp90 gene, each of which differed from the “wild type” version in one codon out of these nine. So each “mutation” examined could actually be anywhere between one and three mutations. They generated all possible mutants like that, amounting to over 500 different sequences.

[NOTE: If you check back at the genetic code, you’ll note that most amino acids are encoded by more than one codon, so not all of the resulting proteins differed from one another. Mutations that don’t change the amino acid are called synonymous. This will become important later.]

Then came the measurement of fitness. The researchers took a strain of yeast whose own Hsp90 gene was engineered not to work at high temperatures, and transformed the cells with small circles of DNA called plasmids, each carrying either a wild type (temperature-insensitive) Hsp90 gene or one of the 500+ mutants. They then grew all the cells together in a common culture. After a while, they raised the growing temperature so that the plasmid-borne genes determined the cells’ survival.

They took samples every few hours – wild type yeast populations doubled every 4 hours – and did something that would not have been possible even a few years ago: sequenced the region of interest from this mixed culture, and compared the abundance of different sequence variants. By counting how many times each mutant was sequenced at each time point, they got a very good estimate of their relative abundances. The way each mutant prospered or declined relative to others over time gave a measurement of their fitness.
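The estimator behind that last step can be sketched simply. This is a generic version with made-up read counts, assumed rather than taken from Hietpas et al.’s exact pipeline: the log-ratio of a mutant’s count to the wild type’s should change roughly linearly with time, and its slope per generation is a selection coefficient.

```python
import math

# Generic log-linear fitness estimator on invented sequencing counts: the
# slope of ln(mutant/wild-type) against generations measures selection.
SAMPLING_HOURS = [0, 4, 8, 12, 16]
HOURS_PER_GENERATION = 4.0          # wild type doubled every ~4 h

def selection_coefficient(mut_counts, wt_counts):
    gens = [h / HOURS_PER_GENERATION for h in SAMPLING_HOURS]
    logratios = [math.log(m / w) for m, w in zip(mut_counts, wt_counts)]
    n = len(gens)
    mean_g = sum(gens) / n
    mean_r = sum(logratios) / n
    # ordinary least-squares slope of log-ratio vs. generations
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(gens, logratios))
    var = sum((g - mean_g) ** 2 for g in gens)
    return cov / var

wt      = [1000, 1000, 1000, 1000, 1000]   # invented read counts
neutral = [500, 498, 505, 497, 501]
harmful = [500, 350, 240, 170, 120]
print(f"neutral-ish mutant: s = {selection_coefficient(neutral, wt):+.3f}")
print(f"harmful mutant:     s = {selection_coefficient(harmful, wt):+.3f}")
```

A mutant whose frequency merely wobbles gets a coefficient near zero, while one that steadily loses ground comes out clearly negative – the two broad deleterious classes the study describes fall out of exactly this kind of number.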

What did they find?

There are so many interesting things in this study that I’m not sure where to begin. Let’s start with the result that concerns the first question posed in my introductory paragraph. How are the mutations distributed along the deleterious – beneficial axis?

Perhaps not surprisingly, most non-synonymous mutations were harmful to fitness. I say not surprisingly because this protein has been honed by selection for many, many millions of years. It is probably close to the best it can be, although the researchers tried to pick a region that contained variable as well as highly conserved amino acids.

[ASIDE: They didn’t really succeed in that – among the 400+ species they say they used for comparison, 4 of 9 positions don’t vary at all, 2 are identical in almost all species, another 2 can have two amino acids with roughly equal chance, and only one can hold three different amino acids. I’ve seen more variation in supposedly highly conserved sequences over smaller phylogenetic distances. Perhaps Hsp90 is just that conserved everywhere.]

There were a few mildly beneficial mutations, but no highly beneficial ones. Deleterious mutations could be divided into two large groups, with very few in between: mostly they were either very harmful or close to neutral. This constitutes support for the nearly neutral theory of molecular evolution, but as I said, the sequence they examined is hardly representative of all sequences under all circumstances. It would be interesting to see how (if) the distribution changes in sequences under directional selection, or sequences that don’t experience much selection at all. I’m kind of hoping that that’s their next project 😛

The second interesting observation – interesting to me, anyway – is that nonsense mutations, those that introduce an early stop codon in the sequence, were not as unfit as complete deletions of the gene. A stop codon means the end of the protein – an early stop codon eliminates everything that comes after it. Cells making a truncated protein were lousy at survival, but not quite as lousy as cells with no Hsp90 at all. This is a bit strange, given that earlier the paper states that a region of Hsp90 that comes after their 9 amino acids is necessary for its function. A nonsense mutation in the test region removes that supposedly necessary part, so why did those cells do any better than mutants lacking the gene entirely?

Looking at synonymous mutations, the team determined that these don’t affect fitness much. This has practical importance, because synonymous mutations have long been used as a “baseline” to detect signs of selection in other mutations. If they weren’t neutral, the central assumption of that approach would fall down.

Another question the study asked was whether certain positions in the protein require amino acids of a certain type. The twenty amino acids found in proteins can be loosely grouped according to their physical and chemical properties. For example, some of them are positively charged, while others carry no charge at all; some are (relatively speaking) huge and some are tiny. These properties determine how a protein folds and what its different regions can do, so one would expect that in important positions, only amino acids similar in size and chemistry could work.

To find all the amino acids that worked equally well in a given position, Hietpas et al. looked at a subset of amino acid changes: those whose fitness was very close to the wild type. Surprisingly, they found that several positions tolerated radically different amino acids without losing much fitness. Quoting from the paper,

“[t]his type of physical plasticity illustrates the degenerate relationship between physics and biology: Biology is governed by physical interactions, but biological requirements can have multiple physical solutions.”

This is kind of stating the obvious in this context, but it does echo a more general observation about life. In evolution, there is often more than one way to skin a cat.

[ASIDE: Analogous enzymes provide a striking demonstration of that. These are pairs – or even groups – of enzymes that catalyse the same reaction, without bearing any physical resemblance to one another. Their sequences are different, their 3D structures are different, and their catalytic mechanisms are different, yet they do essentially the same thing. But there are also more familiar, if less extreme, examples. For instance, within vertebrates only, we see three different solutions for powered flight and even more variations on gliding (here are some of them).]

The researchers built a “fit amino acid profile” of their test sequence using these “wild type-like” mutations, then compared it to the actual pattern of amino acid substitutions observed in “real” Hsp90 proteins. It turns out the two are quite different: eight out of the nine positions are conspicuously less variable in real life than the fitness profile would predict. The paper lists a few possible explanations. Lab environments are not natural environments, and amino acids that work fine in their very controlled environment may not be so great under harsher or less stable real-world conditions. Wild type-like fitness does not mean the substitution is completely neutral – many of them are slightly deleterious, which may come out more strongly under natural circumstances, especially over the long term. And one of the substitutions would require more than one mutation at the DNA level – with strongly deleterious intermediate steps.

That last point leads me to the part of the study I personally found most interesting. Thus far, we’ve taken the genetic code as a given, and hardly paid any attention to it at all. But, in fact, the genetic code itself is a product of evolution. Most likely, it didn’t spring into existence fully formed when organisms invented protein synthesis. There is a mind-blowingly large number of possible genetic codes – so why do organisms use this particular one, with only minor variations? We won’t go into all of the hypotheses about that, mostly because I’m not very familiar with them. It’s enough to note that in principle, the genetic code could be accidental (it just happened to be the one some distant ancestor of all living things stumbled on), a chemical inevitability of some sort, or it could have risen to prominence by natural selection.

[ASIDE: The options are not mutually exclusive. For example, it is possible that the only important thing about the genetic code is how easy it is to mutate from particular amino acids to certain others – in other words, that it’s the structure of the code that’s under selection, while its finer details, such as which four codons stand for glycine, may be largely coincidental or determined by chemical necessity.]

For this tiny region of the Hsp90 gene/protein, it looks very much like selection had a hand in it. Hietpas et al. took their theoretical fit amino acid profile and a sample of 1000 randomly generated genetic codes, and asked how many substitutions it would take to switch between equally fit amino acids under each code. Intriguingly, very few of the random codes made such switches as easy as the real one does. In other words, the genetic code seems geared to minimise the number of deleterious mutations.
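The logic of that comparison can be sketched in a few lines of Python. This is my own toy re-creation of the idea, not the authors' code: the "equally fit" set of amino acids below is invented for illustration, and random codes are generated simply by shuffling which amino acid each synonymous codon block encodes (one common way such codes are randomised):

```python
# Toy re-creation of the comparison (my sketch, not the paper's method):
# generate random genetic codes by shuffling amino acid identities across
# the standard code's codon blocks, then ask how well each code connects a
# set of "equally fit" amino acids via single-base substitutions.
import random
from itertools import product

BASES = "TCAG"
# The standard genetic code, codons enumerated in TCAG order; "*" = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}

def random_code(rng):
    """Shuffle amino acid identities across the standard code's codon
    blocks, keeping the block structure and the stop codons fixed."""
    amino_acids = sorted(set(AA) - {"*"})
    mapping = dict(zip(amino_acids, rng.sample(amino_acids, len(amino_acids))))
    return {c: (aa if aa == "*" else mapping[aa]) for c, aa in STANDARD.items()}

def connectivity(code, fit_set):
    """Count single-base substitutions that swap one 'fit' amino acid
    for a different 'fit' one under the given code."""
    hits = 0
    for codon, aa in code.items():
        if aa not in fit_set:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = code[codon[:pos] + base + codon[pos + 1:]]
                if neighbour in fit_set and neighbour != aa:
                    hits += 1
    return hits

rng = random.Random(0)
fit = {"L", "I", "V", "M"}  # made-up "equally fit" set of similar residues
real = connectivity(STANDARD, fit)
random_scores = [connectivity(random_code(rng), fit) for _ in range(1000)]
better = sum(score >= real for score in random_scores)
print(real, better)  # how many of 1000 random codes do at least as well
```

If very few random codes score as high as the real one, the real code looks unusually good at turning mutations into harmless swaps – which is the shape of the result the paper reports, albeit computed from their empirical fitness profile rather than a hand-picked set like this one.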

What’s really fascinating about that result is that it came from an analysis of such a tiny sequence. Earlier, I mentioned that it might be hard to generalise anything from a short sequence. But it’s hard to believe that this particular finding doesn’t have general applicability. The genetic code sets the rules for all proteins – if it weren’t optimised in general, what’s the chance that such strong optimisation would be detected in such a tiny sample? This also suggests that roughly the same amino acids are interchangeable across the board, regardless of which protein we’re talking about. (Which is not necessarily surprising if you’ve ever spent time comparing protein sequences between species, but still, it’s valuable as a new way of looking at a familiar phenomenon).

All in all, this is the kind of paper that makes me all giddy with excitement. It digs deep into fundamental questions in evolutionary theory, and it finds some intriguing answers. It’s also a great reminder of how amazingly far technology has come – merely sequencing 27 base pairs would have been a formidable task at the dawn of molecular biology, and now we can mix 500 different versions together, sequence all of them in a single experiment, and reliably count how many of each variant there are. And that’s nowhere near the limits of current sequencing technology. This is the future, folks, and it’s better than sci-fi.

Recently, I’ve been re-reading Life on a Young Planet. As I said before, it’s an excellent book. It is beautifully written, cleverly structured, and the author is obviously knowledgeable about the subject (which, sadly, isn’t always true in popular science). Most importantly, it emphasises the process of science, as opposed to the actual knowledge gained through that process. “How do we know what we know?” is a question at least as important to Andrew Knoll as “What do we know?” As he so eloquently puts it, “[t]extbooks may portray science as a codification of facts, but it is really a disciplined way of asking about the unknown.” This is an attitude I share with him, and probably a big part of the reason the book has such a special place in my heart.

So, I was surprised to discover on this re-read that Knoll falls into one of the most common traps of talking about evolution: teleological thinking. In Chapter 11, “Cambrian Redux”, he writes that “[f]orty million years after the Cambrian began, evolutionary way stations still played a major role in the ecology of marine environments.” He is discussing the Cambrian explosion, of course, and here he is talking about stem groups of living phyla living alongside the crown groups [1]. I don’t think he means to convey a sense of goal-orientation, but the wording does exactly that. It sounds as if, say, Anomalocaris was just something evolution had to pass through to get to arthropods, not a successful animal in its own right. It suggests that the eventual supplanting of these now-extinct lineages was meant to happen.

Richard Dawkins called this “the conceit of hindsight” and complained about it at length in the introduction to his (also really good) book The Ancestor’s Tale. Dawkins characterises such thinking as “seeing the past as aimed at our own time, as though the characters in history’s play had nothing better to do with their time than foreshadow us.” (In this particular case, he’s talking about ordinary history, as a prelude to introducing the same problem in evolutionary history.) It’s a very common way of thinking about evolution (just look at any of the traditional “march of progress” images), and it’s also totally wrong.

If you’ve been in prolonged contact with creationists, you’ve almost certainly encountered conspicuous examples of this common misconception. Types of questions I’ve personally seen include “what use is half a wing/[insert transitional feature here]?”, “why didn’t all X evolve into Y?”, and “how did X know they were evolving into Y?” At the heart of each lurks the idea that evolution works towards goals. That it doesn’t seems to be one of the most difficult aspects of evolutionary theory to grasp, and it’s especially hard to escape when we are looking at the past.

Simply put, evolution has no foresight. Rather than working towards something, the process always reacts to something. Rather than looking ahead, it constantly lives in the present, though it’s often saddled with the baggage of the past. The kinds of things that cause mutation (such as replication errors, radiation and chemical damage) have random effects [2]. Moreover, the processes that sort among mutations, such as natural selection, are similarly blind. Because the mechanisms of evolution are not thinking entities, the only traits that get passed on are traits that help their owners reproduce in the here and now. Any long-term trend is the outcome of repeated rounds of selection on the same traits. Evolution has no goal in the same way a snowflake doesn’t aim for your nose, though in retrospect you can perhaps reconstruct the path it took to get there.

That’s the problem with history: we are looking back on processes whose outcomes we already know. It’s so tempting to view the preceding events as mere stages in a journey aimed at those outcomes. After all, we humans work with goals in mind all the time (ironically, nowadays we might use evolutionary principles to attain those goals!). Unfortunately, viewing evolution in this way loses sight of the process by focusing on the endpoint – and then people start asking about half wings.

It’s important to remember that the ancestor of the wing was not “half a wing”. It was just a modified arm that had some advantage over its ancestor, e.g. large feathers to help a dinosaur keep her eggs warm, or (closer to “wingness”) glide from tree to tree. These animals weren’t half-functional fliers, they were fully functional at whatever they were doing. If an alien scientist looked around in a Middle Jurassic forest, it might have marvelled at the exquisite gliding adaptations of small dinosaurs much like Microraptor [3], but it surely wouldn’t have focused on how bad they were at flying.

(Also, always remember that when you are the only one who can do something, by definition you’re the best at it!)

I wish we could just drop the teleological language altogether. It’s surprisingly difficult even when you actively try, though. It could be something about the way languages work (at least the two I know well). Somehow, it seems much easier to say things like “X evolved to do Y” in them than to give a more accurate description of the evolutionary process. I’m sure that says something profound about human minds…

***

[1] In systematic jargon, a crown group is the last common ancestor of all living members of a group, and all of its descendants (including extinct ones). The corresponding stem group (stem groups are always relative to a crown) includes anything extinct that’s more closely related to the crown group in question than to any other living lineage. For example, all non-avian dinosaurs were stem birds.

[2] We have to be precise about the meaning of “random” here. Some mutagens cause very specific mutations. “Random” refers to their fitness effects, not the chemical changes that happen or even the places where they happen (though the latter is largely random, except for trivial constraints). The same mutation in different parts of the genome can be beneficial, harmful or have no effect at all, and conversely, the same is true for different mutations at the same spot – and all of this is uncontrollable. If you keep your study organisms in a hot environment, they won’t suddenly start producing more mutations that make them heat-resistant. That’s the main thing we mean when we say mutations are random.

[3] Microraptor itself is Early Cretaceous – birds were already around when these guys inhabited the forests of China. The first part of the Jurassic – i.e. the time between early dinosaurs and Archaeopteryx – doesn’t have a great record of dinosaur fossils, so most of what we know of the origin of birds comes from relatives of birds that persisted alongside them later on. However, a few very bird-like fossils are contemporaneous with, or older than, Archaeopteryx. Like Microraptor, some of these creatures have long leg feathers (unlike Microraptor‘s, theirs aren’t very aerodynamic), so that may be something ancestral for the “birdy” lineage.