An evo-devo geek's scientific meanderings

gene duplication

The Mammal has emerged from a thesis-induced supermassive black hole and a Christmas-induced food coma, only to find that in the month or so that she spent barely functional and buried in chapters covered in the supervisor’s dreaded Red Pen, things actually happened in the world outside. This, naturally, manifested in thousands of items feeling thoroughly neglected in RSS readers and email inboxes. (Jesus. How many times have I vowed never to neglect my RSS feed again? Oh well, it’s not like unemployment is such a busy occupation that I can’t deal with a measly two and a half thousand articles 😛 )

… earlier tonight, the paragraph here said I wasn’t doing a proper post yet, “just pointing out” a couple of the cooler things I’ve missed. Then somehow this thing morphed into a 1000+ word post that goes way beyond “pointing things out”. It’s almost like I’ve been itching to write something that isn’t my thesis. >_>

So the first cool thing I wanted to “point out” is the genome paper of the centipede Strigamia maritima, which is a rather nondescript little beast hiding under rocks on the coasts of Northwest Europe. This is the first sequenced genome of a myriapod – the last great class of arthropods to remain untouched by the genome sequencing craze after many genomes from insects, crustaceans and chelicerates (spiders, mites and co.). The genome sequence itself has been available for years (yay!), but its “official” paper (Chipman et al., 2014) is just recently out.

Part of the appeal of Strigamia – and myriapods in general – is that they are considered evolutionarily conservative for arthropods. In some respects, the genome analysis confirms this. Compared to its inferred common ancestor with us, Strigamia has lost fewer genes than insects, for example. Quite a lot of its genes are also linked together similarly to their equivalents in distantly related animals, indicating relatively little rearrangement in the last 600 million years or so. But this otherwise conservative genome also has at least one really unique feature.

Specifically, this centipede – which is blind – has not only lost every bit of DNA coding for known light-sensing proteins, but also all known genes specific to the circadian clock. In other animals, genes like clock and period mutually regulate one another in a way that makes the abundance of each gene product oscillate in a regular manner (this is about the simplest graphical representation I could find…). The clock runs on a roughly daily cycle all by itself, but it’s also connected to external light via the aforementioned light-sensing proteins, so we can constantly adjust our internal rhythms according to real day-night cycles.

There are many blind animals, and many that live underground or otherwise find day and night kind of irrelevant, but even these are often found to have a functioning circadian clock or keep some photoreceptor genes around. However, based on the genome data, our favourite centipede may be the first to have completely lost both. The authors of the genome paper hypothesise that this may be related to the length of evolutionary time the animals have spent without light. Things like mole rats are relatively recent “inventions”. However, the geophilomorph order of centipedes, to which Strigamia belongs, is quite old (its most likely sister group is known from the Carboniferous, so they’re probably at least that ancient). Living geophilomorphs are all blind, so chances are they’ve been that way for the last 300+ million years.

Nonetheless, the authors also note that geophilomorphs are still known to avoid light – the question now is how the hell they do it… And, of course, whether Strigamia has a clock is not known – only that it doesn’t have the clock we’re used to. We also have no idea at this point how old the gene losses actually are, since all the authors know is that one other centipede from a different group has perfectly good clock genes and opsins.

In comparison with fruit flies and other insects, the Strigamia genome also shows that evolutionary cats can be skinned in more than one way. There is an immune-related gene family we share with arthropods and other animals, called Dscam. The product of this gene is involved in pathogen recognition among other things, and in flies, Dscam genes are divided into roughly 100 chunks or exons, most of which are found in clusters of variant copies. When the gene’s transcript is spliced, only one of these copies is used from each such cluster, so in practical terms the handful of fruit fly Dscam genes can encode tens of thousands of different proteins, enough to adapt to a lot of different pathogens.

A similar arrangement is seen in the closely related crustaceans, although with fewer potential alternative products. In other groups – the paper uses vertebrates, echinoderms, nematodes and molluscs for comparison – the Dscam family is pretty boring with at most one or two members and none of these duplicated exons and alternative splicing business. However, it looks like insects+crustaceans are not the only arthropods to come up with a lot of DSCAM proteins. Strigamia might also make lots of different ones (“only” hundreds in this case), but it achieved this by having dozens of copies of the whole gene instead of performing crazy editing feats on a small number of genes. Convergent evolution FTW!
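To get a feel for how "one variant per cluster" multiplies out in the fly case, here is a back-of-the-envelope sketch. The cluster sizes (12, 48, 33 and 2 alternatives) are the commonly quoted figures for the fly gene Dscam1, not numbers from the centipede paper, so treat them as my assumption.

```python
from math import prod

# Assumed numbers of mutually exclusive variants in the four exon clusters
# of Drosophila Dscam1 (commonly cited figures, NOT taken from the
# Strigamia genome paper).
cluster_sizes = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(sizes):
    """One variant is picked per cluster, so the cluster sizes multiply."""
    return prod(sizes)


total = isoform_count(cluster_sizes.values())
print(total)  # 12 * 48 * 33 * 2 = 38016
```

The centipede route needs no such arithmetic: with whole-gene copies, the number of different products simply tracks the number of genes.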

Before I paraphrase the entire paper in my squeeful enthusiasm (no, seriously, I’ve not even mentioned the Hox genes, and the convergent evolution of chemoreceptors, and I think it’s best if I shut up now), let’s get to something else that I can’t not “point out” at length: a shiny new vetulicolian, and they say it’s related to sea squirts!

Vetulicolians really deserve a proper discussion, but in lieu of a spare week to read up on their messiness, for now, it’s enough to say that these early Cambrian animals have baffled palaeontologists since day one. Reconstructions of various types look like… a balloon with a fin? Inflated grubs without faces? I don’t know. Drawings below (Stanton F. Fink, Wikipedia) show an assortment of the beasts, plus Yunnanozoon, which may or may not have something to do with them. Here are some photos of their fossils, in case you wondered.

They’re certainly difficult creatures to make sense of. Since their discovery, they’ve been called both arthropods and chordates, and you can’t get much farther than that with bilaterian animals (they’re kind of like the Nectocaris of old, come to think of it…).

The latest one was dug up from the Emu Bay Shale of Australia, the same place that yielded our first good look at anomalocaridid eyes. Its newest treasure has been named Nesonektris aldridgei by its taxonomic parents (García-Bellido et al., 2014), and it looks something like this (Diego García-Bellido’s reconstruction from the paper):

In other words, pretty typical vetulicolian “life but not as we know it”, at first glance. Its main interest lies in the bit labelled “nc” in the specimens shown below (from the same figure):

This chunky structure in the animal’s… tail or whatever is a notochord, the authors contend. Now, only one kind of animal has a notochord: a chordate. (Suspicious annelid muscle bundles notwithstanding. Oh yeah, I also wanted to post on Lauri et al. 2014. Oops?) So if this thing in the middle of Nesonektris’s tail is a notochord, then at the very least it is more closely related to chordates than anything else.

Why do they think it is one? Well, there are several long paragraphs devoted to just that, so here goes a summary:

1. It’s probably not the gut. A gut would be the other obvious ID, but it doesn’t fit very well in this case. Structures interpreted as guts in other vetulicolians – which sometimes contain stuff that may be half-digested food – (a) start in the front half of the body, where the mouth is, (b) constrict and expand and coil and generally look much floppier than this, (c) don’t look segmented, (d) sometimes occur alongside these tail rod-like thingies, so probably aren’t the same structure.

2. It positively resembles modern half-decayed notochords. The notochords of living chordates are long stacks of (muscular or fluid-filled) discs, which fall apart into big blocks as the animal decomposes after death. Here’s what remains of the notochord of a lamprey after two months for comparison (from Sansom et al. (2013)):

This one isn’t as regular as the blockiness in the fossils, I think, but that could just be the vetulicolians not being quite as rotten.

There is, of course, a but(t). To be precise, there are also long paragraphs discussing why the structure might not be a notochord after all. It’s much thicker than anything currently interpreted as such in reasonably clear Cambrian chordates, for one thing. Moreover, it ends right where the animal does, in a little notch that looks like a good old-fashioned arsehole. By the way, the paper notes, vetulicolian tails in general don’t go beyond their anuses by any reasonable interpretation of the anus, and a tail behind the anus is kind of a defining feature of chordates, though this study cites a book from the 1970s claiming that sea squirt larvae have a vestigial bit of proto-gut going all the way to the tip of the tail. (I suspect that claim could use the application of some modern cell labelling techniques, but I’ve not actually seen the book…)

… and there is a phylogenetic analysis, in which, if you interpret vetulicolians as deuterostomes (which impacts how you score their various features), they come out specifically as squirt relatives whether or not you count the notochord. I’m never sure how much stock to put in a phylogenetic analysis based on a few bits of anatomy gleaned from highly contentious fossils, but at least we can say that there are other things – like a hefty cuticle – beyond that notochord-or-not linking vetulicolians to a specific group of chordates.

Having reached the end, I don’t feel like this paper solved anything. Nice fossils either way 🙂

And with that, I’m off. Maybe next time I’ll write something that manages to be about the same thing throughout. I’ve been thinking that I should try to do more posts about broader topics rather than one or two papers (like the ones I wrote about ocean acidification or homology versus developmental genetics), but I’ve yet to see whether I’ll have the willpower to handle the necessary reading. I’m remarkably lazy for someone who wants to know everything 😀

(Aside: holy crap, did I ALSO miss a fucking Nature paper about calcisponges’ honest to god ParaHox genes? Oh my god, oh my GOD!!! *sigh* This is also a piece of incredibly exciting information I’ve known for years, and I miss it when it actually comes out in a journal bloody everyone reads. You can tell I’ve been off-planet!)

Technically, the first paper isn’t about new new genes: Assis and Bachtrog (2013) examined recently duplicated genes in fruit flies. But screw technicalities, what they’re saying makes my eyes pop.

When a gene is accidentally copied, a variety of possible fates can await it. Most of the time, the extra copy just dies. Some mechanisms of gene duplication just take the gene without the regulatory elements it needs to function properly. Even if the new copy works, it’s still redundant, so there’s nothing stopping mutations from destroying it over time. However, sometimes redundancy is removed before the new gene breaks irrevocably, and both copies are kept. This can, in theory, happen in a number of ways. Because I’m feeling lazy, let me just quote them from the paper (square brackets are mine, because I hate repeatedly typing out long ugly words :)):

Four processes can result in the evolutionary preservation of duplicate genes: conservation, neofunctionalization, subfunctionalization, and specialization. Under conservation, ancestral functions are maintained in both copies, likely because increased gene dosage is beneficial (1). Under neofunctionalization [NF], one copy retains its ancestral functions, and the other acquires a novel function (1). Under subfunctionalization [SF], mutations damage different functions of each copy, such that both copies are required to preserve all ancestral gene functions (9, 10). Finally, under specialization, subfunctionalization and neofunctionalization act in concert, producing two copies that are functionally distinct from each other and from the ancestral gene (11).

We might add a variation on NF, too: Proulx and Phillips (2006) theorised that differences in function that arise in different alleles (variants) of a single gene can turn duplication into an advantage, turning the conventional duplication-first, new function-next scenario on its head.

Either way, genomes contain lots of duplicated genes, there’s no question about that. What isn’t nearly as well understood is the relative importance of various mechanisms in producing all these duplicates. It’s much easier to theorise about mechanisms than to test the theories. Since evolution doesn’t stop once a new gene has earned its place in the genome, it can be hard to disentangle the mechanism(s) responsible for its preservation from the stuff that happened to it later. Also, to really assess the relative role of different mechanisms, you’ve got to look at whole genomes.

(Assis and Bachtrog say that this hasn’t been done before, and then go right on to cite He and Zhang [2005], which is a genome-wide study of SF and NF. I guess it doesn’t look at all the mechanisms…)

Assis and Bachtrog used the amazing resource that is the 12 Drosophila genomes project, focusing on D. melanogaster and D. pseudoobscura to find slightly under 300 pairs of genes that duplicated after the divergence of those two species. Since Drosophila genomes are very well-studied, they were able to identify the “parent” and “child” in each pair based on where they sit on their chromosomes. They then also extracted thousands of unduplicated genes from the melanogaster and pseudoobscura genomes, to use as a measure of background divergence between the two species.

To measure changes in gene function, they compared the expression of parent and child genes to each other and to the “ancestral” copy (i.e. the unduplicated gene in the other species) in different parts of the body (if a gene is suddenly turned on somewhere it wasn’t before, it’s probably doing something new!).
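The logic of sorting a gene pair into one of the fates quoted above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the tissue order, the expression numbers and the cutoff are all made up, and I am assuming a plain Euclidean distance between relative expression profiles, with the cutoff standing in for the background divergence measured from unduplicated genes.

```python
import math


def relative(profile):
    """Scale an expression profile so its values sum to 1.

    Assumes the gene is expressed somewhere (the total is not zero).
    """
    total = sum(profile)
    return [x / total for x in profile]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(parent, child, ancestor, cutoff):
    """Assign a duplicate pair to a fate from tissue expression profiles.

    `cutoff` is a hypothetical threshold for "more different than expected".
    """
    anc = relative(ancestor)
    combined = [p + c for p, c in zip(parent, child)]
    parent_diverged = euclidean(relative(parent), anc) > cutoff
    child_diverged = euclidean(relative(child), anc) > cutoff
    combined_diverged = euclidean(relative(combined), anc) > cutoff

    if not parent_diverged and not child_diverged:
        return "conservation"
    if child_diverged and not parent_diverged:
        return "NF (child)"
    if parent_diverged and not child_diverged:
        return "NF (parent)"
    # Both copies changed: if together they still add up to the ancestral
    # pattern, they have split the old job (SF); otherwise it's specialization.
    return "specialization" if combined_diverged else "SF"


# Made-up profiles: expression in (head, carcass, testis).
ancestor = [10, 10, 0]
print(classify([10, 10, 0], [1, 1, 20], ancestor, cutoff=0.2))  # NF (child)
print(classify([10, 0, 0], [0, 10, 0], ancestor, cutoff=0.2))   # SF
```

The second example is the textbook SF picture: each copy has lost one of the ancestral expression domains, but between them they still cover everything the ancestor did.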

Long story short, it turned out that in the majority of cases (167/281) the child copy behaved much more differently from the “ancestor” than expected, while the parent copy stayed pretty close. These child copies also showed faster sequence evolution than their parents. This means that NF – and specifically that of the new copy – is the most common fate of newly duplicated genes in these animals. There’s also a fair number of gene pairs where both copies gained new functions or both stuck with the old ones, but only three where both copies lost functions. Pure SF, which very influential studies like Force et al. (1999) championed as the dominant mode of duplicate gene survival, appears to be an incredibly rare occurrence in fruit flies!

A few paragraphs ago I mentioned the caveat that duplicated genes don’t stop evolving just because they’ve managed to survive. Well, the advantage of having all these Drosophila genomes is that you can further break down “young” duplicates into narrower age groups, using the species that fall between melanogaster and pseudoobscura on the tree. However, looking at this breakdown doesn’t change the general pattern – NF of the child copy is the most common and SF is rare or nonexistent in even the youngest age groups, along both the melanogaster and the pseudoobscura lineages.

So what exactly is going on here?

Part of the difference in expression patterns between parent/ancestral and child copies is because these new genes are turned on in the testicles, which might give us a big clue. Testicles, you see, are a bit anarchical. Things that are normally kept silent in the genome, like various kinds of parasitic DNA, wake up and run wild during the making of sperm. If you remember my throwaway reference to duplication mechanisms that cut the gene off from its old regulatory elements – well, the balls are a place where even such lost and lonely genes get a second chance.

The genomic anarchy of testes is also one of the reasons these duplications happen in the first place; the aforementioned mechanism involves those bits of parasitic DNA that copy and paste themselves via an RNA intermediate. The enzymes they use to reverse transcribe this RNA into DNA and insert it back into the genome aren’t particularly discerning, and they’ll happily do their thing on a piece of RNA that isn’t the parasite. Indeed, slightly more NFed child genes than you’d expect originated via RNA, although it’s worth noting that more than half of them still didn’t. So while the testes look like a good place for new gene copies to find a use, they aren’t totally responsible for their origins.

Why is there so little SF among these genes?

This is the Obvious Question; my jaw nearly landed on my desk when I saw the numbers. The authors have two hypotheses, both of which may be true at the same time.

First, SF assumes that the two copies have the same functions to begin with. This is not necessarily true when just a small segment of DNA is duplicated – even when it’s not just a bare gene you’re copying, the new copy might lose part of its old regulatory elements and/or land next to new ones, not to mention Proulx and Phillips’s idea of new functions appearing before duplication. So maybe SF is more common after wholesale duplications of entire genomes, and Drosophila species didn’t have any of those recently.

Secondly, SF happens by genetic drift, which is a random process that works much better in small populations. Fruit flies aren’t known for their small populations, and therefore the dominant evolutionary force acting on their genomes will be selection.

This makes sense to me, but the degree to which NF dominates the picture is still pretty amazing. I wonder what you’d get if you applied the same methods to different species. Would species with smaller populations, or those that recently duplicated their whole genomes, show more evidence for SF as you’d expect if the above reasoning is correct? Or would the data slaughter all those seemingly reasonable explanations? What would you see in parthenogenetic species that have no males (and testicles)?

When we talk about evolutionary novelty, especially if the talking is to non-specialists, gene duplication is all the rage. From the sophistication of vertebrate blood clotting to the seemingly pointless complexity of a yeast proton pump (Finnigan et al., 2012), accidentally copied genes are undoubtedly an important source of new stuff in evolution. But copying and tweaking is not the only way new genes can arise. Sometimes, new genes really are new.

I admit, I wasn’t nearly excited enough about this possibility until this paper landed in my RSS reader a while back. Toll-Riera et al. (2012) find that the boring repetitive DNA that my gut feeling would’ve dismissed as true “junk” may actually be a great source of new proteins. First, it’s a good theoretical source. Long stretches of repetitive sequence are less likely than random sequence to suddenly and unceremoniously end in a stop codon* and translate to a short and useless amino acid sequence. Second, it appears that younger proteins do contain more repetitive sequence than old ones. What’s more, the repeats are often found within the regions that confer function on proteins. They aren’t just useless filler.
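The stop codon argument is easy to put numbers on. The sketch below is my own toy calculation, not something from the paper: it assumes that in "random" sequence every codon is equally likely and independent of its neighbours (real genomes are not that tidy), and it uses a CAG repeat purely as a hypothetical example of repetitive DNA.

```python
STOPS = {"TAA", "TAG", "TGA"}


def p_stop_free_random(n_codons):
    """Chance that n random codons contain no stop (3 of the 64 codons are stops).

    Assumes uniform, independent codons, which is a simplification.
    """
    return (61 / 64) ** n_codons


def first_stop(seq):
    """Index of the first in-frame stop codon, or None if there isn't one."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i // 3
    return None


# A 100-codon stretch of random sequence is stop-free less than 1% of the time...
print(f"{p_stop_free_random(100):.4f}")

# ...whereas a repeat that starts out stop-free in a given frame stays that way
# for as long as the repeat continues.
print(first_stop("CAG" * 100))  # None
```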

So, okay, a lot of proteins seem to come from pieces of “junk” DNA. How?

Maybe they arise from random gene expression noise and turn into proper genes gradually, say Carvunis et al. (2012). It has been known for a while that DNA that doesn’t belong to traditionally recognised genes quite often gets transcribed into RNA in cells. Sometimes, these random bits of RNA may even be translated into an amino acid chain. If some of these accidents are actually useful, the researchers reasoned, they could create a selection pressure to turn the DNA that produced them into a proper gene.

They took this idea and applied it in a study of open reading frames (ORFs) in the yeast genome. An “ORF” is jargon for a stretch of DNA that isn’t interrupted by stop codons. In theory, any ORF could make a “meaningful” piece of protein. Most ORFs that aren’t genes are short, often just a handful of codons; and most ORFs known to be genes are long, with hundreds of codons. The team argued that if random ORFs can give rise to genes, there should be plenty of transitional forms.
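For the curious, finding ORFs in this loose sense takes only a few lines. The sketch below follows the definition used here (any stretch not interrupted by a stop codon); it is not the pipeline from the paper. Many real ORF finders additionally demand a start codon, and this one only scans the three forward reading frames, ignoring the reverse strand. The function name and the example sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(dna, min_codons=1):
    """List maximal stop-free runs as (frame, start, end, n_codons).

    Scans the three forward reading frames only; `start` and `end` are
    0-based positions in `dna`, with `end` exclusive.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                n = (pos - run_start) // 3
                if n >= min_codons:
                    found.append((frame, run_start, pos, n))
                run_start = pos + 3  # the next run begins after the stop
            pos += 3
        # Close whatever run is still open at the end of the sequence.
        n = (pos - run_start) // 3
        if n >= min_codons:
            found.append((frame, run_start, pos, n))
    return found


print(stop_free_stretches("ATGAAATAGCCC", min_codons=2))
# [(0, 0, 6, 2), (1, 4, 10, 2), (2, 2, 11, 3)]
```

Even this twelve-letter toy sequence has a short ORF in every frame, which gives some idea of why a whole genome contains hundreds of thousands of them.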

To test this, they first classified all the hundreds of thousands of ORFs in the yeast genome according to their evolutionary age. The ones that were conserved in all of the yeast species they used for comparison were given a score of 10, and ORFs that only brewer’s yeast had were called zeroes. (Most known genes belong to classes 5-10, meaning they evolved quite far back on the yeast family tree.) The next step was to pick the Class Zero ORFs that were actually transcribed and translated, so might be in the pool of potential “proto-genes”. They found this set of “0+” ORFs by analysing RNA sequencing data in both happy yeast cells and yeast deprived of food, just to make sure they caught any sequences that only acted like genes under some circumstances. In addition, they also checked which of those RNAs were associated with ribosomes, the sites of translation. These filtering steps left over a thousand little ORFs that don’t belong to known genes, are completely unique to Saccharomyces cerevisiae, expressed, and probably translated.

Going up the conservation scale, ORFs become increasingly gene-like. The older ones are longer, their RNA copies are more abundant, and more of them appear constrained by natural selection. (Interestingly, when you translate them, the more gene-like ORFs produce less ordered protein structures. Not sure what to make of that.) Proper genes are also better suited to get ribosomes to translate them. Conservation classes 1-4, those ORFs that are shared only by closely related Saccharomyces species, are intermediate in all of these properties (and some more) between the zeroes and the older ORFs.

There is one more thing about this study that definitely bears mentioning. When you count how many new gene duplicates this yeast species has versus how many new, potentially functional, random ORFs, the latter come out on top by far. Between them, S. cerevisiae and its closest sister species apparently have somewhere between one and five newly duplicated genes. The same duo also came up with nineteen new ORFs that are under selection and therefore probably functional. Potentially, these random little sequences people might have dismissed as background noise not long ago are more potent sources of new genes than the celebrated gene duplication.

I don’t know about you, but that absolutely fascinates me.

***

P.S.: Incidentally, this is all about protein-coding genes. However, thousands of genes in your own genome do NOT encode proteins. They include genes for the good old RNA components of the translation machinery, ribosomal and transfer RNA, but there are also other RNA genes with transcripts involved in everything from keeping parasitic DNA in check to editing the messenger RNAs of other genes. I kind of want to find out how these RNAs form and acquire functions. Also, when we are quite happy to call a piece of DNA that doesn’t have a protein product a “gene”, and cells are swarming with RNA that doesn’t come from things traditionally called “genes”, and some of this RNA actually does encode proteins, what does that do to the definition of a “gene”??

***

*Gotta love the mnemonics on that page. I didn’t think three three-letter combinations would be that hard to remember, but I have to admit I chuckled at “U Are Gone”.

Since work is quite frantic lately, and my attention span has gone on holiday, I’ve decided to do something I haven’t done before and just say a few words about papers that caught my interest today without actually reading them. Each of these is probably worth a full-blown meandering of its own, but I know I wouldn’t ever get to them at this rate. Better read their abstracts and give some quick thoughts than let them sink unnoticed into the murk of my “papers” folder!

In the circles I move in, it’s pretty much canon that the ancestors of living vertebrates doubled their entire genomes twice. It’s still debated exactly when these duplications occurred, but few people doubt that they did. This so-called 2R hypothesis is supported by things like our possession of several (quite often, four) copies of genes that are singletons in our closest living relatives (read: lancelets*), and more importantly, that whole big chunks of lancelet chromosomes can be matched to chunks of four different vertebrate (mainly, human) chromosomes. Genes that are close to one another in lancelets are often also close together in vertebrates.

The relationship is not perfect – in well over 500 million years of evolution, genes inevitably get lost and bits of chromosome scrambled. And, thus, there is always room to question the 2R scenario, which is what this paper clearly does. They propose that those four-gene families originated at all sorts of different times, from small local duplications and rearrangements. If they are right, this is a very important result. It basically uproots every bit of speculation ever proposed on how the genome duplications contributed to the evolution of vertebrates, which, far as I can tell, is a hell of a lot of speculation. Not having read the whole paper, I would still put my money on 2R, but who knows what the future holds? Maybe we are facing a minor paradigm shift?

The evolutionary history of segmentation is one of my random interests, and from my point of view, the above is a good reason to squee in a most fangirlish way. Segmentation is the construction of a body from repeating units. In its purest form, which isn’t that common in modern animals, the animal is essentially made up of identical repeated blocks containing a copy of each key organ like kidneys, nerve centres, limbs and muscles. (Even in the most perfectly segmented creatures, head and tail ends form something of an exception. Ragworms make a nice example.) More commonly, only some components are repeated, and they are repeated with slight differences along the body. Vertebrates’ spine and associated muscles are a good example, and so are the defining traits of arthropods, their jointed exoskeletons equipped with repeated pairs of appendages.

Although traditionally it has been thought that arthropod and vertebrate segmentation have independent origins, parts of the genetic machinery are shared between both groups (as well as segmented worms). Various “segmentation genes” are active in distinct stripes in our embryos, marking out future segments even before we can see the segments themselves. In vertebrates, cells periodically switch “segmentation genes” on and off, and this combined with the growth of the embryo produces a dynamic stripey pattern of gene expression. While segments and stripes of gene expression are darn obvious in arthropods, this is the first time anyone has confirmed that some arthropod segmentation genes actually oscillate like their vertebrate counterparts do, as opposed to, say, the cells expressing them moving about. Whether this is a spectacular example of convergent evolution or evidence of a shared ancestral heritage, I couldn’t say, but it’s really cool either way.
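How an on/off oscillation plus growth turns into stripes is easier to see in a toy model than to describe. The sketch below is a cartoon of the general idea only, not a model from the paper: I am assuming that cells are added to the growing end at a steady rate and that each new cell simply keeps whatever state the oscillating gene was in at the moment it was laid down. The function name and all parameters are invented.

```python
def stripe_pattern(n_cells, period, cells_per_tick=1):
    """Return a string of '#' (gene on) and '.' (gene off), one per cell.

    Toy assumptions: the gene is on for the first half of every `period`
    ticks, `cells_per_tick` cells are added per tick, and each cell
    freezes the clock state of the tick in which it was added.
    """
    pattern = []
    for i in range(n_cells):
        tick = i // cells_per_tick           # when cell i was laid down
        gene_on = (tick % period) < period / 2
        pattern.append("#" if gene_on else ".")
    return "".join(pattern)


print(stripe_pattern(24, period=6))
# ###...###...###...###...
print(stripe_pattern(24, period=6, cells_per_tick=2))
# ######......######......
```

In this cartoon, stripe width is just clock period times growth rate, so a faster-growing embryo with the same clock gets wider stripes.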

So, this claims to resolve a conundrum I wasn’t even aware of before. Gene duplication is thought to be important for the evolution of new functions because two copies of a gene mean there is a backup if one of them fails at its original function. Hence, theory goes, duplicate genes are much less restricted in the evolutionary paths they can take. Apparently, studies in mice have contradicted this common wisdom by claiming that duplicate genes are just as likely to be indispensable as genes without backup copies. However, Chen et al. are saying that this is wrong, confounded by gene age. Since new genes are less likely to be essential than old genes (which had more time to evolve interactions with the rest of the genome), and mouse duplicates are on average older than mouse singletons, the two effects end up cancelling out. When they factor in gene age, duplicates are indeed less essential than loners. One of the central tenets of current thinking about (genetic) novelty stays in the ring for another round…
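Here is how that kind of confounding can play out, using numbers I have entirely invented for illustration (they are NOT from Chen et al. or from any mouse dataset). Within each age class the duplicates are less often essential than the singletons, but because most duplicates sit in the "old" class, the difference vanishes when everything is lumped together.

```python
# INVENTED counts, for illustration only: (total genes, essential genes).
genes = {
    ("old", "duplicate"): (800, 320),
    ("old", "singleton"): (200, 100),
    ("young", "duplicate"): (200, 20),
    ("young", "singleton"): (800, 240),
}


def essential_fraction(kind, age=None):
    """Fraction of essential genes of one kind, optionally within one age class."""
    total = essential = 0
    for (gene_age, gene_kind), (n, n_essential) in genes.items():
        if gene_kind == kind and age in (None, gene_age):
            total += n
            essential += n_essential
    return essential / total


for age in (None, "old", "young"):
    label = age or "all ages"
    dup = essential_fraction("duplicate", age)
    single = essential_fraction("singleton", age)
    print(f"{label:>8}: duplicates {dup:.0%}, singletons {single:.0%}")
```

Lumped together, both kinds come out at 34% essential; split by age, duplicates are at 40% versus 50% among old genes and 10% versus 30% among young ones.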

How complexity increases in evolution is more than a breeding ground for creationist incredulity; it’s also quite interesting for bona fide evolutionary biologists. Looking at the development of mouse teeth, Harjunmaa et al. notice that increases and decreases in the complexity of tooth shapes require different sorts of mess-ups. Simpler-than-normal teeth are common in mutants and easy to make in experiments. More complex teeth – i.e. those with more cusps – are rarely if ever seen in natural mutants. Turns out they are perfectly possible – you just need to manipulate several genetic pathways at the same time to produce a clear result.

Can this be generalised? Is greater complexity usually harder to achieve? When does this apply and when does it not? I’ve recently read papers that explore how complexity increases easily and completely by chance (I have a half-written post about them languishing on my hard drive, FWIW). Are the rules different for different levels of organisation? The aforementioned complexity-by-chance papers analyse the molecular level: one is about the architecture of gene switches, the other about a protein machine. Teeth are pretty large pieces of life with thousands upon thousands of such machines participating in their production. Does that make a real difference, or is what I’m seeing just coincidence? Dunno, but it’s fascinating to think about!

***

Heh, it looks like I took rather bigger “bites” of this news than I planned to. I kind of managed to write the equivalent of a full-blown meandering anyway. The only difference is that I didn’t painstakingly reference this one. I hope that doesn’t mean that half of what I wrote off the top of my head is wrong 😀

***

Note:

*Lancelets are no longer considered our closest relatives. Unbelievable as that may seem, that honour goes to sea squirts and their ilk. However, the sea squirt bunch are ridiculously weird in all sorts of respects, and their genomes are jumbled beyond recognition. So… not so great if you want to learn anything about our ancestors.