New genes arise quickly

What role does the appearance of new genes, versus simple changes in old ones, play in evolution? There are two reasons why this question has recently become important.

The first involves a scientific controversy. Some researchers—the most prominent being evo-devotee Sean Carroll—maintain that most important evolutionary change, at least in body form, involves changes in regulatory sequences rather than simple changes in genes themselves, or the appearance of new genes. This question hasn’t yet been answered, since we don’t know a great deal about those mutations that have been important in creating new body plans.

The second controversy is religious. Some advocates of intelligent design (ID), most notably Michael Behe in a recent paper, have implied not only that evolved new genes or new genetic “elements” (e.g., regulatory sequences) aren’t important in evolution, but that they play almost no role at all, especially compared to mutations that simply inactivate genes or make small changes, like single nucleotide substitutions, in existing genes. This is based on the religiously-motivated “theory” of ID, which maintains that new genetic information cannot arise by natural selection, but must be installed in our genome by a magic poof from Jebus.

I’ve criticized Behe’s conclusions, which are based on laboratory studies of bacteria and viruses that virtually eliminated the possibility of seeing new genes arise, but I don’t want to reiterate my arguments here. What I want to do is point out a new paper by some Chicago colleagues that suggests that new genes, at least in the genus Drosophila (fruit fly), not only arise pretty quickly, but also diverge very quickly to become essential parts of the genome.

The paper, by Sidi Chen, Yong Zhang, and my friend Manyuan Long, appears in this week’s Science: “New genes in Drosophila quickly become essential.” It’s a clever piece of work. What the authors did was compare whole-genome sequences between various species of Drosophila (there are now many of these) to see how often new genes appeared in one lineage: the lineage that diverged from the ancestors of D. willistoni to become D. melanogaster. The divergence between these two lineages is 35 million years, but by comparing the genomes of other species that branched off these two branches, they could estimate how often new genes arise over the entire period from 3 million to 35 million years ago.

What do they mean by “new genes”? These are genes in D. melanogaster that aren’t found in D. willistoni, but have arisen since their divergence by several processes—most often the duplication of an ancestral gene or its RNA followed by extensive genetic divergence, so that the gene acquires a brand new function. (This process accounts for about 90% of the new genes. Some genes, however, are so different between the species that how they arose is a mystery.) These “new genes,” then, would qualify as what Behe calls “gain-of-FCT” adaptive mutations (“FCT” = functional coded element): the kind of mutations that Behe did not see arising in short-term lab experiments on bacteria and viruses.

Chen et al. found that a surprisingly large number of genes had arisen in the D. melanogaster lineage over this 35-myr period. Here’s a summary of their results:

The authors identified 566 new genes that arose over this period. That’s about 4% of the total genes in the D. melanogaster genome. And that’s quite a few given that the divergence is only 35 myr. The genus Drosophila itself (including the scaptomyzids) diverged from its sister group about 63 million years ago, so we can estimate that, in the genus as a whole, at least 7% of the genome comprises brand new genes.
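The arithmetic behind that 7% figure is easy to check. Here is a minimal sketch in Python, assuming (as the extrapolation implicitly does) that new genes accumulate at a roughly constant rate; the total gene count is back-calculated from the quoted 4%, not taken from the paper.

```python
# Figures quoted in the post.
new_genes = 566
share_in_lineage = 0.04   # "about 4%" of D. melanogaster genes
lineage_age_myr = 35      # melanogaster/willistoni divergence
genus_age_myr = 63        # divergence of the genus from its sister group

# Back-calculated from the two numbers above (not reported in the post).
implied_total_genes = new_genes / share_in_lineage

# ASSUMPTION: new genes accumulate at a constant rate per million years.
share_per_myr = share_in_lineage / lineage_age_myr
share_in_genus = share_per_myr * genus_age_myr

print(f"implied total genes: {implied_total_genes:.0f}")
print(f"extrapolated share over {genus_age_myr} myr: {share_in_genus:.1%}")
```

If the rate of origination was not constant, or if older new genes are harder to detect, the true figure could be higher or lower than this straight-line estimate.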

The authors were able to take a sample of these genes (195 of them) and knock down their transcripts using novel RNAi technology (this involves inserting, via transposable genetic elements, constructs whose RNA products target those genes’ transcripts for destruction). They found that about 30% of these new genes are essential for viability; that is, the fly dies if it has no active copies. This proportion didn’t vary depending on how long ago the “new” gene had arisen. Nor did it differ much from the proportion of “old” genes (those present in both lineages) that are essential for viability, which is about 35%. It seems, then, that even if these genes arise as duplicates from pre-existing genes, they quickly assume new functions that make the fly unable to survive without them.

The “new function” conclusion is supported by two other pieces of data. First, the average difference in DNA sequence between the “new” genes in the D. melanogaster lineage and their parental copies (that is, the genes from which they originated, usually by duplication) is 47.3%. That’s a big difference: a change in nearly every other nucleotide. Second, there are new ways to determine what the new genes do: by estimating which proteins in the genome each new gene’s protein product interacts with. Chen et al. found that many of the products of new genes interact with proteins completely different from those that the ancestral genes’ products interact with. This implies that the new genes have evolved completely different functions. And, as theory suggests, that’s the way these genes become essential: at first they do the same thing as their ancestral genes (they’re duplicates, after all), but as they diverge they assume new functions (usually impelled by natural selection) that fit them into new developmental pathways. In this way a gene that is at first “gratuitous” can become essential. It’s nice that we can actually see this happening with protein-protein interaction data.

In further support of the above scenario for the evolution of new genetic information, the authors found that in young and new “essential” genes, there was a strong signature of natural selection having acted, as suggested by the high rates of DNA substitution. As the “new” essential genes become older, and assume new functions, these rates slow down. This again supports the theory of how new genes originate: when they’re formed by duplication, they are quickly eliminated from the genome (see below) unless they diverge quickly to do something new. Thus the duplicates that do survive are usually those that have diverged quickly. Once the new function has been assumed, and the gene is essential, selection then acts to preserve its new function by eliminating new mutations (“purifying selection”).

These results, which show that new genetic information (“FCT”s) arises quickly, don’t imply that every new gene duplication becomes a brand-new gene with a new function. That’s far from the case. We don’t know the figure in Drosophila, but in the human lineage it’s estimated that only about 5% of new duplications diverge to become new genes that do something novel. The rest are inactivated, becoming dead “pseudogenes” that don’t do anything. In Drosophila these are quickly removed from the genome, but in our own lineage many of them linger, so we can estimate the proportion of duplicated genes that don’t go on to do something new.

Nevertheless, genes duplicate frequently enough that they can provide sufficient raw material for genetic novelty. Estimates of how often a given gene duplicates in evolution run about one duplication event per 100 million gene copies. That seems low, but remember that there are thousands of genes in the genome, and, in many species (including Drosophila and now ours), there are hundreds of millions of individuals. That means that, in the species, there are many genes that duplicate each generation. Even if only a few percent of these survive inactivation, that’s a lot of raw material for evolutionary change.
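To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. Only the rate of one duplication per 100 million gene copies comes from the text; treating it as a per-generation rate, the gene count, the population size, and the reuse of the human 5% survival figure are all illustrative assumptions.

```python
def expected_duplications(rate_per_copy, genes_per_genome, individuals, ploidy=2):
    """Expected number of new gene duplications arising in one generation."""
    gene_copies = genes_per_genome * individuals * ploidy
    return rate_per_copy * gene_copies

rate = 1e-8                   # one event per 100 million gene copies (from the post)
genes = 14_000                # ASSUMPTION: "thousands of genes"
individuals = 200_000_000     # ASSUMPTION: "hundreds of millions of individuals"
surviving_fraction = 0.05     # ASSUMPTION: the human-lineage estimate, borrowed here

new_dups = expected_duplications(rate, genes, individuals)
surviving = new_dups * surviving_fraction

print(f"new duplications per generation: {new_dups:,.0f}")
print(f"of which might escape inactivation: {surviving:,.0f}")
```

Most of those duplicates would still be lost by drift before spreading through the species, so this counts raw material entering the population, not new genes becoming fixed.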

The presence of frequent gene duplications is supported by an independent study: Emerson et al. (2008) found that in only fifteen lines of D. melanogaster from nature there were several hundred duplicate genes segregating as polymorphisms (that is, some individuals had one copy of a gene, some had two or more). They estimated that 2% of the genome was tied up in this copy-number variation. Clearly, there are a lot of duplicate gene variants floating around in nature.

The data of Chen et al., then, show that new genetic information can arise quickly, at least on an evolutionary timescale, and that the new genes rapidly assume new functions. (Note: I am using Behe’s characterization of “new genetic information” as involving only new FCTs. I don’t agree with this, since new genetic information can also arise when a single gene copy changes sufficiently to do something new.)

Although this doesn’t answer the question of what proportion of new evolutionary traits involve changes in gene sequence versus changes in gene regulation, it does show that a substantial part of the genome in one group of eukaryotes arises by the evolution of new FCTs that become involved in new developmental networks. In other words, Behe’s conclusion from short-term lab studies of bacteria and viruses doesn’t apply to this well-studied group of organisms—and probably not to other eukaryotes, either. All the evidence tells us that a rapid and important way to create new genetic information is through the duplication of genes and then their divergence by natural selection.

Poems are made by fools like me
But only selection makes an FCT

Now ID advocates like Behe could—and do—suggest that maybe the successfully duplicated-and-diverged genes didn’t arise by natural selection, but appeared by the instantaneous intervention of the designer (aka God/Jebus). But that idea is nixed by at least two observations. The first is the appearance in many groups of dead, nonfunctional pseudogenes that were unsuccessful duplicates. If the Great Designer made gene duplications to create genetic novelty, he surely failed in the majority of cases, and left his failures sitting around in the genome.

The second is the correlation between the age of a new gene and the type of selection acting on it. If a Great Designer created these duplicates de novo to have a new function (presumably because natural selection couldn’t take a gene to a new function by gradual stepwise evolution), they would show instantaneous changes of DNA sequence that looked like selection, and then an instantaneous cessation of that selection right after the gene got its newly created function. But that’s not what we see. What we see is not instantaneous but gradual change: the younger a gene is (as estimated by the position on the evolutionary tree where it arose), the more rapidly natural selection acts. That directional selection continues to act as the gene gets older, but then slows down and finally becomes purifying selection, so that new DNA changes are eliminated. This pattern is precisely what’s predicted if duplicates arise by accident and then quickly change by selection to assume new functions.

I suppose Behe and his minions will find a way to explain these two patterns by intelligent design, but that’s because ID theory isn’t science: there is no conceivable observation that can prove it wrong. Every bit of data, no matter what it is, can always be fitted into the ID scheme, especially since its advocates allow a little bit of Darwinian evolution and posit an unpredictable and unknowable Designer. But let us not tarnish the nice results of Chen et al. by using them to cast aspersions on ID. They are a valuable contribution to the real science of evolutionary biology, showing how fast new genetic information can arise by gene duplication.

32 Comments

Question: Do the new genes have to be immediately functional, or can they go through a short pseudogene phase as well (with the original gene preserving the original function), before another mutation resurrects them as a functional gene with a new function? Is there a way to tell from the data in this research if this happens? And if so, how long can this phase last?

@ Jerry, I agree with your comment. Yes, Chen et al.’s result is a perfectly applicable answer to Behe’s argument, but it is also just cool research in its own right!

@ Deen, It is awesome that you are asking these questions! To make a long answer short, we’re not entirely sure. There is a nice theoretical framework for addressing such questions. One mechanism proposes that new genes gain new functions through positive Darwinian selection [1] (this is often called neofunctionalization). Another proposes that new genes partition old suites of functions through degenerative mutations [2,3] (this is called subfunctionalization).

Chen et al. 2010 propose that neofunctionalization is the mechanism. Personally, I’m not sure that the two mechanisms are dichotomous. In any event, this is an active area of research and much progress is being made. There are many researchers working in this area, but since Jerry is highlighting work from the Long lab, I’ll give you their lab publication page: http://longlab.uchicago.edu/publication
The titles of the papers might give a flavor for the types of results you see in new gene studies.

And to be completely accurate, I should have made explicit that subfunctionalization and neofunctionalization are mechanisms of preserving duplicates that might otherwise be lost to mutation + genetic drift (assuming that the duplicates are truly redundant in the first place). What happens after preservation is another issue.

From the descriptions of the terms that I’ve found, I’m going to have to agree that subfunctionalization and neofunctionalization are not mutually exclusive. In fact, a quick search for the term “subfunctionalization” on Google found an article that argues that subfunctionalization is a transition state to neofunctionalization. I don’t have the expertise to evaluate the merits of that article, but it does appear that this is an active and interesting area of research that I didn’t know about yet. So thanks again for pointing it out :)

Note: for those who don’t know me, I am in favor of our funding scientific research.

However, much of the public doesn’t understand what is going on; hence blogs like this give me a tool to use; when I hear stuff like what Palin said in the video I can direct people to sites like this blog where they can see what scientists do and the kind of thing that can be learned from studies like this one.

The second is the correlation between the age of a new gene and the type of selection acting on it.
…
But that’s not what we see. What we see is not instantaneous but gradual change: the younger a gene is (as estimated by the position on the evolutionary tree where it arose), the more rapid natural selection acts.

Isn’t the age determined by the changes? Is this a circular definition? It is a bit fuzzy for me.

This suggestion, while wrong, isn’t entirely unreasonable, or at least wasn’t a few years ago. In fact, sequence divergence between copies was used in one of the most cited evolutionary genomics papers [1], though in that case, the authors could use another proxy for age, the rate of change of “silent” sites (the sites in a gene that change the nucleotide but not the protein sequence). For short timescales, using silent sites doesn’t suffer the circularity you suggest, though longer timescales can be problematic, because the same silent sites have been hit multiple times (i.e. “saturated”) and thus no longer serve as a good proxy for time (all you can say is that they are older than the time it takes to saturate silent sites).
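To make “saturation” concrete, here is a small Python sketch using the Jukes-Cantor correction, the simplest textbook model for converting an observed fraction of differing sites into an estimated number of substitutions per site. It is used purely as an illustration; the papers discussed here may well have used more elaborate codon-based methods, and the function name is made up for this example.

```python
import math

def jukes_cantor_distance(p):
    """Estimated substitutions per site from the observed fraction p of differing sites.

    Under the Jukes-Cantor model, p cannot meaningfully reach 0.75: at that
    point the two sequences look like random sequences, the sites are
    saturated, and no finite distance (or age) can be estimated.
    """
    if not 0 <= p < 0.75:
        raise ValueError("sites are saturated (or p is invalid); distance undefined")
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.05, 0.30, 0.60, 0.70, 0.74):
    print(f"observed difference {p:.2f} -> estimated distance {jukes_cantor_distance(p):.2f}")
```

At small p the estimate tracks the observed difference closely, but as p approaches 0.75 tiny changes in the observed difference produce huge swings in the estimate, which is why heavily diverged silent sites make a poor clock.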

However, in 2007, using the rate of change inside the sequence was no longer necessary, as the genomes of 12 Drosophila species were published. Because there are nearly complete* genomic inventories for genomes of varying evolutionary distance from D. melanogaster, when each gene arose can be inferred from where it first appears in the known phylogeny. In this way, new genes can be dated.

This strategy was already applied in 2009 [3,4]. In those two papers, they looked for duplicated genes and determined when they first arose on the phylogeny.
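The dating logic described above can be sketched in a few lines of Python. This is a toy version, not the pipeline from any of the cited papers: the outgroup names and the intermediate split times are placeholders (only the 3 and 35 million year endpoints appear in the post), and it assumes each gene arose once and could later be lost in some lineages.

```python
# Outgroups ordered from closest to most distant relative of D. melanogaster,
# with the time (myr) since each split. PLACEHOLDER names and times, except
# the 3 and 35 myr endpoints mentioned in the post.
OUTGROUPS = [
    ("outgroup_A", 3),
    ("outgroup_B", 10),
    ("outgroup_C", 25),
    ("D. willistoni", 35),
]

def date_gene(present_in):
    """Bracket the age of a D. melanogaster gene as (min_myr, max_myr).

    present_in: set of outgroup names that carry an ortholog of the gene.
    The gene must predate the split from the most distant carrier and
    postdate the split from the next, more distant outgroup.
    max_myr is None when even the most distant outgroup has the gene.
    """
    lower, upper = 0, OUTGROUPS[0][1]
    for idx, (species, split_myr) in enumerate(OUTGROUPS):
        if species in present_in:
            lower = split_myr
            upper = OUTGROUPS[idx + 1][1] if idx + 1 < len(OUTGROUPS) else None
    return lower, upper

print(date_gene(set()))                         # found only in D. melanogaster
print(date_gene({"outgroup_A", "outgroup_B"}))  # shared with the two closest outgroups
print(date_gene({"D. willistoni"}))             # an "old" gene by the post's definition
```

Note that nothing here looks at how much the gene's sequence has changed, so the age estimate is independent of the substitution rate and the circularity worry goes away.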

* On balance, the 12 genomes are pretty good, but they are necessarily still at “draft” stage given how expensive it is to “do it right” and the rapid pace of technological change. In a few years, maybe a decade, I think there is a high probability that getting these genomes to much higher quality will be very cheap and fast. So, there isn’t much of a problem in waiting.

Perhaps. But these genes are the coolest of all. They represent the clearest weakness of Behe’s argument. These de novo genes are examples of new genes arising from non-coding sequences, something that pretty much blows Behe’s thesis out of the water. Since such a high proportion of genomes is composed of gene families, only a relatively few such de novo genes need arise to account for the diversity of genes we see. The rest can be taken care of by duplication, fission, fusion, and exon shuffling, followed by adaptive divergence if useful.

If we indeed have evidence that some new genes come from non-coding DNA, and if some of that non-coding DNA could have originally come from non-preserved, randomly drifted duplicate genes, then I guess this would also be an answer to my question above in #1.

Thank you Jerry.
In fact, I suspect that Drosophila is not the only clade in which new FCTs arise frequently. I believe the rise of novel genes/functions during adaptation is a widespread phenomenon. Good mutations will quickly become fixed in nature.

And actually, Drosophila have unusually small genomes and little junk DNA, presumably meaning there is stronger selection against having extra DNA lying around, unlike in, say, humans and many other animals, where the genomes can be dozens of times larger (but still have about the same number of genes) and be substantially made up of repetitive elements and other junk.

Ok, here’s a question. I’m no genetics boffin and find the topic of evolution bewilderingly complex and fascinating.

We’ve seen examples of remarkable similarity in adaptations with convergent evolution, similar solutions to problems independently arrived at in completely unrelated genetic lines. I think the quote was that the eye or eye-like organs are known to have developed at least 23 times. The one that always struck me is how similar rhinos and triceratops look with the squoochy fat legs and head horns. The other one that amazes me is the similarity of adapting forelimbs to flight between bats, pterosaurs and birds.

So, how likely are these adaptations to be things that are completely new and how many are reserved as genetic tool kits? We see various arctic species of common critters like bears, hares, and foxes that developed seasonal coats. Did all of them evolve that sort of ability separately or are they drawing on a common genetic toolkit way back down the line before they diverged?

The other thing I wonder about are the atavisms we see, throwbacks to earlier points in genetic history of the species. We see some whales with hindlimbs, humans with tails, etc. If a sheer genetic fluke could give a whale a leg, would it be possible to deliberately regress the DNA of an animal to an earlier state, recreating the whale’s ancestral species? Or would that information be lost in the development process and unrecoverable? (Yeah, I know, someone’s going to bring up devolution and crappy Trek episodes. I beat you to it.)

The other thing that really gets me is how quickly certain things go away in the right environment. You wouldn’t think that there would be much of an advantage to getting rid of eyes for cave-dwelling creatures but conversely there wouldn’t be much of an advantage in maintaining them. So what would you call that, a negative selection pressure? Or maybe critters with bad eyes are just as likely to get eaten as critters with good eyes when they’re all in the dark so bad genes are preserved in the population and that’s why they went blind?

* to know if something is convergence or shared ancestry, you have to map the character onto a phylogeny. E.g. it’s very clear bat wings and bird wings are not homologous as wings (and the internal structures are completely different, besides)

* something like seasonal hair loss could be an ancestral trait that was found in the common ancestor of all mammals, for all we know (or at least, for all I know). Again, the research strategy is to map it onto a phylogeny and see if it looks ancestral or independently-derived.

(Also, looking at the structure in detail. Does hair loss typically involve the same genes, the same regulation, etc.? If so the case for common ancestry is stronger.)

Atavisms: probably it won’t work “for real”, but Jack Horner wants to “re-create” dinosaurs by engineering teeth, tails, etc. back into birds. Some of this has been done with chickens. Google “jack horner chickenosaurus”

In general it is hard to predict how much “ancestral” trait is still genetically “close” to being revivable in a living species. But feet, teeth, and tails are fundamental parts of the tetrapod organization. When they are lost, probably this is a fairly minor regulatory change, but many of the fundamental genes involved are maintained for other functions.

There is probably selection *against* things like eyes, when a species moves into a pitch black environment. If nothing else, producing the structure wastes energy if it is not being used.

Here’s the coolest example, ever, of that, the (extinct) reduced-eyes cave duck of Hawaii. No, I didn’t believe it either when I first heard about it:

I suppose that the IDiots will claim that the Great Designer poofed the new genes into existence. If so, He’s not particularly intelligent and also has an inordinate love of fruit flies (in addition to beetles).

By the way, I thought the name of Drosophila melanogaster was going to be changed, moving it into a new genus, or has that proposal been dropped?

Yep, Drosophila is paraphyletic. Some people tried to convince the ICZN to make D. melanogaster the type species, but the proposal was rejected so melanogaster should be placed in the genus Sophophora. Check this and this. Perhaps nobody dares to do it yet? Fear the wrath of geneticists!

Just wanted to say thanks to Jerry Coyne for posting this kind of stuff. Very interesting, and thanks for clarifying and pointing out the false issues that ID tries to fabricate in order to tarnish the well established Theory of Evolution.

I will join in and thank Coyne, Chen et al for the interesting article and commentaries.

Apparently eukaryotes can evolve genes 2 orders of magnitude faster than the whole genome “whole species” approach found that I referenced under the critique of Behe post. Ah, the joy of sex, perhaps.

Now I wonder if DI will jump on the opportunity to complement “Behe and his minions” fitting to data with the broad brush offered here; if eukaryotes evolve readily making Behe’s “FCT”s there is a god-fitting-gap open on prokaryotes. Obviously it is open to facile interpretation that DI’s creator of design was inordinately fond of bacteria and their genes, all the rest is a product of nature. Lord of the Bugs.

So these new genes began as exact duplicates, and “very rapidly” accumulated an average difference of nearly 50%. I suppose that means there was a strong selective force driving the differentiation in small steps … no large saltation. So where did the selection for the first small differentiation come from? It would seem to be a case of what J.J.E. called “subfunctionalization”, the original ‘primary’ form was handling two problems. Since the primary form is quite stable, it must be handling an important function very well, which the unaltered copy continues to handle. The primary form must handle the secondary problem rather poorly, since the evolutionary pressure to alter it is evidently large; the primary solution is adequate but significant gains in efficiency or effectiveness or both are possible.

But if the evolved ‘secondary’ form is disabled, the organism is not viable. Evidently the primary gene is no longer capable of resuming the load… the organism can’t return to the situation prior to the duplication. I suppose that means a whole complex of interacting gene expressions, some metabolic pathway, has evolved, perhaps potentiated by the availability of the mutable copy of the gene.

So I’m picturing a secondary metabolic process which used to be pinned in some tolerable but undesirable form because it overlapped with the primary process, which does not allow alteration to the expressed primary gene. When the secondary gene becomes available, the secondary process migrates quickly. That implies the coevolution of a set of genes.

I suppose I answered my own question. Am I on track here? Anybody still hanging around?

[…] species of Drosophila. Again, Jerry Coyne’s beaten me to it with a detailed review of the paper (New genes arise quickly), but in essence, the authors investigated the gene differences between the species in the D. […]

[…] (new genes), well, musing about the likelihood of such a thing is academic: science has observed the appearance of new genetic material. Third, there is the problem of the human ancestor line which is riddled with holes. From what […]
