L1 retrotransposons play an important role in mammalian genome shaping. In particular, they can transduce their 3'-flanking regions to new genomic loci or produce pseudogenes or retrotranscripts through reverse transcription of different kinds of cellular RNAs. Recently, we found in the human genome an unusual family of chimeric retrotranscripts composed of full-sized copies of U6 small nuclear RNAs fused at their 3' termini with 5'-truncated, 3'-poly(A)-tailed L1s. The chimeras were flanked by 11-21 bp long direct repeats, and contained near their 5' ends T2A4 hexanucleotide motifs, preferably recognized by L1 nicking endonuclease. These features suggest that the chimeras were formed using the L1 integration machinery. Here we report the identification of 81 chimeras consisting of fused DNA copies of different RNAs, including mRNAs of known human genes. Based on their structural features, the chimeras were subdivided into nine distinct families. 5' Parts of the chimeras usually originated from different nuclear RNAs, whereas their 3' parts represented cytoplasmic RNAs: mRNAs, including L1 mRNA and Alu RNA. Some of these chimeric retrotranscripts are expressed in a variety of human tissues. These findings suggest that RNA-RNA recombination during L1 reverse transcription followed by the integration of the recombinants into the host genome is a general event in genome evolution.

To remind everyone, retrogenes are created as follows:

1. mRNA from existing gene --> reverse transcription into cDNA.

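This is carried out by an endogenous reverse transcriptase, such as the one encoded by L1 elements or by endogenous retroviruses. (Alu elements do not encode a reverse transcriptase of their own; they borrow the L1 machinery.)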

2. cDNA is integrated into genome.

This is also carried out by endogenous integration proteins. Again, these can come from endogenous retroviruses, etc.

The result is a gene duplicate that usually has the following attributes:

A. No introns.

B. Truncated at the 5' end.

C. The remnants of a poly-A tail at the 3' end.

D. Flanking repeats.

These attributes make identifying retrogenes relatively easy, and they provide extremely strong evidence that these genes did indeed arise through retrotransposition. Given that they contain a truncation, and are integrated without regulatory sequences, the most common fate for a retroelement of this kind is to be non-functional. However, examples of functional retrogenes exist.
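To make two of the attributes above concrete, here is a minimal Python sketch that checks a candidate insert for a 3' poly-A remnant and for a direct repeat in the flanking DNA. The function names, thresholds, and toy sequences are all assumptions made for illustration; real retrogene screens tolerate mismatches and use alignment against the parent gene, which this sketch does not attempt.

```python
import re

def polya_tail_length(insert_seq, min_len=8):
    """Length of the run of A's at the 3' end, or 0 if shorter than min_len.

    min_len is an arbitrary illustrative threshold, not a published cutoff.
    """
    match = re.search(r"A+$", insert_seq.upper())
    run = len(match.group(0)) if match else 0
    return run if run >= min_len else 0

def flanking_direct_repeat(upstream, downstream, min_len=6, max_len=21):
    """Longest exact repeat that ends the upstream flank and starts the downstream flank.

    Returns an empty string if no repeat of at least min_len bases is found.
    Only perfect matches are considered (an assumption of this sketch).
    """
    up, down = upstream.upper(), downstream.upper()
    for k in range(min(max_len, len(up), len(down)), min_len - 1, -1):
        if up[-k:] == down[:k]:
            return up[-k:]
    return ""

# Hypothetical toy data: a 12-base repeat on both sides of the insert.
upstream_flank = "GGCC" + "AAGTCTAGGCTA"
downstream_flank = "AAGTCTAGGCTA" + "TTGG"
candidate_insert = "ATGGCCTTAG" + "A" * 12

print(polya_tail_length(candidate_insert))
print(flanking_direct_repeat(upstream_flank, downstream_flank))
```

Intron loss and 5' truncation need a comparison against the parent gene, so they are left out here.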

The chimeras observed in the present study are formed when an RNA-RNA hybrid, consisting of two unrelated RNA transcripts, is acted upon by a reverse transcriptase. Here is the proposed mechanism:

Quote

Figure 4. A probable mechanism for the chimeras’ formation. (Step 1) An L1 pre-integration complex binds L1, Alu or the host mRNA in the cytoplasm. (Step 2) The ribonucleoprotein formed is transferred to the nucleus. (Step 3) Reverse transcription of the bound mRNA primed by a genomic DNA single-stranded break within the TTTTAA sequence. (Step 4) Another (nuclear) RNA binds to the L1 reverse transcription/integration complex. (Step 5) Switch of templates for the reverse transcription. (Step 6) The DNA reparation mediated formation of a new chimeric retrogene insertion flanked by short direct repeats and carrying a poly(A) sequence at the 3' terminus.
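A side note on the nick site: the abstract describes a T2A4 motif (TTAAAA) near the 5' end of the chimeras, while the figure caption gives the break site as TTTTAA. These are reverse complements of each other, so they are plausibly the same site read from opposite strands. The short Python sketch below shows the relationship and scans a toy sequence on both strands; the helper names and the example sequence are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def find_motif_both_strands(seq, motif="TTAAAA"):
    """List (position, strand) for exact motif hits on either strand.

    '+' means the motif itself occurs at that position; '-' means its
    reverse complement occurs there, i.e. the motif is on the other strand.
    """
    seq = seq.upper()
    motif_rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == motif_rc:
            hits.append((i, "-"))
    return hits

print(reverse_complement("TTAAAA"))                   # TTTTAA
print(find_motif_both_strands("GGTTAAAACCTTTTAAGG"))  # [(2, '+'), (10, '-')]
```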

The authors found that at least 6 of the chimeric retrogenes that they found (out of 81) are being expressed in human tissues, indicating that they likely have a function. Whether or not the rest have a function isn't known, but the authors only directly tested (via RT-PCR analysis) 6 of the sequences and found that 4 were unexpressed. So these identified retroelements probably contain a mixture of functional and non-functional sequences.

Moreover, the presence of these chimeric retrogenes also provides evidence for common descent, since they have integrated at various times during primate evolution, and yet they follow the standard phylogeny:

Quote

Figure 3. Results of the 12 chimeric retrogenes insertional polymorphism study. The chimeras’ integration times were estimated according to the presence/ absence of the inserts in genomic DNAs of different primate species.

One of the retrogenes is polymorphic within the human population, indicating that it arose since the last common ancestor of all humans.

This is very strong evidence for common descent, along the lines of shared errors in pseudogenes.
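The presence/absence logic behind the figure can be sketched in a few lines of Python. The idea is that an insert found in human and in every lineage out to some point, but in nothing more distant, most simply arose on the branch just before that point. The lineage list, function name, and example data below are assumptions for illustration, not the authors' data, and the sketch ignores complications such as later deletions or incomplete lineage sorting.

```python
# Lineages ordered by increasing divergence from human (illustrative ordering).
LINEAGES = [
    "human", "chimpanzee", "gorilla", "orangutan",
    "gibbon", "Old World monkey", "New World monkey",
]

def date_insertion(present):
    """Given {lineage: bool}, return (most distant carrier, fits_nested_pattern).

    The insert is taken to have arisen in the common ancestor of human and the
    most distant carrier. The pattern fits the phylogeny only if every lineage
    closer to human than that carrier also has the insert.
    """
    if not present["human"]:
        raise ValueError("this sketch assumes the insert is present in human")
    flags = [present[name] for name in LINEAGES]
    deepest = max(i for i, has_insert in enumerate(flags) if has_insert)
    fits = all(flags[: deepest + 1])
    return LINEAGES[deepest], fits

# Hypothetical result: insert shared by human, chimpanzee and gorilla only.
example = {name: name in ("human", "chimpanzee", "gorilla") for name in LINEAGES}
print(date_insertion(example))  # ('gorilla', True)
```

An insert present in human and gorilla but absent in chimpanzee would come back with `False`, which is the kind of pattern that would conflict with the standard phylogeny.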

Ruminant Bcnt protein with a molecular mass of 97 kDa (designated p97Bcnt) includes a region derived from the endonuclease domain of a retrotransposable element RTE-1. Human and mouse Bcnt proteins lack the corresponding region but have a highly conserved 82-amino acid region at the C-terminus that is not present in p97Bcnt. By screening a bovine BAC library, we found two more bcnt-related genes: human-type bcnt (h-type bcnt) and its processed pseudogene. Whereas the pseudogene is localized on chromosome 26, both bcntp97 and the h-type bcnt genes are found on bovine chromosome 18, a synteny region of human chromosome 16 on which human BCNT is localized. Complete nucleotide sequencing of the BAC clone reveals that the bcntp97 and h-type bcnt genes are located just 6 kb apart in a tandem manner. The h-type bcnt and bcntp97 genes are both active at the transcriptional level and the protein level. H-type bovine Bcnt is more like human BCNT than p97Bcnt, when compared at their N-terminal regions. However, phylogenetic analysis using the N-terminal region of the bcnt gene family revealed that the duplication of bovine genes occurred within the bovine lineage with significantly accelerated substitution in bcntp97. This acceleration was not ascribed definitely to positive selection. After duplication, one of the bovine bcnt genes recruited the endonuclease domain of an intronic RTE-1 repeat accompanied by the accelerated substitution at the 5'-ORF, resulting in creation of a novel type of Bcnt protein in bovine.

Gene organization of bovine BCNT that contains a portion corresponding to an endonuclease domain derived from an RTE-1 (Bov-B LINE), non-LTR retrotransposable element: duplication of an intramolecular repeat unit downstream of the truncated RTE-1.

SCAN domain-containing 2 gene (SCAND2) is a novel nuclear protein derived from the zinc finger family by exon shuffling.

Dupuy D, Duperat VG, Arveiler B.

Quote

The SCAN domain is a recently recognized protein domain that characterizes a subfamily of the Kruppel-like zinc finger proteins. We have previously described a novel SCAN domain-containing 2 gene (SCAND2) that does not belong to the zinc finger family. We report structural and sequence analyses of all known members of the SCAN family and use these data to illustrate a model of gene family evolution. Most of the SCAN containing genes share common gene organization features that support the proposed origin for SCAND2 by disruption of an ancestral SCAN-zinc finger gene by a retroposition event and subsequent exon shuffling.

Here's the caption for their model (I can't post the figure itself):

Quote

Fig. 2. Proposed model for SCAND2 gene origin. (A) Insertion of a C1orf12 cDNA in the first intron of a SCAN-containing ancestor gene located in 15q25 allowed a shadow exon from C1orf12 to be spliced downstream of a SCAN encoding exon, thus producing a new protein product. White boxes: exons from C1orf12; Hatched boxes: exons from the ancestral SCAN gene; dotted box: shadow exon from C1orf12. (B) Alignment between C1orf12 and SCAND2 cDNAs. The inserted retroposon contains a consensus acceptor splice site located upstream of the C1orf12 original translation start site. Asterisks indicate differences between the nucleotide sequences.

The purposes of the present paper are threefold. First, a method will be proposed by which the rate of accumulation of genetic information in the process of adaptive evolution may be measured. Secondly, for the first time, an approximate estimate of the actual amount of genetic information in higher animals will be derived which might have been accumulated since the beginning of the Cambrian epoch (500 million years), and thirdly, there is a discussion of problems involved in the storage and transformation of the genetic information thus acquired. There is a vast field of fundamental importance which awaits the fruitful activities of statisticians and other applied mathematicians collaborating with biologists.

I don't recall ever seeing a discussion of Kimura's paper in any antievolution argument on information.

Phylogenetically new insertions of repetitive sequences may contribute to genome evolution by altering the function of pre-existing proteins. One example is the SVA sequence which forms the C-terminal coding exon of the human leptin receptor isoform 219.1. Here, we report that the SVA insertion into the LEPR locus has occurred after divergence of humans and chimpanzees. The SVA element was inserted into a Hal-1/LINE element present in all monkeys and apes tested. Structural features point towards an integration event that was mediated by the L1 protein machinery acting in trans. Thus our findings add evidence to the hypothesis that retrotransposition events are a driving force in genomic evolution and that the presence or absence of specific retroelements are one distinguishing feature that separates humans from chimpanzees.

The definitive demonstration of a role for a recently acquired gene is a difficult task, requiring exhaustive genetic investigations and functional analysis. The situation is indeed much more complicated when facing multicopy gene families, because most or portions of the gene are conserved among the hundred copies of the family. This is the case for the ERVWE1 locus of the human endogenous retrovirus W family (HERV-W), which encodes an envelope glycoprotein (syncytin) likely involved in trophoblast differentiation. Here we describe, in 155 individuals, the positional conservation of this locus and the preservation of the envelope ORF. Sequencing of the critical elements of the ERVWE1 provirus showed a striking conservation among the 48 alleles of 24 individuals, including the LTR elements involved in the transcriptional machinery, the splice sites involved in the maturation of subgenomic Env mRNA, and the Env ORF. The functionality and tissue specificity of the 5' LTR were demonstrated, as well as the fusogenic activity of the envelope polymorphic variants. Such functions were also shown to be preserved in the orthologous loci isolated from chimpanzee, gorilla, orangutan, and gibbon. This functional preservation among humans and during evolution strongly argued for the involvement of this recently acquired retroviral envelope glycoprotein in hominoid placental physiology.

Evolutionary innovation of the excretory system in Caenorhabditis elegans.

Wang X, Chamberlin HM.

Quote

The evolution of complexity relies on changes that result in new gene functions. Here we show that the unique morphological and functional features of the excretory duct cell in C. elegans result from the gain of expression of a single gene. Our results show that innovation can be achieved by altered expression of a transcription factor without coevolution of all target genes.

Evolution of a novel function: nutritive milk in the viviparous cockroach, Diploptera punctata.

Williford A, Stay B, Bhattacharya D.

Quote

Cockroach species show different degrees of maternal contribution to the developing offspring. In this study, we identify a multigene family that encodes water-soluble proteins that are a major component of nutritive "Milk" in the cockroach, Diploptera punctata. This gene family is associated with the evolution of a new trait, viviparity, in which the offspring receive nutrition during the gestation period. Twenty-five distinct Milk complementary DNAs were cloned and partially characterized. These complementary DNAs encode 22 distinct Milk peptides, each of length 171 amino acids, including a 16-amino acid signal peptide sequence. Southern blot analysis confirms the presence of multiple copies of Milk genes in D. punctata. Northern analysis indicates tissue- and stage-specific Milk gene expression. Examination of the deduced amino acid sequences identifies the presence of structurally conserved regions diagnostic of the lipocalin protein family. The shared exon/intron structure of one of the Milk loci with lipocalin genes further supports a close evolutionary relationship between these sequences.

Not long after the first living organisms appeared on earth about 3.5 billion years ago, they started undergoing mutations and adaptations. One of the very earliest of these created two enzymes, each with distinct but related functions, where only one previously existed. Using a combination of the modern techniques of structural biochemistry and protein engineering, combined with molecular phylogeny, the author recreates the story of this very ancient event.

The article is about the origin of two ancient proteins from a common ancestor (enzymes that handle NAD or NADH) and is quite detailed but also written for the nonexpert. I believe Ken Miller has used the technical paper it was based on as an example. It is now freely online at American Scientist.

Adding this also to the EvoWiki page on the evolution of new information.

This is a good review article about the origin of alternative splicing, and provides evidence of non-coding intronic sequences being recruited into coding sequence and translated as part of the protein. In other words, more or less random non-coding DNA being adapted to be part of a functional protein.

Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences.

Kondrashov FA, Koonin EV.

Abstract:

Quote

Alternative splicing is thought to be a major source of functional diversity in animal proteins. We analyzed the evolutionary conservation of proteins encoded by alternatively spliced genes and predicted the ancestral state for 73 cases of alternative splicing (25 insertions and 48 deletions). The amino acid sequences of most of the inserts in proteins produced by alternative splicing are as conserved as the surrounding sequences. Thus, alternative splicing often creates novel isoforms by the insertion of new, functional protein sequences that probably originated from noncoding sequences of introns.

Some relevant text:

Quote

From the evolutionary standpoint, inserted alternative sequences could be expected to be short if they evolved from noncoding sequences because in-frame stop codons are likely to occur in long noncoding sequences. None of the inserted alternative sequences showed significant similarity to any protein sequences except for their counterparts in orthologs from other species. Furthermore, inserted alternative sequences never included more than one exon (Fig. 2). These observations are compatible with the origin of inserts in LDAS [length difference alternative splicing] from noncoding sequences, most likely from a part of the intron separating the adjacent constitutive exons. On four occasions, this was supported by more direct observations whereby the inserted alternative sequence comprised either an entire intron or a portion of intron joining the adjacent exon.

....

Thus, in addition to straightforward exon skipping, a major route for origin of LDAS is insertion of new exons, which encode new, functionally important protein sequences, thus creating functionally distinct isoforms of the respective proteins. Such new exons might in some cases have evolved by tandem duplication of adjacent exons, but more often, they appear to have evolved de novo from noncoding intron sequences. Most of these intron sequences apparently have been recruited to become new coding sequences relatively recently; for example, only in mammals (supplementary material at http://archive.bmn.com/supp/tig/march_Kondrashov_supply.pdf). These observations suggest that evolution of new coding sequences from noncoding ones is an active, ongoing process in eukaryotes. It seems probable that we uncovered only the tip of the proverbial iceberg. Indeed, we detected insertion of new exons that probably originate from intron sequences only for LDAS and only for those cases where yeast and/or prokaryotic orthologs were readily detectable. The actual contribution of intron sequences to the emergence of new protein sequences in eukaryotes is probably substantially greater.
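The authors' point that inserts derived from noncoding sequence should be short, because in-frame stop codons are likely in long noncoding stretches, can be put in rough numbers. The Python sketch below assumes random sequence with all four bases equally likely and independent, so that 3 of the 64 codons are stops. That is a simplifying assumption of mine, not something taken from the paper, and real intron base composition will shift the figures.

```python
def prob_no_stop(n_codons, p_stop=3 / 64):
    """Chance that n_codons consecutive in-frame codons contain no stop codon.

    p_stop = 3/64 assumes uniform, independent bases (TAA, TAG, TGA out of
    64 possible codons); this is an idealisation, not measured intron data.
    """
    return (1 - p_stop) ** n_codons

for n in (10, 20, 50, 100):
    print(n, round(prob_no_stop(n), 4))
```

Under that assumption the chance of staying open falls to about 38% at 20 codons and below 1% at 100 codons, which fits the expectation that inserts recruited from intron sequence will tend to be short.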

Birth and adaptive evolution of a hominoid gene that supports high neurotransmitter flux.

Burki F, Kaessmann H.

Quote

The enzyme glutamate dehydrogenase (GDH) is important for recycling the chief excitatory neurotransmitter, glutamate, during neurotransmission. Human GDH exists in housekeeping and brain-specific isotypes encoded by the genes GLUD1 and GLUD2, respectively. Here we show that GLUD2 originated by retroposition from GLUD1 in the hominoid ancestor less than 23 million years ago. The amino acid changes responsible for the unique brain-specific properties of the enzyme derived from GLUD2 occurred during a period of positive selection after the duplication event.

Origin and neofunctionalization of a Drosophila paternal effect gene essential for zygote viability.

Loppin B, Lepetit D, Dorus S, Couble P, Karr TL.

Quote

Background: Although evolutionary novelty by gene duplication is well established, the origin and maintenance of essential genes that provide entirely new functions (neofunctionalization) is still largely unknown. Drosophila is a good model for the search of genes that are young enough to allow deciphering the molecular details of their evolutionary history. Recent years have seen increased interest in genes specifically required for male fertility because they often evolve rapidly. A special class of genes affecting male fertility, the paternal effect genes, have also become a focus of study to geneticists and reproductive biologists interested in fertilization and sperm-egg interactions. Results: Using molecular genetics and the annotated Drosophila melanogaster genome, we identified CG14251 as the Drosophila paternal effect gene, ms(3)K81 (K81). This assignment was subsequently confirmed by P-element rescue of K81. A search for orthologous K81 sequences revealed that the distribution of K81 is surprisingly restricted to the 9 species comprising the melanogaster subgroup. Phylogenetic analyses indicate that K81 arose through duplication, most likely retroposition, of a ubiquitously expressed gene before the radiation of the melanogaster subgroup, followed by a period of rapid divergence and acquisition of a critical male germline-specific function. Interestingly, K81 has adopted the expression profile of a flanking gene suggesting that transcriptional coregulation may have been important in the neofunctionalization of K81. Conclusion: We present a detailed case history of the origin and evolution of a new essential gene and, in so doing, provide the first molecular identification of a Drosophila paternal effect gene, ms(3)K81 (K81).