An exploration of the raw power of genetic material to refashion itself to any purpose…

Virtually all organisms contain multiple mobile DNAs that can move from place to place, and in some organisms, mobile DNA elements make up a significant portion of the genome. Mobile DNA III provides a comprehensive review of recent research, including findings suggesting the important role that mobile elements play in genome evolution and stability.

Editor-in-Chief Nancy L. Craig assembled a team of multidisciplinary experts to develop this cutting-edge resource that

covers the specific molecular mechanisms involved in recombination, including a detailed structural analysis of the enzymes responsible

presents a detailed account of the many different recombination systems that can rearrange genomes

examines the tremendous impact of mobile DNA in virtually all organisms

Mobile DNA III is valuable as in-depth supplemental reading for upper-level life sciences students and as a reference for investigators exploring new biological systems. Biomedical researchers will find documentation of recent advances in understanding immune-antigen conflict between host and pathogen. It introduces biotechnicians to amazing tools for in vivo control of designer DNAs. It allows specialists to pick and choose advanced reviews of specific elements and to be drawn in by unexpected parallels and contrasts among the elements in diverse organisms.

Mobile DNA III provides the most lucid reviews of these complex topics available anywhere.

“Mobile DNA III provides a one-stop-shop for anyone wanting to keep abreast of current understanding of the activity of these elements and the enzymes that they encode. It will be an invaluable resource for some time to come.”

— David Finnegan, Institute of Cell Biology, University of Edinburgh, King's Buildings.

Nancy L. Craig is an Investigator with the Howard Hughes Medical Institute and a Professor in the Department of Molecular Biology & Genetics at the Johns Hopkins University School of Medicine. Dr. Craig is a member of the National Academy of Sciences and a Fellow of the American Association for the Advancement of Science, the American Academy of Microbiology, and the American Academy of Arts and Sciences.

Introduction

DNA has two critical functions: to provide the cell with the information necessary for macromolecular synthesis and to transmit that information to progeny cells. Genome sequence stability is important for both these functions. Indeed, cells devote significant resources to various DNA repair processes that maintain genome structure and repair alterations that can arise from DNA synthesis errors and assaults from both endogenous and exogenous sources. DNA sequence variation, however, provides the substrate for adaptation, selection, and evolution.

Tyrosine family site-specific recombinases (YRs), named after the active site tyrosine nucleophile they utilize for DNA strand breakage, are widely distributed among prokaryotes. They were thought to be nearly absent among eukaryotes, the budding yeast lineage (Saccharomycetaceae) being an exception in that a subset of its members houses nuclear plasmids that code for YRs (1, 2). However, YR-harboring DIRS and PAT families of retrotransposons and presumed DNA transposons classified as Cryptons have now been identified in a large number of eukaryotes (3, 4). The presence of functional YRs encoded in Archaeal genomes has been established by a combination of comparative genomics and modeling complemented by biochemical and structural analyses (5, 6). Over 1300 YR sequences mined from bacterial genome databases have been organized into families and subfamilies, providing a better understanding of the evolutionary relationships among them (7). These classifications also encourage investigations into the potential functional significance of YRs whose genes are present as pairs or trios in bacterial and plasmid genomes.

The term site-specific recombination encompasses a group of biological processes that, unlike homologous recombination, promote rearrangements of DNA by breaking and rejoining strands at precisely defined sequence positions. In a canonical site-specific recombination event, two discrete sites (sequences of DNA, typically a few tens of base pairs long) are broken, and the ends are reciprocally exchanged and rejoined, resulting in recombinant products (Fig. 1). Site-specific recombination does not require extensive sequence homology; the sites are identified and brought together by protein–DNA and protein–protein interactions involving specialized recombinase proteins, unlike homologous recombination where DNA–DNA interactions define the loci of strand exchange. “Conservative” site-specific recombination systems form recombinants without any requirement for DNA synthesis or high-energy cofactors. Some other biological processes such as transposition are sometimes categorized with site-specific recombination because of common features including cleavage and rejoining of DNA strands at precise positions defined by protein–DNA interactions, but these processes may require DNA synthesis and/or ligase-mediated rejoining of DNA strands. The systems discussed in this chapter conform to the strict “conservative” definition. General aspects of site-specific recombination have been reviewed elsewhere (1, 2, 3).

The λ site-specific recombination pathway has enjoyed the sequential attentions of geneticists, biochemists, and structural biologists for more than 50 years. It has proven to be a rewarding model system of sufficient simplicity to yield a gratifying level of understanding within a single (fortuitously timed) professional career, and of sufficient complexity to engage a small cadre of scientists motivated to peel this onion. The initiating highlight of the genetics phase was the insightful proposal by Allan Campbell for the pathway by which the λ chromosome integrates into, and excises from, the Escherichia coli host chromosome (1). The breakthrough for the biochemical phase was the purification of λ integrase (Int) and the integration host factor (IHF) by Howard Nash (2, 3). The first major step in the structural phase was the cocrystal structure of IHF bound to its DNA target site by Phoebe Rice and Howard Nash (4). Although the crystal structure of naked Fis protein had been determined earlier (5, 6), the full impact of Fis on understanding the fundamentals of the Int reaction did not come until much later (7, 8).

The use of Cre recombinase to carry out conditional mutagenesis of transgenes and insert DNA cassettes into eukaryotic chromosomes is widespread (1–9). Indeed, a PubMed search for “cre recombinase” in early 2014 returned over 4000 articles. In addition to the numerous in vivo and in vitro applications that have been reported since Cre was first shown to function in yeast and mammalian cells nearly 30 years ago (10, 11), the Cre–loxP system has also played an important role in understanding the mechanism of recombination by the tyrosine recombinase family of site-specific recombinases (12–14). The simplicity of this system, requiring only a single recombinase enzyme and short recombination sequences for robust activity in a variety of contexts (15), has been an important factor in both cases. Cre has also been used in experiments designed to understand the functions of other recombination systems (16–18).

Integrons are genetic platforms that allow bacteria to evolve rapidly through the acquisition, stockpiling, excision, and reordering of open reading frames found in mobile elements named cassettes. The evolutionary potency that integrons provide for bacteria is based on the variety of functions encoded in the cassettes, as well as on the intricate coupling of integron activity to bacterial stress (1).

It was Barbara McClintock who first described the problems of segregation arising from the circularity of chromosomes during her studies on maize variegation (1). The importance of this observation, which could have passed as a mere oddity at the time because of the linear nature of chromosomes in Eukaryota, was only realized after the demonstration of the circular nature of the Escherichia coli chromosome by François Jacob and Elie Wollman in the 1960s (2). Since then, the wealth of information gained by genomic studies has shown that circular chromosomes are the norm in Bacteria and Archaea.

Bacteroides spp. are among the more prevalent members of the human colonic microbiota, representing approximately 40% of the bacterial community (1). Bacteroides spp. are normally in symbiosis with their human hosts. Although they are usually harmless members of the gut microbiota, they can become opportunistic pathogens if released from the colon (2, 3). This most commonly occurs due to surgery, trauma or disease such as gangrenous appendicitis or malignancies (4). Among anaerobic bacteria, Bacteroides spp. are the pathogens most commonly isolated from clinical samples, including blood (2). The treatment of Bacteroides infections has become more challenging as they have acquired a variety of genes that encode resistance to antibiotics. In the 1970s, only 20 to 30% of Bacteroides spp. clinical isolates were resistant to tetracycline. By the 1990s, the prevalence of tetracycline resistance had increased to 80% (5). This increase in tetracycline resistance can be attributed to the presence of integrative and conjugative elements (ICEs) that encode antibiotic resistance genes.

Reversible site-specific inversions of DNA segments occur within the genomes of many bacteria and their phages. These reactions are catalyzed by a dedicated recombinase and do not use the general recombination-repair machinery. In some cases, additional host regulatory proteins also perform critical functions, and DNA superstructure can play a profound role. In general, site-specific DNA inversions occur at a low frequency and regulate gene expression by coupling alternative protein coding segments to a fixed promoter or by switching the orientation of a promoter with respect to coding region(s). In this manner, a small subset of the population becomes preadapted to a potential change in the environment.

The serine resolvases are a group of recombinases that, in their native contexts, resolve large fused replicons into smaller separated ones (1). Serine resolvases and the closely related invertases were the first serine recombinases to be studied in detail, and much of our understanding of the serine recombinase mechanism is owed to those early studies. Resolvases and invertases have also served as paradigms for understanding how DNA topology can be harnessed to regulate recombination reactions (2–5). Like other serine recombinases, the resolvases have a largely modular structure. In the resolvase case, the conserved catalytic domain is followed by a DNA-binding domain that is a simple helix-turn-helix similar to that found in many prokaryotic repressors (6). This modularity, combined with a wealth of structural and biochemical data, has made them good targets for engineering chimeric recombinases with designer sequence specificity (7, 8).

Conservative site-specific recombination systems are ubiquitous in bacteria, where they play important roles in the horizontal transfer of genetic information, genome stability, and the control of gene expression. The outcomes of site-specific recombination are DNA integration, DNA excision (sometimes referred to as resolution), and DNA inversion. The systems comprise a recombinase, the sequence specific DNA substrates, and any accessory factors that are required for control. There are two evolutionarily and mechanistically different families of site-specific recombinases, the tyrosine and serine recombinases (1). All types of recombination outcomes are mediated by recombinases from both families. In the serine recombinase family there is a clear division between the resolvase/invertases and the enzymes that mediate integration/excision. The resolvase/invertases are approximately 180 to 200 amino acid proteins and are increasingly referred to as the small serine recombinases. The (pro)phage-encoded serine integrases, the transposases, such as those from clostridial ICE elements, Tn4451 and Tn5397, and the recombinases from the staphylococcal cassette chromosomes (SCC) elements are between 400 and 700 amino acids and are collectively known as the large serine recombinases (LSRs) (2). The first LSRs to be studied in in vitro recombination systems were the integrases from the Streptomyces phage, ɸC31 (3) and mycobacteriophage Bxb1 (4). Understanding the mechanism of the LSRs, however, was greatly hindered by the lack of structural information. In a breakthrough paper, Rutherford et al. published the structure of the large C-terminal domain (CTD) of a serine integrase bound to one half site of one substrate (5). This work has led to a step change in our understanding of the mechanism of the serine integrases (6, 7).
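As a rough illustration of how the outcome of recombination depends on the relative orientation of two sites on the same molecule, the following minimal sketch models a DNA molecule as a list of labeled, oriented segments. It assumes two identical sites (real systems such as the serine integrases use non-identical sites and produce hybrid products), it covers only excision and inversion (integration is the reverse of excision and is not modeled), and the function name and token representation are hypothetical conveniences rather than anything taken from the chapter.

```python
def recombine(dna, i, j):
    """Recombine the two sites at indexes i < j of a linear molecule.

    dna is a list of (label, strand) tokens, with strand +1 or -1.
    Assumption: both sites are identical, so products are not hybrid sites.
    Returns (outcome, linear_product, circular_product_or_None).
    """
    if not 0 <= i < j < len(dna):
        raise ValueError("need two distinct site indexes with i < j")
    strand_i, strand_j = dna[i][1], dna[j][1]
    between = dna[i + 1:j]
    if strand_i == strand_j:
        # Sites in direct orientation: the intervening DNA is excised
        # as a circle that carries one site; one site stays behind.
        linear = dna[:i + 1] + dna[j + 1:]
        circle = between + [dna[j]]
        return "excision", linear, circle
    # Sites in inverted orientation: the intervening DNA is flipped,
    # so its order is reversed and each segment changes strand.
    flipped = [(label, -strand) for label, strand in reversed(between)]
    return "inversion", dna[:i + 1] + flipped + dna[j:], None


if __name__ == "__main__":
    direct = [("a", 1), ("site", 1), ("b", 1), ("site", 1), ("c", 1)]
    inverted = [("a", 1), ("site", 1), ("b", 1), ("site", -1), ("c", 1)]
    print(recombine(direct, 1, 3))
    print(recombine(inverted, 1, 3))
```

The sketch deliberately leaves out accessory factors and topology, which the chapters treat as central to control of real reactions.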

Hairpin telomere resolvases (also known as protelomerases) have emerged as a unique solution to the end replication problem (1, 2). These enzymes promote the formation of covalently closed hairpin ends on linear DNA molecules in some phage (3, 4, 5), bacterial plasmids and bacterial chromosomes (6, 7, 8, 9). Telomere resolvases are mechanistically related to tyrosine recombinases and type IB topoisomerases and are also believed to play a role in the genome plasticity that characterizes Borrelia species. Fig. 1 shows the reaction pathway for replication of linear DNA molecules with covalently closed hairpin telomeres. Duplication of the DNA molecule results in replicated telomeres (rTel, also referred to as dimer junctions) that are recognized and processed in a DNA breakage and reunion reaction promoted by a hairpin telomere resolvase. The reaction products are covalently closed hairpin telomeres at both ends of linear monomeric DNA molecules. At this writing telomere resolvases have been purified from three phage and seven bacterial species: E. coli phage N15 (3), Klebsiella oxytoca phage ɸKO2, Yersinia enterocolitica phage PY54 (5), Agrobacterium tumefaciens (8), the Lyme spirochete Borrelia burgdorferi (6), the relapsing fever borreliae B. hermsii, B. parkeri, B. recurrentis, B. turicatae, and the avian spirochete B. anserina (7). The B. burgdorferi enzyme, ResT (Resolvase of Telomeres) has been the most extensively studied at the biochemical level (6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23) and is the primary focus of this review, with properties of the other enzymes noted (3, 4, 5, 8, 24). Structural studies of the Klebsiella phage ɸKO2 (25) and the Agrobacterium (26) resolvases have been reported and have shed additional light on reaction mechanisms and on differences between the resolvases from different organisms.

In the early 1980s, conjugative transposons were defined as large DNA segments of bacterial chromosomes capable of “intercellular transposition,” i.e., fragments able to move from the chromosome of a donor bacterium to the chromosome of a recipient bacterium during cell-to-cell contact. All these mobile genetic elements were found in pathogenic low GC Gram-positive bacteria, conferred antibiotic resistance properties, and were often capable of integrating into a large array of different sites (for review, see references 1, 2, 3, 4). Characterization of the molecular mechanism allowing integration into and excision from the chromosome revealed that conjugative transposons such as Tn916 do not encode a DDE transposase, but rather a site-specific tyrosine recombinase. Fundamental differences in the molecular mechanism of DNA strand exchanges catalyzed by transposases and site-specific tyrosine recombinases, and subsequent identification of conjugative mobile elements integrating into a unique site of the bacterial chromosome in both Gram-positive and Gram-negative bacteria, exposed the inadequacy of the name “conjugative transposons.” In fact, at the time the confusion in the scientific community was such that, in some instances, related elements were mislabeled as conjugative plasmids, R factors, or integrating conjugative plasmids (5). Two names were proposed to replace the obsolete term: constin, an acronym that stands for conjugative, self-transmissible, integrating element, and ICE, an acronym for integrative and conjugative element (5, 6). Over the years the term ICE gained broader acceptance among many authors to describe elements found in both Gram-positive and Gram-negative bacteria, so this term is used hereafter instead of conjugative transposon.

Programmed Rearrangements

The realization, now more than half a century ago, that B cells can generate antibodies to an astounding variety of chemical structures sparked intense interest in the “generation of diversity” question (reviewed in reference 1). The correct solution to this puzzle turned out to be both surprising and simple: the exons encoding the antigen-binding portions of the receptor (the so-called variable regions) are assembled by chromosomal breakage and rejoining in developing lymphocytes (2). Immunoglobulins and T-cell receptors are composed of two polypeptide chains, each of which contributes to the antigen-binding domain. The exons encoding the antigen-binding domains are assembled from so-called V (variable), D (diversity), and J (joining) gene segments by “cut and paste” DNA rearrangements. This process, termed V(D)J recombination, chooses a pair of segments, introduces double-strand breaks adjacent to each segment, deletes (or, in selected cases, inverts) the intervening DNA, and ligates the segments together (Fig. 1). Rearrangements occur in an ordered fashion, with D-to-J joining proceeding before a V segment is joined to the rearranged D-J segments. This process of combinatorial assembly—choosing one segment of each type from several (sometimes many) possibilities—is the fundamental engine driving antigen-receptor diversity in mammals. Diversity is tremendously amplified by the characteristic variability at the junctions (loss or gain of small numbers of nucleotides) between the various segments. This process leverages a relatively small investment in germline coding capacity into an almost limitless repertoire of potential antigen-binding specificities.
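The arithmetic behind combinatorial assembly can be made concrete with a minimal sketch. The segment labels and counts below are hypothetical placeholders, not the sizes of any real locus, and the sketch ignores junctional variability, which the text notes amplifies diversity much further.

```python
import itertools
import random

# Hypothetical segment labels; real loci differ in number and naming.
V_SEGMENTS = ["V1", "V2", "V3", "V4"]
D_SEGMENTS = ["D1", "D2"]
J_SEGMENTS = ["J1", "J2", "J3"]


def combinatorial_repertoire(v_segments, d_segments, j_segments):
    """Enumerate every V-D-J combination (junctional variability ignored)."""
    return list(itertools.product(v_segments, d_segments, j_segments))


def rearrange(v_segments, d_segments, j_segments, rng):
    """Pick one segment of each type in the ordered fashion described above."""
    # Step 1: D-to-J joining.
    d = rng.choice(d_segments)
    j = rng.choice(j_segments)
    # Step 2: a V segment is joined to the rearranged D-J unit.
    v = rng.choice(v_segments)
    return (v, d, j)


if __name__ == "__main__":
    repertoire = combinatorial_repertoire(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS)
    print(len(repertoire))  # number of V x D x J combinations
    print(rearrange(V_SEGMENTS, D_SEGMENTS, J_SEGMENTS, random.Random(0)))
```

Because the repertoire size is the product of the segment counts, adding a single segment of any one type multiplies the number of possible combinations rather than adding to it.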

The B cell receptor (BCR) is expressed on the B lymphocyte cell surface where it serves as a receptor for foreign antigens (1). The BCR is composed of two immunoglobulin (Ig) heavy (IgH) chains encoded by the IgH heavy chain locus and two Ig light (IgL) chains encoded by, for a given BCR, either the Igκ or Igλ (collectively referred to as IgL) light chain loci (Fig. 1). These three Ig loci lie on different chromosomes in both humans and mice. While there are certain differences in organization, the overall strategies for Ig gene diversification in mice and humans are very much the same (2, 3), so this review will focus mainly on the mouse. The amino-terminal portions of the IgH and IgL chains have a highly variable amino acid sequence from species to species of antibody and are called variable (V) regions. The IgH and IgL variable regions interact to generate the antigen-binding portion of the BCR/antibody. The carboxy-terminal ends of IgH and IgL chains have only a few variations in their sequences and thus are called constant (C) regions.

This chapter reviews recent studies on the remarkable phenomenon of programmed DNA rearrangements in ciliated protozoa, focusing primarily on the species Tetrahymena thermophila. The phenomenon occurs widely among ciliates, a diverse group of single-celled eukaryotes. It varies significantly in mechanistic detail among the species that have been described, chiefly Tetrahymena, Paramecium, Euplotes, Stylonychia, and Oxytricha. Readers are referred to other chapters in this volume for studies in Paramecium and Oxytricha. Tetrahymena displays perhaps the simplest version of these DNA rearrangements and is the easiest to grow and manipulate in the laboratory, hence offering excellent opportunities for in-depth understanding. Since the publication of Mobile DNA II (1), significant progress has brought about fundamental changes in our understanding. Among other things, clear links have now been established between these processes and RNA interference (RNAi) and transposon domestication. This chapter will concentrate on progress made during this period, with a brief summary of earlier work to provide an introduction.

Ciliates belong to a monophyletic group of unicellular eukaryotes within the Alveolate branch (1). The species that have been used as model organisms are free-living organisms, but parasitic or endosymbiotic ciliates have also been characterized (2, 3). A handful of ciliates have been studied, and common features could be deduced (reviewed in 4). They all carry motile and sensory cilia at the cell surface that allow swimming, food uptake, and the sensing of environmental signals. They present a characteristic nuclear dimorphism and undergo spectacular, genome-wide programmed rearrangements during development. The study of the mechanisms and regulation pathways underlying genome rearrangements has revealed a great diversity in the strategies used by different ciliates (5). The present chapter will focus on Paramecium, a widespread group of species that can be found on all continents. The sequences of the somatic genomes of two species, Paramecium tetraurelia and Paramecium caudatum, have been published recently (6, 7, 8). P. tetraurelia, which belongs to the Paramecium aurelia group of sibling species (9), is by far the most extensively studied species at the genomic level, and will be the main focus of the present chapter.

Ciliates are microbial eukaryotes with separate germline and somatic nuclei. The DNA-rich somatic macronucleus forms by differentiation of a copy of the diploid, zygotic germline micronucleus during sexual reproduction. The distinctive genome architectures of ciliates make them attractive model systems to study a wide range of key biological phenomena. These include complex genome rearrangements on a massive scale, a diverse range of noncoding RNA pathways, and several examples of non-Mendelian inheritance. In particular, ciliates belonging to the subclass Stichotrichia, such as the genus Oxytricha, display the most exaggerated form of genome remodeling, stitching together somatic chromosomes from precursor gene segments, all under the epigenetic control of novel noncoding RNA pathways.

One of the most powerful drivers of evolutionary change is the process of adaptation and counter-adaptation by interacting species (1). The so-called “arms race” between parasites and their hosts is a prime example of such reciprocal coevolution: host adaptations that reduce or attempt to remove parasites select for parasite adaptations that enable evasion of host defences. Elaborate, powerful and sometimes elegant mechanisms of host immunity and parasite infectivity are thought to have arisen from many iterations of this process. A case in point is the mammalian adaptive immune system, perhaps one of the more complex host defence mechanisms detailed to date, which uses directed DNA rearrangements, mutagenesis and selection during the development of T and B immune cells to generate vast numbers of genes encoding immunoglobulin receptors capable of recognizing the huge range of antigens in infecting pathogens (2). Parasites, on the other hand, have evolved various means of evading adaptive immunity. One such mechanism of immune evasion that is widely recorded among viruses and bacterial and eukaryotic pathogens is antigenic variation. Because parasite killing often depends on a match between circulating host immunity and parasite antigen, individual parasites that no longer express that antigen variant, but instead express an antigenically different variant in its place, survive and can proliferate. However, this advantage tends to be short-lived because immune responses will develop against the different antigen in turn. Hence, members of parasite lineages inhabiting an immunocompetent host are repeatedly being selected for antigenic novelty over the course of infection.

Antigenic variation is of great importance for the success and survival of various pathogens ranging from trypanosomes to bacteria, fungi, and the focus of this paper, Plasmodium falciparum, the most virulent of the human malaria parasites (1). For each pathogen, the pressure to diversify surface proteins exposed to the immune system is counterbalanced by the need to preserve function, which in the case of P. falciparum is the maintenance of binding capacity to receptors on vasculature endothelial cells (2). Each pathogen has developed a systematic method to diversify surface proteins while balancing these strong but opposing selection pressures. This typically involves the generation of DNA sequence modifications to the genes that encode the surface proteins in ways that generate diversity without compromising function. These changes are created using the particular complement of DNA recombination and repair pathways present within the pathogen. Due to the critical nature of maintaining DNA integrity, DNA repair pathways are highly conserved across species from bacteria to mammals and components of most pathways can be readily identified in various organisms (3), making DNA recombination/repair a subject of interest both for evolutionary biologists and for those interested in host–pathogen interactions.

The majority of species in the genus Neisseria are commensal bacteria that colonize mucosal surfaces. The two pathogenic species, Neisseria gonorrhoeae (the gonococcus) and Neisseria meningitidis (the meningococcus), are the causative agent of gonorrhea and the primary cause of bacterial meningitis in young adults, respectively. Both organisms are strict human pathogens with no known environmental reservoirs that have evolved from commensal organisms within the human population (1). The study of the Neisseria is important for public health reasons, but also provides a defined system to study evolution of two highly related organisms that cause distinct diseases. One unique aspect of the pathogenic Neisseria is the presence of sophisticated genetic systems that contribute to pathogenesis. The processes of DNA transformation and pilin antigenic variation will be discussed in this chapter.

Antigenic variation is defined as a hereditable, reversible variation in an antigenic structure that occurs during the course of infection at a rate higher than would be expected for standard recombination or mutation mechanisms. Many bacterial and protozoal pathogens have developed antigenic variation systems in which surface antigens can be continually altered as a means of evading the constant onslaught of adaptive antibody and T cell responses (1). In 1997, an elaborate antigenic variation system was identified in Borrelia burgdorferi B31 (2). Because of sequence similarity between this system and the previously characterized variable major protein (VMP) system of relapsing fever bacteria, it was termed the VMP-like sequence (vls) locus. Its expression site, called vls Expressed (vlsE), undergoes remarkable sequence variation involving segmental gene conversion events from vls silent cassettes. This review describes what is currently known about the structure, properties, role in host–pathogen interactions, recombination process and evolution of the vls system.

The budding yeast Saccharomyces cerevisiae propagates vegetatively either as MATa or MATα haploids or as MATa/MATα diploids created by conjugation of the opposite haploid types (Fig. 1). Mating type is determined by two different alleles of the mating-type (MAT) locus.

Cells of the highly diverged Schizosaccharomyces (S.) pombe and S. japonicus fission yeasts exist in one of the two sex/mating types, called P (for plus) or M (for minus), specified by which allele, M or P, resides at mat1 (Fig. 1). The fission yeasts have evolved an elegant mechanism for switching P or M information at mat1 by a programmed DNA recombination event with a copy of one of the two silent mating-type genes residing nearby in the genome. The switching process is highly cell-cycle and generation dependent such that only one of four grandchildren of a cell switches mating type, and switching occurs in nearly half the cells of a population. Such a change of cell type is analogous to the stem-cell division found in higher eukaryotes whereby sister cells differ in their fate. Extensive studies of fission yeast established the natural DNA strand chirality at the mat1 locus as the primary basis of asymmetric cell division. This asymmetry results from a unique site- and strand-specific epigenetic “imprint” at mat1 installed in one of the two chromatids during DNA replication. The imprint is inherited by only one daughter cell, maintained for one cell cycle, and then used for initiating recombination during mat1 replication. The progression through two replication cycles and two cell divisions leads to the “one-in-four” switching proportion among granddaughter cells. This mechanism of cell-type switching is considered to be unique to these two organisms, but determining the operation of such a mechanism in other organisms has not been possible for technical reasons. Thus, the validity of this mechanism for development in general remains untested. This review summarizes recent exciting developments in understanding the mechanism of mat1 switching in fission yeasts and extends these observations to suggest how such a DNA strand-based mechanism of cellular differentiation could also operate in diploid organisms. 
Although the analogous cell-type switching found in the Saccharomyces (Sa.) cerevisiae budding yeast by HO-endonuclease cleavage of the mating-type locus appears superficially similar to that of fission yeast, the mechanistic details are very different in these organisms. Thus, studies with diverse single-celled model yeast organisms have been helpful to appreciate how different paradigms of cellular differentiation have evolved. Considering that the ultimate basis of cellular differentiation in yeast is the double-helical structure of DNA, it is likely that such a mechanism operates in higher eukaryotes as well.

DNA-only Transposons

In this chapter, we provide an overview of the fundamental concepts of DNA transposition mechanisms. Our aim is to emphasize basic themes and, in this effort, we will focus on specific illustrative cases rather than attempt an exhaustive review of the literature. We hope that the selected references will point the curious reader towards the landmark studies in the field as well as some of the most exciting recent results. We also direct the reader to other recent reviews (1–3).

We have divided this review into two major sections. In one, we have attempted to present an overview of our current understanding of prokaryotic insertion sequences (IS), their diversity in sequence, in organization and in mechanism, their distribution and impact on their host genome, and their relation to their eukaryotic cousins. We discuss several IS-related transposable elements (TE) which have been identified since the previous edition of Mobile DNA. These include IS that use single-strand DNA intermediates and their related “domesticated” relations, insertion sequences with a common region (ISCR), and integrative conjugative elements (ICE), which use IS-related transposases (Tpases) for excision and integration. Several more specialized chapters in this volume include additional detailed information concerning a number of these topics. One of the major conclusions from this section is that the frontiers between the different types of TE are becoming less clear as more are identified. In the second part, we have provided a detailed description of the expanding variety of IS, which we have divided into families for convenience. We emphasize that there is no “quantitative” measure of the weight of each of the criteria we use to define a family. Our perception of these families continues to evolve and families emerge regularly as more IS are added. This section is designed as an aid and a source of information for consultation by interested specialist readers.

The bacterial insertion sequence, IS911, is a member of the large IS3 family. It transposes using a mechanism known as Copy-out–Paste-in. This is a major transposition pathway as judged by the number of transposable elements that use it. This pathway has not only been demonstrated to apply to various other members of the IS3 insertion sequence family, IS2 (1), IS3 (2), and IS150 (3), but has also been adopted by members of at least seven other large IS families: IS1, IS21, IS30, IS256, IS110, ISLre2, ISL3, and their derivatives (see Siguier et al., this volume).

Members of the widespread IS200/IS605 bacterial insertion sequence (IS) family transpose using obligatory single-strand (ss) DNA intermediates. This distinguishes them from classical IS, which move via double-strand (ds) DNA intermediates (see Siguier et al., this volume). Members of this family also differ fundamentally from classical IS in their organization. They carry subterminal palindromic structures instead of inverted repeats at their ends (Figure 1A) and insert 3′ to specific AT-rich tetra- or penta-nucleotides without duplicating the target site. Importantly, the transposase, TnpA, does not share characteristics of the “DDE” enzymes of classical IS. It is a member of the “HuH” superfamily of enzymes including relaxases, Rep proteins of rolling circle replication (RCR) plasmids/single-stranded phages, bacterial and eukaryotic transposases of IS91/ISCR, and helitrons (see Thomas and Pritham, this volume) (1), which all catalyze cleavage and rejoining of ssDNA substrates. IS200, the founding member, was identified 30 years ago in Salmonella typhimurium (2) but there has been renewed interest in these elements since the identification of the IS605 group in Helicobacter pylori (3, 4). Studies of two elements of this group, IS608 from H. pylori and ISDra2 from the radiation-resistant Deinococcus radiodurans, have provided a detailed picture of their mobility (5–10).

Tn10 and Tn5 are composite bacterial transposons (Fig. 1). Both these transposons, as well as their respective IS elements (IS10 and IS50), transpose by a nonreplicative cut-and-paste mechanism. Tn10/IS10 was the first bacterial transposon shown to transpose by the cut-and-paste mechanism and as such provided an early model for this mode of transposition (1, 2). In cut-and-paste transposition the transposon is first excised from flanking donor DNA by a pair of transposase-catalyzed double-strand breaks, one at each transposon end, after which the excised transposon is inserted into a target site. Host repair of the transposon-target DNA junction completes the transposition process and leaves a characteristic target-site duplication.
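
The excision, insertion, and repair steps described above can be sketched as a toy string model. This is only an illustration, not a description of the enzymology: the function name, the example sequences, and the 9-bp duplication length are assumptions that are not stated in the text above, and the repair step is reduced to copying the bases that lie between the staggered target cuts.

```python
def cut_and_paste(donor, start, end, target, site, tsd_len=9):
    """Toy model of nonreplicative cut-and-paste transposition.

    donor[start:end] is the transposon; site is the 0-based position in
    target where the staggered cut begins. tsd_len=9 is an assumed
    duplication length, not a value taken from the text above.
    """
    if not 0 <= start < end <= len(donor):
        raise ValueError("transposon coordinates fall outside the donor")
    if not 0 <= site <= len(target) - tsd_len:
        raise ValueError("target site too close to the end of the target")

    # Excision: a double-strand break at each transposon end releases
    # the element and leaves two separate donor flanks.
    transposon = donor[start:end]
    donor_flanks = (donor[:start], donor[end:])

    # Insertion and repair: the bases between the staggered target cuts
    # end up on both sides of the element (the target-site duplication).
    duplicated = target[site:site + tsd_len]
    product = (target[:site] + duplicated + transposon
               + duplicated + target[site + tsd_len:])
    return transposon, donor_flanks, product


# Hypothetical sequences: lower case is flanking DNA, upper case the element.
donor = "aaaa" + "TNSEQ" + "cccc"
target = "ggattcagcatcgatt"
transposon, flanks, product = cut_and_paste(donor, 4, 9, target, site=3)
print(product)
```

In this model the product is longer than the target by the length of the element plus one copy of the duplicated bases, which is the signature that distinguishes a repaired insertion from a simple ligation.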

The bacterial transposon Tn7 is distinguished by the levels of control it displays over when and where it directs transposition and its capacity to utilize different kinds of target sites. Over the 10 years since the second edition of Mobile DNA there have been many advances in our understanding of Tn7 (1). This chapter focuses on new findings since the previous edition and on areas not covered in other review articles on Tn7 (2, 3, 4). One significant finding over the past 10 years is the appreciation of the dissemination of Tn7, and related elements called Tn7-like elements that contain homologs of the Tn7 transposition proteins, in highly diverged bacteria adapted to a remarkable number of different environments (3, 5, 6). The success of these elements very likely stems from the control they have over the targets they select. The well-studied canonical Tn7 element stands as an important model system for understanding the regulation of transposition and provides insight into how Tn7-like elements and more-distantly related elements may function.

Transposable phage Mu has played a historic role in the development of the mobile DNA element field (1). The very first paper that christened this phage after its mutator properties (2) also drew attention to its ability to suppress the phenotypic expression of genes, and suggested that Mu resembled the “controlling elements” postulated by Barbara McClintock to regulate the mosaic color patterns of maize seeds (3). This bold postulate inspired equally insightful early experiments aimed at investigating its mobile properties (4, 5), and led to an influential model for transposition (6), which correctly predicted the cutting and joining steps of the Mu transposition reaction and their attendant DNA rearrangements. The high efficiency of the Mu reaction was responsible for the development of the first in vitro transposition system (7), which was critical for dissecting reaction chemistry as well as the function of several participating proteins (see references 8 and 9). This article focuses on the major developments in Mu transposition since this topic was last reviewed in Mobile DNA II, providing background information as necessary (9).

The ampicillin-resistance transposon Tn3 is the archetype (“Tn3” being synonymous with “Tn1” or “Tn2”; (1)) of a large and widespread family of transposons with representatives in nearly all bacterial phyla including proteobacteria, firmicutes, and cyanobacteria. Family members are modular platforms allowing assembly, diversification, and redistribution of an ever-growing arsenal of antimicrobial resistance genes, thereby contributing, along with other mobile genetic elements, to the emergence of multi-drug resistances at a rate that challenges the development of new treatments (2–4). They are also prevalent in horizontal transfer of large catabolic operons, allowing bacteria to metabolize various families of compounds, including industrial xenobiotic pollutants (5, 6).

P transposable elements are among the best-studied eukaryotic mobile DNA elements in metazoans. These elements were initially discovered in the late 1960s because they cause a syndrome of genetic traits termed hybrid dysgenesis (1). The molecular cloning and biochemical characterization of the P element transposition reaction have led to general insights regarding eukaryotic cut-and-paste transposition. P elements have also facilitated many applications as genetic tools for molecular genetics in Drosophila.

The mariner elements belong to the ITm superfamily of cut-and-paste DNA transposons. The acronym is derived from the IS630, Tc1, and mariner elements, which represent three major divisions within the grouping. The first member of the superfamily to be documented was Tc1 in Caenorhabditis elegans in 1983 (1). A few years later, Mos1 and IS630 were identified in Drosophila mauritiana and the bacterium Shigella sonnei, respectively (2, 3). A steady stream of ITm elements entered the literature in subsequent years. However, these were the tip of an iceberg, and the depth and breadth of their phylogenetic distribution did not start to become apparent until 1993, when PCR experiments using mariner-specific primers revealed their presence in seven orders of insects (4). We now know that ITm is probably the most widespread superfamily of transposons in nature and that its members are present in all branches of the tree of life (Fig. 1) (5).

hAT transposable elements are class II DNA transposons that are ancient in their origin. They are widespread across the plant and animal kingdoms and are found in all eukaryotes with the exception of ciliates, diatoms, and the protozoan Trichomonas (1). A survey of eight dicotyledons from five angiosperm families and eight monocotyledons from two angiosperm families revealed that hAT elements comprised approximately 0.31% of the dicotyledon genomes (representing 6.4% of the total genomic DNA transposons) and 0.46% of the monocotyledon genomes (representing 8.2% of the total genomic DNA transposons) (2). This low abundance is countered by their apparent impact on angiosperm evolution and adaptation, in which they have been estimated to contribute to approximately 20% of 65 examples of transposon-mediated alterations to gene function or creation (2). They comprise the most abundant superfamily of class II transposons found in humans, yet no active forms have been found in our species to date. Despite being very ancient in origin, phylogenetic trees constructed from the amino acid sequences of their transposases are often not completely congruent with those arising from sequence comparison of their chromosomal genes, suggesting that other factors, such as horizontal transfer, may have played a role in the current distribution of these transposons.

A distinguishing feature of transposable elements (TEs) is their propensity to induce mutations. Among the most mutagenic of all TEs are the Mutator elements of maize. Lines carrying large numbers of these elements can exhibit mutation frequencies 50 to 100 times that of background (1, 2). This is due to a very high transposition frequency, which can exceed one new insertion per element per generation (3, 4), as well as a propensity to insert into or near genes (5). Because they are so mutagenic, Mutator elements have been very useful in both forward and reverse genetic screens in maize and recent high-throughput methodologies have only made the system more so (6). However, in addition to its utility, the Mutator system has also provided important clues as to the consequences of unrestrained TE activity, and the means by which active TEs are controlled by their host. This chapter will provide a review of the biology, regulation, evolution and uses of this remarkable transposon system, with an emphasis on recent developments in our understanding of the ways in which this TE system is recognized and epigenetically silenced.

In 1965, Atchison and colleagues observed a small (25 nm) contaminant particle within electron micrographs of their adenovirus preparations (1). These contaminants were purified from adenovirus and applied to cells. They were shown to be nonautonomous, as particle production also required adenovirus coinfection (1). These defective, small particles were therefore named adeno-associated virus (AAV) which, even to this day (over 50 years later), remains one of the smallest viruses known to man. Consequently, AAV is a very simple virus with a protein capsid composed of 60 capsid subunits, and a ∼4.7-kb single-stranded linear DNA genome that is framed by inverted terminal repeat sequences (ITRs) (2). Both polarities of the single-stranded genome are individually packaged at similar efficiencies (3). There are three AAV genes identified to date, which collectively mediate genome replication, site-specific integration, capsid production, and genome packaging (4–8). Twelve natural serotypes of AAV have been reported with many additional variants; however, the last 30 years have seen the most work done with AAV serotype 2 because of its amenability to cell culture.

Members of the Tc1/mariner superfamily are probably the most widespread DNA transposons in nature (1). However, these elements appear to be transpositionally inactive in vertebrates due to the accumulation of mutations. In an attempt to isolate potentially active copies, we surveyed a number of fish genomes for the presence of Tc1-like elements from 11 different species. In summary, all the Tc1-like elements that we (2) and others (3, 4) described from the different fish species were defective copies carrying inactivating mutations that accumulated over long evolutionary times. Nevertheless, careful sequence analysis allowed us to predict a consensus sequence that would likely represent an active archetypal sequence. We have engineered this sequence by eliminating the inactivating mutations from the transposase open reading frame. The resurrected synthetic transposon was named Sleeping Beauty (SB), in analogy to the Grimm brothers’ famous fairy tale. SB may be identical or closely related to an ancient transposon that once successfully invaded several fish genomes, in part by horizontal transmission between species (5). The resurrection of SB was the first demonstration that ancient transposable elements can be brought back to life. Before this work was published in 1997, there was no indication that any DNA-based transposon was active in vertebrates. SB not only represents the first DNA-based transposon ever shown to be active in cells of vertebrates, but also the first functional gene ever reconstructed from inactive, ancient genetic material, for which an active, naturally occurring copy either does not exist or has not yet been isolated.

The piggyBac transposon superfamily was recognized relatively recently. The original piggyBac transposon was isolated from the genome of the cabbage looper moth, Trichoplusia ni, in the 1980s. However, the second member of the piggyBac-like element superfamily was not identified until 2000. It was not described as a transposon superfamily in the previous edition of Mobile DNA. In the last decade or so, a number of sequenced genomes have revealed that piggyBac-like elements are actually widespread DNA transposons. Active copies of the transposon have also been identified from another moth species, from frogs, and for the first time, from a mammal. Moreover, because the piggyBac transposon has a broad host spectrum from yeast to mammals, this mobile element has been widely used for a variety of applications in a diverse range of organisms. In this chapter, we will describe the discovery and diversity of the piggyBac transposon, its mechanism of transposition, and its application as a genetic tool. We will also provide two examples of genetic screening that the piggyBac transposon has enabled.

Helitrons are one of three groups of eukaryotic class 2 transposable elements (TEs) so far described. Unique in structure and coding capacity, they are hypothesized to move by a rolling-circle-like replication mechanism via a single-stranded DNA intermediate (1, 2). The other two groups, the classic cut-and-paste and the Maverick/Polinton (3–5) both encode a transposase/integrase and are flanked by target site duplications (TSDs) (for review see reference 6). The repair resulting from the staggered double-stranded joining of the TE to the target DNA creates the TSD flanking the insertion (for reviews see references 7 and 8). Helitrons encode a putative protein called the Rep/Helicase (1), which is predicted to have both HUH endonuclease activity (for review see reference 9) and 5′ to 3′ helicase activity. The HUH endonuclease (the Rep of the Rep/Helicase) (Figure 1) would likely make a single-stranded nick in the host DNA, which is consistent with the lack of TSD observed flanking Helitron insertions. A related protein with an HUH endonuclease domain encoded by various bacterial Insertion Sequence families (IS608, IS91, and ISCR1) makes a single-stranded nick in the host DNA and the insertions are not flanked by TSDs (for review see reference 9).

LTR Retrotransposons

Ty1 elements have a structure that is analogous to simple retroviruses, but they lack an envelope gene (Fig. 1) (1). The most highly characterized Ty1 element is Ty1-H3, which was isolated following its retrotransposition into plasmid DNA (2). Nucleotide coordinates provided in this review specifically refer to Ty1-H3, unless otherwise noted. Ty1 is 5918 base pairs (bp) in length with 334 bp direct repeats, or long-terminal repeats (LTRs), at each end. Ty1 LTRs, like those of most LTR-retrotransposons and retroviruses, have the dinucleotide inverted repeat, 5′-TG…CA-3′ at their termini, and are composed of three distinct domains: U3, R, and U5. These domains are defined by their position in the major sense-strand transcript expressed from Ty1 DNA. The 38-nucleotide U5 region and 240-nucleotide U3 region are unique to the 5′ and 3′ end of the Ty1 RNA, respectively, while the R region of 56 nucleotides is repeated at both ends of the processed transcript. Functional Ty1 elements encode two partially overlapping open reading frames: GAG (historically known as TYA) and POL (TYB). The last three nucleotides of the R region of the 5′ LTR encode the first codon of GAG. The GAG ORF encodes a single functional protein with capsid and nucleic acid chaperone functions. The POL ORF is in the +1 frame relative to GAG and overlaps the last 38 base pairs of GAG. POL encodes three proteins with catalytic activity: protease (PR), integrase (IN), and reverse transcriptase/RNase H (RT/RH).
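
The lengths quoted above are enough to lay out approximate coordinates for the LTR domains and the major transcript. The sketch below is a bookkeeping exercise under stated assumptions: coordinates are 1-based and inclusive, each LTR is ordered U3, R, U5, and the transcript is taken to run from the first base of R in the 5′ LTR to the last base of R in the 3′ LTR (any poly(A) tail is ignored). The derived numbers follow from those assumptions and are not quoted from the text.

```python
TY1_LENGTH = 5918          # bp, from the text
LTR_LENGTH = 334           # bp, from the text
U3, R, U5 = 240, 56, 38    # nucleotides, from the text


def ltr_domains(ltr_start):
    """Return 1-based inclusive (start, end) for U3, R, and U5.

    Assumes the domains are contiguous and ordered U3, R, U5.
    """
    coords = {}
    pos = ltr_start
    for name, length in (("U3", U3), ("R", R), ("U5", U5)):
        coords[name] = (pos, pos + length - 1)
        pos += length
    return coords


# The three domains should account for the whole LTR.
assert U3 + R + U5 == LTR_LENGTH

five_prime = ltr_domains(1)
three_prime = ltr_domains(TY1_LENGTH - LTR_LENGTH + 1)

# Assumed transcript span: start of 5' R through end of 3' R.
transcript = (five_prime["R"][0], three_prime["R"][1])
transcript_length = transcript[1] - transcript[0] + 1

# GAG's first codon is the last three nucleotides of the 5' R region.
gag_first_codon = (five_prime["R"][1] - 2, five_prime["R"][1])

internal_length = TY1_LENGTH - 2 * LTR_LENGTH

print(five_prime, three_prime)
print(transcript, transcript_length, gag_first_codon, internal_length)
```

Laid out this way, the transcript carries R and U5 at its 5′ end and U3 and R at its 3′ end, which matches the statement that U5 and U3 are each unique to one end of the RNA while R is repeated.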

Long terminal repeat (LTR) retrotransposons occur throughout eukaryotic phyla, but vary greatly in both the types of elements and the representation among species. The majority of some genomes are composed of these elements. The LTR retrotransposons are taxonomically divided into Pseudoviridae (or Ty1/Copia) and Metaviridae (or Ty3/Gypsy) elements based on genome organization and relatedness of proteins encoded (reviewed in references 1–3) (Gypsy Database 2.0, gydb.org). The Ty3/Gypsy family has shared ancestry with retroviruses and some members encode envelope enabling intercellular transmission. The eponymous founding elements of these LTR retrotransposon families, Ty1/Copia and Ty3/Gypsy, exist in Saccharomyces cerevisiae (Ty1 and Ty3) and in Drosophila (Copia and Gypsy).

The fission yeast Schizosaccharomyces pombe, discovered in the late 1800s in East Africa and genetically characterized in the 1950s by Urs Leupold (97), has become a central model for studies of cell cycle, gene expression, and the complex relationship between transposable elements (TEs) and their host. S. pombe can be studied with a sophisticated toolbox of molecular and genetic techniques. The haploid genome is 12.57 Mbp and encodes 5,052 genes distributed among three chromosomes (1). The complete genome sequence of the Leupold isolate revealed that TEs constitute 1.1% of the genome (2) (Table 1). All TE-related sequences in S. pombe derive from long terminal repeat (LTR) retrotransposons. The intact elements present in the Leupold strain are 13 full-length copies of the Tf2 element (3). Recombination that occurs between the LTRs of a full-length element results in solo LTRs that serve as a fossil record of TEs that are no longer present. The Leupold strain contains 249 solo LTRs or LTR fragments that are derived from nine clades of LTR retrotransposons. The youngest clades are the 35 LTR sequences from Tf2 and the 28 LTRs from Tf1, an element related to Tf2 but that is no longer present in the Leupold strain (2). Full-length Tf1 elements are present in wild isolates of S. pombe collected from different geographic regions (3). The transposition activity of Tf1 and the function of its proteins are measured by expressing a plasmid-encoded copy of Tf1 that contains neo (4, 5). Levels of Tf1 transposition correspond to amounts of G418 resistance. wtfs are another form of repeat identified in the Leupold strain (6). They are present in 25 copies and are generally 250 bp downstream of an LTR (2). Their function is unknown, but they appear to encode protein and their transcription is strongly induced during meiosis (7, 8).

Retroviruses are the only animal viruses that require the stable integration of genetic information into the genome of the host cell as an obligate step in replication. All members of the virus family Retroviridae accordingly carry with them integrase, which is a specialized DNA recombination enzyme. Integration is required for efficient expression of retroviral genes by the host transcriptional machinery and hence productive virus replication. The integrase encoded by human immunodeficiency virus type 1 (HIV-1) is thus an important antiviral target in the fight against HIV/AIDS (1). Integration additionally ensures replication and segregation of viral genes to daughter cells during cell division. Stable association of HIV-1 with cellular DNA underlies the notoriously incurable nature of AIDS despite highly active antiretroviral therapy (HAART) (2).

Retroviruses integrate a DNA copy of the viral genome into cellular DNA as an obligatory step in the viral replication cycle. Once integrated, the viral DNA is stably replicated with cellular DNA through cycles of DNA replication and cell division. The first clues regarding the mechanism of integration came from genetic experiments (1, 2, 3). Mutations at two locations within the viral genome resulted in a phenotype in which reverse transcription occurred normally but the viral DNA failed to integrate. These mutations mapped to regions which we now know encode the viral integrase (IN) protein and the ends of the viral DNA sequence recognized by IN. The finding that viral DNA within extracts of infected cells efficiently integrated into exogenously added target DNA in vitro (4, 5, 6) facilitated biochemical studies of integration. This in vitro integration system enabled the DNA breaking and joining events to be unambiguously determined (6, 7). It also established that the viral DNA forms part of a large nucleoprotein complex termed the preintegration complex (PIC) (8). Later biochemical experiments showed that viral IN protein is necessary and sufficient to carry out the DNA cutting and joining steps of integration in the presence of divalent metal ions (9, 10, 11, 12, 13). Subsequent studies established reaction conditions that facilitated efficient concerted integration of both viral DNA ends into the target DNA molecule in vitro (14, 15, 16, 17, 18, 19). This chapter focuses on mechanisms of targeting integration and the contributions of viral and cellular proteins. For structural information on nucleoprotein complexes involved in retroviral DNA integration see the chapter by Engelman and Cherepanov. For detailed discussions of the mechanisms of DNA transposition of related elements see other chapters in Mobile DNA III.

The conversion, well over a billion years ago, of the RNA world into the modern configuration, in which genetic information is maintained primarily in DNA, required reverse transcriptases (RTs), enzymes that were able to copy genetic information from RNA into DNA, a process called reverse transcription. With minor (but important) exceptions, for example telomerases, normal cellular processes no longer require reverse transcription, which is now primarily employed in the replication of hepadnaviruses, retroviruses, and retrotransposons. This chapter will cover the process of reverse transcription, and the RTs that are involved in the replication of retroviruses and the related long terminal repeat (LTR) retrotransposons, whose lifestyles resemble that of a retrovirus that has either lost, or never acquired, the ability to be transmitted horizontally from one cell to another. The RTs of, and reverse transcription by, non-LTR retrotransposons will be considered in the chapters that describe these elements (49–55). A substantial fraction of the work that has been done on reverse transcription and RT has focused on human immunodeficiency virus type 1 (HIV-1); this is entirely appropriate given the extent of the HIV epidemic and the fact that HIV-1 RT is the target of two important classes of anti-HIV drugs. Thus, a substantial portion of this review will describe data and insights obtained in experiments that were done with HIV-1 and HIV-1 RT. However, there are some important differences in the RTs, and the process of reverse transcription, among the different retroviruses and LTR retrotransposons; these differences will also be considered, at least briefly. The literature on RT and reverse transcription is both vast and complex. Any review, including this one, can present no more than a superficial overview of what is known. Much that is important has been omitted, some intentionally, some inadvertently; for these omissions, the author apologizes. For those who are interested, a number of helpful reviews have already been published, most of which are focused on retroviral RTs (1, 2, 3, 4).

Mammalian genomes have accumulated millions of retrotransposed sequences during evolution. This material can be divided into long terminal repeat (LTR) retrotransposons that include the endogenous retroviruses (ERVs), as well as long and short retrotransposons lacking LTRs, known as LINEs and SINEs, respectively. ERVs are defined as inherited genetic elements closely resembling the proviruses formed following exogenous retrovirus infection. In this chapter we describe the discovery, classification, and origins of ERVs in mammals, consider cellular mechanisms that have evolved to control their expression, and discuss the biological consequences, both positive and negative from the host’s standpoint, of ERV inheritance.

Two unique features of retroviruses are their genome organization and strategies for gene expression. While the gene order of retroviruses is conserved, the synthesis of retroviral proteins can be controlled by different mechanisms. Variations include the ways in which RNA splicing is used to produce mRNAs from a single long transcript, a portion of which must remain unspliced to serve as the viral genome. With these general principles in mind, an overview of the retrovirus family follows, along with a description of how the gene organization of its members relates to gene expression and the viral entry process. Detailed descriptions of the molecular aspects of these processes can be found in the comprehensive Retroviruses (2), an overview chapter in Fields Virology (3), or a number of recent reviews that focus mainly on the early steps in the reproductive cycle of human immunodeficiency virus type-1 (HIV-1) (4–11).

Non-LTR Retrotransposons

R2 elements exclusively insert into 28S rRNA genes (Figure 1). As a result of this specificity, R2 is one of the more tractable mobile elements to study and, thus, is now among the best understood elements both in terms of its mechanism and its population dynamics. The R2 element was first identified in the rDNA loci of Drosophila melanogaster in the early 1980s (1, 2), when little was known of the structure or abundance of mobile elements in eukaryotes. In fact, the exclusive residence of the element at a specific site in the 28S gene initially suggested that it might be an intron. However, the findings that only a fraction of the genes contained the insertion, that 28S genes containing the insertion did not appear to be transcribed, and that many of the insertions had a sizeable deletion at the 5′ end all argued against its role as an intron. Insertions were soon identified at the same position of the 28S rRNA gene in many other species of insects (3, 4, 5). The complete sequence of the insertions in both D. melanogaster and Bombyx mori revealed a large open reading frame (ORF) encoding a reverse transcriptase that had greatest sequence similarity to that of non-LTR retrotransposons (6, 7). R2 differed from most non-LTR retrotransposons, however, in that it only contained a single ORF. Furthermore, rather than an encoded apurinic endonuclease (APE) located amino-terminal to the reverse transcriptase (8), R2 encoded, carboxyl-terminal to the reverse transcriptase, an endonuclease with an active site more similar to that of certain restriction enzymes (9).

DNA transposons are the mobile elements that move by a “cut and paste” mechanism (1, 2). In contrast, retrotransposons encode reverse transcriptase, and move by a “copy and paste” mechanism. The process of retrotransposon insertion into genomic locations involves an RNA intermediate. Retrotransposons can be classified into long terminal repeat (LTR) and non-LTR retrotransposons. LTR retrotransposons have LTRs at both ends and resemble retroviruses in both structure and integration mechanisms. Non-LTR retrotransposons comprise two subtypes, long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). LINEs are in general 4 to 7 kb long and do not carry LTRs, and their retrotransposition mechanism is different from that of LTR retrotransposons. SINEs are nonautonomous retrotransposons of 100 to 500 bp that do not encode proteins. It has been proposed that the proteins encoded by LINEs are the source of the enzymatic retrotransposition machinery of SINEs (3, 4, 5).

Transposable elements (TEs) or “jumping genes” historically have been disparaged as a class of “junk DNA” in mammalian genomes (1, 2). The advent of whole genome DNA sequencing, in conjunction with molecular genetic, biochemical, and modern genomic and functional studies, is revealing that TEs are biologically important components of mammalian genomes. TEs are classified by whether they mobilize via a DNA or an RNA intermediate (detailed in reference 3). Classical DNA transposons, such as the maize Activator/Dissociation elements originally discovered by Barbara McClintock, move via a DNA intermediate (4, 5). Their mobility (i.e., transposition) can impact organism phenotypes such as corn kernel variegation. Retrotransposons, the predominant class of TEs in most mammalian genomes, mobilize via an RNA intermediate by a process termed retrotransposition (6).

Group II introns are remarkable mobile retroelements that use the combined activities of an autocatalytic RNA and an intron-encoded reverse transcriptase (RT) to propagate efficiently within genomes. But perhaps their most noteworthy feature is the pivotal role they are thought to have played in eukaryotic evolution. Mobile group II introns are ancestrally related to nuclear spliceosomal introns, retrotransposons and telomerase, which collectively comprise more than half of the human genome. Additionally, group II introns are postulated to have been a major driving force in the evolution of eukaryotes themselves, including for the emergence of the nuclear envelope to separate transcription from translation.

Mobile genetic elements have repeatedly been called to duty in life-and-death struggles between hosts and their pathogens (1, 2, 3, 4). One of their greatest utilities is the capacity to create DNA sequence diversity in protein-encoding genes, thereby generating protective shields to defend against enemies, or to create arsenals of weapons to exploit potential hosts. After decades of research, considerable evidence now suggests that the V(D)J recombination system, which is essential for generating adaptive immunity in vertebrates, has evolved from an ancestral DNA transposon (2, 3, 4). The site-specific recombinases responsible for V(D)J recombination, RAG1 and RAG2, are able to catalyze DNA transposition in a manner analogous to DNA transposons (2), and the RAG1 core and V(D)J recombination signals are likely derived from the transposase and terminal repeats of an ancient DNA transposon similar to Transib (3, 4). Ironically, pathogens also exploit mobile genetic elements to generate protein diversity, altering their antigenic characteristics to evade host immunity (1). This process of antigenic variation is employed by Borrelia species, Neisseria gonorrhoeae, and other pathogens. Bacterial antigenic variation often involves a single, highly expressed gene encoding an abundant surface protein and dozens of archived ones that are homologous but different from each other. Replacing all or part of the expressed copy by DNA transposition leads to antigenic variation on the surface of the pathogen.

Reverse transcriptase (RT) is generally considered a eukaryotic enzyme because it is prevalent in eukaryotes and was first characterized from eukaryotic sources. Discovered in 1970 in the Rous sarcoma and murine leukemia viruses (1, 2), RT has since been studied for its central role in the replication of many eukaryotic genetic elements including retroviruses (e.g., HIV-1), pararetroviruses, hepadnaviruses, long terminal repeat (LTR) and non-LTR retroelements, Penelope-like elements, and telomerase (3, 4, 5, 6, 7, 8, 9, 10). Over the years, the accumulated studies of RT have painted a picture in which the enzyme functions primarily as the replicative enzyme of selfish DNAs (viruses, retrotransposons), while occasionally becoming domesticated to perform useful cellular functions. These functions include the maintenance of chromosomal ends (telomerase, Drosophila Het-A elements) (10, 11) and contributions to genomic change (both beneficial and deleterious) through pseudogene formation or other retroprocessing events (12, 13, 14, 15).

Eukaryote retrotransposons have been organized into four major groups on the basis of their mechanistic features, open reading frame organization and reverse transcriptase (RT) phylogeny: long terminal repeat (LTR) retrotransposons, tyrosine recombinase (YR) encoding elements, Penelope-like elements (PLEs) and long interspersed nuclear elements (LINEs) (1). The major feature distinguishing the tyrosine recombinase-encoding elements from other retrotransposons is that the YR elements encode a tyrosine recombinase (2, 3) that performs the role of integration. Other retroelements employ integrases (LTR retrotransposons) or endonucleases (LINEs and PLEs). Tyrosine recombinases (YRs) are widespread in prokaryotes, typically involved in site-specific recombination between similar or identical DNA sequences (4). Representative examples include the Cre recombinase of bacteriophage P1, the FLP recombinase of yeast 2-micron circle plasmids, and the XerC and XerD recombinases of Escherichia coli.