Evolutionary developmental biology and genomics

Abstract

Reciprocal questions often frame studies of the evolution of developmental mechanisms. How can species share similar developmental genetic toolkits but still generate diverse life forms? Conversely, how can similar forms develop from different toolkits? Genomics bridges the gap between evolutionary and developmental biology, and can help answer these evo–devo questions in several ways. First, it informs us about historical relationships, thus orienting the direction of evolutionary diversification. Second, genomics lists all toolkit components, thereby revealing contraction and expansion of the genome and suggesting mechanisms for evolution of both developmental functions and genome architecture. Finally, comparative genomics helps us to identify conserved non-coding elements and their relationship to genome architecture and development.

"What characterizes the living world is both its diversity and its underlying unity." (Jacob, 1977)1

How can a conserved, broadly shared developmental genetic toolkit generate today's amazing diversity of life forms2, 3, 4? The conservation of the toolkit became evident from dramatic discoveries such as the finding that Hox genes control anterior–posterior patterning in both a fly and a mouse5, and that the human paired box gene 6 (PAX6), which is necessary for eye development, can cause cells in the primordial wing of a fly embryo to become eye cells6. Early attempts to explain the paradox that was implied by Jacob 30 years ago suggested that phenotypic diversity derives from differences in where and when genes are expressed, rather than in the products that the genes encode1, 7. Owing to rapid genome-sequencing technologies, biologists have access, for the first time, to the full list of toolkit components for a wide variety of species. Comparison of whole-genome sequences can reveal changes to the toolkit in an evolutionary perspective and suggest hypotheses for the origin of phenotypic diversity.

This Review explores just four topics from the various ways that genomics impacts evolutionary developmental biology, with a bias towards examples from animals with which the authors are most familiar. Because knowing lineage relationships is essential to map the orientation of trait gain and loss, we first describe the impact of genomic data on our understanding of organismal phylogenies. The second section shows how genome contraction events can help identify trait-specific genes, affect genome architecture and lead to alternative modes of development. Genome contraction reveals an inverse paradox: cases in which organisms develop fundamentally similar morphologies (phenotypic unity) despite important differences in genetic toolkits (genetic diversity). Third, we explore how genome-expansion events can augment the complexity of the developmental toolkit. The final section examines the impact of comparative genomics on understanding the influence of genome architecture on gene regulation as a force for phenotypic diversity.

Phylogenomics and developmental biology

Accurate knowledge of phylogenies among organisms is important to understand the direction of change when one lineage possesses a trait that is missing from its sister lineage. Was the trait absent from the last common ancestor and gained in one lineage? Or was it present in the last common ancestor but lost in one lineage? Phylogenomics can improve the accuracy of phylogenetic analysis by using thousands of concatenated, unambiguously aligned amino-acid positions from hundreds of genes supplied by full genome sequences from many organisms8, 9. Furthermore, the presence or absence of rare genomic changes (such as gene fusions, transposable element insertions or intron positions) provides additional valuable markers to assess phylogenetic relatedness. Nevertheless, phylogenomic analysis has limitations and can sometimes lead to contradictory results, such as the lingering controversy regarding the validity of the taxon ecdysozoa10. Problems can arise due to rapid, ancient cladogenesis, the abundance of homoplastic characters, rapidly evolving positions within proteins and rapidly evolving lineages9. Even so, recent advances in methods to detect systematic errors, improvements in data quality, wider taxonomic sampling and the identification of new markers of biological history help to improve our knowledge of the tree of life11.
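The concatenation step mentioned above can be illustrated with a minimal sketch. It assumes that each gene has already been aligned separately and is supplied as a mapping from taxon name to aligned sequence; the function name, the toy sequences and the choice to pad missing taxa with gap characters are illustrative assumptions, not a description of the pipelines used in the cited studies.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix (taxon -> sequence).

    gene_alignments: list of dicts, each mapping taxon name to an aligned
    sequence. Taxa missing from a gene are padded with '-' (an assumption).
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: two hypothetical gene alignments; the second lacks one taxon.
gene1 = {"fly": "MKT", "mouse": "MKS", "anemone": "MRT"}
gene2 = {"fly": "LV", "mouse": "LI"}
supermatrix = concatenate_alignments([gene1, gene2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

The resulting supermatrix would then be passed to a tree-inference program; that step is outside the scope of this sketch.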

Some recent examples show the power of phylogenomics for evolutionary developmental biology (Fig. 1). Classically, cnidarians, a basally diverging group of animals, have a radial, bag-like body plan with a body cavity that opens to the exterior through an orifice that acts both as a mouth and an anus12. However, phylogenomic analysis showed that a muscular parasitic worm (Buddenbrockia plumatellae) is a cnidarian13. This finding increases the known diversity of cnidarian body plans and poses new questions for understanding the genetic control of cnidarian development. Similarly, phylogenomic analysis has finally solved the enigmatic evolutionary position of Xenoturbellida, a ciliated marine worm that was initially thought to be related to acoelomorph flatworms; Xenoturbellida now is placed in a new phylum within the deuterostomes (Fig. 1) as the sister group of the ambulacraria14. This example broadens the known morphologies of the sister group to the chordates, our own phylum.

Figure 1. Phylogenetic information is essential to determine whether developmental features that are present in one group, but missing from the sister group, have resulted from the gain of new genetic mechanisms in one lineage or the secondary loss of ancestral mechanisms in the other. Recent revisions are highlighted in yellow: Buddenbrockia, which extends the morpho-space of Cnidarians from solely bag-like animals to worm-like forms; Xenoturbella, which represents a new phylum of Deuterostomes; and the revision of urochordates, rather than cephalochordates, as the sister of vertebrates, which provides new insights into the origin of character states in chordates. R1–R3, rounds of whole-genome duplication.

A third example of the power of phylogenomics dethrones cephalochordates (for example, amphioxus) as the long-assumed surviving sister lineage of vertebrates12. This position is now occupied by urochordates (for example, ascidians and larvaceans), making a new group, the olfactores14, 15. Rare genomic changes that are shared by urochordates and vertebrates, including the domain organization of the cadherin gene family and a unique amino-acid insertion in the coding region of fibrillar collagen genes, support this conclusion16, 17. In addition, some morphological features support the constitution of the new group, such as neural crest-like cells and epidermal placodes18, 19, 20. Other features in stem olfactores, such as a complex tripartite brain, might have been secondarily simplified in urochordates rather than having evolved in vertebrate phylogeny21, 22, 23. Future work will distinguish between features that were absent in stem olfactores and evolved in vertebrates from features that were possessed by stem olfactores and lost secondarily in urochordates (for example, Refs 24,25).

Genome contraction and development

Accurate phylogenies help us to identify traits that were present in a clade's common ancestor but were secondarily lost, which can lead to the loss of genes that were used exclusively for that trait. Conversely, genes that were assumed to be important for a given trait can be lost without the loss of the trait, the inverse paradox. This section reviews three cases of genome contraction that involve cilia and flagella, DNA methylation and retinoic acid (RA) signalling to show how analysis of genome contraction can identify genes that are important for a given trait, and how investigation of genome contraction can suggest hypotheses for the evolution of genome architecture and for the innovation of alternative modes of development.

Trait loss illuminates trait-specific genes. Comparative genomics provides a powerful tool to discover trait-specific genes on the basis of the assumption that most genes that are expressed exclusively in a given trait are lost if the trait is lost26, 27. The strategy compares genomes in a clade, the members of which vary with respect to the presence or absence of an ancestral trait (Fig. 2a). The intersection of genes in genomes with the trait, after subtracting genes in genomes without the trait, is enriched in candidate trait-related genes (Fig. 2b). Genome comparisons at stringencies that are appropriate for evolutionary distance can suggest candidates for the basic core of common trait-specific genes. Comparison of different genome subgroups can identify subsets of candidate trait-related genes that are involved in variably present subcomponents of the trait (Fig. 2b). The power of this strategy was demonstrated in comparative genomics of cilia and flagella, microtubule-based organellar whips, which are important for development of left–right asymmetries, heart formation, vertebrate photoreceptors, and invertebrate mechano- and chemoreceptors28. Comparative genomics of organisms with cilia (such as flies, roundworms, green algae, protists and humans), and organisms that lack cilia (such as plants, yeasts and slime moulds)27, and comparison of organisms with flagella (such as green algae, flies, roundworms, sea squirts, mice and humans) and organisms that lack flagella (such as plants)26, identified several hundred candidate genes related to cilia or flagella. Finding more than 80% of ancestral genes that are known to be involved in cilia function verified the method. The analysis identified a novel family of proteins (OSEG) that are essential for the development of cilia in Drosophila melanogaster27. Studies in silico, in vitro and in vivo in Caenorhabditis elegans validated flagella-related genes, and identified a novel human gene (BBS5) as defective in Bardet–Biedl syndrome26. Further applications of this genomic strategy will facilitate the identification of candidate genes that are important for the development and evolution of a variety of traits.

Figure 2. Comparison of diverse genomes in a clade that have variably lost a trait (shown in part a) identifies candidate trait-related genes that are present in organisms that have the trait but are absent in organisms that lack the trait (shown in part b). Combinatorial comparisons at different levels of stringency that depend on evolutionary distance provide different sets of gene candidates that can reveal the core of trait-specific genes shared among all organisms and candidate genes for trait subcomponents shared by only some of the organisms.
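The subtraction strategy described above amounts to a set intersection followed by a set difference. The sketch below illustrates that logic only; it assumes that genes have already been grouped into orthologue families with shared identifiers, and the function name, species labels and identifiers (og1, og2 and so on) are hypothetical. Real analyses must also choose similarity thresholds appropriate to evolutionary distance, which this sketch does not model.

```python
def candidate_trait_genes(genomes_with_trait, genomes_without_trait):
    """Return gene families present in every genome that has the trait
    and absent from every genome that lacks it.

    Both arguments map a species label to a set of orthologue-family IDs.
    """
    with_sets = [set(genes) for genes in genomes_with_trait.values()]
    if not with_sets:
        return set()
    shared = set.intersection(*with_sets)
    # Pool every family seen in any genome that lacks the trait.
    pool_without = set().union(*(set(g) for g in genomes_without_trait.values()))
    return shared - pool_without


# Toy data with hypothetical orthologue-family identifiers.
ciliated = {
    "fly": {"og1", "og2", "og3", "og5"},
    "roundworm": {"og1", "og2", "og3", "og6"},
    "human": {"og1", "og2", "og3", "og4"},
}
non_ciliated = {
    "plant": {"og1", "og4", "og7"},
    "yeast": {"og1", "og8"},
}
core = candidate_trait_genes(ciliated, non_ciliated)
print(sorted(core))  # ['og2', 'og3']
```

Running the same function on a subgroup of the trait-bearing genomes would yield the larger, less stringent candidate sets that correspond to variably present subcomponents of the trait.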

Contraction of the DNA-methylation toolkit. Gene silencing by DNA methylation has a fundamental role in gene regulation during vertebrate development29. DNA methylation is an epigenetic mechanism based on cell inheritance without mutation30. Vertebrate genomes are heavily methylated, but the genomes of many non-vertebrates are much less methylated31, 32, 33. How does the evolution of this epigenetic system correlate with the evolution of developmental mechanisms, the preservation of genome architecture and the generation of phenotypic diversity?

Evolutionary changes that affect chromatin-based epigenetic systems are potentially important for phenotypic diversity. Unmethylated regions of the genome usually contain highly expressed genes that are precisely regulated by transcription factors, but highly methylated regions often contain less active, more broadly expressed genes34, 35. This distribution suggests that DNA methylation helps to suppress spurious initiation of transcription. DNA methyltransferases (DNMTs) are key players in DNA methylation36. DNMT3 methylates DNA de novo during development, and DNMT1 guides subsequent epigenetic inheritance. DNMT2 shows low activity on DNA but higher activity on specific tRNA molecules, although its full role remains enigmatic37. Methylated DNA recruits methyl-CpG-binding domain proteins (MBDs) and their associated histone deacetylases (HDACs), resulting in tighter chromatin packaging and locally reduced access of transcription factors to target genes. Epigenetic mechanisms that alter chromatin condensation, and thereby help to activate or silence genes in a chromosome neighbourhood, thus regulate genes on the basis of genome architecture, a level of regulation that is superimposed on the level of gene-specific cis-regulatory elements38.

Fruitflies and nematodes have little or no methylated DNA, raising doubts about its general significance for development in non-vertebrates. However, genome analyses reveal that, although the nematode C. elegans lacks dnmt genes, related nematodes preserve a dnmt2-related gene, suggesting a recent loss of the methylation machinery in nematode evolution32. Among insects, fruitflies and mosquitoes have only DNMT2, and a silk moth has both Dnmt1 and Dnmt2 (Refs 39,40), but the honeybee possesses a full set of Dnmt genes that are functionally comparable to their vertebrate counterparts. These results show that the full set of Dnmt genes was present in the last common ancestor of bilaterians, but that the Dnmt toolkit experienced multiple independent contractions in protostome lineages41. Among deuterostomes, our analyses of genome databases suggest that sea urchins, cephalochordates and ascidian urochordates have the full complement of Dnmt genes, whereas the larvacean urochordate Oikopleura dioica has only Dnmt2, revealing a contraction of the larvacean toolkit despite the morphological similarities of the ascidian and larvacean larvae. Our analysis of the recently available genome sequence of the cnidarian Nematostella vectensis identified all three Dnmt genes. These results show that the full set of Dnmt genes, which was already present in the last common ancestor of radial and bilaterian animals, has been truncated in larvaceans and multiple times in protostomes, but not in some cnidarians or in the vertebrate lineage.

Contraction of the Dnmt toolkit: an inverse paradox. What allowed the DNA-methylation toolkit to contract in some lineages but not in others? How can the fundamentally similar body plans of a bee and a fly develop either with or without regulation provided by DNA methylation? This problem illustrates the inverse paradox (genetic diversity despite body-plan unity), and any of several hypotheses might explain it. First, developmental mechanisms might have become independent of DNA methylation in a stem ecdysozoan, leading to relaxation of the selective constraints to maintain DNA methylation; lineages that have maintained DNA-methylation machinery, such as that of the honeybee, may use it for non-developmental functions such as imprinting or complex social behaviours41. An alternative hypothesis is that the epigenetic system of chromatin change is important for development in lineages that lack Dnmt genes, but alternative mechanisms provide this function. The fidelity of epigenetic gene silencing is probably increased by interactions between the DNA-methylation and histone-modification systems42. In flies, factors other than DNMTs may cooperate with histone modification to facilitate changes in chromatin structure43.

Evolution of genome architecture and epigenetics. The evolution of genome architecture might help to explain how the DNA-methylation toolkit can contract in some lineages but not others. Chromosome rearrangements can disrupt coherent regions of epigenetic gene regulation because genes that are translocated from a highly methylated region could become deregulated after transfer to an undermethylated region. This suggests that the evolution of genome architecture can depend on the epigenetic system. This hypothesis predicts that syntenies should tend to be conserved between lineages that preserve ancestral epigenetic systems. This prediction agrees with results from recent genomic analyses showing that vertebrates share with cnidarians, but not with well investigated protostomes, both extensive conserved syntenies and a full Dnmt toolkit44. In the case of O. dioica, the loss of two of the three Dnmt genes might be linked to the contraction of its genome, the smallest among chordates, which was accompanied by extensive genomic rearrangements45, 46. The lack of conserved patterns of nuclear compartmentalization, and the lack of correlation between active transcription and domains that are rich in histone-specific modifications47, suggest that the epigenetic system might be altered in larvaceans, perhaps as an adaptation to their determinative mode of development and rapid life cycles. Future functional analysis will be needed to understand the importance of variation in the epigenetic control toolkit, the evolution of genome architecture and their impact on mechanisms of development.

Genome contraction, RA and Hox clusters. The relationship of genome architecture to gene regulation is evident in the Hox clusters, groups of tandemly duplicated genes that encode homeodomain-containing transcription factors that are important for organizing the bilaterian anterior–posterior body axis (reviewed in Ref. 48). In vertebrate genomes, the order of Hox genes roughly matches both the order of expression along the body axis (spatial collinearity) and the order of expression during development (temporal collinearity)49. Spatial collinearity depends mainly on cis-regulatory elements, but temporal collinearity depends on the architecture of the Hox cluster45, 49, 50.

In vertebrates and cephalochordates, RA helps to regulate temporal collinearity51, 52. RA gradually increases the portion of a Hox cluster that is poised outside territories of condensed chromatin, allowing genes along the cluster to gradually access transcription machinery over time53, 54. RA positions decondensed chromatin with respect to Hox genes by chromatin remodelling induced by DNA methylation, histone methylation, acetylation and deacetylation55, 56. RA binds to a retinoic acid receptor (RAR), which heterodimerizes with a retinoid-X receptor (RXR) at RA-response elements in or near target genes57. RA-activated RAR recruits protein complexes that contain histone acetyltransferases that induce gradual changes in chromatin structure. The classical genetic machinery for RA action also includes enzymes that synthesize RA (such as ALDH1A) and degrade RA (such as CYP26), which together regulate the distribution of RA during development58.

Because the role of RA in anterior–posterior axial patterning seemed to be limited to chordates, and because the main components of RA signalling (ALDH1A, CYP26 and RAR) had been described only in chordates, it was supposed that the 'invention' of RA genetic machinery was a key innovation for development of the chordate body plan, probably mediated by Hox genes in axial patterning (reviewed in Refs 59,60). However, recent genome analyses revealed the unexpected presence of RA genetic machinery in non-chordate deuterostomes61, 62, 63, suggesting that RA signalling is not a chordate invention, or that the chordate innovation was the redeployment of an ancient signalling system for new developmental roles, including the regulation of Hox-cluster expression. Functional analysis of RA action in the development of hemichordates, which share many developmental similarities with chordates64, will help in evaluating these hypotheses.

Another unexpected result comes from the genome of O. dioica. Larvaceans are the only urochordates that maintain a chordate body plan as adults, and yet the deep O. dioica genome database lacks the classical genes for RA synthesis, degradation and reception24 (Fig. 3). Because cephalochordates, which diverged basally among chordates (Fig. 1), have the RA toolkit, it must have been secondarily lost in larvacean evolution but preserved in the ascidian lineage. A study of RA action showed that it does not cause homeotic posteriorization in larvacean embryos, in contrast to vertebrates and cephalochordates24. These results show that a chordate can develop the phylotypic body plan without genes for the classical morphogenetic role of RA (the inverse paradox), and suggest that larvaceans use alternative mechanisms for the development of chordate features.

Figure 3. Stem urochordates adopted a determinative mode of development, reduced the size of their genomes, lost temporal collinearity of Hox-gene expression, broke up their Hox-gene cluster and lost the need to use retinoic acid (RA) for anteroposterior axial patterning associated with the reorganization of their CNS. Larvaceans lack the classic genetic machinery to synthesize, degrade and detect RA, and they also lack a complete genetic system for DNA methylation (carried out by DNA methyltransferases (Dnmts)), but nevertheless build a complete chordate body plan that is retained throughout life. Mouse image courtesy of Getty Images.

Differences in RA toolkits between larvaceans and ascidians are not reflected in drastic differences in embryonic development — the inverse paradox. Evidence suggests that axial patterning independent of RA-signalling is actually a shared, derived feature of urochordates24,65. Excess RA in both larvaceans and ascidians seems to alter organ morphogenesis rather than causing Hox-related homeotic transformations. This finding suggests that stem urochordates evolved an alternative developmental mechanism that allowed anterior–posterior axial patterning to become independent of RA (Fig. 3). The RA machinery in ascidians may perform functions such as asexual reproduction and regeneration rather than embryonic axial patterning66, 67.

RA contraction and Hox-cluster disintegration: a model. Although Hox-cluster genes occupy contiguous regions in cephalochordate and vertebrate genomes, and perhaps did so in the last common ancestor of all bilaterians (but see Ref. 49), in many genomes Hox-cluster genes are separated into two or more subclusters; for example, in the ascidian Ciona intestinalis nine Hox genes appear at five different genomic locations68, and in O. dioica all Hox genes are individually dispersed in the genome45 (Fig. 3). What features correlate with intact Hox clusters, and what are the consequences of Hox-cluster disintegration? Clearly, Hox-cluster disintegration will thwart the vertebrate mechanism of RA-induced gradual expansion of chromatin relaxation.

Is altered RA signalling in the axial patterning of urochordate embryos causally related to the break up of Hox clusters? The following model could explain most of the data (Fig. 3). Strong selection for rapid embryonic development and life cycle (egg to egg in less than 10 days for O. dioica) might simultaneously select for both determinative development, which decreases dependence on extracellular signals such as RA to establish embryonic coordinates, and genome diminution, which is often associated with chromosome rearrangements that can disperse former gene neighbours across the genome (such as O. dioica Hox-cluster genes) and disrupt gene regulatory mechanisms that rely on long-range enhancers or chromosome territories, as do vertebrate Hox clusters. This model is consistent with the absence of temporal collinearity of Hox expression in urochordates45, 68 (Fig. 3). Under this model, because RA signalling and DNA methylation become less important, genes that are necessary for their action, such as Rar genes and Dnmt genes, are free to degrade. Under this model, the disintegration of the Hox cluster and modification of RA signalling in stem urochordates might have led to interesting alternative mechanisms of anterior–posterior axial patterning in urochordate embryos (Fig. 3).

Genome expansion and precision tools

Whereas genome contraction can be associated with the evolution of alternative genetic mechanisms, genome expansion can contribute to the evolution of old tools into new, increasingly specialized devices. In Ohno's classical model69, one member of a pair of duplicated genes retains the original function whereas its paralogue either disappears by accumulation of detrimental mutations (called non-functionalization70) or acquires rare beneficial mutations that confer new, positively selected functions (neofunctionalization). The duplication, degeneration, complementation hypothesis (or DDC model)70 suggests a third alternative for duplicate preservation: subfunctionalization, the complementary partitioning of ancestral structural and regulatory subfunctions between two duplicate genes so that the sum of their functions equals that of the parental single-copy gene (see also Ref. 71). In the DDC model, it is important to distinguish between subfunctionalization, the initial event that preserves two duplicate genes, and subfunction partitioning, events that occur after the initial preservation of duplicate gene copies70. The DDC model predicts that evolutionary constraints on duplicated genes can differ after subfunctionalization owing to relaxed pleiotropy. Because a gene with fewer subfunctions would have fewer diverse tasks, it might more readily accommodate mutations that confer novel functions, leading to the evolution of new tools that are more specifically tailored to specific jobs and thereby contributing to the generation of phenotypic diversity.
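The alternative fates of duplicated genes can be made concrete by treating each gene as a set of subfunctions. The sketch below is a deliberately simplified illustration under that assumption; the function name, the subfunction labels (s1, s2, s3, novel) and the catch-all category for duplicates that remain redundant are hypothetical and are not part of Ohno's model or the DDC model as published.

```python
def classify_duplicate_fate(ancestral, copy_a, copy_b):
    """Classify a duplicate pair by comparing sets of subfunctions.

    ancestral: subfunctions of the single-copy parental gene.
    copy_a, copy_b: subfunctions retained (or gained) by each duplicate.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not copy_a or not copy_b:
        # One copy has lost every function.
        return "non-functionalization"
    if (copy_a | copy_b) - ancestral:
        # At least one copy performs a function the ancestor lacked.
        return "neofunctionalization"
    if (copy_a | copy_b) == ancestral and copy_a != ancestral and copy_b != ancestral:
        # Neither copy suffices alone; together they equal the ancestor.
        return "subfunctionalization"
    return "redundant or unresolved"


ancestor = {"s1", "s2", "s3"}
print(classify_duplicate_fate(ancestor, {"s1", "s2", "s3"}, set()))
print(classify_duplicate_fate(ancestor, {"s1", "s3"}, {"s2", "s3"}))
print(classify_duplicate_fate(ancestor, {"s1", "s2", "s3"}, {"s3", "novel"}))
```

In the second call both copies keep s3 while dividing s1 and s2 between them, so both must be preserved to supply the ancestral set, which is the defining feature of subfunctionalization in this simplified view.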

Genome expansion and lineage divergence. Lineage-specific non-functionalization and subfunction partitioning can, in principle, provide genetic population-isolating mechanisms72, 73. This is because F1 hybrids from the mating of two populations that are fixed for reciprocal non-functionalized or subfunctionalized alternative gene duplicates will produce some F2 individuals (about 1 in 16 individuals, according to Mendel) that are doubly homozygous for alleles that lack a specific paralogue or subfunction; such individuals will die if the original gene subfunction is essential. If, as after genome duplication, genes on several chromosomes independently experience DDC, then most of the F1 offspring of two populations will be nearly sterile. This suggests that genome expansion can be an important force for evolutionary diversification69, 73, 74.
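The figure of about 1 in 16 follows from a standard dihybrid cross, and it can be checked by enumerating gametes. The sketch below assumes two unlinked duplicate loci that assort independently, with one population fixed for the functional copy at the first locus only (genotype AAbb) and the other fixed at the second locus only (aaBB). The extension to several duplicate pairs further assumes that the pairs are independent and that loss of either pair is lethal; these are simplifying assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product


def f2_lethal_fraction(n_pairs=1):
    """Fraction of F2 offspring that lack all functional copies for at
    least one of n_pairs independently resolved duplicate pairs."""
    # F1 hybrids are Aa Bb; each gamete carries one allele per locus.
    gametes = list(product("Aa", "Bb"))  # four equally likely gametes
    lethal = sum(
        1
        for g1, g2 in product(gametes, repeat=2)
        if g1[0] == g2[0] == "a" and g1[1] == g2[1] == "b"
    )
    per_pair = Fraction(lethal, len(gametes) ** 2)
    # Probability that at least one pair is doubly homozygous null.
    return 1 - (1 - per_pair) ** n_pairs


print(f2_lethal_fraction(1))          # 1/16
print(float(f2_lethal_fraction(10)))  # rises as more pairs diverge
```

Under these assumptions the inviable fraction grows quickly with the number of reciprocally resolved duplicate pairs, which is why a whole-genome duplication offers so many opportunities for this kind of isolation.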

It is likely that two rounds (R1 and R2) of whole-genome duplication occurred during early vertebrate evolution, and another round occurred at the base of the teleost lineage (R3)73, 75, 76, 77, 78 (Fig. 1). Comparative analysis of teleost and human genomes revealed chromosome rearrangements that occurred over a short evolutionary time leading to rapid genome reorganization77, 79. These events, given the appropriate ecological opportunity, might have facilitated the acquisition of vertebrate innovations and the teleost radiation.

Specialization of FGFs, tools for developmental signalling. As an example of how genome duplication provides opportunities for the evolution of specialized developmental tools, consider the functional evolution of fibroblast growth factor (FGF) gene paralogues that appeared during the vertebrate and teleost radiations. FGFs comprise a family of secreted signalling molecules that control development and homeostasis80. Genome analysis shows that mammals have at least 22 Fgf genes in seven subfamilies, namely FgfA (1/2), FgfB (3/7/10/22), FgfC (4/5/6), FgfD (8/17/18), FgfE (9/16/20), FgfF (11/12/13/14) and FgfG (19/21/23), that seem to have expanded in R1 and R2 from seven ancestral proto-Fgf genes81, 82 (Fig. 4). The genomic location of Fgf genes helps us to infer their evolutionary origin. For example, Fgf4 of the C group and Fgf19 of the G group are tightly linked in a 100-kb segment, as are Fgf6 of the C group and Fgf23 of the G group, suggesting that FgfC and FgfG subfamilies arose as tandem duplicates before R1 and R2 (Ref. 81). Thus, ancestral chordates are likely to have had six Fgf genes (FgfA, B, C/G, D, E and F). Consistent with this hypothesis for the origin of Fgf genes, the ascidian C. intestinalis, whose lineage diverged before R1 and R2 (Fig. 1), possesses at least five of the six proto-Fgfs, plus one unassigned Fgf83. However, few Fgf genes are found in genomic databases of protostomes, for example, only two in the nematode and three in the fruitfly81, 82, raising the question of when Fgf subfamilies evolved.

Analysis of the genome of the sea anemone N. vectensis (Cnidaria, Fig. 1) helped to answer this question. Nematostella vectensis has 13 Fgf genes, many of which might have arisen by lineage-specific gene duplication84 but, according to our analysis, at least four of the six proto-Fgf subfamilies are present in N. vectensis (C.C., H.Y. & J.H.P., unpublished results: NvFGF1D, FgfA; Nv211797, FgfB; NvFgf8A, FgfD; Nv212165, FgfE). Together with the analysis of Wnt genes (cnidarians have 11 of 12 known Wnt subfamilies, whereas only 6 are present in ecdysozoans85), the analysis of the cnidarian genome reveals an unexpected complexity of the developmental toolkit in basally diverging metazoans, and suggests that a substantial part of the basic chordate developmental toolkit existed already in the last common ancestor of all eumetazoans. Additional genome sequences for a broader sample of organisms will provide a better picture of toolkit history, and will illuminate its consequences in developmental diversification over major animal transitions.

Teleost FgfD subfamily expansion. Gene duplication and loss can alter toolkit composition, but the assignment of toolkit orthologues for species diverging on either side of a developmental transition is important, because biologists can learn how gene functions change and developmental tools specialize only by comparing functions of orthologues in the context of accurate phylogenies. In the past, limited data sometimes rendered misleading gene homologies and incorrect gene nomenclature; for example, zebrafish fgf8b was initially mischaracterized as fgf17 (Refs 86,87). Whole-genome sequences from various species help us to overcome these problems by providing access to all members of each gene family, their positions in the genome and comparative syntenic information across phylogenies.

With no gene loss, R1, R2 and R3 should have produced four orthologues in tetrapods and eight in teleosts. From an ancestral FgfD gene, which is represented today by fgf8/17/18 in the C. intestinalis genome83, vertebrates, taken together, have copies of four predicted paralogues (Fgf8, 17, 18 and 24) (Fig. 4), but only three of the four are present in tetrapods (Fgf8, 17 and 18), demonstrating lineage-specific loss of Fgf24 after the divergence of teleosts and tetrapods. Six of the eight predicted paralogues are present in teleosts86, 87, 88, 89, 90 (Fig. 4). Comparative analysis of teleost genomes also reveals lineage-specific loss of Fgf paralogues: sticklebacks and pufferfish seem to have lost one fgf18 gene after diverging from the zebrafish lineage, and the medaka lineage seems to have lost an additional fgf17 and fgf18 gene. The hypothesis that lineage-specific subfunction partitioning can erect population-isolation mechanisms predicts this observed type of lineage-specific paralogue loss.
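The expected counts quoted above follow from doubling a single ancestral gene once per round of whole-genome duplication. The short sketch below works through that arithmetic and compares it with the FgfD copy numbers given in the text (three in tetrapods, six in teleosts). The function name is hypothetical, and the difference reported is a net figure that assumes no additional lineage-specific duplications.

```python
def expected_paralogues(rounds):
    """Copies of one ancestral gene after the given number of
    whole-genome duplications, assuming no gene loss."""
    return 2 ** rounds


# Observed FgfD-subfamily counts as stated in the text.
lineages = {
    "tetrapods (R1, R2)": {"rounds": 2, "observed": 3},
    "teleosts (R1, R2, R3)": {"rounds": 3, "observed": 6},
}

for name, data in lineages.items():
    expected = expected_paralogues(data["rounds"])
    net_loss = expected - data["observed"]
    print(f"{name}: expected {expected}, observed {data['observed']}, net loss {net_loss}")
```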

Comparative studies of the FgfD group help us to understand the relative roles of subfunction partitioning and non-functionalization, and the origin of novel functions in generating lineage-specific developmental differences. The expression pattern of the single Fgf8 gene in tetrapods is similar to the summation of the expression domains of fgf8a and fgf8b in teleosts86, 87, 88. For example, in mice, Fgf8 is expressed in somites and in the neural crest91, whereas, in zebrafish and sticklebacks, fgf8a but not fgf8b is strongly expressed in somites and, reciprocally, fgf8b but not fgf8a is strongly expressed in the neural crest87. The DDC model predicts that this type of complementary degeneration of subfunctions provides the opportunity for fgf8a to specialize for the somite function whereas its paralogue fgf8b could specialize for the neural crest function. Thus, the zebrafish and stickleback orthologues of fgf8a and fgf8b could form tools that are specialized for different functions in the toolkits of these two lineages.

Analysis of teleost fgf8 expression patterns suggests that subfunction partitioning might have continued after lineages diverged. For instance, in tetrapods, Fgf8 is essential for the formation of the midbrain–hindbrain boundary (MHB)80. In zebrafish and sticklebacks, both fgf8a and fgf8b are coexpressed in the MHB86, 87, and analysis of mutant zebrafish shows that fgf8a is essential for MHB formation. In medaka, fgf8a is also expressed in the MHB but, unexpectedly, it is not necessary for MHB development92. Furthermore, in medaka, inhibition of fgf8a blocks the formation of the trunk and tail92 but, in zebrafish, only inhibition of both fgf8a and fgf24 together blocks trunk and tail development90. Similarly, paired appendages (limbs and fins) require Fgf8 in chickens and mice93, 94, but require fgf24 rather than fgf8 in fish90. These results show that, after the expansion of the FgfD subfamily, the duplicates initially retained functional redundancies (appendage function for both Fgf8 and Fgf24) that were eventually resolved differently in different lineages by non-functionalization, subfunction partitioning and, presumably, the evolution of new functions. Some of these lineage-specific differences might have been in place to contribute to lineage diversification, but the hypothesis that they were causative remains to be tested.

Genome architecture and development

The previous two sections discussed studies that illustrate toolkit contraction and expansion and how changes in toolkit unity can contribute to the generation of phenotypic diversity. After genome expansion, orthologues can evolve different expression patterns in different species, and paralogues can have different expression patterns within a species. This section shows how comparative genomics can improve our understanding of the mechanisms by which these differences in expression patterns arise, and how mechanisms of conserved gene expression can relate to evolutionary stability of genome architecture.

Conserved non-coding elements. Comparative genomic analysis led to the surprising discovery that most evolutionarily conserved sequences in mammalian genomes are non-coding elements rather than protein-coding genes95. Conserved non-coding elements (CNEs) in amniotes tend to lie in gene-poor areas near developmental genes that encode transcription factors and morphogenetic proteins. Many CNEs include cis-acting regulatory elements that affect the activity of nearby genes96. These considerations suggest the hypothesis that variation in CNEs may contribute to lineage-specific developmental capabilities, which would be predicted by the idea that evolutionary change in gene regulation is a major force in the generation of phenotypic diversity. Recent studies on colour patterns in flies, for instance, reinforce this idea, and show how independent changes in cis-regulatory elements have led to gains and losses of convergent pigment patterns among flies97.

The DDC model predicts that gene regulatory subfunctions should partition between two paralogues and, if CNEs adequately represent at least a portion of the regulatory subfunctions (note that in some cases function can be conserved even though structure is not98), then the hypothesis predicts that ancestral CNEs should distribute between paralogues after genome duplication. Results from comparative studies on plant and animal genomes support this prediction. From a genomic duplication event about 11 million years ago, maize inherited paralogues liguleless2 (lg2) and liguleless-related sequence-1 (lrs1) and, after comparing sequences with their single-copy orthologue in rice, Langham et al. found that of 30 original CNEs one was lost from lg2 and two different CNEs were missing from lrs1 (Ref. 99). The lg2 gene evolved a new role in the development of the ligule after the duplication event99, leading to the hypothesis that the partitioning of subfunctions subsequent to the duplication might have facilitated the origin of this new gene function. Among animals, CNE evolution after R3 has been investigated. For example, two zebrafish co-orthologues of human engrailed homeobox 2 (EN2) have reciprocally partitioned some ancestral expression domains and CNEs, but share others redundantly70, 73. A recent systematic analysis of seven pairs of pufferfish gene duplicates arising in R3 revealed a reciprocal loss of CNEs that are shared with the single-copy human gene, as predicted by the subfunction partitioning hypothesis100.
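The maize counts can be turned into a small bookkeeping exercise that shows what reciprocal CNE loss looks like. The numbers (30 ancestral CNEs, one lost from lg2, two different ones missing from lrs1) come from the text. The CNE identifiers are arbitrary placeholders, because the text does not say which elements were lost, and the function name is hypothetical.

```python
def partition_report(ancestral: set, copy_a: set, copy_b: set) -> dict:
    """Count ancestral CNEs by their fate in two duplicated genes."""
    return {
        "shared": len(copy_a & copy_b),          # retained by both paralogues
        "only_a": len(copy_a - copy_b),          # retained by copy A alone
        "only_b": len(copy_b - copy_a),          # retained by copy B alone
        "lost_from_both": len(ancestral - (copy_a | copy_b)),
    }

# 30 ancestral CNEs inferred from the single-copy rice orthologue.
ancestral_cnes = set(range(1, 31))

# Placeholder IDs: which CNEs were lost is an assumption for illustration.
lost_from_lg2 = {1}
lost_from_lrs1 = {2, 3}

lg2_cnes = ancestral_cnes - lost_from_lg2
lrs1_cnes = ancestral_cnes - lost_from_lrs1

report = partition_report(ancestral_cnes, lg2_cnes, lrs1_cnes)
```

Because the losses are reciprocal, no ancestral CNE is missing from both genes. Most elements remain shared and a few are now held by a single paralogue, which is the signature that the subfunction partitioning hypothesis predicts.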

Not only do paralogues partition ancestral CNEs, but the partitioned CNEs also have specific functions. For instance, after being injected into fish or frog embryos, reporter constructs driven by CNEs for Iroquois genes express in specific tissues that represent the endogenous expression pattern, and cognate CNEs from different taxa have conserved expression domains101, 102. CNEs derived from R1 and R2 are evident in the human genome; for example, some of the elements that are present in the fish co-orthologues of PAX2 are also shared with PAX5 and PAX8, which are paralogues from R1 and R2 (Ref. 100). More than 100 small families of CNEs that are duplicated in the human genome can drive the expression of reporter constructs in phylogenetically similar domains in zebrafish embryos102. These CNEs must have been present more than 550 million years ago in the ancestral pre-vertebrate genes and have apparently preserved their functions during several rounds of genome expansion. A particularly interesting case involving functional tests of CNEs by morpholino gene knockdown shows subfunction partitioning after both the R2 and R3 events: some functions associated with HOXA1 in humans are associated with hoxb1 co-orthologues in zebrafish, consistent with the idea that, after the duplication of the ancestral Hoxa1/b1 gene, ancient CNEs that were retained by both Hoxa1 and Hoxb1 were sorted differently in the tetrapod and teleost lineages; finally, after R3, further subfunction partitioning occurred between the teleost hoxb1a and hoxb1b genes103. These studies lend genomic and functional support to the idea that the reciprocal sorting out of CNEs is a common feature of the evolution of duplicated genes derived from genome duplication events.

It will be of particular interest in the future to identify clade-specific CNEs, for example, those that are conserved among perciform fish (including pufferfish, sticklebacks and medakas) but are not in non-perciform fish (including zebrafish), or those that are found in mammals but not birds or amphibia. Clade-specific CNEs become candidates for regulatory elements that programme the developmental novelties that drive evolution.

Conserved syntenies and developmental regulation. Conserved non-coding elements are sometimes located at great distance from the genes they regulate. What are the consequences of these long-range CNEs for genome structure? An informative example comes from the zebrafish co-orthologues of FGF8. Becker and colleagues104 randomly inserted reporters into the zebrafish genome and found four that recapitulate fgf8a expression, even though one was in an intron of a neighbouring gene with a different expression pattern (fbxw4, which encodes F-box and WD-40 domain protein 4) (Fig. 5). Interestingly, the orthologues of fgf8a and fbxw4 are neighbours not only in humans and zebrafish, but also in the ascidian genome, which diverged from vertebrates before R1 (Ref. 78). This result suggests that fgf8 and fbxw4 are part of a genomic regulatory block (GRB), the members of which must remain intact to ensure proper gene expression104. Human chromosome rearrangements involving FBXW4 and FGF8 cause split-hand/foot malformation105, presumably because they disrupt this GRB. Other human diseases might also result from the disruption of long-range enhancers and, indeed, position-effect human diseases tend to be associated with regions of long-range conserved synteny106. These considerations, and the idea that epigenetic mechanisms of global gene regulation might act on large blocks of genes that are located in specific chromosome territories, support the hypothesis that the evolution of genome architecture might be an important factor in the generation of phenotypic variation.

Sequence comparisons by Vista plot analysis identify CNEs in the intergenic region between FBXW4 and FGF8 that are conserved between humans (Hsa), zebrafish (Dre) and sticklebacks (Gac). The CNE indicated by the red line is conserved near fgf8a, but not fgf8b, in both zebrafish and sticklebacks, suggesting that it partitioned between the two orthologues before these lineages diverged. The CNEs at the tail of the blue lines are conserved by fgf8a and fgf8b in zebrafish but not in sticklebacks, suggesting lineage-specific subfunction loss. The CNE in fbxw4 (position of black arrow at top left) apparently helps regulate fgf8 even though it is within another gene104. Orthologues of FBXW4 and FGF8 are neighbours in chordates from ascidians to mammals, and the embedded regulatory element in FBXW4 orthologues might be responsible for preserving this regulatory block.

Conclusions

Genomics, a descriptive science, has revolutionized our understanding of the history of genome change over time. Comparing the structure of genomes to the evolution of developmental morphologies has transformed our understanding of trait gain and trait loss, and the roles that genome contraction, genome expansion and genome architecture can have in the evolution of developmental mechanisms. These new capabilities unite evolutionary biology, developmental biology and genomics into a new interdisciplinary field. What is now necessary is to turn attention to the genome-wide functional analysis of organisms that are derived from key nodes, a mechanistic science. We must develop technological advances that allow us to turn virtually any species into a 'model organism' for functional studies to appreciate the proximal, developmental causes of morphological change and, eventually, the distal, evolutionary mechanisms of organismal diversification over time.

Avidor-Reiss, T. et al. Decoding cilia function: defining specialized genes required for compartmentalized cilia biogenesis. Cell 117, 527–539 (2004). The authors of references 26 and 27 used comparative genomics to identify genes in organisms that have cilia or flagella, but that are absent from organisms that lack these organelles, and then verified the gene set in D. melanogaster and C. elegans.

Dehal, P. & Boore, J. L. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 3, e314 (2005). Whole-genome analysis of paralogy groups in the human genome is used to test predictions of the idea that two rounds of whole-genome duplication occurred at the base of vertebrate phylogeny.