Abstract

Phylogenetic relationships among the four major lineages of land plants (liverworts, mosses, hornworts, and vascular plants) remain vigorously contested; their resolution is essential to our understanding of the origin and early evolution of land plants. We analyzed three complementary data sets (a multigene supermatrix, a genomic structural character matrix, and a chloroplast genome sequence matrix) using maximum likelihood, maximum parsimony, and compatibility methods. Analyses of all three data sets strongly supported liverworts as the sister to all other land plants, and analyses of the multigene and chloroplast genome matrices provided moderate to strong support for hornworts as the sister to vascular plants. These results highlight the important roles of liverworts and hornworts in two major events of plant evolution: the water-to-land transition and the change from a haploid gametophyte generation-dominant life cycle in bryophytes to a diploid sporophyte generation-dominant life cycle in vascular plants. This study also demonstrates the importance of using a multifaceted approach to resolve difficult nodes in the tree of life. In particular, we show that densely sampled taxon trees built with multiple genes provide an indispensable test of taxon-sparse trees inferred from genome sequences.

The origin and early evolution of land plants (embryophytes) during the mid-Ordovician to lower Silurian (480–430 million years ago) initiated the establishment of the modern terrestrial ecosystems and fundamentally altered the course of evolution of life on earth. Two important events marked this period of unprecedented innovation in plant evolution: the massive colonization of the land by plants descended from charophyte algae and the change of the dominant generation in the plant life cycle from a haploid gametophyte to a diploid sporophyte (1–5). The first event opened a vastly underexplored niche of high-intensity solar radiation and abundant CO2 to photosynthetic life. The second event conferred on plants two abilities to adapt to a life in a water-deficient and UV-abundant terrestrial environment. One is the ability to produce a large number of genetically diverse gametes to ensure fertilization on land where sperm locomotion is hindered, and the other is the ability to mask deleterious mutations through the dominant-recessive interaction of alleles, thus allowing a large number of alleles to persist in the gene pool (2–4). Our understanding of these events hinges on our knowledge of relationships between the organisms involved in these major evolutionary transitions. Despite numerous studies using diverse approaches analyzing morphological and/or molecular characters, relationships among early land plants remain controversial (5–19). Fossil evidence, although increasingly improved, has not helped to resolve the issues decisively (20, 21).

A multitude of phenomena characterizing diversification of many major clades of organisms could complicate reconstruction of the early land plant phylogeny: a large evolutionary gap between outgroup and ingroup, ancient rapid radiations, the occurrence of highly divergent relic lineages, and extinctions. Several other factors might further exacerbate the situation: an incomplete fossil record, evolutionary rate heterogeneity among different characters and lineages, character-state paucity in DNA sequence evolution that results in a disproportionately large number of back mutations, and the occurrence of enigmatic phenomena such as sequence composition bias and RNA editing. These factors create problems for character and character-state homology assessment, compromising the performance of phylogenetic methods (5, 18, 22). Empirical and theoretical studies have provided guidelines for overcoming some of these problems, specifically increasing both taxon and character sampling and selecting well understood characters from diverse sources (7, 10, 12, 22–25).

To resolve the relationships among the four major lineages of land plants, we assembled three different but complementary data sets, each intended to overcome some of the problems. First, six chloroplast (cp; cp-atpB, cp-rbcL, and cp-SSU and cp-LSU rDNAs), mitochondrial (mt; mt-LSU rDNA), and nuclear (nu; nu-18S rDNA) genes from 193 green algae and land plants were sequenced, with additional data obtained from GenBank (referred to as the multigene supermatrix hereafter). These six genes show slow to moderate evolutionary rates and should help achieve a balance between maximizing signal retrieval and minimizing homoplasy. Extensive taxon sampling in this supermatrix is intended to maximize extraction of the phylogenetic signal, because inclusion of a large number of taxa within large clades such as liverworts, mosses, and vascular plants (including isolated relic lineages like Haplomitrium, Treubia, Takakia, and Equisetum) may allow more accurate inference of ancestral states at key nodes by breaking long branches. This broad and extensive sampling strategy also provides a taxon-dense phylogenetic hypothesis for comparison with the results of two other matrices with sparse taxon sampling. Second, 28 genomic structural characters, insertion sites of mt group II introns, which generally show stable inheritance and no rampant horizontal transfer in land plants (7, 11), were investigated for intron absence/presence in 16 taxa using data from our labs and GenBank (the intron matrix). Seventy-two such sites are known in these mt genomes; we included all 28 sites that have sequence from each of the four major land plant lineages. This type of data is not prone to problems commonly associated with DNA sequences (e.g., base composition bias and RNA editing) and is ideal for rooting deep phylogenies (7, 10, 18, 22); it can thus provide an independent assessment of phylogenetic hypotheses derived from nucleotide data.
Finally, DNA and amino acid sequences of 67 genes from complete cp genomes of 36 species were obtained from GenBank. The DNA sequences were further partitioned into six matrices by codon positions and all pair-wise combinations, to examine the effect of functional constraints at different codon positions on phylogenetic relationships. The eight cp-genome matrices, seven of nucleotides and one of amino acids, are character-rich but taxon-sparse. With the multigene and the intron matrices, such data enable us to examine issues that have surfaced repeatedly in analyzing character-dense but taxon-sparse DNA sequence data: relative merits of character vs. taxon sampling, use of nucleotide vs. amino acid sequences, and inclusion/exclusion of different codon positions (12, 13, 16, 17, 22–24, 26).
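The codon-position partitioning described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes an in-frame coding alignment held as a dictionary of strings, and the function name `codon_partitions` and the toy sequences are hypothetical. It yields the seven nucleotide matrices (the full matrix, three single positions, and three pair-wise combinations).

```python
from itertools import combinations


def codon_partitions(alignment):
    """Split an in-frame coding alignment into codon-position matrices.

    `alignment` maps taxon name -> aligned, in-frame nucleotide string.
    Returns a dict keyed by labels such as "1", "1+2", or "1+2+3".
    """
    positions = (1, 2, 3)
    # Single positions, all pairs, and the full set: 3 + 3 + 1 = 7 matrices.
    subsets = [c for r in (1, 2, 3) for c in combinations(positions, r)]
    matrices = {}
    for subset in subsets:
        label = "+".join(str(p) for p in subset)
        matrices[label] = {
            taxon: "".join(
                base for i, base in enumerate(seq) if (i % 3) + 1 in subset
            )
            for taxon, seq in alignment.items()
        }
    return matrices


# Toy, made-up sequences (two codons each); not data from the study.
toy = {"taxonA": "ATGGCC", "taxonB": "ATAGCT"}
parts = codon_partitions(toy)
print(sorted(parts))   # labels of the seven nucleotide matrices
print(parts["3"])      # third codon positions only
```

The amino acid matrix would be built separately by translating the same alignment; that step is omitted here.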

Results

In maximum likelihood (ML) and maximum parsimony (MP) analyses of the multigene supermatrix, liverworts, mosses, hornworts, and vascular plants were all strongly supported as monophyletic groups (Fig. 1). Liverworts were sister to all other land plants with 100% and 91% ML bootstrap (BS) support (100% defines placement of liverworts within land plants, and 91% separates all other land plants from liverworts) and 100% and 89% MP BS support, respectively. Hornworts were sister to vascular plants, with 90% and 100% ML and 76% and 100% MP BS support, respectively. Mosses, including Takakia, were placed between liverworts and hornworts. Within vascular plants, lycophytes were strongly supported to be sister to the clade containing monilophytes (ferns and allies) and seed plants. Both monilophytes and seed plants were strongly supported monophyletic groups, but their basal relationships were not fully resolved.

Fig. 1. The 50% majority-rule BS consensus trees from ML and MP analyses of the multigene supermatrix. The numbers above the branches are ML BS values >50%; those below are MP BS values >50%. For nodes where ML and MP analyses differ in topology, only the ML topology and BS values are shown. The BS values depicting the backbone relationships in land plants are shown in boldface.

In alternative topology test analyses of the multigene supermatrix, we were initially unable to reject the three competing hypotheses: mosses sister to vascular plants (5, 6), hornworts basal (14, 15), and bryophytes monophyletic (13, 16, 17), although the first two were close to the rejection threshold under ML (Tables 1 and 2, which are published as supporting information on the PNAS web site). We then implemented a second constraint search in which relationships within the four major land plant clades were constrained based on the 50% majority-rule BS trees obtained from ML and MP analyses. The well supported relationships we inferred within each of these major clades agree with relationships inferred independently for these clades earlier (25, 27–30). Under these more conservative constraints, all three competing hypotheses were rejected in favor of the optimal topology we inferred using ML and MP (Tables 1 and 2).

In MP analyses of the intron matrix, liverworts were sister to other land plants with <50% and 96% BS support (Fig. 2). Relationships among mosses, hornworts, and vascular plants were not resolved, but the BS analysis placed hornworts with higher support (23% and 47%) than mosses (14% and 47%; data not shown) as the sister to vascular plants. In compatibility analysis (Table 3, which is published as supporting information on the PNAS web site), liverworts were strongly supported as the sister to all other land plants. Both hornworts and mosses were strongly supported to be sister to vascular plants. The hypotheses of hornworts basal or bryophytes monophyletic were both strongly rejected.

In ML and MP analyses of the cp-genome data, analyses of three of the eight matrices (those of all three codon positions, first plus third and second plus third positions) produced strongly supported topologies that were identical to multigene analyses with respect to relationships among the four main land plant lineages (Fig. 3A and B). Similarly, analyses of the first, third, and first plus second position matrices yielded a topology congruent with multigene analyses in terms of hornwort placement but with liverworts and mosses forming either a clade or serial sister groups to hornworts–vascular plants (Fig. 3C). Analyses of the second position and amino acid matrices generated topologies that also placed hornworts with vascular plants but with relationships among basal land plants resolved in an unconventional fashion (e.g., vascular plants were not monophyletic), likely reflecting analytical artifacts caused by conflict between organismal phylogenetic signals and mutation patterns dictated by functional selection at the second codon positions and amino acid sequences (Fig. 3D). In alternative topology test analyses using ML and MP criteria of all three codon position, third, first plus third, and second plus third position matrices, the topology derived from the multigene analyses was favored, and all three competing hypotheses were rejected. In the same analyses of the first, second, and first plus second position and amino acid matrices, at least the mosses being sister to vascular plants hypothesis could be rejected (Tables 1 and 2).

Fig. 3. Results of ML and MP BS analyses of the eight cp-genome matrices. (A) The BS consensus tree from ML analysis of the all-codon-position matrix, with all taxa presented. (B–D) Schematic diagrams of the BS consensus trees from ML analyses of the other seven matrices, with only major land plant lineages indicated. In all trees, ML and MP BS values are presented above and below branches, respectively. In B–D, the BS values for analysis of each matrix, when different, are separated by slashes, and a single BS value of 100 indicates identical values in different analyses. a, Amborella alone was sister to all other angiosperms with 100/77 MP BS support; b, Vitis was sister to all other eudicots with 100/50 MP BS support; c and d, Marchantia was sister to all other land plants with 100/45 and 100/99 MP BS support, respectively; e and f, Psilotum and Adiantum formed a clade sister to all other land plants with 100/63 and 100/55 MP BS support, respectively; and further for f, the clade of bryophytes and lycophytes was sister to other vascular plants; g, Huperzia was sister to Anthoceros plus Selaginella. Ch, charophytes; Lv, liverworts; Ms, mosses; Hw, hornworts.

Discussion

The three data sets analyzed in this study have distinctly different taxon and character samplings, from a large number of taxa in the multigene supermatrix, to a large number of characters in the cp-genome matrix, to nonnucleotide genomic structural characters in the intron matrix. Each of the data sets was analyzed by at least two different methods. The results are generally congruent in identifying the positions of liverworts and hornworts in the land plant phylogeny. Analyses of all three data sets strongly supported liverworts as the sister to all other land plants; analyses of the multigene and cp-genome matrices provided moderate to strong support for the placement of hornworts as the sister to vascular plants. Although the intron matrix was unable to distinguish whether hornworts or mosses were sister to vascular plants, it did not contradict the hypothesis of hornworts being sister to vascular plants. Thus, we think the best explanation of the phylogenetic pattern uncovered here is that it reflects the underlying organismal evolutionary history.

Of the two major findings of this study, the position of liverworts presented is supported by several morphological and molecular characters (5–7, 10) and is probably considered less controversial (ref. 18, but see refs. 14–17 and 19). In contrast, the placement of hornworts as sister to vascular plants is rather novel. Hornworts have been strongly argued to be sister to all other land plants (14, 15). On the other hand, several other studies hinted they could be sister to vascular plants (8–13), and three recent analyses of cp-genome sequences and genomic structural characters are particularly noteworthy in this regard, because BS values supporting this relationship range from 82% to 100% (10, 12, 13). Our cp-genome matrices for the most part obtained the same results (Fig. 3). Previously, strongly supported bryophyte monophyly was recovered in analyses of cp-genome sequences with virtually the same data set except for the lack of lycophytes (16, 17). The analyses performed here and most recently (12, 13) indicate that, after lycophytes were included, a critical taxon sampling density has been reached to resolve relationships among the four major land plant lineages using this type of data. This assessment is corroborated by the congruence between the cp-genome and multigene analyses; the latter includes a dense taxonomic sampling across all major land plant clades (Fig. 1).

There are also several morphological and physiological characters, particularly those related to sporophyte development, that tend to support the hornwort position identified here. These characters were not used in or available to earlier morphological cladistic analyses (5, 6). They include lack of ventral slime papillae, hairs, and/or scales in prothalli (15); the embedded position of gametangia (31, 32); intermingled/interdigitate gametophyte–sporophyte junction (33); spiral thickening on cell walls in the columella (34); the persistently chlorophyllous and nutritionally largely independent sporophyte (32, 35, 36); rhizoid-like behavior of surface cells of the sporophyte foot (35); the longevity and large size of the sporophyte (32, 35); and xylan content in cell walls of pseudoelaters and spores (37). It should be emphasized that, although some of these similarities between hornworts and vascular plants are controversial (5, 6, 15), our results suggest they should be critically reexamined to identify truly synapomorphic changes shared by hornworts and vascular plants. One particular character worth investigating is the nutritionally largely independent free-living sporophyte generation of hornworts (32, 35, 36). The phylogenetic and fossil evidence uncovered over the last several decades clearly supports a charophytic origin of land plants (3, 13, 27) and bryophytes predating vascular plants in geological strata (20, 21). This evolutionary sequence of algae and early land plant lineages seems to reveal a major trend of adaptation to terrestrial environments by plants over the last 480 million years: continuously expanding the sporophyte generation while reducing the gametophyte generation in their life cycles. Hornworts are unique among bryophytes in exhibiting several similar features on both gametophyte and sporophyte: persistent photosynthetic capacity, nutritional independence, as well as plant size and longevity (2, 32, 35, 36). 
Also, there are similarities between sporophytes of hornworts and early vascular plants (2, 15, 31–37). Hence, one might interpret that hornworts, among three extant bryophyte lineages, approach closest toward vascular plants in their sporophyte development in terms of achieving an independent free-living sporophyte generation. The elaborate nutritionally largely independent sporophyte generation of hornworts can then perhaps be taken as evidence to support their close relationship to vascular plants.

The phylogeny of early land plants reconstructed here sheds significant light on our understanding of alternation of generations in land plants, which was elaborated upon by the antithetic hypothesis (1, 2, 31, 35). According to this hypothesis, the diploid sporophyte generation was interpolated into the life cycle of charophytes through a delay of meiosis after fertilization. The sporophyte generation expanded as bryophytes evolved, accompanied by structural elaboration and progressive sterilization of potentially sporogenous tissues (which might develop into columella), and ultimately became a dominant generation in the life cycle of vascular plants. Hornworts were envisioned as the transitional bryophytes to vascular plants by some advocates of this hypothesis (31, 35). Although the charophytic ancestry of land plants was recognized several decades ago (3), the hornwort position in this evolutionary scenario has never been seriously considered since its initial proposal. Instead, mosses were often suggested as the bryophytes most closely related to vascular plants (5, 6). Switching the position of mosses with hornworts as the sister to vascular plants calls for a reassessment of homology of the characters used to connect mosses with vascular plants, e.g., hydrom and leptom in mosses and xylem and phloem in vascular plants (ref. 6, but see ref. 5), and the seta of mosses and stem of vascular plants (refs. 5 and 6, but see ref. 38). Moreover, the characters mentioned above that enable sporophytes to achieve nutritional independence from gametophytes, which have received little attention so far in the discussion of early land plant evolution, need to be reinvestigated. These features may represent potential preadaptations of the hornwort sporophyte to becoming a free-living generation, which is found only in vascular plants among all extant land plants.
In this regard, it is noteworthy that biennial nearly free-living sporophytes, with the gametophytic tissues around the base of the sporophyte discolored and collapsed, were found in the wild for Anthoceros fusiformis, and that excised sporophytes survived independently of the gametophyte on sterile soil for 3 mo (35). In liverworts and mosses, the sporophyte is strictly matrotrophic on the gametophyte. There also seems to be paleobotanical evidence that supports the hornwort position identified here. The extinct prevascular plant Horneophyton lignieri, shown to be positioned between bryophytes and vascular plants (5), exhibits several features reminiscent of hornworts: a massive lobed rhizome (like the lobed foot of Anthoceros), a shoot terminating in a single sporangium, stem anatomy, growth habits of sporophytes (35), and an unequivocal columella in the sporangium (5). Finally, the lobed foot of the hornwort sporophyte, with rhizoid-like absorbing cells on the surface (35), may be homologous to the protocorm of some lycophytes, development of which has been interpreted as essential for the establishment of a free-living sporophyte (1). Again, we emphasize that some of these interpretations on homology between structures of hornworts and vascular plants may turn out to be inaccurate, but our intention is to bring the previously unexplored congruence of so many potential synapomorphies to future investigators' attention so that their evolutionary significance can be properly assessed.

This study analyzing three different but complementary data sets to resolve two of the most recalcitrant issues in plant phylogeny demonstrates the power of a multifaceted approach in molecular phylogenetics (22). Recently, the genome-scale approach has been heralded as the primary way to resolve incongruence in molecular phylogenies (26). Several studies directly followed this approach attempting to resolve controversies on the early land plant phylogeny (refs. 12, 13, 16, and 17, and this study), but the results were mixed. For example, analyses of cp-genome data indicate that the addition of a single lineage, the lycophytes, has a dramatic effect on the resolution of relationships among the four major land plant lineages. This situation is paralleled by another hotly debated case in plant phylogenetics, resolution of relationships among the earliest angiosperms, where tree topology was also shown to be extremely sensitive to taxon sampling (24). In both cases, results from taxon-dense data sets have played an instrumental role in helping to evaluate results from genome sequence data. Two other merits of taxon-dense data sets suggest they should be pursued in all phylogenetic studies: identification of previously unrecognized large clades (e.g., monilophytes and eudicots) and discovery of species that occupy pivotal positions in major evolutionary transitions (e.g., Amborella trichopoda). It is only with such taxon-dense data sets that phylogenetic studies can fully realize their potential of illuminating evolutionary patterns and guiding other evolutionary studies. Hence, we suggest that use of multiple data sets of different character and taxon sampling schemes will be most effective and efficient in resolving other difficult nodes in the tree of life.

Materials and Methods

The Data Matrices.

The multigene supermatrix had 193 taxa with 188, 191, 192, 192, 171, and 188 sequences for cp-atpB, cp-rbcL, cp-LSU, cp-SSU, mt-LSU, and nu-18S, respectively. Among these data, 110, 52, 184, 154, 130, and 71 sequences, respectively, were generated by us for these six genes. In the intron matrix of 28 positions by 16 taxa, there were 66 missing entries. Data for 42 entries were collected in this study. For the cp-genome matrices, all data were downloaded from www.ncbi.nlm.nih.gov/genomes/ORGANELLES/plastids_tax.html, except Selaginella uncinata, which was from http://getentry.ddbj.nig.ac.jp. All matrices are deposited in TreeBASE (www.treebase.org) under SN3006–12390. Other details on assembly of the three matrices are provided in Supporting Text, which is published as supporting information on the PNAS web site.

Phylogenetic Analyses.

The multigene and the cp-genome matrices were analyzed by ML and MP methods, and the intron matrix was analyzed by MP and compatibility methods. For ML analyses, an optimal model of nucleotide evolution was selected by using the Akaike Information Criterion as implemented in Modeltest, Ver. 3.07 (39). ML analyses of DNA sequences were implemented in PHYML, Ver. 2.4.4 (40), or TreeFinder (Ver. October 2005; ref. 41) under the GTR+I+Γ model with all parameters estimated from the data. ML analyses of the amino acid data were similarly conducted with PHYML, Ver. 2.4.4, or TreeFinder using the cpREV+I+Γ model with all parameters estimated from the data. cpREV is a model optimized for plastid genome data and is therefore preferred to more general models such as JTT or mtREV (42). ML BS analyses were implemented in PHYML by using 1,000 resampling replicates for both the multigene and the cp-genome matrices under the optimal model of nucleotide or amino acid evolution.
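For readers unfamiliar with the model-selection step, the Akaike Information Criterion is AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the model with the lowest AIC is preferred. The sketch below is not Modeltest itself. The log-likelihood values are invented for illustration, and the parameter counts cover only the substitution model (branch lengths, shared by all candidates on a fixed tree, are left out).

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates):
    """Return (name, AIC) for the candidate with the smallest AIC.

    `candidates` maps model name -> (log-likelihood, free-parameter count).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best_name = min(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical log-likelihoods, invented for illustration only.
# Substitution-model parameter counts: JC69 = 0, HKY85 = 4 (kappa + 3 base
# frequencies), GTR = 8 (5 rates + 3 frequencies), GTR+I+G = 10.
candidates = {
    "JC69": (-12500.0, 0),
    "HKY85": (-12100.0, 4),
    "GTR": (-12050.0, 8),
    "GTR+I+G": (-11800.0, 10),
}
best = select_model(candidates)
print(best)
```

With these made-up numbers the richest model wins because its likelihood gain outweighs the penalty for its extra parameters; with real data the outcome depends on the alignment.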

The MP analyses were performed in PAUP*, Ver. 4.0b10 (43). A heuristic search was conducted by using 1,000 random taxon addition replicates, with one tree held at each step during stepwise addition, tree-bisection-reconnection branch swapping, steepest descent option off, MulTree option on, and no upper limit of the MaxTree set. For the intron matrix, a branch-and-bound search was conducted. BS analyses were conducted by using 1,000 (for the multigene and the intron matrices) or 10,000 (for the cp-genome matrices) resampling replicates with the same tree search procedure as described above but with simple taxon addition and the steepest descent option on.
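The bootstrap procedure underlying the BS values can be sketched in two parts: resampling alignment columns with replacement to build a pseudoreplicate, and counting how often a clade appears among the replicate trees. This is a generic illustration, not the PAUP* or PHYML implementation. The tree search on each pseudoreplicate is omitted, trees are represented simply as sets of clades, and all names and toy data are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one BS pseudoreplicate)."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in picks)
        for taxon, seq in alignment.items()
    }


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees that contain `clade`.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy, made-up alignment; not data from the study.
toy = {
    "liverwort": "ACGTAC",
    "moss": "ACGTTC",
    "hornwort": "ATGTTC",
    "vascular": "ATGATC",
}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicate = bootstrap_columns(toy, rng)

# Pretend a tree search was run on four pseudoreplicates (hypothetical results).
hv = frozenset({"hornwort", "vascular"})
mv = frozenset({"moss", "vascular"})
replicate_trees = [{hv}, {hv}, {mv}, {hv}]
support = clade_support(replicate_trees, hv)
print(support)
```

A 50% majority-rule consensus keeps only clades whose support computed this way exceeds 50%.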

The compatibility analysis of the intron matrix and alternative topology test analyses of the multigene and the cp-genome matrices are described in Supporting Text.
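The Supporting Text is not reproduced here, so the authors' exact procedure is not shown. For orientation only, a common building block of compatibility methods for binary presence/absence characters is the pairwise four-gamete test: two characters are compatible (can both fit one tree without homoplasy) unless all four state combinations occur among the taxa. The sketch below assumes characters coded as strings of '0', '1', and '?' for missing entries; the function name and example strings are hypothetical.

```python
def compatible(char_a, char_b, missing="?"):
    """Four-gamete test for two binary characters scored on the same taxa.

    Taxa with a missing entry in either character are skipped. The pair is
    compatible unless all four combinations (00, 01, 10, 11) are observed.
    """
    seen = {
        (a, b)
        for a, b in zip(char_a, char_b)
        if a != missing and b != missing
    }
    return len(seen) < 4


# Made-up intron presence/absence patterns across five taxa.
print(compatible("0011?", "0111?"))  # combinations seen: 00, 01, 11
print(compatible("00110", "01011"))  # combinations seen: 00, 01, 10, 11
```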
