Abstract

Rates of biological diversification should ultimately correspond to rates of genome evolution. Recent studies have compared diversification rates with phylogenetic branch lengths, but incomplete phylogenies hamper such analyses for many taxa. Herein, we use pairwise comparisons of confamilial sauropsid (bird and reptile) mitochondrial DNA (mtDNA) genome sequences to estimate substitution rates. These molecular evolutionary rates are considered in light of the age and species richness of each taxonomic family, using a random-walk speciation–extinction process to estimate rates of diversification. We find the molecular clock ticks at disparate rates in different families and at different genes. For example, evolutionary rates are relatively fast in snakes and lizards, intermediate in crocodilians and slow in turtles and birds. There was also rate variation across genes, where non-synonymous substitution rates were fastest at ATP8 and slowest at CO3. Family-by-gene interactions were significant, indicating that local clocks vary substantially among sauropsids. Most importantly, we find evidence that mitochondrial genome evolutionary rates are positively correlated with speciation rates and with contemporary species richness. Nuclear sequences are poorly represented among reptiles, but the correlation between rates of molecular evolution and species diversification also extends to 18 avian nuclear genes we tested. Thus, the nuclear data buttress our mtDNA findings.

1. Introduction

Although ecological or adaptive divergence can lead to diversification (Rieseberg & Wendel 2004), genome evolution at the molecular level is usually considered the starting point of speciation and thus a major driving force underlying species diversification (Coyne & Orr 2004; Martin & McKay 2004; Eo et al. 2008). One of the most basic predictions in evolutionary biology is that the rate of diversification along a particular branch of the tree of life is some function of the rate of genome evolution on that branch. Increased rates of molecular evolution lead to increased genetic variation within a species, and this variation may ultimately be sundered to the point where new species are produced because of evolving reproductive barriers that reduce gene flow. If this process is iterative over evolutionary timescales, species diversity on a branch with fast rates of genome evolution should be greater than on a branch with slower rates of genome evolution, assuming all else is equal (including extinction rates). However, biologists still lack direct evidence to support the fundamental prediction that rates of genome evolution impinge upon diversification rates. A few recent studies have tested for correlations between species richness (or the number of speciation events) and the branch lengths on phylogenetic trees (Barraclough & Savolainen 2001; Webster et al. 2003; Pagel et al. 2006), but such analyses may be encumbered by incomplete phylogenies in some taxa and by difficulties associated with the quantification of diversification rates.

Relative rates of speciation and extinction can be derived based on explicit information about the birth and death of lineages through a particular time interval (Ricklefs 2007). This ratio of speciation to extinction is the basis for some recent methods of diversification rate estimation (Magallón & Sanderson 2001; Bokma 2003). These methods assume a random-walk process of speciation and extinction similar to a stochastic birth–death process in population biology (Bailey 1964). In practice, these methods rely on the estimated age of a clade and the number of extant species it contains. For example, Ricklefs (2006) used nonlinear regression to estimate 0.53 speciation events, and 0.49 extinction events, per million years in passerine birds.

Not only do rates of biotic diversification differ dramatically across phylogenetic lineages, but so too do rates of molecular evolution (Bromham & Penny 2003; Kumar 2005). For example, rodents are evolving rapidly whereas apes (including humans) are evolving slowly compared with other mammalian lineages (Kumar 2005). The avian rate of molecular evolution is thought to be slow relative to most mammals (Mindell et al. 1996), and relative to other reptiles, snakes are evolving rapidly and turtles slowly (Avise et al. 1992; Hughes & Mouchiroud 2001). Most studies, however, have sampled only a few species and/or a small number of genes, and rates of molecular evolution can vary dramatically across genes as well as lineages. For example, globin pseudogenes evolve considerably faster than true genes (Li et al. 1981) and substitution rates are elevated in primate pituitary growth hormone genes compared with other mammals (Wallis 1994). Thus, in order to draw robust conclusions about the link between genome evolution and biotic diversification, diverse phylogenetic lineages and a large sample of genes should be evaluated to identify significant patterns (Kumar & Subramanian 2002). The analysis of the complete mitochondrial genome could be a valuable tool for this purpose, as it consists of 37 genes that are matrilineally inherited together. We do not mean to suggest that natural selection on mitochondrial DNA (mtDNA) substitution rates directly and strongly impacts biotic diversification. However, mtDNA substitution rates are elevated in some lineages where chromosomal rearrangements are rampant (Triant & DeWoody 2006), and both nuclear and mtDNA substitution rates are negatively correlated with mammalian body mass (Welch et al. 2008). Thus, it seems both reasonable and likely that mtDNA substitution rates in a lineage are positively correlated with nuclear DNA substitution rates in the same lineage. If so, mtDNA genome sequences could serve as a proxy for relative rates of nucleotide substitution in nuclear genomes where comparative sequence data are scarce.

2. Material and methods

(a) mtDNA substitution rates

As of 30 March 2009, there were 194 complete mtDNA genome sequences of avian or reptilian species. Of these, 136 species (55 avian, 81 reptilian) belonged to 33 families (15 avian, 18 reptilian) with sequences available from at least two confamilial species. We assembled comparative sequence alignments for each of 28 families, excluding five families represented only by congeneric species, because our analyses focused on interfamily comparisons and the inclusion of congeneric species could lead to underestimates of family-specific evolutionary distances or substitution rates (electronic supplementary material, table S1).

Various mtDNA evolutionary distances were estimated using MEGA4 (Tamura et al. 2007). We first computed the mean number of nucleotide differences per site by pairwise comparison between two confamilial sequences, using the Tamura–Nei (TN) method to correct for multiple hits. For all 13 protein-coding genes, numbers of synonymous and non-synonymous substitutions per site were calculated by comparing sequences codon-by-codon. We also estimated substitutions at fourfold degenerate sites as indicators of neutral evolution. To identify rate heterogeneity across families and across genes, we compared family-specific and gene-specific substitution rates. We estimated ages of reptilian families from the molecular (Hedges et al. 2006) and the palaeontological data (Olmo 2005) to minimize possible errors in divergence time estimates. However, we determined avian ages using molecular data only (Hedges et al. 2006) because the avian fossil record is limited and may underestimate clade ages (Pereira & Baker 2006). The gene-specific substitution rates were estimated as the mean substitution rates over all families, for each gene. We partitioned substitution rate variation into effects of family, gene and family-by-gene interaction and tested each effect using a random-effects analysis of variance model. The variance components were estimated using the maximum-likelihood method.
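To make the multiple-hit correction concrete, the following is a minimal Python sketch of the Tamura–Nei distance between two aligned sequences. It is not the MEGA4 implementation used here: the function names are hypothetical, base frequencies are simply averaged over the two sequences, and any site with a gap or ambiguous base is skipped, all of which are simplifying assumptions.

```python
import math
from collections import Counter

PURINES = {"A", "G"}


def tn93_from_proportions(p1, p2, q, freqs):
    """Tamura-Nei distance from observed proportions of differing sites.

    p1: proportion of sites showing an A<->G transition
    p2: proportion of sites showing a C<->T transition
    q:  proportion of sites showing a transversion
    freqs: base frequencies for A, C, G and T (all assumed non-zero)
    """
    ga, gc, gg, gt = (freqs[b] for b in "ACGT")
    gr, gy = ga + gg, gc + gt  # purine and pyrimidine frequencies
    term1 = -(2 * ga * gg / gr) * math.log(
        1 - gr * p1 / (2 * ga * gg) - q / (2 * gr))
    term2 = -(2 * gc * gt / gy) * math.log(
        1 - gy * p2 / (2 * gc * gt) - q / (2 * gy))
    term3 = -2 * (gr * gy - ga * gg * gy / gr - gc * gt * gr / gy) * math.log(
        1 - q / (2 * gr * gy))
    return term1 + term2 + term3


def tn93_distance(seq_a, seq_b):
    """Tamura-Nei distance between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences carry an unambiguous base.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    counts = Counter(base for pair in pairs for base in pair)
    freqs = {b: counts[b] / (2 * n) for b in "ACGT"}
    p1 = sum(1 for x, y in pairs if {x, y} == {"A", "G"}) / n
    p2 = sum(1 for x, y in pairs if {x, y} == {"C", "T"}) / n
    q = sum(1 for x, y in pairs
            if x != y and (x in PURINES) != (y in PURINES)) / n
    return tn93_from_proportions(p1, p2, q, freqs)
```

With equal base frequencies and equal proportions of the two transition types, this expression reduces to the Kimura two-parameter distance, which gives a convenient sanity check.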

(b) Estimation of diversification rates and species richness

The family-specific diversification rate can be estimated from the size and age of each family if we assume that the ratio of extinction rate to speciation rate is constant across families in a random-walk speciation–extinction process (Ricklefs 2006). Using a fixed ratio (κ = µ/λ) of extinction rate (µ) to speciation rate (λ), we computed the speciation rate of each family using the formula λ = ln[N(1 − κ) + κ]/[(1 − κ)t], and the diversification rate as δ = λ − µ, where N is the number of species and t is the age of a given family (Ricklefs 2006). We set κ = 0.99 after considering estimates based on the large datasets of Bokma (2003) and Ricklefs (2006), but we also used a range of κ (0–0.95 in intervals of 0.05, as well as 0.995) because confidence limits on the estimate of κ can be broad owing to stochastic variation in the speciation–extinction process (Ricklefs 2007). Species richness, determined as the number of species in a given family, was taken from Dickinson (2003) for birds and from the Reptile Database (Uetz et al. 2009) for reptiles. Because there are some taxonomic inconsistencies among the various databases for genome sequences, family ages and species richness, we relied primarily on the species richness databases (Dickinson 2003; Uetz et al. 2009).
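The formula above can be sketched in a few lines of Python. The function names and the `kappa_grid` helper are hypothetical and are not part of the original analysis. The sketch assumes 0 ≤ κ < 1 (the formula is undefined at κ = 1) and that family age is given in millions of years.

```python
import math


def speciation_rate(n_species, age_my, kappa=0.99):
    """lambda = ln[N(1 - kappa) + kappa] / [(1 - kappa) t]."""
    if not 0 <= kappa < 1:
        raise ValueError("kappa must satisfy 0 <= kappa < 1")
    if n_species < 1 or age_my <= 0:
        raise ValueError("need at least one species and a positive age")
    return math.log(n_species * (1 - kappa) + kappa) / ((1 - kappa) * age_my)


def diversification_rate(n_species, age_my, kappa=0.99):
    """delta = lambda - mu, with mu = kappa * lambda."""
    lam = speciation_rate(n_species, age_my, kappa)
    mu = kappa * lam  # extinction rate implied by the fixed ratio kappa
    return lam - mu


def kappa_grid():
    """The kappa values named in the text: 0 to 0.95 in steps of 0.05, plus 0.995."""
    return [round(i * 0.05, 2) for i in range(20)] + [0.995]
```

At κ = 0 (no extinction) the expression collapses to ln(N)/t, the pure-birth estimate, and for any κ the diversification rate equals λ(1 − κ).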

(c) Relationships among substitution rates, diversification rates and species richness

Although we assume no differences in the ratio of extinction/speciation rates across families, our estimates of net speciation rates (diversification rates) vary and reflect variation in the extent of lineage splitting among families (i.e. cladogenesis). We assessed whether the diversification rate is associated with the substitution rates across the mtDNA genome, which would indicate a link between rates of molecular evolution and speciation. To determine whether different fixed ratios of extinction/speciation rates would affect the resulting rates and relationships between diversification rates and substitution rates, we compared diversification rates of each family and coefficients of determination (r²) of the relationships for a range of κ (see above). We also compared species richness with the substitution rates. Correlations between the substitution rate and the diversification rate, and between substitution rate and species richness, were assessed using independent contrasts to accommodate shared ancestry (Harvey & Pagel 1991), as well as by using the raw data. To define the family-level independent contrasts, we used phylogenies in GenBank's Organelle Genome Resources and in Hackett et al. (2008).
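For readers unfamiliar with independent contrasts, the following Python sketch shows the standard calculation introduced by Felsenstein (1985) on a fully bifurcating tree. It is an illustration under stated assumptions, not the procedure or software used in this study. The nested-tuple tree encoding, the function names and the choice to correlate contrasts through the origin are all assumptions made for the example.

```python
import math


def independent_contrasts(tree, values):
    """Standardized contrasts for one trait on a bifurcating tree.

    A leaf is a taxon name (str). An internal node is a pair
    ((left_child, left_branch_length), (right_child, right_branch_length)).
    `values` maps taxon names to trait values.
    Contrasts are returned in post-order (left subtree first).
    """
    contrasts = []

    def visit(node):
        # Returns (estimated trait value, extra branch length to add above node).
        if isinstance(node, str):
            return values[node], 0.0
        (left, v_left), (right, v_right) = node
        x_left, extra_left = visit(left)
        x_right, extra_right = visit(right)
        v_left += extra_left    # lengthen branches to reflect estimation error
        v_right += extra_right
        contrasts.append((x_left - x_right) / math.sqrt(v_left + v_right))
        # Ancestral value: average weighted by inverse branch length.
        x_node = (x_left / v_left + x_right / v_right) / (1 / v_left + 1 / v_right)
        return x_node, v_left * v_right / (v_left + v_right)

    visit(tree)
    return contrasts


def contrast_correlation(cx, cy):
    """Correlation of two sets of contrasts, computed through the origin."""
    sxy = sum(a * b for a, b in zip(cx, cy))
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute contrasts separately for substitution rate and for diversification rate (or species richness) on the same family-level tree and then pass both lists to `contrast_correlation`. The sign of each contrast is arbitrary but consistent across traits, which is why the correlation is forced through the origin.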

(d) Validation using avian nuclear genes

To extend our mtDNA inferences, we collected avian sequences from 18 nuclear exons or introns that Ericson et al. (2006) or Hackett et al. (2008) used to study the diversification of modern birds (electronic supplementary material, tables S2 and S3). This nuclear dataset (22 kbp total) is phylogenetically well-represented (52 families from 229 species). We then repeated many of the tests described earlier.

Figure 1. Family-level phylogenetic relationships and family-specific substitution rates of complete mtDNA genomes in birds and non-avian reptiles (turtles; squamates; crocodiles). Mean evolutionary rates of each family were estimated by dividing each of the following measures by the divergence time of that family: (a) Tamura–Nei substitutions (TN); (b) substitutions at fourfold degenerate sites (fourfold); (c) synonymous substitutions (Ks); and (d) non-synonymous substitutions (Ka). Error bars represent standard deviations from data of multiple species pairs.

The gene-specific non-synonymous substitution rates across families ranged from 0.70 × 10⁻⁹ at CO3 to 3.16 × 10⁻⁹ at ATP8 (figure 2). There was little variance in the synonymous substitution rates across genes (electronic supplementary material, figure S1 and appendix S2), and the mean rate was 10.39 × 10⁻⁹. For all measures of evolutionary rates, family and family-by-gene terms were statistically significant, with the family effect being the strongest, whereas the gene effect was weak (electronic supplementary material, appendix S2 and table S4).

4. Discussion

Our results are consistent with published data in that they support the general idea of lineage-specific molecular clocks among birds and reptiles. Previous studies have also suggested that evolutionary rates vary among lineages and genes (Martin & Palumbi 1993; Hughes & Mouchiroud 2001; Kumar & Subramanian 2002). Here we show that as gauged using entire mtDNA genome sequences, snakes and lizards are generally characterized by rapid substitution rates, whereas rates are generally slower in turtles and birds.

In principle, the pattern of ‘fast’ snakes and lizards compared with ‘slow’ turtles and birds could be an idiosyncratic artefact of the species or genes we analysed, but we accounted for this to the extent possible by analysing every protein-coding gene in the mtDNA genome. There is considerable rate heterogeneity in the mtDNA dataset, as exemplified by about a 10-fold increase in the TN substitution rate among Amphisbaenidae, Sylviidae or Colubridae as compared with Rheidae (figure 1). It remains to be seen whether this rate heterogeneity extends to sauropsid nuclear genes, as recent work in mammals indicates that rate differences within the rodents and the primates are similar in magnitude to those between groups (Kumar & Subramanian 2002). Unfortunately, there are still too few sequences available for a robust comparison of rate heterogeneity among sauropsid mtDNA and nuclear genes.

Our most important findings are the positive correlations between mtDNA substitution rates and diversification rates, and between mtDNA substitution rates and contemporary species richness. These correlations suggest there is a direct link between evolutionary rates at the molecular level and biological diversification via speciation and extinction. Barraclough & Savolainen (2001) first revealed the correlation between substitution rates and species richness in flowering plants using relative estimates of substitution rate variation among sister-family pairs, whereas Webster et al. (2003) investigated this relationship using the number of nodes in a phylogenetic tree as a proxy for the speciation rate. In contrast, our approach was to consider absolute mtDNA genomic rate variation within families. We extend earlier studies by providing integrated evidence to support the idea that rapid substitution rates increase speciation rates, which then result in a net increase in contemporary species richness. We connected these correlations by estimating the speciation (and diversification) rates of each family, and found that they are linked.

As an example, the diversification rate of the avian family Rheidae is 0.01 if we assume that 1 per cent of newly formed species survive (i.e. 99 per cent go extinct). The Rheidae has only two extant species and their substitution rate was one of the slowest among the families we analysed, whereas the Colubridae has more than 1000 extant species and their substitution rate was one of the fastest analysed. This means that 0.01 diversification events occur every million years in the Rheidae compared with 7.31 new species per million years in the Colubridae (roughly a 700-fold difference). Note that the value of κ we used was estimated not from our own data but from other large datasets (Bokma 2003; Ricklefs 2006) in order to reduce bias. Of course, these absolute speciation rates should still be interpreted with caution because the assumption of a fixed κ results in a broad confidence interval (Magallón & Sanderson 2001; Bokma 2003; Ricklefs 2006). For instance, the estimated diversification rate in the Colubridae varies dramatically (0.18, 0.33 and 11.50 speciation events per million years) when we applied κ = 0, 0.5 and 0.995, respectively. Furthermore, we assumed homogeneity for the sake of simplicity although κ is probably heterogeneous among lineages in nature. Finally, the extinction rate is a factor that may be independent of the inherent speciation rate as extinction risks are affected by intrinsic as well as extrinsic factors (see references in Fisher & Owens 2004). Despite these caveats, the coefficients of determination (r²) in the association between substitution rates and diversification rates (or species richness) do not vary with respect to κ, suggesting the associations are strong regardless of the relative ratio of extinction and speciation.

Fossil records are very important in dating divergence times and therefore in estimating molecular evolutionary rates of each family. However, limitations of the fossil record (especially with regard to birds) make it necessary to consider alternative plausible divergence times (Pereira & Baker 2006). Thus, we repeated our analyses on different timescales using molecular data, fossil data and a combination of the two to reduce errors in divergence times associated with a single data source. Despite the various evolutionary ages used in our analyses, we found significant relationships among mtDNA evolutionary rates, diversification rates and species richness in more than 90 per cent of the tests. This suggests that the associations we report are not artefacts of the particular divergence time estimates used.

Tests for relationships among molecular evolutionary rates, diversification rates and species richness should be much more powerful if based on multiple genes. The mitochondrial genome consists of 37 genes and their substitution rates (in particular Ka) vary as shown in electronic supplementary material, table S4. However, the mtDNA molecule is inherited as a single haplotype because of the lack of recombination. Thus, we also evaluated the relationships among substitution rates, diversification rates and contemporary species richness using nuclear genes (all from birds, as there is a paucity of reptile nuclear data). The avian nuclear data generally mirror the mtDNA in that all 18 genes analysed revealed significant positive relationships between substitution rates (TN) and diversification rates. When we consider Ks, Ka and fourfold substitution rates for avian nuclear genes, 93 per cent of the statistical tests indicate significant positive associations with diversification rate (most with p < 0.001). Associations between substitution rates and contemporary species richness also mirrored the mtDNA data (electronic supplementary material, table S7). We maintain it is no accident that rates of molecular evolution correspond to rates of sauropsid diversification and to contemporary species richness.

Acknowledgements

We thank the DeWoody laboratory group for their comments on the manuscript. We are also grateful to R. E. Ricklefs and R. G. Harrison for helpful comments, to the National Science Foundation and to the Office of the Provost at Purdue for funding through the University Faculty Scholar programme.