{"title"=>"Statistical inference on the mechanisms of genome evolution", "type"=>"journal", "authors"=>[{"first_name"=>"Michael", "last_name"=>"Lynch", "scopus_author_id"=>"7401592985"}], "year"=>2011, "source"=>"PLoS Genetics", "identifiers"=>{"sgr"=>"79959819623", "pmid"=>"21695228", "pui"=>"362058219", "isbn"=>"1553-7404", "scopus"=>"2-s2.0-79959819623", "doi"=>"10.1371/journal.pgen.1001389", "issn"=>"15537390"}, "id"=>"e34fdeca-dd76-3262-9c41-2796eb567b83", "abstract"=>"Introduction In a series of publications, I and my colleagues have developed hypotheses for how the evolution of various aspects of genome architecture is expected to proceed under conditions in which the forces of random genetic drift and mutation predominate (e.g., [1]–[15]). These models, collectively referred to below as the mutational-hazard (hereafter, MH) hypothesis, are sometimes represented as neutral models [16], [17], but this is not correct, as the key component of each model is the deleterious mutational consequence of excess DNA. The MH hypothesis is, however, a nonadaptational model, in that it yields expectations on the structure of genomes without invoking external selective forces. It is likely that some aspects of these models will need to be changed as more is learned about the molecular consequences of various aspects of gene structure and the nature of mutation. Such modifications will not alter the need for baseline null hypotheses in attempts to defend adaptive explanations for variation in genomic architecture [9]. Nevertheless, any theory that strives to provide a unifying explanation for diverse sets of genomic observations must be scrutinized extensively from a variety of angles and interpreted in the context of well-established molecular and population-genetic processes. 
Although I will argue that a recent challenge to the MH hypothesis by Whitney and Garland ([18]; hereafter, WG) contains numerous problems, this exchange may help clarify more broadly misunderstood issues. Errors in Statistical Logic and Analysis Statistical theory provides a framework for rigorously testing hypotheses in biology, with two of the more dramatic examples being the formal theory of quantitative genetics [19] and phylogenetic inference [20]. Nevertheless, the utility of statistical methods for hypothesis testing depends critically on the extent to which the underlying model assumptions match the features of the system under investigation. Like an ill-defined verbal argument, overconfidence in an inappropriate quantitative analysis can lead to misleading interpretations. Unfortunately, because large-scale changes in genomic architecture emerge on time scales of tens to hundreds of millions of years, tests of general theories of genome evolution are highly reliant on comparative data. This can raise issues regarding the significance of hypothesis tests when the underlying data share evolutionary history. Since Felsenstein [21] introduced the rationale for the phylogenetic comparative method, various derivative techniques have been developed, some by the author of this paper [22], [23]. These approaches have been used broadly in evolutionary ecology, although not always with good justification (as emphasized in [24]–[26]). Using such methods, WG concluded that phylogenetic diversity of genomic features is unaffected by variation in the power of random genetic drift, challenging the MH hypothesis, but there are at least four classes of statistical problems associated with this study. First, the analyses employed by WG are only justified when the characters under consideration have some possibility of shared evolutionary history among related taxa. The degree to which history is shared across related lineages is often unclear with phenotypic traits. 
However, the issues are well-understood for the central variable in the analyses of WG, the level of average nucleotide heterozygosity at silent sites (πs), which has an expected value of Neu under mutation-drift equilibrium (where Ne is the effective population size, and u is the base-substitutional mutation rate per nucleotide site; ignoring, for simplification, the factor of 4 or 2 that should precede this expression in diploid versus haploid populations). The expected coalescence time for a neutral gene genealogy, 4Ne generations in a diploid species, is dramatically less than the divergence time between even the most closely related species in WG's analysis (e.g., Mus and Homo, Drosophila and Anopheles, none of which share ancestral polymorphisms). Therefore, if any trait can be stated as having no shared phylogenetic history in the analyses of WG, it is the estimator of Neu. Although all traces of ancestral πs values have been erased many times over for the taxa in this study, one could perhaps still argue that some shared history remains with respect to the underlying population size and mutation rate determinants in some pairs of lineages, which might allow similar heterozygosity values to re-emerge. It is notable, however, that there is considerable turnover among lineages in the genes encoding for enzymes that dictate the mutation rate, with the replication polymerases in eukaryotes and eubacteria not even being orthologous, and the repair polymerases in numerous eukaryotic lineages being absent from others. In any event, this concern is dwarfed by other limitations, including the very high sampling variance associated with πs estimates (the standard errors of estimates often being of the same order of magnitude as the estimates themselves), and the unknown element of temporal variation on time scales exceeding Ne generations. 
Because of such enormous sampling variation, this author has generally simply reported average estimates of πs across wide phylogenetic groups (e.g., [5]). By deriving independent contrasts on πs, WG greatly inflated the sampling variance of this parameter, and it can be shown that this problem alone will cause a ~30% decline in expected r2 values involving correlations with other traits. An equally substantial problem is associated with the strict interpretation of πs as a measure (or linear correlate) of Neu across all of life. Most notably, many prokaryotes appear to approach the maximum level of Ne (and minimum level of u) dictated by the effects of selection on linked genes [7], [15], in which case, the independent contrasts of true values of Neu between such species pairs will be essentially randomly distributed around zero. This problem is compounded by the downward bias in πs-based estimates of Neu in unicellular species that results from selection on silent sites [5], [7], [27], [28]. Even if we can be confident that Neu is much higher in prokaryotes than in vertebrates, the estimates based on πs may be off by more than an order of magnitude [7]. Owing to the long time scale on which genomic alterations accrue, the concern for shared evolutionary history in such attributes might in some cases be more justified. However, for the lineages evaluated by WG, such phylogenetic inertia is overshadowed by other evolutionary effects. For example, for the two most closely related species included in the WG analysis, mouse and human (and most other eutherian mammals), numerous shared features of genome architecture are a consequence of convergent evolution, not shared ancestry [29]; the same is true of the ancestral species leading to the land-plant and metazoan lineages [7]. 
The complete turnover of various mobile-element families among eukaryotic lineages provides additional compelling evidence for the absence of strong phylogenetic effects among the taxa examined by WG. Thus, as in the case of factors influencing the mutation rate, it is unclear whether the aspects of shared biological history that are the targets of the WG analysis are any more meaningful than applying a similar strategy in combined study of bat, bird, and insect wings. Second, use of a phylogenetic tree with questionable branch lengths will further obfuscate any phylogenetic analysis, as branch-length scaling must yield uniform sampling variances of the contrast data for downstream hypothesis tests to be valid. In an attempt to remove such issues, WG standardized all branch lengths to unit length, although there are no obvious evolutionary models that would produce the desired behavior for the characters examined. The relevant time scale for evolutionary processes is the number of generations per branch, whereas phylogenetic trees are simply based on net accumulations of nucleotide substitutions. Under the assumption that the molecular sites on which a tree is based are neutral (which can be questioned), the rate of mutation accumulation would be proportional to the product of the per-generation mutation rate and the number of generations elapsed. The first quantity varies by approximately two orders of magnitude among the species in this study [15], and the generation length varies by more than five orders of magnitude (from <1 hour to ~20 years). Thus, at the very least, the consequences of the arbitrary scaling to equal branch lengths are obscure. A more significant issue is the validity of the topology of the phylogenetic tree employed. WG appear to have simply spliced together subtrees from several independent studies, many aspects of which continue to be highly debated. 
These include the issues of whether echinoderms and tunicates are monophyletic, and whether nematodes and arthropods are united in the ecdysozoa. Most phylogeneticists agree that the deep branching positions of all of the major eukaryotic lineages other than animals, fungi, and slime molds are highly uncertain. Thus, although some phylogenetic nonindependence may have been removed in the analyses of WG, numerous spurious internal relationships were also likely created, rendering the analysis much less rigorous than the authors imply. Third, perhaps the most fundamental issue of the analysis of WG is the very nature of the hypothesis test that was carried out. Although the authors assumed that various measures of genome architecture will be linearly related to πs on a logarithmic scale under the MH hypothesis, this is not what the theory predicts. Rather, the theory predicts a threshold response to Neu (or Ne) for many aspects of genome architecture, and such scaling can be seen in many genomic contexts, ranging from intron investment to mobile-element contributions to gen", "link"=>"http://www.mendeley.com/research/statistical-inference-mechanisms-genome-evolution", "reader_count"=>182, "reader_count_by_academic_status"=>{"Unspecified"=>2, "Professor > Associate Professor"=>18, "Researcher"=>54, "Student > Doctoral Student"=>5, "Student > Ph. D. Student"=>50, "Student > Postgraduate"=>7, "Student > Master"=>12, "Other"=>7, "Student > Bachelor"=>9, "Lecturer"=>4, "Lecturer > Senior Lecturer"=>1, "Professor"=>13}, "reader_count_by_user_role"=>{"Unspecified"=>2, "Professor > Associate Professor"=>18, "Researcher"=>54, "Student > Doctoral Student"=>5, "Student > Ph. D. 
Student"=>50, "Student > Postgraduate"=>7, "Student > Master"=>12, "Other"=>7, "Student > Bachelor"=>9, "Lecturer"=>4, "Lecturer > Senior Lecturer"=>1, "Professor"=>13}, "reader_count_by_subject_area"=>{"Unspecified"=>7, "Biochemistry, Genetics and Molecular Biology"=>11, "Agricultural and Biological Sciences"=>152, "Medicine and Dentistry"=>1, "Psychology"=>2, "Social Sciences"=>1, "Computer Science"=>7, "Earth and Planetary Sciences"=>1}, "reader_count_by_subdiscipline"=>{"Medicine and Dentistry"=>{"Medicine and Dentistry"=>1}, "Social Sciences"=>{"Social Sciences"=>1}, "Psychology"=>{"Psychology"=>2}, "Earth and Planetary Sciences"=>{"Earth and Planetary Sciences"=>1}, "Agricultural and Biological Sciences"=>{"Agricultural and Biological Sciences"=>152}, "Computer Science"=>{"Computer Science"=>7}, "Biochemistry, Genetics and Molecular Biology"=>{"Biochemistry, Genetics and Molecular Biology"=>11}, "Unspecified"=>{"Unspecified"=>7}}, "reader_count_by_country"=>{"Uruguay"=>1, "United States"=>12, "United Kingdom"=>2, "Spain"=>2, "Russia"=>1, "New Zealand"=>1, "Czech Republic"=>1, "Austria"=>1, "Sweden"=>3, "Norway"=>1, "Brazil"=>5, "Italy"=>1, "France"=>1, "Chile"=>1, "Germany"=>4}, "group_count"=>4}