The Bayesian method uses Bayesian model selection to compare different species-delimitation models in the multispecies coalescent framework, and uses reversible-jump Markov chain Monte Carlo (rjMCMC) to estimate the posterior probabilities for different delimitation models... The method accommodates multiple loci, and does not require reciprocal monophyly of inferred gene trees... The three or five sequences from each of the eight populations were constrained to be monophyletic... The substitution model used was GTR, since RAxML does not implement the JC69 model... Also, RAxML does not implement the molecular clock and infers unrooted trees instead... Given the guide tree, the nuclear sequence data (either one locus or five loci) simulated above were analyzed using bpp version 2.2 to delimit species... Note that we used only the population tree topology inferred by the two methods (RAxML/beast and *beast), and ignored any support measures for clades on the tree, such as the bootstrap support values calculated by RAxML and the posterior clade probabilities calculated by *beast... The results show clear effects of the species phylogeny (in particular, the lengths of the internal branches reflecting species divergence times), the mutation rate, and the number of loci... For example, clade ABC in tree 1 is recovered in only 55% of replicate data sets at the low-mutation rate (Fig. 2a)... A single locus at the low-mutation rate does not contain enough information to infer the correct guide tree... The better performance of bpp for the large sample size appears to be largely due to the increased information content for species delimitation since the improvement in guide-tree inference is moderate... A previous simulation found that increasing the number of sequences sampled from the same species improves species delimitation by bpp, leading to both reduction of false positives (over-splitting errors) and increase of power (correctly delimiting distinct species)... 
In our simulation, we assumed no gene flow (migration, hybridization, or introgression) after species divergence, and conflicts between gene trees from different genomic regions or between mitochondrial and nuclear loci are entirely due to ancestral polymorphism and incomplete lineage sorting... Although the results suggest that a few loci of sequence data are insufficient for structurama to assign individuals to populations reliably, the impact of assignment errors on species delimitation by bpp under more realistic scenarios remains unknown.
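As a rough illustration of how rjMCMC output is turned into posterior probabilities for delimitation models, the probability of each model can be estimated as the fraction of retained MCMC samples in which the chain visits that model. The sketch below assumes the sampled models are available as a plain list of labels; the label strings, the function name, and the `burn_in` parameter are hypothetical and do not reflect the output format or options of bpp.

```python
from collections import Counter


def posterior_model_probabilities(sampled_models, burn_in=0):
    """Estimate posterior model probabilities as visit frequencies.

    sampled_models: sequence of hashable model labels, one per MCMC sample
                    (hypothetical labels, not bpp's own output format).
    burn_in:        number of initial samples to discard.
    """
    kept = list(sampled_models)[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return {model: n / len(kept) for model, n in counts.items()}


# Toy chain over three delimitation models for populations A, B, C.
# "A|B|C" means three species; "AB|C" means A and B are lumped together.
samples = ["ABC", "AB|C", "A|B|C", "A|B|C", "A|B|C", "AB|C"]
probs = posterior_model_probabilities(samples, burn_in=2)
print(probs)
```

With the first two samples discarded, the three-species model accounts for three of the four retained samples. A real analysis would need many more samples and a check that independent runs agree.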

Figure 3: Frequency (out of 1000 replicates) at which each clade in the correct population (guide) tree is recovered by RAxML and beast in the analysis of the mitochondrial locus. The numbers above the branch are for the low-mutation rate whereas those below the branch are for the high rate. See legend to Figure 2.

Mentions:
Given the species trees of Figure 1a and our simulation design, the correct trees for the eight populations (i.e., the correct guide trees) are those shown in Figures 2 and 3. The proportion of replicates (out of 1000) in which each clade on the correct guide tree is recovered in the inferred guide tree is also shown (Figs. 2 and 3), calculated using the consense program in the phylip package version 3.69 (Felsenstein 2005). Note that we used only the population tree topology inferred by the two methods (RAxML/beast and *beast), and ignored any support measures for clades on the tree, such as the bootstrap support values calculated by RAxML and the posterior clade probabilities calculated by *beast. The results show clear effects of the species phylogeny (in particular, the lengths of the internal branches reflecting species divergence times), the mutation rate, and the number of loci. A longer internal branch on the species tree makes the corresponding clade easier to recover. A higher mutation rate means that the sequences are more divergent and more informative about the phylogeny (Yang 1998). Similarly, more loci mean more data, so the inference is more reliable. These patterns are easy to understand and are similar to findings from numerous simulation studies that examine the performance of different phylogenetic methods (for review, see Yang [2006, Chapter 6]).
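The clade-recovery proportions described above can be illustrated with a small sketch. The study itself used the consense program in phylip; the code below is not that program but a minimal stand-in that counts, for each clade on a known tree, the fraction of inferred trees containing the same clade. It assumes rooted trees written as nested tuples of tip labels; the function names and the toy four-population trees are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return all clades of a rooted tree as frozensets of tip labels.

    The tree is a nested tuple of strings, e.g. ((("A", "B"), "C"), "D").
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)  # record every internal node as a clade
        return members

    tips(tree)
    return found


def recovery_frequency(true_tree, inferred_trees):
    """Proportion of inferred trees that contain each clade of true_tree.

    The root clade (all tips) is skipped because every tree contains it.
    """
    true_clades = clades(true_tree)
    all_tips = max(true_clades, key=len)
    counts = Counter()
    for tree in inferred_trees:
        counts.update(true_clades & clades(tree))
    n = len(inferred_trees)
    return {c: counts[c] / n for c in true_clades if c != all_tips}


# Toy example: one correct tree and four "replicate" inferred trees.
true_tree = ((("A", "B"), "C"), "D")
inferred = [
    ((("A", "B"), "C"), "D"),   # fully correct
    ((("A", "C"), "B"), "D"),   # recovers ABC but not AB
    (("A", "B"), ("C", "D")),   # recovers AB but not ABC
    (("A", "B"), ("C", "D")),
]
freq = recovery_frequency(true_tree, inferred)
for clade, p in sorted(freq.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), p)
```

In this toy example clade AB is recovered in three of the four trees and clade ABC in two. Unrooted trees, such as those inferred by RAxML, would need to be rooted or compared by bipartitions instead, which this sketch does not handle.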

