Hidden forests in the trees: erroneous species boundaries from genomic approaches

Lacey Knowles & Jeet Sukumaran

Scientists rely upon the accurate detection of species to address questions about
biodiversity and biology more generally. The unprecedented amount of DNA sequence
data made available by recent technological advances is changing how biologists identify
species. Specifically, applications of genomic data have great power to reveal the
boundaries separating species. However, this technological advance for delimiting
species was only made possible by a fundamental shift in how DNA is conceptualized
in the context of different species: namely, a focus on the lineages of species themselves,
as opposed to the lineages of individual genes. In this framework, species
lineages may contain shared gene lineages, and differences in the ancestries
of gene lineages do not confound efforts to delimit species. For example, under
the model used for inferring species boundaries from genomic data (i.e., the multispecies
coalescent), statistical statements about the probabilities of species boundaries
and the number of species represented in a collection of genomic sequences from different
individuals are based on patterns of discord across genes. This contrasts with a tradition
of relying on a concordance principle for inferences about what are and are not species,
whether it was a criterion of monophyly (i.e., concordance between the splits in gene
trees and a tree of species relationships) or concordance across independent data
(i.e., seeking corroborating evidence based on concordance across genes).

These conceptual (and analytical) shifts away from concordance were critical for avoiding
the contrivance that concordance criteria impose on endeavors of species delimitation.
That is, DNA made it possible to delimit species without having to wait for concordance
to accrue over evolutionary time (a process driven by genetic drift), a point
far removed from the time of speciation (i.e., when new
species are formed). However, this shift away from concordance, and the reliance upon
statistical evaluation of the expected amount of discord in genomic data under different
models of putative species boundaries, have created their own set of challenges. Specifically,
with increased amounts of sequence data, the genetic differences that are detected
are not just associated with species boundaries, but include genetic differences among
populations within species. That is, for applications of genomic data for species
delimitation, the theoretical ideals of the methods currently being applied are clashing
with the biological realities of how new species form, which is not an instantaneous
event but a process protracted over time. Our approaches that aim to harness the power of
genomic data are missing the mark when it comes to accurate detection of species boundaries,
a shortcoming with profound implications across biology because species
are the basic unit of reference for framing biological questions.