The Species Problem from the Modeler’s Point of View

Abstract

How to define a partition of individuals into species is a long-standing question called the species problem in systematics. Here, we focus on this problem in the thought experiment where individuals reproduce clonally and both the differentiation process and the population genealogies are explicitly known. We specify three desirable properties of species partitions: (A) Heterotypy between species, (B) Homotypy within species and (M) Genealogical monophyly of each species. We then ask: How and when is it possible to delineate species in a way satisfying these properties? We point out that the three desirable properties cannot in general be satisfied simultaneously, but that any two of them can. We mathematically prove the existence of the finest partition satisfying (A) and (M) and the coarsest partition satisfying (B) and (M). For each of them, we propose a simple algorithm to build the associated phylogeny out of the genealogy. The ways we propose to phrase the species problem shed new light on the interaction between the genealogical and phylogenetic scales in modeling work. The two definitions centered on the monophyly property can readily be used at a higher taxonomic level as well, e.g., to cluster species into monophyletic genera.

Acknowledgements

The authors are very grateful to F. Débarre, R.S. Etienne, M. Steel, S. Türpitz and A. Hoppe for their comments on this paper, and to D. Baum for helpful literature advice. The authors thank the Center for Interdisciplinary Research in Biology (CIRB, Collège de France) for funding, as well as the École Normale Supérieure for MM's PhD funding. We declare no conflict of interest.

Appendix

Some of the results stated in Sects. A and B are classical results in combinatorics for partially ordered sets (see Bóna 2011, chapter 16). For the sake of self-containment, and because not all readers may be familiar with these notions, we nevertheless present them here.

A: ‘Finer than’, a partial order relation on \(\mathcal {X}\)-partitions

We will see that the collection of \(\mathcal {X}\)-partitions \(\varSigma _M\) plays a singular role in Theorem 1. This is due to the characterization of \(\varSigma _M\) by the fact that there is a hierarchy \(\mathscr {H}\) (here the hierarchy associated with the genealogy T) such that

B.1: Defining the Supremum and the Infimum of a Set of \(\mathcal {X}\)-partitions

Definition 2

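For any non-empty collection \(\varSigma \) of \(\mathcal {X}\)-partitions, we define the two relations \(\underline{\mathcal {R}}_\varSigma \) and \(\overline{\mathcal {R}}_\varSigma \) on \(\mathcal {X}\) by

\[ x~ \underline{\mathcal {R}}_\varSigma ~y \iff \forall \mathscr {S} \in \varSigma ,\ \exists S \in \mathscr {S} \text{ such that } x \in S \text{ and } y \in S, \]

\[ x~ \overline{\mathcal {R}}_\varSigma ~y \iff \exists \mathscr {S} \in \varSigma ,\ \exists S \in \mathscr {S} \text{ such that } x \in S \text{ and } y \in S. \]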

Lemma 1

For any non-empty collection \(\varSigma \) of \(\mathcal {X}\)-partitions, \(\underline{\mathcal {R}}_\varSigma \) is an equivalence relation. For any non-empty collection \(\varSigma \) of \(\mathcal {X}\)-partitions such that \(\varSigma \subseteq \varSigma _M\), \(\overline{\mathcal {R}}_\varSigma \) is an equivalence relation.

Proof

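The reflexivity and symmetry of the two relations are easily seen. Now let us prove their transitivity. Let \(\varSigma \) be a non-empty collection of \(\mathcal {X}\)-partitions, and let \((x,y,z) \in \mathcal {X}^3\) be such that \(x~ \underline{\mathcal {R}}_\varSigma ~y\) and \(y~ \underline{\mathcal {R}}_\varSigma ~z\). Let \(\mathscr {S} \in \varSigma \). By definition,

\[ \exists S_1 \in \mathscr {S} \text{ such that } x \in S_1 \text{ and } y \in S_1, \quad \text{and} \quad \exists S_2 \in \mathscr {S} \text{ such that } y \in S_2 \text{ and } z \in S_2. \]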

It follows that \(y \in S_1 \cap S_2\), and because \(\mathscr {S}\) is a partition, \(S_1 = S_2\). Finally, with \(S:=S_1= S_2\), there exists \(S \in \mathscr {S}\) such that \(x \in S\) and \(z \in S\). Since \(\mathscr {S}\) was an arbitrary element of \(\varSigma \), we get \(x~ \underline{\mathcal {R}}_\varSigma ~z\), and we can conclude that \(\underline{\mathcal {R}}_\varSigma \) is transitive.

Readers familiar with lattice theory will note that these definitions match the usual ‘meet’ and ‘join’ operators used for lattices, and in particular for the lattice of partitions of a set, ordered by refinement. For other readers, recall first that any equivalence relation on a set \(\mathcal {X}\) induces an \(\mathcal {X}\)-partition, obtained by placing mutually related elements in the same cluster. Further, the following lemma justifies the notation \(\inf \) and \(\sup \).

In this case, we get \(\inf \varSigma = \{ \{1\}, \{2\}, \{3, 4\} \}\), which does not belong to \(\varSigma \). Moreover, there is no \(\mathcal {X}\)-hierarchy \(\mathscr {H}\) such that \(\mathscr {S}, \mathscr {S}' \in \mathscr {H}\). Then we can see that the relation \(\overline{\mathcal {R}}_\varSigma \) is not an equivalence relation on \(\mathcal {X}\), because \(1 ~\overline{\mathcal {R}}_\varSigma ~ 2\) and \(1 ~\overline{\mathcal {R}}_\varSigma ~ 3\), but we do not have \(2 ~\overline{\mathcal {R}}_\varSigma ~ 3\). Thus, \(\sup \varSigma \) is not defined.
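The counterexample can be checked mechanically. The sketch below is only an illustration: the two partitions `S` and `S_prime` are an assumption, chosen because they reproduce the infimum and the failure of transitivity stated above, and all function names are hypothetical.

```python
def same_cluster(partition, x, y):
    """True if x and y lie in one cluster of the partition."""
    return any(x in cluster and y in cluster for cluster in partition)


def related_lower(sigma, x, y):
    """x and y share a cluster in EVERY partition of sigma."""
    return all(same_cluster(p, x, y) for p in sigma)


def related_upper(sigma, x, y):
    """x and y share a cluster in AT LEAST ONE partition of sigma."""
    return any(same_cluster(p, x, y) for p in sigma)


def infimum(sigma, ground_set):
    """Partition induced by the lower relation, which is always an equivalence."""
    clusters = []
    for x in sorted(ground_set):
        for cluster in clusters:
            # Comparing with one representative suffices for an equivalence.
            if related_lower(sigma, x, next(iter(cluster))):
                cluster.add(x)
                break
        else:
            clusters.append({x})
    return {frozenset(c) for c in clusters}


def is_transitive(relation, ground_set):
    """Brute-force transitivity check over all triples."""
    return all(
        relation(x, z)
        for x in ground_set
        for y in ground_set
        for z in ground_set
        if relation(x, y) and relation(y, z)
    )


X = {1, 2, 3, 4}
S = [{1, 2}, {3, 4}]        # assumed partition, not taken from the source
S_prime = [{2}, {1, 3, 4}]  # assumed partition, not taken from the source
sigma = [S, S_prime]

inf_sigma = infimum(sigma, X)
upper_is_transitive = is_transitive(lambda a, b: related_upper(sigma, a, b), X)
print(sorted(sorted(c) for c in inf_sigma), upper_is_transitive)
```

With these assumed partitions, no hierarchy can contain both \(\{1,2\}\) and \(\{1,3,4\}\), since they overlap without being nested, which is why the upper relation fails to be transitive.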

In order to prove that \(\inf \varSigma _{AM} \in \varSigma _{AM}\) and \(\sup \varSigma _{BM} \in \varSigma _{BM}\), we will rely on properties of \(\inf \varSigma \) and \(\sup \varSigma \) presented in the following lemma.

Lemma 3

For any non-empty collection \(\varSigma \) of \(\mathcal {X}\)-partitions, for any \(S \in \inf \varSigma \), S can be written in the form of the following non-empty intersection

For any non-empty collection \(\varSigma \) of \(\mathcal {X}\)-partitions such that \(\varSigma \subseteq \varSigma _M\), for any \(S \in \sup \varSigma \), S can be written in the form of the following non-empty union

and let us prove that \(S=S'\). According to Lemma 2, for any \( \mathscr {S} \in \varSigma \), \(\inf \varSigma \le \mathscr {S}\), so \(\exists ! S^* \in \mathscr {S}\) such that \(S \subseteq S^*\). This proves that the intersection in the definition of \(S'\) is not empty. Now by definition of \(S'\) we have \(S \subseteq S'\), which also implies \(S'\not =\varnothing \). It remains to show that \(S' \subseteq S\). Let x be any element of \(S'\) and y be any element of S. Then for any \( \mathscr {S} \in \varSigma \), there is (a unique) \(S^*\in \mathscr {S}\) such that \(S\subseteq S^*\), and by definition of \(S'\), we have \(x\in S^*\). But since \(S\subseteq S^*\), we also have \(y\in S^*\). This shows that for any \(\mathscr {S} \in \varSigma \) there is \(S^* \in \mathscr {S}\) such that \(x \in S^*\) and \(y \in S^*\). This can be expressed equivalently as \(x~ \underline{\mathcal {R}}_\varSigma ~y\), so that x and y are in the same element of \(\inf \varSigma \), that is, \(x\in S\).

Now let us prove (S2). Let \(\varSigma \) be any non-empty collection of \(\mathcal {X}\)-partitions such that \(\varSigma \subseteq \varSigma _M\) and let \(S \in \sup \varSigma \). Set

and let us prove that \(S=S'\). According to Lemma 2, for all \(\mathscr {S} \in \varSigma \), \( \mathscr {S}\le \sup \varSigma \), so \(\exists S^*\in \mathscr {S}\) such that \(S^*\subseteq S\). In particular, the union in the definition of \(S'\) is not empty and \(S'\not =\varnothing \). Now by definition of \(S'\) we have \(S' \subseteq S\). It remains to show that \(S \subseteq S'\). Let x be any element of S and y be any element of \(S'\). Since \(S'\subseteq S\), \(y\in S\), so that x and y are in the same element of \(\sup \varSigma \), which can be expressed equivalently as \(x~ \overline{\mathcal {R}}_\varSigma ~y\). Now by definition of \(\overline{\mathcal {R}}_\varSigma \), there is \(\mathscr {S} \in \varSigma \) and \(S^*\in \mathscr {S}\) such that \(x,y\in S^*\). Because \(\mathscr {S}\le \sup \varSigma \), \(S^*\) is included in a single element of \(\sup \varSigma \), and since \(S^*\cap S\not =\varnothing \) (both contain x), we have \(S^*\subseteq S\), which shows by definition of \(S'\) that \(x\in S'\).
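Both parts of Lemma 3 can be illustrated numerically on a small case. The sketch below is not part of the original proof: the ground set, the two partitions `P1` and `P2` (both drawn from the hierarchy of the hypothetical genealogy ((1,2),(3,(4,5)))), and all function names are assumptions made for the example.

```python
def share(partition, x, y):
    """True if x and y lie in one cluster of the partition."""
    return any(x in c and y in c for c in partition)


def clusters_of_relation(related, ground_set):
    """Group elements into clusters, assuming `related` is an equivalence."""
    clusters = []
    for x in sorted(ground_set):
        for c in clusters:
            if related(x, next(iter(c))):
                c.add(x)
                break
        else:
            clusters.append({x})
    return [frozenset(c) for c in clusters]


def infimum(sigma, ground_set):
    """Clusters of the relation 'together in every partition of sigma'."""
    return clusters_of_relation(
        lambda x, y: all(share(p, x, y) for p in sigma), ground_set
    )


def supremum(sigma, ground_set):
    """Clusters of 'together in some partition'; valid only when the
    partitions of sigma all come from one hierarchy."""
    return clusters_of_relation(
        lambda x, y: any(share(p, x, y) for p in sigma), ground_set
    )


def as_intersection(S, sigma):
    """Intersect, over all partitions of sigma, the clusters containing S."""
    return frozenset.intersection(*[c for p in sigma for c in p if S <= c])


def as_union(S, sigma):
    """Unite, over all partitions of sigma, the clusters contained in S."""
    return frozenset.union(*[c for p in sigma for c in p if c <= S])


X = {1, 2, 3, 4, 5}
P1 = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5})]   # assumed example
P2 = [frozenset({1}), frozenset({2}), frozenset({3, 4, 5})]   # assumed example
sigma = [P1, P2]

inf_ok = all(as_intersection(S, sigma) == S for S in infimum(sigma, X))
sup_ok = all(as_union(S, sigma) == S for S in supremum(sigma, X))
print(inf_ok, sup_ok)
```

The representative-based clustering in `supremum` is only correct because the assumed partitions are nested within one hierarchy; for arbitrary partitions the upper relation need not be transitive and a transitive closure would be required instead.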

C: Construction of the Lacy and Loose Phylogenies

This section formalizes mathematically the construction of the lacy and loose phylogenies presented in the main text.

Recall that an interior node is convergent if there are two tips, one in each of its two descending subtrees, carrying the same phenotype; otherwise, the node is said to be divergent. We will say that the two monophyletic groups subtended by a convergent (resp. divergent) node are convergent (resp. divergent). We define \(\mathscr {H}_d\) as the collection of divergent monophyletic groups, that is
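The classification of interior nodes recalled above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' algorithm: the genealogy is encoded as nested pairs, the tip labels and phenotypes are invented for the example, and the function names are hypothetical. The sketch collects the two tip sets subtended by each divergent node, which is one reading of the collection of divergent monophyletic groups.

```python
def tips(node):
    """Set of tip labels below a node (a tip is any non-tuple label)."""
    if not isinstance(node, tuple):
        return frozenset([node])
    left, right = node
    return tips(left) | tips(right)


def divergent_groups(node, phenotype):
    """Tip sets of the two subtrees hanging from each divergent interior node."""
    if not isinstance(node, tuple):
        return set()
    left, right = node
    left_tips, right_tips = tips(left), tips(right)
    left_types = {phenotype[t] for t in left_tips}
    right_types = {phenotype[t] for t in right_tips}
    groups = divergent_groups(left, phenotype) | divergent_groups(right, phenotype)
    # A node is divergent when no phenotype occurs on both sides of it.
    if not (left_types & right_types):
        groups |= {left_tips, right_tips}
    return groups


# Hypothetical genealogy and phenotypes, chosen only for illustration.
genealogy = ((("a", "b"), "c"), ("d", "e"))
phenotype = {"a": "red", "b": "blue", "c": "red", "d": "green", "e": "green"}

H_d = divergent_groups(genealogy, phenotype)
print(sorted(sorted(g) for g in H_d))
```

In this example the node joining a and b is divergent, as is the root, while the node joining (a, b) with c is convergent because red appears on both sides, and the node joining d and e is convergent because both tips are green.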

We similarly consider phylogenetic and non-phylogenetic monophyletic groups for either the loose or the lacy definition. We call \(\mathscr {H}_{\text {loose}}\) and \(\mathscr {H}_{\text {lacy}}\) the collection of phylogenetic monophyletic groups for the loose and lacy definitions respectively. The procedure described in the main text amounts to defining