Abstract

Background: Incomplete lineage sorting (ILS), modelled by the multi-species
coalescent (MSC), is known to create discordance between gene trees and species
trees, and can lead to inaccurate species tree estimates unless appropriate
methods are used. While many statistically consistent methods
have been developed to estimate the species tree in the presence of ILS, only
ASTRAL-2 and NJst have been shown to have good accuracy on large datasets.
However, NJst is generally slower and less accurate than ASTRAL-2, and cannot run
on some datasets.
Results: We have redesigned NJst to enable it to run on all datasets, and we have
expanded its design space so that it can be used with different distance-based
tree estimation methods. The resultant method, ASTRID, is statistically
consistent under the MSC model, and has accuracy that is competitive with
ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in
minutes on some datasets for which ASTRAL-2 required hours.
Conclusions: ASTRID is a new coalescent-based method for species tree
estimation that is competitive with the best current method in terms of accuracy,
while being much faster. ASTRID is available as open-source software on GitHub.