Data from: Supertrees based on the subtree prune-and-regraft distance

Whidden C, Zeh N, Beiko RG

Date Published: March 20, 2014

DOI: http://dx.doi.org/10.5061/dryad.h065g

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service.
To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data.

Title

Supplementary Figure 1

Description

Supplementary Figure 1: Inferred LGT events between 13 bacterial classes. (a) LGT heatmap. The colour side bars indicate class. The row and column order is the same. The number of transfers is shown in a white-yellow-red colour scale with darker colours indicating a higher proportion of transfer events. Colour intensity is relative to the largest number of transfers in a row. Relationships with fewer than 5% of the maximum transfer events for a row or only a single transfer event were filtered out. (b) LGT affinity graph of the bacterial classes. Each node of the graph represents a bacterial class scaled relative to the number of represented taxa (2-75). Two classes are connected by an edge if the number of inferred LGT events between them exceeds 5% of their shared genes. The shade of an edge is proportional to this ratio of LGT events to shared genes; black edges indicate relationships with at least as many LGT events as shared genes. The thickness of an edge scales relative to the actual number of inferred transfers (30-1414) with thicker edges indicating more transfers.
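The row-wise scaling and filtering described for the heatmap in panel (a) can be illustrated with a minimal Python sketch. The function name `heatmap_row` and its parameter `min_fraction` are hypothetical and do not come from the authors' code; the sketch only assumes that each row is a list of transfer counts.

```python
def heatmap_row(counts, min_fraction=0.05):
    """Scale one heatmap row and filter weak relationships.

    Returns one value per column: the count divided by the row maximum,
    or None where the relationship is filtered out (a single transfer
    or fewer, or fewer than `min_fraction` of the row maximum).
    """
    row_max = max(counts, default=0)
    scaled = []
    for n in counts:
        if row_max == 0 or n <= 1 or n < min_fraction * row_max:
            scaled.append(None)          # filtered out, drawn as blank
        else:
            scaled.append(n / row_max)   # intensity relative to the row maximum
    return scaled


row = heatmap_row([100, 4, 10, 1, 50])
```

Here the counts 4 (below 5% of the row maximum of 100) and 1 (a single transfer) are dropped, and the remaining cells are scaled against the largest value in the same row.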

Supplementary Figure 2: The LGT affinity neighbourhood of genus Clostridium. Each node of the graph represents a bacterial genus coloured by class and scaled relative to the number of represented taxa (1-13). Two genera are connected by an edge if the number of inferred LGT events between them exceeds 5% of their shared genes. The shade of an edge is proportional to this ratio of LGT events to shared genes; black edges indicate relationships with at least as many LGT events as shared genes. The thickness of an edge scales relative to the actual number of inferred transfers (2-125) with thicker edges indicating more transfers.
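The edge rule used in the affinity graphs can be sketched as follows. This is an illustration only: the function name `affinity_edge`, the returned dictionary keys and the capping of the shade at 1.0 are assumptions made for the example, and the mapping from transfer count to line thickness is not specified in the captions, so the raw count is simply passed through.

```python
def affinity_edge(lgt_events, shared_genes, threshold=0.05):
    """Decide whether two taxa are joined by an edge in an LGT affinity graph.

    An edge is drawn only if the number of inferred LGT events exceeds
    `threshold` times the number of shared genes. Returns None when no
    edge is drawn, otherwise a dict describing how to draw it.
    """
    if shared_genes <= 0:
        return None                      # no shared genes, ratio undefined
    ratio = lgt_events / shared_genes
    if ratio <= threshold:
        return None                      # does not exceed 5% of shared genes
    return {
        "ratio": ratio,
        "shade": min(ratio, 1.0),        # assumption: shade saturates at 1.0
        "black": ratio >= 1.0,           # at least as many LGT events as shared genes
        "transfers": lgt_events,         # drives edge thickness (scaling unspecified)
    }


edge = affinity_edge(30, 100)
```

With 30 inferred transfers over 100 shared genes the ratio is 0.3, so an edge is drawn but is not black; a pair with 120 transfers over 100 shared genes would be drawn in black.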

Supplementary Figure 3: A comparison of the accuracy of SPR, RF and MRP supertrees as measured by the minimal SPR distance between simulated species histories and any rooting of the supertree under varying rates of random or divergence-biased simulated LGT events.

Supplementary Figure 4: One application of the cluster reduction. The subtrees on leaves 1, 2, 3, and 4 are not identical but cover the same leaf set. Thus, they can be split from the trees and solved independently. The original locations of the removed subtrees are represented by a new leaf a1 and their roots are labelled ρ1. It is preferable to cut ρ1 in any sub-MAF as we can then cut the equivalent edge above a1.
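The precondition for the cluster reduction, subtrees that differ in shape but cover the same leaf set, can be detected with a short Python sketch. It is not the authors' implementation and does not perform the reduction itself; it only lists the shared leaf sets that would be candidates for splitting. Trees are assumed to be rooted and written as nested tuples, and the function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf_set, clusters) for a rooted tree given as nested tuples.

    A cluster is the set of leaves below an internal node.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()  # a leaf contributes no cluster
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)                    # cluster of this internal node
    return leaves, found


def common_clusters(tree1, tree2):
    """Leaf sets that appear as clusters in both trees, excluding the root cluster."""
    all_leaves, c1 = clusters(tree1)
    _, c2 = clusters(tree2)
    return {c for c in c1 & c2 if c != all_leaves}


t1 = (((1, 2), (3, 4)), (5, 6))
t2 = ((((1, 3), (2, 4)), 5), 6)
shared = common_clusters(t1, t2)
```

In this example the two trees arrange leaves 1 to 4 differently, yet both contain a subtree on exactly those four leaves, so `shared` holds the single leaf set {1, 2, 3, 4}.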

Abstract

Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Although they often work well in practice, existing supertree approaches use optimality criteria that do not reflect underlying processes, have known biases and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and > 50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of simulated benchmark datasets, we show that SPR supertrees are more similar to correct species histories under plausible rates of LGT than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera; a small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT.