Seminar

An exact algorithm for computing the geodesic distance between phylogenetic trees

Abstract

Most commonly used measures of the dissimilarity between two phylogenetic trees evaluate only topological criteria. For example, the prevalent Robinson-Foulds distance counts the number of splits that differ between two trees. To obtain a measure that also incorporates branch length information, Billera et al. (2001, Adv. Appl. Math.) gave a formal mathematical definition of the space of phylogenetic trees and introduced a metric on this space, the so-called geodesic distance. In this framework, a tree topology with n interior branches corresponds to the nonnegative orthant of n-dimensional Euclidean space, and a particular assignment of branch lengths defines a point in that orthant. Two tree topologies with m splits in common share an m-dimensional face. Therefore, the tree space is connected, and Billera et al. showed that there is a unique shortest path between any two points in this space: the geodesic path between the two trees. The length of this path is a continuous distance measure for phylogenetic trees, and knowing the exact course of the path helps in addressing further questions, such as computing a consensus tree with branch lengths from a set of phylogenetic trees. We demonstrate how to compute the geodesic path and its length in tree space exactly. In the worst case, the method is exponential in the number of splits that differ between the two trees. However, information about incompatible splits and branch lengths is used to avoid enumerating futile paths and thereby improve performance.
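To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the exact algorithm presented in the seminar. Instead, it computes the Robinson-Foulds distance and two easy bounds on the geodesic distance that follow from the orthant structure described above.

The sketch rests on several assumptions. A tree is represented as a dict mapping each interior split to its branch length. A split is stored as the frozenset of taxa on the side that does not contain a fixed reference taxon. Pendant edges are ignored. All function names (`compatible`, `robinson_foulds`, `geodesic_bounds`) are hypothetical.

The two bounds work as follows:

- **Lower bound.** Embed both trees in one Euclidean space indexed by all splits and take the straight-line distance. Tree-space paths preserve length within each orthant, so no path can be shorter than this.
- **Upper bound.** Take the "cone path" through the face spanned by the shared splits.
- **Exact case.** If every split unique to one tree is compatible with every split unique to the other, both trees lie in a single orthant. The lower bound is then the exact geodesic distance. Otherwise, a full search is needed.

```python
import math
from itertools import product

# Hypothetical representation: tree = {split: branch_length}, where a split is
# the frozenset of taxa on the side NOT containing a fixed reference taxon.
# Only interior splits are stored; pendant edges are ignored.


def compatible(s, t, taxa):
    """Splits s and t can coexist in one tree iff at least one of the four
    intersections of their sides is empty."""
    s_rest, t_rest = taxa - s, taxa - t
    return not (s & t) or not (s & t_rest) or not (s_rest & t) or not (s_rest & t_rest)


def robinson_foulds(t1, t2):
    """Number of splits that occur in exactly one of the two trees."""
    return len(t1.keys() ^ t2.keys())


def geodesic_bounds(t1, t2, taxa):
    """Return (lower, upper, exact) for the geodesic distance between t1 and t2.

    lower: Euclidean distance after embedding both trees in one coordinate
           system indexed by all splits.
    upper: length of the cone path through the face of shared splits.
    exact: equals lower if every split unique to t1 is compatible with every
           split unique to t2 (both trees then lie in one orthant), else None.
    """
    common = t1.keys() & t2.keys()
    only1 = t1.keys() - common
    only2 = t2.keys() - common
    # Shared splits contribute their squared branch-length differences.
    shared_sq = sum((t1[s] - t2[s]) ** 2 for s in common)
    # Euclidean norms of the branch lengths of the differing splits.
    norm1 = math.sqrt(sum(t1[s] ** 2 for s in only1))
    norm2 = math.sqrt(sum(t2[s] ** 2 for s in only2))
    lower = math.sqrt(shared_sq + norm1 ** 2 + norm2 ** 2)
    upper = math.sqrt(shared_sq + (norm1 + norm2) ** 2)
    all_compatible = all(compatible(s, t, taxa) for s, t in product(only1, only2))
    return lower, upper, (lower if all_compatible else None)


if __name__ == "__main__":
    taxa = frozenset("abcde")  # reference taxon: "e"
    T1 = {frozenset("ab"): 1.0, frozenset("abc"): 2.0}
    T2 = {frozenset("ab"): 3.0, frozenset("abd"): 2.0}
    print(robinson_foulds(T1, T2))        # 2
    print(geodesic_bounds(T1, T2, taxa))  # lower ~3.464, upper ~4.472, exact None
```

In the usage example, the splits {a,b,c}|{d,e} and {a,b,d}|{c,e} are incompatible, so the sketch only brackets the distance. Narrowing this gap requires searching over the orthants the path may pass through. That search is where the exponential cost mentioned in the abstract arises, and where incompatibility and branch-length information can prune it.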