Höhna, Sebastian

Abstract [en]

In this thesis we consider two very different topics in Bayesian phylogenetic inference. The first paper, "Inferring speciation and extinction rates under different sampling schemes" by Sebastian Höhna, Tanja Stadler, Fredrik Ronquist and Tom Britton, focuses on estimating the rates of speciation and extinction of species when only a subsample of the present-day species is available. The second paper, "Burnin Estimation and Convergence Assessment" by Sebastian Höhna and Kristoffer Sahlin, focuses on how to analyze the output of Markov chain Monte Carlo (MCMC) runs with respect to convergence to the stationary distribution and approximation of the posterior probability distribution.

The birth-death process is used to describe the evolution of species diversity. Previous work enabled the estimation of speciation and extinction rates under the assumption of a constant-rate birth-death process and complete sampling of all extant species. We extend the completely sampled birth-death process to incomplete sampling with three different types of sampling schemes: random sampling, diversified sampling and clustered sampling. On a set of empirical phylogenies with known sampling fractions we observe that taking the sampling fraction into account gives better-fitting models, under either random sampling or diversified sampling.

The current trend in Bayesian phylogenetic inference is to extend the available models by using more complex models and/or hierarchical models. This renders Bayesian inference by means of the MCMC algorithm very intricate. The performance of single or multiple MCMC runs needs to be assessed. We investigate which methods are used in Bayesian phylogenetics to assess the performance of MCMC runs and which methods are available from other research areas, and we compile a strategy for assessing convergence and estimating the burnin automatically in a statistically sound framework.

List of papers

Höhna, Sebastian

Stockholm University, Faculty of Science, Department of Mathematics.

Sahlin, Kristoffer

Royal Institute of Technology (KTH).

(English) Manuscript (preprint) (Other academic)

Abstract [en]

Estimating the burnin length and assessing convergence purely from the output of an MCMC run is increasingly important in Bayesian phylogenetic inference. Previously, methods for estimating the burnin and assessing convergence have been ad hoc, such as the minimum number of effective samples or the deviation in split frequencies. In this paper we compare the currently used methods to convergence assessment methods from the mathematical literature, namely the Geweke test and the Heidelberger-Welch test. The latter two show strong advantages in being statistically consistent and unbiased. Statistical consistency and unbiasedness were verified on simulated data with known posterior distributions. Both methods treat convergence as the null hypothesis. The null hypothesis is rejected based on standard p-values, which are easier to interpret than a threshold such as the one used for the effective sample size. We extend these convergence assessment methods to single and multiple chains. Furthermore, we test the performance of the convergence assessment methods on an empirical dataset and conclude that tests for convergence to the same stationary distribution from independent runs are most adequate. Additionally, we developed an automatic procedure that finds the optimal burnin in the cases we studied. All methods we tested are implemented in the open source software RevBayes (http://www.revbayes.net/).
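
To illustrate the kind of test referred to above, the following is a minimal Python sketch of a Geweke-style diagnostic for a single chain of a scalar parameter. It is not the RevBayes implementation. The function names `geweke_z` and `geweke_p_value` and the window fractions (first 10%, last 50%) are assumptions chosen for illustration. As a further simplification, the standard error uses the plain sample variance instead of a spectral density estimate at frequency zero, so the sketch is only reasonable for weakly autocorrelated (for example, thinned) samples.

```python
import math
from statistics import NormalDist, fmean, variance


def geweke_z(samples, first=0.1, last=0.5):
    """Z-score comparing the mean of an early window with a late window.

    Simplifying assumption: the standard error uses the naive sample
    variance, not a spectral density estimate, so autocorrelation
    within each window is ignored.
    """
    n = len(samples)
    early = samples[: int(first * n)]      # first 10% of the chain by default
    late = samples[n - int(last * n):]     # last 50% of the chain by default
    se = math.sqrt(variance(early) / len(early) + variance(late) / len(late))
    return (fmean(early) - fmean(late)) / se


def geweke_p_value(samples, first=0.1, last=0.5):
    """Two-sided p-value under the null hypothesis of equal window means."""
    z = geweke_z(samples, first, last)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical toy chains (not real MCMC output).
# A chain that fluctuates around 0 throughout:
stationary = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]
# A chain whose first 100 samples still sit near 5 before settling at 0:
with_burnin = [x + 5.0 for x in stationary[:100]] + stationary[100:]

p_stationary = geweke_p_value(stationary)
p_with_burnin = geweke_p_value(with_burnin)
```

A small p-value, as obtained for the second toy chain, rejects the null hypothesis that the early and late windows share the same mean, which suggests that the early samples should be treated as burnin.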