Related documents:

Abstract

Conventional probabilistic models for phylogenetic inference assume that an evolutionary tree,andasinglesetofbranchlengthsandstochasticprocessofDNA evolutionare sufficient to characterise the generating process across an entire DNA alignment. Unfortunately such a simplistic, homogeneous formulation may be a poor description of reality when the data arise from heterogeneous processes. A well-known example is when sites evolve at heterogeneous rates. This thesis is a contribution to the modelling and understanding of heterogeneityin phylogenetic data. Weproposea methodfor the classificationof DNA sites based on Bayesian mixture modelling. Our method not only accounts for heterogeneous data but also identifies the underlying classes and enables their interpretation. We also introduce novel MCMC methodology with the same, or greater, estimation performance than existing algorithms but with lower computational cost. We find that our mixture model can successfully detect evolutionary heterogeneity and demonstrate its direct relevance by applying it to real DNA data. One of these applications is the analysis of sixteen strains of one of the bacterial species that cause Lyme disease. Results from that analysis have helped understanding the evolutionary paths of these bacterial strains and, therefore, the dynamics of the spread of Lyme disease. Our method is discussed in the context of DNA but it may be extendedto othertypesof molecular data. Moreover,the classification scheme thatwe propose is evidence of the breadth of application of mixture modelling and a step forwards in the search for more realistic models of theprocesses that underlie phylogenetic data.