Dr Andrew Roth, Department of Statistics and Ludwig Institute for Cancer Research, University of Oxford

“Inferring the Evolutionary History of Cancers: Statistical Methods and Applications”

Cancer is an evolutionary process. The accumulation of genomic mutations, coupled with the effects of genetic drift and selection, leads to divergent clonal populations of cancer cells in a tumour. High-throughput sequencing (HTS) of both bulk tissue and single cells offers a powerful tool to study this diversity, and opens the possibility of reconstructing the evolutionary history of tumours. In particular, it is now possible to reconstruct the phylogeny (evolutionary tree) of extant clones in a tumour. Understanding the phylogeny of clonal populations can provide insight into the ontogeny of a tumour, mechanisms of metastasis, and modes of therapeutic resistance. However, inferring phylogenies using HTS is challenging due to issues such as admixed populations in bulk sequencing and noisy measurements in single cell experiments.

I will present three statistical methods which leverage data from different HTS assays to provide complementary information about the population structure and phylogeny of clones in a tumour. First, I will discuss the PyClone model, which uses targeted deep sequencing data to infer what proportion of cells in a biopsy sample harbour a mutation, and which mutations originate at the same point in the evolutionary history of the tumour [1]. I will present current work on scaling PyClone to whole-genome-scale data using recently developed statistical inference methods [2]. I will also discuss the PhyClone model, an extension of PyClone which attempts to explicitly model the clonal phylogeny using a novel non-parametric Bayesian process. Second, I will present the single cell genotyper (SCG) model, which can be used to analyse targeted single cell sequencing data of known point mutations [3]. The model accounts for several sources of noise, including doublet cells and allele drop-out. This model allows for robust inference of the clonal genotype, which in turn can be used as input for classical phylogenetic algorithms. Finally, I will consider the problem of mutation loss and present a novel model based on the Stochastic Dollo process for inference of lost mutations. I will show how this approach, coupled with the PyClone and SCG models, can be used to track the migration of clones in the peritoneal cavity of patients with High Grade Serous Ovarian Cancer [4].