A New Computational Method Illuminates the Heterogeneity and Evolutionary Histories of Cells within a Tumor

Reporter: Aviva Lev-Ari, PhD, RN

Start Quote

Numerous computational approaches for inferring tumor phylogenies from single-region or multi-region bulk sequencing data have recently been proposed. Most of these methods use the variant allele fraction or cancer cell fraction of somatic single-nucleotide variants, restricted to diploid regions, to infer a two-state perfect phylogeny. They assume an infinite-sites model, in which each site mutates at most once and the mutation then persists. In practice, convergent evolution could result in the acquisition of the same mutation more than once, thereby violating this assumption. Similarly, mutations could be lost due to loss of heterozygosity. Indeed, both single-nucleotide variants and copy number alterations arise during tumor evolution, and both the variant allele fraction and the cancer cell fraction depend on the copy number state. Inferring that state in turn relies on the relative ordering of these alterations, so joint analysis can help resolve their ancestral relationship (Figure 1).

To tackle this outstanding problem, El-Kebir et al. (2016) formulated the multi-state perfect phylogeny mixture deconvolution problem to infer clonal genotypes, clonal fractions, and phylogenies by simultaneously modeling single-nucleotide variants and copy number alterations from multi-region sequencing of individual tumors. Based on this framework, they present SPRUCE (Somatic Phylogeny Reconstruction Using Combinatorial Enumeration), an algorithm designed for this task. This new approach uses the concept of a “character” to represent the status of a variant in the genome.
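To make the dependence of the cancer cell fraction on copy number state concrete, here is a minimal Python sketch of a commonly used conversion from variant allele fraction to cancer cell fraction. It is not taken from El-Kebir et al. and is not the SPRUCE model. The function name, the parameter names, and the assumption that tumor purity and the number of mutated copies are already known are all illustrative.

```python
def cancer_cell_fraction(vaf, purity, total_cn=2, mutated_copies=1, normal_cn=2):
    """Estimate the fraction of cancer cells carrying a variant.

    Assumes (hypothetically) that purity, the tumor copy number at the
    locus, and the number of mutated copies per carrying cell are known.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 1 <= mutated_copies <= total_cn:
        raise ValueError("mutated_copies must be between 1 and total_cn")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * total_cn + (1 - purity) * normal_cn
    # Expected VAF = purity * CCF * mutated_copies / avg_copies; solve for CCF.
    return vaf * avg_copies / (purity * mutated_copies)


# The same observed VAF implies different cancer cell fractions
# depending on the assumed copy number state.
diploid = cancer_cell_fraction(0.4, purity=0.8)
gained = cancer_cell_fraction(0.4, purity=0.8, total_cn=3, mutated_copies=2)
print(round(diploid, 3))  # 1.0
print(round(gained, 3))   # 0.7
```

In this toy case a variant that looks clonal under a diploid assumption looks subclonal if the locus was gained after the mutation arose, which is why the ordering of the two events matters.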

Commonly, binary characters have been used to represent single-nucleotide variants: the variant is either present or absent. In contrast, El-Kebir et al. use multi-state characters to represent copy number alterations, which may be present in zero, one, two, or more copies in the genome.
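The binary case can be illustrated with a short Python sketch. Under the infinite-sites assumption with a mutation-free (normal) root, a clone-by-variant matrix of 0/1 characters admits a perfect phylogeny exactly when the sets of clones carrying any two variants are either disjoint or nested. The function name and the toy matrices below are illustrative assumptions, not data or code from the paper, and the harder multi-state case that SPRUCE addresses is not attempted here.

```python
from itertools import combinations


def admits_rooted_perfect_phylogeny(matrix):
    """Check a binary clone-by-variant matrix against a normal (all-zero) root.

    matrix[i][j] == 1 means clone i carries variant j.
    """
    n_variants = len(matrix[0]) if matrix else 0
    # For each variant, the set of clones that carry it.
    carriers = [
        {i for i, row in enumerate(matrix) if row[j] == 1}
        for j in range(n_variants)
    ]
    for a, b in combinations(carriers, 2):
        # Overlapping but non-nested carrier sets cannot both arise from
        # mutations that occur once and are never lost.
        if a & b and not (a <= b or b <= a):
            return False
    return True


# Variant 0 is shared by all clones; variants 1 and 2 are private.
compatible = [[1, 0, 0], [1, 1, 0], [1, 0, 1]]
# Each variant is seen alone and together: implies recurrence or loss.
conflicting = [[1, 0], [0, 1], [1, 1]]
print(admits_rooted_perfect_phylogeny(compatible))   # True
print(admits_rooted_perfect_phylogeny(conflicting))  # False

# A multi-state character, by contrast, records a copy number per clone
# (hypothetical values) rather than presence or absence.
copy_number_character = {"clone_A": 2, "clone_B": 3, "clone_C": 1}
```

The conflicting matrix is the kind of pattern that convergent evolution or loss of heterozygosity can produce, and it is where a strictly two-state model breaks down.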

SPRUCE outperforms existing methods on simulated data, yielding higher recall rates under a variety of scenarios. Moreover, it is more robust to noise in variant allele frequency estimates, which is a significant feature of tumor genome sequencing data. Importantly, El-Kebir and colleagues demonstrate that there is often an ensemble of phylogenetic trees consistent with the underlying data. This uncertainty calls for caution in deriving definitive conclusions about the evolutionary process from a single solution.

End Quote
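The point about an ensemble of consistent trees can be seen even in a much simpler setting than the one SPRUCE handles. The Python sketch below assumes a single sample, binary single-nucleotide variants only, and known cancer cell fractions. It enumerates every rooted tree in which the fractions of a node's children sum to no more than the node's own fraction. The function names and the input values are hypothetical, and this brute-force search is an illustration rather than the authors' algorithm.

```python
from itertools import product

ROOT = "normal"  # mutation-free root; variant names must differ from this


def _reaches_root(parent):
    """Return True if following parent links from every variant ends at ROOT."""
    for start in parent:
        seen = set()
        node = start
        while node != ROOT:
            if node in seen:
                return False  # cycle, so not a tree
            seen.add(node)
            node = parent[node]
    return True


def consistent_trees(fractions, tol=1e-9):
    """Enumerate parent assignments that satisfy the sum condition."""
    variants = sorted(fractions)
    nodes = [ROOT] + variants
    freq = dict(fractions)
    freq[ROOT] = 1.0
    trees = []
    for parents in product(nodes, repeat=len(variants)):
        parent = dict(zip(variants, parents))
        if any(v == p for v, p in parent.items()):
            continue  # a variant cannot be its own parent
        if not _reaches_root(parent):
            continue
        child_sum = {n: 0.0 for n in nodes}
        for v, p in parent.items():
            child_sum[p] += freq[v]
        # Children cannot account for more cells than their parent does.
        if all(child_sum[n] <= freq[n] + tol for n in nodes):
            trees.append(parent)
    return trees


for tree in consistent_trees({"a": 0.9, "b": 0.4, "c": 0.3}):
    print(tree)
# {'a': 'normal', 'b': 'a', 'c': 'a'}
# {'a': 'normal', 'b': 'a', 'c': 'b'}
```

With these three fractions, variant c can descend either directly from a or from b, and the data alone cannot distinguish the two histories. Exhaustive enumeration like this grows very quickly with the number of variants, so it is only practical for toy inputs.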